Date: Mar 2023 - Aug 2023
Roles: Data Scientist, Data Engineer
This project was completed while I was a consultant at Melio AI, and some of the project details have been obfuscated.
Overview
Developed ETL pipelines and automated data quality monitoring for regulatory reporting to support compliance.
My Responsibilities
- Built a data processing pipeline using Apache Airflow and Python to populate financial regulatory reports in Excel sheets from a PostgreSQL database.
- Monitored data quality using Great Expectations for schema validation and statistical checks, with anomaly notifications via email.
- Built a synthetic data generation pipeline using Python’s Faker to unblock development in the absence of real data.
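
As a rough illustration of how the synthetic data and data quality steps fit together, here is a minimal sketch using only the Python standard library. It is not the project's code and does not use the Faker or Great Expectations APIs. The table schema, column names, value ranges, and function names are all hypothetical, chosen only to show the idea of generating reproducible fake rows and then running a schema check and a simple statistical check against them.

```python
import random
import statistics
from datetime import date, timedelta

# Hypothetical schema for a transactions table; column names are illustrative only.
EXPECTED_SCHEMA = {"txn_id": int, "account": str, "amount": float, "booked_on": date}


def make_synthetic_rows(n, seed=0):
    """Generate n fake transaction rows (a stdlib stand-in for Faker providers)."""
    rng = random.Random(seed)  # seeded so the test data is reproducible
    start = date(2023, 1, 1)
    return [
        {
            "txn_id": i,
            "account": f"ACC{rng.randrange(10**6):06d}",
            "amount": round(rng.uniform(10.0, 5000.0), 2),
            "booked_on": start + timedelta(days=rng.randrange(180)),
        }
        for i in range(1, n + 1)
    ]


def check_schema(rows):
    """Return a list of schema violations: mismatched columns or wrong types."""
    problems = []
    for idx, row in enumerate(rows):
        if set(row) != set(EXPECTED_SCHEMA):
            problems.append(f"row {idx}: columns {sorted(row)} do not match schema")
            continue
        for col, expected_type in EXPECTED_SCHEMA.items():
            if not isinstance(row[col], expected_type):
                problems.append(f"row {idx}: {col} is not {expected_type.__name__}")
    return problems


def check_mean_in_range(rows, column, low, high):
    """Statistical check: the column mean must fall within [low, high]."""
    mean = statistics.fmean(row[column] for row in rows)
    return low <= mean <= high


rows = make_synthetic_rows(500)
schema_problems = check_schema(rows)  # empty list when every row conforms
amount_ok = check_mean_in_range(rows, "amount", 1000.0, 4000.0)
```

In a setup like the one described above, a non-empty `schema_problems` list or a failed range check would be the kind of result that triggers an email notification. The thresholds here are arbitrary placeholders.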
Outcomes/Impact
- Automated financial regulatory reporting, reducing the manual effort needed to populate the Excel reports.
- Improved development efficiency by providing synthetic data for pipeline testing while real data was unavailable.
Tools Used
- Python
- Airflow
- Docker
- SQL
- Postgres
- Flyway