Date: Mar 2023 - Aug 2023

Roles: Data Scientist, Data Engineer

This project was completed while I was a consultant at Melio AI; some project details have been obfuscated for confidentiality.

Overview

Developed ETL pipelines and automated data quality monitoring for regulatory reporting to support compliance.

My Responsibilities

  • Built a data processing pipeline with Apache Airflow and Python that populated financial regulatory reports in Excel sheets from a PostgreSQL database.
  • Monitored data quality with Great Expectations, using schema validation and statistical checks, and sent email notifications when anomalies were detected.
  • Built a synthetic data generation pipeline with Python's Faker library so development could proceed while real data was unavailable.
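The synthetic-data and data-quality steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's code: it uses the standard library's `random` module in place of Faker and hand-written check functions in place of Great Expectations, and the schema, column names (`account_id`, `report_date`, `balance`), and thresholds are all hypothetical.

```python
import random
import statistics
from datetime import date, timedelta

# Hypothetical schema for a report input table (illustrative only).
SCHEMA = {"account_id": str, "report_date": date, "balance": float}


def generate_synthetic_rows(n, seed=0):
    """Generate fake rows with the stdlib random module.

    Stand-in for Faker: a seeded generator keeps test data reproducible.
    """
    rng = random.Random(seed)
    start = date(2023, 1, 1)
    rows = []
    for i in range(n):
        rows.append({
            "account_id": f"ACC{i:05d}",
            "report_date": start + timedelta(days=rng.randint(0, 180)),
            "balance": round(rng.gauss(10_000, 2_500), 2),
        })
    return rows


def check_schema(rows, schema):
    """Schema validation: report missing columns and wrongly typed values."""
    errors = []
    for idx, row in enumerate(rows):
        missing = schema.keys() - row.keys()
        if missing:
            errors.append(f"row {idx}: missing {sorted(missing)}")
        for col, typ in schema.items():
            if col in row and not isinstance(row[col], typ):
                errors.append(f"row {idx}: {col} is not {typ.__name__}")
    return errors


def check_mean_in_range(rows, column, low, high):
    """Statistical check: the column mean must fall within [low, high]."""
    mean = statistics.fmean(r[column] for r in rows)
    if not low <= mean <= high:
        return [f"{column} mean {mean:.2f} outside [{low}, {high}]"]
    return []


def validate(rows):
    """Run schema checks first; only run statistics on well-typed data."""
    failures = check_schema(rows, SCHEMA)
    if not failures:
        failures += check_mean_in_range(rows, "balance", 5_000, 15_000)
    return failures


if __name__ == "__main__":
    problems = validate(generate_synthetic_rows(200))
    # A non-empty list is where an email notification would be sent.
    print(problems or "all checks passed")
```

Running the schema check before the statistical check avoids computing a mean over malformed values. A non-empty result from `validate` corresponds to the point where the pipeline described above would send an anomaly email; that notification step is omitted here.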

Outcomes/Impact

  • Automated the population of financial regulatory reports, reducing manual effort.
  • Allowed pipeline development and testing to proceed before real data was available by providing realistic synthetic data.

Tools Used

  • Python
  • Airflow
  • Docker
  • SQL
  • Postgres
  • Flyway
  • Great Expectations
  • Faker