Building Scalable Serverless Data Pipelines on AWS using Medallion Architecture and Delta Lake CDC

Janaki Ganapathi

2025 · DOI: 10.29322/ijsrp.15.05.2025.p16106
International Journal of Scientific and Research Publications · 0 Citations

TLDR

The Medallion Architecture organizes data into three layers (Bronze, Silver, and Gold) to improve lineage, consistency, and analytical readiness. Combined with AWS serverless services, it provides elastic scalability, minimal operational overhead, and event-driven orchestration.

Abstract

The evolution of cloud-native ecosystems has driven the need for scalable, resilient, and efficient data architectures. Traditional data lakes suffer from schema drift, ingestion bottlenecks, and data quality degradation over time. The Medallion Architecture addresses these problems with a structured layering system of Bronze (raw), Silver (cleansed), and Gold (business aggregates) tables that improves lineage, consistency, and analytical readiness. Integrating this model with AWS serverless services, including Amazon S3, Lambda, Glue, Step Functions, and Athena, provides elastic scalability, minimal operational overhead, and event-driven orchestration. Furthermore, Delta Lake brings ACID transactions and Change Data Capture (CDC) capabilities to S3-based storage, enabling efficient incremental updates without complete table rewrites. This paper presents a full architecture that combines these models to build modern serverless data pipelines, discusses implementation details, analyzes real-world case studies, and evaluates performance improvements. Observed benefits include up to 70% reduction in ETL latency, 60% cost savings, and near real-time analytics for high-volume enterprise data pipelines.
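
To make the layering and the incremental-update idea concrete, the following is a minimal sketch in plain Python, not the paper's implementation and not the Delta Lake API. It assumes a hypothetical change-record shape (`seq`, `op`, `id`, `region`, `amount`) and uses in-memory dictionaries to stand in for the Bronze, Silver, and Gold tables. In an actual Delta Lake pipeline, the Silver step would typically be expressed as a merge (upsert) against a Delta table rather than a dictionary update.

```python
from collections import defaultdict


def apply_cdc(silver, changes):
    """Apply CDC records to the Silver table in place, keyed by 'id'.

    Only the rows named in `changes` are touched, which mirrors the idea of
    incremental updates without rewriting the whole table.
    The record layout (seq/op/id/region/amount) is an assumption made
    for this illustration.
    """
    for change in sorted(changes, key=lambda c: c["seq"]):
        key = change["id"]
        if change["op"] == "delete":
            silver.pop(key, None)
        else:  # 'insert' and 'update' are both treated as upserts
            silver[key] = {"region": change["region"], "amount": change["amount"]}
    return silver


def build_gold(silver):
    """Derive a Gold-style business aggregate: total amount per region."""
    totals = defaultdict(float)
    for row in silver.values():
        totals[row["region"]] += row["amount"]
    return dict(totals)


# Bronze: raw change events, stored exactly as they arrived.
bronze_batch_1 = [
    {"seq": 1, "op": "insert", "id": 1, "region": "EU", "amount": 10.0},
    {"seq": 2, "op": "insert", "id": 2, "region": "US", "amount": 5.0},
    {"seq": 3, "op": "update", "id": 1, "region": "EU", "amount": 12.5},
]
bronze_batch_2 = [
    {"seq": 4, "op": "delete", "id": 2, "region": None, "amount": None},
    {"seq": 5, "op": "insert", "id": 3, "region": "EU", "amount": 2.5},
]

silver = apply_cdc({}, bronze_batch_1)
gold_after_batch_1 = build_gold(silver)

silver = apply_cdc(silver, bronze_batch_2)  # incremental: only ids 2 and 3 change
gold_after_batch_2 = build_gold(silver)
```

The second call to `apply_cdc` reuses the existing Silver state and touches only the keys present in the new batch, which is the property the abstract attributes to CDC-based incremental updates. The Gold aggregate is recomputed in full here for simplicity; a production pipeline might update it incrementally as well.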