Course Curriculum

  1. Free Preview
  2. Chapter 1: Introduction to Apache Beam and Data Processing (included in full purchase)
  3. Chapter 2: Stateful and Stateless Processing with Apache Beam (included in full purchase)
  4. Chapter 3: Handling Event Time, Windows, and Triggers (included in full purchase)
  5. Chapter 4: Building Pipelines with Apache Beam (included in full purchase)
  6. Chapter 5: Transformations and Coders in Apache Beam (included in full purchase)
  7. Chapter 6: Advanced Pipeline Optimization Techniques (included in full purchase)
  8. Chapter 7: Deploying Apache Beam Pipelines on Different Runners (included in full purchase)
  9. Chapter 8: Monitoring, Debugging, and Tuning Apache Beam Pipelines (included in full purchase)
  10. Chapter 9: Case Studies: Apache Beam in the Real World (included in full purchase)
  11. Index (included in full purchase)

About the Course

Building Data Pipelines Using Apache Beam provides a practical, production-focused guide to using Beam’s unified programming model to write processing logic once and run it across multiple runners without rewriting core code. The book begins with the fundamentals of distributed data processing and Beam’s core abstractions—PCollections, transforms, and pipeline design. You will then progress into stateful and stateless processing and event-time semantics—windows, triggers, watermarks, state, and timers—building the mental models required to reason about correctness at scale. From there, the book moves into advanced transformations, coders, and optimization techniques to help you improve performance, control costs, and ensure reliability. In the later chapters, you will learn how to deploy pipelines across runners such as Dataflow, Flink, and Spark, monitor and debug production workloads, and apply best practices drawn from real-world case studies. By the end of the book, you will be able to design, deploy, and operate robust, portable, production-grade data pipelines with confidence.
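To give a flavor of the event-time windowing the book covers, here is a minimal plain-Python sketch of how a fixed (tumbling) window is assigned from an event timestamp. This is a conceptual illustration only—the function name and parameters are hypothetical, not Beam's actual API:

```python
# Conceptual sketch of fixed (tumbling) window assignment by event time.
# This illustrates the idea behind Beam's fixed windows; it is NOT the
# apache_beam API -- the function name and parameters are hypothetical.

def fixed_window(event_time_s: int, size_s: int) -> tuple[int, int]:
    """Return the [start, end) window that an event timestamp falls into."""
    start = event_time_s - (event_time_s % size_s)
    return (start, start + size_s)

# An event at t=65s with 60-second windows lands in the [60, 120) window.
print(fixed_window(65, 60))  # -> (60, 120)
```

In Beam itself, the equivalent is applying a windowing transform to a PCollection, e.g. `beam.WindowInto(beam.window.FixedWindows(60))` in the Python SDK, after which grouping and aggregation operate per window rather than over the whole unbounded stream.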

About the Author

Nuzhi Meyen is a fintech entrepreneur, data scientist, AI practitioner, and the Co-Founder and CEO of Helios P2P. He builds production-grade AI, analytics, and blockchain systems for lending and credit risk. With advanced degrees and strong community contributions, he bridges theory and practice to deliver scalable, real-world financial technology solutions.