Live Course Module: Apache Flink for Data Engineering
Total Duration: 40 Hours (5 Weeks)
Week 1: Introduction to Real-Time Data Processing and Apache Flink
Total Time: 8 hours
- Introduction to Stream Processing (1 hr)
  - Batch vs. stream processing
  - Use cases of stream processing in data engineering
- Overview of Apache Flink (1.5 hrs)
  - What is Apache Flink?
  - Core features and architecture
  - Comparison with Spark Streaming and Kafka Streams
- Flink Ecosystem and Components (1.5 hrs)
  - Flink runtime, APIs, and connectors
  - JobManager and TaskManager
- Setting Up the Flink Environment (2 hrs – Lab)
  - Installing and configuring Flink locally or in the cloud (AWS/GCP)
  - Running sample streaming and batch jobs
- Hands-On & Assignment (2 hrs)
  - Simple WordCount example in Flink (see the sketch after this list)
  - Assignment: Run and monitor a Flink job using the web UI
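As a preview of the Week 1 hands-on, here is a minimal WordCount sketch in Java using the DataStream API. The hard-coded input lines are placeholders for illustration; the lab itself may read from a file or socket instead.

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class WordCount {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements("to be or not to be", "that is the question")
           // split each line into (word, 1) pairs
           .flatMap((String line, Collector<Tuple2<String, Integer>> out) -> {
               for (String word : line.toLowerCase().split("\\W+")) {
                   out.collect(Tuple2.of(word, 1));
               }
           })
           .returns(Types.TUPLE(Types.STRING, Types.INT)) // lambdas erase generic types
           .keyBy(pair -> pair.f0)  // group by word
           .sum(1)                  // running count per word
           .print();

        env.execute("Streaming WordCount");
    }
}
```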
Week 2: Flink Programming Model – DataStream API and DataSet API
Total Time: 8 hours
- Understanding the Flink Data Model (1 hr)
  - DataSet API vs. DataStream API
  - Execution model and transformations
- Flink DataStream API (2 hrs)
  - Basic operations: map, filter, keyBy, reduce (see the first sketch after this list)
  - Windows and triggers
- Flink DataSet API for Batch Processing (1.5 hrs)
  - Batch transformations and aggregations
  - Integrating batch with streaming pipelines
- Stateful Stream Processing (1.5 hrs)
  - Operator state and keyed state (see the second sketch after this list)
  - State backends and fault tolerance
- Hands-On & Assignment (2 hrs)
  - Build a data transformation pipeline using the DataStream API
  - Assignment: Create a real-time data aggregation job
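A minimal sketch of the basic DataStream operations named above (map, filter, keyBy, reduce), applied to hypothetical (sensorId, temperature) tuples invented for this example:

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class TransformDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements(
                Tuple2.of("sensor-1", 21.0),
                Tuple2.of("sensor-2", 35.5),
                Tuple2.of("sensor-1", 23.0))
           .filter(r -> r.f1 > 20.0)                      // filter: drop low readings
           .map(r -> Tuple2.of(r.f0, r.f1 * 1.8 + 32.0))  // map: Celsius -> Fahrenheit
           .returns(Types.TUPLE(Types.STRING, Types.DOUBLE))
           .keyBy(r -> r.f0)                              // keyBy: partition by sensor id
           .reduce((a, b) -> Tuple2.of(a.f0, Math.max(a.f1, b.f1))) // reduce: running max
           .print();

        env.execute("TransformDemo");
    }
}
```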
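And a keyed-state sketch for the stateful-processing topic: a KeyedProcessFunction that keeps a per-key running count in ValueState. The class name, state name, and record shape are assumptions made for illustration; it would be wired in with something like `stream.keyBy(r -> r.f0).process(new CountPerKey())`.

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Emits a running count per key, stored in keyed ValueState so it
// survives failures via Flink's checkpointing mechanism.
public class CountPerKey
        extends KeyedProcessFunction<String, Tuple2<String, Double>, Tuple2<String, Long>> {

    private transient ValueState<Long> count;

    @Override
    public void open(Configuration parameters) {
        count = getRuntimeContext().getState(new ValueStateDescriptor<>("count", Types.LONG));
    }

    @Override
    public void processElement(Tuple2<String, Double> value, Context ctx,
                               Collector<Tuple2<String, Long>> out) throws Exception {
        Long current = count.value();          // null on first element for this key
        long next = (current == null ? 0L : current) + 1;
        count.update(next);
        out.collect(Tuple2.of(value.f0, next));
    }
}
```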
Week 3: Time, Windows, and Event Processing in Flink
Total Time: 8 hours
- Event Time vs. Processing Time (1.5 hrs)
  - Understanding time semantics
  - Watermarks and lateness handling
- Windowing Concepts (2 hrs)
  - Tumbling, sliding, and session windows
  - Aggregations and custom windows
- Event Processing Patterns (1.5 hrs)
  - Handling out-of-order and late data
  - Complex Event Processing (CEP) overview
- Hands-On & Assignment (3 hrs)
  - Implement time-based window aggregations (see the first sketch after this list)
  - Assignment: Detect specific event patterns using Flink CEP (see the second sketch)
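A sketch of an event-time tumbling-window aggregation with bounded-out-of-orderness watermarks. The (key, value, timestamp) records and the 2-second lateness bound are invented for this example:

```java
import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class WindowDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // (key, value, event-time millis) — note the timestamps arrive out of order
        env.fromElements(
                Tuple3.of("a", 1, 1_000L),
                Tuple3.of("a", 2, 3_500L),
                Tuple3.of("a", 3, 2_000L))
           .assignTimestampsAndWatermarks(
               WatermarkStrategy
                   .<Tuple3<String, Integer, Long>>forBoundedOutOfOrderness(Duration.ofSeconds(2))
                   .withTimestampAssigner((event, ts) -> event.f2))
           .keyBy(event -> event.f0)
           .window(TumblingEventTimeWindows.of(Time.seconds(5)))
           .sum(1)  // sum the value field within each 5-second window
           .print();

        env.execute("WindowDemo");
    }
}
```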
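And a Flink CEP sketch for the pattern-detection assignment, assuming the flink-cep dependency is on the classpath. The login_fail events and the alert text are hypothetical stand-ins for whatever pattern the assignment targets:

```java
import java.util.List;
import java.util.Map;
import org.apache.flink.cep.CEP;
import org.apache.flink.cep.functions.PatternProcessFunction;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class CepDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<String> events = env.fromElements("login_fail", "login_fail", "login_ok");

        // Pattern: two failed logins, one immediately after the other.
        Pattern<String, ?> twoFailures = Pattern.<String>begin("first")
            .where(new SimpleCondition<String>() {
                @Override public boolean filter(String e) { return e.equals("login_fail"); }
            })
            .next("second")
            .where(new SimpleCondition<String>() {
                @Override public boolean filter(String e) { return e.equals("login_fail"); }
            });

        CEP.pattern(events, twoFailures)
           .inProcessingTime() // this toy stream carries no event-time timestamps
           .process(new PatternProcessFunction<String, String>() {
               @Override
               public void processMatch(Map<String, List<String>> match, Context ctx,
                                        Collector<String> out) {
                   out.collect("ALERT: two consecutive failed logins");
               }
           })
           .print();

        env.execute("CepDemo");
    }
}
```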
Week 4: Flink Connectors, State Management, and Checkpointing
Total Time: 8 hours
- Integrating Flink with Data Sources and Sinks (2 hrs)
  - Kafka, Kinesis, and filesystem connectors
  - Writing to databases and message queues
- State Management in Flink (1.5 hrs)
  - State storage and recovery
  - RocksDB state backend
- Fault Tolerance and Checkpointing (1.5 hrs)
  - Checkpoints and savepoints
  - Restart strategies and job recovery
- Hands-On & Assignment (3 hrs)
  - Build a streaming pipeline from Kafka → Flink → S3 (see the first sketch after this list)
  - Assignment: Implement checkpointing and verify fault recovery (see the second sketch)
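A sketch of the Kafka → Flink → S3 pipeline, assuming the flink-connector-kafka and flink-connector-files dependencies plus the S3 filesystem plugin are available. The broker address, topic, group id, and bucket path are placeholders:

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KafkaToS3 {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(10_000); // FileSink only finalizes files on checkpoints

        KafkaSource<String> source = KafkaSource.<String>builder()
            .setBootstrapServers("localhost:9092")          // placeholder broker
            .setTopics("events")                            // placeholder topic
            .setGroupId("flink-course")
            .setStartingOffsets(OffsetsInitializer.earliest())
            .setValueOnlyDeserializer(new SimpleStringSchema())
            .build();

        FileSink<String> sink = FileSink
            .forRowFormat(new Path("s3://my-bucket/flink-out"), // placeholder bucket
                          new SimpleStringEncoder<String>("UTF-8"))
            .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-source")
           .sinkTo(sink);

        env.execute("KafkaToS3");
    }
}
```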
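And a checkpointing and restart-strategy sketch for the fault-tolerance topic; the intervals and retry counts are illustrative, not recommended production values:

```java
import java.util.concurrent.TimeUnit;
import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointConfigDemo {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.enableCheckpointing(10_000);  // snapshot state every 10 s
        env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
        env.getCheckpointConfig().setMinPauseBetweenCheckpoints(5_000);
        env.getCheckpointConfig().setCheckpointTimeout(60_000);

        // On failure: retry up to 3 times, waiting 10 s between attempts.
        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(3, Time.of(10, TimeUnit.SECONDS)));
    }
}
```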
Week 5: Advanced Topics, Deployment, and Capstone Project
Total Time: 8 hours
- Flink SQL and Table API (2 hrs)
  - Writing SQL queries on streams (see the sketch after this section)
  - Integrating Flink SQL with Kafka and Hive
- Performance Optimization and Monitoring (1.5 hrs)
  - Parallelism and task-slot tuning
  - Metrics and dashboards
- Deployment and Integration (1.5 hrs)
  - Running Flink on Kubernetes, YARN, and AWS EMR
  - CI/CD and production best practices
- Capstone Project (3 hrs)
  - End-to-end real-time data pipeline: Kafka → Flink → PostgreSQL or a data lake
  - Data cleaning, transformation, and aggregation
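A minimal Flink SQL sketch for the Week 5 material: a Kafka-backed table with a watermark, queried with a tumbling-window aggregate. It assumes the flink-sql-connector-kafka dependency; the topic, broker, and column names are placeholders:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class SqlDemo {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
            EnvironmentSettings.newInstance().inStreamingMode().build());

        // Declare a Kafka-backed table with an event-time watermark.
        tEnv.executeSql(
            "CREATE TABLE orders (" +
            "  order_id STRING," +
            "  amount   DOUBLE," +
            "  ts       TIMESTAMP(3)," +
            "  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND" +
            ") WITH (" +
            "  'connector' = 'kafka'," +
            "  'topic' = 'orders'," +
            "  'properties.bootstrap.servers' = 'localhost:9092'," +
            "  'format' = 'json'," +
            "  'scan.startup.mode' = 'earliest-offset'" +
            ")");

        // Per-minute revenue using a tumbling event-time window.
        tEnv.executeSql(
            "SELECT window_start, SUM(amount) AS revenue " +
            "FROM TABLE(TUMBLE(TABLE orders, DESCRIPTOR(ts), INTERVAL '1' MINUTES)) " +
            "GROUP BY window_start, window_end").print();
    }
}
```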
🧩 Final Deliverables
- Mini Projects: 3 (DataStream API, window processing, Kafka integration)
- Capstone Project: 1 real-time data engineering pipeline
- Assessments: Weekly quizzes + project review
- Tools Covered: Apache Flink, Kafka, S3, Hive, RocksDB, Kubernetes