Live Course Module: Docker Course for Data Engineering
Total Duration: 40 Hours (5 Weeks)
Week 1: Introduction to Containers & Docker Basics (Beginner)
Sessions: 2 × 3–4 hours
- Introduction to Containerization
  - Virtualization vs. containerization
  - Benefits for data engineering pipelines
- Docker Architecture Overview
  - Docker Engine, CLI, and Daemon
  - Images, containers, and registries
- Installing Docker
  - Docker Desktop / Docker Engine
  - Basic commands: docker run, docker ps, docker stop, docker rm
- Running Your First Container
  - Explore the container lifecycle
  - Hello World and Python container examples
- Hands-on Lab: Run multiple containers, inspect logs, and clean up containers
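The Week 1 lab steps above can be sketched as a short terminal session; the container name (demo) and image tags are illustrative, and the commands require a running Docker daemon:

```shell
# Run a one-off container; Docker pulls the image if it is not cached locally
docker run hello-world

# Start a named, detached Python container that stays alive for an hour
docker run -d --name demo python:3.12-slim sleep 3600

# List running containers, then inspect the demo container's logs
docker ps
docker logs demo

# Stop and remove the container, then confirm it is gone
docker stop demo
docker rm demo
docker ps -a
```

Note that docker run combines pull, create, and start in one step, and docker ps -a also lists stopped containers — useful when cleaning up after a lab.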
Week 2: Docker Images, Dockerfile & Basic Pipelines (Beginner → Intermediate)
Sessions: 2 × 3–4 hours
- Docker Images Basics
  - Pull, tag, inspect, and remove images
  - Docker Hub and public/private images
- Dockerfile Fundamentals
  - Instructions: FROM, RUN, COPY, CMD, EXPOSE
  - Building reproducible environments
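A minimal Dockerfile exercising the instructions listed above might look like the following sketch; the script name (etl.py) and port are assumptions for illustration:

```dockerfile
# Base image: an official slim Python build
FROM python:3.12-slim

# Copy and install dependencies first, so this layer is cached
# between code-only changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code
COPY etl.py .

# Document the port the app listens on (informational only)
EXPOSE 8080

# Default command when the container starts
CMD ["python", "etl.py"]
```

Ordering the dependency install before the code COPY is the standard layer-caching trick covered in the Image Optimization session.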
- Building Custom Images
  - Python, Pandas, and Spark images for data pipelines
- Image Optimization
  - Layering, caching, and reducing image size
- Hands-on Lab:
  - Build a custom Python ETL image
  - Run a data ingestion script inside the container
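Building and running the custom ETL image from the lab typically looks like this; the image name (py-etl) and mounted paths are hypothetical, and a Docker daemon is required:

```shell
# Build the image from the Dockerfile in the current directory
docker build -t py-etl:0.1 .

# Re-running the build reuses cached layers unless the dependency
# files changed — a quick way to see layer caching in action
docker build -t py-etl:0.1 .

# Run the ingestion script, mounting a local data directory into the container
docker run --rm -v "$(pwd)/data:/data" py-etl:0.1
```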
Week 3: Networking, Volumes & Docker Compose (Intermediate)
Sessions: 2 × 3–4 hours
- Docker Networking Basics
  - Bridge, host, and none networks
  - Container-to-container communication
- Persistent Storage
  - Volumes vs. bind mounts
  - Sharing and persisting data across containers
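Container-to-container communication and persistent volumes can be sketched as follows; the network, volume, and container names (etl-net, pgdata, db) are illustrative:

```shell
# Create a user-defined bridge network; containers attached to it
# can reach each other by container name
docker network create etl-net

# Create a named volume so the database files survive container removal
docker volume create pgdata

# Start PostgreSQL on the network, with the volume mounted at its data dir
docker run -d --name db --network etl-net \
  -e POSTGRES_PASSWORD=secret \
  -v pgdata:/var/lib/postgresql/data \
  postgres:16

# Another container on the same network resolves the host name "db"
docker run --rm --network etl-net postgres:16 pg_isready -h db
```

Removing and recreating the db container with the same volume mount demonstrates that the data persists independently of the container lifecycle.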
- Docker Compose Fundamentals
  - Multi-container orchestration with docker-compose.yml
  - Environment variables and secrets management
- Data Engineering Pipelines with Compose
  - Example: Kafka → Spark → PostgreSQL
  - Scaling services
- Hands-on Lab: Deploy a mini pipeline using Docker Compose
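A docker-compose.yml for a cut-down two-service pipeline (a Python ETL job feeding PostgreSQL — the full Kafka → Spark → PostgreSQL stack follows the same pattern) might look like this sketch; service names and variables are assumptions:

```yaml
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}   # injected from the shell or a .env file
    volumes:
      - pgdata:/var/lib/postgresql/data

  etl:
    build: .              # builds the local Dockerfile
    depends_on:
      - db
    environment:
      DB_HOST: db         # Compose DNS resolves the service name

volumes:
  pgdata:
```

docker compose up -d starts both services, and docker compose up --scale etl=3 illustrates the scaling topic from this session.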
Week 4: Logging, Monitoring, Security & Private Registries (Intermediate → Advanced)
Sessions: 2 × 3–4 hours
- Container Logging
  - Log drivers and logging best practices
  - Collecting logs for ETL processes
- Monitoring Containers
  - Introduction to Prometheus and Grafana
  - Monitoring container resource usage
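The logging and resource-monitoring commands from these sessions can be sketched as follows; the container and image names are illustrative:

```shell
# Run with the json-file log driver and rotation options, so long-running
# ETL logs do not fill the host disk
docker run -d --name etl \
  --log-driver json-file \
  --log-opt max-size=10m --log-opt max-file=3 \
  py-etl:0.1

# Tail the last 100 log lines and follow new output
docker logs --tail 100 -f etl

# Live CPU / memory / I/O usage for all running containers
docker stats
```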
- Security Best Practices
  - Securing images and scanning for vulnerabilities
  - User permissions, secrets, and environment management
- Private Registries
  - Pushing/pulling images to AWS ECR, Azure ACR, and private Docker Hub repositories
- Hands-on Lab: Secure and monitor a Spark + PostgreSQL container setup
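Pushing an image to a private registry such as AWS ECR typically follows the pattern below; the account ID, region, and repository name are placeholders, and the AWS CLI must be configured:

```shell
# Authenticate the Docker CLI against the ECR registry
aws ecr get-login-password --region us-east-1 \
  | docker login --username AWS --password-stdin \
    123456789012.dkr.ecr.us-east-1.amazonaws.com

# Tag the local image with the full registry path, then push it
docker tag py-etl:0.1 123456789012.dkr.ecr.us-east-1.amazonaws.com/py-etl:0.1
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/py-etl:0.1
```

The flow is the same for Azure ACR and private Docker Hub repositories — only the login command and registry hostname change.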
Week 5: CI/CD, Kubernetes Intro & Capstone Project (Advanced)
Sessions: 2 × 3–4 hours
- Docker in CI/CD Pipelines
  - Integrating Docker with Jenkins, GitHub Actions, and Airflow
- Introduction to Kubernetes for Data Engineers
  - Pods, Deployments, and scaling containers
  - When to move from Docker Compose to Kubernetes
- Capstone Project: Containerized ETL Pipeline
  - Airflow + Spark + PostgreSQL + MinIO
  - Multi-stage deployment using Docker images
- Project Review & Presentations
  - Peer review and instructor feedback
  - Best-practices recap and Q&A
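A minimal GitHub Actions workflow that builds and pushes an image on every push to main might look like this sketch; the secret names and image tag are assumptions for your own repository:

```yaml
name: build-image
on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Log in to Docker Hub using repository secrets
      - uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}
      # Build the Dockerfile in the repo root and push the result
      - uses: docker/build-push-action@v6
        with:
          push: true
          tags: example/py-etl:latest
```

The same build-tag-push sequence can be scripted in a Jenkins pipeline stage or triggered from an Airflow task.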
Key Learning Outcomes After 5 Weeks
- Master Docker architecture, containers, images, and Dockerfiles.
- Build and manage multi-container data pipelines using Docker Compose.
- Implement persistent storage, networking, logging, and monitoring.
- Apply container security best practices.
- Integrate Docker with CI/CD pipelines.
- Gain a foundational understanding of Kubernetes for scaling data workflows.
- Deploy a real-world containerized data engineering pipeline as a capstone project.