Live Course Module: Kubernetes Course for Data Engineering
Total Duration: 40 Hours (5 Weeks)
Week 1: Introduction to Kubernetes & Container Orchestration (Beginner)
Sessions: 2 × 3–4 hours
-
0:00 – 0:45: Introduction to Container Orchestration
-
Why orchestration is needed in Data Engineering
-
Kubernetes vs Docker Swarm vs traditional methods
-
-
0:45 – 1:30: Kubernetes Architecture Overview
-
Control plane node (formerly called the "master" node), worker nodes
-
Components: API Server, Controller Manager, Scheduler, etcd, kubelet, kube-proxy
-
-
1:30 – 2:15: Installing & Configuring Kubernetes
-
Minikube / Kind / K3s setup
-
kubectl installation and configuration
-
-
2:15 – 3:00: Kubernetes Objects Basics
-
Pods, ReplicaSets, Deployments
-
Labels and annotations
-
-
3:00 – 4:00: Hands-on Lab
-
Deploy first Pod and Deployment
-
Explore kubectl commands
-
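Lab sketch (illustrative only): one possible shape for the "first Deployment" exercise, written as a small Python script that prints a Deployment manifest as JSON, which kubectl accepts alongside YAML. The name hello-web, the image nginx:1.27, and the helper make_deployment are assumptions chosen for this example, not part of the course material.

```python
import json


def make_deployment(name: str, image: str, replicas: int = 2) -> dict:
    """Build an apps/v1 Deployment manifest as a plain dict (hypothetical helper)."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the labels on the Pod template.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": 80}],
                        }
                    ]
                },
            },
        },
    }


# Assumed example values; any small web image would do.
deployment = make_deployment("hello-web", "nginx:1.27", replicas=2)
print(json.dumps(deployment, indent=2))
```

Saving the output to a file such as deployment.json and running kubectl apply -f deployment.json should create the Deployment; kubectl get pods -l app=hello-web then lists the Pods it manages.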
Week 2: Pods, Services, ConfigMaps & Secrets (Beginner → Intermediate)
Sessions: 2 × 3–4 hours
-
0:00 – 0:45: Understanding Pods in Depth
-
Pod lifecycle, multi-container pods
-
Init containers and sidecar containers
-
-
0:45 – 1:30: Services and Networking
-
ClusterIP, NodePort, LoadBalancer, Ingress basics
-
Service discovery and DNS
-
-
1:30 – 2:15: ConfigMaps and Secrets
-
Externalizing configuration
-
Managing sensitive data securely
-
-
2:15 – 3:00: Persistent Storage
-
Volumes, PersistentVolume (PV) and PersistentVolumeClaim (PVC)
-
Storage classes for dynamic provisioning
-
-
3:00 – 4:00: Hands-on Lab
-
Deploy a Pod with ConfigMaps and Secrets
-
Connect Pod to persistent storage
-
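Lab sketch (illustrative only): the Week 2 lab topics can be combined in one set of manifests. The script below builds a ConfigMap, a Secret, a PersistentVolumeClaim, and a Pod that consumes all three, then prints them as a single JSON List for kubectl. All object names, keys, the busybox:1.36 image, and the placeholder password are assumptions for this example. The PVC also assumes the cluster has a default StorageClass, as Minikube usually does.

```python
import base64
import json


def b64(value: str) -> str:
    """Secret 'data' values must be base64-encoded (encoding, not encryption)."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


config_map = {
    "apiVersion": "v1",
    "kind": "ConfigMap",
    "metadata": {"name": "etl-config"},
    "data": {"BATCH_SIZE": "500", "TARGET_TABLE": "events"},
}

secret = {
    "apiVersion": "v1",
    "kind": "Secret",
    "metadata": {"name": "etl-secret"},
    "type": "Opaque",
    "data": {"DB_PASSWORD": b64("change-me")},  # placeholder value
}

pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "etl-data"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": "1Gi"}},
    },
}

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "etl-worker"},
    "spec": {
        "containers": [
            {
                "name": "worker",
                "image": "busybox:1.36",
                "command": ["sh", "-c", "env | grep BATCH_SIZE; sleep 3600"],
                # Every ConfigMap key becomes an environment variable.
                "envFrom": [{"configMapRef": {"name": "etl-config"}}],
                # A single Secret key is injected explicitly.
                "env": [
                    {
                        "name": "DB_PASSWORD",
                        "valueFrom": {
                            "secretKeyRef": {"name": "etl-secret", "key": "DB_PASSWORD"}
                        },
                    }
                ],
                "volumeMounts": [{"name": "data", "mountPath": "/data"}],
            }
        ],
        "volumes": [
            {"name": "data", "persistentVolumeClaim": {"claimName": "etl-data"}}
        ],
    },
}

manifest = {"apiVersion": "v1", "kind": "List", "items": [config_map, secret, pvc, pod]}
print(json.dumps(manifest, indent=2))
```

After applying the output, kubectl logs etl-worker should show the BATCH_SIZE value taken from the ConfigMap, and kubectl get pvc etl-data shows whether the claim was bound.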
Week 3: Deployments, Scaling & Scheduling (Intermediate)
Sessions: 2 × 3–4 hours
-
0:00 – 0:45: Deployments & ReplicaSets
-
Rolling updates, rollbacks, scaling strategies
-
-
0:45 – 1:30: Horizontal & Vertical Pod Autoscaling
-
HPA, VPA basics
-
Resource requests & limits
-
-
1:30 – 2:15: Namespaces & Resource Quotas
-
Organizing clusters for multiple teams
-
Limiting resource usage
-
-
2:15 – 3:00: Scheduling & Affinity Rules
-
NodeSelector, Node Affinity, Pod Affinity/Anti-affinity
-
Taints and Tolerations
-
-
3:00 – 4:00: Hands-on Lab
-
Deploy a scalable multi-Pod application
-
Implement HPA and resource limits
-
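Lab sketch (illustrative only): the HPA topic rests on a simple scaling rule documented for the Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance of 1.0 (0.1 by default). The script below reproduces that rule and builds an autoscaling/v2 manifest. The function names, the target Deployment name etl-api, and the numeric values are assumptions for this example; the real controller also handles missing metrics, not-yet-ready Pods, and stabilization windows, which this sketch ignores.

```python
import json
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified version of the HPA scaling rule (hypothetical helper)."""
    ratio = current_metric / target_metric
    # Within the tolerance band the autoscaler leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # The result is clamped to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


def make_hpa(deployment_name, min_replicas=1, max_replicas=10, cpu_target=60):
    """Build an autoscaling/v2 HorizontalPodAutoscaler targeting a Deployment."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": f"{deployment_name}-hpa"},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment_name,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [
                {
                    "type": "Resource",
                    "resource": {
                        "name": "cpu",
                        "target": {"type": "Utilization", "averageUtilization": cpu_target},
                    },
                }
            ],
        },
    }


# CPU utilization is measured relative to the container's CPU request,
# so the target Deployment's containers need a "resources" section like this.
container_resources = {
    "requests": {"cpu": "250m", "memory": "256Mi"},
    "limits": {"cpu": "500m", "memory": "512Mi"},
}

hpa = make_hpa("etl-api", min_replicas=2, max_replicas=10, cpu_target=60)
print(json.dumps(hpa, indent=2))
print(desired_replicas(4, current_metric=90, target_metric=60))
```

With 4 replicas averaging 90% CPU against a 60% target, the rule asks for 6 replicas; at 63% it stays at 4 because the ratio falls inside the tolerance band.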
Week 4: Helm, Config Management & Monitoring (Intermediate → Advanced)
Sessions: 2 × 3–4 hours
-
0:00 – 0:45: Introduction to Helm
-
Helm charts, releases, repositories
-
Deploying applications via Helm
-
-
0:45 – 1:30: Advanced Configuration Management
-
Secrets management with KMS or HashiCorp Vault
-
Using ConfigMaps for dynamic configuration
-
-
1:30 – 2:15: Monitoring & Logging
-
Prometheus & Grafana for Kubernetes
-
EFK (Elasticsearch, Fluentd, Kibana) or PLG (Promtail, Loki, Grafana) stack for logs
-
-
2:15 – 3:00: Security Best Practices
-
RBAC, Service Accounts
-
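Pod Security Standards and Pod Security Admission (the replacement for PodSecurityPolicy, which was removed in Kubernetes 1.25), Network Policies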
-
-
3:00 – 4:00: Hands-on Lab
-
Deploy a Helm chart for a data application (e.g., Spark)
-
Implement monitoring and RBAC
-
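Lab sketch (illustrative only): for the RBAC part of the Week 4 lab, a least-privilege setup might pair a ServiceAccount with a namespaced Role and RoleBinding. The script below generates the three objects as JSON. The namespace data-apps, the account name spark-monitor, the Role name pod-reader, and the helper read_only_pod_access are assumptions for this example, and the namespace is assumed to exist already.

```python
import json

RBAC_GROUP = "rbac.authorization.k8s.io"


def read_only_pod_access(namespace: str, service_account: str) -> list:
    """Return a ServiceAccount, Role and RoleBinding granting read-only Pod access."""
    account = {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": service_account, "namespace": namespace},
    }
    role = {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "Role",
        "metadata": {"name": "pod-reader", "namespace": namespace},
        "rules": [
            {
                "apiGroups": [""],  # "" is the core API group
                "resources": ["pods", "pods/log"],
                "verbs": ["get", "list", "watch"],
            }
        ],
    }
    binding = {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": "pod-reader-binding", "namespace": namespace},
        "subjects": [
            {"kind": "ServiceAccount", "name": service_account, "namespace": namespace}
        ],
        "roleRef": {"apiGroup": RBAC_GROUP, "kind": "Role", "name": role["metadata"]["name"]},
    }
    return [account, role, binding]


rbac_objects = read_only_pod_access("data-apps", "spark-monitor")
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": rbac_objects}, indent=2))
```

Once applied, kubectl auth can-i list pods --as=system:serviceaccount:data-apps:spark-monitor -n data-apps should answer yes, while the same check for delete should answer no.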
Week 5: Kubernetes for Data Engineering Pipelines (Advanced)
Sessions: 2 × 3–4 hours
-
0:00 – 0:45: Kubernetes for Data Engineering Workloads
-
Running Spark, Flink, Kafka on Kubernetes
-
StatefulSets for databases
-
-
0:45 – 1:30: CI/CD Pipelines with Kubernetes
-
Integrate with Jenkins, GitHub Actions
-
Deploy containerized ETL pipelines
-
-
1:30 – 2:15: Real-World Data Engineering Project
-
Containerized Spark ETL with PostgreSQL or MinIO
-
Deploy using Deployments, Services, PVCs
-
-
2:15 – 3:00: Scaling & High Availability
-
Cluster autoscaling
-
Multi-node deployments for fault tolerance
-
-
3:00 – 4:00: Capstone Project Review & Q&A
-
Presentations and feedback
-
Best practices recap
-
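Project sketch (illustrative only): the "StatefulSets for databases" topic and the PostgreSQL option in the project could be approached with a headless Service plus a StatefulSet that requests storage through volumeClaimTemplates. The script below prints both as JSON. The name etl-postgres, the image postgres:16, the storage size, and the Secret etl-postgres-secret (assumed to exist already, with a key named password) are assumptions for this example, not a production-ready configuration.

```python
import json


def postgres_statefulset(name: str = "etl-postgres", replicas: int = 1,
                         storage: str = "5Gi") -> list:
    """Return a headless Service and a StatefulSet for a single PostgreSQL instance."""
    labels = {"app": name}
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        # clusterIP "None" makes the Service headless, giving each Pod a stable DNS name.
        "spec": {"clusterIP": "None", "selector": labels, "ports": [{"port": 5432}]},
    }
    stateful_set = {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": "postgres",
                            "image": "postgres:16",
                            "ports": [{"containerPort": 5432}],
                            "env": [
                                {
                                    "name": "POSTGRES_PASSWORD",
                                    "valueFrom": {
                                        "secretKeyRef": {
                                            "name": "etl-postgres-secret",  # assumed to exist
                                            "key": "password",
                                        }
                                    },
                                },
                                # Keep the data directory in a subfolder of the mount.
                                {"name": "PGDATA", "value": "/var/lib/postgresql/data/pgdata"},
                            ],
                            "volumeMounts": [
                                {"name": "data", "mountPath": "/var/lib/postgresql/data"}
                            ],
                        }
                    ]
                },
            },
            # Each replica gets its own PVC, created from this template.
            "volumeClaimTemplates": [
                {
                    "metadata": {"name": "data"},
                    "spec": {
                        "accessModes": ["ReadWriteOnce"],
                        "resources": {"requests": {"storage": storage}},
                    },
                }
            ],
        },
    }
    return [service, stateful_set]


service, stateful_set = postgres_statefulset()
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [service, stateful_set]}, indent=2))
```

Unlike a Deployment, the StatefulSet gives its Pod a stable name (etl-postgres-0) and keeps the same PVC across restarts, which is why it suits databases.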
Key Learning Outcomes
-
Understand Kubernetes architecture and key components.
-
Deploy, scale, and manage containerized data applications.
-
Manage configuration, secrets, and persistent storage.
-
Implement monitoring, logging, and security best practices.
-
Deploy real-world data engineering pipelines (Spark, Kafka, PostgreSQL, MinIO) on Kubernetes.
-
Integrate Kubernetes workloads into CI/CD pipelines and prepare for production deployment.