Apache Cassandra for Data Science – Live Course Module
Total Duration: 20 Hours (3 Weeks)
Week 1: Foundations of Cassandra
Goal: Understand Cassandra basics, architecture, and data modeling.
Session 1 – Introduction & Use Cases (2 hrs)
- What is Apache Cassandra & why it matters in Data Science
- Key features (scalability, fault tolerance, high availability)
- Real-world case studies (IoT, recommendation systems, financial fraud detection)
- Hands-on: Setting up Cassandra locally or on the cloud
Session 2 – Cassandra Architecture (2 hrs)
- Peer-to-peer architecture
- Partitioning, replication, and consistency
- Write/read path explained
- Lab: Inspecting system tables & cluster metadata
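As a preview of the partitioning, replication, and consistency topics in Session 2, the following is a minimal sketch of a token ring in plain Python. It is not Cassandra's implementation: the `Ring` class, the node names, and the use of MD5 are illustrative assumptions (Cassandra's default partitioner is Murmur3, and real clusters usually assign many virtual nodes per machine). It shows how a partition key maps to a token, how replicas are picked by walking the ring, and why a quorum read plus a quorum write overlap on at least one replica.

```python
import bisect
import hashlib


def token(partition_key: str) -> int:
    """Map a partition key to a position on the ring.

    MD5 is a stand-in chosen because it ships with Python; Cassandra's
    default partitioner uses Murmur3 instead.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    """Hypothetical ring with one token per node (no virtual nodes)."""

    def __init__(self, nodes, replication_factor=3):
        if replication_factor > len(nodes):
            raise ValueError("replication factor exceeds number of nodes")
        self.rf = replication_factor
        self.entries = sorted((token(name), name) for name in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, partition_key: str):
        """Return the nodes holding a key: the first node whose token is
        >= the key's token, then the next nodes clockwise."""
        start = bisect.bisect_left(self.tokens, token(partition_key))
        count = len(self.entries)
        return [self.entries[(start + i) % count][1] for i in range(self.rf)]


def quorum(replication_factor: int) -> int:
    """Number of replicas that make up a majority."""
    return replication_factor // 2 + 1


def overlaps(read_replicas: int, write_replicas: int, rf: int) -> bool:
    """True when every read set must intersect every write set (R + W > RF)."""
    return read_replicas + write_replicas > rf


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("customer:42")
```

With a replication factor of 3, `quorum(3)` is 2, so reading and writing at quorum satisfies `overlaps(2, 2, 3)`, while reading and writing at a single replica does not.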
Session 3 – Data Modeling Basics (2 hrs)
- Query-driven modeling approach
- Primary keys, clustering columns, partitioning
- Hands-on: Designing schema for an e-commerce dataset
Week 2: Working with Data
Goal: Learn CQL, CRUD operations, and connecting Cassandra with Python.
Session 4 – Cassandra Query Language (CQL) (2 hrs)
- Basics of CQL syntax
- CRUD operations (Insert, Select, Update, Delete)
- Filtering, indexing, and query limitations
- Lab: Running sample queries on datasets
Session 5 – Advanced Data Modeling (2 hrs)
- Denormalization in Cassandra
- Time-series data modeling
- Best practices for analytical workloads
- Lab: Designing schema for clickstream data
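To make the time-series modeling topic in Session 5 concrete, here is a small sketch of one common pattern: bucketing a user's events by day so that no single partition grows without bound. The table name `clicks_by_user_day`, its columns, and the helper functions are assumptions made for illustration, not the schema used in the lab.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical clickstream table: (user_id, day) is the composite partition
# key, and event_time is a clustering column so the newest events come first.
CLICKS_TABLE = """
CREATE TABLE IF NOT EXISTS clicks_by_user_day (
    user_id    text,
    day        date,
    event_time timestamp,
    page       text,
    PRIMARY KEY ((user_id, day), event_time)
) WITH CLUSTERING ORDER BY (event_time DESC);
"""


def day_bucket(event_time: datetime) -> str:
    """Return the UTC calendar day (YYYY-MM-DD) an event belongs to."""
    if event_time.tzinfo is None:
        raise ValueError("event_time must be timezone-aware")
    return event_time.astimezone(timezone.utc).date().isoformat()


def partition_for(user_id: str, event_time: datetime):
    """Return the (user_id, day) pair that selects the event's partition."""
    return (user_id, day_bucket(event_time))


late_evening = datetime(2024, 3, 1, 23, 30, tzinfo=timezone(timedelta(hours=-5)))
key = partition_for("u-1001", late_evening)
```

Normalizing to UTC before bucketing is a design choice: it keeps writers in different time zones from disagreeing about which partition an event belongs to. A busier workload might need hourly buckets instead of daily ones.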
Session 6 – Python Integration (2 hrs)
- Connecting Cassandra with Python (cassandra-driver)
- Using Cassandra data with Pandas
- Simple EDA (Exploratory Data Analysis) on Cassandra datasets
- Hands-on: Analyzing customer transactions
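The Python integration topics in Session 6 might look roughly like the sketch below. It assumes the `cassandra-driver` package is installed, a node is reachable on `127.0.0.1`, and a keyspace `shop` contains a table `transactions_by_customer` with a text `customer_id` partition key plus `amount` and `category` columns. All of those names are hypothetical placeholders, not the course dataset.

```python
from collections import defaultdict


def fetch_transactions(customer_id, hosts=("127.0.0.1",), keyspace="shop"):
    """Read one customer's partition and return its rows as dictionaries."""
    # Imported here so the summary helper below also works on machines
    # where the driver is not installed.
    from cassandra.cluster import Cluster
    from cassandra.query import dict_factory

    cluster = Cluster(list(hosts))
    try:
        session = cluster.connect(keyspace)
        session.row_factory = dict_factory  # rows come back as dicts
        statement = session.prepare(
            "SELECT customer_id, tx_time, amount, category "
            "FROM transactions_by_customer WHERE customer_id = ?"
        )
        return list(session.execute(statement, [customer_id]))
    finally:
        cluster.shutdown()


def spend_by_category(rows):
    """Total the amount spent per category across a list of row dicts."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["category"]] += float(row["amount"])
    return dict(totals)


sample_rows = [
    {"category": "books", "amount": 10.0},
    {"category": "books", "amount": 2.5},
    {"category": "toys", "amount": 4.0},
]
summary = spend_by_category(sample_rows)
```

Because the rows are plain dictionaries, they can be passed straight to `pandas.DataFrame(rows)` for the exploratory analysis step. The query restricts on the partition key, which keeps the read on a single partition instead of scanning the whole table.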
Week 3: Analytics, Scaling & Projects
Goal: Apply Cassandra in real-world data science projects.
Session 7 – Cassandra + Spark for Data Science (2 hrs)
- Introduction to the Spark Cassandra Connector
- Running distributed queries
- Performing ML tasks using Spark + Cassandra
- Lab: Simple clustering/classification use case
Session 8 – Performance & Optimization (2 hrs)
- Read/write optimization strategies
- Compaction & caching
- Scaling Cassandra clusters
- Lab: Tuning queries & benchmarking
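For the benchmarking part of the Session 8 lab, a simple latency harness along these lines may be a useful starting point. It is a generic sketch and not a Cassandra tool: `benchmark` and its `run_query` argument are hypothetical names, and `run_query` stands for any zero-argument callable, such as a function that executes one prepared statement.

```python
import math
import statistics
import time


def benchmark(run_query, iterations=100):
    """Call run_query repeatedly and summarize latency in milliseconds."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")

    latencies_ms = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_query()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    latencies_ms.sort()
    # Nearest-rank 95th percentile: the smallest sample that is greater
    # than or equal to 95% of all samples.
    p95_index = math.ceil(0.95 * iterations) - 1
    return {
        "count": iterations,
        "p50_ms": statistics.median(latencies_ms),
        "p95_ms": latencies_ms[p95_index],
        "max_ms": latencies_ms[-1],
    }


# Stand-in workload so the sketch runs without a cluster.
result = benchmark(lambda: sum(range(1000)), iterations=20)
```

Reporting percentiles instead of only the mean matters when tuning, because a few slow requests can hide behind a healthy-looking average.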
Session 9 – Capstone Project (2 hrs)
- Use Case: Building a real-time analytics pipeline
- Store clickstream/user logs in Cassandra
- Query + analyze with Python/Spark
- Visualization & insights presentation
Session 10 – Wrap-Up & Q&A (1.5 hrs)
- Recap of all concepts
- Best practices for Cassandra in data science
- Next steps & advanced resources