Course taught in the CES Data Scientist 2017-2018 program.


This module presents concepts, architectures, and algorithms for big data processing and analytics at very large scale in distributed settings. The following topics will be covered:

  • Apache Hadoop
  • Apache Spark
  • Data Stream Processing

The class places a strong focus on labs, so that students can gain hands-on experience with different existing systems and understand their respective advantages. The architecture of each distributed computing system covered will be discussed in detail during lectures.


  • Lab assignments are due June 15, 2018. Submit the dbc and html files through Moodle or this link.

Lecture Slides