Course taught in the second year of the Data and Knowledge Master's program at Université Paris-Saclay, 2018-2019.
This module presents concepts, architectures and algorithms for processing and analyzing IoT big data at very large scale in distributed settings. The following topics will be covered:
- Apache Hadoop
- Apache Spark
- Apache Flink
- Apache Beam / Google Cloud Dataflow
- Apache Storm
- Lambda and Kappa Architectures
Labs are a major focus of this class, so that students gain hands-on experience with several existing systems and understand their respective advantages. The architecture of each distributed computing system will be discussed in detail during lectures.
Evaluation:
- 1/3 Lab Assignments
- 2/3 Final Test
Lecture Slides
- 1. Introduction to MapReduce Slides
- 2. MapReduce Lab Slides (due 3rd October 2018) Submission
- 3. Apache Spark Slides
- 4. Apache Spark Lab Slides (due 17th October 2018) Submission
- 5. Apache Spark 2 Slides 1 – Slides 2
- 6. Apache Kafka, Storm and Samza Slides
- 7. Apache Flink Slides
- 8. Apache Spark Lab 2 Notebook 1 – Notebook 2 (due 8th November 2018) Submission
- 9. Google Cloud Dataflow and Apache Beam Slides 1 – Slides 2
- 10. TensorFlow Slides – Keras Slides