This course presents algorithms for data analysis and mining, with an emphasis on massive datasets. It covers both practical and theoretical aspects of data mining. Students will become familiar with the most successful algorithms for classification, clustering, frequent itemset mining, and other machine learning and data mining techniques. They will also work on a small project in which they analyze real-world data.
Prerequisites:
– Foundations of algorithms and data structures
– Knowledge of the Java programming language
– Good programming skills
Instructors: Mostafa H. Chehreghani (Télécom ParisTech) and Albert Bifet (Télécom ParisTech)
Lecture Slides
- 1. Introduction to Big Data Slides
- 2. Clustering Slides, BIRCH Slides
- 3. Classifier Evaluation Slides
- 4. Classification Slides
- 5. Data Preparation Slides
- 6. Apache Spark Slides
- 7. Apache Spark ML Lab Slides
- 8. Link Analysis: PageRank and HITS Slides
- 9. Community Structure in Networks Slides
- 10. Data Stream Mining Slides
- 11. Apache Spark ML Lab 2 Slides
- 12. Apache Spark ML Lab 3 Slides – Submission (10/11/2017)