The course will present algorithms for data analysis and mining, with a focus on mining massive datasets. It will cover both the practical and the theoretical aspects of data mining. During the course, students will become familiar with the most successful algorithms for classification, clustering, frequent itemset mining, and other machine learning and data mining techniques. Students will also work on a small project in which they implement some of these algorithms and analyze real-world data.
Prerequisites:
– Foundations of algorithms and data structures
– Knowledge of the Java programming language
– Good programming skills
Lecture Slides
- 1. Introduction to Big Data
- 2. Data Pre-processing and Classifier Evaluation
- 3. Apache Spark
- 4. Classification
- 5. Apache Spark ML Lab
- 6. Clustering
- 7. Frequent Pattern Mining
- 8. Apache Spark ML Lab 2
- 9. Apache Spark ML Lab 3
- 9 November 2016: no new lab session