For the Big Data Mining issue of SIGKDD Explorations (December 2012), we selected four contributions that together present significant state-of-the-art research in Big Data mining and provide a broad overview of the field and its likely future directions.
- Scaling Big Data Mining Infrastructure: The Twitter Experience by Jimmy Lin and Dmitriy Ryaboy (Twitter, Inc.). This paper presents insights into Big Data mining infrastructures and the experience of doing analytics at Twitter. It shows that, given the current state of data mining tools, performing analytics is not straightforward. Most of the time is spent on preparatory work that precedes the application of data mining methods, and on turning preliminary models into robust solutions.
- Mining Heterogeneous Information Networks: A Structural Analysis Approach by Yizhou Sun (Northeastern University) and Jiawei Han (University of Illinois at Urbana-Champaign). This paper shows that mining heterogeneous information networks is a new and promising frontier in Big Data mining research. It treats interconnected, multi-typed data, including typical relational database data, as heterogeneous information networks. These semi-structured network models leverage the rich semantics of typed nodes and links and can uncover surprisingly rich knowledge from interconnected data.
- Big Graph Mining: Algorithms and Discoveries by U Kang and Christos Faloutsos (Carnegie Mellon University). This paper presents an overview of mining big graphs, focusing on the use of the Pegasus tool, and reports some findings from the Web Graph and Twitter social networks. The paper also suggests inspiring future research directions for big graph mining.
- Mining Large Streams of User Data for Personalized Recommendations by Xavier Amatriain (Netflix). This paper presents some lessons learned from the Netflix Prize and discusses the recommender and personalization techniques used at Netflix. It also discusses important recent problems and future research directions. Section 4 contains an interesting discussion of whether we need more data or better models to improve our learning methodology.