Home

I am a Professor at LTCI, Télécom ParisTech, where I head the Data, Intelligence and Graphs (DIG) Group, and a Scientific Collaborator at École Polytechnique. My research focuses on Machine Learning for Data Streams, Big Data Machine Learning and Artificial Intelligence. The problems I investigate are motivated by large-scale data, the Internet of Things (IoT), and Big Data Science.

I also co-lead the open source projects MOA (Massive Online Analysis) and Apache SAMOA (Scalable Advanced Massive Online Analysis).

What’s new

Machine Learning for Data Streams: with Practical Examples in MOA

  • Series: Adaptive Computation and Machine Learning series
  • Hardcover: 288 pages
  • Publisher: The MIT Press (March 2, 2018)
  • Language: English
  • ISBN-10: 0262037793
  • ISBN-13: 978-0262037792

Today many information sources—including sensor networks, financial markets, social networks, and healthcare monitoring—are so-called data streams, arriving sequentially and at high speed. Analysis must take place in real time, with partial data and without the capacity to store the entire data set. This book presents algorithms and techniques used in data stream mining and real-time analytics. Taking a hands-on approach, the book demonstrates the techniques using MOA (Massive Online Analysis), a popular, freely available open-source software framework, allowing readers to try out the techniques after reading the explanations.

Latest publications

  • Albert Bifet, Jiajin Zhang, Wei Fan, Cheng He, Jianfeng Zhang, Jianfeng Qian, Geoff Holmes, Bernhard Pfahringer: Extremely Fast Decision Tree Mining for Evolving Data Streams. KDD 2017: 1733-1742
  • Albert Bifet: Classifier Concept Drift Detection and the Illusion of Progress. ICAISC (2) 2017: 715-725
  • Heitor Murilo Gomes, Albert Bifet, Jesse Read, Jean Paul Barddal, Fabrício Enembreck, Bernhard Pfahringer, Geoff Holmes, Talel Abdessalem: Adaptive random forests for evolving data stream classification. Machine Learning, Springer, 2017.
  • Heitor Murilo Gomes, Jean Paul Barddal, Fabrício Enembreck, Albert Bifet: A Survey on Ensemble Learning for Data Stream Classification. ACM Comput. Surv. 50(2), Article 23 (March 2017), 36 pages.
  • Diego Marron, Jesse Read, Albert Bifet, Nacho Navarro: Data stream classification using random feature functions and novel method combinations. Journal of Systems and Software 127: 195-204 (2017)

Contact

Albert BIFET

LTCI, Télécom ParisTech
Data, Intelligence and Graphs Team
Office: C201-2
46 rue Barrault
75634 Paris Cedex 13, FRANCE

LIX, École Polytechnique
Bâtiment Alan Turing, Office 1103
1 rue Honoré d’Estienne d’Orves
91120 Palaiseau
FRANCE

University of Waikato
Department of Computer Science – Tari Rorohiko
Machine Learning Research Group
Private Bag 3105
Hamilton 3240, New Zealand

E-mail: albert at albertbifet dot com

Twitter: @abifet

LinkedIn: abifet

Books

Machine Learning for Data Streams: with Practical Examples in MOA

Today many information sources—including sensor networks, financial markets, social networks, and healthcare monitoring—are so-called data streams, arriving sequentially and at high speed. Analysis must take place in real time, with partial data and without the capacity to store the entire data set. This book presents algorithms and techniques used in data stream mining and real-time analytics. Taking a hands-on approach, the book demonstrates the techniques using MOA (Massive Online Analysis), a popular, freely available open-source software framework, allowing readers to try out the techniques after reading the explanations.

The book first offers a brief introduction to the topic, covering big data mining, basic methodologies for mining data streams, and a simple example of MOA. More detailed discussions follow, with chapters on sketching techniques, change, classification, ensemble methods, regression, clustering, and frequent pattern mining. Most of these chapters include exercises, an MOA-based lab session, or both. Finally, the book discusses the MOA software, covering the MOA graphical user interface, the command line, use of its API, and the development of new methods within MOA. The book will be an essential reference for readers who want to use data stream mining as a tool, researchers in innovation or data stream mining, and programmers who want to create new algorithms for MOA.

  • Series: Adaptive Computation and Machine Learning series
  • Hardcover: 288 pages
  • Publisher: The MIT Press (March 2, 2018)
  • Language: English
  • ISBN-10: 0262037793
  • ISBN-13: 978-0262037792

Adaptive Stream Mining: Pattern Learning and Mining from Evolving Data Streams

This book is a significant contribution to the subject of mining time-changing data streams and addresses the design of learning algorithms for this purpose. It introduces new contributions on several different aspects of the problem, identifying research opportunities and increasing the scope for applications. It also includes an in-depth study of stream mining and a theoretical analysis of proposed methods and algorithms.

The first section is concerned with the use of an adaptive sliding window algorithm (ADWIN). Because ADWIN has rigorous performance guarantees, using it in place of counters or accumulators offers the possibility of extending such guarantees to learning and mining algorithms not initially designed for drifting data. Testing with several methods, including Naïve Bayes, clustering, decision trees and ensemble methods, is discussed as well.
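
As a rough illustration of the adaptive sliding window idea, the sketch below keeps a window of recent values and drops the oldest ones whenever two sub-windows have means that differ by more than a Hoeffding-style threshold. It is a deliberately naive version, not the implementation found in MOA: the class name `SimpleAdwin`, the default `delta`, and the assumption that inputs lie in [0, 1] are all choices made for this example.

```python
import math
from collections import deque


class SimpleAdwin:
    """Naive adaptive sliding window (illustrative; not MOA's ADWIN class)."""

    def __init__(self, delta=0.002):
        self.delta = delta      # confidence parameter (assumed default)
        self.window = deque()   # recent values in [0, 1], oldest on the left

    def _cut_found(self):
        """Return True if some split of the window shows a significant change."""
        n = len(self.window)
        total = sum(self.window)
        delta_prime = self.delta / n
        head_sum = 0.0
        for n0, x in enumerate(self.window, start=1):
            if n0 == n:         # the newer sub-window must be non-empty
                break
            head_sum += x
            n1 = n - n0
            m = 1.0 / (1.0 / n0 + 1.0 / n1)   # harmonic-style size term
            eps_cut = math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))
            if abs(head_sum / n0 - (total - head_sum) / n1) > eps_cut:
                return True
        return False

    def add(self, x):
        """Insert a value; return True if the window was shrunk."""
        self.window.append(x)
        changed = False
        while len(self.window) > 1 and self._cut_found():
            self.window.popleft()   # forget the oldest value
            changed = True
        return changed

    def mean(self):
        return sum(self.window) / len(self.window)


# Usage: a stream whose mean jumps from 0 to 1 halfway through.
detector = SimpleAdwin()
flags = [detector.add(v) for v in [0.0] * 200 + [1.0] * 200]
```

This sketch stores every value and rescans the window on each insertion, so it is linear in the window length. Practical implementations compress the window into buckets so that memory and update time grow only logarithmically with the window length.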

The second part of the book describes a formal study of connected acyclic graphs, or ‘trees’, from the point of view of closure-based mining, presenting efficient algorithms for subtree testing and for mining ordered and unordered frequent closed trees.

Lastly, a general methodology to identify closed patterns in a data stream is outlined. This is applied to develop an incremental method, a sliding-window based method, and a method that mines closed trees adaptively from data streams. These are used to introduce classification methods for tree data streams.

  • Series: Frontiers in Artificial Intelligence and Applications (Book 207)
  • Hardcover: 224 pages
  • Publisher: IOS Press (February 15, 2010)
  • Language: English
  • ISBN-10: 1607500906
  • ISBN-13: 978-1607500902