Evolving Data Stream Classification and the Illusion of Progress


Data is being generated in real time in ever-increasing quantities, and the distribution generating this data may change and evolve over time. In a paper presented at ECML-PKDD 2013, titled “Pitfalls in benchmarking data stream classification and how to avoid them”, we show that classifying data streams has an important temporal component, which the current evaluation of data stream classifiers does not take into account. In the paper we show how a very simple classifier that does exploit this temporal component, the non-change classifier, which always predicts the last class label it has seen, can outperform current state-of-the-art classifiers on some real-world datasets. We propose evaluating data stream classifiers with this temporal component in mind, using a new evaluation measure that provides a more accurate gauge of classifier performance.
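To make the idea concrete, here is a minimal Python sketch of the non-change classifier evaluated in a test-then-train fashion: each label is predicted before it is revealed, and the prediction is simply the previous label. The function names, the toy stream, and the `baseline_normalized_score` helper are illustrative assumptions and are not taken from the paper. In particular, the helper is only one plausible way to rescale accuracy against the non-change baseline and should not be read as the exact measure the paper proposes.

```python
from typing import Hashable, List, Optional, Sequence


def no_change_predictions(
    labels: Sequence[Hashable], default: Optional[Hashable] = None
) -> List[Optional[Hashable]]:
    """Predict every label as the most recently observed label."""
    predictions: List[Optional[Hashable]] = []
    last_seen = default
    for y in labels:
        predictions.append(last_seen)  # predict before the true label is revealed
        last_seen = y                  # then remember the true label
    return predictions


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Optional[Hashable]]) -> float:
    """Fraction of positions where the prediction matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty stream")
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


def baseline_normalized_score(acc_classifier: float, acc_baseline: float) -> float:
    """Hypothetical helper: rescale accuracy relative to a baseline.

    Returns 0 when the classifier only matches the baseline, 1 when it is
    perfect, and a negative value when it is worse than the baseline.
    """
    if acc_baseline >= 1.0:
        raise ValueError("baseline is already perfect; score is undefined")
    return (acc_classifier - acc_baseline) / (1.0 - acc_baseline)


# Toy stream with strong temporal dependence: labels arrive in runs.
stream = ["a", "a", "a", "a", "b", "b", "b", "b"]

# A static majority-class predictor that always answers "a".
majority_acc = accuracy(stream, ["a"] * len(stream))

# The non-change classifier only errs at the start and when the label switches.
no_change_acc = accuracy(stream, no_change_predictions(stream))

print("majority-class accuracy:", majority_acc)
print("non-change accuracy:", no_change_acc)
```

On this toy stream the non-change classifier is wrong only on the first instance and at the single point where the label switches, so it beats the static majority-class predictor without learning anything about the features. A classifier whose accuracy merely equals the non-change accuracy would receive a score of 0 from `baseline_normalized_score`, which is the kind of signal that plain accuracy hides.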
