In this paper we present the Vertical Hoeffding Tree (VHT), the first distributed streaming algorithm for learning decision trees. It features a novel way of distributing decision trees via vertical parallelism. The algorithm is implemented on top of Apache SAMOA, a platform for mining big data streams, and thus able to run on real-world clusters. Our experiments to study the accuracy and throughput of VHT prove its ability to scale while attaining superior performance compared to sequential decision trees.
Nicolas Kourtellis, Gianmarco De Francisci Morales, Albert Bifet, Arinto Murdopo: VHT: Vertical hoeffding tree. BigData 2016: 915-922