Paper title

stream-learn -- open-source Python library for difficult data stream batch analysis

Authors

Paweł Ksieniewicz, Paweł Zyblewski

Abstract

stream-learn is a Python package compatible with scikit-learn and developed for the analysis of drifting and imbalanced data streams. Its main component is a stream generator, which can produce a synthetic data stream incorporating any of the three main concept drift types (i.e. sudden, gradual and incremental drift) in their recurring or non-recurring versions. The package supports experiments that follow established evaluation methodologies (i.e. Test-Then-Train and Prequential). In addition, estimators adapted for data stream classification have been implemented, including both simple classifiers and state-of-the-art chunk-based and online classifier ensembles. To improve computational efficiency, the package uses its own implementations of prediction metrics for imbalanced binary classification tasks.
