Paper title
Synthetic Data for Feature Selection
Paper authors
Paper abstract
Feature selection is an important and active field of research in machine learning and data science. Our goal in this paper is to propose a collection of synthetic datasets that can be used as a common reference point for feature selection algorithms. Synthetic datasets allow for precise evaluation of the selected features and for control of the data parameters, enabling comprehensive assessment. The proposed datasets are based on applications from electronics in order to mimic real-life scenarios. To illustrate the utility of the proposed data, we employ one of the datasets to test several popular feature selection algorithms. The datasets are publicly available on GitHub and can be used by researchers to evaluate feature selection algorithms.