FAIRification of MLC data

Paper title

FAIRification of MLC data

Authors

Kostovska, Ana, Bogatinovski, Jasmin, Treven, Andrej, Džeroski, Sašo, Kocev, Dragi, Panov, Panče

Abstract

The multi-label classification (MLC) task has been receiving increasing interest from the machine learning (ML) community, as evidenced by the growing number of papers and methods in the literature. Ensuring proper, correct, robust, and trustworthy benchmarking is therefore of utmost importance for the further development of the field. We believe that this can be achieved by adhering to recently emerged data management standards, such as the FAIR (Findable, Accessible, Interoperable, and Reusable) and TRUST (Transparency, Responsibility, User focus, Sustainability, and Technology) principles. To FAIRify MLC datasets, we introduce an ontology-based online catalogue of MLC datasets that follows these principles. The catalogue extensively describes many MLC datasets with comprehensible meta-features, MLC-specific semantic descriptions, and various kinds of data provenance information. It is described in detail in our recent publication in Nature Scientific Reports (Kostovska & Bogatinovski et al.) and is available at: http://semantichub.ijs.si/MLCdatasets. In addition, we provide an ontology-based system for easy access to and querying of performance/benchmark data obtained from a comprehensive MLC benchmark study. The system is available at: http://semantichub.ijs.si/MLCbenchmark.
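The catalogue characterises each dataset with meta-features. As an illustration only, the Python sketch below computes three meta-features commonly used to summarise MLC data: label cardinality (mean number of relevant labels per example), label density (cardinality divided by the number of labels), and the number of distinct label sets. The choice of these features and the function names are assumptions made for this example; the abstract does not state which meta-features the catalogue actually records or how it computes them.

```python
from typing import Sequence

# Hypothetical helpers: the label matrix Y is a list of 0/1 rows,
# one row per example, one column per label.

def label_cardinality(Y: Sequence[Sequence[int]]) -> float:
    """Mean number of relevant (1-valued) labels per example."""
    if not Y:
        raise ValueError("label matrix is empty")
    return sum(sum(row) for row in Y) / len(Y)

def label_density(Y: Sequence[Sequence[int]]) -> float:
    """Label cardinality normalised by the total number of labels."""
    n_labels = len(Y[0])
    return label_cardinality(Y) / n_labels

def distinct_labelsets(Y: Sequence[Sequence[int]]) -> int:
    """Number of unique label combinations present in the data."""
    return len({tuple(row) for row in Y})

if __name__ == "__main__":
    # Toy data: 4 examples, 3 labels.
    Y = [[1, 0, 1], [0, 1, 0], [1, 0, 1], [0, 0, 0]]
    print(label_cardinality(Y))   # 1.25
    print(distinct_labelsets(Y))  # 3
```

On the toy matrix, the four examples carry 5 relevant labels in total, so cardinality is 1.25 and density is 1.25 / 3.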
