Paper Title
Bridging the Language Gap: An Empirical Study of Bindings for Open Source Machine Learning Libraries Across Software Package Ecosystems
Authors
Abstract
Open source machine learning (ML) libraries enable developers to integrate advanced ML functionality into their own applications. However, popular ML libraries, such as TensorFlow, are not available natively in all programming languages and software package ecosystems. Hence, developers who wish to use an ML library that is not available in their programming language or ecosystem of choice may need to resort to a so-called binding library (or binding). Bindings provide support across programming languages and package ecosystems for reusing a host library. For example, the Keras .NET binding provides support for the Keras library in the NuGet (.NET) ecosystem even though the Keras library was written in Python. In this paper, we collect 2,436 cross-ecosystem bindings for 546 ML libraries across 13 software package ecosystems by using an approach called BindFind, which can automatically identify bindings and link them to their host libraries. Furthermore, we conduct an in-depth study of 133 cross-ecosystem bindings and their development for 40 popular open source ML libraries. Our findings reveal that the majority of ML library bindings are maintained by the community, with npm being the most popular ecosystem for these bindings. Our study also indicates that most bindings cover only a limited range of the host library's releases, often experience considerable delays in supporting new releases, and have widespread technical lag. Our findings highlight key factors that developers should consider when integrating bindings for ML libraries, and they open avenues for researchers to further investigate bindings in software package ecosystems.