Title
Serverless inferencing on Kubernetes
Authors
Abstract
Organisations are increasingly putting machine learning models into production at scale. The growing popularity of serverless scale-to-zero paradigms presents an opportunity to reduce infrastructure costs when deploying machine learning models, since many models may not be in continuous use. We discuss the KFServing project, which builds on the KNative serverless paradigm to provide a serverless machine learning inference solution with a consistent and simple interface for data scientists to deploy their models. We show how it addresses the challenges of autoscaling GPU-based inference and discuss some of the lessons learnt from using it in production.
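To illustrate the kind of interface the abstract refers to, a KFServing deployment is expressed as a single Kubernetes custom resource. The sketch below is a minimal example in the style of the project's published samples; the model name and `storageUri` are illustrative placeholders, not artifacts from this paper.

```yaml
# A minimal KFServing InferenceService manifest (illustrative).
# The data scientist specifies only the model framework and the
# location of the trained model; KFServing wires up serving,
# routing, and KNative scale-to-zero autoscaling.
apiVersion: serving.kubeflow.org/v1alpha2
kind: InferenceService
metadata:
  name: sklearn-iris          # hypothetical service name
spec:
  default:
    predictor:
      sklearn:
        storageUri: "gs://kfserving-samples/models/sklearn/iris"
```

Applying this manifest with `kubectl apply -f` creates an HTTP prediction endpoint that scales down to zero replicas when idle, which is the cost-mitigation mechanism the abstract describes.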