Paper title
MigrOS: Transparent Operating Systems Live Migration Support for Containerised RDMA-applications
Paper authors
Paper abstract
Major data centre providers are introducing RDMA-based networks for their tenants, as well as for operating the underlying infrastructure. Compared with traditional socket-based network stacks, RDMA-based networks offer higher throughput, lower latency and reduced CPU overhead. However, transparent checkpoint and migration operations become much more difficult. The key reason is that the OS is removed from the critical path of communication. As a result, some of the communication state resides in the NIC hardware and is no longer under the direct control of the OS. This control includes, in particular, support for virtualising communication, which is needed for live migration of communication partners. In this paper, we propose the basic principles required to implement a migration-capable RDMA-based network. We recommend some changes at the software level and small changes at the hardware level. As a proof of concept, we integrate the proposed changes into SoftRoCE, an open-source kernel-level implementation of the RoCE protocol. We claim that these changes introduce no runtime overhead when no migration takes place. Finally, we develop a proof-of-concept implementation for migrating containerised applications that use RDMA-based networks.