SESAME: Software defined Enclaves to Secure Inference Accelerators with Multi-tenant Execution

Title

SESAME: Software defined Enclaves to Secure Inference Accelerators with Multi-tenant Execution

Authors

Sarbartha Banerjee, Prakash Ramrakhyani, Shijia Wei, Mohit Tiwari

Abstract

Hardware enclaves that target complex CPU designs compromise both security and performance. Programs have little control over the micro-architecture, which leads to side-channel leaks; to close those leaks, programs must be transformed to exhibit worst-case control- and data-flow behavior, and thus incur considerable slowdown. We propose to address these security and performance problems by bringing enclaves into the realm of accelerator-rich architectures. The key idea is to construct software-defined enclaves (SDEs) in which the protections and the resulting slowdown are tied to an application-defined threat model and tuned by a compiler for the accelerator's specific domain. This vertically integrated approach requires new hardware data structures to partition, clear, and shape the utilization of hardware resources, and a compiler that instantiates and schedules these data structures to create multi-tenant enclaves on accelerators. We demonstrate our ideas with a comprehensive prototype, Sesame, that includes modifications to the compiler, ISA, and micro-architecture of a decoupled access-execute (DAE) accelerator framework for deep learning models. Our security evaluation shows that classifiers that could distinguish different layers in VGG, ResNet, and AlexNet fail to do so when the models run using Sesame. Our synthesizable hardware prototype (on a Xilinx Pynq board) demonstrates how the compiler and micro-architecture enable threat-model-specific trade-offs, with code size increases ranging from 3% to 7% and run-time performance overheads for specific defenses ranging from 3.96% to 34.87% (across confidential inputs and models, and single- vs. multi-tenant systems).
