Paper Title
GPUNet: Searching the Deployable Convolution Neural Networks for GPUs
Paper Authors
Paper Abstract
Customizing Convolution Neural Networks (CNN) for production use has been a challenging task for DL practitioners. This paper intends to expedite model customization with a model hub that contains optimized models tiered by their inference latency using Neural Architecture Search (NAS). To achieve this goal, we build a distributed NAS system to search on a novel search space that consists of prominent factors impacting latency and accuracy. Since we target GPUs, we name the NAS-optimized models GPUNet, which establishes a new SOTA Pareto frontier in inference latency and accuracy. Within 1 ms, GPUNet is 2x faster than EfficientNet-X and FBNetV3 with even better accuracy. We also validate GPUNet on detection tasks, where it consistently outperforms EfficientNet-X and FBNetV3 on COCO detection in both latency and accuracy. All of these results validate that our NAS system is effective and generic enough to handle different design tasks. With this NAS system, we expand GPUNet to cover a wide range of latency targets such that DL practitioners can deploy our models directly in different scenarios.
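The abstract's central claim is about the Pareto frontier over inference latency and accuracy: a model belongs to the frontier when no other candidate is both faster and at least as accurate. A minimal illustrative sketch of that selection step (not the paper's actual code; the model names and numbers below are hypothetical) might look like:

```python
# Illustrative sketch: computing the accuracy-latency Pareto frontier
# over candidate models, as a NAS system might when tiering a model
# hub by inference latency. Candidates are (name, latency_ms, top1_acc).

def pareto_frontier(models):
    """Return models not dominated by any other candidate.

    A model is dominated if some other model has latency <= its latency
    and accuracy >= its accuracy, and is strictly better in at least one
    of the two metrics.
    """
    frontier = []
    for name, lat, acc in models:
        dominated = any(
            (l <= lat and a >= acc) and (l < lat or a > acc)
            for n, l, a in models
            if n != name
        )
        if not dominated:
            frontier.append((name, lat, acc))
    # Sort by latency so the hub's tiers are ordered fastest-first.
    return sorted(frontier, key=lambda m: m[1])

# Hypothetical candidates for illustration only.
candidates = [
    ("net-a", 0.6, 78.9),
    ("net-b", 0.8, 80.5),
    ("net-c", 0.9, 79.0),  # dominated by net-b: slower and less accurate
    ("net-d", 1.2, 82.2),
]
print(pareto_frontier(candidates))
# → [('net-a', 0.6, 78.9), ('net-b', 0.8, 80.5), ('net-d', 1.2, 82.2)]
```

A DL practitioner with a deployment budget (say, under 1 ms) would then pick the most accurate frontier model within that latency tier.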