Title
Sparse Weight Activation Training
Authors
Abstract
Neural network training is computationally and memory intensive. Sparse training can reduce the burden on emerging hardware platforms designed to accelerate sparse computations, but it can affect network convergence. In this work, we propose a novel CNN training algorithm, Sparse Weight Activation Training (SWAT). SWAT is more computation- and memory-efficient than conventional training. SWAT modifies back-propagation based on the empirical insight that convergence during training tends to be robust to the elimination of (i) small-magnitude weights during the forward pass and (ii) both small-magnitude weights and activations during the backward pass. We evaluate SWAT on recent CNN architectures such as ResNet, VGG, DenseNet, and WideResNet using the CIFAR-10, CIFAR-100, and ImageNet datasets. For ResNet-50 on ImageNet, SWAT reduces total floating-point operations (FLOPs) during training by 80%, resulting in a 3.3$\times$ training speedup when run on a simulated sparse learning accelerator representative of emerging platforms, while incurring only a 1.63% reduction in validation accuracy. Moreover, SWAT reduces the memory footprint during the backward pass by 23% to 50% for activations and 50% to 90% for weights.
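To make the core idea concrete, below is a minimal sketch of magnitude-based dropping of weights (forward pass) and of weights and activations (backward pass), in the spirit of the abstract. It assumes PyTorch; the helper `drop_small_magnitude`, the `SwatLinearFn` wrapper, and the single `sparsity` knob are illustrative assumptions, not the authors' implementation.

```python
import torch


def drop_small_magnitude(x: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the smallest-magnitude entries so roughly `sparsity` fraction is zero."""
    k = int(sparsity * x.numel())
    if k == 0:
        return x
    threshold = x.abs().flatten().kthvalue(k).values
    return x * (x.abs() > threshold)


class SwatLinearFn(torch.autograd.Function):
    """Linear layer sketch: sparsified weights in the forward pass,
    sparsified weights and activations when computing gradients."""

    @staticmethod
    def forward(ctx, inp, weight, sparsity):
        w_sparse = drop_small_magnitude(weight, sparsity)            # (i) drop small weights
        inp_sparse = drop_small_magnitude(inp, sparsity)             # (ii) drop small activations for backward
        ctx.save_for_backward(inp_sparse, w_sparse)
        return inp @ w_sparse.t()

    @staticmethod
    def backward(ctx, grad_out):
        inp_sparse, w_sparse = ctx.saved_tensors
        grad_inp = grad_out @ w_sparse        # backward also uses the sparse weights
        grad_w = grad_out.t() @ inp_sparse    # and the sparse activations
        return grad_inp, grad_w, None
```

In this sketch the memory saving comes from storing `inp_sparse` instead of the dense activation for the backward pass, and the FLOP reduction from the sparse operands; how the paper selects thresholds and schedules sparsity across layers is not captured here.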