Paper Title
Accelerate Model Parallel Training by Using Efficient Graph Traversal Order in Device Placement
Paper Authors
Paper Abstract
Modern neural networks require long training times to reach decent performance on massive datasets. One common approach to speeding up training is model parallelization, where a large neural network is split across multiple devices. However, different device placements of the same neural network lead to different training times. Most existing device placement solutions treat the problem as sequential decision-making: they traverse the neural network graph and assign its neurons to different devices one by one. This work studies the impact of graph traversal order on device placement. In particular, we empirically study how different graph traversal orders lead to different device placements, which in turn affect the training execution time. Our experimental results show that the best graph traversal order depends on the type of neural network and the features of its computation graph. We also provide recommendations on choosing a graph traversal order in device placement for various neural network families to improve training time in model parallelization.
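
To make the traversal-order effect concrete, below is a minimal sketch (not the paper's implementation) of a sequential placer. A toy computation graph is linearized with Kahn's algorithm, where the choice of which ready node to visit next (FIFO vs. LIFO frontier) yields a BFS-like or DFS-like topological order, and a hypothetical greedy load-balancing policy then assigns each node to the least-loaded device. The graph, the per-op costs, and the placement policy are all assumptions made for illustration.

# Minimal sketch (assumptions only, not the paper's method): how the
# traversal order fed to a sequential placement policy can change the
# resulting device assignment on a toy computation graph.
from collections import deque

# Toy computation graph: op -> list of successor ops, plus per-op cost.
GRAPH = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
COST = {"a": 2.0, "b": 3.0, "c": 3.0, "d": 1.0}

def indegrees(graph):
    deg = {v: 0 for v in graph}
    for v in graph:
        for w in graph[v]:
            deg[w] += 1
    return deg

def topo_order(graph, frontier_pop):
    """Kahn's algorithm; `frontier_pop` decides which ready op is visited
    next, which is exactly where traversal order enters the picture."""
    deg = indegrees(graph)
    frontier = deque(v for v in graph if deg[v] == 0)
    order = []
    while frontier:
        v = frontier_pop(frontier)
        order.append(v)
        for w in graph[v]:
            deg[w] -= 1
            if deg[w] == 0:
                frontier.append(w)
    return order

def greedy_place(order, num_devices=2):
    """Assign each op, in traversal order, to the currently least-loaded
    device -- a stand-in for the sequential decision-making policy."""
    load = [0.0] * num_devices
    placement = {}
    for op in order:
        dev = min(range(num_devices), key=load.__getitem__)
        placement[op] = dev
        load[dev] += COST[op]
    return placement

bfs_like = topo_order(GRAPH, lambda f: f.popleft())  # FIFO frontier: BFS flavor
dfs_like = topo_order(GRAPH, lambda f: f.pop())      # LIFO frontier: DFS flavor
print(bfs_like, greedy_place(bfs_like))
print(dfs_like, greedy_place(dfs_like))

Running the sketch shows the two orders visiting the parallel branches b and c in opposite order, so the greedy policy places them on different devices; on real computation graphs with uneven op costs, such order-dependent placements translate into different training times.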