Paper Title
HPC AI500: The Methodology, Tools, Roofline Performance Models, and Metrics for Benchmarking HPC AI Systems
Paper Authors
Paper Abstract
Recent years have witnessed a trend of applying large-scale distributed deep learning in both business and scientific computing, with the goal of speeding up training time while achieving state-of-the-art quality. The HPC community has shown great interest in building HPC AI systems dedicated to running those workloads, and HPC AI benchmarks accelerate that process. Unfortunately, benchmarking HPC AI systems at scale raises serious challenges: none of the previous HPC AI benchmarks achieves the goals of being equivalent, relevant, representative, affordable, and repeatable. This paper presents a comprehensive methodology, tools, Roofline performance models, and innovative metrics for benchmarking, optimizing, and ranking HPC AI systems, which we call HPC AI500 V2.0. We abstract an HPC AI system into nine independent layers and present explicit benchmarking rules and procedures to ensure the equivalence of each layer, repeatability, and replicability. On the basis of AIBench, by far the most comprehensive AI benchmark suite, we present and build two HPC AI benchmarks drawn from business and scientific computing, Image Classification and Extreme Weather Analytics, achieving both representativeness and affordability. To rank the performance and energy efficiency of HPC AI systems, we propose Valid FLOPS and Valid FLOPS per watt, which impose a penalty for failing to achieve the target quality. We propose using convolution and GEMM, the two most intensively used kernel functions, to measure the upper-bound performance of HPC AI systems, and present HPC AI Roofline models to guide performance optimization. The evaluations show that our methodology, benchmarks, performance models, and metrics can measure, optimize, and rank HPC AI systems in a scalable, simple, and affordable way. HPC AI500 V2.0 is publicly available at http://www.benchcouncil.org/benchhub/hpc-ai500-benchmark.
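To give a concrete sense of the two ideas named in the abstract, the Python sketch below illustrates one plausible shape of a quality-penalized Valid FLOPS score and the classic Roofline performance bound. The function names, the power-law penalty, the exponent, and the example numbers are illustrative assumptions for this sketch only, not the paper's exact definitions.

```python
# Illustrative sketch only: the exact Valid FLOPS penalty function and Roofline
# parameters are defined in the HPC AI500 paper; the forms below are assumptions
# chosen to show the general shape of the metrics described in the abstract.

def valid_flops(measured_flops: float,
                achieved_quality: float,
                target_quality: float,
                penalty_exponent: int = 10) -> float:
    """Scale measured FLOPS by a penalty when the target quality is missed.

    If the run reaches the target quality, the measured FLOPS count in full;
    otherwise the score is sharply discounted (hypothetical power-law penalty).
    Valid FLOPS per watt would divide this score by average power draw.
    """
    if achieved_quality >= target_quality:
        return measured_flops
    return measured_flops * (achieved_quality / target_quality) ** penalty_exponent


def roofline_ceiling(arithmetic_intensity: float,
                     peak_flops: float,
                     peak_bandwidth: float) -> float:
    """Classic Roofline bound: performance is limited by compute or memory.

    arithmetic_intensity is in FLOPs per byte, peak_bandwidth in bytes/s.
    In the HPC AI500 setting, the compute ceiling would come from convolution
    and GEMM micro-benchmarks rather than vendor peak numbers.
    """
    return min(peak_flops, arithmetic_intensity * peak_bandwidth)


if __name__ == "__main__":
    # A run that narrowly misses a hypothetical 0.763 target accuracy.
    print(valid_flops(measured_flops=1.2e15, achieved_quality=0.75, target_quality=0.763))
    # Bound for a kernel at 30 FLOPs/byte on a 100 TFLOPS, 1 TB/s device.
    print(roofline_ceiling(arithmetic_intensity=30.0, peak_flops=1e14, peak_bandwidth=1e12))
```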