Paper Title
Can We Run in Parallel? Automating Loop Parallelization for TornadoVM
Paper Authors
Paper Abstract
With the advent of multi-core systems, GPUs, and FPGAs, loop parallelization has become a promising way to speed up program execution. To keep pace, various performance-oriented programming languages provide a multitude of constructs that allow programmers to write parallelizable loops. Correspondingly, researchers have developed techniques to automatically parallelize loops that do not carry dependences across iterations and/or call pure functions. However, in managed languages with platform-independent runtimes, such as Java, it is practically infeasible to perform complex dependence analysis during JIT compilation. In this paper, we propose AutoTornado, a first-of-its-kind static+JIT loop parallelizer for Java programs that parallelizes loops for heterogeneous architectures using TornadoVM (a Graal-based VM that supports insertion of @Parallel constructs for loop parallelization). AutoTornado performs sophisticated dependence and purity analysis of Java programs statically, in the Soot framework, to generate constraints that encode the conditions under which a given loop can be parallelized. The generated constraints are then fed to the Z3 theorem prover (which we have integrated with Soot) to annotate canonical for loops that can be parallelized using the @Parallel construct. We have also added runtime support in TornadoVM to use the static analysis results for loop parallelization. Our evaluation over several standard parallelization kernels shows that AutoTornado correctly parallelizes 61.3% of manually parallelizable loops, with an efficient static analysis and near-zero runtime overhead. To the best of our knowledge, AutoTornado is not only the first tool that performs program-analysis-based parallelization for a real-world JVM, but also the first to integrate Z3 with Soot for loop parallelization.
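To make the idea of a loop-carried dependence concrete, the following is a minimal sketch in plain Java. It is not AutoTornado's analysis or its constraint encoding: the class `LoopDependenceSketch`, the method `carriesDependence`, and the locally declared `@Parallel` annotation are hypothetical stand-ins introduced for illustration (the local annotation only mimics TornadoVM's `@Parallel` so that the file compiles without TornadoVM). The sketch assumes a loop body of the form `a[i + w] = f(a[i + r])` and decides by hand the kind of condition that a solver such as Z3 could be asked to check.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Target;
import java.util.Arrays;

public class LoopDependenceSketch {

    // Hypothetical stand-in for TornadoVM's @Parallel annotation, declared
    // locally so this sketch compiles without TornadoVM on the classpath.
    @Target(ElementType.LOCAL_VARIABLE)
    @interface Parallel {}

    /**
     * Assumed loop shape: for i in [0, iterations): a[i + writeOffset] = f(a[i + readOffset]).
     * Two distinct iterations i != j touch the same element when
     * i + writeOffset == j + readOffset, i.e. j - i == writeOffset - readOffset.
     * Such a pair exists iff the distance is non-zero and smaller than the trip count.
     */
    static boolean carriesDependence(int writeOffset, int readOffset, int iterations) {
        long distance = (long) writeOffset - readOffset; // long avoids int overflow
        return distance != 0 && Math.abs(distance) < iterations;
    }

    // Each iteration reads and writes only a[i]: no loop-carried dependence,
    // so the induction variable can carry the (stand-in) annotation.
    static void scale(float[] a, float k) {
        for (@Parallel int i = 0; i < a.length; i++) {
            a[i] = a[i] * k;
        }
    }

    // Iteration i reads a[i - 1], which iteration i - 1 wrote: a loop-carried
    // dependence, so this loop is left sequential.
    static void prefixSum(float[] a) {
        for (int i = 1; i < a.length; i++) {
            a[i] = a[i] + a[i - 1];
        }
    }

    public static void main(String[] args) {
        float[] x = {1f, 2f, 3f};
        float[] y = {1f, 2f, 3f};

        // scale: write offset 0, read offset 0
        System.out.println("scale carries dependence: "
                + carriesDependence(0, 0, x.length));
        // prefixSum: write offset 0, read offset -1
        System.out.println("prefixSum carries dependence: "
                + carriesDependence(0, -1, y.length));

        scale(x, 2f);
        prefixSum(y);
        System.out.println(Arrays.toString(x));
        System.out.println(Arrays.toString(y));
    }
}
```

The closed-form check works only for this single-array, constant-offset loop shape. Loops with aliasing, non-affine indices, or method calls are where a general dependence and purity analysis backed by a constraint solver, as the abstract describes, becomes necessary.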