An LLVM-based Binary Translator For A Heterogeneous System Architecture Simulator
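|Abstract:||General-purpose graphics processing unit (GPGPU) computation can speed up highly parallel programs in a more efficient way. However, the programming model is not programmer friendly. The memory model is heterogeneous, so such programming requires explicit control of data transfers between system main memory and GPU device memory. In addition, supporting infrastructure such as debugging and code distribution is lacking. The Heterogeneous System Architecture (HSA) from AMD aims to ease the complexity of software development in GPGPU programming. Its features include a shared memory model, an intermediate representation (IR) usable on hardware from different vendors, and more specific operation controls, such as control of memory accesses across work-groups in the GPGPU environment. In this paper, we present an LLVM-based HSA translator that provides fast HSAIL translation on an HSA simulator. Hand-written HSAIL benchmarks and an HSAIL binary assembler help verify functional correctness.|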
General-purpose graphics processing unit (GPGPU) computation can speed up programs with a high degree of parallelism in a more power-efficient way. However, the programming model is not programmer friendly. The memory model is heterogeneous, so the programmer must explicitly control data transfers between system main memory and GPU device memory. Supporting infrastructure such as debugging and code distribution is also lacking. The Heterogeneous System Architecture (HSA) from AMD addresses these issues to ease software development in GPGPU programming. Its features include a shared memory model and a re-targetable intermediate representation (IR) with more specific operation controls, such as cross-work-group control. In this paper, we present the HSA Translator, which enables fast simulation of HSAIL in the HSA Simulator, a functional-level, system-mode simulator of the HSA environment. The HSA Simulator includes a simulator based on PQEMU that simulates the processing unit in the GPGPU environment. The HSA Translator is implemented inside the simulator to perform native code translation. It leverages the LLVM infrastructure to translate kernel source code from the Heterogeneous System Architecture Intermediate Language (HSAIL) into native relocatable code. The native binary is linked by the HSA Link-Loader, a link-loader that we implemented ourselves inside the simulator. To speed up the simulation, the kernel processing device is simulated with host threads. We evaluate the simulation with HSAIL benchmarks that we translated ourselves, based on the Rodinia benchmark suite and the AMD OpenCL samples.
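The idea of simulating the kernel processing device with host threads can be illustrated with a minimal sketch. It is not the HSA Simulator's implementation: the function names (`vector_add_kernel`, `run_work_group`, `dispatch`), the mapping of one work-group to one pool task, and the vector-add kernel are all assumptions made for illustration. In the system described above, the kernel would be native code translated from HSAIL.

```python
from concurrent.futures import ThreadPoolExecutor


def vector_add_kernel(global_id, a, b, out):
    """Hypothetical kernel body: each work-item adds one element."""
    if global_id < len(out):  # guard work-items past the end of the data
        out[global_id] = a[global_id] + b[global_id]


def run_work_group(kernel, group_id, group_size, args):
    """Run all work-items of one work-group sequentially on the calling host thread."""
    base = group_id * group_size
    for local_id in range(group_size):
        kernel(base + local_id, *args)


def dispatch(kernel, grid_size, group_size, args, host_threads=4):
    """Spread work-groups over a pool of host threads and wait for completion."""
    num_groups = (grid_size + group_size - 1) // group_size  # round up
    with ThreadPoolExecutor(max_workers=host_threads) as pool:
        futures = [
            pool.submit(run_work_group, kernel, g, group_size, args)
            for g in range(num_groups)
        ]
        for future in futures:
            future.result()  # re-raise any exception from a worker thread


a = list(range(10))
b = [10 * x for x in a]
out = [0] * len(a)
dispatch(vector_add_kernel, grid_size=len(a), group_size=4, args=(a, b, out))
```

The sketch shows only the dispatch structure. In CPython the global interpreter lock prevents these threads from running the kernel body in parallel, whereas a simulator that runs native translated code on host threads can use multiple cores.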