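System ID U0026-0808201614274700
Title (Chinese) 支援異質系統架構之繪圖處理器微架構優化
Title (English) Micro-Architecture Optimization of HSA-Compatible GPU
Institution National Cheng Kung University
Department (Chinese) 電腦與通信工程研究所
Department (English) Institute of Computer & Communication
Academic year 104 (ROC calendar; 2015-2016)
Semester 2
Year of publication 105 (ROC calendar; 2016)
Author (Chinese) 謝宛珊
Author (English) Wan-Shan Hsieh
Student ID Q36034099
Degree Master's
Language English
Pages 55
Committee Advisor: 陳中和
Committee member: 蕭勝夫
Committee member: 李宗南
Committee member: 邱瀝毅
Committee member: 郭致宏
Keywords (Chinese) 繪圖處理器  OpenCL  異質系統架構  效能量測工具
Keywords (English) GPU  HSA  OpenCL  Performance modeling tool
Subject classification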
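Abstract (Chinese) In this thesis, we propose and design a GPU simulation platform whose purpose is to let us explore GPU architecture in the early stage of integrated circuit development. The platform comprises an OpenCL runtime library, an HSA runtime library, a customized compiler toolchain, and a timing-accurate GPU simulator implemented in a high-level programming language. We verify the functional correctness of the platform with OpenCL benchmarks adopted in academia.
To better understand the behavior of the benchmarks and of the GPU hardware, we design a performance measurement tool placed inside the processor, which lets us identify the performance bottleneck of each benchmark. According to the experimental results, the time spent in the memory access units is the main cause of performance loss. To address this bottleneck, we test a number of warp scheduling mechanisms and also try providing more hardware resources. In the experiments we find that the arithmetic-instruction intensity of a benchmark is an indicator for choosing among scheduling mechanisms, which leads to a mechanism that adjusts the warp scheduling policy dynamically. Finally, we optimize the hardware configuration parameters of the GPU based on the results provided by the performance measurement tool.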
Abstract (English) In this work, we present a comprehensive GPU simulation platform for exploring architecture design in the early stage of IC development. The platform contains an OpenCL runtime API layer, a full HSA runtime, a compiler toolchain, and a timing-approximate GPU simulator implemented in a high-level programming language. We verify the functional correctness of the platform with widely adopted OpenCL benchmarks.
We design a performance modeling tool built into the GPU to evaluate the behavior of both the programs and the hardware. With the assistance of the tool, we are able to locate the performance bottlenecks. The main bottleneck turns out to be the latency of the load/store units inside the GPU. We therefore make two adjustments to improve performance: changing the warp scheduling policy and providing more hardware resources. During the experiments, we learn that arithmetic intensity is a useful metric for choosing an appropriate warp scheduling policy for a program. We exploit this finding by designing a dynamic warp scheduling mechanism. At the end of this work, we propose an optimized GPU configuration based on the results of the performance modeling tool.
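The idea of using arithmetic intensity to pick a warp scheduling policy can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the thesis: arithmetic intensity is taken as the ratio of arithmetic to memory instructions, and the counter names, the threshold, and the policy names are hypothetical. The abstract does not state which policy suits which intensity range, so the selector takes both policy names as arguments.

```python
from dataclasses import dataclass


@dataclass
class KernelCounters:
    """Hypothetical per-kernel counters a built-in profiler might expose."""
    alu_instructions: int
    memory_instructions: int


def arithmetic_intensity(counters: KernelCounters) -> float:
    """Ratio of arithmetic to memory instructions (assumed definition)."""
    if counters.memory_instructions == 0:
        # A kernel with no memory instructions is treated as fully compute bound.
        return float("inf")
    return counters.alu_instructions / counters.memory_instructions


def choose_policy(counters: KernelCounters, threshold: float,
                  high_intensity_policy: str, low_intensity_policy: str) -> str:
    """Return the policy name for a kernel; threshold and names are caller-supplied."""
    if arithmetic_intensity(counters) >= threshold:
        return high_intensity_policy
    return low_intensity_policy


# Illustrative pairing only: the policy names and the threshold are placeholders.
kernel = KernelCounters(alu_instructions=80, memory_instructions=20)
selected = choose_policy(kernel, threshold=4.0,
                         high_intensity_policy="greedy_then_oldest",
                         low_intensity_policy="round_robin")
```

A dynamic mechanism would presumably re-evaluate such a decision from profiler counters while the program runs; how the thesis actually does this is not described in this record.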
Table of contents Abstract (Chinese)
Abstract
Acknowledgements
List of Tables
List of Figures
Chapter 1 Introduction
Chapter 2 Background
2.1 GPU
2.1.1 Warp Scheduling
2.1.2 Divergence
2.2 OpenCL
2.2.1 OpenCL Platform Model
2.2.2 OpenCL Execution Model
2.3 Heterogeneous System Architecture
2.3.1 Heterogeneous Queuing (hQ)
2.3.2 HSA Intermediate Language (HSAIL)
Chapter 3 Related Work
3.1 Warp Scheduling Policy
3.2 Performance Modeling Tools
3.3 Reference Metrics
Chapter 4 Simulation Platform
4.1 Instruction Set Architecture
4.2 Runtime System
4.2.1 OpenCL Runtime
4.2.2 HSA Runtime
4.2.3 Connection between OpenCL and HSA Runtime
4.3 Software Toolchain
4.4 Hardware Design of GPU
4.5 Implementation of GPU Simulation Platform
Chapter 5 Methodology: Built-in Profiler
Chapter 6 Experiment and Evaluation
6.1 Simulation Environment and Benchmarks
6.2 Experiment Results Analysis
6.3 Performance Evaluation
6.3.1 Warp Scheduling Policy Adjustment
6.3.2 Memory System Hardware Adjustment
Chapter 7 Discussion and Optimization
Chapter 8 Conclusion and Future Work
8.1 Conclusion
8.2 Limitations of the Current Work
8.2.1 Full System Simulation Platform
8.2.2 Shared Virtual Memory of OpenCL 2.0
8.2.3 Incomplete Finalizer
8.3 A Full System Heterogeneous Simulation Platform
References


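Full-text access rights
  • The author authorizes on-campus viewing and printing of the electronic full text, available from 2020-08-31.
  • The author authorizes off-campus viewing and printing of the electronic full text, available from 2020-08-31.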