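   The electronic thesis has not yet been authorized for public release; for the print copy, please check the library catalog.
(Note: if the search returns no result, or the holding status shows "closed stacks, not public", the thesis is not in the stacks and cannot be retrieved.)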
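System ID U0026-0102201623052800
Title (Chinese) 符合HSA中介語言並支援三維繪圖與通用運算之繪圖處理器設計平台
Title (English) An HSAIL Conformed GPU Design Platform for General Purpose Computing and 3D Rendering Applications
University National Cheng Kung University
Department (Chinese) 電腦與通信工程研究所
Department (English) Institute of Computer & Communication
Academic year 104 (ROC calendar)
Semester 1
Publication year 105 (ROC calendar)
Author (Chinese) 徐鏞
Author (English) Yung Hsu
Student ID Q36034162
Degree Master's
Language English
Pages 87
Committee Advisor: 陳中和
Committee member: 邱瀝毅
Committee member: 郭致宏
Committee member: 黃英哲
Committee member: 蘇文鈺
Keywords (Chinese) 繪圖處理器  異質架構系統  平行運算  繪圖管線 
Keywords (English) GPU  heterogeneous system architecture  parallel computing  rendering pipeline 
Subject classification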
Abstract (Chinese) 繪圖處理器具有強大的平行運算能力,因此不僅使用在三維計算機繪圖,也被用於一般任務。本論文提出一系統層級的繪圖處理器設計平台,可同時支援三維繪圖與通用目的運算。此平台之目的在於,幫助處理器架構設計者在早期設計階段進行軟硬體的開發與驗證。此平台具有基於現代繪圖處理器之硬體架構的模擬器。該模擬器包含可程式化且具有客製指令集架構的單一指令多執行緒處理器、針對繪圖管線所設計的特定模組,以及記憶體系統。此繪圖處理器針對高效能運算以及異質運算而設計,並符合異質架構系統的運行模式與其中介語言。本平台亦提供一特殊的編譯流程與工具鏈,用於編譯OpenGL著色程式與OpenCL內核至HSA中介語言以及客製的二進位指令集。本論文發展了一模擬框架,使設計平台得以運行OpenCL與OpenGL應用程式,該框架實作OpenCL與OpenGL 應用程式介面與其執行期函式庫、模擬器的驅動程式,以及客製的內文與視窗管理函式庫。數個OpenCL與OpenGL基準測試程式已被移植至此平台,開發者可剖析其程式行為並評估效能議題。
Abstract (English) Graphics processing units (GPUs) offer powerful parallel computing capability, so they are used not only for 3D graphics applications but also for general-purpose tasks. This work proposes a system-level GPU design platform that supports both 3D rendering and general-purpose computing applications. The goal of the platform is to help processor architects explore and verify hardware and software in the early design stage. The platform includes a simulator that models the hardware architecture of a modern GPU, including programmable Single Instruction Multiple Thread (SIMT) processors with a customized instruction set architecture, dedicated modules for the rendering pipeline, and the memory system. The GPU design targets high-performance and heterogeneous computing, and it conforms to the Heterogeneous System Architecture (HSA) execution model and the HSA Intermediate Language (HSAIL). The platform also provides a dedicated compilation flow and tool chain that compile OpenGL shader programs and OpenCL kernels to HSAIL and our custom binary instruction set. To run OpenCL and OpenGL applications on the platform, we also develop a simulation framework comprising implementations of the OpenCL and OpenGL APIs and runtime libraries, a driver for the simulator, and a customized context and window management library. Several OpenCL and OpenGL benchmarks have been ported to the platform, so developers can profile program behavior and evaluate performance issues.
Table of contents Abstract (Chinese)
Abstract
Acknowledgment
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Motivation
1.2 Contribution
1.3 Organization
Chapter 2 Backgrounds
2.1 Computer Graphics
2.1.1 OpenGL
2.1.2 The Rendering Pipeline
2.1.3 Programmable Shader
2.2 GPU
2.2.1 General Purpose Computing on Graphics Processing Units (GPGPUs)
2.2.2 OpenCL Framework
2.2.3 Task Scheduling and Control Divergence
2.3 Heterogeneous System Architecture (HSA)
2.3.1 HSA Execution Model
2.3.2 HSA Intermediate Language
2.3.3 Heterogeneous Queuing and Uniform Memory Access
Chapter 3 Related Work
3.1 ATTILA
3.2 GPGPU-Sim
3.3 TEAPOT
Chapter 4 GPU Architecture Design
4.1 GPU System Architecture
4.2 Streaming Multi-Processor
4.2.1 Instruction Set Architecture
4.2.2 Extension Instruction Set
4.2.3 SIMT Processor
4.3 Fixed Function Units for Rendering
4.3.1 Geometry Unit
4.3.2 Rasterizer Unit
4.3.3 Per-fragment Operation Unit
4.4 Texture Unit
4.4.1 Texture Unit Architecture
4.4.2 Address Generation Processor
4.4.3 Filter Processor
4.5 Memory System
4.5.1 Memory Segments
4.5.2 Memory Hierarchy Model
Chapter 5 Compilation Flow of Shader and Computing Kernel
5.1 Overview of the Compilation Flow
5.2 GLSL Shader Compilation
5.2.1 Translator
5.2.2 Scalarizer
5.2.3 Syntax Conversion
5.2.4 Kernel Synthesis
5.3 Finalizer
Chapter 6 Simulation Framework
6.1 Framework Overview
6.2 Application Layer
6.3 Runtime Libraries Layer
6.4 Driver Layer
6.5 Context and Window Management
6.5.1 X Window System
6.5.2 OpenGL Simulation Toolkit for CASLAB (GLSC)
Chapter 7 Benchmarks and Evaluation
7.1 Benchmarks
7.2 Experiment Environment
7.3 Experiment Result
7.3.1 Shader Workload Profiling
7.3.2 Instruction Breakdown
7.3.3 Memory Access Profiling
Chapter 8 Conclusion
References
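Full-text access rights
  • Authorized for on-campus viewing and printing of the electronic full text; available from 2021-02-15.
  • Authorized for off-campus viewing and printing of the electronic full text; available from 2021-02-15.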

