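System ID U0026-1208201117071300
Title (Chinese) 透過電子系統層級之全系統模擬優化砌塊式繪圖處理器
Title (English) Tile-Based GPU Optimizations through the ESL Full System Simulation
Institution National Cheng Kung University
Department (Chinese) 電腦與通信工程研究所
Department (English) Institute of Computer & Communication
Academic year 99 (ROC calendar)
Semester 2
Year of publication 100 (ROC calendar; 2011)
Author (Chinese) 黃煦堯
Author (English) Hsu-Yao Huang
Student ID Q36984189
Degree Master's
Language Chinese
Pages 102
Committee Advisor: 陳中和
Committee member: 李宗南
Committee member: 邱瀝毅
Committee member: 蘇文鈺
Keywords (Chinese) 電子系統層級設計  全系統模擬  軟硬體分割  貼圖壓縮  砌塊式繪圖處理器
Keywords (English) electronic system level design  ESL  ETC  full system simulation  software/hardware partition  texture compression  tile-based GPU
Subject classification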
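Abstract (translated from the Chinese)
       QEMU-SystemC is one of the recently popular full-system simulation platforms. We previously used it to build a prototype 3D graphics system for a tile-based GPU design, examined the strengths and weaknesses of its architecture and algorithms, performed a preliminary software/hardware partition, and implemented the design as synthesizable hardware. Overall performance, however, still left room for tuning. In addition, although estimated instruction cycle counts can be obtained from QEMU and the software and hardware execution times can be measured separately, the software load still could not be faithfully reflected in the system performance.

       This thesis therefore builds an abstraction-level model of the tile-based GPU in SystemC, including the texture unit, and estimates the cycle counts the model requires and its synthesized area, so that it can serve as a performance-evaluation model. Next, working with the QEMU synchronization profiler, a synchronization mechanism is established in the full-system simulation platform to obtain a near-accurate application execution time, which is fed back into the system to improve the accuracy of performance evaluation. Inside the GPU, Ericsson Texture Compression (ETC) is adopted and extended to support alpha (transparency) compression, which greatly reduces external memory usage and bus bandwidth, to about 1/6 of the original, and speeds up the Rasterization Engine (RE) by 35%. Finally, optimizing the software/hardware data flow and the architecture improves the Geometry Engine (GE) by 96%, the RE by 89%, and overall system performance by 70%. At a clock of 200 MHz, the maximum GE output reaches 7.407 Mtriangles/sec and the maximum RE output is 200 Mpixels/sec.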
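The roughly one-sixth memory figure in the abstract is consistent with the fixed rate of the standard ETC1 format, which stores each 4x4 block of texels in 64 bits. The following is a minimal sketch of that arithmetic, assuming uncompressed 24-bit RGB input. The function names and the 256x256 texture size are hypothetical, and the thesis's alpha extension is not modeled because this record does not describe its bit layout.

```python
# Sketch: storage cost of a texture as raw RGB888 versus standard ETC1.
# Assumption: ETC1 encodes every 4x4 texel block in 8 bytes (64 bits).
# The alpha extension described in the thesis is NOT modeled here.

ETC1_BLOCK_DIM = 4      # texels per block edge
ETC1_BLOCK_BYTES = 8    # 64 bits per compressed block


def rgb888_size_bytes(width, height):
    """Bytes needed for an uncompressed 24-bit RGB texture."""
    return width * height * 3


def etc1_size_bytes(width, height):
    """Bytes needed for the same texture in ETC1 (sizes rounded up to whole blocks)."""
    blocks_x = (width + ETC1_BLOCK_DIM - 1) // ETC1_BLOCK_DIM
    blocks_y = (height + ETC1_BLOCK_DIM - 1) // ETC1_BLOCK_DIM
    return blocks_x * blocks_y * ETC1_BLOCK_BYTES


# Hypothetical example texture size.
raw = rgb888_size_bytes(256, 256)
packed = etc1_size_bytes(256, 256)
print(f"raw={raw} B, etc1={packed} B, ratio={packed / raw:.4f}")
```

For texture dimensions that are multiples of four, the ratio is exactly 8 bytes to 48 bytes per block, that is, one sixth.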
Abstract (English)
       Previously, we built a 3D rendering system prototype on QEMU-SystemC, a popular full-system simulation platform. By analyzing different algorithms and architectures, we performed a preliminary software/hardware partition for a tile-based GPU design and implemented it as synthesizable hardware. However, the overall performance of the 3D rendering system still leaves room for tuning through further design space exploration. We also measured the software and hardware execution times separately, using the approximate cycle counts obtained from QEMU; however, the software overhead is not truly reflected in the system performance.

       In this thesis, we construct an abstract-level model of the tile-based GPU in SystemC, including the texture mapping unit, to assess performance, and we estimate its execution cycles and area. To estimate application software time accurately, we propose a synchronization mechanism for the virtual platform that cooperates with a QEMU synchronization profiler. We adopt Ericsson Texture Compression (ETC) in our GPU and extend it to support alpha compression. This reduces external memory usage to about one sixth and speeds up the Rasterization Engine (RE) by 35%. We also optimize the HW/SW data flow and the architecture, which improves Geometry Engine (GE) performance by 96%, RE performance by 89%, and whole-system performance by 70%. Running at 200 MHz, the GPU achieves a maximum throughput of 7.407 Mtriangles/sec at the GE and 200 Mpixels/sec at the RE.
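The peak throughput figures can be read as cycles per output item at the stated clock. The sketch below does that conversion. The resulting 27 cycles per triangle and 1 cycle per pixel are inferences from the quoted numbers, not values stated in this record, and the function name is hypothetical.

```python
# Sketch: convert the quoted peak throughputs into cycles per item.
# Assumption: peak throughput = clock frequency / cycles per item.

CLOCK_HZ = 200e6           # 200 MHz, from the abstract
GE_TRIANGLES_PER_S = 7.407e6
RE_PIXELS_PER_S = 200e6


def cycles_per_item(clock_hz, items_per_second):
    """Average clock cycles spent per output item at peak rate."""
    return clock_hz / items_per_second


ge_cycles = cycles_per_item(CLOCK_HZ, GE_TRIANGLES_PER_S)
re_cycles = cycles_per_item(CLOCK_HZ, RE_PIXELS_PER_S)
print(f"GE: {ge_cycles:.2f} cycles/triangle, RE: {re_cycles:.2f} cycles/pixel")
```

Under this reading, the GE figure corresponds to one triangle about every 27 cycles (200 MHz / 27 is approximately 7.407 M/s), and the RE figure to one pixel per cycle.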
Table of contents

Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Motivation
1.2 Contribution
1.3 Organization
Chapter 2 Background and Related Work
2.1 3D computer graphics
2.2 Graphics API
2.2.1 OpenGL
2.2.2 OpenGL ES
2.3 Conventional OpenGL pipeline
2.3.1 Geometry engine
2.3.2 Rasterization engine
2.4 Tile-based OpenGL ES pipeline
2.5 Related work
2.5.1 Electronic System Level (ESL)
2.5.2 Full system simulation platform
Chapter 3 Software and Hardware Optimizations
3.1 Texture mapping unit
3.1.1 Texture cache
3.1.2 Texture decompression
3.2 Timing model
3.2.1 Geometry engine (GE) pipeline
3.2.2 Rasterization engine (RE) pipeline
3.2.3 SystemC model verification
3.2.4 Synchronization
3.2.5 ARM bridge module
3.2.6 Interrupt mechanism
3.3 Data flow optimizations
3.3.1 Redundant initialization
3.3.2 Redundant data flow
3.3.3 Vertex Buffer
3.3.4 Display List offloading
3.3.5 RE controller offloading
3.4 Geometry Engine optimizations
3.4.1 Non-blocking GE pipeline
3.4.2 Non-blocking GE write back
3.5 Rasterization Engine optimizations
3.5.1 Non-blocking RE pipeline
3.5.2 Non-blocking RE write back
3.6 Bus architecture
3.6.1 Multi-layer AHB
3.6.2 Implementation
Chapter 4 Verification Environment and Simulation Results
4.1 Verification environment
4.2 Simulation results
4.2.1 Texture compression
4.2.2 Optimizations of data flow and architecture
4.2.3 Comparison with other GPUs
Chapter 5 Conclusion and Future Work
5.1 Conclusion
5.2 Future work
References
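Full-text access rights
  • Authorized for on-campus viewing and printing of the electronic full text, publicly available from 2012-08-31.
  • Authorized for off-campus viewing and printing of the electronic full text, publicly available from 2012-08-31.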

