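System identifier: U0026-0812200915371332
Thesis title (Chinese): 多核心系統之雙階層快取記憶體之設計 (Design of a Two-Level Cache Memory for Multi-Core Systems)
Thesis title (English): A two-level cache design for multi-core system
University: National Cheng Kung University
Department (Chinese): 電腦與通信工程研究所 (Institute of Computer and Communication Engineering)
Department (English): Institute of Computer & Communication
Academic year: 97 (ROC calendar; academic year 2008-2009)
Semester: 2
Publication year: 98 (ROC calendar; 2009)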
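Student (Chinese name): 郭良宇
Student (English name): Liang-Yu Kuo
E-mail: onsunnyday@casmail.ee.ncku.edu.tw
Student ID: q3696112
Degree: Master's
Language: English
Pages: 43
Oral defense committee: Advisor - 陳中和
Committee member - 侯廷偉
Committee member - 謝明得
Committee member - 黃俊岳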
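Keywords (Chinese): 資料一致性 (data coherence), 多核心系統 (multi-core system), 快取記憶體 (cache memory)
Keywords (English): multi-core system, coherence cache
Subject classification: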
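Abstract (Chinese, translated): As the demand for computing power keeps growing, improving single-core processor performance has run into technical bottlenecks, and multi-core systems have become the most suitable way to raise performance. In a shared-memory multi-core system, the cache memory plays an important role: through a communication (coherence) protocol it must govern communication among the processors and guarantee data consistency, while also reducing the latency of processor memory accesses. This study implements a two-level cache memory in Symphony32, an ARM-like superscalar processor with a 9-stage pipeline. The second-level (L2) cache serves as the shared memory, which reduces the load on the system bus by 49.56% on average. After analyzing how the interconnection among private caches relates to the communication protocol in a multi-core system, we optimize the performance of the caches and their interconnect channels. For the cache, dual-ported access improves performance by about 10.24% on average. In addition, we widen the data channel to the cache block size so that shared data can move quickly between caches over this channel. This shortens the time that multi-threaded programs running on multiple cores spend waiting for modified shared data to be written back, and exploiting this property improves the performance of general data accesses by about 8.59% on average.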
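The channel-widening idea in the abstract can be made concrete by counting bus beats. The sketch below computes how many beats one cache line needs to cross a data channel of a given width. The 32-byte line and 32-bit (4-byte) baseline channel are hypothetical values chosen for illustration; the record does not state Symphony32's actual line size or bus width, and `transfer_beats` is an illustrative helper, not part of the thesis.

```python
def transfer_beats(line_bytes: int, channel_bytes: int) -> int:
    """Return the number of bus beats needed to move one cache line.

    Assumes one channel-width chunk moves per beat, and a partially
    filled final beat still costs a full beat (ceiling division).
    """
    if line_bytes <= 0 or channel_bytes <= 0:
        raise ValueError("sizes must be positive")
    return -(-line_bytes // channel_bytes)  # ceiling division on ints


# Hypothetical parameters for illustration; not taken from the thesis.
LINE_BYTES = 32                  # assumed cache line (block) size in bytes
NARROW_CHANNEL_BYTES = 4         # assumed 32-bit baseline data channel
WIDE_CHANNEL_BYTES = LINE_BYTES  # channel widened to the full line

narrow = transfer_beats(LINE_BYTES, NARROW_CHANNEL_BYTES)
wide = transfer_beats(LINE_BYTES, WIDE_CHANNEL_BYTES)
print(f"narrow channel: {narrow} beats, line-wide channel: {wide} beat")
```

With these assumed sizes the sketch prints `narrow channel: 8 beats, line-wide channel: 1 beat`, so a modified line requested by another core crosses the channel in one beat instead of eight. Arbitration, snoop lookup, and write-back latency are ignored, so this models only the data-movement part of the saving.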
Abstract (English): As the demand for computing power keeps growing, multi-core systems have become the most appropriate architecture for meeting it, because improving the performance of single-core systems has run into a bottleneck. Cache memories play an important role in a shared-memory multi-core system, because private caches must keep shared data consistent through a coherence protocol. This study implements a two-level cache system for Symphony32, an ARM-like superscalar processor with a 9-stage pipeline. In our design, the L2 cache serves as a shared cache, reducing traffic on the system bus by 49.56% on average. After analyzing the relationship between the interconnection and the cache coherence protocol, we present a two-level cache design that improves the performance of both the caches and the interconnection. On the cache side, a dual-ported data cache improves performance by 10.24% on average. Moreover, the data channel is widened to the cache line size so that data can be transferred between caches rapidly, which improves data-access performance by 8.59% on average.
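The abstract relies on a coherence protocol to keep private-cache copies consistent, and the table of contents lists a MESI protocol (Section 3.4) running over a coherence bus (Section 3.3). For orientation only, here is a minimal sketch of the textbook MESI transitions seen by one private cache on a snooping bus. The function names and the exact transitions are assumptions based on the standard protocol, not the thesis's own transition table, which may differ (for example, in whether an Exclusive line supplies data on a snoop read).

```python
from __future__ import annotations

from enum import Enum


class State(Enum):
    """The four MESI states of a line in a private (L1) cache."""
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def on_local_read(state: State, other_sharers: bool) -> State:
    """Next state after this core reads the line.

    other_sharers: whether the snoop response says another cache holds it.
    """
    if state is State.INVALID:
        # Read miss: fill as Exclusive if no one else has it, else Shared.
        return State.SHARED if other_sharers else State.EXCLUSIVE
    return state  # read hits in M, E or S leave the state unchanged


def on_local_write(state: State) -> State:
    """Next state after this core writes the line.

    From S or I the cache must first gain ownership on the coherence bus
    (invalidating other copies); from E the upgrade is silent. In every
    case the line ends up Modified.
    """
    return State.MODIFIED


def on_snoop_read(state: State) -> tuple[State, bool]:
    """Another core reads the line; returns (next_state, must_supply_data).

    A Modified copy holds the only up-to-date data, so it must supply it
    (cache-to-cache transfer or write-back) before becoming Shared.
    """
    if state is State.INVALID:
        return State.INVALID, False
    return State.SHARED, state is State.MODIFIED


def on_snoop_write(state: State) -> tuple[State, bool]:
    """Another core writes the line; our copy is invalidated.

    Returns (next_state, must_supply_data); dirty data goes out first.
    """
    return State.INVALID, state is State.MODIFIED


# Example: core 0 reads a private line, writes it, then core 1 reads it.
s0 = on_local_read(State.INVALID, other_sharers=False)  # EXCLUSIVE
s0 = on_local_write(s0)                                 # MODIFIED
s0, supply = on_snoop_read(s0)                          # SHARED, supply=True
print(s0.value, supply)
```

The example prints `S True`: the Modified owner must hand over its dirty line when another core reads it, which is the kind of cache-to-cache transfer a wider data channel speeds up.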
Table of contents: Abstract (Chinese)
Abstract
Contents
Chapter 1 - Introduction
1.1 Motivation
1.2 Contribution
1.3 Organization of the thesis
Chapter 2 - Background and related work
2.1 Interconnection network topology
2.2 Cache coherency problem
2.2.1 Directory-based protocol
2.2.2 Token-based protocol
2.2.3 Snooping-based protocol
2.3 Multi-level cache coherence
2.4 Related work
Chapter 3 - System evaluation and implementation
3.1 Interconnection topology and coherence protocol
3.2 Multilevel cache organization
3.3 Coherence bus design
3.4 MESI protocol
3.5 Design of dual-ported cache
3.6 Summary
Chapter 4 - Verification environment and method
4.1 Hardware platform
4.2 Software platform and benchmarks
4.2.1 Software tool chain
4.2.2 Benchmarks
4.2.2.1 Benchmarks of single-core system
4.2.2.2 Benchmarks of multi-core system
Chapter 5 - Experiment results
5.1 Performance measurement result
5.2 Synthesis result
Chapter 6 - Conclusion and future work
6.1 Conclusion
6.2 Future work
Reference
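Full-text access rights
  • On-campus browsing/printing of the electronic full text authorized; publicly available from 2009-08-27.
  • Off-campus browsing/printing of the electronic full text authorized; publicly available from 2009-08-27.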

