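System ID U0026-2707201111081300
Title (Chinese) RaceWeir: 支援GDB與硬體輔助資料競爭偵測之多核心虛擬平台
Title (English) RaceWeir: A Multi-Core Virtual Platform with GDB Support and Hardware-Assisted Data Race Detection
Institution National Cheng Kung University
Department (Chinese) 電腦與通信工程研究所
Department (English) Institute of Computer & Communication
Academic year 99 (ROC calendar)
Semester 2
Publication year 100 (ROC calendar, 2011)
Student (Chinese) 江定遠
Student (English) Ting-Yuan Jiang
Student ID Q36984016
Degree Master's
Language English
Pages 82
Oral defense committee Advisor: 陳中和
Committee member: 謝錫堃
Committee member: 葉人傑
Committee member: 蘇培陞
Committee member: 王永鐘
Keywords (Chinese, translated) cache coherency  data race detection  electronic system-level design  multi-core system  multithreaded program debugging
Keywords (English) cache coherency  data race detection  ESL design flow  multi-core system  multithreaded program debugging
Subject classification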
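Abstract (Chinese, translated) With the rapid evolution of process technology, a modern chip can contain more than one processor or even an entire system. System on Chip (SoC) and Chip Multi-Processing (CMP) are the main trends in large-scale integrated circuit design today. Consequently, software engineers usually write multithreaded programs to exploit the resources of multi-core systems and improve execution performance. Unfortunately, software debugging becomes more complex at the same time.
In this thesis, we propose two hardware-assisted mechanisms that strengthen software debugging. The first is pGDB: we modify the original GNU debugger, GDB, and add hardware to support multi-core debugging. pGDB can synchronously control all cores for single-step or continue execution and can collect debugging information from all cores. The second is RaceWeir, a new data race detection mechanism together with its implementation. RaceWeir dynamically detects data races in multithreaded programs, and its area is only 29.4% of that of previous approaches.
We run our simulations on a functionally accurate, approximately-timed multi-core system model. The multi-core platform is implemented on the SystemC standard; each node is modeled on the ARM Versatile development board, and the nodes are connected through a shared bus architecture. The experimental results show that pGDB functions correctly and that RaceWeir detects data races whether they already exist in the program or are artificially injected.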
Abstract (English) Due to the rapid advancement of process technology, more than one processor or even an entire system can reside on a single chip. SoC (System on Chip) and CMP (Chip Multi-Processor) now lead the trend in VLSI (Very Large Scale Integration) circuit design. Consequently, software programmers usually design multithreaded applications to exploit the hardware resources and improve performance. Unfortunately, software debugging becomes much more complicated as well.
In this thesis, we propose two hardware-based mechanisms to support software debugging in a CMP system. First, we propose pGDB, a debugger for multi-core systems. In pGDB, we modify the GNU debugger and add hardware modules to support synchronized step/continue functions for all processor cores and to gather debugging information from all cores. Second, we propose RaceWeir, a novel hardware-based race detection mechanism. RaceWeir dynamically detects race conditions in multithreaded programs and requires only 29.4% of the area of previous work.
Our simulation results are demonstrated on an approximately-timed multi-core platform based on SystemC. Each node on the platform is an ARM Versatile-like system, and the nodes are connected through a shared bus architecture. The results show that pGDB works correctly and that RaceWeir catches both existing and injected races in the SPLASH-2 benchmarks.
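For readers unfamiliar with dynamic data race detection, the following is a minimal software sketch of the textbook lockset idea listed in the thesis outline (Section 2.4.2). It is not the RaceWeir hardware design, which this record does not describe; the class name `LocksetDetector`, its methods, and the rule of reporting only once two threads have touched an address are assumptions made for illustration.

```python
from collections import defaultdict


class LocksetDetector:
    """Simplified lockset check (illustrative only, not RaceWeir itself)."""

    def __init__(self):
        self.held = defaultdict(set)  # thread id -> locks currently held
        self.candidates = {}          # address -> locks held on every access so far
        self.owners = {}              # address -> threads that have accessed it
        self.races = []               # reported (address, thread id) pairs

    def acquire(self, tid, lock):
        self.held[tid].add(lock)

    def release(self, tid, lock):
        self.held[tid].discard(lock)

    def access(self, tid, addr):
        held = set(self.held[tid])
        if addr not in self.candidates:
            # First access: the candidate set is whatever is held right now.
            self.candidates[addr] = held
            self.owners[addr] = {tid}
            return
        self.owners[addr].add(tid)
        # Keep only the locks that protected every access so far.
        self.candidates[addr] &= held
        # Hypothetical reporting rule: shared by 2+ threads and no common lock.
        if len(self.owners[addr]) > 1 and not self.candidates[addr]:
            self.races.append((addr, tid))


det = LocksetDetector()

# Address 0x100 is always accessed under lock "m": no report.
det.acquire(1, "m"); det.access(1, 0x100); det.release(1, "m")
det.acquire(2, "m"); det.access(2, 0x100); det.release(2, "m")

# Address 0x200 is locked by thread 1 but accessed bare by thread 2: reported.
det.acquire(1, "m"); det.access(1, 0x200); det.release(1, "m")
det.access(2, 0x200)

print(det.races)  # [(512, 2)]
```

The sketch tracks one candidate set per address; the outline's Section 5.4 (Impact from Lockset Granularity) suggests that the tracking granularity is one of the parameters the thesis evaluates.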
Table of Contents Abstract (Chinese) I
Abstract II
Acknowledgements III
Content IV
List of Tables VII
List of Figures VIII
Chapter 1- Introduction 1
1.1 Motivation 1
1.2 Contribution 3
1.3 Scope and Organization 3
Chapter 2- Background and Related Works 5
2.1 Cache Coherency Policy 5
2.1.1 MESI 5
2.1.2 MESIF 7
2.1.3 MOESI 8
2.1.4 MOESI+F 10
2.2 GNU Debugger 11
2.2.1 Introduction to GDB 11
2.2.2 Remote Debugging Protocol 11
2.3 Parallel Program Debugging 13
2.3.1 Bugs in Parallel Programs 13
2.3.2 State-of-art Architecture Supported Debugging Methods 17
2.4 Dynamic Data Race Detection Algorithms 18
2.4.1 Happens-Before Data Race Detection Algorithm 19
2.4.2 Lockset Algorithm 22
2.4.3 Hybrid Algorithm 24
2.4.4 Summary 24
2.5 Related Works 25
Chapter 3- System Framework 26
3.1 Subsystem Design 26
3.1.1 ARM-based Instruction Set Simulator (ISS) 26
3.1.1 TLM 2.0 Local Bus 27
3.1.2 ARM Versatile-PB like Platform 30
3.2 Multi-core Virtual Platform 31
3.2.1 Symmetric Multiprocessing 31
3.2.2 Memory Map 32
3.3 RaceWeir 33
3.3.1 Hardware Implementation 33
3.3.2 False Alarm Pruning Technology 37
3.4 pGDB (Puppeteer GDB and Puppet GDBstub) 41
3.4.1 The Virtual Platform with GDB 41
3.4.2 Puppet GDBstub Design 42
3.4.3 Synchronized Step and Continue Operation 43
Chapter 4- Platform Verification 45
4.1 Verification Methodology 45
4.1.1 Threading Library without OS Support 45
4.1.2 Porting SPLASH-2 on MVP Platform 46
4.2 Co-work with Puppeteer GDB 48
4.2.1 Gathering Information from Other Nodes 48
4.2.2 Synchronized Step and Continue Operations 51
4.3 Data race detection 53
4.3.1 Finding Injected Races 53
4.3.2 Finding Existing Races 56
Chapter 5- Evaluation and Results 57
5.1 Experimental Environment and Parameters 57
5.2 Performances of Cache Coherency Protocols 58
5.2.1 Miss Rate 58
5.2.2 Shared Bus Access 60
5.3 False Alarm Pruning Technologies 62
5.3.1 Effectiveness of Pruning Technologies 62
5.3.2 Impact from Caching Local Data 64
5.4 Impact from Lockset Granularity 66
5.5 Impact from L2 Cache 67
5.6 8-Core MVP 69
Chapter 6- Conclusions 72
Chapter 7- Future Works 73
References 74
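Full-text access rights
  • On-campus browsing and printing of the electronic full text is authorized; publicly available from 2013-08-31.
  • Off-campus browsing and printing of the electronic full text is authorized; publicly available from 2013-08-31.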