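System ID U0026-1211201720161600
Title (Chinese) Reducing the Energy Consumption of Asymmetric Hybrid Last-Level Caches via Repetitive Access Behavior Aware Placement and Migration Policies
Title (English) RAP: Reducing the Energy of Asymmetric Hybrid Last-Level Cache via Repetitive Access Aware Placement and Migration
University National Cheng Kung University
Department (Chinese) Department of Computer Science and Information Engineering
Department (English) Institute of Computer Science and Information Engineering
Academic year 106
Semester 1
Year of publication 106
Graduate student (Chinese) 羅靖淵
Graduate student (English) Jing-Yuan Luo
Student ID P76044279
Degree Master's
Language English
Number of pages 40
Oral defense committee Advisor: 林英超
Keywords (Chinese) Spin-transfer torque random access memory (STT-MRAM)  Hybrid cache  Energy consumption  Asymmetric memory  Thrashing block
Keywords (English) Spin-Transfer Torque Magnetoresistive Random Access Memory (STT-MRAM)  Hybrid cache  Energy consumption  Asymmetric memory  Thrashing block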
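Abstract (Chinese) In recent years, emerging non-volatile memory (NVM) has attracted considerable attention for its low static power and high density.
Among these technologies, spin-transfer torque magnetic random access memory (STT-MRAM) has a read speed comparable to that of SRAM and is well suited to building a high-capacity last-level cache (LLC). However, STT-MRAM suffers from long write latency and high write energy.
To reduce the impact of STT-MRAM's asymmetric access energy and latency, hybrid caches have been designed to combine the advantages of SRAM and STT-MRAM, and using a hybrid cache effectively requires well-designed placement and migration policies.
In this thesis, we find that conflict misses occur frequently in L2, causing many cache blocks to thrash between L2 and the LLC.
If thrashing blocks that cause writes, such as dirty thrashing blocks, are placed in the STT-MRAM of the hybrid cache, they cause excessive LLC energy consumption, especially for programs with many memory accesses.

Therefore, this thesis proposes a repetitive access behavior aware placement and migration policy (RAP) to reduce the excessive energy that thrashing blocks impose on the hybrid LLC.
RAP places dirty thrashing blocks in SRAM and migrates clean thrashing blocks evicted from SRAM to STT-MRAM.
RAP reduces LLC energy by up to 38.04% and by 19.96% on average when a four-core system runs four copies of the same benchmark.
When a four-core system runs a mix of four different benchmarks, our method reduces LLC energy by up to 36.60% and by 25.92% on average.
Compared with other work on programs with many memory accesses, our method reduces LLC energy by 20.20% on average (up to 35.68%) relative to the access aware policy and by 9.75% on average (up to 26.67%) relative to the adaptive placement and migration policy, with only a slight loss in system performance.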
Abstract (English) In recent years, emerging non-volatile memory (NVM) has attracted a lot of attention because of favorable properties such as low leakage and high density.
Among NVM technologies, spin-transfer torque magnetoresistive random access memory (STT-MRAM), whose read speed is comparable to that of SRAM, is a good candidate for building large last-level caches (LLCs).
However, STT-MRAM suffers from long write latency and high write energy.
To mitigate the impact of asymmetric read/write energy and latency, hybrid cache designs have been proposed that combine the merits of STT-MRAM and SRAM, and good block placement and migration policies are necessary to use a hybrid cache efficiently.
In this thesis, we find that conflict misses occur frequently in L2, causing blocks to thrash between L2 and the LLC.
If thrashing blocks that cause write activity, i.e., dirty thrashing blocks, are placed in the STT-MRAM region of a hybrid LLC, they result in excessive energy consumption, especially for memory-bound benchmarks.
Therefore, this thesis proposes repetitive access aware placement and migration (RAP) to mitigate the energy consumption caused by thrashing blocks. RAP places dirty thrashing blocks into SRAM and migrates clean thrashing blocks that are evicted from SRAM to STT-MRAM.
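
The two RAP rules stated above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the names `Block`, `Region`, `place_on_insert`, and `migrate_on_sram_eviction` are hypothetical, the `thrashing` flag stands in for the output of a thrashing block predictor, and the default region for all other blocks is an assumption, since the abstract does not specify it.

```python
from dataclasses import dataclass
from enum import Enum


class Region(Enum):
    SRAM = "SRAM"
    STT_MRAM = "STT-MRAM"


@dataclass
class Block:
    dirty: bool      # block holds modified data
    thrashing: bool  # assumed predictor output: block bounces between L2 and LLC


def place_on_insert(block: Block, default: Region = Region.STT_MRAM) -> Region:
    """Pick the LLC region for an incoming block.

    Dirty thrashing blocks go to SRAM, as the abstract states.
    The default for every other block is an assumption of this sketch.
    """
    if block.thrashing and block.dirty:
        return Region.SRAM
    return default


def migrate_on_sram_eviction(block: Block) -> bool:
    """Return True if a block evicted from SRAM should move to STT-MRAM.

    Only clean thrashing blocks are migrated, as the abstract states.
    """
    return block.thrashing and not block.dirty


if __name__ == "__main__":
    print(place_on_insert(Block(dirty=True, thrashing=True)).value)
    print(migrate_on_sram_eviction(Block(dirty=False, thrashing=True)))
```

Keeping placement and migration as separate functions mirrors the two events at which a decision is made: insertion into the LLC and eviction from the SRAM region.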

RAP reduces LLC energy by up to 38.04% and by 19.96% on average when running four copies of a workload on a four-core system.
For a four-core system that runs mixed workloads, our technique reduces LLC energy by up to 36.60% and by 25.92% on average.
Compared with the previous access aware policy and the adaptive placement and migration policy, the proposed technique reduces energy consumption by 20.20% on average (up to 35.68%) and by 9.75% on average (up to 26.67%), respectively, with insignificant performance degradation. Overall, the evaluation results show a large reduction in energy consumption with minimal performance loss.
Table of contents Abstract (Chinese)
Abstract
Table of Contents
List of Tables
List of Figures
Chapter 1. Introduction
1.1 Background
1.2 Main Contributions
1.3 Thesis Organization
Chapter 2. Preliminaries and Motivation
2.1 Fundamentals of STT-MRAM
2.2 Repetitive Access Behavior
2.2.1 Classification of LLC block operations
2.2.2 Definition of the thrashing block
Chapter 3. Hybrid Cache Architecture with RAP
3.1 Thrashing block predictor
3.2 Repetitive access aware placement and migration policy
3.3 Thrashing-block aware replacement policy
Chapter 4. Evaluation and Results
4.1 Evaluation setup
4.2 Running single workloads on multi-core system
4.3 Running mix workloads on multi-core system
4.4 Hardware overhead
4.5 Sensitivity analysis
4.5.1 Evaluation for thrashing_th and migration_th
4.5.2 Evaluation on the system with higher core count
4.5.3 Evaluation for different STT-MRAM read/write energy ratios
Chapter 5. Conclusion
References
References [1] J. Ahn, S. Yoo, and K. Choi. Write intensity prediction for energy-efficient non-volatile
caches. In International Symposium on Low Power Electronics and Design (ISLPED),
pages 223–228, Sept 2013.
[2] J. Ahn, S. Yoo, and K. Choi. Dasca: Dead write prediction assisted stt-ram cache architecture.
In 2014 IEEE 20th International Symposium on High Performance Computer
Architecture (HPCA), pages 25–36, Feb 2014.
[3] J. Ahn, S. Yoo, and K. Choi. Prediction hybrid cache: An energy-efficient stt-ram cache
architecture. IEEE Transactions on Computers, 65(3):940–951, March 2016.
[4] Dmytro Apalkov, Alexey Khvalkovskiy, Steven Watts, Vladimir Nikitin, Xueti Tang,
Daniel Lottis, Kiseok Moon, Xiao Luo, Eugene Chen, Adrian Ong, Alexander Driskill-Smith,
and Mohamad Krounbi. Spin-transfer torque magnetic random access memory
(stt-mram). J. Emerg. Technol. Comput. Syst., 9(2):13:1–13:35, May 2013.
[5] Jean-Loup Baer and Tien-Fu Chen. An effective on-chip preloading scheme to reduce
data access penalty. In Proceedings of the 1991 ACM/IEEE Conference on Supercomputing,
Supercomputing ’91, pages 176–186, New York, NY, USA, 1991. ACM.
[6] Nathan Binkert, Bradford Beckmann, Gabriel Black, Steven K. Reinhardt, Ali Saidi,
Arkaprava Basu, Joel Hestness, Derek R. Hower, Tushar Krishna, Somayeh Sardashti,
Rathijit Sen, Korey Sewell, Muhammad Shoaib, Nilay Vaish, Mark D. Hill, and
David A. Wood. The gem5 simulator. SIGARCH Comput. Archit. News, 39(2):1–7,
August 2011.
[7] M. T. Chang, P. Rosenfeld, S. L. Lu, and B. Jacob. Technology comparison for large last-level
caches (l3cs): Low-leakage sram, low write-energy stt-ram, and refresh-optimized
edram. In 2013 IEEE 19th International Symposium on High Performance Computer
Architecture (HPCA), pages 143–154, Feb 2013.
[8] H. Y. Cheng, J. Zhao, J. Sampson, M. J. Irwin, A. Jaleel, Y. Lu, and Y. Xie. Lap: Loop-block
aware inclusion properties for energy-efficient asymmetric last level caches.
In 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture
(ISCA), pages 103–114, June 2016.
[9] P. Chi, S. Li, C. Xu, T. Zhang, J. Zhao, Y. Liu, Y. Wang, and Y. Xie. Prime: A
novel processing-in-memory architecture for neural network computation in reram-based
main memory. In 2016 ACM/IEEE 43rd Annual International Symposium on
Computer Architecture (ISCA), pages 27–39, June 2016.
[10] X. Dong, C. Xu, Y. Xie, and N. P. Jouppi. Nvsim: A circuit-level performance, energy,
and area model for emerging nonvolatile memory. IEEE Transactions on Computer-Aided
Design of Integrated Circuits and Systems, 31(7):994–1007, July 2012.
[11] John L. Henning. Spec cpu2006 benchmark descriptions. SIGARCH Comput. Archit.
News, 34(4):1–17, September 2006.
[12] Benjamin C. Lee, Engin Ipek, Onur Mutlu, and Doug Burger. Architecting phase change
memory as a scalable dram alternative. In Proceedings of the 36th Annual International
Symposium on Computer Architecture, ISCA ’09, pages 2–13, New York, NY, USA,
2009. ACM.
[13] Naveen Muralimanohar, Rajeev Balasubramonian, and Norman P Jouppi. Cacti 6.0: A
tool to model large caches. HP Laboratories, pages 22–31, 2009.
[14] H. Noguchi, K. Ikegami, N. Shimomura, T. Tetsufumi, J. Ito, and S. Fujita. Highly reliable
and low-power nonvolatile cache memory with advanced perpendicular stt-mram
for high-performance cpu. In 2014 Symposium on VLSI Circuits Digest of Technical
Papers, pages 1–2, June 2014.
[15] Takashi Ohsawa, Hiroki Koike, Sadahiko Miura, Hiroaki Honjo, Keiichi Tokutome,
Shoji Ikeda, Takahiro Hanyu, Hideo Ohno, and Tetsuo Endoh. 1mb 4t-2mtj nonvolatile
stt-ram for embedded memories using 32b fine-grained power gating technique with
1.0ns/200ps wake-up/power-off times. In VLSIC, 2012.
[16] Moinuddin K. Qureshi, Vijayalakshmi Srinivasan, and Jude A. Rivers. Scalable high
performance main memory system using phase-change memory technology. In Proceedings
of the 36th Annual International Symposium on Computer Architecture, ISCA
’09, pages 24–33, New York, NY, USA, 2009. ACM.
[17] Z. Sun, X. Bi, H. Li, W. F. Wong, Z. L. Ong, X. Zhu, and W. Wu. Multi retention level
stt-ram cache designs with a dynamic refresh scheme. In 2011 44th Annual IEEE/ACM
International Symposium on Microarchitecture (MICRO), pages 329–338, Dec 2011.
[18] Shun-Ming Syu, Yu-Hui Shao, and Ing-Chao Lin. High-endurance hybrid cache design
in cmp architecture with cache partitioning and access-aware policy. In Proceedings of
the 23rd ACM International Conference on Great Lakes Symposium on VLSI, GLSVLSI
’13, pages 19–24, New York, NY, USA, 2013. ACM.
[19] K. Tsuchida, T. Inaba, K. Fujita, Y. Ueda, T. Shimizu, Y. Asao, T. Kajiyama,
M. Iwayama, K. Sugiura, S. Ikegawa, T. Kishi, T. Kai, M. Amano, N. Shimomura,
H. Yoda, and Y. Watanabe. A 64mb mram with clamped-reference and adequate-reference
schemes. In 2010 IEEE International Solid-State Circuits Conference -
(ISSCC), pages 258–259, Feb 2010.
[20] Z. Wang, D. A. Jiménez, C. Xu, G. Sun, and Y. Xie. Adaptive placement and migration
policy for an stt-ram-based hybrid cache. In 2014 IEEE 20th International Symposium
on High Performance Computer Architecture (HPCA), pages 13–24, Feb 2014.
[21] C. Xu, D. Niu, N. Muralimanohar, R. Balasubramonian, T. Zhang, S. Yu, and Y. Xie.
Overcoming the challenges of crossbar resistive memory architectures. In 2015 IEEE
21st International Symposium
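  • Authorized for on-campus browsing/printing of the electronic full text; publicly available from 2022-09-01.
  • Authorized for off-campus browsing/printing of the electronic full text; publicly available from 2022-09-01.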
