   The electronic thesis has not yet been released for public access; for the print copy, please check the library catalog.
(Note: if no record is found, or the holdings status shows "closed stacks, not public," the thesis is not in the stacks and cannot be accessed.)
System ID U0026-2207201922352300
Title (Chinese) 電子系統層級虛擬平台之卷積加速器設計與驗證
Title (English) An ESL (Electronic System Level) Virtual Platform for Convolution Accelerator Design and Verification
University National Cheng Kung University
Department (Chinese) 電腦與通信工程研究所
Department (English) Institute of Computer & Communication
Academic Year 107
Semester 2
Publication Year 108
Author (Chinese) 林柏榕
Author (English) Bo-Rong Lin
Student ID VM6052084
Degree Master
Language Chinese
Pages 53
Committee Advisor - 陳中和
Committee member - 陳培殷
Committee member - 郭致宏
Keywords (Chinese) 電子系統層級設計  類神經網路加速器  資料切割  資料重用
Keywords (English) Electronic System Level design  Neural network accelerator  Data partition  Data reuse
Subject Classification
Abstract (Chinese) In recent years, the rapid development of AI (Artificial Intelligence) has brought Deep Neural Networks (DNNs) into wide use across many fields, and the combination of AI and IoT (Internet of Things) is pushing AI applications toward embedded systems. However, DNN inference involves massive data movement and high computational complexity, which poses a serious challenge to the power consumption and performance of embedded systems. To address this problem, current research focuses mainly on FPGA and ASIC hardware accelerators.
This thesis develops an ESL virtual platform based on the Micro Darknet for Inference (MDFI) framework. The platform targets embedded systems, supporting the development of accelerator microarchitectures and the verification of their correctness; the ESL approach lets developers perform hardware/software co-simulation early in development, enabling rapid development and verification. We used MDFI to profile the per-layer execution time of the YOLOv3-tiny model and observed that 93% of the total execution time is spent on convolution, so we designed a convolution accelerator to speed up the convolution operation. The accelerator also exposes several reconfigurable parameters — the number of PEs, the accelerator's memory size, and the data-reuse method used on the accelerator — allowing developers to quickly develop, verify, and evaluate the performance of accelerators.
Finally, we ran the YOLOv3-tiny model on both the ESL virtual platform and a Raspberry Pi 3 to verify the behavioral correctness of the convolution accelerator. Assuming that each layer's data can be loaded into the accelerator's on-chip memory at once, the performance evaluation shows that the execution time on the ESL virtual platform is about 2.3x faster than on the Raspberry Pi 3.
Abstract (English) In recent years, Deep Neural Networks (DNNs) have been successfully applied to many computer vision tasks. However, DNNs involve a large amount of data movement and high computational complexity during inference, which poses a major challenge for power consumption and performance.
In this thesis, we propose an ESL virtual platform based on MDFI (Micro Darknet for Inference) for convolution accelerator design and verification, enabling developers to build and verify accelerators in the early stages of development. Assuming that the data of each layer of the model can be loaded into the accelerator's memory at once, we compared the platform against a Raspberry Pi 3. The result shows that the execution time on the ESL virtual platform is about 2.3x faster than on the Raspberry Pi 3.
Table of Contents Abstract I
Acknowledgements VIII
Table of Contents IX
List of Tables XII
List of Figures XIII
Chapter 1 Introduction 1
1.1 Motivation 2
1.2 Contributions 2
1.3 Thesis Organization 3
Chapter 2 Background and Related Work 4
2.1 Electronic System Level Design 4
2.1.1 Simulation Accuracy 5
2.1.2 SystemC 6
2.2 QEMU 7
2.2.1 Portable Dynamic Translation 7
2.2.2 Translation Block 8
2.2.3 Peripheral Model 9
2.2.4 Interrupt Handler 9
2.3 Linux Device Driver 10
2.3.1 Classes of devices and modules 11
2.3.2 I/O ports and I/O memory 11
2.4 MDFI 12
2.5 Computer Vision 14
2.5.1 Image Classification 14
2.5.2 Object Detection 16
2.6 Accelerator Architectures and DNN Hardware Accelerators 18
2.6.1 Accelerator Architectures 18
2.6.2 DNN Hardware Accelerators 20
2.7 Data reuse method 21
2.7.1 No local reuse 21
2.7.2 Input reuse 22
2.7.3 Filter reuse 23
2.7.4 Output reuse 24
Chapter 3 Design and Implementation of the Virtual Platform and Convolution Accelerator 25
3.1 Virtual Platform Overview 25
3.2 Analysis of the Model's Timing Bottleneck 26
3.3 Modifying MDFI to Meet Platform Requirements 27
3.4 Convolution Accelerator Driver Design 28
3.4.1 User space driver 28
3.4.2 Kernel space driver 29
3.5 QEMU-SystemC Communication Interface Design 30
3.6 Convolution Accelerator Design 31
3.6.1 Configuration register 32
3.6.2 On-chip memory 32
3.6.3 Controller 33
3.6.4 PE Architecture 34
3.7 Control flow 35
3.7.1 Filter size 1x1 35
3.7.2 Filter size 3x3 36
3.7.3 Other filter size 37
3.8 Execution flow 38
3.8.1 MDFI 38
3.8.2 CAS_Driver 39
3.8.3 Accelerator 39
Chapter 4 Experimental Results and Performance Evaluation 40
4.1 Verification results for different NN models 40
4.1.1 Object detection – YOLOv3-tiny 41
4.1.2 Image classification – AlexNet and ResNet18 42
4.2 Verification of different data-reuse methods of the NN model on the VP 43
4.2.1 Input reuse 43
4.2.2 Filter reuse 43
4.2.3 Output reuse 44
4.3 Performance evaluation 45
4.3.1 Execution time of other layers on QEMU 45
4.3.2 Bus data transfer time 46
4.3.3 Accelerator operation time 47
4.3.4 Performance comparison and analysis 49
Chapter 5 Conclusion and Future Work 50
5.1 Conclusion 50
5.2 Future Work 50
References 52
References [1] Y.-H. Chen, T. Krishna, J. Emer, and V. Sze, “Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks,” IEEE Journal of Solid-State Circuits (JSSC), vol. 52, no. 1, pp. 127-138, Jan. 2017.
[2] S. Han et al., “EIE: Efficient inference engine on compressed deep neural network,” Proc. 43rd Int. Symp. on Computer Architecture (ISCA), pp. 243-254, 2016.
[3] NVDLA deep learning accelerator. http://nvdla.org, 2017.
[4] A. Gerstlauer, C. Haubelt, A. D. Pimentel, T. P. Stefanov, D. D. Gajski, and J. Teich, “Electronic system-level synthesis methodologies,” IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 28, no. 10, pp. 1517-1530, 2009.
[5] G. Schirner, A. Gerstlauer, and R. Dömer, “Fast and accurate processor models for efficient MPSoC design,” ACM TODAES, vol. 15, no. 2, Article 10, Feb. 2010.
[6] F. Bellard, “QEMU, a fast and portable dynamic translator,” Proc. USENIX Annual Technical Conference, pp. 41-46, 2005.
[7] T. Wei-Chung, “Layer-wise Fixed Point Quantization for Deep Convolutional Neural Networks and Implementation of YOLOv3 Inference Engine,” Master's thesis, National Cheng Kung University, 2019.
[8] J. Min-Zhi, “Optimization of YOLOv3 Inference Engine for Edge Device,” Master's thesis, National Cheng Kung University, 2018.
[9] J. Redmon, “Darknet: Open source neural networks in C,” 2013. [Online]. Available: http://pjreddie.com/darknet/
[10] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Advances in Neural Information Processing Systems (NIPS), 2012.
[11] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” ICLR, 2015.
[12] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 770-778, 2016.
[13] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You Only Look Once: Unified, real-time object detection,” 2015.
[14] K. Guo et al., “Angel-Eye: A complete design flow for mapping CNN onto embedded FPGA,” IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2017.
[15] Y. Chen, T. Luo, S. Liu, S. Zhang, L. He, J. Wang, L. Li, T. Chen, Z. Xu, N. Sun, and O. Temam, “DaDianNao: A machine-learning supercomputer,” Proc. IEEE/ACM Int. Symp. on Microarchitecture (MICRO), pp. 609-622, 2014.
[16] Z. Du et al., “ShiDianNao: Shifting vision processing closer to the sensor,” Proc. Int. Symp. on Computer Architecture (ISCA), pp. 92-104, 2015.
[17] S. Zhang et al., “Cambricon-X: An accelerator for sparse neural networks,” Proc. IEEE/ACM Int. Symp. on Microarchitecture (MICRO), 2016.
Full-Text Access Rights
  • On-campus browsing/printing of the electronic full text authorized, available from 2024-01-01.
  • Off-campus browsing/printing of the electronic full text authorized, available from 2024-01-01.

