The electronic full text has not yet been released to the public; for the printed copy, please check the library catalog.
(Note: if the record cannot be found, or the holding status shows "closed stacks, not open to the public," the thesis is not in the stacks and cannot be accessed.)
System ID  U0026-2007202010202600
Title (Chinese)  具可重組態及層重用的捲積神經網絡加速器設計
Title (English)  Design of Convolutional Neural Network Accelerator with Reconfigurable Layer Reuse
University  National Cheng Kung University
Department (Chinese)  資訊工程學系
Department (English)  Institute of Computer Science and Information Engineering
Academic year  108
Semester  2
Year of publication  109 (ROC calendar, i.e., 2020)
Author (Chinese)  阮德日光
Author (English)  Quang Nguyen
Student ID  P76077078
Degree  Master's
Language  English
Number of pages  39
Committee  Advisor: 林英超
  Committee member: 陳培殷
  Committee member: Thang Viet Huynh
Keywords (Chinese)  卷積神經網路, 層組, 積體電路設計, 層重用
Keywords (English)  convolutional neural network, layer group, IC design, layer reuse
Subject classification
Abstract (Chinese)  卷積神經網絡(CNN)在圖像識別、目標檢測和自動駕駛汽車等許多領域得到了廣泛使用,並且隨著層數的增加,它需要大量的計算和記憶體使用。因此,降低其計算複雜度和記憶體使用量至關重要。在本論文中,我們對特徵圖和權重使用定點量化,並提出了高度靈活的CNN加速器。我們使用8位元定點量化來大幅減少特徵圖和權重的儲存空間需求,而LeNet-5在MNIST數據集上的準確率僅略有降低。在硬體加速器方面,我們提出了一種具有可重組態層組的高度靈活的CNN加速器。層組將Padding、Convolution、ReLU、Max-Pooling和Flatten組合成單一層,且該層是可重組態的,能夠執行Convolution或Max-Pooling運算,或兩者皆執行。我們所提方法的優點在於,藉由組合不同的層或電路,可以同時降低延遲、面積與功耗。模擬結果顯示,該方法將關鍵路徑延遲減少了29.07%,面積和功耗則分別減少了13.54%和20.28%。此外,我們的層組可以在CNN加速器中的多個位置多次重複使用,以進一步改善結果。
Abstract (English)  Convolutional neural networks (CNNs) are widely used in many areas, such as image recognition, object detection, and self-driving cars, and they require a huge amount of computation and memory as the number of layers increases. Hence, it is critical to reduce their computational complexity and memory usage. In this thesis, we apply fixed-point quantization to feature maps as well as weights and propose a highly flexible CNN accelerator. We use 8-bit fixed-point quantization to greatly reduce the memory space required for feature maps and weights, while the accuracy of LeNet-5 on the MNIST dataset is only slightly reduced. For the hardware accelerator, we propose a highly flexible CNN accelerator with a reconfigurable layer group. The layer group combines padding, convolution, ReLU, max-pooling, and flatten operations into a single layer; the layer is reconfigurable and can perform convolution, max-pooling, or both. The advantage of the proposed method is that, by combining different layers or circuits, delay, area, and power consumption can all be reduced. The simulation results show that the proposed approach reduces the critical path delay by 29.07%, while the cell area and power consumption are decreased by 13.54% and 20.28%, respectively. In addition, our layer group can be reused multiple times at multiple places in the CNN accelerator to further improve the results.
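To make the two ideas in the abstract concrete, the following Python/NumPy sketch illustrates, purely in software, 8-bit fixed-point quantization of feature maps and weights, and a reconfigurable "layer group" that can run convolution (with padding and ReLU), max-pooling, or both, and that is reused twice in sequence. It is only an illustrative model, not the thesis's hardware design; the 4-fractional-bit format, the single-channel shapes, and all function and parameter names (quantize_q, layer_group, mode, and so on) are assumptions made for this example.

# Illustrative sketch only (hypothetical names; not the thesis's RTL): it mimics
# in NumPy the 8-bit fixed-point quantization and the reconfigurable layer group
# described in the abstract, assuming a Q3.4 format and single-channel maps.
import numpy as np

def quantize_q(x, frac_bits=4, total_bits=8):
    """Quantize to signed fixed point: round to a multiple of 2**-frac_bits,
    clip to the total_bits two's-complement range, return the dequantized value."""
    scale = 2 ** frac_bits
    qmin, qmax = -(2 ** (total_bits - 1)), 2 ** (total_bits - 1) - 1
    return np.clip(np.round(x * scale), qmin, qmax) / scale

def conv2d(x, w, pad=0):
    """Naive 2-D convolution (cross-correlation) of one channel x with kernel w."""
    if pad:
        x = np.pad(x, pad)
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def max_pool(x, k=2):
    """Non-overlapping k x k max-pooling."""
    h, w = x.shape[0] // k, x.shape[1] // k
    return x[:h * k, :w * k].reshape(h, k, w, k).max(axis=(1, 3))

def layer_group(x, w=None, mode="conv+pool", pad=0):
    """Reconfigurable layer group: 'conv' = padding + convolution + ReLU,
    'pool' = max-pooling only, 'conv+pool' = both back to back."""
    if mode in ("conv", "conv+pool"):
        x = np.maximum(conv2d(x, w, pad), 0.0)   # convolution followed by ReLU
    if mode in ("pool", "conv+pool"):
        x = max_pool(x)
    return x

# Reuse the same layer group twice on a quantized input, LeNet-5 style.
rng = np.random.default_rng(0)
fmap = quantize_q(rng.standard_normal((28, 28)))   # quantized input feature map
w1 = quantize_q(rng.standard_normal((5, 5)))       # quantized 5x5 kernels
w2 = quantize_q(rng.standard_normal((5, 5)))
out = layer_group(layer_group(fmap, w1, "conv+pool", pad=2), w2, "conv+pool")
print(out.shape)   # (5, 5): 28x28 -> pad 32x32 -> conv 28x28 -> pool 14x14 -> conv 10x10 -> pool 5x5

The mode flag stands in for the convolutional, max-pooling, and combined modes described in Chapter 4, and calling the same layer_group twice mirrors the layer group reuse of Section 4.3, where one reconfigurable block takes the place of several fixed layers.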
Table of Contents
摘要 (Abstract in Chinese) i
Abstract ii
Table of Contents iii
List of Tables v
List of Figures vi
Chapter 1. Introduction 1
  1.1 Background 1
  1.2 Main contributions 3
Chapter 2. Background and Related Works 4
  2.1 Convolutional Neural Network 4
    2.1.1 Convolutional Layer 5
    2.1.2 Pooling Layer 6
    2.1.3 Fully-Connected Layer 8
  2.2 Fixed-Point Quantization 9
  2.3 Layer Reuse 11
  2.4 Reconfigurable Layer 12
Chapter 3. CNN Model Selection and Data Quantization 13
  3.1 Workflow Overview 13
  3.2 CNN Model Selection and Training 14
  3.3 Data Quantization 16
Chapter 4. Hardware Architecture 18
  4.1 Platform Overview 18
  4.2 Reconfigurable Layer Group 20
    4.2.1 Convolutional Mode 23
    4.2.2 Max-Pooling Mode 25
    4.2.3 Combined Mode 28
  4.3 Layer Group Reuse 29
Chapter 5. Experimental Setup and Results 32
  5.1 Experimental Setup 32
  5.2 Results 33
    5.2.1 Resources Utilization 33
    5.2.2 Memory Usage Comparison 34
    5.2.3 Critical Path Delay, Cell Area and Power Comparison 36
Chapter 6. Conclusion 37
References 38
Full-text availability
  • On-campus browsing/printing of the electronic full text is authorized and will be open to the public from 2025-12-30.
  • Off-campus browsing/printing of the electronic full text is authorized and will be open to the public from 2025-12-30.

