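System ID: U0026-2201201912020800
Thesis title (Chinese): 深度卷積網路之逐層定點數量化方法與實作YOLOv3推論引擎
Thesis title (English): Layer-wise Fixed Point Quantization for Deep Convolutional Neural Networks and Implementation of YOLOv3 Inference Engine
University: National Cheng Kung University (成功大學)
Department (Chinese): 電腦與通信工程研究所
Department (English): Institute of Computer & Communication
Academic year: 107 (ROC calendar)
Semester: 1
Year of publication: 108 (ROC calendar; 2019)
Graduate student (Chinese): 曾微中
Graduate student (English): Wei-Chung Tseng
Student ID: Q36054227
Degree: Master's
Language: Chinese
Pages: 70
Oral defense committee: Advisor: 陳中和
Keywords (Chinese): neural network acceleration  convolution  network quantization  forward propagation  edge AI
Keywords (English): Edge Device  Machine Learning  CNN quantization  CNN optimization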
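Abstract (Chinese): State-of-the-art deep convolutional neural networks have achieved great success in many fields, but because they usually require enormous computing resources, they cannot be deployed on mobile edge devices. For example, running SSD_Mobilenet, an object detection network optimized for edge devices, with Tensorflow on a Raspberry Pi 3 takes about 25 seconds to process a single image. For deeper DNN models such as Resnet101 Faster RCNN, the weights alone require nearly 600 MB, and the memory demand is so large that the model cannot run on a Raspberry Pi 3 at all.
This thesis proposes a network quantization method and MDFI (Micro Darknet For Inference), an early-stage architecture for hardware design. MDFI is a forward-propagation (inference-only) DNN framework written in pure C that mainly supports object detection models. It uses no dynamic libraries such as Protocol-buffer and keeps the executable under 280 kB, which makes it suitable for mobile edge devices. Because it uses no dynamic libraries, its computational behavior can serve as a reference for hardware design and as an early-stage ESL description model.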
Abstract (English): With the increasing popularity of mobile devices and the effectiveness of deep learning-based algorithms, there is growing interest in running deep learning models on mobile devices. However, such deployment is limited by computational complexity and software overhead.
We propose an efficient inference framework for resource-limited devices whose code size is about 1000 times smaller than that of Tensorflow, together with a layer-wise quantization scheme that allows inference to be computed with fixed-point arithmetic. In a coarse-grained evaluation, the fixed-point scheme reduces power consumption to 8% of that of floating-point arithmetic. It also reduces model size to 25%–40% of the original while keeping the top-5 accuracy loss under 1% for Alexnet on ImageNet.
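As a rough illustration of what layer-wise fixed-point quantization can look like, the Python sketch below picks a separate number of fractional bits for each tensor from its largest magnitude, quantizes to signed 8-bit integers, and computes a dot product with integer arithmetic only. This is a minimal sketch under stated assumptions, not the method of the thesis: the function names, the 8-bit word length, and the max-magnitude rule for choosing fractional bits are all hypothetical choices made for the example.

```python
import math

def choose_frac_bits(values, word_len=8):
    """Pick fractional bits so the largest magnitude fits in a signed word.

    Assumption for this sketch: the format is chosen per tensor (per layer)
    from the maximum absolute value only.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return word_len - 1
    # Bits needed left of the binary point (may be negative for small values).
    int_bits = math.floor(math.log2(max_abs)) + 1
    return word_len - 1 - int_bits

def quantize(values, frac_bits, word_len=8):
    """Round to the nearest representable fixed-point value and saturate."""
    lo = -(1 << (word_len - 1))
    hi = (1 << (word_len - 1)) - 1
    scale = 2.0 ** frac_bits
    return [min(hi, max(lo, round(v * scale))) for v in values]

def fixed_dot(q_w, q_x, w_frac, x_frac):
    """Integer multiply-accumulate, then a single rescale back to a real value."""
    acc = sum(a * b for a, b in zip(q_w, q_x))  # wide integer accumulator
    return acc / 2.0 ** (w_frac + x_frac)

# Hypothetical weights and inputs for one layer.
weights = [0.5, -0.25, 0.75]
inputs = [1.5, 2.0, -1.0]

w_frac = choose_frac_bits(weights)
x_frac = choose_frac_bits(inputs)
q_w = quantize(weights, w_frac)
q_x = quantize(inputs, x_frac)

print(w_frac, x_frac, q_w, q_x, fixed_dot(q_w, q_x, w_frac, x_frac))
```

Choosing the format per layer lets a layer with small weights spend more bits on the fraction than a layer with large activations, which is the general motivation for a layer-wise rather than a global fixed-point format.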
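Table of contents: Abstract
Contents
List of Figures
Chapter 1 Introduction
1.1 Motivation
1.2 Contributions
1.3 Thesis Organization
Chapter 2 Background
2.1 Neural Networks
2.1.1 Convolutional Neural Networks
2.2 DNN in Computer Vision
2.2.1 Image Classification
2.2.2 Object Detection
2.2.3 Image Segmentation
2.3 Introduction to DNN Development
2.4 DNN Development Environments
Chapter 3 Related Work
Chapter 4 MDFI (Micro Darknet for Inference)
4.1 Implementation Goals
4.2 Choice of Base Framework
4.3 Darknet
4.3.1 Using Darknet
4.3.2 Darknet Architecture
4.4 MDFI (Micro Darknet For Inference)
4.4.1 System Architecture and Implementation
4.4.2 Implementation Results
Chapter 5 Fixed-Point Quantization
5.1 Why Fixed Point
5.2 Fixed-Point Format
5.3 Quantization Method
5.3.1 Quantization Targets
5.3.2 Dot Product
5.3.3 Bias Addition
5.3.4 Method Tuning
5.3.5 Summary
Chapter 6 Experimental Setup and Results
6.1 Fixed-Point Parameter Study
6.1.1 Sensitivity of Each Convolutional Layer to Fixed-Point Quantization
6.1.2 Effect of Different Inputs on Feature Maps
6.1.3 Value Distribution
6.2 Fixed-Point Accuracy Comparison
6.3 Method Parameters and Model Accuracy
6.3.1 Image Classification
6.3.2 Object Detection
6.4 Effect of Quantization on Model Size
6.5 Method Parameters and Bit Length
6.6 Effect of Fixed-Point Arithmetic on Computation Power
6.7 Summary
Chapter 7 Conclusion and Future Work
7.1 Conclusion
7.2 Future Work
References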

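  • On-campus browsing and printing of the electronic full text is authorized; publicly available from 2024-01-01.
  • Off-campus browsing and printing of the electronic full text is authorized; publicly available from 2024-01-01.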
