System ID: U0026-0508201321524300
Title (Chinese): 可重新配置與虛擬化之類神經網路處理器
Title (English): A Reconfigurable and Virtualizable Neural Net Processor
University: National Cheng Kung University
Department (Chinese): 電腦與通信工程研究所
Department (English): Institute of Computer & Communication
Academic Year: 101 (2012-2013)
Semester: 2
Year of Publication: 102 (2013)
Author (Chinese): 李冠賢
Author (English): Kuan-Hsien Lee
Student ID: Q36004036
Degree: Master's
Language: English
Pages: 58
Thesis Committee:
  Advisor: 陳中和
  Committee Chair: 蕭勝夫
  Committee Member: 陳培殷
  Committee Member: 許明華
  Committee Member: 李維聰
Keywords (Chinese): 類神經網路 (neural network), 硬體加速神經網路訓練 (hardware-accelerated neural network training), 虛擬化 (virtualization), 多層感知器神經網路 (multi-layer perceptron network), 霍普菲爾網路 (Hopfield network), 倒傳遞演算法 (backpropagation algorithm), 虛擬機器 (virtual machine)
Keywords (English): neural network, on-chip training, virtualization, multi-layer perceptron, Hopfield neural network, error-backpropagation, virtual machine
Subject Classification:
Abstract (Chinese, translated): We present a reconfigurable and virtualizable neural-network accelerator and describe its hardware architecture, the software/hardware interface, and its hardware virtualization mechanism. The proposed architecture can accelerate neural-network applications with a variety of topologies, including multi-layer perceptron (MLP) networks and Hopfield networks. Error backpropagation is the most common and practical learning algorithm for MLP networks, but its training process can be very time-consuming, and a straightforward parallel hardware implementation runs into a memory-access bottleneck that degrades performance. We therefore propose an arrangement of the synaptic-weight memory that eliminates this bottleneck and improves the performance of hardware-accelerated backpropagation training.
In addition, we propose a virtualization framework built on the CASL hypervisor that combines emulated virtual devices with direct access; under this framework the neural-network accelerator achieves workload balance among virtual machines without degrading overall performance. Finally, we add hardware units that support virtualization, so that the hypervisor can manage the hardware resources fairly and efficiently.
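In software terms, the error-backpropagation step that the abstract refers to looks roughly like the NumPy sketch below. It is a minimal illustration, not the thesis design: the network shape, sigmoid activation, and learning rate are assumptions, and its only purpose is to show why the forward and backward passes access the same weight memory in transposed orders, which is the kind of access pattern a weight-memory arrangement has to serve without stalling.

```python
# Minimal sketch of one backpropagation step for a single-hidden-layer MLP.
# Dimensions, activation, and learning rate are illustrative assumptions only.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_step(x, target, w1, w2, lr=0.1):
    # Forward pass: the weight matrices are read row-wise (one row per neuron).
    h = sigmoid(w1 @ x)          # hidden activations
    y = sigmoid(w2 @ h)          # output activations

    # Backward pass: the hidden error needs W2 transposed, i.e. column-wise
    # access to the same weight memory that the forward pass read row-wise.
    delta_out = (y - target) * y * (1.0 - y)
    delta_hid = (w2.T @ delta_out) * h * (1.0 - h)

    # Weight updates are outer products of the deltas and the layer inputs.
    w2 -= lr * np.outer(delta_out, h)
    w1 -= lr * np.outer(delta_hid, x)
    return y

# Toy usage with made-up dimensions: 4 inputs, 3 hidden neurons, 2 outputs.
rng = np.random.default_rng(0)
w1 = rng.standard_normal((3, 4)) * 0.5
w2 = rng.standard_normal((2, 3)) * 0.5
backprop_step(rng.standard_normal(4), np.array([1.0, 0.0]), w1, w2)
```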
Abstract (English): In this thesis, we present a reconfigurable and virtualizable neural net processor and introduce its hardware architecture, software-hardware co-design, and device virtualization mechanism. The proposed hardware architecture is reconfigurable to accommodate several different neural network topologies and applications of MLP and Hopfield networks on a single chip. To eliminate the memory bottleneck of the error-backpropagation training algorithm, we propose a weight-memory management method that interleaves the memory access order and thereby improves the performance of on-chip training.
In addition, we propose a hybrid device-emulation and direct-access virtualization mechanism based on the CASL hypervisor to achieve low performance overhead and workload balance among multiple virtual machines. With the dual-bank weight memory design and other hardware support, the hypervisor can manage the hardware resources effectively and fairly.
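The dual-bank weight memory and context prefetching mentioned above (and in Chapter 6 of the table of contents) suggest a double-buffering pattern: while the accelerator runs one virtual machine's network out of one weight bank, the next VM's weight context is loaded into the other bank, so the hardware context switch need not stall the processor. The following Python sketch is schematic only; all identifiers (VMJob, load_weights, run_network) are hypothetical and not taken from the thesis.

```python
# Schematic sketch of fair round-robin scheduling over VMs with two weight banks.
# The real mechanism is implemented in the CASL hypervisor and the processor's
# control logic; this only illustrates the double-buffering idea.
from __future__ import annotations
from collections import deque
from dataclasses import dataclass

@dataclass
class VMJob:
    vm_id: int
    weights: bytes      # the VM's synaptic-weight image
    inputs: list        # pending inference/training requests

def load_weights(bank: int, weights: bytes) -> None:
    print(f"prefetch weight context into bank {bank}")

def run_network(bank: int, job: VMJob) -> None:
    print(f"run VM {job.vm_id} from bank {bank} ({len(job.inputs)} requests)")

def schedule_round_robin(jobs: deque[VMJob]) -> None:
    """Round-robin over VMs; prefetch the next context into the idle bank."""
    if not jobs:
        return
    bank = 0
    load_weights(bank, jobs[0].weights)   # fill the first bank up front
    while jobs:
        job = jobs.popleft()
        if jobs:                          # prefetch the next VM's context
            load_weights(bank ^ 1, jobs[0].weights)
        run_network(bank, job)            # on hardware this overlaps the prefetch
        bank ^= 1                         # the two banks swap roles

schedule_round_robin(deque([
    VMJob(0, b"\x00" * 16, inputs=[1, 2]),
    VMJob(1, b"\x01" * 16, inputs=[3]),
]))
```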
Table of Contents
Chapter 1 - Introduction 1
1.1 Motivation 1
1.2 Contribution 2
1.3 Organization 2
Chapter 2 - Background 3
2.1 Neural Networks 3
2.1.1 Multi-layer Perceptron (MLP) Networks 3
2.1.2 Error Backpropagation Learning Algorithm 4
2.1.3 Hopfield Neural Networks 5
2.1.4 Neural Network Implementation 6
2.2 Device Virtualization 6
2.2.1 Devices Emulation 7
2.2.2 Direct Access 7
2.2.3 Para-Virtualized Devices 7
Chapter 3 - Related work 8
3.1 Layer-multiplexing 8
3.2 Time-sharing 9
3.3 On-chip training 10
Chapter 4 - Reconfigurable and Virtualizable Neural Net Processor Architecture 12
4.1 Architecture Overview 12
4.2 Processor Control Unit 13
4.3 Neural Processing Element (NPE) 16
4.4 Synaptic Weight Memory 18
4.5 Neuron Mapping Mechanism 21
Chapter 5 - Full System Virtualization Framework 24
5.1 Virtualization Platform Architecture 24
5.2 CASL Hypervisor Architecture 25
5.2.1 Page Tables 26
5.2.2 Device Virtualization 27
5.2.3 Interrupt Virtualization 28
5.2.4 Virtual Machine Scheduling Method 29
Chapter 6 - Device Virtualization of the Neural Net Processor 31
6.1 Software and Hardware Co-design 31
6.2 Hardware Virtualization Mechanism 33
6.2.1 Virtualization System Framework 33
6.2.2 Hardware Context Switching 37
6.2.3 Dual-bank Weight Memory Design 38
6.3 Hardware Resource Scheduling With Context-Prefetching 39
6.4 Hardware Architecture Supports for Virtualization 40
Chapter 7 - Experimental Result 42
7.1 Experimental Environment Configuration 42
7.1.1 Host Configuration 42
7.1.2 Hardware Configuration 42
7.1.3 Virtualization Platform Configuration 43
7.1.4 Benchmarks 44
7.2 Forward and Training Speed 44
7.3 Hardware Context Switching Time 48
7.4 Execution Time in Virtualized and Non-virtualized Environment 50
Chapter 8 - Conclusion 53
References 54

References
[1] Z. Lin, Y. Dong, Y. Li, and T. Watanabe, “A hybrid architecture for efficient FPGA-based implementation of multilayer neural network,” IEEE Asia Pacific Conference on Circuits and Systems, APCCAS’10, Dec. 2010, pp. 616-619.
[2] N. Nedjah, R. M. da Silva, L. de Macedo Mourelle, “Compact yet efficient hardware implementation of artificial neural networks with customized topology,” Expert Systems with Applications, vol. 39, Issue 10, Aug. 2012.
[3] N. Nedjah et al., “Dynamic MAC-Based Architecture of Artificial Neural Networks Suitable for Hardware Implementation on FPGAs,” Neurocomputing, vol. 72, nos. 10-12, pp. 2171-2179, 2009.
[4] I. D. dos Santos Miranda and A. I. A. Cunha, “ASIC design of a novel high performance neuroprocessor architecture for multi layered perceptron networks,” Proceedings of the 22nd Annual Symposium on Integrated Circuits and System Design. New York, NY, USA, 2009, pp. 1-6.
[5] S. Himavathi, D. Anitha, and A. Muthuramalingam, “Feedforward neural network implementation in FPGA using layer multiplexing for effective resource utilization,” IEEE Transactions on Neural Networks, vol.18, no.3, pp. 880-888, 2007.
[6] A. Youssef, K. Mohammed, and A. Nasar, “A Reconfigurable, Generic and Programmable Feed Forward Neural Network Implementation in FPGA,” in International Conference on Modelling and Simulation. UKSim-AMSS’12, Cambridge, UK, Mar. 2012.
[7] P. O. Domingos, F. M. Silva, and H. C. Neto, “An efficient and scalable architecture for neural networks with backpropagation learning,” in Proc. Field Programmable Logic and Application, Aug. 2005, pp. 89-94.
[8] A. Farmahini-Farahani, S. M. Fakhraie, and S. Safari, “Scalable architecture for on-chip neural network training using swarm intelligence,” in Proc. Design, Automation and Test in Europe Conf. DATE’08, Mar. 2008.
[9] R. J. Aliaga, et al., “Multiprocessor SoC implementation of neural network training on FPGA,” in International Conference on Advances in Electronics and Micro-electronics, ENICS’08, 2008, pp. 149-154.
[10] R. J. Aliaga, et al., “System-on-Chip Implementation of Neural Network Training on FPGA,” International Journal On Advances in Systems and Measurements, vol. 2, no. 1, pp. 44-55, 2009.
[11] E. Ordoñez-Cardenas, and R. de J. Romero-Troncoso, “MLP neural network and on-line backpropagation learning implementation in a low-cost fpga,” in Proc. ACM Great Lakes symposium on VLSI. GLSVLSI’08, New York, NY, USA, 2008, pp. 333-338.
[12] Y. Sun, and A. C. Cheng, “Machine learning on-a-chip: A high-performance low-power reusable neuron architecture for artificial neural networks in ECG classifications,” Computers in Biology and Medicine, pp. 751-757, 2012.
[13] C. T. Liu, “CASL Hypervisor and its Virtualization Platform,” Master’s thesis, National Cheng Kung University, Tainan, Jul. 2012.
[14] O. L. Mangasarian and W. H. Wolberg, “Cancer diagnosis via linear programming,” SIAM News, vol. 23, no. 5, pp. 1-18, Sep. 1990.
[15] W. H. Wolberg and O. L. Mangasarian, “Multisurface method of pattern separation for medical diagnosis applied to breast cytology,” in Proc. National Academy of Sciences, vol. 87, U.S.A., Dec. 1990, pp. 9193-9196.
[16] O. L. Mangasarian, R. Setiono, and W. H. Wolberg, “Pattern recognition via linear programming: Theory and application to medical diagnosis,” in Large-scale numerical optimization, Philadelphia, U.S.A., 1990, pp. 22-30.
[17] K. P. Bennett and O. L. Mangasarian, “Robust linear programming discrimination of two linearly inseparable sets,” Optimization Methods and Software, vol. 1, pp. 23-34, 1992.
[18] A. S. Georghiades and P. N. Belhumeur, “From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 643-660, 2001.
[19] R. A. Fisher, “The use of multiple measurements in taxonomic problems,” Annals of eugenics, vol. 7, no. 2, pp. 179-188, Sep. 1936.
[20] http://en.wikipedia.org/wiki/Artificial_neuron
[21] http://en.wikipedia.org/wiki/Multilayer_perceptron
[22] http://en.wikipedia.org/wiki/Backpropagation
[23] https://en.wikipedia.org/wiki/Hopfield_network
[24] J. Misra and I. Saha, “Artificial neural networks in hardware: A survey of two decades of progress,” Neurocomputing, vol. 74, no. 1, pp. 239-255, Dec. 2010.
[25] N. Izeboudjen, C. Larbes, and A. Farah, “A new classification approach for neural networks hardware: from standards chips to embedded systems on chip,” Artificial Intelligence Review, pp. 1-44, 2012.
[26] F. M. Dias, A. Antunes, and A. M. Mota, “Artificial neural networks: a review of commercial hardware,” Engineering Applications of Artificial Intelligence, vol. 17, no. 8, pp. 945-952, 2004.
[27] Y. Liao, “Neural networks in hardware: A survey,” Department of Computer Science, University of California, 2001.
[28] H. Raj and K. Schwan, “High performance and scalable I/O virtualization via self-virtualized devices,” in Proc. international symposium on High performance distributed computing, HPDC’07, New York, U.S.A., 2007, pp. 179-188.
[29] Y. Dong, et al., “High performance network virtualization with SR-IOV,” Journal of Parallel and Distributed Computing, vol. 72, no. 11, pp. 1471-1480, Nov. 2012.
[30] P. Barham, et al., “Xen and the art of virtualization,” in ACM SIGOPS Operating Systems Review, vol. 37, no. 5, Dec. 2003, pp. 164-177.
[31] M. H. Hassoun, Fundamentals of artificial neural networks, 1st ed. MIT Press Cambridge, MA, USA, 1995.
[32] Y. Ago, Y. Ito, and K. Nakano, “An FPGA implementation for neural networks with the FDFM processor core approach,” International Journal of Parallel, Emergent and Distributed Systems, vol. 28, no. 4, pp. 308-320, 2013.
[33] K. Bache and M. Lichman, UCI Repository for Machine Learning Databases, Dept. of Information and Computer Sciences, Univ. of California, Irvine, http://www.ics.uci.edu/~mlearn/MLRepository.html, 2013.
[34] D. Bhattacharjee, et al., “A Parallel Framework for Multilayer Perceptron for Human Face Recognition,” International Journal of Computer Science and Security, IJCSS’10, vol. 3, no. 6, pp. 491-507, 2010.
[35] Wide I/O Single Data Rate, JEDEC Standard JESD229, Dec. 2011.
[36] M. Jung, et al., “TLM modelling of 3D stacked wide I/O DRAM subsystems: a virtual platform for memory controller design space exploration,” in Proc. 2013 Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools, New York, NY, USA, 2013.
[37] J. Seo, et al., “A 45 nm CMOS neuromorphic chip with a scalable architecture for learning in networks of spiking neurons,” in Proc. Custom Integrated Circuits Conference, CICC’11, San Jose, CA, USA, 2011, pp. 1–4.
[38] P. Merolla, et al., “A digital neurosynaptic core using embedded crossbar memory with 45pJ per spike in 45nm,” in Proc. Custom Integrated Circuits Conference, CICC’11, San Jose, CA, USA, 2011.
Full-Text Access Rights
  • The author agrees to authorize on-campus browsing/printing of the electronic full text, open to the public from 2015-08-15.
  • The author agrees to authorize off-campus browsing/printing of the electronic full text, open to the public from 2015-08-15.

