System ID U0026-0402202022351600
Title (Chinese) 基於子序列資訊的高效遞歸神經網路訓練過程
Title (English) An efficient training process for recurrent neural network based on subsequence information
University National Cheng Kung University
Department (Chinese) 電機工程學系
Department (English) Department of Electrical Engineering
Academic Year 108
Semester 1
Publication Year 109 (2020)
Author (Chinese) 黃建勛
Author (English) Chien-Hsun Huang
Student ID N26070326
Degree Master's
Language Chinese
Pages 45
Committee Advisor - 張天豪
Committee Member - 高宏宇
Committee Member - 解巽評
Committee Member - 吳謂勝
Committee Member - 陳健生
Keywords (Chinese) 遞歸神經網路, 子序列, 加速訓練
Keywords (English) recurrent neural network, subsequence, accelerated training process
Subject Classification
Abstract (Chinese) The recurrent neural network is a neural network architecture suited to analyzing sequence data. Its defining feature is that the symbols of a sequence are fed in one at a time; a hidden state is computed and kept inside the model, which lets it learn dependencies across the sequence. However, because each training step must wait for the computation of the preceding information, the process cannot be parallelized, so improving the training speed of recurrent neural networks has long been an important research topic.
Beyond this inherent lack of parallelism, the sequences in a dataset rarely share the same length. In natural language, for example, some sentences contain only three words while others run to dozens. The usual remedy is to pad every sequence in the dataset to a common length with a special symbol, but this fills the shorter sequences with useless information and wastes computation in the subsequent processing.
This study proposes a training method for recurrent neural networks that randomly samples shorter subsequences from each full sequence and trains on those subsequences. We ran experiments on eight datasets spanning three domains: images, text, and protein sequences. The results show that our method reaches the same test performance as training on the full sequences while using less training time. We further identify the sampling strategy that performs best in both training time and test score. Finally, we demonstrate the robustness of the method by combining it with different recurrent neural network units.
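To make the padding overhead concrete, here is a minimal sketch of the conventional approach the abstract criticizes, assuming Keras-style preprocessing; the token IDs are made up for illustration and this is not code from the thesis:

```python
# Hypothetical illustration of padding variable-length sequences to a
# common length with a special symbol (0); not code from the thesis.
from tensorflow.keras.preprocessing.sequence import pad_sequences

batch = [
    [4, 7, 2],                 # 3 tokens
    [9, 1, 5, 8, 3, 6, 2, 7],  # 8 tokens
    [5, 2],                    # 2 tokens
]

# Every sequence is padded to the longest length in the batch, so the
# short sequences become mostly padding that the RNN still has to
# process one step at a time.
padded = pad_sequences(batch, padding="post", value=0)
print(padded)
# [[4 7 2 0 0 0 0 0]
#  [9 1 5 8 3 6 2 7]
#  [5 2 0 0 0 0 0 0]]
```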
Abstract (English) The recurrent neural network is a neural network architecture suitable for analyzing sequence data. Its characteristic is that each token in the sequence is input in order; the hidden state is computed and kept inside the model, so the model can learn correlations between tokens. Because training must wait for the computation of the previous information, the process cannot be parallelized. Therefore, how to improve the training speed of recurrent neural networks has always been an important research topic.
In addition to this inherent inability to parallelize, the sequences in a dataset usually do not share the same length. For example, some sentences may have only three words, while others may have dozens. A special character is usually used to pad every sequence in the dataset to the same length, so shorter sequences end up carrying a lot of useless information, which wastes computing resources.
This study proposes a method that trains recurrent neural networks on subsequences randomly sampled from the full sequences. We collected eight datasets from three major areas, including images, text, and biological sequences, for experiments. By feeding different subsequences in each training epoch, we can use less training time to achieve the same test scores as training on the full sequences. This study also identifies the best sampling method, which performs better in both training time and test score. Finally, we show that our method is robust by applying it to different recurrent neural network units.
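The core idea lends itself to a short sketch. The following is a minimal NumPy illustration of per-epoch random subsequence sampling, assuming sequences already padded to a fixed length; the names sample_subsequences and sub_len are ours, not from the thesis, and the actual models, sampling strategies, and hyperparameters are those described in Chapters 3 and 4:

```python
import numpy as np

def sample_subsequences(X, sub_len, rng):
    """Crop a random subsequence of length sub_len from every sequence
    in X (shape: n_sequences x full_len), with fresh start positions."""
    n, full_len = X.shape
    starts = rng.integers(0, full_len - sub_len + 1, size=n)
    return np.stack([X[i, s:s + sub_len] for i, s in enumerate(starts)])

rng = np.random.default_rng(seed=0)
X_train = rng.integers(1, 50, size=(1000, 100))  # toy padded dataset
y_train = rng.integers(0, 2, size=1000)          # toy binary labels

for epoch in range(10):
    # Each epoch trains on a different random crop of every sequence,
    # so the RNN unrolls over 40 steps instead of the full 100.
    X_sub = sample_subsequences(X_train, sub_len=40, rng=rng)
    # model.fit(X_sub, y_train, epochs=1)  # hypothetical RNN classifier
```

Because a new crop is drawn every epoch, the short unrolls still cover the whole sequence over the course of training, which is what allows the reduced per-epoch cost without losing test accuracy.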
Table of Contents Acknowledgements XVI
List of Figures XIX
List of Tables XXI
Chapter 1 Introduction 1
Chapter 2 Related Work 3
2.1 Recurrent Neural Networks 3
2.1.1 Recurrent Layers 3
2.1.2 Recurrent Units 4
2.2 Training Methods Based on Sequence Segments 9
2.2.1 Vanilla Transformer 9
2.2.2 Transformer-XL 10
2.2.3 TS-LSTM and Temporal-Inception 11
Chapter 3 Methods 13
3.1 Datasets 13
3.1.1 MNIST 13
3.1.2 Fashion-MNIST 14
3.1.3 CIFAR-10 15
3.1.4 Sentiment 15
3.1.5 Sentiment140 15
3.1.6 IMDB 16
3.1.7 AMP 16
3.1.8 ACP 16
3.2 Data Preprocessing 17
3.2.1 Feature Encoding 17
3.2.2 Label Encoding 20
3.3 Recurrent Network Model Architecture 20
3.4 Subsequence Sampling 22
3.5 Model Training Configuration 24
Chapter 4 Experimental Results 26
4.1 Evaluation Metrics 26
4.2 Performance Evaluation of Models Trained on Subsequences 27
4.2.1 Text Datasets 28
4.2.2 Biological Sequence Datasets 29
4.2.3 Image Datasets 30
4.3 Analysis of the Text and Biological Sequence Datasets 31
4.4 Analysis of the Image Datasets 32
4.5 Effect of the Subsequence Sampling Method 34
4.5.1 Single Random Sampling 35
4.5.2 Full-Sequence Splitting 36
4.5.3 Comparison of the Three Sampling Methods 38
4.6 Robustness to the Recurrent Neural Network Unit 39
References [1] J. L. Elman, "Finding structure in time," Cognitive Science, vol. 14, no. 2, pp. 179-211, 1990.
[2] J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," Proceedings of the National Academy of Sciences, vol. 79, no. 8, pp. 2554-2558, 1982.
[3] M. I. Jordan, "Serial order: A parallel distributed processing approach," in Advances in psychology, vol. 121: Elsevier, 1997, pp. 471-495.
[4] Y. Wang, M. Huang, and L. Zhao, "Attention-based LSTM for aspect-level sentiment classification," in Proceedings of the 2016 conference on empirical methods in natural language processing, 2016, pp. 606-615.
[5] G. Pollastri, D. Przybylski, B. Rost, and P. Baldi, "Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles," Proteins: Structure, Function, and Bioinformatics, vol. 47, no. 2, pp. 228-235, 2002.
[6] J. Ba, V. Mnih, and K. Kavukcuoglu, "Multiple object recognition with visual attention," arXiv preprint arXiv:1412.7755, 2014.
[7] R. Al-Rfou, D. Choe, N. Constant, M. Guo, and L. Jones, "Character-level language modeling with deeper self-attention," in Proceedings of the AAAI Conference on Artificial Intelligence, 2019, vol. 33, pp. 3159-3166.
[8] Z. Dai et al., "Transformer-XL: Attentive language models beyond a fixed-length context," arXiv preprint arXiv:1901.02860, 2019.
[9] C.-Y. Ma, M.-H. Chen, Z. Kira, and G. AlRegib, "TS-LSTM and Temporal-Inception: Exploiting spatiotemporal dynamics for activity recognition," Signal Processing: Image Communication, vol. 71, pp. 76-87, 2019.
[10] P. J. Grother, "NIST special database 19," Handprinted forms and characters database, National Institute of Standards and Technology, 1995.
[11] A. Krizhevsky, V. Nair, and G. Hinton, "The CIFAR-10 dataset," online: http://www.cs.toronto.edu/kriz/cifar.html, vol. 55, 2014.
[12] Y. LeCun, C. Cortes, and C. Burges, "MNIST handwritten digit database," AT&T Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, vol. 2, p. 18, 2010.
[13] H. Xiao, K. Rasul, and R. Vollgraf, "Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms," arXiv preprint arXiv:1708.07747, 2017.
[14] A. Go, R. Bhayani, and L. Huang, "Twitter sentiment classification using distant supervision," CS224N Project Report, Stanford, vol. 1, no. 12, 2009.
[15] W. Chen, H. Ding, P. Feng, H. Lin, and K.-C. Chou, "iACP: a sequence-based tool for identifying anticancer peptides," Oncotarget, vol. 7, no. 13, p. 16895, 2016.
[16] A. Tyagi, P. Kapoor, R. Kumar, K. Chaudhary, A. Gautam, and G. Raghava, "In silico models for designing and discovering novel anticancer peptides," Scientific Reports, vol. 3, p. 2984, 2013.
[17] D. Veltri, U. Kamath, and A. Shehu, "Deep learning improves antimicrobial peptide recognition," Bioinformatics, vol. 34, no. 16, pp. 2740-2747, 2018.
[18] S. Vijayakumar and P. Lakshmi, "ACPP: a web server for prediction and design of anti-cancer peptides," International Journal of Peptide Research and Therapeutics, vol. 21, no. 1, pp. 99-106, 2015.
[19] L. Wei, C. Zhou, H. Chen, J. Song, and R. Su, "ACPred-FL: a sequence-based predictor using effective feature representation to improve the prediction of anti-cancer peptides," Bioinformatics, vol. 34, no. 23, pp. 4007-4016, 2018.
[20] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, no. 6088, pp. 533-536, 1986.
[21] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural computation, vol. 18, no. 7, pp. 1527-1554, 2006.
[22] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in neural information processing systems, 2012, pp. 1097-1105.
[23] A. Graves and J. Schmidhuber, "Offline handwriting recognition with multidimensional recurrent neural networks," in Advances in neural information processing systems, 2009, pp. 545-552.
[24] W. Byeon, T. M. Breuel, F. Raue, and M. Liwicki, "Scene labeling with LSTM recurrent neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3547-3555.
[25] R. Pascanu, T. Mikolov, and Y. Bengio, "On the difficulty of training recurrent neural networks," in International conference on machine learning, 2013, pp. 1310-1318.
[26] V. Khomenko, O. Shyshkov, O. Radyvonenko, and K. Bokhan, "Accelerating recurrent neural network training using sequence bucketing and multi-gpu data parallelization," in 2016 IEEE First International Conference on Data Stream Mining & Processing (DSMP), 2016, pp. 100-103: IEEE.
[27] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural computation, vol. 9, no. 8, pp. 1735-1780, 1997.
[28] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, "Empirical evaluation of gated recurrent neural networks on sequence modeling," arXiv preprint arXiv:1412.3555, 2014.
[29] D. J. Finney, Probit analysis: a statistical treatment of the sigmoid response curve. Cambridge university press, Cambridge, 1952.
[30] I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to sequence learning with neural networks," in Advances in neural information processing systems, 2014, pp. 3104-3112.
[31] A. Vaswani et al., "Attention is all you need," in Advances in neural information processing systems, 2017, pp. 5998-6008.
[32] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770-778.
[33] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," arXiv preprint arXiv:1502.03167, 2015.
[34] O. Levy and Y. Goldberg, "Neural word embedding as implicit matrix factorization," in Advances in neural information processing systems, 2014, pp. 2177-2185.
[35] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
Full-Text Use Authorization
  • On-campus browsing/printing of the electronic full text is authorized, available to the public from 2020-02-14.
  • Off-campus browsing/printing of the electronic full text is authorized, available to the public from 2020-02-14.

