An efficient training process for recurrent neural networks based on subsequence information
Department of Electrical Engineering
recurrent neural network
accelerated training process
A recurrent neural network (RNN) is a neural network architecture suited to analyzing sequence data. Its defining characteristic is that the tokens of a sequence are fed in order: at each step the network computes a hidden state and keeps it inside the model, allowing it to learn the correlations between tokens. However, because each step must wait for the computation of the previous one, training cannot be parallelized across time steps. How to improve the training speed of recurrent neural networks has therefore long been an important research topic.
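The sequential dependence described above can be sketched with a minimal NumPy recurrence; the layer sizes, weight initialization, and tanh activation here are illustrative assumptions, not the configuration used in this thesis:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid, seq_len = 8, 16, 5

# Hypothetical parameters of a simple recurrent layer.
W = rng.normal(scale=0.1, size=(d_hid, d_in))   # input-to-hidden weights
U = rng.normal(scale=0.1, size=(d_hid, d_hid))  # hidden-to-hidden weights
b = np.zeros(d_hid)

x = rng.normal(size=(seq_len, d_in))  # one input sequence of 5 tokens
h = np.zeros(d_hid)                   # initial hidden state

# Each step consumes the previous hidden state, so this loop
# cannot be parallelized across time steps.
for t in range(seq_len):
    h = np.tanh(W @ x[t] + U @ h + b)

print(h.shape)  # (16,)
```

Because `h` at step `t` depends on `h` at step `t-1`, the loop length is exactly what a shorter training sequence would reduce.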
Beyond this inherent lack of parallelism, the sequences in a dataset usually vary in length: one sentence may contain only three words while another contains dozens. A special padding symbol is therefore typically used to extend every sequence in a dataset to the same length, so shorter sequences end up carrying a large amount of useless information, which wastes computing resources.
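The padding step can be illustrated in a few lines of plain Python; the token ids and the choice of `0` as the padding symbol are hypothetical:

```python
# Hypothetical token sequences of different lengths (integer ids).
sequences = [[5, 2, 9], [7, 1], [3, 8, 4, 6, 2]]
PAD = 0  # special padding symbol

# Pad every sequence to the length of the longest one.
max_len = max(len(s) for s in sequences)
padded = [s + [PAD] * (max_len - len(s)) for s in sequences]

print(padded)  # [[5, 2, 9, 0, 0], [7, 1, 0, 0, 0], [3, 8, 4, 6, 2]]
```

Every sequence now has length 5, so the two-token sequence spends three of its five recurrent steps on padding that carries no information.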
This study proposes a method for training recurrent neural networks on subsequences and evaluates it on eight datasets drawn from three domains: images, text, and biological sequences. By feeding a different subsequence into the model in each training epoch, we reach the same test scores as training on full sequences while using less training time. We then identify the sampling method that performs best in both training time and test score. Finally, we show that the method is robust by applying it to different recurrent neural network units.
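One plausible way to realize "a different subsequence each epoch" is to draw a random contiguous window from each full sequence; the function name `sample_subsequence` and the 0.5 length ratio below are assumptions for illustration, not the exact sampling methods compared in Chapter 4:

```python
import random

def sample_subsequence(seq, ratio=0.5, rng=random):
    """Draw a random contiguous subsequence covering `ratio` of `seq`."""
    sub_len = max(1, int(len(seq) * ratio))
    start = rng.randrange(len(seq) - sub_len + 1)
    return seq[start:start + sub_len]

seq = list(range(10))
for epoch in range(3):
    sub = sample_subsequence(seq, ratio=0.5)
    # train_on(sub)  # hypothetical training step on a 5-token window
```

Because each epoch sees only half the tokens, each recurrent forward pass is roughly half as long, while different windows across epochs still expose the model to the whole sequence over time.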
Chapter 1 Introduction
Chapter 2 Related Work
2.1 Recurrent Neural Networks
2.1.1 Recurrent Layers
2.1.2 Recurrent Units
2.2 Training Methods Based on Sequence Segments
2.2.1 Vanilla Transformer
2.2.2 Transformer-XL
2.2.3 TS-LSTM and Temporal-Inception
Chapter 3 Methods
3.1 Datasets
3.1.1 MNIST
3.1.2 Fashion-MNIST
3.1.3 CIFAR-10
3.1.4 Sentiment
3.1.5 Sentiment140
3.1.6 IMDB
3.1.7 AMP
3.1.8 ACP
3.2 Data Preprocessing
3.2.1 Feature Encoding
3.2.2 Label Encoding
3.3 Recurrent Network Model Architecture
3.4 Subsequence Sampling
3.5 Model Training Configuration
Chapter 4 Experimental Results
4.1 Performance Evaluation Metrics
4.2 Evaluation of Models Trained on Subsequences
4.2.1 Text Datasets
4.2.2 Biological Sequence Datasets
4.2.3 Image Datasets
4.3 Analysis of the Text and Biological Sequence Datasets
4.4 Analysis of the Image Datasets
4.5 Effect of the Subsequence Sampling Method
4.5.1 Single Random Sampling
4.5.2 Full-Sequence Splitting
4.5.3 Comparison of the Three Sampling Methods
4.6 Robustness across Recurrent Neural Network Units
4.6.1 GRU
4.6.2 SimpleRNN
Chapter 5 Conclusion
 J. L. Elman, "Finding structure in time," Cognitive science, vol. 14, no. 2, pp. 179-211, 1990.
 J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," Proceedings of the National Academy of Sciences, vol. 79, no. 8, pp. 2554-2558, 1982.
 M. I. Jordan, "Serial order: A parallel distributed processing approach," in Advances in psychology, vol. 121: Elsevier, 1997, pp. 471-495.
 Y. Wang, M. Huang, and L. Zhao, "Attention-based LSTM for aspect-level sentiment classification," in Proceedings of the 2016 conference on empirical methods in natural language processing, 2016, pp. 606-615.
 G. Pollastri, D. Przybylski, B. Rost, and P. Baldi, "Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles," Proteins: Structure, Function, and Bioinformatics, vol. 47, no. 2, pp. 228-235, 2002.
 J. Ba, V. Mnih, and K. Kavukcuoglu, "Multiple object recognition with visual attention," arXiv preprint arXiv:1412.7755, 2014.
 R. Al-Rfou, D. Choe, N. Constant, M. Guo, and L. Jones, "Character-level language modeling with deeper self-attention," in Proceedings of the AAAI Conference on Artificial Intelligence, 2019, vol. 33, pp. 3159-3166.
 Z. Dai et al., "Transformer-XL: Attentive language models beyond a fixed-length context," arXiv preprint arXiv:1901.02860, 2019.
 C.-Y. Ma, M.-H. Chen, Z. Kira, and G. AlRegib, "TS-LSTM and temporal-inception: Exploiting spatiotemporal dynamics for activity recognition," Signal Processing: Image Communication, vol. 71, pp. 76-87, 2019.
 P. J. Grother, "NIST special database 19," Handprinted forms and characters database, National Institute of Standards and Technology, 1995.
 A. Krizhevsky, V. Nair, and G. Hinton, "The CIFAR-10 dataset," online: http://www.cs.toronto.edu/kriz/cifar.html, vol. 55, 2014.
 Y. LeCun, C. Cortes, and C. Burges, "MNIST handwritten digit database," AT&T Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, vol. 2, p. 18, 2010.
 H. Xiao, K. Rasul, and R. Vollgraf, "Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms," arXiv preprint arXiv:1708.07747, 2017.
 A. Go, R. Bhayani, and L. Huang, "Twitter sentiment classification using distant supervision," CS224N Project Report, Stanford, vol. 1, no. 12, p. 2009, 2009.
 W. Chen, H. Ding, P. Feng, H. Lin, and K.-C. Chou, "iACP: a sequence-based tool for identifying anticancer peptides," Oncotarget, vol. 7, no. 13, p. 16895, 2016.
 A. Tyagi, P. Kapoor, R. Kumar, K. Chaudhary, A. Gautam, and G. Raghava, "In silico models for designing and discovering novel anticancer peptides," Scientific Reports, vol. 3, p. 2984, 2013.
 D. Veltri, U. Kamath, and A. Shehu, "Deep learning improves antimicrobial peptide recognition," Bioinformatics, vol. 34, no. 16, pp. 2740-2747, 2018.
 S. Vijayakumar and P. Lakshmi, "ACPP: a web server for prediction and design of anti-cancer peptides," International Journal of Peptide Research and Therapeutics, vol. 21, no. 1, pp. 99-106, 2015.
 L. Wei, C. Zhou, H. Chen, J. Song, and R. Su, "ACPred-FL: a sequence-based predictor using effective feature representation to improve the prediction of anti-cancer peptides," Bioinformatics, vol. 34, no. 23, pp. 4007-4016, 2018.
 D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, no. 6088, pp. 533-536, 1986.
 G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural computation, vol. 18, no. 7, pp. 1527-1554, 2006.
 A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in neural information processing systems, 2012, pp. 1097-1105.
 A. Graves and J. Schmidhuber, "Offline handwriting recognition with multidimensional recurrent neural networks," in Advances in neural information processing systems, 2009, pp. 545-552.
 W. Byeon, T. M. Breuel, F. Raue, and M. Liwicki, "Scene labeling with lstm recurrent neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3547-3555.
 R. Pascanu, T. Mikolov, and Y. Bengio, "On the difficulty of training recurrent neural networks," in International conference on machine learning, 2013, pp. 1310-1318.
 V. Khomenko, O. Shyshkov, O. Radyvonenko, and K. Bokhan, "Accelerating recurrent neural network training using sequence bucketing and multi-gpu data parallelization," in 2016 IEEE First International Conference on Data Stream Mining & Processing (DSMP), 2016, pp. 100-103: IEEE.
 S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural computation, vol. 9, no. 8, pp. 1735-1780, 1997.
 J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, "Empirical evaluation of gated recurrent neural networks on sequence modeling," arXiv preprint arXiv:1412.3555, 2014.
 D. J. Finney, Probit analysis: a statistical treatment of the sigmoid response curve. Cambridge university press, Cambridge, 1952.
 I. Sutskever, O. Vinyals, and Q. Le, "Sequence to sequence learning with neural networks," in Advances in neural information processing systems, 2014, pp. 3104-3112.
 A. Vaswani et al., "Attention is all you need," in Advances in neural information processing systems, 2017, pp. 5998-6008.
 K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770-778.
 S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," arXiv preprint arXiv:1502.03167, 2015.
 O. Levy and Y. Goldberg, "Neural word embedding as implicit matrix factorization," in Advances in neural information processing systems, 2014, pp. 2177-2185.
 D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.