System ID U0026-0107202015061100
Title (Chinese) 動態神經網路 : 結合神經互信息應用於文本分類
Title (English) Dynamic Neural Networks: Apply Neural Mutual Information for Text Classification
Institution National Cheng Kung University
Department (Chinese) 數據科學研究所
Department (English) Institute of Data Science
Academic Year 108
Semester 2
Year of Publication 109
Author (Chinese) 陳冠君
Author (English) Kuan Chun Chen
Student ID RE6071062
Degree Master's
Language English
Pages 39
Committee Advisor - 李國榮
Co-advisor - 李政德
Committee member - 古倫維
Committee member - 高宏宇
Committee member - 吳宗憲
Keywords (Chinese) 神經網路搜索, 互信息, 貝式定理
Keywords (English) Neural Architecture Search, Mutual Information, Bayesian Theorem
Subject Classification
Abstract (Chinese) Learning feature representations of text is crucial for natural language processing tasks such as text classification and dialogue generation. Various neural network models have recently been proposed to learn text representations, but such manually designed architectures can take countless forms, and we do not know which architecture is optimal. As neural architecture search (NAS) techniques have matured, they can be used to search for network architectures automatically. However, most NAS techniques aim to maximize classification accuracy on the dataset rather than focusing on learning text representations, which leads to overfitting. We therefore incorporate mutual information to learn text representations, jointly maximizing classification accuracy and mutual information. Our method applies NAS to natural language processing combined with mutual information for joint learning; it outperforms other models on text classification datasets and performs well when training data are scarce.
Abstract (English) Learning text representations is important for text classification, text generation, and other natural language processing (NLP) tasks. Recently, diverse model structures have been proposed to learn text representations, but such manually designed models admit countless combinations, so we do not know which model structure is optimal. Neural Architecture Search (NAS) techniques were developed to address this problem. However, most NAS techniques try to achieve high classification accuracy on the dataset rather than focusing on learning input representations that can benefit classification accuracy. Hence, we compute the mutual information between the input and output representations of each layer of the neural network and maximize it; by maximizing mutual information, we learn better text representations. We propose a method that applies NAS to search the model structure and uses mutual information as part of the objective function for text classification. Our method outperforms other models on text classification and achieves state-of-the-art results in scarce-data settings.
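As a rough illustration of the joint learning described in the abstract, the sketch below combines a text-classification cross-entropy loss with a MINE-style Donsker-Varadhan lower bound on the mutual information between a layer's input and output representations (Belghazi et al. [2]). It is a minimal sketch only, assuming PyTorch; the critic architecture, the batch-shuffling approximation of the product of marginals, and the trade-off weight beta are illustrative assumptions rather than the thesis's actual implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MINECritic(nn.Module):
    # Statistics network T(x, z) for the Donsker-Varadhan bound on I(X; Z).
    def __init__(self, x_dim, z_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + z_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, z):
        return self.net(torch.cat([x, z], dim=-1)).squeeze(-1)

def mi_lower_bound(critic, x, z):
    # I(X; Z) >= E_p(x,z)[T(x,z)] - log E_p(x)p(z)[exp(T(x,z))]
    joint = critic(x, z).mean()
    # Shuffling z over the batch approximates samples from the product of marginals.
    z_shuffled = z[torch.randperm(z.size(0))]
    marginal = torch.logsumexp(critic(x, z_shuffled), dim=0) - torch.log(
        torch.tensor(float(z.size(0))))
    return joint - marginal

def joint_loss(logits, labels, critic, layer_in, layer_out, beta=0.1):
    # Cross-entropy for the classification task plus a weighted MI term:
    # minimizing this loss maximizes both accuracy and the MI estimate.
    ce = F.cross_entropy(logits, labels)
    mi = mi_lower_bound(critic, layer_in, layer_out)
    return ce - beta * mi  # beta is a hypothetical trade-off weight

In such a setup the critic would be trained jointly with the searched network, so the architecture found by NAS is encouraged to retain information about its input representation instead of only fitting the labels.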
Table of Contents Abstract (Chinese) i
Abstract ii
Acknowledgements iii
Table of Contents iv
List of Tables vi
List of Figures vii
Chapter 1. Introduction 1
1.1. Background . . . . . . . . . . . . . . . . . . . . 1
1.1.1. Rule-based systems . . . . . . . . . . . . . . . 1
1.1.2. Machine learning systems . . . . . . . . . . . . 2
1.1.3. Hybrid systems . . . . . . . . . . . . . . . . . 2
1.2. Motivation . . . . . . . . . . . . . . . . . . . . 3
1.3. Problem . . . . . . . . . . . . . . . . . . . . . .3
1.4. Challenge . . . . . . . . . . . . . . . . . . . . .4
1.5. Our Method . . . . . . . . . . . . . . . . . . . . 4
1.6. Paper Structure . . . . . . . . . . . . . . . . . .5
Chapter 2. Related Work 6
2.1. Text Classification . . . . . . . . . . . . . . . .6
2.1.1. C-LSTM . . . . . . . . . . . . . . . . . . . . . 6
2.1.2. Recurrent Convolution Neural Networks . . . . . .7
2.1.3. Hierarchical Attention Network . . . . . . . . . 8
2.2. Neural Architecture Search . . . . . . . . . . . . 11
2.3. Mutual Information . . . . . . . . . . . . . . . . 12
2.4. Short Summary . . . . . . . . . . . . . . . . . . .13
Chapter 3. Methodology 14
3.1. Input Representation . . . . . . . . . . . . . . . 14
3.2. Model Search . . . . . . . . . . . . . . . . . . . 16
3.3. Selected Operations . . . . . . . . . . . . . . . .17
3.3.1. None Operation . . . . . . . . . . . . . . . . . 18
3.3.2. Convolution . . . . . . . . . . . . . . . . . . .18
3.3.3. Dilated Convolution . . . . . . . . . . . . . . .19
3.3.4. Pooling . . . . . . . . . . . . . . . . . . . . .19
3.4. Discrete Layer . . . . . . . . . . . . . . . . . . 20
3.5. Output . . . . . . . . . . . . . . . . . . . . . . 22
3.6. Joint Learning . . . . . . . . . . . . . . . . . . 22
3.7. Algorithms . . . . . . . . . . . . . . . . . . . . 23
Chapter 4. Experiment 24
4.1. Experiment Setting . . . . . . . . . . . . . . . . 24
4.2. Data Description . . . . . . . . . . . . . . . . . 24
4.2.1. IMDB . . . . . . . . . . . . . . . . . . . . . . 24
4.2.2. AG News . . . . . . . . . . . . . . . . . . . . .24
4.2.3. Yelp . . . . . . . . . . . . . . . . . . . . . . 25
4.3. Baselines . . . . . . . . . . . . . . . . . . . . .25
4.3.1. C-LSTM . . . . . . . . . . . . . . . . . . . . . 25
4.3.2. Transformer . . . . . . . . . . . . . . . . . . .26
4.3.3. ENAS . . . . . . . . . . . . . . . . . . . . . . 26
4.3.4. SMASH . . . . . . . . . . . . . . . . . . . . . .26
4.3.5. TextNAS . . . . . . . . . . . . . . . . . . . . .26
4.4. Evaluation Metric . . . . . . . . . . . . . . . . .26
4.5. Experiment Result . . . . . . . . . . . . . . . . .27
4.5.1. Compare with Baselines . . . . . . . . . . . . . 27
4.5.2. Searched Model Structure . . . . . . . . . 28
4.5.3. Parameter Analysis . . . . . . . . . . . . . . . 28
4.5.4. Ablation Study . . . . . . . . . . . . . . . . . 30
Chapter 5. Conclusion and Future Work 33
References 34
Appendix A. Searched Model Structure 36
A.1. Searched model with different numbers of nodes on IMDB . . . . . . . . . . 36
A.2. Searched model with different numbers of nodes on AG News . . . . . . . . 37
A.3. Searched model with different numbers of nodes on Yelp . . . . . . . . . . . 38
References [1] Devansh Arpit, Huan Wang, Caiming Xiong, Richard Socher, and Yoshua Bengio. Neural Bayes: A Generic Parameterization Method for Unsupervised Representation Learning. arXiv e-prints, page arXiv:2002.09046, February 2020.
[2] Ishmael Belghazi, Sai Rajeswar, Aristide Baratin, R. Devon Hjelm, and Aaron C. Courville. MINE: mutual information neural estimation. CoRR, abs/1801.04062, 2018.
[3] Y-Lan Boureau, J. Ponce, and Yann Lecun. A theoretical analysis of feature pooling in visual recognition. pages 111–118, 11 2010.
[4] Leo Breiman. Random forests. Mach. Learn., 45(1):5–32, October 2001.
[5] Andrew Brock, Theodore Lim, James M. Ritchie, and Nick Weston. SMASH: one-shot model architecture search through hypernetworks. CoRR, abs/1708.05344, 2017.
[6] Marti A. Hearst. Support vector machines. IEEE Intelligent Systems, 13(4):18–28, July 1998.
[7] Devon Hjelm, Alex Fedorov, Samuel Lavoie-Marchildon, Karan Grewal, Philip Bachman, Adam Trischler, and Yoshua Bengio. Learning deep representations by mutual information estimation and maximization. In ICLR 2019. ICLR, April 2019.
[8] Nal Kalchbrenner and Phil Blunsom. Recurrent convolutional neural networks for discourse compositionality. In Proceedings of the Workshop on Continuous Vector Space Models and their Compositionality, pages 119–126, Sofia, Bulgaria, August 2013. Association for Computational Linguistics.
[9] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
[10] Hanxiao Liu, Karen Simonyan, and Yiming Yang. DARTS: differentiable architecture search. CoRR, abs/1806.09055, 2018.
[11] Pengfei Liu, Xipeng Qiu, and Xuanjing Huang. Recurrent neural network for text classification with multi-task learning. CoRR, abs/1605.05101, 2016.
[12] Minh-Thang Luong, Hieu Pham, and Christopher D. Manning. Effective approaches to attention-based neural machine translation. CoRR, abs/1508.04025, 2015.
[13] Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 142–150, Portland, Oregon, USA, June 2011. Association for Computational Linguistics.
[14] Julian McAuley and Jure Leskovec. Hidden factors and hidden topics: Understanding rating dimensions with review text. pages 165–172, 10 2013.
[15] Hieu Pham, Melody Y. Guan, Barret Zoph, Quoc V. Le, and Jeff Dean. Efficient neural architecture search via parameter sharing. CoRR, abs/1802.03268, 2018.
[16] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. CoRR, abs/1706.03762, 2017.
[17] Yujing Wang, Yaming Yang, Yi-Ren Chen, Jing Bai, Ce Zhang, Guinan Su, Xiaoyu Kou, Yunhai Tong, Mao Yang, and Lidong Zhou. Textnas: A neural architecture search space tailored for text representation. In AAAI, 2020.
[18] Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1480–1489, San Diego, California, June 2016. Association for Computational Linguistics.
[19] Fisher Yu and Vladlen Koltun. Multi-scale context aggregation by dilated convolutions. In International Conference on Learning Representations (ICLR), May 2016.
[20] Xiang Zhang, Junbo Jake Zhao, and Yann LeCun. Character-level convolutional networks for text classification. CoRR, abs/1509.01626, 2015.
[21] Chunting Zhou, Chonglin Sun, Zhiyuan Liu, and Francis Lau. A C-LSTM neural network for text classification. 11 2015.
[22] Barret Zoph and Quoc V. Le. Neural architecture search with reinforcement learning. CoRR, abs/1611.01578, 2016.
Full-Text Availability
  • On-campus browsing/printing of the electronic full text is authorized, available from 2025-07-01.
  • Off-campus browsing/printing of the electronic full text is authorized, available from 2025-07-01.

