The electronic thesis has not yet been authorized for public release; please check the library catalog for the print copy.
(Note: if the thesis cannot be found, or the holdings status shows "closed stacks, not open to the public," the copy is not in the stacks and cannot be accessed.)
System ID U0026-1508201918282600
Title (Chinese) 基於兩階段轉換器之可變長度抽象摘要
Title (English) Variable-Length Abstractive Summarization using Two-stage Transformer-based Method
University National Cheng Kung University
Department (Chinese) 資訊工程學系
Department (English) Institute of Computer Science and Information Engineering
Academic Year 107 (ROC calendar)
Semester 2
Year of Publication 108
Author (Chinese) 鄭皓澤
Author (English) Hao-Tse Cheng
E-mail top30339@gmail.com
Student ID P76064106
Degree Master's
Language English
Pages 66
Oral Defense Committee Advisor - 吳宗憲
Committee member - 王新民
Committee member - 王家慶
Committee member - 陳嘉平
Committee member - 禹良治
Keywords (Chinese) 自動化摘要系統  抽象式摘要  抽取式摘要  文本分割  可變長度摘要  Transformer  BERT  長短期記憶模型
Keywords (English) automatic summarization system  abstractive summarization  extractive summarization  text segmentation  variable-length summarization  Transformer  BERT  LSTM
Subject Classification
Abstract (Chinese) In recent years, the rapid growth of information has made quickly digesting large numbers of articles a problem everyone faces. Automatic summarization systems can help solve this problem, letting people grasp the content of an article while saving a great deal of time. Automatic summarization systems can be divided into extractive and abstractive summarization: the former lets the user decide the length of the generated summary, while the latter produces summaries that are more fluent and closer to what a human would write.
The main contribution of this thesis is a two-stage training method that yields a variable-length abstractive summarization model, resolving the problem that previous automatic summarization systems could not provide both variable length and fluency. The variable-length abstractive summarization model consists of several sub-models. For text segmentation, the proposed system combines the state-of-the-art language representation model BERT with a bidirectional long short-term memory model to perform text segmentation and improve its accuracy. For summary generation, the thesis combines extractive and abstractive summarization and uses a Transformer to produce headline summaries comparable to current state-of-the-art models.
This thesis also compiles ChWiki_181k, a new large-scale Chinese text segmentation corpus, and proposes a BERT-based text segmentation model as the baseline on that corpus for later studies to compare against. For summarization, the LCSTS corpus is used, and the two-stage method successfully trains a variable-length abstractive summarization system that reaches up to 70% accuracy in human subjective evaluation, demonstrating that the proposed architecture can produce effective variable-length summaries.
Abstract (English) Due to the rapid growth of available information, efficiently processing and utilizing these text-based resources has become an increasingly important challenge. Such a problem can be addressed with an automatic summarization system. Summarization systems are generally divided into two types: extractive methods and abstractive methods. Extractive methods form the summary by extracting segments of text from the document, while abstractive methods process the document and then generate a new text summary. The former allows the user to specify the length of the summary, while the latter produces a more fluent and human-like summary.
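(As a toy illustration of the length-control property of extractive methods mentioned above, the sketch below scores sentences by term frequency and keeps the top k. The function name, scoring rule, and parameter k are assumptions made only for this example; it is not the method used in this thesis.)

```python
from collections import Counter

def extractive_summary(sentences, k=2):
    """Return the k highest-scoring sentences, preserving their original order."""
    # Score each sentence by the summed frequency of its words over the whole document.
    freq = Counter(w for s in sentences for w in s.lower().split())
    score = lambda s: sum(freq[w] for w in s.lower().split())
    top = set(sorted(sentences, key=score, reverse=True)[:k])
    # k directly controls how many sentences the summary contains,
    # which is exactly the length-control property noted above.
    return [s for s in sentences if s in top]
```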
The main contribution of this thesis is a two-stage method for training the variable-length abstractive summarization model, an improvement over previous models that could not simultaneously achieve fluency and variable length in their summaries. The variable-length abstractive summarization model is divided into a text segmentation module and three generation modules. The proposed text segmentation module, which combines BERT with a bidirectional LSTM, shows improved performance over existing methods. The generation modules combine extractive and abstractive methods to produce near state-of-the-art headline summaries.
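(The sketch below is a minimal illustration of a BERT plus bidirectional-LSTM boundary classifier in the spirit of the segmentation module described above, written with PyTorch and the Hugging Face transformers library. The checkpoint name, hidden size, and classification head are illustrative assumptions, not the thesis's exact implementation.)

```python
import torch.nn as nn
from transformers import BertModel

class BertBiLSTMSegmenter(nn.Module):
    """Toy BERT + BiLSTM sentence-boundary classifier (illustrative only)."""

    def __init__(self, bert_name="bert-base-chinese", lstm_hidden=256):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        self.bilstm = nn.LSTM(self.bert.config.hidden_size, lstm_hidden,
                              batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * lstm_hidden, 2)  # boundary vs. non-boundary

    def forward(self, input_ids, attention_mask):
        # input_ids / attention_mask: one row per sentence of the document.
        # Each sentence is encoded by BERT; its [CLS] vector serves as the
        # sentence representation.
        sent_vecs = self.bert(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state[:, 0, :]
        # The BiLSTM reads the sequence of sentence vectors so each boundary
        # decision can use context from neighbouring sentences.
        ctx, _ = self.bilstm(sent_vecs.unsqueeze(0))
        # One logit pair per sentence: does a segment boundary follow it?
        return self.classifier(ctx).squeeze(0)
```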
A new large-scale Chinese text segmentation dataset called ChWiki_181k is introduced, and a BERT-based text segmentation model is proposed as the baseline on it. The LCSTS corpus is adopted to train the summarization models, and the variable-length abstractive summarization system is trained with the two-stage method. The proposed system achieves up to 70% accuracy in human subjective evaluation, showing that the proposed model can generate appropriate variable-length summaries.
Table of Contents Abstract (Chinese) I
Abstract II
Acknowledgements IV
Contents V
List of Tables VII
List of Figures VIII
Chapter 1 Introduction 1
1.1 Background 1
1.2 Motivation 3
1.3 Literature Review 5
1.3.1 Abstractive Text Summarization 5
1.3.2 Extractive Text Summarization 6
1.3.3 Text Segmentation 7
1.3.4 Sequence to Sequence Model 8
1.3.5 Pre-training Method 10
1.4 Problems 11
1.5 Proposed Methods 13
Chapter 2 System Framework 15
2.1 Text Segmentation Model 16
2.1.1 BERT 17
2.1.2 LSTM Model 21
2.1.3 BERT-based Text Segmentation Model 23
2.2 Extractive Model 26
2.3 Document Summarization Model 29
2.3.1 Transformer 29
2.3.2 Summarization Combining Extraction and Abstraction 33
2.4 Segment Summarization Model 34
2.4.1 Segment Transformer Model 35
2.4.2 Loss Values of Two-stage Model 37
Chapter 3 Experimental Results and Discussion 40
3.1 Evaluation Metrics 40
3.1.1 Pk Indicator 40
3.1.2 ROUGE 41
3.1.3 Subjective Evaluation Method 42
3.2 Dataset 43
3.2.1 Text Segmentation Corpus 43
3.2.2 Summarization Corpus 45
3.3 Experimental Results and Discussion 46
3.3.1 Evaluation of the Text Segmentation Model BERT-biLSTM 46
3.3.2 Evaluation of the Ablation of the Extractive Model 49
3.3.3 Evaluation of the Document Transformer Model 50
3.3.4 Evaluation of the Variable-Length Abstractive Summary 54
Chapter 4 Conclusion and Future Work 60
References 62
Full-Text Usage Permissions
  • On-campus browsing and printing of the electronic full text is authorized, available from 2021-08-31.
  • Off-campus browsing and printing of the electronic full text is authorized, available from 2021-08-31.

