Variable-Length Abstractive Summarization using Two-stage Transformer-based Method
Institute of Computer Science and Information Engineering
Keywords: automatic summarization system
Due to the rapid growth of available information, efficiently processing and utilizing text-based resources has become an increasingly crucial challenge. This problem can be addressed with an automatic summarization system. Summarization systems generally fall into two types: extractive methods and abstractive methods. Extractive methods form the summary by extracting segments of text from the document, while abstractive methods process the document and then generate a new text summary. The former allows the user to specify the length of the summary; the latter produces a more fluent, human-like summary.
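To make the extractive/abstractive distinction concrete, the following is a minimal sketch of a frequency-based extractive summarizer in the spirit of Luhn's classic method (not the thesis's own model); the function name and scoring scheme are illustrative assumptions:

```python
import re
from collections import Counter

def extractive_summary(document: str, num_sentences: int = 2) -> str:
    """Luhn-style extractive summarization: score each sentence by the
    average document-wide frequency of its words, then keep the
    top-scoring sentences in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    freq = Counter(re.findall(r"\w+", document.lower()))

    def score(sent: str) -> float:
        toks = re.findall(r"\w+", sent.lower())
        return sum(freq[t] for t in toks) / max(len(toks), 1)

    top = set(sorted(sentences, key=score, reverse=True)[:num_sentences])
    # Re-emit the selected sentences in document order, as extractive
    # systems typically do.
    return " ".join(s for s in sentences if s in top)
```

Because the output is copied verbatim from the source, such a system can trivially honor a user-specified length budget, which is exactly the property the proposed method tries to retain while generating abstractively.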
The main contribution of this thesis is a two-stage method for training a variable-length abstractive summarization model. This improves on previous models, which cannot simultaneously achieve fluency and variable length in their summaries. The variable-length abstractive summarization model is divided into a text segmentation module and three generation modules. The proposed text segmentation module, which combines BERT with a bidirectional LSTM, outperforms existing methods. The generation modules combine extractive and abstractive methods to produce near state-of-the-art headline summaries.
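The segmentation module follows the common sequence-labeling formulation: a BERT/BiLSTM encoder assigns each sentence a binary label indicating whether it ends a segment. Independently of the encoder, the post-processing step that turns those labels into segments can be sketched as follows (function and argument names are illustrative, not the thesis's actual code):

```python
def labels_to_segments(sentences, boundary_labels):
    """Convert per-sentence boundary labels (1 = this sentence ends a
    segment, 0 = it does not) into a list of segments. This mirrors how
    sequence-labeling segmentation models turn classifier outputs into
    variable-length units for downstream per-segment summarization."""
    segments, current = [], []
    for sent, is_boundary in zip(sentences, boundary_labels):
        current.append(sent)
        if is_boundary:
            segments.append(current)
            current = []
    if current:  # flush a trailing segment with no explicit boundary
        segments.append(current)
    return segments
```

In a two-stage pipeline, each resulting segment is then fed to a generation module, so the number of predicted boundaries directly controls the length of the final summary.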
A new large-scale Chinese text segmentation dataset called ChWiki_181k is introduced, and a BERT-based text segmentation model is proposed as its baseline. LCSTS is adopted to train the summarization models, and the variable-length abstractive summarization system is trained with the two-stage method. The proposed system achieved up to 70% accuracy in human subjective evaluation, and the experimental results show that the proposed model can generate proper variable-length summaries.
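Segmentation quality in Chapter 3 is evaluated with the Pk indicator of Beeferman et al. A minimal sketch of Pk, assuming the two segmentations are given as sequences of segment ids (one per sentence), is:

```python
def pk(ref, hyp, k=None):
    """Beeferman's Pk: slide a probe window of width k over the two
    segmentations and count the probes where reference and hypothesis
    disagree on whether the window's endpoints lie in the same segment.
    Lower is better; 0.0 means the segmentations agree on every probe."""
    n = len(ref)
    if k is None:
        # Conventional default: half the mean reference segment length.
        k = max(1, round(n / (len(set(ref)) * 2)))
    probes = n - k
    errors = 0
    for i in range(probes):
        same_ref = ref[i] == ref[i + k]
        same_hyp = hyp[i] == hyp[i + k]
        errors += same_ref != same_hyp
    return errors / probes
```

NLTK ships a reference implementation (`nltk.metrics.segmentation.pk`) operating on boundary strings; the sketch above uses segment ids only to keep the example self-contained.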
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Background
1.2 Motivation
1.3 Literature Review
1.3.1 Abstractive Text Summarization
1.3.2 Extractive Text Summarization
1.3.3 Text Segmentation
1.3.4 Sequence to Sequence Model
1.3.5 Pre-training Method
1.4 Problems
1.5 Proposed Methods
Chapter 2 System Framework
2.1 Text Segmentation Model
2.1.1 BERT
2.1.2 LSTM Model
2.1.3 BERT-based Text Segmentation Model
2.2 Extractive Model
2.3 Document Summarization Model
2.3.1 Transformer
2.3.2 Summarization Combining Extraction and Abstraction
2.4 Segment Summarization Model
2.4.1 Segment Transformer Model
2.4.2 Loss Values of Two-stage Model
Chapter 3 Experimental Results and Discussion
3.1 Evaluation Metrics
3.1.1 Pk Indicator
3.1.2 ROUGE
3.1.3 Subjective Evaluation Method
3.2 Dataset
3.2.1 Text Segmentation Corpus
3.2.2 Summarization Corpus
3.3 Experimental Results and Discussion
3.3.1 Evaluation of the Text Segmentation Model BERT-biLSTM
3.3.2 Evaluation of the Ablation of the Extractive Model
3.3.3 Evaluation of the Document Transformer Model
3.3.4 Evaluation of the Variable-Length Abstractive Summary
Chapter 4 Conclusion and Future Work