Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.
 D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
 Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In North American Chapter of the Association for Computational Linguistics (NAACL), 2019.
 Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. Attention is all you need. In Advances in neural information processing systems (pp. 5998-6008), 2017.
 Zhu, Y., Kiros, R., Zemel, R., Salakhutdinov, R., Urtasun, R., Torralba, A., & Fidler, S. Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. In Proceedings of the IEEE international conference on computer vision (pp. 19-27), 2015.
 Logan Lebanoff, Kaiqiang Song, and Fei Liu. Adapting the neural encoder-decoder framework from single to multi-document summarization. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2018.
 Jianmin Zhang, Jiwei Tan, and Xiaojun Wan. Adapting Neural Single-Document Summarization Model for Abstractive Multi-Document Summarization: A Pilot Study. In Proceedings of the 11th International Conference on Natural Language Generation, 2018.
 Günes Erkan and Dragomir R. Radev. LexRank: Graph-Based Lexical Centrality as Salience in Text Summarization. Journal of Artificial Intelligence Research, 22:457–479, 2004.
 Rada Mihalcea and Paul Tarau. TextRank: Bringing Order into Text. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, 2004.
 Michihiro Yasunaga, Rui Zhang, Kshitijh Meelu, Ayush Pareek, Krishnan Srinivasan, and Dragomir R. Radev. Graph-Based Neural Multi-Document Summarization. In Proceedings of CoNLL-2017. Association for Computational Linguistics, 2017.
 Ziqiang Cao, Wenjie Li, Sujian Li, and Furu Wei. Improving Multi-Document Summarization via Text Classification. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pages 3053–3059, 2017.
 Jaime Carbonell and Jade Goldstein. The Use of MMR, Diversity-Based Reranking for Reordering Documents and Producing Summaries. In Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, pages 335–336. ACM, 1998.
 Peter J. Liu, Mohammad Saleh, Etienne Pot, Ben Goodrich, Ryan Sepassi, Lukasz Kaiser, and Noam Shazeer. Generating Wikipedia by Summarizing Long Sequences. In Proceedings of the 6th International Conference on Learning Representations, 2018.
 Logan Lebanoff, Kaiqiang Song, Franck Dernoncourt, Doo Soon Kim, Seokhwan Kim, Walter Chang, and Fei Liu. Scoring sentence singletons and pairs for abstractive summarization. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2019.
 Yashar Mehdad, Giuseppe Carenini, Frank W. Tompa, and Raymond T. Ng. Abstractive meeting summarization with entailment and fusion. In Proceedings of the 14th European Workshop on Natural Language Generation, pages 136–146, 2013.
 Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar S. Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke S. Zettlemoyer, and Veselin Stoyanov. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692, 2019.
 Ani Nenkova and Kathleen McKeown. Automatic summarization. Foundations and Trends in Information Retrieval, 2011.
 Abigail See, Peter J. Liu, and Christopher D. Manning. Get to the point: Summarization with pointer-generator networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pp. 1073–1083, 2017.
 Alexander R. Fabbri, Irene Li, Tianwei She, Suyi Li, and Dragomir R. Radev. Multi-News: A large-scale multi-document summarization dataset and abstractive hierarchical model. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1074–1084, 2019.
 Chin-Yew Lin. ROUGE: A package for automatic evaluation of summaries. In Stan Szpakowicz and Marie-Francine Moens, editors, Text Summarization Branches Out: Proceedings of the ACL-04 Workshop, pages 74–81, 2004.
 Klein, G., Kim, Y., Deng, Y., Senellart, J., and Rush, A. M. OpenNMT: Open-Source Toolkit for Neural Machine Translation. ArXiv e-prints, 2017.
 Sebastian Gehrmann, Yuntian Deng, and Alexander M. Rush. Bottom-Up Abstractive Summarization. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 - November 4, 2018, pages 4098–4109, 2018.
 Wang, D., Zhu, S., Li, T., and Gong, Y. Comparative document summarization via discriminative sentence selection. In Proceedings of CIKM, 2009.