System ID: U0026-1708202011211100
Title (Chinese): 應用BERT模型執行TRL分級
Title (English): TRL Labeling by BERT Model
Institution: National Cheng Kung University
Department (Chinese): 工程科學系
Department (English): Department of Engineering Science
Academic Year: 108 (2019-2020)
Semester: 2
Publication Year: 109 (2020)
Author (Chinese): 丁士珉
Author (English): Shin-Min Ting
Student ID: N96071106
Degree: Master's
Language: Chinese
Pages: 56
Committee: Advisor - 王明習
Keywords (Chinese, translated): Technology Readiness Level; Natural Language Processing; Machine Learning; Bidirectional Encoder Representations from Transformers
Keywords (English): Technology Readiness Level (TRL); Natural Language Processing (NLP); Machine Learning; Bidirectional Encoder Representations from Transformers (BERT)
Abstract (Chinese, translated): Technology Readiness Level (TRL) is an indicator used to assess how far a technology has developed. Assessment is generally carried out in one of two ways: by the developers themselves, or by external experts in the relevant field. However, the level assigned to the same target may differ depending on the assessor's background, and assessors must read a large amount of material before they can assign an appropriate level, which consumes considerable time and human resources. This study therefore applies recently flourishing computer techniques to save valuable manpower and to produce more objective level assignments. We use Bidirectional Encoder Representations from Transformers (BERT), a model developed by Google that has achieved top results in Natural Language Processing (NLP), in experiments that assign a TRL based on the brief review written by the assessor. As time goes on and TRL assessment becomes more widespread, more evaluation data will become available for Machine Learning. This study uses data evaluated by external experts, extracting fields such as the project title, subject area, and brief project description to train the model, which then predicts the project's TRL. Our results show that the writing style of the description strongly affects prediction; on the data we collected, we obtained an accuracy of around 75%.
Abstract (English): Technology Readiness Level (TRL) is a criterion for measuring the development stage of a technology or project. The assessment is carried out in two ways: by the developing group itself, or by external professionals with relevant domain knowledge. However, the level assigned to the same technology may vary with the assessor's background, and assessors may have to examine a large amount of literature and data to determine a project's level. The assessment is therefore time-consuming and requires valuable human resources. This study attempts to use a computer to perform TRL leveling, in order to save time and human resources and to provide a more objective measurement. The Bidirectional Encoder Representations from Transformers (BERT) model, proposed by Google and achieving state-of-the-art performance in the natural language processing domain, is applied as the core component of this study. The training and testing data come from expert evaluations of project results; each evaluation includes a brief summary describing the developed techniques and the maturity of the developed method for applying the result at different phases of the relevant application. In this study, 1163 records were collected: 960 for training and 203 for testing. The experimental results show that the accuracy of the proposed method is around 75%.
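As a rough illustration of the bookkeeping reported in the abstract (1163 expert-evaluated records split into 960 for training and 203 for testing, with accuracy around 75%), here is a minimal Python sketch. The helper names, field layout, and placeholder records are hypothetical, not taken from the thesis; in the thesis itself the classifier is a fine-tuned BERT model, and this sketch covers only the split and the accuracy metric around it.

```python
# Minimal sketch (hypothetical helpers, not from the thesis) of the
# dataset split and evaluation described in the abstract:
# 1163 expert-evaluated records, 960 for training, 203 for testing.
import random

def split_dataset(records, n_train=960, seed=42):
    """Shuffle the records and split them into training and testing sets."""
    records = list(records)
    random.Random(seed).shuffle(records)
    return records[:n_train], records[n_train:]

def accuracy(predicted, expert_labels):
    """Fraction of predicted TRL levels matching the expert-assigned levels."""
    correct = sum(p == y for p, y in zip(predicted, expert_labels))
    return correct / len(expert_labels)

# Placeholder records standing in for (project summary, expert TRL 1-9) pairs.
data = [(f"project summary {i}", i % 9 + 1) for i in range(1163)]
train_set, test_set = split_dataset(data)
print(len(train_set), len(test_set))  # 960 203
```

Accuracy here is plain exact-match accuracy over the nine TRL classes, which matches the single "around 75%" figure the abstract reports.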
Table of Contents
Abstract ii
Table of Contents viii
List of Figures x
List of Tables xii
Chapter 1 Introduction 1
1.1 Background and Motivation 1
1.2 Objectives 1
1.3 Thesis Organization 2
Chapter 2 Related Background 3
2.1 Technology Readiness Level 3
2.2 Prediction Methods 5
2.2.1 Natural Language Processing 5
2.2.2 Overview of Artificial Neural Networks 6
2.2.3 Attention Mechanisms 12
2.3 Literature Review 26
Chapter 3 Methodology 29
3.1 Dataset and Preprocessing 29
3.2 Experimental Procedure and Pre-trained Model Architecture 32
3.3 Training and Testing Process 34
Chapter 4 Experimental Results and Discussion 36
4.1 Experimental Environment 38
4.2 Experimental Results and Data 38
4.2.1 Tests on the Original Data 39
4.2.2 Experiments with Added Descriptive Sentences 42
4.2.3 Experiments with Insufficiently Described Data Removed 46
4.3 Discussion of Experimental Results 49
Chapter 5 Conclusions and Future Work 51
References 52
References
[1] J. C. Mankins, "Technology readiness levels: a white paper", NASA, Office of Space Access and Technology, Advanced Concepts Office, 6 April 1995.
[2] D. Michie, D. J. Spiegelhalter, C. C. Taylor, and J. Campbell (Eds.). "Machine learning, neural and statistical classification", Ellis Horwood, USA. 1994.
[3] W. J. Hutchins, "The Georgetown-IBM Experiment Demonstrated in January 1954", 2004, In: Frederking R.E., Taylor K.B. (eds) Machine Translation: From Real Users to Research. AMTA 2004. Lecture Notes in Computer Science, vol 3265. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30194-3_12
[4] W. S. McCulloch and W. Pitts, "A logical calculus of the ideas immanent in nervous activity", The bulletin of mathematical biophysics, Vol. 5 No. 4, December 1943, pp. 115–133, ISSN 0007-4985. doi: 10.1007/BF02478259
[5] B. Farley and W. Clark, "Simulation of self-organizing systems by digital computer", In Transactions of the IRE Professional Group on Information Theory, Vol. 4 No. 4, September 1954, pp. 76-84, doi:10.1109/TIT.1954.1057468
[6] P. Werbos, "Beyond regression: new tools for prediction and analysis in the behavioral science", Thesis (Ph. D.). Appl. Math. Harvard University. 1974.
[7] D. Rumelhart, G. Hinton, and R. Williams, "Learning representations by back-propagating errors", Nature, Vol. 323, 1986, pp. 533-536, doi:10.1038/323533a0
[8] S. Hochreiter and J. Schmidhuber. "Long short-term memory", Neural Computation, Vol. 9 No. 8, November 1997, pp. 1735-1780, PMID 9377276, doi:10.1162/neco.1997.9.8.1735
[9] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate", In CoRR, 2015, arXiv:1409.0473, https://arxiv.org/abs/1409.0473
[10] I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to sequence learning with neural networks", In Advances in Neural Information Processing Systems, 2014, pp. 3104-3112, arXiv:1409.3215, https://arxiv.org/abs/1409.3215
[11] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, and I. Polosukhin, "Attention is all you need", In Advances in Neural Information Processing Systems, 2017, pp. 5998-6008, arXiv:1706.03762, https://arxiv.org/abs/1706.03762
[12] J. Devlin, M. W. Chang, K. Lee, and K. Toutanova, "Bert: pre-training of deep bidirectional transformers for language understanding", 2018, arXiv:1810.04805, https://arxiv.org/abs/1810.04805
[13] Y. Bengio, P. Simard, and P. Frasconi, "Learning long-term dependencies with gradient descent is difficult", IEEE Transactions on Neural Networks, Vol. 5 No. 2, 1994, pp. 157-166.
[14] R. Pascanu, T. Mikolov, and Y. Bengio, "On the difficulty of training recurrent neural networks", In International Conference on Machine Learning, Atlanta, Georgia, USA, 16-21 February, 2013, pp. 1310-1318.
[15] L. Kaiser, and I. Sutskever, "Neural gpus learn algorithms", 2015, arXiv preprint, arXiv:1511.08228, https://arxiv.org/abs/1511.08228
[16] N. Kalchbrenner, L. Espeholt, K. Simonyan, A. V. D. Oord, A. Graves, and K. Kavukcuoglu, "Neural machine translation in linear time", 2016, arXiv preprint, arXiv:1610.10099, https://arxiv.org/abs/1610.10099
[17] M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer, "Deep contextualized word representations", In Proceedings of NAACL-HLT Vol. 1, New Orleans, Louisiana, USA, 1-6 June, 2018, pp. 2227-2237, arXiv:1802.05365, https://arxiv.org/abs/1802.05365
[18] A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, "Improving language understanding with unsupervised learning", 2018, Technical report, OpenAI. https://openai.com/blog/language-unsupervised/
[19] K. Cho, B. Van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, "Learning phrase representations using RNN encoder-decoder for statistical machine translation", 2014, arXiv preprint, arXiv:1406.1078, https://arxiv.org/abs/1406.1078
[20] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, "Empirical evaluation of gated recurrent neural networks on sequence modeling", 2014, arXiv preprint, arXiv:1412.3555, https://arxiv.org/abs/1412.3555
[21] O. Kuchaiev, and B. Ginsburg, "Factorization tricks for LSTM networks", 2017, arXiv preprint, arXiv:1703.10722, https://arxiv.org/abs/1703.10722
[22] N. Shazeer, A. Mirhoseini, K. Maziarz, A. Davis, Q. Le, G. Hinton, and J. Dean, "Outrageously large neural networks: The sparsely-gated mixture-of-experts layer", 2017, arXiv preprint, arXiv:1701.06538, https://arxiv.org/abs/1701.06538
[23] Y. Kim, C. Denton, L. Hoang, and A. M. Rush, "Structured attention networks", 2017, arXiv preprint, arXiv:1702.00887, https://arxiv.org/abs/1702.00887
[24] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhudinov, R. Zemel, and Y. Bengio, "Show, attend and tell: Neural image caption generation with visual attention", In International Conference on Machine Learning, Lille, France, 6-11 June, 2015, pp. 2048-2057, arXiv:1502.03044
[25] R. Collobert, and J. Weston, "A unified architecture for natural language processing: Deep neural networks with multitask learning", In Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland, 5-9 July, 2008, pp. 160-167.
[26] A. M. Dai, and Q. V. Le, "Semi-supervised sequence learning", In Advances in Neural Information Processing Systems, 2015, pp. 3079-3087.
[27] A. Conneau, D. Kiela, H. Schwenk, L. Barrault, and A. Bordes, "Supervised learning of universal sentence representations from natural language inference data", In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 7-11 September, 2017, pp. 670–680.
[28] B. McCann, J. Bradbury, C. Xiong, and R. Socher, "Learned in translation: Contextualized word vectors", In Advances in Neural Information Processing Systems, Long Beach Convention Center, Long Beach, California, USA, 4-9 December, 2017, pp. 6294-6305.
[29] W. L. Taylor, "“Cloze procedure”: A new tool for measuring readability", Journalism Quarterly, Vol. 30 No. 4, 1953, pp. 415-433.
[30] C. Chelba, T. Mikolov, M. Schuster, Q. Ge, T. Brants, P. Koehn, and T. Robinson, "One billion word benchmark for measuring progress in statistical language modeling", 2013, arXiv preprint, arXiv:1312.3005, https://arxiv.org/abs/1312.3005
[31] A. Wang, A. Singh, J. Michael, F. Hill, O. Levy, and S. R. Bowman, "Glue: A multi-task benchmark and analysis platform for natural language understanding", In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, Brussels, Belgium, 1 November, 2018, pp. 353-355, Association for Computational Linguistics, arXiv:1804.07461, https://arxiv.org/abs/1804.07461
[32] I. Turc, M. W. Chang, K. Lee, and K. Toutanova, "Well-read students learn better: On the importance of pre-training compact models", 2019, arXiv preprint, arXiv:1908.08962, https://arxiv.org/abs/1908.08962
[33] D. Zhang, J. Wang, and X. Zhao, "Estimating the Uncertainty of Average F1 Scores", In Proceedings of the 2015 International Conference on The Theory of Information Retrieval, Northampton, Massachusetts, USA, 27-30 September, 2015, pp. 317-320, doi:10.1145/2808194.2809488.
  • Authorized for on-campus browsing/printing of the electronic full text, publicly available from 2020-08-28.
  • Authorized for off-campus browsing/printing of the electronic full text, publicly available from 2020-08-28.
