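The electronic full text has not yet been authorized for public release; for the print copy, please check the library catalog.
(Note: if the thesis cannot be found, or its holdings status shows "closed stacks, not public", it is not in the stacks and cannot be retrieved.)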
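System ID U0026-1107201909083700
Title (Chinese) 基於機器學習之議題解決方式推薦模型-以議題追蹤系統為例
Title (English) Problem Solving Recommendation Model based on Machine Learning for an Issue Tracking System
Institution National Cheng Kung University
Department (Chinese) 工業與資訊管理學系碩士在職專班
Department (English) Department of Industrial and Information Management (on the job class)
Academic year 107 (ROC calendar)
Semester 2
Year of publication 108 (ROC calendar; 2019)
Student (Chinese) 鄒欣怡
Student (English) Hsin-I Tsou
Student ID r37061260
Degree Master's
Language Chinese
Pages 58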
Oral defense committee Advisor: 王惠嘉
Committee member: 李昇暾
Committee member: 劉任修
Committee member: 郭俊良
Keywords (Chinese) 議題追蹤系統  議題分類  自動擷取摘要
Keywords (English) Issue Tracking System  Issue Classification  Text Summarization
Subject classification
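Abstract (Chinese) The worldwide spread of digitization has increased every industry's dependence on software, and demand for customized software keeps growing. Faced with change requests of all kinds from different users, project managers rely on a management tool, the Issue Tracking System (ITS), to ensure that every item is accurately tracked and carried out. However, the items in an ITS are wide-ranging, and filtering or classifying them manually is tedious, time-consuming, and unproductive. In addition, every issue report contains a great deal of descriptive natural language, and with the limited query conditions an ITS offers, finding similar problems is not easy. As a result, the same problem is often reported by several people, which adds to the developers' workload and raises the cost of the whole project.
Many earlier studies have proposed automated methods for classifying or clustering issue reports, but most of them focus on classifying issues by severity or on finding relationships between issues. In fact, the assignee's reply records usually contain the handling history and the resolution of an issue. If the important information could be extracted from them, past experience could be reused to shorten the assignee's handling time. Yet among applications of text summarization, few studies have addressed summarizing the problem notes of issue reports. Therefore, beyond classifying issues accurately, it would be even more helpful to use automatic summarization to extract likely solutions from the problem notes and recommend them to the assignee.
This study takes the ITS of Company S as an example and builds a recommendation model for issue solutions. A classification method identifies the functional category of a new issue, a clustering method groups similar issues within the same functional category, and automatic summarization then extracts the problem notes of the similar issues to recommend to the assignee. The experimental results show that the proposed recommendation model effectively helps assignees review how similar issues were handled, thereby improving issue-handling efficiency.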
Abstract (English) The global spread of digitization has increased reliance on software, and the demand for customized software is also growing. Facing varied change requests from different users, project managers need a management tool, the Issue Tracking System (ITS), to ensure that every item is accurately tracked and executed. However, the items in an ITS are wide-ranging, and filtering or classifying them manually is cumbersome, time-consuming, and unproductive. In addition, each issue report contains a great deal of descriptive natural language. With the limited query conditions of an ITS, it is not easy to find similar problems, so the same problem is often reported by many people, which increases both the developers' workload and the cost of the project.
Many past studies have proposed automated methods to classify or cluster issue reports, but most of them focus on categorizing issues by severity or on finding relationships between issues. In fact, the reply records of an issue report usually contain its processing history and solution, which are useful information for the assignee.
This thesis proposes a solution recommendation model that classifies issues and automatically summarizes their solutions. The experimental results show that the model helps the assignee obtain the solutions of similar issues, thereby improving processing efficiency.
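As a rough illustration of the final summarization stage mentioned in the abstract, the sketch below ranks sentences from the notes of a group of similar issues with a TextRank-style graph ranking (TextRank is the summarization method listed in the table of contents). It assumes the classification and clustering stages have already produced that group. The function names `tokenize`, `similarity`, `textrank`, and `summarize`, the English-only tokenizer, and the sample notes are all hypothetical and are not taken from the thesis.

```python
import math
import re


def tokenize(sentence):
    """Lowercase alphanumeric tokens; a stand-in for real text preprocessing."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def similarity(tokens_a, tokens_b):
    """TextRank sentence similarity: shared words over the sum of log lengths."""
    denom = 0.0
    if tokens_a and tokens_b:
        denom = math.log(len(tokens_a)) + math.log(len(tokens_b))
    if denom <= 0.0:  # empty or one-word sentences would divide by zero
        return 0.0
    return len(set(tokens_a) & set(tokens_b)) / denom


def textrank(sentences, damping=0.85, iterations=50):
    """Weighted PageRank over the sentence-similarity graph."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Each new score is computed from the previous iteration's scores.
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(notes, top_k=2):
    """Return the top_k highest-ranked sentences, kept in their original order."""
    sentences = [
        s.strip()
        for note in notes
        for s in re.split(r"(?<=[.!?])\s+", note)
        if s.strip()
    ]
    scores = textrank(sentences)
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(best[:top_k])]


if __name__ == "__main__":
    # Hypothetical notes from issues assumed to be in the same cluster.
    sample_notes = [
        "Restart the report service to clear the cache. Thanks everyone.",
        "Clearing the cache on the report service fixed the export error.",
        "The export error came back until the report service cache was cleared.",
    ]
    for line in summarize(sample_notes, top_k=2):
        print(line)
```

Sentences that share no words with any other sentence receive only the base score `1 - damping`, so incidental remarks tend to fall out of the summary. Notes written in Chinese would need a word segmenter in place of the regular-expression tokenizer assumed here.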
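Table of contents
1. Introduction 1
1.1. Research Background and Motivation 1
1.2. Research Objectives 4
1.3. Research Scope and Limitations 5
1.4. Research Process 6
1.5. Thesis Organization 7
2. Literature Review 8
2.1. Issue Tracking Systems 8
2.2. Feature Selection 9
2.2.1. Term Frequency 10
2.2.2. Term Frequency-Inverse Document Frequency 10
2.2.3. N-gram Model 11
2.3. Classification Methods 12
2.3.1. K-Nearest Neighbors 12
2.3.2. Naive Bayes Classifier 13
2.3.3. Support Vector Machine 14
2.3.4. Random Forest 15
2.3.5. Cross-Validation 17
2.4. Clustering Methods 19
2.4.1. K-means 19
2.4.2. Agglomerative Hierarchical Clustering 20
2.4.3. Average Silhouette Method 21
2.5. Automatic Extractive Summarization Methods 21
2.5.1. TextRank 22
2.6. Summary 23
3. Research Method 24
3.1. Research Framework 24
3.2. Data Preprocessing 28
3.2.1. Text Processing 29
3.3. Functional Classification 31
3.3.1. Random Forest Model Training 31
3.4. Issue Clustering 32
3.5. Automatic Extraction of Problem-Note Summaries 33
4. System Implementation and Validation 36
4.1. System Environment 36
4.2. Experimental Design 37
4.2.1. Experimental Dataset 37
4.2.2. Evaluation Metrics 39
4.2.3. Experiment 1: Functional Classification 42
4.2.4. Experiment 2: Issue Clustering 43
4.2.5. Experiment 3: Automatic Summarization 47
4.3. Discussion of Experimental Results 48
5. Conclusions and Future Research 49
5.1. Research Contributions and Conclusions 49
5.2. Future Research Directions 51
References 52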
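Full-text access rights
  • On-campus browsing and printing of the electronic full text is authorized, open from 2022-07-01.
  • Off-campus browsing and printing of the electronic full text is authorized, open from 2022-07-01.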