

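   The electronic full text has not yet been authorized for public release; for the print copy, please check the library catalog.
(Note: if the thesis cannot be found, or the holdings status shows "closed stacks, not public", the thesis is not in the stacks and cannot be accessed.)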
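System ID U0026-1008201717033600
Title (Chinese) 社群問答網站答案品質分析
Title (English) Spam Detection and Quality Evaluation in Community Question Answering
University National Cheng Kung University
Department (Chinese) 資訊管理研究所
Department (English) Institute of Information Management
Academic year 105 (ROC calendar)
Semester 2
Publication year 106 (ROC calendar; 2017)
Author (Chinese) 林思婷
Author (English) Si-Ting Lin
Student ID R76054014
Degree Master's
Language Chinese
Pages 67
Oral defense committee Advisor: 王惠嘉
Committee member: 劉任修
Committee member: 高宏宇
Committee member: 盧文祥
Keywords (Chinese) 社群問答網站 (community question answering websites)  答案品質 (answer quality)  廣告答案 (spam answers)
Keywords (English) Community Question Answering  Answer Quality  Spam Answers
Subject classification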
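Abstract (translated from Chinese)   The rapid growth of Internet technology has made the web a new kind of information-sharing platform, and community question answering (CQA) websites have emerged as a result. These sites let users ask questions in natural language and receive detailed answers from other users; users can also search past question-and-answer records for similar questions and answers. Because the answers are supplied by the users themselves, and given users' limited knowledge and the complexity of natural-language expression, answer quality varies greatly.
  In addition, marketing practices have changed in recent years: vendors recruit paid writers to post articles on social networking sites, forums, and blogs that promote their own products and services or attack competitors. Such advertising posts are now gradually appearing on CQA websites as well, so users must spend a great deal of time filtering out spam answers and low-quality answers before they find answers that truly meet their needs. Most previous studies of answer quality on CQA websites divide answers into two classes, high quality and low quality. However, spam answers usually promote products related to the question, so the methods of those studies may judge a spam answer to be high quality because it is highly relevant to the question, and the classification results fall short of expectations. Previous studies also point out that the definition of answer quality differs across question types. This study therefore aims to filter out spam answers on CQA websites and to analyze answer quality under different question types, dividing answers into three classes (high-quality answers, low-quality answers, and spam answers) so that users can read answers more efficiently. The experimental results show that answer quality analysis that takes question type into account achieves an accuracy of 0.842, and that identifying spam answers before answer quality analysis helps reduce the proportion of spam answers misjudged as high-quality answers.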
Abstract (English) The rapid development of the Internet has made it a new information-sharing platform, and community question answering websites have emerged in response. Users can post and answer questions in the community. Because the answers are contributed by volunteers, and because of users' limited knowledge and the complexity of natural-language expression, answer quality varies greatly.
Beyond differences in quality, some answers are posted by writers who are paid to post advertising content on social media for commercial purposes. Community question answering websites have recently become targets of such campaigns. Several studies have tried to classify answers on community question answering websites as high quality or low quality. With those methods, however, spam answers may be misjudged as high-quality answers, because spam answers are usually highly related to the question.
To screen out spam answers and surface the genuinely high-quality ones, this study filters spam answers and then evaluates the quality of the non-spam answers under different question types. Answers are divided into three classes: high quality, low quality, and spam. The results show that the accuracy of our quality analysis method is 0.842, and that performing spam filtering before answer quality analysis reduces the proportion of spam answers misjudged as high-quality answers.
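Table of contents Chapter 1 Introduction 1
1.1 Research Background and Motivation 1
1.2 Research Objectives 4
1.3 Research Scope and Limitations 5
1.4 Research Process 5
1.5 Thesis Outline 6
Chapter 2 Literature Review 8
2.1 Answer Quality Research on Community Question Answering Websites 8
2.2 Document Classification 10
2.2.1 Document Representation Methods 10
2.2.2 Feature Selection Methods 10
2.2.3 Classifiers 12
2.3 Spam Review Filtering 14
2.4 Summary 15
Chapter 3 Research Method 16
3.1 Research Framework 16
3.2 Data Preprocessing Module 19
3.3 Question Classification Module 21
3.4 Spam Answer Identification Module 24
3.4.1 Spam Information Database Construction (Initial Spam Information Dataset) 24
3.4.2 Spam Answer Classifier Construction (Spam Classifier Establishment) 26
3.4.3 Phase I Filtering 27
3.4.4 Phase II Filtering 28
3.4.5 Spam Information Database Updating (Spam Information Dataset Updating) 28
3.5 Answer Quality Analysis Module 30
3.5.1 Answer Quality Classification by Question Type (Type-Specified Quality Classification) 30
3.5.2 Hybrid Quality Analysis 34
Chapter 4 System Implementation and Validation 36
4.1 System Environment Setup 36
4.2 Experimental Method 36
4.2.1 Data Sources 37
4.2.2 Evaluation Metrics 39
4.3 Parameter Settings 40
4.4 Experimental Results 45
4.4.1 Experiment 1 45
4.4.2 Experiment 2 47
4.4.3 Experiment 3 49
4.4.4 Experiment 4 50
4.4.5 Experiment 5 56
Chapter 5 Conclusion 58
5.1 Research Results 58
5.2 Future Research Directions 60
References 62
Appendix 65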
References Agichtein, E., Castillo, C., Donato, D., Gionis, A., & Mishne, G. (2008). Finding high-quality content in social media. Paper presented at the Proceedings of the 2008 International Conference on Web Search and Data Mining, Palo Alto, California, USA.
Alberto, T. C., Lochter, J. V., & Almeida, T. A. (2015). Post or Block? Advances in Automatically Filtering Undesired Comments. Journal of Intelligent & Robotic Systems, 80, S245-S259.
Altınel, B., Can Ganiz, M., & Diri, B. (2015). A corpus-based semantic kernel for text classification by using meaning values of terms. Engineering Applications of Artificial Intelligence, 43, 54-66.
Arai, K., & Handayani, A. N. (2013). Predicting quality of answer in collaborative Q/A community. International Journal of Advanced Research in Artificial Intelligence, 2(3), 21-25.
Blooma, M. J., Goh, D. H. L., & Chua, A. Y. K. (2012). Predictors of high-quality answers. Online Information Review, 36(3), 383-400.
Chen, C., Wu, K., Srinivasan, V., & Kesav, B. R. (2015). The Best Answers? Think Twice: Identifying Commercial Campagins in the CQA Forums. Journal of Computer Science and Technology, 30(4), 810-828.
Chua, A. Y. K., & Banerjee, S. (2013). So fast so good: An analysis of answer quality and answer speed in community Question-answering sites. Journal of the American Society for Information Science and Technology, 64(10), 2058-2068.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273-297.
Fattah, M. A. (2015). New term weighting schemes with combination of multiple classifiers for sentiment analysis. Neurocomputing, 167, 434-442.
Habernal, I., Ptacek, T., & Steinberger, J. (2014). Supervised sentiment analysis in Czech social media. Information Processing & Management, 50(5), 693-707.
Kim, H. K., & Kim, M. (2016). Model-induced term-weighting schemes for text classification. Applied Intelligence, 45(1), 30-43.
Li, H., Chen, Z., Mukherjee, A., Liu, B., & Shao, J. (2015). Analyzing and detecting opinion spam on a large-scale dataset via temporal and spatial patterns. Paper presented at the Proceedings of the 9th International AAAI Conference on Web and Social Media (ICWSM-15), Oxford, UK.
Lin, H. T., Lin, C. J., & Weng, R. C. (2007). A note on Platt’s probabilistic outputs for support vector machines. Machine Learning, 68(3), 267-276.
Liu, B., Feng, J., Liu, M., Hu, H., & Wang, X. (2015). Predicting the quality of user-generated answers using co-training in community-based question answering portals. Pattern Recognition Letters, 58, 29-34.
Liu, Y., Wang, Y., Feng, L., & Zhu, X. (2016). Term frequency combined hybrid feature selection method for spam filtering. Pattern Analysis and Applications, 19(2), 369-383.
Mukherjee, A., Venkataraman, V., Liu, B., & Glance, N. (2013). Fake review detection: Classification and analysis of real and pseudo reviews: UIC-CS-03-2013. Technical Report.
Shah, C., & Pomerantz, J. (2010). Evaluating and predicting answer quality in community QA. Paper presented at the Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Geneva, Switzerland.
Sharma, A., & Dey, S. (2012). A comparative study of feature selection and machine learning techniques for sentiment analysis. Paper presented at the Proceedings of the 2012 ACM Research in Applied Computation Symposium, San Antonio, TX, USA.
Toba, H., Ming, Z. Y., Adriani, M., & Chua, T. S. (2014). Discovering high quality answers in community question answering archives using a hierarchy of classifiers. Information Sciences, 261, 101-115.
Xia, R., Xu, F., Zong, C., Li, Q., Qi, Y., & Li, T. (2015). Dual Sentiment Analysis: Considering Two Sides of One Review. IEEE Transactions on Knowledge and Data Engineering, 27(8), 2120-2133.
Yao, Y., Tong, H., Xie, T., Akoglu, L., Xu, F., & Lu, J. (2015). Detecting high-quality posts in community question answering sites. Information Sciences, 302, 70-82.
Yen, S. J., Wu, Y. C., Yang, J. C., Lee, Y. S., Lee, C. J., & Liu, J. J. (2013). A support vector machine-based context-ranking model for question answering. Information Sciences, 224, 77-87.
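mis2000lab (2015). 破窗理論 & 論壇走向.....以Yahoo知識+為例 [Broken windows theory & the direction of forums: the case of Yahoo Knowledge+]. Retrieved September 26, 2016, from http://ithelp.ithome.com.tw/articles/10166745
高照明 (2012). 語料庫建構技術—研究報告 [Corpus construction techniques: research report]. Retrieved May 26, 2017, from http://wd.naer.edu.tw/project/NAER-101-12-F-2-03-00-2-01.pdf
維基百科 [Wikipedia] (2016). 問答系統 [Question answering system]. Retrieved August 22, 2016, from https://zh.wikipedia.org/wiki/%E5%95%8F%E7%AD%94%E7%B3%BB%E7%B5%B1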
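Full-text access rights
  • Authorized for on-campus browsing/printing of the electronic full text, public from 2022-12-31.
  • Authorized for off-campus browsing/printing of the electronic full text, public from 2022-12-31.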

