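The electronic full text has not yet been authorized for public release; for the print copy, please check the library catalog.
(Note: if a search returns no result, or the holding status shows "closed stacks, not public" (閉架不公開), the thesis is not in the stacks and cannot be accessed.)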
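System ID U0026-2306201311233300
Title (Chinese) 應用語言特徵於分類具聲譽之網路論壇評論者
Title (English) A Computational Linguistic Approach to Characterize Reputable Reviewers in Online forums
Institution National Cheng Kung University (成功大學)
Department (Chinese) 資訊管理研究所
Department (English) Institute of Information Management
Academic year 101 (ROC calendar)
Semester 2
Year of publication 102 (ROC calendar; 2013)
Author (Chinese) 張文姬
Author (English) Wen-Chi Chang
Student ID r76001079
Degree Master's
Language English
Pages 53
Oral defense committee Advisor - 李昇暾
Committee member - 林清河
Committee member - 耿伯文
Committee member - 蔡馥璟
Keywords (Chinese) 文字探勘  信譽評估  語言風格  言據性
Keywords (English) Text mining  Reputation estimation  Linguistic style  Evidentiality
Subject classification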
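Abstract (translated from Chinese) The Internet is flooded with text data, so finding trustworthy sources within it is important. Previous studies have mostly relied on the quality of network links between users and on mutual voting systems to identify reliable blogs or reviewers. Such mechanisms, however, are susceptible to reciprocal relationships and to the rich-get-richer effect.
Text is the medium of human communication and is therefore well suited to tracking reliable information sources. Text mining mostly operates on content words, extracting features with techniques such as unigrams, bigrams, and feature selection. Content words, however, change sharply from topic to topic, so readers can only rely on a website's recommendations of popular reviewers, or build up their own record of trustworthy reviewers through years of extensive reading.
This study therefore proposes a classification mechanism based on the linguistic features of text, to address the shortcomings of relying on web-of-trust mechanisms. We argue that linguistic features such as reading difficulty, evidentiality, rigorous writing format, and the distribution of function words can effectively help characterize reputable reviewers.
Compared with content words, linguistic style is a more stable and reliable indicator of the individual; it is not easily changed by topic or preference, and is therefore suitable as a feature for distinguishing individuals. This study classifies reviewers by training an SVM on linguistic features. Finally, to evaluate the proposed model, three evaluation metrics are used (accuracy, precision, and recall), and the model is both compared with and combined with the web-of-trust mechanism, with the aim of outperforming previous work and providing a basis for further research.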
Abstract (English) With the advent and growth of Web 2.0, a large amount of text data from online users has emerged in cyberspace. To filter out low-quality information effectively, locating trustworthy resources is an important task.
Past research has mostly used the quality of link strength among users, together with voting systems, to find trustworthy bloggers or reviewers. In this study, we argue that text, as a major communication medium, is well suited to identifying whether writers are trustworthy.
Previous text mining work focuses on content words, using techniques such as unigrams and bigrams to extract features. However, content words may vary dramatically as the topic varies, which makes it difficult for users to track writer-oriented sources.
To make it easier for users to find reputable or trustworthy reviewers, this study proposes a mechanism based on the linguistic style of reviewers' text, using features such as Flesch reading level, writing formality, evidentiality, and the distribution of function words to characterize reviewers. Linguistic style is extra-propositional: it reflects how people communicate, whereas content words convey what people say. Compared with content words, linguistic style is a stable, reliable characteristic that is not under conscious control and can be used to distinguish individual differences (Pennebaker, Mehl et al. 2003).
Applying these features to an SVM, we use accuracy, precision, and recall to evaluate performance against the web of trust mechanism.
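As a rough illustration of the kind of features and metrics named in the abstract, the following sketch computes a Flesch Reading Ease score, a function-word ratio, and accuracy, precision, and recall for binary labels. It is not the thesis's implementation: the syllable counter is a crude vowel-group heuristic, the FUNCTION_WORDS list is a small hypothetical stand-in for a real dictionary such as LIWC, and the SVM training step is left out entirely.

```python
import re

# Hypothetical, tiny function-word list for illustration only
# (an assumption; not the LIWC dictionary used in the thesis).
FUNCTION_WORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "but", "or",
                  "i", "you", "it", "is", "was", "that", "this", "with", "for"}


def count_syllables(word):
    """Rough heuristic: one syllable per group of consecutive vowels, minimum one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def style_features(text):
    """Return (Flesch Reading Ease, function-word ratio) for an English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    # Flesch (1948): 206.835 - 1.015 * (words per sentence) - 84.6 * (syllables per word)
    ease = (206.835
            - 1.015 * len(words) / len(sentences)
            - 84.6 * syllables / len(words))
    func_ratio = sum(w.lower() in FUNCTION_WORDS for w in words) / len(words)
    return ease, func_ratio


def accuracy_precision_recall(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall
```

In a pipeline like the one the abstract outlines, feature vectors of this kind would be passed to an SVM classifier, and the predicted labels scored with accuracy_precision_recall against the known reputable and non-reputable reviewers.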
Table of contents Abstract (Chinese) III
Abstract IV
Table of Contents V
List of Tables IX
List of Figures X
Chapter 1 Introduction 1
1.1 Background and Motivation 1
1.2 Research Objective 2
1.3 Research Structure 3
Chapter 2 Literature Review 4
2.1 Finding target users in community 4
2.2 Concepts related to reputation 5
2.2.1 Reputation 5
2.2.2 Credibility 5
2.2.3 Evidentiality 6
2.3 LIWC 7
2.4 Summary 9
Chapter 3 Research Method 10
3.1 Web Crawling 11
3.2 Data Preprocess 14
3.3 Extract extra-propositional data 14
3.3.1 Writing formality 15
3.3.2 Flesch Reading Ease 16
3.3.3 Evidentiality 17
3.3.4 LIWC factors 18
3.4 Logistic regression 23
3.4.1 Evaluation 24
3.5 Support Vector Machine 24
3.5.1 Cross validation 25
3.5.2 Evaluation 26
3.5.3 Algorithm 27
Chapter 4 Experiment and Analysis 29
4.1 Data collection - Electronics 29
4.2 Experiment I - Web of trust 31
4.2.1 Experiment results - logistic regression 31
4.2.2 Experiment results - SVM 33
4.3 Experiment II - Linguistic Style 34
4.3.1 Experiment results - logistic regression 34
4.3.2 Experiment results - SVM 35
4.4 Experiment comparison of Electronics dataset 37
4.5 Data collection - Movies 38
4.6 Experiment I - Web of trust 39
4.6.1 Experiment results - logistic regression 39
4.6.2 Experiment results - SVM 40
4.7 Experiment II - Linguistic Style 41
4.7.1 Experiment results - logistic regression 41
4.7.2 Experiment results - SVM 42
4.8 Experiment comparison 43
4.9 Combination of both models 45
4.10 Analysis 47
Chapter 5 Conclusion and Future work 49
References 51
References Allport, G. W. (1961). Pattern and growth in personality. New York: Holt, Rinehart & Winston.
Beukeboom, C. J., M. Tanis and I. E. Vermeulen (2012). "The Language of Extraversion: Extraverted People Talk More Abstractly, Introverts Are More Concrete." Journal of Language and Social Psychology.
Burke, R. (2002). "Hybrid Recommender Systems: Survey and Experiments." User Modeling and User-Adapted Interaction 12(4): 331-370.
Chafe, W. (1986). Evidentiality in English conversation and academic writing. Evidentiality: the linguistic coding of epistemology. W. L. Chafe and J. Nichols, Ablex Publishing Corporation: 261-272.
Hosmer, D. W., Jr., S. Lemeshow and R. X. Sturdivant (2013). Applied Logistic Regression, Wiley.
DeLancey, S. (2001). "The mirative and evidentiality." Journal of Pragmatics 33(3): 369-382.
Dewaele, J.-M. and A. Furnham (2000). "Personality and speech production: a pilot study of second language learners." Personality and Individual Differences 28(2): 355-365.
Ding, C. H. Q. and I. Dubchak (2001). "Multi-class protein fold recognition using support vector machines and neural networks." Bioinformatics 17(4): 349-358.
Flesch, R. (1948). "A new readability yardstick." Journal of Applied Psychology 32(3): 221-233.
Golbeck, J. (2005). Semantic Web Interaction through Trust Network Recommender Systems. International Symposium on Wearable Computers.
Groom, C. J. and J. W. Pennebaker (2002). "Words." Journal of Research in Personality 36(6): 615-621.
Hancock, J. T., L. E. Curry, S. Goorha and M. Woodworth (2007). "On Lying and Being Lied To: A Linguistic Analysis of Deception in Computer-Mediated Communication." Discourse Processes 45(1): 1-23.
Heisele, B., P. Ho, J. Wu and T. Poggio (2003). "Face recognition: component-based versus global approaches." Comput. Vis. Image Underst. 91(1-2): 6-21.
Kim, K. I., K. Jung, S. H. Park and H. J. Kim (2002). "Support Vector Machines for Texture Classification." IEEE Trans. Pattern Anal. Mach. Intell. 24(11): 1542-1550.
Kim, S.-M., P. Pantel, T. Chklovski and M. Pennacchiotti (2006). Automatically assessing review helpfulness. Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing. Sydney, Australia, Association for Computational Linguistics: 423-430.
Ku, Y.-C., C.-P. Wei and H.-W. Hsiao (2012). "To whom should I listen? Finding reputable reviewers in opinion-sharing communities." Decision Support Systems 53(3): 534-542.
Li, J. and M. Chignell (2010). "Birds of a feather: How personality influences blog writing and reading." International Journal of Human-Computer Studies 68(9): 589-602.
Liu, J., Y. Cao, C.-Y. Lin, Y. Huang and M. Zhou (2007). Low-Quality Product Review Detection in Opinion Summarization. Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL).
Lu, Y., P. Tsaparas, A. Ntoulas and L. Polanyi (2010). Exploiting social context for review quality prediction. Proceedings of the 19th international conference on World wide web. Raleigh, North Carolina, USA, ACM: 691-700.
Metzger, M. J. (2007). "Making sense of credibility on the web: Models for evaluating online information and recommendations for future research." Journal of the American Society for Information Science and Technology 58(13): 2078-2091.
Newman, M. L., J. W. Pennebaker, D. S. Berry and J. M. Richards (2003). "Lying words: predicting deception from linguistic styles." Pers Soc Psychol Bull 29(5): 665-675.
Nowson, S. (2006). The Language of Weblogs: A study of genre and individual differences.
Pennebaker, J. and T. Lay (2002). "Language Use and Personality during Crises: Analyses of Mayor Rudolph Giuliani's Press Conferences." Journal of Research in Personality 36(3): 271-282.
Pennebaker, J. W. and L. A. King (1999). "Linguistic styles: Language use as an individual difference." Journal of Personality and Social Psychology 77(6): 1296-1312.
Pennebaker, J. W., M. R. Mehl and K. G. Niederhoffer (2003). "Psychological aspects of natural language use: Our words, our selves." Annual Review of Psychology 54(1): 547-577.
Riggs, T. and R. Wilensky (2001). An algorithm for automated rating of reviewers. Proceedings of the 1st ACM/IEEE-CS joint conference on Digital libraries. Roanoke, Virginia, United States, ACM: 381-387.
Rubin, V. L. and E. D. Liddy (2006). "Assessing credibility of weblogs."
Sexton, J. B. and R. L. Helmreich (2000). "Analyzing cockpit communications: the links between language, performance, error, and workload." Hum Perf Extrem Environ 5(1): 63-68.
Su, Q., C.-R. Huang and H. K.-y. Chen (2010). Evidentiality for text trustworthiness detection. Proceedings of the 2010 Workshop on NLP and Linguistics: Finding the Common Ground. Uppsala, Sweden, Association for Computational Linguistics: 10-17.
Tausczik, Y. R. and J. W. Pennebaker (2010). "The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods." Journal of Language and Social Psychology 29(1): 24-54.
Thelwall, M., K. Buckley, G. Paltoglou, D. Cai and A. Kappas (2010). "Sentiment strength detection in short informal text." Journal of the American Society for Information Science and Technology 61(12): 2544-2558.
Toms, E. G. and A. R. Taves (2004). "Measuring user perceptions of web site reputation." Inf. Process. Manage. 40(2): 291-317.
Trusov, M., A. V. Bodapati and R. E. Bucklin (2010). "Determining Influential Users in Internet Social Networks." Journal of Marketing Research 47(4): 643-658.
Tussyadiah, I. P. and D. R. Fesenmaier (2008). "Marketing Places Through First‐Person Stories—an Analysis of Pennsylvania Roadtripper Blog." Journal of Travel & Tourism Marketing 25(3-4): 299-311.
Vapnik, V. N. (1998). Statistical learning theory. New York, Wiley.
Weerkamp, W. and M. d. Rijke (2008). "Credibility improves topical blog post retrieval."
Yimam-Seid, D. and A. Kobsa (2003). "Expert-Finding Systems for Organizations: Problem and Domain Analysis and the DEMOIR Approach." Journal of Organizational Computing and Electronic Commerce 13(1): 1-24.
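Full-text usage rights
  • Authorized for on-campus browsing and printing of the electronic full text; publicly available from 2023-01-01.

  • If you have any questions, please contact the library
    Phone: (06)2757575#65773
    E-mail: etds@email.ncku.edu.tw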