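System ID U0026-0808201611144100
Title (Chinese) 基於社群媒體情感分析歸納產品屬性優缺點
Title (English) Aspect-based Pros and Cons Summarization from Social Media Sentiment Analysis
Institution National Cheng Kung University (成功大學)
Department (Chinese) 資訊工程學系
Department (English) Institute of Computer Science and Information Engineering
Academic year 104 (ROC calendar)
Semester 2
Publication year 105 (ROC calendar, 2016)
Author (Chinese) 沈昱成
Author (English) Yu-Cheng Shen
Student ID P76034402
Degree Master's
Language English
Pages 44
Oral defense committee Advisor-蔣榮先
Committee member-高宏宇
Committee member-吳宗憲
Committee member-盧文祥
Committee member-蔡正發
Keywords (Chinese) 屬性擷取  情感分析  循序樣式挖掘  基於屬性的情感分析  資訊擷取 
Keywords (English) Aspect extraction  Sentiment analysis  Sequential pattern mining  Aspect-based sentiment analysis  Information extraction 
Subject classification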
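Abstract (Chinese) More and more users share or exchange opinions on all kinds of topics through social media. The sentiments and opinions users express in their posts are a useful resource for consumers and companies and can help them make decisions. Because social media data grows quickly and in great volume, however, it is hard for people to read it quickly and summarize users' sentiment orientation or opinions. Previous studies have usually applied sentiment analysis methods to mine and summarize users' opinions about a specific item at the document level. Extracting sentiment orientation at the document level has drawbacks: we cannot tell which aspects, attributes, or features of the item users are happy or unhappy about. Sequential pattern mining is one method that can be applied to extract the aspects of a product. However, traditional sequential pattern mining has two problems when applied to aspect extraction: a lattice structure problem and a flexibility problem. Moreover, even if we can accurately extract a product's aspects and classify their sentiment orientation, there are still too many aspects, and users still find it hard to summarize the information they want to see. To address these problems, this study proposes an aspect-based pros and cons summarization system, incorporating our proposed flexible sequential rule mining method, to help users aggregate information from social media and make further decisions.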
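The aim of this study is to develop an aspect-based pros and cons summarization system. We use the proposed flexible sequential rule mining method to discover sequential rules and use these rules to extract relevant product aspects, and then classify the sentiment orientation of each aspect in a sentence with a machine-learning approach. Once aspects have been extracted and their sentiment orientation classified, we use the aspects and their sentiment orientation to summarize some aspects as pros and cons for a given query (product).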
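In the experiments, we built an experimental dataset to evaluate the accuracy and feasibility of product aspect retrieval and aspect-based sentiment classification. In the aspect extraction experiments, the results show that the proposed flexible sequential rule mining method has good precision but lower recall, and that it performs better than the baseline methods. In the aspect-based sentiment classification experiments, the results show that aspect retrieval plays an important role in classifying the sentiment of product aspects, and that this method also performs better than the baseline methods. Finally, we selected the two products with the smallest and largest amounts of data to discuss the pros and cons found for them. The results show that performance is fairly good on the small dataset but not very good on the large one: most of the experimental data leans toward one domain, while this product is relatively new, so the rules generated from the experimental data may not apply well to it.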
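In this study, we successfully developed an aspect-based pros and cons summarization system. Through evaluation on the experimental dataset we built and a case study, we verified the usability and effectiveness of the system. Finally, we expect this system to help people read and summarize sentiment orientation from social media, and to further assist users in making decisions or deciding marketing strategies.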
Abstract (English) More and more users share opinions on a variety of topics through social media. The opinions and sentiments people express there are a useful resource that can help consumers and companies make decisions. Because of the volume and rapid growth of this data, however, it is difficult for people to read it and summarize its sentiment orientation. Previous studies have often used sentiment analysis methods to mine and summarize user opinions of items at the document level. However, extracting sentiment orientation at the document level has limitations: we cannot determine which aspects of an item users are happy or unhappy about. Sequential pattern mining can be used to extract such aspects, but traditional sequential pattern mining has two problems for aspect extraction: a lattice structure problem and a flexibility problem. Furthermore, even if we extract these aspects and classify their sentiment orientation precisely, there will be too many aspects for people to summarize the information they want to see. To address these problems, this study proposes a framework for aspect-based pros and cons summarization, using a proposed flexible sequential rule mining method, to help users summarize information and make decisions based on social media.
The aim of this study is to develop an aspect-based pros and cons summarization system. Flexible sequential rule mining is proposed to discover sequential rules, and the rules are then used to extract the aspects of an item. Next, the sentiment orientation of each aspect is classified with a machine-learning approach. Once the aspects have been classified, we use the aspects and their sentiment information to summarize pros and cons for a given query (item) based on social media.
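The final summarization step described above can be illustrated with a minimal sketch. It assumes the earlier steps have already produced (aspect, sentiment label) pairs, and it ranks aspects by a simple net count of positive minus negative mentions. The function name, the label strings, the `top_k` parameter, and the net-count scoring are all assumptions made for illustration; the record does not state how the thesis ranks aspects.

```python
from collections import Counter


def summarize_pros_cons(aspect_sentiments, top_k=3):
    """Rank aspects by net sentiment (hypothetical scoring, not the thesis's)."""
    net = Counter()
    for aspect, label in aspect_sentiments:
        if label == "positive":
            net[aspect] += 1
        elif label == "negative":
            net[aspect] -= 1
        # Any other label (e.g. "neutral") leaves the score unchanged.

    # Pros: highest positive net score first; ties broken alphabetically.
    by_score_desc = sorted(net.items(), key=lambda kv: (-kv[1], kv[0]))
    pros = [aspect for aspect, score in by_score_desc if score > 0][:top_k]

    # Cons: most negative net score first; ties broken alphabetically.
    by_score_asc = sorted(net.items(), key=lambda kv: (kv[1], kv[0]))
    cons = [aspect for aspect, score in by_score_asc if score < 0][:top_k]
    return pros, cons


# Made-up input pairs for a hypothetical phone query.
example = [
    ("battery", "negative"), ("battery", "negative"),
    ("camera", "positive"), ("camera", "positive"), ("camera", "negative"),
    ("screen", "positive"), ("price", "negative"),
]
pros, cons = summarize_pros_cons(example)
print("Pros:", pros)
print("Cons:", cons)
```

A net count is only one possible choice; a ratio of positive to total mentions would rank frequently discussed aspects differently.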
In the experiments, we built an experimental dataset to evaluate the accuracy of aspect retrieval and aspect-based sentiment classification. In the experiments on aspect retrieval, the results showed that flexible sequential rule mining has high precision but low recall, and that it outperformed the baseline methods. In the experiments on aspect-based sentiment classification, the results showed that aspect retrieval is an important part of the sentiment classification of aspects, and our model again outperformed the baseline methods. Finally, we selected the two products with the smallest and largest quantities of data to discuss the results of pros and cons summarization. Performance on the small dataset was better than on the large dataset. Most of the experimental data belonged to a particular domain, whereas the product with the largest dataset is relatively new, so the rules generated from the experimental dataset may not apply well to it.
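To make the "high precision but low recall" result concrete, the sketch below computes precision, recall, and F1 for a set of extracted aspects against a gold-standard set. The function name and the example aspect sets are illustrative assumptions and are not taken from the thesis; exact set matching is also an assumption about how extracted aspects are compared with the gold standard.

```python
def precision_recall_f1(predicted, gold):
    """Set-based precision, recall, and F1 for extracted aspects."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up example: every extracted aspect is correct, but most gold aspects are missed.
extracted = {"battery", "camera"}
gold_standard = {"battery", "camera", "screen", "price", "design"}
p, r, f1 = precision_recall_f1(extracted, gold_standard)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In this made-up case both extracted aspects are correct, so precision is perfect, while three of the five gold aspects are missed, which is the pattern the abstract reports for rule-based extraction.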
In this study, we developed an aspect-based pros and cons summarization system. By building an experimental dataset for evaluation and conducting a case study, we verified the usability and validity of our system. Finally, we expect the system to help people read and summarize sentiment orientation from social media, which will assist them in making decisions or determining marketing strategies.
Table of contents
Chinese Abstract
Abstract
Acknowledgements
Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Background and Motivation
1.2 Research Objective and Specific Aims
1.3 Thesis Organization
Chapter 2 Related Work
2.1 Aspect Extraction
2.1.1 Frequency-based approach
2.1.2 Machine-Learning approach
2.1.3 Rule-based approach
2.2 Sequential Pattern Mining
2.3 Sentiment Analysis
2.3.1 Lexicon based approach
2.3.2 Machine-Learning based approach
Chapter 3 Aspect-based Pros and Cons Summarization
3.1 Data Collection
3.2 Preprocessing
3.3 Aspect Retrieval
3.3.1 Sentiment Lexicons Anonymization
3.3.2 Aspect & Noun Anonymization
3.3.3 Rule Generation
3.3.4 Aspect Extraction
3.3.5 Post-processing
3.4 Feature Extraction
3.5 Sentiment Classification
3.6 Pros & Cons Summarization
Chapter 4 Experiments
4.1 Experimental Design
4.2 Experimental Dataset Collection
4.3 Evaluation Criteria
4.4 Experimental Results
4.4.1 Aspect Retrieval
4.4.2 Aspect-based sentiment classification
4.5 Case Study and Discussion
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References
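Full-text access rights
  • Authorized for on-campus browsing/printing of the electronic full text, public from 2018-08-01.
  • Authorized for off-campus browsing/printing of the electronic full text, public from 2018-08-01.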

