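The electronic thesis has not yet been authorized for public release; for the print copy, please check the library catalog.
(Note: if the thesis cannot be found, or its holdings status shows "closed stacks, not public", the copy is not in the stacks and cannot be retrieved.)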
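System ID U0026-0807201315222100
Thesis title (Chinese) 改善中文產品特徵擷取於3C商品評價之研究
Thesis title (English) A Method of Improving Chinese Product Feature Extraction in 3C Product Evaluation
Institution National Cheng Kung University (成功大學)
Department (Chinese) 資訊管理研究所
Department (English) Institute of Information Management
Academic year 101 (ROC calendar)
Semester 2
Publication year 102 (ROC calendar; 2013)
Author (Chinese) 張薰云
Author (English) Hsun-Yun Chang
Student ID R76004124
Degree Master's
Language Chinese
Pages 67
Oral examination committee Advisor-王惠嘉
Committee member-盧文祥
Committee member-高宏宇
Committee member-劉任修
Chinese keywords 中文斷詞改善  產品特徵擷取  意見偵測  產品評價 
English keywords Chinese segmentation refinement  Product feature extraction  Opinion detection  Product evaluation 
Subject classification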
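Chinese abstract Many shopping websites now exist on the Internet. Although these sites have greatly increased the convenience and variety of shopping, they also leave consumers at a loss when deciding which product to buy. Most of these sites provide product recommendation functions to give consumers purchasing advice, but with the development of Web 2.0, information sharing has become ever faster, and these recommendation results are no longer enough to meet users' information needs when buying products. 3C products in particular have short life cycles and high unit prices, so consumers tend to search the Web for opinions and reviews posted by other users who have bought the product in order to make wiser purchasing decisions. However, these review articles are usually highly unstructured content that contains both the features of the product itself and the reviewers' subjective opinions about those features. As a result, even after reading many reviews, consumers must still summarize and organize them on their own before the reviews can serve as a basis for a purchase.
Many past studies have tried to identify product features, and the opinion words that describe them, in these complex and unstructured review articles, and to use them as reviewers' evaluation information for a specific product. Most of them, however, process English reviews. Because the linguistic characteristics of Chinese and English differ greatly, processing Chinese reviews runs into many problems. This study therefore targets Chinese reviews of 3C products. It improves the results of Chinese word segmentation to increase the extraction performance for product features and opinion words, and uses the proposed method to find the features that other users care about and whether their opinions on those features are positive or negative, providing these to users as product evaluation information. Experiments show that the improved segmentation results raise the extraction effectiveness for features and opinion words, and that the revised opinion polarity calculation method increases the accuracy of judging positive and negative evaluations. The implemented system provides more complete information and helps users understand other people's evaluations of a product more quickly and easily, so that they can make better decisions when buying 3C products.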
English abstract There are many online shopping sites on the Internet. These websites increase the convenience and variety of purchasing, but they also make it harder for users to reach a purchasing decision. Although most of these sites offer product recommendations to advise consumers, with the development of Web 2.0 these results can no longer fulfill users' information needs. Because 3C products in particular have short life cycles and high prices, consumers tend to search for comments and reviews from users who have purchased the products in order to make a better decision.
However, these reviews are usually highly unstructured content that contains product features and subjective opinions about those features, so it is difficult for users to extract useful information from them. Many studies have tried to identify product features and the associated opinions in such complex, unstructured reviews to serve as an evaluation of the product, but most of them focus on English reviews. Because Chinese and English differ, processing Chinese reviews is much more difficult.
This study focuses on Chinese reviews and aims to improve the performance of feature and opinion extraction. We propose a method to identify the product features that other users care about, together with the opinions on them, and use this information as product evaluations for users. Experiments show that the improved segmentation increases the effectiveness of feature and opinion extraction, and that the revised polarity calculation method increases the accuracy of opinion polarity judgments. The system can provide more comprehensive information that helps users understand other users' evaluations of a product more quickly and easily, so that they can make better decisions when purchasing 3C products.
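Table of contents 1. Introduction
1.1. Research background and motivation
1.2. Research objectives
1.3. Research limitations and scope
1.4. Research process
1.5. Thesis outline
2. Literature review
2.1. Natural language processing
2.1.1. Chinese word segmentation
2.2. Search engines
2.2.1. Google
2.3. Product feature extraction
2.3.1. Unsupervised extraction methods
2.3.2. Supervised extraction methods
2.4. Opinion mining
2.4.1. Opinion polarity calculation
2.5. Summary
3. Research method
3.1. Research framework
3.2. Data collection and preprocessing module
3.3. Segmentation refinement module
3.3.1. Potential word detection
3.3.2. Potential word merging
3.3.3. Merge filtering rules
3.4. Opinion detection module
3.4.1. Opinion polarity calculation
3.5. Feature extraction module
3.5.1. Frequent Feature
3.5.2. Infrequent Feature
3.6. Product evaluation module
3.6.1. Product name extraction
3.6.2. Feature-Opinion pair
3.6.3. Product evaluation score
3.7. Summary
4. System implementation and validation
4.1. System implementation
4.1.1. Implementation environment
4.1.2. System processing flow
4.2. Experimental method
4.2.1. Data sources
4.2.2. Evaluation metrics
4.3. Experimental results and analysis
4.4. Sample system screens
4.5. Summary
5. Conclusions and future research directions
5.1. Research results
5.2. Future research directions
References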
Full-text access rights
  • Authorized for on-campus browsing and printing of the electronic full text, open from 2018-07-17.