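The electronic thesis has not yet been authorized for public release; for the print copy, please check the library catalog.
(Note: if no record is found, or the holding status shows "closed stacks, not public", the thesis is not in the stacks and cannot be retrieved.)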
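System ID: U0026-1706201417045000
Title (Chinese): 美食文章名稱實體辨識方法之研究
Title (English): The Method of Name Entity Recognition in Cuisine Article
University: National Cheng Kung University
Department (Chinese): 資訊管理研究所
Department (English): Institute of Information Management
Academic year: 102 (ROC calendar)
Semester: 2
Publication year: 103 (ROC calendar; 2014)
Student (Chinese): 黃品瑞
Student (English): Ping-Ruei Huang
Student ID: R76014048
Degree: Master's
Language: Chinese
Pages: 51
Committee: Advisor: 王惠嘉
Committee member: 盧文祥
Committee member: 高宏宇
Committee member: 劉任修
Keywords (Chinese): 餐廳名稱 (restaurant name); 美食名稱 (cuisine name); 部落格探勘 (blog mining); 名稱實體擷取 (named entity extraction); 搜尋引擎 (search engine)
Keywords (English): restaurant name; cuisine name; blog mining; Name Entity Recognition; search engine
Subject classification: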
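Abstract (Chinese, translated): As lifestyles have grown more refined, culinary tourism has become popular in recent years. Taiwanese cuisine and street food have been drawing international attention, and many foreign tourists now visit for them. Before deciding what to eat, many people rely on the recommendations of others. Among Web 2.0 community platforms, blogs hold a wealth of cuisine-related information and knowledge, and these shared opinions and exchanged information serve as a reference for many people when they make decisions.
In addition, the growth of mobile devices has given rise to geographic information systems (GIS) and related location-based services (LBS), and many users query for the data they need based on their current location. When browsing blog posts, however, limited screen size often makes reading inconvenient. How to quickly obtain the important named entities from a post is therefore a topic worth studying.
Because of the display limits of handheld devices, the important information must be selected concisely and accurately. A common problem is that the wrong terms are extracted, which makes the information imprecise. To improve on this, many earlier studies tried to find the important terms in these complex, unstructured blog posts. Name Entity Recognition (NER) has therefore become an important task. Once the named entities are obtained, a further task is to determine the author's opinion of a specific dish, which is an application of opinion mining.
In summary, this study designs an NER method that draws on observed writing habits and the help of a search engine to improve the accuracy of restaurant name extraction from cuisine articles. It then takes the feature terms obtained by segmenting the cuisine names in the dataset, filters them with a search engine, and recombines them to find the cuisine names in a post. Next, cuisine names are matched with opinion words to form paired information. Finally, the address information in a post is converted to latitude and longitude and used to build a cuisine map that presents the results to users as a reference when looking for food.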
Abstract (English): Among Web 2.0 community platforms, blogs hold a wealth of information and knowledge about cuisine, which many people consult when making decisions. When reading blog posts on handheld devices, however, the limited screen size means the important information must be extracted concisely and accurately. The most common problem is capturing the wrong words. Previous research tried to find the important words in complex, unstructured blog posts, but did not succeed. For these reasons, Name Entity Recognition becomes an important task. We design a Name Entity Recognition method that covers restaurant name entities and cuisine name entities to get the important words in an article. To obtain restaurant name entities, we combine traditional Name Entity Recognition methods (informativeness scores) with a search engine. In addition, we observe authors' writing habits to adjust the informativeness scores. Furthermore, we use the results of word segmentation and a search engine to obtain cuisine name entities. The results show that the accuracy of named entity recognition improves. After obtaining cuisine name entities, we capture the author's opinion by observing the sequence and part of speech of words. Finally, we use this information to construct a cuisine system that provides users with information such as restaurant name, cuisine name, and opinion.
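The abstract refers to informativeness scores without defining them, and the thesis outline names IDF and Residual IDF among those reviewed. As a rough illustration only, the sketch below computes both over a toy tokenized corpus, using the standard definition of Residual IDF as observed IDF minus the IDF expected under a Poisson model. The function name `informativeness` and the sample tokens are assumptions made for this example and are not taken from the thesis, whose actual scoring and adjustments may differ.

```python
import math
from collections import Counter

def informativeness(docs):
    """Return {term: (idf, residual_idf)} for a list of tokenized documents.

    Hypothetical helper for illustration; not the thesis implementation.
    """
    n_docs = len(docs)
    doc_freq = Counter()   # number of documents that contain the term
    coll_freq = Counter()  # total occurrences of the term in the collection
    for tokens in docs:
        coll_freq.update(tokens)
        doc_freq.update(set(tokens))

    scores = {}
    for term, df in doc_freq.items():
        idf = -math.log2(df / n_docs)
        # IDF expected if the occurrences were scattered by a Poisson process
        rate = coll_freq[term] / n_docs
        expected_idf = -math.log2(1.0 - math.exp(-rate))
        scores[term] = (idf, idf - expected_idf)
    return scores

# Toy corpus (assumed data): "dintaifung" is concentrated in one post.
docs = [
    ["dintaifung", "dintaifung", "dintaifung", "queue"],
    ["tasty", "dumpling"],
    ["tasty"],
    ["tasty", "recommend"],
]
scores = informativeness(docs)
```

A term concentrated in a few documents, as a restaurant name tends to be, gets a higher Residual IDF than a term with the same document frequency that occurs only once, which is why such scores are used as candidate signals for named entities.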
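Table of contents: 1. Introduction
1.1 Research Background
1.2 Research Motivation and Objectives
1.3 Research Scope and Limitations
1.4 Research Process
1.5 Thesis Outline
2. Literature Review
2.1 Named Entity Recognition
2.1.1 Inverse Document Frequency (IDF)
2.1.2 xI measure
2.1.3 Residual IDF
2.1.4 Gain
2.2 Natural Language Processing
2.2.1 Part-of-Speech Tagging
2.2.2 Vector Space Model
2.2.3 Chinese Word Segmentation
2.3 Machine Learning
2.3.1 Supervised Machine Learning
2.3.2 Unsupervised Machine Learning
2.3.3 Semi-supervised Machine Learning
2.4 Geographic Information Systems
2.4.1 Google Maps
2.5 Opinion Mining
2.6 Summary
3. Research Method
3.1 Research Framework
3.2 Data Preprocessing Module
3.3 Named Entity Extraction Module
3.3.1 Geographic Information
3.3.2 Restaurant Names
3.3.3 Cuisine Names
3.4 Cuisine Opinion Extraction Module
4. System Implementation and Validation
4.1 System Environment
4.2 Experimental Method
4.2.1 Data Sources
4.2.2 Evaluation Metrics
4.3 Experimental Results and Analysis
4.4 Sample System Screens
4.5 Summary
5. Conclusions and Future Directions
5.1 Research Results
5.2 Future Research Directions
References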
Full-text access rights
  • Authorized for on-campus browsing and printing of the electronic full text; public from 2019-06-27.