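System ID U0026-0812200912110251
Title (Chinese) 利用多搜尋結果進行階層分群之查詢結果萃取之研究
Title (English) Query Result Distillation by Hierarchical Clustering and Result Aggregation on Multiple Search Engines
University National Cheng Kung University
Department (Chinese) 資訊工程學系碩博士班
Department (English) Institute of Computer Science and Information Engineering
Academic year 94 (ROC calendar)
Semester 2
Publication year 95 (ROC calendar; 2006)
Author (Chinese) 林致祿
Author (English) Chih-Lu Lin
Student ID p7693105
Degree Master's
Language English
Pages 44
Committee Committee member: 鄧維光
Committee member: 王惠嘉
Advisor: 高宏宇
Committee member: 盧文祥
Keywords (Chinese) 分群 (clustering)  搜尋引擎 (search engine)  中文搜尋環境 (Chinese search environment)
Keywords (English) User goal  Search engine  Clustering
Subject classification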
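Abstract (translated from Chinese) With the rapid growth of the Internet in recent years, the amount of Web resources we can reach has kept increasing, but problems have come with it, for example the lack of an effective way to find useful resources. Many effective solutions to this problem exist, and among them search engines and their related techniques have developed most vigorously, yet some problems remain to be solved. This thesis addresses two of them: (1) for a short query, it is hard for a search engine to understand the user's goal, and thus hard to point the user to the resources actually wanted; (2) a search engine is like a library without borders: although Web pages are indexed, once the index grows too large a good clustering method is still needed to group the many results by the topics they describe, so that the search engine becomes more convenient for users. The focus of this thesis is therefore to extend previous work on clustering. By analyzing search results, it identifies new usable features and a clustering method suited to the Chinese search environment, which generates readable cluster names for users in advance and makes search engines more convenient to use. In addition, this thesis studies whether a given query is suitable for clustering at all, in order to avoid unnecessary processing that would add to the user's reading burden.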

Abstract (English) With the rapid development of the network environment in recent years, more and more Web resources have become available. However, new problems have followed, e.g., the lack of an effective method for finding those resources. The birth of search engines addressed this problem, but other problems and issues remain to be resolved.
This thesis considers two of them: (1) for a short query, it is difficult for a search engine to understand the user's search goal, and therefore difficult to provide Web resources related to that goal; (2) there is no effective method for helping users find the information they need within a search engine's enormous index. This thesis therefore focuses on continuing and improving previous work on clustering, and also studies whether it can be decided in advance if a query should be clustered or not, in order to avoid additional overhead for both search engines and users.


Table of Contents CHINESE ABSTRACT
ABSTRACT
ACKNOWLEDGEMENTS
TABLE LISTING
FIGURE LISTING
1. INTRODUCTION
2. MOTIVATION
3. ISSUES IN THIS PROBLEM
 3.1 THE CLUSTER NAME
 3.2 THE EVALUATION METHOD
 3.3 USING DIVERSE RESULTS OF DIFFERENT SEARCH ENGINES
 3.4 CHINESE SEARCHING RESULTS
4. RELATED WORK
 4.1 OVERALL DESCRIPTION
 4.2 DESCRIPTION OF THE PRESENT WEB SEARCH ENVIRONMENT
 4.3 OVERALL INTRODUCTION OF PREVIOUS METHODS
  4.3.1 Traditional methods
  4.3.2 Extended version of traditional methods
  4.3.3 Suffix Tree Clustering (STC)
  4.3.4 Clustering under the network environment or present search engine
 4.4 DESCRIPTION OF ZHENG'S METHOD [16]
  4.4.1 Search Result Fetching
  4.4.2 Document Parsing and Phrase Property Calculation
  4.4.3 SVR brief description
5. EXPERIMENT
 5.1 PRE-CLUSTERING ANALYSIS
  5.1.1 Method description
  5.1.2 Real Cases
 5.2 OUR PROPOSED METHOD
  5.2.1 Description
  5.2.2 Data set
  5.2.3 Description of experiment's property
  5.2.4 URL structure
  5.2.5 About filtering method of Chinese search
  5.2.6 Combining Results of Multiple Search Engines
  5.2.7 Experiment Result and Discussion
 5.3 OUR RESEARCH ON HIERARCHICAL CLUSTER
  5.3.1 Description
  5.3.2 Dataset and Algorithm
  5.3.3 Evaluation
6. CONCLUSION AND FUTURE WORK
7. REFERENCES
References [1] D. Beeferman and A. Berger. Agglomerative clustering of a search engine query log. In Proceedings of ACM SIGKDD '00, 2000.
[2] D. R. Cutting, D. R. Karger, and J. O. Pedersen. Constant Interaction-Time Scatter/Gather Browsing of Very Large Document Collections. In Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'93), pages 125-135, Pittsburgh, PA, 1993.
[3] C. C. Chang and C. J. Lin. LIBSVM: A library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.ps.gz
[4] N. Eiron and K. S. McCurley. Analysis of anchor text for Web search. In Proceedings of ACM SIGIR '03, 2003.
[5] M. A. Hearst and J. O. Pedersen. Reexamining the Cluster Hypothesis: Scatter/Gather on Retrieval Results. In Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'96), Zurich, June 1996.
[6] I. Kang and G. Kim. Query type classification for web document retrieval. In Proceedings of ACM SIGIR '03, 2003.
[7] R. Kraft and J. Zien. Mining anchor text for query refinement. In Proceedings of the Thirteenth Int'l. World Wide Web Conf., 2004.
[8] D. Lawrie and W. B. Croft. Finding Topic Words for Hierarchical Summarization. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'01), pages 349-357, 2001.
[9] B. Lent, R. Agrawal, and R. Srikant. Discovering Trends in Text Databases. In Proceedings of the 3rd Int'l Conference on Knowledge Discovery in Databases and Data Mining (KDD'97), Newport Beach, California, August 1997.
[10] A. V. Leouski and W. B. Croft. An Evaluation of Techniques for Clustering Search Results. Technical Report IR-76, Department of Computer Science, University of Massachusetts, Amherst, 1996.
[11] A. Leuski and J. Allan. Improving Interactive Retrieval by Combining Ranked List and Clustering. In Proceedings of RIAO, College de France, pages 665-681, 2000.
[12] B. Liu, C. W. Chin, and H. T. Ng. Mining Topic-Specific Concepts and Definitions on the Web. In Proceedings of the Twelfth International World Wide Web Conference (WWW'03), Budapest, Hungary, 2003.
[13] U. Lee, Z. Liu, and J. H. Cho. Automatic identification of user goals in Web search. In Proceedings of the 14th International Conference on World Wide Web, Chiba, Japan, May 10-14, 2005.
[14] D. E. Rose and D. Levinson. Understanding user goals in Web search. In Proceedings of the Thirteenth Int'l. World Wide Web Conf., 2004.
[15] C. Silverstein, M. Henzinger, H. Marais, and M. Moricz. Analysis of a very large Web search engine query log. SIGIR Forum, 33(1):6-12, 1999.
[16] H. J. Zheng, Q. C. He, Z. Chen, W. Y. Ma, and J. Ma. Learning to cluster Web search results. In Proceedings of SIGIR '04, pages 210-217, 2004.
[17] O. Zamir and O. Etzioni. Web Document Clustering: A Feasibility Demonstration. In Proceedings of the 21st International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'98), pages 46-54, 1998.
[18] O. Zamir and O. Etzioni. Grouper: A Dynamic Clustering Interface to Web Search Results. In Proceedings of the Eighth International World Wide Web Conference (WWW8), Toronto, Canada, May 1999.
[19] Google, http://www.google.com
[20] Yahoo, http://tw.yahoo.com
[21] Vivisimo, http://vivisimo.com
[22] MSN search, http://www.msn.com.tw
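Full-text access rights
  • Authorized for on-campus browsing/printing of the electronic full text; publicly available from 2006-08-29.
  • Authorized for off-campus browsing/printing of the electronic full text; publicly available from 2006-08-29.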


  • If you have any questions, please contact the library.
    Phone: (06)2757575#65773
    E-mail: etds@email.ncku.edu.tw