進階搜尋


 
系統識別號 U0026-0812200913390365
論文名稱(中文) 利用頻繁樣式陣列(FP-array)提高互動式探勘之效率
論文名稱(英文) An Efficient Method for Interactive Mining Using FP-array
校院名稱 成功大學
系所名稱(中) 資訊管理研究所
系所名稱(英) Institute of Information Management
學年度 95
學期 2
出版年 96
研究生(中文) 廖強棋
研究生(英文) Chiang-Chi Liao
學號 r7694118
學位類別 碩士
語文別 英文
論文頁數 48頁
口試委員 口試委員-王清正
口試委員-李賢得
指導教授-利德江
中文關鍵字 頻繁項目集  探勘方法與演算法  資料探勘  關聯規則 
英文關鍵字 mining methods and algorithms  frequent itemset  Association rule  data mining 
學科別分類
中文摘要 當探勘頻繁項目集(mining frequent itemset)時,通常需要多次改變最小支持度來得到所需的結果,而在傳統的演算法中,當其改變最小支持度時,則其必須重新完整執行整個演算法,這樣將會需要很長的執行時間。這篇論文提出一個有效率的方法來互動式地探勘頻繁項目集,搭配頻繁樣式陣列(FP-array)和候選項目集的概念,提出的方法只需要增加一些簡單的計算和估計的技術便可以達到在探勘過程中持續回饋(continuous feedback)。另外當重新建立頻繁樣式樹(FP-tree)時,儲存最小的最小支持度頻繁樣式樹(Least Minimum Support Frequent Pattern Tree),則這個動作將可以使整個演算法更有效率。在這個創新的方法中有四個屬性: (1)使用者可以較有效率地得到有用的頻繁項目集; (2)使用者可以在探勘的過程中自由的改變最小支持度的門檻值; (3)其將會在探勘的過程中給使用者持續回瞶; (4)其會減少在不同支持度中所需重新建立頻繁項目樹的時間。
英文摘要 When mining for frequent itemsets, one usually must change the minimum supports frequently to get expected results. Additionally, in traditional algorithms, one must re-run the algorithm completely when changing the minimum supports, and that requires a long running time. This paper proposes an efficient method to mine for frequent itemsets interactively. By collocating FP-array and candidate itemset concepts, the presented method needs only some simple computation and estimate techniques to achieve continuous feedback in the mining process. Furthermore, by storing the Least Minimum Support Frequent Pattern Tree when rebuilding the FP-tree, the action makes itself more efficient. There are four features in the novel method: (1) The user can get the useful frequent itemsets more efficiently; (2) The user is free to change the minimum support threshold during a mining process; (3) It gives continuous feedback to the users during processing; (4) It decreases the time needed for rebuilding the tree with different supports.
論文目次 1 Introduction 10
1.1 Background and motivation 10
1.2 Problem definition 10
1.3 Methodology 11
1.4 Overview of thesis 11
2 Literature Review 12
3 Methodology 17
3.1 The Table Estimate 17
3.2 Using Nonlinear Regression Model 23
3.3 The other Estimate Techniques 26
3.4 The Least Minimum Support Frequent Pattern Tree 27
3.5 Continuous Estimate Frequent Pattern Growth 28
4 Experimental Performance Evaluation 30
4.1 The Accuracy of Estimate Techniques 31
4.2 Comparing the method with FP-growth* 39
5 Conclusion 44
Reference 46
參考文獻 [1]http://www.almaden.ibm.com/cs/quest/syndata.html, 2003.
[2]http://fimi.cs.helsinki.fi, 2003.
[3]C. C. Aggarwal and P. S. Yu, “Online generation of association rules,”Proc. of the IEEE ICDE'98, 1998, pages 402-411.
[4]R. Agrawal, T. Imielinski, and A. Swami. “Mining Association Rules between Sets of Items in Large Databases.” Proc. of ACM SIGMOD, pages 207-216,May1993
[5]R. Agrawal, R. Srikant. ”Fast Algorithms for Mining Association Rules.” Proc.Of the 20th Very Large Data Bases (VLDB-94), pp. 487-499,Santiago, Chile,1994.
[6]A. Amir, R. Feldman, and R. Kashi, “A new and versatile method for association generation,” Principles of Data Mining and KnowledgeDiscovery, First European Symposium, PKDD '97, 1997, pages 221-231.
[7]S. Brin, R. Motwani, J. D. Ullman, and S. Tsur. ”Dynamic itemset counting and implication rules for market basket data.” Proc. of the ACM SIGMOD Int’l Conf.on Management of Data, 1997.
[8]W. Cheung and O.R. Zaiane. Incremental mining of frequent patterns without candidate generation or support constraint. In Proc. IDEAS 2003, pp. 111–116.
[9]D. W. Cheung, J. Han, V. T. Ng, and C. Y. Wong. “Maintenance of Discovered Association Rules in Large Databases: An Incremental Updating Technique.” In Proc. of the International Conference on Data Engineering, pages 106-114, 1996.
[10]G Grahne and J Zhu, “Fast algorithms for frequent itemset mining using FP-trees.” IEEE Transactions on Knowledge and Data Engineering, 2005,pages 1347-1362
[11]Jiawei Han, Jian Pei, and Yiwen Yin.” Mining Frequent Patterns without Candidate Generation.” ACM SIGMOD Intl. Conference on Management of Data, 2000.
[12]C. Hidber Online Association Rule Mining. In Proceedings of the 1999 International Conference on Management of Data (SIGMOD), pages 145~156, 1999.
[13]K. L. Lee, G. Lee and A. L. P. Chen. “Efficient Graph-Based Algorithms for Discovering and Maintaining Knowledge in Large Databases.” Proc.Pacific-Asia Conference on Knowledge Discovery and Data Mining, 1999.
[14]J. S. Park, M. S. Chen and P. S. Yu. “An Effective Hash Based Algorithm for Mining Association Rules.” Proc. of ACM SIGMOD, May 23-25, 1995, pp.175-186.
[15]J. Pei, J. Han, and R.Mao, CLOSET: An efficient algorithm for mining frequent closed itemsets. SIGMOD.2000.
[16]K Wang, L. Tang, J. Han, and J. Liu, “Top down FP-Growth for Association Rule Mining, ” to appear in the 6th Pacific Area Conference on Knowledge Discovery and Data Mining, May 6-8, Taipei, Taiwan, PAKDD-2002.
[17]S. J. Yen and A. L. P. Chen. “An Efficient Approach to Discovering Knowledge from Large Databases.” Proc. IEEE/ACM International Conference on Parallel and Distributed Information Systems (PDIS), 1996.
[18]M. J. Zaki, and C.-J. Hsiao, CHARM: An Efficient Algorithm for Closed Itemset Mining. SIAM International Conference on Data Mining. 2002.
論文全文使用權限
  • 同意授權校內瀏覽/列印電子全文服務,於2017-06-27起公開。
  • 同意授權校外瀏覽/列印電子全文服務,於2017-06-27起公開。


  • 如您有疑問,請聯絡圖書館
    聯絡電話:(06)2757575#65773
    聯絡E-mail:etds@email.ncku.edu.tw