進階搜尋


下載電子全文  
系統識別號 U0026-2404201201072400
論文名稱(中文) 高效性架位效益探勘演算法之研究
論文名稱(英文) A Study on Efficient Algorithms for On-shelf Utility Mining
校院名稱 成功大學
系所名稱(中) 資訊工程學系碩博士班
系所名稱(英) Institute of Computer Science and Information Engineering
學年度 100
學期 2
出版年 101
研究生(中文) 藍國誠
研究生(英文) Guo-Cheng Lan
電子信箱 rrfoheiay@gmail.com
學號 P78951189
學位類別 博士
語文別 英文
論文頁數 154頁
口試委員 口試委員-陳良弼
口試委員-陳銘憲
口試委員-蘇豐文
口試委員-曾守正
口試委員-林文揚
口試委員-謝孫源
共同指導教授-洪宗貝
指導教授-曾新穆
中文關鍵字 資料挖掘  關聯規則挖掘  效益挖掘  上限邊界  投影技術  時序資料庫  架位期間 
英文關鍵字 Data mining  association-rule mining  utility mining  upper-bound  projection technique  temporal database  on-shelf period 
學科別分類
中文摘要 近年來,資料挖掘技術已被廣泛應用在各種資料的分析應用上。特別的是因為效益挖掘接近經濟應用的相關性,所以已成為一個非常受歡迎的研究議題。不同於傳統關聯規則挖掘,在效益挖掘裡的項目集同時考慮項目在交易內購買的數量與利潤,因此,項目集的實際重要性是能被評估的。然而,這傳統模式仍然需要產生大量的候選項目集並計算它們的效益上限邊界值,所以,須耗費大量的時間成本來處理這些候選項目集。另外,商店裡的產品通常會有數次的上下架行為,若是以整個資料庫取代商品的上架期間來挖掘,則會導致挖掘結果存在著某些偏斜狀況。因此,本論文主旨在探討與研發從時序性資料庫中挖掘出高效益項目集與高架位效益項目集等資訊。
對於效益挖掘的議題,本論文分別提出了GPA (Gradual Pruning Approach)與PTA (Projection-based Upper-bound Tightening Approach)等演算法來提昇尋找高效益項目集的執行效率。這兩個方法分別採用不同的策略來改進傳統TWU模式,進而有效地緊縮項目集的效益上限邊界值。在實驗評估裡,實驗結果也呈現出所提的方法相較於採用傳統TWU模式的演算法,能在數個資料上皆有近50%的效能改善率。
關於商品的架位期間,目前尚未有相關研究在探討具有架位期間資訊的高效益項目集資訊。為了處理這問題,本論文提出一個架位效益挖掘的新研究議題,藉以獲得在時序資料庫內更精確的項目集效益值。本論文分別提了TP-HOU (Two-Phased Algorithm for Mining High On-shelf Utility Itemsets)、TS-Mine (Three-Scan-Mine)與MSA (Multi-Strategy Approach)等三個挖掘演算法來處理此問題。其中,TP-HOU是以傳統兩階段效益挖掘方法為基礎所發展的方法,而TS-Mine方法是一個三階段的執行方法,同時,也使用一個上限邊界預測策略來減少不必要的候選項目集。最後,本論文提出一個具有過濾條件的投影技術挖掘方法MSA,此方法僅需一個階段的執行,也可有效地改善項目集的架位效益上限邊界值。最後的實驗評估結果顯示在數個資料庫上,MSA方法在降低候選項目集數量與執行效率方面皆是優於TP-HOU與TS-Mine。
英文摘要 Data mining techniques have been widely used for data analysis in recent years. Especially, utility mining has become a very popular topic because of its close relation to economic applications. Different from traditional association-rule mining, an itemset in utility mining considers both profits and quantities of items in a set of transactions, and its actual significance can thus be measured. The traditional TWU model may, however, generate a large number of candidates, thus spending a lot of time in dealing with them. Besides, the on-shelf and off-shelf actions of products in a store are usually done for multiple times. If an entire database, instead of the transactions in on-shelf time periods, is used for mining, some biases may exist in the mining results. In this dissertation, we thus develop several efficient methods for mining high utility itemsets and high on-shelf utility itemsets from temporal databases.
For the issue of utility mining, two mining approaches, GPA (Gradual Pruning Approach) and PTA (Projection-based Upper-bound Tightening Approach), are proposed to speed up the execution efficiency of finding high utility itemsets. They adopt different strategies to improve the traditional TWU model and effectively tighten the upper-bounds of utility values for itemsets. The experimental evaluation shows that the proposed approaches could deliver up to 50% improvement of efficiency over the traditional TWU model on several datasets.
As to on-shelf periods of products, there has not been work yet for discovering high utility itemsets with on-shelf time periods. To address this, the dissertation then presents a new research issue, named on-shelf utility mining, to obtain more accurate utility value of each itemset in temporal databases. There are three mining algorithms including TP-HOU (Two-Phased Algorithm for Mining High On-shelf Utility Itemsets), TS-Mine (Three-Scan-Mine), and MSA (Multi-Strategy Approach), are proposed. The TP-HOU algorithm is extended from the traditional two-phase utility mining algorithm to handle the problem of on-shelf utility mining. The TS-Mine algorithm is executed in three phases, and an upper-bound prediction strategy is used to reduce the number of unpromising itemsets. Finally, the MSA algorithm, which is based on the projection technique and with an effective filtering condition, is presented. It needs only one phase and can improve the on-shelf utility upper bounds for itemsets. At last, experiments are made for comparing the three algorithms and the results show that the MSA algorithm outperforms the other two algorithms in the number of candidate itemsets and in the execution efficiency on several datasets.
論文目次 Content
中文摘要 I
ABSTRACT III
誌謝 V
Content VI
List of Tables IX
List of Figures XI
Chapter 1 Introduction 1
1.1 Motivation 4
1.2 Overview of the Dissertation 6
1.3 Organization of the Dissertation 8
Chapter 2 Background and Related Work 9
2.1 Association-Rule Mining 9
2.2 Constraint-Based Mining 11
2.3 Temporal Association-Rule Mining 12
2.4 Utility Mining 13
Chapter 3 Efficient Algorithms with Improved Upper-bound Strategies 15
3.1 Introduction 15
3.2 Problem Statement and Definitions 19
3.3 Gradual Pruning Algorithm, GPA 24
3.3.1.1 Pruning Unpromising Item Strategy 25
3.3.1.2 Reducing Data Size Strategy 29
3.3.1.3 The Proposed Utility Mining Algorithm GPA 30
3.3.2 An Example of Using the GPA Algorithm 32
3.4 Projection-based Tightening Approach, PTA 38
3.4.1.1 The Strategy of Pruning Unpromising Items 39
3.4.1.2 The Filtering Strategy from Item Relationship 43
3.4.1.3 The Indexing Mechanism 46
3.4.1.4 The Proposed Utility Mining Algorithm, PTA 47
3.4.2 An Example of Using the PTA algorithm 52
3.5 Experimental Evaluation 58
3.5.1 Experimental Datasets 59
3.5.2 Effectiveness of the Upper-Bound Tightening Strategy 61
3.5.3 Efficiency Evaluation 63
3.5.4 Evaluation on Real Datasets BMS-POS and Chain-store 66
3.6 Summary 68
Chapter 4 Efficient Algorithms for Mining High On-shelf Utility Itemsets 69
4.1 Introduction 69
4.2 Problem Statement and Definitions 73
4.3 Two-phase Algorithm TP-HOU 82
4.3.1.1 The PTTU Table 83
4.3.1.2 The Temporary Itemset 84
4.3.1.3 The Proposed Two-phase Algorithm for High On-shelf Utility Itemsets, TP-HOU 85
4.3.2 An Example of Using the TP-HOU Algorithm 89
4.4 Three-scan Mining Algorithm TS-Mine 95
4.4.1.1 The Itemset Generation 95
4.4.1.2 The Upper Bound of On-shelf Utility 98
4.4.1.3 The Proposed Three-scan Mining Algorithm TS-Mine 99
4.4.2 An Example of Using the TS-Mine Algorithm 102
4.5 Multi-strategy Algorithm MSA 109
4.5.1.1 The Recognizing Strategy for Unpromising Items 110
4.5.1.2 The Upper-Bound Tightening Strategy 111
4.5.1.3 The Proposed Multi-strategy Mining Algorithm, MSA 114
4.5.2 An Example of Using the MSA Algorithm 119
4.6 Experimental Evaluation 126
4.6.1 Experimental Datasets 126
4.6.2 Evaluation on Number of Promising On-shelf Utility Itemsets 128
4.6.3 Efficiency Evaluation 130
4.6.4 Efficiency Evaluation for MSA 132
4.6.5 Evaluation on a Real Dataset 135
4.7 Summary 136
Chapter 5 Conclusions and Future Work 137
5.1 Conclusions 137
5.2 Future Work 140
VITA 143
Bibliography 144
Publications 151
參考文獻 [1] A. Adhikari and P. R. Rao, “Synthesizing Heavy Association Rules from Different Real Data Sources,” Pattern Recognition Letters, Vol. 29, No. 1, pp. 59-71, 2005.
[2] R. Agrawal, T. Imieliński and A. Swami, “Mining Association Rule between Sets of Items in Large Databases,” The ACM SIGMOD International Conference on Management of Data, pp. 207-216, May 1993.
[3] R. Agrawal, R. Srikant and Q. Vu, “Mining Association Rules with Item Constraints,” The 3rd International Conference on Knowledge Discovery in Databases and Data Mining, Newport Beach, California, pp. 66-73, 1997.
[4] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules in Large Databases,” The 20th International Conference on Very Large Data Bases, pp. 478-499, Sept. 1994.
[5] R. Agrawal and R. Srikant, “Mining Sequential Patterns,” The 11th International Conference on Data Engineering, pp. 3-14, Mar. 1995.
[6] C. F. Ahmed, S. K. Tanbeer, and B. S. Jeong, “A Novel Approach for Mining High-utility Sequential Patterns in Sequence Databases,” ETRI Journal, Vol. 32, No. 5, pp. 676-686, 2010.
[7] C. F. Ahmed, S. K. Tanbeer, B. S. Jeong, and Y. K. Lee, “Efficient Tree Structures for High Utility Pattern Mining in Incremental Databases,” IEEE Transactions on Knowledge and Data Engineering, Vol. 21, No. 3, pp. 1708-1721, 2009.
[8] J. M. Ale and G. H. Rossi, “An Approach to Discovering Temporal Association Rules,” The 2000 ACM Symposium on Applied Computing, pp. 294-300, 2000.
[9] L. M. Aouad, N. A. Le-Khac, and M. T. Kechadi, “Performance Study of Distributed Apriori-like Frequent Itemsets Mining,” Knowledge and Information Systems, Vol. 23, No. 1, pp. 55-72, 2010.
[10] M. Boley and H. Grosskreutz, “Approximating the Number of Frequent Sets in Dense Data,” Knowledge and Information Systems, Vol. 21, No. 1, pp. 65-89, 2009.
[11] F. Bonchi and C. Lucchese, “Pushing Tougher Constraints in Frequent Pattern Mining,” The Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 114-124. 2005.
[12] F. Bonchi, F. Giannotti, A. Mazzanti, and D. Pedreschi, “ExAMiner: Optimized Level-wise Frequent Pattern Mining with Monotone Constraints,” The 3rd IEEE International Conference on Data Mining, pp. 11-18, 2003.
[13] J. F. Boulicaut and B. Jeudy, “Using Constraints during Itemset Mining: Should We Prune or Not? ,” The Bases de Données Avancées BDA'00, pp. 221-237, 2000.
[14] R. Chan, Q. Yang, and Y. Shen, “Mining High Utility Itemsets,” The 3rd IEEE International Conference on Data Mining, pp. 19-26, 2003.
[15] J. H. Chang, “Mining Weighted Sequential Patterns in a Sequence Database with a Time-interval Weight,” Knowledge Based Systems, Vol. 24, No.1, pp.1-9, 2011.
[16] C. Y. Chang, M. S. Chen, and C. H. Lee, “Mining General Temporal Association Rules for Items with Different Exhibition Periods,” The Third IEEE International Conference on Data Mining, pp. 59-66, 2002.
[17] C. J. Chu, Vincent S. Tseng, and T. Liang, “An Efficient Algorithm for Mining High Utility Itemsets with Negative Item Values in Large Databases,” Applied Mathematics and Computation, Vol. 215, No. 2, pp. 767-778, 2008.
[18] C. J. Chu, V. Tseng, and T. Liang, “Mining Temporal Rare Utility Itemsets in Large Databases using Relative Utility Thresholds,” International Journal of Innovative Computing, Information and Control, Vol. 4, No. 8, pp. 2775-2792, 2008.
[19] Y. L. Chen, K. Tang, R. J. Shen, and Y. H. Hu, “Market Basket Analysis in a Multiple Store Environment,” Decision Support Systems, Vol. 40, No. 2, pp. 339-354, 2005.
[20] A. Erwin, R. P. Gopalan, and N. R. Achuthan, “A Bottom-up Projection Based Algorithm for Mining High Utility Itemsets,” The 2nd Workshop on Integrating AI and Data Mining, pp. 3-11, 2007.
[21] A. Erwin, R. P. Gopalan, and N. R. Achuthan, “CTU-Mine – An Efficient High Utility Itemset Mining Algorithm using the Pattern Growth Approach,” The 7th IEEE International Conferences on Computer and Information Technology, pp. 71-76, 2007.
[22] Frequent Itemsets Mining Dataset Repository. Available at (http://fimi.cs.helsink i.fi/data/).
[23] J. Han, J. Pei, and Y. Yin, “Mining Frequent Patterns without Candidate Generation,” The ACM SIGMOD International Conference on Management of Data, pp. 1-12, 2000.
[24] J. Han, V. S. Lakshmanan, and R. T. Ng, “Constraint-based, Multidimensional Data Mining,” IEEE Computer, Vol. 32, No. 8, pp. 46-50, 1999.
[25] J. Hu and A. Mojsilovic, “High-utility Pattern Mining: A Method for Discovery of High-utility Item Sets,” Pattern Recognition, Vol. 40, No. 11, pp. 3317-3324, 2007.
[26] IBM Quest Data Mining Project, “Quest Synthetic Data Generation Code,” Available at (http://www.almaden.ibm.com/cs/quest/syndata.html).
[27] G. C. Lan, T. P. Hong and Vincent S. Tseng, “A Projection-based Approach for Discovering High Average-utility Itemsets,” The Journal of Information Science and Engineering, Special Issue on Artificial Intelligence, Vol. 28, No. 1, pp. 193-209, 2011.
[28] G. C. Lan, T. P. Hong, and Vincent S. Tseng, “Discovery of High Utility Itemsets through a Projection-based Approach,” accepted and to appear in Knowledge and Information Systems, 2012.
[29] C. H. Lee, C. R. Lin, and M. S. Chen, “On Mining General Temporal Association Rules in a Publication Database,” The 2001 IEEE International Conference on Data Mining, pp. 337-344, 2001.
[30] H. F. Li, “MHUI-max: An Efficient Algorithm for Discovering High-utility Itemsets from Data Streams,” Journal of Information Science, Vol. 37, No. 5, pp. 532-545, 2011.
[31] H. F. Li, H. Y. Huang, and S. Y. Lee, “Fast and Memory Efficient Mining of High Utility Itemsets from Data Streams: with and without Negative Item Profits,” Knowledge and Information Systems, Vol. 28, No. 3, pp. 495-522, 2011.
[32] Y. Li, P. Ning, X. S. Wang, and S. Jajodia, “Discovering Calendar-based Temporal Association Rules,” Data & Knowledge Engineering, Vol. 44, No. 2, pp. 193-218, 2003.
[33] Y. C. Li, J. S. Yeh, and C. C. Chang, “Isolated Items Discarding Strategy for Discovering High Utility Itemsets,” Data & Knowledge Engineering, Vol. 64, No. 1, pp. 198-217, 2008.
[34] D-I. Lin and Z. M. Kedem, “Pincer-search: a New Algorithm for Discovering the Maximum Frequent Set,” The 6th International Conference on Extending Database Technology, pp. 103-119, 1998.
[35] C. W. Lin, T. P. Hong, and W. H. Lu, “Efficiently Mining High Average Utility Itemsets with a Tree Structure,” The 2th Asian Conference on Intelligent Information and Database Systems, pp. 131-139, 2010.
[36] J. Liu, Y. Pan, K. Wang, and J. Han, “Mining Frequent Item Sets by Opportunistic Projection,” The International Conference on Knowledge Discovery in Databases, pp. 229-238, 2002.
[37] Y. Liu, W. Liao, and A. Choudhary, “A Fast High Utility Itemsets Mining Algorithm,” The Utility-Based Data Mining Workshop, pp. 90-99, 2005.
[38] Microsoft Corporation, Example Database FoodMart of Microsoft Analysis Services.
[39] R. T. Ng, V. S. Lakshmanan, J. Han, and A. Pang, “Exploratory Mining and Pruning Optimizations of Constrained Association Rules,” The ACM SIGMOD International Conference on Management of Data, pp. 13-24, 1998.
[40] J. Pei, J. Han, and V. S. Lakshmanan, “Mining Frequent Item Sets with Convertible Constraints,” The 17th International Conference on Data Engineering, pp. 433-442, 2001.
[41] J. Pei, J. Han, B. Mortazavi-Asl, J. Wang, H. Pinto, Q. Chen, U. Dayal, and M. Hsu, “Mining Sequential Patterns by Pattern-growth: The PrefixSpan Approach,” IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 11, pp. 1424-1440, 2004.
[42] T. P. Hong, C. H. Lee, S. L. Wang, “An Incremental Mining Algorithm for High Average-Utility Itemsets,” The 10th International Symposium on Pervasive Systems, pp. 421-425, 2009.
[43] T. P. Hong, C. H. Lee, and S. L. Wang, “Effective Utility Mining with the Measure of Average Utility,” Expert System with Applications, Vol. 38, No. 7, pp. 8259-8265, 2011.
[44] B. Ozden, S. Ramaswamy, and A. Silberschatz. “Cyclic Association Rules,” The 14th International Conference on Data Engineering, Orlando, Florida, USA, pp. 12-421, 1998.
[45] R. R. Rajalaxmi and A. M. Natarajan, “A Novel Sanitization Approach for Privacy Preserving Utility Itemset Mining,” Computer and Information Science, Vol. 1, No. 3, pp. 77-82, 2008.
[46] J. F. Roddick and M. Spiliopoilou, “A Survey of Temporal Knowledge Discovery Paradigms and Methods,” IEEE Transactions on Knowledge and Data Engineering, Vol. 14, No. 4, pp. 750-767, 2002.
[47] K. Saxena, “Efficient Mining of Weighted Temporal Association Rules,” The 2009 World Congress on Computer Science and Information Engineering, pp.421-425, 2009.
[48] R. Srikant and R. Agrawal, “Mining Generalized Association Rules,” The 21st International Conference on Very Large Data Bases, pp. 407-419, 1995.
[49] R. Srikant and R. Agrawal, “Mining Quantitative Association Rules in Large Relational Tables,” The ACM SIGMOD Conference on Management of Data (SIGMOD’96). Montreal, Canada, pp. 1-12, 1996.
[50] B. E. Shie, H. F. Hsiao, and Vincent S. Tseng, “Efficient Algorithms for Discovering High Utility User Behavior Patterns in Mobile Commerce Environments,”in Knowledge and Information Systems, 2011.
[51] Vincent S. Tseng, C. J. Chu, and T. Liang, “Efficient Mining of Temporal High Utility Itemsets from Data Streams,” The ACM KDD Workshop on Utility-Based Data Mining, pp. 1105-1117, 2006.
[52] Vincent S. Tseng, H. C. Lu, and C. H. Huang, “Mining Temporal Mobile Sequential Patterns in Location-based Service Environments,” The 13th IEEE International Conference on Parallel and Distributed Systems, pp. 1-8, 2007.
[53] C. M. Wang, S. H. Chen, and Y. F. Huang, “A Fuzzy Approach for Mining High Utility Quantitative Itemsets,” The IEEE International Conference on Fuzzy Systems, pp. 1909-1913, 2009.
[54] C. Y. Wang, S. S. Tseng, and T. P. Hong, “Flexible Online Association Rule Mining Based on Multidimensional Pattern Relations,” Information Science, Vol. 176, No. 12, pp. 1752-1780, 2006.
[55] X. Wu and S. Zhang, “Synthesizing High-frequency Rules from Different Data Sources,” IEEE Transactions on Knowledge and Data Engineering, Vol. 15, No. 2, pp. 353-367, 2003.
[56] Y. Xu, Benjamin C. M. Fung, and K. Wang, Ada W. C. Fu, J. Pei, “Publishing Sensitive Transactions for Itemset Utility,” The 8th IEEE International Conference on Data Mining, pp. 1109-1114, 2008.
[57] H. Yao and H. J. Hamilton, “Mining Itemset Utilities from Transaction Databases,” Data & Knowledge Engineering, Vol. 59, No. 3, pp. 603-626, 2006.
[58] H. Yao, H. J. Hamilton, and C. J. Butz, “A Foundational Approach to Mining Itemset Utilities from Databases,” The 4th SIAM International Conference on Data Mining, pp. 482-486, 2004.
[59] J. S. Yeh, C. Y. Chang, and Y. T. Wang, “Efficient Algorithms for Incremental Utility Mining,” The International Conference on Ubiquitous Information Management and Communication, pp. 229-234, 2008.
[60] G. Yu, K. Li, and S. Shao, “Mining High Utility Itemsets in Large High Dimensional Data,” The International Workshop on Knowledge Discovery and Data Mining, pp. 17-20, 2008.
[61] S. J. Yen and Y. S. Lee, “Mining High Utility Quantitative Association Rules,” The 9th International Conference on Data Warehousing and Knowledge Discovery, Lecture Notes in Computer Science 4654, pp. 283-292, 2007.
[62] C. H. Yun and M. S. Chen, “Mining Mobile Sequential Patterns in a Mobile Commerce Environment,” IEEE Transactions on Systems, Man, and Cybernetics, Part C, Vol. 37, No. 2, pp. 278-295, 2007.
[63] U. Yun, John J. Leggett, and T. J. Ong, “Mining Weighted Sequential Patterns Based on Length-decreasing Support Constraints,” The IEEE International Conference on Intelligence and Security Informatics, pp. 650-651, 2006.
[64] U. Yun and John J. Leggett, “WSpan: Weighted Sequential Pattern Mining in Large Sequence Databases,” The 3rd International IEEE Conference Intelligent Systems, pp. 512-517, 2006.
[65] U. Yun and K. H. Ryu, “Discovering Important Sequential Patterns with Length-decreasing Weighted Support Constraints,” International Journal of Information Technology and Decision Making, Vol. 9, No. 4, pp. 575-599, 2010.
論文全文使用權限
  • 同意授權校內瀏覽/列印電子全文服務,於2018-12-31起公開。
  • 同意授權校外瀏覽/列印電子全文服務,於2018-12-31起公開。


  • 如您有疑問,請聯絡圖書館
    聯絡電話:(06)2757575#65773
    聯絡E-mail:etds@email.ncku.edu.tw