||A Study on Efficient Algorithms for On-shelf Utility Mining
||Institute of Computer Science and Information Engineering
對於效益挖掘的議題，本論文分別提出了GPA (Gradual Pruning Approach)與PTA (Projection-based Upper-bound Tightening Approach)等演算法來提昇尋找高效益項目集的執行效率。這兩個方法分別採用不同的策略來改進傳統TWU模式，進而有效地緊縮項目集的效益上限邊界值。在實驗評估裡，實驗結果也呈現出所提的方法相較於採用傳統TWU模式的演算法，能在數個資料上皆有近50%的效能改善率。
關於商品的架位期間，目前尚未有相關研究在探討具有架位期間資訊的高效益項目集資訊。為了處理這問題，本論文提出一個架位效益挖掘的新研究議題，藉以獲得在時序資料庫內更精確的項目集效益值。本論文分別提了TP-HOU (Two-Phased Algorithm for Mining High On-shelf Utility Itemsets)、TS-Mine (Three-Scan-Mine)與MSA (Multi-Strategy Approach)等三個挖掘演算法來處理此問題。其中，TP-HOU是以傳統兩階段效益挖掘方法為基礎所發展的方法，而TS-Mine方法是一個三階段的執行方法，同時，也使用一個上限邊界預測策略來減少不必要的候選項目集。最後，本論文提出一個具有過濾條件的投影技術挖掘方法MSA，此方法僅需一個階段的執行，也可有效地改善項目集的架位效益上限邊界值。最後的實驗評估結果顯示在數個資料庫上，MSA方法在降低候選項目集數量與執行效率方面皆是優於TP-HOU與TS-Mine。
Data mining techniques have been widely used for data analysis in recent years. Especially, utility mining has become a very popular topic because of its close relation to economic applications. Different from traditional association-rule mining, an itemset in utility mining considers both profits and quantities of items in a set of transactions, and its actual significance can thus be measured. The traditional TWU model may, however, generate a large number of candidates, thus spending a lot of time in dealing with them. Besides, the on-shelf and off-shelf actions of products in a store are usually done for multiple times. If an entire database, instead of the transactions in on-shelf time periods, is used for mining, some biases may exist in the mining results. In this dissertation, we thus develop several efficient methods for mining high utility itemsets and high on-shelf utility itemsets from temporal databases.
For the issue of utility mining, two mining approaches, GPA (Gradual Pruning Approach) and PTA (Projection-based Upper-bound Tightening Approach), are proposed to speed up the execution efficiency of finding high utility itemsets. They adopt different strategies to improve the traditional TWU model and effectively tighten the upper-bounds of utility values for itemsets. The experimental evaluation shows that the proposed approaches could deliver up to 50% improvement of efficiency over the traditional TWU model on several datasets.
As to on-shelf periods of products, there has not been work yet for discovering high utility itemsets with on-shelf time periods. To address this, the dissertation then presents a new research issue, named on-shelf utility mining, to obtain more accurate utility value of each itemset in temporal databases. There are three mining algorithms including TP-HOU (Two-Phased Algorithm for Mining High On-shelf Utility Itemsets), TS-Mine (Three-Scan-Mine), and MSA (Multi-Strategy Approach), are proposed. The TP-HOU algorithm is extended from the traditional two-phase utility mining algorithm to handle the problem of on-shelf utility mining. The TS-Mine algorithm is executed in three phases, and an upper-bound prediction strategy is used to reduce the number of unpromising itemsets. Finally, the MSA algorithm, which is based on the projection technique and with an effective filtering condition, is presented. It needs only one phase and can improve the on-shelf utility upper bounds for itemsets. At last, experiments are made for comparing the three algorithms and the results show that the MSA algorithm outperforms the other two algorithms in the number of candidate itemsets and in the execution efficiency on several datasets.
List of Tables IX
List of Figures XI
Chapter 1 Introduction 1
1.1 Motivation 4
1.2 Overview of the Dissertation 6
1.3 Organization of the Dissertation 8
Chapter 2 Background and Related Work 9
2.1 Association-Rule Mining 9
2.2 Constraint-Based Mining 11
2.3 Temporal Association-Rule Mining 12
2.4 Utility Mining 13
Chapter 3 Efficient Algorithms with Improved Upper-bound Strategies 15
3.1 Introduction 15
3.2 Problem Statement and Definitions 19
3.3 Gradual Pruning Algorithm, GPA 24
220.127.116.11 Pruning Unpromising Item Strategy 25
18.104.22.168 Reducing Data Size Strategy 29
22.214.171.124 The Proposed Utility Mining Algorithm GPA 30
3.3.2 An Example of Using the GPA Algorithm 32
3.4 Projection-based Tightening Approach, PTA 38
126.96.36.199 The Strategy of Pruning Unpromising Items 39
188.8.131.52 The Filtering Strategy from Item Relationship 43
184.108.40.206 The Indexing Mechanism 46
220.127.116.11 The Proposed Utility Mining Algorithm, PTA 47
3.4.2 An Example of Using the PTA algorithm 52
3.5 Experimental Evaluation 58
3.5.1 Experimental Datasets 59
3.5.2 Effectiveness of the Upper-Bound Tightening Strategy 61
3.5.3 Efficiency Evaluation 63
3.5.4 Evaluation on Real Datasets BMS-POS and Chain-store 66
3.6 Summary 68
Chapter 4 Efficient Algorithms for Mining High On-shelf Utility Itemsets 69
4.1 Introduction 69
4.2 Problem Statement and Definitions 73
4.3 Two-phase Algorithm TP-HOU 82
18.104.22.168 The PTTU Table 83
22.214.171.124 The Temporary Itemset 84
126.96.36.199 The Proposed Two-phase Algorithm for High On-shelf Utility Itemsets, TP-HOU 85
4.3.2 An Example of Using the TP-HOU Algorithm 89
4.4 Three-scan Mining Algorithm TS-Mine 95
188.8.131.52 The Itemset Generation 95
184.108.40.206 The Upper Bound of On-shelf Utility 98
220.127.116.11 The Proposed Three-scan Mining Algorithm TS-Mine 99
4.4.2 An Example of Using the TS-Mine Algorithm 102
4.5 Multi-strategy Algorithm MSA 109
18.104.22.168 The Recognizing Strategy for Unpromising Items 110
22.214.171.124 The Upper-Bound Tightening Strategy 111
126.96.36.199 The Proposed Multi-strategy Mining Algorithm, MSA 114
4.5.2 An Example of Using the MSA Algorithm 119
4.6 Experimental Evaluation 126
4.6.1 Experimental Datasets 126
4.6.2 Evaluation on Number of Promising On-shelf Utility Itemsets 128
4.6.3 Efficiency Evaluation 130
4.6.4 Efficiency Evaluation for MSA 132
4.6.5 Evaluation on a Real Dataset 135
4.7 Summary 136
Chapter 5 Conclusions and Future Work 137
5.1 Conclusions 137
5.2 Future Work 140
 A. Adhikari and P. R. Rao, “Synthesizing Heavy Association Rules from Different Real Data Sources,” Pattern Recognition Letters, Vol. 29, No. 1, pp. 59-71, 2005.
 R. Agrawal, T. Imieliński and A. Swami, “Mining Association Rule between Sets of Items in Large Databases,” The ACM SIGMOD International Conference on Management of Data, pp. 207-216, May 1993.
 R. Agrawal, R. Srikant and Q. Vu, “Mining Association Rules with Item Constraints,” The 3rd International Conference on Knowledge Discovery in Databases and Data Mining, Newport Beach, California, pp. 66-73, 1997.
 R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules in Large Databases,” The 20th International Conference on Very Large Data Bases, pp. 478-499, Sept. 1994.
 R. Agrawal and R. Srikant, “Mining Sequential Patterns,” The 11th International Conference on Data Engineering, pp. 3-14, Mar. 1995.
 C. F. Ahmed, S. K. Tanbeer, and B. S. Jeong, “A Novel Approach for Mining High-utility Sequential Patterns in Sequence Databases,” ETRI Journal, Vol. 32, No. 5, pp. 676-686, 2010.
 C. F. Ahmed, S. K. Tanbeer, B. S. Jeong, and Y. K. Lee, “Efficient Tree Structures for High Utility Pattern Mining in Incremental Databases,” IEEE Transactions on Knowledge and Data Engineering, Vol. 21, No. 3, pp. 1708-1721, 2009.
 J. M. Ale and G. H. Rossi, “An Approach to Discovering Temporal Association Rules,” The 2000 ACM Symposium on Applied Computing, pp. 294-300, 2000.
 L. M. Aouad, N. A. Le-Khac, and M. T. Kechadi, “Performance Study of Distributed Apriori-like Frequent Itemsets Mining,” Knowledge and Information Systems, Vol. 23, No. 1, pp. 55-72, 2010.
 M. Boley and H. Grosskreutz, “Approximating the Number of Frequent Sets in Dense Data,” Knowledge and Information Systems, Vol. 21, No. 1, pp. 65-89, 2009.
 F. Bonchi and C. Lucchese, “Pushing Tougher Constraints in Frequent Pattern Mining,” The Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 114-124. 2005.
 F. Bonchi, F. Giannotti, A. Mazzanti, and D. Pedreschi, “ExAMiner: Optimized Level-wise Frequent Pattern Mining with Monotone Constraints,” The 3rd IEEE International Conference on Data Mining, pp. 11-18, 2003.
 J. F. Boulicaut and B. Jeudy, “Using Constraints during Itemset Mining: Should We Prune or Not? ,” The Bases de Données Avancées BDA'00, pp. 221-237, 2000.
 R. Chan, Q. Yang, and Y. Shen, “Mining High Utility Itemsets,” The 3rd IEEE International Conference on Data Mining, pp. 19-26, 2003.
 J. H. Chang, “Mining Weighted Sequential Patterns in a Sequence Database with a Time-interval Weight,” Knowledge Based Systems, Vol. 24, No.1, pp.1-9, 2011.
 C. Y. Chang, M. S. Chen, and C. H. Lee, “Mining General Temporal Association Rules for Items with Different Exhibition Periods,” The Third IEEE International Conference on Data Mining, pp. 59-66, 2002.
 C. J. Chu, Vincent S. Tseng, and T. Liang, “An Efficient Algorithm for Mining High Utility Itemsets with Negative Item Values in Large Databases,” Applied Mathematics and Computation, Vol. 215, No. 2, pp. 767-778, 2008.
 C. J. Chu, V. Tseng, and T. Liang, “Mining Temporal Rare Utility Itemsets in Large Databases using Relative Utility Thresholds,” International Journal of Innovative Computing, Information and Control, Vol. 4, No. 8, pp. 2775-2792, 2008.
 Y. L. Chen, K. Tang, R. J. Shen, and Y. H. Hu, “Market Basket Analysis in a Multiple Store Environment,” Decision Support Systems, Vol. 40, No. 2, pp. 339-354, 2005.
 A. Erwin, R. P. Gopalan, and N. R. Achuthan, “A Bottom-up Projection Based Algorithm for Mining High Utility Itemsets,” The 2nd Workshop on Integrating AI and Data Mining, pp. 3-11, 2007.
 A. Erwin, R. P. Gopalan, and N. R. Achuthan, “CTU-Mine – An Efficient High Utility Itemset Mining Algorithm using the Pattern Growth Approach,” The 7th IEEE International Conferences on Computer and Information Technology, pp. 71-76, 2007.
 Frequent Itemsets Mining Dataset Repository. Available at (http://fimi.cs.helsink i.fi/data/).
 J. Han, J. Pei, and Y. Yin, “Mining Frequent Patterns without Candidate Generation,” The ACM SIGMOD International Conference on Management of Data, pp. 1-12, 2000.
 J. Han, V. S. Lakshmanan, and R. T. Ng, “Constraint-based, Multidimensional Data Mining,” IEEE Computer, Vol. 32, No. 8, pp. 46-50, 1999.
 J. Hu and A. Mojsilovic, “High-utility Pattern Mining: A Method for Discovery of High-utility Item Sets,” Pattern Recognition, Vol. 40, No. 11, pp. 3317-3324, 2007.
 IBM Quest Data Mining Project, “Quest Synthetic Data Generation Code,” Available at (http://www.almaden.ibm.com/cs/quest/syndata.html).
 G. C. Lan, T. P. Hong and Vincent S. Tseng, “A Projection-based Approach for Discovering High Average-utility Itemsets,” The Journal of Information Science and Engineering, Special Issue on Artificial Intelligence, Vol. 28, No. 1, pp. 193-209, 2011.
 G. C. Lan, T. P. Hong, and Vincent S. Tseng, “Discovery of High Utility Itemsets through a Projection-based Approach,” accepted and to appear in Knowledge and Information Systems, 2012.
 C. H. Lee, C. R. Lin, and M. S. Chen, “On Mining General Temporal Association Rules in a Publication Database,” The 2001 IEEE International Conference on Data Mining, pp. 337-344, 2001.
 H. F. Li, “MHUI-max: An Efficient Algorithm for Discovering High-utility Itemsets from Data Streams,” Journal of Information Science, Vol. 37, No. 5, pp. 532-545, 2011.
 H. F. Li, H. Y. Huang, and S. Y. Lee, “Fast and Memory Efficient Mining of High Utility Itemsets from Data Streams: with and without Negative Item Profits,” Knowledge and Information Systems, Vol. 28, No. 3, pp. 495-522, 2011.
 Y. Li, P. Ning, X. S. Wang, and S. Jajodia, “Discovering Calendar-based Temporal Association Rules,” Data & Knowledge Engineering, Vol. 44, No. 2, pp. 193-218, 2003.
 Y. C. Li, J. S. Yeh, and C. C. Chang, “Isolated Items Discarding Strategy for Discovering High Utility Itemsets,” Data & Knowledge Engineering, Vol. 64, No. 1, pp. 198-217, 2008.
 D-I. Lin and Z. M. Kedem, “Pincer-search: a New Algorithm for Discovering the Maximum Frequent Set,” The 6th International Conference on Extending Database Technology, pp. 103-119, 1998.
 C. W. Lin, T. P. Hong, and W. H. Lu, “Efficiently Mining High Average Utility Itemsets with a Tree Structure,” The 2th Asian Conference on Intelligent Information and Database Systems, pp. 131-139, 2010.
 J. Liu, Y. Pan, K. Wang, and J. Han, “Mining Frequent Item Sets by Opportunistic Projection,” The International Conference on Knowledge Discovery in Databases, pp. 229-238, 2002.
 Y. Liu, W. Liao, and A. Choudhary, “A Fast High Utility Itemsets Mining Algorithm,” The Utility-Based Data Mining Workshop, pp. 90-99, 2005.
 Microsoft Corporation, Example Database FoodMart of Microsoft Analysis Services.
 R. T. Ng, V. S. Lakshmanan, J. Han, and A. Pang, “Exploratory Mining and Pruning Optimizations of Constrained Association Rules,” The ACM SIGMOD International Conference on Management of Data, pp. 13-24, 1998.
 J. Pei, J. Han, and V. S. Lakshmanan, “Mining Frequent Item Sets with Convertible Constraints,” The 17th International Conference on Data Engineering, pp. 433-442, 2001.
 J. Pei, J. Han, B. Mortazavi-Asl, J. Wang, H. Pinto, Q. Chen, U. Dayal, and M. Hsu, “Mining Sequential Patterns by Pattern-growth: The PrefixSpan Approach,” IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 11, pp. 1424-1440, 2004.
 T. P. Hong, C. H. Lee, S. L. Wang, “An Incremental Mining Algorithm for High Average-Utility Itemsets,” The 10th International Symposium on Pervasive Systems, pp. 421-425, 2009.
 T. P. Hong, C. H. Lee, and S. L. Wang, “Effective Utility Mining with the Measure of Average Utility,” Expert System with Applications, Vol. 38, No. 7, pp. 8259-8265, 2011.
 B. Ozden, S. Ramaswamy, and A. Silberschatz. “Cyclic Association Rules,” The 14th International Conference on Data Engineering, Orlando, Florida, USA, pp. 12-421, 1998.
 R. R. Rajalaxmi and A. M. Natarajan, “A Novel Sanitization Approach for Privacy Preserving Utility Itemset Mining,” Computer and Information Science, Vol. 1, No. 3, pp. 77-82, 2008.
 J. F. Roddick and M. Spiliopoilou, “A Survey of Temporal Knowledge Discovery Paradigms and Methods,” IEEE Transactions on Knowledge and Data Engineering, Vol. 14, No. 4, pp. 750-767, 2002.
 K. Saxena, “Efficient Mining of Weighted Temporal Association Rules,” The 2009 World Congress on Computer Science and Information Engineering, pp.421-425, 2009.
 R. Srikant and R. Agrawal, “Mining Generalized Association Rules,” The 21st International Conference on Very Large Data Bases, pp. 407-419, 1995.
 R. Srikant and R. Agrawal, “Mining Quantitative Association Rules in Large Relational Tables,” The ACM SIGMOD Conference on Management of Data (SIGMOD’96). Montreal, Canada, pp. 1-12, 1996.
 B. E. Shie, H. F. Hsiao, and Vincent S. Tseng, “Efficient Algorithms for Discovering High Utility User Behavior Patterns in Mobile Commerce Environments,”in Knowledge and Information Systems, 2011.
 Vincent S. Tseng, C. J. Chu, and T. Liang, “Efficient Mining of Temporal High Utility Itemsets from Data Streams,” The ACM KDD Workshop on Utility-Based Data Mining, pp. 1105-1117, 2006.
 Vincent S. Tseng, H. C. Lu, and C. H. Huang, “Mining Temporal Mobile Sequential Patterns in Location-based Service Environments,” The 13th IEEE International Conference on Parallel and Distributed Systems, pp. 1-8, 2007.
 C. M. Wang, S. H. Chen, and Y. F. Huang, “A Fuzzy Approach for Mining High Utility Quantitative Itemsets,” The IEEE International Conference on Fuzzy Systems, pp. 1909-1913, 2009.
 C. Y. Wang, S. S. Tseng, and T. P. Hong, “Flexible Online Association Rule Mining Based on Multidimensional Pattern Relations,” Information Science, Vol. 176, No. 12, pp. 1752-1780, 2006.
 X. Wu and S. Zhang, “Synthesizing High-frequency Rules from Different Data Sources,” IEEE Transactions on Knowledge and Data Engineering, Vol. 15, No. 2, pp. 353-367, 2003.
 Y. Xu, Benjamin C. M. Fung, and K. Wang, Ada W. C. Fu, J. Pei, “Publishing Sensitive Transactions for Itemset Utility,” The 8th IEEE International Conference on Data Mining, pp. 1109-1114, 2008.
 H. Yao and H. J. Hamilton, “Mining Itemset Utilities from Transaction Databases,” Data & Knowledge Engineering, Vol. 59, No. 3, pp. 603-626, 2006.
 H. Yao, H. J. Hamilton, and C. J. Butz, “A Foundational Approach to Mining Itemset Utilities from Databases,” The 4th SIAM International Conference on Data Mining, pp. 482-486, 2004.
 J. S. Yeh, C. Y. Chang, and Y. T. Wang, “Efficient Algorithms for Incremental Utility Mining,” The International Conference on Ubiquitous Information Management and Communication, pp. 229-234, 2008.
 G. Yu, K. Li, and S. Shao, “Mining High Utility Itemsets in Large High Dimensional Data,” The International Workshop on Knowledge Discovery and Data Mining, pp. 17-20, 2008.
 S. J. Yen and Y. S. Lee, “Mining High Utility Quantitative Association Rules,” The 9th International Conference on Data Warehousing and Knowledge Discovery, Lecture Notes in Computer Science 4654, pp. 283-292, 2007.
 C. H. Yun and M. S. Chen, “Mining Mobile Sequential Patterns in a Mobile Commerce Environment,” IEEE Transactions on Systems, Man, and Cybernetics, Part C, Vol. 37, No. 2, pp. 278-295, 2007.
 U. Yun, John J. Leggett, and T. J. Ong, “Mining Weighted Sequential Patterns Based on Length-decreasing Support Constraints,” The IEEE International Conference on Intelligence and Security Informatics, pp. 650-651, 2006.
 U. Yun and John J. Leggett, “WSpan: Weighted Sequential Pattern Mining in Large Sequence Databases,” The 3rd International IEEE Conference Intelligent Systems, pp. 512-517, 2006.
 U. Yun and K. H. Ryu, “Discovering Important Sequential Patterns with Length-decreasing Weighted Support Constraints,” International Journal of Information Technology and Decision Making, Vol. 9, No. 4, pp. 575-599, 2010.