
系統識別號 
U00262404201201072400 
論文名稱(中文) 
高效性架位效益探勘演算法之研究 
論文名稱(英文) 
A Study on Efficient Algorithms for Onshelf Utility Mining 
校院名稱 
成功大學 
系所名稱(中) 
資訊工程學系碩博士班 
系所名稱(英) 
Institute of Computer Science and Information Engineering 
學年度 
100 
學期 
2 
出版年 
101 
研究生(中文) 
藍國誠 
研究生(英文) 
GuoCheng Lan 
電子信箱 
rrfoheiay@gmail.com 
學號 
P78951189 
學位類別 
博士 
語文別 
英文 
論文頁數 
154頁 
口試委員 
口試委員陳良弼 口試委員陳銘憲 口試委員蘇豐文 口試委員曾守正 口試委員林文揚 口試委員謝孫源 共同指導教授洪宗貝 指導教授曾新穆

中文關鍵字 
資料挖掘
關聯規則挖掘
效益挖掘
上限邊界
投影技術
時序資料庫
架位期間

英文關鍵字 
Data mining
associationrule mining
utility mining
upperbound
projection technique
temporal database
onshelf period

學科別分類 

中文摘要 
近年來，資料挖掘技術已被廣泛應用在各種資料的分析應用上。特別的是因為效益挖掘接近經濟應用的相關性，所以已成為一個非常受歡迎的研究議題。不同於傳統關聯規則挖掘，在效益挖掘裡的項目集同時考慮項目在交易內購買的數量與利潤，因此，項目集的實際重要性是能被評估的。然而，這傳統模式仍然需要產生大量的候選項目集並計算它們的效益上限邊界值，所以，須耗費大量的時間成本來處理這些候選項目集。另外，商店裡的產品通常會有數次的上下架行為，若是以整個資料庫取代商品的上架期間來挖掘，則會導致挖掘結果存在著某些偏斜狀況。因此，本論文主旨在探討與研發從時序性資料庫中挖掘出高效益項目集與高架位效益項目集等資訊。
對於效益挖掘的議題，本論文分別提出了GPA (Gradual Pruning Approach)與PTA (Projectionbased Upperbound Tightening Approach)等演算法來提昇尋找高效益項目集的執行效率。這兩個方法分別採用不同的策略來改進傳統TWU模式，進而有效地緊縮項目集的效益上限邊界值。在實驗評估裡，實驗結果也呈現出所提的方法相較於採用傳統TWU模式的演算法，能在數個資料上皆有近50%的效能改善率。
關於商品的架位期間，目前尚未有相關研究在探討具有架位期間資訊的高效益項目集資訊。為了處理這問題，本論文提出一個架位效益挖掘的新研究議題，藉以獲得在時序資料庫內更精確的項目集效益值。本論文分別提了TPHOU (TwoPhased Algorithm for Mining High Onshelf Utility Itemsets)、TSMine (ThreeScanMine)與MSA (MultiStrategy Approach)等三個挖掘演算法來處理此問題。其中，TPHOU是以傳統兩階段效益挖掘方法為基礎所發展的方法，而TSMine方法是一個三階段的執行方法，同時，也使用一個上限邊界預測策略來減少不必要的候選項目集。最後，本論文提出一個具有過濾條件的投影技術挖掘方法MSA，此方法僅需一個階段的執行，也可有效地改善項目集的架位效益上限邊界值。最後的實驗評估結果顯示在數個資料庫上，MSA方法在降低候選項目集數量與執行效率方面皆是優於TPHOU與TSMine。

英文摘要 
Data mining techniques have been widely used for data analysis in recent years. Especially, utility mining has become a very popular topic because of its close relation to economic applications. Different from traditional associationrule mining, an itemset in utility mining considers both profits and quantities of items in a set of transactions, and its actual significance can thus be measured. The traditional TWU model may, however, generate a large number of candidates, thus spending a lot of time in dealing with them. Besides, the onshelf and offshelf actions of products in a store are usually done for multiple times. If an entire database, instead of the transactions in onshelf time periods, is used for mining, some biases may exist in the mining results. In this dissertation, we thus develop several efficient methods for mining high utility itemsets and high onshelf utility itemsets from temporal databases.
For the issue of utility mining, two mining approaches, GPA (Gradual Pruning Approach) and PTA (Projectionbased Upperbound Tightening Approach), are proposed to speed up the execution efficiency of finding high utility itemsets. They adopt different strategies to improve the traditional TWU model and effectively tighten the upperbounds of utility values for itemsets. The experimental evaluation shows that the proposed approaches could deliver up to 50% improvement of efficiency over the traditional TWU model on several datasets.
As to onshelf periods of products, there has not been work yet for discovering high utility itemsets with onshelf time periods. To address this, the dissertation then presents a new research issue, named onshelf utility mining, to obtain more accurate utility value of each itemset in temporal databases. There are three mining algorithms including TPHOU (TwoPhased Algorithm for Mining High Onshelf Utility Itemsets), TSMine (ThreeScanMine), and MSA (MultiStrategy Approach), are proposed. The TPHOU algorithm is extended from the traditional twophase utility mining algorithm to handle the problem of onshelf utility mining. The TSMine algorithm is executed in three phases, and an upperbound prediction strategy is used to reduce the number of unpromising itemsets. Finally, the MSA algorithm, which is based on the projection technique and with an effective filtering condition, is presented. It needs only one phase and can improve the onshelf utility upper bounds for itemsets. At last, experiments are made for comparing the three algorithms and the results show that the MSA algorithm outperforms the other two algorithms in the number of candidate itemsets and in the execution efficiency on several datasets.

論文目次 
Content
中文摘要 I
ABSTRACT III
誌謝 V
Content VI
List of Tables IX
List of Figures XI
Chapter 1 Introduction 1
1.1 Motivation 4
1.2 Overview of the Dissertation 6
1.3 Organization of the Dissertation 8
Chapter 2 Background and Related Work 9
2.1 AssociationRule Mining 9
2.2 ConstraintBased Mining 11
2.3 Temporal AssociationRule Mining 12
2.4 Utility Mining 13
Chapter 3 Efficient Algorithms with Improved Upperbound Strategies 15
3.1 Introduction 15
3.2 Problem Statement and Definitions 19
3.3 Gradual Pruning Algorithm, GPA 24
3.3.1.1 Pruning Unpromising Item Strategy 25
3.3.1.2 Reducing Data Size Strategy 29
3.3.1.3 The Proposed Utility Mining Algorithm GPA 30
3.3.2 An Example of Using the GPA Algorithm 32
3.4 Projectionbased Tightening Approach, PTA 38
3.4.1.1 The Strategy of Pruning Unpromising Items 39
3.4.1.2 The Filtering Strategy from Item Relationship 43
3.4.1.3 The Indexing Mechanism 46
3.4.1.4 The Proposed Utility Mining Algorithm, PTA 47
3.4.2 An Example of Using the PTA algorithm 52
3.5 Experimental Evaluation 58
3.5.1 Experimental Datasets 59
3.5.2 Effectiveness of the UpperBound Tightening Strategy 61
3.5.3 Efficiency Evaluation 63
3.5.4 Evaluation on Real Datasets BMSPOS and Chainstore 66
3.6 Summary 68
Chapter 4 Efficient Algorithms for Mining High Onshelf Utility Itemsets 69
4.1 Introduction 69
4.2 Problem Statement and Definitions 73
4.3 Twophase Algorithm TPHOU 82
4.3.1.1 The PTTU Table 83
4.3.1.2 The Temporary Itemset 84
4.3.1.3 The Proposed Twophase Algorithm for High Onshelf Utility Itemsets, TPHOU 85
4.3.2 An Example of Using the TPHOU Algorithm 89
4.4 Threescan Mining Algorithm TSMine 95
4.4.1.1 The Itemset Generation 95
4.4.1.2 The Upper Bound of Onshelf Utility 98
4.4.1.3 The Proposed Threescan Mining Algorithm TSMine 99
4.4.2 An Example of Using the TSMine Algorithm 102
4.5 Multistrategy Algorithm MSA 109
4.5.1.1 The Recognizing Strategy for Unpromising Items 110
4.5.1.2 The UpperBound Tightening Strategy 111
4.5.1.3 The Proposed Multistrategy Mining Algorithm, MSA 114
4.5.2 An Example of Using the MSA Algorithm 119
4.6 Experimental Evaluation 126
4.6.1 Experimental Datasets 126
4.6.2 Evaluation on Number of Promising Onshelf Utility Itemsets 128
4.6.3 Efficiency Evaluation 130
4.6.4 Efficiency Evaluation for MSA 132
4.6.5 Evaluation on a Real Dataset 135
4.7 Summary 136
Chapter 5 Conclusions and Future Work 137
5.1 Conclusions 137
5.2 Future Work 140
VITA 143
Bibliography 144
Publications 151

參考文獻 
[1] A. Adhikari and P. R. Rao, “Synthesizing Heavy Association Rules from Different Real Data Sources,” Pattern Recognition Letters, Vol. 29, No. 1, pp. 5971, 2005.
[2] R. Agrawal, T. Imieliński and A. Swami, “Mining Association Rule between Sets of Items in Large Databases,” The ACM SIGMOD International Conference on Management of Data, pp. 207216, May 1993.
[3] R. Agrawal, R. Srikant and Q. Vu, “Mining Association Rules with Item Constraints,” The 3rd International Conference on Knowledge Discovery in Databases and Data Mining, Newport Beach, California, pp. 6673, 1997.
[4] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules in Large Databases,” The 20th International Conference on Very Large Data Bases, pp. 478499, Sept. 1994.
[5] R. Agrawal and R. Srikant, “Mining Sequential Patterns,” The 11th International Conference on Data Engineering, pp. 314, Mar. 1995.
[6] C. F. Ahmed, S. K. Tanbeer, and B. S. Jeong, “A Novel Approach for Mining Highutility Sequential Patterns in Sequence Databases,” ETRI Journal, Vol. 32, No. 5, pp. 676686, 2010.
[7] C. F. Ahmed, S. K. Tanbeer, B. S. Jeong, and Y. K. Lee, “Efficient Tree Structures for High Utility Pattern Mining in Incremental Databases,” IEEE Transactions on Knowledge and Data Engineering, Vol. 21, No. 3, pp. 17081721, 2009.
[8] J. M. Ale and G. H. Rossi, “An Approach to Discovering Temporal Association Rules,” The 2000 ACM Symposium on Applied Computing, pp. 294300, 2000.
[9] L. M. Aouad, N. A. LeKhac, and M. T. Kechadi, “Performance Study of Distributed Apriorilike Frequent Itemsets Mining,” Knowledge and Information Systems, Vol. 23, No. 1, pp. 5572, 2010.
[10] M. Boley and H. Grosskreutz, “Approximating the Number of Frequent Sets in Dense Data,” Knowledge and Information Systems, Vol. 21, No. 1, pp. 6589, 2009.
[11] F. Bonchi and C. Lucchese, “Pushing Tougher Constraints in Frequent Pattern Mining,” The PacificAsia Conference on Knowledge Discovery and Data Mining, pp. 114124. 2005.
[12] F. Bonchi, F. Giannotti, A. Mazzanti, and D. Pedreschi, “ExAMiner: Optimized Levelwise Frequent Pattern Mining with Monotone Constraints,” The 3rd IEEE International Conference on Data Mining, pp. 1118, 2003.
[13] J. F. Boulicaut and B. Jeudy, “Using Constraints during Itemset Mining: Should We Prune or Not? ,” The Bases de Données Avancées BDA'00, pp. 221237, 2000.
[14] R. Chan, Q. Yang, and Y. Shen, “Mining High Utility Itemsets,” The 3rd IEEE International Conference on Data Mining, pp. 1926, 2003.
[15] J. H. Chang, “Mining Weighted Sequential Patterns in a Sequence Database with a Timeinterval Weight,” Knowledge Based Systems, Vol. 24, No.1, pp.19, 2011.
[16] C. Y. Chang, M. S. Chen, and C. H. Lee, “Mining General Temporal Association Rules for Items with Different Exhibition Periods,” The Third IEEE International Conference on Data Mining, pp. 5966, 2002.
[17] C. J. Chu, Vincent S. Tseng, and T. Liang, “An Efficient Algorithm for Mining High Utility Itemsets with Negative Item Values in Large Databases,” Applied Mathematics and Computation, Vol. 215, No. 2, pp. 767778, 2008.
[18] C. J. Chu, V. Tseng, and T. Liang, “Mining Temporal Rare Utility Itemsets in Large Databases using Relative Utility Thresholds,” International Journal of Innovative Computing, Information and Control, Vol. 4, No. 8, pp. 27752792, 2008.
[19] Y. L. Chen, K. Tang, R. J. Shen, and Y. H. Hu, “Market Basket Analysis in a Multiple Store Environment,” Decision Support Systems, Vol. 40, No. 2, pp. 339354, 2005.
[20] A. Erwin, R. P. Gopalan, and N. R. Achuthan, “A Bottomup Projection Based Algorithm for Mining High Utility Itemsets,” The 2nd Workshop on Integrating AI and Data Mining, pp. 311, 2007.
[21] A. Erwin, R. P. Gopalan, and N. R. Achuthan, “CTUMine – An Efficient High Utility Itemset Mining Algorithm using the Pattern Growth Approach,” The 7th IEEE International Conferences on Computer and Information Technology, pp. 7176, 2007.
[22] Frequent Itemsets Mining Dataset Repository. Available at (http://fimi.cs.helsink i.fi/data/).
[23] J. Han, J. Pei, and Y. Yin, “Mining Frequent Patterns without Candidate Generation,” The ACM SIGMOD International Conference on Management of Data, pp. 112, 2000.
[24] J. Han, V. S. Lakshmanan, and R. T. Ng, “Constraintbased, Multidimensional Data Mining,” IEEE Computer, Vol. 32, No. 8, pp. 4650, 1999.
[25] J. Hu and A. Mojsilovic, “Highutility Pattern Mining: A Method for Discovery of Highutility Item Sets,” Pattern Recognition, Vol. 40, No. 11, pp. 33173324, 2007.
[26] IBM Quest Data Mining Project, “Quest Synthetic Data Generation Code,” Available at (http://www.almaden.ibm.com/cs/quest/syndata.html).
[27] G. C. Lan, T. P. Hong and Vincent S. Tseng, “A Projectionbased Approach for Discovering High Averageutility Itemsets,” The Journal of Information Science and Engineering, Special Issue on Artificial Intelligence, Vol. 28, No. 1, pp. 193209, 2011.
[28] G. C. Lan, T. P. Hong, and Vincent S. Tseng, “Discovery of High Utility Itemsets through a Projectionbased Approach,” accepted and to appear in Knowledge and Information Systems, 2012.
[29] C. H. Lee, C. R. Lin, and M. S. Chen, “On Mining General Temporal Association Rules in a Publication Database,” The 2001 IEEE International Conference on Data Mining, pp. 337344, 2001.
[30] H. F. Li, “MHUImax: An Efficient Algorithm for Discovering Highutility Itemsets from Data Streams,” Journal of Information Science, Vol. 37, No. 5, pp. 532545, 2011.
[31] H. F. Li, H. Y. Huang, and S. Y. Lee, “Fast and Memory Efficient Mining of High Utility Itemsets from Data Streams: with and without Negative Item Profits,” Knowledge and Information Systems, Vol. 28, No. 3, pp. 495522, 2011.
[32] Y. Li, P. Ning, X. S. Wang, and S. Jajodia, “Discovering Calendarbased Temporal Association Rules,” Data & Knowledge Engineering, Vol. 44, No. 2, pp. 193218, 2003.
[33] Y. C. Li, J. S. Yeh, and C. C. Chang, “Isolated Items Discarding Strategy for Discovering High Utility Itemsets,” Data & Knowledge Engineering, Vol. 64, No. 1, pp. 198217, 2008.
[34] DI. Lin and Z. M. Kedem, “Pincersearch: a New Algorithm for Discovering the Maximum Frequent Set,” The 6th International Conference on Extending Database Technology, pp. 103119, 1998.
[35] C. W. Lin, T. P. Hong, and W. H. Lu, “Efficiently Mining High Average Utility Itemsets with a Tree Structure,” The 2th Asian Conference on Intelligent Information and Database Systems, pp. 131139, 2010.
[36] J. Liu, Y. Pan, K. Wang, and J. Han, “Mining Frequent Item Sets by Opportunistic Projection,” The International Conference on Knowledge Discovery in Databases, pp. 229238, 2002.
[37] Y. Liu, W. Liao, and A. Choudhary, “A Fast High Utility Itemsets Mining Algorithm,” The UtilityBased Data Mining Workshop, pp. 9099, 2005.
[38] Microsoft Corporation, Example Database FoodMart of Microsoft Analysis Services.
[39] R. T. Ng, V. S. Lakshmanan, J. Han, and A. Pang, “Exploratory Mining and Pruning Optimizations of Constrained Association Rules,” The ACM SIGMOD International Conference on Management of Data, pp. 1324, 1998.
[40] J. Pei, J. Han, and V. S. Lakshmanan, “Mining Frequent Item Sets with Convertible Constraints,” The 17th International Conference on Data Engineering, pp. 433442, 2001.
[41] J. Pei, J. Han, B. MortazaviAsl, J. Wang, H. Pinto, Q. Chen, U. Dayal, and M. Hsu, “Mining Sequential Patterns by Patterngrowth: The PrefixSpan Approach,” IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 11, pp. 14241440, 2004.
[42] T. P. Hong, C. H. Lee, S. L. Wang, “An Incremental Mining Algorithm for High AverageUtility Itemsets,” The 10th International Symposium on Pervasive Systems, pp. 421425, 2009.
[43] T. P. Hong, C. H. Lee, and S. L. Wang, “Effective Utility Mining with the Measure of Average Utility,” Expert System with Applications, Vol. 38, No. 7, pp. 82598265, 2011.
[44] B. Ozden, S. Ramaswamy, and A. Silberschatz. “Cyclic Association Rules,” The 14th International Conference on Data Engineering, Orlando, Florida, USA, pp. 12421, 1998.
[45] R. R. Rajalaxmi and A. M. Natarajan, “A Novel Sanitization Approach for Privacy Preserving Utility Itemset Mining,” Computer and Information Science, Vol. 1, No. 3, pp. 7782, 2008.
[46] J. F. Roddick and M. Spiliopoilou, “A Survey of Temporal Knowledge Discovery Paradigms and Methods,” IEEE Transactions on Knowledge and Data Engineering, Vol. 14, No. 4, pp. 750767, 2002.
[47] K. Saxena, “Efficient Mining of Weighted Temporal Association Rules,” The 2009 World Congress on Computer Science and Information Engineering, pp.421425, 2009.
[48] R. Srikant and R. Agrawal, “Mining Generalized Association Rules,” The 21st International Conference on Very Large Data Bases, pp. 407419, 1995.
[49] R. Srikant and R. Agrawal, “Mining Quantitative Association Rules in Large Relational Tables,” The ACM SIGMOD Conference on Management of Data (SIGMOD’96). Montreal, Canada, pp. 112, 1996.
[50] B. E. Shie, H. F. Hsiao, and Vincent S. Tseng, “Efficient Algorithms for Discovering High Utility User Behavior Patterns in Mobile Commerce Environments,”in Knowledge and Information Systems, 2011.
[51] Vincent S. Tseng, C. J. Chu, and T. Liang, “Efficient Mining of Temporal High Utility Itemsets from Data Streams,” The ACM KDD Workshop on UtilityBased Data Mining, pp. 11051117, 2006.
[52] Vincent S. Tseng, H. C. Lu, and C. H. Huang, “Mining Temporal Mobile Sequential Patterns in Locationbased Service Environments,” The 13th IEEE International Conference on Parallel and Distributed Systems, pp. 18, 2007.
[53] C. M. Wang, S. H. Chen, and Y. F. Huang, “A Fuzzy Approach for Mining High Utility Quantitative Itemsets,” The IEEE International Conference on Fuzzy Systems, pp. 19091913, 2009.
[54] C. Y. Wang, S. S. Tseng, and T. P. Hong, “Flexible Online Association Rule Mining Based on Multidimensional Pattern Relations,” Information Science, Vol. 176, No. 12, pp. 17521780, 2006.
[55] X. Wu and S. Zhang, “Synthesizing Highfrequency Rules from Different Data Sources,” IEEE Transactions on Knowledge and Data Engineering, Vol. 15, No. 2, pp. 353367, 2003.
[56] Y. Xu, Benjamin C. M. Fung, and K. Wang, Ada W. C. Fu, J. Pei, “Publishing Sensitive Transactions for Itemset Utility,” The 8th IEEE International Conference on Data Mining, pp. 11091114, 2008.
[57] H. Yao and H. J. Hamilton, “Mining Itemset Utilities from Transaction Databases,” Data & Knowledge Engineering, Vol. 59, No. 3, pp. 603626, 2006.
[58] H. Yao, H. J. Hamilton, and C. J. Butz, “A Foundational Approach to Mining Itemset Utilities from Databases,” The 4th SIAM International Conference on Data Mining, pp. 482486, 2004.
[59] J. S. Yeh, C. Y. Chang, and Y. T. Wang, “Efficient Algorithms for Incremental Utility Mining,” The International Conference on Ubiquitous Information Management and Communication, pp. 229234, 2008.
[60] G. Yu, K. Li, and S. Shao, “Mining High Utility Itemsets in Large High Dimensional Data,” The International Workshop on Knowledge Discovery and Data Mining, pp. 1720, 2008.
[61] S. J. Yen and Y. S. Lee, “Mining High Utility Quantitative Association Rules,” The 9th International Conference on Data Warehousing and Knowledge Discovery, Lecture Notes in Computer Science 4654, pp. 283292, 2007.
[62] C. H. Yun and M. S. Chen, “Mining Mobile Sequential Patterns in a Mobile Commerce Environment,” IEEE Transactions on Systems, Man, and Cybernetics, Part C, Vol. 37, No. 2, pp. 278295, 2007.
[63] U. Yun, John J. Leggett, and T. J. Ong, “Mining Weighted Sequential Patterns Based on Lengthdecreasing Support Constraints,” The IEEE International Conference on Intelligence and Security Informatics, pp. 650651, 2006.
[64] U. Yun and John J. Leggett, “WSpan: Weighted Sequential Pattern Mining in Large Sequence Databases,” The 3rd International IEEE Conference Intelligent Systems, pp. 512517, 2006.
[65] U. Yun and K. H. Ryu, “Discovering Important Sequential Patterns with Lengthdecreasing Weighted Support Constraints,” International Journal of Information Technology and Decision Making, Vol. 9, No. 4, pp. 575599, 2010.

論文全文使用權限 
同意授權校內瀏覽/列印電子全文服務，於20181231起公開。同意授權校外瀏覽/列印電子全文服務，於20181231起公開。 


