進階搜尋


   電子論文尚未授權公開,紙本請查館藏目錄
(※如查詢不到或館藏狀況顯示「閉架不公開」,表示該本論文不在書庫,無法取用。)
系統識別號 U0026-1101201917115200
論文名稱(中文) 在資料串流環境中的頻繁循序關聯規則以及高效益關聯規則探勘
論文名稱(英文) Frequent Sequential Pattern and High Utility Pattern Mining in the Data Stream Environments
校院名稱 成功大學
系所名稱(中) 電腦與通信工程研究所
系所名稱(英) Institute of Computer & Communication
學年度 107
學期 1
出版年 107
研究生(中文) 畢杰
研究生(英文) Bijay Prasad Jaysawal
學號 Q38027014
學位類別 博士
語文別 英文
論文頁數 99頁
口試委員 指導教授-黃仁暐
口試委員-陳銘憲
口試委員-曾新穆
口試委員-張嘉惠
口試委員-高宏宇
口試委員-莊坤達
中文關鍵字 Progressive Mining  Sequential Pattern Across Multiple Streams  High Utility Pattern  Data Stream Mining 
英文關鍵字 Progressive Mining  Sequential Pattern Across Multiple Streams  High Utility Pattern  Data Stream Mining 
學科別分類
中文摘要 在這篇論文中,我們解決了探勘頻繁循序樣式和高效益樣式在資料串流的問題,在資料串流的探勘中,我們期望演算法只需要執行一次,而且演算法在執行時間和記憶體使用量上有更高的要求,盡可能達到使用最少的記憶體和執行時間來完成計算。
首先,我們研究在多串流中的頻繁循序樣式探勘問題,在多串流的環境下,循序樣式通常會出現在跨多個資料串流上,再著在實際的應用面,使用者通常會想要著重在較新的資料上,對過時的資料會比較不感興趣,因此我們又結合了漸進式探勘的概念去找到跨資料串流上的頻繁循序樣式,我們提出一個演算法PSP-AMS,可以有效率的漸進式挖掘跨多個資料串流上的頻繁循序樣式。在PSP-AMS演算法中,我們提出了一個特殊的PSP-MS-tree資料結構,這個資料結構上的節點紀錄了項目出現的資訊以及項目之間的關聯性等,藉由走訪這個樹狀結構上可以有效的新增節點、更新現有節點、或者是刪除過時的節點,來達到有效率的跨多資料串流探勘頻繁循序樣式。
接著我們研究高效益樣式的問題,相對於頻繁樣式探勘是將出現頻率當作是衡量標準,在高效益樣式探勘的問題中則將效益值作為重要性衡量的指標,找到效益值高過門閥的樣式。過去有許多文獻提出了許多演算法來探勘高效益樣式,我們觀察到有些演算法在稀疏資料集上表現良好,而另外有些演算法在密集資料集上表現較突出,但很難有一個演算法在兩種資料上的表現都非常好,所以為了解決不同類型資料集上演算法表現的問題,我們提出一個新的演算法DMHUPS,DMHUPS同時為多個有希望的候選者計算效益和更嚴格的擴展上限值,並結合我們所提出的IUData_List資料結構來儲存項目的資訊,使得計算上可以更有效率,達到DMHUPS在稀疏和密集資料集上都可以更有效率的去探勘高效益樣式的目標。
最後我們將高效益樣式探勘擴展到多資料串流環境中,我們提出一個只須執行一次且單階段的演算法SOHUPDS,其中為了要處理多資料串流的情境,我們結合了滑動視窗的技術,使得SOHUPDS能在資料串流中更有效率的探勘高效益樣式。另外,我們又提出了一個新的資料結構IUDataListSW,用來儲存當下滑動視窗中長度為1的項目集裡的資訊,透過追蹤這個資料結構裡的資訊,演算法可以更有效率的計算項目集的擴展可能性,以此達到更有效率的探勘高效益樣式。
英文摘要 In this dissertation, we addressed frequent sequential pattern and high utility pattern mining in data streams environment. In data stream mining, it is desirable that the algorithms perform the operations using single-pass. Moreover, algorithms are required to be efficient in terms of execution time. In addition, it is also desirable that the algorithms require less amount of memory usage.

We first explored frequent sequential pattern mining in multiple streams. Specifically, we proposed an efficient algorithm named PSP-AMS to progressively mine frequent sequential patterns across multiple streams. In multiple streams environment, sequential pattern may appear across multiple streams. In addition, users would like to see the patterns based on recent data rather than the old data. Therefore, we utilize progressive mining approach to find the across-streams sequential patterns. PSP-AMS uses a novel data structure PSP-MS-tree to insert new items, update current items, and delete obsolete items. By maintaining a PSP-MS-tree, PSP-AMS efficiently finds the frequent sequential patterns across multiple streams.

Next, we explored high utility pattern mining. High utility pattern mining focuses utility value as measure of importance, whereas frequent pattern mining focuses on frequency as measure of importance. There are several studies in literature that propose algorithms for mining high utility patterns. We observed that some of the algorithms perform well on sparse dataset, whereas some of the algorithms perform well on dense datasets. To address this issue, we propose a novel algorithm called DMHUPS in conjunction with a data structure called IUData List to efficiently mine high utility patterns on both sparse and dense datasets. In addition, DMHUPS algorithm simultaneously calculates utility and tighter extension upper-bound values for multiple promising candidates.

We then explored high utility pattern mining in data stream environment. To deal with data stream environment, we utilize sliding window approach and propose an effective single-pass and one-phase algorithm, SOHUPDS, for mining high utility patterns in a data stream. Moreover, we propose a data structure IUDataListSW, which stores information of length-1 itemsets for the current sliding window. SOHUPDS utilizes IUDataListSW to efficiently mine high utility patterns over a data stream.
論文目次 摘要 - i
Abstract - iii
Acknowledgements - v
Table of Contents - vi
List of Tables - viii
List of Figures - ix
Chapter 1. Introduction - 1
1.1. Motivation - 1
1.2. Overview of the Dissertation - 2
1.2.1. Progressive mining of Sequential Patterns Across Multiple Streams - 2
1.2.2. DMHUPS: Discovering Multiple High Utility Patterns Simultaneously - 4
1.2.3. SOHUPDS: A Single-pass One-phase Algorithm for Mining High Utility Patterns over a Data Stream - 5
1.2.4. Integrated Framework - 7
1.3. Organization of the Dissertation - 7
Chapter 2. PSP-AMS: Progressive mining of Sequential Patterns Across Multiple Streams - 9
2.1. Introduction - 9
2.2. Related Work - 12
2.3. Preliminaries - 15
2.4. Progressive Mining of Sequential Pattern Across Multiple Streams - 17
2.4.1. PSP-MS-tree - 17
2.4.2. Algorithm PSP-AMS - 18
2.4.3. i-extension and s-extension - 22
2.4.4. Maintaining the PSP-MS-tree - 24
2.5. Performance Evaluation - 30
2.5.1. Experimental Environment and Dataset - 30
2.5.2. Execution Time and Memory Usage on Synthetic Dataset - 31
2.5.3. Experiments on a real dataset - 36
2.5.4. Scalability Test - 38
2.6. Summary - 41
Chapter 3. DMHUPS: Discovering Multiple High Utility Patterns Simultaneously - 42
3.1. Introduction - 42
3.2. Related Work - 45
3.3. Preliminaries - 47
3.4. Proposed method for mining high utility patterns - 49
3.4.1. The search space - 49
3.4.2. IUData List - 51
3.4.3. DMHUPS - 51
3.5. Optimization Techniques - 58
3.5.1. Avoiding Search by Lookahead (Lookahead strategy) - 58
3.5.2. Reducing the Cost of Database Scans by Transaction Merging (TransactionMerging strategy) - 58
3.5.3. Only One Item for Extension (OneItemExtension strategy) - 59
3.6. Performance Evaluation - 59
3.6.1. Datasets - 60
3.6.2. Execution Time and Memory Usage on Sparse Datasets - 60
3.6.3. Execution Time and Memory Usage on Dense Datasets - 63
3.6.4. Comparison of the number of visited nodes - 66
3.7. Summary - 70
Chapter 4. SOHUPDS: A Single-pass One-phase Algorithm for Mining High Utility Patterns over Data Streams - 71
4.1. Introduction - 71
4.2. Related Work - 73
4.3. Preliminaries - 75
4.4. Algorithm SOHUPDS - 78
4.4.1. IUDataListSW - 78
4.4.2. SOHUPDS - 79
4.5. Performance Evaluation - 84
4.5.1. Performance test with different minimum utility thresholds - 84
4.5.2. Performance test with different sliding window sizes - 87
4.5.3. Performance test on large dataset - 89
4.6. Summary - 91
Chapter 5. Conclusions - 92
References - 94
參考文獻 [1] R. Agrawal and R. Srikant. Mining sequential patterns. In Proceedings of the Eleventh International Conference on Data Engineering, pages 3–14, March 1995.
[2] Rakesh Agrawal, Ramakrishnan Srikant, et al. Fast algorithms for mining association rules. In Proc. 20th int. conf. very large data bases, VLDB, volume 1215, pages 487–499, 1994.
[3] C. F. Ahmed, S. K. Tanbeer, B. S. Jeong, and Y. K. Lee. Efficient tree structures for high utility pattern mining in incremental databases. IEEE Transactions on Knowledge and Data Engineering, 21(12):1708–1721, 2009.
[4] Chowdhury Farhan Ahmed, Syed Khairuzzaman Tanbeer, Byeong-Soo Jeong, and Ho-Jin Choi. Interactive mining of high utility patterns over data streams. Expert Systems with Applications, 39(15):11979 – 11991, 2012.
[5] J. Ayres, J. Flannick, J. Gehrke, and T. Yiu. Sequential pattern mining using a bitmap representation. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 429–435, July 2002.
[6] Raymond Chan, Qiang Yang, and Yi-Dong Shen. Mining high utility itemsets. In Third IEEE International Conference on Data Mining, pages 19–26, 2003.
[7] L. Chang, T. Wang, D. Yang, and H. Luan. Seqstream: Mining closed sequential patterns over stream sliding windows. In 2008 Eighth IEEE International Conference on Data Mining, pages 83–92, 2008.
[8] G. Chen, X. Wu, and X. Zhu. Sequential pattern mining in multiple streams. In Fifth IEEE International Conference on Data Mining, pages 585–588, November 2005.
[9] Yi-Cheng Chen, Yu-Lun Ko, Wen-Chih Peng, and Wang-Chien Lee. Mining appliance usage patterns in smart home environment. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 99–110, 2013.
[10] Hong Cheng, Xifeng Yan, and Jiawei Han. Incspan: Incremental mining of sequential patterns in large database. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’04, pages 527–532, 2004.
[11] Bi-Ru Dai, Jen-Wei Huang, Mi-Yen Yeh, and Ming-Syan Chen. Clustering on demand for multiple data streams. In 2004 Fourth IEEE International Conference on Data Mining, pages 367–370, Nov 2004.
[12] Bi-Ru Dai, Jen-Wei Huang, Mi-Yen Yeh, and Ming-Syan Chen. Adaptive clustering for multiple evolving streams. IEEE Transactions on Knowledge and Data Engineering, 18(9):1166–1180, Sept 2006.
[13] Siddharth Dawar and Vikram Goyal. Up-hist tree: An efficient data structure for mining high utility patterns from transaction databases. In Proceedings of the 19th International Database Engineering & Applications Symposium, IDEAS ’15, pages 56–61. ACM, 2014.
[14] Siddharth Dawar, Veronica Sharma, and Vikram Goyal. Mining top-k high-utility itemsets from a data stream under sliding window model. Applied Intelligence, 47(4):1240–1255, Dec 2017.
[15] C. I. Ezeife and M. Monwar. Ssm : A frequent sequential data stream patterns miner. In Computational Intelligence and Data Mining, 2007. CIDM 2007. IEEE Symposium on, pages 120–126, 2007.
[16] Philippe Fournier-Viger, Antonio Gomariz, Ted Gueniche, Azadeh Soltani, Cheng-Wei Wu, and Vincent S Tseng. Spmf: a java open-source pattern mining library. The Journal of Machine Learning Research, 15(1):3389–3393, 2014.
[17] Philippe Fournier-Viger, Cheng-Wei Wu, Souleymane Zida, and Vincent S Tseng. Fhm: faster high-utility itemset mining using estimated utility co-occurrence pruning. In Foundations of intelligent systems, pages 83–92. Springer, 2014.
[18] Jiawei Han, Jian Pei, and Micheline Kamber. Data mining: concepts and techniques. Elsevier, 2011.
[19] Chin-Chuan Ho, Hua-Fu Li, Fang-Fei Kuo, and Suh-Yin Lee. Incremental mining of sequential patterns over a stream sliding window. In Sixth IEEE International Conference on Data Mining - Workshops (ICDMW’06), pages 677 –681, 2006.
[20] Jen-Wei Huang, Su-Chen Lin, and Ming-Syan Chen. Dpsp: Distributed progressive sequential pattern mining on the cloud. In 14th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD-10), 2010.
[21] Jen-Wei Huang, Chi-Yao Tseng, Jian-Chih Ou, and Ming-Syan Chen. A general model for sequential pattern mining with a progressive database. IEEE Transactions on Knowledge and Data Engineering, 20:1153 –1167, 2008.
[22] Bijay Prasad Jaysawal and Jen-Wei Huang. Mining frequent progressive usage patterns across multiple mobile broadcasting channels. In Trends and Applications in Knowledge Discovery and Data Mining, volume 8643 of Lecture Notes in Computer Science, pages 149–155. Springer International Publishing, 2014.
[23] Bijay Prasad Jaysawal and Jen-Wei Huang. Dmhups: Discovering multiple high utility patterns simultaneously. Knowledge and Information Systems, May 2018.
[24] Bettahally N. Keshavamurthy, Mitesh Sharma, and Durga Toshniwal. Efficient support coupled frequent pattern mining over progressive databases. CoRR, abs/1005.5434, 2010.
[25] Ron Kohavi, Carla E. Brodley, Brian Frasca, Llew Mason, and Zijian Zheng. Kddcup 2000 organizers’ report: Peeling the onion. SIGKDD Explor. Newsl., 2(2):86–93, December 2000.
[26] Srikumar Krishnamoorthy. Pruning strategies for mining high utility itemsets. Expert Systems with Applications, 42(5):2371 – 2381, 2015.
[27] Guanling Lee, Yi-Chun Chen, and Kuo-Che Hung. Ptree: Mining sequential patterns efficiently in multiple data streams environment. Journal of Information Science and Engineering, 2013.
[28] S. C. Lee, E. Lee, W. Choi, and U. M. Kim. Extracting temporal behavior patterns of mobile user. In 2008 Fourth International Conference on Networked Computing and Advanced Information Management, volume 2, pages 455–462, 2008.
[29] H. F. Li, H. Y. Huang, Y. C. Chen, Y. J. Liu, and S. Y. Lee. Fast and memory efficient mining of high utility itemsets in data streams. In 2008 Eighth IEEE International Conference on Data Mining, pages 881–886, Dec 2008.
[30] Vance Chiang-Chi Liao and Ming-Syan Chen. Dfsp: a depth-first spelling algorithm for sequential pattern mining of biological sequences. Knowledge and Information Systems, 38(3):623–639, Mar 2014.
[31] Jerry Chun-Wei Lin, Wensheng Gan, Tzung-Pei Hong, and Jeng-Shyang Pan. Incrementally updating high-utility itemsets with transaction insertion. In Advanced Data Mining and Applications, pages 44–56. Springer, 2014.
[32] J. Liu, K. Wang, and B. C. M. Fung. Direct discovery of high utility itemsets without candidate generation. In 2012 IEEE 12th International Conference on Data Mining, pages 984–989, 2012.
[33] J. Liu, K. Wang, and B. C. M. Fung. Mining high utility patterns in one phase without generating candidates. IEEE Transactions on Knowledge and Data Engineering, 28(5):1245–1257, May 2016.
[34] Mengchi Liu and Junfeng Qu. Mining high utility itemsets without candidate generation. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, CIKM ’12, pages 55–64. ACM, 2012.
[35] Ying Liu, Wei-keng Liao, and Alok Choudhary. A fast high utility itemsets mining algorithm. In Proceedings of the 1st international workshop on Utility-based data mining, pages 90–99. ACM, 2005.
[36] E. H. C. Lu and V. S. Tseng. Mining cluster-based mobile sequential patterns in locationbased service environments. In 2009 Tenth International Conference on Mobile Data Management: Systems, Services and Middleware, pages 273–278, May 2009.
[37] A. Mhatre, M. Verma, and D. Toshniwal. Extracting sequential patterns from progressive databases: A weighted approach. In 2009 International Conference on Signal Processing Systems, pages 788–792, 2009.
[38] SonN. Nguyen, Xingzhi Sun, and MariaE. Orlowska. Improvements of incspan: Incremental mining of sequential patterns in large database. In 9th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD-05), 2005.
[39] S. Parthasarathy, M. J. Zaki, M. Ogihara, and S. Dwarkadas. Incremental and interactive sequence mining. In Proceedings of the Eighth International Conference on Information and Knowledge Management, CIKM ’99, pages 251–258, 1999.
[40] J. Pei, J. Han, B. Mortazavi-Asl, J. Wang, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu. Mining sequential patterns by pattern-growth: The prefixspan approach. IEEE Transactions on Knowledge and Data Engineering, 16(11):1424–1440, November 2004.
[41] Wen-Chih Peng and Zhung-Xun Liao. Mining sequential patterns across multiple sequence databases. Data & Knowledge Engineering, 68(10):1014 – 1033, 2009.
[42] J Pisharath, Y Liu, B Ozisikyilmaz, R Narayanan, WK Liao, A Choudhary, and G Memik. Nu-minebench version 2.0 dataset and technical report, 2013.
[43] C. Raissi, P. Poncelet, and M. Teisseire. Speed : Mining maxirnal sequential patterns over data streams. In 2006 3rd International IEEE Conference Intelligent Systems, pages 546–552, 2006.
[44] Heungmo Ryang and Unil Yun. High utility pattern mining over data streams with sliding window technique. Expert Systems with Applications, 57:214 – 231, 2016.
[45] Jayakrushna Sahoo, Ashok Kumar Das, and A. Goswami. An efficient approach for mining association rules from high utility itemsets. Expert Systems with Applications, 42(13):5754 – 5778, 2015.
[46] Bai-En Shie, Hui-Fang Hsiao, Vincent S. Tseng, and Philip S. Yu. Mining high utility mobile sequential patterns in mobile commerce environments. In International Conference on Database Systems for Advanced Applications, pages 224–238. Springer, 2011.
[47] Bai-En Shie, Philip S. Yu, and Vincent S. Tseng. Efficient algorithms for mining maximal high utility itemsets from data streams with different models. Expert Systems with Applications, 39(17):12947 – 12960, 2012.
[48] V. S. Tseng, E. Hsueh-Chan Lu, and Cheng-Hsien Huang. Mining temporal mobile sequential patterns in location-based service environments. In 2007 International Conference on Parallel and Distributed Systems, volume 2, pages 1–8, Dec 2007.
[49] V. S. Tseng, B. E. Shie, C. W. Wu, and P. S. Yu. Efficient algorithms for mining high utility itemsets from transactional databases. IEEE Transactions on Knowledge and Data Engineering, 25(8):1772–1786, 2013.
[50] V. S. Tseng, C. W. Wu, P. Fournier-Viger, and P. S. Yu. Efficient algorithms for mining the concise and lossless representation of high utility itemsets. IEEE Transactions on Knowledge and Data Engineering, 27(3):726–739, 2015.
[51] V. S. Tseng, C. W. Wu, P. Fournier-Viger, and P. S. Yu. Efficient algorithms for mining top-k high utility itemsets. IEEE Transactions on Knowledge and Data Engineering, 28(1):54–67, Jan 2016.
[52] Vincent S Tseng, Chun-Jung Chu, and Tyne Liang. Efficient mining of temporal high utility itemsets from data streams. In Second International Workshop on Utility-Based Data Mining, volume 18, 2006.
[53] Vincent S. Tseng, Cheng-Wei Wu, Bai-En Shie, and Philip S. Yu. Up-growth: An efficient algorithm for high utility itemset mining. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’10, pages 253–262. ACM, 2010.
[54] Cheng Wei Wu, Bai-En Shie, Vincent S Tseng, and Philip S Yu. Mining top-k high utility itemsets. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 78–86. ACM, 2012.
[55] Pei-Hsin Wu, Wen-Chih Peng, and Ming-Syan Chen. Mining sequential alarm patterns in a telecommunication database. In Willem Jonker, editor, Databases in Telecommunications II, pages 37–51, Berlin, Heidelberg, 2001. Springer Berlin Heidelberg.
[56] Wenyan Wu and Le Gruenwald. Research issues in mining multiple data streams. In Proceedings of the First International Workshop on Novel Data Stream Pattern Mining Techniques, StreamKDD ’10, pages 56–60, 2010.
[57] C. Xu, Y. Chen, and R. Bie. Sequential pattern mining in data streams using the weighted sliding window model. In 2009 15th International Conference on Parallel and Distributed Systems, pages 886–890, 2009.
[58] S.-Y. Yang, C.-M. Chao, P.-Z. Chen, and C.-H. Sun. Incremental mining of acrossstreams sequential patterns in multiple data streams. Journal of Computers, 6(3):449–457, 2011.
[59] M. Y. Yeh, B. R. Dai, and M. S. Chen. Clustering over multiple evolving streams by events and correlations. IEEE Transactions on Knowledge and Data Engineering, 19(10):1349–1362, 2007.
[60] C. H. Yun and M. S. Chen. Mining mobile sequential patterns in a mobile commerce environment. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 37(2):278–295, March 2007.
[61] U. Yun, G. Lee, and E. Yoon. Efficient high utility pattern mining for establishing manufacturing plans with sliding window control. IEEE Transactions on Industrial Electronics, 64(9):7239–7249, Sept 2017.
[62] Unil Yun, Donggyu Kim, Heungmo Ryang, Gangin Lee, and Kyung-Min Lee. Mining recent high average utility patterns based on sliding window from stream data. Journal of Intelligent & Fuzzy Systems, 30(6):3605–3617, 2016.
[63] Unil Yun and Gangin Lee. Sliding window based weighted erasable stream pattern mining for stream data applications. Future Generation Computer Systems, 59:1 – 20, 2016.
[64] Unil Yun, Gwangbum Pyun, and Eunchul Yoon. Efficient mining of robust closed weighted sequential patterns without information loss. International Journal on Artificial Intelligence Tools, 24(01):1550007, 2015.
[65] Unil Yun and Heungmo Ryang. Incremental high utility pattern mining with static and dynamic databases. Applied Intelligence, 42(2):323–352, 2015.
[66] Unil Yun, Heungmo Ryang, and Keun Ho Ryu. High utility itemset mining with techniques for reducing overestimated utilities and pruning candidates. Expert Systems with Applications, 41(8):3861 – 3878, 2014.
[67] Mohammed J. Zaki. Efficient enumeration of frequent sequences. In Proceedings of the seventh international conference on Information and knowledge management, CIKM ’98, pages 68–75, New York, NY, USA, 1998. ACM.
[68] Souleymane Zida, Philippe Fournier-Viger, Jerry Chun-Wei Lin, Cheng-Wei Wu, and Vincent S Tseng. Efim: A highly efficient algorithm for high-utility itemset mining. In Advances in Artificial Intelligence and Soft Computing, pages 530–546. Springer, 2015.
[69] Morteza Zihayat and Aijun An. Mining top-k high utility patterns over data streams. Information Sciences, 285:138 – 161, 2014. Processing and Mining Complex Data Streams.
論文全文使用權限
  • 同意授權校內瀏覽/列印電子全文服務,於2024-01-01起公開。
  • 同意授權校外瀏覽/列印電子全文服務,於2024-01-10起公開。


  • 如您有疑問,請聯絡圖書館
    聯絡電話:(06)2757575#65773
    聯絡E-mail:etds@email.ncku.edu.tw