||A Study on Genetic-Fuzzy Data Mining Techniques
||Institute of Computer Science and Information Engineering
multi-objective genetic algorithm
在以往的研究中，許多遺傳模糊探勘方法都是使用整合處理商品方式針對單一支持度的模糊探勘問題。然而，這些方法在演化的過程是耗時的。故在本論文，我們針對此類型問題提出ㄧ個可以減少演化時間的改良演算法。所提的方法首先利用k-means分群技術將染色體分成k群。在同ㄧ群的每條染色體進而利用所在群的代表染色體的單一大項目集數目和其本身的隸屬函數的合適度值(suitability value)去計算適合度值(fitness value)。如此ㄧ來，演化時間便可以因為減少計算單一大項目集的次數而大幅度的降低。
此外，在實際應用上不同的評估標準亦是需要被考慮的。可以考慮不同評估標準的權衡關係的多目標演化演算法便非常適合用於解決這樣的問題。相較於遺傳演算法搜尋單一最佳解，多目標遺傳演算法搜尋一個解集合給使用者參考運用，此集合通稱為柏拉圖最佳解(Pareto-Optimal Surface)。故在本論文的最後一部份，我們提出一演算法企圖找出介於兩個評估標準下的柏拉圖最佳解。此兩個評估標準分別為單一大項目集的總數(total number of large 1-itemsets)和隸屬函數的合適度值(suitability)。使用者因此可以選擇恰當的隸屬函數進行模糊關聯規則的探勘。最後，我們進行實驗顯示所提的方法的有效性。從實驗結果，我們可以發現所提的方法中的確可以考慮隸屬函數的合適度程度與所挖掘的知識量。
Data mining techniques have been widely used for data analysis in recent years. Since transactions may consist of quantitative, linguistic and uncertain data, the fuzzy-set theory is applied to these kinds of transactions and many mining approaches are then extended for deriving fuzzy association rules. Membership functions, however, are usually assumed to be known in advance in most of these approaches. How to automatically derive appropriate membership functions and mined knowledge thus presents a challenge.
In this dissertation, the genetic algorithms are adopted to derive a set of appropriate membership functions, which is then used to find fuzzy association rules. We divide the genetic-fuzzy mining problems into four kinds according to the types of fuzzy mining problems and the ways of processing items. The types of fuzzy mining problems include single-minimum-support fuzzy-mining and multiple-minimum-support fuzzy-mining. The ways of processing items include processing all the items together (integrated) and processing them individually (divide-and-conquer).
In the past, several genetic-fuzzy data mining approaches have been proposed for the integrated single-minimum-support problem. They are, however, usually time-consuming on the evaluation process. In this dissertation, we thus first propose an improved approach to speed up of the evaluation time for this type of problem. The proposed approach divides chromosomes into k groups by using the k-means clustering technique. All the chromosomes in a cluster then use the number of large 1-itemsets derived from the representative chromosome in the cluster and their own suitability of membership functions to calculate the fitness values. The evaluation cost can thus be greatly reduced due to the time-saving in finding 1-itemsets.
The above approach is proposed for the mining problem with single minimum supports. In real applications, different items may have different criteria to judge their importance. We then propose a novel approach for the integrated multiple-minimum-support problem. The proposed approach first gathers similar items into groups. All the items in the same group are considered to have similar characteristics and are assigned similar values for initializing a better population. Each chromosome is evaluated by the criteria of requirement satisfaction and suitability of membership functions to estimate its fitness value. The proposed algorithm has two main advantages. The first one is that the proposed approach can derive an acceptable minimum support value and membership functions of each item for fuzzy association-rule mining. The second one is that the proposed approach can get a better initial population, including an appropriate number of linguistic terms, minimum support values, and membership functions of items by using the clustering technique. The above two approaches can also be easily extended to individual processing ways.
Besides, several criteria may be considered at the same time in a real application. The multi-objective evolutionary algorithms, which are used to find a set of solutions with trade-offs among different criteria, are thus very suitable for solving such a task. A set of solutions, namely Pareto-Optimal Surface, is derived and given to users, instead of only the best one solution obtained by genetic algorithms. In the last part of the dissertation, we thus propose an approach to find the Pareto solutions based on the two objective functions, namely the total number of large 1-itemsets and the suitability of membership functions, for deriving membership functions for mining fuzzy association rules. Users can thus choose appropriate solutions to mine fuzzy association rules. At last, experiments are made to show the effectiveness of the proposed approaches. From the experimental results, we find that the proposed approaches can indeed consider well both the suitability of membership function and the amount of mined knowledge.
摘 要 I
LIST OF FIGURES IX
LIST OF TABLES XII
CHAPTER 1 INTRODUCTION 1
1.1 MOTIVATION 1
1.2 OVERVIEW OF THE DISSERTATION 4
1.2.1 A Cluster-Based Genetic-Fuzzy Mining Approach for Items with Single Minimum Support 4
1.2.2 A Genetic-Fuzzy Mining Approach for Items with Multiple Minimum Supports 5
1.2.3 A Multi-Objective Genetic-Fuzzy Mining Approach for Items with Single Minimum Support 7
1.3 ORGANIZATION OF THE DISSERTATION 8
CHAPTER 2 REVIEW OF RELATED WORK 9
2.1 DATA MINING 9
2.2 FUZZY SET 11
2.3 FUZZY DATA MINING 13
2.4 GENETIC ALGORITHMS 16
2.5 GENETIC-FUZZY DATA MINING TECHNIQUES 18
2.5.1 Type I Problem: The Integrated Genetic-Fuzzy Mining Problem for Items with a Single Minimum Support 19
2.5.2 Type II Problem: The Divide-and-Conquer Genetic-Fuzzy Mining Problem for Items with a Single Minimum Support 23
CHAPTER 3 GENETIC-FUZZY DATA-MINING METHOD FOR ITEMS WITH SINGLE MINIMUM SUPPORT 26
3.1 INTRODUCTION 26
3.2 A GA-BASED MINING FRAMEWORK 27
3.3 MINING MEMBERSHIP FUNCTIONS AND ASSOCIATION RULES 29
3.3.1 Chromosome Representation 29
3.3.2 Initial Population 30
3.3.3 Fitness and Selection 30
3.3.4 Clustering Chromosomes 31
3.3.5 Genetic Operators 32
3.4 THE PROPOSED MINING ALGORITHM (ICGFSMS) 33
3.5 AN EXAMPLE 37
3.6 EXPERIMENTAL RESULTS 43
3.6.1 Description of the Experimental Datasets 43
3.6.2 Performance of the Proposed Algorithm 43
3.6.3 Comparison Results of the Previous Approach (IGFSMS) and the Proposed Approach (ICGFSMS) 48
3.6.4 Parameters Setting Evaluation on the Proposed Approach 51
3.6.5 Different Data Distributions Evaluation on the Proposed Approach 56
3.7 SUMMERY 64
CHAPTER 4 GENETIC-FUZZY DATA-MINING METHOD FOR ITEMS WITH MULTIPLE MINIMUM SUPPORTS 65
4.1 INTRODUCTION 65
4.2 A GA-BASED MINING FRAMEWORK 66
4.3 MINING MEMBERSHIP FUNCTIONS AND ASSOCIATION RULES 68
4.3.1 Chromosome Representation 68
4.3.2 Initial Population 70
4.3.3 The Required Number of Large 1-itemsets 76
4.3.4 Fitness and Selection 77
4.3.5 Genetic Operators 79
4.4 THE PROPOSED MINING ALGORITHM 80
4.5 AN EXAMPLE 82
4.6 EXPERIMENTAL RESULTS 85
4.6.1 Description of the Experimental Datasets 86
4.6.2 The Performance of the Proposed Approach 86
4.7 SUMMARY 93
CHAPTER 5 MULTI-OBJECTIVE GENETIC-FUZZY MINING APPROACH FOR ITEMS WITH SINGLE MINIMUM SUPPORT 95
5.1 INTRODUCTION 95
5.2 GA-BASED MULTI-OBJECTIVE OPTIMIZATION PROBLEMS 96
5.3 THE MULTI-OBJECTIVE GENETIC-FUZZY MINING APPROACH 99
5.3.1 Chromosome Representation 99
5.3.2 Initial Population 99
5.3.3 The Two Objective Functions 100
5.3.4 Fitness Assignment 102
5.3.5 Genetic Operators 104
5.4 THE PROPOSED MINING ALGORITHM 105
5.5 AN EXAMPLE 108
5.6 EXPERIMENTAL RESULTS 114
5.6.1 The Evolution of Pareto Fronts by the Proposed Approach 114
5.6.2 The Effects with Minimum Supports and Minimum Confidences 116
5.7 SUMMERY 118
CHAPTER 6 CONCLUSIONS AND FUTURE WORKS 120
 W. H. Au, Keith C. C. Chan and X. Yao, “A novel evolutionary data mining algorithm with applications to churn prediction,” IEEE Transactions on Evolutionary Computation, Vol. 7, No. 6, pp. 532-545, 2003.
 R. Agrawal, T. Imielinksi and A. Swami, “Mining association rules between sets of items in large database,” The ACM SIGMOD Conference on Management of Data, pp. 207-216, 1993.
 R. Agrawal, T. Imielinksi and A. Swami, “Database mining: a performance perspective,” IEEE Transactions on Knowledge and Data Engineering, Vol. 5, No. 6, pp. 914-925, 1993.
 Y. Aumann and Y. Lindell, “A statistical theory for quantitative association rules,” The ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 261-270, 1999.
 R. Agrawal, R. Srikant and Q. Vu, “Mining association rules with item constraints,” The International Conference on Knowledge Discovery in Databases and Data Mining, pp. 67-73, 1997.
 R. Agrawal and R. Srikant, “Fast algorithm for mining association rules,” The International Conference on Very Large Data Bases, pp. 487-499, 1994.
 A. Ben-Dor, R. Shamir and Z. Yakhini, “Clustering gene expression patterns,” The Annual International Conference on Computational Molecular Biology, pp. 281-297, 1999.
 C. C. Chan and W. H. Au, “Mining fuzzy association rules,” The Conference on Information and Knowledge Management, pp. 209-215, 1997.
 J. Casillas, O. Cordón, M. J. del Jesus and F. Herrera, “Genetic tuning of fuzzy rule deep structures preserving interpretability and its interaction with fuzzy rule set reduction,” IEEE Transactions on Fuzzy Systems, Vol. 13, No. 1, pp. 13-29, 2005.
 O. Cordón, F. Herrera and P. Villar, “Generating the knowledge base of a fuzzy rule-based system by the genetic learning of the data base,” IEEE Transactions on Fuzzy Systems, Vol. 9, No. 4, pp. 667-674, 2001.
 J. Chen, A. Mikulcic and D. H. Kraft, “An integrated approach to information retrieval with fuzzy clustering and fuzzy inferencing,” in O. Pons, M. A. Vila and J. Kacprzyk (eds.), Knowledge Management in Fuzzy Databases, Heidelberg, Germany: Physica-Verlag, 2000.
 C. A. Coello, D. A. Van Veldhuizen and G. B. Lamont, Evolutionary Algorithms for Solving Multi-objective Problems, Kluwer Academic Publishers, 2002.
 J. C. Dunn, “A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters,” Journal of Cybernetics, Vol. 3, pp. 32-57, 1973.
 K. Deb, Multi-objective Optimization Using Evolutionary Algorithms, John Wiley & Sons, 2001.
 K. Deb, S. Agrawal, A. Pratab and T. Meyarivan, “A fast and elitist multiobjective genetic algorithm: NSGA-II,” IEEE Transactions on Evolutionary Computation, Vol. 6, No. 2, pp. 681-695, 2002.
 P. J. Darwen and X. Yao, “Speciation as automatic categorical modularization,” IEEE Transactions on Evolutionary Computation, Vol. 1, No 2, pp. 101-108, 1997.
 M. Ester, H. P. Kriegel, J. Sander and X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise,” The International Conference on knowledge Discovery and Data Mining, pp. 226-231, 1996.
 C. M. Fonseca and P. J. Fleming, "Genetic algorithms for multiobjective optimization: Formulation, discussion and generalization," The International Confidence on Genetic Algorithms, pp. 416-423, 1993.
 W. J. Frawley, G. Piatetsky-Shapiro and C. J. Matheus, “Knowledge discovery in databases: an overview,” The AAAI Workshop on Knowledge Discovery in Databases, pp. 1-27, 1991.
 D. E. Goldberg, Genetic Algorithms in Search, Optimization & Machine Learning, Addison Wesley, 1989.
 J. J. Grefenstette, “Optimization of control parameters for genetic algorithms”, IEEE Trans System Man, and Cybernetics, Vol. 16, No. 1, pp. 122-128, 1986.
 J. H. Holland. Adaptation in Natural and Artificial Systems, University of Michigan Press, 1975.
 T. P. Hong, M. J. Chiang and S. L. Wang, "Mining from quantitative data with linguistic minimum supports and confidences", The Eleventh IEEE International Conference on Fuzzy Systems, pp. 494-499, 2002.
 T. P. Hong, C. H. Chen, Y. C. Lee and Y. L. Wu, "Genetic-fuzzy data mining with divide-and-conquer strategy", IEEE Transactions on Evolutionary Computation, Vol. 12, No. 2, pp. 252-265, 2008.
 T. P. Hong, C. H. Chen, Y. L. Wu and Y. C. Lee, “A GA-based fuzzy mining approach to achieve a trade-off between number of rules and suitability of membership functions,” Soft Computing, Vol. 10, No. 11, pp. 1091-1101. 2006.
 T. P. Hong, C. H. Chen, Y. L. Wu and Vincent S. Tseng, "Finding active membership functions in fuzzy data mining," The Workshop on Foundations of Data Mining in The Fourth IEEE International Conference on Data Mining, 2004.
 A. Homaifar, S. Guan and G. E. Liepins, “A new approach on the traveling salesman problem by genetic algorithms,” The Fifth International Conference on Genetic Algorithms, pp. 460-466, 1993.
 T. P. Hong, C. S. Kuo and S. C. Chi, "Mining association rules from quantitative data," Intelligent Data Analysis, Vol. 3, No. 5, pp. 363-376, 1999.
 T. P. Hong, C. S. Kuo and S. C. Chi, "Trade-off between time complexity and number of rules for fuzzy mining from quantitative data," International Journal of Uncertainty, Fuzziness and Knowledge-based Systems, Vol. 9, No. 5, pp. 587-604, 2001.
 T. P. Hong, K. Y. Lin and B. C. Chien, “Mining fuzzy multiple-level association rules from quantitative data,” Applied Intelligence, Vol. 18, No. 1, pp. 79-90, 2003.
 T. P. Hong, K. Y. Lin and S. L. Wang, “Fuzzy data mining for interesting generalized association rules,” Fuzzy Sets and Systems, Vol. 138, No. 2, pp. 255-269, 2003.
 F. Herrera, M. Lozano and J. L. Verdegay, “Fuzzy connectives based crossover operators to model genetic algorithms population diversity,” Fuzzy Sets and Systems, Vol. 92, No. 1, pp. 21-30, 1997.
 T. P. Hong and Y. C. Lee, “Mining coverage-based fuzzy rules by evolutional computation,” The IEEE International Conference on Data Mining, pp. 218-224, 2001.
 P. A. Heng, T. T. Wong, Y. Rong, Y. P. Chui, Y. M. Xie, K. S. Leung and P. C. Leung, “Intelligent inferencing and haptic simulation for Chinese acupuncture learning and training,” IEEE Transactions on Information Technology in Biomedicine, Vol. 10, No. 1, pp. 28-41, 2006.
 H. Ishibuchi and T. Yamamoto, “Rule weight specification in fuzzy rule-based classification systems,” IEEE Transactions on Fuzzy Systems, Vol. 13, No. 4, pp. 428-435, 2005.
 H. Ishibuchi and T. Yamamoto, “Fuzzy rule selection by multi-objective genetic local search algorithms and rule evaluation measures in data mining,” Fuzzy Sets and Systems, Vol. 141, pp. 59-88, 2004.
 Y. Jin, Multi-objective Machine Learning, Springer, 2006.
 M. Kaya and R. Alhajj, “A clustering algorithm with genetically optimized membership functions for fuzzy association rules mining,” The IEEE International Conference on Fuzzy Systems, pp. 881-886, 2003.
 M. Kaya and R. Alhajj, "Utilizing genetic algorithms to optimize membership functions for fuzzy weighted association rules mining", Applied Intelligence, Vol. 24, No. 1, pp. 7-15, 2006.
 M. Kaya and R. Alhajj, “Integrating multi-objective genetic algorithms into clustering for fuzzy association rules mining,” The Fourth IEEE International Conference on Data Mining, pp. 431-434, 2004.
 M. Kaya, “Multi-objective genetic algorithm based approaches for mining optimized fuzzy association rules,” Soft computing, Vol. 10, pp. 578-586, 2006.
 C. Kuok, A. Fu and M. Wong, “Mining fuzzy association rules in databases,” SIGMOD Record, Vol. 27, No. 1, pp. 41-46, 1998.
 V. R. Khare, X. Yao, B. Sendhoff, Y. Jin and H. Wersing, “Co-evolutionary modular neural networks for automatic problem decomposition,” The 2005 IEEE Congress on Evolutionary Computation, Vol. 3, pp. 2691-2698, 2005.
 Y. C. Lee, T. P. Hong and W. Y. Lin, “Mining fuzzy association rules with multiple minimum supports using maximum constraints”, Lecture Notes in Computer Science, Vol. 3214, pp. 1283-1290, 2004.
 Y. C. Lee, T. P. Hong and T. C. Wang, "Mining multiple-level association rules under the maximum constraint of multiple minimum supports", Lecture Notes in Computer Science, Vol. 4031, pp. 1329-1338, 2006.
 Y. C. Lee, T. P. Hong and T. C. Wang, "Multi-level fuzzy mining with multiple minimum supports", Expert Systems with Applications, Vol. 34, No. 1, pp. 459-468, 2008.
 H. Liang, Z. Wu and Q. Wu, “A fuzzy based supply chain management decision support system,” The World Congress on Intelligent Control and Automation, Vol. 4, pp. 2617-2621, 2002.
 J. B. McQueen, “Some Methods of Classification and Analysis of Mutivariate Observations,” Proceedings of the 5th Berkeley Symposium on Mathematical Satistics and Probability, pp. 281-297, 1967.
 Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, Springer-Verlag, 1994.
 M. Mitchell, An Introduction to Genetic Algorithms, MIT press, 1996.
 E. H. Mamdani, “Applications of fuzzy algorithms for control of simple dynamic plants,” IEEE Proceedings, pp.1585-1588, 1974.
 A. Parodi and P. Bonelli, “A new approach of fuzzy classifier systems,” The Fifth International Conference on Genetic Algorithms, Morgan Kaufmann, Los Altos, CA, pp. 223-230, 1993.
 H. Roubos and M. Setnes, “Compact and transparent fuzzy models and classifiers through iterative complexity reduction,” IEEE Transactions on Fuzzy Systems, Vol. 9, No. 4, pp. 516-524, 2001.
 K. A. Rasmani and Q. Shen, “Modifying weighted fuzzy subsethood-based rule models with fuzzy quantifiers,” The IEEE International Conference on Fuzzy Systems, Vol. 3, pp. 1679-1684, 2004.
 J. D. Schaffer, “Multiple objective optimization with vector evaluated genetic algorithms,” The International Conference on Genetic Algorithms, pp. 93-100, 1985.
 R. Srikant and R. Agrawal, “Mining quantitative association rules in large relational tables,” The 1996 ACM SIGMOD International Conference on Management of Data, pp. 1-12, 1996.
 Elie Sanchez, et al, Genetic Algorithms and Fuzzy Logic Systems: Soft Computing Perspectives (Advances in Fuzzy Systems - Applications and Theory, Vol. 7), World-Scientific, 1997.
 William Siler and J. James, Fuzzy Expert Systems and Fuzzy Reasoning, John Wiley & Sons, 2004.
 M. Setnes and H. Roubos, “GA-fuzzy modeling and classification: Complexity and performance,” IEEE Transactions on Fuzzy Systems, Vol. 8, No. 5, pp. 509-522, 2000.
 C. H. Wang, T. P. Hong and S. S. Tseng, “Integrating fuzzy knowledge by genetic algorithms,” IEEE Transactions on Evolutionary Computation, Vol. 2, No.4, pp. 138-149, 1998.
 C. H. Wang, T. P. Hong and S. S. Tseng, “Integrating membership functions and fuzzy rule sets from multiple knowledge sources,” Fuzzy Sets and Systems, Vol. 112, pp. 141-154, 2000.
 X. Yao, “Adaptive divide-and-conquer using populations and ensembles,” The International Conference on Machine Learning and Application, pp. 13-20, 2003.
 S. Yue, E. Tsang, D. Yeung and D. Shi, “Mining fuzzy association rules with weighted items,” The IEEE International Conference on Systems, Man and Cybernetics, pp. 1906-1911, 2000.
 H. Zhang and D. Liu, Fuzzy Modeling and Fuzzy Control, Springer Verlag, 2006.
 L. A. Zadeh, “Fuzzy sets,” Information and Control, Vol. 8, No. 3, 1965, pp. 338-353.
 E. Zitzler, M. Laumanns and L. Thiele, "SPEA2: Improving the strength Pareto evolutionary algorithm for multiobjective optimization," Evolutionary Methods for Design, Optimization and Control with Application to Industrial Problems, pp. 95-100, 2001.
 Z. Zhang, Y. Lu and B. Zhang, “An effective partitioning-combining algorithm for discovering quantitative association rules,” The Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 261-270, 1997.