System ID: U0026-1806201714530200
Title (Chinese): 有初始值篩選的核函數K中心聚類法
Title (English): Kernel K Medoids Algorithm with Selected Initial Values
University: National Cheng Kung University (成功大學)
Department (Chinese): 統計學系
Department (English): Department of Statistics
Academic Year: 105
Semester: 2
Publication Year: 106
Author (Chinese): 方茜
Author (English): Qian Fang
Email: fx2358183@gmail.com
Student ID: R26043021
Degree: Master's
Language: English
Pages: 36
Committee: Advisor: 溫敏杰
Committee Member: 吳國龍
Committee Member: 吳宗正
Committee Member: 高正雄
Keywords (Chinese): 核函數, K-中心點, 初始值
Keywords (English): Kernel function, K medoids, Initialization
Subject Classification:
Abstract (Chinese): This study proposes a clustering method that combines the Gaussian kernel function with the k-medoids algorithm. In addition, it uses the variable Vj (Park and Jun, 2009) to rank the data and select the r middle values as the initial cluster centers. Screening the initial values makes the clustering process more efficient, while the Gaussian kernel makes the method less sensitive to outliers and noisy data. To evaluate the proposed method, we analyzed real, synthetic, and relational datasets, assessed the results with ARI (Adjusted Rand Index), F1 score, and MSE (Mean Squared Error), and compared them with the clustering results of the k-means and k-medoids algorithms. The evaluation shows that the proposed method achieves better clustering performance than both k-means and k-medoids.
Abstract (English): This study proposes a clustering algorithm that combines a Gaussian kernel function with the k-medoids clustering algorithm. In addition, we use a variable called Vj (Park and Jun, 2009) to rank objects and select the r middle values as our initial centers. The selection of initial values makes the clustering process more efficient, and the Gaussian kernel makes the clustering outcome more resistant to outliers and noise. To evaluate the proposed algorithm, we analyze several real, synthetic, and relational datasets and compare the results with those of other algorithms in terms of the Adjusted Rand Index, F1 score, and Mean Squared Error. The outcomes show that the proposed algorithm has better clustering performance than the other algorithms (k-means, k-medoids) considered in this study.
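The approach outlined in the abstract can be illustrated with a short sketch: compute a Gaussian-kernel-induced distance matrix, rank objects by the Vj statistic of Park and Jun (2009), seed the medoids from the middle of that ranking, and then run the usual k-medoids updates. This is a minimal illustration under stated assumptions, not the thesis implementation: the exact "r middle values" selection rule, the bandwidth sigma, and all function names here are assumptions.

```python
import numpy as np

def gaussian_kernel_distance(X, sigma=1.0):
    """Kernel-induced squared distance ||phi(x)-phi(y)||^2 = 2 - 2*K(x, y)
    for the Gaussian kernel K(x, y) = exp(-||x - y||^2 / (2*sigma^2))."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return 2.0 - 2.0 * np.exp(-sq / (2.0 * sigma ** 2))

def select_initial_medoids(D, k):
    """Park & Jun (2009)-style ranking v_j = sum_i d_ij / sum_l d_il.
    The thesis selects 'middle values'; taking the k objects ranked
    around the median of v is this sketch's interpretation (assumption)."""
    v = (D / D.sum(axis=1, keepdims=True)).sum(axis=0)
    order = np.argsort(v)
    lo = max(0, len(order) // 2 - k // 2)
    return order[lo:lo + k]

def kernel_kmedoids(X, k, sigma=1.0, max_iter=100):
    """k-medoids in the kernel-induced distance, seeded by the Vj ranking."""
    D = gaussian_kernel_distance(X, sigma)
    medoids = select_initial_medoids(D, k)
    for _ in range(max_iter):
        # Assign each object to its nearest current medoid.
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if members.size:
                # New medoid: the member minimizing total within-cluster distance.
                costs = D[np.ix_(members, members)].sum(axis=0)
                new_medoids[c] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return labels, medoids
```

Because distances are computed in the kernel-induced feature space, a far-away outlier's distance to every medoid saturates near 2, which is what limits its pull on the medoid update relative to Euclidean k-means.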
Table of Contents:
Abstract (Chinese) i
Abstract (English) ii
Acknowledgements iii
Table of Contents iv
List of Tables vi
List of Figures vii
Chapter 1. Introduction 1
Chapter 2. Background 3
  K Means Algorithm and K Medoids Algorithm 3
    K means 3
    K medoids 5
  Kernel Function 6
  Initialization 8
  Relational Data 9
  Different Dissimilarity Measures 9
    Distance-based dissimilarity measures 9
    Correlation-based dissimilarity measures 11
Chapter 3. Kernel K Medoids Algorithm with Selected Initial Values 13
  The Proposed Method 13
  Sensitivity Curve 15
Chapter 4. Numerical Experiments 18
  Measures of Results 18
    Adjusted rand index 18
    F1 score 18
    Mean squared error 19
  Conventional Data 19
    Data description 19
    Clustering outcomes 19
  Synthetic Data 20
    Synthetic data formed by skewed distributions 20
    Synthetic data with outlier or noise 22
  Relational Data 24
Chapter 5. Conclusion 26
Bibliography 27
References:
[1] Agrawal, K. P., Garg, S. and Patel, P. Performance Measures for Densed and Arbitrary Shaped Clusters. CS-Journals, Vol. 6, pp. 388-350, 2015.
[2] Chang, C. C. and Lin, C. J. Training ν-support vector classifiers: Theory and algorithms. Neural Computation, 13(9), 2119–2147, 2001.
[3] Duda, R., Hart, P. and Stork, D. Pattern Classification, second ed. John Wiley and Sons, New York, 2001.
[4] Hubert, L. and Arabie, P. Comparing partitions. Journal of Classification, 2, 193–218, 1985.
[5] Jain, A. K. Data clustering: 50 years beyond K-means. Pattern Recognition Letters, 31, 651–666, 2010.
[6] Kaufman, L. and Rousseeuw, P. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley, New York, ISBN 0-471-87876-6, 1990.
[7] Lance, G. N. and Williams, W. T. A General Theory of Classificatory Sorting Strategies: I. Hierarchical Systems. The Computer Journal, 9, 373–380, 1967.
[8] MacQueen, J. Some methods for classification and analysis of multivariate observations. Fifth Berkeley Symposium on Mathematics, Statistics and Probability, University of California Press, pp. 281–297, 1967.
[9] Mei, J. P. and Chen, L. Fuzzy clustering with weighted medoids for relational data. Pattern Recognition, 43, 1964–1974, 2010.
[10] Park, H. S. and Jun, C. H. A simple and fast algorithm for K-medoids clustering. Expert Systems with Applications, 36, 3336–3341, 2009.
[11] Saltelli, A., Tarantola, S., Campolongo, F. and Ratto, M. Sensitivity Analysis in Practice: A Guide to Assessing Scientific Models. New York: Wiley, 2004.
[12] Wu, K. L. and Lin, Y. J. Kernelized K-Means Algorithm Based on Gaussian Kernel. Advances in Control and Communication, pp. 657–664, 2012.
Full-Text Access Rights:
  • On-campus browsing/printing of the electronic full text authorized, publicly available from 2018-06-01.
  • Off-campus browsing/printing of the electronic full text authorized, publicly available from 2018-06-01.

