System ID: U0026-2406201317160800
Title (Chinese): 改善K-means分群方法之研究─以樣本點為基礎
Title (English): A Study of Improving K-means Clustering Method- Based on Sample Points
University: National Cheng Kung University
Department (Chinese): 統計學系碩博士班
Department (English): Department of Statistics
Academic Year: 101
Semester: 2
Publication Year: 102
Author (Chinese): 高修恒
Author (English): Hsiou-Hen Kao
Student ID: R26001061
Degree: Master's
Language: English
Pages: 52
Committee: Advisor - 溫敏杰
Committee Member - 吳宗正
Committee Member - 吳國龍
Committee Member - 顏榮祥
Keywords (Chinese): 群集分析 (cluster analysis); K-means; relational data
Keywords (English): cluster analysis; K-means; relational data
Subject Classification:
Abstract (Chinese): We move the cluster centers in the K-means algorithm from the cluster means onto the sample points and propose the K-exemplars algorithm. The K-exemplars algorithm can handle not only raw data but also relational data. Although the clustering accuracy of K-exemplars is not necessarily better than that of K-means, it is not much worse, and the number of iterations of K-exemplars is significantly smaller, so K-exemplars converges faster. For example, on the Iris data the average numbers of iterations for K-means and K-exemplars are 7.22 and 4.02, respectively, a reduction of 3.2 iterations. K-exemplars can be used with different distance formulas and reduces the sensitivity of K-means to outliers.
Abstract (English): Compared with the K-means algorithm, we constrain the cluster centers to lie on the data points rather than at the cluster means, and we propose the K-exemplars algorithm. With this constraint, the K-exemplars algorithm can handle not only raw data but also relational data. Although the clustering accuracy of K-exemplars is not always better than that of K-means, the difference is small, and K-exemplars needs significantly fewer iterations, so it converges faster than K-means. On the Iris data, the average numbers of iterations for K-means and K-exemplars are 7.22 and 4.02, respectively; K-exemplars saves 3.2 iterations. Moreover, K-exemplars can be applied with any specified dissimilarity measure, and while K-means is sensitive to outliers, K-exemplars alleviates this problem.
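The thesis's own R implementations appear only in Appendix (1) and Appendix (2) and are not reproduced in this record, but the idea described in the abstract (cluster centers constrained to sample points, any pairwise dissimilarity allowed) can be sketched briefly. The following is a minimal illustrative sketch in R written from the abstract alone: the function name k_exemplars, the random initialization, and the exemplar-update rule (choose the member point with the smallest total dissimilarity to its cluster) are assumptions, not the author's code.

# Minimal K-exemplars sketch (assumed form): centers are sample points,
# so only an n x n dissimilarity matrix D is needed (works for relational data).
k_exemplars <- function(D, k, max_iter = 100) {
  n <- nrow(D)
  exemplars <- sample(n, k)                 # random initial exemplars (assumption)
  for (iter in seq_len(max_iter)) {
    # Assignment step: each point joins the cluster of its nearest exemplar.
    cluster <- apply(D[, exemplars, drop = FALSE], 1, which.min)
    # Update step: the new exemplar of each cluster is the member point with the
    # smallest total dissimilarity to the other members (a sample point, not a mean).
    new_exemplars <- sapply(seq_len(k), function(j) {
      members <- which(cluster == j)
      if (length(members) == 0) return(exemplars[j])   # keep old exemplar if cluster empties
      members[which.min(colSums(D[members, members, drop = FALSE]))]
    })
    if (setequal(new_exemplars, exemplars)) break      # exemplars unchanged: converged
    exemplars <- new_exemplars
  }
  list(cluster = cluster, exemplars = exemplars, iterations = iter)
}

# Example on the Iris measurements, the data set cited in the abstract:
D <- as.matrix(dist(iris[, 1:4]))    # Euclidean dissimilarities
fit <- k_exemplars(D, k = 3)

Because the update only reads entries of D, the same sketch runs unchanged on relational data, where only pairwise dissimilarities rather than raw feature vectors are observed, and D can be built from any distance formula.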
Table of Contents
Contents................................................I
List of Tables.........................................II
List of Figures.......................................III
Chapter 1 Introduction..................................1
1.1 Research Background and Research Motivations........1
1.2 Research Objectives.................................2
1.3 Research Structure..................................2
Chapter 2 Literature Review.............................4
2.1 K-means Algorithm...................................4
2.2 Relational Data.....................................5
Chapter 3 Research Methodology..........................7
3.1 K-exemplars Algorithm...............................7
3.2 K-exemplars Algorithm- II...........................9
Chapter 4 Numerical Examples...........................12
4.1 Clustering by Raw Data.............................12
4.2 Clustering with Outlier or Noise Data..............16
4.3 Clustering by Relational Data......................22
4.4 Clustering by K-exemplars- II......................31
Chapter 5 Conclusions and Future Studies...............32
References.............................................34
Appendix (1): R code- K-means and K-exemplars..........36
Appendix (2): R code- K-exemplars- II..................43
Appendix (3): The Uniform-16 data set..................45
References
1. Arai, K. and Barakbah, A. R. (2007), “Hierarchical K-means: an algorithm for centroids initialization for K-means,” Reports of the Faculty of Science and Engineering, Saga University, 36 (1), pp. 25-31.
2. Chen, S. Y. (2005), “Multivariate analysis,” Fourth edition, Hwa Tai publishing, Taipei, Taiwan. (in Chinese).
3. Fisher, R. A. (1936), “The use of multiple measurements in taxonomic problems,” Annals of Eugenics, 7 (2), pp. 179-188.
4. Hwang, C. M., Yang, M. S., Hung, W. L. and Lee, M.G. (2012), “A similarity measure of intuitionistic fuzzy sets based on Sugeno integral with its application to pattern recognition,” Information Sciences, 189, pp. 93-109.
5. Jain, A. K. (2010), “Data clustering: 50 years beyond K-means,” Pattern Recognition Letters, 31, pp. 651-666.
6. Johnson, R. A. and Wichern, D. W. (2007), “Applied multivariate statistical analysis,” Sixth edition, Pearson Education, Inc., Upper Saddle River, New Jersey, USA.
7. MacLeod, N. (n. d.), “Palaeo-math 101: MDS and ordination,” Retrieved March 14, 2013, from http://www.palass.org/modules.php?name=palaeo_math&page=20
8. MacQueen, J. B. (1967), “Some methods for classification and analysis of multivariate observations,” Proceedings of 5th Berkeley symposium on mathematical statistics and probability, University of California Press, pp. 281-297.
9. Pal, K., Pal, N. R., Keller, J. M. and Bezdek, J. C. (2005), “Relational mountain (density) clustering method and web log analysis,” International Journal of Intelligent Systems, 20, pp. 375-392.
10. Rand, W. M. (1971), “Objective criteria for the evaluation of clustering methods,” Journal of the American Statistical Association, 66 (336), pp. 846-850.
11. Saltelli, A., Tarantola, S., Campolongo, F. and Ratto, M. (2004), “Sensitivity analysis in practice,” John Wiley & Sons Ltd, Chichester, England.
12. Wu, K. L. and Lin, Y. J. (2012), “Kernelized K-means algorithm based on Gaussian kernel,” Advances in Control and Communication, LNEE, 137, pp. 657-664.
13. Yang, M. S. and Shih, H. M. (2001), “Cluster analysis based on fuzzy relations,” Fuzzy Sets and Systems, 120, pp. 197-212.
Full-Text Access Rights
  • On-campus browsing/printing of the electronic full text is authorized, available from 2015-07-02.
  • Off-campus browsing/printing of the electronic full text is authorized, available from 2015-07-02.

