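System ID U0026-2208201419532100
Title (Chinese) 子空間投影之密度函數比估計在二元分類問題之應用
Title (English) A Classification Approach Based on Density Ratio Estimation with Subspace Projection
Institution National Cheng Kung University
Department (Chinese) 統計學系
Department (English) Department of Statistics
Academic year 102 (ROC calendar; 2013–2014)
Semester 2
Year of publication 103 (ROC calendar; 2014)
Author (Chinese) 陳慶全
Author (English) ChingChuan Chen
Student ID R26014014
Degree Master's
Language English
Number of pages 94
Committee Advisor: 陳瑞彬
Committee member: 張源俊
Committee member: 鄭順林
Committee member: 張升懋
Keywords (Chinese) 密度函數比  維度詛咒  維度縮減  AUC  partial AUC 
Keywords (English) density ratio function  curse of dimensionality  dimension reduction  AUC  partial AUC 
Subject classification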
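Abstract (translated from Chinese) In this thesis, we propose a classification method based on the density ratio. Kanamori et al. (2009) proposed estimating the density ratio directly by least squares and used the estimate to solve classification problems. However, the curse of dimensionality causes computational problems. To overcome this, we propose projecting the data onto a subspace of suitable dimension and then estimating the density ratio there. We use the AUC as the criterion for evaluating classification; both simulated and real data show that our method is comparable to logistic regression, SVM, and other methods. We also use the partial AUC as an evaluation criterion, and the results likewise show that our method performs reasonably well.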
Abstract (English) In this work, we consider a classification method based on density ratio estimation. Kanamori et al. (2009) proposed a least-squares approach to direct density ratio estimation and showed how to use it for classification problems. However, the curse of dimensionality causes computational problems. To overcome this, we suggest projecting the data onto a proper subspace and then performing the density ratio estimation on this subspace instead of on the whole data. We can choose to rotate either the data or the basis; the latter is more efficient than the former. Simulation studies with different scenarios and several real examples are used to illustrate the performance of the proposed method. With the area under the receiver operating characteristic (ROC) curve (AUC) as the classification score, the results show the improvements achieved by the proposed method and demonstrate that it is comparable with other approaches, for example, the logistic model. We also consider another classification score, the partial AUC, and the results show that the proposed method performs fairly well.
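The pipeline described in the abstract (project the data, estimate the density ratio by least squares, score by AUC) can be illustrated with a minimal sketch. This is not the thesis's implementation: it follows the general uLSIF form of Kanamori et al. (2009) with a Gaussian kernel, and the function names, the fixed `sigma` and `lam` values, and the projection matrix `B` (here simply the first two coordinate axes) are all assumptions made for illustration. How the thesis actually chooses the subspace is not stated in this record.

```python
import numpy as np

def gaussian_kernel(X, C, sigma):
    # Kernel matrix between the rows of X and the kernel centers C.
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def ulsif_fit(X_nu, X_de, sigma=1.0, lam=0.1, n_centers=50, seed=0):
    # Model the ratio p_nu(x) / p_de(x) as sum_l alpha_l * K(x, c_l),
    # with centers c_l drawn from the numerator sample.
    rng = np.random.default_rng(seed)
    k = min(n_centers, len(X_nu))
    C = X_nu[rng.choice(len(X_nu), size=k, replace=False)]
    Phi_de = gaussian_kernel(X_de, C, sigma)
    Phi_nu = gaussian_kernel(X_nu, C, sigma)
    H = Phi_de.T @ Phi_de / len(X_de)   # second moment under p_de
    h = Phi_nu.mean(axis=0)             # first moment under p_nu
    # Regularized least-squares solution, then clip negatives to zero.
    alpha = np.linalg.solve(H + lam * np.eye(k), h)
    return C, np.maximum(alpha, 0.0)

def ulsif_predict(X, C, alpha, sigma=1.0):
    return gaussian_kernel(X, C, sigma) @ alpha

def auc(scores_pos, scores_neg):
    # Mann-Whitney form of the AUC: P(pos > neg) + 0.5 * P(tie).
    diff = scores_pos[:, None] - scores_neg[None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

def demo(seed=1, d=10, n=200):
    # Hypothetical data: two Gaussian classes that differ only in
    # the mean of the first coordinate.
    rng = np.random.default_rng(seed)
    shift = np.zeros(d)
    shift[0] = 2.0
    pos_train = rng.normal(size=(n, d)) + shift
    neg_train = rng.normal(size=(n, d))
    pos_test = rng.normal(size=(n, d)) + shift
    neg_test = rng.normal(size=(n, d))
    # Assumed projection: keep the first two coordinates (d x 2 matrix
    # with orthonormal columns). A real method would have to find B.
    B = np.eye(d)[:, :2]
    C, alpha = ulsif_fit(pos_train @ B, neg_train @ B)
    s_pos = ulsif_predict(pos_test @ B, C, alpha)
    s_neg = ulsif_predict(neg_test @ B, C, alpha)
    return auc(s_pos, s_neg)
```

Calling `demo()` returns the test-set AUC of the estimated ratio used as a classification score on the projected data. In practice `sigma` and `lam` would be tuned (for example by cross-validation) rather than fixed as they are here.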
Table of contents Abstract (Chinese) I
Abstract II
Acknowledgements III
Contents IV
List of Tables VI
1 Introduction 1
2 Literature Review 2
2.1 Framework of uLSIF 2
3 Methodology 5
3.1 Framework of PuLSIF_RD 5
3.1.1 Projection Matrix 5
3.1.2 Rotation Matrix 5
3.1.3 Projection uLSIF 6
3.1.4 Summary of PuLSIF_RD 7
3.2 Framework of PuLSIF_RB 7
4 Results of PuLSIF Rotation Data 9
4.1 Simulation Results 9
4.2 Results of Real Dataset 15
5 Results of PuLSIF Rotation Basis 19
5.1 Simulation Results 19
5.2 Results of Real Dataset 23
5.3 Comparison with PuLSIF_RD 26
6 Results of PuLSIF_RB for Redundant Variables 31
7 Results of PuLSIF_RB Partial AUC 36
7.1 Methodology 36
7.2 Simulation Results and Results of Real Dataset 36
8 Conclusions and Future Work 41
8.1 Conclusions 41
8.2 Future Work 41
A Tables: The Results of PuLSIF_RD 42
B Tables: The Results of PuLSIF_RB 59
C Tables: The Results of PuLSIF_RB for Redundant Variables 73
D Tables: The Results of PuLSIF_RB Partial AUC 82
References 93

References
D. Bamber. The area above the ordinal dominance graph and the area below the receiver operating characteristic graph. Journal of Mathematical Psychology, 12(4):387–415, 1975.
L. E. Dodd and M. S. Pepe. Partial AUC estimation and regression. Biometrics, 59(3):614–623, September 2003.
G. W. Flake and S. Lawrence. Efficient SVM regression training with SMO. Machine Learning, 46:271–290, 2002.
T. Hastie, R. Tibshirani, and J. Friedman. The elements of statistical learning. Springer, 2009.
T. K. Ho and E. M. Kleinberg. Building projectable classifiers of arbitrary complexity. In Proceedings of the 13th International Conference on Pattern Recognition, pages 880–885, August 1996.
R. Hooke and T. A. Jeeves. "Direct search" solution of numerical and statistical problems. Journal of the Association for Computing Machinery, 8(2):212–229, 1961.
G. James, D. Witten, T. Hastie, and R. Tibshirani. An introduction to statistical learning. Springer, 2013.
T. Kanamori, S. Hido, and M. Sugiyama. A least-squares approach to direct importance estimation. Journal of Machine Learning Research, 10:1391–1445, 2009.
T. Kanamori, T. Suzuki, and M. Sugiyama. Statistical analysis of kernel-based least-squares density-ratio estimation. Machine Learning, 86(3):335–367, 2012.
Ker-Chau Li. Sliced inverse regression for dimension reduction. Journal of the American Statistical Association, 86(414):316–327, June 1991.
B. V. Ramana, M. S. P. Babu, and N. B. Venkateswarlu. A critical study of selected classification algorithms for liver disease diagnosis. International Journal of Database Management Systems, 3(2):506–516, May 2011.
B. V. Ramana, M. S. P. Babu, and N. B. Venkateswarlu. A critical comparative study of liver patients from USA and India: An exploratory analysis. International Journal of Computer Science Issues, 9(2):506–516, May 2012.
G. Rätsch, T. Onoda, and K.-R. Müller. Soft margins for AdaBoost. Machine Learning, 42:287–320, 2001.
J. Q. Su and J. S. Liu. Linear combinations of multiple diagnostic markers. Journal of the American Statistical Association, 88(424):1350–1355, December 1993.
M. Sugiyama, M. Yamada, P. Bünau, T. Suzuki, T. Kanamori, and M. Kawanabe. Direct density-ratio estimation with dimensionality reduction via least-squares heterodistributional subspace search. Neural Networks, 24(2):183–198, 2011.
M. Sugiyama, T. Suzuki, and T. Kanamori. Density Ratio Estimation in Machine Learning. Cambridge University Press, 2012.
Z. Wang and Y.-C. I. Chang. Marker selection via maximizing the partial area under the ROC curve of linear risk scores. Biostatistics, pages 1–17, August 2010.
I-C. Yeh, K.-J. Yang, and T.-M. Ting. Knowledge discovery on RFM model using Bernoulli sequence. Expert Systems with Applications, 36:5866–5871, April 2009.
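Full-text access rights
  • Authorized for on-campus browsing and printing of the electronic full text; publicly available from 2019-09-02.
  • Authorized for off-campus browsing and printing of the electronic full text; publicly available from 2019-09-02.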


  • If you have any questions, please contact the library
    Phone: (06)2757575#65773
    E-mail: etds@email.ncku.edu.tw