System ID: U0026-2406201412200800
Title (Chinese): 基於MapReduce分散式單調性支援向量機之研究
Title (English): A Study of MapReduce-Based Distributed Monotonic SVM Model
University: National Cheng Kung University
Department (Chinese): 資訊管理研究所
Department (English): Institute of Information Management
Academic year: 102 (ROC calendar; 2013-2014)
Semester: 2
Year of publication: 103 (2014)
Author (Chinese): 陳泰霖
Author (English): Tai-Lin Chen
Student ID: R76014022
Degree: Master's
Language: English
Pages: 56
Committee:
  Advisor: 李昇暾
  Member: 林清河
  Member: 耿伯文
  Member: 鄭亦君
Keywords (Chinese): 支援向量機; Hadoop; MapReduce; 單調性先驗知識
Keywords (English): SVM; Hadoop; MapReduce; monotonic prior knowledge
Subject classification:
Abstract (Chinese): The Support Vector Machine (SVM) is a machine-learning algorithm with a high computational cost: the traditional SVM classifies data by solving a quadratic program over a large matrix, which is expensive to compute. This study therefore uses the Hadoop framework and its MapReduce parallel-computing architecture to run the SVM algorithm in a distributed fashion. The proposed method raises computation speed, reduces the memory burden, and makes it more feasible for an SVM to handle large volumes of data.
With the rapid rise of the Internet, cloud computing has matured in recent years. Cloud operating systems integrate infrastructure with high computing power and break through the data-processing limits of the past. Data is now generated ever faster, and in some situations the time needed to process it on a single machine becomes excessive; combining cloud computing with machine learning lets us extract more valuable information from large amounts of data.
This study adopts the open-source software Hadoop, which combines the MapReduce computing model with a file system that pools the storage of the machines in a cluster. MapReduce automatically allocates the cluster's computing resources, so developers can concentrate on the data-processing logic itself.
Building on the SVM model, this study proposes a model that takes monotonic data into account, called the Monotonic SVM (MCSVM): prior knowledge is supplied for attributes that are monotonically related to the class, which raises the model's accuracy. Because the model must solve a quadratic program over the entire data matrix, the monotonic SVM has high complexity and long solution times. Using MapReduce parallel computation, this study proposes the MapReduce MCSVM, which splits the data and thereby greatly reduces training time, making the monotonic SVM far more practical.
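As a back-of-envelope illustration (not a result reported in the thesis) of why splitting the data helps: a dense quadratic-programming solve over all \(n\) training samples takes on the order of \(O(n^{3})\) time and \(O(n^{2})\) memory for the kernel matrix, whereas training \(m\) independent sub-SVMs on chunks of size \(n/m\) costs roughly
\[
m \cdot O\!\left(\left(\tfrac{n}{m}\right)^{3}\right) = O\!\left(\tfrac{n^{3}}{m^{2}}\right)\ \text{time in total, with}\ O\!\left(\left(\tfrac{n}{m}\right)^{2}\right)\ \text{memory per node,}
\]
at the price of a later step that merges, or re-trains on, the support vectors the chunks retain.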
Abstract (English): The Support Vector Machine (SVM) is a computationally expensive algorithm. The traditional SVM solves the classification problem with quadratic programming, which incurs a high cost during computation. To address this problem, this study proposes using MapReduce on Hadoop. To increase classification accuracy, we also exploit monotonic prior knowledge from experts during the training phase.
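For reference, the quadratic program referred to above is, in its standard soft-margin dual form (the exact formulation used in the thesis is developed in Section 2.3.1),
\[
\max_{\alpha}\;\sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j K(x_i, x_j)
\qquad\text{s.t.}\quad \sum_{i=1}^{n}\alpha_i y_i = 0,\;\; 0 \le \alpha_i \le C,
\]
where the \(n \times n\) kernel matrix \(K(x_i, x_j)\) is what drives the memory and computation cost as the number of training samples \(n\) grows.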
Due to the rapid development of the Internet and storage infrastructure, cloud computing has matured in recent years. Cloud operating systems integrate the high computing power of cloud infrastructure and break through the data-processing limitations of the past. Data has been produced at a growing rate, and its volume has become too large to be processed by a single machine. By combining cloud computing and machine learning, we can obtain more valuable information from large-scale data.
This study uses Hadoop, an open-source framework that provides MapReduce, a distributed computing model, together with a distributed file system. MapReduce automatically allocates computing resources across the cluster and allows developers to focus on data processing.
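Below is a minimal Hadoop MapReduce sketch of the data-partitioning idea described above. It is not the thesis implementation: the class names (SubSvmJob, SubSvmMapper, SubSvmReducer), the fixed partition count, and the pass-through reduce step are illustrative assumptions; in an actual MCSVM job the reducer would solve the monotonicity-constrained QP on its chunk and emit only the resulting support vectors.

```java
// A minimal sketch, NOT the thesis implementation: class names, the partition
// count, and the pass-through reduce step are illustrative assumptions.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SubSvmJob {

    /** Mapper: route each training record (one line of text) to one of the
     *  sub-SVM partitions, keyed by the record's byte offset modulo the
     *  assumed number of partitions. */
    public static class SubSvmMapper
            extends Mapper<LongWritable, Text, LongWritable, Text> {
        private static final long NUM_PARTITIONS = 8; // assumed chunk count

        @Override
        protected void map(LongWritable offset, Text record, Context context)
                throws IOException, InterruptedException {
            context.write(new LongWritable(offset.get() % NUM_PARTITIONS), record);
        }
    }

    /** Reducer: each reduce call sees one partition's records; a real MCSVM
     *  job would solve the monotonicity-constrained QP here and emit only the
     *  support vectors. This sketch just passes the records through. */
    public static class SubSvmReducer
            extends Reducer<LongWritable, Text, LongWritable, Text> {
        @Override
        protected void reduce(LongWritable partition, Iterable<Text> records, Context context)
                throws IOException, InterruptedException {
            for (Text record : records) {
                context.write(partition, record); // placeholder for the QP solve
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "sub-SVM training (sketch)");
        job.setJarByClass(SubSvmJob.class);
        job.setMapperClass(SubSvmMapper.class);
        job.setReducerClass(SubSvmReducer.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // training data on HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // per-chunk output
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Running such a job would leave one output partition per sub-SVM, which a follow-up pass could then merge or re-train on.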
This study proposes an SVM variant called MCSVM that takes the monotonic properties of the data into account. Prior knowledge of monotonicity is given to the model to increase the accuracy of classification. MCSVM uses quadratic programming to find the optimal solution, which results in high complexity and long training times. This study therefore proposes MapReduce MCSVM, which significantly reduces the required training time and increases the feasibility of MCSVM in real-world applications.
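The abstract does not spell out how the monotonic prior knowledge enters the model; one common formulation in the monotonic-SVM literature (e.g., Doumpos & Zopounidis, 2009, listed in the references below) adds pairwise constraints to the quadratic program so that the decision function \(f(x) = w^{\top}\phi(x) + b\) never ranks a dominating sample below a dominated one:
\[
w^{\top}\bigl(\phi(x_i) - \phi(x_j)\bigr) \ge 0 \quad\text{for every pair with } x_i \succeq x_j,
\]
where \(x_i \succeq x_j\) means \(x_i\) is componentwise no smaller than \(x_j\) on the monotone attributes. Each such pair adds a constraint, which is part of why the full QP grows quickly and why partitioning the data pays off.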
Table of Contents: Abstract (Chinese) III
ABSTRACT IV
Acknowledgements V
List of Tables VIII
List of Figures IX
Chapter 1 Introduction 1
1.1 Background and motivation 1
1.2 Objectives of Research 2
1.3 Organization of Research 3
Chapter 2 Literature Review 5
2.1 Cloud Computing 5
2.2 Hadoop 6
2.2.1 Hadoop Distributed File System (HDFS) 7
2.2.2 MapReduce 7
2.3 Support vector machine (SVM) 8
2.3.1 Construction of SVM 9
2.4 Classification with Monotonicity Constraints 12
Chapter 3 Research Methodology 15
3.1 Concept of Monotonicity 15
3.2 Deriving Monotonicity for SVM 16
3.3 Constructing Monotonicity Constraints 20
3.4 Solving MC-SVM in subSVM 21
3.5 MCSVM with MapReduce Framework 23
3.5.1 Data preprocessing 25
3.5.2 MapReduce training module 26
3.5.3 Testing module 28
Chapter 4 Experiment and Result Analysis 30
4.1 Environment of Experiments and Data Collection 30
4.1.1 Experiment environment 30
4.1.2 Data Collection 31
4.2 Experiment steps 35
4.3 Performance measures 37
4.4 Experiment results 39
Chapter 5 Conclusions and Future Work 52
5.1 Conclusions 52
5.2 Recommendations for future work 53
References 54
References: Alham, N. K., Li, M., Liu, Y., & Hammoud, S. (2011). A MapReduce-based distributed SVM algorithm for automatic image annotation. Computers & Mathematics with Applications, 62(7), 2801-2811.
Archer, N. P., & Wang, S. (1993). Learning bias in neural networks and an approach to controlling its effect in monotonic classification. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 15(9), 962-966. doi: 10.1109/34.232084
Borthakur, D. (2007). The Hadoop distributed file system: Architecture and design.
Burges, C. J. (1998). A tutorial on support vector machines for pattern recognition. Data mining and knowledge discovery, 2(2), 121-167.
Caruana, G., Li, M., & Liu, Y. (2013). An ontology enhanced parallel SVM for scalable spam filter training. Neurocomputing, 108, 45-57.
Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273-297. doi: 10.1023/a:1022627411411
Courant, R., & Hilbert, D. (1970). Methods of Mathematical Physics (Vol. I, II). New York: Wiley Interscience.
Dean, J., & Ghemawat, S. (2008). MapReduce: simplified data processing on large clusters. Communications of the ACM, 51(1), 107-113.
Dembczyński, K., Kotłowski, W., & Słowiński, R. (2008). Ensemble of Decision Rules for Ordinal Classification with Monotonicity Constraints. In G. Wang, T. Li, J. Grzymala-Busse, D. Miao, A. Skowron & Y. Yao (Eds.), Rough Sets and Knowledge Technology (Vol. 5009, pp. 260-267): Springer Berlin Heidelberg.
Doumpos, M., & Pasiouras, F. (2005). Developing and Testing Models for Replicating Credit Ratings: A Multicriteria Approach. Computational Economics, 25(4), 327-341. doi: 10.1007/s10614-005-6412-4
Doumpos, M., & Zopounidis, C. (2009). Monotonic support vector machines for credit risk rating. New Mathematics and Natural Computation, 5(3), 557-570. doi: 10.1142/S1793005709001520
Gamarnik, D. (1998). Efficient learning of monotone concepts via quadratic optimization. Paper presented at the Proceedings of the eleventh annual conference on Computational learning theory, Madison, Wisconsin, United States.
Ghemawat, S., Gobioff, H., & Leung, S.-T. (2003). The Google file system. Paper presented at the ACM SIGOPS Operating Systems Review.
Greco, S., Matarazzo, B., & Słowiński, R. (1998). A new rough set approach to evaluation of bankruptcy risk. Operational tools in the management of financial risks, 121-136.
Huang, W., Nakamori, Y., & Wang, S.-Y. (2005). Forecasting stock market movement direction with support vector machine. Computers & Operations Research, 32(10), 2513-2522. doi: 10.1016/j.cor.2004.03.016
Huang, Z., Chen, H., Hsu, C.-J., Chen, W.-H., & Wu, S. (2004). Credit rating analysis with support vector machines and neural networks: a market comparative study. Decision Support Systems, 37(4), 543-558. doi: 10.1016/s0167-9236(03)00086-1
Kim, H. S., & Sohn, S. Y. (2010). Support vector machines for default prediction of SMEs based on technology credit. European Journal of Operational Research, 201(3), 838-846. doi: 10.1016/j.ejor.2009.03.036
Man Gyun, N., Won Seo, P., & Dong Hyuk, L. (2008). Detection and Diagnostics of Loss of Coolant Accidents Using Support Vector Machines. Nuclear Science, IEEE Transactions on, 55(1), 628-636. doi: 10.1109/tns.2007.911136
Mell, P., & Grance, T. (2011). The NIST definition of cloud computing (draft). NIST special publication, 800(145), 7.
Mercer, J. (1909). Functions of Positive and Negative Type, and Their Connection with the Theory of Integral Equations. Transactions of the London Philosophical Society (V), 9, 415-446.
Pazzani, M. J., Mani, S., & Shankle, W. R. (2001). Acceptance of rules generated by machine learning among medical experts. Methods of Information in Medicine, 40(5), 380-385.
Pendharkar, P. C., & Rodger, J. A. (2003). Technical efficiency-based selection of learning cases to improve forecasting accuracy of neural networks under monotonicity assumption. Decision Support Systems, 36(1), 117-136. doi: 10.1016/S0167-9236(02)00138-0
Platt, J. (1998). Sequential minimal optimization: A fast algorithm for training support vector machines.
Potharst, R., & Feelders, A. J. (2002). Classification trees for problems with monotonicity constraints. SIGKDD Explor. Newsl., 4(1), 1-10. doi: 10.1145/568574.568577
Schölkopf, B., & Smola, A. J. (2002). Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Cambridge, Massachusetts: The MIT Press.
Shin, K.-S., Lee, T. S., & Kim, H.-j. (2005). An application of support vector machines in bankruptcy prediction model. Expert Systems with Applications, 28(1), 127-135. doi: 10.1016/j.eswa.2004.08.009
Vapnik, V. N. (1995). The nature of statistical learning theory: Springer-Verlag New York, Inc.
Vapnik, V. N. (1998). Statistical learning theory. New York: Wiley.
Wang, S. (1995). The Unpredictability of Standard Back Propagation Neural Networks in Classification Applications. Management Science, 41(3), 555-559. doi: 10.2307/2632981
Wang, S. (2003). Adaptive non-parametric efficiency frontier analysis: a neural-network-based model. Computers & Operations Research, 30(2), 279-295. doi: 10.1016/S0305-0548(01)00095-8
Full-Text Access Rights
  • The author has authorized on-campus browsing/printing of the electronic full text, available from 2024-12-31.

