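   The electronic full text is not yet authorized for public release; for the print copy, please check the library catalog.
(※ If the thesis cannot be found, or its holding status shows "closed stacks, not open to the public", the thesis is not in the stacks and cannot be accessed.)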
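System ID: U0026-0608201813024500
Title (Chinese): 具通透性之Hadoop資料服務
Title (English): A Transparent Hadoop Data Service
University: National Cheng Kung University
Department (Chinese): 資訊工程學系
Department (English): Institute of Computer Science and Information Engineering
Academic year: 106 (ROC calendar; 2017-2018)
Semester: 2
Publication year: 107 (ROC calendar; 2018)
Student (Chinese): 李信穎
Student (English): Hsin-Ying Lee
Student ID: P76051315
Degree: Master's
Language: Chinese
Number of pages: 28
Oral examination committee: 蕭宏章 (advisor), 許慶賢, 劉炳宏, 趙育昌, 李強
Keywords (Chinese): Hadoop, HDFS, HBase, 分散式儲存 (distributed storage)
Keywords (English): Hadoop, HDFS, HBase, Distributed data store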
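Abstract (Chinese, translated): Hadoop is a computing framework and distributed storage system. Its storage system, the Hadoop Distributed File System (HDFS), is designed to store very large files but cannot efficiently handle large numbers of small files. Although many methods exist for addressing the small-file problem, they require users to carry out considerable extra processing. HBase, a distributed database built on top of HDFS, provides efficient random access and can be used to address the small-file problem; however, both systems take time to learn and are written in Java, which presents a high barrier for ordinary users.
This thesis proposes a transparent distributed storage system. Besides addressing the small-file problem on HDFS, it supports the HDFS interface for compatibility with applications in the Hadoop ecosystem, so that users of projects such as Hive and Spark can access data through the system without modifying their code. It also provides a simple Web API that hides the complex operations behind the big data platform, letting users easily import data into the platform through the API.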
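The small-file problem mentioned in the abstract stems from how HDFS keeps metadata: the NameNode holds every file, directory, and block as an object in memory, so many small files cost far more metadata than the same bytes stored in a few large files. The sketch below is a rough estimate, not a measurement from this thesis. It assumes the commonly cited rule of thumb of about 150 bytes per object (from Tom White's "The Small Files Problem"), counts one object per file plus one per block, ignores directories, and uses the 128 MB default block size of Hadoop 2.x and later. The function name `namenode_bytes` is illustrative only.

```python
# Rule of thumb (Tom White, "The Small Files Problem", Cloudera blog, 2009):
# every file, directory, and block is an object of roughly 150 bytes in NameNode memory.
BYTES_PER_OBJECT = 150
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024  # dfs.blocksize default in Hadoop 2.x and later


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division without floating-point rounding."""
    return -(-a // b)


def namenode_bytes(total_bytes: int, file_size: int,
                   block_size: int = DEFAULT_BLOCK_SIZE) -> int:
    """Rough NameNode heap needed to store total_bytes as equally sized files.

    Counts one object per file plus one object per block; directories are ignored.
    """
    num_files = ceil_div(total_bytes, file_size)
    blocks_per_file = ceil_div(file_size, block_size)  # even a tiny file occupies one block
    return num_files * (1 + blocks_per_file) * BYTES_PER_OBJECT


if __name__ == "__main__":
    one_tb = 10**12
    # 1 TB as 10 million files of 100 KB: 20 million objects in NameNode memory
    print(namenode_bytes(one_tb, 100_000))        # 3000000000 bytes (about 3 GB)
    # The same 1 TB as 1,000 files of 1 GB (8 blocks each): 9,000 objects
    print(namenode_bytes(one_tb, 1_000_000_000))  # 1350000 bytes (about 1.35 MB)
```

Under these assumptions, the same terabyte needs over two thousand times more NameNode memory when split into 100 KB files, which is why packing small files into a store such as HBase is attractive.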
Abstract (English): Hadoop is an open-source distributed processing framework and storage system for big data. Its storage layer is called the Hadoop Distributed File System (HDFS). HDFS is designed for storing very large files with streaming data access patterns, but it cannot efficiently handle large numbers of small files. Although there are many ways to address the small-file problem, they still require considerable extra processing from users. HBase, a distributed database often paired with Hadoop, provides efficient random access and can be used to solve the small-file problem, but both systems take time to learn and are written in Java, which presents a high barrier for data analysts. This thesis proposes a transparent distributed storage system designed to solve the small-file problem on HDFS. It supports the HDFS interface for compatibility with the Hadoop ecosystem, such as Hive and Spark, so users can access data directly without changing any code. The system also provides a simple Web API that hides the data platform's complex operations and lets users migrate data into the platform from other file servers through the API.
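The abstracts describe one storage service that keeps HDFS-style access while handling small data efficiently, with HBase named as the tool for small data. The record does not say how HDS decides where data goes, so the following is only a minimal sketch of the general "transparent facade" idea, assuming a simple size cutoff. The threshold value, the class `TransparentStore`, and its methods are all hypothetical, and in-memory dictionaries stand in for an HBase table and for HDFS files.

```python
from typing import Dict

# Hypothetical cutoff: the thesis record does not state the actual rule HDS uses.
SMALL_FILE_THRESHOLD = 10 * 1024 * 1024  # 10 MiB


class TransparentStore:
    """Illustrative facade: one put/get API in front of two hidden back ends."""

    def __init__(self, threshold: int = SMALL_FILE_THRESHOLD) -> None:
        self.threshold = threshold
        self.hbase_like: Dict[str, bytes] = {}  # stand-in for an HBase table (row key -> value)
        self.hdfs_like: Dict[str, bytes] = {}   # stand-in for plain HDFS files (path -> contents)
        self.location: Dict[str, str] = {}      # path -> name of the back end holding it

    def put(self, path: str, data: bytes) -> str:
        """Store data under path and return the back end chosen for it."""
        # Drop any previous copy so an overwrite can move between back ends.
        self.delete(path)
        if len(data) < self.threshold:
            self.hbase_like[path] = data
            self.location[path] = "hbase"
        else:
            self.hdfs_like[path] = data
            self.location[path] = "hdfs"
        return self.location[path]

    def get(self, path: str) -> bytes:
        """Read by path; callers never name the back end. Raises KeyError if unknown."""
        backend = self.location[path]
        store = self.hbase_like if backend == "hbase" else self.hdfs_like
        return store[path]

    def delete(self, path: str) -> None:
        """Remove path from whichever back end holds it; no-op if absent."""
        backend = self.location.pop(path, None)
        if backend == "hbase":
            del self.hbase_like[path]
        elif backend == "hdfs":
            del self.hdfs_like[path]


if __name__ == "__main__":
    store = TransparentStore(threshold=1024)
    print(store.put("/logs/a.txt", b"x" * 10))      # hbase
    print(store.put("/data/big.bin", b"\0" * 2048))  # hdfs
    print(store.get("/logs/a.txt"))                  # same call regardless of back end
```

Callers use the same `put` and `get` for every path, which is the sense of "transparent" here; rewriting a path with larger data silently moves it to the other back end. A real service would also need concurrency control and durable metadata, which this sketch omits.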
Table of contents:
Abstract (Chinese)
Extended Abstract
Acknowledgments
Table of Contents
List of Tables
List of Figures
Chapter 1. Introduction
Chapter 2. Background
2.1 HDFS
2.2 HBase
2.3 The Small-File Problem on HDFS
2.4 Common Interfaces
Chapter 3. System Client
3.1 Web APIs
3.1.1 File Transfer
3.2 Authorization (Authentication and Authorization)
3.2.1 Authorization Settings
3.3 Mapping
3.4 Support for Hadoop Ecosystem Projects
3.5 Starting/Stopping HDS and System Parameters
Chapter 4. System Architecture
4.1 HTTP Server
4.1.1 Connection Limits
4.1.2 Load Balancer
4.2 Lock Manager
4.3 Task Manager
4.4 Transfer
4.5 Metrics & Time Phase Logger
4.6 HDFS Interface
Chapter 5. HDS Storage Architecture
5.1 Read and Write Flow
5.2 HDS Directory Structure Design
5.3 Table and Schema
5.4 Managing Large Data on HDFS
Chapter 6. Experiments
6.1 Test Environment
6.2 Overhead
6.3 Load Balance
6.4 Scalability and Fault Tolerance
6.5 Transparency
Chapter 7. Conclusion
References
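Full-text access rights
  • On-campus browsing/printing of the electronic full text is authorized; publicly available from 2023-12-31.
  • Off-campus browsing/printing of the electronic full text is authorized; publicly available from 2023-12-31.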