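The electronic thesis has not yet been authorized for public release; for the print copy, please check the library catalog.
(※ If the thesis cannot be found, or its holding status shows "closed stacks, not public" (閉架不公開), the thesis is not in the stacks and cannot be accessed.)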
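System ID: U0026-0508201815350100
Title (Chinese): A Lightweight Hadoop-Based Data Transfer System (基於Hadoop之輕量級資料轉傳系統)
Title (English): Hadoop Data Service Lite in a Server Box
University: National Cheng Kung University (成功大學)
Department (Chinese): 資訊工程學系
Department (English): Institute of Computer Science and Information Engineering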
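Academic year: 106 (ROC calendar; 2017–2018)
Semester: 2
Publication year: 107 (2018)
Graduate student (Chinese): 孫崇恩
Graduate student (English): Chung-En Sun
Student ID: P76051129
Degree: Master's
Language: Chinese
Pages: 26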
Oral defense committee:
Committee member: 劉炳宏
Committee member: 許慶賢
Committee member: 趙育昌
Committee member: 賴冠州
Advisor: 蕭宏章
Keywords: Hadoop, HDFS, HBase, Distributed Storage System, Data Consistency
Subject classification:
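Abstract (translated from Chinese): Hadoop is a large-scale distributed storage and computing framework that is widely used and discussed by many major companies and research institutions around the world. Its storage system, HDFS (Hadoop Distributed File System), is designed to handle massive datasets of terabyte scale and beyond. Besides the operation and use of Hadoop's native file system, many components and features have been developed following its framework or design philosophy, either to improve the efficiency of HDFS management or to make HDFS easier for users to operate. HDFS is written mainly in Java, an object-oriented programming language. In practice, users must spend considerable time understanding the architecture of the system's components and how data flows between them, and the implementation language itself is a further learning barrier. Developers with more advanced requirements for Hadoop therefore need substantial skill and development experience before they can add features to the system. However, as HDFS has evolved through successive releases, it has also provided many frameworks and APIs (Application Programming Interfaces) that let developers skip part of the low-level design work and adopt design patterns and techniques that support rapid, approachable development.
In this thesis, building on the HDFS distributed storage system and other common tools that can store data, such as FTP (File Transfer Protocol), HTTP (HyperText Transfer Protocol), and Linux, we design a data transfer system that works across different file servers. The system hides the complex low-level operations of big-data platforms behind widely used, simple APIs and lowers the barrier for users getting started with HDFS. We also examine the limits and read/write issues of deploying Hadoop and this transfer system on a single server.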
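The abstract describes hiding HDFS, FTP, HTTP, and local Linux storage behind one simple API. One common way to structure such a layer is to write one adapter per storage system and put a shared interface in front of them. The sketch below illustrates that assumption and is not the thesis's implementation. `StorageBackend`, `LocalBackend`, `MemoryBackend`, and `transfer` are hypothetical names. The in-memory backend stands in for a remote FTP or HTTP server so the example runs offline. A real HDFS adapter could wrap, for instance, the WebHDFS REST API or Hadoop's Java `FileSystem` API; that adapter is not shown.

```python
import io
import os
import shutil
import tempfile
from abc import ABC, abstractmethod
from typing import BinaryIO, Dict


class StorageBackend(ABC):
    """Hypothetical common interface; each file server (HDFS, FTP, HTTP, local disk) would get one adapter."""

    @abstractmethod
    def open_read(self, path: str) -> BinaryIO:
        """Return a readable binary stream for `path`."""

    @abstractmethod
    def write(self, path: str, stream: BinaryIO) -> int:
        """Consume `stream`, store it at `path`, and return the number of bytes written."""


class LocalBackend(StorageBackend):
    """Adapter for a directory tree on the local Linux file system."""

    def __init__(self, root: str) -> None:
        self.root = root

    def _full_path(self, path: str) -> str:
        # Treat every path as relative to this backend's root directory.
        return os.path.join(self.root, path.lstrip("/"))

    def open_read(self, path: str) -> BinaryIO:
        return open(self._full_path(path), "rb")

    def write(self, path: str, stream: BinaryIO) -> int:
        full = self._full_path(path)
        parent = os.path.dirname(full)
        if parent:
            os.makedirs(parent, exist_ok=True)
        with open(full, "wb") as out:
            shutil.copyfileobj(stream, out)  # copies in chunks rather than loading the whole file
            return out.tell()


class MemoryBackend(StorageBackend):
    """In-memory stand-in for a remote server so the sketch runs without a network."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}

    def open_read(self, path: str) -> BinaryIO:
        return io.BytesIO(self.files[path])

    def write(self, path: str, stream: BinaryIO) -> int:
        data = stream.read()
        self.files[path] = data
        return len(data)


def transfer(src: StorageBackend, src_path: str, dst: StorageBackend, dst_path: str) -> int:
    """Copy one file between any two backends; the caller never touches backend-specific APIs."""
    with src.open_read(src_path) as stream:
        return dst.write(dst_path, stream)


if __name__ == "__main__":
    remote = MemoryBackend()
    remote.files["/in/report.csv"] = b"id,value\n1,42\n"
    with tempfile.TemporaryDirectory() as tmp:
        local = LocalBackend(tmp)
        copied = transfer(remote, "/in/report.csv", local, "/out/report.csv")
        print(copied, "bytes copied to", os.path.join(tmp, "out", "report.csv"))
```

Because `transfer` only sees streams, supporting a new server type means writing one adapter with two methods. The local adapter copies in chunks, while the in-memory stand-in reads the whole file for simplicity.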
Abstract (English): The collection and analysis of data has long driven technological development. Massive datasets exceed what a traditional stand-alone database can handle, so fast, efficient systems that accommodate data at this scale must rely on multiple nodes or even decentralized designs. Besides providing scalable storage capacity, such systems are also designed and optimized to serve large numbers of read and write requests, and the most highly regarded of them is the Hadoop project. Hadoop meets users' needs for massive data through its master-slave architecture, replication of data across multiple nodes, read-write lock control, and the freedom to add related analysis and computation projects. In this thesis, building on the HDFS distributed storage system and other tools that can store data, such as FTP (File Transfer Protocol), HTTP (HyperText Transfer Protocol), and Linux, we design a data transfer system that works across different file servers. The system hides the complex low-level operations of big-data platforms behind widely used, simple APIs and lowers the barrier for users getting started with HDFS. We also explore the limits and read/write issues of deploying Hadoop and this transfer system on a single server.
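The English abstract credits part of Hadoop's strength to its control of read-write locks, and the outline lists a Lock Manager for HDSLite (Section 4.2.3). This record does not describe that component. The following is only a minimal sketch, assuming a per-path readers-writer lock: any number of concurrent readers, or a single writer, per file path. Waiting writers get preference so that a steady stream of reads cannot starve an upload. `ReadWriteLock`, `LockManager`, `reading`, and `writing` are hypothetical names, not the thesis's API.

```python
import threading
from contextlib import contextmanager
from typing import Dict, Iterator


class ReadWriteLock:
    """Many concurrent readers or one exclusive writer; waiting writers block new readers."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False
        self._waiting_writers = 0

    def acquire_read(self) -> None:
        with self._cond:
            # Writer preference: new readers wait while a writer holds or is queued for the lock.
            while self._writer or self._waiting_writers > 0:
                self._cond.wait()
            self._readers += 1

    def release_read(self) -> None:
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()  # wake a writer waiting for readers to drain

    def acquire_write(self) -> None:
        with self._cond:
            self._waiting_writers += 1
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._waiting_writers -= 1
            self._writer = True

    def release_write(self) -> None:
        with self._cond:
            self._writer = False
            self._cond.notify_all()  # wake queued readers and writers


class LockManager:
    """Hypothetical manager that hands out one ReadWriteLock per file path."""

    def __init__(self) -> None:
        self._guard = threading.Lock()
        self._locks: Dict[str, ReadWriteLock] = {}

    def _lock_for(self, path: str) -> ReadWriteLock:
        with self._guard:
            lock = self._locks.get(path)
            if lock is None:
                lock = self._locks[path] = ReadWriteLock()
            return lock

    @contextmanager
    def reading(self, path: str) -> Iterator[None]:
        lock = self._lock_for(path)
        lock.acquire_read()
        try:
            yield
        finally:
            lock.release_read()

    @contextmanager
    def writing(self, path: str) -> Iterator[None]:
        lock = self._lock_for(path)
        lock.acquire_write()
        try:
            yield
        finally:
            lock.release_write()
```

Writer preference means heavy write traffic can delay reads instead. The sketch also never evicts per-path locks, which a long-running server would need to do.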
Table of Contents
Abstract
Extended Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1. Introduction
Chapter 2. Background
2.1 HDFS
2.2 HBase
2.3 Phoenix
2.4 Zookeeper
Chapter 3. Motivation and Issues
3.1 Motivation
3.2 Issues
3.2.1 Network
3.2.2 Heavy software components
3.2.3 Load balancing
3.2.4 Failure
3.2.5 Transparency
3.2.6 Maintenance
3.2.7 Authentication
Chapter 4. HDSLite Technical Architecture
4.1 User Perspective
4.1.1 HTTP APIs
4.1.2 Transferring Files
4.1.3 Other Services
4.1.4 Starting/Stopping HDSLite and Config Settings
4.2 HDSLite System Architecture
4.2.1 HTTP Server
4.2.2 Load Balance
4.2.3 Lock Manager
4.2.4 Transfer
Chapter 5. Experiments
5.1 Test Environment
5.2 Stress Test for Writes
5.3 Stress Test for Reads
Chapter 6. Conclusion
References

References
[1] Hadoop. Available: https://hadoop.apache.org/
[2] HDFS. Available: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
[3] HBase. Available: https://hbase.apache.org/
[4] MapReduce. J. Dean and S. Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters,” Commun. ACM, vol. 51, no. 1, pp. 107–113, Jan. 2008.
[5] Hive. Available: https://hive.apache.org/
[6] VNIC. Available: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.2/html/Technical_Reference_Guide/Virtual_Network_Interface_Controller_VNIC.html
[7] Load Balance. H. C. Hsiao, H. Y. Chung, H. Shen, and Y. C. Chao, “Load Rebalancing for Distributed File Systems in Clouds,” IEEE Transactions on Parallel and Distributed Systems, vol. 24, no. 5, pp. 951–962, May 2013.
[8] Transparency. Available: https://www.techopedia.com/definition/30732/protocol-transparent
[9] Memstore. Available: https://hbase.apache.org/book.html
[10] Datanode. Available: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDataNodeAdminGuide.html
[11] Namenode. Available: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html
[12] Zookeeper. P. Hunt, M. Konar, F. P. Junqueira, and B. Reed, “ZooKeeper: Wait-free Coordination for Internet-scale Systems,” in Proc. of the 2010 USENIX Conference on USENIX Annual Technical Conference, Berkeley, CA, USA, 2010, pp. 11–11.
[13] SQL. Available: https://www.mysql.com/
[14] JDBC. Available: https://docs.oracle.com/javase/7/docs/api/java/sql/package-summary.html
[15] Google File System. S. Ghemawat, H. Gobioff, and S.-T. Leung, “The Google File System,” in Proc. of the Nineteenth ACM Symposium on Operating Systems Principles, New York, NY, USA, 2003, pp. 29–43.
[16] Secondary Namenode. Available: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html
[17] BigTable. F. Chang et al., “Bigtable: A Distributed Storage System for Structured Data,” ACM Trans. Comput. Syst., vol. 26, no. 2, pp. 4:1–4:26, Jun. 2008.
[18] RegionServer. Available: https://hbase.apache.org/book.html
[19] FTP. Available: https://www.ietf.org/rfc/rfc959.txt
[20] Spark. Available: https://spark.apache.org/
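Full-text access permissions
  • On-campus browsing/printing of the electronic full text is authorized; publicly available from 2023-12-31.
  • Off-campus browsing/printing of the electronic full text is authorized; publicly available from 2023-12-31.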

