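System ID: U0026-1808202017544500
Title (Chinese): 考量服務層級協議的物聯網資料分析工作排程
Title (English): Prioritized Job Scheduling for SLA-Aware IoT Data Analytics
Institution: National Cheng Kung University
Department (Chinese): 醫學資訊研究所
Department (English): Institute of Medical Informatics
Academic year: 108 (ROC calendar)
Semester: 2
Publication year: 109 (ROC calendar; 2020 CE)
Author (Chinese): 呂伯駿
Author (English): Po-Chun Lu
Student ID: Q56074085
Degree: Master's
Language: English
Pages: 37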
Oral defense committee: 范耀中, 陳建志, 蕭宏章
Advisor: 莊坤達
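Keywords (Chinese): 工作排程 (job scheduling), 雲端運算 (cloud computing), 多階層回饋佇列 (multilevel feedback queue)
Keywords (English): job scheduling, cloud computing, multi-level feedback queue
Subject classification: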
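Chinese abstract (translated): The number of IoT data analysis services grows year by year. To address the technical debt of data analysis systems, companies such as Uber have proposed end-to-end data analysis frameworks. Such frameworks generally order jobs on a First Come First Served basis. In online IoT data analysis, however, data streams are affected by the stability of the network environment, so the data a job needs may be unavailable. Moreover, to satisfy different service-level agreements, jobs differ in deadline, execution time, and required resources, and therefore differ in importance. In an online environment, the arrival times of all jobs also cannot be known in advance, so jobs cannot be scheduled ahead of time. To handle these problems, this thesis proposes a combined scheduling strategy based on a Multilevel Feedback Queue to address job priority. It also designs a dynamic-programming-based algorithm that accounts for data unavailability, and shows experimentally that, across various job types and quantity distributions, the method outperforms First Come First Served and greedy-algorithm-based approaches.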
English abstract: The number of IoT data analysis services is increasing year over year. To handle the technical debt of data analysis systems, companies such as Uber have proposed end-to-end data analysis frameworks. Frameworks of this type generally execute jobs in First Come First Served (FCFS) order. However, in online IoT data analysis, the data flow is affected by the stability of the network environment, so the data required by an analytic job may not be available. In addition, to meet different service-level agreements (SLAs), jobs have different deadlines, execution times, and computing resource requirements, and thus differ in importance. Moreover, in an online environment, the arrival times of all jobs are difficult to know in advance, so jobs cannot be scheduled ahead of time. To meet these needs, this thesis proposes a Multilevel Feedback Queue strategy to deal with job priority. We also design a dynamic programming algorithm that takes data unavailability into account. Finally, we examine the performance of the proposed method under various job importance level distributions and demonstrate that it outperforms FCFS and greedy-based methods.
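As a rough illustration of the kind of approach the abstract describes, the Python sketch below combines a multilevel queue with a dynamic-programming (0/1 knapsack) selection that skips jobs whose input data is unavailable. It is not the thesis's algorithm: the `Job` fields, the names `select_jobs` and `MultilevelFeedbackQueue`, the single resource-capacity model, and the promotion rule used as the feedback step are all assumptions made for this example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    importance: int   # hypothetical SLA weight; higher means more important
    cost: int         # hypothetical resource units the job needs (>= 1)
    data_ready: bool  # False when the job's input data has not arrived


def select_jobs(jobs, capacity):
    """0/1 knapsack by dynamic programming: choose data-ready jobs that
    maximize total importance without exceeding `capacity` units."""
    runnable = [job for job in jobs if job.data_ready]
    # best[c] = (total importance, chosen jobs) using at most c units
    best = [(0, [])] * (capacity + 1)
    for job in runnable:
        # Iterate capacities downward so each job is used at most once.
        for c in range(capacity, job.cost - 1, -1):
            value = best[c - job.cost][0] + job.importance
            if value > best[c][0]:
                best[c] = (value, best[c - job.cost][1] + [job])
    return best[capacity][1]


class MultilevelFeedbackQueue:
    """Level 0 is the highest priority (an assumption of this sketch)."""

    def __init__(self, levels=3):
        self.queues = [deque() for _ in range(levels)]

    def submit(self, job, level=0):
        self.queues[level].append(job)

    def schedule(self, capacity):
        """Run one scheduling round and return the jobs to start."""
        started = []
        for level, queue in enumerate(self.queues):
            chosen = select_jobs(list(queue), capacity)
            for job in chosen:
                queue.remove(job)
            capacity -= sum(job.cost for job in chosen)
            started.extend(chosen)
            # Assumed feedback rule: jobs left waiting in a lower queue
            # move up one level so they are considered earlier next round.
            if level > 0:
                self.queues[level - 1].extend(queue)
                queue.clear()
        return started


mlfq = MultilevelFeedbackQueue(levels=2)
mlfq.submit(Job("a", importance=5, cost=3, data_ready=True), level=0)
mlfq.submit(Job("b", importance=4, cost=2, data_ready=True), level=0)
mlfq.submit(Job("c", importance=9, cost=2, data_ready=False), level=0)
mlfq.submit(Job("d", importance=1, cost=1, data_ready=True), level=1)
print([job.name for job in mlfq.schedule(capacity=4)])  # ['a', 'd']
```

In this toy run, job `c` is the most important but is skipped because its data is not ready, and `b` waits in level 0 because `a` and `b` together would exceed the capacity of 4.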
Table of contents:
Chinese Abstract
Abstract
Acknowledgment
Contents
List of Tables
List of Figures
1 Introduction
2 Related Work
2.1 Job Scheduling for Data Analysis
2.2 Data Analysis Framework
3 Problem Statements
4 Methodology
4.1 Framework
4.2 Job Scheduler
4.2.1 Queue Selection
4.2.2 Inner-queue Scheduling
4.2.3 Reallocation
5 Experimental Results
5.1 Experimental Setting
5.2 Experimental Results
6 Conclusions
Bibliography
References:
[1] J. Hermann and M. D. Balso, “IoT industry status,” https://www.sipo.org.tw/industry-overview/industry-state-quo/iot-industry-state-quo.html, July 12 2018.
[2] D. Sculley, G. Holt, and D. G. et al., “Hidden technical debt in machine learning systems,” in Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015, pp. 2503–2511.
[3] “Kubeflow,” https://www.kubeflow.org.
[4] “mlflow,” https://mlflow.org.
[5] “metaflow,” https://metaflow.org.
[6] J. Hermann and M. D. Balso, “Meet Michelangelo: Uber’s machine learning platform,” https://eng.uber.com/michelangelo-machine-learning-platform/, September 5 2017.
[7] “Hybrid cloud considerations for big data and analytics,” https://www.omg.org/cloud/deliverables/CSCC-Hybrid-Cloud-Considerations-for-Big-Data-and-Analytics.pdf, Cloud Standards Customer Council, Tech. Rep., 2017.
[8] M. Zaharia, M. Chowdhury, and M. J. F. et al., “Spark: Cluster computing with working sets,” in 2nd USENIX Workshop on Hot Topics in Cloud Computing. USENIX Association, 2010.
[9] V. K. Vavilapalli and A. C. M. et al., “Apache Hadoop YARN: yet another resource negotiator,” in ACM Symposium on Cloud Computing. ACM, 2013, pp. 5:1–5:16. [Online]. Available: https://doi.org/10.1145/2523616.2523633
[10] B. Hindman, A. Konwinski, M. Zaharia, et al., “Mesos: A platform for fine-grained resource sharing in the data center,” in Proceedings of the 8th USENIX Symposium on Networked Systems Design and Implementation. USENIX Association, 2011.
[11] A. Verma and L. P. et al., “Large-scale cluster management at Google with Borg,” in Proceedings of the Tenth European Conference on Computer Systems. ACM, 2015, pp. 18:1–18:17.
[12] B. Lucier, I. Menache, J. Naor, and J. Yaniv, “Efficient online scheduling for deadline-sensitive jobs: extended abstract,” in 25th ACM Symposium on Parallelism in Algorithms and Architectures. ACM, 2013, pp. 305–314.
[13] Y. Azar and I. K. et al., “Truthful online scheduling with commitments,” in Proceedings of the Sixteenth ACM Conference on Economics and Computation. ACM, 2015, pp. 715–732.
[14] L. Chen, S. Liu, B. Li, and B. Li, “Scheduling jobs across geo-distributed datacenters with max-min fairness,” IEEE Trans. Netw. Sci. Eng., vol. 6, no. 3, pp. 488–500, 2019.
[15] Z. Huang, B. Balasubramanian, and M. W. et al., “Need for speed: CORA scheduler for optimizing completion-times in the cloud,” in 2015 IEEE Conference on Computer Communications. IEEE, 2015, pp. 891–899.
[16] Z. Huang, B. Balasubramanian, M. Wang, T. Lan, M. Chiang, and D. H. K. Tsang, “RUSH: A robust scheduler to manage uncertain completion-times in shared clouds,” in 36th IEEE International Conference on Distributed Computing Systems. IEEE Computer Society, 2016, pp. 242–251.
[17] Z. Hu, B. Li, Z. Qin, and R. S. M. Goh, “Job scheduling without prior information in big data processing systems,” in 37th IEEE International Conference on Distributed Computing Systems. IEEE Computer Society, 2017, pp. 572–582.
[18] Z. Hu, B. Li, Q. Zheng, and R. S. M. Goh, “Low latency big data processing without prior information,” IEEE Trans. Cloud Comput., 2020.
[19] Z. Hu, B. Li, C. Chen, and X. Ke, “Flowtime: Dynamic scheduling of deadline-aware workflows and ad-hoc jobs,” in 38th IEEE International Conference on Distributed Computing Systems. IEEE Computer Society, 2018, pp. 929–938.
[20] Y. Bao, Y. Peng, C. Wu, and Z. Li, “Online job scheduling in distributed machine learning clusters,” in 2018 IEEE Conference on Computer Communications. IEEE, 2018, pp. 495–503.
[21] J. Lu, P. Li, and K. W. et al., “Topology-aware job scheduling for machine learning cluster,” in 2019 IEEE Global Communications Conference. IEEE, 2019, pp. 1–6.
[22] R. Zhou, Z. Li, C. Wu, and Z. Huang, “An efficient cloud market mechanism for computing jobs with soft deadlines,” IEEE/ACM Trans. Netw., vol. 25, no. 2, pp. 793–805, 2017.
[23] N. Jain, I. Menache, J. Naor, and J. Yaniv, “Near-optimal scheduling mechanisms for deadline-sensitive jobs in large computing clusters,” ACM Trans. Parallel Comput., vol. 2, no. 1, pp. 3:1–3:29, 2015. [Online]. Available: https://doi.org/10.1145/2742343
[24] Y. Bao, Y. Peng, and C. Wu, “Deep learning-based job placement in distributed machine learning clusters,” in 2019 IEEE Conference on Computer Communications. IEEE, 2019, pp. 505–513.
[25] H. Xu, Y. Liu, and W. C. L. et al., “Efficient online resource allocation in heterogeneous clusters with machine variability,” in 2019 IEEE Conference on Computer Communications. IEEE, 2019, pp. 478–486.
[26] V. Jalaparti, P. Bodík, I. Menache, S. Rao, K. Makarychev, and M. Caesar, “Network-aware scheduling for data-parallel jobs: Plan when you can,” Computer Communication Review, vol. 45, no. 5, pp. 407–420, 2015.
[27] Y. Cao, L. Lu, J. Yu, S. Qian, and Y. Z. et al., “Online cost-aware service requests scheduling in hybrid clouds for cloud bursting,” in Web Information Systems Engineering - WISE 2017 - 18th International Conference, ser. Lecture Notes in Computer Science, vol. 10569. Springer, 2017, pp. 259–274.
[28] L. Jiang, L. D. Xu, H. Cai, Z. Jiang, F. Bu, and B. Xu, “An IoT-oriented data storage framework in cloud computing platform,” IEEE Trans. Ind. Informatics, vol. 10, no. 2, pp. 1443–1451, 2014.
[29] G. Mokhtari, A. Anvari-Moghaddam, and Q. Zhang, “A new layered architecture for future big data-driven smart homes,” IEEE Access, vol. 7, pp. 19 002–19 012, 2019.
[30] X. Liu and P. S. Nielsen, “A hybrid ICT-solution for smart meter data analytics,” CoRR, vol. abs/1606.05787, 2016.
[31] T. Wilcox, N. Jin, P. A. Flach, and J. Thumim, “A big data platform for smart meter data analytics,” Comput. Ind., vol. 105, pp. 250–259, 2019.
[32] J. Egeblad and D. Pisinger, “Heuristic approaches for the two- and three-dimensional knapsack packing problem,” Comput. Oper. Res., vol. 36, no. 4, pp. 1026–1049, 2009.
[33] M. Bartlett, A. M. Frisch, and Y. H. et al., “The temporal knapsack problem and its solution,” in Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems, Second International Conference, ser. Lecture Notes in Computer Science, vol. 3524. Springer, 2005, pp. 34–48.
[34] “Big data architecture,” https://docs.microsoft.com/zh-tw/azure/architecture/guide/architecture-styles/big-data, November 20 2019.
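Full-text access rights:
  • Authorized for on-campus browsing and printing of the electronic full text, public from 2020-08-26.
  • Authorized for off-campus browsing and printing of the electronic full text, public from 2020-08-26.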

