Load balancing scheduling policies for Spark heterogeneous clusters

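Journal of Changzhou University (Natural Science Edition) [ISSN: 2095-0411 / CN: 32-1822/N]

Volume:: 36
Issue:: 2024, No. 05

Pages:: 61-70

Section:: Computer and Information Engineering

Publication date:: 2024-09-28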

Article Info

Title:: Load balancing scheduling policies for Spark heterogeneous clusters

Article ID:: 2095-0411(2024)05-0061-10

Author(s):: TAO Yuwei (陶宇炜)¹; XIE Aijuan (谢爱娟)²; 1. Office of IT Services and Big Data, Changzhou University, Changzhou 213164, Jiangsu, China; 2. School of Petrochemical Engineering, Changzhou University, Changzhou 213164, Jiangsu, China

Keywords:: heterogeneity; job scheduling; load balancing; Spark

CLC number:: TP 302

DOI:: 10.3969/j.issn.2095-0411.2024.05.007

Document code:: A

Abstract:: When scheduling job tasks, the Spark scalable distributed platform does not account for differences in computing capability among heterogeneous cluster nodes or for load balancing, which degrades system performance. To address this, the paper constructs a load-balancing scheduling policy for heterogeneous cluster nodes in the Spark environment. Compute nodes use a sampling algorithm to predict the data distribution characteristics and divide the data evenly into multiple partitions. The real-time load of each heterogeneous cluster node is then obtained by weighting its static load and dynamic load, and job tasks are scheduled dynamically according to that load. Finally, the policy is compared and analyzed on a heterogeneous cluster using three benchmarks: WordCount, TeraSort, and K-means. The experimental results show that the algorithm significantly reduces running time and improves the performance of the heterogeneous cluster.
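
The abstract names two mechanisms without giving formulas: sampling-based partitioning and a weighted static/dynamic node load. The following Python sketch illustrates one plausible reading of each and is not the authors' algorithm. Everything in it is an assumption made for illustration: the function names (`sample_boundaries`, `partition_of`, `node_load`, `pick_node`), the quantile-based choice of split points, the linear blend with weight `alpha`, and the normalization of static capacity and dynamic utilization to the range 0 to 1.

```python
import bisect
import random


def sample_boundaries(keys, num_partitions, sample_size=1000, seed=0):
    """Estimate range-partition split points from a random sample of keys.

    Hypothetical helper: evenly spaced quantiles of the sorted sample are
    used as boundaries, so each partition should receive a similar share
    of the data if the sample reflects the key distribution.
    """
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    return [sample[len(sample) * i // num_partitions]
            for i in range(1, num_partitions)]


def partition_of(key, boundaries):
    """Return the index of the partition whose key range contains `key`."""
    return bisect.bisect_right(boundaries, key)


def node_load(static_capacity, dynamic_util, alpha=0.4):
    """Blend static and dynamic load into a single score (lower is better).

    Assumed inputs: static_capacity in (0, 1], where 1 is the strongest
    node in the cluster; dynamic_util in [0, 1], the current utilization.
    alpha is an assumed weight on the static term.
    """
    return alpha * (1.0 - static_capacity) + (1.0 - alpha) * dynamic_util


def pick_node(nodes, alpha=0.4):
    """Choose the node name with the lowest blended load.

    `nodes` maps a node name to a (static_capacity, dynamic_util) tuple.
    """
    return min(nodes, key=lambda name: node_load(*nodes[name], alpha=alpha))
```

Under these assumptions, a scheduler would call `pick_node` each time a task becomes ready, after refreshing the `dynamic_util` values from monitoring data. A strong but busy node can then lose to a weaker idle one, which is the behavior the abstract attributes to combining static and dynamic load.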


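Memo

Received:: 2024-02-20
Funding:: 2021 Jiangsu Province Education Science "14th Five-Year Plan" project (D/2021/01/131); 2021 Education and Teaching Research Project of the School of Petrochemical Engineering, Changzhou University (SHJY202101).
About the author:: TAO Yuwei (born 1968), male, from Changzhou, Jiangsu; master's degree; senior laboratory technician. E-mail: tyw@cczu.edu.cn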
