[1] TAO Yuwei, XIE Aijuan. Load balancing scheduling policies for Spark heterogeneous clusters[J]. Journal of Changzhou University (Natural Science Edition), 2024, 36(05): 61-70. [doi:10.3969/j.issn.2095-0411.2024.05.007]

Load balancing scheduling policies for Spark heterogeneous clusters

Journal of Changzhou University (Natural Science Edition) [ISSN: 2095-0411 / CN: 32-1822/N]

Volume:
Vol. 36
Issue:
No. 05, 2024
Pages:
61-70
Section:
Computer and Information Engineering
Publication date:
2024-09-28

Article Info

Title:
Load balancing scheduling policies for Spark heterogeneous clusters
Article ID:
2095-0411(2024)05-0061-10
Author(s):
TAO Yuwei1 XIE Aijuan2
1. Office of IT Services and Big Data, Changzhou University, Changzhou 213164, China; 2. School of Petrochemical Engineering, Changzhou University, Changzhou 213164, China
Keywords:
heterogeneity; job scheduling; load balancing; Spark
CLC number:
TP 302
DOI:
10.3969/j.issn.2095-0411.2024.05.007
Document code:
A
Abstract:
The Spark scalable distributed platform does not account for differences in node computing capacity or for load balance when it schedules job tasks on a heterogeneous cluster, which degrades system performance. To address this, the paper constructs a load balancing scheduling policy for heterogeneous cluster nodes in the Spark environment. Compute nodes use a sampling algorithm to predict the data distribution characteristics and divide the data evenly into multiple partitions. The real-time load of each heterogeneous cluster node is then obtained from a weighted combination of its static load and dynamic load, and job tasks are scheduled dynamically according to that load. Finally, the policy is compared and analyzed on a heterogeneous cluster using three benchmarks: Wordcount, TeraSort, and K-means. The experimental results show that the algorithm markedly reduces running time and improves the performance of the heterogeneous cluster.
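
The two ideas named in the abstract, sampling-based balanced partitioning and scheduling by a weighted static/dynamic load, can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the function names, the `Node` fields, the weights `w_static` and `w_dynamic`, and the load formula itself are all assumptions chosen for illustration, since the abstract does not give the actual definitions.

```python
import random
from bisect import bisect_right
from dataclasses import dataclass


def split_points(keys, num_partitions, sample_size=1000, seed=0):
    """Estimate range-partition boundaries from a random sample of the keys."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    # Evenly spaced sample quantiles serve as boundaries (assumed scheme).
    return [sample[i * len(sample) // num_partitions]
            for i in range(1, num_partitions)]


def partition_of(key, bounds):
    """Index of the range partition that a key falls into."""
    return bisect_right(bounds, key)


@dataclass
class Node:
    name: str
    capacity: float      # hypothetical static score: relative computing power
    utilization: float   # hypothetical dynamic score: resource usage in [0, 1]
    assigned: int = 0    # tasks placed on this node so far


def load(node, w_static=0.4, w_dynamic=0.6):
    """Weighted load estimate; the weights and formula are illustrative only."""
    static_load = 1.0 / node.capacity
    dynamic_load = node.utilization + node.assigned / node.capacity
    return w_static * static_load + w_dynamic * dynamic_load


def schedule(num_tasks, nodes):
    """Greedily place each task on the node with the lowest current load."""
    placement = []
    for _ in range(num_tasks):
        target = min(nodes, key=load)
        target.assigned += 1
        placement.append(target.name)
    return placement


if __name__ == "__main__":
    keys = list(range(10_000))
    bounds = split_points(keys, num_partitions=4)
    print("boundaries:", bounds)
    cluster = [Node("fast", capacity=4.0, utilization=0.0),
               Node("slow", capacity=1.0, utilization=0.0)]
    print("placement:", schedule(10, cluster))
```

In this sketch the faster node receives more tasks because each task raises its load by a smaller amount; a real scheduler would refresh `utilization` from live node metrics rather than keep it fixed.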

References:

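[1] FENG Xingjie, HE Yang. Improvement of the Hadoop job scheduling algorithm based on node performance[J]. Computer Applications and Software, 2017, 34(5): 223-228. (in Chinese)
[2] JIAN Chengfeng, PING Jing, ZHANG Meiyu. Storm edge node scheduling optimization method for edge computing[J]. Computer Science, 2020, 47(5): 277-283. (in Chinese)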
[3] ZAHARIA M, CHOWDHURY M, FRANKLIN M J, et al. Spark: cluster computing with working sets[C]//Proceedings of the 2nd USENIX conference on hot topics in cloud computing. New York: ACM, 2010: 10.
[4] WIKTORSKI T. Data-intensive systems: principles and fundamentals using Hadoop and Spark[M]. Cham: Springer International Publishing, 2019.
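[5] ZHENG Xiaowei, XIANG Ming, ZHANG Dawei, et al. An adaptive task scheduling method for Hadoop clusters based on node capability[J]. Journal of Computer Research and Development, 2014, 51(3): 618-626. (in Chinese)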
[6] XU X L, CAO L L, WANG X H. Adaptive task scheduling strategy based on dynamic workload adjustment for heterogeneous hadoop clusters[J]. IEEE Systems Journal, 2016, 10(2): 471-482.
[7] YONG M, GAREGRAT N, MOHAN S. Towards a resource aware scheduler in Hadoop[C]//Proc of the 7th IEEE International Conference on Web Services.[S.l.]: IEEE, 2009: 102-109.
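[8] XU Jiajun, LIU Gongshen, SU Bo, et al. Research on a Spark-based scheduling strategy for heterogeneous clusters[J]. Computer Science and Application, 2016(11): 692-704. (in Chinese)
[9] HU Yahong, SHENG Xia, MAO Jiafa. Research on a task scheduling optimization algorithm for Spark environments with unbalanced resources[J]. Computer Engineering & Science, 2020, 42(2): 203-209. (in Chinese)
[10] CHAMBERS B, ZAHARIA M. Spark: the definitive guide[M]. Translated by ZHANG Yanfeng, WANG Fangjing, CHEN Jingjing. Beijing: China Electric Power Press, 2020. (in Chinese)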
[11] KOTOULAS S, OREN E, VAN HARMELEN F. Mind the data skew: distributed inferencing by speed dating in elastic regions[C]//Proceedings of the 19th international conference on World wide web. New York: ACM, 2010: 531-540.
[12] DAVIDSON A, OR A. Optimizing shuffle performance in Spark[EB/OL].[2018-11-25]. https://people.eecs.berkeley.edu/~kubitron/courses/cs262a-F13/projects/reports/project16_report.pdf.
[13] ZHAN Jianfeng, GAO Wanling, WANG Lei, et al. BigDataBench: an open-source benchmark for big data systems[J]. Chinese Journal of Computers, 2016, 39(1): 196-211. (in Chinese)

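Memo

Memo:
Received: 2024-02-20.
Funding: 2021 project of the Jiangsu Province Education Science "14th Five-Year Plan" (D/2021/01/131); 2021 Education and Teaching Research Project of the School of Petrochemical Engineering, Changzhou University (SHJY202101).
About the author: TAO Yuwei (b. 1968), male, from Changzhou, Jiangsu; master's degree; senior laboratory technician. E-mail: tyw@cczu.edu.cn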
Last Update: 1900-01-01