HTDcr: a job execution framework for high-throughput computing on supercomputers

Jiang, Jiazhi; Huang, Dan*; Chen, Hu*; Lu, Yutong; Liao, Xiangke

doi:10.1007/s11432-022-3657-3

Abstract

High-throughput computing (HTC) is a computing paradigm that accomplishes jobs that can be easily broken into smaller, independent components. However, it requires a large amount of computing power over a long period. Most existing HTC frameworks are job-oriented and support neither coscheduling with the hardware architecture nor task-level execution. Moreover, most of these frameworks reach only a limited scale, and their usability needs further improvement. Herein, we present HTDcr, a job execution framework for HTC on supercomputers. This study aims to improve the throughput, task dispatching, and usability of the framework. Specifically, the throughput optimizations include a carefully designed task management system, a hierarchical scheduler, and the co-optimization of the task-scheduling strategy with application and hardware characteristics. The usability optimizations include a programmable execution workflow, mechanisms for more robust and reliable quality of service, and a fine-grained resource allocation system for colocating multiple jobs. According to our evaluations, HTDcr achieves outstanding scalability and high throughput on large-scale clusters for HTC workloads. We evaluate HTDcr with several microbenchmarks and real-world applications on Tianhe-2 and Sunway TaihuLight to demonstrate the effects of its design mechanisms. For instance, task scheduling integrated with application and hardware characteristics achieves 1.7x and 1.9x speedups over the basic task-scheduling strategy for two real-world applications.

Affiliation
Sun Yat-sen University

Updated: 2024-03-23 09:01
