
HOSHMAND – An AI-Driven Compute Scheduler
Abstract
This news article presents ‘HOSHMAND’, an AI-based system that uses a customized recurrent neural network (RNN) framework for job scheduling in dynamic cloud environments. HOSHMAND streamlines task execution across nodes, manages diverse resources efficiently, and significantly reduces job allocation time compared to conventional methods. It remembers past configurations, which eliminates redundant scheduling computations, saves computing resources, and speeds up task execution. Tested on cloud-based datasets, HOSHMAND has shown substantial improvements in scheduling efficacy, pointing to new possibilities for AI-enabled job scheduling in cloud computing.
Introduction
The steady spread of Artificial Intelligence (AI) across diverse sectors has reached compute scheduling, with a notable contribution presented at this year’s Computers, Software, and Applications Conference (COMPSAC 2024). The team of Prof. Dr. Julian Kunkel, Ph.D. candidate Aasish Kumar Sharma, and research assistant Michael Bidollahkhani from the University of Göttingen has gained accolades for its work.
The research group, which is affiliated with GWDG (Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen), presented a concept for AI compute schedulers in their research paper [1] that could reduce the job allocation time in such systems. Their approach uses AI techniques to predict job allocations early, which marks a significant step forward for computational task allocation. The methodology could benefit a range of computing-intensive industries and opens a promising direction for compute scheduling research. The group’s contributions to COMPSAC 2024 underscore the University of Göttingen’s leading role in driving technological advancement.
Comparative Analysis
HOSHMAND’s performance was compared with conventional load-balancing algorithms such as Opportunistic Load Balancing (OLB), Minimum Execution Time (MET), and Minimum Completion Time (MCT) [2]. The comparison showed that HOSHMAND is more efficient in both scheduling response time and resource utilization. The response times of the conventional static scheduling algorithms (such as OLB, MET, and MCT) increased as the number of nodes grew, making them slower than HOSHMAND. HOSHMAND proved notably responsive and adaptive thanks to its low prediction times and its use of both historical and real-time data. For more details, refer to the article [1].
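To make the three baseline heuristics concrete, the following is a minimal Python sketch of OLB, MET, and MCT as they are commonly described in the load-balancing literature. It is not the experimental setup from [1]: the expected-time-to-compute matrix `etc`, the helper `_schedule`, and the function names are assumptions chosen for illustration.

```python
from typing import Callable, List, Tuple

# etc[t][m]: assumed expected time to compute task t on machine m (hypothetical input).
ETC = List[List[float]]
Picker = Callable[[List[float], List[float]], int]


def _schedule(etc: ETC, pick: Picker) -> Tuple[List[int], float]:
    """Assign tasks in arrival order; return (assignment, makespan)."""
    ready = [0.0] * len(etc[0])  # time at which each machine becomes free
    assignment = []
    for times in etc:
        m = pick(times, ready)
        ready[m] += times[m]
        assignment.append(m)
    return assignment, max(ready)


def olb(etc: ETC) -> Tuple[List[int], float]:
    # OLB: next machine to become free, ignoring the task's execution time.
    return _schedule(etc, lambda times, ready: min(range(len(ready)), key=lambda m: ready[m]))


def met(etc: ETC) -> Tuple[List[int], float]:
    # MET: machine with the smallest execution time, ignoring machine load.
    return _schedule(etc, lambda times, ready: min(range(len(times)), key=lambda m: times[m]))


def mct(etc: ETC) -> Tuple[List[int], float]:
    # MCT: machine with the smallest completion time (ready time + execution time).
    return _schedule(etc, lambda times, ready: min(range(len(times)), key=lambda m: ready[m] + times[m]))


if __name__ == "__main__":
    example = [[4, 2], [3, 1], [5, 6], [2, 3]]  # 4 tasks, 2 machines (made-up values)
    for name, heuristic in (("OLB", olb), ("MET", met), ("MCT", mct)):
        print(name, heuristic(example))
```

Each heuristic scans every machine for every incoming task, so the per-task decision cost in this sketch grows with the number of nodes.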
Conclusion and Future Directions
The results demonstrate the effectiveness of the AI-based RNN methodology. The approach overcomes inherent limitations of traditional scheduling algorithms by adapting to shifting job specifications and system conditions, promoting resource efficiency, and enabling continuous learning.
Future research is planned to analyse HOSHMAND’s performance in varying cloud computing environments while extending its capabilities to handle a broader set of scheduling scenarios.
Acknowledgements
The research, carried out under the DECICE project at the University of Göttingen, is primarily funded by the European Horizon Project (Grant No. 101092582). The project is dedicated to developing a robust Digital Twin Platform as a Service (DT-PaaS). We also gratefully acknowledge the contributions of GWDG and thank the Federal Ministry of Education and Research and the state governments (listed at https://www.nhr-verein.de/unsere-partner), whose joint support and funding through National High Performance Computing (NHR) have been decisive for the project’s progress.
References:
[1] Michael Bidollahkhani, Aasish Kumar Sharma, and Julian M. Kunkel. HOSHMAND: Accelerated AI-Driven Scheduler Emulating Conventional Task Distribution Techniques for Cloud Workloads. In: 2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC), pages 2313–2320.
[2] S. K. Mishra, B. Sahoo, and P. P. Parida. Load balancing in cloud computing: a big picture. In: Journal of King Saud University - Computer and Information Sciences, 32(2):149–158, 2020.
Author(s): Felix Stein, GWDG (Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen)