AI-Driven Job Scheduling in Cloud Computing: A Comprehensive Review
Academic Background
With the rapid development of cloud computing technology, the demand for efficient job scheduling in dynamic and heterogeneous cloud environments has grown significantly. Traditional scheduling algorithms perform well in simple systems but are no longer sufficient for modern, complex cloud infrastructures. Issues such as resource heterogeneity, energy consumption, and real-time adaptability have prompted researchers to explore AI-driven solutions. AI-based job scheduling approaches, including machine learning, optimization algorithms, heuristics, and hybrid AI models, offer greater adaptability, scalability, and energy efficiency. This paper comprehensively reviews AI-driven job scheduling technologies, analyzes the strengths and weaknesses of existing methods, and explores how AI can overcome the limitations of traditional algorithms.
Source of the Paper
This paper is co-authored by Yousef Sanjalawe, Salam Al-E’mari, Salam Fraihat, and Sharif Makhadmeh, and was published in the journal Artificial Intelligence Review on March 24, 2025. The DOI of the paper is 10.1007/s10462-025-11208-8.
Main Content
1. Research Motivation and Background
Cloud computing has transformed the management and distribution of computational resources through on-demand services. However, as cloud infrastructures expand and diversify, job scheduling has become a critical challenge. Traditional scheduling algorithms such as First-Come-First-Serve (FCFS), Round Robin, and Priority Scheduling perform poorly in cloud environments because they cannot handle the complexity of large task volumes, resource heterogeneity, and dynamic workloads. Therefore, researchers have turned to AI-based solutions that can adapt to system changes in real-time and continuously optimize resource allocation.
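The weakness of order-insensitive policies like FCFS can be seen with a few lines of arithmetic. The sketch below uses hypothetical job lengths on a single machine to show how one long job at the head of an FCFS queue inflates average waiting time compared with a Shortest-Job-First order:

```python
# Illustrative sketch (invented job lengths): average waiting time under
# FCFS vs. Shortest-Job-First on a single machine.

def avg_waiting_time(job_lengths):
    """Average time each job waits before starting, given execution order."""
    waited, clock = 0, 0
    for length in job_lengths:
        waited += clock          # this job waited for everything before it
        clock += length
    return waited / len(job_lengths)

jobs = [12, 1, 3, 7, 2]                # arrival order = FCFS order
fcfs = avg_waiting_time(jobs)
sjf = avg_waiting_time(sorted(jobs))   # Shortest-Job-First order

print(f"FCFS avg wait: {fcfs:.1f}")    # the long job delays everyone behind it
print(f"SJF  avg wait: {sjf:.1f}")
```

Heterogeneous workloads amplify exactly this effect, which is why adaptive, workload-aware policies are needed at cloud scale.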
2. AI-Driven Job Scheduling Technologies
This paper provides a comprehensive review of AI-driven job scheduling technologies, which are mainly categorized as follows:
2.1 Machine Learning Approaches
Machine learning plays a crucial role in job scheduling by analyzing large datasets to identify patterns and make predictions, thereby improving scheduling efficiency. This paper details the applications of supervised learning, reinforcement learning, unsupervised learning, and deep learning in job scheduling.
Supervised Learning: By training models to predict job execution times and resource requirements, more accurate scheduling decisions can be made. For example, Onyema et al. (2024) proposed a task scheduling model for multi-cloud environments, using supervised learning techniques for task classification and resource allocation.
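As a minimal sketch of this idea (not the model by Onyema et al.; the features and data are invented), a scheduler can fit a regression model that maps job features to expected runtime, then place each job where its predicted finish time is earliest:

```python
# Hedged sketch: predicting job execution time from job features with a
# linear least-squares model, so the scheduler can plan ahead.
import numpy as np

# Each row: (input size in GB, requested vCPUs); target: observed runtime (s).
# All values are hypothetical training observations.
X = np.array([[1.0, 2], [2.0, 2], [4.0, 4], [8.0, 4], [8.0, 8]])
y = np.array([50.0, 90.0, 110.0, 190.0, 120.0])

# Fit w, b by ordinary least squares: runtime ~ X @ w + b
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict_runtime(size_gb, vcpus):
    return float(np.array([size_gb, vcpus, 1.0]) @ coef)

print(f"predicted runtime: {predict_runtime(4.0, 8):.0f}s")
```

Real systems use richer feature sets (historical traces, VM type, data locality) and nonlinear models, but the scheduling use is the same: rank candidate placements by predicted completion time.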
Reinforcement Learning: Through reward or penalty mechanisms, reinforcement learning models can autonomously explore different scheduling strategies, gradually optimizing system performance. For instance, Shi et al. (2022) proposed a Spark job scheduling method based on Deep Reinforcement Learning (DRL), significantly reducing cluster usage costs.
Unsupervised Learning: In the absence of labeled data, unsupervised learning algorithms can discover hidden patterns, such as job clustering or resource usage trends, to allocate resources more effectively. For example, Singhal et al. (2024) proposed a job scheduling method based on the Rock Hyrax model, which reduces job completion time and energy consumption by clustering resources.
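The clustering idea can be sketched with a small k-means routine (this is an invented toy, not the Rock Hyrax method): jobs with similar resource profiles are grouped so they can share a placement policy.

```python
# Hedged sketch: k-means over (CPU demand, memory demand) job profiles.
import numpy as np

def kmeans(points, centers, iters=20):
    for _ in range(iters):
        # assign each job to its nearest center
        labels = np.argmin(((points[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        # move each center to the mean of its cluster
        centers = np.array([points[labels == j].mean(axis=0)
                            for j in range(len(centers))])
    return labels, centers

# columns: (CPU demand, memory demand); two visibly separated job families
jobs = np.array([[0.9, 0.2], [0.8, 0.1], [0.95, 0.3],    # CPU-bound
                 [0.1, 0.9], [0.2, 0.8], [0.15, 0.95]])  # memory-bound
labels, centers = kmeans(jobs, centers=jobs[[0, 3]].copy())
print("cluster labels:", labels)   # CPU-bound vs. memory-bound groups
```

Once jobs are clustered, the scheduler can co-locate complementary clusters (CPU-bound with memory-bound) to raise utilization without contention.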
Deep Learning: Deep learning techniques such as Artificial Neural Networks (ANNs) and Deep Q-Networks (DQNs) excel in complex cloud environments. For instance, Lin et al. (2018) proposed a cloud job scheduling strategy based on DQN, significantly reducing job completion time.
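To give a flavor of the network side (weights below are random and untrained; a real DQN additionally needs a training loop, experience replay, and a target network), a tiny feed-forward network can score jobs for dispatch priority:

```python
# Hedged sketch: a minimal feed-forward network (pure NumPy) that scores
# jobs; higher score = dispatch sooner. Features and weights are invented.
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

# 3 input features per job: (length, deadline slack, memory demand)
W1, b1 = rng.normal(size=(3, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

def score(job_features):
    h = relu(job_features @ W1 + b1)   # hidden layer
    return (h @ W2 + b2).item()        # scalar priority score

jobs = np.array([[5.0, 1.0, 0.2],
                 [2.0, 0.5, 0.8],
                 [9.0, 3.0, 0.1]])
order = sorted(range(len(jobs)), key=lambda i: -score(jobs[i]))
print("dispatch order:", order)
```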
2.2 AI Optimization Techniques
Optimization techniques play a key role in job scheduling by finding the best solutions in dynamic environments. This paper introduces evolutionary algorithms, swarm intelligence algorithms, bio-inspired algorithms, and gradient-based optimization methods.
Evolutionary Algorithms: For example, Genetic Algorithms (GAs) simulate natural selection processes to gradually optimize resource allocation and job sequences. Lane et al. (2022) proposed a dynamic hierarchical connection system based on GA, significantly reducing the completion time of heterogeneous tasks.
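The selection-crossover-mutation cycle can be sketched for a toy job-to-machine assignment problem (an illustrative GA, not the system by Lane et al.; job lengths are invented):

```python
# Hedged sketch: a genetic algorithm minimizing makespan. Each chromosome
# maps jobs to machines; fitter (lower-makespan) assignments survive.
import random
random.seed(1)

JOBS = [4, 7, 2, 9, 3, 6, 5, 1]   # hypothetical job lengths
MACHINES = 3

def makespan(assign):
    loads = [0] * MACHINES
    for job, m in zip(JOBS, assign):
        loads[m] += job
    return max(loads)

def evolve(pop_size=30, gens=60, mut_rate=0.1):
    pop = [[random.randrange(MACHINES) for _ in JOBS] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=makespan)                    # selection: keep fittest half
        survivors = pop[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, len(JOBS))  # one-point crossover
            child = a[:cut] + b[cut:]
            for i in range(len(child)):           # mutation
                if random.random() < mut_rate:
                    child[i] = random.randrange(MACHINES)
            children.append(child)
        pop = survivors + children
    return min(pop, key=makespan)

best = evolve()
print("best makespan:", makespan(best))
```

With total work 37 over 3 machines, the lower bound on makespan is 13; elitist selection guarantees the best solution never regresses across generations.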
Swarm Intelligence Algorithms: Techniques such as Particle Swarm Optimization (PSO) and Ant Colony Optimization (ACO) simulate group behavior to find optimal scheduling solutions. For example, Gouasmi et al. (2017) proposed a distributed scheduling algorithm based on PSO, significantly reducing the cost of MapReduce jobs.
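A minimal PSO sketch for job placement (illustrative only, not Gouasmi et al.'s MapReduce scheduler): each particle is a real-valued vector decoded into a job-to-machine map, and the swarm's velocity updates pull particles toward personal and global bests.

```python
# Hedged sketch: Particle Swarm Optimization minimizing makespan.
import random
random.seed(7)

JOBS = [5, 3, 8, 2, 7, 4]          # hypothetical job lengths
MACHINES = 2

def decode(x):
    # round each coordinate to a machine index, clamped to a valid range
    return [min(MACHINES - 1, max(0, int(round(v)))) for v in x]

def makespan(x):
    loads = [0] * MACHINES
    for job, m in zip(JOBS, decode(x)):
        loads[m] += job
    return max(loads)

def pso(n_particles=20, iters=50, w=0.7, c1=1.5, c2=1.5):
    dim = len(JOBS)
    X = [[random.uniform(0, MACHINES - 1) for _ in range(dim)]
         for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in X]                    # personal bests
    gbest = min(pbest, key=makespan)[:]          # global best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                V[i][d] = (w * V[i][d]
                           + c1 * r1 * (pbest[i][d] - X[i][d])
                           + c2 * r2 * (gbest[d] - X[i][d]))
                X[i][d] += V[i][d]
            if makespan(X[i]) < makespan(pbest[i]):
                pbest[i] = X[i][:]
                if makespan(pbest[i]) < makespan(gbest):
                    gbest = pbest[i][:]
    return gbest

best = pso()
print("PSO makespan:", makespan(best))
```

With total work 29 over 2 machines, no schedule can beat a makespan of 15, which is the natural target for the swarm here.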
2.3 Hybrid AI Models
Hybrid AI models combine multiple AI technologies to provide more comprehensive solutions. For instance, Ali and Ali (2023) proposed a cloud-fog edge scheduling method that integrates the Catastrophic Genetic Algorithm (CGA) with a blockchain-based trust framework, significantly improving job scheduling efficiency.
3. Future Research Directions
This paper proposes three main directions for future research: scalability, better integration of AI with traditional scheduling methods, and the application of emerging technologies such as edge computing and blockchain. These directions aim to further enhance the adaptability, security, and energy efficiency of cloud job scheduling.
Conclusion and Significance
By comprehensively reviewing AI-driven job scheduling technologies, this paper reveals their immense potential in cloud computing. AI technologies not only improve resource utilization and system performance but also significantly reduce energy consumption and operational costs. This research provides important theoretical support and practical guidance for future cloud job scheduling, offering significant scientific value and application prospects.
Research Highlights
- Comprehensiveness: This paper is the first to comprehensively review AI-driven job scheduling technologies, covering multiple fields such as machine learning, optimization techniques, and hybrid AI models.
- Innovation: It highlights several novel scheduling methods from the surveyed literature, such as Spark job scheduling based on deep reinforcement learning and cloud-fog-edge scheduling based on genetic algorithms.
- Practicality: The research findings can be directly applied to real-world cloud computing environments, helping enterprises improve resource utilization and reduce operational costs.
Other Valuable Information
This paper also explores the application of AI technologies in edge computing and blockchain, providing new directions for future research. For example, integrating blockchain technology can enhance the security and transparency of job scheduling, while edge computing can significantly reduce latency and improve real-time performance.
This research illustrates the broad application prospects of AI technologies in cloud computing, which are expected to further drive the field's development in the coming years.