DECICE

Interaction between the Framework and the Orchestration Platform

DECICE is an AI-based, open and portable cloud management framework for optimizing and deploying applications across the compute continuum. To achieve this, the framework has to integrate with the orchestration platform so that it can control the workloads. An initial version of the software components that enable this interaction has been developed. This article describes how jobs are submitted to and executed on the orchestration platform.

Kubernetes, a widely used open-source platform for managing containerized applications, was chosen as the underlying orchestration platform in the project. To submit jobs to the orchestration platform, the DECICE framework has to interact with the Kubernetes API server. The current integration is declarative: the framework sends DECICE job specifications to the Kubernetes API server, and the orchestration platform runs each job using its own primitives.
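As a rough illustration of this declarative style, the sketch below builds a job specification as a custom resource manifest and hands it to the API server through the Kubernetes Python client. The API group, version, kind and the fields of the job are hypothetical placeholders, not the actual DECICE schema; only the client calls (`config.load_kube_config` and `CustomObjectsApi.create_namespaced_custom_object`) belong to the real library.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical identifiers; the real DECICE API group, version and kind may differ.
GROUP = "decice.example.org"
VERSION = "v1alpha1"
PLURAL = "decicejobs"


@dataclass
class DeciceJob:
    """Hypothetical stand-in for the framework's internal job format."""
    name: str
    image: str
    command: List[str] = field(default_factory=list)
    storage: Optional[str] = None  # e.g. "1Gi"; None means no volume is requested


def to_custom_resource(job: DeciceJob) -> dict:
    """Describe the desired job as a manifest; nothing is executed here."""
    spec = {"image": job.image, "command": list(job.command)}
    if job.storage is not None:
        spec["storage"] = job.storage
    return {
        "apiVersion": f"{GROUP}/{VERSION}",
        "kind": "DeciceJob",
        "metadata": {"name": job.name},
        "spec": spec,
    }


def submit(job: DeciceJob, namespace: str = "default") -> dict:
    """Send the manifest to the API server (needs the `kubernetes` package and cluster access)."""
    # Imported lazily so the translation above can be used without a cluster.
    from kubernetes import client, config

    config.load_kube_config()
    return client.CustomObjectsApi().create_namespaced_custom_object(
        group=GROUP,
        version=VERSION,
        namespace=namespace,
        plural=PLURAL,
        body=to_custom_resource(job),
    )
```

The framework only states what should run. How the job is turned into pods and volumes is left entirely to the platform side.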

For this interaction, both the framework and the Kubernetes API server have been extended. On the framework side, glue code has been developed that translates between DECICE’s internal format and the Kubernetes-specific format. This glue code uses the Kubernetes Python client to make RESTful calls to the API server. The API server is extended with custom API resources that store and retrieve DECICE job specifications. A custom controller has been implemented to monitor these DECICE job custom resources and to instantiate the primitive objects, such as pods and volume claims, that are needed in the cluster to execute the job. Besides generating the primitive objects that run the DECICE job, the custom controller can also generate instances of other custom API resources used by KubeEdge, SEDNA, Volcano, etc.
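The controller's role can be sketched as a function that maps one job custom resource to the primitive objects needed to run it, plus a loop that watches for new resources. The article does not say how the DECICE controller is implemented, so everything here is an assumption for illustration: the resource group and plural, the `spec` fields, the object names and the `/data` mount path are all hypothetical. The Kubernetes Python client calls used in `run_controller` do exist, but the loop is untested against a real cluster and omits error handling and owner references.

```python
# Hypothetical identifiers; the real DECICE custom resource may be named differently.
GROUP = "decice.example.org"
VERSION = "v1alpha1"
PLURAL = "decicejobs"


def build_primitives(resource: dict) -> list:
    """Derive the Kubernetes primitives (optional volume claim, then pod) for one job."""
    name = resource["metadata"]["name"]
    spec = resource["spec"]
    container = {"name": "main", "image": spec["image"]}
    if spec.get("command"):
        container["command"] = list(spec["command"])
    pod_spec = {"restartPolicy": "Never", "containers": [container]}
    objects = []

    if spec.get("storage"):
        claim_name = f"{name}-data"
        objects.append({
            "apiVersion": "v1",
            "kind": "PersistentVolumeClaim",
            "metadata": {"name": claim_name},
            "spec": {
                "accessModes": ["ReadWriteOnce"],
                "resources": {"requests": {"storage": spec["storage"]}},
            },
        })
        # Mount the claim into the job's container.
        pod_spec["volumes"] = [
            {"name": "data", "persistentVolumeClaim": {"claimName": claim_name}}
        ]
        container["volumeMounts"] = [{"name": "data", "mountPath": "/data"}]

    objects.append({
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": f"{name}-pod", "labels": {"decice-job": name}},
        "spec": pod_spec,
    })
    return objects


def run_controller(namespace: str = "default") -> None:
    """Watch job resources and create their primitives (needs `kubernetes` and cluster access)."""
    from kubernetes import client, config, watch

    config.load_kube_config()
    core = client.CoreV1Api()
    custom = client.CustomObjectsApi()
    stream = watch.Watch().stream(
        custom.list_namespaced_custom_object, GROUP, VERSION, namespace, PLURAL
    )
    for event in stream:
        if event["type"] != "ADDED":
            continue  # this sketch reacts to newly created jobs only
        for obj in build_primitives(event["object"]):
            if obj["kind"] == "PersistentVolumeClaim":
                core.create_namespaced_persistent_volume_claim(namespace, obj)
            else:
                core.create_namespaced_pod(namespace, obj)
```

Keeping the mapping in a pure function such as `build_primitives` makes it testable without a cluster. Emitting resources for other systems would mean returning additional manifests from the same function.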

Using a custom controller abstracts the details of Kubernetes-specific primitives away from the framework. This separation of concerns makes it easier to change how jobs are interpreted, because such a change touches only the controller code and leaves the framework and its glue code as they are. If the framework needs to connect to a different type of orchestration platform, the glue code can be replaced without significant modifications to the framework. The framework itself therefore stays independent and can be ported to different platforms. The implementation of these components will evolve alongside other ongoing developments and optimization efforts in the project.
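One way to picture this separation is an adapter interface between the framework and the platform-specific glue code. The sketch below is illustrative only: the class names, the `submit` method and the job fields are hypothetical and do not describe the actual DECICE code base.

```python
from abc import ABC, abstractmethod
from typing import Callable, List


class PlatformAdapter(ABC):
    """Hypothetical boundary: the only thing the framework knows about a platform."""

    @abstractmethod
    def submit(self, job: dict) -> str:
        """Hand a job in the framework's internal format to the platform."""


class KubernetesAdapter(PlatformAdapter):
    """Glue code for Kubernetes: translates the job and passes it to a sender."""

    def __init__(self, send: Callable[[dict], None]):
        # `send` would wrap the API server call; it is injected to keep the sketch offline.
        self._send = send

    def submit(self, job: dict) -> str:
        manifest = {
            "apiVersion": "decice.example.org/v1alpha1",  # hypothetical group/version
            "kind": "DeciceJob",                          # hypothetical kind
            "metadata": {"name": job["name"]},
            "spec": {"image": job["image"]},
        }
        self._send(manifest)
        return f"kubernetes/{job['name']}"


class InMemoryAdapter(PlatformAdapter):
    """Stand-in for a different platform: it simply records submitted jobs."""

    def __init__(self) -> None:
        self.jobs: List[dict] = []

    def submit(self, job: dict) -> str:
        self.jobs.append(job)
        return f"memory/{job['name']}"


class Framework:
    """Framework code depends on the adapter interface, never on a concrete platform."""

    def __init__(self, adapter: PlatformAdapter):
        self._adapter = adapter

    def schedule(self, job: dict) -> str:
        return self._adapter.submit(job)
```

Swapping `KubernetesAdapter` for another adapter changes where jobs end up, while `Framework.schedule` stays the same.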

Author(s): Aadesh Baskar, High Performance Computing Center (HLRS), University of Stuttgart
