The Future is Intelligent: AI in the Cloud-Native Era
Abstract
Mirantis senior staff explained why Kubernetes has become the de‑facto operating system for modern AI workloads and how open‑source, cloud‑native tooling can keep AI infrastructure vendor‑agnostic, cost‑effective, and data‑sovereign. The talk traced the evolution from early private‑cloud solutions to today’s multi‑cloud, GPU‑aware Kubernetes stacks, highlighted survey data on AI/ML developers, and detailed the technical and operational challenges of building AI‑ready platforms. The presenters then showcased a composable, open‑source platform (k0rdent) that leverages Cluster API, Helm‑based templating, and a suite of AI‑focused tools (K8sGPT, kagent, AIBrix, etc.), and concluded with a live demo of an end‑to‑end translation service running on a GPU‑enabled Azure cluster.
Detailed Summary
1. Introduction
- The session began with housekeeping (group photo) and a brief welcome from Bharath N R.
- Bharath introduced Mirantis’ Open Source Program Office (OSPO), whose mission is to contribute upstream to the open‑source projects that underpin enterprise cloud stacks.
- Satyam Bhardwaj followed, describing his focus on CNCF‑related projects, especially Kubernetes, and positioning Mirantis as a long‑standing pioneer in private‑cloud technology (OpenStack, Docker Enterprise, Mirantis Kubernetes Engine, Lens UI).
2. From Private Cloud to Cloud‑Native AI
2.1 Evolution of Cloud Architecture
- Early cloud expectations: a single, public‑cloud endpoint.
- Reality today: multi‑cloud (AWS, Azure, private clouds, edge) with 20+ clusters per organization.
- The proliferation of APIs and services has turned cloud from a “simple interaction” into a complex orchestration problem.
2.2 Kubernetes as the Common Control Plane
- Kubernetes is framed as the “OS of the future” – a universal control plane, scheduler, and API surface that can abstract away the underlying heterogeneity.
- The speaker stressed that calling Kubernetes an OS is no longer aspirational; it is already the reality for most production workloads.
2.3 Cloud‑Native Survey Highlights
| Metric | Figure (CNCF Survey) |
|---|---|
| Total cloud‑native developers | 15.6 M |
| Developers who identify as AI/ML engineers | 52 % (~8.1 M) |
| Already running AI workloads on K8s | 36 % |
| Planning AI workloads on K8s | 18 % |
- The surge in LLMs and “agents” has accelerated the migration of AI workloads onto Kubernetes.
3. AI Infrastructure Challenges (Presented by Satyam)
3.1 End‑to‑End Stack Complexity
- An AI request traverses GPU, storage, network, and monitoring layers.
- Inefficient GPU utilization (e.g., “burning GPUs”) drives the need for tighter orchestration and cost control.
3.2 Kubernetes Fundamentals (for the audience)
- Analogy: a set of five servers hosting an app; Kubernetes provides high availability, auto‑scaling, self‑healing, and automated rollouts/rollbacks.
- Kubernetes is the second‑largest open‑source project after Linux, governed by a neutral community rather than a single vendor.
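The guarantees listed above are encoded declaratively in a single Deployment object; Kubernetes controllers then reconcile reality toward it. A minimal sketch of that object as a plain Python dict (the app name and image are illustrative):

```python
# Minimal sketch of the Kubernetes Deployment object behind the "five
# servers" analogy; names and image are illustrative placeholders.

def make_deployment(name: str, image: str, replicas: int = 5) -> dict:
    """Build a Deployment manifest as a plain dict (what the YAML encodes)."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,          # desired state: five healthy pods
            "selector": {"matchLabels": labels},
            "template": {                  # pod template the controller heals toward
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }

deploy = make_deployment("demo-app", "nginx:1.27")
print(deploy["spec"]["replicas"])  # -> 5
```

If a node dies and a pod disappears, the controller notices the live count no longer matches `replicas` and schedules a replacement; that reconciliation loop is the self‑healing and HA behavior the analogy describes.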
3.3 Operational Friction
| Challenge | Description |
|---|---|
| Multi‑cluster / multi‑cloud fragmentation | Managing dozens of clusters creates a proliferation of YAML manifests and divergent APIs. |
| Visibility & governance | Disparate regions make it hard to obtain a unified view; GitOps tools struggle at scale. |
| Regulation & compliance | Data‑sovereignty mandates (e.g., EU) require audit trails, security hardening, and consistent policy enforcement. |
| GPU onboarding & multi‑tenancy | On‑board GPUs quickly, share them efficiently across teams, and deal with vendor‑specific slicing (NVIDIA MIG, AMD SR‑IOV). |
| Operational efficiency | Slow provisioning (weeks for cloud GPUs), high maintenance overhead, and noisy‑neighbor effects. |
Takeaway: Modern infrastructure is harder; without automation, innovation stalls.
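From a workload's point of view, the GPU multi‑tenancy problem above surfaces as nothing more than a resource name in the pod spec: a full GPU is `nvidia.com/gpu`, while a MIG slice is exposed under a vendor‑specific name such as `nvidia.com/mig-1g.5gb` (the exact name depends on the GPU Operator's MIG strategy). A hedged sketch:

```python
# Sketch of how full GPUs vs. MIG slices appear to a pod. The MIG resource
# name assumes NVIDIA's "mixed" MIG strategy; actual names depend on the
# GPU Operator configuration. Images are illustrative.

def gpu_pod(name: str, image: str,
            resource: str = "nvidia.com/gpu", count: int = 1) -> dict:
    """Build a Pod manifest requesting `count` units of a GPU resource."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {"containers": [{
            "name": name,
            "image": image,
            "resources": {"limits": {resource: str(count)}},
        }]},
    }

# A training job gets a whole GPU; a notebook gets a 1g.5gb slice.
full = gpu_pod("trainer", "pytorch/pytorch:latest")
sliced = gpu_pod("notebook", "jupyter/base-notebook",
                 resource="nvidia.com/mig-1g.5gb")
```

Because each vendor (NVIDIA MIG, AMD SR‑IOV) exposes different resource names and slicing granularities, platform teams end up templating this field per fleet, which is one source of the fragmentation the table describes.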
4. AI‑Specific Infrastructure Pain Points
- Technical complexity – No single standard; countless frameworks (TensorFlow, PyTorch, JAX) and platform components must interoperate.
- Operational efficiency – Utilization, provisioning speed, and networking become bottlenecks once the stack is assembled.
- User experience – Developers still face 6‑week GPU request cycles on hyperscalers; they demand instant, pre‑configured environments with reliable performance.
- Multi‑tenancy – High‑cost GPUs must be shared safely; different vendors expose different slicing APIs, leading to fragmentation and noise.
5. Kubernetes AI Conformance
- A conformance layer ensures that AI/ML workloads are portable across managed Kubernetes services (GKE, AKS, private clusters).
- Six conformance factors:
- Hardware accelerators – GPU, TPU, etc.
- Operators – Standardized CRDs for AI workloads.
- Scheduling – Consistent resource‑allocation semantics.
- Security / compliance – CVE scanning, policy enforcement (OPA/Kyverno).
- Observability – OpenTelemetry, Prometheus, Grafana.
- Lifecycle management – Version‑ed APIs, upgrade pathways.
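The security/compliance factor is the easiest to make concrete: admission policies (Kyverno, OPA/Gatekeeper) are essentially predicates over workload manifests. A toy, pure‑Python reduction of such a check (the rules chosen here are illustrative, not part of the conformance spec):

```python
# Toy illustration of the policy-enforcement conformance factor: a
# Kyverno/OPA-style admission check reduced to a pure-Python predicate.
# The two rules are common examples, not an official rule set.

def validate_pod(pod: dict) -> list:
    """Return a list of violations; an empty list means the pod is admitted."""
    violations = []
    for c in pod.get("spec", {}).get("containers", []):
        limits = c.get("resources", {}).get("limits", {})
        if not limits:
            violations.append(f"container {c['name']!r}: no resource limits set")
        if c.get("image", "").endswith(":latest"):
            violations.append(f"container {c['name']!r}: ':latest' tag disallowed")
    return violations

bad_pod = {"spec": {"containers": [
    {"name": "app", "image": "demo:latest"}  # no limits, floating tag
]}}
print(validate_pod(bad_pod))  # two violations
```

Real engines evaluate the same kind of predicate at admission time, so a non‑compliant AI workload is rejected before it ever touches a GPU.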
6. AI Workflow Pillars
| Pillar | Typical Open‑Source Tools |
|---|---|
| Training | PyTorch (used by ≈ 80 % of models on Hugging Face), TensorFlow, JAX |
| Inference | AIBrix (open‑source inference stack built around vLLM), llm‑d, vLLM (distributed inference, KV‑cache) |
| Agents | Custom LLM‑driven agents that perform node selection, memory management, or service orchestration |
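The KV‑cache mentioned in the inference row is, at its core, memoization of per‑prefix attention state so that shared prompt prefixes are computed once. A deliberately tiny Python sketch of the idea (real systems like vLLM cache GPU tensors in paged blocks, not strings):

```python
# Toy sketch of the KV-cache idea behind vLLM / llm-d style inference:
# attention keys/values for a token prefix are computed once and reused
# for every request sharing that prefix. Strings stand in for tensors.

cache = {}
computations = 0

def attend(prefix: tuple) -> str:
    """Pretend 'attention state' for a token prefix, memoized like a KV-cache."""
    global computations
    if prefix not in cache:
        computations += 1          # real systems save GPU FLOPs here
        cache[prefix] = "kv:" + "|".join(prefix)
    return cache[prefix]

for prompt in [("translate", "hello"), ("translate", "world")]:
    for i in range(1, len(prompt) + 1):
        attend(prompt[:i])         # shared prefix ("translate",) hits the cache

print(computations)  # -> 3 computations for 4 prefix lookups
```

Distributed inference frameworks extend this by sharding and migrating the cache across GPUs, which is why cache management dominates their scheduling design.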
7. Platform‑Engineering Blueprint (Presented by Bharath)
A five‑step model for building a resilient AI platform:
- Developer Experience – Unified portal, self‑service tooling (e.g., Backstage).
- Security – CVE mitigation, software‑supply‑chain policies (OPA, Kyverno).
- Foundation – CI/CD pipelines, IaC, API management (Kong), feature‑flag systems.
- Resilience Engineering – Incident management, chaos testing, reliability testing.
- Cost & Observability – Cloud‑cost dashboards (OpenCost), logging & tracing (Elastic, OpenTelemetry).
7.1 Highlighted Open‑Source Tools
| Tool | Function |
|---|---|
| K8sGPT | LLM‑driven debugging of K8s clusters (log analysis + remediation suggestions). |
| kagent | Framework for deploying and orchestrating AI agents inside K8s. |
| KitOps | Packages AI/ML models, code, and datasets as OCI‑compatible artifacts (ModelKits). |
| Kubeflow | End‑to‑end ML workflow engine (training → CI/CD → deployment). |
| KServe | Production‑grade inference serving (model versioning, autoscaling). |
| AIBrix / llm‑d / vLLM | Specialized inference runtimes with GPU‑aware scheduling and KV‑cache. |
| Nebius, Chainguard, SailPoint | Security‑focused solutions for AI pipelines. |
| QAI | Multi‑agent orchestration platform for K8s. |
| GitLab, Harness | AI‑enhanced software delivery pipelines. |
| AI‑SRE, Kubecost AI | Observability and cost‑optimization for AI workloads. |
8. “MCP Server” Concept
- MCP (Model Context Protocol) servers act as plug‑and‑play adapters for each cloud‑native component, exposing a standardized tool/API surface that AI agents can consume, which simplifies integration across the ecosystem.
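The adapter idea can be sketched in a few lines: every server advertises its tools through one uniform listing, and every tool is invoked through one uniform call path. This is a toy illustration only, not the real MCP wire protocol (which is JSON‑RPC based), and the `scale` tool is a stub:

```python
# Toy sketch of the MCP adapter pattern: a server advertises tools in a
# uniform shape and exposes a single call entry point, so an agent can
# integrate any component the same way. Not the real MCP wire protocol.

class ToyMCPServer:
    def __init__(self, name: str):
        self.name = name
        self._tools = {}

    def tool(self, description: str):
        """Decorator registering a function as a callable tool."""
        def register(fn):
            self._tools[fn.__name__] = {"description": description, "fn": fn}
            return fn
        return register

    def list_tools(self) -> list:
        return [{"name": n, "description": t["description"]}
                for n, t in self._tools.items()]

    def call(self, tool: str, **kwargs):
        return self._tools[tool]["fn"](**kwargs)

k8s = ToyMCPServer("kubernetes")

@k8s.tool("Scale a deployment to N replicas (stubbed; no cluster calls).")
def scale(deployment: str, replicas: int) -> str:
    return f"{deployment} scaled to {replicas}"

print(k8s.call("scale", deployment="demo-app", replicas=3))
```

An agent only needs `list_tools` and `call`; whether the server wraps Kubernetes, a GitOps engine, or a cost dashboard is invisible to it, which is the "plug‑and‑play" property the talk emphasized.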
9. k0rdent (Composable AI‑Ready Platform)
9.1 Architectural Overview
- Three layers:
  - Cluster Management – Powered by Cluster API (CAPI) with provider‑specific implementations (AWS, Azure, GCP, OpenStack, bare metal).
  - State Management – Handled by Sveltos (Helm‑based, GitOps‑ready) for services such as ingress, cert‑manager, and GPU operators.
  - Observability – Standard stack (Prometheus, Grafana, OpenTelemetry, OpenCost).
- Composable design: each layer is defined by YAML templates (cluster spec + service spec). Swapping a GPU type (e.g., T4 → H100) or a cloud provider is a single change in the template.
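The one‑change swap can be made concrete: if the whole cluster shape lives in one spec, replacing the GPU worker flavor is a single field edit. A hedged sketch (field names and Azure VM sizes are illustrative, not the exact k0rdent schema):

```python
# Sketch of the composable-template idea: the cluster shape is one spec
# dict, so swapping GPU type or provider is a one-field change. Field
# names and VM sizes are illustrative, not the exact k0rdent schema.

BASE_SPEC = {
    "provider": "azure",
    "controlPlane": {"flavor": "Standard_D4s_v5", "replicas": 3},
    "workers": {"flavor": "Standard_NC4as_T4_v3", "gpu": "tesla-t4", "replicas": 2},
    "services": ["gpu-operator", "istio", "cert-manager", "kserve"],
}

def with_gpu(spec: dict, flavor: str, gpu: str) -> dict:
    """Return a new spec with only the GPU worker pool changed."""
    return {**spec, "workers": {**spec["workers"], "flavor": flavor, "gpu": gpu}}

# T4 -> H100 is one call; everything else (services, control plane) is reused.
h100_spec = with_gpu(BASE_SPEC, "Standard_ND96isr_H100_v5", "h100")
```

The same pattern applies to the `provider` field, which is why the talk framed cloud portability as a templating problem rather than a re‑platforming project.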
9.2 Toolchain Choices
| Layer | Technology |
|---|---|
| Kubernetes Distribution | k0s (lightweight, single‑binary K8s distribution) |
| Cluster Provisioning | Cluster‑API (CAPI) |
| Service Deployment | Helm charts (no heavy IaC tools required) |
| GitOps | Argo CD (integrated with Sveltos) |
| GPU Operator | NVIDIA GPU Operator (or AMD equivalent) |
| Service Mesh | Istio |
| Ingress / Cert‑Management | Kong, cert‑manager |
| Serverless / Model Serving | Knative, KServe |
9.3 Live Demo (Azure GPU Cluster)
- Cluster spec: Azure VM series with Tesla T4 GPUs; control‑plane and worker flavors defined in a single Helm‑based YAML.
- Service spec: deploy the NVIDIA GPU Operator, Istio, cert‑manager, and KServe onto the k0s‑based cluster.
- Application: A translation service (English → Hindi) using an offline AI model. Demonstrated end‑to‑end provisioning in ≈ 15‑20 minutes.
- Observation: The demo highlighted a gap in Indian‑language models, prompting a call for India‑native AI assets.
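If the demo's translation model is served through KServe, clients reach it via KServe's v1 predict protocol (`POST /v1/models/<name>:predict` with an `instances` payload). A sketch of building such a request; the model name and payload field are illustrative placeholders, since the exact instance schema depends on the deployed predictor:

```python
# Sketch of a client request against a KServe v1 predict endpoint, as the
# demo's translation service would expose if served via KServe. The model
# name and the "text" field are illustrative assumptions.
import json

def predict_request(model: str, texts: list) -> tuple:
    """Return (path, JSON body) for a KServe v1 predict call."""
    path = f"/v1/models/{model}:predict"
    body = json.dumps({"instances": [{"text": t} for t in texts]})
    return path, body

path, body = predict_request("en-hi-translator", ["Hello, world"])
print(path)  # -> /v1/models/en-hi-translator:predict
```

The response mirrors the request shape (`{"predictions": [...]}`), so the same client code works unchanged whether the predictor runs on a T4 or an H100, which is the portability point the demo was making.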
10. Future Outlook
- Anticipated growth: ~75 % of cloud‑native engineers will be AI/ML engineers within the next few years.
- Roadmap: tighter integration of MCP servers for autonomous platform behavior, broader support for regional AI models, and continued open‑source convergence across the CNCF landscape.
11. Q&A / Closing Remarks
- The audience asked about GPU driver pain points, time to provision clusters, and availability of Indian language models.
- Bharath reiterated that the composable, Helm‑only approach reduces operational overhead and that the open‑source community is actively building the missing models.
- The session wrapped with a thank‑you from the moderators and an invitation to connect for deeper technical discussions.
Key Takeaways
- Kubernetes is the universal OS for AI workloads; its ecosystem now includes GPU‑aware scheduling and Dynamic Resource Allocation (DRA, GA in Kubernetes 1.34).
- More than half of cloud‑native developers are AI/ML engineers, and a growing fraction already run AI on Kubernetes.
- AI infrastructure challenges are three‑fold: technical complexity, operational efficiency, and multi‑tenancy/security/compliance.
- Kubernetes AI Conformance (six-factor checklist) is essential for portable, repeatable AI workloads across clouds.
- A five‑step platform‑engineering model (developer experience → cost & observability) provides a practical roadmap for building resilient AI platforms.
- Open‑source tooling (K8sGPT, kagent, AIBrix, Kubeflow, KServe, etc.) enables end‑to‑end AI pipelines without vendor lock‑in.
- k0rdent demonstrates that a composable, Helm‑templated stack can provision a full AI‑ready Kubernetes cluster (including GPU operators) in under 20 minutes.
- MCP servers act as standardized adapters, simplifying the integration of disparate cloud‑native components.
- India‑specific AI models are still scarce; the community is urged to develop and open‑source localized models.
- Future workforce shift: expect three‑quarters of cloud‑native engineers to be AI/ML focused, underscoring the strategic importance of open, cloud‑native AI platforms.