NUMA, noisy neighbours, FinOps, avoiding overprovisioning and underutilisation: AI is speedrunning the essentials of virtualisation and cloud computing, but how do you isolate for security and still get enough information to make good decisions?

Mary Branscombe's avatar

Having watched VMs go through the cycle of getting efficient and then getting protected even from admins, I've always been a bit salty about whether containers had enough isolation, and it's been interesting to see projects emerge that suggest people agree with me. Now AI agents are forcing the issue.

But because containers weren't designed for isolation, adding it tends to mess up the metrics you need both for observability and for Kubernetes' own autoscaling, and the way VMs usually solved for this (agents) isn't necessarily the best option. Plus, for a long time Kubernetes just shrugged at GPUs.

GPUs can use the new Dynamic Resource Allocation (DRA) API to tell Kubernetes about themselves so you can do more than, say, allocate a whole GPU; they can also report back metrics. @alex.zenla.io tells me how @edera.dev uses DRA to get all the metrics you want out of containers in a Kubernetes-native way.

Handling AI agents at scale demands isolation that starts up quickly, has a low memory footprint and generally, as @squillace.bsky.social puts it, 'makes Kubernetes go brrrr'. Edera's focus on metrics fits in nicely with the way the ecosystem is turning the crank on security and performance this year.

Also, the CAST AI stats I quote about utilisation for memory and GPU are just astonishing: Edera's customers seem to be doing slightly better than the industry average of *5%* by having about 30% utilisation on their GPUs, but this level of overprovisioning is untenable at today's costs.
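
To put those two numbers in money terms, here's a rough back-of-the-envelope sketch. The 5% and 30% utilisation figures are the ones quoted above; the hourly GPU price is a made-up placeholder (not a real quote from any provider), and the function names are just for illustration.

```python
# Back-of-the-envelope: what a GPU-hour effectively costs once idle time
# is counted. The price below is a hypothetical placeholder, not real data.

HYPOTHETICAL_PRICE_PER_GPU_HOUR = 2.00  # assumed value, for illustration only


def cost_per_useful_hour(hourly_price: float, utilisation: float) -> float:
    """Price paid for each hour of GPU time that actually does work."""
    if not 0 < utilisation <= 1:
        raise ValueError("utilisation must be in the range (0, 1]")
    return hourly_price / utilisation


def idle_spend_fraction(utilisation: float) -> float:
    """Share of the bill that pays for idle capacity."""
    if not 0 <= utilisation <= 1:
        raise ValueError("utilisation must be in the range [0, 1]")
    return 1.0 - utilisation


if __name__ == "__main__":
    # Utilisation figures quoted in the thread: 5% and about 30%.
    for label, utilisation in [("5% utilisation", 0.05), ("30% utilisation", 0.30)]:
        effective = cost_per_useful_hour(HYPOTHETICAL_PRICE_PER_GPU_HOUR, utilisation)
        idle = idle_spend_fraction(utilisation)
        print(f"{label}: {effective:.2f} per useful GPU-hour, {idle:.0%} of spend idle")
```

Whatever the real price is, the multiplier is the point: at 5% utilisation you pay 20 times the list price for each hour of useful work, and at 30% you still pay a bit over three times.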

If you're interested in the wider isolation space, I've written about Kata Containers and KubeVirt (containers in VMs and VMs in containers respectively), but this is a great survey of the isolation landscape (although it misses out interesting things like Hyperlight, a Rust app-level VM option).

  • Kubernetes

  • isolation

  • performance

  • utilisation

  • metrics

  • AI

  • agentic AI

  • Edera

  • GPU