2025

The Business Case for CPU-Based AI Inference

Your finance team doesn't care about tokens per second. They care about predictable costs, compliance risk, and vendor lock-in. Here's how CPU inference stacks up.

The other week I published a technical deep-dive on running LLM inference with AMD EPYC processors and ZenDNN. The benchmarks showed that a $0.79/hour VM can push 40-125 tokens per second depending on model size, which is genuinely usable performance for a surprising range of workloads.

But benchmarks don't answer the question that actually matters: Should you do this?
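To put those benchmark figures in terms a finance team can use, here is a rough sketch that converts the hourly VM price and throughput quoted above into a cost per million tokens. It assumes the VM is kept fully busy by a single generation stream at the quoted rate; the function name `cost_per_million_tokens` is purely illustrative, and real costs will depend on utilization, batching, and idle time.

```python
def cost_per_million_tokens(hourly_price_usd: float, tokens_per_second: float) -> float:
    """Cost in USD to generate one million tokens at a sustained throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return hourly_price_usd / tokens_per_hour * 1_000_000


# Figures quoted in the excerpt: a $0.79/hour VM at 40-125 tokens per second.
HOURLY_PRICE_USD = 0.79

for tps in (40, 125):
    cost = cost_per_million_tokens(HOURLY_PRICE_USD, tps)
    print(f"{tps} tok/s -> ${cost:.2f} per million tokens")
```

Under those assumptions the range works out to roughly $1.76 to $5.49 per million tokens. Any idle time raises the effective figure, which is why utilization matters as much as raw throughput.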

Running AI Inference on AMD EPYC Without a GPU in Sight

Spoiler: You don't need a $40,000 GPU to run LLM inference. Sometimes 24 CPU cores and the right software stack will do just fine.

The AI infrastructure conversation has become almost synonymous with GPU procurement battles, NVIDIA allocation queues, and eye-watering hardware costs. But here's a reality that doesn't get enough attention: for many inference workloads, especially during development, testing, and moderate-scale production, modern CPUs with optimized software can deliver surprisingly capable performance at a fraction of the cost.

Solving GPU Passthrough Memory Addressing in OpenStack

Delivering Accelerator-Enabled Developer Cloud Functionality on Rackspace OpenStack Flex.

When AMD launched the AMD Developer Cloud, we took notice. Here was a streamlined platform giving developers instant access to high-performance MI300X GPUs, complete with pre-configured containers, Jupyter environments, and pay-as-you-go pricing. The offering resonated with the AI/ML community because it eliminated friction: spin up a GPU instance, start training, destroy it when done.

Getting Started with GPU Compute on Rackspace OpenStack Flex

Your instance is up, your GPU is attached, and now you're staring at a blank terminal wondering what's next. Time to get Clouding.

Rackspace OpenStack Flex delivers GPU-enabled compute instances with real hardware acceleration, not the "GPU-adjacent" experience you might get elsewhere. Whether you're running inference workloads, training models, or just need raw parallel compute power, getting your instance properly configured is the difference between frustration and actual productivity.

Enabling Hybrid Cloud with RackConnect Global

RackConnect® Global (RCG) is a software-defined, multi-cloud interconnection platform that links Rackspace Technology customers with other Rackspace data centers, third-party data centers, and third-party clouds through direct, private, low-latency, virtual connections. All traffic flowing between every endpoint avoids the public Internet and instead rides the Rackspace private backbone.

Within OpenStack Flex, Rackspace's next-generation multi-tenant public cloud, RackConnect Global allows customers to connect their virtual machine instances to third-party data centers or to dedicated (bare-metal) server environments within Rackspace data centers. RackConnect Global can enable customers to leverage the elasticity of public cloud while utilizing dedicated hardware gateways and firewalls, or simply add routes to remote resources over the RCG link.

A new paradigm for cloud-native infrastructure

Spot

What if spinning up a fully isolated Kubernetes cluster took seconds instead of hours, and cost a fraction of traditional managed Kubernetes? What if that cluster could run worker nodes anywhere in the world, even across clouds, while still being centrally managed? What if the Control Plane itself could be treated as a workload, scaling elastically and sharing infrastructure with hundreds of other clusters? What if all of this functionality was available now?

Rackspace launched "Spot", a Kubernetes offering with a clear mission: to provide fully managed Kubernetes clusters at compelling cost-efficiency, powered by dynamic spot/auction compute, and delivered as a turnkey experience.

In doing so, a fundamental question has to be confronted: how do you build a multi-tenant service that can spin up hundreds, or even thousands, of isolated Kubernetes clusters, each with its own Control Plane, without the overhead and complexity that traditional architectures entail?

What was needed was far more than simple automation of Kubernetes cluster creation: Rackspace needed an architecture built for scale, elasticity, and efficient multi-tenant orchestration. That's where Kamaji came in.

Streamlining Node Access with K9s and kubectl-node-shell

Debugging Kubernetes clusters often requires direct access to nodes. There are several ways to access your nodes: SSH, iLO/DRAC, kubectl debug, and so on. I love shortcuts, aliases, functions, and scripts that help me quickly gather data and speed up my troubleshooting. One tool I have found is K9s, a powerful terminal UI for Kubernetes, and this post shows how to enhance it with kubectl-node-shell for seamless node access. This quick blog will hopefully give you another tool you can use with your Kubernetes clusters.

Octavia OVN Overview

Octavia can use OVN (Open Virtual Network) as a provider driver to deploy layer 4 load balancing. While Octavia OVN load balancers may not be as feature-rich as Amphora-based load balancers, they provide a resource-efficient, fast-deploying, and highly available load balancing solution. View the Amphora and OVN load balancing comparison matrix for more information about which features are supported.

For additional information regarding Octavia OVN load balancers, view the Octavia OVN Docs.

CAPI powered TALOS clusters on Rackspace OpenStack Flex

We detail how to build immutable, secure, and minimal Kubernetes clusters by combining Cluster API (CAPI) with TALOS OS. This powerful stack allows you to leverage the cloud-agnostic management capabilities of CAPI while benefiting from TALOS's minimal attack surface. Deploying on Rackspace OpenStack Flex grants you complete control over your underlying infrastructure, maximizing resource efficiency and providing a production-ready cloud-native environment. This integration simplifies day-2 operations and delivers enterprise-grade security for your private cloud.