
Bringing the AMD Radeon AI PRO R9700 Online in OpenStack Flex

I'll be honest. When the AMD Radeon AI PRO R9700 first showed up on my radar, I wasn't sure what to make of it. It's not a traditional datacenter card, and it's not a gaming card either. The R9700 is a 32 GB professional GPU that won't break the bank, sitting in a product category that didn't really exist eighteen months ago.

This week our team brought a pair of R9700 GPUs online in Rackspace OpenStack Flex. Like any good story, there was a bit of drama: servers, placement, shipping times, cable oddities, a chassis crisis, and more. We had the makings of a full-length K-drama with all the twists and turns. Once we got past the drama and the parts were installed and powered on, the deployment itself took about ten minutes, a testament to the power of Genestack's Kubernetes-native architecture and OpenStack's hardware-agnostic design.

Getting Started with AMD GPU Compute on Rackspace OpenStack Flex

Your instance is up, your AMD GPU is attached, and you're staring at a terminal with no nvidia-smi to lean on. Welcome to the other side.

If you've read our NVIDIA getting started guide, you know the drill: provision an instance, install drivers, verify the hardware, start computing. The AMD path follows the same logic but with different tooling. Instead of CUDA, you're working with ROCm. Instead of nvidia-smi, you've got rocm-smi. Instead of a driver ecosystem that's had two decades of cloud deployment polish, you've got one that's been moving fast and getting dramatically better, but still has some rough edges worth knowing about.
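A first sanity check on a fresh instance might look something like the sketch below. This is a hedged example: exact ROCm package names and install steps vary by distro and release, so only the verification commands are shown, and the script degrades gracefully when ROCm isn't installed yet.

```shell
# Hedged sketch: confirm the GPU is visible before worrying about drivers.
# lspci works on a bare instance; rocm-smi only exists after ROCm is installed.
lspci | grep -i 'VGA\|Display' || echo "no GPU visible on the PCI bus yet"

# Once ROCm is in place, rocm-smi fills the role nvidia-smi plays on the
# NVIDIA side: product name, VRAM usage, temperatures, and so on.
if command -v rocm-smi >/dev/null 2>&1; then
    rocm-smi --showproductname --showmeminfo vram
else
    echo "rocm-smi not installed yet -- install the ROCm stack first"
fi
```

On a correctly provisioned R9700 instance, the first command should show the AMD device on the PCI bus even before any driver work, which makes it a useful early checkpoint when debugging passthrough.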

So You Need Enterprise GPUs: A No-BS Guide to H100, H200, and B200

Let's be honest—NVIDIA's naming conventions are designed to confuse procurement teams. H100 SXM5, H100 NVL, H200 SXM, B200... it sounds like someone spilled alphabet soup on a product roadmap.

I've spent way too many hours explaining these differences to engineering teams, so here's everything you actually need to know before signing that hardware purchase order.

Why Your "Senior ML Engineer" Can't Deploy a 70B Model

TL;DR: Small models (≤30B) and large models (100B+) require fundamentally different infrastructure skills. Small models are an inference optimization problem—make one GPU go fast. Large models are a distributed systems problem—coordinate a cluster, manage memory as the primary constraint, and plan for multi-minute failure recovery. The threshold is around 70B parameters. Most ML engineers are trained for the first problem, not the second.

Here's something companies learn after burning through six figures in cloud credits: the skills for small models and large models are completely different. And most of your existing infra people can't do both.

Once you cross ~70B parameters, your job description flips. You're not doing inference optimization anymore. You're doing distributed resource management. Also known as: the nightmare.
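The memory arithmetic makes the threshold concrete. A minimal sketch, using only the rule of thumb that fp16/bf16 weights cost two bytes per parameter; real serving adds KV cache, activations, and framework overhead on top, so these numbers are floors, not budgets:

```python
def weight_memory_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Rough VRAM needed just to hold the weights (fp16/bf16 = 2 bytes/param)."""
    return params_billions * 1e9 * bytes_per_param / 1e9

# A 7B model in fp16 fits comfortably on one 24-32 GB card:
print(weight_memory_gb(7))    # 14.0 GB
# A 70B model in fp16 needs ~140 GB for weights alone -- more than any
# single GPU holds, so it must be sharded across several devices:
print(weight_memory_gb(70))   # 140.0 GB
```

That is the flip in one calculation: below the threshold the question is "how fast can this one GPU go," and above it the question is "how do I split 140+ GB across a cluster and keep the shards coordinated."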