In OpenStack environments using the Cinder LVM backend with tgt, a volume deletion can fail even after the instance side of the workflow appears complete. One common cause is that the iSCSI session on the block storage node is still active, preventing tgt from removing the target.
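Before tearing down the target, it helps to confirm which initiator session is still pinned to the volume. A minimal sketch, assuming output in the shape printed by `iscsiadm -m session` (the function name and sample IQNs are illustrative, not from the original post):

```python
import re

def find_volume_sessions(session_output: str, volume_id: str):
    """Return (session_id, target_iqn) pairs whose IQN mentions volume_id.

    Expects lines shaped like `iscsiadm -m session` output, e.g.
    tcp: [3] 10.0.0.5:3260,1 iqn.2010-10.org.openstack:volume-<id> (non-flash)
    """
    sessions = []
    pattern = re.compile(r"\[(\d+)\]\s+\S+\s+(\S+)")
    for line in session_output.splitlines():
        m = pattern.search(line)
        if m and volume_id in m.group(2):
            sessions.append((int(m.group(1)), m.group(2)))
    return sessions

# Hypothetical session listing: two OpenStack volume targets logged in.
sample = (
    "tcp: [1] 10.0.0.5:3260,1 "
    "iqn.2010-10.org.openstack:volume-11111111-2222-3333-4444-555555555555 (non-flash)\n"
    "tcp: [2] 10.0.0.6:3260,1 "
    "iqn.2010-10.org.openstack:volume-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee (non-flash)\n"
)
print(find_volume_sessions(sample, "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"))
```

Once the stale session is identified, it can be logged out from the initiator side, after which tgt is free to remove the target and the Cinder delete can complete.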
When a pod fails to join the network in a Kube-OVN-backed cluster, the first symptom often looks like a generic CNI problem. In one Genestack-operated IAD sandbox case, the actual cause was a duplicate IP allocation: a new pod was assigned an address that was still recorded against an older, non-running pod in the same subnet.
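Spotting that kind of collision comes down to cross-referencing Kube-OVN's IP records against pod status. A small sketch of the check, assuming you have already collected (pod, address, phase) tuples from the cluster (the record format and names here are illustrative):

```python
from collections import defaultdict

def find_duplicate_ips(records):
    """Group pod IP records by address and flag addresses claimed more than once.

    `records` is a list of (pod_name, ip_address, pod_phase) tuples, e.g. as
    gathered from Kube-OVN's IP custom resources plus pod status.
    """
    by_ip = defaultdict(list)
    for pod, ip, phase in records:
        by_ip[ip].append((pod, phase))
    return {ip: holders for ip, holders in by_ip.items() if len(holders) > 1}

# Hypothetical data: a stale allocation from a failed pod colliding with a new one.
records = [
    ("web-1", "10.16.0.7", "Running"),
    ("web-2-old", "10.16.0.9", "Failed"),   # stale allocation still on record
    ("web-2-new", "10.16.0.9", "Pending"),  # new pod handed the same address
]
dups = find_duplicate_ips(records)
print(dups)  # {'10.16.0.9': [('web-2-old', 'Failed'), ('web-2-new', 'Pending')]}
```

The fix in the sandbox case was to release the stale record so the subnet's allocator stopped treating the address as both free and in use.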
In containerized OpenStack compute environments, a hard reboot or instance start can fail even when the hypervisor node itself looks healthy. One failure mode is a permissions mismatch on /dev/kvm, where the device inside the libvirt pod is mapped with ownership or permissions that do not line up with the host device.
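Diagnosing this is a matter of comparing the host's view of /dev/kvm with the view inside the libvirt pod. A minimal sketch of that comparison, with the uid/gid/mode values shown purely as illustrative assumptions:

```python
def kvm_mismatch(host: dict, pod: dict):
    """Compare host and in-pod views of /dev/kvm and report differing fields.

    Each dict carries 'uid', 'gid', and 'mode' as os.stat() would report them,
    e.g. {'uid': 0, 'gid': 36, 'mode': 0o660} for root:kvm with rw-rw----.
    """
    diffs = {}
    for field in ("uid", "gid", "mode"):
        if host[field] != pod[field]:
            diffs[field] = (host[field], pod[field])
    return diffs

# Hypothetical values: the pod sees the device with the wrong group and mode.
host_view = {"uid": 0, "gid": 36, "mode": 0o660}  # root:kvm on the host
pod_view = {"uid": 0, "gid": 0, "mode": 0o600}    # mapped without the kvm group
print(kvm_mismatch(host_view, pod_view))
```

Any non-empty diff means qemu inside the pod may be denied access to the device even though the host hypervisor looks perfectly healthy.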
When Neutron and OVN drift out of sync, one of the standard recovery tools is neutron-ovn-db-sync-util. In some environments, though, the sync itself can fail before it repairs anything, especially if Neutron still contains stale objects that reference routers that no longer exist.
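Before rerunning the sync, it helps to enumerate which Neutron ports still point at routers that have been deleted. A sketch of that check, assuming you have dumped router IDs and port (id, device_owner, device_id) tuples from Neutron (the IDs below are invented for illustration):

```python
def stale_router_refs(neutron_routers, neutron_ports):
    """Return ports whose device_id names a router Neutron no longer has.

    `neutron_routers` is a set of known router IDs; `neutron_ports` is a list
    of (port_id, device_owner, device_id) tuples from a Neutron port dump.
    """
    stale = []
    for port_id, owner, device_id in neutron_ports:
        if owner.startswith("network:router") and device_id not in neutron_routers:
            stale.append((port_id, device_id))
    return stale

routers = {"r-100"}
ports = [
    ("p-1", "network:router_interface", "r-100"),  # healthy reference
    ("p-2", "network:router_gateway", "r-999"),    # router deleted out from under it
    ("p-3", "compute:nova", "i-42"),               # not a router port, ignored
]
print(stale_router_refs(routers, ports))  # [('p-2', 'r-999')]
```

Cleaning up the objects this surfaces gives the sync utility a consistent starting point instead of a database it chokes on.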
In a large-scale OpenStack environment, especially one leveraging Genestack, networking is the lifeblood of the platform. One of the more frustrating issues operators can face is intermittent Floating IP connectivity: traffic works for a while, then drops unexpectedly, or only succeeds from certain source networks.
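One pattern worth ruling out early is the same floating IP appearing in NAT rules on more than one router, so connectivity depends on which path traffic happens to take. A sketch of that check, assuming per-router NAT entries have already been collected (the router names and addresses are illustrative):

```python
from collections import defaultdict

def duplicate_fip_nats(nat_entries):
    """Find external addresses claimed by NAT rules on more than one router.

    `nat_entries` is a list of (router, external_ip, logical_ip) tuples, the
    kind of data a per-router OVN NAT listing provides.
    """
    owners = defaultdict(set)
    for router, external_ip, _logical_ip in nat_entries:
        owners[external_ip].add(router)
    return {fip: sorted(rs) for fip, rs in owners.items() if len(rs) > 1}

entries = [
    ("router-a", "203.0.113.10", "192.168.1.5"),
    ("router-b", "203.0.113.10", "192.168.2.8"),  # same FIP claimed twice
    ("router-a", "203.0.113.11", "192.168.1.6"),
]
print(duplicate_fip_nats(entries))  # {'203.0.113.10': ['router-a', 'router-b']}
```

A duplicate claim like this produces exactly the "works sometimes, from some places" symptom described above, since answering routers effectively race for the address.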
I'll be honest. When the AMD Radeon AI PRO R9700 first showed up on my radar, I wasn't sure what to make of it. It's not a traditional datacenter card, and it's not a gaming card either. The R9700 is a 32 GB professional GPU that won't break the bank, and it sits in a product category that didn't really exist eighteen months ago.
This week our team brought a pair of R9700 GPUs online in Rackspace OpenStack Flex. Like any good story, there was a bit of drama: servers, placement, shipping times, cable oddities, a chassis crisis, and more. We had the makings of a feature-length K-drama with all the twists and turns. Once we got past the drama and the parts were installed and powered on, the entire deployment took about ten minutes, a testament to Genestack's Kubernetes-native architecture and OpenStack's hardware-agnostic design.
Your instance is up, your AMD GPU is attached, and you're staring at a terminal with no nvidia-smi to lean on. Welcome to the other side.
If you've read our NVIDIA getting started guide, you know the drill: provision an instance, install drivers, verify the hardware, start computing. The AMD path follows the same logic but with different tooling. Instead of CUDA, you're working with ROCm. Instead of nvidia-smi, you've got rocm-smi. Instead of a driver ecosystem that's had two decades of cloud deployment polish, you've got one that's been moving fast and getting dramatically better, but still has some rough edges worth knowing about.
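A quick sanity check on the AMD side is confirming the card's VRAM from rocm-smi's memory report. A minimal sketch that parses `rocm-smi --showmeminfo vram` style text; the sample lines mimic the shape rocm-smi prints, but treat the exact format as an illustrative assumption rather than a stable interface:

```python
import re

def parse_vram(smi_output: str):
    """Pull per-GPU VRAM totals (in GiB) out of rocm-smi memory-info text."""
    totals = {}
    pattern = re.compile(r"GPU\[(\d+)\].*VRAM Total Memory \(B\):\s*(\d+)")
    for line in smi_output.splitlines():
        m = pattern.search(line)
        if m:
            totals[int(m.group(1))] = int(m.group(2)) / 2**30  # bytes -> GiB
    return totals

# Illustrative output for a single ~32 GB card.
sample = (
    "GPU[0] : VRAM Total Memory (B): 34342961152\n"
    "GPU[0] : VRAM Total Used Memory (B): 283648\n"
)
print(parse_vram(sample))  # ~32 GiB on GPU 0
```

If the total comes back far below the card's spec, the driver or device passthrough is usually the first place to look.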
Your finance team doesn't care about tokens per second. They care about predictable costs, compliance risk, and vendor lock-in. Here's how CPU inference stacks up.
Spoiler: You don't need a $40,000 GPU to run LLM inference. Sometimes 24 CPU cores and the right software stack will do just fine.
The AI infrastructure conversation has become almost synonymous with GPU procurement battles, NVIDIA allocation queues, and eye-watering hardware costs. But here's a reality that doesn't get enough attention: for many inference workloads, especially during development, testing, and moderate-scale production, modern CPUs with optimized software can deliver surprisingly capable performance at a fraction of the cost.
Delivering accelerator-enabled Developer Cloud functionality on Rackspace OpenStack Flex.
When AMD launched the AMD Developer Cloud, we took notice. Here was a streamlined platform giving developers instant access to high-performance MI300X GPUs, complete with pre-configured containers, Jupyter environments, and pay-as-you-go pricing. The offering resonated with the AI/ML community because it eliminated friction: spin up a GPU instance, start training, destroy it when done.