As large language models grow in capability, they also grow in complexity, demanding GPU memory and compute beyond what most single systems can provide. For infrastructure and operations teams, this creates new challenges around deployment, scheduling, cost management, and reliability.
In this session, we’ll introduce llm-d, an open source, Kubernetes-native framework for distributed inference. You’ll learn how Red Hat is leading efforts across the community to shape llm-d into a scalable, operator-friendly platform for production generative AI.
We’ll demonstrate how llm-d integrates into OpenShift AI, supports multi-GPU workloads, and provides: