As large language models grow in capability, they also grow in complexity, demanding GPU memory and compute beyond what most single systems can provide. For infrastructure and operations teams, this creates new challenges around deployment, scheduling, cost management, and reliability.
In this session, we’ll introduce llm-d, an open source, Kubernetes-native framework for distributed inference. You’ll learn how Red Hat is leading efforts across the community to shape llm-d into a scalable, operator-friendly platform for production generative AI.
We’ll demonstrate how llm-d integrates into OpenShift AI, supports multi-GPU workloads, and provides: