ITRT (IT Research Trends)

Towards Using Partitioned GPU Virtual Functions for Mixture of Experts

Research field: Cryptography

Paper keywords: #virtualization #expert #experts #scalability #larger

Conference: European Conference on Parallel Processing

Abstract

Recent advances in large language models (LLMs) have shown that smaller, fine-tuned models perform comparably to or better than larger general-purpose models on domain-specific knowledge, even when quantized. However, these models face two issues in production systems: under-utilized memory and potential data security risks. We propose a new method of mixture-of-experts (MoE) inference that combines GPU partitioning with single-root I/O virtualization (SR-IOV), enabling better GPU memory utilization and scalability while keeping model weights secure. LLMs today come in a variety of sizes and quantization levels, each with its own memory requirement. Using SR-IOV, we can partition the GPU into one or more virtual functions (VFs), adjusting the allocated memory and compute to fit the needs of these LLMs. With the AMD Instinct™ MI300X [1], for example, one VF can have 24 to 192 GB of high-bandwidth memory (HBM), scaling up to 1.5 TB per node. These SR-IOV-enabled virtual machines also address the load imbalance inherent in MoE models, eliminating the need for an auxiliary load-balancing loss, while maintaining a fast interconnect between all components and providing low latency during inference. Additionally, the isolation built into SR-IOV provides native data security, since virtual functions are isolated from each other; this opens up new use cases in which different vendors each provide their own expert to the mixture.
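The memory figures in the abstract can be checked with a little arithmetic. The sketch below is illustrative only and is not the authors' method: it assumes equal-sized partitions of 1, 2, 4, or 8 VFs per GPU (the abstract gives only the 24 to 192 GB range), assumes 8 GPUs per node (which is what makes 192 GB per GPU add up to roughly 1.5 TB), and estimates an expert's footprint from weights alone. The `Expert` class and both helper functions are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Optional

GPU_HBM_GB = 192          # per-GPU HBM, the upper bound quoted in the abstract
GPUS_PER_NODE = 8         # assumption: 8 x 192 GB = 1536 GB, i.e. about 1.5 TB
VF_COUNTS = (1, 2, 4, 8)  # assumption: equal-sized partitions; 192 / 8 = 24 GB


@dataclass(frozen=True)
class Expert:
    """Hypothetical description of one expert model in the mixture."""
    name: str
    params_billion: float   # parameter count, in billions
    bits_per_weight: int    # quantization level, e.g. 4, 8, or 16


def weight_gb(expert: Expert) -> float:
    """Approximate weight footprint in GB (weights only, no KV cache or activations)."""
    return expert.params_billion * expert.bits_per_weight / 8


def smallest_vf_gb(required_gb: float) -> Optional[int]:
    """Return the smallest assumed VF size (in GB) that fits, or None if none does."""
    for count in sorted(VF_COUNTS, reverse=True):  # most partitions = smallest VF first
        size = GPU_HBM_GB // count
        if size >= required_gb:
            return size
    return None


if __name__ == "__main__":
    experts = [
        Expert("small-4bit", 7, 4),
        Expert("medium-16bit", 34, 16),
        Expert("large-16bit", 70, 16),
    ]
    for e in experts:
        print(e.name, weight_gb(e), smallest_vf_gb(weight_gb(e)))
    print("node HBM (GB):", GPU_HBM_GB * GPUS_PER_NODE)
```

Under these assumptions, a small quantized expert fits in the smallest 24 GB VF while a 70-billion-parameter 16-bit expert needs a whole GPU, which is the kind of size matching the abstract describes. A real deployment would also need headroom for the KV cache and activations.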

📄 Paper information

Publication year	2025
Citations	0
Country of publication	United States
Site	Springer

Authors: Vignesh Chander, Tony Yi, Jerry Jiang, Vamsi Alla
