Minimizing transformer inference overhead using controlling element on Shenwei AI accelerator


Research field: Cryptography



Venue: Frontiers of Information Technology & Electronic Engineering


Abstract

Transformer models have become a cornerstone of many natural language processing (NLP) tasks. However, the substantial computational overhead of inference remains a significant challenge and limits their deployment in practical applications. In this study, we address this challenge by minimizing the inference overhead of transformer models using the controlling element on artificial intelligence (AI) accelerators. Our work makes four key contributions. First, we conduct a comprehensive analysis of the overhead composition within the transformer inference process and identify the primary bottlenecks. Second, we leverage the management processing element (MPE) of the Shenwei AI (SWAI) accelerator to implement a three-tier scheduling framework that reduces the number of host-device launches to approximately 1/10 000 of that in the original PyTorch-GPU setup. Third, we introduce a zero-copy memory management technique based on segment-page fusion, which reduces memory access latency and improves overall inference efficiency. Finally, we develop a fast model loading method that eliminates redundant computations during model verification and initialization, reducing the total loading time for large models from 22 128.31 ms to 1041.72 ms. Together, these contributions enable more efficient and faster transformer inference on AI accelerators.
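To put the reported loading times in perspective, the short sketch below computes the speedup factor and the percentage reduction they imply. Only the two timings come from the abstract. The function name `loading_speedup` and the constant names are illustrative assumptions, not part of the authors' work.

```python
def loading_speedup(before_ms: float, after_ms: float) -> tuple[float, float]:
    """Return (speedup factor, percentage reduction) for two timings in ms."""
    if before_ms <= 0 or after_ms <= 0:
        raise ValueError("timings must be positive")
    speedup = before_ms / after_ms
    reduction_pct = (1.0 - after_ms / before_ms) * 100.0
    return speedup, reduction_pct


# Loading times reported in the abstract, in milliseconds.
# The names below are hypothetical; only the values are from the source.
BEFORE_MS = 22128.31
AFTER_MS = 1041.72

speedup, reduction_pct = loading_speedup(BEFORE_MS, AFTER_MS)
print(f"speedup: {speedup:.2f}x, reduction: {reduction_pct:.2f}%")
```

With the reported figures, this works out to a speedup of roughly 21x, or a reduction in loading time of about 95%.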


Author Profile
Yulong Zhao (赵玉龙)

State Key Laboratory of Mathematical Engineering and Advanced Computing, Wuxi 214000, China

Andorra

Author Profile
Chunzhi Wu (吴春志)

State Key Laboratory of Mathematical Engineering and Advanced Computing, Wuxi 214000, China

Andorra

Author Profile
Yizhuo Wang (王一卓)

School of Non-Commissioned Officer, Space Engineering University, Beijing 100004, China

China

📄 Paper information

Publication year: 2025
Citations: 0
Publication country: Andorra, China
Site: Springer
