Heterogeneous Syslog Analysis: There Is Hope


Research field: Analysis



Conference: SC-W '23: Proceedings of the SC '23 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis


Abstract

Identifying system hardware failures and anomalies is a unique challenge in heterogeneous testbed clusters because of variation in the ways that the system log reports errors and warnings. We present a novel approach for the real-time classification of syslog messages generated by a heterogeneous testbed cluster to proactively identify potential hardware issues and security events. By integrating machine learning models with high-performance computing systems, our system facilitates continuous system health monitoring. The paper introduces a taxonomy for classifying system issues into actionable categories of problems, while filtering out groups of messages that the system administrators would consider unimportant "noise". Finally, we experiment with using large language models as a message classifier, and share our results and experience with doing so. Results demonstrate promising performance, and more explainable results compared to currently available techniques, but the computational costs may offset the benefits.


Authors

Andres Quan, Los Alamos National Laboratory, United States
Leah Howell, Los Alamos National Laboratory, United States
Hugh N Greenberg, Los Alamos National Laboratory, United States

Paper information

Publication year: 2023
Citations: 1
Country of publication: United States
Site: ACM
