ITRT (IT Research Trends)

An Approach for Efficient Processing of Machine Operational Data

Research area: Databases

Keywords: #computing #ibm #hundreds #supercomputer #supercomputers

Conference: International Conference on Database and Expert Systems Applications (DEXA)

Abstract

Supercomputers come in a variety of sizes and architectures, with thousands of interconnected nodes. Most organizations must report metrics to their funding sources to prove that these machines are being utilized and are meeting availability requirements. While tracking the state of an individual server is trivial, measuring the uptime of a supercomputer with several thousand nodes spanning tens to hundreds of cabinets and rows, with one or more mounted file systems, is a complex task. Additionally, supercomputers differ in architecture and in System Logic, which includes the unique characteristics of the machine itself such as networking topology, size, partitions, hardware layout, physical configuration, and component hierarchy. These differences complicate the computation of standardized metrics such as Mean Time to Interrupt (MTTI), Mean Time to Failure (MTTF), availability, and utilization. At the Argonne Leadership Computing Facility (ALCF), we developed a tool that standardizes the analysis of these machines so that these metrics can be computed accurately and efficiently. We call this tool the Operational Data Processing System (ODPS) and use it to process the data generated by Theta, a 4,392-node Cray XC40. In addition to the XC40, ODPS also works with Mira, a 49,152-node IBM BG/Q system housed at ALCF. This paper explores how ODPS processes the data from Theta and Mira, including its storage design decisions and its architecture-independent approach to metric calculations. We quantitatively evaluate our approach by comparing it to alternative methods for storing and processing supercomputer machine state in the database.
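To make the metrics named above more concrete, here is a minimal Python sketch that computes availability, MTTI (Mean Time to Interrupt), and MTTF (Mean Time to Failure) from a list of outage intervals. It is not ODPS's implementation, which the abstract does not describe. The `Outage` record, the `clipped_hours` and `machine_metrics` functions, and the formulas are all hypothetical assumptions chosen for illustration. Availability is taken as uptime over the period excluding scheduled maintenance. MTTI divides uptime by the number of system-wide outages plus one, and MTTF does the same for unscheduled outages only. A node-hour availability figure is also computed so that partial outages, which take down only some nodes, are still counted. Outages are assumed not to overlap.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Outage:
    """Hypothetical outage record; times are hours from the start of the period."""
    start: float
    end: float
    scheduled: bool  # planned maintenance (True) vs. unplanned interrupt (False)
    nodes: int       # number of nodes taken down by this outage


def clipped_hours(o: Outage, period: float) -> float:
    """Hours of the outage that fall inside the reporting window [0, period]."""
    return max(0.0, min(o.end, period) - max(o.start, 0.0))


def machine_metrics(period: float, total_nodes: int,
                    outages: list[Outage]) -> dict[str, float]:
    """Assumed metric definitions for illustration only; outages must not overlap."""
    inside = [o for o in outages if clipped_hours(o, period) > 0]
    # System-wide view: only outages that take down every node are counted.
    full = [o for o in inside if o.nodes >= total_nodes]
    sched = sum(clipped_hours(o, period) for o in full if o.scheduled)
    unsched = sum(clipped_hours(o, period) for o in full if not o.scheduled)
    n_unsched = sum(1 for o in full if not o.scheduled)
    up = period - sched - unsched
    # Node-hour view: partial outages count in proportion to the nodes lost.
    lost = sum(clipped_hours(o, period) * min(o.nodes, total_nodes) for o in inside)
    return {
        "scheduled_availability": up / (period - sched),
        "mtti": up / (len(full) + 1),
        "mttf": (period - unsched) / (n_unsched + 1),
        "node_hour_availability": 1.0 - lost / (total_nodes * period),
    }


if __name__ == "__main__":
    # Illustrative 30-day period on a 4,392-node machine (Theta's node count).
    outages = [
        Outage(100, 108, scheduled=True, nodes=4392),   # full-system maintenance
        Outage(300, 302, scheduled=False, nodes=4392),  # full-system interrupt
        Outage(500, 504, scheduled=False, nodes=128),   # partial outage
    ]
    for name, value in machine_metrics(720.0, 4392, outages).items():
        print(f"{name}: {value:.4f}")
```

Keeping partial outages separate from system-wide ones reflects the abstract's point: a single uptime number for a machine depends on its topology and partitions. The definitions used in real facility reporting may differ from these assumed ones.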

📄 Paper information

Publication year	2023
Citations	0
Country of publication	Israel
Site	Springer

An Approach for Efficient Processing of Machine Operational Data

Ben Lenard, Eric Pershey, Zachary Nault, Alexander Rasin