The Status-Quo in nested data processing for high-energy physics


Research field: Databases



Venue: The VLDB Journal


Abstract

Nested data is valuable and ubiquitous. It is being generated in ever-increasing volumes across industrial and research environments and frequently contains valuable information that is extracted through analytical workloads. Despite its popularity and value, there is no clear-cut understanding of the status quo in analytical workloads for nested data in high-energy physics (HEP). In this paper, we seek to define the landscape of nested data processing in HEP by evaluating 10 systems and their query languages on the IRIS HEP ADL benchmark, a popular and representative HEP benchmark. We attempt to understand how well these systems perform not only from a query latency and scalability point of view but also from a query language usability perspective. The result of our evaluation paints an interesting and rather complex picture of existing solutions. Many of the evaluated systems are between one and two orders of magnitude slower than the domain-specific system used in HEP today, while a few of the commodity systems provide on-par performance at greater cost. Moreover, the evaluated query languages and dialects vary greatly in how naturally and concisely they can express nested query patterns. These observations suggest that while commodity data management systems and their query languages are viable tools for nested data processing, significant work remains to make them competitive with domain-specific solutions like those used by the HEP community.


Author Profile
Dan Graur

Department of Computer Science, ETH Zürich, Zürich, Switzerland

Switzerland
Author Profile
Ingo Müller

Department of Computer Science, ETH Zürich, Zürich, Switzerland

Switzerland
Author Profile
Mason Proffitt

Department of Physics, University of Washington, Seattle, USA

United States

📄 Paper Information

Publication year: 2025
Citations: 0
Publication countries: Switzerland, United States
Site: Springer

Related papers (53)