Research field: Databases
Venue: The VLDB Journal
Nested data is valuable and ubiquitous. It is being generated in ever-increasing volumes across industrial and research environments and frequently contains valuable information that is extracted through analytical workloads. Despite its popularity and value, there is no clear-cut understanding of the status quo in analytical workloads for nested data in high-energy physics (HEP). In this paper, we seek to define the landscape of nested data processing in HEP by evaluating 10 systems and their query languages on the IRIS HEP ADL benchmark, a popular and representative HEP benchmark. We attempt not only to understand how well these systems perform from a query latency and scalability point of view but also from a query language usability perspective. The result of our evaluation paints an interesting and rather complex picture of existing solutions. Many of the evaluated systems are between one and two orders of magnitude slower than the domain-specific system used in HEP today, while a few of the commodity systems provide on-par performance at greater costs. Moreover, the evaluated query languages and dialects vary greatly in how naturally and concisely they can express nested query patterns. These observations suggest that while commodity data management systems and their query languages are viable tools for nested data processing, significant work remains to make them competitive with domain-specific solutions like those used by the HEP community.
| Publication year | 2025 |
|---|---|
| Citations | 0 |
| Country of publication | Ethiopia, United States |
| Site | Springer |