ITRT(IT Research Trends)

RDF Data Partitioning for Efficient SPARQL Query Processing with Spark SQL

Research field: Databases

Paper keywords: #large #fast #big #exploding #triples

Conference: International Conference on Information Integration and Web Intelligence

Abstract

In the age of big data, the volume of RDF data has been exploding due to the growing demand for open data, including Linked Open Data (LOD), semantic data processing, and knowledge graphs. Large-scale RDF data may contain millions to hundreds of millions of triples, each comprising a subject, a predicate, and an object, which makes fast query processing on such datasets challenging. To address this issue, distributed parallel processing systems like Apache Spark have been successfully used. One of the key issues in such systems is how to partition the data to maximize performance while balancing the load and minimizing communication between processing nodes, taking into account the dataset’s characteristics and the workload. In this study, we propose a method of RDF data partitioning for efficient query processing with Spark SQL. We exploit the statistics of the RDF data and workload information representing typical user queries, allowing us to group strongly related RDF triples into the same partition. Moreover, we employ indexes so that only the partitions needed to answer a query are loaded, reducing the amount of data to be processed and improving query processing performance. Our evaluation experiments showed that the proposed scheme outperformed the comparative methods in table load time and query time for most benchmark queries in a single-node setting.
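The two ideas in the abstract, grouping related triples by workload and using an index to load only the needed partitions, can be illustrated with a small sketch. This is not the authors' algorithm: it assumes, purely for illustration, that "strongly related" means "predicates that co-occur in at least one workload query", and it uses plain Python structures instead of Spark SQL tables. The function names (`group_predicates`, `partition_triples`, `partitions_for_query`) and the sample data are hypothetical.

```python
from collections import defaultdict


def group_predicates(workload):
    """Map each predicate to a group id; predicates that co-occur in a
    workload query end up in the same group (union-find)."""
    parent = {}

    def find(p):
        parent.setdefault(p, p)
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for query_predicates in workload:
        preds = sorted(query_predicates)
        for p in preds:
            find(p)  # register every predicate
        for other in preds[1:]:
            parent[find(other)] = find(preds[0])
    return {p: find(p) for p in parent}


def partition_triples(triples, workload):
    """Return (partitions, index): partitions maps a group id to its triples,
    index maps each predicate to the partition that holds it."""
    group_of = group_predicates(workload)
    partitions = defaultdict(list)
    index = {}
    for s, p, o in triples:
        # Assumption: predicates absent from the workload get their own partition.
        key = group_of.get(p, p)
        partitions[key].append((s, p, o))
        index[p] = key
    return dict(partitions), index


def partitions_for_query(query_predicates, index):
    """Use the index to pick only the partitions a query has to load."""
    return {index[p] for p in query_predicates if p in index}


# Hypothetical sample data.
triples = [
    ("alice", "knows", "bob"),
    ("alice", "name", "Alice"),
    ("bob", "name", "Bob"),
    ("paper1", "title", "RDF"),
    ("paper1", "author", "alice"),
    ("paper1", "year", "2023"),
]
workload = [{"knows", "name"}, {"title", "author"}]

partitions, index = partition_triples(triples, workload)
needed = partitions_for_query({"knows", "name"}, index)
```

With this sample workload, a query over `knows` and `name` touches a single partition of three triples instead of the full set of six. In a Spark SQL setting, each partition would presumably be stored as a separate table or file, but that mapping is not specified in the abstract.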

Kosuke Yamasaki

Graduate School of Science and Technology, University of Tsukuba, Tsukuba, Japan

Andorra

Toshiyuki Amagasa

Graduate School of Science and Technology, University of Tsukuba, Tsukuba, Japan

Andorra

📄 Paper information

Publication year	2023
Citations	0
Country of publication	Andorra
Site	Springer
