From Data Warehouse to Lakehouse: A Comparative Review


Research field: Databases



Conference: 2022 IEEE International Conference on Big Data (Big Data)


Abstract

Digital information systems generate vast amounts of data every minute, which underscores the continuing need to advance big data management systems with efficient data ingestion and knowledge extraction capabilities. To address the ‘big data’ problems caused by high volume, velocity, variety, and veracity, data management systems have evolved from structured databases to big data storage systems, graph databases, data warehouses, and data lakes, but each solution has its strengths and shortcomings. The need to quickly produce actionable knowledge from unstructured data ingested from distributed sources calls for a marriage of data warehouses and data lakes to create a data lakehouse (LH). The objective is to combine the strength of the data warehouse in quickly producing insights from processed, merged data with that of the data lake in ingesting and storing high-speed unstructured data with post-storage transformation and analytics capabilities. In this paper, we present a comparative review of existing data warehouse and data lake technologies to highlight their strengths and weaknesses, and we propose the desired and necessary features of the LH architecture, which has recently gained considerable attention in the big data management research community.


Author Profile
Ahmed A. Harby

School of Computing, Queen’s University, Kingston, Canada
Author Profile
Farhana Zulkernine

School of Computing, Queen’s University, Kingston, Canada

Paper information

Publication year: 2022
Citations: 45
Country of publication: Canada
Site: IEEE
