Research field: Databases
Conference: 2024 2nd International Conference on Signal Processing and Intelligent Computing (SPIC)
A real-time data warehouse is a data warehouse that meets real-time requirements in data collection, processing, analysis, and application. Traditional data warehouses focus mainly on batch processing, which struggles to meet the needs of real-time business analysis and decision-making; real-time data warehouses emerged to fill this gap. This article studies and puts into practice a method for building a real-time data warehouse based on Flink. In the implementation, real-time log data is collected through Flume, business data is collected through Maxwell, and Kafka serves as the message queue that smooths load peaks and decouples data producers from consumers. The Flink stream processing engine is used to build the warehouse: it ingests data in real time from large-scale data sources and performs Extract, Transform, Load (ETL) processing, dimensional modeling, aggregation, and dimension degeneration on the raw data, then stores the results in the data warehouse in real time. Finally, the data is imported into ClickHouse according to the task metrics and visualized. The research objective is to explore the design and implementation of a Flink-based real-time data warehouse that processes and analyzes real-time data quickly and addresses the data processing and analysis needs of practical business scenarios. The test results indicate that the chosen technologies are appropriate, the software design is sound, and the system meets the expected design requirements.
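The processing stages named in the abstract (ETL cleaning, folding dimension attributes into fact records, and aggregation) can be illustrated with a minimal sketch. This is plain Python rather than Flink code, and it is not the authors' implementation: the record format `ts,user,product,amount`, the `PRODUCT_DIM` lookup table, the 60-second tumbling window, and every function name are assumptions made purely for illustration.

```python
from collections import defaultdict

# Hypothetical dimension table (assumption): product_id -> category.
PRODUCT_DIM = {"p1": "books", "p2": "games"}


def parse(raw):
    """ETL step: parse 'ts,user,product,amount'; return None if malformed."""
    parts = raw.strip().split(",")
    if len(parts) != 4:
        return None
    ts, user, product, amount = parts
    try:
        return {"ts": int(ts), "user": user,
                "product": product, "amount": float(amount)}
    except ValueError:
        return None


def widen(event):
    """Fold the dimension attribute into the fact record (a wide record)."""
    category = PRODUCT_DIM.get(event["product"], "unknown")
    return {**event, "category": category}


def aggregate(events, window_seconds=60):
    """Sum amounts per (window start, category) using tumbling windows."""
    totals = defaultdict(float)
    for e in events:
        window_start = e["ts"] - e["ts"] % window_seconds
        totals[(window_start, e["category"])] += e["amount"]
    return dict(totals)


def run_pipeline(raw_lines, window_seconds=60):
    """Chain the stages: parse, drop bad records, widen, aggregate."""
    parsed = (parse(line) for line in raw_lines)
    widened = [widen(e) for e in parsed if e is not None]
    return aggregate(widened, window_seconds)


# Sample input (made up for illustration), including one malformed record.
RAW = ["0,u1,p1,10.0", "30,u2,p2,5.5", "bad record",
       "61,u1,p1,2.5", "90,u3,p9,1.0"]
RESULT = run_pipeline(RAW)
```

In a deployment like the one described, these stages would run as streaming operators reading from Kafka topics, and the aggregated rows would be written to ClickHouse; the sketch only mirrors the data flow on an in-memory list.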
| Publication year | 2024 |
|---|---|
| Citations | 136 |
| Country of publication | Andorra |
| Site | IEEE |