Research field: Databases
Conference: 2025 7th International Conference on Signal Processing, Computing and Control (ISPCC)
The rapid expansion of data environments is increasing the need for effective Extract, Transform, Load (ETL) technologies that can manage growing volumes of both structured and unstructured data in data lakehouse architectures. Existing ETL systems struggle with performance, scalability, and adaptability, which makes it difficult for them to meet the rising demands of both batch and real-time data processing. To address these challenges, this study proposes a novel Hybrid ETL Infrastructure that combines contextual data partitioning with AI-driven orchestration to optimize data ingestion and real-time analysis in data lakehouses. The system addresses key issues, such as inefficient data handling, slow processing times, and inadequate transformation methods, by dynamically adjusting the ETL pipeline based on user requests, context-aware partitioning, and data characteristics. The AI-driven orchestration schedules jobs efficiently by switching between batch and real-time processing according to data importance, which improves both performance and flexibility. Contextual partitioning reduces processing costs and improves query performance by automatically organizing data according to domain-specific knowledge and query intent. The primary goal of the framework is to improve data transformation and loading performance in data lakehouses, enabling faster and more accurate decision-making. Preliminary results show a 30% reduction in ETL processing time, along with significant improvements in query efficiency and accuracy, particularly in complex, cross-domain data retrieval environments. The proposed approach offers a scalable, adaptable, and intelligent solution for modern data lakehouse scenarios and outperforms traditional ETL methods.
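The abstract does not give implementation details, so the following is only a minimal sketch of the two ideas it names: switching between batch and real-time processing by data importance, and grouping data by domain. Everything in it is an assumption made for illustration: the `Record` type, the `importance` score, the fixed `REALTIME_THRESHOLD`, and the `domain=...` partition key are hypothetical, and a simple threshold rule stands in for the AI-driven orchestrator described in the paper.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Record:
    domain: str                 # hypothetical domain label, e.g. "sales"
    importance: float           # hypothetical importance score in [0, 1]
    payload: dict = field(default_factory=dict)


# Assumed cut-off; the paper's orchestrator would decide this adaptively.
REALTIME_THRESHOLD = 0.8


def choose_mode(record: Record, threshold: float = REALTIME_THRESHOLD) -> str:
    """Send high-importance records to the real-time path, the rest to batch."""
    return "realtime" if record.importance >= threshold else "batch"


def partition_key(record: Record) -> str:
    """Group records by domain, a stand-in for contextual partitioning."""
    return f"domain={record.domain}"


def route(records: list[Record]) -> dict[str, dict[str, list[Record]]]:
    """Build a plan of the form {mode: {partition key: [records]}}."""
    plan = {"realtime": defaultdict(list), "batch": defaultdict(list)}
    for record in records:
        plan[choose_mode(record)][partition_key(record)].append(record)
    return plan


if __name__ == "__main__":
    sample = [
        Record("sales", 0.9),
        Record("sales", 0.2),
        Record("iot", 0.95),
    ]
    for mode, partitions in route(sample).items():
        for key, items in partitions.items():
            print(mode, key, len(items))
```

In this sketch the routing decision and the partitioning decision are kept as separate functions, so either one could be replaced by a learned model without changing the other.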
| Publication year | 2025 |
|---|---|
| Citations | 30 |
| Country of publication | Andorra |
| Site | IEEE |