On Customer Data Deduplication - Research vs. Industrial Perspective


Research field: Databases



Conference: European Conference on Advances in Databases and Information Systems


Abstract

In this tutorial, we present the results of researching, designing, implementing, and deploying data deduplication pipelines for customer records in a large financial institution. The tutorial is based on experience gained in an R&D project, in which we developed two deduplication pipelines: the first is based on statistical modeling, the second on machine learning. Both pipelines were extensively tested on a real data set of customer records. The pipeline based on statistical modeling has already been deployed in the production system of the financial institution, where it processes batches of over 20 million customer records.
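The abstract does not describe how either pipeline works internally. For orientation only, here is a minimal Python sketch of a generic rule-based deduplication flow: blocking, pairwise string similarity, a threshold decision, and clustering of matched pairs with union-find. It is not the authors' statistical or machine-learning pipeline; the record fields, the blocking key, the use of `difflib.SequenceMatcher` as the similarity measure, and the 0.85 threshold are all illustrative assumptions.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def blocking_key(rec):
    # Hypothetical blocking key: first 3 letters of the surname + birth year.
    # Only records sharing a key are compared with each other.
    return (rec["surname"][:3].lower(), rec["birth_year"])


def similarity(a, b):
    # Hypothetical similarity: mean SequenceMatcher ratio (0.0..1.0)
    # over a few assumed string attributes.
    fields = ("first_name", "surname", "city")
    scores = [SequenceMatcher(None, a[f].lower(), b[f].lower()).ratio() for f in fields]
    return sum(scores) / len(scores)


def deduplicate(records, threshold=0.85):
    # Union-find over record indices: matched pairs end up in one cluster.
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Blocking step: group record indices by their blocking key.
    blocks = defaultdict(list)
    for i, rec in enumerate(records):
        blocks[blocking_key(rec)].append(i)

    # Pairwise comparison inside each block; merge pairs above the threshold.
    for members in blocks.values():
        for i, j in combinations(members, 2):
            if similarity(records[i], records[j]) >= threshold:
                parent[find(i)] = find(j)

    # Collect clusters of indices that refer to the same (assumed) customer.
    clusters = defaultdict(list)
    for i in range(len(records)):
        clusters[find(i)].append(i)
    return sorted(clusters.values())


records = [
    {"first_name": "Jan", "surname": "Kowalski", "birth_year": 1980, "city": "Poznan"},
    {"first_name": "Jan", "surname": "Kowalsky", "birth_year": 1980, "city": "Poznan"},
    {"first_name": "Anna", "surname": "Nowak", "birth_year": 1990, "city": "Poznan"},
]
clusters = deduplicate(records)
print(clusters)  # [[0, 1], [2]]
```

Blocking restricts comparisons to records that share a key, which is what keeps pairwise matching tractable at the scale of millions of records. In a statistical or learned pipeline, the fixed-threshold `similarity` rule would typically be replaced by a model that scores each candidate pair.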


Authors:
Robert Wrembel, Poznan University of Technology, Poznań, Poland
Witold Andrzejewski, Poznan University of Technology, Poznań, Poland
Bartosz Bębel, Poznan University of Technology, Poznań, Poland

Paper information

Publication year: 2024
Citations: 0
Country of publication: Poland
Source: Springer