Research field: Databases
Conference: European Conference on Advances in Databases and Information Systems
In this tutorial, we present the results of researching, designing, implementing, and deploying data deduplication pipelines for customer records in a large financial institution. The tutorial is based on experience gained in an R&D project, in which we developed two deduplication pipelines: the first is based on statistical modeling, the second on machine learning. Both pipelines were extensively tested on a real data set of customer records. The pipeline based on statistical modeling has already been deployed in the production system of the financial institution, where it processes batches of over 20 million customer records.
| Publication year | 2024 |
|---|---|
| Citations | 0 |
| Country of publication | Poland |
| Site | Springer |