Extracting Provenance of Machine Learning Experiment Pipeline Artifacts


Research field: Software Development



Conference: European Conference on Advances in Databases and Information Systems


Abstract

Experiment management systems (EMSs), such as MLflow, are increasingly used to streamline the collection and management of machine learning (ML) artifacts in iterative and exploratory ML experiment workflows. However, EMSs typically offer only limited provenance capabilities, which makes it hard to analyze the provenance of ML artifacts and to gain knowledge for improving experiment pipelines. In this paper, we propose a comprehensive provenance model compliant with the W3C PROV standard, which captures the provenance of ML experiment pipelines and their artifacts related to Git and MLflow activities. Moreover, we present the tool MLFLOW2PROV, which extracts provenance graphs according to our model from existing projects, enabling the collected pipeline provenance information to be queried, analyzed, and further processed.
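
To make the idea of a queryable provenance graph concrete, here is a minimal sketch in plain Python. It is not MLFLOW2PROV's model or API: the class `ProvGraph`, its methods, and the node identifiers (`commit:abc123`, `run:42`, `model.pkl`, `user:alice`) are hypothetical and chosen only for illustration. The only parts taken from W3C PROV are the three node types (entity, activity, agent) and the relation names `used`, `wasGeneratedBy`, and `wasAssociatedWith`.

```python
from collections import defaultdict


class ProvGraph:
    """Tiny in-memory store of PROV-style (subject, relation, object) edges.

    Hypothetical illustration only; not the data model used by MLFLOW2PROV.
    """

    PROV_TYPES = {"entity", "activity", "agent"}

    def __init__(self):
        self.nodes = {}                  # node id -> PROV type
        self.edges = defaultdict(list)   # subject id -> [(relation, object id)]

    def add_node(self, node_id, prov_type):
        if prov_type not in self.PROV_TYPES:
            raise ValueError(f"unknown PROV type: {prov_type}")
        self.nodes[node_id] = prov_type

    def relate(self, subject, relation, obj):
        # Both endpoints must have been declared before they are linked.
        for node_id in (subject, obj):
            if node_id not in self.nodes:
                raise KeyError(node_id)
        self.edges[subject].append((relation, obj))

    def lineage(self, node_id):
        """Return every node reachable from node_id along provenance edges."""
        seen, stack = set(), [node_id]
        while stack:
            current = stack.pop()
            for _, target in self.edges.get(current, []):
                if target not in seen:
                    seen.add(target)
                    stack.append(target)
        return seen


# Assumed scenario: a training run reads a Git commit and produces a model file.
graph = ProvGraph()
graph.add_node("commit:abc123", "entity")
graph.add_node("run:42", "activity")
graph.add_node("model.pkl", "entity")
graph.add_node("user:alice", "agent")

graph.relate("run:42", "used", "commit:abc123")
graph.relate("model.pkl", "wasGeneratedBy", "run:42")
graph.relate("run:42", "wasAssociatedWith", "user:alice")

model_lineage = graph.lineage("model.pkl")
```

Under these assumptions, `model_lineage` contains the run that generated the model, the commit that run used, and the agent associated with the run. This is the kind of backward-tracing query a provenance graph is meant to support.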


Authors
Marius Schlegel, TU Ilmenau, Ilmenau, Germany
Kai-Uwe Sattler, TU Ilmenau, Ilmenau, Germany

Paper information

Publication year: 2023
Citations: 0
Country of publication: Germany
Publisher site: Springer