A Large-scale Data Set and an Empirical Study of Docker Images Hosted on Docker Hub


Research area: Software Development



Conference: 2020 IEEE International Conference on Software Maintenance and Evolution (ICSME)


Abstract

Docker is currently one of the most popular containerization solutions. Previous work investigated various characteristics of the Docker ecosystem, but has mainly focused on Dockerfiles from GitHub, limiting the type of questions that can be asked, and did not investigate evolution aspects. In this paper, we create a recent and more comprehensive data set by collecting data from Docker Hub, GitHub, and Bitbucket. Our data set contains information about 3,364,529 Docker images and 378,615 git repositories behind them. Using this data set, we conduct a large-scale empirical study with four research questions where we reproduce previously explored characteristics (e.g., popular languages and base images), investigate new characteristics such as image tagging practices, and study evolution trends. Our results demonstrate the maturity of the Docker ecosystem: we find more reliance on ready-to-use language and application base images as opposed to yet-to-be-configured OS images, a downward trend of Docker image sizes demonstrating the adoption of best practices of keeping images small, and a declining trend in the number of smells in Dockerfiles suggesting a general improvement in quality. On the downside, we find an upward trend in using obsolete OS base images, posing security risks, and find problematic usages of the latest tag, including version lagging. Overall, our results bring good news such as more developers following best practices, but they also indicate the need to build tools and infrastructure embracing new trends and addressing potential issues.


Authors

Changyuan Lin
Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada

Sarah Nadi
Department of Computing Science, University of Alberta, Edmonton, Canada

Hamzeh Khazaei
Department of Electrical Engineering and Computer Science, York University, Toronto, Canada

Paper information

Publication year: 2020
Citations: 23
Publication country: Canada
Site: IEEE
