Towards Practical Self-Healing Distributed Databases


Research area: Databases



Conference: 2020 IEEE Infrastructure Conference


Abstract

As distributed databases expand in popularity, there is ever-growing research into new database architectures that are designed from the start with built-in self-tuning and self-healing features. In real-world deployments, however, migration to these entirely new systems is impractical, and the challenge is to keep massive fleets of existing databases available under constant software and hardware change. Apache Cassandra is one such existing database: it helped to popularize "scale-out" distributed databases, and it runs some of the largest existing deployments of any open-source distributed database.

In this paper, we demonstrate the techniques needed to transform the typical, highly manual Apache Cassandra deployment into a self-healing system. We start by composing specialized agents to surface the signals needed for a self-healing deployment and to execute local actions. Then we show how to combine the signals from the agents into the cluster-level control planes required to safely iterate on and evolve existing deployments without compromising database availability. Finally, we show how to create simulated models of the database's behavior, allowing rapid iteration with minimal risk. With these systems in place, it is possible to create a truly self-healing database system within existing large-scale Apache Cassandra deployments.


Authors

Joseph Lynch
Cloud Data Engineering, Netflix Inc., Los Gatos, USA

Dinesh Ashok Joshi
Apache Cassandra, Apache Software Foundation, San Jose, USA

Paper information

Publication year: 2020
Citations: 1
Country of publication: United States
Site: IEEE
