This is a scary problem when you’re recovering from an outage of your database machines. If you’re running a Galera cluster and they all go offline, you’ll need to do a bit of work to restart the cluster and make it safe.
Galera relies on the fact that there’s at least one node running in your cluster at all times. If your entire cluster goes offline, you won’t be able to start it again, even with the –wsrep-new-cluster option. The process to restart it is simple, but requires some manual intervention.
First, get the contents of grastate.dat on all the cluster nodes. On Debian and Ubuntu, this is at
The output should resemble:
# GALERA saved state version: 2.1 uuid: 1ea21a3a-e76c-11e6-96a4-a291688a9d43 seqno: -1 safe_to_bootstrap: 0
The line we’re concerned about is
seqno. Find the node with the highest sequence number, or if they’re all the same (-1 for a graceful shutdown) pick any node.
Edit that file on the node you’ve selected and set
safe_to_bootstrap to 1. Start that node with the
--wsrep-new-cluster option. You should be able to bring online the remaining nodes and have a cluster once again.