Database Integrity v2

When it comes to database integrity, a two-hosts master-slave setup looks promising in theory. It surely is better than having only a single DB, but that is about the only pro-argument available.

Approach v1: Master + Slave1 

Okay, so we have a critical system running a database server w/ important production-grade data serving thousands of web clients. This qualifies as a highly critical system, and should never be influenced by additional manual queries to reduce the possibility of operational problems. So if any of the DB data is scheduled to undergo some form of analysis, that is hopefully only going to happen on a slave system. This might generally sound like a perfectly good setup, but only as long as the slave is not having any integrity issues. In this scenario, If the data gets inconsistent, we have to stop both master and slave which means downtime and then copy over all the data, restart master and slave, setup and reinitiate replication, and hope that the master-slave-setup is now fully functional again.

 

Approach v2: Master + Slave 1 + Slave n 

Instead of simply deploying the previously described setup which is still risky as it is the only chance for a good backup and also forces us to take the production system offline in case of failure, we take a far better approach by simply deploying at least another slave and perhaps another backup system. Now, if one slave fails or has inconsistent data, chances are still very high that one other slave still has good data on it. So, in this scenario, we only have to halt the good slave’s DB, copy it over onto the failed slave, optionally create a  hot backup somewhere else (e.g. a simple rsync onto a NAS) in between, and all of this while the productional system just keeps running w/o any interruption. Also, there is no need to configure the master-slave setup again as the “repaired” slave will just continue where the “good” slave cleanly stopped replicating from the same master. Additionally, backups can be taken at any time w/o greater influence on any of the important assets.

I strongly recommend the latter setup, simply b/c  it survived every  incident there ever was, and that means it is fully functional since many years, regardless of whichever one of the slaves is failing from time to time.