A common pattern for replication cluster setups is to spread servers over more than one data centre. This can provide benefits such as geographically distributed read replicas and disaster recovery (DR) capability. However, this also means there is a risk of disconnection at network level between data centre locations, which would result in a split-brain scenario if servers in a secondary data centre were no longer able to see the primary in the main data centre and promoted a standby among themselves.
ltcluster enables the provision of a "witness server" to artificially create a quorum of servers in a particular location, ensuring that nodes in another location will not elect a new primary if they are unable to see the majority of nodes. However, this approach does not scale well, particularly with more complex replication setups, e.g. where the majority of nodes are located outside of the primary data centre. It also means the witness node needs to be managed as an extra LightDB instance outside of the main replication cluster, which adds administrative and programming complexity.
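As an illustration, a witness node would typically be configured much like any other node, with its own ltcluster.conf; the node ID, host name and paths below are hypothetical, and the important point is that the witness instance is placed in the primary's data centre so that the primary's side of the cluster retains a majority:

node_id=99
node_name=witness1
conninfo='host=witness1 user=ltcluster dbname=ltcluster connect_timeout=2'
data_directory='/var/lib/lightdb/data'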
ltcluster4 introduces the concept of location: each node is associated with an arbitrary location string (default is "default"); this is set in ltcluster.conf, e.g.:
node_id=1
node_name=node1
conninfo='host=node1 user=ltcluster dbname=ltcluster connect_timeout=2'
data_directory='/var/lib/lightdb/data'
location='dc1'
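To extend this example, a node in a second data centre would declare a different location string; the node ID and host name below are hypothetical:

node_id=2
node_name=node2
conninfo='host=node2 user=ltcluster dbname=ltcluster connect_timeout=2'
data_directory='/var/lib/lightdb/data'
location='dc2'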
In a failover situation, ltclusterd will check whether any servers in the same location as the current primary node are visible. If not, ltclusterd will assume a network interruption and will not promote any node in any other location (it will, however, enter degraded monitoring mode until a primary becomes visible).
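The decision rule described above can be sketched as follows. This is an illustrative Python sketch of the logic only, not ltclusterd's actual implementation, and the node names and data structures are hypothetical:

from dataclasses import dataclass

@dataclass
class Node:
    name: str
    location: str
    visible: bool  # reachable from the node evaluating the failover

def may_promote(primary_location, peers):
    # Promotion is allowed only if at least one node in the primary's
    # location is still visible; otherwise assume a network interruption
    # and remain in degraded monitoring mode rather than promoting.
    return any(n.visible and n.location == primary_location for n in peers)

# Example: a standby in 'dc2' has lost sight of everything in 'dc1',
# so neither it nor any other 'dc2' node may be promoted.
peers = [
    Node("node1", "dc1", visible=False),  # former primary, unreachable
    Node("node2", "dc1", visible=False),  # standby in the primary's data centre
    Node("node3", "dc2", visible=True),   # local standby
]
print(may_promote("dc1", peers))  # False -> enter degraded monitoring mode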