1.1. Concepts

This guide assumes that you are familiar with LightDB administration and streaming replication concepts. For further details on streaming replication, see the LightDB documentation section on streaming replication.

The following terms are used throughout the ltcluster documentation.

replication cluster
In the ltcluster documentation, "replication cluster" refers to the network of LightDB servers connected by streaming replication.
node
A node is a single LightDB server within a replication cluster.
upstream node
The node a standby server connects to, in order to receive streaming replication. This is either the primary server, or in the case of cascading replication, another standby.
failover
This is the action which occurs if a primary server fails and a suitable standby is promoted as the new primary. The ltclusterd daemon supports automatic failover to minimise downtime.
switchover
In certain circumstances, such as hardware or operating system maintenance, it's necessary to take a primary server offline; in this case a controlled switchover is necessary, whereby a suitable standby is promoted and the existing primary removed from the replication cluster in a controlled manner. The ltcluster command line client provides this functionality.
fencing
In a failover situation, following the promotion of a new standby, it's essential that the previous primary does not unexpectedly come back on line, which would result in a split-brain situation. To prevent this, the failed primary should be isolated from applications, i.e. "fenced off".
witness server

ltcluster provides functionality to set up a so-called "witness server" to assist in determining a new primary server in a failover situation with more than one standby. The witness server itself is not part of the replication cluster, although it does contain a copy of the ltcluster metadata schema.

The purpose of a witness server is to provide a "casting vote" where servers in the replication cluster are split over more than one location. In the event of a loss of connectivity between locations, the presence or absence of the witness server will decide whether a server at that location is promoted to primary; this is to prevent a "split-brain" situation where an isolated location interprets a network outage as a failure of the (remote) primary and promotes a (local) standby.

A witness server only needs to be created if ltclusterd is in use.