This guide assumes that you are familiar with LightDB administration and
streaming replication concepts. For further details on streaming
replication, see the LightDB documentation section on
streaming replication.

The following terms are used throughout the ltcluster documentation.
- replication cluster
In the ltcluster documentation, "replication cluster" refers to the network
of LightDB servers connected by streaming replication.
- node
A node is a single LightDB server within a replication cluster.
- upstream node
The node that a standby server connects to in order to receive streaming
replication data. This is either the primary server or, in the case of
cascading replication, another standby.
- failover
The action that occurs when a primary server fails and a suitable standby
is promoted to become the new primary. The ltclusterd daemon supports
automatic failover to minimise downtime.
- switchover
In certain circumstances, such as hardware or operating system maintenance,
a primary server must be taken offline. This calls for a controlled
switchover, whereby a suitable standby is promoted and the existing primary
is removed from the replication cluster in a controlled manner.
The ltcluster command line client provides this functionality.
- fencing
In a failover situation, following the promotion of a standby to primary,
it's essential that the previous primary does not unexpectedly come back
online, which would result in a split-brain situation. To prevent this,
the failed primary should be isolated from applications, i.e. "fenced off".
- witness server
ltcluster provides functionality to set up a so-called "witness server" to
assist in determining a new primary server in a failover situation with more
than one standby. The witness server itself is not part of the replication
cluster, although it does contain a copy of the ltcluster metadata schema.
The purpose of a witness server is to provide a "casting vote" where servers
in the replication cluster are split over more than one location. In the event
of a loss of connectivity between locations, the presence or absence of
the witness server will decide whether a server at that location is promoted
to primary; this is to prevent a "split-brain" situation where an isolated
location interprets a network outage as a failure of the (remote) primary and
promotes a (local) standby.
A witness server only needs to be created if ltclusterd
is in use.
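
The "casting vote" idea can be illustrated with a minimal sketch. This is not ltclusterd's actual decision logic; it assumes a simple strict-majority rule, and the `Node` class, the `may_promote` function, and the node and site names are all hypothetical, introduced only for illustration. The layout places the witness at the same location as the primary, with a single standby at a second location.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical description of a server known to the cluster metadata."""
    name: str
    location: str
    role: str  # "primary", "standby" or "witness"


def may_promote(all_nodes, visible_nodes):
    """Return True if the visible nodes form a strict majority of all nodes.

    Both lists include the standby making the decision and any witness.
    This majority rule is an assumption made for the sketch.
    """
    return 2 * len(visible_nodes) > len(all_nodes)


# Assumed layout: primary and witness at site A, one standby at site B.
primary = Node("node1", "site_a", "primary")
witness = Node("witness1", "site_a", "witness")
standby = Node("node2", "site_b", "standby")
cluster = [primary, witness, standby]

# The link between the sites is lost: the standby can see only itself.
partition_view = [standby]

# Only the primary has failed: the standby can still reach the witness.
primary_down_view = [standby, witness]

print(may_promote(cluster, partition_view))     # False
print(may_promote(cluster, primary_down_view))  # True
```

In the first case the standby cannot reach the witness, so it treats the outage as a network problem and does not promote itself. In the second case the witness is still reachable, which tips the vote in favour of promotion.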