12.1. Pausing the ltclusterd service

12.1.1. Prerequisites for pausing ltclusterd
12.1.2. Pausing/unpausing ltclusterd
12.1.3. Details on the ltclusterd pausing mechanism

In normal operation, ltclusterd monitors the state of the LightDB node it is running on and takes appropriate action if problems are detected, e.g. (if so configured) promoting the node to primary if the existing primary has been determined to have failed.

However, ltclusterd is unable to distinguish between a planned outage (such as performing a switchover or installing a LightDB maintenance release) and an actual server outage. In versions prior to ltcluster 4.2 it was necessary to stop ltclusterd on all nodes (or at least on all nodes where ltclusterd is configured for automatic failover) to prevent ltclusterd from making unintentional changes to the replication cluster. From ltcluster 4.2, ltclusterd can instead be paused: it keeps running and monitoring, but takes no failover action until it is unpaused.

Note

For major LightDB upgrades, ltclusterd should be shut down completely and only started up once the ltcluster packages for the new LightDB major version have been installed.

12.1.1. Prerequisites for pausing ltclusterd

In order to be able to pause/unpause ltclusterd, the following prerequisites must be met:

  • LightDB on all nodes must be accessible from the node where the pause/unpause operation is executed, using the conninfo string shown by ltcluster cluster show.

Note

This is required for normal ltcluster operation in any case.

12.1.2. Pausing/unpausing ltclusterd

To pause ltclusterd, execute ltcluster service pause (ltcluster 4.2 - 4.4: ltcluster daemon pause), e.g.:

$ ltcluster -f /etc/ltcluster.conf service pause
NOTICE: node 1 (node1) paused
NOTICE: node 2 (node2) paused
NOTICE: node 3 (node3) paused

The state of ltclusterd on each node can be checked with ltcluster service status (ltcluster 4.2 - 4.4: ltcluster daemon status), e.g.:

$ ltcluster -f /etc/ltcluster.conf service status
 ID | Name  | Role    | Status  | ltclusterd | PID  | Paused?
----+-------+---------+---------+------------+------+---------
 1  | node1 | primary | running | running    | 7851 | yes
 2  | node2 | standby | running | running    | 7889 | yes
 3  | node3 | standby | running | running    | 7918 | yes
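
Before starting maintenance it can be useful to confirm from a script that every node really is paused. The following is a minimal sketch, not part of ltcluster: the function name all_nodes_paused is hypothetical, and the parsing assumes the pipe-separated table layout shown above, with a numeric ID in the first column and "Paused?" as the last column. Adjust it if your version prints the table differently.

```shell
#!/bin/sh
# Sketch: read "ltcluster service status" output on stdin and check that
# every node row has "yes" in the last ("Paused?") column.
# Exit status: 0 = all paused, 1 = at least one node not paused,
#              2 = no node rows found in the input.
all_nodes_paused() {
    awk -F'|' '
        {
            id = $1
            gsub(/^[ \t]+|[ \t]+$/, "", id)
            # Only rows starting with a numeric node ID are data rows;
            # this skips the header, the separator and blank lines.
            if (id !~ /^[0-9]+$/) next

            name = $2
            paused = $NF
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", paused)

            rows++
            if (paused != "yes") {
                printf "node %s is not paused\n", name
                bad = 1
            }
        }
        END {
            if (rows == 0) exit 2
            exit (bad ? 1 : 0)
        }
    '
}
```

It could then be used as: ltcluster -f /etc/ltcluster.conf service status | all_nodes_paused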

Note

If executing a switchover with ltcluster standby switchover, ltcluster will automatically pause/unpause the ltclusterd service as part of the switchover process.

If the primary (in this example, node1) is stopped while ltclusterd is paused, ltclusterd running on one of the standbys (here: node2) will react like this:

[2019-08-28 12:22:21] [WARNING] unable to connect to upstream node "node1" (node ID: 1)
[2019-08-28 12:22:21] [INFO] checking state of node 1, 1 of 5 attempts
[2019-08-28 12:22:21] [INFO] sleeping 1 seconds until next reconnection attempt
...
[2019-08-28 12:22:24] [INFO] sleeping 1 seconds until next reconnection attempt
[2019-08-28 12:22:25] [INFO] checking state of node 1, 5 of 5 attempts
[2019-08-28 12:22:25] [WARNING] unable to reconnect to node 1 after 5 attempts
[2019-08-28 12:22:25] [NOTICE] node is paused
[2019-08-28 12:22:33] [INFO] node "node2" (ID: 2) monitoring upstream node "node1" (node ID: 1) in degraded state
[2019-08-28 12:22:33] [DETAIL] ltclusterd paused by administrator
[2019-08-28 12:22:33] [HINT] execute "ltcluster service unpause" to resume normal failover mode

If the primary becomes available again (e.g. following a software upgrade), ltclusterd will automatically reconnect, e.g.:

[2019-08-28 12:25:41] [NOTICE] reconnected to upstream node "node1" (ID: 1) after 8 seconds, resuming monitoring

To unpause the ltclusterd service, execute ltcluster service unpause (ltcluster 4.2 - 4.4: ltcluster daemon unpause), e.g.:

$ ltcluster -f /etc/ltcluster.conf service unpause
NOTICE: node 1 (node1) unpaused
NOTICE: node 2 (node2) unpaused
NOTICE: node 3 (node3) unpaused

Note

If the previous primary is no longer accessible when ltclusterd is unpaused, no failover action will be taken. Instead, a new primary must be manually promoted using ltcluster standby promote, and any standbys attached to the new primary with ltcluster standby follow.

This is to prevent execution of ltcluster service unpause resulting in the automatic promotion of a new primary, which may be a problem particularly in larger clusters, where ltclusterd could select a different promotion candidate to the one intended by the administrator.
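
The pause, maintenance, unpause cycle described in this section can be wrapped in a small helper so that ltclusterd is not left paused by accident. This is a sketch under stated assumptions rather than a recommended procedure: run_paused and LTCLUSTER_CONF are hypothetical names, the configuration path is the one used in the examples above, and the helper assumes that ltcluster service pause and ltcluster service unpause exit with a non-zero status on failure.

```shell
#!/bin/sh
# Sketch: run a maintenance command while ltclusterd is paused on all nodes.
# LTCLUSTER_CONF and run_paused are illustrative names, not part of ltcluster.
LTCLUSTER_CONF="${LTCLUSTER_CONF:-/etc/ltcluster.conf}"

run_paused() {
    # Pause first; do not start maintenance if pausing failed.
    ltcluster -f "$LTCLUSTER_CONF" service pause || return 1

    # Run the command given as arguments and remember its exit status.
    rc=0
    "$@" || rc=$?

    # Always attempt to unpause, even if the maintenance command failed.
    ltcluster -f "$LTCLUSTER_CONF" service unpause || return 1

    return "$rc"
}
```

For example, run_paused ./upgrade_primary.sh (where upgrade_primary.sh is a placeholder for your own maintenance script) would pause, run the script, and unpause. As noted above, if the primary is no longer accessible at the point of unpausing, no automatic failover takes place and a new primary must be promoted manually.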

12.1.3. Details on the ltclusterd pausing mechanism

The pause state of each node is preserved across a LightDB restart.

ltcluster service pause and ltcluster service unpause can be executed even if ltclusterd is not running; in this case, ltclusterd will start up in whichever pause state has been set.

Note

ltcluster service pause and ltcluster service unpause do not start/stop ltclusterd.

The commands ltcluster daemon start and ltcluster daemon stop (if correctly configured) can be used to start/stop ltclusterd on individual nodes.