In certain circumstances, ltclusterd is not able to fulfill its primary mission of monitoring the node's upstream server. In these cases it enters "degraded monitoring" mode, where ltclusterd remains active but is waiting for the situation to be resolved.
Situations where this happens are:
Example output in a situation where there is only one standby with failover=manual, and the primary node is unavailable (but is later restarted):
    [2017-08-29 10:59:19] [INFO] node "node2" (ID: 2) monitoring upstream node "node1" (ID: 1) in normal state (automatic failover disabled)
    [2017-08-29 10:59:33] [WARNING] unable to connect to upstream node "node1" (ID: 1)
    [2017-08-29 10:59:33] [INFO] checking state of node 1, 1 of 5 attempts
    [2017-08-29 10:59:33] [INFO] sleeping 1 seconds until next reconnection attempt
    (...)
    [2017-08-29 10:59:37] [INFO] checking state of node 1, 5 of 5 attempts
    [2017-08-29 10:59:37] [WARNING] unable to reconnect to node 1 after 5 attempts
    [2017-08-29 10:59:37] [NOTICE] this node is not configured for automatic failover so will not be considered as promotion candidate
    [2017-08-29 10:59:37] [NOTICE] no other nodes are available as promotion candidate
    [2017-08-29 10:59:37] [HINT] use "ltcluster standby promote" to manually promote this node
    [2017-08-29 10:59:37] [INFO] node "node2" (ID: 2) monitoring upstream node "node1" (ID: 1) in degraded state (automatic failover disabled)
    [2017-08-29 10:59:53] [INFO] node "node2" (ID: 2) monitoring upstream node "node1" (ID: 1) in degraded state (automatic failover disabled)
    [2017-08-29 11:00:45] [NOTICE] reconnected to upstream node "node1" (ID: 1) after 68 seconds, resuming monitoring
    [2017-08-29 11:00:57] [INFO] node "node2" (ID: 2) monitoring upstream node "node1" (ID: 1) in normal state (automatic failover disabled)
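For operators who want to spot degraded monitoring from the log, the following is a minimal Python sketch rather than an ltcluster tool. It assumes log lines shaped like the sample output above (a bracketed timestamp, a bracketed level, then the message) and relies only on the phrases "in degraded state", "in normal state" and "resuming monitoring" seen in that sample. The names LOG_LINE and degraded_periods are hypothetical, and other ltclusterd versions or log settings may word these messages differently.

```python
import re
from datetime import datetime

# Assumed line shape, taken from the sample output:
# [YYYY-MM-DD HH:MM:SS] [LEVEL] message
LOG_LINE = re.compile(
    r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] \[([A-Z]+)\] (.*)$"
)


def degraded_periods(lines):
    """Return a list of (start, end) datetimes for each degraded period.

    `end` is None when the log finishes while still in degraded state.
    """
    periods = []
    start = None
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skips elided "(...)" markers and unparseable lines
        ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        message = match.group(3)
        if "in degraded state" in message:
            if start is None:
                start = ts  # first report of degraded monitoring
        elif start is not None and (
            "resuming monitoring" in message or "in normal state" in message
        ):
            periods.append((start, ts))
            start = None
    if start is not None:
        periods.append((start, None))
    return periods


sample = [
    '[2017-08-29 10:59:19] [INFO] node "node2" (ID: 2) monitoring upstream node "node1" (ID: 1) in normal state (automatic failover disabled)',
    '[2017-08-29 10:59:33] [WARNING] unable to connect to upstream node "node1" (ID: 1)',
    '(...)',
    '[2017-08-29 10:59:37] [INFO] node "node2" (ID: 2) monitoring upstream node "node1" (ID: 1) in degraded state (automatic failover disabled)',
    '[2017-08-29 10:59:53] [INFO] node "node2" (ID: 2) monitoring upstream node "node1" (ID: 1) in degraded state (automatic failover disabled)',
    '[2017-08-29 11:00:45] [NOTICE] reconnected to upstream node "node1" (ID: 1) after 68 seconds, resuming monitoring',
    '[2017-08-29 11:00:57] [INFO] node "node2" (ID: 2) monitoring upstream node "node1" (ID: 1) in normal state (automatic failover disabled)',
]

periods = degraded_periods(sample)
for begin, end in periods:
    print(begin, "->", end)
```

On the sample lines, the sketch reports a single degraded period running from 10:59:37 to 11:00:45, which matches the 68 seconds stated in the reconnection notice.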
By default, ltclusterd will continue in degraded monitoring mode indefinitely. However, a timeout (in seconds) can be set with degraded_monitoring_timeout, after which ltclusterd will terminate.
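The timeout rule can be illustrated with a small Python sketch. This is not ltclusterd's implementation: the helper should_terminate is hypothetical, "no timeout" is modelled here as None rather than whatever value the configuration file uses for the default, and whether the real check treats the boundary as inclusive is an assumption.

```python
from typing import Optional


def should_terminate(degraded_seconds: float,
                     degraded_monitoring_timeout: Optional[int] = None) -> bool:
    """Hypothetical model of the documented behaviour.

    With no timeout configured (None in this sketch), degraded monitoring
    continues indefinitely. Otherwise the daemon is expected to terminate
    once the time spent in degraded monitoring reaches the timeout.
    """
    if degraded_monitoring_timeout is None:
        return False  # default: remain in degraded monitoring indefinitely
    # Assumption: the boundary is treated as inclusive.
    return degraded_seconds >= degraded_monitoring_timeout


# Default behaviour: never terminates, however long the outage lasts.
print(should_terminate(86400))
# With a 60-second timeout, a 68-second outage would trigger termination.
print(should_terminate(68, degraded_monitoring_timeout=60))
# With a 120-second timeout, the same outage would be ridden out.
print(should_terminate(68, degraded_monitoring_timeout=120))
```

Under this model, the 68-second outage in the sample output would only be survived with no timeout set or with a timeout longer than 68 seconds.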
If ltclusterd is monitoring a primary node which has been stopped and manually restarted as a standby attached to a new primary, it will automatically detect the status change and update the node record to reflect the node's new status as an active standby. It will then resume monitoring the node as a standby.