12.3. "degraded monitoring" mode


In certain circumstances, ltclusterd is not able to fulfill its primary mission of monitoring the node's upstream server. In these cases it enters "degraded monitoring" mode, in which ltclusterd remains active but waits for the situation to be resolved.

Situations where this happens are:

a failover situation has occurred, but no nodes in the primary node's location are visible
a failover situation has occurred, but no promotion candidate is available
a failover situation has occurred, but the promotion candidate could not be promoted
a failover situation has occurred, but the node was unable to follow the new primary
a failover situation has occurred, but no primary has become available
a failover situation has occurred, but automatic failover is not enabled for the node
ltclusterd is monitoring the primary node, but it is not available (and no other node has been promoted as primary)

Example output in a situation where there is only one standby with failover=manual, and the primary node is unavailable (but is later restarted):

    [2017-08-29 10:59:19] [INFO] node "node2" (ID: 2) monitoring upstream node "node1" (ID: 1) in normal state (automatic failover disabled)
    [2017-08-29 10:59:33] [WARNING] unable to connect to upstream node "node1" (ID: 1)
    [2017-08-29 10:59:33] [INFO] checking state of node 1, 1 of 5 attempts
    [2017-08-29 10:59:33] [INFO] sleeping 1 seconds until next reconnection attempt
    (...)
    [2017-08-29 10:59:37] [INFO] checking state of node 1, 5 of 5 attempts
    [2017-08-29 10:59:37] [WARNING] unable to reconnect to node 1 after 5 attempts
    [2017-08-29 10:59:37] [NOTICE] this node is not configured for automatic failover so will not be considered as promotion candidate
    [2017-08-29 10:59:37] [NOTICE] no other nodes are available as promotion candidate
    [2017-08-29 10:59:37] [HINT] use "ltcluster standby promote" to manually promote this node
    [2017-08-29 10:59:37] [INFO] node "node2" (ID: 2) monitoring upstream node "node1" (ID: 1) in degraded state (automatic failover disabled)
    [2017-08-29 10:59:53] [INFO] node "node2" (ID: 2) monitoring upstream node "node1" (ID: 1) in degraded state (automatic failover disabled)
    [2017-08-29 11:00:45] [NOTICE] reconnected to upstream node "node1" (ID: 1) after 68 seconds, resuming monitoring
    [2017-08-29 11:00:57] [INFO] node "node2" (ID: 2) monitoring upstream node "node1" (ID: 1) in normal state (automatic failover disabled)
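
The periodic status lines in output like the above can be scanned by an external script to notice when a node has entered degraded monitoring. The following is a minimal sketch in Python, not part of ltclusterd: the function names are hypothetical, and the pattern assumes only the status line format shown in this example (other ltclusterd messages are ignored).

```python
import re
from datetime import datetime

# Matches only the periodic status line shown in the example output above;
# other ltclusterd message formats are deliberately ignored (assumption).
STATUS_RE = re.compile(
    r'^\[(?P<ts>[^\]]+)\] \[INFO\] node "(?P<node>[^"]+)" \(ID: \d+\) '
    r'monitoring upstream node "[^"]+" \(ID: \d+\) '
    r'in (?P<state>normal|degraded) state'
)


def monitoring_states(lines):
    """Yield (timestamp, node_name, state) for each status line found."""
    for line in lines:
        match = STATUS_RE.match(line.strip())
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
            yield ts, match.group("node"), match.group("state")


def degraded_interval(lines):
    """Return the seconds from the first "degraded" status line to the next
    "normal" status line, or None if no such pair is present."""
    started = None
    for ts, _node, state in monitoring_states(lines):
        if state == "degraded" and started is None:
            started = ts
        elif state == "normal" and started is not None:
            return (ts - started).total_seconds()
    return None


# Log lines copied from the example output above.
SAMPLE = [
    '[2017-08-29 10:59:19] [INFO] node "node2" (ID: 2) monitoring upstream node "node1" (ID: 1) in normal state (automatic failover disabled)',
    '[2017-08-29 10:59:37] [INFO] node "node2" (ID: 2) monitoring upstream node "node1" (ID: 1) in degraded state (automatic failover disabled)',
    '[2017-08-29 10:59:53] [INFO] node "node2" (ID: 2) monitoring upstream node "node1" (ID: 1) in degraded state (automatic failover disabled)',
    '[2017-08-29 11:00:45] [NOTICE] reconnected to upstream node "node1" (ID: 1) after 68 seconds, resuming monitoring',
    '[2017-08-29 11:00:57] [INFO] node "node2" (ID: 2) monitoring upstream node "node1" (ID: 1) in normal state (automatic failover disabled)',
]

if __name__ == "__main__":
    print(degraded_interval(SAMPLE))
```

For the sample lines this yields 80 seconds, which is longer than the 68 seconds reported in the NOTICE line: the sketch measures between periodic status lines, so it only approximates the time actually spent in degraded monitoring.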

By default, ltclusterd will continue in degraded monitoring mode indefinitely. However, a timeout (in seconds) can be set with degraded_monitoring_timeout; once this timeout has elapsed, ltclusterd will terminate.
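
As a minimal sketch, the timeout might be set in the node's configuration file as follows. The file name ltcluster.conf and the value of 300 seconds are assumptions chosen for illustration only.

    # ltcluster.conf (file name assumed; value is illustrative)
    # Terminate ltclusterd after 300 seconds in degraded monitoring mode.
    degraded_monitoring_timeout=300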

Note

If ltclusterd is monitoring a primary node which has been stopped and manually restarted as a standby attached to a new primary, it will automatically detect the status change and update the node record to reflect the node's new status as an active standby. It will then resume monitoring the node as a standby.
