From ltcluster 5, it is possible to provide ltclusterd with a script which, in a failover situation, is executed by the promotion candidate (the node which has been selected to become the new primary) to confirm whether that node should actually be promoted.
To use this, set failover_validation_command in ltcluster.conf to a script executable by the lightdb system user, e.g.:
failover_validation_command=/path/to/script.sh %n
The %n parameter will be replaced with the node ID when the script is executed. A number of other parameters are also available; see section "Optional configuration for automatic failover" for details.
This script must return an exit code of 0 to indicate that the node should promote itself. Any other value will result in the promotion being aborted and the election being rerun. There is a pause of election_rerun_interval seconds before the election is rerun.
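A minimal sketch of such a script follows. It assumes failover_validation_command is configured with %n as in the example above, so the candidate's node ID arrives as the first argument. The veto condition is purely illustrative: the marker file /etc/ltcluster/no-promote and the NO_PROMOTE_FILE override variable are hypothetical, not part of ltcluster, and stand in for whatever site-specific check should decide the promotion.

```shell
#!/bin/sh
# Sketch of a failover_validation_command script (hypothetical example).
# Assumes ltcluster.conf contains:
#   failover_validation_command=/path/to/script.sh %n
# so the candidate's node ID arrives as the first argument.

validate_promotion() {
    node_id="$1"

    # Output printed here shows up in the ltclusterd log as
    # "output returned by failover validation command".
    echo "Node ID: ${node_id}"

    # Refuse promotion if %n was not passed or is not a number.
    case "$node_id" in
        ''|*[!0-9]*)
            echo "missing or non-numeric node ID"
            return 1
            ;;
    esac

    # Placeholder site check: the marker file and the NO_PROMOTE_FILE
    # override are hypothetical; substitute your own condition here.
    if [ -e "${NO_PROMOTE_FILE:-/etc/ltcluster/no-promote}" ]; then
        echo "promotion vetoed by marker file"
        return 1
    fi

    return 0
}

# The script's exit status is that of its last command:
# 0 lets the node promote itself; anything else aborts the promotion
# and causes the election to be rerun.
validate_promotion "$1"
```

The script deliberately ends on the function call rather than an explicit exit, so its exit status is simply the function's return value; keeping every decision inside one function also makes the logic easy to exercise by hand before it is relied on in a real failover.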
Sample ltclusterd log file output during which the failover validation script rejects the proposed promotion candidate:
[2019-03-13 21:01:30] [INFO] visible nodes: 2; total nodes: 2; no nodes have seen the primary within the last 4 seconds
[2019-03-13 21:01:30] [NOTICE] promotion candidate is "node2" (ID: 2)
[2019-03-13 21:01:30] [NOTICE] executing "failover_validation_command"
[2019-03-13 21:01:30] [DETAIL] /usr/local/bin/failover-validation.sh 2
[2019-03-13 21:01:30] [INFO] output returned by failover validation command:
Node ID: 2
[2019-03-13 21:01:30] [NOTICE] failover validation command returned a non-zero value: "1"
[2019-03-13 21:01:30] [NOTICE] promotion candidate election will be rerun
[2019-03-13 21:01:30] [INFO] 1 followers to notify
[2019-03-13 21:01:30] [NOTICE] notifying node "node3" (ID: 3) to rerun promotion candidate selection
INFO: node 3 received notification to rerun promotion candidate election
[2019-03-13 21:01:30] [NOTICE] rerunning election after 15 seconds ("election_rerun_interval")
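Before relying on a validation script in a real failover, it can help to run it by hand as the lightdb system user and check how its exit status would be interpreted, mirroring the decision recorded in the log above. The check_validation function below is a hypothetical helper, not part of ltcluster; it only approximates the way ltclusterd invokes the configured command after substituting %n.

```shell
#!/bin/sh
# Hypothetical helper (not part of ltcluster): run a failover validation
# script the way ltclusterd would after substituting %n, and report
# what its exit status would mean for the promotion.
check_validation() {
    script="$1"    # path configured in failover_validation_command
    node_id="$2"   # value that %n would be replaced with

    # Capture the script's output and exit status.
    output=$("$script" "$node_id")
    status=$?

    printf 'output: %s\n' "$output"
    if [ "$status" -eq 0 ]; then
        echo "exit 0: node ${node_id} would promote itself"
    else
        echo "exit ${status}: promotion aborted, election would be rerun"
    fi
    return "$status"
}
```

For example, check_validation /usr/local/bin/failover-validation.sh 2 reproduces the invocation shown in the DETAIL line of the log; a script that exits with 1 there corresponds to the rejected candidate and rerun election recorded in the sample output.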