7.4. Troubleshooting switchover issues

As emphasised previously, performing a switchover is a non-trivial operation, and a number of issues can occur. While ltcluster attempts to perform sanity checks, there is no guaranteed way of determining whether a switchover will succeed without actually carrying it out.

7.4.1. Demotion candidate (old primary) does not shut down

ltcluster may abort a switchover with a message like:

ERROR: shutdown of the primary server could not be confirmed
HINT: check the primary server status before performing any further actions

This means the shutdown of the old primary has taken longer than ltcluster expected, and it has given up waiting.

In this case, check the LightDB log on the primary server to see why the shutdown has not completed. It is entirely possible that the shutdown process is simply taking longer than the timeout set by the configuration parameter shutdown_check_timeout (default: 60 seconds), in which case you may need to increase this parameter.

Note

shutdown_check_timeout is set on the node where ltcluster standby switchover is executed (the promotion candidate); setting it on the demotion candidate (the former primary) has no effect.
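
To illustrate why a slow shutdown produces the error above, the following is a minimal Python sketch of a poll-with-timeout loop. It is not ltcluster's actual implementation; the function wait_for_shutdown and the callback is_shut_down are hypothetical names used only for illustration, and the 60-second default simply mirrors the documented default of shutdown_check_timeout.

```python
import time


def wait_for_shutdown(is_shut_down, timeout_seconds=60, interval_seconds=1.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Poll is_shut_down() until it returns True or the timeout elapses.

    is_shut_down is a hypothetical callback that reports whether the old
    primary has stopped. Returns True if shutdown was confirmed in time,
    False if the caller should give up (the situation reported above).
    """
    deadline = clock() + timeout_seconds
    while True:
        if is_shut_down():
            return True
        if clock() >= deadline:
            # Shutdown may still be in progress; it was just not
            # confirmed within the allowed time.
            return False
        sleep(interval_seconds)
```

A False result here does not mean the shutdown failed, only that it was not confirmed within the timeout, which is why checking the log on the primary is the recommended first step.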

If the primary server has shut down cleanly, and no other node has been promoted, it is safe to restart it, in which case the replication cluster will be restored to its original configuration.

7.4.2. Switchover aborts with an "exclusive backup" error

ltcluster may abort a switchover with a message like:

ERROR: unable to perform a switchover while primary server is in exclusive backup mode
HINT: stop backup before attempting the switchover

This means an exclusive backup is running on the current primary; interrupting it would not only abort the backup, but could also leave the primary with an ambiguous backup state.

To proceed, either wait until the backup has finished, or cancel it with the command SELECT pg_stop_backup(). For more details see the LightDB documentation section Making an exclusive low level backup.
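
The cancel step can be sketched in Python as follows. This is a hedged example, not part of ltcluster: the function name cancel_exclusive_backup is hypothetical, the cursor is assumed to be any Python DB-API cursor connected to the current primary, and the check relies on pg_is_in_backup(), a PostgreSQL function (present in releases before 15) which is assumed here to be available in LightDB as well.

```python
def cancel_exclusive_backup(cursor):
    """Stop an exclusive backup on the primary if one is in progress.

    cursor: a DB-API 2.0 cursor connected to the current primary.
    Returns True if a backup was stopped, False if none was running.
    """
    # Assumption: pg_is_in_backup() exists on this server.
    cursor.execute("SELECT pg_is_in_backup()")
    (in_backup,) = cursor.fetchone()
    if not in_backup:
        return False
    # This is the command given in the text above.
    cursor.execute("SELECT pg_stop_backup()")
    return True
```

Checking first avoids issuing pg_stop_backup() when no exclusive backup is active, in which case the statement would raise an error.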
