ltcluster node rejoin

ltcluster node rejoin — rejoin a dormant (stopped) node to the replication cluster

Description

Enables a dormant (stopped) node to be rejoined to the replication cluster.

This can optionally use lt_rewind to re-integrate a node which has diverged from the rest of the cluster, typically a failed primary.

Tip

If the node is running and needs to be attached to the current primary, use ltcluster standby follow.

Note

ltcluster standby follow can only be used for standbys which have not diverged from the rest of the cluster.

Usage

      ltcluster node rejoin -d '$conninfo'

where $conninfo is the LightDB conninfo string of the current primary node (or that of any reachable node in the cluster, but not the local node). This is so that ltcluster can fetch up-to-date information about the current state of the cluster.

ltcluster.conf for the stopped node *must* be supplied explicitly if not otherwise available.
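
For example, assuming the current primary is reachable as node1 (host name illustrative) and ltcluster.conf is supplied explicitly:

      ltcluster node rejoin -f /etc/ltcluster.conf -d 'host=node1 dbname=ltcluster user=ltcluster'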

Options

--dry-run

Check prerequisites but don't actually execute the rejoin.

--force-rewind[=/path/to/lt_rewind]

Execute lt_rewind if required; optionally, the full path to the lt_rewind binary can be provided.
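
For example, to provide the path to the lt_rewind binary explicitly (path illustrative):

      ltcluster node rejoin -f /etc/ltcluster.conf -d 'host=node1 dbname=ltcluster user=ltcluster' \
          --force-rewind=/opt/lightdb/bin/lt_rewind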

--config-files

Comma-separated list of configuration files to retain after executing lt_rewind.

Currently lt_rewind will overwrite the local node's configuration files with the files from the source node, so it's advisable to use this option to ensure they are kept.

--config-archive-dir

Directory to temporarily store configuration files specified with --config-files; default: /tmp.
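
For example, to retain lightdb.conf and archive it under /var/tmp rather than the default /tmp (paths illustrative):

      ltcluster node rejoin -f /etc/ltcluster.conf -d 'host=node1 dbname=ltcluster user=ltcluster' \
          --force-rewind \
          --config-files=lightdb.conf \
          --config-archive-dir=/var/tmp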

-W/--no-wait

Don't wait for the node to rejoin the cluster.

If this option is supplied, ltcluster will restart the node but not wait for it to connect to the primary.

Configuration file settings

  • node_rejoin_timeout: the maximum length of time (in seconds) to wait for the node to reconnect to the replication cluster (defaults to the value set in standby_reconnect_timeout, 60 seconds).

    Note that standby_reconnect_timeout must be set to a value equal to or lower than node_rejoin_timeout; see the example below.
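
A minimal illustrative ltcluster.conf excerpt satisfying this constraint (values are examples only):

      node_rejoin_timeout = 60
      standby_reconnect_timeout = 60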

Event notifications

A node_rejoin event notification will be generated.
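
To review the event history afterwards, a sketch assuming ltcluster provides a cluster event command with an --event filter:

      ltcluster -f /etc/ltcluster.conf cluster event --event=node_rejoin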

Exit codes

One of the following exit codes will be emitted by ltcluster node rejoin:

SUCCESS (0)

The node rejoin succeeded; or if --dry-run was provided, no issues were detected which would prevent the node rejoin.

ERR_BAD_CONFIG (1)

A configuration issue was detected which prevented ltcluster from continuing with the node rejoin.

ERR_NO_RESTART (4)

The node could not be restarted.

ERR_REJOIN_FAIL (24)

The node rejoin operation failed.
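
These exit codes can be acted on in a wrapper script; a minimal sketch (conninfo illustrative):

      #!/bin/sh
      # Attempt the rejoin, then report based on the documented exit codes
      ltcluster node rejoin -f /etc/ltcluster.conf \
          -d 'host=node1 dbname=ltcluster user=ltcluster' --force-rewind
      rc=$?
      case $rc in
          0)  echo "node rejoin succeeded" ;;
          1)  echo "configuration issue detected" >&2 ;;
          4)  echo "node could not be restarted" >&2 ;;
          24) echo "node rejoin operation failed" >&2 ;;
      esac
      exit $rc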

Notes

Currently ltcluster node rejoin can only be used to attach a standby to the current primary, not another standby.

The node's LightDB instance must have been shut down cleanly. If this was not the case, it will need to be started up until it has reached a consistent recovery point, then shut down cleanly.

In LightDB 21 and later, this will be done automatically if the --force-rewind option is provided (even if an actual rewind is not necessary).

Tip

If LightDB is started in single-user mode with input redirected from /dev/null, it will perform recovery and then immediately quit, leaving it in a state suitable for use by lt_rewind.

          rm -f /var/lib/lightdb/data/recovery.conf
          lightdb --single -D /var/lib/lightdb/data/ < /dev/null

Note that standby.signal must be removed from the data directory for LightDB to be able to start in single-user mode.
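
To verify that the instance was shut down cleanly beforehand, the control file can be inspected; a sketch assuming LightDB provides an lt_controldata utility analogous to PostgreSQL's pg_controldata:

      $ lt_controldata /var/lib/lightdb/data | grep 'Database cluster state'
      Database cluster state:               shut down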

Using lt_rewind

ltcluster node rejoin can optionally use lt_rewind to re-integrate a node which has diverged from the rest of the cluster, typically a failed primary. lt_rewind is available in LightDB 21 and later as part of the core distribution.

Note

lt_rewind requires that either wal_log_hints is enabled, or that data checksums were enabled when the cluster was initialized. See the lt_rewind documentation for details.
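
For example, to enable wal_log_hints, set the following in the LightDB configuration file and restart the server:

      wal_log_hints = on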

We strongly recommend familiarizing yourself with lt_rewind before attempting to use it with ltcluster, as while it is an extremely useful tool, it is not a "magic bullet" which can resolve all problematic replication situations.

A typical use-case for lt_rewind is when a scenario like the following is encountered:

    $ ltcluster node rejoin -f /etc/ltcluster.conf -d 'host=node3 dbname=ltcluster user=ltcluster' \
        --force-rewind --config-files=postgresql.local.conf,lightdb.conf --verbose --dry-run
    INFO: replication connection to the rejoin target node was successful
    INFO: local and rejoin target system identifiers match
    DETAIL: system identifier is 6652184002263212600
    ERROR: this node cannot attach to rejoin target node 3
    DETAIL: rejoin target server's timeline 2 forked off current database system timeline 1 before current recovery point 0/610D710
    HINT: use --force-rewind to execute lt_rewind

Here, node3 was promoted to primary while the local node was still attached to the previous primary; this can happen during, for example, a network split. lt_rewind can re-sync the local node with node3, removing the need for a full reclone.

To have ltcluster node rejoin use lt_rewind, pass the command line option --force-rewind, which will tell ltcluster to execute lt_rewind to ensure the node can be rejoined successfully.

lt_rewind and configuration file retention

Be aware that if lt_rewind is executed and actually performs a rewind operation, any configuration files in the LightDB data directory will be overwritten with those from the source server.

To prevent this happening, provide a comma-separated list of files to retain using the --config-files command line option; the specified files will be archived in a temporary directory (whose parent directory can be specified with --config-archive-dir, default: /tmp) and restored once the rewind operation is complete.

Example using ltcluster node rejoin and lt_rewind

This example first uses --dry-run, then actually executes the node rejoin command.

    $ ltcluster node rejoin -f /etc/ltcluster.conf -d 'host=node3 dbname=ltcluster user=ltcluster' \
        --config-files=postgresql.local.conf,lightdb.conf --verbose --force-rewind --dry-run
    INFO: replication connection to the rejoin target node was successful
    INFO: local and rejoin target system identifiers match
    DETAIL: system identifier is 6652460429293670710
    NOTICE: lt_rewind execution required for this node to attach to rejoin target node 3
    DETAIL: rejoin target server's timeline 2 forked off current database system timeline 1 before current recovery point 0/610D710
    INFO: prerequisites for using lt_rewind are met
    INFO: file "postgresql.local.conf" would be copied to "/tmp/ltcluster-config-archive-node2/postgresql.local.conf"
    INFO: file "postgresql.replication-setup.conf" would be copied to "/tmp/ltcluster-config-archive-node2/lightdb.replication-setup.conf"
    INFO: lt_rewind would now be executed
    DETAIL: lt_rewind command is:
      lt_rewind -D '/var/lib/lightdb/data' --source-server='host=node3 dbname=ltcluster user=ltcluster'
    INFO: prerequisites for executing NODE REJOIN are met

Note

If --force-rewind is used together with the --dry-run option, ltcluster checks the prerequisites for using lt_rewind, but this is not an absolute guarantee that actually executing lt_rewind will succeed. See also the section Caveats below.

    $ ltcluster node rejoin -f /etc/ltcluster.conf -d 'host=node3 dbname=ltcluster user=ltcluster' \
        --config-files=postgresql.local.conf,lightdb.conf --verbose --force-rewind
    NOTICE: lt_rewind execution required for this node to attach to rejoin target node 3
    DETAIL: rejoin target server's timeline 2 forked off current database system timeline 1 before current recovery point 0/610D710
    NOTICE: executing lt_rewind
    DETAIL: lt_rewind command is "lt_rewind -D '/var/lib/lightdb/data' --source-server='host=node3 dbname=ltcluster user=ltcluster'"
    NOTICE: 2 files copied to /var/lib/lightdb/data
    NOTICE: setting node 2's upstream to node 3
    NOTICE: starting server using "lt_ctl -l /var/log/lightdb/startup.log -w -D '/var/lib/lightdb/data' start"
    NOTICE: NODE REJOIN successful
    DETAIL: node 2 is now attached to node 3

Caveats when using ltcluster node rejoin

ltcluster node rejoin attempts to determine whether it will succeed by comparing the timelines and relative WAL positions of the local node (the rejoin candidate) and the primary (the rejoin target). This is particularly important if you are planning to use lt_rewind, which currently (as of LightDB 21) may appear to succeed (or indicate that no action is needed) while actually permitting an impossible action, such as rejoining a standby to a primary which is behind the standby. ltcluster will prevent this situation from occurring.

Currently it is not possible to detect a situation where the rejoin target is a standby which has been "promoted" by removing recovery.conf (LightDB 21 and later: standby.signal) and restarting it. In this case there will be no information about the point the rejoin target diverged from the current standby; the rejoin operation will fail and the current standby's LightDB log will contain entries with the text "record with incorrect prev-link".

We strongly recommend running ltcluster node rejoin with the --dry-run option first. Additionally, it may be a good idea to execute the lt_rewind command displayed by ltcluster with lt_rewind's own --dry-run option. Note that lt_rewind does not indicate that it is running in --dry-run mode.
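
For example, taking the command shown in the --dry-run output above and assuming lt_rewind supports the same --dry-run option as its PostgreSQL counterpart pg_rewind:

      lt_rewind --dry-run -D '/var/lib/lightdb/data' \
          --source-server='host=node3 dbname=ltcluster user=ltcluster'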

Warning

In all current LightDB versions (as of September 2020), lt_rewind contains a corner-case bug which affects standbys in a very specific situation.

This situation occurs when a standby was shut down before its primary node, and an attempt is made to attach this standby to another primary in the same cluster (following a "split brain" situation where the standby was connected to the wrong primary). In this case, ltcluster will correctly determine that lt_rewind should be executed, however lt_rewind incorrectly decides that no action is necessary.

In this situation, ltcluster will report something like:

    NOTICE: lt_rewind execution required for this node to attach to rejoin target node 1
    DETAIL: rejoin target server's timeline 3 forked off current database system timeline 2 before current recovery point 0/7019C10

but when executed, lt_rewind will report:

    lt_rewind: servers diverged at WAL location 0/7015540 on timeline 2
    lt_rewind: no rewind required

and if an attempt is made to attach the standby to the new primary, LightDB logs on the standby will contain errors like:

    [2020-09-07 15:01:41 UTC]    LOG:  00000: replication terminated by primary server
    [2020-09-07 15:01:41 UTC]    DETAIL:  End of WAL reached on timeline 2 at 0/7015540.
    [2020-09-07 15:01:41 UTC]    LOG:  00000: new timeline 3 forked off current database system timeline 2 before current recovery point 0/7019C10

Currently it is not possible to resolve this situation using lt_rewind. A patch has been submitted and will hopefully be included in a forthcoming LightDB minor release.

As a workaround:

  • Start the primary server the standby was previously attached to, and ensure the standby can be attached to it. If lt_rewind was actually executed, it will have copied in the .history file from the target primary server; this must be removed (see the sketch below).

  • Use ltcluster node rejoin to attach the standby to the original primary, and ensure any changes pending on the primary have propagated to the standby.

  • Shut down the primary server first, then shut down the standby.

It should then be possible to use ltcluster node rejoin to attach the standby to the new primary.
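
For the .history file removal mentioned above, a sketch assuming a PostgreSQL-style layout in which timeline history files are stored in the pg_wal subdirectory (file name illustrative):

      rm /var/lib/lightdb/data/pg_wal/00000003.history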

See also

ltcluster standby follow