12.4. Storing monitoring data

When ltclusterd is running with the option monitoring_history=true, it will constantly write standby node status information to the monitoring_history table, providing a near-real-time overview of replication status on all nodes in the cluster.

The view replication_status shows the most recent state for each node, e.g.:

    ltcluster=# select * from ltcluster.replication_status;
    -[ RECORD 1 ]-------------+------------------------------
    primary_node_id           | 1
    standby_node_id           | 2
    standby_name              | node2
    node_type                 | standby
    active                    | t
    last_monitor_time         | 2017-08-24 16:28:41.260478+09
    last_wal_primary_location | 0/6D57A00
    last_wal_standby_location | 0/5000000
    replication_lag           | 29 MB
    replication_time_lag      | 00:00:11.736163
    apply_lag                 | 15 MB
    communication_time_lag    | 00:00:01.365643

The interval at which monitoring history is written is controlled by the configuration parameter monitor_interval_secs; the default is 2 (seconds).
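The two parameters described above can be combined in the node's ltcluster.conf; a minimal sketch (the interval value shown is illustrative, not a recommendation):

```
# Enable writing of monitoring data to ltcluster.monitoring_history
monitoring_history=true

# Write a monitoring record every 5 seconds (default: 2)
monitor_interval_secs=5
```

Note that increasing monitor_interval_secs reduces the volume of data written to ltcluster.monitoring_history at the cost of coarser-grained lag statistics.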

As this can generate a large amount of monitoring data in the table ltcluster.monitoring_history, it's advisable to regularly purge historical data using the ltcluster cluster cleanup command; use the -k/--keep-history option to specify how many days' worth of data should be retained.
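A purge invocation might look like the following; the 30-day retention period and the crontab schedule are illustrative examples, not recommendations:

```
# Purge monitoring data, keeping the most recent 30 days' worth
ltcluster cluster cleanup --keep-history=30

# Example crontab entry running the purge daily at 02:00
0 2 * * * ltcluster cluster cleanup -k 30
```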

It's possible to run ltclusterd in monitoring-only mode (without automatic failover capability) for some or all nodes by setting failover=manual in the node's ltcluster.conf file. In the event of the node's upstream failing, no failover action will be taken, and the node will require manual intervention to be reattached to replication. If this occurs, an event notification standby_disconnect_manual will be created.
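In ltcluster.conf, monitoring-only operation for a node is therefore a matter of combining the settings above:

```
# Run ltclusterd in monitoring-only mode for this node: on upstream
# failure no failover action is taken; instead a
# "standby_disconnect_manual" event notification is generated
failover=manual
monitoring_history=true
```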

Note that when a standby node is not streaming directly from its upstream node, e.g. recovering WAL from an archive, apply_lag will always appear as 0 bytes.
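To inspect lag over time rather than just the latest state, the ltcluster.monitoring_history table can be queried directly. A sketch, assuming the table exposes columns analogous to those shown in the replication_status view above:

```
-- Illustrative query: monitoring records for each standby over the
-- last hour (column names assumed to mirror replication_status)
SELECT standby_node_id,
       last_monitor_time,
       replication_lag,
       apply_lag
  FROM ltcluster.monitoring_history
 WHERE last_monitor_time > now() - interval '1 hour'
 ORDER BY standby_node_id, last_monitor_time;
```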

Tip

If monitoring history is enabled, the contents of the ltcluster.monitoring_history table will be replicated to attached standbys. This means there will be a small but constant stream of replication activity which may not be desirable. To prevent this, convert the table to an UNLOGGED one with:

     ALTER TABLE ltcluster.monitoring_history SET UNLOGGED;

However, this means that monitoring history will not be available on another node following a failover, and the view ltcluster.replication_status will not work on standbys.