When ltclusterd is running with the option monitoring_history=true, it continuously writes standby node status information to the monitoring_history table, providing a near-real-time overview of replication status on all nodes in the cluster.
The view replication_status shows the most recent state for each node, e.g.:
ltcluster=# select * from ltcluster.replication_status;
-[ RECORD 1 ]-------------+------------------------------
primary_node_id           | 1
standby_node_id           | 2
standby_name              | node2
node_type                 | standby
active                    | t
last_monitor_time         | 2017-08-24 16:28:41.260478+09
last_wal_primary_location | 0/6D57A00
last_wal_standby_location | 0/5000000
replication_lag           | 29 MB
replication_time_lag      | 00:00:11.736163
apply_lag                 | 15 MB
communication_time_lag    | 00:00:01.365643
The interval at which monitoring history is written is controlled by the configuration parameter monitor_interval_secs; the default is 2 seconds.
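As a rough illustration, the two settings described above might appear together in a node's configuration file as shown here. This is a sketch only: it assumes monitoring_history is set in ltcluster.conf alongside the other ltclusterd settings, and the interval of 10 is an arbitrary example value rather than a recommendation.

```
# ltcluster.conf (excerpt; all other settings omitted)

# Have ltclusterd write standby status to the monitoring_history table.
monitoring_history=true

# Write interval in seconds. Example value only; the default is 2.
monitor_interval_secs=10
```

A longer interval writes fewer rows, at the cost of a coarser view of replication status.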
As this can generate a large amount of monitoring data in the table ltcluster.monitoring_history, it's advisable to regularly purge historical data using the ltcluster cluster cleanup command; use the -k/--keep-history option to specify how many days' worth of data should be retained.
It's possible to run ltclusterd in monitoring mode only (without automatic failover capability) for some or all nodes by setting failover=manual in the node's ltcluster.conf file. In the event of the node's upstream failing, no failover action will be taken and the node will require manual intervention to be reattached to replication. If this occurs, an event notification standby_disconnect_manual will be created.
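A monitoring-only node could be configured along these lines. This is a minimal sketch, assuming monitoring_history is set in the same ltcluster.conf file as failover; every other setting the node needs is omitted.

```
# ltcluster.conf (excerpt; all other settings omitted)

# Take no automatic action if this node's upstream fails.
failover=manual

# Keep recording replication status while in monitoring-only mode.
monitoring_history=true
```

Because the setting is per node, it can be applied to some standbys only.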
Note that when a standby node is not streaming directly from its upstream node, e.g. recovering WAL from an archive, apply_lag will always appear as 0 bytes.
If monitoring history is enabled, the contents of the ltcluster.monitoring_history
table will be replicated to attached standbys. This means there will be a small but
constant stream of replication activity which may not be desirable. To prevent
this, convert the table to an UNLOGGED one with:
ALTER TABLE ltcluster.monitoring_history SET UNLOGGED;
This will however mean that monitoring history will not be available on
another node following a failover, and the view ltcluster.replication_status
will not work on standbys.