ltcluster cluster matrix

ltcluster cluster matrix — runs ltcluster cluster show on each node and summarizes output

Description

ltcluster cluster matrix runs ltcluster cluster show on each node and arranges the results in a matrix, recording success or failure.

ltcluster cluster matrix requires a valid ltcluster.conf file on each node. Additionally, passwordless SSH connections are required between all nodes.
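Before running the command, it can help to confirm the SSH requirement from every node. The following is a minimal sketch, not part of ltcluster: the helper name check_passwordless_ssh is hypothetical, and the node names are assumed to resolve as SSH hostnames for the operating-system user that runs ltcluster. The OpenSSH option BatchMode=yes makes ssh fail instead of prompting for a password, so any node that would need one is reported.

```shell
# Hypothetical pre-flight check: verify that this host can open an SSH
# session to each given node without a password prompt.
check_passwordless_ssh() {
  local failed=0 host
  for host in "$@"; do
    # BatchMode=yes: never prompt; ConnectTimeout: give up on dead hosts
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true >/dev/null 2>&1; then
      echo "ok:     $host"
    else
      echo "FAILED: $host" >&2
      failed=1
    fi
  done
  # Non-zero if at least one node could not be reached without a password
  return "$failed"
}

# Example usage, run on each node in turn (node names from the examples below):
# check_passwordless_ssh node1 node2 node3
```

Because the requirement applies between all nodes, the check has to be repeated on every node, not just on the one where ltcluster cluster matrix is started.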

Examples

Example 1 (all nodes up):

    $ ltcluster -f /etc/ltcluster.conf cluster matrix

    Name   | Id |  1 |  2 |  3
    -------+----+----+----+----
     node1 |  1 |  * |  * |  *
     node2 |  2 |  * |  * |  *
     node3 |  3 |  * |  * |  *

Example 2 (node1 and node2 up, node3 down):

    $ ltcluster -f /etc/ltcluster.conf cluster matrix

    Name   | Id |  1 |  2 |  3
    -------+----+----+----+----
     node1 |  1 |  * |  * |  x
     node2 |  2 |  * |  * |  x
     node3 |  3 |  ? |  ? |  ?

Each row corresponds to one server and shows the results of testing outbound connections from that server; each numbered column corresponds to the target node with that ID.

Since node3 is down, all the entries in its row are filled with ?, meaning that we cannot test outbound connections from it.

The other two nodes are up. In their rows, the column for node3 contains x, meaning that inbound connections to node3 have failed, while the columns for node1 and node2 contain *, meaning that inbound connections to those nodes have succeeded.
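These reading rules can also be applied mechanically. The following minimal sketch assumes the output has exactly the layout shown in these examples (a header row, a separator line, and one |-separated row per node); the helper name matrix_failures is hypothetical and not part of ltcluster. It prints one line for every x (failed) or ? (unknown) cell and nothing for * cells.

```shell
# Hypothetical helper: list the non-successful cells of a matrix read on stdin.
matrix_failures() {
  awk -F'|' '
    # Header row: remember which node ID each result column refers to
    $1 ~ /Name/ {
      for (i = 3; i <= NF; i++) { id[i] = $i; gsub(/ /, "", id[i]) }
      next
    }
    # Data rows have Name, Id and at least one result column;
    # the "----+----" separator contains no "|" and is skipped here
    NF >= 3 {
      name = $1; gsub(/ /, "", name)
      for (i = 3; i <= NF; i++) {
        cell = $i; gsub(/ /, "", cell)
        if (cell == "x")      printf "%s -> id %s: failed\n", name, id[i]
        else if (cell == "?") printf "%s -> id %s: unknown\n", name, id[i]
      }
    }'
}

# Example usage (assumes the matrix is written to standard output):
# ltcluster -f /etc/ltcluster.conf cluster matrix | matrix_failures
```

Applied to Example 2, this reports node1 and node2 as failing to reach ID 3, and all three cells in node3's row as unknown.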

Example 3 (all nodes up, but a firewall drops packets originating from node1 and directed to port 5432 on node3). Running ltcluster cluster matrix from node1 gives the following output:

    $ ltcluster -f /etc/ltcluster.conf cluster matrix

    Name   | Id |  1 |  2 |  3
    -------+----+----+----+----
     node1 |  1 |  * |  * |  x
     node2 |  2 |  * |  * |  *
     node3 |  3 |  ? |  ? |  ?

Note that this command may take some time to run, depending on the connect_timeout setting in the node conninfo strings. The default is 1 minute, which means that, without modification, the command above would take around 2 minutes to run; see the notes elsewhere on setting connect_timeout.
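As a sketch only: assuming each node's connection string is set through a conninfo parameter in ltcluster.conf, a shorter timeout can be requested with the standard libpq connect_timeout keyword, whose value is in seconds. The host, user and database names below are hypothetical placeholders for node1.

```
# Fragment of /etc/ltcluster.conf (hypothetical values for node1)
conninfo='host=node1 user=ltcluster dbname=ltcluster connect_timeout=2'
```

With a 2-second timeout, each unreachable connection in the matrix costs about 2 seconds instead of about a minute.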

The matrix tells us that we cannot connect from node1 to node3, and that (therefore) we don't know the state of any outbound connection from node3.

In this case, the ltcluster cluster crosscheck command will produce a more useful result.

Exit codes

One of the following exit codes will be emitted by ltcluster cluster matrix:

SUCCESS (0)

The check completed successfully and all nodes are reachable.

ERR_BAD_SSH (12)

One or more nodes could not be accessed via SSH.

ERR_NODE_STATUS (25)

LightDB on one or more nodes could not be reached.

Note

This error code overrides ERR_BAD_SSH: if both conditions occur, ERR_NODE_STATUS is returned.
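A script that calls ltcluster cluster matrix can branch on these exit codes. The following is a minimal sketch in which the helper name matrix_status_text is hypothetical; it recognises only the three codes listed above and reports anything else as unexpected.

```shell
# Hypothetical helper: translate the exit status of
# "ltcluster cluster matrix" into a short description.
matrix_status_text() {
  case "$1" in
    0)  echo "SUCCESS: all nodes reachable" ;;
    12) echo "ERR_BAD_SSH: one or more nodes not reachable via SSH" ;;
    25) echo "ERR_NODE_STATUS: LightDB unreachable on one or more nodes" ;;
    *)  echo "unexpected exit code $1" ;;
  esac
}

# Example usage (requires ltcluster and the configuration file used above):
# ltcluster -f /etc/ltcluster.conf cluster matrix
# matrix_status_text "$?"
```

Since ERR_NODE_STATUS takes precedence, an exit code of 25 does not rule out SSH problems as well; only 12 tells you that SSH was the sole failure.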