Important Node Metrics
When you visit the metrics endpoint (see Node Inspection Service), you will notice that there are a large number of metrics and counters being produced by your node. Most of these metrics and counters are useful only for blockchain development and diagnosing hard-to-find issues. As a result, we recommend that node operators ignore most metrics and pay attention to only the key metrics presented below:
As Aptos continues to grow and develop the blockchain software, many metrics will come and go. As a result, we recommend relying on the presence of only the metrics explicitly mentioned below. All other metrics should be considered unstable and may be changed/removed without warning.
Consensus
If you are running a validator node, the following consensus metrics are important:
aptos_consensus_proposals_count
: Counts the number of times the node sent a block proposal to the network. The count will increase only when the validator is chosen to be a proposer, which depends on the node's stake and leader election reputation. You should expect this metric to increase at least once per hour.aptos_consensus_last_committed_round
: Counts the last committed round of the node. During consensus, we expect this value to increase once per consensus round, which should be multiple times per second. If this does not happen, it is likely the node is not participating in consensus.aptos_consensus_timeout_count
: Counts the number of times the node locally timed out while trying to participate in consensus. If this counter increases, it is likely the node is not participating in consensus and may be having issues, e.g., network difficulties.aptos_state_sync_executing_component_counters{label="consensus"
: This counter increases a few times per second as long as the node is participating in consensus. When this counter stops increasing, it means the node is not participating in consensus, and has likely fallen back to state synchronization (e.g., because it fell behind the rest of the validators and needs to catch up).
State sync
If you are running a fullnode (or a validator that still needs to synchronize to the latest blockchain state), the following state sync metrics are important:
aptos_state_sync_version{type="synced"}
: This metric displays the current synced version of the node, i.e., the number of transactions the node has processed. If this metric stops increasing, it means the node is not syncing. Likewise, if this metric doesn't increase faster than the rate at which new transactions are committed to the blockchain, it means the node is unlikely to get and stay up-to-date with the rest of the network. Note: if you've selected to use fast sync, this metric won't increase until all states have been downloaded, which may take some time. See (3) below.aptos_data_client_highest_advertised_data{data_type="transactions"}
: This metric displays the highest version synced and advertised by the peers that your node is connected to. As a result, when this metric is higher thanaptos_state_sync_version{type="synced"}
(above) it means your node can see new blockchain data and will sync the data from its peers.aptos_state_sync_version{type="synced_states"}
: This metric counts the number of states that have been downloaded while a node is fast syncing. If this metric doesn't increase, andaptos_state_sync_version{type="synced"}
doesn't increase (from above), it means the node is not syncing at all and an issue has likely occurred.aptos_state_sync_bootstrapper_errors
andaptos_state_sync_continuous_syncer_errors
: If your node is facing issues syncing (or is seeing transient failures), these metrics will increase each time an error occurs. Theerror_label
inside these metrics will display the error type.
Compare the synced version shown by aptos_state_sync_version{type="synced"}
with the highest version shown on the Aptos Explorer to see how far behind the latest blockchain version your node is. Remember to select the correct network that your node is syncing to (e.g., mainnet
).
Networking
The following network metrics are important, for both validators and fullnodes:
aptos_connections{direction="inbound"
andaptos_connections{direction="outbound"
: These metrics count the number of peers your node is connected to, as well as the direction of the network connection. Aninbound
connection means that a peer (e.g., another fullnode) has connected to you. Anoutbound
connection means that your node has connected to another node (e.g., connected to a validator fullnode).- If your node is a validator, the sum of both
inbound
andoutbound
connections should be equal to the number of other validators in the network. Note that only the sum of these connections matter. If all connections areinbound
, or all areoutbound
, this doesn't matter. - If your node is a fullnode, the number of
outbound
connections should be> 0
. This will ensure your node is able to synchronize. Note that the number ofinbound
connections matters only if you want to act as a seed in the network and allow other nodes to connect to you as discussed Fullnode Network Connections.
- If your node is a validator, the sum of both
Mempool
The following mempool metrics are important:
core_mempool_index_size{index="system_ttl"
: This metric displays the number of transactions currently sitting in the mempool of the node and waiting to be committed to the blockchain:- If your node is a fullnode, it's highly unlikely that this metric will be
> 0
, unless transactions are actively being sent to your node via the REST API and/or other fullnodes that have connected to you. Most fullnode operators should ignore this metric. - If your node is a validator, you can use this metric to see if transactions from your node's mempool are being included in the blockchain (e.g., if the count decreases). Likewise, if this metric only increases, it means that either: (i) your node is unable to forward transactions to other validators to be included in the blockchain; or (ii) that the entire blockchain is under heavy load and may soon become congested.
- If your node is a fullnode, it's highly unlikely that this metric will be
REST API
The following REST API metrics are important:
-
aptos_api_requests_count{method="GET"
andaptos_api_requests_count{method="POST"
: These metrics count the number of REST APIGET
andPOST
requests that have been received via the node's REST API. This allows you to monitor and track the amount of REST API traffic on your node. You can also further use theoperation_id
in the metric to monitor the types of operations the requests are performing. -
aptos_api_response_status_count
: This metric counts the number of response types that were sent for the REST API. For example,aptos_api_response_status_count{status="200"}
counts the number of requests that were successfully handled with a200
response code. You can use this metric to track the success and failure rate of the REST API traffic.