Running a validator node can sometimes present challenges. Here’s a guide to troubleshooting some common issues you might encounter with your Shardeum node.
Version Mismatch After a Network Upgrade:
Symptom: Node logs show errors like "invoke-exit: exitUncleanly: isReadyToJoin: Not ready to join", or the node stops shortly after starting, especially after a known network upgrade.
Cause: The Shardeum network (testnet, stagenet) has been upgraded to a new version, and your validator is running an older, incompatible Docker image.
Solution:
Check official Shardeum channels (Discord, Telegram, forums) for announcements about the latest validator image tag for the specific network you're on.
Pull the new Docker image: docker pull ghcr.io/shardeum/shardeum-validator:NEW_TAG
Update your docker-compose.yml to use the NEW_TAG.
Recreate your container: docker-compose down && docker-compose up -d.
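Before recreating the container, it can help to confirm that docker-compose.yml really references the new tag. The sketch below is a hypothetical helper, not part of the Shardeum tooling: the function name compose_uses_tag is made up for illustration, and it assumes the compose file contains the full image reference (ghcr.io/shardeum/shardeum-validator:TAG) as a single whitespace-separated token, optionally quoted.

```shell
#!/usr/bin/env bash
# Image name taken from the docker pull command in this guide.
IMAGE="ghcr.io/shardeum/shardeum-validator"

# Hypothetical helper: succeed only if the compose file references the
# validator image with exactly the given tag.
compose_uses_tag() {
  local compose_file="$1" tag="$2"
  # Strip quotes, then compare each whitespace-separated token exactly,
  # so that tag "1.2" does not accidentally match "1.2.3".
  tr -d "\"'" < "$compose_file" |
    awk -v want="${IMAGE}:${tag}" '
      { for (i = 1; i <= NF; i++) if ($i == want) found = 1 }
      END { exit !found }'
}
```

For example, compose_uses_tag docker-compose.yml NEW_TAG && docker-compose down && docker-compose up -d only recreates the container once the file has been updated.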
Port Accessibility Issues:
Symptom: Node status shows state: stopped with exitMessage: "Unable to access external or internal ports...", or the node is stuck in waiting-for-network.
Cause: The network cannot reach your validator on its configured P2P ports (SHMINT, SHMEXT).
Solution:
Verify your docker-compose.yml: Ensure correct port mapping (e.g., "9001:9001", "10001:10001"). The host port and container port for SHMINT and SHMEXT must match the values set in the environment variables.
Firewall: Check your server's firewall (e.g., ufw on Ubuntu, cloud provider security groups). Ensure the host ports used for SHMINT and SHMEXT are open for incoming TCP traffic from the internet.
Public IP: Confirm your node is correctly detecting its public IP. If it sits behind multiple layers of NAT or a VPN, it may detect an address that other nodes cannot reach.
Test reachability from a machine outside your network: curl http://YOUR_SERVER_PUBLIC_IP:HOST_PORT_FOR_SHMEXT/nodeinfo.
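The reachability test can be wrapped in a small script so it fails fast instead of hanging when a port is blocked. This is a minimal sketch: the function names nodeinfo_url and check_nodeinfo and the 10-second default timeout are assumptions chosen for illustration. Only the /nodeinfo path comes from the curl command in this guide.

```shell
#!/usr/bin/env bash
# Build the /nodeinfo URL for a public IP and the SHMEXT host port.
nodeinfo_url() {
  local ip="$1" port="$2"
  printf 'http://%s:%s/nodeinfo\n' "$ip" "$port"
}

# Return 0 if the endpoint answers with a success status within the
# timeout (default 10 seconds, an arbitrary choice); non-zero otherwise.
check_nodeinfo() {
  local ip="$1" port="$2" timeout="${3:-10}"
  curl --silent --show-error --fail --max-time "$timeout" \
    "$(nodeinfo_url "$ip" "$port")" >/dev/null
}
```

Run it from a machine outside your own network, for example: check_nodeinfo YOUR_SERVER_PUBLIC_IP HOST_PORT_FOR_SHMEXT && echo reachable || echo unreachable. A failure here points at the firewall or port mapping rather than at the node software.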
Insufficient System Resources:
Symptom: Node crashes, becomes unresponsive, or logs show out-of-memory errors.
Cause: Your server doesn't meet the minimum CPU, RAM, or disk space requirements.
Solution: Upgrade your server resources according to Shardeum's official recommendations. Ensure ample free disk space.
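To compare a server against the recommended minimums, the current figures can be read with standard tools. The sketch below assumes a Linux host; the helper names are hypothetical, and no thresholds are hard-coded because the required values must come from Shardeum's official recommendations.

```shell
#!/usr/bin/env bash
# Free disk space (in KiB) on the filesystem that holds the given path.
free_disk_kb() {
  df -Pk "${1:-/}" | awk 'NR == 2 { print $4 }'
}

# Total RAM in KiB, read from /proc/meminfo (Linux only).
total_mem_kb() {
  awk '/^MemTotal:/ { print $2 }' /proc/meminfo
}

# Succeed if an observed integer value meets a required minimum.
meets_minimum() {
  local observed="$1" required="$2"
  [ "$observed" -ge "$required" ]
}
```

For example, meets_minimum "$(free_disk_kb /)" "$REQUIRED_DISK_KB" || echo "low disk space", where REQUIRED_DISK_KB is a value you set yourself from the official requirements.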
Corrupted Data / Database Issues:
Symptom: Node fails to start, and logs show errors related to database files.
Cause: Improper shutdown, disk errors, or other issues might corrupt the node's local data.
Solution:
Backup secrets.json! This file contains your validator's identity.
Stop the node, remove everything in the data directory (the one mapped as a volume in your docker-compose.yml) except secrets.json, and restart. The node will then resync.
If secrets.json is corrupted and you have no backup, the stake associated with that specific validator identity might be difficult to recover without the programmatic unstaking methods (see advanced guides).
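The backup-then-reset procedure is easy to get wrong by hand, so here is a hedged sketch of it as a script. The function name reset_data_dir is hypothetical, and it assumes secrets.json sits at the top level of the host directory mapped into the container. Stop the node before running anything like this.

```shell
#!/usr/bin/env bash
# Hypothetical helper: back up secrets.json, then empty the data directory
# while leaving secrets.json in place.
reset_data_dir() {
  local data_dir="$1" backup_dir="$2"
  [ -d "$data_dir" ] || { echo "no such directory: $data_dir" >&2; return 1; }
  [ -f "$data_dir/secrets.json" ] || { echo "secrets.json not found; aborting" >&2; return 1; }

  mkdir -p "$backup_dir" || return 1
  # Copy first, and refuse to delete anything if the backup fails.
  cp -p "$data_dir/secrets.json" "$backup_dir/secrets.json" || return 1

  # Remove every top-level entry except secrets.json.
  find "$data_dir" -mindepth 1 -maxdepth 1 ! -name 'secrets.json' \
    -exec rm -rf -- {} +
}
```

The function aborts before deleting anything if secrets.json is missing or cannot be copied, which protects the validator identity.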
Software Bugs:
Symptom: Unexplained crashes, persistent errors even with correct configuration.
Cause: A bug in the current validator software version.
Solution:
Check official Shardeum channels for any known issues or patches.
Provide detailed logs to the Shardeum team (see below).
Usually means your server's clock is out of sync with the network. Ensure your system time is synchronized using NTP.
Can also be caused by high network latency or issues with the RPC endpoint.
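On a systemd-based host, clock synchronization can be checked with timedatectl. The sketch below assumes such a host; the function name clock_sync_state is hypothetical, and it reports "unknown" when timedatectl is missing or cannot answer (for example inside a minimal container).

```shell
#!/usr/bin/env bash
# Print "yes", "no", or "unknown" depending on whether systemd reports
# the system clock as NTP-synchronized.
clock_sync_state() {
  local state=""
  if command -v timedatectl >/dev/null 2>&1; then
    state="$(timedatectl show -p NTPSynchronized --value 2>/dev/null)" || true
  fi
  case "$state" in
    yes|no) echo "$state" ;;
    *)      echo "unknown" ;;
  esac
}
```

If the result is "no", sudo timedatectl set-ntp true enables the built-in time synchronization on hosts that use systemd-timesyncd.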
Error: No stake found (during unstake):
The wallet you're using (identified by its private key) has no SHM staked to any node.
A previous unstake attempt for this stake might have succeeded (even if the CLI seemed stuck). Check your wallet balance on the explorer.
This node is in the network's Standby list. You can unstake only after the node leaves the Standby list! (or similar messages for active / ready state):
You are trying to unstake a node that has not been properly stopped or has not yet passed its stake lock period. Follow the correct unstaking sequence: wait for standby, then operator-cli stop, then wait for stakeState.unlocked: true and stakeState.remainingTime: 0 before running operator-cli unstake.
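The final condition in that sequence can be written as a small guard. This sketch deliberately does not parse operator-cli status output, whose exact format is not shown here; it takes the two stakeState values as arguments that you read from the status yourself. The function name ready_to_unstake is hypothetical.

```shell
#!/usr/bin/env bash
# Succeed only when stakeState.unlocked is "true" and
# stakeState.remainingTime is "0".
ready_to_unstake() {
  local unlocked="$1" remaining="$2"
  [ "$unlocked" = "true" ] && [ "$remaining" = "0" ]
}
```

For example, ready_to_unstake true 0 && operator-cli unstake runs the unstake only when both conditions hold.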
Stake amount is less than minimum required stake amount:
You are trying to stake less than the network's minimum requirement (e.g., less than 2400 SHM on stagenet).
AxiosError: timeout of XXXXms exceeded / Unable to fetch data from network (out of retries: unknown reason) (during stake/status):
Could be temporary network issues or problems with the archivers/RPC endpoint. Try again after a few minutes.
Can also occur if the staker wallet has insufficient funds (the error message isn't always direct for this case).
If you are trying to stake and stakeable.reason in operator-cli status shows "Network request failed, allowing stake by default", the CLI could not verify the 30-minute staking cooldown because of network issues. The stake might still fail if you're within the cooldown.
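For the temporary network issues described above, retrying after a pause can be automated. This is a generic sketch, not a Shardeum tool: the retry function, its argument order (attempts, delay in seconds, then the command), and the example values are all assumptions.

```shell
#!/usr/bin/env bash
# Run a command up to ATTEMPTS times, sleeping DELAY seconds between tries.
# Usage: retry ATTEMPTS DELAY command [args...]
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local n=1
  while true; do
    "$@" && return 0
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}
```

For example, retry 5 120 operator-cli status checks the status up to five times, two minutes apart. Retrying does not help when the real cause is insufficient funds or an active staking cooldown, so check the wallet balance if the error persists.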
The node process might have crashed or is not running properly. Check pm2 logs inside the container.
Could also indicate an issue with the secrets.json file or the node's ability to read its own state.
Failed to execute unstake transaction: Error: processing response error (body={\"jsonrpc\":\"2.0\",\"id\":53,\"error\":{\"code\":101,...}}):
This is a generic RPC error from the network. The actual error is in the message field within the JSON body, for example: \"message\":\"This node is still selected in the network. You can unstake only after the node leaves the network!\"
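Pulling the message field out of such an error line can be done with sed alone. The sketch below is a hypothetical helper (rpc_error_message is not part of the CLI) and assumes the message text itself contains no double quotes. It accepts the body with or without backslash-escaped quotes.

```shell
#!/usr/bin/env bash
# Read an error line on stdin and print the value of its "message" field.
rpc_error_message() {
  sed -n \
    -e 's/\\"/"/g' \
    -e 's/.*"message"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

Paste the error line into it, for example: echo 'PASTED_ERROR_LINE' | rpc_error_message. If nothing is printed, the line had no message field in the expected form.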