
Advanced Operations

This guide covers advanced operational topics for running production-grade Shardeum nodes.

The current codebase supports these advanced operations, but anyone attempting them should have a solid understanding of blockchain infrastructure and system administration.

1. Production Deployment Best Practices

Using systemd Service

Create a systemd service file for automatic restarts and easier management:

sudo nano /etc/systemd/system/shardeumd.service

Example service file:

[Unit]
Description=Shardeum Node
Wants=network-online.target
After=network-online.target
 
[Service]
User=root
ExecStart=/usr/local/bin/shardeumd start --home /root/.mainnet/node0
Restart=on-failure
RestartSec=3
LimitNOFILE=65535
 
[Install]
WantedBy=multi-user.target

Enable and start the service:

sudo systemctl daemon-reload
sudo systemctl enable shardeumd
sudo systemctl start shardeumd
sudo systemctl status shardeumd

Firewall Configuration

For Full Nodes / RPC Nodes:

sudo ufw allow 26656/tcp  # P2P
sudo ufw allow 26657/tcp  # RPC (optional; only open if remote access is required)
sudo ufw allow 8545/tcp   # JSON-RPC
sudo ufw allow 8546/tcp   # WebSocket
sudo ufw enable

For Validators:

sudo ufw allow 26656/tcp  # P2P only
sudo ufw enable

For validators, consider restricting RPC access to localhost only. Never expose validator RPC endpoints publicly.
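
A minimal sketch of the localhost-only binding, assuming the node exposes the Tendermint-style RPC section in config.toml used elsewhere in this guide:

[rpc]
# Bind the RPC server to localhost so it is unreachable from outside the host
laddr = "tcp://127.0.0.1:26657"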

2. Monitoring and Alerting

Enable Prometheus Metrics

Edit config.toml:

prometheus = true
prometheus_listen_addr = ":26660"
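
After restarting the node, you can confirm that metrics are being exported (assuming the listen address above):

curl -s http://localhost:26660/metrics | head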

Monitoring Stack Setup

Recommended tools:

  • Prometheus - Metrics collection
  • Grafana - Visualization dashboards
  • Alertmanager - Alert notifications

Key metrics to monitor:

  • Sync status
  • Block height
  • Validator jail status
  • Disk space usage
  • Memory usage
  • CPU usage
  • Missed blocks
  • Peer count
  • Network latency

Alert Conditions

Set up alerts for the following conditions (a minimal peer-count check is sketched after this list):

  • Node falls behind by more than 100 blocks
  • Validator is jailed
  • Disk usage exceeds 80%
  • Memory usage exceeds 90%
  • Peer count drops below 5
  • Node stops producing blocks (for validators)
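
As referenced above, a minimal peer-count check built on the same net_info endpoint used in the debugging section of this guide; it could be run from cron or wired into your alerting webhook:

#!/bin/bash
# Print a warning if the peer count drops below 5
PEERS=$(curl -s http://localhost:26657/net_info | jq -r '.result.n_peers')
if [ "${PEERS:-0}" -lt 5 ]; then
    echo "WARNING: peer count is $PEERS (below threshold of 5)"
fi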

3. Security Best Practices

Sentry Node Architecture

A recommended production setup for validators:

Internet → Sentry Nodes (Public) → Validator (Private IP only)

Benefits:

  • Hides validator's IP address
  • Absorbs DDoS traffic
  • Reduces attack surface
  • Improves security

Configuration (the relevant config.toml settings are sketched after this list):

  1. Run validator on private network
  2. Connect validator only to sentry nodes
  3. Configure sentry nodes with public IPs
  4. Update persistent_peers to point validator at sentries
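
A sketch of the relevant config.toml entries ([p2p] section), assuming Tendermint-style peer settings; node IDs and addresses are placeholders:

# Validator's config.toml: talk only to your sentries and disable peer exchange
pex = false
persistent_peers = "<sentry_node_id>@<sentry_private_ip>:26656"

# Each sentry's config.toml: keep the validator's address out of peer gossip
private_peer_ids = "<validator_node_id>"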

Key Management System (KMS)

For enhanced security, consider:

  • Tendermint KMS for validator key management
  • Hardware Security Modules (HSM) for key storage
  • YubiHSM2 integration
  • Remote signing capabilities

KMS setup requires advanced configuration. Thoroughly test in a non-production environment first.

Security Checklist

  • ✅ Use firewall rules to restrict access
  • ✅ Disable SSH password authentication (use keys only; see the sketch after this list)
  • ✅ Keep system packages updated
  • ✅ Use fail2ban or similar intrusion prevention
  • ✅ Implement DDoS protection
  • ✅ Regular security audits
  • ✅ Monitor logs for suspicious activity
  • ✅ Use VPN for administrative access
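
As referenced in the SSH item above, a minimal sketch for disabling password authentication (make sure key-based login works before applying this):

# Require key-based SSH logins only
sudo sed -i 's/^#\?PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
sudo systemctl restart ssh   # the service is named sshd on some distributions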

4. Backup and Recovery

Critical Files to Back Up

Validator-specific:

~/.mainnet/$NODE_ID/config/priv_validator_key.json
~/.mainnet/$NODE_ID/data/priv_validator_state.json

All nodes:

~/.mainnet/$NODE_ID/config/node_key.json
~/.mainnet/$NODE_ID/config/config.toml
~/.mainnet/$NODE_ID/config/app.toml

Wallet keys:

# Mnemonic phrase (keep offline and secure)

Backup Script Example

#!/bin/bash
set -euo pipefail

NODE_ID="node0"
BACKUP_DIR="/secure/backup/location"
DATE=$(date +%Y%m%d_%H%M%S)
 
# Create backup directory
mkdir -p "$BACKUP_DIR/$DATE"
 
# Back up critical files
cp ~/.mainnet/"$NODE_ID"/config/priv_validator_key.json "$BACKUP_DIR/$DATE/"
cp ~/.mainnet/"$NODE_ID"/config/node_key.json "$BACKUP_DIR/$DATE/"
cp ~/.mainnet/"$NODE_ID"/config/*.toml "$BACKUP_DIR/$DATE/"
 
# Create compressed archive (encrypt it before moving it off the host; see below)
tar -czf "$BACKUP_DIR/backup_$DATE.tar.gz" -C "$BACKUP_DIR" "$DATE"
rm -rf "${BACKUP_DIR:?}/$DATE"
 
echo "Backup completed: backup_$DATE.tar.gz"

Disaster Recovery

If validator key is compromised:

  1. Immediately unbond and remove validator
  2. Generate new keys
  3. Create new validator
  4. Report incident to network

If node fails:

  1. Deploy new server with identical configuration
  2. Restore backup files
  3. Sync node to current block height
  4. Unjail validator if necessary (an example command follows this list)
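
If the validator was jailed during the outage, the unjail transaction uses the same slashing module as the signing-info query later in this guide; a rough sketch (the key name is a placeholder, and flags may differ in your setup):

shardeumd tx slashing unjail --from <validator_key_name> --chain-id shardeum_8118-1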

5. Performance Optimization

Pruning Strategies

Full nodes (custom pruning):

pruning = "custom"
pruning-keep-recent = "10000"
pruning-interval = "50"

Archive nodes (no pruning):

--pruning nothing

Validators:

  • Use minimal pruning or default settings
  • Avoid aggressive pruning to maintain full state

Database Optimization

Enable state sync for faster initial sync:

Edit config.toml:

[statesync]
enable = true
rpc_servers = "rpc1.shardeum.org:26657,rpc2.shardeum.org:26657"
trust_height = <recent_height>
trust_hash = "<block_hash>"
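
One way to fill in the trust values, assuming the RPC servers listed above are reachable and expose the standard Tendermint-style RPC (adjust scheme and port to your environment):

# Pick a recent height from a trusted RPC node and fetch the matching block hash
LATEST=$(curl -s http://rpc1.shardeum.org:26657/block | jq -r '.result.block.header.height')
TRUST_HEIGHT=$((LATEST - 2000))
curl -s "http://rpc1.shardeum.org:26657/block?height=$TRUST_HEIGHT" | jq -r '.result.block_id.hash'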

Hardware Tuning

SSD optimization:

# Enable TRIM
sudo systemctl enable fstrim.timer
 
# Check I/O scheduler
cat /sys/block/nvme0n1/queue/scheduler

Network tuning:

# Increase network buffers
sudo sysctl -w net.core.rmem_max=134217728
sudo sysctl -w net.core.wmem_max=134217728
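
These sysctl changes do not survive a reboot; a minimal way to persist them (the file name below is just an example):

# Persist the buffer sizes across reboots
echo "net.core.rmem_max=134217728" | sudo tee -a /etc/sysctl.d/99-shardeum.conf
echo "net.core.wmem_max=134217728" | sudo tee -a /etc/sysctl.d/99-shardeum.conf
sudo sysctl --system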

6. Scaling RPC Infrastructure

Load Balancing

For high-traffic dApps:

  • Use Nginx, HAProxy, or AWS ELB
  • Run multiple RPC nodes behind a reverse proxy
  • Implement rate limiting to avoid overload
  • Separate "public RPC" from "private infra RPC"

Example Nginx configuration:

# Define the rate-limit zone once in the http context (nginx.conf or a conf.d include); rate is an example value
limit_req_zone $binary_remote_addr zone=rpc_limit:10m rate=20r/s;
 
upstream rpc_backend {
    least_conn;
    server 10.0.1.10:8545;
    server 10.0.1.11:8545;
    server 10.0.1.12:8545;
}
 
server {
    listen 80;
    server_name rpc.example.com;
 
    location / {
        proxy_pass http://rpc_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
 
        # Rate limiting (zone defined above)
        limit_req zone=rpc_limit burst=10 nodelay;
    }
}

Caching Strategies

  • Cache common queries (latest block, chain ID)
  • Use Redis for query caching
  • Implement CDN for static responses

7. Logging and Debugging

Viewing Logs

If using systemd:

journalctl -u shardeumd -f
journalctl -u shardeumd --since "1 hour ago"

If running manually:

tail -f ~/.mainnet/$NODE_ID/node.log

Debug Mode

Enable verbose logging in config.toml:

log_level = "debug"

Common Debug Commands

# Check sync status
shardeumd status | jq '.SyncInfo'
 
# Check peer connections
curl -s http://localhost:26657/net_info | jq '.result.n_peers'
 
# Query consensus state
curl -s http://localhost:26657/consensus_state
 
# Check validator signing info
shardeumd query slashing signing-info $(shardeumd comet show-validator)

8. Upgrade Procedures

Coordinated Network Upgrades

Preparation:

  1. Monitor official announcements for upgrade schedule
  2. Backup all critical files
  3. Test upgrade on testnet first
  4. Prepare rollback plan

Upgrade steps:

  1. Stop the node: sudo systemctl stop shardeumd
  2. Backup current binary: cp $(which shardeumd) shardeumd.backup
  3. Download and install new binary
  4. Verify version: shardeumd version
  5. Start node: sudo systemctl start shardeumd
  6. Monitor logs for issues and confirm the node returns to sync (see the check below)
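
As noted in step 6, it is worth confirming the node catches back up after the restart (assumes the local RPC endpoint used elsewhere in this guide):

# "catching_up" should return to false once the node has caught up
watch -n 5 'curl -s http://localhost:26657/status | jq ".result.sync_info.catching_up"'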

Rollback Procedure

If upgrade fails:

sudo systemctl stop shardeumd
sudo cp shardeumd.backup /usr/local/bin/shardeumd
sudo systemctl start shardeumd

9. Troubleshooting Advanced Issues

High Memory Usage

# Check memory usage
free -h
htop
 
# Restart node to clear memory
sudo systemctl restart shardeumd

Database Corruption

# Reset data (will require full resync)
shardeumd tendermint unsafe-reset-all --home ~/.mainnet/$NODE_ID
 
# Restore from snapshot (if available)
# Download snapshot and extract to data directory
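
A sketch of the snapshot restore, assuming the snapshot is distributed as a gzip tarball of the data directory contents (the URL is a placeholder):

# Stop the node, reset state, then extract the snapshot into the data directory
sudo systemctl stop shardeumd
shardeumd tendermint unsafe-reset-all --home ~/.mainnet/$NODE_ID
curl -L "<snapshot_url>" -o snapshot.tar.gz
tar -xzf snapshot.tar.gz -C ~/.mainnet/$NODE_ID/data
sudo systemctl start shardeumd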

Network Connectivity Issues

# Test peer connectivity
telnet <peer-ip> 26656
 
# Check firewall
sudo ufw status verbose
 
# Monitor network traffic
sudo iftop -i eth0

10. Important Resources

  • Chain ID: shardeum_8118-1 (mainnet)
  • EVM Chain ID: 8118 (hex: 0x1fb6)
  • Official Documentation: docs.shardeum.org
  • GitHub: github.com/shardeum
  • Discord: Community support and announcements

Advanced operations require careful planning and testing. Always test configuration changes in a non-production environment first.