Advanced Operations
This guide covers advanced operational topics for running production-grade Shardeum nodes.
The current codebase supports advanced node operations, but operators attempting these configurations should have a solid understanding of blockchain infrastructure and system administration.
1. Production Deployment Best Practices
Using systemd Service
Create a systemd service file for automatic restarts and easier management:
Example service file:
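The original example is not reproduced here, so the following is a minimal sketch of what such a unit could look like. The service user, the binary path, and the `start` subcommand are assumptions (the latter based on the usual Cosmos SDK daemon convention); adjust them to your installation.

```shell
# Print a systemd unit for the node to stdout.
# Arguments (both hypothetical defaults): service user, path to the binary.
render_unit() {
  user="${1:-shardeum}"
  bin="${2:-/usr/local/bin/shardeumd}"
  cat <<EOF
[Unit]
Description=Shardeum node
After=network-online.target
Wants=network-online.target

[Service]
User=$user
ExecStart=$bin start
Restart=on-failure
RestartSec=5
LimitNOFILE=65535

[Install]
WantedBy=multi-user.target
EOF
}

render_unit
```

To install it, the output can be redirected, for example `render_unit myuser /path/to/shardeumd | sudo tee /etc/systemd/system/shardeumd.service`.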
Enable and start the service:
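A sketch of the usual systemctl sequence, assuming the unit is named `shardeumd` (the name used in the upgrade steps later in this guide). The `run` helper is a hypothetical dry-run wrapper: it only prints the commands unless `APPLY=1` is set.

```shell
# Dry-run helper: prints each command unless APPLY=1 is set.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

enable_service() {
  unit="${1:-shardeumd}"
  run sudo systemctl daemon-reload          # pick up the new unit file
  run sudo systemctl enable "$unit"         # start automatically on boot
  run sudo systemctl start "$unit"          # start now
  run systemctl status "$unit" --no-pager   # confirm the service is active
}

enable_service
```

Run it once as-is to review the commands, then again with `APPLY=1` exported to execute them.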
Firewall Configuration
For Full Nodes / RPC Nodes:
For Validators:
For validators, consider restricting RPC access to localhost only. Never expose validator RPC endpoints publicly.
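The two firewall profiles above might be sketched with ufw as follows. The port numbers are assumptions based on common CometBFT and EVM JSON-RPC defaults (26656 for P2P, 26657 for RPC, 8545 for JSON-RPC); confirm them against your own config before applying anything. The `run` helper is a hypothetical dry-run wrapper that only prints commands unless `APPLY=1` is set.

```shell
# Dry-run helper: prints each command unless APPLY=1 is set.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# role "rpc": P2P plus RPC ports. role "validator": P2P only.
firewall_rules() {
  role="$1"
  run sudo ufw default deny incoming
  run sudo ufw default allow outgoing
  run sudo ufw allow 22/tcp        # SSH; restrict to a VPN or admin IP where possible
  run sudo ufw allow 26656/tcp     # P2P (assumed default port)
  if [ "$role" = "rpc" ]; then
    run sudo ufw allow 26657/tcp   # node RPC (assumed default port)
    run sudo ufw allow 8545/tcp    # EVM JSON-RPC (assumed default port)
  fi
  run sudo ufw enable
}

firewall_rules validator
```

For a validator, the RPC ports stay closed at the firewall, which matches the advice to keep validator RPC on localhost.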
2. Monitoring and Alerting
Enable Prometheus Metrics
Edit config.toml:
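The exact setting is not shown here. In CometBFT-style configs it is the `prometheus` key in the `[instrumentation]` section, and this sketch assumes the Shardeum node follows that layout. The function edits the file in place and keeps a `.bak` copy.

```shell
# Flip `prometheus = false` to `prometheus = true` in the given config file.
# A backup of the original is written next to it with a .bak suffix.
enable_prometheus() {
  cfg="$1"
  sed -i.bak 's/^prometheus *= *false/prometheus = true/' "$cfg"
  grep -n '^prometheus' "$cfg"   # show the resulting metrics settings
}

# Usage (path is hypothetical; use your node's actual home directory):
#   enable_prometheus "$HOME/.shardeumd/config/config.toml"
```

Restart the node afterwards so the metrics endpoint comes up, then point Prometheus at the address shown in `prometheus_listen_addr`.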
Monitoring Stack Setup
Recommended tools:
- Prometheus - Metrics collection
- Grafana - Visualization dashboards
- Alertmanager - Alert notifications
Key metrics to monitor:
- Sync status
- Block height
- Validator jail status
- Disk space usage
- Memory usage
- CPU usage
- Missed blocks
- Peer count
- Network latency
Alert Conditions
Set up alerts for:
- Node falls behind by more than 100 blocks
- Validator is jailed
- Disk usage exceeds 80%
- Memory usage exceeds 90%
- Peer count drops below 5
- Node stops producing blocks (for validators)
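In a production stack these conditions would normally live in Alertmanager rules, but the numeric thresholds above can be illustrated with a small shell check. This is a sketch only: `check_alerts` is a hypothetical helper, and the caller is expected to supply the four measurements from its own monitoring source.

```shell
# check_alerts LAG_BLOCKS DISK_PCT MEM_PCT PEER_COUNT
# Prints one line per breached threshold; prints nothing when all are healthy.
check_alerts() {
  lag="$1"; disk="$2"; mem="$3"; peers="$4"
  if [ "$lag" -gt 100 ]; then echo "ALERT: node is $lag blocks behind"; fi
  if [ "$disk" -gt 80 ]; then echo "ALERT: disk usage at ${disk}%"; fi
  if [ "$mem" -gt 90 ]; then echo "ALERT: memory usage at ${mem}%"; fi
  if [ "$peers" -lt 5 ]; then echo "ALERT: only $peers peers connected"; fi
  return 0
}

check_alerts 250 85 40 12
```

The sample call reports the block lag and disk usage, since those two values cross their thresholds while memory and peer count do not.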
3. Security Best Practices
Sentry Node Architecture
A recommended production setup for validators:
Benefits:
- Hides validator's IP address
- Absorbs DDoS traffic
- Reduces attack surface
- Improves security
Configuration:
- Run validator on private network
- Connect validator only to sentry nodes
- Configure sentry nodes with public IPs
- Update persistent_peers to point validator at sentries
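As an illustration of the last step, the sketch below builds the validator's peer settings from a list of sentries. It assumes the CometBFT format `nodeID@host:port` and the default P2P port 26656; the node IDs and addresses are placeholders. Setting `pex = false` on the validator is the usual companion setting in CometBFT sentry setups, included here as an assumption.

```shell
# persistent_peers NODEID@HOST [NODEID@HOST ...]
# Prints [p2p] settings for the validator's config.toml.
persistent_peers() {
  port="${P2P_PORT:-26656}"   # assumed default P2P port
  list=""
  for peer in "$@"; do
    list="${list:+$list,}$peer:$port"
  done
  printf 'persistent_peers = "%s"\n' "$list"
  printf 'pex = false\n'      # validator should not gossip for extra peers
}

# Placeholder node IDs and private addresses.
persistent_peers sentry1id@10.0.0.2 sentry2id@10.0.0.3
```

Each sentry's node ID comes from that sentry itself; the values above only show the shape of the setting.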
Key Management System (KMS)
For enhanced security, consider:
- Tendermint KMS for validator key management
- Hardware Security Modules (HSM) for key storage
- YubiHSM2 integration
- Remote signing capabilities
KMS setup requires advanced configuration. Thoroughly test in a non-production environment first.
Security Checklist
- ✅ Use firewall rules to restrict access
- ✅ Disable SSH password authentication (use keys only)
- ✅ Keep system packages updated
- ✅ Use fail2ban or similar intrusion prevention
- ✅ Implement DDoS protection
- ✅ Regular security audits
- ✅ Monitor logs for suspicious activity
- ✅ Use VPN for administrative access
4. Backup and Recovery
Critical Files to Back Up
Validator-specific:
All nodes:
Wallet keys:
Backup Script Example
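The original script is not reproduced here. The sketch below archives key and config files from a node home directory. The file names follow Cosmos SDK and CometBFT conventions (`priv_validator_key.json`, `node_key.json`, `config.toml`, `app.toml`) and are assumptions for a Shardeum node; wallet keys are not covered and need their own backup.

```shell
# backup_node NODE_HOME DEST_DIR
# Archives whichever of the listed files exist and prints the archive path.
backup_node() {
  home="$1"; dest="$2"
  out="$dest/shardeum-backup-$(date +%Y%m%d-%H%M%S).tar.gz"
  set --   # reuse the positional parameters as the file list
  for f in config/priv_validator_key.json config/node_key.json \
           config/config.toml config/app.toml; do
    if [ -f "$home/$f" ]; then set -- "$@" "$f"; fi
  done
  if [ "$#" -eq 0 ]; then
    echo "nothing to back up in $home" >&2
    return 1
  fi
  mkdir -p "$dest"
  tar -czf "$out" -C "$home" "$@"
  chmod 600 "$out"   # the archive contains private keys
  echo "$out"
}

# Usage (paths are hypothetical):
#   backup_node "$HOME/.shardeumd" /var/backups/shardeum
```

Store the resulting archive encrypted and off the server, since anyone holding the validator key can sign on the validator's behalf.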
Disaster Recovery
If the validator key is compromised:
- Immediately unbond and remove validator
- Generate new keys
- Create new validator
- Report incident to network
If the node fails:
- Deploy new server with identical configuration
- Restore backup files
- Sync node to current block height
- Unjail validator if necessary
5. Performance Optimization
Pruning Strategies
Full nodes (custom pruning):
Archive nodes (no pruning):
Validators:
- Use minimal pruning or default settings
- Avoid aggressive pruning to maintain full state
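The three pruning profiles above can be summarized as settings per role. This sketch assumes the Cosmos SDK convention where pruning is configured in `app.toml` with the `pruning`, `pruning-keep-recent`, and `pruning-interval` keys; the numeric values for the custom profile are illustrative, not recommendations from this guide.

```shell
# pruning_settings ROLE   (ROLE is one of: full, archive, validator)
# Prints the pruning lines to place in app.toml.
pruning_settings() {
  case "$1" in
    archive)
      echo 'pruning = "nothing"' ;;             # keep all historical state
    full)
      echo 'pruning = "custom"'
      echo 'pruning-keep-recent = "100000"'     # illustrative value
      echo 'pruning-interval = "100"' ;;        # illustrative value
    validator)
      echo 'pruning = "default"' ;;             # avoid aggressive pruning
    *)
      echo "unknown role: $1" >&2
      return 1 ;;
  esac
}

pruning_settings full
```

Changing the pruning strategy takes effect on restart and does not restore state that has already been pruned.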
Database Optimization
Enable state sync for faster initial sync:
Edit config.toml:
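The settings themselves are not shown here. In CometBFT-style configs, state sync lives in the `[statesync]` section and needs at least two RPC servers plus a trusted height and the block hash at that height. The sketch below assumes that layout; the 2000-block offset and the one-week trust period are common illustrative choices, not values from this guide.

```shell
# trust_height LATEST_HEIGHT [OFFSET]
# Picks a trusted height some blocks behind the chain tip.
trust_height() {
  echo $(( $1 - ${2:-2000} ))
}

# statesync_fragment RPC_SERVERS TRUST_HEIGHT TRUST_HASH
# Prints a [statesync] section for config.toml.
statesync_fragment() {
  cat <<EOF
[statesync]
enable = true
rpc_servers = "$1"
trust_height = $2
trust_hash = "$3"
trust_period = "168h0m0s"
EOF
}

# Placeholder servers and hash; the hash must be the block hash at the trust height.
statesync_fragment "rpc1.example:26657,rpc2.example:26657" "$(trust_height 1500000)" "<block hash>"
```

Take the trusted height and hash from a source you trust, since the node accepts the snapshot on that basis.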
Hardware Tuning
SSD optimization:
Network tuning:
6. Scaling RPC Infrastructure
Load Balancing
For high-traffic dApps:
- Use Nginx, HAProxy, or AWS ELB
- Run multiple RPC nodes behind a reverse proxy
- Implement rate limiting to avoid overload
- Separate "public RPC" from "private infra RPC"
Example Nginx configuration:
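The original configuration is not reproduced here, so this is a minimal sketch that generates one. It combines round-robin load balancing across the given backends with per-IP rate limiting. The upstream name, the backend addresses, and the rate values are assumptions; the output is meant for a file included in Nginx's `http` context.

```shell
# render_nginx HOST:PORT [HOST:PORT ...]
# Prints an Nginx config balancing requests across the given RPC backends.
render_nginx() {
  echo 'limit_req_zone $binary_remote_addr zone=rpc:10m rate=20r/s;'
  echo
  echo 'upstream shardeum_rpc {'
  for backend in "$@"; do
    echo "    server $backend;"
  done
  echo '}'
  cat <<'EOF'

server {
    listen 80;

    location / {
        limit_req zone=rpc burst=40 nodelay;
        proxy_pass http://shardeum_rpc;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
EOF
}

# Placeholder backend addresses.
render_nginx 10.0.0.11:8545 10.0.0.12:8545
```

Validate the generated file with `nginx -t` before reloading. A separate server block with stricter access rules can serve the private infrastructure RPC.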
Caching Strategies
- Cache common queries (latest block, chain ID)
- Use Redis for query caching
- Implement CDN for static responses
7. Logging and Debugging
Viewing Logs
If using systemd:
If running manually:
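The original commands are not shown here. With systemd, logs are normally read through `journalctl`; for a manual run they come from wherever the process output was redirected. The sketch below adds a small hypothetical filter, `problems_only`, that can be piped onto either source. The log file name in the second usage line is an assumption.

```shell
# Keep only lines that look like problems (case-insensitive match).
problems_only() {
  grep -iE 'error|panic|fatal'
}

# Usage (not executed here):
#   journalctl -u shardeumd -f --no-pager | problems_only   # systemd service
#   tail -f shardeumd.log | problems_only                    # manual run, hypothetical log file

printf 'INFO committed block\nERROR dial failed\npanic: nil map\n' | problems_only
```

The sample call prints the second and third lines and drops the informational one.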
Debug Mode
Enable verbose logging in config.toml:
Common Debug Commands
8. Upgrade Procedures
Coordinated Network Upgrades
Preparation:
- Monitor official announcements for upgrade schedule
- Back up all critical files
- Test upgrade on testnet first
- Prepare rollback plan
Upgrade steps:
- Stop the node: sudo systemctl stop shardeumd
- Back up the current binary: cp $(which shardeumd) shardeumd.backup
- Download and install the new binary
- Verify the version: shardeumd version
- Start the node: sudo systemctl start shardeumd
- Monitor logs for issues
Rollback Procedure
If upgrade fails:
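The rollback commands are not reproduced here. A minimal sketch, assuming the previous binary was saved as `shardeumd.backup` during the upgrade steps, is to stop the service, copy the backup over the installed binary, and start again. The `run` helper is a hypothetical dry-run wrapper that only prints commands unless `APPLY=1` is set.

```shell
# Dry-run helper: prints each command unless APPLY=1 is set.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# rollback [BACKUP_FILE] [INSTALLED_BINARY_PATH]
rollback() {
  backup="${1:-shardeumd.backup}"
  target="${2:-$(command -v shardeumd || true)}"
  if [ -z "$target" ]; then
    echo "shardeumd not found in PATH; pass its path as the second argument" >&2
    return 1
  fi
  run sudo systemctl stop shardeumd
  run sudo cp "$backup" "$target"   # restore the previous binary
  run "$target" version             # confirm the old version is back
  run sudo systemctl start shardeumd
}

# Usage (install path is hypothetical):
#   rollback shardeumd.backup /usr/local/bin/shardeumd
```

If the failed upgrade already modified the data directory, restoring the binary alone may not be enough, and the data backup taken during preparation may also be needed.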
9. Troubleshooting Advanced Issues
High Memory Usage
Database Corruption
Network Connectivity Issues
10. Important Resources
- Chain ID: shardeum_8118-1 (mainnet)
- EVM Chain ID: 8118 (hex: 0x1fb6)
- Official Documentation: docs.shardeum.org
- GitHub: github.com/shardeum
- Discord: Community support and announcements
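Wallets and JSON-RPC clients usually report the EVM chain ID in hex, so it can help to confirm the decimal and hex forms listed above agree. A small sketch, with `chain_id_hex` as a hypothetical helper name:

```shell
# Convert a decimal EVM chain ID to its 0x-prefixed hex form.
chain_id_hex() {
  printf '0x%x\n' "$1"
}

chain_id_hex 8118   # prints 0x1fb6
```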
Advanced operations require careful planning and testing. Always test configuration changes in a non-production environment first.