Orleans Clustering Troubleshooting Guide
Overview
This document provides troubleshooting steps for Orleans clustering issues, particularly the "Connection attempt to endpoint failed" errors that can occur in Docker deployments.
Common Issues and Solutions
1. Connection Timeout Errors
Error: Connection attempt to endpoint S10.0.0.9:11111:114298801 timed out after 00:00:05
Cause: Orleans silos are trying to connect to each other but the network configuration is preventing proper communication.
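When triaging this error it helps to pull the failing endpoint apart. The sketch below assumes the silo address in the message follows the layout `S<ip>:<port>:<generation>`, as the sample error above suggests. The `SiloAddress` type and `parse_silo_address` helper are hypothetical names used for illustration, not part of Orleans.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout: "S<ipv4>:<port>:<generation>", as in the sample error line.
SILO_ADDRESS = re.compile(
    r"S(?P<ip>\d{1,3}(?:\.\d{1,3}){3}):(?P<port>\d+):(?P<generation>\d+)"
)


class SiloAddress(NamedTuple):
    ip: str
    port: int
    generation: int


def parse_silo_address(log_line: str) -> Optional[SiloAddress]:
    """Return the first silo address found in a log line, or None."""
    match = SILO_ADDRESS.search(log_line)
    if match is None:
        return None
    return SiloAddress(match["ip"], int(match["port"]), int(match["generation"]))


line = "Connection attempt to endpoint S10.0.0.9:11111:114298801 timed out after 00:00:05"
print(parse_silo_address(line))
# prints: SiloAddress(ip='10.0.0.9', port=11111, generation=114298801)
```

Here the IP is a private address and the port is the silo port, so the first thing to verify is whether that address and port are reachable from the silo that logged the error.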
Solutions:
A. Environment Variables
You can disable Orleans clustering completely by setting:
DISABLE_ORLEANS_CLUSTERING=true
The application then falls back to localhost clustering mode, which is intended for testing.
B. Docker Network Configuration
Ensure the Docker compose file includes proper network configuration:
services:
  managing.api:
    ports:
      - "11111:11111" # Orleans silo port
      - "30000:30000" # Orleans gateway port
    hostname: managing-api
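One way to confirm that the ports in the compose snippet above are actually reachable is a plain TCP probe. This is a minimal sketch using only the Python standard library. The helper names `port_is_open` and `check_ports` are hypothetical, and the hostname and ports are the ones from the snippet.

```python
import socket
from typing import Dict

# Ports taken from the compose snippet above.
ORLEANS_PORTS = {"silo": 11111, "gateway": 30000}


def port_is_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_ports(host: str, ports: Dict[str, int]) -> Dict[str, bool]:
    """Probe each named port on host and report whether it accepted a connection."""
    return {name: port_is_open(host, port) for name, port in ports.items()}
```

Calling `check_ports("managing-api", ORLEANS_PORTS)` from another container on the same Docker network should return `True` for both entries when the silo is listening. A successful probe only shows that the port accepts connections, not that cluster membership is healthy.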
C. Database Connection Issues
If the Orleans database is unavailable, the system will automatically fall back to:
- Localhost clustering
- Memory-based grain storage
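The fallback rules can be read as a small decision table. The sketch below is an illustration in Python, not the application's actual code (which lives in the C# bootstrap). `OrleansMode` and `select_mode` are hypothetical names. The source does not say which grain storage is used when clustering is disabled while the database is still reachable, so keeping database storage in that case is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrleansMode:
    clustering: str     # "database" or "localhost"
    grain_storage: str  # "database" or "memory"


def select_mode(clustering_disabled: bool, database_available: bool) -> OrleansMode:
    """Pick clustering and storage modes following the documented fallback."""
    if not database_available:
        # Documented fallback: localhost clustering plus in-memory grain storage.
        return OrleansMode("localhost", "memory")
    if clustering_disabled:
        # Assumption: storage stays on the database when only clustering is disabled.
        return OrleansMode("localhost", "database")
    return OrleansMode("database", "database")


print(select_mode(clustering_disabled=False, database_available=False))
# prints: OrleansMode(clustering='localhost', grain_storage='memory')
```

Memory-based grain storage does not survive a restart, so a silo running in the fallback mode loses grain state when the container is recreated.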
2. Configuration Options
Production Settings
In appsettings.Production.json:
{
  "RunOrleansGrains": true,
  "Orleans": {
    "EnableClustering": true,
    "ConnectionTimeout": 60,
    "MaxJoinAttempts": 3
  }
}
Environment Variables
- RUN_ORLEANS_GRAINS: Enable/disable Orleans grains (true/false)
- DISABLE_ORLEANS_CLUSTERING: Force localhost clustering (true/false)
- ORLEANS_ADVERTISED_IP: Set specific IP address for Orleans clustering (e.g., "192.168.1.100")
- ASPNETCORE_ENVIRONMENT: Set environment (Production/Development/etc.)
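A mistyped variable is easy to miss, so a pre-flight check before deployment can help. The following is a minimal sketch, assuming the boolean variables accept only `true` or `false` (case-insensitive). How the application itself parses them is not stated here, and `validate_orleans_env` is a hypothetical helper.

```python
import ipaddress
import os
from typing import List, Mapping

BOOLEAN_VARS = ("RUN_ORLEANS_GRAINS", "DISABLE_ORLEANS_CLUSTERING")


def validate_orleans_env(env: Mapping[str, str]) -> List[str]:
    """Return a list of problems found in the Orleans-related variables."""
    problems = []
    for name in BOOLEAN_VARS:
        value = env.get(name)
        # Assumption: only "true" and "false" are valid, ignoring case.
        if value is not None and value.strip().lower() not in ("true", "false"):
            problems.append(f"{name}: expected true or false, got {value!r}")
    advertised = env.get("ORLEANS_ADVERTISED_IP")
    if advertised is not None:
        try:
            ipaddress.ip_address(advertised.strip())
        except ValueError:
            problems.append(f"ORLEANS_ADVERTISED_IP: {advertised!r} is not a valid IP address")
    return problems


if __name__ == "__main__":
    for problem in validate_orleans_env(os.environ):
        print(problem)
```

Unset variables are treated as valid, since each of them is optional in the list above.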
3. Network Configuration Improvements
The following improvements have been made to handle Docker networking issues:
- Endpoint Configuration:
  - listenOnAnyHostAddress: true allows binding to all network interfaces
  - Increased timeout values for better reliability
- Fallback Mechanisms:
  - Automatic fallback to localhost clustering if database unavailable
  - Memory storage fallback for grain persistence
- Improved Timeouts:
  - Response timeout: 60 seconds
  - Probe timeout: 10 seconds
  - Join attempt timeout: 120 seconds
4. Monitoring and Debugging
Orleans Dashboard
Available in development mode at: http://localhost:9999
- Username: admin
- Password: admin
Health Checks
Monitor application health at:
- /health - Full health check
- /alive - Basic liveness check
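Both endpoints can be polled from a script. This sketch uses only the standard library. The base URL is not given in this guide, so it is passed in as a parameter, and `http_status` and `check_health` are hypothetical helper names.

```python
import urllib.error
import urllib.request
from typing import Dict, Optional

HEALTH_PATHS = ("/health", "/alive")


def http_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if it cannot be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # The server answered with an error status (for example 503).
        return exc.code
    except (urllib.error.URLError, OSError):
        # No answer at all: refused connection, DNS failure, or timeout.
        return None


def check_health(base_url: str) -> Dict[str, Optional[int]]:
    """Query each health endpoint under base_url and collect the status codes."""
    root = base_url.rstrip("/")
    return {path: http_status(root + path) for path in HEALTH_PATHS}
```

If `/alive` answers while `/health` reports an error status, the process is running but at least one of its checks is failing. A `None` result means the API could not be reached at all.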
5. Emergency Procedures
If Orleans clustering is causing deployment issues:
- Immediate Fix: Set environment variable DISABLE_ORLEANS_CLUSTERING=true
- Restart Services: Restart the managing.api container
- Check Logs: Monitor for connection timeout errors
- Database Check: Verify PostgreSQL Orleans database connectivity
6. Database Requirements
Orleans requires these PostgreSQL databases:
- Main application database (from PostgreSql:ConnectionString)
- Orleans clustering database (from PostgreSql:Orleans)
If either is unavailable, the system will gracefully degrade functionality.
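To check connectivity for either database, the host and port first have to be read out of the connection string. The sketch below assumes the two settings hold Npgsql-style `Key=Value;` strings. The sample value is hypothetical, and quoted values containing semicolons are not handled.

```python
from typing import Dict, Tuple


def parse_connection_string(conn: str) -> Dict[str, str]:
    """Split 'Key=Value;Key=Value' into a dict with lower-cased keys."""
    settings = {}
    for part in conn.split(";"):
        if not part.strip():
            continue  # tolerate a trailing semicolon
        key, _, value = part.partition("=")
        settings[key.strip().lower()] = value.strip()
    return settings


def database_endpoint(conn: str) -> Tuple[str, int]:
    """Return (host, port), defaulting to localhost and PostgreSQL's port 5432."""
    settings = parse_connection_string(conn)
    host = settings.get("host") or settings.get("server") or "localhost"
    return host, int(settings.get("port", "5432"))


# Hypothetical value for PostgreSql:Orleans, for illustration only.
sample = "Host=orleans-db;Port=5432;Database=orleans;Username=app;Password=secret"
print(database_endpoint(sample))
# prints: ('orleans-db', 5432)
```

The resulting host and port can then be probed over TCP from inside the managing.api container to tell a network problem apart from a credentials problem.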
Testing the Fix
- Deploy with the updated configuration
- Monitor logs for Orleans connection errors
- Verify grain functionality through the dashboard (development) or API endpoints
- Test failover scenarios by temporarily disabling database connectivity
Related Files
- src/Managing.Bootstrap/ApiBootstrap.cs - Orleans configuration
- src/Managing.Docker/docker-compose.yml - Docker networking
- src/Managing.Api/appsettings.*.json - Environment-specific settings