# Orleans Clustering Troubleshooting Guide

## Overview

This document provides troubleshooting steps for Orleans clustering issues, particularly the "Connection attempt to endpoint failed" errors that can occur in Docker deployments.

## Common Issues and Solutions

### 1. Connection Timeout Errors

**Error**: `Connection attempt to endpoint S10.0.0.9:11111:114298801 timed out after 00:00:05`

**Cause**: Orleans silos are trying to connect to each other but the network configuration is preventing proper communication.

**Solutions**:

#### A. Environment Variables

You can disable Orleans clustering completely by setting:

```bash
DISABLE_ORLEANS_CLUSTERING=true
```

This will fall back to localhost clustering mode for testing.

#### B. Docker Network Configuration

Ensure the Docker compose file includes proper network configuration:

```yaml
services:
  managing.api:
    ports:
      - "11111:11111" # Orleans silo port
      - "30000:30000" # Orleans gateway port
    hostname: managing-api
```

#### C. Database Connection Issues

If the Orleans database is unavailable, the system will automatically fall back to:

- Localhost clustering
- Memory-based grain storage

### 2. Configuration Options

#### Production Settings

In `appsettings.Production.json`:

```json
{
  "RunOrleansGrains": true,
  "Orleans": {
    "EnableClustering": true,
    "ConnectionTimeout": 60,
    "MaxJoinAttempts": 3
  }
}
```

#### Environment Variables

- `RUN_ORLEANS_GRAINS`: Enable/disable Orleans grains (true/false)
- `DISABLE_ORLEANS_CLUSTERING`: Force localhost clustering (true/false)
- `ORLEANS_ADVERTISED_IP`: Set specific IP address for Orleans clustering (e.g., "192.168.1.100")
- `ASPNETCORE_ENVIRONMENT`: Set environment (Production/Development/etc.)

### 3. Network Configuration Improvements

The following improvements have been made to handle Docker networking issues:

1. **Endpoint Configuration**:
   - `listenOnAnyHostAddress: true` allows binding to all network interfaces
   - Increased timeout values for better reliability
2. **Fallback Mechanisms**:
   - Automatic fallback to localhost clustering if database unavailable
   - Memory storage fallback for grain persistence
3. **Improved Timeouts**:
   - Response timeout: 60 seconds
   - Probe timeout: 10 seconds
   - Join attempt timeout: 120 seconds

### 4. Monitoring and Debugging

#### Orleans Dashboard

Available in development mode at: `http://localhost:9999`

- Username: admin
- Password: admin

#### Health Checks

Monitor application health at:

- `/health` - Full health check
- `/alive` - Basic liveness check

### 5. Emergency Procedures

If Orleans clustering is causing deployment issues:

1. **Immediate Fix**: Set environment variable `DISABLE_ORLEANS_CLUSTERING=true`
2. **Restart Services**: Restart the managing.api container
3. **Check Logs**: Monitor for connection timeout errors
4. **Database Check**: Verify PostgreSQL Orleans database connectivity

### 6. Database Requirements

Orleans requires these PostgreSQL databases:

- Main application database (from `PostgreSql:ConnectionString`)
- Orleans clustering database (from `PostgreSql:Orleans`)

If either is unavailable, the system will gracefully degrade functionality.

## Testing the Fix

1. Deploy with the updated configuration
2. Monitor logs for Orleans connection errors
3. Verify grain functionality through the dashboard (development) or API endpoints
4. Test failover scenarios by temporarily disabling database connectivity

## Related Files

- `src/Managing.Bootstrap/ApiBootstrap.cs` - Orleans configuration
- `src/Managing.Docker/docker-compose.yml` - Docker networking
- `src/Managing.Api/appsettings.*.json` - Environment-specific settings
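
## Example: Probing a Silo Endpoint

When the logs show a connection timeout like the one in section 1, it can help to confirm from inside the affected container whether the silo port is reachable at all. The following is a minimal sketch, not part of the project: the function names `parse_silo_address`, `probe_tcp`, and `check_silo` are hypothetical. It assumes the silo address in the log has the form `S<ip>:<port>:<generation>` with an IPv4 address (as in the error above), and that `bash` and the coreutils `timeout` command are available in the container.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of the project) for checking whether the
# silo named in an Orleans log line is reachable over TCP.

# Splits a silo address of the form S<ip>:<port>:<generation> and prints
# "<ip> <port>". Assumes an IPv4 address; IPv6 is not handled.
parse_silo_address() {
  local addr host_port
  addr="${1#S}"            # drop the leading "S"
  host_port="${addr%:*}"   # drop the trailing ":<generation>"
  printf '%s %s\n' "${host_port%:*}" "${host_port##*:}"
}

# Returns 0 if a TCP connection to host:port opens within timeout_s seconds
# (default 5). Uses bash's /dev/tcp redirection and coreutils `timeout`.
probe_tcp() {
  local host port timeout_s
  host="$1"
  port="$2"
  timeout_s="${3:-5}"
  timeout "$timeout_s" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Parses the silo address, probes it, and prints the result.
check_silo() {
  local parsed host port
  parsed="$(parse_silo_address "$1")"
  host="${parsed% *}"
  port="${parsed#* }"
  if probe_tcp "$host" "$port"; then
    echo "reachable: $host:$port"
  else
    echo "unreachable: $host:$port"
    return 1
  fi
}
```

After sourcing the file in a shell inside the container, `check_silo S10.0.0.9:11111:114298801` probes `10.0.0.9` on port `11111`. An "unreachable" result points toward the Docker network or port configuration rather than Orleans itself. A "reachable" result suggests looking at the advertised IP or the clustering database instead.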