Update silo/cluster config
src/Managing.Api/README-ORLEANS-TROUBLESHOOTING.md
@@ -0,0 +1,118 @@
# Orleans Clustering Troubleshooting Guide

## Overview

This document provides troubleshooting steps for Orleans clustering issues, particularly the "Connection attempt to endpoint failed" errors that can occur in Docker deployments.

## Common Issues and Solutions

### 1. Connection Timeout Errors

**Error**: `Connection attempt to endpoint S10.0.0.9:11111:114298801 timed out after 00:00:05`

**Cause**: Orleans silos are trying to connect to each other on the silo port (`11111` in this example), but the network configuration is preventing the connection.

**Solutions**:

#### A. Environment Variables
To bypass multi-silo clustering, set:
```bash
DISABLE_ORLEANS_CLUSTERING=true
```

The silo then falls back to localhost clustering mode, which is intended for testing.
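
Before deploying, it can help to confirm which mode the current environment would select. The sketch below is a hypothetical helper, not part of the application: `clustering_mode` is an illustrative name, and it assumes that only the value `true` (in any letter case) disables clustering. The actual parsing lives in the application code and is not shown in this guide.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reports which clustering mode the flag above would select.
# Assumption: only "true" (any case) disables clustering; every other value,
# including an unset variable, leaves clustering enabled.
clustering_mode() {
  flag="$(printf '%s' "${DISABLE_ORLEANS_CLUSTERING:-false}" | tr '[:upper:]' '[:lower:]')"
  if [ "$flag" = "true" ]; then
    echo "localhost"
  else
    echo "clustered"
  fi
}

clustering_mode
```

Running it in the same shell or container environment as the service shows whether the variable is actually visible there, which is a common source of confusion when the value is set on the host but not passed to the container.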

#### B. Docker Network Configuration
Ensure the Docker Compose file publishes the silo and gateway ports and sets a stable hostname:
```yaml
services:
  managing.api:
    ports:
      - "11111:11111" # Orleans silo port
      - "30000:30000" # Orleans gateway port
    hostname: managing-api
```
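
To verify that the published ports are actually reachable from another container or host, a quick TCP probe is often enough. The following is a minimal sketch, assuming `bash` (for its `/dev/tcp` redirection) and coreutils `timeout` are available; `port_open` and `check_silo_ports` are hypothetical helper names, and the 5-second limit simply mirrors the timeout in the error message above.

```bash
#!/usr/bin/env bash
# port_open HOST PORT: succeeds if a TCP connection can be opened within 5 seconds.
# Relies on bash's /dev/tcp redirection, so the probe is run through `bash -c`.
port_open() {
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# check_silo_ports HOST: probes the silo (11111) and gateway (30000) ports
# from the compose example and prints one line per port.
check_silo_ports() {
  for port in 11111 30000; do
    if port_open "$1" "$port"; then
      echo "$1:$port reachable"
    else
      echo "$1:$port NOT reachable"
    fi
  done
}
```

For example, `check_silo_ports managing-api` run from a second container on the same Docker network probes the hostname set in the compose file; substituting the IP address from the error message tests the endpoint the silo actually advertised.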

#### C. Database Connection Issues
If the Orleans database is unavailable, the system will automatically fall back to:
- Localhost clustering
- Memory-based grain storage

### 2. Configuration Options

#### Production Settings
In `appsettings.Production.json`:
```json
{
  "RunOrleansGrains": true,
  "Orleans": {
    "EnableClustering": true,
    "ConnectionTimeout": 60,
    "MaxJoinAttempts": 3
  }
}
```

#### Environment Variables
- `RUN_ORLEANS_GRAINS`: Enable/disable Orleans grains (true/false)
- `DISABLE_ORLEANS_CLUSTERING`: Force localhost clustering (true/false)
- `ASPNETCORE_ENVIRONMENT`: Set environment (Production/Development/etc.)

### 3. Network Configuration Improvements

The following improvements have been made to handle Docker networking issues:

1. **Endpoint Configuration**:
   - `listenOnAnyHostAddress: true` allows binding to all network interfaces
   - Increased timeout values for better reliability

2. **Fallback Mechanisms**:
   - Automatic fallback to localhost clustering if the database is unavailable
   - Memory storage fallback for grain persistence

3. **Improved Timeouts**:
   - Response timeout: 60 seconds
   - Probe timeout: 10 seconds
   - Join attempt timeout: 120 seconds

### 4. Monitoring and Debugging

#### Orleans Dashboard
Available in development mode at: `http://localhost:9999`
- Username: admin
- Password: admin

#### Health Checks
Monitor application health at:
- `/health` - Full health check
- `/alive` - Basic liveness check
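
Both endpoints can be polled from a script during a deployment. The sketch below assumes `curl` is installed; `check_endpoint` and `probe_health` are hypothetical helper names, and the base URL is a placeholder because this guide does not state the port the API listens on.

```bash
#!/usr/bin/env bash
# check_endpoint URL: prints "ok" if the URL answers without an HTTP error
# (status below 400) within 5 seconds, otherwise prints "fail".
check_endpoint() {
  if curl -fsS -o /dev/null --max-time 5 "$1" 2>/dev/null; then
    echo "ok"
  else
    echo "fail"
  fi
}

# probe_health BASE_URL: checks both endpoints listed above.
probe_health() {
  echo "/alive:  $(check_endpoint "$1/alive")"
  echo "/health: $(check_endpoint "$1/health")"
}
```

A call such as `probe_health "http://localhost:8080"` (port assumed) prints one line per endpoint. If `/alive` reports `ok` while `/health` reports `fail`, the process is running but one of its dependencies is likely unhealthy.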

### 5. Emergency Procedures

If Orleans clustering is causing deployment issues:

1. **Immediate Fix**: Set environment variable `DISABLE_ORLEANS_CLUSTERING=true`
2. **Restart Services**: Restart the managing.api container
3. **Check Logs**: Monitor for connection timeout errors
4. **Database Check**: Verify PostgreSQL Orleans database connectivity
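
For the log check in step 3, counting occurrences of the error message gives a quick signal of whether the fix took effect. This is a minimal sketch: `count_timeouts` is a hypothetical helper that reads log text on standard input and matches the message prefix quoted in section 1.

```bash
#!/usr/bin/env bash
# count_timeouts: reads log lines on stdin and prints how many contain the
# Orleans connection error. `|| true` keeps the exit status at zero when
# grep finds no match (grep itself exits with 1 in that case).
count_timeouts() {
  grep -c 'Connection attempt to endpoint' || true
}
```

Typical usage would be `docker logs <container> 2>&1 | count_timeouts`, where the container name depends on the deployment. Note that `docker compose restart` does not pick up a changed environment variable; recreating the container with `docker compose up -d` does.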

### 6. Database Requirements

Orleans requires these PostgreSQL databases:
- Main application database (from `PostgreSql:ConnectionString`)
- Orleans clustering database (from `PostgreSql:Orleans`)

If either is unavailable, the system degrades gracefully. For the Orleans database, this means the fallbacks described in section 1.C.
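
Whether a PostgreSQL server is accepting connections can be checked with `pg_isready`, which ships with the PostgreSQL client tools. The sketch below wraps it in a hypothetical `db_ready` helper; the host and port are placeholders that would come from the two connection strings named above.

```bash
#!/usr/bin/env bash
# db_ready HOST [PORT]: succeeds if a PostgreSQL server at HOST:PORT accepts
# connections within 5 seconds. PORT defaults to 5432.
# Also fails if pg_isready is not installed.
db_ready() {
  pg_isready -h "$1" -p "${2:-5432}" -t 5 >/dev/null 2>&1
}
```

For example, `db_ready db-host 5432 && echo "database reachable"` (host name assumed) confirms basic connectivity. It does not verify credentials or that the Orleans tables exist.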

## Testing the Fix

1. Deploy with the updated configuration
2. Monitor logs for Orleans connection errors
3. Verify grain functionality through the dashboard (development) or API endpoints
4. Test failover scenarios by temporarily disabling database connectivity

## Related Files

- `src/Managing.Bootstrap/ApiBootstrap.cs` - Orleans configuration
- `src/Managing.Docker/docker-compose.yml` - Docker networking
- `src/Managing.Api/appsettings.*.json` - Environment-specific settings
@@ -29,6 +29,11 @@
  },
  "RunOrleansGrains": true,
  "DeploymentMode": false,
  "Orleans": {
    "EnableClustering": true,
    "ConnectionTimeout": 60,
    "MaxJoinAttempts": 3
  },
  "AllowedHosts": "*",
  "WorkerBotManager": true,
  "WorkerBalancesTracking": true,