Update silo/cluster config

2025-08-16 05:09:04 +07:00
parent d2975be0f5
commit eeb2923646
18 changed files with 205 additions and 740 deletions

@@ -0,0 +1,118 @@
# Orleans Clustering Troubleshooting Guide
## Overview
This document provides troubleshooting steps for Orleans clustering issues, particularly the "Connection attempt to endpoint failed" errors that can occur in Docker deployments.
## Common Issues and Solutions
### 1. Connection Timeout Errors
**Error**: `Connection attempt to endpoint S10.0.0.9:11111:114298801 timed out after 00:00:05`
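**Cause**: A silo is trying to open a connection to another silo at the address and port shown in the error, but the network configuration prevents the connection from being established. In Docker this typically means the target address or silo port is not reachable from the calling container.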
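To confirm this from inside a container, one option is to pull the address and port out of the logged endpoint and attempt a plain TCP connection. The sketch below uses Python for illustration only. It assumes the endpoint has the form `S<ip>:<port>:<generation>` with an IPv4 address, as in the error above; `parse_silo_endpoint` and `can_connect` are hypothetical helper names, not part of Orleans or this repository.

```python
import re
import socket

# Assumed layout of the endpoint in the log line: "S<ip>:<port>:<generation>".
# IPv6 addresses are not handled by this pattern.
_ENDPOINT = re.compile(r"^S(?P<host>[^:]+):(?P<port>\d+):(?P<generation>\d+)$")


def parse_silo_endpoint(text):
    """Split a logged silo endpoint into (host, port, generation)."""
    match = _ENDPOINT.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized silo endpoint: {text!r}")
    return match["host"], int(match["port"]), int(match["generation"])


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

Calling `can_connect("10.0.0.9", 11111)` from another container in the same deployment shows whether the silo port is reachable at the network level. A `False` result points at Docker networking rather than at Orleans itself.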
**Solutions**:
#### A. Environment Variables
To stop the silo from joining a multi-silo cluster, set:
```bash
DISABLE_ORLEANS_CLUSTERING=true
```
The silo then falls back to localhost clustering, a mode intended for testing.
#### B. Docker Network Configuration
Ensure the Docker Compose file publishes the Orleans silo and gateway ports and gives the service a fixed hostname:
```yaml
services:
  managing.api:
    ports:
      - "11111:11111" # Orleans silo port
      - "30000:30000" # Orleans gateway port
    hostname: managing-api
```
#### C. Database Connection Issues
If the Orleans database is unavailable, the system will automatically fall back to:
- Localhost clustering
- Memory-based grain storage
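The fallback rules described in sections A and C can be summarized as a small decision function. This is a Python sketch for illustration, not the code in `ApiBootstrap.cs`; `OrleansMode` and `select_mode` are hypothetical names, and the storage choice when clustering is disabled but the database is still reachable is an assumption, since the guide does not state it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrleansMode:
    clustering: str      # "postgres" or "localhost"
    grain_storage: str   # "postgres" or "memory"


def select_mode(disable_clustering, orleans_db_available):
    """Pick clustering and storage modes from the two inputs the guide mentions."""
    if not orleans_db_available:
        # Section C: database unavailable -> localhost clustering + memory storage.
        return OrleansMode(clustering="localhost", grain_storage="memory")
    if disable_clustering:
        # Section A: DISABLE_ORLEANS_CLUSTERING=true -> localhost clustering.
        # Assumption: grain storage still uses the database in this case.
        return OrleansMode(clustering="localhost", grain_storage="postgres")
    return OrleansMode(clustering="postgres", grain_storage="postgres")
```

Writing the rules out this way makes it clear that an unavailable database overrides the environment flag: the silo ends up in localhost mode either way.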
### 2. Configuration Options
#### Production Settings
In `appsettings.Production.json`:
```json
{
  "RunOrleansGrains": true,
  "Orleans": {
    "EnableClustering": true,
    "ConnectionTimeout": 60,
    "MaxJoinAttempts": 3
  }
}
```
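Before deploying, it can help to check that the `Orleans` section is present and that its values have the expected types. The following Python sketch is one way to do that. `check_orleans_settings` is a hypothetical helper, the requirement that the numeric values be positive is an assumption, and the parser rejects files containing comments, which .NET configuration files may include.

```python
import json

# Keys and types taken from the settings shown above.
EXPECTED_TYPES = {
    "EnableClustering": bool,
    "ConnectionTimeout": int,
    "MaxJoinAttempts": int,
}


def check_orleans_settings(json_text):
    """Return a list of problems found in the "Orleans" section (empty if none)."""
    settings = json.loads(json_text)
    section = settings.get("Orleans")
    if not isinstance(section, dict):
        return ['missing "Orleans" section']
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in section:
            problems.append(f"missing key {key}")
            continue
        value = section[key]
        # bool is a subclass of int in Python, so reject it for numeric keys.
        wrong_type = not isinstance(value, expected) or (
            expected is int and isinstance(value, bool)
        )
        if wrong_type:
            problems.append(f"{key} should be {expected.__name__}")
        elif expected is int and value <= 0:
            # Assumption: timeouts and attempt counts must be positive.
            problems.append(f"{key} should be positive")
    return problems
```

An empty list means the section matches the shape shown above; it does not prove the application will accept the values.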
#### Environment Variables
- `RUN_ORLEANS_GRAINS`: Enable/disable Orleans grains (true/false)
- `DISABLE_ORLEANS_CLUSTERING`: Force localhost clustering (true/false)
- `ASPNETCORE_ENVIRONMENT`: Select the environment (`Production`, `Development`, etc.), which determines the `appsettings.*.json` file that applies
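When debugging a container, it is useful to print how these variables would be interpreted. The Python sketch below reads them from a mapping such as `os.environ`. How the application itself parses the values is not shown in this guide, so the accepted spellings (`true`/`false`, case-insensitive) and the defaults for the two Orleans flags are assumptions; `read_bool` and `read_orleans_flags` are hypothetical names. The `Production` default matches ASP.NET Core's behavior when `ASPNETCORE_ENVIRONMENT` is unset.

```python
import os


def read_bool(env, name, default):
    """Interpret env[name] as a boolean, falling back to default when unset."""
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    value = raw.strip().lower()
    if value in ("true", "false"):
        return value == "true"
    raise ValueError(f"{name} must be 'true' or 'false', got {raw!r}")


def read_orleans_flags(env=os.environ):
    """Collect the three variables listed above into a dictionary."""
    return {
        # Assumed default: grains run unless explicitly disabled.
        "run_grains": read_bool(env, "RUN_ORLEANS_GRAINS", default=True),
        # Assumed default: clustering stays enabled unless explicitly disabled.
        "disable_clustering": read_bool(env, "DISABLE_ORLEANS_CLUSTERING", default=False),
        "environment": env.get("ASPNETCORE_ENVIRONMENT", "Production"),
    }
```

Running `read_orleans_flags()` inside the container gives a quick view of what the process actually sees, which helps catch a variable that was set on the host but never passed through to the service.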
### 3. Network Configuration Improvements
The following improvements have been made to handle Docker networking issues:
1. **Endpoint Configuration**:
   - `listenOnAnyHostAddress: true` allows binding to all network interfaces
   - Increased timeout values for better reliability
2. **Fallback Mechanisms**:
   - Automatic fallback to localhost clustering if database unavailable
   - Memory storage fallback for grain persistence
3. **Improved Timeouts**:
   - Response timeout: 60 seconds
   - Probe timeout: 10 seconds
   - Join attempt timeout: 120 seconds
### 4. Monitoring and Debugging
#### Orleans Dashboard
Available in development mode at: `http://localhost:9999`
- Username: admin
- Password: admin
#### Health Checks
Monitor application health at:
- `/health` - Full health check
- `/alive` - Basic liveness check
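Both endpoints can be polled from a script. The Python sketch below returns the HTTP status code for each path, or `None` when the service cannot be reached. The base URL is an assumption, since this guide does not state which port the API listens on; `probe` and `health_report` are hypothetical helper names.

```python
import urllib.error
import urllib.request


def probe(base_url, path, timeout=5.0):
    """Return the HTTP status code for base_url + path, or None if unreachable."""
    url = base_url.rstrip("/") + path
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx/5xx status.
        return err.code
    except OSError:
        # Connection refused, DNS failure, or timeout (URLError is an OSError).
        return None


def health_report(base_url):
    """Probe the two health endpoints listed above."""
    return {path: probe(base_url, path) for path in ("/health", "/alive")}
```

A result such as `/alive` answering with 200 while `/health` returns an error status suggests the process is running but one of its dependencies is failing.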
### 5. Emergency Procedures
If Orleans clustering is causing deployment issues:
1. **Immediate Fix**: Set environment variable `DISABLE_ORLEANS_CLUSTERING=true`
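2. **Restart Services**: Recreate the `managing.api` container (for example with `docker compose up -d`); a plain restart of an existing container does not pick up changed environment variables
3. **Check Logs**: Confirm that the connection timeout errors have stopped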
4. **Database Check**: Verify PostgreSQL Orleans database connectivity
### 6. Database Requirements
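The application expects two PostgreSQL databases:
- Main application database (from `PostgreSql:ConnectionString`)
- Orleans clustering database (from `PostgreSql:Orleans`)
If either is unavailable, the system degrades gracefully. For the Orleans database, this means the fallbacks listed under Database Connection Issues (localhost clustering and memory-based grain storage).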
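A quick way to check database connectivity is to read the host and port from each connection string and attempt a TCP connection. The Python sketch below assumes Npgsql-style `Key=Value;` connection strings with a `Host` (or `Server`) entry and an optional `Port`; `parse_host_port` and `database_reachable` are hypothetical helper names. It only shows that the port accepts connections, not that the credentials or database name are valid.

```python
import socket


def parse_host_port(connection_string, default_port=5432):
    """Extract (host, port) from a 'Key=Value;Key=Value' connection string."""
    parts = {}
    for item in connection_string.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            parts[key.strip().lower()] = value.strip()
    host = parts.get("host") or parts.get("server")
    if not host:
        raise ValueError("connection string has no Host or Server entry")
    return host, int(parts.get("port", default_port))


def database_reachable(connection_string, timeout=5.0):
    """Return True if the PostgreSQL port accepts a TCP connection."""
    host, port = parse_host_port(connection_string)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running the check against both `PostgreSql:ConnectionString` and `PostgreSql:Orleans` from inside the API container shows which of the two databases, if any, is unreachable.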
## Testing the Fix
1. Deploy with the updated configuration
2. Monitor logs for Orleans connection errors
3. Verify grain functionality through the dashboard (development) or API endpoints
4. Test failover scenarios by temporarily disabling database connectivity
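For step 2, the log check can be automated by counting timeout errors per endpoint. The Python sketch below matches lines shaped like the error quoted in section 1, with the duration in `hh:mm:ss` form. The exact log format may differ between Orleans versions, so the pattern is an assumption; `summarize_timeouts` is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern based on the error message quoted earlier in this guide.
_TIMEOUT_LINE = re.compile(
    r"Connection attempt to endpoint (?P<endpoint>S\S+) "
    r"timed out after (?P<h>\d+):(?P<m>\d{2}):(?P<s>\d{2})"
)


def summarize_timeouts(lines):
    """Return (count per endpoint, longest timeout in seconds per endpoint)."""
    counts = Counter()
    longest = {}
    for line in lines:
        match = _TIMEOUT_LINE.search(line)
        if match is None:
            continue
        endpoint = match["endpoint"]
        seconds = int(match["h"]) * 3600 + int(match["m"]) * 60 + int(match["s"])
        counts[endpoint] += 1
        longest[endpoint] = max(longest.get(endpoint, 0), seconds)
    return counts, longest
```

Feeding it the container's log lines after a deployment gives a per-endpoint count; an empty result over a reasonable observation window suggests the fix is holding.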
## Related Files
- `src/Managing.Bootstrap/ApiBootstrap.cs` - Orleans configuration
- `src/Managing.Docker/docker-compose.yml` - Docker networking
- `src/Managing.Api/appsettings.*.json` - Environment-specific settings

@@ -29,6 +29,11 @@
   },
   "RunOrleansGrains": true,
   "DeploymentMode": false,
+  "Orleans": {
+    "EnableClustering": true,
+    "ConnectionTimeout": 60,
+    "MaxJoinAttempts": 3
+  },
   "AllowedHosts": "*",
   "WorkerBotManager": true,
   "WorkerBalancesTracking": true,