managing-apps/src/Managing.Api/README-ORLEANS-TROUBLESHOOTING.md
2025-08-16 05:23:28 +07:00

Orleans Clustering Troubleshooting Guide

Overview

This document provides troubleshooting steps for Orleans clustering issues, particularly the "Connection attempt to endpoint failed" errors that can occur in Docker deployments.

Common Issues and Solutions

1. Connection Timeout Errors

Error: Connection attempt to endpoint S10.0.0.9:11111:114298801 timed out after 00:00:05

Cause: Orleans silos are trying to connect to each other, but the network configuration prevents them from communicating.
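
One way to confirm this cause is to take the host and port from the endpoint in the error message and test whether that port is reachable from inside another silo container. The following is a minimal sketch, not part of the project. The function names are hypothetical. It assumes the endpoint has the form S<ipv4>:<port>:<number> as in the error above, and that bash and the coreutils timeout command are available in the container.

```shell
# Split a silo endpoint such as "S10.0.0.9:11111:114298801" into "<ip> <port>".
# The trailing number is ignored. Assumes an IPv4 address.
parse_silo_endpoint() {
  local addr="${1#S}"        # drop the leading "S"
  local ip="${addr%%:*}"     # text before the first colon
  local rest="${addr#*:}"    # text after the first colon
  local port="${rest%%:*}"   # text before the next colon
  printf '%s %s\n' "$ip" "$port"
}

# Return 0 if a TCP connection to ip:port succeeds within 5 seconds.
# Uses bash's /dev/tcp redirection, so bash is invoked explicitly.
can_reach() {
  local ip="$1" port="$2"
  timeout 5 bash -c "exec 3<>/dev/tcp/${ip}/${port}" 2>/dev/null
}

# Report whether the silo named in an error message is reachable.
probe_silo() {
  local endpoint ip port
  endpoint="$(parse_silo_endpoint "$1")"
  ip="${endpoint% *}"
  port="${endpoint#* }"
  if can_reach "$ip" "$port"; then
    echo "reachable: ${ip}:${port}"
  else
    echo "unreachable: ${ip}:${port}"
    return 1
  fi
}
```

For example, running `probe_silo S10.0.0.9:11111:114298801` from inside a second container shows whether that container can open a connection to the silo port at all, which separates network problems from Orleans configuration problems.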

Solutions:

A. Environment Variables

To bypass multi-silo clustering entirely, set:

DISABLE_ORLEANS_CLUSTERING=true

The application then falls back to localhost clustering, which is intended for testing.

B. Docker Network Configuration

Ensure the Docker Compose file publishes the Orleans silo and gateway ports and assigns a fixed hostname:

services:
  managing.api:
    ports:
      - "11111:11111"  # Orleans silo port
      - "30000:30000"  # Orleans gateway port
    hostname: managing-api
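
To check quickly that both ports are published, a small helper can scan the compose file for the mappings. This is a rough sketch: `has_port_mapping` and `check_orleans_ports` are hypothetical helpers, and they do a plain text match on the short `"port:port"` syntax shown above, so mappings written in any other form will be reported as missing.

```shell
# Return 0 if the compose file maps host port $2 to the same container port.
# Matches list entries such as:   - "11111:11111"  # comment
has_port_mapping() {
  local file="$1" port="$2"
  grep -Eq "^[[:space:]]*-[[:space:]]*[\"']?${port}:${port}[\"']?([[:space:]]|#|\$)" "$file"
}

# Check the silo (11111) and gateway (30000) ports and list any that are missing.
check_orleans_ports() {
  local file="$1" port missing=0
  for port in 11111 30000; do
    if ! has_port_mapping "$file" "$port"; then
      echo "missing port mapping: ${port}:${port}"
      missing=1
    fi
  done
  return "$missing"
}
```

A call such as `check_orleans_ports src/Managing.Docker/docker-compose.yml` prints nothing and returns 0 when both mappings are present.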

C. Database Connection Issues

If the Orleans database is unavailable, the system will automatically fall back to:

  • Localhost clustering
  • Memory-based grain storage

2. Configuration Options

Production Settings

In appsettings.Production.json:

{
  "RunOrleansGrains": true,
  "Orleans": {
    "EnableClustering": true,
    "ConnectionTimeout": 60,
    "MaxJoinAttempts": 3
  }
}

Environment Variables

  • RUN_ORLEANS_GRAINS: Enable/disable Orleans grains (true/false)
  • DISABLE_ORLEANS_CLUSTERING: Force localhost clustering (true/false)
  • ORLEANS_ADVERTISED_IP: Set specific IP address for Orleans clustering (e.g., "192.168.1.100")
  • ASPNETCORE_ENVIRONMENT: Set environment (Production/Development/etc.)
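
A typo in one of these values usually surfaces only after the container has started, so it can help to validate them in the deploy script first. The sketch below assumes the two flags accept only the literal strings true and false and that ORLEANS_ADVERTISED_IP is an IPv4 address. How the application itself parses these values is not covered in this guide, and the function names are hypothetical.

```shell
# Succeed if the value is empty (unset) or exactly "true" or "false".
is_bool_or_empty() {
  case "$1" in
    ''|true|false) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeed if the value is a dotted-quad IPv4 address with octets in 0-255.
is_ipv4() {
  local ip="$1" octet
  printf '%s\n' "$ip" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$' || return 1
  for octet in $(printf '%s' "$ip" | tr '.' ' '); do
    [ "$octet" -le 255 ] || return 1
  done
}

# Print one line per invalid variable; return non-zero if any is invalid.
validate_orleans_env() {
  local bad=0
  if ! is_bool_or_empty "${RUN_ORLEANS_GRAINS:-}"; then
    echo "RUN_ORLEANS_GRAINS must be true or false"; bad=1
  fi
  if ! is_bool_or_empty "${DISABLE_ORLEANS_CLUSTERING:-}"; then
    echo "DISABLE_ORLEANS_CLUSTERING must be true or false"; bad=1
  fi
  if [ -n "${ORLEANS_ADVERTISED_IP:-}" ] && ! is_ipv4 "$ORLEANS_ADVERTISED_IP"; then
    echo "ORLEANS_ADVERTISED_IP is not a valid IPv4 address"; bad=1
  fi
  return "$bad"
}
```

Calling `validate_orleans_env || exit 1` before starting the container stops the deployment early when a value is malformed.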

3. Network Configuration Improvements

The following improvements have been made to handle Docker networking issues:

  1. Endpoint Configuration:

    • listenOnAnyHostAddress: true allows binding to all network interfaces
    • Increased timeout values (see Improved Timeouts below)
  2. Fallback Mechanisms:

    • Automatic fallback to localhost clustering if database unavailable
    • Memory storage fallback for grain persistence
  3. Improved Timeouts:

    • Response timeout: 60 seconds
    • Probe timeout: 10 seconds
    • Join attempt timeout: 120 seconds

4. Monitoring and Debugging

Orleans Dashboard

Available in development mode at: http://localhost:9999

  • Username: admin
  • Password: admin

Health Checks

Monitor application health at:

  • /health - Full health check
  • /alive - Basic liveness check
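
Both endpoints can be probed from a script with curl. This is a minimal sketch with hypothetical function names. The base URL is passed in by the caller because the API's host and port are not stated in this guide.

```shell
# Probe one endpoint. curl -f turns HTTP error responses into a non-zero exit.
check_endpoint() {
  local base_url="$1" path="$2"
  if curl -fsS --max-time 5 -o /dev/null "${base_url}${path}"; then
    echo "${path}: ok"
  else
    echo "${path}: FAILED"
    return 1
  fi
}

# Check liveness first, then the full health check; fail if either fails.
check_health() {
  local base_url="$1" status=0
  check_endpoint "$base_url" /alive  || status=1
  check_endpoint "$base_url" /health || status=1
  return "$status"
}
```

If /alive succeeds while /health fails, the process is running but at least one of its dependencies is reporting unhealthy, which narrows down where to look next.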

5. Emergency Procedures

If Orleans clustering is causing deployment issues:

  1. Immediate Fix: Set environment variable DISABLE_ORLEANS_CLUSTERING=true
  2. Restart Services: Restart the managing.api container
  3. Check Logs: Confirm that the connection timeout errors no longer appear
  4. Database Check: Verify PostgreSQL Orleans database connectivity
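
The steps above can be sketched as a small set of shell functions. The service name and compose file path are the ones used in this guide; everything else is an assumption. In particular, the sketch assumes the compose file passes DISABLE_ORLEANS_CLUSTERING through to the service's environment, that Docker Compose v2 (`docker compose`) is installed, and that `pg_isready` is available on the host. The database host and port defaults are placeholders.

```shell
# Path and service name as used in this guide; override via the environment.
COMPOSE_FILE_PATH="${COMPOSE_FILE_PATH:-src/Managing.Docker/docker-compose.yml}"
SERVICE="${SERVICE:-managing.api}"

# Steps 1 and 2: set the flag and recreate the container so it picks it up.
# Assumes the compose file forwards DISABLE_ORLEANS_CLUSTERING to the service.
disable_clustering_and_restart() {
  DISABLE_ORLEANS_CLUSTERING=true \
    docker compose -f "$COMPOSE_FILE_PATH" up -d --force-recreate "$SERVICE"
}

# Step 3: count timeout errors logged in the last five minutes.
# grep -c exits non-zero when the count is 0, hence the "|| true".
count_timeout_errors() {
  docker compose -f "$COMPOSE_FILE_PATH" logs --no-color --since 5m "$SERVICE" 2>&1 \
    | grep -c "Connection attempt to endpoint" || true
}

# Step 4: check that PostgreSQL accepts connections (placeholder host and port).
check_database() {
  pg_isready -h "${PGHOST:-localhost}" -p "${PGPORT:-5432}"
}
```

A count of 0 from `count_timeout_errors` a few minutes after the restart suggests the fallback has taken effect; a non-zero count means the flag probably did not reach the container.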

6. Database Requirements

The application uses two PostgreSQL databases:

  • Main application database (from PostgreSql:ConnectionString)
  • Orleans clustering database (from PostgreSql:Orleans)

If either is unavailable, the system degrades gracefully.

Testing the Fix

  1. Deploy with the updated configuration
  2. Monitor logs for Orleans connection errors
  3. Verify grain functionality through the dashboard (development) or API endpoints
  4. Test failover scenarios by temporarily disabling database connectivity

Related Files

  • src/Managing.Bootstrap/ApiBootstrap.cs - Orleans configuration
  • src/Managing.Docker/docker-compose.yml - Docker networking
  • src/Managing.Api/appsettings.*.json - Environment-specific settings