Add doc for workers architecture
@@ -0,0 +1,78 @@

# Overall System Architecture

This diagram shows the complete system architecture with API Server Cluster, Compute Worker Cluster, and their interactions with the database and external services.

```mermaid
graph TB
    subgraph "Monorepo Structure"
        subgraph "API Server Cluster"
            API1[Managing.Api<br/>API-1<br/>Orleans]
            API2[Managing.Api<br/>API-2<br/>Orleans]
            API3[Managing.Api<br/>API-3<br/>Orleans]
        end

        subgraph "Compute Worker Cluster"
            W1[Managing.Compute<br/>Worker-1<br/>8 cores, 6 jobs]
            W2[Managing.Compute<br/>Worker-2<br/>8 cores, 6 jobs]
            W3[Managing.Compute<br/>Worker-3<br/>8 cores, 6 jobs]
        end

        subgraph "Shared Projects"
            APP[Managing.Application<br/>Business Logic]
            DOM[Managing.Domain<br/>Domain Models]
            INFRA[Managing.Infrastructure<br/>Database Access]
        end
    end

    subgraph "External Services"
        DB[(PostgreSQL<br/>Job Queue)]
        INFLUX[(InfluxDB<br/>Candles)]
    end

    subgraph "Clients"
        U1[User 1]
        U2[User 2]
        U1000[User 1000]
    end

    U1 --> API1
    U2 --> API2
    U1000 --> API3

    API1 --> DB
    API2 --> DB
    API3 --> DB

    W1 --> DB
    W2 --> DB
    W3 --> DB

    W1 --> INFLUX
    W2 --> INFLUX
    W3 --> INFLUX

    API1 -.uses.-> APP
    API2 -.uses.-> APP
    API3 -.uses.-> APP
    W1 -.uses.-> APP
    W2 -.uses.-> APP
    W3 -.uses.-> APP

    style API1 fill:#4A90E2
    style API2 fill:#4A90E2
    style API3 fill:#4A90E2
    style W1 fill:#50C878
    style W2 fill:#50C878
    style W3 fill:#50C878
    style DB fill:#FF6B6B
    style INFLUX fill:#FFD93D
```

## Components

- **API Server Cluster**: Handles HTTP requests, creates jobs, returns immediately
- **Compute Worker Cluster**: Processes CPU-intensive backtest jobs
- **PostgreSQL**: Job queue and state management
- **InfluxDB**: Time-series data for candles
- **Shared Projects**: Common business logic used by both API and Compute services

assets/documentation/Workers processing/02-Request-Flow.md
@@ -0,0 +1,52 @@

# Request Flow Sequence Diagram

This diagram shows the complete request flow from user submission to job completion and status polling.

```mermaid
sequenceDiagram
    participant User
    participant API as API Server<br/>(Orleans)
    participant DB as PostgreSQL<br/>(Job Queue)
    participant Worker as Compute Worker
    participant Influx as InfluxDB

    User->>API: POST /api/backtest/bundle
    API->>API: Create BundleBacktestRequest
    API->>API: Generate BacktestJobs from variants
    API->>DB: INSERT BacktestJobs (Status: Pending)
    API-->>User: 202 Accepted<br/>{bundleRequestId, status: "Queued"}

    Note over Worker: Polling every 5 seconds
    Worker->>DB: SELECT pending jobs<br/>(ORDER BY priority, createdAt)
    DB-->>Worker: Return pending jobs
    Worker->>DB: UPDATE job<br/>(Status: Running, AssignedWorkerId)
    Worker->>Influx: Load candles for backtest
    Influx-->>Worker: Return candles

    loop Process each candle
        Worker->>Worker: Run backtest logic
        Worker->>DB: UPDATE job progress
    end

    Worker->>DB: UPDATE job<br/>(Status: Completed, ResultJson)
    Worker->>DB: UPDATE BundleBacktestRequest<br/>(CompletedBacktests++)

    User->>API: GET /api/backtest/bundle/{id}/status
    API->>DB: SELECT BundleBacktestRequest + job stats
    DB-->>API: Return status
    API-->>User: {status, progress, completed/total}
```

## Flow Steps

1. **User Request**: User submits a bundle backtest request
2. **API Processing**: API creates the bundle request and generates individual backtest jobs
3. **Job Queue**: Jobs are inserted into the database with `Pending` status
4. **Immediate Response**: API returns 202 Accepted with the bundle request ID
5. **Worker Polling**: Compute workers poll the database every 5 seconds
6. **Job Claiming**: Workers claim jobs using PostgreSQL advisory locks
7. **Candle Loading**: Worker loads candles from InfluxDB
8. **Backtest Processing**: Worker processes the backtest with progress updates
9. **Result Storage**: Worker saves results and updates bundle progress
10. **Status Polling**: User polls the API for status updates
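
The API side of steps 1–4 and 10 can be sketched as a minimal ASP.NET Core controller. This is an illustrative sketch only: the DTO shape, the `IBacktestJobRepository` methods, and the `FromDto`/`GenerateJobs` helpers are assumptions, not the actual `Managing.Api` code.

```csharp
using Microsoft.AspNetCore.Mvc;

// Hypothetical controller sketch; repository methods and DTO types are assumed.
[ApiController]
[Route("api/backtest/bundle")]
public class BundleBacktestController : ControllerBase
{
    private readonly IBacktestJobRepository _jobs;

    public BundleBacktestController(IBacktestJobRepository jobs) => _jobs = jobs;

    [HttpPost]
    public async Task<IActionResult> CreateBundleBacktest([FromBody] BundleBacktestRequestDto dto)
    {
        // Expand the bundle's variants into individual jobs and insert them as Pending.
        // No backtest work happens on the request thread: fire-and-forget.
        var bundle = BundleBacktestRequest.FromDto(dto);        // assumed factory
        var jobs = bundle.GenerateJobs();                       // one BacktestJob per variant
        await _jobs.InsertBundleWithJobsAsync(bundle, jobs);    // Status: Pending
        return Accepted(new { bundleRequestId = bundle.RequestId, status = "Queued" });
    }

    [HttpGet("{id:guid}/status")]
    public async Task<IActionResult> GetBundleStatus(Guid id)
    {
        // Status is read straight from PostgreSQL, aggregated from the job rows.
        var status = await _jobs.GetBundleStatusAsync(id);
        return status is null ? NotFound() : Ok(status);
    }
}
```

Returning 202 Accepted instead of waiting for results is what removes request timeouts; clients poll the status endpoint shown in the diagram.
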
@@ -0,0 +1,54 @@

# Job Processing Flow

This diagram shows the detailed flow of how compute workers process backtest jobs from the queue.

```mermaid
flowchart TD
    Start([User Creates<br/>BundleBacktestRequest]) --> CreateJobs[API: Generate<br/>BacktestJobs]
    CreateJobs --> InsertDB[(Insert Jobs<br/>Status: Pending)]

    InsertDB --> WorkerPoll{Worker Polls<br/>Database}

    WorkerPoll -->|Every 5s| CheckJobs{Jobs<br/>Available?}
    CheckJobs -->|No| Wait[Wait 5s]
    Wait --> WorkerPoll

    CheckJobs -->|Yes| ClaimJobs[Claim Jobs<br/>Advisory Lock]
    ClaimJobs --> UpdateStatus[Update Status:<br/>Running]

    UpdateStatus --> CheckSemaphore{Semaphore<br/>Available?}
    CheckSemaphore -->|No| WaitSemaphore[Wait for<br/>slot]
    WaitSemaphore --> CheckSemaphore

    CheckSemaphore -->|Yes| AcquireSemaphore[Acquire<br/>Semaphore]
    AcquireSemaphore --> LoadCandles[Load Candles<br/>from InfluxDB]

    LoadCandles --> ProcessBacktest[Process Backtest<br/>CPU-intensive]

    ProcessBacktest --> UpdateProgress{Every<br/>10%?}
    UpdateProgress -->|Yes| SaveProgress[Update Progress<br/>in DB]
    SaveProgress --> ProcessBacktest
    UpdateProgress -->|No| ProcessBacktest

    ProcessBacktest --> BacktestComplete{Backtest<br/>Complete?}
    BacktestComplete -->|No| ProcessBacktest
    BacktestComplete -->|Yes| SaveResult[Save Result<br/>Status: Completed]

    SaveResult --> UpdateBundle[Update Bundle<br/>Progress]
    UpdateBundle --> ReleaseSemaphore[Release<br/>Semaphore]
    ReleaseSemaphore --> WorkerPoll

    style Start fill:#4A90E2
    style ProcessBacktest fill:#50C878
    style SaveResult fill:#FF6B6B
    style WorkerPoll fill:#FFD93D
```

## Key Components

- **Worker Polling**: Workers continuously poll the database for pending jobs
- **Advisory Locks**: PostgreSQL advisory locks prevent multiple workers from claiming the same job
- **Semaphore Control**: Limits concurrent backtests per worker (default: CPU cores - 2)
- **Progress Updates**: Progress is saved to the database at every 10% of completion
- **Bundle Updates**: Completion of an individual job updates the parent bundle request
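
The loop above maps naturally onto a .NET `BackgroundService`. Below is a simplified sketch under the assumptions used in these docs (5-second poll, `CPU cores - 2` semaphore slots); the repository and `Backtester` member names are placeholders rather than the real `Managing.Compute` code.

```csharp
using Microsoft.Extensions.Hosting;

// Simplified polling-loop sketch; IBacktestJobRepository and Backtester members are assumed.
public sealed class BacktestComputeWorker : BackgroundService
{
    private readonly IBacktestJobRepository _jobs;
    private readonly Backtester _backtester;
    private readonly SemaphoreSlim _slots = new(Environment.ProcessorCount - 2); // e.g. 6 on 8 cores

    public BacktestComputeWorker(IBacktestJobRepository jobs, Backtester backtester)
        => (_jobs, _backtester) = (jobs, backtester);

    protected override async Task ExecuteAsync(CancellationToken ct)
    {
        using var timer = new PeriodicTimer(TimeSpan.FromSeconds(5)); // poll interval
        while (await timer.WaitForNextTickAsync(ct))
        {
            // Claim at most as many jobs as there are free slots (advisory-locked in SQL).
            var claimed = await _jobs.ClaimPendingJobsAsync(maxCount: _slots.CurrentCount, ct);
            foreach (var job in claimed)
            {
                await _slots.WaitAsync(ct);                   // per-worker concurrency limit
                _ = Task.Run(() => RunJobAsync(job, ct), ct); // CPU-bound work off the poll loop
            }
        }
    }

    private async Task RunJobAsync(BacktestJob job, CancellationToken ct)
    {
        try
        {
            var result = await _backtester.RunBacktestAsync(
                job,
                onProgress: p => _jobs.UpdateProgressAsync(job.Id, p, ct),  // persisted every ~10%
                ct);
            await _jobs.CompleteJobAsync(job.Id, result, ct);               // Status: Completed
        }
        catch (Exception ex)
        {
            await _jobs.FailJobAsync(job.Id, ex.Message, ct);               // Status: Failed
        }
        finally
        {
            _slots.Release();                                               // free the slot
        }
    }
}
```

Because the claim itself is serialized in the database, several workers can run this loop in parallel; the semaphore only protects the local CPU budget.
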
@@ -0,0 +1,69 @@

# Database Schema & Queue Structure

This diagram shows the entity relationships between BundleBacktestRequest, BacktestJob, and User entities.

```mermaid
erDiagram
    BundleBacktestRequest ||--o{ BacktestJob : "has many"
    BacktestJob }o--|| User : "belongs to"

    BundleBacktestRequest {
        UUID RequestId PK
        INT UserId FK
        STRING Status
        INT TotalBacktests
        INT CompletedBacktests
        INT FailedBacktests
        DATETIME CreatedAt
        DATETIME CompletedAt
        STRING UniversalConfigJson
        STRING DateTimeRangesJson
        STRING MoneyManagementVariantsJson
        STRING TickerVariantsJson
    }

    BacktestJob {
        UUID Id PK
        UUID BundleRequestId FK
        STRING JobType
        STRING Status
        INT Priority
        TEXT ConfigJson
        TEXT CandlesJson
        INT ProgressPercentage
        INT CurrentBacktestIndex
        INT TotalBacktests
        INT CompletedBacktests
        DATETIME CreatedAt
        DATETIME StartedAt
        DATETIME CompletedAt
        TEXT ResultJson
        TEXT ErrorMessage
        STRING AssignedWorkerId
        DATETIME LastHeartbeat
    }

    User {
        INT Id PK
        STRING Name
    }
```

## Table Descriptions

### BundleBacktestRequest

- Represents a bundle of multiple backtest jobs
- Contains variant configurations (date ranges, money management, tickers)
- Tracks overall progress across all jobs

### BacktestJob

- Individual backtest execution unit
- Contains serialized config and candles
- Tracks progress, worker assignment, and heartbeat
- Links to parent bundle request

### Key Indexes

- `idx_status_priority`: For efficient job claiming (Status, Priority DESC, CreatedAt)
- `idx_bundle_request`: For bundle progress queries
- `idx_assigned_worker`: For worker health monitoring
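
For reference, a minimal EF Core mapping of the `BacktestJob` entity and the three indexes above could look like the following. It is an assumption-level sketch (some columns are omitted, and `IsDescending` needs EF Core 7+), not the actual migration.

```csharp
using Microsoft.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore.Metadata.Builders;

// Illustrative entity + index mapping; names follow the diagram, some columns omitted.
public class BacktestJob
{
    public Guid Id { get; set; }
    public Guid BundleRequestId { get; set; }
    public string JobType { get; set; } = "";
    public string Status { get; set; } = "Pending";
    public int Priority { get; set; }
    public string ConfigJson { get; set; } = "";
    public int ProgressPercentage { get; set; }
    public DateTime CreatedAt { get; set; }
    public DateTime? StartedAt { get; set; }
    public DateTime? CompletedAt { get; set; }
    public string? ResultJson { get; set; }
    public string? ErrorMessage { get; set; }
    public string? AssignedWorkerId { get; set; }
    public DateTime? LastHeartbeat { get; set; }
}

public class BacktestJobConfiguration : IEntityTypeConfiguration<BacktestJob>
{
    public void Configure(EntityTypeBuilder<BacktestJob> builder)
    {
        builder.HasKey(j => j.Id);

        // Claiming order: Status, Priority DESC, CreatedAt (IsDescending is EF Core 7+).
        builder.HasIndex(j => new { j.Status, j.Priority, j.CreatedAt })
               .IsDescending(false, true, false)
               .HasDatabaseName("idx_status_priority");

        builder.HasIndex(j => j.BundleRequestId)
               .HasDatabaseName("idx_bundle_request");      // bundle progress queries

        builder.HasIndex(j => j.AssignedWorkerId)
               .HasDatabaseName("idx_assigned_worker");     // worker health monitoring
    }
}
```
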
@@ -0,0 +1,103 @@

# Deployment Architecture

This diagram shows the production deployment architecture with load balancing, clustering, and monitoring.

```mermaid
graph TB
    subgraph "Load Balancer"
        LB[NGINX/Cloudflare]
    end

    subgraph "API Server Cluster"
        direction LR
        API1[API-1<br/>Orleans Silo<br/>Port: 11111]
        API2[API-2<br/>Orleans Silo<br/>Port: 11121]
        API3[API-3<br/>Orleans Silo<br/>Port: 11131]
    end

    subgraph "Compute Worker Cluster"
        direction LR
        W1[Worker-1<br/>8 CPU Cores<br/>6 Concurrent Jobs]
        W2[Worker-2<br/>8 CPU Cores<br/>6 Concurrent Jobs]
        W3[Worker-3<br/>8 CPU Cores<br/>6 Concurrent Jobs]
    end

    subgraph "Database Cluster"
        direction LR
        DB_MASTER[(PostgreSQL<br/>Master<br/>Job Queue)]
        DB_REPLICA[(PostgreSQL<br/>Replica<br/>Read Only)]
    end

    subgraph "Time Series DB"
        INFLUX[(InfluxDB<br/>Candles Data)]
    end

    subgraph "Monitoring"
        PROM[Prometheus]
        GRAF[Grafana]
    end

    LB --> API1
    LB --> API2
    LB --> API3

    API1 --> DB_MASTER
    API2 --> DB_MASTER
    API3 --> DB_MASTER

    W1 --> DB_MASTER
    W2 --> DB_MASTER
    W3 --> DB_MASTER

    W1 --> INFLUX
    W2 --> INFLUX
    W3 --> INFLUX

    W1 --> PROM
    W2 --> PROM
    W3 --> PROM
    API1 --> PROM
    API2 --> PROM
    API3 --> PROM

    PROM --> GRAF

    DB_MASTER --> DB_REPLICA

    style LB fill:#9B59B6
    style API1 fill:#4A90E2
    style API2 fill:#4A90E2
    style API3 fill:#4A90E2
    style W1 fill:#50C878
    style W2 fill:#50C878
    style W3 fill:#50C878
    style DB_MASTER fill:#FF6B6B
    style INFLUX fill:#FFD93D
    style PROM fill:#E67E22
    style GRAF fill:#E67E22
```

## Deployment Components

### Load Balancer

- **NGINX/Cloudflare**: Distributes incoming requests across API servers
- Health checks and failover support

### API Server Cluster

- **3+ Instances**: Horizontally scalable Orleans silos
- Each instance handles HTTP requests and Orleans grain operations
- Ports: 11111, 11121, 11131 (for clustering)
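
For context on those silo ports, an Orleans silo host is typically configured along these lines. This is a generic sketch: the gateway port, the choice of ADO.NET clustering over PostgreSQL, and the configuration keys are assumptions about this deployment, not confirmed details.

```csharp
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.Hosting;

// Generic Orleans silo bootstrap sketch; ports, provider, and config keys are assumptions.
var builder = Host.CreateDefaultBuilder(args)
    .UseOrleans((context, silo) =>
    {
        // Silo-to-silo port (11111 / 11121 / 11131 per instance) plus a client gateway port.
        silo.ConfigureEndpoints(
            siloPort: context.Configuration.GetValue<int>("Orleans:SiloPort"),
            gatewayPort: context.Configuration.GetValue<int>("Orleans:GatewayPort"));

        // Silos discover each other through a shared membership table (here: PostgreSQL).
        silo.UseAdoNetClustering(options =>
        {
            options.Invariant = "Npgsql";
            options.ConnectionString = context.Configuration.GetConnectionString("Postgres");
        });
    });

await builder.Build().RunAsync();
```
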
### Compute Worker Cluster

- **3+ Instances**: Dedicated CPU workers
- Each worker: 8 CPU cores, 6 concurrent backtests
- Total capacity: 18 concurrent backtests across the cluster

### Database Cluster

- **Master**: Handles all writes (job creation, updates)
- **Replica**: Read-only for status queries and reporting

### Monitoring

- **Prometheus**: Metrics collection
- **Grafana**: Visualization and dashboards

@@ -0,0 +1,96 @@

# Concurrency Control Flow

This diagram shows how the semaphore-based concurrency control works across multiple workers.

```mermaid
graph LR
    subgraph "Database Queue"
        Q[Pending Jobs<br/>Priority Queue]
    end

    subgraph "Worker-1"
        S1[Semaphore<br/>6 slots]
        J1[Job 1]
        J2[Job 2]
        J3[Job 3]
        J4[Job 4]
        J5[Job 5]
        J6[Job 6]
    end

    subgraph "Worker-2"
        S2[Semaphore<br/>6 slots]
        J7[Job 7]
        J8[Job 8]
        J9[Job 9]
        J10[Job 10]
        J11[Job 11]
        J12[Job 12]
    end

    subgraph "Worker-3"
        S3[Semaphore<br/>6 slots]
        J13[Job 13]
        J14[Job 14]
        J15[Job 15]
        J16[Job 16]
        J17[Job 17]
        J18[Job 18]
    end

    Q -->|Claim 6 jobs| S1
    Q -->|Claim 6 jobs| S2
    Q -->|Claim 6 jobs| S3

    S1 --> J1
    S1 --> J2
    S1 --> J3
    S1 --> J4
    S1 --> J5
    S1 --> J6

    S2 --> J7
    S2 --> J8
    S2 --> J9
    S2 --> J10
    S2 --> J11
    S2 --> J12

    S3 --> J13
    S3 --> J14
    S3 --> J15
    S3 --> J16
    S3 --> J17
    S3 --> J18

    style Q fill:#FF6B6B
    style S1 fill:#50C878
    style S2 fill:#50C878
    style S3 fill:#50C878
```

## Concurrency Control Mechanisms

### 1. Database-Level (Advisory Locks)

- **PostgreSQL Advisory Locks**: Prevent multiple workers from claiming the same job
- Atomic job claiming using `pg_try_advisory_lock()`
- Ensures exactly-once job processing
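
One simple way to use `pg_try_advisory_lock()` here is as a cluster-wide claim mutex: a worker only runs the claim query if it wins the lock, and releases it right after. The sketch below illustrates that idea in Dapper/Npgsql style with an arbitrary lock key; it is not the actual repository implementation.

```csharp
using Dapper;
using Npgsql;

// Hedged sketch of job claiming guarded by an advisory lock; lock key and method shape are assumptions.
public async Task<IReadOnlyList<BacktestJob>> ClaimPendingJobsAsync(
    NpgsqlConnection db, string workerId, int maxCount)
{
    const long QueueClaimKey = 777_001;   // hypothetical app-wide advisory lock key

    // Only one worker at a time may run the claim step; others skip this poll cycle.
    var gotLock = await db.ExecuteScalarAsync<bool>(
        "SELECT pg_try_advisory_lock(@key)", new { key = QueueClaimKey });
    if (!gotLock) return Array.Empty<BacktestJob>();

    try
    {
        var claimed = await db.QueryAsync<BacktestJob>(
            """
            UPDATE "BacktestJobs"
            SET    "Status" = 'Running',
                   "AssignedWorkerId" = @workerId,
                   "StartedAt" = now()
            WHERE  "Id" IN (SELECT "Id" FROM "BacktestJobs"
                            WHERE  "Status" = 'Pending'
                            ORDER  BY "Priority" DESC, "CreatedAt"
                            LIMIT  @maxCount)
            RETURNING *
            """,
            new { workerId, maxCount });
        return claimed.ToList();
    }
    finally
    {
        // Release so the next worker's poll can claim the remaining jobs.
        await db.ExecuteAsync("SELECT pg_advisory_unlock(@key)", new { key = QueueClaimKey });
    }
}
```
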
### 2. Worker-Level (Semaphore)

- **SemaphoreSlim**: Limits concurrent backtests per worker
- Default: `Environment.ProcessorCount - 2` (e.g., 6 on an 8-core machine)
- Prevents CPU saturation while leaving resources for Orleans messaging

### 3. Cluster-Level (Queue Priority)

- **Priority Queue**: Jobs ordered by priority, then creation time
- VIP users get higher priority
- Fair distribution across workers

## Capacity Calculation

- **Per Worker**: 6 concurrent backtests
- **3 Workers**: 18 concurrent backtests
- **Average Duration**: ~47 minutes per backtest
- **Throughput**: ~1,080 backtests/hour
- **1000 Users × 10 backtests**: ~9 hours to process full queue

@@ -0,0 +1,74 @@

# Monorepo Project Structure

This diagram shows the monorepo structure with shared projects used by both API and Compute services.

```mermaid
graph TD
    ROOT[Managing.sln<br/>Monorepo Root]

    ROOT --> API[Managing.Api<br/>API Server<br/>Orleans]
    ROOT --> COMPUTE[Managing.Compute<br/>Worker App<br/>No Orleans]

    ROOT --> SHARED[Shared Projects]

    SHARED --> APP[Managing.Application<br/>Business Logic]
    SHARED --> DOM[Managing.Domain<br/>Domain Models]
    SHARED --> INFRA[Managing.Infrastructure<br/>Database/External]
    SHARED --> COMMON[Managing.Common<br/>Utilities]

    API --> APP
    API --> DOM
    API --> INFRA
    API --> COMMON

    COMPUTE --> APP
    COMPUTE --> DOM
    COMPUTE --> INFRA
    COMPUTE --> COMMON

    style ROOT fill:#9B59B6
    style API fill:#4A90E2
    style COMPUTE fill:#50C878
    style SHARED fill:#FFD93D
```

## Project Organization

### Root Level

- **Managing.sln**: Solution file containing all projects

### Service Projects

- **Managing.Api**: API Server with Orleans
  - Controllers, Orleans grains, HTTP endpoints
  - Handles user requests, creates jobs

- **Managing.Compute**: Compute Worker App (NEW)
  - Background workers, job processors
  - No Orleans dependency
  - Dedicated CPU processing

### Shared Projects

- **Managing.Application**: Business logic
  - `Backtester.cs`, `TradingBotBase.cs`
  - Used by both API and Compute

- **Managing.Domain**: Domain models
  - `BundleBacktestRequest.cs`, `BacktestJob.cs`
  - Shared entities

- **Managing.Infrastructure**: External integrations
  - Database repositories, InfluxDB client
  - Shared data access

- **Managing.Common**: Utilities
  - Constants, enums, helpers
  - Shared across all projects

## Benefits

1. **Code Reuse**: Shared business logic between API and Compute
2. **Consistency**: Same domain models and logic
3. **Maintainability**: Single source of truth
4. **Type Safety**: Shared types prevent serialization issues
5. **Testing**: Shared test projects

@@ -0,0 +1,71 @@

# Implementation Plan

## Phase 1: Database & Domain Setup

- [ ] Create `BacktestJob` entity in `Managing.Domain/Backtests/`
- [ ] Create `BacktestJobStatus` enum (Pending, Running, Completed, Failed)
- [ ] Create database migration for `BacktestJobs` table
- [ ] Add indexes: `idx_status_priority`, `idx_bundle_request`, `idx_assigned_worker`
- [ ] Create `IBacktestJobRepository` interface
- [ ] Implement `BacktestJobRepository` with advisory lock support
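
As a starting point, the Phase 1 enum and repository interface could look roughly like this; the member list is inferred from the flows in the other documents and is an assumption, not a final contract.

```csharp
// Sketch of the Phase 1 contracts; member names are assumptions based on the documented flows.
public enum BacktestJobStatus
{
    Pending,
    Running,
    Completed,
    Failed
}

public interface IBacktestJobRepository
{
    // Insert the bundle and its generated jobs in one transaction (all Pending).
    Task InsertBundleWithJobsAsync(BundleBacktestRequest bundle, IReadOnlyList<BacktestJob> jobs,
        CancellationToken ct = default);

    // Claim up to maxCount pending jobs for this worker (advisory-lock protected).
    Task<IReadOnlyList<BacktestJob>> ClaimPendingJobsAsync(int maxCount, CancellationToken ct = default);

    Task UpdateProgressAsync(Guid jobId, int progressPercentage, CancellationToken ct = default);
    Task CompleteJobAsync(Guid jobId, string resultJson, CancellationToken ct = default);
    Task FailJobAsync(Guid jobId, string errorMessage, CancellationToken ct = default);

    // Heartbeat and recovery (see Phase 5): refresh liveness and requeue stale jobs.
    Task TouchHeartbeatAsync(string workerId, CancellationToken ct = default);
    Task<int> RequeueStaleJobsAsync(TimeSpan staleAfter, CancellationToken ct = default);

    // BundleStatusDto is a hypothetical read model for the status endpoint.
    Task<BundleStatusDto?> GetBundleStatusAsync(Guid bundleRequestId, CancellationToken ct = default);
}
```
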
## Phase 2: Compute Worker Project

- [ ] Create `Managing.Compute` project (console app/worker service)
- [ ] Add project reference to shared projects (Application, Domain, Infrastructure)
- [ ] Configure DI container (NO Orleans)
- [ ] Create `BacktestComputeWorker` background service
- [ ] Implement job polling logic (every 5 seconds)
- [ ] Implement job claiming with PostgreSQL advisory locks
- [ ] Implement semaphore-based concurrency control
- [ ] Implement progress callback mechanism
- [ ] Implement heartbeat mechanism (every 30 seconds)
- [ ] Add configuration: `MaxConcurrentBacktests`, `JobPollIntervalSeconds`
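
The two configuration knobs and the 30-second heartbeat might be wired up roughly as below. The option names come from the checklist above; the configuration section name, defaults, and `HeartbeatService` shape are assumptions.

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Options;

// Host wiring sketch for Managing.Compute: plain generic host, no Orleans.
var host = Host.CreateDefaultBuilder(args)
    .ConfigureServices((ctx, services) =>
    {
        services.Configure<ComputeWorkerOptions>(ctx.Configuration.GetSection("ComputeWorker"));
        services.AddHostedService<BacktestComputeWorker>();   // polling loop (sketched earlier)
        services.AddHostedService<HeartbeatService>();        // 30-second liveness updates
        // ...plus IBacktestJobRepository / Backtester registrations from the shared projects.
    })
    .Build();

await host.RunAsync();

// Options bound from configuration; names match the checklist, defaults are assumptions.
public sealed class ComputeWorkerOptions
{
    public int MaxConcurrentBacktests { get; set; } = Environment.ProcessorCount - 2;
    public int JobPollIntervalSeconds { get; set; } = 5;
    public int HeartbeatIntervalSeconds { get; set; } = 30;
}

// Lightweight service that keeps LastHeartbeat fresh for this worker's claimed jobs.
public sealed class HeartbeatService : BackgroundService
{
    private readonly IBacktestJobRepository _jobs;
    private readonly ComputeWorkerOptions _options;
    private readonly string _workerId = Environment.MachineName;

    public HeartbeatService(IBacktestJobRepository jobs, IOptions<ComputeWorkerOptions> options)
        => (_jobs, _options) = (jobs, options.Value);

    protected override async Task ExecuteAsync(CancellationToken ct)
    {
        using var timer = new PeriodicTimer(TimeSpan.FromSeconds(_options.HeartbeatIntervalSeconds));
        while (await timer.WaitForNextTickAsync(ct))
            await _jobs.TouchHeartbeatAsync(_workerId, ct);
    }
}
```
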
## Phase 3: API Server Updates

- [ ] Update `BacktestController` to create jobs instead of calling grains directly
- [ ] Implement `CreateBundleBacktest` endpoint (returns immediately)
- [ ] Implement `GetBundleStatus` endpoint (polls database)
- [ ] Update `Backtester.cs` to generate `BacktestJob` entities from bundle variants
- [ ] Remove direct Orleans grain calls for backtests (keep for other operations)

## Phase 4: Shared Logic

- [ ] Extract backtest execution logic from `BacktestTradingBotGrain` to `Backtester.cs`
- [ ] Make backtest logic Orleans-agnostic (can run in worker or grain)
- [ ] Add progress callback support to `RunBacktestAsync` method
- [ ] Ensure candle loading works in both contexts
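
A possible Orleans-agnostic shape for the shared entry point, including the progress callback from the checklist, is sketched below. The signature and helper names are suggestions only; the real `Backtester.RunBacktestAsync` may differ.

```csharp
// Suggested Orleans-agnostic signature for the shared backtest entry point.
// The same method can be awaited from a grain or from the compute worker.
public class Backtester
{
    public async Task<string> RunBacktestAsync(
        BacktestJob job,
        Func<int, Task>? onProgress = null,      // called with 0-100, roughly every 10%
        CancellationToken ct = default)
    {
        var candles = await LoadCandlesAsync(job, ct);   // InfluxDB in both contexts
        var lastReported = 0;

        for (var i = 0; i < candles.Count; i++)
        {
            ct.ThrowIfCancellationRequested();
            ProcessCandle(job, candles[i]);              // CPU-intensive step

            var progress = (i + 1) * 100 / candles.Count;
            if (onProgress is not null && progress >= lastReported + 10)
            {
                lastReported = progress;
                await onProgress(progress);              // persisted by the caller
            }
        }

        return SerializeResult(job);                     // becomes ResultJson
    }

    // Placeholders standing in for the existing backtest logic; Candle is an assumed type.
    private Task<IReadOnlyList<Candle>> LoadCandlesAsync(BacktestJob job, CancellationToken ct)
        => throw new NotImplementedException();
    private void ProcessCandle(BacktestJob job, Candle candle) { }
    private string SerializeResult(BacktestJob job) => "{}";
}
```
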
## Phase 5: Monitoring & Health Checks

- [ ] Add health check endpoint to compute worker
- [ ] Add metrics: pending jobs, running jobs, completed/failed counts
- [ ] Add stale job detection (reclaim jobs from dead workers)
- [ ] Add logging for job lifecycle events
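
Stale-job detection can lean on the `LastHeartbeat` column: any `Running` job whose heartbeat is older than a cutoff is assumed to belong to a dead worker and is reset to `Pending`. The sketch below is illustrative; how long "stale" is (for example, a few missed heartbeats) is left to the caller.

```csharp
using Dapper;
using Npgsql;

// Illustrative requeue of jobs abandoned by dead workers; cutoff is chosen by the caller.
public async Task<int> RequeueStaleJobsAsync(NpgsqlConnection db, TimeSpan staleAfter)
{
    const string sql = """
        UPDATE "BacktestJobs"
        SET    "Status" = 'Pending',
               "AssignedWorkerId" = NULL,
               "StartedAt" = NULL
        WHERE  "Status" = 'Running'
          AND  "LastHeartbeat" < now() - @staleAfter
        """;

    // Returns the number of reclaimed jobs, useful as a metric/log entry.
    return await db.ExecuteAsync(sql, new { staleAfter });
}
```
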
## Phase 6: Deployment

- [ ] Create Dockerfile for `Managing.Compute`
- [ ] Create deployment configuration for compute workers
- [ ] Configure environment variables for compute cluster
- [ ] Set up monitoring dashboards (Prometheus/Grafana)
- [ ] Configure auto-scaling rules for compute workers

## Phase 7: Testing & Validation

- [ ] Test single backtest job processing
- [ ] Test bundle backtest with multiple jobs
- [ ] Test concurrent job processing (multiple workers)
- [ ] Test job recovery after worker failure
- [ ] Test priority queue ordering
- [ ] Load test with 1000+ concurrent users

## Phase 8: Migration Strategy

- [ ] Keep Orleans grains as fallback during transition
- [ ] Feature flag to switch between Orleans and Compute workers
- [ ] Gradual migration: test with small percentage of traffic
- [ ] Monitor performance and error rates
- [ ] Full cutover once validated
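
The Phase 8 feature flag could be as simple as a configuration switch in the submission path. The flag name, grain interface, and method below are placeholders, not existing code.

```csharp
using Microsoft.Extensions.Configuration;
using Orleans;

// Hypothetical feature-flag branch; flag, grain, and method names are placeholders.
public class BundleSubmissionService
{
    private readonly IConfiguration _config;
    private readonly IBacktestJobRepository _jobs;
    private readonly IGrainFactory _grains;

    public BundleSubmissionService(IConfiguration config, IBacktestJobRepository jobs, IGrainFactory grains)
        => (_config, _jobs, _grains) = (config, jobs, grains);

    public async Task SubmitAsync(BundleBacktestRequest bundle)
    {
        if (_config.GetValue<bool>("Features:UseComputeWorkers"))
        {
            // New path: enqueue jobs in PostgreSQL; Managing.Compute picks them up.
            await _jobs.InsertBundleWithJobsAsync(bundle, bundle.GenerateJobs());
        }
        else
        {
            // Legacy path: run inside the Orleans grain, as before the migration.
            var grain = _grains.GetGrain<IBacktestTradingBotGrain>(bundle.RequestId);
            await grain.RunBundleAsync(bundle);
        }
    }
}
```
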
assets/documentation/Workers processing/README.md
@@ -0,0 +1,75 @@

# Workers Processing Architecture

This folder contains documentation for the enterprise-grade backtest processing architecture using a database queue pattern with separate API and Compute worker clusters.

## Overview

The architecture separates concerns between:

- **API Server**: Handles HTTP requests, creates jobs, returns immediately (fire-and-forget)
- **Compute Workers**: Process CPU-intensive backtest jobs from the database queue
- **Database Queue**: Central coordination point using PostgreSQL

## Documentation Files

1. **[01-Overall-Architecture.md](./01-Overall-Architecture.md)**
   - Complete system architecture diagram
   - Component relationships
   - External service integrations

2. **[02-Request-Flow.md](./02-Request-Flow.md)**
   - Sequence diagram of request flow
   - User request → Job creation → Processing → Status polling

3. **[03-Job-Processing-Flow.md](./03-Job-Processing-Flow.md)**
   - Detailed job processing workflow
   - Worker polling, job claiming, semaphore control

4. **[04-Database-Schema.md](./04-Database-Schema.md)**
   - Entity relationship diagram
   - Database schema for job queue
   - Key indexes and relationships

5. **[05-Deployment-Architecture.md](./05-Deployment-Architecture.md)**
   - Production deployment topology
   - Load balancing, clustering, monitoring

6. **[06-Concurrency-Control.md](./06-Concurrency-Control.md)**
   - Concurrency control mechanisms
   - Semaphore-based limiting
   - Capacity calculations

7. **[07-Monorepo-Structure.md](./07-Monorepo-Structure.md)**
   - Monorepo project organization
   - Shared projects and dependencies

## Key Features

- ✅ **No Timeouts**: Fire-and-forget pattern with polling
- ✅ **Scalable**: Horizontal scaling of both API and Compute clusters
- ✅ **Reliable**: Jobs persist in the database and survive restarts
- ✅ **Efficient**: Dedicated CPU resources for compute work
- ✅ **Enterprise-Grade**: Handles 1000+ users, priority queue, health checks

## Architecture Principles

1. **Separation of Concerns**: API handles requests, Compute handles CPU work
2. **Database as Queue**: PostgreSQL serves as a reliable job queue
3. **Shared Codebase**: Monorepo with shared business logic
4. **Resource Isolation**: Compute workers don't interfere with API responsiveness
5. **Fault Tolerance**: Jobs survive worker failures and can be reclaimed

## Capacity Planning

- **Per Worker**: 6 concurrent backtests (8-core machine)
- **3 Workers**: 18 concurrent backtests
- **Throughput**: ~1,080 backtests/hour
- **1000 Users × 10 backtests**: ~9 hours processing time

## Next Steps

1. Create `Managing.Compute` project
2. Implement `BacktestJob` entity and repository
3. Create `BacktestComputeWorker` background service
4. Update API controllers to use the job queue pattern
5. Deploy compute workers to dedicated servers