# Implementation Plan
## Phase 1: Database & Domain Setup
- [ ] Create `BacktestJob` entity in `Managing.Domain/Backtests/`
- [ ] Create `BacktestJobStatus` enum (Pending, Running, Completed, Failed)
- [ ] Create database migration for `BacktestJobs` table
- [ ] Add indexes: `idx_status_priority`, `idx_bundle_request`, `idx_assigned_worker`
- [ ] Create `IBacktestJobRepository` interface
- [ ] Implement `BacktestJobRepository` with advisory lock support
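
A language-neutral sketch of the Phase 1 pieces follows, written in Python for brevity; the real code would live in the C# projects. The field names (`priority`, `bundle_request_id`, `assigned_worker_id`, `last_heartbeat`) are assumptions inferred from the planned index names. The in-process `threading.Lock` stands in for the PostgreSQL advisory lock (for example `pg_try_advisory_xact_lock`) that would make the claim atomic across workers.

```python
import itertools
import threading
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Dict, Optional

# Monotonic counter used as a FIFO tie-breaker in place of a created_at timestamp.
_sequence = itertools.count()


class BacktestJobStatus(Enum):
    PENDING = "Pending"
    RUNNING = "Running"
    COMPLETED = "Completed"
    FAILED = "Failed"


@dataclass
class BacktestJob:
    # Hypothetical fields, inferred from the planned index names.
    bundle_request_id: str                      # idx_bundle_request
    priority: int = 0                           # idx_status_priority
    status: BacktestJobStatus = BacktestJobStatus.PENDING
    assigned_worker_id: Optional[str] = None    # idx_assigned_worker
    last_heartbeat: Optional[datetime] = None
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    sequence: int = field(default_factory=lambda: next(_sequence))


class InMemoryBacktestJobRepository:
    """Stand-in for BacktestJobRepository. The mutex plays the role of the
    PostgreSQL advisory lock, so a job can be claimed by only one worker."""

    def __init__(self) -> None:
        self._jobs: Dict[str, BacktestJob] = {}
        self._lock = threading.Lock()

    def add(self, job: BacktestJob) -> None:
        with self._lock:
            self._jobs[job.id] = job

    def claim_next(self, worker_id: str) -> Optional[BacktestJob]:
        with self._lock:
            pending = [j for j in self._jobs.values()
                       if j.status is BacktestJobStatus.PENDING]
            if not pending:
                return None
            # Highest priority first; FIFO among equal priorities.
            job = min(pending, key=lambda j: (-j.priority, j.sequence))
            job.status = BacktestJobStatus.RUNNING
            job.assigned_worker_id = worker_id
            job.last_heartbeat = datetime.now(timezone.utc)
            return job
```

Claiming and marking the job as `Running` inside the same critical section is the property the advisory lock has to guarantee; the ordering key mirrors what `idx_status_priority` would serve.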
## Phase 2: Compute Worker Project
- [ ] Create `Managing.Compute` project (console app/worker service)
- [ ] Add project references to the shared projects (Application, Domain, Infrastructure)
- [ ] Configure the DI container (no Orleans dependency)
- [ ] Create `BacktestComputeWorker` background service
- [ ] Implement job polling logic (every 5 seconds)
- [ ] Implement job claiming with PostgreSQL advisory locks
- [ ] Implement semaphore-based concurrency control
- [ ] Implement progress callback mechanism
- [ ] Implement heartbeat mechanism (every 30 seconds)
- [ ] Add configuration: `MaxConcurrentBacktests`, `JobPollIntervalSeconds`
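
The Phase 2 worker loop can be sketched with `asyncio` as below. This is a hypothetical Python illustration of the C# `BacktestComputeWorker`: a semaphore caps concurrent jobs at `max_concurrent_backtests`, the loop polls every `job_poll_interval_seconds`, and each running job gets a heartbeat task and a progress callback. The `heartbeat_interval_seconds` option and the in-memory job list (standing in for `IBacktestJobRepository`) are assumptions.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional, Set


@dataclass
class Job:
    id: int
    status: str = "Pending"
    progress: int = 0
    heartbeats: int = 0


@dataclass
class WorkerOptions:
    # Defaults follow the plan: poll every 5 s, heartbeat every 30 s.
    max_concurrent_backtests: int = 4
    job_poll_interval_seconds: float = 5.0
    heartbeat_interval_seconds: float = 30.0  # hypothetical setting name


ProgressCallback = Callable[[int], None]
BacktestRunner = Callable[[Job, ProgressCallback], Awaitable[None]]


class BacktestComputeWorker:
    """Poll for pending jobs, run at most max_concurrent_backtests at a time,
    forward progress and send heartbeats while each job runs."""

    def __init__(self, jobs: List[Job], options: WorkerOptions,
                 run_backtest: BacktestRunner) -> None:
        self.jobs = jobs  # stands in for IBacktestJobRepository
        self.options = options
        self.run_backtest = run_backtest
        self.slots = asyncio.Semaphore(options.max_concurrent_backtests)
        self.tasks: Set["asyncio.Task[None]"] = set()

    def claim_next(self) -> Optional[Job]:
        # Real worker: atomic claim guarded by a PostgreSQL advisory lock.
        for job in self.jobs:
            if job.status == "Pending":
                job.status = "Running"
                return job
        return None

    async def heartbeat(self, job: Job) -> None:
        while True:
            await asyncio.sleep(self.options.heartbeat_interval_seconds)
            job.heartbeats += 1  # real worker: update the job's heartbeat timestamp

    async def process(self, job: Job) -> None:
        beat = asyncio.create_task(self.heartbeat(job))

        def on_progress(percent: int) -> None:
            job.progress = percent  # real worker: persist progress to the job row

        try:
            await self.run_backtest(job, on_progress)
            job.status = "Completed"
        except Exception:
            job.status = "Failed"
        finally:
            beat.cancel()
            self.slots.release()  # free the concurrency slot

    async def run(self, stop: asyncio.Event) -> None:
        while not stop.is_set():
            while True:
                await self.slots.acquire()  # wait for a free concurrency slot
                # Stop claiming new work once shutdown is requested.
                job = None if stop.is_set() else self.claim_next()
                if job is None:
                    self.slots.release()
                    break
                task = asyncio.create_task(self.process(job))
                self.tasks.add(task)
                task.add_done_callback(self.tasks.discard)
            try:  # sleep until the next poll, waking early if asked to stop
                await asyncio.wait_for(stop.wait(), self.options.job_poll_interval_seconds)
            except asyncio.TimeoutError:
                pass
        await asyncio.gather(*self.tasks)  # drain in-flight jobs on shutdown
```

Acquiring a slot before claiming means a saturated worker never claims jobs it cannot start, which leaves them available to other workers.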
## Phase 3: API Server Updates
- [ ] Update `BacktestController` to create jobs instead of calling grains directly
- [ ] Implement `CreateBundleBacktest` endpoint (returns immediately)
- [ ] Implement `GetBundleStatus` endpoint (reads job statuses from the database)
- [ ] Update `Backtester.cs` to generate `BacktestJob` entities from bundle variants
- [ ] Remove direct Orleans grain calls for backtests (keep for other operations)
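
The Phase 3 flow (expand a bundle into jobs, then aggregate job rows into a status response) might look like the Python sketch below. The variant dimensions `ticker` and `timeframe` and the response fields are hypothetical; the real variants come from the bundle request handled in `Backtester.cs`.

```python
import itertools
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class BacktestJob:
    bundle_request_id: str
    ticker: str          # hypothetical variant dimension
    timeframe: str       # hypothetical variant dimension
    status: str = "Pending"


def create_jobs_for_bundle(bundle_request_id: str,
                           tickers: Sequence[str],
                           timeframes: Sequence[str]) -> List[BacktestJob]:
    """Expand a bundle request into one job per variant."""
    return [BacktestJob(bundle_request_id, ticker, timeframe)
            for ticker, timeframe in itertools.product(tickers, timeframes)]


def get_bundle_status(jobs: Sequence[BacktestJob]) -> Dict[str, object]:
    """Aggregate per-job rows into the response a status endpoint could return."""
    counts = Counter(job.status for job in jobs)
    total = len(jobs)
    finished = counts["Completed"] + counts["Failed"]
    return {
        "total": total,
        "pending": counts["Pending"],
        "running": counts["Running"],
        "completed": counts["Completed"],
        "failed": counts["Failed"],
        "progress_percent": finished * 100 // total if total else 0,
        "is_finished": total > 0 and finished == total,
    }
```

Because `CreateBundleBacktest` only inserts rows, it can return immediately; clients then call `GetBundleStatus` until `is_finished` is true.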
## Phase 4: Shared Logic
- [ ] Extract backtest execution logic from `BacktestTradingBotGrain` into `Backtester.cs`
- [ ] Make backtest logic Orleans-agnostic (runnable in either the compute worker or a grain)
- [ ] Add progress callback support to `RunBacktestAsync` method
- [ ] Ensure candle loading works in both contexts
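
One way to keep the shared logic Orleans-agnostic is to give it only data in and a progress callback out, as in this hypothetical Python sketch of `RunBacktestAsync`'s shape. The buy-and-hold "strategy" is a placeholder, and `report_every_percent` is an assumed throttling parameter.

```python
from typing import Callable, Optional, Sequence

ProgressCallback = Callable[[int], None]


def run_backtest(closes: Sequence[float],
                 on_progress: Optional[ProgressCallback] = None,
                 report_every_percent: int = 10) -> float:
    """Orleans-agnostic core: takes candle data and an optional progress callback,
    with no dependency on grain or worker types. The 'strategy' is a placeholder
    buy-and-hold return (last close minus first close)."""
    if not closes:
        raise ValueError("no candles loaded")
    total = len(closes)
    last_reported = 0
    for index, _close in enumerate(closes, start=1):
        # Real strategy evaluation for this candle would happen here.
        percent = index * 100 // total
        # Throttle: report only after enough progress, and always at 100%.
        if on_progress is not None and (
                percent - last_reported >= report_every_percent or index == total):
            on_progress(percent)
            last_reported = percent
    return closes[-1] - closes[0]
```

The compute worker would pass a callback that writes progress to the job row, while the grain would pass one that updates its own state. Throttling keeps the number of progress writes bounded by the report step rather than by the number of candles.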
## Phase 5: Monitoring & Health Checks
- [ ] Add health check endpoint to compute worker
- [ ] Add metrics: pending jobs, running jobs, completed/failed counts
- [ ] Add stale job detection (reclaim jobs from workers whose heartbeats have stopped)
- [ ] Add logging for job lifecycle events
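
Stale job detection can build on the 30-second heartbeat from Phase 2. The Python sketch below assumes a job is stale after three missed intervals and fails permanently after more than three reclaims; both thresholds, the `retry_count` field, and the logger name are hypothetical.

```python
import logging
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

logger = logging.getLogger("managing.compute.jobs")  # hypothetical logger name

HEARTBEAT_INTERVAL = timedelta(seconds=30)
STALE_AFTER = 3 * HEARTBEAT_INTERVAL  # assumption: tolerate two missed heartbeats
MAX_RETRIES = 3                        # assumption: fail once a job is lost more than 3 times


@dataclass
class Job:
    id: str
    status: str
    assigned_worker_id: Optional[str]
    last_heartbeat: Optional[datetime]
    retry_count: int = 0


def reclaim_stale_jobs(jobs: List[Job], now: datetime) -> List[str]:
    """Requeue (or fail) Running jobs whose worker stopped heartbeating."""
    reclaimed = []
    for job in jobs:
        if job.status != "Running" or job.last_heartbeat is None:
            continue
        if now - job.last_heartbeat < STALE_AFTER:
            continue  # worker is still alive
        dead_worker = job.assigned_worker_id
        job.retry_count += 1
        job.assigned_worker_id = None
        job.last_heartbeat = None
        if job.retry_count > MAX_RETRIES:
            job.status = "Failed"
            logger.warning("job %s failed: worker %s lost (attempt %d)",
                           job.id, dead_worker, job.retry_count)
        else:
            job.status = "Pending"
            logger.info("job %s reclaimed from worker %s (attempt %d)",
                        job.id, dead_worker, job.retry_count)
        reclaimed.append(job.id)
    return reclaimed
```

Capping retries prevents a job that crashes every worker it lands on from cycling through the cluster indefinitely.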
## Phase 6: Deployment
- [ ] Create Dockerfile for `Managing.Compute`
- [ ] Create deployment configuration for compute workers
- [ ] Configure environment variables for the compute cluster
- [ ] Set up monitoring dashboards (Prometheus/Grafana)
- [ ] Configure auto-scaling rules for compute workers
## Phase 7: Testing & Validation
- [ ] Test single backtest job processing
- [ ] Test bundle backtest with multiple jobs
- [ ] Test concurrent job processing (multiple workers)
- [ ] Test job recovery after worker failure
- [ ] Test priority queue ordering
- [ ] Load test with 1000+ concurrent users
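
For the multi-worker test, the key assertion is that every job is claimed exactly once. The Python harness below is a hypothetical sketch: `AdvisoryLockTable` simulates non-blocking per-key locks in the spirit of PostgreSQL's `pg_try_advisory_lock`, and threads play the role of separate workers.

```python
import threading
from collections import Counter
from typing import Dict, Optional, Set


class AdvisoryLockTable:
    """Simulates non-blocking per-key locks in the spirit of pg_try_advisory_lock."""

    def __init__(self) -> None:
        self._guard = threading.Lock()
        self._held: Set[int] = set()

    def try_lock(self, key: int) -> bool:
        with self._guard:
            if key in self._held:
                return False
            self._held.add(key)
            return True

    def unlock(self, key: int) -> None:
        with self._guard:
            self._held.discard(key)


def claim_next(statuses: Dict[int, str], locks: AdvisoryLockTable) -> Optional[int]:
    for job_id in sorted(statuses):
        if statuses[job_id] != "Pending":
            continue
        if not locks.try_lock(job_id):
            continue  # another worker is claiming this job right now
        try:
            if statuses[job_id] == "Pending":  # re-check while holding the lock
                statuses[job_id] = "Running"
                return job_id
        finally:
            locks.unlock(job_id)
    return None


def run_concurrency_test(job_count: int = 200, worker_count: int = 8):
    statuses = {job_id: "Pending" for job_id in range(job_count)}
    locks = AdvisoryLockTable()
    claims: Counter = Counter()
    claims_guard = threading.Lock()

    def worker() -> None:
        while (job_id := claim_next(statuses, locks)) is not None:
            with claims_guard:
                claims[job_id] += 1
            statuses[job_id] = "Completed"  # the backtest itself would run here

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return statuses, claims
```

The re-check after acquiring the lock is what prevents a double claim when a worker read `Pending` just before another worker finished claiming the same job.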
## Phase 8: Migration Strategy
- [ ] Keep Orleans grains as fallback during transition
- [ ] Feature flag to switch between Orleans and Compute workers
- [ ] Gradual migration: route a small percentage of traffic to compute workers first
- [ ] Monitor performance and error rates
- [ ] Full cutover once validated
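
A percentage-based feature flag can route traffic deterministically so that the same caller always takes the same path during the rollout. This Python sketch hashes a routing key (assumed here to be a user id) into one of 100 buckets; `use_compute_workers` is a hypothetical name, and SHA-256 is used because Python's built-in `hash()` is randomized per process for strings.

```python
import hashlib


def use_compute_workers(routing_key: str, rollout_percent: int) -> bool:
    """Hypothetical feature-flag check: True routes the request to the compute
    workers, False keeps it on the Orleans grains."""
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    digest = hashlib.sha256(routing_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in [0, 100)
    return bucket < rollout_percent
```

Raising `rollout_percent` only adds buckets, so callers already on the compute path stay there as the rollout grows toward full cutover.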