# Implementation Plan

## Phase 1: Database & Domain Setup

- [ ] Create `BacktestJob` entity in `Managing.Domain/Backtests/`
- [ ] Create `BacktestJobStatus` enum (Pending, Running, Completed, Failed)
- [ ] Create database migration for `BacktestJobs` table
- [ ] Add indexes: `idx_status_priority`, `idx_bundle_request`, `idx_assigned_worker`
- [ ] Create `IBacktestJobRepository` interface
- [ ] Implement `BacktestJobRepository` with advisory lock support

## Phase 2: Compute Worker Project

- [ ] Create `Managing.Compute` project (console app/worker service)
- [ ] Add project reference to shared projects (Application, Domain, Infrastructure)
- [ ] Configure DI container (NO Orleans)
- [ ] Create `BacktestComputeWorker` background service
- [ ] Implement job polling logic (every 5 seconds)
- [ ] Implement job claiming with PostgreSQL advisory locks
- [ ] Implement semaphore-based concurrency control
- [ ] Implement progress callback mechanism
- [ ] Implement heartbeat mechanism (every 30 seconds)
- [ ] Add configuration: `MaxConcurrentBacktests`, `JobPollIntervalSeconds`

## Phase 3: API Server Updates

- [ ] Update `BacktestController` to create jobs instead of calling grains directly
- [ ] Implement `CreateBundleBacktest` endpoint (returns immediately)
- [ ] Implement `GetBundleStatus` endpoint (polls database)
- [ ] Update `Backtester.cs` to generate `BacktestJob` entities from bundle variants
- [ ] Remove direct Orleans grain calls for backtests (keep for other operations)

## Phase 4: Shared Logic

- [ ] Extract backtest execution logic from `BacktestTradingBotGrain` to `Backtester.cs`
- [ ] Make backtest logic Orleans-agnostic (can run in worker or grain)
- [ ] Add progress callback support to `RunBacktestAsync` method
- [ ] Ensure candle loading works in both contexts

## Phase 5: Monitoring & Health Checks

- [ ] Add health check endpoint to compute worker
- [ ] Add metrics: pending jobs, running jobs, completed/failed counts
- [ ] Add stale job detection (reclaim jobs from dead workers)
- [ ] Add logging for job lifecycle events

## Phase 6: Deployment

- [ ] Create Dockerfile for `Managing.Compute`
- [ ] Create deployment configuration for compute workers
- [ ] Configure environment variables for compute cluster
- [ ] Set up monitoring dashboards (Prometheus/Grafana)
- [ ] Configure auto-scaling rules for compute workers

## Phase 7: Testing & Validation

- [ ] Test single backtest job processing
- [ ] Test bundle backtest with multiple jobs
- [ ] Test concurrent job processing (multiple workers)
- [ ] Test job recovery after worker failure
- [ ] Test priority queue ordering
- [ ] Load test with 1000+ concurrent users

## Phase 8: Migration Strategy

- [ ] Keep Orleans grains as fallback during transition
- [ ] Feature flag to switch between Orleans and Compute workers
- [ ] Gradual migration: test with small percentage of traffic
- [ ] Monitor performance and error rates
- [ ] Full cutover once validated
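
The worker items in Phases 2 and 5 (priority-ordered job claiming, semaphore-based concurrency control, stale job detection) can be illustrated with a rough, language-neutral sketch. It is written in Python for brevity and is not the planned .NET implementation. `InMemoryJobStore`, `claim_next`, `reclaim_stale`, and `run_worker` are hypothetical names, the in-memory store only stands in for the `BacktestJobs` table and its advisory-lock claiming, and "higher priority value runs first" is an assumption the plan does not state.

```python
import asyncio
import time
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class JobStatus(Enum):
    PENDING = "Pending"
    RUNNING = "Running"
    COMPLETED = "Completed"
    FAILED = "Failed"


@dataclass
class Job:
    id: int
    priority: int  # assumption: a higher value is claimed first
    status: JobStatus = JobStatus.PENDING
    worker: Optional[str] = None
    heartbeat: float = 0.0


class InMemoryJobStore:
    """Hypothetical stand-in for the BacktestJobs table (no real locking)."""

    def __init__(self, jobs):
        self.jobs = {job.id: job for job in jobs}

    def claim_next(self, worker, now):
        """Mark the highest-priority pending job as running for this worker."""
        pending = [j for j in self.jobs.values() if j.status is JobStatus.PENDING]
        if not pending:
            return None
        job = max(pending, key=lambda j: (j.priority, -j.id))
        job.status, job.worker, job.heartbeat = JobStatus.RUNNING, worker, now
        return job

    def reclaim_stale(self, now, timeout):
        """Return running jobs with an expired heartbeat to the pending state."""
        stale = [j for j in self.jobs.values()
                 if j.status is JobStatus.RUNNING and now - j.heartbeat > timeout]
        for job in stale:
            job.status, job.worker = JobStatus.PENDING, None
        return [job.id for job in stale]


async def run_worker(store, worker, max_concurrent, run_job):
    """Claim jobs until none are pending, running at most max_concurrent at once."""
    semaphore = asyncio.Semaphore(max_concurrent)
    tasks = []

    async def execute(job):
        try:
            await run_job(job)
            job.status = JobStatus.COMPLETED
        except Exception:
            job.status = JobStatus.FAILED
        finally:
            semaphore.release()  # free the slot for the next claimed job

    while True:
        await semaphore.acquire()  # wait for a free slot before claiming
        job = store.claim_next(worker, now=time.monotonic())
        if job is None:
            semaphore.release()
            break  # a long-running worker would sleep for the poll interval here
        tasks.append(asyncio.create_task(execute(job)))
    await asyncio.gather(*tasks)
```

Acquiring the semaphore before claiming means a worker never marks a job as running unless it has a free slot to execute it, which keeps claimed-but-idle jobs out of the table. In the real system the claim would be a single database operation guarded by an advisory lock, and the heartbeat would be refreshed periodically while the job runs.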