# Benchmark Backtest Performance
This command runs the backtest performance tests and appends the results to the performance benchmark CSV files.
## Usage
Run this command to benchmark backtest performance and update the tracking CSV:
```
/benchmark-backtest-performance
```
Or run the script directly:
```bash
./scripts/benchmark-backtest-performance.sh
```
## What it does
1. Runs the **main performance telemetry test** (`ExecuteBacktest_With_Large_Dataset_Should_Show_Performance_Telemetry`)
2. Runs the **two-scenarios performance test** (`ExecuteBacktest_With_Two_Scenarios_Should_Show_Performance_Telemetry`) - tests pre-calculated signals with 2 indicators and validates business logic consistency
3. Runs **two business logic validation tests**:
   - `ExecuteBacktest_With_ETH_FifteenMinutes_Data_Should_Return_LightBacktest`
   - `ExecuteBacktest_With_ETH_FifteenMinutes_Data_Second_File_Should_Return_LightBacktest`
4. **Validates Business Logic**: Compares Final PnL with the first run baseline to ensure optimizations don't break behavior
5. Extracts performance metrics from the test output
6. Appends a new row to `src/Managing.Workers.Tests/performance-benchmarks.csv` (main test)
7. Appends a new row to `src/Managing.Workers.Tests/performance-benchmarks-two-scenarios.csv` (two-scenarios test)
8. **Never commits changes automatically**
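The append steps above can be sketched as follows. This is a minimal illustration, not the actual script: `append_benchmark_row` is a hypothetical helper, the header is shortened to six of the documented columns, and values are assumed to contain no commas.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append one benchmark row to a CSV file.
# Writes a (shortened, illustrative) header first if the file is missing or empty.
append_benchmark_row() {
  local csv_file="$1"; shift
  if [ ! -s "$csv_file" ]; then
    echo "DateTime,TestName,CandlesCount,ExecutionTimeSeconds,CommitHash,GitBranch" > "$csv_file"
  fi
  # Join the remaining arguments with commas to form the row.
  local IFS=,
  echo "$*" >> "$csv_file"
}

# Example call (not executed here); the git commands only read repository state:
# append_benchmark_row results.csv "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "MainTest" 5760 1.005 \
#   "$(git rev-parse --short HEAD)" "$(git rev-parse --abbrev-ref HEAD)"
```

Because the helper only appends, earlier rows are never rewritten, and nothing is staged or committed.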
## CSV Format
The CSV file contains clean numeric values for all telemetry metrics:
- `DateTime`: ISO 8601 timestamp when the benchmark was run
- `TestName`: Name of the test that was executed
- `CandlesCount`: Integer - Number of candles processed
- `ExecutionTimeSeconds`: Decimal - Total execution time in seconds
- `ProcessingRateCandlesPerSec`: Decimal - Candles processed per second
- `MemoryStartMB`: Decimal - Memory usage at start
- `MemoryEndMB`: Decimal - Memory usage at end
- `MemoryPeakMB`: Decimal - Peak memory usage
- `SignalUpdatesCount`: Decimal - Total signal updates performed
- `SignalUpdatesSkipped`: Integer - Number of signal updates skipped
- `SignalUpdateEfficiencyPercent`: Decimal - Percentage of signal updates that were skipped
- `BacktestStepsCount`: Decimal - Number of backtest steps executed
- `AverageSignalUpdateMs`: Decimal - Average time per signal update
- `AverageBacktestStepMs`: Decimal - Average time per backtest step
- `FinalPnL`: Decimal - Final profit and loss
- `WinRatePercent`: Integer - Win rate percentage
- `GrowthPercentage`: Decimal - Growth percentage
- `Score`: Decimal - Backtest score
- `CommitHash`: Git commit hash
- `GitBranch`: Git branch name
- `Environment`: Environment where test was run
## Implementation Details
The command uses regex patterns to extract metrics from the test console output and formats them into CSV rows. It detects the current git branch and commit hash for tracking purposes but **never commits or pushes changes automatically**.
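As a rough sketch of the regex extraction described above: the `Label: value` line format and the `extract_metric` helper are assumptions for illustration, and the real test output and script may differ.

```bash
#!/usr/bin/env bash
# Hypothetical telemetry lines; the real console output format may differ.
sample_output='Processing Rate: 5688.8 candles/sec
Execution Time: 1.005 seconds
Memory Peak: 24.66 MB'

# Print the first number that follows "<label>:" in the text read from stdin.
extract_metric() {
  local label="$1"
  sed -nE "s/.*${label}:[[:space:]]*(-?[0-9]+(\.[0-9]+)?).*/\1/p" | head -n 1
}

rate="$(printf '%s\n' "$sample_output" | extract_metric "Processing Rate")"
seconds="$(printf '%s\n' "$sample_output" | extract_metric "Execution Time")"

# Emit the extracted values as a CSV fragment.
echo "${rate},${seconds}"
```

Capturing only the numeric part keeps units such as `candles/sec` out of the CSV, which is what allows the columns to hold clean numeric values.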
## Performance Variance
The benchmark shows significant variance in execution times (e.g., 0.915s to 1.445s for the same code), which is expected:
- **System load affects results**: Background processes and system activity impact measurements
- **GC pauses occur unpredictably**: Garbage collection can cause sudden performance drops
- **Multiple runs recommended**: Run benchmarks 3-5 times and compare median values for reliable measurements
- **Time of day matters**: System resources vary based on other running processes
**Best Practice**: When optimizing, compare the median of multiple runs before and after changes to account for variance.
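One way to compute that median is sketched below. The `median` helper is hypothetical and assumes at least one numeric argument. In the sample call, `0.915` and `1.445` are the extremes quoted above, while `1.102` is a made-up middle run.

```bash
#!/usr/bin/env bash
# Print the median of the numbers given as arguments (assumes at least one).
median() {
  printf '%s\n' "$@" | LC_ALL=C sort -n | awk '
    { values[NR] = $1 }
    END {
      mid = int((NR + 1) / 2)
      if (NR % 2 == 1) print values[mid]                 # odd count: middle value
      else print (values[mid] + values[mid + 1]) / 2      # even count: mean of the two middle values
    }'
}

median 1.445 0.915 1.102   # prints 1.102
```

Unlike the mean, the median is not pulled upward by a single slow run caused by a GC pause or background load.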
## Lessons Learned from Optimization Attempts
### ❌ **Pitfall: Rolling Window Changes**
**What happened**: Changing the order of HashSet operations in the rolling window broke business logic.
- Changed PnL from `22032.78` to `24322.17`
- The order of `Add()` and `Remove()` operations on the HashSet affected which candles were available during signal updates
- **Takeaway**: Even "performance-only" changes can alter trading logic if they affect the state during calculations
### ❌ **Pitfall: LINQ Caching**
**What happened**: Caching `candles.First()` and `candles.Last()` caused floating-point precision issues.
- SharpeRatio changed from `-0.01779902594116203` to `-0.017920689062300373`
- Using cached values vs. repeated LINQ calls introduced subtle precision differences
- **Takeaway**: Financial calculations are sensitive to floating-point precision; avoid unnecessary intermediate variables
### ✅ **Success: Business Logic Validation**
**What worked**: The benchmark's comprehensive validation caught breaking changes immediately:
1. **PnL baseline comparison** detected the rolling window issue
2. **Dedicated ETH tests** caught the SharpeRatio precision problem
3. **Immediate feedback** prevented bad optimizations from being committed
**Takeaway**: Always validate business logic after performance optimizations, even if they seem unrelated.
### ❌ **Pitfall: RSI Indicator Optimizations**
**What happened**: Attempting to optimize the RSI divergence indicator cut throughput by more than half (~57%).
- Processing rate dropped from **6446 candles/sec** to **2797 candles/sec**
- **Complex LINQ optimizations** like `OrderByDescending().Take()` were slower than simple `TakeLast()`
- **Creating HashSet<Candle>** objects in signal generation added overhead
- **Caching calculations** added complexity without benefit
**Takeaway**: Not all code is worth optimizing. Some algorithms are already efficient enough, and micro-optimizations can hurt more than help. Always measure the impact before committing complex changes.
## Performance Bottleneck Analysis (Latest Findings)
Recent performance logging revealed the **true bottleneck** in backtest execution:
### 📊 **Backtest Timing Breakdown**
- **Total execution time**: ~1.4-1.6 seconds for 5760 candles
- **TradingBotBase.Run() calls**: 5,760 total (~87ms combined, 0.015ms average per call)
- **Unaccounted time**: ~1.3-1.5 seconds (94% of total execution time!)
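The figures above can be checked with a small calculation. `timing_breakdown` is a hypothetical helper that prints the average `Run()` time per call in milliseconds and the percentage of total time spent outside `Run()`.

```bash
#!/usr/bin/env bash
# Print "<avg Run() ms per call> <percent of total time outside Run()>".
timing_breakdown() {
  local candles="$1" run_ms="$2" total_ms="$3"
  awk -v n="$candles" -v r="$run_ms" -v t="$total_ms" 'BEGIN {
    printf "%.3f %.1f\n", r / n, (t - r) / t * 100
  }'
}

timing_breakdown 5760 87 1400   # prints 0.015 93.8
timing_breakdown 5760 87 1600   # prints 0.015 94.6
```

At both ends of the 1.4-1.6 second range, roughly 94% of the time is spent outside `TradingBotBase.Run()`.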
### 🎯 **Identified Bottlenecks** (in order of impact)
1. **TradingBox.GetSignal()** - Indicator calculations (called ~1,932 times, ~0.99ms per call average)
2. **BacktestExecutor loop overhead** - HashSet operations, memory allocations
3. **Signal update frequency** - Even with 66.5% efficiency, remaining updates are expensive
4. **Memory management** - GC pressure from frequent allocations
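The 66.5% efficiency figure is consistent with the ~1,932 `GetSignal()` calls if one assumes one potential signal update per candle. That is an assumption made here for illustration, not something the telemetry confirms. `signal_efficiency_percent` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Percentage of potential signal updates that were skipped.
# Assumes one potential update per candle (illustrative assumption).
signal_efficiency_percent() {
  local candles="$1" updates_performed="$2"
  awk -v c="$candles" -v u="$updates_performed" 'BEGIN {
    printf "%.1f\n", (c - u) / c * 100
  }'
}

signal_efficiency_percent 5760 1932   # prints 66.5
```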
### 🚀 **Next Optimization Targets**
1. **Optimize indicator calculations** - RSI divergence processing is the biggest bottleneck
2. **Reduce HashSet allocations** - Pre-allocate or reuse collections
3. **Optimize signal update logic** - Further reduce unnecessary updates
4. **Memory pooling** - Reuse objects to reduce GC pressure
## Major Optimization Success: Pre-Calculated Signals
### ✅ **Optimization: Pre-Calculated Signals**
**What was implemented**: Pre-calculated all signals once upfront instead of calling `TradingBox.GetSignal()` ~1,932 times during backtest execution.
**Technical Details**:
- Added `PreCalculateAllSignals()` method in `BacktestExecutor.cs`
- Pre-calculates signals for all candles using rolling window logic
- Modified `TradingBotBase.UpdateSignals()` to support pre-calculated signal lookup
- Updated backtest loop to use O(1) signal lookups instead of expensive calculations
**Performance Impact** (Average of 3 runs):
- **Processing Rate**: 2,800 → **~5,800 candles/sec** (2.1x improvement!)
- **Execution Time**: 1.4-1.6s → **~1.0s** (roughly 29-38% less time)
- **Signal Update Time**: ~1,417ms → **Eliminated** (no more repeated calculations)
- **Consistent Results**: 5,217 - 6,871 candles/sec range (expected system variance)
**Business Logic Validation**:
- ✅ All validation tests passed
- ✅ Final PnL matches baseline (±0)
- ✅ Two-scenarios test includes baseline assertions for consistency over time (with proper win rate percentage handling)
- ✅ Live trading functionality preserved (no changes to live trading code)
**Takeaway**: The biggest performance gains come from eliminating redundant calculations. Pre-calculating expensive operations once upfront is far more effective than micro-optimizations.
## Safe Optimization Strategies
Based on lessons learned, safe optimizations include:
1. **Reduce system call frequency**: Call `GC.GetTotalMemory()` only periodically (e.g., every 100 candles) and reuse the last value in between
2. **Fix bugs**: Remove duplicate counters and redundant operations
3. **Avoid state changes**: Don't modify the order or timing of business logic operations
4. **Skip intermediate calculations**: Reduce logging and telemetry overhead
5. **Always validate**: Run full benchmark suite after every change
6. **Profile before optimizing**: Use targeted logging to identify real bottlenecks
## Example Output
```
🚀 Running backtest performance benchmark...
📊 Running main performance test...
✅ Performance test passed!
📊 Running business logic validation tests...
✅ Business logic validation tests passed!
✅ Business Logic OK: Final PnL matches baseline (±0)
📊 Benchmark Results:
• Processing Rate: 5688.8 candles/sec
• Execution Time: 1.005 seconds
• Memory Peak: 24.66 MB
• Signal Efficiency: 33.2%
• Candles Processed: 5760
• Score: 6015
✅ Benchmark data recorded successfully!
```
### Business Logic Validation
The benchmark includes **comprehensive business logic validation** on three levels:
#### 1. **Dedicated ETH Backtest Tests** (2 tests)
- `ExecuteBacktest_With_ETH_FifteenMinutes_Data_Should_Return_LightBacktest`
  - Tests backtest with ETH 15-minute data
  - Validates specific trading scenarios and positions
  - Ensures indicator calculations are correct
- `ExecuteBacktest_With_ETH_FifteenMinutes_Data_Second_File_Should_Return_LightBacktest`
  - Tests with a different ETH dataset
  - Validates consistency across different market data
  - Confirms trading logic works reliably
#### 2. **Large Dataset Telemetry Test** (1 test)
- `ExecuteBacktest_With_Large_Dataset_Should_Show_Performance_Telemetry`
  - Validates performance metrics extraction
  - Confirms signal updates and backtest steps
  - Ensures telemetry data is accurate
#### 3. **PnL Baseline Comparison**
- **Consistent**: Final PnL matches first run (±0.01 tolerance)
- **Baseline OK**: Expected baseline is **24560.79**
- **⚠️ Warning**: Large differences indicate broken business logic
**All three validation levels must pass for the benchmark to succeed!**
**This prevents performance improvements from accidentally changing trading outcomes!**
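The PnL baseline comparison could be expressed along these lines. This is a sketch only: `pnl_matches_baseline` is a hypothetical helper, and the default tolerance of `0.01` is taken from the description above.

```bash
#!/usr/bin/env bash
# Return 0 when |actual - baseline| <= tolerance, 1 otherwise.
pnl_matches_baseline() {
  local actual="$1" baseline="$2" tolerance="${3:-0.01}"
  awk -v a="$actual" -v b="$baseline" -v t="$tolerance" 'BEGIN {
    diff = a - b
    if (diff < 0) diff = -diff
    if (diff <= t) exit 0
    exit 1
  }'
}

if pnl_matches_baseline 24560.79 24560.79; then
  echo "Business Logic OK: Final PnL matches baseline"
else
  echo "Warning: Final PnL differs from baseline"
fi
```

`awk` does the comparison because plain shell arithmetic does not handle decimals.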
## Files Modified
- `src/Managing.Workers.Tests/performance-benchmarks.csv` - **Modified** (new benchmark row added)
- `src/Managing.Workers.Tests/performance-benchmarks-two-scenarios.csv` - **Modified** (new two-scenarios benchmark row added)
**Note**: Changes are **not committed automatically**. Review the results and commit manually if satisfied.