# Benchmark Backtest Performance
This command runs the backtest performance tests and records the results in the performance benchmark CSV file.
## Usage

Run this command to benchmark backtest performance and update the tracking CSV:

```
/benchmark-backtest-performance
```

Or run the script directly:

```bash
./scripts/benchmark-backtest-performance.sh
```
## What it does

- Runs the main performance telemetry test (`Telemetry_ETH_RSI`)
- Runs the two-scenarios performance test (`Telemetry_ETH_RSI_EMACROSS`), which tests pre-calculated signals with 2 indicators and validates business logic consistency
- Runs two business logic validation tests: `ExecuteBacktest_With_ETH_FifteenMinutes_Data_Should_Return_LightBacktest` and `LongBacktest_ETH_RSI`
- Validates business logic: compares the final PnL with the first-run baseline to ensure optimizations don't break behavior
- Extracts performance metrics from the test output
- Appends a new row to `src/Managing.Workers.Tests/performance-benchmarks.csv` (main test)
- Appends a new row to `src/Managing.Workers.Tests/performance-benchmarks-two-scenarios.csv` (two-scenarios test)
- Never commits changes automatically
## CSV Format

The CSV file contains clean numeric values for all telemetry metrics:

- `DateTime`: ISO 8601 timestamp when the benchmark was run
- `TestName`: Name of the test that was executed
- `CandlesCount`: Integer - Number of candles processed
- `ExecutionTimeSeconds`: Decimal - Total execution time in seconds
- `ProcessingRateCandlesPerSec`: Decimal - Candles processed per second
- `MemoryStartMB`: Decimal - Memory usage at start
- `MemoryEndMB`: Decimal - Memory usage at end
- `MemoryPeakMB`: Decimal - Peak memory usage
- `SignalUpdatesCount`: Decimal - Total signal updates performed
- `SignalUpdatesSkipped`: Integer - Number of signal updates skipped
- `SignalUpdateEfficiencyPercent`: Decimal - Percentage of signal updates that were skipped
- `BacktestStepsCount`: Decimal - Number of backtest steps executed
- `AverageSignalUpdateMs`: Decimal - Average time per signal update
- `AverageBacktestStepMs`: Decimal - Average time per backtest step
- `FinalPnL`: Decimal - Final profit and loss
- `WinRatePercent`: Integer - Win rate percentage
- `GrowthPercentage`: Decimal - Growth percentage
- `Score`: Decimal - Backtest score
- `CommitHash`: Git commit hash
- `GitBranch`: Git branch name
- `Environment`: Environment where the test was run
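Assuming the columns appear in the order listed above (not independently verified against the actual file), the header row would look like:

```
DateTime,TestName,CandlesCount,ExecutionTimeSeconds,ProcessingRateCandlesPerSec,MemoryStartMB,MemoryEndMB,MemoryPeakMB,SignalUpdatesCount,SignalUpdatesSkipped,SignalUpdateEfficiencyPercent,BacktestStepsCount,AverageSignalUpdateMs,AverageBacktestStepMs,FinalPnL,WinRatePercent,GrowthPercentage,Score,CommitHash,GitBranch,Environment
```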
## Implementation Details

The command uses regex patterns to extract metrics from the test console output and formats them into CSV rows. It detects the current git branch and commit hash for tracking purposes but never commits or pushes changes automatically.
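A minimal sketch of this extract-and-append flow; the helper name, patterns, and file paths here are illustrative, not the script's actual code:

```shell
# Illustrative sketch of the extraction step; the real patterns live in
# scripts/benchmark-backtest-performance.sh and may differ.

# Sample of the console output the script parses:
cat > /tmp/test-output.log <<'EOF'
Processing Rate: 5688.8 candles/sec
Execution Time: 1.005 seconds
EOF

# Hypothetical helper: pull the first number that follows the given label.
extract_metric() {
  grep -E "^$1" /tmp/test-output.log | grep -oE '[0-9]+(\.[0-9]+)?' | head -n1
}

rate=$(extract_metric "Processing Rate")
secs=$(extract_metric "Execution Time")
commit=$(git rev-parse --short HEAD 2>/dev/null || echo "unknown")

# Append one CSV row; the file itself is never committed automatically.
echo "$(date -u +%Y-%m-%dT%H:%M:%SZ),$rate,$secs,$commit" >> /tmp/performance-benchmarks.csv
echo "rate=$rate secs=$secs"
```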
## Performance Variance
The benchmark shows significant variance in execution times (e.g., 0.915s to 1.445s for the same code), which is expected:
- System load affects results: Background processes and system activity impact measurements
- GC pauses occur unpredictably: Garbage collection can cause sudden performance drops
- Multiple runs recommended: Run benchmarks 3-5 times and compare median values for reliable measurements
- Time of day matters: System resources vary based on other running processes
Best Practice: When optimizing, compare the median of multiple runs before and after changes to account for variance.
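The multiple-runs practice can be sketched as below; the loop body uses placeholder values rather than actually invoking the benchmark script:

```shell
# Median helper: sort numeric values and print the middle one
# (upper middle for even counts).
median() {
  printf '%s\n' "$@" | sort -n | awk '{a[NR]=$1} END {print a[int(NR/2)+1]}'
}

rates=""
for i in 1 2 3 4 5; do
  # In real use, run the benchmark and extract the processing rate here:
  #   ./scripts/benchmark-backtest-performance.sh > "run-$i.log"
  rate=$((3000 + i * 500))   # placeholder values for illustration
  rates="$rates $rate"
done

echo "median rate: $(median $rates) candles/sec"
```

Comparing medians before and after a change filters out one-off GC pauses and system-load spikes that would skew a single-run comparison.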
## Lessons Learned from Optimization Attempts

### ❌ Pitfall: Rolling Window Changes
What happened: Changing the order of HashSet operations in the rolling window broke business logic.
- Changed the final PnL from `22032.78` to `24322.17`
- The order of `Add()` and `Remove()` operations on the HashSet affected which candles were available during signal updates
- Takeaway: Even "performance-only" changes can alter trading logic if they affect state during calculations
### ❌ Pitfall: LINQ Caching

What happened: Caching `candles.First()` and `candles.Last()` caused floating-point precision issues.
- The SharpeRatio changed from `-0.01779902594116203` to `-0.017920689062300373`
- Using cached values instead of repeated LINQ calls introduced subtle precision differences
- Takeaway: Financial calculations are sensitive to floating-point precision; avoid unnecessary intermediate variables
### ✅ Success: Business Logic Validation
What worked: The benchmark's comprehensive validation caught breaking changes immediately:
- PnL baseline comparison detected the rolling window issue
- Dedicated ETH tests caught the SharpeRatio precision problem
- Immediate feedback prevented bad optimizations from being committed
Takeaway: Always validate business logic after performance optimizations, even if they seem unrelated.
### ❌ Pitfall: RSI Indicator Optimizations
What happened: Attempting to optimize the RSI divergence indicator decreased performance by ~50%!
- Throughput dropped from 6,446 candles/sec to 2,797 candles/sec
- Complex LINQ chains like `OrderByDescending().Take()` were slower than a simple `TakeLast()`
- Creating HashSet objects in signal generation added overhead
- Caching calculations added complexity without benefit
Takeaway: Not all code is worth optimizing. Some algorithms are already efficient enough, and micro-optimizations can hurt more than help. Always measure the impact before committing complex changes.
## Performance Bottleneck Analysis (Latest Findings)
Recent performance logging revealed the true bottleneck in backtest execution:
### 📊 Backtest Timing Breakdown
- Total execution time: ~1.4-1.6 seconds for 5760 candles
- TradingBotBase.Run() calls: 5,760 total (~87ms combined, 0.015ms average per call)
- Unaccounted time: ~1.3-1.5 seconds (94% of total execution time!)
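The unaccounted-time figure follows directly from the numbers above (assuming a ~1.4 s total run and the stated per-call average):

```shell
# 5,760 Run() calls at ~0.015 ms each account for only ~87 ms
# of a ~1.4 s total, leaving ~94% of the time unaccounted for.
awk 'BEGIN {
  total_s   = 1.4
  accounted = 5760 * 0.015 / 1000                # per-call ms -> seconds
  printf "accounted=%.3fs unaccounted=%.0f%%\n",
         accounted, (total_s - accounted) / total_s * 100
}'
```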
### 🎯 Identified Bottlenecks (in order of impact)

- `TradingBox.GetSignal()` - Indicator calculations (called ~1,932 times, ~0.99ms per call average)
- BacktestExecutor loop overhead - HashSet operations, memory allocations
- Signal update frequency - Even with 66.5% efficiency, remaining updates are expensive
- Memory management - GC pressure from frequent allocations
### 🚀 Next Optimization Targets
- Optimize indicator calculations - RSI divergence processing is the biggest bottleneck
- Reduce HashSet allocations - Pre-allocate or reuse collections
- Optimize signal update logic - Further reduce unnecessary updates
- Memory pooling - Reuse objects to reduce GC pressure
## ❌ Major Optimization Attempt: Pre-Calculated Signals (REVERTED)
What was attempted: Pre-calculate all signals once upfront to avoid calling `TradingBox.GetSignal()` repeatedly.
Why it failed: The approach was fundamentally flawed because:
- Signal generation depends on the current rolling window state
- Pre-calculating signals upfront still required calling the expensive `TradingBox.GetSignal()` method N times
- The lookup mechanism failed due to date-matching issues
- Net result: Double the work with no performance benefit
Technical Issues:
- Pre-calculated signals were not found during lookup (every candle fell back to on-the-fly calculation)
- Signal calculation depends on dynamic rolling window state that cannot be pre-calculated
- Added complexity without performance benefit
Result: Reverted to the original `TradingBox.GetSignal()` approach with the signal update frequency optimization.
Takeaway: Not all "optimizations" work. The signal generation logic is inherently dependent on current market state and cannot be effectively pre-calculated.
## Current Performance Status (Post-Reversion)
After reverting the flawed pre-calculated signals optimization, performance is excellent:
- ✅ Processing Rate: 3,000-7,000 candles/sec (excellent performance with expected system variance)
- ✅ Execution Time: 0.8-1.8s for 5760 candles (depends on system load)
- ✅ Signal Update Efficiency: 66.5% (reduces updates by 2.8x)
- ✅ Memory Usage: 23.73MB peak
- ✅ All validation tests passed
- ✅ Business logic integrity maintained
The signal update frequency optimization remains in place and provides significant performance benefits without breaking business logic.
## Safe Optimization Strategies
Based on lessons learned, safe optimizations include:
- Reduce system call frequency: Cache `GC.GetTotalMemory()` checks (e.g., sample every 100 candles)
- Fix bugs: Remove duplicate counters and redundant operations
- Avoid state changes: Don't modify the order or timing of business logic operations
- Skip intermediate calculations: Reduce logging and telemetry overhead
- Always validate: Run full benchmark suite after every change
- Profile before optimizing: Use targeted logging to identify real bottlenecks
## Example Output

```
🚀 Running backtest performance benchmark...
📊 Running main performance test...
✅ Performance test passed!
📊 Running business logic validation tests...
✅ Business logic validation tests passed!
✅ Business Logic OK: Final PnL matches baseline (±0)
📊 Benchmark Results:
  • Processing Rate: 5688.8 candles/sec
  • Execution Time: 1.005 seconds
  • Memory Peak: 24.66 MB
  • Signal Efficiency: 33.2%
  • Candles Processed: 5760
  • Score: 6015
✅ Benchmark data recorded successfully!
```
## Business Logic Validation
The benchmark includes comprehensive business logic validation on three levels:
### 1. Dedicated ETH Backtest Tests (2 tests)

- `ExecuteBacktest_With_ETH_FifteenMinutes_Data_Should_Return_LightBacktest` - Tests backtest with ETH 15-minute data
  - Validates specific trading scenarios and positions
  - Ensures indicator calculations are correct
- `LongBacktest_ETH_RSI` - Tests with a different ETH dataset
  - Validates consistency across different market data
  - Confirms trading logic works reliably
### 2. Large Dataset Telemetry Test (1 test)

- `Telemetry_ETH_RSI` - Validates performance metrics extraction
  - Confirms signal updates and backtest steps
  - Ensures telemetry data is accurate
### 3. PnL Baseline Comparison

- Consistent: Final PnL must match the first run (±0.01 tolerance)
- Baseline OK: The expected baseline is `24560.79`
- ⚠️ Warning: Large differences indicate broken business logic
All three validation levels must pass for the benchmark to succeed!
This prevents performance improvements from accidentally changing trading outcomes!
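The baseline comparison amounts to a simple tolerance check, sketched below with hardcoded values (the script extracts the actual PnL from test output):

```shell
baseline=24560.79   # expected baseline stated in this document
actual=24560.79     # hypothetical value extracted from the current run

# Compare within the ±0.01 tolerance; awk handles the decimal arithmetic.
status=$(awk -v a="$actual" -v b="$baseline" 'BEGIN {
  d = a - b
  if (d < 0) d = -d
  if (d <= 0.01) print "OK"; else print "BROKEN"
}')
echo "Business Logic $status: Final PnL vs baseline"
```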
## Files Modified

- `src/Managing.Workers.Tests/performance-benchmarks.csv` - Modified (new benchmark row added)
- `src/Managing.Workers.Tests/performance-benchmarks-two-scenarios.csv` - Modified (new two-scenarios benchmark row added)
Note: Changes are not committed automatically. Review the results and commit manually if satisfied.