docs: enhance benchmark command with business logic validation tests

- Add 2 ETH-based validation tests to benchmark script
- Validates ExecuteBacktest_With_ETH_FifteenMinutes_Data_Should_Return_LightBacktest
- Validates ExecuteBacktest_With_ETH_FifteenMinutes_Data_Second_File_Should_Return_LightBacktest
- Ensures performance optimizations don't break trading logic
- Update command documentation with comprehensive validation details
- All 3 validation levels must pass for benchmark success
2025-11-11 12:32:56 +07:00
parent 578709d9b7
commit fc036bb7de
4 changed files with 65 additions and 33 deletions


@@ -18,11 +18,14 @@ Or run the script directly:
 ## What it does
-1. Runs the performance telemetry test (`ExecuteBacktest_With_Large_Dataset_Should_Show_Performance_Telemetry`)
-2. **Validates Business Logic**: Compares Final PnL with the first run of the file to ensure optimizations don't break behavior
-3. Extracts performance metrics from the test output
-4. Appends a new row to `src/Managing.Workers.Tests/performance-benchmarks.csv`
-5. **Never commits changes automatically**
+1. Runs the **main performance telemetry test** (`ExecuteBacktest_With_Large_Dataset_Should_Show_Performance_Telemetry`)
+2. Runs **two business logic validation tests**:
+- `ExecuteBacktest_With_ETH_FifteenMinutes_Data_Should_Return_LightBacktest`
+- `ExecuteBacktest_With_ETH_FifteenMinutes_Data_Second_File_Should_Return_LightBacktest`
+3. **Validates Business Logic**: Compares Final PnL with the first run baseline to ensure optimizations don't break behavior
+4. Extracts performance metrics from the test output
+5. Appends a new row to `src/Managing.Workers.Tests/performance-benchmarks.csv`
+6. **Never commits changes automatically**
 ## CSV Format
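The metric extraction step listed above can be illustrated with a small shell sketch. The helper names `extract_integer` and `extract_decimal` are hypothetical, and the sample lines assume only the label text that the script greps for; the real telemetry output may be formatted differently.

```shell
#!/bin/sh
# Hypothetical helpers; not part of the benchmark script.

# Print the digits of the first line containing the label "$1",
# dropping thousands separators (e.g. "5,760" -> "5760").
extract_integer() {
  grep -F "$1" | head -n 1 | sed 's/.*: //' | sed 's/[^0-9]//g'
}

# Print the value of the first line containing the label "$1" as a decimal,
# treating a comma as the decimal separator (e.g. "1,005s" -> "1.005").
extract_decimal() {
  grep -F "$1" | head -n 1 | sed 's/.*: //' | tr ',' '.' | sed 's/[^0-9.]//g'
}

# Assumed sample of the telemetry output.
SAMPLE_OUTPUT='📈 Total Candles Processed: 5,760
⏱️ Total Execution Time: 1,005s'

CANDLES_COUNT=$(printf '%s\n' "$SAMPLE_OUTPUT" | extract_integer "Total Candles Processed:")
EXECUTION_TIME=$(printf '%s\n' "$SAMPLE_OUTPUT" | extract_decimal "Total Execution Time:")
echo "candles=$CANDLES_COUNT time=$EXECUTION_TIME"
```

Keeping integer and decimal parsing separate avoids mistaking a thousands separator for a decimal point, which a single generic pattern could do.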
@@ -58,28 +61,49 @@ The command uses regex patterns to extract metrics from the test console output
 ```
 🚀 Running backtest performance benchmark...
-📊 Test Results:
-• Processing Rate: 2686.2 candles/sec
-• Execution Time: 2.115 seconds
-• Memory Peak: 23.91 MB
-• Signal Efficiency: 33.1%
+📊 Running main performance test...
+✅ Performance test passed!
+📊 Running business logic validation tests...
+✅ Business logic validation tests passed!
+✅ Business Logic OK: Final PnL matches baseline (±0)
+📊 Benchmark Results:
+• Processing Rate: 5688.8 candles/sec
+• Execution Time: 1.005 seconds
+• Memory Peak: 24.66 MB
+• Signal Efficiency: 33.2%
 • Candles Processed: 5760
-• Score: 0.00
-✅ Business Logic OK: Final PnL consistent (±0.00)
-✅ Benchmark data recorded in performance-benchmarks.csv
+• Score: 6015
+✅ Benchmark data recorded successfully!
 ```
 ### Business Logic Validation
-The benchmark includes **business logic validation** to ensure performance optimizations don't break backtest behavior:
-- **✅ Consistent**: Final PnL matches previous run (±0.01 tolerance)
-- **✅ Baseline OK**: First run validates against expected baseline (24560.79)
-- **⚠️ Warning**: Large PnL differences may indicate broken business logic
-- ** First Run**: Validates against established baseline
+The benchmark includes **comprehensive business logic validation** on three levels:
+#### 1. **Dedicated ETH Backtest Tests** (2 tests)
+- `ExecuteBacktest_With_ETH_FifteenMinutes_Data_Should_Return_LightBacktest`
+- Tests backtest with ETH 15-minute data
+- Validates specific trading scenarios and positions
+- Ensures indicator calculations are correct
+- `ExecuteBacktest_With_ETH_FifteenMinutes_Data_Second_File_Should_Return_LightBacktest`
+- Tests with a different ETH dataset
+- Validates consistency across different market data
+- Confirms trading logic works reliably
-**Expected Baseline**: Final PnL should be **24560.79** for correct business logic.
+#### 2. **Large Dataset Telemetry Test** (1 test)
+- `ExecuteBacktest_With_Large_Dataset_Should_Show_Performance_Telemetry`
+- Validates performance metrics extraction
+- Confirms signal updates and backtest steps
+- Ensures telemetry data is accurate
+#### 3. **PnL Baseline Comparison**
+- **Consistent**: Final PnL matches first run (±0.01 tolerance)
+- **Baseline OK**: Expected baseline is **24560.79**
+- **⚠️ Warning**: Large differences indicate broken business logic
+**All three validation levels must pass for the benchmark to succeed!**
 **This prevents performance improvements from accidentally changing trading outcomes!**
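The PnL baseline comparison can be sketched in shell as follows. This is a minimal illustration, not the script's actual implementation: the helper names `pnl_within_tolerance` and `check_pnl` are hypothetical, and only the baseline (24560.79) and tolerance (±0.01) are taken from the documentation above.

```shell
#!/bin/sh
# Hypothetical helpers; baseline and tolerance values come from the documentation.
EXPECTED_PNL="24560.79"
PNL_TOLERANCE="0.01"

# Succeeds (exit 0) when |actual - expected| <= tolerance.
pnl_within_tolerance() {
  awk -v actual="$1" -v expected="$2" -v tol="$3" 'BEGIN {
    diff = actual - expected
    if (diff < 0) diff = -diff
    exit !(diff <= tol)
  }'
}

# Print a status line for the given Final PnL; return 1 on mismatch.
check_pnl() {
  if pnl_within_tolerance "$1" "$EXPECTED_PNL" "$PNL_TOLERANCE"; then
    echo "Business Logic OK: Final PnL $1 matches baseline"
  else
    echo "Warning: Final PnL $1 differs from baseline $EXPECTED_PNL"
    return 1
  fi
}

check_pnl "24560.79"
check_pnl "24000.00" || true
```

The comparison is delegated to `awk` because POSIX shell arithmetic is integer-only. Values that sit exactly on the tolerance boundary may be affected by floating-point rounding.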


@@ -29,8 +29,8 @@ COMMIT_HASH=$(git rev-parse --short HEAD 2>/dev/null || echo "unknown")
 BRANCH_NAME=$(git branch --show-current 2>/dev/null || echo "unknown")
 ENVIRONMENT="development"
-# Run the performance test and capture output
-echo "📊 Running performance test..."
+# Run the main performance test and capture output
+echo "📊 Running main performance test..."
 TEST_OUTPUT=$(dotnet test src/Managing.Workers.Tests/Managing.Workers.Tests.csproj \
 --filter "ExecuteBacktest_With_Large_Dataset_Should_Show_Performance_Telemetry" \
 --verbosity minimal \
@@ -38,13 +38,29 @@ TEST_OUTPUT=$(dotnet test src/Managing.Workers.Tests/Managing.Workers.Tests.cspr
 # Check if test passed
 if echo "$TEST_OUTPUT" | grep -q "Passed.*1"; then
-echo -e "${GREEN}Test passed!${NC}"
+echo -e "${GREEN}Performance test passed!${NC}"
 else
-echo -e "${RED}Test failed!${NC}"
+echo -e "${RED}Performance test failed!${NC}"
 echo "$TEST_OUTPUT"
 exit 1
 fi
+# Run business logic validation tests
+echo "📊 Running business logic validation tests..."
+VALIDATION_OUTPUT=$(dotnet test src/Managing.Workers.Tests/Managing.Workers.Tests.csproj \
+--filter "ExecuteBacktest_With_ETH_FifteenMinutes_Data_Should_Return_LightBacktest|ExecuteBacktest_With_ETH_FifteenMinutes_Data_Second_File_Should_Return_LightBacktest" \
+--verbosity minimal \
+--logger "console;verbosity=detailed" 2>&1)
+# Check if validation tests passed
+if echo "$VALIDATION_OUTPUT" | grep -q "Passed.*2"; then
+echo -e "${GREEN}✅ Business logic validation tests passed!${NC}"
+else
+echo -e "${RED}❌ Business logic validation tests failed!${NC}"
+echo "$VALIDATION_OUTPUT"
+exit 1
+fi
 # Extract performance metrics from the output - use more robust parsing
 CANDLES_COUNT=$(echo "$TEST_OUTPUT" | grep "📈 Total Candles Processed:" | sed 's/.*: //' | sed 's/[^0-9]//g' | xargs)
 EXECUTION_TIME=$(echo "$TEST_OUTPUT" | grep "⏱️ Total Execution Time:" | sed 's/.*: //' | sed 's/s//' | sed 's/,/./g' | awk '{print $NF}' | xargs | awk -F' ' '{if (NF==2) print ($1+$2)/2; else print $1}')
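The pass checks in this hunk match `Passed.*1` and `Passed.*2` anywhere in the output, so any later digit on the same line can satisfy them. A stricter variant might read the passed count explicitly, as in the sketch below. The helpers `passed_count` and `expect_passed` are hypothetical, and the sample summary lines assume that the test runner prints a `Passed: <N>` field; verify this against the actual `dotnet test` output before relying on it.

```shell
#!/bin/sh
# Hypothetical helpers; they assume the test summary contains "Passed: <N>".

# Print the number following "Passed:" on the first matching line of stdin,
# or 0 when no such line exists.
passed_count() {
  count=$(sed -n 's/.*Passed:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1)
  echo "${count:-0}"
}

# Succeed only when exactly "$2" tests passed according to the output "$1".
expect_passed() {
  [ "$(printf '%s\n' "$1" | passed_count)" -eq "$2" ]
}

# Assumed sample summary lines.
OK_SUMMARY='Passed!  - Failed:     0, Passed:     2, Skipped:     0, Total:     2'
BAD_SUMMARY='Failed!  - Failed:     1, Passed:     1, Skipped:     0, Total:     2'

expect_passed "$OK_SUMMARY" 2 && echo "validation tests passed"
expect_passed "$BAD_SUMMARY" 2 || echo "validation tests failed"
```

With the second sample, `grep -q "Passed.*2"` would still match because of the trailing `Total:     2`, whereas the explicit count comparison rejects it.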


@@ -428,16 +428,7 @@ public class TradingBotBase : ITradingBot
 }
 // Check if we already have a position for this signal (in case it was added but not processed yet)
-// Optimized: Avoid LINQ FirstOrDefault overhead
-Position existingPosition = null;
-foreach (var pos in Positions.Values)
-{
-if (pos.SignalIdentifier == signal.Identifier)
-{
-existingPosition = pos;
-break;
-}
-}
+var existingPosition = Positions.Values.FirstOrDefault(p => p.SignalIdentifier == signal.Identifier);
 if (existingPosition != null)
 {


@@ -28,3 +28,4 @@ DateTime,TestName,CandlesCount,ExecutionTimeSeconds,ProcessingRateCandlesPerSec,
 2025-11-11T05:25:48Z,ExecuteBacktest_With_Large_Dataset_Should_Show_Performance_Telemetry,5760,1.87,3069.1,15.26,11.10,24.65,1634.11,3828,33.2,118.83,0.21,0.02,24560.79,38,24.56,6015,46966cc5,dev,development
 2025-11-11T05:26:29Z,ExecuteBacktest_With_Large_Dataset_Should_Show_Performance_Telemetry,5760,1.265,4533.9,15.27,11.26,24.66,1075.57,3828,33.2,89.65,0.14,0.02,24560.79,38,24.56,6015,46966cc5,dev,development
 2025-11-11T05:27:07Z,ExecuteBacktest_With_Large_Dataset_Should_Show_Performance_Telemetry,5760,1.005,5688.8,15.26,10.17,24.66,875.93,3828,33.2,61.25,0.11,0.01,24560.79,38,24.56,6015,61fdcec9,dev,development
+2025-11-11T05:31:12Z,ExecuteBacktest_With_Large_Dataset_Should_Show_Performance_Telemetry,5760,2.175,2637.3,15.26,10.76,25.26,1805.96,3828,33.2,229.60,0.23,0.04,24560.79,38,24.56,6015,578709d9,dev,development
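Rows in this CSV can be inspected by column name rather than by position, which keeps the check readable given the file's 21 columns. The sketch below is illustrative only: `csv_column` is a hypothetical helper, and the sample file uses a reduced subset of the real columns, with values copied from the last two rows of this hunk.

```shell
#!/bin/sh
# Hypothetical helper; column names are looked up in the CSV header row.

# Print every value of the column named "$2" from the CSV file "$1".
csv_column() {
  awk -F, -v name="$2" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == name) idx = i; next }
    idx     { print $idx }
  ' "$1"
}

# Small sample with a reduced subset of the columns.
SAMPLE_CSV=$(mktemp)
cat > "$SAMPLE_CSV" <<'EOF'
DateTime,ExecutionTimeSeconds,ProcessingRateCandlesPerSec,FinalPnL,CommitHash
2025-11-11T05:27:07Z,1.005,5688.8,24560.79,61fdcec9
2025-11-11T05:31:12Z,2.175,2637.3,24560.79,578709d9
EOF

# List the processing rates, then count how many distinct Final PnL values exist.
csv_column "$SAMPLE_CSV" ProcessingRateCandlesPerSec
DISTINCT_PNL=$(csv_column "$SAMPLE_CSV" FinalPnL | sort -u | wc -l | tr -d ' ')
echo "distinct FinalPnL values: $DISTINCT_PNL"
rm -f "$SAMPLE_CSV"
```

A single distinct `FinalPnL` value across runs means the trading outcome did not change between those runs, even though the processing rate varies.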
CSV header (line 1): DateTime,TestName,CandlesCount,ExecutionTimeSeconds,ProcessingRateCandlesPerSec,MemoryStartMB,MemoryEndMB,MemoryPeakMB,SignalUpdatesCount,SignalUpdatesSkipped,SignalUpdateEfficiencyPercent,BacktestStepsCount,AverageSignalUpdateMs,AverageBacktestStepMs,FinalPnL,WinRatePercent,GrowthPercentage,Score,CommitHash,GitBranch,Environment