Backtesting

NOVOSKY uses one backtest engine for realistic evaluation: `backtest_config.py`.

Tool	When to use	Expected win rate (WR)
`backtest_config.py`	Real performance estimate, after retraining or config changes	~57–86% (ATR-based)

Data contamination: the default `--days 365` window overlaps with the training data. Always pass `--oos-only` for true out-of-sample (OOS) results. This flag reads `train_cutoff_date` from `ensemble_btcusd-live_metadata.json` and tests only on data after that date.
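
The cutoff filter described above can be sketched roughly as follows. This is a minimal illustration, not the actual code in `backtest_config.py`: the ISO date format of `train_cutoff_date`, the `load_train_cutoff` and `oos_bars` helpers, and the `(timestamp, close)` bar tuples are all assumptions made for the example.

```python
import json
from datetime import date, datetime

def load_train_cutoff(metadata_text: str) -> date:
    """Parse train_cutoff_date from the metadata JSON text.

    Assumes an ISO 'YYYY-MM-DD' string; the real file may use another format.
    """
    meta = json.loads(metadata_text)
    return date.fromisoformat(meta["train_cutoff_date"])

def oos_bars(bars, cutoff: date):
    """Keep only bars dated strictly after the training cutoff."""
    return [bar for bar in bars if bar[0].date() > cutoff]

# Hypothetical metadata and bars, for illustration only.
metadata_text = '{"train_cutoff_date": "2026-03-01"}'
bars = [
    (datetime(2026, 2, 28, 23, 0), 100.0),  # before cutoff: dropped
    (datetime(2026, 3, 1, 12, 0), 101.0),   # on cutoff date: dropped
    (datetime(2026, 3, 2, 0, 0), 102.0),    # after cutoff: kept
]
cutoff = load_train_cutoff(metadata_text)
print(oos_bars(bars, cutoff))
```

Whether the real flag treats the cutoff date itself as in-sample is not stated here; the sketch excludes it to stay on the conservative side.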

Config-faithful backtest (recommended)

`backtest_config.py` reads your actual `config.json` and `ml_config.json` and simulates the live bot bar by bar, which makes it the realistic estimate of live performance.

# True OOS — always use this flag for real results
python backtest_config.py \
  --balance 500 --no-swap --leverage 500 \
  --spread 16.95 --oos-only --no-chart

# With realistic lot cap (prevents unlimited compounding)
python backtest_config.py \
  --balance 500 --no-swap --leverage 500 \
  --spread 16.95 --oos-only --max-lot 1.0 --no-chart

# VT Markets Cent account
python backtest_config.py \
  --balance 500 --no-swap --leverage 500 \
  --spread 16.95 --oos-only --cent-account \
  --max-lot 0.1 --no-chart

# IC Markets RAW spread (no --oos-only here, so the 365-day window overlaps training data)
python backtest_config.py \
  --balance 10000 --days 365 --swap-long 20 \
  --leverage 200 --spread 3.0 --no-chart

Key flags

Flag	Why it matters
`--oos-only`	Always use this. Without it, you’re testing on in-sample data and will see ~97% WR. That is not real.
`--spread`	VT Markets BTCUSD = `16.95`, IC Markets RAW = `3.0`. Wrong value inflates WR by 5–8 percentage points.
`--cent-account`	Only if your broker reports balance in USC. VT Markets Cent uses this.
`--max-lot`	Caps lot size. Without this, compounding hits `max_lot` fast and inflates returns. Use `1.0` for realistic results.
`--no-chart`	Skip matplotlib output (required for headless/server runs).
`--oos-end DATE`	Hard end date for the OOS window (`YYYY-MM-DD`). Data after this date is truncated. The weekly optimizer’s sweep phase uses it to prevent config overfitting: the sweep sees only the first 70% of the OOS window.
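
As a rough illustration of the 70% split mentioned for `--oos-end`, the sketch below derives an end date from a cutoff date and the last available data date. How the weekly optimizer actually computes the date is not documented here; `sweep_end_date`, its arguments, and the example dates are hypothetical.

```python
from datetime import date, timedelta

def sweep_end_date(cutoff: date, data_end: date, fraction: float = 0.7) -> date:
    """Return the date that closes the first `fraction` of the OOS window.

    The OOS window is taken as the span from `cutoff` to `data_end`.
    """
    if data_end <= cutoff:
        raise ValueError("data_end must be after cutoff")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    window_days = (data_end - cutoff).days
    # round() avoids losing a day to floating-point error in the product.
    return cutoff + timedelta(days=round(window_days * fraction))

# Hypothetical dates, for illustration only.
end = sweep_end_date(date(2026, 1, 1), date(2026, 4, 11))
print(f"--oos-end {end.isoformat()}")
```

The printed value can be appended to a `backtest_config.py` command line so that the remaining 30% of the window stays unseen during tuning.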

Preset-style comparisons with `backtest_config.py`

The old fixed-setup wrapper has been removed. Use `backtest_config.py` directly and vary the inputs you care about:

# Smaller account / conservative cap
python backtest_config.py --balance 200 --no-swap --leverage 500 --spread 16.95 --oos-only --max-lot 0.1 --no-chart

# Growth account / looser cap
python backtest_config.py --balance 1000 --no-swap --leverage 500 --spread 16.95 --oos-only --max-lot 1.0 --no-chart

# Cent account
python backtest_config.py --balance 500 --no-swap --leverage 500 --spread 16.95 --oos-only --cent-account --max-lot 0.1 --no-chart
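
If you run these comparisons often, a small driver script can build the same command lines from a list of presets. The sketch below only assembles and prints the argument lists, using the flags documented above; the `Preset` dataclass, the preset names, and the `build_command` helper are hypothetical and not part of NOVOSKY.

```python
from __future__ import annotations

import shlex
from dataclasses import dataclass

@dataclass(frozen=True)
class Preset:
    name: str
    balance: int
    max_lot: float
    cent_account: bool = False

def build_command(preset: Preset, spread: float = 16.95, leverage: int = 500) -> list[str]:
    """Assemble the backtest_config.py argument list for one preset."""
    cmd = [
        "python", "backtest_config.py",
        "--balance", str(preset.balance),
        "--no-swap",
        "--leverage", str(leverage),
        "--spread", str(spread),
        "--oos-only",
    ]
    if preset.cent_account:
        cmd.append("--cent-account")
    cmd += ["--max-lot", str(preset.max_lot), "--no-chart"]
    return cmd

# Mirrors the three example commands above.
PRESETS = [
    Preset("small", balance=200, max_lot=0.1),
    Preset("growth", balance=1000, max_lot=1.0),
    Preset("cent", balance=500, max_lot=0.1, cent_account=True),
]

for preset in PRESETS:
    # Printed rather than executed; pass the list to subprocess.run to run it.
    print(preset.name, "->", shlex.join(build_command(preset)))
```

Keeping the spread and leverage as shared defaults means the presets differ only in the inputs being compared.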

Interpreting results

WR interpretation

`backtest_config.py` is the live-expectation tool. Its WR is lower than the old preset wrapper's because it uses ATR-based stop-loss/take-profit (SL/TP) levels and your real config. Use it as the final gate before deployment.

Phase performance history

Phase	OOS window	Win rate (WR)	Profit factor (PF)	Max drawdown (MaxDD)	Return
11	224d (Sep 2025–Apr 2026)	57.4%	2.23	50.2%	+56,136%
15	37d (Mar–Apr 2026)	78.5%	2.43	1.8%	+50.7%

The Phase 15 OOS window is much shorter (37 days). That is enough to validate post-cutoff behavior, but it is less statistically robust than Phase 11’s 224-day window. Use both reference points when evaluating retraining results.

Score metric

Results are ranked by:

Score = WR × PF / √MaxDD

Higher is better. Latest Phase 15 weekly score: 21.34.
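
The ranking formula can be written as a small function. The source does not state whether WR and MaxDD enter as percentages or fractions, so this sketch simply applies the formula to whatever units you pass. The example inputs are made up, and the function is not meant to reproduce the reported weekly score.

```python
import math

def score(win_rate: float, profit_factor: float, max_drawdown: float) -> float:
    """Score = WR × PF / sqrt(MaxDD), applied to the inputs as given."""
    if max_drawdown <= 0:
        raise ValueError("max_drawdown must be positive")
    return win_rate * profit_factor / math.sqrt(max_drawdown)

# Made-up inputs chosen for easy arithmetic (not real results).
print(score(60.0, 2.0, 4.0))
```

Because MaxDD sits under a square root, halving the drawdown raises the score by a factor of about 1.41 rather than 2, so drawdown is penalized more gently than WR and PF are rewarded.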
