TODO - NOVOSKY Docs

Current state — Phase 17.3 (active)

Signal model

62 features · RF+XGB+LGB · 3-class · M15 · Optuna-tuned · recency-weighted

Training cutoff: 2026-05-06 20:04 UTC
Hugging Face Hub tag: v20260506

Position model

71 features (62 market + 4 state + 5 M1)

ATR-aware labels · exit_threshold=0.80
Accuracy: RF=22.4%, XGB=76.1%, LGB=60.2%
181,414 training samples

OOS performance

Window	WR	PF	MaxDD	Sharpe	Return	Score
37d — Phase 17.3 (weekly optimize, active)	71.7%	5.46	4.5%	59.54	+195.7%	54.47
38d — Phase 17.2 (weekly optimize)	67.6%	2.83	7.3%	42.30	+394.5%	35.13
38d — Phase 16w (weekly optimize)	62.8%	2.26	14.5%	37.88	+1154.6%	46.80
37d — Phase 16 (base retrain)	65.8%	3.03	7.1%	49.88	+390.9%	—
37d — Phase 15	78.5%	2.43	1.8%	32.99	+50.7%	—
224d (Sep 2025–Apr 2026)	57.4%	2.21	50.2%	13.60	—	—

Active config: conf=0.65 · prob_diff=0.08 · risk=2.0% · TP=1.0×ATR · SL=1.0×ATR · CB=7 · Profile 3 Balanced

In progress — immediate blockers

VM cron (15.2): Wire weekly_optimize.py on trading VM
Phase 15.5: Broker safety audit documented — see Broker safety audit

Phase 15 — Production transition

15.0 Broker-agnostic refactor

Complete — 2026-04-18

config.json no longer requires "broker" key; all scripts load spread/leverage/swap dynamically from MT5 API /account and /symbols
backtest.py — config-faithful backtester; use --max-lot/--cent-account for preset-style comparisons
scripts/sweep.py — unified sweep replacing sweep_signal.py + sweep_pos_model.py; --target signal|pos|both
scripts/check_broker_limits.py — fixed main guard; added --symbol flag

15.1 Position model validation

Complete — 2026-04-15

38d OOS: PF=4.71 vs baseline 4.27 (+10.3%), MaxDD=4.7%, 1 ML_Exit
Sweep of 13 configs via scripts/sweep.py --target pos
Thresholds updated: exit=0.80, min_prob_diff=0.25, min_bars_held=4

15.2 Automated retrain pipeline

Complete — 2026-04-17

scripts/weekly_optimize.py — 13-phase autonomous pipeline (SHAP → tune → retrain → sweep → evaluate → push → commit → notify)
Score improvement gate: 2% minimum to keep new models
Incremental warmstart training already enabled by default (ml/train.py, ml/ensemble_trainer.py)

VM task pending: Wire cron on trading VM:

0 2 * * 0  cd /path/NOVOSKY && .venv/bin/python scripts/weekly_optimize.py >> logs/weekly.log 2>&1

15.3 Cloud monitoring

Complete — Telegram bot commands cover live status; performance_monitor.py removed

Degradation detection is handled by the weekly walk-forward OOS gate in weekly_optimize.py (3 × 4-week folds, WR ≥ 55%, PF ≥ 1.8)
Live alerts flow through trading/telegram_commands.py and scripts/notify.py

15.4 Telegram bot commands

Complete — 2026-04-15

/help · /pause · /resume · /status · /positions · /close · /closeall · /news · /pnl · /latency
Auth-gated by TELEGRAM_CHAT_ID. Pause flag wired into main trade entry loop.

15.5 Broker safety audit

Documented — 2026-04-25

Full audit guide at Broker safety audit: tick size, stops level, lot limits, swap status, pass/fail criteria, and per-broker reference values (RoboForex, IC Markets).

Run audit against live RoboForex server: python scripts/check_broker_limits.py — PASS (2026-05-10: all limits confirmed, spread=14.59, stops_level=0, lot 0.01–100, contract_size=1.0)
Run audit against IC Markets when account is provisioned
Run audit against FundingPips when account is provisioned

15.6 Weekly validation cadence

Complete — 2026-04-17

Covered by weekly_optimize.py — runs full OOS backtest every Sunday, auto-rolls back if Score regresses.

Phase 16 — Risk guards & validation

Phase 16 fixes landed 2026-04-24: RF double-weighting removed (val acc 10.5%→49.4%), TP corrected 0.8→1.2×ATR, labels aligned with live execution, risk model rebuilt with Kelly log-utility label function and sequence augmentation (~5,000 training samples, label std 0.000→0.442, val MAE=0.3084). OOS result: WR=65.8%, PF=3.03, MaxDD=7.1%, Sharpe=49.88, Return=+390.9%, 114 trades over 37 days.

16.1 Enable daily loss guard

Complete — 2026-05-10

max_daily_loss_pct is active in trading/bot.py (check_max_loss_profit()). Set to 3.0% of live equity — recomputed each cycle from account.equity.

Formula (dynamic daily loss limit): The guard scales with current equity, not a hard USD constant. At $500 equity: limit = $15. At $2418 equity: limit = $72.54.

# bot.py guard logic (active):
_loss_pct = config.get("max_daily_loss_pct")  # 3.0
_equity = _api_balance_to_usd(_acct.equity)
max_daily_loss = _equity * (_loss_pct / 100.0)
if daily_pl <= -max_daily_loss:
    send_telegram_message("🚨 [DAILY_LOSS_GUARD] ...")
    time.sleep(3600)

Confirmed trading/bot.py:check_max_loss_profit() used fixed USD — upgraded to equity-relative
Added max_daily_loss_pct: 3.0 to config.json; removed max_daily_loss: 99999
Bot computes limit = equity * pct / 100 dynamically each cycle via _get_account().equity
Dry-run: python trading.py --dry — guard logs [DAILY_LOSS_GUARD] on breach
Daily reset uses LOCAL_TZ_OFFSET (+7 WIB) — resets at WIB midnight ✅

16.2 Equity curve filter

Complete — 2026-04-25

equity_curve_filter config block added (enabled: false, lookback_trades: 10, max_drawdown_pct: 5.0)
_recent_trade_pnls persisted in state.json and restored on restart
Entry guard fires [EC_FILTER] when net loss over last N trades ≥ threshold % of equity
Enable after validating OOS backtest with the filter active
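The entry guard described above can be sketched as follows. This is a minimal illustration, not the code in trading/bot.py: the function name is hypothetical, and only the two thresholds (lookback_trades, max_drawdown_pct) come from the config block.

```python
def ec_filter_blocks_entry(recent_pnls, equity, lookback_trades=10, max_drawdown_pct=5.0):
    """Return True when the net loss over the last N closed trades
    reaches max_drawdown_pct percent of current equity."""
    window = recent_pnls[-lookback_trades:]   # most recent N trade PnLs, in USD
    net = sum(window)
    if net >= 0:
        return False                          # net profit or flat: never block
    return -net >= equity * max_drawdown_pct / 100.0

# Made-up PnL values for illustration: net loss of 25.5 USD against a 25 USD limit.
if ec_filter_blocks_entry([-10.0, -8.0, -7.5], equity=500.0):
    print("[EC_FILTER] entry blocked")
```

Because only the trailing window is summed, a large loss that has aged out of the last lookback_trades trades no longer blocks entries.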

16.3 Extend walk-forward OOS gate

Complete — 2026-04-25

Phase 7b added to scripts/weekly_optimize.py between phase 7 and phase 8
3 non-overlapping 4-week OOS backtest folds run after every retrain
Gate: all folds WR ≥ 55%, all folds PF ≥ 1.8, median PF ≥ 2.0
Failure rolls back the snapshot immediately and skips push/commit
Fold results appended to logs/wf_gate.log
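The gate rule above reduces to three checks. The sketch below assumes each fold result is a dict with "wr" (percent) and "pf" keys; that layout and the function name are assumptions, not the structure used in scripts/weekly_optimize.py.

```python
from statistics import median

# Thresholds copied from the gate description above.
MIN_FOLD_WR = 55.0      # percent, every fold
MIN_FOLD_PF = 1.8       # every fold
MIN_MEDIAN_PF = 2.0     # median across folds

def passes_wf_gate(folds: list[dict]) -> bool:
    """True only if every fold clears WR and PF and the median PF clears 2.0."""
    if not folds:
        return False
    all_wr_ok = all(f["wr"] >= MIN_FOLD_WR for f in folds)
    all_pf_ok = all(f["pf"] >= MIN_FOLD_PF for f in folds)
    median_ok = median(f["pf"] for f in folds) >= MIN_MEDIAN_PF
    return all_wr_ok and all_pf_ok and median_ok

# Made-up fold numbers for illustration only.
folds = [
    {"wr": 61.0, "pf": 2.4},
    {"wr": 58.5, "pf": 1.9},
    {"wr": 63.2, "pf": 2.1},
]
print("gate passed" if passes_wf_gate(folds) else "gate failed: roll back snapshot")
```

A single weak fold is enough to fail the gate, which is how a result such as fold PF=1.34 blocks a push even when the other folds look strong.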

16.4 Enable Kelly lot sizing

Kelly lot sizing is fully implemented in trading/bot.py:1193–1257 and is toggled via config.

At average confidence of 61.5%, the Kelly criterion cuts effective risk from 6% to ~2.1% — similar to what the current static risk_percent achieves. The value of enabling it is that it scales up naturally at high-confidence setups and scales down at marginal ones.

Enable: ml_active_management.kelly_lot_sizing.enabled = true — already enabled
OOS sweep (3 runs via --mode kelly): disabled Score=0.19 MaxDD=16.1% | half=0.5 Score=0.30 MaxDD=7.5% | full=1.0 Score=0.31 MaxDD=8.7%
Kelly must stay on — disabled falls back to raw static sizing, MaxDD spikes to 16.1%
Keeping max_kelly_fraction: 0.5 (half-Kelly): full-Kelly wins Score by 0.01 but costs 1.2% extra MaxDD — not worth it at $500 balance. Re-evaluate at higher equity.
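For reference, the textbook Kelly fraction and the effect of a fractional multiplier can be sketched as below. This is not the logic in trading/bot.py: how the bot maps confidence and payoff to a risk percentage is not shown in this document, so the sketch does not reproduce the 6% → ~2.1% figure. The function names and the payoff ratio of 1.0 are assumptions.

```python
def kelly_fraction(win_prob: float, payoff_ratio: float) -> float:
    """Textbook Kelly: f* = p - (1 - p) / b, floored at zero (no negative bets)."""
    if payoff_ratio <= 0:
        return 0.0
    return max(0.0, win_prob - (1.0 - win_prob) / payoff_ratio)

def fractional_kelly(win_prob: float, payoff_ratio: float,
                     max_kelly_fraction: float = 0.5) -> float:
    """Scale full Kelly by a fixed multiplier (0.5 = half-Kelly, 1.0 = full Kelly)."""
    return kelly_fraction(win_prob, payoff_ratio) * max_kelly_fraction

# Assumed inputs: 61.5% win probability, 1:1 payoff (TP = SL in ATR terms).
full = kelly_fraction(0.615, 1.0)
half = fractional_kelly(0.615, 1.0, max_kelly_fraction=0.5)
print(f"full Kelly {full:.3f}, half Kelly {half:.3f}")
```

Half-Kelly gives up some expected growth in exchange for lower variance, which is the same trade-off the sweep above shows as a small Score loss against a lower MaxDD.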

16.5 Broker-Agnostic Multi-Account Architecture

The system is designed to be broker-agnostic, decoupling the ML pipeline and trade logic from any specific broker (RoboForex, IC Markets, FundingPips, etc.). Training and live execution are intended to support multiple brokers simultaneously, and eventually decentralized exchanges (DEX) such as Hyperliquid.

config/accounts.json schema:

{
  "accounts": {
    "exness_main": {
      "broker_id": "exness",
      "terminal_key": "terminal-one",
      "account_type": "cent",
      "currency_multiplier": 0.01,
      "risk_pct": 3.0,
      "max_lot": 5.0,
      "state_file": "state_exness_main.json"
    },
    "icmarkets_main": {
      "broker_id": "icmarkets",
      "terminal_key": "terminal-two",
      "account_type": "standard",
      "currency_multiplier": 1.0,
      "risk_pct": 2.0,
      "max_lot": 1.0,
      "state_file": "state_icmarkets_main.json"
    }
  }
}
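A loader for this schema might look like the sketch below. The function name and the required-key check are assumptions for illustration; the real startup code in trading.py may differ.

```python
import json

# Keys taken from the schema above; treating all of them as required is an assumption.
REQUIRED_KEYS = {"broker_id", "terminal_key", "account_type",
                 "currency_multiplier", "risk_pct", "max_lot", "state_file"}

def select_account(accounts_doc: dict, account_id: str) -> dict:
    """Return one account entry, failing loudly on unknown ids or missing keys."""
    accounts = accounts_doc["accounts"]
    if account_id not in accounts:
        raise KeyError(f"unknown account '{account_id}'; known: {sorted(accounts)}")
    account = accounts[account_id]
    missing = REQUIRED_KEYS - account.keys()
    if missing:
        raise ValueError(f"account '{account_id}' is missing keys: {sorted(missing)}")
    return account

# Inline copy of one entry from the schema; in practice this would be read from config/accounts.json.
doc = json.loads("""
{"accounts": {"icmarkets_main": {
    "broker_id": "icmarkets", "terminal_key": "terminal-two",
    "account_type": "standard", "currency_multiplier": 1.0,
    "risk_pct": 2.0, "max_lot": 1.0, "state_file": "state_icmarkets_main.json"}}}
""")
account = select_account(doc, "icmarkets_main")
print(account["state_file"], account["risk_pct"])
```

Failing at startup on a missing key is cheaper than discovering a misconfigured max_lot or state_file after the first order.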

Multi-broker dataset merging — deduplication and normalization: Merging MT5 feeds from different brokers introduces duplicate timestamps (broker server time can differ by ±1 bar) and price divergence (spread differences). Normalization formula:

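import pandas as pd

# Normalize close price across brokers to remove spread bias.
# BROKER_SPREADS is assumed to be a dict of broker_id -> typical spread in price units.
def normalize_ohlcv(df: pd.DataFrame, broker_id: str) -> pd.DataFrame:
    df = df.copy()                               # do not mutate the caller's frame
    spread = BROKER_SPREADS[broker_id]           # e.g., 14.59 USD for RoboForex
    df["close_norm"] = df["close"] - spread / 2  # midpoint price
    return df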

# Merge: outer join on timestamp, ffill broker gaps (< 2 bars)
merged = pd.merge(df_vt, df_ic, on="timestamp", how="outer", suffixes=("_vt", "_ic"))
merged["close"] = merged[["close_vt", "close_ic"]].mean(axis=1)  # average midpoints
merged.ffill(limit=2, inplace=True)  # fill short gaps; drop if both missing
merged.dropna(subset=["close"], inplace=True)

PM2 multi-instance ecosystem config:

// ecosystem.config.js
module.exports = {
  apps: [
    { name: "novosky-roboforex",  script: "trading.py", args: "--account exness_main", interpreter: ".venv/bin/python" },
    { name: "novosky-ic",  script: "trading.py", args: "--account icmarkets_main", interpreter: ".venv/bin/python" },
    { name: "ws-server",   script: "trading/ws_server.py", interpreter: ".venv/bin/python" },
  ]
};

Tasks:

Create config/accounts.json with schema above; add RoboForex entry only for now
Add --account <account_id> flag to trading.py argument parser; load account config at startup; override risk_pct, max_lot, state_file, and API base URL from account config
Update trading/bot.py: replace hardcoded state.json reference with config["state_file"]; replace API_BASE_URL with terminals[account["terminal_key"]]["url"]
Update ml/train.py: add --brokers exness,icmarkets flag; when multiple brokers specified, fetch, normalize, and merge their OHLCV datasets; retrain on merged set
Unit test multi-broker merge: assert merged DataFrame has no duplicate timestamps, close prices within 0.5% of each broker’s midpoint
Update ecosystem.config.js with multi-process template (disabled by default — uncomment when IC Markets account is provisioned)
config/terminals.json port + domain mapping already implemented — RoboForex on port 6542 / terminal-rf1.novosky.app
Draft Hyperliquid adapter spec in docs/architecture/hyperliquid.mdx: REST endpoints for order placement (POST /exchange), position query (POST /info with type: clearinghouseState), and funding rate feed (/info with type: fundingHistory)
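The merge unit test from the task list could start from a sketch like this. The helper names, the "_a"/"_b" suffixes, and the synthetic prices are all assumptions for illustration; the 0.5% tolerance comes from the task description.

```python
import pandas as pd

def merge_broker_closes(df_a: pd.DataFrame, df_b: pd.DataFrame) -> pd.DataFrame:
    """Outer-join two broker feeds on timestamp and average the available closes."""
    merged = pd.merge(df_a, df_b, on="timestamp", how="outer", suffixes=("_a", "_b"))
    merged = merged.sort_values("timestamp").reset_index(drop=True)
    merged["close"] = merged[["close_a", "close_b"]].mean(axis=1)  # skips NaN by default
    return merged.dropna(subset=["close"])

def check_merged(merged: pd.DataFrame, tolerance: float = 0.005) -> None:
    """Assert no duplicate timestamps and merged close within 0.5% of each broker."""
    assert not merged["timestamp"].duplicated().any(), "duplicate timestamps"
    for col in ("close_a", "close_b"):
        present = merged.dropna(subset=[col])
        rel_diff = (present["close"] - present[col]).abs() / present[col]
        assert (rel_diff <= tolerance).all(), f"close deviates from {col} by > {tolerance:.1%}"

# Synthetic feeds: broker B is missing the third bar and quotes slightly higher.
ts = pd.date_range("2026-01-01", periods=4, freq="15min", tz="UTC")
df_a = pd.DataFrame({"timestamp": ts, "close": [100000.0, 100050.0, 100020.0, 100080.0]})
df_b = pd.DataFrame({"timestamp": ts[[0, 1, 3]], "close": [100010.0, 100058.0, 100091.0]})

merged = merge_broker_closes(df_a, df_b)
check_merged(merged)
print(merged[["timestamp", "close"]])
```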

Phase 21 — Dynamic SL/TP & position model upgrades

These have the second-highest near-term ROI because the core code is already built. Items 21.1 and 21.2 require no retraining.

21.1 Enable and validate `ml_sltp` (confidence-scaled TP/SL at entry)

Complete — 2026-04-22

The Dynamic SL/TP regression model (LightGBM) is fully orchestrated in trading/bot.py. It dynamically predicts the ideal SL/TP ATR multipliers for each trade based on market volatility and regime, overriding the static fallback multipliers. It leverages Walk-Forward validation (TimeSeriesSplit) to ensure the predictions generalize well to out-of-sample regimes. At order placement, ml_sltp re-scales SL and TP using the signal model’s entry confidence:

conf_norm = clamp((confidence − threshold) / (1.0 − threshold), 0, 1)
effective_SL = base_sl_atr_mult × ATR × (1 − conf_norm × confidence_sl_adjust)
effective_TP = base_tp_atr_mult × ATR × (1 + conf_norm × confidence_tp_adjust)

confidence	SL	TP	RR
60% (threshold)	1.00 × ATR	1.50 × ATR	1.50
80%	0.85 × ATR	1.75 × ATR	2.06
100%	0.70 × ATR	2.25 × ATR	3.21
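The three formulas above translate directly into code. In this sketch the base multipliers and both adjust values are illustrative defaults chosen for the example; they are not read from config.json, so intermediate confidences need not match the table row for row.

```python
def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))

def scale_sltp(confidence: float, atr: float, threshold: float = 0.60,
               base_sl_atr_mult: float = 1.0, base_tp_atr_mult: float = 1.5,
               confidence_sl_adjust: float = 0.3, confidence_tp_adjust: float = 0.5):
    """Return (effective_SL, effective_TP) in price units.

    Higher confidence tightens the stop and widens the target; confidence at or
    below the threshold leaves the base multipliers unchanged.
    """
    conf_norm = clamp((confidence - threshold) / (1.0 - threshold), 0.0, 1.0)
    effective_sl = base_sl_atr_mult * atr * (1.0 - conf_norm * confidence_sl_adjust)
    effective_tp = base_tp_atr_mult * atr * (1.0 + conf_norm * confidence_tp_adjust)
    return effective_sl, effective_tp

# With ATR = 1.0 the results read directly as ATR multiples.
for conf in (0.60, 0.80, 1.00):
    sl, tp = scale_sltp(conf, atr=1.0)
    print(f"confidence={conf:.2f}  SL={sl:.2f} x ATR  TP={tp:.2f} x ATR  RR={tp / sl:.2f}")
```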

Without ml_sltp: SL = 1.0 × ATR, TP = 0.8 × ATR (static 1:0.8 RR, currently live). Config location: config.json → ml_active_management.ml_sltp

Enable: set ml_sltp.enabled = true in config.json — already enabled
OOS backtest result: WR=48.8%, PF=1.72, MaxDD=9.0%, Score=0.26 (37d OOS, post-retrain)
Score degraded vs disabled baseline (Score=0.30, MaxDD=7.5%) — swept confidence_sl_adjust × confidence_tp_adjust (20 combos via --mode ml_sltp)
Best enabled combo sladj0.4_tpadj0.5 still lost on Score (0.29) and MaxDD (9.1%) — disabled per todo rule. Re-evaluate after more live trades improve confidence signal quality.
min_tp floor check already present in the ml_sltp path (bot.py:4104–4105)
Removed dead base_sl_atr_mult / base_tp_atr_mult config keys — SLTP regression model supplies the base multipliers, not static config

atr_tp_mult: 0.8 and atr_sl_mult: 0.8 in ml_config.json → labeling are training-time label parameters only — they determine HOLD/EXIT/ADD ground truth during training. They do not control live execution. Live TP/SL is config.json → dynamic_sltp (and optionally ml_sltp).

21.2 Trailing stop + lower `min_bars_held`

Both are fully implemented in the bot and are config-only changes. No retrain required.

Trailing stop (bot.py:2928–2960): ATR trail width scales with model confidence and live momentum_decay. High confidence → tight trail; low confidence / adverse momentum → wider trail. Currently disabled.

Enable: set ml_trailing_stop.enabled = true, base_trail_atr_mult = 1.2, min_profit_atr = 0.8
Run OOS backtest and compare Score vs baseline
OOS result: WR=46.6%, PF=1.45, MaxDD=12.7%, Score≈0.19 — worse than baseline (Score=0.30, MaxDD=7.5%). Conflicts with partial_close breakeven SL — trailing stop cuts winners after BE move. Disabled. Saved params in config.json._note.
Do not enable simultaneously with ml_sltp testing — change one variable at a time

min_bars_held reduction: Currently 4 bars (60 min minimum hold). At 2 bars the position model can act after 30 minutes instead of 60.

Set position_optimization.min_bars_held = 2 in config.json
Run python scripts/sweep.py --target pos (pre-retrain): Score 0.30→0.34, best was exit0.60/bars2. Applied temporarily.
Post clean-retrain sweep (2026-04-26): optimal config reverted to exit_threshold=0.80, min_prob_diff=0.25, min_bars_held=4 — Score=0.66, PF=2.69, DD=6.4%. After --no-warmstart, model EXIT signals more reliable at higher threshold. Applied to config.json + ml_config.json.

21.3 M1 intra-candle feature augmentation for position model

The feature cache (_latest_features_cache) is M15-derived. All 59 market features stay stale within a 15-minute candle. Adding 5 M1 scalars gives the position model intra-candle microstructure. This requires a position model retrain (breaking scaler change).

Proposed M1 features:

Feature	Calculation	Signal
`m1_price_accel_5`	`(close[0] − close[5]) / close[5]` on M1	Intra-candle momentum
`m1_vol_ratio_5`	`mean(volume[0:5]) / mean(volume[5:10])`	Volume surge
`m1_rsi_9`	Wilder RSI(9) on M1 close	Fast momentum state
`m1_atr_3`	ATR(3) normalized by close	Intra-candle volatility
`m1_body_pct`	`abs(close − open) / (high − low + ε)`	Candle conviction
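The five scalars in the table can be computed from a recent M1 window roughly as follows. This is a sketch under assumptions: the input is a DataFrame of M1 bars in chronological order with open/high/low/close/volume columns, Wilder smoothing is implemented as an exponential average with alpha = 1/period, and the function names are hypothetical.

```python
import pandas as pd

def wilder_rsi(close: pd.Series, period: int = 9) -> pd.Series:
    """RSI with Wilder smoothing (EMA with alpha = 1 / period)."""
    delta = close.diff()
    gain = delta.clip(lower=0.0)
    loss = (-delta).clip(lower=0.0)
    avg_gain = gain.ewm(alpha=1.0 / period, adjust=False, min_periods=period).mean()
    avg_loss = loss.ewm(alpha=1.0 / period, adjust=False, min_periods=period).mean()
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)

def m1_features(m1: pd.DataFrame, eps: float = 1e-9) -> pd.Series:
    """Features for the most recent M1 bar (last row). Needs at least 10 rows."""
    close, vol = m1["close"], m1["volume"]
    prev_close = close.shift(1)
    true_range = pd.concat([
        m1["high"] - m1["low"],
        (m1["high"] - prev_close).abs(),
        (m1["low"] - prev_close).abs(),
    ], axis=1).max(axis=1)
    atr_3 = true_range.ewm(alpha=1.0 / 3, adjust=False, min_periods=3).mean()
    last = m1.iloc[-1]
    return pd.Series({
        # index 0 in the table is the newest bar, so close[5] is iloc[-6]
        "m1_price_accel_5": (close.iloc[-1] - close.iloc[-6]) / close.iloc[-6],
        "m1_vol_ratio_5": vol.iloc[-5:].mean() / vol.iloc[-10:-5].mean(),
        "m1_rsi_9": wilder_rsi(close, 9).iloc[-1],
        "m1_atr_3": atr_3.iloc[-1] / close.iloc[-1],
        "m1_body_pct": abs(last["close"] - last["open"]) / (last["high"] - last["low"] + eps),
    })
```

All five values are scale-free ratios, so they can be appended to the existing feature vector without knowing the absolute price level.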

Implementation (requires retrain):

Do this after 21.2 no-retrain items are validated and live. You want a clean baseline before adding features.

21.4 Catastrophic SL + position model as primary exit

A wide hard SL serves as a safety net only, with no hard TP: the position model drives all exits. Positions stay open longer and exit on regime deterioration rather than at a fixed ATR multiple.

Preconditions (all must be met before enabling):

Precondition 1 originally referenced ml_sltp (Phase 21.1) — but ml_sltp was disabled in OOS testing (Score degraded). That precondition is now replaced with a position model precision gate only.

~~Phase 21.1 (ml_sltp) validated OOS~~ — removed (ml_sltp is disabled; position-as-primary-exit does not depend on confidence-scaled TP/SL)
Position model EXIT precision ≥ 87% on OOS (current: 85%) — this is the primary readiness gate
≥ 200 live dry-run trades with ml_active_management.enabled = true confirming EXIT fires at correct rate (target: ML_EXIT ≥ 60% of closes)
Telemetry already in place: each close logged as SL_HIT, TP_HIT, or ML_EXIT in ml_performance.csv ✅ (implemented)

Implementation:

"ml_active_management": {
  "position_as_primary_exit": {
    "enabled": false,
    "catastrophic_sl_atr_mult": 3.0,
    "emergency_tp_atr_mult": 5.0
  }
}

position_as_primary_exit config block added to config.json (enabled: false)
bot.py wired: if enabled, overrides effective_sl = ATR × catastrophic_sl_atr_mult, effective_tp = ATR × emergency_tp_atr_mult, logs [PRIMARY_EXIT_MODE]
SL_HIT / TP_HIT / ML_EXIT telemetry added to ml_performance.csv via _close_type() helper

OOS test methodology for wide-SL:

# Step 1: Measure wide-SL alone (TP still on) — understand the RR change
python backtest.py --oos-only --no-chart \
    --override "dynamic_sltp.sl_atr_multiplier=3.0"

# Step 2: Full mode — wide SL + no TP (position model drives exits)
python backtest.py --oos-only --no-chart \
    --override "position_as_primary_exit.enabled=true" \
    --override "dynamic_sltp.sl_atr_multiplier=3.0" \
    --override "dynamic_sltp.tp_atr_multiplier=99.0"  # effectively no TP

# Accept only if: WR ≥ 55%, MaxDD ≤ 20%, ML_EXIT rate ≥ 60% in simulation

Expected behavior when enabled: Average trade duration increases from ~2h to ~6–12h. MaxDD can spike during gap events (weekend crypto gaps). The 3.0×ATR hard SL is the last line of defense.

Run Step 1 OOS (wide-SL alone): log result to logs/wide_sl_test.log; accept if Score ≥ current − 2.0 (some score loss from wider SL is expected)
Run Step 2 OOS (position-as-primary): accept only if all gate conditions above are met
Dry-run 100 trades: run pm2 logs novosky --lines 200 --nostream after each close; count ML_EXIT vs SL_HIT vs TP_HIT; confirm ML_EXIT ≥ 60%
Only enable on live after all preconditions are met; announce in Telegram: [PRIMARY_EXIT_MODE] Enabled — SL=3.0×ATR, no TP
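Counting close types for the ML_EXIT ≥ 60% check can be scripted instead of read off the logs. The sketch below assumes ml_performance.csv carries the close type in a column named close_type; that column name is hypothetical, and the sample rows are synthetic.

```python
import csv
import io
from collections import Counter

CLOSE_TYPES = ("SL_HIT", "TP_HIT", "ML_EXIT")

def ml_exit_share(close_types) -> float:
    """Fraction of recognised closes that were ML_EXIT (0.0 if there are none)."""
    counts = Counter(t for t in close_types if t in CLOSE_TYPES)
    total = sum(counts.values())
    return counts["ML_EXIT"] / total if total else 0.0

# Synthetic rows for illustration; in practice open ml_performance.csv instead.
SAMPLE_CSV = """ticket,close_type
1,ML_EXIT
2,SL_HIT
3,ML_EXIT
4,TP_HIT
5,ML_EXIT
"""
rows = csv.DictReader(io.StringIO(SAMPLE_CSV))
share = ml_exit_share(row["close_type"] for row in rows)
print(f"ML_EXIT share: {share:.0%}, gate {'met' if share >= 0.60 else 'not met'}")
```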

21.5 Risk model scope — architectural constraint

The risk model outputs a scalar multiplier [0.10, 1.25] applied to base_risk_pct. Its 7 features are equity-state scalars only. It must not control SL/TP distances.

Concern	Correct mechanism
Lot size based on equity health	Risk model → `effective_risk_pct`
TP/SL based on signal confidence	`ml_sltp` (Phase 21.1)
Intra-trade exit timing	Position model (Phase 21.3/21.4)
Protective trailing SL	`ml_trailing_stop` (Phase 21.2)

Add a Scope: section to ml/risk_predictor.py module docstring stating the model outputs lot-sizing multipliers only

Phase 17 — Feature engineering

Add new market signals before the next major retrain. Implement in ml/feature_engineering.py. Each feature requires a full retrain + OOS validation before going live.

Add features one group at a time. Retrain after each group. Check that OOS Score does not degrade. Adding too many features at once makes it impossible to identify what hurt or helped.

17.1 On-chain & derivatives features

These have direct theoretical backing for BTCUSD direction: funding rate and OI are the primary sentiment signals used by professional crypto traders.

Dependency: pip install requests pandas

funding_rate — formula and normalization: Binance perpetual funding settles every 8 h. Rate represents cost of holding long vs short (positive = longs pay shorts → bearish pressure; negative = shorts pay longs → bullish pressure).

# ml/data_sources/binance.py
import requests, pandas as pd

def fetch_funding_rate(symbol="BTCUSDT", limit=500) -> pd.DataFrame:
    url = "https://fapi.binance.com/fapi/v1/fundingRate"
    r = requests.get(url, params={"symbol": symbol, "limit": limit}, timeout=10)
    df = pd.DataFrame(r.json())
    df["timestamp"] = pd.to_datetime(df["fundingTime"], unit="ms", utc=True)
    df["funding_rate"] = df["fundingRate"].astype(float)
    # Normalize: clamp to [-0.003, 0.003] (99th percentile range), then scale to [-1, 1]
    CLIP = 0.003
    df["funding_rate_norm"] = df["funding_rate"].clip(-CLIP, CLIP) / CLIP
    return df[["timestamp", "funding_rate_norm"]].set_index("timestamp")

# Forward-fill to M15 grid:
def merge_funding_to_m15(df_m15: pd.DataFrame, df_funding: pd.DataFrame) -> pd.DataFrame:
    df_m15 = df_m15.join(df_funding, how="left")
    df_m15["funding_rate_norm"] = df_m15["funding_rate_norm"].ffill().fillna(0.0)
    return df_m15
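The left-join above only picks up a funding value when its timestamp matches an M15 bar exactly. If funding timestamps are offset from the bar boundary by even a few milliseconds, a backward as-of join is more forgiving. The sketch below is an alternative under that assumption; merge_funding_asof is a hypothetical name, and both frames are assumed to be indexed by UTC timestamps.

```python
import pandas as pd

def merge_funding_asof(df_m15: pd.DataFrame, df_funding: pd.DataFrame) -> pd.DataFrame:
    """Attach to each M15 bar the latest funding value at or before the bar time."""
    left = df_m15.sort_index().rename_axis("timestamp").reset_index()
    right = df_funding.sort_index().rename_axis("timestamp").reset_index()
    out = pd.merge_asof(left, right, on="timestamp", direction="backward")
    # Bars that precede the first funding print get a neutral 0.0
    out["funding_rate_norm"] = out["funding_rate_norm"].fillna(0.0)
    return out.set_index("timestamp")

# Synthetic example: funding prints land 3 ms after the 8-hour boundary.
base = pd.Timestamp("2026-01-01", tz="UTC")
m15 = pd.DataFrame({"close": range(35)},
                   index=pd.date_range(base, periods=35, freq="15min"))
funding = pd.DataFrame(
    {"funding_rate_norm": [0.2, -0.1]},
    index=pd.DatetimeIndex([base + pd.Timedelta(milliseconds=3),
                            base + pd.Timedelta(hours=8, milliseconds=3)]),
)
merged_funding = merge_funding_asof(m15, funding)
print(merged_funding["funding_rate_norm"].value_counts())
```

direction="backward" never looks ahead: a bar only sees funding values that were already published at its timestamp.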

oi_change — rolling delta formula:

def fetch_open_interest(symbol="BTCUSDT") -> pd.DataFrame:
    url = "https://fapi.binance.com/futures/data/openInterestHist"
    r = requests.get(url, params={"symbol": symbol, "period": "15m", "limit": 500}, timeout=10)
    df = pd.DataFrame(r.json())
    df["timestamp"] = pd.to_datetime(df["timestamp"], unit="ms", utc=True)
    df["oi"] = df["sumOpenInterest"].astype(float)
    # 4h rolling pct change = 16 × 15min bars
    df["oi_change"] = df["oi"].pct_change(16).clip(-1, 1).fillna(0.0)
    return df[["timestamp", "oi_change"]].set_index("timestamp")

fear_greed_index — low-information warning: 96 identical M15 values per day from this feature → SHAP will be near zero unless daily pivots correlate with session opens. Include on a trial basis; drop if SHAP < 0.001 after retrain.

def fetch_fear_greed() -> pd.DataFrame:
    r = requests.get("https://api.alternative.me/fng/?limit=365", timeout=10)
    df = pd.DataFrame(r.json()["data"])
    df["timestamp"] = pd.to_datetime(df["timestamp"].astype(int), unit="s", utc=True)
    df["fear_greed"] = df["value"].astype(float) / 100.0  # normalize 0–1
    return df[["timestamp", "fear_greed"]].set_index("timestamp")

Integration tasks:

Create ml/data_sources/binance.py with fetch_funding_rate, fetch_open_interest, fetch_fear_greed — each returns a UTC-indexed DataFrame
Add to ml/train.py data loading block: after df_m15 is built, call all three fetchers and merge via left-join + ffill — same pattern as M1 features
Add feature names to model_compat.json["features"] in this order (append at end): funding_rate_norm, oi_change, fear_greed
Run python scripts/retrain.py --ensemble --no-warmstart with the 3 new features; run ml/shap_analysis.py; remove fear_greed from model_compat.json if its mean absolute SHAP < 0.001
Cache fetched data to datasets/funding_rate.csv, datasets/oi_change.csv, datasets/fear_greed.csv — same pattern as training_data_btcusd_m1.csv
OOS gate: ensemble with on-chain features must have Score ≥ current + 0.5; if not, revert — on-chain features add API call latency so they must pay for themselves
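The SHAP pruning rule in the tasks above (drop if mean absolute SHAP < 0.001) can be expressed as a small helper. The array layout is an assumption: rows are samples, axis 1 is features, and an optional third axis holds classes. How ml/shap_analysis.py stores its values is not shown here, and the numbers below are synthetic.

```python
import numpy as np

def low_shap_features(shap_values, feature_names, threshold: float = 0.001) -> list:
    """Return feature names whose mean |SHAP| falls below the threshold.

    shap_values: array of shape (n_samples, n_features) or
                 (n_samples, n_features, n_classes).
    """
    abs_vals = np.abs(np.asarray(shap_values, dtype=float))
    # Average over every axis except the feature axis (axis 1)
    other_axes = tuple(ax for ax in range(abs_vals.ndim) if ax != 1)
    mean_abs = abs_vals.mean(axis=other_axes)
    return [name for name, value in zip(feature_names, mean_abs) if value < threshold]

# Synthetic SHAP values: the third feature contributes almost nothing.
names = ["funding_rate_norm", "oi_change", "fear_greed"]
values = np.array([
    [0.020, -0.100, 0.0005],
    [-0.030, 0.080, -0.0004],
    [0.010, -0.120, 0.0006],
    [-0.020, 0.090, -0.0005],
])
print(low_shap_features(values, names))
```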

17.2 Market regime features

These are derived entirely from existing OHLCV data — no external API dependencies.

volatility_regime — Compute ATR(14) percentile rank over a rolling 500-bar window; encode as continuous 0–1 (not bucketed) to avoid artificial boundaries
w1_ema_bias — Resample M15 to W1 (504 bars); compute (close − EMA(10)) / close; forward-fill
w1_rsi_norm — RSI(14) on W1 bars, normalized to 0–1; forward-fill

Weekly features follow the same pattern as existing H4/D1 resampling in calculate_mtf_features(). Add them to the same function.
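Two of the regime features can be sketched as below. The function names are hypothetical and this is not the code in calculate_mtf_features(). One design choice is worth stating: the weekly value is made available only from the day after the week closes, so an M15 bar never sees a week that is still in progress.

```python
import numpy as np
import pandas as pd

def volatility_regime(atr: pd.Series, window: int = 500) -> pd.Series:
    """Percentile rank (0-1) of the latest ATR within its trailing window."""
    return atr.rolling(window, min_periods=window).apply(
        lambda w: (w <= w[-1]).mean(), raw=True
    )

def w1_ema_bias(close_m15: pd.Series, ema_span: int = 10) -> pd.Series:
    """(weekly close - EMA) / weekly close, forward-filled onto the M15 index."""
    weekly_close = close_m15.resample("W").last()   # bins end on Sunday
    ema = weekly_close.ewm(span=ema_span, adjust=False).mean()
    bias = (weekly_close - ema) / weekly_close
    # A Sunday-labelled bin is complete only once Sunday ends, so publish it on Monday
    bias.index = bias.index + pd.Timedelta(days=1)
    return bias.reindex(close_m15.index, method="ffill").rename("w1_ema_bias")

# Synthetic six-week M15 ramp starting on a Monday (672 bars per 24/7 week).
idx = pd.date_range("2026-01-05", periods=6 * 672, freq="15min", tz="UTC")
close = pd.Series(np.linspace(100.0, 200.0, len(idx)), index=idx)
bias = w1_ema_bias(close)
print(bias.dropna().iloc[[0, -1]])
```

w1_rsi_norm would follow the same resample, shift, and forward-fill pattern with RSI(14) in place of the EMA bias.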

17.3 OHLCV data redundancy pipeline

Reverted. ml/data_sources.py and the multi-source consensus-averaging pipeline were removed in commit e8fd0da. Root cause: averaging OHLCV across Exness and RoboForex at the same timestamps produced artificial prices that didn’t match the live Exness feed, causing a training-live distribution mismatch that degraded fold_3 WF performance (PF=1.34). Training now uses a single source (API_URL in .env) with symbol auto-detection. TRAINING_SOURCE_* and TRAINING_YFINANCE env vars are no longer read by any code. Multi-source redundancy may be revisited in a future phase with a primary-source-wins merge strategy (no averaging) rather than consensus blending.

17.4 Smart Money Concepts (SMC) features

Add institutional order-flow structure as ML features using the smartmoneyconcepts library. SMC theory models how large players move price toward liquidity, making it a natural complement to the existing momentum and volatility features for BTC/USD. Dependency: already installed in examples/python/. Add to requirements.txt for the main project.

pip install smartmoneyconcepts

Lookahead bias — hard rule. The library returns forward-looking columns: ob["MitigatedIndex"], fvg["MitigatedIndex"], bos_choch["BrokenIndex"]. These reference future candles and must never be used as ML features. Only use formation-time data (the candle where the structure was detected). Violating this will silently inflate OOS metrics and cause live failure.

Features to add

All features are computed from M15 OHLCV data only, with no external API dependency.

Order Block (OB) features — zones where institutional orders are resting:

Feature	Formula	Rationale
`ob_bullish_dist`	`(close − nearest_bullish_ob_top) / atr_14`	ATR-normalised distance to nearest demand zone below
`ob_bearish_dist`	`(nearest_bearish_ob_bottom − close) / atr_14`	ATR-normalised distance to nearest supply zone above
`ob_bullish_present`	`1` if any unmitigated bullish OB within 3×ATR else `0`	Binary: price is near a demand zone
`ob_bearish_present`	`1` if any unmitigated bearish OB within 3×ATR else `0`	Binary: price is near a supply zone
`ob_volume_ratio`	`ob_volume / rolling_mean_volume(50)`	Strength of the most recent OB (high volume = stronger zone)

Fair Value Gap (FVG) features — price imbalances that act as magnets:

Feature	Formula	Rationale
`fvg_bull_above`	`1` if unmitigated bullish FVG above current close else `0`	Unfilled imbalance pulling price up
`fvg_bear_below`	`1` if unmitigated bearish FVG below current close else `0`	Unfilled imbalance pulling price down
`fvg_bull_dist`	`(nearest_bull_fvg_bottom − close) / atr_14`	Normalised distance to nearest bullish FVG
`fvg_bear_dist`	`(close − nearest_bear_fvg_top) / atr_14`	Normalised distance to nearest bearish FVG

Structural features (BOS / CHoCH) — trend continuation vs reversal context:

Feature	Formula	Rationale
`recent_bos`	`1` if a BOS occurred in the last 8 bars else `0`	Trend continuation bias — momentum context
`recent_choch`	`1` if a CHoCH occurred in the last 8 bars else `0`	Reversal bias — structural flip context
`structure_bias`	`+1` BOS, `−1` CHoCH, `0` neither (last 16 bars)	Single signed feature combining both signals

Liquidity features — stop-hunt zones where institutional orders trigger:

Feature	Formula	Rationale
`liq_above_dist`	`(nearest_liq_above − close) / atr_14`	Distance to the nearest pool of buy-side stops
`liq_below_dist`	`(close − nearest_liq_below) / atr_14`	Distance to the nearest pool of sell-side stops

Implementation

Add add_smc_features(df) to ml/feature_engineering.py. The function must only use df[:i] at each row — no forward lookahead. Use swing_length=10 as the default (matches existing indicators.py usage).

# ml/feature_engineering.py

def add_smc_features(df: pd.DataFrame, swing_length: int = 10) -> pd.DataFrame:
    """Add SMC-derived features. No forward-looking columns used."""
    from smartmoneyconcepts import smc

    swing_hl = smc.swing_highs_lows(df, swing_length=swing_length)

    # Order Blocks — formation data only, strip MitigatedIndex
    ob = smc.ob(df, swing_hl)[["OB", "Top", "Bottom", "OBVolume"]]

    # Fair Value Gaps — formation data only, strip MitigatedIndex
    fvg = smc.fvg(df)[["FVG", "Top", "Bottom"]]

    # BOS / CHoCH — formation data only, strip BrokenIndex
    bos_choch = smc.bos_choch(df, swing_hl)[["BOS", "CHOCH", "Level"]]

    # Liquidity levels — formation data only
    liq = smc.liquidity(df, swing_hl)[["Liquidity", "Level"]]

    # Fallback when atr_14 is absent: single-bar range as a crude ATR proxy
    atr = df["atr_14"] if "atr_14" in df.columns else df["high"] - df["low"]

    close = df["close"]

    # --- OB features ---
    bull_ob_mask = ob["OB"] == 1
    bear_ob_mask = ob["OB"] == -1

    df["ob_bullish_present"] = 0
    df["ob_bearish_present"] = 0
    df["ob_bullish_dist"] = float("nan")
    df["ob_bearish_dist"] = float("nan")
    df["ob_volume_ratio"] = float("nan")

    vol_mean = df["tick_volume"].rolling(50, min_periods=1).mean()

    for i in range(len(df)):
        c = close.iloc[i]
        a = atr.iloc[i]
        if pd.isna(a) or a == 0:
            continue

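        # Slice by position first, then filter: filtering first would make
        # iloc[:i] count filtered rows and leak order blocks formed after bar i
        past_ob = ob.iloc[:i]
        past_bull = past_ob[past_ob["OB"] == 1]
        past_bear = past_ob[past_ob["OB"] == -1]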

        if not past_bull.empty:
            dists = (c - past_bull["Top"]) / a
            near = dists[(dists >= -3) & (dists <= 3)]
            if not near.empty:
                idx_min = near.abs().idxmin()
                df.at[df.index[i], "ob_bullish_present"] = 1
                df.at[df.index[i], "ob_bullish_dist"] = float(near[idx_min])
                df.at[df.index[i], "ob_volume_ratio"] = (
                    ob.at[idx_min, "OBVolume"] / vol_mean.iloc[i]
                    if vol_mean.iloc[i] > 0 else 0.0
                )

        if not past_bear.empty:
            dists = (past_bear["Bottom"] - c) / a
            near = dists[(dists >= -3) & (dists <= 3)]
            if not near.empty:
                idx_min = near.abs().idxmin()
                df.at[df.index[i], "ob_bearish_present"] = 1
                df.at[df.index[i], "ob_bearish_dist"] = float(near[idx_min])

    # --- FVG features ---
    bull_fvg = fvg[fvg["FVG"] == 1]
    bear_fvg = fvg[fvg["FVG"] == -1]

    df["fvg_bull_above"] = 0
    df["fvg_bear_below"] = 0
    df["fvg_bull_dist"] = float("nan")
    df["fvg_bear_dist"] = float("nan")

    for i in range(len(df)):
        c = close.iloc[i]
        a = atr.iloc[i]
        if pd.isna(a) or a == 0:
            continue
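        # Slice by position first so only gaps formed before bar i are visible
        past_fvg = fvg.iloc[:i]
        past_bull_fvg = past_fvg[past_fvg["FVG"] == 1]
        past_bear_fvg = past_fvg[past_fvg["FVG"] == -1]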
        if not past_bull_fvg.empty:
            above = past_bull_fvg[past_bull_fvg["Bottom"] > c]
            if not above.empty:
                nearest = (above["Bottom"] - c).idxmin()
                df.at[df.index[i], "fvg_bull_above"] = 1
                df.at[df.index[i], "fvg_bull_dist"] = (above.at[nearest, "Bottom"] - c) / a
        if not past_bear_fvg.empty:
            below = past_bear_fvg[past_bear_fvg["Top"] < c]
            if not below.empty:
                nearest = (c - below["Top"]).idxmin()
                df.at[df.index[i], "fvg_bear_below"] = 1
                df.at[df.index[i], "fvg_bear_dist"] = (c - below.at[nearest, "Top"]) / a

    # --- BOS / CHoCH features ---
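    # Convert to 0/1 before rolling: rolling.apply returns NaN for all-NaN windows,
    # which would break the integer cast. to_numpy() assigns by position.
    df["recent_bos"] = (
        bos_choch["BOS"].notna().astype(int).rolling(8, min_periods=1).max().astype(int).to_numpy()
    )
    df["recent_choch"] = (
        bos_choch["CHOCH"].notna().astype(int).rolling(8, min_periods=1).max().astype(int).to_numpy()
    )
    df["structure_bias"] = df["recent_bos"] - df["recent_choch"]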

    # --- Liquidity features ---
    # Keep liq_level row-aligned with df so iloc[:i] is a true "bars before i" slice
    liq_level = liq["Level"].where(liq["Liquidity"].notna())

    df["liq_above_dist"] = float("nan")
    df["liq_below_dist"] = float("nan")

    for i in range(len(df)):
        c = close.iloc[i]
        a = atr.iloc[i]
        if pd.isna(a) or a == 0:
            continue
        past_liq = liq_level.iloc[:i].dropna()
        if past_liq.empty:
            continue
        above = past_liq[past_liq > c]
        below = past_liq[past_liq < c]
        if not above.empty:
            df.at[df.index[i], "liq_above_dist"] = (above.min() - c) / a
        if not below.empty:
            df.at[df.index[i], "liq_below_dist"] = (c - below.max()) / a

    df[["ob_bullish_dist", "ob_bearish_dist", "ob_volume_ratio",
        "fvg_bull_dist", "fvg_bear_dist",
        "liq_above_dist", "liq_below_dist"]] = (
        df[["ob_bullish_dist", "ob_bearish_dist", "ob_volume_ratio",
            "fvg_bull_dist", "fvg_bear_dist",
            "liq_above_dist", "liq_below_dist"]]
        .clip(-10, 10)
        .fillna(0.0)
    )

    return df

The row-by-row loop is O(n²) and will be slow on the full training dataset (300k+ bars). Vectorise using pd.merge_asof or precompute a rolling lookup table once the feature set is validated. Optimise only after SHAP confirms the features are useful.

Feature names to add to `model_compat.json`

Append in this order (after existing features, before any on-chain features from 17.1):

"ob_bullish_present", "ob_bearish_present",
"ob_bullish_dist", "ob_bearish_dist", "ob_volume_ratio",
"fvg_bull_above", "fvg_bear_below",
"fvg_bull_dist", "fvg_bear_dist",
"recent_bos", "recent_choch", "structure_bias",
"liq_above_dist", "liq_below_dist"

That is 14 new features, taking the ensemble from 62 → 76 features.

Integration tasks

Add smartmoneyconcepts to requirements.txt
Implement add_smc_features(df) in ml/feature_engineering.py — no MitigatedIndex / BrokenIndex columns may appear in the feature matrix
Add the 14 feature names to model_compat.json["features"] (append at end)
Run python scripts/retrain.py --ensemble --no-warmstart with all 14 features
Run ml/shap_analysis.py — inspect beeswarm plot; drop any feature with mean absolute SHAP < 0.001 after retrain
If more than 4 features are pruned by SHAP, split into two groups (OB+FVG first, BOS+liquidity second) and add one group at a time
OOS gate: SMC ensemble Score must be ≥ current Score + 0.5 to ship; if not, revert model_compat.json and remove the features
Performance note: validate that add_smc_features completes in < 60s on the full training dataset; if slower, vectorise before merging to main
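The SHAP pruning rule and the "more than 4 pruned" fallback from the tasks above can be sketched as a small helper. This is an illustration only: it assumes the SHAP values arrive as an array shaped (n_samples, n_features) or (n_samples, n_features, n_classes), and the function names are hypothetical rather than taken from ml/shap_analysis.py.

```python
import numpy as np


def prune_by_shap(shap_values, feature_names, threshold=0.001):
    """Split features into (keep, drop) by mean absolute SHAP value."""
    vals = np.abs(np.asarray(shap_values, dtype=float))
    # Average over every axis except the feature axis (axis 1).
    other_axes = tuple(ax for ax in range(vals.ndim) if ax != 1)
    importance = vals.mean(axis=other_axes)
    keep = [f for f, s in zip(feature_names, importance) if s >= threshold]
    drop = [f for f, s in zip(feature_names, importance) if s < threshold]
    return keep, drop


def needs_staged_rollout(drop, max_pruned=4):
    """True when so many features were pruned that groups should be added one at a time."""
    return len(drop) > max_pruned
```

If `needs_staged_rollout` returns True, the plan above applies: retrain with the OB+FVG group first, then add the BOS+liquidity group.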

Expected impact

OB / FVG features — highest potential. Price returning to an OB or being drawn toward an FVG is a well-documented BTC pattern. Should show non-zero SHAP especially on ob_bullish_dist and fvg_bull_above.
BOS / CHoCH features — structural regime context. structure_bias directly encodes trend vs reversal, complementing existing h4_ema_bias and d1_trend.
Liquidity features — encodes stop-hunt dynamics. Less certain; BTC liquidity grabs are real but noisier at M15 than on higher timeframes.

Phase 18 — Model architecture

18.1 Stacked meta-learner

Replace majority-vote with a learned combiner. The meta-learner trains on the base models’ class probabilities (out-of-fold) and learns optimal weights per class.

Data leakage risk: Using StratifiedKFold on time-series data leaks future bars into past training folds. Always use TimeSeriesSplit which enforces chronological ordering. This is a hard rule — no exceptions.

This is the highest overfit risk item in the entire roadmap. The meta-layer sees the validation set probability distributions and can memorize them. Strict OOS gate is mandatory before deploying.

OOF stacking algorithm (time-series safe):

import numpy as np
from sklearn.base import clone
from sklearn.model_selection import TimeSeriesSplit  # NOT StratifiedKFold

def generate_oof_stacks(models: list, X: np.ndarray, y: np.ndarray,
                        n_splits: int = 5) -> tuple[np.ndarray, np.ndarray]:
    """
    Returns X_meta (N_oof, K*3), y_meta (N_oof,).
    Uses walk-forward folds: each fold trains on [0..t], predicts [t..t+step].
    First fold's training samples are discarded (no OOF predictions for them).
    """
    tscv = TimeSeriesSplit(n_splits=n_splits)
    K = len(models)         # number of base models (3 currently, 5+ in Phase 22)
    X_meta = np.zeros((len(X), K * 3))
    mask = np.zeros(len(X), dtype=bool)

    for train_idx, val_idx in tscv.split(X):
        for k, model in enumerate(models):
            m_clone = clone(model).fit(X[train_idx], y[train_idx])
            probs = m_clone.predict_proba(X[val_idx])  # (n_val, 3)
            X_meta[val_idx, k*3:(k+1)*3] = probs
        mask[val_idx] = True

    return X_meta[mask], y[mask]   # drop rows with no OOF prediction (first fold train set)

Meta-learner choices (ordered by overfit risk, lowest first):

Option	Model	Overfit risk	Use when
A	`LogisticRegression(C=0.1, max_iter=1000)`	Very low	≤ 3 base models
B	`LGBMClassifier(max_depth=3, n_estimators=100)`	Low	4–6 base models
C	`MLPClassifier(hidden_layer_sizes=(32,), max_iter=500)`	Medium	Avoid for now

Start with Option A. Graduate to Option B only when Phase 22 adds 2+ neural models.

Calibration of meta-learner output:

import numpy as np
from sklearn.isotonic import IsotonicRegression

# After training meta-learner, calibrate on a held-out fold (never on test set)
meta_probs_cal = meta_model.predict_proba(X_cal_meta)  # cal = held-out fold
calibrators = []
for c in range(3):
    ir = IsotonicRegression(out_of_bounds="clip")
    ir.fit(meta_probs_cal[:, c], (y_cal == c).astype(float))
    calibrators.append(ir)

def calibrate(raw_probs: np.ndarray) -> np.ndarray:
    cal = np.column_stack([calibrators[c].transform(raw_probs[:, c]) for c in range(3)])
    row_sums = cal.sum(axis=1, keepdims=True)
    # Renormalize rows; if every calibrated class probability is 0, keep the raw row
    return np.where(row_sums > 0, cal / np.where(row_sums > 0, row_sums, 1.0), raw_probs)

Tasks:

Implement generate_oof_stacks in ml/meta_learner.py using TimeSeriesSplit(n_splits=5) — not StratifiedKFold
Start with LogisticRegression(C=0.1) meta-learner (Option A); switch to LGB after Phase 22 adds neural models
Add --meta flag to scripts/retrain.py that triggers OOF generation + meta-learner training after base model training
Apply isotonic calibration on a chronological held-out fold (last 10% of training data); save calibrators to models/meta_calibrators.pkl
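OOS gate, recalibrated targets (the quoted baseline is the Phase 17.2 result: WR=67.6%, PF=2.83, Score=35.13; "current" in the conditions below means the active model when the gate is run):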
- Keep if OOS Score ≥ current Score + 1.0
- Keep if OOS WR ≥ 60% (not 75% — that was unreachable; WR > 75% would indicate overfit, not improvement)
- Keep if OOS MaxDD ≤ current MaxDD + 2%
- If gate fails: revert to majority-vote, log result to logs/meta_learner_eval.log, do not re-attempt until base models are retrained
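A minimal sketch of the gate as a single check. It assumes all three "Keep if" conditions must hold together, which the list above does not state explicitly; `OosResult` and `meta_gate_passes` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OosResult:
    score: float
    win_rate: float  # fraction, e.g. 0.676 for 67.6%
    max_dd: float    # fraction, e.g. 0.073 for 7.3%


def meta_gate_passes(candidate: OosResult, baseline: OosResult,
                     score_margin: float = 1.0, min_wr: float = 0.60,
                     dd_slack: float = 0.02) -> bool:
    """True only if the candidate clears the Score, WR and MaxDD conditions together."""
    return (
        candidate.score >= baseline.score + score_margin
        and candidate.win_rate >= min_wr
        and candidate.max_dd <= baseline.max_dd + dd_slack
    )
```

When the function returns False, the last list item applies: revert to majority-vote and log the result.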

18.2 Calibrated probability outputs

The signal model probabilities are not inherently calibrated. A calibrated model's 70%-confidence predictions are right about 70% of the time. Calibration improves confidence-based downstream decisions such as ml_sltp and Kelly sizing.

calibrate_models() added to ml/ensemble_trainer.py — applies _IsotonicCalibratedClassifier (sklearn-version-agnostic wrapper) to each base model after training; saves *_calibrated.pkl alongside raw pkl
_load_model() in ml/ensemble_predictor.py updated: calibrated pkl takes priority over ONNX (calibration cannot be applied to ONNX sessions), ONNX is second, raw pkl is fallback
Verified calibration on val set via ml/calibration_check.py: all three models PASS (RF MAE=0.012, XGB MAE=0.018, LGB MAE=0.014, ensemble MAE=0.030 — well within 0.05 threshold)
Completed before enabling ml_sltp (Phase 21.1) — calibrated probabilities ready
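For reference, one common way to compute a calibration MAE of the kind quoted above is to bin the predicted probabilities and average the gap between mean confidence and observed hit rate per bin. The sketch below is an assumption about the metric, not a copy of ml/calibration_check.py, and `reliability_mae` is a hypothetical name.

```python
import numpy as np


def reliability_mae(probs, hits, n_bins=10):
    """Mean |mean predicted probability - observed frequency| over non-empty bins."""
    probs = np.asarray(probs, dtype=float)
    hits = np.asarray(hits, dtype=float)  # 1.0 where the predicted class was correct
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.clip(np.digitize(probs, edges[1:-1]), 0, n_bins - 1)
    gaps = []
    for b in range(n_bins):
        in_bin = bin_ids == b
        if in_bin.any():
            gaps.append(abs(probs[in_bin].mean() - hits[in_bin].mean()))
    return float(np.mean(gaps))
```

A value near 0 means confidence and hit rate agree in every populated bin; the 0.05 threshold mentioned above would be applied to this number.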

18.3 Regime-switching model

Train two separate signal model variants: one on trending data (ADX > 0.25) and one on ranging data (ADX ≤ 0.25). At inference, the active ADX regime routes to the correct model.

This doubles the number of models to maintain. Only implement if walk-forward OOS shows that the single model performs significantly worse in one regime. Run the diagnostic below before building anything.

Prerequisite diagnostic — measure single-model regime performance:

# Run this before deciding to build regime-switching models
import pandas as pd

df_oos = pd.read_csv("datasets/oos_signals.csv")  # generated by backtest.py

# Split by regime
trending = df_oos[df_oos["adx_14"] > 0.25]
ranging  = df_oos[df_oos["adx_14"] <= 0.25]

# Check performance gap: if both regimes have WR > 55%, single model is fine
for name, subset in [("trending", trending), ("ranging", ranging)]:
    wr = (subset["pnl"] > 0).mean()
    gross_win = subset.loc[subset["pnl"] > 0, "pnl"].sum()
    gross_loss = abs(subset.loc[subset["pnl"] < 0, "pnl"].sum())
    pf = gross_win / gross_loss if gross_loss > 0 else float("inf")  # no losing trades
    print(f"{name}: n={len(subset)}, WR={wr:.1%}, PF={pf:.2f}")

Only proceed if at least one regime has WR < 55% and each segment has ≥ 20,000 training samples.

Hysteresis rule for regime transitions (prevents model-flip thrashing): ADX fluctuates around the threshold, so without hysteresis the model can switch dozens of times in a session. Apply a buffer zone:

from collections import deque

# Requires 3 consecutive bars in the new regime before switching model
TRENDING_THRESHOLD = 0.28   # Enter trending above this
RANGING_THRESHOLD  = 0.22   # Enter ranging below this (gap = hysteresis band)

def get_active_model(adx_history: deque, current_model: str) -> str:
    recent = list(adx_history)[-3:]   # last 3 bars
    if len(recent) < 3:               # not enough history yet: keep the current model
        return current_model
    if all(a > TRENDING_THRESHOLD for a in recent):  return "trending"
    if all(a < RANGING_THRESHOLD  for a in recent):  return "ranging"
    return current_model   # stay in current regime (hysteresis)

Gate — recalibrated (Score context: current single model = 35.13):

Trending-regime Score ≥ 17.0 (the full-model Score spread over half the trades is 35.13 / 2 ≈ 17.6; 17.0 is a conservative floor just under that, given that trending is the easier regime)
Ranging-regime Score ≥ 10.0 (ranging is harder; acceptable if at least net-positive)
Combined (blended) Score ≥ current 35.13 + 2.0 (must beat single model or not worth the complexity)

Tasks:

Run regime diagnostic above on the latest OOS dataset before doing anything else; log results to logs/regime_diagnostic.txt
If diagnostic shows no regime gap (both WR > 55%): skip this phase, mark as deferred
If gap found: segment datasets/training_data_btcusd.csv by adx_14; verify ≥ 20k rows per segment
Train two ensembles: python scripts/retrain.py --ensemble --regime trending and --regime ranging; each saves to models/signal_trending/ and models/signal_ranging/
Add RegimeSwitchPredictor to ml/ensemble_predictor.py: maintains adx_history = deque(maxlen=3); calls get_active_model() before each inference; loads both model sets into memory at startup
OOS backtest: run backtest.py --regime-switch which routes each bar to the correct model; compare blended Score to single-model Score
Gate as above; if blended Score < current + 2.0: abandon regime switching and document
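The routing half of the RegimeSwitchPredictor task can be sketched independently of model loading. The class below is a hypothetical `RegimeRouter` that only tracks which model set should be active; it reuses the thresholds and the 3-bar confirmation rule from the hysteresis snippet above, and the ADX trace is made-up illustration data.

```python
from collections import deque

TRENDING_THRESHOLD = 0.28
RANGING_THRESHOLD = 0.22


class RegimeRouter:
    """Tracks the active regime with a hysteresis band and N-bar confirmation."""

    def __init__(self, initial: str = "ranging", confirm_bars: int = 3):
        self.active = initial
        self.adx_history = deque(maxlen=confirm_bars)

    def update(self, adx: float) -> str:
        self.adx_history.append(adx)
        if len(self.adx_history) == self.adx_history.maxlen:
            if all(a > TRENDING_THRESHOLD for a in self.adx_history):
                self.active = "trending"
            elif all(a < RANGING_THRESHOLD for a in self.adx_history):
                self.active = "ranging"
        return self.active


# ADX hovers around 0.25, trends, then fades; only two switches should occur.
trace = [0.24, 0.26, 0.24, 0.29, 0.30, 0.31, 0.27, 0.21, 0.20, 0.19]
router = RegimeRouter()
regimes = [router.update(a) for a in trace]
switches = sum(1 for prev, cur in zip(regimes, regimes[1:]) if prev != cur)
```

A predictor built on top of this would call `update()` once per bar and dispatch to the ensemble stored under the returned key.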

18.4 Multi-instrument expansion

Each instrument gets its own dedicated model stack — never shared with BTCUSD. Architecture per instrument:

models/
  BTCUSD/              # existing
    signal_rf.onnx
    signal_xgb.onnx
    signal_lgb.onnx
    position_rf.onnx
    ...
    ensemble_scaler.pkl
    model_compat.json
  XAUUSD/              # new
    signal_rf.onnx
    ...
    ensemble_scaler.pkl   # SEPARATE scaler — gold ATR is 100× smaller than BTC
    model_compat.json     # may differ in feature list (e.g. no dist_to_round_number for gold)

datasets/
  training_data_btcusd.csv
  training_data_xauusd.csv   # new — fetched from MT5 XAUUSD M15
  training_data_eurusd.csv   # new

ml_config.json       # BTCUSD defaults
ml_config_xauusd.json  # gold-specific overrides
ml_config_eurusd.json  # forex-specific overrides

XAUUSD labeling differences: ATR-aware label params must be rescaled. Gold ATR is ~$10–30 vs BTC ATR ~$500–2000. The existing SL=0.8×ATR, TP=1.2×ATR proportions are valid, but the min_atr filter (currently 15 for BTC) needs a per-symbol value:

// ml_config_xauusd.json
{
  "labeling": {
    "atr_sl_multiplier": 0.8,
    "atr_tp_multiplier": 1.2,
    "min_atr": 0.5,         // gold: $0.50 minimum ATR (not $15 like BTC)
    "lookahead_candles": 48  // same
  },
  "training": {
    "min_samples": 50000     // gold has fewer liquid M15 bars; lower threshold
  }
}
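A sketch of how a per-symbol file could be layered over the BTC defaults. It assumes "overrides" means a recursive merge on top of ml_config.json, which this document implies but does not spell out; `deep_merge` and `load_symbol_config` are hypothetical helpers. Note that Python's `json` module rejects `//` comments, so the file on disk would need them removed.

```python
import json
from pathlib import Path


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of base with override applied recursively (override wins)."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_symbol_config(symbol: str, config_dir: str = ".") -> dict:
    """Load ml_config.json, then apply ml_config_<symbol>.json if it exists."""
    base = json.loads(Path(config_dir, "ml_config.json").read_text())
    override_path = Path(config_dir, f"ml_config_{symbol.lower()}.json")
    if override_path.exists():
        return deep_merge(base, json.loads(override_path.read_text()))
    return base
```

With this layout the gold file only needs the keys that differ, and any parameter not listed keeps its BTC default.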

EURUSD labeling differences: Pip value for EURUSD standard lot = $10/pip. Convert P&L to USD in backtest.py:

# backtest.py — pip_value must be symbol-aware
PIP_VALUE = {
    "BTCUSD": 100.0,   # USC/lot/point for cent account
    "XAUUSD": 100.0,   # $1/lot/point × 100 for cent
    "EURUSD": 10.0,    # $10/pip/lot standard; 0.1/pip/lot for micro
}

Tasks:

Add --symbol flag to ml/train.py; when set, load ml_config_{symbol.lower()}.json instead of default; save models to models/{SYMBOL}/
Add --symbol flag to backtest.py; load correct scaler and model directory; use symbol-specific pip_value in P&L calculation
Create ml_config_xauusd.json with XAUUSD-specific min_atr, labeling params; keep all other params as BTC defaults initially
Fetch 2 years of XAUUSD M15 from MT5 API: python ml/train.py --symbol XAUUSD --refresh — saves to datasets/training_data_xauusd.csv
Train XAUUSD model stack: python scripts/retrain.py --symbol XAUUSD --ensemble --position --no-warmstart
OOS gate for XAUUSD: PF > 2.0 on 60-day OOS window; MaxDD < 15%
EURUSD: defer until XAUUSD is live-validated; same pipeline applies

Never share model files or scalers across instruments. ensemble_scaler.pkl is fit on each symbol’s feature distribution independently. Using BTC scaler on gold data will produce garbage predictions.

Phase 19 — Infrastructure & reliability

19.1 Live trade dashboard

Superseded by Phase 23 (JARVIS Dashboard) — the Next.js WebSocket dashboard covers all use cases planned here with better UX. This Streamlit version is now a lightweight fallback for quick server-side monitoring without the full frontend stack.

Dependency: pip install "streamlit>=1.35" "supabase>=2.5" (quote the version specifiers so the shell does not treat >= as a redirect).

The Streamlit fallback is useful when SSH-ed into the trading VM and needing a quick equity snapshot without opening the browser dashboard.

Build scripts/dashboard_live.py:

import os

import pandas as pd
import streamlit as st
from supabase import create_client

st.set_page_config(page_title="NOVOSKY Live", layout="wide", page_icon="📈")
sb = create_client(os.getenv("SUPABASE_URL"), os.getenv("SUPABASE_KEY"))

@st.cache_data(ttl=30)
def get_equity():
    r = sb.table("account_snapshots").select("equity,created_at").order("created_at", desc=True).limit(500).execute()
    return pd.DataFrame(r.data)

col1, col2 = st.columns([2, 1])
with col1:
    st.line_chart(get_equity().set_index("created_at")["equity"])
with col2:
    st.metric("Latest Equity", f"${get_equity().iloc[0]['equity']:.0f}")

Views: equity area chart (500 snapshots), open position P&L, last 20 signals table with confidence color-coding
Deploy via PM2 port 8501: streamlit run scripts/dashboard_live.py --server.port 8501 --server.headless true; expose under terminal-rf1.novosky.app/monitor via Caddy

19.2 API failover

Complete — 2026-04-25

_api_fail_count and _api_paused globals track consecutive MT5 API failures in trading/bot.py
After 3 consecutive failures: Telegram alert [API UNREACHABLE] fired, _api_paused = True blocks new entries
Resumes automatically when API responds; _api_fail_count cleared, _api_paused = False
Guard wraps the _get_rates() call in the main loop

19.3 Graceful shutdown improvements

Complete — 2026-04-25

_shutdown_requested flag replaces sys.exit(0) in the SIGTERM handler
Main loop top: if flag is set, polls open positions and exits cleanly once count reaches 0
Entry gate blocks new trades while shutdown is pending
Logs [SHUTDOWN] with position count on clean exit

19.4 Config hot-reload

Safe vs unsafe keys — not all config changes can be applied without restart:

Key	Hot-reloadable?	Reason
`risk_percent`	✅ Yes	Only affects next lot sizing call
`max_daily_loss_pct`	✅ Yes	Guard checked every cycle
`min_confidence`	✅ Yes	Filter applied at signal gate
`adx_regime_filter.*`	✅ Yes	Checked at signal gate
`circuit_breaker.*`	✅ Yes	State counter reset is safe
`model_paths.*`	❌ No	Requires model reload → restart
`symbol`	❌ No	Would orphan tracked positions
`api_base_url`	❌ No	Active connections would break
`kelly_lot_sizing.*`	⚠️ Careful	Only safe if no open position

Implementation — mtime polling:

# trading/bot.py: add to main loop top
import json
import os

_config_mtime: float = 0.0
_SAFE_HOT_KEYS = {"risk_percent","max_daily_loss_pct","min_confidence","adx_regime_filter","circuit_breaker"}

def _maybe_reload_config() -> None:
    global _config_mtime, config
    current_mtime = os.path.getmtime("config.json")
    if current_mtime <= _config_mtime:
        return
    try:
        with open("config.json") as fh:
            new_cfg = json.load(fh)
    except json.JSONDecodeError:
        # File may be mid-write; keep the running config and retry next cycle
        logger.warning("[CONFIG RELOAD] config.json is not valid JSON; keeping current values")
        return
    changed = {}
    for key in _SAFE_HOT_KEYS:
        # A key missing from the new file keeps its current value instead of raising KeyError
        if key in new_cfg and new_cfg[key] != config.get(key):
            changed[key] = {"old": config.get(key), "new": new_cfg[key]}
            config[key] = new_cfg[key]
    if changed:
        logger.info(f"[CONFIG RELOAD] {changed}")
        _notify_telegram(f"⚙️ Config reloaded: {list(changed.keys())}")
    _config_mtime = current_mtime

Tasks:

Add _maybe_reload_config() call at top of main loop (before signal gate) — call every cycle (M15 cadence means 15-min max lag is acceptable; no need for a background thread)
Define _SAFE_HOT_KEYS set as shown above; never iterate all config keys (would silently apply unsafe changes)
Log changed keys with old/new values (not just key names) — makes audit trail useful
Telegram notification on reload: send list of changed keys so operator knows the change took effect
Unit test: write a temp config.json with modified risk_percent, call _maybe_reload_config(), assert config["risk_percent"] updated and _config_mtime advanced
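The unit test from the last task could look roughly like this. To stay self-contained it tests a hypothetical pure function, `reload_safe_keys`, that takes the path, the config dict and the last mtime as arguments; the real `_maybe_reload_config()` uses module globals, so the test in the repo would need to patch those instead.

```python
import json
import os
import tempfile

SAFE_HOT_KEYS = {"risk_percent", "max_daily_loss_pct", "min_confidence"}


def reload_safe_keys(path, config, last_mtime):
    """Apply changed safe keys from the file at `path`; return (changed, new_mtime)."""
    mtime = os.path.getmtime(path)
    if mtime <= last_mtime:
        return {}, last_mtime
    with open(path) as fh:
        new_cfg = json.load(fh)
    changed = {}
    for key in SAFE_HOT_KEYS:
        if key in new_cfg and new_cfg[key] != config.get(key):
            changed[key] = {"old": config.get(key), "new": new_cfg[key]}
            config[key] = new_cfg[key]
    return changed, mtime


def test_reload_updates_only_safe_keys():
    config = {"risk_percent": 2.0, "symbol": "BTCUSD"}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "config.json")
        with open(path, "w") as fh:
            json.dump({"risk_percent": 1.0, "symbol": "XAUUSD"}, fh)
        changed, mtime = reload_safe_keys(path, config, last_mtime=0.0)
    assert config["risk_percent"] == 1.0      # safe key applied
    assert config["symbol"] == "BTCUSD"       # unsafe key ignored
    assert list(changed) == ["risk_percent"]
    assert mtime > 0.0                        # mtime advanced
```

The second assertion is the important one: it checks that an unsafe key edited in the same file is not applied silently.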

Phase 22 — Advanced Ensemble Architecture (XGBoost · RF · FT-Transformer · TFT · LSTM)

The current RF + XGB + LGB majority-vote ensemble leaves accuracy on the table because all three base learners are decision-tree ensembles (one bagged, two gradient-boosted): they share the same inductive bias and make correlated errors. Adding one neural-attention model and one sequence model provides genuine ensemble diversity (target pairwise disagreement rate 0.35–0.50), which lowers the ensemble's error by decorrelating the members' mistakes, independent of individual model quality.

Ensemble error decomposition (squared error of an equally weighted average of M models; each term is averaged over the members, the covariance over pairs i ≠ j):

Ensemble Error = Bias² + (1/M) × Variance + (1 − 1/M) × Covariance(model_i, model_j)

Adding a model that errs on different samples (low covariance) reduces total error even if the new model is weaker individually.
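A worked illustration of why decorrelation matters. For an equally weighted average of M models whose errors share variance σ² and pairwise correlation ρ, the variance of the averaged error is σ²(1/M + (1 − 1/M)ρ). The equal-variance, equal-correlation setup is a simplifying assumption for the example, and the numbers are illustrative rather than measured.

```python
def ensemble_error_variance(sigma2: float, rho: float, n_models: int) -> float:
    """Variance of the mean of n_models errors with common variance and correlation."""
    return sigma2 * (1.0 / n_models + (1.0 - 1.0 / n_models) * rho)


# Three highly correlated models vs. five moderately correlated ones.
three_correlated = ensemble_error_variance(sigma2=1.0, rho=0.8, n_models=3)
five_diverse = ensemble_error_variance(sigma2=1.0, rho=0.5, n_models=5)
```

With ρ = 1 the ensemble is no better than a single model, and with ρ = 0 the variance falls as 1/M, which is why the disagreement target above matters more than adding raw model count.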

Build one model at a time. Retrain after each addition. Gate on OOS Score ≥ current before keeping. Adding all five models at once makes root-cause analysis impossible.

Python dependencies to install before starting this phase:

pip install "torch>=2.1" torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install "skl2onnx>=1.16" "onnxconverter-common>=1.13"
pip install "pytorch-forecasting>=1.0"  # TFT reference implementation
pip install "mapie>=0.8"               # Conformal prediction / quantile intervals
pip install "xgboost>=2.0"             # DART mode requires >=1.7; 2.x preferred

22.1 XGBoost — DART mode + monotonic constraints + Optuna search space

Algorithm — DART tree dropping: At each boosting round, instead of using all t trees built so far, DART randomly drops a subset D ⊆ {1…t} and trains the new tree to compensate for the removed ones:

ŷ_i = Σ_{k ∉ D} f_k(x_i)           # prediction without dropped trees
new tree f_{t+1} fits residuals of ŷ_i
after the round: the dropped trees are rescaled by |D| / (|D| + 1) and f_{t+1} is added with weight 1 / (|D| + 1)
# XGBoost normalize_type="tree" uses |D| + learning_rate in both denominators

rate_drop = probability each tree is included in D. skip_drop = probability the entire drop is skipped for that round (pure GBM step).

Monotonic constraint math: For feature j with constraint c_j ∈ {-1, 0, +1}, XGBoost enforces:

if c_j = +1:  for all splits on feature j, left_child_value ≤ right_child_value
if c_j = -1:  left_child_value ≥ right_child_value
if c_j =  0:  unconstrained

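Enforced during tree construction: a candidate split on feature j that would violate the required ordering is rejected, and the two child weights then bound what their subtrees may predict (the midpoint between them acts as the upper bound on one side and the lower bound on the other), so deeper splits cannot break the constraint.

Interaction constraints: Define which feature groups may share a split path. Once a path has split on a feature, XGBoost only considers further splits on features that share a group with it, so feature i and feature j from different groups never appear on the same root-to-leaf path: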

"interaction_constraints": [
  [0, 1, 2, 3, 4],          // Group 0: momentum (RSI, MACD, rsi_slope, etc.)
  [5, 6, 7, 8, 9],          // Group 1: volatility (ATR, BB, ADX)
  [10, 11, 12, 13, 14, 15], // Group 2: structure (EMA, price_vs_ema, d1_trend)
  [16, 17, 18, 19, 20]      // Group 3: session/time (flags, sin/cos encodings)
]

Tasks:

Switch booster in ml_config.json → xgb_params:

"booster": "dart",
"rate_drop": 0.10,
"skip_drop": 0.50,
"normalize_type": "tree",
"learning_rate": 0.05,
"n_estimators": 400,
"max_delta_step": 1

In ml/train.py around the XGBClassifier constructor, build monotone_constraints tuple from model_compat.json["features"] order — map feature names to constraint values:

MONOTONE_MAP = {
    "atr_14": 1, "adx_14": 1, "bb_width": 1,
    "volume_ratio": 1, "atr_percentile": 1,
}  # all others default to 0
constraints = tuple(MONOTONE_MAP.get(f, 0) for f in feature_names)

Build interaction_constraints list from 4 feature cluster groups; attach to XGB params before fit

Add DART-specific Optuna search space in ml/tune.py (new branch under if booster == "dart"):

"rate_drop":  trial.suggest_float("rate_drop", 0.05, 0.30),
"skip_drop":  trial.suggest_float("skip_drop", 0.30, 0.70),
"normalize_type": trial.suggest_categorical("normalize_type", ["tree", "forest"]),
"max_depth":  trial.suggest_int("max_depth", 4, 8),   # shallower than gbtree
"subsample":  trial.suggest_float("subsample", 0.6, 0.9),
"colsample_bytree": trial.suggest_float("colsample_bytree", 0.5, 0.8),

Early stopping is unreliable with DART because random dropout makes the per-round validation metric noisy. Use fixed n_estimators=400 and remove any early_stopping_rounds from the DART fit call in ml/train.py
Run python scripts/retrain.py --ensemble --no-warmstart to force a clean DART retrain
OOS gate: keep if Score ≥ current GBTREE baseline − 0.5 (accept slight score trade-off for lower variance)
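The two constraint-building tasks above both reduce to mapping feature names onto their positions in model_compat.json["features"]. A minimal sketch, with a hypothetical helper name and made-up group names; features that belong to no group are returned separately so they can be reviewed instead of being dropped silently.

```python
def build_xgb_constraints(feature_names, monotone_map, groups):
    """Return (monotone_constraints, interaction_constraints, unassigned_features)."""
    index = {name: i for i, name in enumerate(feature_names)}
    # One entry per feature, in feature order; 0 means unconstrained.
    monotone = tuple(monotone_map.get(name, 0) for name in feature_names)
    # Each group becomes a list of column indices; unknown names are skipped.
    interaction = [
        [index[name] for name in names if name in index]
        for names in groups.values()
    ]
    interaction = [group for group in interaction if group]
    grouped = {i for group in interaction for i in group}
    unassigned = [name for name in feature_names if index[name] not in grouped]
    return monotone, interaction, unassigned
```

The first two return values are meant for the `monotone_constraints` and `interaction_constraints` parameters of XGBClassifier. Building them from names instead of hard-coded indices keeps them valid when the feature order changes.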

22.2 Random Forest — ExtraTrees + Quantile intervals for Kelly

Algorithm — ExtraTrees split selection: Standard RF: at each node, evaluate max_features candidate features and pick the split minimizing Gini. ExtraTrees: pick a random threshold from the feature’s observed range — no exhaustive search:

For each candidate feature j:
  threshold_j ~ Uniform(min(X[:,j]), max(X[:,j]))  # random, not optimal
Pick j* = argmin Gini over random (j, threshold_j) pairs

Result: higher bias per tree but lower variance across the ensemble, which helps on noisy M15 features.

Algorithm (quantile intervals via leaf distributions): Standard RF predicts ŷ = (1/T) Σ_t leaf_mean_t(x). Quantile RF instead collects all training labels in each matched leaf across all T trees, forming an empirical distribution, and returns its percentiles:

S(x) = ∪_{t=1}^{T} { y_i : x_i ∈ leaf_t(x) }   # all leaf-matched labels
P_q(x) = q-th percentile of S(x)

Kelly fraction adjustment using interval width:

raw_kelly = p_win − p_loss / win_loss_ratio        # standard Kelly: f* = p − q/b
interval_width = P_95(x) − P_05(x)                 # normalized 0–1
adjusted_kelly = raw_kelly × (1 − interval_width)  # tighter interval → larger position
effective_risk  = base_risk_pct × min(max(adjusted_kelly, 0), max_kelly_fraction)  # never size on a negative edge
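The sizing rule can be written as one small function. The sketch uses the standard Kelly fraction f* = p − q/b (with b the win/loss ratio), clamps the interval width to [0, 1] and floors the result at zero; the function and parameter names are illustrative, not the ones used in trading/bot.py.

```python
def adjusted_kelly_risk(p_win: float, win_loss_ratio: float, interval_width: float,
                        base_risk_pct: float, max_kelly_fraction: float) -> float:
    """Risk percent after shrinking Kelly by the prediction-interval width."""
    p_loss = 1.0 - p_win
    raw_kelly = p_win - p_loss / win_loss_ratio       # f* = p - q/b
    width = min(max(interval_width, 0.0), 1.0)        # keep the shrink factor in [0, 1]
    adjusted = raw_kelly * (1.0 - width)              # wider interval -> smaller size
    return base_risk_pct * min(max(adjusted, 0.0), max_kelly_fraction)
```

For example, with p_win=0.6, a 1:1 win/loss ratio and an interval width of 0.5, the raw Kelly fraction is 0.2 and the adjusted fraction is 0.1, so a 2.0% base risk becomes 0.2%.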

Tasks:

Add ExtraTreesClassifier to ml/ensemble_trainer.py — insert after the RF definition:

from sklearn.ensemble import ExtraTreesClassifier
extra_trees = ExtraTreesClassifier(
    n_estimators=350,
    max_depth=None,          # full depth; randomness controls variance
    min_samples_split=8,
    max_features='sqrt',
    bootstrap=False,         # ExtraTrees convention
    class_weight='balanced',
    random_state=42,
    n_jobs=-1,
)

Export ExtraTrees to ONNX using the same skl2onnx pipeline in ml/onnx_export.py; output shape [1, 3] float32 probabilities
Add "extra_trees" key to model_compat.json["models"] list; update ensemble_predictor.py load path
Create ml/quantile_predictor.py:
- Class QuantileRFPredictor wraps a trained RandomForestClassifier
- Method predict_interval(X_row) → iterates all tree leaves matched by X_row, pools their training labels, returns (p05, p50, p95) for each class
- Inference is O(T × leaf_size) — keep T ≤ 200 for <50 ms latency at M15 frequency
Wire into trading/bot.py Kelly sizing block (currently around line 1220): fetch interval_width from QuantileRFPredictor; apply adjusted Kelly formula above; log both raw and adjusted Kelly to ml_performance.csv
Add "quantile_rf": {"enabled": false, "p_low": 0.05, "p_high": 0.95} config block to config.json

Optuna search space additions in ml/tune.py for ExtraTrees:

"et_n_estimators": trial.suggest_int("et_n_estimators", 200, 500),
"et_min_samples_split": trial.suggest_int("et_min_samples_split", 4, 16),
"et_max_features": trial.suggest_categorical("et_max_features", ["sqrt", "log2", 0.5]),

OOS sweep: compare 3-model majority-vote vs 4-model (RF+XGB+LGB+ExtraTrees) majority-vote; require Score ≥ current + 0.3

22.3 FT-Transformer (Feature Tokenizer + Transformer)

Architecture — Feature Tokenizer: Each of the 62 features is independently projected from scalar (batch, 1) → embedding vector (batch, d):

token_j = W_j × x_j + b_j + e_j      # W_j ∈ ℝ^d, b_j ∈ ℝ^d, e_j = feature-index embedding

A learnable [CLS] token is prepended, giving a 63-token sequence. Multi-head self-attention then computes pairwise interaction scores between every pair of feature tokens:

Attention(Q, K, V) = softmax( QK^T / √d_k ) V
Q = X × W_q,  K = X × W_k,  V = X × W_v      # X = token sequence; three separate learned projections
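To make the shapes concrete, here is a single-head version of the attention step in plain NumPy. It is a sketch with random weights and a 6-token toy sequence, not the PyTorch module used for training.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a (n_tokens, d) sequence."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (n_tokens, n_tokens) pairwise scores
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
n_tokens, d = 6, 8                           # e.g. [CLS] plus 5 feature tokens
tokens = rng.normal(size=(n_tokens, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
out, weights = self_attention(tokens, w_q, w_k, w_v)
```

Row 0 of `weights` is the [CLS] token's attention over all tokens, which is the row the classification head effectively reads from.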

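The [CLS] token aggregates cross-feature information; its output is fed to the classification head.

Why FT-Transformer > TabTransformer for NOVOSKY: TabTransformer applies attention only to categorical features (9 out of 59). FT-Transformer applies attention to all 59, making it better suited since 50+ features are numerical time-series derivatives. Note that these counts and the shapes below assume 59 features, while the tokenizer description above uses 62; take n_features from model_compat.json["features"] rather than hard-coding either.

Data requirements: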

Input dtype:  float32, shape (batch, 59) — same StandardScaler as RF/XGB
Output dtype: float32, shape (batch, 3) — raw logits (apply softmax at inference)
Training set: same X_train, y_train from ml/train.py
Min samples for stable attention: ~50k (you have ~135k ✓)

Tasks:

Create ml/models/ft_transformer.py — pure torch.nn.Module:

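import torch
import torch.nn as nn

class FTTransformer(nn.Module):
    def __init__(self, n_features=59, d_model=64, n_heads=8, n_layers=6,
                 ffn_dim=256, dropout=0.1):
        super().__init__()  # must run before any submodule is assigned
        # Feature Tokenizer: one Linear(1 → d_model) per feature
        self.tokenizers = nn.ModuleList([nn.Linear(1, d_model) for _ in range(n_features)])
        # Feature-index positional embedding (not time-positional)
        self.feat_index_emb = nn.Embedding(n_features, d_model)
        # CLS token
        self.cls_token = nn.Parameter(torch.randn(1, 1, d_model))
        # Transformer encoder
        encoder_layer = nn.TransformerEncoderLayer(
            d_model, n_heads, ffn_dim, dropout, batch_first=True, norm_first=True
        )
        self.transformer = nn.TransformerEncoder(encoder_layer, n_layers)
        # Head: LayerNorm → Linear → 3 classes
        self.head = nn.Sequential(nn.LayerNorm(d_model), nn.Linear(d_model, 3))

    def forward(self, x: torch.Tensor) -> torch.Tensor:      # x: (batch, n_features)
        tokens = torch.stack([tok(x[:, j:j + 1]) for j, tok in enumerate(self.tokenizers)], dim=1)
        tokens = tokens + self.feat_index_emb.weight          # broadcast over the batch
        cls = self.cls_token.expand(x.size(0), -1, -1)        # (batch, 1, d_model)
        out = self.transformer(torch.cat([cls, tokens], dim=1))
        return self.head(out[:, 0])                           # CLS output → (batch, 3) logits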

Forward pass: tokenize each feature → add index embeddings → prepend CLS → transformer → CLS output → head

Create ml/trainers/ft_transformer_trainer.py:
- Convert X_train (numpy float64) to torch.float32 tensor
- class_weights = compute_class_weight('balanced', classes=np.unique(y_train), y=y_train) → torch.FloatTensor of shape (3,)
- loss = F.cross_entropy(logits, y_batch, weight=class_weights) (the weight argument is per class, not per sample)
- Optimizer: AdamW(lr=5e-4, weight_decay=1e-5, betas=(0.9, 0.999))
- Scheduler: CosineAnnealingLR(T_max=150, eta_min=1e-6) — cosine decay ensures smooth convergence
- Batch size: 256; max epochs: 150; early stop patience: 15 on val cross-entropy
- Save best checkpoint to models/ft_transformer.pt (state_dict only, not full model)

Add Optuna hyperparameter search for FT-Transformer in ml/tune.py:

"ft_d_model":  trial.suggest_categorical("ft_d_model", [32, 64, 128]),
"ft_n_heads":  trial.suggest_categorical("ft_n_heads", [4, 8]),
"ft_n_layers": trial.suggest_int("ft_n_layers", 3, 8),
"ft_ffn_dim":  trial.suggest_categorical("ft_ffn_dim", [128, 256, 512]),
"ft_dropout":  trial.suggest_float("ft_dropout", 0.05, 0.30),
"ft_lr":       trial.suggest_float("ft_lr", 1e-4, 1e-3, log=True),
"ft_wd":       trial.suggest_float("ft_wd", 1e-6, 1e-3, log=True),

ONNX export — add to ml/onnx_export.py:

model.eval()  # disable dropout so the traced graph is deterministic
dummy = torch.randn(1, 59)
torch.onnx.export(
    model, dummy, "models/ft_transformer.onnx",
    input_names=["features"], output_names=["logits"],
    dynamic_axes={"features": {0: "batch"}, "logits": {0: "batch"}},
    opset_version=17,
)

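Edge case: d_model must be divisible by n_heads. PyTorch's multi-head attention raises at model construction otherwise, so the failure appears before export; validate d_model % n_heads == 0 when sampling hyperparameters.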

Apply isotonic calibration via existing calibrate_models() in ml/ensemble_trainer.py — wrap FT-Transformer inference in a sklearn-compatible predict_proba(X) adapter class
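Add to ml/ensemble_predictor.py model loading, using the same priority as _load_model() in 18.2 (calibrated pkl first, because calibration cannot be applied to ONNX sessions): check models/ft_transformer_calibrated.pkl → fall back to models/ft_transformer.onnx → fall back to models/ft_transformer.pt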
OOS gate: FT-Transformer solo OOS WR ≥ 50%, solo PF ≥ 1.5; ensemble with FT-Transformer Score ≥ baseline + 0.5

22.4 Temporal Fusion Transformer (TFT) — sequence model

Architecture overview: TFT processes a sequence of T=48 M15 bars. Each bar carries n_dyn=54 time-varying features. Additionally, 5 static features (session flags, day-of-week sin/cos) are processed separately.

Static features (5) ──→ Static Covariate Encoder (GRN)
                              │
                              ├──→ context_h (init LSTM hidden state)
                              └──→ context_e (enrichment context)

Dynamic features (48 × 54) ──→ Variable Selection Network (VSN)
    │  VSN uses a GRN per feature + softmax over features → weighted sum
    │  Output: (batch, 48, d_model)  — only informative features survive
    │
    ├──→ LSTM Encoder (2-layer, hidden=128)
    │        Outputs: (batch, 48, 128) encoder states
    │
    └──→ LSTM Decoder (2-layer, hidden=128) — initialized from static context_h
             Outputs: (batch, 48, 128) decoder states
             │
             ├──→ Gated Residual Network (GRN) with static context_e
             │
             └──→ Multi-Head Attention (num_heads=4, causal mask off for classification)
                      │
                      └──→ GRN → Layer Norm → (batch, 128)
                                       │
                                       └──→ Linear(128, 3) → BUY/SELL/HOLD logits

Gated Residual Network (GRN) — the core building block:

GRN(x, c=None):
  η₂ = ELU( W₂ × [x; c] + b₂ )      # c = optional context vector
  η₁ = W₁ × η₂ + b₁
  gate = sigmoid( W_gate × [x; c] + b_gate )
  output = LayerNorm( gate ⊙ η₁ + (1−gate) ⊙ x )  # gated skip connection

Variable Selection Network math:

VSN for timestep t with features x^(j)_t, j=1..54:
  ξ^(j)_t = GRN_j( x^(j)_t, static_context )   # per-feature processing
  v_t = softmax( W_vs × [ξ^(1)_t; ...; ξ^(54)_t] + b_vs )  # feature weights
  x̃_t = Σ_j v^(j)_t × ξ^(j)_t                              # weighted combination

The weights v_t are what we expose as “feature attention” in the dashboard.

Data pipeline (SequenceDataset):

# ml/data/sequence_dataset.py
from typing import Optional

import numpy as np
import torch
from torch.utils.data import Dataset

class SequenceDataset(Dataset):
    def __init__(self, X: np.ndarray, y: np.ndarray, seq_len: int = 48,
                 static_indices: Optional[list[int]] = None):
        # X: (N, 59) float32, y: (N,) int64
        # static_indices: positions of the 5 static features in X
        self.X = torch.from_numpy(X.astype(np.float32))
        self.y = torch.from_numpy(y.astype(np.int64))
        self.seq_len = seq_len
        self.static_idx = static_indices or [16, 17, 18, 19, 20]  # session/time features
        self.dyn_idx = [i for i in range(X.shape[1]) if i not in self.static_idx]

    def __len__(self):
        # Start indices 0..N-seq_len are all valid, so there are N-seq_len+1 windows
        return max(len(self.X) - self.seq_len + 1, 0)

    def __getitem__(self, idx):
        seq = self.X[idx : idx + self.seq_len]           # (48, 59)
        x_dyn = seq[:, self.dyn_idx]                     # (48, 54)
        x_static = self.X[idx + self.seq_len - 1, self.static_idx]  # (5,), taken from the current bar
        label = self.y[idx + self.seq_len - 1]           # label at bar t
        return x_dyn, x_static, label
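The window-to-label alignment can be checked without PyTorch by using the bar index itself as the only feature. The sketch below mirrors the slicing in `__getitem__`; `check_no_lookahead` is a hypothetical helper, and it treats N − seq_len as the last valid start index.

```python
import numpy as np


def check_no_lookahead(n_rows: int, seq_len: int) -> int:
    """Verify every window ends exactly at its label bar; return the window count."""
    X = np.arange(n_rows, dtype=np.float32).reshape(-1, 1)  # feature value = bar index
    y = np.arange(n_rows)                                   # label value = bar index
    n_windows = n_rows - seq_len + 1
    for idx in range(n_windows):
        window = X[idx: idx + seq_len, 0]
        label_bar = y[idx + seq_len - 1]
        assert len(window) == seq_len
        assert window.min() == idx          # window starts at idx
        assert window.max() == label_bar    # newest bar in the window is the label bar
    return n_windows


n_checked = check_no_lookahead(n_rows=100, seq_len=48)
```

This only checks the feature window. The label itself is still built from future bars by the labeling step, so train/validation splits need a gap at least as long as the label lookahead.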

No future lookahead: window [idx … idx+seq_len-1] → label at idx+seq_len-1.

Training config:

optimizer = AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = ReduceLROnPlateau(optimizer, patience=10, factor=0.5, min_lr=1e-5)
# Gradient clipping — mandatory for LSTM in TFT
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
# Batch: 64 (sequence data needs smaller batches for gradient stability)
# Loss: CrossEntropyLoss with class weights

Tasks:

Create ml/models/tft.py — implement GRN, VSN, TFT classes as described above; keep d_model=128, n_heads=4, seq_len=48, n_static=5, n_dynamic=54
Create ml/data/sequence_dataset.py — SequenceDataset as specified; unit test: assert no label from future bars is in the window (check __getitem__ with known data)
Create ml/trainers/tft_trainer.py:
- Build DataLoader(SequenceDataset(...), batch_size=64, shuffle=False) — do NOT shuffle (time-series ordering)
- Training loop: forward → loss → backward → clip_grad → step → scheduler.step(val_loss)
- Save best model (val loss) to models/tft.pt; also save final-epoch attention weights tensor to models/tft_attention_cache.npy (shape (n_val, 48, 54)) for dashboard initialization

Add TFT Optuna search space in ml/tune.py:

"tft_d_model":    trial.suggest_categorical("tft_d_model", [64, 128]),
"tft_n_heads":    trial.suggest_categorical("tft_n_heads", [4, 8]),
"tft_n_grn":      trial.suggest_int("tft_n_grn", 1, 3),   # GRN depth
"tft_dropout":    trial.suggest_float("tft_dropout", 0.05, 0.25),
"tft_seq_len":    trial.suggest_categorical("tft_seq_len", [24, 48, 96]),
"tft_lr":         trial.suggest_float("tft_lr", 5e-4, 5e-3, log=True),

ONNX export: TFT has two inputs — export with input_names=["x_dynamic", "x_static"], shapes [batch, 48, 54] and [batch, 5]; opset 17; verify with onnxruntime.InferenceSession
In ml/ensemble_predictor.py: maintain a rolling buffer _seq_buffer: deque(maxlen=48) populated after each build_features() call; pass last 48 rows to TFT at inference
Edge case: if _seq_buffer has < 48 entries (bot just started), zero-pad the head and still run inference — TFT will output lower-confidence results until buffer fills
OOS gate: TFT solo WR ≥ 50%; ensemble Score ≥ current + 0.5
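The rolling buffer and the zero-padded cold start described in the tasks above could look roughly like this. It is a sketch under assumptions: padded_window is a hypothetical helper name, and rows are plain Python lists rather than the arrays the predictor would actually hold.

```python
from collections import deque

SEQ_LEN = 48  # window length from the spec above


def padded_window(buffer: deque, n_features: int, seq_len: int = SEQ_LEN) -> list[list[float]]:
    """Return exactly seq_len rows, zero-padding the head if the buffer is short."""
    rows = list(buffer)[-seq_len:]
    missing = seq_len - len(rows)
    padding = [[0.0] * n_features for _ in range(missing)]
    return padding + rows  # padded rows first, newest real row last


seq_buffer: deque = deque(maxlen=SEQ_LEN)
for bar in range(3):  # the bot has only seen 3 bars so far
    seq_buffer.append([float(bar + 1)] * 59)

window = padded_window(seq_buffer, n_features=59)
```

Padding at the head keeps the newest bar in the last position, which is the row the static features and the label alignment refer to.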

22.5 LSTM with Bahdanau attention + TCN alternative

Bahdanau (additive) attention — full formula: Given LSTM output sequence H = [h_1, …, h_T] and final hidden state h_T:

e_t = v^T × tanh( W_a × h_t + U_a × h_T )    # alignment score at step t
α_t = exp(e_t) / Σ_{t'} exp(e_{t'})           # softmax over T steps
c   = Σ_t α_t × h_t                            # context vector (weighted sum)
ŷ   = W_out × [c ; h_T] + b_out               # concat context + final hidden

W_a ∈ ℝ^{da×d}, U_a ∈ ℝ^{da×d}, v ∈ ℝ^{da} are learned. da=64 (attention dim).

TCN receptive field formula (use to choose dilation schedule): With n_layers dilated causal Conv1d layers, each with kernel_size=k and dilation d_i = 2^i:

Receptive field = 1 + Σ_{i=0}^{n-1} (k−1) × 2^i = 1 + (k−1) × (2^n − 1)
For k=5, n=4: RF = 1 + 4 × 15 = 61 bars  (covers 61 M15 bars = ~15h)
For k=5, n=5: RF = 1 + 4 × 31 = 125 bars (covers 125 M15 bars = ~31h)
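The closed form above is easy to evaluate when comparing dilation schedules. A small sketch follows; receptive_field and min_layers_for are illustrative names, not existing project functions.

```python
def receptive_field(kernel_size: int, n_layers: int) -> int:
    """Receptive field of stacked causal convs with dilations 1, 2, 4, ..., 2^(n-1)."""
    return 1 + (kernel_size - 1) * (2 ** n_layers - 1)


def min_layers_for(window_bars: int, kernel_size: int) -> int:
    """Smallest layer count whose receptive field covers window_bars."""
    n = 1
    while receptive_field(kernel_size, n) < window_bars:
        n += 1
    return n


rf_4 = receptive_field(5, 4)        # 61 bars
rf_5 = receptive_field(5, 5)        # 125 bars
layers_k5 = min_layers_for(48, 5)   # 4 layers cover a 48-bar window
layers_k3 = min_layers_for(48, 3)   # 5 layers are needed with k=3
```

The same helper can sanity-check Optuna trials that vary tcn_kernel and tcn_n_layers, since a small kernel with few layers may not see the whole 48-bar window.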

Choose n=4 (RF=61): it covers the full 48-bar (12 h) input window, and up to about 15 h, at lower compute cost than n=5.

TCN architecture:

Input: (batch, seq_len=48, n_features=59)
Transpose: (batch, 59, 48)  — Conv1d expects (batch, channels, length)

Layer 0: Conv1d(59→128, k=5, dilation=1, padding=4, causal)  → (batch, 128, 48)
Layer 1: Conv1d(128→128, k=5, dilation=2, padding=8, causal) → (batch, 128, 48)
Layer 2: Conv1d(128→128, k=5, dilation=4, padding=16, causal)→ (batch, 128, 48)
Layer 3: Conv1d(128→128, k=5, dilation=8, padding=32, causal)→ (batch, 128, 48)

Each layer: weight-normalized Conv1d → ReLU → Dropout(0.1), plus a residual connection (with a 1×1 projection if channels change)

Take last time step: (batch, 128)
Linear(128, 64) → ReLU → Dropout(0.2) → Linear(64, 3)

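Causal padding formula: To ensure no future leakage, pad only the left side by (k−1) × dilation. With the convolution's own padding left at 0, the output has the same length as the input, so nothing needs to be sliced off the right:

import torch.nn.functional as F

def causal_conv1d(x, conv, dilation):
    # x: (batch, channels, length); conv must be built with padding=0 and the same dilation
    padding = (conv.kernel_size[0] - 1) * dilation
    x = F.pad(x, (padding, 0))   # left-pad only
    return conv(x)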
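To see why left-only padding prevents leakage, the same operation can be written out for a single channel in plain Python. This is an illustrative sketch with made-up weights, not the PyTorch layer; the check is that changing a later bar leaves all earlier outputs untouched.

```python
def causal_conv1d_ref(x: list[float], weights: list[float], dilation: int) -> list[float]:
    """Single-channel dilated causal convolution.

    Left-pads by (k - 1) * dilation so that out[t] only reads
    x[t], x[t - dilation], ..., x[t - (k - 1) * dilation].
    """
    k = len(weights)
    pad = (k - 1) * dilation
    padded = [0.0] * pad + list(x)
    out = []
    for t in range(len(x)):
        # The last tap (j = k - 1) lands exactly on x[t]; earlier taps look back in time.
        out.append(sum(weights[j] * padded[t + j * dilation] for j in range(k)))
    return out


x = [float(i) for i in range(12)]
weights = [0.5, -1.0, 2.0]  # arbitrary illustrative kernel
base = causal_conv1d_ref(x, weights, dilation=2)

future = list(x)
future[8] += 100.0  # perturb one later bar
changed = causal_conv1d_ref(future, weights, dilation=2)
```

Outputs before index 8 are identical in both runs, and the output length equals the input length without any right-side slicing.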

Tasks:

Create ml/models/lstm_attention.py:
- Bidirectional LSTM: nn.LSTM(input_size=59, hidden_size=128, num_layers=2, bidirectional=True, dropout=0.2, batch_first=True) — output dim = 256
- Attention: implement Bahdanau equations above with da=64; W_a=Linear(256, 64), U_a=Linear(256, 64), v=Linear(64, 1, bias=False)
- Context vector: c ∈ ℝ^{256}; concat with h_T[-1] (last step, bidirectional) → Linear(512, 3)
- Inference mode switch: set self.training_mode flag; when False, run only forward direction of LSTM (unidirectional causal)
Create ml/models/tcn.py:
- 4 CausalConv1d layers with dilations [1, 2, 4, 8], kernel_size=5, out_channels=128
- WeightNorm on each Conv1d (improves training stability vs BatchNorm on small batches)
- Residual connections: if in_channels ≠ out_channels, add Conv1d(in, out, 1) projection
- Final layer: last timestep output → Linear(128, 3)
Train both on SequenceDataset(seq_len=48) using same tft_trainer.py loop (swap model); record OOS Score for each; keep whichever is higher (TCN likely within 0.5% but 4× faster)

Optuna search space for LSTM:

"lstm_hidden":    trial.suggest_categorical("lstm_hidden", [64, 128, 256]),
"lstm_layers":    trial.suggest_int("lstm_layers", 1, 3),
"lstm_dropout":   trial.suggest_float("lstm_dropout", 0.1, 0.4),
"attn_dim":       trial.suggest_categorical("attn_dim", [32, 64, 128]),
"lstm_lr":        trial.suggest_float("lstm_lr", 5e-4, 5e-3, log=True),

Optuna search space for TCN:

"tcn_channels":   trial.suggest_categorical("tcn_channels", [64, 128, 256]),
"tcn_n_layers":   trial.suggest_int("tcn_n_layers", 3, 6),
"tcn_kernel":     trial.suggest_categorical("tcn_kernel", [3, 5, 7]),
"tcn_dropout":    trial.suggest_float("tcn_dropout", 0.05, 0.30),

ONNX export — LSTM: use torch.onnx.export with opset 17; set do_constant_folding=True; verify hidden state output is not exported (classification output only); test inference latency with onnxruntime on a single row — must be < 20 ms on CPU
At inference in ensemble_predictor.py: feed same _seq_buffer used by TFT; if buffer < 48, pad with zeros (same strategy as TFT)
Attach alpha_weights (shape (48,)) to the return value of get_signal() for dashboard streaming
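The Bahdanau equations in this section can be traced on toy data to see what the alpha_weights vector contains. The sketch below uses plain Python lists and tiny dimensions (T=6, d=4, da=3) instead of the spec's 48/256/64; the helper names and the random weights are placeholders, not trained parameters.

```python
import math
import random


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rand_matrix(rows: int, cols: int) -> list[list[float]]:
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def bahdanau_attention(H, W_a, U_a, v):
    """Additive attention over H = [h_1..h_T], using h_T as the query.

    Returns (alpha, context): softmax weights over the T steps and the
    alpha-weighted sum of the hidden states.
    """
    h_T = H[-1]
    query = matvec(U_a, h_T)  # U_a h_T, shared by every step
    scores = []
    for h_t in H:
        key = matvec(W_a, h_t)  # W_a h_t
        hidden = [math.tanh(k + q) for k, q in zip(key, query)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))  # e_t
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(e - peak) for e in scores]
    total = sum(exps)
    alpha = [x / total for x in exps]
    context = [sum(alpha[t] * H[t][j] for t in range(len(H))) for j in range(len(h_T))]
    return alpha, context


random.seed(0)
T, d, d_a = 6, 4, 3  # toy sizes for illustration only
H = rand_matrix(T, d)
v = [random.uniform(-1, 1) for _ in range(d_a)]
alpha, context = bahdanau_attention(H, rand_matrix(d_a, d), rand_matrix(d_a, d), v)
```

alpha has one non-negative weight per time step and sums to 1, which is the shape the dashboard heatmap expects (48 values in the real model).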

22.6 Stacked meta-learner + regime-adaptive weighting

Algorithm — walk-forward OOF stacking: To prevent data leakage (the meta-learner training on base-model predictions for samples those base models were fitted on), use time-series walk-forward folds:

Fold 1:  Train base models on months 1-6   → predict months 7-8   → collect OOF_1
Fold 2:  Train base models on months 1-8   → predict months 9-10  → collect OOF_2
Fold 3:  Train base models on months 1-10  → predict months 11-12 → collect OOF_3
...
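The fold schedule above follows an expanding-window pattern that can be generated programmatically. A minimal sketch, assuming month-granular folds; walk_forward_folds is a hypothetical helper and not the TimeSeriesSplit call named in the tasks.

```python
def walk_forward_folds(n_months: int, first_train: int = 6, step: int = 2):
    """Yield (train_months, predict_months) with an expanding training window."""
    train_end = first_train
    while train_end + step <= n_months:
        train = list(range(1, train_end + 1))
        predict = list(range(train_end + 1, train_end + step + 1))
        yield train, predict
        train_end += step


folds = list(walk_forward_folds(12))
for train, predict in folds:
    # Every predicted month lies strictly after the last training month.
    assert min(predict) > max(train)
```

With 12 months this reproduces the three folds listed above; months 1 to 6 never receive OOF predictions, which is the sample loss noted for N_train.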

The meta-feature matrix X_meta has shape (N_train, K × C) where:

K = number of base models (5: RF, XGB, LGB, ExtraTrees, FT-Transformer/LSTM/TFT)
C = number of classes (3: BUY, SELL, HOLD)
N_train = training samples with OOF predictions (unavoidably loses first fold’s samples)
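One row of X_meta is simply the base models' class probabilities laid side by side in a fixed order. A sketch under assumptions: the model keys (in particular "sequence" for the FT-Transformer/LSTM/TFT slot) are hypothetical, and the class order follows the [BUY, SELL, HOLD] convention used elsewhere in this document.

```python
MODEL_ORDER = ["rf", "xgb", "lgb", "extratrees", "sequence"]  # hypothetical keys, K = 5
CLASSES = ["BUY", "SELL", "HOLD"]                              # C = 3


def meta_row(base_probs: dict[str, list[float]]) -> list[float]:
    """Concatenate per-model class probabilities into one K*C feature row."""
    row: list[float] = []
    for name in MODEL_ORDER:
        probs = base_probs[name]
        if len(probs) != len(CLASSES):
            raise ValueError(f"{name}: expected {len(CLASSES)} probabilities, got {len(probs)}")
        row.extend(probs)
    return row


example = {name: [0.5, 0.3, 0.2] for name in MODEL_ORDER}  # made-up probabilities
row = meta_row(example)
```

Keeping the order in one constant matters because the meta-learner sees only column positions; training and inference must build the row identically.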

Meta-learner loss:

L_meta = CrossEntropy( LGB_meta(X_meta), y_true )

Meta-LGB is kept shallow (max_depth=3, n_estimators=100) to prevent overfitting on K×C=15 features.

Isotonic calibration of meta output: After training, fit per-class isotonic regression on a held-out calibration fold and apply it to new predictions:

import numpy as np
from sklearn.isotonic import IsotonicRegression

# meta_probs_val / y_val: held-out calibration fold; meta_probs_test: rows to calibrate
calibrated_probs = np.empty_like(meta_probs_test, dtype=float)
for c in range(3):
    ir_c = IsotonicRegression(out_of_bounds='clip')
    ir_c.fit(meta_probs_val[:, c], (y_val == c).astype(float))
    calibrated_probs[:, c] = ir_c.transform(meta_probs_test[:, c])
# Renormalize rows to sum to 1 (the clip guards against an all-zero row)
calibrated_probs /= np.clip(calibrated_probs.sum(axis=1, keepdims=True), 1e-12, None)

Regime router — adaptive weighting formula: When meta-learner confidence max(meta_probs) is < 0.55, fall back to regime-weighted averaging:

conf_threshold = 0.55
if max(meta_probs) >= conf_threshold:
    final_probs = meta_probs              # trust the meta-learner
else:
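    w = regime_weights[current_regime]   # per-regime weights: model name → weight
    # Σ_k w_k × base_probs_k, with base_probs a dict of NumPy probability vectors
    final_probs = sum(w[k] * base_probs[k] for k in w)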
    final_probs /= final_probs.sum()      # renormalize

Regime detection (computed from current _latest_features_cache):

def detect_regime(features: dict) -> str:
    adx = features["adx_14"]
    atr_pct = features["atr_percentile"]
    if adx > 0.25 and atr_pct < 0.75:  return "STRONG_TREND"
    if adx < 0.20 and atr_pct < 0.40:  return "RANGING"
    if atr_pct >= 0.75:                 return "VOLATILE"
    return "CHOPPY"
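The fallback formula and the regime label combine as follows. This is a sketch with plain lists; the weights in REGIME_WEIGHTS and all probabilities are made-up illustration values (the real weights are to be tuned), and route here takes the regime name directly rather than the feature dict.

```python
CONF_THRESHOLD = 0.55

# Hypothetical weights for one regime only; real values come from tuning.
REGIME_WEIGHTS = {"RANGING": {"rf": 0.5, "xgb": 0.25, "lgb": 0.25}}


def route(meta_probs, base_probs, regime, conf_threshold=CONF_THRESHOLD):
    """Return meta_probs if confident, else the regime-weighted base average."""
    if max(meta_probs) >= conf_threshold:
        return list(meta_probs)
    weights = REGIME_WEIGHTS[regime]
    blended = [
        sum(weights[name] * base_probs[name][c] for name in weights)
        for c in range(len(meta_probs))
    ]
    total = sum(blended)
    return [p / total for p in blended]  # renormalize to sum to 1


base = {"rf": [0.5, 0.3, 0.2], "xgb": [0.2, 0.6, 0.2], "lgb": [0.4, 0.4, 0.2]}
fallback = route([0.40, 0.35, 0.25], base, "RANGING")   # below threshold: blend
trusted = route([0.70, 0.20, 0.10], base, "RANGING")    # above threshold: pass through
```

In the blended case BUY and SELL both come out at 0.40, so the fallback by itself does not force a trade; the usual confidence and prob_diff gates still apply afterwards.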

Tasks:

Create ml/meta_learner.py with:
- generate_oof_stacks(base_models, X, y, n_splits=5) → returns X_meta (N, 15), y_meta (N,) using TimeSeriesSplit
- Algorithm: for each fold, retrain all 5 base models on train split, predict predict_proba on val split, store in X_meta[val_idx]
- Save to models/oof_stacks.npy
- train_meta_learner(X_meta, y_meta) → fits LGBMClassifier(n_estimators=100, max_depth=3, learning_rate=0.1, num_leaves=15, min_child_samples=20), saves to models/meta_learner.pkl
- calibrate_meta(meta_model, X_cal, y_cal) → fits 3 IsotonicRegression objects, saves to models/meta_calibrators.pkl
Create ml/regime_router.py:
- detect_regime(features: dict) → str — 4-state classification using ADX + ATR percentile as above
- REGIME_WEIGHTS dict with per-regime model weights (initial values: tune via scripts/sweep.py --target regime)
- route(meta_probs, base_probs_dict, features) → np.ndarray — implements the fallback formula above
Update ml/ensemble_predictor.py:
- Load meta_learner.pkl and meta_calibrators.pkl in __init__ (alongside existing model loading)
- In get_signal(): collect all base model predict_proba outputs → stack into X_meta_row (1, 15) → run meta_learner.predict_proba → calibrate → pass to RegimeRouter.route()
Update scripts/weekly_optimize.py — add Phase 13b: after base model retraining, regenerate OOF stacks and retrain meta-learner (takes ~5 min extra; acceptable in weekly job)

Optuna search space for meta-learner itself:

"meta_n_est":       trial.suggest_int("meta_n_est", 50, 200),
"meta_max_depth":   trial.suggest_int("meta_max_depth", 2, 5),
"meta_lr":          trial.suggest_float("meta_lr", 0.05, 0.30),
"meta_num_leaves":  trial.suggest_int("meta_num_leaves", 7, 31),
"meta_conf_thresh": trial.suggest_float("meta_conf_thresh", 0.50, 0.65),

OOS gate: 5-model meta-learner must beat current 3-model majority-vote by ≥ 1.0 Score AND ≥ 2% WR; if not, revert to majority-vote and document result in logs/meta_learner_eval.log
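The gate above is a conjunction of two margins and can be encoded as a small predicate. A sketch assuming that "≥ 2% WR" means two percentage points; OosResult and passes_meta_gate are hypothetical names, the baseline numbers are the Phase 17.3 row from the OOS table, and the candidate numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OosResult:
    score: float     # composite OOS Score
    win_rate: float  # WR in percent, e.g. 71.7


def passes_meta_gate(candidate: OosResult, baseline: OosResult,
                     min_score_gain: float = 1.0, min_wr_gain: float = 2.0) -> bool:
    """True only if the candidate beats the baseline on BOTH Score and WR margins."""
    return (candidate.score - baseline.score >= min_score_gain
            and candidate.win_rate - baseline.win_rate >= min_wr_gain)


baseline = OosResult(score=54.47, win_rate=71.7)
accepted = passes_meta_gate(OosResult(score=55.60, win_rate=74.0), baseline)
rejected = passes_meta_gate(OosResult(score=56.00, win_rate=73.0), baseline)  # WR gain too small
```

A failing result would trigger the revert to majority vote and the log entry described above.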

Phase 23 — JARVIS Live Trading Dashboard

A real-time visualization system inspired by quant firm internal dashboards (Bloomberg DASH, QuantConnect live monitor, Two Sigma’s internal regime displays) and Jarvis-style AI interface aesthetics. Stack: Next.js + WebSocket + TradingView Lightweight Charts + Framer Motion + Apache ECharts + optional Three.js.

This is read-only. The dashboard connects to a new WebSocket endpoint on the bot server and never writes to config.json, triggers orders, or modifies bot state.

Frontend dependencies (add to package.json):

npm install framer-motion@11        # animation engine
npm install lightweight-charts@4    # professional candlestick chart
npm install recharts@2              # bar/area charts
npm install echarts@5 echarts-for-react@3   # heatmap
npm install @react-three/fiber@8 @react-three/drei@9 three@0.165  # 3D surface (stretch)
npm install @supabase/supabase-js@2  # already installed
npm install zustand@4               # lightweight global state (signal stream store)

Backend dependencies (add to requirements.txt):

fastapi>=0.111
uvicorn[standard]>=0.29
websockets>=12.0
shap>=0.45          # live SHAP values per signal

23.1 WebSocket signal stream (backend)

Message contract — SignalEvent (full schema):

interface SignalEvent {
  ts: string;              // ISO 8601 UTC timestamp
  prediction: "BUY" | "SELL" | "HOLD";
  confidence: number;      // max(probs) after calibration
  prob_diff: number;       // probs[0] - probs[1] (margin of victory)
  probs: [number, number, number];  // [BUY, SELL, HOLD]
  model_votes: {
    rf: "BUY" | "SELL" | "HOLD";
    xgb: "BUY" | "SELL" | "HOLD";
    lgb: "BUY" | "SELL" | "HOLD";
    ft_transformer?: "BUY" | "SELL" | "HOLD";  // optional until Phase 22.3 ships
    lstm?: "BUY" | "SELL" | "HOLD";
  };
  model_confidences: Record<string, number>;  // per-model max prob
  top_shap: Array<{ name: string; value: number }>;  // top 10, signed
  attention_weights?: number[] | null;   // 48 floats from TFT/LSTM; null or absent until Phase 22.4 ships
  regime: "STRONG_TREND" | "RANGING" | "VOLATILE" | "CHOPPY";
  adx: number;
  atr_percentile: number;
  equity: number;            // current account equity (raw USC)
  open_position: {
    ticket: number;
    direction: "BUY" | "SELL";
    lot: number;             // position size in lots (read by the live P&L code in 23.10)
    entry: number;
    sl: number;
    tp: number;
    unrealized_pnl: number;
    bars_held: number;
  } | null;
  ohlcv: {  // current completed M15 bar
    time: number;  // Unix timestamp
    open: number; high: number; low: number; close: number; volume: number;
  };
}
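Because the server serializes events with dataclasses.asdict, the Python side needs a dataclass that mirrors this contract. The sketch below covers only a subset of the fields to show the pattern; the class layout is an assumption, and the example values are invented.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class OpenPosition:
    ticket: int
    direction: str  # "BUY" or "SELL"
    entry: float
    sl: float
    tp: float
    unrealized_pnl: float
    bars_held: int


@dataclass
class SignalEvent:
    ts: str                   # ISO 8601 UTC timestamp
    prediction: str           # "BUY" | "SELL" | "HOLD"
    confidence: float
    probs: list[float]        # [BUY, SELL, HOLD]
    model_votes: dict[str, str]
    regime: str
    open_position: Optional[OpenPosition] = None
    attention_weights: Optional[list[float]] = None
    top_shap: list[dict] = field(default_factory=list)


event = SignalEvent(
    ts="2026-05-06T20:15:00Z",  # invented example values throughout
    prediction="BUY",
    confidence=0.71,
    probs=[0.71, 0.18, 0.11],
    model_votes={"rf": "BUY", "xgb": "BUY", "lgb": "HOLD"},
    regime="STRONG_TREND",
)
payload = json.dumps(asdict(event))
decoded = json.loads(payload)
```

Unset optional fields are emitted as JSON null rather than omitted, so the TypeScript consumer should treat null and undefined the same way.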

Tasks:

Create trading/ws_server.py — FastAPI app on port 8765:

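import asyncio
import dataclasses
import hmac
import os

from fastapi import FastAPI, Query, WebSocket, WebSocketDisconnect

app = FastAPI()
# SignalEvent is assumed to be a dataclass mirroring the schema above
_signal_queue: asyncio.Queue[SignalEvent] = asyncio.Queue(maxsize=10)

@app.websocket("/ws/signal")
async def stream(ws: WebSocket, token: str = Query(...)):
    secret = os.getenv("DASHBOARD_WS_SECRET", "")
    # Constant-time comparison; reject everything if the secret is unset
    if not secret or not hmac.compare_digest(token.encode(), secret.encode()):
        await ws.close(code=4001)
        return
    await ws.accept()
    try:
        while True:
            # Each queued event reaches exactly one consumer, so this serves a single client
            event = await _signal_queue.get()
            await ws.send_json(dataclasses.asdict(event))
    except WebSocketDisconnect:
        pass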

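In trading/bot.py after _get_signal() returns (around line 3200): push the SignalEvent with loop.call_soon_threadsafe(_signal_queue.put_nowait, event), where loop is the ws_server event loop captured once at server startup. The bot runs in a worker thread, so calling asyncio.get_event_loop() there would not return the server's loop. put_nowait raises asyncio.QueueFull once 10 events are queued with no client connected, so drop the oldest event or catch the exception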
Run ws_server.py via uvicorn in a background thread started at bot startup; add DASHBOARD_WS_SECRET to .env
Add to ecosystem.config.js: second PM2 process ws_server using uvicorn trading.ws_server:app --port 8765
Caddy config: add reverse_proxy /ws/signal localhost:8765 under terminal-rf1.novosky.app block
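SHAP computation: after each build_features() call, compute shap.TreeExplainer(lgb_model).shap_values(X_current) and take the values for signal_class. Depending on the installed SHAP version the result is either a per-class list indexed as [signal_class] or a single array with a trailing class axis indexed as [..., signal_class]; takes ~20 ms on CPU; acceptable at M15 frequency; include top 10 by abs value in top_shap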
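The hand-off from the bot thread to the WebSocket server's event loop is the delicate part of the tasks above. The following standard-library sketch shows one way to do it: the loop is captured once from inside the loop's own thread, and the oldest event is dropped when the queue is full. publish_from_thread and the tiny maxsize=2 queue are illustrative assumptions, not the project's code.

```python
import asyncio
import threading


def publish_from_thread(loop: asyncio.AbstractEventLoop, queue: asyncio.Queue, event) -> None:
    """Hand `event` from a non-asyncio thread to a queue owned by `loop`."""

    def _put() -> None:
        # Runs inside the event loop, so touching the queue here is safe.
        if queue.full():
            queue.get_nowait()  # drop the oldest event rather than raise QueueFull
        queue.put_nowait(event)

    loop.call_soon_threadsafe(_put)


async def demo() -> list:
    loop = asyncio.get_running_loop()  # captured once, inside the loop's own thread
    queue: asyncio.Queue = asyncio.Queue(maxsize=2)

    def bot_thread() -> None:
        for seq in range(5):
            publish_from_thread(loop, queue, {"seq": seq})

    worker = threading.Thread(target=bot_thread)
    worker.start()
    await asyncio.to_thread(worker.join)  # wait for the producer without blocking the loop
    await asyncio.sleep(0)                # give any still-pending callbacks a turn
    return [queue.get_nowait() for _ in range(queue.qsize())]


received = asyncio.run(demo())
```

With a capacity of 2 and five published events, only the two newest survive, which is the behaviour a dashboard wants when no client is connected.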

23.2 Core dashboard layout (Next.js)

File structure:

app/dashboard/
  page.tsx             # root page — imports all panels
  layout.tsx           # standalone dark layout (no site nav/footer)
  loading.tsx          # skeleton loader while WS connects
components/dashboard/
  ConfidenceMeter.tsx
  ModelVotingPanel.tsx
  ModelConfidenceBars.tsx
  RegimeIndicator.tsx
  FeatureImportance.tsx
  TradeFlowPipeline.tsx
  CandlestickChart.tsx
  AttentionHeatmap.tsx
  EquityPanel.tsx
  EquitySurface3D.tsx  # stretch
hooks/
  useSignalStream.ts   # WebSocket client + store
  useEquityHistory.ts  # Supabase historical equity query
stores/
  signalStore.ts       # Zustand store for latest signal state

useSignalStream.ts — reconnect logic:

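import { useCallback, useEffect, useRef } from 'react';
import { useSignalStore } from '@/stores/signalStore';  // path alias assumed

export function useSignalStream(url: string) {
  const setSignal = useSignalStore(s => s.setSignal);
  const wsRef = useRef<WebSocket | null>(null);
  const timerRef = useRef<ReturnType<typeof setTimeout> | null>(null);
  const unmounted = useRef(false);
  const reconnectDelay = useRef(1000);

  const connect = useCallback(() => {
    // NEXT_PUBLIC_* values are inlined into the browser bundle, so this token is visible to clients
    const ws = new WebSocket(`${url}?token=${process.env.NEXT_PUBLIC_WS_SECRET}`);
    ws.onmessage = (e) => {
      setSignal(JSON.parse(e.data));
      reconnectDelay.current = 1000;  // reset backoff on success
    };
    ws.onclose = () => {
      if (unmounted.current) return;  // do not reconnect after unmount
      timerRef.current = setTimeout(connect, reconnectDelay.current);
      reconnectDelay.current = Math.min(reconnectDelay.current * 2, 30_000);
    };
    wsRef.current = ws;
  }, [url, setSignal]);

  useEffect(() => {
    unmounted.current = false;
    connect();
    return () => {
      unmounted.current = true;
      if (timerRef.current) clearTimeout(timerRef.current);
      wsRef.current?.close();
    };
  }, [connect]);
}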

Tasks:

Create app/dashboard/layout.tsx with className="min-h-screen bg-slate-950 text-slate-100 font-mono" — separate from main site layout; no nav bar
app/dashboard/page.tsx: CSS Grid layout — grid-cols-[40%_60%] on desktop, single column on mobile; gap-4; all panels inside <Suspense> boundaries
stores/signalStore.ts: Zustand store with fields signal: SignalEvent | null, history: SignalEvent[] (last 200), connected: boolean; setSignal appends to history and updates latest
Throttle store updates: wrap setSignal with a 250 ms throttle (4 Hz max re-render rate); a debounce would instead hold every update back until the stream goes quiet
Connection status pill: top-right corner, 8px dot — animate-pulse green when connected; amber when reconnecting; static red when disconnected for > 10 s

23.3 Animated confidence meter

Radial arc implementation using SVG + Framer Motion: The arc is drawn as an SVG <path> using polar-to-Cartesian conversion:

function polarToCartesian(cx, cy, r, angleDeg) {
  const rad = (angleDeg - 90) * Math.PI / 180;
  return { x: cx + r * Math.cos(rad), y: cy + r * Math.sin(rad) };
}

function arcPath(cx, cy, r, startDeg, endDeg) {
  const s = polarToCartesian(cx, cy, r, startDeg);
  const e = polarToCartesian(cx, cy, r, endDeg);
  const large = endDeg - startDeg > 180 ? 1 : 0;
  return `M ${s.x} ${s.y} A ${r} ${r} 0 ${large} 1 ${e.x} ${e.y}`;
}
// Usage: arc from -135° to (-135° + confidence × 270°) → covers 270° total sweep

Tasks:

ConfidenceMeter.tsx: SVG-based radial arc; background arc (dark stroke) + foreground arc animated with motion.path and animate={{ pathLength: confidence }} (Framer Motion SVG animation); center text shows percentage
Spring config: transition={{ type: "spring", stiffness: 120, damping: 20 }} on pathLength change — avoids linear snap, feels organic

On new signal: trigger outer ring pulse using useAnimate:

const [scope, animate] = useAnimate();
useEffect(() => {
  if (signal) animate(scope.current, { scale: [1, 1.4, 1], opacity: [1, 0.3, 1] },
                      { duration: 0.6, repeat: 2 });
}, [signal?.ts]);

Color: derive from signal.prediction — emerald-400 (BUY), rose-400 (SELL), amber-400 (HOLD); use CSS variable for smooth color transition via motion.div animate={{ color }} with transition={{ duration: 0.3 }}
ModelConfidenceBars.tsx: horizontal progress bars per model using motion.div; set the animate width to confidence * 100 percent as a string value, transition={{ duration: 0.25 }}

23.4 Model voting panel

Tasks:

ModelVotingPanel.tsx: 5-card grid (grid-cols-5 gap-3); each card is a motion.div with layout prop (enables FLIP animation on reorder); background color set via animate={{ backgroundColor }} — Framer Motion handles color interpolation
Scale pop on vote change: track prevVote in useRef; if vote !== prevVote, trigger animate={{ scale: [1, 1.15, 1] }} with transition={{ duration: 0.2 }}
Consensus glow: when all 5 models agree — animate={{ boxShadow: "0 0 24px #10b981" }} (emerald for BUY) with transition={{ repeat: Infinity, repeatType: "reverse", duration: 1.2 }}
Split signal badge: if Object.values(votes).filter(v => v === prediction).length <= 2, render amber ⚠ Split badge using AnimatePresence for enter/exit animation (slide down from top)
Majority fraction badge: "4 / 5 BUY" string derived from vote counts; update without animation (content only)

23.5 Market regime indicator

Regime transition math: Regime changes should feel deliberate, not flickering. Apply a hysteresis rule: only switch regime if the new regime persists for 3 consecutive signals (45 min at M15 frequency):

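const [displayedRegime, setDisplayedRegime] = useState<string | null>(null);
const regimeBuffer = useRef<string[]>([]);
const signal = useSignalStore(s => s.signal);

useEffect(() => {
  if (!signal) return;
  regimeBuffer.current.push(signal.regime);
  if (regimeBuffer.current.length > 3) regimeBuffer.current.shift();
  const buf = regimeBuffer.current;
  // Switch only when the last 3 signals all report the same regime
  const persisted = buf.length === 3 && buf.every(r => r === buf[0]);
  if (displayedRegime === null || (persisted && buf[0] !== displayedRegime)) {
    setDisplayedRegime(displayedRegime === null ? signal.regime : buf[0]);
  }
}, [signal?.ts]);  // run once per signal, not only when the regime value changes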
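The three-consecutive-signals rule is easier to verify when separated from React state. The sketch below expresses it in Python purely to show the logic (the component itself would stay in TypeScript); RegimeHysteresis is a hypothetical name, and showing the very first regime immediately is an assumption.

```python
from collections import deque
from typing import Optional


class RegimeHysteresis:
    """Switch the displayed regime only after N consecutive identical signals."""

    def __init__(self, confirmations: int = 3) -> None:
        self.recent: deque = deque(maxlen=confirmations)
        self.displayed: Optional[str] = None

    def update(self, regime: str) -> str:
        self.recent.append(regime)
        if self.displayed is None:
            self.displayed = regime  # assumption: show the first regime right away
        elif len(self.recent) == self.recent.maxlen and len(set(self.recent)) == 1:
            self.displayed = regime  # the last N signals all agree
        return self.displayed


tracker = RegimeHysteresis()
stream = ["STRONG_TREND"] * 3 + ["VOLATILE", "STRONG_TREND", "VOLATILE", "VOLATILE", "VOLATILE"]
shown = [tracker.update(r) for r in stream]
```

A single VOLATILE reading, or two out of three, does not flip the card; only the third consecutive one does. A most-frequent-of-three rule would flip one signal earlier.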

Tasks:

RegimeIndicator.tsx: outer AnimatePresence mode="wait" — exit old card (opacity 0, y -20) then enter new card (opacity 1, y 0); transition={{ duration: 0.4 }}
Glow border: animate={{ boxShadow: glowColor }} — map regime to glow: STRONG_TREND→"0 0 30px #10b981", RANGING→"0 0 30px #3b82f6", VOLATILE→"0 0 30px #ef4444", CHOPPY→"0 0 30px #eab308"
Stats row: ADX value, ATR percentile (as %), recent WR from last 20 signals in signalStore.history; formatted as ADX 0.31 · ATR p72 · WR 68%
Mini sparkline: <AreaChart width={120} height={40} data={adxHistory}> — no axes, no labels, just the shape; <Area dataKey="adx" stroke="#94a3b8" fill="transparent" strokeWidth={1.5} />

23.6 Live feature importance bar chart

SHAP value sign convention: positive SHAP means the feature pushed the model toward the predicted class; negative means it pulled away. Color accordingly.

Tasks:

FeatureImportance.tsx: <BarChart layout="vertical" width={380} height={300}> — horizontal bars; <Bar dataKey="value" animationDuration={300} animationEasing="ease-out"> with <Cell fill={v > 0 ? "#14b8a6" : "#f43f5e"} /> per bar
Sort top 10 by Math.abs(shap) descending; truncate feature name to 18 chars; <Tooltip formatter={(v) => v.toFixed(4)} />
On update: Recharts re-renders and, with animationDuration=300, animates bar width changes automatically — no extra work needed
Tabs component (shadcn/ui <Tabs>): tab 1 = SHAP, tab 2 = Attention (disabled/grayed until Phase 22.4 ships); when tab 2 is unlocked, render AttentionHeatmap inline
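The ordering and truncation rules in the tasks above are the same ones the backend applies when it fills top_shap, so they are sketched here in Python. top_shap and display_name are hypothetical helpers, and the SHAP values are invented; only the feature names come from this document.

```python
def top_shap(names: list[str], values: list[float], k: int = 10) -> list[dict]:
    """Return the k features with the largest |SHAP|, keeping the sign."""
    pairs = sorted(zip(names, values), key=lambda p: abs(p[1]), reverse=True)
    return [{"name": n, "value": v} for n, v in pairs[:k]]


def display_name(name: str, max_len: int = 18) -> str:
    """Truncate a feature name for the chart's axis label."""
    return name[:max_len]


names = ["h4_rsi_norm", "atr_14", "hourly_return", "dist_to_round_number"]
values = [0.012, -0.034, 0.005, -0.021]  # invented SHAP values
ranked = top_shap(names, values, k=3)
labels = [display_name(item["name"]) for item in ranked]
```

Sorting by absolute value keeps strongly negative contributors in the list; the sign is preserved so the bar can still be colored by direction.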

23.7 Trade flow animation

State machine — 7 states (IDLE plus 6 pipeline nodes), transitions triggered by WebSocket events:

IDLE ──[new signal fires]──→ SCAN ──[confidence > threshold]──→ SIGNAL
     ──[risk check pass]──→ RISK CHECK ──[lot calculated]──→ SIZE
     ──[order sent]──→ EXECUTE ──[order confirmed]──→ MONITOR
     ──[position closes]──→ IDLE
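The diagram above maps directly onto a transition table. The sketch below writes it in Python to make the states and triggers explicit; the event identifiers are invented names for the bracketed triggers, and ignoring unknown events is an assumption.

```python
TRANSITIONS = {
    ("IDLE", "new_signal"): "SCAN",
    ("SCAN", "confidence_ok"): "SIGNAL",
    ("SIGNAL", "risk_check_pass"): "RISK_CHECK",
    ("RISK_CHECK", "lot_calculated"): "SIZE",
    ("SIZE", "order_sent"): "EXECUTE",
    ("EXECUTE", "order_confirmed"): "MONITOR",
    ("MONITOR", "position_closed"): "IDLE",
}


def advance(state: str, event: str) -> str:
    """Return the next state; events that do not apply leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def replay(events: list[str], state: str = "IDLE") -> list[str]:
    """Apply events in order and return every state visited, including the start."""
    path = [state]
    for event in events:
        state = advance(state, event)
        path.append(state)
    return path


full_cycle = replay([
    "new_signal", "confidence_ok", "risk_check_pass",
    "lot_calculated", "order_sent", "order_confirmed", "position_closed",
])
```

A table-driven machine keeps the animation code free of nested conditionals: the component only needs the current state and the next event.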

Tasks:

TradeFlowPipeline.tsx: horizontal node list with SVG connecting lines; each node is a 40px circle + label below
Active node animation: motion.div animate={{ rotate: 360 }} transition={{ repeat: Infinity, duration: 2, ease: "linear" }} on the outer ring; inner circle static
Node-to-node hop: use useEffect on signal.open_position to advance state machine; 200 ms delay between hops via a Promise chain of setTimeout-based delays (no blocking sleep)
Connecting line fills as each node activates: motion.line animate={{ pathLength: isComplete ? 1 : 0 }} transition={{ duration: 0.2 }}
Close event: subscribe to Supabase trades table INSERT; on INSERT, determine close type from close_type column → pulse MONITOR node emerald (TP / ML_EXIT) or rose (SL_HIT) with 3× scale keyframe, then reset state machine to IDLE after 2 s

23.8 TradingView Lightweight Charts integration

Tasks:

CandlestickChart.tsx: initialize chart in useEffect with cleanup; use useRef<IChartApi> for the chart instance to survive re-renders

const chart = createChart(containerRef.current, {
  layout: { background: { color: '#020617' }, textColor: '#94a3b8' },
  grid: { vertLines: { color: '#1e293b' }, horzLines: { color: '#1e293b' } },
  rightPriceScale: { borderColor: '#334155' },
  timeScale: { borderColor: '#334155', timeVisible: true },
  width: containerRef.current.clientWidth,
  height: 420,
});

On mount: fetch last 200 M15 bars from GET /api/bars?symbol=BTCUSD&tf=M15&limit=200 (add this Next.js API route that proxies to the MT5 HTTP API)
On each SignalEvent: pass signal.ohlcv to candleSeries.update() — Lightweight Charts updates the last bar or appends a new one and handles the scrolling automatically
BUY/SELL markers: accumulate markers array from signalStore.history; call candleSeries.setMarkers(markers) on each update — Lightweight Charts re-renders only changed markers
SL/TP lines: use chart.addLineSeries({ lineStyle: LineStyle.Dashed }) for SL (rose) and TP (emerald); update price on position change; series.setData([]) when no position open
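Confidence histogram panel: add chart.addHistogramSeries({ priceScaleId: 'confidence' }) and confine it to a bottom strip with chart.priceScale('confidence').applyOptions({ scaleMargins: { top: 0.8, bottom: 0 } }) (histogram series have no height option) — maps signal.confidence to bar height; color by direction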
ResizeObserver: watch container width changes → chart.applyOptions({ width: newWidth }) for responsiveness

23.9 Neural attention heatmap

ECharts heatmap config:

const option = {
  tooltip: { formatter: ({ data }) => `Bar ${data[0]}: ${data[1]} = ${data[2].toFixed(3)}` },
  xAxis: { type: 'category', data: barLabels,   // e.g., ["-47", "-46", ..., "0"] (48 labels)
            axisLabel: { interval: 7 } },
  yAxis: { type: 'category', data: featureNames.slice(0, 10) },
  visualMap: { min: 0, max: 1, calculable: true,
                inRange: { color: ['#1e3a5f', '#2563eb', '#ef4444'] } },  // blue→red
  series: [{ type: 'heatmap', data: flatData,
              itemStyle: { borderRadius: 2, borderWidth: 0.5, borderColor: '#0f172a' } }],
};

Stagger animation:

// On new signal, reset and re-animate each column with increasing delay
flatData.forEach((cell, i) => {
  const col = cell[0];
  setTimeout(() => {
    updateCell(col, cell[1], cell[2]);
  }, col * 8);  // 8 ms per column; the last of 48 columns starts at 47 × 8 = 376 ms
});

Tasks:

AttentionHeatmap.tsx: use echarts-for-react wrapper; pass option as prop; style={{ height: 280 }}; update option via useState triggered by signal.attention_weights
Transform attention_weights: number[48] into flatData: for each of the top 10 features (by SHAP), duplicate the bar-level attention weight — weight is the same per bar regardless of feature (TFT VSN gives per-feature weights separately; show VSN weights on Y axis if available)
Show placeholder <div className="...">Attention available after Phase 22.4</div> if signal.attention_weights is null or undefined

23.10 Equity curve + live P&L panel

Live P&L tick calculation:

const TICK_MS = 100;
const lotSize = openPosition?.lot ?? 0;
const direction = openPosition?.direction === 'BUY' ? 1 : -1;

useEffect(() => {
  if (!openPosition) return;
  const interval = setInterval(() => {
    // Estimate tick P&L from last known close price in latest signal
    const currentPrice = latestSignal?.ohlcv?.close ?? openPosition.entry;
    const priceDiff = (currentPrice - openPosition.entry) * direction;
    const pnl = priceDiff * lotSize * 100;  // 100 USC per lot per point (BTCUSD CFD)
    setLivePnl(pnl);
  }, TICK_MS);
  return () => clearInterval(interval);
}, [openPosition, latestSignal]);

Tasks:

EquityPanel.tsx: <AreaChart> from Recharts with gradient fill (<defs><linearGradient> — emerald above baseline, transparent below); data from useEquityHistory hook (Supabase query SELECT equity, created_at FROM account_snapshots ORDER BY created_at DESC LIMIT 2000; reverse the rows to ascending created_at before plotting)
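useEquityHistory.ts: initial query on mount + Supabase realtime subscription supabase.channel('account_snapshots').on('postgres_changes', { event: 'INSERT', schema: 'public', table: 'account_snapshots' }, handler).subscribe() (the 'public' schema is assumed); unsubscribe on unmount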
Live P&L counter: <motion.span animate={{ color: livePnl >= 0 ? '#10b981' : '#f43f5e' }}> with transition={{ duration: 0.15 }}; value formatted as +$123.45 / -$12.30 (raw USC with $ prefix per CLAUDE.md)
Today’s stats row: derive from signalStore.history — trades_today = history.filter(s => sameDay(s.ts)), wr_today = wins/trades_today, gross_pnl_today = sum of closed trade pnl from Supabase
Last 5 trades table: columns Dir · Conf · P&L · Close Type; ML_EXIT shown in emerald, SL_HIT in rose, TP_HIT in teal; sorted by close time descending

23.11 (Stretch) 3D equity surface — Three.js / R3F

Surface data construction: Run python scripts/surface_sweep.py (new script) that iterates confidence_threshold ∈ [0.55, 0.60, …, 0.85] × rolling 90-day windows and records cumulative_return at each point. Output: models/surface_data.json with shape (n_thresholds, n_days).

Geometry construction:

// surface[i][j] = equity at threshold i, day j
// Normalize: x = j / n_days, z = i / n_thresholds, y = equity / max_equity

const geometry = new THREE.PlaneGeometry(10, 10, n_days - 1, n_thresholds - 1);
geometry.rotateX(-Math.PI / 2);  // PlaneGeometry is built in the XY plane; lay it flat so Y is height
surface.flat().forEach((equity, idx) => {
  geometry.attributes.position.setY(idx, (equity / maxEquity) * 3);  // Y = height
});
geometry.attributes.position.needsUpdate = true;
geometry.computeVertexNormals();  // for lighting

// Vertex colors (jet colormap)
const colors = surface.flat().map(e => jetColor(e / maxEquity));
geometry.setAttribute('color', new THREE.BufferAttribute(new Float32Array(colors.flat()), 3));

Tasks:

scripts/surface_sweep.py: runs backtest.py --oos-only in a subprocess for each confidence threshold (7 values × 1 sweep = ~15 min total); outputs models/surface_data.json
components/dashboard/EquitySurface3D.tsx using @react-three/fiber:
- <Canvas camera={{ position: [8, 5, 8], fov: 45 }}> + <OrbitControls autoRotate autoRotateSpeed={0.5} />
- Mesh: PlaneGeometry with vertex colors (jet colormap), MeshPhongMaterial({ vertexColors: true, wireframe: false })
- Wireframe overlay: same geometry with MeshBasicMaterial({ wireframe: true, color: '#1e293b', transparent: true, opacity: 0.3 })
- Lighting: <ambientLight intensity={0.4} /> + <directionalLight position={[10, 10, 5]} />
- Axis labels as <Text> sprites from @react-three/drei
Gate behind ?surface=1 URL param; add toggle button in dashboard header; lazy-import component to avoid bundling Three.js by default
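The output contract of scripts/surface_sweep.py can be pinned down before the backtest wiring exists. The sketch below builds the (n_thresholds, n_days) grid and serializes it; the JSON key names are assumptions, and placeholder_curve is a synthetic stand-in for a real backtest.py run, so its numbers carry no meaning.

```python
import json

THRESHOLDS = [(55 + 5 * i) / 100 for i in range(7)]  # 0.55, 0.60, ..., 0.85
N_DAYS = 90


def placeholder_curve(threshold: float, n_days: int) -> list[float]:
    """Synthetic stand-in for one backtest run; NOT real equity data."""
    return [1000.0 + day * threshold for day in range(n_days)]


def build_surface(thresholds: list[float], n_days: int, curve_fn) -> dict:
    """Assemble the grid: surface[i][j] = equity at threshold i, day j."""
    return {
        "thresholds": thresholds,
        "n_days": n_days,
        "surface": [curve_fn(th, n_days) for th in thresholds],
    }


data = build_surface(THRESHOLDS, N_DAYS, placeholder_curve)
payload = json.dumps(data)  # what would be written to models/surface_data.json
restored = json.loads(payload)
```

Storing the thresholds alongside the grid lets the frontend label the axis without hard-coding the sweep range.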

Full quarterly roadmap → Roadmap.

Completed phases (1–12)

Phase 1 — Fix the model

SELL recall improved 3% → 30% by adding 5 directional features: ema_stack, candle_direction, volume_delta, rsi_slope, consecutive_direction
TP=0.3% / SL=0.25% tuning: WR 60.6%→77.3%, PF 1.48→2.85, MaxDD 12%→6.3%
Training data extended 365 → 730 days; lookahead_candles 12 → 24
LightGBM feature-name warning fixed (pass numpy array directly)

Phase 2 — Multi-timeframe features

H4/D1 features resampled from M15: h4_ema_bias, h4_rsi_norm (#5 SHAP), h4_macd_dir, d1_trend, price_vs_d1_open (#7 SHAP)
Session flags: is_london_session, is_ny_session, is_asian_session, session_hour_sin/cos
Volatility/momentum: atr_percentile (top feature), volume_surge, bb_squeeze, price_acceleration
S/R proximity: dist_to_round_number (#1 SHAP overall), near_daily_high_low, adx_14, market_quality, momentum_decay, adverse_candle_ratio
ATR-aware labeling groundwork: create_labels_atr_aware() added (activated in Phase 10)

Phase 3 — Hyperparameter tuning + SHAP

Optuna 50-trial tuning: WF 44.9%→45.7%, PF 2.82→2.96, MaxDD 7.6%→3.8%, Sharpe 7.23→8.63
SHAP analysis: ml/shap_analysis.py with TreeExplainer; beeswarm plots; models/shap_summary.json
Top SHAP features: h4_rsi_norm > atr_14 > hourly_return > price_vs_ema200 > session_hour_sin
Feature pruning tested and rejected — removing low-SHAP features hurt performance (WR 55.9%→50%)
Sequence model deferred (CPU-only hardware too slow)

Phase 4 — Risk management (superseded)

Items superseded by ML-driven config sweeps in Phases 9/13/14.

EMA trend filter, partial profit taking, trailing stop — all superseded by ML active management
max_weekly_drawdown_pct added to config.json (2026-04-11)

Phase 5 — Backtesting

backtest.py built: config-faithful OOS backtester with ONNX inference; WR, PF, Sharpe, MaxDD, Return
v8 results (1yr OOS, 48 features): Setup A WR=76.4% PF=3.20 MaxDD=2.6% | Setup D Return=+747%
Walk-forward backtest deferred to Phase 16

Phase 6 — Automation & monitoring

Signal logging to models/ml_performance.csv
Performance monitoring via weekly walk-forward OOS gate (weekly_optimize.py)
Live alerts via trading/telegram_commands.py + scripts/notify.py

Phase 7 — Live trading integration

Migrated trading.py from MetaTrader5 Python package to NOVOSKY HTTP API
--dry flag added for safe testing
Paper trade validated via OOS backtest: WR=82.7%, PF=4.27, MaxDD=5.2%

Phase 8 — ML active trade management (2026-04-10)

Dedicated position model: RF+XGB+LGB on 63 features (59 market + 4 position-state)
Labels: HOLD / EXIT / ADD — ml/position_labeling.py + ml/position_trainer.py
PositionPredictor.get_position_action() with 2/3 majority vote
Kelly-adjusted lot sizing, ML-based SL/TP scaling, partial close, trailing stop
Results: Signal WF=43.66%, Position ensemble=73.50% | Setup E: WR=78.8% PF=4.52 MaxDD=1.6%

Phase 9 — Growth config sweep (2026-04-10)

Goal: maximize monthly return for $10k account, IC Markets RAW
Best: conf=0.55, risk=20%, max_lot=10, all Phase 8 active management disabled
Result: WR=57.4%, PF=1.75, MaxDD=15.0%, Sharpe=4.68, Return=+8449%, ~340 trades/yr

Phase 10 — Deep optimization (2026-04-11)

Four critical bugs fixed and retrained:

Label-execution mismatch — activated atr_aware labeling in ml_config.json
Class imbalance — replaced downsample with compute_sample_weight('balanced')
Spread underestimation — backtest.py spread fallback $0.30 → $14.59
ADX regime filter — new adx_regime_filter block in config.json

Retrain result: WR 48.8%→78.6%, PF 1.15→2.91, MaxDD 59.8%→22.1%, Sharpe 2.71→14.11

Phase 11 — M15 scalping + local timezone support (2026-04-11)

H1 → M15 timeframe; 112 trades/yr → 477 trades/yr (1.31/day)
ATR-aware labels: SL=0.8×ATR, TP=1.5×ATR, lookahead=48 bars
sl_atr_multiplier 1.0→0.8; min_atr 50→15; ADX filter disabled
Local timezone support via config.json; Telegram redesign
Backtest: WR=63.7%, PF=3.05, MaxDD=19.6%, Sharpe=7.29, Return=+1,284,866%

Phase 12 — Production hardening (2026-04-12)

15 critical issues resolved:

API retry with exponential backoff (3× on network errors)
Atomic state.json writes via .tmp → os.replace()
tracked_positions persisted to state.json — survives PM2 restarts
risk_percent 6→2; max_consecutive_losses 10→5; max_weekly_drawdown_pct 0→20
Full retrain: fresh 2yr M15 data, Optuna 50-trial local tuning, OOS backtest
