This guide covers how to push NOVOSKY’s performance beyond what the automated weekly pipeline achieves. There are two paths: one that uses only standard Python CLI commands (no subscription required), and one that uses Claude Code agents for faster iteration.
Both paths lead to the same place — better models, higher OOS Score, lower drawdown.
Responsibility split: developers run optimization/retraining and publish approved model revisions to R2. Regular users only pull approved revisions and run trading with profiles 1-5.
Normal weekly rhythm (both paths)
| When | What |
|---|---|
| Sunday | Cron fires automatically — you sleep |
| Monday | Read Telegram, check if Score improved |
| Wednesday | 2-minute spot check |
| As needed | Manual optimization when performance degrades |
After each accepted run, publish and announce profile-specific revisions so traders can pull the correct tag (for example vYYYYMMDD-p1 to vYYYYMMDD-p5).
Monday check
# Last optimize result
python3 -c "
import json
b = json.load(open('models/optimize_best.json'))
print('Best ever Score :', b['score'])
print('Profile :', b.get('profile_name', 'unknown'))
print('Achieved :', b['achieved_at'][:10])
"
# Live trade stats (last 50 trades)
python scripts/performance_monitor.py --lookback 50
Wednesday spot check
pm2 status
python scripts/performance_monitor.py --lookback 20
Path A — Pure CLI (no external AI tools)
Everything below runs with standard Python. No subscriptions or external accounts beyond what you already have (MT5, R2, Telegram).
When Score degrades — standard optimization cycle
# 1. Establish baseline before touching anything
python backtest_config.py \
--balance 500 --no-swap --leverage 500 \
--spread 16.95 --oos-only --no-chart
# 2. Quick sweep without retraining (~45 min)
# Tests 40-50 config combos within your current profile's ranges
python scripts/weekly_optimize.py \
--skip-retrain \
--no-commit --no-push \
--profile balanced
# 3. If sweep didn't fix it — full retrain + sweep (~2.5 h)
python scripts/weekly_optimize.py \
--no-commit --no-push \
--profile balanced
# 4. Review the backtest result, then commit if happy
python backtest_config.py \
--balance 500 --no-swap --leverage 500 \
--spread 16.95 --oos-only --no-chart
git add config.json ml_config.json strategy_params.json PERF_HISTORY.md
git commit -m "feat: manual optimization — Score X.XX → Y.YY"
python ml/r2_hub.py --push
When profile-specific outputs differ, publish profile-tagged revisions and announce the exact tag users should pull (for example vYYYYMMDD-p3).
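A minimal publish sketch, assuming revisions are tracked as annotated git tags and that ml/r2_hub.py --push uploads the currently committed set; if your R2 hub has its own tagging mechanism, use that instead:
# Hypothetical tag name — substitute the real date and profile suffix
git tag v20250615-p3 -m "Profile 3 Balanced, OOS Score Y.YY"
git push origin v20250615-p3
python ml/r2_hub.py --push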
Tune hyperparameters manually
# Signal model — Optuna, 50 trials (~30 min)
python ml/tune/hyperparams.py --trials 50
# Position model — Optuna, 50 trials (~20 min)
python ml/tune/position.py --trials 50
# Then retrain with the new hyperparams
python train_ml_model.py --ensemble --position --refresh --shap
Retrain options
# Incremental (warm-start from current models — faster)
python train_ml_model.py --ensemble --position --risk --refresh --shap
# From scratch (full cold start — use after removing features)
python train_ml_model.py --ensemble --position --risk --no-warmstart --refresh --shap
# Train risk model only (signal + position must already be on disk)
python train_ml_model.py --risk
# SHAP analysis only (diagnostic, no training)
python train_ml_model.py --shap-only
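After any retrain it is worth eyeballing the metadata the trainer writes. A minimal sketch (the field names here are assumptions; inspect the JSON once to confirm what your version actually records):
python3 -c "
import json
meta = json.load(open('models/ensemble_btcusd-live_metadata.json'))
# 'trained_at', 'feature_count', 'oos_score' are assumed field names -- check your file
for key in ('trained_at', 'feature_count', 'oos_score'):
    print(f'{key:15s}: {meta.get(key, \"<not present>\")}')
"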
Adding a new feature (Three-File Rule)
The Three-File Rule: any feature change must touch ml/feature_engineering.py, ml_config.json, and backtest/ (verified, not necessarily edited) before retraining.
There is also a second Three-File Rule for the risk model: ml/risk_trainer.py RISK_FEATURES must equal ml/risk_predictor.py RISK_FEATURES must equal ml_config.json → risk_model.features. These are equity-state features (separate from the 55 signal features) — never mix them up.
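A quick way to verify the risk-model rule holds, assuming both modules expose RISK_FEATURES as importable module-level lists (which the rule implies):
python3 -c "
import json
from ml.risk_trainer import RISK_FEATURES as train_f
from ml.risk_predictor import RISK_FEATURES as pred_f
cfg_f = json.load(open('ml_config.json'))['risk_model']['features']
ok = list(train_f) == list(pred_f) == list(cfg_f)
print('IN SYNC' if ok else 'DIVERGED', '-', len(cfg_f), 'risk features')
"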
# Step 1: implement in ml/feature_engineering.py
# Add calculation inside add_all_features()
# The column must exist in the DataFrame before feature engineering returns
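# Sketch for a hypothetical feature (same formula as the Path B example further down):
#
#   df['volume_zscore_20'] = (
#       (df['volume'] - df['volume'].rolling(20).mean())
#       / df['volume'].rolling(20).std()
#   )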
# Step 2: add to ml_config.json features list
# Append the new feature name at the END of the "features" array
# Do NOT reorder existing features — scaler dimension order is fixed
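# Sketch: append programmatically (hypothetical feature name; indent=2 assumes
# the file's existing formatting, adjust if yours differs)
python3 -c "
import json
cfg = json.load(open('ml_config.json'))
if 'volume_zscore_20' not in cfg['features']:
    cfg['features'].append('volume_zscore_20')
json.dump(cfg, open('ml_config.json', 'w'), indent=2)
print('features now:', len(cfg['features']))
"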
# Step 3: check the feature-count delta (a mismatch here is expected until you retrain)
python3 -c "
import json
ml = json.load(open('ml_config.json'))
cc = json.load(open('models/model_compat.json'))
print('ml_config features :', len(ml['features']))
print('model_compat expects:', cc['feature_count'])
print('Delta :', len(ml['features']) - cc['feature_count'])
"
# Step 4: retrain from scratch (new feature changes scaler dimensions)
# Always include --risk to retrain the risk multiplier model alongside signal + position
python train_ml_model.py --ensemble --position --risk --no-warmstart --refresh --shap
# Step 5: OOS backtest — compare vs baseline
python backtest_config.py \
--balance 500 --no-swap --leverage 500 \
--spread 16.95 --oos-only --no-chart
# Step 6: if Score improved, push and commit
python ml/r2_hub.py --push
git add ml/feature_engineering.py ml_config.json \
models/model_compat.json models/ensemble_btcusd-live_metadata.json \
models/risk_metadata.json \
strategy_params.json PERF_HISTORY.md
git commit -m "feat: add <feature_name> — OOS Score X.XX → Y.YY"
Never remove is_news_near, is_london_session, is_ny_session, is_asian_session, or is_news_risk_window from the feature list based on low SHAP values. These features are near-zero during training by design but are injected with live values at inference time.
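Before committing any feature-list change, a one-line guard that the protected names survived (a sketch using only the five names listed above):
python3 -c "
import json
PROTECTED = {'is_news_near', 'is_london_session', 'is_ny_session',
             'is_asian_session', 'is_news_risk_window'}
feats = set(json.load(open('ml_config.json'))['features'])
missing = sorted(PROTECTED - feats)
print('All protected features present' if not missing else f'MISSING: {missing}')
"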
Test a config change before committing
# Test with a temporary config — production config.json is never touched
python3 -c "
import json, copy, subprocess, sys, tempfile, os
cfg = json.load(open('config.json'))
cfg['dynamic_position_sizing']['risk_percent'] = 2.5 # the change you want to test
with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False) as f:
json.dump(cfg, f)
tmp = f.name
r = subprocess.run([
sys.executable, 'backtest_config.py',
'--balance', '500', '--no-swap', '--leverage', '500',
'--spread', '16.95', '--oos-only', '--no-chart',
'--config', tmp,
], capture_output=True, text=True, cwd='.')
print(r.stdout[-3000:])
os.unlink(tmp)
"
Read SHAP results
# Run SHAP analysis
python train_ml_model.py --shap-only
# Read results
python3 -c "
import json
data = json.load(open('models/shap_summary.json'))
if 'ranked_features' in data:
print('Top 10 features:')
for f in data['ranked_features'][:10]:
print(f' {f[\"mean_abs_shap\"]:.4f} {f[\"feature\"]}')
print()
print('Bottom 10 features (never drop without careful review):')
for f in data['ranked_features'][-10:]:
print(f' {f[\"mean_abs_shap\"]:.5f} {f[\"feature\"]}')
"
Compare two configs head-to-head
python scripts/compare_configs.py \
--old-config config_backup.json \
--old-ml ml_config_backup.json
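The comparison needs backup snapshots to exist. One way to create them before an optimization run (assuming these filenames are only a convention and not consumed elsewhere):
cp config.json config_backup.json
cp ml_config.json ml_config_backup.json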
Full health check
python3 -c "
import json, os
# Feature integrity
mc = json.load(open('ml_config.json'))
cc = json.load(open('models/model_compat.json'))
feat_ok = cc['feature_count'] == len(mc['features'])
print(f'[{\"OK\" if feat_ok else \"FAIL\"}] Features: {len(mc[\"features\"])} (expect {cc[\"feature_count\"]})')
# Config sanity
cfg = json.load(open('config.json'))
risk = cfg['dynamic_position_sizing']['risk_percent']
cb = cfg['max_consecutive_losses']
total = cfg.get('max_total_drawdown_pct', 0)
wkly = cfg.get('max_weekly_drawdown_pct', 0)
rm = cfg.get('risk_model', {}).get('enabled', False)
print(f'[OK] Risk: {risk}% CB: {cb} Weekly DD: {wkly}% Total halt: {total}%')
print(f'[OK] Risk model enabled: {rm}')
# Signal + position model files
model_files = [
'models/ensemble_rf.pkl', 'models/ensemble_xgb.pkl', 'models/ensemble_lgb.pkl',
'models/ensemble_scaler.pkl', 'models/ensemble_btcusd-live_metadata.json',
'models/position_rf.pkl', 'models/model_compat.json',
]
for f in model_files:
exists = os.path.exists(f)
print(f'[{\"OK\" if exists else \"MISS\"}] {f}')
# Risk model files
risk_files = ['models/risk_lgb.txt', 'models/risk_scaler.pkl', 'models/risk_metadata.json']
for f in risk_files:
exists = os.path.exists(f)
print(f'[{\"OK\" if exists else \"MISS (will fallback to mult=1.0)\"}] {f}')
"
Path B — With Claude Code
This path requires a Claude Code subscription. If you do not have one, Path A above covers everything — same results, more steps at the keyboard.
Claude Code agents read logs, make decisions, fix errors mid-run, and iterate without you watching. You delegate a goal and get a Telegram notification when it finishes.
When Score degrades — agent-driven optimization
TASK="Run 1 full optimization cycle: SHAP analysis, Optuna tune (50 trials each model), retrain, OOS backtest. Report Score before and after. Do NOT drop any feature from PROTECTED_FEATURES. If Score improves, send Telegram notification."
./scripts/run_agent.sh novosky-optimizer "$TASK"
Adding a new feature
TASK="Add a new feature called 'volume_zscore_20' to the feature set. It should be the z-score of volume over a 20-bar rolling window: (volume - volume.rolling(20).mean()) / volume.rolling(20).std(). Implement in ml/feature_engineering.py and ml_config.json (append at end of features list). Enforce the Three-File Rule. Retrain from scratch with --no-warmstart. Run OOS backtest and compare Score vs baseline. Commit and push to R2 only if OOS Score improves by >= 2%."
./scripts/run_agent.sh feature-engineer "$TASK"
Risk/config parameter sweep without retraining
TASK="Run a config sweep for Profile 3 Balanced without retraining: test risk 1.5-2.0% and confidence 0.60-0.65. Use --skip-retrain flag. Find the combo with highest OOS Score, apply it to config.json. Run a final backtest to confirm. Report before/after and send Telegram."
./scripts/run_agent.sh risk-tuner "$TASK"
SHAP audit — flag candidates for removal
TASK="Run SHAP analysis. List the 5 lowest-importance features that are NOT in PROTECTED_FEATURES (is_news_near, is_london_session, is_ny_session, is_asian_session, is_news_risk_window, news_minutes_away, news_count_today, is_news_risk_window). For each candidate: explain what it measures, its current SHAP value, and whether it is a plausible removal candidate. Do not drop or retrain — just report."
./scripts/run_agent.sh novosky-optimizer "$TASK"
Multi-iteration optimization loop
TASK="Run 3 optimization iterations back-to-back. Each iteration: Optuna tune → retrain → OOS backtest. After each iteration compare Score vs previous. Stop early if Score has not improved in 2 consecutive iterations. At the end: push best models to R2, commit config changes, send Telegram with iteration-by-iteration Score history."
./scripts/run_agent.sh novosky-optimizer "$TASK"
Investigate a live performance drop
TASK="The bot's live win rate has dropped below 55% over the last 30 trades. Diagnose why: 1) Run OOS backtest with current models and config. 2) Check blocked_signals.csv for the dominant block reason. 3) Run SHAP to see if any top features have degraded. 4) Propose the single highest-impact fix (config change or retrain). Do not apply any fix — just diagnose and report."
./scripts/run_agent.sh novosky-optimizer "$TASK"
Comparison: Path A vs Path B
| Task | Path A (pure CLI) | Path B (Claude Code) |
|---|---|---|
| Config sweep | weekly_optimize.py --skip-retrain | run_agent.sh risk-tuner "..." |
| Full retrain + sweep | weekly_optimize.py | run_agent.sh novosky-optimizer "..." |
| Add feature | Edit 2 files + retrain manually | Delegate with TASK description |
| SHAP audit | train_ml_model.py --shap-only + manual read | Delegate — agent interprets results |
| Multi-iteration loop | Run weekly_optimize.py repeatedly | Delegate — agent decides when to stop |
| Error mid-run | You read logs and fix | Agent reads logs and retries |
| Time at keyboard | High | Low |
| Requires subscription | No | Yes (Claude Code) |
Decision tree: what to do when
Live WR below 55%?
└─ Check blocked_signals.csv — what is the top block reason? (counting sketch below)
   ├─ "circuit_breaker" → Too many consecutive losses. Wait for daily reset.
   ├─ "session_conf" → Confidence threshold too high. Run --skip-retrain sweep.
   ├─ "atr_floor" → Market is choppy / ranging. Expected during low-vol periods.
   └─ No dominant block, but WR low → Market regime shifted. Full retrain needed.
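A counting sketch for the block reasons; the column name 'reason' is an assumption, so check the CSV header first:
python3 -c "
import csv
from collections import Counter
rows = list(csv.DictReader(open('blocked_signals.csv')))
c = Counter(r.get('reason', '?') for r in rows[-200:])  # last 200 blocked signals
for reason, n in c.most_common(5):
    print(f'{n:5d}  {reason}')
"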
OOS Score dropped below 4.0?
└─ Run full weekly_optimize.py manually (--no-commit --no-push first to review).
Score not improving after 3 weekly runs in a row?
└─ Consider adding a new feature. Run SHAP first to see what the model leans on.
Hard halt fired?
├─ Update risk_profile.starting_balance_usd to current equity (sketch below).
├─ Optionally lower profile: --profile conservative --skip-retrain.
└─ Restart trading.py.
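Sketch for the balance update; the key path comes from above, the equity value is a placeholder, and note it rewrites the file with indent=2 formatting:
python3 -c "
import json
cfg = json.load(open('config.json'))
cfg['risk_profile']['starting_balance_usd'] = 431.20  # replace with current equity
json.dump(cfg, open('config.json', 'w'), indent=2)
print('starting_balance_usd ->', cfg['risk_profile']['starting_balance_usd'])
"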
Model compat mismatch error at startup?
├─ python ml/r2_hub.py --pull (re-download matching model set)
└─ If that fails: python train_ml_model.py --ensemble --position --no-warmstart