This guide covers how to push NOVOSKY’s performance beyond what the automated weekly pipeline achieves. There are two paths: one that uses only standard Python CLI commands (no subscription required), and one that uses Claude Code agents for faster iteration.
Both paths lead to the same place — better models, higher OOS Score, lower drawdown.
Responsibility split: developers run optimization/retraining and publish approved model revisions to R2. Regular users only pull approved revisions and run trading with profiles 1-5.
Normal weekly rhythm (both paths)
| When | What |
|---|---|
| Sunday | Cron fires automatically — you sleep |
| Monday | Read Telegram, check if Score improved |
| Wednesday | 2-minute spot check |
| As needed | Manual optimization when performance degrades |
After each accepted run, publish and announce profile-specific revisions so traders can pull the correct tag (for example vYYYYMMDD-p1 to vYYYYMMDD-p5).
Monday check
# Last optimize result
python3 -c "
import json
b = json.load(open('models/optimize_best.json'))
print('Best ever Score :', b['score'])
print('Profile :', b.get('profile_name', 'unknown'))
print('Achieved :', b['achieved_at'][:10])
"
# Live trade stats (last 50 trades)
python scripts/performance_monitor.py --lookback 50
Wednesday spot check
pm2 status
python scripts/performance_monitor.py --lookback 20
Path A — Pure CLI (no external AI tools)
Everything below runs with standard Python. No subscriptions or external accounts beyond what you already have (MT5, R2, Telegram).
When Score degrades — standard optimization cycle
# 1. Establish baseline before touching anything
python backtest_config.py \
--balance 500 --no-swap --leverage 500 \
--spread 16.95 --oos-only --no-chart
# 2. Quick sweep without retraining (~45 min)
# Tests 40-50 config combos within your current profile's ranges
python scripts/weekly_optimize.py \
--skip-retrain \
--no-commit --no-push \
--profile balanced
# 3. If sweep didn't fix it — full retrain + sweep (~2.5 h)
python scripts/weekly_optimize.py \
--no-commit --no-push \
--profile balanced
# 4. Review the backtest result, then commit if happy
python backtest_config.py \
--balance 500 --no-swap --leverage 500 \
--spread 16.95 --oos-only --no-chart
git add config.json ml_config.json strategy_params.json PERF_HISTORY.md
git commit -m "feat: manual optimization — Score X.XX → Y.YY"
python ml/r2_hub.py --push
When profile-specific outputs differ, publish profile-tagged revisions and announce the exact tag users should pull (for example vYYYYMMDD-p3).
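A minimal publish sketch, assuming revisions are tracked as annotated git tags and that ml/r2_hub.py --push uploads the currently committed set; if your R2 hub has its own tagging mechanism, use that instead:
# Hypothetical tag name — substitute the real date and profile suffix
git tag v20250615-p3 -m "Profile 3 Balanced, OOS Score Y.YY"
git push origin v20250615-p3
python ml/r2_hub.py --push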
Tune hyperparameters manually
# Signal model — Optuna, 50 trials (~30 min)
python ml/tune/hyperparams.py --trials 50
# Position model — Optuna, 50 trials (~20 min)
python ml/tune/position.py --trials 50
# Then retrain with the new hyperparams
python train_ml_model.py --ensemble --position --refresh --shap
Retrain options
# Incremental (warm-start from current models — faster)
python train_ml_model.py --ensemble --position --risk --refresh --shap
# From scratch (full cold start — use after removing features)
python train_ml_model.py --ensemble --position --risk --no-warmstart --refresh --shap
# Train risk model only (signal + position must already be on disk)
python train_ml_model.py --risk
# SHAP analysis only (diagnostic, no training)
python train_ml_model.py --shap-only
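After any retrain it is worth eyeballing the metadata the trainer writes. A minimal sketch (the field names here are assumptions; inspect the JSON once to confirm what your version actually records):
python3 -c "
import json
meta = json.load(open('models/ensemble_btcusd-live_metadata.json'))
# 'trained_at', 'feature_count', 'oos_score' are assumed field names -- check your file
for key in ('trained_at', 'feature_count', 'oos_score'):
    print(f'{key:15s}: {meta.get(key, \"<not present>\")}')
"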
Adding a new feature (Three-File Rule)
The Three-File Rule: any feature change must touch ml/feature_engineering.py, ml_config.json, and backtest/ (verified, not necessarily edited) before retraining.
There is also a second Three-File Rule for the risk model: ml/risk_trainer.py RISK_FEATURES must equal ml/risk_predictor.py RISK_FEATURES must equal ml_config.json → risk_model.features. These are equity-state features (separate from the 55 signal features) — never mix them up.
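A quick way to verify the risk-model rule holds, assuming both modules expose RISK_FEATURES as importable module-level lists (which the rule implies):
python3 -c "
import json
from ml.risk_trainer import RISK_FEATURES as train_f
from ml.risk_predictor import RISK_FEATURES as pred_f
cfg_f = json.load(open('ml_config.json'))['risk_model']['features']
ok = list(train_f) == list(pred_f) == list(cfg_f)
print('IN SYNC' if ok else 'DIVERGED', '-', len(cfg_f), 'risk features')
"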
# Step 1: implement in ml/feature_engineering.py
# Add calculation inside add_all_features()
# The column must exist in the DataFrame before feature engineering returns
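# Sketch for a hypothetical feature (same formula as the Path B example further down):
#
#   df['volume_zscore_20'] = (
#       (df['volume'] - df['volume'].rolling(20).mean())
#       / df['volume'].rolling(20).std()
#   )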
# Step 2: add to ml_config.json features list
# Append the new feature name at the END of the "features" array
# Do NOT reorder existing features — scaler dimension order is fixed
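# Sketch: append programmatically (hypothetical feature name; indent=2 assumes
# the file's existing formatting, adjust if yours differs)
python3 -c "
import json
cfg = json.load(open('ml_config.json'))
if 'volume_zscore_20' not in cfg['features']:
    cfg['features'].append('volume_zscore_20')
json.dump(cfg, open('ml_config.json', 'w'), indent=2)
print('features now:', len(cfg['features']))
"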
# Step 3: check the feature-count delta (a mismatch here is expected until you retrain)
python3 -c "
import json
ml = json.load(open('ml_config.json'))
cc = json.load(open('models/model_compat.json'))
print('ml_config features :', len(ml['features']))
print('model_compat expects:', cc['feature_count'])
print('Delta :', len(ml['features']) - cc['feature_count'])
"
# Step 4: retrain from scratch (new feature changes scaler dimensions)
# Always include --risk to retrain the risk multiplier model alongside signal + position
python train_ml_model.py --ensemble --position --risk --no-warmstart --refresh --shap
# Step 5: OOS backtest — compare vs baseline
python backtest_config.py \
--balance 500 --no-swap --leverage 500 \
--spread 16.95 --oos-only --no-chart
# Step 6: if Score improved, push and commit
python ml/r2_hub.py --push
git add ml/feature_engineering.py ml_config.json \
models/model_compat.json models/ensemble_btcusd-live_metadata.json \
models/risk_metadata.json \
strategy_params.json PERF_HISTORY.md
git commit -m "feat: add <feature_name> — OOS Score X.XX → Y.YY"
Never remove is_news_near, is_london_session, is_ny_session, is_asian_session, or is_news_risk_window from the feature list based on low SHAP values. These features are near-zero during training by design but are injected with live values at inference time.
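Before committing any feature-list change, a one-line guard that the protected names survived (a sketch using only the five names listed above):
python3 -c "
import json
PROTECTED = {'is_news_near', 'is_london_session', 'is_ny_session',
             'is_asian_session', 'is_news_risk_window'}
feats = set(json.load(open('ml_config.json'))['features'])
missing = sorted(PROTECTED - feats)
print('All protected features present' if not missing else f'MISSING: {missing}')
"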
Test a config change before committing
# Test with a temporary config — production config.json is never touched
python3 -c "
import json, copy, subprocess, sys, tempfile, os
cfg = json.load(open('config.json'))
cfg['dynamic_position_sizing']['risk_percent'] = 2.5 # the change you want to test
with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False) as f:
json.dump(cfg, f)
tmp = f.name
r = subprocess.run([
sys.executable, 'backtest_config.py',
'--balance', '500', '--no-swap', '--leverage', '500',
'--spread', '16.95', '--oos-only', '--no-chart',
'--config', tmp,
], capture_output=True, text=True, cwd='.')
print(r.stdout[-3000:])
os.unlink(tmp)
"
Read SHAP results
# Run SHAP analysis
python train_ml_model.py --shap-only
# Read results
python3 -c "
import json
data = json.load(open('models/shap_summary.json'))
if 'ranked_features' in data:
print('Top 10 features:')
for f in data['ranked_features'][:10]:
print(f' {f[\"mean_abs_shap\"]:.4f} {f[\"feature\"]}')
print()
print('Bottom 10 features (never drop without careful review):')
for f in data['ranked_features'][-10:]:
print(f' {f[\"mean_abs_shap\"]:.5f} {f[\"feature\"]}')
"
Compare two configs head-to-head
python scripts/compare_configs.py \
--old-config config_backup.json \
--old-ml ml_config_backup.json
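The comparison needs backup snapshots to exist. One way to create them before an optimization run (assuming these filenames are only a convention and not consumed elsewhere):
cp config.json config_backup.json
cp ml_config.json ml_config_backup.json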
Full health check
python3 -c "
import json, os
# Feature integrity
mc = json.load(open('ml_config.json'))
cc = json.load(open('models/model_compat.json'))
feat_ok = cc['feature_count'] == len(mc['features'])
print(f'[{\"OK\" if feat_ok else \"FAIL\"}] Features: {len(mc[\"features\"])} (expect {cc[\"feature_count\"]})')
# Config sanity
cfg = json.load(open('config.json'))
risk = cfg['dynamic_position_sizing']['risk_percent']
cb = cfg['max_consecutive_losses']
total = cfg.get('max_total_drawdown_pct', 0)
wkly = cfg.get('max_weekly_drawdown_pct', 0)
rm = cfg.get('risk_model', {}).get('enabled', False)
print(f'[OK] Risk: {risk}% CB: {cb} Weekly DD: {wkly}% Total halt: {total}%')
print(f'[OK] Risk model enabled: {rm}')
# Signal + position model files
model_files = [
'models/ensemble_rf.pkl', 'models/ensemble_xgb.pkl', 'models/ensemble_lgb.pkl',
'models/ensemble_scaler.pkl', 'models/ensemble_btcusd-live_metadata.json',
'models/position_rf.pkl', 'models/model_compat.json',
]
for f in model_files:
exists = os.path.exists(f)
print(f'[{\"OK\" if exists else \"MISS\"}] {f}')
# Risk model files
risk_files = ['models/risk_lgb.txt', 'models/risk_scaler.pkl', 'models/risk_metadata.json']
for f in risk_files:
exists = os.path.exists(f)
print(f'[{\"OK\" if exists else \"MISS (will fallback to mult=1.0)\"}] {f}')
"
Path B — With Claude Code
This path requires a Claude Code subscription. If you do not have one, Path A above covers everything — same results, more steps at the keyboard.
Claude Code agents read logs, make decisions, fix errors mid-run, and iterate without you watching. You delegate a goal and get a Telegram notification when it finishes.
When Score degrades — agent-driven optimization
TASK="Run 1 full optimization cycle: SHAP analysis, Optuna tune (50 trials each model), retrain, OOS backtest. Report Score before and after. Do NOT drop any feature from PROTECTED_FEATURES. If Score improves, send Telegram notification."
./scripts/run_agent.sh novosky-optimizer "$TASK"
Adding a new feature
TASK="Add a new feature called 'volume_zscore_20' to the feature set. It should be the z-score of volume over a 20-bar rolling window: (volume - volume.rolling(20).mean()) / volume.rolling(20).std(). Implement in ml/feature_engineering.py and ml_config.json (append at end of features list). Enforce the Three-File Rule. Retrain from scratch with --no-warmstart. Run OOS backtest and compare Score vs baseline. Commit and push to R2 only if OOS Score improves by >= 2%."
./scripts/run_agent.sh feature-engineer "$TASK"
Risk/config parameter sweep without retraining
TASK="Run a config sweep for Profile 3 Balanced without retraining: test risk 1.5-2.0% and confidence 0.60-0.65. Use --skip-retrain flag. Find the combo with highest OOS Score, apply it to config.json. Run a final backtest to confirm. Report before/after and send Telegram."
./scripts/run_agent.sh risk-tuner "$TASK"
SHAP audit — flag candidates for removal
TASK="Run SHAP analysis. List the 5 lowest-importance features that are NOT in PROTECTED_FEATURES (is_news_near, is_london_session, is_ny_session, is_asian_session, is_news_risk_window, news_minutes_away, news_count_today, is_news_risk_window). For each candidate: explain what it measures, its current SHAP value, and whether it is a plausible removal candidate. Do not drop or retrain — just report."
./scripts/run_agent.sh novosky-optimizer "$TASK"
Multi-iteration optimization loop
TASK="Run 3 optimization iterations back-to-back. Each iteration: Optuna tune → retrain → OOS backtest. After each iteration compare Score vs previous. Stop early if Score has not improved in 2 consecutive iterations. At the end: push best models to R2, commit config changes, send Telegram with iteration-by-iteration Score history."
./scripts/run_agent.sh novosky-optimizer "$TASK"
Investigate a live performance drop
TASK="The bot's live win rate has dropped below 55% over the last 30 trades. Diagnose why: 1) Run OOS backtest with current models and config. 2) Check blocked_signals.csv for the dominant block reason. 3) Run SHAP to see if any top features have degraded. 4) Propose the single highest-impact fix (config change or retrain). Do not apply any fix — just diagnose and report."
./scripts/run_agent.sh novosky-optimizer "$TASK"
Comparison: Path A vs Path B
| Task | Path A (pure CLI) | Path B (Claude Code) |
|---|---|---|
| Config sweep | weekly_optimize.py --skip-retrain | run_agent.sh risk-tuner "..." |
| Full retrain + sweep | weekly_optimize.py | run_agent.sh novosky-optimizer "..." |
| Add feature | Edit 2 files + retrain manually | Delegate with TASK description |
| SHAP audit | train_ml_model.py --shap-only + manual read | Delegate — agent interprets results |
| Multi-iteration loop | Run weekly_optimize.py repeatedly | Delegate — agent decides when to stop |
| Error mid-run | You read logs and fix | Agent reads logs and retries |
| Time at keyboard | High | Low |
| Requires subscription | No | Yes (Claude Code) |
Decision tree: what to do when
Live WR below 55%?
└─ Check blocked_signals.csv — what is the top block reason? (counting sketch below)
   ├─ "circuit_breaker" → Too many consecutive losses. Wait for daily reset.
   ├─ "session_conf" → Confidence threshold too high. Run --skip-retrain sweep.
   ├─ "atr_floor" → Market is choppy / ranging. Expected during low-vol periods.
   └─ No dominant block, but WR low → Market regime shifted. Full retrain needed.
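A counting sketch for the block reasons; the column name 'reason' is an assumption, so check the CSV header first:
python3 -c "
import csv
from collections import Counter
rows = list(csv.DictReader(open('blocked_signals.csv')))
c = Counter(r.get('reason', '?') for r in rows[-200:])  # last 200 blocked signals
for reason, n in c.most_common(5):
    print(f'{n:5d}  {reason}')
"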
OOS Score dropped below 4.0?
└─ Run full weekly_optimize.py manually (--no-commit --no-push first to review).
Score not improving after 3 weekly runs in a row?
└─ Consider adding a new feature. Run SHAP first to see what the model leans on.
Hard halt fired?
├─ Update risk_profile.starting_balance_usd to current equity (sketch below).
├─ Optionally lower profile: --profile conservative --skip-retrain.
└─ Restart trading.py.
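Sketch for the balance update; the key path comes from above, the equity value is a placeholder, and note it rewrites the file with indent=2 formatting:
python3 -c "
import json
cfg = json.load(open('config.json'))
cfg['risk_profile']['starting_balance_usd'] = 431.20  # replace with current equity
json.dump(cfg, open('config.json', 'w'), indent=2)
print('starting_balance_usd ->', cfg['risk_profile']['starting_balance_usd'])
"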
Model compat mismatch error at startup?
├─ python ml/r2_hub.py --pull (re-download matching model set)
└─ If that fails: python train_ml_model.py --ensemble --position --no-warmstart