NOVOSKY ships with three Claude Code agents in .claude/agents/. Each is specialized for a specific type of work and pre-loaded with the codebase context.
Which agent for which task
What do you want to do?
|
+-- "Check current performance / establish baseline"
| -> novosky-optimizer (Phase 0, read-only, ~5 min)
|
+-- "Improve without retraining (conf, risk%, SL/TP, filters)"
| -> risk-tuner (~30 min)
|
+-- "Add, remove, or redesign an ML feature"
| -> feature-engineer (requires retrain, ~60-90 min)
|
+-- "Run a full optimization cycle (tune + retrain + validate)"
| -> novosky-optimizer (--iterations 1, ~2 hrs)
|
+-- "Something is broken / model mismatch"
| -> See Troubleshooting
Agents
| Agent | Model | Best for |
|---|---|---|
| novosky-optimizer | claude-opus-4-6 | Full loop: SHAP → tune → retrain → OOS validate. Multi-step ML reasoning. |
| risk-tuner | claude-sonnet-4-6 | Sweep config params (confidence, risk%, SL/TP) without retraining. Fast and focused. |
| feature-engineer | claude-opus-4-6 | Add/remove/redesign features with Three-File Rule enforcement. Requires retrain. |
Setup (once)
chmod +x scripts/run_agent.sh
How to invoke
Always use the TASK= variable pattern. Never rely on inline backslash continuation: a blank line between the trailing \ and the prompt silently passes an empty task.
# Standard invocation
TASK="your task description here"
./scripts/run_agent.sh novosky-optimizer "$TASK"
./scripts/run_agent.sh risk-tuner "$TASK"
./scripts/run_agent.sh feature-engineer "$TASK"
# Resume last session
./scripts/run_agent.sh novosky-optimizer "$TASK" --continue
# Named session
./scripts/run_agent.sh novosky-optimizer "$TASK" --name phase15
# Resume a named session later
./scripts/run_agent.sh novosky-optimizer "$TASK" --name phase15 --continue
run_agent.sh wraps the Claude CLI and streams tool calls, bash output, and responses live. Sessions are named and saved.
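The empty-task failure mode described above can be caught early in the wrapper. A minimal sketch of such a guard (hypothetical; `check_task` is illustrative, the real run_agent.sh may differ):

```shell
# Hypothetical guard sketch: fail fast when the task argument arrives empty,
# which is exactly what a stray blank line after a trailing backslash produces.
check_task() {
  if [ -z "$2" ]; then
    echo "ERROR: empty TASK passed to agent '$1'" >&2
    return 1
  fi
  echo "agent=$1 task-length=${#2}"
}

check_task risk-tuner "sweep confidence"   # agent=risk-tuner task-length=16
```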
Production prompts
Copy-paste ready. Use these exact prompts — they include all required flags.
1. Assess current state — no changes (~5 min)
TASK="Run Phase 0 only: read optimize_best.json, print config snapshot (conf threshold, risk%, feature count, position model on/off), run OOS backtest: python backtest_config.py --balance 500 --no-swap --leverage 500 --spread 16.95 --oos-only --no-chart. Report WR, PF, MaxDD, Sharpe, Score=WR*PF/sqrt(MaxDD). No changes."
./scripts/run_agent.sh novosky-optimizer "$TASK"
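The score formula in the Phase 0 prompt can be sanity-checked in isolation. A small sketch, assuming the units implied by the prompt-6 targets (WR as a fraction, MaxDD as a percentage figure):

```python
import math

def score(wr: float, pf: float, max_dd: float) -> float:
    """Score = WR * PF / sqrt(MaxDD), as reported in Phase 0."""
    return wr * pf / math.sqrt(max_dd)

# The prompt-6 targets (WR>=60%, PF>=2.5, MaxDD<=20%) land right at the
# Score>=0.35 threshold under this unit convention:
print(round(score(0.60, 2.5, 20.0), 3))  # 0.335
```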
2. Config sweep — no retrain (~30 min)
TASK="Sweep for VT Markets (balance=500, leverage=500, spread=16.95, no-swap, max-lot=1.0): sweep confidence_threshold from 0.50 to 0.62 in steps of 0.02, then risk_percent from 3 to 8 in steps of 1. After each change run: python backtest_config.py --balance 500 --no-swap --leverage 500 --spread 16.95 --oos-only --max-lot 1.0 --no-chart. Set the best combo and apply it. Then send a Telegram notification -- explain what was swept, what the best config is, what score improved from/to, and whether MaxDD target is now met."
./scripts/run_agent.sh risk-tuner "$TASK"
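The sweep above is sequential (confidence first, then risk%), which keeps the run count small. A sketch of the grid it implies, with parameter values taken from the prompt (the agent drives the actual backtests):

```python
# Confidence sweep: 0.50 to 0.62 in steps of 0.02 -> 7 values.
conf_values = [round(0.50 + 0.02 * i, 2) for i in range(7)]
# Risk sweep: 3 to 8 in steps of 1 -> 6 values.
risk_values = list(range(3, 9))

# Sequential sweep: 7 + 6 = 13 backtests, versus 42 for a full cross-product.
print(conf_values)
print(risk_values)
print(len(conf_values) + len(risk_values))  # 13
```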
3. One optimization iteration (~2 hrs)
TASK="Fix tech debt first (TD-1 position label mismatch if pos_opt enabled, TD-4 if MaxDD>25%, TD-5 SHAP for news_surprise/bars_since_news). Then: python scripts/optimize_loop.py --iterations 1 --trials 50 --drop-threshold 0.0 --improvement-threshold 0.02. Do not remove/reorder features in ml_config.json. When done send a Telegram notification -- explain the outcome (improved/reverted), what score changed from/to, what specifically changed, which tech debt was addressed, and what the next priority is."
./scripts/run_agent.sh novosky-optimizer "$TASK"
4. Continue previous session (2 more iterations)
TASK="Run 2 more optimization iterations without dropping features: python scripts/optimize_loop.py --iterations 2 --trials 50 --drop-threshold 0.0 --improvement-threshold 0.02. Keep ml_config.json features unchanged. Report per-iteration score and all-time best. When done send a Telegram notification summarising both iterations -- what improved vs reverted, the net score change, and the recommended next action."
./scripts/run_agent.sh novosky-optimizer "$TASK" --continue
5. Feature audit — diagnostics only (no removal)
TASK="Audit all features by SHAP from models/shap_summary.json. Identify low-importance features (for monitoring only), especially news_surprise and bars_since_news, but DO NOT remove any feature from ml_config.json. Keep full feature set intact, retrain only for parameter/labeling updates if needed, and run OOS backtest. When done send a Telegram notification -- explain what was learned from SHAP, what was tuned, and the score change."
./scripts/run_agent.sh feature-engineer "$TASK"
6. Full hands-off pipeline (~6 hrs)
TASK="Run the full NOVOSKY optimization pipeline. Phase 0: baseline OOS backtest. Phase 1: fix tech debt TD-1, TD-4, TD-5. Phase 2: python scripts/optimize_loop.py --iterations 3 --trials 50 --drop-threshold 0.0 --improvement-threshold 0.02. Keep the full ml_config.json feature list unchanged. Phase 3: final OOS backtest + python trading.py --dry. Phase 4: document results in strategy_params.json. Target WR>=60% PF>=2.5 MaxDD<=20% Score>=0.35. When done send a Telegram notification -- cover all 3 iterations (improved/reverted each), net score progression, which targets are now met vs still missing, and the single most important next action."
./scripts/run_agent.sh novosky-optimizer "$TASK"
Optimization priority order
Run in this order — later steps are slower and more expensive:
- Read state — novosky-optimizer Phase 0, no changes (~5 min)
- Config-only — risk-tuner confidence/risk/filters sweep (~30 min)
- Feature audit — feature-engineer if the loop keeps reverting (~90 min)
- Full loop — novosky-optimizer, 1–3 iterations (~2 hrs each)
After optimization
# Verify dry-run passes
python trading.py --dry
# Push models to Cloudflare R2 (auto-tags v{YYYYMMDD})
python ml/r2_hub.py --push
# Commit config changes (model binaries are gitignored -- they live on R2)
git add config.json ml_config.json strategy_params.json
git commit -m "feat: Phase N -- WR=XX% PF=X.XX MaxDD=XX.X% Score=X.XX"
# Go live
python trading.py
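The v{YYYYMMDD} tag mentioned in the push step presumably derives from the current date. A sketch of that format (an assumption from the comment above; the actual tagging lives in ml/r2_hub.py):

```python
from datetime import date

def model_tag(d: date) -> str:
    # Tag format assumed from the push comment: "v" + YYYYMMDD.
    return f"v{d:%Y%m%d}"

print(model_tag(date(2025, 3, 14)))  # v20250314
```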
Monitoring a running session
Open a second terminal while the agent runs:
# Watch optimization log update live
watch -n 10 'cat models/optimize_log.json 2>/dev/null | python3 -m json.tool | tail -40'
# Confirm retrain happened (model files change)
watch -n 30 'ls -lh models/*.pkl | awk "{print \$5, \$6, \$7, \$9}"'
# Tail the agent run log
tail -f logs/agent_runs/<latest>.log
# Watch retrain progress by file timestamp updates
watch -n 60 'ls -lh models/*.pkl | awk "{print \$6, \$7, \$8, \$9}"'
Retrain doesn’t print back to the agent session in real time. If the agent goes quiet mid-run, check whether Python is actually running: ps aux | grep python | grep -v grep. The log file captures everything even when the terminal is silent.
Recovery
Session crashed mid-retrain
# Check model consistency
python3 -c "
import json
mc = json.load(open('models/model_compat.json'))
ml = json.load(open('ml_config.json'))
ok = mc['feature_count'] == len(ml['features'])
print('OK' if ok else 'MISMATCH -- restore snapshot')
"
# List snapshots
ls models/_snapshot_*/
# Restore a snapshot
SNAP=models/_snapshot_<tag>
cp "$SNAP"/ml_config.json .
cp "$SNAP"/*.pkl models/
python trading.py --dry
Feature-count mismatch after partial run
If an optimizer drops features from ml_config.json but the retrain doesn’t complete, deployed models still expect the original count. Fix by restoring from the compat manifest:
python3 -c "
import json
mc = json.load(open('models/model_compat.json'))
ml = json.load(open('ml_config.json'))
ml['features'] = mc['features']
with open('ml_config.json', 'w') as f: json.dump(ml, f, indent=2)
print(f'Restored to {mc[\"feature_count\"]} features.')
"
optimize_loop.py runs this check automatically at the start of every iteration.
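That startup check amounts to a one-line comparison. A pure-function sketch, with key names taken from the model_compat.json / ml_config.json snippets above:

```python
def features_consistent(model_compat: dict, ml_config: dict) -> bool:
    # Deployed models must see exactly the feature count they were trained with.
    return model_compat["feature_count"] == len(ml_config["features"])

print(features_consistent({"feature_count": 3},
                          {"features": ["a", "b", "c"]}))  # True
```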
SHAP analysis only
# Fast -- no training, writes models/shap_summary.json
python train_ml_model.py --shap-only
# Inspect bottom features (diagnostics only -- never drop based on SHAP alone)
python3 -c "
import json
s = json.load(open('models/shap_summary.json'))
ranked = s.get('ranked_features', [])
protected = {'is_news_near','news_minutes_away','news_count_today',
             'is_news_risk_window','is_london_session','is_ny_session','is_asian_session'}
print(f'Total features: {len(ranked)}')
for r in sorted(ranked, key=lambda x: x.get('mean_abs_shap', 0))[:15]:
    marker = ' <- PROTECTED' if r['feature'] in protected else ''
    print(f' {r.get(\"mean_abs_shap\", 0):.5f} {r[\"feature\"]}{marker}')
"
Technical notes
- run_agent.sh automatically adds --output-format stream-json --verbose (required for claude -p)
- Unicode box-drawing characters inside Python f-strings in shell heredocs cause SyntaxError — use plain ASCII in generated code
- Agents have full tool access: read, write, bash, grep, glob
- Never drop features from ml_config.json during optimization — use --drop-threshold 0.0 always. SHAP is for diagnostics, not removal.