NOVOSKY ships with three Claude Code agents in .claude/agents/. Each is specialized for a specific type of work and pre-loaded with the codebase context.
Which agent for which task
What do you want to do?
|
+-- "Check current performance / establish baseline"
| -> novosky-optimizer (Phase 0, read-only, ~5 min)
|
+-- "Improve without retraining (conf, risk%, SL/TP, filters)"
| -> risk-tuner (~30 min)
|
+-- "Add, remove, or redesign an ML feature"
| -> feature-engineer (requires retrain, ~60-90 min)
|
+-- "Run a full optimization cycle (tune + retrain + validate)"
| -> novosky-optimizer (--iterations 1, ~2 hrs)
|
+-- "Something is broken / model mismatch"
| -> See Troubleshooting
Agents
| Agent | Model | Best for |
|---|---|---|
| novosky-optimizer | claude-opus-4-6 | Full loop: SHAP → tune → retrain → OOS validate. Multi-step ML reasoning. |
| risk-tuner | claude-sonnet-4-6 | Sweep config params (confidence, risk%, SL/TP) without retraining. Fast and focused. |
| feature-engineer | claude-opus-4-6 | Add/remove/redesign features with Three-File Rule enforcement. Requires retrain. |
Setup (once)
chmod +x scripts/run_agent.sh
How to invoke
Always use the TASK= variable pattern. Never rely on inline backslash continuation: a blank line between the trailing \ and the prompt silently passes an empty task.
# Standard invocation
TASK="your task description here"
./scripts/run_agent.sh novosky-optimizer "$TASK"
./scripts/run_agent.sh risk-tuner "$TASK"
./scripts/run_agent.sh feature-engineer "$TASK"
# Resume last session
./scripts/run_agent.sh novosky-optimizer "$TASK" --continue
# Named session
./scripts/run_agent.sh novosky-optimizer "$TASK" --name phase15
# Resume a named session later
./scripts/run_agent.sh novosky-optimizer "$TASK" --name phase15 --continue
run_agent.sh wraps the Claude CLI and streams tool calls, bash output, and responses live. Sessions are named and saved.
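The empty-task failure mode described above can be caught early in the wrapper. A minimal sketch of such a guard (hypothetical; `check_task` is illustrative, the real run_agent.sh may differ):

```shell
# Hypothetical guard sketch: fail fast when the task argument arrives empty,
# which is exactly what a stray blank line after a trailing backslash produces.
check_task() {
  if [ -z "$2" ]; then
    echo "ERROR: empty TASK passed to agent '$1'" >&2
    return 1
  fi
  echo "agent=$1 task-length=${#2}"
}

check_task risk-tuner "sweep confidence"   # agent=risk-tuner task-length=16
```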
Production prompts
Copy-paste ready. Use these exact prompts — they include all required flags.
1. Assess current state — no changes (~5 min)
TASK="Run Phase 0 only: read optimize_best.json, print config snapshot (conf threshold, risk%, feature count, position model on/off), run OOS backtest: python backtest_config.py --balance 500 --no-swap --leverage 500 --spread 16.95 --oos-only --no-chart. Report WR, PF, MaxDD, Sharpe, Score=WR*PF/sqrt(MaxDD). No changes."
./scripts/run_agent.sh novosky-optimizer "$TASK"
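The score formula in the Phase 0 prompt can be sanity-checked in isolation. A small sketch, assuming the units implied by the prompt-6 targets (WR as a fraction, MaxDD as a percentage figure):

```python
import math

def score(wr: float, pf: float, max_dd: float) -> float:
    """Score = WR * PF / sqrt(MaxDD), as reported in Phase 0."""
    return wr * pf / math.sqrt(max_dd)

# The prompt-6 targets (WR>=60%, PF>=2.5, MaxDD<=20%) land right at the
# Score>=0.35 threshold under this unit convention:
print(round(score(0.60, 2.5, 20.0), 3))  # 0.335
```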
2. Config sweep — no retrain (~30 min)
TASK="Sweep for VT Markets (balance=500, leverage=500, spread=16.95, no-swap, max-lot=1.0): sweep confidence_threshold from 0.50 to 0.62 in steps of 0.02, then risk_percent from 3 to 8 in steps of 1. After each change run: python backtest_config.py --balance 500 --no-swap --leverage 500 --spread 16.95 --oos-only --max-lot 1.0 --no-chart. Set the best combo and apply it. Then send a Telegram notification -- explain what was swept, what the best config is, what score improved from/to, and whether MaxDD target is now met."
./scripts/run_agent.sh risk-tuner "$TASK"
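The sweep above is sequential (confidence first, then risk%), which keeps the run count small. A sketch of the grid it implies, with parameter values taken from the prompt (the agent drives the actual backtests):

```python
# Confidence sweep: 0.50 to 0.62 in steps of 0.02 -> 7 values.
conf_values = [round(0.50 + 0.02 * i, 2) for i in range(7)]
# Risk sweep: 3 to 8 in steps of 1 -> 6 values.
risk_values = list(range(3, 9))

# Sequential sweep: 7 + 6 = 13 backtests, versus 42 for a full cross-product.
print(conf_values)
print(risk_values)
print(len(conf_values) + len(risk_values))  # 13
```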
3. One optimization iteration (~2 hrs)
TASK="Fix tech debt first (TD-1 position label mismatch if pos_opt enabled, TD-4 if MaxDD>25%, TD-5 SHAP for news_surprise/bars_since_news). Then: python scripts/optimize_loop.py --iterations 1 --trials 50 --drop-threshold 0.0 --improvement-threshold 0.02. Do not remove/reorder features in ml_config.json. When done send a Telegram notification -- explain the outcome (improved/reverted), what score changed from/to, what specifically changed, which tech debt was addressed, and what the next priority is."
./scripts/run_agent.sh novosky-optimizer "$TASK"
4. Continue previous session (2 more iterations)
TASK="Run 2 more optimization iterations without dropping features: python scripts/optimize_loop.py --iterations 2 --trials 50 --drop-threshold 0.0 --improvement-threshold 0.02. Keep ml_config.json features unchanged. Report per-iteration score and all-time best. When done send a Telegram notification summarising both iterations -- what improved vs reverted, the net score change, and the recommended next action."
./scripts/run_agent.sh novosky-optimizer "$TASK" --continue
5. Feature audit — diagnostics only (no removal)
TASK="Audit all features by SHAP from models/shap_summary.json. Identify low-importance features (for monitoring only), especially news_surprise and bars_since_news, but DO NOT remove any feature from ml_config.json. Keep full feature set intact, retrain only for parameter/labeling updates if needed, and run OOS backtest. When done send a Telegram notification -- explain what was learned from SHAP, what was tuned, and the score change."
./scripts/run_agent.sh feature-engineer "$TASK"
6. Full hands-off pipeline (~6 hrs)
TASK="Run the full NOVOSKY optimization pipeline. Phase 0: baseline OOS backtest. Phase 1: fix tech debt TD-1, TD-4, TD-5. Phase 2: python scripts/optimize_loop.py --iterations 3 --trials 50 --drop-threshold 0.0 --improvement-threshold 0.02. Keep the full ml_config.json feature list unchanged. Phase 3: final OOS backtest + python trading.py --dry. Phase 4: document results in strategy_params.json. Target WR>=60% PF>=2.5 MaxDD<=20% Score>=0.35. When done send a Telegram notification -- cover all 3 iterations (improved/reverted each), net score progression, which targets are now met vs still missing, and the single most important next action."
./scripts/run_agent.sh novosky-optimizer "$TASK"
Optimization priority order
Run in this order — later steps are slower and more expensive:
- Read state — novosky-optimizer Phase 0, no changes (~5 min)
- Config-only — risk-tuner confidence/risk/filters sweep (~30 min)
- Feature audit — feature-engineer if the loop keeps reverting (~90 min)
- Full loop — novosky-optimizer, 1–3 iterations (~2 hrs each)
After optimization
# Verify dry-run passes
python trading.py --dry
# Push models to Cloudflare R2 (auto-tags v{YYYYMMDD})
python ml/r2_hub.py --push
# Commit config changes (model binaries are gitignored -- they live on R2)
git add config.json ml_config.json strategy_params.json
git commit -m "feat: Phase N -- WR=XX% PF=X.XX MaxDD=XX.X% Score=X.XX"
# Go live
python trading.py
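The v{YYYYMMDD} tag mentioned in the push step presumably derives from the current date. A sketch of that format (an assumption from the comment above; the actual tagging lives in ml/r2_hub.py):

```python
from datetime import date

def model_tag(d: date) -> str:
    # Tag format assumed from the push comment: "v" + YYYYMMDD.
    return f"v{d:%Y%m%d}"

print(model_tag(date(2025, 3, 14)))  # v20250314
```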
Monitoring a running session
Open a second terminal while the agent runs:
# Watch optimization log update live
watch -n 10 'cat models/optimize_log.json 2>/dev/null | python3 -m json.tool | tail -40'
# Confirm retrain happened (model files change)
watch -n 30 'ls -lh models/*.pkl | awk "{print \$5, \$6, \$7, \$9}"'
# Tail the agent run log
tail -f logs/agent_runs/<latest>.log
# Watch retrain progress by file timestamp updates
watch -n 60 'ls -lh models/*.pkl | awk "{print \$6, \$7, \$8, \$9}"'
Retrain doesn’t print back to the agent session in real time. If the agent goes quiet mid-run, check whether Python is actually running: ps aux | grep python | grep -v grep. The log file captures everything even when the terminal is silent.
Recovery
Session crashed mid-retrain
# Check model consistency
python3 -c "
import json
mc = json.load(open('models/model_compat.json'))
ml = json.load(open('ml_config.json'))
ok = mc['feature_count'] == len(ml['features'])
print('OK' if ok else 'MISMATCH -- restore snapshot')
"
# List snapshots
ls models/_snapshot_*/
# Restore a snapshot
SNAP=models/_snapshot_<tag>
cp "$SNAP"/ml_config.json .
cp "$SNAP"/*.pkl models/
python trading.py --dry
Feature-count mismatch after partial run
If an optimizer drops features from ml_config.json but the retrain doesn’t complete, deployed models still expect the original count. Fix by restoring from the compat manifest:
python3 -c "
import json
mc = json.load(open('models/model_compat.json'))
ml = json.load(open('ml_config.json'))
ml['features'] = mc['features']
with open('ml_config.json', 'w') as f: json.dump(ml, f, indent=2)
print(f'Restored to {mc[\"feature_count\"]} features.')
"
optimize_loop.py runs this check automatically at the start of every iteration.
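That startup check amounts to a one-line comparison. A pure-function sketch, with key names taken from the model_compat.json / ml_config.json snippets above:

```python
def features_consistent(model_compat: dict, ml_config: dict) -> bool:
    # Deployed models must see exactly the feature count they were trained with.
    return model_compat["feature_count"] == len(ml_config["features"])

print(features_consistent({"feature_count": 3},
                          {"features": ["a", "b", "c"]}))  # True
```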
SHAP analysis only
# Fast -- no training, writes models/shap_summary.json
python train_ml_model.py --shap-only
# Inspect bottom features (diagnostics only -- never drop based on SHAP alone)
python3 -c "
import json
s = json.load(open('models/shap_summary.json'))
ranked = s.get('ranked_features', [])
protected = {'is_news_near','news_minutes_away','news_count_today',
             'is_news_risk_window','is_london_session','is_ny_session','is_asian_session'}
print(f'Total features: {len(ranked)}')
for r in sorted(ranked, key=lambda x: x.get('mean_abs_shap', 0))[:15]:
    marker = ' <- PROTECTED' if r['feature'] in protected else ''
    print(f' {r.get(\"mean_abs_shap\", 0):.5f} {r[\"feature\"]}{marker}')
"
Technical notes
- run_agent.sh automatically adds --output-format stream-json --verbose (required for claude -p)
- Unicode box-drawing characters inside Python f-strings in shell heredocs cause SyntaxError — use plain ASCII in generated code
- Agents have full tool access: read, write, bash, grep, glob
- Never drop features from ml_config.json during optimization — use --drop-threshold 0.0 always. SHAP is for diagnostics, not removal.