Quick health check

Run all checks at once:
python3 -c "
import json, os

mc = json.load(open('models/model_compat.json'))
ml = json.load(open('ml_config.json'))
cfg = json.load(open('config.json'))

feat_ok = mc['feature_count'] == len(ml['features'])
print(f'[{\"OK\" if feat_ok else \"FAIL\"}] Feature count: ml_config={len(ml[\"features\"])}, compat={mc[\"feature_count\"]}')

model_files = ['ensemble_rf.pkl','ensemble_xgb.pkl','ensemble_lgb.pkl','ensemble_scaler.pkl',
               'position_rf.pkl','position_xgb.pkl','position_lgb.pkl','position_scaler.pkl']
all_present = all(os.path.exists(f'models/{f}') for f in model_files)
print(f'[{\"OK\" if all_present else \"FAIL\"}] Model files present')

print(f'[OK] conf={ml[\"prediction\"][\"confidence_threshold\"]} risk={cfg[\"dynamic_position_sizing\"][\"risk_percent\"]}%')
print(f'[OK] SL={cfg[\"dynamic_sltp\"][\"sl_atr_multiplier\"]}xATR TP={cfg[\"dynamic_sltp\"][\"tp_atr_multiplier\"]}xATR')
"

1. Feature count mismatch — scaler dimension error

Symptom: Bot crashes with scaler dimension error, or OOS results look wildly wrong. Diagnose:
python3 -c "
import json
mc = json.load(open('models/model_compat.json'))
ml = json.load(open('ml_config.json'))
print(f'model_compat expects : {mc[\"feature_count\"]}')
print(f'ml_config has        : {len(ml[\"features\"])}')
print('OK' if mc['feature_count'] == len(ml['features']) else 'MISMATCH')
"
Fix — restore from compat manifest:
python3 -c "
import json, shutil
mc = json.load(open('models/model_compat.json'))
ml = json.load(open('ml_config.json'))
ml['features'] = mc['features']
with open('ml_config.json', 'w') as f: json.dump(ml, f, indent=2)
print(f'Restored to {mc[\"feature_count\"]} features.')
"
Root cause: An optimizer dropped features from ml_config.json without completing the retrain. The deployed models still expect the original feature set.

2. Backtest produces zero trades

Symptom: backtest_config.py exits with 0 trades or No qualifying signals. Check in order:
# Is confidence threshold too high? (> 0.62 can produce zero trades)
python3 -c "import json; ml=json.load(open('ml_config.json')); print('conf:', ml['prediction']['confidence_threshold'])"

# Is min_atr_threshold too high? (M15 ATR is ~$50-200, threshold should be <= 20)
python3 -c "import json; ml=json.load(open('ml_config.json')); print('min_atr:', ml['risk_management']['min_atr_threshold'])"

# Is OOS window too short? (need at least 30 days after cutoff)
python3 -c "
import json
meta = json.load(open('models/ensemble_btcusd-live_metadata.json'))
print('train_cutoff_date:', meta.get('train_cutoff_date', 'not set'))
"
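The 30-day rule can be checked programmatically. A minimal sketch, assuming `train_cutoff_date` is stored as a plain ISO date (the exact format in the metadata file may differ):

```python
from datetime import datetime, timezone

def oos_window_days(cutoff_str, now=None):
    """Days of out-of-sample data available after the train cutoff."""
    cutoff = datetime.fromisoformat(cutoff_str).replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (now - cutoff).days

# Example: a cutoff 45 days in the past leaves a usable OOS window
days = oos_window_days("2026-03-01", now=datetime(2026, 4, 15, tzinfo=timezone.utc))
print(days, "days of OOS data", "(OK)" if days >= 30 else "(too short)")
```

If the window is under 30 days, either move the cutoff back and retrain or wait for more live bars before trusting OOS numbers.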

2b. Strict config sync check fails

Symptom: weekly_optimize.py, optimize_loop.py, or onboarding.py exits early with a generated-config drift error when NOVOSKY_STRICT_CONFIG_SYNC=1 is set. Diagnose:
python scripts/config_sync.py --check
Fix drift:
python scripts/config_sync.py --sync
python scripts/config_sync.py --check
Root cause: root configs were edited into an invalid state, or required canonical files are missing.

3. ONNX / model load error

Symptom: ModelNotFoundError, ONNX Runtime error, or pkl load failed at startup.
# Check what model files exist
ls -lh models/ensemble_*.pkl models/ensemble_*.onnx models/position_*.pkl 2>/dev/null

# Pull latest models from R2
python ml/r2_hub.py --pull

# Roll back to a specific version
python ml/r2_hub.py --pull --revision v20260414

# Verify model_compat.json is present (written after every successful train)
ls -l models/model_compat.json
If model_compat.json is missing, models were not trained together — do NOT go live. Retrain:
python scripts/retrain.py --local

4. trading.py --dry fails immediately

Symptom: Dry run exits with error before printing any cycle output.
python trading.py --dry 2>&1 | head -60
Common causes:
  • API_URL not set → check .env has API_URL=http://<IP>:6542
  • Feature mismatch → fix with issue #1 above
  • Missing model file → python ml/r2_hub.py --pull

5. Retrain appears stuck or very slow

If retrain shows no visible progress, run the local retrain command directly in a fresh terminal so you can watch its output:
python scripts/retrain.py --refresh
If no GPU is available, the process will continue on CPU and can take longer.

6. optimize_loop.py keeps reverting every iteration

Symptom: Every iteration reverts — score never improves.
# Read last 3 iterations
python3 -c "
import json
log = json.load(open('models/optimize_log.json'))
for e in log[-3:]:
    print(e.get('iteration'), e.get('decision'), 'score:', e.get('final_score','?'))
"
Actions:
  1. Switch to risk-tuner — adjust conf/risk% without retraining
  2. Try different lookahead (48→36 bars) or atr_sl_mult (0.8→0.7)
  3. Refresh training data: python scripts/retrain.py --refresh
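Deciding when to switch strategies can be automated. A sketch, assuming log entries carry a `decision` field with the value `revert` (the field names mirror the read snippet above but are otherwise an assumption):

```python
def revert_streak(log):
    """Count consecutive trailing iterations whose decision was 'revert'."""
    streak = 0
    for entry in reversed(log):
        if entry.get("decision") == "revert":
            streak += 1
        else:
            break
    return streak

# Hypothetical log tail: two reverts in a row after a keep
log = [
    {"iteration": 7, "decision": "keep", "final_score": 1.12},
    {"iteration": 8, "decision": "revert", "final_score": 1.05},
    {"iteration": 9, "decision": "revert", "final_score": 1.01},
]
if revert_streak(log) >= 2:
    print("2+ consecutive reverts: switch to risk-tuner or refresh data")
```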

7. Session crashed mid-retrain

Symptom: Agent died during retrain — models may be inconsistent.
# Check consistency
python3 -c "
import json
mc = json.load(open('models/model_compat.json'))
ml = json.load(open('ml_config.json'))
print('Feature count match:', mc['feature_count'] == len(ml['features']))
print('Trained at:', mc.get('trained_at','unknown'))
"

# Check if a snapshot exists
ls -d models/_snapshot_*/
To restore a snapshot:
SNAP=models/_snapshot_<tag>
cp $SNAP/ensemble_*.pkl $SNAP/ensemble_scaler.pkl \
   $SNAP/position_*.pkl $SNAP/position_scaler.pkl models/
cp $SNAP/ml_config.json .
python trading.py --dry

8. Position model EXIT firing too often

Symptom: High ML_Exit count (> 25% of trades), WR drops.
python3 -c "
import json
ml = json.load(open('ml_config.json'))
pm = ml.get('position_model', {}).get('prediction', {})
print('exit_threshold :', pm.get('exit_threshold'))
print('min_prob_diff  :', pm.get('min_prob_diff'))
"
Raise exit_threshold to 0.80 and min_prob_diff to 0.25 in ml_config.json, then re-run backtest to confirm. These are the Phase 15 tuned values.

9. Telegram not sending

# Test connectivity
python scripts/notify.py "test"

# Check .env
grep TELEGRAM .env
If keys are missing, add to .env:
TELEGRAM_TOKEN=<bot_token>
TELEGRAM_CHAT_ID=<chat_id>
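If scripts/notify.py itself is suspect, you can test the credentials directly against the standard Bot API sendMessage endpoint. A stdlib-only sketch (this bypasses the project's notify script entirely):

```python
import json, os, urllib.parse, urllib.request

def build_send_message(token, chat_id, text):
    """Build a Telegram Bot API sendMessage request (no network I/O)."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    data = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode()
    return urllib.request.Request(url, data=data)

def send(text):
    """Send using the TELEGRAM_* values from the environment."""
    req = build_send_message(os.environ["TELEGRAM_TOKEN"],
                             os.environ["TELEGRAM_CHAT_ID"], text)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp).get("ok", False)
```

If `send("test")` returns False or raises, the token or chat ID is wrong; if it succeeds while notify.py fails, the problem is in the script's .env loading.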

10. Inflated WR in backtest (> 70% OOS)

Cause: Wrong --spread value. VT Markets BTCUSD = 16.95, IC Markets RAW = 3.0. Wrong spread inflates WR by 5–8pp. Always pass explicitly:
python backtest_config.py \
  --balance 500 --no-swap --leverage 500 \
  --spread 16.95 --oos-only --no-chart

11. PM2 / bot won’t start on reboot

# Check PM2 startup hook is registered
pm2 startup

# Check MT5 API is reachable
curl http://localhost:6542/ping

# If API is down, start Docker first
make up   # from mt5api/ directory

# Then start NOVOSKY
bash scripts/pm2-start.sh

# Tail logs
pm2 logs novosky --lines 50
First-time persistent setup:
bash scripts/pm2-start.sh --setup
# Run the printed 'sudo env PATH=...' command if prompted

12. Supabase open_positions shows rows, but broker has no open positions

Symptom: MT5 API /positions is empty, but Supabase open_positions still has old tickets. Most common cause: bot process is stopped/crashed, so heartbeat reconciliation is not running.
pm2 status
pm2 logs novosky --lines 120
After bot restart, heartbeat reconciliation removes stale rows automatically. Verify parity:
curl -sS -H "Authorization: Bearer $API_TOKEN" "$API_URL/positions?symbol=BTCUSD" | jq 'map(.ticket)'
curl -sS "$SUPABASE_URL/rest/v1/open_positions?bot_name=eq.$SUPABASE_BOT_NAME&select=ticket" \
  -H "apikey: $SUPABASE_KEY" -H "Authorization: Bearer $SUPABASE_KEY" | jq 'map(.ticket)'
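Comparing the two ticket lists by hand is error-prone; the parity check reduces to a set difference. A small sketch over the jq outputs of the two curls above:

```python
def stale_supabase_tickets(broker_tickets, supabase_tickets):
    """Tickets still in Supabase open_positions but absent at the broker."""
    return sorted(set(supabase_tickets) - set(broker_tickets))

# Hypothetical outputs from the two commands above
broker = [101, 102]
supabase = [101, 102, 99]
print("stale:", stale_supabase_tickets(broker, supabase))  # stale: [99]
```

An empty result means parity is restored; any remaining tickets are the stale rows the heartbeat should clear on the next cycle.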

13. Supabase trades is empty but account history has closed deals

Symptom: broker history/deals has exits, but trades has no rows. Check in order:
  1. Bot process was down when positions closed.
  2. Supabase credentials are missing/invalid.
  3. You are looking at the wrong SUPABASE_BOT_NAME namespace.
pm2 status
grep -E 'SUPABASE_URL|SUPABASE_KEY|SUPABASE_BOT_NAME' .env
If process downtime caused the gap, backfill closed deals once, then keep PM2 online.

14. news_events duplicates or re-sync behavior

news_events is deduplicated by key (bot_name, title, event_at).
  • Re-pushing the same weekly FF payload does not create duplicate rows.
  • Existing rows are upserted (for example, fetched_at refreshes); new rows are inserted.
This matches FF’s weekly refresh model (FF_URL) and supports safe periodic syncing.
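The upsert semantics can be illustrated in a few lines. This is a sketch of the dedup behavior described above, not the actual sync implementation:

```python
def upsert_news_events(existing, incoming):
    """Upsert events deduplicated by the key (bot_name, title, event_at).

    Rows with a matching key are updated in place (e.g. fetched_at
    refreshes); unseen keys are appended as new rows.
    """
    index = {(e["bot_name"], e["title"], e["event_at"]): e for e in existing}
    for row in incoming:
        key = (row["bot_name"], row["title"], row["event_at"])
        if key in index:
            index[key].update(row)   # refresh fields, no duplicate row
        else:
            existing.append(row)
            index[key] = row
    return existing
```

Re-running the same weekly payload through this logic is idempotent: row count is unchanged, only refreshed fields move.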