Quick health check

Run all checks at once:
python3 -c "
import json, os

mc = json.load(open('models/model_compat.json'))
ml = json.load(open('ml_config.json'))
cfg = json.load(open('config.json'))

feat_ok = mc['feature_count'] == len(ml['features'])
print(f'[{\"OK\" if feat_ok else \"FAIL\"}] Feature count: ml_config={len(ml[\"features\"])}, compat={mc[\"feature_count\"]}')

model_files = ['ensemble_rf.pkl','ensemble_xgb.pkl','ensemble_lgb.pkl','ensemble_scaler.pkl',
               'position_rf.pkl','position_xgb.pkl','position_lgb.pkl','position_scaler.pkl']
all_present = all(os.path.exists(f'models/{f}') for f in model_files)
print(f'[{\"OK\" if all_present else \"FAIL\"}] Model files present')

print(f'[OK] conf={ml[\"prediction\"][\"confidence_threshold\"]} risk={cfg[\"dynamic_position_sizing\"][\"risk_percent\"]}%')
print(f'[OK] SL={cfg[\"dynamic_sltp\"][\"sl_atr_multiplier\"]}xATR TP={cfg[\"dynamic_sltp\"][\"tp_atr_multiplier\"]}xATR')
"

1. Feature count mismatch — scaler dimension error

Symptom: Bot crashes with scaler dimension error, or OOS results look wildly wrong. Diagnose:
python3 -c "
import json
mc = json.load(open('models/model_compat.json'))
ml = json.load(open('ml_config.json'))
print(f'model_compat expects : {mc[\"feature_count\"]}')
print(f'ml_config has        : {len(ml[\"features\"])}')
print('OK' if mc['feature_count'] == len(ml['features']) else 'MISMATCH')
"
Fix — restore from compat manifest:
python3 -c "
import json, shutil
mc = json.load(open('models/model_compat.json'))
ml = json.load(open('ml_config.json'))
ml['features'] = mc['features']
with open('ml_config.json', 'w') as f: json.dump(ml, f, indent=2)
print(f'Restored to {mc[\"feature_count\"]} features.')
"
Root cause: An optimizer dropped features from ml_config.json without completing the retrain. The deployed models still expect the original feature set.

2. Backtest produces zero trades

Symptom: backtest_config.py exits with 0 trades or No qualifying signals. Check in order:
# Is confidence threshold too high? (> 0.62 can produce zero trades)
python3 -c "import json; ml=json.load(open('ml_config.json')); print('conf:', ml['prediction']['confidence_threshold'])"

# Is min_atr_threshold too high? (M15 ATR is ~$50-200, threshold should be <= 20)
python3 -c "import json; ml=json.load(open('ml_config.json')); print('min_atr:', ml['risk_management']['min_atr_threshold'])"

# Is OOS window too short? (need at least 30 days after cutoff)
python3 -c "
import json
meta = json.load(open('models/ensemble_btcusd-live_metadata.json'))
print('train_cutoff_date:', meta.get('train_cutoff_date', 'not set'))
"
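The 30-day rule can be checked programmatically. A minimal sketch, assuming `train_cutoff_date` is stored as a plain ISO date (the exact format in the metadata file may differ):

```python
from datetime import datetime, timezone

def oos_window_days(cutoff_str, now=None):
    """Days of out-of-sample data available after the train cutoff."""
    cutoff = datetime.fromisoformat(cutoff_str).replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (now - cutoff).days

# Example: a cutoff 45 days in the past leaves a usable OOS window
days = oos_window_days("2026-03-01", now=datetime(2026, 4, 15, tzinfo=timezone.utc))
print(days, "days of OOS data", "(OK)" if days >= 30 else "(too short)")
```

If the window is under 30 days, either move the cutoff back and retrain or wait for more live bars before trusting OOS numbers.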

2b. Strict config sync check fails

Symptom: weekly_optimize.py, optimize_loop.py, or onboarding.py exits early with a generated-config drift error when NOVOSKY_STRICT_CONFIG_SYNC=1 is set. Diagnose:
python scripts/config_sync.py --check
Fix drift:
python scripts/config_sync.py --sync
python scripts/config_sync.py --check
Root cause: root configs were edited into an invalid state, or required canonical files are missing.

3. ONNX / model load error

Symptom: ModelNotFoundError, ONNX Runtime error, or pkl load failed at startup.
# Check what model files exist
ls -lh models/ensemble_*.pkl models/ensemble_*.onnx models/position_*.pkl 2>/dev/null

# Pull latest models from R2
python ml/r2_hub.py --pull

# Roll back to a specific version
python ml/r2_hub.py --pull --revision v20260414

# Verify model_compat.json is present (written after every successful train)
ls -l models/model_compat.json
If model_compat.json is missing, models were not trained together — do NOT go live. Retrain:
python scripts/retrain.py --local

4. trading.py --dry fails immediately

Symptom: Dry run exits with error before printing any cycle output.
python trading.py --dry 2>&1 | head -60
Common causes:
  • API_URL not set → check .env has API_URL=http://<IP>:6542
  • Feature mismatch → fix with issue #1 above
  • Missing model file → python ml/r2_hub.py --pull

5. Retrain appears stuck or very slow

If retrain shows no visible progress, run the local retrain command directly in a fresh terminal so you can watch its output:
python scripts/retrain.py --refresh
If no GPU is available, the process will continue on CPU and can take longer.

6. optimize_loop.py keeps reverting every iteration

Symptom: Every iteration reverts — score never improves.
# Read last 3 iterations
python3 -c "
import json
log = json.load(open('models/optimize_log.json'))
for e in log[-3:]:
    print(e.get('iteration'), e.get('decision'), 'score:', e.get('final_score','?'))
"
Actions:
  1. Switch to risk-tuner — adjust conf/risk% without retraining
  2. Try different lookahead (48→36 bars) or atr_sl_mult (0.8→0.7)
  3. Refresh training data: python scripts/retrain.py --refresh
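Deciding when to switch strategies can be automated. A sketch, assuming log entries carry a `decision` field with the value `revert` (the field names mirror the read snippet above but are otherwise an assumption):

```python
def revert_streak(log):
    """Count consecutive trailing iterations whose decision was 'revert'."""
    streak = 0
    for entry in reversed(log):
        if entry.get("decision") == "revert":
            streak += 1
        else:
            break
    return streak

# Hypothetical log tail: two reverts in a row after a keep
log = [
    {"iteration": 7, "decision": "keep", "final_score": 1.12},
    {"iteration": 8, "decision": "revert", "final_score": 1.05},
    {"iteration": 9, "decision": "revert", "final_score": 1.01},
]
if revert_streak(log) >= 2:
    print("2+ consecutive reverts: switch to risk-tuner or refresh data")
```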

7. Session crashed mid-retrain

Symptom: Agent died during retrain — models may be inconsistent.
# Check consistency
python3 -c "
import json
mc = json.load(open('models/model_compat.json'))
ml = json.load(open('ml_config.json'))
print('Feature count match:', mc['feature_count'] == len(ml['features']))
print('Trained at:', mc.get('trained_at','unknown'))
"

# Check if a snapshot exists
ls -d models/_snapshot_*/
To restore a snapshot:
SNAP=models/_snapshot_<tag>
cp $SNAP/ensemble_*.pkl $SNAP/ensemble_scaler.pkl \
   $SNAP/position_*.pkl $SNAP/position_scaler.pkl models/
cp $SNAP/ml_config.json .
python trading.py --dry

8. Position model EXIT firing too often

Symptom: High ML_Exit count (> 25% of trades), WR drops.
python3 -c "
import json
ml = json.load(open('ml_config.json'))
pm = ml.get('position_model', {}).get('prediction', {})
print('exit_threshold :', pm.get('exit_threshold'))
print('min_prob_diff  :', pm.get('min_prob_diff'))
"
Raise exit_threshold to 0.80 and min_prob_diff to 0.25 in ml_config.json, then re-run backtest to confirm. These are the Phase 15 tuned values.

9. Telegram not sending

# Test connectivity
python scripts/notify.py "test"

# Check .env
grep TELEGRAM .env
If keys are missing, add to .env:
TELEGRAM_TOKEN=<bot_token>
TELEGRAM_CHAT_ID=<chat_id>
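If scripts/notify.py itself is suspect, you can test the credentials directly against the standard Bot API sendMessage endpoint. A stdlib-only sketch (this bypasses the project's notify script entirely):

```python
import json, os, urllib.parse, urllib.request

def build_send_message(token, chat_id, text):
    """Build a Telegram Bot API sendMessage request (no network I/O)."""
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    data = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode()
    return urllib.request.Request(url, data=data)

def send(text):
    """Send using the TELEGRAM_* values from the environment."""
    req = build_send_message(os.environ["TELEGRAM_TOKEN"],
                             os.environ["TELEGRAM_CHAT_ID"], text)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp).get("ok", False)
```

If `send("test")` returns False or raises, the token or chat ID is wrong; if it succeeds while notify.py fails, the problem is in the script's .env loading.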

10. Inflated WR in backtest (> 70% OOS)

Cause: Wrong --spread value. VT Markets BTCUSD = 16.95, IC Markets RAW = 3.0. Wrong spread inflates WR by 5–8pp. Always pass explicitly:
python backtest_config.py \
  --balance 500 --no-swap --leverage 500 \
  --spread 16.95 --oos-only --no-chart

11. PM2 / bot won’t start on reboot

# Check PM2 startup hook is registered
pm2 startup

# Check MT5 API is reachable
curl http://localhost:6542/ping

# If API is down, start Docker first
make up   # from mt5api/ directory

# Then start NOVOSKY
bash scripts/pm2-start.sh

# Tail logs
pm2 logs novosky --lines 50
First-time persistent setup:
bash scripts/pm2-start.sh --setup
# Run the printed 'sudo env PATH=...' command if prompted

12. Supabase open_positions shows rows, but broker has no open positions

Symptom: MT5 API /positions is empty, but Supabase open_positions still has old tickets. Most common cause: bot process is stopped/crashed, so heartbeat reconciliation is not running.
pm2 status
pm2 logs novosky --lines 120
After bot restart, heartbeat reconciliation removes stale rows automatically. Verify parity:
curl -sS -H "Authorization: Bearer $API_TOKEN" "$API_URL/positions?symbol=BTCUSD" | jq 'map(.ticket)'
curl -sS "$SUPABASE_URL/rest/v1/open_positions?bot_name=eq.$SUPABASE_BOT_NAME&select=ticket" \
  -H "apikey: $SUPABASE_KEY" -H "Authorization: Bearer $SUPABASE_KEY" | jq 'map(.ticket)'
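Comparing the two ticket lists by hand is error-prone; the parity check reduces to a set difference. A small sketch over the jq outputs of the two curls above:

```python
def stale_supabase_tickets(broker_tickets, supabase_tickets):
    """Tickets still in Supabase open_positions but absent at the broker."""
    return sorted(set(supabase_tickets) - set(broker_tickets))

# Hypothetical outputs from the two commands above
broker = [101, 102]
supabase = [101, 102, 99]
print("stale:", stale_supabase_tickets(broker, supabase))  # stale: [99]
```

An empty result means parity is restored; any remaining tickets are the stale rows the heartbeat should clear on the next cycle.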

13. Supabase trades is empty but account history has closed deals

Symptom: broker history/deals has exits, but trades has no rows. Check in order:
  1. Bot process was down when positions closed.
  2. Supabase credentials are missing/invalid.
  3. You are looking at the wrong SUPABASE_BOT_NAME namespace.
pm2 status
grep -E 'SUPABASE_URL|SUPABASE_KEY|SUPABASE_BOT_NAME' .env
If process downtime caused the gap, backfill closed deals once, then keep PM2 online.

14. news_events duplicates or re-sync behavior

news_events is deduplicated by key (bot_name, title, event_at).
  • Re-pushing the same weekly FF payload does not create duplicate rows.
  • Existing rows are upserted (for example, fetched_at refreshes); new rows are inserted.
This matches FF’s weekly refresh model (FF_URL) and supports safe periodic syncing.
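The upsert semantics can be illustrated in a few lines. This is a sketch of the dedup behavior described above, not the actual sync implementation:

```python
def upsert_news_events(existing, incoming):
    """Upsert events deduplicated by the key (bot_name, title, event_at).

    Rows with a matching key are updated in place (e.g. fetched_at
    refreshes); unseen keys are appended as new rows.
    """
    index = {(e["bot_name"], e["title"], e["event_at"]): e for e in existing}
    for row in incoming:
        key = (row["bot_name"], row["title"], row["event_at"])
        if key in index:
            index[key].update(row)   # refresh fields, no duplicate row
        else:
            existing.append(row)
            index[key] = row
    return existing
```

Re-running the same weekly payload through this logic is idempotent: row count is unchanged, only refreshed fields move.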