Every ML feature exists in three places simultaneously. Change one — change all three.
Violating the Three-File Rule causes silent bugs that only show up at inference time or in production — wrong predictions, scaler dimension mismatches, or crashes on startup.

The rule

If you change…               You must also update…                                          And then…
ml/feature_engineering.py    ml_config.json → features list + backtest/run.py cache dict   Retrain all models
ml_config.json → features    ml/feature_engineering.py + backtest/run.py cache dict        Retrain all models
trading.py execution logic   backtest/run.py simulation logic                              Verify parity
backtest/run.py logic        trading.py execution logic                                    Verify parity
Any retrain                  strategy_params.json + TODO.md
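The same kind of sync check used for the risk model further down can be sketched for the signal features too. This is a hypothetical helper (the function name and the plain substring search are assumptions, not project code); it only flags features from ml_config.json that never appear in the feature-engineering source, which is a heuristic rather than a real parser:

```python
import json

def check_feature_sync(config_path, engineering_path):
    """Return features listed in ml_config.json -> "features" that never
    appear in the feature-engineering source file.

    A plain substring search is a heuristic: it cannot tell a computed
    column from a comment, but any name it reports as missing is a
    likely Three-File Rule violation worth inspecting."""
    features = json.load(open(config_path))["features"]
    src = open(engineering_path).read()
    return [f for f in features if f not in src]

# Usage (from the repo root):
#   missing = check_feature_sync("ml_config.json", "ml/feature_engineering.py")
#   print("Three-File Rule:", "OK" if not missing else f"MISSING {missing}")
```

Note this only covers two of the three files; the backtest/run.py cache dict still has to be checked by hand.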

Risk model three-file rule

The risk model has its own parallel three-file requirement. The 7 equity-state feature names must be identical in all three places:
File                  Location
ml/risk_trainer.py    RISK_FEATURES list constant
ml/risk_predictor.py  RISK_FEATURES list constant
ml_config.json        risk_model.features array
# Verify all three are in sync
python3 -c "
import json, ast, re

ml = json.load(open('ml_config.json'))
config_feats = ml['risk_model']['features']

def extract_risk_features(path):
    src = open(path).read()
    m = re.search(r'RISK_FEATURES\s*=\s*\[([^\]]+)\]', src, re.DOTALL)
    if not m:
        return []
    return [x.strip().strip('\"').strip(\"'\") for x in m.group(1).split(',') if x.strip()]

trainer_feats = extract_risk_features('ml/risk_trainer.py')
predictor_feats = extract_risk_features('ml/risk_predictor.py')

ok = trainer_feats == predictor_feats == config_feats
print('Risk Three-File Rule:', 'OK' if ok else 'MISMATCH')
if not ok:
    print('  trainer :', trainer_feats)
    print('  predictor:', predictor_feats)
    print('  config  :', config_feats)
"
If these diverge, the risk model will silently receive wrong feature order at inference and output garbage multipliers. Always update all three together.

Hard rules

  • Never reorder ml_config.json → features without retraining — the scaler is fixed to this exact column order.
  • Never drop features from ml_config.json without immediately retraining — removing a column causes dimension mismatch.
  • Never drop session or news features based on SHAP — they show near-zero SHAP at training time by design, but are critical at inference.
  • Never use sltp_aware labeling with dynamic_sltp.enabled = true — this collapses live WR from ~78% to ~49%.
  • Never hardcode thresholds in trading.py — read everything from config.json or ml_config.json.
  • Never use --days 365 without --oos-only for real performance evaluation.
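The first two hard rules follow from how a fitted scaler works. A standalone illustration (toy numbers, not the project's actual scaler or features): the per-column mean and std are stored positionally, so feeding the same values in a different column order produces nonsense.

```python
import numpy as np

# Toy training data: column 0 is price-scale (~2000), column 1 is a ratio (~0.5).
X_train = np.array([[2000.0, 0.50],
                    [2010.0, 0.52],
                    [1990.0, 0.48]])
mean, std = X_train.mean(axis=0), X_train.std(axis=0)

def scale(X):
    # A standard scaler is purely positional: it has no idea which
    # column holds which feature, only which stats go with which index.
    return (X - mean) / std

row = np.array([[2005.0, 0.51]])
swapped = row[:, ::-1]          # identical values, columns reordered

print(scale(row))      # sensible z-scores, well inside [-2, 2]
print(scale(swapped))  # garbage: the ratio gets scaled by the price stats
```

Dropping a column is even more abrupt: the subtraction above would raise a shape mismatch instead of silently mis-scaling, which is the dimension-mismatch crash the second rule warns about.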

Adding a new feature (step by step)

# 1. Implement the computation in ml/feature_engineering.py
#    Add to add_all_features() or as a new helper method

# 2. Append to ml_config.json → "features"
#    ALWAYS append at the end — never insert in the middle
#    e.g. "features": [...existing 55..., "my_new_feature"]

# 3. Add to the cache dict in backtest/run.py
#    Search for the feature cache block — it manually populates
#    features from stored OHLCV bars

# 4. Retrain (always --no-warmstart when feature count changes)
#    Include --risk to retrain the risk multiplier model alongside signal + position
python train_ml_model.py --ensemble --position --risk --no-warmstart

# 5. Validate on OOS data
python backtest_config.py \
  --balance 500 --no-swap --leverage 500 \
  --spread 16.95 --oos-only --no-chart

# 6. Document in strategy_params.json and TODO.md
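Step 2's append-only discipline can be enforced with a small helper. This is a sketch under assumptions (the helper name is hypothetical; it assumes ml_config.json has a top-level "features" list, as the steps above describe):

```python
import json

def append_feature(config_path, name):
    """Append a new feature name to ml_config.json -> "features".

    Appending strictly at the end preserves the column order the scaler
    was fit on; refusing duplicates means a re-run cannot corrupt the
    list or silently shift positions."""
    with open(config_path) as f:
        cfg = json.load(f)
    if name in cfg["features"]:
        raise ValueError(f"{name!r} is already in the features list")
    cfg["features"].append(name)
    with open(config_path, "w") as f:
        json.dump(cfg, f, indent=2)
```

After running it you would still retrain with --no-warmstart, since the feature count changed.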

Removing a feature

# 1. Remove from ml/feature_engineering.py
# 2. Remove from ml_config.json → "features"
# 3. Remove from backtest/run.py cache dict
# 4. Retrain immediately (--no-warmstart required)
python train_ml_model.py --ensemble --position --risk --no-warmstart
Never remove these features based on SHAP analysis alone — they are near-zero at training time by design, but critical at inference: is_news_near, news_minutes_away, news_count_today, is_news_risk_window, is_london_session, is_ny_session, is_asian_session
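A guard for that list can be sketched as follows. The seven names come from the paragraph above; the helper itself is hypothetical and assumes the usual top-level "features" array in ml_config.json:

```python
import json

# Features that must never be dropped: near-zero SHAP at training time
# by design, but critical at inference.
PROTECTED = {
    "is_news_near", "news_minutes_away", "news_count_today",
    "is_news_risk_window", "is_london_session", "is_ny_session",
    "is_asian_session",
}

def check_protected(config_path):
    """Return protected feature names missing from ml_config.json,
    sorted for stable output. An empty list means the guard passes."""
    feats = set(json.load(open(config_path))["features"])
    return sorted(PROTECTED - feats)
```

Running this before any retrain catches an accidental removal before it reaches the model.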

Verify compatibility

Run this any time to confirm models and feature config are in sync:
python3 -c "
import json
mc = json.load(open('models/model_compat.json'))
ml = json.load(open('ml_config.json'))
assert mc['feature_count'] == len(ml['features']), 'MISMATCH'
print('OK:', mc['feature_count'], 'features')
"
This check also runs automatically at bot startup and is enforced by optimize_loop.py.