Every ML feature exists in three places simultaneously. Change one — change all three.
Violating the Three-File Rule causes silent bugs that only show up at inference time or in production — wrong predictions, scaler dimension mismatches, or crashes on startup.
## The rule
| If you change… | You must also update… | And then… |
|---|---|---|
| `ml/feature_engineering.py` | `ml_config.json` → features list + `backtest/run.py` cache dict | Retrain all models |
| `ml_config.json` → features | `ml/feature_engineering.py` + `backtest/run.py` cache dict | Retrain all models |
| `trading.py` execution logic | `backtest/run.py` simulation logic | Verify parity |
| `backtest/run.py` logic | `trading.py` execution logic | Verify parity |
| Any retrain | `strategy_params.json` + `TODO.md` | — |
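A lightweight guard can catch rule violations before they land. Below is a sketch of a staged-files check; the pairing table comes from above, but `three_file_warnings` is a hypothetical helper, and wiring it into an actual git pre-commit hook is left out:

```python
# Hypothetical pre-commit check for the pairing table above: given the
# staged paths (e.g. from `git diff --cached --name-only`), report which
# counterpart files were not touched. Warn-only by design.
RULE_PAIRS = [
    ("ml/feature_engineering.py", ["ml_config.json", "backtest/run.py"]),
    ("trading.py", ["backtest/run.py"]),
    ("backtest/run.py", ["trading.py"]),
]

def three_file_warnings(changed):
    changed = set(changed)
    warnings = []
    for trigger, partners in RULE_PAIRS:
        if trigger in changed:
            missing = [p for p in partners if p not in changed]
            if missing:
                warnings.append(f"{trigger} changed without {', '.join(missing)}")
    return warnings

print(three_file_warnings(["trading.py"]))
# → ['trading.py changed without backtest/run.py']
```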
## Risk model three-file rule
The risk model has its own parallel three-file requirement. The 7 equity-state feature names must be identical in all three places:
| File | Location |
|---|---|
| `ml/risk_trainer.py` | `RISK_FEATURES` list constant |
| `ml/risk_predictor.py` | `RISK_FEATURES` list constant |
| `ml_config.json` | `risk_model.features` array |
```bash
# Verify all three are in sync
python3 - <<'EOF'
import json
import re

ml = json.load(open('ml_config.json'))
config_feats = ml['risk_model']['features']

def extract_risk_features(path):
    src = open(path).read()
    m = re.search(r'RISK_FEATURES\s*=\s*\[([^\]]+)\]', src, re.DOTALL)
    if not m:
        return []
    return [x.strip().strip('"').strip("'") for x in m.group(1).split(',') if x.strip()]

trainer_feats = extract_risk_features('ml/risk_trainer.py')
predictor_feats = extract_risk_features('ml/risk_predictor.py')

ok = trainer_feats == predictor_feats == config_feats
print('Risk Three-File Rule:', 'OK' if ok else 'MISMATCH')
if not ok:
    print('  trainer  :', trainer_feats)
    print('  predictor:', predictor_feats)
    print('  config   :', config_feats)
EOF
```
If these diverge, the risk model will silently receive wrong feature order at inference and output garbage multipliers. Always update all three together.
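The order sensitivity is easy to demonstrate: a fitted scaler applies its per-column statistics by position, so reordered input is scaled with the wrong statistics and no error is raised. A minimal toy sketch, not the bot's pipeline:

```python
# Toy illustration of why column order matters to a fitted scaler.
# Two fake features with per-position statistics learned at fit time
# for the order ["drawdown", "equity_slope"].
means = [0.05, 100.0]
stds = [0.02, 50.0]

def scale(row):
    # Statistics are applied purely by position; feature names are gone
    # by the time inference happens.
    return [(x - m) / s for x, m, s in zip(row, means, stds)]

ok_row = scale([0.07, 120.0])    # fitted order: sane z-scores, ~[1.0, 0.4]
bad_row = scale([120.0, 0.07])   # swapped columns: huge values, no exception
```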
## Hard rules
- Never reorder `ml_config.json` → features without retraining — the scaler is fixed to this exact column order.
- Never drop features from `ml_config.json` without immediately retraining — removing a column causes a dimension mismatch.
- Never drop session or news features based on SHAP — they show near-zero SHAP at training time by design, but are critical at inference.
- Never use `sltp_aware` labeling with `dynamic_sltp.enabled = true` — this collapses live WR from ~78% to ~49%.
- Never hardcode thresholds in `trading.py` — read everything from `config.json` or `ml_config.json`.
- Never use `--days 365` without `--oos-only` for real performance evaluation.
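The "never hardcode thresholds" rule looks like this in practice. A sketch with hypothetical key names, not the project's actual schema:

```python
import json

# Hypothetical keys for illustration; the real schema lives in
# config.json / ml_config.json.
config = json.loads('{"signal_threshold": 0.62, "risk_window_minutes": 30}')

def should_trade(model_prob, minutes_to_news):
    # No literals in the logic: both cutoffs come from config, so tuning
    # never requires touching trading.py.
    return (model_prob >= config["signal_threshold"]
            and minutes_to_news > config["risk_window_minutes"])

print(should_trade(0.70, 45))  # True
print(should_trade(0.70, 10))  # False: still inside the news risk window
```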
## Adding a new feature (step by step)
```bash
# 1. Implement the computation in ml/feature_engineering.py
#    Add to add_all_features() or as a new helper method

# 2. Append to ml_config.json → "features"
#    ALWAYS append at the end — never insert in the middle
#    e.g. "features": [...existing 55..., "my_new_feature"]

# 3. Add to the cache dict in backtest/run.py
#    Search for the feature cache block — it manually populates
#    features from stored OHLCV bars

# 4. Retrain (always --no-warmstart when feature count changes)
#    Include --risk to retrain the risk multiplier model alongside signal + position
python train_ml_model.py --ensemble --position --risk --no-warmstart

# 5. Validate on OOS data
python backtest_config.py \
    --balance 500 --no-swap --leverage 500 \
    --spread 16.95 --oos-only --no-chart

# 6. Document in strategy_params.json and TODO.md
```
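Step 2's append-only constraint can also be enforced mechanically. A sketch of a guard; `append_feature` is a hypothetical helper that assumes `ml_config.json` has been loaded into a dict with a top-level `"features"` list:

```python
def append_feature(config, name):
    # Hypothetical guard: append at the end only, refuse duplicates.
    feats = config["features"]
    if name in feats:
        raise ValueError(f"{name} is already in features")
    feats.append(name)  # end of the list, never an insert in the middle
    return config

cfg = {"features": ["atr_14", "rsi_14"]}  # stand-in for ml_config.json
append_feature(cfg, "my_new_feature")
print(cfg["features"])  # ['atr_14', 'rsi_14', 'my_new_feature']
```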
## Removing a feature
```bash
# 1. Remove from ml/feature_engineering.py
# 2. Remove from ml_config.json → "features"
# 3. Remove from backtest/run.py cache dict
# 4. Retrain immediately (--no-warmstart required)
python train_ml_model.py --ensemble --position --risk --no-warmstart
```
Never remove these features based on SHAP analysis alone — their SHAP values are near-zero at training time by design, but they are critical at inference:
`is_news_near`, `news_minutes_away`, `news_count_today`, `is_news_risk_window`, `is_london_session`, `is_ny_session`, `is_asian_session`
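A quick guard against dropping one of these by accident. The names come from the list above; `missing_protected` is a hypothetical helper:

```python
# Protected feature names from the list above.
PROTECTED = {
    "is_news_near", "news_minutes_away", "news_count_today",
    "is_news_risk_window", "is_london_session", "is_ny_session",
    "is_asian_session",
}

def missing_protected(features):
    # Any protected feature absent from the configured list.
    return sorted(PROTECTED - set(features))

print(missing_protected(["atr_14", "is_news_near"]))
```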
## Verify compatibility
Run this any time to confirm models and feature config are in sync:
```bash
python3 -c "
import json
mc = json.load(open('models/model_compat.json'))
ml = json.load(open('ml_config.json'))
assert mc['feature_count'] == len(ml['features']), 'MISMATCH'
print('OK:', mc['feature_count'], 'features')
"
```
This check also runs automatically at bot startup and is enforced by optimize_loop.py.
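A count check catches additions and removals but not reorders. Below is a sketch of a compat writer that also hashes the ordered names; the schema extension and the helper are assumptions, not the project's actual format:

```python
import hashlib
import json
import os
import tempfile

def write_compat(features, path):
    # Hypothetical writer for a model_compat.json-style file. Records a
    # hash of the ordered names as well as the count, since feature_count
    # alone cannot detect a reorder.
    compat = {
        "feature_count": len(features),
        "feature_hash": hashlib.sha256("|".join(features).encode()).hexdigest(),
    }
    with open(path, "w") as f:
        json.dump(compat, f)
    return compat

path = os.path.join(tempfile.gettempdir(), "model_compat_demo.json")
c = write_compat(["atr_14", "rsi_14"], path)
print(c["feature_count"])  # 2
```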