Feature engineering - NOVOSKY Docs

Overview

Feature engineering is the first step in both training and inference. ml/feature_engineering.py transforms raw OHLCV candles into the 62-feature vector consumed by all four models. The feature list is a contract: the training pipeline, the inference pipeline, and ml_config.json must all reference the same 62 features in the same order. Adding, removing, or reordering a feature requires retraining all four models.

# Verify features are in sync before a retrain
python -c "
from ml.feature_engineering import get_feature_names
from ml.data_preparation import load_config
cfg = load_config()
assert get_feature_names() == cfg['features'], 'Feature mismatch!'
print('OK —', len(get_feature_names()), 'features')
"

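When the two lists differ, it helps to know the first position where they diverge, since order is part of the contract. The following is a minimal sketch; `first_mismatch` is a hypothetical helper that is not part of the NOVOSKY codebase, and the two lists are illustrative stand-ins for `get_feature_names()` and the `features` array in ml_config.json.

```python
from itertools import zip_longest


def first_mismatch(code_names, config_names):
    """Return (index, code_name, config_name) for the first differing slot, or None."""
    for i, (a, b) in enumerate(zip_longest(code_names, config_names)):
        if a != b:
            return i, a, b
    return None


# Illustrative lists only; same names, different order.
code_side = ["bb_upper", "bb_lower", "bb_width"]
config_side = ["bb_upper", "bb_width", "bb_lower"]
print(first_mismatch(code_side, config_side))  # (1, 'bb_lower', 'bb_width')
```

A missing or extra trailing feature shows up as `None` on the shorter side.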
Feature groups

Bollinger Bands (4 features)

Feature	Description
`bb_upper`	Upper Bollinger Band (20-period, 2σ)
`bb_lower`	Lower Bollinger Band
`bb_width`	Band width normalized by mid — measures squeeze/expansion
`range_position`	Close position within the band: 0 = at lower, 1 = at upper
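
As an illustration of how these four values relate, here is a dependency-free sketch for the latest bar. It assumes a simple moving average as the mid line and a population standard deviation; the actual module may use a different standard-deviation convention or a vectorized library, and `bollinger_features` is a hypothetical name.

```python
from statistics import fmean, pstdev


def bollinger_features(closes, period=20, num_std=2.0):
    """Compute the four band features for the most recent bar.

    Assumption: population std dev over the last `period` closes.
    """
    window = closes[-period:]
    if len(window) < period:
        raise ValueError("not enough bars")
    mid = fmean(window)
    sd = pstdev(window)
    upper = mid + num_std * sd
    lower = mid - num_std * sd
    width = (upper - lower) / mid  # band width normalized by the mid line
    # A flat window gives upper == lower; fall back to the band midpoint.
    pos = 0.5 if upper == lower else (closes[-1] - lower) / (upper - lower)
    return {"bb_upper": upper, "bb_lower": lower,
            "bb_width": width, "range_position": pos}


closes = [100.0 + (i % 5) for i in range(20)]
feats = bollinger_features(closes)
```

In this sketch `range_position` is not clamped, so a close outside the bands yields a value below 0 or above 1.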

Trend (8 features)

Feature	Description
`price_vs_ema200`	(Close − EMA200) / Close — trend direction and strength
`trend_strength`	ADX-inspired directional strength
`macd_signal`	MACD signal line (9-period EMA of MACD)
`macd_hist`	MACD histogram (MACD − signal)
`ema_crossover`	EMA50/EMA200 crossover state: +1, 0, −1
`ema_fast`	EMA fast line (period from config)
`ema_slope_5`	5-bar slope of EMA20
`higher_tf_trend`	H4 timeframe trend direction
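
The EMA-based features can be sketched as follows. This assumes the common smoothing factor 2 / (period + 1), an EMA seeded with the first close, and a crossover state defined as the sign of EMA50 minus EMA200. All three are assumptions for illustration, not confirmed details of ml/feature_engineering.py.

```python
def ema(values, period):
    """Exponential moving average, alpha = 2 / (period + 1), seeded with values[0]."""
    alpha = 2.0 / (period + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def price_vs_ema(closes, period=200):
    """(close - EMA) / close for the latest bar."""
    return (closes[-1] - ema(closes, period)[-1]) / closes[-1]


def ema_crossover(closes, fast=50, slow=200):
    """+1 if the fast EMA is above the slow EMA, -1 if below, 0 if equal."""
    f, s = ema(closes, fast)[-1], ema(closes, slow)[-1]
    return (f > s) - (f < s)


rising = [float(i) for i in range(1, 301)]
print(ema_crossover(rising))  # 1
```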

Momentum (6 features)

Feature	Description
`rsi_14`	RSI(14), normalized to [0, 1]
`rsi_divergence`	Bullish/bearish RSI divergence vs price
`momentum_5`	5-bar log return
`momentum_10`	10-bar log return
`rate_of_change`	14-bar % price change
`close_vs_open`	Current candle body direction and size
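
A sketch of the log-return and normalized-RSI calculations, assuming Wilder smoothing for RSI and a simple division by 100 to reach [0, 1]. The function names are hypothetical, and the real implementation may differ in its smoothing method.

```python
import math


def log_return(closes, n):
    """n-bar log return for the latest bar: ln(close_t / close_{t-n})."""
    return math.log(closes[-1] / closes[-1 - n])


def rsi_norm(closes, period=14):
    """Wilder-smoothed RSI over the series, rescaled from [0, 100] to [0, 1]."""
    deltas = [b - a for a, b in zip(closes, closes[1:])]
    if len(deltas) < period:
        raise ValueError("not enough bars")
    gains = [max(d, 0.0) for d in deltas]
    losses = [max(-d, 0.0) for d in deltas]
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    for g, l in zip(gains[period:], losses[period:]):
        avg_gain = (avg_gain * (period - 1) + g) / period
        avg_loss = (avg_loss * (period - 1) + l) / period
    if avg_loss == 0:
        # No losses at all: fully overbought, or neutral if price never moved.
        return 1.0 if avg_gain > 0 else 0.5
    rs = avg_gain / avg_loss
    return 1.0 - 1.0 / (1.0 + rs)  # equals RSI / 100
```

With this convention 0.5 is neutral, values near 1 indicate sustained gains, and values near 0 indicate sustained losses.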

Volatility (5 features)

Feature	Description
`atr_14`	ATR(14) — absolute pip distance
`atr_ratio`	ATR normalized by close price
`volatility_20`	Rolling 20-bar std dev of log returns
`high_low_ratio`	High/Low ratio — candle range
`gap_up_down`	Open vs previous close gap
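
For reference, ATR and its price-normalized form can be sketched like this. The example averages the true range with a simple mean over the last 14 bars; whether the project uses a simple mean or Wilder smoothing is not stated here, so treat that choice as an assumption.

```python
def true_range(high, low, prev_close):
    """Largest of the bar's own range and its two distances from the previous close."""
    return max(high - low, abs(high - prev_close), abs(low - prev_close))


def atr(bars, period=14):
    """Simple-mean ATR. bars: list of (high, low, close), oldest first."""
    if len(bars) < period + 1:
        raise ValueError("need period + 1 bars")
    trs = [
        true_range(h, l, bars[i - 1][2])
        for i, (h, l, _c) in enumerate(bars)
        if i > 0  # the first bar has no previous close
    ]
    return sum(trs[-period:]) / period


bars = [(101.0, 99.0, 100.0)] * 15
atr_14 = atr(bars)
atr_ratio = atr_14 / bars[-1][2]  # ATR normalized by the latest close
```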

Volume (4 features)

Feature	Description
`volume_ratio`	Current volume / 20-bar rolling average
`volume_trend`	5-bar volume slope (increasing vs decreasing)
`tick_volume_norm`	Tick volume normalized to [0, 1]
`volume_price_trend`	Volume × price direction (accumulation/distribution)

Price structure (6 features)

Feature	Description
`support_distance`	Distance from nearest support level
`resistance_distance`	Distance from nearest resistance level
`candlestick_pattern`	Encoded candlestick pattern (doji, hammer, engulfing, etc.)
`bar_body_pct`	Body as % of total candle range
`upper_shadow`	Upper shadow normalized by range
`lower_shadow`	Lower shadow normalized by range
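
The three candle-shape features partition the bar's range, which a short sketch makes concrete. It assumes all three are expressed as fractions of the high-low range (so they sum to 1) and returns zeros for a zero-range bar; `candle_shape` is a hypothetical helper.

```python
def candle_shape(o, h, l, c):
    """Body and shadow sizes as fractions of the bar's high-low range."""
    rng = h - l
    if rng == 0:
        # Zero-range bar: nothing to normalize by.
        return {"bar_body_pct": 0.0, "upper_shadow": 0.0, "lower_shadow": 0.0}
    body_top, body_bottom = max(o, c), min(o, c)
    return {
        "bar_body_pct": (body_top - body_bottom) / rng,
        "upper_shadow": (h - body_top) / rng,
        "lower_shadow": (body_bottom - l) / rng,
    }


shape = candle_shape(10.0, 14.0, 9.0, 12.0)
```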

Session and news (7 features) — PROTECTED

These features must never be dropped based on SHAP analysis alone. Historical testing showed removing them increased drawdown significantly. They carry regime and timing information not captured by price alone.

Feature	Description
`is_london_session`	1 if current UTC hour is in London session (07:00–16:00)
`is_ny_session`	1 if current UTC hour is in NY session (13:00–22:00)
`is_asian_session`	1 if current UTC hour is in Asian session (00:00–09:00)
`is_news_near`	1 if high-impact news event within `news_block_minutes`
`news_minutes_away`	Minutes to next scheduled high-impact event
`news_count_today`	Number of high-impact events today
`is_news_risk_window`	1 if within the combined pre/post news risk window

Never drop is_news_near, news_minutes_away, news_count_today, is_news_risk_window, is_london_session, is_ny_session, or is_asian_session based on SHAP importance alone. Low SHAP mean ≠ useless. These features encode risk context that only matters during specific market regimes.
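
One way to enforce this rule is to filter any SHAP-derived drop list through a protected set before acting on it. The sketch below also shows the session flags, assuming half-open UTC hour ranges (start inclusive, end exclusive). Both `session_flags` and `safe_to_drop` are hypothetical helpers, not functions from the codebase.

```python
from datetime import datetime, timezone

# UTC hour ranges from the table above, treated as [start, end).
SESSIONS_UTC = {
    "is_london_session": (7, 16),
    "is_ny_session": (13, 22),
    "is_asian_session": (0, 9),
}

PROTECTED = frozenset({
    "is_london_session", "is_ny_session", "is_asian_session",
    "is_news_near", "news_minutes_away", "news_count_today",
    "is_news_risk_window",
})


def session_flags(ts):
    """Session flags for a timezone-aware datetime."""
    hour = ts.astimezone(timezone.utc).hour
    return {name: int(start <= hour < end)
            for name, (start, end) in SESSIONS_UTC.items()}


def safe_to_drop(candidates):
    """Remove protected features from a list of low-importance candidates."""
    return [name for name in candidates if name not in PROTECTED]


print(safe_to_drop(["is_news_near", "bb_width"]))  # ['bb_width']
```

Note that the London and NY ranges overlap between 13:00 and 16:00 UTC, so both flags can be 1 at the same time.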

Account state (7 features)

These features are computed at runtime from live account data, not from candles. They allow the models to adapt to current account health.

Feature	Description
`drawdown_pct`	Current equity drawdown from starting balance, %
`equity_ratio`	equity / starting_balance
`win_rate_recent`	Win rate over the last 20 trades
`consecutive_losses`	Current consecutive loss streak
`trades_today`	Number of trades taken today
`profit_today_pct`	Today’s P&L as % of equity
`hours_since_last_trade`	Time gap since the last closed trade
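
A rough sketch of how a few of these values could be derived from account data. The input shape (a list of per-trade P&L values, oldest first) and the clamping of drawdown at zero are assumptions made for the example.

```python
def account_features(equity, starting_balance, trade_pnls):
    """Derive account-state features. trade_pnls: closed-trade P&L, oldest first."""
    # Assumption: drawdown is reported as 0 when equity is above the starting balance.
    drawdown_pct = max(0.0, (starting_balance - equity) / starting_balance * 100.0)
    last20 = trade_pnls[-20:]
    win_rate = sum(1 for p in last20 if p > 0) / len(last20) if last20 else 0.0
    streak = 0
    for p in reversed(trade_pnls):  # walk back from the most recent trade
        if p < 0:
            streak += 1
        else:
            break
    return {
        "drawdown_pct": drawdown_pct,
        "equity_ratio": equity / starting_balance,
        "win_rate_recent": win_rate,
        "consecutive_losses": streak,
    }


state = account_features(9500.0, 10000.0, [10.0, -5.0, -3.0])
```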

Time features (12 features)

Feature	Description
`hour`	UTC hour (0–23)
`day_of_week`	Day encoded 0–6 (0 = Monday)
`is_monday`	Flag
`is_friday`	Flag
`hour_sin`, `hour_cos`	Cyclical encoding of UTC hour
`dow_sin`, `dow_cos`	Cyclical encoding of day of week
`is_market_open`	1 if within active trading hours
`minutes_to_close`	Minutes until session close
`week_of_month`	Week number within the month
`month_sin`, `month_cos`	Cyclical month encoding
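
The sine/cosine pairs exist because raw `hour` treats 23 and 0 as far apart even though they are adjacent on the clock. A minimal sketch of the standard cyclical encoding follows; `cyclical` is a hypothetical helper, and whether months are passed as 0-11 or 1-12 is left open.

```python
import math


def cyclical(value, period):
    """Map a periodic value onto the unit circle as (sin, cos)."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


hour_sin, hour_cos = cyclical(23, 24)
dow_sin, dow_cos = cyclical(0, 7)  # Monday

# Hour 23 is close to hour 0 on the circle, and far from hour 12.
near = math.dist(cyclical(23, 24), cyclical(0, 24))
far = math.dist(cyclical(0, 24), cyclical(12, 24))
```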

Market regime (3 features) — Phase 17.2

Derived entirely from existing OHLCV data — no external API dependencies.

Feature	Description
`volatility_regime`	ATR(14) percentile rank over a rolling 500-bar window, encoded as a continuous value in [0, 1]. More responsive than `atr_percentile` (720-bar). Low = ranging/coiling, high = trending/breakout.
`w1_ema_bias`	`(close − EMA10) / close` on weekly (W1) bars, forward-filled to M15. Positive = price above weekly trend.
`w1_rsi_norm`	RSI(14) on W1 bars, normalized to [0, 1] (0.5 = neutral). Captures macro weekly momentum state.
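
To illustrate the percentile-rank idea behind `volatility_regime`, the sketch below ranks the latest value against a trailing window. The rank definition used here (share of window values less than or equal to the latest, with the latest included) is an assumption; other conventions exist and the project may use a different one.

```python
def percentile_rank(values, window=500):
    """Share of the last `window` values that are <= the latest value, in (0, 1]."""
    recent = values[-window:]
    if not recent:
        raise ValueError("no data")
    latest = recent[-1]
    return sum(1 for v in recent if v <= latest) / len(recent)


# Illustrative ATR series: the latest reading is the highest in the window.
atr_series = [1.0, 1.2, 0.9, 1.5]
print(percentile_rank(atr_series))  # 1.0
```

A shorter window reacts faster to a change in volatility, which matches the comparison with the 720-bar `atr_percentile` made in the table.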

Labeling

ml/feature_engineering.py uses ATR-aware forward labeling to generate training targets. For signal model labels:

# BUY is checked first, so it wins if both thresholds are hit
if future_high >= open + tp_atr * ATR:
    label = BUY
elif future_low <= open - tp_atr * ATR:
    label = SELL
else:
    label = HOLD

The tp_atr multiplier is tuned during Optuna search. Labels are generated using the same ATR-based SL/TP logic that the live bot uses, ensuring training distribution matches live inference distribution. For position model labels, ml/position_labeling.py generates EXIT labels when the price subsequently reverses by more than a configurable threshold before reaching TP.
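
The signal-label rule above can be written as a small runnable function. This is a sketch under stated assumptions: the look-ahead horizon is passed in as a list of future (high, low) pairs, its length is not specified in this document, and `forward_label` is a hypothetical name.

```python
BUY, SELL, HOLD = "BUY", "SELL", "HOLD"


def forward_label(open_price, atr, future_bars, tp_atr):
    """Label one bar from its look-ahead window.

    future_bars: list of (high, low) tuples for the bars after the entry bar.
    """
    future_high = max(h for h, _l in future_bars)
    future_low = min(l for _h, l in future_bars)
    # Same precedence as the rule above: the BUY check runs first.
    if future_high >= open_price + tp_atr * atr:
        return BUY
    if future_low <= open_price - tp_atr * atr:
        return SELL
    return HOLD


print(forward_label(100.0, 2.0, [(101.0, 99.5), (105.0, 99.0)], tp_atr=2.0))  # BUY
```

Because only the window extremes are compared, this form does not record which threshold was reached first when both are hit within the horizon.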

Adding or modifying features

The Three-File Rule applies to all feature changes:

ml/feature_engineering.py — add the computation
ml_config.json → features array — add the name in the correct position
Retrain all 4 models

After retraining, run python ml/hf_hub.py --push to publish the new models and models/model_compat.json to Hugging Face Hub.

Do not add features speculatively. Every added feature increases the risk of overfitting and must be validated with an out-of-sample (OOS) backtest showing improvement.
