Overview

Feature engineering is the first step in both training and inference. ml/feature_engineering.py transforms raw OHLCV candles into the 62-feature vector consumed by all four models.

The feature list is a contract: the training pipeline, the inference pipeline, and ml_config.json must all reference the same 62 features in the same order. Any change to the list requires retraining all four models.
# Verify features are in sync before a retrain
python -c "
from ml.feature_engineering import get_feature_names
from ml.data_preparation import load_config
cfg = load_config()
assert get_feature_names() == cfg['features'], 'Feature mismatch!'
print('OK:', len(get_feature_names()), 'features')
"

Feature groups

| Feature | Description |
| --- | --- |
| bb_upper | Upper Bollinger Band (20-period, 2σ) |
| bb_lower | Lower Bollinger Band |
| bb_width | Band width normalized by the middle band; measures squeeze/expansion |
| range_position | Close position within the band: 0 = at lower, 1 = at upper |

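The band features above can be sketched in a few lines. This is a minimal illustration, not the code in ml/feature_engineering.py: the function name is hypothetical, and the choice of a simple moving average for the middle band, a population standard deviation, and a 0.5 fallback for a zero-width band are all assumptions.

```python
from statistics import mean, pstdev
from typing import Dict, List


def bollinger_features(closes: List[float], period: int = 20, num_std: float = 2.0) -> Dict[str, float]:
    """Compute the four band features for the latest bar (hypothetical helper)."""
    if len(closes) < period:
        raise ValueError(f"need at least {period} closes, got {len(closes)}")
    window = closes[-period:]
    mid = mean(window)        # assumption: middle band is a simple moving average
    sd = pstdev(window)       # assumption: population standard deviation
    upper = mid + num_std * sd
    lower = mid - num_std * sd
    band = upper - lower
    return {
        "bb_upper": upper,
        "bb_lower": lower,
        "bb_width": band / mid,
        # 0 = close at the lower band, 1 = close at the upper band.
        # Assumption: a flat (zero-width) band maps to the midpoint.
        "range_position": (closes[-1] - lower) / band if band > 0 else 0.5,
    }
```

Note that range_position is not clipped here, so a close outside the bands gives a value below 0 or above 1.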
| Feature | Description |
| --- | --- |
| price_vs_ema200 | (Close − EMA200) / Close; trend direction and strength |
| trend_strength | ADX-inspired directional strength |
| macd_signal | MACD signal line (9-period EMA of MACD) |
| macd_hist | MACD histogram (MACD − signal) |
| ema_crossover | EMA50/EMA200 crossover state: +1, 0, −1 |
| ema_fast | EMA fast line (period from config) |
| ema_slope_5 | 5-bar slope of EMA20 |
| higher_tf_trend | H4 timeframe trend direction |

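Several trend features build on an exponential moving average. The sketch below shows one common EMA definition and the price_vs_ema200 ratio from the table. The function names are hypothetical, and the smoothing factor 2 / (period + 1) and the seeding with the first value are assumptions about how the EMA is initialized.

```python
from typing import List


def ema(values: List[float], period: int) -> List[float]:
    """EMA with alpha = 2 / (period + 1), seeded with the first value (assumed convention)."""
    if not values:
        return []
    alpha = 2.0 / (period + 1)
    out = [values[0]]
    for v in values[1:]:
        # Blend the new value with the previous EMA value.
        out.append(alpha * v + (1.0 - alpha) * out[-1])
    return out


def price_vs_ema(closes: List[float], period: int = 200) -> float:
    """(Close - EMA) / Close for the latest bar."""
    return (closes[-1] - ema(closes, period)[-1]) / closes[-1]
```

A positive result means the latest close sits above the EMA. Dividing by the close makes the value comparable across instruments with different price scales.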
| Feature | Description |
| --- | --- |
| rsi_14 | RSI(14), normalized to [0, 1] |
| rsi_divergence | Bullish/bearish RSI divergence vs price |
| momentum_5 | 5-bar log return |
| momentum_10 | 10-bar log return |
| rate_of_change | 14-bar % price change |
| close_vs_open | Current candle body direction and size |

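As an illustration of the momentum group, the sketch below computes an n-bar log return, an n-bar percentage change, and an RSI scaled to [0, 1]. The function names are hypothetical. The RSI here uses plain averages of gains and losses over the window; Wilder's smoothed variant is also common, and which one the project uses is not stated, so treat that choice as an assumption.

```python
import math
from typing import List


def log_return(closes: List[float], n: int) -> float:
    """n-bar log return, e.g. n=5 for momentum_5."""
    return math.log(closes[-1] / closes[-1 - n])


def rate_of_change_pct(closes: List[float], n: int = 14) -> float:
    """n-bar percentage price change."""
    return (closes[-1] / closes[-1 - n] - 1.0) * 100.0


def rsi_normalized(closes: List[float], period: int = 14) -> float:
    """RSI over the last `period` changes, divided by 100 so it lies in [0, 1]."""
    if len(closes) < period + 1:
        raise ValueError(f"need at least {period + 1} closes")
    recent = closes[-(period + 1):]
    changes = [b - a for a, b in zip(recent, recent[1:])]
    avg_gain = sum(c for c in changes if c > 0) / period
    avg_loss = sum(-c for c in changes if c < 0) / period
    if avg_loss == 0:
        # No losses: RSI is 100 if there were gains; a flat series is treated as neutral.
        return 1.0 if avg_gain > 0 else 0.5
    rs = avg_gain / avg_loss
    return (100.0 - 100.0 / (1.0 + rs)) / 100.0
```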
| Feature | Description |
| --- | --- |
| atr_14 | ATR(14), absolute pip distance |
| atr_ratio | ATR normalized by close price |
| volatility_20 | Rolling 20-bar std dev of log returns |
| high_low_ratio | High/Low ratio (candle range) |
| gap_up_down | Open vs previous close gap |

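ATR is the basis for atr_14 and atr_ratio, and it also drives the labeling thresholds. A minimal sketch follows, assuming bars are (high, low, close) tuples and that ATR is a simple mean of the last 14 true ranges. Wilder's smoothing is a common alternative, so the averaging method is an assumption, as are the function names.

```python
from typing import List, Tuple

Bar = Tuple[float, float, float]  # (high, low, close); hypothetical layout for this sketch


def true_ranges(bars: List[Bar]) -> List[float]:
    """True range per bar, starting from the second bar (it needs the previous close)."""
    trs = []
    for prev, cur in zip(bars, bars[1:]):
        prev_close = prev[2]
        high, low, _ = cur
        trs.append(max(high - low, abs(high - prev_close), abs(low - prev_close)))
    return trs


def atr(bars: List[Bar], period: int = 14) -> float:
    """Assumption: simple average of the last `period` true ranges."""
    trs = true_ranges(bars)
    if len(trs) < period:
        raise ValueError(f"need at least {period + 1} bars")
    return sum(trs[-period:]) / period


def atr_ratio(bars: List[Bar], period: int = 14) -> float:
    """ATR divided by the latest close."""
    return atr(bars, period) / bars[-1][2]
```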
| Feature | Description |
| --- | --- |
| volume_ratio | Current volume / 20-bar rolling average |
| volume_trend | 5-bar volume slope (increasing vs decreasing) |
| tick_volume_norm | Tick volume normalized to [0, 1] |
| volume_price_trend | Volume × price direction (accumulation/distribution) |

| Feature | Description |
| --- | --- |
| support_distance | Distance from nearest support level |
| resistance_distance | Distance from nearest resistance level |
| candlestick_pattern | Encoded candlestick pattern (doji, hammer, engulfing, etc.) |
| bar_body_pct | Body as % of total candle range |
| upper_shadow | Upper shadow normalized by range |
| lower_shadow | Lower shadow normalized by range |

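The three candle-shape features split a bar's high-low range into body and shadows. The sketch below is one way to compute them; the function name is hypothetical, values are returned as fractions of the range rather than 0 to 100 percentages, and a zero-range bar maps to all zeros. Both of those conventions are assumptions.

```python
from typing import Dict


def candle_shape(open_: float, high: float, low: float, close: float) -> Dict[str, float]:
    """Body and shadow sizes as fractions of the candle's high-low range."""
    rng = high - low
    if rng <= 0:
        # Assumption: a bar with no range contributes zeros instead of raising.
        return {"bar_body_pct": 0.0, "upper_shadow": 0.0, "lower_shadow": 0.0}
    body_top = max(open_, close)
    body_bottom = min(open_, close)
    return {
        "bar_body_pct": (body_top - body_bottom) / rng,
        "upper_shadow": (high - body_top) / rng,
        "lower_shadow": (body_bottom - low) / rng,
    }
```

For a well-formed bar the three values sum to 1, which makes a convenient sanity check on incoming candle data.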
These features must never be dropped based on SHAP analysis alone. Historical testing showed removing them increased drawdown significantly. They carry regime and timing information not captured by price alone.
| Feature | Description |
| --- | --- |
| is_london_session | 1 if current UTC hour is in London session (07:00–16:00) |
| is_ny_session | 1 if current UTC hour is in NY session (13:00–22:00) |
| is_asian_session | 1 if current UTC hour is in Asian session (00:00–09:00) |
| is_news_near | 1 if high-impact news event within news_block_minutes |
| news_minutes_away | Minutes to next scheduled high-impact event |
| news_count_today | Number of high-impact events today |
| is_news_risk_window | 1 if within the combined pre/post news risk window |

Never drop is_news_near, news_minutes_away, news_count_today, is_news_risk_window, is_london_session, is_ny_session, or is_asian_session based on SHAP importance alone. A low mean SHAP value does not mean a feature is useless. These features encode risk context that only matters during specific market regimes.
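The session flags follow directly from the UTC hour ranges listed in the table. A minimal sketch is shown below; the function and constant names are hypothetical, and treating each range as start-inclusive and end-exclusive is an assumption, since the table does not say how the boundary hour is handled.

```python
from datetime import datetime, timezone
from typing import Dict

# Hour ranges in UTC, taken from the table above. Assumption: [start, end).
SESSIONS_UTC = {
    "is_london_session": (7, 16),
    "is_ny_session": (13, 22),
    "is_asian_session": (0, 9),
}


def session_flags(ts: datetime) -> Dict[str, int]:
    """Return 0/1 session flags for a timezone-aware timestamp."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    hour = ts.astimezone(timezone.utc).hour
    return {name: int(start <= hour < end) for name, (start, end) in SESSIONS_UTC.items()}
```

Because the sessions overlap, more than one flag can be set at once. For example, 14:00 UTC falls in both the London and NY sessions.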
These features are computed at runtime from live account data, not from candles. They allow the models to adapt to current account health.
| Feature | Description |
| --- | --- |
| drawdown_pct | Current equity drawdown from starting balance, % |
| equity_ratio | equity / starting_balance |
| win_rate_recent | Win rate over the last 20 trades |
| consecutive_losses | Current consecutive loss streak |
| trades_today | Number of trades taken today |
| profit_today_pct | Today’s P&L as % of equity |
| hours_since_last_trade | Time gap since the last closed trade |

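Some of the account-state features can be derived from the current equity and a list of closed-trade results. The sketch below is illustrative only: the function name and its inputs are hypothetical, a trade counts as a win when its P&L is positive, and the drawdown is clamped at zero when equity is above the starting balance. All of these are assumptions.

```python
from typing import Dict, List


def account_features(equity: float, starting_balance: float,
                     closed_trade_pnls: List[float], window: int = 20) -> Dict[str, float]:
    """Derive account-health features; `closed_trade_pnls` is ordered oldest to newest."""
    recent = closed_trade_pnls[-window:]
    wins = sum(1 for pnl in recent if pnl > 0)

    # Count losing trades backwards from the most recent one.
    streak = 0
    for pnl in reversed(closed_trade_pnls):
        if pnl < 0:
            streak += 1
        else:
            break

    return {
        # Assumption: no drawdown is reported while equity exceeds the starting balance.
        "drawdown_pct": max(0.0, (starting_balance - equity) / starting_balance * 100.0),
        "equity_ratio": equity / starting_balance,
        "win_rate_recent": wins / len(recent) if recent else 0.0,
        "consecutive_losses": streak,
    }
```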
| Feature | Description |
| --- | --- |
| hour | UTC hour (0–23) |
| day_of_week | Day encoded 0–6 (0 = Monday) |
| is_monday | Flag |
| is_friday | Flag |
| hour_sin, hour_cos | Cyclical encoding of UTC hour |
| dow_sin, dow_cos | Cyclical encoding of day of week |
| is_market_open | 1 if within active trading hours |
| minutes_to_close | Minutes until session close |
| week_of_month | Week number within the month |
| month_sin, month_cos | Cyclical month encoding |

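Cyclical encoding maps a periodic value onto the unit circle so that, for example, hour 23 and hour 0 end up close together. The sketch below shows the usual sine/cosine construction for a subset of the features in the table. The function names are hypothetical, the timestamp is assumed to be in UTC already, and shifting the month to start at 0 is an assumption.

```python
import math
from datetime import datetime
from typing import Dict, Tuple


def cyclical(value: float, period: float) -> Tuple[float, float]:
    """Return (sin, cos) of the value's position within one full period."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def time_features(ts: datetime) -> Dict[str, float]:
    """Calendar features for a timestamp that is assumed to be in UTC."""
    hour_sin, hour_cos = cyclical(ts.hour, 24)
    dow = ts.weekday()  # Monday is 0, matching the day_of_week encoding above
    dow_sin, dow_cos = cyclical(dow, 7)
    month_sin, month_cos = cyclical(ts.month - 1, 12)  # assumption: January maps to 0
    return {
        "hour": ts.hour,
        "day_of_week": dow,
        "is_monday": int(dow == 0),
        "is_friday": int(dow == 4),
        "hour_sin": hour_sin, "hour_cos": hour_cos,
        "dow_sin": dow_sin, "dow_cos": dow_cos,
        "month_sin": month_sin, "month_cos": month_cos,
    }
```

Using both sine and cosine is what makes the encoding unambiguous: either one alone maps two different hours to the same value.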
Derived entirely from existing OHLCV data, with no external API dependencies.
| Feature | Description |
| --- | --- |
| volatility_regime | ATR(14) percentile rank over a rolling 500-bar window, encoded as a continuous value in [0, 1]. More responsive than atr_percentile (720-bar). Low = ranging/coiling, high = trending/breakout. |
| w1_ema_bias | (close − EMA10) / close on weekly (W1) bars, forward-filled to M15. Positive = price above weekly trend. |
| w1_rsi_norm | RSI(14) on W1 bars, normalised to [0, 1] (0.5 = neutral). Captures macro weekly momentum state. |
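A rolling percentile rank, as used for volatility_regime, can be sketched as the share of values in the window that do not exceed the latest one. The function name is hypothetical, and the exact rank definition (here "less than or equal", which never returns exactly 0) is an assumption; other definitions differ slightly at the edges.

```python
from typing import List


def percentile_rank(history: List[float], window: int = 500) -> float:
    """Rank of the latest value within the last `window` values, in (0, 1]."""
    recent = history[-window:]
    if not recent:
        raise ValueError("history is empty")
    current = recent[-1]
    # Fraction of the window at or below the current value (the current value counts itself).
    return sum(1 for v in recent if v <= current) / len(recent)
```

Fed with a series of ATR(14) values, a result near 1 means current volatility is high relative to the recent window, and a result near 0 means it is unusually low.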

Labeling

ml/feature_engineering.py uses ATR-aware forward labeling to generate training targets. For signal model labels:
if future_high >= open + tp_atr * ATR:
    label = BUY
elif future_low <= open - tp_atr * ATR:
    label = SELL
else:
    label = HOLD
The tp_atr multiplier is tuned during the Optuna search. Labels are generated with the same ATR-based SL/TP logic that the live bot uses, so the training distribution matches the live inference distribution.

For position model labels, ml/position_labeling.py generates EXIT labels when the price reverses by more than a configurable threshold before reaching TP.
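The signal-label rule above can be written as a small self-contained function. This is a sketch under stated assumptions, not the project's implementation: the function name, the string label values, and the idea that the caller passes in the highs and lows of a fixed lookahead window are all hypothetical. It keeps the same precedence as the rule, so BUY wins when both thresholds are touched.

```python
from typing import List

BUY, SELL, HOLD = "BUY", "SELL", "HOLD"  # assumption: labels represented as strings


def signal_label(entry_open: float, atr: float,
                 future_highs: List[float], future_lows: List[float],
                 tp_atr: float) -> str:
    """Label one bar from the price extremes of its lookahead window."""
    if not future_highs or not future_lows:
        return HOLD  # assumption: no future data means no directional label
    future_high = max(future_highs)
    future_low = min(future_lows)
    if future_high >= entry_open + tp_atr * atr:
        return BUY
    if future_low <= entry_open - tp_atr * atr:
        return SELL
    return HOLD
```

With an open of 100, an ATR of 2, and tp_atr of 1.5, the targets are 103 and 97, so a lookahead window whose highest high is 103.5 yields BUY.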

Adding or modifying features

The Three-File Rule applies to all feature changes:
  1. ml/feature_engineering.py: add the computation
  2. ml_config.json → features array: add the name in the correct position
  3. Retrain all 4 models
After retraining, run python ml/hf_hub.py --push to publish the new models and models/model_compat.json to Hugging Face Hub. Do not add features speculatively. Every added feature increases the risk of overfitting and must be validated with an OOS backtest showing improvement.
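Because a new name must land in the correct position, a position-aware comparison of the two feature lists can help locate a mistake before a retrain. The sketch below is a hypothetical helper, not part of the ml package. It takes the two lists as plain arguments, for example the output of get_feature_names() and the features array from ml_config.json.

```python
from typing import List, Optional


def describe_feature_mismatch(code_features: List[str],
                              config_features: List[str]) -> Optional[str]:
    """Return a description of the first problem found, or None if the lists match."""
    missing = [f for f in config_features if f not in code_features]
    extra = [f for f in code_features if f not in config_features]
    if missing or extra:
        return f"missing from code: {missing}; missing from config: {extra}"
    # Same names on both sides: find the first position where the order differs.
    for i, (a, b) in enumerate(zip(code_features, config_features)):
        if a != b:
            return f"order differs at index {i}: code has {a!r}, config has {b!r}"
    if len(code_features) != len(config_features):
        return "one of the lists contains duplicate names"
    return None
```

A return value of None corresponds to the equality check passing; any other result names the offending features or the first index where the order diverges.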