All model binaries (.pkl, .onnx, *_xgb.json, *_lgb.txt) are stored on Cloudflare R2, not in Git. The repository contains only code, configuration, training datasets (CSV), and small metadata JSON files.
Bucket: novosky-models
Current tag: v20260415 (Phase 15)
User lane: pull approved revisions only. Developer lane: push approved revisions after weekly optimization/retraining and announce which tag each profile (1-5) should use.
Credentials are required for both push and pull. The bucket is private, with no anonymous access. Add CF_R2_ACCESS_KEY_ID and CF_R2_SECRET_ACCESS_KEY to .env before running any r2_hub.py command.
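A quick pre-flight check can catch missing credentials before a push or pull fails partway through. The sketch below is illustrative only: the two variable names come from this document, but the helper `missing_r2_credentials` is hypothetical and not part of r2_hub.py. It inspects the process environment, so it assumes the values from .env have already been loaded or exported.

```python
import os

# Variable names as documented above; the helper itself is a hypothetical sketch.
REQUIRED_R2_VARS = ("CF_R2_ACCESS_KEY_ID", "CF_R2_SECRET_ACCESS_KEY")


def missing_r2_credentials(env=None):
    """Return the required R2 variable names that are unset or empty.

    `env` is any mapping of names to values; it defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_R2_VARS if not env.get(name)]
```

Calling `missing_r2_credentials()` with no arguments returns an empty list when both variables are set, and otherwise lists the names that still need to be added.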
Common commands
# Pull latest models (after a fresh clone or on a new server)
python ml/r2_hub.py --pull
# Pull a profile-specific approved revision (recommended for users)
python ml/r2_hub.py --pull --revision vYYYYMMDD-p3
# Push after a retrain (auto-creates tag vYYYYMMDD)
python ml/r2_hub.py --push
# Push directly from the training script
python train_ml_model.py --ensemble --position --push-to-hub
# List all version tags
python ml/r2_hub.py --list
# Roll back to a specific version
python ml/r2_hub.py --pull --revision v20260414
# Create a named tag without uploading new files
python ml/r2_hub.py --tag-only v15-production "Phase 15 validated"
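The commands above use two tag shapes: a date tag (vYYYYMMDD) and a profile-specific tag (vYYYYMMDD-pN). The following sketch shows one way to build and validate such names in scripts that wrap r2_hub.py. It is an assumption based on the examples in this document, not the logic r2_hub.py actually uses; `make_tag` and `parse_tag` are hypothetical helpers.

```python
import re
from datetime import date

# Assumed shape: "v" + 8-digit date, optionally "-p" + a profile number 1-5.
_TAG_RE = re.compile(r"^v(\d{4})(\d{2})(\d{2})(?:-p([1-5]))?$")


def make_tag(day, profile=None):
    """Build a date tag such as v20260415, or v20260415-p3 for a profile."""
    tag = "v" + day.strftime("%Y%m%d")
    if profile is not None:
        if profile not in range(1, 6):
            raise ValueError("profile must be between 1 and 5")
        tag += f"-p{profile}"
    return tag


def parse_tag(tag):
    """Return (date, profile_or_None) for a date tag, or None if it does not match."""
    match = _TAG_RE.match(tag)
    if match is None:
        return None
    year, month, day, profile = match.groups()
    try:
        parsed = date(int(year), int(month), int(day))
    except ValueError:
        # Eight digits that do not form a real calendar date.
        return None
    return parsed, (int(profile) if profile else None)
```

Named tags created with --tag-only, such as v15-production, do not follow the date shape, so `parse_tag` returns None for them rather than guessing.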
Storage layout
signal/
ensemble_rf.pkl # latest signal model (Random Forest)
ensemble_rf.onnx
ensemble_xgb.json # XGBoost native format
ensemble_lgb.txt # LightGBM native format
ensemble_scaler.pkl
ensemble_btcusd-live_metadata.json
position/
position_rf.pkl # latest position model
position_rf.onnx
position_xgb.json
position_lgb.txt
position_scaler.pkl
position_metadata.json
model_compat.json # latest compatibility manifest
v20260415/ # versioned snapshot (for rollback)
signal/...
position/...
tags/
v20260415.json # version manifest (metadata + accuracy)
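The layout above implies a simple rule for object keys: the latest files live directly under signal/ and position/, snapshots repeat the same structure under the version tag, and each manifest sits under tags/. A minimal sketch of that mapping follows. The function names are hypothetical and the rule is inferred from the tree above, so r2_hub.py may construct keys differently.

```python
from posixpath import join

# Top-level model groups shown in the storage layout.
MODEL_GROUPS = ("signal", "position")


def object_key(group, filename, revision=None):
    """Return the bucket key for a model file.

    With no revision the key points at the latest copy; with a revision
    it points into that version's snapshot directory.
    """
    if group not in MODEL_GROUPS:
        raise ValueError(f"unknown model group: {group!r}")
    if revision is None:
        return join(group, filename)
    return join(revision, group, filename)


def manifest_key(revision):
    """Return the bucket key of the version manifest for a tag."""
    return join("tags", revision + ".json")
```

For example, `object_key("signal", "ensemble_rf.onnx", "v20260415")` yields `v20260415/signal/ensemble_rf.onnx`, the file a rollback to that tag would fetch under this assumed rule.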
What goes where
| Location | What |
|---|---|
| Cloudflare R2 | *.pkl, *.onnx, *_xgb.json, *_lgb.txt, model_compat.json |
| Git | All Python code, config.json, training datasets (CSV), metadata JSONs |
| Gitignored | .env, model binaries, models/_snapshot_*/, runtime log files |
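The split in the table can be expressed as a filename check, which is handy in a pre-commit hook that stops binaries from slipping into Git. The patterns below are copied from the Cloudflare R2 row; the helper `belongs_on_r2` is a hypothetical sketch and is not taken from the repository.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns from the "Cloudflare R2" row of the table above.
R2_PATTERNS = ("*.pkl", "*.onnx", "*_xgb.json", "*_lgb.txt", "model_compat.json")


def belongs_on_r2(path):
    """Return True if the file name matches one of the R2-only patterns."""
    name = PurePosixPath(path).name  # match on the file name, not the directory
    return any(fnmatchcase(name, pattern) for pattern in R2_PATTERNS)
```

Metadata files such as position_metadata.json match none of the patterns, so they stay in Git, while position_xgb.json is caught by `*_xgb.json`.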
Auto-pull on startup
At startup, the bot pulls models from R2 automatically if models/ensemble_rf.onnx is missing. This is handled by _ensure_models_present() in trading/bot.py. After a fresh clone you don't need to pull manually: add the R2 credentials to .env and start the bot.
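The startup behaviour described above amounts to a sentinel-file check followed by a pull. The real _ensure_models_present() in trading/bot.py is not shown in this document, so the following is only a sketch of the idea under stated assumptions: the function name `ensure_models_present`, its parameters, and the choice to shell out to `ml/r2_hub.py --pull` are all hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# Sentinel file named in this document; its absence triggers a pull.
SENTINEL = Path("models/ensemble_rf.onnx")


def default_pull():
    """Run the documented pull command; raises CalledProcessError on failure."""
    subprocess.run([sys.executable, "ml/r2_hub.py", "--pull"], check=True)


def ensure_models_present(sentinel=SENTINEL, pull=default_pull):
    """Pull models if the sentinel file is missing.

    Returns True if a pull was performed, False if models were already present.
    """
    sentinel = Path(sentinel)
    if sentinel.exists():
        return False
    pull()
    if not sentinel.exists():
        # The pull finished but did not produce the expected file.
        raise FileNotFoundError(f"{sentinel} still missing after pull")
    return True
```

Passing the pull step in as a callable keeps the check testable without network access, and the second existence check turns a silent partial download into an explicit error.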
Rollback procedure
# 1. Pull a specific version
python ml/r2_hub.py --pull --revision v20260414
# 2. Verify compatibility
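python3 -c "
import json
# Load the pulled compatibility manifest and the local ml_config.json
with open('models/model_compat.json') as f:
    mc = json.load(f)
with open('ml_config.json') as f:
    ml = json.load(f)
# The manifest's feature count must equal the number of configured features
expected, actual = mc['feature_count'], len(ml['features'])
assert expected == actual, f'feature mismatch: manifest has {expected}, config has {actual}'
print('OK:', expected, 'features')
"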
# 3. Dry-run to confirm
python trading.py --dry