
About the Model

2026 MLB Season Win Predictions

Model Architecture

Ensemble Approach

Three machine learning models combined via weighted average:

  • XGBoost (50%) - Gradient boosting with 300 trees, depth=5, lr=0.01 (optimized)
  • Random Forest (40%) - 300 decision trees, max features=sqrt(n) (optimized)
  • Ridge Regression (10%) - Linear model with L2 regularization

Weights optimized via autoresearch (43 experiments, 1.5% MAE improvement)
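
The weighted average described above can be sketched as follows. This is a minimal illustration, not the site's actual code: the dictionary keys, the function name `ensemble_predict`, and the example predictions are all hypothetical; only the three weights come from the text.

```python
# Weights as stated in the text; the key names are illustrative.
WEIGHTS = {"xgboost": 0.50, "random_forest": 0.40, "ridge": 0.10}


def ensemble_predict(predictions, weights=WEIGHTS):
    """Blend per-model win predictions into one weighted-average prediction."""
    if set(predictions) != set(weights):
        raise ValueError("predictions and weights must cover the same models")
    total = sum(weights.values())
    # Dividing by the weight total keeps the result a true average
    # even if the weights do not sum exactly to 1.
    return sum(weights[name] * predictions[name] for name in weights) / total


# Hypothetical per-model outputs for one team (not real model output).
example = {"xgboost": 90.0, "random_forest": 86.0, "ridge": 82.0}
blended = ensemble_predict(example)
print(round(blended, 1))
```

Because XGBoost carries half the weight, the blended value sits closest to its prediction, while the Ridge model only nudges the result.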

Training Data

359 team-seasons from 2014-2025 (12 years, all 30 MLB teams)

Validated via walk-forward cross-validation on 2021-2024 test years.
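
Walk-forward validation trains only on seasons that precede each test year, so no future information leaks into a prediction. The sketch below assumes an expanding training window (every season before the test year); the text does not say whether the window expands or rolls, and the function name is hypothetical.

```python
def walk_forward_splits(seasons, test_years):
    """Yield (train_seasons, test_year) pairs using only earlier seasons to train."""
    for test_year in test_years:
        # Assumption: expanding window, i.e. all seasons before the test year.
        train = [s for s in seasons if s < test_year]
        if not train:
            raise ValueError(f"no training seasons before {test_year}")
        yield train, test_year


seasons = list(range(2014, 2026))      # 2014-2025, as stated in the text
test_years = [2021, 2022, 2023, 2024]  # test years stated in the text

for train, test_year in walk_forward_splits(seasons, test_years):
    print(f"train {train[0]}-{train[-1]} -> test {test_year}")
```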

43 Predictive Features

Historical Performance (12)

  • Wins (previous year)
  • Pythagorean wins
  • OPS, ERA, WHIP
  • Runs scored/allowed
  • Home runs, batting average
  • 3-year rolling averages
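
For reference, Pythagorean wins estimate how many games a team "should" have won from its run totals. The sketch below uses the classic exponent of 2 and a 162-game season; both are assumptions, since the page does not state which exponent or schedule length the model uses, and the run totals in the example are made up.

```python
def pythagorean_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Expected wins from runs scored and allowed (Pythagorean expectation)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    # Expected win fraction is RS^x / (RS^x + RA^x), scaled to a full season.
    return games * rs / (rs + ra)


# Hypothetical team: 800 runs scored, 700 allowed.
print(round(pythagorean_wins(800, 700), 1))
```

A team that scores exactly as many runs as it allows comes out at 81 wins, which is the .500 baseline.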

Advanced Metrics (19)

  • WAR: Batting, Pitching, Total, Bullpen, Rotation
  • Park Factors: Runs/HR adjustments for ballpark
  • Home/Road Splits: Win% at home vs away
  • Age: Roster age (batting/pitching)
  • Pythagorean Luck: 3-year mean reversion
  • Schedule Strength: Division opponent quality
  • Manager: Win% and tenure

2026 Projections (7) + Historical Projections (3)

  • Historical Projections (ZiPS/Steamer 2014-2025): proj_wins, proj_over_under, proj_win_pct (47.6% combined importance)
  • 2026 FanGraphs: OPS, ERA, runs, WHIP projections

Contextual (3)

  • Pythagorean luck (mean reversion)
  • Over/under previous year
  • Win percentage trends
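
Pythagorean luck is commonly defined as actual wins minus Pythagorean wins; a team with large positive luck tends to regress toward its run differential the following year. The sketch below assumes that definition and a plain trailing mean over three seasons. The page does not give the exact feature formula, so the function names and sample values are hypothetical.

```python
def pythagorean_luck(actual_wins, pythag_wins):
    """Wins above (positive) or below (negative) the Pythagorean estimate."""
    return actual_wins - pythag_wins


def trailing_mean(values, window=3):
    """Mean of the most recent `window` values (fewer if history is short)."""
    recent = values[-window:]
    if not recent:
        raise ValueError("need at least one value")
    return sum(recent) / len(recent)


# Hypothetical (actual wins, Pythagorean wins) for three consecutive seasons.
history = [(92, 88.0), (85, 86.0), (90, 87.0)]
luck_by_season = [pythagorean_luck(a, p) for a, p in history]
print(luck_by_season, trailing_mean(luck_by_season))
```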

Top 10 Most Important Features

1. pythag_wins_prev - 20.6%
2. wins_prev_year - 6.0%
3. home_win_pct - 5.1%
4. era_prev_year - 5.0%
5. whip_prev_year - 5.0%
6. runs_allowed - 3.9%
7. schedule_strength - 3.5%
8. total_war - 3.3%
9. park_adj_runs - 3.1%
10. bat_war - 2.9%

Features ranked by XGBoost importance scores. Historical projections (proj_wins, proj_over_under, proj_win_pct) dominate with 47.6% combined importance.

Data Sources

Historical Statistics

Baseball Reference via pybaseball (2014-2025)

Historical Projections (2014-2025)

ZiPS/Steamer archives from FanGraphs blog posts (360 team-years, 0.706 correlation with actuals)

2026 Projections

FanGraphs Depth Charts (scraped via Firecrawl)

Transactions

Baseball Reference 2026 transaction log (525 moves, WAR-matched)

Injuries

FanGraphs Injury List (real-time, 122 players on IL)

Home/Road Splits

MLB Stats API (2014-2024 historical splits)

Manager Data

Baseball Almanac, Baseball Reference (2026 roster)

Model Performance

  • Mean Absolute Error: 3.69 wins (walk-forward CV, 2021-2024)
  • R² Score: 0.877 (88% of variance explained)
  • Mean Predicted Wins: 81.8 (expected: ~81 wins)
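
The two headline metrics can be computed as in the sketch below. It uses the standard definitions of mean absolute error and R²; the win totals are toy numbers chosen for easy arithmetic, not the model's actual predictions.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted wins."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Share of variance in actual wins explained by the predictions."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Toy data for illustration only.
actual = [90, 80, 70]
predicted = [88, 81, 73]
print(mean_absolute_error(actual, predicted), r_squared(actual, predicted))
```

Under walk-forward validation, these would typically be computed on the held-out test years only, so that each prediction is scored against a season the model never saw during training.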