A power rating is a single-number estimate of a team's strength. The market produces implicit power ratings every time it sets a spread. Building your own gives you a fair-line estimate that's independent of the market — which is the prerequisite for measuring your edge against it. The math is straightforward; the discipline is in not over-engineering.
What a power rating is
A power rating expresses team quality on a points-per-game scale. If the Patriots have a power rating of +7 and the Bills have +3, the model expects the Patriots to win by 4 points on a neutral field. Add home-field advantage (+2 to +3 points in the NFL) and you have the spread.
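In code, the whole definition is one line; the ratings and home-field value below are the illustrative numbers from the example, not real estimates:

```python
def expected_spread(home_rating: float, away_rating: float, hfa: float = 2.5) -> float:
    """Positive = home team favored by that many points; hfa is illustrative."""
    return (home_rating - away_rating) + hfa

print(expected_spread(7.0, 3.0))  # Patriots (+7) at home vs. Bills (+3) -> 6.5
```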
The simplest model: margin-of-victory averaging
Start with a model you can describe in five lines (a code sketch follows the list):
- For each team, compute average point margin per game season-to-date.
- Adjust for opponent strength (SRS-style: add the average margin of the opponents each team has faced, so results against strong schedules count for more).
- Apply a regression-to-mean factor (multiply by 0.7 in early season, 0.85 in mid-season).
- Add home-field advantage where applicable.
- Compare to market spread.
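A minimal pandas sketch of the five steps on a toy three-team schedule. Team codes, scores, and the one-pass opponent adjustment are all illustrative; a production version would iterate the adjustment until ratings converge:

```python
import pandas as pd

# Toy schedule: each game appears twice, once from each team's side.
games = pd.DataFrame({
    "team": ["NE", "BUF", "NE", "NYJ", "BUF", "NYJ"],
    "opp":  ["BUF", "NE", "NYJ", "NE", "NYJ", "BUF"],
    "pf":   [24, 17, 30, 20, 27, 24],
    "pa":   [17, 24, 20, 30, 24, 27],
})

# Step 1: average point margin per game, season-to-date.
games["margin"] = games["pf"] - games["pa"]
raw = games.groupby("team")["margin"].mean()

# Step 2: opponent adjustment (SRS-style): add the average raw margin
# of the opponents each team has faced. One pass shown here.
sos = games.groupby("team")["opp"].apply(lambda opps: raw.loc[opps.values].mean())
adjusted = raw + sos

# Step 3: regress toward the mean (0.7 early season, 0.85 mid-season).
power = adjusted * 0.7

# Steps 4-5: add home-field advantage, then compare to the market spread.
HFA = 2.5
model_line = (power["NE"] - power["BUF"]) + HFA  # NE at home vs. BUF
print(round(model_line, 1))  # ~6.2: NE favored; compare this to the market number
```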
This is the entry-level model. It will be wrong in many ways. It's also surprisingly competitive against the market in mid-tier conferences and lower-volume sports.
The inputs that matter
Different sports weight inputs differently:
- NFL: Margin of victory + opponent adjustment + recency weighting + home/away. Sample size: 17 games (small).
- NBA: Net rating (offensive efficiency - defensive efficiency) + opponent adjustment + injury status + back-to-backs. Sample size: 82 games (medium).
- MLB: Pitcher-specific (xFIP, K%, BB%) + lineup wOBA + park factor. Sample size: 162 games (large), but per-game pitching variance is huge.
- College football: Margin + opponent + offensive/defensive efficiency separately + transfer portal status. Lots of variance from rosters changing year-over-year.
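One convenient way to keep these straight is a plain config mapping each sport to its input set; the feature names below are placeholders for whatever your data source calls them, not a fixed schema:

```python
# Illustrative per-sport input sets; names are placeholders.
SPORT_INPUTS = {
    "nfl": ["avg_margin", "opponent_adj", "recency_weight", "home_away"],
    "nba": ["net_rating", "opponent_adj", "injury_status", "back_to_back"],
    "mlb": ["xfip", "k_pct", "bb_pct", "lineup_woba", "park_factor"],
    "cfb": ["avg_margin", "opponent_adj", "off_eff", "def_eff", "portal_status"],
}
```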
The calibration problem
A power rating that produces 'right' answers on average isn't enough. You need calibration: when your model says the spread should be -4 and the market says -7, the model needs to be right often enough to justify the bet. Calibration tells you whether your model's probability estimates are reliable.
Test calibration with a back-test: for every bet recommended by your model in the last 2 seasons, check whether the realized result matched the model's expected probability. A well-calibrated model that says 'this bet has 55% probability of winning' should win 55% of those bets.
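A minimal version of that check, assuming a back-test log with one row per recommended bet; the `bets` frame and its column names are hypothetical:

```python
import pandas as pd

# One row per recommended bet: the model's win probability and the
# realized outcome (1 = bet won, 0 = bet lost). Toy values shown.
bets = pd.DataFrame({
    "model_prob": [0.55, 0.58, 0.52, 0.61, 0.56, 0.54, 0.60, 0.53],
    "won":        [1,    0,    1,    1,    0,    1,    1,    0],
})

# Bucket predictions and compare each bucket's mean prediction to its
# realized win rate. On a real log (hundreds of bets), a calibrated
# model keeps the two columns close; a toy sample like this will not.
bets["bucket"] = pd.cut(bets["model_prob"], bins=[0.50, 0.55, 0.60, 0.65])
report = bets.groupby("bucket", observed=True).agg(
    predicted=("model_prob", "mean"),
    realized=("won", "mean"),
    n=("won", "size"),
)
print(report)
```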
The over-engineering trap
Most retail-built models suffer from over-engineering. Adding 30 inputs, 50 weights, and 10 layers of regression typically produces a model that looks impressive, fits historical data perfectly, and fails out-of-sample. The reason: with enough parameters, you can fit anything; you've fit the noise, not the signal.
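A synthetic demonstration of why: regress pure noise on fifty random 'inputs' and the in-sample fit looks like signal while the out-of-sample fit collapses. All data here is fabricated noise:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# 50 random "inputs", 100 samples, target is pure noise.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(100, 50)), rng.normal(size=100)
X_test, y_test = rng.normal(size=(100, 50)), rng.normal(size=100)

model = LinearRegression().fit(X_train, y_train)
print(model.score(X_train, y_train))  # ~0.5 R^2 -- looks like signal
print(model.score(X_test, y_test))    # typically <= 0 -- it was noise
```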
Two rules of thumb:
- Start with 5-7 inputs. Add a new one only when you can articulate a specific reason it should improve the model and the back-test confirms the improvement.
- Walk-forward test: train on the 2022-2023 seasons, test on 2024; then train on 2022-2024, test on 2025 (see the sketch after this list). If the model only works when trained on all the data at once, it's overfit.
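A minimal walk-forward harness with a deliberately crude placeholder model; the `games` frame, its columns, and the fit/evaluate stand-ins are all hypothetical:

```python
import pandas as pd

# Toy data: season and realized margin per game. Replace with your own.
games = pd.DataFrame({
    "season": [2022]*3 + [2023]*3 + [2024]*3 + [2025]*3,
    "margin": [3, -7, 10, 6, -3, 1, 7, -4, 2, 5, -6, 8],
})

def fit(train: pd.DataFrame) -> float:
    # Placeholder "model": the historical mean margin.
    return train["margin"].mean()

def evaluate(model: float, test: pd.DataFrame) -> float:
    # Placeholder score: mean absolute error on the held-out season.
    return (test["margin"] - model).abs().mean()

# Train strictly on past seasons, score strictly on the next one.
for train_seasons, test_season in [((2022, 2023), 2024), ((2022, 2023, 2024), 2025)]:
    train = games[games["season"].isin(train_seasons)]
    test = games[games["season"] == test_season]
    print(train_seasons, "->", test_season, round(evaluate(fit(train), test), 2))
```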
Beating the market vs being correlated with it
Most retail power ratings end up correlated with market spreads — meaning the bettor and the market agree on most games. That's not a bug; the market is informed. But correlation isn't edge. Edge is in the games where you and the market disagree, and you're right disproportionately.
For your model to have edge, you need a specific reason to disagree with the market on specific games. 'My model has Bills +1 and the market has Bills -3' is informative only if you can articulate why your model is right (e.g., 'my model weights opponent-adjusted EPA more heavily than the market does, and the Bills have played a soft schedule that the market hasn't fully discounted').
Where retail-built models compete
- Mid-tier college football and basketball. Lower trader attention; retail-built models can find spots where the market is just wrong.
- Lower-volume props. Pace-driven NBA player props, MLB strikeout props in obscure pitcher matchups.
- F5 baseball. See our F5 guide.
- WNBA, MLS, lesser-watched leagues. Lower hold percentage at sharp books.
Where retail-built models lose
- NFL game lines. The most-bet, most-modeled, most-efficient market. Beating NFL game lines with a retail-built model is unrealistic absent specialized information.
- NBA top-stars markets. LeBron points, Curry threes, Tatum scoring — too many sharp models compete here.
- Top-level European soccer. Premier League, La Liga, Bundesliga top markets.
The 'just enough' model
The most efficient retail strategy is a 'just enough' model: simple, well-calibrated, and applied selectively to inefficient markets. Beat the market by 1.5% per bet across 200 bets a year and you've added significant value to the bankroll (the arithmetic is sketched below). Don't try to build the perfect model; build the model that's good enough to justify a bet.
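The back-of-envelope arithmetic, assuming standard -110 pricing and flat 1-unit stakes; all numbers are illustrative:

```python
# What "beat the market by 1.5%" is worth at -110 odds.
payout = 100 / 110          # profit per unit staked at -110
win_rate = 0.5317           # ~1.5% expected return per bet (breakeven is ~52.4%)
ev_per_bet = win_rate * payout - (1 - win_rate)
print(round(ev_per_bet, 3))        # ~0.015 units per 1-unit bet
print(round(ev_per_bet * 200, 1))  # ~3.0 units over 200 bets/year, before variance
```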
Tools and stack
- Spreadsheet (Excel, Google Sheets): entry-level, fine for <500 bets/year. Auto-pull historical data via Sheets imports.
- Python + Pandas + scikit-learn: medium-complexity. Most retail-built models live here.
- R + tidymodels: alternative; statistically more rigorous out of the box.
- Custom databases (PostgreSQL): for high-volume bettors with multiple models.
Discipline rules
- Walk-forward test before betting. No exceptions.
- Bet small at first. 0.5% of bankroll until 100+ bets prove the model works in production.
- Track CLV per bet (see the sketch after this list). CLV is a faster signal than W/L.
- Re-train regularly. A model trained on 2023 data may decay by 2026.
- Don't keep adding inputs. Most additions hurt out-of-sample performance.
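A minimal CLV calculation in American odds, skipping no-vig adjustment for brevity; the odds shown are hypothetical:

```python
def implied_prob(american: int) -> float:
    """Convert American odds to implied win probability (vig included)."""
    if american < 0:
        return -american / (-american + 100)
    return 100 / (american + 100)

# Hypothetical bet: you took -105 and the line closed -115.
bet_odds, close_odds = -105, -115
clv = implied_prob(close_odds) - implied_prob(bet_odds)
print(round(clv, 4))  # ~0.0227; positive = you beat the close
```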