Betting strategy · 12 min read

How to Evaluate a Betting Model (Yours or Someone Else's)

Most betting models — including most published ones — fail under even light stress-testing. Here's the discipline.

The number of "models" in retail sports betting has exploded. Twitter has them. Substacks have them. Discord groups sell them. The honest answer about most of them is that they're noise dressed up in spreadsheets. Evaluating whether a model is real — yours or someone else's — requires a small set of disciplined questions. None of them are exotic. Most published models fail when you ask them.

Question 1: What does the model claim to do?

A real model has a falsifiable claim. "This model identifies +EV NFL spread bets where my number differs from the closing line by 1.5+ points" is falsifiable. "This model spots winners" is not. If the claim is fuzzy, the model is unevaluable — which is the point.

Question 2: What's the sample size?

Sports betting variance is enormous. A 100-bet sample tells you almost nothing. The industry standard for assessing whether a betting strategy has edge is 1,000+ bets — and even then, the standard error on win rate is large. A model with 50 bets and a 60% win rate at -110 (a stunning record on paper) has a 95% confidence interval that stretches from elite down past the 52.4% break-even rate; in other words, the true win rate could belong to a losing bettor. Ignore any model claim that doesn't disclose sample size and the corresponding standard error.
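
To see how wide that interval actually is, here's a minimal Python sketch (normal approximation, independent bets assumed; the function names are just for illustration):

```python
import math

def breakeven_rate(american_odds: int) -> float:
    """Win rate needed to break even at the given American odds."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)

def win_rate_ci(wins: int, bets: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation confidence interval on the true win rate."""
    p = wins / bets
    se = math.sqrt(p * (1 - p) / bets)
    return p - z * se, p + z * se

lo, hi = win_rate_ci(wins=30, bets=50)                    # 60% over 50 bets
print(f"95% CI on win rate: {lo:.1%} to {hi:.1%}")        # ~46.4% to ~73.6%
print(f"break-even at -110: {breakeven_rate(-110):.1%}")  # ~52.4%
```

The lower bound sits around 46%, below both a coinflip and the break-even rate, which is why 50 bets proves nothing.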

Question 3: What's the CLV?

Closing line value is the single best forward-looking indicator of edge. A model that consistently beats the closing line by 1.5%+ is mathematically expected to be profitable over time. A model that "wins 60% of bets" but doesn't beat the closing line is riding variance, not edge. Read our CLV guide for the underlying logic.

CLV is also resistant to most forms of result-padding. You can cherry-pick "winners" in retrospect; you cannot retroactively beat closing lines you didn't bet.
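
If you want to run the check yourself, here's one common way to compute per-bet CLV in Python: convert American odds to implied probability, strip the vig proportionally from the two-way closing prices, and compare the fair closing probability to the price you paid. The odds in the example are made up, and this is a sketch of one convention, not the only one:

```python
def implied_prob(american_odds: int) -> float:
    """American odds -> implied probability (vig still included)."""
    if american_odds < 0:
        return -american_odds / (-american_odds + 100)
    return 100 / (american_odds + 100)

def no_vig_prob(close_your_side: int, close_other_side: int) -> float:
    """Fair closing probability for your side, removing the vig proportionally."""
    p_yours = implied_prob(close_your_side)
    p_other = implied_prob(close_other_side)
    return p_yours / (p_yours + p_other)

def clv(odds_taken: int, close_your_side: int, close_other_side: int) -> float:
    """CLV = fair closing probability minus the probability you paid for."""
    return no_vig_prob(close_your_side, close_other_side) - implied_prob(odds_taken)

# You bet a side at -105; it closed -120 on your side, +100 on the other.
print(f"CLV: {clv(-105, -120, 100):+.2%}")  # about +0.95%
```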

Question 4: How was it backtested?

Backtests are the most-abused tool in betting analytics. The most common backtesting traps:

  • Look-ahead bias. Using closing odds (which incorporate information not available at the time of decision) to evaluate a model that pretends to operate on opening odds.
  • Survivorship bias. Backtesting only on the seasons / sports / players where the model "would have worked" — and excluding the rest as "unique circumstances."
  • Parameter mining. Tuning model parameters until the backtest shows positive results; the model has 30 parameters and you tested 200 combinations until one worked.
  • Free-data trap. Modeling on cleaned, after-the-fact data that didn't exist in real time. Closing odds, final injury reports, and even some game-state data were not available in that form at the moment the bets would have been placed.

A clean backtest discloses its look-ahead controls, walks the test forward in time (the training set ends before the test set begins), uses opening or near-open odds (not closing odds), and reports out-of-sample results separately from in-sample results.
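
A minimal sketch of that walk-forward structure, with fit and evaluate left as placeholders for whatever model you're actually testing (the function names and the season field are illustrative, not a prescribed API):

```python
from typing import Callable, Sequence

def walk_forward(rows: Sequence[dict],
                 seasons: Sequence[int],
                 fit: Callable[[list[dict]], object],
                 evaluate: Callable[[object, list[dict]], float]) -> dict[int, float]:
    """For each test season, train only on strictly earlier seasons and score out-of-sample.
    `seasons` is assumed sorted ascending."""
    results: dict[int, float] = {}
    for test_season in seasons[1:]:  # need at least one earlier season to train on
        train = [r for r in rows if r["season"] < test_season]
        test = [r for r in rows if r["season"] == test_season]
        model = fit(train)                             # parameters frozen before the test window opens
        results[test_season] = evaluate(model, test)   # in-sample and out-of-sample never mix
    return results
```

The point is structural: parameters are frozen on data that ends before the test window opens, so out-of-sample results stay out-of-sample.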

Question 5: What's the variance profile?

A model that wins 55% of bets at -110 has an expected value of about 5% per bet. For any given run of 10 bets, a 10-loss streak is roughly a 1-in-2,900 event for that model (0.45^10), which sounds comfortably rare. But over a 500-bet season the longest losing streak you should expect is around seven, and roughly one season in ten will contain a 10-loss streak. If the model doesn't come with variance projections, and the bettor doesn't have a bankroll plan to survive them, it will get abandoned mid-streak even if the underlying edge is real. That's not a model failure; it's a discipline failure.
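
You can check the streak math directly with a short Monte Carlo simulation, assuming independent bets at a flat 55% win rate:

```python
import random

def longest_losing_streak(win_prob: float, n_bets: int, rng: random.Random) -> int:
    """Longest run of consecutive losses in one simulated season of independent bets."""
    streak = longest = 0
    for _ in range(n_bets):
        if rng.random() < win_prob:
            streak = 0
        else:
            streak += 1
            longest = max(longest, streak)
    return longest

def streak_profile(win_prob: float = 0.55, n_bets: int = 500, seasons: int = 20_000) -> None:
    rng = random.Random(1)
    streaks = [longest_losing_streak(win_prob, n_bets, rng) for _ in range(seasons)]
    print(f"average longest losing streak: {sum(streaks) / seasons:.1f}")                     # ~7
    print(f"seasons with a 10+ loss streak: {sum(s >= 10 for s in streaks) / seasons:.0%}")   # roughly 10%

streak_profile()
```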

Question 6: Why hasn't the market arbitraged the edge away?

This is the question most retail-built models can't answer. Sports betting is a competitive market with sophisticated counterparties (Pinnacle, Cris, Betcris) whose entire business is removing inefficiencies. If your model claims a 5% edge on NFL totals, you have to explain why the global market — including Pinnacle, which prices the closest-to-fair lines in the world — is leaving that edge on the table.

The answer is sometimes legitimate: a niche market (D3 college football, table tennis), an information advantage (you cover a beat, you saw a closed practice), a behavioral inefficiency (the public's over-tilt on prime-time games). The answer is sometimes "I haven't tested at scale and my model is overfit." Don't put bankroll on a model until you can answer Question 6.

Question 7: Does the model still work as it scales?

Some models work at small bet sizes ($50-100) but break at larger sizes ($500-1,000) because the books move lines on the model's signal. Some models work for 50 bets per week but fail at 200 because the bettor can't maintain decision discipline at that volume. A model is only as good as the bettor's ability to execute it consistently.

Evaluating someone else's model (a paid pick service or tout)

Three quick disqualifiers:

  1. The track record is reported in units, not in CLV. Units can be padded by selectively reporting wins. CLV cannot.
  2. The track record predates public disclosure. Honest operators publish picks before games start, retain the public record indefinitely, and report ALL picks (not "best bets only").
  3. The pricing model depends on retail buyers, not on the operator's own bets. The best signal that a tout actually has edge is that they're betting their own picks at scale. Most don't.

The shortest possible model evaluation

If a model claims edge: ask for the bet log, run CLV against the published closing lines, check the sample size, look at the variance profile, and ask why the market hasn't already priced the edge in. If the answers are weak on any of these, walk away.

Building your own

If you want to build, start small. Pick one market (NFL totals, NBA player rebounds, MLB strikeout props). Build a model with 5-10 inputs. Walk it forward through 2-3 seasons of historical data without parameter mining. Track CLV from day one. Bet small until the sample is meaningful. Most models that survive this discipline produce 1-3% edge — modest, real, and actually exploitable.
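
A minimal sketch of a bet-log record that makes the checks in this guide possible later (CLV, sample size, streaks); the fields are illustrative, not a required schema:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class BetRecord:
    """One logged bet, with enough fields to audit CLV, sample size, and streaks later."""
    placed_at: datetime       # timestamp shows the pick predates the game
    market: str               # e.g. "NFL total", "MLB strikeout prop"
    selection: str
    stake: float
    odds_taken: int           # American odds at the time of the bet
    close_your_side: int      # closing odds on the side you bet
    close_other_side: int     # closing odds on the other side, for no-vig CLV
    result: str = "pending"   # "win" / "loss" / "push" once settled
```

Log the timestamp and both closing prices at bet time, and the CLV, sample-size, and streak checks above can all be run over the same file.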
