Zero-inflated Poisson#
The Zero-Inflated Poisson (ZIP) model is an extension of the standard Poisson model designed to handle an excess of zero-goal outcomes in football match data.
A standard Poisson model can underestimate how often matches finish goalless, for example when defensive tactics or poor attacking play suppress scoring.
The ZIP model addresses this by adding a separate component for the probability of excess zeros, which can improve predictions for match results, goal distributions, and betting markets such as correct score and over/under goals.
This makes it particularly useful for leagues or teams where 0-0 results occur more often than a simple Poisson distribution would suggest.
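As a minimal sketch of the idea (not penaltyblog's implementation), the textbook univariate ZIP distribution mixes a point mass at zero with an ordinary Poisson: with probability `pi` the count is a structural zero, otherwise it is drawn from a Poisson with rate `lam`. The names `poisson_pmf`, `zip_pmf`, `lam` and `pi` are illustrative assumptions, and the way the library applies the inflation to the joint home/away scoreline may differ.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events under a Poisson with rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def zip_pmf(k: int, lam: float, pi: float) -> float:
    """Textbook zero-inflated Poisson: extra mass pi on zero, rest scaled by (1 - pi)."""
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base


# Hypothetical values chosen only for illustration.
lam, pi = 1.4, 0.1
print("P(0) plain Poisson:", poisson_pmf(0, lam))
print("P(0) zero-inflated:", zip_pmf(0, lam, pi))
```

With `pi = 0` the two functions agree for every `k`, which is why a fitted zero-inflation parameter near zero means the ZIP model collapses to the plain Poisson model.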
[1]:
import penaltyblog as pb
Get data from football-data.co.uk#
[2]:
# Download the 2019-2020 English Premier League fixtures and results
fb = pb.scrapers.FootballData("ENG Premier League", "2019-2020")
df = fb.get_fixtures()
df.head()
[2]:
id | date | datetime | season | competition | div | time | team_home | team_away | fthg | ftag | ... | b365_cahh | b365_caha | pcahh | pcaha | max_cahh | max_caha | avg_cahh | avg_caha | goals_home | goals_away |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1565308800---liverpool---norwich | 2019-08-09 | 2019-08-09 20:00:00 | 2019-2020 | ENG Premier League | E0 | 20:00 | Liverpool | Norwich | 4 | 1 | ... | 1.91 | 1.99 | 1.94 | 1.98 | 1.99 | 2.07 | 1.90 | 1.99 | 4 | 1 |
1565395200---bournemouth---sheffield_united | 2019-08-10 | 2019-08-10 15:00:00 | 2019-2020 | ENG Premier League | E0 | 15:00 | Bournemouth | Sheffield United | 1 | 1 | ... | 1.95 | 1.95 | 1.98 | 1.95 | 2.00 | 1.96 | 1.96 | 1.92 | 1 | 1 |
1565395200---burnley---southampton | 2019-08-10 | 2019-08-10 15:00:00 | 2019-2020 | ENG Premier League | E0 | 15:00 | Burnley | Southampton | 3 | 0 | ... | 1.87 | 2.03 | 1.89 | 2.03 | 1.90 | 2.07 | 1.86 | 2.02 | 3 | 0 |
1565395200---crystal_palace---everton | 2019-08-10 | 2019-08-10 15:00:00 | 2019-2020 | ENG Premier League | E0 | 15:00 | Crystal Palace | Everton | 0 | 0 | ... | 1.82 | 2.08 | 1.97 | 1.96 | 2.03 | 2.08 | 1.96 | 1.93 | 0 | 0 |
1565395200---tottenham---aston_villa | 2019-08-10 | 2019-08-10 17:30:00 | 2019-2020 | ENG Premier League | E0 | 17:30 | Tottenham | Aston Villa | 3 | 1 | ... | 2.10 | 1.70 | 2.18 | 1.77 | 2.21 | 1.87 | 2.08 | 1.80 | 3 | 1 |
5 rows × 111 columns
Train the model#
[3]:
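# Inputs, in order: home goals, away goals, home team names, away team names
clf = pb.models.ZeroInflatedPoissonGoalsModel(
df["goals_home"], df["goals_away"], df["team_home"], df["team_away"]
)
# Estimate the model parameters from the match data
clf.fit()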
The model’s parameters#
[4]:
clf
[4]:
Module: Penaltyblog
Model: Zero-inflated Poisson
Number of parameters: 42
Log Likelihood: -1057.712
AIC: 2199.424
Team Attack Defence
------------------------------------------------------------
Arsenal 1.133 -0.937
Aston Villa 0.84 -0.618
Bournemouth 0.813 -0.65
Brighton 0.776 -0.837
Burnley 0.87 -0.91
Chelsea 1.349 -0.806
Crystal Palace 0.543 -0.922
Everton 0.899 -0.795
Leicester 1.306 -1.084
Liverpool 1.536 -1.283
Man City 1.721 -1.206
Man United 1.286 -1.216
Newcastle 0.754 -0.766
Norwich 0.391 -0.521
Sheffield United 0.761 -1.162
Southampton 1.052 -0.719
Tottenham 1.218 -0.953
Watford 0.706 -0.669
West Ham 1.013 -0.688
Wolves 1.031 -1.125
------------------------------------------------------------
Home Advantage: 0.229
Zero Inflation: 0.0
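The summary figures can be cross-checked by hand. Assuming the standard definition AIC = 2k - 2 ln L, and counting one attack and one defence parameter per team plus home advantage and zero inflation, the reported parameter count and AIC are consistent with the log likelihood:

```python
n_teams = 20
# Attack + defence per team, plus home advantage and zero inflation.
n_params = 2 * n_teams + 2
log_likelihood = -1057.712  # copied from the model summary above

# Standard AIC definition (assumed to be the one used in the summary).
aic = 2 * n_params - 2 * log_likelihood
print(n_params, round(aic, 3))  # 42 2199.424
```

The zero-inflation estimate is effectively zero here, so for this season the fitted model behaves like a plain Poisson model.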
[5]:
clf.get_params()
[5]:
{'attack_Arsenal': np.float64(1.1331372080052156),
'attack_Aston Villa': np.float64(0.8398561410597628),
'attack_Bournemouth': np.float64(0.813057637933014),
'attack_Brighton': np.float64(0.7764426299853483),
'attack_Burnley': np.float64(0.8703210613417295),
'attack_Chelsea': np.float64(1.3486518753819212),
'attack_Crystal Palace': np.float64(0.542528060198383),
'attack_Everton': np.float64(0.8994654672297977),
'attack_Leicester': np.float64(1.305782179585337),
'attack_Liverpool': np.float64(1.5361283056497848),
'attack_Man City': np.float64(1.7212536319787866),
'attack_Man United': np.float64(1.2855938357088401),
'attack_Newcastle': np.float64(0.754379678544581),
'attack_Norwich': np.float64(0.39137200684554035),
'attack_Sheffield United': np.float64(0.761496465130705),
'attack_Southampton': np.float64(1.0516579720663382),
'attack_Tottenham': np.float64(1.217800596476084),
'attack_Watford': np.float64(0.7064069227623189),
'attack_West Ham': np.float64(1.0134905097965201),
'attack_Wolves': np.float64(1.031177814319993),
'defence_Arsenal': np.float64(-0.9373866783739618),
'defence_Aston Villa': np.float64(-0.6183005430604998),
'defence_Bournemouth': np.float64(-0.6497289565031042),
'defence_Brighton': np.float64(-0.836602390899969),
'defence_Burnley': np.float64(-0.9097319671165368),
'defence_Chelsea': np.float64(-0.8056312701235303),
'defence_Crystal Palace': np.float64(-0.9217060184023727),
'defence_Everton': np.float64(-0.7950673965766504),
'defence_Leicester': np.float64(-1.0840072293711287),
'defence_Liverpool': np.float64(-1.283237543707284),
'defence_Man City': np.float64(-1.2063865223320078),
'defence_Man United': np.float64(-1.2155171715865978),
'defence_Newcastle': np.float64(-0.7659852881574328),
'defence_Norwich': np.float64(-0.5205223308283069),
'defence_Sheffield United': np.float64(-1.1624467635288984),
'defence_Southampton': np.float64(-0.7187036119033049),
'defence_Tottenham': np.float64(-0.953260850786312),
'defence_Watford': np.float64(-0.6693411859904393),
'defence_West Ham': np.float64(-0.68777419205324),
'defence_Wolves': np.float64(-1.1252484363593045),
'home_advantage': np.float64(0.22922731015568612),
'zero_inflation': np.float64(2.8371077678703293e-12)}
Predict Match Outcomes#
[6]:
probs = clf.predict("Liverpool", "Wolves")
probs
[6]:
Module: Penaltyblog
Class: FootballProbabilityGrid
Home Goal Expectation: [1.89668415]
Away Goal Expectation: [0.77719832]
Home Win: 0.6384940133706147
Draw: 0.2148861999024029
Away Win: 0.1466197848036293
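The goal expectations reported above appear to follow a log-linear form: the home rate is the exponential of home advantage plus the home team's attack plus the away team's defence, and the away rate is the exponential of the away team's attack plus the home team's defence. This form is inferred from the printed numbers rather than taken from the library's source, and the helper `goal_expectations` is hypothetical. A minimal check using values copied from `clf.get_params()`:

```python
import math

# Values copied from the clf.get_params() output above.
params = {
    "attack_Liverpool": 1.5361283056497848,
    "defence_Liverpool": -1.283237543707284,
    "attack_Wolves": 1.031177814319993,
    "defence_Wolves": -1.1252484363593045,
    "home_advantage": 0.22922731015568612,
}


def goal_expectations(home, away, p):
    """Assumed form: rate = exp(attack + opponent's defence [+ home advantage])."""
    home_rate = math.exp(
        p["home_advantage"] + p[f"attack_{home}"] + p[f"defence_{away}"]
    )
    away_rate = math.exp(p[f"attack_{away}"] + p[f"defence_{home}"])
    return home_rate, away_rate


home_rate, away_rate = goal_expectations("Liverpool", "Wolves", params)
print(round(home_rate, 4), round(away_rate, 4))  # 1.8967 0.7772
```

A more negative defence value lowers the opponent's expected goals, so in this parameterisation stronger defences have more negative numbers.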
1x2 Probabilities#
[7]:
probs.home_draw_away
[7]:
[np.float64(0.6384940133706147),
np.float64(0.2148861999024029),
np.float64(0.1466197848036293)]
[8]:
probs.home_win
[8]:
np.float64(0.6384940133706147)
[9]:
probs.draw
[9]:
np.float64(0.2148861999024029)
[10]:
probs.away_win
[10]:
np.float64(0.1466197848036293)
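These three probabilities can be approximately reproduced from the two goal expectations alone. The sketch below assumes independent Poisson goals for each side (reasonable here because the fitted zero inflation is essentially zero) and a score grid truncated at 15 goals per team; both the truncation and the helper names are assumptions, not the library's internals.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals under a Poisson with rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def score_grid(home_rate, away_rate, max_goals=15):
    """grid[i][j] = P(home scores i, away scores j), assuming independence."""
    return [
        [poisson_pmf(i, home_rate) * poisson_pmf(j, away_rate)
         for j in range(max_goals + 1)]
        for i in range(max_goals + 1)
    ]


# Goal expectations copied from the prediction output above.
grid = score_grid(1.89668415, 0.77719832)
n = len(grid)

home_win = sum(grid[i][j] for i in range(n) for j in range(n) if i > j)
draw = sum(grid[i][i] for i in range(n))
away_win = sum(grid[i][j] for i in range(n) for j in range(n) if i < j)
print(home_win, draw, away_win)
```

The printed values should be close to the `home_win`, `draw` and `away_win` figures shown above.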
Probability of Total Goals >1.5#
[11]:
probs.total_goals("over", 1.5)
[11]:
np.float64(0.7465613418970934)
Probability of Home Team Covering Asian Handicap 1.5#
[12]:
probs.asian_handicap("home", 1.5)
[12]:
np.float64(0.3843887021858695)
Probability of both teams scoring#
[13]:
probs.both_teams_to_score
[13]:
np.float64(0.45922636572315534)
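The market probabilities above are all sums over cells of the same score grid. As a rough sketch, again assuming independent Poisson goals and a grid truncated at 15 goals per team (assumptions, not the library's internals), the three figures can be approximated as follows. The handicap line is interpreted here as the home team winning by more than 1.5 goals, which is the reading that matches the reported number.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals under a Poisson with rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


max_goals = 15
home_rate, away_rate = 1.89668415, 0.77719832  # copied from the prediction output
goals = range(max_goals + 1)

# Joint probability of each scoreline (home i, away j), assuming independence.
grid = {
    (i, j): poisson_pmf(i, home_rate) * poisson_pmf(j, away_rate)
    for i in goals for j in goals
}

# Over 1.5 total goals: at least two goals in the match.
over_1_5 = sum(p for (i, j), p in grid.items() if i + j > 1.5)
# Home covers a 1.5-goal handicap: home wins by two or more.
home_ah_1_5 = sum(p for (i, j), p in grid.items() if i - j > 1.5)
# Both teams to score: each side scores at least once.
btts = sum(p for (i, j), p in grid.items() if i > 0 and j > 0)

print(over_1_5, home_ah_1_5, btts)
```

The printed values should be close to the outputs of `total_goals("over", 1.5)`, `asian_handicap("home", 1.5)` and `both_teams_to_score` shown above.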