Zero-inflated Poisson#

The Zero-Inflated Poisson (ZIP) model is an extension of the standard Poisson model designed to handle an excess of zero-goal outcomes in football match data.

A standard Poisson model can underestimate how often matches end goalless, for example when defensive tactics or poor attacking play make 0-0 results more common than the fitted scoring rates imply.

The ZIP model addresses this by mixing the Poisson distribution with a second process that adds extra probability mass at zero. Where the data really do contain excess zeros, this can improve predictions for match results, goal distributions, and betting markets such as correct score and over/under goals.

This makes it particularly useful for leagues or teams where 0-0 results occur more often than a simple Poisson distribution would suggest.
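
To make the idea of a "separate process" concrete, here is a minimal sketch of the textbook zero-inflated Poisson probability mass function. The names `poisson_pmf`, `zip_pmf`, `lam` and `pi` are illustrative assumptions, and penaltyblog may parameterise or apply the inflation differently internally.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals under a plain Poisson with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def zip_pmf(k: int, lam: float, pi: float) -> float:
    """Textbook ZIP: with probability pi emit a structural zero,
    otherwise draw the count from Poisson(lam)."""
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base


# Illustrative values only, not fitted to any data
lam, pi = 1.4, 0.05

p0_poisson = poisson_pmf(0, lam)   # chance of zero goals, plain Poisson
p0_zip = zip_pmf(0, lam, pi)       # chance of zero goals with inflation
total = sum(zip_pmf(k, lam, pi) for k in range(50))  # should be ~1

print(p0_poisson, p0_zip, total)
```

With `pi = 0` the two functions agree for every `k`, so the ZIP model contains the plain Poisson model as a special case.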

[1]:
import penaltyblog as pb

Get data from football-data.co.uk#

[2]:
fb = pb.scrapers.FootballData("ENG Premier League", "2019-2020")
df = fb.get_fixtures()

df.head()
[2]:
id                                           date        datetime             season     competition         div  time   team_home       team_away         fthg  ftag  ...  b365_cahh  b365_caha  pcahh  pcaha  max_cahh  max_caha  avg_cahh  avg_caha  goals_home  goals_away
1565308800---liverpool---norwich             2019-08-09  2019-08-09 20:00:00  2019-2020  ENG Premier League  E0   20:00  Liverpool       Norwich           4     1     ...  1.91       1.99       1.94   1.98   1.99      2.07      1.90      1.99      4           1
1565395200---bournemouth---sheffield_united  2019-08-10  2019-08-10 15:00:00  2019-2020  ENG Premier League  E0   15:00  Bournemouth     Sheffield United  1     1     ...  1.95       1.95       1.98   1.95   2.00      1.96      1.96      1.92      1           1
1565395200---burnley---southampton           2019-08-10  2019-08-10 15:00:00  2019-2020  ENG Premier League  E0   15:00  Burnley         Southampton       3     0     ...  1.87       2.03       1.89   2.03   1.90      2.07      1.86      2.02      3           0
1565395200---crystal_palace---everton        2019-08-10  2019-08-10 15:00:00  2019-2020  ENG Premier League  E0   15:00  Crystal Palace  Everton           0     0     ...  1.82       2.08       1.97   1.96   2.03      2.08      1.96      1.93      0           0
1565395200---tottenham---aston_villa         2019-08-10  2019-08-10 17:30:00  2019-2020  ENG Premier League  E0   17:30  Tottenham       Aston Villa       3     1     ...  2.10       1.70       2.18   1.77   2.21      1.87      2.08      1.80      3           1

5 rows × 111 columns

Train the model#

[3]:
clf = pb.models.ZeroInflatedPoissonGoalsModel(
    df["goals_home"], df["goals_away"], df["team_home"], df["team_away"]
)
clf.fit()

The model’s parameters#

[4]:
clf
[4]:
Module: Penaltyblog

Model: Zero-inflated Poisson

Number of parameters: 42
Log Likelihood: -1057.712
AIC: 2199.424

Team                 Attack               Defence
------------------------------------------------------------
Arsenal              1.133                -0.937
Aston Villa          0.84                 -0.618
Bournemouth          0.813                -0.65
Brighton             0.776                -0.837
Burnley              0.87                 -0.91
Chelsea              1.349                -0.806
Crystal Palace       0.543                -0.922
Everton              0.899                -0.795
Leicester            1.306                -1.084
Liverpool            1.536                -1.283
Man City             1.721                -1.206
Man United           1.286                -1.216
Newcastle            0.754                -0.766
Norwich              0.391                -0.521
Sheffield United     0.761                -1.162
Southampton          1.052                -0.719
Tottenham            1.218                -0.953
Watford              0.706                -0.669
West Ham             1.013                -0.688
Wolves               1.031                -1.125
------------------------------------------------------------
Home Advantage: 0.229
Zero Inflation: 0.0
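
The summary figures above can be cross-checked by hand. The sketch below assumes the standard definition AIC = 2k - 2·logL and counts the parameters as 20 attack terms, 20 defence terms, one home advantage and one zero-inflation term. The variable names are illustrative, and the numbers are copied from the printed output.

```python
# Values copied from the model summary above
n_teams = 20
n_params = 2 * n_teams + 2       # attack + defence per team, home advantage, zero inflation
log_likelihood = -1057.712

# Standard Akaike information criterion
aic = 2 * n_params - 2 * log_likelihood

print(n_params)        # 42
print(round(aic, 3))   # 2199.424
```

Both values agree with the reported "Number of parameters" and "AIC". Note also that the fitted zero inflation is effectively zero for this season, so on this data the model behaves almost exactly like a plain Poisson model.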
[5]:
clf.get_params()
[5]:
{'attack_Arsenal': np.float64(1.1331372080052156),
 'attack_Aston Villa': np.float64(0.8398561410597628),
 'attack_Bournemouth': np.float64(0.813057637933014),
 'attack_Brighton': np.float64(0.7764426299853483),
 'attack_Burnley': np.float64(0.8703210613417295),
 'attack_Chelsea': np.float64(1.3486518753819212),
 'attack_Crystal Palace': np.float64(0.542528060198383),
 'attack_Everton': np.float64(0.8994654672297977),
 'attack_Leicester': np.float64(1.305782179585337),
 'attack_Liverpool': np.float64(1.5361283056497848),
 'attack_Man City': np.float64(1.7212536319787866),
 'attack_Man United': np.float64(1.2855938357088401),
 'attack_Newcastle': np.float64(0.754379678544581),
 'attack_Norwich': np.float64(0.39137200684554035),
 'attack_Sheffield United': np.float64(0.761496465130705),
 'attack_Southampton': np.float64(1.0516579720663382),
 'attack_Tottenham': np.float64(1.217800596476084),
 'attack_Watford': np.float64(0.7064069227623189),
 'attack_West Ham': np.float64(1.0134905097965201),
 'attack_Wolves': np.float64(1.031177814319993),
 'defence_Arsenal': np.float64(-0.9373866783739618),
 'defence_Aston Villa': np.float64(-0.6183005430604998),
 'defence_Bournemouth': np.float64(-0.6497289565031042),
 'defence_Brighton': np.float64(-0.836602390899969),
 'defence_Burnley': np.float64(-0.9097319671165368),
 'defence_Chelsea': np.float64(-0.8056312701235303),
 'defence_Crystal Palace': np.float64(-0.9217060184023727),
 'defence_Everton': np.float64(-0.7950673965766504),
 'defence_Leicester': np.float64(-1.0840072293711287),
 'defence_Liverpool': np.float64(-1.283237543707284),
 'defence_Man City': np.float64(-1.2063865223320078),
 'defence_Man United': np.float64(-1.2155171715865978),
 'defence_Newcastle': np.float64(-0.7659852881574328),
 'defence_Norwich': np.float64(-0.5205223308283069),
 'defence_Sheffield United': np.float64(-1.1624467635288984),
 'defence_Southampton': np.float64(-0.7187036119033049),
 'defence_Tottenham': np.float64(-0.953260850786312),
 'defence_Watford': np.float64(-0.6693411859904393),
 'defence_West Ham': np.float64(-0.68777419205324),
 'defence_Wolves': np.float64(-1.1252484363593045),
 'home_advantage': np.float64(0.22922731015568612),
 'zero_inflation': np.float64(2.8371077678703293e-12)}

Predict Match Outcomes#

[6]:
probs = clf.predict("Liverpool", "Wolves")
probs
[6]:
Module: Penaltyblog

Class: FootballProbabilityGrid

Home Goal Expectation: [1.89668415]
Away Goal Expectation: [0.77719832]

Home Win: 0.6384940133706147
Draw: 0.2148861999024029
Away Win: 0.1466197848036293
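
The goal expectations above can be reproduced from the fitted parameters. The sketch below assumes a log-linear form, where each expectation is the exponential of the scoring side's attack plus the conceding side's defence, plus home advantage for the home team. That form is inferred from the numbers in this notebook rather than taken from the library source, and the `params` dictionary is simply a hand-copied subset of the `get_params()` output.

```python
import math

# Subset of the fitted parameters, copied from clf.get_params() above
params = {
    "attack_Liverpool": 1.5361283056497848,
    "defence_Liverpool": -1.283237543707284,
    "attack_Wolves": 1.031177814319993,
    "defence_Wolves": -1.1252484363593045,
    "home_advantage": 0.22922731015568612,
}

# Assumed log-linear link: exponentiate the sum of the relevant terms
home_xg = math.exp(
    params["home_advantage"]
    + params["attack_Liverpool"]
    + params["defence_Wolves"]
)
away_xg = math.exp(params["attack_Wolves"] + params["defence_Liverpool"])

print(home_xg, away_xg)
```

Under that assumption the results come out at roughly 1.8967 and 0.7772, in line with the home and away goal expectations printed above.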

1x2 Probabilities#

[7]:
probs.home_draw_away
[7]:
[np.float64(0.6384940133706147),
 np.float64(0.2148861999024029),
 np.float64(0.1466197848036293)]
[8]:
probs.home_win
[8]:
np.float64(0.6384940133706147)
[9]:
probs.draw
[9]:
np.float64(0.2148861999024029)
[10]:
probs.away_win
[10]:
np.float64(0.1466197848036293)
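
As a rough illustration of where these three numbers come from, the sketch below builds a scoreline grid from the two goal expectations and sums the regions where the home side scores more, the same, or fewer goals. It assumes independent Poisson goals (reasonable here only because the fitted zero inflation is effectively zero) and a truncation at 15 goals per side. Both are assumptions for this example, not a description of how `FootballProbabilityGrid` is implemented.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals under a Poisson with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


# Goal expectations copied from the prediction output above
home_xg, away_xg = 1.89668415, 0.77719832
max_goals = 15  # assumed truncation point for the grid

home_win = draw = away_win = 0.0
for h in range(max_goals + 1):
    for a in range(max_goals + 1):
        # Probability of the exact scoreline h-a under independence
        p = poisson_pmf(h, home_xg) * poisson_pmf(a, away_xg)
        if h > a:
            home_win += p
        elif h == a:
            draw += p
        else:
            away_win += p

print(home_win, draw, away_win)
```

The three sums land at roughly 0.638, 0.215 and 0.147, matching the probabilities reported by the model to the precision shown here.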

Probability of Total Goals >1.5#

[11]:
probs.total_goals("over", 1.5)
[11]:
np.float64(0.7465613418970934)
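
For intuition, the over 1.5 goals figure can be approximated in closed form. If home and away goals are treated as independent Poisson variables (an assumption that holds here only because the fitted zero inflation is effectively zero), their sum is Poisson with mean equal to the two expectations added together, and "over 1.5" is the complement of seeing 0 or 1 total goals. The variable names are illustrative.

```python
import math

# Goal expectations copied from the prediction output above
home_xg, away_xg = 1.89668415, 0.77719832

# The sum of two independent Poisson variables is Poisson with the summed mean
total_xg = home_xg + away_xg

# P(total <= 1) = P(0 goals) + P(1 goal)
p_under = math.exp(-total_xg) * (1.0 + total_xg)
p_over_1_5 = 1.0 - p_under

print(p_over_1_5)
```

This gives approximately 0.7466, in line with the value returned by `probs.total_goals("over", 1.5)`.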

Probability of Asian Handicap 1.5#

[12]:
probs.asian_handicap("home", 1.5)
[12]:
np.float64(0.3843887021858695)
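
The value above is consistent with the probability that the home side wins by more than 1.5 goals, that is by two or more. That reading is inferred from the numbers in this notebook, not from the library documentation. A minimal sketch under the same independent-Poisson assumption, with an assumed grid truncation of 15 goals per side:

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals under a Poisson with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


# Goal expectations copied from the prediction output above
home_xg, away_xg = 1.89668415, 0.77719832
max_goals = 15   # assumed truncation point for the grid
line = 1.5       # margin the home side must exceed

# Sum every scoreline where the home margin beats the line
p_home_covers = sum(
    poisson_pmf(h, home_xg) * poisson_pmf(a, away_xg)
    for h in range(max_goals + 1)
    for a in range(max_goals + 1)
    if h - a > line
)

print(p_home_covers)
```

The result is approximately 0.384, matching `probs.asian_handicap("home", 1.5)` to the precision shown.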

Probability of both teams scoring#

[13]:
probs.both_teams_to_score
[13]:
np.float64(0.45922636572315534)
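
Both teams scoring also has a simple closed form if the two goal counts are treated as independent Poisson variables (an assumption for this sketch, reasonable here because the fitted zero inflation is effectively zero): it is the product of each side's probability of scoring at least once.

```python
import math

# Goal expectations copied from the prediction output above
home_xg, away_xg = 1.89668415, 0.77719832

# P(team scores at least once) = 1 - P(team scores zero)
p_home_scores = 1.0 - math.exp(-home_xg)
p_away_scores = 1.0 - math.exp(-away_xg)

# Independence lets us multiply the two probabilities
p_btts = p_home_scores * p_away_scores

print(p_btts)
```

This comes out at roughly 0.459, in line with `probs.both_teams_to_score`.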