Last week, we asked this question:
If you’ve ever delved into Football Analytics Twitter, you’ll probably have come across the term ‘expected goals’. For many analytics folks, it’s the holy grail. But what is it?
Let’s go back to school, simple maths and probability.
Each shot taken by a player has a probability of going in. This probability is a number between 0 (no chance of a goal being scored) and 1 (a goal is certain). The probability of a given shot being scored is its expected goals value (or XPG/xG if you prefer abbreviations).
Many factors go into working out each shot’s probability of going in, for example the player’s location on the pitch when taking the shot.
Over the course of a match, each team’s expected goals can be added up, and very clever people on Twitter can produce expected goals maps like these:
(This xG map is provided by Michael Caley, who you can (and should) follow here.)
You’re probably thinking “What the hell is this?” Let me explain.
Each square on the pitch is a shot, with Verona’s shot locations on the right, and Napoli’s on the left. The size of the square corresponds to the quality (expected goals) of the chance. Goals are in pink. The xG sum of all of Napoli’s shots was 4.7 and Verona’s was 0.4, clearly indicating that Napoli dominated the game, both in shot numbers and in shot quality.
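To make the "added up" part concrete, here is a minimal sketch of how a team's match xG is just the sum of its shots' xG values. The shot list is hypothetical and purely illustrative; it is not the real shot data from the match above.

```python
from collections import defaultdict

# Hypothetical shot list: (team, xG of that shot). Illustrative values only.
shots = [
    ("Napoli", 0.45),
    ("Napoli", 0.08),
    ("Napoli", 0.31),
    ("Verona", 0.05),
    ("Verona", 0.12),
]

def team_xg(shots):
    """Add up the xG of every shot, grouped by team."""
    totals = defaultdict(float)
    for team, xg in shots:
        totals[team] += xg
    return dict(totals)

totals = team_xg(shots)
for team, total in totals.items():
    print(f"{team}: {total:.2f} xG")
```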
Why do I like expected goals?
Unlike most mainstream statistics (shots, shots on target etc.), expected goals directly measures shot quality and, when added up over the course of a match, both the quality and the quantity of shots. It makes sense to assume that a side that is able to create more/better chances than the opposition has played better and ‘deserves’ to win, as a general rule of thumb. Sure, statistical anomalies happen, and sides that have an xG of 0.5 over a match can and have beaten sides with 2.7 xG, but (very generally) a higher xG than your opponent means that you ‘should’ win the game. On this topic, this from Danny Page is an invaluable resource.
I find xG maps an invaluable tool in gauging how well teams played. xG as a whole is helpful in challenging media narratives about how well a team is playing, and the concept is vital in thinking more about shot quality, a topic that in my opinion isn’t covered enough in mainstream media analysis.
Expected goals can also be useful when applied to individual players.
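To see how a 0.5 xG side can beat a 2.7 xG side, here is a rough sketch that turns each team's shots into match outcome probabilities. It assumes every shot is an independent event that goes in with probability equal to its xG, which is a simplification, and the shot lists are hypothetical. It is not necessarily how any published simulator works.

```python
def goal_distribution(shot_xgs):
    """Index k of the result is the probability of scoring exactly k goals,
    treating each shot as independent with scoring probability equal to its xG."""
    dist = [1.0]  # before any shots: zero goals with certainty
    for p in shot_xgs:
        new = [0.0] * (len(dist) + 1)
        for goals, prob in enumerate(dist):
            new[goals] += prob * (1 - p)   # this shot is missed
            new[goals + 1] += prob * p     # this shot is scored
        dist = new
    return dist

def match_outcome_probs(xgs_a, xgs_b):
    """Return (P(A wins), P(draw), P(B wins)) from two lists of shot xG values."""
    dist_a = goal_distribution(xgs_a)
    dist_b = goal_distribution(xgs_b)
    win_a = draw = win_b = 0.0
    for goals_a, prob_a in enumerate(dist_a):
        for goals_b, prob_b in enumerate(dist_b):
            joint = prob_a * prob_b
            if goals_a > goals_b:
                win_a += joint
            elif goals_a == goals_b:
                draw += joint
            else:
                win_b += joint
    return win_a, draw, win_b

# Hypothetical shot lists summing to 0.5 xG and 2.7 xG respectively.
underdog = [0.3, 0.1, 0.1]
favourite = [0.5, 0.4, 0.4, 0.3, 0.3, 0.2, 0.2, 0.1, 0.1, 0.1, 0.1]

win_u, draw, win_f = match_outcome_probs(underdog, favourite)
print(f"Underdog win: {win_u:.1%}, draw: {draw:.1%}, favourite win: {win_f:.1%}")
```

Under these assumptions the underdog's win probability is small but not zero, which is the point: the side with the higher xG usually wins, not always.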
Player Z’s xG over a season: 10.1
Player Z’s goals over a season: 18
Now, each shot’s xG is only the average probability of scoring across all players, so there are some exceptional finishers, such as Messi and Podolski, who are able to outperform their expected goals year after year. But what’s interesting about expected goals is how rare it is for players to consistently overperform their expected goals by a large margin. We can predict with some degree of confidence that players who are heavily outperforming their individual expected goals (see player Z above) will regress to the mean over time. Expected goals can help you gauge how good a player’s finishing is (whether he’s consistently over/underperformed his xG, or been a G = xG finisher throughout his career), but more importantly, how good he is at getting into good positions. Arsene Wenger has described finishing as ‘cyclical’, and while what he’s talking about refers more to regression than anything else, he’s right.
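For what it's worth, the over/underperformance for player Z is simple arithmetic. The sketch below uses the two numbers given above; the function name is just an illustrative choice.

```python
def finishing_summary(goals, xg):
    """Return (goals minus xG, goals per unit of xG) for a player."""
    diff = goals - xg
    ratio = goals / xg
    return diff, ratio

# Player Z's season from the example above.
diff, ratio = finishing_summary(goals=18, xg=10.1)
print(f"Goals minus xG: {diff:+.1f}")
print(f"Goals per xG: {ratio:.2f}")
```

A difference well above zero, as here, is the kind of overperformance that would usually be expected to shrink over a larger sample.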
Expected goals is a great predictor of future results (more on that in another blog post) and it correlates really well with actual goals over a large enough sample size.
Why don’t I like expected goals?
The biggest black mark most people hold against expected goals is that it can’t measure the proximity of defenders to the shooter, due to the lack of off-the-ball tracking data. The modellers try to compensate for this (as explained below), but this is still an ongoing problem.
It only includes shots. Say a cross is fizzed across the face of goal and nobody gets a touch and it goes out for a throw-in. That amazing chance to score a goal isn’t counted by expected goals because there was no shot and the side gets no credit for it in terms of expected goals.
It’s easy to misinterpret an xG map.
Take this xG map for example. Looks like a fairly close game based on the xG difference, doesn’t it? Fact is, Arsenal went 2-0 up early on, and could afford to sit off, conceding loads of poor quality chances to Villa, and had no need to go forwards as long as Villa didn’t score. Score effects matter a lot in determining a side’s xG. As with everything in football analytics, context is king.
What factors go into expected goals?
It varies from model to model, but I’ll go through several factors and give a brief explanation of the theory behind each.
Angle of the shot
This should be fairly obvious. The more centrally a shot is taken, the more of the goal the shooter will have to aim at, and the higher the chance he’ll have of scoring.
Distance of the shooter from goal
Fairly self-explanatory as well. Shots closer to the goal tend to find the net more frequently than those from outside the area.
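As a rough sketch of how these two factors might be turned into numbers, the snippet below computes the distance to the centre of the goal and the angle between the two posts as seen from the shot location. The coordinate system is an assumption made for this example (x is metres from the goal line, y is metres sideways from the centre of the goal); real models may define both features differently.

```python
import math

GOAL_WIDTH = 7.32  # metres, the standard width of a football goal

def shot_distance(x, y):
    """Distance in metres from the shot location to the centre of the goal.
    x: metres from the goal line, y: metres sideways from the goal's centre."""
    return math.hypot(x, y)

def shot_angle(x, y):
    """Angle in degrees between the lines from the shot location to each post.
    A bigger angle means more of the goal to aim at. Assumes x > 0."""
    half = GOAL_WIDTH / 2
    to_near_side = math.atan2(y + half, x)
    to_far_side = math.atan2(y - half, x)
    return math.degrees(to_near_side - to_far_side)

# Compare a central shot with a wide one taken the same distance from the goal line.
central = shot_angle(11, 0)
wide = shot_angle(11, 15)
print(f"Central: {central:.1f} degrees, wide: {wide:.1f} degrees")
```

The central shot sees a wider slice of the goal than the wide one, and moving closer to goal along the same line widens the angle further.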
Part of the body used to shoot
In terms of feet, this should be obvious. A player with two identical opportunities is more likely to score with his stronger foot, so this is incorporated into xG calculations. However, xG does not like headers, because they’re far harder to direct and generate power on, meaning that a shot with feet is almost always going to have a higher xG than a header from the same location.
Speed of the attack
This is partly included because of football’s lack of off-the-ball tracking data. The general theory behind this is that faster attacks are more likely to result in goals because the opposition will be unable to get back into a strong defensive shape in time. Expected goals loves chances coming from counter attacks.
Type of assist
Perhaps a less obvious factor. Crosses (especially those in the air, leading to headers) are not a great way of generating high-quality chances. Instead, through balls (because they lead to one-on-ones) and passes from the danger zone (because these leave the defence and goalkeeper out of position for an easy chance) generally result in high-quality chances with little defensive pressure, so a chance assisted by one of these passes is given greater xG. Assists (or shots) following a successful dribble give the chance a higher xG because it is assumed (quite fairly) that there is less defensive pressure on the player, giving them room to get a shot away.
A few others
Individual errors often give sides an easy chance with most of the team out of position. A player rounding the keeper ramps up his shot’s xG. Rebounds often give a free shot to an attacker in a good area, resulting in a higher xG for that chance.
(Game state also affects expected goals, but I want to cover that in the next instalment.)
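To give a feel for how factors like these might be combined into one number, here is a toy sketch using a logistic function, which is one common way of turning a weighted sum into a probability between 0 and 1. Every coefficient below is made up for illustration; a real model would fit them to historical shot data, and the models referred to in this post may work quite differently.

```python
import math

# Hypothetical coefficients, invented for this example only.
COEFFS = {
    "intercept": -1.0,
    "distance_m": -0.12,     # further out means a lower chance
    "angle_deg": 0.03,       # more of the goal visible means a higher chance
    "header": -0.8,          # headers are harder to direct
    "counter_attack": 0.4,   # defence less likely to be set
    "through_ball": 0.6,     # often leads to a one-on-one
    "rebound": 0.5,          # often a free shot in a good area
}

def toy_xg(distance_m, angle_deg, header=False, counter_attack=False,
           through_ball=False, rebound=False):
    """Toy expected goals value for a single shot, between 0 and 1."""
    z = (COEFFS["intercept"]
         + COEFFS["distance_m"] * distance_m
         + COEFFS["angle_deg"] * angle_deg
         + COEFFS["header"] * header
         + COEFFS["counter_attack"] * counter_attack
         + COEFFS["through_ball"] * through_ball
         + COEFFS["rebound"] * rebound)
    return 1 / (1 + math.exp(-z))  # logistic function maps z into (0, 1)

foot = toy_xg(distance_m=11, angle_deg=37)
head = toy_xg(distance_m=11, angle_deg=37, header=True)
print(f"Shot with feet: {foot:.2f}, header from the same spot: {head:.2f}")
```

With these made-up weights, the header comes out lower than the shot with feet from the same spot, and switching on the through ball or rebound flags pushes the value up, matching the direction of each factor described above.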
To conclude, expected goals isn’t perfect, and nobody who works with it would suggest that it is, but it’s by far the best stat we have, and it’s got plenty of things going for it too.
If you’re curious, here are the ‘best’ and ‘worst’ finishers compared to xG over the past few years:
(Credit to Michael Caley for both the xG maps and these two graphs.)
A few really cool xG-related resources from the Twitterati:
Thanks for reading.
If you’ve any questions, you can contact me @OneShortCorner.
You can read the other parts of this series here: