Football Analytics – Part Seven: How (Not) To Use Stats

We need to talk about stats. Or more specifically, how not to use them. The rampant misuse of statistics in football does much to discredit them as a whole, and it’s important to be able to distinguish between the ‘good’ and ‘bad’ ones. I just wanted to jot down some thoughts about the use of stats and the ways in which we can (and should) interpret them better.

WhoScored and Squawka

Let’s start off with two serial criminal offenders, WhoScored and Squawka. Both sites can be hugely useful for looking up players’ and teams’ statistics, so we shouldn’t disregard them entirely, but each has flaws. WhoScored’s oft-retweeted Teams of the Week/Season consistently throw up quite strange selections, often favouring midfielders who make a high number of passes regardless of whether those passes had any impact on the game. The player ratings behind those teams are therefore not necessarily a good indicator of how well someone has played. As the formula WhoScored uses to weight the underlying stats is not public, take them with a pinch of salt.

Squawka’s Comparison Matrix is so often misused (producing monstrosities like the one below) that I thought it worth making a few basic points on how to get better results from it.

[Image: Squawka Comparison Matrix, Townsend vs Ronaldo]

First things first, always adjust your stats to per90. Always. And preferably with a decent sample size. This is a fairly basic concept that people often ignore. Adjusting stats like this makes it easier to compare two players’ outputs, because it strips out the effect of one player simply having played more minutes than the other, which often accounts for higher numbers of raw actions (e.g. passes and shots). However, as with almost everything, context matters. Players who often come on as substitutes are often involved in more goals p90, because goals are more frequent late in games thanks to fatigue and teams throwing the kitchen sink at the opposition (which in turn leaves them open at the back), so it’s important to understand that sub effects exist. This is a good piece on that subject.
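The per90 adjustment is just a raw count divided by minutes played, multiplied by 90. A minimal Python sketch follows; the function name, the 900-minute sample-size floor and the example players are hypothetical, chosen only for illustration.

```python
def per90(count: float, minutes: float, min_minutes: float = 900) -> float:
    """Scale a raw season count to a per-90-minutes rate.

    min_minutes is an illustrative sample-size floor (an assumption,
    not a standard threshold): below it the rate is too noisy to compare.
    """
    if minutes < min_minutes:
        raise ValueError(f"only {minutes} minutes played; sample too small")
    return count * 90 / minutes


# Hypothetical players: A has more raw passes simply because he played more.
passes_a = per90(1800, 2700)  # 60.0 passes per 90
passes_b = per90(1200, 1500)  # 72.0 passes per 90
print(f"A: {passes_a:.1f} p90, B: {passes_b:.1f} p90")
```

On raw totals A looks busier; per 90, B is the more involved passer.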

Using non-penalty goals p90 when evaluating goalscorers (if you don’t have any xG data) is another concept that is generally ignored by the mainstream media. It matters because winning penalties is not generally a repeatable skill, so the goals that come from them are not necessarily indicative of a wider scoring ability. Furthermore, a striker being handed a penalty, a chance scored roughly 80% of the time, doesn’t show any ability to get into good positions, which has been shown to be key to defining a goalscorer’s quality.
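Stripping out penalties is a one-line change to the per90 calculation. A minimal Python sketch with two hypothetical strikers on identical raw tallies; the function name and numbers are illustrative only.

```python
def non_penalty_goals_per90(goals: int, penalty_goals: int, minutes: float) -> float:
    """Goals per 90 with penalties removed.

    Penalty winning is not generally a repeatable skill, so goals from
    the spot say little about wider scoring ability.
    """
    if penalty_goals > goals:
        raise ValueError("penalty goals cannot exceed total goals")
    return (goals - penalty_goals) * 90 / minutes


# Hypothetical strikers: 20 league goals each over 3,000 minutes.
striker_a = non_penalty_goals_per90(goals=20, penalty_goals=8, minutes=3000)
striker_b = non_penalty_goals_per90(goals=20, penalty_goals=0, minutes=3000)
```

Both score 0.6 goals p90 on raw numbers, but striker A drops to 0.36 non-penalty goals p90 once his 8 penalties are removed.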

It’s also important to understand whether the stats you’re comparing players with are indicators of quality or of style, but more on that later.

Defensive Stats Are Different

You often see defenders’ individual stats bandied about like badges of honour after games, but in truth, they don’t really mean much. Defensive stats such as tackles or interceptions made are often a better indicator of a defender’s style than of their actual quality. Mark (@ETNAR_uk) has done a tonne of work on evaluating defenders using statistics, and he hasn’t really been able to find defensive statistics that correlate well with how good he thinks defenders are at preventing shots.

This leads on to another point about individual statistics, which is that they’re often more a description of style than anything else. Take dribbling, for example. If player X dribbles twice as often as player Y, does that make him a better player? Not really. Maybe he’s a better dribbler? Or maybe player Y has a style of play that revolves more around passing than moving with the ball? Even then, you should still try to break these statistics down into percentage success rates to work out which player is the better dribbler. Does player X or player Y bypass more players on average when dribbling? You can almost always go into more detail and context, and it’s important to do so.
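To make the volume-versus-efficiency distinction concrete, here’s a minimal Python sketch with hypothetical players X and Y: X attempts twice as many dribbles, but Y completes a higher share of his. The class and the numbers are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class DribbleStats:
    attempted: int
    completed: int
    minutes: float

    @property
    def attempts_per90(self) -> float:
        # Volume: mostly a description of playing style.
        return self.attempted * 90 / self.minutes

    @property
    def success_rate(self) -> float:
        # Efficiency: closer to a measure of dribbling quality.
        return self.completed / self.attempted if self.attempted else 0.0


# Hypothetical players over the same number of minutes.
x = DribbleStats(attempted=120, completed=60, minutes=2700)
y = DribbleStats(attempted=60, completed=42, minutes=2700)
```

X attempts 4.0 dribbles p90 to Y’s 2.0, but completes 50% of them against Y’s 70%. Neither number alone settles who the better dribbler is; together they separate style from success.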

For individual defensive stats such as tackles and interceptions, it’s also helpful to adjust them for the possession of the player’s team, as Ted Knutson (@mixedknuts) explains in this piece:

“If your team has possession of the ball, you can’t rack up defensive rate stats.  Teams that have a ton of possession don’t give their opponent the ball very often, and thus can’t accumulate defensive stats. What do you do when you know the basic rate stats are meaningless? You adjust them.”

And if you’re interested, the formula that Ted uses to adjust defensive stats for possession is at the bottom of the previously-linked piece.
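Ted’s actual formula is in his piece. The sketch below only illustrates the general idea, and it rests on an assumption: a logistic (S-shaped) multiplier centred on 50% possession, with a hypothetical steepness of k = 0.1. Check the original before relying on it.

```python
import math


def possession_adjust(raw_per90: float, team_possession: float, k: float = 0.1) -> float:
    """Scale a defensive rate stat by the team's share of possession.

    ASSUMPTION: a logistic curve centred on 50% possession with steepness k.
    At 50% the multiplier is exactly 1. A side with more of the ball (so
    fewer chances to defend) gets its defensive numbers scaled up, and a
    side with less of the ball gets them scaled down.
    """
    multiplier = 2 / (1 + math.exp(-k * (team_possession - 50)))
    return raw_per90 * multiplier


# Hypothetical: identical raw tackles + interceptions p90, different possession.
high_poss = possession_adjust(4.0, team_possession=60)  # about 5.85
low_poss = possession_adjust(4.0, team_possession=40)   # about 2.15
```

Under this assumed curve, the same raw 4.0 actions p90 are worth considerably more from a player in a 60%-possession side than from one in a 40%-possession side.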

Team Stats

Team stats have to correlate with winning in order to be relevant in determining teams’ quality; otherwise they’re measures of style rather than anything else. If you take more shots than the opposition, you’ll generally win. If you run more than the opposition, there’s no guarantee you’ll win. As shown in Soccernomics, there’s no correlation between distance run over a season and finishing position (contrary to what Sky Sports’ Distance Covered graphics would have you believe). The sides that run more tend to press more heavily, but that doesn’t in itself make them a better team; it just makes their style different from that of opponents who are more willing to play a low block. It’s always important to ensure your statistics are relevant.
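Checking whether a team stat “correlates with winning” usually means computing a correlation coefficient between that stat and points (or finishing position) across many team-seasons. Here’s a minimal Pearson sketch in Python; the function name is hypothetical, and Python 3.10+ also ships `statistics.correlation`, which does the same job.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples.

    Returns a value in [-1, 1]; near 0 means no linear relationship.
    Raises ZeroDivisionError if either sample has zero variance.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)
```

You’d feed it one value per team-season, e.g. shot ratio or distance covered in `xs` and points in `ys`. A coefficient near 0, which is what Soccernomics reports for distance run, suggests the stat is describing style rather than quality.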

Unsustainability and Sample Size

[Image: Opta tweet on Marcus Rashford]

Stats companies such as Opta often tweet little gobbets like the one above about England’s rising star, Marcus Rashford. It gets them those sweet, sweet numbers, but these stats are often fairly useless, for two reasons. Firstly, the 8 goals from the 14 shots on target Rashford had taken by 21st May is such a small sample that it’s almost certainly unrepresentative of his career finishing ability. Secondly, it is plainly unsustainable. No professional footballer alive can sustain a conversion rate anywhere near (let alone above) 50%, as Rashford has managed here, so this stat clearly doesn’t represent his true ability. In other words, it’s not repeatable, an important subject I touch upon here. Conversion rates have been shown to fluctuate a lot, and the English media for one seem rather taken with stats that are clearly unsustainable. Sure, it’s a fact, and you can’t dispute what the numbers say in this case, but is it helpful in evaluating how good the player is? Not really. For me, there’s a difference between facts and stats. I see facts as little gobbets, like the Rashford tweet above, which have no predictive or evaluative value; stats, by contrast, are much better predictors of the future.
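One way to put a number on the sample-size problem is a binomial confidence interval. The sketch below uses the Wilson score interval, an illustrative choice of technique rather than anything Opta or this post relies on, and it assumes every shot on target is an independent trial with the same scoring probability, which is a simplification.

```python
from math import sqrt


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial proportion.

    Simplifying assumption: each trial (shot on target) is independent
    and shares the same underlying success (scoring) probability.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    margin = z * sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return centre - margin, centre + margin


# The figure from the tweet: 8 goals from 14 shots on target.
low, high = wilson_interval(8, 14)
print(f"observed {8 / 14:.0%}, plausible range {low:.0%} to {high:.0%}")
```

With 8 goals from 14 shots on target, the interval runs from roughly 33% to 79%. That range is far too wide to say anything useful about his long-run finishing.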

Misconceptions?

Despite the weirdly prevalent idea that those who are into stats don’t actually watch football, there’s a lot to be said for applying the ‘eye test’ to football matches to validate statistical theories, although one should also be aware of the inherent biases that come with watching sport.

One of the more frustrating misconceptions surrounding stats is the idea among non-statisticians that one stat or number can unequivocally prove everything, something no self-respecting statistician would ever claim. The idea that football can be reduced to one single number is silly in such a fluid and random sport, and it’s vital to always remember that one stat doesn’t necessarily prove something else to be true. Stats don’t lie; they can only be misinterpreted. I would say that applying context to the stats you find is the most important thing to do when presenting or evaluating them. There’s always an extra level of detail or explanation you can, and should, go into. Nuance is vital in increasing the chance of you producing a valid conclusion.

Thanks for reading.

You can follow me here @OneShortCorner

And you can find the rest of this series below:

Part One: Introduction

Part Two: Shots

Part Three: PDO

Part Four: Expected Goals

Part Five: Game States and Score Effects

Part Six: Resources

Arsenal and the 07/08 Season – What if?

There have been some incredibly tight title races in Premier League history, with notable seasons including 95/96, 01/02, 02/03, 11/12 and most recently, 13/14. But growing up as a young Arsenal fan, there was one title race that gripped my imagination like no other, and that occurred in the 2007/2008 season. This race had the benefit of being between three teams, a feat only really replicated in 01/02 and 13/14, and it was pretty much too close to call for the majority of the season.

In early 2014, Arsenal were top of the league and going strong (stop me if you’ve heard this bit before), winning plaudits in the media. Analysts were much less convinced, however, pointing to pretty ‘meh’ shot numbers and an unsustainable conversion rate, as this Statsbomb piece articulates. Using fairly rudimentary shot data that I collected manually from Statto.com (pre-enlightened era struggles), I wanted to check whether Arsenal’s title challenge in 07/08 was sustainable, and if so, why they didn’t win the league, having been 5 points clear of nearest challengers Manchester United with 12 games to play, albeit with trips to Old Trafford and Stamford Bridge to come.

To August 2007 then, and some pre-season context. Manchester United had fought back from their first Premier League era blip with their first title in 4 seasons in 2007, beating Chelsea into second place. Arsenal had been well off the pace in 06/07, with the Invincibles squad in the midst of being broken up and talismanic captain Thierry Henry missing most of the second half of the season, and they finished 4th, level on points with Liverpool.

Transfer Activity

Here’s a transfer round-up of Arsenal’s, United’s and Chelsea’s summers.

Arsenal:

Arrivals | Departures
Eduardo da Silva (£7.5m) | Thierry Henry (£16m)
Bacary Sagna (£6m) | Jose Antonio Reyes (£8m)
Lassana Diarra (£4m) | Freddie Ljungberg (£2m)
Lukasz Fabianski (£2m) | Jeremie Aliadiere (£2m)
Total spent: £19.5m | Total received: £28m

Seemingly a fairly underwhelming summer for Arsenal, with a lot of attacking talent leaving and only Eduardo coming in on that front. Bacary Sagna would prove invaluable to a side lacking a top-class right back since the departure of Lauren.

Manchester United:

Arrivals | Departures
Owen Hargreaves (£17m) | Kieran Richardson (£5.5m)
Tomasz Kuszczak (£4m) | Giuseppe Rossi (£6m)
Nani (£17.3m) | Alan Smith (£6m)
Anderson (£20.4m) | Gabriel Heinze (£8m)
Carlos Tevez (£9.5m loan fee) | -
Total spent: £68.2m | Total received: £25.5m

A big spending summer from Sir Alex Ferguson saw significant arrivals in midfield and up front, although he wisely chose to stick with the Ferdinand-Vidic axis that had served him so well in 06/07. Tevez was a late arrival in August after the Premier League case involving his registration.

Chelsea:

Arrivals | Departures
Franco Di Santo (£3m) | Glen Johnson (£4m)
Florent Malouda (£13.5m) | Arjen Robben (£25m)
Juliano Belletti (£4m) | Lassana Diarra (£4m)
Tal Ben Haim (free) | -
Claudio Pizarro (free) | -
Total spent: £20.5m | Total received: £33m

A fairly fiscally conservative window by Chelsea’s and Abramovich’s standards, although they would go on to spend close to £25m in the January window on Anelka and Ivanovic. This squad probably had the smallest significant player turnover (Arsenal’s sale of Henry and Tevez’s move to United were to be important moves for their sides), similar to Mourinho’s Chelsea side before the 15/16 season…

Squad Ageing Curves

To create these I looked at each player’s age at the start of the season and placed them in 6 different age brackets. Optimum squad ageing curves see most players between 26 and 30, with a bit of experience on one side of the curve and a bit of youthful vigour on the other. The differences between the teams are pretty interesting.

[Figure: Title challengers’ ageing curves, 07/08]
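A minimal sketch of the bucketing step in Python. The post uses six brackets but only spells out 20-23 (plus 26-30 as the optimum and 30+), so the bracket boundaries below are a hypothetical choice for illustration, not the ones behind the chart.

```python
from bisect import bisect_right
from collections import Counter

# HYPOTHETICAL bracket boundaries: the post names six brackets but only
# spells out 20-23, so these edges are an illustrative assumption.
EDGES = [20, 24, 26, 30, 33]  # lower bound of each bracket after the first
LABELS = ["<20", "20-23", "24-25", "26-29", "30-32", "33+"]


def age_bracket(age: int) -> str:
    """Map an age (at the start of the season) to its bracket label."""
    return LABELS[bisect_right(EDGES, age)]


def ageing_curve(ages: list[int]) -> dict[str, int]:
    """Count squad members per bracket, keeping every bracket in order."""
    counts = Counter(age_bracket(a) for a in ages)
    return {label: counts.get(label, 0) for label in LABELS}
```

Using `bisect_right` means a player exactly on a boundary (e.g. 20) falls into the bracket that starts there, and returning every label, including empty ones, keeps the bars aligned when plotting squads side by side.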

Chelsea and United are pretty similar, with Chelsea maybe possessing a bit more experience. But Arsenal, holy crap. They had one outfield player starting the season aged 30 or above, and that was Gilberto Silva, who only played ~1,200 league minutes. This was an incredibly young side. Look how many players they had in the 20-23 age bracket compared to everyone else (Fabianski, Senderos, Clichy, Diaby, Diarra, Fabregas, Flamini and Adebayor, the latter 3 being hugely important to them). Anyway, I’ll come on to experience later. On with the season.

The First Nine Games

Arsenal started off like a house on fire, going 8-1-0 in their first 9 games, helped by a 26% conversion rate, an 87% save percentage and a pretty easy start (Tottenham away was the only really challenging game), with PDO, on a small sample, a tad high at 111. Nonetheless, for a young side written off in many quarters before the season had started, an SoTR of 0.64 was an impressive start, but one dwarfed by Manchester United’s shooting numbers. Despite a typically slow start to the season (2 draws and a defeat in their first three games), United reeled off 6 wins in a row to stay on Arsenal’s tails, with a whopping SoTR of 0.73 over 9 games. I’d wager it would be pretty difficult to find any team in Premier League history with such a degree of shot volume dominance over 9 games. Their conversion rate was a little on the low side, but Van der Sar was already excelling again, with an early and unsustainable save percentage of 93%. For Chelsea, the start of the season was a bit of a strange one. They started off well, with 3 wins and a draw from their opening fixtures, but their shot numbers never convinced, even early on, and Mourinho was sacked after only six games with a TSR of 0.57. Not elite, but plenty good enough for 3rd/4th, and despite the small sample size, Chelsea’s shot level stayed at around 0.57 for most of the season, as shown below.
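For reference, here’s a minimal Python sketch of the ratios used throughout this piece, assuming the standard definitions: TSR is shots for divided by all shots in a team’s games, SoTR is the same for shots on target, Sc% is goals per shot on target, Sv% is the share of shots on target faced that are saved, and PDO is Sc% + Sv%. The figures in this post are consistent with that PDO definition (e.g. 28.3% + 85.2% = 113.5). The class name and the example numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ShotTotals:
    shots_for: int
    shots_against: int
    sot_for: int        # shots on target taken
    sot_against: int    # shots on target faced
    goals_for: int
    goals_against: int

    @property
    def tsr(self) -> float:
        # Total shots ratio: share of all shots in the team's games.
        return self.shots_for / (self.shots_for + self.shots_against)

    @property
    def sotr(self) -> float:
        # Shots on target ratio.
        return self.sot_for / (self.sot_for + self.sot_against)

    @property
    def scoring_pct(self) -> float:
        # Sc%: percentage of on-target shots that became goals.
        return 100 * self.goals_for / self.sot_for

    @property
    def save_pct(self) -> float:
        # Sv%: percentage of on-target shots faced that were saved.
        # Assumes every goal conceded came from a shot on target.
        return 100 * (self.sot_against - self.goals_against) / self.sot_against

    @property
    def pdo(self) -> float:
        return self.scoring_pct + self.save_pct


# Hypothetical 9-game totals, purely to show the mechanics.
example = ShotTotals(shots_for=160, shots_against=90, sot_for=64,
                     sot_against=36, goals_for=16, goals_against=4)
```

With these made-up totals, TSR and SoTR are both 0.64, Sc% is 25% and Sv% is about 88.9%, giving a PDO of roughly 113.9: the kind of number you’d expect to come back towards 100.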

[Figure: Title challengers’ TSR, 07/08]

It’s not really surprising that United were comfortably the best team by TSR and SoTR with shot monsters like Ronaldo and a young Wayne Rooney in their side. Over the course of the season, their conversion rate was consistently lower than Arsenal’s. Whether that’s a season-long quirk, or the combined result of a Wenger team prioritising shot quality in the days before expected goals and United’s more haphazard shooting approach, I’m not sure. What’s remarkable is how early Chelsea’s TSR stabilised, indicating that caretaker manager Avram Grant didn’t improve the side as much as their improved results would suggest.

Up to Boxing Day

Arsenal’s points per game (PPG) and shot numbers then dropped off slightly, which was understandable given how they had started the season, with draws at Anfield and against United at the Emirates seemingly evidence of this side’s ‘bottle’, whatever that means. Despite their own shot numbers falling a little, United kept pace, leading Arsenal by one point at the halfway point on Boxing Day. Chelsea had been steadily creeping up the table with fairly unremarkable stats compared to the other two (lower shot ratios, save percentage and conversion rate), but found themselves only 7 points off the top with 19 games played.

[Figure: Title challengers’ points, 07/08]

And So To Birmingham

The next 7 games went pretty much perfectly for Arsenal. They won 6 and drew one, picking up 19 out of a possible 21 points and playing some scintillating stuff against Everton, West Ham and Manchester City in particular as the Eduardo and Adebayor partnership started to flourish, whereas United and Chelsea could only pick up 14 and 12 points respectively. Arsenal’s home win over Blackburn left them 5 points clear with 12 games to play, the joint-largest lead any side would have all season, and as far as I can remember there was a mounting sense that the title race was pretty much over, despite the fact that Arsenal still had to travel to both of their closest challengers. Then Arsenal went to Birmingham, and everything changed. Eduardo had his leg broken by Martin Taylor, and although a stunned Arsenal came from behind to lead the game, Gael Clichy gave away an unnecessary last-minute penalty, which James McFadden converted. 2-2. The enduring image of that game will be William Gallas storming off to sit at the far end of the pitch after the penalty was given, and to many, that signified the beginning of the end of Arsenal’s title challenge. To me at the time, the Eduardo injury certainly seemed to affect Arsenal: they looked sluggish and frail in the following matches. But did this translate to the numbers?

First 26 games (Pre-Birmingham)
TSR 0.60
SoTR 0.61
PPG 2.42
PDO 113.5
Sc% 28.3%
Sv% 85.2%

Immediately, the PDO value of 113.5 jumps out at you. Looking at its components, both are a little high; if we were being analytical, we’d predict both to regress just a touch.

Last 12 games (Post-Birmingham)
TSR 0.64
SoTR 0.63
PPG 1.66
PDO 97.6
Sc% 20.4%
Sv% 77.2%

Wow. So Arsenal look like they were an even better shots team in the final third or so of the season, but their conversion rate and save percentage fell off big time. For some context, if you apply Arsenal’s Sc% and Sv% from their first 26 games to their shot data from their last 12 games, they’d have, on average, scored 8 more goals and conceded 5 fewer, which would have given them a projected goal difference of +56, only 2 off United’s eventual +58. But this didn’t happen. So why did Arsenal’s percentages fall off?
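The projection above is simple arithmetic: multiply shots on target taken by the earlier Sc%, and shots on target faced by one minus the earlier Sv%. A minimal Python sketch follows. The post doesn’t list Arsenal’s shots-on-target counts for the run-in, so the counts below are hypothetical placeholders, and the function name is illustrative.

```python
def goals_from_rates(sot_for: int, sot_against: int,
                     sc_pct: float, sv_pct: float) -> tuple[float, float]:
    """Goals scored and conceded implied by a given Sc% and Sv%.

    Both rates are percentages. Assumes every goal comes from a shot on target.
    """
    scored = sot_for * sc_pct / 100
    conceded = sot_against * (100 - sv_pct) / 100
    return scored, conceded


# HYPOTHETICAL shots-on-target counts for the last 12 games (not from the post).
sot_for, sot_against = 100, 60
actual = goals_from_rates(sot_for, sot_against, sc_pct=20.4, sv_pct=77.2)
projected = goals_from_rates(sot_for, sot_against, sc_pct=28.3, sv_pct=85.2)
extra_scored = projected[0] - actual[0]
fewer_conceded = actual[1] - projected[1]
```

With these placeholder counts, the gap works out at about 7.9 extra goals scored and 4.8 fewer conceded. The real calculation needs Arsenal’s actual shots-on-target numbers.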

Part of it is probably down to regression. Sustaining a PDO greater than 110 over a whole season is very unlikely, and Arsenal’s final PDO of 108 is probably about as far north of 100 as a top club can push it through its ability to control the quality of shots taken and faced.

[Figure: PDO distribution]

But part of it is down to score effects as well. In the last 12 games of the season, Arsenal conceded the first goal against Birmingham, Aston Villa, Middlesbrough, Bolton and Liverpool, forcing them to chase the game for long periods and resulting in more speculative shots against a packed defence. These lower-quality shots would obviously be easier to save, and Arsenal’s need to throw men forward would leave them exposed on the counter-attack, giving the opposition higher-quality chances and, ergo, lowering the save percentage for Manuel Almunia/Jens Lehmann. It’s not at all beyond the realms of possibility that a fairly inexperienced team (with an outfield average age of 23) would do this, but without xG data it’s pretty hard to work out to what extent regression, bad luck and game states played a part. I think it’s fair to say that all three (combined with the mental trauma of Eduardo’s leg break) contributed to some extent, especially in the 4 games Arsenal played after Eduardo’s injury (including the one against Birmingham), where they outshot their opponents 69-17 yet contrived to draw all of them.

Those four drawn games, and the four that followed them, saw Arsenal’s lead eradicated and the team fall to third, with defeats at Chelsea and Manchester United the nails in their title challenge’s proverbial coffin. If you’re more interested in Arsenal’s tactics and how they related to the drop-off, this is an excellent piece. It’s quite hard to say Arsenal ‘bottled’ it in these performances per se, but maybe it’s fair, to some small extent, to ascribe the players’ inability to put the ball away in those crucial games to ‘bottling’, although it’s a fairly lazy and insulting term, and one I’d rather avoid.

[Figure: Title challengers’ Sc%, 07/08]

[Figure: Title challengers’ Sv%, 07/08]

The Final Push

With that, it came down to United and Chelsea, as it had for the two preceding seasons and as it would for several years to come, with both hitting their attacking stride as their conversion rates rose. In the 9 league games following Eduardo’s leg break and Arsenal’s implosion, the two sides matched each other, both going 8-1-0 before the gargantuan meeting between them at Stamford Bridge. Chelsea, whose results had been boosted by the January arrival of Nicolas Anelka, prevailed 2-1 thanks to a late Michael Ballack penalty, leaving the two sides neck and neck with two games to play, with United possessing the vastly superior goal difference. Chelsea beat Newcastle but drew with Bolton on the last day, a result made irrelevant by United’s wins over West Ham and Wigan, leaving the table looking like this:

Position | Club | Points | Goal difference
1st | Manchester United | 87 | +58
2nd | Chelsea | 85 | +39
3rd | Arsenal | 83 | +43

It’s pretty rare to have three teams break the 80-point barrier, and the only other season in which I can remember it happening was 13/14, which was made even more impressive by the 4th-placed team, Arsenal (hold your jokes), finishing on 79. Nevertheless, the 07/08 season was remarkable in that there was a sustained three-horse race for pretty much the entire season, with the title decided on the very last day. As an Arsenal fan, supporting a very young team written off by all after selling their all-time top goalscorer in pre-season, I can only look back at that day in Birmingham and think “What if?”

See if you can spot that game on the graph below.

[Figure: Title challengers’ PPG, 07/08]

Club | PPG | GD/G | TSR | Shot diff/G | SoTR | SoT diff/G | PDO | Sc% | Sv% | Best | Worst
MNU | 2.29 | +1.53 | 0.63 | +6.56 | 0.65 | +4.31 | 111 | 22.5% | 88.5% | 8 | 1
CHE | 2.24 | +1.03 | 0.58 | +3.55 | 0.61 | +2.47 | 109 | 24.2% | 85.1% | 0 | 5
ARS | 2.18 | +1.13 | 0.61 | +5.11 | 0.62 | +2.87 | 108 | 25.6% | 82.8% | 1 | 3
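The per-game columns are season totals divided by the 38 league games. A quick Python check using Manchester United’s final figures from the league table above (87 points, +58 goal difference); the helper name is hypothetical.

```python
GAMES = 38  # Premier League season length


def per_game(total: float, games: int = GAMES) -> float:
    """Convert a season total into a per-game figure."""
    return total / games


# Manchester United's final totals from the 07/08 league table.
united_ppg = round(per_game(87), 2)
united_gd_pg = round(per_game(58), 2)
```

These come out at 2.29 PPG and +1.53 goal difference per game, matching the MNU row.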

From this viewpoint, United’s success almost looked inevitable, although that doesn’t really tell the full story of the way the season panned out.

Although they were a good team (as proven by getting to the Champions League Final where they lost to United), Chelsea weren’t really a great team in 07/08. A shot differential per game of +3.6 isn’t much to write home about for a title contender, although in their defence, there were mitigating circumstances. Only five of their outfield players managed to clock more than 2,000 minutes (in a 3,420 minute season), compared to 8 from Arsenal and 9 from United, so it was pretty difficult for Grant to ever get a settled team.

Arsenal, especially when one considers their shot domination during the blip that cost them the league, can be seen to have been a little unlucky to come 3rd, when 2nd, or even 1st at a slight push, would have better reflected their actual ability. In my opinion, what prevented them from winning the league was both a lack of experience, which manifested itself in the post-Birmingham reaction, and the lack of a partner for Adebayor (who played almost 3,000 league minutes over the season) during his dry patch in 2008 after Eduardo broke his leg. Van Persie only played his first 90 minutes in a match against Bolton at the Reebok, after that run of 4 consecutive draws and a defeat at Chelsea.

[Figure: Title challengers’ minutes per age bracket, 07/08]

Nonetheless, Arsenal’s reliance on young players was almost unheard of for a title-challenging team, and there’s no doubt that this team was unbelievably talented and probably should have won things as it matured, particularly in 2009-2011. There’s a strong case for them being Wenger’s best side of the Emirates era, as the only post-2005 Arsenal side to break the 80-point barrier, although next year’s iteration, if recent signings are anything to go by, could give them a run for their money.

If you want cheering up (I know I did after writing this), I’d advise you to watch the first two thirds of this (Arsenal’s 07/08 season review) and try to enjoy the quality of football on display, and remember what could have been.

Thanks for reading.

You can follow me on Twitter @OneShortCorner.

Arsène Wenger.

I’ve always found the hardest part of an article to be the start. I normally only blog about stats or the like on here, but I’ve wanted to write something about Wenger for a while. The man is the single greatest inspiration to me in both life and football, and is far more than just a football manager, yet so little of substance has been written about him, apart from possibly this piece hosted by the Guardian.

Arsène the stranger

Arsène Wenger arrived at Arsenal in 1996 as an unknown quantity in the English game, a foreigner in an introverted land. He came from a successful short stint in Japan, a country he was forced to move to after becoming disillusioned with the (widely suspected) rampant match-fixing in French football, something that had undoubtedly denied him the chance to add several titles to his CV. By his own admission, he was the league’s favourite to be sacked before Christmas, having been handed the unenviable task of reforming a club well known for its tradition, alcoholism and off-the-field scandals. George Graham had left under a cloud following the bung scandal, Tony Adams and Paul Merson had previously confessed to being alcoholics, and Arsenal had finished 12th and 5th in the two preceding seasons.


October 1996: Neither Arsenal nor British football would ever be the same again.

There’s been a plethora of articles devoted to the transformative impact Wenger has had on his club (and to all intents and purposes it is his club) since joining Arsenal: new training regimes, healthier diets, better contracts for ageing players and a more fluid system of football. He was a revolutionary, using rudimentary statistical programs to rate players (an area in which he is still at the forefront today) and picking the crème de la crème of Europe’s young talents, such as Anelka and Vieira, to flourish under his watchful eye at Arsenal. He even resurrected the stuttering careers of players like Overmars and Henry, the latter the most destructive forward in Premier League history.

Arsène the winner

And it worked. Doubles followed in 1998 and 2002, along with FA Cups in 2003 and 2005 and, most impressively of all, the unbeaten season of 2003/04, a feat unlikely ever to be replicated in English football. All the while, Arsène and Arsenal strove to compete with the trophy-winning juggernaut that was Sir Alex Ferguson’s Manchester United, resulting in one of the finest sporting rivalries of all time. Like heavyweight boxers, the pair slugged it out for 7 seasons between 1997/98 and 2003/04, with United winning 4 titles to Arsenal’s 3. Champions League football came to Highbury (or Wembley for a short time) for the first time in Arsenal’s history, and although it took Wenger’s sides a few years to get the hang of it, they soon became perennial fixtures in the knockout stages: little old Arsenal, a club with virtually no European pedigree and a stadium of 38,500, competing with monsters of the European stage such as Inter, Juventus, Barcelona and Real Madrid, in the most successful period in their history.


Invincible.

Wenger was so close to winning the European trophies he so craves, with penalties separating Arsenal from the 2000 UEFA Cup, and everything going wrong on that fateful evening in Paris in May 2006, Arsenal’s first (and only) Champions League Final. Arriving late to the stadium, Lehmann was promptly sent off against a Ronaldinho-inspired Barcelona. Yet Arsenal went ahead, clinging to their record (which still stands) of the longest run in Champions League history without conceding a goal, only being undone by an offside Eto’o goal and Belletti’s winner 10 minutes from time.

Arsène the builder


The most ambitious project in the club’s history.

No matter. Arsène and Arsenal had more important things to think about. The club was about to enter the most important phase of its 120-year history, moving into the 60,000-capacity Emirates Stadium. This was a colossal project, the culmination of years of planning on Wenger’s and the board’s part, as Arsenal’s visionary manager attempted to move his club permanently into the European elite. To finance the construction of this £390m stadium, record 8-year deals were signed with Nike and Fly Emirates, to the tune of £8m a year. The problem was that within a few years, the hyperinflation of English football, kick-started by the arrival of Roman Abramovich across London, made these deals look small next to those being signed by Arsenal’s rivals Chelsea, Manchester United and Liverpool, and Arsenal were left behind commercially. Arsenal’s stadium-induced austerity was so severe at points that they were reportedly almost unable to pay their players. Yet Wenger stayed throughout that period, rejecting offers from Europe’s most illustrious clubs: Barcelona, Real Madrid, PSG and Bayern Munich all came for him, and all were rebuffed by a man who has given his best years to the club of his life.


The arrival of Sheikh Mansour at Manchester City in 2008 and Harry Redknapp’s improving Tottenham team shifted an already uncertain landscape as far as Arsenal’s position in England was concerned. All five of those teams (United, Chelsea, City, Tottenham and Liverpool) could spend more than Arsenal, and they did. Arsenal’s net spend in the period 2004/05-2012/13 (the austerity years) was -£31.8m. In contrast, Chelsea and City’s combined net spend in that period was over £700m. And yet Arsenal remained at the top table in Europe, qualifying for the Champions League every season despite these restrictions, possibly Wenger’s greatest ever achievement in terms of difficulty. They have continued to do so since, which is even more impressive when one considers Liverpool’s and United’s decline, as well as Chelsea’s year in the wilderness of 6th in 2011/12, not to mention their implosion in the 2015/16 season.

During the years of admirable consistency, the trophies stopped. An incredibly talented youth team was beaten in the 2007 Carling Cup Final, Eduardo’s broken leg derailed a title challenge seemingly heading for a procession in 2008, and semi-final defeats to United and Chelsea put Arsenal out of two competitions in less than a week in 2009, as had happened 5 years previously. Yet Wenger stuck to his principles, signing and developing young players in the hope that his financially restricted club could fight with the big boys, with Fabregas, Nasri, van Persie, Adebayor et al flourishing, creating countless memorable moments by weaving a rich tapestry on the Emirates carpet. Yet all moved on to pastures new, lured by the promise of financial rewards and trophies.

Following the disastrous 2010/11 season, in which Arsenal failed to win any of the four competitions they were still realistically competing for in January, everything seemed to fall apart at Arsenal. The club had easily absorbed the loss of Adebayor and Toure to Manchester City in 2009, but the departure of Fabregas to his native Barcelona marked the end of Wenger’s ‘Project Youth’ and of his trust in younger players, as his transfer policy since testifies. There’s no denying that the move hurt Wenger, a father figure to Fabregas; one need only look at how much more haggard his face became between the optimism-filled months of January and February 2011 and the traumatic summer of 2011, which ended in jeers at Old Trafford, where Arsenal were thrashed 8-2. Nasri had left too, tempted by Manchester City’s lucrative wages, and Arsenal were a husk of the team which had beaten Barcelona in February of the same year. Yet the team still managed to sneak into the Champions League thanks to a remarkable season from Robin van Persie, finishing one place higher than they had the previous year. Van Persie promptly repaid Wenger’s dedication to backing his injury-prone self since 2004 by leaving for Manchester United, forcing Wenger to rebuild – again. That 2011/12 season was followed by another 4th-place finish in 2012/13, as Wenger kept up his perfect record, which still stands, of never having underperformed his club’s wage bill (the best financial predictor of finishing positions) and of never having finished outside the top 4.

The summer of 2013 felt different, however. And so it proved to be.

Arsène the restorer

wenger ozil.jpg

A new era.

Arsenal’s CEO, Ivan Gazidis, had boldly claimed that the club could spend big on players, with the club seemingly assured of new commercial deals which would kick in in the 14/15 season. Arsène tried, only to have Higuain snatched from under his nose by Napoli and to see Liverpool refuse to sell Suarez despite Arsenal activating his release clause. The new season began with Mathieu Flamini and Yaya Sanogo the only signings and with rumblings of discontent from fans convinced they had been duped by the club, rumblings which became a roar after an opening-day defeat to Aston Villa. Then it happened. With less than an hour to go until the end of the transfer window, Arsenal signed Mesut Ozil, the bona fide world-class player the club had been crying out for in order to re-establish themselves as a serious contender at home and in Europe, a contemporary signing to rival the impact of Dennis Bergkamp in 1995. The fee was almost three times their previous transfer record, and it obliterated a few myths about Wenger’s frugality as well. Inspired by Ozil, Arsenal stormed to the top of the table in the first half of the season, but heavy defeats to title rivals set their challenge back, and Arsenal finished 4th, 7 points behind winners City. Despite this, a first trophy in nine years was won on a gloriously sunny day at Wembley, and clear progress had been made, as Arsène began the process of re-establishing Arsenal at the summit of English football.

wenger avi 110.jpg

Nine years of hurt, over.

£100m was spent the next season on new players, in part thanks to new commercial deals signed with PUMA and Fly Emirates, and Arsenal improved again, finishing strongly in 3rd, having seemingly overcome their big-game block (showing that Arsène actually does do tactics), and retaining the FA Cup to make them the most successful team in the competition’s history and Wenger its most successful manager, both magnificent achievements. However, despite the on-the-pitch progress, there had been increasingly unsavoury incidents at games that season. Wenger had been booed in public after a defeat at Stoke in December 2014, as fans’ frustrations boiled over at Arsenal’s failure to compete for trophies, even though they would win 4 (including Community Shields) between May 2014 and August 2015, more than any other club in England in that period, hinting at deeper divisions in the club that trophies seemingly couldn’t cure.

wenger avi 151.jpg

“A big club must always have the responsibility of winning with style and class.”

2015/16 was seen by many as the culmination of Wenger’s post-2013 rebuilding of Arsenal. Petr Cech was added to the squad, providing the much-vaunted, clichéd qualities of ‘experience’ and ‘know-how’. Everything seemed to be set for another title for Arsenal’s most successful manager, especially with Mourinho’s Chelsea imploding and Manchester City struggling defensively. For the first half of the season, everything went to plan. Arsenal were well in contention for the league and qualified for the latter stages of the Champions League yet again. Then everything fell apart. Leicester, 5,000-1 outsiders, powered clear of the title-chasing pack as Arsenal, having kept pace until February, faltered, hampered by injuries to key players such as Cazorla and Alexis, and fell behind rivals Tottenham for the first time under Wenger. There was no refuge in the FA Cup either, with Arsenal dumped out in the quarter-final by Watford, and fans’ fury boiled over, resulting in fights outside the Emirates, banners calling for Wenger’s removal and a tempestuous atmosphere at matches, leaving Wenger’s standing among Arsenal fans lower than ever before, with a summer of discontent seemingly on the horizon. A late rally, combined with Newcastle’s trouncing of Spurs on the final day of the season, saw Arsenal finish in second place, their highest finish since 2005, but one hardly likely to appease a section of the fanbase, who protested against the way the club was being run in a home match against Norwich, only to be drowned out by the overwhelmingly pro-Wenger crowd. Further evidence of Arsenal’s newfound financial strength was seen in the summer of 2016, with £96m spent on new signings, and although the club entered potentially the final season of Wenger’s glittering career with a squad blessed with depth in a way the sides of the austerity years could only dream of, the scepticism surrounding the manager’s 12-year wait for a title was greater than ever.

Despite this, the numbers of Wenger’s reign are astonishing. He’s won more games than any other Arsenal manager, he’s won more trophies than any other Arsenal manager, and he has the best win percentage of any Arsenal manager, so much so that he could lose his next 100 games and still hold that record. He was the first foreign manager to win the Premier League, and his innovations in training and diet, now widely accepted, paved the way for future winners such as Mourinho, Pellegrini and Mancini, as well as the current crop of star managers: Guardiola, Conte, Klopp and Pochettino. They all owe him a lot. As do we all.

Arsène the philosopher

Albert Einstein once (allegedly) said that ‘insanity is doing the same thing over and over again but expecting different results.’ Gabriel Clarke, in an interview for ITV, on the eve of the 2014 FA Cup Final, put that quote to Arsène as a criticism of the perceived way in which Arsenal had fallen short in terms of winning trophies in the previous nine years. Wenger responded passionately, saying ‘without strong beliefs, you go nowhere in life.’ Who to believe? Who does more in the world? Those who consistently change their beliefs and approaches, or those whose single-mindedness and willpower enact great change? History is littered with great men in both camps, as is football.

wenger avi 201.jpg

“The unhappiness of man comes when he finds himself alone to fight against the problems he must face.”

We live in a time unique in history, where capitalism empowers people to aspire to (and satisfy) the cravings of man, and social media and the internet create a web of immediacy, and therefore impatience, that touches all who come into contact with them, acting as a mouthpiece for millions. The creation of games like FIFA and Football Manager plays a part too; now you, little humble you, can take over and control the running of a multi-million pound organisation without fear of the consequences of failure. After all, you only need to turn your computer off. Humanity’s inherent nature is to desire more and more: bigger, better, newer, flashier. And this is reflected in today’s society more than in any other.

Arsène is the exception to that in today’s whirlwind of football management. The boy from Duttlenheim, born in a different era, with one foot in post-war France and the other in post-war West Germany, growing up in the ruins of 1950s Europe. The young man who grew up watching faded legends of a game from a different time: Netzer, Platini, Baggio, Eusébio. An intellectual, as capable of discussing politics with a room of enamoured journalists as he is of talking football tactics. A manager, fluent in six languages, who fought against corruption in France, establishing Monaco as a domestic powerhouse in his reign, a man who revolutionised football in Japan, and a man whose spell at Arsenal transcends the movement of the world outside N5.

wenger avi 124.jpg

“The philosophical definition of happiness is a match between what you want and what you have.”

Maybe it’s because of his upbringing in his parents’ pub, watching the effects of addiction on those surrounding him, maybe it’s because of his different way of viewing the world, but Arsène has always had a rather ambivalent attitude to winning. Not that he doesn’t want to of course, but he understands that there is something more.

How can there not be? If you don’t win every time you play, do you fail? Does your success solely depend on whether the ball hits the woodwork and goes in, or spins tantalisingly across the goal-line? Luck matters more in football than in most other sports, and the best team often doesn’t win every trophy. Even if you do win every trophy, as Real Madrid, Barcelona and Bayern Munich have seemingly done at will in recent years, are the fans still happy? Or will they always call for more, criticising managers such as Guardiola for not winning the Champions League at Bayern Munich when they could not possibly have bettered the final year of Heynckes’ tenure? We live in a strange period in football where certain clubs and managers have a monopoly on success, which seemingly demeans each individual trophy as ‘inadequate’ unless it is consistently supplemented by more and more, a never-ending cycle of greed and entitlement. Trophies are nice: you get a shiny piece of silverware and a few days of celebration, with admiration lavished from all sides, before people forget, and past winners disappear into the history books.

wenger avi 203.jpg

“I would like to get more out of my career from the human side of it than from the medal side of it.”

Arsène understands that adage of Sócrates, the great Brazilian midfielder, who said that ‘Beauty comes first. Victory is secondary. What matters is joy.’ Wenger himself is quoted as saying ‘I want a fan to wake up in the morning and say, “Arsenal are playing today, I’m going to have a good time.” That guy starts his day off by thinking that something good is going to happen to him.’ To him, the way in which you win is much more important than the fact that you win, and I never used to understand this non-materialistic way of thinking. But I’ve been converted.

 

wenger-fabregas-1

“You can imagine though, that plenty of people have talent in life but they do not meet someone who gives them a chance. Why are they not there? Because no one has given them a chance. So in life it’s important to meet someone who will give you a chance, and when I can do this in football, I do it.”

I’ve been very lucky in my life to have gone to many Arsenal games, and the moments of greatest satisfaction haven’t necessarily come from winning trophies. There’s Jack Wilshere’s team goal against Norwich, Andrey Arshavin’s winner against Barcelona and Mesut Ozil’s deft flick to assist Olivier Giroud against Aston Villa, moments which left me with a smile on my face and made the entrance fee well worth paying for that split second.

wenger avi 54.jpg

“Daily life can drag you down, football can be a fantastic experience for some people.”

Arsène lives and loves these moments just as much as all of you, pumping his fists in the air in his trademark celebration, grinning as he prepares to sign Mesut Ozil or to hug Fabregas, Henry, van Persie and Gallas in celebration, evidence of the tremendous bond he has with his players. He understands that what is left behind is more important than what went before, that legacy trumps all. We see examples of this all around us today, where the fall of mighty, seemingly invincible empires has left behind pyramids in Egypt, temples at Palmyra, walls in England, palaces in Vienna and a world full of inequality and greed. Arsène wants to leave a positive impact, explaining, ‘I would like to go down in history as somebody who tried genuinely to help the club make a step forward,’ and his sense of legacy is impeccable, encompassing both the style of football Arsenal are recognised for and the Emirates Stadium itself, resulting in a football legacy at a single club that only Johan Cruyff can match.

wenger-avi-176

‘I still believe, during all of my stay here, I have done the best job between 2007 and 2014. The trophyless years, yes, they can never be completely appreciated, and I understand that.’

 

Arsène understands the temporary nature of life, that life should be based around our attempts to make ‘each day as beautiful as we can’ and that ‘the only way to deal with death is to transform everything that precedes it into art’. His desire to play beautiful football, whilst forming a large part of his overwhelmingly positive legacy at Arsenal, is not born out of stubbornness, but rather out of pragmatism, an alternative to the idea of winning above all else, a temporary release from the constraints of our society that we can witness at 3 o’clock every Saturday.

wenger-avi-256

“Football is there to provoke moments of happiness, excitement and positive experiences in people, no matter where they come from, what colour skin they have, what religion they are or what their preferred sexuality is.”

Arsène’s connection to Arsenal is total, even extending to his name, and he’s all I’ve ever been fortunate enough to know as an Arsenal supporter, from the hazy successful days of the early 2000s to the barren years of beautiful football in our beautiful new home, followed by the return to former glories, all overseen by one man, beholden to the idea of purity and principles on the pitch. But outside of football stadia is where we must spend most of our lives, and I find it fitting that it’s a man from football, that strange sport defined by short-termism, where the average life expectancy of a top-flight manager is just 13 months, who’s taught me so much about perspective in life. Here’s to the next 20 years, Arsène. We shall never see his like again.

wenger avi 205.jpg

“I grew up with the game, and I think I will die like that.”

 

Football Analytics – Part Six: Resources

In case you missed it, I recently wrote about game states and score effects.

I was asked a few weeks ago which resources I use to look up, collect and present my stats, and I thought it would be a good idea to provide a list of the sites and programs I use. Something similar has already been done by Tom (@Worville), and you can look at his list here; there are a fair few similarities.

Expected Goals Dashboard

xG dashboard.png

This brilliant resource was made by Paul Riley (@footballfactman). It allows you to look at each shot on target taken by a player this season, as well as the outcome and the shot’s xG value. You can easily find teams’ xG data, and there are goodies in the form of Google Documents which have both goalkeepers’ and outfield players’ xG data at the bottom of the page. Regularly updated, it’s an absolute treasure trove of information, and my only minor qualm with it is that it doesn’t include all shots, only shots on target. His Tableau profile includes links to an ‘Expected Assists’ dashboard and last year’s xG dashboard. If you want to do some historical research, Paul also has some xG data from the 10/11 season onwards here.

Advanced Statistics Page

Caley data.png

Michael Caley, godfather of expected goals on Twitter, has a couple of pages covering teams’ attacking and defensive statistics. The linked page (here) contains information on the type of attacks PL sides prefer, the number of attacks they attempt, as well as xG information for each team. If European football is more your thing, a similar page can be found here for La Liga, Bundesliga and Serie A.

Pitch Tracker

Pitch Tracker.png

Want to track any type of event manually on a football pitch? John Murdoch’s pitch tracker was set up to help capture shot location, but you could use it with pretty much anything, from tackles to crosses.

Expected Goals Calculator

xG calculator.png

A nice simple xG calculator can be found here from Ben (@Torvaney), something which is sure to be updated and improved. Watch this space.

Expected Goals Simulator

xG simulator.png

Danny Page wrote this a while ago on expected goals. His xG simulator calculates the probability of each result based on the xG values of the shots taken in the game.
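I can’t vouch for how Danny’s simulator is built internally, but the general idea can be sketched in a few lines of Python. This is a minimal Monte Carlo sketch, assuming every shot is an independent chance that goes in with probability equal to its xG. The function name `simulate_match` and the shot values are hypothetical, chosen only for illustration.

```python
import random


def simulate_match(home_xg, away_xg, n_sims=10_000, seed=1):
    """Monte Carlo estimate of (home win, draw, away win) probabilities.

    Each shot is treated as an independent trial that becomes a goal with
    probability equal to its xG value.
    """
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    home_wins = draws = away_wins = 0
    for _ in range(n_sims):
        # rng.random() is uniform on [0, 1), so a 0.3 xG shot scores ~30% of the time
        home_goals = sum(rng.random() < xg for xg in home_xg)
        away_goals = sum(rng.random() < xg for xg in away_xg)
        if home_goals > away_goals:
            home_wins += 1
        elif home_goals == away_goals:
            draws += 1
        else:
            away_wins += 1
    return home_wins / n_sims, draws / n_sims, away_wins / n_sims


# Hypothetical shot lists (xG per shot), not taken from a real match.
home_shots = [0.76, 0.35, 0.12, 0.08, 0.05]
away_shots = [0.30, 0.10, 0.04]
print(simulate_match(home_shots, away_shots))
```

More simulations give a more stable estimate at the cost of run time. The independence assumption ignores the way one goal changes how both sides play for the rest of the match.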

Shots, PDO and everything else

ObjectiveFooty.png

Expected goals data can be difficult, if not impossible, to find at times, so often we humble analytics folk without access to it must use shot data. Luckily for us, there’s a tonne of it available. ObjectiveFooty (site here) has everything you’ll need on PDO, shot ratios and score effects, whilst FootyInTheClouds (site here) has plenty of shot data too, including individual players’ shot contributions to their team. Furthermore, FootyInTheClouds has a cool feature which allows you to look at a side’s rolling PDO and its elements, as well as other metrics, as shown below:

footyintheclouds.png

If you’re looking for shot data over a long period of time, this site is perfect for you. Clicking on any country in the ‘Odds and Results’ section on the page’s right hand side will take you to a page of spreadsheets with shot data from every league game in that country over recent seasons (Premier League shot data goes back to the 2000/2001 season) and in several tiers of that country’s domestic football.

Footy Data.png

Finding Other People’s Work

This is not as difficult as you’d imagine. Tom has made a bot (@FanalyticsBlogs) which automatically tweets the newest pieces as soon as they’re published, so you can find all the most recent articles concerning analytics.

Programs

Tableau.png

Currently, I have still not progressed beyond the computer-based purgatory that is Microsoft Excel (and Tableau). Excel is okay, and you can, with a little care and devotion, make okay-looking data visualisations. Tableau is a pretty cool free program, perhaps a little difficult to get the hang of, but perfect for interactive data visualisations.

A level up in terms of complexity are programming languages such as R and Python, and whilst I haven’t learned to use either so far, it’s definitely something I want to look into.

Saving Articles

I used to bookmark articles that I’d enjoyed reading, but this led to an absolute mess in my bookmarks. Instead, I now have a Word document with different sections in which I put the relevant articles, keeping them all in one place where I can refer back to them if I ever need to look up any specific statistic or theory. It’s something I’d encourage you to do as well if you enjoy reading and saving articles.

Thanks for reading, I hope you find this helpful. If you’ve got any questions, don’t hesitate to contact me @OneShortCorner.

The rest of my ‘Analytics for Beginners’ series can be found here:

Part One: Introduction

Part Two: Shots

Part Three: PDO

Part Four: Expected Goals

Part Five: Game States and Score Effects

Football Analytics – Part Five: Game States and Score Effects

Last time in this series I covered expected goals (you can read about it here), briefly mentioning game state, but only to say that it would be covered in the future. Well, luckily for you lot, this is that time.

As with most things concerning football analytics, the terminology is more confusing than the concept itself. Game state refers, surprise, surprise, to the state of the game in which two teams are playing. To start off simplistically, there are three generic game states that a team can find itself in: winning, drawing or losing.

We can break down these game states further into categories determined by the match’s goal difference for greater detail. So a side that is 3-1 up will be in a game state of +2, having scored two more goals than the opposition.
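Tallying how long a side spends at each game state is simple bookkeeping once you know when the goals went in. The Python below is a minimal sketch that assumes we only have the minute of each goal and which side scored it. The function name and input format are hypothetical, and stoppage time is ignored.

```python
def minutes_by_game_state(goal_events, match_length=90):
    """Minutes one team spends at each game state during a match.

    goal_events: (minute, scored_by_us) pairs, e.g. (23, True) for our goal
    in the 23rd minute. Game state = our goals minus their goals.
    Stoppage time is ignored for simplicity.
    """
    minutes = {}
    state, last_minute = 0, 0
    for minute, scored_by_us in sorted(goal_events):
        # credit the time since the last goal to the state we were in
        minutes[state] = minutes.get(state, 0) + (minute - last_minute)
        state += 1 if scored_by_us else -1
        last_minute = minute
    # whatever state the match ends in lasts until the final whistle
    minutes[state] = minutes.get(state, 0) + (match_length - last_minute)
    return minutes


# Hypothetical match: we score on 20', concede on 55', score again on 80'.
print(minutes_by_game_state([(20, True), (55, False), (80, True)]))  # {0: 45, 1: 45}
```

Summed over a season, per-match totals like these give the minutes a side has spent leading, level or trailing.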

Why does this concept matter?

Game state matters because the way teams respond to changing game states creates score effects. Any score effects outside of the game states -2 to +2 (so anything except -2, -1, 0, +1, +2) should be taken with a pinch of salt, due to small sample size.

Firstly, the following score effects are not necessarily applicable to every club and situation; they’re general tendencies based on several years of theory and testing. Elite clubs, for example, are often able to sustain similar shot levels at any game state.

One seemingly obvious effect is that a side which is a goal down is more likely to take more shots. If you haven’t read about either TSR or SoTR, my piece on them is here.

TSR

TSR game effects

The above graph was made by the excellent @11tegen11, whose piece on game states you can read here. Sides that are in game states of -2 or -1 tend to outshoot their opposition, presumably because they need to get back into the game. The result is more shots against an opposition that has no need to chase the game and is more likely to be defending in a good defensive structure.

Sides which are 3 or more goals down tend to get heavily outshot, normally because the 3-goal deficit is indicative of their inferior strength compared to the opposition, who can easily dominate them.
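Splitting TSR by game state only requires tagging each shot with the score at the moment it was taken. Recall that TSR = shots for / (shots for + shots against). The Python below is a minimal sketch that assumes shots arrive as hypothetical `(game_state, taken_by_us)` pairs. The example counts are invented and are not 11tegen11’s data.

```python
from collections import defaultdict


def tsr_by_game_state(shots):
    """Total Shots Ratio for each game state.

    shots: iterable of (game_state, taken_by_us) pairs, where game_state is
    our goal difference at the moment the shot was taken.
    """
    shots_for = defaultdict(int)
    shots_against = defaultdict(int)
    for state, taken_by_us in shots:
        if taken_by_us:
            shots_for[state] += 1
        else:
            shots_against[state] += 1
    states = sorted(set(shots_for) | set(shots_against))
    # TSR = shots for / (shots for + shots against), per game state
    return {s: shots_for[s] / (shots_for[s] + shots_against[s]) for s in states}


# Invented counts: level 6-4, one down 7-3, one up 2-6 in shots.
example_shots = ([(0, True)] * 6 + [(0, False)] * 4
                 + [(-1, True)] * 7 + [(-1, False)] * 3
                 + [(1, True)] * 2 + [(1, False)] * 6)
print(tsr_by_game_state(example_shots))  # {-1: 0.7, 0: 0.6, 1: 0.25}
```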

SoTR

It’s a fairly similar story with SoTR (graph again from 11Tegen). But why do teams 2 goals down get fewer shots on target than their opponents, when they appear to get off more shots?

SOTR game state.png

In theory, being behind should lead to lower shot quality for the trailing team: with more players behind the ball, the leading team should be able to deny the opposition space to get shots away. It would also force the trailing team to take more hopeful efforts from further out and from poor angles if they were unable to work the ball into good shooting positions because of the defence. The winning side would probably also be able to create better-quality chances on the counter-attack, as the opposition would likely throw more men forward, allowing them to take shots from better angles which have a higher chance of going on target.

Expected Goals

Indeed, Michael Caley mentions here that game state is significant for the expected goals value of regular shots (not headers, shots assisted by crosses or set-plays). Although he admits the effect is small, he attributes it to the ‘still unaccounted-for slight differences in defensive pressure applied by teams trailing or leading a match,’ which makes sense when you remember that we don’t have off-the-ball tracking data to measure pressure.

Basically, if you’re winning, the quality of your chances by xG will be slightly higher than at a neutral/negative game state, because it’s presumed there will be less defensive pressure on the shooter.

Conversion rate

sc% and game state.png

This graph illustrates the importance of game state in conversion rate. Conversion rate’s kind of a big deal, and, given the discussion above, it’s not surprising to see sides in the lead score a higher % of their shots. Why? Because the trailing opposition is likely to throw men forward, leaving themselves exposed to counter-attacks which can more easily create high-quality chances.
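Conversion rate by game state works the same way: group shots by the game state in which they were taken, then divide goals by shots. The Python below is a minimal sketch with invented counts. It assumes all shots as the denominator, although some analysts use shots on target instead, and the function name is hypothetical.

```python
def conversion_by_game_state(shots):
    """Conversion % (goals / shots * 100) for each game state.

    shots: iterable of (game_state, is_goal) pairs.
    """
    totals = {}  # game_state -> (shots, goals)
    for state, is_goal in shots:
        n_shots, n_goals = totals.get(state, (0, 0))
        totals[state] = (n_shots + 1, n_goals + int(is_goal))
    return {s: 100 * goals / n for s, (n, goals) in sorted(totals.items())}


# Invented counts: 2 goals from 10 shots when leading, 2 from 20 when level.
example_shots = ([(1, True)] * 2 + [(1, False)] * 8
                 + [(0, True)] * 2 + [(0, False)] * 18)
print(conversion_by_game_state(example_shots))  # {0: 10.0, 1: 20.0}
```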

PDO

Another graph now (sorry), this one from this StatsBomb.com piece by Ben Pugsley, who has done a ton of work on game states and runs this excellent site, which has this cool page detailing all of the shenanigans that take place at different game states.

PDO game state.png

I spoke about PDO in Part 3 of this series, and it’s amazed me to see Leicester continue to hold a PDO of over 110 throughout this season, even though this ‘should’ be unsustainable. But when you consider that Leicester have only trailed for 361 minutes this term and have been at a +1 game state for longer than anyone else, and think about what the above graphs have shown, a portion of their ‘over-performance’ in PDO can be explained. Because their opponents are behind for long periods, they will be taking more poor-quality shots against the Leicester defensive structure, allowing Leicester to rack up a high save % and a high conversion % when utilising their counter-attacks.

Context is king

Score effects can make a team look better or worse than it actually is. Liverpool have been one of the strongest shot teams in the league this season, but they’ve only been leading for 23.8 minutes per game, which necessitates greater shot volume, so maybe they’re not as good as TSR says? In contrast, Leicester, who’ve spent only 10.9 minutes per 90 trailing this season, might not need to take as many shots as Liverpool, making them look like a weaker team by metrics such as TSR or SoTR.

Penalties?!

For those who don’t follow them (and you should) @Stats4Footy has done several articles on penalties in football, including this one, which looks at penalty success rates at different game states. It threw up this graph:

penalties game state.png

Is this evidence of pressure on the shoulders of a penalty taker? I don’t know, but I thought it an intriguing note on which to end this piece.

Thanks very much for reading. If you’ve got any questions, don’t hesitate to contact me @OneShortCorner.

If you want to read more about game states, I did something a bit more specific on Arsenal and Manchester City back in December.

Football Analytics – Part Four: Expected Goals

Last week, we asked this question:

OSC poll

If you’ve ever delved into Football Analytics Twitter, you’ll probably have come across the term ‘expected goals’. For many analytics folks, it’s the holy grail. But what is it?

Let’s go back to school: simple maths and probability.

Each shot taken by a player has a probability of going in. This probability is expressed as a number between 0 (no chance of a goal being scored) and 1 (a goal is certain). The probability of a given shot being scored is its expected goals value (or XPG/xG if you prefer abbreviations).

Many factors go into working out each shot’s probability of going in, for example the player’s location on the pitch when taking the shot.

Over the course of a match, each team’s expected goals can be added up, and very clever people on Twitter can produce expected goals maps like these:

xG map

(This xG map is provided by Michael Caley, who you can (and should) follow here.)

You’re probably thinking “What the hell is this?” Let me explain.

Each square on the pitch is a shot, with Verona’s shot locations on the right and Napoli’s on the left. The size of the square corresponds to the quality (expected goals) of the chance. Goals are in pink. The xG sum of all of Napoli’s shots was 4.7 and Verona’s was 0.4, clearly indicating that Napoli dominated the game, both in terms of shot numbers and in terms of shot quality.

Why do I like expected goals?

Unlike most mainstream statistics (shots, shots on target etc.), expected goals directly measures shot quality and, if totalled over the course of a match, both the quality and quantity of shots. As a general rule of thumb, it makes sense to assume that a side that is able to create more/better chances than the opposition has played better and ‘deserves’ to win. Sure, statistical anomalies happen, and sides with an xG of 0.5 over a match can beat, and have beaten, sides with 2.7 xG, but (very generally) a higher xG than your opponent means that you ‘should’ win the game. On this topic, this from Danny Page is an invaluable resource.
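As a rough illustration of why a 0.5 xG side can still beat a 2.7 xG side, the outcome probabilities can be calculated exactly rather than simulated. The Python below is a minimal sketch. It assumes shots are independent and that each goes in with probability equal to its xG, and the shot values are hypothetical.

```python
def goal_distribution(shot_xg):
    """P(exactly k goals) for k = 0..len(shot_xg), shots treated as independent."""
    dist = [1.0]  # before any shots, P(0 goals) = 1
    for p in shot_xg:
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * (1 - p)  # this shot misses
            new[k + 1] += prob * p    # this shot goes in
        dist = new
    return dist


def outcome_probabilities(shots_a, shots_b):
    """Exact (A win, draw, B win) probabilities from two lists of shot xG."""
    dist_a, dist_b = goal_distribution(shots_a), goal_distribution(shots_b)
    win_a = draw = win_b = 0.0
    for goals_a, p_a in enumerate(dist_a):
        for goals_b, p_b in enumerate(dist_b):
            if goals_a > goals_b:
                win_a += p_a * p_b
            elif goals_a == goals_b:
                draw += p_a * p_b
            else:
                win_b += p_a * p_b
    return win_a, draw, win_b


# Hypothetical: one 0.5 xG chance against seven chances totalling 2.7 xG.
underdog = [0.5]
favourite = [0.6, 0.5, 0.4, 0.4, 0.3, 0.3, 0.2]
win, draw, loss = outcome_probabilities(underdog, favourite)
print(f"Underdog win {win:.1%}, draw {draw:.1%}, favourite win {loss:.1%}")
```

The underdog’s chance of winning is small but not zero, which is exactly the kind of anomaly the paragraph above describes.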

I find xG maps an invaluable tool for gauging how well teams played, xG as a whole helpful for challenging media narratives about how well a team is playing, and the concept vital for thinking more about shot quality, a topic not covered enough in mainstream media analysis, in my opinion.

Expected goals can be also useful when applied to individual players.

Player Z’s xG over a season: 10.1

Player Z’s goals over a season: 18

Now, each shot’s xG is only the average probability of such a shot being scored across all players, so there are some exceptional finishers, such as Messi and Podolski, who are able to outperform their expected goals year after year. But what’s interesting about expected goals is how rarely players consistently overperform their expected goals to a large extent. We can predict with some degree of confidence that players who are largely outperforming their individual expected goals (see Player Z above) will regress to the mean over time. Expected goals can help you gauge how good a player’s finishing is (whether he’s consistently over- or underperformed his xG, or been a G = xG finisher throughout his career), but, more importantly, how good he is at getting into good positions. Arsene Wenger has described finishing as ‘cyclical’, and while what he’s talking about refers more to regression than anything else, he’s right.
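To put Player Z’s numbers in context, we can ask how often an average finisher with 10.1 xG would score 18 or more. The Python below is a rough sketch. It assumes a season’s goal total is approximately Poisson-distributed with mean equal to xG, which is an approximation: with each shot’s xG, the exact distribution could be calculated instead.

```python
import math


def prob_at_least(goals, xg):
    """P(scoring >= goals) if goals ~ Poisson(xg): an approximation that
    treats a season's shots as many independent, low-probability chances."""
    p_fewer = sum(math.exp(-xg) * xg**k / math.factorial(k) for k in range(goals))
    return 1 - p_fewer


xg, goals = 10.1, 18  # Player Z from the example above
print(f"Goals minus xG: {goals - xg:+.1f}")  # Goals minus xG: +7.9
print(f"Chance of >= {goals} goals from {xg} xG: {prob_at_least(goals, xg):.1%}")
```

A small probability on its own can’t separate luck from genuine finishing ability like Messi’s. It only says the output is unusual for an average finisher, which is why a gap this large is usually expected to shrink.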

Expected goals is a great predictor of future results (more on that in another blog post) and it correlates really well with actual goals over a large enough sample size.

Why don’t I like expected goals?

The biggest black mark most people hold against expected goals is that it can’t measure the proximity of defenders to the shooter, due to the lack of off-the-ball tracking data. The modellers try to compensate for this (as explained below), but it is still an ongoing problem.

It only includes shots. Say a cross is fizzed across the face of goal and nobody gets a touch and it goes out for a throw-in. That amazing chance to score a goal isn’t counted by expected goals because there was no shot and the side gets no credit for it in terms of expected goals.

It’s easy to misinterpret an xG map.

xG map 2

Take this xG map, for example. Looks like a fairly close game based on the xG difference, doesn’t it? Fact is, Arsenal went 2-0 up early on, and could afford to sit off, conceding loads of poor-quality chances to Villa, and had no need to go forwards as long as Villa didn’t score. Score effects matter a lot in determining a side’s xG. As with everything in football analytics, context is king.

What factors go into expected goals?

It varies based on each person who creates a model, but I’ll go through several factors and give a brief explanation for the theory behind each.

Angle of the shot

This should be fairly obvious. The more centrally a shot is taken, the more of the goal the shooter will have to aim at, and the higher chance he’ll have of scoring.

Distance of the shooter from goal

Fairly self-explanatory as well. Shots closer to the goal tend to find the net more frequently than those from outside the area.
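Angle and distance are usually the backbone of an xG model. The Python below computes both from a shot location and feeds them into a logistic curve. The coordinate system, the function names and the coefficients are all hypothetical. Real models fit their coefficients (for example with logistic regression) on many thousands of shots, so treat the outputs as illustrative only.

```python
import math

GOAL_WIDTH = 7.32  # metres between the posts


def shot_geometry(x, y):
    """Distance (m) to the centre of goal and the angle (radians) between
    the two posts, as seen from a shot at (x, y).

    Assumed coordinates: the centre of goal is (0, 0), the goal line is the
    y-axis, x is metres out from the goal line and y is metres left/right.
    """
    distance = math.hypot(x, y)
    half = GOAL_WIDTH / 2
    # angle between the vectors to each post: atan2(|cross product|, dot product)
    angle = math.atan2(GOAL_WIDTH * x, x**2 + y**2 - half**2)
    return distance, angle


def toy_xg(x, y, b0=-1.0, b_dist=-0.1, b_angle=1.5):
    """Logistic toy model; the coefficients are invented, not fitted to data."""
    distance, angle = shot_geometry(x, y)
    logit = b0 + b_dist * distance + b_angle * angle
    return 1 / (1 + math.exp(-logit))


# Central from 11 m, wide from 11 m, central from 25 m.
for spot in [(11, 0), (11, 15), (25, 0)]:
    print(spot, round(toy_xg(*spot), 3))
```

Even with made-up coefficients, the central close-range shot comes out well ahead of the wide one and the long-range one, matching the intuition in the two sections above.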

Part of the body used to shoot

In terms of feet, this should be obvious. A player with two identical opportunities is more likely to score with his stronger foot, so this is incorporated into xG calculations. However, xG does not like headers, because they’re far harder to direct and generate power on, meaning that a shot with feet is almost always going to have a higher xG than a header from the same location.

Speed of the attack

This is partly included because of football’s lack of off-the-ball tracking data. The general theory is that faster attacks are more likely to result in goals because the opposition will be unable to get back into a strong defensive shape. Expected goals loves chances coming from counter-attacks.

Type of assist

Perhaps a less obvious factor. Crosses (especially those in the air, leading to headers) are not a great way of generating high-quality chances. Instead, through balls (because they lead to one-on-ones) and passes from the danger zone (because these leave the defence and goalkeeper out of position for an easy chance) generally result in high-quality chances with little defensive pressure, so a chance assisted by one of these passes is given a greater xG. Assists (or shots) following a successful dribble give the chance a higher xG because it is assumed (quite fairly) that there is less defensive pressure on the player, giving them room to get a shot away.

A few others

Individual errors often give sides an easy chance with most of the opposing team out of position; a player rounding the keeper ramps up his shot’s xG; and rebounds often give a free shot to an attacker in a good area, resulting in a higher xG for that chance.
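One common way to fold factors like these into a model is to adjust each shot’s log-odds. The Python below is a minimal sketch that assumes we already have a location-only xG, for instance from angle and distance. The factor names and adjustment sizes are hypothetical; only their signs follow the reasoning above.

```python
import math

# Hypothetical log-odds adjustments. The signs follow the reasoning above
# (headers and crosses lower xG; through balls, counters, dribbles, rebounds,
# rounding the keeper and individual errors raise it); the sizes are invented.
ADJUSTMENTS = {
    "header": -1.0,
    "from_cross": -0.5,
    "through_ball": 0.6,
    "fast_attack": 0.4,
    "after_dribble": 0.3,
    "rebound": 0.5,
    "rounded_keeper": 1.5,
    "individual_error": 0.7,
}


def adjusted_xg(location_xg, **context):
    """Shift a location-only xG (strictly between 0 and 1) by context flags."""
    logit = math.log(location_xg / (1 - location_xg))  # probability -> log-odds
    for factor, present in context.items():
        if present:
            logit += ADJUSTMENTS[factor]  # unknown factor names raise KeyError
    return 1 / (1 + math.exp(-logit))  # log-odds -> probability


print(round(adjusted_xg(0.20), 3))  # 0.2
print(round(adjusted_xg(0.20, header=True, from_cross=True), 3))
print(round(adjusted_xg(0.20, through_ball=True, fast_attack=True), 3))
```

Working in log-odds keeps every adjusted value between 0 and 1, however many factors are stacked on top of each other.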

(Game state also affects expected goals, but I want to cover it in the next instalment.)

To conclude, expected goals isn’t perfect, and nobody who works with it would suggest that it is, but it’s by far the best stat we have, and it’s got plenty of things going for it too.

If you’re curious, here are the ‘best’ and ‘worst’ finishers compared to xG over the past few years:

best xG performers

worst xG performers

(Credit to Michael Caley for both the xG maps and these two graphs.)
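Those ‘best’ and ‘worst’ finisher graphs boil down to comparing each player’s actual goals with the summed xG of the shots he took. A minimal sketch of that comparison; the player names and numbers are made up purely for illustration.

```python
# Hypothetical season totals: (goals scored, summed xG of all the player's shots).
players = {
    "Player A": (18, 12.4),
    "Player B": (9, 9.3),
    "Player C": (6, 10.1),
}


def finishing_vs_xg(goals: int, xg: float) -> float:
    """Goals minus expected goals: positive means outscoring the chances' quality."""
    return goals - xg


# Rank players from best to worst finishing relative to xG.
ranked = sorted(players, key=lambda name: finishing_vs_xg(*players[name]), reverse=True)
for name in ranked:
    goals, xg = players[name]
    print(f"{name}: {goals} goals from {xg:.1f} xG ({finishing_vs_xg(goals, xg):+.1f})")
```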

A few really cool xG-related resources from the Twitterati:

You can make your own xG map with this cool resource from @Torvaney.

This player action tracker from John Murdoch is fantastic as well.

If you want to follow more people who tweet about expected goals, look no further than Michael Caley and 11Tegen (Sander).

Here’s an interactive xG dashboard from Paul Riley.

And finally, Michael Caley’s Premier League and European stat sheets.

 

Thanks for reading.

If you’ve any questions, you can contact me @OneShortCorner.

You can read the other parts of this series here:

Part One

Part Two

Part Three

Football Analytics – Part Three: PDO

If you haven’t already, you can read the first two parts of this series here and here.

In Part Two I mentioned that the two most important requirements of any metric are its repeatability and predictability. Just to be difficult, this next metric I’ll discuss, PDO, is not widely recognised as being either especially repeatable or predictable.

*Warning. PDO is a complicated subject.*

So what is PDO?

Confusingly enough, PDO doesn’t really stand for anything. It’s a metric borrowed from ice hockey across the pond, brought into football by James Grayson, and it’s generally used to gauge how lucky a team is.

It’s calculated by adding a team’s save percentage to its conversion percentage, where save percentage = 100 - (Goals Conceded / Shots on Target against) × 100, and conversion percentage = (Goals Scored / Shots on Target for) × 100.

So if Liverpool are saving 60% of the shots on target they face and scoring with 33% of the shots on target they take, their PDO would be 93 (60 + 33).
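The formulas above translate directly into code. A minimal sketch; the function names are mine, and the shot counts are hypothetical numbers chosen to reproduce the 60% save rate and 33% conversion rate in the Liverpool example.

```python
def save_pct(goals_conceded: int, sot_against: int) -> float:
    """Percentage of the shots on target faced that were saved."""
    return 100 - (goals_conceded / sot_against) * 100


def conversion_pct(goals_scored: int, sot_for: int) -> float:
    """Percentage of the team's own shots on target that were scored."""
    return (goals_scored / sot_for) * 100


def pdo(goals_scored: int, sot_for: int, goals_conceded: int, sot_against: int) -> float:
    """PDO on the 100-average scale: save % + conversion %."""
    return save_pct(goals_conceded, sot_against) + conversion_pct(goals_scored, sot_for)


# Hypothetical counts: 40 conceded from 100 faced (60% saved), 33 scored from 100 taken.
print(round(pdo(goals_scored=33, sot_for=100, goals_conceded=40, sot_against=100)))  # 93
```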

Because one team’s PDO directly mirrors its opponents’ (every shot on target one side converts is one the other side fails to save), the league average is always 100.
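To see why the average is pinned at 100, consider a single match: each side’s conversion rate is exactly the other side’s failure-to-save rate, so the two PDOs always sum to 200. A short sketch with made-up match numbers (it assumes both sides had at least one shot on target). Pooled across a whole league, goals scored equal goals conceded and shots on target for equal shots on target against, so the league-wide figure comes out at 100 too.

```python
def pdo(goals_scored, sot_for, goals_conceded, sot_against):
    """Save % + conversion %, on the 100-average scale."""
    save = 100 - goals_conceded / sot_against * 100
    conversion = goals_scored / sot_for * 100
    return save + conversion


# Hypothetical match: home side scores 2 from 5 on target, away side 1 from 4.
home = pdo(goals_scored=2, sot_for=5, goals_conceded=1, sot_against=4)
away = pdo(goals_scored=1, sot_for=4, goals_conceded=2, sot_against=5)
print(round(home, 1), round(away, 1), round(home + away, 1))  # 115.0 85.0 200.0
```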

PDO SOTR

All graphs are provided by James Grayson.

As the above graph shows, there’s pretty much no correlation between a team’s SOTR (covered in Part Two) and its PDO, indicating that PDO is driven largely by luck rather than quality. (Don’t be put off by the numbers on the y-axis: some analysts scale PDO so that the average is 1000, others so that it is 100. I use 100 because I think it’s easier to understand.) Better teams generally do have a slightly better PDO than worse teams, partly because they can create more high-quality chances, but the differences are negligible.

PDO repeatability

Furthermore, there’s little to no repeatability in PDO numbers, as shown by the weak correlation above. This reinforces the idea that PDO is not a controllable, skill-driven quantity, but one driven largely by luck.

What PDO is, though, is consistent.

PDO distribution

As you can see, very few teams deviate far from the PDO average of 100.

What does it mean?

PDO assumes that finishing and saving are random and tend to even out over time (or regress to the mean, if you want a fancier term), affecting sides both offensively and defensively. If a side is scoring with 50% of its shots on target, or conceding from 50% of the shots on target it faces, that rate is probably unsustainable, and as it regresses towards the mean, so will the side’s PDO.
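If you want a rough number for how far a hot (or cold) rate might fall back, one common approach, which is a shrinkage technique swapped in here for illustration and not part of PDO’s definition, is to blend the observed rate with the league average, giving the league average the weight of a fixed number of ‘phantom’ shots. The 30% league conversion rate and the weight of 100 shots below are assumptions, not measured values.

```python
def regressed_rate(successes: int, attempts: int, league_rate: float, prior_weight: float) -> float:
    """Blend an observed rate with the league rate, weighted by sample size.

    prior_weight is a hypothetical number of league-average shots added to the
    sample; the larger it is, the harder small samples are pulled to the mean.
    """
    return (successes + league_rate * prior_weight) / (attempts + prior_weight)


# A side scoring with 50% of its shots on target (20 from 40), league average assumed 30%.
estimate = regressed_rate(successes=20, attempts=40, league_rate=0.30, prior_weight=100)
print(f"{estimate:.1%}")  # 35.7%
```

The design choice is that a bigger sample earns the observed rate more weight, which matches the intuition that a few hot games say less than a whole season.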

A good example of this is Arsenal during the early part of this season. Having failed to score against West Ham in their opening game, they followed this up with a good win at Palace, a 0-0 at home to Liverpool, and wins over Newcastle and Stoke. The media tore their hair out at Wenger’s refusal to buy a striker after Arsenal had over 20 shots in both the Newcastle and Stoke games but only scored 3 goals, claiming that Arsenal had no chance of winning the title. In reality, Arsenal’s PDO was unsustainably low, and it quickly regressed to the mean, as PDO often does, with the Gunners putting 5 past Leicester.

There are exceptions to this rule, of course. Good teams with a great ‘keeper and a striker who is having a great season (or is just bloody brilliant) can generally sustain a PDO above 100, and Tony Pulis’ Stoke City sides had a strange habit of posting high PDO values throughout his tenure, implying that there are ways in which you can consciously influence your PDO. But examples like these are generally anomalies, and it’s a safe bet to expect a side with a PDO of 110 or 90 to regress to the mean.

The trouble is, you can’t predict with certainty whether a side will regress, when it will regress, or by how much, which complicates things. But one can make reasonable assumptions about all three.

If it’s not repeatable, or a measure of how good a team is, what use is it?

PDO is a huge driver of narratives in football. Every manager sacked before Christmas in the Premier League was managing a side with a PDO of less than 100, showing the huge role it plays in determining results, which in turn influence people’s perceptions of events. A side putting together a few wins on the spin could be powered by unsustainable finishing or save percentages, indicating that it might not be as good as pundits and fans think. PDO allows you to estimate how lucky a side has been, giving you a better sense of its true quality.

As with everything in football analytics, it’s all about context.

Thanks for reading.

If you’ve got any questions, follow and DM me @OneShortCorner.

Further reading on PDO can be found

From @11Tegen11 here.

And from James Grayson here, here and here.

Football Analytics – Part Two: Shots

*If you haven’t read Part One, you can find it here.*

Football analytics tries to do many things, chief among them evaluating the strength of a team and predicting future results. When looking at team statistics in football, there are two main ideas to consider: repeatability and predictive ability.

Repeatability asks: if your team does something in one game, how likely is it to keep doing it in future games? If something is not repeatable, it has little predictive value.

Predictive ability is how strongly a statistic or metric correlates with winning matches, usually measured against points per game, points, or future goal ratio (the proportion of the goals in a team’s games that it scores; above 0.5 means it scores more goals than the opposition).

“At the end of the day, the only stat that matters is the scoreline.”

So a wise man on the television once told you, or maybe a stranger on Twitter. But as this exchange tried to show, there’s far more to football than the result. A side can play poorly and win, but if it were to keep playing poorly, results would likely catch up with it, showing that it wasn’t as good as the initial result suggested. Results are a bad way of evaluating how good a team is and of predicting its longer-term performance.

Why?

There are so few goals in a game compared with points in a tennis match or balls bowled in a cricket match. This makes each goal far more important than a single point or ball in those sports, so if a weaker team can score against a stronger one, it has a much better chance of winning the match than an unseeded player who manages to take a game or set off Djokovic. As a result, shocks in football are relatively common, which is part of the sport’s attraction. However, it also means that individual results are driven less by skill and more by luck than in other sports. That is why individual results are a bad way to rate teams, not to mention the tiny sample size that goes into forming opinions from them. Of course, over time, better sides will score more and concede fewer goals than worse sides, but the same problems remain, as goals are often not the most repeatable of statistics.
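A rough way to see how low scoring makes shocks common: assume (a standard simplification, not a claim made in this post) that each side’s goals follow independent Poisson distributions. Even when the stronger side is expected to score twice as many goals, the weaker side still avoids defeat a large share of the time. The expected-goal figures of 1.8 and 0.9 are invented for illustration.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k goals when the expected number is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def match_probabilities(mean_a: float, mean_b: float, max_goals: int = 15):
    """Return (P(A wins), P(draw), P(B wins)) assuming independent Poisson scores."""
    win_a = draw = win_b = 0.0
    for a in range(max_goals + 1):
        for b in range(max_goals + 1):
            p = poisson_pmf(a, mean_a) * poisson_pmf(b, mean_b)
            if a > b:
                win_a += p
            elif a == b:
                draw += p
            else:
                win_b += p
    return win_a, draw, win_b


strong, draw, weak = match_probabilities(1.8, 0.9)
print(f"stronger side wins {strong:.0%}, draw {draw:.0%}, weaker side wins {weak:.0%}")
```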

As this graph from the brilliant @11tegen11 shows, goal ratio and a side’s points per game are relatively poor predictors of future performance.

correlation with goal ratio and ppg

So we have to dig deeper to get a better metric than one that uses results and goals, and the logical next step is shots.

The first metric I want to discuss is called TSR, which stands for ‘total shots ratio’, and it’s really easy to work out. Let’s say that team ‘X’ has taken 20 shots this season and has had 30 shots taken against it. There have been 50 shots (20 + 30) in its matches, 20 of which were taken by team ‘X’. Simply divide the number of shots taken by team ‘X’ (20) by the total number of shots in its matches (50), and you get that team’s TSR: the proportion of shots it takes over a run of games or a season (0.4).
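The calculation fits in a one-line function; here is a sketch using the team ‘X’ numbers from the example above (the function name is mine). The opposition’s share of the shots in those same games is simply 1 minus the TSR.

```python
def total_shots_ratio(shots_for: int, shots_against: int) -> float:
    """TSR: the share of all shots in a team's matches that the team took."""
    return shots_for / (shots_for + shots_against)


print(total_shots_ratio(20, 30))  # 0.4
```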

Obviously over the course of just one game, having more shots than the opposition doesn’t mean you’ll win the game, but if such shot dominance can be extended over a long period of time, impressive results should follow (assuming you’re not going full-on Coutinho every game and shooting from 25 yards).

Why do we like shots, or more specifically, TSR? 

Shots inherently have a larger sample size than goals, and this, combined with the naturally streaky nature of finishing, makes them more representative of a side’s quality. Their stats are also really easy to find: every match report, be it on BBC or Sky Sports, will have simple shot statistics that can be easily collected.

It’s more repeatable than goals (thanks to @JamesWGrayson, whose article can be found here).

goal ratio repeatability


The axis titles on the second graph are incorrect: they should read ‘TSR’ instead of ‘Total shots for’.

It’s a better predictor of future performance than goals (again, thanks to @11tegen11).

TSR predictability

For the sake of brevity, I won’t try to explain why this is the case in this blog post, as there’s another very similar, widely used metric still to explain. It’s called SOTR, or ‘Shots on Target Ratio’, and it’s calculated in the same way as TSR, except that it only counts shots on target. The theory behind this is that better-quality chances are more likely to be hit on target, so if you have more shots on target than your opposition, you’re likely to be creating better-quality chances, scoring more goals, and getting better results. It’s not so surprising, then, that it correlates with future points better than TSR does, although only slightly.

SOTR predictability

Quick summary of the ideas I’ve tried to cover here:

Repeatability

Predictability

TSR = total shots for/(total shots for + total shots against)

SOTR = total shots on target for/(total shots on target for + total shots on target against) 
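Since the two summary formulas share the same shape, one helper covers both; only the inputs change. A sketch with made-up season totals for a single team:

```python
def share_of(for_count: int, against_count: int) -> float:
    """Proportion of the events in a team's matches that belong to the team."""
    return for_count / (for_count + against_count)


# Hypothetical season totals for one team.
shots_for, shots_against = 480, 420
on_target_for, on_target_against = 160, 130

tsr = share_of(shots_for, shots_against)           # all shots
sotr = share_of(on_target_for, on_target_against)  # shots on target only
print(f"TSR {tsr:.3f}, SOTR {sotr:.3f}")  # TSR 0.533, SOTR 0.552
```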

Thanks for reading, and I hope I’ve been able to help you understand some parts of football analytics a bit better. If you’ve got any questions, feel free to drop me a DM @OneShortCorner.

 

Football Analytics – Part One: Introduction

Analytics: information resulting from the systematic analysis of data or statistics.

Hello. The aim of this series (or what I hope will become one) is to explain, in simple terms, the jargon and metrics of football analytics to people who haven’t really come across it before.

First things first, I’m not a mathematician: my only maths qualification is a GCSE I took several years ago, and I’ve never studied the subject further, so, from a maths viewpoint, some of you may be in a better starting position than I was when I first discovered football analytics. I’ve learnt everything I know from Twitter and its users’ blogs, which are a fantastic way to see the latest cutting-edge developments in public analytics (we at OSC have a regularly updated list of real-life football analytics people here).

Okay, why is football analytics needed? If we can watch the game with our eyes, see what happened and come to our own conclusions, why do we need numbers? 

I was very sceptical of numbers to begin with, no doubt due to the rampant misuse of Squawka’s Comparison Matrix, and wasn’t sure why we needed to quantify ‘everything’. Soccernomics (a must-read for anybody interested in football) sums up this need well in its early chapters. I quote it in this extract from a previous article of mine, ‘In Defence of Analytics’:

“There are two main factors which are almost exclusively avoided by the average football fan when it comes to evaluating anything football-related, but both are absolutely critical in the evaluation of anything to do with the beautiful game. The first is given the uninspiring name of availability heuristic. Soccernomics, a fantastic book I’d highly recommend to anyone wishing to get into football and stats, defines availability heuristic as ‘the more available a piece of information is to the memory, the more likely it is to influence your decision, even when the information is irrelevant.’ Basically this means that more recent and/or memorable events tend to stand out in your mind, and therefore influence your opinion about a player/event, even if the information is useless. This means that eyes are hugely fallible when it comes to making judgements, so it makes sense to at least consider the application of something less biased when making said judgements.”

“The other factor is confirmation bias, which is defined as ‘a tendency to search for or interpret information in a way that confirms one’s preconceptions, leading to statistical errors.’ A suitable analogy would be that when watching a game, a player you don’t ‘rate’ makes a poor piece of play. Confirmation bias would strengthen your belief in his lack of quality, even though he might have otherwise had a solid performance. A failure to apply this pair to decision-making is a sure-fire way to make misinformed decisions and judgements, as is part of the reason why decisions based on gut feelings go wrong more often than those based on data, as data can’t lie, it can only be misinterpreted.”

Essentially, football analytics is just another layer of information that can be applied to the sport; there will always be room for visual analysis, and even the most ardent of the evil number wizards who inhabit Twitter.com would admit that. Taking the old adage “Knowledge is power” and applying it to football puts analytics into context. Given how much so many people have riding on football, from managers to fans to punters, it makes sense to try to understand as much of it as possible, in as many different (plausible) forms as possible. Or at least it did to me.

Numbers are also useful because they can be used to attempt to quantify things that would otherwise be only described with an adjective, and their use in football data means that things not immediately apparent to the eye can be picked up by the stats, allowing us to understand the game even more.

Thanks for reading.

Our next part will focus on one of the key features of football analytics: shots.

If you’ve any questions or suggestions, tweet or DM me @OneShortCorner.

My Prediction Model

Over the past few weeks, I’ve been asked several times to elaborate on my predictions model, which I created as a bit of fun earlier this season. This blog, unless you’re interested in modelling and numbers, will be immensely boring, so don’t say I didn’t warn you.

First things first, I have almost no mathematical experience whatsoever, apart from an iGCSE in Maths. I also have no experience of programming languages such as R or Python, which I know many members of the stats community use (although I have downloaded R and might actually get round to following a tutorial one day). So, armed with nothing but Microsoft Excel, my extremely lowly maths qualification and good old instinct, I set about building my model.

It’s gone through a few different phases, getting more complicated (and hopefully more accurate over time), but here’s how I’m currently doing it.

First, I aimed to compare the two competing teams using their ClubELO values and either their SOTR or their xGR (which I’ve only brought in recently, as xG takes slightly longer to stabilise than SOTR), to get something similar to a team rating (although it’s not detailed enough to accurately be described as such). Here’s what I do with this information:

Team A’s SOTR/Team B’s SOTR = Weighted SOTR. Anything above 1 means that Team A has superior shooting metrics. The same would be done for Team B, with the numerator and the denominator switched around.

Team A’s ClubELO rating/Team B’s ClubELO rating = Weighted ClubELO rating. Anything above 1 means that ClubELO (an extremely accurate way of measuring a team’s quality) ranks that club above the other. Again, the same would be done for Team B, with the numerator and the denominator switched around, giving me something like this:

Model

I would then multiply each team’s weighted SOTR by its weighted ELO rating, which generally has the effect of amplifying the differences between the two teams, so as to separate their quality better.
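A sketch of this first stage as I read it: each side gets its SOTR ratio multiplied by its ClubELO ratio. The function name and the input numbers are hypothetical, not real ClubELO or SOTR figures. Because each team’s ratios are the reciprocals of its opponent’s, the two ratings always multiply to 1.

```python
def team_ratings(sotr_a: float, sotr_b: float, elo_a: float, elo_b: float):
    """Weighted SOTR x weighted ClubELO rating for each side.

    Each ratio is above 1 for the side the metric favours; multiplying the two
    ratios pushes the sides further apart when both metrics agree.
    """
    rating_a = (sotr_a / sotr_b) * (elo_a / elo_b)
    rating_b = (sotr_b / sotr_a) * (elo_b / elo_a)
    return rating_a, rating_b


# Hypothetical inputs for illustration only.
a, b = team_ratings(sotr_a=0.60, sotr_b=0.45, elo_a=1850, elo_b=1700)
print(f"Team A {a:.3f}, Team B {b:.3f}")
```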

The next step isn’t something conventionally used in analytics as far as I know, and I haven’t read anything on its predictive value, but as I’m interested in the psychological side of the game, I decided to add it. I looked at previous results between the two clubs (within the last year, so as to keep them relevant) and boosted the winner’s chances of winning by this:

+GD change/(Number of months since result + 12)

I settled on adding the 12 after hours of trial and error, as without it, previous results would be too heavily weighted. I like it because victories by the odd goal hardly ever change the predicted result, but previous thrashings can have an impact, and there’s something to be said for the mental scars caused by pastings. This number is then added onto the team’s rating, and the away team’s rating is subtracted from the home team’s rating. If the number is positive, the home team is viewed as stronger; if it is negative, the away team is viewed as stronger. In order to fully compensate for team effects, I multiplied the difference between the two teams by 20, again after hours of trial and error, giving a ‘points difference’ between the two sides that aims to reflect their respective qualities (found in the bottom right of the image below).
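A sketch of the head-to-head boost and the ‘points difference’ step, as I read them. Interpreting ‘+GD change’ as the winning margin of the earlier meeting is my assumption, and the function names, ratings and example result are hypothetical.

```python
def head_to_head_boost(goal_difference: int, months_since: float, offset: float = 12) -> float:
    """Boost for the winner of a recent meeting: +GD / (months since result + 12)."""
    return goal_difference / (months_since + offset)


def points_difference(home_rating, away_rating, home_boost=0.0, away_boost=0.0, scale=20):
    """Positive means the home side is viewed as stronger, negative the away side."""
    return ((home_rating + home_boost) - (away_rating + away_boost)) * scale


# Hypothetical example: the home side won the last meeting 3-0 two months ago.
boost = head_to_head_boost(goal_difference=3, months_since=2)
diff = points_difference(home_rating=1.10, away_rating=0.91, home_boost=boost)
print(f"boost {boost:.3f}, points difference {diff:.2f}")
```

The offset of 12 keeps even a very recent thrashing from dominating: a 5-0 win last month adds 5/13 before scaling, rather than 5/1.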

Here’s an example from the Everton vs Tottenham game last week.

Model 6

In theory, this is the easy part. The next step was to include home advantage, and I try to compensate for this in two ways.

Firstly, I broke down the Premier League’s home and away records into two parts: the 13/14 and 14/15 seasons formed one part, and this season the other. Because this season has, so far, seen more away wins than usual, I wanted to account for this in my model, so I weighted the two parts equally by taking the average of their respective percentages, as opposed to treating the last 3 seasons as one.
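The equal weighting amounts to a simple average of two outcome splits. A sketch; the percentages are made up and are not the real Premier League figures.

```python
def weighted_outcome_rates(earlier, current):
    """Average (home win %, draw %, away win %) from two periods with equal weight."""
    return tuple((e + c) / 2 for e, c in zip(earlier, current))


# Made-up percentages for illustration only.
two_seasons = (46.0, 26.0, 28.0)   # 13/14 and 14/15 combined
this_season = (40.0, 27.0, 33.0)   # current season so far
print(weighted_outcome_rates(two_seasons, this_season))  # (43.0, 26.5, 30.5)
```

Because each period’s figures sum to 100, their average does too. Giving this season half the weight lets it count for far more per match than it would if all three seasons were pooled.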

Model 3

With this information, I created home and away swings, and I suspect this is where my model’s dislike for draws comes from, but I don’t really know enough about maths to confidently change this. (See below)

Cell C7 is simply the sum of E5 and G5, because I figured that any point that goes to either the home or away team is advantageous to them. It’s the same for C10, which is the sum of G5 and I5. To work out what proportion of the home/away advantage goes to the win/draw, I divided the weighted HW%/D%/AW% (E5, G5, I5) by the Home or Away Swing as applicable, getting the numbers in E6, E7, E9 and E10.
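The spreadsheet cells can be sketched as a small function. The mapping of E6/E7 to the home win and draw shares, and E9/E10 to the draw and away win shares, is my reading of the description rather than something confirmed by the sheet; the input rates are made up and given as fractions summing to 1.

```python
def swings(hw: float, d: float, aw: float) -> dict:
    """Home/away swings and how each is split between win and draw.

    hw, d, aw: weighted home-win, draw and away-win rates (cells E5, G5, I5).
    """
    home_swing = hw + d  # C7 = E5 + G5
    away_swing = d + aw  # C10 = G5 + I5
    return {
        "home_swing": home_swing,
        "hw_share": hw / home_swing,      # E6 (assumed mapping)
        "d_share_home": d / home_swing,   # E7 (assumed mapping)
        "away_swing": away_swing,
        "d_share_away": d / away_swing,   # E9 (assumed mapping)
        "aw_share": aw / away_swing,      # E10 (assumed mapping)
    }


result = swings(hw=0.43, d=0.265, aw=0.305)
for key, value in result.items():
    print(f"{key}: {value:.3f}")
```

Note that the draw rate appears in both swings, so it is counted towards both sides.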

Model 4

I’ll just use the home team example here, otherwise it’ll take too long. The weighted home advantage (E5) is put into a percentage figure (x 100) and is added to the team rating I spoke about earlier which has been multiplied by the HW Swing in cell E6 (E5 + (team rating * HW Swing)). This is repeated for the draw and the away win, giving three numbers that summed to 100. However, this vastly overrated home teams, causing me to add another step.

I wanted to add in team effects too, so, secondly, I looked at each club’s top-flight record, both home and away, since the start of the 13/14 season (this brings sample-size complications into the mix for Watford and Bournemouth, but I didn’t want to include their Championship statistics, as that would overrate them).

For Arsenal, it looks like this:

Model 2

Then, for the two teams in the game I was predicting, I added up the following stats:

Model 5

The sums of these were turned into percentages. Then all I did was take the average of each potential outcome from the table above and the corresponding one from the previous step (where I initially tried to formulate home advantage). This gave me three numbers, which summed to 100 and act as the percentages I use in my predictions.
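The final averaging step can be sketched like this; the two input triples are made-up examples, not output from the real spreadsheet.

```python
def blend(league_based, team_based):
    """Average each outcome's percentage across the two methods.

    Both inputs are (home win %, draw %, away win %) triples summing to 100,
    so the averaged triple also sums to 100.
    """
    return tuple((league + team) / 2 for league, team in zip(league_based, team_based))


# Made-up percentages for illustration only.
print(blend((52.0, 24.0, 24.0), (44.0, 28.0, 28.0)))  # (48.0, 26.0, 26.0)
```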

And that’s pretty much that. I’m aware it’s neither the most holistic nor the best-explained method, but it’s a bit of fun, and I believe the best way for analytics to advance is for everyone to explore freely. I’m happy to take any questions, feedback or help, so feel free to follow me @OneShortCorner. Soon I’ll publish a blog looking at how my model has performed this season to evaluate its success.

Thanks for reading.