In a recent post over at the Phylogenetic Networks blog, I performed a network analysis of the historical results of the FIFA World Cup (the premier international competition in the sport known as association football, or soccer). During this process I noticed that much has been written about our ability to "predict" sports results. However, it occurred to me that some fairly simple points about forecasting and prediction have not been emphasized in these discussions, and so I thought that it might be interesting to cover them here.
The first thing to do is make clear the distinction between forecasting and prediction. Forecasting says: "If things continue the way they are, and have been in the recent past, then we forecast that this is what will happen next." Prediction, on the other hand, tells us what will happen next irrespective of what is happening now. In other words, forecasting does not take into account unforeseen circumstances while prediction does.
For example, the weather bureau makes suggestions about tomorrow's weather based on what the weather has been like over the past few days. This is a "weather forecast", not a prediction. In order to make a prediction, the bureau would need to know about all sorts of future events that they currently know nothing about, such as upcoming volcanic eruptions, and they would also need to know exactly how weather events in one location will affect other locations.
The main thing to recognize about forecasts is that they only have relevance during the period of time while things continue as they are. For example, the weather bureau forecasts the weather only up to 10 days in advance, because past that time today's weather has little direct influence. Moreover, you will have noticed that even when they do forecast tomorrow's weather accurately, their forecasts several days into the future get worse and worse, because more and more "unforeseen circumstances" can arise over time.
Forecasting has three uses: (i) people who like the forecast outcome can plan for it; (ii) people who don't like the outcome can try to stop it; and (iii) we can learn more about prediction by comparing the forecast to the actual outcome. Use (i) says, for example, that if the weather bureau forecasts sun tomorrow then we can plan to go on a picnic; but it is (ii) that is perhaps the most interesting use. A good example of use (ii) is the "hole in the ozone layer" that was a big issue a few years ago. The forecast was that there would be global problems if things continued the way they were going (the hole was getting bigger). Few people liked that forecast outcome, and so the governments of the world got together to work out how to change things. In effect, we have tried to make sure that the forecast does not come true (which it presumably would have if we had done nothing). This is a notable victory for the forecasters (who didn't like the outcome, either).
I am not sure what use prediction is. If we actually knew what was coming next then life would be hard to live. Imagine actually knowing all of the bad things that are going to happen to you (in among the good things, hopefully), and knowing exactly where and when and how you are going to die. It doesn't bear thinking about! So, I will leave prediction to the crystal-ball gazers.
This brings us to sport. I presume that we do not want an accurate prediction of the outcome of any sporting event. I suspect that if everyone knew the outcome then very few people would turn up to watch it. Indeed, I have seen people leaving the arena in droves once the outcome was beyond doubt, even if the event was not yet over. Much of the pleasure of watching sport comes from not knowing how it will turn out. This makes sports different from the weather.
Nevertheless, we can ask: to what extent can we forecast the outcome of sporting events? In my mind, this depends very much on the type of event. Here are two lists of competitive events for you (not all of them are sports).
List 1:
Running (track and cross-country)
Skiing (downhill and cross-country)
Snooker (and billiards)
Motor racing (cars, bikes, trucks, boats)
Tennis (and other racquet sports)
Golf
List 2:
Football (all types)
Hockey (both ice and field)
You will notice some general differences between these two lists. For example, in the first list the competitors often work alone or in pairs whereas in the second list they usually work in teams. Also, in the second list the competitors directly interact with each other continuously during the event, whereas in the first list they mostly compete alongside each other (except tennis).
However, neither of these things is the basis of the separate lists. What is consistently different is that in the first list the events terminate when a pre-specified goal is reached, whereas in the second list they terminate after a pre-specified period of time has elapsed. For example, in tennis the winner of the match is the person who first wins a specified number of sets, and the match continues for as long as it takes for one of the competitors to achieve that objective. In all types of football, on the other hand, the game stops when the whistle or hooter is sounded, and the winner is whoever is ahead at that second — it is irrelevant what the situation was one second before, or might have been one second later.
What is most important for forecasting is that a competitor's superiority "on the day" plays a much larger part in the outcome of the event for the first group ("objective-terminated") than for the second group ("time-terminated"). There are simply too many "unforeseen circumstances" (usually referred to as good or bad luck) that can affect the event at the precise moment it ends, so that the effect of overall superiority is reduced (but not eliminated!) in the List 2 events. Instead of being "best on the day", the winner is "the best at the final second". This means that forecasting will usually be much less successful for the second group than for the first group.
This difference has several consequences for the way competitions tend to be organized. Notably, most of the activities in the first list occur as one-off competitions, each consisting of a small number of events, and the competitors choose which competitions they will take part in during any one year. In tennis or golf, for instance, there are world-famous competitions in which a winner is declared to be "the best that week" (e.g. there is a "US Open" in both tennis and golf). Winning those competitions is important, but each competition stands on its own. On the other hand, the activities in the second list usually have a season full of connected events that are compulsory for all competitors, with the overall winner declared only at the end of the season. This full season is needed because of the reduced effect of superiority on the outcome of each individual event. The way to find out who is "the best" is to have a lot of events, so that all of the "unforeseen circumstances" average out over the whole season.
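This averaging-out effect is easy to illustrate with a small simulation (this is my own illustrative sketch, not part of any forecasting model discussed here): suppose one team genuinely wins 55% of its matches against a rival, and ask how often it actually comes out ahead after a "season" of head-to-head games of various lengths.

```python
import random

def season_leader_prob(p_better=0.55, n_matches=10, n_trials=20000, seed=42):
    """Estimate the probability that the genuinely better team
    (per-match win probability p_better) finishes a two-team
    'season' of n_matches head-to-head games with strictly more
    wins than its rival."""
    rng = random.Random(seed)
    leads = 0
    for _ in range(n_trials):
        wins = sum(rng.random() < p_better for _ in range(n_matches))
        if wins > n_matches - wins:  # strictly ahead of the rival
            leads += 1
    return leads / n_trials

# The longer the season, the more the luck averages out and the
# more often the better team tops the table.
for n in (1, 5, 19, 38):
    print(n, round(season_leader_prob(n_matches=n), 3))
```

After a single match the better team leads only about 55% of the time, but over a long season its edge compounds, which is exactly why time-terminated sports need a full season to identify "the best".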
Another consequence of the difference between the two lists is that for the first group there are often ranking systems, listing the competitors in order of superiority based on their recent competition results. This allows a "winner" to be declared based on a whole year full of unrelated competitions, but it also serves to indicate who are the superior competitors at any one moment, and how big their level of superiority might be. These lists are updated after each competition. The point here is that such a list is only of use if recent results reflect superiority. For the activities in the second list this is not always so.
Another possible use of a ranking list is forecasting, of course. In this case, the forecast outcome of any specified event will be based on where in the ranking the competitors are at the time — the higher you are up the ranking then the more likely you are to be the winner.
I will provide a concrete example here by looking at the sport of soccer (football). The Fédération Internationale de Football Association (FIFA) World Cup™ competition has been played every 4 years since 1930 (except 1942 and 1946, for obvious reasons). The last competition was in 2010, and what we are going to try here is to forecast the outcome for the 32 competing teams. Given what I have said above, this is not likely to be a successful exercise, because in soccer each game terminates after a specified time rather than when a pre-specified objective is achieved.
For those of you who don't know, in soccer the ball is round, a team has 11 players (plus a substitute or three), and a match takes 90 minutes (possibly with extra time, and maybe a bizarre lottery called a "penalty shoot-out"). The World Cup finals competition is usually considered to be the most widely viewed sporting event in the world, surpassing even the Olympic Games. Not unexpectedly, forecasting is rampant beforehand, and national pride is at stake in many countries.
At the end of each competition FIFA provides an ordering of the teams (from first to last) based on their success in the finals. It is this ordering for 2010 that we wish to forecast, not just the overall winner.
For comparison, I will consider four different "professional" forecasts, out of the dozens that were available before the competition:
- Ian Hale's statistical forecast model, published in Engineering and Technology Magazine August 2010, available at eandt.theiet.org/magazine/2010/08/
- the BET365 bookmaker odds, taken from Hale's paper
- the consensus (average) of 26 bookmaker odds, taken from the working paper of Christoph Leitner, Achim Zeileis & Kurt Hornik, available at epub.wu.ac.at/702/
- La Informacion, a Spanish newspaper at www.lainformacion.com/
Furthermore, in spite of everything I have said so far, there are actually two quality rankings of the national soccer teams available. Given what I have said above, this might be appropriate for tennis, golf or chess (all of which also have such rankings) but it may not be useful for football. These rankings come from:
- FIFA itself, at www.fifa.com/worldranking/
- Elo Rating System, at www.eloratings.net/
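For readers unfamiliar with the Elo approach, here is a minimal sketch of the standard Elo update in Python. Note that this is a simplification: eloratings.net modifies the basic scheme, weighting matches by importance and goal difference, and the K-factor of 30 here is just a typical illustrative value.

```python
def elo_expected(r_a, r_b):
    """Expected score of team A against team B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def elo_update(r_a, r_b, score_a, k=30):
    """Return both teams' updated ratings after a match.
    score_a is team A's result: 1 for a win, 0.5 for a draw, 0 for a loss.
    Ratings move by k times the difference between the actual and
    expected result, so the exchange is zero-sum."""
    e_a = elo_expected(r_a, r_b)
    r_a_new = r_a + k * (score_a - e_a)
    r_b_new = r_b + k * ((1 - score_a) - (1 - e_a))
    return r_a_new, r_b_new

# An upset (the lower-rated team winning) moves many points;
# an expected result moves only a few.
print(elo_update(2000, 1800, score_a=0))  # favourite loses
```

The key property for ranking purposes is that beating a strong opponent is worth more than beating a weak one, so the rating reflects the quality of recent results, not just their quantity.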
I will compare these six forecasts using what statisticians call the coefficient of determination. This number indicates the proportional success of the forecast, measured as the amount of variation accounted for (see Wikipedia), varying from 0 (a forecast no better than random) to 1 (a perfect forecast). Here are the results:
0.13 Ian Hale
0.14 La Informacion
0.14 FIFA order
0.26 consensus bookmakers
0.33 Elo score
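As a sketch of the calculation (using made-up ranks for eight hypothetical teams, not the real 2010 data), the coefficient of determination between a forecast ranking and the actual finishing order can be computed as the squared correlation between the two sets of ranks:

```python
def r_squared(forecast, actual):
    """Coefficient of determination (R^2) of a forecast ranking
    against the actual finishing positions: the squared Pearson
    correlation, i.e. the proportion of variation in the actual
    order accounted for by the forecast order."""
    n = len(forecast)
    mean_f = sum(forecast) / n
    mean_a = sum(actual) / n
    cov = sum((f - mean_f) * (a - mean_a) for f, a in zip(forecast, actual))
    var_f = sum((f - mean_f) ** 2 for f in forecast)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov * cov / (var_f * var_a)

# Hypothetical example: forecast ranks vs actual finishing order
# for eight teams (invented numbers, not the 2010 results).
forecast = [1, 2, 3, 4, 5, 6, 7, 8]
actual   = [3, 1, 7, 2, 8, 4, 5, 6]
print(round(r_squared(forecast, actual), 2))  # → 0.2
```

Because both inputs here are permutations of the same ranks, this is simply the squared Spearman rank correlation: a value of 0.2 means the forecast order accounts for only a fifth of the variation in the actual order.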
These numbers are uniformly low, indicating that forecasting of soccer results is not a successful activity, in general, no matter how you do it. This is the main thesis of this blog post — forecasting competitions where the events terminate based on time cannot be very successful. Even the bookies had only a 25% success rate at forecasting the outcome of the competition, while the FIFA ranking was almost no use at all.
The most successful forecast came from the Elo (quality) score, which indicates that the teams' quality (or superiority), as determined by their recent results, accounted for about 1/3 of the variation in the outcome of the competition (but not necessarily of each game). The graph for the data is shown below. The low coefficient of determination is caused by the very poor performance of the highly ranked French (ranked 9th, finished 29th) and Italian (ranked 5th, finished 26th) teams, and also by the lack of success of the very highly ranked Brazilian (ranked 1st, finished 6th) and English (ranked 4th, finished 13th) teams.
I have further investigated the results for the FIFA and Elo rankings by checking the result of using them to forecast the outcome of the recent UEFA Euro 2012 soccer championship. The coefficients of determination for this competition are:
0.23 FIFA order
0.22 Elo score
These values are in the middle of the range shown above for the World Cup, so the results seem to be general.
I thus conclude that being the best soccer team in recent games has only a 25-30% influence on the outcome of the next series of games. Put another way, forecasting soccer results will be only 25-30% successful if you base the forecast on the outcomes of recent games. You already suspected that, of course, but in science we like to put numbers on our suspicions, which is what I have tried to do here. What I wonder, though, is how the bookmakers deal with this situation, since their livelihood depends on successful forecasting.
I will finish this post with what I think was the most interesting forecast for the 2010 World Cup. It was wrong, but only just — Germany officially finished third.