Move aside Paul the Octopus, Machine Learning predicts the World Cup 2018 winner

For many, football is unconditional love. The beautiful game, as many call it, is about more than sport: it is an emotion that unites people across borders. With the latest edition of FIFA’s premier event, the World Cup, underway, much has been debated over who will be the eventual winner. Will it be Germany, the defending champions, or Brazil, the most iconic team in football? Or will some other country take the prize home and stun the world?

Personally, I do not like the prediction game. For me, football is about teamwork, quality and proving your mettle within 90 minutes. Results depend not only on how the team looks on paper but also on the mindset and mentality of the individual players on the given day. I have never paid heed to predictions thrown around by fans who have the utmost faith in their team. However, one prediction caught my eye and is making heads turn.

The prediction comes from Andreas Groll of the Technical University of Dortmund and his colleagues. They combined statistical data with a machine learning approach known as random forest to predict the winner of the most viewed sporting event in the world. So, who do you think it will be?

Drumroll... The ML algorithm predicts that Spain has a 17.8% chance of winning the coveted prize, making it the most probable winner.

Why Random Forest?

Let’s dive into the prediction and break it down. The random forest technique has emerged as a popular way to analyze large amounts of data while avoiding the pitfalls of more conventional approaches. It builds on the decision tree: a model that predicts an outcome by following a series of branching questions about the input, where the outcome at each branch is determined from a set of training data.

Random forest overcomes the major drawback of a single decision tree: overfitting, where the tree learns quirks of the training data that do not carry over to new data. Instead of growing one tree on all the data, it grows many trees, each on a random resample of the training data and with only a random subset of the factors considered at each branch. The final result is obtained by averaging the predictions of these randomly constructed trees.
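To make the idea concrete, here is a minimal sketch in plain Python. It is not the code used in the paper: the toy data (a "rank advantage" factor, a pure-noise factor, and goals scored) and all function names are made up for illustration, and real projects would normally rely on an established library instead.

```python
import random
from statistics import mean

def sse(values):
    """Sum of squared deviations from the mean (how 'impure' a group is)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(rows, targets, depth, n_candidates, rng):
    """Grow a small regression tree; each split only sees a random subset of factors."""
    if depth == 0 or len(set(targets)) == 1:
        return mean(targets)  # leaf: average outcome of the rows that ended up here
    best = None
    for f in rng.sample(range(len(rows[0])), n_candidates):
        for threshold in sorted({r[f] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[f] <= threshold]
            right = [i for i, r in enumerate(rows) if r[f] > threshold]
            if not left or not right:
                continue
            err = sse([targets[i] for i in left]) + sse([targets[i] for i in right])
            if best is None or err < best[0]:
                best = (err, f, threshold, left, right)
    if best is None:  # no usable split on the sampled factors
        return mean(targets)
    _, f, threshold, left, right = best
    children = [
        build_tree([rows[i] for i in side], [targets[i] for i in side],
                   depth - 1, n_candidates, rng)
        for side in (left, right)
    ]
    return (f, threshold, children[0], children[1])

def predict_tree(node, row):
    """Follow the branches until a leaf (a plain number) is reached."""
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if row[f] <= threshold else right
    return node

def build_forest(rows, targets, n_trees=25, depth=3, n_candidates=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap: resample the training rows with replacement for every tree.
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx], [targets[i] for i in idx],
                                 depth, n_candidates, rng))
    return forest

def predict_forest(forest, row):
    """The forest's prediction is the average over all of its trees."""
    return mean([predict_tree(tree, row) for tree in forest])

# Hypothetical toy data: (rank advantage, noise) -> goals scored.
data_rng = random.Random(1)
rows = [(x, data_rng.random()) for x in range(20)]
targets = [0.0 if x < 10 else 2.0 for x, _ in rows]

forest = build_forest(rows, targets)
weak = predict_forest(forest, (0, 0.5))
strong = predict_forest(forest, (19, 0.5))
print(weak, strong)
```

No single tree sees all the data or all the factors, so individual trees can be quite wrong; it is the averaging across trees that smooths those errors out.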

Random forest has significant advantages. First, it is far less prone to overfitting than a single decision tree. Second, when a model has a lot of candidate factors, it becomes easier to see which ones matter and which are redundant. The redundant factors can then be ignored.

How did they do it?

So, this was the methodology adopted by Groll and co. They began by combining a wide range of data, from a country’s GDP and population density to its FIFA ranking and individual player ratings. Characteristics of the teams and their players were also considered, such as age, the number of Champions League players, and their domestic form and ratings over the past couple of seasons.

Bringing all this data together helped Groll and his team visualize the importance of the different factors. As it turns out, team rankings produced by other methods, such as FIFA’s ranking and bookmakers’ odds, proved influential, along with the nation’s GDP and the number of Champions League players. Redundant factors included population and domestic form and ratings.
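One common way to measure how much a factor matters is permutation importance: shuffle that factor's values and see how much worse the predictions get. The sketch below illustrates the idea only; whether the paper used exactly this measure is not stated here, and the stand-in model, factor names and data are all hypothetical.

```python
import random

def permutation_importance(predict, rows, targets, feature, rng):
    """Increase in mean squared error after shuffling one factor's column."""
    def mse(data):
        return sum((predict(r) - t) ** 2 for r, t in zip(data, targets)) / len(targets)

    shuffled = [r[feature] for r in rows]
    rng.shuffle(shuffled)
    permuted = [r[:feature] + (v,) + r[feature + 1:] for r, v in zip(rows, shuffled)]
    return mse(permuted) - mse(rows)

# Hypothetical stand-in for a trained model: it uses ranking points (factor 0)
# and ignores population (factor 1) entirely.
def predict(row):
    return 0.1 * row[0]

rng = random.Random(0)
rows = [(float(i), rng.random()) for i in range(30)]  # (ranking points, population)
targets = [predict(r) for r in rows]                  # made-up "goals"

ranking_importance = permutation_importance(predict, rows, targets, 0, rng)
population_importance = permutation_importance(predict, rows, targets, 1, rng)
print(ranking_importance, population_importance)
```

A factor the model never relies on scores zero, which is what "redundant" means in practice: scrambling it changes nothing.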

The prediction itself is not that simple. Much of the outcome depends on the structure of the tournament. For example, if Germany get through the group stage, they are likely to face strong opposition in the first knockout round, which leaves them with a 58% probability of reaching the quarter-finals. Spain, on the other hand, are unlikely to face strong opposition at that stage and have a 78% chance of reaching the quarter-finals.

The paper goes on to say that if both teams make it that far, Spain and Germany have more or less equal chances of winning the coveted title. Overall, however, Spain edges out Germany because it is more likely to get through.
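The arithmetic behind this is simple: a team's overall title chance is its chance of reaching the quarter-finals multiplied by its chance of winning from there. The small sketch below uses only the figures quoted in this article for Spain; the second team is purely hypothetical and is not Germany's actual figure from the paper.

```python
def conditional_title_chance(p_title, p_reach_quarters):
    # Winning the title requires reaching the quarter-finals, so
    # P(title) = P(reach quarters) * P(title | reach quarters).
    return p_title / p_reach_quarters

# Spain: 17.8% overall, 78% to reach the quarter-finals (figures quoted above).
spain_from_quarters = conditional_title_chance(0.178, 0.78)

# Hypothetical team: just as strong from the quarter-finals on,
# but with only a 58% chance of getting there.
hypothetical_title_chance = 0.58 * spain_from_quarters

print(round(spain_from_quarters, 3))        # 0.228
print(round(hypothetical_title_chance, 3))  # 0.132
```

Two equally good teams can end up with quite different title chances simply because one of them has the harder route.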

But there is a twist to this tale. Because the random forest can produce a prediction for any match, it becomes possible to simulate the entire tournament over and over again. Groll and his team did exactly that, simulating the tournament 100,000 times.
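Here is a minimal sketch of what simulating a tournament many times looks like. It is a deliberate simplification and not the paper's procedure: the eight teams, their strength values and the rule "win probability is proportional to strength" are all assumptions made up for this example.

```python
import random
from collections import Counter

# Hypothetical team strengths; a real model would supply match predictions instead.
STRENGTH = {"A": 8.0, "B": 1.0, "C": 2.0, "D": 1.0,
            "E": 2.0, "F": 1.0, "G": 2.0, "H": 1.0}

def play(a, b, rng):
    """Pick a winner at random, weighted by the two teams' strengths (an assumption)."""
    p_a = STRENGTH[a] / (STRENGTH[a] + STRENGTH[b])
    return a if rng.random() < p_a else b

def simulate_knockout(bracket, rng):
    """Play one full knockout tournament; neighbours in the list meet each other."""
    teams = list(bracket)
    while len(teams) > 1:
        teams = [play(teams[i], teams[i + 1], rng) for i in range(0, len(teams), 2)]
    return teams[0]

def title_probabilities(bracket, n_runs=10_000, seed=0):
    """Share of simulated tournaments won by each team."""
    rng = random.Random(seed)
    wins = Counter(simulate_knockout(bracket, rng) for _ in range(n_runs))
    return {team: wins[team] / n_runs for team in bracket}

bracket = ["A", "B", "C", "D", "E", "F", "G", "H"]
probs = title_probabilities(bracket)
for team, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(team, round(p, 3))
```

The more runs you simulate, the more stable the estimated title chances become, which is why the researchers ran so many.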

In the single most probable course of the tournament, it is Germany, not Spain, that wins the World Cup. So although Spain are the most probable winners overall, the paper states that if Germany do reach the quarter-finals, they have the upper hand and become the most probable winners.

So, there you have it. With all that is happening, one thing is clear: irrespective of the outcome, technology is going to be the grand winner. Prediction has moved from traditional means to technology, with strong reasoning to back it up. It goes to show how technology has evolved with us and is now doing things we never imagined machines could do. Understanding machine learning algorithms will help you ride this technology wave and make a mark in the tech world. Start learning today, and let us know who you think will win the World Cup!

Reference: arxiv.org/abs/1806.03208 : Prediction Of The FIFA World Cup 2018 – A Random Forest Approach With An Emphasis On Estimated Team Ability Parameters