Methodology for ranking top 500 clubs in CONCACAF

The methodology for ranking top 500 clubs in CONCACAF uses an Elo rating system with over 22,000 games coupled with the Poisson distribution to rank as many clubs in CONCACAF as we could.

Peter G. Aiken-USA TODAY Sports

This post walks through the process we developed to rank the Top 500 soccer clubs in the CONCACAF region. As with our ranking of the top 100 soccer clubs in the United States and Canada, the Elo rating system drives the ranking. This time, however, we coupled the Elo system with the Poisson distribution to set the crucial weights that the Elo system uses.

This rating includes the results of over 22,500 competitive games played in over 35 countries, from the 2011 season through December of 2014. Only league games were used in the final ranking, but tournament games were used to set the initial weights and the weights of individual games.

The Elo rating system, developed by Hungarian-American mathematician Dr. Árpád Élő, is used by FIDE, the international chess federation, to rate chess players. In 1997 Bob Runyan adapted the Elo rating system to international football and posted the results on the Internet. Given how the soccer intelligentsia romanticizes a connection between soccer and chess, the use of this formula makes sense.

Footballdatabase.com uses Bob Runyan's methodology to rank club teams around the world. FIFA uses a modified version of Bob Runyan's adaptation to rank Women's International Soccer teams. The Elo system was adapted for soccer by adding a weighting for the kind of match, an adjustment for the home team advantage, and an adjustment for goal difference in the match result.

Each team starts with an initial score which sets an expectation for the quality of the team and the competition they will face. This initial weighting is critical to the final outcome, so we compiled hundreds of games between the teams and leagues to get the weights as justifiable as possible.

After each team has an initial weight, each game results in an adjustment to the team's score based on the following factors:

  • The team's old rating
  • The importance or weight of the match
  • The goal difference of the match
  • The result of the match including home field advantage
  • The expected result of the match

The details of the actual score are below but first we'll begin with how the model works and adjustments we've made to make the ranking as accurate as possible.

Strengths of the Rating System

The advantage of the Elo system is that it is a thorough assessment of a team's historical results. The system uses the results of every league match, including the goal difference, and weights the outcome based on the importance and the expected outcome.

Weaknesses of the Rating System

The importance of each match is pre-determined by the rating designer. Therefore the weighting of each match could potentially have a subjective element embedded in the score.

In addition, each team must start with a score. That score is pre-determined by the rating designer based on the perceived strength of the team's schedule. Teams with a stronger schedule will start with a higher score. Teams with a weaker schedule will start with a lower score. The difference between these starting scores is important and could be a subjective choice by the designer.

The goal difference factor as determined in the original adaptation was very strong. A team that wins a game by two goals will get 50% more points for that outcome than if they had won by one goal. A three-goal win gets 75% more than a one-goal win.

Changes to the Elo System to Address Weaknesses

The key to a good statistical ranking is a solid estimate of the quality of the leagues the teams play in. League strength is used to set the initial weights and game weights used by Elo. You can look at the methodology for how that was estimated here. The short version is that games from the CONCACAF Champions League, US Open Cup and Copa MX tournaments drive the majority of the weights used here. We created a goal difference index (GDI) to estimate the expected goal difference if average teams from two leagues played each other on a neutral site. For example, if an average Liga MX team played an average MLS team on a neutral site, our model estimates that the Liga MX team would win by 1.0 goals on average.

The use of the Poisson distribution

But linking goal difference to the critical Elo initial weights is challenging. This is where the Poisson distribution comes in. Using Poisson, we can turn the expected goals of the two leagues into a win probability. This is a common distribution used to predict the outcome of soccer matches. Just input expected goals scored and voila, you can estimate the probability of every scoreline. Elo scores are used to predict winning percentage as well. So we can use Poisson and goal expectation to get a winning percentage, and then take that winning percentage and back into the initial weights.

Here is a table that walks through that process for a number of the leagues included.

Starting Elo Weight   League              GDI vs. Liga MX   Win % vs. Liga MX (Elo/Poisson)
1550                  Liga MX             0.0               50%
1425                  Ascenso MX          0.9               33%
1400                  MLS                 1.0               30%
1335                  Costa Rica          1.5               23%
1320                  Panama              1.6               21%
1310                  Honduras            1.7               20%
1300                  NASL                1.8               19%
1255                  Guatemala           2.1               16%
1185                  El Salvador         2.4               11%
1180                  USL                 2.4               11%
1150                  Nicaragua           2.7               9%
1020                  Trinidad & Tobago   3.1               5%
970                   Haiti               4.2               3%
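The Poisson-to-Elo step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the exact calculation behind the table: it treats the two teams' scores as independent Poisson variables, and the function names and the per-team goal rates in the usage lines are hypothetical (the post gives only the goal difference between leagues, not how the goals split between the two sides).

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of scoring exactly k goals when lam goals are expected."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def win_expectancy(goals_for: float, goals_against: float, max_goals: int = 20) -> float:
    """P(win) + 0.5 * P(draw), assuming independent Poisson scores.

    Scorelines above max_goals per team are ignored; their probability
    is negligible for realistic goal rates.
    """
    p_win = 0.0
    p_draw = 0.0
    for a in range(max_goals + 1):
        p_a = poisson_pmf(a, goals_for)
        for b in range(max_goals + 1):
            p_b = poisson_pmf(b, goals_against)
            if a > b:
                p_win += p_a * p_b
            elif a == b:
                p_draw += p_a * p_b
    return p_win + 0.5 * p_draw


def rating_gap(expectancy: float) -> float:
    """Invert W_e = 1 / (10^(-dr/400) + 1) to recover the rating gap dr."""
    return -400.0 * math.log10(1.0 / expectancy - 1.0)


# Hypothetical split of a 1.0-goal gap: underdog expects 1.0 goals, favorite 2.0.
underdog = win_expectancy(1.0, 2.0)
print(round(underdog, 3), round(rating_gap(underdog)))

# The 30% listed for MLS vs. Liga MX maps to a gap of about -147 points,
# in line with the 150-point gap between 1400 and 1550.
print(round(rating_gap(0.30), 1))
```

Counting a draw as half a win matches how the Elo expected result is defined later in this post, which is what lets the Poisson output be converted directly into a rating gap.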

The initial and game weights of the 40 leagues included are as follows:

Country - Division                            Elo Starting Weight   Elo Game Weight
Mexico - Liga MX                              1550                  30
Mexico - Ascenso MX                           1425                  26
USA & Canada - MLS                            1400                  25
Costa Rica - Primera                          1335                  23
Guatemala - Liga Nacional                     1255                  20
Panama - LPF                                  1320                  22
Honduras - Liga Nacional                      1310                  22
USA & Canada - NASL                           1300                  22
Costa Rica - Ascenso                          1110                  15
Trinidad & Tobago - T&T Pro League            1020                  12
El Salvador - Primera Division                1185                  18
Nicaragua - Primera Division                  1150                  17
USA & Canada - USL PRO                        1180                  18
Guatemala - Primera Division                  1030                  13
Jamaica - Premier League                      1000                  12
Puerto Rico - LNFPR                           1000                  12
Panama - Liga Nacional                        1095                  15
Haiti - Championnat National                  1000                  12
Guyana - Super League                         950                   10
Mexico - Segunda Division                     1000                  12
Antigua and Barbuda - Premier                 800                   5
Aruba - Division di Honor                     800                   5
Bahamas - Senior League                       800                   5
Barbados - Premier                            800                   5
Belize - Premier                              800                   5
Bermuda - Premier                             800                   5
British Virgin Islands - Premier              800                   5
Cayman Islands - Premier                      800                   5
Cuba - Primera                                800                   5
Turks and Caicos Islands - Football League    800                   5
Suriname - Hoofdklasse                        800                   5
St. Kitts and Nevis - Premier League          800                   5
Martinique - Division d'Honneur               800                   5
Guadeloupe - Division d'Honneur               800                   5
Grenada - Premier Division                    800                   5
French Guiana - Division d'Honneur            800                   5
Dominican Republic - Liga Mayor               800                   5
Dominica - Premiere League                    800                   5
Curaçao - Sekshon Paga                        800                   5
Suriname - Eerste Klasse                      770                   4

A couple of quick notes. Second divisions, with the exception of Ascenso MX, which had good data, were weighted 225 points lower at the start. The low score of 800 was based on the expected win percentage between Liga MX teams and these teams. The game weights were calculated by assuming that game weights range from 5 to 30 and initial weights range from 800 to 1550, with everything in between extrapolated. Suriname's 2nd division was placed below these levels. Also, for playoff games in each league, 10 points were added to the game weight.
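One way to read the note on game weights is as a straight-line mapping from starting weight to game weight, rounded to the nearest whole number. The sketch below assumes exactly that; the linear form and the function name are assumptions, although this mapping does appear to reproduce the game weights listed in the table, including the 4 for Suriname's second division.

```python
def game_weight(starting_weight: float, playoff: bool = False) -> int:
    """Map a league's starting Elo weight to a per-game weight K.

    Assumes a linear scale: 800 -> 5 and 1550 -> 30, rounded to the
    nearest integer, with 10 added for playoff games.
    """
    low_start, high_start = 800, 1550
    low_k, high_k = 5, 30
    slope = (high_k - low_k) / (high_start - low_start)
    k = round(low_k + (starting_weight - low_start) * slope)
    return k + 10 if playoff else k


print(game_weight(1550))                # Liga MX: 30
print(game_weight(1400))                # MLS: 25
print(game_weight(770))                 # Suriname Eerste Klasse: 4
print(game_weight(1400, playoff=True))  # MLS playoff game: 35
```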

This initial weight represents the value for R_o (below) for the team's first match in the database.

The Basic Calculation

The Elo system has one formula which takes into account the factors mentioned above.

The ratings are based on the following formula:

R_n = R_o + K*G (W - W_e)

Where:
R_n = The new team rating 
R_o = The old team rating 
K = Weight index regarding the tournament of the match 
G = A number from the index of goal differences 
W = The result of the match 
W_e = The expected result

Goal Differential = G

The number of goals is taken into account by use of a goal difference index. G is increased by 25% if a game is won by two goals. If the game is won by three or more goals, G is set by the calculation shown below:

If the game is a draw or is won by one goal

G = 1

If the game is won by two goals

G = 1.25

If the game is won by three or more goals

G = (11+N)/10

Where N is the goal difference
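The three cases above can be collected into one small function. This is a minimal sketch; the function name is made up for illustration, and it takes the absolute goal difference so the same value applies to both the winner and the loser.

```python
def goal_difference_index(goal_difference: int) -> float:
    """Return G for a match decided by the given goal difference."""
    n = abs(goal_difference)
    if n <= 1:
        return 1.0            # draw or one-goal win
    if n == 2:
        return 1.25           # two-goal win
    return (11 + n) / 10      # three or more goals


for n in range(6):
    print(n, goal_difference_index(n))
```

Under this scale a three-goal win gives G = 1.4 and each further goal adds 0.1, noticeably flatter than the 1.5 and 1.75 multipliers of the original adaptation.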

Result of the Match = W

W is the result of the game (1 for a win, 0.5 for a draw, and 0 for a loss).

Expected Result of Match = W_e

W_e is the expected result (win expectancy with a draw counting as 0.5) from the following formula:

W_e = 1 / (10^(-dr/400) + 1)

Where dr equals the difference in ratings plus 100 points for a team playing at home. So dr of 0 gives 0.5, of 120 gives 0.666 to the higher ranked team and 0.334 to the lower, and of 800 gives 0.99 to the higher ranked team and 0.01 to the lower.

This formula is calculated for each team in each game and the resulting score is carried forward to set the expectation for each team's next match.
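Putting the pieces together, a single rating update might look like the sketch below. The function and parameter names are hypothetical, and the home adjustment is passed in as +100 for the home side, -100 for the away side, or 0 for a neutral site (the neutral case is an assumption; the post only describes the home bonus).

```python
def expected_result(rating_diff: float) -> float:
    """W_e = 1 / (10^(-dr/400) + 1)."""
    return 1.0 / (10 ** (-rating_diff / 400) + 1)


def update_rating(old_rating: float, opponent_rating: float, k: float,
                  g: float, result: float, home_adjust: float = 0.0) -> float:
    """R_n = R_o + K * G * (W - W_e).

    result is 1 for a win, 0.5 for a draw, 0 for a loss.
    """
    dr = old_rating - opponent_rating + home_adjust
    return old_rating + k * g * (result - expected_result(dr))


# Two 1400-point MLS teams (K = 25); the home side wins by one goal (G = 1).
home_new = update_rating(1400, 1400, k=25, g=1.0, result=1.0, home_adjust=100)
away_new = update_rating(1400, 1400, k=25, g=1.0, result=0.0, home_adjust=-100)
print(round(home_new, 1), round(away_new, 1))  # 1409.0 1391.0
```

Because both sides use the same K and G here, the points the winner gains are exactly the points the loser gives up.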

Calculating the Probability of Game Outcomes

The expected result formula can be used to predict the outcome of games between two scored teams.

Take an average USL PRO team with 1100 points at home against an average MLS team with 1400 points. Add 100 to the USL PRO team for home field advantage.

To calculate the USL PRO team's odds of winning (with a draw being worth 0.5 points), use the formula above:

W_e = 1 / (10^(-dr/400) + 1)

The dr for the USL PRO team is 1200-1400 = -200. Plugging -200 into the equation yields a result of 24%. Plugging in 200 would yield the result of 76%, the odds of winning for the MLS side.
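The same arithmetic can be checked in a few lines. This is a small sketch with a made-up helper name; it applies the 100-point home bonus and returns the expected result for both sides.

```python
def expected_result(rating_diff: float) -> float:
    """W_e = 1 / (10^(-dr/400) + 1)."""
    return 1.0 / (10 ** (-rating_diff / 400) + 1)


def match_expectancies(home_rating: float, away_rating: float,
                       home_advantage: float = 100.0) -> tuple[float, float]:
    """Return (home W_e, away W_e) after adding the home bonus."""
    dr_home = home_rating + home_advantage - away_rating
    return expected_result(dr_home), expected_result(-dr_home)


usl_pro, mls = match_expectancies(1100, 1400)
print(f"{usl_pro:.0%} {mls:.0%}")  # 24% 76%
```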