The Elo Rating system is a method to rate players in chess and other competitive games. A new player starts with a rating of 1000. This rating will go up if they win games, and go down if they lose games. Over time a player’s rating becomes a true reflection of their ability – relative to the population.

My video was mostly based on A Comprehensive Guide to Chess Ratings by Prof. Mark E. Glickman.

Below are some of the things I wanted to talk about, but cut so the video wasn’t too long!

Some explanations of the Elo rating system say it is based on the normal distribution, which is not quite true. Elo’s original idea did model each player’s ability as a normal distribution, and the difference between the two players’ strengths would then also be normally distributed. However, the formula for the normal distribution is a bit messy, so today it is preferred to model each player using an extreme value distribution; the difference between the two players’ strengths is then a logistic distribution. This has the property that if a player’s rating is 400 points higher than another player’s, they are 10 times more likely to win, which makes the formula nicer to use. In practice, the difference between the logistic distribution and the normal distribution is small.

Logistic distribution on Wikipedia

In the CDF, we replace base e with base 10 and set s = 400, mu = R_A − R_B and x = 0.
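With those substitutions the CDF collapses to a one-line formula for player A's expected score. A minimal sketch in Python (the function name is my own):

```python
def expected_score(r_a, r_b):
    """Expected score for player A: the logistic CDF with base 10,
    s = 400 and mu = R_A - R_B, evaluated at x = 0."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

# A 400-point advantage gives 10:1 odds, i.e. an expected score of 10/11:
print(expected_score(1400, 1000))  # ~0.909
print(expected_score(1000, 1400))  # ~0.091 -- the two always sum to 1
```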

For the update formula I say that your rating can increase or decrease by a maximum of 32 points, and that there is no special reason for that value. This value is called the K-factor, and the higher the K-factor, the more weight you give to a player’s tournament performance (and so the less weight to their pre-tournament rating). High-level chess tournaments use a K-factor of 16, as it is believed the players’ pre-tournament ratings are about right, so their ratings will not fluctuate as much. Some tournaments use different K-factors.
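The update rule itself is R' = R + K(S − E), where S is the actual score (1 for a win, 0.5 for a draw, 0 for a loss) and E is the expected score from the logistic formula. A small sketch (function names my own) showing how the K-factor scales the swing:

```python
def expected_score(r_a, r_b):
    """Expected score for player A against player B."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def update_rating(rating, opponent, score, k=32):
    """New rating after one game; score is 1 (win), 0.5 (draw) or 0 (loss)."""
    return rating + k * (score - expected_score(rating, opponent))

# An upset transfers far more points than an expected result:
strong, weak = 2000, 1600
print(update_rating(strong, weak, 1))        # strong player gains ~2.9 points
print(update_rating(weak, strong, 1))        # an upset win gains the weak player ~29.1
print(update_rating(strong, weak, 1, k=16))  # K = 16 halves the swing
```

A draw between equally rated players leaves both ratings unchanged, since S = E = 0.5.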

In the original Elo system, draws are not included; instead they are considered equivalent to half a win and half a loss. The paper by Mark Glickman above contains a formula that includes draws, and similarly a formula that accounts for the advantage of playing White.

Another criticism of Elo is the reliability of the rating. The rating of an infrequent player is a less reliable measure of that player’s strength, so to address this problem Mark Glickman devised Glicko and Glicko2. See descriptions of these methods at

On the plus side, the Elo system was leagues ahead of what it replaced, known as the Harkness system. I originally intended to explain the Harkness system as well, so here are the paragraphs I cut:

“In the Harkness system an average was taken of everyone’s rating, then at the end of the tournament if the percentage of games you won was 50% then your new rating was the average rating.

If you did better or worse than 50% then 10 points were added to or subtracted from the average rating for every percentage point above or below 50.

This system was not the best and could produce some strange results. For example, it was possible for a player to lose every game and still gain points.”
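That quirk is easy to see with numbers. A minimal sketch of the quoted Harkness rule (assuming exactly the "average plus 10 points per percentage point" formula above):

```python
def harkness_new_rating(avg_rating, win_percentage):
    """Harkness: new rating = tournament average, shifted by 10 points
    for every percentage point above or below a 50% score."""
    return avg_rating + 10 * (win_percentage - 50)

# A 1000-rated player who loses every game in a strong tournament
# (average rating 1600) still ends up at 1100 -- a 100-point gain:
print(harkness_new_rating(1600, 0))   # 1100
print(harkness_new_rating(1600, 50))  # 1600, the tournament average
```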

This video was suggested by Outray Chess. The maths is a bit harder, but I liked the idea so I made an in-front-of-a-wall video.

1:08 you mean Hungarian physicist

intelligence follows this curve

Great

So the larger the population the higher the rating of the best player will be?

Basically we can assign elo ratings to many aspects of life…Interesting.

I would be curious to know how could this rating system scale to a multi-player game where each game can have more than two players and the outcome is a ranking like:

1: Alice

2: Bob

3: Charles

4: Dominic

I ran into this problem in the Android game that I develop. I created my own system, which, after a few iterations, seems to work well, but I would be curious to know if the Elo rating can apply!

For info, the idea of my rating system is the following:

A few criteria that I wanted were that the score should be somehow tangible. The unit should be meaningful, and not just an obscure number. Also, I wanted that not only the first in the ranking would win and the rest lose, but that the gains should be adjusted progressively depending on the rank.

For this, I decided that for a balanced match, each player in the ranking should receive a score equal to the number of players he defeated minus the number of players who defeated him.

In my ranking example above, Alice’s score would be 3-0=3, while Charles’s would be 1-2=-1 (he won against D. but lost against A and B)

So my scoring process presents two steps: the contribution, and the distribution.

The contribution part consists of having the n players each contribute (n-1) of their score to the common pot, which sums to n(n-1).

The distribution simply consists of giving 0 to the last player, 2 to the one above him, then 4 to the next, and so on up the ranking, until the first player receives 2(n-1). This all sums to n(n-1), the amount from the contribution part!

From each player's point of view, the sum of their contribution plus their distribution gain does match the pattern described above: the first player gets n-1 and the last gets -(n-1), which corresponds to the idea of gaining as many points as the number of people we defeated minus the number of people we lost against.
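A minimal Python sketch of the unweighted scheme described above (player names taken from the ranking example; the function name is mine):

```python
def balanced_gains(ranking):
    """Unweighted contribution/distribution scoring: each of the n players
    pays (n - 1) into a pot of n(n - 1); the pot is paid back out as
    0, 2, 4, ..., 2(n - 1) from last place up to first."""
    n = len(ranking)
    contribution = n - 1
    gains = {}
    for place, player in enumerate(ranking):  # place 0 = first
        distribution = 2 * (n - 1 - place)
        gains[player] = distribution - contribution
    return gains

# Net gain equals (players you beat) - (players who beat you):
print(balanced_gains(["Alice", "Bob", "Charles", "Dominic"]))
# {'Alice': 3, 'Bob': 1, 'Charles': -1, 'Dominic': -3}
```

The gains always sum to zero, like Elo point transfers in a two-player game.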

Now, in order to get the same property as the Elo rating, where the gains depend on the players’ ratings, the idea is to weight the contributions depending on each player’s score before the match, still aiming for a total contribution of n(n-1). This brings a bit of complexity, since a very good player might "pay" for everyone’s contribution and end up losing overall even if he came first in the ranking. This problem could be resolved by defining a maximum contribution with respect to the number of players.

Anyway, with this system of weighted contributions according to players’ scores and then the linear distribution, I could make a relatively fair competitive system with:

– fair repartition of the score across the ranking (not just the first one wins)

– winning against weaker players brings fewer points than winning against better opponents (and vice versa for losing)

– joining a large game with many participants presents higher risks (possibility to win more at once, but also to lose more)

Hope that was understandable 🙃

I would have enjoyed a bit more explanation regarding the ratings themselves. As I recall, and correct me if I'm wrong, the system is set so that the average player is rated at 1200 and since it follows a Gaussian distribution, each rating would also place you at an approximate percentile of skill.

very good explanation

I dont understand anything haha

Unless I'm missing something, the fact that wins/losses effectively transfer points from the loser to the winner would tend to result in the average number of points/player across all players drifting over time. If a grandmaster dies, they take their points with them. If a weak player starts with 1000, loses their first ten games, and then stops playing because they're not having fun, they leave those extra points in the system. Has anyone ever taken a good look at this effect? Have any studies been done?

Or is that just the source of the inflation/deflation you mention at the end, where 1800 this year isn't the same as 1800 next year or five years from now?

Go (a.k.a. Wei Chi / Baduk) has an established rating system based roughly on its handicap system, but I believe that many online servers nowadays use an Elo system and convert to kyu/dan for display.

Your shirt just suits you unbelievably fine!!

Need this video with glikos system!

Thanks

Understanding the rating system for chess is harder than chess

I'm rated 1300 and I'll tell you I feel like I have 0% of a chance to beat someone rated 2100.

When I play someone 300 points higher, I draw about every tenth game and win about one in every 40. Only once in 2k games have I beaten a guy 375 points higher than me. 1/10 seems too high for a 400-point difference, or was it just an example?

Wow, so Magnus Carlsen is a million times more likely to beat me… that seems wrong, I think it’s impossible

Elo hell is real!

I was hoping you would go into rating inflation. In principle, the average rating of the population should always be exactly 1,000, since every point that one player earns is lost by another. And if you include all ratings (including of inactive players), that is actually true (if you ignore some interruptions to the rating system, like when women got 100 free points in 1986). But there is a persistent claim that among the most elite players, ratings have gone up, even while their skill has not increased much relative to the general public. Whether or not this is true, it is at least possible. For instance, suppose only players with a rating 2000 or above are allowed to participate in most tournaments. Players with a true skill level somewhat below 2000 will sometimes be overrated and make it into these tournaments. On average, because they are overrated, they will donate points to other players until their rating drops back down to reflect their true skill. On the other hand, players that truly have a rating somewhat above 2000 will sometimes be underrated and unable to participate in these tournaments. But then they will just be taking points from lower rated tournaments until their rating rises back up to reflect their true skill. The overall result is a steady flow of rating points from weaker players to stronger players even if skill levels never change. Or at least, that's the idea.

What about draws in the formula?

It would be interesting to see what those methods are for normalizing this score over time.

I always thought Elo was an acronym.

An excellent video! I play a lot of Overwatch, which uses the Elo system, and I created another account to play with my buddies who weren’t quite at my skill level. I noticed that the Elo system in Overwatch is very harsh at first for new players, but gets softer over time. In Overwatch, once the harshness disperses, your Elo change for any given game is generally 15-30 (on a scale of 1 to 5000). I normally play at ~2400 to 2600. All my friends played at ~1100-1700. I noticed when I played on my new account the range was more like 85-115! I wonder how new players are calculated in other Elo systems like chess.

Then you must play chess on Lichess.org

"eduardo i need the algorhithm"

thought it said cheese…

Couldn't this system become flooded with "inflation points" of new people coming in, losing all their points, and then quitting?

Damn. I need to watch this another two times.

So the Elo system is more or less a tool against noobcrushing? I like that.

This is why gwent is bull shit you get roled even tho the other player has no skill. Pay to win

solid video. explained clearly.

I thought ELO was a band.

As the name suggests, Élő Árpád was Hungarian btw (but he lived in 'murica)

Since technically expected score is always less than 1, can't a player keep beating some low-rated player to keep increasing their score to infinity? No matter how high your rating is, there is always a probability you lose, not sure if the overall rating converges somewhere

DO a video on the other systems you mention at the end of the video please!

but what does the E, the L, and the O stand for

Good explanation. But I already knew how it worked because I watched a video on how Dota 2 rating works.

Is this why weaker players are known as bell-ends?

Great video btw.

Playing competitive Overwatch is the greatest proof against this system.

It's all bollocks.

elo is awful in online games because only a few bad matches with hackers or smurfs ruins your rank forever

Cool video…thanks. As a player on chess.com, I wondered exactly how that worked. Basically, I only have a 10% chance against someone with a rating 400 higher than mine. If the difference is 400, that player's odds of winning are a factor of 10 higher.

This is similar to the magnitude system used in astronomy. If a star is 5 magnitudes brighter then it is 100 times brighter.

1,000th comment

Oof, my English is bad, I thought this was an Elon Musk cheese rating

Is there a subtle fish eye filter on your head centered like between your eyes?

Elo was Hungarian, not American

No, Árpád wasn’t American; he was born in Hungary and moved to the US with his parents when he was 10 years old. Yes, he got US citizenship, but his origins are Hungarian.

How about elo rating inflation??

I can't figure if I know you from numberphile or if you're a Weasley. But either way I clicked…. wait, is it both?

Okay, so that means, that when a new player enters the sport, the total pool of points increases by 1000 and when a player retires the total amount decreases by their current rating. As a result, if a player retires with a ranking greater than 1000, then the total amount in play has decreased, and if they retire with fewer than 1000 points the total has increased.

So my question would be: has the total Elo pool (significantly) increased over time?

Fantastic explanation. Could you explain whether chess inflation is real, and if so, what causes it? Also, I think the elo system might better be replaced by a CAPS score in which accuracy against computer moves is measured. That would be more accurate but the problem is that chess engine strength also improves over time