The Elo Rating System for Chess and Beyond

The Elo rating system is a method for rating players in chess and other competitive games. A new player starts with a rating of 1000. This rating goes up when they win games and down when they lose. Over time a player’s rating becomes a true reflection of their ability – relative to the population.

My video was mostly based on A Comprehensive Guide to Chess Ratings by Prof Mark E Glickman.

Below are some of the things I wanted to talk about, but cut so the video wasn’t too long!

Some explanations of the Elo rating system say it is based on the normal distribution, which is not quite true. Elo’s original idea did model each player’s ability as a normal distribution. The difference between the two players’ strengths would then also be a normal distribution. However, the formula for a normal distribution is a bit messy, so today it is preferred to model each player using an extreme value distribution. The difference between the two players’ strengths is then a logistic distribution. This has the property that a player rated 400 points above another is 10 times more likely to win than their opponent, which makes the formula nicer to use. Practically, the difference between a logistic distribution and the normal distribution is small.

Logistic distribution on Wikipedia
In the cdf we replace e with base 10 and set s=400, mu=R_A – R_B and x=0.
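
As a rough sketch of that substitution in Python: the function names `logistic_cdf` and `win_probability` are made up for illustration. Read literally, putting x=0 and mu=R_A – R_B into the cdf gives the chance that the difference in performance falls below zero, which is player B's side of things, so player A's chance is taken here as one minus that value. Draws are ignored.

```python
def logistic_cdf(x, mu, s, base=10.0):
    """Logistic cdf with e swapped for another base (10 in the Elo case)."""
    return 1.0 / (1.0 + base ** (-(x - mu) / s))


def win_probability(r_a, r_b):
    """Chance that player A beats player B, ignoring draws."""
    # x = 0 and mu = r_a - r_b gives the chance the difference is below zero,
    # i.e. that B comes out ahead. A's chance is the complement.
    p_b = logistic_cdf(0.0, mu=r_a - r_b, s=400.0)
    return 1.0 - p_b


p = win_probability(1400, 1000)
print(p)            # chance the higher rated player wins
print(p / (1 - p))  # odds in favour of the higher rated player
```

With a 400 point gap the odds come out at 10 to 1, which is a probability of 10/11 rather than 10/10.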

In the video I said that your rating can increase or decrease by a maximum of 32 points, and that there was no special reason for that number. This value is called the K-factor, and the higher the K-factor the more weight you give to the player’s tournament performance (and so less weight to their pre-tournament rating). High-level chess tournaments use a K-factor of 16, as it is believed the players’ pre-tournament ratings are about right, so their ratings will not fluctuate as much. Some tournaments use different K-factors.
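
To see how the K-factor caps the change, here is a minimal sketch of the usual single-game update, new rating = old rating + K × (score − expected score). The function names are made up for illustration, and `score` is assumed to be 1 for a win, 0 for a loss and 0.5 for a draw.

```python
def expected_score(rating, opponent):
    """Expected score from the base-10 logistic curve with a 400 point scale."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400))


def update_rating(rating, opponent, score, k=32):
    """Rating after one game; score is 1 (win), 0.5 (draw) or 0 (loss)."""
    return rating + k * (score - expected_score(rating, opponent))


print(update_rating(1000, 1000, 1))        # win against an equal player
print(update_rating(1000, 1000, 1, k=16))  # same game with a smaller K-factor
print(update_rating(1000, 1400, 1))        # upset win against a stronger player
```

Because the score and the expected score both sit between 0 and 1, the change can never exceed K points in either direction. A win against an equally rated player moves the rating by exactly half of K.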

In the original Elo system, draws are not modelled separately. Instead, a draw is considered to be equivalent to half a win and half a loss. The paper by Mark Glickman above contains a formula that includes draws. Similarly, the paper contains a formula that includes the advantage of playing white.

Another criticism of Elo is the reliability of the rating. The rating of an infrequent player is a less reliable measure of that player’s strength, so to address this problem Mark Glickman devised Glicko and Glicko2. See descriptions of these methods at

On the plus side, the Elo system was leagues ahead of what it replaced, known as the Harkness system. I originally intended to explain the Harkness system as well, so here are the paragraphs I cut:

“In the Harkness system an average was taken of everyone’s rating, then at the end of the tournament if the percentage of games you won was 50% then your new rating was the average rating.
If you did better or worse than 50% then 10 points were added to or subtracted from the average rating for every percentage point above or below 50%.
This system was not the best and could produce some strange results. For example, it was possible for a player to lose every game and still gain points.”
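
The strange result is easy to reproduce with a small sketch of the rule as described in those paragraphs. The function name is made up, the average is assumed to be the average rating of the tournament field, and draws are left out.

```python
def harkness_rating(average_rating, wins, games):
    """New rating under the rule described above (draws ignored)."""
    percentage = 100.0 * wins / games
    # 10 points for every percentage point above or below 50%
    return average_rating + 10.0 * (percentage - 50.0)


# A player rated 1000 enters a tournament where the average rating is 1600
# and loses all 10 games.
old_rating = 1000
new_rating = harkness_rating(1600, wins=0, games=10)
print(old_rating, new_rating)
```

Losing every game puts the player 500 points below the field average, which here is 1100, so a player who came in at 1000 still gains 100 points.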

This video was suggested by Outray Chess. The maths is a bit harder, but I liked the idea so I made an in-front-of-a-wall video.

500 Comments

  1. Bless you honestly, i had a team of designers who decided it good to copy paste elo from website without understanding which cost alot of problems for us.
    Thanks to that video i can fix this problem and made it better system

  2. so a 2400 difference in elo rating means that the weaker player will only win one game out of 1,000,000?

  3. I'm a data scientist and I've always found the idea of generating labels for subjective qualities by getting people to just "choose the better one" and assigning elo scores a fascinating idea.

  4. I hate math
    It's corse and rough and its Everywhere

  5. I need a separate Elo rating for sober playing vs high af or an equation to factor it all in

  6. I still don't get how you add and subtract scores. I want to do elo ratings with just pen and paper.

  7. Tekken 7 is a fighting game and they just announced a new rating system for Season 4. Some people suspect it would be a ELO rating system. I hope so this sounds like a fantastic idea. Especially matching against opponents in your current level.

  8. That system is basically the same principle that's used for Q-learning (a particular algorithm to train AIs).

    Your get Rewards for certain events (in here win, loss, draw) and update a score based on that reward.

    It's just that in Q-Learning it's not players getting the scores but pairs of a states and possible actions. i.e. a chess board and a particular move.
    And the formulas are slightly different.

  9. Is there some sort of formula that can give the maximum elo rating based on the population, example, lets assume there was exactly 1,000,000,000 people who played chess, what is the limit the top player can have.

  10. I know this guy from the numberphile channel

  11. Has there been a change in Lichess blitz player pool since covid? My score has plummited while my rapid score has steadly increased

  12. "Each player brings a box of numbers, whoever pulls out the higher number wins." I can see that working as a TV format, FFS! people watch "Deal or no deal" and that has bugger all possible game play strategy.

  13. I legit thought someone had just copy-pasted a numberphile video on their channel. Had to go check the channel to see

  14. When my rating was brand new I gained over 40 points for a single game before, it depends on the coefficient

  15. English is not my native language and I really wanted to say that the way you express yourself as well as your accent are perfect for people like me ! Thank you for being that smooth in your speaking and clear in your explanations.

  16. After 2.34 I was baffled. Why 400? Why is 800 equal to 100 times more. Surely it’s 20 if 400 is ten times more as 40 times 20 equals 800. What does the zero represent on the curve?

  17. Although he was an American citizen he was born in Hungary thus he was Hungarian and not American. (A disappointed Hungarian viewer. 🙂 )

  18. And again YouTube recommendations brought us together.

  19. i plugged my rating against carlsens and it returned "LOL"

  20. Cool thing. Made me go down a rabbit hole in my field of work and how to apply this.

  21. Is 10 times more likely equal to 1000% more likely, that is, 11 times as likely?

  22. So if a player with a 1300 score gets brain damage the week before. They have ~20 games to lose b4 people catch on to his handicap.

  23. Hello creator of this video, I have a couple of questions. 1) Could you prove that, purely mathematically speaking, it makes no difference trying to raise your ELO by playing weak players rather than strong players? 2) Could you prove that the ELO rating gives the best results in terms of being stable when it should be, and being dynamic when it should be. I ask this because FIFA has a different rating system that can make undeserving national teams the best team in the world, and then bring them down in no time to place twenty. Same goes for tennis. 3) Can you prove that, purely mathematically speaking, the ELO system gives at any time the best predictability of the outcome (this is probably the same quiestion as 2 but expressed differently)

  24. Thoughts on Elo inflation?

    Let's say that 100 new players start, lose a bunch of games, get discouraged, leave forever. Those points are now in circulation even though the players aren't, artificially raising the score of every other player a tiny bit averaging out naturally over time. In this way 1000 would not be the average, just the starting score. Of course high ranking players can stop too but the fact of being high ranking means those players themselves are rarer.

  25. If Brick Top got into chess instead of bare-knuckle boxing and breeding pigs.

  26. So ELO assumes all have the same standard deviation on their performance. but in real life, I don't believe this would be true. i.e. David vs Goliath…. David gets help from God or a very lucky strike and wins. But his real probability of winning was zero except he won the lottery.

  27. Of course the real chess rating system is more complicated. They add in rating floors to prevent sandbagging (so if your rating has reached (e.g.) 1501 then no matter how much you lose it can never go below 1400). But then floors caused rating inflation so they introduced some other complexities to counter that. I'm not even sure what all the details are. It would have been interesting if the video covered all that too

  28. Plat on r6s 🤙🏾🤟🏽👳🏾‍♂️😩👧

  29. "Your rating is a measure of your ability relative to the population"
    That is an important factor to remember

  30. Why P(A)=1-P(B), where's the probability of draw?

  31. The important thing is not to “Turn to Stone”, avoid “Confusion” and “hold on tight” to as many pieces as possible.

  32. I had Watched the video before But after 1 year (recommendation) I see, it's numberphile guy

  33. so basically, this means that on chess.com, GM Hikaru has ~10000x chances of winning against me. That makes sense, yep.

  34. A question. How can P(B wins) be equal to 1 – P(A wins) in a game where there is a possibility of the game ends in a draw?
