The Elo Rating system is a method to rate players in chess and other competitive games. A new player starts with a rating of 1000. This rating will go up if they win games, and go down if they lose games. Over time a player’s rating becomes a true reflection of their ability – relative to the population.

My video was mostly based on A Comprehensive Guide to Chess Ratings by Prof Mark E Glickman.

Below are some of the things I wanted to talk about, but cut so the video wasn’t too long!

Some explanations of the Elo rating system say it is based on the normal distribution, which is not quite true. Elo’s original idea did model each player’s ability as a normal distribution. The difference between the two players’ strengths would then also be a normal distribution. However, the formula for a normal distribution is a bit messy, so today it is preferred to model each player using an extreme value distribution. The difference between the two players’ strengths is then a logistic distribution. This has the property that if a player has a rating 400 points more than another player, they are 10 times more likely to win than their opponent (odds of 10 to 1), which makes the formula nicer to use. Practically, the difference between a logistic distribution and the normal distribution is small.

Logistic distribution on Wikipedia

We replace e with base 10, s=400, mu=R_A – R_B and x=0 in the cdf.
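Written out, the logistic cdf is F(x) = 1 / (1 + e^(-(x - mu)/s)). With the substitutions above it becomes 1 / (1 + 10^((R_A - R_B)/400)). If the difference is taken as A's strength minus B's, that is the chance the difference falls below zero, i.e. B's expected score. A's expected score is the complement, 1 / (1 + 10^((R_B - R_A)/400)). Here is a minimal Python sketch of that formula; the function name expected_score is my own label, not something from the video or the paper.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B.

    Uses the logistic curve with base 10 and scale 400, so a 400-point
    rating advantage corresponds to odds of 10 to 1.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    # Equal ratings: each player is expected to score one half.
    print(expected_score(1500, 1500))
    # A 400-point favourite: odds of 10 to 1, an expected score of 10/11.
    print(expected_score(1900, 1500))
    # The two players' expected scores always add up to 1.
    print(expected_score(1900, 1500) + expected_score(1500, 1900))
```

Note that the expected score counts a draw as half a point, so it is not strictly the probability of a win.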

For the update formula I said that your rating can increase or decrease by a maximum of 32 points, and that there was no special reason for that value. This value is called the K-factor, and the higher the K-factor, the more weight you give to the player’s tournament performance (and so the less weight to their pre-tournament rating). High-level chess tournaments use a K-factor of 16, as the players’ pre-tournament ratings are believed to be about right, so their ratings will not fluctuate as much. Some tournaments use different K-factors.
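The update is usually written as new rating = old rating + K × (actual score - expected score), where a win scores 1, a draw 0.5 and a loss 0. The sketch below assumes that standard form; the names update_rating and k are illustrative choices of mine, and individual federations or sites may apply extra rules on top.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B (logistic curve, base 10, scale 400)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update_rating(rating: float, opponent_rating: float,
                  score: float, k: float = 32.0) -> float:
    """Return the new rating after one game.

    score is the actual result: 1.0 for a win, 0.5 for a draw, 0.0 for a loss.
    k is the K-factor; the rating can never move by more than k points.
    """
    return rating + k * (score - expected_score(rating, opponent_rating))


if __name__ == "__main__":
    # Beating an equally rated opponent with K = 32 gains 16 points.
    print(update_rating(1500, 1500, 1.0))
    # The same result with K = 16 gains only 8 points.
    print(update_rating(1500, 1500, 1.0, k=16.0))
    # Beating a much stronger opponent gains close to the full 32.
    print(update_rating(1500, 2300, 1.0))
```

With K = 32, the full 32 points is only approached when you beat someone rated far above you (or lose to someone rated far below you); against an equal opponent a win is worth 16.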

In the original Elo system, draws are not modelled separately; instead they are considered to be equivalent to half a win and half a loss. The paper by Mark Glickman above contains a formula that includes draws. Similarly, the paper contains a formula that includes the advantage of playing white.

Another criticism of Elo is the reliability of the rating. The rating of an infrequent player is a less reliable measure of that player’s strength, so to address this problem Mark Glickman devised Glicko and Glicko2. See descriptions of these methods at

On the plus side, the Elo system was leagues ahead of what it replaced, known as the Harkness system. I originally intended to explain the Harkness system as well, so here are the paragraphs I cut:

“In the Harkness system an average was taken of everyone’s rating. Then, at the end of the tournament, if the percentage of games you won was 50%, your new rating was the average rating.

If you did better or worse than 50%, then 10 points were added to or subtracted from the average rating for every percentage point above or below 50.

This system was not the best and could produce some strange results. For example, it was possible for a player to lose every game and still gain points.”
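To see how that strange result can happen, here is a small Python sketch of the rule exactly as described in the paragraphs above. It is a simplification under my own assumptions: the function name harkness_rating is made up, average_rating stands for the tournament average, and draws are ignored.

```python
def harkness_rating(average_rating: float, wins: int, games: int) -> float:
    """New rating under the simplified Harkness rule described above.

    Start from the average rating and add or subtract 10 points for every
    percentage point the win percentage is above or below 50%.
    """
    win_percentage = 100.0 * wins / games
    return average_rating + 10.0 * (win_percentage - 50.0)


if __name__ == "__main__":
    # Winning exactly half the games gives the average rating.
    print(harkness_rating(2000, 5, 10))
    # Losing every game gives the average minus 500 points.
    print(harkness_rating(2000, 0, 10))
```

So a player rated 1400 who enters a tournament with an average rating of 2000 and loses every game comes out at 1500, a gain of 100 points.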

This video was suggested by Outray Chess. The maths is a bit harder, but I liked the idea so I made an in-front-of-a-wall video.

Chess.com and lichess.org may not use the Elo formulas. I have noticed that beginners' ratings will often change by 50-100 points, whereas the ratings of people who have been on the website for a while tend to change by 5-8 points. So clearly N (total number of games played) is part of their formula.

I have also noticed that my lichess rating is ~300 points higher than my chess.com rating. This could be because of different populations, but it is also possible the two sites use different calculations.


You should take a look at Microsoft's TrueSkill algorithm. The Microsoft Research whitepaper goes into detail about the methodology that guided their decisions, and why Elo or modified Elo algorithms weren't sufficient for more than 2 players.

v1: https://www.microsoft.com/en-us/research/wp-content/uploads/2007/01/NIPS2006_0688.pdf

v2: https://www.microsoft.com/en-us/research/uploads/prod/2018/03/trueskill2.pdf

My rating on chess.com is around 1650 right now, which is in the top 10%. Hikaru Nakamura's rating is 3200 (he is on top). According to the formula, that means the odds of me winning are 0.000133334, or roughly 1 in 7,500. I already had so much respect for GMs like Naka but holy fuck that is humbling.

Apply that to Bobby Fischer, probably the greatest of all time, who at his peak was 125 points ahead of his nearest competitor, Boris Spassky, the biggest gap between top players ever. The odds of Bobby beating his nearest competitor were 67.2%. If you look at Magnus (2863), the best player currently, and his nearest competitor Fabi (2835), the odds of Magnus winning are about 54.0%.

Sequel on Glicko and Glicko-2 sometime?

If drawing is a possibility, how is the probability of winning for A equal to 1-P(B)? Doesn't this mean that the probability of a draw is 0?

Help. My Elo rating goes down sometimes when I beat a lower rank. Makes no sense.

Some people claim the system breaks down at rating differences of 200-300 and online people have been known to try and farm players that are 200-300 points lower. Does this actually work and what is the exact cause of this error?

Isn't this system flawed in a way? Say, for example, Carlsen is 2800+ and someone is rated 400. That's a difference of 2400, which in Elo means that Carlsen will win 10^6 games before the 400 guy wins 1. So if they play a million games, the other guy would win 1, but in reality he wouldn't, ever. Not in a trillion games.

3:43… so AlphaZero's supposed 3200 rating against Magnus Carlsen 2882, would come to 0.138… that's significantly better than I expected.

How come I played a game and my Elo rating increased by 7 while my opponent's rating dropped by 9?

Is no one else bothered by the fact that he called it a "logistic curve?" It's a logarithmic graph! Sorry, I couldn't get past this lol.

These days many chess sites like chess.com use Glicko or even Glicko-2.

As someone who plays Starcraft 2 to this day, I totally get how population changes Elo.

Very well explained. I wasn't aware that the measurement was relative to the population.

I developed PowerBase, a method of making power ratings and pointspreads for college basketball, with the Elo method, but I had to make some adjustments. K can have only one value because of margin of victory (it's like 0.65 or something), and I do "replays" (successive approximation) to do bonus or penalty points of retroactive ratings. Won a good deal of money on it.

There was a guy that was arguing with me that your actual live tournament rating on USCF or FIDE is the same as your rating would be on chess.com 😂😂😂. That’s why Magnus Carlsen’s rating online is like 500 points higher than his actual. 😂

Zero Probability events can occur.

