How accurate are’s new rating estimates?

Are the bots overrated? Underrated? Correctly rated? These are the questions we’re here to answer today, because I like watching make up numbers for my elo.

00:00: Game 1
02:26: Game 2
07:53: Game 3
19:19: Stockfishn’t

  1. 1150 is being very generous to Nelson. I have no difficulty destroying it as a 750.

  2. I see many claim that the bots are far weaker than their ratings would suggest. From what I gather, Alexander disagrees. What are other people's experiences with them?

  3. Your result is a bit different from mine. With the bot ratings it’s about the same as we can see in the video: sometimes a bit better, sometimes a bit lower than the bot’s rating. But the ratings I get are actually fairly consistent. It does not matter so much whether I play a strong bot or a weak bot; the rating I get is about the same, and it is not far off my actual OTB rating.

  4. 27:50 You never mentioned the possibility of Rh8#. It's the correct move over Qh8+ which does indeed let black escape temporarily and doesn't lead to immediate mate.

  5. Reason d5 is a good move is cause he takes, you take back with the bishop and he’s just losing the rook.

  6. Komodo level 25 is a 3050, not 3200. Magnus Carlsen drew you

  7. I wonder if part of the problem with your rating is that against the first two bots especially you played more casually, and therefore your rating might reasonably have been judged to be lower.
    I also wonder if there's a cap on the difference between the two numbers.
    I'd love to see more experiments with larger sample sizes, but this was a really fun video, thanks for the upload!

  8. they aren't. my peak rating was 300 (i am a newbie) and after i beat Nelson it told me that, apparently, I was an 1800.

  9. Nice video! I have wondered this for a long time, but never took the time to do this. Now I don't have to. Thanks.

  10. 1:30 how is that only an inaccuracy? He's giving up a queen and a knight for a rook.

  11. I did basically the same thing you did and got the same results. I think it only gives you a little credit for beating a 250, i.e. anyone can do it, so why would it rate you as 2000 when you could beat it as a 400? On the other end, I think it was highly inaccurate giving me a rating of 2300 for losing to the 3200 bot. So basically it estimated your strength based on how tough your opponent is. I guess the best true-Elo estimate would be against a bot with whom you win only half the games.
    My long game OTB Elo estimate by me is about 1600. My actual USCF Elo rating is about 1050, but I haven't played OTB in rated tournaments in 18 years, and have learned a lot since then. I regularly beat the 1800 bots, but assume their rating is inflated a bit compared to human Elo ratings.
    Good video, thanks. Subbed. Like your music too.

  12. idk why I was recommended this channel. I like your style, though!

  13. Well, you beat a 1900 bot. So you must be around 2000, is that your real rating?

  14. I just tested it. It said a 400 rated bot played like a 1650 (blundered queen on move 5) and i played like a 2200 (im 800 and was playing very carelessly on top of that) 😭

  15. You see, the chess content is where the views are at

  16. Amazing video! Honestly when this came into my recommendation I assumed you had 100k+ subs lmao. What’s your real rating though I’m curious?

  17. I've been spending days trying to outwit Komodo25 and still have not done it.

  18. The rating evaluator is all over the place. I am a 750 in speed chess and won against a 1300 bot, and it called me 2050, like ????

  19. Remember, the ratings are based on an average accuracy that the bots may be able to play at, plus or minus depending on circumstances: some moves are forced, others are not. The bots make moves that resemble those of a player at their level.

  20. You totally misrepresent what that accuracy elo is

  21. “He is a wall for a lot of beginner players, likes to play with his queen alot” 🤨🤨🤨

  22. I once played an 850 bot and it showed me that I'm a 2200 and the 850 bot is 2100

  23. I got 1800+ twice even though I'm like 800~ I think

  24. very fucking shitty. an evaluation of my mom's rating from some game she played (at that time like 240) said it thought her rating was like 2250

  25. The engine once estimated my ELO at 800, another two times it estimated me as 2450.

    I'm a 1400…

  26. Nice video! I can see you put a ton of effort into making it! I subbed to your channel xD

    10:00 In a position where you can't move the knight without it being captured like that, it's usually better not to move it until they threaten to capture it, because when you move it and they capture it you lose a tempo, especially when you captured the pawn back with the bishop instead of using a move to develop another piece.
    Of course the idea of doubling the opponent's pawns or messing up the structure is still great, and they aren't a super strong computer so it doesn't matter much, but just wanted to point this out to save you a tempo in the future (and also keep your pieces alive for as long as possible) xD

    I'm about halfway through the video, some things came up that I gotta do, but just wanted to mention that. Have a nice day 🙂

  27. i played a game with two brilliants and 2 inaccuracies and it gave me an estimate of 1250

  28. I’m 770 and it’s called me a 1550 so I don’t know what it’s thinking, it’s also called me a 100 so I think that might have something to do with why my overall rating is low. lol

  29. it is the most accurate when it rates you higher than you actually are

  30. I have had it rate ChatGPT-generated PGNs anything from 1600-2800.

  31. You laugh like I do. Might be hard to listen to, lol.

  32. This entire video is wrong because of a quote by Garry Kasparov: "chess strength in general and chess strength in a specific match are by no means the same thing". I love the content, but it's still wrong.

  33. I wonder if the rating estimate would be higher or lower if the bots played against themselves. Would Nelson vs Nelson result in a 1300 rating estimate? Personally I assume that it would be an almost 800 point game due to purposeful mistakes.

  34. I'm 750 rated and the two times I used it, I got an estimated rating of around 2300
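
A side note on the reasoning in comment 11 (beating a 250 says little, and a bot you beat about half the time tells you the most): this matches how the standard Elo expected-score formula behaves. The sketch below is only that textbook formula. It is not the site's game-review estimator, whose method is not described anywhere in this thread, and the function name `expected_score` is made up for illustration.

```python
def expected_score(player_rating: float, opponent_rating: float) -> float:
    """Standard Elo expected score for the player (win = 1, draw = 0.5, loss = 0).

    This is the textbook Elo formula, NOT the rating estimate shown after a
    bot game; that estimator's internals are an unknown here.
    """
    return 1.0 / (1.0 + 10 ** ((opponent_rating - player_rating) / 400.0))


if __name__ == "__main__":
    # How often should players of various strengths score against a 250-rated bot?
    for player in (400, 1000, 2000):
        print(player, round(expected_score(player, 250), 3))
```

Under this formula a 400 is already a clear favourite against a 250, and much stronger players are expected to win almost every game, so a single win barely separates them. When the two ratings are equal the expected score is exactly 0.5, which is where one result carries the most information.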

