Rating List Experiment

Subject: Rating List Experiment Thu Jan 04, 2024 8:19 pm

Code::

    # PLAYER                : RATING  ERROR   POINTS  PLAYED  (%)  CFS(%)      W      D      L  D(%)  OppAvg
    1 Stockfish 16 230630   : 3707.7    3.1  10733.0   15000   72     100   7197   7072    731    47  3486.2
    2 Torch 1 popavx2       : 3659.2    2.9  10020.0   15000   67     100   6438   7164   1398    48  3489.4
    3 KomodoDragon 3.3 avx2 : 3618.0    3.5   9389.0   15000   63     100   5753   7272   1975    48  3492.1
    4 Berserk 12 avx2       : 3587.0    3.4   8903.5   15000   59     100   5278   7251   2471    48  3494.2
    5 Ethereal 14.25 nnue   : 3514.4    3.5   7742.5   15000   52     100   3970   7545   3485    50  3499.0
    6 Caissa 1.15 avx2      : 3479.8    3.3   7186.5   15000   48     100   3396   7581   4023    51  3501.3
    7 RubiChess 230918 avx2 : 3472.1    3.2   7064.0   15000   47     100   3321   7486   4193    50  3501.9
    8 CSTal 2.0 avx2        : 3436.2    3.7   6492.0   15000   43      99   2735   7514   4751    50  3504.3
    9 Obsidian 9.0 avx2     : 3431.7    2.1   6420.0   15000   43     100   2727   7386   4887    49  3504.6
   10 Clover 6.1 avx2       : 3425.9    3.6   6329.0   15000   42     100   2577   7504   4919    50  3504.9
   11 Koivisto 9.2 avx2     : 3414.4    1.9   6148.0   15000   41     100   2428   7440   5132    50  3505.7
   12 Rebel EAS avx2        : 3392.0    4.5   5798.0   15000   39      99   2145   7306   5549    49  3507.2
   13 Seer 2.7.0 avx2       : 3385.7    3.4   5701.0   15000   38      81   2111   7180   5709    48  3507.6
   14 RofChade 3.1 avx2     : 3384.0    3.0   5674.5   15000   38     100   2077   7195   5728    48  3507.7
   15 Uralochka 3.40a avx2  : 3374.3    1.5   5525.0   15000   37     ---   1968   7114   5918    47  3508.4

This is Stefan Pohl's rating list, computed with Ordo using ``-a 3500`` (the average rating of the pool is set to 3500).

When won games are counted as 2 points instead of one, we get -

Code::

    # PLAYER                : RATING  ERROR   POINTS  PLAYED  (%)  CFS(%)      W      D      L  D(%)
    1 Stockfish 16 230630   : 3763.6    3.5  17930.0   22928   78     100  14394   7072   1462    31
    2 Torch 1 popavx2       : 3699.7    3.8  16458.0   22836   72     100  12876   7164   2796    31
    3 KomodoDragon 3.3 avx2 : 3647.6    3.2  15142.0   22728   67     100  11506   7272   3950    32
    4 Berserk 12 avx2       : 3610.1    2.3  14181.5   22749   62     100  10556   7251   4942    32
    5 Ethereal 14.25 nnue   : 3514.8    3.1  11712.5   22455   52     100   7940   7545   6970    34
    6 Caissa 1.15 avx2      : 3475.7    3.1  10582.5   22419   47     100   6792   7581   8046    34
    7 RubiChess 230918 avx2 : 3459.7    3.3  10385.0   22514   46     100   6642   7486   8386    33
    8 CSTal 2.0 avx2        : 3420.1    2.2   9227.0   22486   41     100   5470   7514   9502    33
    9 Clover 6.1 avx2       : 3413.2    3.4   8906.0   22496   40      70   5154   7504   9838    33
   10 Obsidian 9.0 avx2     : 3411.7    4.4   9147.0   22614   40     100   5454   7386   9774    33
   11 Koivisto 9.2 avx2     : 3395.3    3.0   8576.0   22560   38     100   4856   7440  10264    33
   12 Rebel EAS avx2        : 3359.2    3.5   7943.0   22694   35      99   4290   7306  11098    32
   13 Seer 2.7.0 avx2       : 3353.2    3.5   7812.0   22820   34      89   4222   7180  11418    31
   14 RofChade 3.1 avx2     : 3350.4    4.3   7751.5   22805   34      73   4154   7195  11456    32
   15 Uralochka 3.40a avx2  : 3349.0    3.8   7493.0   22886   33     ---   3936   7114  11836    31
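Judging from the numbers (not stated in the post), the DOUBLE table seems to have been produced by entering every decisive game twice: W and L double while D stays the same (Stockfish: 7197 wins become 14394, 731 losses become 1462). A minimal sketch under that assumption, with hypothetical helper names, that reproduces the POINTS, PLAYED and (%) columns from W/D/L; the ratings themselves would still require an Ordo run on the rescored games:

```python
def points(w: int, d: int) -> float:
    """Score in the usual convention: win = 1, draw = 0.5, loss = 0."""
    return w + d / 2


def double_wins(w: int, d: int, l: int) -> tuple[int, int, int]:
    """Assumed rescoring: every decisive game is entered twice, draws once."""
    return 2 * w, d, 2 * l


def summary(w: int, d: int, l: int) -> tuple[float, int, int]:
    """Return (POINTS, PLAYED, rounded percentage) as in the tables above."""
    played = w + d + l
    pts = points(w, d)
    return pts, played, round(100 * pts / played)


# Stockfish 16 230630 from the normal list
w, d, l = 7197, 7072, 731
print(summary(w, d, l))                # (10733.0, 15000, 72)
print(summary(*double_wins(w, d, l)))  # (17930.0, 22928, 78)
```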

For visual comparison -

Code::

    NORMAL                            |   DOUBLE
    # PLAYER                : RATING  |   # PLAYER                : RATING
    1 Stockfish 16 230630   : 3707.7  |   1 Stockfish 16 230630   : 3763.6
    2 Torch 1 popavx2       : 3659.2  |   2 Torch 1 popavx2       : 3699.7
    3 KomodoDragon 3.3 avx2 : 3618.0  |   3 KomodoDragon 3.3 avx2 : 3647.6
    4 Berserk 12 avx2       : 3587.0  |   4 Berserk 12 avx2       : 3610.1
    5 Ethereal 14.25 nnue   : 3514.4  |   5 Ethereal 14.25 nnue   : 3514.8
    6 Caissa 1.15 avx2      : 3479.8  |   6 Caissa 1.15 avx2      : 3475.7
    7 RubiChess 230918 avx2 : 3472.1  |   7 RubiChess 230918 avx2 : 3459.7
    8 CSTal 2.0 avx2        : 3436.2  |   8 CSTal 2.0 avx2        : 3420.1
    9 Obsidian 9.0 avx2     : 3431.7  |   9 Clover 6.1 avx2       : 3413.2
   10 Clover 6.1 avx2       : 3425.9  |  10 Obsidian 9.0 avx2     : 3411.7
   11 Koivisto 9.2 avx2     : 3414.4  |  11 Koivisto 9.2 avx2     : 3395.3
   12 Rebel EAS avx2        : 3392.0  |  12 Rebel EAS avx2        : 3359.2
   13 Seer 2.7.0 avx2       : 3385.7  |  13 Seer 2.7.0 avx2       : 3353.2
   14 RofChade 3.1 avx2     : 3384.0  |  14 RofChade 3.1 avx2     : 3350.4
   15 Uralochka 3.40a avx2  : 3374.3  |  15 Uralochka 3.40a avx2  : 3349.0

We see -

1. The order hasn't changed, except for Clover and Obsidian, which swapped places 9 and 10.

2. SF has increased its Elo advantage over Torch from about 49 Elo (3707.7 - 3659.2) to about 64 (3763.6 - 3699.7).

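3. The top 4 profit: their Elo goes up (by 23 to 56). Ethereal in 5th place is practically unchanged (+0.4), and from place 6 onward Elo goes down, by between 4 (Caissa) and 34 Elo (RofChade).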
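These observations can be checked directly from the two rating columns. A minimal sketch (ratings copied from the lists above; the helper names `deltas`, `rank_swaps` and `gap` are hypothetical) that computes each engine's rating change, the rank swaps and the Stockfish-Torch gap:

```python
# Ratings copied from the NORMAL and DOUBLE lists, in rank order.
NORMAL = [
    ("Stockfish 16 230630", 3707.7), ("Torch 1 popavx2", 3659.2),
    ("KomodoDragon 3.3 avx2", 3618.0), ("Berserk 12 avx2", 3587.0),
    ("Ethereal 14.25 nnue", 3514.4), ("Caissa 1.15 avx2", 3479.8),
    ("RubiChess 230918 avx2", 3472.1), ("CSTal 2.0 avx2", 3436.2),
    ("Obsidian 9.0 avx2", 3431.7), ("Clover 6.1 avx2", 3425.9),
    ("Koivisto 9.2 avx2", 3414.4), ("Rebel EAS avx2", 3392.0),
    ("Seer 2.7.0 avx2", 3385.7), ("RofChade 3.1 avx2", 3384.0),
    ("Uralochka 3.40a avx2", 3374.3),
]
DOUBLE = [
    ("Stockfish 16 230630", 3763.6), ("Torch 1 popavx2", 3699.7),
    ("KomodoDragon 3.3 avx2", 3647.6), ("Berserk 12 avx2", 3610.1),
    ("Ethereal 14.25 nnue", 3514.8), ("Caissa 1.15 avx2", 3475.7),
    ("RubiChess 230918 avx2", 3459.7), ("CSTal 2.0 avx2", 3420.1),
    ("Clover 6.1 avx2", 3413.2), ("Obsidian 9.0 avx2", 3411.7),
    ("Koivisto 9.2 avx2", 3395.3), ("Rebel EAS avx2", 3359.2),
    ("Seer 2.7.0 avx2", 3353.2), ("RofChade 3.1 avx2", 3350.4),
    ("Uralochka 3.40a avx2", 3349.0),
]


def deltas(before, after):
    """Rating change per engine (after minus before), rounded to 0.1 Elo."""
    new = dict(after)
    return {name: round(new[name] - old, 1) for name, old in before}


def rank_swaps(before, after):
    """Engines whose rank differs between the two lists: name -> (old, new)."""
    old = {name: i for i, (name, _) in enumerate(before, start=1)}
    new = {name: i for i, (name, _) in enumerate(after, start=1)}
    return {n: (old[n], new[n]) for n in old if old[n] != new[n]}


def gap(table, a, b):
    """Rating difference a - b within one list, rounded to 0.1 Elo."""
    ratings = dict(table)
    return round(ratings[a] - ratings[b], 1)


for name, change in deltas(NORMAL, DOUBLE).items():
    print(f"{name:22} {change:+6.1f}")
print(rank_swaps(NORMAL, DOUBLE))
sf, torch = "Stockfish 16 230630", "Torch 1 popavx2"
print(gap(NORMAL, sf, torch), "->", gap(DOUBLE, sf, torch))  # 48.5 -> 63.9
```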

Looking at the LOSS statistic, I have no problem with such a change (wins counting 2 points).

Code::

    # PLAYER                :  LOSS
    1 Stockfish 16 230630   :   731
    2 Torch 1 popavx2       :  1398
    3 KomodoDragon 3.3 avx2 :  1975
    4 Berserk 12 avx2       :  2471
    5 Ethereal 14.25 nnue   :  3485
    6 Caissa 1.15 avx2      :  4023
    7 RubiChess 230918 avx2 :  4193
    8 CSTal 2.0 avx2        :  4751
    9 Obsidian 9.0 avx2     :  4887
   10 Clover 6.1 avx2       :  4919
   11 Koivisto 9.2 avx2     :  5132
   12 Rebel EAS avx2        :  5549
   13 Seer 2.7.0 avx2       :  5709
   14 RofChade 3.1 avx2     :  5728
   15 Uralochka 3.40a avx2  :  5918

Incredible....
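Since every engine played 15000 games in the normal list, the LOSS column converts directly into a loss rate. A small sketch with the values copied from the table (the helper name `loss_rate` is hypothetical):

```python
GAMES = 15000  # every engine played 15000 games in the normal list

LOSSES = {
    "Stockfish 16 230630": 731, "Torch 1 popavx2": 1398,
    "KomodoDragon 3.3 avx2": 1975, "Berserk 12 avx2": 2471,
    "Ethereal 14.25 nnue": 3485, "Caissa 1.15 avx2": 4023,
    "RubiChess 230918 avx2": 4193, "CSTal 2.0 avx2": 4751,
    "Obsidian 9.0 avx2": 4887, "Clover 6.1 avx2": 4919,
    "Koivisto 9.2 avx2": 5132, "Rebel EAS avx2": 5549,
    "Seer 2.7.0 avx2": 5709, "RofChade 3.1 avx2": 5728,
    "Uralochka 3.40a avx2": 5918,
}


def loss_rate(losses: int, games: int = GAMES) -> float:
    """Percentage of games lost, rounded to two decimals."""
    return round(100 * losses / games, 2)


for name, losses in LOSSES.items():
    print(f"{name:22} {loss_rate(losses):6.2f} %")
```

Stockfish loses 4.87% of its games, Uralochka 39.45%.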

Posts : 222 Join date : 2021-08-28

Personally, I don't think very much of this new statistic, which in my opinion only serves to measure any existing Elo progress in current Stockfish versions. This is just my personal opinion.

Subject: Re: Rating List Experiment Thu Jan 04, 2024 11:33 pm

You hit the nail on the head.

http://www.cegt.net/40_40%20Rating%20List/40_40%20SingleVersion/rangliste.html
https://www.computerchess.org.uk/ccrl/4040/index.html

It can't be that SF is only 6 or 12 Elo ahead on these lists when it has won TCEC the last 7 times in a row. The difference must be much larger.

I know playing from balanced positions plays a role in this matter.
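For scale, a sketch using the textbook logistic Elo formula with the 400-point scale (an assumption: the rating lists may use a slightly different scale) shows how small a 6 or 12 Elo edge is per game:

```python
def expected_score(elo_diff: float) -> float:
    """Expected score per game for the side that is elo_diff points stronger,
    using the standard logistic Elo model with a 400-point scale."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


for diff in (6, 12):
    print(f"{diff:3d} Elo -> {100 * expected_score(diff):.1f} %")
```

This prints roughly 50.9% for 6 Elo and 51.7% for 12 Elo.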

Posts : 159 Join date : 2022-03-01 Location : Berlin

Admin wrote:

I have no problem with such a change (wins are 2 points) looking at the LOSS statistic.

Incredible....

In my AntiDraw-openings download, you will find 2 tools for Armageddon rescoring and Advanced Armageddon rescoring of PGN files:
https://www.sp-cc.de/files/antidraw_v2.1.7z

(classical Armageddon-Chess):
Win for white = 1 point for white
Draw = 1 point for black
Win for black = 1 point for black

(advanced Armageddon-Chess):
Win for white = 1 point for white
Draw = 1 point for black
Win for black = 2 points for black (!!!) (especially useful/interesting, of course, when using white-biased unbalanced openings like my UHO-openings)
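The two scoring rules above fit in a few lines. This is a hypothetical sketch of the scoring only, not the code of the tools in the AntiDraw download; it takes standard PGN result strings ("1-0", "1/2-1/2", "0-1") and returns (white_points, black_points):

```python
from enum import Enum


class Result(Enum):
    """Standard PGN game results."""
    WHITE_WIN = "1-0"
    DRAW = "1/2-1/2"
    BLACK_WIN = "0-1"


def armageddon_points(result: Result, advanced: bool = False) -> tuple[int, int]:
    """Return (white_points, black_points) for one game.

    Classical Armageddon: a draw counts as 1 point for Black.
    Advanced Armageddon: additionally, a Black win is worth 2 points.
    """
    if result is Result.WHITE_WIN:
        return (1, 0)
    if result is Result.DRAW:
        return (0, 1)
    return (0, 2) if advanced else (0, 1)


def tally(results: list[str], advanced: bool = False) -> tuple[int, int]:
    """Sum the points over PGN result strings; '*' raises ValueError."""
    white = black = 0
    for text in results:
        w, b = armageddon_points(Result(text), advanced)
        white += w
        black += b
    return white, black


games = ["1-0", "1/2-1/2", "0-1", "1-0"]
print(tally(games))                 # (2, 2)
print(tally(games, advanced=True))  # (2, 3)
```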

Subject: Re: Rating List Experiment Mon Jan 08, 2024 2:10 pm

Interesting.....

Posts : 207 Join date : 2020-11-28

Admin wrote:

I know playing from balanced positions plays a role in this matter.

A rating list with a balanced book is different from a rating list with an unbalanced book.

It is even possible that a different engine is best under each setting, so comparing CEGT with TCEC is comparing apples with oranges.

The Stockfish team does not care about being best with a balanced book, so hopefully we will see some engine rated higher than Stockfish in CEGT or CCRL because it is better at beating weaker engines, even while drawing 100% of its games against Stockfish.

Posts : 159 Join date : 2022-03-01 Location : Berlin

Uri Blass wrote: Stockfish team does not care about being best with a balanced book

Of course not. Why should they? All engine tournaments with top engines are played with my UHO-openings (the chess.com engine tournament site) or similar biased openings (TCEC). Balanced openings are useless for high-end computer chess these days. And in the future, balanced openings will be useless for computer chess altogether. Just a question of time.

Thankfully, in contrast to you, the Stockfish-team looks forward into the future, not back into the past.

Posts : 207 Join date : 2020-11-28

pohl4711 wrote:

Thankfully, in contrast to you, the Stockfish-team looks forward into the future, not back into the past.

When I use an engine to analyze my games, I want it to help me understand the best move in equal positions too, meaning the move that gives me the best chance to win against weaker opponents (otherwise I can play games with no tactical mistakes and still learn almost nothing from them).

You can say that the best move for increasing winning chances against humans is not the same as the best move for beating weaker engines, but optimizing for beating weaker engines is at least better than ignoring the goal of beating weaker players altogether.

Note that my last OTB tournament game had no tactical blunders, and Stockfish's evaluation never left the draw zone.
There were some inaccuracies, but the main reason for the draw is not that we are very strong players; it is that both of us were more afraid of losing than eager to win.

The game is from the following tournament:
https://chess-results.com/tnr873644.aspx?lan=1&art=2&rd=3
I was White against Yonatan Veber.

For the record, here is the game.

I started with 1.e3 to surprise my opponent.
It seems that my opponent was also wary of opening preparation, so the game started 1.e3 Nf6 2.Nf3 e6 (in any case, 1.e3 is not a bad move, and it is not the reason I failed to win this game).

https://lichess.org/EcDDucHj

During the game I regretted not playing 11.Ne4, and I realized 36.Nc4 was bad after 36...Nxc4 37.Rxc4 a5, but these moves do not take the evaluation out of the draw zone.

I guess that engine analysis would claim both sides played at GM level based on the average blunder, but I think average blunder is a poor way to evaluate a player's level.

Posts : 131 Join date : 2020-11-20

We probably have to look at the game of chess a bit differently IMHO. I am not a programmer, so please read with a friendly eye.
We are used to thinking of chess as a zero-sum game with perfect information.
But in reality, a game like NLH Poker (where I certainly reached a *way* higher playing level than in chess) might be a more interesting general model. And even there, computers rule - so this is clearly doable for a computer program.
I don't want to bore anyone with poker theory, but translated to chess and greatly simplified, you may be interested in the expected value of a chess move against a given opponent, including the concept of bluffs.
It is my impression that this is where top-level human chess is currently heading. A player like Alireza Firouzja is taking huge risks (currently in bad form, but still special). We don't want to play a move that risks losing a piece or getting mated - but we are ready to take a risk.
A program that worked somehow like this would be useful for you, Uri, as long as its moves were good enough not to be punished more often than is bearable at a given level.
