ProDeo

Computer Chess
 

 

 Rating List Experiment

Admin


Posts : 2571
Join date : 2020-11-17
Location : Netherlands

Subject: Rating List Experiment, Thu Jan 04, 2024 8:19 pm

Code:
   # PLAYER                   :  RATING  ERROR   POINTS  PLAYED   (%)  CFS(%)     W     D     L  D(%)  OppAvg
   1 Stockfish 16 230630      :  3707.7    3.1  10733.0   15000    72     100  7197  7072   731    47  3486.2
   2 Torch 1 popavx2          :  3659.2    2.9  10020.0   15000    67     100  6438  7164  1398    48  3489.4
   3 KomodoDragon 3.3 avx2    :  3618.0    3.5   9389.0   15000    63     100  5753  7272  1975    48  3492.1
   4 Berserk 12 avx2          :  3587.0    3.4   8903.5   15000    59     100  5278  7251  2471    48  3494.2
   5 Ethereal 14.25 nnue      :  3514.4    3.5   7742.5   15000    52     100  3970  7545  3485    50  3499.0
   6 Caissa 1.15 avx2         :  3479.8    3.3   7186.5   15000    48     100  3396  7581  4023    51  3501.3
   7 RubiChess 230918 avx2    :  3472.1    3.2   7064.0   15000    47     100  3321  7486  4193    50  3501.9
   8 CSTal 2.0 avx2           :  3436.2    3.7   6492.0   15000    43      99  2735  7514  4751    50  3504.3
   9 Obsidian 9.0 avx2        :  3431.7    2.1   6420.0   15000    43     100  2727  7386  4887    49  3504.6
  10 Clover 6.1 avx2          :  3425.9    3.6   6329.0   15000    42     100  2577  7504  4919    50  3504.9
  11 Koivisto 9.2 avx2        :  3414.4    1.9   6148.0   15000    41     100  2428  7440  5132    50  3505.7
  12 Rebel EAS avx2           :  3392.0    4.5   5798.0   15000    39      99  2145  7306  5549    49  3507.2
  13 Seer 2.7.0 avx2          :  3385.7    3.4   5701.0   15000    38      81  2111  7180  5709    48  3507.6
  14 RofChade 3.1 avx2        :  3384.0    3.0   5674.5   15000    38     100  2077  7195  5728    48  3507.7
  15 Uralochka 3.40a avx2     :  3374.3    1.5   5525.0   15000    37     ---  1968  7114  5918    47  3508.4

This is Stefan's rating list, computed with ordo -a 3500 (pool average set to 3500).

When we count won games as 2 points instead of 1, we get -

Code:
   # PLAYER                   :  RATING  ERROR   POINTS  PLAYED   (%)  CFS(%)      W     D      L  D(%)
   1 Stockfish 16 230630      :  3763.6    3.5  17930.0   22928    78     100  14394  7072   1462    31
   2 Torch 1 popavx2          :  3699.7    3.8  16458.0   22836    72     100  12876  7164   2796    31
   3 KomodoDragon 3.3 avx2    :  3647.6    3.2  15142.0   22728    67     100  11506  7272   3950    32
   4 Berserk 12 avx2          :  3610.1    2.3  14181.5   22749    62     100  10556  7251   4942    32
   5 Ethereal 14.25 nnue      :  3514.8    3.1  11712.5   22455    52     100   7940  7545   6970    34
   6 Caissa 1.15 avx2         :  3475.7    3.1  10582.5   22419    47     100   6792  7581   8046    34
   7 RubiChess 230918 avx2    :  3459.7    3.3  10385.0   22514    46     100   6642  7486   8386    33
   8 CSTal 2.0 avx2           :  3420.1    2.2   9227.0   22486    41     100   5470  7514   9502    33
   9 Clover 6.1 avx2          :  3413.2    3.4   8906.0   22496    40      70   5154  7504   9838    33
  10 Obsidian 9.0 avx2        :  3411.7    4.4   9147.0   22614    40     100   5454  7386   9774    33
  11 Koivisto 9.2 avx2        :  3395.3    3.0   8576.0   22560    38     100   4856  7440  10264    33
  12 Rebel EAS avx2           :  3359.2    3.5   7943.0   22694    35      99   4290  7306  11098    32
  13 Seer 2.7.0 avx2          :  3353.2    3.5   7812.0   22820    34      89   4222  7180  11418    31
  14 RofChade 3.1 avx2        :  3350.4    4.3   7751.5   22805    34      73   4154  7195  11456    32
  15 Uralochka 3.40a avx2     :  3349.0    3.8   7493.0   22886    33     ---   3936  7114  11836    31
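Comparing the W, D and L columns of the two tables, the DOUBLE list appears to count every decisive game twice (W and L both double, D is unchanged), so a win is worth 2 points and a loss counts as two losses. A minimal sketch, assuming that interpretation, that reproduces the Stockfish row; Record and double_decisive are hypothetical names, not part of Ordo:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Win/draw/loss totals for one engine, as in the W, D, L columns."""
    wins: int
    draws: int
    losses: int

    @property
    def played(self) -> int:
        return self.wins + self.draws + self.losses

    @property
    def points(self) -> float:
        # Standard scoring: 1 point per win, 0.5 per draw, 0 per loss.
        return self.wins + 0.5 * self.draws

    @property
    def percent(self) -> float:
        return 100.0 * self.points / self.played


def double_decisive(r: Record) -> Record:
    """Hypothetical helper: count every decisive game twice and every draw once."""
    return Record(2 * r.wins, r.draws, 2 * r.losses)


stockfish = Record(wins=7197, draws=7072, losses=731)
doubled = double_decisive(stockfish)
print(stockfish.played, stockfish.points, round(stockfish.percent))
print(doubled.played, doubled.points, round(doubled.percent))
```

The same transformation matches the other rows too, e.g. Torch goes from 6438/7164/1398 to 12876/7164/2796.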

For visual comparison -

Code:
                                 NORMAL   |                                 DOUBLE
   # PLAYER                   :  RATING   |   # PLAYER                   :  RATING
   1 Stockfish 16 230630      :  3707.7   |   1 Stockfish 16 230630      :  3763.6
   2 Torch 1 popavx2          :  3659.2   |   2 Torch 1 popavx2          :  3699.7
   3 KomodoDragon 3.3 avx2    :  3618.0   |   3 KomodoDragon 3.3 avx2    :  3647.6
   4 Berserk 12 avx2          :  3587.0   |   4 Berserk 12 avx2          :  3610.1
   5 Ethereal 14.25 nnue      :  3514.4   |   5 Ethereal 14.25 nnue      :  3514.8
   6 Caissa 1.15 avx2         :  3479.8   |   6 Caissa 1.15 avx2         :  3475.7
   7 RubiChess 230918 avx2    :  3472.1   |   7 RubiChess 230918 avx2    :  3459.7
   8 CSTal 2.0 avx2           :  3436.2   |   8 CSTal 2.0 avx2           :  3420.1
   9 Obsidian 9.0 avx2        :  3431.7   |   9 Clover 6.1 avx2          :  3413.2
  10 Clover 6.1 avx2          :  3425.9   |  10 Obsidian 9.0 avx2        :  3411.7
  11 Koivisto 9.2 avx2        :  3414.4   |  11 Koivisto 9.2 avx2        :  3395.3
  12 Rebel EAS avx2           :  3392.0   |  12 Rebel EAS avx2           :  3359.2
  13 Seer 2.7.0 avx2          :  3385.7   |  13 Seer 2.7.0 avx2          :  3353.2
  14 RofChade 3.1 avx2        :  3384.0   |  14 RofChade 3.1 avx2        :  3350.4
  15 Uralochka 3.40a avx2     :  3374.3   |  15 Uralochka 3.40a avx2     :  3349.0

We see -

1. The order hasn't changed, except that Clover and Obsidian swapped places.

2. SF has increased its Elo advantage over Torch from about 49 Elo to 64.

3. The top 4 profit (their Elo goes up); from place 6 onward, Elo goes down significantly.
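Points 2 and 3 can be checked straight from the two rating columns. A small sketch that recomputes the Stockfish-Torch gap and each engine's change; with the ratings as printed, the normal-list gap works out to 48.5 Elo and the doubled one to 63.9. The variable and function names are my own:

```python
# (engine, NORMAL rating, DOUBLE rating), copied from the comparison table,
# listed in NORMAL-list order.
RATINGS = [
    ("Stockfish 16 230630", 3707.7, 3763.6),
    ("Torch 1 popavx2", 3659.2, 3699.7),
    ("KomodoDragon 3.3 avx2", 3618.0, 3647.6),
    ("Berserk 12 avx2", 3587.0, 3610.1),
    ("Ethereal 14.25 nnue", 3514.4, 3514.8),
    ("Caissa 1.15 avx2", 3479.8, 3475.7),
    ("RubiChess 230918 avx2", 3472.1, 3459.7),
    ("CSTal 2.0 avx2", 3436.2, 3420.1),
    ("Obsidian 9.0 avx2", 3431.7, 3411.7),
    ("Clover 6.1 avx2", 3425.9, 3413.2),
    ("Koivisto 9.2 avx2", 3414.4, 3395.3),
    ("Rebel EAS avx2", 3392.0, 3359.2),
    ("Seer 2.7.0 avx2", 3385.7, 3353.2),
    ("RofChade 3.1 avx2", 3384.0, 3350.4),
    ("Uralochka 3.40a avx2", 3374.3, 3349.0),
]

normal = {name: n for name, n, _ in RATINGS}
double = {name: d for name, _, d in RATINGS}


def gap(ratings: dict[str, float], a: str, b: str) -> float:
    """Rating difference a - b, rounded to 0.1 Elo."""
    return round(ratings[a] - ratings[b], 1)


sf, torch = "Stockfish 16 230630", "Torch 1 popavx2"
print(f"SF - Torch: {gap(normal, sf, torch)} (normal) -> {gap(double, sf, torch)} (double)")

# Per-engine change when decisive games count double.
changes = {name: round(d - n, 1) for name, n, d in RATINGS}
for name, delta in changes.items():
    print(f"{name:<24} {delta:+6.1f}")
```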

Looking at the LOSS statistic, I have no problem with such a change (wins counting 2 points).

Code:
   # PLAYER                   :  LOSS
   1 Stockfish 16 230630      :   731
   2 Torch 1 popavx2          :  1398
   3 KomodoDragon 3.3 avx2    :  1975
   4 Berserk 12 avx2          :  2471
   5 Ethereal 14.25 nnue      :  3485
   6 Caissa 1.15 avx2         :  4023
   7 RubiChess 230918 avx2    :  4193
   8 CSTal 2.0 avx2           :  4751
   9 Obsidian 9.0 avx2        :  4887
  10 Clover 6.1 avx2          :  4919
  11 Koivisto 9.2 avx2        :  5132
  12 Rebel EAS avx2           :  5549
  13 Seer 2.7.0 avx2          :  5709
  14 RofChade 3.1 avx2        :  5728
  15 Uralochka 3.40a avx2     :  5918

Incredible....


http://rebel13.nl/
Dio




Posts : 213
Join date : 2021-08-28

Subject: Re: Rating List Experiment, Thu Jan 04, 2024 8:31 pm

Personally, I don't think very much of this new statistic, which in my opinion only serves to measure any existing Elo progress in current Stockfish versions. This is just my personal opinion.


Admin


Posts : 2571
Join date : 2020-11-17
Location : Netherlands

Subject: Re: Rating List Experiment, Thu Jan 04, 2024 11:33 pm

You hit the nail on the head.

http://www.cegt.net/40_40%20Rating%20List/40_40%20SingleVersion/rangliste.html
https://www.computerchess.org.uk/ccrl/4040/index.html

It can't be that SF is only 6 or 12 Elo stronger and yet has won TCEC the last 7 times in a row. The difference must be much larger.

I know playing from balanced positions plays a role in this matter.
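To put 6 or 12 Elo in perspective: under the standard logistic Elo formula, a rating edge of d points gives an expected per-game score of 1 / (1 + 10^(-d/400)). The short sketch below (the function name is my own) evaluates it; 6 and 12 Elo come out at roughly 50.9% and 51.7% per game:

```python
def expected_score(elo_diff: float) -> float:
    """Expected per-game score of the higher-rated side under the logistic Elo formula."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


for diff in (6, 12, 50):
    print(f"{diff:>3} Elo -> {100 * expected_score(diff):.1f}%")
```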


http://rebel13.nl/
pohl4711



Posts : 132
Join date : 2022-03-01
Location : Berlin

Subject: Re: Rating List Experiment, Mon Jan 08, 2024 2:01 pm

Admin wrote:

We see -

1. The order hasn't changed, except that Clover and Obsidian swapped places.

2. SF has increased its Elo advantage over Torch from about 49 Elo to 64.

3. The top 4 profit (their Elo goes up); from place 6 onward, Elo goes down significantly.

Looking at the LOSS statistic, I have no problem with such a change (wins counting 2 points).

Code:
   # PLAYER                   :  LOSS
   1 Stockfish 16 230630      :   731
   2 Torch 1 popavx2          :  1398
   3 KomodoDragon 3.3 avx2    :  1975
   4 Berserk 12 avx2          :  2471
   5 Ethereal 14.25 nnue      :  3485
   6 Caissa 1.15 avx2         :  4023
   7 RubiChess 230918 avx2    :  4193
   8 CSTal 2.0 avx2           :  4751
   9 Obsidian 9.0 avx2        :  4887
  10 Clover 6.1 avx2          :  4919
  11 Koivisto 9.2 avx2        :  5132
  12 Rebel EAS avx2           :  5549
  13 Seer 2.7.0 avx2          :  5709
  14 RofChade 3.1 avx2        :  5728
  15 Uralochka 3.40a avx2     :  5918

Incredible....

In my AntiDraw openings download, you will find two tools for Armageddon rescoring and advanced Armageddon rescoring of PGN files:
https://www.sp-cc.de/files/antidraw_v2.1.7z

(classical Armageddon-Chess):
Win for white = 1 point for white
Draw = 1 point for black
Win for black = 1 point for black

(advanced Armageddon-Chess):
Win for white = 1 point for white
Draw = 1 point for black
Win for black = 2 points for black (!!!) (especially useful and interesting, of course, when using white-biased unbalanced openings like my UHO openings)
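The two rule sets above fit in a small lookup. This is a hypothetical Python sketch of the rules as listed, not the code of the tools in the AntiDraw download; it assumes the side that does not score gets 0 and that results come as PGN result strings (Python 3.9+ for the type hints):

```python
from enum import Enum


class Scheme(Enum):
    CLASSICAL = "classical"
    ADVANCED = "advanced"


def armageddon_points(result: str, scheme: Scheme) -> tuple[float, float]:
    """Return (white_points, black_points) for a PGN result string under the listed rules."""
    if result == "1-0":
        return (1.0, 0.0)  # win for White: 1 point for White
    if result == "1/2-1/2":
        return (0.0, 1.0)  # draw: 1 point for Black
    if result == "0-1":
        # win for Black: 1 point (classical) or 2 points (advanced) for Black
        return (0.0, 2.0 if scheme is Scheme.ADVANCED else 1.0)
    raise ValueError(f"unsupported result: {result!r}")


def tally(results: list[str], scheme: Scheme) -> tuple[float, float]:
    """Sum White's and Black's points over a list of results."""
    white = black = 0.0
    for r in results:
        w, b = armageddon_points(r, scheme)
        white += w
        black += b
    return white, black


games = ["1-0", "1/2-1/2", "0-1", "1/2-1/2"]
for scheme in Scheme:
    print(scheme.value, tally(games, scheme))
```

The only difference between the two schemes is the value of a Black win, which is what makes the advanced variant reward Black for winning rather than just holding the draw.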
https://www.sp-cc.de
Admin


Posts : 2571
Join date : 2020-11-17
Location : Netherlands

Subject: Re: Rating List Experiment, Mon Jan 08, 2024 2:10 pm

Interesting.....


http://rebel13.nl/
Uri Blass




Posts : 207
Join date : 2020-11-28

Subject: Re: Rating List Experiment, Tue Jan 23, 2024 11:08 pm

Admin wrote:
You hit the nail on the head.

http://www.cegt.net/40_40%20Rating%20List/40_40%20SingleVersion/rangliste.html
https://www.computerchess.org.uk/ccrl/4040/index.html

It can't be that SF is only 6 or 12 Elo stronger and yet has won TCEC the last 7 times in a row. The difference must be much larger.

I know playing from balanced positions plays a role in this matter.

A rating list with a balanced book is different from a rating list with an unbalanced book.

It is even possible that a different engine is best in each case, so comparing CEGT with TCEC is comparing apples with oranges.

The Stockfish team does not care about being best with a balanced book, so hopefully we will see some engine rated higher than Stockfish on CEGT or CCRL because it is better at beating weaker engines, even while scoring 100% draws against Stockfish.
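A toy example with invented numbers shows how this could happen. Engine A (Stockfish-like) draws every game against engine B, but B beats the weaker engine C more often than A does. In a complete round robin with the same number of games per pairing, a logistic rating fit orders the engines by total score, so the totals already tell the story. The engine names and results are made up:

```python
from collections import defaultdict

# Hypothetical round robin, 100 games per pairing:
# (engine_x, engine_y, x_wins, draws, y_wins)
PAIRINGS = [
    ("A", "B", 0, 100, 0),  # A (Stockfish-like) draws every game against B
    ("A", "C", 40, 60, 0),  # A beats the weaker engine C 40 times
    ("B", "C", 50, 50, 0),  # B beats C more often than A does
]


def total_scores(pairings):
    """Total points per engine: 1 per win, 0.5 per draw."""
    scores = defaultdict(float)
    for x, y, x_wins, draws, y_wins in pairings:
        scores[x] += x_wins + 0.5 * draws
        scores[y] += y_wins + 0.5 * draws
    return dict(scores)


scores = total_scores(PAIRINGS)
for engine, pts in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(engine, pts)
```

Here B finishes on 125 points against A's 120 without ever beating A.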
pohl4711



Posts : 132
Join date : 2022-03-01
Location : Berlin

Subject: Re: Rating List Experiment, Wed Jan 24, 2024 6:27 am

Uri Blass wrote:

Stockfish team does not care about being best with a balanced book

Of course not. Why should they? All engine tournaments with top engines are played with my UHO openings (the Chess.com engine tournament site) or with similarly biased openings (TCEC). Balanced openings are useless for high-end computer chess these days, and in the future they will be useless for computer chess altogether. It is just a question of time.

Thankfully, unlike you, the Stockfish team looks forward into the future, not back into the past.
https://www.sp-cc.de
Uri Blass




Posts : 207
Join date : 2020-11-28

Subject: Re: Rating List Experiment, Wed Jan 24, 2024 11:00 am

pohl4711 wrote:
Uri Blass wrote:

Stockfish team does not care about being best with a balanced book

Of course not. Why should they? All engine tournaments with top engines are played with my UHO openings (the Chess.com engine tournament site) or with similarly biased openings (TCEC). Balanced openings are useless for high-end computer chess these days, and in the future they will be useless for computer chess altogether. It is just a question of time.

Thankfully, unlike you, the Stockfish team looks forward into the future, not back into the past.


When I use an engine to analyze my games, I want it to help me understand the best move also in the equal positions that arise in my games, meaning the move that gives me the best chance to win against weaker opponents (otherwise I can play games with no tactical mistakes and still learn almost nothing from them).

You could say that the best move for increasing winning chances against humans is not the same as the best move for beating weaker engines, but at least playing to beat weaker engines is better than ignoring how to beat weaker players altogether.

Note that my last OTB tournament game had no tactical blunders, and Stockfish's evaluation never left the draw zone.
There were some inaccuracies, but the main reason for the draw is not that we are very strong chess players; it is that both players were more afraid of losing than eager to win.

The game is from the following tournament:
https://chess-results.com/tnr873644.aspx?lan=1&art=2&rd=3
I was White against Yonatan Veber.

For the record, here is the game.

I started with 1.e3 to surprise my opponent.
It seems that my opponent was also afraid of opening preparation, so the game started 1.e3 Nf6 2.Nf3 e6 (in any case, 1.e3 is not a bad move, and it is not the reason I failed to win this game).

https://lichess.org/EcDDucHj

During the game I regretted not playing 11.Ne4, and I realized 36.Nc4 was bad after 36...Nxc4 37.Rxc4 a5, but these moves do not take the evaluation out of the draw zone.

I guess engine analysis would claim that both sides played at GM level based on average blunder size, but I think average blunder size is a poor way to evaluate a player's level.
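One common way to measure "average error" is average centipawn loss: for each of a side's moves, how much the engine evaluation drops from that side's point of view, averaged over the game. Whether engine-analysis tools compute exactly this is an assumption here, the function name is my own, and the evaluations below are invented. A quiet game with no big swings keeps both averages small:

```python
def average_centipawn_loss(evals: list[int], side: str) -> float:
    """Average centipawn loss for one side.

    evals[0] is the evaluation of the starting position and evals[i] the evaluation
    after ply i, all in centipawns from White's point of view. Odd plies are White's
    moves, even plies Black's. Moves that improve the evaluation count as zero loss.
    """
    if side not in ("white", "black"):
        raise ValueError("side must be 'white' or 'black'")
    losses = []
    for ply in range(1, len(evals)):
        white_moved = ply % 2 == 1
        if white_moved != (side == "white"):
            continue
        change = evals[ply] - evals[ply - 1]       # change from White's point of view
        loss = -change if white_moved else change  # a rising eval hurts Black
        losses.append(max(0, loss))
    return sum(losses) / len(losses) if losses else 0.0


# Invented evaluations for a quiet game that never leaves the draw zone.
evals = [20, 15, 30, 10, 10, 5, 25]
print(average_centipawn_loss(evals, "white"))
print(average_centipawn_loss(evals, "black"))
```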


Peter Berger




Posts : 130
Join date : 2020-11-20

Subject: Re: Rating List Experiment, Thu Jan 25, 2024 8:24 pm

We probably have to look at the game of chess a bit differently IMHO. I am not a programmer, so please read with a friendly eye.
We are used to thinking of chess as a zero-sum game with perfect information.
But in reality, a game like NLH (No-Limit Hold'em) poker (where I certainly reached a way higher playing level than in chess) might be a more interesting general model. And even there, computers rule, so this is clearly doable for a computer program.
I don't want to bore anyone with poker theory, but translated to chess and greatly simplified, you may want to know the expected value of a chess move against a given opponent, including the concept of bluffs.
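Greatly simplified, the expected-value idea might look like the sketch below. It assumes we already have win/draw/loss probabilities for each candidate move against one specific opponent; producing those (an opponent model) is the hard part and is not shown. The class name and all numbers are made up for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical win/draw/loss probabilities for one move against a specific opponent."""
    name: str
    p_win: float
    p_draw: float
    p_loss: float

    def __post_init__(self) -> None:
        if abs(self.p_win + self.p_draw + self.p_loss - 1.0) > 1e-9:
            raise ValueError(f"{self.name}: probabilities must sum to 1")

    def expected_score(self) -> float:
        # Chess scoring: win = 1, draw = 0.5, loss = 0.
        return self.p_win + 0.5 * self.p_draw


candidates = [
    Candidate("solid move", p_win=0.05, p_draw=0.90, p_loss=0.05),
    Candidate("risky move", p_win=0.35, p_draw=0.40, p_loss=0.25),
]
best = max(candidates, key=lambda c: c.expected_score())
for c in candidates:
    print(f"{c.name}: {c.expected_score():.2f}")
print("pick:", best.name)
```

With these invented numbers the risky move loses more often yet still has the higher expected score (0.55 against 0.50), which is the trade-off described above.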
It is my impression that this is where top-level human chess is currently heading. A player like Alireza Firouzja takes huge risks (currently in bad form, but still feeling special). We don't want to play a move that risks something like losing a piece or getting mated, but we are ready to take a risk.
A program that worked something like this would be useful for you, Uri, as long as its moves were good enough not to be punished more often than is bearable at a given level.