Posts : 2608 Join date : 2020-11-17 Location : Netherlands
Subject: New Rating List ? Fri Oct 06, 2023 10:52 pm
I applaud Stefan's new setup of his rating list --> fewer draws among the top engines. I want to take the idea further using the TCEC positions composed by Jeroen Noomen and GM Matthew Sadler, so I made a setup; a pilot run gives -
Although it is only a few games, I very much like the big Elo difference between SF and Komodo; IMO it is real when push comes to shove.
[Q] - before going for real, I tried to include the new Rofchade 3.1 but, like version 3.0, it crashes under cutechess. Any idea?
Ipmanchess
Posts : 42 Join date : 2022-06-08
Subject: Re: New Rating List ? Sat Oct 07, 2023 8:43 am
Nice to see new lists.. I always hope they will finally test an engine against all other engines in a list, with the same total number of games, to get more accurate results.. good luck!
About Rofchade: I'm using the Cutechess GUI on 3 systems and have been regularly testing different RofChade versions, and have not seen any crashes yet!
Posts : 2608 Join date : 2020-11-17 Location : Netherlands
Subject: Re: New Rating List ? Sat Oct 07, 2023 9:42 am
Absolutely, round robin, all vs all; with a small pool it is doable.
Regarding Rofchade, tried 2 cutechess versions, can you see what I am doing wrong?
Code:
C:\Cute Chess>ccc -concurrency 5 -tournament gauntlet -engine name=REBEL-EAS cmd=REBEL-EAS.exe proto=uci option.Hash=128 -engine name=rofChade-3.1 cmd=rofChade-3.1.exe proto=uci option.Hash=128 option.NetFile=230620-224.nnue -each restart=on tc=40/10 timemargin=500 -draw movenumber=160 movecount=3 score=100 -resign movecount=3 score=999 -rounds 10 -repeat -pgnout all.pgn -openings file=fq.pgn plies=40
Warning: 2 opening repetitions vs 1 games per encounter
Started game 5 of 10 (REBEL-EAS vs rofChade-3.1)
Started game 3 of 10 (REBEL-EAS vs rofChade-3.1)
Started game 4 of 10 (rofChade-3.1 vs REBEL-EAS)
Started game 2 of 10 (rofChade-3.1 vs REBEL-EAS)
Started game 1 of 10 (REBEL-EAS vs rofChade-3.1)
Terminating process of engine rofChade-3.1(6)
Finished game 4 (rofChade-3.1 vs REBEL-EAS): 0-1 {White's connection stalls}
Score of REBEL-EAS vs rofChade-3.1: 1 - 0 - 0 [1.000] 1
Terminating process of engine rofChade-3.1(9)
Terminating process of engine rofChade-3.1(5)
Terminating process of engine rofChade-3.1(1)
Terminating process of engine rofChade-3.1(2)
Finished game 5 (REBEL-EAS vs rofChade-3.1): * {No result}
Score of REBEL-EAS vs rofChade-3.1: 1 - 0 - 0 [1.000] 1
Finished game 3 (REBEL-EAS vs rofChade-3.1): * {No result}
Score of REBEL-EAS vs rofChade-3.1: 1 - 0 - 0 [1.000] 1
Finished game 1 (REBEL-EAS vs rofChade-3.1): * {No result}
Score of REBEL-EAS vs rofChade-3.1: 1 - 0 - 0 [1.000] 1
Finished game 2 (rofChade-3.1 vs REBEL-EAS): * {No result}
Score of REBEL-EAS vs rofChade-3.1: 1 - 0 - 0 [1.000] 1
... REBEL-EAS playing Black: 1 - 0 - 0 [1.000] 1
... White vs Black: 0 - 1 - 0 [0.000] 1
Elo difference: inf +/- nan, LOS: 84.1 %, DrawRatio: 0.0 %
Finished match

C:\Cute Chess>pause
Press any key to continue . . .
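A side note on the "Elo difference: inf +/- nan" line in the log: with a single counted game and a 100% score, an unbounded Elo difference is what the usual logistic Elo model gives, so that line by itself does not indicate a second problem. A minimal sketch in Python, assuming cutechess reports Elo with the standard logistic formula (the function name is only illustrative):

```python
import math

def elo_difference(score: float) -> float:
    """Logistic Elo difference implied by a score fraction between 0 and 1."""
    if score <= 0.0:
        return -math.inf   # lost everything: no finite lower bound
    if score >= 1.0:
        return math.inf    # won everything: no finite upper bound
    return -400.0 * math.log10(1.0 / score - 1.0)

print(elo_difference(1.0))    # the 1 - 0 - 0 situation from the log
print(elo_difference(0.75))   # a finite example for comparison
```

The error margin needs some variance in the results, which a single game cannot provide, hence the "nan".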
pohl4711
Posts : 159 Join date : 2022-03-01 Location : Berlin
Subject: Re: New Rating List ? Sat Oct 07, 2023 10:13 am
Admin wrote:
Regarding Rofchade, tried 2 cutechess versions, can you see what I am doing wrong?
Put the Rofchade binary and the net in one folder. And do not use the UCI option for naming the net; it is not necessary. Only Threads and Hash need a UCI option setting. Right now, my Rofchade 3.1 test using cutechess-cli is running: more than 10000 games played. No crashes, time losses, or disconnects.
Last edited by pohl4711 on Sat Oct 07, 2023 10:30 am; edited 1 time in total
pohl4711
Posts : 159 Join date : 2022-03-01 Location : Berlin
Subject: Re: New Rating List ? Sat Oct 07, 2023 10:30 am
Admin wrote:
Absolutely, round robin, all vs all; with a small pool it is doable.
Don't forget the additional Gamepair-Ratinglist and Gamepair-Statistics, which are very interesting, and the EAS-Ratinglist calculated from the played games... No other rating list has this to offer. And gamepairs are the most natural way of looking at the results of games played with biased openings. As Vondele (Stockfish maintainer) mentioned: "Thinking uniquely in game pairs makes sense with the biased openings used these days. While pentanomial makes sense it is a bit complicated so we could simplify and score game pairs only (not games) as W-L-D (a traditional score of 2-0, or 1.5-0.5 is just a W)."
Example: Gamepair result of the 1000 games of Stockfish 230929 vs. KomodoDragon 3.2: 500 gamepairs (+375, =118, -7) = 86.8%... So KomodoDragon won only 7 out of 500 gamepairs - ouch...
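The gamepair arithmetic above can be sketched in a few lines of Python. This is only an illustration of the W-L-D rule from the Vondele quote, not the code of the actual tool; the function names are made up for this example.

```python
from collections import Counter

def rescore_pair(points: float) -> str:
    """Map one engine's points from a 2-game pair (0 to 2) to W, D or L."""
    if points > 1.0:
        return "W"   # 2-0 and 1.5-0.5 are both just a win
    if points < 1.0:
        return "L"   # 0.5-1.5 and 0-2 are both just a loss
    return "D"       # 1-1, whether from two draws or one win each

def pair_score(wins: int, draws: int, losses: int) -> float:
    """Score fraction over game pairs, a drawn pair counting half."""
    total = wins + draws + losses
    return (wins + 0.5 * draws) / total

# Five hypothetical pairs, seen from one engine's side.
print(Counter(rescore_pair(p) for p in [2.0, 1.5, 1.0, 1.0, 0.5]))

# The Stockfish vs. KomodoDragon figures quoted above.
print(f"{pair_score(375, 118, 7):.1%}")   # 86.8%
```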
PS: And not to forget: On my website, you can replay some of the most spectacular sac-games from the latest testrun, using the ChessBase pgn-viewer: https://www.sp-cc.de/view-games-with-sacs.htm
Example: This crazy game by Stockfish Dev vs. KomodoDragon 3.2:
Posts : 2608 Join date : 2020-11-17 Location : Netherlands
Subject: Re: New Rating List ? Sat Oct 07, 2023 11:31 am
pohl4711 wrote:
Admin wrote:
Regarding Rofchade, tried 2 cutechess versions, can you see what I am doing wrong?
Put the Rofchade binary and the net in one folder. And do not use the UCI option for naming the net; it is not necessary. Only Threads and Hash need a UCI option setting. Right now, my Rofchade 3.1 test using cutechess-cli is running: more than 10000 games played. No crashes, time losses, or disconnects.
How odd, with the folder trick it works.... never seen that before. And when typing uci, the net option shows up empty. Always something with those Dutch
Admin Admin
Posts : 2608 Join date : 2020-11-17 Location : Netherlands
Subject: Re: New Rating List ? Sat Oct 07, 2023 12:21 pm
pohl4711 wrote:
Admin wrote:
Absolutely, round robin, all vs all; with a small pool it is doable.
Don't forget the additional Gamepair-Ratinglist and Gamepair-Statistics, which are very interesting, and the EAS-Ratinglist calculated from the played games... No other rating list has this to offer. And gamepairs are the most natural way of looking at the results of games played with biased openings. As Vondele (Stockfish maintainer) mentioned: "Thinking uniquely in game pairs makes sense with the biased openings used these days. While pentanomial makes sense it is a bit complicated so we could simplify and score game pairs only (not games) as W-L-D (a traditional score of 2-0, or 1.5-0.5 is just a W)."
Example: Gamepair result of the 1000 games of Stockfish 230929 vs. KomodoDragon 3.2: 500 gamepairs (+375, =118, -7) = 86.8%... So KomodoDragon won only 7 out of 500 gamepairs - ouch...
Yep, SF finally gets what it deserves. Notably, SF won the last 7 TCEC tournaments.
Tried your gamepair tool, is it possible it doesn't work with FEN pgns?
Processing head-to-head file 77 of 78
Processing head-to-head file 78 of 78
************************************************************
Step 4: Elo-calculating with ORDO
ERROR: Input file contains no games
************************************************************
-------------------------------------------------------------------
--- Number of all Gamepairs : 0
--- Number of drawn Gamepairs overall: 0 (= 00.00%)
--- Number of 1:1 drawn Gamepairs : 0 (= 00.00%)
--- Number of 2-draws drawn Gamepairs: 0 (= 00.00%)
-------------------------------------------------------------------
************************************************************
Press any key to continue . . .
Admin Admin
Posts : 2608 Join date : 2020-11-17 Location : Netherlands
Subject: Re: New Rating List ? Sat Oct 07, 2023 12:35 pm
I let the pilot run overnight, created the head-to-head
Posts : 159 Join date : 2022-03-01 Location : Berlin
Subject: Re: New Rating List ? Sat Oct 07, 2023 4:33 pm
Admin wrote:
Tried your gamepair tool, is it possible it doesn't work with FEN pgns?
Sadly, it does not work with FEN pgns. The tool needs pgn from move 1 and (important) all used openings have to have the same length (like my UHO opening sets - in these sets, each line has the same length), which is not the case in the TCEC openings, for example...
From the ReadMe: The tool works only, when:
1) The pgn-gamefile contains only games with not more than 995 different head-to-head engine-competitions
2) Each opening is played exactly 2 times in each head-to-head (each engine plays it with white and black)
3) Each opening-line has the same length/depth (in plies)
4) The files pgn-extract.exe, textReplace.exe, ordo-win64.exe, pairSplit.exe and nameList.exe are in the bin folder, which has to be in the working-folder - they are needed for processing...
KUDOS to the authors and (C): D.Barnes, T.Zipproth, M.Ballicora, N.Pollock
In my tool (Auto_Gamepairs_Rescorer_V1.bat), you have to set the 4 parameters at the top (just open the file with any text editor). For example:
set opening_plies=12 (Each opening-line is 6 moves (=12 plies) deep)
set games=testmatch.pgn (name of the pgn-gamefile)
set reference_eng="Stockfish 14 avx2" (name of the engine, Ordo shall use for reference)
set /A reference_elo=3700 (Elo-number, which Ordo shall use for the reference-engine)
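Some of the ReadMe conditions can be checked before running the tool. The Python sketch below is not part of the tool and only reads PGN tag pairs: it flags games that start from a FEN tag, counts head-to-head pairings against the 995 limit, and reports pairings with an odd number of games (a necessary but not sufficient sign that every opening was played twice). It does not verify that all opening lines have the same length, and it assumes every game begins with an [Event] tag. All names are illustrative.

```python
import re
from collections import Counter

TAG = re.compile(r'^\[(\w+)\s+"(.*)"\]\s*$')

def read_headers(pgn_text: str) -> list[dict[str, str]]:
    """Collect the tag pairs of each game; a new game starts at every [Event] tag."""
    games: list[dict[str, str]] = []
    for line in pgn_text.splitlines():
        match = TAG.match(line.strip())
        if not match:
            continue  # movetext, comments and blank lines are ignored
        key, value = match.groups()
        if key == "Event":
            games.append({})
        if games:
            games[-1][key] = value
    return games

def precheck(pgn_text: str) -> list[str]:
    """Return human-readable problems; an empty list means these checks passed."""
    games = read_headers(pgn_text)
    problems = []
    if not games:
        problems.append("no games found")
    fen_games = sum(1 for g in games if "FEN" in g)
    if fen_games:
        problems.append(f"{fen_games} game(s) start from a FEN, not from move 1")
    # A pairing is the unordered set of the two engine names.
    pairings = Counter(
        frozenset((g.get("White", "?"), g.get("Black", "?"))) for g in games
    )
    if len(pairings) > 995:
        problems.append("more than 995 head-to-head pairings")
    for pairing, count in pairings.items():
        if count % 2:
            problems.append(f"odd number of games for {' vs '.join(sorted(pairing))}")
    return problems

# Usage, with all.pgn taken from the -pgnout setting earlier in the thread:
# print(precheck(open("all.pgn", encoding="utf-8", errors="replace").read()))
```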
Admin Admin
Posts : 2608 Join date : 2020-11-17 Location : Netherlands
Subject: Re: New Rating List ? Sat Oct 07, 2023 4:54 pm
Yep, I suspected that; I tried set opening_plies=0, to no avail.
But the head-to-head is part of ORDO, I am using -