Dragon by Komodo 2.5 64-bit 4CPU 3514
Dragon by Komodo 2.5 64-bit 1CPU 3483
Rating Change +31
Not entirely sure I'm following the reasoning here, or what the data is supposed to be showing.
One thing: the rating difference between two Elo ratings that each carry error bars has an error bar about double the individual ones. That's okay if the Elo difference is much greater than the error bars, but not if it's in the same range (as it is in this case). It's probably more accurate simply to play games between the 1CPU and 4CPU versions and get an Elo difference that way.
It would also be useful to get rating differences for a bunch of engines; then you know what you are looking for. Just two is not enough.
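A quick sketch of the error-bar point, assuming the two ratings' errors are independent and roughly normal: the margins then combine in quadrature, so two equal error bars give a difference whose bar is about 1.4 times as wide, and simply adding them (the 2x case) is the conservative bound. The ±20 below is a hypothetical per-rating error bar, not a figure from the rating list.

```python
import math


def diff_margin_independent(err_a: float, err_b: float) -> float:
    """Error bar of (A - B) when A and B have independent errors: add in quadrature."""
    return math.hypot(err_a, err_b)


def diff_margin_worst_case(err_a: float, err_b: float) -> float:
    """Conservative bound on the error bar of (A - B): add the two bars directly."""
    return err_a + err_b


def exceeds_margin(diff: float, margin: float) -> bool:
    """True if the rating difference is larger than its own error bar."""
    return abs(diff) > margin


if __name__ == "__main__":
    diff = 3514 - 3483   # +31, the 4CPU vs 1CPU gap quoted above
    err = 20.0           # hypothetical error bar on each rating (assumption)
    q = diff_margin_independent(err, err)
    w = diff_margin_worst_case(err, err)
    print(f"diff={diff} quadrature=+/-{q:.1f} added=+/-{w:.1f}")
    print("exceeds (quadrature):", exceeds_margin(diff, q))
    print("exceeds (added):", exceeds_margin(diff, w))
```

With these assumed bars, +31 only narrowly clears the quadrature margin (±28.3) and fails the added one (±40), which is the case for a direct 1CPU vs 4CPU match.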
mwyoung
Posts : 880 Join date : 2020-11-25 Location : USA
Subject: Re: Dragon 3.1 vs Stockfish 07/24/22 - Live Sun Jul 31, 2022 2:27 pm
Chris Whittington wrote:
mwyoung wrote:
Admin wrote:
Wrote a util that gives some insight into scaling.
GUI : Arena using [F4]
SF15 vs Dragon 2.5
Threads : 10
TC : 40/120
Games : 10
Util produces 2 statistics.
1. Changing best move during last 5 iterations
SF15 vs Komodo Dragon_2.5
Code:
IT    SF15  perc   Komo  perc
4      229   22%    529   52%
3       91    9%    198   19%
2       44    4%     57    5%
1        2    0%      8    0%
0        0    0%      0    0%
Tot    366          792
Total moves 1008
2. Changing best move during all iterations (all mainlines)
Dragon by Komodo 2.5 64-bit 4CPU 3514
Dragon by Komodo 2.5 64-bit 1CPU 3483
Rating Change +31
Not entirely sure I'm following the reasoning here, or what the data is supposed to be showing.
One thing: the rating difference between two Elo ratings that each carry error bars has an error bar about double the individual ones. That's okay if the Elo difference is much greater than the error bars, but not if it's in the same range (as it is in this case). It's probably more accurate simply to play games between the 1CPU and 4CPU versions and get an Elo difference that way.
It would also be useful to get rating differences for a bunch of engines; then you know what you are looking for. Just two is not enough.
See my other post you missed when posting this...
I connect the dots.
TheSelfImprover
Posts : 3110 Join date : 2020-11-18
Subject: Re: Dragon 3.1 vs Stockfish 07/24/22 - Live Sun Jul 31, 2022 2:45 pm
Admin wrote:
Wrote a util that gives some insight into scaling.
GUI : Arena using [F4]
SF15 vs Dragon 2.5
Threads : 10
TC : 40/120
Games : 10
Util produces 2 statistics.
1. Changing best move during last 5 iterations
SF15 vs Komodo Dragon_2.5
Code:
IT    SF15  perc   Komo  perc
4      229   22%    529   52%
3       91    9%    198   19%
2       44    4%     57    5%
1        2    0%      8    0%
0        0    0%      0    0%
Tot    366          792
Total moves 1008
2. Changing best move during all iterations (all mainlines)
IMO this is valuable information: it looks as though SF gets to its favourite move more quickly than KD. This also means that extra hardware will be less useful to it, even if it uses it efficiently.
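The perc column looks like each count divided by the 1008 total moves, rounded down; that is my reading of the numbers, not something the util's output states. This sketch reproduces the column and the Tot line from the raw counts, and shows Komodo's total (792) is a little over twice SF15's (366), which is what the "SF settles sooner" reading rests on.

```python
from typing import Dict

TOTAL_MOVES = 1008

# Raw counts from the util's output, keyed by the IT column (4 down to 0).
SF15: Dict[int, int] = {4: 229, 3: 91, 2: 44, 1: 2, 0: 0}
KOMODO: Dict[int, int] = {4: 529, 3: 198, 2: 57, 1: 8, 0: 0}


def perc_column(counts: Dict[int, int], total: int) -> Dict[int, int]:
    """Whole-number percentage of `total` per row, rounded down (assumed rule)."""
    return {it: (100 * n) // total for it, n in counts.items()}


def tot_line(counts: Dict[int, int]) -> int:
    """Sum of all rows, matching the util's 'Tot' line."""
    return sum(counts.values())


if __name__ == "__main__":
    for name, counts in (("SF15", SF15), ("Komo", KOMODO)):
        tot = tot_line(counts)
        print(name, perc_column(counts, TOTAL_MOVES), "Tot", tot)
    print(f"Komo/SF15 ratio: {tot_line(KOMODO) / tot_line(SF15):.2f}")
```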
mwyoung
Subject: Re: Dragon 3.1 vs Stockfish 07/24/22 - Live Sun Jul 31, 2022 2:51 pm
TheSelfImprover wrote:
Admin wrote:
Wrote a util that gives some insight into scaling.
GUI : Arena using [F4]
SF15 vs Dragon 2.5
Threads : 10
TC : 40/120
Games : 10
Util produces 2 statistics.
1. Changing best move during last 5 iterations
SF15 vs Komodo Dragon_2.5
Code:
IT    SF15  perc   Komo  perc
4      229   22%    529   52%
3       91    9%    198   19%
2       44    4%     57    5%
1        2    0%      8    0%
0        0    0%      0    0%
Tot    366          792
Total moves 1008
2. Changing best move during all iterations (all mainlines)
IMO this is valuable information: it looks as though SF gets to its favourite move more quickly than KD. This also means that extra hardware will be less useful to it, even if it uses it efficiently.
"Even if it uses it efficiently" is a contradiction, unless you think Stockfish 15 somehow always finds the best move within a matter of seconds.
Admin Admin
Posts : 2608 Join date : 2020-11-17 Location : Netherlands
Subject: Re: Dragon 3.1 vs Stockfish 07/24/22 - Live Sun Jul 31, 2022 3:48 pm
Actually, the worrying part would be a match on super hardware at a very long time control that gives something like this:
Code:
IT    SF15  perc   Komo  perc
4        9    1%      9    1%
3        1    0%      1    0%
2        0    0%      0    0%
1        0    0%      0    0%
0        0    0%      0    0%
Tot     10           10
And I wonder what the TCEC superfinal would look like, and how far we are from such a scenario.
Chris Whittington
Posts : 1254 Join date : 2020-11-17 Location : France
Subject: Re: Dragon 3.1 vs Stockfish 07/24/22 - Live Sun Jul 31, 2022 3:59 pm
mwyoung wrote:
Chris Whittington wrote:
mwyoung wrote:
Admin wrote:
Wrote a util that gives some insight into scaling.
GUI : Arena using [F4]
SF15 vs Dragon 2.5
Threads : 10
TC : 40/120
Games : 10
Util produces 2 statistics.
1. Changing best move during last 5 iterations
SF15 vs Komodo Dragon_2.5
Code:
IT    SF15  perc   Komo  perc
4      229   22%    529   52%
3       91    9%    198   19%
2       44    4%     57    5%
1        2    0%      8    0%
0        0    0%      0    0%
Tot    366          792
Total moves 1008
2. Changing best move during all iterations (all mainlines)
Dragon by Komodo 2.5 64-bit 4CPU 3514
Dragon by Komodo 2.5 64-bit 1CPU 3483
Rating Change +31
Not entirely sure I'm following the reasoning here, or what the data is supposed to be showing.
One thing: the rating difference between two Elo ratings that each carry error bars has an error bar about double the individual ones. That's okay if the Elo difference is much greater than the error bars, but not if it's in the same range (as it is in this case). It's probably more accurate simply to play games between the 1CPU and 4CPU versions and get an Elo difference that way.
It would also be useful to get rating differences for a bunch of engines; then you know what you are looking for. Just two is not enough.
See my other post you missed when posting this...
I connect the dots.
Yes, I see it now. You show decreasing rating diffs for SF versions as released over time. Again, we need to see corresponding rating diffs for other engines (presumably that data is in the CCRL results). To cross-check (NB the error bars doubling), is it possible to build rating diffs from the game results between versions?
I'd look for a correlation between the error-diffs slope and the actual Elo of the engine. It may well be that weaker engines have steeper slopes.
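The check suggested above could be as simple as a Pearson correlation between each engine's Elo and its gain from extra cores (its 4CPU rating minus its 1CPU rating on the same list). A minimal stdlib sketch; the helper names are hypothetical and no rating-list data is included, so the (Elo, gain) pairs would have to be collected first. Python 3.10+ also ships `statistics.correlation`, which computes the same coefficient.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a sample has zero variance")
    return sxy / math.sqrt(sxx * syy)


def scaling_gain(rating_multi_cpu: float, rating_one_cpu: float) -> float:
    """Rating gained going from 1 CPU to several CPUs, both taken from the same list."""
    return rating_multi_cpu - rating_one_cpu


if __name__ == "__main__":
    # Only one real pair is quoted in this thread (Dragon 2.5: 3514 vs 3483).
    print(scaling_gain(3514, 3483))
```

A clearly negative r would fit the idea that weaker engines have steeper slopes. With only a handful of engines, and gains whose own error bars are wide, it would be weak evidence either way.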
mwyoung
Subject: Re: Dragon 3.1 vs Stockfish 07/24/22 - Live Sun Jul 31, 2022 8:22 pm
Chris Whittington wrote:
mwyoung wrote:
Chris Whittington wrote:
mwyoung wrote:
Admin wrote:
Wrote a util that gives some insight into scaling.
GUI : Arena using [F4]
SF15 vs Dragon 2.5
Threads : 10
TC : 40/120
Games : 10
Util produces 2 statistics.
1. Changing best move during last 5 iterations
SF15 vs Komodo Dragon_2.5
Code:
IT    SF15  perc   Komo  perc
4      229   22%    529   52%
3       91    9%    198   19%
2       44    4%     57    5%
1        2    0%      8    0%
0        0    0%      0    0%
Tot    366          792
Total moves 1008
2. Changing best move during all iterations (all mainlines)
Dragon by Komodo 2.5 64-bit 4CPU 3514
Dragon by Komodo 2.5 64-bit 1CPU 3483
Rating Change +31
Not entirely sure I'm following the reasoning here, or what the data is supposed to be showing.
One thing: the rating difference between two Elo ratings that each carry error bars has an error bar about double the individual ones. That's okay if the Elo difference is much greater than the error bars, but not if it's in the same range (as it is in this case). It's probably more accurate simply to play games between the 1CPU and 4CPU versions and get an Elo difference that way.
It would also be useful to get rating differences for a bunch of engines; then you know what you are looking for. Just two is not enough.
See my other post you missed when posting this...
I connect the dots.
Yes, I see it now. You show decreasing rating diffs for SF versions as released over time. Again, we need to see corresponding rating diffs for other engines (presumably that data is in the CCRL results). To cross-check (NB the error bars doubling), is it possible to build rating diffs from the game results between versions?
I'd look for a correlation between the error-diffs slope and the actual Elo of the engine. It may well be that weaker engines have steeper slopes.
My data shows SF 14, for example, equal in rating to SF 15 at 4 cores, and the same goes for SF 13. But SF 14 and SF 13 are both scaling much faster: +42 and +38, versus SF 15's +15.
Is it really a mystery to anyone why SF 15 lost to both SF 14 and SF 13 in Mclane's tournament? Mclane is testing with 8 threads and longer time controls, and SF 14 and SF 13 show roughly 2.5 to 3 times SF 15's scaling rate at 4 cores.
That is why I always say "conditions matter" on what is the best chess engine.
Hmmmmmmm.
Last edited by mwyoung on Sun Jul 31, 2022 8:41 pm; edited 1 time in total
Admin Admin
Subject: Re: Dragon 3.1 vs Stockfish 07/24/22 - Live Sun Jul 31, 2022 8:36 pm
mwyoung wrote:
My data shows SF 14, for example, equal in rating to SF 15 at 4 cores, and the same goes for SF 13. But SF 14 and SF 13 are both scaling much faster: +42 and +38, versus SF 15's +15.
Is it really a mystery to anyone why SF 15 lost to both SF 14 and SF 13 in Mclane's tournament? Mclane is testing with 8 threads and longer time controls, and SF 14 and SF 13 show roughly 2.5 to 3 times SF 15's scaling rate at 4 cores.
That is why I always say "conditions matter" on what is the best chess engine.
Hmmmmmmm.
Maybe the WDL (win/draw/loss) figures would give a hint, but they are not available.
Admin Admin
Subject: Re: Dragon 3.1 vs Stockfish 07/24/22 - Live Sun Jul 31, 2022 9:08 pm
There is a huge difference between the Elo pools used, especially for SF13 (-165) and SF14.1 (-80).
Heck, I will set up a round robin between the four.
Use 20 cores, and 1 core. You will see very interesting results. I did.
How interesting was interesting?
Stockfish 14, then Stockfish 13, tested as the best Stockfish engines at 3+2 using all cores on my system. Stockfish 14.1 is a dog; I guess Fishtest passed it due to its micro-bullet time control testing.
Admin Admin
Subject: Re: Dragon 3.1 vs Stockfish 07/24/22 - Live Sun Jul 31, 2022 10:30 pm
The picture at 40/10 (one core) is crystal clear. Currently I don't have the compute time to run at 20 cores. Of course the older versions will catch up and give SF15 a harder time; the question is by how much.
Will be continued.
mwyoung
Subject: Re: Dragon 3.1 vs Stockfish 07/24/22 - Live Thu Aug 04, 2022 10:49 am
End of Match.
Code:
Score of Dragon 3.1 vs Stockfish 07/24/22: 0 - 10 - 590  [0.492]
...      Dragon 3.1 playing White: 0 - 1 - 299  [0.498] 300
...      Dragon 3.1 playing Black: 0 - 9 - 291  [0.485] 300
...      White vs Black: 9 - 1 - 590  [0.507] 600
Elo difference: -5.8 +/- 3.6, LOS: 0.1 %, DrawRatio: 98.3 %
600 of 600 games finished.
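The summary line can be reproduced from the raw counts, which is a handy sanity check. The output looks like cutechess-cli's; the formulas below (logistic Elo, a 95 % normal-approximation error margin, LOS from decisive games only) are my assumption about how it computes them, though they do reproduce the figures above. Counts are from Dragon 3.1's side: 0 wins, 10 losses, 590 draws.

```python
import math
from statistics import NormalDist
from typing import Tuple


def elo_from_score(p: float) -> float:
    """Logistic Elo difference for a score fraction p, which must satisfy 0 < p < 1."""
    return -400.0 * math.log10(1.0 / p - 1.0)


def match_stats(wins: int, losses: int, draws: int,
                confidence: float = 0.95) -> Tuple[float, float, float, float, float]:
    """Return (score, elo, margin, los_percent, draw_ratio_percent) for one side."""
    n = wins + losses + draws
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result (1, 0 or 0.5) around the mean score.
    var = (wins * (1.0 - score) ** 2 + losses * score ** 2
           + draws * (0.5 - score) ** 2) / n
    stderr = math.sqrt(var / n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95 %
    margin = (elo_from_score(score + z * stderr)
              - elo_from_score(score - z * stderr)) / 2
    # Likelihood of superiority: draws cancel out, only decisive games count.
    decisive = wins + losses
    if decisive == 0:
        los = 0.5
    else:
        los = 0.5 * (1 + math.erf((wins - losses) / math.sqrt(2 * decisive)))
    return score, elo_from_score(score), margin, 100 * los, 100 * draws / n


if __name__ == "__main__":
    s, elo, m, los, dr = match_stats(wins=0, losses=10, draws=590)
    print(f"[{s:.3f}] Elo difference: {elo:.1f} +/- {m:.1f}, "
          f"LOS: {los:.1f} %, DrawRatio: {dr:.1f} %")
```

Run as a script, it prints `[0.492] Elo difference: -5.8 +/- 3.6, LOS: 0.1 %, DrawRatio: 98.3 %`. Note how little the 590 draws contribute: the LOS depends only on the 10 decisive games, all of them lost.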
Admin Admin
Subject: Re: Dragon 3.1 vs Stockfish 07/24/22 - Live Thu Aug 04, 2022 11:05 am
mwyoung wrote:
End of Match.
Code:
Score of Dragon 3.1 vs Stockfish 07/24/22: 0 - 10 - 590  [0.492]
...      Dragon 3.1 playing White: 0 - 1 - 299  [0.498] 300
...      Dragon 3.1 playing Black: 0 - 9 - 291  [0.485] 300
...      White vs Black: 9 - 1 - 590  [0.507] 600
Elo difference: -5.8 +/- 3.6, LOS: 0.1 %, DrawRatio: 98.3 %
600 of 600 games finished.
DrawRatio: 98.3 %
Yikes.
mwyoung
Subject: Re: Dragon 3.1 vs Stockfish 07/24/22 - Live Thu Aug 04, 2022 11:08 am
Admin wrote:
mwyoung wrote:
End of Match.
Code:
Score of Dragon 3.1 vs Stockfish 07/24/22: 0 - 10 - 590  [0.492]
...      Dragon 3.1 playing White: 0 - 1 - 299  [0.498] 300
...      Dragon 3.1 playing Black: 0 - 9 - 291  [0.485] 300
...      White vs Black: 9 - 1 - 590  [0.507] 600
Elo difference: -5.8 +/- 3.6, LOS: 0.1 %, DrawRatio: 98.3 %
600 of 600 games finished.
DrawRatio: 98.3 %
Yikes.
The next match will make you quit chess engine testing.
Chris Whittington
Subject: Re: Dragon 3.1 vs Stockfish 07/24/22 - Live Fri Aug 05, 2022 8:13 am
Admin wrote:
mwyoung wrote:
End of Match.
Code:
Score of Dragon 3.1 vs Stockfish 07/24/22: 0 - 10 - 590  [0.492]
...      Dragon 3.1 playing White: 0 - 1 - 299  [0.498] 300
...      Dragon 3.1 playing Black: 0 - 9 - 291  [0.485] 300
...      White vs Black: 9 - 1 - 590  [0.507] 600
Elo difference: -5.8 +/- 3.6, LOS: 0.1 %, DrawRatio: 98.3 %
600 of 600 games finished.
DrawRatio: 98.3 %
Yikes.
How about doing a Stefan Pohl on the game data, looking for material sacrifices that finally end in a draw? E.g., at least find out whether an engine should get credit for at least trying to win.
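A crude, hypothetical version of that filter, not Stefan Pohl's actual method: given the material balance after every ply (White minus Black, in pawns), flag a drawn game if one side stayed at least two pawns down for several plies. Extracting the balance from the PGNs is left out, the thresholds are arbitrary assumptions, and the heuristic cannot tell a deliberate sacrifice from a blunder.

```python
from typing import List, Sequence, Tuple


def sacrifice_spans(
    balance: Sequence[float], threshold: float = 2.0, min_plies: int = 4
) -> List[Tuple[str, int, int]]:
    """Find stretches where one side stays at least `threshold` pawns behind.

    balance[i] is the material balance (White minus Black, in pawns) after ply i.
    Returns (side_behind, first_ply, last_ply) tuples. Short dips, such as the
    single ply between a capture and its recapture, are skipped via `min_plies`.
    """
    spans: List[Tuple[str, int, int]] = []
    for side, sign in (("white", -1), ("black", 1)):
        start = None
        for i, b in enumerate(balance):
            if sign * b >= threshold:
                if start is None:
                    start = i
            else:
                if start is not None and i - start >= min_plies:
                    spans.append((side, start, i - 1))
                start = None
        # Close a span that lasts until the end of the game.
        if start is not None and len(balance) - start >= min_plies:
            spans.append((side, start, len(balance) - 1))
    return spans


def drawn_with_sacrifice(balance: Sequence[float], result: str, **kwargs) -> bool:
    """True for a drawn game ("1/2-1/2") containing at least one sacrifice span."""
    return result == "1/2-1/2" and bool(sacrifice_spans(balance, **kwargs))


if __name__ == "__main__":
    # Hypothetical game: White goes a piece down at ply 2 and regains it at ply 7.
    game = [0, 0, -3, -3, -3, -3, -3, 0, 0]
    print(sacrifice_spans(game))
    print(drawn_with_sacrifice(game, "1/2-1/2"))
```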
mwyoung
Subject: Re: Dragon 3.1 vs Stockfish 07/24/22 - Live Fri Aug 05, 2022 11:02 am
Chris Whittington wrote:
Admin wrote:
mwyoung wrote:
End of Match.
Code:
Score of Dragon 3.1 vs Stockfish 07/24/22: 0 - 10 - 590  [0.492]
...      Dragon 3.1 playing White: 0 - 1 - 299  [0.498] 300
...      Dragon 3.1 playing Black: 0 - 9 - 291  [0.485] 300
...      White vs Black: 9 - 1 - 590  [0.507] 600
Elo difference: -5.8 +/- 3.6, LOS: 0.1 %, DrawRatio: 98.3 %
600 of 600 games finished.
DrawRatio: 98.3 %
Yikes.
How about doing a Stefan Pohl on the game data, looking for material sacrifices that finally end in a draw? E.g., at least find out whether an engine should get credit for at least trying to win.
No, I like the idea that even my laptop has solved chess at 5 seconds a move, using the draw rate as proof.
mwyoung
Subject: Re: Dragon 3.1 vs Stockfish 07/24/22 - Live Fri Aug 05, 2022 11:07 am
Current data from the 40/2 vs 3m+2s match.
Code:
Result:
#  name                                  games  wins  draws  losses  score%  elo   +   -
1. Dragon 3.1 by Komodo Chess 64-bit        10     0     10       0    50.0    0  69  69
2. Stockfish 240722                         10     0     10       0    50.0    0  69  69

Cross table:
#  name                                  score%  games  1           2
1. Dragon 3.1 by Komodo Chess 64-bit       50.0     10  x           ==========
2. Stockfish 240722                        50.0     10  ==========  x