Dragon 3.1 vs Stockfish 07/24/22 - Live

Posts : 880 Join date : 2020-11-25 Location : USA

Hardware = Threadripper 2950X, RTX 2080 Ti, 64 GB RAM, 2 TB 970 Evo Plus SSD

Tablebases = 7-man, top 10
Book = Perfect Book 2021
GUI = CuteChess
Hash = 4 GB
Threads = 16
TC = 3m + 2s
Contempt = 0

Subject: Re: Dragon 3.1 vs Stockfish 07/24/22 - Live Sun Jul 31, 2022 12:07 pm

I wrote a utility that gives some insight into scaling.

GUI : Arena using [F4]
SF15 vs Dragon 2.5
Threads : 10
TC : 40/120
Games : 10

The utility produces two statistics:

1. Best-move changes during the last 5 iterations

SF15 vs Komodo Dragon_2.5

Code:
IT   SF15  perc   Komo  perc
 4    229   22%    529   52%
 3     91    9%    198   19%
 2     44    4%     57    5%
 1      2    0%      8    0%
 0      0    0%      0    0%
Tot   366          792

Total moves 1008
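As a quick check on how the table is built: the perc columns appear to be each count times 100 divided by the 1008 total moves, truncated rather than rounded (229/1008 is 22.7%, shown as 22%). A small Python sketch under that assumption, which also reproduces the Tot row:

```python
TOTAL_MOVES = 1008  # "Total moves 1008" from the post

# Counts from the "last 5 iterations" table, keyed by the IT column.
sf15 = {4: 229, 3: 91, 2: 44, 1: 2, 0: 0}
komodo = {4: 529, 3: 198, 2: 57, 1: 8, 0: 0}


def truncated_percent(count: int, total: int = TOTAL_MOVES) -> int:
    """Whole-percent share of total, truncated toward zero (assumed convention)."""
    return count * 100 // total


for name, table in (("SF15", sf15), ("Komodo", komodo)):
    row = {it: truncated_percent(c) for it, c in table.items()}
    print(name, row, "Tot", sum(table.values()))
```

The same truncation rule matches every row of the all-iterations table as well (for example 100/1008 is listed as 9%).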

2. Best-move changes during all iterations (all mainlines)

SF15 vs Komodo Dragon_2.5

Code:
All mainlines overview
IT SF15 perc Komo perc
1 1 0% 502 49%
2 133 13% 201 19%
3 150 14% 194 19%
4 85 8% 89 8%
5 73 7% 100 9%
6 50 4% 44 4%
7 57 5% 47 4%
8 41 4% 31 3%
9 35 3% 36 3%
10 33 3% 38 3%
11 35 3% 38 3%
12 24 2% 26 2%
13 24 2% 27 2%
14 23 2% 36 3%
15 22 2% 45 4%
16 27 2% 38 3%
17 25 2% 30 2%
18 22 2% 31 3%
19 19 1% 28 2%
20 20 1% 32 3%
21 20 1% 26 2%
22 28 2% 15 1%
23 33 3% 18 1%
24 28 2% 15 1%
25 16 1% 8 0%
26 18 1% 8 0%
27 16 1% 1 0%
28 20 1% 5 0%
29 9 0% 2 0%
30 11 1% 1 0%
31 11 1% 1 0%
32 10 0% 1 0%
33 6 0% 1 0%
34 9 0% 0 0%
35 3 0% 0 0%
36 6 0% 1 0%
37 7 0% 1 0%
38 6 0% 0 0%
39 11 1% 0 0%
40 9 0% 0 0%
41 6 0% 0 0%
42 7 0% 0 0%
43 1 0% 0 0%
44 3 0% 0 0%
45 9 0% 0 0%
46 5 0% 0 0%
47 4 0% 0 0%
48 8 0% 0 0%
49 0 0% 0 0%
50 6 0% 0 0%
51 3 0% 0 0%
52 8 0% 0 0%
53 7 0% 0 0%
54 1 0% 0 0%
55 1 0% 0 0%
56 3 0% 0 0%
57 0 0% 0 0%
58 1 0% 0 0%
59 1 0% 0 0%
60 1 0% 0 0%
61 1 0% 0 0%
62 0 0% 0 0%
63 2 0% 0 0%
64 0 0% 0 0%
65 0 0% 0 0%
66 1 0% 0 0%
67 0 0% 0 0%
68 0 0% 0 0%
69 0 0% 0 0%
70 0 0% 0 0%
71 0 0% 0 0%
72 0 0% 0 0%
73 1 0% 0 0%
74 0 0% 0 0%
75 1 0% 0 0%
76 0 0% 0 0%
77 0 0% 0 0%
78 0 0% 0 0%
79 0 0% 0 0%
80 0 0% 0 0%
81 0 0% 0 0%
82 0 0% 0 0%
83 0 0% 0 0%
84 0 0% 0 0%
Tot 1257 1717
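The post does not describe how the utility works, so the following is only a hypothetical Python sketch. It assumes the utility reads the UCI `info ... depth N ... pv <move> ...` lines of each search and, whenever a mainline's first move differs from the previous mainline's, credits the change to that line's depth. That reading fits the table above, since its totals (1257 and 1717) exceed the 1008 moves and therefore count changes rather than moves. The function names and the sample search are made up for illustration.

```python
from collections import Counter
from typing import Iterable, List


def best_move_changes(info_lines: Iterable[str]) -> Counter:
    """Count best-move changes per depth for one move's search (hypothetical).

    Every UCI "info" line carrying a "pv" is treated as a mainline; when its
    first move differs from the previous mainline's first move, the change
    is credited to that line's depth.
    """
    changes: Counter = Counter()
    previous = None
    for line in info_lines:
        tokens = line.split()
        if not tokens or tokens[0] != "info":
            continue
        if "pv" not in tokens or "depth" not in tokens:
            continue
        pv_index = tokens.index("pv")
        if pv_index + 1 >= len(tokens):
            continue  # empty PV, nothing to compare
        depth = int(tokens[tokens.index("depth") + 1])
        move = tokens[pv_index + 1]
        if previous is not None and move != previous:
            changes[depth] += 1
        previous = move
    return changes


def histogram(searches: Iterable[List[str]]) -> Counter:
    """Sum the per-depth change counts over all searched moves."""
    total: Counter = Counter()
    for lines in searches:
        total.update(best_move_changes(lines))
    return total


# Made-up search for a single move: the best move flips at depths 2 and 4.
search = [
    "info depth 1 seldepth 1 score cp 20 nodes 20 pv e2e4 e7e5",
    "info depth 2 seldepth 2 score cp 15 nodes 80 pv d2d4 d7d5",
    "info depth 3 seldepth 4 score cp 18 nodes 300 pv d2d4 g8f6",
    "info depth 4 seldepth 6 score cp 22 nodes 900 pv e2e4 e7e5",
    "bestmove e2e4 ponder e7e5",
]
print(dict(histogram([search])))  # {2: 1, 4: 1}
```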

Posts : 880 Join date : 2020-11-25 Location : USA

Thanks Ed for the data.

Here is what the data means in terms of rating change when going from 1 to 4 cores (data from the CCRL 40/15 rating list):

Stockfish 15 64-bit 4CPU 3538
Stockfish 15 64-bit 1CPU 3523

Rating change +15

Dragon by Komodo 2.5 64-bit 4CPU 3514
Dragon by Komodo 2.5 64-bit 1CPU 3483

Rating change +31

Posts : 880 Join date : 2020-11-25 Location : USA

Here is the rating change from 1 to 4 cores for all Stockfish NNUE versions (data from the CCRL 40/15 rating list):

Stockfish 15 64-bit 4CPU 3538
Stockfish 15 64-bit 1CPU 3523

Rating change +15

Stockfish 14.1 64-bit 4CPU 3522
Stockfish 14.1 64-bit 1CPU 3499

Rating change +23

Stockfish 14 64-bit 4CPU 3537
Stockfish 14 64-bit 1CPU 3495

Rating change +42

Stockfish 13 64-bit 4CPU 3536
Stockfish 13 64-bit 1CPU 3498

Rating change +38

Stockfish 12 64-bit 4CPU 3509
Stockfish 12 64-bit 1CPU 3471

Rating change +38
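The arithmetic behind these figures, as a small Python sketch using only the CCRL numbers quoted above; the ratio column simply expresses each version's 1-to-4-core gain as a multiple of Stockfish 15's. Each gain is a difference of two separately rated entries, so it inherits both entries' error bars.

```python
# 1->4 CPU rating gains from the CCRL 40/15 figures quoted above.
ccrl = {  # name: (4CPU rating, 1CPU rating)
    "Stockfish 15": (3538, 3523),
    "Stockfish 14.1": (3522, 3499),
    "Stockfish 14": (3537, 3495),
    "Stockfish 13": (3536, 3498),
    "Stockfish 12": (3509, 3471),
}

gains = {name: four - one for name, (four, one) in ccrl.items()}
sf15_gain = gains["Stockfish 15"]

for name, gain in gains.items():
    # Each version's gain expressed as a multiple of Stockfish 15's gain.
    print(f"{name:<15} +{gain:<3} {gain / sf15_gain:.1f}x")
```

With these numbers, Stockfish 14 and 13 gain roughly 2.5 to 2.8 times what Stockfish 15 gains from the extra cores.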

Now look at results like this one from Mclane, and at comments like this from Mclane:

"There is something wrong with stockfish. I have no improvement on my pcs with stockfish" - Mclane

Now these results make much more sense.

[Image: results from Mclane (Result10)]

Posts : 1254 Join date : 2020-11-17 Location : France

mwyoung wrote:

Thanks Ed for the data.

Here is what the data means in rating change as cores increase from 1 to 4 cores. Data from CCRL 40/15 rating list.

Stockfish 15 64-bit 4CPU 3538
Stockfish 15 64-bit 1CPU 3523

Rating change +15

Dragon by Komodo 2.5 64-bit 4CPU 3514
Dragon by Komodo 2.5 64-bit 1CPU 3483

Rating Change +31

Not entirely sure I'm following the reasoning here, or what the data is supposed to be showing.

One thing: the difference between two Elo ratings that each carry error bars has an error bar of about double the individual ones. That is okay if the Elo difference is much greater than the error bars, but not if it is in the same range (as it is in this case). It's probably more accurate simply to play games between the 1CPU and 4CPU versions and get the Elo difference that way.

It would also be useful to get rating diffs for a bunch of engines; then you know what you are looking for. Just two is not enough.
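A small Python sketch of the error-bar point. If the two ratings' errors are independent, the error of their difference adds in quadrature, about 1.4 times a single error bar when both are equal; adding them linearly gives the 2x figure, which is the worst case. The ±10 Elo per entry is a made-up illustrative value, not taken from CCRL.

```python
import math


def diff_error(err_a: float, err_b: float, independent: bool = True) -> float:
    """Error bar on (rating_a - rating_b) given each rating's error bar."""
    if independent:
        return math.hypot(err_a, err_b)  # sqrt(err_a**2 + err_b**2)
    return err_a + err_b  # linear sum: worst-case upper bound


ERR = 10.0  # hypothetical per-entry error bar, in Elo
observed_gain = 15  # SF15 1CPU -> 4CPU gain quoted above

for independent in (True, False):
    e = diff_error(ERR, ERR, independent)
    print(f"independent={independent}: +/-{e:.1f}, gain/error = {observed_gain / e:.2f}")
```

Under either assumption a 15 Elo gain with entries this uncertain is barely one error bar wide, which is why a direct 1CPU vs 4CPU match gives a cleaner number.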

Posts : 880 Join date : 2020-11-25 Location : USA

Chris Whittington wrote:

Not entirely sure I'm following the reasoning here, or what the data is supposed to be showing.

One thing, rating diff between two Elo's with error bars, have an error bar about double the individual error bars. This is okay if the elo diff is much greater than the error bars, but not if in the same range as it (as it is in this case). It's probably more accurate simply to play games between the 1CPU and 4CPU version and get an elo diff that way.

Also would be useful to get a rating diff for a bunch of engines, then you know what you are looking for, just two is not enough.

See my other post you missed when posting this...

I connect the dots.

Posts : 3110 Join date : 2020-11-18

IMO this is valuable information: it looks as though SF gets to its favourite move more quickly than KD, which also means that extra hardware will be less useful to it, even if it uses it efficiently.

Posts : 880 Join date : 2020-11-25 Location : USA

TheSelfImprover wrote:

IMO this is valuable information: it looks as though SF gets to its favourite move more quickly than KD, which also means that extra hardware will be less useful to it, even if it uses it efficiently.

"Even if it uses it efficiently" is a contradiction, unless you think Stockfish 15 somehow always finds the best move in a matter of seconds. lol!

Subject: Re: Dragon 3.1 vs Stockfish 07/24/22 - Live Sun Jul 31, 2022 3:48 pm

Actually, the worrying part would be a match on super hardware and a long time control that gives something like this:

Code:
IT   SF15  perc   Komo  perc
 4      9    1%      9    1%
 3      1    0%      1    0%
 2      0    0%      0    0%
 1      0    0%      0    0%
 0      0    0%      0    0%
Tot    10           10

And I wonder what the TCEC superfinal would look like, and how far away we are from such a scenario.

Posts : 1254 Join date : 2020-11-17 Location : France

mwyoung wrote:

See my other post you missed when posting this...

I connect the dots.

Yes, I see it now. You show decreasing rating diffs for SF versions as released over time. Again we need to see corresponding rating diffs for other engines (presumably that data is in the CCRL results).
To cross-check (NB the error bars doubling), is it possible to build rating diffs by finding the game results between versions?

I'd look for correlation between the error-diffs slope and the actual Elo value of the engine. It may well be that weaker engines have steeper slopes.
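One way to run the check suggested here, as a Python sketch over the five Stockfish entries quoted earlier in the thread (1CPU rating against the 1-to-4-core gain). With only five points and no error bars it proves little, and there is a built-in bias: because gain = 4CPU rating minus 1CPU rating, noise in the 1CPU rating by itself pushes the correlation negative. Correlating against the 4CPU rating, or an independent strength estimate, would be a fairer test.

```python
import math

# (1CPU rating, 1->4 CPU gain) for the Stockfish versions quoted in the thread.
data = {
    "SF15": (3523, 15),
    "SF14.1": (3499, 23),
    "SF14": (3495, 42),
    "SF13": (3498, 38),
    "SF12": (3471, 38),
}


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


elo, gain = zip(*data.values())
print(f"r = {pearson(elo, gain):.2f}")  # r = -0.73
```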

Posts : 880 Join date : 2020-11-25 Location : USA

Chris Whittington wrote:

Yes, I see it now. You show decreasing rating diffs for SF versions as released over time. Again we need to see corresponding rating diffs for other engines (presumably that data is in the CCRL results).
To cross-check (NB the error bars doubling), is it possible to build rating diffs by finding the game results between versions?

I'd look for correlation between the error-diffs slope and the actual Elo value of the engine. It may well be that weaker engines have steeper slopes.

I show that SF 14, for example, is equal to SF 15 in rating at 4 cores, and so is SF 13. But SF 14 and SF 13 scale much faster: +42 and +38, against SF 15's +15.

Is it really a mystery to anyone why SF 15 lost to both SF 14 and SF 13 in Mclane's tournament, when Mclane is testing with 8 threads and longer time controls, and SF 14 and SF 13 show roughly 2.5 to 3 times SF 15's scaling at 4 cores?

That is why I always say "conditions matter" when it comes to which chess engine is best.

Hmmmmmmm.

Subject: Re: Dragon 3.1 vs Stockfish 07/24/22 - Live Sun Jul 31, 2022 8:36 pm

mwyoung wrote:

I show SF 14 for example is equal with SF 15 in rating at 4 cores. and so is SF 13. But both SF 14 and SF 13 are scaling much faster +42 and + 38 to SF 15's +15 scaling rate.

Is it really a mystery to anyone. Why SF 15 lost to both SF 14 and SF 13. In Mclane's tournament. When Mclane is testing with 8 threads, and longer time controls. When SF 14 and SF 13 are showing scaling 3 times the SF 15's scaling rate at 4 cores.

That is why I always say "conditions matter" on what is the best chess engine.

Hmmmmmmm.

Maybe the WDL (win/draw/loss) numbers would give a hint, but they are not available.

Subject: Re: Dragon 3.1 vs Stockfish 07/24/22 - Live Sun Jul 31, 2022 9:08 pm

Got the WDL anyway.

Code:
                              W    D    L   SCORE  AV-OPP
Stockfish 13 64-bit 4CPU    604  586   -2   75.3%  -165.1
Stockfish 14 64-bit 4CPU    301  568   -5   66.9%  -100.5
Stockfish 14.1 64-bit 4CPU  307  738   -9   64.1%   -80.5
Stockfish 15 64-bit 4CPU    426  780   -2   67.5%  -104.8

There is a huge difference between the opponent pools used, especially for SF13 (-165) and SF14.1 (-80).
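A quick Python sketch reproducing the SCORE column as (W + D/2) / games, reading the minus sign in the L column as the list's loss marker. It also adds a rough performance figure, AV-OPP plus the plain logistic Elo edge implied by the score; that is a simplification, since the list's own rating computation is more involved, so these numbers will not reproduce its ratings exactly.

```python
import math

# W, D, L and average opponent offset (AV-OPP) from the CCRL excerpt above.
rows = {
    "Stockfish 13": (604, 586, 2, -165.1),
    "Stockfish 14": (301, 568, 5, -100.5),
    "Stockfish 14.1": (307, 738, 9, -80.5),
    "Stockfish 15": (426, 780, 2, -104.8),
}


def score(w: int, d: int, l: int) -> float:
    """Fraction of points scored: a draw counts as half a win."""
    return (w + d / 2) / (w + d + l)


def logistic_elo(s: float) -> float:
    """Elo edge implied by score s under the plain logistic model."""
    return 400 * math.log10(s / (1 - s))


for name, (w, d, l, av_opp) in rows.items():
    s = score(w, d, l)
    perf = av_opp + logistic_elo(s)  # rough performance relative to the pool
    print(f"{name:<15} {100 * s:5.1f}%  perf vs pool {perf:+6.1f}")
```

The point of the exercise: a high score against a weak pool (SF13, AV-OPP -165) and a lower score against a stronger pool (SF14.1, AV-OPP -80) can land on similar ratings.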

Heck, I will set up a round robin between the four.

Posts : 880 Join date : 2020-11-25 Location : USA

Admin wrote:

Heck, I will set up a round robin between the four.

Use 20 cores, and 1 core. You will see very interesting results. I did.

Subject: Re: Dragon 3.1 vs Stockfish 07/24/22 - Live Sun Jul 31, 2022 9:44 pm

mwyoung wrote:

Use 20 cores, and 1 core. You will see very interesting results. I did.

How interesting was interesting?

Posts : 880 Join date : 2020-11-25 Location : USA

Admin wrote:

How interesting was interesting?

Stockfish 14, then Stockfish 13, tested as the best Stockfish engines at 3+2 using all cores on my system.
Stockfish 14.1 is a dog; I guess Fishtest passed it due to their micro-bullet time control testing.

Subject: Re: Dragon 3.1 vs Stockfish 07/24/22 - Live Sun Jul 31, 2022 10:30 pm

Code:
No. Name      Win   Draw  Loss  Unf.  Score  Games  %
-------------------------------------------------------
  1 sf15      +154  =321  -35   *0    314.5  510    61.7%
  2 sf14.1    +110  =332  -68   *0    276.0  510    54.1%
  3 sf14      +81   =333  -96   *0    247.5  510    48.5%
  4 sf13      +31   =302  -177  *0    182.0  510    35.7%

The picture at 40/10 (one core) is crystal clear. Currently I don't have the computer time to run at 20 cores. Of course the older versions will catch up there and give SF15 a harder time; the question is how much.

To be continued.

Posts : 880 Join date : 2020-11-25 Location : USA

End of Match.

Code:
Score of Dragon 3.1 vs Stockfish 07/24/22: 0 - 10 - 590 [0.492]
... Dragon 3.1 playing White: 0 - 1 - 299 [0.498] 300
... Dragon 3.1 playing Black: 0 - 9 - 291 [0.485] 300
... White vs Black: 9 - 1 - 590 [0.507] 600
Elo difference: -5.8 +/- 3.6, LOS: 0.1 %, DrawRatio: 98.3 %
600 of 600 games finished.
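The summary line can be reproduced from the W/L/D counts alone. A Python sketch, assuming the usual normal-approximation formulas (logistic Elo from the mean score, a 95% interval on the mean score mapped through the same curve, and LOS from wins versus losses); with 0 wins, 10 losses and 590 draws it gives the same four figures, but it is not taken from CuteChess's source.

```python
import math
from statistics import NormalDist


def elo_from_score(p: float) -> float:
    """Logistic Elo difference for an expected score p (0 < p < 1)."""
    return -400 * math.log10(1 / p - 1)


def match_stats(wins: int, losses: int, draws: int):
    n = wins + losses + draws
    w, l, d = wins / n, losses / n, draws / n
    mu = w + d / 2  # mean score per game
    # Per-game variance of the score, then standard error of the mean.
    var = w * (1 - mu) ** 2 + l * (0 - mu) ** 2 + d * (0.5 - mu) ** 2
    stdev = math.sqrt(var / n)
    z = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval
    margin = (elo_from_score(mu + z * stdev) - elo_from_score(mu - z * stdev)) / 2
    decisive = wins + losses
    los = 0.5 if decisive == 0 else 0.5 + 0.5 * math.erf((wins - losses) / math.sqrt(2 * decisive))
    return elo_from_score(mu), margin, 100 * los, 100 * draws / n


elo, margin, los, draw_ratio = match_stats(wins=0, losses=10, draws=590)
print(f"Elo difference: {elo:.1f} +/- {margin:.1f}, LOS: {los:.1f} %, DrawRatio: {draw_ratio:.1f} %")
# Elo difference: -5.8 +/- 3.6, LOS: 0.1 %, DrawRatio: 98.3 %
```

Note how little information 590 draws carry: the whole result rests on the 10 decisive games.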

Subject: Re: Dragon 3.1 vs Stockfish 07/24/22 - Live Thu Aug 04, 2022 11:05 am

mwyoung wrote:

DrawRatio: 98.3 %

Yikes.

Posts : 880 Join date : 2020-11-25 Location : USA

Admin wrote:

DrawRatio: 98.3 %

Yikes.

The next match will make you quit chess engine testing.

Posts : 1254 Join date : 2020-11-17 Location : France

Admin wrote:

DrawRatio: 98.3 %

Yikes.

How about doing a Stefan Pohl on the game data, looking for material sacrifices that finally end in a draw? E.g., at least find out whether an engine should get credit for trying to win?

Posts : 880 Join date : 2020-11-25 Location : USA

Chris Whittington wrote:

How about doing a Stefan Pohl on the game data, looking for material sacrifices that finally end in a draw? E.g., at least find out whether an engine should get credit for trying to win?

No, I like the idea that even my laptop has solved chess at 5 seconds a move, using the draw rate as proof. lol!

Posts : 880 Join date : 2020-11-25 Location : USA

Current data from the 40/2h vs 3m+2s match.

Code:
Result:
-----------------------------------------------------------------------------------------------
  # name games wins draws losses score% elo + -
  1. Dragon 3.1 by Komodo Chess 64-bit 10 0 10 0 50.0 0 69 69
  2. Stockfish 240722 10 0 10 0 50.0 0 69 69

Cross table:
-----------------------------------------------------------------------------------------------
  # name score% games 1 2
  1. Dragon 3.1 by Komodo Chess 64-bit 50.0 10 x ==========------------------------------------------------------------------------------------------
  2. Stockfish 240722 50.0 10 ==========------------------------------------------------------------------------------------------ x

Tech:
-----------------------------------------------------------------------------------------------

Tech (average nodes, depths, time/m per move, others per game), counted for computing moves only, ignored moves with zero nodes:
  # name nodes/m NPS depth/m time/m moves time
  1. Dragon 3.1 by Komodo Chess 64-bit 2306091K 12137221 54.2 190.0 62.0 11780.1
  2. Stockfish 240722 45346K 10130437 46.7 4.5 62.2 278.4
   all --- 1146385K 12090886 50.4 97.1 62.1 6029.3
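To put the time odds in numbers, a Python sketch comparing the clock time each side had available in an average 62-move game with the average time per game in the Tech table. It assumes 40/2h is a repeating control (a fresh 2 hours every 40 moves) and that 3m+2s adds 2 s per move; the 62-move length and the times used are the table's averages, so the percentages are approximate.

```python
import math


def available_repeating(moves: int, moves_per_period: int = 40, period_s: float = 7200.0) -> float:
    """Upper bound on clock time under a repeating control such as 40/2h (assumed)."""
    return math.ceil(moves / moves_per_period) * period_s


def available_increment(moves: int, base_s: float = 180.0, inc_s: float = 2.0) -> float:
    """Upper bound on clock time under base + increment, e.g. 3m+2s."""
    return base_s + inc_s * moves


MOVES = 62  # average game length from the Tech table
used = {"Dragon 3.1 (40/2h)": 11780.1, "Stockfish 240722 (3m+2s)": 278.4}  # seconds per game
available = {
    "Dragon 3.1 (40/2h)": available_repeating(MOVES),
    "Stockfish 240722 (3m+2s)": available_increment(MOVES),
}

for name, seconds in used.items():
    share = 100 * seconds / available[name]
    print(f"{name}: {available[name]:.0f} s available, {seconds:.1f} s used ({share:.0f}%)")

odds = available["Dragon 3.1 (40/2h)"] / available["Stockfish 240722 (3m+2s)"]
print(f"Clock-time odds: about {odds:.0f}x")
```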
