ProDeo

Computer Chess
 

 

 The Stockfish ELO problem

Admin

Posts : 2608
Join date : 2020-11-17
Location : Netherlands

PostSubject: The Stockfish ELO problem   Mon Aug 01, 2022 10:24 am

So which of the last 4 Stockfish versions is the best? The 2 main rating lists give confusing results, so what's going on?

Code:

CCRL 40m/15m                 CEGT 40m/20m
Stockfish 15   4CPU   3538   Stockfish 15   1CPU    3592
Stockfish 14   4CPU   3537   Stockfish 14.1 1CPU    3578
Stockfish 13   4CPU   3536   Stockfish 14   1CPU    3575    
Stockfish 14.1 4CPU   3522   Stockfish 13   1CPU    3563
                        
CCRL 40m/2m                  CEGT 40m/2m
Stockfish 15   1CPU   3691   Stockfish 14.1 1CPU    3649
Stockfish 14.1 1CPU   3679   Stockfish 14   1CPU    3646
Stockfish 14   1CPU   3650   Stockfish 15   1CPU    3645
Stockfish 13   1CPU   3624   Stockfish 13   1CPU    3600

1. According to the CCRL 40/15 4CPU list, Stockfish 13-15 made no progress at all; version 14.1 is even worse.

2. However, the CCRL 40/2 1CPU list looks as expected and as announced by the SF team: nice and steady progress, +67 Elo.

3. The same can be said about the CEGT 40/20 1CPU list: steady progress, although smaller, +29 Elo.

4. But like the CCRL 40/15 4CPU list, the CEGT 40/2 1CPU list is just as confusing: no progress since version 14.

Various issues (in the mix!) may play a role:
1. Stockfish scales badly (an assumption, not a fact).
2. The Elo pools used by the testers; for examples, see the Average Opponent column in the rating lists.
3. The use of SF derivatives in the pools, which may lower the measured strength of the real one.
4. More CPUs plus a longer time control also make the opposition stronger; a good example is [2] vs [3], +67 Elo vs +29 Elo.
5. An unproven assumption: SF has become so strong (with high draw percentages as a result) that the Ordo and BayesElo rating systems no longer function; they have reached a sort of upper limit.

To shed some other light on the issue, I decided to pitch the 4 versions against each other in a round robin, on 20 cores and at increasing time controls. Let's see how that goes and whether the same pattern arises.

Code:
------------------------|----------------|----------------|----------------|-----------------|
          TC=40/10  ELO | TC=40/10   ELO | TC=40/20   ELO | TC=40/40   ELO | TC=40/80   ELO  |
------------------------|----------------|----------------|----------------|-----------------|
Engine     1 CPU        |  20 CPU        |  20 CPU        |  20 CPU        |  20 CPU         |
------------------------|----------------|----------------|----------------|-----------------|
SF15       61.7%    +82 |   58.3%    +58 |                |                |                 |
SF14.1     54.1%    +31 |   53.3%    +23 |   running      |                |                 |
SF14       48.5%    -10 |   46.3%    -26 |                |                |                 |
SF13       35.7%   -100 |   42.0%    -56 |                |                |                 |
------------------------|----------------|----------------|----------------|-----------------|
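A note on the ELO column: it tracks the logistic Elo model, in which a score fraction p corresponds to an Elo edge of 400 * log10(p / (1 - p)). The Python sketch below applies that formula to the 1 CPU column. It lands within a few points of the table (61.7% gives about +83, 35.7% about -102). The table was presumably produced by a rating tool that fits all round-robin games jointly, so small differences are to be expected.

```python
import math

def score_to_elo(score: float) -> float:
    """Elo difference implied by a score fraction under the logistic Elo model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))

# Score percentages from the 1 CPU, TC=40/10 column of the table above.
one_cpu = {"SF15": 61.7, "SF14.1": 54.1, "SF14": 48.5, "SF13": 35.7}
for engine, pct in one_cpu.items():
    print(f"{engine:7s} {pct:5.1f}%  {score_to_elo(pct / 100):+7.1f}")
```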
http://rebel13.nl/

mwyoung

Posts : 880
Join date : 2020-11-25
Location : USA

PostSubject: Re: The Stockfish ELO problem   Mon Aug 01, 2022 11:30 am

Admin wrote:
5. An unproven assumption: SF has become so strong (with high draw percentages as a result) that the Ordo and BayesElo rating systems no longer function; they have reached a sort of upper limit.

I like 5! It means chess engines are no longer making progress, because Stockfish on current hardware has become unbeatable.

The Elo system is pretty simple: if you win, your rating goes up; if you lose, your rating goes down. If you draw, your rating goes up, goes down, or stays the same depending on the other player's rating.

So if this system no longer works, that means chess engine progress is also at an end, and any chess engine still being developed can only hope to match Stockfish's rating.

So no need to worry about those pricey computer upgrades for chess engines.
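The update rule mwyoung describes is the incremental (FIDE-style) Elo formula: the new rating is R + K * (S - E), where S is the game result and E the expected score derived from the rating gap. A draw (S = 0.5) therefore raises the lower-rated player, lowers the higher-rated one, and leaves equal ratings unchanged. The rating lists discussed in this thread use Ordo or BayesElo, which fit all games at once rather than updating game by game, so this is only a sketch of the qualitative behaviour; the K = 20 below is an illustrative value, not taken from any list.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

def updated_rating(r_a: float, r_b: float, result: float, k: float = 20.0) -> float:
    """Player A's rating after one game; result is 1 (win), 0.5 (draw) or 0 (loss).
    k = 20 is an illustrative K-factor, not taken from any rating list."""
    return r_a + k * (result - expected_score(r_a, r_b))

# The three draw cases from the post above.
print(updated_rating(3500, 3600, 0.5))  # lower-rated side gains
print(updated_rating(3600, 3500, 0.5))  # higher-rated side loses
print(updated_rating(3550, 3550, 0.5))  # equal ratings: no change
```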

TheSelfImprover

Posts : 3112
Join date : 2020-11-18

PostSubject: Re: The Stockfish ELO problem   Mon Aug 01, 2022 11:45 am

mwyoung wrote:
Admin wrote:
5. An unproven assumption: SF has become so strong (with high draw percentages as a result) that the Ordo and BayesElo rating systems no longer function; they have reached a sort of upper limit.

I like 5! It means chess engines are no longer making progress, because Stockfish on current hardware has become unbeatable.

The Elo system is pretty simple: if you win, your rating goes up; if you lose, your rating goes down. If you draw, your rating goes up, goes down, or stays the same depending on the other player's rating.

So if this system no longer works, that means chess engine progress is also at an end, and any chess engine still being developed can only hope to match Stockfish's rating.

So no need to worry about those pricey computer upgrades for chess engines.


We might not be there yet, but it looks to me as though that's the direction of travel we're on.

Admin

PostSubject: Re: The Stockfish ELO problem   Mon Aug 01, 2022 6:47 pm

mwyoung wrote:
Admin wrote:
5. An unproven assumption: SF has become so strong (with high draw percentages as a result) that the Ordo and BayesElo rating systems no longer function; they have reached a sort of upper limit.

I like 5! It means chess engines are no longer making progress, because Stockfish on current hardware has become unbeatable.

The Elo system is pretty simple: if you win, your rating goes up; if you lose, your rating goes down. If you draw, your rating goes up, goes down, or stays the same depending on the other player's rating.

So if this system no longer works, that means chess engine progress is also at an end, and any chess engine still being developed can only hope to match Stockfish's rating.

So no need to worry about those pricey computer upgrades for chess engines.

If that is the case, we need a new Elo formula. Of course progress continues; SF can't win almost every TCEC superfinal without progress.

Admin

PostSubject: Re: The Stockfish ELO problem   Mon Aug 01, 2022 9:11 pm

UPDATE

Code:
|-------------------------|----------------|----------------|----------------|-----------------|
|         TC=40/10    ELO | TC=40/10   ELO | TC=40/20   ELO | TC=40/40   ELO | TC=40/80   ELO  |
|-------------------------|----------------|----------------|----------------|-----------------|
| Engine     1 CPU        |    20 CPU      |    20 CPU      |    20 CPU      |    20 CPU       |
|-------------------------|----------------|----------------|----------------|-----------------|
| SF15       61.7%    +82 |   58.3%    +58 |   54.8%    +33 |                |                 |
| SF14.1     54.1%    +31 |   53.3%    +23 |   51.8%    +12 |   running      |                 |
| SF14       48.5%    -10 |   46.3%    -26 |   48.7%     -9 |                |                 |
| SF13       35.7%   -100 |   42.0%    -56 |   44.7%    -37 |                |                 |
|-------------------------|----------------|----------------|----------------|-----------------|
|                         |     Equals     |     Equals     |     Equals     |     Equals      |
|                         |  40m in 3m/20s |  40m in 6m/40s |  40m in 13m    |  40m in 26m     |
|-------------------------|----------------|----------------|----------------|-----------------|
| Dragon 2.5  running     |                |                |                |                 |
| Dragon 2.0              |                |                |                |                 |
|-------------------------|----------------|----------------|----------------|-----------------|
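The Equals row appears to be the 20-core time control multiplied by 20 and expressed as single-core time, assuming the TC figures are seconds per 40 moves: 40/10 gives 200 s (3m/20s), and 40/40 gives 800 s, shown rounded down as 13m. That conversion treats 20 cores as exactly 20 times one core. Parallel search speedup is below linear, so the Equals figures are probably best read as optimistic. A small sketch of the arithmetic:

```python
def equals_single_core(seconds_per_40_moves: int, cores: int) -> str:
    """Single-core equivalent of a multi-core time control, assuming the
    search speeds up linearly with the number of cores (optimistic)."""
    total = seconds_per_40_moves * cores
    minutes, seconds = divmod(total, 60)
    return f"40m in {minutes}m/{seconds:02d}s"

for tc in (10, 20, 40, 80):
    print(f"TC=40/{tc} on 20 cores ~ {equals_single_core(tc, 20)}")
```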

Draws : 86.2%

I can already see the first contours of the rating-list results, and of option 5, looming.

OTOH, it could be a scaling issue specific to SF, so I also added Komodo to the same test. Dragon 2.5 is considerably stronger than 2.0, but if it shows the same pattern as SF, option 5 becomes a serious possibility.

We will see....

mwyoung

PostSubject: Re: The Stockfish ELO problem   Mon Aug 01, 2022 9:38 pm

Admin wrote:
mwyoung wrote:
Admin wrote:
5. An unproven assumption: SF has become so strong (with high draw percentages as a result) that the Ordo and BayesElo rating systems no longer function; they have reached a sort of upper limit.

I like 5! It means chess engines are no longer making progress, because Stockfish on current hardware has become unbeatable.

The Elo system is pretty simple: if you win, your rating goes up; if you lose, your rating goes down. If you draw, your rating goes up, goes down, or stays the same depending on the other player's rating.

So if this system no longer works, that means chess engine progress is also at an end, and any chess engine still being developed can only hope to match Stockfish's rating.

So no need to worry about those pricey computer upgrades for chess engines.

If that is the case, we need a new Elo formula. Of course progress continues; SF can't win almost every TCEC superfinal without progress.

Ed, there is nothing wrong with the Elo system; that is not the issue. You could put a 32-man tablebase in the mix, and the Elo system would still give an accurate rating for the conditions of the rating pool.

My feeling is that tuning for long time controls or many cores is not a priority for Stockfish or the other engines. They only tune at very fast time controls like micro bullet, because they want thousands of games to judge each change to an engine.

But those changes do not always work at longer time controls or with more cores.

Stockfish is a perfect example: we see progress at micro-bullet and bullet time controls.

I think it is an issue of priorities.

mwyoung

PostSubject: Re: The Stockfish ELO problem   Mon Aug 01, 2022 10:19 pm

To show the scaling issue with Stockfish, I started a match between SF DEV and a chess search engine that is very slow and weak compared with Stockfish NNUE.

It uses Monte-Carlo Tree Search (MCTS) and searches only 700 NPS against Stockfish's 13 million NPS, but it has the advantage of good scaling.

Here are the results so far, at only 40 moves/1 hour:

Code:
Score of Dragon 3.1 MCTS vs Stockfish 07/24/22: 0 - 0 - 5 [0.500]
...      Dragon 3.1 MCTS playing White: 0 - 0 - 3  [0.500] 3
...      Dragon 3.1 MCTS playing Black: 0 - 0 - 2  [0.500] 2
...      White vs Black: 0 - 0 - 5  [0.500] 5
Elo difference: 0.0 +/- 0.0, LOS: nan %, DrawRatio: 100.0 %
5 of 200 games finished.
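The odd-looking "LOS: nan %" and "+/- 0.0" follow from the all-draw score. The likelihood of superiority is commonly computed from decisive games only, LOS = 0.5 * (1 + erf((W - L) / sqrt(2 * (W + L)))), which is 0/0 when W = L = 0; and with every game drawn, the per-game scores have zero variance, hence the zero error margin. The sketch below assumes the match tool uses this common formula; it returns NaN explicitly in that case, because Python would otherwise raise ZeroDivisionError.

```python
import math

def likelihood_of_superiority(wins: int, losses: int) -> float:
    """LOS from decisive games only (draws carry no information here):
    0.5 * (1 + erf((W - L) / sqrt(2 * (W + L)))). NaN if no game was decisive."""
    decisive = wins + losses
    if decisive == 0:
        return float("nan")
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * decisive)))

print(likelihood_of_superiority(0, 0))  # the 0 - 0 - 5 score above
print(likelihood_of_superiority(3, 1))  # a hypothetical 3 - 1 split
```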

Admin

PostSubject: Re: The Stockfish ELO problem   Tue Aug 02, 2022 8:54 am

KOMODO UPDATE

Code:
|-------------------------|----------------|----------------|----------------|-----------------|
|         TC=40/10    ELO | TC=40/10   ELO | TC=40/20   ELO | TC=40/40   ELO | TC=40/80   ELO  |
|-------------------------|----------------|----------------|----------------|-----------------|
| Engine     1 CPU        |    20 CPU      |    20 CPU      |    20 CPU      |    20 CPU       |
|-------------------------|----------------|----------------|----------------|-----------------|
| SF15       61.7%    +82 |   58.3%    +58 |   54.8%    +33 |                |                 |
| SF14.1     54.1%    +31 |   53.3%    +23 |   51.8%    +12 |   running      |                 |
| SF14       48.5%    -10 |   46.3%    -26 |   48.7%     -9 |                |                 |
| SF13       35.7%   -100 |   42.0%    -56 |   44.7%    -37 |                |                 |
|-------------------------|----------------|----------------|----------------|-----------------|
| Dragon 2.5 66.2%   +113 |   58.0%    +56 |   58.7%    +61 |   running      |                 |
| Dragon 2.0 33.8%   -113 |   42.0%    -56 |   41.3%    -61 |                |                 |
|-------------------------|----------------|----------------|----------------|-----------------|
|                         |     Equals     |     Equals     |     Equals     |     Equals      |
|                         |  40m in 3m/20s |  40m in 6m/40s |  40m in 13m    |  40m in 26m     |
|-------------------------|----------------|----------------|----------------|-----------------|

Unlike SF, Komodo scales extremely well. It's too early for a final conclusion, but it looks like Mark said earlier: the SF folks had better start focusing on LTC, and there is nothing wrong with the rating lists.

Phew...

mwyoung

PostSubject: Re: The Stockfish ELO problem   Tue Aug 02, 2022 11:00 am

mwyoung wrote:
To show the scaling issue with Stockfish, I started a match between SF DEV and a chess search engine that is very slow and weak compared with Stockfish NNUE.

It uses Monte-Carlo Tree Search (MCTS) and searches only 700 NPS against Stockfish's 13 million NPS, but it has the advantage of good scaling.

Here are the results so far, at only 40 moves/1 hour:


This result so far is hilarious! It shows that if you use your engine for long analysis, you should also check with better-scaling engines like Lc0, or even MCTS.

Code:
Score of Dragon 3.1 MCTS vs Stockfish 07/24/22: 0 - 0 - 8 [0.500]
...      Dragon 3.1 MCTS playing White: 0 - 0 - 4  [0.500] 4
...      Dragon 3.1 MCTS playing Black: 0 - 0 - 4  [0.500] 4
...      White vs Black: 0 - 0 - 8  [0.500] 8
Elo difference: 0.0 +/- 0.0, LOS: nan %, DrawRatio: 100.0 %
8 of 200 games finished.

Admin

PostSubject: Re: The Stockfish ELO problem   Tue Aug 02, 2022 8:50 pm

UPDATE

Code:
|-------------------------|----------------|----------------|----------------|-----------------|
|         TC=40/10    ELO | TC=40/10   ELO | TC=40/20   ELO | TC=40/40   ELO | TC=40/80   ELO  |
|-------------------------|----------------|----------------|----------------|-----------------|
| Engine     1 CPU        |    20 CPU      |    20 CPU      |    20 CPU      |    20 CPU       |
|-------------------------|----------------|----------------|----------------|-----------------|
| SF15       61.7%    +82 |   58.3%    +58 |   54.8%    +33 |   55.1%    +35 |                 |
| SF14.1     54.1%    +31 |   53.3%    +23 |   51.8%    +12 |   51.3%     +9 |   running       |
| SF14       48.5%    -10 |   46.3%    -26 |   48.7%     -9 |   47.6%    -16 |                 |
| SF13       35.7%   -100 |   42.0%    -56 |   44.7%    -37 |   46.0%    -28 |                 |
|-------------------------|----------------|----------------|----------------|-----------------|
| Dragon 2.5 66.2%   +113 |   58.0%    +56 |   58.7%    +61 |   58.5%    +59 |  Done, Komodo   |
| Dragon 2.0 33.8%   -113 |   42.0%    -56 |   41.3%    -61 |   41.5%    -59 |  scales well    |
|-------------------------|----------------|----------------|----------------|-----------------|
|                         |     Equals     |     Equals     |     Equals     |     Equals      |
|                         |  40m in 3m/20s |  40m in 6m/40s |  40m in 13m    |  40m in 26m     |
|-------------------------|----------------|----------------|----------------|-----------------|

Komodo scales extremely well (+56, +61, +59). SF15 held on to about +34 (no further decline). The last SF run, equal to about 40 moves in 26 minutes, roughly corresponds to CCRL 40/15 or CEGT 40/20 at 2 CPUs.

SF draw rate : 88%
Dragon draw rate : 73%

mwyoung

PostSubject: Re: The Stockfish ELO problem   Tue Aug 02, 2022 9:05 pm

Admin wrote:
UPDATE


Komodo scales extremely well (+56, +61, +59). SF15 held on to about +34 (no further decline). The last SF run, equal to about 40 moves in 26 minutes, roughly corresponds to CCRL 40/15 or CEGT 40/20 at 2 CPUs.

SF draw rate : 88%
Dragon draw rate : 73%

It gets even worse for SF at normal time controls, but we can see the effect here just fine.

Admin

PostSubject: Re: The Stockfish ELO problem   Thu Aug 04, 2022 8:13 am

FINAL RESULT

Code:
|-------------------------|----------------|----------------|----------------|-----------------|
|         TC=40/10    ELO | TC=40/10   ELO | TC=40/20   ELO | TC=40/40   ELO | TC=40/80   ELO  |
|-------------------------|----------------|----------------|----------------|-----------------|
| Engine     1 CPU        |    20 CPU      |    20 CPU      |    20 CPU      |    20 CPU       |
|-------------------------|----------------|----------------|----------------|-----------------|
| SF15       61.7%    +82 |   58.3%    +58 |   54.8%    +33 |   55.1%    +35 |   53.7%    +26  |
| SF14.1     54.1%    +31 |   53.3%    +23 |   51.8%    +12 |   51.3%     +9 |   50.3%     +2  |
| SF14       48.5%    -10 |   46.3%    -26 |   48.7%     -9 |   47.6%    -16 |   48.7%     -9  |
| SF13       35.7%   -100 |   42.0%    -56 |   44.7%    -37 |   46.0%    -28 |   47.3%    -19  |
|-------------------------|----------------|----------------|----------------|-----------------|
| Dragon 2.5 66.2%   +113 |   58.0%    +56 |   58.7%    +61 |   58.5%    +59 |  Done, Komodo   |
| Dragon 2.0 33.8%   -113 |   42.0%    -56 |   41.3%    -61 |   41.5%    -59 |  scales well    |
|-------------------------|----------------|----------------|----------------|-----------------|
|                         |     Equals     |     Equals     |     Equals     |     Equals      |
|                         |  40m in 3m/20s |  40m in 6m/40s |  40m in 13m    |  40m in 26m     |
|-------------------------|----------------|----------------|----------------|-----------------|

Draw rate : 91.7%

Ugh.

SF15 went down from +82 to finally +26, yet it never lost a game in the last run.
SF13 went up from -100 to finally -19.
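SF15 reaching only +26 without losing a game is what the arithmetic of a high draw rate predicts. A player that never loses and draws a fraction d of its games scores (1 - d) + d/2 = 1 - d/2, which caps its Elo edge. The post gives only the pool-wide draw rate, so assuming SF15's own draw rate was close to 91.7%, the ceiling is a score of about 54.2%, or roughly +29, just above the +26 it actually got. A sketch:

```python
import math

def elo_ceiling_without_losses(draw_rate: float) -> float:
    """Largest Elo edge possible for a player that never loses and draws a
    fraction `draw_rate` of its games: its score is 1 - draw_rate / 2."""
    score = 1.0 - draw_rate / 2.0
    if score >= 1.0:
        return math.inf  # no draws and no losses means a perfect score
    return 400.0 * math.log10(score / (1.0 - score))

# Draw rates quoted in this thread: Dragon 73%, SF 88%, the final SF pool 91.7%.
for d in (0.73, 0.88, 0.917):
    print(f"draw rate {d:.1%}: at most {elo_ceiling_without_losses(d):+.0f} Elo")
```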

I am beginning to understand why CEGT and CCRL show the results pointed out in the opening post.

I am also beginning to understand why rating lists like Stefan's and the GRL don't have this problem yet: first, they play openings that favor SF, and second, their short time control also favors SF.

TCEC is a dead-end street: they play unusual openings to survive, and the question is, for how long?

Uri Blass

Posts : 207
Join date : 2020-11-28

PostSubject: Re: The Stockfish ELO problem   Mon Aug 08, 2022 11:43 am
Admin wrote:
mwyoung wrote:
Admin wrote:
5. An unproven assumption: SF has become so strong (with high draw percentages as a result) that the Ordo and BayesElo rating systems no longer function; they have reached a sort of upper limit.

I like 5! It means chess engines are no longer making progress, because Stockfish on current hardware has become unbeatable.

The Elo system is pretty simple: if you win, your rating goes up; if you lose, your rating goes down. If you draw, your rating goes up, goes down, or stays the same depending on the other player's rating.

So if this system no longer works, that means chess engine progress is also at an end, and any chess engine still being developed can only hope to match Stockfish's rating.

So no need to worry about those pricey computer upgrades for chess engines.

If that is the case, we need a new Elo formula. Of course progress continues; SF can't win almost every TCEC superfinal without progress.

1) SF cannot win TCEC without progress in unbalanced positions, but if Stockfish cannot create such positions in normal games, even against weaker opponents, then this progress is not relevant to its rating in normal games.

2) Even if Stockfish is unbeatable, there can be progress in beating weaker opponents: knowing how to avoid losing mistakes does not mean that you always know how to take advantage of the opponent's losing mistakes in order to win.

TheSelfImprover

PostSubject: Re: The Stockfish ELO problem   Mon Aug 08, 2022 1:52 pm

Uri Blass wrote:
Even if Stockfish is unbeatable, there can be progress in beating weaker opponents: knowing how to avoid losing mistakes does not mean that you always know how to take advantage of the opponent's losing mistakes in order to win.


If a player recognises losing positions, he/she/it should be able to avoid a move that turns a winning position into a non-winning one.

Uri Blass

PostSubject: Re: The Stockfish ELO problem   Mon Aug 08, 2022 5:07 pm

TheSelfImprover wrote:
Uri Blass wrote:
Even if Stockfish is unbeatable, there can be progress in beating weaker opponents: knowing how to avoid losing mistakes does not mean that you always know how to take advantage of the opponent's losing mistakes in order to win.


If a player recognises losing positions, he/she/it should be able to avoid a move that turns a winning position into a non-winning one.


Being able not to lose does not mean that you can win every winning position you get, nor does it mean that you recognize losing positions.
In practice you do not go into inferior positions when you have the choice of an equal one, even if you do not know whether the inferior position is a draw or a loss.

Suppose that in KRPPP vs KRPPP, with all pawns on the same side, you know how to draw and how to avoid losing material, so in practice you never lose.

Now suppose the opponent loses a pawn and the game reaches KRPPP vs KRPP, with you a pawn up.
It is possible that the position is now winning but you do not know how to win it and will miss the win later; that does not contradict your ability not to lose from the starting KRPPP vs KRPPP.

Damir Desevac

Posts : 330
Join date : 2020-11-27
Age : 43
Location : Denmark

PostSubject: Re: The Stockfish ELO problem   Tue Aug 09, 2022 5:43 pm

For SF to improve its scaling it must disable its NNUE...

Another issue is that Komodo and SF do not use the same NNUE file, so it is not a fair comparison...

Chris Whittington

Posts : 1254
Join date : 2020-11-17
Location : France

PostSubject: Re: The Stockfish ELO problem   Tue Aug 09, 2022 6:02 pm

Arguing from limits: the upper bound of an NNUE is EGTB(32), and EGTB(32) needs no search at all. As an NNUE approaches its upper bound, it will benefit less and less from search (or "scale badly", as some people call it).

Admin

PostSubject: Re: The Stockfish ELO problem   Tue Aug 09, 2022 8:14 pm

Damir Desevac wrote:
For SF to improve its scaling it must disable its NNUE...

With a loss of 150-200 Elo as a result.

Instead they should work on search; key words: less (tapered) pruning.

Admin

PostSubject: Re: The Stockfish ELO problem   Tue Aug 09, 2022 8:37 pm

Alright, more on scaling.

We have seen that SF does not scale so well, while Komodo scales very well.

What about Rebel?

Here are the results of the latest network.

Code:
No. Name           Win Draw Loss Unf.  Score Games       %
----------------------------------------------------------
  1 Rebel         +634 =976 -390   *0 1122.0  2000   56.1%
  2 Seer-2.5.0     +81 =236  -83   *0  199.0   400   49.8%
  3 Stockfish-9    +89 =176 -135   *0  177.0   400   44.2%
  4 Komodo-14      +70 =205 -125   *0  172.5   400   43.1%
  5 Houdini-6.03   +81 =169 -150   *0  165.5   400   41.4%
  6 RubiChess-2.2  +69 =190 -141   *0  164.0   400   41.0%

Ordo performance based on the CCRL blitz list.

ordo -p all.pgn -a 3457

Code:
   # PLAYER           :  RATING  POINTS  PLAYED   (%)
   1 Rebel            :  3492.9  1122.0    2000    56
   2 Seer-2.5.0       :  3491.2   199.0     400    50
   3 Stockfish-9      :  3452.4   177.0     400    44
   4 Komodo-14        :  3444.4   172.5     400    43
   5 Houdini-6.03     :  3431.9   165.5     400    41
   6 RubiChess-2.2    :  3429.1   164.0     400    41

56% is a fantastic result against these giants, but... it is based on a 40/10 time control.

In the next runs I will double the time control, starting with 40/20, and I PREDICT that in the end I will be happy with 50%. If that comes true, it would mean a loss of 42 Elo points because of scaling.

The reason is simple: SEARCH.
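The 42-point figure follows from the same logistic model. Rebel's 634 wins, 976 draws and 390 losses give (634 + 976/2) / 2000 = 56.1%. Adding 400 * log10(0.561 / 0.439), about 42.6, to the average of its five opponents' Ordo ratings gives about 3492, within a point of Ordo's 3492.9, while a 50% score would give just that average, about 3450. Ordo fits all games jointly, so this simple performance formula is only an approximation that happens to agree closely here, where each opponent played the same 400 games against Rebel.

```python
import math

def score_fraction(wins: int, draws: int, losses: int) -> float:
    """Points scored per game, counting a draw as half a point."""
    return (wins + 0.5 * draws) / (wins + draws + losses)

def performance(opponent_ratings: list[float], score: float) -> float:
    """Simple performance rating: mean opponent rating plus the Elo edge
    implied by the score (logistic model). Only an approximation of Ordo."""
    mean_opponent = sum(opponent_ratings) / len(opponent_ratings)
    return mean_opponent + 400.0 * math.log10(score / (1.0 - score))

# Rebel's 40/10 result and its opponents' Ordo ratings from the tables above.
opponents = [3491.2, 3452.4, 3444.4, 3431.9, 3429.1]
s = score_fraction(634, 976, 390)
print(f"score {s:.1%}, performance {performance(opponents, s):.1f}")
print(f"at 50%: {performance(opponents, 0.5):.1f}")
```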

Uri Blass

PostSubject: Re: The Stockfish ELO problem   Tue Aug 09, 2022 9:16 pm

Chris Whittington wrote:
Arguing from limits: the upper bound of an NNUE is EGTB(32), and EGTB(32) needs no search at all. As an NNUE approaches its upper bound, it will benefit less and less from search (or "scale badly", as some people call it).


EGTB(32) means that you do not lose, but it does not mean that no improvement is possible in beating weaker opponents.

Playing a random drawing move in a drawn position is certainly a bad strategy, and EGTB(32) does not tell you which of the drawing moves is better.
Always playing a fixed drawing move, added to the 32-piece EGTB, in a drawn position is also a bad strategy when you play many games, because the opponent can memorize one drawn game and repeat it to draw against you again and again.


mwyoung

PostSubject: Re: The Stockfish ELO problem   Tue Aug 09, 2022 9:27 pm
Uri Blass wrote:
Chris Whittington wrote:
Arguing from limits: the upper bound of an NNUE is EGTB(32), and EGTB(32) needs no search at all. As an NNUE approaches its upper bound, it will benefit less and less from search (or "scale badly", as some people call it).


EGTB(32) means that you do not lose, but it does not mean that no improvement is possible in beating weaker opponents.

Playing a random drawing move in a drawn position is certainly a bad strategy, and EGTB(32) does not tell you which of the drawing moves is better.
Always playing a fixed drawing move, added to the 32-piece EGTB, in a drawn position is also a bad strategy when you play many games, because the opponent can memorize one drawn game and repeat it to draw against you again and again.

When I do the reverse match of D3.1 vs SF, where SF gets the 2 hours for 40 moves, it could tell us something.

But as we know, engines do not play perfectly, as can be shown with the perfect play we have today: in 7-man positions and in many test positions, the engines are still clueless.

But a good way to make the most of a 32-man TB would be to play the lines that promote the most complexity, keeping the game as long as possible, so that you are playing the odds.

Remember, it only takes 1 mistake to lose against perfect play, assuming (and that is only an assumption) that chess is a forced draw.

Uri Blass

PostSubject: Re: The Stockfish ELO problem   Tue Aug 09, 2022 10:53 pm

mwyoung wrote:
Uri Blass wrote:
Chris Whittington wrote:
Arguing from limits: the upper bound of an NNUE is EGTB(32), and EGTB(32) needs no search at all. As an NNUE approaches its upper bound, it will benefit less and less from search (or "scale badly", as some people call it).


EGTB(32) means that you do not lose, but it does not mean that no improvement is possible in beating weaker opponents.

Playing a random drawing move in a drawn position is certainly a bad strategy, and EGTB(32) does not tell you which of the drawing moves is better.
Always playing a fixed drawing move, added to the 32-piece EGTB, in a drawn position is also a bad strategy when you play many games, because the opponent can memorize one drawn game and repeat it to draw against you again and again.

When I do the reverse match of D3.1 vs SF, where SF gets the 2 hours for 40 moves, it could tell us something.

But as we know, engines do not play perfectly, as can be shown with the perfect play we have today: in 7-man positions and in many test positions, the engines are still clueless.

But a good way to make the most of a 32-man TB would be to play the lines that promote the most complexity, keeping the game as long as possible, so that you are playing the odds.

Remember, it only takes 1 mistake to lose against perfect play, assuming (and that is only an assumption) that chess is a forced draw.

I know that engines are clueless in many positions, but that does not contradict perfect chess from the opening position.

Basically, there are two questions:
1) Do engines play perfect games with no losing mistakes?
2) Is it possible to find some strategy to beat chess engines?

It is possible that the answer to both questions is yes.
I do not know of a single program that makes a serious attempt to win against a specific opponent (when it is given the details of the opponent and the time control), including using the fact that it has significantly more time to predict the opponent's moves, and pruning moves not because they are bad but because the opponent is not expected to play them (predicting the moves can be done simply by asking a copy of the opponent to search).
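The idea of pruning replies "because the opponent is not expected to play them" is a simple form of opponent-model search. A minimal sketch on a toy game tree, where the tree, the scores and the `predict` function are all hypothetical stand-ins (in Uri's proposal the prediction would come from running a copy of the opponent's engine): our nodes take the maximum as in minimax, while at opponent nodes only the predicted reply is searched.

```python
from typing import Callable, Dict, List, Optional

Tree = Dict[str, List[str]]                 # node -> child nodes (toy tree)
Predictor = Callable[[str], Optional[str]]  # opponent node -> predicted reply

def om_value(node: str, tree: Tree, leaf: Dict[str, int],
             our_turn: bool, predict: Predictor) -> int:
    """Value of `node` for us. Our nodes take the max, as in minimax. At
    opponent nodes, if the model predicts one of the replies, only that
    reply is searched and the others are pruned; otherwise fall back to
    the minimax min over all replies."""
    children = tree.get(node, [])
    if not children:
        return leaf[node]
    if our_turn:
        return max(om_value(c, tree, leaf, False, predict) for c in children)
    guess = predict(node)
    if guess in children:
        return om_value(guess, tree, leaf, True, predict)
    return min(om_value(c, tree, leaf, True, predict) for c in children)

def choose(root: str, tree: Tree, leaf: Dict[str, int], predict: Predictor) -> str:
    """Pick our move at `root`; the opponent replies next."""
    return max(tree[root], key=lambda c: om_value(c, tree, leaf, False, predict))

# Toy position: "safe" draws against anything; "trap" wins unless the
# opponent finds "refute". Scores are for us: +1 win, 0 draw, -1 loss.
tree = {"root": ["safe", "trap"], "safe": ["s1", "s2"], "trap": ["refute", "natural"]}
leaf = {"s1": 0, "s2": 0, "refute": -1, "natural": 1}

print(choose("root", tree, leaf, lambda n: None))                              # safe
print(choose("root", tree, leaf, lambda n: "natural" if n == "trap" else None))  # trap
```

Plain minimax settles for the draw; with a model that expects the "natural" reply, the sketch goes for the trap. The trade-off is visible too: if the prediction is wrong, the engine walks straight into the refutation.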
Admin


Posts : 2608
Join date : 2020-11-17
Location : Netherlands

PostSubject: Re: The Stockfish ELO problem   Tue Aug 09, 2022 10:57 pm

40/20 result

Code:
No. Name           Win Draw Loss Unf.  Score Games       %
----------------------------------------------------------
  1 Rebel         +569 =1030 -401   *0 1084.0  2000  54.2%
  2 Seer-2.5.0     +92 =236  -72   *0  210.0   400   52.5%
  3 Komodo-14      +78 =211 -111   *0  183.5   400   45.9%
  4 Stockfish-9    +79 =199 -122   *0  178.5   400   44.6%
  5 RubiChess-2.2  +75 =205 -120   *0  177.5   400   44.4%
  6 Houdini-6.03   +77 =179 -144   *0  166.5   400   41.6%
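The Score and % columns follow from the W/D/L counts (a win is 1 point, a draw half a point), and a score percentage can be turned into an Elo difference with the usual logistic formula, diff = -400 · log10(1/s - 1). A minimal sketch; how the PERF column in the next table was computed (tool, pool average) is not stated in the post, so this only shows the score-to-Elo step:

```python
import math

def score_and_elo(wins: int, draws: int, losses: int):
    """Points, score percentage and the Elo difference implied by a W/D/L
    record under the standard logistic Elo model:
        diff = -400 * log10(1/s - 1), with s = (W + D/2) / games.
    Undefined for s == 0 or s == 1 (a 0% or 100% score)."""
    games = wins + draws + losses
    points = wins + 0.5 * draws
    s = points / games
    diff = -400.0 * math.log10(1.0 / s - 1.0)
    return points, 100.0 * s, diff

# Rebel's 40/20 line from the table: +569 =1030 -401
points, pct, diff = score_and_elo(569, 1030, 401)
print(f"{points} points, {pct:.1f}%, about {diff:+.0f} Elo vs. this pool")
```

For Rebel's 54.2% this works out to roughly +29 Elo against the average of this opponent pool.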

Grand General

Code:
  TC   SCORE  PERF
40/10  56.1%  3492
40/20  54.2%  3481
40/40
40/80

http://rebel13.nl/
Mclane


Posts : 3022
Join date : 2020-11-17
Age : 57
Location : United States of Europe, Germany, Ruhr area

PostSubject: Re: The Stockfish ELO problem   Tue Aug 09, 2022 10:59 pm

On my new PC, Rebel is very, very strong.
http://www.thorstenczub.de
mwyoung


Posts : 880
Join date : 2020-11-25
Location : USA

PostSubject: Re: The Stockfish ELO problem   Tue Aug 09, 2022 11:12 pm

"I know that engines are clueless in many positions, but that does not contradict perfect chess from the opening position."

Yes, it does contradict that!
What are you talking about?

Otherwise chess engines would play perfect chess in all 7-man positions and test positions, as those are a subset of all positions reachable from the opening position.
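Whether an engine "plays perfect chess" in a 7-man position can be checked move by move against tablebase values. A minimal sketch of the comparison, assuming simplified win/draw/loss values (+1/0/-1) and ignoring the 50-move-rule distinctions that real tables such as Syzygy also encode; the values would come from a tablebase probe, which is not shown here. The one subtle point is the sign flip: after the move it is the opponent's turn, so the probed value must be negated.

```python
def classify_move(wdl_before: int, wdl_after_opp: int) -> str:
    """Judge one move against perfect (tablebase) values.

    wdl_before:    value of the position for the side about to move
                   (+1 win, 0 draw, -1 loss).
    wdl_after_opp: value of the position after the move, for the side now
                   to move (the opponent), so it is negated below.
    """
    wdl_after = -wdl_after_opp
    if wdl_after == wdl_before:
        return "ok"          # the move keeps the game-theoretic result
    if wdl_after < wdl_before:
        return "mistake"     # win -> draw/loss, or draw -> loss
    # With correct tables a move can never improve on the position's value.
    raise ValueError("inconsistent tablebase values")

print(classify_move(+1, -1))  # ok: a winning move in a won position
print(classify_move(+1, 0))   # mistake: lets a win slip to a draw
print(classify_move(0, +1))   # mistake: turns a draw into a loss
```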

1) Do engines play perfect games with no losing mistakes?

Yes, they can play perfectly, depending on the opponent. Remember, chess is a two-player game. If, for example, I play into Fool's Mate, that would be a perfect game for my opponent, and many engines, other humans, and I could all play such a perfect game.

And that is why you cannot assume that a draw or a win between two imperfect players means perfect play.

2) Is it possible to find some strategy to beat chess engines?

Yes, as we know that chess engines do not play perfect chess. Until they do, there is still room to improve.