The Stockfish ELO problem

Subject: The Stockfish ELO problem Mon Aug 01, 2022 10:24 am

So which of the last 4 Stockfish versions is the best? The 2 main rating lists give confusing results, so what's the matter?

Code::
CCRL 40m/15m                   CEGT 40m/20m
Stockfish 15   4CPU  3538      Stockfish 15   1CPU  3592
Stockfish 14   4CPU  3537      Stockfish 14.1 1CPU  3578
Stockfish 13   4CPU  3536      Stockfish 14   1CPU  3575
Stockfish 14.1 4CPU  3522      Stockfish 13   1CPU  3563

CCRL 40m/2m                    CEGT 40m/2m
Stockfish 15   1CPU  3691      Stockfish 14.1 1CPU  3649
Stockfish 14.1 1CPU  3679      Stockfish 14   1CPU  3646
Stockfish 14   1CPU  3650      Stockfish 15   1CPU  3645
Stockfish 13   1CPU  3624      Stockfish 13   1CPU  3600

1. Going by the CCRL 40/15 4CPU list, Stockfish 13-15 made no progress at all (+2 elo), and version 14.1 is even worse.

2. However, the CCRL 40/2 1CPU list looks as expected and as announced by the SF team: nice and steady progress, +67 elo.

3. The same can be said about the CEGT 40/20 1CPU list: steady progress, although less of it, +29 elo.

4. But like the CCRL 40/15 4CPU list, the CEGT 40/2 1CPU list is just as confusing: no progress since version 14.
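
The observations above come down to simple subtraction on the list figures. A minimal Python sketch, using only the ratings quoted in the rating-list excerpts above (the dictionary layout and the function name `gain` are just for illustration):

```python
# Ratings copied from the four rating-list excerpts quoted above.
RATINGS = {
    "CCRL 40/15 4CPU": {"13": 3536, "14": 3537, "14.1": 3522, "15": 3538},
    "CEGT 40/20 1CPU": {"13": 3563, "14": 3575, "14.1": 3578, "15": 3592},
    "CCRL 40/2 1CPU": {"13": 3624, "14": 3650, "14.1": 3679, "15": 3691},
    "CEGT 40/2 1CPU": {"13": 3600, "14": 3646, "14.1": 3649, "15": 3645},
}


def gain(list_name, old, new):
    """Rating difference between two Stockfish versions on one list."""
    ratings = RATINGS[list_name]
    return ratings[new] - ratings[old]


for name in RATINGS:
    print(f"{name}: SF13->SF15 {gain(name, '13', '15'):+d}, "
          f"SF14->SF15 {gain(name, '14', '15'):+d}")
```

On these figures the SF13 to SF15 gain ranges from +2 (CCRL 40/15) to +67 (CCRL 40/2), which is the spread the rest of the thread tries to explain.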

Various issues (in the mix!) may play a role:
1. Stockfish scales badly. This is an assumption, not a fact.
2. The elo pools used by the testers; for examples see the Average Opponent column in the rating lists.
3. The use of SF derivatives, which may lower the measured strength of the real one.
4. The fact that more CPUs plus a longer time control make the opposition stronger; a good example is [2] vs [3], +67 elo vs +29 elo.
5. An unproven assumption: SF has become so strong (with high draw percentages as a result) that the Ordo and Bayeselo elo systems no longer function and have reached a sort of upper limit.

To shed some more light on the issue I decided to pit the 4 versions against each other in a round robin, on 20 cores and with increasing time controls. Let's see how that goes and whether the same pattern arises.

Code::
|-------------------------|----------------|----------------|----------------|-----------------|
|           TC=40/10 ELO  | TC=40/10 ELO   | TC=40/20 ELO   | TC=40/40 ELO   | TC=40/80 ELO    |
|-------------------------|----------------|----------------|----------------|-----------------|
| Engine     1 CPU        | 20 CPU         | 20 CPU         | 20 CPU         | 20 CPU          |
|-------------------------|----------------|----------------|----------------|-----------------|
| SF15       61.7%   +82  | 58.3%   +58    |                |                |                 |
| SF14.1     54.1%   +31  | 53.3%   +23    | running        |                |                 |
| SF14       48.5%   -10  | 46.3%   -26    |                |                |                 |
| SF13       35.7%  -100  | 42.0%   -56    |                |                |                 |
|-------------------------|----------------|----------------|----------------|-----------------|

Posts : 880 Join date : 2020-11-25 Location : USA

5. An unproven assumption, that SF has become so strong (with high draw percentages as a result) the Ordo and Bayeselo elo systems no longer function, has reached a sort of upper limit.

I like 5! It would mean chess engines are no longer making progress, as Stockfish on current hardware has become unbeatable.

The Elo system is pretty simple. If you win your rating goes up, if you lose your rating goes down. If you draw, your rating goes up, down, or stays the same depending on the other player's rating.

So if this system no longer works!

That means chess engine progress is also at an end, and any chess engine still being developed can only hope to match Stockfish's rating.

So no need to worry about those pricey computer upgrades for chess engines.
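
The win/lose/draw behaviour described in this post can be written down in a few lines. This is only a sketch of the textbook incremental Elo update; the K-factor of 16 is an arbitrary value picked for illustration, and as far as I know rating-list tools such as Ordo and Bayeselo fit ratings to a whole set of games rather than updating game by game.

```python
def expected_score(rating_a, rating_b):
    """Expected score of player A against player B under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def updated_rating(rating_a, rating_b, result, k=16.0):
    """New rating of A after one game; result is 1 (win), 0.5 (draw) or 0 (loss).

    k=16 is a hypothetical K-factor, not a value taken from any rating list.
    """
    return rating_a + k * (result - expected_score(rating_a, rating_b))


# A draw moves the rating up, down, or not at all, depending on the opponent.
print(updated_rating(3500, 3600, 0.5))  # draw against a higher-rated opponent
print(updated_rating(3600, 3500, 0.5))  # draw against a lower-rated opponent
print(updated_rating(3500, 3500, 0.5))  # draw against an equal opponent
```

A draw against a higher-rated opponent gains points, a draw against a lower-rated one loses points, and a draw between equals changes nothing.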

Posts : 3112 Join date : 2020-11-18

We might not be there yet, but it looks to me as though that's the direction of travel.

Subject: Re: The Stockfish ELO problem Mon Aug 01, 2022 6:47 pm

mwyoung wrote:: So if this system no longer works!

That means chess engines progress is also at a end. And any chess engine still being developed can only hope to match Stockfish's rating.

If that is the case we need a new elo formula. Of course progress continues. SF can't win nearly every TCEC superfinal without progress.

Subject: Re: The Stockfish ELO problem Mon Aug 01, 2022 9:11 pm

UPDATE

Code::
|-------------------------|----------------|----------------|----------------|-----------------|
|           TC=40/10 ELO  | TC=40/10 ELO   | TC=40/20 ELO   | TC=40/40 ELO   | TC=40/80 ELO    |
|-------------------------|----------------|----------------|----------------|-----------------|
| Engine     1 CPU        | 20 CPU         | 20 CPU         | 20 CPU         | 20 CPU          |
|-------------------------|----------------|----------------|----------------|-----------------|
| SF15       61.7%   +82  | 58.3%   +58    | 54.8%   +33    |                |                 |
| SF14.1     54.1%   +31  | 53.3%   +23    | 51.8%   +12    | running        |                 |
| SF14       48.5%   -10  | 46.3%   -26    | 48.7%    -9    |                |                 |
| SF13       35.7%  -100  | 42.0%   -56    | 44.7%   -37    |                |                 |
|-------------------------|----------------|----------------|----------------|-----------------|
|                         | Equals         | Equals         | Equals         | Equals          |
|                         | 40m in 3m/20s  | 40m in 6m/40s  | 40m in 13m     | 40m in 26m      |
|-------------------------|----------------|----------------|----------------|-----------------|
| Dragon 2.5 running      |                |                |                |                 |
| Dragon 2.0              |                |                |                |                 |
|-------------------------|----------------|----------------|----------------|-----------------|

Draws : 86.2%

I can already see the first contours of the rating-list pattern, and of option 5, looming.

OTOH it could be a specific SF scaling issue, so I also added Komodo to the same test. Dragon 2.5 is considerably stronger than 2.0, but if it shows the same pattern as SF, option 5 becomes a serious reality.

We will see....

Posts : 880 Join date : 2020-11-25 Location : USA

Admin wrote:

If that is the case we need a new elo formula. Of course progress continues. SF can't win about every TCEC super final without progress.

Ed, there is nothing wrong with the Elo system. That is not the issue. You could put a 32-man tablebase in the mix, and the Elo system would give an accurate rating for the conditions of the rating pool.

My feeling is that tuning for long time control chess or many cores is not a priority for Stockfish or the other engines. They only tune at very fast time controls like micro bullet, because they want thousands of games to judge each change to an engine.

But those changes do not always work at longer time controls, or with more cores.

Stockfish is a perfect example, as we see advancement at micro bullet and bullet time controls.

I think it is an issue of priorities.

Posts : 880 Join date : 2020-11-25 Location : USA

To show the scaling issue with Stockfish, I started a match between SF DEV and a search that is very slow and weak compared to Stockfish NNUE: Monte-Carlo Tree Search.

The MCTS is searching only 700 NPS to Stockfish's 13 million NPS. But it has the advantage of good scaling.

Here are the results so far, at only 40 moves/1 hour:

Code:: Score of Dragon 3.1 MCTS vs Stockfish 07/24/22: 0 - 0 - 5 [0.500]
... Dragon 3.1 MCTS playing White: 0 - 0 - 3 [0.500] 3
... Dragon 3.1 MCTS playing Black: 0 - 0 - 2 [0.500] 2
... White vs Black: 0 - 0 - 5 [0.500] 5
Elo difference: 0.0 +/- 0.0, LOS: nan %, DrawRatio: 100.0 %
5 of 200 games finished.
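
For anyone wondering about the "0.0 +/- 0.0" and "LOS: nan" in that output: with only draws there is no spread in the results and no decisive game to compare. The sketch below uses the logistic Elo conversion, a normal-approximation error margin and the LOS formula commonly used in engine testing. It is my own reconstruction under those assumptions, not the match tool's actual code, and the function names are made up.

```python
import math
from statistics import NormalDist


def elo_from_score(score):
    """Logistic Elo difference implied by a score fraction."""
    score = min(max(score, 1e-9), 1.0 - 1e-9)  # keep the logarithm finite
    return 400.0 * math.log10(score / (1.0 - score))


def match_stats(wins, losses, draws, confidence=0.95):
    """Return (elo_difference, error_margin, los) for a head-to-head match."""
    games = wins + losses + draws
    mean = (wins + 0.5 * draws) / games
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    variance = (wins * (1.0 - mean) ** 2
                + draws * (0.5 - mean) ** 2
                + losses * (0.0 - mean) ** 2) / games
    std_error = math.sqrt(variance / games)
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    low = elo_from_score(mean - z * std_error)
    high = elo_from_score(mean + z * std_error)
    margin = (high - low) / 2.0
    decisive = wins + losses
    if decisive == 0:
        los = float("nan")  # 0/0: no decisive game to compare
    else:
        los = 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * decisive)))
    return elo_from_score(mean), margin, los


print(match_stats(0, 0, 5))  # the 0 - 0 - 5 score above
```

Five draws give zero variance, hence a zero error margin, and the LOS is undefined because it is computed from decisive games only. A margin of 0.0 after 5 games says nothing about how close the engines really are.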

Subject: Re: The Stockfish ELO problem Tue Aug 02, 2022 8:54 am

KOMODO UPDATE

Code::
|-------------------------|----------------|----------------|----------------|-----------------|
|           TC=40/10 ELO  | TC=40/10 ELO   | TC=40/20 ELO   | TC=40/40 ELO   | TC=40/80 ELO    |
|-------------------------|----------------|----------------|----------------|-----------------|
| Engine     1 CPU        | 20 CPU         | 20 CPU         | 20 CPU         | 20 CPU          |
|-------------------------|----------------|----------------|----------------|-----------------|
| SF15       61.7%   +82  | 58.3%   +58    | 54.8%   +33    |                |                 |
| SF14.1     54.1%   +31  | 53.3%   +23    | 51.8%   +12    | running        |                 |
| SF14       48.5%   -10  | 46.3%   -26    | 48.7%    -9    |                |                 |
| SF13       35.7%  -100  | 42.0%   -56    | 44.7%   -37    |                |                 |
|-------------------------|----------------|----------------|----------------|-----------------|
| Dragon 2.5 66.2%  +113  | 58.0%   +56    | 58.7%   +61    | running        |                 |
| Dragon 2.0 33.8%  -113  | 42.0%   -56    | 41.3%   -61    |                |                 |
|-------------------------|----------------|----------------|----------------|-----------------|
|                         | Equals         | Equals         | Equals         | Equals          |
|                         | 40m in 3m/20s  | 40m in 6m/40s  | 40m in 13m     | 40m in 26m      |
|-------------------------|----------------|----------------|----------------|-----------------|

Contrary to SF, we see Komodo scaling extremely well. It's too early for a final conclusion, but it looks like what Mark said earlier: the SF folks had better start to focus on LTC, and there is nothing wrong with the rating lists.

Phew...

Posts : 880 Join date : 2020-11-25 Location : USA

This result so far is hilarious! It shows that if you use your engine for long analysis, you should also check with better-scaling engines like Lc0, or even MCTS.

Code:: Score of Dragon 3.1 MCTS vs Stockfish 07/24/22: 0 - 0 - 8 [0.500]
... Dragon 3.1 MCTS playing White: 0 - 0 - 4 [0.500] 4
... Dragon 3.1 MCTS playing Black: 0 - 0 - 4 [0.500] 4
... White vs Black: 0 - 0 - 8 [0.500] 8
Elo difference: 0.0 +/- 0.0, LOS: nan %, DrawRatio: 100.0 %
8 of 200 games finished.

Subject: Re: The Stockfish ELO problem Tue Aug 02, 2022 8:50 pm

UPDATE

Code::
|-------------------------|----------------|----------------|----------------|-----------------|
|           TC=40/10 ELO  | TC=40/10 ELO   | TC=40/20 ELO   | TC=40/40 ELO   | TC=40/80 ELO    |
|-------------------------|----------------|----------------|----------------|-----------------|
| Engine     1 CPU        | 20 CPU         | 20 CPU         | 20 CPU         | 20 CPU          |
|-------------------------|----------------|----------------|----------------|-----------------|
| SF15       61.7%   +82  | 58.3%   +58    | 54.8%   +33    | 55.1%   +35    |                 |
| SF14.1     54.1%   +31  | 53.3%   +23    | 51.8%   +12    | 51.3%    +9    | running         |
| SF14       48.5%   -10  | 46.3%   -26    | 48.7%    -9    | 47.6%   -16    |                 |
| SF13       35.7%  -100  | 42.0%   -56    | 44.7%   -37    | 46.0%   -28    |                 |
|-------------------------|----------------|----------------|----------------|-----------------|
| Dragon 2.5 66.2%  +113  | 58.0%   +56    | 58.7%   +61    | 58.5%   +59    | Done, Komodo    |
| Dragon 2.0 33.8%  -113  | 42.0%   -56    | 41.3%   -61    | 41.5%   -59    | scales well     |
|-------------------------|----------------|----------------|----------------|-----------------|
|                         | Equals         | Equals         | Equals         | Equals          |
|                         | 40m in 3m/20s  | 40m in 6m/40s  | 40m in 13m     | 40m in 26m      |
|-------------------------|----------------|----------------|----------------|-----------------|

Komodo scales extremely well (+56, +61, +59). SF15 kept its ~+34 (no further drop). The last SF run, 40/80 on 20 CPU, equals 40m/26m, which is about CCRL 40/15 | CEGT 40/20 at 2CPU.

SF draw rate : 88%
Dragon draw rate : 73%

Posts : 880 Join date : 2020-11-25 Location : USA

It gets even worse for SF if you use normal time controls. But we can see it here just fine.

Subject: Re: The Stockfish ELO problem Thu Aug 04, 2022 8:13 am

FINAL RESULT

Code::
|-------------------------|----------------|----------------|----------------|-----------------|
|           TC=40/10 ELO  | TC=40/10 ELO   | TC=40/20 ELO   | TC=40/40 ELO   | TC=40/80 ELO    |
|-------------------------|----------------|----------------|----------------|-----------------|
| Engine     1 CPU        | 20 CPU         | 20 CPU         | 20 CPU         | 20 CPU          |
|-------------------------|----------------|----------------|----------------|-----------------|
| SF15       61.7%   +82  | 58.3%   +58    | 54.8%   +33    | 55.1%   +35    | 53.7%   +26     |
| SF14.1     54.1%   +31  | 53.3%   +23    | 51.8%   +12    | 51.3%    +9    | 50.3%    +2     |
| SF14       48.5%   -10  | 46.3%   -26    | 48.7%    -9    | 47.6%   -16    | 48.7%    -9     |
| SF13       35.7%  -100  | 42.0%   -56    | 44.7%   -37    | 46.0%   -28    | 47.3%   -19     |
|-------------------------|----------------|----------------|----------------|-----------------|
| Dragon 2.5 66.2%  +113  | 58.0%   +56    | 58.7%   +61    | 58.5%   +59    | Done, Komodo    |
| Dragon 2.0 33.8%  -113  | 42.0%   -56    | 41.3%   -61    | 41.5%   -59    | scales well     |
|-------------------------|----------------|----------------|----------------|-----------------|
|                         | Equals         | Equals         | Equals         | Equals          |
|                         | 40m in 3m/20s  | 40m in 6m/40s  | 40m in 13m     | 40m in 26m      |
|-------------------------|----------------|----------------|----------------|-----------------|

Draw rate : 91.7%

Ugh.

SF15 went down from +82 to finally +26, but it never lost a game in the last run.
SF13 went up from -100 to finally -19.
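
A 53.7% score without a single loss pins down the win and draw shares, because every point above 50% must then come from wins. A small sketch of that arithmetic, assuming the plain logistic Elo formula (the tool that produced the table may compute its numbers differently, and the function names are only illustrative):

```python
import math


def elo_from_score(score):
    """Logistic Elo difference implied by a score fraction strictly between 0 and 1."""
    return 400.0 * math.log10(score / (1.0 - score))


def wdl_without_losses(score):
    """Win/draw/loss shares for a player who scored `score` and never lost."""
    wins = 2.0 * (score - 0.5)  # score = wins + draws / 2, with draws = 1 - wins
    return wins, 1.0 - wins, 0.0


wins, draws, losses = wdl_without_losses(0.537)
print(f"wins {wins:.1%}, draws {draws:.1%}, losses {losses:.1%}")
print(f"implied Elo difference: {elo_from_score(0.537):+.1f}")
```

So an unbeaten SF15 at 53.7% means roughly 7.4% wins and 92.6% draws, which works out to about +26 on the logistic scale. With draw rates this high, even an engine that never loses cannot show a large rating gap.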

I am beginning to understand why CEGT and CCRL are showing the results as pointed out in the OP.

I am beginning to understand why rating lists like the one of Stefan and the GRL don't have this problem yet. First, they play openings that favor SF, and second, the short time control favors SF.

TCEC is a dead-end street: they play unusual openings to survive. The question is, for how long?

Posts : 207 Join date : 2020-11-28

Admin wrote:

If that is the case we need a new elo formula. Of course progress continues. SF can't win about every TCEC super final without progress.

1) SF cannot win TCEC without progress in unbalanced positions, but if Stockfish cannot create these positions in normal chess games, even against weaker opponents, then this progress is not relevant for rating in normal games.

2) Even if Stockfish is unbeatable there can be progress in beating weaker opponents, because knowing how not to make losing mistakes does not mean that you always know how to take advantage of the opponent's losing mistakes in order to win.

Posts : 3112 Join date : 2020-11-18

Uri Blass wrote:: Even if stockfish is unbeatable there can be progress in beating weaker opponents because even if you know not to make losing mistakes it does not mean that you always know how to take advantage of losing mistakes in order to win.

If a player recognises losing positions, he/she/it should be able to avoid a move that turns a winning position into a non-winning one.

Posts : 207 Join date : 2020-11-28

TheSelfImprover wrote:

If a player recognises losing positions, he/she/it should be able to avoid a move that turns a winning position into a non-winning one.

Being able not to lose does not mean that you are able to win every winning position that you get, and it does not mean that you recognize losing positions.
In practice you do not go into an inferior position when you have the choice of an equal position, even if you do not know whether the inferior position is a draw or a loss.

Suppose that in KRPPP vs KRPPP with the pawns on the same side you know how to draw and know not to lose material, so in practice you never lose.

Now suppose that the opponent loses a pawn and goes into KRPPP vs KRPP, where you have a pawn advantage.
It is possible that you can now win the game but do not know how to do it, and that you are going to miss a win later. That does not contradict being able not to lose from the starting KRPPP vs KRPPP.

Subject: Re: The Stockfish ELO problem Tue Aug 09, 2022 5:43 pm

For SF to improve its scaling it must disable its NNUE...

Also, another issue is that Komodo and SF do not use the same NNUE file, so it is not a fair comparison...

Posts : 1254 Join date : 2020-11-17 Location : France

Arguing from limits: Upper bound of a NNUE is EGTB(32). EGTB(32) needs no search at all. As NNUE approaches its upper bound, it will benefit less and less from search (or scale badly as some people call it).

Subject: Re: The Stockfish ELO problem Tue Aug 09, 2022 8:14 pm

Damir Desevac wrote:: For SF to improve its scaling it must disable its nnue...

With a loss of 150-200 elo as a result.

Instead they should work on search. Key word: less (tapered) pruning.

Subject: Re: The Stockfish ELO problem Tue Aug 09, 2022 8:37 pm

Alright, more on scaling.

We have seen that SF does not scale so well, while Komodo scales very well.

What about Rebel?

Here are the results of the latest network.

Code::
No. Name            Win  Draw  Loss  Unf.   Score  Games      %
---------------------------------------------------------------
  1 Rebel          +634  =976  -390    *0  1122.0   2000  56.1%
  2 Seer-2.5.0      +81  =236   -83    *0   199.0    400  49.8%
  3 Stockfish-9     +89  =176  -135    *0   177.0    400  44.2%
  4 Komodo-14       +70  =205  -125    *0   172.5    400  43.1%
  5 Houdini-6.03    +81  =169  -150    *0   165.5    400  41.4%
  6 RubiChess-2.2   +69  =190  -141    *0   164.0    400  41.0%

Ordo performance based on the CCRL blitz list.

ordo -p all.pgn -a 3457

Code::
   # PLAYER          :  RATING  POINTS  PLAYED  (%)
   1 Rebel           :  3492.9  1122.0    2000   56
   2 Seer-2.5.0      :  3491.2   199.0     400   50
   3 Stockfish-9     :  3452.4   177.0     400   44
   4 Komodo-14       :  3444.4   172.5     400   43
   5 Houdini-6.03    :  3431.9   165.5     400   41
   6 RubiChess-2.2   :  3429.1   164.0     400   41

56% is a fantastic result against these giants, but... it is based on a 40/10 time control.

In the next runs I will double the time control, 40/20 to begin with, and I PREDICT that in the end I should be happy with 50%. If that comes true, it would mean a loss of 42 elo points because of scaling.

The reason is simple : SEARCH.
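
The "42 elo" in the prediction follows from the usual logistic relation between score percentage and rating difference. A quick sketch of the conversion in both directions (plain formula only; Ordo's pool-based ratings will not match it exactly, and the function names are just for illustration):

```python
import math


def elo_from_score(score):
    """Rating difference implied by a score fraction strictly between 0 and 1."""
    return 400.0 * math.log10(score / (1.0 - score))


def score_from_elo(elo_diff):
    """Expected score fraction for a given rating difference."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


print(f"56% -> {elo_from_score(0.56):+.1f} elo")
print(f"50% -> {elo_from_score(0.50):+.1f} elo")
print(f"+42 elo -> {score_from_elo(42):.1%}")
```

A 56% score is worth roughly 42 elo and a 50% score is worth 0, so falling back to 50% would indeed cost about 42 points.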

Posts : 207 Join date : 2020-11-28

Chris Whittington wrote:: Arguing from limits: Upper bound of a NNUE is EGTB(32). EGTB(32) needs no search at all. As NNUE approaches its upper bound, it will benefit less and less from search (or scale badly as some people call it).

EGTB(32) means that you do not lose, but it does not mean that no improvement is possible in beating weaker opponents.

Playing a random drawing move in a drawn position is certainly a bad strategy, and EGTB(32) does not tell you which of the drawing moves is better.
Playing a fixed drawing move, one that you add to the 32-piece EGTB for each drawn position, is also a bad strategy if you play many games, because the opponent can memorize one drawn game and repeat it to draw against you again and again.

Posts : 880 Join date : 2020-11-25 Location : USA

When I do the reverse match of D3.1 vs SF, where SF gets the 2 hours for 40 moves, it could tell us something.

But as we know, engines do not play perfectly. This can be shown with the perfect play we have today: in 7-man positions, and in many test positions, the engines are still clueless.

But a good way to maximize a 32-man TB would be to play the lines that promote the most complexity, the longest game possible. So you are playing the odds.

Remember it only takes 1 mistake to lose against perfect play. That is assuming chess is a forced draw.

Posts : 207 Join date : 2020-11-28

I know that engines are clueless in many positions, but that does not contradict perfect chess from the opening position.

Basically there are 2 questions:
1) Do engines play perfect games with no losing mistake?
2) Is it possible to find some strategy to beat chess engines?

It is possible that the answer to both questions is yes.
I do not know of a single program that makes a serious attempt to win against a specific opponent (given the details of the opponent and the time control). That would include using the fact that it has significantly more time to predict the opponent's moves, and pruning moves not because they are bad but because the opponent is not expected to play them (predicting the moves can be done simply by asking a copy of the opponent to search).

Subject: Re: The Stockfish ELO problem Tue Aug 09, 2022 10:57 pm

40/20 result

Code::
No. Name            Win  Draw  Loss  Unf.   Score  Games      %
---------------------------------------------------------------
  1 Rebel          +569 =1030  -401    *0  1084.0   2000  54.2%
  2 Seer-2.5.0      +92  =236   -72    *0   210.0    400  52.5%
  3 Komodo-14       +78  =211  -111    *0   183.5    400  45.9%
  4 Stockfish-9     +79  =199  -122    *0   178.5    400  44.6%
  5 RubiChess-2.2   +75  =205  -120    *0   177.5    400  44.4%
  6 Houdini-6.03    +77  =179  -144    *0   166.5    400  41.6%

Grand General

Code:: TC SCORE PERF
40/10 56.1% 3492
40/20 54.2% 3481
40/40
40/80

Subject: Re: The Stockfish ELO problem Tue Aug 09, 2022 10:59 pm

On my new PC, Rebel is very, very strong.

Posts : 880 Join date : 2020-11-25 Location : USA

"I know that engines are clueless in many positions but it does not contradict perfect chess from the opening position."

Yes it does contradict that!
What are you talking about?

Otherwise chess engines would play perfect chess in all 7-man positions and test positions, as those are a subset of all the positions that can arise from the opening position.

1)Do engines play perfect games with no losing mistake?

Yes, they can play perfectly, depending on the opponent. Remember chess is a 2-player game. If, for example, I play into fool's mate, that would be a perfect game. Many engines, other humans, and I could play a perfect game.

And that is why you cannot assume that a draw or a win between 2 imperfect players equals perfect play.

2)Is it possible to find some strategy to beat chess engines?

Yes, as we know that chess engines do not play perfect chess. Until they do, you can still improve.
