The Stockfish ELO problem

Subject: Re: The Stockfish ELO problem Wed Aug 10, 2022 1:23 am

40/40 result

Code::
No. Name            Win  Draw  Loss  Unf.  Score  Games      %
--------------------------------------------------------------
  1 Rebel          +253  =548  -199    *0  527.0   1000  52.7%
  2 Seer-2.5.0      +37  =139   -24    *0  106.5    200  53.2%
  3 Stockfish-9     +48   =99   -53    *0   97.5    200  48.8%
  4 RubiChess-2.2   +40  =109   -51    *0   94.5    200  47.2%
  5 Komodo-14       +37  =103   -60    *0   88.5    200  44.2%
  6 Houdini-6.03    +37   =98   -65    *0   86.0    200  43.0%
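
The Score and % columns follow from the win/draw/loss counts: a win is worth one point and a draw half a point. Below is a minimal Python sketch that recomputes them for the 40/40 table. The names `points`, `percentage` and `table_40_40` are my own and do not come from any tool used in this thread.

```python
# Rows copied from the 40/40 table: name -> (wins, draws, losses).
table_40_40 = {
    "Rebel": (253, 548, 199),
    "Seer-2.5.0": (37, 139, 24),
    "Stockfish-9": (48, 99, 53),
    "RubiChess-2.2": (40, 109, 51),
    "Komodo-14": (37, 103, 60),
    "Houdini-6.03": (37, 98, 65),
}


def points(wins, draws):
    """Match points: 1 per win, 0.5 per draw."""
    return wins + 0.5 * draws


def percentage(wins, draws, losses):
    """Score as a percentage of the games played."""
    return 100.0 * points(wins, draws) / (wins + draws + losses)


for name, (w, d, l) in table_40_40.items():
    print(f"{name:14s} {points(w, d):6.1f} {w + d + l:5d} {percentage(w, d, l):5.1f}%")

# Cross-check: Rebel's losses should equal the opponents' combined wins.
opponent_wins = sum(w for name, (w, d, l) in table_40_40.items() if name != "Rebel")
rebel_losses = table_40_40["Rebel"][2]
print("consistent:", opponent_wins == rebel_losses)
```

The cross-check works because the Rebel row aggregates its games against the five opponents, while each opponent row is reported from that opponent's side.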

Grand General Scaling

Code::
TC     SCORE  PERF
40/10  56.1%  3492
40/20  54.2%  3481
40/40  52.7%  3472
40/80

Subject: Re: The Stockfish ELO problem Wed Aug 10, 2022 2:47 pm

40/80 result

Code::
No. Name            Win  Draw  Loss  Unf.  Score  Games      %
--------------------------------------------------------------
  1 Rebel          +252  =551  -164    *0  527.5    967  54.6%
  2 Seer-2.5.0      +37  =130   -26    *0  102.0    193  52.8%
  3 Houdini-6.03    +42   =91   -61    *0   87.5    194  45.1%
  4 RubiChess-2.2   +22  =127   -44    *0   85.5    193  44.3%
  5 Stockfish-9     +34   =97   -63    *0   82.5    194  42.5%
  6 Komodo-14       +29  =106   -58    *0   82.0    193  42.5%

Grand General Scaling

Code::
TC     SCORE  PERF
40/10  56.1%  3492
40/20  54.2%  3481
40/40  52.7%  3472
40/80  54.6%  3483

Subject: Re: The Stockfish ELO problem Wed Aug 10, 2022 2:48 pm

And so I was wrong.

Posts : 1254 Join date : 2020-11-17 Location : France

Admin wrote:: And so I was wrong.

You should be wrong more often

Posts : 1254 Join date : 2020-11-17 Location : France

Uri Blass wrote:

Chris Whittington wrote:: Arguing from limits: Upper bound of a NNUE is EGTB(32). EGTB(32) needs no search at all. As NNUE approaches its upper bound, it will benefit less and less from search (or scale badly as some people call it).

EGTB(32) means that you do not lose, but it does not mean that no improvement is possible in beating weaker opponents.

Playing a random drawing move in a drawn position is certainly a bad strategy, and EGTB(32) does not tell you which of the drawing moves is better.
Playing a fixed drawing move, stored alongside the 32-piece EGTB for each drawn position, is also a bad strategy if you play many games, because the opponent can memorize one drawn game and repeat it to draw against you again and again.

There’s no reason an EGTB should not contain [move count to draw], other than size constraints. Move selection would then be: fastest to win, else slowest to draw, else slowest to lose. No doubt there are several further possibilities for move selection when EGTB=win (prioritise moves that appear to throw material, prioritise moves that keep as much material on the board as possible, prioritise moves that appear to damage your own king safety, prioritise moves leading to positions that a trained NN identifies as “unchesslike”, and so on). You could MCTS your way forward a few ply and prioritise moves that have a high MCTS own-loss count, or prioritise moves to which the MCTS replies have a high opponent-loss count.
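
The basic ordering (fastest to win, else slowest to draw, else slowest to lose) can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Probe` record, its `wdl` and `plies` fields, and the move names are hypothetical, since no 32-piece tablebase exists and no real probing API is implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """Hypothetical tablebase entry for the position after a candidate move.

    wdl:   +1 win, 0 draw, -1 loss, from the mover's point of view.
    plies: plies until the game ends with that result (for wdl == 0 this
           is the "[move count to draw]" mentioned above).
    """
    wdl: int
    plies: int


def selection_key(probe):
    """Higher is better: wins first, then draws, then losses.

    Among wins prefer the fewest plies; among draws and losses prefer
    the most plies, so the opponent has more chances to go wrong.
    """
    if probe.wdl > 0:
        return (probe.wdl, -probe.plies)
    return (probe.wdl, probe.plies)


def pick_move(probes):
    """probes: dict mapping a move (str) to its Probe."""
    return max(probes, key=lambda move: selection_key(probes[move]))


# Made-up values, purely for illustration.
drawn = {"e4": Probe(0, 120), "d4": Probe(0, 180), "g4": Probe(-1, 90)}
won = {"Qh5": Probe(1, 7), "Qf3": Probe(1, 3)}
lost = {"Kg1": Probe(-1, 10), "Kh1": Probe(-1, 40)}
print(pick_move(drawn), pick_move(won), pick_move(lost))
```

The extra heuristics listed above (apparent material sacrifices, "unchesslike" positions, MCTS loss counts) would enter as further tie-breakers appended to the tuple returned by `selection_key`.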

Anyway, my post was to do with the scaling discussion, although I quite like my idea of a simultaneous MCTS search.

Posts : 1254 Join date : 2020-11-17 Location : France

Admin wrote:: And so I was wrong.

How did it overall match my prediction (I’ve forgotten what that was)?

Posts : 207 Join date : 2020-11-28

mwyoung wrote:

Uri Blass wrote:

mwyoung wrote:

Uri Blass wrote:

Chris Whittington wrote:: Arguing from limits: Upper bound of a NNUE is EGTB(32). EGTB(32) needs no search at all. As NNUE approaches its upper bound, it will benefit less and less from search (or scale badly as some people call it).

EGTB(32) means that you do not lose, but it does not mean that no improvement is possible in beating weaker opponents.

Playing a random drawing move in a drawn position is certainly a bad strategy, and EGTB(32) does not tell you which of the drawing moves is better.
Playing a fixed drawing move, stored alongside the 32-piece EGTB for each drawn position, is also a bad strategy if you play many games, because the opponent can memorize one drawn game and repeat it to draw against you again and again.

When I run the reverse match of D3.1 vs SF, where SF gets 2 hours for 40 moves, it could tell us something.

But as we know, engines do not play perfectly. This can be shown with the perfect play we have today: in 7-man positions, and in many test positions, the engines are still clueless.

But a good way to get the most out of a 32-man TB would be to play the lines that promote the most complexity and the longest possible game. That way you are playing the odds.

Remember, it only takes one mistake to lose against perfect play. That is assuming chess is a forced draw.

I know that engines are clueless in many positions, but it does not contradict perfect chess from the opening position.

Basically there are 2 questions:
1) Do engines play perfect games with no losing mistake?
2) Is it possible to find some strategy to beat chess engines?

It is possible that the answer to both questions is yes.
I do not know of any software that makes a serious attempt to win against a specific opponent (given the details of the opponent and the time control), including using the fact that it has significantly more time to predict the opponent's moves and pruning moves not because they are bad but because the opponent is not expected to play them (predicting the moves can be done simply by asking a copy of the opponent to search).

"I know that engines are clueless in many positions, but it does not contradict perfect chess from the opening position."

Yes, it does contradict that!
What are you talking about?

Otherwise chess engines would play perfect chess in all 7-man positions and test positions, as those are a subset of what can arise from the opening position.

1) Do engines play perfect games with no losing mistake?

Yes, they can play perfectly, depending on the opponent. Remember, chess is a two-player game. If, for example, I play into fool's mate, that would be a perfect game. Many engines, other humans, and I could play a perfect game.

And that is why you cannot assume that a draw or a win between two imperfect players equals perfect play.

2) Is it possible to find some strategy to beat chess engines?

Yes, as we know that chess engines do not play perfect chess. Until they do, you can still improve.

1) To play perfect chess, engines need to play perfectly only in positions that they can practically reach in games.
If there is no way to force them into a test position in which they do not find the right move, then the fact that they do not play the right move there is not relevant for games.

I am not 100% sure that chess is a draw with perfect play, but that is what I believe unless I see evidence to the contrary (if we see some chess engine that wins practically every game with White, I am going to change my mind).

2) I believe it is possible to find a strategy to beat the top engines, but the only possible proof of it is showing a game in which Stockfish loses rather than draws.

Posts : 222 Join date : 2021-08-28

The current dev version of Rebel seems to be significantly stronger than Rebel 15.1 according to the results published so far; I guess by more than 50 Elo. Congratulations.

Subject: Re: The Stockfish ELO problem Thu Aug 11, 2022 10:19 am

Chris Whittington wrote:

Admin wrote:: And so I was wrong.

How did it overall match my prediction (I’ve forgotten what that was)?

I predicted 50%.

Subject: Re: The Stockfish ELO problem Thu Aug 11, 2022 10:21 am

Dio wrote:: The current dev version of Rebel seems to be significantly stronger than Rebel 15.1 according to the results published so far; I guess by more than 50 Elo. Congratulations.

My test results show 15-20 Elo at most, but they are at an early stage.

How did you get to 50 Elo?

Posts : 233 Join date : 2021-10-08

I was confused by that but concluded that Dio must have meant "stronger than Rebel 15", not 15.1, as I don't think you guys have published results yet :)

?

Posts : 222 Join date : 2021-08-28

I had suspected that a version more recent than Rebel 15.1 was producing the results in this thread.

In particular, I consider RubiChess, Komodo 14 and Seer 2.5.0 to be about 50 Elo stronger than Rebel 15.1.

My mistake.

Posts : 622 Join date : 2020-11-23

SF had been making very little progress for the 20 months leading up to NNUE (less than 50 Elo points). The change allowed for some rapid improvements, for a time. Since the architecture update before last (v4), over half a year ago, we've seen no progress at all, which is why rating lists will show either a regression or positive results for the latest version, depending on testing conditions.

Posts : 880 Join date : 2020-11-25 Location : USA

Uri Blass wrote:

1) To play perfect chess, engines need to play perfectly only in positions that they can practically reach in games.
If there is no way to force them into a test position in which they do not find the right move, then the fact that they do not play the right move there is not relevant for games.

I am not 100% sure that chess is a draw with perfect play, but that is what I believe unless I see evidence to the contrary (if we see some chess engine that wins practically every game with White, I am going to change my mind).

2) I believe it is possible to find a strategy to beat the top engines, but the only possible proof of it is showing a game in which Stockfish loses rather than draws.

Uri, you do not understand.

The whole game tree is connected. If you can show imperfect play at a lower level than the starting position, you are proving imperfect play at a higher level, with more men on the board.

Subject: Re: The Stockfish ELO problem Fri Aug 12, 2022 1:04 pm

For clarity, I let Rebel 15.1 play the same opponents to get an Elo comparison for the new net.

For CCRL

Code::
CCRL ELO POOL : 3457
       REBEL-NEW-NNUE | REBEL 15.1 (current elo 3465)
TC     SCORE  PERF    | SCORE  PERF   DIFF
40/10  56.1%  3499    | 53.2%  3479   +20
40/20  54.2%  3486    | 52.7%  3476   +10
40/40  52.7%  3476    | 51.3%  3466   +10
40/80  54.6%  3489    | 51.2%  3466   +23

For CEGT

Code::
CEGT ELO POOL : 3381
       REBEL-NEW-NNUE | REBEL 15.1 (current elo 3371)
TC     SCORE  PERF    | SCORE  PERF   DIFF
40/10  56.1%  3423    | 53.2%  3403   +20
40/20  54.2%  3410    | 52.7%  3400   +10
40/40  52.7%  3400    | 51.3%  3390   +10
40/80  54.6%  3413    | 51.2%  3390   +23
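
The PERF columns look consistent with the usual logistic Elo relation applied to the pool average, that is PERF = pool + 400 * log10(score / (1 - score)). This is my inference from the numbers, not something the post states, and the function names below are my own. A small Python sketch that recomputes the values under that assumption:

```python
import math


def elo_diff(score):
    """Elo difference implied by a score fraction (0 < score < 1), logistic model."""
    return 400.0 * math.log10(score / (1.0 - score))


def performance(pool_elo, score):
    """Performance rating against a pool with the given average Elo (assumed model)."""
    return pool_elo + elo_diff(score)


# (time control, new-net score, Rebel 15.1 score), copied from the two tables.
rows = [
    ("40/10", 0.561, 0.532),
    ("40/20", 0.542, 0.527),
    ("40/40", 0.527, 0.513),
    ("40/80", 0.546, 0.512),
]

for pool_name, pool_elo in (("CCRL", 3457), ("CEGT", 3381)):
    for tc, new_net, rebel_15_1 in rows:
        print(pool_name, tc,
              round(performance(pool_elo, new_net), 1),
              round(performance(pool_elo, rebel_15_1), 1))
```

Under this assumption every recomputed value lands within one point of the posted figure. The remaining gap is what you would expect from the scores being rounded to one decimal. The difference column does not depend on the pool, which is why it is identical in the CCRL and CEGT tables.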

It seems my other testing, which showed an estimated Elo increase of 15-20 points, is not far from the truth.

Or in other words, not enough for a release.

Posts : 207 Join date : 2020-11-28

mwyoung wrote:

Uri, you do not understand.

The whole game tree is connected. If you can show imperfect play at a lower level than the starting position, you are proving imperfect play at a higher level, with more men on the board.

I disagree.
Suppose you start from a drawn KRP vs KRP position and do not have tablebases, but your search is deep enough not to lose material.

You can play KRP vs KRP perfectly, never losing a pawn, and get a draw, yet not play perfectly in KRP vs KR, which you never reach in games.

Posts : 880 Join date : 2020-11-25 Location : USA

Uri Blass wrote:

I disagree.
Suppose you start from a drawn KRP vs KRP position and do not have tablebases, but your search is deep enough not to lose material.

You can play KRP vs KRP perfectly, never losing a pawn, and get a draw, yet not play perfectly in KRP vs KR, which you never reach in games.

Really, really.....

Your argument makes no sense.

The point is: did you play perfectly in KRP vs KRP? You do not have to lose material to play imperfectly, and the outcome between two imperfect players does not matter.

1) Do engines play perfect games with no losing mistake?

Yes, they can play perfectly, depending on the opponent. Remember, chess is a two-player game. If, for example, I play into fool's mate, that would be a perfect game. Many engines, other humans, and I could play a perfect game.

And that is why you cannot assume that a draw or a win between two imperfect players equals perfect play.

Posts : 880 Join date : 2020-11-25 Location : USA

http://talkchess.com/forum3/viewtopic.php?f=2&t=80444&start=20

Code:: Re: The Stockfish ELO problem
Post by Rebel » Sun Aug 07, 2022 6:21 pm

Sopel wrote: ↑Sun Aug 07, 2022 12:49 pm
Rebel wrote: ↑Sat Aug 06, 2022 8:04 pm
Some remarks
1. Komodo scales extremely well (+56,+61,+59).
2. SF15 went down from +82 to +26 (last SF run, equals 40m/26m about CCRL 40/15 | CEGT 40/20 at 2CPU).
3. SF13 went up from -100 to -19.
4. Draw rate last SF run 91.7% but SF15 never lost a game.
That's very good news. An oracle engine doesn't scale, so Stockfish is closer to a perfect engine compared to komodo.

Meaning at increasing time control and more threads Komodo can catch up and overtake you? Oh wait, it already happened :wink:

It looks like Ed is going to be correct, unless Stockfish starts winning in my match.

I do not understand the need for people to put out such crazy claims in defense of Stockfish, like "perfect engines do not scale". lol!

All this is testable, and we know Stockfish is not a perfect chess engine just from testing.

After 24 games in the reverse match, where Stockfish has the massive time-odds advantage against Dragon 3.1 (40/2 hours vs 3m+2s), the score is +0 -0 =24, plus 1 fail win in a position with a draw score.

In the first match, where Dragon 3.1 had the massive time advantage, Dragon 3.1 won +4 -1 =94.
