1. According to the CCRL 40/15 4CPU list, Stockfish 13-15 made no progress at all; version 14.1 is even worse.
2. However, the CCRL 40/2 1CPU list looks as expected and as announced by the SF team: nice and steady progress, +67 Elo.
3. The same can be said about the CEGT 40/20 1CPU list: steady progress, although less of it, +29 Elo.
4. But like the CCRL 40/15 4CPU list, the CEGT 40/20 1CPU list is just as confusing: no progress since version 13.
Various issues (in the mix!) may play a role:
1. Stockfish scales badly (an assumption, not a fact).
2. The Elo pools used by the testers; for examples, see the Average Opponent column in the rating lists.
3. The use of SF derivatives, which may lower the measured strength of the real one.
4. The fact that more CPUs plus a higher time control make the opposition stronger; a good example is [2] vs [3], +67 Elo vs +29 Elo.
5. An unproven assumption: SF has become so strong (with high draw percentages as a result) that the Ordo and Bayeselo Elo systems no longer function and have reached a sort of upper limit.
To shed some other light on the issue I decided to pit the 4 versions against each other in a round robin, on 20 cores and with increasing time controls. Let's see how that goes and whether the same pattern arises.
Code:
|------------|----------------|----------------|----------------|----------------|----------------|
| Engine     | TC=40/10 ELO   | TC=40/10 ELO   | TC=40/20 ELO   | TC=40/40 ELO   | TC=40/80 ELO   |
|            | 1 CPU          | 20 CPU         | 20 CPU         | 20 CPU         | 20 CPU         |
|------------|----------------|----------------|----------------|----------------|----------------|
| SF15       | 61.7%  +82     | 58.3%  +58     |                |                |                |
| SF14.1     | 54.1%  +31     | 53.3%  +23     | running        |                |                |
| SF14       | 48.5%  -10     | 46.3%  -26     |                |                |                |
| SF13       | 35.7% -100     | 42.0%  -56     |                |                |                |
|------------|----------------|----------------|----------------|----------------|----------------|
mwyoung
Posts : 880 Join date : 2020-11-25 Location : USA
Subject: Re: The Stockfish ELO problem Mon Aug 01, 2022 11:30 am
5. An unproven assumption: SF has become so strong (with high draw percentages as a result) that the Ordo and Bayeselo Elo systems no longer function and have reached a sort of upper limit.
I like 5! It means chess engines are no longer making progress, as Stockfish on current hardware has become unbeatable.
The Elo system is pretty simple. If you win, your rating goes up; if you lose, your rating goes down. If you draw, your rating goes up, down, or stays the same depending on the other player's rating.
So if this system no longer works!
That means chess engine progress is also at an end, and any chess engine still being developed can only hope to match Stockfish's rating.
So no need to worry about those pricey computer upgrades for chess engines.
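The win/lose/draw rule described in this post can be sketched with the textbook per-game Elo update. This is only an illustration: the ratings and the K-factor of 10 are made-up values, and the rating lists discussed in this thread use Ordo or Bayeselo, which fit all games at once rather than updating game by game.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B (logistic Elo model)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, score_a: float, k: float = 10.0) -> float:
    """New rating of A after one game; score_a is 1 (win), 0.5 (draw) or 0 (loss).

    k is an assumed K-factor, chosen only for this example.
    """
    return rating_a + k * (score_a - expected_score(rating_a, rating_b))


# A draw moves a (hypothetical) 3500 player down, nowhere, or up,
# depending on whether the opponent is weaker, equal, or stronger.
for opponent in (3400, 3500, 3600):
    print(opponent, round(update(3500, opponent, 0.5), 2))
```

A draw only costs points when the opponent is rated lower, which is why a stronger engine that draws almost everything against weaker versions sees its measured lead shrink.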
TheSelfImprover
Posts : 3112 Join date : 2020-11-18
Subject: Re: The Stockfish ELO problem Mon Aug 01, 2022 11:45 am
We might not be there yet, but it looks to me as though that's the direction of travel we're on.
Admin Admin
Posts : 2608 Join date : 2020-11-17 Location : Netherlands
Subject: Re: The Stockfish ELO problem Mon Aug 01, 2022 6:47 pm
If that is the case we need a new Elo formula. Of course progress continues. SF can't win nearly every TCEC superfinal without progress.
Admin Admin
Posts : 2608 Join date : 2020-11-17 Location : Netherlands
Subject: Re: The Stockfish ELO problem Mon Aug 01, 2022 9:11 pm
UPDATE
Code:
|------------|----------------|----------------|----------------|----------------|----------------|
| Engine     | TC=40/10 ELO   | TC=40/10 ELO   | TC=40/20 ELO   | TC=40/40 ELO   | TC=40/80 ELO   |
|            | 1 CPU          | 20 CPU         | 20 CPU         | 20 CPU         | 20 CPU         |
|------------|----------------|----------------|----------------|----------------|----------------|
| SF15       | 61.7%  +82     | 58.3%  +58     | 54.8%  +33     |                |                |
| SF14.1     | 54.1%  +31     | 53.3%  +23     | 51.8%  +12     | running        |                |
| SF14       | 48.5%  -10     | 46.3%  -26     | 48.7%   -9     |                |                |
| SF13       | 35.7% -100     | 42.0%  -56     | 44.7%  -37     |                |                |
|------------|----------------|----------------|----------------|----------------|----------------|
|            |                | Equals         | Equals         | Equals         | Equals         |
|            |                | 40m in 3m/20s  | 40m in 6m/40s  | 40m in 13m     | 40m in 26m     |
|------------|----------------|----------------|----------------|----------------|----------------|
| Dragon 2.5 | running        |                |                |                |                |
| Dragon 2.0 |                |                |                |                |                |
|------------|----------------|----------------|----------------|----------------|----------------|
Draws : 86.2%
I can already see the first contours of the rating lists and of option 5 looming.
OTOH it can be a specific SF scaling issue, so I also added Komodo to the same test. Dragon 2.5 is considerably stronger than 2.0, but if it shows the same pattern as SF, option 5 becomes a serious possibility.
We will see....
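To put a number on how a high draw rate can cap the Elo gap a match is able to show, here is a rough sketch. It assumes a two-player match with a fixed draw rate in which one side wins every decisive game, and it uses the plain logistic Elo formula rather than whatever Ordo or Bayeselo compute for a whole pool. The function names are made up for this example.

```python
import math


def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction (logistic model)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))


def max_elo_at_draw_rate(draw_rate: float) -> float:
    """Largest Elo gap visible if one side wins ALL decisive games."""
    best_score = (1.0 - draw_rate) + draw_rate / 2.0
    return elo_from_score(best_score)


# Draw rate reported for the Stockfish round robin in this post.
print(round(max_elo_at_draw_rate(0.862), 1))
```

Under these assumptions a draw rate of 86.2% leaves room for a gap of less than 50 Elo even for a side that never loses, so shrinking differences at longer time controls are at least consistent with option 5. It does not prove it.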
mwyoung
Posts : 880 Join date : 2020-11-25 Location : USA
Subject: Re: The Stockfish ELO problem Mon Aug 01, 2022 9:38 pm
Admin wrote:
If that is the case we need a new Elo formula. Of course progress continues. SF can't win nearly every TCEC superfinal without progress.
Ed, there is nothing wrong with the Elo system. That is not the issue. You could put a 32-man tablebase in the mix, and the Elo system would give an accurate rating for the conditions of the rating pool.
My feeling is that it is not a priority for Stockfish or the other engines to tune for long time control chess or many cores. They only tune at very fast time controls like micro bullet, as they want thousands of games to judge the changes to an engine.
But those changes do not always work at longer time controls, or with more cores.
Stockfish is a perfect example, as we see advancement at micro bullet and bullet time controls.
I think it is an issue of priorities.
mwyoung
Posts : 880 Join date : 2020-11-25 Location : USA
Subject: Re: The Stockfish ELO problem Mon Aug 01, 2022 10:19 pm
To show the scaling issue with Stockfish, I started a match between SF DEV and a chess search that is very slow and weak when compared to Stockfish NNUE:
Monte-Carlo Tree Search. The MCTS is searching only 700 NPS to Stockfish's 13 million NPS, but it has the advantage of good scaling.
Here are the results so far, at only 40 moves/1 hour:
Code:
Score of Dragon 3.1 MCTS vs Stockfish 07/24/22: 0 - 0 - 5 [0.500]
... Dragon 3.1 MCTS playing White: 0 - 0 - 3 [0.500] 3
... Dragon 3.1 MCTS playing Black: 0 - 0 - 2 [0.500] 2
... White vs Black: 0 - 0 - 5 [0.500] 5
Elo difference: 0.0 +/- 0.0, LOS: nan %, DrawRatio: 100.0 %
5 of 200 games finished.
Admin Admin
Posts : 2608 Join date : 2020-11-17 Location : Netherlands
Subject: Re: The Stockfish ELO problem Tue Aug 02, 2022 8:54 am
Contrary to SF, we see Komodo scaling extremely well. It's too early for a final conclusion, but it looks like what Mark said earlier: the SF folks had better start to focus on LTC, and there is nothing wrong with the rating lists.
Phew...
mwyoung
Posts : 880 Join date : 2020-11-25 Location : USA
Subject: Re: The Stockfish ELO problem Tue Aug 02, 2022 11:00 am
This result so far is hilarious! It shows that if you use your engine for long analysis, you should also check with better-scaling engines like Lc0, or even MCTS.
Code:
Score of Dragon 3.1 MCTS vs Stockfish 07/24/22: 0 - 0 - 8 [0.500]
... Dragon 3.1 MCTS playing White: 0 - 0 - 4 [0.500] 4
... Dragon 3.1 MCTS playing Black: 0 - 0 - 4 [0.500] 4
... White vs Black: 0 - 0 - 8 [0.500] 8
Elo difference: 0.0 +/- 0.0, LOS: nan %, DrawRatio: 100.0 %
8 of 200 games finished.
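The `LOS: nan %` in this output has a simple explanation if the match tool uses the common likelihood-of-superiority formula, LOS = 0.5 * (1 + erf((W - L) / sqrt(2 * (W + L)))), which depends only on decisive games. The thread does not say how the tool computes it, so that formula is an assumption here, and `match_stats` is a name made up for this sketch.

```python
import math


def match_stats(wins: int, losses: int, draws: int):
    """Return (score, elo_diff, los, draw_ratio) for a finished set of games."""
    games = wins + losses + draws
    if games == 0:
        raise ValueError("no games played")
    score = (wins + 0.5 * draws) / games

    # Logistic Elo difference; infinite when one side scored 0% or 100%.
    if 0.0 < score < 1.0:
        elo = 400.0 * math.log10(score / (1.0 - score))
    else:
        elo = math.copysign(math.inf, score - 0.5)

    # Assumed LOS formula: with no decisive games it is 0/0, hence "nan".
    decisive = wins + losses
    if decisive == 0:
        los = math.nan
    else:
        los = 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * decisive)))

    return score, elo, los, draws / games


# The 8 drawn games reported above: no wins, no losses.
print(match_stats(0, 0, 8))
```

With eight draws and nothing else, the score is exactly 50%, the Elo difference is 0, and there is no decisive game to base a LOS on. The result says little yet about which engine is stronger.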
Admin Admin
Posts : 2608 Join date : 2020-11-17 Location : Netherlands
Subject: Re: The Stockfish ELO problem Tue Aug 02, 2022 8:50 pm
Komodo scales extremely well (+56, +61, +59). SF15 kept its ~+34 (no further drop) in the last SF run, which equals 40 moves in 26 minutes, about CCRL 40/15 or CEGT 40/20 at 2 CPU.
SF draw rate : 88%
Dragon draw rate : 73%
mwyoung
Posts : 880 Join date : 2020-11-25 Location : USA
Subject: Re: The Stockfish ELO problem Tue Aug 02, 2022 9:05 pm
It gets even worse for SF if you use normal time controls. But we can see it here just fine.
Admin Admin
Posts : 2608 Join date : 2020-11-17 Location : Netherlands
Subject: Re: The Stockfish ELO problem Thu Aug 04, 2022 8:13 am
SF15 went down from +82 to finally +26, but it never lost a game in the last run. SF13 went up from -113 to finally -19.
I am beginning to understand why CEGT and CCRL are showing the results as pointed out in the OP.
I am beginning to understand why rating lists like the one of Stefan and the GRL don't have this problem yet. First, they play openings that favor SF, and second, the short time control favors SF.
TCEC is a dead-end street: they play unusual openings to survive. The question is, for how long?
Uri Blass
Posts : 207 Join date : 2020-11-28
Subject: Re: The Stockfish ELO problem Mon Aug 08, 2022 11:43 am
Admin wrote:
If that is the case we need a new Elo formula. Of course progress continues. SF can't win nearly every TCEC superfinal without progress.
1) SF cannot win TCEC without progress in unbalanced positions, but if Stockfish cannot create these positions in normal chess games, even against weaker opponents, then this progress is not relevant for rating in normal games.
2) Even if Stockfish is unbeatable, there can be progress in beating weaker opponents, because even if you know not to make losing mistakes, it does not mean that you always know how to take advantage of the opponent's losing mistakes in order to win.
TheSelfImprover
Posts : 3112 Join date : 2020-11-18
Subject: Re: The Stockfish ELO problem Mon Aug 08, 2022 1:52 pm
Uri Blass wrote:
Even if Stockfish is unbeatable, there can be progress in beating weaker opponents, because even if you know not to make losing mistakes, it does not mean that you always know how to take advantage of the opponent's losing mistakes in order to win.
If a player recognises losing positions, he/she/it should be able to avoid a move that turns a winning position into a non-winning one.
Uri Blass
Posts : 207 Join date : 2020-11-28
Subject: Re: The Stockfish ELO problem Mon Aug 08, 2022 5:07 pm
TheSelfImprover wrote:
If a player recognises losing positions, he/she/it should be able to avoid a move that turns a winning position into a non-winning one.
Being able not to lose does not mean that you are able to win every winning position that you get, and it does not mean you recognize losing positions. In practice you do not go into inferior positions when you have the choice of an equal position, even if you do not know whether the inferior position is drawn or lost.
Suppose that in KRPPP vs KRPPP, with the pawns on the same side, you know how to draw and know not to lose material, so you practically never lose.
Now suppose that the opponent loses a pawn and goes into KRPPP vs KRPP, where you have a pawn advantage. It is possible that you can now win the game but do not know how to do it, and that you are going to miss a win later. That does not contradict being able not to lose from the starting KRPPP vs KRPPP.
Damir Desevac
Posts : 330 Join date : 2020-11-27 Age : 43 Location : Denmark
Subject: Re: The Stockfish ELO problem Tue Aug 09, 2022 5:43 pm
For SF to improve its scaling it must disable its NNUE...
Another issue is that Komodo and SF do not use the same NNUE file, so it is not a fair comparison...
Chris Whittington
Posts : 1254 Join date : 2020-11-17 Location : France
Subject: Re: The Stockfish ELO problem Tue Aug 09, 2022 6:02 pm
Arguing from limits: Upper bound of a NNUE is EGTB(32). EGTB(32) needs no search at all. As NNUE approaches its upper bound, it will benefit less and less from search (or scale badly as some people call it).
Admin Admin
Posts : 2608 Join date : 2020-11-17 Location : Netherlands
Subject: Re: The Stockfish ELO problem Tue Aug 09, 2022 8:14 pm
Damir Desevac wrote:
For SF to improve its scaling it must disable its NNUE...
With a loss of 150-200 elo as a result.
Instead they should work on search, key word: less (tapered) pruning.
Admin Admin
Posts : 2608 Join date : 2020-11-17 Location : Netherlands
Subject: Re: The Stockfish ELO problem Tue Aug 09, 2022 8:37 pm
Alright, more on scaling.
We have seen that SF does not scale so well, while Komodo scales very well.
56% is a fantastic result against these giants, but... it is based on the 40/10 time control.
In the next runs I will double the time control, 40/20 to begin with, and I PREDICT that in the end I should be happy with 50%. If that comes true, it would mean a loss of 42 Elo points because of scaling.
The reason is simple : SEARCH.
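The 42 Elo figure in this post follows from the standard logistic relation between score percentage and Elo difference. A small check, assuming that relation is what is meant (the post does not state the formula):

```python
import math


def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction between 0 and 1 (exclusive)."""
    return 400.0 * math.log10(score / (1.0 - score))


def score_from_elo(diff: float) -> float:
    """Expected score fraction for a given Elo difference (inverse of the above)."""
    return 1.0 / (1.0 + 10.0 ** (-diff / 400.0))


# Dropping from a 56% score to a 50% score:
loss = elo_from_score(0.56) - elo_from_score(0.50)
print(round(loss))  # 42
```

A 56% score corresponds to about +42 Elo and a 50% score to 0, so falling back to 50% at the doubled time control would indeed cost about 42 points.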
Uri Blass
Posts : 207 Join date : 2020-11-28
Subject: Re: The Stockfish ELO problem Tue Aug 09, 2022 9:16 pm
Chris Whittington wrote:
Arguing from limits: Upper bound of a NNUE is EGTB(32). EGTB(32) needs no search at all. As NNUE approaches its upper bound, it will benefit less and less from search (or scale badly as some people call it).
EGTB(32) means that you do not lose, but it does not mean no improvement is possible in beating weaker opponents.
Playing a random drawing move in a drawn position is certainly a bad strategy, and EGTB(32) does not tell you which of the drawing moves is better. Playing a fixed drawing move that you add to the 32-piece EGTB for drawn positions is also a bad strategy if you play many games, because the opponent can memorize one drawn game and repeat it to draw against you again and again.
mwyoung
Posts : 880 Join date : 2020-11-25 Location : USA
Subject: Re: The Stockfish ELO problem Tue Aug 09, 2022 9:27 pm
When I do the reverse match of D3.1 vs SF, where SF gets the 2 hours for 40 moves, it could tell us something.
But as we know, engines do not play perfectly. This can be shown with the perfect play we have today: 7-man positions, and many test positions. The engines are still clueless.
But a good way to maximize a 32-man TB would be to play the lines that promote the most complexity and the longest game possible, so you are playing the odds.
Remember, it only takes 1 mistake to lose against perfect play, assuming (and that is an assumption) chess is a forced draw.
Uri Blass
Posts : 207 Join date : 2020-11-28
Subject: Re: The Stockfish ELO problem Tue Aug 09, 2022 10:53 pm
I know that engines are clueless in many positions but it does not contradict perfect chess from the opening position.
Basically there are 2 questions: 1)Do engines play perfect games with no losing mistake? 2)Is it possible to find some strategy to beat chess engines?
It is possible that the answer to both questions is yes. I do not know of a single program that makes a serious attempt to win against a specific opponent (when it gets the details of the opponent and the time control), including using the fact that it has significantly more time to predict the opponent's moves, and pruning moves not because they are bad but because the opponent is not expected to play them (predicting the moves can be done simply by asking a copy of the opponent to search).
Admin Admin
Posts : 2608 Join date : 2020-11-17 Location : Netherlands
Subject: Re: The Stockfish ELO problem Tue Aug 09, 2022 10:57 pm
Posts : 3022 Join date : 2020-11-17 Age : 57 Location : United States of Europe, Germany, Ruhr area
Subject: Re: The Stockfish ELO problem Tue Aug 09, 2022 10:59 pm
On my new PC, Rebel is very, very strong.
mwyoung
Posts : 880 Join date : 2020-11-25 Location : USA
Subject: Re: The Stockfish ELO problem Tue Aug 09, 2022 11:12 pm
Uri Blass wrote:
"I know that engines are clueless in many positions but it does not contradict perfect chess from the opening position."
Yes it does contradict that! What are you talking about?
Otherwise chess engines would play perfect chess in all 7-man positions and test positions, as those are a subset of the positions that can arise from the opening position.
1)Do engines play perfect games with no losing mistake?
Yes, they can play perfectly, depending on the opponent. Remember, chess is a 2-player game. If, for example, I play into fool's mate, that would be a perfect game. Many engines, other humans, and I could play a perfect game.
And that is why you cannot assume that draw or win results between 2 imperfect players = perfect play.
2)Is it possible to find some strategy to beat chess engines?
Yes, as we know that chess engines do not play perfect chess. Until chess engines do, you can still improve.