This is the reason Chess System Tal won't allow threads > cores. I don't want to have to wade through pages and pages of nonsense...
Berserk 11.1 time management issue leads to regularly losing games it would otherwise win #485
yetisyny opened this issue 2 days ago · 5 comments
yetisyny commented 2 days ago
Berserk 11.1 has an intermittent issue where it sometimes takes longer than it is supposed to for a move. This is obvious if you have it play a series of games in the Scid vs. PC 4.23 GUI against the built-in chess engines Scidlet, Phalanx, or Toga II. Most high-Elo chess engines nearly always beat Scidlet, almost always beat Phalanx, and fairly consistently beat Toga II. You see this with Stockfish 15.1, Lc0 v0.29.0, or Fat Fritz 2 if they are installed and configured correctly and running on good enough hardware (Lc0 requires a decent NVIDIA video card and the right CUDA build).
Anyway, the problem with Berserk 11.1 is that you regularly see it losing to the weakest chess engines, even the very weak Scidlet, something that never happens with the other high-Elo engines I mentioned earlier, like Stockfish, Lc0, or the controversial Fat Fritz 2. When you look at why it loses, it is always because it went over the time limit, using more time than the GUI allows. If a chess engine exceeds the time limit for any reason, it automatically loses the game, and that regularly happens to Berserk 11.1, at least in this setup. It seems to happen a bit more often against Toga II and Phalanx than against Scidlet, because they are strong enough to keep the game going for more moves, giving Berserk more chances to go over the time limit.
To trigger this issue more often, you can set the time control to per move with 1 second per move, or set it to per game, starting at 1 second with an additional second added for each move. These settings also make games go faster, so I recommend using them to reproduce this bug, testing against an engine that is weak enough to lose consistently when there is no timeout, but not so weak that it loses after just a few moves. This issue comes up remarkably often in games between Berserk and Phalanx. Phalanx is better than Scidlet, so it can last longer, but it is nowhere near as good as modern NNUE engines, so its loss would only be a matter of time if it weren't for this timeout issue. Since Berserk vs. Phalanx games last longer than Berserk vs. Scidlet games, you see more of them where Phalanx wins on time. If this bug were fixed in Berserk 11.1, Berserk would almost certainly win nearly every game against these much weaker engines. I think the quick fix for a time issue included in Berserk 11.1 wasn't quite perfect, and this issue seems to be a relic of that.
Berserk still plays good moves when it does move, but this timeout issue affects it no matter which engine it is up against, and it makes it lose against strong engines even more than against weak ones, at least in the Scid vs. PC GUI. I have Ponder = On, which might be relevant; the only other change from the defaults is the number of threads, which I set to the number of cores my CPU has. Everything else is default, and I am using the right build for my CPU (x64-avx2-pext, since I have a relatively recent CPU). This is on 64-bit Windows 11.
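For reference, a UCI GUI passes the time control to the engine in its `go` command, with all times in milliseconds. The sketch below shows what the two settings described above (1 second per move, or 1 second per game plus a 1-second increment) would look like as UCI commands. It assumes Scid vs. PC uses `movetime` for the per-move control and `wtime`/`btime`/`winc`/`binc` for the per-game control, which I have not checked against its source, and the helper name `uci_go_command` is hypothetical.

```python
def uci_go_command(per_move_ms=None, wtime_ms=None, btime_ms=None,
                   winc_ms=0, binc_ms=0):
    """Build a UCI 'go' command for a per-move or per-game time control.

    Hypothetical helper for illustration only. All times are in
    milliseconds, as the UCI protocol specifies.
    """
    if per_move_ms is not None:
        # Per-move control: the engine must reply within this many ms.
        return f"go movetime {per_move_ms}"
    if wtime_ms is None or btime_ms is None:
        raise ValueError("a per-game control needs both clock values")
    # Per-game control: remaining clock time for each side plus increments.
    return (f"go wtime {wtime_ms} btime {btime_ms} "
            f"winc {winc_ms} binc {binc_ms}")


# 1 second per move
print(uci_go_command(per_move_ms=1000))
# 1 second base time plus 1 second increment, at the start of the game
print(uci_go_command(wtime_ms=1000, btime_ms=1000, winc_ms=1000, binc_ms=1000))
```

Either way the engine has only about one second of real time per move, so there is very little slack for any delay before it sends `bestmove`.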
None of the other chess engines I have show any time management issues like this. This issue is a real shame, because I have been running computer chess engine tournaments on my computer for testing purposes, and Berserk 11.1 does much worse in them than its online ratings suggest. I would like to get this fixed so it performs at its peak as one of the best engines, with a unique evaluation function that gives different results from other top engines but still does very well. I assume Berserk 10 or earlier would probably not have this bug, but I have not tested that, since I figure you can check it yourself. To reproduce this, you need to use a chess GUI to set strict time limits of around one second and watch how it sometimes goes over the limit and loses. Thank you.
jhonnold commented 2 days ago
Can you provide explicit details regarding this issue?
PGNs and/or UCI engine logs?
Exact settings given to Berserk and its opponent
yetisyny commented yesterday
Here is how you can reproduce the bug. I have tried to simplify things, but unfortunately it is a bit complicated. I had to go into some detail, and this does not even include a number of other changes in my own configuration; this is a simplified version, but it is still many steps. I think PGNs and engine logs are not the most useful way to reproduce this issue; it is better to recreate the settings that cause it, so I have put together instructions that make this bug occur consistently in a way you can observe and test.
Steps to reproduce bug with time management in Berserk 11.1:
Install 64-bit Scid vs. PC version 4.23 (currently the latest version) on 64-bit Windows.
Create a "berserk" subdirectory of the "bin\engines" subdirectory which already contains "phalanx", "stockfish", and "toga" subdirectories.
Install the most advanced build of Berserk 11.1 that your CPU supports into that "berserk" subdirectory of "bin\engines" under the Scid vs. PC 4.23 installation. In my case it is "berserk-11.1-x64-avx2-pext.exe": x64 with AVX2 and PEXT (PEXT is part of the BMI2 instruction set), which works on Haswell and newer CPUs. My CPU is an x64 Intel Core i7 newer than Haswell that supports AVX2 and PEXT, so I use it.
Run Scid vs. PC 4.23 and keep it open for now.
Go to the Tools menu, Analysis Engines.
Click the New button.
For the new chess engine, put in Berserk 11.1 as the name and select the executable you just installed for Berserk 11.1 as the command using the "browse" button to the right of the command field. The directory should read "." The parameters should be blank. It doesn't matter what you put for website or Elo. Leave it as UCI and not Xboard because it's a UCI chess engine.
Press Configure to configure Berserk 11.1. We want it to use as many threads as there are logical processors on the computer (shown in the Windows Task Manager, Performance tab). My CPU has 12 logical processors, so I set the number of threads to 12. On a typical x86 CPU, the number of logical processors is twice the number of cores if it has hyperthreading, or just the number of cores if it doesn't. I have a 6-core CPU with hyperthreading, which makes 12 logical processors, but you should use the correct number for your CPU. Also set Ponder to on (true). The other settings can be left unchanged. Save these settings. These settings seem to make this bug happen more often, which is what we want when trying to reproduce it.
Hit OK to save the settings for Berserk 11.1.
Move Berserk 11.1 to the top of the list of chess engines, to make it easier to set up a tournament where Berserk 11.1 plays the other engines but they don't play each other.
Edit the settings for Stockfish and hit Configure for its window too. We want to increase its number of threads to the number of logical processors on the computer (in my case 12) and enable Ponder for it as well, while leaving other settings the same. Save the settings for Stockfish and hit OK to save the changes.
Do the same for Toga. Toga doesn't let you increase the number of threads, but you can turn Ponder on, so turn it on for Toga as well.
If you install any other UCI chess engines to test against Berserk 11.1, also set their number of threads to the number of logical processors and turn Ponder on for them too, but leave the other settings at defaults. This way, we are testing every engine under the same conditions.
You can't configure Phalanx or Scidlet at all, so we are done here; close the Analysis Engines window.
Go to the Play menu, Computer Tournament. Set the number of engines to the maximum, 5, and hit the Update button. (You could try this with more engines, but this is enough.)
Set the 1st engine to Berserk 11.1. The other 4 engines should be Stockfish, Toga, Phalanx, and Scidlet. (Those come with Scid vs. PC 4.23 by default but you can test against other engines too.)
Increase the number of rounds from 2 to 10 to see results from more games. If you leave Time Control set to per game, decrease the base time from the default 60 seconds to 1 second, keeping the increment at 1 second. Alternatively, switch Time Control to per move and set it to 1 second; either way, we want 1 second to increase time pressure. The up and down arrows for the time field might not let you set exactly 1 second, so you will probably need to type in the number 1 and then switch to another field.
The settings for "Show Clocks", "Animate Moves", and "Engine Scores as Comments" don't matter and can be set either way, although turning all three off makes things faster.
Make sure to turn on "Permanent Thinking" (equivalent to "Ponder"). I use Ponder because Leela Chess Zero needs it on to work well, I was running tournaments involving Leela Chess Zero, and I thought it would be unfair to have it on only for Leela Chess Zero and off for everything else. I didn't tell you to install Leela Chess Zero because that is unnecessary; this is just why I had Ponder and Permanent Thinking on for all the UCI engines. It seems this might make this bug in Berserk 11.1 happen more often.
Turn on "Use book" for more variety and random starting positions in games. The default book, Elo2400.bin, is fine and avoids making major mistakes in the openings while having more variety than if an opening book is turned off.
For game scheduling, set it to "First plays others" so that all games are between Berserk 11.1 and the other engines and we don't waste time having the other engines play each other. We already know Stockfish is better than Toga, Toga is better than Phalanx, and Phalanx is better than Scidlet, so there is no need to verify that. This bug happens regardless of the scheduling mode, and I normally use "Carousel" to have all the engines play each other in round-robin tournaments, but for reproducing this bug, "First plays others" tests Berserk 11.1 more efficiently.
Leave the "Start position" setting at "Normal".
Hit OK to start, and it should start 40 games, assuming Berserk 11.1 has 4 opponents (the old versions of Stockfish, Toga, Phalanx, and Scidlet bundled with Scid vs. PC 4.23).
Switch to the main window of Scid vs. PC without closing the other windows, and go to the Window menu, Crosstable, and wait for the results to show up in the crosstable.
Switch back to the Tournament window and keep it on top with the Crosstable window behind it. That way you can see whether a game ends because one of the players timed out, as well as the total results from all games so far. Bear in mind that this is an old version of Stockfish that Berserk 11.1 can usually beat; Toga is weaker than the old Stockfish, so Berserk 11.1 is much stronger than it; Phalanx is weaker than Toga, so Berserk 11.1 has an even bigger advantage; and Scidlet is weaker still, so Berserk 11.1 is far stronger than Scidlet. All of them are opponents that Berserk 11.1 should beat most of the time.
You should observe that Berserk 11.1 regularly loses games it should win by timing out, and that this happens against all the other opponents at least occasionally. When this happens, the Tournament window gives "Timed out" as the reason Berserk 11.1 lost. This is the bug we are trying to reproduce. It seems to happen fairly often against opponents that put up a good fight and less often against weaker ones, although it depends on how long the game lasts; it is more likely to happen in longer games with more moves.
If you install another strong modern chess engine, such as Stockfish 15.1, Lc0 v0.29.0, or the controversial Fat Fritz 2 (which you can now find for free, since it is open source), you will not see this timeout bug occur in tournaments like this in Scid vs. PC, and those top-tier engines will do better than Berserk 11.1 under these specific conditions because they don't have this bug. If you want to observe that, as well as which engines are stronger than which, install some other strong current engines and use the same multithreading and Ponder = On settings to test them under the same conditions as Berserk 11.1. This confirms that the bug is unique to Berserk 11.1 and not present in the other chess engines or in Scid vs. PC 4.23 itself, since the other engines do not time out when they play each other.
If you follow these steps, you should be able to replicate this bug easily and see it happen regularly. For instance, I ran these steps as I wrote them, to make sure they trigger the bug. In the first 10 games between Berserk 11.1 and Stockfish 9, Berserk 11.1 lost 4 by timing out, won 3 because it is stronger than Stockfish 9, and drew 3 because Stockfish 9 can put up a good fight despite being weaker (typical results). After that, Berserk 11.1 won 9 games against Toga II 1.3.1 and lost 1 by timing out (it usually times out more often against Toga, but it was lucky this time). Then it won 6 games against Phalanx XXIV and lost 4 by timing out (typical results). Finally, it won 6 games against Scidlet and lost 4 by timing out (it usually wins more consistently against Scidlet; this was an unusually high number of timeouts against it).
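In the steps above, the Threads value is the number of logical processors, read from the Task Manager Performance tab. If Python is installed, a minimal sketch like the following reports the same count using the standard library's `os.cpu_count()`; the `logical_processors` helper name is just for illustration. On the 6-core hyperthreaded CPU described above it should report 12.

```python
import os


def logical_processors() -> int:
    """Return the number of logical processors (hyperthreads included).

    os.cpu_count() can return None when the count cannot be determined,
    so fall back to 1 in that case.
    """
    return os.cpu_count() or 1


if __name__ == "__main__":
    print(f"Logical processors (Threads value used in these steps): "
          f"{logical_processors()}")
```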
Here is the crosstable from that tournament, showing Berserk 11.1 winning 24 of 40 games because it is stronger than the other engines here, losing 13 of 40 because of this timeout bug, and drawing 3 against Stockfish 9, which put up a better fight than the weaker engines (there also seems to be a completely unrelated bug, in either Toga II 1.3.1 or Scid vs. PC 4.23, that makes Scid vs. PC list Toga with an age of 33 and Turkish nationality, but we can ignore that, since it is irrelevant to the Berserk 11.1 bug discussed here):
Scid vs. PC
                     Age  Nat  Score      Berserk 11.1  Stockfish     Phalanx XXIV  Scidlet       Toga
1: Berserk 11.1                25.5 / 40  XXXXXXXXXX    1010100===    1100010111    1101101010    1111111110    (+24 -13 =3)
2: Stockfish                    5.5 / 10  0101011===    XXXXXXXXXX    ..........    ..........    ..........    (+4 -3 =3)
3: Phalanx XXIV                 4.0 / 10  0011101000    ..........    XXXXXXXXXX    ..........    ..........    (+4 -6 =0)
4: Scidlet                      4.0 / 10  0010010101    ..........    ..........    XXXXXXXXXX    ..........    (+4 -6 =0)
5: Toga              33   TUR   1.0 / 10  0000000001    ..........    ..........    ..........    XXXXXXXXXX    (+1 -9 =0)
40 games: +22 -15 =3
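The crosstable totals can be cross-checked: each result string uses 1 for a win, 0 for a loss, and = for a draw, with a draw worth half a point. The sketch below (the `tally` helper is hypothetical) re-adds Berserk 11.1's row and reproduces its +24 -13 =3 record and 25.5 / 40 score.

```python
def tally(results: str) -> tuple[int, int, int, float]:
    """Count wins, losses, draws and points in one crosstable result string.

    '1' = win, '0' = loss, '=' = draw, from the row engine's point of view;
    a draw is worth half a point.
    """
    wins = results.count("1")
    losses = results.count("0")
    draws = results.count("=")
    return wins, losses, draws, wins + 0.5 * draws


# Berserk 11.1's row: one 10-game string per opponent
# (Stockfish, Phalanx XXIV, Scidlet, Toga).
berserk_row = ["1010100===", "1100010111", "1101101010", "1111111110"]

total_wins = total_losses = total_draws = 0
total_score = 0.0
for results in berserk_row:
    wins, losses, draws, points = tally(results)
    total_wins += wins
    total_losses += losses
    total_draws += draws
    total_score += points

# Expected: +24 -13 =3, score 25.5 / 40
print(f"+{total_wins} -{total_losses} ={total_draws}, "
      f"score {total_score} / {total_wins + total_losses + total_draws}")
```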
This bug happens more often against opponents that give Berserk 11.1 more of a challenge, but it still happens intermittently against weaker opponents too. In the tournament results this shows up as wins for weaker opponents that are never able to get even a draw and that Berserk 11.1 usually beats quite easily.
In previous tournaments I noticed this bug also seems to happen strangely often against Phalanx, whether it is the bundled Phalanx XXIV or the slightly stronger Phalanx XXV. So if you want to test against the one specific engine that reproduces this issue most often, Phalanx, particularly the final version, Phalanx XXV, might be the best choice. That is also useful because both Phalanx XXIV and Phalanx XXV have much lower Elo ratings than Berserk 11.1 and never draw against it, so every time Phalanx wins, it is because Berserk 11.1 timed out. I only gave directions for playing against all the other engines to show that it happens intermittently against all of them, and because I usually have all the engines play each other. Also, if just 2 engines play each other, the maximum number of games Scid vs. PC 4.23 allows in a tournament is 10, and occasionally Berserk 11.1 goes 10 games in a row without this intermittent bug happening at all; I wanted to make sure the bug occurred for you so you could replicate it. With the multithreading and Ponder settings described above, it has never gone 40 games in a row without the bug occurring.
This bug also seems to happen more often if you use other programs while the tournament is running. It seems to happen less often if all the chess engines use only 1 thread each, with Ponder and Permanent Thinking off, the time control set per game at the default 60 seconds with a 1-second increment, and nothing else running on the computer. It still occasionally happens then, but not as much.
AlexBrunetti commented 20 hours ago
Playing with 12 threads and ponder on requires 12 logical cores per engine.
yetisyny commented 19 hours ago
Yes, since discovering this bug I have realized that these are not the best settings, that Leela Chess Zero ignores the Ponder setting, and that it is better to have Ponder off for tournaments. I was just trying to explain how to recreate the bug. These are obviously not the best settings or conditions for running chess engines; they put added stress on the engines and make it harder for them to finish a move on time. You are certainly 100% correct that requiring 12 logical cores per engine, with both engines running at the same time, is odd and overkill, and leaves the engines running many threads that compete for CPU time on each other's turns. I have switched Ponder off for all the chess engines in my setup.
But this bug also occurs intermittently under normal circumstances with default settings. The settings I provided just increase the probability of the bug occurring and make it easier to reproduce. I do not recommend that other people use those settings unless their goal is to reproduce this sort of bug.
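The point about 12 logical cores per engine can be made concrete with a little arithmetic. The sketch below is a rough model, not a measurement: it assumes that with Ponder on, the engine to move and the pondering engine both keep all their threads busy at once, while with Ponder off only the side to move is searching. The helper names `searching_threads` and `oversubscription` are hypothetical.

```python
def searching_threads(threads_per_engine: int, engines: int = 2,
                      ponder: bool = True) -> int:
    """Rough count of search threads busy at the same time in one game.

    Assumption: with pondering on, both engines search simultaneously
    (one on its own clock, one on the opponent's time); with it off,
    only the side to move searches.
    """
    return threads_per_engine * (engines if ponder else 1)


def oversubscription(threads_per_engine: int, logical_cpus: int,
                     ponder: bool = True) -> float:
    """Ratio of busy search threads to logical processors (1.0 = fully used)."""
    return searching_threads(threads_per_engine, ponder=ponder) / logical_cpus


# Berserk 11.1 vs. Stockfish in the reproduction steps:
# 12 threads per engine on a machine with 12 logical processors.
print(oversubscription(12, 12, ponder=True))   # 2.0: both engines search at once
print(oversubscription(12, 12, ponder=False))  # 1.0: only the side to move searches
```

Under this model, the reproduction settings ask for twice as many busy threads as the machine has, which fits the explanation that the engines compete for CPU time on each other's turns. Keeping the ratio at or below 1.0 (for example, 6 threads per engine with Ponder on in this setup) is the same idea as a GUI refusing to allow more threads than cores.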
jhonnold commented 12 hours ago
So, to summarize, it seems like Berserk times out on your system (or at least gets to very low time) if you run more threads than your system can handle?