Forum challenge [!]

Subject: Forum challenge [!] Tue Jul 04, 2023 10:27 am

Just for the fun of it....

Code:: Engine Shorties List

PGN database : pgn\all-aggregated-130.pgn
Get won games : 11.500

Engine                1-10  11-20  21-30  31-40  41-49
NNUE-TRAING-SESSION      1    209   1019   1397   1035
CST-1.35-V20-E520        1    389   1853   2721   2128
SlowChess 2.9            1    387   1845   2726   2159
Stockfish 12             1    364   1751   2628   2095
Berserk 9                1    382   1822   2687   2105
Koivisto 9.0             1    378   1768   2626   2096
Stockfish 14             1    345   1692   2460   2026
Stockfish 13             0    352   1688   2556   2060
Berserk 10               1    376   1778   2639   2085
Koivisto 8.0             1    388   1824   2702   2118
Seer 2.6                 1    380   1810   2678   2143

Press a key to continue...

This is a util that should list the most aggressive engine based on the number of moves an engine needed to reach a winning position, in other words, shorties. Your job: find me a formula that, based on the numbers above, will create a reliable score for each engine. Hint: of course a game won in the first 20 moves is worth more than a win in 40 moves.

Remark - CST-1.35 is an older version of Chris, thus not 2.0, that we use in our testing.

Before you ask, here is the (current) definition of a winning position - when both engines agree on a score of 3 pawns for 5 consecutive moves and the game is won and the move number < 50.
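
For anyone who wants to experiment, here is a minimal sketch of how that definition could be checked for one game. It assumes things the post does not specify: that the evals are available per move in pawns from White's point of view, that "a score of 3 pawns" means at least 3 pawns in favour of the eventual winner, and that the reported move is the one where the 5-move streak starts. The function name and parameters are made up for illustration.

```python
def decided_move(evals_a, evals_b, winner_sign,
                 threshold=3.0, streak=5, max_move=50):
    """Return the 1-based move number at which the game counts as decided.

    evals_a[i], evals_b[i]: eval of each engine after move i+1, in pawns,
    from White's point of view (ASSUMPTION, not stated in the post).
    winner_sign: +1 if White won the game, -1 if Black won.
    Returns None if no qualifying streak starts before max_move.
    """
    run = 0
    for i, (a, b) in enumerate(zip(evals_a, evals_b)):
        # Both engines must agree that the eventual winner is >= threshold up.
        if winner_sign * a >= threshold and winner_sign * b >= threshold:
            run += 1
            if run == streak:
                start = i - streak + 2  # 1-based move where the streak began
                return start if start < max_move else None
        else:
            run = 0  # streak broken, start counting again
    return None


# Invented evals for a White win; the streak is broken once at move 5.
white_view_a = [0.2, 1.0, 3.1, 3.5, 2.9, 3.2, 3.4, 3.6, 4.0, 5.0]
white_view_b = [0.1, 0.8, 3.0, 3.2, 3.1, 3.0, 3.3, 3.5, 3.9, 4.8]
print(decided_move(white_view_a, white_view_b, winner_sign=+1))
```

If the streak should instead be counted from the move where it completes, only the `start` line needs to change.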

Show me your arithmetic skills.

Posts : 880 Join date : 2020-11-25 Location : USA

Just score

1-10 as 1.000 points
11-20 as 0.750 points
21-30 as 0.500 points
31-40 as 0.250 points
41-49 as 0.000 points

Then divide by the number of games played.
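
As a quick illustration of this scoring, the sketch below applies the five weights to one row of the table in the opening post. The post does not say what "games played" is for each engine, so the denominator is left as a parameter; the example call uses the row total (wins under 50 moves) purely as a stand-in.

```python
# Weights from the post, one per move bucket: 1-10, 11-20, 21-30, 31-40, 41-49.
BUCKET_WEIGHTS = (1.00, 0.75, 0.50, 0.25, 0.00)


def shorties_points(bucket_counts):
    """Weighted sum of wins over the five move buckets."""
    if len(bucket_counts) != len(BUCKET_WEIGHTS):
        raise ValueError("expected one count per move bucket")
    return sum(w * n for w, n in zip(BUCKET_WEIGHTS, bucket_counts))


def shorties_score(bucket_counts, games_played):
    """Points divided by the number of games played."""
    if games_played <= 0:
        raise ValueError("games_played must be positive")
    return shorties_points(bucket_counts) / games_played


# Row for CST-1.35-V20-E520 from the table in the opening post.
cst = [1, 389, 1853, 2721, 2128]
print(shorties_points(cst))
# ASSUMPTION: the row total stands in for "games played", which is not given.
print(round(shorties_score(cst, sum(cst)), 4))
```

With the real number of games per engine as the denominator, engines that win a larger share of their games quickly will score higher.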

Posts : 880 Join date : 2020-11-25 Location : USA

If you insist on chasing this dragon, I would suggest incorporating the known average Elo of each engine's opponents into the formula.

Average Elo*(1+score).

As chess is a 2 player game.
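
In code the suggestion is a one-liner. The sketch below assumes `score` is the per-game shorties score from the previous post and that the opponents' Elo ratings are known; the ratings in the example are invented, not taken from any rating list.

```python
def elo_weighted_score(score, opponent_elos):
    """Average opponent Elo * (1 + score), as suggested in the post."""
    if not opponent_elos:
        raise ValueError("need at least one opponent rating")
    average_elo = sum(opponent_elos) / len(opponent_elos)
    return average_elo * (1.0 + score)


# Hypothetical example: shorties score 0.25 against opponents rated 3000 and 3100.
print(elo_weighted_score(0.25, [3000, 3100]))
```

The effect is that the same shorties score counts for more when it was achieved against a stronger field.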

Subject: Re: Forum challenge [!] Wed Jul 05, 2023 8:10 pm

You are on the right track, 2 reputation points for you.

Posts : 880 Join date : 2020-11-25 Location : USA

Admin wrote: You are on the right track, 2 reputation points for you

NICE!!!

Posts : 131 Join date : 2020-11-20

The opinion of the weaker engine is pretty irrelevant in this challenge (as is resigning itself).

If you pit two engines against each other, the winning one will usually recognize the win first - the other one will catch up with the fail-lows later.

What you may want is that the winning one can resign for the weaker one (to save time or avoid all the irrelevant moves).

Worded this way, it is just a typical statistical challenge, isn't it? Play a bazillion engine games at a very fast time control in a pool of engines of roughly similar strength (else you get nonsense, as the stronger one will win anyway). Find the engine-specific eval value (held for, say, three consecutive moves in a game, to avoid randomness) at which the stronger engine can resign for the weaker one, because the weaker one won't recover in more than, say, 95% (or whatever) of the games if you play them out.

This would result in some kind of list:

Stockfish 0.9
Rebel 1.6
etc.etc.

Then you know exactly what a "winning position" means for a specific engine judged by its "winning number".

A value that is the same for all engines will either be unnecessarily conservative (or mean that some engines do things in a very similar way).
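
The search for such a "winning number" could look roughly like the sketch below. It assumes you have already collected, per engine, pairs of (eval held over the consecutive moves, whether the game was eventually won); the sample data and the function name are invented for illustration, and the 95% target is the one suggested above.

```python
def winning_number(samples, target=0.95):
    """Smallest eval threshold t such that at least `target` of the
    samples with eval >= t ended in a win. Returns None if no threshold works.

    samples: iterable of (eval_in_pawns, won) pairs for ONE engine.
    """
    ordered = sorted(samples, key=lambda s: s[0], reverse=True)
    best = None
    wins = 0
    total = 0
    i = 0
    while i < len(ordered):
        t = ordered[i][0]
        # Take in every sample that shares this eval value.
        while i < len(ordered) and ordered[i][0] == t:
            wins += 1 if ordered[i][1] else 0
            total += 1
            i += 1
        # wins/total is now the win rate over all samples with eval >= t.
        if wins / total >= target:
            best = t
    return best


# Invented data: (eval, game eventually won?)
samples = [(0.5, False), (0.9, True), (0.9, True), (1.6, True),
           (2.0, True), (0.5, True), (0.2, False)]
print(winning_number(samples))
```

With only a handful of samples the result is noise, of course; the idea only makes sense with the "bazillion" games mentioned above.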

Posts : 1254 Join date : 2020-11-17 Location : France

Peter Berger wrote: The opinion of the weaker engine is pretty irrelevant in this challenge (as is resigning itself).

Kind of missing the point. The idea is to determine whether movestowin is a measure of engine aggression, of engine Elo difference, or of both.
My opinion (without doing the regression experiment at this stage) is that it measures both. Stefan Pohl's EAS tool uses it as if it measures aggression only. Okay, if it's also Elo, and we can extract the Elo component, then we're left with the EAS component.
It's probably down to me to calculate the equations, as nobody else is going to, but first we need to pin down what movestowin is going to be and how to calculate it.
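
The regression experiment mentioned above could start as simply as the sketch below: fit a straight line of average moves-to-win against Elo difference, and treat what is left over (the residual) as the candidate aggression component. This is only one possible reading of "extract the Elo component"; the linear model, the function names, and the numbers are assumptions, not results.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(xs, ys):
    """Observed y minus the y predicted from x by the fitted line."""
    slope, intercept = fit_line(xs, ys)
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Invented numbers: Elo advantage over the field vs. average moves-to-win.
elo_diff = [0, 100, 200]
moves_to_win = [40.0, 33.0, 30.0]
for diff, res in zip(elo_diff, residuals(elo_diff, moves_to_win)):
    # A negative residual means faster wins than the Elo gap alone predicts.
    print(diff, round(res, 3))
```

Whether a straight line is the right model for the Elo part is exactly the kind of thing the experiment would have to show.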

Subject: Re: Forum challenge [!] Thu Jul 06, 2023 10:58 pm

@STEFAN

If you are reading this, how useful would it be for your EAS tool if I take your PGN database and store the move number at which a game is decided (as described in the OP) in the PGN? You would no longer be dependent on PlyCount, and shorties would be more accurate.

Options: store the move number in Event, Site, Round, ECO, or maybe even in PlyCount. I noticed pgn-extract can deal with that.
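
For what it is worth, writing such a value into a game's header takes only a few lines. The sketch below works on the raw PGN text of a single game and either overwrites an existing tag (such as PlyCount) or appends a new one after the last header line. The tag name `DecidedMove` is hypothetical, and this does not use or imitate pgn-extract.

```python
import re

# A PGN tag pair on its own line, e.g. [PlyCount "81"].
HEADER_LINE = re.compile(r'^\[[A-Za-z0-9_]+ "[^"]*"\]$', re.MULTILINE)


def set_pgn_tag(game_text, tag, value):
    """Return game_text with the tag set to value (replaced or appended)."""
    new_line = f'[{tag} "{value}"]'
    existing = re.compile(rf'^\[{re.escape(tag)} "[^"]*"\]$', re.MULTILINE)
    if existing.search(game_text):
        # Overwrite the first occurrence of the tag.
        return existing.sub(lambda m: new_line, game_text, count=1)
    headers = list(HEADER_LINE.finditer(game_text))
    if not headers:
        # No header section at all: start one.
        return new_line + "\n\n" + game_text
    end = headers[-1].end()
    return game_text[:end] + "\n" + new_line + game_text[end:]


game = '[Event "Test"]\n[Result "1-0"]\n[PlyCount "81"]\n\n1. e4 e5 1-0\n'
print(set_pgn_tag(game, "PlyCount", 46))      # reuse an existing tag
print(set_pgn_tag(game, "DecidedMove", 23))   # hypothetical new tag
```

Reusing PlyCount keeps existing tools working unchanged, while a separate tag keeps the original ply count intact; which is better depends on what the EAS tool expects.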

Maybe other wishes?

Posts : 233 Join date : 2021-10-08

In case the EAS does not work too well, I still think the original Benjamin version of the Toga Rebels was working extremely well... But I have absolutely no idea how training NNUE works. It seemed to capture the original playing style of Benjamin very well, plus an enormous Elo lift. But as soon as you start adding Lc0 games, the natural tendency of going towards Lc0, because that is AlphaZero-like, will go faster, I think. Even without adding Lc0 games, it will eventually gravitate to AlphaZero with every iteration, and that is probably where a lot of the Elo comes from? But you do not want all of the Lc0 playing style, you also want sacrificial play. For endgame technique Lc0 is very okay.

A couple of million original MS-DOS CS Tal games, would that work the same as the Benjamin games? But who can play that now? That would be my option one, because it seems proven, with Benjamin, that it works.

Option two is simple, and Ed already tried it I think, or intended to with Toga Rebel, but it did not work with Toga anymore: because of the NNUE changes introduced, the material function of Toga could not be used anymore. It is very simple: if you have a decent, fast material function available in every node, you can calibrate how much of it to subtract from the NNUE evaluation, to get a "pure" positional evaluation out of the NNUE evaluation. This you can increase or decrease at will (of course that has an impact on Elo too, but that is where the art of training NNUE and programming comes in), and that will regulate how many material sacrifices you will see. That was the simple Bluefish idea (for Ovyron). Because Stockfish really had a well-tuned material function to begin with, and you play essentially against all sorts of Stockfish when in the opening book warfare in the Engine Rooms of this world, it also worked as a predictor of where a standard Stockfish would not see the material compensation of a sacrifice too well, so that is part of the 'opponent modeling' of Bluefish. But that only worked for Stockfish of course.
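
Option two boils down to a small piece of arithmetic per node. The sketch below is only an illustration of that idea, not Bluefish or Toga code: the piece values, the factor `k`, and the function names are assumptions, and evals are in centipawns from White's point of view.

```python
# Hypothetical piece values in centipawns; a real engine would use its own table.
PIECE_VALUES = {"P": 100, "N": 320, "B": 330, "R": 500, "Q": 900}


def material_balance(white_counts, black_counts):
    """Material difference (White minus Black) in centipawns."""
    return sum(
        value * (white_counts.get(piece, 0) - black_counts.get(piece, 0))
        for piece, value in PIECE_VALUES.items()
    )


def positional_eval(nnue_eval_cp, material_cp, k=1.0):
    """Subtract k times the material balance from the NNUE eval.

    k = 0 leaves the NNUE eval untouched; a larger k discounts material
    more, so positions reached by a sacrifice look less bad to the engine.
    """
    return nnue_eval_cp - k * material_cp


white = {"P": 8, "N": 2, "B": 2, "R": 2, "Q": 1}
black = {"P": 8, "N": 1, "B": 2, "R": 2, "Q": 1}  # Black is a knight down
material = material_balance(white, black)
print(material)                               # 320
print(positional_eval(250, material, k=0.5))  # 250 - 160 = 90.0
```

Tuning `k` is the calibration step described above: it trades Elo against how willingly the engine gives up material.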

Posts : 74 Join date : 2020-11-27

Eelco wrote: In case the EAS does not work too well, I still think the original Benjamin version of the Toga Rebels was working extremely well... But I have absolutely no idea how training NNUE works. It seemed to capture the original playing style of Benjamin very well, plus an enormous Elo lift. But as soon as you start adding Lc0 games, the natural tendency of going towards Lc0, because that is AlphaZero-like, will go faster, I think. Even without adding Lc0 games, it will eventually gravitate to AlphaZero with every iteration, and that is probably where a lot of the Elo comes from? But you do not want all of the Lc0 playing style, you also want sacrificial play. For endgame technique Lc0 is very okay.

When was the original Benjamin Toga Rebel released? I'd like to look it up by timestamp to see if I have it downloaded. Thanks!

Posts : 125 Join date : 2022-07-19

Eelco wrote: In case the EAS does not work too well, I still think the original Benjamin version of the Toga Rebels was working extremely well... But I have absolutely no idea how training NNUE works. It seemed to capture the original playing style of Benjamin very well, plus an enormous Elo lift. But as soon as you start adding Lc0 games, the natural tendency of going towards Lc0, because that is AlphaZero-like, will go faster, I think. Even without adding Lc0 games, it will eventually gravitate to AlphaZero with every iteration, and that is probably where a lot of the Elo comes from? But you do not want all of the Lc0 playing style, you also want sacrificial play. For endgame technique Lc0 is very okay.

They're trying to make a strong, Tal-like engine. They've made it strong, but it's a matter of how much strength they want to give away for style. NNUE is originally trained on an HCE, like Benjamin was. After each new version, though, the engine is stronger and makes fewer mistakes. So the net is trained on less variable data, and you lose some style because those engines are weak. There's probably still some room for CSTal, but what they really might want to consider is retraining a net using human games evaluated by CSTal as part of the training data. Human games usually make for a more aggressive engine/net but still manage to result in strong engines/nets, e.g. Badgyal, Night Nurse, Viridithas.

Posts : 125 Join date : 2022-07-19

Peter Berger wrote: The opinion of the weaker engine is pretty irrelevant in this challenge (as is resigning itself).

Another issue with this is that sometimes engines are just wrong. They eval as winning when it's a draw or losing. So who would be in the right?

Posts : 1254 Join date : 2020-11-17 Location : France

texium wrote:

Peter Berger wrote: The opinion of the weaker engine is pretty irrelevant in this challenge (as is resigning itself).

Another issue with this is that sometimes engines are just wrong. They eval as winning when it's a draw or losing. So who would be in the right?

An eval with the opposite sign to the result (when there is a result) is quite rare, though.

Posts : 233 Join date : 2021-10-08

Nezhman wrote:

Eelco wrote: In case the EAS does not work too well, I still think the original Benjamin version of the Toga Rebels was working extremely well... But I have absolutely no idea how training NNUE works. It seemed to capture the original playing style of Benjamin very well, plus an enormous Elo lift. But as soon as you start adding Lc0 games, the natural tendency of going towards Lc0, because that is AlphaZero-like, will go faster, I think. Even without adding Lc0 games, it will eventually gravitate to AlphaZero with every iteration, and that is probably where a lot of the Elo comes from? But you do not want all of the Lc0 playing style, you also want sacrificial play. For endgame technique Lc0 is very okay.

When was the original Benjamin Toga Rebel released? I'd like to look it up by timestamp to see if I have it downloaded. Thanks!

Hi Nezhman,

I find it difficult to search for release dates. I think I should have called it the Fruit version with the Benjamin net, not Toga, specifically Pawel Koziol's Fruit version, but exactly when that was, and what it was called...
It would have been sometime after this thread from Ed, where he describes the first Benjamin NNUE: https://prodeo.actieforum.com/t638-my-first-nnue
and that was Dec 08 2021.
There was a first Benjamin net; maybe that was the one mentioned by Ed in the linked thread. Not official. It was compatible with Marvin, by accident, so it had a strong search if you had that program available. For sure I don't have that version, and the link, for the NNUE only, was on rebel13.nl, so that does not work anymore.

Maybe Matejst still has it somewhere on his older computer...

Shortly after that, Ed published the first Fruit version with a Benjamin-based NNUE net. I'm sure I have that, but exactly which date or name I do not know right now...

Posts : 74 Join date : 2020-11-27

Thanks, Eelco.
