This is a utility that should rank engines by aggressiveness, based on the number of moves an engine needed to reach a winning position, in other words shorties. Your job: find me a formula that, based on the numbers above, will create a reliable score for each engine. Hint: of course a game won in the first 20 moves is worth more than a win in 40 moves.
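To make the hint concrete, here is one possible shape for such a formula, purely as a starting point for discussion. The linear weighting, the 50-move cutoff as the zero point, and the names `win_weight` and `shortie_score` are all assumptions of this sketch, not anything the tool currently does.

```python
from typing import Sequence

def win_weight(decided_at: int, max_move: int = 50) -> float:
    """Linear weight (assumed): earlier wins count more, wins at or past max_move count nothing."""
    if decided_at >= max_move:
        return 0.0
    return (max_move - decided_at) / max_move

def shortie_score(decided_moves: Sequence[int], games_played: int, max_move: int = 50) -> float:
    """Sum of win weights divided by all games played, so engines with few games are comparable."""
    if games_played == 0:
        return 0.0
    return sum(win_weight(m, max_move) for m in decided_moves) / games_played

# Hypothetical engine: 4 games, two of them won, decided at moves 20 and 40.
print(shortie_score([20, 40], games_played=4))
```

With this weighting a win decided at move 20 counts three times as much as one decided at move 40; any other decreasing curve could be swapped in.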
Remark: CST-1.35 is an older version of Chris's engine (so not 2.0) that we use in our testing.
Before you ask, here is the (current) definition of a winning position: both engines agree on a score of 3 pawns for 5 consecutive moves, the game is won, and the move number is < 50.
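The definition above could be checked per game roughly like this. It is a minimal sketch under a few assumptions of mine: "a score of 3 pawns" is read as "at least 3.0 in favor of the eventual winner", the evals are already converted to the winner's point of view, and the reported move is the one on which the fifth consecutive agreement is reached. The function name `decided_move` is hypothetical.

```python
from typing import Optional, Sequence, Tuple

def decided_move(
    evals: Sequence[Tuple[float, float]],
    game_won: bool,
    threshold: float = 3.0,
    streak: int = 5,
    max_move: int = 50,
) -> Optional[int]:
    """evals[i] holds both engines' scores (in pawns, winner's point of view) at move i + 1.

    Returns the move number on which both engines have agreed on >= threshold
    for `streak` consecutive moves, or None if that never happens before max_move.
    """
    if not game_won:
        return None
    run = 0
    for move_no, (winner_eval, loser_eval) in enumerate(evals, start=1):
        if move_no >= max_move:
            break  # the definition requires move number < max_move
        if winner_eval >= threshold and loser_eval >= threshold:
            run += 1
            if run == streak:
                return move_no
        else:
            run = 0  # agreement interrupted, start counting again
    return None

# Hypothetical game: quiet for 10 moves, then both engines see +3 for 5 moves.
example = [(0.5, 0.3)] * 10 + [(3.2, 3.1)] * 5
print(decided_move(example, game_won=True))
```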
The opinion of the weaker engine is pretty irrelevant in this challenge (as is resigning itself).
If you pit two engines against each other, the winning one will usually recognize the win first; the other one will catch up with the fail-lows later.
What you may want is for the winning engine to be able to resign on behalf of the weaker one (to save time or avoid all the irrelevant moves).
Worded this way, it is just a typical statistical challenge, isn't it? Play a bazillion engine games at a very fast time control in a pool of engines of roughly similar strength (else you get nonsense, as the stronger one will win anyway). Find the engine-specific value (held over, say, three consecutive moves in a game to avoid randomness) at which the stronger engine can resign for the weaker one, because the weaker one won't recover in more than, say, 95% (or whatever) of the games if you play them out.
This would result in some kind of list:
Stockfish 0.9
Rebel 1.6
etc.
Then you know exactly what a "winning position" means for a specific engine, judged by its "winning number".
A value that is the same for all engines will either be unnecessarily conservative or mean that some engines do things in a very similar way.
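The calibration described above could be sketched as follows. The data layout (one list of the engine's own evals per game plus a won/not-won flag), the candidate thresholds, and the function names are assumptions for illustration; real input would come from the played-out games.

```python
from typing import List, Optional, Sequence, Tuple

Game = Tuple[Sequence[float], bool]  # (engine's evals in pawns, did this engine win?)

def triggers(evals: Sequence[float], threshold: float, streak: int = 3) -> bool:
    """True if the eval stays at or above threshold for `streak` consecutive moves."""
    run = 0
    for e in evals:
        run = run + 1 if e >= threshold else 0
        if run >= streak:
            return True
    return False

def win_rate_at(games: List[Game], threshold: float, streak: int = 3) -> Optional[float]:
    """Share of triggered games that were actually won, or None if no game triggers."""
    outcomes = [won for evals, won in games if triggers(evals, threshold, streak)]
    return sum(outcomes) / len(outcomes) if outcomes else None

def winning_number(games: List[Game], candidates: Sequence[float],
                   confidence: float = 0.95, streak: int = 3) -> Optional[float]:
    """Smallest candidate threshold whose triggered games are won at least `confidence` of the time."""
    for t in sorted(candidates):
        rate = win_rate_at(games, t, streak)
        if rate is not None and rate >= confidence:
            return t
    return None

# Tiny made-up sample; a real run would use many thousands of games per engine.
sample = [
    ([1.0, 1.0, 1.0, 0.2], False),
    ([1.0, 1.1, 1.2, 2.0, 2.0, 2.0], True),
    ([2.0, 2.1, 2.2], True),
]
print(winning_number(sample, candidates=[1.0, 2.0]))
```

Run per engine, this gives one "winning number" each, which is the kind of list shown above.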
Chris Whittington
Posts : 1254 Join date : 2020-11-17 Location : France
Kind of missing the point. The idea is to determine whether movestowin is a measure of engine aggression, of engine Elo difference, or of both. My opinion (without doing the regression experiment at this stage) is that it measures both. Stephan Pohl’s EAS tool uses it as if it measures aggression only. Okay, if it’s also Elo and we can extract the Elo component, then we’re left with the EAS component. It’s probably down to me to calculate the equations, since nobody else is going to, but first we need to pin down what movestowin is going to be and how to calculate it.
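For what the regression experiment might look like in its simplest form: fit movestowin against Elo difference with ordinary least squares and treat what is left over (the residual) as the candidate aggression component. This is only a sketch of that idea with a straight-line model and made-up numbers; the real relationship may not be linear, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def residuals(elo_diff: Sequence[float], moves_to_win: Sequence[float]) -> List[float]:
    """movestowin minus the part predicted by Elo difference alone.

    A negative residual means the engine wins faster than its Elo edge predicts.
    """
    slope, intercept = fit_line(elo_diff, moves_to_win)
    return [y - (slope * x + intercept) for x, y in zip(elo_diff, moves_to_win)]

# Made-up data: average movestowin of four engines against a common pool.
elo = [0.0, 100.0, 200.0, 300.0]
mtw = [40.0, 33.0, 32.0, 25.0]
print(fit_line(elo, mtw))
print(residuals(elo, mtw))
```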
Admin Admin
Posts : 2608 Join date : 2020-11-17 Location : Netherlands
If you are reading this, how useful would it be for your EAS tool if I took your PGN database and stored the move number at which a game is decided (as described in the OP) in the PGN? You would no longer depend on PlyCount, and shorties would be more accurate.
Options: store the move number in Event, Site, Round, ECO, or maybe even in PlyCount. I noticed pgn-extract can deal with that.
Any other wishes?
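Storing the number could be as simple as rewriting one tag pair in the header of each game. The sketch below works on the PGN text of a single game using only string handling; the function name `store_decided_move` is hypothetical, and which tag to reuse is exactly the open question above. Note that PlyCount normally holds plies, so a reader of the file would need to know it now holds a move number.

```python
import re

def store_decided_move(pgn_game: str, move_no: int, tag: str = "PlyCount") -> str:
    """Return the game text with [tag "move_no"] set, replacing the tag if it already exists."""
    new_line = f'[{tag} "{move_no}"]'
    pattern = re.compile(r'^\[' + re.escape(tag) + r'[ \t]+"[^"]*"\][ \t]*$', re.MULTILINE)
    if pattern.search(pgn_game):
        # Overwrite the existing tag pair in place.
        return pattern.sub(lambda _m: new_line, pgn_game, count=1)
    # Tag not present: append it to the header block, before the blank line.
    headers, separator, movetext = pgn_game.partition("\n\n")
    return headers + "\n" + new_line + separator + movetext

game = '[Event "Test"]\n[PlyCount "80"]\n\n1. e4 e5 1-0\n'
print(store_decided_move(game, 23))
```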
Eelco
Posts : 233 Join date : 2021-10-08
Subject: Re: Forum challenge [!] Sun Jul 09, 2023 6:43 am
In case the EAS does not work too well, I still think the original Benjamin version of the Toga Rebels was working extremely well... But I have absolutely no idea how training NNUE works. It seemed to capture the original playing style of Benjamin very well, with an enormous Elo lift on top. But as soon as you start adding Lc0 games, the natural tendency to drift towards Lc0, because that is AlphaZero-like, will speed up, I think. Even without adding Lc0 games, it will eventually gravitate towards AlphaZero with every iteration, and that is probably where a lot of the Elo comes from? But you do not want all of the Lc0 playing style; you also want sacrificial play. For endgame technique Lc0 is very okay.
Would a couple of million original MS-DOS CS Tal games work the same as the Benjamin games? But who can play those now? Still, that would be my option one, because it seems proven with Benjamin that it works.
Option two is simple, and I think Ed already tried it, or intended to, with Toga Rebel. It no longer worked with Toga, though: because of the NNUE changes introduced, Toga's material function could not be used anymore. The idea is very simple: if you have a decent, fast material function available in every node, you can calibrate how much of it to subtract from the NNUE evaluation, to get a "pure" positional evaluation out of the NNUE evaluation. This you can increase or decrease at will (of course that has an impact on Elo too, but that is where the art of training NNUE and programming comes in), and that will regulate how many material sacrifices you will see. That was the simple Bluefish idea (for Ovyron). Because Stockfish had a really well-tuned material function to begin with, and you essentially play against all sorts of Stockfish in the opening book warfare in the Engine Rooms of this world, it also worked as a predictor of where a standard Stockfish would not see the material compensation for a sacrifice too well, so that is part of the 'opponent modeling' of Bluefish. But that only worked for Stockfish, of course.
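As a toy illustration of option two: split the NNUE score into a material part and a "pure" positional remainder, then scale the remainder. This is my reading of the idea, not Bluefish's or Toga's actual code; the piece values, the `calibration` and `style` knobs, and the function names are all assumptions.

```python
# Classical piece values in pawns (assumed; a real engine would use its own tuned table).
PIECE_VALUES = {"P": 1.0, "N": 3.0, "B": 3.0, "R": 5.0, "Q": 9.0}

def material_balance(white_pieces: str, black_pieces: str) -> float:
    """Material difference in pawns from White's point of view, e.g. material_balance("QRP", "QRPP")."""
    white = sum(PIECE_VALUES.get(p, 0.0) for p in white_pieces)
    black = sum(PIECE_VALUES.get(p, 0.0) for p in black_pieces)
    return white - black

def styled_eval(nnue_eval: float, material: float,
                calibration: float = 1.0, style: float = 1.0) -> float:
    """Subtract the calibrated material term to isolate the positional part, then rescale it.

    style > 1.0 trusts positional compensation more (more sacrifices),
    style < 1.0 trusts it less; style == 1.0 returns the NNUE eval unchanged.
    """
    material_part = calibration * material
    positional = nnue_eval - material_part
    return material_part + style * positional

# A pawn down, but the net still says +0.5: the positional part is worth 1.5 pawns.
m = material_balance("QRRPP", "QRRPPP")
print(styled_eval(0.5, m), styled_eval(0.5, m, style=1.5))
```

In the example, raising `style` from 1.0 to 1.5 lifts the score of the pawn-down position from 0.5 to 1.25, which is the lever that would make sacrifices look more attractive to the search.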
Nezhman
Posts : 74 Join date : 2020-11-27
Subject: Re: Forum challenge [!] Sun Jul 09, 2023 10:19 pm
When was the original Benjamin Toga Rebel released? I'd like to look it up by timestamp to see if I have it downloaded. Thanks!
texium
Posts : 125 Join date : 2022-07-19
Subject: Re: Forum challenge [!] Sun Jul 09, 2023 10:27 pm
They're trying to make a strong, Tal-like engine. They've made it strong, but it's a matter of how much strength they want to give away for style. NNUE is originally trained on an HCE, like Benjamin was. After each new version, though, the engine is stronger and makes fewer mistakes. So the net is trained on less variable data, and you lose some style because those engines are weak. There's probably still some room for CSTal, but what they really might want to consider is retraining a net using human games evaluated by CSTal as part of the training data. Human games usually make for a more aggressive engine/net but still manage to result in strong engines/nets, e.g. Badgyal, Night Nurse, Viridthias.
texium
Posts : 125 Join date : 2022-07-19
Subject: Re: Forum challenge [!] Sun Jul 09, 2023 10:30 pm
Peter Berger wrote:
The opinion of the weaker engine is pretty irrelevant in this challenge (as is resigning itself).
If you pit two engines against each other, the winning one will usually recognize the win first; the other one will catch up with the fail-lows later.
What you may want is for the winning engine to be able to resign on behalf of the weaker one (to save time or avoid all the irrelevant moves).
Worded this way, it is just a typical statistical challenge, isn't it? Play a bazillion engine games at a very fast time control in a pool of engines of roughly similar strength (else you get nonsense, as the stronger one will win anyway). Find the engine-specific value (held over, say, three consecutive moves in a game to avoid randomness) at which the stronger engine can resign for the weaker one, because the weaker one won't recover in more than, say, 95% (or whatever) of the games if you play them out.
This would result in some kind of list:
Stockfish 0.9
Rebel 1.6
etc.
Then you know exactly what a "winning position" means for a specific engine, judged by its "winning number".
A value that is the same for all engines will either be unnecessarily conservative or mean that some engines do things in a very similar way.
Another issue with this is that sometimes engines are just wrong. They evaluate a position as winning when it's a draw or a loss. So who would be in the right?
Chris Whittington
Posts : 1254 Join date : 2020-11-17 Location : France
Subject: Re: Forum challenge [!] Sun Jul 09, 2023 10:32 pm
An eval whose sign disagrees with the result (when there is a result) is quite rare, though.
Eelco
Posts : 233 Join date : 2021-10-08
Subject: Re: Forum challenge [!] Mon Jul 10, 2023 11:48 am
Nezhman wrote:
When was the original Benjamin Toga Rebel released? I'd like to look it up by timestamp to see if I have it downloaded. Thanks!
Hi Nezhman,
I find it difficult to search for release dates. I think I should have called it the Fruit version with the Benjamin net, not Toga (specifically Pawel Koziol's Fruit version), but exactly when that was and what it was called... It would have been sometime after this thread from Ed where he describes the first Benjamin NNUE: https://prodeo.actieforum.com/t638-my-first-nnue and that was Dec 08 2021. There was a first Benjamin net; maybe that was the one mentioned by Ed in the linked thread. It was not official. It happened to be compatible with Marvin, so it had a strong search if you had that program available. For sure I don't have that version, and the link (for the NNUE only) was on rebel13.nl, so it does not work anymore.
Maybe Matejst still has it somewhere on his older computer...
Shortly after that, Ed published the first Fruit version with a Benjamin-based NNUE net. I'm sure I have that, but right now I do not know its exact date or name...
Nezhman
Posts : 74 Join date : 2020-11-27
Subject: Re: Forum challenge [!] Tue Jul 11, 2023 2:50 am