ProDeo

Computer Chess
 

 Forum challenge [!]

Admin
Posts : 2528
Join date : 2020-11-17
Location : Netherlands

PostSubject: Forum challenge [!]   Tue Jul 04, 2023 11:27 am

Just for the fun of it....

Code:
Engine Shorties List

PGN database      : pgn\all-aggregated-130.pgn
Get won games      : 11.500

Engine                  1-10  11-20  21-30  31-40  41-49
NNUE-TRAING-SESSION        1    209   1019   1397   1035
CST-1.35-V20-E520          1    389   1853   2721   2128
SlowChess 2.9              1    387   1845   2726   2159
Stockfish 12               1    364   1751   2628   2095
Berserk 9                  1    382   1822   2687   2105
Koivisto 9.0               1    378   1768   2626   2096
Stockfish 14               1    345   1692   2460   2026
Stockfish 13               0    352   1688   2556   2060
Berserk 10                 1    376   1778   2639   2085
Koivisto 8.0               1    388   1824   2702   2118
Seer 2.6                   1    380   1810   2678   2143

Press a key to continue...

This is a utility that should identify the most aggressive engine based on the number of moves an engine needed to reach a winning position, in other words shorties. Your job: find me a formula that, based on the numbers above, produces a reliable score for each engine. Hint: of course a game won in the first 20 moves is worth more than a win in 40 moves.

Remark: CST-1.35 is an older version from Chris (so not 2.0) that we use in our testing.

Before you ask, here is the (current) definition of a winning position: both engines agree on a score of 3 pawns for 5 consecutive moves, the game is won, and the move number is < 50.
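
A minimal sketch of that definition in Python. Several details are assumptions that the post does not state: each engine's scores are available as one value per move, in pawns, from the eventual winner's point of view; "a score of 3 pawns" is read as at least 3.00; the reported move number is the first move of the agreeing streak; and the caller has already checked that the game was won. The function name `decided_move` is hypothetical.

```python
from typing import Optional, Sequence


def decided_move(evals_a: Sequence[float], evals_b: Sequence[float],
                 threshold: float = 3.0, streak: int = 5,
                 max_move: int = 50) -> Optional[int]:
    """Return the 1-based move number at which a won game counts as decided.

    evals_a and evals_b hold one score per move (in pawns) for each engine,
    both seen from the eventual winner's side. Returns None when no streak
    of agreeing scores is found before max_move.
    """
    run = 0
    for move, (a, b) in enumerate(zip(evals_a, evals_b), start=1):
        if move >= max_move:  # assumption: every move of the streak must be < 50
            break
        # Extend the streak only while BOTH engines report at least `threshold`.
        run = run + 1 if (a >= threshold and b >= threshold) else 0
        if run == streak:
            return move - streak + 1  # assumption: report the streak's first move
    return None


if __name__ == "__main__":
    winner = [0.5, 1.0, 3.2, 3.5, 4.0, 4.5, 5.0, 6.0]
    loser = [0.3, 0.8, 2.9, 3.1, 3.6, 4.0, 4.4, 5.0]
    print(decided_move(winner, loser))
```

The returned move number is what would then be binned into the 1-10, 11-20, ... columns of the list above.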

Show me your arithmetic skills.

Nezhman likes this post

http://rebel13.nl/

mwyoung
Posts : 880
Join date : 2020-11-25
Location : USA

PostSubject: Re: Forum challenge [!]   Wed Jul 05, 2023 12:56 am

Just score

1-10   as 1.000 points
11-20 as 0.750 points
21-30 as 0.500 points
31-40 as 0.250 points
41-49 as 0.000 points

Then divide by the number of games played.
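
As a rough illustration, the weighting above can be applied to the posted table like this. One assumption is labeled in the code: the table only lists won games per bucket, so the sum of those wins stands in for "the number of games played". The names `WEIGHTS`, `SHORTIES` and `shortie_score` are made up for this sketch.

```python
# Weights from the post, one per move bucket (1-10, 11-20, 21-30, 31-40, 41-49).
WEIGHTS = (1.00, 0.75, 0.50, 0.25, 0.00)

# Win counts per bucket, copied from the "Engine Shorties List" output.
SHORTIES = {
    "NNUE-TRAING-SESSION": (1, 209, 1019, 1397, 1035),
    "CST-1.35-V20-E520":   (1, 389, 1853, 2721, 2128),
    "SlowChess 2.9":       (1, 387, 1845, 2726, 2159),
    "Stockfish 12":        (1, 364, 1751, 2628, 2095),
    "Berserk 9":           (1, 382, 1822, 2687, 2105),
    "Koivisto 9.0":        (1, 378, 1768, 2626, 2096),
    "Stockfish 14":        (1, 345, 1692, 2460, 2026),
    "Stockfish 13":        (0, 352, 1688, 2556, 2060),
    "Berserk 10":          (1, 376, 1778, 2639, 2085),
    "Koivisto 8.0":        (1, 388, 1824, 2702, 2118),
    "Seer 2.6":            (1, 380, 1810, 2678, 2143),
}


def shortie_score(bins, weights=WEIGHTS):
    """Weighted wins divided by the number of games in the row."""
    games = sum(bins)  # assumption: listed wins stand in for "games played"
    if games == 0:
        return 0.0
    return sum(w * n for w, n in zip(weights, bins)) / games


if __name__ == "__main__":
    ranking = sorted(SHORTIES, key=lambda e: shortie_score(SHORTIES[e]), reverse=True)
    for engine in ranking:
        print(f"{engine:<22}{shortie_score(SHORTIES[engine]):.4f}")
```

If the real number of games played per engine (including draws and losses) is known, it would replace `sum(bins)` as the divisor.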

Ghppn likes this post

mwyoung
Posts : 880
Join date : 2020-11-25
Location : USA

PostSubject: Re: Forum challenge [!]   Wed Jul 05, 2023 1:46 am

If you insist on chasing this dragon, I would suggest incorporating the known average Elo of the other players for each engine into the formula.

Average Elo*(1+score).

As chess is a 2 player game.
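
A small sketch of the formula above. The thread gives no opponent ratings, so the Elo figures in the example are purely hypothetical placeholders, as is the function name `elo_weighted_score`; `score` is assumed to be the per-engine value from the bucket weighting in the previous post.

```python
def elo_weighted_score(avg_opponent_elo: float, score: float) -> float:
    """Average Elo * (1 + score), as proposed in the post."""
    return avg_opponent_elo * (1.0 + score)


if __name__ == "__main__":
    # Hypothetical inputs: two engines with the same shortie score but
    # different average opposition.
    for label, elo in (("engine A", 3400.0), ("engine B", 3300.0)):
        print(label, elo_weighted_score(elo, 0.28))
```

With equal shortie scores, the engine that faced the stronger opposition ends up with the higher value, which is the point of bringing the opponents into it.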

Ghppn likes this post

Admin
Posts : 2528
Join date : 2020-11-17
Location : Netherlands

PostSubject: Re: Forum challenge [!]   Wed Jul 05, 2023 9:10 pm

You are on the right track, 2 reputation points for you.

Ghppn likes this post

http://rebel13.nl/

mwyoung
Posts : 880
Join date : 2020-11-25
Location : USA

PostSubject: Re: Forum challenge [!]   Wed Jul 05, 2023 11:33 pm

Admin wrote:
You are on the right track, 2 reputation points for you.

NICE!!!

Ghppn likes this post

Peter Berger
Posts : 120
Join date : 2020-11-20

PostSubject: Re: Forum challenge [!]   Thu Jul 06, 2023 2:30 pm

The opinion of the weaker engine is pretty irrelevant in this challenge (as is resigning itself).

If you pit two engines against each other, the winning one will usually recognize the win first; the other one will catch up with the fail-lows later.

What you may want is that the winning one can resign for the weaker one (to save time or avoid all the irrelevant moves).

Worded this way, it is just a typical statistical challenge, isn't it? Play a bazillion engine games at a very fast time control in a pool of engines of roughly similar strength (else you get nonsense, as the stronger one will win anyway). Find the engine-specific eval value (held for, say, three consecutive moves in a game to avoid randomness) at which the stronger engine can resign for the weaker one, because the weaker one won't recover in more than, say, 95% (or whatever) of the games if you play them out.

This would result in some kind of list:

Stockfish 0.9
Rebel 1.6
etc. etc.

Then you know exactly what a "winning position" means for a specific engine, judged by its "winning number".

A value that is the same for all engines will either be unnecessarily conservative (or mean that some engines do things in a very similar way).
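
One way the calibration described above could be sketched. The data layout is an assumption: each game is a pair of (the engine's own eval per move in pawns, whether that engine went on to win). The function names and the candidate grid are hypothetical, and the toy data is only there to make the sketch run.

```python
def sustained(evals, threshold, streak=3):
    """True if the eval stays at or above `threshold` for `streak` moves in a row."""
    run = 0
    for e in evals:
        run = run + 1 if e >= threshold else 0
        if run >= streak:
            return True
    return False


def winning_number(games, candidates, min_rate=0.95, streak=3):
    """Smallest candidate threshold whose sustained evals led to a win
    in at least `min_rate` of the games that reached it, else None."""
    for t in sorted(candidates):
        outcomes = [won for evals, won in games if sustained(evals, t, streak)]
        if outcomes and sum(outcomes) / len(outcomes) >= min_rate:
            return t
    return None


if __name__ == "__main__":
    toy_games = [
        ([1.0, 1.0, 1.0, 2.0, 2.0, 2.0], True),
        ([1.0, 1.0, 1.0, 0.2], False),  # reached +1.0 but did not win
        ([2.0, 2.0, 2.0], True),
    ]
    print(winning_number(toy_games, candidates=[1.0, 2.0]))
```

Run once per engine over its own games, this would yield the kind of per-engine list shown in the post.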

Ghppn likes this post

Chris Whittington
Posts : 1254
Join date : 2020-11-17
Location : France

PostSubject: Re: Forum challenge [!]   Thu Jul 06, 2023 3:44 pm

Peter Berger wrote:
The opinion of the weaker engine is pretty irrelevant in this challenge ( as is resigning itself).

If you pit two engines against each other the winning one will usually recognize the win first - the other one will catch up with the fail-lows later.

What you may want is that the winning one can resign for the weaker one ( to save time or avoid all the irrelevant moves).

Worded this way, it is just a typical statistical challenge, isn't it? Do a bazillion of engine games at a very speedy time control in a pool of engines of kind of similar strength ( else you get nonsense, as the stronger one will win anyway). Find the specific engine value ( in like three consecutive moves in a game to avoid randomness) where the stronger engine can resign for the weaker one as the weaker one won't recover in more than say 95% ( or whatever) of the games, if you play it out.

This would result in some kind of list:

Stockfish 0.9
Rebel 1.6
etc.etc.

Then you know exactly what a "winning position" means for a specific engine judged by its "winning number".

A value that is similar for all engines will either be unnecessarily conservative ( or mean that some engines do things in a very similar way).


Kind of missing the point. The idea is to determine if movestowin is a measure of engine aggression or engine Elo difference or both.
My opinion (without doing the regression experiment at this stage) is that it measures both. Stefan Pohl’s EAS tool uses it as if it measures aggression only. Okay, if it’s also Elo and we can extract the Elo component, then we’re left with the EAS component.
It’s probably down to me to calculate the equations, nobody else is going to, but first we need to pin down what movestowin is going to be and how to calculate it.
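
A hedged sketch of what such a regression experiment could look like, not the equations announced in the post. It assumes one data point per engine (or per pairing): an Elo difference against the opposition and an average moves-to-win figure. A straight line is fitted by ordinary least squares, and whatever the line does not explain is kept as the candidate aggression component. All names and numbers are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def aggression_residuals(elo_diff, moves_to_win):
    """Moves-to-win left over after removing the part predicted by Elo difference.

    A negative residual means the engine won faster than its Elo edge alone
    would suggest.
    """
    slope, intercept = fit_line(elo_diff, moves_to_win)
    return [m - (intercept + slope * d) for d, m in zip(elo_diff, moves_to_win)]


if __name__ == "__main__":
    # Hypothetical data: Elo advantage and average moves-to-win for four engines.
    elo_diff = [0, 50, 100, 150]
    moves = [41.0, 36.5, 37.0, 33.0]
    for d, r in zip(elo_diff, aggression_residuals(elo_diff, moves)):
        print(d, round(r, 2))
```

Whether a linear relation is adequate is itself something the experiment would have to show.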

Mclane and Ghppn like this post

Admin
Posts : 2528
Join date : 2020-11-17
Location : Netherlands

PostSubject: Re: Forum challenge [!]   Thu Jul 06, 2023 11:58 pm

@STEFAN

If you are reading this, how useful would it be for your EAS tool if I take your pgn database and store the move number at which a game is decided (as described in the OP) in the pgn? You would no longer be dependent on PlyCount and shorties would be more accurate.

Options: store the move number in Event, Site, Round, ECO, or maybe even in PlyCount. I noticed pgn-extract can deal with that.
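
For illustration, writing such a number into a PGN tag could look like the sketch below. It works on the text of a single game and only relies on the standard PGN tag-pair syntax `[Name "value"]`; the helper name `set_pgn_tag` is hypothetical and this is not how the author's utility or pgn-extract does it.

```python
import re


def set_pgn_tag(game: str, tag: str, value: str) -> str:
    """Return the game text with [tag "value"] set, replacing an existing tag
    of that name or appending a new one at the end of the header block."""
    line = f'[{tag} "{value}"]'
    pattern = re.compile(rf'^\[{re.escape(tag)} "[^"]*"\]$', re.MULTILINE)
    if pattern.search(game):
        # A function replacement avoids backslash interpretation in `line`.
        return pattern.sub(lambda _m: line, game, count=1)
    # The header block ends at the first blank line; add the tag just before it.
    header, sep, moves = game.partition("\n\n")
    return f"{header}\n{line}{sep}{moves}"


if __name__ == "__main__":
    game = '[Event "Test"]\n[PlyCount "120"]\n\n1. e4 e5 1-0\n'
    print(set_pgn_tag(game, "PlyCount", "37"))
```

Overwriting PlyCount changes the meaning of a standard tag, so reusing a free-text tag such as Round may be the safer of the options listed.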

Maybe other wishes?

http://rebel13.nl/

Eelco
Posts : 212
Join date : 2021-10-08

PostSubject: Re: Forum challenge [!]   Sun Jul 09, 2023 7:43 am

In case the EAS does not work too well, I still think the original Benjamin version of the Toga Rebels was working extremely well... But I have absolutely no idea how training NNUE works. It seemed to capture the original playing style of Benjamin so very well? Plus an enormous Elo lift. But as soon as you start adding Lc0 games, the natural tendency of going towards Lc0, because that is AlphaZero-like, will go faster I think. Even without adding Lc0 games, it will eventually gravitate to AlphaZero with every iteration and that is probably where a lot of the Elo comes from? But you do not want all of the Lc0 playing style, you also want sacrificial play. For endgame technique Lc0 is very okay.

A couple of million original MS-DOS CS Tal games, would that work the same as the Benjamin games? But who can play that now? That would be my option one, because it seems proven, with Benjamin, that it works.

Option two is simple and Ed already tried it I think, or intended to with Toga Rebel, but it did not work with Toga anymore: because of the NNUE changes introduced, the material function of Toga could not be used anymore. It is very simple: if you have a decent, fast material function available in every node, you can calibrate how much of it to subtract from the NNUE evaluation, to get a "pure" positional evaluation out of the NNUE evaluation. This you can increase or decrease at will (of course that has an impact on Elo too, but that is where the art of training NNUE and programming comes in), and that will regulate how many material sacrifices you will see. That was the simple Bluefish idea (for Ovyron). Because Stockfish really had a well-tuned material function to begin with, and you essentially play against all sorts of Stockfish in the opening book warfare in the Engine Rooms of this world, it also worked as a predictor of where a standard Stockfish would not see the material compensation of a sacrifice too well, so that is part of the 'opponent modeling' of Bluefish. But that only worked for Stockfish of course.
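
A toy sketch of option two, under stated assumptions: the material function is a plain piece count read from the board field of a FEN, the piece values are the textbook 1/3/3/5/9, and the tunable part is a single `discount` factor. None of this is taken from Bluefish, Toga or Stockfish code; the names are hypothetical.

```python
# Textbook piece values in pawns (an assumption, not an engine's tuned values).
PIECE_VALUES = {"P": 1.0, "N": 3.0, "B": 3.0, "R": 5.0, "Q": 9.0}


def material_balance(fen_board: str) -> float:
    """White-minus-black material in pawns from the board field of a FEN."""
    total = 0.0
    for ch in fen_board:
        value = PIECE_VALUES.get(ch.upper())
        if value is not None:  # skips kings, digits and '/' separators
            total += value if ch.isupper() else -value
    return total


def styled_eval(nnue_eval: float, material: float, discount: float) -> float:
    """NNUE score with a calibrated share of the material term removed.

    discount = 0.0 leaves the NNUE score untouched; discount = 1.0 removes the
    whole material count, leaving what the post calls the "pure" positional part.
    """
    return nnue_eval - discount * material


if __name__ == "__main__":
    # White has given up a knight; the net (hypothetically) still says +0.5.
    board = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/R1BQKBNR"
    material = material_balance(board)
    for discount in (0.0, 0.5, 1.0):
        print(discount, styled_eval(0.5, material, discount))
```

Raising `discount` makes positions with a material deficit look better to the search, which is the knob for how readily sacrifices get played; the Elo cost of each setting would have to be measured.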

Nezhman and Ghppn like this post

Nezhman
Posts : 74
Join date : 2020-11-27

PostSubject: Re: Forum challenge [!]   Sun Jul 09, 2023 11:19 pm

Eelco wrote:
In case the EAS does not work too well, I still think the original Benjamin version of the Toga Rebels was working extremely well... But I have absolutely no idea how training NNUE works. It seemed to capture the original playing style of Benjamin so very good? Plus an enormous Elo lift. But as soon as you start adding Lco games, the natural tendency of going towards Lco, because that is Alpha Zero like, will go faster I think. Even without adding Lc0 games, it will eventually gravitate to Alpha Zero with every iteration and that is probably where a lot of the Elo comes from? But you do not want all of the Lc0 playing style, you also want sacrificial play. For endgame technique Lc0 is very okay.

A couple of million original MsDos CS Tal games, would that work the same as the Benjamin games? But who can play that now. But that would be my option one because it seems proven, with Benjamin, that it works.

Option two is simple and Ed already tried it I think, or intended to with Toga Rebel but it did not work with Toga anymore, because of the NNUE changes introduced, the material function of Toga could not be used anymore. It is very simple; if you have a decent fast material function available in every node, you can calibrate that for how much to subtract from the NNUE evaluation, to get a "pure" positional evaluation out of NNUE evaluation. This you can increase or decrease at will, (of course that has an impact on Elo too but that is where the art of training NNUE and programming comes in) and that will regulate how much material sacrifices you will see. That was the simple Bluefish idea (for Ovyron Smile). Because Stockfish really had a well tuned material function to begin with and you play essentially against all sorts of Stockfish when in the opening book warfare in the Engine Rooms of this world, it also worked as a predictor of where  a standard Stockfish would not see the material compensation of a sacrifice too well, so that is part of the 'opponent modeling' of Bluefish. But that only worked for Stockfish of course.

When was the original Benjamin Toga Rebel released? I'd like to look it up by timestamp to see if I have it downloaded. Thanks!

texium
Posts : 112
Join date : 2022-07-19

PostSubject: Re: Forum challenge [!]   Sun Jul 09, 2023 11:27 pm

Eelco wrote:
In case the EAS does not work too well, I still think the original Benjamin version of the Toga Rebels was working extremely well... But I have absolutely no idea how training NNUE works. It seemed to capture the original playing style of Benjamin so very good? Plus an enormous Elo lift. But as soon as you start adding Lco games, the natural tendency of going towards Lco, because that is Alpha Zero like, will go faster I think. Even without adding Lc0 games, it will eventually gravitate to Alpha Zero with every iteration and that is probably where a lot of the Elo comes from? But you do not want all of the Lc0 playing style, you also want sacrificial play. For endgame technique Lc0 is very okay.

A couple of million original MsDos CS Tal games, would that work the same as the Benjamin games? But who can play that now. But that would be my option one because it seems proven, with Benjamin, that it works.

Option two is simple and Ed already tried it I think, or intended to with Toga Rebel but it did not work with Toga anymore, because of the NNUE changes introduced, the material function of Toga could not be used anymore. It is very simple; if you have a decent fast material function available in every node, you can calibrate that for how much to subtract from the NNUE evaluation, to get a "pure" positional evaluation out of NNUE evaluation. This you can increase or decrease at will, (of course that has an impact on Elo too but that is where the art of training NNUE and programming comes in) and that will regulate how much material sacrifices you will see. That was the simple Bluefish idea (for Ovyron Smile). Because Stockfish really had a well tuned material function to begin with and you play essentially against all sorts of Stockfish when in the opening book warfare in the Engine Rooms of this world, it also worked as a predictor of where  a standard Stockfish would not see the material compensation of a sacrifice too well, so that is part of the 'opponent modeling' of Bluefish. But that only worked for Stockfish of course.

They're trying to make a strong, Tal-like engine. They've made it strong, but it's a matter of how much strength they want to give away for style. NNUE is originally trained on an HCE, like Benjamin was. After each new version, though, the engine is stronger and makes fewer mistakes. So the net is trained on less variable data, and you lose some style because those engines are weak. There's probably still some room for CSTal, but what they really might want to consider is retraining a net using human games evaluated by CSTal as part of the training data. Human games usually make a more aggressive engine/net but still manage to result in strong engines/nets, e.g. Badgyal, Night Nurse, Viridithas.

Nezhman and Ghppn like this post

texium
Posts : 112
Join date : 2022-07-19

PostSubject: Re: Forum challenge [!]   Sun Jul 09, 2023 11:30 pm

Peter Berger wrote:
The opinion of the weaker engine is pretty irrelevant in this challenge ( as is resigning itself).

If you pit two engines against each other the winning one will usually recognize the win first - the other one will catch up with the fail-lows later.

What you may want is that the winning one can resign for the weaker one ( to save time or avoid all the irrelevant moves).

Worded this way, it is just a typical statistical challenge, isn't it? Do a bazillion of engine games at a very speedy time control in a pool of engines of kind of similar strength ( else you get nonsense, as the stronger one will win anyway). Find the specific engine value ( in like three consecutive moves in a game to avoid randomness) where the stronger engine can resign for the weaker one as the weaker one won't recover in more than say 95% ( or whatever) of the games, if you play it out.

This would result in some kind of list:

Stockfish 0.9
Rebel 1.6
etc.etc.

Then you know exactly what a "winning position" means for a specific engine judged by its "winning number".

A value that is similar for all engines will either be unnecessarily conservative ( or mean that some engines do things in a very similar way).

Another issue with this is that sometimes engines are just wrong. They eval a position as winning when it's a draw or losing. So who would be in the right?

Ghppn likes this post

Chris Whittington
Posts : 1254
Join date : 2020-11-17
Location : France

PostSubject: Re: Forum challenge [!]   Sun Jul 09, 2023 11:32 pm

texium wrote:
Peter Berger wrote:
The opinion of the weaker engine is pretty irrelevant in this challenge ( as is resigning itself).

If you pit two engines against each other the winning one will usually recognize the win first - the other one will catch up with the fail-lows later.

What you may want is that the winning one can resign for the weaker one ( to save time or avoid all the irrelevant moves).

Worded this way, it is just a typical statistical challenge, isn't it? Do a bazillion of engine games at a very speedy time control in a pool of engines of kind of similar strength ( else you get nonsense, as the stronger one will win anyway). Find the specific engine value ( in like three consecutive moves in a game to avoid randomness) where the stronger engine can resign for the weaker one as the weaker one won't recover in more than say 95% ( or whatever) of the games, if you play it out.

This would result in some kind of list:

Stockfish 0.9
Rebel 1.6
etc.etc.

Then you know exactly what a "winning position" means for a specific engine judged by its "winning number".

A value that is similar for all engines will either be unnecessarily conservative ( or mean that some engines do things in a very similar way).

Another issue with this is that sometimes engines are just wrong. They eval a position as winning when it's a draw or losing. So who would be in the right?

An eval with a different sign from the result (when there is a result) is quite rare, though.

Ghppn likes this post

Eelco
Posts : 212
Join date : 2021-10-08

PostSubject: Re: Forum challenge [!]   Mon Jul 10, 2023 12:48 pm

Nezhman wrote:
Eelco wrote:
In case the EAS does not work too well, I still think the original Benjamin version of the Toga Rebels was working extremely well... But I have absolutely no idea how training NNUE works. It seemed to capture the original playing style of Benjamin so very good? Plus an enormous Elo lift. But as soon as you start adding Lco games, the natural tendency of going towards Lco, because that is Alpha Zero like, will go faster I think. Even without adding Lc0 games, it will eventually gravitate to Alpha Zero with every iteration and that is probably where a lot of the Elo comes from? But you do not want all of the Lc0 playing style, you also want sacrificial play. For endgame technique Lc0 is very okay.

A couple of million original MsDos CS Tal games, would that work the same as the Benjamin games? But who can play that now. But that would be my option one because it seems proven, with Benjamin, that it works.

Option two is simple and Ed already tried it I think, or intended to with Toga Rebel but it did not work with Toga anymore, because of the NNUE changes introduced, the material function of Toga could not be used anymore. It is very simple; if you have a decent fast material function available in every node, you can calibrate that for how much to subtract from the NNUE evaluation, to get a "pure" positional evaluation out of NNUE evaluation. This you can increase or decrease at will, (of course that has an impact on Elo too but that is where the art of training NNUE and programming comes in) and that will regulate how much material sacrifices you will see. That was the simple Bluefish idea (for Ovyron Smile). Because Stockfish really had a well tuned material function to begin with and you play essentially against all sorts of Stockfish when in the opening book warfare in the Engine Rooms of this world, it also worked as a predictor of where  a standard Stockfish would not see the material compensation of a sacrifice too well, so that is part of the 'opponent modeling' of Bluefish. But that only worked for Stockfish of course.

When was the original Benjamin Toga Rebel released? I'd like to look it up by timestamp to see if I have it downloaded. Thanks!

Hi Nezhman,

I find it difficult to search for release dates. I think I should have called it the Fruit version with the Benjamin net, not Toga, specifically Pawel Koziol's Fruit version, but exactly when that was, and what it was called...
It would have been somewhere after this thread from Ed where he describes the first Benjamin NNUE: https://prodeo.actieforum.com/t638-my-first-nnue
and that was Dec 08 2021.
There was a first Benjamin net, maybe that was the one mentioned by Ed in the linked thread. Not official. It was compatible with Marvin, by accident, so it had a strong search if you had that program available. For sure I don't have that version, and the link, for the NNUE only, was on rebel13.nl, so it does not work anymore.

Maybe Matejst still has it somewhere on his older computer...

Shortly after that, Ed published the first Fruit version with a Benjamin-based NNUE net. I'm sure I have that, but exactly which date or name I do not know right now...

Nezhman and Ghppn like this post

Nezhman
Posts : 74
Join date : 2020-11-27

PostSubject: Re: Forum challenge [!]   Tue Jul 11, 2023 3:50 am

Thanks, Eelco.