Posts : 2608 Join date : 2020-11-17 Location : Netherlands
Subject: Rodent NNUE development Sun Mar 27, 2022 5:31 pm
Months ago I promised Pawel a Rodent net running under Rebel 14. A first try failed, but I now have a good 450-million-position database (created with Rodent 4.022) running. On this page you can follow the development of the NNUE.
Keep in mind that Rodent 4.022 is rated ~3000 elo, while the Rebel search (based on Toga) is estimated at 2850-2900 max.
The learner is still at an early stage; it's currently at epoch-15 and should run until at least epoch-100.
First results
Code:
Rodent-NN vs Rodent 4.022
Epoch      Games   Time    Result
Epoch-3     1000   40/20   67.3%
Epoch-8     1500   40/20   69.4%
Epoch-15     500   40/20   76.4%
Results are already so good that it's better to leave this path and play other engines. I have chosen a 3218 elo pool, which is no doubt too high at this early stage of the learner, but I hope Rodent NN will reach this level as the learner makes progress.
Epoch-15 : 3168 elo
Epoch-30 : 3178 elo
Epoch-40 : 3187 elo
Epoch-50 : 3180 elo
Epoch-60 : 3174 elo
Epoch-75 : 3183 elo
Epoch-85 : 3162 elo
Epoch-95 : 3178 elo
Notes
1. The elo gain is in the range of 278-328, depending on how strong you estimate the Toga search to be.
2. For a relatively small dataset of 450 million positions this is a good result.
3. Not much happens after epoch-30, which is not unexpected.
4. Note that the error bar for 1000 games is about -15/+15 elo.
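The percentages and error bars above can be sanity-checked with the usual logistic Elo model. A minimal Python sketch (my own, not from any engine code): the score-to-Elo conversion plus a normal-approximation error bar, in which a higher draw rate shrinks the margin; with roughly 50% draws, 1000 games give about the quoted ±15 elo.

```python
import math

def elo_from_score(p):
    """Convert a match score fraction (0 < p < 1) to an Elo difference."""
    return -400.0 * math.log10(1.0 / p - 1.0)

def elo_margin(p, n, draw_ratio=0.0, z=1.96):
    """Approximate 95% error bar (in Elo) for a score p over n games.
    Draws lower the per-game variance: var = p(1-p) - draw_ratio/4."""
    var = p * (1.0 - p) - draw_ratio / 4.0
    se = math.sqrt(max(var, 1e-12) / n)                # standard error of the score
    slope = 400.0 / (math.log(10.0) * p * (1.0 - p))  # d(Elo)/d(score) at p
    return z * se * slope

print(round(elo_from_score(0.673)))                   # Epoch-3's 67.3% is about +125 Elo
print(round(elo_margin(0.5, 1000, draw_ratio=0.5)))   # ~15 Elo, matching note 4
```

The draw-ratio correction is why engine matches (draw-heavy) have tighter error bars than the worst-case binomial estimate suggests.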
Preparing for PART-II: increase the dataset to ~750 million positions and try again; most of the time this gives more elo. Then restart the learner and repeat the process.
Last edited by Admin on Tue Mar 29, 2022 8:39 am; edited 7 times in total
adminx likes this post
Chris Whittington
Posts : 1254 Join date : 2020-11-17 Location : France
Subject: Re: Rodent NNUE development Sun Mar 27, 2022 5:50 pm
Admin wrote:
Months ago I promised Pawel a Rodent net running under Rebel 14. A first try failed. But I have a good 450 million position base (created from Rodent 4.022) running now. On this page you can follow the development of the NNUE.
Keep in mind that Rodent 4.022 is ~3000 elo rated, the Rebel search based on Toga is estimated 2850-2900 max.
The learner is still in an early stage, it's currently at epoch-15 and should at least run till epoch-100.
First results
Code:
Rodent-NN vs Rodent 4.022
Epoch      Games   Time    Result
Epoch-3     1000   40/20   67.3%
Epoch-8     1500   40/20   69.4%
Epoch-15     500   40/20   76.4%
Results are so good already it's better to leave this path and play other engines. I have chosen for a 3218 elo pool which no doubt is too high in the early stage of the learner but I have hope Rodent NN will reach this level as the learner makes progress.
Three epochs is about one hour of training on Ed’s system. After three epochs, the engine has maybe 3100 Elo. World Chess Champion Elo is maybe 2850. Who could have imagined this even ten years ago? Less than an hour to become a Super-GM.
TheSelfImprover, adminx and matejst like this post
Admin Admin
Posts : 2608 Join date : 2020-11-17 Location : Netherlands
Subject: Re: Rodent NNUE development Mon Mar 28, 2022 2:27 am
Chris Whittington wrote:
Admin wrote:
Months ago I promised Pawel a Rodent net running under Rebel 14. A first try failed. But I have a good 450 million position base (created from Rodent 4.022) running now. On this page you can follow the development of the NNUE.
Keep in mind that Rodent 4.022 is ~3000 elo rated, the Rebel search based on Toga is estimated 2850-2900 max.
The learner is still in an early stage, it's currently at epoch-15 and should at least run till epoch-100.
First results
Code:
Rodent-NN vs Rodent 4.022
Epoch      Games   Time    Result
Epoch-3     1000   40/20   67.3%
Epoch-8     1500   40/20   69.4%
Epoch-15     500   40/20   76.4%
Results are so good already it's better to leave this path and play other engines. I have chosen for a 3218 elo pool which no doubt is too high in the early stage of the learner but I have hope Rodent NN will reach this level as the learner makes progress.
Three epochs is about one hour of training on Ed’s system. After three epochs, the engine has maybe 3100 Elo. World Chess Champion Elo is maybe 2850. Who could have imagined this even ten years ago? Less than an hour to become a Super-GM.
And a look into the kitchen at how we bake our engines, which become yours once they are cooked.
adminx and matejst like this post
Admin Admin
Posts : 2608 Join date : 2020-11-17 Location : Netherlands
Subject: Re: Rodent NNUE development Tue Mar 29, 2022 7:43 pm
PART TWO
Using 800 million Rodent positions vs previous run of 450 million.
This ends the Rodent NNUE development and testing.
Closing remarks:
1. The created net, based on the Rodent HCE, improves by ~300 elo over the Rodent HCE evaluation. Pawel is free to import the GPL NNUE inference source from Chris into Rodent.
2. The net needs fine-tuning, which will certainly give extra elo.
3. I would like to release Rodent-NN running under my source code; it will be ~3200 elo on CCRL and GRL, but I am not sure what to call it.
4. At the start of this thread I mailed Pawel to make him aware, but I haven't seen him reading the forum. I can imagine he has different priorities; 3 million refugees entering your country, plus the tensions of the war, are not small things. Today I mailed him again. We will see.
Thanks for providing this interesting engine. I played a test game on two pretty much identical notebooks at a slow time control. I really liked the game very much because of the early exchange sac; that came as a surprise and a challenge to my own evaluation, though it has to be said that, to my surprise, Crafty understood it just as well: both engines arrived at similar conclusions.

But the time management of Rodent can't be good here, and that would probably apply to Rebel and Tal just as well. I am not sure why (unless you introduced a new bug into Toga); maybe it is because Rodent in general produces higher swings in evaluation than Rebel or Tal? Settings were 4 threads and 5 GB hash. 20 minutes for 2... g6 and 20 minutes for 3... d5 has to be unreasonable time management in any case.

[Event "Lang 120min+10sek"]
[Site "Berlin"]
[Date "2022.04.08"]
[Round "?"]
[White "Crafty 25.3"]
[Black "Rodent NN 1.1"]
[Result "0-1"]
[ECO "D85"]
[PlyCount "136"]
[TimeControl "7200+10"]
Subject: Re: Rodent NNUE development Sat Apr 09, 2022 5:18 am
I have something that doesn't really belong in the Rodent thread, but I don't want to go back to Rebel 14.1 for this: I sometimes see Rebel going for opposite-colored bishop endgames with very high scores, but the opponent recognizes them as draws. Probably there are more cases where the endgame knowledge is a bit lacking.
For instance, in the Stockfish - Dragon Superfinal game position that Ernest posted, 38...Re4 is probably only a draw, but 38...Qf6 is, we think, possibly winning. Yet if I analyze a bit with Rebel 14.1 against Stockfish, it ends up in these opposite-colored bishop endgames that are unfortunately completely drawn.
best move: Kf6-g5  time: 35:31.406 min  n/s: 1,097,555  nodes: 2,339,331,482
There must be many games like this, for instance in the gauntlets from Graham Banks. How to teach Rebel the endgame knowledge? You could replace Rebel's score in high-quality games with something that gradually goes towards the game result, especially for draws where the opponent already knows it is a draw much earlier than Rebel does. But it is a gargantuan task, and not doable at all if you have billions of separate positions rather than games; if you have the complete games it is a bit easier. Well, I don't really know if it can be done that way. The alternative, I think, is to go back to total scratch like AlphaZero: let Rebel learn without Benjamin scores at all. But that needs so much computing power, and it is not Rebel anymore but AlphaZero; Rebel Zero, or whatever name you want to give it.
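The "gradually goes towards the game result" idea is, in sketch form, a blended training target: trust the engine score early in the game and the final result late. A hedged Python illustration; the function names and the linear decay schedule are my own assumptions, not Rebel's actual trainer:

```python
import math

def sigmoid_cp(score_cp, scale=400.0):
    """Map a centipawn score to an expected result in [0, 1]."""
    return 1.0 / (1.0 + math.exp(-score_cp / scale))

def training_target(score_cp, result, ply, game_length):
    """Blend engine score and game result: early positions trust the
    score, late positions trust the result (lambda decays linearly)."""
    lam = 1.0 - ply / max(game_length, 1)   # 1.0 at the start, 0.0 at the end
    return lam * sigmoid_cp(score_cp) + (1.0 - lam) * result

# A +3.00 score late in a game that ended in a draw (result 0.5)
# is pulled down towards 0.5, so the learner "takes notice":
print(round(training_target(300, 0.5, ply=70, game_length=80), 3))
```

This only works per game, which is why the task is much harder with billions of separate positions than with complete games.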
It is not something done on a rainy afternoon...
Admin Admin
Posts : 2608 Join date : 2020-11-17 Location : Netherlands
Subject: Re: Rodent NNUE development Sat Apr 09, 2022 8:10 am
The answer is that the learner has too few positions to learn from; the cure is to add 100+ million (or so) opposite-colored bishop endgames to the data. The most extreme example of too little data is the KBNK ending; try it, it doesn't know how to mate. Here the solution is to copy the ProDeo HCE code that handles it well and switch to HCE eval, also for those other special endings. A kind of finishing touch.
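That "finishing touch" of switching back to HCE for data-starved endings can be sketched as a material-signature dispatch. This is an illustration only; the helper names and the position representation are invented, not Rebel's internals:

```python
def material_signature(white_pieces, black_pieces):
    """Sorted piece letters per side, e.g. ['K','B','N'] vs ['K'] -> ('BKN', 'K')."""
    return ("".join(sorted(white_pieces)), "".join(sorted(black_pieces)))

# Endings where the net is data-starved but the HCE code is known-good.
HCE_ENDGAMES = {("BKN", "K"), ("K", "BKN")}   # KBNK for either side; extend as needed

def evaluate(pos, nnue_eval, hce_eval):
    """Use the NNUE by default; fall back to HCE for listed special endings."""
    if material_signature(pos["white"], pos["black"]) in HCE_ENDGAMES:
        return hce_eval(pos)
    return nnue_eval(pos)
```

Because the signature test runs before the net is probed, the switch costs essentially nothing outside the listed endings.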
TheSelfImprover and matejst like this post
matejst
Posts : 612 Join date : 2020-11-26
Subject: Re: Rodent NNUE development Sun Apr 10, 2022 9:30 pm
Ed, I noticed that although trained on the same number of games, CSTal had a better understanding of endings with opposite-colored bishops, and also knows the endgame with a bishop and the a/h-pawn of the wrong color, etc. I think it was probably more a problem in Benjamin's evaluation than anything else.
Eelco
Posts : 233 Join date : 2021-10-08
Subject: Re: Rodent NNUE development Mon Apr 11, 2022 5:41 am
Thanks for the explanation, Ed. It is fascinating. You are right about the KBNK endgame; it is hopeless, because the lone black king always manages to go to the corner where the bishop can't give check, White does not manage to drive him to the other corner, and then the 50 moves are up. It would sure be nice if Rebel had more endgame knowledge, if there is any way we could help by providing those (endgame) positions. For instance if you would want to make something for Pawel again, to stay with this thread?
Eelco
Posts : 233 Join date : 2021-10-08
Subject: Re: Rodent NNUE development Mon Apr 11, 2022 6:16 am
matejst wrote:
Ed, I noticed that although trained on the same number of games, CSTal had a better understanding of endings with opposite-colored bishops, and also knows the endgame with a bishop and the a/h-pawn of the wrong color, etc. I think it was probably more a problem in Benjamin's evaluation than anything else.
It probably differs between programs how much bonus they give for an extra pawn; even though technically (or heuristically, I'm not sure it is proven for all eight-man positions?) it is drawn, you can still win against a human opponent or a weaker program, so programs can give it a small positive score. But almost 3½ pawns starts to become counterproductive. Ed, I think, says it is an artifact, because the endgame positions are easily underrepresented in the learning process.
Admin Admin
Posts : 2608 Join date : 2020-11-17 Location : Netherlands
Subject: Re: Rodent NNUE development Mon Apr 11, 2022 10:42 am
Both of you are right to a certain extent. Here is something I learned over the years about HCE; it's valid for search as well.
The evaluation (of a position) is the heart of a chess program; in the end the final move is decided by the evaluation, provided the search part is reasonable. The eval consists of many ingredients. The 3 most important (and dominant) are: 1) mobility, 2) passed pawns and 3) king safety, with 25-50-100 or more others, such as pawn evaluation, PST, bishop pair, center control, outposts and endgame stuff, as the next most important ones.
For all these ingredients you have to invent ideas (rules), write the code for them, and apply bonus and penalty values. First problem: no matter how hard you try, for complex eval ingredients such as king safety, mobility and passed pawns you can never write perfect code, but... if in 90-95% (arguable!) of the cases your code is right, you have accomplished something good.
Second problem: the ingredients you program interact with each other and sometimes clash. For example, you may have good mobility and good king safety, but in practice, in many positions one is more important than the other, which may lead to bad moves after all. For a typical middlegame position all of the above-mentioned eval ingredients interact with each other and influence the final evaluation score (the sum of all ingredients), and as a result the reliability of the total evaluation drops considerably. To deal with that problem we invented the concept of tuning: playing thousands of games, teaching all the ingredients to get along with each other, so to say, and finding the right balance.
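The "sum of all ingredients" with tunable weights can be sketched like this; the ingredient names follow the post, while the weight values are placeholders that a tuner would adjust over thousands of games:

```python
# Each ingredient returns a centipawn score from White's point of view;
# tuning searches for the weight vector that maximizes game results.
DEFAULT_WEIGHTS = {"mobility": 1.0, "passed_pawns": 1.0, "king_safety": 1.0,
                   "pst": 1.0, "bishop_pair": 1.0}

def evaluate_hce(position, ingredients, weights=DEFAULT_WEIGHTS):
    """Final HCE score = weighted sum of all ingredient scores.
    `ingredients` maps each name to a scoring function of the position."""
    return sum(weights[name] * fn(position) for name, fn in ingredients.items())
```

The clash problem described above lives entirely in the weights: tuning cannot fix a wrong rule, it can only balance how loudly each rule speaks.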
It's what the NNUE learner does too, only in a different way and a lot better, and much will depend on how good your HCE eval is. In case you have some not-so-well-functioning ingredients in your HCE eval, which in my case is certainly true for unequal bishop endings, you may still expect a good improvement from the learner, PROVIDED the learner is given enough positions to learn from.
The learner not only looks at the scores of moves but also at the game result, and if too-high scores (of say +3.xx) or too-low scores (of say -3.xx) eventually end in a draw, the learner will take notice.
adminx and matejst like this post
Chris Whittington
Posts : 1254 Join date : 2020-11-17 Location : France
Admin wrote:
Both of you are right to a certain extent. Here is something I learned over the years about HCE; it's valid for search as well.
The evaluation (of a position) is the heart of a chess program, in the end the final move is decided by the evaluation provided the search part is reasonable. The eval consists of many ingredients, the 3 most important (and dominant) are: 1) mobility, 2 passed pawns and 3) king safety and 25-50-100 or more others such as pawn evaluation, PST, bishop pair, center control, outposts, endgame stuff as the next important ones.
For all these ingredients you have to invent ideas (rules) and write the code for it, apply bonuses and penalty values. First problem, no matter how hard you try for complex eval ingredients such as king safety, mobility, passed pawn you never can write perfect code, but..... if in 90-95% (arguable!) of the cases your code is right you have accomplished something good.
Second problem, ingredients you program interact with each other and sometimes may clash, for example, you may have a good mobility and good king safety but in practice in many positions the one is more important than the other and may lead to bad moves after all. For a typical middle game position all of above mentioned eval ingredients interact with each other and will influence the final evaluation score (the sum of all ingredients) and as a result the reliability of the total evaluation will drop considerable. And to deal with that problem we invented the concept of tuning, playing thousands of games learning all the ingredients to get along with each other, so to say, and find the right balance.
It's what the NNUE learner is doing also, only in a different way and a lot better and much will depend how good your HCE eval is. In the case you have not so well functioning ingredients in your HCE eval, which in my case is certainly true for unequal bishop endings you still may expect a good improvement from the learner PROVIDED the learner is given enough positions to learn from.
The learner not only looks at the scores of moves but also at the game result and if too high scores (of say +3.xx) or to low scores (of say -3.xx) eventually end in a draw the learner will take notice.
If you train the NNUE on game results only (as many people are doing) then it will learn all these things no matter (well, almost no matter) what the original HCE says. This was the zero concept, proven by AlphaZero: it learnt from scratch, with no evaluation, playing random games. Presumably (I’ve forgotten) Leela started off like that.
D. Kappe wrote about experimenting with evaluation vs result learning, but I have the impression that evaluation learning helps keeping the characteristics, the style of play of an engine.
Then -- I really don't see a notable qualitative difference between Rebel and, let's say, SF in the opening at the same depths, but the differences start to become more visible in the later phases of the game. The initial moves are, logically, more heavily represented in the training data. I believe there has to be an intelligent, targeted selection of the data for NN training/learning, and a good way to proceed would be to see what the weaknesses of the engine are, then add adequate positions that will help improve that part of its game.
Sorry, I am tired and my English is awful right now.
Admin Admin
Posts : 2608 Join date : 2020-11-17 Location : Netherlands
Subject: Re: Rodent NNUE development Tue Apr 12, 2022 10:45 am
matejst wrote:
D. Kappe wrote about experimenting with evaluation vs result learning, but I have the impression that evaluation learning helps keeping the characteristics, the style of play of an engine.
Then -- I really don't see a notable qualitative difference between Rebel and, let's say, SF in the opening at the same depths, but the differences start to become more visible in the later phases of the game. The initial moves are, logically, more heavily represented in the training data. I believe there has to be an intelligent, targeted selection of the data for NN training/learning, and a good way to proceed would be to see what the weaknesses of the engine are, then add adequate positions that will help improve that part of its game.
Sorry, I am tired and my English is awful right now.
Fixing eval holes, such as unequal bishop endings, can be done in the following way: create a set of at least 5-10 million unequal bishop ending positions and play games from them; you might end up with 200-300 million usable positions for the learner, and volume is THE measuring rod to weed out most of the misevaluations.
Probably (emphasis added) even better: play those 5-10 million games with an engine that understands unequal bishops better and feed those into the learner. I say probably because mixing engines is a dangerous concept, but the whole neural net business is a matter of trial and error anyway.
The only problem: how to create 5-10 million unequal bishop ending start positions.
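One rough way to mass-produce such start positions is random placement plus legality filters: kings not adjacent, bishops forced onto opposite colors, no pawns on the back ranks. A self-contained Python sketch; a real generator would also verify that the side to move is not in check, that pawn structures are sensible, and so on:

```python
import random

def square_color(sq):
    """Light/dark color of square 0..63 (a1 = 0)."""
    return (sq // 8 + sq % 8) % 2

def kings_adjacent(k1, k2):
    return max(abs(k1 // 8 - k2 // 8), abs(k1 % 8 - k2 % 8)) <= 1

def random_ocb_position(rng, pawns_per_side=3):
    """Place two kings (not adjacent), two bishops on OPPOSITE colors,
    and pawns (never on the 1st/8th rank) on distinct squares."""
    while True:
        squares = rng.sample(range(64), 4 + 2 * pawns_per_side)
        wk, bk, wb, bb = squares[:4]
        pawns = squares[4:]
        if kings_adjacent(wk, bk):
            continue
        if square_color(wb) == square_color(bb):   # must be unequal bishops
            continue
        if any(p // 8 in (0, 7) for p in pawns):   # no pawns on back ranks
            continue
        return {"wk": wk, "bk": bk, "wb": wb, "bb": bb, "pawns": pawns}

pos = random_ocb_position(random.Random(42))
```

A loop over this generator reaches millions of positions quickly; the expensive part is playing the games from them, not creating them.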
matejst likes this post
matejst
Posts : 612 Join date : 2020-11-26
Subject: Re: Rodent NNUE development Tue Apr 12, 2022 11:21 am
In my previous post, I started to write that the filtering of data could be a mess, but I was really tired, and in such cases I simply forget English words.
Would it be a good idea to add, just like Connor McMonigle did, positions from 6-man TBs? [Connor aimed at something else, though.]
Jonathan Kreuzer also experimented with special endgame nets for SlowChess, and after John Stanback's remark in a CCC thread, I ran a short match between Wasp and SC where, just like John said, SC destroyed Wasp in endgames. What interested me was how Wasp fared in the openings and middlegames -- and it was OK, but the longer the game went, the greater SC's advantage became. Perhaps having several smaller nets could be more productive than having one huge net. Of course, I am not the one who knows how to do it.