Crafty NN development

Subject: Crafty NN development Mon Apr 18, 2022 12:44 pm

Crafty NN progress.

Crafty NN vs Crafty 25.6 (Bob's last one)

Code:
Epoch  Games  Result  Elo  Data cycle
    4   1000   51.5%  +10        0.99
    9   1000   51.5%  +10        2.24
   21   1000   50.2%   +1        5.29
   30    500   49.0%   -7        7.55
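The Elo column follows from the Result column (Crafty NN's score) under the usual logistic Elo model, Elo = 400 * log10(score / (1 - score)). Here is a small Python sketch that reproduces it, assuming that is how the column was derived (the post does not name the rating tool):

```python
import math

def elo_from_score(score: float) -> float:
    """Elo difference implied by an expected score in (0, 1), logistic model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return 400.0 * math.log10(score / (1.0 - score))

# Result column of the table above (Crafty NN vs Crafty 25.6)
for epoch, score in [(4, 0.515), (9, 0.515), (21, 0.502), (30, 0.490)]:
    print(f"epoch {epoch:2d}: {elo_from_score(score):+.0f} Elo")
```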

I created a small neural net from 400 million positions generated with Crafty 25.6. The intention was a 450M-position net (as with Rodent), but Windows 10, in its incomprehensible wisdom, decided to reboot at 4:11 AM.

Code:
CCRL 40/15      Elo
Crafty 25.6    2904
Toga IV        2903

Curiously, the real Crafty and Toga IV (whose search Crafty NN runs under) are rated equally on CCRL, but in the meantime I have improved the Toga IV search, so an exact comparison is not possible.

Nevertheless, as the learner progresses (increasing epoch numbers), Crafty NN is expected to become stronger and stronger. A major problem is always when to stop the learner; it's a bit of an unexplored area. With my limited knowledge so far, my (probably premature) conclusion is that the learner needs at least 12 data cycles. A data cycle is one complete pass of the learner through all 400 million positions, after which it starts again. For this small 400M net I will probably let it run to 20 data cycles.
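The epoch and data-cycle columns in the table above are consistent with the learner consuming a roughly fixed number of positions per epoch, about 100 million. That figure is my inference from the table, not a stated trainer setting. Under that assumption, a short sketch converts target data cycles into epochs:

```python
DATASET_POSITIONS = 400_000_000  # size of the Crafty training set (from the post)

# (epoch, data cycle) pairs taken from the table above
observed = [(4, 0.99), (9, 2.24), (21, 5.29), (30, 7.55)]

def positions_per_epoch(epoch: int, cycles: float) -> float:
    """Positions consumed per epoch as implied by one table row."""
    return cycles * DATASET_POSITIONS / epoch

def epochs_for_cycles(target_cycles: float, per_epoch: float) -> float:
    """Epochs needed to pass over the whole dataset target_cycles times."""
    return target_cycles * DATASET_POSITIONS / per_epoch

# Assumption: the per-epoch volume is constant, so use the latest row.
per_epoch = positions_per_epoch(*observed[-1])
for target in (12, 20):
    print(f"{target} data cycles ~ epoch {epochs_for_cycles(target, per_epoch):.0f}")
```

With these numbers, 12 data cycles land near epoch 48 and 20 data cycles near epoch 79.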

Enjoy.

Subject: Re: Crafty NN development Mon Apr 18, 2022 8:00 pm

It's very unfortunate, but I have to stop here at epoch 30. The net is not improving and will not improve. It is unstable due to too much noise (bad evaluations), and the learner can't learn from that much noise. See also the graphs for further explanation.

Graph-1 : bad Crafty graph
Graph-2 : bad Rebel graph
Graph-3 : good graph

A graph is good when the red and blue lines gradually move from the top to the bottom.
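The graphs themselves are not reproduced here. Assume the red and blue lines are the training and validation loss per epoch (the post does not say so; this is my assumption). The "good graph" criterion can then be turned into a rough heuristic: both curves should trend downward and rarely tick back up. The threshold and the sample numbers below are invented for illustration.

```python
from statistics import mean

def slope(values):
    """Least-squares slope of values against their index (the epoch)."""
    xs = range(len(values))
    mx, my = mean(xs), mean(values)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, values))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def uptick_ratio(values):
    """Fraction of epoch-to-epoch steps where the loss went up."""
    return sum(b > a for a, b in zip(values, values[1:])) / (len(values) - 1)

def looks_healthy(train, val, max_upticks=0.25):
    """Heuristic: both curves trend down and only occasionally go back up."""
    return all(slope(c) < 0 and uptick_ratio(c) <= max_upticks for c in (train, val))

# Hypothetical loss values, not taken from the Crafty or Rebel runs
good_train = [0.30, 0.27, 0.25, 0.24, 0.23, 0.225]
good_val = [0.31, 0.29, 0.27, 0.265, 0.26, 0.258]
noisy_val = [0.31, 0.30, 0.32, 0.29, 0.33, 0.31]
print(looks_healthy(good_train, good_val), looks_healthy(good_train, noisy_val))
```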

It shows once again that a good hand-crafted evaluation (HCE) matters for NNUE.

Back to Rebel.

Posts : 131 Join date : 2020-11-20

Your results aren't easy for a layman to grasp. Am I correct to conclude that a 300-point improvement over 25.6 seems unlikely right now?

LOL, in the meantime you posted a very easy-to-understand explanation; sorry, my post is redundant now.

Could you provide your Crafty 25.6 executable, by the way?

Subject: Re: Crafty NN development Mon Apr 18, 2022 9:22 pm

Sorry Peter, no executable.

Code:
info multipv 1 depth 15 seldepth 43 score cp 1 time 2312 nodes 1173050 pv d2d4 d7d5 c2c3 a7a6 g1f3 c7c6 a2a3 e7e6 e1d2 g8f6 g2g3 c6c5 f1g2 c8d7

I cannot and will not be responsible for that.

A Crafty that plays the 6 million games (which is what it takes to create a small 400M-position net) with 2, 4 or 8 times more time per move will likely do a lot better.
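Some back-of-the-envelope arithmetic shows what the 2-4-8 factor costs. It uses the 6 million games and 400 million positions from this post, plus the one week on one PC with 40 threads that Ed mentions later in the thread. The sketch assumes generation time scales linearly with the time spent per move.

```python
GAMES = 6_000_000          # games needed for a small 400M net (from the post)
POSITIONS = 400_000_000    # positions in the resulting EPD file (from the post)
THREADS = 40               # threads on the generating PC (stated later in the thread)
BUDGET_S = 7 * 24 * 3600   # one week of wall-clock time (stated later in the thread)

positions_per_game = POSITIONS / GAMES
seconds_per_game = BUDGET_S * THREADS / GAMES  # per game, on one thread

print(f"{positions_per_game:.1f} positions per game, {seconds_per_game:.2f} s per game")

# Assumption: generation time scales linearly with the time spent per move.
for factor in (2, 4, 8):
    games_per_week = GAMES / factor
    print(f"x{factor} time: {games_per_week / 1e6:.2f}M games "
          f"(~{games_per_week * positions_per_game / 1e6:.0f}M positions) per week, "
          f"or {factor} weeks for the full 6M games")
```

Each game yields roughly 67 positions and gets about 4 seconds of one thread. So the factor trades directly against volume: 8 times more time per move leaves only about 50M positions per week.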

Subject: Re: Crafty NN development Mon Apr 18, 2022 9:26 pm

BTW, 25.6 is available at:

https://www.chess2u.com/t13508-crafty-25-6-is-here

Posts : 131 Join date : 2020-11-20

I know Crafty really well, and here it never plays things like e1d2 in its PVs. Have you seen this with the initial 25.2 too? It really looks buggy to me.
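Ed's line above is standard UCI `info` output: depth, selective depth, score in centipawns, time in milliseconds, nodes, and the principal variation. A minimal parser sketch makes the fields explicit and pins e1d2 as the ninth PV move. It handles only the fields shown there and assumes `pv` comes last, as it does in this line.

```python
def parse_uci_info(line: str) -> dict:
    """Parse the common fields of a UCI 'info' line (pv assumed to be last)."""
    tokens = line.split()
    if not tokens or tokens[0] != "info":
        raise ValueError("not a UCI info line")
    out, i = {}, 1
    while i < len(tokens):
        key = tokens[i]
        if key == "pv":
            out["pv"] = tokens[i + 1:]  # the rest of the line is the PV
            break
        if key == "score":
            out["score"] = (tokens[i + 1], int(tokens[i + 2]))  # e.g. ("cp", 1)
            i += 3
        elif key in ("multipv", "depth", "seldepth", "time", "nodes", "nps", "hashfull"):
            out[key] = int(tokens[i + 1])
            i += 2
        else:
            i += 1  # skip tokens this sketch does not handle ("string", bounds, ...)
    return out

line = ("info multipv 1 depth 15 seldepth 43 score cp 1 time 2312 nodes 1173050 "
        "pv d2d4 d7d5 c2c3 a7a6 g1f3 c7c6 a2a3 e7e6 e1d2 g8f6 g2g3 c6c5 f1g2 c8d7")
info = parse_uci_info(line)
print(info["depth"], info["score"], info["pv"][8])
print("nps:", info["nodes"] * 1000 // info["time"])  # time is in milliseconds
```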

Posts : 131 Join date : 2020-11-20

I have given Crafty 25.6 a try with the gcc build.
It doesn’t produce unwanted logfiles and plays as strongly as can be expected.
It looks as if it is not suitable for automated optimization by an NNUE process.
If I understand you correctly, this is because its HCE isn’t good enough.
But is this really an explanation? And if it were, how is it logical? Let’s assume that Crafty assigns evaluations to positions in a more or less linear way, give or take, which is probably just the truth. Its eval may be right or wrong, but it works well enough to produce a strong chess entity.
This NNUE process is some kind of optimisation, I assume. If two eval functions work in a similar way, wouldn’t it be more logical for the weaker one to profit more?

Subject: Re: Crafty NN development Fri Apr 22, 2022 12:29 am

It's complicated.

And it would be best if Bob would comment since he knows best.

However I will give it my best shot, here goes.

Volume is crucial: the learner is hungry and needs millions of positions. Rodent and Crafty both needed 6 million games to produce a relatively small EPD database of 400 million positions. How do you play 6 million games? With a low fixed depth, and even then it takes a full week on one PC using 40 threads. For Rebel, Rodent and Crafty I used depth=6. Now let's look at the quality of depth=6 play, with 100-game matches against the new SF15.

Code:
No. Name        Win  Draw  Loss  Unf.  Score  Games      %
-----------------------------------------------------------
  1 Rebel-14.2  +34   =40   -26    *0   54.0    100  54.0%
  2 SF15        +26   =40   -34    *0   46.0    100  46.0%

No. Name        Win  Draw  Loss  Unf.  Score  Games      %
-----------------------------------------------------------
  1 Rodent-NN   +24   =52   -24    *0   50.0    100  50.0%
  2 SF15        +24   =52   -24    *0   50.0    100  50.0%

No. Name        Win  Draw  Loss  Unf.  Score  Games      %
-----------------------------------------------------------
  1 SF15        +64   =17   -19    *0   72.5    100  72.5%
  2 Crafty-25.3 +19   =17   -64    *0   27.5    100  27.5%

No. Name        Win  Draw  Loss  Unf.  Score  Games      %
-----------------------------------------------------------
  1 Gandalf-7   +38   =24   -38    *0   50.0    100  50.0%
  2 SF15        +38   =24   -38    *0   50.0    100  50.0%

No. Name        Win  Draw  Loss  Unf.  Score  Games      %
-----------------------------------------------------------
  1 SF15        +45   =16   -39    *0   53.0    100  53.0%
  2 Fruit-2.1   +39   =16   -45    *0   47.0    100  47.0%
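With only 100 games per match the error bars are wide. The sketch below uses the normal approximation to the per-game score, a common but simplified method and not necessarily what produced the tables. It converts each win/draw/loss record into a score, an Elo difference and an approximate 95% interval.

```python
import math

def elo(score: float) -> float:
    """Logistic Elo difference for an expected score in (0, 1)."""
    return 400.0 * math.log10(score / (1.0 - score))

def match_summary(wins: int, draws: int, losses: int, z: float = 1.96):
    """Score, Elo and an approximate 95% Elo interval from a W/D/L record."""
    n = wins + draws + losses
    s = (wins + 0.5 * draws) / n
    # per-game variance of the result (1, 0.5 or 0) around the mean score
    var = (wins * (1 - s) ** 2 + draws * (0.5 - s) ** 2 + losses * s ** 2) / n
    half = z * math.sqrt(var / n)
    lo, hi = max(s - half, 1e-9), min(s + half, 1 - 1e-9)
    return s, elo(s), elo(lo), elo(hi)

# Depth=6 matches above, from the first-listed engine's perspective
matches = {
    "Rebel-14.2 vs SF15": (34, 40, 26),
    "Rodent-NN vs SF15": (24, 52, 24),
    "SF15 vs Crafty-25.3": (64, 17, 19),
    "Gandalf-7 vs SF15": (38, 24, 38),
    "SF15 vs Fruit-2.1": (45, 16, 39),
}
for name, wdl in matches.items():
    s, e, lo, hi = match_summary(*wdl)
    print(f"{name:20s} {s:6.1%} {e:+5.0f} Elo  (95%: {lo:+.0f} .. {hi:+.0f})")
```

Under this approximation, only the Crafty-25.3 result clearly excludes an even match. The others are within noise.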

I could go on. There is also Crafty's extremely high NPS: it is as if Crafty puts all its money on search and thereby disqualifies itself for NNUE learning at low depths. As I said previously, Crafty will likely do better at a higher depth such as 7, 8, 9 or 10, but then creating the volume becomes an almost impossible job.
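Simple arithmetic shows why higher depths make the volume "almost impossible". If every extra ply multiplies the search effort by an effective branching factor (EBF), generation time grows exponentially from the one week needed at depth=6. The EBF values below are hypothetical placeholders, not measurements of Crafty.

```python
BASE_DEPTH = 6    # depth used for Rebel, Rodent and Crafty (from the post)
BASE_WEEKS = 1.0  # about one week for 6 million games at depth=6 (from the post)

def weeks_at_depth(depth: int, ebf: float) -> float:
    """Rough generation time for 6M games if each extra ply multiplies work by ebf.

    ebf (effective branching factor) is a hypothetical input, not a measured
    Crafty value; real engines vary with the position and pruning settings.
    """
    return BASE_WEEKS * ebf ** (depth - BASE_DEPTH)

for ebf in (1.8, 2.5):  # illustrative values only
    row = ", ".join(f"depth {d}: {weeks_at_depth(d, ebf):.1f} wk" for d in range(7, 11))
    print(f"EBF {ebf}: {row}")
```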

Hope this helps a bit.

Posts : 131 Join date : 2020-11-20

First – thank you very much for your clear explanation, Ed.
An obvious question: do I understand correctly that, although Crafty reaches depth=7 faster than most other engines, it is still too slow to make depth=7 viable for your NNUE method, given your available resources?
I have another question, please bear with me a little longer.
I have since had a closer look at Crafty 25.6’s performance. It looks pretty good and impressive to me, and like a clear improvement over 25.3.
Let me first show a game from today that I consider very pretty, against the strongest opponent available.
[pgn]
[Event "Lang 120min+10sek"]
[Site "Berlin"]
[Date "2022.04.22"]
[Round "?"]
[White "Stockfish 220422"]
[Black "Crafty 25.6"]
[Result "1-0"]
[ECO "E90"]
[PlyCount "99"]
[TimeControl "7200+10"]

{5120MB, LAPTOP-NCDN8BTK} 1. d4 {[%eval 37,42] [%emt 0:07:04]} Nf6 {[%emt 0:00:
06]} 2. Nf3 {[%eval 31,39] [%emt 0:01:30]} g6 {[%emt 0:00:06] (e6)} 3. c4 {
[%eval 28,36] [%emt 0:01:25]} Bg7 {[%emt 0:00:06]} 4. Nc3 {[%eval 44,39] [%emt
0:04:55]} O-O {[%emt 0:00:05] (d5)} 5. e4 {[%eval 59,34] [%emt 0:01:55]} d6 {
[%emt 0:00:06]} 6. h3 {[%eval 62,37] [%emt 0:02:04]} e5 {[%emt 0:00:06]} 7. d5
{[%eval 46,41] [%emt 0:03:32]} a5 {[%emt 0:00:06] (Sh5)} 8. Be3 {[%eval 82,42]
[%emt 0:08:42]} Na6 {[%emt 0:00:07] (Sbd7)} 9. g4 {[%eval 76,37] [%emt 0:02:59]
} Nc5 {[%emt 0:00:06]} 10. Nd2 {[%eval 62,37] [%emt 0:01:51]} c6 {[%emt 0:00:
06] (a4)} 11. a4 {[%eval 76,37] [%emt 0:02:40]} Nfd7 {[%emt 0:05:29] (Db6)} 12.
h4 {[%eval 82,34] [%emt 0:01:38]} Qb6 {[%emt 0:04:24] (Sa6)} 13. Rb1 {[%eval
107,32] [%emt 0:01:14]} Nb8 {[%emt 0:01:24]} 14. h5 {[%eval 102,36] [%emt 0:01:
28]} Nba6 {[%emt 0:03:19]} 15. Be2 {[%eval 112,37] [%emt 0:00:01]} Bd7 {
[%emt 0:02:46]} 16. Kf1 {[%eval 100,37] [%emt 0:02:20]} Rac8 {[%emt 0:05:45]
(Dd8)} 17. g5 {[%eval 101,36] [%emt 0:01:58]} Qd8 {[%emt 0:17:27]} 18. Nf3 {
[%eval 136,40] [%emt 0:01:16]} cxd5 {[%emt 0:03:44] (De8)} 19. cxd5 {[%eval
175,31] [%emt 0:02:37]} Nb4 {[%emt 0:00:23] (Le8)} 20. Nh4 {[%eval 163,49]
[%emt 0:02:50]} Bh8 {[%emt 0:02:17] (De7)} 21. Ra1 {[%eval 162,47] [%emt 0:02:
00]} Bg7 {[%emt 0:06:35] (De7)} 22. Rh2 {[%eval 178,38] [%emt 0:02:29]} Rc7 {
[%emt 0:02:19] (De7)} 23. Kg2 {[%eval 158,38] [%emt 0:04:24]} Bh8 {[%emt 0:03:
11] (De7)} 24. Ra3 {[%eval 182,40] [%emt 0:04:16]} b6 {[%emt 0:03:48] (De7)}
25. Qb1 {[%eval 157,51] [%emt 0:03:27]} Qe7 {[%emt 0:04:00]} 26. Nb5 {[%eval
182,41] [%emt 0:01:46]} Rcc8 {[%emt 0:01:10] (Tb7)} 27. Kh1 {[%eval 187,37]
[%emt 0:01:55]} Bg7 {[%emt 0:00:14]} 28. Rg2 {[%eval 208,36] [%emt 0:01:43]}
Rb8 {[%emt 0:02:10] (Kh8)} 29. Nf3 {[%eval 209,30] [%emt 0:01:07]} Rfc8 {
[%emt 0:03:31]} 30. Nd2 {[%eval 211,35] [%emt 0:01:14]} Bf8 {[%emt 0:02:25]}
31. f3 {[%eval 218,37] [%emt 0:00:01]} gxh5 {[%emt 0:01:10] (Sba6)} 32. Qd1 {
[%eval 251,32] [%emt 0:02:22]} Kh8 {[%emt 0:02:11] (Lg7)} 33. f4 {[%eval 347,
30] [%emt 0:01:15]} exf4 {[%emt 0:00:49]} 34. Bxf4 {[%eval 360,34] [%emt 0:00:
54]} Nb7 {[%emt 0:00:07]} 35. Re3 {[%eval 352,35] [%emt 0:01:25]} Nc2 {[%emt 0:
00:49]} 36. Rc3 {[%eval 403,36] [%emt 0:01:18]} Nb4 {[%emt 0:00:30]} 37. Bf1 {
[%eval 440,39] [%emt 0:10:06]} Bg4 {[%emt 0:00:49]} 38. Rxg4 {[%eval 418,29]
[%emt 0:00:38]} hxg4 {[%emt 0:00:06]} 39. Qxg4 {[%eval 484,30] [%emt 0:01:36]}
Rxc3 {[%emt 0:00:07]} 40. bxc3 {[%eval 474,32] [%emt 0:01:17]} Na6 {[%emt 0:03:
48] (Lg7)} 41. Nd4 {[%eval 573,32] [%emt 0:01:32]} Nbc5 {[%emt 0:00:09] (b5)}
42. Nf5 {[%eval 686,28] [%emt 0:01:16]} Qa7 {[%emt 0:01:50] (Dd8)} 43. Be3 {
[%eval 868,28] [%emt 0:02:20]} f6 {[%emt 0:01:30]} 44. gxf6 {[%eval 999,28]
[%emt 0:01:05]} Qf7 {[%emt 0:00:08]} 45. Bd4 {[%eval 1047,28] [%emt 0:01:23]}
h6 {[%emt 0:00:41] (Te8)} 46. Be2 {[%eval 1148,28] [%emt 0:01:48]} Nd7 {
[%emt 0:01:39]} 47. Qh4 {[%eval 1176,27] [%emt 0:00:01]} Qg6 {[%emt 0:00:38]}
48. Bxa6 {[%eval 1212,27] [%emt 0:00:40]} Qg5 {[%emt 0:00:42] (Kh7)} 49. Qxg5 {
[%eval 1272,29] [%emt 0:01:34]} hxg5 {[%emt 0:00:26]} 50. Bb5 {[%eval 1354,28]
[%emt 0:00:55]} 1-0 [/pgn]

To human eyes (which are right for once) it is easy to identify 31...gxh5 as suicidal here. Despite Stockfish’s huge evals before that, I personally think White has nothing much if Black avoids this move.
Crafty had been toying with this move for several moves before, but each time its search resolved it as leading to a disadvantage. A different evaluation might have avoided this move, but whether an engine rejects it by eval or by search doesn’t matter much in an actual game, as long as it avoids it.
Now to my second question: shouldn’t this be an even bigger problem for Stockfish, with its completely insane search depth and pruning? As a layman I see two alternatives: either it is so incredibly fast that you could run a similar process (assuming it weren’t already NNUE) at higher depths, or it is the incredible hardware resources behind it that allowed it to become NNUE.
I think it would indeed be best if Bob participated in this discussion, which he probably isn’t aware of right now. For tactical reasons I would suggest you contact him (as you are way more interesting than me :) ).
All the best.
Peter

Posts : 233 Join date : 2021-10-08

(Just some side remarks I had.) Not all engines obey the fixed-depth rule very well; I think Stockfish at least did not, in the past, so that is something to consider. It is actually a miracle that this works for Rebel, under Toga, or even Growing Fruit with increased LMP and LMR, I think. Maybe the pruning is a little less aggressive in those programs, but if you look at Stockfish, a 6-ply search, or even double that, is little more than a brushed-up capture search, almost a quiescence search. It should be checked with some sample games which moves get chosen then. Late Move Pruning and Late Move Reductions will reduce, or even outright prune, all non-captures except the hash move. That happens in All nodes, which for the engine (Rebel or Crafty) are positions where it has the move and no move reached beta, so they are not in the PV. In PV nodes the hash move is tried first and never reduced. In Cut nodes, too, the first move from the TT is not reduced, but the other non-captures escape reduction only if that first move fails, or in a PV node. That is practically the only difference from the quiescence search: the TT move is tried too. I am exaggerating a bit :)
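A toy move-loop sketch of the LMP/LMR behaviour described above follows. It only decides, per move, between a full-depth search, a reduced search and outright pruning: TT move first, captures next, late quiet moves reduced or pruned outside PV nodes. All thresholds and the function name are hypothetical; real engines (Stockfish, Rebel, Crafty) use their own, more elaborate formulas.

```python
from dataclasses import dataclass

@dataclass
class Move:
    name: str
    is_capture: bool = False

def plan_searches(moves, depth, node_type, tt_move=None,
                  lmp_max_depth=3, lmp_count=4, lmr_after=3):
    """Per move: ("name", "depth N") or ("name", "pruned").

    node_type is "PV", "CUT" or "ALL". Thresholds are hypothetical.
    Depth 0 means the move only gets a quiescence search.
    """
    # Order: TT (hash) move first, then captures, then quiet moves.
    ordered = sorted(moves, key=lambda m: (m.name != tt_move, not m.is_capture))
    plan = []
    for index, move in enumerate(ordered):
        quiet = not move.is_capture and move.name != tt_move
        late = node_type != "PV"
        if quiet and late and depth <= lmp_max_depth and index >= lmp_count:
            plan.append((move.name, "pruned"))  # late move pruning
        elif quiet and late and index >= lmr_after:
            reduction = 2 if node_type == "ALL" else 1  # late move reduction
            plan.append((move.name, f"depth {max(0, depth - 1 - reduction)}"))
        else:
            plan.append((move.name, f"depth {depth - 1}"))
    return plan

moves = [Move("e2e4"), Move("d4e5", True), Move("g1f3"),
         Move("b1c3"), Move("h2h3"), Move("a2a3")]
for name, decision in plan_searches(moves, depth=3, node_type="ALL", tt_move="g1f3"):
    print(name, decision)
```

At low depth in an All node, the late quiet moves are either pruned or dropped straight to depth 0. That is the "almost a quiescence search" effect Eelco points at.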

As long as a non-tactical move gets the chance of a reduced search, that is enough for the learner to get a positional value, but not if the move is pruned outright. And if a capture is tried first, the evaluation will come from a quiescence search only. This is what produces the non-positional play in Stockfish and other programs that use heavy LMP and LMR: those semi-quiescence evaluations can be higher than the purely positional, LMR-shortened search of a quiet move that is objectively the best in the position. (Or they can be lower than the eval of that quiet move, but the quiet move is never evaluated, so the parent node gets too high an evaluation. And the engine will not stand pat on the positional evaluation alone, as it does in the quiescence search. There, standing pat is a safeguard against a too-low eval: the positional eval overrules if it is enough to cross alpha.)
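Here is a minimal sketch of stand pat in a quiescence search, the safeguard mentioned above. It runs on a hypothetical toy tree rather than a real board. Because the side to move may decline all captures and keep its static evaluation, a losing capture cannot drag the score below that evaluation. This is the textbook fail-hard negamax form, not any particular engine's code.

```python
from dataclasses import dataclass, field

INF = 10**9

@dataclass
class Node:
    static_eval: int                              # eval for the side to move, in cp
    captures: list = field(default_factory=list)  # child nodes reached by captures

def qsearch(node: Node, alpha: int, beta: int) -> int:
    """Fail-hard negamax quiescence search with stand pat."""
    stand_pat = node.static_eval
    if stand_pat >= beta:
        return beta                # the quiet position is already good enough
    alpha = max(alpha, stand_pat)  # the side to move may decline every capture
    for child in node.captures:
        score = -qsearch(child, -beta, -alpha)
        if score >= beta:
            return beta
        alpha = max(alpha, score)
    return alpha

losing_capture = Node(30, [Node(200)])    # after the capture the opponent is +200
winning_capture = Node(30, [Node(-150)])  # after the capture the opponent is -150
print(qsearch(losing_capture, -INF, INF), qsearch(winning_capture, -INF, INF))
```

The losing capture leaves the score at the stand-pat value of 30, while the winning capture raises it to 150.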

Subject: Re: Crafty NN development Fri Apr 22, 2022 6:28 pm

As I said, it is complicated. Let me say a few words about choosing the right depth when playing games to create volume for developing a neural net.

In the past Gary Linscott (SF) said that for Stockfish the best depth was 9. I was surprised, knowing that SF is not so good at the early plies. See the depth=6 games I played above: even good old Fruit 2.1 scores 47% against SF15.

I could not believe that (stupid, of course), and out of pure curiosity I gathered 10-20 million quality SF games with a minimum depth of 20 plies and created a neural net from them, and it was soooooooo bad. The reason: the learner can't learn from quality games alone; it also needs not-so-good moves. It's the world upside down. But it's a bit comparable with the theory of evolution: millions of mutations, the bad ones die, and the good mutations create something new that can survive.

For Rebel-14 I used Benjamin at depth=6 and it was great. For Rebel-14.1 I tried Benjamin at depth=7 and it gave zero improvement. That became a crucial observation, a learning moment. Eventually, lumping the depth 6 and depth 7 data together (volume!) plus a new net structure by Chris gave a good improvement.
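Here is a minimal sketch of "lumping together" two training sets, assuming both are plain EPD lines. It concatenates the sets and shuffles them with a fixed seed, so positions from both depths are interleaved during training. The optional de-duplication by the first four EPD fields is my own addition, not something the post describes, and is off by default to keep the volume. The `ce` annotations in the example lines are hypothetical.

```python
import random

def position_key(epd_line: str) -> str:
    """First four EPD fields: placement, side to move, castling, en passant."""
    return " ".join(epd_line.split()[:4])

def merge_datasets(*datasets, dedupe=False, seed=0):
    """Lump several EPD datasets together and shuffle them for training."""
    seen, merged = set(), []
    for lines in datasets:
        for line in lines:
            key = position_key(line)
            if dedupe and key in seen:
                continue  # keep only the first copy of a repeated position
            seen.add(key)
            merged.append(line)
    random.Random(seed).shuffle(merged)
    return merged

# Hypothetical sample lines standing in for depth=6 and depth=7 data
depth6 = ["8/8/8/4k3/8/8/4P3/4K3 w - - ce 150;", "8/8/8/4k3/8/8/8/4K3 w - - ce 0;"]
depth7 = ["8/8/8/4k3/8/8/4P3/4K3 w - - ce 160;", "8/8/8/3k4/8/8/4P3/4K3 b - - ce -140;"]
print(len(merge_datasets(depth6, depth7)), len(merge_datasets(depth6, depth7, dedupe=True)))
```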

So the right depth will differ per engine. If the data is too good or too bad, the learner can't do much and the result will be bad; somehow a balance must be found. For Rebel and Rodent depth=6 works fine; for Crafty... no idea.

Posts : 233 Join date : 2021-10-08

That seems like a clear analysis to me, Ed! Very good. To borrow from Robert Heinlein, I think we can now "grok" this a little better!

Subject: Re: Crafty NN development Sat Apr 23, 2022 9:19 am

Eelco wrote: That seems like a clear analysis to me, Ed! Very good. To borrow from Robert Heinlein, I think we can now "grok" this a little better!

I had to look up grok and now I grok.
