Subject: Testing at fixed depth -- a question to Ed Thu Jan 28, 2021 1:08 pm
Ed,
I know that you tested engines at fixed depth, experimented with depth +1, etc. Did you test Komodo this way?
I started a tournament between different Komodo versions at fixed depth and I am interested in your opinion about their eval functions. I hope that they behave the same way (that fixed depth means the same thing for all engines), and I am interested in the changes after Komodo 8. The newer versions were much faster and pruned more, and I would like to know how much was sacrificed to achieve it.
Also, are there suggestions you would like to make concerning the testing?
Admin Admin
Posts : 2608 Join date : 2020-11-17 Location : Netherlands
Subject: Re: Testing at fixed depth -- a question to Ed Thu Jan 28, 2021 5:50 pm
I have Komodo 14; what I can do is run 100-game matches with cutechess.
depth 12 vs depth 13
depth 12 vs depth 14
depth 12 vs depth 15
then
depth 13 vs depth 14
depth 13 vs depth 15
depth 13 vs depth 16
then
depth 14 vs depth 15
depth 14 vs depth 16
depth 14 vs depth 17
And post the results here.
Good idea?
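For anyone wanting to reproduce this, the pairings above are easy to script; a minimal sketch in Python (the engine command "komodo14" and the file names are placeholders, and the cutechess-cli flags -- `depth=`, `tc=inf`, `-games`, `-pgnout` -- should be checked against your local version):

```python
# Sketch: build cutechess-cli argument lists for the fixed-depth pairings above.
# "komodo14" and the output file names are placeholders for your own setup.

def fixed_depth_match(engine, base_depth, opp_depth, games=100):
    """One fixed-depth match: the same engine at two different depth limits."""
    return [
        "cutechess-cli",
        "-engine", f"cmd={engine}", "name=shallow", f"depth={base_depth}",
        "-engine", f"cmd={engine}", "name=deep", f"depth={opp_depth}",
        "-each", "proto=uci", "tc=inf",   # no clock: depth is the only limit
        "-games", str(games),
        "-pgnout", f"fd{base_depth}_vs_fd{opp_depth}.pgn",
    ]

# The nine pairings from the post: depth 12/13/14 against +1, +2, +3 plies.
pairings = [(b, b + k) for b in (12, 13, 14) for k in (1, 2, 3)]
commands = [fixed_depth_match("komodo14", b, o) for b, o in pairings]
```

Each argument list can then be handed to `subprocess.run` in turn.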
matejst
Posts : 612 Join date : 2020-11-26
Subject: Re: Testing at fixed depth -- a question to Ed Thu Jan 28, 2021 6:45 pm
Yes, it would be interesting.
Meanwhile, I already ran two tournaments, one at depth 10 and another at depth 12, with K3, K CCT, K TCEC, K8, K9 and K12. In both, K3 finished at the top and K12 was last. The number of games was too low, though (4 rounds at depth 12, 10 rounds at depth 10), to draw any conclusions, so I will test K3, K8 and K12 in matches at different depths in the next few days.
My test is similar, but between different versions of Komodo. In your tests there was a correlation between eval and depth, and I would like to see if this is the case with Komodo. I tend to think that older versions of Komodo had a better, more complex evaluation function. It was clear that parts of the eval disappeared when tablebase support was added, and I also feel that Komodo since K9 is a different engine. It is just a hunch, but since K9 it seems better in tactics, and somehow... well, built with another philosophy. Of course, another author coded it.
Anyway, I am just curious.
matejst
Posts : 612 Join date : 2020-11-26
Subject: Re: Testing at fixed depth -- a question to Ed Thu Jan 28, 2021 6:53 pm
OK. Meanwhile, first results: K12-K8 at depth 14, o-deville.pgn from move 8: 10-10.
I use a slow laptop so I will run short matches of 20 games. Now I will test K3 vs K12 at depth 14.
matejst
Posts : 612 Join date : 2020-11-26
Subject: Re: Testing at fixed depth -- a question to Ed Thu Jan 28, 2021 7:47 pm
And again...
Engine            Score    Ko 12.1.1             Ko 3                  S-B
1: Komodo 12.1.1  10.0/20  ····················  =0=0==1==00==1==1=1=  100.00
1: Komodo 3       10.0/20  =1=1==0==11==0==0=0=  ····················  100.00
20 games played / Tournament is finished
Name of the tournament: Komodo_match2_fd14.at
Level: 14 Half moves
Hardware: AMD A8-4500M APU with Radeon(tm) HD Graphics 1900 MHz with 5.5 GB Memory
PGN-File: C:\Temp\Chess\GUI\Arena\Tournaments\Komodo_match2_fd14.at.pgn
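As a side note on reading such short matches: the Elo difference and a rough error bar can be estimated from the score with the standard logistic model. A minimal sketch (my own helper, not part of Arena or cutechess):

```python
import math

def _score_to_elo(s):
    s = min(max(s, 1e-6), 1 - 1e-6)          # guard against 0% / 100% scores
    return -400.0 * math.log10(1.0 / s - 1.0)

def elo_estimate(wins, draws, losses):
    """Elo difference and a rough 95% interval from a match result.

    Normal approximation on the score fraction -- only indicative
    for samples as small as 20 games.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    margin = 1.96 * math.sqrt(var / n)
    return (_score_to_elo(score),
            _score_to_elo(score - margin),
            _score_to_elo(score + margin))

# The 10-10 crosstable above is 4 wins, 12 draws, 4 losses per side:
# 0 Elo with an interval of roughly +/- 100 Elo -- too wide to conclude much.
```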
I'll try now at depths 15 and 16, publish the results, then restart at lower depths with many more games.
Admin Admin
Posts : 2608 Join date : 2020-11-17 Location : Netherlands
Subject: Re: Testing at fixed depth -- a question to Ed Thu Jan 28, 2021 9:37 pm
Komodo 14 - Komodo 3, 100 games, depth=12, 45-55
Code:
Engine     Depth  Time     Games  Moves  Average  Forfeit  Book  Depth  MIDG   EARLY  ENDG   LATE
Komodo-3   11.93  0:28:18  100    8427   0.20     0        810   8.10   11.98  11.98  11.90  11.88
Komodo-14  12.00  0:04:26  100    8433   0.03     0        800   8.00   12.00  12.00  12.00  12.00
Note the time difference: 28:18 vs 04:26.
Running depth=16 now.
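A quick way to quantify that difference is to convert the two totals into a speed ratio; a small sketch:

```python
def to_seconds(hms):
    """Parse an h:mm:ss time string (as printed in the stats above) into seconds."""
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s

# Total thinking time over the 100 games, both engines near depth 12:
k3_time = to_seconds("0:28:18")    # Komodo-3:  1698 s
k14_time = to_seconds("0:04:26")   # Komodo-14:  266 s
ratio = k3_time / k14_time         # ~6.4x: the newer search reaches the same depth that much faster
```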
matejst
Posts : 612 Join date : 2020-11-26
Subject: Re: Testing at fixed depth -- a question to Ed Thu Jan 28, 2021 9:50 pm
Of course, I also noticed the time difference. K12 is evidently pruning much more than the previous versions.
Nonetheless, in your testing of Pro Deo vs SF, there was a moment when the trend changed: at depth 10 with SF8, and at depth 12 with SF9. Here, I still have not noticed such a trend.
You wrote (I quote from memory) that the evaluation is qualitatively different when the horizon of the engine is low, but here I have the impression that Larry and Mark, although pruning much more, did not adapt their evaluation to the ability to reach higher depths.
I could be wrong, obviously, but I guess that we will have linear results -- a simplified evaluation, made to achieve higher depths, that behaves worse in fixed-depth testing. It would also explain why -- as you noticed -- Komodo does not scale well.
What can also be remarked is that K12 is much more aware tactically and is generally better in openings, while it loses in simple positions (although I have to have another look at the games played at depth 16).
matejst
Posts : 612 Join date : 2020-11-26
Subject: Re: Testing at fixed depth -- a question to Ed Thu Jan 28, 2021 11:00 pm
Anyway, testing is one thing, but giving these results a meaning is another. One thing is certain: K12 (and, obviously, K14) is tactically better. It won a lot of games by mating the opponent immediately after the opening. Now, at 16 plies, I see that time after time K12 finds tactical solutions to positional problems: in the current game, it gave material to get into a K+R vs K+B ending; in the previous one, it did the same thing to draw a K+R+B vs K+R ending; and in the following, it converts a worse position into an ending where it gives the N for a pawn to draw. K8 looks like Karpov, K12 like Kasparov at depth 16. K8 plays better in simple positions that it then fails to convert, while K12 is clearly superior in complications.
So, why does an evaluation function, written and improved by the same author for ten years (or five, in the case of K12 vs K8), give better results at 16 plies, while it remains inferior at 12 plies? I can see one answer: the focus has shifted to the area where engines have an advantage: tactics. Depth compensates for small positional mistakes.
matejst
Posts : 612 Join date : 2020-11-26
Subject: Re: Testing at fixed depth -- a question to Ed Fri Jan 29, 2021 1:20 pm
And... K8 - K12: 25-25 at depth 16.
A few [final] words.
The ultimate aim of these tests was to see which Komodo was, for me, best to use in analysis. Since time is not a factor -- I try to assess the positions from my standpoint, and need a lot of time to calculate variations -- I wanted to see if the evaluation function was improved, and, being aware of pruning and Ed's previous tests, from what depths.
In a sense, I feel some disappointment: the positional evaluation of Komodo has not only shifted from static to dynamic elements, from pawn structure to piece activity, but simply seems a bit worse.
So, why not use NNUE engines? They are simply too fast. The advantage of using older engines is that they give lines one (a CM) can easily understand. Also, in OTB games, the evaluation of the position often shifts from +1 to -1 without big consequences at my level. I would rather play a combative, slightly worse position that I understand well than a complicated, superior position where I do not see a plan, or where I miss a tactical trap 12 plies deep. Most NNUE engines are based on SF and reach depth 20 after a few seconds. Lines replace lines every second, and I feel lost.
The development of older engines followed the ways of human thinking for years, before shifting to machine tuning, then machine learning. I simply do not "understand" modern engines, and therefore cannot, and do not like to, use them. But it's only me, I guess.
matejst
Posts : 612 Join date : 2020-11-26
Subject: Re: Testing at fixed depth -- a question to Ed Sun Jan 31, 2021 7:08 pm
I thought about starting a new topic, but here is all right too.
I continued testing K8 at different fixed depths. The main problem is, of course, my relatively slow hardware, so I use different numbers of games at different depths, and the results are not as precise as I would like.
Anyway. I continued pitting K8 against K12. Not many games -- but it seems so far that at depth 20 K8 does not lose ground. I watched several games, and while K12 was obviously tactically superior, K8 was better positionally and managed to keep the 20 games even (two matches of 10).
Then, I tested K8 against Winter 0.9. Komodo was evidently superior at lower depths, but the difference was smaller at higher depths. At d18, a short match finished 5.5-4.5 for K8, and I am testing these two at depth 20 to see if it is worth continuing with more games.
What is interesting is that Komodo 8 has a higher rating than Winter 0.9, and I wanted to see how Winter's NN eval would fare with more time. The main problem is that it seems that Winter extends the search somehow.
We can also see that a higher-rated engine does not necessarily have a better evaluation function than a lower-rated one. In the case of Komodo, I think the development focused on speed and tactics, and the positional elements of the eval were simplified. In a way, it is logical: calculating deeply often leads to simpler positions that are easy to assess, and the key is not to miss tactics in the process.
TheSelfImprover likes this post
Admin Admin
Posts : 2608 Join date : 2020-11-17 Location : Netherlands
Subject: Re: Testing at fixed depth -- a question to Ed Mon Feb 01, 2021 11:03 am
It's hard to draw conclusions. Pruning the search has the nasty side effect that you also prune away chess knowledge. Even a bad LMR implementation gains Elo at the cost of losing playing style. An example from the past with ProDeo 1.6:
As you can see, from ply one the engine shows a draw score, meaning it has the chess knowledge of the 'bad bishop'. Then, as the search moves in, that chess knowledge is lost due to pruning. An extreme example, but you get the drift.
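For readers who have not seen LMR, the mechanism is easy to sketch: quiet moves that come late in the move ordering are searched to a reduced depth, so whatever the evaluation knows beyond that shortened horizon is never seen in those lines. A toy schedule with illustrative numbers (not ProDeo's actual formula):

```python
def lmr_reduction(move_index, depth, is_quiet=True):
    """Toy late-move-reduction schedule: later quiet moves search shallower.

    Real engines tune these numbers and add many exceptions
    (checks, killer moves, PV nodes, etc.) -- this only shows the shape.
    """
    if not is_quiet or depth < 3 or move_index < 3:
        return 0                              # early or tactical moves: full depth
    # grows with both move index and depth, capped so at least 1 ply remains
    return min(depth - 1, 1 + move_index // 6 + depth // 8)

# At depth 12 the 20th quiet move is searched 5 plies shallower, so an
# eval term (like the 'bad bishop') that only pays off beyond that reduced
# horizon effectively disappears from the line.
```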
matejst likes this post
matejst
Posts : 612 Join date : 2020-11-26
Subject: Re: Testing at fixed depth -- a question to Ed Wed Feb 03, 2021 10:26 am
I was aware that the interaction of search and eval differed widely from engine to engine, but that the search also pruned away knowledge was a bit of a surprise. I thought that the evaluation was the main criterion for pruning.
Now, as a non-programmer, I have to ask: can it be avoided? Is every increase in speed detrimental to knowledge? Is knowledge reduced in top engines?
I tend to think that lack of knowledge was the culprit for the lack of improvement lately. The NNs were a shortcut to knowledge, since programming complex knowledge (complex pawn structures, plans, etc.) was difficult from the get-go. I thought that the use of big data could have been incorporated in a more transparent way, something we will be forced to do anyway by understanding how NNs work.
Anyway, I guess that we have come a long way and are back at the starting point: back to knowledge -- to improve NNs, to be able to optimize search for NNs, to incorporate classic evaluation terms with NNs.
Admin Admin
Posts : 2608 Join date : 2020-11-17 Location : Netherlands
Subject: Re: Testing at fixed depth -- a question to Ed Wed Feb 03, 2021 12:01 pm
matejst wrote:
I was aware that the interaction of search and eval differed widely from engine to engine, but that the search also pruned away knowledge was a bit of a surprise. I thought that the evaluation was the main criterion for pruning.
The evaluation is only one part of the many pruning techniques; some pruning techniques are purely search-related and don't use the evaluation at all.
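To illustrate the distinction with two common techniques: futility pruning consults the static evaluation, while null-move pruning relies only on a shallower search after a 'pass' move. A schematic sketch, not any specific engine's code:

```python
def may_futility_prune(static_eval, alpha, depth, margin_per_ply=120):
    """Eval-based pruning: skip a quiet move when even an optimistic
    margin on top of the static evaluation cannot reach alpha."""
    return static_eval + margin_per_ply * depth <= alpha

def null_move_search_depth(depth, reduction=3):
    """Search-based pruning: give the opponent a free move and search the
    reply at a reduced depth; if we still hold, cut off.  The decision
    comes from the search result itself, not from an evaluation term."""
    return max(0, depth - 1 - reduction)
```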
Quote :
Now, as a non-programmer, I have to ask: can it be avoided? Is every increase in speed detrimental to knowledge? Is knowledge reduced in top engines?
I tend to think that lack of knowledge was the culprit for the lack of improvement lately. The NNs were a shortcut to knowledge, since programming complex knowledge (complex pawn structures, plans, etc.) was difficult from the get-go. I thought that the use of big data could have been incorporated in a more transparent way, something we will be forced to do anyway by understanding how NNs work.
Anyway, I guess that we have come a long way and are back at the starting point: back to knowledge -- to improve NNs, to be able to optimize search for NNs, to incorporate classic evaluation terms with NNs.
It's unclear to me where this NN approach is going to lead us as computer chess lovers; the rating lists are split about what to test and what not, what to publish and what not -- see the crazy CCRL compromise between 40/2 and 40/15. The next revolution will come when programmers are able to put the search into a NN: great for Elo, but computer chess will be dead.
matejst
Posts : 612 Join date : 2020-11-26
Subject: Re: Testing at fixed depth -- a question to Ed Wed Feb 03, 2021 3:17 pm
Ed, in a way, computer chess as we knew it has already been dead for years.
In the '90s I managed to buy several chess programs (not easy when you live in a little town in a small country, and one has to rely on relatives working in Germany, France, Russia, when they somehow manage to understand what you want), and there were always improvements to the playing style, the interface, the opening books, etc. Computer chess was about playing and analyzing; the program was a replacement for a chess partner.
But suddenly, eng-eng matches became the main field of computer chess; the SF project put all the existing professional developers out of business, and there was no money for new ones. And, to be blunt, I think that open source has been not only used, but greatly abused: no one dares to publish a closed-source engine nowadays. It has become a witch hunt: everybody is under suspicion, everybody is attacked at some point.
What makes me a bit too cynical about the SF project and open-source purity is the fate of Rybka. In retrospect, I tend more and more to think that Vas was absolutely innocent. He took ideas, wrote his own code and improved his engine tremendously. And he was found guilty. But soon after, his code and his ideas were pillaged, incorporated into open-source projects, and it was OK since he was the villain... The way open source is always white as snow, despite its origins and its nefarious consequences (monopolization of ideas), is quite paradoxical. But, just like in Highlander, there can be only one.
Admin likes this post
Admin Admin
Posts : 2608 Join date : 2020-11-17 Location : Netherlands
Subject: Re: Testing at fixed depth -- a question to Ed Sun Feb 07, 2021 8:05 am
You are right.
matejst likes this post
nescitus
Posts : 46 Join date : 2020-12-01
Subject: Re: Testing at fixed depth -- a question to Ed Sun Feb 07, 2021 10:35 am
Perhaps engines that reach greater depths gain from an evaluation function that exaggerates dynamic elements. At low depths, the exaggeration brings bad results and a skewed playing style. At greater depths, there are many more ways to avoid coming under direct attack. But if the attack happens in all the lines, then the exaggerated score is not exaggerated at all!
Also, I remember that running Rodent in TCEC was an important learning experience. They showed the position at the end of the PV alongside the main board. When an engine reaches really big depths, the middlegame moves are chosen mostly because of the result after many exchanges, which is often an endgame with 2 or 3 pieces for each side. It seems that in modern engines king safety is used not for conducting an attack, but chiefly to weed out lines where either side is under too strong an attack.
matejst likes this post
TheSelfImprover
Posts : 3112 Join date : 2020-11-18
Subject: Re: Testing at fixed depth -- a question to Ed Sun Feb 07, 2021 12:09 pm
nescitus wrote:
Perhaps engines that reach greater depths gain from an evaluation function that exaggerates dynamic elements.
Maybe it ultimately comes down to the question, "What is good chess?"
The answer has to be, "it depends on the opponent."
If your opponent is beatable, and you're better than the opponent, then I would argue that "good chess" means making the position as difficult as possible, to maximise the probability of the opponent making a mistake. I can see that you could also make a case for "play for long-term positional advantages" being "good chess" in this situation.
matejst likes this post
matejst
Posts : 612 Join date : 2020-11-26
Subject: Re: Testing at fixed depth -- a question to Ed Sun Feb 07, 2021 2:03 pm
Pawel,
It is clear that engines that calculate deeper do not need certain elements of the evaluation. Evaluation, just like calculation, is a way of predicting, except that it is less precise in a sense; it is more speculative. You could really be on the right path here: I once noticed that higher-rated engines, achieving bigger depths, were weaker in the opening and the middlegame, but much better in simpler positions.
I will check the existing testing in endgames and, if I can find something, in late middlegame. I will also try to test Komodo at fixed depths in simple positions in the next few days while I am at home.
BTW, I tried to create Karpovian personalities with Rodent, but it was a complete failure, despite reading the extensive explanations you gave.
TheSelfImprover
Posts : 3112 Join date : 2020-11-18
Subject: Re: Testing at fixed depth -- a question to Ed Sun Feb 07, 2021 5:29 pm
matejst wrote:
It is clear that engines that calculate deeper do not need certain elements of the evaluation. Evaluation, just like calculation, is a way of predicting, except that it is less precise in a sense; it is more speculative.
Hmmm... if the negative consequence of a move is over the search horizon, then good evaluation might be more precise.
I think you'd agree that it would be possible to create an evaluation function that would score most positions correctly: the biggest question is how big this EF would need to be. I took part in a big discussion about this at CCC last year. To me, the evidence suggests that such an EF could actually be quite a lot smaller than most people would think it would need to be (some people disagreed). I think that one day, a chess program will be like a calculator in the sense that you input a position and immediately get the correct answer back.
We agree that deep search eliminates the need for some EF knowledge. Likewise, deep knowledge of chess would eliminate the need for a lot of the surface knowledge. By surface knowledge, I mean knowledge that a good human player can explain, and by deep knowledge I mean either knowledge that a good player has but cannot explain, or knowledge that good humans don't even have.
matejst likes this post