Subject: Testing at fixed depth -- a question to Ed Thu Jan 28, 2021 1:08 pm
Ed,
I know that you tested engines at fixed depth, experimented with depth +1, etc. Did you test Komodo this way?
I started a tournament between different Komodo versions at fixed depth and I am interested in your opinion about their eval functions. I hope that they behave the same way (that fixed depth means the same thing for all engines), and I am interested in the changes after Komodo 8. The newer versions were much faster and pruned more, and I would like to know how much was sacrificed to achieve it.
Also, are there suggestions you would like to make concerning the testing?
Admin Admin
Posts : 2608 Join date : 2020-11-17 Location : Netherlands
Subject: Re: Testing at fixed depth -- a question to Ed Thu Jan 28, 2021 5:50 pm
I have Komodo 14; what I can do is run 100-game matches with cutechess.
depth 12 vs depth 13
depth 12 vs depth 14
depth 12 vs depth 15
then
depth 13 vs depth 14
depth 13 vs depth 15
depth 13 vs depth 16
then
depth 14 vs depth 15
depth 14 vs depth 16
depth 14 vs depth 17
And post the results here.
Good idea?
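For anyone wanting to reproduce this, the pairings above are easy to script; a minimal sketch in Python (the engine command "komodo14" and the file names are placeholders, and the cutechess-cli flags -- `depth=`, `tc=inf`, `-games`, `-pgnout` -- should be checked against your local version):

```python
# Sketch: build cutechess-cli argument lists for the fixed-depth pairings above.
# "komodo14" and the output file names are placeholders for your own setup.

def fixed_depth_match(engine, base_depth, opp_depth, games=100):
    """One fixed-depth match: the same engine at two different depth limits."""
    return [
        "cutechess-cli",
        "-engine", f"cmd={engine}", "name=shallow", f"depth={base_depth}",
        "-engine", f"cmd={engine}", "name=deep", f"depth={opp_depth}",
        "-each", "proto=uci", "tc=inf",   # no clock: depth is the only limit
        "-games", str(games),
        "-pgnout", f"fd{base_depth}_vs_fd{opp_depth}.pgn",
    ]

# The nine pairings from the post: depth 12/13/14 against +1, +2, +3 plies.
pairings = [(b, b + k) for b in (12, 13, 14) for k in (1, 2, 3)]
commands = [fixed_depth_match("komodo14", b, o) for b, o in pairings]
```

Each argument list can then be handed to `subprocess.run` in turn.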
matejst
Posts : 612 Join date : 2020-11-26
Subject: Re: Testing at fixed depth -- a question to Ed Thu Jan 28, 2021 6:45 pm
Yes, it would be interesting.
Meanwhile, I already ran two tournaments, one at depth 10 and another at depth 12, with K3, K CCT, K TCEC, K8, K9 and K12. In both, K3 finished at the top and K12 was last. The number of games was too low, though (4 rounds at depth 12, 10 rounds at depth 10), to draw any conclusions, so I will test K3, K8 and K12 in matches at different depths in the next few days.
My test is similar, but between different versions of Komodo. In your tests there was a correlation between eval and depth, and I would like to see if this is the case with Komodo. I tend to think that older versions of Komodo had a better, more complex evaluation function. It was clear that parts of the eval disappeared when tablebase support was added, and I also feel that Komodo since K9 is a different engine. It is just a hunch, but since K9 it seems better in tactics, and somehow... well, built with another philosophy. Of course, another author coded it.
Anyway, I am just curious.
matejst
Posts : 612 Join date : 2020-11-26
Subject: Re: Testing at fixed depth -- a question to Ed Thu Jan 28, 2021 6:53 pm
OK. Meanwhile, first results: K12-K8 at depth 14, o-deville.pgn from move 8: 10-10.
I use a slow laptop so I will run short matches of 20 games. Now I will test K3 vs K12 at depth 14.
matejst
Posts : 612 Join date : 2020-11-26
Subject: Re: Testing at fixed depth -- a question to Ed Thu Jan 28, 2021 7:47 pm
And again...
Engine            Score    Ko 12.1.1             Ko 3                  S-B
1: Komodo 12.1.1  10.0/20  ····················  =0=0==1==00==1==1=1=  100.00
1: Komodo 3       10.0/20  =1=1==0==11==0==0=0=  ····················  100.00
20 games played / Tournament is finished
Name of the tournament: Komodo_match2_fd14.at
Level: 14 Half moves
Hardware: AMD A8-4500M APU with Radeon(tm) HD Graphics 1900 MHz with 5.5 GB Memory
PGN-File: C:\Temp\Chess\GUI\Arena\Tournaments\Komodo_match2_fd14.at.pgn
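As a side note on reading such short matches: the Elo difference and a rough error bar can be estimated from the score with the standard logistic model. A minimal sketch (my own helper, not part of Arena or cutechess):

```python
import math

def _score_to_elo(s):
    s = min(max(s, 1e-6), 1 - 1e-6)          # guard against 0% / 100% scores
    return -400.0 * math.log10(1.0 / s - 1.0)

def elo_estimate(wins, draws, losses):
    """Elo difference and a rough 95% interval from a match result.

    Normal approximation on the score fraction -- only indicative
    for samples as small as 20 games.
    """
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    margin = 1.96 * math.sqrt(var / n)
    return (_score_to_elo(score),
            _score_to_elo(score - margin),
            _score_to_elo(score + margin))

# The 10-10 crosstable above is 4 wins, 12 draws, 4 losses per side:
# 0 Elo with an interval of roughly +/- 100 Elo -- too wide to conclude much.
```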
I'll try now at depths 15 and 16, publish the results, then restart at lower depths with many more games.
Admin Admin
Posts : 2608 Join date : 2020-11-17 Location : Netherlands
Subject: Re: Testing at fixed depth -- a question to Ed Thu Jan 28, 2021 9:37 pm
Komodo 14 - Komodo 3, 100 games, depth=12, 45-55
Code:
Engine     Depth  Time     Games  Moves  Average  Forfeit  Book  Depth  MIDG   EARLY  ENDG   LATE
Komodo-3   11.93  0:28:18  100    8427   0.20     0        810   8.10   11.98  11.98  11.90  11.88
Komodo-14  12.00  0:04:26  100    8433   0.03     0        800   8.00   12.00  12.00  12.00  12.00
Note the time difference: 28:18 vs 04:26.
Running depth=16 now.
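A quick way to quantify that difference is to convert the two totals into a speed ratio; a small sketch:

```python
def to_seconds(hms):
    """Parse an h:mm:ss time string (as printed in the stats above) into seconds."""
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s

# Total thinking time over the 100 games, both engines near depth 12:
k3_time = to_seconds("0:28:18")    # Komodo-3:  1698 s
k14_time = to_seconds("0:04:26")   # Komodo-14:  266 s
ratio = k3_time / k14_time         # ~6.4x: the newer search reaches the same depth that much faster
```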
matejst
Posts : 612 Join date : 2020-11-26
Subject: Re: Testing at fixed depth -- a question to Ed Thu Jan 28, 2021 9:50 pm
Of course, I also noticed the time difference. K12 is evidently pruning much more than the previous versions.
Nonetheless, in your testing of Pro Deo vs SF, there was a moment when the trend changed: at depth 10 with SF8, and at depth 12 with SF9. Here, I still have not noticed such a trend.
You wrote (I quote from memory) that the evaluation is qualitatively different when the horizon of the engine is low, but here I have the impression that Larry and Mark, although pruning much more, did not adapt their evaluation to the ability to reach higher depths.
I could be wrong, obviously, but I guess that we will have linear results -- a simplified evaluation, made to achieve higher depths, that behaves worse in fixed-depth testing. It would also explain why -- as you noticed -- Komodo does not scale well.
What can also be remarked is that K12 is much more aware tactically and is generally better in openings, while it loses in simple positions (although I have to have another look at the games played at depth 16).
matejst
Posts : 612 Join date : 2020-11-26
Subject: Re: Testing at fixed depth -- a question to Ed Thu Jan 28, 2021 11:00 pm
Anyway, testing is one thing, but giving these results a meaning is another. One thing is certain: K12 (and, obviously, K14) is tactically better. It won a lot of games by mating the opponent immediately after the opening. Now, at 16 plies, I see that time after time K12 finds tactical solutions to positional problems: in the current game, it gave material to get into a K+R vs K+B ending; in the previous one, it did the same thing to draw a K+R+B vs K+R ending; and in the following, it converts a worse position into an ending where it gives the N for a pawn to draw. K8 looks like Karpov, K12 like Kasparov at depth 16. K8 plays better in simple positions that it then fails to convert, while K12 is clearly superior in complications.
So, why does an evaluation function, written and improved by the same author for ten years (or five, in the case of K12 vs K8), give better results at 16 plies, while it remains inferior at 12 plies? I can see one answer: the focus has shifted to the area where engines have an advantage: tactics. Depth compensates for small positional mistakes.
matejst
Posts : 612 Join date : 2020-11-26
Subject: Re: Testing at fixed depth -- a question to Ed Fri Jan 29, 2021 1:20 pm
And... K8 - K12: 25-25 at depth 16.
A few [final] words.
The ultimate aim of these tests was to see which Komodo was, for me, best to use in analysis. Since time is not a factor -- I try to assess the positions from my standpoint, and need a lot of time to calculate variations -- I wanted to see if the evaluation function was improved, and, being aware of pruning and Ed's previous tests, from what depths.
In a sense, I feel some disappointment: the positional evaluation of Komodo has not only shifted from static to dynamic elements, from pawn structure to piece activity, but simply seems a bit worse.
So, why not use NNUE engines? They are simply too fast. The advantage of using older engines is that they give lines one (a CM) can easily understand. Also, in OTB games, the evaluation of the position often shifts from +1 to -1 without big consequences at my level. I would rather play a combative, slightly worse position that I understand well than a complicated, superior position where I do not see a plan, or where I miss a tactical trap 12 plies deep. Most NNUE engines are based on SF and reach depth 20 after a few seconds. Lines replace lines every second, and I feel lost.
The development of older engines followed the ways of human thinking for years, before shifting to machine tuning, then machine learning. I simply do not "understand" modern engines, and therefore cannot, and do not like to, use them. But it's only me, I guess.
matejst
Posts : 612 Join date : 2020-11-26
Subject: Re: Testing at fixed depth -- a question to Ed Sun Jan 31, 2021 7:08 pm
I thought about starting a new topic, but here is all right too.
I continued testing K8 at different fixed depths. The main problem is, of course, my relatively slow hardware, so I use different numbers of games at different depths, and the results are not as precise as I would like.
Anyway. I continued pitting K8 against K12. Not many games -- but it seems so far that at depth 20 K8 does not lose ground. I watched several games, and while K12 was obviously tactically superior, K8 was better positionally and managed to keep the 20 games even (two matches of 10).
Then, I tested K8 against Winter 0.9. Komodo was evidently superior at lower depths, but the difference was smaller at higher depths. At d18, a short match finished 5.5-4.5 for K8, and I am testing these two at depth 20 to see if it is worth continuing with more games.
What is interesting is that Komodo 8 has a higher rating than Winter 0.9, and I wanted to see how Winter's NN eval would fare with more time. The main problem is that it seems that Winter extends the search somehow.
We can also see that a higher-rated engine does not necessarily have a better evaluation function than a lower-rated one. In the case of Komodo, I think the development focused on speed and tactics, and the positional elements of the eval were simplified. In a way, it is logical: calculating deeply often leads to simpler positions that are easy to assess, and the key is not to miss tactics in the process.
TheSelfImprover likes this post
Admin Admin
Posts : 2608 Join date : 2020-11-17 Location : Netherlands
Subject: Re: Testing at fixed depth -- a question to Ed Mon Feb 01, 2021 11:03 am
It's hard to draw conclusions. Pruning the search has the nasty side effect that you also prune away chess knowledge. Even a bad LMR implementation gains Elo at the cost of losing playing style. An example from the past with ProDeo 1.6:
As you can see, from ply one the engine shows a draw score, meaning it has the chess knowledge of the 'bad bishop'. Then, as the search moves in, that chess knowledge is lost due to pruning. An extreme example, but you get the drift.
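For readers who have not seen LMR, the mechanism is easy to sketch: quiet moves that come late in the move ordering are searched to a reduced depth, so whatever the evaluation knows beyond that shortened horizon is never seen in those lines. A toy schedule with illustrative numbers (not ProDeo's actual formula):

```python
def lmr_reduction(move_index, depth, is_quiet=True):
    """Toy late-move-reduction schedule: later quiet moves search shallower.

    Real engines tune these numbers and add many exceptions
    (checks, killer moves, PV nodes, etc.) -- this only shows the shape.
    """
    if not is_quiet or depth < 3 or move_index < 3:
        return 0                              # early or tactical moves: full depth
    # grows with both move index and depth, capped so at least 1 ply remains
    return min(depth - 1, 1 + move_index // 6 + depth // 8)

# At depth 12 the 20th quiet move is searched 5 plies shallower, so an
# eval term (like the 'bad bishop') that only pays off beyond that reduced
# horizon effectively disappears from the line.
```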
matejst likes this post
matejst
Posts : 612 Join date : 2020-11-26
Subject: Re: Testing at fixed depth -- a question to Ed Wed Feb 03, 2021 10:26 am
I was aware that the interaction of search and eval differed widely from engine to engine, but that the search also pruned away knowledge was a bit of a surprise. I thought that the evaluation was the main criterion for pruning.
Now, as a non-programmer, I have to ask: can it be avoided? Is every increase in speed detrimental to knowledge? Is knowledge reduced in top engines?
I tend to think that lack of knowledge was the culprit for the lack of improvement lately. The NNs were a shortcut to knowledge, since programming complex knowledge (complex pawn structures, plans, etc.) was difficult from the get-go. I thought that the use of big data could have been incorporated in a more transparent way, something we will be forced to do anyway by understanding how NNs work.
Anyway, I guess that we have come a long way and are back at the starting point: back to knowledge -- to improve NNs, to be able to optimize search for NNs, to incorporate classic evaluation terms with NNs.
Admin Admin
Posts : 2608 Join date : 2020-11-17 Location : Netherlands
Subject: Re: Testing at fixed depth -- a question to Ed Wed Feb 03, 2021 12:01 pm
matejst wrote:
I was aware that the interaction of search and eval differed widely from engine to engine, but that the search also pruned away knowledge was a bit of a surprise. I thought that the evaluation was the main criterion for pruning.
The evaluation is only one part of the many pruning techniques; some pruning techniques are purely search-related and don't use the evaluation at all.
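To illustrate the distinction with two common techniques: futility pruning consults the static evaluation, while null-move pruning relies only on a shallower search after a 'pass' move. A schematic sketch, not any specific engine's code:

```python
def may_futility_prune(static_eval, alpha, depth, margin_per_ply=120):
    """Eval-based pruning: skip a quiet move when even an optimistic
    margin on top of the static evaluation cannot reach alpha."""
    return static_eval + margin_per_ply * depth <= alpha

def null_move_search_depth(depth, reduction=3):
    """Search-based pruning: give the opponent a free move and search the
    reply at a reduced depth; if we still hold, cut off.  The decision
    comes from the search result itself, not from an evaluation term."""
    return max(0, depth - 1 - reduction)
```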
Quote :
Now, as a non-programmer, I have to ask: can it be avoided? Is every increase in speed detrimental to knowledge? Is knowledge reduced in top engines?
I tend to think that lack of knowledge was the culprit for the lack of improvement lately. The NNs were a shortcut to knowledge, since programming complex knowledge (complex pawn structures, plans, etc.) was difficult from the get-go. I thought that the use of big data could have been incorporated in a more transparent way, something we will be forced to do anyway by understanding how NNs work.
Anyway, I guess that we have come a long way and are back at the starting point: back to knowledge -- to improve NNs, to be able to optimize search for NNs, to incorporate classic evaluation terms with NNs.
It's unclear to me where this NN approach is going to lead us as computer chess lovers; the rating lists are split about what to test and what not, what to publish and what not -- see the crazy CCRL compromise between 40/2 and 40/15. The next revolution will come when programmers are able to put the search into a NN: great for Elo, but computer chess will be dead.
matejst
Posts : 612 Join date : 2020-11-26
Subject: Re: Testing at fixed depth -- a question to Ed Wed Feb 03, 2021 3:17 pm
Ed, in a way, computer chess as we knew it has already been dead for years.
In the '90s I managed to buy several chess programs (not easy when you live in a little town in a small country, and one has to rely on relatives working in Germany, France, Russia, when they somehow manage to understand what you want), and there were always improvements to the playing style, the interface, the opening books, etc. Computer chess was about playing and analyzing; the program was a replacement for a chess partner.
But suddenly, eng-eng matches became the main field of computer chess; the SF project put all the existing professional developers out of business, and there was no money for new ones. And, to be blunt, I think that open source has been not only used, but greatly abused: no one dares to publish a closed-source engine nowadays. It has become a witch hunt: everybody is under suspicion, everybody is attacked at some point.
What makes me a bit too cynical about the SF project and open-source purity is the fate of Rybka. In retrospect, I tend more and more to think that Vas was absolutely innocent. He took ideas, wrote his own code and improved his engine tremendously. And he was found guilty. But soon after, his code and his ideas were pillaged, incorporated into open-source projects, and it was OK since he was the villain... The way open source is always white as snow, despite its origins and its nefarious consequences (monopolization of ideas), is quite paradoxical. But, just like in Highlander, there can be only one.
Admin likes this post
Admin Admin
Posts : 2608 Join date : 2020-11-17 Location : Netherlands
Subject: Re: Testing at fixed depth -- a question to Ed Sun Feb 07, 2021 8:05 am
You are right.
matejst likes this post
nescitus
Posts : 46 Join date : 2020-12-01
Subject: Re: Testing at fixed depth -- a question to Ed Sun Feb 07, 2021 10:35 am
Perhaps engines that reach greater depths gain from an evaluation function that exaggerates dynamic elements. At low depths, the exaggeration brings bad results and a skewed playing style. At greater depths, there are many more ways to avoid coming under direct attack. But if the attack happens in all the lines, then the exaggerated score is not exaggerated at all!
Also, I remember that running Rodent in TCEC was an important learning experience. They showed the position at the end of the PV alongside the main board. When an engine reaches really big depths, the middlegame moves are chosen mostly because of the result after many exchanges, which is often an endgame with 2 or 3 pieces for each side. It seems that in modern engines king safety is used not for conducting an attack, but chiefly to weed out lines where either side is under too strong an attack.
matejst likes this post
TheSelfImprover
Posts : 3112 Join date : 2020-11-18
Subject: Re: Testing at fixed depth -- a question to Ed Sun Feb 07, 2021 12:09 pm
nescitus wrote:
Perhaps engines that reach greater depths gain from an evaluation function that exaggerates dynamic elements.
Maybe it ultimately comes down to the question, "What is good chess?"
The answer has to be, "it depends on the opponent."
If your opponent is beatable, and you're better than the opponent, then I would argue that "good chess" means making the position as difficult as possible, to maximise the probability of the opponent making a mistake. I can see that you could also make a case for "play for long-term positional advantages" being "good chess" in this situation.
matejst likes this post
matejst
Posts : 612 Join date : 2020-11-26
Subject: Re: Testing at fixed depth -- a question to Ed Sun Feb 07, 2021 2:03 pm
Pawel,
It is clear that engines that calculate deeper do not need certain elements of the evaluation. Evaluation, just like calculation, is a way of predicting, except that it is less precise in a sense; it is more speculative. You could really be on the right path here: I once noticed that higher-rated engines, achieving bigger depths, were weaker in the opening and the middlegame, but much better in simpler positions.
I will check the existing testing in endgames and, if I can find something, in late middlegame. I will also try to test Komodo at fixed depths in simple positions in the next few days while I am at home.
BTW, I tried to create Karpovian personalities with Rodent, but it was a complete failure, despite reading the extensive explanations you gave.
TheSelfImprover
Posts : 3112 Join date : 2020-11-18
Subject: Re: Testing at fixed depth -- a question to Ed Sun Feb 07, 2021 5:29 pm
matejst wrote:
It is clear that engines that calculate deeper do not need certain elements of the evaluation. Evaluation, just like calculation, is a way of predicting, except that it is less precise in a sense; it is more speculative.
Hmmm... if the negative consequence of a move is over the search horizon, then good evaluation might be more precise.
I think you'd agree that it would be possible to create an evaluation function that would score most positions correctly: the biggest question is how big this EF would need to be. I took part in a big discussion about this at CCC last year. To me, the evidence suggests that such an EF could actually be quite a lot smaller than most people would think it would need to be (some people disagreed). I think that one day, a chess program will be like a calculator in the sense that you input a position and immediately get the correct answer back.
We agree that deep search eliminates the need for some EF knowledge. Likewise, deep knowledge of chess would eliminate the need for a lot of the surface knowledge. By surface knowledge, I mean knowledge that a good human player can explain, and by deep knowledge I mean either knowledge that a good player has but cannot explain, or knowledge that good humans don't even have.
matejst likes this post