Posts : 2608 Join date : 2020-11-17 Location : Netherlands
Subject: Re: Fire 8 released Sat 6 Mar 2021 - 12:56
TheSelfImprover wrote:
mwyoung wrote:
Results at 1000ms. Corrected.
Apologies for pointing out the obvious and for being negative, but you've also got a high similarity between SF12 and SF13. Are you trying to confirm or disprove something?
We're trying to find out whether the sim-test still makes sense for NNUE engines; my opinion is that it doesn't.
mwyoung
Posts : 880 Join date : 2020-11-25 Location : USA
Subject: Re: Fire 8 released Sat 6 Mar 2021 - 15:33
Ozymandias wrote:
For some reason, your 100ms results aren't comparable to mine, so do you think you could run the GitHub net? 250ms looks good enough.
Yes, post the link here and I will run your request as well.
mwyoung
Posts : 880 Join date : 2020-11-25 Location : USA
Subject: Re: Fire 8 released Sat 6 Mar 2021 - 15:44
"The research is about if the sim-test still makes sense with NNUE engines. And the conclusion is that there is no sense, for NNUE engines the sim-test is dead." Ed Schroder
Posted for clarification.
Meaning that any sim-test result with NNUE engines, whether it shows a red flag or the opposite, most likely means nothing. But you also need to show with data that this conclusion is correct.
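One way to check whether a difference between two sim-test percentages is backed by data is to put a confidence interval around each match rate. The sketch below is a minimal illustration, not part of the sim-test itself: it assumes each engine pair was compared on a fixed number of positions and uses the normal approximation to the binomial distribution. The function name `match_rate_ci` is hypothetical.

```python
import math


def match_rate_ci(matches: int, positions: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval (default ~95%) for a move-match rate, in percent.

    Uses the normal approximation to the binomial distribution, which is
    reasonable when the number of positions is in the thousands.
    """
    if positions <= 0 or not 0 <= matches <= positions:
        raise ValueError("need 0 <= matches <= positions and positions > 0")
    p = matches / positions
    se = math.sqrt(p * (1.0 - p) / positions)  # standard error of the proportion
    return 100.0 * (p - z * se), 100.0 * (p + z * se)


# Hypothetical example: 5000 matching moves out of 10000 positions.
low, high = match_rate_ci(5000, 10000)
print(f"{low:.2f}% .. {high:.2f}%")
```

Two engines compared on the same positions are not independent samples, so overlapping or non-overlapping intervals are only a rough guide.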
Ozymandias
Posts : 622 Join date : 2020-11-23
Subject: Re: Fire 8 released Sat 6 Mar 2021 - 16:00
mwyoung wrote:
Ozymandias wrote:
For some reason, your 100ms results aren't comparable to mine, so do you think you could run the GitHub net? 250ms looks good enough.
Yes, post the link here and I will run your request as well.
Chris Whittington
Posts : 1254 Join date : 2020-11-17 Location : France
Subject: Re: Fire 8 released Sat 6 Mar 2021 - 17:01
mwyoung wrote:
"The research is about if the sim-test still makes sense with NNUE engines. And the conclusion is that there is no sense, for NNUE engines the sim-test is dead." Ed Schroder
Posted for clarification.
Meaning that any sim-test result with NNUE engines, whether it shows a red flag or the opposite, most likely means nothing. But you also need to show with data that this conclusion is correct.
Nobody paid any attention to my post the other day. You’re all too effing slow to grasp the point! Stop comparing moves and start comparing the evaluation. To tell whether an NNUE was trained on Stockfish, compare the scale, scoring range and MSE of its evals against SF evals. If something has SF search, more or less the same Elo, and its NNUE was trained using SF, then it more or less is SF. Which is what you’re looking for, no?
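The three quantities mentioned here (scale, scoring range and MSE against SF evals) can be computed from two lists of centipawn evals taken on the same positions. This is a minimal sketch under stated assumptions: the evals have already been collected (for example from each engine's UCI `info ... score cp` output), mate scores have been filtered out, and "scale" is taken as the least-squares slope of a line through the origin, which is only one possible choice. The function name `compare_eval_scales` is hypothetical.

```python
from statistics import fmean


def compare_eval_scales(candidate: list[float], reference: list[float]) -> dict[str, float]:
    """Compare a candidate engine's evals with reference (e.g. SF) evals.

    Both lists are in centipawns and cover the same positions in the same order.
    """
    if len(candidate) != len(reference) or not candidate:
        raise ValueError("need two non-empty lists of equal length")
    # Scoring range of each engine over the test positions.
    cand_range = max(candidate) - min(candidate)
    ref_range = max(reference) - min(reference)
    # Scale factor k minimising sum((c - k*r)^2): least squares through the origin.
    num = sum(c * r for c, r in zip(candidate, reference))
    den = sum(r * r for r in reference)
    scale = num / den if den else float("nan")
    # Mean squared error of the raw evals (centipawns squared).
    mse = fmean((c - r) ** 2 for c, r in zip(candidate, reference))
    return {"candidate_range": cand_range, "reference_range": ref_range,
            "scale": scale, "mse": mse}
```

A scale near 1 together with a similar range and a small MSE would suggest evals on the same footing as the reference; a scale far from 1 mainly says the two nets report scores in different units.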
Ozymandias
Posts : 622 Join date : 2020-11-23
Subject: Re: Fire 8 released Sat 6 Mar 2021 - 18:53
In my case, I'm interested in seeing whether it's "more" or "less" like SF.
Admin
Posts : 2608 Join date : 2020-11-17 Location : Netherlands
Subject: Re: Fire 8 released Sat 6 Mar 2021 - 18:55
Chris Whittington wrote:
Nobody paid any attention to my post the other day. You’re all too effing slow to grasp the point! Stop comparing moves and start comparing the evaluation. To tell whether an NNUE was trained on Stockfish, compare the scale, scoring range and MSE of its evals against SF evals. If something has SF search, more or less the same Elo, and its NNUE was trained using SF, then it more or less is SF. Which is what you’re looking for, no?
Score similarity (instead of move similarity) has been tried by Miguel Ballicora, author of Gaviota. It never made it to a release, and I can see why; I have seen beta examples. Can you come up with an idea of how to do it?
Chris Whittington
Posts : 1254 Join date : 2020-11-17 Location : France
Subject: Re: Fire 8 released Sat 6 Mar 2021 - 20:45
Admin wrote:
Score similarity (instead of move similarity) has been tried by Miguel Ballicora, author of Gaviota. It never made it to a release, and I can see why; I have seen beta examples. Can you come up with an idea of how to do it?
Use the first SF NNUE as the baseline. Is that SF12? Get the evals from a fixed-depth search (any depth) or a timed search for, say, 100000 positions. Do the same for FF. Calculate the MSE or the mean absolute error of the differences. That gives you a single number (in centipawns for the mean absolute error, centipawns squared for the MSE); the smaller it is, the more alike the evals, and therefore the NNUEs, are. Do the same with some other NN engines. Will FF be a similarity outlier? I’m not saying you’re going to find anything, but that would be where to start.
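The procedure described here (one baseline, several engines, one distance number per engine) could look roughly like this. It is a sketch assuming the evals for every engine were collected on the same positions in the same order; `eval_distance` and `rank_against_baseline` are hypothetical names, and the engine names and numbers in the example are made up. It reports both the mean absolute error and the root of the MSE, both of which are in centipawns.

```python
import math
from statistics import fmean


def eval_distance(evals: list[float], baseline: list[float]) -> tuple[float, float]:
    """Return (mean absolute error, root mean squared error), both in centipawns."""
    if len(evals) != len(baseline) or not evals:
        raise ValueError("eval lists must be non-empty and cover the same positions")
    diffs = [e - b for e, b in zip(evals, baseline)]
    mae = fmean(abs(d) for d in diffs)
    rmse = math.sqrt(fmean(d * d for d in diffs))
    return mae, rmse


def rank_against_baseline(engines: dict[str, list[float]],
                          baseline: list[float]) -> list[tuple[str, float]]:
    """Sort engines by mean absolute error against the baseline, most similar first."""
    scored = [(name, eval_distance(evals, baseline)[0]) for name, evals in engines.items()]
    return sorted(scored, key=lambda item: item[1])


# Made-up numbers for three positions; a real run would use many thousands.
baseline = [0.0, 100.0, -50.0]  # e.g. the baseline NNUE's evals
engines = {
    "A": [10.0, 90.0, -40.0],
    "B": [0.0, 100.0, -50.0],
    "C": [50.0, 150.0, 0.0],
}
for name, mae in rank_against_baseline(engines, baseline):
    print(f"{name}: {mae:.1f} cp")
```

An engine whose distance to the baseline is much smaller than that of the other NN engines would be the kind of similarity outlier asked about here.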
Admin
Posts : 2608 Join date : 2020-11-17 Location : Netherlands
Subject: Re: Fire 8 released Sat 6 Mar 2021 - 21:09
Chris Whittington wrote:
Use the first SF NNUE as the baseline. Is that SF12? Get the evals from a fixed-depth search (any depth) or a timed search for, say, 100000 positions. Do the same for FF. Calculate the MSE or the mean absolute error of the differences. That gives you a single number (in centipawns for the mean absolute error, centipawns squared for the MSE); the smaller it is, the more alike the evals, and therefore the NNUEs, are. Do the same with some other NN engines. Will FF be a similarity outlier? I’m not saying you’re going to find anything, but that would be where to start.
Ah, the same thing we did in the debate with Bob when he questioned the sim-test: calculate the average score.
mwyoung
Posts : 880 Join date : 2020-11-25 Location : USA
Subject: Re: Fire 8 released Sat 6 Mar 2021 - 21:17
No one is saying we are not going to find anything. Just a clarification: the results currently being posted were generated with millisecond searches and are not conclusive.
Testing still continues...
Chris Whittington
Posts : 1254 Join date : 2020-11-17 Location : France
Subject: Re: Fire 8 released Sat 6 Mar 2021 - 23:17
Admin wrote:
Ah, the same thing we did in the debate with Bob when he questioned the sim-test: calculate the average score.
Not sure about that. I don’t think we averaged any score deltas.
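The difference between comparing average scores and averaging score deltas matters: two engines can have identical average scores while disagreeing on every single position. A minimal illustration, with hypothetical function names and made-up scores:

```python
from statistics import fmean


def average_score_gap(a: list[float], b: list[float]) -> float:
    """Difference between the two engines' average scores (centipawns)."""
    return fmean(a) - fmean(b)


def mean_abs_delta(a: list[float], b: list[float]) -> float:
    """Average of the per-position absolute score differences (centipawns)."""
    if len(a) != len(b) or not a:
        raise ValueError("score lists must be non-empty and cover the same positions")
    return fmean(abs(x - y) for x, y in zip(a, b))


# Made-up scores: the engines disagree in both positions, yet their averages are equal.
engine_x = [100.0, -100.0]
engine_y = [-100.0, 100.0]
print(average_score_gap(engine_x, engine_y))  # 0.0
print(mean_abs_delta(engine_x, engine_y))     # 200.0
```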
Admin
Posts : 2608 Join date : 2020-11-17 Location : Netherlands
Subject: Re: Fire 8 released Sat 6 Mar 2021 - 23:29
I remember I did: as Crafty's similarity to Fruit increased, the average scores became almost equal. Anyway, I calculated the average score for 100ms and 250ms. Maybe I should try a more advanced way of counting.
Code:
100ms
Positions 8238  Ether  Nemor  RubiC  SF12c  SF12n  SF13c  SF13n  eval
Ethereal-NNUE   -----  52.73  54.95  46.81  56.68  47.29  54.21  0.14
Nemorino 6.00   52.73  -----  54.12  46.05  53.73  45.60  50.42  0.09
RubiChess-NNUE  54.95  54.12  -----  45.28  55.13  45.14  53.03  0.13
SF12-classic    46.81  46.05  45.28  -----  48.89  62.43  49.92  0.22
SF12            56.68  53.73  55.13  48.89  -----  48.74  56.76  0.04
SF13-classic    47.29  45.60  45.14  62.43  48.74  -----  50.50  0.22
SF13            54.21  50.42  53.03  49.92  56.76  50.50  -----  0.07

250ms
Positions 8238  Ether  Nemor  RubiC  SF12c  SF12n  SF13c  SF13n  eval
Ethereal-NNUE   -----  53.93  58.70  50.81  61.82  49.97  58.40  0.13
Nemorino 6.00   53.93  -----  55.08  47.46  54.56  47.24  51.90  0.09
RubiChess-NNUE  58.70  55.08  -----  48.55  59.71  48.78  56.32  0.12
SF12-classic    50.81  47.46  48.55  -----  52.29  65.50  53.87  0.19
SF12            61.82  54.56  59.71  52.29  -----  51.91  61.36  0.04
SF13-classic    49.97  47.24  48.78  65.50  51.91  -----  53.94  0.20
SF13            58.40  51.90  56.32  53.87  61.36  53.94  -----  0.05
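The matrices above are pairwise similarity tables. Assuming each cell is the percentage of the 8238 positions on which both engines chose the same move (as a move-based sim-test reports; how the "eval" column was derived is not stated in the post, so it is left out here), such a matrix could be computed from each engine's chosen moves as sketched below. Moves are assumed to be strings in UCI notation, and `similarity_matrix` is a hypothetical name.

```python
from itertools import combinations


def similarity_matrix(best_moves: dict[str, list[str]]) -> dict[tuple[str, str], float]:
    """Percentage of positions on which each pair of engines chose the same move.

    best_moves maps an engine name to its chosen move for every test position,
    in the same position order for all engines.
    """
    if len(best_moves) < 2:
        raise ValueError("need at least two engines")
    lengths = {len(moves) for moves in best_moves.values()}
    if len(lengths) != 1 or 0 in lengths:
        raise ValueError("every engine needs one move per position")
    n = lengths.pop()
    matrix: dict[tuple[str, str], float] = {}
    for a, b in combinations(best_moves, 2):
        same = sum(x == y for x, y in zip(best_moves[a], best_moves[b]))
        matrix[(a, b)] = matrix[(b, a)] = 100.0 * same / n  # the table is symmetric
    return matrix


# Made-up moves for four positions.
moves = {
    "A": ["e2e4", "d2d4", "g1f3", "c2c4"],
    "B": ["e2e4", "d2d4", "b1c3", "c2c4"],
    "C": ["e2e4", "e2e3", "b1c3", "a2a3"],
}
sim = similarity_matrix(moves)
print(sim[("A", "B")], sim[("A", "C")], sim[("B", "C")])
```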
mwyoung
Posts : 880 Join date : 2020-11-25 Location : USA
Subject: Re: Fire 8 released Sun 7 Mar 2021 - 6:14
Results at 2500ms NNUE=False.
mwyoung
Posts : 880 Join date : 2020-11-25 Location : USA
Subject: Re: Fire 8 released Sun 7 Mar 2021 - 16:07
Results at 1000ms NNUE=False.
mwyoung
Posts : 880 Join date : 2020-11-25 Location : USA
Subject: Re: Fire 8 released Mon 8 Mar 2021 - 6:09
Results at 500ms NNUE=False.
mwyoung
Posts : 880 Join date : 2020-11-25 Location : USA
Subject: Re: Fire 8 released Mon 8 Mar 2021 - 8:07
Results at 250ms NNUE=False.
Admin
Posts : 2608 Join date : 2020-11-17 Location : Netherlands
Subject: Re: Fire 8 released Mon 8 Mar 2021 - 8:15
@Mark, did you receive my PM?
mwyoung
Posts : 880 Join date : 2020-11-25 Location : USA
Subject: Re: Fire 8 released Mon 8 Mar 2021 - 8:59
Admin wrote:
@Mark, did you receive my PM?
I did, and I will send you all I have. If something is not right, let me know. 1 more run: 100ms. I just got home from work, and the run will be quick.
Last edited by mwyoung on Mon 8 Mar 2021 - 9:18; edited 2 times in total
mwyoung
Posts : 880 Join date : 2020-11-25 Location : USA
Subject: Re: Fire 8 released Mon 8 Mar 2021 - 9:00
Results at 100ms NNUE=False.
Ozymandias
Posts : 622 Join date : 2020-11-23
Subject: Re: Fire 8 released Mon 8 Mar 2021 - 17:00
mwyoung wrote:
1 more run.
Did I miss the GitHub results?
mwyoung
Posts : 880 Join date : 2020-11-25 Location : USA
Subject: Re: Fire 8 released Mon 8 Mar 2021 - 18:14
Ozymandias wrote:
mwyoung wrote:
1 more run.
Did I miss the GitHub results?
You are next.
mwyoung
Posts : 880 Join date : 2020-11-25 Location : USA
Subject: Re: Fire 8 released Mon 8 Mar 2021 - 21:08
Ozymandias wrote:
mwyoung wrote:
1 more run.
Did I miss the GitHub results?
Ozymandias
Posts : 622 Join date : 2020-11-23
Subject: Re: Fire 8 released Mon 8 Mar 2021 - 22:20
Thx. And you said the 100ms run gave you results similar to these ones?