Fire 8 released

Subject: Re: Fire 8 released Sat 6 Mar 2021 - 12:56

TheSelfImprover wrote:

mwyoung wrote:: Results at 1000ms. Corrected.

Apologies for pointing out the obvious and for being negative, but you've also got a high similarity between SF12 and SF13. Are you trying to confirm or disprove something?

We're trying to find out whether the sim-test still makes sense for NNUE engines; my opinion is that it doesn't.

Posts : 880 Join date : 2020-11-25 Location : USA

Ozymandias wrote:: For some reason, your 100ms results aren't comparable to mine, so do you think you could run the Github net? 250ms looks good enough.

Yes, post the link here. And I will also run your request.

Posts : 880 Join date : 2020-11-25 Location : USA

"The research is about if the sim-test still makes sense with NNUE engines. And the conclusion is that there is no sense, for NNUE engines the sim-test is dead." Ed Schroder

Posted for clarification.

Meaning that any sim-test result with NNUE engines, whether it shows a red flag or the opposite, most likely means nothing. But you also need to show with data that this conclusion is correct.

Posts : 622 Join date : 2020-11-23

mwyoung wrote:

Ozymandias wrote:: For some reason, your 100ms results aren't comparable to mine, so do you think you could run the Github net? 250ms looks good enough.

Yes, post the link here. And I will also run your request.

There you go, thx in advance.

Posts : 1254 Join date : 2020-11-17 Location : France

mwyoung wrote:: "The research is about if the sim-test still makes sense with NNUE engines. And the conclusion is that there is no sense, for NNUE engines the sim-test is dead." Ed Schroder

Posted for clarification.

Meaning that any sim-test result with NNUE engines, whether it shows a red flag or the opposite, most likely means nothing. But you also need to show with data that this conclusion is correct.

Nobody paid any attention to my post the other day. You’re all too effing slow to grasp the point!
Stop comparing moves and start comparing the evaluation. To tell if a NNUE was trained on Stockfish, compare the scale, scoring range and MSE of its evals against SF evals.
If something has SF search, more or less the same Elo, and its NNUE was trained using SF, then it more or less is SF. Which is what you’re looking for, no?

Posts : 622 Join date : 2020-11-23

In my case, I'm interested in seeing whether it's "more" or "less" like SF.

Subject: Re: Fire 8 released Sat 6 Mar 2021 - 18:55

Chris Whittington wrote:

Nobody paid any attention to my post the other day. You’re all too effing slow to grasp the point!
Stop comparing moves and start comparing the evaluation. To tell if a NNUE was trained on Stockfish, compare the scale, scoring range and MSE of its evals against SF evals.
If something has SF search, more or less the same Elo, and its NNUE was trained using SF, then it more or less is SF. Which is what you’re looking for, no?

Score similarity (instead of move similarity) has been tried by Miguel Ballicora, author of Gaviota. It never came to a release, which I can understand; I have seen beta examples. Can you come up with an idea of how to do it?

Posts : 1254 Join date : 2020-11-17 Location : France

Admin wrote:

Score similarity (instead of move similarity) has been tried by Miguel Ballicora, author of Gaviota. It never came to a release, which I can understand; I have seen beta examples. Can you come up with an idea of how to do it?

Use the first SF NNUE as baseline. Is that SF12?
Get the evals from a search to whatever fixed depth, or from a timed search, for say 100,000 positions.
Same thing for FF.
Calculate the MSE, or the mean absolute error (difference). That gives you one number (in centipawns for the mean absolute error, centipawns squared for the MSE); the smaller it is, the more alike the evals (and therefore the NNUEs) are. Do the same thing with other NN engines. Will FF be a similarity outlier?
Not saying you’re going to find anything, but that would be where to start.
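
A minimal sketch of the comparison described above, assuming the evals have already been collected as two equal-length lists of centipawn scores for the same positions in the same order. The function name `eval_distance` and the sample numbers are hypothetical, made up for illustration; how the evals are extracted from each engine is left out.

```python
from statistics import fmean


def eval_distance(baseline_cp, candidate_cp):
    """Return (mse, mae) between two equal-length lists of centipawn evals.

    baseline_cp:  evals from the baseline engine, one per position
    candidate_cp: evals from the engine under test, same positions, same order
    """
    if not baseline_cp or len(baseline_cp) != len(candidate_cp):
        raise ValueError("need two non-empty eval lists of equal length")

    # Per-position difference between the two engines' evals.
    diffs = [c - b for b, c in zip(baseline_cp, candidate_cp)]

    mse = fmean([d * d for d in diffs])   # mean squared error (centipawns squared)
    mae = fmean([abs(d) for d in diffs])  # mean absolute error (centipawns)
    return mse, mae


# Made-up evals for four positions, for illustration only.
baseline = [30, -15, 120, 0]
candidate = [25, -10, 100, 10]
print(eval_distance(baseline, candidate))
```

Running the same function for every engine against the same baseline gives one pair of numbers per engine, so an outlier would show up as an unusually small distance.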

Subject: Re: Fire 8 released Sat 6 Mar 2021 - 21:09

Chris Whittington wrote:

Use the first SF NNUE as baseline. Is that SF12?
Get the evals from a search to whatever fixed depth, or from a timed search, for say 100,000 positions.
Same thing for FF.
Calculate the MSE, or the mean absolute error (difference). That gives you one number (in centipawns for the mean absolute error, centipawns squared for the MSE); the smaller it is, the more alike the evals (and therefore the NNUEs) are. Do the same thing with other NN engines. Will FF be a similarity outlier?
Not saying you’re going to find anything, but that would be where to start.

Ah, same thing we did in the debate with Bob when he questioned the sim-test. Calculate the average score.

Posts : 880 Join date : 2020-11-25 Location : USA

No one is saying we are not going to find anything. Just a clarification: the results generated so far with millisecond searches are not conclusive.

Testing still continues...

Posts : 1254 Join date : 2020-11-17 Location : France

Admin wrote:

Ah, same thing we did in the debate with Bob when he questioned the sim-test. Calculate the average score.

Not sure about that. I don’t think we did an average on any delta scores.

Subject: Re: Fire 8 released Sat 6 Mar 2021 - 23:29

I remember I did: as Crafty's similarity to Fruit increased, the average scores became almost equal. Anyway, I calculated the average score for 100ms and 250ms. Maybe I should try a more advanced way of counting.

Code:

100ms
Positions 8238    Ether  Nemor  RubiC  SF12c  SF12n  SF13c  SF13n  eval
Ethereal-NNUE     -----  52.73  54.95  46.81  56.68  47.29  54.21  0.14
Nemorino 6.00     52.73  -----  54.12  46.05  53.73  45.60  50.42  0.09
RubiChess-NNUE    54.95  54.12  -----  45.28  55.13  45.14  53.03  0.13
SF12-classic      46.81  46.05  45.28  -----  48.89  62.43  49.92  0.22
SF12              56.68  53.73  55.13  48.89  -----  48.74  56.76  0.04
SF13-classic      47.29  45.60  45.14  62.43  48.74  -----  50.50  0.22
SF13              54.21  50.42  53.03  49.92  56.76  50.50  -----  0.07

250ms
Positions 8238    Ether  Nemor  RubiC  SF12c  SF12n  SF13c  SF13n  eval
Ethereal-NNUE     -----  53.93  58.70  50.81  61.82  49.97  58.40  0.13
Nemorino 6.00     53.93  -----  55.08  47.46  54.56  47.24  51.90  0.09
RubiChess-NNUE    58.70  55.08  -----  48.55  59.71  48.78  56.32  0.12
SF12-classic      50.81  47.46  48.55  -----  52.29  65.50  53.87  0.19
SF12              61.82  54.56  59.71  52.29  -----  51.91  61.36  0.04
SF13-classic      49.97  47.24  48.78  65.50  51.91  -----  53.94  0.20
SF13              58.40  51.90  56.32  53.87  61.36  53.94  -----  0.05

Posts : 880 Join date : 2020-11-25 Location : USA

Results at 2500ms NNUE=False.

Posts : 880 Join date : 2020-11-25 Location : USA

Results at 1000ms NNUE=False.

Posts : 880 Join date : 2020-11-25 Location : USA

Results at 500ms NNUE=False.

Posts : 880 Join date : 2020-11-25 Location : USA

Results at 250ms NNUE=False.

Subject: Re: Fire 8 released Mon 8 Mar 2021 - 8:15

@Mark, did you receive my PM?

Posts : 880 Join date : 2020-11-25 Location : USA

Admin wrote:: @Mark, did you receive my PM?

I did. And I will send you all I have. If something is not right, let me know. One more run, at 100ms. I just got home from work, and the run will be quick.