Posts : 2522 Join date : 2020-11-17 Location : Netherlands
Subject: Re: Fire 8 released Sat Mar 06, 2021 12:56 pm
TheSelfImprover wrote:
mwyoung wrote:
Results at 1000ms. Corrected.
Apologies for pointing out the obvious and for being negative, but you've also got a high similarity between SF12 and SF13. Are you trying to confirm or disprove something?
We are trying to find out whether the sim-test still makes sense for NNUE engines; my opinion is that it doesn't.
mwyoung
Posts : 880 Join date : 2020-11-25 Location : USA
Subject: Re: Fire 8 released Sat Mar 06, 2021 3:33 pm
Ozymandias wrote:
For some reason, your 100ms results aren't comparable to mine, so do you think you could run the Github net? 250ms looks good enough.
Yes, post the link here. And I will also run your request.
mwyoung
Posts : 880 Join date : 2020-11-25 Location : USA
Subject: Re: Fire 8 released Sat Mar 06, 2021 3:44 pm
"The research is about if the sim-test still makes sense with NNUE engines. And the conclusion is that there is no sense, for NNUE engines the sim-test is dead." Ed Schroder
Posted for clarification.
Meaning that any sim-test result showing a red flag, or the opposite, with NNUE engines most likely means nothing. But you also need to show with data that this conclusion is correct.
Ozymandias
Posts : 622 Join date : 2020-11-23
Subject: Re: Fire 8 released Sat Mar 06, 2021 4:00 pm
mwyoung wrote:
Ozymandias wrote:
For some reason, your 100ms results aren't comparable to mine, so do you think you could run the Github net? 250ms looks good enough.
Yes, post the link here. And I will also run your request.
Chris Whittington
Posts : 1254 Join date : 2020-11-17 Location : France
Subject: Re: Fire 8 released Sat Mar 06, 2021 5:01 pm
mwyoung wrote:
"The research is about if the sim-test still makes sense with NNUE engines. And the conclusion is that there is no sense, for NNUE engines the sim-test is dead." Ed Schroder
Posted for clarification.
Meaning that any sim-test result showing a red flag, or the opposite, with NNUE engines most likely means nothing. But you also need to show with data that this conclusion is correct.
Nobody paid any attention to my post the other day. You’re all too effing slow to grasp the point! Stop comparing moves and start comparing the evaluation. To tell if an NNUE was trained on Stockfish, compare the scale, scoring range and MSE of its evals against SF evals. If something has SF search, more or less the same Elo, and its NNUE was trained using SF, then it more or less is SF. Which is what you’re looking for, no?
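As a rough illustration of the "scale and scoring range" part of that comparison, here is a minimal sketch. It assumes you already have two lists of centipawn evals for the same positions, one per engine; the function names `eval_profile` and `scale_ratio` are made up for this example, and using the standard deviation as the measure of scale is just one possible choice.

```python
from statistics import mean, pstdev

def eval_profile(evals):
    """Summarise a list of centipawn evals: range, mean and spread."""
    return {
        "min": min(evals),
        "max": max(evals),
        "mean": mean(evals),
        "spread": pstdev(evals),  # population standard deviation
    }

def scale_ratio(reference, candidate):
    """Spread of the candidate evals divided by the spread of the reference.

    A value near 1.0 suggests both engines score on a similar scale.
    """
    return pstdev(candidate) / pstdev(reference)

# Toy numbers only, not real engine output.
reference = [-100, 0, 100]
candidate = [-200, 0, 200]
print(eval_profile(candidate))
print(scale_ratio(reference, candidate))
```

A similar scale and range would prove nothing on its own; it would only tell you whether the two sets of evals are directly comparable before looking at per-position differences.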
Ozymandias
Posts : 622 Join date : 2020-11-23
Subject: Re: Fire 8 released Sat Mar 06, 2021 6:53 pm
In my case, I'm interested in seeing whether it's "more" or "less" like SF.
Admin Admin
Posts : 2522 Join date : 2020-11-17 Location : Netherlands
Subject: Re: Fire 8 released Sat Mar 06, 2021 6:55 pm
Chris Whittington wrote:
Nobody paid any attention to my post the other day. You’re all too effing slow to grasp the point! Stop comparing moves and start comparing the evaluation. To tell if an NNUE was trained on Stockfish, compare the scale, scoring range and MSE of its evals against SF evals. If something has SF search, more or less the same Elo, and its NNUE was trained using SF, then it more or less is SF. Which is what you’re looking for, no?
Score similarity (instead of move similarity) has been tried by Miguel Ballicora, author of Gaviota. It never came to a release. I can imagine why; I have seen beta examples. Can you come up with an idea of how to do it?
Chris Whittington
Posts : 1254 Join date : 2020-11-17 Location : France
Subject: Re: Fire 8 released Sat Mar 06, 2021 8:45 pm
Admin wrote:
Score similarity (instead of move similarity) has been tried by Miguel Ballicora, author of Gaviota. It never came to a release. I can imagine why; I have seen beta examples. Can you come up with an idea of how to do it?
Use the first SF NNUE as baseline. Is that SF12? Get the evals from a fixed-depth search (whatever depth), or a timed search, for say 100,000 positions. Do the same for FF. Calculate the MSE, or the mean absolute error (difference). That gives you one number in centipawns; the smaller it is, the more the evals (and therefore the NNUEs) are the same. Do the same thing with other NN engines. Will FF be a similarity outlier? Not saying you’re going to find anything, but that would be where to start.
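The procedure described above could be sketched roughly as follows. This assumes the evals have already been collected as lists of centipawn scores over the same positions, in the same order; the names `eval_distance` and `rank_by_similarity` are hypothetical, and how the evals are extracted from each engine is left out. One small caveat: strictly speaking the MSE comes out in squared centipawns, and it is the mean absolute error that is in plain centipawns.

```python
from statistics import mean

def eval_distance(baseline, candidate):
    """Return (MSE, MAE) between two equal-length lists of centipawn evals."""
    if len(baseline) != len(candidate) or not baseline:
        raise ValueError("need two non-empty eval lists of equal length")
    diffs = [c - b for b, c in zip(baseline, candidate)]
    mse = mean(d * d for d in diffs)   # squared centipawns
    mae = mean(abs(d) for d in diffs)  # centipawns
    return mse, mae

def rank_by_similarity(baseline, engines):
    """Sort engine names by MAE against the baseline, most similar first."""
    return sorted(engines, key=lambda name: eval_distance(baseline, engines[name])[1])

# Toy numbers only, not real engine output.
baseline = [10, -20, 35, 0]
engines = {
    "engine_a": [12, -25, 35, 4],
    "engine_b": [60, 40, -10, 90],
}
print(eval_distance(baseline, engines["engine_a"]))
print(rank_by_similarity(baseline, engines))
```

An engine whose distance to the baseline is much smaller than that of the other NN engines would be the similarity outlier being asked about.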
Admin Admin
Posts : 2522 Join date : 2020-11-17 Location : Netherlands
Subject: Re: Fire 8 released Sat Mar 06, 2021 9:09 pm
Chris Whittington wrote:
Use the first SF NNUE as baseline. Is that SF12? Get the evals from a fixed-depth search (whatever depth), or a timed search, for say 100,000 positions. Do the same for FF. Calculate the MSE, or the mean absolute error (difference). That gives you one number in centipawns; the smaller it is, the more the evals (and therefore the NNUEs) are the same. Do the same thing with other NN engines. Will FF be a similarity outlier? Not saying you’re going to find anything, but that would be where to start.
Ah, same thing we did in the debate with Bob when he questioned the sim-test. Calculate the average score.
mwyoung
Posts : 880 Join date : 2020-11-25 Location : USA
Subject: Re: Fire 8 released Sat Mar 06, 2021 9:17 pm
No one is saying we are not going to find anything. Just a clarification: the results being generated with millisecond searches are not conclusive, going by the output posted so far.
Testing still continues...
Chris Whittington
Posts : 1254 Join date : 2020-11-17 Location : France
Subject: Re: Fire 8 released Sat Mar 06, 2021 11:17 pm
Admin wrote:
Ah, same thing we did in the debate with Bob when he questioned the sim-test. Calculate the average score.
Not sure about that. I don’t think we did an average on any delta scores.
Admin Admin
Posts : 2522 Join date : 2020-11-17 Location : Netherlands
Subject: Re: Fire 8 released Sat Mar 06, 2021 11:29 pm
I remember I did: as Crafty's similarity to Fruit increased, the average scores became almost equal. Anyway, I calculated the average score for 100ms and 250ms. Maybe I should try a more advanced way of counting.