Posts : 2528 Join date : 2020-11-17 Location : Netherlands
Subject: GPU speed benchmark Sat May 01, 2021 5:41 pm
Please run this little benchmark. I will maintain the results.
Start Lc0 v27 (or a very recent version) from the command line, then type "go movetime 10000" at its prompt. Lc0 will run for 10 seconds. Then post your GPU, the NPS and the NN size (128, 256 or 384).
My 1060 6 GB:
NN 128 - nps = 19,443
NN 256 - nps = 3,736
NN 384 - nps = 1,147
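For anyone who would rather script the run than type it by hand, here is a minimal Python sketch of the same two steps (start Lc0, then send "go movetime 10000") that picks the NPS out of the engine output. It assumes the executable is reachable as "lc0", that it finds its network on its own, and that it reports speed in standard UCI "info ... nps N" lines; the function names are made up for this example.

```python
import re
import subprocess

# Standard UCI info lines carry the speed as "nps <number>".
NPS_RE = re.compile(r"\bnps (\d+)")


def parse_nps(info_line):
    """Return the nps value from a UCI 'info' line, or None if there is none."""
    match = NPS_RE.search(info_line)
    return int(match.group(1)) if match else None


def run_benchmark(lc0_path="lc0", movetime_ms=10000):
    """Start the engine, send 'go movetime', and return the last nps it printed.

    lc0_path is an assumption: point it at your own executable.
    """
    engine = subprocess.Popen(
        [lc0_path],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,
    )
    last_nps = None
    try:
        # Step 2 from the post: the command goes to the running engine,
        # not onto the lc0 command line.
        engine.stdin.write(f"go movetime {movetime_ms}\n")
        engine.stdin.flush()
        for line in engine.stdout:
            nps = parse_nps(line)
            if nps is not None:
                last_nps = nps
            if line.startswith("bestmove"):
                break  # search finished
        engine.stdin.write("quit\n")
        engine.stdin.flush()
        engine.wait(timeout=30)
    finally:
        if engine.poll() is None:
            engine.kill()  # do not leave a stray engine process behind
    return last_nps
```

Calling run_benchmark("path/to/lc0") should return the final NPS figure, which is the number to post here along with the GPU and net size.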
Mclane
Posts : 2921 Join date : 2020-11-17 Age : 57 Location : United States of Europe, Germany, Ruhr area
Subject: Re: GPU speed benchmark Sat May 01, 2021 8:41 pm
Mine says go is not a command:
"Unknown command line argument: go"
Why not benchmark?! Also, if we all use different nets, the whole comparison makes no sense.
I ran benchmark with the AMD RX470. This uses DX12, while the Nvidia cards use CUDA.
Total time 351347 Nps 727 Net j92-100
Total time 359207 Nps 694 Net j94-130
Last edited by Mclane on Sat May 01, 2021 9:05 pm; edited 2 times in total
Admin Admin
Posts : 2528 Join date : 2020-11-17 Location : Netherlands
Subject: Re: GPU speed benchmark Sat May 01, 2021 8:50 pm
Mclane wrote:
Mine says go is no command Why not benchmark ?! Also if we all use different nets the whole relation makes no sense.
1. First start Lc0, then go movetime 10000
2. These benchmarks on internet are not to be trusted when it comes to chess.
3. Your 1060 should give similar NPS to mine.
Mclane
Posts : 2921 Join date : 2020-11-17 Age : 57 Location : United States of Europe, Germany, Ruhr area
Subject: Re: GPU speed benchmark Sat May 01, 2021 9:14 pm
Admin wrote:
Mclane wrote:
Mine says go is no command Why not benchmark ?! Also if we all use different nets the whole relation makes no sense.
1. First start Lc0, then go movetime 10000
2. These benchmarks on internet are not to be trusted when it comes to chess.
3. Your 1060 should give similar NPS to mine.
I see. My mistake was starting lc0 directly with the command. It takes two steps: start lc0, and then type go movetime.
AMD RX470 (4 GB) (Xeon, 6 cores): 627 NPS with j94-130
Nvidia 1060 (3 GB) (Phenom II X6): 970 NPS with j94-130
Nvidia 2070 (8 GB) (Ryzen 9, 8 cores): 9021 NPS with j94-130
AMD RX570 (8 GB) (Ryzen 5, 6 cores): 4710 NPS with j94-130
Last edited by Mclane on Sat May 01, 2021 10:27 pm; edited 1 time in total
mwyoung
Posts : 880 Join date : 2020-11-25 Location : USA
Subject: Re: GPU speed benchmark Sun May 02, 2021 4:25 am
Lc0 v27 running on a RTX 2080ti
NN 128 - 155,314 nps
NN 192 - 71,118 nps
NN 256 - 39,198 nps
NN 320 - 20,518 nps
NN 384 - 15,769 nps
NN 512 - 8,097 nps
Admin Admin
Posts : 2528 Join date : 2020-11-17 Location : Netherlands
Subject: Re: GPU speed benchmark Sun May 02, 2021 12:04 pm
Mclane wrote:
Admin wrote:
Mclane wrote:
Mine says go is no command Why not benchmark ?! Also if we all use different nets the whole relation makes no sense.
1. First start Lc0, then go movetime 10000
2. These benchmarks on internet are not to be trusted when it comes to chess.
3. Your 1060 should give similar NPS to mine.
I see. My mistake was starting lc0 directly with the command. It takes two steps: start lc0, and then type go movetime.
AMD RX470 (4 GB) (Xeon, 6 cores): 627 NPS with j94-130
Nvidia 1060 (3 GB) (Phenom II X6): 970 NPS with j94-130
Nvidia 2070 (8 GB) (Ryzen 9, 8 cores): 9021 NPS with j94-130
AMD RX570 (8 GB) (Ryzen 5, 6 cores): 4710 NPS with j94-130
Numbers make sense since j94-130 is a typical 384x30 net.
Admin Admin
Posts : 2528 Join date : 2020-11-17 Location : Netherlands
Subject: Re: GPU speed benchmark Sun May 02, 2021 12:09 pm
mwyoung wrote:
Lc0 v27 running on a RTX 2080ti
NN 128 - 155,314 nps
NN 192 - 71,118 nps
NN 256 - 39,198 nps
NN 320 - 20,518 nps
NN 384 - 15,769 nps
NN 512 - 8,097 nps
Great!
So that means my poor 1060 runs 39198/3736 ≈ 10.49 times slower on the 256 net.
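The same division can be repeated for the other net sizes both cards reported. A small Python sketch, using only the numbers posted in this thread and assuming the period in the 1060 figures is a thousands separator (so 3.736 means 3736 nps); the dictionary and function names are only for illustration:

```python
# NPS figures as posted in this thread (net size -> nodes per second).
# Assumption: the 1060 numbers use "." as a thousands separator.
GTX_1060 = {128: 19443, 256: 3736, 384: 1147}
RTX_2080_TI = {128: 155314, 256: 39198, 384: 15769}


def speed_ratios(fast, slow):
    """Return how many times faster `fast` is than `slow`, per shared net size."""
    return {size: fast[size] / slow[size] for size in fast if size in slow}


ratios = speed_ratios(RTX_2080_TI, GTX_1060)
for size, ratio in sorted(ratios.items()):
    print(f"NN {size}: 2080 Ti is {ratio:.2f}x the 1060")
```

On these figures the gap is not constant: it grows with the net size, so a single "times slower" number only holds for the net it was measured on.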
Mclane
Posts : 2921 Join date : 2020-11-17 Age : 57 Location : United States of Europe, Germany, Ruhr area
Subject: Re: GPU speed benchmark Sun May 02, 2021 12:40 pm
There are years between these graphics cards. What is important is that the GPU matches the CPU.
Admin Admin
Posts : 2528 Join date : 2020-11-17 Location : Netherlands
Subject: Re: GPU speed benchmark Sun May 02, 2021 2:00 pm
Mclane wrote:
There are years between these graphics cards. What is important is that the GPU matches the CPU.
But it never can fit, apples and oranges. I think the CCRL folks found an elegant compromise, on the 40/2 only NN engines on RTX-2080 versus PC on 8 cores, and no NN engines on 40/15. Elegant, but it does not say anything about who is stronger. It's based on the assumption that SF is equal in strength to Lc0 in this configuration. A fair rating list can only be done on equal hardware.
Mclane
Posts : 2921 Join date : 2020-11-17 Age : 57 Location : United States of Europe, Germany, Ruhr area
Subject: Re: GPU speed benchmark Sun May 02, 2021 2:06 pm
Yes, but even on my 2070, Lc0 is stronger than Stockfish 13 on 8 cores.
mwyoung
Posts : 880 Join date : 2020-11-25 Location : USA
Subject: Re: GPU speed benchmark Sun May 02, 2021 2:27 pm
Admin wrote:
Mclane wrote:
There are years between these graphics cards. What is important is that the GPU matches the CPU.
But it never can fit, apples and oranges. I think the CCRL folks found an elegant compromise, on the 40/2 only NN engines on RTX-2080 versus PC on 8 cores, and no NN engines on 40/15. Elegant, but it does not say anything about who is stronger. It's based on the assumption that SF is equal in strength to Lc0 in this configuration. A fair rating list can only be done on equal hardware.
It can never be fair. What is fair? When Lc0 wins, or has the same cost in hardware, or has the same Lc0 ratio as the AZ match? I just rate Lc0 as Lc0 on x hardware, CPU or GPU, with the net name, and rate Lc0 like a standalone dedicated chess computer.
The problem for Lc0 is that you must rate the hardware and the net, as both are major factors in the overall rating of the group.
For example "CCRL Rating - 5. Lc0 0.26.3 t40-1541 RTX2080 3652 +16 −16 59.7% −61.5 62.4% 1094"
CCRL rating is misleading, as the t40 net is weak compared to the better nets.
The other issue is time controls. Lc0 plays much better with more time, so playing only blitz is also misleading.
mwyoung
Posts : 880 Join date : 2020-11-25 Location : USA
Subject: Re: GPU speed benchmark Sun May 02, 2021 2:29 pm
Mclane wrote:
Yes, but even on my 2070, Lc0 is stronger than Stockfish 13 on 8 cores.
At what time control, and with which net? I had Stockfish beat my RTX with 1 core at fast time controls.
mwyoung
Posts : 880 Join date : 2020-11-25 Location : USA
Subject: Re: GPU speed benchmark Sun May 02, 2021 2:54 pm
Admin wrote:
mwyoung wrote:
Lc0 v27 running on a RTX 2080ti
NN 128 - 155,314 nps
NN 192 - 71,118 nps
NN 256 - 39,198 nps
NN 320 - 20,518 nps
NN 384 - 15,769 nps
NN 512 - 8,097 nps
Great!
So that means my poor 1060 runs 39198/3736 ≈ 10.49 times slower on the 256 net.
Lc0 runs fast on tensor cores with fp16, and even a cheap RTX will work great, if you can find one.
What is also great about Lc0 is that if you upgrade, you can keep your GTX and run Lc0 on any two cards.
I have 2x 2080 Super GPUs. I used the net 68002 (one of the 384 nets, which tests strongest for my hardware). I activated the tablebase files, because I always do that. I set the threads higher than you are supposed to but I find it causes no problems ever, including with speed.
I notice that the LC0 output claims 31346 NPS, so it must be accumulating all of the nodes from all of the previous plies, whereas I used the numbers from only the last ply. So I think that the LC0 output is correct.
When you have 2 GPUs, can Lc0 profit from that, like a CPU can use 2 threads?
That is right. But it has a limit, I think. Last I saw, three GPUs was the maximum that could be utilized effectively. But as hardware advances things like that change, so I do not know if it is still true. I sure bought mine at the right time. It would cost me three or four times as much if I bought them today.
When you have 2 GPUs, can Lc0 profit from that, like a CPU can use 2 threads?
This is where I told LC0 to use both of my GPUs:
setoption name BackendOptions value backend=cuda-fp16,(gpu=0),(gpu=1)
setoption name Backend value multiplexing
The first setoption tells it what kind of math to use (cuda-fp16) and which GPUs to use (gpu=0),(gpu=1). The second command tells it how to share the work (multiplexing).
I don't really understand the different backend values, but those bench best for me. I have read a document on how to set the parameters (it is old) but it does not really explain anything.
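If you have a different number of cards, the two setoption lines above can be generated rather than typed. This is only a sketch that mirrors the pattern from this post (one shared backend name followed by one "(gpu=N)" group per card, then the multiplexing backend); the function name is made up, and whether "cuda-fp16" is the right backend for your build and cards is an assumption you should check against your own benchmarks.

```python
def multi_gpu_commands(gpu_ids, backend="cuda-fp16"):
    """Build the two UCI setoption lines for running Lc0 on several GPUs.

    gpu_ids: device indices, e.g. [0, 1] for two cards.
    backend: the per-GPU backend name; "cuda-fp16" is the one used in the
             post above and is not guaranteed to suit every card.
    """
    # One "(gpu=N)" group per card, comma separated.
    gpu_groups = ",".join(f"(gpu={gpu_id})" for gpu_id in gpu_ids)
    return [
        f"setoption name BackendOptions value backend={backend},{gpu_groups}",
        "setoption name Backend value multiplexing",
    ]


for command in multi_gpu_commands([0, 1]):
    print(command)
```

The printed lines can be pasted into the Lc0 prompt or sent by a GUI before starting the search.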
mwyoung
Posts : 880 Join date : 2020-11-25 Location : USA
Subject: Re: GPU speed benchmark Sat May 29, 2021 12:42 am
Dann Corbit wrote:
mwyoung wrote:
Lc0 v27 running on a RTX 2080ti
NN 128 - 155,314 nps
NN 192 - 71,118 nps
NN 256 - 39,198 nps
NN 320 - 20,518 nps
NN 384 - 15,769 nps
NN 512 - 8,097 nps
Where do we find 512 nets? I would like to test them.
Thanks, I am running a contest with it against 68002 at one minute base plus one second to get a rough idea of what we have.
Dann Corbit
Posts : 188 Join date : 2020-11-26
Subject: Re: GPU speed benchmark Sat May 29, 2021 6:58 am
Avert your eyes if you are not the sort who enjoys watching baby seals being clubbed to death by angry Canadians:
Code:
  Program       Elo    +    -   Games   Score   Av.Op.   Draws
1 Lc0-68002 :  3311   60   52      80   78.1 %    3089   43.8 %
2 Lc0-512   :  3089   52   60      80   21.9 %    3311   43.8 %
I planned a much longer run, but well, enough bloodshed.
So now the question remains: Was it the net or the card?
I guess it takes at least twice as long to train a double wide net. But whatever the case, this net is no match for 68002.
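As a rough cross-check of that table, the usual logistic Elo model turns a match score into a rating difference. The sketch below assumes that standard formula, which may not be exactly what the rating tool used here; with the 78.1 % score it gives roughly 220 points, close to the 3311 - 3089 = 222 gap in the table. The function name is only for illustration.

```python
import math


def elo_diff_from_score(score):
    """Rating difference implied by a match score under the logistic Elo model.

    score: fraction of points won, strictly between 0 and 1 (0.781 for 78.1 %).
    """
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


print(round(elo_diff_from_score(0.781)))
```

With only 80 games the error bars in the table (+60 / -52) are wide, so the small mismatch between the two figures is well inside the noise.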
mwyoung
Posts : 880 Join date : 2020-11-25 Location : USA
Subject: Re: GPU speed benchmark Sat May 29, 2021 12:14 pm
Dann Corbit wrote:
Avert your eyes if you are not the sort who enjoys watching baby seals being clubbed to death by angry Canadians:
Code:
  Program       Elo    +    -   Games   Score   Av.Op.   Draws
1 Lc0-68002 :  3311   60   52      80   78.1 %    3089   43.8 %
2 Lc0-512   :  3089   52   60      80   21.9 %    3311   43.8 %
I planned a much longer run, but well, enough bloodshed.
So now the question remains: Was it the net or the card?
I guess it takes at least twice as long to train a double wide net. But whatever the case, this net is no match for 68002.
Are you turning on the multi-gather option? This seems to help NPS in many positions. I ran some games at a 12-hour time control with the 512 net against Stockfish 13. I played 4 games and it drew all 4.
So it can play ok, but needs time. And my guess is the net was not fully trained.
Dann Corbit
Posts : 188 Join date : 2020-11-26
Subject: Re: GPU speed benchmark Sat May 29, 2021 9:30 pm
mwyoung wrote:
Are you turning on the multi-gather option?
Probably not. I do not know what most of the settings are for. I have multiplex selected for the administration of the two cards, because I measured that method fastest, but I do not know anything about multi-gather or how to set it.
mwyoung
Posts : 880 Join date : 2020-11-25 Location : USA
Subject: Re: GPU speed benchmark Sun May 30, 2021 2:08 am
I would turn on Multi-Gather. I am testing it now, and it looks like a must-have option. In Lc0 v28 the option will be turned on by default.