sefad Posted November 12, 2019

Hello,

Using CUDA-Z and the AIDA64 GPU benchmark, I can see that the "Device to Host" / "Memory Read" bandwidth is around 2.5 GiB/s on my Thunderbolt 3 eGPU (GTX 1080).

In an OpenGL app I am developing, I retrieve each frame/texture into CPU memory using http://spout.zeal.co/

My textures are 3840x2160 RGBA, and right now it takes around 50 ms to retrieve one texture into client memory. Going by the benchmarks, I should be able to transfer the roughly 31 MB in less than 15 ms, which is what I need at 60 fps.

Spout sets up a PBO and maps it to CPU memory: https://github.com/leadedge/Spout2/blob/master/SpoutSDK/Source/SpoutGLDXinterop.cpp#L2209
It then copies the mapped data with an improved memcpy that uses the SSE2 extension: https://github.com/leadedge/Spout2/blob/master/SpoutSDK/Source/SpoutCopy.cpp#L136
CUDA-Z uses cudaMemcpy.

Would anyone have an idea why I am not achieving the speed shown in the benchmarks?
Could someone tell me how memory is copied from GPU to CPU in AIDA64?
Is there any faster way to copy GPU memory into CPU memory than PBO / mapping / SSE2 copy?

Thanks in advance for any help.

Cheers
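For reference, the numbers in the post above can be checked with a few lines of C++. This is only a sketch of the arithmetic, assuming 4 bytes per RGBA pixel and treating the benchmark's 2.5 GiB/s as sustained throughput; the function names are made up for illustration and are not part of Spout, CUDA-Z, or AIDA64.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only (not from any library).

constexpr double kBytesPerGiB = 1024.0 * 1024.0 * 1024.0;

// Size in bytes of one uncompressed frame.
constexpr std::uint64_t frame_bytes(std::uint64_t width,
                                    std::uint64_t height,
                                    std::uint64_t bytes_per_pixel) {
    return width * height * bytes_per_pixel;
}

// Time in milliseconds to move `bytes` at a sustained rate of `gib_per_s`.
constexpr double transfer_ms(std::uint64_t bytes, double gib_per_s) {
    return static_cast<double>(bytes) / (gib_per_s * kBytesPerGiB) * 1000.0;
}

// Effective bandwidth in GiB/s when `bytes` took `elapsed_ms` to arrive.
constexpr double effective_gib_per_s(std::uint64_t bytes, double elapsed_ms) {
    return static_cast<double>(bytes) / (elapsed_ms / 1000.0) / kBytesPerGiB;
}

// Time available per frame, in milliseconds, at a given frame rate.
constexpr double frame_budget_ms(double fps) {
    return 1000.0 / fps;
}
```

With these, frame_bytes(3840, 2160, 4) is 33,177,600 bytes (about 31.6 MiB), transfer_ms(..., 2.5) comes out at roughly 12.4 ms against a frame budget of about 16.7 ms at 60 fps, and the observed 50 ms corresponds to an effective rate of about 0.62 GiB/s, around a quarter of the benchmark figure.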
Fiery Posted November 14, 2019

AIDA64 uses clEnqueueReadBuffer to read the buffer from the GPU to the CPU. There are two ways to allocate the buffers, however: pinned and pageable. I'm not sure what you can or cannot achieve using OpenGL, so make sure to Google those terms (pinned buffer, pageable buffer).