eGPU Performance Device to Host

sefad · November 12, 2019

Hello,

Using CUDA-Z and the AIDA64 GPU benchmark, I can see that the "Device to Host" / "Memory Read" bandwidth is around 2.5 GiB/s on my Thunderbolt 3 eGPU (GTX 1080).

In an OpenGL app I am developing, I retrieve each frame/texture into CPU memory using http://spout.zeal.co/

My textures are 3840x2160 RGBA. Right now it takes around 50 ms to retrieve one texture into client memory.

Going by the benchmarks, I should be able to transfer the roughly 31.6 MiB (3840 x 2160 x 4 bytes) in less than 15 ms, which is what I need at 60 fps.
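For reference, the arithmetic behind that estimate can be sketched as below. The helper names `frame_bytes`, `transfer_ms` and `effective_gib_per_s` are hypothetical, and the sketch assumes a tightly packed frame with 4 bytes per pixel and the 2.5 GiB/s figure reported by the benchmarks.

```cpp
#include <cstdint>

// Hypothetical helper: size in bytes of one tightly packed frame.
constexpr std::uint64_t frame_bytes(std::uint64_t width,
                                    std::uint64_t height,
                                    std::uint64_t bytes_per_pixel) {
    return width * height * bytes_per_pixel;
}

// Hypothetical helper: ideal transfer time in milliseconds for `bytes`
// at a sustained bandwidth given in GiB/s (1 GiB = 1024^3 bytes).
constexpr double transfer_ms(std::uint64_t bytes, double gib_per_s) {
    return static_cast<double>(bytes) /
           (gib_per_s * 1024.0 * 1024.0 * 1024.0) * 1000.0;
}

// Hypothetical helper: bandwidth in GiB/s actually achieved when `bytes`
// take `ms` milliseconds to arrive.
constexpr double effective_gib_per_s(std::uint64_t bytes, double ms) {
    return static_cast<double>(bytes) /
           (1024.0 * 1024.0 * 1024.0) / (ms / 1000.0);
}
```

With these, `frame_bytes(3840, 2160, 4)` gives 33,177,600 bytes, `transfer_ms` of that at 2.5 GiB/s comes out a little over 12 ms, and the 50 ms I measure corresponds to roughly 0.6 GiB/s, about a quarter of the benchmark figure.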

Spout uses an improved memcpy based on the SSE2 extension (https://github.com/leadedge/Spout2/blob/master/SpoutSDK/Source/SpoutCopy.cpp#L136). The copy runs after a PBO has been set up and mapped into CPU memory (https://github.com/leadedge/Spout2/blob/master/SpoutSDK/Source/SpoutGLDXinterop.cpp#L2209).
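To illustrate the kind of copy I mean, here is a minimal sketch of an SSE2 streaming copy. It is not Spout's actual code: the function name `copy_frame` and the `HAVE_SSE2` macro are hypothetical, and the sketch assumes both pointers are 16-byte aligned for the fast path, falling back to `std::memcpy` otherwise.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2 1  // hypothetical macro, only used in this sketch
#endif

// Hypothetical helper: copy `bytes` from `src` (e.g. a mapped PBO) to `dst`.
void copy_frame(void* dst, const void* src, std::size_t bytes) {
#ifdef HAVE_SSE2
    const bool aligned =
        reinterpret_cast<std::uintptr_t>(dst) % 16 == 0 &&
        reinterpret_cast<std::uintptr_t>(src) % 16 == 0;
    if (aligned) {
        auto* d = static_cast<__m128i*>(dst);
        const auto* s = static_cast<const __m128i*>(src);
        const std::size_t blocks = bytes / 16;
        for (std::size_t i = 0; i < blocks; ++i) {
            // Aligned 16-byte load, then a non-temporal store that
            // bypasses the cache on the destination side.
            _mm_stream_si128(d + i, _mm_load_si128(s + i));
        }
        _mm_sfence();  // make the streamed stores visible before returning
        // Copy whatever is left over after the last full 16-byte block.
        std::memcpy(static_cast<char*>(dst) + blocks * 16,
                    static_cast<const char*>(src) + blocks * 16,
                    bytes - blocks * 16);
        return;
    }
#endif
    std::memcpy(dst, src, bytes);  // unaligned pointers or no SSE2
}
```

Whether a copy like this beats a plain `std::memcpy` depends on the platform and on how the mapped memory behaves, so it is worth timing both on the actual buffer.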

CUDA-Z uses cudaMemcpy.

Does anyone have an idea why I am not achieving the speed shown in the benchmarks?
Could someone tell me how the memory is copied from GPU to CPU in AIDA64?
Is there any faster way to copy GPU memory to CPU memory than PBO/mapping/SSE2 copy?

Thanks in advance for any help.

Cheers

Fiery · November 14, 2019

AIDA64 uses clEnqueueReadBuffer to read the buffer from the GPU to the CPU. There are, however, two ways to allocate buffers: pinned and pageable. I'm not sure what you can or cannot achieve using OpenGL, so make sure to search for those terms (pinned buffer, pageable buffer).
