
eGPU Performance Device to Host



Hello,

Using CUDA-Z and the AIDA64 GPU benchmark, I can see that the "Device to Host" / "Memory Read" bandwidth is around 2.5 GiB/s on my Thunderbolt 3 eGPU (GTX 1080).

In an OpenGL app I am developing, I retrieve each frame/texture into CPU memory using Spout (http://spout.zeal.co/).

My textures are 3840x2160 RGBA; right now it takes around 50 ms to retrieve one texture into client memory.

Going by the benchmarks, I should be able to transfer the 31 MB in less than 15 ms, which is what I need at 60 fps.
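For reference, the arithmetic behind that estimate can be sketched as below. This is only a back-of-the-envelope check, assuming 8-bit RGBA (4 bytes per pixel) and taking the 2.5 GiB/s benchmark figure at face value; the helper names `frame_bytes`, `transfer_ms` and `effective_gib_per_s` are made up for illustration and come from neither Spout nor the benchmarks.

```cpp
#include <cassert>
#include <cstdint>

// Number of bytes in one GiB, as a double for bandwidth math.
constexpr double kGiB = 1024.0 * 1024.0 * 1024.0;

// Size in bytes of one uncompressed frame.
// Assumption: bytes_per_pixel is 4 for 8-bit RGBA.
constexpr std::uint64_t frame_bytes(std::uint64_t width,
                                    std::uint64_t height,
                                    std::uint64_t bytes_per_pixel) {
    return width * height * bytes_per_pixel;
}

// Ideal transfer time in milliseconds at a given bandwidth (GiB/s),
// ignoring latency, synchronisation and any extra host-side copy.
inline double transfer_ms(std::uint64_t bytes, double gib_per_s) {
    assert(gib_per_s > 0.0);
    return static_cast<double>(bytes) / (gib_per_s * kGiB) * 1000.0;
}

// Effective bandwidth in GiB/s implied by a measured transfer time (ms).
inline double effective_gib_per_s(std::uint64_t bytes, double ms) {
    assert(ms > 0.0);
    return static_cast<double>(bytes) / (ms / 1000.0) / kGiB;
}

// One 3840x2160 RGBA frame is 33,177,600 bytes (about 31.6 MiB).
static_assert(frame_bytes(3840, 2160, 4) == 33177600ULL,
              "unexpected frame size");
```

With these helpers, `transfer_ms(frame_bytes(3840, 2160, 4), 2.5)` comes out at roughly 12.4 ms, while `effective_gib_per_s(frame_bytes(3840, 2160, 4), 50.0)` is roughly 0.62 GiB/s, i.e. about a quarter of the benchmark figure.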

After setting up a PBO and mapping it to CPU memory (https://github.com/leadedge/Spout2/blob/master/SpoutSDK/Source/SpoutGLDXinterop.cpp#L2209), Spout copies the data with an improved memcpy that uses the SSE2 extension (https://github.com/leadedge/Spout2/blob/master/SpoutSDK/Source/SpoutCopy.cpp#L136).

CUDA-Z uses cudaMemcpy.

Does anyone have an idea why I am not achieving the speed shown in the benchmarks?
Could someone tell me how AIDA64 copies memory from the GPU to the CPU?
Is there any faster way to copy GPU memory into CPU memory than PBO/mapping/SSE2 copy?

Thanks in advance for any help.

Cheers


On 11/12/2019 at 8:42 PM, sefad said:

Would someone tell how the memory is copied from GPU to CPU in AIDA64?

AIDA64 uses clEnqueueReadBuffer to read the buffer from the GPU to the CPU. There are two ways to allocate the buffer, however: pinned and pageable. I'm not sure what you can or cannot achieve using OpenGL, so make sure to search for those terms (pinned buffer, pageable buffer).
