Volvo480

Everything posted by Volvo480

  1. Hej @Ivan_80. Thank you for the additional information. With it I found more interesting details about the specs and definitions of RDNA3 on the Chips and Cheese website, https://chipsandcheese.com/2023/01/07/microbenchmarking-amds-rdna-3-graphics-architecture/, including their notes on 32-bit integer IOPS (MAD): "... Since Turing, Nvidia also achieves very good integer multiplication performance. Integer multiplication appears to be extremely rare in shader code, and AMD doesn’t seem to have optimized for it. 32-bit integer multiplication executes at around a quarter of FP32 rate, and latency is pretty high too... " In GPGPU benchmarks we see ~61,xxx GFLOPS FP32 with dual issue for the 7900 XTX. A quarter of the roughly 30,xxx GFLOPS FP32 without dual issue (the rate from the C&C website) is ~8,xxx GIOPS INT32. So the numbers in the GPGPU benchmark seem to be okay.
  2. When searching for screenshots of the 4090's GPGPU benchmark results to compare VRAM memory copy values, I often see values above 20xx GB/s. In theory the 4090's VRAM has a bandwidth of roughly 10xx GB/s (see techpowerup, geizhals and NVIDIA (page 13)). So the "measured" memory copy values seem much too high, because for AMD RDNA3 GPUs the memory copy is roughly 9xx GB/s, which fits the theoretical 960 GB/s of the GDDR6 used much better. Or: if the AIDA64 GPGPU tool is perhaps erroneously measuring the bandwidth of the AD102's large L2 cache (72 MB) instead of the GDDR6X RAM, then it should also measure the bandwidth of the 96 MB L3 cache of the AMD RDNA3 GPU. Maybe with a new field in GPGPU?
  3. Hello. For Nvidia's 4090 the AIDA tool counts 128 CUs. That seems okay. For the 7900 XTX I expected 96 CUs, because the System Summary also shows that, but the benchmark tool shows just 48. Maybe the GPGPU tool is only using or measuring half of the AMD CUs or MAD capabilities, and that is why the values for e.g. IOPS (24/32/64-bit) for the 7900 XTX are so much lower than for the great and powerful 4090 with 128 "CUs", which surprisingly shows no big difference or loss from 24-bit to 32-bit integer IOPS (but should, see the AIDA manual). Or are some values erroneously doubled for the 4090?
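
The back-of-the-envelope checks in posts 1 and 2 can be reproduced with a few lines of Python. This is only a rough sketch, not how AIDA64 measures anything. The function names are made up for illustration, and the inputs are assumptions: a 384-bit bus with 21 Gbps GDDR6X for the 4090, a 384-bit bus with 20 Gbps GDDR6 for the 7900 XTX, a round 30,000 GFLOPS standing in for the "30,xxx" single-issue FP32 figure, and 2,000 GB/s standing in for the lower end of the "20xx" memory copy values.

```python
def theoretical_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s: bus width (bits) x per-pin data rate (Gbit/s) / 8."""
    return bus_width_bits * data_rate_gbps / 8


def quarter_rate_int32_giops(fp32_gflops_single_issue):
    """INT32 estimate if integer multiply runs at a quarter of the FP32 rate,
    as the Chips and Cheese article describes for RDNA3."""
    return fp32_gflops_single_issue / 4


# Assumed memory configurations (not taken from the posts themselves).
rtx4090_bw = theoretical_bandwidth_gbs(384, 21)    # 384-bit, 21 Gbps GDDR6X
rx7900xtx_bw = theoretical_bandwidth_gbs(384, 20)  # 384-bit, 20 Gbps GDDR6

# Placeholder for the lower end of the "20xx" GB/s memory copy values.
assumed_4090_copy_gbs = 2000
copy_vs_theory = assumed_4090_copy_gbs / rtx4090_bw

# Placeholder for the "30,xxx" GFLOPS single-issue FP32 figure.
int32_estimate = quarter_rate_int32_giops(30_000)

print(f"RTX 4090 theoretical bandwidth:    {rtx4090_bw:.0f} GB/s")
print(f"RX 7900 XTX theoretical bandwidth: {rx7900xtx_bw:.0f} GB/s")
print(f"4090 memory copy vs. theory:       {copy_vs_theory:.2f}x")
print(f"RDNA3 INT32 quarter-rate estimate: {int32_estimate:.0f} GIOPS")
```

With these assumed inputs the 4090 memory copy result comes out at roughly twice the theoretical VRAM bandwidth, while the quarter-rate INT32 estimate lands in the same range as the benchmark value mentioned in post 1.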