[RESOLVED] System hang with Odospace and 6800 XT


SnarlingFox

Hi folks,

I'm posting this to see if anyone else is having this issue, and to let anyone who has these symptoms know that this could be the cause. I have already filed a support ticket for it.

To get started, I have the following setup:

  • AMD Ryzen 5800X CPU
  • Asus ROG STRIX X570-F Gaming Mobo (running at 1900 MHz Infinity Fabric)
  • Corsair Vengeance DDR4-4000 (running XMP and clock slowed to 3800 MHz)
  • Asus ROG STRIX LC 6800 XT GPU
  • Corsair MP600 1TB NVMe SSD
  • Windows 10 x64 with all the latest updates, drivers, firmware and BIOS installed
  • Aida64 6.32.5600 with the sensor polling set to a conservative 2000 ms
  • Odospace plugin enabled with a Samsung Galaxy Tab2 for the display

When my system is at idle or near idle (surfing, YouTube, Spotify, Remote Desktop etc.) it will hang at random times (roughly every 1-2 hours on average). This hang is a complete system deadlock, requiring holding the power button down or waiting for the motherboard watchdog timer to hard reboot the system.

If the system (crucially, the GPU) is under load, the system is stable. I can game for hours with no issues. But as soon as it's idle again, it's a ticking time bomb. I tried all sorts of things such as BIOS and firmware updates, intentionally overclocking the CPU, disabling power level states, raising CPU and SoC voltages in the BIOS, lowering RAM speeds, resetting BIOS to optimised defaults, even reinstalling Windows on a new drive.

Through days and days of trial and error, after I reinstalled Windows on a fresh drive I slowly reinstalled things and I narrowed this down to the Odospace plugin. As soon as I disabled this, my system became rock solid stable again. I went back to my main SSD / OS and disabled the Odospace plugin. No crashes for days, even after reboots and power cycles.

It's worth noting that whilst idle, the 6800 XT GPU SoC voltage and wattage meters disappear and reappear every few seconds. My gut says there's some kind of race condition where the sensor poll happens at the moment these sensor values "disappear".

Oddly though, with the Odospace plugin disabled, if I thrash the SMBus by setting the poll rate to 10x speed (200 ms) and watch the sensor page, Aida64 and the system as a whole are still rock solid stable. I would have expected hammering the SMBus this hard to cause a lock-up.

Another thing to note: a couple of seconds before my system hangs, any sound playing and the mouse cursor both start to skip / judder. Moments later, it hangs completely. No blue screen of death, no logs in Event Viewer, just a complete system hang.

It's my hope that by posting this, some other poor soul who is experiencing these issues may be guided to temporarily disable the Odospace plugin until a fix can be found.

@odospace Please note the above - you may need to get involved with the Aida64 team to patch this bug :)

Edited by SnarlingFox
Marking as resolved

We've recently applied a major bugfix about AMD GPUs, so before digging any further, please upgrade to the latest beta version of AIDA64 Extreme available at:

https://www.aida64.com/downloads/latesta64xebeta

After upgrading to this new version, make sure to restart Windows to finalize the upgrade.

Please give this a few days and let us know if it makes a difference.


  • SnarlingFox changed the title to [RESOLVED] System hang with Odospace and 6800 XT
Just now, SnarlingFox said:

@Fiery I can confirm that with the beta version, my system is now rock-solid stable again, you rock! No crashes for a whole week ♥

Some of the GPU metrics still disappear and reappear randomly at idle but I can live with that, they work when gaming which is what I really need :)

Thank you for your feedback!

If you mean the GPU clock disappears/reappears, then it's normal for 6800 XT / 6900 XT video cards.  It's due to the fact that the new AMD GPU generation introduced a very deep sleep where the driver reports zero MHz GPU clock to AIDA64 -- which is considered an invalid reading and triggers the relevant SensorPanel/LCD item to disappear.  We'll need to come up with a workaround for that issue.
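
For illustration only, here is a minimal sketch of one shape such a workaround could take: hold the last valid reading for a few polls while the driver reports zero, and hide the item only if the zero persists. This is not AIDA64 code and not a description of how AIDA64 handles sensors; the `SensorHold` class, its `update` method and the `max_stale_polls` setting are all hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeldReading:
    """Last valid value of one sensor and how many polls it has been held."""
    value: Optional[float] = None
    stale_polls: int = 0


class SensorHold:
    """Keep showing the last valid reading while the sensor reports zero.

    max_stale_polls is a hypothetical setting: once more than that many
    consecutive zero readings arrive, the held value is dropped and the
    caller should hide the item.
    """

    def __init__(self, max_stale_polls: int = 5) -> None:
        self.max_stale_polls = max_stale_polls
        self._state = HeldReading()

    def update(self, raw_mhz: float) -> Optional[float]:
        """Return the value to display, or None if the item should be hidden."""
        if raw_mhz > 0:
            # Valid reading: remember it and reset the stale counter.
            self._state = HeldReading(value=raw_mhz, stale_polls=0)
            return raw_mhz
        # Zero (or negative) reading: treat it as "GPU asleep", not as data.
        self._state.stale_polls += 1
        if self._state.stale_polls > self.max_stale_polls:
            self._state.value = None
        return self._state.value


if __name__ == "__main__":
    hold = SensorHold(max_stale_polls=2)
    for raw in [500.0, 0.0, 0.0, 0.0, 2300.0]:
        print(raw, "->", hold.update(raw))
```

With `max_stale_polls=2`, a brief dip to zero keeps showing the previous clock, while a longer sleep still hides the item. Whether holding a stale value is preferable to showing an explicit "sleeping" state is a design choice this sketch does not settle.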


  • 3 weeks later...

Sorry to hijack the thread, but I wanted to report that this hang is not limited to systems with Odospace installed. I have a new EVGA SR-3 with the Intel Xeon W-3175X. When running the sensor panel on a 3rd HDMI monitor, the system randomly hangs while idle. I can run 3DMark for hours, but as soon as the system is idle it will hang at random intervals. I have installed the latest beta version, downloaded this morning, and the problem persists. I am running a pair of EVGA Titans at the moment. I will test again when the EVGA 3090 FTW3 Hydro Copper gets here next week. The system seems stable with Aida64 running as long as the sensor panel is not running.


  • 1 month later...
On 3/20/2021 at 3:29 PM, Hittman said:

Sorry to hijack the thread, but I wanted to report that this hang is not limited to systems with Odospace installed. I have a new EVGA SR-3 with the Intel Xeon W-3175X. When running the sensor panel on a 3rd HDMI monitor, the system randomly hangs while idle. I can run 3DMark for hours, but as soon as the system is idle it will hang at random intervals. I have installed the latest beta version, downloaded this morning, and the problem persists. I am running a pair of EVGA Titans at the moment. I will test again when the EVGA 3090 FTW3 Hydro Copper gets here next week. The system seems stable with Aida64 running as long as the sensor panel is not running.

Try to check it with AIDA64 v6.33 (latest version) as well.  Your motherboard uses a very particular sensor solution, so let's start with that.  Do you have any other monitoring software (either made by EVGA or a 3rd party) running in the background that may collide with AIDA64?


  • 2 months later...

I can confirm I have the same issue (the OS crashes; Windows 10 Pro with all patches) with AIDA64 Extreme build 6.33.5700 and AMD driver versions 21.6.2 and 21.5.1. The problem appears only when I have the Sensor Panel visible and am playing games at a high refresh rate (above 144).
