CPU1 Temperature may be incorrect


N9ZN-Extra

aida64 temps.bmp

ON THE COMPUTER / SENSOR SCREEN

I searched the forum and found no entries for this issue.

While attempting to calibrate RealTemp to my processor temps, I noticed that AIDA64 version 1.20.1150 and RealTemp version 3.60 disagreed on the temperature of core 1 of my CPU. For clarity, core 1 is the second core of the CPU. Please note that no calibration had been applied to RealTemp at the time this was noticed, so you can rule that out.

I do not know whether this is an AIDA64 bug. If it is not, then it may be a RealTemp bug; I cannot determine which at this time. However, all other CPU core temps are reporting correctly and correlate between AIDA64 and RealTemp.

I also noticed that the Core1 and Core2 temps seem to change at the same time to the same values consistently. Is there any possibility that AIDA64 is reporting core2 as core1?
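One way to check whether two tools are showing the same sensors in a different order is to log a few readings from each and look for the core ordering that makes them line up. The following is only a rough sketch in Python; the function name and all sample values are made up for illustration and do not come from my system. It also assumes both tools report on the same scale, so a constant offset between them (for example from a different TJMax setting) would have to be removed first.

```python
from itertools import permutations

def best_core_mapping(tool_a, tool_b):
    """Find the core ordering that best aligns two tools' readings.

    tool_a and tool_b are lists of samples taken at the same moments;
    each sample is a list with one temperature per core.
    Returns a tuple p where tool_a's core i matches tool_b's core p[i].
    """
    n = len(tool_a[0])

    def cost(perm):
        # Total absolute difference over all samples for this ordering.
        return sum(
            abs(a[i] - b[perm[i]])
            for a, b in zip(tool_a, tool_b)
            for i in range(n)
        )

    # Only 24 orderings exist for a quad core, so brute force is fine.
    return min(permutations(range(n)), key=cost)

# Hypothetical readings: tool B shows cores 1 and 2 swapped.
samples_a = [[40, 35, 41, 38], [45, 39, 46, 42], [50, 43, 52, 47]]
samples_b = [[40, 41, 35, 38], [45, 46, 39, 42], [50, 52, 43, 47]]

mapping = best_core_mapping(samples_a, samples_b)
print(mapping)  # (0, 2, 1, 3)
```

If the result is anything other than (0, 1, 2, 3), the two programs are most likely reading the same sensors but labelling them in a different order.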

Additional information:

C1E and SpeedStep are not enabled

No turbo is enabled

My processor is an Intel QX9650 Extreme, 45nm quad core, 3.0GHz

Motherboard is an EVGA 790i Ultra SLI

Screenshots are attached. Unfortunately I could not post the RealTemp screenshot, as the system would not allow it (sizing issues).

NOTE: THIS IS THE SECOND CPU CORE (CORE 1) I AM REFERRING TO, WITH CORES NUMBERED FROM 0 TO 3. IN THE SCREENSHOT YOU HAVE THIS CORE LABELED AS CORE 2. AS A SUGGESTION, STICKING WITH THE INDUSTRY CONVENTIONS FOR LABELING THINGS LIKE CORES AND MEMORY BANKS WOULD BE HELPFUL WHEN TRYING TO DESCRIBE THESE KINDS OF ISSUES OR COMMUNICATING WITH OTHERS.


RealTemp uses a special formula to calculate core temperatures from raw DTS register readings. We advise against using RealTemp as a reference for measuring core temperatures, since all other software (e.g. AIDA64, CoreTemp, HWMonitor, HWiNFO32, SIV, etc.) uses the formula specified and published by Intel. So, if possible, please use either CoreTemp or HWMonitor and compare their measured values against the AIDA64 values.

Thanks,

Fiery


  • 2 weeks later...

So that you have input from the other side, I am providing a link to the thread where this was discussed with Real Temp. I am simply a user trying to understand what went wrong and why. If this is in error, an explanation would be greatly appreciated and would certainly improve my understanding. From what I have understood so far, the issue seems to lie with an area known as APIC ID and the core assignments within it. As I understand it, this has nothing to do with the calculation method and everything to do with which core's temperature is being read and reported. The thread with Real Temp is here: http://forums.techpowerup.com/showthread.php?t=137023

Please let me know if this logic is incorrect. I am very interested in knowing whether what I am looking at is valid.

Interesting debate you have started there. From AIDA64's point of view, they will argue that they use a specific way to measure CPU temps which has been recommended to them by Intel. RealTemp is a free plugin and I can't really comment on them as developers, as I know very little about RealTemp's software. The one thing I have noted, though, is that AIDA64/Everest has never matched my BIOS readings when it comes to CPU heat measurement, whereas RealTemp has, something I have brought up in the past. The problem you have is that neither side is going to see the need to talk to the other, as they both think their software is reading the correct temps, which doesn't help you of course. What you need to be careful of is sparking a debate, or in some cases arguments, with end users about this subject. It won't really get anywhere, as you'll find most mods in the forums will close the threads once they see the arguments become unmanageable.

My thoughts: "AIDA64 is more accurate than RealTemp". This is something I tested a while back: I placed a sensor under my water block for a day and found that my readings were closer to Everest's than to RealTemp's. My temps were a couple of degrees out, but this would have been down to the way I was measuring. Still, I have found many users who argue that RealTemp is more accurate than AIDA64/Everest. I will be keeping my eye on your other thread just to see how this debate pans out!

Good luck!

Fugitive.!


I think the issue may be caused by mixed-up APIC IDs resulting from a BIOS bug. However, since even Windows' logical processor handling doesn't take that into account, I don't think we'd implement a workaround for it.

As for core temperature accuracy, it's yet again the old argument over Intel's DTS implementation. Intel designed DTS not for core temperature measurement, but to fight overheating and to protect the processor from physical damage due to thermal issues. Intel defined a TJMax value for each processor part. TJMax is the temperature threshold (in degrees Celsius) for the CPU package. Once the processor gets close to TJMax, it starts throttling itself down.
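To make the TJMax relationship concrete, here is a minimal Python sketch of how a core temperature is commonly derived from a raw DTS reading. It assumes the documented layout of Intel's IA32_THERM_STATUS register (MSR 0x19C), where bits 22:16 hold the digital readout (degrees below TJMax) and bit 31 marks the reading as valid. The TJMax of 100 used in the example is purely an assumed placeholder, not a value confirmed for any particular processor, and the function name is made up for illustration.

```python
def core_temp_from_therm_status(raw, tjmax_c):
    """Convert a raw IA32_THERM_STATUS value to degrees Celsius.

    raw     -- 32-bit register value as read from the MSR
    tjmax_c -- assumed TJMax for the processor part, in Celsius
    Returns None when the 'reading valid' bit is not set.
    """
    reading_valid = (raw >> 31) & 1
    if not reading_valid:
        return None
    # Digital readout: how many degrees the core is below TJMax.
    readout = (raw >> 16) & 0x7F
    return tjmax_c - readout

ASSUMED_TJMAX = 100  # placeholder only; the real value depends on the part

# Made-up raw value: valid bit set, readout of 45 degrees below TJMax.
raw_value = (1 << 31) | (45 << 16)
print(core_temp_from_therm_status(raw_value, ASSUMED_TJMAX))  # 55
```

Note that the sensor only ever reports a distance from TJMax, so any error in the assumed TJMax shifts every calculated temperature by the same amount.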

The major issue with DTS is this: because it was designed for accurate temperature measurement only around TJMax, the farther you get from TJMax, the more inaccurate the calculated core temperatures become. At a running temperature of around 40-50 Celsius, DTS accuracy is no better than +/- 10 Celsius, while around 20 Celsius the accuracy could be +/- 20 Celsius (!). Around zero Celsius, DTS stops working completely and reports invalid values. It's simply not a feature designed and implemented for temperature measurement, and it's certainly not designed to measure "normal" temperatures like 40-50 Celsius.

So even if you put the core temperatures in "proper" order, it wouldn't make the temperature values accurate ;)

I would not expect much of a debate. As far as I am aware, the discussion at Real Temp is over, with all that could be said having been said (to my knowledge).

I still would like to know why I see my temps out of order in AIDA64, meaning why did APIC ID rearrange my core positions? This is a question I need to pose to Fiery, and maybe he can give me some guidance as to who to speak to next.

Unless I am misreading something, it appears that your position and Real Temp's position on temperature accuracy are in line with each other. I have no problem accepting that Intel had no reason to produce accurate temps from the sensors they have deployed, and I fully understand why they are not concerned about the inaccuracies. It has become obvious they only need enough information to drive the formulas which control and shut down the CPU under excess heating conditions.

What does trouble me is the fact that my cores are not reported in the correct sequence and in my case why core 1 (actually core 3) seems to be lower. Is it temp inaccuracy or something else?

Let's assume I needed to shut down a core in BIOS due to a problem with thermals under load. If I do not know which core is affected, I cannot make an accurate decision about which core to shut down. If I shut down the wrong core, I would be placing more stress on an already malfunctioning core by having eliminated a working one. This is one example of how this could affect my machine, and I am sure there are other scenarios.

If APIC ID re-ordering were unnecessary, then why do we have an APIC ID value? I can see no use for a variable which serves no purpose. If this is a BUG in BIOS, it would be good to be aware of it and try to understand why it occurred. Did BIOS initiate the BUG, or did something else set the re-ordered values?

If you could point me in the right direction, I will take this topic up in that channel. This will be helped by knowing at which point APIC ID is set and in which part of the boot process it occurs. By chance do you happen to know why APIC ID was established? The establishment of a core re-ordering method implies a need to re-order core locations or that something is incapable of reading cores based on a natural order.

Let me stress this is not about how Real Temp or AIDA64 calculate temperatures; I understand there are differences. This is about why my cores are reported out of order, i.e. core 3 reported as core 1, with other cores also swapped. This is not an attack on AIDA64, Real Temp, or any other software. It is only an attempt at getting to the truth concerning core temperature reporting and the effect it has on our PCs.

I love AIDA64 and enjoy recommending it to others for use. I can certainly tell them there may be a BUG outside of AIDA64 which can cause inaccurate reporting. As a suggestion, so you do not have to re-code things, placing a note beside the temps affected or elsewhere letting users know of this would be a great way to caution them when viewing temperatures which may have been affected.


As far as we know, APIC ID is assigned by the BIOS, so when the order is unusual or unexpected (even by Windows), it is the fault of the BIOS. If you want that to be fixed, then I suppose the only one to contact is the motherboard manufacturer.

APIC ID is used by modern multi-processor kernels to identify the CPU cores and the CPU packages. In various situations, software (or the operating system itself) has to know which logical processor belongs to which CPU package or which CPU core.
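As a rough illustration of how software might use APIC IDs for this, the Python sketch below splits an APIC ID into package and core fields and re-sorts per-core temperatures into APIC ID order. Everything in it is hypothetical: the function names, the 2-bit core field (real software derives the field widths from CPUID), and the sample APIC IDs and temperatures. It is not how AIDA64 or any other tool is known to work.

```python
def split_apic_id(apic_id, core_bits=2):
    """Split an APIC ID into (package, core).

    core_bits is an assumed field width; 2 bits is enough for a
    quad core without Hyper-Threading.
    """
    core_mask = (1 << core_bits) - 1
    return apic_id >> core_bits, apic_id & core_mask

def order_by_apic_id(temps, apic_ids):
    """Return temperatures sorted by the APIC ID of each logical CPU.

    temps[i] and apic_ids[i] both belong to logical processor i.
    """
    return [temp for _, temp in sorted(zip(apic_ids, temps))]

# Hypothetical case: the BIOS assigned APIC IDs 0, 2, 1, 3 to
# logical processors 0..3, so the two middle cores appear swapped.
apic_ids = [0, 2, 1, 3]
temps = [40, 41, 35, 38]

print(order_by_apic_id(temps, apic_ids))  # [40, 35, 41, 38]
print(split_apic_id(3))                   # (0, 3)
```

Whether a monitoring tool should apply such a re-ordering is a design choice: showing values in operating system order matches what Windows sees, while sorting by APIC ID presents them in the order the APIC IDs imply.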

Let's assume I needed to shut down a core in BIOS due to a problem with thermals under load. If I do not know which core is affected, I cannot make an accurate decision about which core to shut down. If I shut down the wrong core, I would be placing more stress on an already malfunctioning core by having eliminated a working one. This is one example of how this could affect my machine, and I am sure there are other scenarios.

I'm not sure why you brought that up. Thermal protection is managed by the processor itself, and it's a completely automatic process. Once a core is overheating, the CPU will immediately start throttling the affected core down. If that still doesn't lower the temperature of the affected core, the whole CPU package will shut down. Hence there's no real need to know which core is at fault, and no need to shut down the system manually at all.

The whole APIC ID assignment business is managed by the BIOS and interpreted by the operating system; the APIC ID is not checked, verified, or read by the CPU itself. Especially not when it comes to the thermal protection mechanism ;) The CPU itself knows which of its cores is which, and you don't need to know which is which.

BTW, when a core is overheating inside a CPU package, it always affects the rest of the cores, especially for monolithic multi-core processors. Hence when core#0 overheats, it will also affect (heat up) every other core in the same package.

My example above was an attempt to explain why I personally might want to disable a core. As for the logic of it all, I will be the first to tell you that, like many other users, I do not trust Intel's ability to shut down a processor prior to damage. Right or wrong, this is why I would take the action described.

I understand how one bad core will affect all the others in the package, and that is exactly why I would want to shut a bad core down manually if it were exhibiting temperature values too high for personal comfort under load.

Regardless of my logic and understanding, I see why this is NOT A BUG in AIDA64 or Real Temp. It is apparent that AIDA64 is reading values as they should be expected to be read, and that Real Temp has tried to correct the presentation of information caused by the abnormal ordering of the logical cores within APIC ID. It looks like both of you are right on this issue, depending on personal viewpoint, and any correction in temperature presentation is clearly no one's responsibility when the root of the problem lies elsewhere. The decision to make a correction now seems more of an internal debate for each software's author as to how they will proceed. There are good arguments to be made for both paths of action. (I can see how this could be compared to painting path lines and placing a walk signal at an intersection where pedestrians are expected not to cross: that placement would encourage abnormal expectations and clutter the intersection, yet provide a safer walk across for those few who go where they should not. That leaves the question: do you decide on safety's side, or on promoting what's right?)

Fiery, you have done what I thought you might, which is to offer as good an explanation as capability and information permit. This is one of the reasons I love AIDA64 and the scope of insight it provides. It is a better than good product. AIDA64 is well managed, organized, and coded. I will continue to renew AIDA64 into the future, and I see no other software even close to AIDA64's capability in the market.

Thank you for your time, my understanding has improved as a result of this. In fact I have learned things I did not initially set out to learn because of your dedication to your users.

  • 1 month later...
My example above was an attempt to explain why I personally might want to disable a core. As for the logic of it all, I will be the first to tell you that, like many other users, I do not trust Intel's ability to shut down a processor prior to damage.

Intel won't shut it down in any case. It will set a flag, and it's the BIOS that should shut it down. This flag is set at a point where the CPU might already be taking damage. This is so by design, and it's why Intel recommends that the CPU be shut down prior to the flag being set. By then the CPU failsafe mechanisms have already been active for some time, trying to cool down the processor by throttling it. These failsafe mechanisms may, however, fail to do so, usually because there's a cooler failure. If the last of these mechanisms runs for a given amount of time and the temperature still hasn't dropped below TCC, the CPU will be shut down prior to the flag being set, usually half a second or more before that for most of the Nehalem architecture.

Also, as Fiery already mentioned, heat spread will affect the other cores anyway. At the speed things work (TCC operations are clocked in milliseconds), no human being could possibly monitor temperatures and have fast enough reflexes to disable a core before its heat spread reached the others. Before you can even realize for sure that you have to click something, the CPU has already finished TCC and temperatures are already returning to acceptable levels, or in the worst-case scenario it has already shut down the CPU.

Let the CPU do its thing. It can do it infinitely better, and with much more knowledge about its internal conditions, than you possibly can.
