Thread: CPU glitches solved

  1. #1
    Registered User VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,607

    CPU glitches solved

    In order to try to fix my quad-core stuttering I went to Windows 7 Ultimate. After a fresh install the stuttering was still present. After looking closely at my case I found that my rear 250mm fan was dying/dead and barely moving. Also found the mobo temp was way too hot and the CPU temp was just under thermal shutdown. This might be why my computer was shutting down at times with no warning.

    In order to save my system I ditched air cooling and went with a Corsair H50 water cooling setup. My idle temps are now 35C to 39C down from 50C to 55C on air. My load temps are now 40C to 45C down from near 60C to 65C. Of course these temps are not internal core temps. One program reported my cores on air at around 55C to 56C. Thermal shutdown for AMD Phenom II's is supposed to be 62C although that sounds awful low.

    After playing several games that used to glitch I have found that with the new cooling system none of my games glitch or stutter. NBA 2K11 is smooth as silk as well as a host of other games that used to have problems. The underside of the mobo near the CPU has an obvious different holdout or appearance on the substrate surface than the rest of the board. This is very visible from certain angles which indicates to me that the board has gotten a tad bit warm in these areas.

    No more air cooling for me ever again. It is simply not enough to cool down a high performance system.

  2. #2
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    If you are not overclocking, the stock heatsink + fan has to be able to cool your processor, otherwise the product is defective and you can claim warranty from AMD.

    Water cooling is really only for extreme overclocking. Even for mild to high overclocking, air cooling on a good heatsink is usually more than sufficient. A good heat-pipe based heatsink makes a lot of difference.

  3. #3
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    Posts
    5,108
    Congratulations on finding a fix, but I have to say, I've had no problems on stock air for a long time unless the fan itself, like yours, was bad. I think you would have been fine on air.

    Anyway, glad to hear you figured it out.

    Soma

  4. #4
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by VirtualAce View Post
    No more air cooling for me ever again. It is simply not enough to cool down a high performance system.
    The stock air cooling solution when used with "Cool and Quiet" is generally adequate for most GP uses such as office tasks or networking. But clearly --as I've said before-- it is not adequate to the task for systems that run constant heavy CPU loads; even at stock speeds.

    Some things you can do to help it along, even with water cooling...
    Enable "Cool and Quiet" both in BIOS and in your OS (install AMD's driver if necessary).
    Set your CPU fan to always spin, ranging from 1/2 speed (about 7 volts) at idle to full speed (12 volts) at 50C (unless your WC setup does this for you).
    Keep the radiator and its exhaust fan spotlessly clean.
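
    For anyone who wants to see the fan setting above as numbers, here is a rough sketch. The 7 volt, 12 volt and 50C figures come from the list above; the 35C idle point and the straight-line ramp between the two ends are assumptions made for illustration, and real BIOS or fan-controller curves may be shaped differently.

```python
def fan_voltage(temp_c, idle_temp_c=35.0, full_temp_c=50.0,
                min_volts=7.0, max_volts=12.0):
    """Map a CPU temperature to a fan supply voltage.

    idle_temp_c and the linear ramp are assumptions for illustration.
    The fan never stops: it stays at min_volts at or below idle.
    """
    if temp_c <= idle_temp_c:
        return min_volts          # about half speed, always spinning
    if temp_c >= full_temp_c:
        return max_volts          # full speed at 50C and above
    # Linear interpolation between the idle point and the full-speed point.
    fraction = (temp_c - idle_temp_c) / (full_temp_c - idle_temp_c)
    return min_volts + fraction * (max_volts - min_volts)


if __name__ == "__main__":
    for t in (30, 35, 40, 45, 50, 60):
        print(f"{t}C -> {fan_voltage(t):.2f} V")
```

    The point of the clamp at the low end is that the fan keeps moving air even when the chip is cold, instead of cycling on and off.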

    Good to hear you solved the problem.

  5. #5
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    Glad you solved it. Although frankly that source of the problem would never occur to me. The solution you found seems a little extreme (especially the bit about never going back to air cooling). But definitely I agree that going to water cooling and witnessing a low-noise system with low temperatures makes it hard to look back at air cooling. Just not true that you can't air cool high performance systems.

    Oh, and welcome to Windows 7! You took your damn time
    Did wonder sometimes if you were not itching to take advantage of DX11. You must have been. DX10 was useless, no doubt. But DX11 is another beast altogether. Anyways, shout if you have annoyances with windows 7. I know I had some.
    Originally Posted by brewbuck:
    Reimplementing a large system in another language to get a 25% performance boost is nonsense. It would be cheaper to just get a computer which is 25% faster.

  6. #6
    Registered User
    Join Date
    Sep 2004
    Location
    California
    Posts
    3,268
    Be very careful with condensation on water cooling setups. I ruined a perfectly good system back in college with a water cooled CPU.
    bit∙hub [bit-huhb] n. A source and destination for information.

  7. #7
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by Mario F. View Post
    Glad you solved it. Although frankly that source of the problem would never occur to me.
    It's what I was telling you earlier, Mario... These systems will take a certain amount of heat but they will be in trouble long before thermal shutdown occurs.

    1 degree below total failure is not a safety margin.

    Quote Originally Posted by bithub View Post
    Be very careful with condensation on water cooling setups. I ruined a perfectly good system back in college with a water cooled CPU.
    That would be leakage... Modern WC setups are sealed and air-free.

    You won't get condensation in a system that never goes below room temperature.

  8. #8
    Registered User VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,607
    The air cooling was sufficient at first but there were other factors involved:

    • My case is not too far from a wall on the left side thus limiting air flow
    • A bookcase was behind the rear of my computer thus limiting exhaust potential and allowing hot air to come back into the case from the side
    • One of my 250mm side case fans died on me. The rear one died and this was the fan that was near the CPU.
    • The stupid power supply is situated above the CPU and it has a bottom fan that either draws air in (which would be hot air from the CPU) or blows hot air out (which would be onto the CPU). Really ignorant configuration.
    • I'm not the typical user - I routinely push my graphics card, CPU load, and memory to their limits via games and graphics programming.
    • My video card and memory are both overclocked from the manuf. which generates more heat than usual.
    • My memory sits rather close to my CPU and the heat from the heatsinks really heats up the ambient air which is in the same ambient air space as the CPU.


    The video card is still air-cooled but since I've brought the ambient air temp in the case way down the video card has more cool air to cool itself with. All temps have been significantly reduced. I tested the system with the same games that used to stutter and hesitate and none of them have the problem anymore. NBA 2K11 used to stop stuttering if I disabled 3 cores which probably meant that some of my cores were getting really hot.

    I cannot verify that all my random shutdowns were due to thermal issues but I'm sure some of them were. The computer has not had the shutdown problem for quite some time so that leads me to believe there may be another issue somewhere in the case. Keep in mind I have an ASUS Crosshair Formula 2 which was at the time a very high performance board but was fraught with issues. ASUS dropped the ball on the quality of the board and many of them had to be RMAed. My father has the same board and he RMAed 2 of them before he received a good one. However, some coworkers also bought this board and they have not had any issues. But if you look on various review sites you will find this particular board is a bit flaky.

    As I write this I'm defragging the drive with Diskeeper 2011 and downloading a couple of games from Steam. My CPU temp is 35C and my mobo is 46C. Those are pretty good temps.
    Last edited by VirtualAce; 03-23-2011 at 10:10 PM.

  9. #9
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by VirtualAce View Post
    The air cooling was sufficient at first but there were other factors involved:

    • My case is not too far from a wall on the left side thus limiting air flow
    • A bookcase was behind the rear of my computer thus limiting exhaust potential and allowing hot air to come back into the case from the side
    • One of my 250mm side case fans died on me. The rear one died and this was the fan that was near the CPU.
    • The stupid power supply is situated above the CPU and it has a bottom fan that either draws air in (which would be hot air from the CPU) or blows hot air out (which would be onto the CPU). Really ignorant configuration.
    • I'm not the typical user - I routinely push my graphics card, CPU load, and memory to their limits via games and graphics programming.
    • My video card and memory are both overclocked from the manuf. which generates more heat than usual.
    • My memory sits rather close to my CPU and the heat from the heatsinks really heats up the ambient air which is in the same ambient air space as the CPU.
    Actually the best airflow, even with water cooling, is to have cool air enter at the bottom front of the case and exit through the upper back. This moves cool air throughout the entire case and creates an always moving airflow inside.

    Some things that everyone will tell you not to do, that are actually very helpful... cover over all unused fan openings... really, no fan; get out the duct tape. If you are not using a duct from center-side openings to your cpu fan cover the side opening. It is also not mere neatness to keep wiring and cables out of the main airflow, even SATA cables can create turbulence that can lead to hot spots; tie them off behind the major airflow paths. The goal is to create a smooth upwards airflow, moving front to back in the case.... Think of it as mechanically assisted convection. And yes, all fans should operate in exhaust mode... that big fan on the front blowing air in probably does 1/4 what the smaller fan on the back blowing air out does.

    I cannot verify that all my random shutdowns were due to thermal issues but I'm sure some of them were.
    It sure sounds that way...

    One question: When you removed the cooler from the CPU did you examine the thermal grease to see if the surfaces were well mated (i.e. a uniform thin layer)? Of late I've seen several forced air coolers where if you take a die bar across their bottoms, they are anything but flat. In one case (as an experiment) I grabbed a file and draw-filed the bottom of the heatsink into a nice even diamond pattern, applied new grease, reinstalled and got a temperature reduction of nearly 6 degrees.


    As I write this I'm defragging the drive with Diskeeper 2011 and downloading a couple of games from Steam. My CPU temp is 35C and my mobo is 46C. Those are pretty good temps.
    Very respectable temperatures.

    As for your concern about other causes of spontaneous shutdowns... If, by chance, it was a bad connection you may well have cured the problem accidentally as you re-worked your setup installing the new cooler. I shouldn't worry about it, until it does it again....

  10. #10
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    The CPU temp is especially sweet. I cannot reach that temp myself. Also overclocked, I sit at 38 in idle, which is already considered good (on a heat-pipe based air cooler). I suspect your load temp will be as good. Probably in the mid 50s.

    Side note (a question):

    I never quite understood, and maybe someone can explain to me, why these programs insist on giving us a calculated temp when it is widely accepted this value isn't accurate. What's worse, it is known that the cooler a CPU is, the more error is introduced. The value I want to read is the actual delta (distance to TJMax). I am not interested in calculated temperatures when none of the manufacturers publish the actual TJMax of their chips.

    The delta however is a pretty solid number that, while not entirely accurate, nonetheless gives us a good indication of our cooling efforts. It baffles me that every temperature program out there wants to ... check this out... take an already known inaccurate reading and then introduce yet another margin of error by pretending to guess what the actual TJMax temperature is. All this only to produce a result in C or F that we actually have no use for. What is really interesting to know is our distance to TJMax.

    Why do these monitoring programs do this? What on earth does it matter my CPU "real" temp when I have a much better reading that I can use and better indicates the processor conditions; distance to TJMax? Can someone explain this to me like I was 4?

  11. #11
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by Mario F. View Post
    Why do these monitoring programs do this? What on earth does it matter my CPU "real" temp when I have a much better reading that I can use and better indicates the processor conditions; distance to TJMax? Can someone explain this to me like I was 4?
    Because they are "cooling for dummies"... You and I may have the intellect and technical backing to know the difference but joe average really just wants a "green light -- red light" indicator that says "too hot" so that he can panic in the most bizarre way possible in the circumstances.

    We who tend to work at a "higher level" sometimes forget exactly how stupid the average computer user really is....

    btw... have you tried this little gem? If my thermometers are any indication, this one seems to actually be pretty accurate (at least on the dominantly AMD stuff I work with).
    Last edited by CommonTater; 03-24-2011 at 07:25 AM.

  12. #12
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    Quote Originally Posted by CommonTater View Post
    Because they are "cooling for dummies"...
    I won't deny that is my own answer to that question. But an unsatisfying one, hence me asking if there's something else. Why at least isn't the delta offered as an option in these programs. I've been looking for it -- maybe not hard enough -- and there's only one program that offers this as an option (more below).

    Just so you have an idea, I've purchased Aida64 some time ago. Their system analysis software includes a nice gadget that I keep alongside my other windows gadgets which lists system temps. I went to their forums and made a suggestion for them to at least include the option to choose between "real" temps and delta. Usually they answer suggestions (it's a good idea, it's not a good idea, we will do this, we won't). I got nothing. Totally baffles me. Something that seems so obvious is completely ignored as if my request was too dumb to make sense.

    btw... have you tried this little gem? If my thermometers are any indication, this one seems to actually be pretty accurate (at least on the dominantly AMD stuff I work with).
    I have it yes. But it's another piece of crap, frankly. It uses the same approach to temperatures that I'm criticizing. I use Aida's windows gadget as a means to have constant information on the screen. But it's just as useless. I just keep it for... I don't even know what for anymore. Maybe because no matter how bad it is, you can still see temps climbing and dropping.

    When I require more accurate readings, I turn to Realtemp. It displays distance to TJMax (!), and includes a sensor testing that will help you calibrate each core individually with the help of this information, if you do want to get more accurate real temps. That's what I call a "gem". It's not easy, but once you are done, you at least can feel more confident on the results. Still, while I did it for the heck of it, I quickly learned to ignore the real temp row altogether. My eyes have been taught to go to the delta row where the information I care for is.

  13. #13
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by Mario F. View Post
    I won't deny that is my own answer to that question. But an unsatisfying one, hence me asking if there's something else.
    Sadly most of life is like that. Unless you can settle for the shallow, slow-mindedness that is the meat of the masses, you are doomed to live a life of quiet disappointment. Your only escape is to routinely think beyond the rest, satisfying yourself in intellectual pursuits of a very personal nature while you struggle to not make the rest of the world feel totally stupid in your presence.

    Why at least isn't the delta offered as an option in these programs. I've been looking for it -- maybe not hard enough -- and there's only one program that offers this as an option (more below).
    Because it's really not any better information than the raw temperature. Think of how it's derived... Take a manufacturer's spec that is NOT guaranteed, get the reading from a sensor that we both know is accurate at only 1 temperature and subtract... "Distance to TjMax" is no more accurate than the input data it is derived from.


    I've messed with realtemp ... same thing, relying on sensors that are accurate at only 1 temperature.

    Basically all these readings are derived from leakage current through a PN diode that is designed to go into thermal cascade just below the thermal limits of the chip. When it goes cascade, it triggers the thermal throttling... Below that temperature it leaks small amounts of current that are roughly analogous to the temperature it's operating at... Nobody guarantees accuracy at any other than the cascade temperature and I shouldn't be at all surprised to discover that is on a tolerance of plus/minus 5 degrees.

  14. #14
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    Quote Originally Posted by CommonTater View Post
    Because it's really not any better information than the raw temperature. Think of how it's derived... Take a manufacturer's spec that is NOT guaranteed, get the reading from a sensor that we both know is accurate at only 1 temperature and subtract... "Distance to TjMax" is no more accurate than the input data it is derived from.
    Well, not really... I can see your point, but you forget this delta is the data the processor itself uses to calculate when #PROCHOT measures will start. That's the only thing that should matter. Whereas the temperature we see in these programs is an estimation based on incomplete data that gets me nowhere closer to learn when I'm close to #PROCHOT. Only further away. All I need is to misjudge TJMax to get an entirely wrong temperature reading that adds up to an already inaccurate delta, for example.
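
    To put numbers on that, here is a small sketch under stated assumptions: the monitoring tool is taken to compute its displayed temperature as an assumed TJMax minus the delta it reads from the chip. The reading of 45 and the list of TJMax guesses are made up for illustration, and `displayed_temp` is a hypothetical helper, not any real program's API.

```python
def displayed_temp(assumed_tjmax_c, distance_to_tjmax):
    """Temperature a monitoring tool would show: its TJMax guess minus the delta."""
    return assumed_tjmax_c - distance_to_tjmax


if __name__ == "__main__":
    delta = 45  # hypothetical reading: 45 units below the throttle point
    # Each tool (or user) may assume a different TJMax for the same chip.
    for guess in (85, 95, 100, 105):
        shown = displayed_temp(guess, delta)
        print(f"assumed TJMax {guess}C -> displayed {shown}C, delta still {delta}")
```

    The displayed temperature moves by exactly as much as the TJMax guess is off, while the delta that the throttling logic acts on is the same in every row.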

    We know there's presently no way to accurately measure temperatures. In fact manufacturers do too. And they keep telling us that. Why we insist otherwise is beyond me. But the key idea here is that they use the delta to activate counter-measures. So for my purposes of determining my processor thermal solidity this value is 100% accurate. It can't get any better than this. It's the value that determines processor behavior whether it is accurately measured or not. The processor doesn't care if it isn't.

    The idea is that the temperature of my processor, as if measured by an external thermometer, is as useless a piece of information as it can get. Instead what is useful is knowing how far away my processor is from starting to throttle or adopting other counter-measures. We shouldn't even look at this delta as a value in degrees. We shouldn't care what it is. It's entirely irrelevant. Our processor specifications include all the information we care to know about what counter-measures will take place and exactly when. By reading the delta, and the delta alone, we are performing a 100% accurate reading of our processor thermal conditions.

    edit: I swear, as soon as I have some free time I'm going to learn how to read these values and roll out my own processor thermal condition tool.
    Last edited by Mario F.; 03-24-2011 at 09:36 AM.

  15. #15
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by Mario F. View Post
    edit: I swear, as soon as I have some free time I'm going to learn how to read these values and roll out my own processor thermal condition tool.
    Placing you squarely into the realm of a hundred other guys with the same ideas...

    It's part of the SMBus, worked through an A/D converter to give you an arbitrary number that represents some point along the scale from "cold" (minimal diode leakage) to "too hot" (diode cascade). The problem is that cascade "failure" of the kind exhibited by PN diodes is not linear, it's logarithmic; well, sort of... that is to say that for every increase of temperature the diode leakage increases by some amount, the closer you get to cascade the more rapidly the leakage increases... the temperature increase might be linear but the numbers coming off that A/D converter will not be.
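
    A toy model makes the shape easier to see. Everything numeric here is an assumption for illustration: leakage is taken to double every 10C (a common rule of thumb for silicon reverse leakage), the cascade point is placed at 100C, and the converter is treated as 8-bit. It is not a description of any real chip's sensor.

```python
def adc_counts(temp_c, cascade_temp_c=100.0, doubling_step_c=10.0,
               full_scale=255):
    """Toy A/D reading for a leakage current that doubles every doubling_step_c.

    All parameters are hypothetical; full_scale is reached at cascade_temp_c.
    """
    leakage_fraction = 2.0 ** ((temp_c - cascade_temp_c) / doubling_step_c)
    # Clamp at full scale once the diode has gone into cascade.
    return int(full_scale * min(leakage_fraction, 1.0))


if __name__ == "__main__":
    previous = None
    for t in range(40, 101, 10):  # equal 10C steps in temperature
        counts = adc_counts(t)
        step = "" if previous is None else f" (+{counts - previous})"
        print(f"{t}C -> {counts} counts{step}")
        previous = counts
```

    Equal steps in temperature give ever larger steps in the raw number, so most of the converter's range is spent on the last few degrees before cascade and very little of it on the cool end.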

    As you correctly point out... even the manufacturers are telling us this is not a reliable metric and should be taken, always, with a huge dose of skepticism. The only point at which the PN leakage current is reliable as an accurate standard is when the diode heats up enough to become a short, which triggers the thermal safeties.

    It's very informative to take an ordinary PN diode (a 1N914 will do) and measure its reverse current as you heat it up... we are talking about trying to derive large information from minuscule events... accuracy is not a well known feature in such a process.

    Still, if you can crack the problem I'm sure there's a few million self-declared experts who would thank you for it.... but you would, in the end, have to present it in pablum form with some kind of silly bargraph to get them to understand.

    Last Post: 01-11-2008, 05:41 PM