Thread: BSOD after OC

  1. #1
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446

    BSOD after OC

    I need your help to start investigating the causes of sporadic BSODs after OCing an i5 760 @ 2.8 GHz to 3.6 GHz. After many years without getting myself into these adventures, I lost track and find myself pretty much in the dark.

    The machine is a good one, built by some of the best folks in the country. It comes from a company called F13PC, made up of enthusiasts and, I'll say, gurus. Specs are:

    • Chassis: NOX COOLBAY Side Window with 2 fans
    • Motherboard: ASUS P7P55D-E, Socket 1156 (SATA 6Gb/s & USB 3.0)
    • CPU: Intel Core i5 760 quad core, 2.80 GHz, OCed @ 3.60 GHz
    • Cooler: Arctic Cooling Freezer 7 Pro Rev.2
    • Memory: G.Skill Ripjaws 4 GB DDR3 1600 MHz CL8, dual channel
    • Graphics: ASUS GeForce GTX 560 Ti DC II, 1024 MB GDDR5
    • Hard disk: Samsung SpinPoint F3, 1 TB, 32 MB cache


    The BIOS includes two profiles: one for the OC and one with the stock settings. Under the OC profile I get the occasional BSOD, sometimes not long after booting, other times only after long hours of uptime. BSODs happen even under nominal conditions, like while doing stuff on the desktop. The BSOD screen goes by too fast. Before I can take a good look at the error, the machine is already rebooting, bless her.

    Temps under OC seem normal. The CPU cores don't exceed 47 °C (117 °F) when idle and can ramp up to 70 °C (158 °F) under heavy load in just a few seconds. But I have never seen them exceed 80 °C (176 °F).

    My hope is that this isn't a stability issue, but a driver issue. The GeForce is just a few weeks old. It was launched last month and the drivers are very new. It isn't even supported yet by Nvidia System Tools. However, the machine doesn't seem to want to BSOD when I switch to the BIOS profile without the OC.

    How can I start investigating this BSOD? The overclock boosts the CPU considerably, giving it i7-960-like performance. Not that I couldn't live without it, far from it, but it seems a shame to miss the opportunity.
    Originally Posted by brewbuck:
    Reimplementing a large system in another language to get a 25% performance boost is nonsense. It would be cheaper to just get a computer which is 25% faster.

  2. #2
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    Oh, nice! Thanks. I'll start right away.

  4. #4
    Devil's Advocate SlyMaelstrom's Avatar
    Join Date
    May 2004
    Location
    Out of scope
    Posts
    4,079
    Run a memory test. That's always a good place to start.

    If it fails then you'll probably have to modify the ratio.
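
    To illustrate why the memory ratio matters here, a minimal sketch follows. It assumes (none of this is stated in the thread) that the overclock was done by raising the base clock while keeping the stock CPU multiplier of 21, and that effective DDR3 speed is base clock times memory multiplier. The multipliers listed are hypothetical BIOS options, not the machine's actual settings.

```python
# Sketch: a base-clock (BCLK) overclock drags the memory speed up with it.
# Assumptions (not from the thread): CPU multiplier stays at 21, and
# effective DDR3 speed = BCLK * memory multiplier.

STOCK_BCLK_MHZ = 2800 / 21      # roughly 133.3 MHz for an i5 760 at 2.8 GHz
RATED_MEMORY_MTS = 1600         # the Ripjaws kit is rated DDR3 1600


def bclk_for(cpu_mhz, cpu_multiplier=21):
    """Base clock needed to reach cpu_mhz with the given CPU multiplier."""
    return cpu_mhz / cpu_multiplier


def memory_speed(bclk_mhz, memory_multiplier):
    """Effective DDR3 speed in MT/s for a BCLK and memory multiplier."""
    return bclk_mhz * memory_multiplier


oc_bclk = bclk_for(3600)
for mult in (10, 8, 6):         # hypothetical memory multipliers
    speed = memory_speed(oc_bclk, mult)
    verdict = "over" if speed > RATED_MEMORY_MTS else "within"
    print(f"x{mult}: {speed:.0f} MT/s ({verdict} the 1600 rating)")
```

    Under these assumptions a x10 memory multiplier would push the memory to about 1714 MT/s, past its rating, while x8 would leave it at about 1371 MT/s. That is the sense in which a failed memory test would point at the ratio.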
    Sent from my iPad®

  5. #5
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    It may also be the result of not enough voltage to the CPU, making it unstable. Meh. It can be too many things. A good starting point is to use the debugging tools to see what kind of error it was. The documentation for the debugging tools also describes what each error means, along with recommendations.
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  6. #6
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    One thing you didn't mention was the power supply.
    Overclocking will draw more power, so it's important that your PSU has a lot of headroom in normal operation so that it can easily ride through any transients.

    > OCed @ 3.60 Ghz
    What about, say, 3.2?
    Is that more stable?

  7. #7
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by Mario F. View Post
    I need your help to start investigating the causes of sporadic BSODs after OCing an i5 760 @ 2.8 GHz to 3.6 GHz. After many years without getting myself into these adventures, I lost track and find myself pretty much in the dark.
    First you should understand how the speed ratings on CPU chips work... When they make a batch of chips they will take a sample from each die and test it. The die is rated for the maximum stable speed of the sample at its designed voltage and temperature. Unless you paid an exorbitant fee for what is commonly called "black edition" chips, there is no guarantee they will overclock *at all*.

    The first big consideration when overclocking is thermal... when you push a chip beyond its limits it's going to get hot... that should be obvious, but what is not obvious is the way thermal ratings on these chips work. The spec Tj (Junction Temperature) is often cited for both AMD and Intel chips. To be perfectly clear, this does not mean you can safely operate the chip at these temperatures. It means that at those temperatures some number of chips (probably 25% or more) will fail. Above those temperatures the failure rate increases exponentially.

    Be aware that motherboard sensors do not show you Tj... they're most often showing you a case temperature or a core temperature measured by internal sensors. These temperatures can be half or less of the actual Tj of the CPU at the points furthest from the sensors. Moreover, since internal sensing is primarily a failsafe mechanism, the sensors themselves can be rather inaccurate below maximum temperatures. This is to say: the reported temperatures tend to be inaccurate and, as often as not, on the low side.

    But it's more complex than even that... using a cooling solution applied to only one side of the chip creates a temperature gradient through the chip, so some spots will be hotter than others, sometimes by a significant amount. With a mediocre cooling solution, like those supplied "in box", it is very likely that you will find yourself in the unenviable state of having enough cooling to fool the sensors but not enough to avoid a large heat gradient across the innards of the chip. (One of my techs once asked me why the old AMD XP chips always "blew out" from the bottom... this is why.)

    You should also be aware that when overclocking you need to increase the chip's voltage to keep it running. This one's really complex, but for this discussion you should be aware that increasing the voltage even minutely causes large increases in power consumption, which in turn increase heat production, further exacerbating the cooling problems described above.
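
    As a rough illustration of the voltage point, the usual first-order approximation for CMOS dynamic power is P ≈ C·V²·f, so power scales linearly with clock and with the square of voltage. The sketch below applies that approximation to the 2.8 GHz to 3.6 GHz jump. The two core voltages are hypothetical placeholders, since the thread never states the actual BIOS values, and leakage power is ignored.

```python
def relative_dynamic_power(f_old, f_new, v_old, v_new):
    """Ratio of new to old dynamic power under the P ~ C * V^2 * f model."""
    return (f_new / f_old) * (v_new / v_old) ** 2


# Clock speeds are from the thread; the voltages are made-up examples.
STOCK_GHZ, OC_GHZ = 2.8, 3.6
STOCK_V, OC_V = 1.20, 1.35      # hypothetical core voltages

freq_only = relative_dynamic_power(STOCK_GHZ, OC_GHZ, STOCK_V, STOCK_V)
with_bump = relative_dynamic_power(STOCK_GHZ, OC_GHZ, STOCK_V, OC_V)

print(f"clock increase: {(OC_GHZ / STOCK_GHZ - 1) * 100:.1f}%")
print(f"power at unchanged voltage: x{freq_only:.2f}")
print(f"power with the voltage bump: x{with_bump:.2f}")
```

    With these example numbers the clock alone accounts for roughly 1.29 times the stock dynamic power, and the hypothetical 0.15 V bump raises that to roughly 1.63 times, all of which has to leave through the same cooler.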

    You are overclocking by almost 29% (2.8 GHz to 3.6 GHz)... that is a lot to ask of a chip that has not been specifically selected for overclocking. Most consumer grade chips will go 5 or 6 percent without too much trouble. Beyond that you're into high capacity cooling, selected chips, high end power supplies and a greatly increased risk of spontaneous failure.

    From your description, you are working beyond your present chip's capabilities and your best bet is to simply turn it down. Just because it will run at 3.6 GHz for a minute or two while you tweak the BIOS absolutely does not mean it's going to run that way perpetually.

    In my experience maintaining upwards of 500 machines, overclocking usually creates more problems than it solves... Fun to play with, not fun to live with.

  8. #8
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    Took me a bit because I was having errors with the symbol files. Who would have ever thought I would need this string to load them? "srv*c:\symbols*http://msdl.microsoft.com/download/symbols"
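
    For anyone puzzled by that string, it appears to follow a "srv*<local cache>*<symbol server>" layout: downloaded symbols are cached in the first location and fetched from the second. A minimal sketch of composing and splitting such a string follows. The helper names are made up for illustration, and the result is only text that you would still hand to the debugger yourself, for example through its symbol path setting.

```python
def symbol_path(cache_dir, server_url):
    """Build a 'srv*<cache>*<server>' symbol path string."""
    return f"srv*{cache_dir}*{server_url}"


def split_symbol_path(path):
    """Split a 'srv*<cache>*<server>' string back into (cache, server)."""
    prefix, cache_dir, server_url = path.split("*", 2)
    if prefix.lower() != "srv":
        raise ValueError("expected a path starting with 'srv*'")
    return cache_dir, server_url


path = symbol_path(r"c:\symbols", "http://msdl.microsoft.com/download/symbols")
print(path)
print(split_symbol_path(path))
```

    The first print reproduces the exact string quoted above, with c:\symbols as the local cache folder.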

    Anyways,
    Code:
    BugCheck 1000008E, {c0000005, 940c23de, abb34b0c, 0}
    
    Probably caused by : Npfs.SYS ( Npfs!NpRemoveAllAttributesFromList+14 )
    
    Followup: MachineOwner
    I've also snipped the top portion of the report (after !analyze -v) which is the bit that matters, I think:

    Code:
    KERNEL_MODE_EXCEPTION_NOT_HANDLED_M (1000008e)
    [...]
    Arguments:
    Arg1: c0000005, The exception code that was not handled
    Arg2: 940c23de, The address that the exception occurred at
    Arg3: abb34b0c, Trap Frame
    Arg4: 00000000
    
    EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.
    
    FAULTING_IP: 
    Npfs!NpRemoveAllAttributesFromList+14
    940c23de 897004          mov     dword ptr [eax+4],esi
    If I can trust this, it's not a driver problem. I rather wish it was.
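
    The numbers in that dump can be pulled apart mechanically. The sketch below parses the "BugCheck" line quoted above and subtracts the +14 offset (which the debugger prints in hexadecimal) from the faulting address to get the start of the function. The helper name is made up for illustration, and the pattern only covers this one line format.

```python
import re

BUGCHECK_LINE = "BugCheck 1000008E, {c0000005, 940c23de, abb34b0c, 0}"


def parse_bugcheck(line):
    """Return (code, [arg1..arg4]) as integers from a 'BugCheck ...' line."""
    match = re.match(r"BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}", line)
    if match is None:
        raise ValueError("not a BugCheck line")
    code = int(match.group(1), 16)
    args = [int(part.strip(), 16) for part in match.group(2).split(",")]
    return code, args


code, args = parse_bugcheck(BUGCHECK_LINE)

# The faulting instruction is at NpRemoveAllAttributesFromList+14 (hex offset).
function_start = args[1] - 0x14

print(f"bug check   : {code:#x}")
print(f"exception   : {args[0]:#x}")
print(f"faulting IP : {args[1]:#x}")
print(f"function at : {function_start:#x}")
```

    Here the exception value 0xc0000005 is the access-violation status, and the function start works out to 0x940c23ca, so the fault happened 20 bytes into NpRemoveAllAttributesFromList.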

    The PSU is good enough: a modular NOX Apex 600 W. I suppose I could try to reduce the OC a bit. But I'll need them to tell me how. I don't know how to overclock.

  9. #9
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Bugcheck description: The KERNEL_MODE_EXCEPTION_NOT_HANDLED_M bug check has a value of 0x1000008E. This indicates that a kernel-mode program generated an exception which the error handler did not catch.
    File description: A Windows (kernel) file.

    Obviously something went wrong. A bug, perhaps, but that's unlikely. More likely, your CPU cannot handle the stress.
    You should probably run a stress tester like Prime95 or such. It's a good bet you need to lower the overclock or increase the voltage.
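
    The idea behind such a stress test can be sketched in a few lines: repeat a deterministic calculation many times and flag any run whose result differs from the first one, since on healthy hardware the answers must always match. This is only a toy, assuming a single-threaded SHA-256 loop as the workload. It loads the CPU far less than Prime95 does and is no substitute for it.

```python
import hashlib


def workload(seed, rounds=1000):
    """Deterministic CPU-bound work: hash a seed value over and over."""
    digest = seed.to_bytes(8, "little")
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest


def count_mismatches(passes=20, seeds=range(8)):
    """Repeat the workload and count results that differ from the first pass."""
    reference = {seed: workload(seed) for seed in seeds}
    mismatches = 0
    for _ in range(passes):
        for seed in seeds:
            if workload(seed) != reference[seed]:
                mismatches += 1
    return mismatches


if __name__ == "__main__":
    print("mismatches:", count_mismatches())
```

    Any count other than zero would mean the same input produced different output on the same machine, which points at hardware rather than software.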

  10. #10
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    A mov operation on a referenced address may also indicate memory issues. Still, since I don't get BSODs at stock settings, it's almost certainly as you folks say. Lower the OC. Or just plain don't do it. Which doesn't bother me much... just a little.

  11. #11
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    I suppose it could, but the CPU makes... "mistakes" when under stress. Among other things, it calculates things incorrectly. Possibly transferring things incorrectly to/from memory, as well.

  12. #12
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    Oh right. Indeed it could.

  13. #13
    Devil's Advocate SlyMaelstrom's Avatar
    Join Date
    May 2004
    Location
    Out of scope
    Posts
    4,079
    The reason I was less concerned about voltage and power supply is that this is a pre-built computer. Typically, these things undergo some sort of Q.A. before they are shipped, so you can be pretty certain the power supply can handle the overclock so long as you didn't add any components. And typically, if the CPU is stable during testing then it will remain stable for quite some time, whereas I've seen memory faults crop up randomly after a few days or weeks.

    You should definitely do diagnostics on the whole system after you overclock... run memory tests, run CPU stress tests... you could even test the stability of the voltages on the rails. That's all expected. However, my short stint at building computers for people has led me to find that it's usually the memory that has problems first when it comes to overclocking.

  14. #14
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by SlyMaelstrom View Post
    The reason I was less concerned about voltage and power supply is because this is a pre-built computer. Typically, these things undergo some sort of Q.A. before they are shipped
    Ummmm... let's hope so.
    But as something of an "insider" for quite some time, I've seen some pretty wild stuff going on...

    One major company (name withheld to avoid lawsuits) that I know of, builds a new configuration, installs the OS, drivers and SW bundle... then removes the hard disk and sends it to a third party company to be copied. A couple of weeks later they get a big crate of hard disks and begin assembly on the new model, which then gets boxed up and goes out the shipping door without even being turned on.

    And, no, this is not an uncommon practice.

    Seriously... I hope people don't actually think they test every motherboard, every video card, every hard disk... These days we're lucky if they test 1 in 500 during a production run. QA is just too expensive when you're offering a massively complex device for under $100.00
    Last edited by CommonTater; 02-20-2011 at 10:37 AM.

  15. #15
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    This is a small enthusiast-run store, for enthusiasts. Trust me when I say they provide one of the best services you can get. This isn't a shop that sells computers to businesses. They only build for demanding gamers, the national OC team, and other enthusiasts.

    Despite them clearly having failed me (I think the minidump clearly points to a stability issue, and as such the machine probably wasn't tested), I am just one email away from having them replace the machine in 24 hours, no questions asked. I had taken it for granted that this setup was stable simply because they sell it. That's the level of confidence I have in them. Still, yeah, they failed. But they aren't anything like that.

    Anyways, memory or CPU. One of those is the culprit, I guess. I will leave the machine running a memory check overnight as Sly suggests. Only I'll do it in normal mode. If it hasn't detected any errors by morning, I might keep it and forget about the OC. If it does detect memory errors, then I'll return it for a replacement.

    Thanks a bunch y'all.
