Thread: BSOD after OC

  1. #16
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by Mario F. View Post
    This is a small enthusiast-run store, for enthusiasts. Trust me when I say they provide one of the best services you can get. This isn't a shop that sells computers to businesses. They only build for exigent gamers, the OC national team, and other enthusiasts.

    Despite them clearly having failed me (I think the minidump clearly points to a stability issue and as such the machine probably wasn't tested), all I am is one email away from them replacing the machine within 24 hours, no questions asked. I just know for a fact that this setup is stable because they sell it. That's the level of confidence I have in them. Still, yeah, they failed. But they aren't anything like that.

    Anyways, memory or CPU. One of those is the culprit, I guess. I will leave the machine running a memory check overnight as Sly suggests. Only I'll do it in normal mode. If in the morning it hasn't detected any errors, I might keep it and forget about the OC. If it does detect a memory error, then I'll return it for a replacement.

    Thanks a bunch y'all.
    Sounds like a good plan... Or, perhaps give them a call and see if they can't talk you through adjusting it for a 10% or so OC instead of 25%... that's a bit much for any chip, except the "black edition" ones I mentioned (Typically $1,000cn ++)

  2. #17
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    Yup. Good idea. Having them teach me how to set a lower OC would make me happy indeed.
    Originally Posted by brewbuck:
    Reimplementing a large system in another language to get a 25% performance boost is nonsense. It would be cheaper to just get a computer which is 25% faster.

  3. #18
    Devil's Advocate SlyMaelstrom's Avatar
    Join Date
    May 2004
    Location
    Out of scope
    Posts
    4,079
    Quote Originally Posted by Mario F. View Post
    This is a small enthusiast-run store, for enthusiasts. Trust me when I say they provide one of the best services you can get. This isn't a shop that sells computers to businesses. They only build for exigent gamers, the OC national team, and other enthusiasts.
    Yes, exactly.

    What you're talking about, CommonTater, are the practices of a company like Alienware or Falcon Northwest... what I like to call manufactured-boutique computers. Or perhaps more appropriately, ex-boutique computer shops. Companies that got famous for taking extra special care of their systems and then got too big for their own practice.

    Quote Originally Posted by Mario F. View Post
    Despite them clearly having failed me (I think the minidump clearly points to a stability issue and as such the machine probably wasn't tested)
    Not necessarily; my point about RAM is that it can fail on you even after Q.A. G.SKILL itself actually has a pretty extensive Q.A. testing process on their units, but the nature of RAM (and I say this from experience because I really couldn't say technically why this is true) is that it can take days of use before you find out it's a lemon.

    Anyway, it sounds like you paid good money for this machine, and a lot of that was certainly for technical support and insurance, so perhaps I wouldn't bother wasting your time on any testing and would just go get your money's worth from them.

    By the way, 25% is a fairly modest overclock for the Core i3/i5/i7 line of processors. CommonTater is absolutely right that the best dies go right into the best chips... so by nature, if it's in a standard-level processor, that means it's not as good... but regardless, AMD and Intel have gotten into the practice of clocking their units below what they're capable of, which leaves headroom for enthusiasts (and also helps the manufacturers, as it results in fewer faulty units).
    Sent from my iPad®

  4. #19
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    Nowadays they usually underclock their chips to fill all market segments, because their manufacturing process is too good to produce enough slow chips.

    My 1.86GHz Core 2 Duo runs at 3.5GHz. That's nearly a 90% overclock. The fact that it's (or was) the absolute lowest end chip helped. This is not an exception. Almost everyone who has the same chip reports achieving a similar OC.
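
    For anyone who wants to check the overclock percentages thrown around in this thread, here is a minimal sketch in Python. The function name and the dictionary labels are made up for illustration; the clock speeds are the ones posters quote in this thread.

```python
def overclock_percent(stock_ghz: float, overclocked_ghz: float) -> float:
    """Return the overclock as a percentage above the stock clock."""
    if stock_ghz <= 0:
        raise ValueError("stock clock must be positive")
    return (overclocked_ghz - stock_ghz) / stock_ghz * 100.0


# Clock pairs quoted by posters in this thread: (stock GHz, overclocked GHz).
examples = {
    "Core 2 Duo": (1.86, 3.5),
    "i7-920": (2.66, 4.0),
}

for name, (stock, oc) in examples.items():
    print(f"{name}: {overclock_percent(stock, oc):.1f}% over stock")
```

    With these inputs the Core 2 Duo works out to roughly 88% over stock and the i7-920 to roughly 50%.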

    RAM is especially susceptible to thermal failure because it's essentially a big array of capacitors. At higher temperatures (say 50C), capacitor leakage increases, limiting maximum frequency.

  5. #20
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Mario, welcome to the world of analyzing BSOD dumps.

    I'd collect a number of these dumps, say 5 or so. Look at all of them. If you detect any sort of consistent pattern, then maybe something can be dealt with. Otherwise, you're dealing with random hardware screwups due to things running out of spec, and the software level isn't the right place to analyze those.
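
    The "look for a consistent pattern" step can be sketched roughly as below, in Python. This is only an illustration under stated assumptions: the function name, the 60% threshold, and every bug check code and module name in the sample lists are hypothetical, not taken from the dumps discussed in this thread. It assumes you have already noted down a (bug check code, faulting module) pair for each dump by hand.

```python
from collections import Counter


def dominant_signature(dumps, min_share=0.6):
    """Return the most common (bug check code, module) pair if it accounts
    for at least min_share of the dumps; otherwise return None."""
    if not dumps:
        return None
    signature, count = Counter(dumps).most_common(1)[0]
    return signature if count / len(dumps) >= min_share else None


# Hypothetical sample data: five dumps that mostly agree...
consistent = [("0x000000D1", "video.sys")] * 4 + [("0x00000050", "ntoskrnl.exe")]

# ...and five dumps that are all different (the "random hardware" case).
scattered = [
    ("0x00000124", "hal.dll"),
    ("0x00000050", "ntoskrnl.exe"),
    ("0x0000003B", "win32k.sys"),
    ("0x0000001A", "ntoskrnl.exe"),
    ("0x00000101", "ntoskrnl.exe"),
]

print(dominant_signature(consistent))
print(dominant_signature(scattered))
```

    The first call returns the repeated pair, which would be worth chasing at the software level; the second returns None, which matches the "things running out of spec" case.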

    I once patched a bug in a video driver using WinDbg and a hex editor. Solved a mystifying period of spontaneous reboots that had been bothering me for weeks. Tremendously fun, actually. (Okay, not JUST WinDbg and a hex editor. I have some other neat tools in my toolbox.)
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  6. #21
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    It's fascinating indeed. Unfortunately, I'm under a lot of stress at work, with a long list of stuff to learn. Currently I'm starting with the CLR, and boy will this take my time. But I'm certainly adding this to my list for later. Two recent events made me very eager to study Windows development more in depth: this experience with WinDbg, which really gave me a strong feeling of control, and that story about the Notepad line break bug discussed on another thread and how this fella managed to fix it (Fix Notepad's CR CR LF bug with a hex editor). This level of control is definitely very satisfying once someone attains it, as you clearly demonstrate.

  7. #22
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    Well, here it is in all its glory (linked to CPU-Z validation page):

    Attachment 10377

    It turns out it was a combination of factors: The cooler wasn't properly attached and the OC had a mistake. The cpu voltage was too low. The guy who did my machine thinks he probably saved the profile to the bios, did some more tweaking and then forgot to save it again.

    It's working like a charm now. This one can be safely OCed even to 4.1 GHz, but I would need a different cooler than the one I have right now. Still, at this level it wipes the floor with most i7s and pairs up quite nicely with the i7 960 on many tests.

  8. #23
    Gawking at stupidity
    Join Date
    Jul 2004
    Location
    Oregon, USA
    Posts
    3,218
    I wouldn't mind going up against you. I've had my i7-920 OC'd from 2.66GHz to 4GHz for over a year.
    If you understand what you're doing, you're not learning anything.

  9. #24
    Devil's Advocate SlyMaelstrom's Avatar
    Join Date
    May 2004
    Location
    Out of scope
    Posts
    4,079
    Quote Originally Posted by Mario F. View Post
    It turns out it was a combination of factors: The cooler wasn't properly attached and the OC had a mistake. The cpu voltage was too low. The guy who did my machine thinks he probably saved the profile to the bios, did some more tweaking and then forgot to save it again.
    Hmmm, write a letter. Maybe you can get a free cooler out of it.

    I really didn't expect a boutique shop to screw up voltages. You could definitely whine your way into some sort of discount if you wanted to. ... or you could just let it go and be happy with your computer.

  10. #25
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    Quote Originally Posted by itsme86 View Post
    I wouldn't mind going up against you. I've had my i7-920 OC'd from 2.66GHz to 4GHz for over a year.
    Tsk! I'd totally be game for that if I got mine to 4.1 GHz.

    You want to compare the last of the Lynnfields to the earliest of the Bloomfields? Of course, you would own me on hyper-threading. But that's about it.

    Look at me, you make me sound like a braggart hehe

  11. #26
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    Quote Originally Posted by SlyMaelstrom View Post
    I really didn't expect a boutique shop to screw up voltages. You could definitely whine your way into some sort of discount if you wanted to. ... or you could just let it go and be happy with your computer.
    There was no doubt in my mind they screwed up. There wasn't in theirs either. After all, the computer wasn't operating as it should. But they were very cool about it, didn't lie, apologized, and the closest thing to an excuse was "you can't do it right all the time". The guy was sincerely confused as to how that OC profile got in there, though. I felt like I needed to say I didn't put it in there, lol.

    Anyways, they came to my home and fixed it there in something like 5 minutes. I'm good. To me it was just an honest mistake.

  12. #27
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by Mario F. View Post
    Tsk! I'd totally be game for that if I got mine to 4.1 GHz.
    At 4.1 GHz you would absolutely need water cooling, perhaps refrigeration cooling to sustain that speed for more than a couple of months (if you even got that out of it).

    Watch your temperatures VERY closely... if it's spiking over 60C, you will need to vastly improve your cooling or turn the multiplier down a couple of notches... You may also need to bring the voltage down one notch... all in the name of not causing a meltdown.

    Contrary to the statements of some, most CPUs are NOT "underclocked"... they will come from a sample batch where the combination of speed, voltage and heat is within acceptable ranges for consumer distribution... Some chips from that batch will do better than others, some will not.

  13. #28
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    Quote Originally Posted by CommonTater View Post
    At 4.1 GHz you would absolutely need water cooling, perhaps refrigeration cooling to sustain that speed for more than a couple of months (if you even got that out of it).
    I'm hearing that air cooling is enough. And reports everywhere seem to confirm this. I just need a slightly better cooler.

    Watch your temperatures VERY closely... if it's spiking over 60C, you will need to vastly improve your cooling or turn the multiplier down a couple of notches... You may also need to bring the voltage down one notch... all in the name of not causing a meltdown.
    Ok, I confess I'm not into OCing. I'm a total newb in that area. But boy, you are scary.

    The few things I know... 60 is a very good operating temperature under constant load. That's 40 below TJMax for these processors. Meanwhile, at TJMax they will not melt. They will underclock and power down. With this setup at 24x and 1.28 V I fire up LinX and during the 20 passes it never reaches 70, which is 30 below TJMax. LinX does things to the processor that you simply won't do during your daily usage.
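
    The headroom arithmetic in that last paragraph can be written out as a short Python sketch. The TJMax of 100C is an assumption inferred from "60 is 40 below TJMax"; check the actual figure for your own processor. The `gradient_margin_c` parameter is a made-up knob for the possibility that the hottest spot on the chip runs warmer than the sensor reports, and the 10C used below is purely illustrative.

```python
# Assumed value: implied by "60 is 40 below TJMax" in the post above.
TJMAX_C = 100.0


def thermal_headroom(core_temp_c, tjmax_c=TJMAX_C, gradient_margin_c=0.0):
    """Degrees C left before TJMax, after subtracting an optional margin
    for hot spots located away from the temperature sensor."""
    return tjmax_c - core_temp_c - gradient_margin_c


print(thermal_headroom(60.0))                          # constant load
print(thermal_headroom(70.0))                          # LinX stress test peak
print(thermal_headroom(70.0, gradient_margin_c=10.0))  # with a hypothetical 10C margin
```

    Under these assumptions the three calls give 40, 30 and 20 degrees of headroom respectively.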

    Some chips from that batch will do better than others, some will not.
    That's a serious thing to say, considering the manufacturing processes involved at Intel (or pretty much any company of this type). I'd expect different batches to produce problems. Shouldn't, but as machines are recalibrated and stuff, errors can be introduced into the manufacturing line. Most are immediately detected by their systems. Others may not. But I don't expect chips from the same batch to be anything else than absolute facsimiles of each other. If they are not, that can probably involve someone high up being fired.

  14. #29
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by Mario F. View Post
    That's a serious thing to say, considering the manufacturing processes involved at Intel (or pretty much any company of this type). I'd expect different batches to produce problems. Shouldn't, but as machines are recalibrated and stuff, errors can be introduced into the manufacturing line. Most are immediately detected by their systems. Others may not. But I don't expect chips from the same batch to be anything else than absolute facsimiles of each other. If they are not, that can probably involve someone high up being fired.
    My father-in-law is the head guy at Intel who figures out what went wrong when the above happens. I don't think it normally involves heads rolling, but then again, large-scale screwups don't really happen and when they do they don't make it out to the world.

    The main concern isn't a bunch of bad wafers. The serious problems are things like loading the wrong wafers into the wrong machine, destroying a $100 million machine by contaminating it with copper, stuff like that. Unfortunately those kinds of errors are typically made by low-level employees; you can guess what happens to those folks.

  15. #30
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by Mario F. View Post
    Ok, I confess I'm not into OCing. I'm a total newb in that area. But boy, you are scary.
    Cautious, Mario... I am very cautious with other people's stuff... and not just computers. I'm not at all like my buddy who borrows my kid's bike and brings it back with a dent in the gas tank and a bent front fork... I treat other people's stuff like it's ready to fall apart in my hands...

    Advising you to go at this cautiously is just a natural offshoot of that.

    The few things I know... 60 is a very good operating temperature under constant load. That's 40 below TJMax for these processors. Meanwhile, at TJMax they will not melt. They will underclock and power down.
    When all goes well --and it doesn't always-- that is what should happen. However, they're not going to simply turn off the processor or the computer... too much risk of data loss. Consider that 10 million record customer file that is about 70% complete when it overheats... The last thing you want is a shutdown. So they engage cooling strategies as you pointed out *trusting* they will work as advertised.

    With this setup at 24x and 1.28 V I fire up LinX and during the 20 passes it never reaches 70, which is 30 below TJMax. LinX does things to the processor that you simply won't do during your daily usage.
    Ok... go back to my first message on this and re-read the part about temperature gradients. You can have a TJMax - 30 reading at the sensor but at some point away from the sensor you might actually be getting very close to the limit unless your cooling solution is adequate to keep the whole chip evenly warm. (As I pointed out before, this is an artifact of single sided cooling...)

    One of the things most people don't realize about cooling solutions like those used on PCs is that they do more than simply blow heat away... they also need enough mass at their basepoints to keep the whole chip at a uniform temperature... when you get a hot spot --and it does happen-- part of the cooler's job is to cool that part of the chip AND warm the remainder of it in an attempt to keep a uniform temperature. (This is, by the way, why the original AMD "thumbnail" solution was replaced by chips with substantial heat spreaders on top.)

    That's a serious thing to say, considering the manufacturing processes involved at Intel (or pretty much any company of this type). I'd expect different batches to produce problems. Shouldn't, but as machines are recalibrated and stuff, errors can be introduced into the manufacturing line. Most are immediately detected by their systems. Others may not. But I don't expect chips from the same batch to be anything else than absolute facsimiles of each other. If they are not, that can probably involve someone high up being fired.
    CPU chips are not made individually. They are made in batches of several dozen on a large wafer that is then cut apart into dies and finished into the palm-sized chip you see. The manufacturing process is such that all chips on the wafer are "grown" separately but simultaneously... it's not uncommon for this process to create errors in some parts of the wafer that do not exist in others.

    In fact, you stand a much better chance that a chip from position #23 on one wafer is very close to a chip from position #23 on the wafer before it than you stand of having #23 and #67 on the same wafer be absolutely alike.

    Every wafer produces a couple of duds. Every wafer has minor qualitative differences... Trusty old #23 might hit 4 GHz with no problems but poor old #67 could well die at 3.6...

    Thus precaution tells me to treat every chip as its own distinct entity... tweak it up, get it all working on its own merits... then back everything off one notch and enjoy the fruits of my labours.
    Last edited by CommonTater; 02-22-2011 at 01:35 PM.
