Thread: How Difficult is This?

  1. #31
    Registered User
    Join Date
    Feb 2011
    Posts
    96
    Salem, I understand and agree.
    I did get zarniwoop to work (I think).
    I'm still not sure how to call it.
    I'm typing in the command line now:
    zarniwoop Master.txt Slave.txt

    Is C faster than C# or not?
    Partly, I'm thinking, I've got it working in C and I can access that from C# real easy now.
    Why not in C?
    I realized (and did from the start) that the algorithm would probably contribute the most if I could find a technique that was far superior to what I was doing.
    With your help, that's done.

    If I'm doing this right, zarniwoop is going through 1.48M > 260K in less than 3 seconds total.
    That's with my box humming because it's running 50 threads to scrape the Internet and dogging my HD.

    Bravo! That's unbelievable.
    I'm still studying your code . . .

  2. #32
    Registered User
    Join Date
    Feb 2011
    Posts
    96
    Salem:
    Yea, it's doing it - with my 50 threads still running it did 1.48M > 260K in just a smidge under 2 seconds.
    WHEW! - that would have taken my old proc all day
    Thanks!

  3. #33
    Registered User
    Join Date
    Feb 2011
    Posts
    96
    Salem:
    I'm studying your code.
    Why do you call bsearch the second time against 10% of the master file?
    Also, with rand() you could end up searching the same rec twice yes?

    Why do sometimes you dim function args as const?
    Last edited by MAtkins; 02-11-2011 at 09:53 AM.

  4. #34
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    SSD drives are, apparently, our future:

    Intel's SSD Plans : Intel's X25-M Solid State Drive Reviewed

    VERY fast, and no moving parts. Pricey, but it will come down more, no doubt.

    What's the purpose of all these url's you're gathering?

    @Salem: I know absolutely nothing about C#, except that it needs dot net to run. Never looked into it. Yes, brute force and the most basic but efficient algorithm, was my goal. Data structures might depend on what else he's doing.
    Last edited by Adak; 02-11-2011 at 09:55 AM.

  5. #35
    Registered User
    Join Date
    Feb 2011
    Posts
    96
    Youza - 250 MB/s is fast. My drives read at 120 MB/s sustained.
    My main drive is 650 Gig though, they'll have to make 'em bigger.

    I'm doing Internet data polling for clients, mostly for SEO work.
    Last edited by MAtkins; 02-11-2011 at 10:22 AM.

  6. #36
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    MAtkins... have a browse around this site... Techgage ... these guys really like SSDs and Rob Williams is a pretty good reviewer...

  7. #37
    Registered User
    Join Date
    Feb 2011
    Posts
    96
    Well, I'm stumped on my first real edit.
    I'm appending files:
    TestOut1.txt with the base_url
    TestOut2.txt with the full_url

    I'm loading both from the same loop but in the middle of it Windows throws me an error saying the app quit.

    When I look at the output files TestOut1.txt has less recs than TestOut2.txt.
    TestOut2.txt looks like it quit right in the middle of a URL.

    I can't tell if I ran out of memory or . . .
    Can you point me in a direction so I have a clue as to how to figure out what the problem is?

    I tried to attach my test files but they're pretty big . . .

  8. #38
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,661
    > Why do you call bsearch the second time against 10% of the master file?
    > Also, with rand() you could end up searching the same rec twice yes?
    The whole last section is just a bit of bench-marking to see how long searching takes.
    0.01 seconds for the first result is noise, so I wanted a bigger sample.

    Reading the whole of the 2nd file is sub-optimal as well, since there is no need to store anything. But it was quick and easy to re-use the same code. It will save some short-term RAM use, but it won't do much to change the slowness of the file reading.

    > Why do sometimes you dim function args as const?
    - To stop me changing things by accident
    - To give the compiler a little extra something to play with at optimisation time. Certain things which are known to be const can be optimised better.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  9. #39
    and the hat of int overfl Salem's Avatar

    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,661
    > fprintf(fp1, strcat(db[i].base_url,"\n"));
    Use
    fprintf(fp1, "%s\n", db[i].base_url);

    1. strcat is attempting to append a character to a string where the exact length has been allocated. This is a buffer overflow.
    2. Any % characters that appear in this string will cause fprintf to look for additional (and non-existent) parameters.


    > result[numRecords].full_url = r.full_url;
    > result[numRecords].base_url = r.base_url;
    > numRecords++;
    At this point, you need to set your isUnique member to false as well.
    realloc won't zero this memory for you.

  10. #40
    Registered User
    Join Date
    Feb 2011
    Posts
    96
    Man, how do you guys learn this stuff!?
    I had figured out that it had something to do with the base_url and not about writing the file at all.
    So, fprintf works like printf then? That makes sense.

    Boy, I'm feeling DUMB here -) OK, how do I set an int to false? Just make it 0? or NULL or . . .

  11. #41
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,661
    Zero (0) will do it.

  12. #42
    Registered User
    Join Date
    Feb 2011
    Posts
    96
    K, I did that.
    For the 1.48M > 260K it took 3.4 seconds total, including writing to the 2 files.
    That's just unbelievable.

    Now if I can figure out how to skate through a text file that fast . . .

  13. #43
    Algorithm Dissector iMalc
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    The person who wrote the article is an idiot.
    Specialised algorithms like Boyer-Moore are specifically designed for searching under different conditions than naive algorithms are designed for. E.g. longer strings to search for, a more limited alphabet, a high number of prefix but not whole-word matches, searching through very long datasets, etc. It's apples and oranges.
    Using an algorithm like that to search for "the" would be like using quicksort on 5 items. Bubble sort is much faster in that case.

    Surely it would have been fairer to put as much effort into optimising the C++ code as whoever wrote the assembly version put into optimising theirs.

    The benchmark is flawed in other ways too. Things like the switch statement having the strstr variant as the first case. Hmm, assuming the compiler doesn't optimise the switch to a jump-table, which one will that one make fastest?! Caching effects are not taken into consideration. I could go on...
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  14. #44
    Algorithm Dissector iMalc
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Originally posted by MAtkins:
    > So, tell me about those SSD drives . . .
    > I'm running Win 7 64 bit on an Alienware Aurora - overclocked at I think 3.67 GHz Quad core - 6 Gig RAM.
    > It's got a radiator in it.
    Ooh nice! Just got a new machine at work yesterday. Win 7 64 bit, 2.9GHz 8 core, 12 Gig RAM, and a couple of 10000RPM raptor 512 Gig hard drives.

    One guy at work had an SSD in his old machine, and oh yeah is that fast!
