Two Questions: Reading a word at a time, and list intersection

This is a discussion on Two Questions: Reading a word at a time, and list intersection within the C Programming forums, part of the General Programming Boards category; Hey folks, I'm trying to create a function that reads a word one at a time. I have that function ...

  1. #1
    Registered User
    Join Date
    Mar 2011
    Posts
    25

    Two Questions: Reading a word at a time, and list intersection

    Hey folks,

    I'm trying to create a function that reads a word one at a time. I have that function all ready to go, and I have a loop in my main that should terminate when there aren't any more words to be read. However, I'm having a hard time coming up with a condition that will satisfy that idea.

    Here's the code for that looping structure. My get_word returns a char*.
    Code:
      FILE *f;
      f = fopen(argv[1], "r");
      char* word;
      word = get_word(f);
      while(word != NULL)
      {  
        printf("%s\n", word);
        word = get_word(f);
      }
      fclose(f);
    My second question involves list intersection. I'm developing a simple search engine that queries an index for a given word and returns a list of files.

    So given two char** lists, return the intersection in a char** format. It needs to be relatively fast as the point of this project is speed. I'm thinking I could qsort each list and then compare their contents, but that's technically O(n^2) with cutoff due to the sorted nature (every element still needs to look at every other element up to some point). If anyone knows any algorithms it'd be a great help!

    Thanks all!

  2. #2
    Registered User
    Join Date
    Jan 2009
    Posts
    1,485
    It's hard to tell without seeing your get_word() function, but if the function returns NULL when it's out of words, what you have there should work.

    What kind of problems are you having?

  3. #3
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    21,794
    Quote Originally Posted by drshmoo
    So given two char** lists, return the intersection in a char** format. It needs to be relatively fast as the point of this project is speed. I'm thinking I could qsort each list and then compare their contents, but that's technically O(n^2) with cutoff due to the sorted nature (every element still needs to look at every other element up to some point). If anyone knows any algorithms it'd be a great help!
    The sorts should take O(n log n) time, so no, it would not be O(n^2).
    C + C++ Compiler: MinGW port of GCC
    Version Control System: Bazaar

    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  4. #4
    Registered User
    Join Date
    Mar 2011
    Posts
    25
    Quote Originally Posted by laserlight View Post
    The sorts should take O(n log n) time, so no, it would not be O(n^2).
    Sorting would indeed be nlogn, but the intersection isnt quite.

    list 1: zubra, zzbra
    list 2: Aztec, Carl, zebra

    each word in list1 would need to be checked at list2. Given that they are sorted, there will be a cut off, but there could be a worst case n^2. zubra would have to walk the entirety of list 2 before realizing it can stop looking, for example. I'm sure theres a better way to do this though.

  5. #5
    Registered User
    Join Date
    Mar 2011
    Posts
    25
    got the get_words stuff working . Intersection is the issue now!

  6. #6
    and the hat of wrongness Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    32,592
    If one list is already sorted with qsort(), then use bsearch() to perform a search, using the same compare function.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.
    I support http://www.ukip.org/ as the first necessary step to a free Europe.

  7. #7
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    21,794
    Quote Originally Posted by drshmoo
    each word in list1 would need to be checked at list2. Given that they are sorted, there will be a cut off, but there could be a worst case n^2. zubra would have to walk the entirety of list 2 before realizing it can stop looking, for example.
    I don't get you. You would be maintaining pointers to elements of both sorted lists. Let's call these pointers p1 and p2, respectively:
    Code:
    if *p1 == *p2:
        ++p1
        ++p2
        add *p1 to intersection list
    else if *p1 < *p2:
        ++p1
    else:
        ++p2
    So, the above pseudocode would be placed in a loop that terminates as soon as either p1 or p2 points one past the last element of list1 or list2, respectively. This algorithm should take O(n) time since in the worst case you traverse both lists to the end, at the same time, once.
    C + C++ Compiler: MinGW port of GCC
    Version Control System: Bazaar

    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  8. #8
    Registered User
    Join Date
    Mar 2011
    Posts
    25
    Quote Originally Posted by laserlight View Post
    I don't get you. You would be maintaining pointers to elements of both sorted lists. Let's call these pointers p1 and p2, respectively:
    Code:
    if *p1 == *p2:
        ++p1
        ++p2
        add *p1 to intersection list
    else if *p1 < *p2:
        ++p1
    else:
        ++p2
    So, the above pseudocode would be placed in a loop that terminates as soon as either p1 or p2 points one past the last element of list1 or list2, respectively. This algorithm should take O(n) time since in the worst case you traverse both lists to the end, at the same time, once.
    Does this need equal length lists to work?

  9. #9
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    21,794
    Quote Originally Posted by drshmoo
    Does this need equal length lists to work?
    No. You can add zzbra to the end of list 2 in the example you provided and then run through the algorithm to help you understand it.

    By the way, since you don't need to worry about duplicates, you may find Salem's suggestion easier to implement since you can use bsearch to help you. The time complexity will be the same.
    C + C++ Compiler: MinGW port of GCC
    Version Control System: Bazaar

    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  10. #10
    and the hat of wrongness Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    32,592
    > zubra would have to walk the entirety of list 2 before realizing it can stop looking, for example. I'm sure theres a better way to do this though.
    True, but then you know that everything which follows it CANNOT be in the list, since it's sorted and you've already walked off the end.

    It's a modified linear search. You always start at the point where you got to (no point in returning to the beginning), and there is no point in going all the way to the end.
    list 1: Benny, Calum, Carl, zubra, zzbra
    list 2: Aztec, Boris, Carl, zebra

    So
    Benny starts at Aztec, gives up at Boris
    Calum starts at Aztec, gives up at Carl (but notes Boris is the last of the words before it)
    Carl starts at Boris(is before Carl, because it was before Calum), stops at Carl (a match), and moves the start point to Carl
    zubra starts at Carl, falls off the end of the list
    zzbra, well, we've already got past the end, so it's not there.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.
    I support http://www.ukip.org/ as the first necessary step to a free Europe.

  11. #11
    and the hat of wrongness Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    32,592
    > zubra would have to walk the entirety of list 2 before realizing it can stop looking, for example. I'm sure theres a better way to do this though.
    True, but then you know that everything which follows it CANNOT be in the list, since it's sorted and you've already walked off the end.

    It's a modified linear search. You always start at the point where you got to (no point in returning to the beginning), and there is no point in going all the way to the end.
    list 1: Benny, Calum, Carl, zubra, zzbra
    list 2: Aztec, Boris, Carl, zebra

    So
    Benny starts at Aztec, gives up at Boris
    Calum starts at Aztec, gives up at Carl (but notes Boris is the last of the words before it)
    Carl starts at Boris(is before Carl, because it was before Calum), stops at Carl (a match), and moves the start point to Carl
    zubra starts at Carl, falls off the end of the list
    zzbra, well, we've already got past the end, so it's not there.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.
    I support http://www.ukip.org/ as the first necessary step to a free Europe.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. reading text-and-numbers file word by word
    By bored_guy in forum C Programming
    Replies: 22
    Last Post: 10-26-2009, 10:59 PM
  2. reading file word by word
    By 98holb in forum C Programming
    Replies: 2
    Last Post: 01-25-2006, 04:49 PM
  3. Reading in a file word by word
    By Bumblebee11 in forum C Programming
    Replies: 4
    Last Post: 06-10-2003, 09:39 PM
  4. Help reading text file word by word
    By Unregistered in forum C++ Programming
    Replies: 6
    Last Post: 05-25-2002, 05:13 PM
  5. Intersection between Two Sorted List
    By lilpig in forum C Programming
    Replies: 3
    Last Post: 04-12-2002, 07:52 AM

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21