Thread: Unsigned long long to something longer

  1. #1
    Cogito Ergo Sum
    Join Date
    Mar 2007
    Location
    Sydney, Australia
    Posts
    463

    Unsigned long long to something longer

    Code:
    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>
    
    #define LONGLONGSIZE (sizeof (unsigned long long))
    
    union intstr {
      unsigned long long ikey;
      char               skey[LONGLONGSIZE];
    };
    
    unsigned hashShortString (char v[], unsigned M)
    { 
      union intstr *is;
    
      is = (union intstr *) v;
    
      return is->ikey % M;
    }
    
    int main (int argc, char *argv[])
    {
      char key[LONGLONGSIZE];
    
      if (2 != argc) {
        printf ("USAGE: hash_short STRING\n");
        printf ("  (STRING will be truncated after %u characters)\n", 
                LONGLONGSIZE *LONGSIZE);
        return EXIT_FAILURE;
      }
      strncpy (key, argv[1], LONGLONGSIZE * LONGSIZE);
      printf ("Hash: %u\n", hashShortString (key, 8191));
    
      return EXIT_SUCCESS;
    }
    This is my lecturer's code for hashing short strings. I'm not entirely sure why it can't take more than 8 characters; I'm guessing it's because the string is treated as an integer? Any hints?

    The stuff in red (the "* LONGSIZE" parts) is what I tried doing. It says the string will be truncated after 64 characters, but it segfaults after I enter something.
    =========================================
    Everytime you segfault, you murder some part of the world

  2. #2
    Woof, woof! zacs7's Avatar
    Join Date
    Mar 2007
    Location
    Australia
    Posts
    3,459
    You forgot to change,

    Code:
    char               skey[LONGLONGSIZE];
    
    /* ... */
    
    char key[LONGLONGSIZE];


    Assuming sizeof(unsigned long long) = 8, then you're doing:

    Code:
    char test[8];
    
    strncpy(test, argv[1], 64);
    And you can probably see, sizeof(test) < 64

    I guess that makes you a murderer.
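
    One way to avoid this kind of overrun is to take the copy limit from the destination buffer itself instead of from a separate constant. A minimal sketch, assuming a hypothetical helper named copy_key (it is not part of the lecturer's code):

```c
#include <string.h>

/* Hypothetical helper: copy src into dst (capacity dst_size bytes),
 * truncating if needed and always leaving dst null-terminated.
 * Returns the number of characters copied, excluding the terminator. */
size_t copy_key(char *dst, size_t dst_size, const char *src)
{
    size_t n;

    if (dst_size == 0)
        return 0;

    n = strlen(src);
    if (n > dst_size - 1)
        n = dst_size - 1;       /* leave room for the terminator */

    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

    Called as copy_key(key, sizeof key, argv[1]), the limit follows the array, so changing the size of key later cannot reintroduce the mismatch.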

  3. #3
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    long long is indeed 8 bytes.

    You could modify the hashing to use the whole string by reading multiple 8-byte chunks and, for example, XORing the results. Obviously, char key[] in main needs to be longer too.
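
    The chunked XOR idea could be sketched roughly as follows. The function name hashLongString and the CHUNK macro are made up for illustration, and the test values in practice assume sizeof(unsigned long long) is 8. It uses memcpy rather than casting the char pointer to a union pointer, which sidesteps alignment problems.

```c
#include <string.h>

#define CHUNK (sizeof (unsigned long long))

/* Sketch: hash a null-terminated string of any length by XORing
 * together successive CHUNK-byte pieces, then reducing modulo M.
 * The last piece is zero-padded. */
unsigned hashLongString(const char *v, unsigned M)
{
    unsigned long long acc = 0;
    size_t len = strlen(v);
    size_t pos;

    for (pos = 0; pos < len; pos += CHUNK) {
        unsigned long long piece = 0;
        size_t n = len - pos;

        if (n > CHUNK)
            n = CHUNK;
        memcpy(&piece, v + pos, n);   /* copy at most CHUNK bytes */
        acc ^= piece;                 /* fold this piece into the result */
    }
    return (unsigned)(acc % M);
}
```

    Note that plain XOR ignores the order of the chunks, and two identical chunks cancel each other out, so this is an illustration of the idea and not a strong hash.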

    What is "LONGSIZE"?

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  4. #4
    Woof, woof! zacs7's Avatar
    Join Date
    Mar 2007
    Location
    Australia
    Posts
    3,459
    I'm guessing it's 8?

    Since 8 * 8 = 64

    Which is rather odd.

  5. #5
    Cogito Ergo Sum
    Join Date
    Mar 2007
    Location
    Sydney, Australia
    Posts
    463
    Ah, it was meant to be LONGLONGSIZE. I did actually put the correct thing in the code, because my compiler gave an error, but I forgot to edit it here.

    Thanks zac and mats, I will now check what I'm meant to do.

  6. #6
    Cogito Ergo Sum
    Join Date
    Mar 2007
    Location
    Sydney, Australia
    Posts
    463
    Quote Originally Posted by zacs7 View Post
    You forgot to change,

    Code:
    char               skey[LONGLONGSIZE];
    
    /* ... */
    
    char key[LONGLONGSIZE];


    Assuming sizeof(unsigned long long) = 8, then you're doing:

    Code:
    char test[8];
    
    strncpy(test, argv[1], 64);
    And you can probably see, sizeof(test) < 64

    I guess that makes you a murderer.

    That stopped the segfaults but it didn't give any different hash value past 8 characters. So I'm guessing it only hashed the first 8 and truncated the rest.

  7. #7
    Cogito Ergo Sum
    Join Date
    Mar 2007
    Location
    Sydney, Australia
    Posts
    463
    Quote Originally Posted by matsp View Post
    long long is indeed 8 bytes.

    You could modify the hashing to use the whole string by reading multiple 8-byte chunks and, for example, XORing the results. Obviously, char key[] in main needs to be longer too.

    What is "LONGSIZE"?

    --
    Mats
    So keep reading the string in chunks of 8 bytes, and use ^ to XOR each chunk's result with the next, and so on?

  8. #8
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Quote Originally Posted by JFonseka View Post
    That stopped the segfaults but it didn't give any different hash value past 8 characters. So I'm guessing it only hashed the first 8 and truncated the rest.
    That line always writes 64 bytes: it copies the string (or its first 64 characters) and pads whatever is left of the 64 with null characters. Since the buffer is only 8 bytes, you get a buffer overrun whatever you enter.

    Quote Originally Posted by JFonseka View Post
    So keep reading the string in chunks of 8 bytes and using ^ for XOR the result by the next result and etc?
    Pretty much, from what I understand.
    Quote Originally Posted by Adak View Post
    io.h certainly IS included in some modern compilers. It is no longer part of the standard for C, but it is nevertheless, included in the very latest Pelles C versions.
    Quote Originally Posted by Salem View Post
    You mean it's included as a crutch to help ancient programmers limp along without them having to relearn too much.

    Outside of your DOS world, your header file is meaningless.

  9. #9
    Cogito Ergo Sum
    Join Date
    Mar 2007
    Location
    Sydney, Australia
    Posts
    463
    Now it gives different hash values for strings longer than 8 characters, but I get an odd feeling that it's not right, like that feeling you get when someone's behind you with a knife, but much stronger.

    Code:
    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>
    
    #define LONGLONGSIZE (sizeof (unsigned long long))
    
    union intstr {
      unsigned long long ikey;
      char               skey[LONGLONGSIZE];
    };
    
    unsigned hashShortString (char v[], unsigned M)
    { 
      union intstr *is;
    
        for (; *v != '\0'; v++)
           is->ikey = (LONGLONGSIZE*is->ikey + *v) % M;
    
      return is->ikey % M;
    }
    
    int main (int argc, char *argv[])
    {
      char key[LONGLONGSIZE];
    
      if (2 != argc) {
        printf ("USAGE: hash_short STRING\n");
        printf ("  (STRING will be truncated after %u characters)\n",
                LONGLONGSIZE);
        return EXIT_FAILURE;
      }
      strncpy (key, argv[1], LONGLONGSIZE);
      printf ("Hash: %u\n", hashShortString (key, 8191));
    
      return EXIT_SUCCESS;
    }

  10. #10
    Cogito Ergo Sum
    Join Date
    Mar 2007
    Location
    Sydney, Australia
    Posts
    463
    Quote Originally Posted by Elysia View Post
    That line always writes 64 bytes: it copies the string (or its first 64 characters) and pads whatever is left of the 64 with null characters. Since the buffer is only 8 bytes, you get a buffer overrun whatever you enter.
    So that doesn't copy 64 characters but 64 bytes?

    Isn't a character 1 byte, and therefore it should copy all 64 instead of 8?

  11. #11
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Quote Originally Posted by JFonseka View Post
    SO that doesn't copy 64 characters but 64 bytes?
    Yes, it counts characters, so it copies 64 characters; since sizeof(char) is 1, that is also 64 bytes. But you're right that it's safer to say 64 characters, especially if you're working with Unicode.

    Quote Originally Posted by JFonseka View Post
    Isn't a character 1 byte and therefore it should copy all 64 instead of 8 ?
    The extra argument specifies the maximum number of characters to copy from the source. If the string is only 7 characters long, it copies those 7 characters and fills the rest of the 64 with null characters. But if it's 100 characters long, it copies 64 characters and does not insert a null character.
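
    The two strncpy cases can be checked with a small sketch like the one below. The helper names is_terminated and copy_n_and_terminate are invented for this example; only strncpy and memchr are standard library calls.

```c
#include <string.h>

/* Returns 1 if the first n bytes of buf contain a '\0', else 0. */
int is_terminated(const char *buf, size_t n)
{
    return memchr(buf, '\0', n) != NULL;
}

/* Copy at most n characters with strncpy, then terminate explicitly.
 * dst must have room for n + 1 bytes. */
void copy_n_and_terminate(char *dst, const char *src, size_t n)
{
    strncpy(dst, src, n);   /* writes exactly n bytes: copies, then pads with '\0' */
    dst[n] = '\0';          /* strncpy adds no terminator when strlen(src) >= n */
}
```

    With an 8-byte buffer and a limit of 8, copying "abc" leaves bytes 3 to 7 set to '\0', while copying "abcdefghij" leaves the buffer with no terminator at all, which is why the explicit dst[n] = '\0' (and the extra byte of room) is there.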

  12. #12
    Cogito Ergo Sum
    Join Date
    Mar 2007
    Location
    Sydney, Australia
    Posts
    463
    Quote Originally Posted by Elysia View Post
    Yes, it counts characters, so it copies 64 characters; since sizeof(char) is 1, that is also 64 bytes. But you're right that it's safer to say 64 characters, especially if you're working with Unicode.

    The extra argument specifies the maximum number of characters to copy from the source. If the string is only 7 characters long, it copies those 7 characters and fills the rest of the 64 with null characters. But if it's 100 characters long, it copies 64 characters and does not insert a null character.
    That's fine though, I never tested with anything over 30, but it still only copied 8 with the pre-edited code. When I changed it, it started copying the extra stuff.

  13. #13
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Gotta be careful with C
    But, since the buffer is only 8 chars, I'd check if the length of argv[1] <= 7, and if not, abort with an error. Then you can copy with strcpy.
    But with the new code trying to hash the entire string, you may need a different approach of copying only 7 chars at a time and inserting null.

  14. #14
    Cogito Ergo Sum
    Join Date
    Mar 2007
    Location
    Sydney, Australia
    Posts
    463
    Actually, I checked it by printing skey back out, since it's a union. Strings seem to copy fine, but it's the hash value that's not being updated.

    The buffer was actually made 64 too, which is why it was confusing me. The strings go in fine, but the hash value doesn't change past 8 characters. Strange.

  15. #15
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by JFonseka View Post
    Now it gives different hash values for strings longer than 8 characters, but I get an odd feeling that it's not right, like that feeling you get when someone's behind you with a knife, but much stronger.

    Code:
    #include <stdlib.h>
    #include <stdio.h>
    #include <string.h>
    
    #define LONGLONGSIZE (sizeof (unsigned long long))
    
    union intstr {
      unsigned long long ikey;
      char               skey[LONGLONGSIZE];
    };
    
    unsigned hashShortString (char v[], unsigned M)
    { 
      union intstr *is;
    // is is never set to point at anything, so is->ikey is not initialized at all.
        for (; *v != '\0'; v++)
           is->ikey = (LONGLONGSIZE*is->ikey + *v) % M;
    // You should not use % M here.
    // Otherwise it's OK. 
    
      return is->ikey % M;
    }
    
    int main (int argc, char *argv[])
    {
      char key[LONGLONGSIZE];
    // You NEED to change LONGLONGSIZE here to something bigger. 
    
      if (2 != argc) {
        printf ("USAGE: hash_short STRING\n");
        printf ("  (STRING will be truncated after %u characters)\n",
                LONGLONGSIZE);
        return EXIT_FAILURE;
      }
      strncpy (key, argv[1], LONGLONGSIZE);
      printf ("Hash: %u\n", hashShortString (key, 8191));
    
      return EXIT_SUCCESS;
    }
    See the // comments in the code above.
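
    Putting those comments together, the hash function might look something like this. It is only a sketch: the name hashString is made up, a plain local variable stands in for the union pointer, and the unsigned char cast is an added assumption to keep characters with the high bit set from being added as negative values.

```c
#include <stddef.h>

#define LONGLONGSIZE (sizeof (unsigned long long))

/* Sketch of the loop-based hash with the accumulator initialized
 * and a single reduction modulo M at the end. */
unsigned hashString(const char *v, unsigned M)
{
    unsigned long long h = 0;   /* starts from a known value */

    for (; *v != '\0'; v++)
        h = LONGLONGSIZE * h + (unsigned char)*v;  /* unsigned overflow wraps, which is well defined */

    return (unsigned)(h % M);
}
```

    The caller still needs a key buffer larger than LONGLONGSIZE, or it can pass argv[1] directly, since this version only reads the string.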

    --
    Mats
