Thread: Cast byte array to 64-bit int or use a union?

  1. #1
    Registered User
    Join Date
    Jun 2009
    Posts
    101

    Cast byte array to 64-bit int or use a union?

    I have a byte array in my code that I need to interpret as a 64 bit integer:

    Code:
    uint8_t stuff[8];
    I can cast this to a 64 bit integer like this:

    Code:
    int64_t bigint = *((int64_t*)stuff);
    But that gives me a "strict aliasing" warning in GCC. I can use a union, and this gets rid of the warning:

    Code:
    union{
    	uint8_t c[8];
    	int64_t i;
    } stuff_u;
    What would be your preference? Would this be much of a performance penalty, especially when accessing this union from within a loop?

  2. #2
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    What version of GCC and what are your compiler flags? Can you post the actual program that gives the warning? I don't get any problems with the following with strict warnings on GCC 4.6.3:
    Code:
    $ cat foo.c
    #include <stdio.h>
    #include <stdint.h>
    #include <inttypes.h>
    
    
    int main(void)
    {
      uint8_t stuff[8] = {0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88};
      int64_t bigint = *((int64_t *) stuff);
    
    
      printf("bigint = %" PRIx64 "\n", bigint);
    
    
      return 0;
    }
    $ gcc -Wall -Wextra -std=c99 -o foo foo.c
    $ ./foo 
    bigint = 8877665544332211
    EDIT: -Wall includes -Wstrict-aliasing=3, which is the highest (strictest) level.
    Last edited by anduril462; 11-11-2013 at 03:40 PM.

  3. #3
    Registered User
    Join Date
    Jun 2009
    Posts
    101
    Quote Originally Posted by anduril462
    What version of GCC and what are your compiler flags? Can you post the actual program that gives the warning? I don't get any problems with the following with strict warnings on GCC 4.6.3:
    I'm using GCC 4.8.1 (on a Mac). However, I just tried compiling your program on an Ubuntu machine running GCC 4.7.2, and I don't get a warning unless I turn on strict aliasing (-fstrict-aliasing). Could it be that 4.8 enables it by default?

  4. #4
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    Ahh, no, it's not a version thing. I didn't read the man page carefully enough:
    Code:
          -Wstrict-aliasing
               This option is only active when -fstrict-aliasing is active.  It warns about code
               which might break the strict aliasing rules that the compiler is using for
               optimization.  The warning does not catch all cases, but does attempt to catch the
               more common pitfalls.  It is included in -Wall.  It is equivalent to
               -Wstrict-aliasing=3
    
           -Wstrict-aliasing=n
               This option is only active when -fstrict-aliasing is active.  It warns about code
               which might break the strict aliasing rules that the compiler is using for
               optimization.  Higher levels correspond to higher accuracy (fewer false positives).
               Higher levels also correspond to more effort, similar to the way -O works.
               -Wstrict-aliasing is equivalent to -Wstrict-aliasing=n, with n=3.
    
               Level 1: Most aggressive, quick, least accurate.  Possibly useful when higher levels
               do not warn but -fstrict-aliasing still breaks the code, as it has very few false
               negatives.  However, it has many false positives.  Warns for all pointer conversions
               between possibly incompatible types, even if never dereferenced.  Runs in the frontend
               only.
    
               Level 2: Aggressive, quick, not too precise.  May still have many false positives (not
               as many as level 1 though), and few false negatives (but possibly more than level 1).
               Unlike level 1, it only warns when an address is taken.  Warns about incomplete types.
               Runs in the frontend only.
    
               Level 3 (default for -Wstrict-aliasing): Should have very few false positives and few
               false negatives.  Slightly slower than levels 1 or 2 when optimization is enabled.
               Takes care of the common pun+dereference pattern in the frontend:
               "*(int*)&some_float".  If optimization is enabled, it also runs in the backend, where
               it deals with multiple statement cases using flow-sensitive points-to information.
               Only warns when the converted pointer is dereferenced.  Does not warn about incomplete
               types.
    <snip>
           -fstrict-aliasing
               Allow the compiler to assume the strictest aliasing rules applicable to the language
               being compiled.  For C (and C++), this activates optimizations based on the type of
               expressions.  In particular, an object of one type is assumed never to reside at the
               same address as an object of a different type, unless the types are almost the same.
               For example, an "unsigned int" can alias an "int", but not a "void*" or a "double".  A
               character type may alias any other type.
    
               Pay special attention to code like this:
    
                       union a_union {
                         int i;
                         double d;
                       };
    
                       int f() {
                         union a_union t;
                         t.d = 3.0;
                         return t.i;
                       }
    
               The practice of reading from a different union member than the one most recently
               written to (called "type-punning") is common.  Even with -fstrict-aliasing, type-
               punning is allowed, provided the memory is accessed through the union type.  So, the
               code above will work as expected.    However, this code might not:
    
                       int f() {
                         union a_union t;
                         int* ip;
                         t.d = 3.0;
                         ip = &t.i;
                         return *ip;
                       }
    
               Similarly, access by taking the address, casting the resulting pointer and
               dereferencing the result has undefined behavior, even if the cast uses a union type,
               e.g.:
    
                       int f() {
                         double d = 3.0;
                         return ((union a_union *) &d)->i;
                       }
    
               The -fstrict-aliasing option is enabled at levels -O2, -O3, -Os.
    I left off the -fstrict-aliasing flag (-Wall includes -Wstrict-aliasing, but not -fstrict-aliasing). With the flag added, I get the same warning you do:
    Code:
    $ gcc -Wall -Wextra -fstrict-aliasing -std=c99 -o foo foo.c
    foo.c: In function ‘main’:
    foo.c:8:3: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
    According to the man page (the middle paragraph of the -fstrict-aliasing option), with strict aliasing enabled, you are allowed to do this via unions, but not via type casts. This actually comes from the C standard:
    Quote Originally Posted by C99 6.5 p7
    7 An object shall have its stored value accessed only by an lvalue expression that has one of
    the following types:74)
    — a type compatible with the effective type of the object,
    — a qualified version of a type compatible with the effective type of the object,
    — a type that is the signed or unsigned type corresponding to the effective type of the
    object,
    — a type that is the signed or unsigned type corresponding to a qualified version of the
    effective type of the object,
    — an aggregate or union type that includes one of the aforementioned types among its
    members (including, recursively, a member of a subaggregate or contained union), or
    — a character type.

    74) The intent of this list is to specify those circumstances in which an object may or may not be aliased.
    Looks like -fstrict-aliasing causes GCC to adhere to this particular part of the standard.
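
To make the last bullet of 6.5p7 concrete, here is a minimal sketch of the one cross-type access that is always permitted: inspecting an object through a character-type lvalue. The helper name `sum_bytes` is made up for illustration and is not from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: adds up the bytes that make up an int64_t.
 * Reading through unsigned char * falls under the "a character type"
 * bullet of C99 6.5p7, so it does not break the aliasing rules. */
unsigned sum_bytes(const int64_t *value)
{
    assert(value != NULL);
    const unsigned char *bytes = (const unsigned char *) value;
    unsigned sum = 0;
    for (size_t i = 0; i < sizeof *value; ++i)
        sum += bytes[i];
    return sum;
}
```

The opposite direction, reading a uint8_t array through an int64_t * as in the original post, matches none of the bullets, which is what the warning is pointing at.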

  5. #5
    Registered User
    Join Date
    Nov 2012
    Posts
    1,393
    Quote Originally Posted by synthetix

    Code:
    union{
        uint8_t c[8];
        int64_t i;
    } stuff_u;
    What would be your preference? Would this be much of a performance penalty, especially when accessing this union from within a loop?
    The union should work as needed and will be aligned in a way compatible with both uint8_t and int64_t. With your other way, the object might end up non-optimally aligned, which will result in slow performance or possibly it will crash the program when you try to access the int64_t version which is not properly aligned.

    To answer your second question you need to do measurements. But I have no idea why accessing a union should cause a performance hit, on the contrary, I think this trick is normally used to speed things up by aligning objects on a boundary more optimal for the processor.

    However I'm no expert on this, and in real code it is normally more complicated than just one single decision which is why I say the best bet is to get a test case and simply measure how long it takes using different code, that is, if you're actually worried and if the time is actually significant.
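
The alignment point above can be checked directly. This is only a sketch and it assumes a C11 compiler (for `alignof` and `static_assert`), whereas the thread builds with -std=c99. The function names are invented for the example.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as the union in the original post, given a tag here. */
union stuff_u {
    uint8_t c[8];
    int64_t i;
};

/* A union must be aligned suitably for every one of its members. */
static_assert(alignof(union stuff_u) >= alignof(int64_t),
              "union is at least as aligned as int64_t");

/* Hypothetical helpers that report the alignment requirements. */
size_t union_alignment(void) { return alignof(union stuff_u); }
size_t array_alignment(void) { return alignof(uint8_t[8]); }
size_t int64_alignment(void) { return alignof(int64_t); }
```

A bare `uint8_t stuff[8]` only needs byte alignment, so the compiler is free to place it at an address where an int64_t access would be misaligned. The union rules that out.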

  6. #6
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    I don't have time to find it in the standard right now, but I believe the union approach is also technically illegal because C only allows you to access a union through the "active" member, which is the member that was last written to. Accessing inactive members is undefined behaviour.

    A perfectly legal approach is using memcpy, passing both in as void pointers (void pointers are allowed to point to anything, and memcpy internally accesses them as char pointers, which, by the exception in the strict aliasing rule, is allowed to access all types). A colleague of mine recently experimented with this, and found that GCC is smart enough to optimize it out, and not actually call memcpy, or do any copying at all.
    Last edited by cyberfish; 11-12-2013 at 08:37 PM.

  7. #7
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    Quote Originally Posted by cyberfish
    I don't have time to find it in the standard right now, but I believe the union approach is also technically illegal because C only allows you to access a union through the "active" member, which is the member that was last written to.
    Both approaches technically involve undefined behaviour.

    Converting a pointer to a different type, and then dereferencing as is discussed here, also invokes undefined behaviour. As pointed out by laserlight in another thread recently there are some circumstances where the behaviour is defined. IIRC, however, this case discussed here is not one of said circumstances.

    Similarly, retrieving a union member other than the one last written to involves undefined behaviour, as you say. I'm not aware offhand on any caveats on that statement.

    Quote Originally Posted by cyberfish
    A perfectly legal approach is using memcpy, passing both in as void pointers (void pointers are allowed to point to anything, and memcpy internally accesses them as char pointers, which, by the exception in the strict aliasing rule, is allowed to access all types).
    Well ... that's true, but not the full story. The behaviour of memcpy() is well defined. Subsequently accessing that memory as if it is a different type than it actually is, is what invokes undefined behaviour.

    Quote Originally Posted by cyberfish
    A colleague of mine recently experimented with this, and found that GCC is smart enough to optimize it out, and not actually call memcpy, or do any copying at all.
    And that is fine. A compiler is allowed to do what it likes in a scenario with undefined behaviour (that is part of the point of it being undefined according to the standard).

    If you're going to do this, by whatever means, be clear on documenting the fact your code has undefined behaviour, and ensure you revalidate the code whenever you change compiler, host system, etc.

    If the code works as intended today, it is a fair bet that the code will continue to work with gcc for some reasonable time into the future. It is not a fair bet it will work with all compilers, and not even guaranteed it will work with all future versions of gcc. Future represents a long time interval.

    The reason for the warning from gcc is that some gcc optimisation levels may cause the code to behave differently than you expect. Such is life with undefined behaviours.

  8. #8
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    Well ... that's true, but not the full story. The behaviour of memcpy() is well defined. Subsequently accessing that memory as if it is a different type than it actually is, is what invokes undefined behaviour.
    I don't see where the undefined behaviour is. You have to allocate memory using the new type first, then have memcpy copy data into the new address, which is legal. Subsequently accessing the new data through the new type is obviously also legal. There is no aliasing here.

    And that is fine. A compiler is allowed to do what it likes in a scenario with undefined behaviour (that is part of the point of it being undefined according to the standard).

    If you're going to do this, by whatever means, be clear on documenting the fact your code has undefined behaviour, and ensure you revalidate the code whenever you change compiler, host system, etc.

    If the code works as intended today, it is a fair bet that the code will continue to work with gcc for some reasonable time into the future. It is not a fair bet it will work with all compilers, and not even guaranteed it will work with all future versions of gcc. Future represents a long time interval.

    The reason for the warning from gcc is that some gcc optimisation levels may cause the code to behave differently than you expect. Such is life with undefined behaviours.
    It is guaranteed by the standard to work. Some compilers may not recognize this optimization opportunity and generate slower code with redundant memory copy, but it will still be correct. I only brought up GCC to show that this seemingly-inefficient method may not actually have a performance impact with a good optimizing compiler.

  9. #9
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    Just to be clear, this is what I am talking about

    Code:
    char c[8] = ....;
    
    int64_t a;
    
    memcpy(&a, c, 8);
    
    *access a as int64_t*
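
A compilable version of the outline above might look like the following. It is a sketch only: the name `load_i64` is invented here, and the value it returns depends on the host's byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies 8 bytes into a real int64_t object and
 * returns it. The int64_t is only ever accessed as an int64_t, so no
 * aliasing rule is involved. The result depends on host byte order. */
int64_t load_i64(const uint8_t bytes[8])
{
    assert(bytes != NULL);
    int64_t value;
    memcpy(&value, bytes, sizeof value);
    return value;
}
```

Calling `load_i64(stuff)` replaces the `*((int64_t *) stuff)` expression from the first post.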

  10. #10
    Make Fortran great again
    Join Date
    Sep 2009
    Posts
    1,413
    Quote Originally Posted by grumpy
    Similarly, retrieving a union member other than the one last written to involves undefined behaviour, as you say. I'm not aware offhand on any caveats on that statement.
    Maybe I'm misunderstanding, but then what would be the point of a union? Isn't this exactly what a union is for?

  11. #11
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    Saves memory if you know you only need one of 2 types at a time. I am not aware of other uses that are legal. I would guess most uses of union nowadays are illegal.

    That's why unions are almost never used nowadays - they aren't very useful.

    I have actually never seen a single union in 2 years of working as a professional developer at a few different companies.

  12. #12
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Quote Originally Posted by cyberfish
    You have to allocate memory using the new type first, then have memcpy copy data into the new address, which is legal.
    True.

    Quote Originally Posted by cyberfish
    Subsequently accessing the new data through the new type is obviously also legal.
    Not so simple. It is clear that the object of the "new" type will have the same underlying representation as the byte array by definition of memcpy, but whether that representation is valid is another matter. Even if it is valid, there is then the practical worry that it might not be the representation of the desired value.

    I think that it is obvious that if the destination object is an array of character type, then the behaviour on access is well defined, yet as far as I can tell the standard does not say what happens when one accesses a destination object of some other type, so in that sense the behaviour is undefined (i.e., by omission). On the other hand, if the representation is valid, I would argue that it would be a bug in the standard if it left the behaviour on access as undefined (after all, it would be identical in every way as with normal assignment of the value), so perhaps it does define it (which I have not found evidence of), or it was taken for granted.

    Certainly if the representation is not valid then the behaviour is undefined.
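
On the worry that the copied bytes might not represent the desired value: if the byte order of the data is known, one way to sidestep the question is to build the value arithmetically instead of copying its representation. The sketch below assumes the array holds a little-endian value. That assumption, and the name `load_le64`, are not from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: assembles a 64-bit value from 8 bytes, treating
 * bytes[0] as the least significant byte. It uses only shifts and ORs
 * on unsigned values, so the result is the same on any host. */
uint64_t load_le64(const uint8_t bytes[8])
{
    assert(bytes != NULL);
    uint64_t value = 0;
    for (int i = 7; i >= 0; --i)
        value = (value << 8) | bytes[i];
    return value;
}
```

For the array {0x11, 0x22, ..., 0x88} from post #2 this yields 0x8877665544332211, the same value the cast printed on that (little-endian) machine.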

  13. #13
    Registered User
    Join Date
    Nov 2010
    Location
    Long Beach, CA
    Posts
    5,909
    Quote Originally Posted by cyberfish
    I don't have time to find it in the standard right now, but I believe the union approach is also technically illegal because C only allows you to access a union through the "active" member, which is the member that was last written to. Accessing inactive members is undefined behaviour.
    Close, it's unspecified:
    Quote Originally Posted by C99 J.1 Unspecified behavior
    The value of a union member other than the last one stored into (6.2.6.1).
    That means the implementation must pick a behavior, but need not document it (though they can if they want). Obviously that means it's not portable; each implementation can do its own thing. It's only safe if the implementation defines this behavior in such a way that you can rely on it.

    As far as the OP's question goes, the GCC documentation defines type punning via unions as acceptable, even in the presence of -fstrict-aliasing. But there is a caveat: it may result in a trap representation (Structures unions enumerations and bit-fields implementation - Using the GNU Compiler Collection (GCC)).
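
For reference, the union version of the OP's conversion could be sketched as below. It relies on the GCC-documented behaviour quoted from the man page earlier in the thread. The union tag and the function name `pun_via_union` are made up for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same members as the union in the original post. */
union stuff_u {
    uint8_t c[8];
    int64_t i;
};

/* Hypothetical helper: stores the bytes through member c and reads them
 * back through member i. All access goes through the union type, which
 * is the form GCC documents as working under -fstrict-aliasing.
 * The result depends on host byte order. */
int64_t pun_via_union(const uint8_t bytes[8])
{
    assert(bytes != NULL);
    union stuff_u u;
    memcpy(u.c, bytes, sizeof u.c);
    return u.i;
}
```

As for the trap-representation caveat, the exact-width types such as int64_t are specified to have no padding bits, so every 8-byte pattern is a valid int64_t value. The caveat matters more for types like float or pointers.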
    Quote Originally Posted by cyberfish
    Saves memory if you know you only need one of 2 types at a time. I am not aware of other uses that are legal. I would guess most uses of union nowadays are illegal.

    That's why unions are almost never used nowadays - they aren't very useful.

    I have actually never seen a single union in 2 years of working as a professional developer at a few different companies.
    Definitely rare. I've seen them mostly in my embedded work where space is still sometimes an issue. Also:
    Quote Originally Posted by C99 6.5.2.3 p1,5
    5 One special guarantee is made in order to simplify the use of unions: if a union contains
    several structures that share a common initial sequence (see below), and if the union
    object currently contains one of these structures, it is permitted to inspect the common
    initial part of any of them anywhere that a declaration of the complete type of the union is
    visible. Two structures share a common initial sequence if corresponding members have
    compatible types (and, for bit-fields, the same widths) for a sequence of one or more
    initial members.
    I can see that coming in handy for message headers and pseudo OOP stuff (inheritance/base classes). But indeed, it's still rare.
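
A rough sketch of the message-header idea, using the common-initial-sequence guarantee quoted above. All of the type and function names here are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum msg_type { MSG_PING = 1, MSG_DATA = 2 };

/* Both structures start with an int "type" member, so that member is
 * part of their common initial sequence. */
struct ping_msg { int type; int sequence; };
struct data_msg { int type; int length; unsigned char payload[16]; };

union message {
    struct ping_msg ping;
    struct data_msg data;
};

/* Hypothetical helper: reads the shared leading member through the ping
 * view, whichever structure the union currently holds. The complete
 * union type is visible here, as 6.5.2.3 p5 requires. */
int message_type(const union message *m)
{
    assert(m != NULL);
    return m->ping.type;
}
```

A receiver can call `message_type()` first and then pick the right member to read the rest of the message.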

  14. #14
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    Not so simple. It is clear that the object of the "new" type will have the same underlying representation as the byte array by definition of memcpy, but whether that representation is valid is another matter. Even if it is valid, there is then the practical worry that it might not be the representation of the desired value.

    I think that it is obvious that if the destination object is an array of character type, then the behaviour on access is well defined, yet as far as I can tell the standard does not say what happens when one accesses a destination object of some other type, so in that sense the behaviour is undefined (i.e., by omission). On the other hand, if the representation is valid, I would argue that it would be a bug in the standard if it left the behaviour on access as undefined (after all, it would be identical in every way as with normal assignment of the value), so perhaps it does define it (which I have not found evidence of), or it was taken for granted.
    I was only concerned how to get the data out in a portable way. Making sense of the data is another matter.

  15. #15
    Registered User
    Join Date
    Dec 2006
    Location
    Canada
    Posts
    3,229
    As far as the OP's question goes, the GCC documentation defines type punning via unions as acceptable, even in the presence of -fstrict-aliasing. But there is a caveat: it may result in a trap representation (Structures unions enumerations and bit-fields implementation - Using the GNU Compiler Collection (GCC)).
    Don't all type punning techniques produce trap representation?
