Thread: class layout guarantees

  1. #1
    Registered User
    Join Date
    Oct 2005
    Posts
    88

    class layout guarantees

    Hi all,

    I'm wondering about the layout of a class, and what guarantees the standard (C++03) gives about class member layouts.

    For instance, if I have a class like:

    Code:
    struct Header {
      u_char name;
      u_char val;
      u_char other;
      u_char whatever;
      u_char etc;
    };
    and I have a binary packet with the same layout as above arriving (how it arrives is irrelevant in my case) in a buffer, is the following guaranteed to work:

    Code:
    Header hdr;
    read( reinterpret_cast<u_char*>(&hdr) );
    // Everything in its right place now
    In other words, is a compiler *never* allowed to alter the layout of the class? AFAICT, if it's a POD type - which I understand means it has only a default constructor - it can't be reorganised, but what if I want a more complex constructor?

    I'd appreciate any advice/pointers to the right part of the standard.

  2. #2
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    I'm pretty sure that the compiler will always put data in the order you define it. There is, however, no guarantee that the data won't have gaps in it - for example:
    Code:
    struct s {
      int a;
      char b;
      int c;
    };
    The gap between b and c is probably going to be 3 bytes.
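
    Rather than guessing at the gap, it can be measured on whatever compiler is in use. A minimal sketch, assuming nothing beyond <cstddef>; gap_after_b and check_member_order are names made up for this illustration, and the actual gap size is implementation-specific:

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t

struct s {
  int a;
  char b;
  int c;
};

// Padding bytes the implementation placed between b and c.
inline std::size_t gap_after_b() {
  return offsetof(s, c) - (offsetof(s, b) + sizeof(char));
}

// Declaration order is preserved; only the size of the gaps varies.
inline void check_member_order() {
  assert(offsetof(s, a) < offsetof(s, b));
  assert(offsetof(s, b) < offsetof(s, c));
}
```

    Printing gap_after_b() on the target platform shows how much padding was really inserted there.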

    Your constructor has no say in the matter of which order the data comes.

    Of course, if you have virtual function(s) in your class, then there will be a vtable pointer somewhere along with your data - usually in the first 4 or 8 bytes.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  3. #3
    Registered User
    Join Date
    Oct 2005
    Posts
    88
    Quote Originally Posted by matsp View Post
    I'm pretty sure that the compiler will always put data in the order you define it. There is, however, no guarantee that the data won't have gaps in it - for example:
    Code:
    struct s {
      int a;
      char b;
      int c;
    };
    The gap between b and c is probably going to be 3 bytes.
    I would have thought just having one type (ie. u_char) would make padding a non-issue, but I'm not sure.

    Your constructor has no say in the matter of which order the data comes.
    No, but my question was more along the lines of: does having anything but a default constructor make the class a non-POD type? I remember reading somewhere that it does. If it becomes a non-POD type, does that mean the layout isn't guaranteed?

    Of course, if you have virtual function(s) in your class, then there will be a vtable pointer somewhere along with your data - usually in the first 4 or 8 bytes.

    --
    Mats
    Oh yeah, I suppose where that comes is just random too...

    This is starting to look like a dead end.

  4. #4
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by drrngrvy View Post
    I would have thought just having one type (ie. u_char) would make padding a non-issue, but I'm not sure.

    No, but my question was more along the lines of: does having anything but a default constructor make the class a non-POD type? I remember reading somewhere that it does. If it becomes a non-POD type, does that mean the layout isn't guaranteed?
    I _THINK_ it only means that the compiler may insert "stuff" before or after your data that is unknown to you [such as a Vtable pointer and similar things]. But the order of the DATA that you have defined should still be in the order you "expect" it to be.

    What problem are you actually trying to solve?

    --
    Mats

  5. #5
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    Having any constructor at all makes the type non-POD. So does having base classes, virtual functions, private or protected non-static members, destructor, copy assignment operator, non-static members of type reference, pointer to member, or any type that is not POD itself.

    The moment a type is non-POD, you are absolutely not allowed to access the data through any means but the members or pointers to them.

    For any complex type at all, only arrays have any guarantees about the exact position within the aggregate of a single member; namely, you can reach it via pointer arithmetic. Classes have relationship guarantees: the address of a member is higher than the address of a member declared before it, unless there is an access specifier in-between.
    Even POD structs don't have exact guarantees, except that the first member has the same address as the struct itself. Other than that, they have compatibility guarantees (same layout for structs that are structurally identical) and the guarantee that memcpy'ing one to the other (including via an intervening char array of sufficient size) is a valid form of transferring data. There is no guarantee about what happens when you access the individual bytes, though. (But at least accessing them is only unspecified, not undefined.)
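
    The memcpy guarantee for PODs can be sketched as follows. This is only an illustration: the Header struct is the one from the first post with u_char spelled as unsigned char, and round_trip is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstring>   // std::memcpy

struct Header {
  unsigned char name;
  unsigned char val;
  unsigned char other;
  unsigned char whatever;
  unsigned char etc;
};

// Copy a POD object out to a byte array and back into a second object.
inline Header round_trip(const Header& in) {
  unsigned char bytes[sizeof(Header)];
  std::memcpy(bytes, &in, sizeof in);     // object -> unsigned char array
  Header out;
  std::memcpy(&out, bytes, sizeof out);   // unsigned char array -> object
  // For POD types the copy must hold the original value.
  assert(out.name == in.name && out.etc == in.etc);
  return out;
}
```

    Note what this does not show: it says nothing about which byte of the array corresponds to which member, only that the value survives the trip.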
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  6. #6
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    If it becomes a non-POD type, does that mean the layout isn't guaranteed?
    I am not too sure of whether the layout is guaranteed, but I do know that it is permissible to use malloc() instead of new or new[] for POD types, so it sounds like what you want to do is reasonable.

    From the C++ Standard (ISO/IEC 14882:2003), I quote:
    Quote Originally Posted by Section 3.9
    For any object (other than a base-class subobject) of POD type T, whether or not the object holds a valid value of type T, the underlying bytes making up the object can be copied into an array of char or unsigned char. If the content of the array of char or unsigned char is copied back into the object, the object shall subsequently hold its original value.
    No, but my question was more along the lines of: does having anything but a default-constructor make the class a non-POD type?
    I am not entirely sure of the rules myself, where edge cases are concerned. I can quote the Standard, heheh:
    Quote Originally Posted by Section 8.5.1
    An aggregate is an array or a class with no user-declared constructors, no private or protected non-static data members, no base classes, and no virtual functions.
    Quote Originally Posted by Section 9
    A POD-struct is an aggregate class that has no non-static data members of type non-POD-struct, non-POD-union (or array of such types) or reference, and has no user-defined copy assignment operator and no user-defined destructor. Similarly, a POD-union is an aggregate union that has no non-static data members of type non-POD-struct, non-POD-union (or array of such types) or reference, and has no user defined copy assignment operator and no user-defined destructor. A POD class is a class that is either a POD-struct or a POD-union.
    What confuses me is that the definition of "aggregate" appears to be more strict than the definition of "POD-struct", yet a "POD-struct" is a kind of "aggregate class". The Header class in your example certainly is a POD-struct, but because of this strange restriction that relaxes the rules rather than restricts, I am not sure if adding a default constructor to it would make it non-POD. Personally, I would interpret the Standard as it being non-POD if you define a default constructor.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  7. #7
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    On a side note, doing anything at all with a reinterpret_casted pointer aside from casting it back is undefined behaviour. If you want to access the bytes, you have to go through void*:
    Code:
    unsigned char *pbytes = static_cast<unsigned char*>(static_cast<void*>(&str));
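
    Wrapped up as a helper, the double static_cast might look like the sketch below. bytes_of and demo_bytes are names invented for this example, and the two-member Header is just a stand-in POD:

```cpp
#include <cassert>

struct Header {
  unsigned char a;
  unsigned char b;
};

// View an object's storage as unsigned char, going through void*.
template <typename T>
unsigned char* bytes_of(T& obj) {
  return static_cast<unsigned char*>(static_cast<void*>(&obj));
}

inline void demo_bytes() {
  Header h = { 'a', 'b' };
  unsigned char* p = bytes_of(h);
  // The first member of a POD-struct shares the struct's address.
  assert(p[0] == 'a');
}
```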

  8. #8
    Registered User
    Join Date
    Oct 2005
    Posts
    88
    Quote Originally Posted by CornedBee View Post
    On a side note, doing anything at all with a reinterpret_casted pointer aside from casting it back is undefined behaviour. If you want to access the bytes, you have to go through void*:
    Code:
    unsigned char *pbytes = static_cast<unsigned char*>(static_cast<void*>(&str));
    I thought that's exactly what reinterpret_cast does anyway?

  9. #9
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    > read( reinterpret_cast<u_char*>(hdr) );
    It seems like you're attempting serialisation
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  10. #10
    Registered User
    Join Date
    Oct 2005
    Posts
    88
    Quote Originally Posted by matsp View Post
    What problem are you actually trying to solve?
    To expand on what I said in the first post:
    Code:
    Header hdr;
    read( reinterpret_cast<u_char*>(&hdr) );
    // Everything in its right place now
    // use hdr here
    That means that the reinterpret_cast<u_char*>(hdr) would have to yield an array of unsigned chars to the read function, where the layout of the struct matches that of the expected array. Eg.
    Code:
    struct Header { u_char a; u_char b; } hdr;
    // read, as above, expecting an array like: u_char arr[] = { 'a', 'b' };
    hdr.a == 'a';
    hdr.b == 'b';
    From what everyone says, this is UB, so I'll have to do this another way. Gah.
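
    One "other way" that does not lean on the struct's layout at all is to decode the packet one field at a time. A minimal sketch, assuming the wire format is five consecutive bytes in declaration order; decode_header, kHeaderWireSize and demo_decode are hypothetical names, not anything from the thread:

```cpp
#include <cassert>
#include <cstddef>   // std::size_t

struct Header {
  unsigned char name;
  unsigned char val;
  unsigned char other;
  unsigned char whatever;
  unsigned char etc;
};

// Assumed wire format: five consecutive bytes, one per field, in this order.
const std::size_t kHeaderWireSize = 5;

// Fill `out` from the first kHeaderWireSize bytes of `buf`.
// Returns false (leaving `out` untouched) if the buffer is missing or too short.
inline bool decode_header(const unsigned char* buf, std::size_t len, Header& out) {
  if (buf == 0 || len < kHeaderWireSize) return false;
  out.name     = buf[0];
  out.val      = buf[1];
  out.other    = buf[2];
  out.whatever = buf[3];
  out.etc      = buf[4];
  return true;
}

inline void demo_decode() {
  const unsigned char packet[] = { 'n', 'v', 'o', 'w', 'e' };
  Header hdr;
  bool ok = decode_header(packet, sizeof packet, hdr);
  assert(ok);
  assert(hdr.name == 'n' && hdr.etc == 'e');
}
```

    This costs a few assignments, but it keeps working if Header later gains a constructor, padding, or differently sized members.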

    However, just FYI, this part seems to work fine using MSVC8 and gcc 4.* on linux. On the other hand, when I try and create a structure like the one I decoded, MSVC seems to not like it and alters the struct in some way (I'm only 60% sure that's what's happening, since I haven't bothered looking at the assembly).

  11. #11
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    Quote Originally Posted by drrngrvy View Post
    I thought that's exactly what reinterpret_cast does anyway?
    It probably is on all real systems in existence, but in principle an implementation would be allowed to, e.g., flip the highest-order bit of the pointer to indicate that it is invalid, and flip it back on the next cast. (On current 64-bit systems, where the effective virtual address space is only 48 bits anyway, this would be a very interesting idea. Too bad it's somewhat incompatible with most operating systems: you can't do it for casting between pointer and int, for example.) I'm not sure if reinterpret_cast even needs to be NULL-preserving.

    ... *checks* ...

    Ah, it must be. Well, don't flip the bit for null values then. This would of course introduce considerably more overhead. It's still an interesting idea for a debug version, though.



    It's not at all surprising that this works with MSVC and GCC, because it's a dirty trick employed very regularly. Making it not work would break endless amounts of software, starting probably with the operating systems themselves.

  12. #12
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    So, as long as you have a struct with no defined constructors or other member functions, rather than a class or struct WITH defined constructors or other member functions, it should work.

    Now, let's say you actually want to have a class to do this with: Well, you obviously can't do a read directly into the class, BUT:
    Code:
    class example
    {
    private:
       struct hdr   // This struct is POD!
       {
          uchar a;
          uchar b;
       };
    
       hdr m_hdr;
       ...
    public:
       void Read() { read( reinterpret_cast<uchar *>(&m_hdr) ); }
       ...
    };
    should work ok.

    --
    Mats
    Last edited by matsp; 10-18-2007 at 01:51 PM. Reason: Fix typo.

  13. #13
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    Other member functions are irrelevant. Only the special members constructor, destructor and copy assignment operator matter.

    The code still uses reinterpret_cast instead of the double static_cast.

  14. #14
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Other member functions are irrelevant. Only the special members constructor, destructor and copy assignment operator matter.
    So, virtual functions are allowed even though a POD-struct is an aggregate class and aggregates do not have virtual functions? That sounds strange.

  15. #15
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    No, virtuals are not allowed. But simple member functions are.

    To put all the rules in one place:

    A POD-struct:
    - is a class or struct (duh!)
    - has no user-declared constructors (8.5.1/1), i.e. it only has the compiler-generated default and copy constructor
    - has no private or protected non-static data members (8.5.1/1), but it may have private or protected functions or static data members
    - has no base classes (8.5.1/1)
    - has no virtual functions (8.5.1/1), but it may have non-virtual functions
    - has no non-static data members of type pointer to member (including pointer to member function) or reference (9/4), but again this doesn't apply to static members
    - has no non-static data members that themselves aren't PODs (9/4); the allowed member types are POD-unions, POD-structs, primitives, pointers to non-members, and arrays thereof
    - has no copy assignment operator (9/4) (an operator= that is callable with an expression of the type of its containing struct and that is not a template)
    - has no destructor (9/4), aside from the compiler-generated one, but the POD rules guarantee that it is a no-op.

    If the type violates any of these rules, it is not a POD, and structural conformance rules as well as byte copy rules do not apply.
    If the type violates any of the first four rules, it is also not an aggregate class, and aggregate initialization cannot be used with it. Note that containing a non-aggregate is fine for an aggregate, e.g.
    Code:
    struct agg
    {
      std::string s;
    };
    is an aggregate class, despite std::string not being an aggregate.
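
    With a C++11 or later compiler, a rough version of this checklist can be enforced at compile time. This is only a sketch under a stated assumption: C++11 redefined POD as "trivial and standard-layout", which is somewhat broader than the C++03 rules listed above, so the check is an approximation. is_pod_like, WithCtor and WithVirtual are hypothetical names chosen for the illustration.

```cpp
#include <string>
#include <type_traits>   // C++11 type traits; not available in C++03

struct Header      { unsigned char name, val; };                   // no special members
struct WithCtor    { unsigned char a; WithCtor() : a(0) {} };      // user-provided constructor
struct WithVirtual { unsigned char a; virtual void f() {} };       // virtual function
struct agg         { std::string s; };                             // aggregate holding a non-POD

// C++11's notion of POD: trivial and standard-layout.
template <typename T>
struct is_pod_like
    : std::integral_constant<bool, std::is_trivial<T>::value &&
                                   std::is_standard_layout<T>::value> {};

static_assert(is_pod_like<Header>::value,       "Header should be POD");
static_assert(!is_pod_like<WithCtor>::value,    "a user-provided constructor breaks POD-ness");
static_assert(!is_pod_like<WithVirtual>::value, "a virtual function breaks POD-ness");
static_assert(!is_pod_like<agg>::value,         "a std::string member breaks POD-ness");
```

    Putting such a static_assert next to any code that memcpy's into a struct turns a silent layout assumption into a compile error.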
    Last edited by CornedBee; 10-19-2007 at 04:03 AM.
