Thread: Is this safe?

  1. #1
    Registered User
    Join Date
    Nov 2006
    Posts
    519

    Is this safe?

    EDIT: corrected some compiler errors

    Hi,

    imagine a hierarchy like this:

    Code:
    struct Base
    {
      int base1;
      int base2;
    };
    
    struct Derived :public Base
    {
      int derived1;
      int derived2;
    };
    Now suppose the following happens:

    Code:
    	Derived d;
    
    	d.base1 = 5;
    	d.base2 = 6;
    
    	std::string d_str( reinterpret_cast<char*>(&d), sizeof(Derived) );
    
    	std::string b_str( d_str.c_str(), sizeof(Base) );
    
    	char* b_ch = new char( strlen(b_str.c_str()) +1 );
    	strcpy(b_ch, b_str.c_str());
    
    	//and now the interesting thing:
    	Base* b;
    	b = reinterpret_cast<Base*>(b_ch);
    
    	std::cout << b->base1 << "  " << b->base2 << "\n";
    Is b guaranteed to be a valid Base object, always?
    If not, why?

    Thank you!
    Last edited by pheres; 03-29-2008 at 12:49 PM.

  2. #2
    Registered User
    Join Date
    Nov 2006
    Posts
    519
    After playing around a bit, the code above seems totally broken. So I tried something simpler:

    Code:
    #include <cstring>   // strncpy
    #include <string>
    #include <iostream>
    
    int main()
    {
    
    	struct Base
    	{
    	  int base1;
    	  int base2;
    	};
    
    	struct Derived :public Base
    	{
    	  int derived1;
    	  int derived2;
    	};
    
    	Derived d;
    
    	d.base1 = 5;
    	d.base2 = 6;
    
    	d.derived1 = 7;
    	d.derived2 = 8;
    
    	size_t sd = sizeof(Derived);
    	size_t sb = sizeof(Base);
    
    	char* source = new char[sd];
    	strncpy(source, reinterpret_cast<char*>(&d), sd);
    
    	Derived* d2 = reinterpret_cast<Derived*>(source);
    
    	std::cout << d2->base1 << "  " << d2->base2 << "  " << d2->derived1 << "  " << d2->derived2 << "\n";
    }
    
    The output is "5 0 0 0". Why not "5 6 7 8"?

    EDIT: I see, using memcpy instead of strncpy leads to "5 6 7 8". Seems to be the revenge for never learning C. Why does it work with memcpy?
    Last edited by pheres; 03-29-2008 at 01:10 PM.

  3. #3
    Registered User
    Join Date
    Nov 2006
    Posts
    519
    Now I added the second copy (the part at the end of main that fills dest, shown in blue in the original post) and it behaves as expected. The output is

    5 6 7 8
    5 6
    Code:
    #include <cstring>   // memcpy
    #include <string>
    #include <iostream>
    
    int main()
    {
    
    	struct Base
    	{
    	  int base1;
    	  int base2;
    	};
    
    	struct Derived :public Base
    	{
    	  int derived1;
    	  int derived2;
    	};
    
    	Derived d;
    
    	d.base1 = 5;
    	d.base2 = 6;
    
    	d.derived1 = 7;
    	d.derived2 = 8;
    
    	size_t sd = sizeof(Derived);
    	size_t sb = sizeof(Base);
    
    	char* source = new char[sd];
    	memcpy(source, reinterpret_cast<char*>(&d), sd);
    
    	Derived* d2 = reinterpret_cast<Derived*>(source);
    
    	std::cout << d2->base1 << "  " << d2->base2 << "  " << d2->derived1 << "  " << d2->derived2 << "\n";
    
    	char* dest = new char[sb];
    	memcpy(dest, reinterpret_cast<char*>(&d), sb);
    
    	Base* b = reinterpret_cast<Base*>(dest);
    
    	std::cout << b->base1 << "  " << b->base2 << "\n";
    }
    The main question again: is this guaranteed to work (if all modules using that form of conversion are compiled with the same compiler and settings)?

    Thanks for comments

  4. #4
    Registered User
    Join Date
    Jan 2008
    Posts
    290
    strcpy won't work, and neither will strncpy. Both are designed to work with null-terminated strings. The second they see a 0 byte in the input they stop copying.

    memcpy will copy exactly how many bytes you tell it to.

  5. #5
    Algorithm Dissector iMalc
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Quote Originally Posted by pheres
    Is b guaranteed to be a valid Base object, always?
    If not, why?

    Thank you!
    Hell no!
    You entered the land of undefined behaviour at this line:
    Code:
    	std::string d_str( reinterpret_cast<char*>(&d), sizeof(Derived) );
    From then on you were walking barefoot on broken glass.
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  6. #6
    Cat without Hat CornedBee
    Join Date
    Apr 2003
    Posts
    8,895
    Because Derived has a base class, it's not a POD, and therefore accessing its bytes via the cast to char* is undefined.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  7. #7
    Registered User
    Join Date
    Nov 2006
    Posts
    519
    Thanks for the warning and the explanation.

    I played a bit further and sent some big structs, cast to char*, over the network and cast them back on the other end. It seems to work flawlessly. What badness exactly could happen due to "being undefined"?
    Can something still fail if I pin down the compiler, the platform and the data alignment of the structs?

  8. #8
    Registered User
    Join Date
    Jun 2005
    Posts
    6,815
    Quote Originally Posted by pheres
    I played a bit further and sent some big structs, cast to char*, over the network and cast them back on the other end. It seems to work flawlessly. What badness exactly could happen due to "being undefined"?
    The "badness" that can happen is that there is no guarantee it will work correctly. It can be sensitive to things like compiler or operating system versions (e.g. it can potentially break when a service pack is applied to one of the machines), compiler optimisation settings, hardware versions, and so on.
    Quote Originally Posted by pheres
    Can something still fail if I pin down the compiler, the platform and the data alignment of the structs?
    Specifying those things (as long as you get very specific with compiler and platform versions) can reduce the chances of failure, but not eliminate them. If you want to do such trickery, you might get away with it. If you maintain and ship your program to customers, you can enjoy the pleasures should a fix of your program inadvertently break on a customer machine with a slightly different configuration from your development machines. Or, if this is a program for yourself, the head-scratching needed should some unrelated hardware or software update cause your programs to behave incorrectly in some seemingly random manner.

    It is usually a good idea to eliminate potential problems through design, rather than relying on coding constructs that can break for reasons outside your control.

  9. #9
    Registered User
    Join Date
    Nov 2006
    Posts
    519
    Thanks grumpy. Lately I saw code relying heavily on casts like the ones I tried in my example. It has also been running flawlessly in one environment(*) for years now. I guess the reason for doing this was performance. The software has to pass and reassemble huge amounts of large objects over a separate network as fast as possible. Serialisation would probably be way too slow for this kind of application.

    (*) Environment in this case means:
    • Hardware is fixed and will never change. If a piece fails, a new and identical one is taken out of the cellar to replace it
    • Platform is fixed. It is installed and maintained by the producer. Nobody else can touch it. It will never be updated and it has no connection to the outside world (uplink)
    • If the software gets bumped to a new version, the whole system gets updated and checked for functionality.


    Are the guys doing this really just lucky, and could it all break tomorrow due to a whisper in the trees?

  10. #10
    Cat without Hat CornedBee
    Join Date
    Apr 2003
    Posts
    8,895
    In this case, no. Language theory aside, whether this code actually works in reality is a question of the ABI, and compiler builders are very, very careful about breaking that.

  11. #11
    Registered User
    Join Date
    Nov 2006
    Posts
    519
    Thanks. To ask a bit further, because I'm really interested in the background: if the compiler's ABI ever changes, would a program that ran before still run after a full clean rebuild with the new compiler?

  12. #12
    Cat without Hat CornedBee
    Join Date
    Apr 2003
    Posts
    8,895
    Maybe. Depends on how the ABI changes. For example, if the compiler writers decide to prove that it's possible to place base classes at the end of the whole object (it is, but it is rather inefficient), the code will break.

  13. #13
    Registered User
    Join Date
    Nov 2006
    Posts
    519
    I see. Did I understand your first statement in this thread correctly: the undefined behavior would disappear if no inheritance were involved, because the structure of the objects in memory would then be "defined" in the same way as the structure of an int, for example?

  14. #14
    Cat without Hat CornedBee
    Join Date
    Apr 2003
    Posts
    8,895
    Well ... this is a bit tricky.

    The structure of an int is not defined by the language, but by the implementation. So, in theory, even serializing ints is subject to implementation changes. (Of course, if such a change happened, it would be at the CPU level and all existing programs would stop running, so that's not going to happen.)
    The standard requires that copying PODs bytewise is safe, i.e. the state of the struct is defined by its bit pattern. PODs are primitives and structures with no base classes, virtual functions, constructors, destructors, assignment operators, references, or members that aren't PODs themselves. The compiler is not allowed to place anything special in there (like vptrs in classes with virtual functions).
    The standard further requires that the layout of the struct is compatible with the rules of C and a few constraints of C++ itself: members declared earlier must sit at lower addresses than members declared later, and the first member must have the same address as the whole struct. However, the compiler may still insert arbitrary amounts of padding between and after those members. In fact, in most compilers you can even control how much padding the compiler inserts.

    This means: copying a POD bytewise is safe within the same process. Copying a part of such a struct (e.g. a member struct) is safe, too.
    When crossing out of the process, there are two more rules. First, obviously, there must not be any pointer members, because the pointer values would be meaningless. Second, if you're transferring the data, you must make certain that the two processes use the same ABI. The ABI ought to be documented by the compiler and platform.

  15. #15
    Registered User
    Join Date
    Nov 2006
    Posts
    519
    Thank you. That was very enlightening
