Thread: Underallocating memory for a tagged union.

  1. #1
    Registered User
    Join Date
    Apr 2015
    Posts
    3

    Underallocating memory for a tagged union.

    Hey all,

    I've got a bit of an odd question that I can't seem to find any information about.

    What I'm interested in is the behaviour of a struct/union constructed like this:
    Code:
    #include <stdint.h>
    #include <stdlib.h>
    
    typedef struct {
        uint64_t num1;
        uint64_t num2;
    } st_a;
    
    typedef struct {
        uint64_t num1;
        uint32_t num2;
    } st_b;
    
    typedef struct {
        uint32_t type;  /* tag: 0 selects u.a, 1 selects u.b */
        union {
            st_a a;
            st_b b;
        } u;
    } somestruct;
    
    somestruct* newsomestruct(uint32_t type) {
        somestruct* result = NULL;
    
        switch(type) {
            case 0:
                /* requests tag size + st_a size, not sizeof(somestruct) */
                result = malloc(sizeof(uint32_t) + sizeof(st_a));
                result->type = type;
                result->u.a.num1 = 0;
                result->u.a.num2 = 0;
                break;
            case 1:
                /* requests tag size + st_b size, not sizeof(somestruct) */
                result = malloc(sizeof(uint32_t) + sizeof(st_b));
                result->type = type;
                result->u.b.num1 = 0;
                result->u.b.num2 = 0;
                break;
        }
    
        return result;
    }
    What kind of behaviour could I expect from the object in the following cases:
    1. newsomestruct(0)->u.a.num1 = 2;
    2. newsomestruct(1)->u.b.num1 = 2;
    3. newsomestruct(0)->u.a.num2 = 2;
    4. newsomestruct(1)->u.b.num2 = 2;
    5. newsomestruct(0)->u.b.num1 = 2;
    6. newsomestruct(1)->u.a.num1 = 2;
    7. newsomestruct(0)->u.b.num2 = 2;
    8. newsomestruct(1)->u.a.num2 = 2;
    9.
    Code:
    somestruct* ss1 = newsomestruct(0);
    somestruct* ss2 = newsomestruct(1);
    * ss1 = * ss2;
    10.
    Code:
    somestruct* ss1 = newsomestruct(0);
    somestruct* ss2 = newsomestruct(1);
    * ss2 = * ss1;
    This is what I'd expect, but I can't find any evidence online in C standards or elsewhere:
    1. Works as expected, sets the value of a.num1 to 2.
    2. Works as expected, sets the value of b.num1 to 2.
    3. Works as expected, sets the value of a.num2 to 2.
    4. Works as expected, sets the value of b.num2 to 2.
    5. Works as expected, sets the value of b.num1 to 2.
    6. Works as expected, sets the value of a.num1 to 2.
    7. Works as expected, sets the value of b.num2 to 2.
    8. Crashes/Memory Corruption, attempted to alter memory outside struct.
    9. Works as expected, * ss1 == * ss2
    10. Crashes/Memory Corruption, attempted to alter memory outside struct.

    I've tested similar code on my machine (Xubuntu 14.04 LTS, compiled with gcc at -O3) and it appears to be reliable, provided you stick to accessing the type tagged in the struct or the common initial members of the union's structs (in this case num1).

    This is likely bad practice, since I assume it's undefined behaviour, but if the behaviour is defined I assume it's about as dangerous as a pointer (i.e. if you don't control how you use it, things go wrong).

    Either way, I'm interested in what's going on behind the scenes and whether this could be used, somewhat safely, given that the type tag for the union is respected.

    Thanks for your time, and sorry if it's a bit rambly / poorly worded (it's getting a bit late).
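
Before going through the ten cases, it may be worth printing the actual layout, since sizeof(uint32_t) + sizeof(st_a) ignores any padding the compiler places between type and u. The sketch below is only an illustration: print_layout, requested_for_a and needed_for_a are made-up helper names, not part of the original code. On a typical x86-64 Linux build, where uint64_t is 8-byte aligned, I would expect u to start at offset 8, so the request would end before u.a.num2 does. The numbers are implementation-defined, though, so check them on your own target.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct { uint64_t num1; uint64_t num2; } st_a;
typedef struct { uint64_t num1; uint32_t num2; } st_b;

typedef struct {
    uint32_t type;
    union { st_a a; st_b b; } u;
} somestruct;

/* Hypothetical helper: the size the original post asks malloc for (case 0). */
size_t requested_for_a(void) {
    return sizeof(uint32_t) + sizeof(st_a);
}

/* Hypothetical helper: bytes from the start of somestruct to the end of u.a,
   including any padding in front of the union. */
size_t needed_for_a(void) {
    return offsetof(somestruct, u) + sizeof(st_a);
}

/* Hypothetical helper: prints the layout for the current compiler and target. */
void print_layout(void) {
    printf("offsetof(somestruct, u)        = %zu\n", offsetof(somestruct, u));
    printf("offsetof(somestruct, u.a.num2) = %zu\n", offsetof(somestruct, u.a.num2));
    printf("sizeof(st_a), sizeof(st_b)     = %zu, %zu\n", sizeof(st_a), sizeof(st_b));
    printf("sizeof(somestruct)             = %zu\n", sizeof(somestruct));
    printf("requested for type 0           = %zu\n", requested_for_a());
    printf("needed to reach end of u.a     = %zu\n", needed_for_a());
}
```

If the "requested" figure comes out smaller than the "needed" one, then even case 3 (writing u.a.num2 on a type 0 object) touches bytes past the end of the allocation.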

  2. #2
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    > result = malloc(sizeof(uint32_t) + sizeof(st_a));
    If you just did malloc(sizeof(somestruct)) then your problems would go away.

    The problems being your copy assignments
    * ss1 = * ss2;
    and
    * ss2 = * ss1;
    Both of which exhibit UB by trying to be overly clever with your malloc calls.

    You're either copying from memory which doesn't exist, or you're trying to overwrite memory which doesn't exist.

    The copy assignment is just a shorthand for
    memmove(ss1,ss2,sizeof(somestruct));
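
A minimal sketch of the full-size allocation suggested above, assuming the struct definitions from the first post. newsomestruct_full is a hypothetical name, and calloc is used here only to zero the whole object in one call. With both objects allocated at sizeof(somestruct), the whole-struct assignment has a complete object on each side.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct { uint64_t num1; uint64_t num2; } st_a;
typedef struct { uint64_t num1; uint32_t num2; } st_b;

typedef struct {
    uint32_t type;
    union { st_a a; st_b b; } u;
} somestruct;

/* Hypothetical replacement for newsomestruct: always allocates the full
   object, so every member of the union fits regardless of the tag. */
somestruct *newsomestruct_full(uint32_t type) {
    somestruct *result = calloc(1, sizeof *result);  /* zero-filled */
    if (result == NULL) {
        return NULL;  /* allocation failed; the caller must check */
    }
    result->type = type;
    return result;
}

/* Copies src over dst; both must point to full-size objects. */
void somestruct_copy(somestruct *dst, const somestruct *src) {
    *dst = *src;  /* copies the tag and the whole union */
}
```

The caller still has to free() each object and check for NULL, but the copy in either direction no longer reads or writes outside an allocation.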
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Registered User
    Join Date
    Apr 2015
    Posts
    3
    Quote Originally Posted by Salem
    > result = malloc(sizeof(uint32_t) + sizeof(st_a));
    If you just did malloc(sizeof(somestruct)) then your problems would go away.
    Yeah, I completely understand that, but that's not the point. Let's say I had the following structs instead:
    Code:
    typedef struct {
        uint64_t num1;
    } st_a;
    
    typedef struct {
        uint64_t numbers[65535];
    } st_b;
    
    typedef struct {
        uint32_t type;
        union {
            st_a a;
            st_b b;
        } u;
    } somestruct;
    To allocate enough memory for somestruct, I'd effectively need room for sizeof(uint32_t)+sizeof(st_b), even if I only intend to use the st_a part of the union. It'd be more memory efficient to ask for only sizeof(uint32_t)+sizeof(st_a).
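
Purely as a sketch of the arithmetic (it does not settle whether using such a short block through a somestruct pointer is defined), the per-tag size could be computed with offsetof so that padding in front of the union is counted. somestruct_size_for is a made-up helper name, and treating tags 0 and 1 as "uses a" and "uses b" is an assumption carried over from the first post.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint64_t num1; } st_a;
typedef struct { uint64_t numbers[65535]; } st_b;

typedef struct {
    uint32_t type;
    union { st_a a; st_b b; } u;
} somestruct;

/* Hypothetical helper: bytes from the start of somestruct to the end of the
   union member selected by type. Returns 0 for an unknown tag. */
size_t somestruct_size_for(uint32_t type) {
    switch (type) {
    case 0:
        return offsetof(somestruct, u) + sizeof(st_a);
    case 1:
        return offsetof(somestruct, u) + sizeof(st_b);
    default:
        return 0;
    }
}
```

With 65535 uint64_t elements the st_b variant is roughly 512 KiB, while the st_a variant is a handful of bytes, which is the saving being asked about.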

    Quote Originally Posted by Salem
    The problems being your copy assignments
    * ss1 = * ss2;
    and
    * ss2 = * ss1;
    Both of which exhibit UB by trying to be overly clever with your malloc calls.

    You're either copying from memory which doesn't exist, or you're trying to overwrite memory which doesn't exist.

    The copy assignment is just a shorthand for
    memmove(ss1,ss2,sizeof(somestruct));
    Yeah, I know that, and I never expected it to work. But say the "type" field were respected, just like a size/length field for an array or the null character in a string. Would it be safe?

    I couldn't find anything in the C11 standard, well, not explicitly at least.
    Last edited by Michael Baker; 04-05-2015 at 07:21 PM.

  4. #4
    Master Apprentice phantomotap
    Join Date
    Jan 2008
    Posts
    5,108
    Yeah, I completely understand that, but that's not the point.
    O_o

    The comment Salem made is very definitely the point.

    Code:
    somestruct* result = NULL;
    // ...
    result = malloc(sizeof(uint32_t) + sizeof(st_a));
    You are lying to the compiler and the client.

    The memory allocated isn't sufficient for a `somestruct' object.

    The pointer does not point at a `somestruct' object.

    If you don't provide a `somestruct' object, do not pretend to return a `somestruct' object.

    Soma
    “Salem Was Wrong!” -- Pedant Necromancer
    “Four isn't random!” -- Gibbering Mouther

  5. #5
    Registered User
    Join Date
    Apr 2015
    Posts
    3
    Quote Originally Posted by phantomotap
    O_o

    The comment Salem made is very definitely the point.

    Code:
    somestruct* result = NULL;
    // ...
    result = malloc(sizeof(uint32_t) + sizeof(st_a));
    You are lying to the compiler and the client.

    The memory allocated isn't sufficient for a `somestruct' object.

    The pointer does not point at a `somestruct' object.

    If you don't provide a `somestruct' object, do not pretend to return a `somestruct' object.

    Soma
    Sorry, what I meant was that that isn't the question I'm asking. I'm not asking whether it's bad programming practice; I know it certainly is. What I'm asking about is the behaviour of C in such a case in which someone does do that - bad practice or not.

  6. #6
    Master Apprentice phantomotap
    Join Date
    Jan 2008
    Posts
    5,108
    What I'm asking about is the behaviour of C in such a case in which someone does do that - bad practice or not.
    O_o

    The answer to your question remains "The code exhibits undefined behavior.".

    We could speculate on the behavior that you might see for any particular environment, but we aren't going to because the code is allowed to exhibit just about any behavior.

    The instant you lied about the type, you found yourself in the territory of undefined behavior.

    [Edit]
    I felt that the original comment was misleading so better left unsaid.
    [/Edit]

    Soma
    Last edited by phantomotap; 04-05-2015 at 09:06 PM.

  7. #7
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    > What I'm asking about is the behaviour of C in such a case in which someone does do that - bad practice or not.
    C is the kind of language where if you say "I want to hang myself", C will simply respond with "OK, here's the rope".

    It is up to YOU to honour the meaning of your type member into only doing the right thing with the memory you know you have. If you forget to do this (or lie about it), you're on your own.
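
One way to keep that promise in a single place is to route every access through small accessor functions that check the tag first. This is only a sketch: the TYPE_A/TYPE_B names and the somestruct_as_a/somestruct_as_b helpers are invented for illustration, and the tag values 0 and 1 are taken from the first post. It works on a full-size somestruct; it does nothing to make an underallocated block valid.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint64_t num1; uint64_t num2; } st_a;
typedef struct { uint64_t num1; uint32_t num2; } st_b;

typedef struct {
    uint32_t type;
    union { st_a a; st_b b; } u;
} somestruct;

/* Hypothetical names for the tag values 0 and 1 used in the thread. */
enum { TYPE_A = 0, TYPE_B = 1 };

/* Returns the st_a member only when the tag says it is the active one. */
st_a *somestruct_as_a(somestruct *s) {
    return (s != NULL && s->type == TYPE_A) ? &s->u.a : NULL;
}

/* Returns the st_b member only when the tag says it is the active one. */
st_b *somestruct_as_b(somestruct *s) {
    return (s != NULL && s->type == TYPE_B) ? &s->u.b : NULL;
}
```

Callers then test the returned pointer for NULL instead of reaching into u directly, so a mismatched tag becomes a checkable error rather than a silent wrong-member access.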


    Now if this were C++, you would have something like this
    Code:
    #include <cstdint>
    
    struct somestruct {
        uint32_t type;  // Not necessary to make the rest of it work
        virtual ~somestruct() = default;  // dynamic_cast needs a polymorphic base
    };
    
    struct st_a : public somestruct {
        uint64_t num1;
    };
    
    struct st_b : public somestruct {
        uint64_t numbers[65535];
    };
    When you wanted to be general, you could return a pointer to somestruct.
    When you wanted to be specific, you would downcast your base pointer to whatever you wanted it to be - BUT the cast would be safe if you used dynamic_cast<T*>(ptr) notation. The language itself would protect you from doing something dumb.
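
For comparison, a rough C analogue of the same idea, given as a sketch rather than a drop-in design. Each variant is its own struct whose first member is the common base, so each allocation is exactly the size of the type it really holds. It relies on the C rule that a pointer to a struct, suitably converted, points to its first member and vice versa. new_object, as_st_a, as_st_b and the TYPE_A/TYPE_B constants are hypothetical names; the as_* helpers do by hand the check that dynamic_cast would perform.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct { uint32_t type; } somestruct;  /* common base */

typedef struct { somestruct base; uint64_t num1; } st_a;
typedef struct { somestruct base; uint64_t numbers[65535]; } st_b;

enum { TYPE_A = 0, TYPE_B = 1 };  /* hypothetical tag names */

/* Allocates the variant named by type and returns a pointer to its base.
   Returns NULL for an unknown tag or on allocation failure. */
somestruct *new_object(uint32_t type) {
    switch (type) {
    case TYPE_A: {
        st_a *a = calloc(1, sizeof *a);
        if (a == NULL) return NULL;
        a->base.type = TYPE_A;
        return &a->base;
    }
    case TYPE_B: {
        st_b *b = calloc(1, sizeof *b);
        if (b == NULL) return NULL;
        b->base.type = TYPE_B;
        return &b->base;
    }
    default:
        return NULL;
    }
}

/* Checked downcasts: NULL unless the tag matches the requested variant. */
st_a *as_st_a(somestruct *p) {
    return (p != NULL && p->type == TYPE_A) ? (st_a *)p : NULL;
}

st_b *as_st_b(somestruct *p) {
    return (p != NULL && p->type == TYPE_B) ? (st_b *)p : NULL;
}
```

Because the base is the first member, the pointer returned by new_object is the same address calloc returned, so it can be passed straight to free(). Copying has to go through the specific type (for example *as_st_a(dst) = *as_st_a(src) after checking both), never through the base alone.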
