Thread: Is aliasing of "pointer to const" ok?

  1. #1
    Registered User
    Join Date
    Apr 2018
    Posts
    14

    Question Is aliasing of "pointer to const" ok?

    I'm reviewing some code I wrote some time ago, and I'm taking this opportunity to dust off my C.
    I'm using uint32_t arrays to store UTF8-encoded codepoints. They are treated as arrays, not strings. Every cell is assumed to contain one codepoint and one codepoint only.

    Occasionally I access a cell as a uint8_t array. We can safely assume uint8_t to be a typedef for unsigned char.
    Code:
    uint8_t (*t)[4];
    t = (uint8_t (*)[4])&chars[i]; //chars is an array of uint32_t
    
    (*t)[0] = utf8String[i++];
    (*t)[1] = utf8String[i++];
    //...
    I've read that compilers assume that char* may alias other types, but not the other way around.
    I've seen there's a lot of discussion on the subject, still going strong to this day, but despite that, I'd like to get my code to abide by the strict aliasing rules.
    I never attempt to read/write uint8_t arrays as uint32_t, and that would break strict aliasing.
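    As a minimal sketch of the direction described above (accessing a uint32_t cell through a character type), the byte access can also be written with a plain unsigned char pointer instead of a pointer to uint8_t[4]. The helper names store_utf8_bytes and cell_byte are hypothetical and not part of the original code; the sketch assumes a cell holds at most 4 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write up to 4 UTF-8 bytes into one uint32_t cell
 * through an unsigned char pointer. Access through a character type is
 * on the list of permitted lvalue types, so this does not break
 * strict aliasing. */
static void store_utf8_bytes(uint32_t *cell, const unsigned char *src, size_t n)
{
    unsigned char *bytes = (unsigned char *)cell;
    size_t k;

    *cell = 0; /* clear the bytes that will not be overwritten */
    for (k = 0; k < n && k < sizeof *cell; k++)
        bytes[k] = src[k];
}

/* Hypothetical helper: read byte k of a cell, again through a character type. */
static unsigned char cell_byte(const uint32_t *cell, size_t k)
{
    return ((const unsigned char *)cell)[k];
}
```

    The byte order inside the cell is simply the order in which the bytes were stored; which numeric value the uint32_t then has depends on endianness.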

    But one thing I do, is providing a helper macro LTCHAR to cast a "single unicode glyph literal string" (like "↺") to a uint32_t.
    Code:
    typedef uint32_t VTChar;
    #define LTCHAR *(VTChar *const) //strict aliasing is broken
    
    void vtFill(VTChar fillChar); //the function works with "uint32_t chars", but the macro helps creating one on the fly
    
    vtFill(LTCHAR"↺");
    One approach could be to move the data through a char array, or with memcpy(), with a helper function. But I was wondering:
    this is only a shortcut to pass a literal UTF8 codepoint to the function. If we assume the string is discarded afterwards (as if it were an rvalue), and treat it as something like const char[4], would strict aliasing be maintained?

    Can strict aliasing be considered enforced, when the aliased pointer points to constant data?
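    The memcpy() approach mentioned above could look roughly like the sketch below. The function name vtchar_from_utf8 is hypothetical, and the sketch assumes the literal is a NUL-terminated UTF-8 string of at most 4 bytes; longer input is truncated.

```c
#include <stdint.h>
#include <string.h>

typedef uint32_t VTChar;

/* Hypothetical replacement for the LTCHAR cast: copy the bytes of a
 * NUL-terminated UTF-8 glyph into a VTChar. memcpy copies bytes, so no
 * uint32_t lvalue is ever used to read the char array. */
static VTChar vtchar_from_utf8(const char *glyph)
{
    VTChar c = 0;
    size_t n = strlen(glyph);

    if (n > sizeof c)
        n = sizeof c; /* keep at most 4 bytes */
    memcpy(&c, glyph, n);
    return c;
}
```

    A call would then read vtFill(vtchar_from_utf8("↺")). Unlike the cast, this never reads past the end of a short literal such as "a", and it does not depend on the literal being suitably aligned for uint32_t.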
    Last edited by Lucide; 11-05-2020 at 03:43 PM.

  2. #2
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    It's not that different to doing this.
    Code:
    union {
        uint32_t code_point;
        uint8_t bytes[4];
    };
    But the whole idea is broken because of this -> Endianness - Wikipedia
    Either way, you've no idea if the first byte of your 'array' view corresponds to the LSB of your code point.

    If you want to portably mess with bytes in uint32_t, then you need to use the & | << >> operators.
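    A minimal sketch of the shift-and-mask approach, assuming the first byte should end up as the least significant byte of the uint32_t. The names pack_le and unpack_le are hypothetical.

```c
#include <stdint.h>

/* Pack four bytes into a uint32_t with b[0] as the least significant
 * byte. The result is the same on any host, whatever its endianness. */
static uint32_t pack_le(const uint8_t b[4])
{
    return  (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

/* Inverse operation: split a uint32_t back into four bytes. */
static void unpack_le(uint32_t v, uint8_t b[4])
{
    b[0] = (uint8_t)(v & 0xFFu);
    b[1] = (uint8_t)((v >> 8) & 0xFFu);
    b[2] = (uint8_t)((v >> 16) & 0xFFu);
    b[3] = (uint8_t)((v >> 24) & 0xFFu);
}
```

    Because only values are manipulated and no object is viewed through a different type, neither aliasing nor endianness comes into play.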
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Registered User
    Join Date
    Apr 2018
    Posts
    14
    Hi, your reply reminded me that I forgot to mention some important details!
    • I'm looking at code that was already working, but was not fully strict aliasing compliant. Endianness is correctly addressed under the hood (right now little-endian is assumed, but endianness awareness can easily be achieved by adding some lines of CMake).
    • I'm working with C99
    • I'd like to keep the discussion focused on strict aliasing issues, for easier reading in the future



    The union approach is interesting, because I've seen various different ways of interpreting it. Here are two:
    1. unions can be used to tell the compiler to consider the two types as "compatible types", and to stop assuming strict aliasing for those two. Explicitly stated in C11, not so in C99?
    2. unions can be used to do that, but not directly. Once a union variable has been initialized as one type, reading it as a different type breaks the strict aliasing (sa) rules. The way to do that is:
      Code:
      union VTChar {
          uint32_t codepoint;
          uint8_t bytes[4];
      };
      union VTChar foo;
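      memcpy(foo.bytes, "↺", sizeof foo.bytes); //needs <string.h>; "↺" is 3 UTF-8 bytes + '\0'
      
      vtFill(foo.codepoint); //under this reading: sa broken, foo was initialized as uint8_t[4]
      
      union VTChar temp;
      temp.codepoint=foo.codepoint;
      vtFill(temp.codepoint); //under this reading: sa maintained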


    Either way, using a union is definitely more correct and robust for general purpose operations, but in the first post I described a very particular use case, and I'm also interested in that.

    Can strict aliasing still be considered maintained when both pointers point to constant data?
    And by "constant data" I mean const qualified, so just a compile-time check, but with the guarantee that nothing other than initialization has been done to that data.
    Last edited by Lucide; 11-06-2020 at 03:55 AM.

  4. #4
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660

  5. #5
    Registered User
    Join Date
    Apr 2018
    Posts
    14
    Yes, most of the points brought up here come from there; somehow it's still going on to this day, through comments.
    What I've written above includes what I was able to understand from there. But even after going through that wall of text, I found no common point of view, and attempts at discussion were severely limited by the medium.
    You cannot set up a thread from Stackoverflow comments.
    This is the only C forum I know, so I'm attempting to bring the discussion here, where it's probably better suited. If some language lawyer/compiler guru stumbles upon this thread, they're welcome.

    Reading the C/compiler specifications myself is definitely out of my time budget.
    Last edited by Lucide; 11-06-2020 at 07:52 AM.

  6. #6
    Registered User
    Join Date
    Apr 2018
    Posts
    14
    Having taken a better look around, perhaps this is not the right place to ask such questions! It's a jumble of pre-beginner/beginner/other threads all mixed together.
    With no separation whatsoever, this will likely get lost under a pile of beginner help requests soon. Nothing wrong with that, I just hadn't noticed the strong focus on teaching.
    Keep it up, someone has to do it!
    Last edited by Lucide; 11-06-2020 at 08:02 AM.

  7. #7
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    Well if you want chapter and verse, you could always read it yourself.
    Index of /jtc1/sc22/wg14/www/docs/n869

    6.5 Expressions
    ...
    An object shall have its stored value accessed only by an lvalue expression that has one of the following types: 63)
    — a type compatible with the effective type of the object,
    — a qualified version of a type compatible with the effective type of the object,
    — a type that is the signed or unsigned type corresponding to the effective type of the object,
    — a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
    — an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
    — a character type.

    63) The intent of this list is to specify those circumstances in which an object may or may not be aliased.

  8. #8
    Registered User
    Join Date
    Apr 2018
    Posts
    14
    Thank you!
    I found a link to a defect report that clarifies the union behaviour: Defect report #283.
    The union approach you suggested is safe and sound and "ready to use" in C99 (the first usage example I've mentioned above).
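    For reference, the "first usage" (write one member, read the other directly) can be sketched as below. The type name VTCharBytes and the function name codepoint_from_bytes are hypothetical, and the sketch assumes a NUL-terminated UTF-8 glyph of at most 4 bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

union VTCharBytes {
    uint32_t codepoint;
    uint8_t  bytes[4];
};

/* Hypothetical helper: fill the byte view of the union, then read the
 * uint32_t view. The bytes are reinterpreted as a uint32_t, so the
 * numeric value depends on the host's endianness. */
static uint32_t codepoint_from_bytes(const char *glyph)
{
    union VTCharBytes u;
    size_t n = strlen(glyph);
    size_t k;

    u.codepoint = 0; /* zero all four bytes first */
    for (k = 0; k < n && k < sizeof u.bytes; k++)
        u.bytes[k] = (uint8_t)glyph[k];

    return u.codepoint; /* read a member other than the one last written */
}
```

    No intermediate copy into a second union variable is involved here; the read goes straight through the other member.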

    With that solved, what do you think about the pointer to const question?
    Let me improve the phrasing yet again:
    Can strict aliasing still be considered maintained when both pointers point to constant data?
    And by "constant data" I mean not only const qualified (we know that const doesn't mean immutability), but guaranteed that only initialization has been done (and will be done) on that data.
    Like the throw-away string literal mentioned in the first post.

  9. #9
    Registered User
    Join Date
    Apr 2018
    Posts
    14
    Answers often come when you're not actively looking for them!
    Can strict aliasing still be considered maintained when both pointers point to constant data?
    And by "constant data" I mean not only const qualified (we know that const doesn't mean immutability), but guaranteed that only initialization has been done (and will be done) on that data.
    Like the throw-away string literal mentioned in the first post.
    No, because a read through the wrong type is still a strict aliasing violation; the rule covers every access, not just writes. No magic stuff happening here, sadly.
    But if you were to cast a pointer to a temporary char* pointer, and never attempt to read or write the data through it, you should be able to cast it back to the original data type safely. Would the reverse be true? It shouldn't: a char buffer is not guaranteed to be suitably aligned for uint32_t, and accessing it as uint32_t would violate the rule anyway.
    Anyway, among the big compilers it is mainly GCC and Clang that optimize on strict aliasing, so if you really want to get kinky with pointers, you should examine their behaviour (and the -fno-strict-aliasing option) as well as the C specification.
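    The round trip through a character pointer can be sketched as follows. The function name round_trip is hypothetical; the point is only that the pointer is converted and converted back, without the data being accessed as the wrong type.

```c
#include <stdint.h>

/* Hypothetical illustration: convert a uint32_t pointer to a character
 * pointer and back. The object is never accessed through a type other
 * than uint32_t, and the pointer that comes back compares equal to the
 * original one. */
static uint32_t *round_trip(uint32_t *p)
{
    unsigned char *opaque = (unsigned char *)p; /* carried around, not dereferenced */
    return (uint32_t *)opaque;
}
```

    The opposite direction, starting from a char array and converting to uint32_t*, carries no such guarantee, because the array may not be aligned for uint32_t and its effective type is not uint32_t.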
