Thread: Global memory optimizations of const char*

  1. #1
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446

    Global memory optimizations of const char*

    I've stumbled onto an interesting result while trying to understand C++ memory management.

    (just a reminder: MinGW 3.4.5 on Win32)

    Code:
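    #include <iostream>

    const char* str1 = "There will be only one";
    const char* str2 = "There will be only one";
     
    int main() {
      // Cast to const void* so operator<< prints the address, not the text.
      std::cout << static_cast<const void*>(str1) << "  "
                << static_cast<const void*>(str2) << '\n';
    }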
    Both str1 and str2 point to the same memory location, which is quite interesting, although not something one would expect to take much advantage of. However...

    - I couldn't replicate this with any other built-in type. In particular, const arrays of built-in types didn't reproduce this optimization. Why only c-strings?

    - I couldn't reproduce this behavior on the heap or the stack either, only in the global space. What prevents this optimization from taking place in those other memory locations?
    Originally Posted by brewbuck:
    Reimplementing a large system in another language to get a 25% performance boost is nonsense. It would be cheaper to just get a computer which is 25% faster.

  2. #2
    Deathray Engineer MacGyver's Avatar
    Join Date
    Mar 2007
    Posts
    3,210
    Are you compiling with full optimizations? You're really talking about compiler-specific behavior that depends on the systems the compiler is targeting.

    But with regard to C strings, it's because the string literal has to be stored somewhere in the resulting executable. You can't do the same kind of thing with other built-in types:

    Code:
    const int *x = 5; // Mmm, nope not the same thing at all
    Totally different concept. This is storing an address inside x of value 5 (if it compiled... probably need a cast).

    String literals, on the other hand, are relatively unique, and that's why there's no real counterpart for other types. The string literal is saved somewhere, and then the pointer is really assigned the starting address of the block of chars.

    I'm sure you know this already, but I'm just not understanding what you expect in terms of a counterpart with other data types.
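
    To make the pointer-versus-storage distinction above concrete, here is a minimal sketch. The names ptr_a, ptr_b, arr_a, same_storage and report_literal_storage are made up for illustration. A pointer initialised from a literal only stores the literal's address, while an array initialised from the same literal is an object with its own copy of the characters. Whether the two pointers end up equal is left to the implementation, so the sketch only reports it.

```cpp
#include <cassert>
#include <cstring>
#include <iostream>

const char* ptr_a = "same text";    // holds the address of a string literal
const char* ptr_b = "same text";    // may or may not hold the same address
const char  arr_a[] = "same text";  // an array object with its own 10 chars

// True when both pointers refer to the same storage.
bool same_storage(const char* x, const char* y) { return x == y; }

void report_literal_storage() {
  // 9 visible characters plus the terminating '\0'.
  assert(sizeof(arr_a) == 10);
  // The text is identical wherever it happens to be stored.
  assert(std::strcmp(ptr_a, arr_a) == 0);
  std::cout << "pointers share storage: " << std::boolalpha
            << same_storage(ptr_a, ptr_b) << '\n'
            << "sizeof(ptr_a) = " << sizeof(ptr_a)
            << ", sizeof(arr_a) = " << sizeof(arr_a) << '\n';
}
```

    Calling report_literal_storage() from main() prints what a given build did; the first line of output can legitimately differ between compilers and option sets.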

  3. #3
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    You can't do something like:

    Code:
    int *p = { 1, 2, 3, 4, 5 };
    the compiler wants
    Code:
    int p[] = {1, 2, 3, 4, 5 };
    So, there's really no way to create a pointer to a literal in the same sense as strings with other types.

    --
    Mats

  4. #4
    Registered User hk_mp5kpdw's Avatar
    Join Date
    Jan 2002
    Location
    Northern Virginia/Washington DC Metropolitan Area
    Posts
    3,817
    At some point, the compiler realizes that the two string literals are the same and decides to only store one set of characters in the program's data segment and initializes the two pointers to point to that same address...

    I seem to remember an option of an old compiler that explicitly turned this on/off. Under Microsoft Visual C++ 6 in the current Project Settings->C/C++ Tab->Customize Category there is an "Eliminate duplicate strings" option that does this, I believe.
    "Owners of dogs will have noticed that, if you provide them with food and water and shelter and affection, they will think you are god. Whereas owners of cats are compelled to realize that, if you provide them with food and water and shelter and affection, they draw the conclusion that they are gods."
    -Christopher Hitchens

  5. #5
    The larch
    Join Date
    May 2006
    Posts
    3,573
    Actually, I'm not sure why it should be impossible if, for example, you declare two identical global const int arrays with initializer lists.

    Probably just the way the compiler works. It's probably easier to recognize identical strings. And I can't see any reason to have several instances of truly const and impossible to modify global arrays with different names...
    I might be wrong.

    Thank you, anon. You sure know how to recognize different types of trees from quite a long way away.
    Quoted more than 1000 times (I hope).

  6. #6
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    If you have two const arrays of int, with the same content, why do you need two arrays?

    Since const arrays are fairly rare, and multiple arrays with exactly the same contents are even rarer, it makes little sense to optimize for that, as there's a very obvious "manual" optimization for it - have only ONE array.

    However, for string literals, it's quite common to do things like this:

    Code:
    char *strings[] = { "Unknown", "Unknown", "Unknown", "Unknwn" };
    
    ...
       if (something)
         strings[i] = mystring;
    ... 
     // or this variation:
       printf("%d\n", a);
       printf("%d\n", b);
       printf("%d\n", c);
       printf("%d\n", d);
       printf("%d\n", e);
       printf("%d\n", f);
       printf("%d\n", g);
       printf("%d\n", h);
    ...
     // or
    #define MESSAGE "Some string we print here and there\n"
    ... 
    
       printf(MESSAGE);
    ... 
       printf(MESSAGE);
    ...
      if (something)
        printf(MESSAGE);
    So there is much more scope for optimising the string literal multiples than there is scope for optimizing const int arrays.

    --
    Mats

  7. #7
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    Quote Originally Posted by matsp View Post
    If you have two const arrays of int, with the same content, why do you need two arrays?
    Conversely, why do I need two strings?

    And, btw, I did say arrays in my post. I'm not sure why pointers to integral types came up in this conversation. I didn't mention them.

    >> Since const arrays are fairly rare

    Lookup tables aren't rare. Just one example I remembered.

    >> and multiples that are exactly the same value is even rarer, it makes little sense to optimize for that, as there's a very obvious "manual" optimization for it - have only ONE array.

    Or one string, no?

    >>However, for string literals, it's quite common to do things like this:
    Code:
    char *strings[] = { "Unknown", "Unknown", "Unknown", "Unknwn" };
    It won't optimize. For optimization to take place it must be a c-string or if you define it so, a const null-terminated array of char. Further encapsulating the c-string in another array doesn't seem to optimize.

    Moreover, the c-string or const null-terminated char array must be in the global space.

    So, all in all, the question is: why only c-strings, and only in the global space?

    Consider this:

    Code:
    const int val1 = 5;
    const int val2 = 5;
    const char str1[2] = {'a', 'b'};
    const char str2[2] = {'a', 'b'};
     
    int main() {}
    Forgetting the nonsense behind such declarations (one could think the same of two similar strings in the global space... or give the exact same reasons justifying their usage), what exactly prevents the val# and str# variables from being optimized?

  8. #8
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by Mario F. View Post
    Code:
    const int val1 = 5;
    const int val2 = 5;
    const char str1[2] = {'a', 'b'};
    const char str2[2] = {'a', 'b'};
     
    int main() {}
    Forgetting the nonsense behind such declarations (one could think the same of two similar strings in the global space... or give the exact same reasons justifying their usage), what exactly prevents the val# and str# variables from being optimized?
    This isn't even close to the string literal case. It's the STRING LITERALS which are being merged, not VARIABLES. With str1 and str2 declared as globals, there will surely be two copies of the array. Why? Because they aren't the same variable, and they aren't pointers, which means you can't play the trick of making them both point to the same place.

    The compiler can't just decide of its own volition "I'm going to completely remove this global variable because it's identical to this other variable."

    Had the variables been local, they might both be initialized from a "template array" somewhere in the global data area. This template array might be subject to the same optimizations as string literals, but I know of no compiler which does it.
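
    A small sketch of the point about variables, reusing the declarations quoted above. The helper names same_address and check_distinct_objects are hypothetical. Each named variable is its own object, so as soon as the program compares their addresses they have to differ, however identical the contents are:

```cpp
#include <cassert>

const int val1 = 5;
const int val2 = 5;
const char str1[2] = {'a', 'b'};
const char str2[2] = {'a', 'b'};

// Do two pointers refer to the very same object?
bool same_address(const void* x, const void* y) { return x == y; }

void check_distinct_objects() {
  // Distinct named objects must have distinct addresses.
  assert(!same_address(&val1, &val2));
  assert(!same_address(str1, str2));
  // The values themselves are of course equal.
  assert(val1 == val2);
  assert(str1[0] == str2[0] && str1[1] == str2[1]);
}
```

    With pointers to literals there is no such constraint on the target, since the two pointer objects stay distinct no matter where they point.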

  9. #9
    Registered User Frobozz's Avatar
    Join Date
    Dec 2002
    Posts
    546
    Quote Originally Posted by hk_mp5kpdw View Post
    I seem to remember an option of an old compiler that explicitly turned this on/off. Under Microsoft Visual C++ 6 in the current Project Settings->C/C++ Tab->Customize Category there is a "Eliminate duplicate strings" option that does this I believe.
    The GCC 3.4.x manual shows:

    Code:
    -fmerge-constants
        Attempt to merge identical constants (string constants and floating point constants) across compilation units.
    
        This option is the default for optimized compilation if the assembler and linker support it. Use -fno-merge-constants to inhibit this behavior.
    
        Enabled at levels -O, -O2, -O3, -Os.
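
    One way to experiment with this is a tiny probe that reports what a particular build did. This is only a sketch: first, second and describe_merge are names invented here, and the result is whatever your compiler and options produce. I'm not claiming a specific outcome for any flag combination.

```cpp
#include <cassert>
#include <cstring>
#include <string>

const char* first  = "There will be only one";
const char* second = "There will be only one";

// Describes whether this build stored the two literals in one place.
std::string describe_merge() {
  // The characters compare equal regardless of where they live.
  assert(std::strcmp(first, second) == 0);
  return first == second ? "merged" : "separate";
}
```

    Printing describe_merge() from main() and rebuilding with and without -fno-merge-constants, at different -O levels, shows how your toolchain behaves.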

  10. #10
    Just Lurking Dave_Sinkula's Avatar
    Join Date
    Oct 2002
    Posts
    5,005
    C99 adds the following in "Compound literals":
    String literals, and compound literals with const-qualified types, need not designate distinct objects.

    [footnote]This allows implementations to share storage for string literals and constant compound literals with the same or overlapping representations.
    This specification is not present in C90.
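
    The "overlapping representations" part can be probed too. C++ likewise leaves it unspecified whether string literals are stored in non-overlapping objects. A rough sketch, with invented names points_into, whole, tail and tail_shares_storage: it checks whether the literal "world" happens to live inside the storage of "hello world". std::less_equal is used because the built-in relational operators are not guaranteed to give a meaningful order for unrelated pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>

// True when `inner` points somewhere in [outer, outer + outer_len].
bool points_into(const char* inner, const char* outer, std::size_t outer_len) {
  std::less_equal<const char*> le;  // well-defined total order on pointers
  return le(outer, inner) && le(inner, outer + outer_len);
}

const char* whole = "hello world";
const char* tail  = "world";

// Whether this build placed "world" inside the storage of "hello world".
bool tail_shares_storage() {
  // The text matches either way; only the storage is in question.
  assert(std::strcmp(whole + 6, tail) == 0);
  return points_into(tail, whole, std::strlen(whole));
}
```

    Either answer from tail_shares_storage() is allowed, so it is something to observe rather than rely on.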
    7. It is easier to write an incorrect program than understand a correct one.
    40. There are two ways to write error-free programs; only the third one works.*

  11. #11
    Registered User
    Join Date
    Oct 2001
    Posts
    2,934
    >Both str1 and str2 point to the same memory location
    But the pointers themselves occupy different memory locations; they just both point to the same string in read-only memory.

    >- I couldn't reproduce this behavior either on the heap or stack. Just on the global space.
    Interesting. When I moved the variables to the stack, they still pointed to the same memory location.
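
    For reference, a sketch of the local-variable version of the test. LocalResult and inspect_local_pointers are hypothetical names. The two pointer objects live on the stack and are always distinct, but what they point at is still a string literal, so they may or may not hold the same address:

```cpp
#include <cassert>

struct LocalResult {
  bool same_target;          // both pointers hold the same address?
  bool same_pointer_object;  // are the pointers themselves one object?
};

LocalResult inspect_local_pointers() {
  const char* str1 = "There will be only one";
  const char* str2 = "There will be only one";
  LocalResult r;
  r.same_target = (str1 == str2);             // unspecified, build-dependent
  r.same_pointer_object = (&str1 == &str2);   // two locals, never the same
  assert(!r.same_pointer_object);
  return r;
}
```

    If the declarations were local arrays (const char str1[] = "...") instead of pointers, each would be a separate object with its own copy, which would give a different result from the pointer version.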

  12. #12
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by Mario F. View Post
    <snip>
    >>However, for string literals, it's quite common to do things like this:
    Code:
    char *strings[] = { "Unknown", "Unknown", "Unknown", "Unknwn" };
    It won't optimize. For optimization to take place it must be a c-string or if you define it so, a const null-terminated array of char. Further encapsulating the c-string in another array doesn't seem to optimize.
    Sure it will - I just wrote something similar as a test:
    Code:
    char *list[] = { "Abcdef", "Abcdef", "Abcdef", "Abcdef" };
    The assembler listing from VC7 shows this:
    Code:
    CONST	SEGMENT
    ??_C@_06NCLOFIDO@Abcdef?$AA@ DB 'Abcdef', 00H		; `string'
    CONST	ENDS
    _DATA	SEGMENT
    _list	DD	FLAT:??_C@_06NCLOFIDO@Abcdef?$AA@
    	DD	FLAT:??_C@_06NCLOFIDO@Abcdef?$AA@
    	DD	FLAT:??_C@_06NCLOFIDO@Abcdef?$AA@
    	DD	FLAT:??_C@_06NCLOFIDO@Abcdef?$AA@
    _DATA	ENDS
    So we have ONE copy of "Abcdef" and four references to it in the variable "list".

    Other compilers may or may not do the same thing!
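
    If you'd rather check this at run time than read the assembler listing, something along these lines should do. It is a sketch only, and distinct_addresses and list_distinct_count are names made up here. It counts how many different addresses the four pointers hold:

```cpp
#include <cassert>
#include <cstddef>
#include <set>

const char* list[] = { "Abcdef", "Abcdef", "Abcdef", "Abcdef" };

// Number of different addresses among the first `count` pointers.
std::size_t distinct_addresses(const char* const* items, std::size_t count) {
  std::set<const void*> seen(items, items + count);
  return seen.size();
}

// 1 means every entry shares one copy; 4 means nothing was merged.
std::size_t list_distinct_count() {
  std::size_t n = distinct_addresses(list, sizeof(list) / sizeof(list[0]));
  assert(n >= 1 && n <= 4);
  return n;
}
```

    A result of 1 corresponds to the listing above, where one copy of "Abcdef" is referenced four times.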

    --
    Mats

  13. #13
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    This takes me back to the days of embedded C programming where an array was declared to hold the language strings, in such a way as to take advantage of this optimisation.
    The type of it, iirc, was:
    Code:
    const char far * const far []
    Try saying that 10 times fast...
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  14. #14
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    Quote Originally Posted by Dave_Sinkula View Post
    C99 adds the following in "Compound literals":
    This specification is not present in C90.
    Thanks for that info Dave. If I read that correctly it seems to say it is indeed possible to optimize const char[] or const int[], for instance. The choice is left for the implementation to define.

    Quote Originally Posted by matsp
    Sure it will - I just wrote something similar as a test
    My bad. It doesn't optimize under my implementation. Should have remembered that.

    Quote Originally Posted by swoopy
    Interesting. When I moved the variables to the stack, they still pointed to the same memory location.
    Hmm... not with mingw 3.4.5, was it?
    I just tested again, and neither the heap nor the stack seems to optimize string literals.

  15. #15
    Registered User
    Join Date
    Oct 2001
    Posts
    2,934
    Quote Originally Posted by Mario F. View Post
    Hmm... not with mingw 3.4.5, was it?
    I just tested again, and neither the heap nor the stack seems to optimize string literals.
    Yes, it's mingw, but mingw32-gcc-3.3.1 under Dev-C++. And compiled with -O2, but I doubt that makes a bit of difference. Are your declarations different from those in your original code, other than moving them into main()? If no, then it's baffling. If yes, then I can see it being possible.
