Thread: char* ptr="HELLO"; String Literal:Stack/Heap/Data Segment

  1. #1
    Registered User
    Join Date
    Nov 2006
    Posts
    61

    Is a string literal like "HELLO" created on the stack, the heap, the data segment or the BSS (Block Started by Symbol)?

  2. #2
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
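    The string "HELLO" in your example will (by most conventions) be placed in the CONST_DATA segment (MSVC calls it CONST; ELF toolchains such as GCC use .rodata). That is data, but it's non-writable. The pointer ptr itself is an ordinary variable: it lives on the stack if it is a local, or in the data section if it is a global.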
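
    To illustrate the "static, read-only data" point, here is a minimal C sketch (the function names are made up for this example). Because a string literal has static storage duration, returning a pointer to it from a function is fine; exactly which section it lands in (CONST, .rodata, ...) is up to the toolchain.

```c
#include <string.h>

/* Returns a pointer to a string literal. This is well-defined: the literal
   "HELLO" has static storage duration (typically in a read-only data
   section), so it still exists after the function returns. Only the
   pointer variable ptr is local to the call. */
const char *get_greeting(void)
{
    const char *ptr = "HELLO";
    return ptr;
}

/* Reading the literal is fine; writing to it would be undefined behavior. */
size_t greeting_length(void)
{
    return strlen(get_greeting());
}
```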

    By the way, BSS stands for "Block Storage Section"; it essentially holds all global (and static) variables that aren't explicitly initialized.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  3. #3
    Registered User
    Can you please elaborate on the CONST_DATA segment? Then what is the data segment?

  4. #4
    Registered User
    BTW, BSS stands for "Block Started by Symbol". Please refer to Wikipedia.

  5. #5
    Kernel hacker
    Interesting, because I saw the explanation "Block Storage Section" in written documentation about 15 years ago and have never really looked it up since. Either way, it is a section that doesn't take up any space in the executable file (only its size is recorded), because it is just "memory initialized to zero", so the CRT (C runtime) init code, or the OS loader, will fill it with zeros at start of day.
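
    A small sketch of the zero-fill guarantee (variable and function names are hypothetical): in C, objects with static storage duration and no initializer start out as zero, and typical toolchains put them in BSS. Whether the zeroing is done by the loader or by CRT startup code depends on the system.

```c
#include <stddef.h>

/* No initializers: with typical toolchains these end up in BSS, and the
   C standard guarantees they start out as zero (or NULL for pointers). */
static int counters[1000];
static const char *last_name;
int global_total;

/* Returns 1 if every object above still reads as zero. */
int bss_is_zeroed(void)
{
    for (size_t i = 0; i < sizeof counters / sizeof counters[0]; i++)
        if (counters[i] != 0)
            return 0;
    return last_name == NULL && global_total == 0;
}
```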

    To completely answer your other question, we have to consider "how is an executable file split into portions". Trivially, we can separate code and data: Code is stuff the processor is intended to execute, whilst data is stuff the processor is using during the execution, such as numbers and strings.

    We can then further split the data section into three different parts:
    - Data that is not set to any specific value, and is thus initialized to zero. This is, generally, BSS.
    - Data that is set to a specific value by the compiler (as directed by the programmer). This is generally the "Data" section.
    - Literals, or "constants": const-declared data (such as initialized const arrays of integers), strings pointed to by char pointers, arguments to printf, and other "stuff" that the compiler doesn't expect to be changed. This is often called "const_data", "data_const" or "lit".


    The data_const section is similar to data, but it's read-only, so you can't write to it - assuming of course the OS has any concept of read-only data.

    The regular data segment is writable, and contains initialized data, such as
    Code:
    char str[] = "Hello";
    
    struct {
        int x, y;
    } pointarray[] = { { 1, 1 },  { 1, 2}, {2, 2}, { 2, 1}, {1, 1}};

    In this example (assuming both declarations are at file scope), the data section will contain "Hello" and the numbers of pointarray in the order they fill the structs.

    The following would be const data:
    Code:
    const char *str = "Hello";
    const struct {
       int x, y;
    } pointarray[] = { { 1, 1 },  { 1, 2}, {2, 2}, { 2, 1}, {1, 1}};
    
    printf("Hello, World");
    This is the same data, but it is now considered "const" by the compiler, as it doesn't expect you to change any of it (writing to the string literal is undefined behavior, and pointarray is explicitly declared const). The last line passes a literal as an argument to a function.

    Note that segment and section are essentially identical terms, but I tend to avoid using "segment" for sections in memory, because on x86 a segment is also what a "segment register" refers to. Although segment registers are rarely used these days, it is still possible to confuse the two. An x86 segment register normally points to one of the code, data, const_data, bss or stack sections, but not always. In modern OSes, the segment registers aren't used to point to different sections: they all use the same base address of zero [again, by convention; exceptions and variations do exist, for example FS or GS is commonly used for thread-local data], so they are essentially unused, as all segment registers are equal and cover the whole set of code/data/stack sections as one large lump.

    --
    Mats

  6. #6
    Kernel hacker
    I should point out that, depending on the system, the stack and heap are also some sort of "BSS" sections, or they are generated by the OS "as needed". For example, Linux gives the application more memory in response to a call to sbrk() (or mmap()), which generally happens inside malloc. The stack of a new program is set up by the kernel when the program is loaded by execve(); fork() just gives the child process a copy of its parent's stack.
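
    A minimal sketch of what "asking the OS for more memory" looks like at the lowest level, assuming Linux with glibc: sbrk(0) reports the current program break (the end of the data/heap area), and sbrk(n) moves it up by n bytes. sbrk() is a legacy interface (current POSIX no longer specifies it, and malloc implementations also use mmap()), so treat this as an illustration only; grow_break is a hypothetical helper.

```c
#define _DEFAULT_SOURCE      /* exposes sbrk() in glibc's <unistd.h> */
#include <stdint.h>
#include <unistd.h>

/* Moves the program break up by 'increment' bytes and returns how far it
   actually moved, or -1 on failure. Assumes Linux/glibc. */
long grow_break(intptr_t increment)
{
    char *before = sbrk(0);              /* current end of the heap area */
    if (sbrk(increment) == (void *)-1)
        return -1;
    char *after = sbrk(0);
    return (long)(after - before);
}
```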

    Similarly, Windows has a HeapCreate() function, which creates a lump of memory from which the C runtime's malloc can get memory (via HeapAlloc) when it needs to. Again, the stack is created as part of the CreateProcess() function (or CreateThread() for additional threads), and doesn't come out of the heap at all.

    --
    Mats

  7. #7
    Registered User
    Thanks. Can anyone provide more information on this?

  8. #8
    Kernel hacker
    Quote, originally posted by forumuser:
    Thanks. Can anyone provide more information on this?
    What more information would you like - if you ask a specific question, I'll answer it, but your question is very unspecific, so I could write 500 lines of information on the subject, and still miss the ten lines that you actually want.

    --
    Mats

  9. #9
    Registered User

    Fundu Memory Question

    When a variable is created, memory is allocated on the stack, the heap or the DS (data segment). E.g. for int i=10; memory is allocated on the stack. For int* p=malloc(4); *p=10; memory is allocated on the heap. A string literal is created in the data segment... I want to know how MEMORY is divided into stack/heap/DS etc.?

  10. #10
    Kernel hacker

    Ok, so the memory [1] itself isn't really divided into sections - the memory is just memory.

    The compiler will generate code, and if a function has local variables, it will make space for those by changing the stack-pointer (subtracting from the current stack-pointer in conventional stack systems).

    For example:

    Code:
    int main() {
    	int x, y;
    	scanf("%d %d", &x, &y);
    	printf("%d %d\n", x, y);
    	return 0;
    }
    generates the following (using the MS VC++ compiler):
    Code:
    ; String literals in "CONST" data segment.
    CONST	SEGMENT
    string1 DB '%d %d', 00H	
    string2 DB '%d %d', 0aH, 00H 
    CONST ENDS
    ; Code segment
    _TEXT	SEGMENT
    _y$ = -8
    _x$ = -4
    _main	PROC NEAR					; COMDAT
    	sub	esp, 8
    	lea	eax, DWORD PTR _y$[esp+8]
    	push	eax
    	lea	ecx, DWORD PTR _x$[esp+12]
    	push	ecx
    	push	OFFSET FLAT:string1
    	call	_scanf
    	mov	edx, DWORD PTR _y$[esp+20]
    	mov	eax, DWORD PTR _x$[esp+20]
    	push	edx
    	push	eax
    	push	OFFSET FLAT:string2
    	call	_printf
    	xor	eax, eax
    	add	esp, 32					; 00000020H
    	ret	0
    _main	ENDP
    _TEXT	ENDS
    The comment lines are mine. The sub esp, 8 instruction is the stack adjustment that makes space for the x and y local variables.

    Note that this is "optimized" code, so the stack re-adjustment is all combined into the single add esp, 32 at the end, rather than restoring the stack space used after each call. The non-optimized code was a bit too messy to show what's going on.
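
    To see the "each call gets its own stack space" idea from C rather than assembler, here is a sketch (frames_distinct and MAX_DEPTH are hypothetical names). Each recursive call records the address of its own local variable; at the deepest call all frames are still alive, so the addresses can be compared. Which way the stack grows is implementation-specific, so the direction is only printed, not relied on.

```c
#include <stdint.h>
#include <stdio.h>

#define MAX_DEPTH 16

/* Recurses 'depth' times; call number n stores the address of its own local
   in addrs[n]. At the deepest call every frame is still live, so all the
   stored pointers are valid and can be compared. Returns 1 if each call
   got separate storage for its local. */
static int frames_distinct(int depth, int *addrs[], int n)
{
    int local = depth;               /* lives in this call's stack frame */
    int ok = 1;

    addrs[n] = &local;
    if (depth > 1) {
        ok = frames_distinct(depth - 1, addrs, n + 1);
    } else {
        for (int i = 0; i <= n; i++)
            for (int j = i + 1; j <= n; j++)
                if (addrs[i] == addrs[j])
                    ok = 0;
        /* Implementation-specific: conventional stacks put deeper frames lower. */
        if (n > 0)
            printf("stack grows %s\n",
                   (uintptr_t)addrs[n] < (uintptr_t)addrs[0] ? "down" : "up");
    }
    return ok && local == depth;     /* reading 'local' keeps this frame alive */
}
```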

    When we speak of the heap in a modern system, it is something the OS assigns to the application at the application's request. Somewhere in the OS, there is a function that tracks which bits of memory belong to which process, and which memory areas are free. When a process asks for "more heap" (starting with "no heap"), the application is given an area of "free memory".

    Malloc and C++'s new are then responsible for splitting that "chunk" of memory into smaller blocks and tracking those blocks.

    Something like this is how malloc/free works:
    Code:
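    #include <stddef.h>
    
    #define LargeBlockSize (64 * 1024)   /* how much to ask the OS for at a time */
    
    struct memblock {
       size_t size;
       void *block;
       struct memblock *next;
    };
    
    struct memblock *freelist;   /* blocks available for allocation */
    struct memblock *inuse;      /* blocks currently handed out */
    
    /* Helpers (not written out here). */
    struct memblock *askOSForNewBlock(size_t size);
    void addToFreeList(struct memblock *b);
    void removeFromFreeList(struct memblock *b);
    void splitBlock(struct memblock *b, size_t size);
    void addIntoInUse(struct memblock *b);
    struct memblock *removeFromInUse(void *ptr);
    
    void *mymalloc(size_t size) {
       struct memblock *p;
       for (;;) {
          /* first fit: find the first free block that is big enough */
          for (p = freelist; p && p->size < size; p = p->next)
             ;
          if (!p) {
             /* nothing suitable: get a large chunk from the OS and try again */
             addToFreeList(askOSForNewBlock(size > LargeBlockSize ? size : LargeBlockSize));
          } else {
             splitBlock(p, size);   // insert any "spare" part of block into free list.
             removeFromFreeList(p);
             addIntoInUse(p);
             return p->block;
          }
       }
    }
    
    void myfree(void *ptr)
    {
       struct memblock *b = removeFromInUse(ptr);   /* find the descriptor for ptr */
       addToFreeList(b);
    }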
    I haven't written out the helper functions that add/remove blocks from the lists and split a block into two smaller blocks, but essentially this shows how it works.
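
    For something that actually compiles and runs, here is a self-contained first-fit allocator sketch. It makes a few hypothetical simplifications compared with the outline above: a fixed static array stands in for memory obtained from the OS, the block header sits directly in front of the returned pointer (instead of a separate descriptor holding a block pointer), and freed blocks are not merged with their neighbours.

```c
#include <stddef.h>

/* Header kept in front of every block; 'next' is only meaningful while free. */
struct block {
    size_t size;                 /* usable bytes after the header */
    struct block *next;          /* next block on the free list */
};

/* Hypothetical stand-in for a chunk obtained from the OS (sbrk/HeapCreate). */
#define ARENA_UNITS 256
static struct block arena[ARENA_UNITS];
static struct block *freelist;
static int initialized;

/* Sizes are counted in header-sized units so every header stays aligned. */
static size_t units_for(size_t bytes)
{
    return (bytes + sizeof(struct block) - 1) / sizeof(struct block);
}

void *mymalloc(size_t size)
{
    if (!initialized) {          /* first call: the whole arena is one free block */
        arena[0].size = (ARENA_UNITS - 1) * sizeof(struct block);
        arena[0].next = NULL;
        freelist = &arena[0];
        initialized = 1;
    }
    size_t need = units_for(size ? size : 1) * sizeof(struct block);

    /* First fit: take the first free block that is large enough. */
    for (struct block **link = &freelist; *link; link = &(*link)->next) {
        struct block *p = *link;
        if (p->size < need)
            continue;
        if (p->size >= need + 2 * sizeof(struct block)) {
            /* Split: the spare tail becomes a new, smaller free block. */
            struct block *rest = p + 1 + need / sizeof(struct block);
            rest->size = p->size - need - sizeof(struct block);
            rest->next = p->next;
            p->size = need;
            *link = rest;
        } else {
            *link = p->next;     /* close fit: hand out the whole block */
        }
        return p + 1;            /* payload starts right after the header */
    }
    return NULL;                 /* a real malloc would ask the OS for more here */
}

void myfree(void *ptr)
{
    if (!ptr)
        return;
    struct block *b = (struct block *)ptr - 1;   /* step back to the header */
    b->next = freelist;          /* push onto the free list; no coalescing */
    freelist = b;
}
```

    With this sketch, freeing a block and then requesting the same size again returns the same address, because the freed block is pushed onto the front of the free list and first fit finds it before anything else.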

    Stack is again "just another randomly chosen" location in memory that the OS doles out as part of creating a process or thread. Technically, any part of memory can be used as stack, as long as it's not "too close to zero" (because the stack is not allowed to wrap around from zero to a high memory address, and the stack grows towards zero, as you can see from the example code above).

    If you have a basic system, or look at the startup-code in an OS, you may find something like this:
    Code:
    _TEXT  SEGMENT
    ___start:
        mov    esp, offset stack
        ....
        ....
    _TEXT ENDS
    
    DATA SEGMENT
    stacktop:
             DB      stacksize DUP (?)   ; reserve stacksize bytes for the stack
    stack:
    DATA ENDS
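    This sets the bottom of the stack (where ESP starts) to a suitable memory location, with stacksize bytes until the top of the stack at stacktop. Since the stack grows downwards, stacktop is the lowest address the stack may reach.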
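
    The same "reserve an area and start the stack pointer at its high end" idea can be sketched in C as a software stack (push, pop, stack_area, sp and STACK_SIZE are hypothetical names; this does not touch the real ESP). Values are pushed towards lower indices, just like the hardware stack grows towards zero.

```c
#include <stddef.h>

#define STACK_SIZE 64

/* The reserved area: like the bytes between stacktop and stack above. */
static int stack_area[STACK_SIZE];
/* Stack pointer starts one past the highest element, like "mov esp, offset stack". */
static int *sp = stack_area + STACK_SIZE;

/* Push: decrement first, then store (as x86 PUSH does). Returns 0 on overflow. */
int push(int value)
{
    if (sp == stack_area)            /* reached the top of the stack: no room */
        return 0;
    *--sp = value;
    return 1;
}

/* Pop: load, then increment. Returns 0 if the stack is empty. */
int pop(int *out)
{
    if (sp == stack_area + STACK_SIZE)
        return 0;
    *out = *sp++;
    return 1;
}
```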

    I hope this helps - if not, please feel free to ask more specific questions.

    [1] This is the case for "generic computers". Embedded systems often have code in ROM or Flash memory, which means that they can execute directly from that memory, without having to load the application and OS into RAM first.

    --
    Mats
