Thread: Pointer Arithmetic confusion

  1. #1
    Old Fashioned
    Join Date
    Nov 2016
    Posts
    137

    Pointer Arithmetic confusion

    I have the following program:

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    
    
    
    
    struct myStruct {
        char myData[50];
    };
    
    
    void print_addr(void *addr, char *msg);
    
    
    int main(void)
    {
        char *test_char = malloc(500);
        char *test_char2 = malloc(5);
        struct myStruct *test_data = malloc(sizeof(struct myStruct));
        struct myStruct *test_data2 = malloc(sizeof(struct myStruct));
    
    
        struct myStruct *a = test_data + 0;
        struct myStruct *b = test_data + 25;
        struct myStruct *c = test_data2 + 5;
    
    
        print_addr(b, "b");
        print_addr(c, "c");
        print_addr(test_char, "test_char");
        print_addr(test_char2, "test_char2");
    
    
        unsigned long int result = test_char2 - test_char;
        printf("test_char2 - test_char: %lu\n", result);
    
    
        print_addr(test_data, "test_data addr");
        print_addr(test_data2, "test_data2 addr");
    
    
        unsigned long int difference = b - a;
        unsigned long int difference_2 = b - c;
        unsigned long int difference_3 = c - b;
    
    
        printf("Difference: %lu\n", difference);
        printf("Difference 2 (b - c): %lu\n", difference_2);
        printf("Difference 2 (c - b): %lu\n", difference_3);
        
        free(test_char);
        free(test_char2);
        free(test_data);
        free(test_data2);
        return EXIT_SUCCESS;
    }
    
    
    void print_addr(void *addr, char *msg)
    {
        if(NULL == msg)
        {
            return;
        }
        printf("%s : %p\n", msg, addr);
        return;
    }
    This is obviously a test program. I'm trying to understand pointer arithmetic with subtraction. It produces the following output:

    :undefined_behavior user$ ./ptr_arith_ub
    b : 0x7fa591403182
    c : 0x7fa591402dda
    test_char : 0x7fa591402d90
    test_char2 : 0x7fa591402c90
    test_char2 - test_char: 18446744073709551360
    test_data addr : 0x7fa591402ca0
    test_data2 addr : 0x7fa591402ce0
    Difference: 25
    Difference 2 (b - c): 1475739525896764148
    Difference 2 (c - b): 16971004547812787468


    You can see that the math results don't make sense given the numbers. Why is b - c 1475739525896764148? 0x7fa591403182 - 0x7fa591402dda clearly should not be that. Thanks.
    If I was homeless and jobless, I would take my laptop to a wifi source and write C for fun all day. It's the same thing I enjoy now!

  2. #2
    Registered User
    Join Date
    Feb 2019
    Posts
    1,078
    There is not much sense in adding or subtracting two pointers. Notice that when you do:
    Code:
    struct myStruct *b = test_data + 25;
    One operand is a pointer and the other is an offset. Based on the pointer type, this 25 is multiplied by sizeof *test_data, and the result is the actual address stored in b.
    Another problem is using unsigned long int. You should use size_t or ptrdiff_t.

    The safest way to calculate the difference between the addresses inside pointers is, for example:
    Code:
    size_t diff = (size_t)b - (size_t)a;
    Doing:
    Code:
    size_t diff = b - a;
    Constitutes "undefined behavior" (which one will be the pointer and which one will be the offset?).
    Last edited by flp1969; 08-08-2019 at 07:40 PM.

  3. #3
    Registered User
    Join Date
    Feb 2019
    Posts
    1,078
    Anyway... there is no guarantee you'll get a greater address when allocating a second block, like in:
    Code:
    char *a = malloc(512);
    char *b = malloc(10);
    a could be greater than b or vice versa, depending on the malloc algorithm. What is guaranteed is that the pointer to the allocated block will point to a contiguous sequence of bytes...

  4. #4
    Registered User
    Join Date
    May 2019
    Posts
    214
    Code:
    size_t diff = b - a;
    Constitutes "undefined behavior" (which one will be the pointer and which one will be the offset?).
    I must disagree with this observation.

    a and b were assigned

    Code:
    struct myStruct *a = test_data + 0;
    struct myStruct *b = test_data + 25;
    Since they are based off of test_data, they are related. The difference is 25, as expected. This is, however, a distance in myStruct occurrences, not in bytes.

    Of course, like any meaningful subtraction the order matters, because a - b results in -25, which when stored in an unsigned long int, as the code presents, won't make much sense.

    Now this:

    Code:
    size_t difference_2 = b - c;
    Makes no sense, because c was assigned

    Code:
    struct myStruct * c = test_data2 + 5;
    Since test_data and test_data2 come from different allocations, they are not related and there's no real meaning to the subtraction.

    That was @flp1969's point about the two malloc examples in the post above this one.

    The same applies to c - b, of course.
    Last edited by Niccolo; 08-08-2019 at 08:14 PM.

  5. #5
    Old Fashioned
    Join Date
    Nov 2016
    Posts
    137
    Ah, I blew it: I forgot that pointer arithmetic is always scaled by the size of the pointed-to type. Meaning, + 25 isn't 25 bytes in, but rather 25*sizeof(struct myStruct) bytes, etc...

    I have a related question on some observations I had when experimenting:

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    
    
    void print_addr(char *msg, void *addr);
    
    
    int main(void)
    {
        char *cp =  malloc(100*sizeof(char));
        int *ip = malloc(100*sizeof(int));
        unsigned long int *uli = malloc(100*sizeof(unsigned long int));
        char *cp_short = malloc(1*sizeof(char));
        char *cp_short2 = malloc(1*sizeof(char));
        print_addr("cp: ", cp);
        print_addr("ip: ", ip);
        print_addr("uli: ", uli);
        print_addr("cp_short: ", cp_short);
        print_addr("cp_short2: ", cp_short2);
    
    
        free(cp);
        free(ip);
        free(uli);
        free(cp_short);
        free(cp_short2);
        return EXIT_SUCCESS;
    
    
    }
    
    
    void print_addr(char *msg, void *addr)
    {
        printf("%s%p\n", msg, addr);
        return;
    }
    That produces:

    cp: 0x7fd399c02c90
    ip: 0x7fd399c02d90
    uli: 0x7fd399c02f20
    cp_short: 0x7fd399c02d00
    cp_short2: 0x7fd399c02d10

    Is there any explanation for the assigned addresses? I noticed that it starts at 02c90 for the first char ptr, then goes to 02d90 for the int ptr which means that the int ptr addr was 256 away from the first char ptr.

    Then suddenly, it goes up to 02f20 for the unsigned long int ptr. So sometimes we have a difference of 256 between addresses, other times a difference of 400, and finally a difference of only 16 between cp_short and cp_short2. Is this just because the allocator can pick whatever free memory it wants? Do these space differences have anything to do with the pointer type? As I understand, all pointers are the same length on a given architecture, which is usually the word length of the architecture.

  6. #6
    Registered User
    Join Date
    May 2019
    Posts
    214
    One could oversimplify the answer by saying malloc can return just about anything, and is unpredictable.

    While true, there are some answers you may find of interest.

    One thing malloc might attempt (it depends on the implementation, on whether you're running a debug or release build, and on whether a library providing an alternative allocation scheme is in use) is to use "holes" in the heap. The startup code (some find it surprising that code is executed before "main" is called) may allocate and deallocate memory early, again in ways that differ between "debug" and "release" builds. This might leave "holes" in the heap. Some of your allocations might fit into one or more of those "holes", and malloc might use one. This could cause returns from malloc to seem to jump all over the place.

    Complicating this is the fact that physical memory isn't as consecutive as it appears to an application, and malloc "knows" this. In cooperation with the operating system, memory is mapped from physical addresses into process addresses, which can make process addresses (the ones you see) seem to be consecutive and contiguous where the physical RAM backing that up may not be.

    This mapping happens on "page" boundaries, a concept imposed by the operating system, and malloc may attempt to align various blocks of allocations along page boundaries under certain circumstances for performance or space concerns.

    Ultimately, what you see from the view of the application (running in process address space) is the potential memory address space: a range of addresses, up to the limit the process is allowed to consume, which appears to begin at some arbitrary starting point (again, not physically relevant) and to continue to the end of the processor's ability to address memory. As a result, you can obtain addresses which can't possibly represent physical addresses in the RAM installed on the computer. The data is nonetheless stored in that physical RAM; the operating system maps it so that these appear to be real addresses, but they are, effectively, virtual addresses.

    The runtime library may cooperate with the operating system to produce locations you can use, but bear little to no resemblance to an expected "order of things".

    To that end, you may find it interesting that 0 is a valid memory location, just not to processes in any modern operating system. I don't recall exactly how UEFI handles this, but the older BIOS-based PCs used locations in memory starting at zero as an array of pointers to functions, which was actually a response table for machine interrupts. When an interrupt occurs, it may be that some hardware has literally signaled the CPU on the interrupt input pin (tapping it on the shoulder), which is a message that some hardware event (or some other important event) has happened and requires immediate attention. This could be a keystroke, some data from storage requested earlier, some mouse input, or some network data arriving. Whatever the interrupt, it has a numeric value. That value corresponded, in BIOS machines, to an entry in the interrupt table of functions, so location zero held a valid pointer to a function.

    The point being that the processor itself (usually) does not consider 0 to be an invalid memory location. The operating system presents that as a convention to the processes it launches (such a location is, by definition, reserved for the hardware and can't have a reasonable purpose to the application).

    Is this just because the allocator can pick whatever free memory it wants? Do these space differences have anything to do with the pointer type? As I understand, all pointers are the same length on a given architecture, which is usually the word length of the architecture.
    "malloc" is free to choose whatever it calculates as valid memory, requested from the operating system, but the space differences have no relevance to pointer type. Notice that malloc takes no information which would give it knowledge of the pointer's type.

    That said, there are "types" to pointers you may not yet be aware of. Pointers are not all the same size.

    Pointers to data, like those you're experimenting with, are as you expect. They are sized according to some convention for the processor. In the 16-bit modes of the x86 processor (which we don't use anymore), pointers had two forms, short and long (usually called near and far). You may have noticed that Windows documentation refers to pointers as "LP....", which is a relic from that era of 16-bit Windows and DOS on the original 8086, the 80286, and the 16-bit modes of subsequent processors in the line. The long pointer was 32 bits in size. The short pointer was 16 bits, but as may immediately occur to you, the DOS and early Windows machines (as a result of the 8086 support) could address up to 1 MByte of memory, and 16 bits isn't enough to address 1 MByte on a machine that is otherwise known as a 16-bit CPU.

    This is because the memory model for that CPU was segmented. The short pointer was limited to 64K, and was a performance option for "short memory" jumps in code and references into arrays. However, most pointers had to be represented in a larger model, supporting up to 1 MByte. For that, the CPU used two registers, one of them called a "segment" register. The segment portion of the long pointer selected a starting address in 16-byte steps, and the other portion, called the "offset", gave the distance from that segment's beginning to the actual position. As a result, it was possible for pointers with different bits to point to the same position in memory. It was very confusing. Such pointers had to be "normalized" in order to compare them.

    Beyond that, for all modes of programming to the present epoch, pointers to some "things" are not the same size as "normal" pointers. Pointers to functions are an example. Pointers to functions are implementation defined, such that even when targeting the exact same operating system and mode (say, both 64 bit Linux targets), one compiler may have a different size for function pointers than for "normal" pointers to data.

    In some compilers, they happen to be the same size, using some "clever" tricks, but on other compilers they can be much, much larger than expected.

    With respect to malloc, however, it's all simple integer math.
    Last edited by Niccolo; 08-09-2019 at 02:27 PM.

  7. #7
    Old Fashioned
    Join Date
    Nov 2016
    Posts
    137
    Niccolo, I appreciate that well-explained response. Luckily for the both of us (you, so I don't keep asking more and more questions), I'm quite familiar with many of those concepts because I've written 16-bit assembly programs before (I used emu8086 with fasm), and several times I even forgot to load my segment register (stack segment SS) with the proper value, so all of my offsets were pointing to the wrong memory areas and I was trying to load a stack string from the wrong offsets! That was rough to debug! Also, I'm familiar with Windows OS internals and the virtual memory manager, paging, etc...

    So I was able to connect the dots here. However, I was still not super familiar with the "promises" that malloc makes and what a proper malloc was required to provide/the reasoning for doing so. This was very helpful because it filled in those missing gaps in my knowledge.
