Thread: Word boundaries

  1. #1
    Caution: Wet Floor
    Join Date
    May 2006
    Posts
    55

    Word boundaries

    How do you figure out how objects are aligned on a particular implementation?

    Code:
    #include <iostream>
    using namespace std;
    
    int main() {
    
      char* pc = 0;
      int* pi = 0;
      void* pv = 0;
    
      cout << "Address of pc and pc+1: " << &pc << ", " << &pc+1 << '\n';
      cout << "Address of pi and pi+1: " << &pi << ", " << &pi+1 << '\n';
      cout << "Address of pv and pv+1: " << &pv << ", " << &pv+1 << '\n';
    }
    The addresses are different whenever the program is run (not surprising). But how would you determine if, say, the machine always aligns a char* object on an "even" (like 0x0ffff0b4) instead of an "odd" (like 0x0ffff0b5)? Running the program several times gives a convincing, but not definite, answer. How can you tell for sure?

  2. #2
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    If it gives you odd addresses, then it can't be aligning them on word boundaries, can it? But it's also a question of virtual vs. physical addresses, since each virtual address is translated to a physical one. I don't know whether you can tell if memory is aligned to word addresses in physical memory, though.

  3. #3
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by Elysia
    If it gives you odd addresses, then it can't be aligning them on word boundaries, can it? But it's also a question of virtual vs. physical addresses, since each virtual address is translated to a physical one. I don't know whether you can tell if memory is aligned to word addresses in physical memory, though.
    But on all popular machines, a virtual address and its physical address are identical in the lowest bits, so an odd virtual address is an odd physical address. For example, on a 32-bit x86 processor (AMD and Intel) with 4 KB pages, the lowest 12 bits are common to the virtual and the physical address, and only the upper 20 bits are affected by the translation from virt to phys.

    Also, the code posted doesn't show the address of a char, but the address of a pointer to char - this will always be on an even address unless you go out of your way to "undo" the compiler's attempts to make it even.

    Code:
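    #include <iostream>
    using namespace std;
    
    int main() {
    
      char* pc = 0;
      int* pi = 0;
    
      // Cast char* to void*; otherwise cout treats it as a C string.
      cout << "Value of pc and pc+1: " << static_cast<void*>(pc) << ", " << static_cast<void*>(pc+1) << '\n';
      cout << "Value of pi and pi+1: " << pi << ", " << pi+1 << '\n';
      // No pv+1 here: standard C++ does not allow arithmetic on void*.
    }
    This shows the pointer values rather than the addresses of the pointer variables. I expect pc+1 to be 1 and pi+1 to be 4 (with a 4-byte int). Strictly speaking, arithmetic on a null pointer is undefined, but in practice it shows the step size.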

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  4. #4
    Caution: Wet Floor
    Join Date
    May 2006
    Posts
    55
    The only thing I was able to figure out from that program is how good
    the implementation is at memory management. There are no "gaps"
    when char*, int*, and void* are declared in that particular order.

    But here's a case where you can introduce holes:

    Code:
    #include <iostream>
    using namespace std;
    
    int main() {
    
      struct CS {
    
        const wchar_t szOffsetCS[80 + 1]; 
        void* pvCS;
        int* piCS;
        char* pcCS;
      };
    
      CS cs = {
        L"The quick brown fox jumped over the lazy dog!",
        0,
        0,
        0
      };
    
      cout << "Size of `cs' "
           << (  sizeof(cs) == sizeof(cs.szOffsetCS)
                +sizeof(cs.pvCS)
                +sizeof(cs.piCS)
                +sizeof(cs.pcCS) ? "== " : "!= " )
           << "sum of sizes of its contents."
           << '\n';
    
    }
    Output:
    Code:
    Size of `cs' == sum of sizes of its contents.
    The same program with a narrow char array instead:
    Code:
    #include <iostream>
    using namespace std;
    
    int main() {
    
      struct CS {
    
        const char szOffsetCS[80 + 1]; 
        void* pvCS;
        int* piCS;
        char* pcCS;
      };
    
      CS cs = {
        "The quick brown fox jumped over the lazy dog!", // not wide string now
        0,
        0,
        0
      };
    
      cout << "Size of `cs' "
           << (  sizeof(cs) == sizeof(cs.szOffsetCS)
                +sizeof(cs.pvCS)
                +sizeof(cs.piCS)
                +sizeof(cs.pcCS) ? "== " : "!= " )
           << "sum of sizes of its contents."
           << '\n';
    
    }
    Output:
    Code:
    Size of `cs' != sum of sizes of its contents.
    I'm not sure if this tells you anything about word boundaries, though.

  5. #5
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    When you just declare variables on the stack, they're placed one after another. If you allocate them on the heap, their positions will likely differ.
    With structs, each member is aligned to word boundaries (well, that's the default behavior of most compilers anyway), but if you use sizes that are divisible by 2, the compiler won't need to pad.

  6. #6
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    That's not "how good" it is, that's a measure of "how bad it is".

    If the compiler actually stores a pointer on an odd boundary, EVERY TIME you access that pointer, you get a penalty of at least one clock-cycle compared to the normal access time. On some processors, it would cause the application to crash.

    Most people have 256MB or more memory in their machines. With the char version of your structure, you save 3 bytes in a "packed" case, which means that you get 93 bytes instead of 96 bytes. You can fit about 2.89 million structures at 93 bytes, and about 2.80 million at 96 bytes. So about 90,000 more structures. There are probably better ways to store more data in memory than that - for example, dynamically allocate the szOffset data, and not have a fixed size of 80 chars!

    Is this on a Linux box? Because I thought wchar_t on a Windows box was 16 bits, but perhaps I'm just confused.

    --
    Mats

  7. #7
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by Elysia
    When you just declare variables on the stack, they're placed one after another. If you allocate them on the heap, their positions will likely differ.
    With structs, each member is aligned to word boundaries (well, that's the default behavior of most compilers anyway), but if you use sizes that are divisible by 2, the compiler won't need to pad.
    Alignment is NORMALLY done to the component's own size, so a 4-byte integer will be aligned on a 4-byte boundary, an 8-byte double on an 8-byte boundary, and a 2-byte short on a 2-byte boundary.

    --
    Mats

  8. #8
    C++まいる!Cをこわせ!
    Join Date
    Oct 2007
    Location
    Inside my computer
    Posts
    24,654
    Yeah, wchar_t is 2 bytes on Windows (it holds UTF-16 code units).

  9. #9
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Another point on the subject:

    You can almost always avoid "padding" if you start with the biggest data types, e.g. double and long long (on 64-bit machines, long and pointers may be 64-bit too), then 32-bit values such as int (plus long and pointers on 32-bit machines), then smaller types like short, and finally char.

    That way, the alignment of the smaller type will still match up, because the size of the bigger type is a multiple of the size of the smaller.

    --
    Mats
