Thread: Relocation in .obj -files

  1. #1
    Registered User
    Join Date
    Jan 2005
    Posts
    70

    Relocation in .obj -files

    Originally replied by Salem to OS-indepedent stuff?
    (Yes, with that inventive spelling by me)

    All the object file formats I know about contain a lot of relocation information as well. They're not simply "memory dumps" of the assembler code they represent.
    You're right. I thought the reason it didn't work was just that I hadn't arranged for the "ORG 100h" that reserves the first 256 bytes of the in-memory image, which DOS uses to hold the command line (and possibly other things), but examining one with a disassembler proves that I was indeed wrong.

    But why does an .obj that has not been passed through a linker contain relocation info? Mustn't the OS add the current load point to each memory-accessing instruction anyway?

    Or does a relocatable program contain an initial where-am-I-loaded-at OS call, followed by code that adds that variable/register value before each memory-accessing instruction (so this doesn't have to be arranged at load time)?

  2. #2
    Registered User VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,607
    The exe contains the relocation information.

    Look up some information on www.wotsit.org about the exe format and the (not so) new portable exe format.

  3. #3
    Registered User
    Join Date
    Jan 2005
    Posts
    70
    >>The exe contains the relocation information>>

    But Salem said it's present already in the .obj? Unfortunately he's tired of "my ramblings"...

  4. #4
    Yes, my avatar is stolen anonytmouse's Avatar
    Join Date
    Dec 2002
    Posts
    2,544
    Yes, relocations, if needed, are done at process startup. Presumably the obj file contains the locations of each address that would need to be patched.

  5. #5
    Registered User VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,607
    An obj might contain relocation information for the final EXE build, but you can't execute an obj, so the final relocation information, and everything else your program needs to execute correctly, is inside the exe. OBJs are temporary intermediate object files that are eventually linked into DLLs or EXEs. DLLs also contain relocation information, since their file format differs only slightly from that of a true EXE (essentially a flag in the header). They are, however, handled quite differently by the OS than EXEs.

    Your linker takes the information in the obj and places the relocation information gathered from it into the correct location in the exe. Windows then parses the exe header for the relocation information (usually tables), allocates memory for your program (this size is also read from the exe header), sets the initial segment and explicit return register values, loads your code into the allocated space, points CS to the code segment/selector, sets EIP to the first opcode in the program, and then jumps to that location to begin executing the code. The relocation information is used to correctly address your variables, classes, structs, functions, etc. It's all just offsets from the starting address.

    I encourage you to look up the obj file format, the exe file format (MZ), and the portable exe file format (PE).


    But in a pre-emptive multi-tasking OS like Windows a lot more is taking place as well. Task switches happen every few milliseconds of clock time, scheduled according to thread priority and so on. Because the system is pre-emptive, the program has no idea when it will be switched out or lose control of the CPU, so a lot of register and context saving goes on as well.
    Last edited by VirtualAce; 04-06-2005 at 07:31 AM.

  6. #6
    Registered User
    Join Date
    Jan 2005
    Posts
    847
    I don't think Windows PE files make use of relocations, although DLL files may.
    A dll may not always be loaded at the same address because another dll may be loaded at that address but an exe should always be loaded at the same address.
    Consider this disassembly of a Windows exe with an image base of 00400000
    Code:
    00401125  |. 68 30214200    PUSH test2.00422130
    68 is the opcode for PUSH with a 32-bit immediate, and 30214200 is the address 00422130 stored in little-endian byte order. The absolute address is part of the instruction itself.
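
    To make the byte order concrete, here is a small Python sketch that decodes the five bytes from the disassembly above. The variable names are mine, and the RVA calculation simply assumes the image base of 00400000 mentioned in the post.

```python
import struct

# Bytes of the instruction at 00401125 from the disassembly above.
code = bytes.fromhex("6830214200")

opcode = code[0]                                # 0x68 = PUSH imm32
(operand,) = struct.unpack_from("<I", code, 1)  # 4-byte little-endian immediate

image_base = 0x00400000                         # image base quoted in the post
rva = operand - image_base                      # offset of the target inside the image

print(f"opcode={opcode:#04x} operand={operand:#010x} rva={rva:#x}")
```

    If the image were ever loaded somewhere other than its image base, those four operand bytes are exactly what a fixup would have to patch.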

  7. #7
    Registered User VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,607
    A dll may not always be loaded at the same address because another dll may be loaded at that address but an exe should always be loaded at the same address.
    It may not be possible to load an exe at the same address it was built for. The host computer might be set up totally differently, and you have no idea what is in memory. Even though that address was free on the computer where the program was built, that does not mean it is free on the system the program is being run on. Hence the need for relocation information. If we could guarantee that the address would always be free on all computers, there would be no reason for relocation information. However, one cannot guarantee this, and so exe's, dll's, and portable exe's all contain relocation information.

    Otherwise, if they didn't, you would face the possibility that the program would only run under certain memory configurations and would probably eliminate about 60% of the systems it could be run on.

    I cannot see how any exe or dll could lack relocation information and still run on every system it was designed to run on. If there is no relocation information then you are putting all your eggs in one basket, hoping that such and such an address is free. At worst, Windows would have to move the code currently occupying that address to another address, which then brings into play relocation information for that program, and moving a block of code while it is being executed is probably not a great idea.

    PE's
    Figure 20. Fixup Block Format

    To apply a fixup, a delta needs to be calculated. The 32-bit delta
    is the difference between the preferred base, and the base where the
    image is actually loaded. If the image is loaded at its preferred
    base, the delta would be zero, and thus the fixups would not have to
    be applied. Each block must start on a DWORD boundary. The ABSOLUTE
    fixup type can be used to pad a block.

    PAGE RVA = DD Page RVA. The image base plus the page rva is added to
    each offset to create the virtual address of where the fixup needs to
    be applied.

    BLOCK SIZE = DD Number of bytes in the fixup block. This includes the
    PAGE RVA and SIZE fields.

    TYPE/OFFSET is defined as:

    Bits 15-12   Bits 11-0
    TYPE         OFFSET

    Figure 21. Fixup Record Format

    TYPE = 4-bit fixup type. This value has the following definitions:

    o 0h __ABSOLUTE. This is a NOP. The fixup is skipped.

    o 1h __HIGH. Add the high 16-bits of the delta to the 16-bit field
    at Offset. The 16-bit field represents the high value of a 32-
    bit word.

    o 2h __LOW. Add the low 16-bits of the delta to the 16-bit field
    at Offset. The 16-bit field represents the low half value of a
    32-bit word. This fixup will only be emitted for a RISC machine
    when the image Object Align isn't the default of 64K.

    o 3h __HIGHLOW. Apply the 32-bit delta to the 32-bit field at
    Offset.

    o 4h __HIGHADJUST. This fixup requires a full 32-bit value. The
    high 16-bits is located at Offset, and the low 16-bits is
    located in the next Offset array element (this array element is
    included in the SIZE field). The two need to be combined into a
    signed variable. Add the 32-bit delta. Then add 0x8000 and
    store the high 16-bits of the signed variable to the 16-bit
    field at Offset.

    o 5h __MIPSJMPADDR.

    All other values are reserved.
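
    As a rough illustration of the fixup description above, here is a Python sketch that walks one fixup block and applies the delta. It is only a sketch under my own assumptions: the function name is made up, the image is a bytearray indexed directly by RVA (so the image base is not added to the page RVA here), and only the ABSOLUTE and HIGHLOW types are handled. The sample data is invented, reusing the address from post #6.

```python
import struct

ABSOLUTE, HIGHLOW = 0, 3  # fixup types from the list above


def apply_fixup_block(image: bytearray, block: bytes, delta: int) -> None:
    """Apply one fixup block to an image held in a bytearray indexed by RVA.

    `block` is PAGE RVA (DD) + BLOCK SIZE (DD) + 16-bit TYPE/OFFSET entries.
    Hypothetical helper: only ABSOLUTE and HIGHLOW are handled.
    """
    page_rva, block_size = struct.unpack_from("<II", block, 0)
    # BLOCK SIZE includes the 8 header bytes, so entries start at offset 8.
    for pos in range(8, block_size, 2):
        (entry,) = struct.unpack_from("<H", block, pos)
        fixup_type = entry >> 12   # top 4 bits
        offset = entry & 0x0FFF    # low 12 bits
        if fixup_type == ABSOLUTE:
            continue               # padding entry, nothing to patch
        if fixup_type != HIGHLOW:
            raise NotImplementedError(f"fixup type {fixup_type}")
        where = page_rva + offset
        (value,) = struct.unpack_from("<I", image, where)
        struct.pack_into("<I", image, where, (value + delta) & 0xFFFFFFFF)


# Invented sample: the operand of the PUSH from post #6 sits at RVA 0x1126.
image = bytearray(0x2000)
struct.pack_into("<I", image, 0x1126, 0x00422130)

# One block for page 0x1000: a HIGHLOW entry plus an ABSOLUTE pad entry.
block = struct.pack("<IIHH", 0x1000, 12, (HIGHLOW << 12) | 0x126, 0)

# Preferred base 0x00400000, actual base 0x00500000 (both assumed).
delta = 0x00500000 - 0x00400000
apply_fixup_block(image, block, delta)
patched = struct.unpack_from("<I", image, 0x1126)[0]
print(hex(patched))
```

    With a delta of zero (image loaded at its preferred base) the loop would leave every value unchanged, which is why the fixups can be skipped in that case.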
    OBJs
    Segment Attributes Field
    ------------------------

    The Segment Attributes field is a variable-length field;
    its layout is:

    <-3 bits-> <-3 bits-> <-1 bit-> <-1 bit-> <-2 bytes--> <--1 byte-->
        A          C          B         P     Frame Number    Offset
                                              <conditional> <conditional>

    The fields have the following meanings:

    A Alignment

    A 3-bit field that specifies the alignment required when
    this program segment is placed within a logical segment.
    Its values are:

    0 Absolute segment.

    1 Relocatable, byte aligned.

    2 Relocatable, word (2-byte, 16-bit) aligned.

    3 Relocatable, paragraph (16-byte) aligned.

    4 Relocatable, aligned on 256-byte boundary (a "page"
    in the original Intel specification).

    5 Relocatable, aligned on a double word (4-byte)
    boundary. This value is used by the PharLap OMF for
    the same alignment.

    6 This value is used by the PharLap OMF for page (4K)
    alignment. It is not supported by LINK.

    7 Not defined.

    The new values for LINK386 are A=4 and A=5. Double word
    alignment is expected to be useful as 32-bit memory paths
    become more prevalent. Page-align is useful for certain
    hardware-defined items (such as page tables) and error
    avoidance.

    If A=0, the conditional Frame Number and Offset fields
    are present and indicate the starting address of the
    absolute segment. LINK ignores the Offset field.

    Conflict: The original Intel specification included
    additional segment-alignment values not supported by
    Microsoft; alignment 5 now conflicts with the following
    LINK386 extensions:

    5 "unnamed absolute portion of memory address space"

    6 "load-time locatable (LTL), paragraph aligned if not
    part of any group"
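
    For what it's worth, the first byte of that field can be unpacked along these lines. This is a sketch, not LINK's actual logic: the function name is hypothetical, and I am assuming that A sits in the most significant bits, following the left-to-right order of the layout diagram.

```python
# Alignment values as listed in the quoted text.
ALIGNMENT = {
    0: "absolute segment",
    1: "relocatable, byte aligned",
    2: "relocatable, word aligned",
    3: "relocatable, paragraph aligned",
    4: "relocatable, 256-byte aligned",
    5: "relocatable, double-word aligned",
    6: "PharLap page (4K) aligned",
    7: "not defined",
}


def decode_acbp(byte: int) -> dict:
    """Split the A/C/B/P byte into its fields (assumes A is in bits 7-5)."""
    a = (byte >> 5) & 0b111
    return {
        "A": a,
        "C": (byte >> 2) & 0b111,
        "B": (byte >> 1) & 1,
        "P": byte & 1,
        "alignment": ALIGNMENT[a],
        # Frame Number and Offset only follow when the segment is absolute.
        "has_frame_and_offset": a == 0,
    }


fields = decode_acbp(0x68)  # 0b011_010_0_0
print(fields)
```

    The point is that A is what tells the linker whether the segment is relocatable at all, and only A=0 brings the conditional Frame Number and Offset fields with it.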
    DOS EXEs
    .EXE - DOS EXE File Structure

    Offset  Size  Description

    00      word  "MZ" - Link file .EXE signature (Mark Zbikowski?)
    02      word  length of image mod 512
    04      word  size of file in 512 byte pages
    06      word  number of relocation items following header
    08      word  size of header in 16 byte paragraphs, used to locate
                  the beginning of the load module
    0A      word  min # of paragraphs needed to run program
    0C      word  max # of paragraphs the program would like
    0E      word  offset in load module of stack segment (in paras)
    10      word  initial SP value to be loaded
    12      word  negative checksum of pgm used while by EXEC loads pgm
    14      word  program entry point, (initial IP value)
    16      word  offset in load module of the code segment (in paras)
    18      word  offset in .EXE file of first relocation item
    1A      word  overlay number (0 for root program)

    - relocation table and the program load module follow the header
    - relocation entries are 32 bit values representing the offset
    into the load module needing patched
    - once the relocatable item is found, the CS register is added to
    the value found at the calculated offset

    Registers at load time of the EXE file are as follows:

    AX: contains number of characters in command tail, or 0
    BX:CX 32 bit value indicating the load module memory size
    DX zero
    SS:SP set to stack segment if defined else, SS = CS and
    SP=FFFFh or top of memory.
    DS set to segment address of EXE header
    ES set to segment address of EXE header
    CS:IP far address of program entry point, (label on "END"
    statement of program)
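
    To tie the header table and the relocation notes together, here is a Python sketch that reads those fields and patches a load module. The field names and function names are my own, and I am assuming each 32-bit relocation entry is an offset word followed by a segment word, which is the usual MZ layout but is not spelled out in the text above. The value added to each patched word is called load_segment here; the quoted text describes it as the CS register. The tiny "exe" at the bottom is made up purely for the demonstration.

```python
import struct

# Names are my own; order and sizes follow the header table above.
FIELDS = (
    "signature", "bytes_in_last_page", "pages_in_file", "reloc_count",
    "header_paragraphs", "min_alloc", "max_alloc", "initial_ss",
    "initial_sp", "checksum", "initial_ip", "initial_cs",
    "reloc_table_offset", "overlay_number",
)


def parse_mz_header(data: bytes) -> dict:
    """Read the 14 header fields (offsets 00 through 1A)."""
    header = dict(zip(FIELDS, struct.unpack_from("<2s13H", data, 0)))
    if header["signature"] != b"MZ":
        raise ValueError("not an MZ executable")
    return header


def relocation_entries(data: bytes, header: dict) -> list:
    """Return (offset, segment) pairs; the word order is an assumption."""
    start = header["reloc_table_offset"]
    return [struct.unpack_from("<HH", data, start + 4 * i)
            for i in range(header["reloc_count"])]


def apply_relocations(module: bytearray, entries, load_segment: int) -> None:
    """Add the load segment to each 16-bit word named by the table."""
    for offset, segment in entries:
        where = segment * 16 + offset
        (word,) = struct.unpack_from("<H", module, where)
        struct.pack_into("<H", module, where, (word + load_segment) & 0xFFFF)


# Made-up file: 2-paragraph header, one relocation entry, 4-byte load module.
exe = (
    struct.pack("<2s13H", b"MZ", 36, 1, 1, 2, 0, 0xFFFF, 0, 0, 0, 0, 0, 0x1C, 0)
    + struct.pack("<HH", 2, 0)            # patch the word at 0000:0002
    + struct.pack("<HH", 0x0000, 0x0001)  # load module: two words
)

header = parse_mz_header(exe)
entries = relocation_entries(exe, header)
module = bytearray(exe[header["header_paragraphs"] * 16:])
apply_relocations(module, entries, load_segment=0x1000)
patched_word = struct.unpack_from("<H", module, 2)[0]
print(entries, hex(patched_word))
```

    Note that only segment values get patched this way; offsets within a segment stay as the linker wrote them.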


    They all contain or can contain address fixup and/or relocation information. It seems wotsit does not have any information on DLLs. I'm sure MSDN does.
    Last edited by VirtualAce; 04-06-2005 at 02:21 PM.
