Thread: Kernel bug? (attn matsp)

  1. #1
    brewbuck (Officially An Architect)
    Join Date: Mar 2007
    Location: Portland, OR
    Posts: 7,396


    I'm messing around with writing my own dynamic linker for Linux (replacement for ld-linux.so). Yes, I'm a maniac.

    I discovered what I consider a kernel bug, but I'm sure the kernel people would disagree. It's a simple bug really: the kernel honors the load addresses inside the program interpreter.

    Here's what I was seeing. I created a small test program which uses my new dynamic linker as its program interpreter (PT_INTERP). My test program has 8 PHDR entries.

    When I launch the program and breakpoint inside my dynamic linker, I can see the aux vector, and it indeed indicates that there are 8 PHDR entries, and gives a pointer to their location: 0x08048034.

    But when my linker goes to examine the entries at this address, it finds... 3 PHDRs?! Everything after the third looks like garbage.

    Okay, here's what happened. Most Linux programs are linked with a base load address of 0x08048000. Since my dynamic linker itself is not linked in any particularly special way, it too is given a load address of 0x08048000.

    So what does the kernel do? It doesn't whine at all! It happily maps my dynamic linker RIGHT OVER THE TOP OF the program being dynamically linked! STUPID! So what I was seeing was the linker's own PHDRs, not the PHDRs of the program. Thanks for warning me, Mr. Kernel!

    I fixed my problem by deliberately re-basing the dynamic linker using a link script. Now it loads at address 0x00000000, which I didn't actually think was possible but apparently it is. One of the things my linker must do is remap itself higher in memory, since residing on the NULL code page is a big stupid no-no.

    Weird stuff.
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  2. #2
    Kernel hacker
    Join Date: Jul 2007
    Location: Farncombe, Surrey, England
    Posts: 15,677
    I have no idea whether this is expected behaviour or not, but I would say it is a bug.

    You may want to find whoever maintains that part of the kernel and ship off an e-mail to the maintainer, or enter it into Bugzilla; I suppose there is one for the kernel itself.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  3. #3
    Registered User
    Join Date: Oct 2007
    Posts: 32
    Quote Originally Posted by brewbuck View Post
    ...Okay, here's what happened. Most Linux programs are linked with a base load address of 0x08048000. Since my dynamic linker itself is not linked in any particularly special way, it too is given a load address of 0x08048000.

    So what does the kernel do? It doesn't whine at all! It happily maps my dynamic linker RIGHT OVER THE TOP OF the program being dynamically linked!...

    It looks like you built your dynamic linker as a program, not as a shared object.
    When the kernel loads a shared object, it is free to map it at any address, and it is responsible for avoiding conflicts like yours.
    But when the kernel loads an executable, it must load it at exactly the addresses specified in its program headers.

    So, assuming I am right about your dynamic linker being a program rather than a shared object, I think the result is quite natural (while not desirable).

  4. #4
    brewbuck (Officially An Architect)
    Quote Originally Posted by Valery Reznic View Post
    It looks like you built your dynamic linker as a program, not as a shared object.
    When the kernel loads a shared object, it is free to map it at any address, and it is responsible for avoiding conflicts like yours.
    But when the kernel loads an executable, it must load it at exactly the addresses specified in its program headers.

    So, assuming I am right about your dynamic linker being a program rather than a shared object, I think the result is quite natural (while not desirable).
    Yep! I finally realized this. I was also having some trouble with position independence -- using -fPIC doesn't work, because it causes inter-module function calls to go through the PLT. But the PLT is useless, because I'm the dynamic linker and I have nobody to set it up for me.

    So I ended up using -fpie instead, linked with -shared, and now all is well.

    I still think the call to exec() should fail if the program interpreter would be mapped over the top of the ELF image. That's just dumb.

  5. #5
    Registered User
    Quote Originally Posted by brewbuck View Post
    Yep! I finally realized this. I was also having some trouble with position independence -- using -fPIC doesn't work, because it causes inter-module function calls to go through the PLT. But the PLT is useless, because I'm the dynamic linker and I have nobody to set it up for me.

    So I ended up using -fpie instead, linked with -shared, and now all is well.

    I still think the call to exec() should fail if the program interpreter would be mapped over the top of the ELF image. That's just dumb.
    Neither the kernel nor user space ever prevents you from shooting yourself in the foot.
    rm -rf / is surely worse than your case, but the kernel still doesn't prevent it.

    And even if it did, I don't think you would be better off with execve() failing with EINVAL than you are now.

    mmap() with the MAP_FIXED flag set will also happily overwrite already-mapped memory.

    By the way, why do you need your own dynamic linker?
    What were you missing in the standard one?

  6. #6
    brewbuck (Officially An Architect)
    Quote Originally Posted by Valery Reznic View Post
    Neither the kernel nor user space ever prevents you from shooting yourself in the foot.
    rm -rf / is surely worse than your case, but the kernel still doesn't prevent it.
    A person could conceivably want to do "rm -rf /". But it makes no sense to EVER map the interpreter on top of the program image. Without the image, there is nothing for the interpreter to do. It's a scenario that makes no sense.

    And even if it did, I don't think you would be better off with execve() failing with EINVAL than you are now.
    I would have been able to diagnose my problem much faster.

    mmap() with the MAP_FIXED flag set will also happily overwrite already-mapped memory.
    That's a design flaw. You should be required to deliberately unmap the region first before mapping something else there (or at least specify a flag indicating your intention, i.e. MAP_REPLACE). Again, it leaves the door open for massive confusion when things can silently map on top of other things when that is not the intention.

    By the way, why do you need your own dynamic linker?
    What were you missing in the standard one?
    If I can implement a dynamic linker, then I know I understand how ELF dynamic linking works in the minutest detail.

  7. #7
    Registered User
    Quote Originally Posted by brewbuck View Post
    A person could conceivably want to do "rm -rf /". But it makes no sense to EVER map the interpreter on top of the program image. Without the image, there is nothing for the interpreter to do. It's a scenario that makes no sense.
    I still can't see how "rm -rf /" makes sense.
    But the point is that the kernel usually doesn't hold your hand. It provides services and leaves it to user space to make sense of them.

    And I think usually it's a good thing.


    That's a design flaw. You should be required to deliberately unmap the region first before mapping something else there (or at least specify a flag indicating your intention, i.e. MAP_REPLACE). Again, it leaves the door open for massive confusion when things can silently map on top of other things when that is not the intention.
    Agreed. But I think this behavior is required by POSIX, so there's no hope of that changing :(



    If I can implement a dynamic linker, then I know I understand how ELF dynamic linking works in the minutest detail.
    I want to do it someday too. But that day is still a long way off :)

  8. #8
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by Valery Reznic View Post
    I still can't see how "rm -rf /" makes sense.
    One realistic scenario would be a program which is launched inside a chroot environment, which wipes its local root when done doing whatever it's doing...

    Agreed. But I think this behavior is required by POSIX, so there's no hope of that changing
    Yeah... Standards are good and bad. Linux could maintain compatibility by using a MAP_NOREPLACE flag, since not using this flag would not change the behavior. Of course, users would have to remember to use it. But simply the existence of such a flag would entail some language in the documentation, which might have clued me in to my mistake a lot sooner...
