Thread: Enforcing Machine Code Restrictions?

  1. #1
    Registered /usr
    Join Date
    Aug 2001
    Location
    Newport, South Wales, UK
    Posts
    1,273

    Enforcing Machine Code Restrictions?

    Hello,

    I'm currently looking at creating a sandboxed environment for running code, but I don't necessarily wanna go down the whole VM route. Cross-platform usage is not a concern at this point. What I want to do is create a situation where you can compile the usual (in this case, x86) code and run it via a sort of loader which completely separates what you do from the operating system.

    "You've described a VM you ignorant (insert bizarre foreign colloquialism here)!", you've no doubt said to yourself. Ah, but I want to be cleverererer than that!

    If we look at the Win32 environment specifically, let's say that I have a loader that just reads your code from a file, shoves it into an executable page in memory and starts a thread at its entry point. Your code isn't in PE format, so you can't statically link to any Win32 functions.

    So you're running in my loader's address space... but there's a problem. Although you can't import any Win32 functions, because your code is just x86 you can attempt to look at a location in the address space that you think contains KERNEL32.DLL (because every Windows program has this in its address space). From there, you can walk the export table and effectively link to the functions you need to break out of this dump and do some REAL stuff...
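    For anyone who hasn't seen it, "walking the export table" comes down to a binary search over the lexically sorted name table followed by two array lookups, once the export directory has been found. This is a minimal sketch that models only those three parallel arrays as in-memory vectors; ExportDirectory and find_export are hypothetical names, and it does not locate KERNEL32.DLL or parse a real PE image.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Simplified, hypothetical model of the three arrays a PE export directory
// points to. In a real image these are RVAs inside the mapped module; here
// they are plain vectors so the lookup logic can be shown on its own.
struct ExportDirectory {
    std::vector<std::string> names;        // export name pointer table, sorted lexically
    std::vector<std::uint16_t> ordinals;   // name ordinal table, parallel to `names`
    std::vector<std::uint32_t> functions;  // export address table (function RVAs)
};

// Resolve an export by name the way a hand-rolled GetProcAddress would:
// binary-search the sorted names, map the hit through the ordinal table,
// then index the address table with that ordinal.
std::optional<std::uint32_t> find_export(const ExportDirectory& dir,
                                         const std::string& name) {
    auto it = std::lower_bound(dir.names.begin(), dir.names.end(), name);
    if (it == dir.names.end() || *it != name) return std::nullopt;
    std::size_t slot = static_cast<std::size_t>(it - dir.names.begin());
    if (slot >= dir.ordinals.size()) return std::nullopt;
    // The name ordinal is an index into `functions` (not biased by the ordinal base).
    std::uint16_t index = dir.ordinals[slot];
    if (index >= dir.functions.size()) return std::nullopt;
    return dir.functions[index];
}
```

    For example, with placeholder names {"Alpha", "Beta", "Gamma"}, ordinals {2, 0, 1} and RVAs {0x1000, 0x2000, 0x3000}, looking up "Alpha" yields 0x3000. The binary search is possible because the PE format keeps the name pointer table sorted for exactly this purpose.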

    Now, Address Space Layout Randomization (which was introduced in Vista) mitigates this to some extent, but what I would like to do is pre-empt code from getting this far.

    The best way I can think of doing this is by scanning the code prior to execution for particular instructions (like CALL for example) and analyzing the context to see if badness is being attempted. This would work swimmingly for an immediate operand, but would be harder to do with registers or indirect memory locations without effectively interpreting the code (which I want to avoid!).
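    To see where the easy case ends, here is a rough sketch of the immediate-operand check: find each CALL rel32 (opcode 0xE8 plus a little-endian 32-bit displacement measured from the end of the 5-byte instruction), work out its destination and test it against the bounds of the loaded code. The names are hypothetical and the scan is deliberately naive: it treats every 0xE8 byte as an opcode, so without a proper disassembler it will also flag 0xE8 bytes that are really operands or data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One direct CALL found by the scan.
struct DirectCall {
    std::size_t offset;    // offset of the 0xE8 byte in the image
    std::int64_t target;   // destination, relative to the image start
    bool in_bounds;        // does the destination stay inside the image?
};

// Naive scan for CALL rel32. Every 0xE8 byte is assumed to start an
// instruction, which is exactly the guess a real scanner cannot afford.
std::vector<DirectCall> scan_direct_calls(const std::vector<std::uint8_t>& code) {
    std::vector<DirectCall> found;
    for (std::size_t i = 0; i + 5 <= code.size(); ++i) {
        if (code[i] != 0xE8) continue;
        // Little-endian displacement, relative to the end of the instruction.
        std::uint32_t raw = static_cast<std::uint32_t>(code[i + 1]) |
                            static_cast<std::uint32_t>(code[i + 2]) << 8 |
                            static_cast<std::uint32_t>(code[i + 3]) << 16 |
                            static_cast<std::uint32_t>(code[i + 4]) << 24;
        std::int32_t rel = static_cast<std::int32_t>(raw);
        std::int64_t target = static_cast<std::int64_t>(i) + 5 + rel;
        bool ok = target >= 0 && target < static_cast<std::int64_t>(code.size());
        found.push_back({i, target, ok});
    }
    return found;
}
```

    Register-based and memory-indirect forms (FF /2 for CALL, FF /4 for JMP) are the cases this cannot resolve statically, because their destination only exists at run time.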

    One way I could perhaps alleviate this is to enforce (via software) separation of code and data segments, so that no instruction can read or write memory in the code segment. I suppose I could extend this to "locking down" the segment registers so that if instructions like "POP CS" are detected then the program can be stopped before execution. The only thing is, is there any difference between the CS and DS registers in most Win32 programs?

    I know I'm in way over my head here, so I'll keep the question simple: can I do what I want to do without resorting to interpretation? Am I wasting my time even thinking about this?

  2. #2
    Cat without Hat CornedBee
    Join Date
    Apr 2003
    Posts
    8,895
    I'd say no can do. You cannot rely on scanning the code before execution, because it's easy enough to get around such a scan with a little trickery, e.g. XOR-masking the relevant code and unmasking it at runtime. Not to mention the incredible pattern recognition engine you'd need to realize that any given code is malicious, even if you can scan it.
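    A small illustration of the XOR-masking trick, assuming a scanner that simply looks for the CALL rel32 opcode byte (0xE8): once the bytes are XORed with a key (0x5A here, chosen arbitrarily) the scanner sees nothing, and the same routine restores the original bytes at run time. The function names are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Count 0xE8 bytes, i.e. what a naive "look for CALL" scanner would flag.
std::size_t count_call_opcodes(const std::vector<std::uint8_t>& bytes) {
    return static_cast<std::size_t>(std::count(bytes.begin(), bytes.end(), 0xE8));
}

// XOR every byte with a one-byte key. Applying it twice restores the input,
// so the same routine both masks (before shipping) and unmasks (at run time).
std::vector<std::uint8_t> xor_mask(std::vector<std::uint8_t> bytes, std::uint8_t key) {
    for (auto& b : bytes) b ^= key;
    return bytes;
}
```

    Unmasking in place does require the code bytes to be writable while the client runs, which is why read-only code pages come up later in the thread.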
    Since the client code runs in the same address space as your loader, you can't rely on page protection tricks either, because you'd be killing your own loader.

    There are two things that could work here.
    One is going to the kernel level and implementing your own subsystem. Windows already has the Win32, OS/2 and POSIX subsystems. Incidentally, only applications running in the Win32 subsystem need kernel32.dll and friends. You could - in theory - write your own loader and subsystem at the kernel level and have processes that can use only your API for doing stuff. The problem with this approach is that I have no idea if MS has ever released documentation on how to write a subsystem, or if they reserve that right for themselves.
    The other option is to use virtualization support to let the client program run inside such an environment. The problem with this is the huge programming effort (you'd be implementing your own little OS, even if it is very small and specialized).
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  3. #3
    Registered /usr
    Join Date
    Aug 2001
    Location
    Newport, South Wales, UK
    Posts
    1,273
    I would say that preventing code from touching the code segment (or pages if segments aren't really relevant anymore) would go most of the way to stopping it from hiding things.

    One other idea I've had overnight is to somehow replace long JMPs and CALLs with a call to a sort of gate function which checks the context at that point and decides whether to allow or raise an exception. As any calculations would have already been done then there's no guesswork involved, just checking that the destination address is within bounds. No other instructions would need to be looked at. Sound feasible?
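    The check itself is the easy part once the gate has the final destination. A minimal sketch, assuming the client code occupies a single contiguous region whose base and size the loader recorded; CodeRegion, gate_allows and gate are hypothetical names, and how control actually reaches the gate is left open (that is the problem raised in the next reply).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical record of where the loader placed the client code.
struct CodeRegion {
    std::uintptr_t base;
    std::size_t size;
};

// Allow the branch only if it lands inside the client's own code.
// Subtracting first avoids overflow in base + size near the top of memory.
bool gate_allows(const CodeRegion& region, std::uintptr_t destination) {
    return destination >= region.base && destination - region.base < region.size;
}

// Example policy: raise instead of branching when the check fails.
std::uintptr_t gate(const CodeRegion& region, std::uintptr_t destination) {
    if (!gate_allows(region, destination))
        throw std::runtime_error("branch outside sandboxed code");
    return destination;  // a real gate would now transfer control here
}
```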

  4. #4
    Reverse Engineer maxorator
    Join Date
    Aug 2005
    Location
    Estonia
    Posts
    2,318
    Quote Originally Posted by SMurf
    One other idea I've had overnight is to somehow replace long JMPs and CALLs with a call to a sort of gate function which checks the context at that point and decides whether to allow or raise an exception. As any calculations would have already been done then there's no guesswork involved, just checking that the destination address is within bounds. No other instructions would need to be looked at. Sound feasible?
    How exactly are you planning to replace 6 bytes of code (original jmp/call) with, for example, 12 bytes (gateway call)?
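    For reference, the sizes involved: a direct near CALL is 5 bytes (0xE8 plus a 32-bit displacement relative to the next instruction), and an indirect CALL through memory such as call [disp32] is 6 bytes (FF 15 disp32), presumably the 6 bytes meant here. A sketch of the direct encoding (encode_call_rel32 is an illustrative name):

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Encode a near CALL rel32 located at address `from` that calls `to`.
// The displacement is relative to the address of the next instruction (from + 5).
std::array<std::uint8_t, 5> encode_call_rel32(std::uint32_t from, std::uint32_t to) {
    std::uint32_t rel = to - (from + 5);  // wraps modulo 2^32, as the CPU does
    return {0xE8,
            static_cast<std::uint8_t>(rel),
            static_cast<std::uint8_t>(rel >> 8),
            static_cast<std::uint8_t>(rel >> 16),
            static_cast<std::uint8_t>(rel >> 24)};
}
```

    So encode_call_rel32(0x1000, 0x2000) gives E8 FB 0F 00 00. A gateway sequence that also passes the original destination, e.g. push imm32 (5 bytes) followed by call rel32 to the gate (5 bytes), is already 10 bytes, so it cannot be written over the original branch in place.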
    "The Internet treats censorship as damage and routes around it." - John Gilmore

  5. #5
    Registered /usr
    Join Date
    Aug 2001
    Location
    Newport, South Wales, UK
    Posts
    1,273
    Yes, you're right, that did cross my mind...

    The only way I can think of is to require that those particular instructions in the compiled code be padded with NOPs. Not ideal.

  6. #6
    Officially An Architect brewbuck
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by maxorator
    How exactly are you planning to replace 6 bytes of code (original jmp/call) with, for example, 12 bytes (gateway call)?
    The same way everybody else does that -- a trampoline.
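    A rough sketch of the trampoline idea as it is usually done for hooking: the branch site is overwritten with a 5-byte JMP rel32 to a stub placed elsewhere (any leftover bytes become NOPs), and the stub holds the longer gate sequence followed by a jump back to the instruction after the site. Everything here is an assumption for illustration: install_trampoline and emit_jmp_rel32 are hypothetical, the byte buffers stand in for real memory, the gate bytes are a placeholder, and re-creating what the displaced instruction did (including the return address for a CALL) is left to the gate.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append a near JMP rel32 (opcode 0xE9) located at address `at` that jumps to `to`.
void emit_jmp_rel32(std::vector<std::uint8_t>& out, std::uint32_t at, std::uint32_t to) {
    std::uint32_t rel = to - (at + 5);  // relative to the next instruction, wraps mod 2^32
    out.push_back(0xE9);
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<std::uint8_t>(rel >> shift));  // little-endian
}

// Hypothetical helper: redirect the `len`-byte instruction at address `site`
// (len >= 5) through a stub appended to `stub`, which lives at `stub_base`.
void install_trampoline(std::vector<std::uint8_t>& code, std::uint32_t code_base,
                        std::uint32_t site, std::size_t len,
                        std::vector<std::uint8_t>& stub, std::uint32_t stub_base,
                        const std::vector<std::uint8_t>& gate_bytes) {
    assert(len >= 5);  // need room for the 5-byte JMP rel32
    std::size_t pos = site - code_base;
    assert(pos + len <= code.size());

    // Stub: placeholder gate sequence, then a jump back past the patched site.
    std::uint32_t stub_addr = stub_base + static_cast<std::uint32_t>(stub.size());
    stub.insert(stub.end(), gate_bytes.begin(), gate_bytes.end());
    std::uint32_t jmp_back_at = stub_base + static_cast<std::uint32_t>(stub.size());
    emit_jmp_rel32(stub, jmp_back_at, site + static_cast<std::uint32_t>(len));

    // Site: jmp stub, with leftover bytes of the old instruction turned into NOPs.
    std::vector<std::uint8_t> patch;
    emit_jmp_rel32(patch, site, stub_addr);
    patch.resize(len, 0x90);
    std::copy(patch.begin(), patch.end(), code.begin() + static_cast<std::ptrdiff_t>(pos));
}
```

    Patching a 6-byte FF 15 disp32 call at 0x1000 with a stub at 0x5000 turns the site into E9 FB 3F 00 00 90 and leaves a jump back to 0x1006 at the end of the stub.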
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  7. #7
    Malum in se abachler
    Join Date
    Apr 2007
    Posts
    3,195
    There is no way to prevent malicious code from making a total ass of itself except through a virtual machine; that's what virtual machines are for. Now, you could write your own OS or subsystem and limit what the code in ring 3 has access to in terms of call gates, but ultimately it can always do something malicious with those as well if, for example, they allow file writing.

    Just bite the bullet and run Bochs; then the code can do whatever it wants and you don't have to be bothered with the specifics of the VM. Just write plain old code and import it through a CD or hard drive image.

  8. #8
    Registered /usr
    Join Date
    Aug 2001
    Location
    Newport, South Wales, UK
    Posts
    1,273
    abachler, I know a VM would solve this problem easily, but the idea is to see how far I could control a more conventional situation, given the opportunity to review the entire code prior to execution.

    Another idea I've come up with is to scan for long JMPs and CALLs, make a note of their locations and then replace the opcode in question with an invalid one. This would generate an illegal instruction exception which I could then handle, check the faulting address against the table that I made earlier and check the access bounds that way (or leave it unhandled if I didn't cause it). Would there be any drawbacks to that approach?
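    A sketch of the bookkeeping, with one substitution: instead of an arbitrary invalid opcode it uses INT3 (0xCC), the one-byte breakpoint instruction debuggers patch in for exactly this purpose, since UD2 (the instruction Intel reserves for deliberately raising an invalid-opcode exception) is two bytes, 0F 0B. On Windows INT3 raises EXCEPTION_BREAKPOINT rather than EXCEPTION_ILLEGAL_INSTRUCTION. BranchPatchTable is a hypothetical name, and the exception handler is reduced to the table lookup it would perform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

// Side table from patched offset to the opcode byte that was overwritten.
class BranchPatchTable {
public:
    static constexpr std::uint8_t kInt3 = 0xCC;

    // Overwrite the first byte of the branch at `offset` with INT3 and
    // remember the original byte (a second patch keeps the first record).
    void patch(std::vector<std::uint8_t>& code, std::size_t offset) {
        assert(offset < code.size());
        original_.emplace(offset, code[offset]);
        code[offset] = kInt3;
    }

    // What the handler would do: if the faulting offset is one we patched,
    // return the original opcode so the branch can be checked and emulated;
    // otherwise the exception is not ours.
    std::optional<std::uint8_t> lookup(std::size_t fault_offset) const {
        auto it = original_.find(fault_offset);
        if (it == original_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::map<std::size_t, std::uint8_t> original_;
};
```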

  9. #9
    Guest Sebastiani
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    As CornedBee pointed out, the program could easily hide JMPs/CALLs, making scanning pointless. It seems the only way to do this would be a VM...
    Code:
    #include <cmath>
    #include <complex>
    bool euler_flip(bool value)
    {
        return std::pow
        (
            std::complex<float>(std::exp(1.0)), 
            std::complex<float>(0, 1) 
            * std::complex<float>(std::atan(1.0)
            *(1 << (value + 2)))
        ).real() < 0;
    }

  10. #10
    Registered /usr
    Join Date
    Aug 2001
    Location
    Newport, South Wales, UK
    Posts
    1,273
    Surely marking code pages as read only would nix that?

  11. #11
    Guest Sebastiani
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    True. In that case, it might be doable. But you'd probably need to use a different invalid instruction for each type of branch, to keep track of things. In each case, you could just overwrite the first byte of the branch instruction. Of course, if it's a CALL, you'd need to push the return address onto the stack manually...
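    Pushing the return address by hand is straightforward once the handler has the register context. A sketch, assuming a simplified 32-bit context (Context and Memory are stand-ins; on Windows the values would come from the CONTEXT record given to the handler): the return address is the address just past the 5-byte CALL, ESP drops by 4, and EIP becomes the call target.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical, stripped-down register context for 32-bit code.
struct Context {
    std::uint32_t eip;
    std::uint32_t esp;
};

// Toy stack memory: address -> 32-bit value.
using Memory = std::map<std::uint32_t, std::uint32_t>;

// Emulate a CALL rel32 whose first byte was patched out at ctx.eip:
// push the address of the next instruction, then jump to the target.
void emulate_call_rel32(Context& ctx, Memory& mem, std::int32_t rel32) {
    std::uint32_t return_address = ctx.eip + 5;  // CALL rel32 is 5 bytes long
    ctx.esp -= 4;
    mem[ctx.esp] = return_address;
    ctx.eip = return_address + static_cast<std::uint32_t>(rel32);  // wraps mod 2^32
}
```

    A real handler would run the destination bounds check before updating EIP.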

  12. #12
    Cat without Hat CornedBee
    Join Date
    Apr 2003
    Posts
    8,895
    Quote Originally Posted by SMurf
    Surely marking code pages as read only would nix that?
    Yes, but you also have to mark all non-code pages as NX.
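    The combined policy is usually called W^X: no page is both writable and executable. A trivial sketch of the invariant over a hypothetical list of page descriptors (Page and is_w_xor_x are illustrative; on Windows the loader would set this up with VirtualProtect, e.g. PAGE_EXECUTE_READ for code and PAGE_READWRITE for data):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified page protection flags (hypothetical, not a real OS structure).
struct Page {
    std::uintptr_t address;
    bool writable;
    bool executable;
};

// Code pages read-only + everything else NX means no page is ever
// both writable and executable.
bool is_w_xor_x(const std::vector<Page>& pages) {
    for (const Page& p : pages)
        if (p.writable && p.executable) return false;
    return true;
}
```

    Read-only code pages stop the client unmasking itself in place, and NX data pages stop it writing fresh code somewhere else and jumping to it; either one alone leaves the other route open.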

  13. #13
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Can I be pedantic and point out that POP CS is not a legal x86 instruction? If you follow the bit pattern of the other PUSH/POP segment-register pairs, POP CS would be 0x0F, which is used as a prefix for two-byte x86 opcodes.
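    The pattern is easy to check: the one-byte PUSH/POP segment-register instructions encode the register in bits 3 and 4 of the opcode, with low bits 110 for PUSH and 111 for POP. A small sketch reproducing the table (SegReg, push_opcode and pop_opcode are illustrative names). For what it's worth, the original 8086 did execute 0x0F as POP CS; from the 80286 onward it is the escape byte for two-byte opcodes.

```cpp
#include <cassert>
#include <cstdint>

// The four original 8086 segment registers, in their encoding order.
enum class SegReg : std::uint8_t { ES = 0, CS = 1, SS = 2, DS = 3 };

// Bits 3-4 select the register; low bits 110 = PUSH, 111 = POP.
constexpr std::uint8_t push_opcode(SegReg r) {
    return static_cast<std::uint8_t>((static_cast<std::uint8_t>(r) << 3) | 0x06);
}
constexpr std::uint8_t pop_opcode(SegReg r) {
    return static_cast<std::uint8_t>((static_cast<std::uint8_t>(r) << 3) | 0x07);
}

// Following the pattern, "POP CS" lands on 0x0F.
static_assert(pop_opcode(SegReg::CS) == 0x0F, "POP CS slot is the 0x0F escape byte");
```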

    Next, as you say, CS and DS cover the same area in memory: Win32 uses a flat model in which both segments have a base of 0 and a 4 GB limit.

    Scanning for jmp/call instructions will not prevent this:
    Code:
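        push a      ; pushed first: where b's RET will eventually go
        push b      ; pushed last: the address this RET jumps to
        ret         ; pops b into EIP, i.e. a jump with no JMP/CALL opcode
    a:
        ....
    
    b:
        ....
        ret         ; pops a, so execution continues at a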
    This will jump to b, and return to a.
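    The bytes bear this out. A sketch that assembles push a / push b / ret for two hypothetical addresses (PUSH imm32 is 0x68 plus a 4-byte immediate, RET is 0xC3) and checks for the direct JMP/CALL opcodes (0xE8 CALL rel32, 0xE9 JMP rel32, 0xEB JMP rel8): none of them appear. The function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Append PUSH imm32 (opcode 0x68, little-endian immediate).
void emit_push_imm32(std::vector<std::uint8_t>& out, std::uint32_t value) {
    out.push_back(0x68);
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<std::uint8_t>(value >> shift));
}

// push a; push b; ret
std::vector<std::uint8_t> push_push_ret(std::uint32_t a, std::uint32_t b) {
    std::vector<std::uint8_t> out;
    emit_push_imm32(out, a);
    emit_push_imm32(out, b);
    out.push_back(0xC3);  // RET: pops b into EIP
    return out;
}

// The opcode bytes a scanner for direct JMP/CALL would look for.
bool contains_direct_branch_opcode(const std::vector<std::uint8_t>& bytes) {
    for (std::uint8_t b : bytes)
        if (b == 0xE8 || b == 0xE9 || b == 0xEB) return true;
    return false;
}
```

    For addresses 0x00401020 and 0x00401030 the sequence is 68 20 10 40 00 68 30 10 40 00 C3.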

    And if you build something that scans for and detects that, I can almost certainly come up with something that is harder to discover, using offsets and the return address of the current function or some such.

    Also consider jump tables (such as you'd use with switch-statements for example).
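    A jump table puts the branch target in data, so a static scanner sees one indirect branch and nothing else. Compilers often lower a dense switch to something like jmp [table + eax*4]; the sketch below does the same thing by hand with an array of function pointers (case0 to case2, kTable and dispatch are illustrative names).

```cpp
#include <array>
#include <cassert>
#include <cstddef>

int case0() { return 10; }
int case1() { return 20; }
int case2() { return 30; }

// The branch target is read from this table at run time.
using Handler = int (*)();
const std::array<Handler, 3> kTable = {case0, case1, case2};

int dispatch(std::size_t index) {
    // Which function runs depends on `index` and on the table's contents,
    // neither of which a pre-execution scan can know in general.
    return index < kTable.size() ? kTable[index]() : -1;
}
```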

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  14. #14
    Guest Sebastiani
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    Quote Originally Posted by matsp
    Scanning for jmp/call instructions will not prevent this: [...]
    Also consider jump tables (such as you'd use with switch-statements for example).
    Things aren't looking so well.

    I just thought of another thing, too. Since raw data can be mixed with the instructions, the scanner would have to follow all possible paths to make sure it doesn't inadvertently overwrite a chunk of data.
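    This is the difference between a linear-sweep and a recursive-traversal disassembler. A toy sketch that only understands NOP (0x90), RET (0xC3), JMP rel8 (0xEB) and CALL rel32 (0xE8): on the bytes EB 01 E8 90 90 90 90 C3 (a jump over one data byte that happens to be 0xE8), the linear sweep reports a CALL at offset 2, while following the jump shows that offset 2 is never executed, so patching that "CALL" would corrupt data. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Instruction length for the few opcodes this toy decoder knows; 0 = unknown.
std::size_t insn_length(std::uint8_t op) {
    switch (op) {
        case 0x90: return 1;  // NOP
        case 0xC3: return 1;  // RET
        case 0xEB: return 2;  // JMP rel8
        case 0xE8: return 5;  // CALL rel32
        default:   return 0;
    }
}

// Linear sweep: decode back to back from offset 0, as a naive scanner would.
std::vector<std::size_t> calls_by_linear_sweep(const std::vector<std::uint8_t>& code) {
    std::vector<std::size_t> calls;
    std::size_t i = 0;
    while (i < code.size()) {
        std::size_t len = insn_length(code[i]);
        if (len == 0 || i + len > code.size()) break;
        if (code[i] == 0xE8) calls.push_back(i);
        i += len;
    }
    return calls;
}

// Recursive traversal: follow control flow, so jumped-over bytes are never decoded.
std::vector<std::size_t> calls_by_traversal(const std::vector<std::uint8_t>& code) {
    std::vector<std::size_t> calls;
    std::vector<std::size_t> work = {0};
    std::set<std::size_t> seen;
    while (!work.empty()) {
        std::size_t i = work.back();
        work.pop_back();
        while (i < code.size() && seen.insert(i).second) {
            std::uint8_t op = code[i];
            std::size_t len = insn_length(op);
            if (len == 0 || i + len > code.size()) break;
            if (op == 0xC3) break;  // RET ends this path
            if (op == 0xEB) {       // JMP rel8: continue at the target only
                i = i + 2 + static_cast<std::int8_t>(code[i + 1]);
                continue;
            }
            if (op == 0xE8) calls.push_back(i);  // call targets are not followed here
            i += len;
        }
    }
    return calls;
}
```

    Traversal only helps as far as branch targets are known statically; the push/ret and jump-table cases above defeat it too.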

  15. #15
    Malum in se abachler
    Join Date
    Apr 2007
    Posts
    3,195
    CS and DS do not necessarily point to the same area of memory, and in fact may not be ABLE to point to the same area on some systems where the status pins are used as address pins to implement a Harvard architecture.
