The real stack __asm{push, pop}


  1. #1
    Ugly C Lover audinue's Avatar
    Join Date
    Jun 2008
    Location
    Indonesia
    Posts
    489

    The real stack __asm{push, pop}

    Let's play with the real stack...
    Code:
    #define MAX 100
    int tmp;
    
    for(int i=0; i<MAX; i++)
    {
       _asm
       {
          push i
       };
    };
    
    for(int i=0; i<MAX; i++)
    {
       _asm
       {
          pop tmp
       };
    
       printf("%d ", tmp);
    };
    I'm confused about two things:

    1. What are the limits of the stack: how many values can I push (push 1, push 2, push 3, push 4, ...), and how large can each one be (push LONG_LONG_MAX?)?
    2. What is the GCC syntax for this, please?

    Btw, thanks in advance. I'm very new to inline assembly in C.

  2. #2
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    What are you ACTUALLY trying to achieve?

    Code:
    #include <stdio.h>
    
    int main()
    {
    
    #define MAX 100
        int tmp;
        int i;
        
      
        for(i=0; i<MAX; i++)
        {
    	__asm__ __volatile__("push %0"::"r"(i));
        }
        
        for(i=0; i<MAX; i++)
        {
    	__asm__ __volatile__("pop %0"::"m"(tmp));
    	printf("%d\n", tmp);
        };
    
        return 0;
    }
    This compiles, but doesn't output the right values. I haven't tried to figure out why, because the code doesn't make sense.

    Stack size varies between OSes, but generally there is "enough" (that is, Windows and Linux give you a few megabytes of stack, which should be sufficient for any sane application).
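
    On POSIX systems the current limit can be queried at run time. A minimal sketch, assuming a POSIX platform that provides getrlimit (Windows needs a different mechanism); stack_soft_limit and print_stack_limit are names made up for this example:

```c
#include <stdio.h>
#include <sys/resource.h>   /* POSIX: getrlimit, RLIMIT_STACK */

/* Hypothetical helper: returns the soft stack limit in bytes,
 * 0 if the limit is "unlimited", or -1 if the query fails. */
long long stack_soft_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 0;
    return (long long)rl.rlim_cur;
}

/* Prints the limit in a human-readable form. */
void print_stack_limit(void)
{
    long long lim = stack_soft_limit();

    if (lim < 0)
        puts("could not query the stack limit");
    else if (lim == 0)
        puts("stack limit: unlimited");
    else
        printf("stack limit: %lld bytes (about %lld KiB)\n", lim, lim / 1024);
}
```

    Calling print_stack_limit() from main shows how much room the process has before a push loop like the one above runs off the end.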

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  3. #3
    Ugly C Lover audinue's Avatar
    Join Date
    Jun 2008
    Location
    Indonesia
    Posts
    489
    I think the real stack is faster than our typedef struct Stack... ?

  4. #4
    and the hat of wrongness Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    32,535
    All very well, until you want to actually return from the function whose stack frame you've been dickering about with.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.
    I support http://www.ukip.org/ as the first necessary step to a free Europe.

  5. #5
    Ugly C Lover audinue's Avatar
    Join Date
    Jun 2008
    Location
    Indonesia
    Posts
    489
    Btw it prints out 99 to 0...

    My reason for using the stack: I'm working on a parser, and the speed of the stack is critical. You know, even the shunting-yard algorithm needs a stack, the scanner needs a stack of tokens, etc...
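
    For reference, the operator stack that the shunting-yard algorithm needs can be a plain char array with an index. A minimal sketch, assuming single-digit operands, the four left-associative operators + - * / and parentheses; to_rpn and prec are names made up for this example, and it does not check that operands and operators alternate:

```c
#include <ctype.h>
#include <stddef.h>

/* Precedence of a binary operator; 0 for anything else. */
static int prec(char op)
{
    if (op == '+' || op == '-') return 1;
    if (op == '*' || op == '/') return 2;
    return 0;
}

/* Converts an infix expression to postfix (RPN).
 * out must have room for strlen(in) + 1 chars.
 * Returns 0 on success, -1 on an unknown character, unbalanced
 * parentheses, or more than 64 pending operators. */
int to_rpn(const char *in, char *out)
{
    char ops[64];       /* operator stack: a flat array ... */
    size_t top = 0;     /* ... plus an index one past the top item */
    size_t n = 0;       /* number of chars written to out */

    for (; *in; in++) {
        char c = *in;

        if (isspace((unsigned char)c))
            continue;
        if (isdigit((unsigned char)c)) {
            out[n++] = c;                       /* operands go straight out */
        } else if (prec(c)) {
            /* Left-associative: pop operators of higher or equal precedence. */
            while (top > 0 && prec(ops[top - 1]) >= prec(c))
                out[n++] = ops[--top];
            if (top == sizeof ops) return -1;
            ops[top++] = c;
        } else if (c == '(') {
            if (top == sizeof ops) return -1;
            ops[top++] = c;
        } else if (c == ')') {
            while (top > 0 && ops[top - 1] != '(')
                out[n++] = ops[--top];
            if (top == 0) return -1;            /* unmatched ')' */
            top--;                              /* discard the '(' */
        } else {
            return -1;                          /* unknown character */
        }
    }
    while (top > 0) {                           /* flush remaining operators */
        if (ops[top - 1] == '(') return -1;     /* unmatched '(' */
        out[n++] = ops[--top];
    }
    out[n] = '\0';
    return 0;
}
```

    Every push and pop here is an array store or load plus an index update, which the compiler is free to keep in registers.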

  6. #6
    and the hat of wrongness Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    32,535
    So if you're reading from a file, then already you're going to be millions of times slower than reading from memory.

    Concentrate on making it "right" before you even begin to think about making it "fast".

    Here's another saying.
    It's easier to optimise a working program than it is to make an optimised program work.

  7. #7
    Ugly C Lover audinue's Avatar
    Join Date
    Jun 2008
    Location
    Indonesia
    Posts
    489
    I've already finished my code, so now I want to make it faster. Maybe using the inline assembly stack opcodes (push and pop)...

    Btw, again...
    All very well, until you want to actually return from the function whose stack frame you've been dickering about with.
    This works well for me...
    Code:
    void push(long t)
    {
       __asm
       {
          push t
       };
    };
    
    int pop()
    {
       int i;
       __asm
       {
          pop i
       };
       return i;
    }
    
    int main()
    {
       push(10);
       push(20);
       push(30);
    
       int i;
    
       i = pop();
       printf("%d\n", i);
    
       i = pop();
       printf("%d\n", i);
    
       i = pop();
       printf("%d\n", i);
    }
    We should add a counter that tracks how much of the stack we have used, and then empty it with a loop of pops...

    But to empty the stack I see they use add esi, 15. I don't understand this.

  8. #8
    Chinese pâté foxman's Avatar
    Join Date
    Jul 2007
    Location
    Canada
    Posts
    404
    Quote Originally Posted by audinue View Post
    I think the real stack is faster than our typedef struct Stack... ?
    I wouldn't say so. There's no reason why a push/pop instruction should be faster than a mov instruction (in fact, I would bet they are slower). And as far as I can see, you are heading straight for a headache (and no performance gain) if you try to implement a Stack abstract data type using the push/pop instructions.

    Plus, you might want to take a look at the Intel Software Developer's Manual (begin with Volume 1, Section 6.2). And your code is wrong.
    I hate real numbers.

  9. #9
    Ugly C Lover audinue's Avatar
    Join Date
    Jun 2008
    Location
    Indonesia
    Posts
    489
    Thanks foxman for the info.

    Btw, are you using Intel's C/C++ compiler? How is it for executable size, libraries, speed, and documentation (manual + reference)?

    If it's good, I'll buy it (~^_^~)*

  10. #10
    and the hat of wrongness Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    32,535
    That code only works because the compiler does inline those function calls.
    If it didn't, you'd be looking at instant crashing.

  11. #11
    Chinese pâté foxman's Avatar
    Join Date
    Jul 2007
    Location
    Canada
    Posts
    404
    Quote Originally Posted by audinue View Post
    Btw are you using Intel's C/C++ compiler?
    No I'm not; I'm either using Visual Studio 2008 Professional (which I got for free) or GCC 4.2.3 (which I also got for free... ), depending on which OS I'm running. Visual Studio is like the only good stuff installed on my Windows system.

  12. #12
    Super Moderator VirtualAce's Avatar
    Join Date
    Aug 2001
    Posts
    9,596
    I think you are way off in the weeds here and nothing I see will magically make your program faster. First profile, then look at your C/C++ and improve it. Based on what I've seen, I doubt your assembly will be faster than the compiler generated assembly. And what do you mean the real stack? What stack do you think your compiler uses?

    Inline assembly may actually be slower because the compiler has no idea which registers you are about to use, and upon return it does not know which registers you did use. You also cannot depend on any registers the compiler had been using prior to entry. Because the compiler cannot predict what you will use, or whether you are a good asm programmer who puts things back the way they were on entry, it must preserve all registers and flags around inline assembly code. This also means that the compiler will not optimize anything in an asm statement or block, since it cannot rely on the state of the flags or registers prior to entry. Also, if you were attempting to hand-tune the code, you would be upset if the compiler magically removed some of it and insisted on using its own version.
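
    With GCC-style extended asm (an assumption here: GCC or Clang, not the MSVC __asm block used earlier in the thread), the programmer declares which operands are read, which are written, and what else is clobbered, so the compiler does not have to guess. A small sketch; add_ints is a made-up name, and the code falls back to plain C when not built for x86:

```c
/* Adds b to a. On x86 with GCC or Clang this uses one inline "add";
 * elsewhere it falls back to plain C. */
int add_ints(int a, int b)
{
#if (defined(__i386__) || defined(__x86_64__)) && defined(__GNUC__)
    __asm__("addl %1, %0"
            : "+r"(a)    /* %0: read and written, kept in a register */
            : "r"(b)     /* %1: read only, kept in a register */
            : "cc");     /* the flags register is modified */
    return a;
#else
    return a + b;
#endif
}
```

    The compiler picks the registers for %0 and %1 itself, which is the information an opaque asm block cannot give it. None of this makes the add any faster than writing a + b.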

    In general with modern compilers you should not need to use any inline assembly. I could see using some inline assembly or pure assembly modules in low-level device code but even then C usually suffices in those types of applications.

    You used to be able to get good performance gains back in the 80's and early 90's when compilers were not as good. However that day has long since passed. I stopped using assembly when I started into DirectX because I realized the hardware was faster than any asm I could produce and for the most part my assembly routines were purely graphics related or sound related. Now I have no need for assembly save for the occasional pixel or vertex shader written in assembly.
    Last edited by VirtualAce; 07-22-2008 at 03:27 PM.

  13. #13
    int x = *((int *) NULL); Cactus_Hugger's Avatar
    Join Date
    Jul 2003
    Location
    Banks of the River Styx
    Posts
    902
    This compiles, but doesn't output the right values. I haven't tried to figure out why, because the code doesn't make sense.
    Possibly because we don't know what's on the stack. I've seen C code transform into:
    Code:
    // Push arguments for function
    // Call function
    // push arguments for function
    // call function
    // repeat...
    // pop arguments for all above functions. (Usually on leaving a stack frame)
    Such behavior would mess up your custom pushes and pops, since you're popping on the assumption that the parameters to the function have already been popped. (Since most functions are cdecl, the above is a fine calling strategy, trading stack space (arguments accumulate) for speed (fewer pops).)
    But to make the stack empty I see they use add esi, 15. I don't understand this.
    To do mass pops, it is often quicker to add to esp than to do multiple pops. (Usually it's a leave, and you get esp being replaced by ebp.)

    However, I think as the forum has covered, this is going to be so compiler specific and possibly undefined as to be maddening. Furthermore, it's entirely unportable. Stick to just plain ole C, unless you really need assembly. (And then, I'd try to write it all in an assembly function that exposes a C interface.)
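
    The same idea carries over to a stack kept in an ordinary array: discarding several items is one adjustment of the top index rather than a loop of pops. A sketch with made-up names (int_stack, istack_push, istack_drop):

```c
#include <stddef.h>

#define ISTACK_CAP 16

struct int_stack {
    int items[ISTACK_CAP];
    size_t count;               /* number of items currently stored */
};

/* Pushes v; returns 0 on success, -1 if the stack is full. */
int istack_push(struct int_stack *s, int v)
{
    if (s->count == ISTACK_CAP)
        return -1;
    s->items[s->count++] = v;
    return 0;
}

/* Discards the top n items in one step, the array analogue of
 * adjusting the stack pointer once instead of popping n times.
 * Returns 0 on success, -1 if fewer than n items are stored. */
int istack_drop(struct int_stack *s, size_t n)
{
    if (n > s->count)
        return -1;
    s->count -= n;
    return 0;
}
```

    After three pushes, istack_drop(&s, 2) leaves only the first value on the stack without touching the discarded slots.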
    long time; /* know C? */
    Unprecedented performance: Nothing ever ran this slow before.
    Any sufficiently advanced bug is indistinguishable from a feature.
    Real Programmers confuse Halloween and Christmas, because dec 25 == oct 31.
    The best way to accelerate an IBM is at 9.8 m/s/s.
    recursion (re - cur' - zhun) n. 1. (see recursion)

  14. #14
    Mad OnionKnight's Avatar
    Join Date
    Jan 2005
    Location
    Umeå, Sweden
    Posts
    555
    Correct me if I'm wrong, but I think the x86 stack is fast mostly because it's a flat array, and because it has specific instructions for pushing and popping. If you want a fast stack, just mimic that for your own stack type.
    Code:
    #include <stdio.h>
    
    #define MAX 100
    
    struct my_stack {
    	int stack[MAX];
    	int* pos;
    };
    
    void my_stack_init (struct my_stack* s) {s->pos = s->stack;}
    void my_stack_push (struct my_stack* s, int v) {*s->pos = v; s->pos++;}
    int my_stack_pop (struct my_stack* s) {s->pos--; return *s->pos;}
    
    int main (void)
    {
    	struct my_stack stack;
    	int i;
    
    	my_stack_init(&stack);
    	for (i = 0; i < MAX; i++)
    		my_stack_push(&stack, i);
    	for (i = 0; i < MAX; i++)
    		printf("%d ", my_stack_pop(&stack));
    	putchar('\n');
    
    	return 0;
    }
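
    The push and pop above do no overflow or underflow checking. A sketch of a checked variant with the same layout; the checked_* names and CHK_MAX are made up for this example:

```c
#include <stdbool.h>

#define CHK_MAX 100

struct checked_stack {
    int data[CHK_MAX];
    int *pos;           /* points one past the top item */
};

void checked_init(struct checked_stack *s)
{
    s->pos = s->data;
}

/* Returns false (and stores nothing) when the stack is full. */
bool checked_push(struct checked_stack *s, int v)
{
    if (s->pos == s->data + CHK_MAX)
        return false;
    *s->pos++ = v;
    return true;
}

/* Returns false (and leaves *out untouched) when the stack is empty. */
bool checked_pop(struct checked_stack *s, int *out)
{
    if (s->pos == s->data)
        return false;
    *out = *--s->pos;
    return true;
}
```

    Each check is one pointer comparison, so the cost over the unchecked version is small, and a parser that pops from an empty stack gets an error instead of reading outside the array.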

  15. #15
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    I agree with your analysis onionknight.

    There is nothing magical about how the stack itself works - but there is a single instruction to decrement the stack pointer and store to memory (push), and one to load from memory and increment it (pop). Done as separate instructions, each operation takes two instruction slots, which many architectures can execute in the same clock cycle.

    Edit: I started out, but didn't finish it, on a comment explaining that as separate functions, you CAN NOT push and pop the stack - it only works as long as the functions are inlined, and there's nothing preventing the compiler from storing and restoring things from stack locations, and potentially leaving stuff on the stack until later for its own purposes.

    --
    Mats
    Last edited by matsp; 07-23-2008 at 08:17 AM.
