Mixed Language: C and Asm

Hi,
can someone show me a little program using mixed language, C and Assembly AT&T syntax.
I can't understand how it works :( .
Thanks

I suppose the first question is going to be "what's wrong with the 38,756 examples you can find on the web?"

Generally this is going to be rather system-specific, but if you're using gcc, why not look at the HOWTO or the gcc manual?

Hey,
I will post some code I have.

Code:

// mlpc1.c
// C language source file for mixed language programming example
#include <stdio.h>

void UpperCase(char *Str);

int main()
{
    char UserString[20];

    fputs("Enter a string: ", stdout);
    fgets(UserString, 19, stdin);
    fputs("\nYou entered: ", stdout);
    fputs(UserString, stdout);
    fputs("\nAfter call to UpperCase this becomes: ", stdout);
    UpperCase(UserString);
    fputs(UserString, stdout);
    fputs("\n", stdout);
}

Code:

# mpla1.s
# assembly language source file for mixed language programming example
.data
.text
.global UpperCase
UpperCase:
    push %ebp
    mov %esp,%ebp
    push %esi
    push %eax
    mov 8(%ebp),%esi        # make esi point to the string
UCLoop:
    movb (%esi),%al
    cmp $0,%al
    je UCExit
    andb $0xdf,%al
    movb %al,(%esi)
    inc %esi
    jmp UCLoop
UCExit:
    pop %eax
    pop %esi
    pop %ebp
    ret
.end

I don't get how it works...

So what is difficult about this? This is exactly the same as you writing a C program in two .c files, except you wrote one of your .c files in assembler.

In the UpperCase function,
I don't get why the registers
%esi and %eax were pushed.
I have just written a simple function to add 2 numbers, and didn't have to push %eax. I could use the registers directly since they are "global", right?
So when do we push registers?
I know push means storing a value on the stack, but I don't really get pop.

When you pop something off the stack, esp is increased by 4 bytes.
But the content you popped off, is it saved somewhere? What is the difference between pop and ret?

Push stores a register value to the stack so that it can be restored later.
Pop gets the value from the stack and places it back in the register.

Of course you have to be careful with this... if you push A B C you need to pop C B A to restore the registers correctly. Also pushing something you do not later pop will cause a misalignment between stack and data...

So basically, if the data we have in the register is to be used later, we only push.
Ok, I will go try and do some experimenting on this.
Thanks.

Did you read the part about using Push and Pop with caution and in matched pairs?

Yeah. Does that have something to do with the cdecl format?
The parameters are popped off in the order in which they are pushed?

One thing that is confusing me, when do we push to the stack?
Before the function is called, or after?
Like in
UpperCase
ebp is pushed to the stack. Does that make it a local?

I'm pretty sure you've been linked to this page before. Have you ever tried actually reading that page? (Especially the part where they tell you why you have to push ebp, and why you should push any registers you intend to use in the function.) Start at "calling a __cdecl function".

read it.
cool thanks. Hopefully, I won't be back ;) (doubt it)

Em, so
andb $0xdf, %al
ANDs the lowercase char with 223.
How can we know what figure to use? Because if this was in the exam I would have failed it instantly. I just played with ANDing, and it does indeed convert it to upper.
If it was upper to lower, how would I do it? What would be my mask?
Is there an easy way of figuring it out?
Thanks

Code:

Uppercase E = 69  = 0b01000101
Lowercase e = 101 = 0b01100101

so i have to or it by 0x20
E | 0x20..
that wouldn't work for everything :(

Why not?

Because if i had
Z 90 = 0b 0101 1010
z 122 = 0b 0010 0000

0b 0101 10101 | 0x20 would give me something else, not 122.
EDIT:
i think i made mistake in the calculation. brb

90 is 0b01011010, true. 122, however, is 0b01111010, not 0b00100000 which is just 32.

Yeah it worked.
I am going to try the reverse, convert from lower to upper. I don't understand why we AND, and not OR.

I just did it. And I just had to do
a&~0x20...where did 0xdf come from, it works but why not use ~0x20? :S

If you can spot the difference between ~0x20 and 0xdf, then you're a better man than I Gunga Din.

As to why the difference between and and or, think about what needs to happen for five seconds.

~0x20 is a negative number -224
0xdf is a positive 223.
btw, who is Gunga Din?

AND is used to clear, while OR was used to set? (all this binary...)

Quote:

Originally Posted by Eman

~0x20 is a negative number -224
0xdf is a positive 223.

Um, no. ~0x20 = 0b 1101 1111, while 0xdf is 0b 1101 1111. (However, if I'm not mistaken, you can type 0xdf in an assembler program, while you can't type ~0x20. I might be wrong there though.)

Quote:

Originally Posted by Eman

btw, who is Gunga Din?

Gunga Din - Wikipedia, the free encyclopedia

Quote:

Originally Posted by Eman

AND is used to clear, while OR was used to set? (all this binary...)

So, you do know what and means, right? And you do know what or means, right?

Quote:

Originally Posted by tabstop

Um, no. ~0x20 = 0b 1101 1111, while 0xdf is 0b 1101 1111. (However, if I'm not mistaken, you can type 0xdf in an assembler program, while you can't type ~0x20. I might be wrong there though.)

i should really use a calculator.

Quote:

So, you do know what and means, right? And you do know what or means, right?

em, what i have said? lol i don't know what you want me to see :(

"and" means "both have to be true (or 1)". "or" means "only one has to be true (or 1)". So you should be able, at some point, to see the difference between 0b 0101 1001 AND 0b 0010 0000, and 0b 0101 1001 OR 0b 0010 0000. And the same for working with 0b 1101 1111.

Code:

  0101 1001
& 0010 0000
  0000 0000 == 0

  0101 1001
| 0010 0000
  0111 1001 == 121

  0101 1001
& 1101 1111
  0101 1001 == 89      gives me original number

  0101 1001
| 1101 1111
  1101 1111 == 0xdf    gives me the mask

There is a difference alright. But what are you trying to tell me?

You asked why we use and in some circumstances, and or in others. I am attempting (and apparently failing) to show you the difference between them so that you know why we use and sometimes, and or other times.

tbh from that. All I can say is you could use the bitwise & to check if every bit is 0. Or to retain the original bits.
And the OR is used to set individual bits.

I wrote this code, and I have been getting a seg fault. Can you help me spot what is wrong? I have been at it for half an hour. Ty

Code:

#include <stdio.h>

void cstring(char *s, char *d);

int main()
{
    char source[] = "Hello World";
    char dest[12];

    printf("%s\n", source);
    cstring(source, dest);
    //printf("After function call: \n");
    //printf("%s\n", dest);
    //fputs(source, stdout);
    return 0;
}

Code:

.section .data
.section .text
.global cstring
cstring:
    push %ebp
    mov %esp, %ebp
    push %esi
    push %edi
    push %eax
    mov 12(%ebp),%esi       #esi to point to source
    mov 8(%ebp),%edi        # edi to point to dest
copyLoop:
    cmpb $0, (%esi)
    je exit
    movb (%esi),%cl
    movb %cl, (%edi)
    inc %esi
    inc %edi
    jmp copyLoop
exit:
    mov %ebp, %esp
    pop %eax
    pop %edi
    pop %esi
    pop %ebp
    ret
.end

Since arguments are pushed right-to-left, the one on the "top" of the stack (nearest ebp) is the first argument.

so it should be
movl 8(%ebp),%esi to point to source.
I tried that, i still got the same error

Also:

Code:

mov %ebp, %esp

means "SAY GOODBYE TO THE STACK!!!!!" The whole purpose of the pops is to bring esp down while moving the clobbered registers back to what they're supposed to be. If you fiddle with esp, now you're popping the old ebp into eax, the eip (i.e., where to return to from this function) into edi, the first argument (a pointer to one of your strings) into esi, the second argument into ebp [eep!] and then using whatever sits just above the arguments in main's stack frame as the return address to jump to with ret. And that's not going to be pleasant.

wow it worked, very cool :) Thanks!

I don't get it. The book I am using used it, and some tutorials.
It says it frees the stack by restoring it to its original value.

Quote:

The whole purpose of the pops is to bring esp down while moving the clobbered registers back to what they're supposed to be.

do you mean the bring the esp up, as in esp+4 on every pop?

Quote:

If you fiddle with esp, now you're popping the old ebp into eax, the eip (i.e., where to return to from this function) into edi, the first argument (a pointer to one of your strings) into esi, the second argument into ebp [eep!] and then using whatever sits just above the arguments in main's stack frame as the return address to jump to with ret. And that's not going to be pleasant.

what? haha how does that happen by simply moving %ebp to %esp, crazy stuff!

What the book is (probably) doing is getting rid of its own stack variables. You didn't have any, so there was nothing to free, and because the instruction came before your pops, it moved esp past the registers you had saved.

When you do a pop, it looks at where esp is (since esp is the last/"top" thing on the stack). If you fiddle with esp and then start popping, you're going to pop from where esp thinks your stack is, not where it actually is.

But how can you tell that ebp would now be in eax and eip in edi and so on? That's really cool.

How can I return an actual value from a function? They have been void so far. Unfortunately the ebook didn't cover that.
I would like to compare two strings: if they are equal return 0, else 1. But ret seems to be for the return address in %eax, which I thought the eip dealt with.

Generally you're expected to place the return value in %eax before you go back. eip is just an instruction pointer -- it just says "when you go back, this is the next line of code that needs to run".

eip is the address that should be returned to, isn't it?
what ret uses to return?
if the return value is to be put into %eax, then popping %eax of the stack would overwrite the return val, which means i can't pop it off?
Thanks a lot for your help.

I ... that's true. What that means is that you should never push %eax on the stack in the first place. The whole point of pushing registers on the stack is so they can be restored later, so that the calling function doesn't lose data. Again, it's conventional as to which registers need to be restored and which don't. For instance, you've been using the c register and not restoring it; but most conventions specify that function calls can clobber the c register and get away with it. (Usually you can get away with using the d register too I understand.) But again: this is all convention, meaning that whatever the calling function and the called function agree on, works.

You can push eax onto the stack if you remember to pop it off before putting the return value in there, or if you fake the pop with stack pointer arithmetic, like "add $4, %esp", to bump the stack pointer up 4 bytes and skip over the value of eax you pushed. This sort of arithmetic is often done as a quick way to clean up the stack at the end of a function too, but you have to make sure you count your stack usage carefully.

Since you pushed eax onto the stack at the start of cstring, you would want to pop it off near the end to keep your stack aligned, or add 4 to esp to skip over the value you pushed on the stack. The more likely case of pushing eax onto the stack, however, seems to me to be that the calling code had a value in eax it wanted to preserve, so before pushing on all the function parameters, it pushed eax to save the value during the function call. It would then do whatever it needs to with the return value that is in eax when the function is done, then pop the old eax off the stack for use.

Here are a couple of interesting sites with some pretty good information on x86 registers and their usage, though neither really cover eax as a return value:
x86 Registers
The Art of Picking Intel Registers

Hey!
After an hour of sweating it out, I finally figured out how to do it. I didn't push or pop it at all, although i got some problem with it, that only the function and exit could write to it. Other labels couldn't write values to it :S

Quote:

The more likely case of pushing eax onto the stack however, seems to me to be that the calling code had a value in eax it wanted to preserve, so before pushing on all the function parameters, it pushed eax to save the value during the function call. It would then do whatever it needs to with the return value that is in eax when the function is done, then pop the old eax off the stack for use.

I was the one that pushed eax to the stack. Since that didn't work, I just didn't do it again.
So if i returned control back to main. How would I pop %eax, because I would be in the C code then.
This is what I got so far. appears to be working

Code:

#include <stdio.h>

int cmpString(char *s1, char *s2);

int main()
{
    char string1[] = "Hello World";
    char string2[] = "Hello World";

    if (!cmpString(string1, string2)) {
        printf("The strings match!\n");
    } else {
        printf("The strings do not match!\n");
    }
    return 0;
}

/* c function to compare strings
int cmpString(char *s1, char *s2)
{
    while ((*s1 != '\0')) {
        if (!(*s1 == *s2)) {
            return 1;
        }
        s1++; s2++;
    }
    return 0;
}
*/

Assembler code to compare string.

.section .data
.section .text
.global cmpString
cmpString:
    push %ebp
    movl %esp, %ebp
    pushl %esi              #save contents of esi
    pushl %edi              #save contents of edi
    pushl %ebx              #save contents of ebx
    movl 12(%ebp), %esi     #point to s2
    movl 8(%ebp), %edi      #point to string1
cmpLoop:
    cmpb $0, (%esi)
    je exit
    movb (%esi),%bl
    cmpb %bl, (%edi)
    movl $1, %eax
    jnz exit
    incl %esi
    incl %edi
    movl $0, %eax
    jmp cmpLoop
exit:
    popl %ebx
    popl %edi
    popl %esi
    popl %ebp
    ret
.end

You don't want to pop eax. If you mean "what is main going to do with %eax", then it looks like main is going to do jump-if-zero based on the value of %eax (it would probably have to do a cmp first I would guess). If you had done some_variable=cmpString(), it would have mov'd %eax to the appropriate location in memory. And so on. The registers are not the same as the stack, and the registers are not on the stack, so trying to pop them doesn't really make sense.

If you had written the calling code in assembler, then yes: if you had something in eax that you wanted, you would have to put it somewhere (you could push it on the stack, or you could write it to memory, or whatever). But what you do with it is on the caller -- the called function has no idea about any of that and can't worry about it in any way, shape, or form.

The cardinal rule is that, whatever you do to the stack and registers upon entering the function, you do the reverse upon leaving.

You wouldn't pop eax in any explicit fashion since you're writing C code. Some compilers support in-line assembly, but that's a non-standard feature. The compiler may do something like this automatically to your C code under certain circumstances. Also, if you wrote an assembly function that called another, you may need to write such code.

By the way, your code doesn't actually work, at least not quite. What happens if string1 is "Hello World" and string2 is "Hello"?

oh yeah i see what you mean. I forgot to check if they were the same length :( .

When i push registers into the stack. They are saved below the ebp, and themselves aren't they? That is why i have to pop them off in reverse.

Quote:

Originally Posted by Eman

oh yeah i see what you mean. I forgot to check if they were the same length :( .

Don't check if they're the same length. You can write your code such that it fails if it gets a null in one string before the other one, even if they match up to that point.

Quote:

When i push registers into the stack. They are saved below the ebp, and themselves aren't they? That is why i have to pop them off in reverse.

I'm not sure what you mean by "below the ebp, and themselves", but since you are the one in control of the pushing (in your function), they will be pushed on the stack whenever you say so. You could do this before pushing the ebp, but that would violate the convention, and is not recommended. Since you push the ebp first in your example, the stack pointer is moved down 4 bytes and the old ebp is stored there, then you push the other registers, so they will be below the ebp, but nothing can be below itself.

Why is checking the length bad? I could just use strlen before I call the function. Could that maybe reduce overhead?
Below each other i mean.
push %ebp,%edi, %esi, %eax:
ebp
edi
esi
eax //then if i have locals
2
3
4
?
although reading the first local won't be
ebp-4
it would be ebp-16..
So, I am wondering when you push registers, is that what happens? ( in this scenario)

Quote:

Originally Posted by Eman

Why is checking the length bad? I could just use strlen before I call the function. Could that maybe reduce overhead?

That would increase overhead, not reduce it. One: you'd need to store the lengths somewhere (which is literal overhead). Two: You'd have to walk each string twice (once to find the length, and then once more to compare) (which is not literal overhead, but just extra work).

Quote:

Originally Posted by Eman

Below each other i mean.
push %ebp,%edi, %esi, %eax:
ebp
edi
esi
eax //then if i have locals
2
3
4
?
although reading the first local won't be
ebp-4
it would be ebp-16..
So, I am wondering when you push registers, is that what happens? ( in this scenario)

If you push registers, then yes those values go on the stack at that appropriate place. You can do it before or after locals, it doesn't make a difference (as long as you get them off in the right order). (I believe it's customary for the registers to go first, though, as you have it.)

oh, I had the idea of comparing it directly. But walking the string twice sucks :(
Learnt another thing today.

Quote:

If you push registers, then yes those values go on the stack at that appropriate place. You can do it before or after locals, it doesn't make a difference (as long as you get them off in the right order). (I believe it's customary for the registers to go first, though, as you have it.)

Thanks tabstop and Anduril. Now, I can go to the exams without dreading some Assembly :D

time to study some boring Databases. Ha, why is there no section on it? :( xD
thanks guys.

Did you fix your compare algorithm yet? Here is a little pseudo code if not:

Code:

cmpString:
    # regular function startup code
    # default eax to 0
cmpLoop:
    # compare the bytes at (%esi) and (%edi) and if they differ, exit the loop and go to stringsNotEqual
    # check string1 for a null. we know the strings match up to this point
    # so if one ends, they both end and we can safely jump to the exit label
    # increment esi and edi
    # jump back to the top of cmpLoop
stringsNotEqual:
    # set eax to 1
exit:
    # do all your regular exit code

This way, we account for the length of different strings because the null in one string will compare not equal to whatever is in the other string, and we will exit the loop. We also avoid resetting eax to 1 then to 0 in every iteration of the loop.

I haven't fixed it yet. Thanks for the pseudocode.
Yeah, I tried this method before. Not resetting %eax everytime but it wouldn't let me do it.
I think i mentioned it earlier. It wouldn't let me use %eax in a label I called
isEqual, there was always some random value in it. except for the function and exit.
I will try it again and see what happens.