The default stack size on Linux is typically 8 MB, but you can get away with setting it much lower (even 512 KB), so your thread stacks should really be something comparable. Which brings me to the important point: they shouldn't be carved out of main's stack. They should be malloc'd on the heap.
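Here is a minimal sketch of a heap-allocated thread stack. It assumes your threads are built on the POSIX ucontext routines (getcontext, makecontext, swapcontext) as found on Linux/glibc. That is a guess on my part, and the names run_on_heap_stack, worker and THREAD_STACK_SIZE are made up for illustration. The stack is a plain malloc'd block handed to the context through uc_stack. The function returns 1 if the worker's local variable really did end up inside that block rather than on main's stack.

```c
#include <stdint.h>
#include <stdlib.h>
#include <ucontext.h>

/* Hypothetical size: 512 KB, the lower figure mentioned above. */
#define THREAD_STACK_SIZE (512 * 1024)

static ucontext_t caller_ctx, worker_ctx;
static uintptr_t worker_local_addr; /* address of the worker's local */

static void worker(void)
{
    int local = 0;
    worker_local_addr = (uintptr_t)&local; /* record where locals live */
    /* Returning here resumes uc_link, i.e. caller_ctx. */
}

/* Runs worker() on a freshly malloc'd stack of stack_size bytes.
 * Returns 1 if the worker's local lived inside that heap block,
 * 0 if it did not, and -1 on error. */
int run_on_heap_stack(size_t stack_size)
{
    char *stack = malloc(stack_size); /* the stack comes from the heap */
    if (stack == NULL)
        return -1;

    if (getcontext(&worker_ctx) == -1) {
        free(stack);
        return -1;
    }
    worker_ctx.uc_stack.ss_sp = stack;
    worker_ctx.uc_stack.ss_size = stack_size;
    worker_ctx.uc_link = &caller_ctx; /* where to go when worker returns */
    makecontext(&worker_ctx, worker, 0);

    if (swapcontext(&caller_ctx, &worker_ctx) == -1) {
        free(stack);
        return -1;
    }

    /* The worker has returned, so its stack can be released now. */
    uintptr_t lo = (uintptr_t)stack;
    int inside = worker_local_addr >= lo && worker_local_addr < lo + stack_size;
    free(stack);
    return inside;
}
```

Calling run_on_heap_stack(THREAD_STACK_SIZE) should return 1. The block has to stay allocated for as long as the context can still run, which is why it is freed only after the worker has returned.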
On the topic of stacks, I'm sure your counting problem in the synchronized version comes from this: the data in both "stacks" is identical except for the value of an internal counter (e.g., in my code, that would be i). You have developed your code around the empirical results of that, because you actually have only one stack, so thread A's local variables are also thread B's local variables.
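To make the one-stack problem concrete, here is a sketch of two cooperative threads that each keep their counter in a local i and yield after every increment. It again assumes ucontext, and the names counter, run_two_counters and STACK_SIZE are hypothetical. Because each thread gets its own malloc'd stack, the two i variables are different memory, so both counts come out equal to the requested number of iterations. If both contexts pointed at the same stack region, their frames would overlap. That is undefined behaviour, and the counts stop making sense.

```c
#include <stdlib.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024) /* hypothetical; plenty for this demo */

static ucontext_t sched_ctx;
static ucontext_t thread_ctx[2];
static int results[2];
static int finished[2];
static int g_iterations;

static void counter(int id)
{
    int i; /* local: lives on this thread's own heap-allocated stack */
    for (i = 0; i < g_iterations; i++)
        swapcontext(&thread_ctx[id], &sched_ctx); /* yield to scheduler */
    results[id] = i;
    finished[id] = 1;
    /* Returning resumes uc_link, i.e. sched_ctx. */
}

/* Runs two cooperative threads round-robin, each counting to iterations.
 * Returns 0 on success, -1 on allocation or context errors. */
int run_two_counters(int iterations, int *out_a, int *out_b)
{
    void *stacks[2] = {NULL, NULL};
    g_iterations = iterations;

    for (int id = 0; id < 2; id++) {
        stacks[id] = malloc(STACK_SIZE); /* one separate stack per thread */
        if (stacks[id] == NULL || getcontext(&thread_ctx[id]) == -1) {
            free(stacks[0]);
            free(stacks[1]);
            return -1;
        }
        thread_ctx[id].uc_stack.ss_sp = stacks[id];
        thread_ctx[id].uc_stack.ss_size = STACK_SIZE;
        thread_ctx[id].uc_link = &sched_ctx;
        makecontext(&thread_ctx[id], (void (*)(void))counter, 1, id);
        finished[id] = 0;
    }

    /* Switch between the threads until both have returned. */
    while (!finished[0] || !finished[1]) {
        for (int id = 0; id < 2; id++)
            if (!finished[id])
                swapcontext(&sched_ctx, &thread_ctx[id]);
    }

    *out_a = results[0];
    *out_b = results[1];
    free(stacks[0]);
    free(stacks[1]);
    return 0;
}
```

With run_two_counters(5, &a, &b), both a and b should come out as 5. Each thread's i survives every switch untouched because nothing else writes to its stack.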
Like I said, think logically about how your count accumulates (and hopefully check with your prof about posting your code on this forum specifically! Life is much easier that way, with less need for guessing).