int64_t (declared in <stdint.h> since C99) is the standard 64-bit integer type. __int64 is the Windows-specific (Microsoft compiler) implementation of that datatype. The code is only trivially Windows-dependent, i.e. change the datatype from __int64 to int64_t and it is standard C.
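If you would rather not edit every declaration, one way to hide the difference is a single typedef. This is only a sketch: the name i64 is made up for illustration, and it assumes that non-Microsoft compilers provide the C99 header <stdint.h>.

```c
/* i64 is a hypothetical alias, not a name used in the original code. */
#if defined(_MSC_VER)
typedef __int64 i64;      /* Microsoft-specific 64-bit integer */
#else
#include <stdint.h>       /* assumes a C99 (or later) compiler */
typedef int64_t i64;      /* standard 64-bit integer */
#endif
```

With that in place the rest of the code can use i64 everywhere and compile unchanged on both kinds of compiler.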
The code gains a huge performance increase from the root optimization versus calculating the root every iteration: it replaces the relatively slow sqrt() with a simple divide and compare. As for the value of using a Sieve of Eratosthenes (SoE) versus checking all odd numbers, it depends on several factors, but mostly on the speed of performing the calculations entirely in registers and the L1 cache versus making frequent trips to L2 and eventually uncached memory. Once the primes you are testing exceed the L2 cache available for storing the divisors, you will essentially be running at the speed of main memory, which is usually orders of magnitude slower than my code, which stays in registers and L1 and so always runs at synchronous speeds. Aside from that, this code is designed to illustrate a solution, not to be entered into the hall of fame for most optimized code ever.
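For anyone who wants to see the divide-and-compare idea in isolation, here is a minimal sketch. It is not the code under discussion: the function name is_prime_trial is made up, and it uses the standard int64_t rather than __int64. The loop condition d <= n / d is true exactly when d * d <= n, so no sqrt() call is needed and the product cannot overflow.

```c
#include <stdint.h>

/* Hypothetical helper for illustration: trial division by odd numbers,
   stopping once the divisor passes the square root of n. */
static int is_prime_trial(int64_t n)
{
    int64_t d;

    if (n < 2)      return 0;   /* 0, 1 and negatives are not prime */
    if (n < 4)      return 1;   /* 2 and 3 are prime */
    if (n % 2 == 0) return 0;   /* even numbers above 2 are not prime */

    /* d <= n / d is the divide-and-compare form of d <= sqrt(n). */
    for (d = 3; d <= n / d; d += 2) {
        if (n % d == 0)
            return 0;           /* found a divisor */
    }
    return 1;
}
```

Everything here lives in a handful of local variables, which is the point made above about staying in registers instead of walking a table of stored primes.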