But how come I've managed to see quite often a negative value returned from QueryPerformanceCounter()?

Couldn't answer to this one since i don't know the QueryPerformanceCounter() function and i don't want to look it out. I already heard about it, but i know one "easier" way to acces to clock cycle counters on IA-32 / Intel 64 architecture processor. QueryPerfomanceCounter looks too ankward for me.

Foxman correctly claimed before it will take about 40 days to a fast machine to reach to the end of a UINT64.

In fact, what i said is that it would take 40 days to a 2.5 Ghz machine to complete 2^53 clock cycles. It would take ~234 years for that same machine before completing 2^64 clock cycles.

Should I consider the option that the counter reached to the end of its cycle and turns negative all the sudden?...

Even if the counter had chance to overflow, it wouldn't make a difference. (if the number returned by the counter is a unsigned number)

That said, you might want to read a bit on how computer represent numbers and how it does arithmetic operation (that said, i have no good link to refer you; maybe you could look on wikipedia and/or google).

Basically, if you substract 2 unsigned integer, the result will always be equal to the difference between the 2 numbers.

This program may convince you (may not compile if you aren't using some Microsoft compilers):

Code:

#include <stdio.h>
int main()
{
unsigned _int16 u1 = 0x0010;
unsigned _int16 u2 = 0xFFF0; // Max value of a unsigned _int16 is 0xFFFF
unsigned _int16 r1;
r1 = u1 - u2;
printf("%04X - %04X = %04X\n", (unsigned int)u1, (unsigned int)u2, (unsigned int)r1);
return 0;
}

Max value for 64-bits unsigned integer is 2^64 - 1. Like max value of 8-bits unsigned integer is 2^8 - 1.