Well, I've analysed the code profiling data. All functions finish in about 7.38 sec, but the memory allocation/freeing, which is implemented by macros, has not been measured. So I guess that's where the problem is.
Somehow the memory management of gcc+cygwin1.dll is much faster.
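Macros are expanded inline, so a function-level profiler has nothing to attribute their time to. One way to check the allocator on its own is a small microbenchmark that you build and run once with each toolchain. This is only a sketch under assumptions: `time_malloc_free` is a hypothetical helper, it uses plain `malloc`/`free` rather than the actual macros, and the iteration count and block size are placeholders you would tune to match the real program.

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: performs `iterations` malloc/free pairs of
   `block_size` bytes and returns the elapsed CPU time in seconds as
   reported by clock(). Returns -1.0 if an allocation fails. */
double time_malloc_free(size_t iterations, size_t block_size)
{
    clock_t start = clock();
    for (size_t i = 0; i < iterations; ++i) {
        void *p = malloc(block_size);
        if (p == NULL)
            return -1.0;               /* allocation failed */
        ((volatile char *)p)[0] = 1;   /* touch the block so the pair is not optimized away */
        free(p);
    }
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling it with, say, `time_malloc_free(10000000, 64)` from a `main` under both builds would show whether the allocator accounts for the gap. Alternatively, turning the macros into real (non-inlined) functions would let the profiler report them directly.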