Have you tried setvbuf? It lets you control how a stream is buffered. By default, stdin and stdout are usually line buffered only when attached to a terminal and fully buffered when redirected to a file or pipe, but in theory full buffering with a buffer the same size as the disk block should be the most efficient.
Code:
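#include <stdio.h>

int main(void)
{
    /* static: the buffer must outlive the stream, which is closed after main returns */
    static char mybuf[4096];
    int c;

    /* setvbuf must be called before any other operation on the stream */
    setvbuf(stdin, mybuf, _IOFBF, sizeof mybuf);
    while ((c = fgetc(stdin)) != EOF) {
        putc(c, stdout);
    }
    return 0;
}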
Also, when you do timings you need a large test case to drown out the noise. You say your test takes 0.01 secs; at that scale the result is probably mostly noise and meaningless. Why not try test data of about 10-20 gigabytes, for example, and see how long it takes with full buffering vs. the default?
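If it helps, here is a rough sketch of how that comparison could be wired up. The helper name copy_with_mode and its parameters are made up for illustration, and it uses clock(), which reports CPU time rather than wall-clock time, so treat the numbers as a relative guide only.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper (not from the original post): copies `in` to `out`
   one character at a time using the requested buffering mode
   (_IOFBF, _IOLBF, or _IONBF) and returns the number of bytes copied,
   or -1 on error. The elapsed CPU time is stored in *seconds.
   `in` must be a freshly opened stream, because setvbuf has to be
   called before any other operation on it. */
long copy_with_mode(FILE *in, FILE *out, int mode, size_t size, double *seconds)
{
    /* Passing NULL lets the library allocate a buffer of the given size. */
    if (setvbuf(in, NULL, mode, size) != 0)
        return -1;

    clock_t start = clock();
    long count = 0;
    int c;

    while ((c = getc(in)) != EOF) {
        if (putc(c, out) == EOF)
            return -1;
        count++;
    }
    if (fflush(out) == EOF)
        return -1;

    /* clock() measures processor time, not elapsed real time. */
    *seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    return count;
}
```

You would fopen the big test file once per run, call this with _IOFBF and a 4096-byte size, then again on a fresh stream with _IONBF or the default, and compare the reported times. For real elapsed time you would want an OS-level timer instead of clock().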