I'd do two things:
1. Use an oscilloscope/logic analyzer to view the waveform.
2. Use single bits to indicate different timing parameters, e.g.
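Something along these lines, as a rough sketch rather than a drop-in: a second data bit is raised just before the wait and dropped just after it, so the scope shows how long the wait really takes. The names `SIGNAL_BIT`, `MARKER_BIT`, `port_write`, `set_bits` and `clear_bits` are made up for illustration, and `port_write` is only a stand-in for the real port access (on Linux x86 that would typically be `outb()` to the data register after `ioperm()`, with `0x378` being the usual but not guaranteed base address).

```c
#include <stdint.h>

/* Bit masks for the parallel-port data register (assumed layout). */
#define SIGNAL_BIT 0x01u /* Data0: the output the frequency analyzer watches */
#define MARKER_BIT 0x02u /* Data1: high while the code is inside the wait */

/* Last value written to the data register, so that changing one bit
 * never disturbs the other. */
static uint8_t port_shadow;

/* Hypothetical stand-in for the real port write. On real hardware this
 * would also do something like outb(value, 0x378); here it only records
 * the value so the bit logic can be checked without a parallel port. */
static void port_write(uint8_t value)
{
    port_shadow = value;
}

/* Raise the bits in mask, leaving all other bits unchanged. */
static void set_bits(uint8_t mask)
{
    port_write((uint8_t)(port_shadow | mask));
}

/* Lower the bits in mask, leaving all other bits unchanged. */
static void clear_bits(uint8_t mask)
{
    port_write((uint8_t)(port_shadow & ~mask));
}

/* Intended use around the suspicious wait:
 *
 *     set_bits(MARKER_BIT);      // Data1 goes high: wait starts
 *     nanosleep(&delay, NULL);
 *     clear_bits(MARKER_BIT);    // Data1 goes low: wait is over
 */
```

The width of the pulse on Data1 is then the real duration of the wait, and its jitter can be compared directly against the jitter on Data0.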
This assumes that your frequency analyzer is on Data0 of the parallel port (from memory, that's pin 2, but I could be completely wrong on that). Connect a scope to Data1 (which would be pin 3 if pin 2 above is correct) and see what it does and how it varies.
// Wait a bit more. This should not make effect, but makes.
I would also look at the variation on Data0, as I suspect you'll find that _WITH_ the nanosleep there is more variation than without, because nanosleep calls schedule, which means that your current process gives up the rest of its timeslice and only wakes up again whenever the scheduler decides it has to run.
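If you don't have a scope to hand, you can get a rough idea of that effect in software. This is only a sketch under my own assumptions: `elapsed_ns` and `sleep_overshoot_ns` are hypothetical helper names, and it measures with `clock_gettime(CLOCK_MONOTONIC, ...)`, which adds some overhead of its own, so treat the numbers as indicative rather than exact.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Nanoseconds between two timestamps (end is expected not to precede start). */
static int64_t elapsed_ns(struct timespec start, struct timespec end)
{
    return (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL
         + (int64_t)(end.tv_nsec - start.tv_nsec);
}

/* Sleep for requested_ns (must be below one second) and store in *overshoot
 * how much longer than requested the call actually took.
 * Returns 0 on success, -1 if the clock read or the sleep failed
 * (for example because a signal interrupted the sleep). */
static int sleep_overshoot_ns(long requested_ns, int64_t *overshoot)
{
    struct timespec req = { .tv_sec = 0, .tv_nsec = requested_ns };
    struct timespec before, after;

    if (clock_gettime(CLOCK_MONOTONIC, &before) != 0)
        return -1;
    if (nanosleep(&req, NULL) != 0)
        return -1;
    if (clock_gettime(CLOCK_MONOTONIC, &after) != 0)
        return -1;

    *overshoot = elapsed_ns(before, after) - requested_ns;
    return 0;
}
```

Calling `sleep_overshoot_ns` in a loop and looking at the spread of the results (minimum, maximum) should show the same kind of jitter the scope would show on Data0, just with less precision.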