1. ## A question about handling Fourier transform results

Hey y'all,

Been years since I've posted here. Anyway, let's see if anybody can help me take a crack at this problem I'm having. It's programming related, but not code related nor C++ related (I'm actually using Matlab for this specific task), so I figured the GD board would be appropriate.

I have an application that measures a force transducer at a 500 Hz sampling rate. Here is an example image of 5 seconds of data (2500 samples) from the force transducer:

Now, to give you an idea of what I am looking at, each 5-second sampling period is a "trial", and I have thousands of trials. Literally thousands. I would like to examine the frequency spectrum of these trials, but more specifically I really only want to look at the frequency spectrum while the force is above 35 grams (in other words, while something is actually happening).

This is easy enough. I can very easily dissect out all portions of every trial where the force is above 35 grams, run each of these through a Fourier transform, and get everything in the frequency domain. For the sake of brevity and clarity, from now on I will refer to each of these "above-35-gram-segments" as a "pull" (i.e. I am applying a pull force on the force transducer).

The difficult part comes when I want to average these together to get an idea of what the "average frequency spectrum" looks like. Because each pull is a different duration, the frequency binning of the FFT will be different. A 500 Hz signal with 500 samples will have FFT bins of 1 Hz each. A 500 Hz signal of 100 samples will have FFT bins of 5 Hz each. So on and so forth. For the sake of accuracy, I am establishing a lower-bound of 200 ms pulls (5 Hz frequency bins in the resulting FFT). There is no upper bound, so a 500 Hz signal with 1000 samples could have 0.5 Hz frequency bins.

Anyway, I know how to convert from FFT bins to real frequencies, that's not difficult. But what I don't know is how to do it a semi-continuous manner.

For example, let's say I have a 112 sample pull (4.46 Hz frequency bins). I'd like "stretch" or "compress" the FFT result to fit on a scale of 1-50 Hz (I don't really need anything above 50 Hz, so I'm just gonna trash anything above). In this case I presume that the proper method would be to take the amplitude of the first bin (which represents from 0 Hz to 4.46 Hz), and divide the amplitude by the binning amount. The resulting amount of power would then be evenly distributed to the real frequencies:

Real 0 Hz = FFT Bin 1 * (1/4.46)
Real 1 Hz = FFT Bin 1 * (1/4.46)
Real 2 Hz = FFT Bin 1 * (1/4.46)
Real 3 hz = FFT Bin 1 * (1/4.46)
Real 4 Hz = FFT Bin 1 * (0.46/4.46) + FFT Bin 2's contribution to this frequency

This seems like it could get very complex very fast, and there is a lot of room for error.

Am I off my rocker trying to do this? I can't find any examples of people doing this on the internet, basically I only see examples of people doing conversions of single bins to single frequencies, and back.

I would like to scale my FFT result to the corresponding real frequency spectrum all while keeping the POWER constant (the power shouldn't change as a result of doing this operation), so that I can average the results of all of these FFTs together.

SO, if any of you have any suggestions on how to approach this problem, I'd love to hear your ideas!

2. >_<

I'm so frustrated. I've spent ten minutes trying to find an appropriate clip of "Futurama" to make a joke about my poor relationship with mathematics, but I just can't seem to find the right moment, yet I've already spent too much time on a silly joke so can't feel right if I don't share the splendor of Bender .

[Professor Fansworth activates a projector showing several equations.]
Bender: What'd ya got there? Numbers?

Soma ;_;

Originally Posted by DavidP
I really only want to look at the frequency spectrum while the force is above 35 grams
Are you doing this correctly? It sounds like you are not.

You see, if you simply keep the samples that exceed 35 grams, and concatenate them together as if they formed a continous sampling run, a discrete Fourier transform of that data does not reflect the correct spectrum, because time-wise the samples kept are not consecutive.

Instead, you should just zero out all samples below 35 grams. This way they do not affect the spectrum. (If you feel unsure about that, just consider the DFT definition (Wikipedia). Zero samples xn have no effect on the result, as if they were never even considered. As long as the nonzero samples indexes n are unchanged, that is.)

This also means all your sample runs will have the exact same number of samples, and the exact same duration per sample, and thus discrete transforms with the exact same binning.

If the timing of the sample run is not perfect (start signal not perfect with respect to whatever is done to the measured object), I would also shift each sampling run time-wise by moving all leading zeros to the tail, so that the very first sample is the first sample that exceeded 35 grams in the run. This does not affect the power frequency spectrum (magnitudes of the complex number describing each frequency), but it does affect the phase of the transform (phase of the complex numbers) -- it basically synchronizes the phases, since the data always starts at the first effect seen. It should make the later comparisons between sample runs easier.

4. Originally Posted by phantomotap

[Professor Fansworth activates a projector showing several equations.]
Bender: What'd ya got there? Numbers?

Soma ;_;
One of my lecturers does that.

I don't really care whether that is what most mathematics lecturers do but really if you *are* teaching math you need a pen and a whiteboard

5. Nominal Animal,

Just because the force is below 35 grams doesn't mean it is zero. Certainly I could zero-out everything that is below 35 grams and take an FFT of the entire 2500 samples, but that is actually not what I'd like to do for my purposes.

I also don't want to keep each "pull" from a trial together in one FFT. I'd like to separate them because I'd like to look at their frequency spectrums individually. Take this "trial" for example, which has two "pulls":

It may even have 4 pulls, depending on if those dips drop below 35 grams or not. Anyway, that's besides the point.

One thing that I would like to investigate is whether pulls of different durations have changes in their frequency spectrums, i.e. if I am pulling for a longer amount of time, my muscle could begin to fatigue, and this could introduce more high-frequency tremor into the signal. Therefore, it would be interesting to look at very long pulls vs shorter pulls and see if there is more power in certain frequency bands associated with that.

6. Originally Posted by Aslaville
if you *are* teaching math you need a pen and a whiteboard
I'm old school. I like chalk and a blackboard.

7. Originally Posted by Nominal Animal
Instead, you should just zero out all samples below 35 grams. This way they do not affect the spectrum. (If you feel unsure about that, just consider the DFT definition (Wikipedia). Zero samples xn have no effect on the result, as if they were never even considered. As long as the nonzero samples indexes n are unchanged, that is.)
Ah crap, just figured out that you're right. I could dissect out each pull above 35 grams, and then simply zero-pad them all to each be 2500 samples.

8. Originally Posted by DavidP
I also don't want to keep each "pull" from a trial together in one FFT.
So you mean, the first sample that exceeds 35 grams actually starts a new event, which you want to handle separately? You just happen to have the data chunked in five-second chunks, where no pull event crosses the five-second boundary?

No problem. Cut each sequence delimited by < 35 gram samples (not including the delimiting samples) to a new dataset, and pad with zeroes to the maximum length (of an event), and use a fixed-size DFT.

I've used FFTW from C, and if you gather wisdom first for the window size -- that is, precalculate the optimum method to compute each DFT -- the DFTs are very, very fast.

I think I could write you an example implementation, if you're interested, although I don't use Windows. I prefer Linux, although the code should work just fine on a Mac, too. I can write it so that it should compile in Windows, but I cannot test or guarantee it. It would be trivial for the program to consume the entire dataset at once, do the windowing (event detection), and poop out each event as a separate text file containing the source description (which file and which samples taken), plus the DFT, of course.

Originally Posted by DavidP
One thing that I would like to investigate is whether pulls of different durations have changes in their frequency spectrums, i.e. if I am pulling for a longer amount of time, my muscle could begin to fatigue, and this could introduce more high-frequency tremor into the signal.
Quite interesting. Do you have the associated applied power data, known or sampled reliably in tandem with the force measurements?

Based on the materials physics alone, I'd check carefully the high-frequency characteristics of consecutive pull events with the same applied power curve. Without knowing the materials involved, it is difficult to even guess whether your sample rate is in the right ballpark; if this is some sort of bimetal thingamajic, the fatigue-related high-frequency vibrations could be several orders of magnitude higher than your sample rate. You could consider adding an acoustic sensor -- say, a piezo -- measuring those in parallel, if this happened to be the case, just to check if such frequencies are present. For soft materials, the frequencies are much lower, of course.

9. Originally Posted by DavidP
Ah crap, just figured out that you're right. I could dissect out each pull above 35 grams, and then simply zero-pad them all to each be 2500 samples.
Right. I didn't initially understand that you had multiple separate events in each file.

A simple program (even some crafty awk scripts) could do that for you automatically, if you happen to use a nice OS.

10. Man, Fourier transforms are one of the things I immediately forgot from school. I swear, the only thing I remember from physics is linear algebra and bra-ket stuff but that's only because I could use it as a vehicle for jokes.

"Professor, I just thought I'd let you know that I've never had trouble with a bra until I started your course in quantum mechanics."

11. This is a very difficult problem because it is not defined very well. The very idea of a spectrum depends on the idea of periodicity. These events are clearly not periodic. Therefore, any idea of "spectrum" you construct will be artificial to a great degree.

If you are not interested in phase information, but merely want to compare the spectral shape between pulses, then I would just do each pulse at a different FFT size, compute the magnitude spectra, and then interpolate the data to normalize the bin locations.

EDIT: Reading your later post, you probably have no use for the phase information in the pulses, so my suggestion will probably work well for you.

12. Originally Posted by brewbuck
This is a very difficult problem because it is not defined very well. The very idea of a spectrum depends on the idea of periodicity. These events are clearly not periodic. Therefore, any idea of "spectrum" you construct will be artificial to a great degree.
I disagree.

Consider the concept of wave packets. They are by definition nonperiodic, localized to a small region.

Each wave packet consists of an infinite set of component sinusoidal waves (which DO extend to infinity, even if the wave packet itself is localized), whose amplitudes and phases form the spectrum of the wave packet. Although the spectrum is technically infinite, it too is "localized" to a small region of frequencies (due to the symmetry between Fourier and inverse Fourier transforms).

These wave packets and their spectra are very useful in physics. Not just in quantum physics, either; also for materials physics in stuff like lattices (vibrations, acoustic waves, etc.), phonons, and so on.

Here, the application of power rapidly deforms the "muscle" material. At the molecular and atomic level, the fastest changes are related to the highest frequencies extant in the wave packet spectrum. Thus, to me, it sounds very sensible and physically motivated, to measure the spectrum of the wave packet, and compare successive spectra to find out what kinds of effects repeated activation has on the performance of the material.

However, I do agree with you that the straightforward discrete Fourier transform of each data set does not exactly represent the wave packet spectrum. (I think. I haven't actually examined the math related to discrete Fourier transforms of (discrete-time) wave packets.) Exactly how it should be calculated, and what the effects of the various choices are, are an interesting topic itself -- similar to windowing choices when dealing with periodic signals.

For the task at hand, the un-interpreted raw dft results might be sufficient for estimating the repeated performance characteristics of the device, as the intent is to qualitatively and quantitatively compare the events, not to physically describe each individual event as accurately as possible.

13. Originally Posted by Nominal Animal
Each wave packet consists of an infinite set of component sinusoidal waves (which DO extend to infinity, even if the wave packet itself is localized), whose amplitudes and phases form the spectrum of the wave packet. Although the spectrum is technically infinite, it too is "localized" to a small region of frequencies (due to the symmetry between Fourier and inverse Fourier transforms).
We are talking here about a discrete Fourier transform, which absolutely assumes periodicity of the signal. When you extract one of these pulses and compute the FFT, the result is what the spectrum would look like if that signal was repeated infinitely. It's not exactly unrepresentative of the signal, but it's not the actual spectrum of that pulse alone.

A wave packet has a spectrum indeed. You just aren't going to compute it using an FFT.

14. Originally Posted by brewbuck
We are talking here about a discrete Fourier transform, which absolutely assumes periodicity of the signal. When you extract one of these pulses and compute the FFT, the result is what the spectrum would look like if that signal was repeated infinitely. It's not exactly unrepresentative of the signal, but it's not the actual spectrum of that pulse alone. A wave packet has a spectrum indeed. You just aren't going to compute it using an FFT.
No. You can use a discrete Fourier transform (DFT) to sample the discrete-time Fourier transform (DTFT) of a finite signal, too. You're claiming that DFT can only be used to exactly describe the discrete-time Fourier transform of a periodic signal.

As weird as it sounds, a discrete Fourier transform provides the frequency-domain coefficients for both a periodic and nonperiodic signals.

You don't need to take my word for it, you can easily verify this yourself numerically. (I'm not a mathematician, and might get the terms wrong now and then, and that's why I prefer practical proof over theoretical mumbo-jumbo.)

Generate some N samples f0,f1,...,fN-2,fN-1.
Apply DFT to get coefficients F0,F1,...,FN-2,FN-1. Remember that if f is real, FN-k=Fk*, where * indicates complex conjugate.
Append N(M-1) zeros to f to get zero-padded sample sequence g0=f0,...,gN-1=fN-1, gN=0,...,gNM-1=0.
Apply DFT to get coefficients G0,...,GNM-1. Again, if f was real, then g is real too, and GNM-k=Gk*.
You will find that GMi=Fi.

The mathematical reason for this is that although DFT describes the discrete-time Fourier transform of a periodic signal, it also provides the uniformly sampled samples of a continuous discrete-time Fourier transform of a finite-length sequence. (For example, the discrete Fourier transform Wikipedia page mentions this exactly.)

We can rely on the Nyquist-Shannon sampling theorem, to tell us that these samples are enough to describe the frequency content in the original nonperiodic signal.

We could approximate the DFT of a N-sample wave packet followed by N(M-1) zeros (both N and M being positive integers) by taking the DFT of the N samples, and then resampling that to NM samples (essentially treating the complex DFT coefficients as time-domain data for the purposes of resampling).

Thus, a DFT of a nonperiodic signal does describe the spectrum of that signal. The logic path is a bit different than describing the spectrum of a periodic signal; the math just turns out exactly the same.

It took me a while to write this answer, because I wanted to make sure I'm not talking out of my butt. Here is an example C program (uses the fftw3 library) I used for verification:
Code:
```#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <stdio.h>
#include <math.h>
#include <complex.h>
#include <fftw3.h>

static uint64_t prng64(uint64_t x)
{
x ^= x >> 12;
x ^= x << 25;
x ^= x >> 27;
return x * (uint64_t)2685821657736338717ULL;
}

static double f_noise(const double v)
{
uint64_t x = (uint64_t)(1099511627776.0 * v);
int n = 16;

while (n-->0)
x = prng64(x);

return 1.0 - (x & (uint64_t)2199023255551ULL) / 1099511627776.0;
}

static double f_default(const double x)
{
const double r = x * 3.14159265358979323864;
return sin(r) * sin(20.0*r);
}

static double f_sawpulse(const double x)
{
const double w = 16.0 * x;
return x*x*(x*x - x - x + 1.0) * 32.0 * ((w - (long)w) - 0.5);
}

static double f_pulse(const double x)
{
const double r = x * 3.14159265358979323864;
return sin(r)*sin(r) * sin(15.1*r);
}

int main(int argc, char *argv[])
{
double      (*sample)(const double) = f_default;
double       *s1, *sn;
fftw_complex *c1, *cn;
fftw_plan     p;
long          n, w, c, i;
char          dummy;

if (argc < 3 || argc > 4) {
fprintf(stderr, "\n");
fprintf(stderr, "Usage: %s SAMPLES WAVEPACKETS [ WAVEFORM ]\n", argv[0]);
fprintf(stderr, "\n");
fprintf(stderr, "Waveforms: (t=0..1)\n");
fprintf(stderr, "       default    sin(pi*t)*sin(20*pi*t)\n");
fprintf(stderr, "       pulse      sin(2*pi*t)*sin(2*pi*t)*sin(15.1*pi*t)\n");
fprintf(stderr, "       sawpulse   t*t*(16*t*t-32*t+16)*((16*t) mod 1.0)\n");
fprintf(stderr, "       noise      Xorshift*-based noise\n");
fprintf(stderr, "\n");
return EXIT_FAILURE;
}

if (argc > 3) {
if (!strcmp(argv[3], "sawpulse"))
sample = f_sawpulse;
else
if (!strcmp(argv[3], "noise"))
sample = f_noise;
else
if (!strcmp(argv[3], "pulse"))
sample = f_pulse;
else
if (!strcmp(argv[3], "default"))
sample = f_default;
else {
fprintf(stderr, "%s: Unknown waveform.\n", argv[3]);
return EXIT_FAILURE;
}
}

if (sscanf(argv[1], " %ld %c", &n, &dummy) != 1 || n < 1L) {
fprintf(stderr, "%s: Invalid number of samples.\n", argv[1]);
return EXIT_FAILURE;
}
if (sscanf(argv[2], " %ld %c", &w, &dummy) != 1 || w < 1L || w >= n) {
fprintf(stderr, "%s: Invalid number of wave packets.\n", argv[2]);
return EXIT_FAILURE;
}

if (n % w) {
fprintf(stderr, "The number of samples is not a multiple of wave packets.\n");
return EXIT_FAILURE;
} else
c = n / w;

s1 = fftw_malloc((size_t)c * sizeof *s1);
sn = fftw_malloc((size_t)n * sizeof *sn);
c1 = fftw_malloc((size_t)c * sizeof *c1);
cn = fftw_malloc((size_t)n * sizeof *cn);
if (!s1 || !sn || !c1 || !cn) {
fprintf(stderr, "Not enough memory.\n");
return EXIT_FAILURE;
}

/* Initialize one wave packet to s1 and sn. */
for (i = 0L; i < c; i++)
s1[i] = sn[i] = sample((double)i / (double)c);

/* sn contains zeroes following the first wave packet. */
for (i = c; i < n; i++)
sn[i] = 0.0;

/* Clear c1 and cn. */
for (i = 0; i < c; i++)
c1[i] = 0.0;
for (i = 0; i < n; i++)
cn[i] = 0.0;

p = fftw_plan_dft_r2c_1d(c, s1, c1, FFTW_ESTIMATE);
fftw_execute(p);
fftw_destroy_plan(p);

p = fftw_plan_dft_r2c_1d(n, sn, cn, FFTW_ESTIMATE);
fftw_execute(p);
fftw_destroy_plan(p);

/* Fill in c1 and cn. */
for (i = 1L; i < c-i; i++)
c1[c - i] = conj(c1[i]);
for (i = 1L; i < n-i; i++)
cn[n - i] = conj(cn[i]);

printf("#i f[i]   |F%ld[i]| Re(F%ld[i]) Im(F%ld[i])   |F[i]| Re(F[i]) Im(F[i])   i/%ld\n", w, w, w, w);

for (i = 0; i < n; i++)
if (i % w == 0)
printf("%.9f %.9f  %.9f %.9f %.9f  %.9f %.9f %.9f  %.9f\n",
(double)i / (double)n, sn[i],
cabs(cn[i]), creal(cn[i]), cimag(cn[i]),
cabs(c1[i/w]), creal(c1[i/w]), cimag(c1[i/w]), (double)(i/w) / (double)n);
else
printf("%.9f %.9f  %.9f %.9f %.9f\n",
(double)i / (double)n, sn[i],
cabs(cn[i]), creal(cn[i]), cimag(cn[i]));

return EXIT_SUCCESS;
}```
The sawpulse example wave packet is a sawtooth waveform modulated by a fouth-order polynomial,

Compare its DFT to the DFT when the sample count is tripled by padding with zeroes:

The above is just a detail around the first three frequency peaks, but all of the data matches the same way. (Sawtooth waveforms have those harmonics.)

The pulse example wave packet is 7.55 sine waves (15.1 halfwaves) modulated by one sine half-wave,

Compare its DFT to the DFT when the sample count is sevenfold (by adding six times as many zeroes to the wave packet samples):

There is only one peak (and its complex conjugate at the other end of the array) -- obviously centered on 7.55, the frequency of the sine wave used --, so the above view is centered on that. All of the data matches the same way.

The noise example wave packet is roughly white noise. If we square the number of samples, appending N(N-1) zero samples, we can see especially clearly how the DFT or the initial noise samples compare to the DFT of the zero-padded samples:

Now, increasing the number of samples -- even by adding zeroes -- increases the frequency sensitivity of the transform, and that's why the zero-padded functions' DFTs are smoother -- the noise being an extreme example.

Fortunately, the Nyquist-Shannon sampling theorem tells us that even if the DFT of the original N samples of the nonperiodic function yields a pretty crude visual curve, they do contain enough information to completely regenerate the original function (samples), and therefore does represent the spectrum of the original nonperiodic signal.

For example, if you took the N-sample DFT of some nonperiodic function, and resampled it to M > N samples (basically treating the DFT coefficients as time-domain complex data), a final IDFT would yield M samples where the initial N samples approximate the original function, and the M-N last samples are (near) zero. There would be a bit of distortion as is expected when resampling is done, but the frequency characteristics of the nonperiodic function remain.

15. No responses? Not even "oh, that's right"? I spent several hours to explain and set things straight, and nobody cares?

Duck you guys, I'm going for a beer. You're no fun.