Hi all, strange title so I'd better explain, starting with voice AI, which is what I'm interested in.

On Linux, nearly all audio DSP is standalone and practically always involves FFT -> function -> inverse FFT back to audio.
Often we have a chain of stages where the audio is converted via FFT and back to audio at every single step. For embedded devices, from Pis to ESPs, that's a really inefficient chain, and there are no modular open-source Linux tools for it — what exists tends to be either commercial or specific branded apps.

I was wondering if a simple GStreamer-esque FFT pipeline could be made that cuts out each stage's need to convert to the frequency domain and back again, reducing the load to purely the function itself.
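To make the idea concrete, here's a minimal pure-Python sketch (names and structure are entirely my own, not an existing tool) of a pipeline where every stage operates on spectral frames, so the forward FFT happens once on the way in and the inverse once on the way out:

```python
import cmath

def fft(x):
    # Recursive radix-2 FFT; len(x) must be a power of two.
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(spectrum):
    # Inverse via the conjugate trick; returns real samples.
    n = len(spectrum)
    y = fft([b.conjugate() for b in spectrum])
    return [v.conjugate().real / n for v in y]

def gain(db):
    # Example stage: flat gain applied to the spectrum.
    g = 10 ** (db / 20)
    def stage(spectrum):
        return [g * b for b in spectrum]
    return stage

def spectral_gate(threshold):
    # Example stage: zero any bin below a magnitude threshold.
    def stage(spectrum):
        return [b if abs(b) >= threshold else 0j for b in spectrum]
    return stage

def process(frame, stages):
    spectrum = fft(frame)           # one forward transform
    for stage in stages:
        spectrum = stage(spectrum)  # stages stay in the frequency domain
    return ifft(spectrum)           # one inverse transform at the end

frame = [1.0, 0.0, -1.0, 0.0] * 4   # 16-sample test tone
out = process(frame, [gain(-6.0), spectral_gate(0.5)])
```

The point is only the shape: stages compose spectrum-to-spectrum, so adding a module costs one function call rather than an extra FFT/iFFT round trip.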

Modules I am thinking of:
Delay-sum beamformer (not great, but low load with TDOA)
Echo cancellation (probably SpeexDSP)
MFCC
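For anyone unfamiliar, delay-sum really is as simple as it sounds: estimate a per-channel delay (e.g. from TDOA), shift each channel to align them, and average. A rough sketch, assuming integer-sample delays only (no fractional-delay interpolation, which a real beamformer would want):

```python
def delay_sum(channels, delays):
    """Align each channel by its arrival delay (in samples) and average.

    channels: equal-length sample lists, one per mic
    delays:   per-channel integer delays, e.g. from a TDOA estimate
    """
    n = len(channels[0])
    out = []
    for i in range(n):
        acc = 0.0
        count = 0
        for ch, d in zip(channels, delays):
            j = i + d          # read the sample that arrived d samples late
            if 0 <= j < n:
                acc += ch[j]
                count += 1
        out.append(acc / count if count else 0.0)
    return out
```

In-phase signal adds coherently while off-axis noise doesn't, which is why it's cheap but only mildly directional.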

My C skills are pretty non-existent, but I thought I would come on here and ask, as Python sucks for audio-rate chunking and makes really inefficient use of load.
Audio-wise and for voice AI I have a pretty good understanding of the requirements, and I'm just wondering if anyone has any interest in, or pointers on, creating a performant pipe.
I am thinking it's just a pipe on stdin/stdout, but I'm wondering about an initial header to set config such as channels, frame size, window and suchlike — hence asking on here.
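On the header idea: a tiny fixed-size binary header at the front of the stream would cover it. Purely as a sketch of one possible layout (the magic bytes, field order, and sizes here are my own invention, not a proposal for a standard):

```python
import struct

# Little-endian: 4-byte magic, u8 version, u8 channels,
# u16 window-type id, u32 frame size (samples), u32 sample rate (Hz)
HEADER_FMT = "<4sBBHII"
HEADER_SIZE = struct.calcsize(HEADER_FMT)   # 16 bytes

def pack_header(channels, frame_size, window_id, sample_rate):
    return struct.pack(HEADER_FMT, b"FFTP", 1, channels,
                       window_id, frame_size, sample_rate)

def unpack_header(data):
    magic, version, channels, window_id, frame, rate = \
        struct.unpack(HEADER_FMT, data[:HEADER_SIZE])
    if magic != b"FFTP":
        raise ValueError("not an FFT pipe stream")
    return {"version": version, "channels": channels,
            "window": window_id, "frame": frame, "rate": rate}
```

Each stage would read the header once from stdin, forward it unchanged to the next stage, then stream spectral frames — so the whole chain self-configures from the first 16 bytes.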