That's the beauty of it... it spans so many different fields and brings together people from so many different backgrounds!
If you mean taking variations in air pressure in the audio range, converting them to electrical form through a transducer, quantizing that signal in time and amplitude, processing it digitally via a DFT, extracting phonemes from the formant shifts and so forth, and then working out where these groups of phonemes fit into a lexical database, building syntactic structures and beyond, then yes, that's what it is.
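For what it's worth, the first few stages of that chain (quantizing in time, quantizing in amplitude, then a DFT) can be sketched in a few lines of Python. This is only a toy under assumed values: the 8 kHz sample rate, 64-sample frame, 8-bit depth, and all function names are made up for illustration, and a pure sine stands in for the transducer output. Phoneme extraction and everything after it is not attempted here.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples per second (assumed value)
N = 64              # samples per analysis frame (assumed value)
LEVELS = 256        # 8-bit amplitude quantization (assumed value)


def sample_tone(freq_hz, n=N, rate=SAMPLE_RATE):
    """Quantize in time: evaluate a sine 'pressure wave' at discrete instants."""
    return [math.sin(2 * math.pi * freq_hz * k / rate) for k in range(n)]


def quantize(samples, levels=LEVELS):
    """Quantize in amplitude: map [-1, 1] onto integer codes 0..levels-1."""
    return [min(levels - 1, int((s + 1) / 2 * levels)) for s in samples]


def dft_magnitudes(codes, levels=LEVELS):
    """Naive O(n^2) DFT of the re-centred codes; magnitudes of bins 0..n/2."""
    n = len(codes)
    # Map each integer code back to the midpoint of its amplitude step.
    x = [(c + 0.5) / levels * 2 - 1 for c in codes]
    return [
        abs(sum(x[k] * cmath.exp(-2j * math.pi * b * k / n) for k in range(n)))
        for b in range(n // 2 + 1)
    ]


def peak_frequency(freq_hz):
    """Run the toy chain and return the frequency of the strongest non-DC bin."""
    mags = dft_magnitudes(quantize(sample_tone(freq_hz)))
    peak_bin = max(range(1, len(mags)), key=mags.__getitem__)
    return peak_bin * SAMPLE_RATE / N
```

With these settings each DFT bin is 125 Hz wide, so a 1000 Hz tone lands exactly on bin 8 and `peak_frequency(1000)` recovers 1000.0. Real speech would need windowing and overlapping frames, and a practical system would use an FFT rather than this naive sum.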
We should work together!