I quickly hacked up a pipeline on my laptop involving PulseAudio's parec, my own FFT tool, and gnuplot. Sure enough, there seemed to be an (inaudible) audio signal at about 390 Hz, clearly visible in an approximately 10-second integration, which gives a frequency resolution of roughly 0.1 Hz (the reciprocal of the integration time). But I wasn't satisfied; I wanted to see the short-term spectra scrolling across the screen in real time. I ended up hacking into the night, but was ultimately frustrated by some GTK+ weirdness. (One needs to set a file descriptor to non-blocking mode when using g_io_add_watch.) I wasn't able to finish anything useful during my visit.
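The non-blocking detail in that parenthetical can be illustrated outside GTK+. The following is a minimal sketch, not the tool's actual code: it uses Python's standard `selectors` module as a stand-in for the GLib main loop, and the names `drain` and `demo` are hypothetical. The idea is that a readiness callback usually reads in a loop until nothing is left; on a blocking descriptor that final read would stall the whole UI, whereas on a non-blocking one it raises `BlockingIOError` and control returns to the event loop.

```python
import os
import selectors


def drain(fd, chunk_size=4096):
    """Read everything currently available from a non-blocking descriptor.

    Returns (data, eof). Hypothetical helper, for illustration only.
    """
    chunks = []
    eof = False
    while True:
        try:
            data = os.read(fd, chunk_size)
        except BlockingIOError:
            # Nothing more to read right now: hand control back to the loop.
            break
        if not data:
            # A zero-length read means the writer closed its end.
            eof = True
            break
        chunks.append(data)
    return b"".join(chunks), eof


def demo():
    """Push some fake audio bytes through a pipe and drain them once."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(read_fd, False)  # the crucial step

    sel = selectors.DefaultSelector()
    sel.register(read_fd, selectors.EVENT_READ)

    os.write(write_fd, b"\x00\x01" * 1000)  # 2000 bytes of stand-in samples

    received = b""
    for key, _events in sel.select(timeout=1.0):
        data, _eof = drain(key.fd)
        received += data

    sel.close()
    os.close(read_fd)
    os.close(write_fd)
    return len(received)


if __name__ == "__main__":
    print(demo())
```

Without the `os.set_blocking(read_fd, False)` call, the second `os.read` inside `drain` would wait indefinitely for more data, which is presumably the kind of hang that makes a GUI appear frozen.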
Last night I achieved victory:
This image is just a looped GIF showing two 1-second slices where I said something or my dog bumped the table. The vertical axis spans DC to 22 kHz, which obscures much of the interesting stuff in the lower-frequency band where most of the information in speech lives. I'm not finished with this tool; at the very least it needs some zooming functionality to adjust the range of frequencies shown, and a way to expand and contract the colour range I use to indicate spectral intensity. You can grab a copy of the code from repo.or.cz and help out, if you like.
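The zooming and colour-range controls described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the repository: the helper names `bin_range` and `colour_index`, the 44.1 kHz sample rate, the FFT size, and the decibel limits are all hypothetical choices.

```python
import math


def bin_range(f_lo, f_hi, sample_rate, fft_size):
    """Return (lo, hi) so that FFT bins lo..hi-1 cover f_lo..f_hi in Hz.

    Bin k sits at k * sample_rate / fft_size; only bins up to the
    Nyquist frequency (index fft_size // 2) are considered.
    """
    if f_hi <= f_lo:
        raise ValueError("f_hi must be greater than f_lo")
    bin_width = sample_rate / fft_size
    lo = max(0, int(f_lo // bin_width))
    hi = min(fft_size // 2 + 1, int(math.ceil(f_hi / bin_width)) + 1)
    return lo, hi


def colour_index(level_db, floor_db, ceil_db, levels=256):
    """Map a spectral level in dB onto a palette index 0..levels-1.

    Raising floor_db or lowering ceil_db narrows the colour range,
    which stretches the contrast over the levels of interest.
    """
    if ceil_db <= floor_db:
        raise ValueError("ceil_db must be greater than floor_db")
    t = (level_db - floor_db) / (ceil_db - floor_db)
    t = min(1.0, max(0.0, t))  # clamp out-of-range levels
    return int(t * (levels - 1) + 0.5)


if __name__ == "__main__":
    # Assumed settings: 44.1 kHz capture, 4410-point FFT (10 Hz per bin),
    # zoomed to a 100 Hz .. 4 kHz band.
    print(bin_range(100.0, 4000.0, 44100, 4410))
    print(colour_index(-40.0, -80.0, 0.0))
```

With these assumed settings, each bin is 10 Hz wide, so a 100 Hz to 4 kHz zoom would display a few hundred bins instead of the full span up to 22 kHz.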