I quickly hacked up a pipeline on my laptop involving PulseAudio's parec, my own FFT tool, and gnuplot. Sure enough, there seemed to be an (inaudible) audio signal at about 390 Hz, clearly visible in an approximately 10-second integration, which gives sub-Hz resolution. But I wasn't satisfied; I wanted to see the short-term spectra scrolling across the screen in real time. I ended up hacking into the night, but was ultimately frustrated by some GTK+ weirdness. (One needs to set a file descriptor to non-blocking mode when calling g_io_add_watch.) I wasn't able to finish anything useful during my visit.

Last night I achieved victory:
This image is just a looped GIF showing two 1-second slices where I said something or my dog bumped the table. The vertical axis spans DC to 22 kHz, which obscures much of the interesting stuff in the lower-frequency band where most of the information in speech lives. I'm not finished with this tool; at the very least it needs some zooming functionality, both to adjust the range of frequencies shown and to expand or contract the colour range I use to indicate spectral intensity. You can grab a copy of the code from repo.or.cz and help out, if you like.
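
As a rough sketch of where the sub-Hz figure for the 10-second integration comes from, and why short-term spectra trade that resolution away: the spacing between adjacent DFT bins is the sample rate divided by the number of samples in the window, which works out to one over the window length in seconds. The 44.1 kHz sample rate below is an assumption, inferred only from the display topping out near 22 kHz; the function names and the shorter window lengths are hypothetical and are not taken from the actual tool.

```python
def bin_spacing_hz(sample_rate_hz, window_seconds):
    """Spacing between adjacent DFT bins for a window of the given length."""
    n_samples = int(round(sample_rate_hz * window_seconds))
    return sample_rate_hz / n_samples


def nearest_bin(freq_hz, sample_rate_hz, window_seconds):
    """Index of the DFT bin whose centre is closest to freq_hz."""
    n_samples = int(round(sample_rate_hz * window_seconds))
    return int(round(freq_hz * n_samples / sample_rate_hz))


# Assumption: CD-style sample rate, so the Nyquist limit is 22.05 kHz.
SAMPLE_RATE_HZ = 44100

# Hypothetical window lengths: the long integration, then two shorter
# windows of the kind a scrolling display might use.
for window_seconds in (10.0, 1.0, 0.05):
    spacing = bin_spacing_hz(SAMPLE_RATE_HZ, window_seconds)
    index = nearest_bin(390.0, SAMPLE_RATE_HZ, window_seconds)
    print(f"{window_seconds:6.2f} s window: {spacing:7.3f} Hz per bin, "
          f"390 Hz lands in bin {index}")
```

Under these assumptions a 10-second window resolves about 0.1 Hz per bin, while a window short enough to scroll smoothly is far coarser, so a faint narrow tone stands out less against the noise in each individual slice.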