Dissertation Research: The Acoustic Transient Processor (ATP)


Speech recognition is made especially complicated by time warping: the lengthening or shortening of word components (phonemes), which drastically alters the sound at the physical level but not at the perceptual level. The human brain has very little difficulty detecting that the same word has been spoken, regardless of whether it was spoken quickly or slowly.

Fortunately, not all sounds in the world are time-warped. Sounds in the class known as ``acoustic transients'' are brief, typically lasting less than 1/10 second. Examples of acoustic transients include handclaps, finger snaps, the sound of a door closing, sonar signals, and certain consonant parts of speech, to name just a few. Because these sounds are not time-warped, it is possible to classify them by directly correlating an incoming sound signal with a stored template.
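As a rough software sketch of what direct template correlation can look like (this is an illustration only, not the circuit or the exact algorithm used on the chip), the Python below slides a stored template over an incoming stream of frames, where each frame holds one value per frequency channel. The function name `running_correlation` and the data layout are assumptions made for the example.

```python
from collections import deque


def running_correlation(frames, template):
    """Yield one correlation score per incoming frame.

    frames:   iterable of frames, each a list with one value per channel
    template: list of N frames (the stored pattern, oldest frame first)
    """
    n = len(template)
    window = deque(maxlen=n)  # keeps only the most recent n input frames
    for frame in frames:
        window.append(frame)
        if len(window) < n:
            yield 0.0  # not enough history yet to compare against the template
            continue
        score = 0.0
        # Sum of products over every time step and every channel.
        for x_t, p_t in zip(window, template):
            for x, p in zip(x_t, p_t):
                score += x * p
        yield score


# Hypothetical two-channel example: silence, the template sound, silence.
template = [[1, 0], [0, 1], [1, 1]]
silence = [[0, 0]] * 4
scores = list(running_correlation(silence + template + silence, template))
```

The score is largest at the step where the most recent frames line up with the template, which is why a fixed (un-warped) template is enough for sounds of this kind.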

We (myself, my thesis advisor Gert Cauwenberghs, and Fernando Pineda, a neural networks researcher at the Johns Hopkins University Applied Physics Laboratory) have developed an algorithm for acoustic transient correlation, designed with analog hardware in mind, and constructed a system in micropower analog VLSI circuitry to implement it.

The first of my own papers to appear on the subject of recognition/classification of acoustic transient events was presented at ISCAS '97 in Hong Kong, June 9-12, 1997. That was three weeks before Hong Kong was handed back to China and an interesting time to be there. To get the PostScript version of the proceedings paper, click on the title below.

Experimental Results

A paper with more extensive and definitive results from the fabricated chip was presented at the NIPS (Neural Information Processing Systems) conference in Denver, November 1997. We have results showing the chip performing a classification between two different sounds: a banging can and a handclap. While these sounds are relatively simple to categorize by any standard algorithm, we believe that this is the first time such a categorization has been performed using massively parallel computation in micropower analog circuits. Additional testing will determine how many classes the circuit can categorize and how robust the circuit is to various kinds of nonideal conditions.

This is my lab bench and the setup I have for taking data. Sound data, pre-filtered into frequency bands and stored on the computer, are digitally downloaded to the analog chip, which computes a running correlation in real time.

This is a scope trace of the system performing a correct correlation. The top trace is the chip output, which peaks in the center of the scope screen as the input (the sound of a can banging) is maximally correlated with the stored binary template representing the sound ``banging can,'' shown below.

Below is a graph of the data from the scope trace over two presentations of the sound. You can see from the graph that it is a simple matter to compare the system output to a fixed threshold (4V, in this case) in order to detect the sound.
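The threshold comparison described above can be sketched in a few lines of Python. This is only an illustration: the function name `detect_events` and the re-arm level are hypothetical, and the 4V threshold is the only value taken from the experiment.

```python
def detect_events(samples, threshold=4.0, rearm=3.0):
    """Return the indices where the output first rises above `threshold`.

    `rearm` is a hypothetical hysteresis level: after a detection, the
    output must fall below it before another event can be reported, so a
    single broad peak is counted once.
    """
    events = []
    armed = True
    for i, v in enumerate(samples):
        if armed and v > threshold:
            events.append(i)
            armed = False
        elif not armed and v < rearm:
            armed = True
    return events


# Made-up output trace (in volts) containing two presentations of the sound.
trace = [1.0, 2.5, 4.2, 4.6, 3.5, 2.0, 1.0, 2.8, 4.3, 4.1, 2.0]
events = detect_events(trace)
```

With this made-up trace, each of the two peaks is reported once, at the sample where the output first crosses 4V.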

I did a cross-check by presenting the sound of a handclap to the system. During the presentation of ``handclap'' with the template shown above for ``can,'' the output never exceeded approximately 3V. Finally, I stored the template for ``handclap'' and got the opposite result (after scaling the output; the scale factor is an independent variable that corresponds to the template): the output spiked over 4V when the ``handclap'' sound was presented to it, but never exceeded approximately 3V for the ``can'' sound.

The PostScript version of the paper, which includes a full set of figures for the experiment described above, will be posted here after the conference.

Log-Domain Circuits and Acoustic Frontend Processor

Another paper, submitted to IEEE Trans. Circuits and Systems II (Analog and Digital Signal Processing), special issue on ``Advances in Nonlinear Electronic Circuits,'' concentrates on the frontend system I designed for the Acoustic Transient Processor. Because the ATP chip is an analog current-domain system, it expects input values which are currents representing the time-frequency decomposition of the transient sound. The frontend system may be a cochlea-model structure or a parallel bank of bandpass filters. We have chosen the latter (mostly because stability is easier to ensure and noise easier to control when the signal propagates in parallel rather than down the length of the filterbank).

Research on and design of voltage-mode filters and filterbanks is extensive, but only recently have researchers closely investigated a class of current-mode filter structures called log-domain filters. This was a ``hot topic'' at the ISCAS '97 conference. I have not been designing these filters because the topic is in vogue, but rather because for years I have been looking for the current-mode dual of analog VLSI (voltage-mode) transconductance-C filters. Current-mode design would be greatly facilitated by current-mode filters. In the case of the Acoustic Transient Processor, the log-domain filterbank gives me a way to transform a sound signal into an array of currents, each representing the instantaneous energy of one frequency band of the signal.
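To make the idea of a parallel filterbank producing one energy value per band more concrete, here is a minimal digital stand-in in Python. It is not a model of the log-domain circuits: it uses ordinary second-order digital bandpass sections (standard biquad formulas with unity gain at the center frequency) followed by squaring and a leaky integrator. The function names, the Q value, and the smoothing constant are all assumptions chosen for illustration.

```python
import math


def bandpass_coeffs(f0, fs, q):
    """Normalized biquad bandpass coefficients (unity gain at f0)."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    return (alpha / a0, 0.0, -alpha / a0,                      # b0, b1, b2
            -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)      # a1, a2


def band_energies(signal, centers_hz, fs, q=4.0, smooth=0.01):
    """Return, for each input sample, a list of smoothed band energies."""
    coeffs = [bandpass_coeffs(f0, fs, q) for f0 in centers_hz]
    state = [[0.0, 0.0, 0.0, 0.0] for _ in centers_hz]  # x1, x2, y1, y2
    energy = [0.0] * len(centers_hz)
    frames = []
    for x in signal:
        # Every band sees the same input sample: the filters run in parallel.
        for k, (b0, b1, b2, a1, a2) in enumerate(coeffs):
            x1, x2, y1, y2 = state[k]
            y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
            state[k] = [x, x1, y, y1]
            # Leaky integration of the squared output approximates band energy.
            energy[k] += smooth * (y * y - energy[k])
        frames.append(list(energy))
    return frames


# Hypothetical test tone: 0.2 s of a 1 kHz sine sampled at 16 kHz.
fs = 16000
tone = [math.sin(2.0 * math.pi * 1000.0 * n / fs) for n in range(3200)]
frames = band_energies(tone, [250.0, 1000.0, 4000.0], fs)
```

Each entry of `frames` plays the role of the array of currents described above: one value per frequency band at each instant. For the test tone, the 1 kHz band carries almost all of the energy.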

midwest.ps (Expanded) paper submitted to the conference.
Slides which accompanied the talk.

Trinary-Trinary Correlation Processor

In the process of investigating the proposed algorithm, I discovered that the use of binary-transformed inputs together with binary-transformed templates does not, as originally claimed, cause the classification rate to go down. In some instances it actually improved classification rates by a small margin. While it was not the original intent of the thesis to investigate digital algorithms and architectures, after a bit of thought I came to the realization that not only was such a system feasible, but it could be done relatively simply with standard (board-level) parts. So I embarked on a digital version of the transient processor as a final dissertation project.
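One reason a digital version becomes simple once both inputs and templates are binary is that correlation reduces to bit operations. The Python sketch below illustrates that point only; it is not the architecture built for the dissertation. It assumes, hypothetically, that each channel is binarized by comparison with a fixed threshold and that the score is the number of positions where input and template agree. The names `binarize` and `match_count` are made up for the example.

```python
def binarize(frame, threshold):
    """Pack a frame of channel values into an int, one bit per channel."""
    bits = 0
    for k, value in enumerate(frame):
        if value > threshold:
            bits |= 1 << k
    return bits


def match_count(window_bits, template_bits, n_channels):
    """Count the bit positions where binarized input and template agree."""
    mask = (1 << n_channels) - 1
    total = 0
    for x, p in zip(window_bits, template_bits):
        agree = ~(x ^ p) & mask          # XNOR, limited to the valid channels
        total += bin(agree).count("1")   # population count
    return total


# Hypothetical three-channel, two-frame example.
stored = [binarize(f, 0.5) for f in [[0.9, 0.1, 0.8], [0.2, 0.7, 0.1]]]
similar = [binarize(f, 0.5) for f in [[0.7, 0.3, 0.6], [0.4, 0.9, 0.2]]]
different = [binarize(f, 0.5) for f in [[0.1, 0.9, 0.1], [0.9, 0.1, 0.9]]]
```

Here `match_count(similar, stored, 3)` gives the maximum possible score of 6, while `match_count(different, stored, 3)` gives 0, using nothing more than XOR, masking, and bit counting.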

Results were presented at the 2nd European Workshop on Neuromorphic Systems and also appear in my Ph.D. dissertation.

euroneuro.ps (12MB) Paper submitted to the conference.
Slides which accompanied the talk.



Last updated: October 11, 2005 at 11:43pm