Dissertation Research: The Acoustic Transient Processor (ATP)

Introduction

Speech recognition is made especially difficult by time warping: the lengthening or shortening of word components (phonemes), which drastically alters the sound at the physical level but not at the perceptual level. The human brain has very little difficulty detecting that the same word has been spoken, regardless of whether it was spoken quickly or slowly.

Fortunately, not all sounds in the world are time-warped. Sounds in the class known as ``acoustic transients'' are quick, typically lasting less than 1/10 second. Examples of acoustic transients include handclaps, finger snaps, the sound of a door closing, sonar signals, and certain consonant parts of speech, to name just a few. Because these signals are not time-warped, it is possible to classify them by directly correlating an incoming sound signal with a stored template.

We (my thesis advisor Gert Cauwenberghs; Fernando Pineda, a neural networks researcher at the Johns Hopkins University Applied Physics Laboratory; and I) have developed an algorithm for acoustic transient correlation, designed with analog hardware in mind, and constructed a system in micropower analog VLSI circuitry to implement it.

The first of my own papers to appear on the subject of recognition/classification of acoustic transient events was presented at ISCAS '97 in Hong Kong, June 9-12, 1997. That was three weeks before Hong Kong was handed back to China and an interesting time to be there. To get the PostScript version of the proceedings paper, click on the title below.

Title: A Mixed-Signal Correlator for Acoustic Transient Classification
Authors: R. Timothy Edwards, Gert Cauwenberghs, and Fernando J. Pineda
Abstract: Correlation computations are widely used in template-based temporal pattern recognition, but are expensive to compute in real time on DSP systems. We developed an algorithm specifically targeted for analog VLSI implementation which minimizes required memory and computation while maintaining recognition performance. We designed and fabricated a low-power chip which uses current-mode analog circuits to compute the correlation between an auditory input signal in the time-frequency domain and a stored binary template. We use a bucket-brigade device (BBD) to accumulate partial column sums over time at rates consistent with the auditory-band input. Experimental results demonstrate the correct operation of the correlator over a wide range of operating frequencies.
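As a rough software illustration of the idea in the abstract above (and not a model of the chip's current-mode circuits), the running correlation against a binary template can be sketched with a pipeline of accumulators that plays the role of the bucket brigade. Each incoming time-frequency frame is summed per template column, and the partial sums are shifted one stage per time step. The function name, the list-of-lists data layout, and the tiny example values are assumptions made for this sketch.

```python
from typing import List, Sequence


def running_correlation(frames: Sequence[Sequence[float]],
                        template: Sequence[Sequence[int]]) -> List[float]:
    """Correlate a stream of frames with a binary template.

    frames:   one list of channel values per time step.
    template: one list of 0/1 bits per template time column
              (oldest column first), same channel count as the frames.
    Returns the correlator output after each frame.
    """
    n_cols = len(template)
    acc = [0.0] * n_cols  # pipeline of partial sums (hypothetical BBD stand-in)
    outputs = []
    for frame in frames:
        # Column sums: add up the channels selected by each template column.
        col_sums = [sum(x for x, bit in zip(frame, col) if bit)
                    for col in template]
        # Shift the pipeline by one stage and add the new column sums.
        acc = [col_sums[0]] + [acc[n - 1] + col_sums[n]
                               for n in range(1, n_cols)]
        # The last stage holds the full correlation for the most recent
        # n_cols frames (valid once n_cols frames have arrived).
        outputs.append(acc[-1])
    return outputs
```

With this arrangement only one accumulator per template column is stored, rather than a history of past input frames, which is in the spirit of the memory savings the abstract describes.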

Experimental Results

A paper with more extensive and definitive results from the fabricated chip was presented at the NIPS (Neural Information Processing Systems) conference in Denver, November 1997. We have results showing the chip performing a classification between two different sounds: a banging can and a handclap. While these sounds are relatively simple to categorize by any standard algorithm, we believe that this is the first time such a categorization has been performed using massively parallel computation in micropower analog circuits. Additional testing will determine how many classes the circuit can categorize and how robust the circuit is to various kinds of nonideal conditions.

This is my lab bench and the setup I have for taking data. Sound data, pre-filtered into frequency bands and stored on the computer, are digitally downloaded to the analog chip, which computes a running correlation in real time.

This is a scope trace of the system performing a correct correlation. The top trace is the chip output, which peaks in the center of the scope screen as the input (the sound of a can banging) is maximally correlated with the stored binary template representing the sound ``banging can,'' shown below.

Below is a graph of the data from the scope trace over two presentations of the sound. You can see from the graph that it is a simple matter to compare the system output to a fixed threshold (4V, in this case) in order to detect the sound.

I did a cross-check by presenting the sound of a handclap to the system. During the presentation of ``handclap'' with the template shown above for ``can,'' the output never exceeded approximately 3V. Finally, I stored the template for ``handclap'' and got the opposite result (after adjusting the output scaling, an independent variable corresponding to the template): the output spiked over 4V when the ``handclap'' sound was presented to it, but never exceeded approximately 3V for the ``can'' sound.
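The fixed-threshold detection described above can be sketched in a few lines. This is only an illustration: the 4V threshold comes from the experiment, while the function name and the sample voltages in the usage note are hypothetical and are not measured data.

```python
from typing import List, Sequence


def detect_events(trace: Sequence[float], threshold: float = 4.0) -> List[int]:
    """Return the sample indices where the trace rises above the threshold.

    Only upward crossings are reported, so one output peak that stays
    above the threshold for several samples counts as a single detection.
    """
    events = []
    above = False
    for i, volts in enumerate(trace):
        if volts > threshold and not above:
            events.append(i)
        above = volts > threshold
    return events
```

For a made-up trace such as `[1.0, 2.5, 4.3, 4.6, 2.0, 1.0, 4.2, 3.0]` the function reports two detections, while a trace that stays at or below about 3V reports none.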

The PostScript version of the paper, which includes a full set of figures for the experiment described above, will be posted here after the conference.

Title: A Neuromorphic VLSI Processor for Acoustic Transient Correlation
Authors: R. Timothy Edwards, Gert Cauwenberghs, and Fernando J. Pineda
Abstract: Correlation computations are widely used in template-based temporal pattern recognition, but are expensive to compute in real time on DSP systems. We have designed and fabricated a neuromorphic chip which uses current-mode analog circuits to compute the correlation between an auditory input signal in the time-frequency domain and a stored binary template. Experimental results of the correlator demonstrate that by using massively parallel computation, circuits made of relatively imprecise analog components are able to accurately classify transient events, are competitive with high-performance DSP systems, and operate with much smaller power requirements.

Log-Domain Circuits and Acoustic Frontend Processor

Another paper, submitted to IEEE Trans. Circuits and Systems II (Analog and Digital Signal Processing), special issue on ``Advances in Nonlinear Electronic Circuits,'' concentrates on the frontend system I designed for the Acoustic Transient Processor. Because the ATP chip is an analog current-domain system, it expects input values which are currents representing the time-frequency decomposition of the transient sound. The frontend system may be a cochlea-model structure or a parallel bank of bandpass filters. We have chosen the latter, mostly because stability is easier to ensure and noise easier to control when the signal propagates in parallel rather than down the length of the filterbank.

Research on and design of voltage mode filters and filterbanks is extensive, but only recently have researchers closely investigated a class of current mode filter structures, called log-domain filters. This was a ``hot topic'' at the ISCAS '97 conference. I have not been designing these filters because the topic is in vogue, but rather because for years I have been looking for the current-mode dual of analog VLSI (voltage-mode) transconductance-C filters. Current-mode design would be greatly facilitated by current-mode filters. In the case of the Acoustic Transient Processor, the log-domain filterbank gives me a way to transform a sound signal into an array of currents, each representing the instantaneous energy of one frequency band of the signal.
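To make the frontend's role concrete, here is a minimal digital sketch of a parallel filterbank feature extractor: one second-order bandpass filter per channel, followed by full-wave rectification and first-order smoothing. It is a software analogy only and does not model log-domain circuits. The biquad coefficients use the common Audio EQ Cookbook band-pass form, and the function names, Q value, and smoothing constant are assumptions chosen for illustration. The rectified and smoothed amplitude stands in for the per-band energy signal.

```python
import math
from typing import List, Sequence, Tuple


def bandpass_coeffs(f0: float, fs: float, q: float) -> Tuple[tuple, tuple]:
    """Second-order bandpass biquad with unity gain at the center frequency."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a


def band_envelope(signal: Sequence[float], f0: float, fs: float,
                  q: float = 4.0, smooth: float = 0.01) -> List[float]:
    """Bandpass-filter, full-wave rectify, and smooth one channel."""
    (b0, b1, b2), (a1, a2) = bandpass_coeffs(f0, fs, q)
    x1 = x2 = y1 = y2 = 0.0
    env = 0.0
    out = []
    for x in signal:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        # Rectify with abs() and smooth with a one-pole lowpass.
        env += smooth * (abs(y) - env)
        out.append(env)
    return out


def filterbank_features(signal: Sequence[float], centers: Sequence[float],
                        fs: float) -> List[List[float]]:
    """Run every channel in parallel on the same input signal."""
    return [band_envelope(signal, f0, fs) for f0 in centers]
```

Because each channel sees the input directly, an unstable or noisy channel cannot affect its neighbors, which mirrors the reason given above for preferring a parallel bank over a cascaded cochlea model.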

Title: Synthesis of Log-Domain Filters for Audio-Frequency Applications
Authors: Wolfgang Himmelbauer, R. Timothy Edwards, Andreas Andreou, and Gert Cauwenberghs.
Abstract: Log-domain filters have recently come into the limelight of the VLSI community as an important class of circuits for implementing filters in the current domain. While most of the existing literature concerns filters for high-frequency (such as RF) applications, in this paper we describe synthesis of filters especially for low-frequency (audio-band) applications. We present both micropower CMOS and BiCMOS designs of first-order lowpass and second-order bandpass filters. Measurements taken from filters fabricated in 2um CMOS technology demonstrate operation extending over the audio-frequency range.

Title: Log-Domain Circuits for Auditory Signal Processing
Authors: R. Timothy Edwards and Gert Cauwenberghs.
Abstract: The theory and practice of log-domain filter design has reached the point where it is possible to incorporate log-domain filter structures into large current-mode VLSI systems. We report on interface circuits used to implement a current-mode frontend filterbank and feature extractor for acoustic pattern recognition. These circuits maintain a log-domain structure, acting on the unexpanded filter output to execute such functions as peak-to-peak voltage measurement, full-wave rectification, smoothing, and normalization. A fabricated VLSI sixteen-channel filterbank feature extractor exhibits 40 dB resolution for short-term energy envelope measurements of class A log-domain second-order bandpass filter outputs.

midwest.ps (Expanded) paper submitted to the conference.
Slides which accompanied the talk.

Trinary-Trinary Correlation Processor

In the process of investigating the proposed algorithm, I discovered that the use of binary-transformed inputs together with binary-transformed templates does not, as originally claimed, cause the classification rate to go down. In some instances it actually improved classification rates by a small margin. While it was not the original intent of the thesis to investigate digital algorithms and architectures, after a bit of thought I came to the realization that not only was such a system feasible, but it could be done relatively simply with standard (board-level) parts. So I embarked on a digital version of the transient processor as a final dissertation project.

Results were presented at the 2nd European Workshop on Neuromorphic Systems and also appear in my Ph.D. dissertation.

Title: Acoustic Transient Classification with a Template Correlation Processor
Author: R. Timothy Edwards
Abstract: I present an architecture for acoustic pattern classification using trinary-trinary template correlation. In spite of its computational simplicity, the algorithm and architecture represent a method which greatly reduces bandwidth of the input, storage requirements of the classifier memory, and power consumption of the system without compromising classification accuracy. The linear system should be amenable to training using recently-developed methods such as Independent Component Analysis (ICA), and we predict that behavior will be qualitatively similar to that of structures in the auditory cortex.

euroneuro.ps (12MB) Paper submitted to the conference.
Slides which accompanied the talk.
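The trinary-trinary correlation named in the abstract above can be illustrated with a short sketch. The exact transform used in the dissertation is not given here, so the quantizer below (a dead zone around zero, with everything else mapped to +1 or -1) is an assumption, as are the function names and the `dead_zone` parameter.

```python
from typing import List, Sequence


def to_trinary(values: Sequence[float], dead_zone: float) -> List[int]:
    """Map each value to -1, 0, or +1 (hypothetical dead-zone quantizer)."""
    return [0 if abs(v) <= dead_zone else (1 if v > 0 else -1)
            for v in values]


def trinary_correlation(frames: Sequence[Sequence[int]],
                        template: Sequence[Sequence[int]]) -> int:
    """Sum of element-wise products of two trinary time-frequency patches.

    Every product is -1, 0, or +1, so the correlation reduces to counting
    agreements minus disagreements, with no multiplications needed in
    a hardware version.
    """
    return sum(a * b
               for frame, col in zip(frames, template)
               for a, b in zip(frame, col))
```

A template correlated with itself gives the number of its nonzero entries, and correlated with its sign-flipped copy gives the negative of that count, which bounds the output range for any given template.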

Last updated: October 11, 2005 at 11:43pm