Dissertation Research: The Acoustic Transient Processor (ATP)
Introduction
Speech recognition is made especially complicated by the problem
of time warping: the lengthening or shortening of word components (phonemes),
which drastically alters the sound at the physical level but not at the
perceptual level. The human brain has very little difficulty in detecting
that the same word has been spoken, regardless of whether it was spoken quickly
or slowly.
Fortunately, not all sounds in the world are time-warped. The sounds
known as ``acoustic transients'' are quick, typically lasting
less than 1/10 second. Examples of acoustic transients include handclaps, finger
snaps, the sound of a door closing, sonar signals, and certain consonant parts
of speech, to name just a few. Because these sounds are not time-warped, it is
possible to classify them by directly correlating an incoming sound signal
with a stored template.
We (myself, my thesis advisor Gert Cauwenberghs, and Fernando Pineda, a
neural networks researcher at the Johns Hopkins University Applied Physics
Laboratory) have developed an algorithm for acoustic transient correlation
and constructed a system in micropower analog VLSI circuitry to implement
the algorithm, which was designed with analog hardware in mind.
The first of my own papers to appear on the subject of recognition/classification
of acoustic transient events was presented at ISCAS '97 in Hong Kong, June
9-12, 1997. That was three weeks before Hong Kong was handed back to China and
an interesting time to be there. To get the PostScript version of the proceedings
paper, click on the title below.
- Title: A Mixed-Signal Correlator for Acoustic Transient Classification
- Authors: R. Timothy Edwards, Gert Cauwenberghs, and Fernando J. Pineda
- Abstract:
Correlation computations are widely used in template-based temporal pattern
recognition, but are expensive to compute in real time on DSP systems.
We developed an algorithm specifically targeted for analog VLSI
implementation which minimizes required memory and computation while
maintaining recognition performance.
We designed and fabricated a low-power chip which uses current-mode
analog circuits to compute the correlation between an auditory input signal
in the time-frequency domain and a stored binary template. We use a
bucket-brigade device (BBD) to accumulate partial column sums over time at
rates consistent with the auditory-band input. Experimental results
demonstrate the correct operation of the correlator over a wide range of
operating frequencies.
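As a rough software illustration of the computation described in the abstract above, the sketch below correlates a time-frequency input with a binary template by shifting and accumulating partial column sums, one stage per template column. It is only a digital analogy for the bucket-brigade accumulation, not a model of the chip's circuits; the function name `running_correlation`, the array layout (frames by frequency channels), and the example template are all assumptions made for illustration.

```python
import numpy as np

def running_correlation(frames, template):
    """Running correlation of `frames` (T x M) with a binary `template` (N x M).

    Stage n of the pipeline holds the partial sum over template columns 0..n.
    On each new input frame the partial sums shift one stage and the new
    column sums are added, so the last stage emits one full correlation
    value per frame. (Hypothetical software analogy, not the chip design.)
    """
    num_frames = frames.shape[0]
    num_columns = template.shape[0]
    pipeline = np.zeros(num_columns)
    output = np.zeros(num_frames)
    for t in range(num_frames):
        # One dot product per template column against the current frame.
        column_sums = template @ frames[t]
        # Shift the partial sums by one stage and accumulate.
        pipeline[1:] = pipeline[:-1] + column_sums[1:]
        pipeline[0] = column_sums[0]
        output[t] = pipeline[-1]
    return output

# Illustrative example: a made-up 3-column, 4-channel binary template,
# embedded in an otherwise silent input at frames 4..6.
template = np.array([[1, 0, 1, 0],
                     [0, 1, 1, 0],
                     [1, 1, 0, 1]])
frames = np.zeros((10, 4))
frames[4:7] = template
scores = running_correlation(frames, template)
peak_frame = int(np.argmax(scores))  # frame 6, where the last column arrives
```

Only additions and a shift are needed per frame, and the memory is one accumulator per template column, which is the property the abstract emphasizes.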
Experimental Results
A paper with more extensive and definitive results from the fabricated
chip was presented at the NIPS (Neural Information Processing
Systems) conference in Denver, November 1997.
We have results showing the chip performing a classification between two
different sounds: a banging can and a handclap. While these sounds are
relatively simple to categorize by any standard algorithm, we believe that
this is the first time such a categorization has been performed using
massively parallel computation of micropower analog circuits. Additional
testing will determine how many classes the circuit can categorize and
how robust the circuit is to various kinds of nonideal conditions.
This is my lab bench and the setup I have
for taking data. Sound data pre-filtered into frequency bands and
stored on the computer are digitally downloaded to the analog chip,
which computes a running correlation in real time.
This is a scope trace of the system performing a correct correlation.
The top trace is the chip output, which peaks in the center of the
scope screen as the input (the sound of a can banging) is maximally
correlated with the stored binary template representing the sound
``banging can,'' shown below.
Below is a graph of the data from the scope trace over two presentations
of the sound. You can see from the graph that it is a simple matter to
compare the system output to a fixed threshold (4V, in this case) in order
to detect the sound.
I did a cross-check by presenting the sound of a handclap to the system.
During the presentation of ``handclap'' with the template shown above
for ``can,'' the output never exceeded approximately 3V. Finally, I
stored the template for ``handclap'' and got the opposite result
(after rescaling the output; the output scale is an independent variable
that depends on the template): the output spiked above 4V when the
``handclap'' sound was presented, but never exceeded approximately 3V for
the ``can'' sound.
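The fixed-threshold test described above can be sketched in a few lines. The function name `detect_events` and the two voltage traces are hypothetical; the traces are synthetic numbers chosen to resemble the description (a matched sound peaking above 4V, a mismatched sound staying near 3V) and are not measured data.

```python
def detect_events(samples, threshold=4.0):
    """Return the indices at which `samples` first rises above `threshold`."""
    events = []
    was_above = False
    for index, value in enumerate(samples):
        is_above = value > threshold
        # Report only the rising edge, so one peak yields one detection.
        if is_above and not was_above:
            events.append(index)
        was_above = is_above
    return events

# Synthetic, made-up voltages for illustration only (not measured data).
matched_trace = [1.0, 1.5, 2.8, 4.6, 4.3, 2.0, 1.2, 3.1, 4.8, 2.2]
mismatched_trace = [1.0, 1.4, 2.9, 3.0, 2.1, 1.1, 2.7, 2.9, 1.8, 1.0]

matched_events = detect_events(matched_trace)        # two detections
mismatched_events = detect_events(mismatched_trace)  # no detections
```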
The PostScript version of the paper, which includes a full set of figures
for the experiment described above, will be posted here after the conference.
- Title: A Neuromorphic VLSI Processor for Acoustic Transient Correlation
- Authors: R. Timothy Edwards, Gert Cauwenberghs, and Fernando J. Pineda
- Abstract:
Correlation computations are widely used in template-based temporal pattern
recognition, but are expensive to compute in real time on DSP systems.
We have designed and fabricated a neuromorphic chip which uses current-mode
analog circuits to compute the correlation between an auditory input signal
in the time-frequency domain and a stored binary template. Experimental
results of the correlator demonstrate that by using massively parallel
computation, circuits made of relatively imprecise analog components are able
to accurately classify transient events, are competitive with high-performance
DSP systems, and operate with much smaller power requirements.
Log-Domain Circuits and Acoustic Frontend Processor
Another paper, submitted to IEEE Trans. Circuits and Systems II (Analog
and Digital Signal Processing) for the special issue on ``Advances in
Nonlinear Electronic Circuits,'' concentrates on the frontend system I
designed for the Acoustic Transient Processor. Because the ATP chip is
an analog current-domain system, it expects input values which are currents
representing the time-frequency decomposition of the transient sound.
The frontend system may be a cochlea-model structure or a parallel bank of
bandpass filters. We have chosen the latter (mostly because stability is
easier to ensure and noise easier to control when the signal propagates in
parallel rather than propagating down the length of the filterbank).
Research on and design of voltage-mode filters and filterbanks is
extensive, but only recently have researchers closely investigated a class
of current-mode filter structures called log-domain filters.
This was a ``hot topic'' at the ISCAS '97
conference. I have not been designing these filters because the topic
is in vogue, but rather because for years I have been looking for the
current-mode dual of analog VLSI (voltage-mode) transconductance-C filters.
Current-mode design would be greatly facilitated by current-mode filters.
In the case of the Acoustic Transient Processor, the log-domain filterbank
gives me a way to transform a sound signal into an array of currents, each
representing the instantaneous energy of one frequency band of the signal.
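To make the role of the frontend concrete, here is a minimal digital stand-in for a parallel bandpass filterbank followed by rectification and smoothing, producing one slowly varying value per frequency band. It is not a log-domain circuit and not the design used on the chip: the biquad coefficients follow the widely used Audio EQ Cookbook bandpass form, and the function names, band centers, Q, and smoothing cutoff are all assumptions chosen for illustration.

```python
import numpy as np

def bandpass_biquad(signal, center_hz, q, fs):
    """Second-order bandpass (unity gain at center_hz), direct-form difference equation."""
    w0 = 2.0 * np.pi * center_hz / fs
    alpha = np.sin(w0) / (2.0 * q)
    b0 = alpha / (1.0 + alpha)
    b2 = -alpha / (1.0 + alpha)
    a1 = -2.0 * np.cos(w0) / (1.0 + alpha)
    a2 = (1.0 - alpha) / (1.0 + alpha)
    y = np.zeros(len(signal))
    x1 = x2 = y1 = y2 = 0.0
    for n, x0 in enumerate(signal):
        y0 = b0 * x0 + b2 * x2 - a1 * y1 - a2 * y2
        y[n] = y0
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return y

def band_energies(signal, centers_hz, fs, q=4.0, smooth_hz=50.0):
    """Return an array (samples x bands) of rectified, smoothed band outputs."""
    k = 1.0 - np.exp(-2.0 * np.pi * smooth_hz / fs)  # one-pole lowpass gain
    out = np.zeros((len(signal), len(centers_hz)))
    for j, center in enumerate(centers_hz):
        rectified = np.abs(bandpass_biquad(signal, center, q, fs))
        envelope = 0.0
        for n, r in enumerate(rectified):
            envelope += k * (r - envelope)
            out[n, j] = envelope
    return out

# Illustrative use: a 1 kHz test tone should excite mainly the 1 kHz band.
fs = 16000
t = np.arange(1600) / fs
tone = np.sin(2.0 * np.pi * 1000.0 * t)
energies = band_energies(tone, [250.0, 1000.0, 4000.0], fs)
strongest_band = int(np.argmax(energies[-1]))
```

Each column of `energies` plays the part of one input current to the correlator: a parallel bank keeps the bands independent, so a problem in one channel does not propagate to the others.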
- Title: Synthesis of Log-Domain Filters for Audio-Frequency Applications
- Authors: Wolfgang Himmelbauer, R. Timothy Edwards, Andreas
Andreou, and Gert Cauwenberghs.
- Abstract:
Log-domain filters have recently come into the limelight of the VLSI
community as an important class of circuits for implementing filters
in the current domain. While most of the existing literature concerns
filters for high-frequency (such as RF) applications, in this paper we
describe synthesis of filters especially for low-frequency (audio-band)
applications. We present both micropower CMOS and BiCMOS designs of
first-order lowpass and second-order bandpass filters. Measurements
taken from filters fabricated in 2 µm CMOS technology demonstrate
operation extending over the audio-frequency range.
- Title: Log-Domain Circuits for Auditory Signal Processing
- Authors: R. Timothy Edwards and Gert Cauwenberghs.
- Abstract:
The theory and practice of log-domain filter design has reached the
point where it is possible to incorporate log-domain filter
structures into large current-mode VLSI systems. We report on
interface circuits used to implement a current-mode frontend
filterbank and feature extractor for acoustic pattern recognition.
These circuits maintain a log-domain structure, acting on the
unexpanded filter output to execute such functions as peak-to-peak
voltage measurement, full-wave rectification, smoothing, and
normalization. A fabricated VLSI sixteen-channel filterbank
feature extractor exhibits 40 dB resolution for short-term energy
envelope measurements of class A log-domain second-order bandpass
filter outputs.
midwest.ps (Expanded) paper submitted to the conference.
Slides which accompanied the talk.
Trinary-Trinary Correlation Processor
In the process of investigating the proposed algorithm, I discovered
that the use of binary-transformed inputs together with binary-transformed
templates does not, as originally claimed, cause the classification rate
to go down. In some instances it actually improved classification rates
by a small margin. While it was not the original intent of the thesis
to investigate digital algorithms and architectures, after a bit of
thought I came to the realization that not only was such a system
feasible, but it could be done relatively simply with standard (board-level)
parts. So I embarked on a digital version of the transient processor
as a final dissertation project.
Results were presented at the 2nd European Workshop on Neuromorphic Systems
(and also appear in my Ph.D. dissertation).
- Title: Acoustic Transient Classification with a Template Correlation Processor
- Author: R. Timothy Edwards
- Abstract:
I present an architecture for acoustic pattern classification using
trinary-trinary template correlation. In spite of its computational
simplicity, the algorithm and architecture represent a method which
greatly reduces bandwidth of the input, storage requirements of
the classifier memory, and power consumption of the system without
compromising classification accuracy. The linear system should be
amenable to training using recently-developed methods such as
Independent Component Analysis (ICA), and we predict that behavior
will be qualitatively similar to that of structures in the auditory
cortex.
euroneuro.ps (12MB) Paper submitted to the conference.
Slides which accompanied the talk.
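The abstract above does not spell out how the trinary values are obtained, so the following is only a sketch under stated assumptions: inputs and templates are quantized to {-1, 0, +1} with a symmetric dead zone, and the score for each alignment is the sum of elementwise products. The function names, the `dead_zone` parameter, and the example template are hypothetical.

```python
import numpy as np

def to_trinary(values, dead_zone):
    """Quantize to +1, 0, or -1 (assumed rule: zero inside +/- dead_zone)."""
    v = np.asarray(values, dtype=float)
    return (v > dead_zone).astype(int) - (v < -dead_zone).astype(int)

def trinary_correlation(frames, template):
    """Correlate trinary `frames` (T x M) with a trinary `template` (N x M).

    Every product is +1, 0, or -1, so each score needs only counting
    (increment, skip, or decrement) rather than true multiplication.
    """
    num_positions = frames.shape[0] - template.shape[0] + 1
    span = template.shape[0]
    return np.array([int(np.sum(frames[t:t + span] * template))
                     for t in range(num_positions)])

# Illustrative example with a made-up 2-column, 3-channel trinary template.
quantized = to_trinary([0.5, -0.1, -0.9, 0.0], dead_zone=0.2)
template = np.array([[1, -1, 0],
                     [0, 1, -1]])
frames = np.zeros((6, 3), dtype=int)
frames[2:4] = template
scores = trinary_correlation(frames, template)
best_alignment = int(np.argmax(scores))
```

Because inputs and templates each carry fewer than two bits per element, a scheme like this is consistent with the reduced input bandwidth and template storage that the abstract claims, although the actual architecture may differ.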
Last updated: October 11, 2005 at 11:43pm