An automatic speech recognition oriented study on segmentation, low dimensional feature extraction, and temporal trajectory information capture.

Title
An automatic speech recognition oriented study on segmentation, low dimensional feature extraction, and temporal trajectory information capture.
Identifier
AAI3063902
identifier
3063902
Creator
Zhu, Yonggang.
Contributor
Adviser: Robert V. Fanelli
Date
2002
Language
English
Publisher
City University of New York.
Subject
Physics, Acoustics | Psychology, Cognitive | Computer Science
Abstract
Accurate and efficient automatic speech recognition requires feature vectors that are highly discriminative for the categories of interest while remaining low-dimensional. Recent studies on feature extraction from mel spectra show that classical mel-frequency cepstral coefficients (MFCCs) may fail to capture some important cues present in the local spectral correlates. We therefore study feature extraction together with dimensionality reduction on mel spectra, using hybrid models that we propose, which combine neural networks with Euclidean distance. This approach is mainly inspired by the adaptive nature of neural networks. With classical MFCCs as a benchmark, the features extracted by our hybrid models give comparable or much better classification rates at a significantly lower dimensionality. A time-warping recurrent neural network, which aims to recognize phonemes and consonant-vowel (CV) syllables by efficiently capturing temporal trajectory information, is studied with mel features, MFCCs, and our features. The results suggest that low-dimensional features extracted by linear Euclidean neural networks may be better suited to this purpose.
Type
dissertation
Source
PQT Legacy CUNY.xlsx
degree
Ph.D.
Item sets
CUNY Legacy ETDs