An automatic speech recognition oriented study on segmentation, low dimensional feature extraction, and temporal trajectory information capture.

Title: An automatic speech recognition oriented study on segmentation, low dimensional feature extraction, and temporal trajectory information capture.
Identifier: AAI3063902
identifier: 3063902
Creator: Zhu, Yonggang.
Contributor: Adviser: Robert V. Fanelli
Date: 2002
Language: English
Publisher: City University of New York.
Subject: Physics, Acoustics | Psychology, Cognitive | Computer Science
Abstract: Accurate and efficient automatic speech recognition requires feature vectors that are highly discriminative for the categories of interest yet low in dimensionality. Recent studies on feature extraction from mel spectra show that classical mel-frequency cepstral coefficients (MFCCs) may fail to capture some important cues present in local spectral correlates. We therefore study feature extraction together with dimensionality reduction on mel spectra, using hybrid models of neural networks and Euclidean distance that we propose. This approach is mainly inspired by the adaptive nature of neural networks. With classical MFCCs as a benchmark, features extracted by our hybrid models can give comparable or much better classification rates at a significantly reduced dimensionality. A time-warping recurrent neural network, intended to recognize phonemes and CV syllables by efficiently capturing temporal trajectory information, is studied with mel features, MFCCs, and our features; the results suggest that low-dimensional features extracted by linear Euclidean neural networks may be better suited to this purpose.
Type: dissertation
Source: PQT Legacy CUNY.xlsx
degree: Ph.D.

Item sets: CUNY Legacy ETDs
