Published In
International Journal of Electrical, Electronics and Data Communication (IJEEDC)
Issue
Volume-3, Issue-11 (Nov, 2015)
Paper Title
Leveraging Jointly Spatial, Temporal And Modulation Enhancement In Creating Noise-Robust Features For Speech Recognition
Author Name
Hsin-Ju Hsieh, Hao-Teng Fan, Jeih-Weih Hung
Affiliation
Dept of Electrical Engineering, National Chi Nan University, Taiwan, Republic of China
Pages
15-19
Abstract
This paper proposes fusing speech feature enhancement techniques from the spatial, temporal and modulation domains in order to achieve superior speech recognition performance in noise-corrupted environments. With mel-frequency cepstral coefficients (MFCC) as the standard speech feature representation, the spatial-domain techniques perform short-time intra-frame feature enhancement, while the temporal-domain techniques compensate for the noise distortion present in the long-term inter-frame MFCC time stream. The modulation-domain techniques, in turn, operate on the Fourier transform of an MFCC time stream. Evaluation experiments on the connected-digit Aurora-2 database reveal that each of the spatial and temporal enhancement techniques adopted here outperforms the unprocessed MFCC baseline, and that integrating the spatial-, temporal- and modulation-domain methods yields even better recognition accuracy than any individual component method across a wide range of noise-corrupted environments. These results demonstrate that the methods in the three domains address different aspects of the noise and are therefore complementary to one another.
Keywords: Noise Robustness, Speech Recognition, Spatial Processing, Temporal Processing, Modulation Domain.
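The three processing domains named in the abstract can be pictured as operations along different axes of an MFCC matrix with one row per frame and one column per cepstral coefficient. The sketch below is only an illustration under stated assumptions: the abstract does not name the specific enhancement methods, so per-frame mean removal stands in for a spatial (intra-frame) step, mean and variance normalization stands in for a temporal (inter-frame) step, and the modulation-domain view is shown simply as the magnitude of the Fourier transform of each coefficient's time stream. All function names are hypothetical.

```python
import numpy as np


def spatial_mean_removal(mfcc):
    """Intra-frame (spatial) stand-in: subtract each frame's mean across coefficients.

    mfcc has shape (num_frames, num_coefficients). This is an assumed placeholder,
    not one of the paper's methods.
    """
    return mfcc - mfcc.mean(axis=1, keepdims=True)


def temporal_mvn(mfcc, eps=1e-8):
    """Inter-frame (temporal) stand-in: mean and variance normalization.

    Each coefficient's time stream (a column) is shifted to zero mean and
    scaled to unit standard deviation over the utterance.
    """
    mean = mfcc.mean(axis=0, keepdims=True)
    std = mfcc.std(axis=0, keepdims=True)
    return (mfcc - mean) / np.maximum(std, eps)


def modulation_spectrum(mfcc):
    """Modulation-domain view: magnitude of the Fourier transform along time.

    Returns an array of shape (num_frames // 2 + 1, num_coefficients).
    """
    return np.abs(np.fft.rfft(mfcc, axis=0))


def demo_features(num_frames=200, num_coefficients=13, seed=0):
    """Synthetic stand-in for MFCCs; real features would come from a front end."""
    rng = np.random.default_rng(seed)
    return rng.normal(loc=2.0, scale=3.0, size=(num_frames, num_coefficients))
```

A fused pipeline in the spirit of the abstract would apply such steps in sequence, for example `temporal_mvn(spatial_mean_removal(x))`, and could then modify `modulation_spectrum` of the result before transforming back to the time domain; the order and the actual methods used in the paper are not specified here.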