AL-Lisaniyyat
Volume 20, Number 1, Pages 19-26
2014-06-28

Improving Performance of HMM-Based ASR for GSM-EFR Speech Coding

Authors: Lallouani Bouchakour, Debyeche Mohamed

Abstract

The Global System for Mobile Communications (GSM) environment poses three main problems for Automatic Speech Recognition (ASR) systems: noisy scenarios, source coding distortion and transmission errors. The second, source coding distortion, must be explicitly addressed. In this paper, we investigate different feature extraction techniques for GSM EFR (Enhanced Full Rate) coding with the aim of improving the performance of ASR in the GSM domain. Specifically, we suggest extracting the recognition feature vectors directly from the encoded speech bit-stream instead of decoding it and then extracting the feature vectors from the decoded speech. The speaker-independent recognition experiments were based on the Continuous Hidden Markov Model (CHMM). The performance of the proposed speech recognition technique was assessed using the ARADIGIT database, transcoded in its 8 kHz downsampled version. Different experiments were carried out in order to explore feature calculation directly from the GSM EFR encoded parameters and to measure the degradation introduced by different aspects of the coder. The ARADIGIT database, which consists of 60 speakers (31 male and 29 female) pronouncing the ten Arabic digits, was built in order to conduct the necessary experiments. As a result, the proposed methods achieved higher recognition accuracy than the conventional methods employing Mel-Frequency Cepstral Coefficients (MFCC). This paper presents two configurations for extracting feature parameters for speech recognition over mobile communication: the decoded speech-based technique and the bit-stream-based technique.
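
The abstract does not say how feature vectors are computed from the encoded parameters. As an illustration only, one common route in bit-stream-based recognition is to turn the coder's linear-prediction (LP) coefficients, which a GSM EFR decoder reconstructs from the transmitted line spectral pair parameters, into LP cepstral coefficients. The sketch below assumes this route; the function name `lpc_to_cepstrum`, the sign convention A(z) = 1 + sum_k a_k z^-k, and the omission of the LSP-to-LP conversion step are assumptions of this example, not details taken from the paper.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Convert LP coefficients to cepstral coefficients of 1/A(z).

    Assumed convention (hypothetical, not from the paper):
        A(z) = 1 + a[0]*z^-1 + a[1]*z^-2 + ... + a[p-1]*z^-p
    Returns [c_1, ..., c_n_ceps] using the standard recursion
        c_n = -a_n - (1/n) * sum_{m=1}^{n-1} m * c_m * a_{n-m},
    with a_j taken as 0 for j > p.
    """
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        acc = 0.0
        for m in range(1, n):
            k = n - m                # index of the LP coefficient a_k
            if k <= p:               # a_k is zero beyond the LP order
                acc += m * c[m - 1] * a[k - 1]
        a_n = a[n - 1] if n <= p else 0.0
        c.append(-a_n - acc / n)
    return c


# First-order check: A(z) = 1 - 0.5 z^-1, so -ln A(z) has c_n = 0.5**n / n.
ceps = lpc_to_cepstrum([-0.5], 3)
```

In a decoded speech-based configuration, by contrast, features such as MFCC would be computed from the reconstructed waveform, so this conversion step would not apply.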

Keywords

speech coding, GSM, EFR, CHMM, ASR, ARADIGIT, MFCC, bit-stream