A Prosody-Assisted Mandarin Spontaneous Speech Recognition
|Keywords:||spontaneous speech; speech recognition|
In recent years, Mandarin read-speech recognition technology has become quite mature. Spontaneous speech recognition, however, remains difficult because of high speaking rates and the presence of disfluent speech events. This thesis addresses Mandarin spontaneous speech recognition, focusing on language model construction and on prosody-assisted recognition. For the language model, two special words, particle and marker, are added to the vocabulary to model the disfluency phenomena of spontaneous speech. In addition, language model adaptation is employed to compensate for the shortage of spontaneous speech text. For recognition, a two-stage process that incorporates prosodic information is adopted. In the first stage, an acoustic model and a bigram language model (LM) are used to generate a word lattice. In the second stage, the word lattice is first expanded so that the bigram LM can be replaced with a factorized LM. Break-related models and prosodic state-related models of a hierarchical prosodic model are then added sequentially to rescore all search paths and find the best recognized word sequence. Experimental results on the Academia Sinica MCDC corpus show word, character and base-syllable accuracy rates of 58.29%, 64.94% and 68.89%, which are better than those of the baseline system by 4.43%, 4.6% and 3.06%, respectively. Error analysis shows that prosodic information is useful for resolving word segmentation ambiguity and tone pattern confusion in fluent speech, but is less effective in disfluent speech.
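The second-stage rescoring described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes the lattice has already been expanded into a small list of candidate paths, that each path carries one log-score per knowledge source, and that the scores are combined as a weighted sum. The names `Path`, `best_path`, the weight keys and all score values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Path:
    """One candidate word sequence taken from the lattice (hypothetical layout)."""
    words: List[str]
    acoustic: float  # log-score from the acoustic model
    lm: float        # log-score from the language model
    brk: float       # log-score from the break-related prosodic models
    pstate: float    # log-score from the prosodic state-related models


def best_path(paths: List[Path], weights: Dict[str, float]) -> Path:
    """Return the path with the highest weighted sum of log-scores.

    A weight of 0.0 switches a knowledge source off, so the same function
    covers the scores being added one after another.
    """
    def total(p: Path) -> float:
        return (weights["acoustic"] * p.acoustic
                + weights["lm"] * p.lm
                + weights["break"] * p.brk
                + weights["pstate"] * p.pstate)
    return max(paths, key=total)


# Two made-up candidates that differ in word segmentation.
candidates = [
    Path(["a", "b"], acoustic=-10.0, lm=-4.0, brk=-6.0, pstate=-5.0),
    Path(["ab"], acoustic=-11.0, lm=-4.0, brk=-2.0, pstate=-3.0),
]

# Acoustic and LM scores only (no prosodic information).
no_prosody = {"acoustic": 1.0, "lm": 1.0, "break": 0.0, "pstate": 0.0}
# Break-related scores added first, then prosodic-state scores.
with_break = {"acoustic": 1.0, "lm": 1.0, "break": 1.0, "pstate": 0.0}
with_all = {"acoustic": 1.0, "lm": 1.0, "break": 1.0, "pstate": 1.0}

first_choice = best_path(candidates, no_prosody)
final_choice = best_path(candidates, with_all)
```

In this toy example the candidate preferred without prosodic scores is overturned once the break and prosodic-state scores are included, which mirrors the kind of word segmentation ambiguity the abstract says prosodic information helps resolve. A real system would rescore the lattice directly instead of enumerating paths, and would tune the weights on held-out data.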
|Appears in Collections:||Thesis|