Title: 語者調適和正規化技術在語音辨認之初步研究
A First Study on Speaker Adaptation and Normalization for Continuous Mandarin Speech Recognition
Authors: 蔡忠安
Tsai, Chung-An
陳信宏
Sin-Horng Chen
Institute of Communications Engineering
Keywords: Speaker Adaptation; Speaker Normalization; MLLR technique; speech recognition
Issue Date: 1997
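Abstract: This thesis focuses on speaker adaptation and normalization techniques for continuous Mandarin speech recognition. For speaker adaptation, a transformation matrix is estimated for each test speaker with the MLLR technique and used to adapt the existing speaker-independent models into the HMM models best suited to that speaker. Experimental results show that the recognition rate after adaptation is considerably higher than that of the baseline system, and that it increases as the amount of adaptation data grows. For speaker normalization, three methods derived from the MLLR technique are used. Method 1: the speaker bias estimated by MLLR is subtracted directly from the training speakers' feature parameters, and the HMM models are then re-estimated. Method 2: the training speakers' feature parameters are smoothed using the adapted mean vectors, and the HMM models are then re-estimated. Method 3: a transformation matrix is estimated with MLLR for each training speaker's feature parameters, and the HMM models are re-estimated from the transformed features. Experimental results show that the speaker-normalized models estimated by Method 2 are the most accurate, with a recognition rate of 62.62%, higher than both SBR (54.91%) and CMN (56.96%).
In this thesis, speaker adaptation and normalization for continuous Mandarin speech recognition are discussed. In speaker adaptation, the MLLR technique is employed to transform the SI HMM models into a version that is more suitable for a new testing speaker, using a small set of his/her utterances. In speaker normalization, three schemes based on the MLLR technique are proposed to remove speakers' personal characteristics from the input speech signals in order to train a set of speaker-independent HMM models. Experimental results showed that the base-syllable recognition rate can be raised from 53.16% to 62.30% by the proposed speaker adaptation method using adaptation data sets of 5 sentential utterances (about 30 seconds). The proposed speaker normalization method is also effective in improving the recognition performance: the base-syllable recognition rate can be raised from 53.16% to 62.62%.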
URI: http://140.113.39.130/cdrfb3/record/nctu/#NT860435015
http://hdl.handle.net/11536/63034
Appears in Collections: Thesis