Title: A Study on Robust Recognition of Hearing-impaired Mandarin Speakers
Robust Distributed Recognition of Hearing-impaired Mandarin Speech over Wireless Networks
Author: 李承龍
Lee, Cheng-Lung
張文輝
Chang, Wen-Whei
Institute of Communications Engineering
Keywords: Distributed speech recognition; Joint source-channel coding; Voice conversion
Publication date: 2008
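Abstract: Within a distributed speech recognition (DSR) framework, this thesis provides robustness techniques against two separate sources of degradation: speaker variability and transmission errors. The performance loss caused by speaker variability stems from the speaker mismatch between the model-training and the actual testing phases of the speech recognizer. For speech produced by hearing-impaired Mandarin speakers, we propose a voice conversion scheme that makes it match the speech characteristics embodied in the recognition models. The conversion system is designed around the properties of Mandarin speech: taking into account the initial-final syllable structure and tonal variation, it converts feature parameters at both the spectral and the prosodic level, with the parameters extracted using a sinusoidal speech model. Because different phone classes differ markedly in their acoustic characteristics, separate optimized conversion functions are designed for the subsyllabic parameters of initials and of finals. In addition, the articulation rate is modified with linear or nonlinear conversion schemes designed for the different types of subsyllables. For tone modification, the structure of the four Mandarin tones is exploited: the fundamental-frequency (F0) contour is first analyzed with an orthogonal transform, and a vector mapping scheme then estimates the optimal converted F0 contour. Simulations confirm that the voice conversion scheme effectively improves the intelligibility of hearing-impaired speech and thereby raises its recognition accuracy. The second research focus of the DSR system is that speech feature parameters transmitted over wireless links suffer burst channel errors, which degrade recognition performance. To address this, we design an error-concealment decoding scheme whose key is to integrate the residual redundancy in the source encoder output with the correlation properties of channel errors. Redundancy analysis of the recognition features shows that the sequence of quantization indices produced by the encoder remains strongly correlated, while burst errors in mobile communication are well modeled by a Markov model. We combine these two kinds of information into a joint source-channel decoding algorithm based on the maximum a posteriori (MAP) criterion. Experimental results confirm that the source-channel decoder effectively improves error-concealment performance in wireless transmission environments.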
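The tone-modification step (an orthogonal transform of the F0 contour followed by vector mapping) can be sketched roughly as below. This is a minimal illustration under assumptions the record does not state: the orthogonal transform is taken to be an orthonormal DCT-II, each contour is resampled to a fixed length `m` before the transform, only the first `k` coefficients are kept, and the paired codebook of (hearing-impaired, normal) coefficient vectors is hypothetical, standing in for one trained from aligned utterances. All function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def dct2(x: Sequence[float]) -> List[float]:
    """Orthonormal DCT-II (assumed stand-in for the thesis's orthogonal transform)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)))
    return out


def idct2(c: Sequence[float], n: int) -> List[float]:
    """Inverse orthonormal DCT-II from the first len(c) coefficients (missing ones = 0)."""
    x = []
    for i in range(n):
        total = 0.0
        for k, ck in enumerate(c):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * ck * math.cos(math.pi * (i + 0.5) * k / n)
        x.append(total)
    return x


def resample(x: Sequence[float], m: int) -> List[float]:
    """Linear interpolation of x onto m >= 2 equally spaced points."""
    n = len(x)
    if n == 1:
        return [x[0]] * m
    out = []
    for j in range(m):
        t = j * (n - 1) / (m - 1)
        i = min(int(t), n - 2)
        frac = t - i
        out.append(x[i] * (1.0 - frac) + x[i + 1] * frac)
    return out


# Hypothetical codebook: (source coefficients, target coefficients) pairs.
Codebook = List[Tuple[List[float], List[float]]]


def convert_contour(f0: Sequence[float], codebook: Codebook, m: int = 16, k: int = 4) -> List[float]:
    """Map a hearing-impaired F0 contour to a target contour by nearest-codeword lookup."""
    coefs = dct2(resample(f0, m))[:k]  # low-order contour shape parameters
    _, target = min(codebook, key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], coefs)))
    return resample(idct2(target, m), len(f0))  # back to the original duration
```

Keeping only a few low-order coefficients makes the codebook describe contour shape (level, slope, curvature) rather than frame-level jitter, which is one plausible reason to transform before mapping.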
This study focuses on the robustness of distributed speech recognition (DSR) systems against inter-speaker and channel variability. In the first part, we develop joint source-channel decoding algorithms with increased robustness against channel errors in mobile DSR applications. A MAP symbol decoding algorithm that exploits the combined a priori information of the source and the channel is proposed. It is used in conjunction with a modified BCJR algorithm for decoding convolutional channel codes based on sectionalized code trellises. Performance is further enhanced by the Gilbert channel model, which more closely characterizes the statistical dependencies between channel bit errors. In the second part, we develop voice conversion approaches based on feature transformation to perform speaker adaptation for hearing-impaired Mandarin speech. The basic strategy is the combined use of spectral and prosodic conversion to modify hearing-impaired Mandarin speech. The analysis-synthesis system is based on a sinusoidal representation of the speech production mechanism. By taking advantage of the tone structure of Mandarin speech, pitch contours are orthogonally transformed and applied within the sinusoidal framework to perform pitch modification. We also propose a time-scale modification algorithm that finds accurate alignments between hearing-impaired and normal utterances. Using these alignments, spectral conversion is performed on subsyllabic acoustic units by a continuous probabilistic transform based on a Gaussian mixture model.
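The idea behind MAP symbol decoding with combined source and channel a priori information can be illustrated with a forward-backward pass over joint (quantization index, channel state) states. This is a simplified sketch under stated assumptions: the quantization indices follow a first-order Markov model (one way to express their residual redundancy), the channel is a two-state Gilbert model that flips each bit with a state-dependent probability, and the bits are sent uncoded. It omits the convolutional code and the modified BCJR decoder over sectionalized trellises; all names and parameter values are hypothetical.

```python
from typing import List


def bits_of(index: int, nbits: int) -> List[int]:
    """MSB-first binary representation of a quantization index."""
    return [(index >> (nbits - 1 - b)) & 1 for b in range(nbits)]


def symbol_likelihood(rx_bits, index, C, err, nbits):
    """M[s_prev][s_end] = P(rx_bits, state after last bit = s_end | index sent,
    state before first bit = s_prev). State 0 = good, 1 = bad; C[s][s2] is the
    per-bit state transition and err[s2] the bit-flip probability in state s2."""
    tx = bits_of(index, nbits)
    M = []
    for s_prev in (0, 1):
        f = [1.0 if s == s_prev else 0.0 for s in (0, 1)]
        for x, y in zip(tx, rx_bits):
            g = [0.0, 0.0]
            for s in (0, 1):
                for s2 in (0, 1):
                    p_bit = err[s2] if x != y else 1.0 - err[s2]
                    g[s2] += f[s] * C[s][s2] * p_bit
            f = g
        M.append(f)
    return M


def _normalize(table):
    """Scale a K x 2 table to sum to 1 (avoids underflow; posteriors are unaffected)."""
    total = sum(sum(row) for row in table)
    for row in table:
        row[0] /= total
        row[1] /= total


def map_symbol_decode(rx, prior, T, C, err, nbits):
    """Per-symbol MAP decisions argmax_i P(i_t = i | all received bits)."""
    K, n = len(prior), len(rx)
    p_gb, p_bg = C[0][1], C[1][0]
    stat = [p_bg / (p_gb + p_bg), p_gb / (p_gb + p_bg)]  # stationary channel state
    M = [[symbol_likelihood(rx[t], i, C, err, nbits) for i in range(K)] for t in range(n)]

    alpha = [[[0.0, 0.0] for _ in range(K)] for _ in range(n)]
    for i in range(K):
        for s in (0, 1):
            alpha[0][i][s] = prior[i] * sum(stat[s0] * M[0][i][s0][s] for s0 in (0, 1))
    _normalize(alpha[0])
    for t in range(1, n):
        for j in range(K):
            for s2 in (0, 1):
                alpha[t][j][s2] = sum(alpha[t - 1][i][s] * T[i][j] * M[t][j][s][s2]
                                      for i in range(K) for s in (0, 1))
        _normalize(alpha[t])

    beta = [[[1.0, 1.0] for _ in range(K)] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        for i in range(K):
            for s in (0, 1):
                beta[t][i][s] = sum(T[i][j] * M[t + 1][j][s][s2] * beta[t + 1][j][s2]
                                    for j in range(K) for s2 in (0, 1))
        _normalize(beta[t])

    decoded = []
    for t in range(n):
        post = [alpha[t][i][0] * beta[t][i][0] + alpha[t][i][1] * beta[t][i][1]
                for i in range(K)]
        decoded.append(max(range(K), key=post.__getitem__))
    return decoded
```

Carrying the channel state across symbol boundaries is what lets the decoder use the burst-error memory of the Gilbert model; with a memoryless binary symmetric channel the inner per-bit recursion would collapse to a simple product of bit probabilities.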
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009113807
http://hdl.handle.net/11536/47190
Appears in Collections: Thesis