Title: Voice Conversion and Its Application to the Correction of Abnormal Phonation (語音轉換及其在異常發聲矯正之應用)
Voice Conversion with Application to Enhanced Intelligibility of Hearing Impaired
Authors: 李承龍
Cheng-Long Lee
Wen-Whei Chang
Keywords: voice conversion (語音轉換); sinusoidal speech model (正弦語音模型)
Issue Date: 2000
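Abstract: Voice conversion transforms the voice of a source speaker into that of a target speaker by means of a parameter conversion mechanism, so the first step is to extract the feature parameters that characterize speaker individuality. The feature conversion mechanism uses a linear mapping function designed on the basis of a Gaussian mixture model, and the parameters of this function are obtained with estimation-theoretic techniques. The mapping function minimizes the distortion between the converted source-speaker features and the target-speaker features, and the speech signal is then synthesized with the harmonic synthesis technique of the sinusoidal speech model. The results show that, for normal speakers, voice conversion based on the Bark spectrum yields better synthesized speech quality than conversion based on cepstral coefficients. In subjective listening tests, 84% of the utterances converted with the Bark spectrum were judged favorably, 5% more than with cepstral coefficients. A second topic of this thesis is the use of voice conversion techniques to design a pronunciation-correction aid for hearing-impaired speakers. Experimental results show that voice conversion based on cepstral coefficients can effectively improve the intelligibility of hearing-impaired speech.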
Voice conversion aims to modify the speech signal of a source speaker so that it sounds as if it were uttered by a target speaker. The basic strategy is to detect and exploit the characteristic features that identify speaker individuality. This is done by decomposing the speech waveforms into a sum of sinusoids. The sine-wave amplitudes are used to determine the spectral envelope, which is then characterized by a Gaussian mixture model. The characteristic features are modified by a mapping function that minimizes the spectral distortion between source and target speakers uttering the same text. The mapping function is implemented as a linear transformation whose parameters are trained by a joint estimation algorithm. Experimental results indicate that the Bark spectrum is preferable to the cepstrum for voice conversion between two normal-hearing speakers. The second part of this study presents a novel means of exploiting voice conversion in the design of speaking aids for the hearing-impaired. Experimental results indicate that the cepstrum-based voice conversion system is useful in enhancing the intelligibility of impaired speech.
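As an illustration of the kind of mapping function the abstract describes, the following is a minimal sketch of a Gaussian-mixture-weighted linear transformation: each mixture component contributes its own linear map, and the maps are blended by the posterior probability of the component given the source feature vector. It is not taken from the thesis. The function names, the diagonal-covariance simplification, and the per-component `offsets` and `slopes` (assumed to have been trained already) are all hypothetical, and training is not shown.

```python
import math

def gaussian_diag(x, mean, var):
    """Density of a diagonal-covariance Gaussian at vector x (assumed form)."""
    log_p = 0.0
    for xd, md, vd in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
    return math.exp(log_p)

def posteriors(x, weights, means, variances):
    """Posterior probability of each mixture component given x."""
    likes = [w * gaussian_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    total = sum(likes)
    return [like / total for like in likes]

def convert(x, weights, means, variances, offsets, slopes):
    """Posterior-weighted sum of per-component linear maps.

    Component i maps x to offsets[i] + slopes[i] * (x - means[i]),
    applied element-wise (a diagonal simplification chosen for this sketch).
    """
    post = posteriors(x, weights, means, variances)
    y = [0.0] * len(x)
    for p, m, b, a in zip(post, means, offsets, slopes):
        for d in range(len(x)):
            y[d] += p * (b[d] + a[d] * (x[d] - m[d]))
    return y
```

With a single component the function reduces to one ordinary linear map. With several components, a source vector close to one component mean is converted almost entirely by that component's map, which is what lets the overall mapping vary smoothly across the feature space.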
Appears in Collections: Thesis