Beamforming and multi-channel post-filtering for speech enhancement based on multi-rank signal models
|Keywords:||multi-rank signal model;beamforming;post-filtering;speech enhancement|
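For beamforming, this dissertation introduces multi-rank signal models and a norm constraint to reduce the sensitivity of the conventional minimum variance distortionless response (MVDR) design to array uncertainties. Using the pseudo-observation technique, the problem is transformed into state space, and the resulting nonlinear problem is solved with first- and second-order extended Kalman filters (EKF) and the unscented Kalman filter (UKF). In addition, the selection of the norm constraint value is analyzed in full. Simulation results show that, compared with diagonal loading, the norm constraint is more robust to unknown signal power, model errors, and similar uncertainties.
For multi-channel post-filtering, we define a new spatial coherence measure and, for the first time, introduce multi-rank signal models into post-filter development. This coherence measure describes the similarity between two sound fields through two power spectral density (PSD) matrices, and it can be used to design a new multi-channel filter. Based on this coherence measure, this dissertation analyzes the bias caused by the desired sound field and the noise sound field, and proposes a second, bias-compensated multi-channel filter. When the bias or the noise PSD matrix is correctly estimated, this filter is equivalent to the ideal Wiener filter. Theory and experiments confirm that, with a more accurate signal model, the proposed bias-compensated multi-channel filter provides better speech quality.|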
Beamforming and multi-channel post-filtering are two major techniques in microphone array speech enhancement. In conventional algorithms, the relationship between a source and the microphones is usually described by delays or by the relative transfer functions (RTFs) of each microphone pair, which limits the source model to a rank-1 spatial correlation. In real sound propagation, however, the spatial correlation is typically multi-rank because of local scattering, wavefront fluctuation, or reverberation. This dissertation therefore proposes beamforming and multi-channel post-filtering algorithms based on multi-rank signal models. The proposed algorithms alleviate self-cancellation of the desired source during noise reduction and improve speech quality.
For the proposed beamforming algorithms, multi-rank signal models and a norm constraint are introduced into the minimum variance distortionless response (MVDR) beamforming problem to reduce the sensitivity of the design to array uncertainties. Using the pseudo-observation method, the beamforming problems are transformed into state space and solved with first- and second-order extended Kalman filters (EKF) and the unscented Kalman filter (UKF). In addition, the selection of the norm constraint value is analyzed thoroughly. The simulations show that the norm constraint is more robust to unknown signal powers and model errors than the diagonal loading (DL) technique.
For the proposed multi-channel post-filtering algorithms, a novel spatial coherence measure is defined, and multi-rank signal models are introduced into post-filter development for the first time. The spatial coherence measure evaluates the similarity between the measured signal fields using their power spectral density (PSD) matrices. A multi-channel post-filter is proposed based on this measure. Under this measure, the bias term caused by the similarity between the desired signal field and the noise field is further investigated, and a solution based on bias compensation is proposed. It can be shown that the compensated solution is equivalent to the optimal Wiener filter if the bias or the noise PSD matrix is perfectly measured. The theoretical and empirical results demonstrate that the proposed bias-compensated post-filter provides better speech quality when a more accurate signal model is used.
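To make the rank-1 versus multi-rank distinction concrete, the following is a minimal sketch of two textbook formulations: the rank-1 MVDR beamformer with optional diagonal loading, and a common multi-rank generalization that minimizes the output power subject to unit gain on the desired-signal covariance. It is not the dissertation's method: the norm constraint and the Kalman-filter solvers described above are not reproduced here. The array geometry, frequency, angular spread, power levels, and all function names are assumptions chosen only for illustration.

```python
import numpy as np

def steering_vector(n_mics, spacing, angle_rad, freq, c=343.0):
    """Far-field steering vector of a uniform linear array (hypothetical setup)."""
    delays = np.arange(n_mics) * spacing * np.sin(angle_rad) / c
    return np.exp(-2j * np.pi * freq * delays)

def mvdr_weights(R, a, loading=0.0):
    """Rank-1 MVDR: minimize w^H R w subject to w^H a = 1 (optional diagonal loading)."""
    R_loaded = R + loading * np.eye(R.shape[0])
    Ri_a = np.linalg.solve(R_loaded, a)
    return Ri_a / (a.conj() @ Ri_a)

def multirank_mv_weights(R, R_s, loading=0.0):
    """Multi-rank case: minimize w^H R w subject to w^H R_s w = 1.

    The solution is the principal generalized eigenvector of (R_s, R),
    computed here by whitening with the Cholesky factor of R.
    """
    R_loaded = R + loading * np.eye(R.shape[0])
    L = np.linalg.cholesky(R_loaded)          # R_loaded = L L^H
    Li = np.linalg.inv(L)
    M = Li @ R_s @ Li.conj().T                # whitened signal covariance
    M = 0.5 * (M + M.conj().T)                # enforce Hermitian symmetry numerically
    _, vecs = np.linalg.eigh(M)               # eigenvalues in ascending order
    w = Li.conj().T @ vecs[:, -1]             # map principal eigenvector back
    gain = np.real(w.conj() @ R_s @ w)
    return w / np.sqrt(gain)                  # scale so that w^H R_s w = 1

def output_snr(w, R_s, R_n):
    """Ratio of desired-signal power to noise power at the beamformer output."""
    return np.real(w.conj() @ R_s @ w) / np.real(w.conj() @ R_n @ w)

# Assumed scenario: 6 microphones, 5 cm spacing, 1 kHz, source near broadside.
n_mics, spacing, freq = 6, 0.05, 1000.0
a = steering_vector(n_mics, spacing, 0.0, freq)

# Multi-rank source model: average of rank-1 terms over a +/-10 degree spread,
# a simple stand-in for local scattering around the nominal direction.
spread = np.deg2rad(np.linspace(-10.0, 10.0, 21))
R_s = sum(np.outer(v, v.conj())
          for v in (steering_vector(n_mics, spacing, t, freq) for t in spread))
R_s = R_s / len(spread)

# Noise: one directional interferer at 50 degrees plus sensor noise.
a_i = steering_vector(n_mics, spacing, np.deg2rad(50.0), freq)
R_n = 10.0 * np.outer(a_i, a_i.conj()) + 0.1 * np.eye(n_mics)
R = R_s + R_n

w_rank1 = mvdr_weights(R, a)
w_multi = multirank_mv_weights(R, R_s)

print("rank-1 MVDR output SNR:", output_snr(w_rank1, R_s, R_n))
print("multi-rank  output SNR:", output_snr(w_multi, R_s, R_n))
```

Without loading, the multi-rank weights maximize the output SNR for the assumed spread-source covariance, so they can never do worse than the rank-1 design on that model. The rank-1 design treats the off-axis part of the spread source as interference, which is one way to picture the self-cancellation effect mentioned above.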
|Appears in Collections:||Thesis|