Title: A Study on Automatic Construction of Virtual Talking Faces and Applications
Authors: Lai, Cheng-Jyun; Tsai, Wen-Hsiang
Department: Institute of Computer Science and Engineering
Keywords: talking-heads; virtual face; face recognition; facial animation; speech recognition; lip synchronization; automatic learning; feature extraction; sample-based image synthesis
Issue Date: 2003
Abstract: In this study, a system for the automatic creation of virtual talking faces is proposed. The system is based on 2D facial images and includes three processes: video recording, feature learning, and animation generation. In the video recording process, a transcript containing all classes of Mandarin syllables is proposed, so that a model need only read the sentences on it instead of reading every syllable separately. In the feature learning process, audio features, facial features, and base image sequences are all learned automatically, with a proposed sentence segmentation algorithm assisting the learning of syllables. Base image sequences that exhibit natural head-shaking actions are also generated. An image matching method is proposed to learn the positions of facial features in a face image with sub-pixel precision; the method is also applicable to shaking faces. In the animation generation process, several methods are proposed to improve the quality of the animations. A method is proposed to synchronize speech with image frames. To create smoother animations, the proper number of transition frames between successive visemes is analyzed, and a method is proposed to find the best way to integrate a mouth image with a base image. To create more natural virtual faces, the behaviors of real persons while talking and singing are studied and simulated. Finally, three kinds of interesting applications are implemented. Good experimental results confirm the feasibility of the proposed methods.
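The abstract mentions an image matching method that locates facial features with sub-pixel precision. As a hedged illustration only (not the thesis's actual algorithm), one common way to achieve sub-pixel localization is brute-force normalized cross-correlation followed by parabolic refinement of the correlation peak. The function names and the synthetic test setup below are assumptions for demonstration:

```python
import numpy as np

def match_template_ncc(image, template):
    """Brute-force normalized cross-correlation of a template over a grayscale image."""
    ih, iw = image.shape
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t * t).sum())
    scores = np.zeros((ih - th + 1, iw - tw + 1))
    for y in range(scores.shape[0]):
        for x in range(scores.shape[1]):
            patch = image[y:y + th, x:x + tw]
            p = patch - patch.mean()
            denom = np.sqrt((p * p).sum()) * t_norm
            scores[y, x] = (p * t).sum() / denom if denom > 0 else 0.0
    return scores

def subpixel_peak(scores):
    """Refine the integer correlation peak with a 1D parabolic fit along each axis."""
    y0, x0 = np.unravel_index(np.argmax(scores), scores.shape)

    def refine(c_minus, c_center, c_plus):
        denom = c_minus - 2.0 * c_center + c_plus
        return 0.0 if denom == 0 else 0.5 * (c_minus - c_plus) / denom

    dy = dx = 0.0
    if 0 < y0 < scores.shape[0] - 1:
        dy = refine(scores[y0 - 1, x0], scores[y0, x0], scores[y0 + 1, x0])
    if 0 < x0 < scores.shape[1] - 1:
        dx = refine(scores[y0, x0 - 1], scores[y0, x0], scores[y0, x0 + 1])
    return y0 + dy, x0 + dx
```

Because the parabola is fit to the three correlation values around the integer peak, the refined offset always stays within half a pixel of it, which is the sense in which the estimate is sub-pixel.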
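The speech-image synchronization step can likewise be sketched in outline: assuming each syllable's start and end times are known from segmentation, the intervals can be mapped to video frame index ranges at a fixed frame rate. The function and timing data below are hypothetical, not the thesis's implementation:

```python
def syllable_frame_spans(syllables, fps=30.0):
    """Map syllable (start_sec, end_sec) intervals to inclusive video frame index ranges.

    Each syllable occupies the frames from round(start * fps) up to, but not
    including, round(end * fps), so adjacent syllables share no frames.
    """
    spans = []
    for start, end in syllables:
        first = int(round(start * fps))
        last = max(first, int(round(end * fps)) - 1)
        spans.append((first, last))
    return spans
```

For example, at 30 fps a syllable spanning 0.0-0.2 s maps to frames 0 through 5, and the next syllable spanning 0.2-0.5 s starts cleanly at frame 6.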
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009123522
http://hdl.handle.net/11536/52757
Appears in Collections: Thesis


Files in This Item:

  1. 352201.pdf