Title: Video Content Recovery and Modification by Object Inpainting
Authors: Ling, Chih-Hung
Liao, Hong-Yuan Mark
Chen, Yong-Sheng
Institute of Computer Science and Engineering
Keywords: video inpainting
Issue Date: 2011
Abstract: With the popularization of digital cameras, people have begun using images and videos to record moments of daily life; consequently, the recovery and modification of digital content has become an important research topic. For digital content recovery, video inpainting can automatically repair the missing regions of a video. Because existing video inpainting techniques perform poorly when repairing moving objects, in this dissertation we propose two object inpainting techniques for repairing moving objects in videos. For digital content modification, video super-resolution can automatically increase the spatial and temporal resolution of a video. Because existing video super-resolution techniques perform poorly when extending the temporal resolution of moving objects, we propose a video content expansion technique that increases the frame rate of a video while enriching the motion content of the moving objects.

In the first work, we apply a dimensional transformation to convert object information in each frame into object-trajectory information on spatio-temporal slices, where each trajectory records how one part of the object changes along the time axis. We then use image inpainting to repair the damaged trajectory regions on the slices, and finally, through the inverse transformation, we reconstruct the probable contour and position of the occluded object in each frame. In the next step, based on the reconstructed contour, we select a suitable posture from the available object postures and use it to replace the occluded object in the frame; when no suitable posture is available, we propose a posture synthesis technique to generate the required posture. Since the performance of the first method is easily affected by the object's direction of motion, in the second method we propose an object inpainting technique that is not restricted by the direction of motion.

In the second work, we first use manifold learning to convert object-motion information in the video into motion-trajectory information in manifold space. Based on the distribution of the trajectories in manifold space, we characterize the continuity of motion and define two motion-prediction strategies, with which we can predict the probable postures of the occluded object. We then combine the proposed prediction strategies with bidirectional prediction to select several candidate postures for each occluded object, and finally use a Markov random field to select the most appropriate posture.

In the third work, we propose a video content expansion technique for videos with low frame rates. We first use manifold learning to convert object-motion information in the video into motion-trajectory information in manifold space. In the second step, we use the proposed motion-data alignment method to align the different motion data and arrange them into a tensor; we then apply tensor decomposition to extract motion information from the training videos and combine it with the person information of the original video to reconstruct the distribution of the motion trajectory in manifold space at a higher frame rate. Finally, we use the method proposed in the second work to select appropriate postures and insert them at appropriate positions in the video.
With the popularization of digital cameras, people use images and videos to record snapshots of daily life. Hence, video content recovery and modification have become popular research topics in recent years. For video content recovery, video inpainting is one of the most important techniques for automatically recovering the missing regions of a video. However, most video inpainting algorithms generate artifacts if the object to be inpainted is seriously occluded or its motion is complicated. To avoid generating such artifacts, in this dissertation we propose two object-based video inpainting schemes that solve the above-mentioned spatial inconsistency problem and temporal continuity problem simultaneously. As to video content modification, video super-resolution is one of the most important techniques for automatically increasing the spatial and temporal resolution of a video. However, existing super-resolution methods may fail to produce realistic and smooth results when dealing with sequences of human motion. Hence, we propose a learning-based approach that increases the frame rate of a video and also enriches the content of human motion. In our first work, we present a novel framework for object completion in a video. We transform objects in frames into object trajectories in spatio-temporal slices, and complete the partially damaged object trajectories in the 2-D slices. The completed slices are then combined to obtain a sequence of virtual contours of the damaged object. Next, a posture sequence retrieval technique is applied to retrieve the sequence of object postures most similar to the virtual contours. Finally, a synthetic posture generation scheme is proposed to compensate for an insufficient number of available postures.
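The dimensional transformation in the first work can be illustrated with a minimal NumPy sketch. The array shapes, the fixed row, and the write-back step are illustrative assumptions, not the dissertation's implementation: a video volume indexed as (t, y, x) is resliced at a fixed row y, so each column of the resulting 2-D slice traces one object part along the time axis, and a completed slice can be written back through the inverse transformation.

```python
import numpy as np

# Hypothetical grayscale video volume: 60 frames of 120x160 pixels.
video = np.zeros((60, 120, 160), dtype=np.float32)

def spatio_temporal_slice(video, row):
    """Reslice the (t, y, x) volume at a fixed row y.

    The result is a 2-D image of shape (frames, width) in which the
    horizontal motion of each object part appears as a trajectory.
    """
    return video[:, row, :]

slice_y = spatio_temporal_slice(video, row=60)
assert slice_y.shape == (60, 160)

# Inverse transformation: writing the completed slice back restores,
# for every frame, the reconstructed pixels at that row.
completed = slice_y.copy()  # stand-in for the inpainted slice
video[:, 60, :] = completed
```

Repeating this for every row (or every column, for vertical motion) yields the full stack of slices whose completed trajectories are recombined into per-frame virtual contours.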
In our second work, we propose a human object inpainting scheme that divides the whole process into three steps: human posture synthesis, graphical model construction, and posture sequence estimation. Human posture synthesis is used to enrich the number of available postures. All postures are then projected into a manifold space to build a graphical model of human motion. We also introduce two constraints that enforce the local motion continuity property. Finally, we perform both forward and backward prediction to derive locally optimal solutions, and then apply a Markov random field model to compute the overall best solution. In our third work, we propose a learning-based approach to increase the temporal resolution of human motion sequences. The proposed framework consists of the following steps: graphical model construction, motion trajectory reconstruction, and posture sequence estimation. In the first step, each motion sequence is projected into the manifold space and represented as a motion trajectory. We then apply tensor decomposition to decompose the motion trajectories into orthogonal factors. After that, we combine the motion factor from the training sequences with the person factor from the input sequence to reconstruct the motion trajectory of the input sequence. Finally, we use the reconstructed motion trajectory, combined with the object inpainting technique of our second work, to generate the final result.
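The projection step shared by the second and third works can be sketched as follows. This is a hedged illustration only: PCA via SVD is used here as a simple linear stand-in for the manifold learning step, and the posture descriptors are random placeholders; the point is that each frame becomes one point in a low-dimensional space, and the ordered points form the motion trajectory whose local continuity the prediction strategies exploit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posture features: 50 frames, each a 100-D silhouette descriptor.
postures = rng.normal(size=(50, 100))

def project_to_trajectory(postures, dim=2):
    """Project a posture sequence into a low-dimensional space.

    PCA (via SVD) stands in for the manifold learning step: each frame
    maps to one point, and the ordered points form a motion trajectory.
    """
    centered = postures - postures.mean(axis=0)
    # SVD gives the principal axes; keep the leading `dim` components.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:dim].T

trajectory = project_to_trajectory(postures)
assert trajectory.shape == (50, 2)

# Local motion continuity: consecutive trajectory points should stay
# close, which is the property the two prediction strategies exploit.
step_lengths = np.linalg.norm(np.diff(trajectory, axis=0), axis=1)
```

With trajectories in this space, forward and backward prediction propose candidate next points, and a Markov random field selects the globally most consistent posture sequence.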
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079455831
http://hdl.handle.net/11536/40928
Appears in Collections: Thesis


Files in This Item:

  1. 583101.pdf