Title: Semantic Segmentation of Indoor-Scene RGB-D Images Based on Iterative Contraction and Merging
Authors: 卓士軒 (Cho, Shih-Hsuan)
王聖智 (Wang, Sheng-Jyh)
Institute of Electronics
Keywords: semantic segmentation; image segmentation; RGB-D images; iteratively; contractive; merging
Publication date: 2017
Abstract: We propose a method for semantic segmentation of indoor-scene images that combines convolutional neural networks (CNNs) with the Iterative Contraction and Merging (ICM) algorithm, and that uses depth images to analyze the 3-D structure of the scene. The raw depth image from the depth camera is processed by two bilateral filters, which fill in missing depth values and smooth the whole image. ICM is an unsupervised image segmentation method that preserves boundary information well. We build on this property by supplying ICM with higher-level information, namely the CNN's dense semantic prediction, the depth image, and the normal vector map, so that its progression from high resolution to low resolution tends toward a semantic segmentation. The result is a hierarchical segmentation tree. We also propose a decision process that uses the CNN's coarse dense prediction map as a reference to select a finer final semantic segmentation from the hierarchical segmentation tree. In our experiments, the proposed method produces more accurate object boundaries than the CNN semantic segmentation results, and adding high-level information improves the semantic segmentation of indoor scenes compared with using RGB information only.
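Illustrative sketch (not taken from the thesis): the abstract describes using a coarse CNN prediction map as a reference when labeling the regions of a segmentation. One simple way to picture that idea, assumed here purely for illustration, is a per-region majority vote: every pixel in a region takes the class that the CNN predicted most often inside that region. The function name `label_regions_by_majority` and the toy arrays are hypothetical, and the thesis's actual decision process over the hierarchical segmentation tree may differ.

```python
from collections import Counter, defaultdict


def label_regions_by_majority(region_ids, cnn_labels):
    """Give every pixel the most frequent CNN class within its region.

    region_ids and cnn_labels are equally sized 2-D lists: one holds a
    region index per pixel (e.g. one level of a segmentation hierarchy),
    the other a coarse per-pixel class prediction.
    This is an assumed, simplified stand-in, not the thesis's procedure.
    """
    # Count, for each region, how often each class was predicted in it.
    votes = defaultdict(Counter)
    for region_row, label_row in zip(region_ids, cnn_labels):
        for region, label in zip(region_row, label_row):
            votes[region][label] += 1

    # Pick the winning class per region (ties go to the class seen first).
    winner = {region: counts.most_common(1)[0][0]
              for region, counts in votes.items()}

    # Paint the winning class back onto every pixel of its region.
    return [[winner[region] for region in row] for row in region_ids]


# Toy example: three regions with sharp boundaries, noisy coarse labels.
regions = [[0, 0, 1, 1],
           [0, 0, 1, 1],
           [2, 2, 2, 2]]
coarse = [[0, 0, 0, 1],
          [0, 1, 1, 1],
          [2, 2, 0, 2]]
refined = label_regions_by_majority(regions, coarse)
```

In this toy case the refined map follows the region boundaries exactly, which is the intuition behind pairing a boundary-preserving segmentation with a coarse semantic prediction.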
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070350252
http://hdl.handle.net/11536/140390
Appears in Collections: Thesis