Title: Computer Vision Techniques for Detection and Recognition of Drinking Activity
Authors: THAM, JIE SHENG (譚傑森)
Chen, Yong-Sheng (陳永昇)
EECS International Graduate Program (電機資訊國際學程)
Keywords: Computer Vision; Image Processing; object detection; object recognition; depth information
Issue Date: 2016
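Abstract (translated from Chinese): Drinking is one of the most important natural behaviours in daily life, ensuring that the body receives enough water to prevent dehydration. In an era of rapidly advancing biotechnology, the elderly population (aged 60 or above) grows every year. Cognitive decline is among the most common problems this group faces. In addition, the shortage of caregivers and steadily rising medical costs will become pressing problems for the elderly. Ambient Assisted Living systems offer one solution. Such systems use technological devices and sensors in the everyday environment to assist elderly people with daily life at home. Most existing research on ambient assisted living focuses on falls, hand washing and basic body gesture recognition. Moreover, the dining assistance systems currently on the market are mostly sensor-equipped utensils and body-worn sensors, which users must use or wear for the system to be effective. These sensor-based dining devices have drawbacks, the greatest being that elderly people resist wearing the sensors. With advances in computer vision technology and algorithms, cameras can provide more information, and depth cameras open up the potential to overcome the shortcomings of traditional algorithms. This thesis presents two novel computer vision techniques for detecting and recognising drinking actions at home, developed mainly around the depth information in images. Very little research so far has used depth information for detection and recognition in dining-related ambient assisted living systems. The key property of depth information is that the system's accuracy is not affected by changes in ambient light and illumination. The first proposed method is built on depth features of the hand motion during drinking; these features are then fed into the Dynamic Time Warping algorithm to detect and recognise the drinking action. Experimental results show that, compared with existing computer vision methods, the proposed method achieves a competitive accuracy of 89%. The experiments also revealed that many everyday actions produce depth information similar to that of drinking. To improve detection and recognition accuracy, the second proposed method therefore uses depth information to detect drinking-related objects. It takes the depth information of those objects and their depth histogram features as recognition features, and classifies with two different algorithms: 1) Histogram Dissimilarity Measurement with the K-NN algorithm; 2) Hu's Moment Invariant features with a statistical algorithm. The histogram dissimilarity measurement distinguishes two classes, cup and non-cup objects; with the Earth Mover's Distance as the dissimilarity measure it reaches a recognition accuracy of 66.7%. To further improve the robustness of the system, Hu moment features are used to recognise objects in depth information. Compared with existing algorithms, the proposed algorithm reaches an accuracy of 82%. A characteristic of these two depth-based object recognition algorithms is that they achieve fairly high accuracy and efficiency without a large amount of data. Furthermore, because the method computes with depth information only, it reduces computational complexity while still achieving good recognition performance.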
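As a rough illustration of the dynamic time warping step, the sketch below aligns a one-dimensional sequence of hand depth readings against labelled templates and picks the closest one. The function names, the template values (hypothetical hand-to-camera distances in millimetres) and the nearest-template decision rule are assumptions made for illustration; the abstract does not specify which depth features or which decision rule the thesis uses.

```python
import math

def dtw_distance(a, b):
    """Dynamic time warping cost between two numeric sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: stretch b, stretch a, or advance both.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def classify(sequence, templates):
    """Return the label whose template has the lowest DTW cost to `sequence`."""
    best_label, best_cost = None, math.inf
    for label, examples in templates.items():
        for example in examples:
            c = dtw_distance(sequence, example)
            if c < best_cost:
                best_label, best_cost = label, c
    return best_label

# Hypothetical templates: the hand approaches the face and returns ("drinking")
# versus a hand that stays at a constant depth ("other").
TEMPLATES = {
    "drinking": [[800, 600, 400, 400, 600, 800]],
    "other": [[800, 800, 800, 800]],
}

# A slower drinking motion than the template; DTW absorbs the timing difference.
query = [800, 700, 600, 400, 400, 400, 600, 800]
print(classify(query, TEMPLATES))  # drinking
```

Because DTW allows one sample to align with several samples of the other sequence, the same action performed at a different speed still produces a low cost, which is why it suits gesture sequences of varying duration.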
Drinking is one of the most important daily dining activities because it ensures that the human body has sufficient fluid to avoid dehydration. With the advancement of medical science and technology, the global ageing population (aged 60 and above) increases every year. A common problem in an ageing society is cognitive disorder, which can make even a simple daily activity such as drinking difficult. Moreover, the shortage of elderly caregivers and ever-increasing medical costs will soon become a major issue for society. A promising way to offset the shortage of caregivers is ambient assisted living, the use of information technologies and sensors to assist elderly people in performing their daily activities at home. Currently, most ambient assisted living research focuses on fall detection, hand washing and simple human gestures. Furthermore, commercial dining assistance systems (which cover drinking) are mostly sensor-equipped utensils and body-worn sensors that users must use or wear. These sensor-based systems have several drawbacks, chief among them that users may be reluctant to wear the sensors. Recent advances in camera-based computer vision can provide additional information that may not be available from motion sensors. In addition, the emergence of consumer RGB-D cameras has opened the way to new solutions that overcome the drawbacks of traditional methods. This thesis presents two novel computer vision techniques for the detection and recognition of drinking activities at home that use only the depth information from RGB-D cameras. To the best of my knowledge, there is very little work on using video cameras with depth sensors to detect and recognise dining activities in ambient assisted living.
The main advantage of using depth information is that accuracy is not affected by changes in lighting conditions and illumination, unlike with conventional RGB cameras. The first proposed technique extracts features from the depth information that characterises the hand action during drinking. Once these features are gathered, the dynamic time warping algorithm is used to detect and recognise the drinking activity. The experimental results show that the proposed method achieves a recognition accuracy of 89%, which is competitive with existing vision-based techniques. However, several daily human activities can resemble drinking in their depth representation. To increase the accuracy of drinking detection and recognition at home, the second proposed method detects drinking-related objects using depth information only. It uses the depth information and histogram features of these objects, from which the object can be extracted efficiently for detection and recognition. The method comprises two recognition techniques: 1) histogram dissimilarity measurement with a K-Nearest-Neighbour classifier and 2) Hu Moment Invariant features with statistical classification. Because an object can be represented by histogram features, the histogram dissimilarity measurement can be used to recognise drinking target objects such as mugs. With the Earth Mover's Distance as the dissimilarity measure, it achieves an accuracy of 66.7% for recognising mugs. To further improve the robustness of the system, the Hu Moment Invariant method is proposed for recognising objects in depth images.
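The histogram dissimilarity technique can be pictured with a small sketch: build a normalised depth histogram for a candidate object, measure its Earth Mover's Distance to labelled histograms, and let the k nearest neighbours vote. Everything specific here is an assumption for illustration, including the bin count, the depth range, the helper names and the toy training histograms; the abstract does not describe the thesis's actual features or parameters. For one-dimensional histograms of equal total mass, the Earth Mover's Distance reduces to the summed absolute difference of the cumulative histograms (in units of bin widths), which is the form used below.

```python
from collections import Counter
from itertools import accumulate

def depth_histogram(depths, bins=8, lo=0.0, hi=4000.0):
    """Normalised histogram of depth readings (hypothetical range in mm)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for d in depths:
        if lo <= d < hi:
            # Clamp guards against a floating-point index equal to `bins`.
            counts[min(int((d - lo) / width), bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * bins

def emd_1d(h1, h2):
    """Earth Mover's Distance between two equal-mass 1-D histograms."""
    return sum(abs(a - b) for a, b in zip(accumulate(h1), accumulate(h2)))

def knn_predict(query, training, k=3):
    """Majority label among the k training histograms closest to `query`."""
    nearest = sorted(training, key=lambda item: emd_1d(query, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Toy training set (made-up histograms): mug depths cluster in a narrow band,
# non-mug regions are spread over the depth range.
TRAINING = [
    ([0, 0.1, 0.8, 0.1, 0, 0, 0, 0], "mug"),
    ([0, 0.2, 0.7, 0.1, 0, 0, 0, 0], "mug"),
    ([0, 0, 0.9, 0.1, 0, 0, 0, 0], "mug"),
    ([0.125] * 8, "non-mug"),
    ([0, 0, 0, 0, 0.1, 0.2, 0.3, 0.4], "non-mug"),
    ([0.3, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1], "non-mug"),
]

query = [0, 0.15, 0.75, 0.1, 0, 0, 0, 0]
print(knn_predict(query, TRAINING))  # mug
```

Unlike a bin-by-bin difference, the Earth Mover's Distance grows with how far mass has to move, so a histogram shifted by one bin counts as closer than one shifted by five.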
The experimental results show that the Hu Moment Invariant method achieves a recognition accuracy of 82%, which compares favourably with existing techniques in the literature. The main strength of the proposed approach is that, using depth information only and without a large amount of training data, it achieves high accuracy and efficiency. Recognising objects from depth images alone also reduces computational complexity.
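To make the Hu Moment Invariant idea concrete, the sketch below computes the first two of Hu's seven invariants from a binary object mask and checks that they do not change when the mask is rotated by 90 degrees or shifted. The mask, the helper names and the `within_tolerance` check are hypothetical: the abstract says only that a "statistical approach" is used for classification, so the mean and standard deviation test here is a stand-in, not the thesis's classifier.

```python
import math

def hu_first_two(img):
    """First two Hu moment invariants of a 2-D list of non-negative weights."""
    pixels = [(x, y, v) for y, row in enumerate(img) for x, v in enumerate(row) if v]
    m00 = sum(v for _, _, v in pixels)
    if m00 == 0:
        raise ValueError("image is empty")
    # Centroid, used to make the moments translation invariant.
    cx = sum(x * v for x, _, v in pixels) / m00
    cy = sum(y * v for _, y, v in pixels) / m00

    def eta(p, q):
        # Central moment normalised by area for scale invariance.
        mu = sum((x - cx) ** p * (y - cy) ** q * v for x, y, v in pixels)
        return mu / m00 ** (1 + (p + q) / 2)

    e20, e02, e11 = eta(2, 0), eta(0, 2), eta(1, 1)
    return e20 + e02, (e20 - e02) ** 2 + 4 * e11 ** 2

def rotate90(img):
    """Rotate a 2-D list clockwise by 90 degrees."""
    return [list(row) for row in zip(*img[::-1])]

def within_tolerance(features, means, stds, z=3.0):
    """Assumed stand-in rule: accept if every feature is within z std devs."""
    return all(abs(f - m) <= z * s for f, m, s in zip(features, means, stds))

# Hypothetical binary mask of a segmented object.
MASK = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 1],
]

original = hu_first_two(MASK)
rotated = hu_first_two(rotate90(MASK))
print(all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(original, rotated)))  # True
```

In practice a library routine that returns all seven invariants would normally replace the hand-written moments; the two computed here are enough to show why the features tolerate changes in object position and orientation.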
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070460809
http://hdl.handle.net/11536/140404
Appears in Collections: Thesis