VISION-BASED ANALYSIS OF OBJECT POSES AND HUMAN ACTIVITIES FOR HUMAN-COMPUTER INTERACTION APPLICATIONS
|Keywords:|Human-computer interaction; Head pose and facial expression; Hand gesture; Leg movements; Object pose|
For the analysis of hand gestures, we propose a new system for analyzing hand gestures from single images. A two-stage method is used to estimate the hand pose and the finger joint angles. The hand pose is obtained from the light spots formed by laser beams projected onto the back of the hand, together with the generalized Hough transform; the finger joint angles are obtained by an inverse kinematic technique and dynamic programming. When the relations among the fingers are taken into account, the complexity of the proposed method is O(m^2), whereas that of the traditional exhaustive approach is O(m^12), where m is the number of possible angles for each joint. Experimental results show that the estimated hand parameters can be used for 3-D computer animation of the hand and for hand gesture recognition, confirming the feasibility of the proposed method.
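The complexity reduction claimed above comes from searching the joint-angle chain with dynamic programming instead of exhaustive enumeration. The following minimal sketch illustrates the idea for one finger; the cost functions `unary` and `pair` and all numbers are invented for the demo, not taken from the dissertation.

```python
# Illustrative dynamic-programming search over finger-joint angles.
# Each of n joints may take one of m candidate angles; unary(j, a)
# scores how well angle a at joint j fits the image, and pair(a, b)
# penalizes implausible adjacent-joint combinations.  DP finds the best
# chain in O(n * m^2) time, versus O(m^n) for exhaustive enumeration.

def best_chain(n, angles, unary, pair):
    m = len(angles)
    # cost[a] = best total cost of a chain ending with angle index a
    cost = [unary(0, angles[a]) for a in range(m)]
    back = []
    for j in range(1, n):
        new_cost, choice = [], []
        for b in range(m):
            best_a = min(range(m),
                         key=lambda a: cost[a] + pair(angles[a], angles[b]))
            new_cost.append(cost[best_a] + pair(angles[best_a], angles[b])
                            + unary(j, angles[b]))
            choice.append(best_a)
        cost, back = new_cost, back + [choice]
    # Backtrack the optimal configuration.
    idx = min(range(m), key=lambda b: cost[b])
    chain = [idx]
    for choice in reversed(back):
        idx = choice[idx]
        chain.append(idx)
    chain.reverse()
    return [angles[i] for i in chain], min(cost)

angles = [0, 15, 30, 45]                 # candidate joint angles (degrees)
target = [10, 25, 40]                    # hypothetical per-joint evidence
unary = lambda j, a: abs(a - target[j])  # image-fit cost (made up)
pair = lambda a, b: 0.1 * abs(a - b)     # smoothness between joints
config, total = best_chain(3, angles, unary, pair)
print(config, total)                     # -> [15, 30, 45] 18.0
```

With m candidate angles per joint, each layer considers only pairs of adjacent joints, which is what keeps the search polynomial in m.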
To increase productivity and to facilitate everyday life, scientists and engineers have long been trying to build intelligent systems that can interact with human beings in natural, human ways. Such intelligent systems must be able to analyze human activities and to provide natural feedback. Because computer vision is a noninvasive sensing method, vision-based systems for analyzing human activities are, for many applications, more convenient and friendly than other ways of sensing. Hence, computer vision technologies for analyzing human activities are desirable for developing human-computer interaction systems. Since it is natural for humans to convey intended activities through head poses, facial expressions, hand gestures, and leg movements, this dissertation study focuses on analyzing these human activities, and new methods for doing so are proposed. In addition, by placing man-made marks on a human body, the activity of the human can be determined by analyzing the motions of these marks. This technique is often used for precise localization. Many vision-based localization techniques exist, but few of them can tell us about the quality of the input or of the estimated result. This problem is also investigated in this dissertation study.

For the analysis of the head pose and the facial expression, four new methods based on the use of single images of human faces are proposed. Two of them are direct methods designed for simplified cases; the other two are iterative methods for the general case. The two direct methods and one of the iterative methods are derived from the perspective projection equations of the feature points on the human face. The other iterative method extends the concept of successive scaled orthographic approximations to estimate the parameters of the human face. Experimental results show that the proposed methods are robust.
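The idea behind successive scaled orthographic approximations can be shown in a one-dimensional toy: the perspective projection of each feature point is repeatedly corrected with the current depth estimate, and the orthographic scale is refit. The single-axis setup and all numbers below are invented for illustration; the dissertation applies the concept to full 3-D face feature points.

```python
# 1-D toy of successive scaled orthographic approximations: recover the
# unknown depth Z0 of a reference point from perspective projections of
# feature points with known model offsets (x_i, z_i).

f = 500.0                                             # focal length (pixels)
model = [(100, 0), (80, 50), (-60, -40), (120, 20)]   # (x_i, z_i) offsets
true_Z0 = 1000.0
image = [f * x / (true_Z0 + z) for x, z in model]     # perspective u_i

# Step 0: the scaled orthographic guess ignores the depth offsets z_i.
Z0 = f / (sum(u / x for (x, z), u in zip(model, image)) / len(model))
for _ in range(10):
    # Correct each image coordinate with the current depth estimate,
    # then refit the orthographic scale s = f / Z0.
    s = sum(u * (Z0 + z) / Z0 / x for (x, z), u in zip(model, image)) / len(model)
    Z0 = f / s
print(round(Z0, 2))  # converges to the true depth 1000.0
```

The true depth is a fixed point of this iteration (substituting Z0 = 1000 reproduces s = f/1000 exactly), and the correction shrinks the error at each pass.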
Furthermore, the iterative methods are shown to have high percentages of convergence, proving the feasibility of the proposed approach.

For the analysis of free-hand gestures, a new model-based system for analyzing free-hand gestures from single images by computer vision techniques is proposed. In this study, the orientation and position of the hand and the joint angles of the fingers and the thumb are estimated in two separate steps. The orientation and position of the hand are estimated first, using sparse range data generated by laser beams and the generalized Hough transform. Next, estimation of the joint angles of the fingers and the thumb is regarded as an optimization problem: possible configurations of the fingers and the thumb are generated by a novel inverse kinematic technique, and the best configurations are found by a new algorithm based on the dynamic programming technique. The estimated parameters are shown by experiments to be suitable for 3-D hand gesture animation. In addition, the applicability of the proposed system is demonstrated by a simple hand gesture recognition system. Experimental results show the feasibility of the proposed approach.

For the analysis of leg movements, a vision-based system for tracking and interpreting leg motions in image sequences captured by a single camera is developed, with which a user can control his movements in a virtual world with his legs. Twelve control commands are defined. The trajectories of color marks placed on the shoes of the user are used to determine the types of leg movements by a first-order Markov process. The types of leg movements are then encoded symbolically as input to Mealy machines, which recognize the control command associated with a sequence of leg movements. The proposed system is implemented on a commercial PC without any special hardware.
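The generalized Hough transform used for the hand-pose step can be sketched in its simplest 2-D form: observed points (such as laser-spot positions) vote in an accumulator for the transformation that best aligns a point model with the image. The translation-only setup, the point sets, and the cell size below are all illustrative assumptions, not the dissertation's actual configuration.

```python
# Minimal 2-D generalized-Hough-transform sketch: given a set of model
# points and a set of observed points, vote for the translation that
# best aligns them; the correct shift collects the most votes even with
# outliers present.
from collections import Counter

def ght_translation(model_pts, observed_pts, cell=1.0):
    """Vote in a quantized accumulator for the model-to-image shift."""
    acc = Counter()
    for (ox, oy) in observed_pts:
        for (mx, my) in model_pts:
            # Each (observed, model) pairing votes for one shift.
            vote = (round((ox - mx) / cell), round((oy - my) / cell))
            acc[vote] += 1
    (sx, sy), votes = acc.most_common(1)[0]
    return (sx * cell, sy * cell), votes

model = [(0, 0), (2, 1), (4, 0), (1, 3)]
observed = [(mx + 5, my + 7) for (mx, my) in model] + [(9, 9)]  # one outlier
shift, votes = ght_translation(model, observed)
print(shift, votes)  # -> (5.0, 7.0) 4: the true shift wins the vote
```

A full pose estimate would extend the accumulator to rotation and scale, but the voting principle is the same.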
Because the transition functions of Mealy machines are deterministic, the implementation of the proposed system is simple and the response time of the system is short. Experimental results, obtained at a frame rate of 14 Hz, are included to prove the feasibility of the proposed approach.

To develop a reliable computer vision system, the employed algorithm must guarantee good output quality. In this study, to ensure the quality of the pose estimated from line features, two simple test functions based on statistical hypothesis testing are defined. First, an error function based on the relation between the line features and certain quality thresholds is defined. Using the first test function, defined by a lower bound of the error function, poor input can be detected before the pose is estimated. After pose estimation, the second test function can be used to decide whether the estimated result is sufficiently accurate. Experimental results show that the first test function can detect input of low quality or with erroneous line correspondences, and that the overall proposed method yields reliable estimates.

In summary, the experimental results of all the proposed approaches show their feasibility and prove that the proposed systems can serve as the basis for developing more effective human-computer interaction systems.
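The command-recognition step above can be sketched as a deterministic Mealy machine: each encoded leg-movement symbol drives one transition, and a command is emitted as output on the transitions that complete a recognized sequence. The states, two-symbol alphabet, and commands below are invented for illustration; the dissertation defines twelve commands over its own movement alphabet.

```python
# Minimal deterministic Mealy-machine sketch for symbol-driven command
# recognition: each input symbol causes exactly one transition, which
# may emit a command as output.

class Mealy:
    def __init__(self, start, transitions):
        # transitions: (state, symbol) -> (next_state, output or None)
        self.state, self.t = start, transitions

    def step(self, symbol):
        self.state, out = self.t[(self.state, symbol)]
        return out

# Toy machine: two consecutive "step-forward" movements ('F') trigger
# the command "walk"; a "raise" movement ('R') from the idle state
# triggers "jump".
t = {
    ("idle", "F"): ("half", None),
    ("half", "F"): ("idle", "walk"),
    ("idle", "R"): ("idle", "jump"),
    ("half", "R"): ("idle", None),
}
m = Mealy("idle", t)
cmds = [m.step(s) for s in "FFRF"]
print([c for c in cmds if c])  # -> ['walk', 'jump']
```

Because the transition table is a plain dictionary lookup, each frame costs constant time, which is consistent with the short response time reported for the system.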
|Appears in Collections:|Thesis|