Title: 運動影片內容分析、理解與註釋之研究
Sports Video Content Analysis, Understanding and Annotation
Authors: 陳華總
Chen, Hua-Tsung
李素瑛
Lee, Suh-Yin
Institute of Computer Science and Engineering
Keywords: multimedia; video analysis; sports video; signal processing; computer vision; content analysis
Issue Date: 2009
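Abstract: With the development of education, entertainment, sports, and many other multimedia applications, the volume of digital audiovisual content is growing rapidly. Much research has therefore been devoted to analyzing and understanding multimedia content and to building practical systems that let users quickly obtain the multimedia data they want. Sports video is an important part of audiovisual multimedia data, with considerable commercial value, varied entertainment appeal, and a large audience, so a growing body of research focuses on sports video analysis. Most current sports video analysis centers on scene classification or highlight extraction. However, more and more viewers and players want multimedia systems to help them obtain richer sports information, and referees are even asking for computer technology to assist their decisions and improve fairness. This thesis focuses on integrating visual features from single-view video and designing the associated algorithms to achieve sports video content understanding, indexing, annotation, and retrieval.
In sports video, important events arise mainly from the interaction between the ball and the players. To obtain semantically and tactically relevant content, we first propose an effective and fast method to track the ball trajectory and compute the ball position in each frame. Ball tracking is a hard problem: the ball is small and inconspicuous in the frame and moves fast, so identifying which object is the ball in a single frame is almost impossible. We therefore use the characteristics of the ball's motion across frames to identify which trajectory segment is the ball trajectory, rather than identifying which object is the ball in each frame. To obtain richer game information and a deeper understanding of game content, we propose a novel method that reconstructs the 3D ball trajectory from single-view video. This 3D trajectory reconstruction algorithm applies to sports such as basketball, volleyball, and tennis, which have a well-defined court model and in which most court features are captured by the camera. In such videos, the court lines and feature objects captured by the camera can be used to compute the transformation between 3D world positions and 2D frame coordinates. Inferring 3D information from 2D information is inherently challenging, because depth information is lost during image capture. Here, we use physical properties to build a model of the ball's motion in 3D space; combined with the previously computed 2D trajectory and the 3D-to-2D transformation, this lets us estimate the parameters of the 3D trajectory model and thereby reconstruct the ball's motion in 3D space. The extracted 2D trajectory and the reconstructed 3D trajectory have many important applications in sports video, such as locating shooting positions in basketball, detecting events in volleyball, and analyzing pitch trajectories in baseball. A 360-degree virtual replay generated from the 3D trajectory further lets viewers change the viewpoint at will to watch the ball's motion.
In baseball, the pitch location (the position of the ball relative to the strike zone as it passes the batter) is an important factor affecting the direction in which the ball travels after being hit. The strike zone is the reference for determining the pitch location, so we propose an algorithm that analyzes the batter's stance and contour to set the strike zone, applicable to both left-handed and right-handed batting stances. Beyond the pitcher-batter confrontation, the fielding process after the ball is hit also draws viewers' attention. By recognizing feature objects and line segments in the frame, we classify the field region currently captured by the camera. Because the region captured by the camera is the region where the event occurs, we can use the transitions between field regions in the video to infer the ball's path and the defensive process, and we provide comparison of similar defensive sequences for analyzing fielding strategy.
We conducted a variety of experiments on basketball, volleyball, and baseball videos to evaluate the performance of the proposed methods. The results confirm the feasibility and superiority of the proposed methods and show that a considerable amount of game content information can be obtained from single-view sports video for players and coaches to use in tactical analysis and statistics, and to give viewers a deeper understanding of the game. We also believe that the sports information retrieval and video content understanding methods proposed in this thesis can be applied to more kinds of sports video.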
The explosive proliferation of multimedia data in education, entertainment, sports and various other applications necessitates the development of multimedia application systems and tools. As important multimedia content, sports video has attracted considerable research effort because of its commercial benefits, entertainment value and large audience base. The majority of existing work on sports video analysis focuses on shot classification and highlight extraction. However, a growing number of sports fans and professionals want computer-assisted sports information retrieval, and umpires are even asking for computer technologies to assist their judgments. In this thesis, we concentrate on feature integration and semantic analysis for sports video content understanding, indexing, annotation and retrieval from single-camera video.
In sports games, important events are mainly caused by ball-player interaction, and the ball trajectory carries significant information and semantics. To infer the semantic and tactical content, we first propose an efficient and effective scheme to track the ball and compute the ball positions over frames. Ball tracking is an arduous task because the ball is small and moves fast; it is almost impossible to distinguish the ball within a single frame. Hence, we use the ball's motion characteristics over frames to identify the true ball trajectory, instead of recognizing which object is the ball in each frame. To retrieve more information about the games and gain further insight, we design an innovative approach to 3D ball trajectory reconstruction from single-camera video for court sports, where the court lines and feature objects captured in the frames can be used for camera calibration to compute the transformation between the 3D real world and the 2D frame. The problem of 2D-to-3D inference is intrinsically challenging because depth information is lost in image capture. By incorporating the 3D-2D transformation and the physical characteristics of ball motion, we are able to approximate the depth information and accomplish the 2D-to-3D trajectory reconstruction. Many applications of sports video understanding and sports information retrieval can be built on the obtained 2D trajectory and the reconstructed 3D trajectory, such as shooting location estimation in basketball, event detection in volleyball and pitch analysis in baseball. The 3D virtual replay generated from the 3D trajectory makes game watching a whole new experience, in which the audience can switch between viewpoints to watch the ball motion.
In baseball, the pitch location (the location of the ball relative to the strike zone when the ball passes the batter) is an important factor affecting the motion of the ball hit into the field. The strike zone provides the reference for determining the pitch location. Hence, we design a contour-based strike zone shaping and visualization method. Whether the batter is right- or left-handed, we are able to shape the strike zone adaptively to the batter's stance. Computer-assisted strike/ball judgment can also be achieved via the shaped strike zone. In addition to the pitcher/batter confrontation, the defense process after the ball is batted also attracts much attention. Therefore, we design algorithms to recognize spatial patterns in frames for classifying the active regions of event occurrence in the field. The ball routing patterns and defense process can be inferred from the transitions of the active regions captured in the video. Furthermore, sequences with similar ball routing and defense patterns can be retrieved for defense strategy analysis.
Comprehensive experiments on basketball, volleyball and baseball videos have been conducted to evaluate the performance of the proposed methods. The experimental results show that the proposed methods perform well in retrieving game information, and even in reconstructing 3D information, from single-camera video for different kinds of sports. It is our belief that the preliminary work in this thesis will lead to satisfactory solutions for sports information retrieval, content understanding, tactics analysis and computer-assisted game study in more kinds of sports videos.
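The reconstruction idea summarized in the abstract (a calibrated 3D-to-2D transformation combined with a physical model of ball motion) can be illustrated with a minimal sketch. This is not the thesis's algorithm; it is one common way to pose the problem under stated assumptions: a known 3x4 projection matrix `P`, known frame timestamps, and a drag-free ballistic model with gravity along the negative z axis. The function names `project`, `ballistic` and `reconstruct` are hypothetical. Under these assumptions each 2D observation gives two equations that are linear in the initial position and velocity, so the six parameters can be estimated by least squares.

```python
import numpy as np

G = 9.81  # assumed gravitational acceleration (m/s^2); world z axis points up


def project(P, X):
    """Project 3D points X (N, 3) to pixel coordinates (N, 2) with a 3x4 matrix P."""
    Xh = np.hstack([X, np.ones((len(X), 1))])  # homogeneous coordinates
    x = Xh @ P.T
    return x[:, :2] / x[:, 2:3]


def ballistic(X0, V, t):
    """3D positions at times t for a ball launched from X0 with velocity V (no drag)."""
    t = np.asarray(t, dtype=float)[:, None]
    pos = np.asarray(X0, dtype=float) + np.asarray(V, dtype=float) * t
    pos[:, 2] -= 0.5 * G * t[:, 0] ** 2  # gravity acts on z only
    return pos


def reconstruct(P, t, uv):
    """Least-squares estimate of (X0, V) from a 2D track uv (N, 2) observed at times t (N,)."""
    rows, rhs = [], []
    for ti, (u, v) in zip(t, uv):
        for coord, p in ((u, P[0]), (v, P[1])):
            # From u = (P[0] . Xh) / (P[2] . Xh): (P[0] - u * P[2]) . [X(t), 1] = 0
            a = p - coord * P[2]
            # Substitute X(t) = X0 + V*t + (0, 0, -G*t^2/2) and move known terms right
            rows.append(np.concatenate([a[:3], a[:3] * ti]))
            rhs.append(-a[3] + 0.5 * G * ti ** 2 * a[2])
    sol, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return sol[:3], sol[3:]
```

Given a calibrated `P` and a tracked 2D trajectory, `reconstruct(P, t, uv)` returns the estimated launch position and velocity, and `ballistic` then regenerates the 3D path. The known gravity term is what resolves the depth ambiguity of a single view in this sketch; real footage would additionally need handling for noise, outliers and air resistance.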
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009217807
http://hdl.handle.net/11536/74490
Appears in Collections: Thesis


Files in This Item:

  1. 780701.pdf