Title: 籃球影片之語義標注與摘要擷取之研究
A STUDY ON SEMANTIC ANNOTATION AND SUMMARIZATION OF BASKETBALL VIDEO
Authors: 陳俊旻
Chen, Chun-Min
陳玲慧
Chen, Ling-Hwei
Institute of Computer Science and Engineering
Keywords: semantic event; video retrieval; information retrieval; image recognition; webcast text; slow motion replay; sports; sports video summarization
Publication date: 2013
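Abstract: Sports videos play an important role in our leisure and entertainment. However, because sports videos carry a large amount of information, they require substantial bandwidth and transmission time, and viewers must also spend a great deal of time watching them. To save unnecessary time and energy costs, video highlight retrieval, video summarization, and slow motion replay detection have become popular research topics. Most current methods analyze every frame of a video. However, semantic events occur only in frames with a scoreboard, whereas slow motion replays appear only in frames without one; extracting semantic events or replays from irrelevant frames reduces both accuracy and efficiency. Moreover, most existing methods are designed for soccer videos, and basketball videos have received comparatively little attention. To address the challenges faced by existing methods, this dissertation takes basketball videos as an example and proposes a novel sports video analysis framework that lets the general public search game highlights efficiently and lets professionals extend it to related applications (automatic highlight generation, athlete motion analysis, team tactics analysis, etc.). In this framework, a video frame segmentation method is first provided to divide a sports video into two classes, frames with and without a scoreboard. Then a semantic event detection method is proposed for frames with a scoreboard, and a slow motion replay detection method for frames without one. Regarding semantic event detection, existing methods mostly use the visual or audio content of the video itself as features. Relying only on video content, however, often leads to a semantic gap, i.e., the gap between lower-level video features and higher-level semantic events. Although some recent methods use webcast text as external knowledge to bridge the semantic gap, extracting semantic events from webcast text and annotating them in sports videos still poses many difficulties and challenges. In this dissertation, we discuss these difficulties and propose two methods to resolve them. Regarding slow motion replay detection, existing methods fall roughly into two categories. A slow motion replay is often preceded and followed by special-effect frames added by the broadcaster in post-production; methods of the first category detect replays from the positions of these effects, but basketball videos are more complex and this assumption does not always hold for them. Methods of the second category analyze the features of slow motion segments and use them to distinguish replay segments from normal segments; however, because some features used for soccer do not apply to basketball, these methods leave room for improvement in basketball. Basketball is one of the most important sports in the world, yet many challenges in detecting slow motion replays in basketball videos remain unsolved. This dissertation proposes a new method for detecting slow motion replays in basketball videos, providing an important resource for sports video analysis. Experimental results show that the feasibility and effectiveness of the proposed framework and methods are well validated. Because neither the framework nor the methods use basketball-specific features, we expect that this work can be extended to other types of sports videos.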
Semantic event extraction and slow motion replay extraction for sports videos have become popular research topics. Most studies analyze every video frame; however, semantic events appear only in frames with a scoreboard, whereas replays appear only in frames without one. Extracting events and replays from unrelated frames introduces errors and degrades performance. In this dissertation, a novel framework is proposed to tackle the challenges of sports video analysis. In the framework, a scoreboard detector first divides video frames into two classes, with and without a scoreboard. Then a semantic event extractor extracts semantic events from frames with a scoreboard, and a slow motion replay extractor extracts replays from frames without one. For semantic event extraction, most existing studies analyze the audio-visual features of the video content as their knowledge source. However, schemes relying on video content encounter a challenge called the semantic gap, which is the distance between lower-level video features and higher-level semantic events. Although multimodal fusion schemes that use webcast text as external knowledge to bridge the semantic gap have been proposed recently, extracting semantic events from sports webcast text and annotating them in sports videos remain challenging tasks. In this dissertation, we address the challenges in the multimodal fusion scheme and propose two methods to overcome them. For slow motion replay detection, many methods have been proposed, and they fall into two categories. The first assumes that a replay is sandwiched between a pair of visually similar special digital video effects, but this assumption does not always hold in basketball videos. The second analyzes replay features to distinguish replay segments from non-replay segments; its results are not satisfactory because some features (e.g., the dominant color of the sports field) are not applicable to basketball. Most replay detectors focus on soccer videos. In this dissertation, we propose a novel method for detecting slow motion replays in basketball videos. The feasibility and effectiveness of all the proposed methods have been demonstrated in experiments. The proposed sports video analysis framework is expected to extend to other sports.
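The two-branch routing described in the abstract (scoreboard frames go to event extraction, non-scoreboard frames go to replay detection) can be sketched roughly as follows. This is only an illustrative sketch, not the dissertation's implementation: the function name `split_by_scoreboard`, the `has_scoreboard` callback, and the inclusive index-pair segment representation are all hypothetical, and the scoreboard detector itself is left as a placeholder supplied by the caller.

```python
from itertools import groupby
from typing import Callable, List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")
Segment = Tuple[int, int]  # inclusive (first_frame, last_frame) indices


def split_by_scoreboard(
    frames: Sequence[Frame],
    has_scoreboard: Callable[[Frame], bool],  # hypothetical detector callback
) -> Tuple[List[Segment], List[Segment]]:
    """Group consecutive frames into scoreboard / non-scoreboard segments."""
    with_sb: List[Segment] = []     # candidates for semantic event extraction
    without_sb: List[Segment] = []  # candidates for slow motion replay detection
    # Label every frame, then merge runs of identical labels into segments.
    labelled = ((i, bool(has_scoreboard(f))) for i, f in enumerate(frames))
    for flag, run in groupby(labelled, key=lambda pair: pair[1]):
        indices = [i for i, _ in run]
        target = with_sb if flag else without_sb
        target.append((indices[0], indices[-1]))
    return with_sb, without_sb


if __name__ == "__main__":
    # Toy input: each "frame" is just a boolean standing in for the detector result.
    toy_frames = [True, True, False, False, False, True]
    event_segments, replay_segments = split_by_scoreboard(toy_frames, lambda f: f)
    print("scoreboard segments:", event_segments)
    print("non-scoreboard segments:", replay_segments)
```

In such a sketch, each downstream extractor would only ever see the segments relevant to it, which is the efficiency argument the abstract makes against analyzing every frame with every method.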
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079455643
http://hdl.handle.net/11536/75903
Appears in Collections: Thesis


Files in This Item:

  1. 564301.pdf