Title: 中文文本中限定性抽象名詞指代消解
Definite Abstract Anaphora Resolution in Chinese Texts
Authors: 程俊樺
Cheng, Jyun-Hua
梁婷
Liang, Tyne
資訊學院資訊科技(IT)產業研發碩士專班 (College of Computer Science, Industrial R&D Master's Program in Information Technology)
Keywords: anaphora resolution (指代消解); abstract anaphora (抽象指代)
Issue Date: 2010
Abstract: Anaphora is a common form of lexical substitution in written text, in which an expression refers back to something mentioned earlier. Chinese texts exhibit pronominal anaphora, zero anaphora, and nominal anaphora, and the referent may be either an abstract description or a named entity. This thesis focuses on definite abstract noun anaphora and proposes a clause-based resolution procedure. Anaphor identification and feature extraction draw on Tongyici Cilin (a Chinese synonym thesaurus), the Academia Sinica (CKIP) 80,000-entry lexicon, and related terms obtained from web (Google) search results. Anaphors are identified with a finite state machine, which achieves 90% accuracy on 1,538 instances. To select antecedents, we extract ten features of four types: position, distance, lexical, and semantic features. Resolution is performed with both a support vector machine (SVM) classifier and a weighted scoring method, and a genetic algorithm is used to find the best feature combination. On 241 abstract noun anaphora instances, the SVM classifier achieves 40.66% accuracy for clause-level matches and 68.46% for sentence-level matches; the weighted method achieves 42.32% and 70.54%, respectively.
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079790506
http://hdl.handle.net/11536/46592
Appears in Collections: Thesis


Files in This Item:

  1. 050601.pdf