Automatic Text Summarization System for Chinese News
|Keywords:||automatic summarization;Chinese automatic summarization;automatic news summarization;summarization;text summarization;automatic text summarization;summarization for news|
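This study analyzes news content from Yahoo News using the Academia Sinica part-of-speech tag set, extracts the core keywords, and converts each sentence into a keyword string. Using characteristics of Chinese grammar and a synonym thesaurus, it then expands the core keywords. Next, the expanded keyword set serves as the basis for selecting a keyword-based summary, and the concept proposed by [Yihong Gong, Xin Liu, 2001] is used to select a latent semantic analysis summary. The study integrates these two summary results, taking readability into account, to produce a single summary for the user to read.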
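The latent semantic analysis step can be illustrated with a minimal sketch. It assumes the commonly described form of the [Yihong Gong, Xin Liu, 2001] approach: build a term-by-sentence matrix, take its leading right singular vectors, and for each one pick the sentence with the largest component. The function names, the use of absolute values to sidestep sign ambiguity, and the pure-Python power iteration (in place of a library SVD) are assumptions made for illustration, not details taken from the thesis.

```python
import math
from typing import List

Matrix = List[List[float]]


def right_singular_vectors(a: Matrix, k: int, iters: int = 200) -> List[List[float]]:
    """Approximate the first k right singular vectors of a (terms x sentences)
    by power iteration with deflation on the Gram matrix A^T A."""
    n = len(a[0])
    gram = [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]
    found: List[List[float]] = []
    for _ in range(k):
        v = [1.0 + 0.01 * i for i in range(n)]  # deterministic start vector
        converged = True
        for _ in range(iters):
            w = [sum(g * x for g, x in zip(grow, v)) for grow in gram]
            # Deflation: remove components along vectors already found.
            for u in found:
                dot = sum(x * y for x, y in zip(w, u))
                w = [x - dot * y for x, y in zip(w, u)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-9:  # no further singular direction to extract
                converged = False
                break
            v = [x / norm for x in w]
        if not converged:
            break
        found.append(v)
    return found


def lsa_summary(a: Matrix, k: int) -> List[int]:
    """For each leading singular vector, select the not-yet-chosen sentence
    with the largest absolute component; return indices in document order."""
    chosen: List[int] = []
    for v in right_singular_vectors(a, k):
        for i in sorted(range(len(v)), key=lambda i: -abs(v[i])):
            if i not in chosen:
                chosen.append(i)
                break
    return sorted(chosen)
```

Each singular vector is treated here as one latent topic, so selecting one sentence per vector tends to cover topics that a purely keyword-driven ranking might miss. A production system would more likely call a library SVD routine than hand-rolled power iteration.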
With the growing popularity of the Internet, information overload has become a major problem, and people have to spend more and more time looking for the information they need. In recent years, search engines have been used in many ways and for many purposes, so a system that can reduce the amount of content without losing its principal meaning is necessary. In this research, the application domain is Internet news summarization, and the data corpus was collected from Yahoo. We use CKIP (Chinese Knowledge and Information Processing) to perform the POS tagging task. Based on the POS tagging information, the system analyzes and extracts the core keywords and converts each sentence into a keyword string. Keyword expansion is then performed based on the Chinese semantic architecture and HowNet. After the expansion, each core keyword is given a weight according to its type, and the weight of each sentence is obtained by summing the weights of the keywords in the sentence. Based on the sentence weights, the sentences are ranked to obtain a core summary set. In addition, we use the linear algebra idea proposed by [Yihong Gong, Xin Liu, 2001] to build an assistant summary set that captures information a topic-based approach may miss, making the summary more complete. Finally, the system integrates the two summary sets into a single summary and takes readability into account so that the whole summary reads fluently.
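The sentence weighting described above can be sketched as follows. This is only an illustration under stated assumptions: the keyword types ("core", "expanded"), their numeric weights, and the function names are hypothetical, since the abstract does not give the actual types or values used in the thesis.

```python
from typing import Dict, List

# Hypothetical weights per keyword type; the abstract does not state real values.
TYPE_WEIGHTS: Dict[str, float] = {"core": 3.0, "expanded": 1.0}


def score_sentence(keywords: List[str], keyword_types: Dict[str, str]) -> float:
    """Sum the type weights of the keywords in one sentence's keyword string.
    Keywords with no known type contribute nothing."""
    return sum(TYPE_WEIGHTS.get(keyword_types.get(k, ""), 0.0) for k in keywords)


def core_summary(
    sentences: List[List[str]], keyword_types: Dict[str, str], top_n: int
) -> List[int]:
    """Rank sentences by weight and return the indices of the top_n,
    restored to document order. Ties go to the earlier sentence."""
    scores = [score_sentence(s, keyword_types) for s in sentences]
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    return sorted(ranked[:top_n])


# Toy input: each sentence is already reduced to its keyword string.
example_types = {
    "typhoon": "core",
    "landfall": "core",
    "damage": "core",
    "storm": "expanded",
}
example_sentences = [
    ["typhoon", "landfall"],
    ["weather"],
    ["typhoon", "storm", "damage"],
]
print(core_summary(example_sentences, example_types, top_n=2))
```

Returning the selected indices in document order, rather than in rank order, is one simple way to keep the extracted sentences readable when they are joined into a summary.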
|Appears in Collections:||Thesis|