Title: 結合字詞擴充與隱含主題分析之詞典文件搜尋排序方法
Ranking Approach for Searching Dictionary Documents by Query Expansion and Latent Topic Analysis
Authors: 蘇俊凱
Su, Jun-Kai
Liu, Duen-Ren
Keywords: Search result ranking; Latent topic modeling; Pointwise Mutual Information
Issue Date: 2016
Abstract: With the rapid development of the Internet, the volume of digital data has grown exponentially. When users need specific information, most of them query an online dictionary with keywords, and the system returns the dictionary documents related to those keywords. These search results are usually unranked and often fail to meet users' actual needs, so how to present the results becomes an important issue. This research proposes a method for ranking the search results of dictionary documents. The method uses Latent Dirichlet Allocation (LDA) to analyze the latent topic vectors of terms and dictionary documents, and uses Pointwise Mutual Information (PMI) to expand the query terms. Dictionary documents are ranked by the relevance between the latent topic vectors of the expanded query terms and those of the documents, and the relevance ranking is then adjusted with usage scores derived from the browsing log and the search-click log. Experiments on searching the dictionary documents of the “Encyclopedia of Education” show that the ranking produced by the proposed approach can improve the quality of dictionary document search.
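The pipeline named in the abstract (PMI-based query expansion, ranking by latent topic similarity, adjustment by usage scores) can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The function names `pmi_table`, `expand_query`, and `rank`, the document-level co-occurrence estimate of PMI, the use of cosine similarity, and the linear blend weight `alpha` are all assumptions made for illustration. The topic vectors are assumed to come from an LDA model trained separately.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(docs):
    """PMI for each co-occurring term pair, estimated from document-level
    co-occurrence counts (an assumed estimator, not taken from the thesis)."""
    n = len(docs)
    term_df = Counter()
    pair_df = Counter()
    for doc in docs:
        terms = sorted(set(doc))
        term_df.update(terms)
        pair_df.update(combinations(terms, 2))
    table = {}
    for (a, b), count in pair_df.items():
        # PMI(a, b) = log( p(a, b) / (p(a) * p(b)) )
        score = math.log((count / n) / ((term_df[a] / n) * (term_df[b] / n)))
        table[(a, b)] = table[(b, a)] = score
    return table


def expand_query(query_terms, table, top_k=2):
    """Append the top_k terms with the highest positive PMI to the query."""
    candidates = {}
    for q in query_terms:
        for (a, b), score in table.items():
            if a == q and b not in query_terms and score > 0:
                candidates[b] = max(score, candidates.get(b, score))
    best = sorted(candidates, key=lambda t: (-candidates[t], t))[:top_k]
    return list(query_terms) + best


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(expanded_terms, term_topics, doc_topics, usage, alpha=0.8):
    """Rank documents by topic similarity blended with a usage score.

    term_topics / doc_topics map terms / document ids to topic vectors that
    are assumed to be precomputed by LDA; usage maps document ids to scores
    in [0, 1]. The linear blend with weight alpha is a hypothetical choice.
    """
    vecs = [term_topics[t] for t in expanded_terms if t in term_topics]
    if not vecs:
        return []
    # Represent the expanded query as the mean of its term topic vectors.
    query_vec = [sum(col) / len(vecs) for col in zip(*vecs)]
    scores = {
        doc_id: alpha * cosine(query_vec, vec) + (1 - alpha) * usage.get(doc_id, 0.0)
        for doc_id, vec in doc_topics.items()
    }
    return sorted(scores, key=lambda d: (-scores[d], d))
```

In this sketch, lowering `alpha` gives click and browsing behavior more influence over the final order, while `alpha=1.0` ranks purely by topic similarity.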
URI: http://etd.lib.nctu.edu.tw/cdrfb3/record/nctu/#GT070353408
Appears in Collections: Thesis