Title: Study on the Search Engine of the XML Documents (XML文件搜尋引擎的研究)
Authors: 蕭遜文
Hsun-Wen Hsiao
李素瑛
Suh-Yin Lee
Degree Program of Computer Science, College of Computer Science
Keywords: search engine; encoding; tree structure; node; sequence number; path; xml; BEL; signature file; xpath; order; shredding
Issue Date: 2006
Abstract: Traditional search engines are mainly keyword-based. Although they support Boolean expressions, they cannot query the order of keywords within a document. An XML document search engine must provide the keyword query function of a traditional search engine and must also take into account the hierarchical relationships of the data in XML documents. Queries are therefore expressed in XPath [5], the XML query language defined by the W3C, which can query the order of keywords within the structure of XML documents and so supplies what traditional search engines lack. This thesis focuses on speeding up queries over a large XML document database. We use the Begin-End-Level (BEL) [22] interval encoding method to build the index structure for the XML documents. After the XML documents are BEL-encoded, the index values are stored in a relational database management system (RDBMS). A query given as an XPath expression is transformed into SQL commands that access the database, and the retrieved records can be used to reconstruct XML content consistent with the original documents. To speed up queries further, a signature file index is employed as a filter that screens out unqualified documents first and avoids unnecessary database queries.
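The pipeline the abstract describes (interval encoding, relational storage, XPath-to-SQL translation, signature filtering) can be illustrated with a minimal sketch. This is not the thesis implementation: the table layout, the function names (bel_encode, load, descendants, signature, may_contain), the CRC32-based hashing, and the 64-bit signature width are all assumptions made for illustration, and the thesis's actual BEL numbering rules may differ.

```python
import sqlite3
import zlib
import xml.etree.ElementTree as ET


def bel_encode(root):
    """Assign (begin, end, level) to every element with one depth-first walk.

    Hypothetical numbering: a shared counter is bumped on entering and on
    leaving each element, so a descendant's interval nests inside its ancestor's.
    """
    rows, counter = [], 0

    def visit(elem, level):
        nonlocal counter
        counter += 1
        begin = counter
        for child in elem:
            visit(child, level + 1)
        counter += 1
        rows.append((elem.tag, (elem.text or "").strip(), begin, counter, level))

    visit(root, 1)
    return sorted(rows, key=lambda r: r[2])  # document order


def load(conn, doc_id, xml_text):
    """Shred one XML document into an assumed single-table schema."""
    conn.execute("CREATE TABLE IF NOT EXISTS node "
                 "(doc INTEGER, tag TEXT, text TEXT, b INTEGER, e INTEGER, lvl INTEGER)")
    rows = bel_encode(ET.fromstring(xml_text))
    conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?, ?, ?)",
                     [(doc_id, *r) for r in rows])


def descendants(conn, ancestor_tag, descendant_tag):
    """Roughly //ancestor_tag//descendant_tag as an interval-containment join."""
    sql = ("SELECT DISTINCT d.doc, d.b, d.text FROM node a "
           "JOIN node d ON a.doc = d.doc AND a.b < d.b AND d.e < a.e "
           "WHERE a.tag = ? AND d.tag = ? ORDER BY d.doc, d.b")
    return conn.execute(sql, (ancestor_tag, descendant_tag)).fetchall()


def signature(terms, bits=64):
    """Superimposed-coding signature: OR together one hashed bit per term."""
    sig = 0
    for term in terms:
        sig |= 1 << (zlib.crc32(term.encode("utf-8")) % bits)
    return sig


def may_contain(doc_sig, query_terms, bits=64):
    """False means the document can be skipped; True may be a false positive."""
    q = signature(query_terms, bits)
    return doc_sig & q == q


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    doc = "<book><title>XML</title><chapter><title>BEL</title></chapter></book>"
    load(conn, 1, doc)
    doc_sig = signature(["book", "title", "chapter"])
    if may_contain(doc_sig, ["chapter", "title"]):  # filter before touching SQL
        print(descendants(conn, "chapter", "title"))  # [(1, 5, 'BEL')]
```

In this sketch a parent-child step would add the condition a.lvl + 1 = d.lvl to the join, and sorting rows by begin value recovers document order, which is what makes reconstruction of the original content possible. The signature test can only rule documents out; a positive result still has to be confirmed by the SQL query.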
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009067586
http://hdl.handle.net/11536/41591
Appears in Collections: Thesis


Files in This Item:

  1. 758601.pdf