標題: 針對簡單正規表示式之字串比對演算法
An Algorithm for simple regular expression matching
作者: 林建碩
李程輝
電信工程研究所
關鍵字: 字串比對;正規語言表示式;預先過濾器;pattern matching;regular expression;pre-filter
公開日期: 2010
摘要: 字串比對的技術,由於能準確地偵測出關鍵字,是現今在防毒/防蟲技術上的重要應用。在眾多有名的字串比對演算法中,Aho-Corasick (AC)演算法,是一個能夠同時比對多重字串,並且在各種環境之下都能夠保證穩定的輸出表現的演算法。然而AC演算法是根據純粹字串的比對來設計的,如此對於以正規表示式來表示的病毒/蠕蟲卻無法直接應用。 在本篇論文中,我們使用有系統的演算法,來建構一個字串比對系統,並針對有限長度且可用簡單正規表示式之字串。所提出之系統包含預先過濾器及驗證模組。經由預先過濾器,系統可快速的略過明顯不含字串的文件範圍,且在掃瞄到可疑字串之起始位置時回報給驗證模組;驗證模組為AC演算法的延伸,其中包含了多階層的狀態轉移圖,以及和階層的狀態轉移圖相關的分岔函數。在掃描的過程,可同步處理不同階層的狀態轉移圖。 實驗結果顯示,我們所提出的演算法跟ClamAV、及加強之ClamAV、延伸有限自動狀態機(XFA)比較,我們的系統具有較佳的處理效能並且擁有滿意之記憶體占用大小。
Because of its accuracy, pattern matching is considered an important technique in anti-virus/worm applications. Among some famous pattern matching algorithms, the Aho-Corasick (AC) can match multiple patterns simultaneously and guarantee deterministic performance under all circumstances. However, the AC algorithm was developed for strings while virus/worm signatures could be specified by simple regular expressions. In this paper, we enhance the AC algorithm to systematically construct a signature matching system which can indicate the ending position in a finite input string for the occurrence of virus/worm signatures that are specified by strings or simple regular expressions. The regular expressions studied are those adopted in ClamAV for signature specification. Our proposed signature matching system consists of a pre-filter and a verification module. The purpose of pre-filter is to quickly exclude the parts of input string which obviously do not contain signatures and find the starting positions of suspicious sub-strings which may result in match of some signatures. The verification module is an extension of the AC algorithm that consists of multiple levels of goto graphs. Goto graphs in the same level are connected by a novel fork function. Those in different levels could be traversed concurrently. Experimental results show that, compared with ClamAV implementation and its enhancement and the extended finite automaton (XFA), our proposed system yields better throughput performance with acceptable memory requirement.
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079713527
http://hdl.handle.net/11536/44546
顯示於類別:畢業論文


文件中的檔案:

  1. 352701.pdf