標題: 高效能字串比對演算法及其實現
An Efficient Pattern Matching Algorithm and Its Implementation
作者: 梁嘉旂
Chia-Chi Liang
李程輝
Tsern-Huei Lee
電信工程研究所
關鍵字: 網路安全;字串比對;network security;string matching
公開日期: 2006
摘要: 因為可以準確判斷,因此字串比對在一般入侵偵測系統中扮演著相當重要的角色,被廣泛應用到網路安全設備來偵測攻擊或病毒。在現今有許多知名的字串比對演算法中,因為AC演算法可以同時比對多個字串並保證在最壞情況的效能,因此被廣泛使用。然而,原始的AC演算法有兩項缺點需要改進,其中一項是記憶體需求量,另一項是工作輸出量。因為原始的AC演算法在一個運算週期只能處理一個字元,無法滿足現在高速網路,所以本研究延伸AC演算法,使其可以處理多的字元,以達到工作輸出量的改進。在演算法裡,全部的字樣及欲比對之字串都分成K份,有K組比對引擎同時做比對,一個運算週期共處理K個字元,所以工作輸出量增加至K倍。我們實作在Xilinx FPGA上,當K=4的時候可以達到4.5Gbps的工作輸出量。然而考慮實作的情況,因為每個運算週期都必須讀取相當多的位元並不理想,所以根據所提的演算法,將字樣依規則分成幾個組別,使用編號方式做延伸改進。除此之外,考量記憶體的資源,我們可以複製較少份的查表資料,使其在多個運算週期內計算完K組比對結果。換言之,我們可以達到一個運算週期處理多個字元,並且使用較少的記憶體,根據任何硬體考量可以做修正,適用於各種的字樣的字串比對演算法。
Because of its accuracy, pattern matching technique has recently been applied to Internet security applications such as intrusion detection/prevention, anti-virus, and anti-malware. Among the various pattern matching algorithms, the Aho-Corasick (AC) can match multiple pattern strings simultaneously with worst-case performance guarantee and thus is widely adopted. However, the throughput performance of the original AC may not be satisfactory for high speed environments because only one symbol is processed in an operation cycle. In this paper we present an extension of the AC algorithm where multiple symbols are processed in an operation cycle to improve throughput performance. In our proposed scheme, all pattern strings, and the input text string as well, are divided into K substrings, if K symbols are processed in an operation cycle. Moreover, K pattern search engines are employed to scan the text substrings in parallel. As a result, the throughput performance can be improved by K times. We implemented our proposed pattern matching scheme with Xilinx FPGA and achieved more than 4.5Gbps throughput for K = 4. As we need to access so many bits for PMV per cycle, we have presented an extension of our algorithm with output index. We separate patterns into several groups and PMV can be replaced by index. Considering the memory resource of hardware, we can duplicate fewer tables by processing it in several cycles. As a result it’s flexible to deal with any given rule set in all situations.
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009413516
http://hdl.handle.net/11536/80777
Appears in Collections:Thesis


Files in This Item:

  1. 351601.pdf