Title: On-The-Fly Capture and Replay Mechanisms for Multi-port Network Devices in Operational Networks (Chinese title: 在營運網路中多埠網通設備之即時捕捉與重播機制)
Authors: Lin, Yu-An (林昱安); Lin, Ying-Dar (林盈達)
Institute of Computer Science and Engineering
Keywords: networking devices; failover; OpenFlow switch; multi-port replay; downsizing
Publication date: 2012
Abstract: Testing networking devices in a live environment exposes them to complex, real traffic, but it may interrupt the network, and the defects observed there cannot be reproduced. Replaying real traffic against the devices makes defects reproducible, but the reproduction rate is low because of the limitations of replay tools and the incomplete reconstruction of DUT (Device Under Test) states. To keep the complexity of the test traffic while improving defect reproduction, we design a new mechanism that uses an OpenFlow switch to take a multi-port DUT online and offline automatically and to replay traffic on multiple ports. While the DUT is online, the mechanism monitors it and captures defect traces. To save space, we keep only as much of each packet, and only as many packets, as are needed to trigger the defect. When a DUT failure is detected, the DUT is taken offline and the defect trace is replayed to identify the defect. For efficient defect identification, a different reduction is applied to each type of defect. The experimental results show that the partial payload kept in the captured defect traces is enough to trigger the defects: the first 46 bytes of each packet suffice for Layer-2 devices, and the first 154 bytes suffice for our Layer-3 device. The required packet count depends on the testbed. For defect identification, we propose a reduction based on binary search for defects caused by payload anomalies and for defects caused by busy (overload) conditions. It downsizes the trace by up to 98.8% for payload-anomaly defects and by up to 96% for busy-condition defects. As for the service outage caused by failing over during a DUT failure, the outage time is minimized when the check interval is 1 second and the tolerated number of consecutive failures is 2.
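The capture step described in the abstract keeps a partial payload and a limited packet count. A minimal sketch of such a bounded capture window follows, assuming packets arrive as raw bytes tagged with a DUT port number and that "limited packet count" means keeping the most recent N packets. `DefectTraceBuffer`, `CapturedPacket`, and the `port` field are hypothetical names introduced for illustration; only the 46-byte and 154-byte lengths come from the abstract. Truncation plays the same role as a pcap snap length (for example, tcpdump's `-s` option).

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass

# Snap lengths reported in the abstract for the tested devices.
L2_SNAPLEN = 46   # Layer-2 devices
L3_SNAPLEN = 154  # the Layer-3 device used in the experiments


@dataclass(frozen=True)
class CapturedPacket:
    port: int         # DUT port the packet was seen on (hypothetical field)
    timestamp: float  # arrival time in seconds
    data: bytes       # packet bytes cut to the snap length


class DefectTraceBuffer:
    """Hypothetical bounded capture buffer: keeps only the most recent
    `max_packets` packets, each truncated to its first `snaplen` bytes."""

    def __init__(self, snaplen: int, max_packets: int) -> None:
        if snaplen <= 0 or max_packets <= 0:
            raise ValueError("snaplen and max_packets must be positive")
        self.snaplen = snaplen
        # A deque with maxlen discards the oldest packet once it is full.
        self._ring: deque[CapturedPacket] = deque(maxlen=max_packets)

    def record(self, port: int, timestamp: float, frame: bytes) -> None:
        """Store one observed frame, keeping only its leading bytes."""
        self._ring.append(CapturedPacket(port, timestamp, frame[: self.snaplen]))

    def dump(self) -> list[CapturedPacket]:
        """Freeze the current window, e.g. when the DUT is declared failed."""
        return list(self._ring)
```

A ring buffer keeps memory use constant while the DUT is online. The trace available at failure time is therefore always the packets that immediately preceded the failure.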
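The abstract names a binary-search-based reduction for both payload-anomaly and busy-condition defects but does not spell it out. The sketch below shows one plausible reading, under stated assumptions. `triggers` is a hypothetical oracle that replays a sub-trace to the offline DUT and reports whether the defect reappears. For payload anomalies, the trace is halved while one half alone still reproduces the defect. For busy conditions, the code binary-searches the shortest reproducing prefix, which assumes that replaying more traffic never makes the overload disappear. The downsizing ratio is taken to be the fraction of packets removed.

```python
from __future__ import annotations

from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def bisect_payload_defect(
    trace: Sequence[T], triggers: Callable[[Sequence[T]], bool]
) -> Sequence[T]:
    """Halve the trace while one half alone still reproduces the defect."""
    if not triggers(trace):
        raise ValueError("the full trace does not reproduce the defect")
    current = trace
    while len(current) > 1:
        mid = len(current) // 2
        first, second = current[:mid], current[mid:]
        if triggers(first):
            current = first
        elif triggers(second):
            current = second
        else:
            break  # the defect needs packets from both halves; stop here
    return current


def shortest_busy_prefix(
    trace: Sequence[T], triggers: Callable[[Sequence[T]], bool]
) -> int:
    """Smallest n such that replaying trace[:n] still triggers the overload.

    Assumes monotonicity: if trace[:n] triggers it, so does any longer prefix.
    """
    if not triggers(trace):
        raise ValueError("the full trace does not reproduce the defect")
    lo, hi = 1, len(trace)  # invariant: trace[:hi] triggers the defect
    while lo < hi:
        mid = (lo + hi) // 2
        if triggers(trace[:mid]):
            hi = mid
        else:
            lo = mid + 1
    return hi


def downsizing_ratio(original_len: int, reduced_len: int) -> float:
    """Fraction of the original trace removed by the reduction."""
    if original_len <= 0:
        raise ValueError("original_len must be positive")
    return 1.0 - reduced_len / original_len
```

Each oracle call costs one replay, so both routines need only a logarithmic number of replays in the trace length. The pure halving in `bisect_payload_defect` stops as soon as the defect depends on packets split across both halves. A full delta-debugging (ddmin) search would reduce further in that case, at the cost of more replays.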
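The failover result depends on two parameters: the check interval and the tolerated number of consecutive failures. A minimal sketch of how those two parameters interact follows, assuming health probes run at fixed multiples of the check interval and that any successful probe resets the failure streak. `FailoverMonitor`, `detection_delay`, and the `on_failover` callback are hypothetical. In the described mechanism, that callback would be where the OpenFlow switch is reconfigured to bypass the DUT and replay begins; no particular controller API is assumed here.

```python
from __future__ import annotations

import math
from typing import Callable


class FailoverMonitor:
    """Counts consecutive failed health checks; fails over at the threshold."""

    def __init__(self, tolerance: int, on_failover: Callable[[], None]) -> None:
        if tolerance < 1:
            raise ValueError("tolerance must be at least 1")
        self.tolerance = tolerance
        self.on_failover = on_failover  # hypothetical hook: bypass DUT, start replay
        self.consecutive_failures = 0
        self.dut_online = True

    def report(self, probe_ok: bool) -> None:
        """Feed the result of one health probe taken at the check interval."""
        if not self.dut_online:
            return  # already failed over; ignore further probes
        if probe_ok:
            self.consecutive_failures = 0  # one success resets the streak
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.tolerance:
            self.dut_online = False
            self.on_failover()


def detection_delay(fail_time: float, check_interval: float, tolerance: int) -> float:
    """Seconds from DUT failure to the failover decision, assuming probes at
    t = k * check_interval and that a probe at exactly fail_time already fails."""
    if check_interval <= 0 or tolerance < 1:
        raise ValueError("check_interval must be > 0 and tolerance >= 1")
    first_bad_probe = math.ceil(fail_time / check_interval) * check_interval
    decision_time = first_bad_probe + (tolerance - 1) * check_interval
    return decision_time - fail_time
```

This sketch models only the detection side of the outage and ignores the time needed to reprogram the switch. Shorter intervals or a lower tolerance also risk failing over on a transient probe loss, which would add outages of its own. That trade-off is presumably why the measured optimum is 1 second and 2 failures rather than the smallest possible settings.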
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT070056130
http://hdl.handle.net/11536/71822
Appears in Collections: Thesis

