Title: ARM-based Platform Design for H.264/MPEG-4 AVC Decoder and Accelerator for Deblocking Filter
Authors: Shihchien Chang (張世騫)
Tihao Chiang (蔣迪豪)
Institute of Electronics
Keywords: decoder; platform-based design; deblocking filter; system-on-chip; macroblock-level pipelining architecture; deblocking filter accelerator; H.264 decoder; MPEG4 AVC; Loop Filter; ARM; Platform
Issue Date: 2004
Abstract: This thesis presents a baseline H.264/MPEG-4 AVC decoder built with an optimized platform-based design methodology. The platform uses an ARM microprocessor as the CPU core for its high performance, low cost, and wide range of applications, and the Advanced Microcontroller Bus Architecture (AMBA) as the on-chip bus for its transfer performance and flexibility. To speed up the decoder, we jointly optimize the software and the hardware. We also propose a macroblock-level pipelining architecture that synchronizes the software with the dedicated hardware co-processors so that both can work concurrently. In hardware, we implement three dedicated accelerators for the most computationally intensive modules: the deblocking filter, motion compensation, and the inverse transform. For the deblocking filter, we propose an adaptive transfer scheme and a platform-based bus-interleaved architecture. Because the deblocking filter consumes a large share of bus bandwidth, we classify the filtering modes into 8 types and transfer data adaptively, avoiding redundant transfers so that the bus bandwidth is used efficiently. To reduce processing latency, the bus-interleaved architecture performs data transfer and filtering in parallel. Compared with prior deblocking filter hardware designs, our scheme offers up to a 7x performance improvement. In overall decoding performance, our experiments show a 9x to 16x improvement over the H.264 reference software JM6.0 decoder. Finally, the proposed platform can be readily integrated into system-on-chip designs, and the proposed hardware architectures are suitable for low-cost, real-time applications.
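The benefit of macroblock-level pipelining mentioned in the abstract can be illustrated with a small timing model. This is only a sketch under stated assumptions, not the scheduling used in the thesis: it assumes a two-stage, lock-step pipeline in which the software stage works on macroblock i+1 while a hardware stage works on macroblock i, and a slot ends when both are done. The function names and the cycle counts are hypothetical and chosen purely for illustration.

```python
def sequential_cycles(sw, hw):
    """Total cycles when each macroblock is fully processed
    (software, then hardware) before the next one starts."""
    if len(sw) != len(hw):
        raise ValueError("sw and hw must list one cost per macroblock")
    return sum(sw) + sum(hw)


def pipelined_cycles(sw, hw):
    """Total cycles for an assumed two-stage lock-step pipeline.

    Slot 0 runs only the software stage for macroblock 0. In each
    following slot the software handles macroblock i+1 while the
    hardware handles macroblock i, and the slot lasts as long as the
    slower of the two. The last slot drains the hardware stage.
    """
    if len(sw) != len(hw):
        raise ValueError("sw and hw must list one cost per macroblock")
    if not sw:
        return 0
    total = sw[0]                      # fill: software only
    for i in range(len(sw) - 1):
        total += max(sw[i + 1], hw[i])  # overlapped slot
    total += hw[-1]                    # drain: hardware only
    return total


# Hypothetical per-macroblock costs in cycles (illustrative, not measured).
software_cost = [4, 4, 4]
hardware_cost = [3, 3, 3]
print(sequential_cycles(software_cost, hardware_cost))
print(pipelined_cycles(software_cost, hardware_cost))
```

In this model the gain comes entirely from the overlapped slots, so it is largest when the software and hardware stages take similar time per macroblock.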
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009211618
http://hdl.handle.net/11536/66935
Appears in Collections: Thesis


Files in This Item:

  1. 161801.pdf