Title: Software Implementation of MPEG-4 Video Decoder on PACDSP Platform
Authors: Chung-Yen Tsai (蔡崇諺)
David W. Lin (林大衛)
Institute of Electronics
Keywords: MPEG-4; Video; Decoder; Implementation; PACDSP
Issue Date: 2005
Abstract: MPEG-4 is a widely applied multimedia coding standard. This thesis presents an implementation of an MPEG-4 video decoder on the PACDSP platform, which consists of a VLIW digital signal processor (DSP) and an ARM920T processor. We carry out several static analyses to optimize the program flow and exploit the features of the VLIW architecture to achieve real-time decoding. A simple dual-core demonstration is also completed and verified.
In our implementation, the MPEG-4 reference software, MoMuSys, is used as the golden model for verification. First, we analyze the computational complexity of the MPEG-4 frame-based video decoder and identify efficient algorithms for the implementation. Second, we skip redundant computations based on the properties of the discrete cosine transform (DCT), and much of the computation is also skipped for all-zero residual blocks. Third, to reduce execution time, we distribute the regular computations across both clusters to increase processor efficiency. Single-instruction-multiple-data (SIMD) instructions and general instruction-level parallelism are also used to reduce processor stalls. We discuss the efficiency and accuracy of the inverse DCT (IDCT); the accuracy of our IDCT implementation meets the IEEE 1180-1990 standard, and the performance of our algorithm is competitive with other implementations. After all optimizations, the worst-case computation time to decode one QCIF frame is less than 5,700,000 cycles. That is, our implementation can achieve real-time decoding at 30 frames per second on a real PACDSP chip running at 175 MHz. The code size is 27 Kbytes, which is smaller than the 32-Kbyte instruction cache on PACDSP. Finally, we demonstrate a simple dual-core implementation on the PAC System Developer’s Kit (PSDK).
In this thesis, we first introduce the MPEG-4 standard and give an overview of the PACDSP platform. We then discuss the static analysis, the implementation strategies, the optimization methods, and the experimental results. Finally, we briefly describe the system and mechanism used to demonstrate the dual-core implementation on the PSDK platform.
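The real-time claim in the abstract can be checked with simple arithmetic: a worst case of 5,700,000 cycles per frame at 30 frames per second requires 171,000,000 cycles per second, which fits within a 175 MHz clock. A minimal C sketch of that budget check is given here; the function names are hypothetical and do not come from the thesis.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: minimum clock rate (in Hz) needed to decode `fps`
 * frames per second when each frame costs at most `cycles_per_frame`. */
uint64_t required_clock_hz(uint64_t cycles_per_frame, uint32_t fps)
{
    assert(fps > 0); /* a zero frame rate makes the budget meaningless */
    return cycles_per_frame * (uint64_t)fps;
}

/* Hypothetical helper: returns 1 if `clock_hz` covers the worst-case
 * decoding budget, 0 otherwise. */
int meets_real_time(uint64_t clock_hz, uint64_t cycles_per_frame, uint32_t fps)
{
    return clock_hz >= required_clock_hz(cycles_per_frame, fps) ? 1 : 0;
}
```

With the figures from the abstract, `required_clock_hz(5700000, 30)` gives 171,000,000 Hz, so `meets_real_time(175000000, 5700000, 30)` holds with roughly 4,000,000 cycles per second of headroom.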
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT009311626
http://hdl.handle.net/11536/78096
Appears in Collections: Thesis


Files in This Item:

  1. 162601.pdf