Title: Implementation and optimization of JPEG2000 compression on dual-core DSP processors (original title: JPEG2000 壓縮在雙核心數位訊號處理器上的實作)
Authors: Ho, Po-Chiang (何柏瑲)
You, Yi-Ping (游逸平)
College of Computer Science, Information Technology (IT) Industry R&D Master's Program
Keywords: Blackfin; DSP; BF561; JPEG2000; parallel processing; dual-core
Publication date: 2010
Abstract: Multi-core design is the trend in future processors. Following this trend, Analog Devices (ADI) adopted a multi-core design in its latest Blackfin processor, the ADSP-BF561. The BF561 is a dual-core, SMP-like DSP based on the Micro Signal Architecture (MSA), which is specialized for video processing and multimedia computation. In this thesis we port a JPEG2000 compression program from the open-source OpenJPEG project to the BF561 and then propose and implement application-level optimizations to speed it up. Two optimizations, (1) data locality optimization and (2) distribution of the work across the two cores, are applied to the two most computation-heavy stages of JPEG2000 compression: DWT and EBCOT Tier-1. We also discuss two compiler-related issues encountered in the experiments: interference with cross-function compiler optimizations when GCC attributes are used, and GCC's inefficient generation of parallel instructions. Our experiments show that using both cores pays off only after the data locality optimization has been applied, because that optimization reduces the heavy load of accesses to slow external SDRAM. Four standard test images are used to evaluate the optimizations. Compared with the baseline (the original program compiled with the -O3 optimization flag and running on a single core), the optimized program speeds up compression by 1.92x to 2.04x.
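The two optimizations named in the abstract can be illustrated with a minimal sketch. This is not the thesis code: the function names, the MAX_ROW buffer size, and the choice of the reversible 5/3 lifting filter from JPEG2000 are assumptions made for illustration. Each image row is first copied into a small local buffer, which stands in for fast on-chip memory, so that the filter reads fast memory instead of repeatedly reading slow external memory. The rows are then split into two halves, and the two sequential calls stand in for the work each core would run concurrently.

```c
#include <assert.h>
#include <string.h>

#define MAX_ROW 1024 /* hypothetical capacity of the fast local buffer */

/* Floor division for a positive divisor (C's '/' truncates toward zero). */
int floor_div(int a, int b)
{
    int q = a / b;
    if (a % b != 0 && a < 0)
        q--;
    return q;
}

/* One level of the reversible 5/3 lifting DWT on a row of even length n.
 * Reads from 'in', writes low-pass to out[0..n/2) and high-pass to
 * out[n/2..n). 'in' and 'out' must not overlap. */
void dwt53_row(const int *in, int *out, int n)
{
    int half = n / 2;
    int i;

    /* Predict step: each odd sample minus the average of its even neighbours. */
    for (i = 0; i < half; i++) {
        int left = in[2 * i];
        /* Symmetric extension at the right edge: in[n] mirrors in[n - 2]. */
        int right = (2 * i + 2 < n) ? in[2 * i + 2] : in[n - 2];
        out[half + i] = in[2 * i + 1] - floor_div(left + right, 2);
    }

    /* Update step: each even sample plus a rounded quarter of its
     * neighbouring high-pass values. */
    for (i = 0; i < half; i++) {
        /* Symmetric extension at the left edge: d[-1] mirrors d[0]. */
        int dl = (i > 0) ? out[half + i - 1] : out[half];
        int dr = out[half + i];
        out[i] = in[2 * i] + floor_div(dl + dr + 2, 4);
    }
}

/* Transform rows [row_begin, row_end) of a row-major image in place. */
void dwt53_rows(int *img, int width, int row_begin, int row_end)
{
    int local[MAX_ROW]; /* stand-in for fast on-chip memory */
    int r;

    assert(width >= 2 && width % 2 == 0 && width <= MAX_ROW);
    for (r = row_begin; r < row_end; r++) {
        int *row = img + (size_t)r * (size_t)width;
        memcpy(local, row, (size_t)width * sizeof(int)); /* one bulk read */
        dwt53_row(local, row, width);                    /* filter reads the local copy */
    }
}

/* Split the rows into two independent halves, one per (hypothetical) core. */
void dwt53_horizontal_two_workers(int *img, int width, int height)
{
    int mid = height / 2;
    dwt53_rows(img, width, 0, mid);      /* work that core A would run */
    dwt53_rows(img, width, mid, height); /* work that core B would run */
}
```

Because the horizontal pass treats every row independently, the two halves share no data and need no synchronization until both are finished. On real dual-core hardware the two calls would be dispatched to separate cores, and how that is done depends on the platform.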
URI: http://140.113.39.130/cdrfb3/record/nctu/#GT079790512
http://hdl.handle.net/11536/46598
Appears in Collections:Thesis


Files in This Item:

  1. 051201.pdf