Title: Design and Implementation of High-Throughput LDPC-BC/CC Decoders
Authors: Chen, Chih-Lung (陳志龍)
Chang, Hsie-Chia
Lee, Chen-Yi
Keywords: low-density parity-check codes; high-throughput LDPC convolutional codes; ECC decoder; LDPC; LDPC-CC; high-throughput implementation
Publication date: 2012
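Abstract: In mobile communication systems, the computation-intensive channel coding module often plays a critical role. It must not only reach the high throughput required for transmission but also reduce the accompanying power consumption in order to provide a technically competitive solution. In recent years, low-density parity-check block codes (LDPC-BCs) have been widely used in communication standards because of their excellent decoding performance; however, the LDPC-BC decoders reported in the literature have difficulty providing flexible code rates and variable codeword lengths. Conversely, low-density parity-check convolutional codes (LDPC-CCs) combine decoding performance close to that of LDPC-BCs with the variable codeword length of convolutional codes, but they suffer from long decoding latency, low parallelism, and low decoding throughput, and achieving Gb/s throughput while reducing power consumption remains a major challenge. Accordingly, this dissertation studies decoder designs for both LDPC-BCs and LDPC-CCs to achieve higher throughput and better energy efficiency.

For LDPC-BCs, this dissertation designs a (2048, 1920) irregular LDPC-BC decoder. The code is constructed with the proposed CP-PEG algorithm to achieve better decoding performance, but the high node degrees that accompany the high code rate of 15/16 create an implementation bottleneck. To design this high-rate decoder, the dissertation proposes variable-node-centric sequential scheduling to reduce the number of iterations, a single-pipeline decoder architecture to reduce the message storage memory, and check node optimization to further reduce the number of registers; compared with a conventional architecture, 73% of the message storage memory is saved. Taped out in a 90 nm process, the LDPC-BC decoder test chip achieves a maximum throughput of 11.5 Gb/s at a 1.4 V supply with a chip area of 2.7 × 1.4 mm². While still meeting the 5.77 Gb/s throughput required by IEEE 802.15.3c, the supply voltage can be lowered to 0.8 V, giving an energy efficiency of 0.033 nJ/bit.

For LDPC-CCs, this dissertation implements a (491,3,6) time-varying LDPC-CC decoder chip that combines algorithm-level, node-level, and bit-level optimizations to exceed 2 Gb/s throughput with acceptable hardware cost and power. At the algorithm level, an improved on-demand variable node activation schedule conceals the channel values within other messages; this not only doubles the decoding convergence speed relative to the log-BP algorithm but also reduces the message storage memory by 17%. At the node level, multiple check node units and variable node units are duplicated together with a corresponding architecture, and the resulting increase in parallelism yields 12 times the throughput. At the bit level, the operating frequency is raised, and a hybrid-partitioned FIFO splits the storage across multiple dual-port memories, which both supplies enough memory bandwidth for the multiple node units and lowers power consumption. Combining these techniques, the 90 nm LDPC-CC decoder test chip occupies 2.37 × 1.14 mm² and reaches a maximum throughput of 2.37 Gb/s at 1.2 V with an energy efficiency of 0.024 nJ/bit, better than that of the block code decoder. Lowering the supply to 0.8 V enters a low-power mode that consumes only 90.2 mW while sustaining 1.58 Gb/s.

In summary, the two implementations proposed in this dissertation cover a throughput range from hundreds of Mb/s to several Gb/s and offer flexible code rates, adjustable frame sizes, excellent decoding performance, and outstanding hardware and power efficiency, making LDPC-BCs and LDPC-CCs more competitive than other error-correcting codes.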
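The code parameters quoted above can be checked with a few lines of arithmetic. This is a minimal sketch with hypothetical function names. It assumes a full-rank parity-check matrix for the block code. For the LDPC-CC it assumes the common (m_s, J, K) convention, which the abstract does not spell out: syndrome-former memory m_s, J ones per column, K ones per row, rate 1 − J/K, and a constraint length of (m_s + 1)·c code bits.

```python
from fractions import Fraction


def block_code_params(n: int, k: int) -> dict:
    # An (n, k) block code carries k information bits in n code bits;
    # assuming a full-rank parity-check matrix H, it has n - k check rows.
    return {"rate": Fraction(k, n), "parity_checks": n - k}


def ldpc_cc_params(ms: int, j: int, k: int) -> dict:
    # Assumed (ms, J, K) convention for regular LDPC convolutional codes:
    # ms = syndrome-former memory, J = ones per column, K = ones per row.
    rate = 1 - Fraction(j, k)   # R = b / c = 1 - J / K
    c = rate.denominator        # code bits per time instant (R in lowest terms)
    return {"rate": rate, "constraint_length": (ms + 1) * c}


if __name__ == "__main__":
    print(block_code_params(2048, 1920))  # the (2048, 1920) LDPC-BC
    print(ldpc_cc_params(491, 3, 6))      # the (491,3,6) LDPC-CC
```

Under these assumptions, the block code has rate 15/16 with only 128 parity checks for 2048 bits, which is why its check node degrees are large. The (491,3,6) code comes out at rate 1/2 with a 984-bit constraint length.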
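The abstract names the proposed CP-PEG algorithm but does not describe how it differs from other PEG-based constructions. As background only, here is a minimal sketch of the standard progressive-edge-growth (PEG) idea that such codes build on. Each new edge of a variable node goes to a check node that is unreachable in the current Tanner graph, or otherwise as far away as possible, so that short cycles are avoided. The function `peg` and its deterministic tie-breaking are illustrative assumptions. This is not the thesis's CP-PEG.

```python
def peg(n_vars: int, n_checks: int, var_degrees: list[int]) -> list[list[int]]:
    """Plain PEG construction; returns the sorted check neighbors of each variable node."""
    if len(var_degrees) != n_vars or any(d > n_checks for d in var_degrees):
        raise ValueError("need one degree per variable node, each <= n_checks")
    var_adj: list[set[int]] = [set() for _ in range(n_vars)]
    check_adj: list[set[int]] = [set() for _ in range(n_checks)]
    all_checks = set(range(n_checks))

    def farthest_checks(v: int) -> set[int]:
        # Expand the current Tanner graph breadth-first from variable node v and
        # return the checks that are unreachable or, failing that, the deepest ones.
        reached: set[int] = set()
        frontier, seen_vars = {v}, {v}
        while True:
            new_checks = set().union(*(var_adj[u] for u in frontier)) - reached
            if not new_checks or reached | new_checks == all_checks:
                return all_checks - reached
            reached |= new_checks
            frontier = set().union(*(check_adj[c] for c in new_checks)) - seen_vars
            seen_vars |= frontier

    # PEG visits variable nodes in non-decreasing degree order.
    for v in sorted(range(n_vars), key=lambda i: var_degrees[i]):
        for k in range(var_degrees[v]):
            candidates = all_checks if k == 0 else farthest_checks(v)
            # Lowest current check degree wins; ties go to the lowest index here,
            # whereas the original algorithm breaks ties at random.
            c = min(candidates, key=lambda i: (len(check_adj[i]), i))
            var_adj[v].add(c)
            check_adj[c].add(v)
    return [sorted(a) for a in var_adj]


if __name__ == "__main__":
    for row in peg(4, 6, [2, 2, 2, 2]):
        print(row)
```

In the small example, no two variable nodes share more than one check, so the graph has no 4-cycles. A real design would add irregular degree profiles and whatever extra constraints CP-PEG imposes.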
The channel coding module, with its heavy computational load, plays an important role in wireless communication systems. A competitive design must not only meet the system's high-throughput requirements but also improve energy efficiency. Over the past decade, LDPC block codes (LDPC-BCs) have been widely adopted in communication specifications for their excellent error-correcting performance and high throughput. However, state-of-the-art LDPC-BC decoders are weak at providing flexible code rates and variable codeword lengths. In contrast, LDPC convolutional codes (LDPC-CCs) combine error-correcting performance close to that of LDPC block codes with the variable data frame size of convolutional codes, but they suffer from long decoding latency, low parallelism, and low-to-medium decoding throughput. Achieving more than 1 Gb/s of throughput while reducing power consumption therefore remains difficult in LDPC-CC decoder design. Accordingly, this dissertation investigates both LDPC-BCs and LDPC-CCs to explore the potential for higher throughput and better energy efficiency.

For LDPC-BCs, a (2048, 1920) irregular LDPC code is generated by the proposed CP-PEG algorithm and performs better than other PEG-based codes; however, the large check node degrees introduced by the high code rate of 15/16 become the implementation bottleneck. To design such a high-rate LDPC decoder, our approach features variable-node-centric sequential scheduling to reduce the number of iterations, a single pipelined decoder architecture to reduce the message storage memory size, and an optimized check node unit to further reduce the number of registers. Overall, 73% of the message storage memory is saved compared with a traditional architecture. Fabricated in 90 nm 1P9M CMOS technology, the LDPC-BC decoder test chip achieves a maximum throughput of 11.5 Gb/s at a 1.4 V supply with a core area of 2.7 × 1.4 mm². At 0.8 V it still delivers the 5.77 Gb/s required by IEEE 802.15.3c, with an energy efficiency of only 0.033 nJ/bit.

For LDPC-CCs, a (491,3,6) time-varying LDPC-CC decoder chip is implemented. The proposed design combines algorithm-level, node-level, and bit-level optimizations to achieve over 2 Gb/s throughput with acceptable hardware cost and power. The algorithm-level optimization is an on-demand variable node activation schedule that conceals the channel values within other messages; it not only converges twice as fast as the log-belief propagation (log-BP) algorithm but also reduces the message storage capacity by 17%. The node-level optimization duplicates the check node units and variable node units and unfolds the message storage FIFOs so that the throughput becomes twelve times the clock frequency. Meanwhile, the bit-level optimization retimes the critical path so that a higher clock frequency can be achieved, and it slightly reduces the message storage size. Furthermore, a novel hybrid-partitioned FIFO is proposed to provide sufficient memory bandwidth to the processing units and to reduce power consumption. With these schemes, a test chip of the proposed LDPC-CC decoder has been fabricated in 90 nm CMOS technology with a core area of 2.37 × 1.14 mm². A maximum throughput of 2.37 Gb/s is measured at a 1.2 V supply, with an energy efficiency of 0.024 nJ/bit/proc. Depending on the operating mode, power can be scaled down to 90.2 mW while maintaining 1.58 Gb/s at a 0.8 V supply.

Together, these two works cover a throughput range from hundreds of Mb/s to several Gb/s and offer flexible code rates, adjustable frame sizes, excellent error-correcting performance, and better hardware and power efficiency. The proposed methodologies would make LDPC codes more competitive with other error-control codes.
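The abstract credits part of the LDPC-BC register savings to an optimized check node unit, but it does not say how that unit works. A widely used way to shrink check node state when degrees are large, as they are at rate 15/16, is min-sum message compression. Instead of one full message per edge, the unit keeps the two smallest input magnitudes, the index of the smallest, the overall sign parity, and one sign bit per edge. The sketch below shows only that generic technique. It assumes a normalized min-sum update with a hypothetical scale factor of 0.75. It is not necessarily the thesis's CNU, and min-sum is an approximation of the log-BP check node update, not the same computation.

```python
import math


def compress_check_node(llrs: list[float]):
    """Reduce one check node's incoming LLRs to a compact min-sum state."""
    if len(llrs) < 2:
        raise ValueError("a check node needs at least two edges")
    min1 = min2 = math.inf
    idx = -1
    signs = [1 if x < 0 else 0 for x in llrs]  # 1 marks a negative LLR
    parity = sum(signs) % 2                     # XOR of all sign bits
    for i, x in enumerate(llrs):
        mag = abs(x)
        if mag < min1:
            min1, min2, idx = mag, min1, i
        elif mag < min2:
            min2 = mag
    return min1, min2, idx, parity, signs


def expand_check_node(min1, min2, idx, parity, signs, scale=0.75):
    """Regenerate every check-to-variable message from the compact state."""
    out = []
    for i, s in enumerate(signs):
        mag = min2 if i == idx else min1    # minimum over the other edges
        sign = -1.0 if parity ^ s else 1.0  # product of the other edges' signs
        out.append(sign * scale * mag)
    return out


if __name__ == "__main__":
    state = compress_check_node([2.0, -0.5, 3.0, -1.5])
    print(state)
    print(expand_check_node(*state, scale=1.0))
```

For a degree-d check node, the stored state is two magnitudes, one index, and d + 1 sign bits rather than d full-precision messages. This is why such compression is attractive when d is large.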
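The hybrid-partitioned FIFO is described only as splitting message storage across several dual-port memories so that the duplicated node units get enough bandwidth. The sketch below shows the basic bank-interleaving principle such partitioning relies on. Consecutive FIFO entries are placed in different banks, so several words can move in one step with at most one access per bank. The class `BankedFifo` and its parameters are hypothetical. It does not model the "hybrid" aspect or any power behavior. It also treats push and pop as separate steps, whereas dual-port memories could serve one read and one write per bank in the same cycle.

```python
class BankedFifo:
    """FIFO whose storage is interleaved over several single-word-wide banks.

    Entry t (counted modulo the depth) lives in bank t % n_banks, row
    t // n_banks, so up to n_banks consecutive entries use distinct banks.
    """

    def __init__(self, depth: int, n_banks: int) -> None:
        if depth <= 0 or depth % n_banks:
            raise ValueError("depth must be a positive multiple of n_banks")
        self.depth, self.n_banks = depth, n_banks
        self.banks = [[None] * (depth // n_banks) for _ in range(n_banks)]
        self.head = 0   # position of the oldest entry
        self.count = 0  # number of stored entries

    def _slot(self, t: int) -> tuple[int, int]:
        return t % self.n_banks, t // self.n_banks

    def push_many(self, words: list) -> None:
        """Write up to n_banks words in one step (one write per bank)."""
        if len(words) > self.n_banks or self.count + len(words) > self.depth:
            raise OverflowError("too many words for one step, or FIFO full")
        used = set()
        for w in words:
            bank, row = self._slot((self.head + self.count) % self.depth)
            assert bank not in used  # each bank is written at most once per step
            used.add(bank)
            self.banks[bank][row] = w
            self.count += 1

    def pop_many(self, k: int) -> list:
        """Read the k oldest words in one step (one read per bank)."""
        if k > self.n_banks or k > self.count:
            raise IndexError("too many words for one step, or FIFO underflow")
        out = []
        for _ in range(k):
            bank, row = self._slot(self.head)
            out.append(self.banks[bank][row])
            self.head = (self.head + 1) % self.depth
            self.count -= 1
        return out
```

The depth is required to be a multiple of the bank count so that wrap-around never maps two consecutive entries to the same bank.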
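Two back-of-the-envelope relations connect the reported figures. First, power equals energy per bit times throughput. Second, if "throughput becomes twelve times the clock frequency" is read as 12 decoded bits per clock cycle, then the clock rate equals throughput divided by 12. The sketch below applies both relations with hypothetical helper names. The resulting values (about 190 mW for the LDPC-BC decoder at 0.8 V, and about 198 MHz and 132 MHz for the LDPC-CC decoder at 1.2 V and 0.8 V) are implied by those assumptions. They are not measurements reported in the abstract.

```python
def implied_power_mw(energy_nj_per_bit: float, throughput_gbps: float) -> float:
    # (nJ/bit) x (Gb/s) = (1e-9 J/bit) x (1e9 bit/s) = W; multiply by 1e3 for mW.
    return energy_nj_per_bit * throughput_gbps * 1e3


def implied_clock_mhz(throughput_gbps: float, bits_per_cycle: int = 12) -> float:
    # Assumes throughput = bits_per_cycle x f_clk, so f_clk = throughput / bits_per_cycle.
    return throughput_gbps * 1e3 / bits_per_cycle


if __name__ == "__main__":
    print(f"LDPC-BC at 0.8 V: about {implied_power_mw(0.033, 5.77):.0f} mW")
    print(f"LDPC-CC at 1.2 V: about {implied_clock_mhz(2.37):.1f} MHz")
    print(f"LDPC-CC at 0.8 V: about {implied_clock_mhz(1.58):.1f} MHz")
```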
Appears in Collections: Thesis