Media Streaming Architecture with Homogeneous Processor Cores
|關鍵字:||齊質性處理核心;多媒體串流處理架構;矽智財;浮點運算處理器;效能評估;Media Streaming Architecture;Homogeneous Processor Core;Intellectual Property;ALU cluster IP;Floating Point Operation Unit;Performance Evaluation|
在本論文中，設計並下線製作一個與AMBA 介面相容之多媒體處理單元作為構成具多齊質性處理核心之多媒體串流處理架構的核心運算單元。除此之外亦設計實做浮點運算處理器，利用此浮點運算處理器提供此一與AMBA 介面相容之多媒體處理單元有效率的浮點運算處理能力，使其可以更廣泛的應用於各種多媒體處理運算中。透過不同運算單元與架構間的效能評估與比較，證實了僅需要些許的硬體成本即可提供更有效率且更廣泛的多媒體運算處理能力。另外此效能評估與比較亦證實了具多齊質性處理核心之多媒體串流處理架構在擁有不同數目之處理核心時，在合理的硬體成本之下其效能可以有效的提升。|
As the evolution of information technology, embedded systems with media applications for portable devices are more and more important in modern life. However, the conventional processor architecture does not handle the processing requirement of media applications very well since the characteristics of media applications and other inheritance disability from conventional microprocessor architecture’s memory accessing model and processor-memory performance gap. Recent research shows that the stream processing model and stream processor architecture are suitable for media applications. However, software implementations for a streaming processor are not a trivial job since it evolves a lot of hand and manual optimization in memory exchange and tread deployment to different processor element or functional unit. In this thesis, a processing element for reconfigurable homogenous ALU cluster and its Advanced Microcontroller Bus Architecture (AMBA) platform interface has been designed and implemented. The proposed design integrated platform based design methodology and stream processing model to overcome the challenge of media applications. The proposed homogenous ALU cluster is utilized as a reconfigurable hardware accelerator for specific and different functions in media applications. The chosen AMBA interface provides an integration platform for embedded operating system and programming development environment. The combination of these methodologies provides a turnkey solution for media applications development in modern portable devices. The ALU cluster IP with AMBA interface is taped out using TSMC 0.15um technology and operates at 100MHz. The chip area is 3.9*3.9 mm2 and gate count is 0.2 million. A 4-layer FRP printed circuit board is designed and fabricated as the daughter card for system integration. The daughter card carries the designed chip is integrated to ARM versatile platform board as the system integration and application development environment. In addition, a floating point operation unit for ALU cluster IP is proposed and implemented and it will be integrated with ALU cluster IP as the future revision of the hardware accelerator. The hard macro of the floating point unit operates at 75MHz, its area and gate count is 0.415mm2 and 0.02 million respectively. The performance evaluation and comparison in floating point operation benchmark between different proposed architectures are presented. Media applications can be developed for proposed reconfigurable homogenous processing elements in the future using the chips and systems build in this thesis.
|Appears in Collections:||Thesis|
Files in This Item: