Stack Instruction Folding of Java Processors: Modeling and Realization
|Keywords:||Java Processor; Java Virtual Machine; Stack Operations Folding; Data Dependency; Stack Instruction Folding|
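|Abstract:||Java is widely used on the Internet and in embedded controllers because it is relatively secure, cross-platform, and compact (about 1.8 bytes per instruction on average). However, the data dependency inherent in the Java virtual machine severely limits the performance of Java processors. To overcome this problem, this thesis introduces two instruction folding methods: static (pattern-based) folding and dynamic (rule-based) folding. In static folding, the combinations of foldable instructions must be identified in advance and then compared one by one against the current instruction stream. In dynamic folding, instructions are folded together automatically according to their folding attributes. Dynamic folding can be further divided into continuous instruction folding and discontinuous instruction folding. The former can fold only a run of consecutive foldable instructions, whereas the latter can fold together instructions that are separated by non-stack instructions or by instructions that have already been folded. This thesis proposes a continuous dynamic folding method, POC folding, together with its architecture. Preliminary simulation statistics show that the 4-foldable strategy is already close to the folding upper bound and eliminates 84% of stack operations; relative to no folding, the 2-, 3-, and 4-foldable strategies achieve performance ratios of 1.22, 1.32, and 1.34. The method can be realized with the simplest hardware design. A discontinuous dynamic folding model that also extends POC folding, Hybrid-EPOC folding, is proposed as well. It can be realized by software rescheduling on the existing simple POC hardware plus one additional extended instruction (the P’ bytecode). Under the 4-foldable strategy it eliminates 94.8% of stack operations (using the second benchmark, the SPECjvm98 / s10 data set), 14.7 percentage points more than the 80.1% achieved by POC folding, with a 13% higher performance ratio. Its IIPC (Issued Instructions Per Cycle) under the 2-, 3-, and 4-foldable strategies is 1.60, 1.70, and 1.71, and 1.71 reaches 96.6% of the theoretical upper bound of 1.77. The price is extra P’ bytecode code amounting to less than 8% of the original total virtual machine size. This approach is advantageous in high-performance SoC designs where ROM area is relatively small. Sun’s picoJava-II uses a static folding method, which eliminates only 39.6% of stack operations and achieves an IIPC of 1.25. This research currently leads the world in this area, with leading contributions in both methodology and architecture. It has been granted US and ROC patents and has been referenced by other academic research; for example, Kim’s Advanced POC folding model is derived from this POC folding model.|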
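The difference between rule-based and pattern-based continuous folding can be illustrated with a small sketch. It is not the POC model itself: reading P, O, and C as producer, operator, and consumer is an assumption (the abstract does not define the letters), and the names `CLASS`, `RULE`, `PATTERN_TABLE`, `by_rule`, `by_pattern`, and `fold_groups` are hypothetical. The folding rule, the greedy longest-first grouping, and the pattern table are simplifications chosen for illustration. Counting one issue slot per group is also an idealization that ignores stalls and branches.

```python
import re

# Hypothetical, simplified folding attributes (an assumption for this sketch):
# P = producer (pushes a value), O = operator (pops two values, pushes one),
# C = consumer (pops a value into a local variable).
CLASS = {
    "iload_1": "P", "iload_2": "P", "iload_3": "P", "iconst_1": "P",
    "iadd": "O", "isub": "O",
    "istore_0": "C", "istore_3": "C",
}

# Rule-based check (assumed rule): a move (P C), or up to two producers
# feeding one operator, optionally followed by a consumer.
RULE = re.compile(r"P?C|P{0,2}OC?")

# Pattern-based check: a fixed, hypothetical table of foldable sequences.
PATTERN_TABLE = {"PPOC", "PPO", "PO", "PC"}


def by_rule(kinds):
    """Accept any class string that satisfies the assumed folding rule."""
    return RULE.fullmatch(kinds) is not None


def by_pattern(kinds):
    """Accept only class strings listed verbatim in the pattern table."""
    return kinds in PATTERN_TABLE


def fold_groups(stream, n, accepts=by_rule):
    """Greedily split a bytecode stream into continuous groups of at most n.

    At each position the longest prefix (2..n instructions) accepted by
    `accepts` becomes one group; otherwise the instruction issues alone.
    """
    groups, i = [], 0
    while i < len(stream):
        size = 1
        for length in range(min(n, len(stream) - i), 1, -1):
            kinds = "".join(CLASS[op] for op in stream[i:i + length])
            if accepts(kinds):
                size = length
                break
        groups.append(stream[i:i + size])
        i += size
    return groups


# a = b + c; d = d - 1
stream = ["iload_1", "iload_2", "iadd", "istore_0",
          "iload_3", "iconst_1", "isub", "istore_3"]
for n in (2, 3, 4):
    groups = fold_groups(stream, n)
    # Idealized: one issue slot per group, so this is instructions per slot.
    print(n, len(groups), len(stream) / len(groups))

# a = b + c + d: the table lacks "POC", so pattern matching needs one more group.
chain = ["iload_1", "iload_2", "iadd", "iload_3", "iadd", "istore_0"]
print(len(fold_groups(chain, 4, by_pattern)), len(fold_groups(chain, 4, by_rule)))
```

For the first stream this yields 6, 4, and 2 groups for n = 2, 3, and 4. For the chained addition it yields 3 groups with the pattern table versus 2 with the rule. The rule accepts every sequence that satisfies its attribute constraints, while the table folds only the combinations someone enumerated in advance. This toy stream is ideal for folding and is not representative of real programs.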
Java has been widely adopted in Internet and embedded applications because it is robust, cross-platform, and compact (each instruction averages 1.8 bytes). However, the performance of a Java processor is greatly limited by the true data dependency it inherits from the stack-based virtual machine. Two kinds of folding models are introduced to overcome this problem: pattern-based folding and rule-based folding. In the pattern-based folding model, folding patterns are identified in advance and then compared with the incoming instructions. In the rule-based folding model, bytecode instructions are first classified, and the folding algorithm then automatically performs folding checks based on the folding attributes of each instruction. Folding can further be divided into continuous and discontinuous instruction folding. The former can fold only consecutive instructions; the latter can also fold instructions that are separated by non-stack instructions or by instructions that have already been folded. A continuous rule-based folding model, POC folding, and its corresponding architecture are proposed. Initial simulations show that the 4-foldable strategy, which almost reaches the performance upper bound, eliminates 84% of all stack operations. Compared with a stack machine without folding, the 2-, 3-, and 4-foldable strategies yield overall program speedups of 1.22, 1.32, and 1.34, respectively. This model can be implemented with simple hardware. A discontinuous rule-based folding model that extends POC folding, Hybrid-EPOC folding, is also proposed. It can be implemented by software rescheduling on top of the simple POC folding hardware, with the support of one extra extended bytecode (the P’ bytecode). With the 4-foldable strategy, the Hybrid-EPOC folding model folds 94.8% of stack operations on the second benchmark program, the SPECjvm98/s10 data set.
This is 14.7 percentage points more than the 80.1% folded by the POC folding model, and it yields a 13% performance gain. The IIPC (Issued Instructions Per Cycle) of the Hybrid-EPOC folding model for the 2-, 3-, and 4-foldable strategies is 1.72, 1.73, and 1.74, respectively; 1.74 is 98.3% of the theoretical upper bound of 1.77. The added P’ bytecode code is less than 8% of the total virtual machine code. The model is therefore well suited to high-performance system-on-chip (SoC) designs with limited ROM size. By comparison, Sun Microsystems’ picoJava-II uses a pattern-based folding model, which eliminates 39.6% of all stack operations and reaches an IIPC of 1.25 with the 4-foldable strategy. Our research leads this field in both methodology and architecture. It has been granted US and ROC patents, and the results have been referenced by other researchers, including in Kim’s Advanced POC model.
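The comparisons reported above reduce to simple arithmetic on the published figures. The lines below only restate those figures and add no new measurements. The first line shows that the 14.7 gap between Hybrid-EPOC and POC folding is a difference in percentage points. The second line gives the IIPC ratio to the upper bound. The third line shows that moving from 3- to 4-foldable POC adds far less speedup than moving from 2- to 3-foldable, which is consistent with the statement that the 4-foldable strategy is near the upper bound.

$$
\begin{aligned}
94.8\% - 80.1\% &= 14.7 \text{ percentage points} \\
\frac{\mathrm{IIPC}_{4}}{\mathrm{IIPC}_{\max}} &= \frac{1.74}{1.77} \approx 0.983 \\
\frac{1.32}{1.22} \approx 1.082, &\qquad \frac{1.34}{1.32} \approx 1.015
\end{aligned}
$$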