Introduction

A compiler cannot perform compilation and optimization effectively if the ISA exposes only a small number of architected registers. Extending the number of architected registers, and with it the register field in the ISA, has adverse effects of its own. The register field directly affects code size: because an instruction usually contains multiple register fields, adding 1 bit to the register field typically adds 2 or more bits to each instruction. Register fields also take up a significant portion of the code, since most instructions in the generated code use registers in one way or another. For example, register fields account for about 28 percent of the Alpha binary and 25 percent of the ARM binary. The size of the register field therefore heavily influences the size of the generated code. In addition, wider instructions complicate the decode stage of the pipeline, which can stretch the clock cycle and increase die size and power consumption.
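The code-size argument can be made concrete with a little arithmetic. The sketch below assumes the classic 32-bit MIPS R-type layout with three register fields per instruction; the function name `register_field_bits` is introduced here purely for illustration.

```python
def register_field_bits(num_registers: int, fields_per_instruction: int = 3) -> int:
    """Bits one instruction spends on register operands under direct encoding."""
    if num_registers < 1:
        raise ValueError("num_registers must be positive")
    # Smallest field width that can name every register (5 bits for 32, 8 for 256).
    bits_per_field = (num_registers - 1).bit_length()
    return bits_per_field * fields_per_instruction


baseline = register_field_bits(32)    # assumed R-type layout: 3 fields x 5 bits
extended = register_field_bits(256)   # direct encoding of 256 registers: 3 fields x 8 bits
print(baseline, extended, extended - baseline)  # prints: 15 24 9
```

Under these assumptions, directly encoding 256 registers would spend 24 of the 32 instruction bits on register operands, leaving only 8 bits for the opcode and all other fields.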

Overview of our approach

In this thesis, we propose a novel MIPS instruction format that differs from the original one. The novel format allows 256 registers to be addressed in the operand fields of instructions, rather than only those reachable through the direct encoding currently in use. In addition, we reorganize the original register file into register banks and provide reverse operation instructions, so that all of the registers are available for register allocation. The two banks, the primary bank (R0-R127) and the secondary bank (R128-R255), should be completely symmetric. We also propose a novel graph, called the node dependence graph (NDG), that is composed of the registers used in the instructions. Finally, we develop a register allocation algorithm based on the node dependence graph. Compared with conventional register allocation, our approach does not incur encoding space expansion; it offers a way to work around the ISA bottleneck and to make full use of the extra physical registers. An illustration is shown in Figure 3.1.
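One way to picture the two symmetric banks is as a split of an 8-bit register number into a bank number and a 7-bit index within the bank. The sketch below illustrates only that numbering (R0-R127 primary, R128-R255 secondary). The helper names are hypothetical, and the sketch does not model the proposed instruction format or the reverse operation instructions, whose encodings are not given here.

```python
from typing import Tuple

BANK_SIZE = 128  # registers per bank: R0-R127 (primary), R128-R255 (secondary)


def split_register(reg: int) -> Tuple[int, int]:
    """Return (bank, index) for an architected register number in 0..255."""
    if not 0 <= reg < 2 * BANK_SIZE:
        raise ValueError(f"register R{reg} is out of range")
    return reg // BANK_SIZE, reg % BANK_SIZE


def join_register(bank: int, index: int) -> int:
    """Inverse of split_register: rebuild the register number."""
    if bank not in (0, 1) or not 0 <= index < BANK_SIZE:
        raise ValueError("bank must be 0 or 1 and index must be in 0..127")
    return bank * BANK_SIZE + index


print(split_register(5))    # (0, 5): primary bank
print(split_register(133))  # (1, 5): same index, secondary bank
```

Because both banks have the same size, the index part always fits in 7 bits regardless of which bank a register lives in.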

Results

Conclusions and future work

This thesis proposes a novel MIPS instruction format and NDG register allocation, which together provide more architected registers than the register field allows. Compared with conventional register allocation, our approach does not incur encoding space expansion; it offers a way to work around the ISA bottleneck and to make full use of the extra physical registers. Experimental results show that NDG register allocation reduces cycle counts by about 5 percent to 6.25 percent on average, speeding up program execution. We also observe that the improvement is larger when register pressure is high or when more spills occur. In the future, we will study further optimization techniques to reduce extra move instructions and speed up program execution.

A low-cost register extension approach for RISC processors

603410035 戴子為, 603410093 林函儀

Computer Science and Information Engineering, National Chung Cheng University, Taiwan, R.O.C 0907

In addition, we compare against the inlining optimization, as shown in Figure 9.4. When functions are aggressively inlined to reduce function call overhead, register pressure increases significantly and regions of high register pressure may appear. We observe that our approach yields a larger improvement when more spills occur: in this experiment it reduces cycle counts by about 6.25 percent on average.

Figure 9.3 shows the percentage of cycles saved for each SPEC2000 benchmark. In this experiment we compare against the original 32-register configuration; our approach reduces cycle counts by about 5 percent on average.
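The reported percentages read as relative cycle reductions against the 32-register baseline. A minimal sketch of that calculation follows, assuming the usual definition of relative reduction; the function name and the cycle counts are made up for illustration and are not measurements from this work.

```python
def cycle_reduction_percent(baseline_cycles: int, new_cycles: int) -> float:
    """Percentage of cycles saved relative to the baseline run."""
    if baseline_cycles <= 0:
        raise ValueError("baseline_cycles must be positive")
    return 100.0 * (baseline_cycles - new_cycles) / baseline_cycles


# Illustrative numbers only, not experimental results.
print(cycle_reduction_percent(1_000_000, 950_000))  # 5.0
```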


We evaluate our approach on an x86 machine and use the GNU cross-compiler toolchain to simulate the MIPS architecture, as shown in Figure 9.1. The GNU Compiler Collection (GCC) is a set of compilers for various programming languages produced by the GNU project, and it is a key component of the GNU toolchain. It also serves as the standard compiler on most modern Unix-like operating systems, including Linux, the BSD family and Mac OS X. The GNU toolchain is needed to translate high-level source files into binary code that runs directly on the target machine.

Figure 3.1: The Concept of Our Approach

Figure 9.1: Software Configuration

We evaluate 6 benchmark programs from SPEC2000; a summary of the benchmarks is shown in Figure 9.2. The results of our experiments are presented in the following statistics. The baseline in all cases is the GCC compiler with 32 registers.

Figure 9.2: Benchmark Summary

Figure 9.3: Comparison with the Original 32 Registers

We also describe the approach as a whole; an illustration is shown in Figure 3.2. First, we extend the number of registers to 256 so that the compiler can perform compilation and optimization with them, and we add NDG register allocation to the compiler. Next, we modify the encoder and decoder to support the novel MIPS instruction format. Finally, we must ensure that the dual register banks can be accessed normally at execution time.

Figure 3.2: Program Flow for Our Approach

NDG register allocation has the following advantages. It allocates the extra physical registers effectively, and it handles the spill problem at compile time. In addition, it increases the opportunity for nodes to be reused, which reduces extra move instructions.
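The exact construction of the NDG is not spelled out here, so the following is only a rough sketch of one plausible reading: a graph whose nodes are registers and whose edges link each destination register to the source registers it is computed from. The function name, the instruction representation and the register names are all assumptions made for illustration, not the thesis's definition.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Instruction = Tuple[str, List[str]]  # (destination register, source registers)


def build_dependence_graph(instructions: Iterable[Instruction]) -> Dict[str, Set[str]]:
    """Map each register to the set of registers its value depends on."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for dest, sources in instructions:
        graph[dest].update(sources)
        for src in sources:
            graph.setdefault(src, set())  # make sure every register is a node
    return dict(graph)


# Hypothetical three-address code: v1 = v2 op v3; v4 = op v1; v5 = v1 op v4
program = [("v1", ["v2", "v3"]), ("v4", ["v1"]), ("v5", ["v1", "v4"])]
graph = build_dependence_graph(program)
print(sorted(graph["v5"]))  # ['v1', 'v4']
```

An allocator could walk such a graph to decide which registers to keep close together, but how the NDG algorithm actually uses its graph is not described in this text.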