POWER OPTIMIZATION METHODS IN HETEROGENEOUS MULTICORE PROCESSORS
PREPARED FOR:
SHARON AHLERS
ENGINEERING COMMUNICATIONS 350
COLLEGE OF ELECTRICAL AND COMPUTER ENGINEERING
CORNELL UNIVERSITY
PREPARED BY:
ALEXANDER VITKALOV
COLLEGE OF ELECTRICAL AND COMPUTER ENGINEERING
CORNELL UNIVERSITY
DECEMBER 12, 2005
VITKALOV | HETEROGENEOUS MULTICORE PROCESSORS 2
ABSTRACT
This report evaluates the benefits of using heterogeneous processor cores as a means of
reducing microprocessor power consumption while increasing its performance. The project
focuses on the hardware implementation of heterogeneous processors rather than software.
Advantages of multicore architectures are evaluated across five main categories including
performance, efficiency, compatibility, functionality and cost. Increases in speed and
efficiency of multicore processors are derived through extrapolation of data from comparison
between single core processors and their dual core counterparts. Compatibility and
functionality advantages are discussed in terms of backwards compatibility, design flexibility
and power consumption. The report concludes with a feasibility study outlining the
technological and financial conditions required for profitable development of multicore
processors.
TABLE OF CONTENTS
LIST OF FIGURES
1. INTRODUCTION
2. PERFORMANCE
   2.1 CHOICE OF PROCESSORS
   2.2 OVERALL PERFORMANCE
   2.3 PERFORMANCE EXTRAPOLATION
3. EFFICIENCY
   3.1 PERFORMANCE PER WATT
   3.2 EFFECTS OF CORE HETEROGENEITY
   3.3 CHALLENGES
4. COMPATIBILITY
   4.1 BACKWARDS COMPATIBILITY
   4.2 CORE COMPATIBILITY
5. FUNCTIONALITY
   5.1 PROGRAMMABLE PROCESSORS
   5.2 CHALLENGES
6. FEASIBILITY
   6.1 CURRENT TECHNOLOGIES
   6.2 FUTURE TECHNOLOGIES
7 CONCLUSION
8 RECOMMENDATIONS
REFERENCES
GLOSSARY
LIST OF FIGURES
FIGURE 1: SELECTED PROCESSORS
FIGURE 2: PROCESSOR POWER CONSUMPTION
FIGURE 3: OVERALL PROCESSOR PERFORMANCE [DHRYSTONE]
FIGURE 4: OVERALL PROCESSOR PERFORMANCE [WHETSTONE]
FIGURE 5: OVERALL PROCESSOR PERFORMANCE EXTRAPOLATION
FIGURE 6: PERFORMANCE PER WATT COMPARISON
FIGURE 7: PERFORMANCE PER WATT EXTRAPOLATION
FIGURE 8: BACKWARDS COMPATIBILITY
FIGURE 9: PROGRAMMABLE COMMUNICATIONS BUS AS CORE INTERCONNECT
FIGURE 10: VIDEO DECODER SCENARIO
FIGURE 11: RELATIVE CORE SIZING
1. INTRODUCTION
For the past twenty years, processor
frequency has been considered the
fundamental measure of performance:
a higher frequency generally meant faster
performance. This notion has changed as
processor power consumption has become
increasingly important. Power consumption
depends on the operating frequency and on
the number of transistors in a processor.
Today’s processors use as many as 250
million transistors, so even a small increase
in switching frequency can cause a dramatic
increase in overall power consumption. The
enormous heat generated as a result can
lead to thermal breakdown of the silicon.
Although improving the manufacturing
process and decreasing transistor sizes
lowers power consumption, this approach is
becoming increasingly costly. We have
therefore reached a point where true
processor performance is no longer
determined solely by frequency or transistor
size but depends on the elegance and
efficiency of the architecture.
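The dependence of power on frequency described above is commonly summarized by the first-order CMOS switching power relation P = a * C * V^2 * f. The report does not state this relation itself, so the sketch below is an illustration under that standard approximation. The function name and every numeric value are assumptions chosen for the example, not measurements of any processor discussed here.

```python
def dynamic_power(activity, capacitance_f, voltage_v, frequency_hz):
    """First-order CMOS switching power in watts: P = a * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz

# Hypothetical operating points, for illustration only.
base = dynamic_power(0.1, 100e-9, 1.2, 2.0e9)    # 2.0 GHz at 1.2 V
faster = dynamic_power(0.1, 100e-9, 1.4, 3.0e9)  # 3.0 GHz at 1.4 V

# A 1.5x frequency increase that also needs a higher supply voltage
# costs noticeably more than 1.5x the power.
ratio = faster / base
print(f"1.5x frequency costs {ratio:.2f}x power")
```

With these assumed values the power roughly doubles for a 50% frequency gain, because the voltage term enters squared.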
Advances such as pipelining, branch
prediction and hyperthreading have enabled
the increase in performance and efficiency of
processors. However, even the most efficient
single core architectures cannot provide
effective solutions to the demands of
consumers. Unacceptable levels of power
consumption and the increasing cost of
developing complex single core chips have forced
manufacturers to improve the efficiency
and performance of their processors through
the use of dual core solutions. Although
using two identical cores is a step in the right
direction, efficiency and performance of
microprocessors can be further improved by
using multiple heterogeneous cores. The
advantages provided by this method enable
the fusion of high-performance and mobile
processor architectures and will provide
effective solutions for the years to come.
To understand why utilizing
parallelism in processors by using multiple
heterogeneous cores is the most effective
method of improvement it is helpful to start
from a simple performance and efficiency
comparison of identical single and dual core
processors and then gradually advance to
more complicated issues.
2. PERFORMANCE
The performance of a processor
depends on a multitude of factors determined
by its architecture and manufacturing
process. Frequency has traditionally
been the leading factor. However,
architectural features such as cache size,
branch predictor efficiency and pipeline
depth are becoming
increasingly important in determining the
performance of a single core processor.
Intuitively, the effects of a superior
architecture are magnified when more than
one core is used.
2.1 CHOICE OF PROCESSORS
To demonstrate the importance of
processor architecture, the performance of
several cores from different application
segments needs to be compared (Figure 1). In
the high performance segment, Intel Pentium 4
670 [1] and AMD Athlon FX-55 [2] were
chosen, since they represent the fastest single
core processors available today. Intel
Pentium 4 840D [3] and AMD Athlon 64-X2
[4] are their dual core counterparts. In
addition, Intel Pentium M 780 [5] and
Transmeta Efficeon 8800 [6] were selected
from two opposite ends of the mobile spectrum.
The disparity in their operating frequency
and power consumption illustrates the
flexibility that is required for a truly mobile
solution. Intel PXA270 [7] is the sole
example of a true system on chip (SoC)
processor, of the kind typically used in personal
digital assistants.
Figure 2, which illustrates
processor power consumption, shows
that the power drawn by modern desktop
processors based on the Pentium IV or AMD
Athlon is nearly four times that of a typical
laptop processor based on the Pentium M. One of the
objectives of this report is to investigate how
this figure can be reduced through an
efficient combination of multiple
heterogeneous cores.
FIGURE 1: SELECTED PROCESSORS
PROCESSOR               FREQUENCY   TRANSISTORS   SIZE      PRICE
PERFORMANCE
Intel Pentium IV 670    3800 MHz    169M          112 mm²   $625
AMD Athlon FX-55        2600 MHz    114M          115 mm²   $824
MOBILE
Intel Pentium M 780     2260 MHz    140M          87 mm²    $638
Transmeta Efficeon      1300 MHz    40M           29 mm²    n/a
Intel PXA 270           624 MHz     2.5M          50 mm²    n/a
DUAL CORE
Intel Pentium IV 840D   3200 MHz    230M          237 mm²   $667
AMD Athlon X2 4800+     2400 MHz    233M          199 mm²   $790
FIGURE 2: PROCESSOR POWER CONSUMPTION
[Bar chart. Horizontal axis: power consumption (watts), 0 to 140. Processors shown: Intel Pentium IV 670, AMD Athlon FX-55, Intel Pentium M 780, Transmeta Efficeon, Intel PXA 270, Intel Pentium IV 840D, AMD Athlon X2 4800+.]
Clearly, the highest power consumers
are the performance oriented Intel
Pentium 670 and 840D along with the
AMD Athlon FX-55 and X2. The overall
power consumption of the dual core solutions is
greater. However, the per-core consumption
of a dual core solution is nearly half that of
a single core. Therefore, if each core
provides performance equal to that of its single core
counterpart at half the power, it is twice as
efficient. The subsequent sections, which focus
on processor performance, examine how
well this statement holds.
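The per-core argument above can be written out as a small calculation. This is a minimal sketch under the text's own assumption that each core of the dual core part matches the performance of the single core part. The function name and the wattages are hypothetical and are not read from Figure 2.

```python
def per_core_efficiency_gain(single_power_w, dual_total_power_w):
    """Efficiency gain per core, assuming each dual core matches the
    single core's performance (the assumption made in the text)."""
    per_core_power_w = dual_total_power_w / 2
    return single_power_w / per_core_power_w

# Hypothetical wattages, for illustration only.
ideal = per_core_efficiency_gain(100.0, 100.0)      # per-core power exactly halved
realistic = per_core_efficiency_gain(100.0, 110.0)  # dual core draws slightly more overall
print(f"ideal: {ideal:.2f}x, with 10% higher total power: {realistic:.2f}x")
```

The gain reaches the full factor of two only when the per-core power is exactly half; any extra total consumption of the dual core part reduces it.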
2.2 OVERALL PERFORMANCE
FIGURE 3: OVERALL PROCESSOR PERFORMANCE: SISOFT SANDRA (DHRYSTONE)
[Bar chart. Horizontal axis: Dhrystone score, 0 to 25,000. Processors shown: Intel Pentium IV 670, AMD Athlon FX-55, Intel Pentium M 780, Transmeta Efficeon, Intel PXA 270, Intel Pentium IV 840D, AMD Athlon X2 4800+.]
The overall performance of the
selected processors can be compared using
various benchmarks, specifically SiSoft
Sandra 2004 Dhrystone and Whetstone [8,9].
The Dhrystone benchmark measures
integer performance, reported in MIPS (Millions of
Instructions Per Second), and is representative of
common application instructions such as the
ones issued by Windows (Figure 3).
The Whetstone benchmark evaluates floating point
performance, reported in MFLOPS
(Millions of Floating Point Operations Per Second),
typically associated with scientific or
multimedia applications. This second aspect
is becoming increasingly
important as computers are used to
watch videos, listen to music and play 3D
games (Figure 4).
In the overall performance
comparison, dual core processors are
the clear favorites, especially in a raw
performance benchmark such as SiSoft
Sandra Dhrystone. For both Intel and AMD,
the dual core nearly doubles the performance.
The Whetstone benchmark shows a similar picture.
Although the performance of the high-end
desktop chips is superior to that of the mobile
architectures, the difference is marginal. The
Dhrystone performance of the Intel Pentium M,
a mobile processor, is only 15% lower
than that of the single core Intel Pentium 670
and AMD Athlon FX-55. In the multimedia
benchmark, SiSoft Whetstone, the Intel Pentium
M also holds up well against its desktop
counterparts.
FIGURE 4: OVERALL PROCESSOR PERFORMANCE: SISOFT SANDRA (WHETSTONE)
[Bar chart. Horizontal axis: millions of floating point operations per second, 0 to 12,000. Processors shown: Intel Pentium IV 670, AMD Athlon FX-55, Intel Pentium M 780, Transmeta Efficeon, Intel PXA 270, Intel Pentium IV 840D, AMD Athlon X2 4800+.]
On the other hand, Transmeta
Efficeon and the handheld Intel PXA-270 are at a
dramatic disadvantage in overall
performance. The practically non-existent
Dhrystone and Whetstone scores of the
Intel PXA-270, one of the top
handheld processors in use today, show
how far ultra portable solutions
lag behind. This significant
difference in performance also explains why
PDAs are not as widespread as desktops and
laptops. Slow processors seriously limit the
functionality of the units, giving consumers
less incentive to buy them. The performance
of the ultra portable processors such as Intel
PXA-270 is capped by the stringent limits in
power supply that are required to keep the
units portable. Although battery capacities
are slowly increasing, other methods, such as
use of multiple cores, are required to make
ultra portable electronics more practical.
2.3 PERFORMANCE EXTRAPOLATION
During the Intel Developer Forum in
Spring 2005, Intel Corporation predicted a
ten-fold increase in processor performance
due to the introduction of multicore designs.
Considering that the performance of desktop
processors has increased 68-fold since the
introduction of the 8086 in 1978, this prediction
appears plausible in the long run. As Intel
readies dual and quad core processors for
release in 2006-07, a
significant short term performance increase is
also likely. Since these processors
will be manufactured on a smaller 0.065 μm
process, the power and performance
advantages will be extended further. By the
most conservative estimates, quad core
configurations should improve overall
processor performance by a factor of at least
3 (Figure 5).
FIGURE 5: PERFORMANCE EXTRAPOLATION (BASED ON SANDRA DHRYSTONE BENCHMARK)
[Bar chart. Horizontal axis: performance (%), 0% to 450%. Processors shown: Intel Pentium IV 670, Intel Pentium IV 840D, Intel Pentium (Quad), AMD Athlon FX-55, AMD Athlon X2 (Dual), AMD Athlon (Quad), Intel Pentium M 780, Intel Pentium M (Dual), Intel Pentium M (Quad).]
If dual and quad core versions are
built from a portable processor, such as the Intel
Pentium M, laptops will experience an even
more significant increase in performance.
Laptop processor designs are more energy
efficient and therefore produce less heat.
Considering that power dissipation issues are
shifting to the forefront of the technological
limitations of processor designs, mobile
architectures are likely to benefit more from
the increased core count. In fact, Intel is
already planning to introduce dual core
mobile processors based on the Pentium M,
currently codenamed Yonah. In addition, a
quad core desktop processor, codenamed
Kentsfield, is planned to arrive in
mid 2007 [10].
3. EFFICIENCY
Although performance has always
been the most important measure of
processor superiority, issues with
energy consumption are rapidly changing
what the concept means. Currently,
manufacturers like Intel and AMD are
beginning to evaluate their products on the
basis of performance per watt, which reflects
not only the speed of the processor but also
how efficiently it uses energy.
FIGURE 6: PERFORMANCE PER WATT COMPARISON
[Bar chart. Horizontal axis: performance (MIPS per watt), 0 to 400. Processors shown: Intel Pentium IV 670, AMD Athlon FX-55, Intel Pentium M 780, Transmeta Efficeon, Intel PXA 270, Intel Pentium IV 840D, AMD Athlon X2 4800+.]
3.1 PERFORMANCE PER WATT
Since the introduction of the performance
per watt design approach, the industry has
been achieving higher performance figures at
lower power consumption. Although mobile
processors have always been better with
respect to efficiency, the answer for
desktop systems has often been dual core
architectures. Figure 6, derived from Figures
2 and 3, compares the performance per watt
ratings of the selected processors. For
brevity, only the Dhrystone
benchmark was used for the
performance component of this comparison.
Intel PXA-270 was not included due to its
significantly different CPU architecture. Because
its architecture is optimized
for extremely low power consumption, its
performance per watt figure varies
significantly from one application to the
next. It is important to note that Pentium M
has a significantly higher performance per
watt ratio than any other processor in the
comparison. Another important trend is that
dual cores improve the performance per
watt ratio by roughly 30%. Combining these
results with Figure 5, we reach the clear
conclusion that mobile cores benefit the most
from having multiple cores (Figure 7). To
obtain these results, the performance
percentages of Figure 5 were divided by the
ratio of Pentium M's performance per watt
to that of the given processor. For
instance, for AMD Athlon FX-55 this
ratio is 350/140 = 2.5.
Its performance based on Sandra Dhrystone
is 119% relative to Pentium M. Therefore,
its performance per watt is only
119%/2.5 = 47.6%
relative to Pentium M. Since performance
and power consumption data were unavailable
for the quad cores, they were derived from
the ratio between the performance per watt of the
corresponding dual and single core solutions.
This factor
was then multiplied by the performance per
watt percentage of the dual core solution to
obtain the quad core estimate.
Clearly, the data show that in terms of
efficiency, mobile solutions are hard to beat.
FIGURE 7: PERFORMANCE PER WATT EXTRAPOLATION
[Bar chart. Horizontal axis: performance % (relative to Pentium M), 0% to 450%. Processors shown: Intel Pentium IV 670, Intel Pentium IV 840D, Intel Pentium (Quad), AMD Athlon FX-55, AMD Athlon X2 (Dual), AMD Athlon (Quad), Intel Pentium M 780, Intel Pentium M (Dual), Intel Pentium M (Quad).]
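The arithmetic described in this section can be sketched as follows. The Athlon FX-55 inputs (350, 140 and 119%) are the values quoted in the text. The function names are illustrative, and the dual core value passed to the quad core helper is a hypothetical placeholder, since the chart values are not reproduced here.

```python
def relative_perf_per_watt(perf_pct, pentium_m_ppw, cpu_ppw):
    """Divide a processor's performance % (relative to Pentium M) by the
    ratio of Pentium M's performance per watt to the processor's own."""
    return perf_pct / (pentium_m_ppw / cpu_ppw)

def extrapolate_quad(single_pct, dual_pct):
    """Apply the single-to-dual scaling factor once more to the dual value."""
    factor = dual_pct / single_pct
    return dual_pct * factor

# Worked example from the text: AMD Athlon FX-55.
fx55 = relative_perf_per_watt(119.0, 350.0, 140.0)
print(round(fx55, 1))  # 47.6

# Hypothetical dual core figure (roughly 30% better than the single core).
quad_estimate = extrapolate_quad(fx55, fx55 * 1.3)
print(round(quad_estimate, 1))
```

Reusing the single-to-dual factor for the dual-to-quad step assumes that scaling stays constant as cores are added, which is the simplification the extrapolation relies on.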
3.2 EFFECTS OF CORE HETEROGENEITY
By using heterogeneous cores we can
further increase both performance and
efficiency of a processor [11,12]. In a likely
scenario, several cores with different
performance, efficiency and complexity
indexes can be combined in one multicore
design. Large variations in the core
architectures make the entire design
more flexible and adaptive to a
specific application. Kumar et al. [11] found
in their study that, on average, heterogeneous
cores provide a significant 39% reduction
in power while incurring only a negligible 3%
reduction in performance. Since their
experiment was based on the Alpha
processor, which is rarely used in consumer
products, it can only suggest the potential in
improvement that can be made if processor
architectures were specifically designed to
take advantage of multicore heterogeneity. In
this case, powerful cores can be combined
with more efficient ones to generate a
significant improvement in terms of
performance per watt figures. Since modern
processors remain underutilized for most of
the time, this approach would yield
significant idle power reductions. More
powerful cores would simply be shut off and
used only when their performance counts.
For the desktops the reduced power load
would decrease the demand for the custom
water cooled solutions that are beginning to
appear to resolve heat dissipation problems.
In addition, the increased processing
capabilities would greatly reduce bottlenecks
in calculation intensive applications, such as
file archiving and conversion.
In ultra mobile applications, the speed
of a desktop processor is rarely necessary.
If the need does arise, however, the batteries may
provide enough power for short bursts of
time through the use of capacitors. In
addition, less complex cores can be used
for specific applications, further reducing
power consumption, as
discussed later.
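The idea of keeping powerful cores shut off until their performance is needed can be illustrated with a simple selection policy. This is a minimal sketch, not the assignment method used by Kumar et al. [11]. The `Core` type, the `pick_core` function and all performance and power values are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Core:
    name: str
    performance: float  # relative throughput, arbitrary units
    power_w: float      # active power in watts

def pick_core(cores, demand):
    """Return the lowest-power core that meets the demand,
    falling back to the fastest core if none is adequate."""
    adequate = [c for c in cores if c.performance >= demand]
    if adequate:
        return min(adequate, key=lambda c: c.power_w)
    return max(cores, key=lambda c: c.performance)

# Hypothetical heterogeneous design with three core types.
cores = [
    Core("efficient", 1.0, 5.0),
    Core("balanced", 2.5, 20.0),
    Core("fast", 4.0, 90.0),
]

for demand in (0.5, 2.0, 10.0):
    chosen = pick_core(cores, demand)
    print(f"demand {demand}: run on {chosen.name} ({chosen.power_w} W)")
```

Under this policy a mostly idle workload stays on the efficient core, and the fast core is engaged only when no other core can meet the demand.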
3.3 CHALLENGES
One of the primary challenges of
introducing heterogeneous cores is the
increasing complexity of the communication
bus that is required for such a complicated
network. In modern dual
core processors, communication between
the cores is still in its infancy.
Although these processors
provide a nearly two-fold performance increase in some
applications, they may provide none in
others. Creating an effective communications
bus is a difficult challenge, which can be
magnified by the introduction of heterogeneous
cores that may work at different or even
variable frequencies. In addition,
performance bottlenecks and power
distribution issues have to be evaluated
largely at the hardware level. Although
Kumar et al. [11] used software to determine
the processor assignment for a specific
instruction, a hardware implementation of an
effective algorithm would have significant
performance advantages. A separate
co-processing unit may be necessary just to deal
with power and performance optimization.
4. COMPATIBILITY
One of the most important factors in
the success experienced by Intel
Corporation over the years is backwards
compatibility. Backwards compatibility
means that newer processors produced by
Intel are compatible with those from twenty
years ago. The purpose of this practice is
to ensure that software does not have to be
rewritten for every new generation of
processors. As a result, many of the
newest processors have a number of old
artifacts that serve no purpose in any of
the modern applications. This section of
the report focuses not only on describing
the method of ensuring backwards
compatibility in multicore processors but
also on making sure that heterogeneous
cores are compatible with each other.
4.1 BACKWARDS COMPATIBILITY
Multicore processors offer the best
possible solution in terms of backwards
compatibility. Compared to the modern
counterparts, the processors from twenty
years ago were much slower. Therefore,
highly efficient processor cores, working at
low frequencies, are more than sufficient to
emulate the operation of their ancestors
(Figure 8). Backwards compatibility in
the faster high performance cores can therefore
be dropped. As a result, unnecessary
redundancy would be eliminated from the
design. In addition, performance
and efficiency should increase because
a common instruction set can exclude
practically useless operations.
Since heterogeneous multicore
processors may use significantly different
cores to diversify their performance index
across a range of applications, efficient
operation has to be ensured through core
compatibility [13].
4.2 CORE COMPATIBILITY
FIGURE 8: BACKWARDS COMPATIBILITY
[Diagram. Labels: mobile core, performance core, instruction type A, instruction type B.]
Most modern designs use a reduced
instruction set computer (RISC) architecture.
This approach provides advantages in both
performance and power consumption over
other processor instruction types. On the
other hand, instruction sets vary from
processor to processor. For instance, the Pentium
IV includes an additional set of multimedia
instructions to increase its performance
across a wide range of multimedia
applications, while earlier processors such as
the 80486 do not. The situation becomes more
complicated when processors with different
instruction sets are combined. For instance,
the instruction set of the Pentium IV is optimized for
high performance applications, whereas
the instruction set of the ARM processor,
commonly used in handheld applications, is
optimized for power consumption [14]. As a
result, the instruction sets are not compatible
even though they are quite similar in their
purpose.
FIGURE 9: PROGRAMMABLE COMMUNICATION BUS AS CORE INTERCONNECT
The most efficient solution is to
combine the heterogeneous cores using a
translation layer [15]. Nava et al. proposed an
approach resembling a network topology to
resolve the communication issues between
the heterogeneous cores. Having a network
based communication bus act as a translation
layer between the heterogeneous cores will
eliminate most of the compatibility issues
between them. In addition, since the
proposed communication bus can be
programmable, power consumption can
be further decreased by using smart
routing techniques optimized to increase the
performance per watt rating of the processor.
Programmable components, such as a
communication bus connecting
heterogeneous cores, can
improve performance significantly
compared to the approach used by Kumar et
al. [11]. Since the communication bus would be
programmed by a local coprocessor rather
than by the indirect software methods used in [11],
the efficiency and performance of the overall
design are likely to increase significantly.
5. FUNCTIONALITY
Since heterogeneous multicore
processors are likely to include a number of
programmable components besides a
communications bus, the functionality of the
design is likely to increase significantly. The
purpose of this section is to discuss the
advantages in functionality that are
associated with multicore processors and
their programmable components.
5.1 PROGRAMMABLE PROCESSORS
The key element of a heterogeneous
multicore design of tomorrow will be an
increased number of programmable
processors. The advantages of programmable
processors include custom execution units,
variable instruction sets, as well as registers
and register files [16]. Conventional fixed
instruction set processors simply cannot
compete with the flexibility and performance
advantages offered by programmable
processors when it comes to specific
applications. For instance, the emergence of
digital signal processing in internet and
multimedia applications has allowed system
architects to design processors for their
specific algorithms and subsequently update
them as the algorithms improve over time.
A configurable processor used
to decode older MPEG-2 video files, for example, can be
reprogrammed to decode newer MPEG-4
videos.
Today’s system on chip designs
contain hundreds of custom programmable
processors [16]. This is achieved by keeping
the complexity of the programmable
processor cores relatively low. In the
future, programmable cores may become
more complex and offer a wider range of
functionality. For instance, the same
programmable processor could act as a physics
processor for scientific and entertainment
applications or as a GPS processor for
navigation applications.
Considering that modern processor design
is moving away from the traditional
handcrafted approach, the complexity
of processors is likely to increase. In the
long run, the goal is to enable computers to
design themselves with as little human input
as possible. Today, computers go only as far
as aiding the designer in optimizing a given
processor architecture. Special software
packages can be used to reprogram
processors, optimizing their performance
and power consumption for a given scenario.
It is likely that one of the first applications
of this approach will be configurable
processors that reprogram themselves
based on the amount of available power. For
instance, a video stream decoder could
provide HDTV quality resolution while a
laptop is plugged into a wall outlet, and
switch to a lower resolution to save power
when the user is traveling (Figure 10).
FIGURE 10: VIDEO DECODER SCENARIO USING HETEROGENEOUS MULTICORE PROCESSOR
[Diagram comparing two modes. Performance mode: video application at 1280x720 @ 30 FPS, operating system. Mobile mode: video application at 800x480 @ 25 FPS, operating system in power save. Each mode shows Core A, Core B and two programmable cores, marked on or off, connected by a programmable communications bus.]
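The video decoder scenario can be expressed as a small mode selection routine. The resolutions and frame rates are the ones labeled in Figure 10. The type and function names are hypothetical, and the pixel rate is used here only as a rough stand-in for decoding workload, which is an assumption of this sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DecodeMode:
    name: str
    width: int
    height: int
    fps: int

# Values as labeled in Figure 10.
PERFORMANCE_MODE = DecodeMode("performance", 1280, 720, 30)
MOBILE_MODE = DecodeMode("mobile", 800, 480, 25)

def select_mode(on_wall_power: bool) -> DecodeMode:
    """Choose the decode mode from the available power source."""
    return PERFORMANCE_MODE if on_wall_power else MOBILE_MODE

def pixel_rate(mode: DecodeMode) -> int:
    """Pixels decoded per second, a rough proxy for workload."""
    return mode.width * mode.height * mode.fps

for plugged_in in (True, False):
    mode = select_mode(plugged_in)
    print(f"{mode.name}: {mode.width}x{mode.height} @ {mode.fps} FPS, "
          f"{pixel_rate(mode):,} pixels/s")
```

By this proxy the mobile mode processes roughly a third of the pixels per second, which gives a sense of how much work the reconfigured processor can shed on battery power.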
5.2 CHALLENGES
The primary challenge as the number
and complexity of programmable
components increases is to ensure that
heterogeneous cores are communicating
efficiently [17]. The efficient operation
means that the power consumption has to be
minimized according to the performance
demand. An increasing variability in
instruction sets makes this a challenging task.
As instructions become significantly
different from each other, it becomes harder to
determine the best suited component. 64 bit
instructions, already present in some CPUs,
may provide an answer to this problem, since
they accommodate twice the amount of
information of traditional 32 bit
instructions.
In addition, as the complexity of the
heterogeneous components increases, so
will the complexity of the tools
required to design them. A very high level of
abstraction is required to create a system with
so many variable parameters. Moreover, a
point may be reached at which the complexity of
the instructions and design starts to
burden overall system performance.
6. FEASIBILITY
The purpose of the feasibility study in
this report is to evaluate the technological and
market conditions required to make
heterogeneous core processors a viable
alternative to current technologies. The study
evaluates current technological conditions
and investigates near future trends.
6.1 CURRENT TECHNOLOGIES
For the past four years the
manufacturing standard has been the 0.09 μm
process. Along with improved processor
architectures, it allowed processor
frequencies to increase from roughly 1.4 to 3.8
GHz. This seemingly disproportionate
frequency increase had a serious negative
impact on the power requirements of a typical
processor. Since a larger number of transistors
was needed to achieve high frequency
designs and each transistor required a higher
operating voltage for faster switching,
overall power increased quadratically.
Modern designs have reached a point where
performance is strictly limited by the ability
of the processor to withstand the enormous
heat it generates. In addition, the increased
complexity of the designs is lowering
production yields,
which drives unit prices up. In some
cases, such as the Intel Pentium Extreme Edition,
unit prices have reached a staggering
1000 dollars.
To resolve the problems with power
dissipation and low yields, manufacturers
resorted to the use of dual cores and 64 bit
processors. By increasing parallelism in the
architectures, processor designers were
able to decrease the frequency and power
consumption of each core while still
increasing performance. The less
complicated cores used in dual core
solutions have higher yields, decreasing the cost
of the overall design.
6.2 FUTURE TECHNOLOGIES
For the next several years the trend of
exploiting parallelism in processor designs
will gain momentum. Improved
manufacturing processes will continue to
drive down the costs of developing dual core
processors. However, in order to make a
significant leap into the future, the core
manufacturing technology needs to shrink
from 0.09 μm down to 0.065 μm. This 28%
decrease in feature length would cause nearly a 50%
decrease in total area (Figure 11).
Considering that manufacturers are also
shifting towards using larger wafers (see footnote 1), the
production cost per unit will fall by at
least half.
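The scaling arithmetic behind the process shrink can be checked directly: area scales with the square of the linear feature size, so moving from 0.09 μm to 0.065 μm shortens lengths by about 28% and reduces area by about 48%. The sketch below assumes ideal geometric scaling of the whole core, which is a simplification, and the function name is illustrative.

```python
def shrink(old_um, new_um):
    """Return (fractional length reduction, fractional area reduction)
    for an ideal linear shrink from old_um to new_um."""
    length_ratio = new_um / old_um
    return 1 - length_ratio, 1 - length_ratio ** 2

length_drop, area_drop = shrink(0.09, 0.065)
print(f"length: {length_drop:.0%}, area: {area_drop:.0%}")  # length: 28%, area: 48%
```

Under this assumption, a core that occupies a given area at 0.09 μm would need only slightly more than half of it at 0.065 μm.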
FIGURE 11: RELATIVE CORE SIZING
[Diagram comparing a single core with cache at 0.09 μm against a quad core with cache at 0.065 μm. Each design is annotated with its power and performance.]
The decreased core area will also
allow more cores to be combined into one
processor. The current 0.09 μm technology does
not allow practical integration of more than
two cores, since cooling large cores becomes
problematic. The area saved can also
potentially be allocated to power
optimization circuitry. This is one of the
reasons why 0.065 μm technology would
allow for the development of quad core and
multicore processors.
Footnote 1: Wafers are used to grow silicon crystals, which are subsequently sliced and divided up into processor cores.
Unfortunately, fundamental
physical phenomena, such as leakage
currents, impose serious constraints on
sub-0.065μm technologies. Although Intel is
currently looking at a potential reduction
down to 0.045μm within the next two years
[10], this projection may prove as unrealistic as
the 5 GHz Pentiums that were rumored for 2005
and never delivered. Sub-0.045μm technologies
will eventually become a reality once
extreme forms of lithography are
implemented. At that point heterogeneous
multicore processors will likely become the
dominant trend in the market due to the
advantages discussed earlier in
this report.
7 CONCLUSION
Although true high-performance
heterogeneous multicore processors are still
held back by unresolved technological
limitations, the future of this technology
seems promising. The advantages introduced
by multicore designs overshadow the
benefits of current single and dual core
solutions. In terms of performance and
efficiency, heterogeneous multicore
processors offer unprecedented increases due
to flexible power consumption and a high
level of configurability. Although this report
at times focused solely on the performance
advantages of multicore architectures, one
has to keep in mind that in the future the terms
performance and efficiency will become
interchangeable. Due to increased
parallelism, the performance of future
multicore processors will be limited
mostly by the amount of power supplied and not
by the frequency at which they operate. Therefore,
the foremost issue in heterogeneous
multicore processor design is optimizing
power consumption by carefully selecting the
cores and engaging programmable
components based on the demands of
applications.
8 RECOMMENDATIONS
Based on their performance and
efficiency, multicore processors are clearly
the future of computing. Advantages in
functionality and power consumption
associated with multicore designs will
increase the number of possible applications
while adding new ways to use
computers in our lives. For instance,
heterogeneous multicores and breakthroughs
in memory technologies such as
perpendicular recording and magnetic
memory will allow current desktops to
be squeezed down to the size of a cellphone.
Personalized programmable processors in
these phones may enable the use of
cellphones as credit cards, which would be
orders of magnitude more secure than
traditional methods. Ultimately,
heterogeneous cores will make the
experience of all electronics
increasingly interactive.
This is one of many
examples reflecting the importance of
technologies that accelerate the
development of heterogeneous multicore
processors. Although alternative research
directions, such as quantum computing,
promise lucrative opportunities, they do not
yet have solid theoretical and practical
foundations. Advances in multicore
processor architectures are based on decades
of research in the field of silicon-based
semiconductors and not on a small number of
theoretical speculations. This is why
increasing research and financing in the field
of heterogeneous multicore processors is a
solid investment that will bring
significant returns in the long run.
REFERENCES
[1] Intel Pentium 4 Processor 670: Datasheet. Intel Corporation. [Online] Available from: www.intel.com
[2] AMD Athlon 64 4800+ Processor: Datasheet. AMD Corporation. [Online] Available from: www.amd.com
[3] Intel Pentium D Processor 840, 830, and 820: Datasheet. Intel Corporation. [Online] Available from: www.intel.com
[4] AMD Athlon 64 X2 Processor: Datasheet. AMD Corporation. [Online] Available from: www.amd.com
[5] Intel Pentium M 770 Processor: Datasheet. Intel Corporation. [Online] Available from: www.intel.com
[6] Transmeta Efficeon TM 8800 Processor. Transmeta Corporation. [Online] Available from: www.transmeta.com
[7] Intel PXA270 Processor: Datasheet. Intel Corporation. [Online] Available from: www.intel.com
[8] SiSoft – The Diagnostic Tool. SiSoft Corporation. [Online] Available from: http://www.sisoftware.net/index.html?dir=&location=qa&langx=en&a=
[9] Tom’s Hardware Guide Processors. Tom’s Guide Publishing, 2005. [Online] Available from: http://www23.tomshardware.com/index.html
[10] SCHMID, P. Top Secret Intel Processor Plans Uncovered. Tom’s Guide Publishing. [Online] Available from: http://www.tomshardware.com/2005/12/04/top_secret_intel_processor_plans_uncovered/index.html
[11] KUMAR R, FARKAS K, JOUPPI N, RANGANATHAN P, TULLSEN D. Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction. Proceedings of the 36th International Symposium on Microarchitecture (MICRO-36’03). IEEE. 2003.
[12] BALAKRISHNAN S, RAJWAR R, UPTON M, LAI K. The Impact of Performance Asymmetry in Emerging Multicore Architectures. Proceedings of the 32nd International Symposium on Computer Architecture (ISCA’05). IEEE. 2005.
[13] JERRAYA A, TENHUNEN H, WOLF W. Introduction to Microprocessor Systems On Chips. Computer, v 38, n 7, July 2005. pp. 36-40.
[14] GOODACRE J, SLOSS A. Parallelism and the ARM Instruction Set Architecture. Computer, v 38, n 7, July 2005. pp. 42-50.
[15] NAVA M.D, BLOUET P, TENINGE P, COPPOLA M, BEN-ISMAIL T, PICCHIOTTINO S, WILSON R. An Open Platform for Developing Multiprocessor SOCs. Computer, v 38, n 7, July 2005. pp. 60-67.
[16] LEIBSON S, KIM J. Configurable Processors: A New Era in Chip Design. Computer, v 38, n 7, July 2005. pp. 51-59.
[17] JERRAYA A, BAGHDADI A, CESARIO W, GAUTHIER L, LYONNARD D, NICOLESCU G, PAVIOT Y, YOO S. Application Specific Multiprocessor Systems-on-Chip. SASIMI.
GLOSSARY
CORE – Silicon device that contains the transistor logic for the processor.
CPU – Central Processing Unit, or processor.
DHRYSTONE – Benchmark used to measure integer (MIPS) performance of a processor.
GPS – Global Positioning System.
HETEROGENEOUS – Made of processor cores of different architectures.
KENTSFIELD – First quad-core desktop processor, due to appear in 2007.
MIPS – Million Instructions Per Second.
MFLOPS – Million Floating-Point Operations Per Second.
PDA – Personal Digital Assistant.
PROCESSOR – Component that is responsible for the evaluation of instructions.
RISC – Reduced Instruction Set Computer. This approach is used in most modern microprocessors. It allows for faster and more efficient hardware designs.
WATT – Unit of power, or work per second.
WHETSTONE – Benchmark used to measure floating point (MFLOPS) performance of a processor.
YONAH – Dual core processor based on the 0.065μm process, due to replace the current Pentium M.