Itanium Microprocessorr.doc

19
Itanium Microprocessor Submitted by: Ammar Ibrahim Majeed Ph.D. Control Course / 2011-2012 Subject: Microprocessor

description

processors

Transcript of Itanium Microprocessorr.doc

Page 1: Itanium Microprocessorr.doc

Itanium Microprocessor

Submitted by: Ammar Ibrahim MajeedPh.D. Control Course / 2011-2012

Subject: Microprocessor

Page 2: Itanium Microprocessorr.doc

Contents Introduction

1 Market reception

o 1.1 High-end server market

o 1.2 Other markets

2 History

o 2.1 Development: 1989–2000

o 2.2 Itanium (Merced): 2001

o 2.3 Itanium 2: 2002–2010

o 2.4 Itanium 9300 (Tukwila): 2010

3 Market share

4 Architecture

o 4.1 Instruction execution

o 4.2 Memory architecture

o 4.3 Architectural changes

5 Hardware support

o 5.1 Systems

o 5.2 Chipsets

6 Software support

o 6.1 Emulation

7 Competition

8 Supercomputers and high-performance computing

Page 3: Itanium Microprocessorr.doc

9 Processors

o 9.1 Released processors

o 9.2 Future processors

9.2.1 Poulson

9.2.2 Kittson

Introduction:

Itanium is a family of 64-bit Intel microprocessors that implement the Intel Itanium architecture (formerly called IA-64). Intel markets the processors for enterprise servers and high-performance computing systems. The architecture originated at Hewlett-Packard (HP), and was later jointly developed by HP and Intel.

The Itanium architecture is based on explicit instruction-level parallelism, in which the compiler decides which instructions to execute in parallel. This contrasts with other superscalar architectures, which depend on the processor to manage instruction dependencies at runtime. Itanium cores up to and including Tukwila execute up to six instructions per clock cycle. The first Itanium processor, codenamed Merced, was released in 2001.

Itanium-based systems have been produced by HP (the HP Integrity Servers line) and several other manufacturers. As of 2008[update], Itanium was the fourth-most deployed microprocessor architecture for enterprise-class systems, behind x86-64, IBM POWER, and SPARC.[1] The most recent processor, Tukwila, originally planned for release in 2007, was released on February 8, 2010.[2][3]

1. Market reception :

High-end server market:

When first released in 2001, Itanium's performance, compared to better-established RISC and CISC processors, was disappointing.[4] [5] Emulation to run existing x86 applications and operating systems was particularly poor, with one benchmark in 2001 reporting that it was equivalent at best to a 100 MHz Pentium in this mode (1.1 GHz Pentiums were on the market at that time).[6] Itanium failed to make significant inroads against x86-32 or RISC, and then suffered from the successful introduction of x86-64 based systems into the high-end server market, systems which were more compatible with the older x86 applications.[citation needed] Journalist John C. Dvorak, commenting in 2009 on the history of the Itanium processor, said "This continues to be one of the great fiascos of the last 50 years" in an article titled "How the

Page 4: Itanium Microprocessorr.doc

Itanium Killed the Computer Industry".[7] Tech columnist Ashlee Vance commented that the delays and underperformance "turned the product into a joke in the chip industry."[8] In an interview, Donald Knuth said "The Itanium approach...was supposed to be so terrific—until it turned out that the wished-for compilers were basically impossible to write."[9] A former Intel official reported that the Itanium business had become profitable for Intel in late 2009.[10]

By 2009, the chip was almost entirely deployed on servers made by HP, which had over 95% of the Itanium server market share,[11] making the main operating system for Itanium HP-UX. Both Red Hat and Microsoft have announced plans to drop Itanium support in future versions of their operating systems due to lack of market interest;[12] [13] however, other Linux distributions including Debian are available for Itanium. On March 22, 2011, Oracle announced discontinuation of development on Itanium. Support for existing products will continue. On March 22, 2011 Intel reaffirmed its commitment to Itanium with multiple generations of chips in development and on schedule.[14]

Other markets:

Although remaining in development, and having attained a limited success in the niche of high-end computing, Intel had originally hoped to make Itanium a replacement for the original x86 architecture.[15]

AMD chose a different direction, designing the less radical x86-64, a 64-bit extension to the existing x86 architecture, which Microsoft then supported, forcing Intel to introduce the same extension in its own x86-based processors.[16] These designs can run existing 32-bit applications at native hardware speed, while offering support for 64-bit memory addressing and other enhancements to new applications.[11] This architecture has now become the predominant 64-bit architecture in the desktop and portable market; although some Itanium-based workstations were initially introduced by companies such as SGI, these are no longer available.

2. History:

Development: 1989–2000

In 1989, HP determined that reduced instruction set computer (RISC) architectures were approaching a processing limit at one instruction per cycle. HP researchers investigated a new architecture, later named explicitly parallel instruction computing (EPIC), that allows the processor to execute multiple instructions in each clock cycle.

Page 5: Itanium Microprocessorr.doc

EPIC implements a form of very long instruction word (VLIW) architecture, in which a single instruction word contains multiple instructions. With EPIC, the compiler determines in advance which instructions can be executed at the same time, so the microprocessor simply executes the instructions and does not need elaborate mechanisms to determine which instructions to execute in parallel.[19] The goal of this approach is twofold: to enable deeper inspection of the code at compile time to identify additional opportunities for parallel execution, and to simplify processor design and reduce energy consumption by eliminating the need for runtime scheduling circuitry.

HP believed that it was no longer cost-effective for individual enterprise systems companies such as itself to develop proprietary microprocessors, so it partnered with Intel in 1994 to develop the IA-64 architecture, derived from EPIC. Intel was willing to undertake a very large development effort on IA-64 in the expectation that the resulting microprocessor would be used by the majority of enterprise systems manufacturers. HP and Intel initiated a large joint development effort with a goal of delivering the first product, Merced, in 1998.[19]

During development, Intel, HP, and industry analysts predicted that IA-64 would dominate in servers, workstations, and high-end desktops, and eventually supplant RISC and complex instruction set computer (CISC) architectures for all general-purpose applications.[4][5] Compaq and Silicon Graphics decided to abandon further development of the Alpha and MIPS architectures respectively in favor of migrating to IA-64.[20]

Several groups developed operating systems for the architecture, including Microsoft Windows, Linux, and UNIX variants such as HP-UX, Solaris,[21] [22] [23] Tru64 UNIX ,[20] and Monterey/64 [24] (the last three were canceled before reaching the market). By 1997, it was apparent that the IA-64 architecture and the compiler were much more difficult to implement than originally thought, and the delivery of Merced began slipping.[25] Technical difficulties included the very high transistor counts needed to support the wide instruction words and the large caches.[citation needed] There were also structural problems within the project, as the two parts of the joint team used different methodologies and had slightly different priorities.[citation needed] Since Merced was the first EPIC processor, the development effort encountered more unanticipated problems than the team was accustomed to. In addition, the EPIC concept depends on compiler capabilities that had never been implemented before, so more research was needed.[citation needed]

Intel announced the official name of the processor, Itanium, on October 4, 1999.[26] Within hours, the name Itanic had been coined on a Usenet newsgroup, a reference to Titanic,

Itanium Server Sales forecast history.[17][18]

Page 6: Itanium Microprocessorr.doc

the "unsinkable" ocean liner that sank in 1912.[27] "Itanic" has since often been used by The Register,[28] and others,[29] [30] [31] to imply that the multibillion dollar investment in Itanium—and the early hype associated with it—would be followed by its relatively quick demise.

Itanium (Merced): 2001

By the time Itanium was released in June 2001, its performance was not superior to competing RISC and CISC processors.[32] Itanium competed at the low-end (primarily 4-CPU and smaller systems) with servers based on x86 processors, and at the high end with IBM's POWER architecture and Sun Microsystems' SPARC architecture. Intel repositioned Itanium to focus on high-end business and HPC computing, attempting to duplicate x86's successful "horizontal" market (i.e., single architecture, multiple systems vendors). The success of this initial processor version was limited to replacing PA-RISC in HP systems, Alpha in Compaq systems and MIPS in SGI systems, though IBM also delivered a supercomputer based on this processor.[33] POWER and SPARC remained strong, while the 32-bit x86 architecture continued to grow into the enterprise space, building on economies of scale fueled by its enormous installed base.

Only a few thousand systems using the original Merced Itanium processor were sold, due to relatively poor performance, high cost and limited software availability.[34] Recognizing that the lack of software could be a serious problem for the future, Intel made thousands of these early systems available to independent software vendors (ISVs) to stimulate development. HP and Intel brought the next-generation Itanium 2 processor to market a year later

Itanium 2: 2002–2010

The Itanium 2 processor was released in 2002, and was marketed for enterprise servers rather than for the whole gamut of high-end computing. The first Itanium 2, code-named McKinley, was jointly developed by HP and Intel. It relieved many of the performance problems of the original Itanium processor, which were mostly caused by an inefficient memory subsystem. McKinley contained 221 million transistors (of which 25 million were for logic), measured 19.5 mm by 21.6 mm (421 mm2) and was fabricated in a 180 nm, bulk CMOS process with six layers of aluminium metallization.[35]

In 2003, AMD released the Opteron, which implemented its own 64-bit architecture (x86-64). Opteron gained rapid acceptance in the enterprise server space because it provided an easy upgrade from x86. Intel responded by implementing x86-64 in its Xeon microprocessors in 2004.[20]

Intel released a new Itanium 2 family member, codenamed Madison, in 2003. Madison used a 130 nm process and was the basis of all new Itanium processors until Montecito was released in June 2006.

In March 2005, Intel announced that it was working on a new Itanium processor, codenamed Tukwila, to be released in 2007. Tukwila would have four processor cores and would

Page 7: Itanium Microprocessorr.doc

replace the Itanium bus with a new Common System Interface, which would also be used by a new Xeon processor.[36] Later that year, Intel revised Tukwila's delivery date to late 2008.[37]

In November 2005, the major Itanium server manufacturers joined with Intel and a number of software vendors to form the Itanium Solutions Alliance to promote the architecture and accelerate software porting.[38] The Alliance announced that its members would invest $10 billion in Itanium solutions by the end of the decade.[39]

In 2006, Intel delivered Montecito (marketed as the Itanium 2 9000 series), a dual-core processor that roughly doubled performance and decreased energy consumption by about 20 percent.[40]

Intel released the Itanium 2 9100 series, codenamed Montvale, in November 2007.[41] In May 2009 the schedule for Tukwila, its follow-on, was revised again, with release to OEMs planned for the first quarter of 2010.[2]

Itanium 9300 (Tukwila): 2010

The Itanium 9300 series processor, codenamed Tukwila, was released on 8 February 2010 with greater performance and memory capacity.[3]

The device uses a 65 nm process, includes two to four cores, up to 24 MB on-die caches, Hyper-Threading technology and integrated memory controllers. It implements double-device data correction, which helps to fix memory errors. Tukwila also implements Intel QuickPath Interconnect (QPI) to replace the Itanium bus-based architecture. It has a peak interprocessor bandwidth of 96 GB/s and a peak memory bandwidth of 34 GB/s. With QuickPath, the processor has integrated memory controllers and interfaces the memory directly, using QPI interfaces to directly connect to other processors and I/O hubs. QuickPath is also used on Intel processors using the Nehalem microarchitecture, making it probable that Tukwila and Nehalem will be able to use the same chipsets.[42] Tukwila incorporates four memory controllers, each of which supports multiple DDR3 DIMMs via a separate memory controller,[43] much like the Nehalem-based Xeon processor code-named Beckton.[44]

3. Market share:

In comparison with its Xeon family of server processors, Itanium has never been a high-volume product for Intel. Intel does not release production numbers. One industry analyst estimated that the production rate was 200,000 processors per year in 2007.[45]

Please note that the following numbers are based on servers and not on processors. It is not reported how many processors or multi-core processors were built into these servers, and neither is it clear whether clustered servers were counted as a single server or not. Therefore there seems to be no valid method for reasonably determining how many processors are represented by that number of systems.

Page 8: Itanium Microprocessorr.doc

According to Gartner Inc., the total number of Itanium servers sold by all vendors in 2007 was about 55,000. This compares with 417,000 RISC servers (spread across all RISC vendors) and 8.4 million x86 servers. From 2001 through 2007, IDC reports that a total of 184,000 Itanium-based systems have been sold. For the combined POWER/SPARC/Itanium systems market, IDC reports that POWER captured 42% of revenue and SPARC captured 32%, while Itanium-based system revenue reached 26% in the second quarter of 2008.[46] According to an IDC analyst, in 2007 HP accounted for perhaps 80% of Itanium systems revenue.[47] According to Gartner, in 2008 HP accounted for 95% of Itanium sales.[8] HP's Itanium system sales were at an annual rate of $4.4Bn at the end of 2008, and declined to $3.5Bn by the end of 2009,[48] compared to a 35% decline in UNIX system revenue for Sun and an 11% drop for IBM, with an x86-64 server revenue increase of 14% during this period.

4. Architecture:

The Intel Itanium architecture

Intel has extensively documented the Itanium instruction set and microarchitecture,[49] and the technical press has provided overviews.[4] [25] The architecture has been renamed several times during its history. HP originally called it PA-WideWord. Intel later called it IA-64, then Itanium Processor Architecture (IPA),[50] before settling on Intel Itanium Architecture, but it is still widely referred to as IA-64.

Page 9: Itanium Microprocessorr.doc

It is a 64-bit register-rich explicitly parallel architecture. The base data word is 64 bits, byte-addressable. The logical address space is 264 bytes. The architecture implements predication, speculation, and branch prediction. It uses a hardware register renaming mechanism rather than simple register windowing for parameter passing. The same mechanism is also used to permit parallel execution of loops. Speculation, prediction, predication, and renaming are under control of the compiler: each instruction word includes extra bits for this. This approach is the distinguishing characteristic of the architecture.

The architecture implements 128 integer registers, 128 floating point registers, 64 one-bit predicates, and eight branch registers. The floating point registers are 82 bits long to preserve precision for intermediate results.

Instruction execution:

Each 128-bit instruction word contains three instructions, and the fetch mechanism can read up to two instruction words per clock from the L1 cache into the pipeline. When the compiler can take maximum advantage of this, the processor can execute six instructions per clock cycle. The processor has thirty functional execution units in eleven groups. Each unit can execute a particular subset of the instruction set, and each unit executes at a rate of one instruction per cycle unless execution stalls waiting for data. While not all units in a group execute identical subsets of the instruction set, common instructions can be executed in multiple units.

The execution unit groups include:

Six general-purpose ALUs, two integer units, one shift unit Four data cache units Six multimedia units, two parallel shift units, one parallel multiply, one population count Two 82-bit floating-point multiply–accumulate units, two SIMD floating-point multiply–

accumulate units (two 32-bit operations each)[51] Three branch units

The compiler can often group instructions into sets of six that can execute at the same time. Since the floating-point units implement a multiply–accumulate operation, a single floating point instruction can perform the work of two instructions when the application requires a multiply followed by an add: this is very common in scientific processing. When it occurs, the processor can execute four FLOPs per cycle. For example, the 800 MHz Itanium had a theoretical rating of 3.2 GFLOPS and the fastest Itanium 2, at 1.67 GHz, was rated at 6.67 GFLOPS.

Memory architecture:

From 2002 to 2006, Itanium 2 processors shared a common cache hierarchy. They had 16 kB of Level 1 instruction cache and 16 kB of Level 1 data cache. The L2 cache was unified (both instruction and data) and is 256 kB. The Level 3 cache was also unified and varied in size

Page 10: Itanium Microprocessorr.doc

from 1.5 MB to 24 MB. The 256 kB L2 cache contains sufficient logic to handle semaphore operations without disturbing the main arithmetic logic unit (ALU).

Main memory is accessed through a bus to an off-chip chipset. The Itanium 2 bus was initially called the McKinley bus, but is now usually referred to as the Itanium bus. The speed of the bus has increased steadily with new processor releases. The bus transfers 2×128 bits per clock cycle, so the 200 MHz McKinley bus transferred 6.4 GB/s, and the 533 MHz Montecito bus transfers 17.056 GB/s [52]

Architectural changes:

Itanium processors released prior to 2006 had hardware support for the IA-32 architecture to permit support for legacy server applications, but performance for IA-32 code was much worse than for native code and also worse than the performance of contemporaneous x86 processors. In 2005, Intel developed the IA-32 Execution Layer (IA-32 EL), a software emulator that provides better performance. With Montecito, Intel therefore eliminated hardware support for IA-32 code.

In 2006, with the release of Montecito, Intel made a number of enhancements to the basic processor architecture including [53] :

Hardware multithreading: Each processor core maintains context for two threads of execution. When one thread stalls during memory access, the other thread can execute. Intel calls this "coarse multithreading" to distinguish it from the "hyper-threading technology" Intel integrated into some x86 and x86-64 microprocessors. Coarse multithreading is well matched to the Intel Itanium Architecture and results in an appreciable performance gain.

Hardware support for virtualization: Intel added Intel Virtualization Technology (Intel VT-i), which provides hardware assists for core virtualization functions. Virtualization allows a software "hypervisor" to run multiple operating system instances on the processor concurrently.

Cache enhancements: Montecito added a split L2 cache, which included a dedicated 1 MB L2 cache for instructions. The original 256 kB L2 cache was converted to a dedicated data cache. Montecito also included up to 12 MB of on-die L3 cache.

5. Hardware support:

Systems :

As of 2009[update] several manufacturers offer Itanium systems, including HP, SGI, NEC, Fujitsu, Hitachi, and Groupe Bull. In addition, Intel offers a chassis that can be used by system integrators to build Itanium systems.[54] HP, the only one of the industry's top four server manufacturers to offer Itanium-based systems today, manufactures at least 80% of all Itanium

Page 11: Itanium Microprocessorr.doc

systems. HP sold 7200 systems in the first quarter of 2006.[55] The bulk of systems sold are enterprise servers and machines for large-scale technical computing, with an average selling price per system in excess of US$200,000. A typical system uses eight or more Itanium processors.

Chipsets:

The Itanium bus interfaces to the rest of the system via a chipset. Enterprise server manufacturers differentiate their systems by designing and developing chipsets that interface the processor to memory, interconnections, and peripheral controllers. The chipset is the heart of the system-level architecture for each system design. Development of a chipset costs tens of millions of dollars and represents a major commitment to the use of the Itanium. IBM created a chipset in 2003, and Intel in 2002, but neither of them has developed chipsets to support newer technologies such as DDR2 or PCI Express.[56] Currently, modern chipsets for Itanium supporting such technologies are manufactured by HP, Fujitsu, SGI, NEC, and Hitachi.

The "Tukwila" Itanium processor model has been designed to share a common chipset with the Intel Xeon processor EX (Intel’s Xeon processor designed for four processor and larger servers). The goal is to streamline system development and reduce costs for server OEMs, many of whom develop both Itanium- and Xeon-based servers.

6. Software support:

As of 2010[update], Itanium is supported by the following operating systems:

Windows Server 2003 and Windows Server 2008 HP-UX 11i OpenVMS I64 NonStop OS multiple GNU/Linux distributions (including Debian, Ubuntu, Gentoo, Red Hat and Novell SuSE) FreeBSD /ia64[57]

However, Microsoft announced in 2010 that Windows Server 2008 R2 will be the last version of Windows Server to support the Itanium, and that it would also discontinue development of the Itanium versions of Visual Studio and SQL Server.[12] Likewise, Red Hat Enterprise Linux 5 was the last Itanium edition of Red Hat Enterprise Linux[13] and Canonical's Ubuntu 10.04 LTS was the last supported Ubuntu release on Itanium.[58] HP will not be supporting or certifying Linux on Itanium 9300 (Tukwila) servers.[59]

Oracle Corporation announced in March 2011 that it would drop development of application software for Itanium platforms, with the explanation that "Intel management made it clear that their strategic focus is on their x86 microprocessor and that Itanium was nearing the end of its life."[60]

HP sells a virtualization technology for Itanium called Integrity Virtual Machines.

Page 12: Itanium Microprocessorr.doc

To allow more software to run on the Itanium, Intel supported the development of compilers optimized for the platform, especially its own suite of compilers.[61] [62] Starting in November 2010, with the introduction of new product suites, the Intel Itanium Compilers were no longer bundled with the Intel x86 compilers in a single product. Intel offers Itanium tools and Intel x86 tools, including compilers, independently in different product bundles. GCC,[63][64] Open64 and MS Visual Studio 2005 (and later) [65] are also able to produce machine code for Itanium. According to the Itanium Solutions Alliance over 13,000 applications were available for Itanium based systems in early 2008,[66] though Sun has contested Itanium application counts in the past.[67] The ISA also supports Gelato, an Itanium HPC user group and developer community that ports and supports open source software for Itanium.[68]

Emulation:

Emulation is a technique that allows a computer to execute binary code that was compiled for a different type of computer. Before IBM's acquisition of QuickTransit in 2009, application binary software for IRIX/MIPS and Solaris/SPARC could run via type of emulation called "dynamic binary translation" on Linux/Itanium. Similarly, HP implemented a method to execute PA-RISC/HP-UX on the Itanium/HP-UX via emulation, to simplify migration of its PA-RISC customers to the radically different Itanium instruction set. Itanium processors can also run the mainframe environment GCOS from Groupe Bull and several x86 operating systems via Instruction Set Simulators.

7. Competition:

Itanium is aimed at the enterprise server and high-performance computing (HPC) markets. Other enterprise- and HPC-focused processor lines include Oracle Corporation's SPARC T4, Fujitsu's SPARC64 VII+ and IBM's POWER7. Measured by quantity sold, Itanium's most serious competition comes from x86-64 processors including Intel's own Xeon line and AMD's Opteron line. As of 2009[update], most servers were being shipped with x86-64 processors.[48]

In 2005, Itanium systems accounted for about 14% of HPC systems revenue, but the percentage has declined as the industry shifts to x86-64 clusters for this application.[69]

An October 2008 paper by Gartner on the Tukwila processor stated that "...the future roadmap for Itanium looks as strong as that of any RISC peer like Power or SPARC."[70]

8. Supercomputers and high-performance computing:

Page 13: Itanium Microprocessorr.doc

Percentage of Top500 systems

An Itanium-based computer first appeared on list of the TOP500 supercomputers in November 2001.[33] The best position ever achieved by an Itanium 2 based system in the list was #2, achieved in June 2004, when Thunder (LLNL) entered the list with an Rmax of 19.94 Teraflops. In November 2004, Columbia entered the list at #2 with 51.8 Teraflops, and there was at least one Itanium-based computer in the top 10 from then until June 2007. The peak number of Itanium-based machines on the list occurred in the November 2004 list, at 84 systems (16.8%); by June 2010, this had dropped to five systems (1%).[71]

9. Processors:

Released processors:

The Itanium processors show a progression in capability. Merced was a proof of concept. McKinley dramatically improved the memory hierarchy and allowed Itanium to become reasonably competitive. Madison, with the shift to a 130 nm process, allowed for enough cache space to overcome the major performance bottlenecks. Montecito, with a 90 nm process, allowed for a dual-core implementation and a major improvement in performance per watt. Montvale added three new features: core-level lockstep, demand-based switching and front-side bus frequency of up to 667 MHz.

Page 14: Itanium Microprocessorr.doc

Future processors:

As of December 2009, some information and speculations on future Itanium processors and roadmaps have been released.

Poulson:

Poulson will be the follow-on processor to Tukwila and is planned for release in 2012.[73] According to Intel, it will skip the 45 nm process technology and use a 32 nm process technology; it will feature eight cores, have a 12-wide issue architecture, multithreading enhancements, and new instructions to take advantage of parallelism, especially in virtualization.[42] [74] [75] Poulson has the world's biggest L3 cache size — 54 MB (32MB for Tukwila). L2 cache size is 6 MB, 768 kB per core[citation needed]. Die size is 544 mm², less than its predecessor Tukwila (698.75 mm²).[76]

At ISSCC 2011, Intel presented a paper called, "A 32nm 3.1 Billion Transistor 12-Wide-Issue Itanium Processor for Mission Critical Servers."[77] Given Intel's history of disclosing details about Itanium microprocessors at ISSCC, this paper most likely refers to Poulson. Analyst David Kanter speculates that Poulson will use a new microarchitecture, with a more advanced form of multi-threading that uses as many as four threads, to improve performance for single threaded and multi-threaded workloads.[78]

Kittson

Page 15: Itanium Microprocessorr.doc

Kittson will follow Poulson in 2014. Few details are known other than the existence of the codename and the binary and socket compatibility between Poulson, Kittson and Tukwila.[42]