AMD Fusion: How It Started, Where It’s Going, And What It Means By William Van Winkle, Tom's...


AMD Fusion: How It Started, Where It's Going, And What It Means

http://www.tomshardware.com/reviews/fusion-hsa-opencl-history,3262.html

12:00 AM - August 14, 2012 by William Van Winkle

Source: Tom's Hardware US

Table of Contents

1 - The Story Of Fusion Begins

2 - Looking For The Other Half

3 - Merger And Mayhem

4 - Scaling The Brick Wall

5 - Up From The Ashes

6 - Fusion Ignites

7 - Heterogeneous Roots

8 - OpenCL And HSA

9 - Focus On The Programmer

10 - HSA's Big Picture

11 - More About The Big Picture

12 - HSA Tomorrow

The Story Of Fusion Begins

Nothing is more difficult than the art of maneuver. What is difficult about maneuver is to make the devious route the most direct and to turn misfortune to advantage.

Sun Tzu, The Art of War

When I interviewed Dave Orton, then the president of ATI Technologies, in 2002, one of the first things he told me was, "It's always what's possible in the business that keeps people going." No more prophetic words could describe the coming merger between his company and CPU manufacturer AMD. The big question, of course, isn't what is possible or even whether the possible will become reality. The real question is whether the possible will become reality soon enough.

Orton spent most of the '90s with Silicon Graphics and, in 1999, when almost anything in technology seemed possible, he left SGI to join a little core logic startup called ArtX. The little company won the development contract for Nintendo's GameCube, which went on to sell a few units (somewhere north of 20 million). That fall, ArtX showed its first integrated chipset at Comdex, and immediately the company flashed on the industry's radar as a prime acquisition target.

Ultimately, ATI was able to put ArtX in its pocket and made Orton its president and COO. Then the tech bubble burst, driver problems abounded, schedules slipped, and, for a while, it seemed that ATI could do nothing right.

Part of the road back to glory hinged on Orton figuring out how to complete the meshing of these two development teams. He was the one who figured out how to get ATI on a 12-month cycle for new architectures and six- to nine-month cycles for iterative design revisions. Product teams were given more control and responsibility. And slowly, over perhaps 18 months, with Nvidia kicking it in the ribs at every turn, ATI managed to get back on its feet. The company rediscovered how to execute.

"Just step back and understand your roots," said Orton. "Constantly build. You can never be satisfied with where you are. You've got to be satisfied with where you can be and then drive to that."

With ATI back on top of its game, Orton knew it was time to keep driving, but to where? I detected no glimmer of the future in our 2002 discussion. ATI continued to excel at integrating graphics into northbridge chips, and Intel, which still viewed integrated graphics as only needing to be good enough for business apps, was still more of a partner than a competitor.

However, in a keenly prescient moment, Orton told me, "I guess if I could change one thing about computing, I'd like it to be more open to create a broader range of innovation. I recognize the advantages of standards. Standards provide opportunity."

At two different points in our conversation, Orton lamented his daily Silicon Valley commute, even saying that if he could invent anything, no matter how fantastic, it would be a Star Trek-esque transporter. So perhaps we can take him at his word when, in 2007, he left his post as executive vice president of AMD in order to spend more time with his family. But this is jumping ahead. First, Orton's drive from Toronto was about to take a hard southern turn, straight down to Texas.

Looking For The Other Half

Right around the time I was speaking with Dave Orton, AMD founder Jerry Sanders was stepping down to begin his well-earned retirement. Since just after its founding in 1969, Sanders had led AMD through more than three decades of highs and lows, cementing its place as the only major rival to Intel in the global CPU market. Sanders hired Hector Ruiz away from Motorola Semiconductor in 2000 to become his right-hand man (as president and COO) and next in line for the CEO's chair. Two years later, Sanders took his final bow and Ruiz seized AMD's steering wheel.

Meanwhile, Dirk Meyer, a former processor engineer for DEC and Intel, was rising through AMD's ranks. Meyer led the team in 1998 and '99 that produced the Athlon (K7), a design so successful (and based on a bus derived from DEC) that AMD crushed the then-leading Pentium III and beat Intel to the 1 GHz threshold.

In 2003, AMD followed up its Athlon success with the K8 (Hammer) architecture, which almost immediately trounced everything Intel had going in the server market. Intel's NetBurst, far from realizing its 10 GHz aspirations, turned out to be a blistering disappointment as Intel slowly began to realize that efficiency trumps brute force. Intel didn't have a follow-up ready until 2006, when it released its Core architecture, which in turn presaged the groundbreaking Nehalem design of 2008.

We all know that the Fates are fickle in technology. Everything ebbs and flows. If Athlon lit the fuse under AMD, Hammer took the company into orbit. By 2005, Ruiz sensed it was time to try to outflank his bigger competitor. That December, Ruiz elevated Dirk Meyer to chief operating officer of AMD's microprocessor business, effectively making Meyer number two in the company. By that time, the two of them understood that having graphics on the northbridge was good...but not good enough.

According to Forbes, AMD approached Nvidia about merging. As we go forward in this story, consider the element of personalities and corporate cultures, and ask yourself if AMD could have survived such a mix. The answer is likely found in Nvidia's response: CEO Jen-Hsun Huang was willing to entertain the idea, provided that he would be the chief executive of the resulting organization. Ruiz, at that point feeling on top of the microprocessor world, understandably went looking elsewhere.

In July of 2006, AMD announced that it would buy ATI for the princely, almost hyperbolic, sum of $5.4 billion at a time when AMD was worth somewhere shy of $9 billion.

According to Joe Macri, who was then ATI's director of engineering and another Silicon Graphics alumnus, it was a "grand vision" spawned from Dave Orton and Dirk Meyer.

"There was a lot of risk on AMD's part," says Macri, now AMD's product chief technology officer. "But there was a lot of courage between Dirk and Dave. They could see a future of the need to converge the CPU and GPU in a way that would allow it to be treated as a unified compute model. That initial vision sounded simple. It started with a big business deal that was quite the effort to pull off. And then they brought us together as leaders and said, 'Do it!'"

"Quite the effort" is an understatement of epic proportions.

Merger And Mayhem

In the summer of 2006, most people didn't grasp the strategy of what would eventually be called Fusion, the melding of CPU and GPU on a common die. Like most, Ars Technica at the time viewed the merger as a way for AMD to bolster its portfolio breadth, freeing it from reliance on third parties for chipsets and expanding the company into areas such as ultramobile graphics and digital TV.

For their part, AMD and ATI remained thoroughly mute. While some of the silence may have been mandated, owing to the lengthy legal process of two large companies melding, a more practical reason might have been the safeguarding of existing sales.

"At any point in time, you're married to a lot of partners," says AMD's Macri. "I use the word 'married' because they're very deep relationships, both in business and, at the end of the day, on a personal level. Business is about personal relationships. We make commitments to each other. We might embody them in contracts, but part of that is we're making a big personal commitment. And you are only as good as your word. If you throw the grand vision out there without the time that it takes to move all your partners to the vision, you lose all your partners. AMD at the time had Nvidia as a very strong chipset and graphics partner. You can't flip those relationships on and off like a switch. So the guys were somewhat limited in their ability to explain to the world this grand vision and how it would all play out."

In early 2006, AMD's stock price hovered just above $40 per share. One year later, at a time when the market was nearing its pre-recession peak, AMD had tumbled to under $15. Two years later, it was bouncing on a $2 floor. A five-year comparison between AMD and Intel shows the story from pre- to post-recession. While Intel looks relatively flat, the rise and fall of AMD is as exhilarating as it is heartbreaking.

Economic downturn aside, what happened? Heading into late 2006, AMD entered into the first of what would become seven consecutive quarterly losses preceding Hector Ruiz's resignation. Intel's Core architecture was out and ramping. Nvidia's GeForce 7-series, launched in June 2005 to considerable fanfare, gave way to the even better 8-series in November 2006. Meanwhile, the delay-plagued ATI Radeon X1000 series arrived in 2005; there was no major 2006 update. The follow-up Radeon HD 2000 didn't launch until April 2007 (in Tunisia, of all places), and even though AMD/ATI's performance was starting to edge back up, its momentum in the market had significantly slipped.

And those were just the visible problems. Behind the scenes, in the back rooms where the two companies were trying to figure out how to coexist and blend, matters were even more muddled.

Scaling The Brick Wall

AMD's core Fusion team found itself with two fundamental problems, one technical and the other philosophical, and both had to be solved before anything could move forward.

"On a pure technology and transistor side, we had a conundrum on our hands," says AMD's Macri. "What makes CPUs go really fast ends up burning a lot of power on a GPU. What makes GPUs go really fast will actually slow a CPU down incredibly. So the first thing we ran into was just getting them to live on the die together. We had the high-speed transistor combined with the very low-resistance metal stack that's optimal for CPUs versus the GPU's more moderate-speed transistor optimized around very dense metalization. If you look at the GPU's metal stack, it looks like the letter T. It looks like the letter Z in a CPU. One's low-resistance, one's lower density, and so higher resistance. We knew we had to get these guys to live on the same die where they both perform very well, because no one's going to give us any accolades if the CPU drops off or the GPU power goes up or performance falls. We needed to do both well. We very quickly discovered that wall."

Imagine the pressure on that team. With billions of dollars and the company's future at stake, the group eventually realized that a hybrid solution couldn't exist on the current 45 nm process. Ultimately, 45 nm was too optimized for CPU. Understanding that, the question then became how to tune 32 nm silicon-on-insulator (SOI) so that it would effectively play both sides of the fence. Of course, 32 nm didn't exist outside of the lab yet, and much of what finally defined the 32 nm node for AMD grew from the Fusion pursuit.

Unfortunately, until the 32 nm challenge was solved, Fusion was at a standstill, and it took a year of work to reach that solution. Only then could design work begin.

Meanwhile, the Fusion team was also fighting a philosophical battle. The transistor and process struggle was massive, but at least the team knew where it needed to go and what the finish line looked like. Even with the transistor challenge figured out, the question still remained of how best to architect an APU.

"One view was like, the GPU should be mostly used for visualization. We should really keep the compute on the CPU side," says Macri. "A second view said, no, we've gotta split the views across the two halves. We've got this beautiful compute engine on the GPU side; we need to take advantage of it. There were two camps. One said things should be more tightly coupled between the CPU and GPU. Another camp said things should be more loosely coupled. So we had to have this philosophical debate of deciding what we should treat as a compute engine. Through a lot of modeling, we proved that there was an enormous advantage to a vector engine when you have inherent parallelism in your code."

This might have seemed obvious from ATI's prior work with Stream, but the question was how much work to throw at the GPU. Despite being highly parallel, GPUs remain optimized for visualization. They can process traditional parallel compute tasks, but this introduces more overhead, and more overhead means more impact on visualization. With an unlimited transistor budget on the die, one could just keep throwing resources at the problem. But, of course, there are only a few hundred million transistors to go around.

"Think of all the applications of the world as a bathtub," says Macri. "If you look at the left edge of the bathtub, we call those applications the least parallel, the ones with the least amount of inherent parallelism. A good example of that would be pointer chasing, right? You need a reference. You need to go grab that memory to figure out the next memory you gotta go grab. No parallelism there at all. The only way to parallelize is to start guessing: prediction. Then, if you go to the right edge of the bathtub, matrix multiply is a great example of a super-parallel piece of code. Everything is disambiguated very nicely, read and write stream is all separate, it's just beautiful. You can parallelize that out the wazoo. For those applications, it's very low overhead to go and map that into a GPU. To do the left side well, though, means building a low-latency memory system, and that would load all kinds of problems into a GPU that really wants a high-bandwidth, throughput-optimized memory system. So we said, 'How do we shrink the edges of the bathtub?' Because the closer we could bring those edges, the more programs we could address in a very efficient way."
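To make the two edges of Macri's bathtub concrete, here is a minimal Python sketch (the function names and toy data are hypothetical, and none of this is AMD code) contrasting a pointer chase, where each step needs the result of the previous load, with a matrix multiply, where every output cell can be computed independently. The thread pool only stands in for parallel hardware lanes to show that the cells are independent tasks; under CPython's global interpreter lock it will not actually speed anything up.

```python
from concurrent.futures import ThreadPoolExecutor


def chase_pointers(next_index, start, steps):
    """Left edge: each load depends on the previous one, so steps run in order."""
    i = start
    visited = [i]
    for _ in range(steps):
        # The next address is unknown until this load completes.
        i = next_index[i]
        visited.append(i)
    return visited


def matmul_cell(a, b, row, col):
    """Right edge: one output cell reads only row `row` of a and column `col` of b."""
    return sum(a[row][k] * b[k][col] for k in range(len(b)))


def matmul_parallel(a, b):
    """Every (row, col) cell is independent, so all can be dispatched at once."""
    rows, cols = len(a), len(b[0])
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    # Threads are a stand-in for parallel lanes; map() returns results in order.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda rc: matmul_cell(a, b, *rc), cells))
    return [results[r * cols:(r + 1) * cols] for r in range(rows)]


print(chase_pointers([2, 0, 3, 1], start=0, steps=4))  # [0, 2, 3, 1, 0]
print(matmul_parallel([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

The pointer chase has to visit 0, 2, 3, 1, 0 one hop at a time, while all four cells of the 2x2 product are submitted as separate, independent tasks.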

A big part of the philosophical debate boiled down to how much to shrink those bathtub edges while preserving all of AMD's existing visualization performance. Naturally, though, while all of this debate was happening, AMD was getting hammered in the market.

Up From The Ashes

No one on the outside could see the engineers frantically fighting for answers and a path forward. What they saw were stony-faced executives and delays: lots and lots of delays in multiple segments, from graphics to CPUs to chipsets. In July 2007, ten months after the merger, AMD executive vice president Dave Orton, arguably one of the most influential minds behind today's hybrid processor trend, resigned. In September, chief sales and marketing officer Henri Richard followed Orton out the door.

Ultimately, the $5.4 billion decision to buy ATI rested on Hector Ruiz's shoulders, and onlookers couldn't help but associate this purchase with the catastrophic plummet in AMD's financials.

Sources at TheStreet.com indicated that Ruiz might also be resigning, although his contract ran until April 26, 2008. As it turned out, Ruiz survived another three months, finally leaving in July 2008.

Ruiz went on to become the first CEO of GlobalFoundries in March 2009. More significantly, GlobalFoundries started out as AMD's spun-off manufacturing arm. With its various setbacks, AMD could no longer afford to maintain so much manufacturing capacity all for itself. This marks the latest and hopefully last major divestiture of the company's assets, capping 2008's jettisoning of the digital TV business and (this one really had to hurt) early 2009's $65 million sale to Qualcomm of the old ATI handset division and all of the mobile graphics and multimedia intellectual property that went with it. By that point, AMD had already written off $3.2 billion in bottom-line value.

Of course, as with most major downturns in life, so long as you don't stop moving, you're not dead, and the odds are that things will improve. Having shed much of its former self, AMD is now left as a predominantly R&D- and IP-based company. It's smaller, lighter, and more flexible. But is that enough? AMD's board didn't seem to think so. In January 2011, the last of the old directorate, Dirk Meyer, was pushed aside, according to the press release, to "accelerate the company's ability" to "have significant growth, establish market leadership and generate superior financial returns."

Some argued that this was undeserved for a guy who apparently brought AMD back from the brink of ruin and saw the first fruits of his Fusion "grand vision" finally start to reach the market. But others questioned, at the time of Meyer's rise to the top post, whether he had the sales chops to make the big deals that AMD so desperately needed.

With that in mind, one might examine his replacement, Rory Read. Read spent 23 years at IBM, where he held a "broad range of management positions." Following IBM's PC division, Read then moved to Lenovo in 2006 and eventually became its president and COO in 2009. During his time there, Lenovo became the third-largest PC vendor in the world. Read was appointed president and CEO of AMD on August 25, 2011. Not even a year into the job, it's still too early to render a verdict on Read. But if the AMD board was hoping for a guy who would yield big deals and new directions, it appears they got exactly what they wanted.

Fusion Ignites

Throughout this shuffling of top office name plates, AMD engineers continued their dogged pursuit of Fusion. What began as a team of four people had grown to encompass the top three layers of engineers from both the CPU and GPU sides of the company. The original four were former ATI vet Joe Macri, the recently deceased AMD fellow Chuck Moore, then-graphics CTO Eric Demers (now at Qualcomm), and AMD fellow Phil Rogers, who was the group's technical lead. Macri describes the early phase of their collaboration as "the funnest five months I've ever had." The first 90% of the Fusion effort was an executive engineer's dream. "The last 10% was excruciating pain in some ways."

"That effort resulted in a couple of things," adds Macri. "One, we ended up with the best architecture out there that's unifying scalar and vector compute. It blows away what [Intel] did with Larrabee. The Nvidia guys have only attacked part of the problem, because they only have the IP portfolio to attack part of the problem. What they've done isn't bad. It's actually good for having one hand tied behind their back. But with [Fusion], we had the full IP capability, and it truly is the first unified architecture, top to bottom."

Technical architecture aside, AMD developed something else: a cohesive, merged company. Out of the pressure and pain of Fusion development emerged a different company than either of the two that had gone into it. The old days of talking about "red" and "green" teams were finally gone.

"We were similar in that we were both in a major fist fight with one guy," says Macri. "I think ATI had the fairer fight in that we were up against a similarly-sized company [Nvidia]. But this had a lot of impact on design and implementation cycles. Now, the guys at AMD had won a number of times, but it was more like David and Goliath [Intel]. It was like, 'Wow, we actually beat Goliath!' With ATI, we'd been in a fist fight for many years with Nvidia, and we won as many as we lost. So we had a different attitude about winning. ATI needed to learn that there were some Goliaths out there, and you have to be pretty damned smart to beat a Goliath. AMD learned that it actually was a Goliath in certain cases. It could be an equal. Now merge that with some faster time-to-market strategies. Today, our product cycle time is faster than ever across the board. So the melding gave both sides a better ability to attack not just their traditional competitors but also new competitors coming up. And those new guys coming up aren't big. They're all kind of small. They're all AMD-sized. I don't think AMD ever would have had the right attitude on how to beat someone their own size without inheriting ATI. And I don't think ATI could have figured out how to beat someone several times bigger without AMD's attitude of asking how you aim where the other guy's not aiming."

Just as the two organizations were completing their cultural fusion, the Fusion effort itself was nearing the end of its first stage. AMD showed its first Fusion APUs to the world at CES in early 2011, and product started shipping shortly thereafter. In the consumer space, the Llano platforms, based on the 32 nm K10 core, arrived in the A4, A6, A8, and E2 APU series. Another announcement, made in January 2012, was that the Fusion System Architecture would henceforth be known as the Heterogeneous System Architecture (HSA). According to AMD, the company wanted to turn HSA into an open industry standard, and a name that didn't reflect a long-standing AMD-centric effort would help illustrate that fact. This would prove to be the first hint of AMD's even larger aspirations.

Heterogeneous Roots

In the end, did Fusion matter? Quite simply, it changed the direction of modern mainstream computing. All parties agree that discrete graphics will remain firmly entrenched at the high end. But according to IDC, by the end of 2011, nearly three out of every four PC processors sold were integrated, hybrid processors, or APUs, as AMD calls them. AMD adds that half of all processors sold across all computing device segments, including smartphones, are now what it refers to as APUs.

Ubiquitous as that might sound, though, the APU is not the endgame; it's only the beginning. Simply having two different cores on the same die may improve latency, but the aim of Fusion was always to leverage heterogeneous computing in the most effective ways possible. Having discrete CPUs and GPUs each chew on the application tasks best suited to them is still heterogeneous computing. Having those two cores present on the same die is merely an expression of heterogeneous computing more suited to certain system goals, such as delivering high performance in a lower power envelope. Of course, this assumes that programs are being written to leverage a heterogeneous compute model, and most are not.

Ageia was one of the first companies in the PC world to address this problem. In 2004, Ageia, then a fledgling semiconductor company, purchased a physics middleware company called NovodeX, and thus was born the short-lived field of physics processing units (PPUs), available on third-party standalone cards. For games coded to leverage Ageia's PhysX engine, these cards could radically improve physics simulation and particle motion. PhysX caught on with many developers, and Nvidia bought Ageia in 2008. Over time, Nvidia phased out the PPU side of the technology and supported PhysX on any CUDA-ready 8-series or newer GeForce card.

Ageia's fame drew the attention of Dave Orton and others at ATI. Even before the AMD merger, ATI had been working to enable general-purpose GPU computing (GPGPU) in its Radeon line. In 2006, the R580 GPU became the first ATI product to support GPGPU, which the company soon branded as Stream. The confusing nomenclature of Stream, FireStream, stream processors, and so on gives some indication of the initial lack of cohesion in this effort. Stanford's Folding@home distributed computing project became ATI's first showcase for just how mind-blowing the GPGPU performance advantage could be.

The trouble was that Stream never caught on. Nvidia seized its 2006/2007 execution upswing, capitalized on the confusion reigning at AMD at that time, and solidified CUDA as the go-to technology for GPGPU computing. But this is a bit like describing a goldfish as the biggest creature in the tank when all of the other fish are guppies. Despite a lot of notoriety in gaming and academic circles, GPGPU development remained very niche and far from mainstream awareness.

"AMD has been promoting GPU compute for a really long time," says Manju Hegde, former CEO of Ageia and now corporate vice president of heterogeneous applications and developer solutions at AMD. "But eight years ago, it wasn't right. Five years ago, it wasn't right. Now, with the explosion of the low-power market, smartphones and tablets, it's right. And for developers to create the kinds of experiences that normal PC users expect, they have to go to GPU computing, but it has to be based on something easy like HSA."

OpenCL And HSA

The slow take-off of GPGPU computing had less to do with niche markets than it did with problematic programming. Simply put, the world was built to code compute for CPUs, and shifting some of that code over to GPUs was anything but straightforward.

"Various specialized hardware designs, such as Cell, GPGPUs, and MIC, have gained traction as alternative hardware designs capable of delivering higher flop rates than conventional designs," notes IEEE author D.M. Kunzman in the abstract for the paper "Programming Heterogeneous Systems." "However, a drawback of these accelerators is that they simultaneously increase programmer burden in terms of code complexity and decrease portability by requiring hardware-specific code to be interleaved throughout application code...Further, balancing the application workload across the cores becomes problematic, especially if a given computation must be split across a mixture of core types with variable performance characteristics."
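The balancing problem in that last sentence can be illustrated with a small, hypothetical Python sketch: split a batch of work items across devices in proportion to assumed relative throughput figures, so each device should finish at about the same time. The device names and throughput weights are invented for illustration, and a static split like this only works if those figures are known and stable, which is exactly what a mix of core types with variable performance makes hard.

```python
def split_work(n_items, throughputs):
    """Divide n_items across devices in proportion to their assumed throughput.

    throughputs maps a (hypothetical) device name to a positive integer
    relative speed, e.g. items processed per millisecond.
    """
    total = sum(throughputs.values())
    # Floor of each device's proportional share of the work.
    shares = {name: int(n_items * speed // total) for name, speed in throughputs.items()}
    # Flooring can leave a few items unassigned; give one each to the fastest devices.
    leftover = n_items - sum(shares.values())
    fastest_first = sorted(throughputs, key=throughputs.get, reverse=True)
    for name in fastest_first[:leftover]:
        shares[name] += 1
    return shares


# Hypothetical example: a GPU assumed to be three times as fast as the CPU.
print(split_work(100, {"cpu": 1, "gpu": 3}))  # {'cpu': 25, 'gpu': 75}
```

If the GPU's real speed drifts from the assumed 3x, one device sits idle while the other finishes, which is why real schedulers often rebalance at run time instead of relying on a fixed split.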

Not surprisingly, all of the traditional APIs built to interface with GPUs were designed for graphics. To make a GPU perform general-purpose math, one had to pretend operations were based on textures and rectangles. The great advance of OpenCL was that it dispensed with this work-around and provided a straight compute interface for the GPU. OpenCL is managed by the non-profit Khronos Group, and it is now supported by a wide range of industry players involved with heterogeneous computing, including AMD, ARM, Intel, and Nvidia.

So, if OpenCL provides a software framework for heterogeneous computing, that still doesn't address the hardware side of the problem. Whether discussing servers, PCs, or smartphones, how should the hardware platform (distinct from the CPU, GPU, and/or APU) perform heterogeneous computing? Clearly, platforms were not designed for this paradigm in the past. The computing device typically had one system memory pool, and the programmer had to copy data from the CPU memory space to the GPU memory space (within the same pool) before the application could start executing its process. The same was true for fetching the results back again. In a system with only one memory pool, repetitive copying of data to different areas within the same memory is highly inefficient.

This is where HSA comes in. HSA brings the GPU into a shared memory environment with the CPU as a co-processor. The application gives work directly to the GPU just as it does to the CPU. The two cores can work together on the same data sets. With a shared memory space, the processors use the same pointers and addresses, making it much more efficient to offload smaller amounts of work because all of that old copying overhead is gone.
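The difference between the copy-based flow and HSA's shared address space can be modeled in a few lines of Python. This is a hypothetical sketch of the programming model only: a bytearray plays the role of system memory, the "kernel" is an ordinary function, and the returned number counts bytes copied purely to move data around. It does not touch a real GPU or any HSA runtime API.

```python
def double_bytes(buf):
    """Toy 'kernel' (hypothetical): double every byte in place, modulo 256."""
    for i in range(len(buf)):
        buf[i] = (buf[i] * 2) % 256


def offload_with_copies(host_buf, kernel):
    """Pre-HSA style: copy into a separate 'GPU' region, run, then copy back."""
    device_buf = bytearray(host_buf)  # copy 1: CPU memory space -> GPU memory space
    kernel(device_buf)
    host_buf[:] = device_buf  # copy 2: results back to the CPU memory space
    return 2 * len(host_buf)  # bytes copied just for bookkeeping


def offload_shared(host_buf, kernel):
    """HSA style: the 'GPU' works on the same memory through a view, no copies."""
    kernel(memoryview(host_buf))  # same buffer, same addresses
    return 0


data = bytearray([1, 2, 3])
print(offload_with_copies(data, double_bytes), list(data))  # 6 [2, 4, 6]
print(offload_shared(data, double_bytes), list(data))  # 0 [4, 8, 12]
```

Both paths transform the data the same way; the shared path simply skips the two copies, which is why offloading even small chunks of work becomes worthwhile once the copy cost is gone.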

In addition to unified memory, AMD notes that HSA establishes cache coherency between the CPU and GPU, eliminating the need to do a DMA flush every time the programmer wants to move data between the CPU and GPU. The GPU is also now allowed to reference pageable memory, so the entire virtual memory space is available. Not least of all, HSA adds context switching, enabling quality of service. With these features in hardware, an HSA platform becomes very similar in programming style to that of a CPU.

"Shared memory makes the whole system much easier to program," adds AMD fellow Phil Rogers. "One of the barriers to using GPU compute today is a lot of programmers tell us they find it too hard. They have to learn a new API. They have to manage these different address spaces. They're not sure when the right time is to copy the data. When you eliminate barriers like this across the board and enable high-level languages, you make it so much easier to program that suddenly you get tens of thousands of programmers working on your platform instead of dozens or hundreds. That's a really big deal."

Focus On The Programmer

"Every programmer has their own favorite language," says Phil Rogers. "They can almost be religious about it. You don't want to tell a programmer that he has to change the way he currently develops applications in order to deliver better experiences. HSA enables heterogeneous computing for all high-level languages over time."

Compatibility with C and C++ might have been sufficient for some, but AMD wanted to make sure everyone was covered, so it expanded HSA to work with C#, Java, and even functional languages.

And because there's only so much AMD can do on its own, the company decided to turn HSA into an open standard governed by the HSA Foundation, whose founding members include ARM, MediaTek, and Texas Instruments. The HSA Foundation, officially launched in June 2012, aims to promote HSA-enabled platforms and software at all levels. This includes making SDKs, libraries, training, and other resources available to all programmers, often for free. From a developer perspective, the whole idea of HSA is that programmers can easily take advantage of the heterogeneous compute model in their apps without being bound to design or write in any particular way.

"Programmers don't just program to the metal," says AMD's Manju Hegde. "They need proper compilers, debuggers, profilers, optimization tools, libraries. These are the tasks ahead of us, which is why we established the HSA Foundation, to drive the standard forward. A lot of the tools we do will be open sourced. This will give partners quicker time to market and lower the financial burden. When the software ecosystem sees that this is a genuine industry effort, the value of that will not be lost upon them, because this is one of the first times that hardware companies are making changes to chip architecture to accommodate ease of software development. Tons of companies make changes because they want new capabilities, new features. But we've made changes simply to make it easier for programmers. That's what's needed to make things pervasive."

It's important to keep in mind that OpenCL and HSA are two very different things, and it's likely that the former will evolve to better fit the latter over time. Even without HSA, OpenCL offers a much different programming experience than it did even two years ago. For example, OpenCL 1.2 drastically reduces the amount of initialization and other overhead code that used to be required. With HSA, that trend toward simplicity and performance will continue, as programmers no longer need to manage two different memory spaces.

"Say a programmer uses Visual Studio today, writing C++ apps in Windows," says Phil Rogers. "There are hundreds of thousands of programmers doing that. For them, they can use this new technology Microsoft released in Visual Studio called C++ AMP, short for Accelerated Massive Parallelism. In C++ AMP, they only added two keywords to the language, restrict and array_view, and just by adding those two functions, those programs are marked for offload onto the GPU. The tiny change to the program gives numerous benefits when they have chunks of very parallel code in existing applications. It's a much easier transition than one might expect."

HSA's Big Picture

With all of the focus on APUs and GPGPU, it's easy to forget that there's more to life than parallel code. The CPU remains a critical part of heterogeneous computing. Much of the code in modern applications remains serial and scalar in nature and will only run well on strong CPU cores. But even for the CPU, there are different types of workloads. Some loads do best on a few fast cores, while others excel on a larger number of lower-power cores. In both cases, and as mentioned earlier, applications need to be tailored to fit the power envelope of a particular device, whether it's an all-in-one, notebook, tablet, or phone. As APUs gradually take over most (but not all) of the CPU arena, we're seeing APUs diversify and segment in order to address these different power requirements. The difference now is that APUs seem likely to soon offer nearly twice the diversity of recent CPU families, since they must address both scalar and vector-based needs across device markets.

I attended AMD's 2012 Fusion Developer Summit (AFDS) in Seattle, and, to my ears, it sounded like the last thing on anyone's mind was the desktop market. There was a lot of buzz about AMD leveraging HSA to find better roads into the mobile markets. The biggest news was far and away ARM, the leading name in ultramobile processors, joining the HSA Foundation (remember, HSA is agnostic to architecture). This carries significant ramifications in many directions.

Everybody knows that mobile is hot and desktop is flat, at least in industry sales terms. To me, much of the messaging at AFDS seemed to reflect this, perhaps because desktop is the segment that seems to care least about the power and efficiency benefits HSA promises. So when I was able to sit down with Phil Rogers, I asked him if one of the outcomes of HSA would be a gradual shift by AMD toward battery-powered devices and away from desktops.

"This is a common misconception," he said. "Power matters a lot on the whole range of platforms. On the battery situation, everybody gets it. Even with desktops, and more and more, what we're calling desktops are becoming all-in-ones, people want to know not just how fast it runs but how quiet it is and how attractive it is as a product. We're seeing that even gamers don't want a box next to their leg pumping hot air on their shins while they're playing. What they really want is a 30-inch screen on the wall with a PC built into it that runs fantastic. And in that environment, you do care about power. Even if you don't care about the electricity bill, you don't want fans whining and screaming or heating up and taking away clock speed."

At the other end of the market, servers stand to benefit greatly from HSA. Consider data centers and the continuing growth of cloud computing. With even smallish data centers now hosting more than 10,000 servers each, power efficiency continues to grow in urgency. Generally speaking, hardware accounts for only one-third of a server's total cost over its service life. Another third is spent on electricity, and the remaining third goes toward cooling. If HSA can help improve compute efficiency, allowing systems to complete tasks more quickly so they can turn off large logic blocks or entire cores, then power consumption can decline drastically.

"You can only pack processors so densely," says Hegde. "HSA allows you to process more densely and at a lower power envelope. I don't need to tell you that CUDA has been doing HPC applications for five years. All those applications are so much easier with HSA because HSA is heterogeneous between GPU and CPU. Nvidia always leans toward the GPU. And certainly, there are some embarrassingly parallel applications out there, but the vast majority of applications are not, including many HPC applications. So don't think that HSA is just about client; it's an architecture that spans many platforms."

More About The Big Picture

For many PC users, "integrated" chips are synonymous with lower performance. This perception perhaps remains from the days of northbridge-based graphics, which were mediocre even on their best days. Most likely, that stigma will soon vanish. We don't have the side-by-side data to show how CPU Die X performs in both APU and graphics-free versions. But it seems safe to say that in a heterogeneous environment, running software designed for APU-type processing, an APU will deliver better total performance than if those CPU and GPU cores were separated into discrete components. If the APU paradigm weren't inherently better, the entire industry wouldn't be shifting to it so rapidly.

Testing at Tom's Hardware shows that, in a toe-to-toe battle between AMD and Intel looking only at x86 performance, Intel is today's clear victor. Far less clear is what happens once heterogeneous graphics are factored in. Just as Intel has the stronger CPU, AMD obviously has the stronger graphics architecture. What happens when the software running across these platforms makes fair and ample use of OpenCL and other heterogeneous architectures? That's what our ongoing heterogeneous compute series is looking to answer.

"Today," says AMD's Hegde, "the CPUs in our APUs may be lower in performance in your tests against Intel. But that's not an indication of where we are going. We're now embracing the low-power space in a very strong way, so we are building CPUs that are going to have very good performance with a very low power envelope. So when we say balanced platform, we're not saying that you have to take one or the other. We're saying that, in a balanced platform, for every workload, do it in the place that makes the most sense. Intel's approach of doing everything on the CPU is just wrong, because when you put a balanced workload on just the CPU, your power draw is dramatic and unnecessary. That's why we don't think that's a good model. We're going to make the cost of transition from CPU to GPU next to nothing. Then it's up to each application to choose the right execution engine in terms of performance and power."
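Hegde's "do it in the place that makes the most sense" can be read as a per-workload selection rule. The C++ sketch below shows one hypothetical way an application might apply it: given run-time and power estimates for each available engine (the `EngineEstimate` type, `choose_engine`, and the numbers are illustrative assumptions, not part of any HSA API), it picks the engine that meets a deadline using the least energy, with energy taken as time multiplied by average power.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Hypothetical per-engine estimate for one workload. In practice these numbers
// would come from profiling; here the caller supplies them.
struct EngineEstimate {
    std::string name;  // e.g. "CPU" or "GPU"
    double seconds;    // estimated run time for the workload
    double watts;      // estimated average power draw while running it
    double joules() const { return seconds * watts; }  // energy = time x power
};

// Returns the index of the engine that finishes within the deadline using the
// least energy, or -1 if no engine can meet the deadline.
int choose_engine(const std::vector<EngineEstimate>& engines, double deadline_s) {
    assert(deadline_s > 0.0);
    int best = -1;
    double best_energy = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < engines.size(); ++i) {
        const EngineEstimate& e = engines[i];
        if (e.seconds > deadline_s) continue;  // too slow for this workload
        if (e.joules() < best_energy) {
            best_energy = e.joules();
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

With a CPU estimate of 2 s at 30 W (60 J) and a GPU estimate of 0.5 s at 40 W (20 J), the GPU wins even under a loose deadline; if the GPU instead drew 200 W (100 J), the CPU would be chosen unless the deadline ruled it out. Minimizing energy under a deadline is only one possible policy; an application could just as well minimize time under a power cap.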

Similarly, AMD feels that HSA may be its path to success in smartphones and tablets, because when applications are properly optimized for balance, the GPU becomes much more influential in determining battery life. HSA targets two things: ease of programming and performance per watt. Once GPGPU compute comes into play, analysis shows that the GPU is 4x to 6x more efficient than a CPU in performance per watt. ARM may have the biggest piece of that heterogeneous smartphone opportunity, but AMD is betting there's still plenty of room for a strong number-two player who just happens to be using the same programming architecture as ARM. As mentioned, the implications for Nvidia and ultramobile would-be contender Intel are significant.
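The 4x to 6x figure can be turned into a rough energy estimate for a mixed workload. The sketch below assumes, purely for illustration, that a fraction p of a task's all-CPU energy comes from parallel code that moves to the GPU at k times better performance per watt, while the serial remainder stays on the CPU unchanged; it ignores idle power and any data-movement cost. The remaining energy is then E' = E[(1 - p) + p/k].

```cpp
#include <cassert>

// Fraction of the original all-CPU energy that remains after offloading.
//   p: share of the CPU energy spent in parallel code that moves to the GPU
//   k: GPU performance-per-watt advantage on that code (e.g. 4 to 6)
// Simplifying assumptions: serial energy is unchanged; idle power and
// data-movement energy are ignored.
double relative_energy(double p, double k) {
    assert(p >= 0.0 && p <= 1.0 && k > 0.0);
    return (1.0 - p) + p / k;
}
```

With p = 0.8, k = 4 leaves 0.2 + 0.2 = 0.4 of the original energy and k = 6 leaves about 0.33, so in this simplified model the serial 20 percent accounts for half or more of what remains, which echoes the earlier point that strong CPU cores still matter.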

HSA Tomorrow

I asked Manju Hegde what part of the whole HSA effort keeps him awake at night. He answered without hesitation:

"How we're going to get adoption. Our plan is to go broad. With the HSA Foundation, we made a huge bet. We're contributing... I don't know how many millions of dollars of IP. The reason we're donating this to the industry is we want a rising tide to lift all. When that happens, the software ecosystem gets excited, and that will be a catalyst to increasing adoption."

AMD is dead serious about this. Today, all of the company's OpenCL tools are freely distributed. To my knowledge, every other company in the heterogeneous space charges for its tools. AMD wants all kinds of developers, from the largest names to one-person garage outfits, to be free from worrying about the economics of tools for developer environments.

Will HSA complete the reversal of AMD's misfortunes and restore the company to its former glory? A cynic might answer: it sure couldn't hurt. But the more thoughtful answer is that HSA is the culmination of a heterogeneous strategy born almost a decade ago. Dave Orton and Dirk Meyer saw the inevitability of today and set out to make that future happen in a way that would be as beneficial as possible to everyone involved. We see time and again that the industry gravitates to open standards and increasing efficiency. Given this, it seems likely that AMD has finally scored the victory it has sought for so long.

"At the end of the day, we're not maniacally focused on beating a single company," says AMD's Joe Macri. "We're maniacally focused on up-leveling the experience of all consumers. By doing that, many of our competition will fall out. They'll not have the IP, or they'll have only some of the IP. Or maybe they'll be forced to merge, which is a very difficult thing. Intel is a wildcard. No doubt about it, they're a very capable company and a great bunch of folks. But when you're the incumbent, it's much more difficult to embrace change. You only want to embrace unavoidable change because everything else costs money. So we gotta see how they pull their triggers. But I'd hate to be in their position. You don't know how many people I interview from Intel specifically because their research doesn't get utilized, because a VP says, 'Hey, that's just going to cost us money. Maybe we'll utilize your research somewhere over the next ten years.' That kills an engineer."

At a time when cracks are appearing in Moore's Law and the costs of shrinking fab processes continue to skyrocket, an industry that wants to keep accelerating compute capabilities must turn increasingly to optimizing efficiency. Ultimately, this is what GPGPU and HSA enable. By old methods, how much would a CPU need to evolve in order to facilitate a 5x performance gain? Now, such gains are possible simply through hardware and software vendors adopting an end-to-end platform such as HSA. No pushing the envelope of lithography physics. No new multi-billion-dollar factories. Just more efficient utilization of the technologies already on the table. And through that, the world of computing can take a quantum leap forward.

Tom's Hardware - http://www.tomshardware.com