Tesla Brochure 12 Lr


NVIDIA TESLA

GPU COMPUTING: REVOLUTIONIZING HIGH PERFORMANCE COMPUTING

To learn more, go to www.nvidia.com/tesla. © 2010 NVIDIA Corporation. NVIDIA, the NVIDIA logo, CUDA, GPUDirect, Parallel Nsight, and Tesla are trademarks and/or registered trademarks of NVIDIA Corporation in the United States and other countries. Other company and product names may be trademarks of the respective companies with which they are associated. All rights reserved. 10/2010


GPUS ARE REVOLUTIONIZING COMPUTING

The high performance computing (HPC) industry's need for computation is increasing, as large and complex computational problems become commonplace across many industry segments. Traditional CPU technology, however, is no longer capable of scaling in performance sufficiently to address this demand.

The parallel processing capability of the Graphics Processing Unit (GPU) allows it to divide complex computing tasks into thousands of smaller tasks that can be run concurrently. This ability is enabling computational scientists and researchers to address some of the world's most challenging computational problems up to several orders of magnitude faster.


The use of GPUs for computation is a dramatic shift in HPC. GPUs deliver performance increases of 10x to 100x to solve problems in minutes instead of hours, outpacing the performance of traditional computing with x86-based CPUs alone. In addition, GPUs also deliver greater performance per watt of power consumed.

From climate modeling to medical tomography, NVIDIA Tesla GPUs are enabling a wide variety of segments in science and industry to progress in ways that were previously impractical, or even impossible, due to technological limitations.

Conventional CPU computing architecture is no longer scaling to match the demands of HPC.

Co-processing refers to the use of an accelerator, such as a GPU, to offload the CPU and to increase computational efficiency.

Figure: Processor performance versus VAX, 1978-2016, showing CPU performance growth rates of 25%, 52%, and 20% per year over successive eras, with GPUs projected to hold a large performance advantage by 2021. Source: Hennessy & Patterson, CAAQA, 4th Edition.

WHY GPU COMPUTING?

With the ever-increasing demand for more computing performance, the HPC industry is moving toward a hybrid computing model, where GPUs and CPUs work together to perform general purpose computing tasks.

As parallel processors, GPUs excel at tackling large amounts of similar data because the problem can be split into hundreds or thousands of pieces and calculated simultaneously. As sequential processors, CPUs are not designed for this type of computation, but they are adept at more serial-based tasks such as running operating systems and organizing data. NVIDIA's GPU solutions outpace others as they apply the most relevant processor to the specific task in hand.

Tesla GPU computing delivers transformative increases in performance for a wide range of HPC industry segments. Representative speedups include:

> 50x MATLAB Computing (AccelerEyes)
> Financial computation (Oxford University)
> 30x Gene Sequencing (University of Maryland)
> Molecular dynamics (University of Illinois, Urbana-Champaign)
> 146x Medical Imaging (University of Utah)
> 20x 3D Ultrasound (TechniScan)
> 18x Video Transcoding (Elemental Technologies)
> 5x Digital Content Creation (Adobe)
> 100x Astrophysics (RIKEN)
> 80x Weather Modeling (Tokyo Institute of Technology)

"The convergence of new, fast GPUs optimized for computation as well as 3D graphics acceleration and industry-standard software development tools marks the real beginning of the GPU computing era."

Nathan Brookwood, Principal Analyst & Co-Founder, Insight64


CUDA PARALLEL COMPUTING ARCHITECTURE

CUDA is NVIDIA's parallel computing architecture. Applications that leverage the CUDA architecture can be developed in a variety of languages and APIs, including C, C++, Fortran, OpenCL, and DirectCompute. The CUDA architecture contains hundreds of cores capable of running many thousands of parallel threads, while the CUDA programming model lets programmers focus on parallelizing their algorithms and not the mechanics of the language.

The latest generation CUDA architecture, codenamed Fermi, is the most advanced GPU computing architecture ever built. With over three billion transistors, Fermi is making GPU and CPU co-processing pervasive by addressing the full spectrum of computing applications. With support for C++, GPUs based on the Fermi architecture make parallel processing easier and accelerate performance on a wider array of applications than ever before. Just a few applications that can experience significant performance benefits include ray tracing, finite element analysis, high-precision scientific computing, sparse linear algebra, sorting, and search algorithms.

PARALLEL ACCELERATION

Multi-core programming with x86 CPUs is difficult and often results in marginal performance gains when going from 1 core to 4 cores to 16 cores. Beyond 4 cores, memory bandwidth becomes the bottleneck to further performance increases.

To harness the parallel computing power of GPUs, programmers can simply modify the performance-critical portions of an application to take advantage of the hundreds of parallel cores in the GPU. The rest of the application remains the same, making the most efficient use of all cores in the system. Running a function on the GPU involves rewriting that function to expose its parallelism, then adding a few new function calls to indicate which functions will run on the GPU or the CPU. With these modifications, the performance-critical portions of the application can now run significantly faster on the GPU.
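As an illustration only (not taken from the brochure), the following minimal CUDA C sketch shows that pattern: a performance-critical loop is rewritten as a kernel to expose its parallelism, and a few added calls (cudaMalloc, cudaMemcpy, and the kernel launch) mark what runs on the GPU. The kernel name, array size, and values are hypothetical.

// Minimal CUDA C sketch of offloading one performance-critical function to the GPU.
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// The former CPU loop, rewritten as a kernel: each GPU thread handles one element.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Host data, prepared by the unchanged (CPU) part of the application.
    float *x = (float *)malloc(bytes), *y = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // New calls that mark what runs on the GPU: allocate, copy, launch, copy back.
    float *d_x, *d_y;
    cudaMalloc((void **)&d_x, bytes);
    cudaMalloc((void **)&d_y, bytes);
    cudaMemcpy(d_x, x, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y, bytes, cudaMemcpyHostToDevice);

    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);

    cudaMemcpy(y, d_y, bytes, cudaMemcpyDeviceToHost);
    printf("y[0] = %f\n", y[0]);  // expect 4.0

    cudaFree(d_x); cudaFree(d_y); free(x); free(y);
    return 0;
}

Everything else in the host program stays ordinary C compiled alongside the kernel by NVCC, which is what makes this incremental porting approach practical.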

Figure: Core comparison between a CPU (multiple cores) and a GPU (hundreds of cores).

Figure: Developers use industry-standard languages and tools to program massively parallel CUDA GPUs. GPU computing applications build on libraries and middleware, language solutions (C, C++, Fortran, and Java and Python interfaces), and device-level APIs (DirectCompute, OpenCL), all running on the NVIDIA GPU CUDA parallel computing architecture.

The next generation CUDA computing architecture, codenamed Fermi.

"History will record Fermi as a significant milestone."

Dave Patterson, Director, Parallel Computing Research Laboratory, U.C. Berkeley; co-author of Computer Architecture: A Quantitative Approach


University of Illinois: Accelerated molecular modeling enables rapid response to H1N1

CHALLENGE A first step in mitigating a global pandemic, like H1N1, requires quickly developing drugs to effectively treat a virus that is new and likely to evolve. This requires a compute-intensive process to determine how, in the case of H1N1, mutations of the flu virus protein could disrupt the binding pathway of the antiviral drug Tamiflu, rendering it potentially ineffective. This determination involved a daunting simulation of a 35,000-atom system, something a group of University of Illinois, Urbana-Champaign scientists, led by John Stone, decided to tackle in a new way using GPUs.

Conducting this kind of simulation on a CPU would take more than a month to calculate, and that would only amount to a single simulation, not the multiple simulations that constitute a complete study.

SOLUTION Stone and his team turned to the NVIDIA CUDA parallel processing architecture running on Tesla GPUs to perform their molecular modeling calculations and simulate the drug resistance of H1N1 mutations. Thanks to GPU technology, the scientists could efficiently run multiple simulations and achieve potentially life-saving results faster.

IMPACT The GPU-accelerated calculation was completed in just over an hour. The almost thousand-fold improvement in performance available through GPU computing and advanced algorithms empowered the scientists to perform emergency computing to study biological problems of extreme relevance and share their results with the medical research community.

This speed and performance increase not only enabled researchers to fulfill their original goal of testing Tamiflu's efficacy in treating H1N1 and its mutations, but it also bought them time to make other important discoveries. Further calculations showed that genetic mutations which render the swine or avian flu resistant to Tamiflu had actually disrupted the binding funnel, providing new understanding about a fundamental mechanism behind drug resistance.

In the midst of the H1N1 pandemic, the use of improved algorithms based on CUDA and running on Tesla GPUs made it possible to produce actionable results about the efficacy of Tamiflu during a single afternoon. It would have taken weeks or months of computing to produce the same results using conventional approaches.

Ribosome for protein synthesis.

WHO BENEFITS FROM GPU COMPUTING?

Computational scientists and researchers who are using GPUs to accelerate their applications are seeing results in days instead of months, even minutes instead of days.

The benefits of GPU computing can be replicated in other research areas as well. "All of this work is made speedier and more efficient thanks to GPU technology, which for us means quicker results as well as dollars and energy saved."

John Stone, Sr. Research Programmer, University of Illinois at Urbana-Champaign


Harvard University: Finding Hidden Heart Problems Faster

320 detector-row CT has enabled single heart beat coronary imaging so that the entire coronary contrast opacification can be evaluated at a single time point. The full 3D course of the arteries, in turn, allows researchers to simulate the blood flowing through it by using computational fluid flow simulations, and subsequently compute the endothelial shear stress.

CHALLENGE Heart attacks, the leading cause of death worldwide, are caused when plaque that has built up on artery walls dislodges and blocks the flow of blood to the heart. Up to 80% of heart attacks are caused by plaque that is not detectable by conventional medical imaging. Even viewing the 20% that is detectable requires invasive endoscopic procedures, which involve running several feet of tubing into the patient in an effort to take pictures of arterial plaque.

This level of uncertainty with regard to the exact location of potentially deadly plaque poses a significant challenge for cardiologists. Historically, it has been a guessing game for heart specialists to determine if and where to place arterial stents in patients with blockages. Knowing the location of the plaque could greatly improve patient care and save lives.

SOLUTION A team of researchers, including doctors at Harvard Medical School and Brigham & Women's Hospital in Boston, Massachusetts, has discovered a non-invasive way to find dangerous plaque in a patient's arteries. Tapping into the computational power of GPUs, they can create a highly individualized model of blood flow within a patient in a study called hemodynamics.

The buildup of plaque is highly correlated to the shape, or geometry, of a patient's arterial structure. Bends in an artery tend to be areas where dangerous plaque is especially concentrated. Using imaging devices like a CT scan, scientists are able to create a model of a patient's circulatory system. From there, an advanced fluid dynamics simulation of the blood flow through the patient's arteries can be conducted on a computer to identify areas of reduced endothelial shear stress on the arterial wall. A complex simulation like this one requires billions of fluid elements to be modeled as they pass through an artery system. An area of reduced shear stress indicates that plaque has formed on the interior artery walls, preventing the bloodstream from making contact with the inner wall.

The overall output of the simulation provides doctors with an atherosclerotic risk map. The map provides cardiologists with the location of hidden plaque and can serve as an indicator as to where stents may eventually need to be placed, and all of this knowledge is gained without invasive imaging techniques or exploratory surgery.

IMPACT GPUs provide 20x more computational power and an order of magnitude more performance per dollar to the application of image reconstruction and blood flow simulation, finally making such advanced simulation techniques practical at the clinical level. Without GPUs, the amount of computing equipment, in terms of size and expense, would render a hemodynamics approach unusable.

Because it can detect dangerous arterial plaque earlier than any other method, it is expected that this breakthrough could save numerous lives when it is approved for deployment in hospitals and research centers.


MotionDSP: The increasing importance of the GPU in the Armed Forces


CHALLENGE Unmanned Aerial Vehicles (UAVs) represent the latest in high-tech weaponry deployed to strengthen and improve the military's capabilities. But with new technologies come new challenges, such as capturing actionable intelligence while flying at speeds upwards of 140 mph, 10 miles above the earth.

One key feature of the UAV is that it is capable of providing a real-time stream of detailed images taken with multiple cameras on the vehicle simultaneously. The challenge is that the resulting images need to be rendered, stabilized and enhanced in real-time and across vast distances in order to be useful. Once they have been processed, the images can give infantry critical information about potential challenges ahead, the end goal being to ensure the safety and protection of military personnel in the field. Using CPUs alone, this process is very time consuming and does not allow information to be viewed in real-time. As a result, military action could be based on potentially outdated intelligence data and inaccurate guides.

SOLUTION MotionDSP, a software company based in San Mateo, California, has developed super-resolution algorithms that allow it to reconstruct video with better and cleaner detail, increased resolution and reduced noise, all of which are ideal for the live streaming of video from the cameras attached to a UAV.

MotionDSP's product, Ikena ISR, leverages NVIDIA's CUDA parallel computing architecture, allowing it to render, stabilize and enhance live video faster and more accurately than its competitors. Ikena ISR features computationally intense, advanced motion-tracking algorithms that provide the basis for sophisticated image stabilization and super-resolution video reconstruction. Perhaps most importantly, it can all be run on off-the-shelf Windows laptops and servers. Using NVIDIA Tesla GPUs, MotionDSP's customers, which include a variety of military-funded research groups, are making UAVs safer and more reliable while reducing deployment costs, improving simulation accuracy and dramatically boosting performance.

IMPACT Using only CPUs to execute the kind of sophisticated video post-processing algorithms required for effective reconnaissance would result in up to six hours of processing for each one hour of video, which is not a viable solution when real-time results are critical. In contrast, Tesla GPUs enable MotionDSP's Ikena ISR software to process any live video source in real-time with less than 200 ms of latency. Moreover, instead of requiring expensive CPU-clustered computing systems to complete the work, Ikena can perform at full capacity on a standard workstation small enough to fit inside military vehicles.

Merlin International is one of the fastest growing providers of information technology solutions in the United States; their Collaborative Video Delivery offering, which includes Ikena, helps support the defense and intelligence missions of the US Federal Government.

"MotionDSP's use of GPU technology has greatly enhanced the capabilities of its Ikena software, enabling it to deliver real-time super-resolution analysis of intelligence video, something that simply was not possible before," said John Trauth, President of Merlin International. "Integrating this technology into our Collaborative Video Delivery solution can enable our government customers to quickly and easily access the data they need for effective intelligence, surveillance and reconnaissance (ISR); this saves lives and significantly increases mission success rates."

Super-resolution algorithms allow MotionDSP to reconstruct video with better and cleaner detail, increased resolution and reduced noise.

"With the GPU, we're bringing higher quality video to all ISR platforms, including smaller UAVs, by being smarter about how we utilize COTS PC technology. Our technology makes the impossible possible, and this is making our military safer and better prepared."

Sean Varah, Chief Executive Officer, MotionDSP


Bloomberg: GPUs increase accuracy and reduce processing time for bond pricing

Financial engineering is integral to today's buying and selling decisions.

CHALLENGE Getting a mortgage and buying a home is a complex financial transaction, and for lenders, the competitive pricing and management of that mortgage is an even greater challenge. Transactions involving thousands of mortgages at once are a routine occurrence in financial markets, spurred by banks that wish to sell off loans to get their money back sooner.

Known as collateralized debt obligations (CDO) and collateralized mortgage obligations (CMO), baskets of thousands of loans are publicly traded financial instruments. For the banks and institutional investors who buy and sell these baskets, timely pricing updates are essential because of fast-changing market conditions. Bloomberg, one of the world's leading financial services organizations, prices CDO/CMO baskets for its customers by running powerful algorithms that model the risks and determine the price.


This technique requires calculating huge amounts of data, from interest rate volatility to the payment behavior of individual borrowers. These data-intensive calculations can take hours to run with a CPU-based computing grid. Time is money, and Bloomberg wanted a new solution that would allow them to get pricing updates to their customers faster.

SOLUTION Bloomberg implemented an NVIDIA Tesla GPU computing solution in their datacenter. By porting their application to run on the NVIDIA CUDA parallel processing architecture to harness the power of GPUs, Bloomberg received dramatic improvements across the board. Large calculations that had previously taken up to two hours can now be completed in two minutes. Smaller runs that had taken 20 minutes can now be performed in just seconds.

In addition, the capital outlay for the new GPU-based solution was one-tenth the cost of an upgraded CPU solution, and further savings are being realized due to the GPU's efficient power and cooling needs.

IMPACT As Bloomberg customers make CDO/CMO buying and selling decisions, they now have access to the best and most current pricing information, giving them a serious competitive trading advantage in a market where timing is everything.

"One of the challenges Bloomberg always faces is that we operate at very large scale. We're serving all the financial and business community and there are a lot of different instruments and models people want calculated."

Shawn Edwards, CTO, Bloomberg

GPU ACCELERATION

Large calculations that had previously taken up to two hours can now be completed in two minutes. Smaller runs that had taken 20 minutes can now be performed in just seconds.


ffA: Accelerating 3D seismic interpretation

CHALLENGE In the search for oil and gas, the geological information provided by seismic images of the earth is vital. By interpreting the data produced by seismic imaging surveys, geoscientists can identify the likely presence of hydrocarbon reserves and understand how to extract resources most effectively. Today, sophisticated visualization systems and computational analysis tools are used to streamline what was previously a subjective and labor intensive process.

Geoscientists must process increasing amounts of data as dwindling reserves require them to pinpoint smaller, more complex reservoirs with greater speed and accuracy.

SOLUTION UK-based company ffA provides world leading 3D seismic analysis software and services to the global oil and gas industry. Its software tools extract detailed information from 3D seismic data, providing a greater understanding of complex 3D geology, improving productivity and reducing uncertainty within the interpretation process. The sophisticated tools are compute-intensive, so it can take hours, or even days, to produce results on conventional high performance workstations.

With the recent release of its CUDA-enabled 3D seismic analysis application, ffA users routinely achieve over an order of magnitude speed-up compared with performance on high-end multi-core CPUs.

The latest benchmark results using Tesla GPUs have produced performance improvements of up to 37x versus high-end workstations with two quad-core CPUs.

Figure: Speedups of a Tesla GPU + CPU configuration versus a quad-core CPU. Data courtesy of RMOTC.

"CUDA-based GPUs power large computational tasks and interactive computational work that we could not hope to implement effectively otherwise."

Steve Purves, Technical Director, ffA

This step change in performance significantly increases the amount of data that geoscientists can analyze in a given timeframe. Plus, it allows them to fully exploit the information derived from 3D seismic surveys to improve subsurface understanding and reduce risk in oil and gas exploration and exploitation.

IMPACT NVIDIA CUDA is allowing ffA to provide scalable high performance computation for seismic data on hardware platforms equipped with one or more NVIDIA Tesla and Quadro GPUs. The latest benchmark results using Tesla GPUs have produced performance improvements of up to 37x versus high-end workstations with two quad-core CPUs.

"Access to high performance, high quality 3D computational tools on a workstation platform drastically improves the productivity curve in 3D seismic analysis and seismic interpretation, giving our users a real edge in oil and gas exploration and de-risking field development."


THE SHIFT TO PARALLEL COMPUTING AND THE PATH TO EXASCALE

Approximately every 10 years, the world of supercomputing experiences a fundamental shift in computing architectures. It was around 10 years ago when cluster-based computing largely superseded vector-based computing as the de facto standard for large-scale computing installations, and this shift saw the supercomputing industry move beyond the petaFLOP performance barrier. With the next performance target being exascale-class computing, it is time for a new shift in computing architectures: the move to parallel computing.

The shift is already underway.


Nebulae, powered by 4,640 Tesla 20-series GPUs, is one of the fastest supercomputers in the world.

In November 2008, Tokyo Institute of Technology became the first supercomputing center to enter the Top500 with a GPU-based hybrid system, a system that uses GPUs and CPUs together to deliver transformative increases in performance without breaking the bank with regards to energy consumption. The system, called TSUBAME 1.2, entered the list at number 24.

Fast forward to June 2010, and hybrid systems have started to make appearances even higher up the list. Nebulae, a system installed at the Shenzhen Supercomputing Center in China and equipped with 4,640 Tesla 20-series GPUs, made its entry into the list at number 2, just one spot behind Oak Ridge National Lab's Jaguar, the fastest supercomputer in the world.

What is even more impressive than the overall performance of Nebulae is how little power it consumes. While Jaguar delivers 1.77 petaFLOPS, it consumes more than 7 megawatts of power to do so. To put that into context, 7 megawatts is enough energy to power 15,000 homes. In contrast, Nebulae delivers 1.27 petaFLOPS, yet it does this within a power budget of just 2.55 megawatts. That makes it roughly twice as power-efficient as Jaguar (about 0.50 GFLOPS per watt versus about 0.25 GFLOPS per watt). This difference in computational throughput and power is owed to the massively parallel architecture of GPUs, where hundreds of cores work together within a single processor, delivering unprecedented compute density for next generation supercomputers.

Another very notable entry into this year's Top500 was the Chinese Academy of Sciences (CAS). The Mole 8.5 supercomputer at CAS uses 2,200 Tesla 20-series GPUs to deliver 207 teraFLOPS, which puts it at number 19 in the Top500.

"Future computing architectures will be hybrid systems with parallel-core GPUs working in tandem with multi-core CPUs."

Jack Dongarra, Distinguished Professor at University of Tennessee

CAS is one of the world's most dynamic research and educational facilities. Prof. Wei Ge, Professor of Chemical Engineering at the Institute of Process Engineering at CAS, introduced GPU computing to the Beijing facility in 2007 to help them with discrete particle and molecular dynamics simulations. Since then, parallel computing has enabled the advancement of research in dozens of other areas, including real-time simulations of industrial facilities, the design and optimization of multi-phase and turbulent industrial reactors using computational fluid dynamics, the optimization of secondary and tertiary oil recovery using multi-scale simulation of porous materials, the simulation of nano- and micro-flow in chemical and bio-chemical processes, and much more.

These computational problems represent a tiny fraction of the entire landscape of computational challenges that we face today, and these problems are not getting any smaller. The sheer quantity of data that many scientists, engineers and researchers must analyze is increasing exponentially, and supercomputing centers are over-subscribed as demand is outpacing the supply of computational resources. If we are to maintain our rate of innovation and discovery, we must take computational performance to a level where it is 1000 times faster than what it is today. The GPU is a transformative force in supercomputing and represents the only viable strategy to successfully build exascale systems that are affordable to build and efficient to operate.

"Five years from now, the bulk of serious HPC is going to be done with some kind of accelerated heterogeneous architecture," said Steve Scott, CTO of Cray Inc.

TSUBAME 1.2, by Tokyo Institute of Technology, was the first Tesla GPU-based hybrid cluster to appear on the Top500 list.


NVIDIA TESLA: WORLD'S FIRST COMPUTATIONAL GPU

Tesla 20-series GPU computing solutions are designed from the ground up for high-performance computing and are based on NVIDIA's latest CUDA GPU architecture, code named Fermi. It delivers many must-have features for HPC, including ECC memory for uncompromised accuracy and scalability, C++ support, and 7x the double precision performance of CPUs. Compared to typical quad-core CPUs, Tesla 20-series GPU computing products can deliver equivalent performance at 1/10th the cost and 1/20th the power consumption.


TESLA GPU COMPUTING SOLUTIONS

NVIDIA Tesla products are designed for high-performance computing and offer exclusive computing features.

Superior Performance
> Highest double precision floating point performance
> Large HPC data sets supported by larger on-board memory
> Faster communication with InfiniBand using NVIDIA GPUDirect

Highly Reliable
> ECC protection for uncompromised data reliability
> Stress tested for zero error tolerance
> Manufactured by NVIDIA
> Enterprise-level support that includes a three-year warranty

Designed for HPC
> Integrated by leading OEMs into workstations, servers and blades


TESLA DATA CENTER PRODUCTS

Available from OEMs and certified resellers, Tesla GPU computing products are designed to supercharge your computing cluster.

Tesla M2050/M2070 GPU Computing Module enables the use of GPUs and CPUs together in an individual server node or blade form factor.

Tesla S2050 GPU Computing System is a 1U system powered by 4 Tesla GPUs and connects to a CPU server.

Figure: Highest Performance, Highest Efficiency. GPU-CPU server solutions deliver 8x higher Linpack performance (Gflops), roughly 5x higher performance per dollar (Gflops/$K), and roughly 4x higher performance per watt (Gflops/kW) than a CPU-only server. CPU 1U server: 2x Intel Xeon X5550 (Nehalem) 2.66 GHz, 48 GB memory, $7K, 0.55 kW. GPU-CPU 1U server: 2x Tesla C2050 + 2x Intel Xeon X5550, 48 GB memory, $11K, 1.0 kW.

Tesla C2050/C2070 GPU Computing Processor delivers the power of a cluster in the form factor of a workstation.

TESLA WORKSTATION PRODUCTS

Designed to deliver cluster-level performance on a workstation, the NVIDIA Tesla GPU Computing Processors fuel the transition to parallel computing while making personal supercomputing possible, right at your desk.

Figure: Highest Performance, Highest Efficiency. Workstations powered by Tesla GPUs outperform conventional CPU solutions in life science applications. Speedups of a Tesla C2050 versus an Intel Xeon X5550 CPU are shown for MIDG (Discontinuous Galerkin solvers for PDEs), AMBER molecular dynamics (mixed precision), FFT (cuFFT 3.1 in double precision), OpenEye ROCS virtual drug screening, and radix sort (CUDA SDK).


DEVELOPER ECOSYSTEM AND WORLDWIDE EDUCATION

In just a few years, an entire software ecosystem has developed around the CUDA architecture, from more than 350 universities worldwide teaching the CUDA programming model, to a wide range of libraries, compilers and middleware that help users optimize applications for GPUs.

The NVIDIA GPU Computing Ecosystem: a rich ecosystem of software applications, libraries, programming language solutions, and service providers supports the CUDA parallel computing architecture.

CUDA

NVIDIA's CUDA architecture has the industry's most robust language and API support for GPU computing developers, including C, C++, OpenCL, DirectCompute, and Fortran. NVIDIA Parallel Nsight, a fully integrated development environment for Microsoft Visual Studio, is also available. Used by more than six million developers worldwide, Visual Studio is one of the world's most popular development environments for Windows-based applications and services. Adding functionality specifically for GPU computing developers, Parallel Nsight makes the power of the GPU more accessible than ever before.

In addition to the CUDA C development tools, math libraries, and hundreds of code samples in the NVIDIA GPU computing SDK, there is also a rich ecosystem of solutions:

Libraries and Middleware Solutions
> Acceleware FDTD libraries
> CUBLAS, complete BLAS library*
> CUFFT, high-performance FFT routines*
> CUSP
> EM Photonics CULA Tools, heterogeneous LAPACK implementation
> NVIDIA OptiX and other AXEs*
> NVIDIA Performance Primitives for image and video processing, www.nvidia.com/npp*
> Thrust (see the usage sketch after this list)
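As an illustration only (not from the brochure), here is a minimal sketch of how a developer might use one of the libraries above, Thrust, to sort data on the GPU; the vector size and random keys are arbitrary, and the file would be compiled with NVCC.

// Minimal, hypothetical sketch: GPU sort with the Thrust library (compile with nvcc).
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/copy.h>
#include <cstdlib>

int main() {
    // Prepare 1M random keys on the host (CPU).
    thrust::host_vector<int> h(1 << 20);
    for (size_t i = 0; i < h.size(); ++i) h[i] = std::rand();

    // Copy to the GPU, sort there using the library's parallel sort, copy back.
    thrust::device_vector<int> d = h;
    thrust::sort(d.begin(), d.end());
    thrust::copy(d.begin(), d.end(), h.begin());
    return 0;
}

The library handles kernel launches and memory movement internally, which is the point of the middleware layer described above.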

Compilers and Language Solutions
> CAPS HMPP
> NVIDIA CUDA C Compiler (NVCC), supporting both CUDA C and CUDA C++*
> Par4All
> PGI CUDA Fortran
> PGI Accelerator Compilers for C and Fortran
> PyCUDA

GPU Debugging Tools
> Allinea DDT
> Fixstars Eclipse plug-in
> NVIDIA cuda-gdb*
> NVIDIA Parallel Nsight for Visual Studio
> TotalView Debugger

GPU Performance Analysis Tools
> NVIDIA Visual Profiler*
> NVIDIA Parallel Nsight for Visual Studio
> TAU CUDA
> Vampir

Cluster and Grid Management Solutions
> Bright Cluster Manager
> NVIDIA system management interface (nvidia-smi)
> Platform Computing

Math Packages
> Jacket by AccelerEyes
> Mathematica 8 by Wolfram
> MATLAB Distributed Computing Server (MDCS) by MathWorks
> MATLAB Parallel Computing Toolbox (PCT) by MathWorks

Consulting and Training

Consulting and training services are available to support you in porting applications and learning about developing with CUDA. For more information, visit www.nvidia.com/object/cuda_consultants.html.

Education and Certification
> CUDA Certification: www.nvidia.com/certification
> CUDA Center of Excellence: research.nvidia.com
> CUDA Research Centers: research.nvidia.com/
> CUDA Teaching Centers: research.nvidia.com/
> CUDA and GPU computing books

* Available with the latest CUDA toolkit at www.nvidia.com/getcuda

For more information about the CUDA Certification Program, visit www.nvidia.com/certification.

Figure: The CUDA ecosystem spans an integrated development environment, research & education, libraries, tools & partners, all major platforms, mathematical packages, languages & APIs, and consultants, training, & certification.

NVIDIA Parallel Nsight software is the industry's first development environment for massively parallel computing integrated into Microsoft Visual Studio. It integrates CPU and GPU development, allowing developers to create optimal GPU-accelerated applications.