CUDA - Wikipedia, The Free Encyclopedia


CUDA
A parallel computing platform and programming model

Developer(s): NVIDIA Corporation
Initial release: June 23, 2007
Stable release: 7.0 / March 17, 2015
Operating system: Windows XP and later, Mac OS X, Linux
Platform: Supported GPUs
Type: GPGPU
License: Freeware
Website: www.nvidia.com/object/cuda_home_new.html

From Wikipedia, the free encyclopedia

CUDA, which stands for Compute Unified Device Architecture,[1] is a parallel computing platform and application programming interface (API) model created by NVIDIA.[2] It allows software developers to use a CUDA-enabled graphics processing unit (GPU) for general-purpose processing, an approach known as GPGPU. The CUDA platform is a software layer that gives direct access to the GPU's virtual instruction set and parallel computational elements.[3]

The CUDA platform is designed to work with programming languages such as C, C++ and Fortran. This accessibility makes it easier for specialists in parallel programming to utilize GPU resources, as opposed to previous API solutions like Direct3D and OpenGL, which required advanced skills in graphics programming. CUDA also supports programming frameworks such as OpenACC and OpenCL.[3]

Contents

1 Background
2 Programming capabilities
3 Advantages
4 Limitations
5 Supported GPUs
6 Version features and specifications
7 Example
8 Language bindings
9 Current and future usages of CUDA architecture
10 See also
11 References
12 External links


Example of CUDA processing flow: 1. Copy data from main memory to GPU memory. 2. CPU instructs the GPU to start processing. 3. The GPU executes the work in parallel in each core. 4. Copy the result from GPU memory back to main memory.
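A minimal host-side sketch of that four-step flow (not from the article; the kernel name scale, the 256-thread block size and the scaling factor are illustrative):

#include <cuda_runtime.h>

// Illustrative kernel: each thread scales one element in place.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

void run(float* host_data, int n) {
    float* dev_data;
    cudaMalloc(&dev_data, n * sizeof(float));
    cudaMemcpy(dev_data, host_data, n * sizeof(float), cudaMemcpyHostToDevice); // 1. main memory -> GPU memory
    scale<<<(n + 255) / 256, 256>>>(dev_data, n, 2.0f);                         // 2.-3. CPU launches the kernel, GPU runs it in parallel
    cudaMemcpy(host_data, dev_data, n * sizeof(float), cudaMemcpyDeviceToHost); // 4. GPU memory -> main memory
    cudaFree(dev_data);
}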

    Background

The GPU, as a specialized processor, addresses the demands of real-time, compute-intensive, high-resolution 3D graphics. As of 2012, GPUs have evolved into highly parallel multi-core systems allowing very efficient manipulation of large blocks of data. This design is more effective than general-purpose CPUs for algorithms where processing of large blocks of data is done in parallel, such as:

push-relabel maximum flow algorithm
fast sort algorithms of large lists
two-dimensional fast wavelet transform
molecular dynamics simulations

Programming capabilities

The CUDA platform is accessible to software developers through CUDA-accelerated libraries, compiler directives such as OpenACC, and extensions to industry-standard programming languages including C, C++ and Fortran. C/C++ programmers use 'CUDA C/C++', compiled with "nvcc", NVIDIA's LLVM-based C/C++ compiler.[4] Fortran programmers can use 'CUDA Fortran', compiled with the PGI CUDA Fortran compiler from The Portland Group.
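A rough sketch of those language extensions (a hypothetical saxpy kernel, not taken from the article): a .cu source file mixes ordinary host C++ with device code, and nvcc compiles both.

// saxpy.cu -- built with NVIDIA's nvcc, e.g.: nvcc saxpy.cu -o saxpy
#include <cuda_runtime.h>

// __device__ marks a helper compiled for, and callable from, the GPU.
__device__ float axpy(float a, float x, float y) { return a * x + y; }

// __global__ marks a kernel: the host launches it, the GPU executes it.
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = axpy(a, x[i], y[i]);
}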

In addition to libraries, compiler directives, CUDA C/C++ and CUDA Fortran, the CUDA platform supports other computational interfaces, including the Khronos Group's OpenCL,[5] Microsoft's DirectCompute, OpenGL Compute Shaders (http://www.opengl.org/wiki/Compute_Shader) and C++ AMP.[6] Third-party wrappers are also available for Python, Perl, Fortran, Java, Ruby, Lua, Haskell, R, MATLAB, IDL, and native support in Mathematica.

In the computer game industry, GPUs are used not only for graphics rendering but also in game physics calculations (physical effects such as debris, smoke, fire, fluids); examples include PhysX and Bullet. CUDA has also been used to accelerate non-graphical applications in computational biology, cryptography and other fields by an order of magnitude or more.[7][8][9][10][11]

CUDA provides both a low-level API and a higher-level API. The initial CUDA SDK was made public on 15 February 2007, for Microsoft Windows and Linux. Mac OS X support was later added in version 2.0,[12] which supersedes the beta released February 14, 2008.[13] CUDA works with all Nvidia GPUs from the G8x series onwards, including GeForce, Quadro and the Tesla line. CUDA is compatible with most standard operating systems. Nvidia states that programs developed for the G8x series will also work without modification on all future Nvidia video cards, due to binary compatibility.
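A sketch of the difference between the two layers (illustrative; the vecAdd kernel and the PTX file name are assumptions, not from the article). The higher-level runtime API launches kernels directly from C++ with the <<<...>>> syntax, while the low-level driver API manages devices, contexts and modules explicitly:

#include <cuda.h>           // low-level driver API
#include <cuda_runtime.h>   // higher-level runtime API

// Runtime API: implicit context, kernel launched straight from host code:
//   vecAdd<<<blocks, threads>>>(a, b, c, n);

// Driver API: every step is explicit.
void launch_with_driver_api(CUdeviceptr a, CUdeviceptr b, CUdeviceptr c, int n) {
    CUdevice dev; CUcontext ctx; CUmodule mod; CUfunction fn;
    cuInit(0);
    cuDeviceGet(&dev, 0);
    cuCtxCreate(&ctx, 0, dev);
    cuModuleLoad(&mod, "vecAdd.ptx");          // PTX/cubin compiled separately
    cuModuleGetFunction(&fn, mod, "vecAdd");
    void* args[] = { &a, &b, &c, &n };
    cuLaunchKernel(fn, (n + 255) / 256, 1, 1,  // grid dimensions
                   256, 1, 1,                  // block dimensions
                   0, 0, args, 0);             // shared memory, stream, arguments
}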

    Advantages


CUDA has several advantages over traditional general-purpose computation on GPUs (GPGPU) using graphics APIs:

Scattered reads: code can read from arbitrary addresses in memory
Unified virtual memory (CUDA 4.0 and above)
Unified memory (CUDA 6.0 and above)
Shared memory: CUDA exposes a fast shared memory region that can be shared amongst threads. This can be used as a user-managed cache, enabling higher bandwidth than is possible using texture lookups[14] (see the sketch after this list)
Faster downloads and readbacks to and from the GPU
Full support for integer and bitwise operations, including integer texture lookups
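A minimal sketch of the shared-memory point above (illustrative, not from the article): each block stages a tile of input in the on-chip __shared__ region once, synchronizes, and its threads then read neighbours from that user-managed cache instead of global memory.

// Assumes a launch such as blur1d<<<blocks, 256>>>(in, out, n).
__global__ void blur1d(const float* in, float* out, int n) {
    __shared__ float tile[256 + 2];                   // user-managed cache: one tile per block plus a halo element on each side
    int gid = blockIdx.x * blockDim.x + threadIdx.x;  // global index
    int lid = threadIdx.x + 1;                        // local index inside the tile

    tile[lid] = (gid < n) ? in[gid] : 0.0f;           // each thread stages one element
    if (threadIdx.x == 0) {                           // the first thread also stages the two halo elements
        tile[0] = (gid > 0) ? in[gid - 1] : 0.0f;
        int right = gid + blockDim.x;
        tile[blockDim.x + 1] = (right < n) ? in[right] : 0.0f;
    }
    __syncthreads();                                  // wait until the whole tile is in shared memory

    if (gid < n)                                      // neighbours now come from fast on-chip memory
        out[gid] = (tile[lid - 1] + tile[lid] + tile[lid + 1]) / 3.0f;
}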

    Limitations

CUDA does not support the full C standard, as it runs host code through a C++ compiler, which makes some valid C (but invalid C++) code fail to compile.[15][16]
Interoperability with rendering languages such as OpenGL is one-way, with OpenGL having access to registered CUDA memory but CUDA not having access to OpenGL memory.
Copying between host and device memory may incur a performance hit due to system bus bandwidth and latency (this can be partly alleviated with asynchronous memory transfers, handled by the GPU's DMA engine; see the sketch after this list).
Threads should be running in groups of at least 32 for best performance, with the total number of threads numbering in the thousands. Branches in the program code do not affect performance significantly, provided that each of 32 threads takes the same execution path; the SIMD execution model becomes a significant limitation for any inherently divergent task (e.g. traversing a space partitioning data structure during ray tracing).
Unlike OpenCL, CUDA-enabled GPUs are only available from Nvidia.[17]
No emulator or fallback functionality is available for modern revisions.
Valid C/C++ may sometimes be flagged and prevent compilation due to optimization techniques the compiler is required to employ to use limited resources.
A single process must run spread across multiple disjoint memory spaces, unlike other C language runtime environments.
C++ Run-Time Type Information (RTTI) is not supported in CUDA code, due to lack of support in the underlying hardware.
Exception handling is not supported in CUDA code due to the performance overhead that would be incurred with many thousands of parallel threads running.
CUDA (with compute capability 2.x) allows a subset of C++ class functionality; for example, member functions may not be virtual (this restriction will be removed in some future release). [See CUDA C Programming Guide 3.1, Appendix D.6]
In single precision on first-generation CUDA compute capability 1.x devices, denormal numbers are not supported and are instead flushed to zero, and the precision of the division and square root operations is slightly lower than IEEE 754-compliant single-precision math. Devices that support compute capability 2.0 and above support denormal numbers, and the division and square root operations are IEEE 754 compliant by default. However, users can obtain the previous faster gaming-grade math of compute capability 1.x devices if desired by setting compiler flags to disable accurate divisions, disable accurate square roots, and enable flushing denormal numbers to zero.[18]
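A sketch of the asynchronous-transfer mitigation mentioned in the list above (illustrative; the process kernel and the two-stream layout are assumptions): with page-locked host memory and cudaMemcpyAsync, the GPU's DMA engine can copy one chunk while the compute units work on another.

#include <cuda_runtime.h>
#include <cstring>

__global__ void process(float* data, int n) {          // placeholder kernel
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// dev_buf[0] and dev_buf[1] are device buffers holding chunk elements each.
void process_in_chunks(float* dev_buf[2], const float* src, int chunk, int chunks) {
    cudaStream_t stream[2];
    cudaStreamCreate(&stream[0]);
    cudaStreamCreate(&stream[1]);

    float* pinned;                                      // page-locked host memory is needed
    cudaMallocHost((void**)&pinned, (size_t)chunks * chunk * sizeof(float));  // for truly asynchronous copies
    memcpy(pinned, src, (size_t)chunks * chunk * sizeof(float));

    for (int i = 0; i < chunks; ++i) {
        int s = i % 2;                                  // alternate between the two streams
        cudaMemcpyAsync(dev_buf[s], pinned + (size_t)i * chunk,
                        chunk * sizeof(float), cudaMemcpyHostToDevice, stream[s]);
        process<<<(chunk + 255) / 256, 256, 0, stream[s]>>>(dev_buf[s], chunk);
    }
    cudaDeviceSynchronize();                            // wait for all streams to finish
    cudaFreeHost(pinned);
    cudaStreamDestroy(stream[0]);
    cudaStreamDestroy(stream[1]);
}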

Supported GPUs

Compute capability table (version of CUDA supported) by GPU and card. Also available directly from Nvidia (http://developer.nvidia.com/cudagpus):


Compute capability (version), microarchitecture, GPUs and cards:

Compute capability 1.0 (Tesla)
GPUs: G80, G92, G92b, G94, G94b
Cards: GeForce GT 420*, GeForce 8800 Ultra, GeForce 8800 GTX, GeForce GT 340*, GeForce GT 330*, GeForce GT 320*, GeForce 315*, GeForce 310*, GeForce 9800 GT, GeForce 9600 GT, GeForce 9400 GT, Quadro FX 5600, Quadro FX 4600, Quadro Plex 2100 S4, Tesla C870, Tesla D870, Tesla S870

Compute capability 1.1 (Tesla)
GPUs: G86, G84, G98, G96, G96b, G94, G94b, G92, G92b
Cards: GeForce G110M, GeForce 9300M GS, GeForce 9200M GS, GeForce 9100M G, GeForce 8400M GT, GeForce 8600 GT, GeForce 8600 GTS, GeForce G105M, Quadro FX 4700 X2, Quadro FX 3700, Quadro FX 1800, Quadro FX 1700, Quadro FX 580, Quadro FX 570, Quadro FX 470, Quadro FX 380, Quadro FX 370, Quadro FX 370 Low Profile, Quadro NVS 450, Quadro NVS 420, Quadro NVS 290, Quadro NVS 295, Quadro Plex 2100 D4, Quadro FX 3800M, Quadro FX 3700M, Quadro FX 3600M, Quadro FX 2800M, Quadro FX 2700M, Quadro FX 1700M, Quadro FX 1600M, Quadro FX 770M, Quadro FX 570M, Quadro FX 370M, Quadro FX 360M, Quadro NVS 320M, Quadro NVS 160M, Quadro NVS 150M, Quadro NVS 140M, Quadro NVS 135M, Quadro NVS 130M, Quadro NVS 450, Quadro NVS 420, Quadro NVS 295

Compute capability 1.2 (Tesla)
GPUs: GT218, GT216, GT215
Cards: GeForce GT 240, GeForce GT 220*, GeForce 210*, GeForce GTS 360M, GeForce GTS 350M, GeForce GT 335M, GeForce GT 330M, GeForce GT 325M, GeForce GT 240M, GeForce G210M, GeForce 310M, GeForce 305M, Quadro FX 380 Low Profile, NVIDIA NVS 300, Quadro FX 1800M, Quadro FX 880M, Quadro FX 380M, NVIDIA NVS 300, NVS 5100M, NVS 3100M, NVS 2100M, ION

Compute capability 1.3 (Tesla)
GPUs: GT200, GT200b
Cards: GeForce GTX 280, GeForce GTX 275, GeForce GTX 260, Quadro FX 5800, Quadro FX 4800, Quadro FX 4800 for Mac, Quadro FX 3800, Quadro CX, Quadro Plex 2200 D2, Tesla C1060, Tesla S1070, Tesla M1060

Compute capability 2.0 (Fermi)
GPUs: GF100, GF110
Cards: GeForce GTX 590, GeForce GTX 580, GeForce GTX 570, GeForce GTX 480, GeForce GTX 470, GeForce GTX 465, GeForce GTX 480M, Quadro 6000, Quadro 5000, Quadro 4000, Quadro 4000 for Mac, Quadro Plex 7000, Quadro 5010M, Quadro 5000M, Tesla C2075, Tesla C2050/C2070, Tesla M2050/M2070/M2075/M2090

Compute capability 2.1 (Fermi)
GPUs: GF104, GF106, GF108, GF114, GF116, GF119
Cards: GeForce GTX 560 Ti, GeForce GTX 550 Ti, GeForce GTX 460, GeForce GTS 450, GeForce GTS 450*, GeForce GT 640 (GDDR3), GeForce GT 630, GeForce GT 620, GeForce GT 610, GeForce GT 520, GeForce GT 440, GeForce GT 440*, GeForce GT 430, GeForce GT 430*, GeForce GTX 675M, GeForce GTX 670M, GeForce GT 635M, GeForce GT 630M, GeForce GT 625M, GeForce GT 720M, GeForce GT 620M, GeForce 710M, GeForce 610M, GeForce GTX 580M, GeForce GTX 570M, GeForce GTX 560M, GeForce GT 555M, GeForce GT 550M, GeForce GT 540M, GeForce GT 525M, GeForce GT 520MX, GeForce GT 520M, GeForce GTX 485M, GeForce GTX 470M, GeForce GTX 460M, GeForce GT 445M, GeForce GT 435M, GeForce GT 420M, GeForce GT 415M, GeForce 710M, GeForce 410M, Quadro 2000, Quadro 2000D, Quadro 600, Quadro 410, Quadro 4000M, Quadro 3000M, Quadro 2000M, Quadro 1000M, NVS 5400M, NVS 5200M, NVS 4200M

Compute capability 3.0 (Kepler)
GPUs: GK104, GK106, GK107
Cards: GeForce GTX 770, GeForce GTX 760, GeForce GT 740, GeForce GTX 690, GeForce GTX 680, GeForce GTX 670, GeForce GTX 660 Ti, GeForce GTX 660, GeForce GTX 650 Ti BOOST, GeForce GTX 650 Ti, GeForce GTX 650, GeForce GTX 880M, GeForce GTX 780M, GeForce GTX 770M, GeForce GTX 765M, GeForce GTX 760M, GeForce GTX 680MX, GeForce GTX 680M, GeForce GTX 675MX, GeForce GTX 670MX, GeForce GTX 660M, GeForce GT 750M, GeForce GT 650M, GeForce GT 745M, GeForce GT 645M, GeForce GT 740M, GeForce GT 730M, GeForce GT 640M, GeForce GT 640M LE, GeForce GT 735M, GeForce GT 730M, Quadro K5000, Quadro K4200, Quadro K4000, Quadro K2000, Quadro K2000D, Quadro K600, Quadro K420, Quadro K500M, Quadro K510M, Quadro K610M, Quadro K1000M, Quadro K2000M, Quadro K1100M, Quadro K2100M, Quadro K3000M, Quadro K3100M, Quadro K4000M, Quadro K5000M, Quadro K4100M, Quadro K5100M, Tesla K10

Compute capability 3.2 (Kepler)
GPUs: Tegra K1
Cards: Jetson TK1 (SoC)

Compute capability 3.5 (Kepler)
GPUs: GK110, GK208
Cards: GeForce GTX TITAN Z, GeForce GTX TITAN Black, GeForce GTX TITAN, GeForce GTX 780 Ti, GeForce GTX 780, GeForce GT 640 (GDDR5), GeForce GT 630 v2, GeForce GT 730, GeForce GT 720, Quadro K6000, Quadro K5200, Tesla K40, Tesla K20x, Tesla K20

Compute capability 3.7 (Kepler)
GPUs: GK210
Cards: Tesla K80

Compute capability 5.0 (Maxwell)
GPUs: GM107, GM108
Cards: GeForce GTX 750 Ti, GeForce GTX 750, GeForce GTX 960M, GeForce GTX 950M, GeForce 940M, GeForce 930M, GeForce GTX 860M, GeForce GTX 850M, GeForce 845M, GeForce 840M, GeForce 830M, Quadro K2200, Quadro K1200, Quadro K620, Quadro K620M

Compute capability 5.2 (Maxwell)
GPUs: GM200, GM204, GM206
Cards: GeForce GTX TITAN X, GeForce GTX 980 Ti, GeForce GTX 980, GeForce GTX 970, GeForce GTX 960, GeForce GTX 950, GeForce GTX 980M, GeForce GTX 970M, GeForce GTX 965M, Quadro M6000, Quadro M5000, Quadro M4000

Compute capability 5.3 (Maxwell)
GPUs: Tegra X1

'*' OEM-only products

A table of devices officially supporting CUDA:[17]

Nvidia GeForce: GeForce GTX TITAN X, GeForce GTX 980 Ti, GeForce GTX 980, GeForce GTX 970, GeForce GTX 960, GeForce GTX 950, GeForce GTX Titan Z, GeForce GTX TITAN Black, GeForce GTX TITAN, GeForce GTX 780 Ti, GeForce GTX 780, GeForce GTX 770, GeForce GTX 760, GeForce GTX 750 Ti, GeForce GTX 750, GeForce GT 740, GeForce GT 730, GeForce GTX 690, GeForce GTX 680, GeForce GTX 670, GeForce GTX 660 Ti, GeForce GTX 660, GeForce GTX 650 Ti BOOST, GeForce GTX 650 Ti, GeForce GTX 650, GeForce GT 640, GeForce GT 630, GeForce GT 620, GeForce GT 610, GeForce GTX 590, GeForce GTX 580, GeForce GTX 570, GeForce GTX 560 Ti, GeForce GTX 560, GeForce GTX 550 Ti, GeForce GT 520, GeForce GTX 480, GeForce GTX 470, GeForce GTX 465, GeForce GTX 460, GeForce GTX 460 SE, GeForce GTS 450, GeForce GT 440, GeForce GT 430, GeForce GT 420, GeForce GTX 295, GeForce GTX 285, GeForce GTX 280, GeForce GTX 275, GeForce GTX 260, GeForce GTS 250, GeForce GTS 240, GeForce GT 240, GeForce GT 220, GeForce 210/G210, GeForce GT 140, GeForce 9800 GX2, GeForce 9800 GTX+, GeForce 9800 GTX, GeForce 9800 GT, GeForce 9600 GSO, GeForce 9600 GT, GeForce 9500 GT, GeForce 9400 GT, GeForce 9400 mGPU, GeForce 9300 mGPU, GeForce 9100 mGPU, GeForce 8800 Ultra, GeForce 8800 GTX, GeForce 8800 GTS, GeForce 8800 GT, GeForce 8800 GS, GeForce 8600 GTS, GeForce 8600 GT, GeForce 8600m GT, GeForce 8500 GT, GeForce 8400 GS, GeForce 8300 mGPU, GeForce 8200 mGPU, GeForce 8100 mGPU

Nvidia GeForce Mobile: GeForce GTX 980M, GeForce GTX 970M, GeForce GTX 965M, GeForce GTX 960M, GeForce GTX 950M, GeForce 940M, GeForce 930M, GeForce GTX 880M, GeForce GTX 870M, GeForce GTX 860M, GeForce GTX 850M, GeForce 845M, GeForce 840M, GeForce 830M, GeForce GTX 780M, GeForce GTX 770M, GeForce GTX 765M, GeForce GTX 760M, GeForce GT 750M, GeForce GT 745M, GeForce GT 740M, GeForce GT 735M, GeForce GT 730M, GeForce GTX 680MX, GeForce GTX 680M, GeForce GTX 675MX, GeForce GTX 675M, GeForce GTX 670MX, GeForce GTX 670M, GeForce GTX 660M, GeForce GT 650M, GeForce GT 645M, GeForce GT 640M, GeForce GTX 580M, GeForce GTX 570M, GeForce GTX 560M, GeForce GT 555M, GeForce GT 550M, GeForce GT 540M, GeForce GT 525M, GeForce GT 520M, GeForce GTX 480M, GeForce GTX 470M, GeForce GTX 460M, GeForce GT 445M, GeForce GT 435M, GeForce GT 425M, GeForce GT 420M, GeForce GT 415M, GeForce GTX 285M, GeForce GTX 280M, GeForce GTX 260M, GeForce GTS 360M, GeForce GTS 350M, GeForce GTS 260M, GeForce GTS 250M, GeForce GT 335M, GeForce GT 330M, GeForce GT 325M, GeForce GT 320M, GeForce 310M, GeForce GT 240M, GeForce GT 230M, GeForce GT 220M, GeForce G210M, GeForce GTS 160M, GeForce GTS 150M, GeForce GT 130M, GeForce GT 120M, GeForce G110M, GeForce G105M, GeForce G103M, GeForce G102M, GeForce G100, GeForce 9800M GTX, GeForce 9800M GTS, GeForce 9800M GT, GeForce 9800M GS, GeForce 9700M GTS, GeForce 9700M GT, GeForce 9650M GT, GeForce 9650M GS, GeForce 9600M GT, GeForce 9600M GS, GeForce 9500M GS, GeForce 9500M G, GeForce 9400M G, GeForce 9300M GS, GeForce 9300M G, GeForce 9200M GS, GeForce 9100M G, GeForce 8800M GTX, GeForce 8800M GTS, GeForce 8700M GT, GeForce 8600M GT, GeForce 8600M GS, GeForce 8400M GT, GeForce 8400M GS, GeForce 8400M G, GeForce 8200M G

Nvidia Quadro: Quadro M6000, Quadro M5000, Quadro M4000, Quadro K6000, Quadro K5200, Quadro K5000, Quadro K4200, Quadro K4000, Quadro K2200, Quadro K2000D, Quadro K2000, Quadro K1200, Quadro K620, Quadro K600, Quadro K420, Quadro 6000, Quadro 5000, Quadro 4000, Quadro 2000, Quadro 600, Quadro FX 5800, Quadro FX 5600, Quadro FX 4800, Quadro FX 4700 X2, Quadro FX 4600, Quadro FX 3800, Quadro FX 3700, Quadro FX 1800, Quadro FX 1700, Quadro FX 580, Quadro FX 570, Quadro FX 380, Quadro FX 370, Quadro NVS 510, Quadro NVS 450, Quadro NVS 420, Quadro NVS 295, Quadro Plex 1000 Model IV, Quadro Plex 1000 Model S4

Nvidia Quadro Mobile: Quadro K5100M, Quadro K5000M, Quadro K4100M, Quadro K4000M, Quadro K3100M, Quadro K3000M, Quadro K2100M, Quadro K2000M, Quadro K1100M, Quadro K1000M, Quadro K620M, Quadro K610M, Quadro K510M, Quadro K500M, Quadro 5010M, Quadro 5000M, Quadro 4000M, Quadro 3000M, Quadro 2000M, Quadro 1000M, Quadro FX 3800M, Quadro FX 3700M, Quadro FX 3600M, Quadro FX 2800M, Quadro FX 2700M, Quadro FX 1800M, Quadro FX 1700M, Quadro FX 1600M, Quadro FX 880M, Quadro FX 770M, Quadro FX 570M, Quadro FX 380M, Quadro FX 370M, Quadro FX 360M, Quadro NVS 320M, Quadro NVS 160M, Quadro NVS 150M, Quadro NVS 140M, Quadro NVS 135M, Quadro NVS 130M

Nvidia Tesla: Tesla K80, Tesla K40, Tesla K20X, Tesla K20, Tesla K10, Tesla C2050/2070, Tesla M2050/M2070, Tesla S2050, Tesla S1070, Tesla M1060, Tesla C1060, Tesla C870, Tesla D870, Tesla S870

Version features and specifications


Feature support (unlisted features are supported for all compute capabilities), listed with the compute capability (version) from which each feature is available:

Integer atomic functions operating on 32-bit words in global memory: 1.1 and above
atomicExch() operating on 32-bit floating point values in global memory: 1.1 and above
Integer atomic functions operating on 32-bit words in shared memory: 1.2 and above
atomicExch() operating on 32-bit floating point values in shared memory: 1.2 and above
Integer atomic functions operating on 64-bit words in global memory: 1.2 and above
Warp vote functions: 1.2 and above
Double-precision floating-point operations: 1.3 and above
Atomic functions operating on 64-bit integer values in shared memory: 2.x and above
Floating-point atomic addition operating on 32-bit words in global and shared memory: 2.x and above
__ballot(): 2.x and above
__threadfence_system(): 2.x and above
__syncthreads_count(), __syncthreads_and(), __syncthreads_or(): 2.x and above
Surface functions: 2.x and above
3D grid of thread block: 2.x and above
Warp shuffle functions: 3.0 and above
Funnel shift: 3.5 and above
Dynamic parallelism: 3.5 and above

Technical specifications, by compute capability (version) 1.0, 1.1, 1.2, 1.3, 2.x, 3.0, 3.5, 3.7, 5.0, 5.2. Where several values are listed, the value changed across compute capability versions; values are given in order from the earliest to the latest compute capability:

Maximum dimensionality of grid of thread blocks: 2 / 3
Maximum x-dimension of a grid of thread blocks: 65535 / 2^31 - 1
Maximum y- or z-dimension of a grid of thread blocks: 65535
Maximum dimensionality of thread block: 3
Maximum x- or y-dimension of a block: 512 / 1024
Maximum z-dimension of a block: 64
Maximum number of threads per block: 512 / 1024
Warp size: 32
Maximum number of resident blocks per multiprocessor: 8 / 16 / 32
Maximum number of resident warps per multiprocessor: 24 / 32 / 48 / 64
Maximum number of resident threads per multiprocessor: 768 / 1024 / 1536 / 2048
Number of 32-bit registers per multiprocessor: 8 K / 16 K / 32 K / 64 K / 128 K / 64 K
Maximum number of 32-bit registers per thread: 128 / 63 / 255
Maximum amount of shared memory per multiprocessor: 16 KB / 48 KB / 112 KB / 64 KB / 96 KB
Number of shared memory banks: 16 / 32
Amount of local memory per thread: 16 KB / 512 KB
Constant memory size: 64 KB
Cache working set per multiprocessor for constant memory: 8 KB / 10 KB
Cache working set per multiprocessor for texture memory: device dependent, between 6 KB and 8 KB / 12 KB / between 12 KB and 48 KB / 24 KB
Maximum width for 1D texture reference bound to a CUDA array: 8192 / 65536
Maximum width for 1D texture reference bound to linear memory: 2^27
Maximum width and number of layers for a 1D layered texture reference: 8192 x 512 / 16384 x 2048
Maximum width and height for 2D texture reference bound to a CUDA array: 65536 x 32768 / 65536 x 65535
Maximum width and height for 2D texture reference bound to linear memory: 65000 x 65000
Maximum width and height for 2D texture reference bound to a CUDA array supporting texture gather: N/A / 16384 x 16384
Maximum width, height, and number of layers for a 2D layered texture reference: 8192 x 8192 x 512 / 16384 x 16384 x 2048
Maximum width, height and depth for a 3D texture reference bound to linear memory or a CUDA array: 2048 x 2048 x 2048 / 4096 x 4096 x 4096
Maximum width (and height) for a cubemap texture reference: N/A / 16384
Maximum width (and height) and number of layers for a cubemap layered texture reference: N/A / 16384 x 2046
Maximum number of textures that can be bound to a kernel: 128 / 256
Maximum width for a 1D surface reference bound to a CUDA array: not supported / 65536
Maximum width and number of layers for a 1D layered surface reference: not supported / 65536 x 2048
Maximum width and height for a 2D surface reference bound to a CUDA array: not supported / 65536 x 32768
Maximum width, height, and number of layers for a 2D layered surface reference: not supported / 65536 x 32768 x 2048
Maximum width, height, and depth for a 3D surface reference bound to a CUDA array: not supported / 65536 x 32768 x 2048
Maximum width (and height) for a cubemap surface reference bound to a CUDA array: not supported / 32768
Maximum width (and height) and number of layers for a cubemap layered surface reference: not supported / 32768 x 2046
Maximum number of surfaces that can be bound to a kernel: 8 / 16
Maximum number of instructions per kernel: 2 million / 512 million

Architecture specifications, by compute capability (version) 1.0, 1.1, 1.2, 1.3, 2.0, 2.1, 3.0, 3.5, 3.7, 5.0, 5.2; where several values are listed, they are given in order from the earliest to the latest compute capability:

Number of ALU lanes for integer and floating-point arithmetic operations: 8[19] / 32 / 48 / 192 / 128
Number of special function units for single-precision floating-point transcendental functions: 2 / 4 / 8 / 32
Number of texture filtering units for every texture address unit or render output unit (ROP): 2 / 4 / 8 / 16 / 8
Number of warp schedulers: 1 / 2 / 4
Number of instructions issued at once by scheduler: 1 / 2[20]

For more information please visit this site: http://www.geeks3d.com/20100606/gpucomputingnvidiacudacomputecapabilitycomparativetable/ and also read the Nvidia CUDA programming guide.[21]

    Example

This example code in C++ loads a texture from an image into an array on the GPU:

// width, height, image and d_data are assumed to be declared elsewhere.
texture<float, 2, cudaReadModeElementType> tex;

__global__ void kernel(float* odata, int height, int width);

void foo()
{
    cudaArray* cu_array;

    // Allocate array
    cudaChannelFormatDesc description = cudaCreateChannelDesc<float>();
    cudaMallocArray(&cu_array, &description, width, height);

    // Copy image data to array
    cudaMemcpyToArray(cu_array, 0, 0, image, width * height * sizeof(float), cudaMemcpyHostToDevice);

    // Set texture parameters (default)
    tex.addressMode[0] = cudaAddressModeClamp;
    tex.addressMode[1] = cudaAddressModeClamp;
    tex.filterMode = cudaFilterModePoint;
    tex.normalized = false;    // do not normalize coordinates

    // Bind the array to the texture
    cudaBindTextureToArray(tex, cu_array);

    // Run kernel
    dim3 blockDim(16, 16, 1);
    dim3 gridDim((width + blockDim.x - 1) / blockDim.x, (height + blockDim.y - 1) / blockDim.y, 1);
    kernel<<<gridDim, blockDim, 0>>>(d_data, height, width);

    // Unbind the array from the texture
    cudaUnbindTexture(tex);
}  // end foo()

__global__ void kernel(float* odata, int height, int width)
{
    unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        // Read the texel at (x, y) and write it to the output array.
        float c = tex2D(tex, x, y);
        odata[y * width + x] = c;
    }
}


Language bindings

Mathematica: CUDALink (http://reference.wolfram.com/mathematica/CUDALink/tutorial/Overview.html)
MATLAB: Parallel Computing Toolbox, MATLAB Distributed Computing Server,[24] and 3rd party packages like Jacket
.NET: CUDA.NET (http://www.casshpc.com/solutions/libraries/cudanet), ManagedCUDA (https://managedcuda.codeplex.com), CUDAfy.NET (http://www.hybriddsp.com) .NET kernel and host code, CURAND, CUBLAS, CUFFT
Perl: KappaCUDA (http://psilambda.com/download/kappaforperl), CUDA::Minimal (https://github.com/run4flat/perlCUDAMinimal)
Python: Numba, NumbaPro, PyCUDA (http://mathema.tician.de/software/pycuda), KappaCUDA (http://psilambda.com/download/kappaforpython), Theano
Ruby: KappaCUDA (http://psilambda.com/download/kappaextras)
R: gputools (http://brainarray.mbni.med.umich.edu/brainarray/rgpgpu/)

Current and future usages of CUDA architecture

Accelerated rendering of 3D graphics
Accelerated interconversion of video file formats
Accelerated encryption, decryption and compression
Distributed calculations, such as predicting the native conformation of proteins
Medical analysis simulations, for example virtual reality based on CT and MRI scan images
Physical simulations, in particular in fluid dynamics
Neural network training in machine learning problems
Distributed computing
Molecular dynamics
Mining cryptocurrencies

See also

Allinea DDT: a debugger for CUDA, OpenACC, and parallel applications
OpenCL: a standard for programming a variety of platforms, including GPUs
BrookGPU: the Stanford University graphics group's compiler
Array programming
Parallel computing
Stream processing
rCUDA: an API for computing on remote computers
Molecular modeling on GPU

References

1. Shimpi, Anand Lal; Wilson, Derek (November 8, 2006). "NVIDIA's GeForce 8800 (G80): GPUs Re-architected for DirectX 10" (http://www.anandtech.com/show/2116/8). AnandTech. Retrieved May 16, 2015.
2. NVIDIA CUDA Home Page (http://www.nvidia.com/object/cuda_home_new.html)
3. Abi-Chahla, Fedy (June 18, 2008). "Nvidia's CUDA: The End of the CPU?" (http://www.tomshardware.com/reviews/nvidiacudagpu,1954.html). Tom's Hardware. Retrieved May 17, 2015.
4. CUDA LLVM Compiler (http://developer.nvidia.com/cuda/cudallvmcompiler)
5. First OpenCL demo on a GPU (https://www.youtube.com/watch?v=r1sN1ELJfNo) on YouTube
6. DirectCompute Ocean Demo Running on Nvidia CUDA-enabled GPU (https://www.youtube.com/watch?v=K1I4kts5mqc) on YouTube
7. Giorgos Vasiliadis; Spiros Antonatos; Michalis Polychronakis; Evangelos P. Markatos; Sotiris Ioannidis (September 2008). "Gnort: High Performance Network Intrusion Detection Using Graphics Processors" (http://www.ics.forth.gr/dcs/Activities/papers/gnort.raid08.pdf) (PDF). Proceedings of the 11th International Symposium on Recent Advances in Intrusion Detection (RAID).



8. Schatz, M. C.; Trapnell, C.; Delcher, A. L.; Varshney, A. (2007). "High-throughput sequence alignment using Graphics Processing Units" (http://www.biomedcentral.com/14712105/8/474). BMC Bioinformatics 8: 474. doi:10.1186/1471-2105-8-474. PMC 2222658 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2222658). PMID 18070356 (https://www.ncbi.nlm.nih.gov/pubmed/18070356).
9. Manavski, Svetlin A.; Giorgio Valle (2008). "CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment" (http://www.biomedcentral.com/14712105/9/S2/S10). BMC Bioinformatics 9: S10. doi:10.1186/1471-2105-9-S2-S10. PMC 2323659 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2323659). PMID 18387198 (https://www.ncbi.nlm.nih.gov/pubmed/18387198).
10. Pyrit, Google Code: https://code.google.com/p/pyrit/
11. Use your Nvidia GPU for scientific computing (http://boinc.berkeley.edu/cuda.php), BOINC official site (December 18, 2008)
12. Nvidia CUDA Software Development Kit (CUDA SDK) Release Notes Version 2.0 for MAC OS X (http://developer.download.nvidia.com/compute/cuda/sdk/website/doc/CUDA_SDK_release_notes_macosx.txt)
13. CUDA 1.1 Now on Mac OS X (http://news.developer.nvidia.com/2008/02/cuda11nowo.html) (Posted on Feb 14, 2008)
14. Silberstein, Mark; Schuster, Assaf; Geiger, Dan; Patney, Anjul; Owens, John D. (2008). Efficient computation of sum-products on GPUs through software-managed cache. Proceedings of the 22nd annual international conference on Supercomputing, ICS '08. pp. 309-318. doi:10.1145/1375527.1375572. ISBN 9781605581583.
15. NVCC forces c++ compilation of .cu files (https://devtalk.nvidia.com/default/topic/508479/cudaprogrammingandperformance/nvccforcesccompilationofcufiles/#entry1340190)
16. C++ keywords on CUDA C code (http://stackoverflow.com/questions/15362678/ckeywordsoncudaccode/15362798)
17. "CUDA-Enabled Products" (http://www.nvidia.com/object/cuda_learn_products.html). CUDA Zone. Nvidia Corporation. Retrieved 2008-11-03.
18. Whitehead, Nathan; Fit-Florea, Alex. "Precision & Performance: Floating Point and IEEE 754 Compliance for NVIDIA GPUs" (https://developer.nvidia.com/sites/default/files/akamai/cuda/files/NVIDIACUDAFloatingPoint.pdf) (PDF). Nvidia. Retrieved November 18, 2014.
19. ALUs perform only single-precision floating-point arithmetic. There is 1 double-precision floating-point unit.
20. No more than one scheduler can issue 2 instructions at once. The first scheduler is in charge of the warps with an odd ID and the second scheduler is in charge of the warps with an even ID.
21. Appendix F. Features and Technical Specifications (http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/CUDA_C_Programming_Guide.pdf) (PDF) (3.2 MiB), Page 148 of 175 (Version 5.0, October 2012)
22. PyCUDA (http://mathema.tician.de/software/pycuda)
23. pycublas (http://kered.org/blog/20090413/easypythonnumpycudacublas/)
24. "MATLAB Adds GPGPU Support" (http://www.hpcwire.com/features/MATLABAddsGPGPUSupport103307084.html). 2010-09-20.

External links

Official website (http://www.nvidia.com/object/cuda_home.html)
CUDA Community (https://plus.google.com/communities/114632076318201174454) on Google+
A little tool to adjust the VRAM size (https://devtalk.nvidia.com/default/topic/726765/needalittletooltoadjustthevramsize/)

Retrieved from "https://en.wikipedia.org/w/index.php?title=CUDA&oldid=674383050"

Categories: Computer physics engines, GPGPU, GPGPU libraries, Graphics hardware, Nvidia software, Parallel computing, Video cards, Video game hardware

This page was last modified on 3 August 2015, at 15:46. Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.