
    CUDA

A parallel computing platform and programming model

Developer(s): NVIDIA Corporation
Initial release: June 23, 2007
Stable release: 7.0 / March 17, 2015
Operating system: Windows XP and later, Mac OS X, Linux
Platform: Supported GPUs
Type: GPGPU
License: Freeware
Website: www.nvidia.com/object/cuda_home_new.html (http://www.nvidia.com/object/cuda_home_new.html)

From Wikipedia, the free encyclopedia

CUDA, which stands for Compute Unified Device Architecture,[1] is a parallel computing platform and application programming interface (API) model created by NVIDIA.[2] It allows software developers to use a CUDA-enabled graphics processing unit (GPU) for general-purpose processing, an approach known as GPGPU. The CUDA platform is a software layer that gives direct access to the GPU's virtual instruction set and parallel computational elements.[3]

The CUDA platform is designed to work with programming languages such as C, C++ and Fortran. This accessibility makes it easier for specialists in parallel programming to utilize GPU resources, as opposed to previous API solutions like Direct3D and OpenGL, which required advanced skills in graphics programming. CUDA also supports programming frameworks such as OpenACC and OpenCL.[3]

    Contents

1 Background
2 Programming capabilities
3 Advantages
4 Limitations
5 Supported GPUs
6 Version features and specifications
7 Example
8 Language bindings
9 Current and future usages of CUDA architecture
10 See also
11 References
12 External links


Example of CUDA processing flow:
1. Copy data from main memory to GPU memory
2. CPU instructs the GPU to start the process
3. The GPU's cores execute the process in parallel
4. Copy the result from GPU memory to main memory
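A minimal sketch of this four-step flow in CUDA C (the kernel, array size and names below are illustrative, not taken from the article):

// flow.cu -- a minimal sketch of the CUDA processing flow described above.
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;   // each thread handles one element
}

int main(void) {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *h = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    float *d;
    cudaMalloc(&d, bytes);                               // allocate GPU memory
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);     // 1. copy main memory -> GPU memory
    scale<<<(n + 255) / 256, 256>>>(d, n);               // 2./3. CPU launches the kernel, GPU runs it in parallel
    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);     // 4. copy the result GPU memory -> main memory

    printf("h[0] = %f\n", h[0]);                         // expect 2.0
    cudaFree(d);
    free(h);
    return 0;
}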

    Background

The GPU, as a specialized processor, addresses the demands of real-time, high-resolution 3D graphics, which are compute-intensive tasks. As of 2012, GPUs have evolved into highly parallel multi-core systems allowing very efficient manipulation of large blocks of data. This design is more effective than general-purpose CPUs for algorithms where processing of large blocks of data is done in parallel, such as:

- push-relabel maximum flow algorithm
- fast sort algorithms of large lists
- two-dimensional fast wavelet transform
- molecular dynamics simulations

Programming capabilities

The CUDA platform is accessible to software developers through CUDA-accelerated libraries, compiler directives such as OpenACC, and extensions to industry-standard programming languages including C, C++ and Fortran. C/C++ programmers use 'CUDA C/C++', compiled with "nvcc", NVIDIA's LLVM-based C/C++ compiler.[4] Fortran programmers can use 'CUDA Fortran', compiled with the PGI CUDA Fortran compiler from The Portland Group.
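As a hedged illustration of what such a source file looks like (the file name, kernel and launch configuration are examples, not from the article), a .cu file mixes ordinary host C++ with device code and is compiled with a command such as "nvcc hello.cu -o hello":

// hello.cu -- a minimal CUDA C/C++ source file (illustrative).
#include <cuda_runtime.h>
#include <cstdio>

// __global__ marks a function that runs on the GPU but is launched from the CPU.
// (Device-side printf requires compute capability 2.0 or later.)
__global__ void hello_kernel(void) {
    printf("Hello from block %d, thread %d\n", blockIdx.x, threadIdx.x);
}

int main(void) {
    hello_kernel<<<2, 4>>>();     // launch 2 blocks of 4 threads each
    cudaDeviceSynchronize();      // wait for the GPU output to complete
    return 0;
}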

In addition to libraries, compiler directives, CUDA C/C++ and CUDA Fortran, the CUDA platform supports other computational interfaces, including the Khronos Group's OpenCL,[5] Microsoft's DirectCompute, OpenGL Compute Shaders (http://www.opengl.org/wiki/Compute_Shader) and C++ AMP.[6] Third-party wrappers are also available for Python, Perl, Fortran, Java, Ruby, Lua, Haskell, R, MATLAB and IDL, and native support exists in Mathematica.

In the computer game industry, GPUs are used not only for graphics rendering but also for game physics calculations (physical effects such as debris, smoke, fire, fluids); examples include PhysX and Bullet. CUDA has also been used to accelerate non-graphical applications in computational biology, cryptography and other fields by an order of magnitude or more.[7][8][9][10][11]

CUDA provides both a low-level API and a higher-level API. The initial CUDA SDK was made public on 15 February 2007, for Microsoft Windows and Linux. Mac OS X support was later added in version 2.0,[12] which supersedes the beta released February 14, 2008.[13] CUDA works with all Nvidia GPUs from the G8x series onwards, including GeForce, Quadro and the Tesla line. CUDA is compatible with most standard operating systems. Nvidia states that programs developed for the G8x series will also work without modification on all future Nvidia video cards, due to binary compatibility.

    Advantages


CUDA has several advantages over traditional general-purpose computation on GPUs (GPGPU) using graphics APIs:

- Scattered reads: code can read from arbitrary addresses in memory
- Unified virtual memory (CUDA 4.0 and above)
- Unified memory (CUDA 6.0 and above)
- Shared memory: CUDA exposes a fast shared memory region that can be shared amongst threads. This can be used as a user-managed cache, enabling higher bandwidth than is possible using texture lookups (a minimal sketch follows this list).[14]
- Faster downloads and readbacks to and from the GPU
- Full support for integer and bitwise operations, including integer texture lookups
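A minimal sketch of the shared-memory point above (the kernel name, the 256-thread tile size and the zero padding at the array ends are illustrative assumptions):

// shared.cu -- using __shared__ memory as a user-managed cache (illustrative).
#include <cuda_runtime.h>
#include <stdio.h>

// Each block stages a 256-element tile plus a 1-element halo on each side in
// on-chip shared memory; every thread then averages its neighbours from the
// fast tile instead of re-reading global memory. Assumes blockDim.x == 256.
__global__ void blur1d(const float *in, float *out, int n) {
    __shared__ float tile[258];
    int g = blockIdx.x * blockDim.x + threadIdx.x;   // global index
    int l = threadIdx.x + 1;                         // index inside the tile

    tile[l] = (g < n) ? in[g] : 0.0f;
    if (threadIdx.x == 0)              tile[0]   = (g > 0)     ? in[g - 1] : 0.0f;
    if (threadIdx.x == blockDim.x - 1) tile[257] = (g + 1 < n) ? in[g + 1] : 0.0f;
    __syncthreads();                                 // wait until the whole tile is staged

    if (g < n) out[g] = (tile[l - 1] + tile[l] + tile[l + 1]) / 3.0f;
}

int main(void) {
    const int n = 1024;
    float h[1024], *d_in, *d_out;
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h, n * sizeof(float), cudaMemcpyHostToDevice);

    blur1d<<<(n + 255) / 256, 256>>>(d_in, d_out, n);

    cudaMemcpy(h, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("out[1] = %f\n", h[1]);   // average of 0, 1, 2 = 1.0

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}

The __syncthreads() barrier is what makes the tile safe to read: every thread waits until the whole block has finished staging data into shared memory.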

    Limitations

- CUDA does not support the full C standard, as it runs host code through a C++ compiler, which makes some valid C (but invalid C++) code fail to compile.[15][16]
- Interoperability with rendering languages such as OpenGL is one-way, with OpenGL having access to registered CUDA memory but CUDA not having access to OpenGL memory.
- Copying between host and device memory may incur a performance hit due to system bus bandwidth and latency (this can be partly alleviated with asynchronous memory transfers, handled by the GPU's DMA engine; see the sketch after this list).
- Threads should be running in groups of at least 32 for best performance, with the total number of threads numbering in the thousands. Branches in the program code do not affect performance significantly, provided that each of 32 threads takes the same execution path; the SIMD execution model becomes a significant limitation for any inherently divergent task (e.g. traversing a space partitioning data structure during ray tracing).
- Unlike OpenCL, CUDA-enabled GPUs are only available from Nvidia.[17]
- No emulator or fallback functionality is available for modern revisions.
- Valid C/C++ may sometimes be flagged and prevent compilation due to optimization techniques the compiler is required to employ to use limited resources.
- A single process must run spread across multiple disjoint memory spaces, unlike other C language runtime environments.
- C++ Run-Time Type Information (RTTI) is not supported in CUDA code, due to lack of support in the underlying hardware.
- Exception handling is not supported in CUDA code due to the performance overhead that would be incurred with many thousands of parallel threads running.
- CUDA (with compute capability 2.x) allows a subset of C++ class functionality; for example, member functions may not be virtual (this restriction will be removed in some future release). [See CUDA C Programming Guide 3.1, Appendix D.6]
- In single precision on first-generation CUDA compute capability 1.x devices, denormal numbers are not supported and are instead flushed to zero, and the precision of the division and square root operations is slightly lower than IEEE 754-compliant single-precision math. Devices that support compute capability 2.0 and above support denormal numbers, and the division and square root operations are IEEE 754 compliant by default. However, users can obtain the previous faster gaming-grade math of compute capability 1.x devices if desired by setting compiler flags to disable accurate divisions, disable accurate square roots, and enable flushing denormal numbers to zero.[18]
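A minimal sketch of the asynchronous-transfer mitigation mentioned in the list (names and sizes are illustrative; pinned host memory is assumed so the GPU's DMA engine can be used):

// async.cu -- overlapping host<->device copies with kernel work in a stream (illustrative).
#include <cuda_runtime.h>

__global__ void add_one(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 1.0f;
}

int main(void) {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *h, *d;
    cudaMallocHost(&h, bytes);        // pinned (page-locked) host buffer, required for true async copies
    cudaMalloc(&d, bytes);
    for (int i = 0; i < n; ++i) h[i] = 0.0f;

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    cudaMemcpyAsync(d, h, bytes, cudaMemcpyHostToDevice, stream);   // copy returns immediately to the host
    add_one<<<(n + 255) / 256, 256, 0, stream>>>(d, n);             // queued behind the copy in the same stream
    cudaMemcpyAsync(h, d, bytes, cudaMemcpyDeviceToHost, stream);

    cudaStreamSynchronize(stream);    // wait for the copy-kernel-copy pipeline to finish

    cudaStreamDestroy(stream);
    cudaFree(d);
    cudaFreeHost(h);
    return 0;
}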

Supported GPUs

Compute capability table (version of CUDA supported) by GPU and card; also available directly from Nvidia (http://developer.nvidia.com/cudagpus). A short sketch for querying a device's compute capability at run time follows the table.


Compute capability (version), microarchitecture, GPUs and cards:

Compute capability 1.0 (Tesla microarchitecture)
GPUs: G80, G92, G92b, G94, G94b
Cards: GeForce GT 420*, GeForce 8800 Ultra, GeForce 8800 GTX, GeForce GT 340*, GeForce GT 330*, GeForce GT 320*, GeForce 315*, GeForce 310*, GeForce 9800 GT, GeForce 9600 GT, GeForce 9400 GT, Quadro FX 5600, Quadro FX 4600, Quadro Plex 2100 S4, Tesla C870, Tesla D870, Tesla S870

Compute capability 1.1 (Tesla microarchitecture)
GPUs: G86, G84, G98, G96, G96b, G94, G94b, G92, G92b
Cards: GeForce G110M, GeForce 9300M GS, GeForce 9200M GS, GeForce 9100M G, GeForce 8400M GT, GeForce 8600 GT, GeForce 8600 GTS, GeForce G105M, Quadro FX 4700 X2, Quadro FX 3700, Quadro FX 1800, Quadro FX 1700, Quadro FX 580, Quadro FX 570, Quadro FX 470, Quadro FX 380, Quadro FX 370, Quadro FX 370 Low Profile, Quadro NVS 450, Quadro NVS 420, Quadro NVS 290, Quadro NVS 295, Quadro Plex 2100 D4, Quadro FX 3800M, Quadro FX 3700M, Quadro FX 3600M, Quadro FX 2800M, Quadro FX 2700M, Quadro FX 1700M, Quadro FX 1600M, Quadro FX 770M, Quadro FX 570M, Quadro FX 370M, Quadro FX 360M, Quadro NVS 320M, Quadro NVS 160M, Quadro NVS 150M, Quadro NVS 140M, Quadro NVS 135M, Quadro NVS 130M, Quadro NVS 450, Quadro NVS 420, Quadro NVS 295

Compute capability 1.2 (Tesla microarchitecture)
GPUs: GT218, GT216, GT215
Cards: GeForce GT 240, GeForce GT 220*, GeForce 210*, GeForce GTS 360M, GeForce GTS 350M, GeForce GT 335M, GeForce GT 330M, GeForce GT 325M, GeForce GT 240M, GeForce G210M, GeForce 310M, GeForce 305M, Quadro FX 380 Low Profile, NVIDIA NVS 300, Quadro FX 1800M, Quadro FX 880M, Quadro FX 380M, NVIDIA NVS 300, NVS 5100M, NVS 3100M, NVS 2100M, ION

Compute capability 1.3 (Tesla microarchitecture)
GPUs: GT200, GT200b
Cards: GeForce GTX 280, GeForce GTX 275, GeForce GTX 260, Quadro FX 5800, Quadro FX 4800, Quadro FX 4800 for Mac, Quadro FX 3800, Quadro CX, Quadro Plex 2200 D2, Tesla C1060, Tesla S1070, Tesla M1060

Compute capability 2.0 (Fermi microarchitecture)
GPUs: GF100, GF110
Cards: GeForce GTX 590, GeForce GTX 580, GeForce GTX 570, GeForce GTX 480, GeForce GTX 470, GeForce GTX 465, GeForce GTX 480M, Quadro 6000, Quadro 5000, Quadro 4000, Quadro 4000 for Mac, Quadro Plex 7000, Quadro 5010M, Quadro 5000M, Tesla C2075, Tesla C2050/C2070, Tesla M2050/M2070/M2075/M2090

Compute capability 2.1 (Fermi microarchitecture)
GPUs: GF104, GF106, GF108, GF114, GF116, GF119
Cards: GeForce GTX 560 Ti, GeForce GTX 550 Ti, GeForce GTX 460, GeForce GTS 450, GeForce GTS 450*, GeForce GT 640 (GDDR3), GeForce GT 630, GeForce GT 620, GeForce GT 610, GeForce GT 520, GeForce GT 440, GeForce GT 440*, GeForce GT 430, GeForce GT 430*, GeForce GTX 675M, GeForce GTX 670M, GeForce GT 635M, GeForce GT 630M, GeForce GT 625M, GeForce GT 720M, GeForce GT 620M, GeForce 710M, GeForce 610M, GeForce GTX 580M, GeForce GTX 570M, GeForce GTX 560M, GeForce GT 555M, GeForce GT 550M, GeForce GT 540M, GeForce GT 525M, GeForce GT 520MX, GeForce GT 520M, GeForce GTX 485M, GeForce GTX 470M, GeForce GTX 460M, GeForce GT 445M, GeForce GT 435M, GeForce GT 420M, GeForce GT 415M, GeForce 710M, GeForce 410M, Quadro 2000, Quadro 2000D, Quadro 600, Quadro 410, Quadro 4000M, Quadro 3000M, Quadro 2000M, Quadro 1000M, NVS 5400M, NVS 5200M, NVS 4200M

Compute capability 3.0 (Kepler microarchitecture)
GPUs: GK104, GK106, GK107
Cards: GeForce GTX 770, GeForce GTX 760, GeForce GT 740, GeForce GTX 690, GeForce GTX 680, GeForce GTX 670, GeForce GTX 660 Ti, GeForce GTX 660, GeForce GTX 650 Ti BOOST, GeForce GTX 650 Ti, GeForce GTX 650, GeForce GTX 880M, GeForce GTX 780M, GeForce GTX 770M, GeForce GTX 765M, GeForce GTX 760M, GeForce GTX 680MX, GeForce GTX 680M, GeForce GTX 675MX, GeForce GTX 670MX, GeForce GTX 660M, GeForce GT 750M, GeForce GT 650M, GeForce GT 745M, GeForce GT 645M, GeForce GT 740M, GeForce GT 730M, GeForce GT 640M, GeForce GT 640M LE, GeForce GT 735M, GeForce GT 730M, Quadro K5000, Quadro K4200, Quadro K4000, Quadro K2000, Quadro K2000D, Quadro K600, Quadro K420, Quadro K500M, Quadro K510M, Quadro K610M, Quadro K1000M, Quadro K2000M, Quadro K1100M, Quadro K2100M, Quadro K3000M, Quadro K3100M, Quadro K4000M, Quadro K5000M, Quadro K4100M, Quadro K5100M, Tesla K10

Compute capability 3.2 (Kepler microarchitecture)
GPUs: Tegra K1
Cards: Jetson TK1 (SoC)

Compute capability 3.5 (Kepler microarchitecture)
GPUs: GK110, GK208
Cards: GeForce GTX TITAN Z, GeForce GTX TITAN Black, GeForce GTX TITAN, GeForce GTX 780 Ti, GeForce GTX 780, GeForce GT 640 (GDDR5), GeForce GT 630 v2, GeForce GT 730, GeForce GT 720, Quadro K6000, Quadro K5200, Tesla K40, Tesla K20x, Tesla K20

Compute capability 3.7 (Kepler microarchitecture)
GPUs: GK210
Cards: Tesla K80

Compute capability 5.0 (Maxwell microarchitecture)
GPUs: GM107, GM108
Cards: GeForce GTX 750 Ti, GeForce GTX 750, GeForce GTX 960M, GeForce GTX 950M, GeForce 940M, GeForce 930M, GeForce GTX 860M, GeForce GTX 850M, GeForce 845M, GeForce 840M, GeForce 830M, Quadro K2200, Quadro K1200, Quadro K620, Quadro K620M

Compute capability 5.2 (Maxwell microarchitecture)
GPUs: GM200, GM204, GM206
Cards: GeForce GTX TITAN X, GeForce GTX 980 Ti, GeForce GTX 980, GeForce GTX 970, GeForce GTX 960, GeForce GTX 950, GeForce GTX 980M, GeForce GTX 970M, GeForce GTX 965M, Quadro M6000, Quadro M5000, Quadro M4000

Compute capability 5.3 (Maxwell microarchitecture)
GPUs: Tegra X1

'*' OEM-only products
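Rather than consulting the table, an application can query the compute capability of the installed devices at run time; a minimal sketch (output format is illustrative):

// devquery.cu -- print each device's name and compute capability (illustrative).
#include <cuda_runtime.h>
#include <stdio.h>

int main(void) {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("Device %d: %s, compute capability %d.%d\n",
               dev, prop.name, prop.major, prop.minor);
    }
    return 0;
}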

A table of devices officially supporting CUDA:[17]

Nvidia GeForce: GeForce GTX TITAN X, GeForce GTX 980 Ti, GeForce GTX 980, GeForce GTX 970, GeForce GTX 960, GeForce GTX 950, GeForce GTX Titan Z, GeForce GTX TITAN Black, GeForce GTX TITAN, GeForce GTX 780 Ti, GeForce GTX 780, GeForce GTX 770, GeForce GTX 760, GeForce GTX 750 Ti, GeForce GTX 750, GeForce GT 740, GeForce GT 730, GeForce GTX 690, GeForce GTX 680, GeForce GTX 670, GeForce GTX 660 Ti, GeForce GTX 660, GeForce GTX 650 Ti BOOST, GeForce GTX 650 Ti, GeForce GTX 650, GeForce GT 640, GeForce GT 630, GeForce GT 620, GeForce GT 610, GeForce GTX 590, GeForce GTX 580, GeForce GTX 570, GeForce GTX 560 Ti, GeForce GTX 560, GeForce GTX 550 Ti, GeForce GT 520, GeForce GTX 480, GeForce GTX 470, GeForce GTX 465, GeForce GTX 460, GeForce GTX 460 SE, GeForce GTS 450, GeForce GT 440, GeForce GT 430, GeForce GT 420, GeForce GTX 295, GeForce GTX 285, GeForce GTX 280, GeForce GTX 275, GeForce GTX 260, GeForce GTS 250, GeForce GTS 240, GeForce GT 240, GeForce GT 220, GeForce 210/G210, GeForce GT 140, GeForce 9800 GX2, GeForce 9800 GTX+, GeForce 9800 GTX, GeForce 9800 GT, GeForce 9600 GSO, GeForce 9600 GT, GeForce 9500 GT, GeForce 9400 GT, GeForce 9400 mGPU, GeForce 9300 mGPU, GeForce 9100 mGPU, GeForce 8800 Ultra, GeForce 8800 GTX, GeForce 8800 GTS, GeForce 8800 GT, GeForce 8800 GS, GeForce 8600 GTS, GeForce 8600 GT, GeForce 8600m GT, GeForce 8500 GT, GeForce 8400 GS, GeForce 8300 mGPU, GeForce 8200 mGPU, GeForce 8100 mGPU

Nvidia GeForce Mobile: GeForce GTX 980M, GeForce GTX 970M, GeForce GTX 965M, GeForce GTX 960M, GeForce GTX 950M, GeForce 940M, GeForce 930M, GeForce GTX 880M, GeForce GTX 870M, GeForce GTX 860M, GeForce GTX 850M, GeForce 845M, GeForce 840M, GeForce 830M, GeForce GTX 780M, GeForce GTX 770M, GeForce GTX 765M, GeForce GTX 760M, GeForce GT 750M, GeForce GT 745M, GeForce GT 740M, GeForce GT 735M, GeForce GT 730M, GeForce GTX 680MX, GeForce GTX 680M, GeForce GTX 675MX, GeForce GTX 675M, GeForce GTX 670MX, GeForce GTX 670M, GeForce GTX 660M, GeForce GT 650M, GeForce GT 645M, GeForce GT 640M, GeForce GTX 580M, GeForce GTX 570M, GeForce GTX 560M, GeForce GT 555M, GeForce GT 550M, GeForce GT 540M, GeForce GT 525M, GeForce GT 520M, GeForce GTX 480M, GeForce GTX 470M, GeForce GTX 460M, GeForce GT 445M, GeForce GT 435M, GeForce GT 425M, GeForce GT 420M, GeForce GT 415M, GeForce GTX 285M, GeForce GTX 280M, GeForce GTX 260M, GeForce GTS 360M, GeForce GTS 350M, GeForce GTS 260M, GeForce GTS 250M, GeForce GT 335M, GeForce GT 330M, GeForce GT 325M, GeForce GT 320M, GeForce 310M, GeForce GT 240M, GeForce GT 230M, GeForce GT 220M, GeForce G210M, GeForce GTS 160M, GeForce GTS 150M, GeForce GT 130M, GeForce GT 120M, GeForce G110M, GeForce G105M, GeForce G103M, GeForce G102M, GeForce G100, GeForce 9800M GTX, GeForce 9800M GTS, GeForce 9800M GT, GeForce 9800M GS, GeForce 9700M GTS, GeForce 9700M GT, GeForce 9650M GT, GeForce 9650M GS, GeForce 9600M GT, GeForce 9600M GS, GeForce 9500M GS, GeForce 9500M G, GeForce 9400M G, GeForce 9300M GS, GeForce 9300M G, GeForce 9200M GS, GeForce 9100M G, GeForce 8800M GTX, GeForce 8800M GTS, GeForce 8700M GT, GeForce 8600M GT, GeForce 8600M GS, GeForce 8400M GT, GeForce 8400M GS, GeForce 8400M G, GeForce 8200M G

Nvidia Quadro: Quadro M6000, Quadro M5000, Quadro M4000, Quadro K6000, Quadro K5200, Quadro K5000, Quadro K4200, Quadro K4000, Quadro K2200, Quadro K2000D, Quadro K2000, Quadro K1200, Quadro K620, Quadro K600, Quadro K420, Quadro 6000, Quadro 5000, Quadro 4000, Quadro 2000, Quadro 600, Quadro FX 5800, Quadro FX 5600, Quadro FX 4800, Quadro FX 4700 X2, Quadro FX 4600, Quadro FX 3800, Quadro FX 3700, Quadro FX 1800, Quadro FX 1700, Quadro FX 580, Quadro FX 570, Quadro FX 380, Quadro FX 370, Quadro NVS 510, Quadro NVS 450, Quadro NVS 420, Quadro NVS 295, Quadro Plex 1000 Model IV, Quadro Plex 1000 Model S4

Nvidia Quadro Mobile: Quadro K5100M, Quadro K5000M, Quadro K4100M, Quadro K4000M, Quadro K3100M, Quadro K3000M, Quadro K2100M, Quadro K2000M, Quadro K1100M, Quadro K1000M, Quadro K620M, Quadro K610M, Quadro K510M, Quadro K500M, Quadro 5010M, Quadro 5000M, Quadro 4000M, Quadro 3000M, Quadro 2000M, Quadro 1000M, Quadro FX 3800M, Quadro FX 3700M, Quadro FX 3600M, Quadro FX 2800M, Quadro FX 2700M, Quadro FX 1800M, Quadro FX 1700M, Quadro FX 1600M, Quadro FX 880M, Quadro FX 770M, Quadro FX 570M, Quadro FX 380M, Quadro FX 370M, Quadro FX 360M, Quadro NVS 320M, Quadro NVS 160M, Quadro NVS 150M, Quadro NVS 140M, Quadro NVS 135M, Quadro NVS 130M

Nvidia Tesla: Tesla K80, Tesla K40, Tesla K20X, Tesla K20, Tesla K10, Tesla C2050/2070, Tesla M2050/M2070, Tesla S2050, Tesla S1070, Tesla M1060, Tesla C1060, Tesla C870, Tesla D870, Tesla S870

Version features and specifications


Feature support (unlisted features are supported for all compute capabilities), by compute capability (version):

- Integer atomic functions operating on 32-bit words in global memory: No for 1.0; Yes for 1.1 and above
- atomicExch() operating on 32-bit floating-point values in global memory: No for 1.0; Yes for 1.1 and above
- Integer atomic functions operating on 32-bit words in shared memory: No for 1.0-1.1; Yes for 1.2 and above
- atomicExch() operating on 32-bit floating-point values in shared memory: No for 1.0-1.1; Yes for 1.2 and above
- Integer atomic functions operating on 64-bit words in global memory: No for 1.0-1.1; Yes for 1.2 and above
- Warp vote functions: No for 1.0-1.1; Yes for 1.2 and above
- Double-precision floating-point operations: No for 1.0-1.2; Yes for 1.3 and above
- Atomic functions operating on 64-bit integer values in shared memory: No for 1.x; Yes for 2.x and above
- Floating-point atomic addition operating on 32-bit words in global and shared memory: No for 1.x; Yes for 2.x and above
- _ballot(): No for 1.x; Yes for 2.x and above
- _threadfence_system(): No for 1.x; Yes for 2.x and above
- _syncthreads_count(), _syncthreads_and(), _syncthreads_or(): No for 1.x; Yes for 2.x and above
- Surface functions: No for 1.x; Yes for 2.x and above
- 3D grid of thread blocks: No for 1.x; Yes for 2.x and above
- Warp shuffle functions: No for 1.x-2.x; Yes for 3.0 and above
- Funnel shift: No below 3.5; Yes for 3.5 and above
- Dynamic parallelism: No below 3.5; Yes for 3.5 and above
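As a hedged illustration of two entries from the table above, the sketch below combines warp shuffle functions (compute capability 3.0 and above, so it should be compiled for such a target, e.g. nvcc -arch=sm_30) with floating-point atomic addition on global memory (compute capability 2.0 and above); all names and sizes are illustrative:

// reduce.cu -- warp-shuffle partial sums combined with atomicAdd (illustrative).
#include <cuda_runtime.h>
#include <stdio.h>

// Warp-level sum using shuffle; lane 0 ends up with the warp's total.
__device__ float warp_sum(float val) {
    for (int offset = 16; offset > 0; offset /= 2)
        val += __shfl_down(val, offset);   // add the value held 'offset' lanes away
    return val;
}

// Each warp reduces its 32 values, then lane 0 adds the partial sum atomically.
__global__ void sum_kernel(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    float v = (i < n) ? in[i] : 0.0f;
    v = warp_sum(v);
    if ((threadIdx.x & 31) == 0) atomicAdd(out, v);
}

int main(void) {
    const int n = 1024;
    float h_in[1024], h_out = 0.0f, *d_in, *d_out;
    for (int i = 0; i < n; ++i) h_in[i] = 1.0f;

    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemset(d_out, 0, sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    sum_kernel<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
    cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("sum = %f (expected %d)\n", h_out, n);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}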

Technical specifications, by compute capability (version):

- Maximum dimensionality of grid of thread blocks: 2 (1.x), 3 (2.0 and above)
- Maximum x-dimension of a grid of thread blocks: 65535 (1.x-2.x), 2^31 - 1 (3.0 and above)
- Maximum y- or z-dimension of a grid of thread blocks: 65535
- Maximum dimensionality of a thread block: 3
- Maximum x- or y-dimension of a block: 512 (1.x), 1024 (2.0 and above)
- Maximum z-dimension of a block: 64
- Maximum number of threads per block: 512 (1.x), 1024 (2.0 and above)
- Warp size: 32
- Maximum number of resident blocks per multiprocessor: 8 (1.x-2.x), 16 (3.x), 32 (5.x)
- Maximum number of resident warps per multiprocessor: 24 (1.0-1.1), 32 (1.2-1.3), 48 (2.x), 64 (3.0 and above)
- Maximum number of resident threads per multiprocessor: 768 (1.0-1.1), 1024 (1.2-1.3), 1536 (2.x), 2048 (3.0 and above)
- Number of 32-bit registers per multiprocessor: 8 K (1.0-1.1), 16 K (1.2-1.3), 32 K (2.x), 64 K (3.0-3.5), 128 K (3.7), 64 K (5.x)
- Maximum number of 32-bit registers per thread: 128 (1.x), 63 (2.x-3.0), 255 (3.5 and above)
- Maximum amount of shared memory per multiprocessor: 16 KB (1.x), 48 KB (2.x-3.5), 112 KB (3.7), 64 KB (5.0), 96 KB (5.2)
- Number of shared memory banks: 16 (1.x), 32 (2.0 and above)
- Amount of local memory per thread: 16 KB (1.x), 512 KB (2.0 and above)
- Constant memory size: 64 KB
- Cache working set per multiprocessor for constant memory: 8 KB on earlier compute capabilities, 10 KB on the latest ones listed
- Cache working set per multiprocessor for texture memory: device dependent, between 6 KB and 8 KB (1.x); 12 KB (2.x); between 12 KB and 48 KB (3.x); 24 KB (5.x)
- Maximum width for a 1D texture reference bound to a CUDA array: 8192 (1.x), 65536 (2.0 and above)
- Maximum width for a 1D texture reference bound to linear memory: 2^27
- Maximum width and number of layers for a 1D layered texture reference: 8192 x 512 (1.x), 16384 x 2048 (2.0 and above)
- Maximum width and height for a 2D texture reference bound to a CUDA array: 65536 x 32768 (1.x), 65536 x 65535 (2.0 and above)
- Maximum width and height for a 2D texture reference bound to linear memory: 65000 x 65000
- Maximum width and height for a 2D texture reference bound to a CUDA array supporting texture gather: N/A (1.x), 16384 x 16384 (2.0 and above)
- Maximum width, height, and number of layers for a 2D layered texture reference: 8192 x 8192 x 512 (1.x), 16384 x 16384 x 2048 (2.0 and above)
- Maximum width, height and depth for a 3D texture reference bound to linear memory or a CUDA array: 2048 x 2048 x 2048 (1.x), 4096 x 4096 x 4096 (2.0 and above)
- Maximum width (and height) for a cubemap texture reference: N/A (1.x), 16384 (2.0 and above)
- Maximum width (and height) and number of layers for a cubemap layered texture reference: N/A (1.x), 16384 x 2046 (2.0 and above)
- Maximum number of textures that can be bound to a kernel: 128 (1.x), 256 (2.0 and above)
- Maximum width for a 1D surface reference bound to a CUDA array: not supported (1.x), 65536 (2.0 and above)
- Maximum width and number of layers for a 1D layered surface reference: not supported (1.x), 65536 x 2048 (2.0 and above)
- Maximum width and height for a 2D surface reference bound to a CUDA array: not supported (1.x), 65536 x 32768 (2.0 and above)
- Maximum width, height, and number of layers for a 2D layered surface reference: not supported (1.x), 65536 x 32768 x 2048 (2.0 and above)
- Maximum width, height, and depth for a 3D surface reference bound to a CUDA array: not supported (1.x), 65536 x 32768 x 2048 (2.0 and above)
- Maximum width (and height) for a cubemap surface reference bound to a CUDA array: not supported (1.x), 32768 (2.0 and above)
- Maximum width (and height) and number of layers for a cubemap layered surface reference: not supported (1.x), 32768 x 2046 (2.0 and above)
- Maximum number of surfaces that can be bound to a kernel: 8 (2.x), 16 (3.0 and above)
- Maximum number of instructions per kernel: 2 million (1.x), 512 million (2.0 and above)

Architecture specifications, by compute capability (version):

- Number of ALU lanes for integer and floating-point arithmetic operations: 8 (1.x),[19] 32 (2.0), 48 (2.1), 192 (3.x), 128 (5.x)
- Number of special function units for single-precision floating-point transcendental functions: 2 (1.x), 4 (2.0), 8 (2.1), 32 (3.x and 5.x)
- Number of texture filtering units for every texture address unit or render output unit (ROP): 2 (1.x), 4 (2.0), 8 (2.1), 16 (3.x), 8 (5.x)
- Number of warp schedulers: 1 (1.x), 2 (2.x), 4 (3.x and 5.x)
- Number of instructions issued at once by a scheduler: 1 (1.x and 2.0), 2 (2.1 and above)[20]

For more information, please visit this site: http://www.geeks3d.com/20100606/gpucomputingnvidiacudacomputecapabilitycomparativetable/ and also read the Nvidia CUDA programming guide.[21]

    Example

This example code in C++ loads a texture from an image into an array on the GPU:

texture<float, 2, cudaReadModeElementType> tex;

__global__ void kernel(float* odata, int height, int width);  // forward declaration

// width, height, image and d_data are assumed to be defined elsewhere.
void foo()
{
    cudaArray* cu_array;

    // Allocate array
    cudaChannelFormatDesc description = cudaCreateChannelDesc<float>();
    cudaMallocArray(&cu_array, &description, width, height);

    // Copy image data to array
    cudaMemcpyToArray(cu_array, 0, 0, image, width * height * sizeof(float), cudaMemcpyHostToDevice);

    // Set texture parameters (default)
    tex.addressMode[0] = cudaAddressModeClamp;
    tex.addressMode[1] = cudaAddressModeClamp;
    tex.filterMode = cudaFilterModePoint;
    tex.normalized = false;  // do not normalize coordinates

    // Bind the array to the texture
    cudaBindTextureToArray(tex, cu_array);

    // Run kernel
    dim3 blockDim(16, 16, 1);
    dim3 gridDim((width + blockDim.x - 1) / blockDim.x, (height + blockDim.y - 1) / blockDim.y, 1);
    kernel<<<gridDim, blockDim, 0>>>(d_data, height, width);

    // Unbind the array from the texture
    cudaUnbindTexture(tex);
}  // end foo()

__global__ void kernel(float* odata, int height, int width)
{
    unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height) {
        float c = tex2D(tex, x, y);  // read from the bound texture
        odata[y * width + x] = c;
    }
}


Language bindings

- Mathematica: CUDALink (http://reference.wolfram.com/mathematica/CUDALink/tutorial/Overview.html)
- MATLAB: Parallel Computing Toolbox, MATLAB Distributed Computing Server,[24] and 3rd party packages like Jacket.
- .NET: CUDA.NET (http://www.casshpc.com/solutions/libraries/cudanet), ManagedCUDA (https://managedcuda.codeplex.com), CUDAfy.NET (http://www.hybriddsp.com) .NET kernel and host code, CURAND, CUBLAS, CUFFT
- Perl: KappaCUDA (http://psilambda.com/download/kappaforperl), CUDA::Minimal (https://github.com/run4flat/perlCUDAMinimal)
- Python: Numba, NumbaPro, PyCUDA (http://mathema.tician.de/software/pycuda), KappaCUDA (http://psilambda.com/download/kappaforpython), Theano
- Ruby: KappaCUDA (http://psilambda.com/download/kappaextras)
- R: gputools (http://brainarray.mbni.med.umich.edu/brainarray/rgpgpu/)

Current and future usages of CUDA architecture

- Accelerated rendering of 3D graphics
- Accelerated interconversion of video file formats
- Accelerated encryption, decryption and compression
- Distributed calculations, such as predicting the native conformation of proteins
- Medical analysis simulations, for example virtual reality based on CT and MRI scan images
- Physical simulations, in particular in fluid dynamics
- Neural network training in machine learning problems
- Distributed computing
- Molecular dynamics
- Mining cryptocurrencies

See also

- Allinea DDT - a debugger for CUDA, OpenACC, and parallel applications
- OpenCL - a standard for programming a variety of platforms, including GPUs
- BrookGPU - the Stanford University graphics group's compiler
- Array programming
- Parallel computing
- Stream processing
- rCUDA - an API for computing on remote computers
- Molecular modeling on GPU

References

1. Shimpi, Anand Lal; Wilson, Derek (November 8, 2006). "NVIDIA's GeForce 8800 (G80): GPUs Re-architected for DirectX 10" (http://www.anandtech.com/show/2116/8). AnandTech. Retrieved May 16, 2015.
2. NVIDIA CUDA Home Page (http://www.nvidia.com/object/cuda_home_new.html)
3. Abi-Chahla, Fedy (June 18, 2008). "Nvidia's CUDA: The End of the CPU?" (http://www.tomshardware.com/reviews/nvidiacudagpu,1954.html). Tom's Hardware. Retrieved May 17, 2015.
4. CUDA LLVM Compiler (http://developer.nvidia.com/cuda/cudallvmcompiler)
5. First OpenCL demo on a GPU (https://www.youtube.com/watch?v=r1sN1ELJfNo) on YouTube
6. DirectCompute Ocean Demo Running on Nvidia CUDA-enabled GPU (https://www.youtube.com/watch?v=K1I4kts5mqc) on YouTube
7. Vasiliadis, Giorgos; Antonatos, Spiros; Polychronakis, Michalis; Markatos, Evangelos P.; Ioannidis, Sotiris (September 2008). "Gnort: High Performance Network Intrusion Detection Using Graphics Processors" (http://www.ics.forth.gr/dcs/Activities/papers/gnort.raid08.pdf) (PDF). Proceedings of the 11th International Symposium on Recent Advances in Intrusion Detection (RAID).
8. Schatz, M. C.; Trapnell, C.; Delcher, A. L.; Varshney, A. (2007). "High-throughput sequence alignment using Graphics Processing Units" (http://www.biomedcentral.com/14712105/8/474). BMC Bioinformatics 8: 474. doi:10.1186/1471-2105-8-474 (https://dx.doi.org/10.1186%2F147121058474). PMC 2222658 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2222658). PMID 18070356 (https://www.ncbi.nlm.nih.gov/pubmed/18070356).
9. Manavski, Svetlin A.; Valle, Giorgio (2008). "CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment" (http://www.biomedcentral.com/14712105/9/S2/S10). BMC Bioinformatics 9: S10. doi:10.1186/1471-2105-9-S2-S10 (https://dx.doi.org/10.1186%2F147121059S2S10). PMC 2323659 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2323659). PMID 18387198 (https://www.ncbi.nlm.nih.gov/pubmed/18387198).
10. Pyrit - Google Code (https://code.google.com/p/pyrit/)
11. Use your Nvidia GPU for scientific computing (http://boinc.berkeley.edu/cuda.php), BOINC official site (December 18, 2008)
12. Nvidia CUDA Software Development Kit (CUDA SDK) - Release Notes Version 2.0 for MAC OS X (http://developer.download.nvidia.com/compute/cuda/sdk/website/doc/CUDA_SDK_release_notes_macosx.txt)
13. CUDA 1.1 - Now on Mac OS X (http://news.developer.nvidia.com/2008/02/cuda11nowo.html) (Posted on Feb 14, 2008)
14. Silberstein, Mark; Schuster, Assaf; Geiger, Dan; Patney, Anjul; Owens, John D. (2008). "Efficient computation of sum-products on GPUs through software-managed cache". Proceedings of the 22nd Annual International Conference on Supercomputing - ICS '08. pp. 309-318. doi:10.1145/1375527.1375572 (https://dx.doi.org/10.1145%2F1375527.1375572). ISBN 9781605581583.
15. NVCC forces C++ compilation of .cu files (https://devtalk.nvidia.com/default/topic/508479/cudaprogrammingandperformance/nvccforcesccompilationofcufiles/#entry1340190)
16. C++ keywords on CUDA C code (http://stackoverflow.com/questions/15362678/ckeywordsoncudaccode/15362798)
17. "CUDA-Enabled Products" (http://www.nvidia.com/object/cuda_learn_products.html). CUDA Zone. Nvidia Corporation. Retrieved 2008-11-03.
18. Whitehead, Nathan; Fit-Florea, Alex. "Precision & Performance: Floating Point and IEEE 754 Compliance for NVIDIA GPUs" (https://developer.nvidia.com/sites/default/files/akamai/cuda/files/NVIDIACUDAFloatingPoint.pdf) (PDF). Nvidia. Retrieved November 18, 2014.
19. ALUs perform only single-precision floating-point arithmetic. There is one double-precision floating-point unit.
20. No more than one scheduler can issue 2 instructions at once. The first scheduler is in charge of the warps with an odd ID and the second scheduler is in charge of the warps with an even ID.
21. Appendix F. Features and Technical Specifications (http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/CUDA_C_Programming_Guide.pdf) (PDF) (3.2 MiB), Page 148 of 175 (Version 5.0, October 2012)
22. PyCUDA (http://mathema.tician.de/software/pycuda)
23. pycublas (http://kered.org/blog/20090413/easypythonnumpycudacublas/)
24. "MATLAB Adds GPGPU Support" (http://www.hpcwire.com/features/MATLABAddsGPGPUSupport103307084.html). 2010-09-20.

External links

- Official website (http://www.nvidia.com/object/cuda_home.html)
- CUDA Community (https://plus.google.com/communities/114632076318201174454) on Google+
- A little tool to adjust the VRAM size (https://devtalk.nvidia.com/default/topic/726765/needalittletooltoadjustthevramsize/)

Retrieved from "https://en.wikipedia.org/w/index.php?title=CUDA&oldid=674383050"

Categories: Computer physics engines | GPGPU | GPGPU libraries | Graphics hardware | Nvidia software | Parallel computing | Video cards | Video game hardware

This page was last modified on 3 August 2015, at 15:46. Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.