
The Complete HLSL Reference

Sebastien St-Laurent

All the information developers need, from shader assembly instructions to a complete overview of the HLSL intrinsic functions, covering shader models up to 3.0 and presented in a quick, easy-to-access format!


Vertex and Pixel Shader Assembly Language

Although the development of assembly-level shaders is rare today because of the HLSL language, a proper understanding of the underlying assembly instruction sets is core to the development and debugging of efficient shaders.

Both the vertex and pixel shader assembly languages are composed of setup and assembly instructions. Throughout the following pages you will find tables containing a summary of both types of instructions, along with a description of their function, parameter information, performance indications and support across the various shader versions.

Before starting, I have to make a note regarding some of the flow control instructions. For example, the if_comp instruction defines flow control based on a comparison, where the _comp suffix is replaced by a comparison operation. The table below summarizes all the comparison operations that can be used within flow control:

Flow Control Comparison Modes

Mode | Description
_gt | Greater than
_lt | Less than
_ge | Greater than or equal
_le | Less than or equal
_eq | Equal
_ne | Not equal
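As a rough illustration of the naming scheme, the Python sketch below (a CPU model, not shader code) maps each suffix in the table to the scalar comparison it performs as src0 OP src1, so that, for example, if_ge src0, src1 takes its branch when src0 >= src1. The names COMPARISON_MODES and evaluate_comp are hypothetical.

```python
import operator

# Hypothetical lookup table: comparison suffix -> scalar test applied as (src0 OP src1).
COMPARISON_MODES = {
    "_gt": operator.gt,  # src0 >  src1
    "_lt": operator.lt,  # src0 <  src1
    "_ge": operator.ge,  # src0 >= src1
    "_le": operator.le,  # src0 <= src1
    "_eq": operator.eq,  # src0 == src1
    "_ne": operator.ne,  # src0 != src1
}


def evaluate_comp(instruction: str, src0: float, src1: float) -> bool:
    """Evaluate a flow-control instruction such as 'if_ge', 'break_lt' or 'setp_ne'.

    The comparison suffix is everything from the last underscore onward.
    """
    suffix = instruction[instruction.rfind("_"):]
    return COMPARISON_MODES[suffix](src0, src1)


print(evaluate_comp("if_ge", 2.0, 2.0))     # True
print(evaluate_comp("break_lt", 3.0, 1.0))  # False
```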

In addition, the following tables give some indication of the support for each instruction on every vertex and pixel shader version, using a color-coded system. The table below summarizes the coding:

Color | Description
Requires fewer than 4 instruction slots.
Requires between 4 and 7 instruction slots.
Requires 8 or more instruction slots.
Unsupported.

I must also make a final note regarding the performance values indicated over the next pages. The quoted values are based on the DirectX specifications and indicate the worst-case performance that must be guaranteed by the rendering hardware. This means that certain hardware implementations may execute certain instructions faster than prescribed by the specifications.


Vertex Shader Assembly

Name | Description
dcl_usage | Declare input registers.
def | Define constants.
defb | Define boolean constants.
defi | Define integer constants.
label | Define flow control labels.
vs | Declare the shader version.

Name 1.1 2.0 2.x 3.0

Description

Additional Parameter Info

abs dst, src 1 1 1

Absolute value of src.

add dst, src0, src1 1 1 1 1

Add src0 and src1 together.

break 1 1

Break out of a loop or rep block.

break_comp src0, src1

� �

Conditionally break out of a loop or rep block based on a scalar comparison.

_comp = comparison mode. src0, src1 = source registers.

breakp [!]p0.{x|y|z|w}

� �

Break out of a loop or rep block based on a predicate.

[!] = optional NOT operator. p0 = predicate register. {x|y|z|w} = required replicate swizzle.

call l# � � �

Call a subroutine defined by a label.

l# = label of the subroutine.

callnz_bool l#, [!]b#

� � �

Call a subroutine if the boolean register is nonzero.

l# = label to call. [!] = optional NOT operator. b# = boolean register.

callnz_pred l#, [!]p0.{x|y|z|w}

� �

Call a subroutine based on a predicate.

l# = label to call. [!] = optional NOT operator. p0 = predicate register. {x|y|z|w} = required replicate swizzle.

crs dst, src0, src1 � � �

Cross product of src0 and src1.

dp3 dst, src0, src1 1 1 1 1

Three-component dot product.

dp4 dst, src0, src1 1 1 1 1

Four-component dot product.



dst dst, src0, src1 1 1 1 1

Calculate a lighting attenuation vector.

src0 = (ignored, d×d, d×d, ignored). src1 = (ignored, 1/d, ignored, 1/d). dst = (1, d, d×d, 1/d).
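The register layouts above imply dst.x = 1, dst.y = src0.y × src1.y, dst.z = src0.z and dst.w = src1.w. A Python sketch of that component arithmetic (dst_instruction is a hypothetical name; tuples stand in for 4-component registers):

```python
def dst_instruction(src0, src1):
    """Model of the vs 'dst' instruction on 4-component tuples (x, y, z, w)."""
    return (
        1.0,                # dst.x is always 1
        src0[1] * src1[1],  # dst.y = (d*d) * (1/d) = d
        src0[2],            # dst.z = d*d, copied from src0
        src1[3],            # dst.w = 1/d, copied from src1
    )


d = 4.0
src0 = (0.0, d * d, d * d, 0.0)      # x and w are ignored
src1 = (0.0, 1.0 / d, 0.0, 1.0 / d)  # x and z are ignored
print(dst_instruction(src0, src1))   # (1.0, 4.0, 16.0, 0.25)
```

A common follow-up (assumed here, not stated in the table) is to dot the resulting (1, d, d×d, 1/d) vector with a vector of constant, linear and quadratic attenuation factors.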

else 1 1 1

Begin an else block.

endif 1 1 1

End an if/else block.

endloop � � �

End a loop block.

endrep � � �

End a rep block.

exp dst, src 10 1 1 1

Full-precision 2^x.

expp dst, src 1 1 1 1

Partial-precision (16-bit) 2^x.

frc dst, src 1 1 1 1

Fractional component of the input. vs_1_1 can only write to .y and .xy.

if bool � � �

Begin an if/else/endif block.

bool = boolean register.

if_comp src0, src1 � �

Begin an if block with a comparison.

_comp = comparison mode. src0, src1 = source registers.

if [!]pred.{x|y|z|w} � �

Begin an if block with a predicate.

[!] = optional NOT operator. pred = predicate register. {x|y|z|w} = required replicate swizzle.

lit dst, src 1 � � �

Partial lighting calculation.

dst = destination register. src = (N•L, N•H, unused, exponent).
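A CPU-side Python model of lit, following the behavior described in the DirectX documentation as I understand it: dst.y is N•L when it is positive (otherwise 0), and dst.z is (N•H)^exponent only when both N•L and N•H are positive. The exponent clamp value MAX_POWER is an assumption taken from that documentation, and the function name is hypothetical.

```python
MAX_POWER = 127.9961  # assumed exponent clamp (per the DirectX docs for lit)


def lit(src):
    """Model of 'lit'. src = (N.L, N.H, unused, exponent); returns (x, y, z, w)."""
    n_dot_l, n_dot_h, _unused, power = src
    power = max(-MAX_POWER, min(MAX_POWER, power))  # clamp the specular exponent
    diffuse = 0.0
    specular = 0.0
    if n_dot_l > 0.0:
        diffuse = n_dot_l
        if n_dot_h > 0.0:  # the specular term also requires a positive N.H
            specular = n_dot_h ** power
    return (1.0, diffuse, specular, 1.0)


print(lit((0.5, 0.5, 0.0, 2.0)))    # (1.0, 0.5, 0.25, 1.0)
print(lit((-0.2, 0.9, 0.0, 8.0)))   # (1.0, 0.0, 0.0, 1.0)
```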

log dst, src 10 1 1 1

Full-precision log2(x).

logp dst, src 1 1 1 1

Partial-precision (16-bit) log2(x).

loop aL, i# � � �

Start a loop block.

aL = loop counter register. i# = constant integer register.

lrp dst, src0, src1, src2

� � �

Linearly interpolate between src1 and src2 using src0 (in the [0..1] range).
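The interpolation is component-wise; the DirectX documentation defines it as dst = src0 × src1 + (1 - src0) × src2, so src0 = 1 yields src1 and src0 = 0 yields src2. A small Python model (the function name is hypothetical):

```python
def lrp(src0, src1, src2):
    """Component-wise dst = src0 * src1 + (1 - src0) * src2."""
    return tuple(s0 * s1 + (1.0 - s0) * s2 for s0, s1, s2 in zip(src0, src1, src2))


weights = (1.0, 0.0, 0.5, 0.25)
print(lrp(weights, (2.0, 2.0, 2.0, 2.0), (4.0, 4.0, 4.0, 4.0)))  # (2.0, 4.0, 3.0, 3.5)
```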

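m3x2 dst, src0, src1

� � � �

Product of a 3-component vector and a 3×2 matrix.

m3x3 dst, src0, src1

� � � �

Product of a 3-component vector and a 3×3 matrix.

m3x4 dst, src0, src1

4 4 4 4

Product of a 3-component vector and a 3×4 matrix.

m4x3 dst, src0, src1

� � � �

Product of a 4-component vector and a 4×3 matrix.

m4x4 dst, src0, src1

4 4 4 4

Product of a 4-component vector and a 4×4 matrix.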
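Each of these macros can be understood as a series of dot products: mNxM performs M dpN operations, one per matrix row, reading the rows from consecutive registers starting at src1 (so m4x4 with src1 = c0 reads c0 to c3). The Python sketch below models that expansion; dp, m_nxm and the row-tuple representation are hypothetical.

```python
def dp(a, b, n):
    """n-component dot product: dp3 when n == 3, dp4 when n == 4."""
    return sum(a[i] * b[i] for i in range(n))


def m_nxm(src0, rows, n):
    """Model of the mNxM macros as one dpN per matrix row.

    src0 -- input vector (at least n components)
    rows -- the matrix rows, i.e. the consecutive registers starting at src1
    n    -- components used per dot product (3 or 4)
    One output component is produced per row.
    """
    return tuple(dp(src0, row, n) for row in rows)


identity = ((1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0),
            (0.0, 0.0, 1.0, 0.0), (0.0, 0.0, 0.0, 1.0))
print(m_nxm((1.0, 2.0, 3.0, 4.0), identity, 4))      # m4x4 -> (1.0, 2.0, 3.0, 4.0)
print(m_nxm((5.0, 6.0, 7.0, 8.0), identity[:2], 3))  # m3x2 -> (5.0, 6.0)
```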

mad dst, src0, src1, src2

1 1 1 1

Multiply src0 and src1, then add src2.

max dst, src0, src1 1 1 1 1

Maximum of src0 and src1.

min dst, src0, src1 1 1 1 1

Minimum of src0 and src1.

mov dst, src 1 1 1 1

Move data from one register to another.

mova dst, src 1 1 1

Move data to an address register.

mul dst, src0, src1 1 1 1 1

Multiply src0 and src1 together.

nop 1 1 1 1

No operation.

nrm dst, src � � �

Normalize the input (src/length(src)).
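For a 3-component vector, the normalization can be written with instructions from this same table: a dp3 of the vector with itself, an rsq of that squared length, and a mul. The Python sketch below models that sequence; whether nrm is implemented exactly this way in hardware is an assumption, and the helper names are hypothetical.

```python
import math


def dp3(a, b):
    """Three-component dot product."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def rsq(x):
    """Reciprocal square root, 1 / sqrt(x)."""
    return 1.0 / math.sqrt(x)


def nrm(src):
    """Normalize a 3-component vector: src * rsq(dp3(src, src))."""
    inv_length = rsq(dp3(src, src))
    return tuple(c * inv_length for c in src[:3])


print(nrm((0.0, 0.0, 2.0)))  # (0.0, 0.0, 1.0)
```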

pow dst, src0, src1 � � �

Returns src0 raised to the power src1 (src0^src1).

rcp dst, src 1 1 1 1

Reciprocal of the input (1/src).

rep i# � � �

Start a rep block.

i# = integer register with the repeat count in the x component.

ret 1 1 1

End a subroutine or main.

rsq dst, src 1 1 1 1

Reciprocal square root (1/sqrt(src)).

setp_comp dst, src0, src1

1 1

Set the predicate register.

_comp = comparison mode. dst = destination predicate register. src0, src1 = source registers.

sge dst, src0, src1 1 1 1 1

Greater-than-or-equal compare (1.0 if src0 >= src1, otherwise 0.0).

sgn dst, src0, src1, src2

� � �

Returns the sign of src0.

dst = -1 for negative, 0 for zero and 1 for positive values. src0 = source register. src1, src2 = temporary registers used for the calculation.




sincos dst.{x|y|xy}, src0.{x|y|z|w}, src1, src2

8 8 8

Computes both the sine and cosine.

dst = destination register (.x contains the sine and .y the cosine; it must be a temporary register). src0 = source register holding the input angle. src1 = D3DSINCOSCONST1. src2 = D3DSINCOSCONST2.

slt dst, src0, src1 1 1 1 1

Less-than compare (1.0 if src0 < src1, otherwise 0.0).

sub dst, src0, src1 1 1 1 1

Subtract src1 from src0.

texldl dst, src0, src1 �, � for cubemap

Load a texture sample.

dst = destination register. src0 = texture coordinate. src1 = texture sampler.

Pixel Shader Assembly

Name | Description
dcl | Declare input registers.
dcl_samplerType | Declare the texture dimensions for a sampler.
phase | Transition between phase 1 and phase 2 shader code.
def | Define constants.
defb | Define boolean constants.
defi | Define integer constants.
label | Define labels.
ps | Declare the shader version.

Name 1.1 1.2 1.3 1.4 2.0 2.x 3.0

Description

Parameter Information

abs dst, src 1 1 1

Return the absolute value of src.

add dst, src0, src1

1 1 1 1 1 1 1

Add src0 and src1 together.

bem dst.rg, src0, src1

�

Apply a fake bump mapping transform.

dst = destination register (.rg only). src0, src1 = source registers.

break 1 1

Break out of a loop or rep block.



break_comp src0, src1

� �

Conditionally break out of a loop or rep block.

_comp = comparison mode. src0, src1 = source registers.

breakp [!]p0.{x|y|z|w}

� �

Break out of a loop or rep block based on a predicate.

[!] = optional NOT operator. p0 = predicate register. {x|y|z|w} = required replicate swizzle.

call l# � �

Call the specified subroutine.

l# = label to call.

callnz l#, [!]b# � �

Call a subroutine if a boolean register is nonzero.

l# = label to call. [!] = optional NOT operator. b# = boolean register.

callnz l#, [!]p0.{x|y|z|w}

� �

Call a subroutine based on a predicate.

l# = label to call. [!] = optional NOT operator. p0 = predicate register. {x|y|z|w} = required replicate swizzle.

cmp dst, src0, src1, src2

� � 1 1 1 1

Compare src0 to 0, returning either src1 or src2.

cnd dst, src0, src1, src2

1 1 1 1

Compare src0 to 0.5, returning either src1 or src2.
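Both instructions select per component between src1 and src2: cmp picks src1 where src0 >= 0, and cnd picks src1 where src0 > 0.5. The sketch models the per-component form only; on ps_1_1 to ps_1_3 the cnd condition is, per the DirectX documentation, limited to r0.a, a restriction this model ignores. The function names mirror the instructions but are otherwise hypothetical.

```python
def cmp(src0, src1, src2):
    """Per component: src1 where src0 >= 0, otherwise src2."""
    return tuple(a if s >= 0.0 else b for s, a, b in zip(src0, src1, src2))


def cnd(src0, src1, src2):
    """Per component: src1 where src0 > 0.5, otherwise src2."""
    return tuple(a if s > 0.5 else b for s, a, b in zip(src0, src1, src2))


ones, zeros = (1.0,) * 4, (0.0,) * 4
print(cmp((-1.0, 0.0, 2.0, -0.5), ones, zeros))  # (0.0, 1.0, 1.0, 0.0)
print(cnd((0.5, 0.6, 0.0, 1.0), ones, zeros))    # (0.0, 1.0, 0.0, 1.0)
```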

crs dst, src0, src1

� � �

Cross product of src0 and src1.

dp2add dst, src0, src1, src2.{x|y|z|w}

� � �

Two-component dot product with a scalar addition to the result.
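dp2add computes src0.x × src1.x + src0.y × src1.y and adds the single component of src2 selected by the replicate swizzle. A Python model, with that swizzled scalar passed in directly (the parameter name src2_component is hypothetical):

```python
def dp2add(src0, src1, src2_component):
    """Two-component dot product of src0 and src1 plus a replicated scalar from src2."""
    return src0[0] * src1[0] + src0[1] * src1[1] + src2_component


print(dp2add((1.0, 2.0), (3.0, 4.0), 0.5))  # 11.5
```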

dp3 dst, src0, src1

1 1 1 1 1 1 1

Three-component dot product.

dp4 dst, src0, src1

� � 1 1 1 1

Four-component dot product.

dsx dst, src 1 1

Rate of change in the screen-space x direction.

dsy dst, src 1 1

Rate of change in the screen-space y direction.

else 1 1

Begin an else block.

endif 1 1

End an if/else block.




endloop � �

End a loop block.

endrep � �

End a rep block.

exp dst, src 1 1 1

Full-precision 2^x.

frc dst, src 1 1 1

Return the fractional part of src.

if bool � �

Begin an if/else/endif block.

bool = boolean register.

if_comp src0, src1

� �

Begin an if block with a comparison.

_comp = comparison mode. src0, src1 = source registers.

if_pred [!]pred.{x|y|z|w}

� �

Begin an if block with a predicate.

[!] = optional NOT operator. pred = predicate register. {x|y|z|w} = required replicate swizzle.

log dst, src 1 1 1

Full-precision log2(x).

lrp dst, src0, src1, src2

1 1 1 1 � � �

Interpolate between src1 and src2 based on src0 (in the [0..1] range).


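m3x2 dst, src0, src1

� � �

Product of a 3-component vector and a 3×2 matrix.

m3x3 dst, src0, src1

� � �

Product of a 3-component vector and a 3×3 matrix.

m3x4 dst, src0, src1

4 4 4

Product of a 3-component vector and a 3×4 matrix.

m4x3 dst, src0, src1

� � �

Product of a 4-component vector and a 4×3 matrix.

m4x4 dst, src0, src1

4 4 4

Product of a 4-component vector and a 4×4 matrix.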

mad dst, src0, src1, src2

1 1 1 1 1 1 1

Multiply src0 and src1 together, adding src2 to the result.

max dst, src0, src1

1 1 1

Maximum of src0 and src1.

min dst, src0, src1

1 1 1

Minimum of src0 and src1.

mov dst, src 1 1 1 1 1 1 1

Move data from one register to another.




mul dst, src0, src1

1 1 1 1 1 1 1

Multiply src0 and src1 together.

nop 1 1 1 1 1 1 1

No operation.

nrm dst, src � � �

Normalize a vector (src/length(src)).

pow dst, src0, src1

� � �

Return src0 raised to the power src1 (src0^src1).

rcp dst, src 1 1 1

Return the reciprocal of src.

rep i# � �

Start a rep block.

i# = integer register specifying the loop count in the x component.

ret 1 1

End a subroutine or main.

rsq dst, src 1 1 1

Reciprocal square root, or 1/sqrt(src).

setp_comp dst, src0, src1

1 1

Set the predicate register.

_comp = comparison mode. dst = destination predicate register. src0, src1 = source registers.

sincos dst.{x|y|xy}, src0.{x|y|z|w}, src1, src2

8 8 8

Compute both the sine and cosine.

dst = destination register (.x contains the sine and .y the cosine; it must be a temporary register). src0 = input angle. src1 = D3DSINCOSCONST1. src2 = D3DSINCOSCONST2.

sub dst, src0, src1

1 1 1 1 1 1 1

Subtract src1 from src0.

tex dst 1 1 1

Sample a texture.

texbem dst, src 1 1 1

Apply a fake bump map transform.

texbeml dst, src � � �

Apply a fake bump map transform with luminance correction.

texcoord dst 1 1 1

Treat texture coordinates as color data.

texcrd dst, src 1

Copy texture coordinate data as color data.

texdepth dst 1

Calculate depth values.




texdp3 dst, src 1 1

Three-component dot product between texture data and the texture coordinates.

texdp3tex dst, src

1 1

Three-component dot product and 1D texture lookup.

texkill src 1 1 1 1 1 Note 1 �

Cancel rendering of pixels based on a comparison.

src = source register; the pixel is killed if any component of src is less than 0.

texld dst, src 1 1 Note� Note�

Sample a texture.

texldb dst, src0, src1

1 Note� 6

Sample a texture with a level-of-detail bias taken from the w component of the texture coordinates.

dst = destination register. src0 = texture coordinates. src1 = sampler.

texldd dst, src0, src1, src2, src3

� �

Sample a texture with gradients.

dst = destination register. src0 = texture coordinates. src1 = sampler. src2 = x gradient. src3 = y gradient.

texldl dst, src0, src1

Note 6

Sample a texture with the LOD taken from the w component.

dst = destination register. src0 = texture coordinates. src1 = sampler.

texldp dst, src0, src1

1 Note 4 Note 7

Sample a texture with a projective divide by the w component.

dst = destination register. src0 = texture coordinates. src1 = sampler.

texm3x2depth dst, src

1

Calculate per-pixel depth values.

texm3x2pad dst, src

1 1 1

First-row multiply of a two-row matrix multiply.

texm3x2tex dst, src

1 1 1

Final-row multiply of a two-row matrix multiply.

texm3x3 dst, src 1 1

3×3 matrix multiply.




texm3x3pad dst, src

1 1 1

First- or second-row multiply of a three-row matrix multiply.

texm3x3spec dst, src0, src1

1 1 1

Final-row multiply of a three-row matrix multiply.

texm3x3tex dst, src

1 1 1

Texture lookup using a 3×3 matrix multiply.

texm3x3vspec dst, src

1 1 1

Texture lookup using a 3×3 matrix multiply, with a non-constant eye-ray vector.

texreg2ar dst, src

1 1 1

Sample a texture using the alpha and red components.

texreg2gb dst, src

1 1 1

Sample a texture using the green and blue components.

texreg2rgb dst, src

1 1

Sample a texture using the red, green and blue components.

Note 1: If D3DPS20CAPS_NOTEXINSTRUCTIONLIMIT is set, slots = �; otherwise slots = 1.

Note 2: If D3DPS20CAPS_NOTEXINSTRUCTIONLIMIT is set and the texture is a cubemap, slots = 4; otherwise slots = 1.

Note 3: If D3DPS20CAPS_NOTEXINSTRUCTIONLIMIT is set, slots = 6; otherwise slots = 1.

Note 4: If D3DPS20CAPS_NOTEXINSTRUCTIONLIMIT is not set, slots = 1; otherwise, if the texture is a cubemap, slots = 4, and if it is not a cubemap, slots = �.

Note 5: If the texture is a cubemap, slots = 4; otherwise slots = 1.

Note 6: If the texture is a cubemap, slots = �; otherwise slots = �.

Note 7: If the texture is a cubemap, slots = 4; otherwise slots = �.


HLSL Intrinsic Functions

To simplify the development of shaders, the HLSL shading language introduces a high-level programming scheme similar to C. This simplifies the developer's tasks and takes away the burden of manual instruction optimization and register allocation.

The following table contains a complete overview of all the HLSL intrinsic functions. For each function it gives a description of its use, a parameter overview, performance considerations and the minimum shader version required to execute it.

Keep in mind that the performance figures given below are based on a simple shader using only the specified function, in order to avoid optimizations. They should serve as a worst-case scenario, as real use cases will generally result in better-performing shaders.

Name VS PS Performance

Parameters

Description

ret abs(x) 1.1 1.4 Implemented in one instruction slot.

x | in | scalar, vector or matrix | float, int
ret | out | same as input x | same as input x

Return the absolute value of x.

ret acos(x) 1.1 2.0 Taylor series taking 17 instruction slots.

x | in | scalar, vector or matrix | float
ret | out | same as input x | float

Return the arccosine of x (valid for [-1, 1]).

ret all(x) 1.1 1.4 Implemented using � instruction slots.

x | in | scalar, vector or matrix | float, int, bool
ret | out | scalar | bool

Test whether all the components of x are nonzero.

ret any(x) 1.1 1.4 Implemented using � instruction slots.

x | in | scalar, vector or matrix | float, int, bool
ret | out | scalar | bool

Test whether any of the components of x is nonzero.

ret asin(x) 1.1 2.0 Taylor series using 18 instruction slots.


Return the arcsine of x (valid for [-1, 1]; result range [-π/2, π/2]).

ret atan(x) 1.1 2.0 Taylor series using �1 instruction slots.


Return the arctangent of x (result range [-π/2, π/2]).

ret atan2(x, y) 1.1 2.0 Taylor series using �� instruction slots.

x | in | scalar, vector or matrix | float
y | in | same as input x | float
ret | out | same as input x | float

Return the arctangent of x/y, using the signs of both arguments to select the quadrant (result range [-π, π]).
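Python's math.atan2 takes its arguments in the same order as the entry above (the first argument is divided by the second) and shows why atan2 is preferred over atan(x / y): the signs of both arguments pick the quadrant, so the result spans [-π, π] instead of [-π/2, π/2]. This illustrates the math only, not the shader's Taylor-series implementation.

```python
import math

x, y = 1.0, -1.0
naive = math.atan(x / y)  # quadrant lost: -pi/4
full = math.atan2(x, y)   # quadrant kept: 3*pi/4
print(naive, full)
```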


ret ceil(x) 1.1 2.0 Implemented with the frc instruction; takes � instruction slots but may take more on 1.x hardware.

Return the smallest integer greater than or equal to x.
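One way to build ceil from frc is the identity ceil(x) = x + frc(-x), because frc(-x) = -x - floor(-x) = ceil(x) - x. The Python sketch below checks that identity; whether the HLSL compiler emits exactly this sequence is an assumption, and the function names are hypothetical.

```python
import math


def frc(x):
    """Model of the frc instruction: fractional part x - floor(x), in [0, 1)."""
    return x - math.floor(x)


def ceil_via_frc(x):
    """ceil(x) expressed as x + frc(-x)."""
    return x + frc(-x)


print(ceil_via_frc(1.25))   # 2.0
print(ceil_via_frc(-1.25))  # -1.0
```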

ret clamp(x, min, max)

1.1 1.4 Implemented using the min and max instructions, taking � instruction slots.

x | in | scalar, vector or matrix | float, int
min | in | same as input x | same as input x
max | in | same as input x | same as input x
ret | out | same as input x | same as input x

Return the input x clamped to the range [min, max].

clip(x) N/A 1.1 Removes the benefit of early-Z rejection.

x | in | scalar, vector or matrix | float

This function discards the current pixel if any of the components of x is less than zero.
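In pixel shader assembly, clip corresponds to the texkill instruction listed earlier. The discard test itself is simple; here is a Python model of the condition (clip_discards is a hypothetical name, and a real shader discards the pixel rather than returning a flag):

```python
def clip_discards(x):
    """True if clip(x) would discard the pixel, i.e. any component of x is < 0.

    x may be a scalar or a sequence of components (vector or flattened matrix).
    """
    components = x if isinstance(x, (tuple, list)) else (x,)
    return any(c < 0.0 for c in components)


print(clip_discards((0.2, -0.01, 1.0)))  # True
print(clip_discards(0.0))                # False
```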

ret cos(x) 1.1 2.0 Taylor series using 1� instruction slots.

Return the cosine of x.

ret cosh(x) 1.1 2.0 Numerical approximation using the exp instruction. Uses 8� instruction slots on 1.x hardware and 11 slots on 2.0+ hardware.

Return the hyperbolic cosine of x.
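Because the exp instruction computes 2^x rather than e^x, the natural exponentials in cosh(x) = (e^x + e^-x) / 2 have to be rescaled by log2(e) first. The Python sketch shows that identity only; the exact sequence the compiler emits is not documented here, and the helper names are hypothetical.

```python
import math

LOG2_E = 1.0 / math.log(2.0)  # log2(e): converts a natural exponent to base 2


def exp2(x):
    """Stand-in for the 'exp' instruction, which computes 2^x."""
    return 2.0 ** x


def cosh_via_exp2(x):
    """cosh(x) = (e^x + e^-x) / 2, with each e^x rewritten as 2^(x * log2(e))."""
    return 0.5 * (exp2(x * LOG2_E) + exp2(-x * LOG2_E))


print(cosh_via_exp2(1.5), math.cosh(1.5))
```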

ret cross(x, y) 1.1 1.4 Emulated on 1.x hardware using 4 slots.

x | in | vector | float
y | in | vector | float
ret | out | vector | float

Compute the cross product of x and y.
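For reference, the component formula behind cross (and the crs assembly instruction) is shown below as a Python sketch; argument order matters, since cross(y, x) = -cross(x, y).

```python
def cross(x, y):
    """Cross product of two 3-component vectors."""
    return (x[1] * y[2] - x[2] * y[1],
            x[2] * y[0] - x[0] * y[2],
            x[0] * y[1] - x[1] * y[0])


print(cross((1.0, 0.0, 0.0), (0.0, 1.0, 0.0)))  # (0.0, 0.0, 1.0)
```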

ret ddx(x) N/A 2.x Verify capabilities before using.


Partial derivative of x with respect to the screen-space x coordinate.

ret ddy(x) N/A 2.x Verify capabilities before using.


Partial derivative of x with respect to the screen-space y coordinate.

ret degrees(x) 1.1 2.0 N/A


Convert x from radians to degrees.

ret determinant(x)

1.1 1.4 Implemented using the proper equations, taking about 1� instruction slots.

x | in | matrix | float
ret | out | scalar | float

Return the determinant of the square matrix x.
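For a 3×3 matrix, the "proper equations" amount to a cofactor expansion, which is also the triple product row0 · (row1 × row2). A Python sketch for the 3×3 case only (determinant3 is a hypothetical name; how the compiler schedules these operations is not shown):

```python
def determinant3(m):
    """Determinant of a 3x3 matrix given as three rows, by cofactor expansion on row 0."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


print(determinant3(((2.0, 0.0, 0.0), (0.0, 3.0, 0.0), (0.0, 0.0, 4.0))))  # 24.0
print(determinant3(((1.0, 2.0, 3.0), (4.0, 5.0, 6.0), (7.0, 8.0, 9.0))))  # 0.0
```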

© �00� by Paradoxal Press.

All rights reserved. No part of this booklet may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval system without written permission from Paradoxal Press, except for the inclusion of brief quotations in a review.

The Paradoxal Press logo and related trade dress are trademarks of Paradoxal Press and may not be used without written permission.

Front cover and background illustrations contain screenshots of various demos created by ATI Technologies Inc, used with their permission.

Author: Sebastien St-Laurent
Technical Review: Wolfgang Engel
Layout & Design: Sebastien St-Laurent
Proofreader: Nicole St-Laurent

PRINTED IN THE U.S.A.

$7.99 USA / $9.99 CANADA

Paradoxal Press, ��81 Avondale Rd. NE, Redmond, WA �80��

Use velcro to affix this booklet to the side of your monitor.