Principles of Adjoint Coding Thomas J. Kleespies NOAA/NESDIS [email protected].
-
date post
21-Dec-2015 -
Category
Documents
-
view
221 -
download
3
Transcript of Principles of Adjoint Coding Thomas J. Kleespies NOAA/NESDIS [email protected].
This presentation is based on a class I taught a couple of years ago.
Techniques will focus on applications to radiative transfer, but are generally applicable.
This will be a highly technical presentation with little or no science.
Some coding examples will be given in Fortran 90.
Some of this material was briefly presented by Ron Errico. This presentation will go into more detail.
Who am I
• BS, 1974, Atmospheric Sciences, University of Washington
• MS, 1977, Atmospheric Science, Colorado State University
• PhD, 1994, Meteorology, University of Utah
What is my background?• 1978-1984, Research Meteorologist, NEPRF, now
known as NRLMRY– Retrievals, Retrieval Assimilation, Cloud Tracking,
Direct Readout, PW retrieval
• 1984-1993, Research Meteorologist, AFGL, now known as AFRL– Direct Readout, image processing, visualization, cloud
microphysics retrieval
• 1993-Present, Physical Scientist, NOAA/NESDIS– sounding science, fast radiative transfer, adjoint
coding, calibration, instrument characterization, navigation, visualization
• Charter Member International ATOVS Working Group • NESDIS representative to EUMETSAT ATOVS Instrument Functional Chain
Team • Senior Member, Joint Center for Satellite Data Assimilation • Member NPOESS CALVAL Team • Member NPOESS Data Exploitation Team • Member NPOESS Microwave Operational Algorithm Team • Member Sounding Products Oversight Panel • Member Navigation Products Oversight Panel • Member Calibration Products Oversight Panel • Member JCSDA Fast Radiative Transfer Team • Member, IJPS Ground Segment Review Board • Member SSMIS CALVAL Team • Investigator, Annals of Improbable Research
Service Memberships
Misc• Awards:• 2007: Department of Commerce Bronze Medal “For developing an integrated
system for accurately calibrating NOAA-18 instruments that lead to a high quality of operational satellite products. “
• 2003: Department of Commerce Bronze Medal "For enabling early access to improved data for climate and weather applications by augmenting the post-launch checkout of the NOAA-17 sensors."
• 1999: Department of Commerce Silver Medal "For accelerating the use of Advanced Microwave Sounding Unit satellite observations to advance the nation's capability in weather prediction."
• 1998: Department of Commerce Bronze Medal "For pioneering development of optimal assimilation of satellite data into global forecast models and of fast, accurate radiative transfer algorithms."
• 1978: National Aeronautics and Space Administration Group Achievement Award presented to the Ozone Processing Team.
• Visiting Scientist Positions:• 2004: European Organisation for the Exploitation of Meteorological Satellites • 2001: United Kingdom Meteorological Office, Satellite Application Facility
for Numerical Weather Prediction
• Restrict ourselves to passive systems
• Infrared and Microwave only
• Satellite systems only, i.e. upwelling radiation from surface and atmosphere
Radiative Transfer Summary
Radiative Transfer Summary:The Equation of Transfer
I(0,,)
(1 ˜ ())B(T( ))e /
0
s
d
˜ ( )2
I0e /0 P( ,,; 0,0 )e /
0
sd
˜ ( )4 0
2
I( , , )P( ,,; , )e /
1
1
0
s d d d
()B(Ts )e s /
0
I0e
s
/0(,; 0 ,0 )e
s/
e
s/
(,; , )I(s , , )0
1
0
2
d d
AtmosphericThermal EmissionAtmospheric
Solar Single Scattering
Atmospheric Diffuse Multiple Scattering
Surface Emittance
Surface Solar Reflectance
Surface Diffuse Reflectance
Radiative Transfer Summary:Spectral Regions of Importance
I(0,,)
(1 ˜ ())B(T( ))e /
0
s
d
˜ ( )2
I0e /0 P( ,,; 0,0 )e /
0
sd
˜ ( )4 0
2
I( , , )P( ,,; , )e /
1
1
0
s d d d
()B(Ts )e s /
0
I0e
s
/0(,; 0 ,0 )e
s/
e
s/
(,; , )I(s , , )0
1
0
2
d d
IR & w
Near IR
IR, w in precipitation IR & w
Near IR
IR & w
Fast Radiative Transfer
• Full blown radiative transfer far too computationally expensive for data assimilation
• Various approximations and trade-offs made in designing fast radiative transfer models
Fast Radiative Transfer cont.
• Atmospheric thermal emission easiest– multiple regression used to fit spectral line-by-
line atmospheric attenuation,– almost as accurate as full line-by-line
spectroscopic calculations while many orders of magnitude faster
• Surface thermal emission easy– surface emissivity problematic, especially over
land
Fast Radiative Transfer, cont.
• Single scattering not very expensive, but not accurate in clouds and precipitation– uncertainty in scattering phase function
• Multiple scattering most expensive– N-stream approximation allows trade between
accuracy and speed– Successive order of scattering approximation
allows trade between accuracy and speed
JCSDA Community Radiative Transfer Model (CRTM)
Nee: Optical Path TransmittanceOPTRAN
• Includes thermal emittance, clouds, simplified scattering, CBR, solar influence, surface emissivity models, support for many instruments
Traditional Fast Transmittance Model
• Interpolate T(P), q(P) to fixed pressure levels
• Predictors T, q
• Include zenith angle as a predictor
• Predictand is transmittance departure or optical depth, multiple linear regression
Optical Path Transmittance (OPTRAN) approach
• Regression on levels of absorber amount• Predictors are a function of T, P, q• Zenith angle implicit in absorber amount• Arbitrary pressure profile permitted• Predictand is absorption coefficient for H2O, O3,
mixed gases• Permits changes to ‘mixed gas’ amounts as well,
e.g. CO2
Heritage
• McMillin, Fleming and Hill (AO,1979)
• McMillin, Crone, Goldberg, Kleespies (AO,1995)
• McMillin, Crone, Kleespies (AO,1995)
• Kleespies, vanDelst, McMillin, Derber (AO, 2004)
• McMillin, Xiong, Han, Kleespies, van Delst (AO, 2006)
OPTRAN performance
• Water vapor channels generally better than RTTOV
• Temperature channels generally not quite as good as RTTOV (before OPTRAN 7 improvements)
RTTOV
• Regression on fixed pressure levels• Predictors are a function of T, P, q• Zenith angle explicitly included as a predictor• Must interpolate state to fixed pressure levels• Predictand is optical depth • Includes clouds, simplified scattering, CBR,
surface emissivity models, support for many instruments
The following slides graciously provided by Dr. Sid Boukabara,
IMG/NOAA contractor.
Mathematically:Mathematically:
Core Retrieval Mathematical Basis
P(X)X)|mP(Y
Main Goal in ANY Retrieval System is to find a vector X with a maximum probability of being the source
responsible for the measurements vector Ym
Main Goal in ANY Retrieval System is to find a vector X: P(X|Ym) is Max
In plain words:In plain words:
Problem reduces to how to maximize:Problem reduces to how to maximize:
Probability PDF Assumed Gaussian around Background X0 with a
Covariance B
0
XX1BT
0XX
21exp
Mathematically:Mathematically:Probability PDF Assumed Gaussian around Background Y(X) with a
Covariance E
Y(X)mY1E
TY(X)mY
21exp
Y(X)mY1E
TY(X)mY
21exp
0XX1B
T0
XX21exp
Maximizing Maximizing
Y(X)mY1ETY(X
)mY2
1exp0XX1BT
0XX21exp
Is Equivalent to Minimizing Is Equivalent to Minimizing
)mY|P(Xln
Which amounts to Minimizing J(X) –also called COST FUNCTION –Same cost Function used in 1DVAR Data Assimilation System
Which amounts to Minimizing J(X) –also called COST FUNCTION –Same cost Function used in 1DVAR Data Assimilation System
Y(X)YEY(X)Y
2
1XXBXX
2
1J(X) m1Tm
01T
0
)mY|P(X
Assumptions Made in Solution Derivation
• The PDF of X is assumed Gaussian
• Operator Y able to simulate measurements-like radiances
• Errors of the model and the instrumental noise combined are assumed (1) non-biased and (2) Normally distributed.
• Forward model assumed locally linear at each iteration.
What’s this adjoint stuff it all about?1DVAR / maximum probability solution is that which minimizes a ‘cost’ or ‘penalty function:
xyyOxyyxxBxxxJ o1Tob1Tb
where xb is an initial estimate given by the model state vector, x is the model state for which the solution is desired, yo is the vector of observations, y(x) is an operator which transforms the model state vector into the same parameters as the observations, and B and O are the background and observational error covariance matrices respectively. For our purposes, y(x) is the radiative transfer operator. Note that O is a combination of observational errors and radiative transfer errors.
What’s it all about: part deux
How do we find the minimum? From first quarter Calculus: Take the first derivative and set it equal to zero.
0o1Tb1 xyyOxKxxBxJ
where K(x) is the matrix of partial derivatives of y(x) with respect to the elements of x. (factor of 2 divides out)
What’s it all about: part trois
It is evident that the solution requires both the forward radiative transfer operator y(x), and the transpose of its derivative, K(x)T . K(x)T is called the adjoint, or Jacobian.
x = { T1, T2, T3, …, Tn, q1, q2, q3, …, qn, …}
y(x) = {R1, R2, R3, …, Rm}T
What’s it all about, part quatre
n
m
n
3
n
2
n
1
2
m
2
3
2
2
2
1
1
m
1
3
1
2
1
1
n
m
n
3
n
2
n
1
2
m
2
3
2
2
2
1
1
m
1
3
1
2
1
1
T
q
R
q
R
q
R
q
R
q
R
q
R
q
R
q
Rq
R
q
R
q
R
q
RT
R
T
R
T
R
T
R
T
R
T
R
T
R
T
RT
R
T
R
T
R
T
R
K(x)
What’s it all about, part cinq
In olden days (say 1990), computation of K(x)T required N+1 forward model calculations using forward (or backward) finite differencing (centered required 2N+1). Thus these techniques were only used in limited studies
In these modern times, using adjoint coding techniques K(x)T can be computed with the effort of about 3 forward model calculations.
What are these all the models?• The tangent linear model is derived from the forward model
- gives the derivative of the radiance with respect to the state vector (vector output, M channels)
• The adjoint is derived from the tangent linear model
- gives the transpose of the derivative of the radiance with respect to the state vector (vector output, N variables)
• The Jacobian is derived from the adjoint model
- gives the transpose of the derivative of the radiance with respect to the state vector by channel (matrix output, NxM)
• Only the forward and the Jacobian models are actually used, but all models must be developed and maintained in order to assure a testing path, and to make sure the performance is correct.
Tropical Temperature derivatives for NOAA 14
0
200
400
600
800
1000
0.00 0.05 0.10 0.15 0.20
dTb/dT (K)
Pres
sure
(hPa
)
HIRS2
HIRS5
HIRS9
HIRS10
HIRS11
HIRS12
HIRS15
Why can’t we just use the Tangent Linear Model?
• You can.
• However, it still takes N TL calculations.
• You avoid the finite differencing because the TL is the analytic derivative, but you just get a vector of radiances for each call. You still have to call it for each element of the input vector.
Testing
• Testing the code is rigorous and analytic• Each code is tested for consistency with the model
from which it was developed• Code is tested bottom up, lowest level routine first.• Full TL model is tested before moving to adjoint• Full Adjoint is tested before moving to Jacobian
Adjoint Compiler
• Giering and others have written compilers that generate TL and adjoint code
• Some people at NCAR swear by them
• Others swear at them: there is a lot of work preparing the code for the compiler
• We feel that better optimization can be achieved by hand coding.
Summary Thusfar
• Quick overview of OPTRAN
• Description of Adjoint and associated models
Comments on Giering&Kaminski(Recipes for Adjoint Code Construction, ACM Trans. Math. Software, 24(4), 1998, also TR212, Max Planck Institut für Meteorologie, 1996)
Pluses and Minuses+ useful for formal aspects of mathematics and coding+ good coding examples
- combines tangent linear and adjoint steps- does not differentiate between adjoint and Jacobian--- does not discuss testing
Nomenclature• Forward Model Variables: Those variables that contain
values used or computed in the forward model (T,p,k,q,)• Tangent Linear Variables: Those variables that contain
differential values computed in the tangent linear model. • Active variables: “Variables depending on the control
variables and having an influence on the cost function are called active” (ex, temperature, moisture, OPTRAN absorber amounts in weighted predictors)
• Passive variable: those not active. (e.g. constants, OPTRAN absorber amount coordinate system, orbital elements, loop indices)
• Perturbed Forward Model: Forward model called with the input vector perturbed by the TL input vector (+-)
Recommended TL Naming Conventions
There is no ‘standard’ naming convention. Here is what I recommend:
• Keep forward model variable names the same
• Append “ _TL” to forward model variable and routine names to describe tangent linear variables and routines
Tangent Linear Coding Rules
• Only assignment statements need to be differentiated
• There must be one TL routine for each forward model routine
• Generally the full forward model is invoked before the TL
• Accommodations must be made to carry required forward model variables to the TL code
Tangent Linear Coding ExampleEquation:
y = a + bx + cx2 + dx3 +ez1/2
Differential:
y’ = bx’ + 2cxx’ + 3dx2x’ + 1/2ez -1/2z’
Forward Code:
Y = A + B*X + C*X**2. + D*X**3. + E*SQRT(Z)
TL Code:
Y_TL = B*X_TL + 2.*C*X*X_TL + 3.*D*X**2.*X_TL &
+ .5*E*Z**(-.5)*Z_TL
Y_TL = B*X_TL + 2.*C*X*X_TL + 3.*D*X**2.*X_TL &
* .5*E*Z**(-.5)*Z_TL
Need X and Z as well as X_TL and Z_TL to satisfy this code fragment.
Depending on speed/memory tradeoffs:
1) Re-compute forward variable (can make testing tricky if you create an error in re-coding the forward calculation)
2) Store on a temporary file (IO bound, bookkeeping)
3) Store in memory (cleanest method if not too many ACTIVE forward variables)
Forward Model Variables in TL Code
What the heck are these TL variables?
• TL output variables are the derivative of the forward model output with respect to each element of the input variables. e.g.
• Internal TL variables represent local derivatives. The chain rule sorts the results out in the end.
• See Errico’s presentation for formalism.
185 qTb
Pitfalls: Conditionals• Logical tests in the forward model MUST be
carried to the TL model using the ACTIVE FORWARD MODEL VARIABLES
Forward example:
IF(T > 273.) THEN
Q = T**2.
ELSE
Q = 2.*T
ENDIF
TL example:
IF(T > 273.) THEN ! NOT T_TL
Q_TL = 2.*T*T_TL
ELSE
Q_TL = 2.*T_TL
ENDIF
General TL Rules
• If it has an equal sign in it, differentiate it.
• If it doesn’t, leave it alone.
• Only exception is subroutines & function names, etc. which must be renamed (_TL)
Testing TL Model consistency with Forward Model
1,xTLM
)FM()ΔFM(lim
0
x
xxxx
This looks a lot like the definition of the derivative.
FM(x) = Forward model acting on xFM(x+x)=perturbed Forward model acting on x+xTL(x,x)= Tangent Linear model acting on x (at x)
Testing TL Model consistency with Forward Model, part zwei
• It is best to write and test the TL of each routine before going to the next
• Start with the bottom most routine and work up • Guidelines for limit test:
– Call the forward model first. Keep results. (exception as TBD later)– Pick an initial increment of 10% of the forward model input values– Write an outer loop that halves the increment, 10-15 iterations– Apply the increment independently in both a positive and negative
sense to both the perturbed forward model and the TL model. Do this for ALL variables/levels at a time.
– Calculate the ratio and observe how it approaches unity from both sides. If it converges to something other than unity, or doesn’t converge, start checking your code, and/or check precision.
Subroutine Planck_TL(V,Temp,Temp_TL,B, B_TL)
! Computes Planck TL radiance at wavenumber V, Radiance B, ! temperature Temp and Temp_TL
Implicit None
Real V ! Input wavenumberReal B ! Input Radiance mW m-2 sr-1 (cm-1)-1Real Temp ! Input Temperature (K)Real Temp_TL ! Input TL TemperatureReal B_TL ! Output TL Radiance
Real C1 /1.1905e-5/Real C2 /1.4385/
!forward model code included as a comment for reference!B = (C1*v*v*v)/(Exp(C2*V/Temp) - 1.) B_TL = C2*V*B*B*Temp_TL*exp(C2*V/Temp)/(C1*V*V*V*Temp*Temp)
ReturnEnd Subroutine Planck_TL
! Code fragment from testing routine… most of the work done here.
Call Planck(Vnu(Ichan),Temp,B) ! Compute forward model radiance
Temp_TL(1) = -Temp / 10. ! initial value negative increment Temp_TL(2) = Temp / 10. ! initial value positive increment
Write(6,*) ' HIRS channel ',Ichan ! This just prints a header Write(2,*) ' HIRS channel ',Ichan Write(6,6110) " Iter Neg dx Pos dx Neg ratio Pos ratio" Write(2,6110) " Iter Neg dx Pos dx Neg ratio Pos ratio"
Do i = 1 , niter ! outer loop emulates taking the limit
! Compute TL values asymptotically Do isign = 1 , 2 ! inner loop delta x -> 0 +- Call Planck(Vnu(Ichan),Temp+Temp_TL(isign),BP(Isign)) ! perturbed forward model Call Planck_TL(Vnu(Ichan), Temp,Temp_TL(isign), B, B_TL(Isign)) ! tangent linear model Ratio(isign) = (BP(Isign) - B ) / B_TL(Isign) ! ratio EndDo
Write(6,6120) i, Temp_TL,Ratio(1),Ratio(2) Write(2,6120) i, Temp_TL,Ratio(1),Ratio(2)
Temp_TL(1) = Temp_TL(1) * 0.5 ! now halve the input TL temperature and repeat Temp_TL(2) = Temp_TL(2) * 0.5
EndDo
HIRS channel 16 Iter Neg dx Pos dx Neg ratio Pos ratio 1 -25.000000000 25.000000000 0.590156871 1.729834400 2 -12.500000000 12.500000000 0.764005950 1.315447047 3 -6.250000000 6.250000000 0.873208093 1.146620850 4 -3.125000000 3.125000000 0.934268719 1.070686369 5 -1.562500000 1.562500000 0.966532956 1.034705682 6 -0.781250000 0.781250000 0.983113900 1.017195751 7 -0.390625000 0.390625000 0.991518526 1.008558887 8 -0.195312500 0.195312500 0.995749622 1.004269732 9 -0.097656250 0.097656250 0.997872396 1.002132442 10 -0.048828125 0.048828125 0.998935594 1.001065616 11 -0.024414062 0.024414062 0.999467646 1.000532657 12 -0.012207031 0.012207031 0.999733785 1.000266291 13 -0.006103516 0.006103516 0.999866883 1.000133136 14 -0.003051758 0.003051758 0.999933439 1.000066566 15 -0.001525879 0.001525879 0.999966719 1.000033282
Example of Output of testing routine(Your results may vary)
1: ratio approaches 1 from both sides, but do not expect negative to approach from below.2: as perturbation enters linear regime, ratio goes half the distance to the goal line each iteration3: If you do too many iterations, ratio may get strange because of machine precision. Try DP.
Test_Bright_TL output Single Precision HIRS channel 16 Iter Neg dx Pos dx Neg ratio Pos ratio 1 -0.054812677 0.054812677 1.044734240 0.960472047 2 -0.027406339 0.027406339 1.021662235 0.979655027 3 -0.013703169 0.013703169 1.010679841 0.989705265 4 -0.006851585 0.006851585 1.005261421 0.994774103 5 -0.003425792 0.003425792 1.002581358 0.997454226 6 -0.001712896 0.001712896 1.001183033 0.998852491 7 -0.000856448 0.000856448 1.000250816 0.999318600 8 -0.000428224 0.000428224 1.001183033 0.999318600 9 -0.000214112 0.000214112 0.999318600 0.999318600 10 -0.000107056 0.000107056 0.999318600 0.999318600 11 -0.000053528 0.000053528 0.999318600 0.999318600 12 -0.000026764 0.000026764 1.014233828 1.014233828 13 -0.000013382 0.000013382 1.014233828 1.014233828 14 -0.000006691 0.000006691 0.954572976 0.954572976 15 -0.000003346 0.000003346 0.954572976 0.954572976
Example of something ‘apparently’ going wrong
Test_Bright_TL output Double Precision HIRS channel 16 Iter Neg dx Pos dx Neg ratio Pos ratio 1 -0.054812675 0.054812675 1.044735123 0.960478588 2 -0.027406337 0.027406337 1.021643085 0.979654926 3 -0.013703169 0.013703169 1.010650418 0.989673749 4 -0.006851584 0.006851584 1.005283591 0.994797430 5 -0.003425792 0.003425792 1.002631531 0.997388722 6 -0.001712896 0.001712896 1.001313217 0.998691846 7 -0.000856448 0.000856448 1.000655973 0.999345292 8 -0.000428224 0.000428224 1.000327828 0.999672488 9 -0.000214112 0.000214112 1.000163875 0.999836205 10 -0.000107056 0.000107056 1.000081927 0.999918092 11 -0.000053528 0.000053528 1.000040961 0.999959044 12 -0.000026764 0.000026764 1.000020480 0.999979521 13 -0.000013382 0.000013382 1.000010240 0.999989761 14 -0.000006691 0.000006691 1.000005120 0.999994880 15 -0.000003346 0.000003346 1.000002560 0.999997440
Switch to DP reveals problem is machine precision
Testing TL Model consistency with Forward Model, part drei
• The previous simple example was easy to test because only one output TL variable
• Ex: If you have a TL input profile with surface variables:– Perturb all input variables at once.– Sometimes it is useful to perturb only a part of the input
vector at once, e.g. Temperature only, or surface parameters only. This will help isolate any the errors.
• In general, test each TL output variable• How this is done depends upon each routine. • If you have stored forward model variables in COMMON, be
careful about getting them mixed with perturbed forward model variables.(call perturbed forward first, then straight forward, then TL).
Testing TL Model consistency with Forward Model, part vier
Tip: If you have a lot of TL inputs, string them into a vector for easy handling using F90 vector operations:Real T(40),q(40),o3(40),tsfc,emissReal FMBuf(122)Equivalence(FMBuf(1), T )Equivalence(FMBuf(41), Q )Equivalence(FMBuf(81), O3 )Equivalence(FMBuf(121), tsfc)Equivalence(FMBuf(122), emiss)Real T_TL(40),q_TL(40),o3_TL(40),tsfc_TL,emiss_TLReal TLBuf(122)Equivalence(TLBuf(1), T_TL )Equivalence(TLBuf(41), Q_TL )Equivalence(TLBuf(81), O3_TL )Equivalence(TLBuf(121), tsfc_TL)Equivalence(TLBuf(122), emiss_TL)TLBuf = FMBuf * .1
Then construct the ratio from the selected element of the output. This method will also make Adjoint testing easier, as we will see.
(Yeah, I know, EQUIVALENCE is archaic in Fortran90)
What good are adjoints?
If your pickup truck is broken, your girl has left you,
and your dog has died:
Using adjoint techniques, you can :
• fix your pickup, • get your girl back, • and bring your dog back to life, as long as they
have been properly linearized. (at least in theory)
Adjoint coding objective
• To make the linearized (TL) code run backwards.• E.g.: TL code inputs linearized temperature profile
and outputs linearized brightness temperature• Adjoint code inputs linearized brightness
temperatures and outputs linearized temperature profile
• Note that I often interchange ‘linearized’ and ‘derivative’
Our objective is the Jacobian
n
m
n
3
n
2
n
1
2
m
2
3
2
2
2
1
1
m
1
3
1
2
1
1
n
m
n
3
n
2
n
1
2
m
2
3
2
2
2
1
1
m
1
3
1
2
1
1
T
q
R
q
R
q
R
q
R
q
R
q
R
q
R
q
Rq
R
q
R
q
R
q
RT
R
T
R
T
R
T
R
T
R
T
R
T
R
T
RT
R
T
R
T
R
T
R
K(x)
Recommended AD Naming Conventions
There is no ‘standard’ naming convention. Here is what I recommend:
• Keep forward model variable names the same
• Append “ _AD” to forward model variable and routine names to describe adjoint variables and routines
How do we derive the Adjoint Code
• By taking the transpose of the Tangent Linear Code
• It’s that simple.
Huh?
Well, maybe it’s not quite that simple.
Adjoint Coding Rules• Call forward model first to initialize forward variables• Reverse the order of TL routine calls• Convert Functions to Subroutines• Reverse the order of active loop indices• Reverse the order of code within loops and routines• Reverse the inputs and outputs of assignment statements• Accumulate the outputs of the assignment statements• Rename TL variables and routines to AD
• Initializing output accumulators is VERY important
Example 1: reverse order of TL routines
TLProgram Main_TL Call Sub1 Call Sub2 Call Sub3
Call Sub1_TL Call Sub2_TL Call Sub3_TLEnd Program Main_TL
AdjointProgram Main_AD Call Sub1 Call Sub2 Call Sub3
Call Sub3_AD Call Sub2_AD Call Sub1_ADEnd Program Main_AD
Example 2: Functions to Subroutines, Reverse code order, reverse assignment I/O&accumulate
TLReal Function Bright_TL(V,Radiance,Radiance_TL,C1,C2)K2 = C2*VK1 = C1*V*V*V
TempTb_TL = K2*Alog(K1/Radiance + 1.)**(-2.) * Radiance_TL/(K1+Radiance) * K1/Radiance (1)
Bright_TL = C2*TempTb_TL (2)
ReturnEnd Function Bright_TL
AdjointSubroutine Bright_AD
(V,Radiance,Radiance_AD,C1,C2,Bright_AD)
K2 = C2*VK1 = C1*V*V*V !inactive constantsTempTb_AD = 0 ! initialize for each
invocationTempTb_AD = TempTb_AD +
C2*Bright_AD (2)
Radiance_AD = Radiance_AD + K2*Alog(K1/Radiance + 1.)**(-
2.) * TempTb_AD/(K1+Radiance) * K1/Radiance (1)
ReturnEnd Subroutine Bright_AD
Example 3 – from Compbright_AD:Reverse inputs and outputs of assignments & accumulate
Sum_AD = Sum_AD ! Doesn’t do anything, we can toss this statementBs_AD = Bs_AD + Sum_AD *Tau(N,ichan)*Emiss(ichan) Tau_AD(N,ichan)= Tau_AD(N,ichan) + Bs(ichan) *Sum_AD *Emiss(ichan) Emiss_AD(ichan)= Emiss_AD(ichan) + Bs(ichan) *Tau(N,ichan)*Sum_AD
Sum_TL = Sum_TL + Bs_TL *Tau(N,ichan) *Emiss(ichan) & + Bs(ichan) *Tau_TL(N,ichan)*Emiss(ichan) & + Bs(ichan) *Tau(N,ichan) *Emiss_TL(ichan)
1 2
3
4
Accumulate Reverse inputs and outputs
Sum = Sum + Bs*Tau(N,ichan)*Emiss(Ichan)
Example 3 revisited ala G&K pg 121m
s
ss
m
s
'
'
'B
'sum
1000
0100
0010
''B''B''1
'
'
'B
'sum
*m
s
s
s
*1m
s
'
'B
'sum
100''B
010''B
001''
0001
'
'
'B
'sum
Taking the transpose and reversing realizations
m is the current realization of the values
TL
AD
This explains why you get four adjoint equations from one tangent linear equation.
Example 4: Reverse indexing of loops
Do I = 1 , Nlevel
B_Tl(I) = T_TL(I)*Tau(I) + T(I)*Tau_TL(I)
EndDo
Do I = Nlevel, 1 , -1
T_AD(I) = T_AD(I) + B_AD(I)*Tau(I)
Tau_AD(I) = Tau_AD(I) + T(I)*B_AD(I)
EndDo
TL
AD
This illustrates reversing loop flow. Doesn’t make any difference for this particular code fragment, but in general it does.
Initializing Accumulators
G&K say zero accumulators after done using them.
However, you have to zero them before you use them the first time,
so just zero them before you start.
AD variables local to a routine should be zeroed there.
Adjoint testing• Objective: Assure that the adjoint is the transpose of the
tangent linear• Method: Construct Jacobians from TL and AD and compare
N inputs -> TL -> M outputsM inputs -> AD -> N outputs
Call TL N times with the ith element=1, all other elements =0Put output into ith row of an NxM array
Call AD M times with the jth element=1, all other elements=0Put output into a jth row of an MxN arrayVerify that AD = TLT to within machine precision
Tangent-Linear Output
X
R
X
R
X
R
X
R K(X) m321
For a single call to TL, output is derivative of each channel radiance with respect to whole input state vector.
Adjoint Output
For a single call to AD, output is derivative of all channel radiances with respect to each element of the input state vector.
n
2
1
n
2
1
T
q
R
q
Rq
RT
R
T
RT
R
)x(K
Filling the Jacobian
We call the TL and AD with all input elements set to zero except one so as to isolate the derivative to a specific element of the Jacobian. This gives the derivative
i
j
x
R
n
m
n
3
n
2
n
1n
2
m
2
3
2
2
2
12
1
m
1
3
1
2
1
11
x
R
x
R
x
R
x
R )K(x
x
R
x
R
x
R
x
R )K(x
x
R
x
R
x
R
x
R )K(x
TL Jacobian Construction
AD Jacobian Construction
n
m
2
m
1
m
n
m
2
m
1
m
Tm
n
2
2
2
1
2
n
2
2
2
1
2
T2
n
1
2
1
1
n
1
2
1
1
1
T1
q
R
q
Rq
RT
R
T
RT
R
)x(K,,
q
R
q
Rq
RT
R
T
RT
R
)x(K,
q
R
q
Rq
RT
R
T
RT
R
)x(K
Machine Precision Considerations
Test that
Abs(TL-AD)/TL < MP
Rule of thumb:
MP = 1.e-7 for Single precision
MP = 1.e-12 for Double precision
Use errors intelligently
• If the AD /= TLT , use the location in the matrix to find the error in the code.
• E.G. if Tsfc_AD /= Tsfc_TT , look where Tsfc_AD is computed for the error.
• Make sure AD variables are initialized to zero.
Adjoint Testing Example! Compute forward model radiance
Call Planck(Vnu(Ichan),Temp,B)
! Compute TL values Temp_TL = 1.0 ! Initialize input B_TL = 0.0 ! Initialize output
Call Planck_TL(Vnu(Ichan),Temp,Temp_TL,B,B_TL) ! tangent linear model
! Compute AD values B_AD = 1.0 ! Initialize input Temp_AD = 0.0 ! Initialize output (accumulator) Call Planck_AD(Vnu(Ichan),Temp,Temp_AD,B,B_AD) ! Adjoint model
! Here the output of the TL is 1x1 and the output of the AD is 1x1,! so Transpose(TL) = AD ==> B_TL = Temp_AD
Write(6,*) B_TL, Temp_AD, B_TL-Temp_AD
Jacobian, or K code development and testing is the easiest of the
three.
The Jacobian code is often called the K code.
The reason is historical.
Our objective is the Jacobian
n
m
n
3
n
2
n
1
2
m
2
3
2
2
2
1
1
m
1
3
1
2
1
1
n
m
n
3
n
2
n
1
2
m
2
3
2
2
2
1
1
m
1
3
1
2
1
1
T
q
R
q
R
q
R
q
R
q
R
q
R
q
R
q
Rq
R
q
R
q
R
q
RT
R
T
R
T
R
T
R
T
R
T
R
T
R
T
RT
R
T
R
T
R
T
R
K(x)
Adjoint Output
For a single call to AD, output is derivative of all channel radiances with respect to each element of the input state vector.
n
2
1
n
2
1
T
q
R
q
Rq
RT
R
T
RT
R
)x(K
We need to distribute the adjoint level derivatives through the
number of channels.
How to do this
• Examine the outputs of the AD.• Add a channel dimension to those that do
not have one.• Add a channel loop if necessary.
• In general, low level routines K code is same as AD. Save channel loop for higher level routines.
Recommended Jacobian Naming Conventions
There is no ‘standard’ naming convention. Here is what I recommend:
• Keep forward model variable names the same
• Append “ _K” to forward model variable and routine names to describe Jacobian variables and routines
Subroutine Planck_AD(V,Temp,Temp_AD, B, B_AD)! Computes Planck AD wavenumber V, Radiance B, temperature Temp and B_ADImplicit NoneReal V ! Input wavenumberReal B ! Input Radiance mW m-2 sr-1 (cm-1)-1Real Temp ! Input Temperature (K)Real Temp_AD ! Output AD TemperatureReal B_AD ! Input AD RadianceReal C1 /1.1905e-5/Real C2 /1.4385/
Temp_AD = Temp_AD + C2*V*B*B*B_AD*exp(C2*V/Temp)/(C1*V*V*V*Temp*Temp)
ReturnEnd Subroutine Planck_AD
Subroutine Planck_K(V,Temp,Temp_K, B, B_K)! Computes Planck K, wavenumber V, temperature Temp, Temp_K, Radiance B and B_KImplicit NoneReal V ! Input wavenumberReal B ! Input Radiance mW m-2 sr-1 (cm-1)-1Real Temp ! Input Temperature (K)Real Temp_K ! Output K TemperatureReal B_K ! Input K RadianceReal C1 /1.1905e-5/Real C2 /1.4385/
Temp_K = Temp_K + C2*V*B*B*B_K*exp(C2*V/Temp)/(C1*V*V*V*Temp*Temp)
ReturnEnd Subroutine Planck_K
Subroutine CompBright_Save_AD(Vnu,T,T_AD,Tau,Tau_AD, & Tskin,Tskin_AD,Emiss,Emiss_AD,&
BC1,BC2,N,M, & Tb,Tb_AD)
Active VariablesN levelsM channelsReal T_AD(N) ! Need to expand by channelReal Tau_AD(N,M) ! This is OKReal Tskin_AD ! Need to expand by channelReal Emiss_AD(M) ! This is OK
First examine AD outputs to see which variables need a channel dimension.
Second: Add the channel dimension to these variables in declaration and assignment statements
This is within channel loop:Call Planck_AD &(Vnu(ichan),T(level),T_AD(level),B(level,Ichan),B_AD(level))
Call Planck_K & (Vnu(ichan),T(level),T_K(level,Ichan),B(level,Ichan),B_K(level))
If necessary, add channel loop.
Jacobian Testing
• Goal is to assure that Jacobian is consistent with adjoint.
• Done by making sure that the sum by channel of the Jacobian elements matches the corresponding Adjoint element.
n
2
1
n
2
1
q
R
q
Rq
RT
R
T
RT
R
)(
TADxK
n
m
n
3
n
2
n
1
2
m
2
3
2
2
2
1
1
m
1
3
1
2
1
1
n
m
n
3
n
2
n
1
2
m
2
3
2
2
2
1
1
m
1
3
1
2
1
1
TK
TTTT
T
R
T
R
T
R
T
R
K(x)
q
R
q
R
q
R
q
R
q
R
q
R
q
R
q
Rq
R
q
R
q
R
q
R
RRRR
T
R
T
R
T
R
T
R
AD
m
i Kxx j1 j
i RR
K Code Testing
1. Call Adjoint with ALL inputs set to unity. Make sure you have zeroed all outputs.
2. Call K with ALL inputs set to unity. Make sure you have zeroed all outputs.
3. For each level/variable, sum the channels of K. This should be equal to AD to within machine precision.
4. If a row of K and the corresponding element of AD equal zero, this is probably OK.
K Code Testing- Exception
If the AD output variable already has a channel dimension, don’t sum the K output by channel. Compare the AD and K variables element by element.
Machine Precision Considerations
Test that
Abs(K-AD)/AD < MP
Rule of thumb:
MP = 1.e-7 for Single precision
MP = 1.e-12 for Double precision
Pitfalls
If you compute Abs(K-AD)/AD and find that you get a number like 1, 2, ½ , (an integer or a fraction with one over an integer in the denominator) you probably are summing over channels too many times. Comment out the channel loop from a low level routine and repeat test.
! Ke is summed jacobian… should be equal to ADDo j = 1 , nADout If(Ke(j) /= AD(j)) Then ! Test if unequal ktall = ktall + 1 ! counter for total unequal Write(68,6192) j,Ke(j), AD(j) , ' found',(AD(j) - Ke(j)) / AD(j)
If(Ke(j) == 0.0) Then ! K = 0, AD not Write(68,6192) j,Ke(j), AD(j) , ' AD'
KtAD = KtAD + 1EndIf
If(AD(j) == 0.0) Then ! AD = 0, K not Write(68,6192) j,Ke(j), AD(j) , ' K' KtKe = KtKe + 1EndIf
If(K(j) /= 0.0 .and. AD(j) /= 0.0) Then ! Both not eq 0 If(abs((AD(j) - Ke(j)) / AD(j)) < 1.e-7) Then ! But close enough Write(68,6192) j,Ke(j), AD(j) , ' close',(AD(j) - Ke(j)) / AD(j)
KtClose = KtClose + 1 Else ! BAD BAD BAD
Write(68,6192) j,Ke(j), AD(j) , ' both ',(AD(j) - Ke(j)) / AD(j) KtBad = KtBad + 1
EndIfEndIf
EndIf ! Ke /= ad
EndDo
Testing four ways
Single precision. Double precision reveals no differences.
15 0.2418907582760E+00 0.2418907433748E+00 found -0.6160286147860E-07 18 0.2083575427532E+00 0.2083575576544E+00 found 0.7151725611720E-07 22 0.2486892938614E+00 0.2486893236637E+00 found 0.1198375656486E-06 27 0.2785069644451E+00 0.2785069942474E+00 found 0.1070074446829E-06 28 0.3823396861553E+00 0.3823396563530E+00 found -0.7794724155019E-07 30 0.4918318092823E+00 0.4918317794800E+00 found -0.6059454449314E-07 32 0.4514432251453E+00 0.4514432549477E+00 found 0.6601565871733E-07 41 0.5664035081863E+00 0.5664035677910E+00 found 0.1052335250051E-06 46 0.9583471715450E-01 0.9583472460508E-01 found 0.7774406185490E-07 9 Total elements not matching 0 Elements that are close 0 Elements AD /= 0, K = 0 0 Elements Ke /= 0, AD = 0 0 Elements just don't agree
Using the code
• In real life, just the forward and K codes are used.• The K code is called with just the brightness
temperatures (or equivalent input) set to unity.• DO NOT throw out the TL and AD codes. You
will need them if you make any changes to the forward model. These changes MUST be propagated through the TL,AD and K and tested each step of the way.
We have now seen four ways to compute Jacobians
1. Finite differencing – 2N+1 -forward calcs
2. Tangent Linear – N -TL calcs
3. Adjoint – M -AD calcs
4. K - ~ 3 forward calcs
Das ist alles. Fin. The End