
Revised Chapter 16 in Specifying and Diagnostically Testing Econometric Models (Edition 3) © by Houston H. Stokes 26 January 2010. All rights reserved. Preliminary Draft

Chapter 16.

Programming using the Matrix Command
16.0 Introduction
16.1 Brief Introduction to the B34S Matrix language
16.2 Overview of Nonlinear Capability
16.3 Rules of the Matrix Language
16.4 Linear Algebra using the Matrix Language
16.5 Extended Eigenvalue Analysis
16.6 A Preliminary Investigation of Inversion Speed Differences
     Table 16.1 Relative Cost of Equalization & Refinement of a General Matrix
16.7 Variable Precision Math
     Table 16.2 LRE for Various Approaches to an OLS Model of the Filippelli Data
     Table 16.4 Coefficients estimated with QR using Real*16 Filippelli Data
     Table 16.5 Residual Sum of Squares on a VAR model of Order 6 – Gas Furnace Data
     Table 16.6 LRE for Various Estimates of Coef, SE and RSS of Pontius Data
     Table 16.7 LRE for QR, Cholesky, SVD linpack and lapack for Eberhardt Data
     Table 16.8 VPA Alternative Estimates of Filippelli Data set
     Table 16.9 Lessons to be learned from VPA and Other Accuracy Experiments
16.8 Conclusion

Programming using the Matrix Command

16.0 Introduction

The B34S matrix command is a full-featured 4th generation programming language that allows users to customize calculations that are not possible with the built-in B34S procedures. This chapter provides an introduction to the matrix command language, with special emphasis on linear algebra applications and the general design of the language. It should be thought of as an introduction to and overview of the programming capability, with an emphasis on applications.1 At many junctures the reader is pointed to other chapters for further discussion of the theory underlying the application.

1 In the 1960's, with the advent of Fortran compilers and limited-capability mainframe computers, econometric researchers developed software that used column-dependent input, had limited capability and was difficult to extend, as outlined in Stokes (2004b). In the 1970's and early 1980's econometric software improved, but was still expensive to develop due to high cpu costs on mainframe computers. The PC revolution, together with the development of programming languages such as GAUSS® and MATLAB®, stimulated researchers to develop their own procedures without waiting for software developers to "hard wire" this capability into their commercially distributed systems. The matrix command allows a user to develop custom calculations using a programming language with many econometric commands already available. While many of the cpu-intensive commands are "hard wired" into the language, many are themselves just subroutines or functions written in the matrix language and available to the user to modify as needed. The goal of this chapter is to discuss how this might be done. Many code examples are provided as illustrations of what is available in the language.


The role of this chapter is to provide a reference to matrix command basics.2 An additional and no less important goal is to discuss a number of tests of linpack and lapack eigenvalue, Cholesky and LU routines for speed and accuracy.

Section 16.1 provides a brief introduction to the matrix command language. All commands in the language are listed by type, but because of space limitations are not illustrated in any detail. Since the matrix command has a running example for every command, the user is encouraged to experiment with commands of special interest by first reading the help file and then running one or more of the supplied examples. To illustrate the power of the system, a program that performs a fast Fourier transform with real*8 and complex*16 data is shown. A user subroutine filter illustrates how the language can be extended.3 The help file for the schur factorization, which is widely used in rational expectations models, is provided as an example both to show capability and to illustrate what a representative command help file contains. Section 16.2 provides an overview of some of the nonlinear capability built into the language and motivates why knowledge of this language, or one similar to it such as MATLAB or SPEAKEASY®, is important. The solutions to a variety of problems are illustrated but not discussed in any detail. Section 16.3 discusses many of the rules of the matrix language, while section 16.4 illustrates matrix algebra applications. In sections 16.5 and 16.6 refinements to eigenvalue analysis and inversion speed issues are illustrated. Section 16.7 shows the gains obtained by real*16 and VPA math calculations.

16.1 Brief Introduction to the B34S Matrix language

The matrix language is a full-featured 4th generation language that can be used to program custom calculations. Analysis is supported for real*8, real*16, complex*16 and complex*32 data. A number of character manipulation routines are supported. High resolution graphics are available on all platforms4 and both batch and interactive operation is available. The matrix facility supports user programs, which use the local address space, and subroutines and functions, which have their own address space.5 This design means that variable names inside these routines will not conflict with variables known at the global level, which is set to 100. Variables in the language are built in an object-oriented manner using analytic statements such as:

y = r*2.;

where r is a variable that could be a matrix, 2D array, 1D array, vector or a scalar. The class of the variable determines the calculation performed. If x were an n by k matrix of data values and y

2 Nonlinear modeling examples are discussed in Chapter 11, while many other examples of applications are given in other chapters such as 2 and 14. A subsequent book, Stokes (200x), is under development that will discuss a large number of applications, particularly in the area of time series analysis.

3 This facility is somewhat similar to the MATLAB m file, which contains help comments in its first lines. In contrast to MATLAB, the b34s libraries allow placement of a large number of subroutines and help files in one file. The b34s design dates from the IBM MVS PDS facility except that it is portable. SCA has a similar design as regards macro files.

4 See chapter 1 for the platforms that are supported.

5 Examples will be supplied later.


were an n-element vector of left-hand-side data points, the OLS solution using the textbook formula b = (X'X)-1X'y could be calculated as

beta=inv(transpose(x)*x)*transpose(x)*y;
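
As an aside, the formula above can be tried directly with a short self-contained sketch. The data below are random and purely illustrative and are not part of the original example set; column 5 of a simply stands in for y and columns 1 through 4 for x:

b34sexec matrix;
* Illustrative sketch only - random data stand in for a real problem ;
a=rn(matrix(100,5:));
x=a(integers(1,100),integers(1,4));
y=a(,5);
beta=inv(transpose(x)*x)*transpose(x)*y;
call print('Textbook OLS coefficients',beta);
b34srun;

In practice x and y would of course come from loaded data rather than from rn( ).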

By use of vectors that are created by statements such as integers(1,3) it is possible to subset a matrix without the use of do loops and other programming constructs. This capability is illustrated by

a=rn(matrix(10,5:));
newa= a(integers(1,3),integers(2,4));
call print('Pull off rows 1-3, cols 2-4',a,newa);

which produces output

=> A=RN(MATRIX(10,5:))$

=> NEWA=A(INTEGERS(1,3),INTEGERS(2,4))$

=> CALL PRINT('Pull off rows 1-3, cols 2-4',A,NEWA)$

Pull off rows 1-3, cols 2-4

A = Matrix of 10 by 5 elements

          1             2             3             4             5
  1    1.82793      -2.14489       0.166069     -0.532415E-01  0.466859
  2   -0.641156      0.219954      1.27446       0.477187      0.555387
  3    0.726593     -0.282409E-01 -0.555147      0.410387     -0.373611E-01
  4    0.174686     -0.957929      1.27209       0.940303     -1.63219
  5    1.01451      -0.795788     -0.752745     -0.973475E-01  0.719606E-01
  6   -1.70319      -0.853220     -1.68396       0.888468      0.204583
  7    2.23174      -1.34378       0.551978      0.411578      0.604946
  8    0.256844     -0.266375      1.14227      -0.956681     -0.559318
  9    1.26117      -0.396155     -1.84390       0.628140E-02  1.68875
 10   -0.303238     -1.07086       1.21187       0.295704     -1.71790

NEWA = Matrix of 3 by 3 elements

          1             2             3
  1   -2.14489       0.166069     -0.532415E-01
  2    0.219954      1.27446       0.477187
  3   -0.282409E-01 -0.555147      0.410387

In addition to analytic statements that might contain function calls, call statements which provide a branch to a subroutine are supported. Examples are:

call olsq(y x{0 to 10} y{1 to 10} :print);

which estimates an OLS model of y on x lags 0 through 10 and y lags 1 through 10, and

call tabulate(x,y,z);

which produces a table of x, y and z. Both functions and subroutines can be built into the executable, and thus hidden from the user, or themselves written in the matrix language. The formula and solve commands allow recursive solution of an analytic statement over a range of index values. By vectorizing the calculation, at the loss of some generality, these features speed up calculations that would otherwise have had to use do loops, which have substantial overhead. A number of examples that illustrate these features are shown later in this document, where all commands and programming statements are listed.
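
The two call statements just shown can be combined into a minimal runnable sketch. The random series below are purely illustrative and are not part of the original example set; real applications would load actual data instead:

b34sexec matrix;
* Illustrative sketch only - random series used in place of real data ;
a=rn(matrix(200,3:));
y=a(,1);
x=a(,2);
z=a(,3);
call olsq(y x{0 to 10} y{1 to 10} :print);
call tabulate(x,y,z);
b34srun;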


Inspection of the language will show that the matrix facility has been influenced closely by SPEAKEASY®, which was developed by Stan Cohen. The programming languages of the two systems are very similar and share the same save-file structure. However, there are a number of important differences that will be discussed further below. The matrix facility is not designed to run interactively, although commands can be given interactively in the Manual Mode. Output is written to the b34s.out file and error messages are displayed in both the b34s.out and b34s.log files. The objective of the matrix facility is to give the user access to a powerful object-oriented programming language so that custom calculations can be made.

A particular strength of the facility is the ability to estimate complex nonlinear least squares and maximum likelihood models. Such models, which are specified in matrix command programs, can be solved using either subroutines or the nonlinear commands nleq, nlpmin1, nlpmin2, nlpmin3, nllsq, maxf1, maxf2, maxf3, cmaxf1, cmaxf2 and cmaxf3, which are discussed in Chapter 11. While the use of B34S subroutines for the complete calculation would give the user total control of the estimation process, speed would be given up. The above nonlinear commands give the user complete control of the form of the estimated model, which is specified in a matrix command program. Since these programs are called by compiled solvers, there is a substantial speed advantage over a design that writes the solver in a subroutine written in that program's language.6 The nonlinear solvers were designed to call matrix command programs rather than matrix command subroutines, although a link to a subroutine can be made.7

The example listed below illustrates the programming language and shows part of the real*8 and complex*16 fft decomposition of data generated with the dcos function. This example uses the commands dcos, fft and ifft. The code is completely vectorized with no loops. The inverse fft is used to recover the series (times n). Real*8 and complex*16 problems are shown.

* Example from IMSL (10) Math Page 707-709;
n=7.;
ifft=grid(1.,n,1.);
xfft=dcos((ifft-1.)*2.*pi()/n);
rfft=fft(xfft);
bfft=fft(rfft:back);
call tabulate(xfft,rfft,bfft);
* Complex Case See IMSL(10) Math Page 715-717;
cfft=complex(0.0,1.);
hfft=(complex(2.*pi())*cfft/complex(n))*complex(3.0);
xfft=dexp(complex(ifft-1.)*hfft);
cfft=fft(xfft);
bfft=fft(cfft:back);
call tabulate(xfft,cfft,bfft);

The grid command creates a vector running from 1. to 7. in increments of 1. The matrix language supports integer*4 and real*8 data, so the command was n=7., not n=7, which would have created an integer. If data types are mixed, the program will generate a mixed-mode error since the parser does not know in which data type to save the result.

6 Subroutines DUD, DUD2 and NARQ, written in the matrix command language itself, are supplied in the file matrix2.mac to illustrate a fully programmed nonlinear least squares solver using the Marquardt (1963) method that mimics the SAS nonlin command.

7 The technical reason for this is that for a function or subroutine call a duplicate copy of all arguments is made to named storage at the current level plus 1. This way the arguments are "local" in the subroutine, using a possibly different name. The disadvantage is that this takes more space and slows execution. Use of a program allows all variables to be accessed without explicitly being passed.


The complex command creates a complex*16 datatype from 1 or 2 real*8 arguments. In the above example a series is generated, the fft is calculated and the series is recovered (times 7.).

Output is:

=> * EXAMPLE FROM IMSL (10) MATH PAGE 707-709$

=> N=7.$

=> IFFT=GRID(1.,N,1.)$

=> XFFT=DCOS((IFFT-1.)*2.*PI()/N)$

=> RFFT=FFT(XFFT)$

=> BFFT=FFT(RFFT:BACK)$

=> CALL TABULATE(XFFT,RFFT,BFFT)$

 Obs       XFFT          RFFT          BFFT
   1      1.000       -0.2220E-15    7.000
   2     0.6235        3.500         4.364
   3    -0.2225       -0.4653E-15   -1.558
   4    -0.9010       -0.5773E-14   -6.307
   5    -0.9010       -0.2129E-16   -6.307
   6    -0.2225        0.6328E-14   -1.558
   7     0.6235       -0.9279E-17    4.364

=> * COMPLEX CASE SEE IMSL(10) MATH PAGE 715-717$

=> CFFT=COMPLEX(0.0,1.)$

=> HFFT=(COMPLEX(2.*PI())*CFFT/COMPLEX(N))*COMPLEX(3.0)$

=> XFFT=DEXP(COMPLEX(IFFT-1.)*HFFT)$

=> CFFT=FFT(XFFT)$

=> BFFT=FFT(CFFT:BACK)$

=> CALL TABULATE(XFFT,CFFT,BFFT)$

 Obs         XFFT                      CFFT                       BFFT
   1     1.000       0.000      -0.2220E-15  0.4441E-15     7.000      -0.4733E-29
   2    -0.9010      0.4339     -0.2720E-14  0.2818E-15    -6.307       3.037
   3     0.6235     -0.7818      0.7938E-14  0.3890E-15     4.364      -5.473
   4    -0.2225      0.9749      7.000      -0.2209E-14    -1.558       6.824
   5    -0.2225     -0.9749      0.1110E-13  0.3556E-15    -1.558      -6.824
   6     0.6235      0.7818     -0.4496E-14  0.3917E-15     4.364       5.473
   7    -0.9010     -0.4339      0.2165E-14  0.3464E-15    -6.307      -3.037


The matrix command subroutine filter, listed next, shows some other aspects of the language. Within the filter subroutine all variables are local. The filter subroutine can be called with commands such as

call filter(xold,xnew,10.,14.);

subroutine filter(xold,xnew,nlow,nhigh);
/$
/$ Depending on nlow and nhigh filter can be a low pass
/$ or a high pass filter
/$
/$ Real FFT is done for a series. FFT values are zeroed
/$ out if outside range nlow - nhigh. xnew recovered
/$ by inverse FFT
/$
/$ FILTERC subroutine uses Complex FFT
/$
/$ Use of FILTER in place of FILTERC may result in
/$ Phase and Gain loss
/$
/$ xold  - input series
/$ xnew  - filtered series
/$ nlow  - lower filter bound
/$ nhigh - upper filter bound
/$
/$ Routine built 2 April 1999
/$
n=norows(xold);
if(n.le.0)then;
call print('Filter finds # LE 0');
go to done;
endif;
if(nlow.le.0.or.nlow.gt.n)then;
call print('Filter finds nlow not set correctly');
go to done;
endif;
if(nhigh.le.nlow.or.nhigh.gt.n)then;
call print('Filter finds nhigh not set correctly');
go to done;
endif;
fftold = fft(xold);
fftnew = array(n:);
i=integers(nlow,nhigh);
fftnew(i) = fftold(i);
xnew =afam(fft(fftnew :back))*(1./dfloat(n));
done continue;
return;
end;
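
A hypothetical driver for filter is sketched below. It is not part of the original text; it assumes that the subroutine listed above is either included in the same job or has been loaded from a user library, and it simply filters a two-frequency cosine series keeping FFT terms 10 through 14, as in the call shown earlier:

b34sexec matrix;
* Sketch only - assumes subroutine filter shown above is available ;
n=64.;
t=grid(1.,n,1.);
xold=dcos((t-1.)*2.*pi()/n) + dcos((t-1.)*32.*pi()/n);
call filter(xold,xnew,10.,14.);
call tabulate(xold,xnew);
b34srun;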

The complete matrix command vocabulary of over 400 words is listed by subroutine, function and keyword:

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

List of Built-In Matrix Command Subroutines

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

ACEFIT - Alternating Conditional Expectation Model Estimation
ACF_PLOT - Simple ACF Plot
ADDCOL - Add a column to a 2d array or matrix.
ADDROW - Add a row to a 2d array or matrix.


AGGDATA - Aggregate Data under control of an ID Vector.
ALIGN - Align Series with Missing Data
ARMA - ARMA estimation using ML and MOM.
AUTOBJ - Automatic Estimation of Box-Jenkins Model
BACKSPACE - Backspace a unit
BDS - BDS Nonlinearity test.
BESTREG - Best OLS REGRESSION
B_G_TEST - Breusch-Godfrey (1978) Residual Test
BGARCH - Calculate function for a BGARCH model.
BLUS - BLUS Residual Analysis
BPFILTER - Baxter-King Filter.
BREAK - Set User Program Break Point.
BUILDLAG - Builds NEWY and NEWX for VAR Modeling
CCFTEST - Display CCF Function of Prewhitened data
CHAR1 - Place a string in a character*1 array.
CHARACTER - Place a string in a character*1 array.
CHECKPOINT - Save workspace in portable file.
CLEARALL - Clears all objects from workspace.
CLEARDAT - Clears data from workspace.
CLOSE - Close a logical unit.
CLS - Clear screen.
CMAXF1 - Constrained maximization of function using zxmwd.
CMAXF2 - Constrained maximization of function using dbconf/g.
CMAXF3 - Constrained maximization of function using db2pol.
COMPRESS - Compress workspace.
CONSTRAIN - Subset data based on range of values.
CONTRACT - Contract a character array.
COPY - Copy an object to another object
COPYLOG - Copy file to log file.
COPYOUT - Copy file to output file.
COPYTIME - Copy time info from series 1 to series 2
COPYF - Copy a file from one unit to another.
CSPECTRAL - Do cross spectral analysis.
CSUB - Call Subroutine
CSV - Read and Write a CSV file
DATA_ACF - Calculate ACF and PACF Plots
DATA2ACF - Calculate ACF and PACF Plots added argument
DATAFREQ - Data Frequency
DATAVIEW - View a Series Under Menu Control
DELETECOL - Delete a column from a matrix or array.
DELETEROW - Delete a row from a matrix or array.
DES - Code / decode.
DESCRIBE - Calculate Moment 1-4 and 6 of a series
DF - Calculate Dickey-Fuller Unit Root Test.
DISPLAYB - Displays a Buffer contents
DIST_TAB - Distribution Table
DODOS - Execute a command string if under dos/windows.
DO_SPEC - Display Periodogram and Spectrum
DO2SPEC - Display Periodogram and Spectrum added argument
DOUNIX - Execute a command string if under unix.
DQDAG - Integrate a function using Gauss-Kronrod rules
DQDNG - Integrate a smooth function using a nonadaptive rule.
DQDAGI - Integrate a function over infinite/semi-infinite interval.
DQDAGP - Integrate a function with singularity points given
DQDAGS - Integrate a function with end point singularities
DQAND - Multiple integration of a function
DTWODQ - Two Dimensional Iterated Integral
ESACF - Extended Sample Autocorrelation Function
ECHOOFF - Turn off listing of execution.
ECHOON - Turn on listing of execution.
EPPRINT - Print to log and output file.
EPRINT - Print to log file.
ERASE - Erase file(s).
EXPAND - Expand a character array
FORMS - Build Control Forms
FORPLOT - Forecast Plot
FREE - Free a variable.
FPLOT - Plot a Function
FPRINT - Formatted print facility.
GAMFIT - Generalized Additive Model Estimation


GARCH - Calculate function for an ARCH/GARCH model.
GARCHEST - Estimate ARCH/GARCH model.
GET - Gets a variable from b34s.
GETDMF - Gets data from a b34s DFM file.
GETKEY - Gets a key
GETMATLAB - Gets data from matlab.
GET_FILE - Gets a File name
GET_NAME - Get Name of a Matrix Variable
GETRATS - Reads RATS Portable file.
GETSCA - Reads SCA FSAVE and MAD portable files.
GMFAC - LU factorization of n by m matrix
GMINV - Inverse of General Matrix using LAPACK
GMSOLV - Solve Linear Equations system using LAPACK
GRAPH - High Resolution graph.
GRAPHP - Multi-Pass Graphics Programming Capability
GRCHARSET - Set Character Set for Graphics.
GRREPLAY - Graph replay and reformat command.
GTEST - Tests output of an ARCH/GARCH Model
GWRITE - Save Objects in GAUSS Format using one file
GWRITE2 - Save objects in GAUSS format using two files
HEADER - Turn on header
HEXTOCH - Convert hex to a character representation.
HINICH82 - Hinich 1982 Nonlinearity Test.
HINICH96 - Hinich 1996 Nonlinearity Test.
HPFILTER - Hodrick-Prescott Filter.
ISEXTRACT - Place data in a structure.
IALEN - Get actual length of a buffer of character data
IBFCLOSE - Close a file that was used for Binary I/O
IBFOPEN - Open a File for Binary I/O
IBFREADC - Reads from a binary file into Character*1 array
IBFREADR - Reads from a binary file into Real*8 array
IBFSEEK - Position Binary read/write pointer
IBFWRITER - Write noncharacter buffer on a binary file
IBFWRITEC - Write character buffer on a binary file
IB34S11 - Parse a token using B34S11 parser
IFILESIZE - Determine number of bytes in a file
IFILLSTR - Fill a string with a character
IGETICHAR - Obtain ichar info on a character buffer
IGETCHARI - Get character from ichar value
IJUSTSTR - Left/Right/center a string
ILCOPY - Move bytes from one location to another
ILOCATESTR - Locate a substring in a string - 200 length max
ILOWER - Lower case a string - 200 length max
INEXTR8 - Convert next value in string to real*8 variable
INEXTR4 - Convert next value in string to real*4 variable
INEXTSTR - Extract next blank delimited sub-string from a string
INEXTI4 - Convert next value in a string to integer.
INTTOSTR - Convert integer to string using format
IRF - Impulse Response Functions of VAR Model
IR8TOSTR - Convert real*8 value to string using format
ISTRTOR8 - Convert string to real*8
ISTRTOINT - Convert string to integer
IUPPER - Upper case a string - 200 length max
I_DRNSES - Initializes the table used by shuffled generators.
I_DRNGES - Get the table used in the shuffled generators.
I_DRNUN - Uniform (0,1) Generator
I_DRNNOR - Random Normal Distribution
I_DRNBET - Random numbers from beta distribution
I_DRNCHI - Random numbers from Chi-squared distribution
I_DRNCHY - Random numbers from Cauchy distribution
I_DRNEXP - Random numbers from standard exponential
I_DRNEXT - Random numbers from mixture of two exponential distributions
I_DRNGAM - Random numbers from standard gamma distribution
I_DRNGCT - Random numbers from general continuous distribution
I_DRNGDA - Random integers from discrete distribution alias approach
I_DRNGDT - Random integers from discrete using table lookup
I_DRNLNL - Random numbers from lognormal distribution
I_DRNMVN - Random numbers from multivariate normal


I_DRNNOA - Random normal numbers using acceptance/rejection
I_DRNNOR - Random normal numbers using CDF method
I_DRNSTA - Random numbers from stable distribution
I_DRNTRI - Random numbers from triangular distribution
I_DRNSPH - Random numbers on the unit circle
I_DRNVMS - Random numbers from Von Mises distribution
I_DRNWIB - Random numbers from Weibull distribution
I_RNBIN - Random integers from binomial distribution
I_RNGET - Gets seed used in IMSL Random Number generators.
I_RNOPG - Gets the type of generator currently in use.
I_RNOPT - Selects the type of uniform (0,1) generator.
I_RNSET - Sets seed used in IMSL Random Number generators.
I_RNGEO - Random integers from Geometric distribution
I_RNHYP - Random integers from Hypergeometric distribution.
I_RNMTN - Random numbers from multinomial distribution
I_RNNBN - Negative binomial distribution
I_RNPER - Random perturbation of integers
I_RNSRI - Index of random sample without replacement
KEENAN - Keenan Nonlinearity test
KSWTEST - K Period Stock Watson Test
KSWTESTM - Moving Period Stock Watson Test
LAGMATRIX - Builds Lag Matrix.
LAGTEST - 3-D Graph to display RSS for OLS Lags
LAGTEST2 - 3-D Graph to display RSS for MARS Lags
LAPACK - Sets Key LAPACK parameters
LM - Engle Lagrange Multiplier ARCH test.
LOAD - Load a Subroutine from a library.
LOADDATA - Load Data from b34s into MATRIX command.
LPMAX - Solve Linear Programming maximization problem.
LPMIN - Solve Linear Programming minimization problem.
LRE - McCullough Log Relative Error
MAKEDATA - Place data in a b34s data loading structure.
MAKEFAIR - Make Fair-Parke Data Loading File
MAKEGLOBAL - Make a variable global (seen at all levels).
MAKELOCAL - Make a variable seen at only local level.
MAKEMATLAB - Place data in a file to be loaded into Matlab.
MAKEMAD - Makes SCA *.MAD datafile from vectors
MAKERATS - Make RATS portable file.
MAKESCA - Make SCA FSV portable file.
MANUAL - Place MATRIX command in manual mode.
MARS - Multivariate Autoregressive Spline Models
MARSPLINE - Updated MARS Command using Hastie-Tibshirani code
MARS_VAR - Joint Estimation of VAR Model using MARS Approach
MAXF1 - Maximize a function using IMSL ZXMIN.
MAXF2 - Maximize a function using IMSL DUMINF/DUMING.
MAXF3 - Maximize a function using simplex method (DU2POL).
MELD - Form all possible combinations of vectors.
MENU - Put up user Menu for Input
MESSAGE - Put up user message and allow a decision.
MINIMAX - Estimate MINIMAX with MAXF2
MISSPLOT - Plot of a series with Missing Data
MQSTAT - Multivariate Q Statistic
MVNLTEST - Multivariate Third Order Hinich Test
NAMES - List names in storage.
NLEQ - Jointly solve a number of nonlinear equations.
NLLSQ - Nonlinear Least Squares Estimation.
NL2SOL - Alternative Nonlinear Least Squares Estimation.
NLPMIN1 - Nonlinear Programming fin. diff. grad. DN2CONF.
NLPMIN2 - Nonlinear Programming user supplied grad. DN2CONG.
NLPMIN3 - Nonlinear Programming user supplied grad. DN0ONF.
NLSTART - Generate starting values for NL routines.
NOHEADER - Turn off header.
OLSQ - Estimate OLS, MINIMAX and L1 models.
OLSPLOT - Plot of Fitted and Actual Data & Res
OPEN - Open a file and attach to a unit.
OUTDOUBLE - Display a Real*8 value at a x, y on screen.
OUTINTEGER - Display an Integer*4 value at a x, y on screen.
OUTSTRING - Display a string value at a x, y point on screen.
PCOPY - Copy an object from one pointer address to another
PERMUTE - Reorder Square Matrix
PISPLINE - Pi Spline Nonlinear Model Building


PLOT - Line-Printer Graphics
POLYFIT - Fit an nth degree polynomial
POLYVAL - Evaluate an nth degree polynomial
POLYMCONV - Convert storage of a polynomial matrix
POLYMDISP - Display/Extract a polynomial matrix
POLYMINV - Invert a Polynomial Matrix
POLYMMULT - Multiply a Polynomial Matrix
PP - Calculate Phillips-Perron Unit Root test
PRINT - Print text and data objects.
PRINTALL - Lists all variables in storage.
PRINTOFF - Turn off Printing
PRINTON - Turn on Printing (This is the default)
PRINTVASV - Resets so that vectors/arrays print as vectors/arrays
PRINTVASCMAT - Vectors/Arrays print as Column Matrix/Array
PRINTVASRMAT - Vectors/Arrays print as Row Matrix/Array
PROBIT - Estimate Probit (0-1) Model.
PVALUE_1 - Present value of $1 received at end of n years
PVALUE_2 - Present value of an Annuity of $1
PVALUE_3 - Present value of $1 received throughout year
QPMIN - Quadratic Programming.
QUANTILE - Calculate interquartile range.
READ - Read data directly into MATRIX workspace from a file.
REAL16INFO - Obtain Real16 info
REAL16OFF - Turn off Real16 add
REAL16ON - Turn on extended accuracy
REAL32OFF - Turn off Real32 add
REAL32ON - Turn on extended accuracy for real*16
REAL32_VPA - Turn on extended accuracy for real*16 using vpa
RESET - Calculate Ramsey (1969) regression specification test.
RESET77 - Thursby - Schmidt Regression Specification Test
RESTORE - Load data back in MATRIX facility from external save file.
RTEST - Test Residuals of Model
RTEST2 - Test Residuals of Model - No RES and Y Plots
REVERSE - Test a real*8 vector for reversibility in Freq. Domain
REWIND - Rewind logical unit.
ROTHMAN - Test a real*8 vector for reversibility in Time Domain
RMATLAB - Runs Matlab
RRPLOTS - Plots Recursive Residual Data
RRPLOTS2 - Plots Recursive Residual Coef
RUN - Terminates the matrix command being in "manual" mode.
SAVE - Save current workspace in portable file format.
SCHUR - Performs Schur decomposition
SCREENCLOSE - Turn off Display Manager
SCREENOPEN - Turn on Display Manager
SCREENOUTOFF - Turn screen output off.
SCREENOUTON - Turn screen output on.
SET - Set all elements of an object to a value.
SETCOL - Set column of an object to a value.
SETLABEL - Set the label of an object.
SETLEVEL - Set level.
SETNDIMV - Sets an element in an n dimensional object.
SETROW - Set row of an object to a value.
SETTIME - Sets the time info in an existing series
SETWINDOW - Set window to main(1), help(2) or error(3).
SIGD - Set print digits. Default g16.8
SIMULATE - Dynamically Simulate OLS Model
SMOOTH - Do exponential smoothing.
SOLVEFREE - Set frequency of freeing temp variables.
SORT - Sort a real vector.
SPECTRAL - Spectral analysis of a vector or 1d array.
STEPWISE - Stepwise OLS Regression
STOP - Stop execution of a matrix command.
SUBRENAME - Internally rename a subroutine.
SUSPEND - Suspend loading and Executing a program
SWARTEST - Stock-Watson VAR Test
SYSTEM - Issue a system command.
TABULATE - List vectors in a table.
TESTARG - Lists what is passed to a subroutine or function.
TIMER - Gets CPU time.


TRIPLES - Calculate Triples Reversibility Test
TSAY - Calculate Tsay nonlinearity test.
TSLINEUP - Line up Time Series Data
TSD - Interface to TSD Data set
VAREST - VAR Modeling
VPASET - Set Variable Precision Math Options
VOCAB - List built-in subroutine vocabulary.
WRITE - Write an object to an external file.

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Matrix Command Built-In Function Vocabulary

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

ACF - Calculate autocorrelation function of a 1d object.
AFAM - Change a matrix or vector to an array class object.
ARGUMENT - Unpack character argument at run-time
ARRAY - Define a 1d or 2d array.
BETAPROB - Calculate a beta probability.
BINDF - Evaluate Binomial Distribution Function
BINPR - Evaluate Binomial Probability Function
BOOTI - Calculate integers to be used with bootstrap.
BOOTV - Bootstraps a vector with replacement.
BOXCOX - Box-Cox Transformation of a series given lambda.
BSNAK - Compute Not a Knot Sequence
BSOPK - Compute optimal spline knot sequence
BSINT - Compute 1-D spline interpolant given knots
BSINT2 - Compute 2-D spline interpolant given knots
BSINT3 - Compute 3-D spline interpolant given knots
BSDER - Compute 1-D spline values/derivatives given knots
BSDER2 - Compute 2-D spline values/derivatives given knots
BSDER3 - Compute 3-D spline values/derivatives given knots
BSITG - Compute 1-D spline integral given knots
BSITG2 - Compute 2-D spline integral given knots
BSITG3 - Compute 3-D spline integral given knots
C1ARRAY - Create a Character*1 array
C8ARRAY - Create a Character*8 array
CATCOL - Concatenates an object by columns.
CATROW - Concatenates an object by rows.
CCF - Calculate the cross correlation function on two objects.
CHAR - Convert an integer in range 0-127 to character.
CHARDATE - Convert julian variable into character date dd\mm\yy.
CHARDATEMY - Convert julian variable into character date mm\yyyy.
CHARTIME - Converts julian variable into character date hh:mm:ss
CHISQPROB - Calculate chi-square probability.
CHTOR - Convert a character variable to a real variable.
CHTOHEX - Convert a character to its hex representation.
CFUNC - Call Function
COMB - Combination of N objects taken M at a time.
COMPLEX - Build a complex variable from two real*8 variables.
CSPLINEFIT - Fit a 1 D Cubic Spline using alternative models
CSPLINE - Calculate a cubic spline for 1 D data
CSPLINEVAL - Calculate spline value given spline
CSPLINEDER - Calculate spline derivative given spline value
CSPLINEITG - Calculate integral of a cubic spline
CUSUM - Cumulative sum.
CUSUMSQ - Cumulative sum squared.
CWEEK - Name of the day in character.
DABS - Absolute value of a real*8 variable.
DARCOS - Arc cosine of a real*8 variable.
DARSIN - Arc sine of a real*8 variable.
DATAN - Arc tan of a real*8 variable.
DATAN2 - Arc tan of x / y. Signs inspected.
DATENOW - Date now in form dd/mm/yy
DBLE - Convert real*4 to real*8.
DCONJ - Conjugate of complex argument.
DCOS - Cosine of real*8 argument.
DCOSH - Hyperbolic cosine of real*8 argument.
DDOT - Inner product of two vectors.


DERF - Error function of real*8/real*16 argument.
DERFC - Inverse of error function.
DERIVATIVE - Analytic derivative of a vector.
DET - Determinant of a matrix.
DEXP - Exponential of a real*8 argument.
DFLOAT - Convert integer*4 to real*8.
DGAMMA - Gamma function of real*8 argument.
DIAG - Place diagonal of a matrix in an array.
DIAGMAT - Create diagonal matrix.
DIF - Difference a series.
DINT - Extract integer part of real*8 number
DNINT - Extract nearest integer part of real*8 number
DIVIDE - Divide with an alternative return.
DLGAMMA - Natural log of gamma function.
DLOG - Natural log.
DLOG10 - Base 10 log.
DMAX - Largest element in an array.
DMAX1 - Largest element between two arrays.
DMIN - Smallest element in an array.
DMIN1 - Smallest element between two arrays.
DMOD - Remainder.
DROPFIRST - Drops observations on top of an array.
DROPLAST - Drops observations on bottom of an array.
DSIN - Calculates sine.
DSINH - Hyperbolic sine.
DSQRT - Square root of real*8 or complex*16 variable.
DTAN - Tangent.
DTANH - Hyperbolic tangent.
EIGENVAL - Eigenvalue of matrix. Alias EIG.
EPSILON - Positive value such that 1.+x ne 1.
EVAL - Evaluate a character argument
EXP - Exponential of real*8 or complex*16 variable.
EXTRACT - Extract elements of a character*1 variable.
FACT - Factorial
FDAYHMS - Gets fraction of a day.
FFT - Fast fourier transform.
FIND - Finds location of a character string.
FLOAT - Converts integer*4 to real*4.
FPROB - Probability of F distribution.
FREQ - Gets frequency of a time series.
FRACDIF - Fractional Differencing
FYEAR - Gets fraction of a year from julian date.
GENARMA - Generate an ARMA series given parameters.
GETDAY - Obtain day of year from julian series.
GETHOUR - Obtains hour of the day from julian date.
GETNDIMV - Obtain value from an n dimensional object.
GETMINUTE - Obtains minute of the day from julian date.
GETMONTH - Obtains month from julian date.
GETQT - Obtains quarter of year from julian date.
GETSECOND - Obtains second from julian date.
GETYEAR - Obtains year.
GOODCOL - Deletes all columns where there is missing data.
GOODROW - Deletes all rows where there is missing data.
GRID - Defines a real*8 array with a given increment.
HUGE - Largest number of type
HYPDF - Evaluate Hypergeometric Distribution Function
HYPPR - Evaluate Hypergeometric Probability Function
INTEGER8 - Load an Integer*8 object from a string
I4TOI8 - Move an object from integer*4 to integer*8
I8TOI4 - Move an object from integer*8 to integer*4
ICHAR - Convert a character to integer in range 0-127.
ICOLOR - Sets Color numbers. Used with Graphp.
IDINT - Converts from real*8 to integer*4.
IDNINT - Converts from real*8 to integer*4 with rounding.
INFOGRAPH - Obtain Interacter Graphics INFO
IMAG - Copy imaginary part of complex*16 number into real*8.
IAMAX - Largest abs element in 1 or 2D object
IAMIN - Smallest abs element in 1 or 2D object
IMAX - Largest element in 1 or 2D object
IMIN - Smallest element in 1 or 2D object
INDEX - Define integer index vector, address n dimensional object.


INLINE - Inline creation of a program
INT - Copy real*4 to integer*4.
INTEGERS - Generate an integer vector with given interval.
INV - Inverse of a real*8 or complex*16 matrix.
INVBETA - Inverse beta distribution.
INVCHISQ - Inverse Chi-square distribution.
INVFDIS - Inverse F distribution.
INVTDIS - Inverse t distribution.
IQINT - Converts from real*16 to integer*4.
IQNINT - Converts from real*16 to integer*4 with rounding.
ISMISSING - Sets to 1.0 if variable is missing
IWEEK - Sets 1. for monday etc.
JULDAYDMY - Given day, month, year gets julian value.
JULDAYQY - Given quarter and year gets julian value.
JULDAYY - Given year gets julian value.
KEEPFIRST - Given k, keeps first k observations.
KEEPLAST - Given k, keeps last k observations.
KIND - Returns kind of an object in integer.
KINDAS - Sets kind of second argument to kind first arg.
KLASS - Returns klass of an object in integer.
KPROD - Kronecker Product of two matrices.
LABEL - Returns label of a variable.
LAG - Lags variable. Missing values propagated.
LEVEL - Returns current level.
LOWERT - Lower triangle of matrix.
MCOV - Consistent Covariance Matrix
MAKEJUL - Make a Julian date from a time series
MASKADD - Add if mask is set.
MASKSUB - Subtract if mask is set.
MATRIX - Define a matrix.
MEAN - Average of a 1d object.
MEDIAN - Median of a real*8 object.
MFAM - Set 1d or 2d array to vector or matrix.
MISSING - Returns missing value.
MLSUM - Sums log of elements of a 1d object.
MOVELEFT - Moves elements of character variable left.
MOVERIGHT - Move elements of character variable right.
NAMELIST - Creates a namelist.
NEAREST - Nearest distinct number of a given type
NCCHISQ - Non central chi-square probability.
NOCOLS - Gets number of columns of an object.
NOELS - Gets number of elements in an object.
NORMDEN - Normal density.
NORMDIST - 1-norm, 2-norm and i-norm distance.
NOROWS - Gets number of rows of an object.
NOTFIND - Location where a character is not found.
OBJECT - Put together character objects.
PDFAC - Cholesky factorization of PD matrix.
PDFACDD - Downdate Cholesky factorization.
PDFACUD - Update Cholesky factorization.
PDINV - Inverse of a PD matrix.
PDSOLV - Solution of a PD matrix given right hand side.
PI - Pi value.
PINV - Generalized inverse.
PLACE - Places characters inside a character array.
POIDF - Evaluate Poisson Distribution Function
POIPR - Evaluate Poisson Probability Function
POINTER - Machine address of a variable.
POLYDV - Division of polynomials.
POLYMULT - Multiply two polynomials
POLYROOT - Solution of a polynomial.
PROBIT - Inverse normal distribution.
PROBNORM - Probability of normal distribution.
PROBNORM2 - Bivariate probability of Normal distribution.
PROD - Product of elements of a vector.
Q1 - Q1 of a real*8 object.
Q3 - Q3 of a real*8 object.
QCOMPLEX - Build complex*32 variable from real*16 inputs.
QINT - Extract integer part of real*16 number
QNINT - Extract nearest integer part of real*16 number


QREAL - Obtain real*16 part of a complex*32 number.
QR - Obtain Cholesky R via QR method using LAPACK.
QRFAC - Obtain Cholesky R via QR method.
QRSOLVE - Solve OLS using QR.
RANKER - Index array that ranks a vector.
RCOND - 1 / Condition of a Matrix
REAL - Obtain real*8 part of a complex*16 number.
R8TOR16 - Convert Real*8 to Real*16
R16TOR8 - Convert Real*16 to Real*8
REAL16 - Input a Real*16 Variable
REC - Rectangular random number.
RECODE - Recode a real*8 or character*8 variable
RN - Normally distributed random number.
ROLLDOWN - Moves rows of a 2d object down.
ROLLLEFT - Moves cols of a 2d object left.
ROLLRIGHT - Moves cols of a 2d object right.
ROLLUP - Moves rows of a 2d object up.
RTOCH - Copies a real*8 variable into character*8.
SEIGENVAL - Eigenvalues of a symmetric matrix. Alias SEIG.
SEXTRACT - Takes data out of a field.
SFAM - Creates a scalar object.
SNGL - Converts real*8 to real*4.
SPACING - Absolute spacing near a given number
SPECTRUM - Returns spectrum of a 1d object.
SUBSET - Subset 1d, 2d array, vector or matrix under a mask.
SUBMATRIX - Define a Submatrix
SUM - Sum of elements.
SUMCOLS - Sum of columns of an object.
SUMROWS - Sum of rows of an object.
SUMSQ - Sum of squared elements of an object.
SVD - Singular value decomposition of an object.
TIMEBASE - Obtains time base of an object.
TIMENOW - Time now in form hh:mm:ss
TIMESTART - Obtains time start of an object.
TINY - Smallest number of type
TDEN - t distribution density.
TO_RMATRIX - Convert Object to Row-Matrix
TO_CMATRIX - Convert Object to Col-Matrix
TO_RARRAY - Convert Object to Row-Array
TO_CARRAY - Convert Object to Col-Array
TO_VECTOR - Convert Object to Vector
TO_ARRAY - Convert Object to Array
TPROB - t distribution probability.
TRACE - Trace of a matrix.
TRANSPOSE - Transpose of a matrix.
UPPERT - Upper Triangle of matrix.
VARIANCE - Variance of an object.
VECTOR - Create a vector.
VFAM - Convert a 1d array to a vector.
VOCAB - List built in functions.
VPA - Variable Precision Math calculation
ZDOTC - Conjugate product of two complex*16 objects.
ZDOTU - Product of two complex*16 objects.
ZEROL - Zero lower triangle.
ZEROU - Zero upper triangle.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Matrix Programming Language key words

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

CALL - Call a subroutine
CONTINUE - go to statement
DO - Starts a do loop
DOWHILE - Starts a dowhile loop
NEXT i - End of a do loop
ENDDO - End of a do loop
ENDDOWHILE - End of a dowhile loop
END - End of a program, function or Subroutine.
EXITDO - Exit a DO loop


EXITIF - Exit an IF statement
FOR - Start a do loop,
FORMULA - Define a recursive formula.
GO TO - Transfer statement
FUNCTION - Beginning of a function.
IF( ) - Beginning of an IF structure
ENDIF - End of an IF( )THEN structure
PROGRAM - Beginning of a program,
RETURN - Next to last statement before end.
RETURN( ) - Returns the result of a function.
SOLVE - Solve a recursive system.
SUBROUTINE - Beginning of subroutine.
WHERE( ) - Starts a where structure.

Within the B34S Display Manager, individual help is available on each command. Usually the help document shows an example. In addition, for each command an example that can be run from the Tasks menu is provided in the file matrix.mac. Users are encouraged to cut and paste the commands from these help documents and example files to create their custom programs. Full documentation for the matrix command can be obtained from the display manager or by running the command

b34sexec help=matrix; b34srun;

Since subroutine libraries and help libraries are text files, users can easily add examples and help entries from their own applications or build libraries of custom procedures. The help file for the schur command, which is shown next, provides an example of the online documentation that is available for all matrix command keywords:

SCHUR - Performs Schur decomposition

call schur(a,s,u);

factors real*8 matrix A such that

A=U*S*transpose(U)

and S is upper triangular.

For complex*16 the equation is

A=U*S*transpose(dconj(U))

U is an orthogonal matrix such that

for real*8

u*transpose(u) = I

Eigenvalues of A are along diagonal of S.

An optional calling sequence for real*8 is

call schur(a,s,z,wr,wi);

where wr and wi are the real and imaginary parts, respectively, of the computed eigenvalues in the same order that they appear on the diagonal of the output Schur form s. Complex conjugate pairs of eigenvalues will appear consecutively with the eigenvalue having the positive imaginary part first.


Optional calling sequence for complex*16 is

call schur(a,s,z,w);

where w contains the complex eigenvalues.

The Schur decomposition can be performed on many real*8 and complex*16 matrices for which eigenvalues cannot be found. For detail see MATLAB manual page 4-36.

The schur command uses the lapack version 3 routines dgees and zgees.

Example:

b34sexec matrix;
* Example from MATLAB - General Matrix;
a=matrix(3,3: 6.,  12.,  19.,
         -9., -20., -33.,
          4.,   9.,  15.);
call schur(a,s,u);
call print(a,s,u);
is_ident=u*transpose(u);
is_a    =u*s*transpose(u);

* Positive Def. case ;
aa=transpose(a)*a;
call schur(aa,ss,uu);
ee=eigenval(aa);
call print(aa,ss,uu,ee);

* Expanded calls;
call schur(a,s,u,wr,wi);
call print('Real and Imag eigenvalues');
call tabulate(wr,wi);

* Testing Properties;
call print(is_a,is_ident);

* Random Problem ;
n=10;
a=rn(matrix(n,n:));
call schur(a,s,u);
call print(a,s,u);
is_ident=u*transpose(u);
is_a    =u*s*transpose(u);
call schur(a,s,u,wr,wi);
call print('Real and Imag eigenvalues');
call tabulate(wr,wi);
call print(is_a,is_ident);

* Complex case ;
a=matrix(3,3: 6.,  12.,  19.,
         -9., -20., -33.,
          4.,   9.,  15.);
ca=complex(a,2.*a);
call schur(ca,cs,cu,cw);
call print(ca,cs,cu,'Eigenvalues two ways',
           cw,eigenval(ca));
is_ca=cu*cs*transpose(dconj(cu));
call print(is_ca);
b34srun;

When run this example produces edited output:

B34S(r) Matrix Command. d/m/y 29/ 6/07. h:m:s 9:15:23.

=> * EXAMPLE FROM MATLAB - GENERAL MATRIX$


=> A=MATRIX(3,3: 6.,  12.,  19.,
=>          -9., -20., -33.,
=>           4.,   9.,  15.)$

=> CALL SCHUR(A,S,U)$

=> CALL PRINT(A,S,U)$

A = Matrix of 3 by 3 elements

          1           2           3
  1    6.00000     12.0000     19.0000
  2   -9.00000    -20.0000    -33.0000
  3    4.00000     9.00000     15.0000

S = Matrix of 3 by 3 elements

          1           2           3
  1   -1.00000     20.7846    -44.6948
  2    0.00000     1.00000    -0.609557
  3    0.00000     0.00000     1.00000

U = Matrix of 3 by 3 elements

          1             2             3
  1   -0.474100      0.664753      0.577350
  2    0.812743      0.782061E-01  0.577350
  3   -0.338643     -0.742959      0.577350

=> IS_IDENT=U*TRANSPOSE(U)$

=> IS_A =U*S*TRANSPOSE(U)$

=> * POSITIVE DEF. CASE $

=> AA=TRANSPOSE(A)*A$

=> CALL SCHUR(AA,SS,UU)$

=> EE=EIGENVAL(AA)$

=> CALL PRINT(AA,SS,UU,EE)$

AA = Matrix of 3 by 3 elements

          1           2           3
  1    133.000     288.000     471.000
  2    288.000     625.000     1023.00
  3    471.000     1023.00     1675.00

SS = Matrix of 3 by 3 elements

          1             2             3
  1    2432.40       0.333414E-12 -0.852649E-13
  2    0.00000       0.599956     -0.810500E-13
  3    0.00000       0.00000       0.685245E-03

UU = Matrix of 3 by 3 elements

          1            2            3
  1   -0.233460    -0.842147     0.486091
  2   -0.506875    -0.321212    -0.799938
  3   -0.829804     0.433141     0.351873

EE = Complex Vector of 3 elements

( 2432. , 0.000 ) ( 0.6000 , 0.000 ) ( 0.6852E-03, 0.000 )

Note that the diagonal of SS contains the eigenvalues shown in EE.

=> * EXPANDED CALLS$


=> CALL SCHUR(A,S,U,WR,WI)$

=> CALL PRINT('Real and Imag eigenvalues')$

Real and Imag eigenvalues

=> CALL TABULATE(WR,WI)$

 Obs        WR           WI
   1      -1.000        0.000
   2       1.000        0.000
   3       1.000        0.000

=> * TESTING PROPERTIES$

=> CALL PRINT(IS_A,IS_IDENT)$

A is recovered from the Schur factorization and U*transpose(U) recovers the identity matrix. To save space the output of the random problem is not shown.

IS_A    = Matrix of 3 by 3 elements

          1           2           3
  1    6.00000     12.0000     19.0000
  2   -9.00000    -20.0000    -33.0000
  3    4.00000     9.00000     15.0000

IS_IDENT= Matrix of 3 by 3 elements

          1             2             3
  1    1.00000       0.555112E-16  0.555112E-16
  2    0.555112E-16  1.00000       0.555112E-16
  3    0.555112E-16  0.555112E-16  1.00000

16.2 Overview of Nonlinear Capability

The B34S matrix command contains a number of nonlinear commands that allow the user to specify a model in a 4th generation language while performing the calculation using compiled code. Chapter 11 discusses nonlinear least squares and a number of maximization/minimization examples. In many cases a matrix command uses routines from the commercially available IMSL subroutine library, LINPACK, LAPACK, EISPACK or FFTPACK. For nonlinear modeling applications, users of the stand-alone IMSL product would have to license a Fortran compiler, write the model and main program in Fortran, build routines to display the results and compile all code each time a model needed to be estimated. In contrast, the B34S implementation allows the user to specify the model in a 4th generation language and further process the results from within a general programming language. Optionally, it is possible to view the solution progress from a GUI. Grouping the nonlinear capability of the matrix command by function (with some overlap) and showing the underlying routines used, the commands are:

Constrained maximization commands:

CMAXF1 - Constrained maximization of function using zxmwd.
CMAXF2 - Constrained maximization of function using dbconf/g.
CMAXF3 - Constrained maximization of function using db2pol.

Unconstrained maximization commands:

MAXF1 - Maximize a function using IMSL ZXMIN.
MAXF2 - Maximize a function using IMSL DUMINF/DUMING.
MAXF3 - Maximize a function using simplex method (DU2POL).


Linear and non-linear programming commands:

LPMAX - Solve Linear Programming maximization problem.
LPMIN - Solve Linear Programming minimization problem.
NLEQ - Jointly solve a number of nonlinear equations.
NLPMIN1 - Nonlinear Programming fin. diff. grad. DN2CONF.
NLPMIN2 - Nonlinear Programming user supplied grad. DN2CONG.
NLPMIN3 - Nonlinear Programming user supplied grad. DN0ONF.

Nonlinear least squares and utility commands:

BGARCH - Calculate function for a BGARCH model.
GARCH - Calculate function for an ARCH/GARCH model.
GARCHEST - Estimate an ARCH/GARCH Model
NLEQ - Jointly solve a number of nonlinear equations.
NLLSQ - Nonlinear Least Squares Estimation.
NL2SOL - Alternative Nonlinear Least Squares Estimation.
NLSTART - Generate starting values for NL routines.
QPMIN - Quadratic Programming.
SOLVEFREE - Set frequency of freeing temp variables.

Integration of a user function Commands:

DQDAG - Integrate a function using Gauss-Kronrod rules
DQDNG - Integrate a smooth function using a nonadaptive rule.
DQDAGI - Integrate a function over infinite/semi-infinite interval.
DQDAGP - Integrate a function with singularity points given
DQDAGS - Integrate a function with end point singularities
DQAND - Multiple integration of a function

Spline and Related Commands:

ACEFIT - Alternating Conditional Expectation Model Estimation
BSNAK - Compute Not a Knot Sequence
BSOPK - Compute optimal spline knot sequence
BSINT - Compute 1-D spline interpolant given knots
BSINT2 - Compute 2-D spline interpolant given knots
BSINT3 - Compute 3-D spline interpolant given knots
BSDER - Compute 1-D spline values/derivatives given knots
BSDER2 - Compute 2-D spline values/derivatives given knots
BSDER3 - Compute 3-D spline values/derivatives given knots
BSITG - Compute 1-D spline integral given knots
BSITG2 - Compute 2-D spline integral given knots
BSITG3 - Compute 3-D spline integral given knots
CSPLINEFIT - Fit a 1 D Cubic Spline using alternative models
CSPLINE - Calculate a cubic spline for 1 D data
CSPLINEVAL - Calculate spline value given spline
CSPLINEDER - Calculate spline derivative given spline value
CSPLINEITG - Calculate integral of a cubic spline
GAMFIT - Generalized Additive Model Estimation
MARS - Multivariate Autoregressive Spline Models
PISPLINE - Pi Spline Nonlinear Model Building

While space limits a full discussion of each command, examples from within each group will be discussed briefly and illustrated using supplied problems in this chapter and in Chapter 11, where the optimization and NLLS capability is discussed in some detail. The strength of the nonlinear capability is that the user has great flexibility to specify the model in a B34S matrix program. Once the model has been coded, the solution proceeds using the built-in command, which consists of compiled code.8


The user can optionally display the solution process on the screen. Some of the applications in this area are shown next.

Integration is an important topic and a number of commands are available. For example, the dqand command solves up to 20 nested integrals, where the user specifies the integrand in a matrix language program. Consider the problem

\int_{-c}^{c}\int_{-c}^{c}\int_{-c}^{c} e^{-(x_1^2 + x_2^2 + x_3^2)}\, dx_3\, dx_2\, dx_1                    (16.2-1)

where the range of integration (-c, c) is successively widened. As c increases the value of the integral approaches \pi^{3/2}. The above problem can be solved with

b34sexec matrix;
* This is a big problem. Note maxsub 100000 ;
program test;
f=dexp(-1.*(x(1)*x(1)+x(2)*x(2)+x(3)*x(3)));
return;
end;

/$ We solve 6 problems - each with wider bounds.
/$ As the constant => inf the integral => pi()**1.5

lowerv=array(3:); upperv=array(3:); x =array(3:);

call print(test);

call echooff;

j=integers(3);

do i=1,6; cc=dfloat(i)/2.0;

lowerv(j)=(-1.)*cc; upperv(j)= cc;

call dqand(f x :name test :lower lowerv :upper upperv :errabs .0001 :errrel .001 :maxsub 100000 :print);

call print('lower set as ',cc:);
call print('results ',%result:);
call print('error ',%error:);
enddo;

call print('Limit answer ',pi()**1.5 :);
b34srun;

to produce answers for the range (–3., 3.) of:

8 In contrast to the B34S design where the nonlinear commands are built into the language, the MATLAB minimization commands fmin and fminbnd are totally written in the MATLAB 4th generation language. While the user can see what is being calculated, the cost is that as the model is solved the MATLAB parser must crack each statement in the command. This design substantially slows execution.


Integration using DQAND

For Integral 1
Lower Integration value  -3.000000000000000
Upper Integration value   3.000000000000000

For Integral 2
Lower Integration value  -3.000000000000000
Upper Integration value   3.000000000000000

For Integral 3
Lower Integration value  -3.000000000000000
Upper Integration value   3.000000000000000

ERRABS set as  1.000000000000000E-04
ERRREL set as  1.000000000000000E-03
MAXSUB set as  100000
Result of Integration  5.567958983584796
Error estimate         3.054134012359100E-08
Limit answer           5.568327996831708

Spline models can be used to fit a model to data where the underlying function is not known. Assume the function

f(x,y,z) = x^3 + xyz                    (16.2-2)

which is very hard to evaluate. The setup

b34sexec matrix;
* Test Example from IMSL(10) ;
call echooff;
nxdata=21;
nydata=6;
nzdata=8;
kx=5;
ky=2;
kz=3;
i=integers(nxdata);
j=integers(nydata);
k=integers(nzdata);
xdata=dfloat(i-11)/10.;
ydata=dfloat(j-1)/5.;
zdata=dfloat(k-1)/dfloat(nzdata-1);
iimax=index(nxdata,nydata,nzdata:);
f=array(iimax:);

do ii=1,nxdata;
do jj=1,nydata;
do kk=1,nzdata;
ii3=index(nxdata,nydata,nzdata:ii,jj,kk);
f(ii3)=(xdata(ii)**3.) + (xdata(ii)*ydata(jj)*zdata(kk));
enddo;
enddo;
enddo;

xknot=bsnak(xdata,kx);
yknot=bsnak(ydata,ky);
zknot=bsnak(zdata,kz);

bscoef3=bsint3(xdata,ydata,zdata,f,xknot,yknot,zknot);

a=0.0;b=1.0;c=.5;d=1.0;


e=0.0;
ff=.5;
val=bsitg3(a,b,c,d,e,ff,xknot,yknot,zknot,bscoef3);

g =.5*(b**4.-a**4.);
h =(b-a)*(b+a);
ri=g*(d-c);
rj=.5*h*(d-c)*(d+c);
exact=.5*(ri*(ff-e)+.5*rj*(ff-e)*(ff+e));
error=val-exact;

call print('Test of bsitg3 ***********************':);
call print('Lower 1 = ',a:);
call print('Upper 1 = ',b:);
call print('Lower 2 = ',c:);
call print('Upper 2 = ',d:);
call print('Lower 3 = ',e:);
call print('Upper 3 = ',ff:);

call print('Integral = ',val:);
call print('Exact = ',exact:);
call print('Error = ',error:);

b34srun;

allows solution of the above three-dimensional problem without explicit knowledge of the function. Note that for this test problem we generate the data, but as far as the bsitg3 command is concerned the function is not known. What is happening in the solution is that splines are used to approximate the function, and from these splines the integral can be calculated. If the exact answer is known, the results can be tested. For this problem the answers were:

Test of bsitg3 ***********************
Lower 1  =  0.000000000000000E+00
Upper 1  =  1.000000000000000
Lower 2  =  0.5000000000000000
Upper 2  =  1.000000000000000
Lower 3  =  0.000000000000000E+00
Upper 3  =  0.5000000000000000
Integral =  8.593750000000001E-02
Exact    =  8.593750000000000E-02
Error    =  1.387778780781446E-17
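
As a cross-check (not part of the original output), the exact value implied by the bounds a,b, c,d and e,ff used in the code can be obtained by hand:

\int_{0}^{1}\int_{.5}^{1}\int_{0}^{.5} (x^3 + xyz)\, dz\, dy\, dx
  = \left(\tfrac{1}{4}\right)(.5)(.5) + \left(\tfrac{1}{2}\right)\left(\tfrac{3}{8}\right)\left(\tfrac{1}{8}\right)
  = .0625 + .0234375 = .0859375

which agrees with the printed value 8.59375E-02.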

More extensive problems involving higher dimensions where the function is not explicitly known can be solved with the mars, gamfit and pispline models, which are documented in Chapter 14 and can be run as procedures or as part of the matrix language.9

A simpler problem that can easily be seen is \int_0^1 x^3\, dx = .25. Its solution is found by:

b34sexec matrix;
* Test Example from IMSL(10) ;
call echooff;
ndata=21;
korder=5;
i     =integers(ndata);
xdata =dfloat(i-11)/10.;
f     =xdata**3.;

xknot =bsnak(xdata,korder);
bscoef=bsint(xdata,f,xknot);
a     =0.0;
b     =1.0;

9 Another option is the acefit command which will estimate an ACE model. The gamfit command estimates GAM models. For further detail see chapter 14.

22

Chapter 16

val =bsitg(a,b,xknot,bscoef);

* fi(x)= x**4./4.;

exact =(b**4./4.)-(a**4./4.);error=exact-val;call print('Test of bsitg ***********************':);call print('Lower = ',a:);call print('Upper = ',b:);call print('Integral = ',val:);call print('Exact = ',exact:);call print('Error = ',error:);

b34srun;

Edited output is:

Test of bsitg ***********************
Lower    =   0.000000000000000E+00
Upper    =    1.000000000000000
Integral =   0.2500000000000001
Exact    =   0.2500000000000000
Error    =  -1.110223024625157E-16

In both of the above examples the data was simulated by evaluation of a function. In most cases this is not possible. The power of the spline capability is that nonlinear models can be fit using few data points. Since a spline model cannot forecast outside the range of the x variable, it would appear that such models are of limited use. However, if a spline model is fit, values can be interpolated and more observations can be generated. These observations can then be used to fit a nonlinear model. Since Chapter 11 contains extensive examples of nonlinear least squares and maximization problems, these features will not be discussed further here. In the next sections we discuss the matrix command language.


16.3 Rules of the Matrix Language

While the B34S help facility is the place to go for detailed instructions, the basic structure of the matrix command can be illustrated by a number of examples and simple rules shown next.

1. Command Form. The matrix command begins with the statement

b34sexec matrix;

and ends with the statement b34srun;

All commands are between these two statements, unless the matrix command is running in interactive manual mode under the Display Manager. This "manual mode" allows only one-line commands to be specified.

2. Sentence Terminator. All matrix statements must end in $ or ;. For example:

x=dsin(q);

There is no continuation character needed and sentences can extend over many lines.

3. Assignment Issues. Mixed mode math is not allowed. For example, assuming x is real*8

x=x*2;

is not allowed because x is real*8 and 2 is an integer*4. The reason mixed mode is not allowed is that the processor would not know what to do with the result. This design is in contrast to many languages that automatically create real*8 values. The correct form for the above statement is:

x=x*2.;

if real*8 results are desired or

x=idint(x)*2;

if you want an integer*4 result and x was real*8 before the command. The form

x=dint(x*2.);

truncates x*2. and places it in the real*8 variable x.
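The three forms can be compared in one short run. The following is a minimal sketch based only on the statements above (the value of x and the printed results are illustrative only):

b34sexec matrix;
x=2.5;
* real*8 result ;
call print(x*2.);
* integer*4 result ;
call print(idint(x)*2);
* truncated real*8 result ;
call print(dint(x*2.));
b34srun;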

4. Structured Objects. Calculated structured objects can only be used on the right of an expression or in a subroutine call as input. For example if x is a 2-D object

mm=mean(x(,3));

calculates the mean of column 3 while


nn=mean(x(3,));

calculates the mean of row 3.

i=integers(2,30); y(i)=x(i-1);

copies x elements from locations 1 through 29 into y locations 2 to 30. What is not allowed is

i=integers(1,29); y(i+1)=x(i);

since it involves a calculated subscript on the left of the equals sign.

5. Data storage issues. Since structured objects repackage the data, they cannot be used for output from a subroutine or function. For example assume x is a 3 by 5 matrix. If we wanted to take the log of a row or column, the correct code is

x(2,)=dlog(x(2,)); x(,3)=dlog(x(,3));

The code

subroutine dd(x); x=log(x); return; end;

call dd(x(2,)); call dd(x(,3));

will not work since rows and columns are repackaged into vectors which do not line up with the original storage of the matrix. If a user function is to be used, then logic such as

b34sexec matrix;
x=matrix(3,3:1 2 3 4 5 6 7 8 9);
call print(x);

function dd(x);
yy=dlog(x);
return(yy);
end;

x(2,)=dd(x(2,));
x(,3)=dd(x(,3));
call print(x);

b34srun;

should be used. However, the above code has a "hidden" bug that impacts the x(2,3) element. The reader should study what is happening. As a hint: the original x(2,3) term was 6 and the log of 6 is 1.7918, yet the value found in the x(2,3) position is .583179, which is log(log(6)). The row replacement x(2,)=dd(x(2,)); has already taken the log of that element before the column replacement x(,3)=dd(x(,3)); runs, which might not be what is intended.
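One way to avoid the double transformation, while still using a user function, is to take both slices from an untouched copy of x so that the column replacement sees the original values. A minimal sketch, reusing the 3 by 3 example above (the variable name xcopy is illustrative only):

b34sexec matrix;
x=matrix(3,3:1 2 3 4 5 6 7 8 9);
xcopy=x;

function dd(x);
yy=dlog(x);
return(yy);
end;

x(2,)=dd(xcopy(2,));
x(,3)=dd(xcopy(,3));
* x(2,3) is now dlog(6.) rather than dlog(dlog(6.)) ;
call print(x);
b34srun;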

6. Automatic Expansion. Structured objects can be used on the left of an assignment statement to load data. To add another element to x use

x=3.0; x(2)=4.0;

while to place 0.0 in column 2 use

x=rn(matrix(4,4:)); x(,2)=0.;

To place 99. in row 3

x(3,)=99.;

while to set element 3,2 of x to .77 use

x(3,2)=.77;

The following code shows advanced structured index processing. This code is available in matrix.mac in overview_2

/$ Illustrates Structural Index Processing
b34sexec matrix;

x =rn(matrix(6,6:));y =matrix(6,6:);yy =matrix(6,6:);z =matrix(6,6:);zz =matrix(6,6:);i=integers(4,6);j=integers(1,3);xhold=x;

hold=x(,i);call print('cols 4-6 x go to hold',x,hold);

y(i, )=xhold(j,);call print('Rows 1-3 xhold in rows 4-6 y ',xhold,y);y=y*0.0;j2 =xhold(j,);y(i, )=j2 ;call print('Rows 1-3 xhold in rows 4-6 y ',xhold,y);

z(,i)=xhold(,j);call print('cols 1-3 xhold in cols 4-6 z ',xhold,z);j55 =xhold(,j);z=z*0.0;z(,i)=j55;call print('cols 1-3 xhold in cols 4-6 z ',xhold,z);


yy=yy*0.0;yy(i,)=xhold;call print('rows 1-3 xhold in rows 4-6 yy',xhold,yy);

zz=zz*0.0;do ii=1,3;jj=ii+3;zz(,jj)=xhold(ii,);enddo;

/; i=integers(4,6);
/; j=integers(1,3);

call print('Note that zz(,i)= xhold(j,) will not work':);call print('Testing zzalt(,i)= transpose(xhold(j,))':);

/; Use of Transpose speeds things up over do loop

zzalt=zz*0.0;zzalt(,i)= transpose(xhold(j,)) ;call print('rows 1-3 xhold in cols 4-6 zz',xhold,zz,zzalt);

zz=zz*0.0;zzalt=zz;do ii=1,3;jj=ii+3;zz(jj,)=xhold(,ii);enddo;

call print('Note that zz(i,)=xhold(,j) will not work':);call print('Testing zzalt(i,)= transpose(xhold(,j))':);zzalt(i,)=transpose(xhold(,j));call print('cols 1-3 xhold in rows 4-6 zz',xhold,zz,zzalt);

oldx=rn(matrix(20,6:));newx= matrix(20,5:);i=integers(4);newx(,i)=oldx(,i);call print('Col 1-4 in oldx goes to newx',oldx,newx);

oldx=rn(matrix(20,6:));newx= matrix(20,5:);i=integers(4);

newx(1,i)=oldx(1,i);call print('This puts the first element in col ',oldx,newx);newx=newx*0.0;

newx(i,1)=oldx(i,1);call print('This puts the first element in row ',oldx,newx);newx=newx*0.0;

newx( ,i)=oldx( ,i);call print('Whole col copied here',oldx,newx);

oldx=rn(matrix(10,5:));newx= matrix(20,5:);


i=integers(4);

newx(i,1)=oldx(i,1);call print('This puts the first element in row ',oldx,newx);

newx=newx*0.0;newx(i,)=oldx(i,);call print('Whole row copied',oldx,newx);

* We subset a matrix here ;a=rn(matrix(10,5:));call print('Pull off rows 1-3, cols 2-4', a,a(integers(1,3),integers(2,4)));

b34srun;

The reader is invited to run this sample program and inspect the results. Structured index programming is compact and fast and should be used wherever possible. The do command is provided for the cases, hopefully few and far between, when structured index processing is not possible. In this example it is demonstrated that, by use of the transpose after the structured extract, a do loop is not required. Structured index programming takes care but can achieve great gains by lowering the parser overhead implicit in a do loop.

7. Restrictions on the left hand side of an expression. Functions or math expressions are not allowed on the left hand side of an equation. Assume the user wants to load another row. The command

x(norows(x)+1,)=v;

in the sequence

x=matrix(3,3:1 2 3 4 5 6 7 8 9); v=vector(3:22 33 44); x(norows(x)+1,)=v;

will not work. The correct way to proceed is:

x=matrix(3,3:1 2 3 4 5 6 7 8 9); v=vector(3:22 33 44); n=norows(x)+1; x(n,)=v;

to produce a 4 by 3 matrix.


Note that the matrix, array and vector commands automatically convert integers to real*8 in the one exception to rule 3 about mixed mode operations above. The command

x(i+1)=value;

will not work since there is a calculation implicit on the left. The correct code is:

j=i+1; x(j)=value;

Advanced code includes:

b34sexec matrix display=col80medium;
x=matrix(3,3:1 2 3 4 5 6 7 8 9);
v=vector(:1 2 3 4 5 6 7 8 9);
xx=matrix(3,3:v);
/; Note that xx is saved by columns hence the elements
/; in xx2 repack into a 9 by 1 vector of the columns of x
/; xx3 is transpose(x)
xx2=matrix(9,1:xx);
xx3=matrix(3,3:xx2);
call print(x,v,xx,xx2,xx3);
b34srun;

X = Matrix of 3 by 3 elements

1 2 3 1 1.00000 2.00000 3.00000 2 4.00000 5.00000 6.00000 3 7.00000 8.00000 9.00000

V = Vector of 9 elements

1.00000 2.00000 3.00000 4.00000 5.00000 6.00000 7.00000 8.00000 9.00000

XX = Matrix of 3 by 3 elements

1 2 3 1 1.00000 2.00000 3.00000 2 4.00000 5.00000 6.00000 3 7.00000 8.00000 9.00000

XX2 = Matrix of 9 by 1 elements

1 1 1.00000 2 4.00000 3 7.00000 4 2.00000 5 5.00000 6 8.00000 7 3.00000 8 6.00000 9 9.00000


XX3 = Matrix of 3 by 3 elements

1 2 3 1 1.00000 4.00000 7.00000 2 2.00000 5.00000 8.00000 3 3.00000 6.00000 9.00000

1-D and 2-D objects can be concatenated using the catcol and catrow commands. If the objects are of unequal length, missing data will be supplied. The example files for catcol and catrow should be run for further detail.
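As a quick illustration of the unequal-length case, the following minimal sketch (the two vectors are illustrative only) concatenates a 3 element and a 5 element vector with catcol; the shorter column is padded with the missing data code:

b34sexec matrix;
x=vector(3:1 2 3);
y=vector(5:10 20 30 40 50);
z=catcol(x,y);
call print(z);
b34srun;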

8. Matrix/Vector Math vs. Array Math. Matrix and array math is supported. If x is a 3 by 3 matrix, the command

ax=afam(x);

will create a 3 by 3 array ax. If x is a 3 by 1 array, the command

mx=vfam(x);

will create a 3 by 1 matrix mx containing x. To convert x to a vector, column by column use

vvnew=vector(:x);

Array math is element by element math, while matrix math uses linear algebra rules. If v is a vector of 6 elements the command

newv=afam(v)*afam(v);

squares all elements while

p=v*v;

is the inner product or the sum of the elements squared. An important issue is how to handle matrix/vector addition. If A and B are both n by m matrices, the command

c=a+b;

creates the n by m matrix C where c(i,j) = a(i,j) + b(i,j). As Greene (2000, 11) notes, "matrices cannot be added unless they have the same dimensions, in which case they are said to be conformable for addition." If A and B were vectors of length n, then c(i) = a(i) + b(i). If A is an n by n matrix, the statement

c=a+2.;

creates C where C = A + 2I, following the matrix convention that a scalar added to a square matrix is added along the diagonal. If A were a 2-D array, then 2. would be added to every element. If A was a 1-D object, then element by element math would be used. This

convention is similar to SPEAKEASY and in contrast to MATLAB, which for addition and subtraction of scalars handles things as if the objects were arrays. In B34S, if a scalar is added to or subtracted from an m by n matrix where m is not equal to n, an error message is given. For vectors we have

=> VX=VECTOR(5:1 1 1 1 1)$


=> CALL PRINT((VX+1.))$

Vector of 5 elements

2.00000 2.00000 2.00000 2.00000 2.00000

element by element operations. This is similar to MATLAB operations on n by 1 and 1 by n objects which are treated as if they were vectors.
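The two scalar-addition conventions can be compared directly. A minimal sketch, reusing the 3 by 3 example matrix from earlier in the chapter, that prints a square matrix plus a scalar and the corresponding array plus a scalar:

b34sexec matrix;
a=matrix(3,3:1 2 3 4 5 6 7 8 9);
* matrix convention ;
call print(a+2.);
* element-by-element array convention ;
call print(afam(a)+2.);
b34srun;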

9. Keywords as Variable Names. Keywords should not be used as variable names. If they are used, the built-in command with that name is "turned off." This can cause unpredictable results with user programs, subroutines and functions. A user's variable names cannot conflict with user program, subroutine or function names, since the user's code is not loaded unless a statement of the form

call load(name);

is given.

10. Passing Arguments. Subroutines and functions allow passing arguments which can be changed. Structured index arrays cannot be changed (see rule 5 above). For example:

call myprog(x,y); y=myfunc(x,y);

A complete example:

b34sexec matrix;

subroutine test(a); call print('In routine test A= ',a); * Reset a; call character(a,'This is a very long string'); return; end;

/$ pass in character*8

call test('junk');

call character(jj,'some junk going in'); call print(jj);

/$ pass in a character*1 array

call test(jj); call print(jj); b34srun;

Special characters such as : and | are not allowed in user subroutines or function calls because of the difficulty of parsing these characters in the user routine. This restriction may change in future versions of the matrix command if there is demand.

11. Coding assumptions. Statements such as:


y = x-z;

are allowed. Statements such as

y = -x+z;

will not work as intended. The error message will be "Cannot classify sentence Y ...". The command should be given as

y = -1.*x + z;

or better still

y = (-1.*x) + z;

A statement

y = x**2;

where x is real*8 will get a mixed mode message and should be given as

y = x**2.;

Complex statements such as

yhat = b1*dexp(-b2*x)+ b3*dexp(-(x-b4)**2./b5**2.) + b6*dexp(-(x-b7)**2./b8**2.);

will not work as intended; ( ) should be placed around the power expressions and -1.* used in place of the unary minus.

yhat = b1*dexp(-1.0*b2*x)+ b3*dexp(-1.0*((x-b4)**2.)/(b5**2.)) + b6*dexp(-1.0*((x-b7)**2.)/(b8**2.));

It is a good plan to use ( ) to make sure that what is calculated is what is intended.

Examples of matrix language statements:

The statement

y=dsin(x);

is an analytic statement that creates the structured object y by taking the sin of the structured object x. The variable x can be a scalar, 1-D object (array, vector) or a 2-D object (matrix, array). The following code copies elements 5-10 of y to x(2),...,x(7)

i=integers(5,10);j=i-3;x(j)=y(i);

and is much faster than the scalar implementation

do j=2,7;x(j)=y(j+3);


enddo;

which has high parse overhead.

12. Automatic Expansion of Variables – Some Cautionary notes. The following code illustrates automatic expansion issues.

x(1)=10.; x(2)=20.;

The array x contains elements 10. and 20. Warning! The commands

x(1)=10.; x(2)=20;

produces an array of 0 20 since the statement

x(2)=20;

redefines the x array to be integer! This is an easy mistake to make since computers do what we tell them to do very quickly! Statements such as

x(0) = 20.; x(-1)= 20.;

x(1) = 20.;

all set element 1 of x to 20. The x(0) and x(-1) statements will generate a message warning the user.
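A short runnable sketch of this pitfall, based only on the statements above; the integer constant in the second assignment silently redefines x, while y keeps its real*8 type:

b34sexec matrix;
x(1)=10.;
* the integer constant 20 redefines x as integer ;
x(2)=20;
call print(x);
y(1)=10.;
y(2)=20.;
call print(y);
b34srun;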

13. Memory Management. Automatic expansion of arrays will cause the program to "waste" memory since newer copies of the expanded variable will not fit into the old location. The matrix command will have to allocate a new space, which will leave a "hole" in memory. The command

call compress;

cannot be used to compress the workspace if it is given while in a user subroutine, function or program. In addition to space requirements, prior allocation will substantially speed up execution. If memory problems are encountered, the command

call names(all);

can be used to see how the variables are saved in memory and whether as the calculation proceeds more space is used. For example compare the following code;

n=10; x=array(n:); call names(all); do i=1,n; x(i)=dfloat(i); call names(all); enddo;


with

n=10; call names(all); do i=1,n; x(i)=dfloat(i); call names(all); enddo;

The first job will run faster and not use up memory. This job can be found in the file matrix.mac under MEMORY and should be run by users wanting to write efficient subroutines. An alternative is to use the solvefree command as

do i=1,2000; call solvefree(:alttemp); * many commands here ;

call solvefree(:cleantemp); enddo;

The first call with :alttemp sets %%____ style temp variables in place of the default ##____ style. The command :cleantemp resets the temp style to ##____ and cleans all %%____ temps, leaving the ##____ style temps in place. If this capability is used carefully, substantial speed gains can be made. In addition, the maximum number of temps will not be reached. Use of this feature slows down processing and is usually not needed. The command

call solvefree(:cleantemp2);

cleans user temps at or above the current level. This can be useful within a nested call to clean work space. Many systems like SPEAKEASY do automatic compression which substantially slows execution since the location of all variables must be constantly checked on the chance that they have moved. The matrix command releases temp variables after each line of code but does not do a compress unless told to do so. New temps are slotted into unused locations. The latter is not possible if objects are getting bigger during execution of a job.

The dowhile loop is usually cycled many times and needs active memory management. An example is:

b34sexec matrix; sum=0.0; add=1.; ccount=1.; count=1.; tol=.1e-6;

/$ outer dowhile does things 2 times

call outstring(2,2,'We sum until we can add nothing!!'); call outstring(2,4,'Tol set as '); call outdouble(20,4,tol);

call echooff;

call solvefree(:alttemp);


dowhile(ccount.ge.1..and.ccount.le.3.);

sum=0.0; add=1.; count=1.;

dowhile(add.gt.tol); oldsum=sum; sum=oldsum+((1./count)**3.); count=count+1.; call outdouble(2,6,add); add=sum-oldsum;

/$ This section cleans temps

if(dmod(count,10.).eq.0.)then; call solvefree(:cleantemp); call solvefree(:alttemp); endif;

enddowhile;

ccount=ccount+1.; call print('Sum was ',sum:); call print('Count was ',count); enddowhile; b34srun;

14. Missing Data. Missing data often causes problems. Assume the following code:

b34sexec matrix; x=rn(array(10:)); lagx=lag(x,1); y=x-(10.*lagx); goody=goodrow(y); call tabulate(x,lagx,y,goody); b34srun;

Y will contain missing data in row 1. The variable goody will contain the 9 observations that are not missing.

15. Recursive solutions. In many cases the solution to a problem requires recursive evaluation of an expression. While the use of recursive function calls is possible, it is not desirable since there is great overhead in calling the function or subroutine over and over again. The do loop, while still slow, is approximately 100 times faster than a recursive function call. The test problem RECURSIVE in c:\b34slm\matrix.mac documents how slow the recursive function call and do loop are for large problems. Another reason that a recursive function call is not recommended is that the stack must be saved. The best way to handle a recursive call is to use the solve statement to define the expression that has to be evaluated one observation at a time. If the expression contains multiple expressions that are the same, a formula can be defined and used in the solve statement. The formula and solve statements evaluate an expression over a range, one observation at a time. This is in contrast to the usual analytic expression which is evaluated completely on the right before the copy is made. Unlike an expression, a formula or solve statement can refer to itself on the right. The block keyword determines the order in which the formulas are evaluated. If the expression in the solve statement does not have duplicate code,


it is faster not to define a formula. Examples of both approaches are given next. The first problem is a simple expression not requiring a formula. The code

b34sexec matrix;
test=array(10:);
test(1)=.1;
b=.9;
solve(test=b*test(t-1)
      :range 2 norows(test)
      :block test);
call print(test);
b34srun;

works but

test = b*lag(test,1);

will not get the "correct" answer since the right hand side is built before the copy is done. The formula statement requires use of the subscript t unless the variable is a scalar. The use of the formula and solve statements is illustrated below:

b34sexec options ginclude('gas.b34'); b34srun;
b34sexec matrix ;
call loaddata;
double=array(norows(gasout):);
formula double = dlog(gasout(t)*2.);
call names;
call print(double);
test2=array(norows(gasout):);
solve(test2=test2(t-1)+double(t)+double(t-1)
      :range 2, norows(gasout)
      :block double);
call print(mean(test2));
b34srun;

The following two statements are the same but execute at different speeds.

do i=1,n; x(i)=y(i)/2.; enddo;

solve(x=y(t)/2. :range 1 n);

The formula and solve statements can be used to generate an AR(1) model. This is example solve7 in c:\b34slm\matrix.mac

b34sexec matrix ;
* Generate ar(1) model;
g(1)=1.;
theta= .97;
vv = 10. ;
formula jj=g(t-1)*theta+vv*rn(1.);
solve(g=jj(t) :range 2 300 :block jj);
call graph(g :heading 'AR(1) process');
call print(g);
call autobj(g :print :nac 24 :npac 24 :nodif :autobuild );
b34srun;

Solve and formula statements cannot contain user functions. More detail on the solve and formula statements is given below. In RATS, unless do loops are used, recursive models are only allowed in the maximize command, and then only in a very restrictive form.10 The B34S implementation, while slower, allows more choices.

16. User defined data structures. The B34S matrix command allows users to build custom data types. The below example shows the structure PEOPLE consisting of a name field (PNAME), a SSN field (SSN), an age field (AGE), a race field (RACE) and an income field (INCOME). The built-in function sextract( ) is used to take out a field and the built-in subroutine isextract is used to place data in a structure. Both sextract and isextract allow a third argument that operates on an element. The name sextract is "structure extract" while isextract is "inverse structure extract." Use of these commands is illustrated by:

b34sexec matrix; people=namelist(pname,ssn,age,race,income); pname =namelist(sue,joan,bob); ssn =array(:20,100,122); age =idint(array(:35,45,58)); race =namelist(hisp,white,black); income=array(:40000,35000,50000); call tabulate(pname,ssn,age,race,income); call print('This prints the age vector',sextract(people(3))); call print('Second person',sextract(people(1),2), sextract(people(3),2)); * make everyone a year older ; nage=age+1; call isextract(people(3),nage); call print(age); * make first person 77 years old; call isextract(people(3),77,1); call print(age); b34srun;

Data structures are very powerful and, in the hands of an expert programmer, can be made to bring order to complex problems.

17. Advanced programming Concepts and Techniques for Large Problems

Programs such as SPEAKEASY and MATLAB, which are meant to be used interactively, have automatic workspace compression. As a result a SPEAKEASY LINKULE programmer has to check for movement of defined objects anytime an object is created or freed. In a design decision to increase speed, B34S does not move the variables inside named storage unless told to do so. If a do loop terminates and the user is not in a SUBROUTINE, temp variables are freed. If a new temp variable is needed, B34S will try to place this variable in a free slot. If a variable is growing, this may not be possible. Hence it is good programming practice to create arrays and not rely on automatic variable expansion. In a subroutine call, a variable passed in is first copied to another location and set to the current level + 1. Thus there are named storage implications of a subroutine call. The command

call compress;

will manually clean out all temp variables and compress the workspace. While this command takes time, in a large job it may be required to save time and space. Temp variables are named

10 In Rats version 6.30 this restriction seems to have been somewhat lifted


##1 ...... ##999999. If the peak number of temp variables gets > 999999, then B34S has to reuse old names and, as a result, slows down checking to see if a name is currently being used. A call to compress will reset the temp variable counter as well as free up space. If compress is called from a place it cannot run, say in a do loop or in a subroutine, program or function, then it will not run and no message will be given. The matrix command termination message gives space used, peak space used and peak and current temp # usage. Users can monitor their programs with these measures to optimize performance. In the opinion of the developer, the B34S matrix command do loop is too slow. The problem is that the do loop will start to run without knowing the ending point because it is supported at the lowest programming level. In contrast, SPEAKEASY requires that the user have a do loop only in a subroutine, program or function, where the loop end is known in theory. Ways to increase do loop speed are high on the "to do" list. Faster CPUs, better compilers or better chip design may be the answer. The Lahey LF95 compiler appears to make faster do loops than the older Lahey LF90 compiler. This suggests that the compiler's management of the cache may be part of what is slowing the do loop down. The test problem solve6 in c:\b34slm\matrix.mac illustrates some of these issues. Times and gains from the solve statement vary based on the compiler used to build the B34S. Columns 1 and 2 were run on the same machine (400 MHz) with the same source code. The Lahey LF95 compiler was a major upgrade over the older LF90. Column 3 shows the same problem, with the addition of a solvefree call added to the do loop, run on a 1000 MHz machine running LF95 5.6g. In this example the source code was improved. The speed-up exceeds the chip gain of 2.5 (1000/400) and can be attributed to a combination of compiler, source code and chip design improvements.

                  LF90 4.50i    LF95 5.5b    LF95 5.6g
SOLVE time           9.718         9.22        1.3018
DO time             41.69         13.73        1.422
Gain of SOLVE        4.3897        1.49        1.0932

In summary, LF90 appears to make a very slow do loop while LF95 is faster. In simple equations the formula and solve commands are useful. With large, complex sequences of commands, the do loop cost may have to be "eaten" by the user since it is relatively low in comparison to the cost of parsing the expression. Speed can be increased by using variables for constants because at parse time all scalars are made temps. Creating temps outside the loop speeds things up. The following four examples show code of varying speed:

* slow code;
do i=1,1000;
 x(i)=x(i)*2.;
enddo;

* better code;
two=2.0;
do i=1,1000;
 x(i)=x(i)*two;
enddo;

* vectorized code;
i=integers(1,1000);
x=x(i)*2.;

* Compact vectorized code;
x=x(integers(3,1000))*2.;


If all elements need to be changed the fastest code is

x=x*2.;

In the vectorized examples parse time is the same whether there are 10 elements in x or 10,000,000. For speed gains from the use of masks, see # 19 below.

Since B34S can create, compile and execute Fortran programs, for complex calculations a branch to Fortran is always an option. The larger the dataset, the smaller the relative overhead cost. The fortran example in matrix.mac illustrates dynamically building, compiling and calling a Fortran program from the matrix command.

b34sexec matrix;call open(70,'_test.f');call rewind(70);/$ 1234567890call character(test," write(6,*)'This is a test # 2'" " n=1000 " " write(6,*)n " " do i=1,n " " write(6,*) sin(float(i)) " " enddo " " stop " " end ");call write(test,70);call close(70);

/$ lf95 is Lahey Compiler/$ g77 is Linux Compiler/$ fortcl is script to run Lahey LF95 on Unix to link libs

call dodos('lf95 _test.f');* call dounix('g77 _test.f -o_test'); call dounix('lf95 _test.f -o_test');* call dounix('fortcl _test.f -o_test');call dodos('_test > testout':);call dounix('./_test > testout':);call open(71,'testout');call character(test2,' ');call read(test2,71);call print(test2);testd=0.0;n=0;call read(n,71);testd=array(n:);call read(testd,71);call print(testd);

call close(71);call dodos('erase testout');call dodos('erase _test.f');call dounix('rm testout');call dounix('rm _test.f');

b34srun;


A substantially more complex example using a GARCH model is shown next. At issue is how to treat the first non observed second moment observation. Three problems are run. The garchest command does not set this value to 0.0. The Fortran implementation does set the value to 0.0 and 100% matches RATS output. The Fortran implementation is orders of magnitude slower but shows the user's ability to have total control of what is being maximized.

/$ Tests RATS vs GARCHEST vs FORTRAN/$ In the FORTRAN SETUP see line arch(1)=0.0/$ If line is commented out => GARCHEST = FORTRAN/$ If line is not commented out FORTRAN = RATS/$ This illustrates the effect of starting values!!!!!!/$ Also illustrates Fortran as a viable alternative when there/$ are very special models to be run that are recursive in/$ nature

b34sexec options ginclude('b34sdata.mac') member(lee4); b34srun;

%b34slet dorats=1;

/$ Using garchest

%b34slet dob34s1=1;

/$ Using Fortran

%b34slet dob34s2=1;

/$ **********************************************************

%b34sif(&dob34s1.ne.0)%then;b34sexec matrix ;call loaddata ;

* The data has been generated by GAUSS by following settings $* a1 = GMA = 0.09 $* b1_n = GAR = 0.5 ( When Negative) $* b1 = GAR = 0.01 $

call echooff ;

maxlag=0 ;y=doo1 ;y=y-mean(y) ;

v=variance(y) ;arch=array(norows(y):) + dsqrt(v);

* GARCH on a TGARCH Model ;

call garchest(res,arch,y,func,maxlag,n :ngar 1 :garparms array(:.1) :ngma 1 :gmaparms array(:.1) :maxit 2000 :maxfun 2000 :maxg 2000 :steptol .1d-14 :cparms array(2:.1,.1) :print ); b34srun; %b34sendif;

/$ Fortran

%b34sif(&dob34s2.ne.0)%then;


b34sexec matrix ; call loaddata ;

* The data has been generated by GAUSS by following settings $* a1 = GMA = 0.09 $* b1_n = GAR = 0.5 ( When Negative) $* b1 = GAR = 0.01 $

* call echooff ;

/$ Setup fortran

call open(70,'_test.f');call rewind(70);

/$ We now save the Fortran Program in a Character object/$ Will get overflows

call character(fortran,/$23456789012345678901234567890" implicit real*8(a-h,o-z) "" parameter(nn=10000) "" dimension data1(nn) "" dimension res1(nn) "" dimension res2(nn) "" dimension parm(100) "" call dcopy(nn,0.0d+00,0,data1,1)"" call dcopy(nn,0.0d+00,0,res2 ,1)"" open(unit=8,file='data.dat') "" open(unit=9,file='tdata.dat') "" read(8,*)nob "" read(8,*)(data1(ii),ii=1,nob) "" read(9,*)npar "" read(9,*)(parm(ii),ii=1,npar) "" read(9,*) res2(1) "" close(unit=9) "" "" do i=1,nob "" res1(i)=data1(i)-parm(3) "" enddo "" "" func=0.0d+00 "" do i=2,nob "" res2(i) =parm(1)+(parm(2)* res2(i-1) ) +"" * (parm(4)*(res1(i-1)**2) ) "

" if(dabs(res2(i)).le.dmach(3))then "" func= 1.d+40 "" go to 100 "" endif "

" func=func+(dlog(dabs(res2(i))))+ "" * ((res1(i)**2)/res2(i)) "" enddo "" func=-.5d+00*func "" 100 continue "" close(unit=8) "" open(unit=8,file='testout') "" write(8,fmt='(e25.16)')func "" close(unit=8) "" stop "" end ");

/$ Fortran Object written here

call write(fortran,70);call close(70);

maxlag=0 ;y=doo1 ;y=y-mean(y) ;


* compile fortran and save data;

/$ lf95 is Lahey Compiler/$ g77 is Linux Compiler/$ fortcl is script to run Lahey LF95 on Unix to link libs

call dodos('lf95 _test.f');* call dounix('g77 _test.f -o_test');* call dounix('lf95 _test.f -o_test'); call dounix('fortcl _test.f -o_test');call open(72,'data.dat');call rewind(72);call write(norows(y),72);call write(y,72,'(3e25.16)');call close(72);

v=variance(y) ;arch=array(norows(y):) + dsqrt(v);

i=2;j=norows(y);

count=0.0;

call echooff;

program test;

call open(72,'tdata.dat');call rewind(72);npar=4;call write(npar,72);call write(parm,72,'(e25.16)');/$/$ If below line is commented out => GARCHEST = FORTRAN/$ If below line is not commented out FORTRAN = RATS/$ arch(1)=0.0d+00 ;call write(arch(1),72,'(e25.16)');call close(72);

call dodos('_test');call dounix('./_test ');call open(71,'testout');func=0.0;call read(func,71);call close(71);

count=count+1.0; call outdouble(10,5 ,func); call outdouble(10,6 ,count); call outdouble(10,7, parm(1)); call outdouble(10,8, parm(2)); call outdouble(10,9, parm(3)); call outdouble(10,10,parm(4));

return;end;

ll =array(4: -.1e+10, .1e-10,.1e-10,.1e-10);uu =array(4: .1e+10, .1e+10,.1e+10,.1e+10);

* parm=array(:.0001 .0001 .0001 .0001);* parm(1)=v;* parm(3)=mean(y);

rvec=array(4: .1 .1, .1, .1);parm=rvec;


* call names(all);

call cmaxf2(func :name test :parms parm :ivalue rvec :maxit 2000 :maxfun 2000 :maxg 2000 :lower ll :upper uu :print);*call dodos('erase testout'); call dodos('erase _test.exe');*call dounix('rm testout'); call dounix('rm _test');

b34srun;%b34sendif;

%b34sif(&dorats.ne.0)%then;b34sexec options open('rats.dat') unit(28) disp=unknown$ b34srun$b34sexec options open('rats.in') unit(29) disp=unknown$ b34srun$b34sexec options clean(28)$ b34srun$b34sexec options clean(29)$ b34srun$

b34sexec pgmcall$ rats passastspcomments('* ', '* Data passed from B34S(r) system to RATS', '* ') $pgmcards$* The data has been generated by GAUSS by following settings* a1 = GMA = 0.09* b1_n = GAR = 0.5 ( When Negative)* b1 = GAR = 0.01

compute gstart=2,gend=1000

declare series u ;* Residualsdeclare series h ;* Variancesdeclare series s ;* SD*

set rt = doo1set h = 0.0

nonlin(parmset=base) p0 a0 a1 b1 nonlin(parmset=constraint) a1>=0.0 b1>=0.0

* GARCH ************ Not correct model

frml at = rt(t)-p0frml g1 = a0 + a1*at(t-1)**2 + b1*h(t-1)frml logl = -.5*log(h(t)=g1(t))-.5*at(t)**2/h(t)

smpl 2 1000 compute p0 = 0.1 compute a0 = 0.1, a1 = 0.1, b1 =0.1

* maximize(parmset=base+constraint,method=simplex, $* recursive,iterations=100) logl

maximize(parmset=base+constraint,method=bhhh, $ recursive,iterations=10000) logl

b34sreturn;b34srun;

b34sexec options close(28)$ b34srun$b34sexec options close(29)$ b34srun$


b34sexec options/$ dodos('start /w /r rats386 rats.in rats.out') dodos('start /w /r rats32s rats.in /run') dounix('rats rats.in rats.out')$ b34srun$b34sexec options npageout writeout('output from rats',' ',' ') copyfout('rats.out') dodos('erase rats.in','erase rats.out','erase rats.dat') dounix('rm rats.in','rm rats.out','rm rats.dat') $ b34srun%b34sendif;

Edited output is shown next:

B34S 8.11C (D:M:Y) 1/ 7/07 (H:M:S) 8: 3:34 DATA STEP TGARCH GMA .09 GAR2 .5 GAR .01 PAGE 1

Variable # Cases Mean Std Deviation Variance Maximum Minimum

DOO1 1 1000 -0.8631441630E-02 0.3606170368 0.1300446473 1.217023800 -1.143099300 DOO2 2 1000 0.1027493796E-01 0.3416024362 0.1166922244 1.167096100 -1.045661400 DOO3 3 1000 0.1702775142E-02 0.3401124651 0.1156764889 1.246665600 -1.053053800 DOO4 4 1000 -0.1313232100E-01 0.3612532834 0.1305039348 1.152290800 -1.303597500 DOO5 5 1000 0.2912316775E-01 0.3418287186 0.1168468729 1.128081900 -1.161346400 DOO6 6 1000 -0.5975311198E-02 0.3665673422 0.1343716164 1.521247000 -1.480897300 DOO7 7 1000 0.3427857886E-02 0.3380034576 0.1142463373 1.096066300 -0.9579734600 DOO8 8 1000 -0.1170972201E-01 0.3630108981 0.1317769122 1.593915500 -1.135067800 DOO9 9 1000 0.1179714228E-01 0.3544133951 0.1256088546 1.172364000 -1.100395100 DO10 10 1000 0.8936612970E-02 0.3380290307 0.1142636256 1.225441200 -1.297209500 CONSTANT 11 1000 1.000000000 0.000000000 0.000000000 1.000000000 1.000000000

Number of observations in data file 1000 Current missing variable code 1.000000000000000E+31

B34S(r) Matrix Command. d/m/y 1/ 7/07. h:m:s 8: 3:34.

=> CALL LOADDATA $

=> * THE DATA HAS BEEN GENERATED BY GAUSS BY FOLLOWING SETTINGS $

=> * A1 = GMA = 0.09 $

=> * B1_N = GAR = 0.5 ( WHEN NEGATIVE) $

=> * B1 = GAR = 0.01 $

=> CALL ECHOOFF $

GARCH/ARCH Model Estimated using DB2ONF Routine

Constrained Maximum Likelihood Estimation using CMAXF2 Command Finite-difference Gradiant

Model Estimated res1(t)=y(t)-cons(1)-ar(1)*y(t-1) -... -ma(1)*res1(t-1) -... res2(t)= cons(2)+gar(1)*res2(t-1) +... +gma(1)*(res1(t-1)**2)+... where: gar(i) and gma(i) ge 0.0 LF =-.5*sum((ln(res2(t))+res1(t)**2/res2(t)))

Final Functional Value 530.5635346303904 # of parameters 4 # of good digits in function 15 # of iterations 11 # of function evaluations 23 # of gradiant evaluations 13 Scaled Gradient Tolerance 6.055454452393343E-06 Scaled Step Tolerance 1.000000000000000E-15 Relative Function Tolerance 3.666852862501036E-11 False Convergence Tolerance 2.220446049250313E-14 Maximum allowable step size 2000.000000000000 Size of Initial Trust region -1.000000000000000 # of terms dropped in ML 0 1/ Condition of Hessian Matrix 5.340454918178504E-04

# Name order Parm. Est. SE t-stat 1 GAR 1 0.21526580 0.15889283 1.3547861 2 GMA 1 0.15494322 0.43788030E-01 3.5384834 3 CONS_1 0 0.11895044E-01 0.10240513E-01 1.1615672 4 CONS_2 0 0.81870430E-01 0.19816063E-01 4.1315184


SE calculated as sqrt |diagonal(inv(%hessian))| Order of Parms: AR, MA, GAR, GMA, MU, vd, Const 1&2

Hessian Matrix

1 2 3 4 1 947.739 677.515 -31.3699 7092.35 2 711.220 1045.35 -698.983 4863.30 3 29.0281 -669.541 10146.9 111.143 4 7063.23 5153.42 547.320 55246.5

Gradiant Vector

0.296595E-03 0.370295E-03 -0.188742E-03 0.202584E-02

Lower vector

0.100000E-16 0.100000E-16 -0.100000E+33 0.100000E-16

Upper vector

0.100000E+33 0.100000E+33 0.100000E+33 0.100000E+33

B34S Matrix Command Ending. Last Command reached.

Space available in allocator 11869781, peak space used 29977 Number variables used 68, peak number used 69 Number temp variables used 29, # user temp clean 0

B34S(r) Matrix Command. d/m/y 1/ 7/07. h:m:s 8: 3:35.

=> CALL LOADDATA $

=> * THE DATA HAS BEEN GENERATED BY GAUSS BY FOLLOWING SETTINGS $

=> * A1 = GMA = 0.09 $

=> * B1_N = GAR = 0.5 ( WHEN NEGATIVE) $

=> * B1 = GAR = 0.01 $

=> * CALL ECHOOFF $

=> CALL OPEN(70,'_test.f')$

=> CALL REWIND(70)$

=> CALL CHARACTER(FORTRAN, => " implicit real*8(a-h,o-z) " => " parameter(nn=10000) " => " dimension data1(nn) " => " dimension res1(nn) " => " dimension res2(nn) " => " dimension parm(100) " => " call dcopy(nn,0.0d+00,0,data1,1)" => " call dcopy(nn,0.0d+00,0,res2 ,1)" => " open(unit=8,file='data.dat') " => " open(unit=9,file='tdata.dat') " => " read(8,*)nob " => " read(8,*)(data1(ii),ii=1,nob) " => " read(9,*)npar " => " read(9,*)(parm(ii),ii=1,npar) " => " read(9,*) res2(1) " => " close(unit=9) " => " " => " do i=1,nob " => " res1(i)=data1(i)-parm(3) " => " enddo " => " " => " func=0.0d+00 " => " do i=2,nob " => " res2(i) =parm(1)+(parm(2)* res2(i-1) ) +" => " * (parm(4)*(res1(i-1)**2) ) " => " if(dabs(res2(i)).le.dmach(3))then " => " func= 1.d+40 " => " go to 100 " => " endif " => " func=func+(dlog(dabs(res2(i))))+ " => " * ((res1(i)**2)/res2(i)) "


=> " enddo " => " func=-.5d+00*func " => " 100 continue " => " close(unit=8) " => " open(unit=8,file='testout') " => " write(8,fmt='(e25.16)')func " => " close(unit=8) " => " stop " => " end ")$

=> CALL WRITE(FORTRAN,70)$

=> CALL CLOSE(70)$

=> MAXLAG=0 $

=> Y=DOO1 $

=> Y=Y-MEAN(Y) $

=> * COMPILE FORTRAN AND SAVE DATA$

=> CALL DODOS('lf95 _test.f')$

=> * CALL DOUNIX('G77 _TEST.F -O_TEST')$

=> * CALL DOUNIX('LF95 _TEST.F -O_TEST')$

=> CALL DOUNIX('fortcl _test.f -o_test')$

=> CALL OPEN(72,'data.dat')$

=> CALL REWIND(72)$

=> CALL WRITE(NOROWS(Y),72)$

=> CALL WRITE(Y,72,'(3e25.16)')$

=> CALL CLOSE(72)$

=> V=VARIANCE(Y) $

=> ARCH=ARRAY(NOROWS(Y):) + DSQRT(V)$

=> I=2$

=> J=NOROWS(Y)$

=> COUNT=0.0$

=> CALL ECHOOFF$

Constrained Maximum Likelihood Estimation using CMAXF2 Command Final Functional Value 530.5828775439368 # of parameters 4 # of good digits in function 15 # of iterations 28 # of function evaluations 38 # of gradiant evaluations 30 Scaled Gradient Tolerance 6.055454452393343E-06 Scaled Step Tolerance 3.666852862501036E-11 Relative Function Tolerance 3.666852862501036E-11 False Convergence Tolerance 2.220446049250313E-14 Maximum allowable step size 2000.000000000000 Size of Initial Trust region -1.000000000000000 1 / Cond. of Hessian Matrix 4.118214520668572E-04

# Name Coefficient Standard Error T Value 1 BETA___1 0.74116830E-01 0.20221217E-01 3.6653001 2 BETA___2 0.28143495 0.17082386 1.6475154 3 BETA___3 0.11717038E-01 0.11319858E-01 1.0350870 4 BETA___4 0.14761371 0.46375692E-01 3.1829975


SE calculated as sqrt |diagonal(inv(%hessian))|

Hessian Matrix

1 2 3 4 1 65640.6 8265.13 1262.64 6204.13 2 8277.36 1086.36 115.630 858.414 3 1376.24 147.313 8253.82 -388.923 4 6230.55 865.144 -369.582 1216.54

Gradiant Vector

0.179395E-02 0.248061E-03 -0.844847E-04 0.213908E-03

Lower vector

-0.100000E+10 0.100000E-10 0.100000E-10 0.100000E-10

Upper vector

0.100000E+10 0.100000E+10 0.100000E+10 0.100000E+10

B34S Matrix Command Ending. Last Command reached.

Space available in allocator 11868953, peak space used 28116 Number variables used 58, peak number used 60 Number temp variables used 10326, # user temp clean 0

B34S 8.11C (D:M:Y) 1/ 7/07 (H:M:S) 8: 3:56 PGMCALL STEP TGARCH GMA .09 GAR2 .5 GAR .01 PAGE 2

output from rats

* * Data passed from B34S(r) system to RATS * CALENDAR(IRREGULAR) ALLOCATE 1000 OPEN DATA rats.dat DATA(FORMAT=FREE,ORG=OBS, $ MISSING= 0.1000000000000000E+32 ) / $ DOO1 $ DOO2 $ DOO3 $ DOO4 $ DOO5 $ DOO6 $ DOO7 $ DOO8 $ DOO9 $ DO10 $ CONSTANT SET TREND = T TABLE Series Obs Mean Std Error Minimum Maximum DOO1 1000 -0.008631 0.360617 -1.143099 1.217024 DOO2 1000 0.010275 0.341602 -1.045661 1.167096 DOO3 1000 0.001703 0.340112 -1.053054 1.246666 DOO4 1000 -0.013132 0.361253 -1.303597 1.152291 DOO5 1000 0.029123 0.341829 -1.161346 1.128082 DOO6 1000 -0.005975 0.366567 -1.480897 1.521247 DOO7 1000 0.003428 0.338003 -0.957973 1.096066 DOO8 1000 -0.011710 0.363011 -1.135068 1.593916 DOO9 1000 0.011797 0.354413 -1.100395 1.172364 DO10 1000 0.008937 0.338029 -1.297209 1.225441 TREND 1000 500.500000 288.819436 1.000000 1000.000000

* The data has been generated by GAUSS by following settings * a1 = GMA = 0.09 * b1_n = GAR = 0.5 ( When Negative) * b1 = GAR = 0.01 compute gstart=2,gend=1000 declare series u ;* Residuals declare series h ;* Variances declare series s ;* SD * set rt = doo1 set h = 0.0 nonlin(parmset=base) p0 a0 a1 b1 nonlin(parmset=constraint) a1>=0.0 b1>=0.0 * GARCH ************ Not correct model frml at = rt(t)-p0 frml g1 = a0 + a1*at(t-1)**2 + b1*h(t-1) frml logl = -.5*log(h(t)=g1(t))-.5*at(t)**2/h(t) smpl 2 1000 compute p0 = 0.1 compute a0 = 0.1, a1 = 0.1, b1 =0.1 * maximize(parmset=base+constraint,method=simplex, $ * recursive,iterations=100) logl maximize(parmset=base+constraint,method=bhhh, $


recursive,iterations=10000) logl

MAXIMIZE - Estimation by BHHH Convergence in 11 Iterations. Final criterion was 0.0000017 <= 0.0000100 Usable Observations 999 Function Value 530.58287754

Variable Coeff Std Error T-Stat Signif ******************************************************************************* 1. P0 0.0030855387 0.0113778017 0.27119 0.78624540 2. A0 0.0741165760 0.0202703454 3.65640 0.00025578 3. A1 0.1476142210 0.0517016977 2.85511 0.00430214 4. B1 0.2814353360 0.1738216127 1.61910 0.10542480

Note that for the RATS and Fortran implementations the function value was 530.582877. The garchest value was 530.5635346303904. What is most surprising is the effect on the parameters, which are shown below for garchest. (Note p0 = CONS_1, a0 = CONS_2, a1 = GMA and b1 = GAR.) This example shows that one observation out of 1000 can make an important difference. It also shows the ability of the user to have 100% control of the function being maximized.

# Name order Parm. Est. SE t-stat 1 GAR 1 0.21526580 0.15889283 1.3547861 2 GMA 1 0.15494322 0.43788030E-01 3.5384834 3 CONS_1 0 0.11895044E-01 0.10240513E-01 1.1615672 4 CONS_2 0 0.81870430E-01 0.19816063E-01 4.1315184

18. Termination Issues. Do loop and if statement terminations must be hit. If this is not done, the maximum if statement limit or do statement limit can be exceeded, depending on program logic. This "limitation" comes from allowing if and do loops outside programs. In Fortran, the complete do loop or if statement is known to the compiler when the executable is built. In an interpreted language such as the matrix command, the command parser does not know about a statement it has not read yet. Within a built-in command such as olsq the possible logical paths are completely known at compile time.

Remark: A human is a curious mixture of compiled and interpreted code. If one drinks too much, it can be predicted what will happen in terms of loss of coordination etc. In this sense the body knows what will occur, given an input. Free will, in contrast to predestination, implies an interpretative structure, where when one hits a branch (become an economist) one will never know what would have occurred had one taken another path.

As an example of interpreted code where the end is never seen consider

loop continue;
if(dabs(z1-z2).gt.1.d-13)then;
 z2=z1;
 z1=dlog(z1)+c;
 go to loop;
endif;

which will never hit the endif;. The B34S parser will not know the position of this statement, and the maximum if statement limit could be hit if the if structure were executed many times. A better approach is not to use an if structure in this situation. Better code is:

loop continue; if(dabs(z1-z2).le.1.d-13)go to nextstep; z2=z1; z1=dlog(z1)+c; go to loop;


nextstep continue;

19. Mask Issues. Assume an array x where for x < 0 we want y=x**2. while for x >= 0 we want y=2*x. A slow but very clear way to do this would be:

do i=1,norows(x); if(x(i) .lt. 0.0)y(i)=x(i)**2.; if(x(i) .ge. 0.0)y(i)=x(i)*2. ; enddo;

since the larger the X array the more parsing is required because the do loop cycles more times. A vectorized way to do the same calculation is to define two masks. Mask1 = 0.0 if the logical expression is false, = 1.0 if it is true. Faster code would be

mask1= x .lt. 0.0 ; mask2= x .ge. 0.0 ; y= mask1*(x**2.0) + mask2*(x*2.0);

Compact fast code would be

y= (x .lt. 0.0)*(x**2.0) + (x .ge. 0.0 )*(x*2.0);

Complete problem:

b34sexec matrix; call print('If X GE 0.0 y=20*x. Else y=2*x':); x=rn(array(20:)); mask1= x .lt. 0.0 ; mask2= x .ge. 0.0 ; y= mask1*(x*2.0) + mask2*(x*20.); call tabulate(x,y,mask1,mask2); b34srun;

Compact code (placing the logical calculation in the calculation expression) is:

b34sexec matrix; call print('If X GE 0.0 y=20*x. Else y=2*x':); x=rn(array(20:)); y= (x.lt.0.0)*(x*2.0) + (x.ge.0.0)*(x*20.); call tabulate(x,y); b34srun;

Logical mask expressions should be used in function and subroutine calls to speed calculation.

20. N Dimensional Objects. While the matrix command saves only 1 and 2 dimensional objects, it is possible to save and address n dimensional objects in 1-D arrays. B34S saves n dimensional objects by column. The command index(2 3 5) creates an integer array with elements 2 3 5, index(2 3 5:) determines the number of elements in a 3 dimensional array with dimensions 2, 3, 5 and index(a,b,c:i,j,k) determines the position in a one dimensional vector of the i, j, k element of a three dimensional array with maximum dimensions a, b and c. The commands:

nn=index(i,j,k:); x=array(nn); call setndimv(index(i,j,k),index(1,2,3),x,value);

will make a 3 dimensional (i, j, k) object x and place value in the 1, 2, 3 position. The function call


yy=getndimv(index(i,j,k),index(1,2,3),x);

or

yy=x(index(i,j,k:1,2,3));

can be used to pull a value out. For example to define the 4 dimensional object x with dimensions 2 3 4 5:

nn=index(2,3,4,5:);x=array(nn:);

To fill this array with values 1.,...,norows(x)

x=dfloat(integers(norows(x)));

or to set the 1, 2, 3, 1 value to 100.

call setndimv(index(2,3,4,5),index(1,2,3,1),x,100.);

Examples of this facility:

b34sexec matrix; x=rn(array(index(4,4,4:):)); call print(x,getndimv(index(4,4,4),index(1,2,1),x));

do k=1,4; do i=1,4; do j=1,4; test=getndimv(index(4,4,4),index(i,j,k),x); call print(i,j,k,test); enddo; enddo; enddo;

b34srun;

b34sexec matrix; xx=index(1,2,3,4,5,4,3); call names(all); call print(xx);

call print('Integer*4 Array ',index(1 2 3 4 5 4 3)); call print('# elements in 1 2 3 4 is 24',index(2 3 4:)); call print('Position of 1 2 in a 4 by 4 is 5',index(4 4:1 2):);

call print('Integer*4 Array ',index(1,2,3,4,5 4 3));call print('# elements in 1 2 3 5 is 30',index(2,3,5:));call print('Position of 1 3 in a 4 by 4 is 9',index(4,4:1,3):);b34srun;

b34sexec matrix;mm=index(4,5,6:);xx=rn(array(mm:));idim =index(4,5,6);idim2=index(2,2,2);call setndimv(idim,idim2,xx,10.);vv= getndimv(idim,idim2 ,xx);call print(xx,vv);b34srun;


21. Complex Math Issues. The statements

x=complex(1.5,1.5);y=complex(1.0,0.0);a=x*y;

produce a=(1.5,1.5). To zero out the imaginary part of a use

a=complex(real(x*y),0.0);
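These statements can be run as a small self-contained sketch:

b34sexec matrix;
x=complex(1.5,1.5);
y=complex(1.0,0.0);
a=x*y;
call print(a);
* zero out the imaginary part ;
a=complex(real(x*y),0.0);
call print(a);
b34srun;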

In summary, the B34S matrix facility provides a 4th generation programming language that is tailored to applied econometrics and time series applications. The next section discusses basic linear algebra using the matrix facility.

16.4 Linear Algebra using the Matrix Language

Basic rules of linear algebra as discussed in Greene (2000) are illustrated using the matrix command. Although the complex domain is supported, due to space limitations, this material was removed. Interested readers can look at the extensive example files for each individual matrix command. Assume A is an m by n matrix, B is n by k and C is m by k, then

C = AB,  where c(i,k) = sum over j of a(i,j)*b(j,k)                          (16.4-1)

The following code, which is part of ch16_13 in bookruns.mac, illustrates these calculations:

a=matrix(2,3:1 3 2 4 5,-1);
b=matrix(3,2:2 4 1 6 0 5);
c=a*b;
call print(a,b,c,'AB',a*b,'BA',b*a);
n=3;
a=rn(matrix(n,n:));
b=rn(a);
c=a*b;
call print(a,b,c,'AB',a*b,' '
           'BA',b*a,' '
           'a*(b+c) = a*b+a*c',
            a*(b+c), a*b+a*c );
call print(' ','We show that transpose(a*b) = transpose(b)*transpose(a)',
           transpose(a*b), transpose(b)*transpose(a));

Edited output is:

=> A=MATRIX(2,3:1 3 2 4 5,-1)$

=> B=MATRIX(3,2:2 4 1 6 0 5)$

=> C=A*B$

=> CALL PRINT(A,B,C,'AB',A*B,'BA',B*A)$


A = Matrix of 2 by 3 elements

1 2 3 1 1.00000 3.00000 2.00000 2 4.00000 5.00000 -1.00000

B = Matrix of 3 by 2 elements

1 2 1 2.00000 4.00000 2 1.00000 6.00000 3 0.00000 5.00000

C = Matrix of 2 by 2 elements

1 2 1 5.00000 32.0000 2 13.0000 41.0000

AB

Matrix of 2 by 2 elements

1 2 1 5.00000 32.0000 2 13.0000 41.0000

BA

Matrix of 3 by 3 elements

1 2 3 1 18.0000 26.0000 0.00000 2 25.0000 33.0000 -4.00000 3 20.0000 25.0000 -5.00000

=> N=3$

=> A=RN(MATRIX(N,N:))$

=> B=RN(A)$

=> C=A*B$

=> CALL PRINT(A,B,C,'AB',A*B,' ' => 'BA',B*A,' ' => 'a*(b+c) = a*b+a*c', => A*(B+C), A*B+A*C )$

A = Matrix of 3 by 3 elements

1 2 3 1 2.05157 1.27773 -1.32010 2 1.08325 -1.22596 -1.52445 3 0.825589E-01 0.338525 -0.459242

B = Matrix of 3 by 3 elements

1 2 3 1 -0.605638 1.49779 1.26792 2 0.307389 -0.168215 0.741401 3 -1.54789 0.498469 -0.187157

C = Matrix of 3 by 3 elements

1 2 3 1 1.19362 2.19986 3.79559 2 1.32678 1.06882 0.749854 3 0.764914 -0.162207 0.441611

AB

Matrix of 3 by 3 elements

1 2 3


1 1.19362 2.19986 3.79559 2 1.32678 1.06882 0.749854 3 0.764914 -0.162207 0.441611

BA

Matrix of 3 by 3 elements

1 2 3 1 0.484652 -2.18085 -2.06609 2 0.509621 0.849968 -0.489832 3 -2.65109 -2.65224 1.36943

a*(b+c) = a*b+a*c

Matrix of 3 by 3 elements

1 2 3 1 4.32792 8.29281 11.9577 2 -0.172882 2.38876 3.26893 3 0.961325 0.455725 0.806010

Matrix of 3 by 3 elements

1 2 3 1 4.32792 8.29281 11.9577 2 -0.172882 2.38876 3.26893 3 0.961325 0.455725 0.806010

=> CALL PRINT(' ', => 'We show that transpose(a*b) = transpose(b)*transpose(a)', => TRANSPOSE(A*B), TRANSPOSE(B)*TRANSPOSE(A))$

We show that transpose(a*b) = transpose(b)*transpose(a)

Matrix of 3 by 3 elements

1 2 3 1 1.19362 1.32678 0.764914 2 2.19986 1.06882 -0.162207 3 3.79559 0.749854 0.441611

Matrix of 3 by 3 elements

1 2 3 1 1.19362 1.32678 0.764914 2 2.19986 1.06882 -0.162207 3 3.79559 0.749854 0.441611

If we define i as an n by 1 matrix of 1's, then a vector of the means of series x can be calculated as i*transpose(i)*x/n, where x is a vector and n is the number of observations in the vector. This is shown by

b34sexec matrix;
call print(' '
'Define i as a n by 1 matrix of ones':)

n=10;
i=matrix(n,1:vector(n:)+1.);
seriesx=rn(vector(n:));
mm=mean(seriesx);
call print(mm);
meanmm=i*transpose(i)*seriesx/dfloat(norows(seriesx));
call print(meanmm);
b34srun;

which produces values of the mean two different ways:


=> CALL PRINT(' ' => 'Define i as a n by 1 matrix of ones':) => => N=10$

=> I=MATRIX(N,1:VECTOR(N:)+1.)$

=> SERIESX=RN(VECTOR(N:))$

=> MM=MEAN(SERIESX)$

=> CALL PRINT(MM)$

MM = 0.19517131

=> MEANMM=I*TRANSPOSE(I)*SERIESX/DFLOAT(NOROWS(SERIESX))$

=> CALL PRINT(MEANMM)$

MEANMM = Vector of 3 elements

0.195171 0.195171 0.195171

An idempotent matrix M has the property that M*M = M, while if M is symmetric then, in addition, transpose(M)*M = M. Greene (2000, 16) discusses a matrix M0 = I - (1/n)*i*transpose(i) with this property. If Z = [y x], where x and y are vectors, then transpose(Z)*M0*Z/(n-1) calculates the variance covariance matrix. Note that the diagonal elements of M0 are 1-(1/n) while the off-diagonal elements are -(1/n). This is shown next, where we calculate the mean of series x two ways, using mean to get mm and i*transpose(i)*x/n to get meanmm. The covariance matrix is calculated using the cov function and as transpose(Z)*M0*Z/(n-1),

where Z is a matrix whose columns are the vectors of data. Finally M0*i is tested and found to be close to 0.0 as expected. In terms of the program, M0 = bigi - littlei.

b34sexec matrix;
call load(cov :staging);
call echooff;
* Examples from Greene(2000);

/;
/; Use of i
/;
n=3;
call print('Define i as a n by 1 matrix of ones':);
i=matrix(n,1:vector(n:)+1.);
seriesx=rn(vector(n:));
mm=mean(seriesx);
meanmm=i*transpose(i)*seriesx/dfloat(norows(seriesx));
call print(mm,meanmm);

/$ Get Variance Covariance

call print('Define Idempotent matrix M'
           'Diagonal = 1-(1/n). Off Diag -(1/n)':);

bigi=matrix(n,n:)+1.;
littlei=(1./dfloat(n))*(i*transpose(i));
m0=(bigi-littlei);
call print('m0 m0*m0 transpose(m0)*m0',m0,m0*m0,transpose(m0)*m0);

seriesy=rn(seriesx);
con=vector(n:)+1.;
z=catcol(seriesy,seriesx,con);
call print(z);
vcov=transpose(z)*m0*z;
call print(variance(seriesy));
call print(variance(seriesx));
call print('Sums and Cross Products ',vcov,m0*z);
call print('cov(z) ', cov(z));
call print('(1.dfloat(n-1))*vcov', (1./dfloat(n-1))*vcov);
call print('mo * i . Is this 0?',m0*i);
b34srun;

Output is:

B34S(r) Matrix Command. d/m/y 1/ 7/07. h:m:s 15:41: 8.

=> CALL LOAD(COV :STAGING)$

=> CALL ECHOOFF$

Define i as a n by 1 matrix of ones

MM = 1.0724604

MEANMM = Vector of 3 elements

1.07246 1.07246 1.07246

Define Idempotent matrix M
Diagonal = 1-(1/n). Off Diag -(1/n)

m0 m0*m0 transpose(m0)*m0

M0 = Matrix of 3 by 3 elements

1 2 3 1 0.666667 -0.333333 -0.333333 2 -0.333333 0.666667 -0.333333 3 -0.333333 -0.333333 0.666667

Matrix of 3 by 3 elements

1 2 3 1 0.666667 -0.333333 -0.333333 2 -0.333333 0.666667 -0.333333 3 -0.333333 -0.333333 0.666667

Matrix of 3 by 3 elements

1 2 3 1 0.666667 -0.333333 -0.333333 2 -0.333333 0.666667 -0.333333 3 -0.333333 -0.333333 0.666667

Z = Matrix of 3 by 3 elements

1 2 3 1 1.27773 2.05157 1.00000 2 -1.22596 1.08325 1.00000 3 0.338525 0.825589E-01 1.00000


1.5996959

0.96933890

Sums and Cross Products

VCOV = Matrix of 3 by 3 elements

1 2 3 1 3.19939 0.902698 0.555112E-16 2 0.902698 1.93868 0.444089E-15 3 0.433308E-16 0.357201E-15 0.333067E-15

Matrix of 3 by 3 elements

1 2 3 1 1.14763 0.979110 0.111022E-15 2 -1.35606 0.107914E-01 0.111022E-15 3 0.208429 -0.989901 0.111022E-15

cov(z)

Array of 3 by 3 elements

1 2 3 1 1.59970 0.451349 0.00000 2 0.451349 0.969339 0.00000 3 0.00000 0.00000 0.00000

(1.dfloat(n-1))*vcov

Matrix of 3 by 3 elements

1 2 3 1 1.59970 0.451349 0.277556E-16 2 0.451349 0.969339 0.222045E-15 3 0.216654E-16 0.178601E-15 0.166533E-15

mo * i . Is this 0?

Matrix of 3 by 1 elements

1 1 0.111022E-15 2 0.111022E-15 3 0.111022E-15

Since the Variance-Covariance Matrix can be obtained two ways, of interest is which to use. The traditional method is slower, more accurate and takes less space. The idempotent matrix method is faster because it has no do loops but, as will be shown, is not as accurate, especially in the case of real*4 calculations on data that is poorly scaled. This will be demonstrated next by running the following program:

/;
/; This illustrates two ways to get the Variance-Covariance Matrix.
/; Using Greene's Idempotent Matrix and real*4, if the data is not
/; scaled there can be problems of accuracy that are detected using
/; real*16 results as the benchmark
/;
b34sexec options ginclude('gas.b34'); b34srun;
b34sexec matrix;
call loaddata;
call load(cov :staging);
call load(cov2 :staging);
call load(cor :staging);

call echooff;
scale=1000.;
h=catcol(gasin,scale*gasout);
/; call print(h);
call print('Covariance using function cov':);
call print(cov(h));
call print('Covariance using function cov2':);
call print(cov2(h));
call print('Difference of two methods using real*8':);
call print(cov(h)-cov2(h));

call print('Correlation using function cor':);call print(cor(h));

call print('Real*16 results':);

h16=r8tor16(h);call print('Covariance using function cov':);call print(cov(h16));call print('Covariance using function cov2':);call print(cov2(h16));call print('Difference of the two methods using real*16 ':);call print(cov(h)-cov2(h));

/; Testing which is closer

real8_1=afam(cov(h));real8_2=afam(cov2(h));

call print('Difference against real16 for real8_1 & real8_2 ':);call print('Where Traditional Method = real8_1 ':);call print('Where M0 Method = real8_2 ':);call print(r8tor16(real8_1)-afam(cov(h16)));call print(r8tor16(real8_2)-afam(cov(h16)));

call print('Correlation using function cor':);call print(cor(h16));

/; real*4 results

h4=r8tor4(h);call print(' ':);call print('Where Traditional Method = real4_1 ':);call print('Where M0 Method = real4_2 ':);real4_1=afam(cov(h4));

57

Matrix Command Language

real4_2=afam(cov2(h4));call print(real4_1,real4_2);

call print('Difference against real16 for real4_1 & real4_2 ':);call print(r8tor16(r4tor8(real4_1))-afam(cov(h16)));call print(r8tor16(r4tor8(real4_2))-afam(cov(h16)));

call print('Correlation using function cor':);call print(cor(h4));

b34srun;

The program calls the functions cov and cov2, which are listed next.

function cov(x);
/;
/; Use matrix language to calculate cov of a matrix. For series use
/; mm=catcol(series1,series2)
/; Can use real*4, real*8, real*16 and VPA.
/;
test=afam(x);
i=nocols(x);
d=kindas(x,dfloat(norows(x)-1));

do j=1,i;
test(,j)=test(,j)-mean(test(,j));
enddo;

ccov=afam(transpose(mfam(test))*mfam(test))/d;

if(klass(x).eq.2.or.klass(x).eq.1)ccov=mfam(ccov);
return(ccov);
end;

function cov2(x);
/;
/; Use matrix language to calculate cov of a matrix. For series use
/; mm=catcol(series1,series2)
/; Can use real*4, real*8, real*16 and VPA.
/;
/; Uses Greene(2000,16) idempotent M0 matrix
/; function cov( ) is a more traditional approach that uses
/; far less space at the cost of a speed loss due to do loops
/; cov( ) is more accurate than cov2( ) if there are scaling
/; differences
/;
/; At issue is that m0 is n by n !!!!!
/;
/; Use of i which is a vector of 1's
/; z=catcol(x1,x2,...,xk)
/; ccov=transpose(z)*m0*z/(n-1);
/; where m0 diagonal = 1-(1/n). Off Diag = -1/n
/;
n=norows(x);
real_n=kindas(x,dfloat(n));
real_one=kindas(x,1.0);

/; Define i as a n by 1 matrix of ones

i=matrix(n,1:kindas(x,(vector(n:)+1.)));

/; Get Variance Covariance
/; Define Idempotent matrix M Diagonal = 1-(1/n). Off Diag -(1/n)

bigi=kindas(x,matrix(n,n:)) + real_one;
littlei=(real_one/real_n)*(i*transpose(i));
m0=(bigi-littlei);
ccov2=transpose(mfam(x))*m0*mfam(x)/(real_n-real_one);

if(klass(x).eq.6.or.klass(x).eq.5)ccov2=afam(ccov2);
return(ccov2);
end;

The variance-covariance matrix for the scaled gas data is calculated using real*8, real*4 and real*16. Assuming the real*16 results are the correct answers, the results obtained for the two methods are compared. The gasout series was multiplied by 1000 to cause a scale problem that will be detected using real*4 calculations. Annotated output is shown below:

Variable      Label                            # Cases   Mean            Std. Dev.   Variance    Maximum    Minimum
TIME      1                                      296     148.500         85.5921     7326.00     296.000    1.00000
GASIN     2   Input gas rate in cu. ft / min     296    -0.568345E-01    1.07277     1.15083     2.83400   -2.71600
GASOUT    3   Percent CO2 in outlet gas          296     53.5091         3.20212     10.2536     60.5000    45.6000
CONSTANT  4                                      296     1.00000         0.00000     0.00000     1.00000    1.00000

Number of observations in data file    296
Current missing variable code          1.000000000000000E+31

B34S(r) Matrix Command. d/m/y 2/ 7/07. h:m:s 10:51: 4.

=> CALL LOADDATA$
=> CALL LOAD(COV :STAGING)$
=> CALL LOAD(COV2 :STAGING)$
=> CALL LOAD(COR :STAGING)$
=> CALL ECHOOFF$

Using real*8 the two methods appear to be producing the same answers with a small difference in the cov(2,2) position of .625849e-6.

Covariance using function cov

Array of 2 by 2 elements

        1           2
1   1.15083    -1664.15
2  -1664.15     0.102536E+08

Covariance using function cov2

Array of 2 by 2 elements

        1           2
1   1.15083    -1664.15
2  -1664.15     0.102536E+08

Difference of two methods using real*8


Array of 2 by 2 elements

         1                2
1  -0.666134E-15    -0.341061E-11
2   0.682121E-12     0.625849E-06

Correlation using function cor

Array of 2 by 2 elements

        1           2
1   1.00000    -0.484451
2  -0.484451    1.00000

Real*16 results

Covariance using function cov

Array of 2 by 2 elements (real*16)

        1           2
1   1.15083    -1664.15
2  -1664.15     0.102536E+08

Covariance using function cov2

Array of 2 by 2 elements (real*16)

        1           2
1   1.15083    -1664.15
2  -1664.15     0.102536E+08

Difference of the two methods using real*16

Array of 2 by 2 elements

         1                2
1  -0.666134E-15    -0.341061E-11
2   0.682121E-12     0.625849E-06

When testing against the real*16 results, the traditional method difference in the (2,2) position is .133107E-09, which is smaller than the -.625716E-06 difference for the M0 method. This suggests that the traditional method retains more accuracy even in real*8.

Difference against real16 for real8_1 & real8_2
Where Traditional Method = real8_1
Where M0 Method          = real8_2

Array of 2 by 2 elements (real*16)

         1                2
1  -0.353299E-15     0.102890E-11
2   0.102890E-11     0.133107E-09

Array of 2 by 2 elements (real*16)

         1                2
1   0.312834E-15     0.443950E-11
2   0.346778E-12    -0.625716E-06

Correlation using function cor

Array of 2 by 2 elements (real*16)

        1           2
1   1.00000    -0.484451
2  -0.484451    1.00000

Using real*4, but making the comparison against the real*16 results in real*16, the traditional method difference in the (2,2) position is .469079, while for the M0 method the difference is -104.531. These findings show how much accuracy is lost when real*4 calculations are made, and that the size of the loss depends on the method chosen.

Where Traditional Method = real4_1
Where M0 Method          = real4_2


REAL4_1 = Array of 2 by 2 elements (real*4)

        1           2
1   1.15083    -1664.15
2  -1664.15     0.102536E+08

REAL4_2 = Array of 2 by 2 elements (real*4)

        1           2
1   1.15083    -1664.15
2  -1664.15     0.102535E+08

Difference against real16 for real4_1 & real4_2

Array of 2 by 2 elements (real*16)

         1                2
1   0.313754E-07    -0.478797E-04
2  -0.478797E-04     0.469079

Array of 2 by 2 elements (real*16)

         1                2
1   0.313754E-07    -0.478797E-04
2   0.741906E-04    -104.531

Correlation using function cor

Array of 2 by 2 elements (real*4)

        1           2
1   1.00000    -0.484451
2  -0.484451    1.00000

B34S Matrix Command Ending. Last Command reached.
Space available in allocator 7869592, peak space used 803279
Number variables used 37, peak number used 48
Number temp variables used 918, # user temp clean 0
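A rough way to read these differences (a back-of-the-envelope calculation based only on the numbers shown above, not on additional runs) is to convert the (2,2) errors to relative errors:

   traditional method (real*4):   0.469079 / 0.102536E+08  =  approx. 4.6E-08   (about 7 correct digits)
   M0 method          (real*4):   104.531  / 0.102536E+08  =  approx. 1.0E-05   (about 5 correct digits)

Since real*4 carries roughly seven significant decimal digits, the traditional method is essentially at the precision limit of single precision arithmetic on this unscaled data, while the M0 method gives up about two additional digits.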

In Chapter 2 equation (2.9-6) we showed the relationship between the population error e and the sample error u. Recall that u = M e, where M = I - X(X'X)^-1 X'. The sum of squared sample residuals was related to the population residuals by u'u = e'M e. Since M is not of full rank, it is not possible to recover the population residual as e = M^-1 u, and hence there are only N-K BLUS residuals. Theil (1971) shows that u'u = y'M y. Sample job CH16_LUS in bookruns.mac illustrates these calculations. For more detail see Chapter 2.

b34sexec matrix;
* See Chapter 2 equations (2.9-4) - (2.9-7) ;
* In this example all coefficients are 1.0;
n=20;
k=5;
beta=vector(k:)+1.;
x=rn(matrix(n,k:));
x(,1)=1.0;
y=x*beta+rn(vector(norows(x):));
bigi=matrix(n,n:) + 1.;
m=(bigi-x*inv(transpose(x)*x)*transpose(x));
mm=m*m;
test=sum(dabs(mm-m));
call print('Test ',test:);
call print('Theil (1971) page shows sumsq error = y*m*y');
testss=y*m*y;
betahat=inv(transpose(x)*x)*transpose(x)*y;
call olsq(y,x :noint :print);
call print(testss,betahat);
u=y-x*betahat;
u_alt=m*y;
sse=sumsq(u);
sse2=sumsq(u_alt);
call print('Two ways to get sum of squares',sse,sse2);
call print('Show M not full rank',det(m));
b34srun;

which, when run, produces:

=> N=20$

=> K=5$

=> BETA=VECTOR(K:)+1.$

=> X=RN(MATRIX(N,K:))$

=> X(,1)=1.0$

=> Y=X*BETA+RN(VECTOR(NOROWS(X):))$

=> BIGI=MATRIX(N,N:) + 1.$

=> M=(BIGI-X*INV(TRANSPOSE(X)*X)*TRANSPOSE(X))$

=> MM=M*M$

=> TEST=SUM(DABS(MM-M))$

=> CALL PRINT('Test ',TEST:)$

Test 9.350213745623615E-15

=> CALL PRINT('Theil (1971) page shows sumsq error = y*m*y')$

Theil (1971) page shows sumsq error = y*m*y

=> TESTSS=Y*M*Y$

=> BETAHAT=INV(TRANSPOSE(X)*X)*TRANSPOSE(X)*Y$

=> CALL OLSQ(Y,X :NOINT :PRINT)$

Ordinary Least Squares Estimation
Dependent variable                 Y
Centered R**2                      0.8293118234985847
Residual Sum of Squares            14.14785861384947
Residual Variance                  0.9431905742566311
Sum Absolute Residuals             12.12518257459587
1/Condition XPX                    0.2405358242295580
Maximum Absolute Residual          1.875788654723293
Number of Observations             20

Variable     Lag   Coefficient    SE            t
Col____1      0    1.5104320      0.24171911    6.2487072
Col____2      0    0.87494443     0.27288711    3.2062505
Col____3      0    0.63416941     0.22752179    2.7872909
Col____4      0    1.2184971      0.22690316    5.3701196
Col____5      0    1.0098815      0.29505957    3.4226360

=> CALL PRINT(TESTSS,BETAHAT)$

TESTSS = 14.147859

BETAHAT = Vector of 5 elements


1.51043 0.874944 0.634169 1.21850 1.00988

=> U=Y-X*BETAHAT$

=> U_ALT=M*Y$

=> SSE=SUMSQ(U)$

=> SSE2=SUMSQ(U_ALT)$

=> CALL PRINT('Two ways to get sum of squares',SSE,SSE2)$

Two ways to get sum of squares

SSE = 14.147859

SSE2 = 14.147859

=> CALL PRINT('Show M not full rank',DET(M))$

Show M not full rank

0.17959633E-79
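Because M is idempotent, its rank equals its trace, which here should be n - k = 15. A one-line check that could be appended to the CH16_LUS job (this line is not in the original listing; it is a sketch using only functions already shown) is:

/$ rank of an idempotent matrix equals its trace; here it should equal n-k = 15
call print('trace of M, should equal n-k',trace(m),dfloat(n-k));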

The BLUE property of OLS discussed in Chapter 2 requires that there be no correlation between the estimated left hand side vector yhat = Xb and the residual vector u = y - Xb, where b denotes the OLS coefficient estimate. Given that X is a matrix of explanatory variables, yhat = Xb and u = y - Xb, then:

(16.4-2)    yhat'u = (Xb)'(y - Xb) = b'(X'y - X'Xb) = 0

which implies the restriction X'(y - Xb) = 0, from which the OLS solution b = (X'X)^-1 X'y quickly follows. This is illustrated, assuming n = 30 and beta = (1, 2, 3, 4, 5)', by:

n=30;
x=rn(matrix(n,5:));
x(,1)=1.0;
beta=vector(5:1. 2. 3. 4. 5.);
y=x*beta +rn(vector(n:));
xpx=transpose(x)*x;
betahat=inv(xpx)*transpose(x)*y;
call olsq(y x :print :noint);
resid=y-x*betahat;

/$ Test if orthogonal

call print(beta,betahat,'Is residual Orthogonal with yhat?',
           ddot(resid,x*betahat));


The results are verified with the olsq command and the orthogonality restriction is tested. The calculated ddot value of 0.21405100E-12 suggests that the restriction is met by the solution vector.

=> N=30$

=> X=RN(MATRIX(N,5:))$

=> X(,1)=1.0$

=> BETA=VECTOR(5:1. 2. 3. 4. 5.)$

=> Y=X*BETA +RN(VECTOR(N:))$

=> XPX=TRANSPOSE(X)*X$

=> BETAHAT=INV(XPX)*TRANSPOSE(X)*Y$

=> CALL OLSQ(Y X :PRINT :NOINT)$

Ordinary Least Squares Estimation
Dependent variable                 Y
Centered R**2                      0.9804554404605988
Residual Sum of Squares            23.50833421529196
Residual Variance                  0.9403333686116784
Sum Absolute Residuals             21.45426309533863
1/Condition XPX                    0.2398115678243269
Maximum Absolute Residual          2.059994965106420
Number of Observations             30

Variable     Lag   Coefficient    SE            t
Col____1      0    1.1096979      0.18104778    6.1293098
Col____2      0    1.8668577      0.18101248    10.313420
Col____3      0    2.8053762      0.21003376    13.356787
Col____4      0    3.9369505      0.17176593    22.920439
Col____5      0    4.8105246      0.18925387    25.418369

=> RESID=Y-X*BETAHAT$

=> CALL PRINT(BETA,BETAHAT,'Is residual Orthogonal with yhat?',=> DDOT(RESID,X*BETAHAT))$

BETA = Vector of 5 elements

1.00000 2.00000 3.00000 4.00000 5.00000

BETAHAT = Vector of 5 elements

1.10970 1.86686 2.80538 3.93695 4.81052

Is residual Orthogonal with yhat?

0.21405100E-12

Given that A and B are square nonsingular matrices, the linear algebra rules for the inverse and determinant are:

(16.4-3)    det(A^-1) = 1/det(A),    (A')^-1 = (A^-1)',    (AB)^-1 = B^-1 A^-1

This is illustrated by:

/$ Rules of Inverses

call print(xpx, inv(xpx), xpx*inv(xpx));
call print(' '
   'We perform tests involving inverses '
   '1/det(x) = det(inv(x))', 1./det(xpx), det(inv(xpx)),
   'Test if: transpose(inv(x)) = inv(transpose(x))',
   transpose(inv(xpx)),
   inv(transpose(xpx)));

x1=xpx;
x2=rn(xpx);
x3=rn(x2);
call print('Test if: inv(x1*x2*x3) = inv(x3)*inv(x2)*inv(x1)',
   inv(x1*x2*x3), inv(x3)*inv(x2)*inv(x1));

which, when run, produces:

=> CALL PRINT(XPX, INV(XPX),XPX*INV(XPX))$

XPX = Matrix of 5 by 5 elements

1 2 3 4 5 1 30.0000 4.88927 10.3848 -1.45371 -0.536573 2 4.88927 22.1316 0.241093 -6.20091 -2.52910 3 10.3848 0.241093 30.4069 3.57937 -7.08262 4 -1.45371 -6.20091 3.57937 27.9920 -4.84565 5 -0.536573 -2.52910 -7.08262 -4.84565 39.5877

Matrix of 5 by 5 elements

1 2 3 4 5 1 0.396630E-01 -0.842611E-02 -0.142149E-01 0.160454E-02 -0.234751E-02 2 -0.842611E-02 0.507700E-01 0.228504E-02 0.113704E-01 0.492987E-02 3 -0.142149E-01 0.228504E-02 0.397422E-01 -0.417971E-02 0.655197E-02 4 0.160454E-02 0.113704E-01 -0.417971E-02 0.397024E-01 0.486005E-02 5 -0.234751E-02 0.492987E-02 0.655197E-02 0.486005E-02 0.273106E-01

Matrix of 5 by 5 elements

1 2 3 4 5 1 1.00000 0.763278E-16 0.398986E-16 0.303577E-17 0.00000 2 -0.754605E-16 1.00000 0.208167E-16 -0.520417E-17 0.416334E-16 3 -0.104083E-16 0.277556E-16 1.00000 0.277556E-16 -0.555112E-16 4 0.867362E-17 0.589806E-16 0.346945E-16 1.00000 0.00000 5 0.00000 0.277556E-16 0.555112E-16 0.277556E-16 1.00000

=> CALL PRINT(' ' 'We perform tests involving inverses ' => '1/det(x) = det(inv(x))',1./DET(XPX),DET(INV(XPX)), => 'Test if: transpose(inv(x)) = inv(transpose(x))', => TRANSPOSE(INV(XPX)), => INV(TRANSPOSE(XPX)))$

We perform tests involving inverses 1/det(x) = det(inv(x))


0.62036286E-07

0.62036286E-07 Test if: transpose(inv(x)) = inv(transpose(x))

Matrix of 5 by 5 elements

1 2 3 4 5 1 0.396630E-01 -0.842611E-02 -0.142149E-01 0.160454E-02 -0.234751E-02 2 -0.842611E-02 0.507700E-01 0.228504E-02 0.113704E-01 0.492987E-02 3 -0.142149E-01 0.228504E-02 0.397422E-01 -0.417971E-02 0.655197E-02 4 0.160454E-02 0.113704E-01 -0.417971E-02 0.397024E-01 0.486005E-02 5 -0.234751E-02 0.492987E-02 0.655197E-02 0.486005E-02 0.273106E-01

Matrix of 5 by 5 elements

1 2 3 4 5 1 0.396630E-01 -0.842611E-02 -0.142149E-01 0.160454E-02 -0.234751E-02 2 -0.842611E-02 0.507700E-01 0.228504E-02 0.113704E-01 0.492987E-02 3 -0.142149E-01 0.228504E-02 0.397422E-01 -0.417971E-02 0.655197E-02 4 0.160454E-02 0.113704E-01 -0.417971E-02 0.397024E-01 0.486005E-02 5 -0.234751E-02 0.492987E-02 0.655197E-02 0.486005E-02 0.273106E-01

=> X1=XPX$

=> X2=RN(XPX)$

=> X3=RN(X2)$

=> CALL PRINT('Test if: inv(x1*x2*x3) = inv(x3)*inv(x2)*inv(x1)', => INV(X1*X2*X3),INV(X3)*INV(X2)*INV(X1))$

Test if: inv(x1*x2*x3) = inv(x3)*inv(x2)*inv(x1)

Matrix of 5 by 5 elements

1 2 3 4 5 1 -0.110146E-01 -0.295302E-01 0.305773E-01 -0.208158E-01 0.230482E-01 2 -0.897845E-02 -0.246023E-01 0.475944E-01 -0.141358E-01 0.218481E-01 3 -0.528828E-02 0.373354E-01 -0.612578E-02 0.128556E-01 -0.678314E-02 4 0.799993E-02 -0.780149E-02 -0.239615E-02 0.111239E-01 -0.453775E-02 5 -0.559507E-04 -0.513571E-02 0.296128E-01 -0.399576E-02 0.199970E-01

Matrix of 5 by 5 elements

1 2 3 4 5 1 -0.110146E-01 -0.295302E-01 0.305773E-01 -0.208158E-01 0.230482E-01 2 -0.897845E-02 -0.246023E-01 0.475944E-01 -0.141358E-01 0.218481E-01 3 -0.528828E-02 0.373354E-01 -0.612578E-02 0.128556E-01 -0.678314E-02 4 0.799993E-02 -0.780149E-02 -0.239615E-02 0.111239E-01 -0.453775E-02 5 -0.559507E-04 -0.513571E-02 0.296128E-01 -0.399576E-02 0.199970E-01
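A closely related determinant rule, det(AB) = det(A)det(B), can be checked in the same style. The following line is not part of the original job; it is a sketch that reuses the x1 and x2 matrices defined above:

call print('Is det(x1*x2) = det(x1)*det(x2)?',det(x1*x2),det(x1)*det(x2));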

The Kronecker product of a k by j matrix A and an m by n matrix B produces a k*m by j*n matrix C. In words, every element of A multiplies the entire B matrix. Using the Greene (2000) example data with:

a=matrix(2,2:3 0 5 2);
b=matrix(2,2:1 4 4 7);
call print(a,b,kprod(a,b),
           a(1,1)*b, a(1,2)*b,
           a(2,1)*b, a(2,2)*b);

we print A, B, the Kronecker product, and each element of A multiplied by B.

=> A=MATRIX(2,2:3 0 5 2)$

=> B=MATRIX(2,2:1 4 4 7)$

=> CALL PRINT(A,B,KPROD(A,B),


=> A(1,1)*B, A(1,2)*B,=> A(2,1)*B, A(2,2)*B)$

A = Matrix of 2 by 2 elements

1 2 1 3.00000 0.00000 2 5.00000 2.00000

B = Matrix of 2 by 2 elements

1 2 1 1.00000 4.00000 2 4.00000 7.00000

Matrix of 4 by 4 elements

1 2 3 4 1 3.00000 12.0000 0.00000 0.00000 2 12.0000 21.0000 0.00000 0.00000 3 5.00000 20.0000 2.00000 8.00000 4 20.0000 35.0000 8.00000 14.0000

Matrix of 2 by 2 elements

1 2 1 3.00000 12.0000 2 12.0000 21.0000

Matrix of 2 by 2 elements

1 2 1 0.00000 0.00000 2 0.00000 0.00000

Matrix of 2 by 2 elements

1 2 1 5.00000 20.0000 2 20.0000 35.0000

Matrix of 2 by 2 elements

1 2 1 2.00000 8.00000 2 8.00000 14.0000

There are a number of very important factorizations in linear algebra. Assume A is a general matrix and B is a positive definite matrix, both of size n by n. The LU factorization writes the general matrix A in terms of a lower triangular matrix L and an upper triangular matrix U as

(16.4-4)    A = L U

The Cholesky decomposition writes B in terms of an upper triangular matrix R as

(16.4-5)    B = R'R

The following code illustrates these decompositions.

/$ LU and Cholesky factorization

n=5;
x=rn(matrix(n,n:));
xpx=transpose(x)*x;
call gmfac(xpx,l,u);
r=pdfac(xpx);

call print('Inverse from L U = inv(u)*inv(l)'
   'inv(xpx)', inv(xpx),
   'inv(u)',   inv(u),
   'inv(l)',   inv(l),
   'Test inverse from looking at u and l',
   'inv(u)*inv(l)',
   inv(u)*inv(l));

call print(xpx,l,u,'l*u',l*u,
   'Cholesky Factorization of pd matrix',r,
   'transpose(r)*r',
   transpose(r)*r);

Edited output is:

=> N=5$

=> X=RN(MATRIX(N,N:))$

=> XPX=TRANSPOSE(X)*X$

=> CALL GMFAC(XPX,L,U)$

=> R=PDFAC(XPX)$

=> CALL PRINT('Inverse from L U = inv(u)*inv(l)' => 'inv(xpx)', INV(XPX), => 'inv(u)', INV(U), => 'inv(l)', INV(L), => 'Test inverse from looking at u and l', => 'inv(u)*inv(l)', => INV(U)*INV(L))$

Inverse from L U = inv(u)*inv(l)

inv(xpx)

Matrix of 5 by 5 elements

1 2 3 4 5 1 0.672320 0.565469 -0.197200 0.289017 -0.201206 2 0.565469 0.930016 -0.224394 0.213563 0.540409E-01 3 -0.197200 -0.224394 0.347142 0.120672E-01 -0.307224E-01 4 0.289017 0.213563 0.120672E-01 0.300834 -0.170462 5 -0.201206 0.540409E-01 -0.307224E-01 -0.170462 0.441608

inv(u)

Matrix of 5 by 5 elements

1 2 3 4 5 1 0.152498 0.243980 -0.211385 0.211351 -0.201206 2 0.00000 0.548226 -0.220842 0.234423 0.540409E-01 3 0.00000 0.00000 0.345005 0.208219E-03 -0.307224E-01 4 0.00000 0.00000 0.00000 0.235035 -0.170462 5 0.00000 0.00000 0.00000 0.00000 0.441608

inv(l)

Matrix of 5 by 5 elements

1 2 3 4 5 1 1.00000 0.00000 0.00000 0.00000 0.00000 2 0.445036 1.00000 0.00000 0.00000 0.00000 3 -0.612703 -0.640112 1.00000 0.00000 0.00000


4 0.899230 0.997398 0.885904E-03 1.00000 0.00000 5 -0.455621 0.122373 -0.695695E-01 -0.386004 1.00000

Test inverse from looking at u and l inv(u)*inv(l)

Matrix of 5 by 5 elements

1 2 3 4 5 1 0.672320 0.565469 -0.197200 0.289017 -0.201206 2 0.565469 0.930016 -0.224394 0.213563 0.540409E-01 3 -0.197200 -0.224394 0.347142 0.120672E-01 -0.307224E-01 4 0.289017 0.213563 0.120672E-01 0.300834 -0.170462 5 -0.201206 0.540409E-01 -0.307224E-01 -0.170462 0.441608

=> CALL PRINT(XPX,L,U,'l*u',L*U, => 'Cholesky Factorization of pd matrix',R, => 'transpose(r)*r', => TRANSPOSE(R)*R)$

XPX = Matrix of 5 by 5 elements

1 2 3 4 5 1 6.55746 -2.91830 2.14973 -2.98786 2.34107 2 -2.91830 3.12282 0.210901 -0.490650 -1.88651 3 2.14973 0.210901 4.35066 -2.14731 0.427456 4 -2.98786 -0.490650 -2.14731 7.43273 1.41839 5 2.34107 -1.88651 0.427456 1.41839 4.13919

L = Matrix of 5 by 5 elements

1 2 3 4 5 1 1.00000 0.00000 0.00000 0.00000 0.00000 2 -0.445036 1.00000 0.00000 0.00000 0.00000 3 0.327830 0.640112 1.00000 0.00000 0.00000 4 -0.455643 -0.997965 -0.885904E-03 1.00000 0.00000 5 0.357008 -0.463060 0.692275E-01 0.386004 1.00000

U = Matrix of 5 by 5 elements

1 2 3 4 5 1 6.55746 -2.91830 2.14973 -2.98786 2.34107 2 0.00000 1.82407 1.16761 -1.82035 -0.844652 3 0.00000 0.00000 2.89851 -0.256781E-02 0.200657 4 0.00000 0.00000 0.00000 4.25468 1.64233 5 0.00000 0.00000 0.00000 0.00000 2.26445

l*u

Matrix of 5 by 5 elements

1 2 3 4 5 1 6.55746 -2.91830 2.14973 -2.98786 2.34107 2 -2.91830 3.12282 0.210901 -0.490650 -1.88651 3 2.14973 0.210901 4.35066 -2.14731 0.427456 4 -2.98786 -0.490650 -2.14731 7.43273 1.41839 5 2.34107 -1.88651 0.427456 1.41839 4.13919

Cholesky Factorization of pd matrix

R = Matrix of 5 by 5 elements

1 2 3 4 5 1 2.56075 -1.13963 0.839491 -1.16679 0.914210 2 0.00000 1.35058 0.864523 -1.34783 -0.625399 3 0.00000 0.00000 1.70250 -0.150825E-02 0.117860 4 0.00000 0.00000 0.00000 2.06269 0.796207 5 0.00000 0.00000 0.00000 0.00000 1.50481


transpose(r)*r

Matrix of 5 by 5 elements

1 2 3 4 5 1 6.55746 -2.91830 2.14973 -2.98786 2.34107 2 -2.91830 3.12282 0.210901 -0.490650 -1.88651 3 2.14973 0.210901 4.35066 -2.14731 0.427456 4 -2.98786 -0.490650 -2.14731 7.43273 1.41839 5 2.34107 -1.88651 0.427456 1.41839 4.13919
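The Cholesky factor can also be used to form the inverse of a positive definite matrix, in the same way the LU factors were used above. A one-line check that could be added to the job (it is not in the original listing; it reuses the r and xpx already in memory, and rests on inv(R'R) = inv(R)*inv(R')) is:

call print('Inverse from the Cholesky factor, inv(r)*transpose(inv(r))',
           inv(r)*transpose(inv(r)),'inv(xpx)',inv(xpx));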

The usual eigenvalue decomposition of A writes it in terms of a "left handed" eigenvector matrix V and a diagonal matrix D with the eigenvalues along the diagonal as

(16.4-6)    A = V D V^-1

The trace of the matrix is trace(A) = the sum of the eigenvalues, or the sum of the diagonal elements of D, while the determinant is det(A) = the product of the eigenvalues. If A is symmetric we have V^-1 = V', which implies that all the columns in V are orthogonal, or that for this case V'V = I. The Schur decomposition writes

(16.4-7)    A = U S U'

where U is an orthogonal matrix and S is block upper triangular with the eigenvalues on the diagonal. Here U'U = I for all matrices A, unlike the eigenvalue decomposition, where V'V = I only when the factored matrix is symmetric. Code to illustrate these types of calculations for both general and symmetric matrices:

/$ Eigenvalues & Schur we write a*v = v*lamda

a=matrix(2,2:5,1 2,4);
lamda=eig(a,v);
det1=det(a);
trace1=trace(a);
det2=prod(lamda);
trace2=sum(lamda);
lamda=diagmat(lamda);

call schur(a,s,u);

call print('We have defined a general matrix',a,
   lamda, v,
   'Is sum of eigenvalues trace?'
   'Is product of eigenvalues det?'
   det1,det2,trace1,trace2,
   'With Eigenvalues a = v*lamda*inv(v)',
   v*lamda*inv(v),
   ' ',
   'With Schur a = u*s*transpose(u) '
   'Schur => s upper triangular',
   s,u,
   'a = u*s*transpose(u)'
   'u*transpose(u)=I'
   u*transpose(u),
   'This is a from the schur, is it a?',
   u*s*transpose(u));


/$ PD Matrix Case

call print('Positive Def case s = lamda'
   'transpose(v)*v = I'
   'sum(lamda)=trace'
   'prod(lamda)=det');

a=xpx;

/$ Note we use the symmetric eigen call here

lamda=seig(a,v);
d    =diagmat(lamda);
call schur(a,s,u);

call print(a,lamda,d,v,
   'With Eigenvalues a = v*d*inv(v)',
   v*d*inv(v),' ',
   'inv(u)=transpose(u)',
   inv(u),transpose(u),
   'Is v*transpose(v)= I ?',
   v*transpose(v),
   'Is transpose(v)*v= I ?',
   transpose(v)*v,
   'A = v*d*inv(v)',
   v*d*inv(v),
   'With Schur a = u*s*transpose(u)',
   'Schur => s upper triangular',
   s,u,
   'a = u*s*transpose(u)',
   'u*transpose(u)=I'
   u*transpose(u),
   'This is a matrix from the schur',
   u*s*transpose(u),
   'sum(lamda)=trace'
   'prod(lamda)=det',
   sum(lamda),trace(a),
   prod(lamda),det(a));

produces detailed but instructive results:

=> A=MATRIX(2,2:5,1 2,4)$

=> LAMDA=EIG(A,V)$

=> DET1=DET(A)$

=> TRACE1=TRACE(A)$

=> DET2=PROD(LAMDA)$

=> TRACE2=SUM(LAMDA)$

=> LAMDA=DIAGMAT(LAMDA)$

=> CALL SCHUR(A,S,U)$

=> CALL PRINT('We have defined a general matrix',A, => LAMDA, => V,


=> 'Is sum of eigenvalues trace?' => 'Is product of eigenvalues det?' => DET1,DET2,TRACE1,TRACE2, => 'With Eigenvalues a = v*lamda*inv(v)', => V*LAMDA*INV(V), => ' ', => 'With Schur a = u*s*transpose(u) ' => 'Schur => s upper triangular', => S,U, => 'a = u*s*transpose(u)' => 'u*transpose(u)=I' => U*TRANSPOSE(U), => 'This is a from the schur, is it a?', => U*S*TRANSPOSE(U))$

We have defined a general matrix

A = Matrix of 2 by 2 elements

1 2 1 5.00000 1.00000 2 2.00000 4.00000

LAMDA = Complex Matrix of 2 by 2 elements

1 2 1 ( 6.000 , 0.000 ) ( 0.000 , 0.000 ) 2 ( 0.000 , 0.000 ) ( 3.000 , 0.000 )

V = Complex Matrix of 2 by 2 elements

1 2 1 ( 0.7071 , 0.000 ) ( -0.4714 , 0.000 ) 2 ( 0.7071 , 0.000 ) ( 0.9428 , 0.000 ) Is sum of eigenvalues trace? Is product of eigenvalues det?

DET1 = 18.000000

DET2 = (18.00000000000000,0.000000000000000E+00)

TRACE1 = 9.0000000

TRACE2 = (9.000000000000000,0.000000000000000E+00)

With Eigenvalues a = v*lamda*inv(v)

Complex Matrix of 2 by 2 elements

1 2 1 ( 5.000 , 0.000 ) ( 1.000 , 0.000 ) 2 ( 2.000 , 0.000 ) ( 4.000 , 0.000 )

With Schur a = u*s*transpose(u) Schur => s upper triangular

S = Matrix of 2 by 2 elements

1 2 1 6.00000 -1.00000 2 0.00000 3.00000

U = Matrix of 2 by 2 elements

1 2 1 0.707107 -0.707107


2 0.707107 0.707107

a = u*s*transpose(u) u*transpose(u)=I

Matrix of 2 by 2 elements

1 2 1 1.00000 0.00000 2 0.00000 1.00000

This is a from the schur, is it a?

Matrix of 2 by 2 elements

1 2 1 5.00000 1.00000 2 2.00000 4.00000

=> CALL PRINT('Positive Def case s = lamda' => 'transpose(v)*v = I' => 'sum(lamda)=trace' => 'prod(lamda)=det')$

Positive Def case s = lamda transpose(v)*v = I sum(lamda)=trace prod(lamda)=det

=> A=XPX$

=> LAMDA=SEIG(A,V)$

=> D =DIAGMAT(LAMDA)$

=> CALL SCHUR(A,S,U)$

=> CALL PRINT(A,LAMDA,D,V, => 'With Eigenvalues a = v*d*inv(v)', => V*D*INV(V), => ' ', => 'inv(u)=transpose(u)', => INV(U),TRANSPOSE(U), => 'Is v*transpose(v)= I ?', => V*TRANSPOSE(V), => 'Is transpose(v)*v= I ?', => TRANSPOSE(V)*V, => => 'A = v*d*inv(v)', => V*D*INV(V), => 'With Schur a = u*s*transpose(u)', => 'Schur => s upper triangular', => S,U, => 'a = u*s*transpose(u)', => 'u*transpose(u)=I' => U*TRANSPOSE(U), => 'This is a matrix from the schur', => U*S*TRANSPOSE(U), => 'sum(lamda)=trace' => 'prod(lamda)=det', => SUM(LAMDA),TRACE(A), => PROD(LAMDA),DET(A))$

A = Matrix of 5 by 5 elements


1 2 3 4 5 1 6.55746 -2.91830 2.14973 -2.98786 2.34107 2 -2.91830 3.12282 0.210901 -0.490650 -1.88651 3 2.14973 0.210901 4.35066 -2.14731 0.427456 4 -2.98786 -0.490650 -2.14731 7.43273 1.41839 5 2.34107 -1.88651 0.427456 1.41839 4.13919

LAMDA = Matrix of 5 by 1 elements

1

1 0.639475 2 1.59842 3 3.38122 4 8.20610 5 11.7777

D = Matrix of 5 by 5 elements

1 2 3 4 5 1 0.639475 0.00000 0.00000 0.00000 0.00000 2 0.00000 1.59842 0.00000 0.00000 0.00000 3 0.00000 0.00000 3.38122 0.00000 0.00000 4 0.00000 0.00000 0.00000 8.20610 0.00000 5 0.00000 0.00000 0.00000 0.00000 11.7777

V = Matrix of 5 by 5 elements

1 2 3 4 5 1 -0.608222 0.244514 -0.153385 -0.286747 0.681563 2 -0.703468 -0.395537 0.319134 0.441200 -0.228428 3 0.222860 0.246090 0.858163 0.143791 0.364217 4 -0.270891 0.368819 0.306957 -0.615213 -0.563809 5 0.110225 -0.766274 0.209667 -0.569172 0.180992

With Eigenvalues a = v*d*inv(v)

Matrix of 5 by 5 elements

1 2 3 4 5 1 6.55746 -2.91830 2.14973 -2.98786 2.34107 2 -2.91830 3.12282 0.210901 -0.490650 -1.88651 3 2.14973 0.210901 4.35066 -2.14731 0.427456 4 -2.98786 -0.490650 -2.14731 7.43273 1.41839 5 2.34107 -1.88651 0.427456 1.41839 4.13919

inv(u)=transpose(u)

Matrix of 5 by 5 elements

1 2 3 4 5 1 0.681563 -0.228428 0.364217 -0.563809 0.180992 2 -0.286747 0.441200 0.143791 -0.615213 -0.569172 3 0.153385 -0.319134 -0.858163 -0.306957 -0.209667 4 0.608222 0.703468 -0.222860 0.270891 -0.110225 5 -0.244514 0.395537 -0.246090 -0.368819 0.766274

Matrix of 5 by 5 elements

1 2 3 4 5 1 0.681563 -0.228428 0.364217 -0.563809 0.180992 2 -0.286747 0.441200 0.143791 -0.615213 -0.569172 3 0.153385 -0.319134 -0.858163 -0.306957 -0.209667 4 0.608222 0.703468 -0.222860 0.270891 -0.110225 5 -0.244514 0.395537 -0.246090 -0.368819 0.766274

Is v*transpose(v)= I ?

Matrix of 5 by 5 elements


1 2 3 4 5 1 1.00000 0.471845E-15 -0.249800E-15 0.111022E-15 -0.222045E-15 2 0.471845E-15 1.00000 -0.319189E-15 0.111022E-15 -0.138778E-16 3 -0.249800E-15 -0.319189E-15 1.00000 0.277556E-16 -0.152656E-15 4 0.111022E-15 0.111022E-15 0.277556E-16 1.00000 -0.693889E-16 5 -0.222045E-15 -0.138778E-16 -0.152656E-15 -0.693889E-16 1.00000

Is transpose(v)*v= I ?

Matrix of 5 by 5 elements

1 2 3 4 5 1 1.00000 0.00000 0.211636E-15 -0.138778E-15 -0.693889E-17 2 0.00000 1.00000 -0.832667E-16 0.388578E-15 0.555112E-16 3 0.211636E-15 -0.832667E-16 1.00000 0.277556E-16 -0.902056E-16 4 -0.138778E-15 0.388578E-15 0.277556E-16 1.00000 -0.971445E-16 5 -0.693889E-17 0.555112E-16 -0.902056E-16 -0.971445E-16 1.00000

A = v*d*inv(v)

Matrix of 5 by 5 elements

1 2 3 4 5 1 6.55746 -2.91830 2.14973 -2.98786 2.34107 2 -2.91830 3.12282 0.210901 -0.490650 -1.88651 3 2.14973 0.210901 4.35066 -2.14731 0.427456 4 -2.98786 -0.490650 -2.14731 7.43273 1.41839 5 2.34107 -1.88651 0.427456 1.41839 4.13919

With Schur a = u*s*transpose(u) Schur => s upper triangular

S = Matrix of 5 by 5 elements

1 2 3 4 5 1 11.7777 -0.330465E-15 0.221534E-16 -0.503934E-14 0.874203E-15 2 0.00000 8.20610 0.212782E-15 0.273997E-15 0.165088E-15 3 0.00000 0.00000 3.38122 -0.666146E-15 0.189045E-15 4 0.00000 0.00000 0.00000 0.639475 -0.183493E-15 5 0.00000 0.00000 0.00000 0.00000 1.59842

U = Matrix of 5 by 5 elements

1 2 3 4 5 1 0.681563 -0.286747 0.153385 0.608222 -0.244514 2 -0.228428 0.441200 -0.319134 0.703468 0.395537 3 0.364217 0.143791 -0.858163 -0.222860 -0.246090 4 -0.563809 -0.615213 -0.306957 0.270891 -0.368819 5 0.180992 -0.569172 -0.209667 -0.110225 0.766274

a = u*s*transpose(u) u*transpose(u)=I

Matrix of 5 by 5 elements

1 2 3 4 5 1 1.00000 0.111022E-15 -0.305311E-15 0.346945E-15 -0.138778E-15 2 0.111022E-15 1.00000 -0.693889E-16 0.527356E-15 0.555112E-16 3 -0.305311E-15 -0.693889E-16 1.00000 -0.277556E-16 -0.277556E-16 4 0.346945E-15 0.527356E-15 -0.277556E-16 1.00000 -0.277556E-15 5 -0.138778E-15 0.555112E-16 -0.277556E-16 -0.277556E-15 1.00000

This is a matrix from the schur

Matrix of 5 by 5 elements

1 2 3 4 5 1 6.55746 -2.91830 2.14973 -2.98786 2.34107 2 -2.91830 3.12282 0.210901 -0.490650 -1.88651 3 2.14973 0.210901 4.35066 -2.14731 0.427456


4 -2.98786 -0.490650 -2.14731 7.43273 1.41839 5 2.34107 -1.88651 0.427456 1.41839 4.13919

sum(lamda)=trace prod(lamda)=det

25.602863

25.602863

334.02779

334.02779

Note that only with the symmetric matrix are the eigenvectors orthogonal. The SVD decomposition, discussed in Chapter 10, writes

(16.4-8)    A = U Σ V'

where both U and V are orthogonal whether A is symmetric or not. Σ is a diagonal matrix and, if A is symmetric, contains the eigenvalues. Define X as the n by k matrix of explanatory variables and factor it as X = U Σ V'. The OLS coefficient vector is then V γ, where γ = Σ^-1 U'y is the principal component coefficient vector, so that the OLS coefficients equal V Σ^-1 U'y = (X'X)^-1 X'y. As discussed in Chapter 10, the QR factorization operates on X directly and writes it as

(16.4-9)    X = Q R

where R is the upper triangular Cholesky matrix satisfying R'R = X'X and Q'Q = I. Using the QR approach the OLS coefficient vector can be calculated as R^-1 Q1'y, where Q1 is the truncated Q of the QR factorization. (See Chapter 10, and especially equation (10.1-4), for a further discussion of the SVD and QR approaches to OLS estimation.) The eigenvalue and SVD decompositions are shown next.

/$ SVD case

n=4;
noob=20;
x=rn(matrix(noob,n:));
s=svd(x,b,11,u,v);
call print('X',x,'Singular values',s,'Left Singular vectors',U,
           'Right Singular vectors',v);
call print('Test of Factorization. Is S along diagonal?',
           'Transpose(u)*x*v',transpose(u)*x*v,
           'Is U orthagonal?','transpose(U)*U',
           transpose(U)*U,
           'Is V orthagonal?','transpose(V)*V',
           transpose(V)*V,
           ' ');

/$ OLS with SVD

n=30;
k=5;
x       =rn(matrix(n,k:));
x(,1)   =1.0;
beta    =vector(5:1. 2. 3. 4. 5.);
y       =x*beta +rn(vector(n:));
xpx     =transpose(x)*x;
* Solve reduced problem;
s       =svd(x,bad,21,u,v);
sigma   =diagmat(s);
betahat1=inv(xpx)*transpose(x)*y;
betahat2=v*inv(sigma)*transpose(u)*y;
call print('OLS from two approaches',betahat1,betahat2);

/$ Show that SVD of PD matrix produces eigenvalues

x=rn(matrix(5,5:));
xpx=Transpose(x)*x;
e=eig(xpx);
ee=seig(xpx);
s=svd(xpx);
call print(e,ee,s);
b34srun;

Edited output produces:

=> N=4$

=> NOOB=20$

=> X=RN(MATRIX(NOOB,N:))$

=> S=SVD(X,B,11,U,V)$

=> CALL PRINT('X',X,'Singular values',S,'Left Singular vectors',U, => 'Right Singular vectors',V)$

X

X = Matrix of 20 by 4 elements

1 2 3 4 1 -1.68924 0.835446 -0.657493 -0.254551 2 1.29993 1.91536 1.47204 -0.390060 3 0.971522 -0.974110 0.819547 1.15896 4 0.275934 -1.85672 -0.279932 -0.159116 5 1.04040 0.720938 -0.898864E-01 0.622223 6 2.19227 0.677597 1.44798 0.493764E-01 7 1.34663 1.52468 -0.880582 -0.986808 8 0.487722 0.825081 0.482098 0.478458 9 -1.42759 0.717196 2.55604 0.966945 10 0.784950 0.415158 -0.532131E-01 -0.504051 11 1.24785 -0.322106 -0.169876 -0.409886 12 -0.170586 1.74290 -1.43276 -0.209339 13 1.94869 0.720260 -0.299384 -0.516373 14 -0.143676 1.02657 0.746487 -1.32309 15 1.25783 -1.44179 0.860977 0.180785 16 1.04911 1.40971 0.919866E-01 0.210994 17 -0.417547 0.304695 -1.34788 1.02744 18 0.361858 -0.991184 3.42229 1.01987 19 0.975757E-02 0.233525 -0.797029 1.35735 20 2.33496 1.40810 -0.509794 -1.84371

Singular values

S = Vector of 4 elements

6.23289 5.81154 4.30460 3.01437

Left Singular vectors

U = Matrix of 20 by 20 elements

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 1 -0.322101E-01 -0.260349 0.296990 0.965272E-01 0.275910 0.375527 0.179450 0.116293 -0.320597 0.994671E-01 0.224954 -0.228019E-01 0.344104 -0.234639 0.265510 0.209481 0.893493E-01 0.447487E-01 0.189791 0.270443 2 0.255356 0.318931 0.303561 -0.124349E-01 0.509073E-01 -0.206155 -0.222228 -0.615182E-01 -0.237380 -0.116724 0.520870E-02 -0.712554E-01 -0.107362 -0.436742 0.115344 -0.120714


0.294401 -0.105955 0.301436 -0.385389 3 -0.117659 0.229697 -0.211241 -0.289396 0.703497E-01 -0.140903 0.264761 0.108067E-01 -0.262047 0.461419E-01 -0.160541E-01 0.364765 0.998936E-01 0.152946E-01 -0.267798 0.118000 0.202830 -0.560129 0.117248 0.203495 4 -0.125971 -0.806928E-02 -0.390195 0.143602 0.276577 0.176119 -0.296716E-01 0.262122 0.396289 -0.581931E-01 -0.140321 0.168026 -0.717059E-02 -0.168916 -0.130592 0.278325 0.269062 0.200548 0.361415 -0.243721 5 0.135539 0.904080E-01 0.397197E-03 -0.333015 0.863076 -0.661692E-01 -0.553338E-01 -0.784820E-01 0.484921E-01 -0.274459E-02 -0.102591E-01 -0.953795E-01 -0.743152E-01 0.129263 0.153749E-01 -0.114721 -0.149306 0.628860E-01 -0.174926 0.456352E-02 6 0.196584 0.411161 -0.237031E-01 -0.125418 -0.110976 0.840641 -0.429142E-01 -0.753492E-01 -0.227056E-02 -0.249172E-01 -0.452214E-01 0.484881E-02 -0.981376E-01 0.625786E-01 -0.697688E-01 -0.100587 -0.287309E-01 -0.924831E-01 -0.772205E-01 -0.541132E-01 7 0.388463 -0.207906E-01 0.231855E-01 0.554756E-02 -0.358442E-01 -0.556430E-01 0.871121 -0.113239E-01 0.115154 -0.516838E-01 -0.495775E-01 -0.732890E-01 -0.103534 -0.409910E-01 0.164006E-01 -0.602158E-01 0.641075E-02 0.116558 0.155384E-01 -0.175532 8 0.698504E-01 0.118010 0.133311 -0.207380 -0.858638E-01 -0.570154E-01 -0.280015E-01 0.935966 -0.322599E-01 0.417217E-03 0.102239E-01 -0.589486E-01 -0.352846E-01 0.560134E-01 0.227976E-01 -0.830720E-01 -0.792262E-01 -0.269435E-02 -0.100480 0.131425E-01 9 -0.270935 0.242999 0.528093 -0.279063E-01 0.109161E-01 -0.296105E-01 0.939612E-01 -0.580439E-01 0.688969 0.382602E-01 0.964087E-01 0.320377E-01 0.100278 -0.680625E-01 0.508002E-01 -0.189478E-01 0.392287E-01 -0.220869 0.134421E-01 0.141768 10 0.151415 0.570221E-01 -0.313488E-01 0.542150E-01 -0.537323E-02 -0.381130E-01 -0.424956E-01 0.861935E-03 0.422894E-01 0.976281 -0.297865E-01 -0.587722E-04 -0.439334E-01 -0.247489E-01 -0.189020E-01 -0.139336E-01 0.290474E-01 0.115740E-01 0.278619E-01 -0.772195E-01 11 0.126017 0.884524E-01 -0.228266 0.349257E-01 -0.201022E-01 -0.444373E-01 -0.294424E-01 0.740036E-02 0.977614E-01 -0.210630E-01 0.946900 0.235356E-01 -0.579962E-01 0.300595E-01 -0.618489E-01 -0.567557E-02 0.818546E-02 0.117491E-01 -0.275030E-03 -0.645416E-01 12 0.241971 -0.232379 0.208484 -0.169038 -0.481729E-01 0.208588E-01 -0.107268 -0.280739E-01 0.602448E-01 -0.199072E-01 0.119206E-01 0.853532 -0.471590E-01 -0.983946E-02 0.106358 -0.724567E-01 -0.800231E-01 0.181101 -0.574899E-01 -0.792930E-01 13 0.308152 0.129450 -0.139860 -0.778142E-01 -0.787375E-01 -0.916376E-01 -0.937637E-01 -0.307506E-01 0.123591 -0.396893E-01 -0.641483E-01 -0.395774E-01 0.885985 0.436799E-01 -0.387356E-01 -0.726982E-01 -0.341018E-01 0.598312E-01 -0.490760E-01 -0.122433 14 0.128668 0.700518E-01 0.250314 0.394257 0.120646 -0.199654E-01 -0.220366E-01 0.467469E-01 -0.895837E-01 -0.356546E-01 -0.301890E-02 0.679223E-01 0.165615E-01 0.777693 0.158059E-02 0.575085E-01 0.210172 -0.887863E-01 0.229046 -0.126451 15 -0.743264E-01 0.250716 -0.336067 0.222758E-01 -0.258233E-01 -0.609790E-01 0.484097E-01 0.605200E-03 0.426337E-01 0.369693E-02 -0.471335E-01 0.937636E-01 -0.249419E-01 0.845400E-01 0.882325 0.144754E-01 0.469393E-02 -0.102393 -0.266146E-01 0.302488E-01 16 0.218939 0.108324 0.133400 -0.246618 -0.113702 -0.800897E-01 -0.795206E-01 -0.755218E-01 0.183609E-01 -0.174255E-01 -0.830885E-02 -0.972989E-01 -0.794022E-01 0.589704E-01 0.314735E-01 0.884023 -0.970485E-01 0.512990E-01 -0.116959 -0.465850E-01 17 -0.389875E-02 -0.219255 0.181067E-02 -0.408373 -0.112126 0.433601E-01 -0.341732E-01 
-0.534232E-01 0.710544E-01 0.231058E-01 0.269130E-01 -0.143208 -0.195894E-01 0.154202 0.803092E-01 -0.824188E-01 0.791055 0.168326 -0.207292 0.733172E-01 18 -0.311206 0.543786 0.787399E-01 0.110617E-01 -0.228036E-01 -0.109945 0.137477 -0.536145E-01 -0.229800 0.352634E-01 0.265673E-01 0.143426 0.539288E-01 0.343820E-01 -0.993364E-01 -0.347539E-02 0.478663E-01 0.671949 -0.145931E-01 0.153050 19 -0.201913E-01 -0.951078E-01 -0.130967E-01 -0.492924 -0.153688 0.482572E-02 -0.245629E-01 -0.820223E-01 0.532134E-01 0.279124E-01 0.244190E-01 -0.139727 -0.331699E-01 0.200144 0.603488E-01 -0.112053 -0.240932 0.128265 0.742236 0.996318E-01 20 0.508172 0.113523 -0.955852E-01 0.224285 0.551242E-02 -0.102689 -0.145848 0.191119E-01 0.150667 -0.807279E-01 -0.965548E-01 -0.337589E-02 -0.137046 -0.106721 -0.501192E-01 -0.292739E-01 0.112420 0.630360E-01 0.119293 0.730950

Right Singular vectors

V = Matrix of 4 by 4 elements

1 2 3 4 1 0.606725 0.545782 -0.525263 -0.241052 2 0.598306 -0.263772E-01 0.767939 -0.227166 3 -0.338768 0.833411 0.363935 0.241276 4 -0.398937 0.827817E-01 0.438230E-01 -0.912182

=> CALL PRINT('Test of Factorization. Is S along diagonal?', => 'Transpose(u)*x*v',TRANSPOSE(U)*X*V, => 'Is U orthagonal?','transpose(U)*U', => TRANSPOSE(U)*U, => 'Is V orthagonal?','transpose(V)*V', => TRANSPOSE(V)*V, => ' ')$

Test of Factorization. Is S along diagonal? Transpose(u)*x*v

Matrix of 20 by 4 elements

1 2 3 4 1 6.23289 -0.194289E-15 -0.888178E-15 0.444089E-15 2 0.444089E-15 5.81154 0.221698E-14 -0.133227E-14 3 -0.115186E-14 0.700828E-15 4.30460 -0.333067E-15 4 0.666134E-15 -0.666134E-15 -0.416334E-15 3.01437 5 0.314691E-15 0.374147E-16 -0.450527E-15 -0.341664E-16 6 0.278651E-16 -0.531336E-16 -0.322149E-15 -0.154196E-15 7 -0.313373E-15 -0.169282E-15 -0.161397E-15 -0.129486E-15 8 0.349962E-15 -0.249257E-15 0.106333E-15 0.272142E-16 9 0.762711E-16 -0.421771E-15 -0.811914E-15 0.315349E-15 10 -0.584209E-16 -0.249419E-16 0.162566E-15 -0.212325E-15 11 0.250437E-15 0.349601E-15 0.221742E-15 -0.975753E-16 12 -0.432716E-15 0.206492E-15 -0.131224E-15 -0.270899E-15 13 0.564207E-16 0.195553E-15 0.216933E-15 -0.118217E-15 14 0.941971E-16 -0.218145E-16 0.571930E-15 -0.137410E-15

78

Chapter 16

15 0.223408E-15 0.403956E-15 -0.893632E-16 0.117226E-15 16 0.545941E-15 -0.225733E-16 0.563153E-15 -0.768332E-16 17 -0.297085E-15 -0.829490E-17 -0.478270E-15 0.453433E-16 18 -0.255229E-15 -0.174354E-15 -0.537131E-15 0.708976E-15 19 0.107049E-16 -0.394639E-15 -0.474331E-15 -0.632835E-16 20 0.420439E-15 0.381199E-15 0.516133E-15 -0.124486E-15

Is U orthagonal? transpose(U)*U

Matrix of 20 by 20 elements

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 1 1.00000 -0.208167E-16 -0.256739E-15 0.166533E-15 0.216840E-16 -0.693889E-17 0.00000 0.433681E-16 0.277556E-16 -0.416334E-16 0.138778E-16 0.124466E-15 0.416334E-16 -0.763278E-16 -0.485723E-16 -0.104083E-16 0.624500E-16 0.624500E-16 0.555112E-16 0.555112E-16 2 -0.208167E-16 1.00000 0.267147E-15 -0.204697E-15 -0.500901E-16 -0.364292E-16 -0.416334E-16 -0.216840E-16 0.555112E-16 0.693889E-17 0.451028E-16 -0.148861E-15 0.121431E-16 0.121431E-16 0.144849E-15 0.477049E-17 -0.693889E-16 -0.156125E-16 -0.485723E-16 0.138778E-16 3 -0.256739E-15 0.267147E-15 1.00000 -0.326128E-15 -0.141705E-15 -0.104083E-16 0.555112E-16 0.294903E-16 -0.104083E-15 0.416334E-16 0.555112E-16 -0.115034E-15 0.329597E-16 0.104083E-16 -0.546438E-16 -0.290566E-16 -0.163064E-15 -0.607153E-17 -0.137043E-15 0.194289E-15 4 0.166533E-15 -0.204697E-15 -0.326128E-15 1.00000 0.305745E-16 0.173472E-16 -0.555112E-16 0.615827E-16 0.166533E-15 -0.277556E-16 -0.451028E-16 -0.325261E-16 -0.659195E-16 -0.145717E-15 -0.693889E-17 0.234188E-16 0.381639E-16 0.130104E-15 0.277556E-16 -0.555112E-16 5 0.216840E-16 -0.500901E-16 -0.141705E-15 0.305745E-16 1.00000 -0.329597E-16 -0.274303E-16 0.187431E-16 0.635342E-16 -0.140946E-16 -0.358871E-16 -0.889046E-17 -0.521501E-16 -0.574627E-17 -0.617995E-17 -0.109504E-16 -0.214672E-16 0.548064E-16 0.241777E-16 -0.112757E-16 6 -0.693889E-17 -0.364292E-16 -0.104083E-16 0.173472E-16 -0.329597E-16 1.00000 0.208167E-16 -0.173472E-17 -0.398986E-16 0.173472E-17 -0.277556E-16 0.140946E-16 0.190820E-16 0.242861E-16 -0.225514E-16 -0.416334E-16 0.138778E-16 -0.312250E-16 0.156125E-16 0.138778E-16 7 0.00000 -0.416334E-16 0.555112E-16 -0.555112E-16 -0.274303E-16 0.208167E-16 1.00000 -0.303577E-16 0.208167E-16 -0.867362E-17 -0.693889E-17 -0.235272E-16 -0.138778E-16 0.329597E-16 0.112757E-16 -0.954098E-17 -0.138778E-16 0.156125E-16 -0.277556E-16 -0.138778E-16 8 0.433681E-16 -0.216840E-16 0.294903E-16 0.615827E-16 0.187431E-16 -0.173472E-17 -0.303577E-16 1.00000 0.468375E-16 -0.101915E-16 -0.290566E-16 -0.379471E-18 -0.130104E-16 -0.433681E-18 -0.411997E-17 -0.249366E-17 0.325261E-16 0.407660E-16 0.294903E-16 -0.260209E-16 9 0.277556E-16 0.555112E-16 -0.104083E-15 0.166533E-15 0.635342E-16 -0.398986E-16 0.208167E-16 0.468375E-16 1.00000 -0.520417E-17 -0.277556E-16 0.992045E-16 0.763278E-16 0.277556E-16 -0.112757E-16 0.260209E-17 0.451028E-16 -0.126635E-15 0.867362E-16 -0.138778E-16 10 -0.416334E-16 0.693889E-17 0.416334E-16 -0.277556E-16 -0.140946E-16 0.173472E-17 -0.867362E-17 -0.101915E-16 -0.520417E-17 1.00000 0.607153E-17 -0.487891E-18 -0.156125E-16 0.346945E-17 0.00000 -0.737257E-17 0.693889E-17 0.433681E-17 -0.346945E-17 -0.208167E-16 11 0.138778E-16 0.451028E-16 0.555112E-16 -0.451028E-16 -0.358871E-16 -0.277556E-16 -0.693889E-17 -0.290566E-16 -0.277556E-16 0.607153E-17 1.00000 -0.726415E-17 -0.451028E-16 0.121431E-16 -0.112757E-16 -0.177809E-16 -0.156125E-16 -0.156125E-16 -0.693889E-17 0.00000 12 0.124466E-15 -0.148861E-15 -0.115034E-15 -0.325261E-16 -0.889046E-17 0.140946E-16 -0.235272E-16 -0.379471E-18 0.992045E-16 -0.487891E-18 -0.726415E-17 1.00000 -0.211962E-16 -0.948677E-17 -0.139049E-16 -0.233103E-17 -0.501986E-16 0.700395E-16 -0.207625E-16 -0.607153E-17 13 0.416334E-16 0.121431E-16 0.329597E-16 -0.659195E-16 -0.521501E-16 0.190820E-16 -0.138778E-16 -0.130104E-16 0.763278E-16 -0.156125E-16 -0.451028E-16 -0.211962E-16 1.00000 0.173472E-16 -0.147451E-16 -0.355618E-16 -0.156125E-16 0.156125E-16 -0.242861E-16 -0.416334E-16 14 -0.763278E-16 0.121431E-16 
0.104083E-16 -0.145717E-15 -0.574627E-17 0.242861E-16 0.329597E-16 -0.433681E-18 0.277556E-16 0.346945E-17 0.121431E-16 -0.948677E-17 0.173472E-16 1.00000 0.251535E-16 -0.650521E-17 0.381639E-16 -0.260209E-17 0.277556E-16 0.693889E-16 15 -0.485723E-16 0.144849E-15 -0.546438E-16 -0.693889E-17 -0.617995E-17 -0.225514E-16 0.112757E-16 -0.411997E-17 -0.112757E-16 0.00000 -0.112757E-16 -0.139049E-16 -0.147451E-16 0.251535E-16 1.00000 -0.975782E-17 0.173472E-17 -0.780626E-17 -0.173472E-17 -0.138778E-16 16 -0.104083E-16 0.477049E-17 -0.290566E-16 0.234188E-16 -0.109504E-16 -0.416334E-16 -0.954098E-17 -0.249366E-17 0.260209E-17 -0.737257E-17 -0.177809E-16 -0.233103E-17 -0.355618E-16 -0.650521E-17 -0.975782E-17 1.00000 0.169136E-16 0.101915E-16 -0.121431E-16 -0.693889E-17 17 0.624500E-16 -0.693889E-16 -0.163064E-15 0.381639E-16 -0.214672E-16 0.138778E-16 -0.138778E-16 0.325261E-16 0.451028E-16 0.693889E-17 -0.156125E-16 -0.501986E-16 -0.156125E-16 0.381639E-16 0.173472E-17 0.169136E-16 1.00000 0.390313E-16 -0.884709E-16 0.138778E-16 18 0.624500E-16 -0.156125E-16 -0.607153E-17 0.130104E-15 0.548064E-16 -0.312250E-16 0.156125E-16 0.407660E-16 -0.126635E-15 0.433681E-17 -0.156125E-16 0.700395E-16 0.156125E-16 -0.260209E-17 -0.780626E-17 0.101915E-16 0.390313E-16 1.00000 0.546438E-16 -0.416334E-16 19 0.555112E-16 -0.485723E-16 -0.137043E-15 0.277556E-16 0.241777E-16 0.156125E-16 -0.277556E-16 0.294903E-16 0.867362E-16 -0.346945E-17 -0.693889E-17 -0.207625E-16 -0.242861E-16 0.277556E-16 -0.173472E-17 -0.121431E-16 -0.884709E-16 0.546438E-16 1.00000 -0.138778E-16 20 0.555112E-16 0.138778E-16 0.194289E-15 -0.555112E-16 -0.112757E-16 0.138778E-16 -0.138778E-16 -0.260209E-16 -0.138778E-16 -0.208167E-16 0.00000 -0.607153E-17 -0.416334E-16 0.693889E-16 -0.138778E-16 -0.693889E-17 0.138778E-16 -0.416334E-16 -0.138778E-16 1.00000

Is V orthagonal? transpose(V)*V

Matrix of 4 by 4 elements

1 2 3 4 1 1.00000 0.124900E-15 0.461436E-15 -0.166533E-15 2 0.124900E-15 1.00000 -0.191687E-15 0.971445E-16 3 0.461436E-15 -0.191687E-15 1.00000 -0.624500E-16 4 -0.166533E-15 0.971445E-16 -0.624500E-16 1.00000

=> N=30$

=> K=5$

=> X =RN(MATRIX(N,K:))$


=> X(,1) =1.0$

=> BETA =VECTOR(5:1. 2. 3. 4. 5.)$

=> Y =X*BETA +RN(VECTOR(N:))$

=> XPX =TRANSPOSE(X)*X$

=> * SOLVE REDUCED PROBLEM$

=> S =SVD(X,BAD,21,U,V)$

=> SIGMA =DIAGMAT(S)$

=> BETAHAT1=INV(XPX)*TRANSPOSE(X)*Y$

=> BETAHAT2=V*INV(SIGMA)*TRANSPOSE(U)*Y$

=> CALL PRINT('OLS from two approaches',BETAHAT1,BETAHAT2)$

OLS from two approaches

BETAHAT1= Vector of 5 elements

0.662539 2.00676 2.95431 3.50355 4.73075

BETAHAT2= Vector of 5 elements

0.662539 2.00676 2.95431 3.50355 4.73075

=> X=RN(MATRIX(5,5:))$

=> XPX=TRANSPOSE(X)*X$

=> E=EIG(XPX)$

=> EE=SEIG(XPX)$

=> S=SVD(XPX)$

=> CALL PRINT(E,EE,S)$

E = Complex Vector of 5

( 13.45 , 0.000 ) ( 5.467 , 0.000 ) ( 0.3605 , 0.000 ) ( 2.253 , 0.000 ) ( 1.472 , 0.000 )

EE = Matrix of 5 by 1 elements

1

1 0.360474

2 1.47185

3 2.25274

4 5.46686

5 13.4516

S = Vector of 5 elements

13.4516 5.46686 2.25274 1.47185 0.360474

B34S Matrix Command Ending. Last Command reached.

Space available in allocator 2881535, peak space used 6234 Number variables used 88, peak number used 97 Number temp variables used 407, # user temp clean 0
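The QR route to the OLS coefficients described in (16.4-9) can also be sketched with functions already used in this chapter. No dedicated QR routine is called in the lines below; instead the Cholesky factor of X'X from pdfac stands in for R and Q1 is formed as X*inv(R), so this is an illustration of the algebra rather than of the QR code B34S actually uses. The variable names are for illustration only:

/$ QR identity for OLS: betahat = inv(R)*transpose(Q1)*y with Q1 = X*inv(R)
x    = rn(matrix(30,5:));
x(,1)= 1.0;
y    = x*vector(5:1. 2. 3. 4. 5.) + rn(vector(30:));
r    = pdfac(transpose(x)*x);
q1   = x*inv(r);
call print('OLS coefficients from the QR identity',  inv(r)*transpose(q1)*y,
           'OLS coefficients from the normal equations',
           inv(transpose(x)*x)*transpose(x)*y);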

16.5 Extended Eigenvalue Analysis


Recall from (16.4-6) that given a general matrix A, an eigenvalue decomposition writes A = V D V^-1, where V is the usual "left hand" eigenvector matrix and D is a diagonal matrix with the eigenvalues along the diagonal. It should be noted that the matrix V is not unique and can be scaled. The matrix command e=eig(a,v) will use the eispack routines rg and cg to produce a non-scaled eigenvector matrix, while the commands e=eig(a,v :LAPACK) or e=eig(a,v :LAPACK2) will produce eigenvalues using the lapack routines dgeevx/zgeevx or dgeev/zgeev, where V is scaled so that each column has a norm of 1. The dgeevx/zgeevx call does not balance the matrix while dgeev/zgeev does. In addition to the usual eigenvectors, it is possible to define "right handed" eigenvectors W where:

(16.5-1)    W^H A = D W^H,   or equivalently   A = (W^H)^-1 D W^H

with W^H denoting the conjugate transpose of W. The code listed below illustrates these refinements for real*8 and complex*16 matrices. We first estimate and test the non-scaled eigenvectors evec produced by eispack. Next the lapack code is used to estimate and test both sets of eigenvectors.

b34sexec matrix;
* Exercises Eigenvalue calculations ;
* IMSL test case ;
A = matrix(3,3:  8.0, -1.0, -5.0,
                -4.0,  4.0, -2.0,
                18.0, -5.0, -7.0);
e =eig(a,evec);
call print('Test Eispack',a,evec*diagmat(e)*inv(evec));

e2 =eig(a,evecr,evecl :lapack);
call print('test eispack vs lapack':);
call print(a,e,evec,e2,evecr,evecl);

call print('test right'
   evecr*diagmat(e2)*inv(evecr)
   'test left'
   inv(transpose(dconj(evecl)))*diagmat(e2)*transpose(dconj(evecl)));

ca=complex(a,a*a);
e =eig(ca,evec);
call print('Test Eispack factorization',
   ca,evec*diagmat(e)*inv(evec));

e2 =eig(ca,evecr,evecl :lapack);
call print('test eispack vs lapack':);
call print(ca,e,evec,e2,evecr,evecl);

call print('test right'
   evecr*diagmat(e2)*inv(evecr)
   'test left'
   inv(transpose(dconj(evecl)))*diagmat(e2)*transpose(dconj(evecl)));

b34srun;


Edited output is listed next.

B34S(r) Matrix Command. Version February 2004.

Date of Run d/m/y 26/ 2/04. Time of Run h:m:s 14:55:16.

=> * EXERCISES EIGENVALUE CALCULATIONS $

=> * IMSL TEST CASE $

=> A = MATRIX(3,3: 8.0, -1.0,-5.0, => -4.0, 4.0,-2.0, => 18.0, -5.0,-7.0)$

=> E =EIG(A,EVEC)$

=> CALL PRINT('Test Eispack',A,EVEC*DIAGMAT(E)*INV(EVEC))$

Test Eispack

A = Matrix of 3 by 3 elements

1 2 3 1 8.00000 -1.00000 -5.00000 2 -4.00000 4.00000 -2.00000 3 18.0000 -5.00000 -7.00000

Complex Matrix of 3 by 3 elements

1 2 3 1 ( 8.000 , 0.5467E-14) ( -1.000 , -0.2402E-14) ( -5.000 , -0.2178E-14) 2 ( -4.000 , 0.3828E-14) ( 4.000 , -0.3625E-15) ( -2.000 , -0.2136E-14) 3 ( 18.00 , 0.7021E-14) ( -5.000 , -0.3845E-14) ( -7.000 , -0.2400E-14)

=> E2 =EIG(A,EVECR,EVECL :LAPACK)$

=> CALL PRINT('test eispack vs lapack':)$

test eispack vs lapack

=> CALL PRINT(A,E,EVEC,E2,EVECR,EVECL)$

A = Matrix of 3 by 3 elements

1 2 3 1 8.00000 -1.00000 -5.00000 2 -4.00000 4.00000 -2.00000 3 18.0000 -5.00000 -7.00000

E = Complex Vector of 3 elements

( 2.000 , 4.000 ) ( 2.000 , -4.000 ) ( 1.000 , 0.000 )

EVEC = Complex Matrix of 3 by 3 elements

1 2 3 1 ( 0.1129 , 0.5397 ) ( 0.1129 , -0.5397 ) ( 2.367 , 0.000 ) 2 ( -0.4268 , 0.6526 ) ( -0.4268 , -0.6526 ) ( 4.735 , 0.000 ) 3 ( 0.6526 , 0.4268 ) ( 0.6526 , -0.4268 ) ( 2.367 , 0.000 )


E2 = Complex Vector of 3 elements

( 2.000 , 4.000 ) ( 2.000 , -4.000 ) ( 1.000 , 0.000 )

EVECR = Complex Matrix of 3 by 3 elements

1 2 3 1 ( 0.3162 , 0.3162 ) ( 0.3162 , -0.3162 ) ( 0.4082 , 0.000 ) 2 ( -0.9992E-15, 0.6325 ) ( -0.9992E-15, -0.6325 ) ( 0.8165 , 0.000 ) 3 ( 0.6325 , 0.000 ) ( 0.6325 , 0.000 ) ( 0.4082 , 0.000 )

EVECL = Complex Matrix of 3 by 3 elements

1 2 3 1 ( -0.8771 , 0.000 ) ( -0.8771 , 0.000 ) ( -0.8165 , 0.000 ) 2 ( 0.2631 , -0.8771E-01) ( 0.2631 , 0.8771E-01) ( 0.4082 , 0.000 ) 3 ( 0.3508 , 0.1754 ) ( 0.3508 , -0.1754 ) ( 0.4082 , 0.000 )

=> CALL PRINT('test right' => EVECR*DIAGMAT(E2)*INV(EVECR) => 'test left' => INV(TRANSPOSE(DCONJ(EVECL)))*DIAGMAT(E2)*TRANSPOSE(DCONJ(EVECL)))$

test right

Complex Matrix of 3 by 3 elements

1 2 3 1 ( 8.000 , 0.000 ) ( -1.000 , 0.2220E-15) ( -5.000 , 0.8777E-16) 2 ( -4.000 , 0.000 ) ( 4.000 , 0.000 ) ( -2.000 , 0.1755E-15) 3 ( 18.00 , 0.000 ) ( -5.000 , 0.4441E-15) ( -7.000 , 0.8777E-16)

test left

Complex Matrix of 3 by 3 elements

1 2 3 1 ( 8.000 , 0.1595E-14) ( -1.000 , -0.1314E-15) ( -5.000 , -0.6865E-15) 2 ( -4.000 , 0.3626E-15) ( 4.000 , -0.1813E-15) ( -2.000 , 0.7069E-15) 3 ( 18.00 , 0.4039E-14) ( -5.000 , -0.4599E-15) ( -7.000 , -0.2347E-14)

=> CA=COMPLEX(A,A*A)$

=> E =EIG(CA,EVEC)$

=> CALL PRINT('Test Eispack factorization', => CA,EVEC*DIAGMAT(E)*INV(EVEC))$

Test Eispack factorization

CA = Complex Matrix of 3 by 3 elements

1 2 3 1 ( 8.000 , -22.00 ) ( -1.000 , 13.00 ) ( -5.000 , -3.000 ) 2 ( -4.000 , -84.00 ) ( 4.000 , 30.00 ) ( -2.000 , 26.00 ) 3 ( 18.00 , 38.00 ) ( -5.000 , -3.000 ) ( -7.000 , -31.00 )

Complex Matrix of 3 by 3 elements

1 2 3 1 ( 8.000 , -22.00 ) ( -1.000 , 13.00 ) ( -5.000 , -3.000 ) 2 ( -4.000 , -84.00 ) ( 4.000 , 30.00 ) ( -2.000 , 26.00 ) 3 ( 18.00 , 38.00 ) ( -5.000 , -3.000 ) ( -7.000 , -31.00 )

=> E2 =EIG(CA,EVECR,EVECL :LAPACK)$


=> CALL PRINT('test eispack vs lapack':)$

test eispack vs lapack

=> CALL PRINT(CA,E,EVEC,E2,EVECR,EVECL)$

CA = Complex Matrix of 3 by 3 elements

1 2 3 1 ( 8.000 , -22.00 ) ( -1.000 , 13.00 ) ( -5.000 , -3.000 ) 2 ( -4.000 , -84.00 ) ( 4.000 , 30.00 ) ( -2.000 , 26.00 ) 3 ( 18.00 , 38.00 ) ( -5.000 , -3.000 ) ( -7.000 , -31.00 )

E = Complex Vector of 3 elements

( -14.00 , -8.000 ) ( 1.000 , 1.000 ) ( 18.00 , -16.00 )

EVEC = Complex Matrix of 3 by 3 elements

1 2 3 1 ( -7.205 , -0.2937 ) ( -0.5782 , 0.2242 ) ( -0.9345 , 1.542 ) 2 ( -6.911 , -7.499 ) ( -1.156 , 0.4484 ) ( 0.6072 , 2.476 ) 3 ( -7.499 , 6.911 ) ( -0.5782 , 0.2242 ) ( -2.476 , 0.6072 )

E2 = Complex Vector of 3 elements

( -14.00 , -8.000 ) ( 1.000 , 1.000 ) ( 18.00 , -16.00 )

EVECR = Complex Matrix of 3 by 3 elements

1 2 3 1 ( 0.3162 , -0.3162 ) ( 0.4082 , -0.2498E-15) ( 0.3162 , -0.3162 ) 2 ( 0.6325 , 0.000 ) ( 0.8165 , 0.000 ) ( 0.1665E-14, -0.6325 ) 3 ( 0.1638E-14, -0.6325 ) ( 0.4082 , -0.3608E-15) ( 0.6325 , 0.000 )

EVECL = Complex Matrix of 3 by 3 elements

1 2 3 1 ( 0.8771 , 0.000 ) ( 0.8165 , 0.000 ) ( 0.8771 , 0.000 ) 2 ( -0.2631 , 0.8771E-01) ( -0.4082 , 0.1665E-15) ( -0.2631 , -0.8771E-01) 3 ( -0.3508 , -0.1754 ) ( -0.4082 , -0.3053E-15) ( -0.3508 , 0.1754 )

=> CALL PRINT('test right' => EVECR*DIAGMAT(E2)*INV(EVECR) => 'test left' => INV(TRANSPOSE(DCONJ(EVECL)))*DIAGMAT(E2)*TRANSPOSE(DCONJ(EVECL)))$

test right

Complex Matrix of 3 by 3 elements

1 2 3 1 ( 8.000 , -22.00 ) ( -1.000 , 13.00 ) ( -5.000 , -3.000 ) 2 ( -4.000 , -84.00 ) ( 4.000 , 30.00 ) ( -2.000 , 26.00 ) 3 ( 18.00 , 38.00 ) ( -5.000 , -3.000 ) ( -7.000 , -31.00 )

test left

Complex Matrix of 3 by 3 elements

1 2 3 1 ( 8.000 , -22.00 ) ( -1.000 , 13.00 ) ( -5.000 , -3.000 )


2 ( -4.000 , -84.00 ) ( 4.000 , 30.00 ) ( -2.000 , 26.00 ) 3 ( 18.00 , 38.00 ) ( -5.000 , -3.000 ) ( -7.000 , -31.00 )

B34S Matrix Command Ending. Last Command reached.

Space available in allocator 2874697, peak space used 1518 Number variables used 28, peak number used 40 Number temp variables used 86, # user temp clean 0
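A further property that could be verified with one more print statement (not in the job above) is that the two sets of eigenvectors returned by the lapack call are biorthogonal: since the eigenvalues here are distinct, transpose(dconj(evecl))*evecr should be a diagonal matrix. A one-line sketch is:

call print('Is transpose(dconj(evecl))*evecr diagonal?',
           transpose(dconj(evecl))*evecr);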

The reason the matrix command has both types of eigenvalue routines is that while above order 200 the lapack code is faster, below order 200 the eispack code is faster. The next job tests this and in addition uses the eispack symmetric matrix code tred1/imtql1 and tred2/imtql2 for eigenvalues only and for eigenvalues plus eigenvectors, respectively. In the following series of tests, where appropriate, two computers were used, both running Windows XP Professional. The Dell Latitude Model C810, with an Intel Family 6 Model 11 CPU rated at 1,122 MHz (as measured by MS Word version 2003) and 512 MB of memory, represents the "low end". The Dell Model 650 workstation, with two dual core Xeon CPUs each rated at 3,056 MHz (as measured by MS Word), represents "high end" computing capability.

b34sexec matrix;
* ispeed1 on pd matrix ;
* ispeed2 on general matrix;
* ispeed3 on complex general matrix;
* up 625 has been run ;

igraph=0;
ispeed1=1;
ispeed2=1;
ispeed3=1;
upper=450;
mesh=50;

/$ PD Results

if(ispeed1.ne.0)then;
call echooff;
icount=0;
n=0;

top continue;

icount=icount+1;
n=n+mesh;
if(n .eq. upper)go to done;
x=rec(matrix(n,n:));
x=transpose(x)*x;
* x=complex(x,dsqrt(dabs(x)));

call compress;
call timer(base10);
e=seig(x);
call timer(base20);

call compress;
call timer(base110);
e=seig(x,evec);
call timer(base220);

call compress;
call timer(base11);
e=eig(x);
call timer(base22);

call compress;
call timer(base111);
e=eig(x:lapack2);
call timer(base222);

call compress;
call timer(base1);
e=eig(x,evec);
call timer(base2);

call compress;
call timer(base3);
e=eig(x,evec,evec2 :lapack2);
call timer(base4);

call compress;
call timer(base5);
e=eig(x,evec:lapack2);
call timer(base6);

size(icount)     = dfloat(n);
sm1(icount)      = base20-base10;
sm2(icount)      = base220-base110;
eispack1(icount) = (base22-base11);
lapack1(icount)  = (base222-base111);
eispack2(icount) = (base2-base1);
lapack2a(icount) = (base4-base3);
lapack2b(icount) = (base6-base5);

call free(x,xinv1,ii);

go to top;

done continue;

call print('EISPACK vs LAPACK on PD Matrix ':);
call print('lapack2a gets both right and left eigenvectors':);
call tabulate(size,sm1 sm2,eispack1,lapack1,eispack2,lapack2a,lapack2b);

if(igraph.eq.1)call graph(size sm1,sm2,eispack1,lapack1,eispack2,lapack2a,lapack2b :plottype xyplot :nokey :file 'pd_matrix.wmf' :heading 'Real*8 PD Matrix Results');
endif;

if(ispeed2.ne.0)then;
call echooff;
icount=0;
n=0;

top2 continue;

icount=icount+1;
n=n+mesh;
if(n .eq. upper)go to done2;
x=rec(matrix(n,n:));
* x=transpose(x)*x;
* x=complex(x,dsqrt(dabs(x)));

call compress;
call timer(base11);
e=eig(x);
call timer(base22);

call compress;
call timer(base111);
e=eig(x:lapack2);
call timer(base222);

call compress;
call timer(base1);
e=eig(x,evec);
call timer(base2);

call compress;
call timer(base3);
e=eig(x,evec,evec2 :lapack2);
call timer(base4);

call compress;
call timer(base5);
e=eig(x,evec:lapack2);
call timer(base6);

size(icount)     = dfloat(n);
eispack1(icount) = (base22-base11);
lapack1(icount)  = (base222-base111);
eispack2(icount) = (base2-base1);
lapack2a(icount) = (base4-base3);
lapack2b(icount) = (base6-base5);

call free(x,xinv1,ii);

go to top2;

done2 continue;

call print('EISPACK vs LAPACK on General Matrix ':);
call print('lapack2a gets both right and left eigenvectors':);
call tabulate(size,eispack1,lapack1,eispack2,lapack2a,lapack2b);

if(igraph.eq.1)call graph(size ,eispack1,lapack1,eispack2,lapack2a,lapack2b :plottype xyplot :nokey :file 'real_8.wmf' :heading 'Real*8 General Matrix Results');
endif;

if(ispeed3.ne.0)then;
call echooff;
icount=0;
n=0;

top3 continue;

icount=icount+1;
n=n+mesh;
if(n .eq. upper)go to done3;
x=rec(matrix(n,n:));
x=complex(x,dsqrt(dabs(x)));

call compress;
call timer(base11);
e=eig(x);
call timer(base22);

call compress;
call timer(base111);
e=eig(x:lapack2);
call timer(base222);

call compress;
call timer(base1);
e=eig(x,evec);
call timer(base2);

call compress;
call timer(base3);
e=eig(x,evec,evec2 :lapack2);
call timer(base4);

call compress;
call timer(base5);
e=eig(x,evec:lapack2);
call timer(base6);

size(icount)     = dfloat(n);
eispack1(icount) = (base22-base11);
lapack1(icount)  = (base222-base111);
eispack2(icount) = (base2-base1);
lapack2a(icount) = (base4-base3);
lapack2b(icount) = (base6-base5);

call free(x,xinv1,ii);

go to top3;

done3 continue;

call print('EISPACK vs LAPACK on a Complex General Matrix ':);
call print('lapack2a gets both right and left eigenvectors':);
call tabulate(size,eispack1,lapack1,eispack2,lapack2a,lapack2b);

if(igraph.eq.1)call graph(size ,eispack1,lapack1,eispack2,lapack2a,lapack2b :plottype xyplot :nokey :file 'complex_16.wmf' :heading 'Complex*16 Results');
endif;

b34srun;

Results from running this script on the Dell 650 workstation are shown next. Note that:

lapack2a gets both right and left eigenvectors

SM1      => Eispack with eigenvalue only           tred1/imtql1
SM2      => Eispack eigenvalue and Vec.            tred2/imtql2
Eispack1 => Eispack with eigenvalue only           rg
Lapack1  => Lapack with eigenvalue only            DGEEVX
Eispack2 => Eispack eigenvalue and eigenvector     rg
Lapack2a => Both right and left eigenvectors       dgeevx
Lapack2b => Lapack eigenvalue and Vec.             dgeevx

EISPACK vs LAPACK on PD Matrix
lapack2a gets both right and left eigenvectors

Obs SIZE SM1 SM2 EISPACK1 LAPACK1 EISPACK2 LAPACK2A LAPACK2B


  1   50.00   0.000       0.000       0.000       0.000       0.1562E-01  0.000       0.000
  2   100.0   0.000       0.1562E-01  0.1562E-01  0.1562E-01  0.1562E-01  0.3125E-01  0.4688E-01
  3   150.0   0.1562E-01  0.1562E-01  0.3125E-01  0.4688E-01  0.6250E-01  0.1250      0.1250
  4   200.0   0.1562E-01  0.6250E-01  0.6250E-01  0.1250      0.1719      0.3125      0.2969
  5   250.0   0.3125E-01  0.1250      0.1406      0.2500      0.3438      0.6875      0.6406
  6   300.0   0.4688E-01  0.2188      0.2656      0.4375      0.6875      1.156       1.062
  7   350.0   0.7812E-01  0.4375      0.4531      0.7188      1.188       1.828       1.688
  8   400.0   0.1406      0.6875      0.7656      1.094       1.938       2.766       2.609

EISPACK vs LAPACK on General Matrix
lapack2a gets both right and left eigenvectors

Obs   SIZE    EISPACK1    LAPACK1     EISPACK2    LAPACK2A    LAPACK2B
  1   50.00   0.000       0.000       0.1562E-01  0.000       0.1562E-01
  2   100.0   0.1562E-01  0.1562E-01  0.3125E-01  0.3125E-01  0.4688E-01
  3   150.0   0.4688E-01  0.4688E-01  0.9375E-01  0.1094      0.1094
  4   200.0   0.9375E-01  0.1094      0.2344      0.2656      0.2500
  5   250.0   0.1875      0.2031      0.4844      0.5781      0.5312
  6   300.0   0.3438      0.3594      0.9688      0.9844      0.9219
  7   350.0   0.5781      0.5938      1.609       1.594       1.453
  8   400.0   0.9844      0.8906      2.578       2.422       2.234

EISPACK vs LAPACK on a Complex General Matrix
lapack2a gets both right and left eigenvectors

Obs   SIZE    EISPACK1    LAPACK1     EISPACK2    LAPACK2A    LAPACK2B
  1   50.00   0.1562E-01  0.1562E-01  0.3125E-01  0.1562E-01  0.1562E-01
  2   100.0   0.4688E-01  0.4688E-01  0.1875      0.9375E-01  0.7812E-01
  3   150.0   0.1719      0.1094      0.6875      0.2812      0.2656
  4   200.0   0.3906      0.2656      1.609       0.6875      0.6406
  5   250.0   0.8594      0.5625      3.344       1.438       1.328
  6   300.0   1.672       0.9062      6.031       2.375       2.188
  7   350.0   3.000       1.438       9.922       3.812       3.531
  8   400.0   4.359       2.141       14.47       5.688       5.312

By using the appropriate routine (TRED1/IMTQL1) to calculate only the eigenvalues of the PD matrix of order 400, the cost is 18.37% (.1406/.7656) of that of the real*8 general matrix eispack routine RG, or 12.85% (.1406/1.094) of the more expensive lapack routine DGEEVX. What is surprising is that even at a size of 400 by 400, eispack dominates lapack in terms of time (.7656 vs 1.094) if only eigenvalues are requested. This gain appears to carry through to calculations involving eigenvectors. Since the column LAPACK2A involves calculation of both right and left eigenvectors, the correct column to compare is LAPACK2B. Here the eispack time is 74.28% (1.938/2.609) of the lapack time.
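
The same lesson can be checked outside B34S. The sketch below is a Python/NumPy analogue rather than B34S code: it times a symmetric/PD-specific eigenvalue routine against the general driver on a positive definite matrix. The size and the use of NumPy are illustrative assumptions, and the absolute times will differ from the tables above.

import time
import numpy as np

n = 400
rng = np.random.default_rng(0)
a = rng.standard_normal((n, n))
x = a.T @ a                                # positive definite test matrix

t0 = time.perf_counter()
np.linalg.eigvalsh(x)                      # symmetric/PD-specific eigenvalue routine
t1 = time.perf_counter()
np.linalg.eigvals(x)                       # general real matrix eigenvalue routine
t2 = time.perf_counter()

print("symmetric-specific routine:", round(t1 - t0, 4), "seconds")
print("general routine:           ", round(t2 - t1, 4), "seconds")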

For a real*8 general matrix the results are somewhat different. For just eigenvalues, lapack does not pull ahead of eispack until the largest size tested (at order 400 the times were .9844 vs .8906). If eigenvectors are requested, lapack begins to dominate by order 300. At 200 and 250 the eispack/lapack times were close to the same (.2344/.2500 and .4844/.5312), while at 300 they became .9688/.9219 as lapack became slightly faster. At 400 the times were 2.578/2.234 and lapack was emerging as the clear winner. The lesson to be drawn is that the bigger the system, the more appropriate it is to use lapack. Note that the option :lapack2 has been used, which turns off balancing. Thus, the reported tests are biased in favor of lapack, since eispack balances. It has been found, and noted by others, that balancing can be dangerous, especially for complex matrices. The job speed1.b34 in c:\b34slm\b34stest\ can be used to benchmark these results on different machines.

For a complex*16 matrix the crossover appears between 100 and 150 if only eigenvalues are requested. At 150 lapack is the clear winner with a time of .1094 vs eispack's .1719. For an order 400 matrix the times were 4.359 and 2.141 respectively, making lapack over 2.0 times faster. If both eigenvalues and eigenvectors are requested for an order 400 matrix, the times were 14.47 and 5.312, so lapack is 2.724 times faster. Since most problems are relatively small, these


results suggest that for the time being eispack should remain the default eigen routine for both real*8 and complex*16 problems. An additional advantage of eispack is that it runs with substantially less work memory. The lapack code was developed to handle large matrices and makes use of a block design. This seems to work especially well with complex*16 matrices. For real*16 and complex*32 matrices, lapack is not available and specially modified versions of the eispack routines are used. The option :lapack, which calls DGEEV/ZGEEV, runs into problems on large complex*16 matrices due to balancing. The effect of permutations and scaling can be investigated using the options :lapackp, :lapacks and :lapackb, which turn on and off various options in DGEEVX/ZGEEVX. The above test problem appears to fail for large complex matrices generated with the sample code when these options are turned on. Section 16.6 discusses the ilaenv routine whereby alternative blocksizes can be investigated. It is to be noted that the above times can be influenced by the size of the workspace, which appears to influence memory access speed. The default B34S setting is to let lapack obtain the optimum workspace.

16.6 A Preliminary Investigation of Inversion Speed Differences

There are substantial speed differences among the various approaches to matrix inversion. The program listed below shows the relative speed of inverting a positive definite matrix of size 50 to 600 using lapack, linpack, an SVD (for the pseudo inverse) and a Cholesky decomposition using linpack.

b34sexec matrix;
* Tests speed of Linpack vs LAPACK vs svd (pinv) vs ;
* Requires a large size ;
call echooff;
icount=0;n=0;upper=600;mesh=50;

top continue;

icount=icount+1;n=n+mesh;if(n .gt. upper)go to done;call print('Doing size ',n:);x=rn(matrix(n,n:));x=transpose(x)*x;ii=matrix(n,n:)+1.;

/$ Use LAPACK LU

call compress;call timer(base1);xinv1=inv(x:gmat);call timer(base2);error1(icount)=sum(dabs(ii-(xinv1*x)));

/$ Use LINPACK 'Default' LU

call compress;call timer(base3);xinv1=inv(x);call timer(base4);error2(icount)=sum(dabs(ii-(xinv1*x)));

/$ Use IMSL pinv code

call compress;


call timer(base5);xinv1=pinv(x);call timer(base6);error3(icount)=sum(dabs(ii-(xinv1*x)));

/$ Use Linpack DPOCO / DPODI Code

call compress;call timer(base7);xinv1=inv(x :pdmat);call timer(base8);error4(icount)=sum(dabs(ii-(xinv1*x)));

size(icount) =dfloat(n);lapack(icount) =(base2-base1);linpack(icount)=(base4-base3);svdt(icount) =(base6-base5);chol(icount) =(base8-base7);

call free(x,xinv1,ii);

go to top;

done continue;

call tabulate(size,lapack,linpack,svdt,chol, error1,error2,error3,error4);
call graph(size lapack,linpack :heading 'Lapack Vs Linpack' :plottype xyplot);
call graph(size lapack,linpack svdt :heading 'LAPACK vs Linpack vs SVD' :plottype xyplot);
b34srun;

Edited output from running the above program on a Dell 650 workstation using the built-in Fortran CPU timer produces:

Obs   SIZE    LAPACK      LINPACK     SVDT        CHOL        ERROR1      ERROR2      ERROR3      ERROR4
  1   50.00   0.000       0.000       0.1562E-01  0.000       0.2129E-08  0.2129E-08  0.4866E-08  0.2845E-08
  2   100.0   0.000       0.1562E-01  0.1562E-01  0.1562E-01  0.1133E-05  0.1109E-05  0.2193E-05  0.1418E-05
  3   150.0   0.1562E-01  0.1562E-01  0.6250E-01  0.000       0.1352E-08  0.1315E-08  0.3145E-08  0.1685E-08
  4   200.0   0.1562E-01  0.1562E-01  0.1406      0.1562E-01  0.1432E-05  0.1425E-05  0.3864E-05  0.1920E-05
  5   250.0   0.3125E-01  0.3125E-01  0.3438      0.1562E-01  0.1982E-08  0.1957E-08  0.4886E-08  0.2428E-08
  6   300.0   0.7812E-01  0.7812E-01  0.7812      0.3125E-01  0.1292E-05  0.1297E-05  0.3265E-05  0.1692E-05
  7   350.0   0.1406      0.1562      1.281       0.7812E-01  0.2907E-07  0.2874E-07  0.6778E-07  0.3556E-07
  8   400.0   0.2031      0.2656      1.906       0.1250      0.5379E-06  0.5386E-06  0.1298E-05  0.6448E-06
  9   450.0   0.2812      0.3750      2.766       0.2031      0.4115E-06  0.4115E-06  0.1069E-05  0.5292E-06
 10   500.0   0.3906      0.8281      5.328       0.4531      0.1325E-06  0.1315E-06  0.3427E-06  0.1671E-06
 11   550.0   0.5469      0.8594      6.562       0.5312      0.1731E-06  0.1731E-06  0.4113E-06  0.2253E-06
 12   600.0   0.8906      1.141       6.797       0.6719      0.6622E-06  0.6546E-06  0.1540E-05  0.7743E-06

The lapack code uses dgetrf/dgecon/dgetri, linpack uses dgeco/dgefa/dgedi and SVD uses the IMSL routine dslgrr which internally calls the linpack SVD routine dsvdc. The blocksize of the lapack workspace was optimized by a preliminary call to dgetrf which in turn calls the lapack ilaenv routine. All routines were compiled with Lahey Fortran LF95 version 7.1 release except for the IMSL code which was compiled with an earlier release of Lahey. Note that there is a call to compress before each test. If this is removed and placed right before go to top; there will be a noticeable speed difference, especially for the large systems. This is due to the fact that there is unused temporary space in memory and new temp variables are allocated quite a distance away from the x matrix. This "thrashing of memory" slows things down. The matrix command inv( ) defaults to the linpack LU solver since, as this example shows, for matrix sizes of 300 and smaller linpack runs at a speed near that of lapack. For larger systems,


lapack runs faster.12 For example, for order 600 the gain is ~21.95% ((1.141-.8906)/1.141). If the positive definite structure is exploited and the Cholesky inverter used, the gain would be ~41.11% ((1.141-.6719)/1.141) over the linpack LU solver. The SVD approach is ~10.12 times more costly (6.797/.6719) than the Cholesky and ~5.96 times (6.797/1.141) more expensive than the linpack LU. For the problems run, the error, calculated as the sum of the absolute deviations of inv(x)*x from the identity matrix, is comparable across methods.
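
For readers who want to reproduce this style of comparison outside B34S, the following rough Python/SciPy sketch (an illustration under its own assumptions, not the B34S implementation) times an LU-based inverse, a Cholesky-based inverse and an SVD pseudo inverse on the same positive definite matrix and reports the identity error used above.

import time
import numpy as np
from scipy import linalg

def identity_error(x, xinv):
    # sum of absolute deviations of xinv*x from the identity, as in the script above
    return float(np.sum(np.abs(np.eye(x.shape[0]) - xinv @ x)))

rng = np.random.default_rng(0)
for n in range(50, 601, 50):
    a = rng.standard_normal((n, n))
    x = a.T @ a                                        # positive definite test matrix

    t0 = time.perf_counter()
    xinv_lu = linalg.inv(x)                            # general LU-based inverse
    t1 = time.perf_counter()
    c, low = linalg.cho_factor(x)
    xinv_chol = linalg.cho_solve((c, low), np.eye(n))  # Cholesky-based inverse
    t2 = time.perf_counter()
    xinv_svd = np.linalg.pinv(x)                       # SVD-based pseudo inverse
    t3 = time.perf_counter()

    print(n, round(t1 - t0, 4), round(t2 - t1, 4), round(t3 - t2, 4),
          identity_error(x, xinv_lu), identity_error(x, xinv_chol))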

The inv command uses linpack as the default inverter. Users with large problems involving general matrix systems should use inv(x:gmat) to get the LAPACK routines. The subroutine gminv uses lapack by default and optionally returns an indicator of whether the matrix is of full rank. Prior to a call to a lapack routine, the routine ilaenv is called to determine the optimum blocksize, which is then used to set the workspace. These defaults are normally accepted, although the B34S user is given the option of experimenting with them; most users may not have sufficient knowledge to make this choice.

Since many of the matrices in econometrics are positive definite, it is wasteful to calculate inverses with the general matrix LU approach. In addition, since both linpack and lapack can detect whether the matrix is positive definite, by using the Cholesky approach one obtains speed and also has a built-in error check against possible problems. While the linpack routines dpoco/dpodi have stood the test of time since the 1970s, the lapack Cholesky routines dpotrf/dpocon/dpotri appear to have speed advantages for larger systems that are even greater than those found for the LU inverters. The following code illustrates what was found:

b34sexec matrix;
* Tests speed of Linpack vs LAPACK ;
* Uses PD matrix;
call echooff;
icount=0;n=0;upper=700;mesh=25;

top continue;

icount=icount+1;n=n+mesh;if(n .eq. upper)go to done;x=rn(matrix(n,n:));x=transpose(x)*x;ii=matrix(n,n:)+1.;

/$ LINPACK PDF

call compress;call timer(base11);xinv1=inv(x :pdmat);call timer(base22);error(icount)=sumsq(ii-(xinv1*x));

/$ LAPACK PDF

call compress;call timer(base111);xinv1=inv(x :pdmat2);call timer(base222);

12 The tests shown here use as the default the optimum blocksize for lapack. A later example investigates the effect of blocksize changes on speed. At issue is the fact that the lapack default workspace is quite large. Users with large problems may run out of memory and have to reduce the blocksize of the calculation.


error0(icount)=sumsq(ii-(xinv1*x));

/$ LAPACK LU

call compress;call timer(base1);xinv1=inv(x:gmat);call timer(base2);error1(icount)=sumsq(ii-(xinv1*x));

/$ LINPACK LU

call compress;call timer(base3);xinv1=inv(x);call timer(base4);error2(icount)=sumsq(ii-(xinv1*x));

size(icount) = dfloat(n);pdmat(icount) =(base22-base11);pdmat2(icount) =(base222-base111);lapack(icount) =(base2-base1);linpack(icount)=(base4-base3);

call free(x,xinv1,ii);

go to top;

done continue;

call print('LINPACK Cholesky vs LAPACK Cholesky':);call tabulate(size,pdmat,pdmat2, error, error0, lapack,linpack,error1,error2);

call graph(size pdmat,pdmat2,lapack,linpack :nokey :plottype xyplot);
b34srun;

The above program was run on the Dell 650 machine with dual 3.06 GHz Xeon processors, each with two cores. The operating system was XP Professional and the machine had 4 GB of memory.

B34S(r) Matrix Command. d/m/y 3/ 7/07. h:m:s 12:34: 0.

=> * TESTS SPEED OF LINPACK VS LAPACK $
=> * USES PD MATRIX$
=> CALL ECHOOFF$

LINPACK Cholesky vs LAPACK Cholesky

Obs   SIZE    PDMAT       PDMAT2      ERROR       ERROR0      LAPACK      LINPACK     ERROR1      ERROR2
  1   25.00   0.000       0.000       0.7573E-25  0.3868E-25  0.000       0.000       0.2938E-25  0.2647E-25
  2   50.00   0.000       0.000       0.1331E-23  0.1380E-23  0.000       0.000       0.8757E-24  0.9335E-24
  3   75.00   0.000       0.000       0.3375E-22  0.4386E-22  0.000       0.000       0.2817E-22  0.2816E-22
  4   100.0   0.000       0.000       0.4514E-21  0.9178E-21  0.000       0.000       0.4007E-21  0.4350E-21
  5   125.0   0.000       0.000       0.8331E-15  0.6825E-15  0.000       0.000       0.4070E-15  0.3993E-15
  6   150.0   0.000       0.1562E-01  0.7372E-19  0.9395E-19  0.1562E-01  0.1562E-01  0.7390E-19  0.7555E-19
  7   175.0   0.1562E-01  0.1562E-01  0.2189E-19  0.3052E-19  0.1562E-01  0.000       0.2251E-19  0.2182E-19
  8   200.0   0.1562E-01  0.1562E-01  0.2318E-20  0.2453E-20  0.1562E-01  0.1562E-01  0.1688E-20  0.1794E-20
  9   225.0   0.1562E-01  0.1562E-01  0.3924E-21  0.3721E-21  0.3125E-01  0.1562E-01  0.2295E-21  0.2246E-21
 10   250.0   0.3125E-01  0.1562E-01  0.2808E-19  0.3311E-19  0.4688E-01  0.3125E-01  0.1474E-19  0.1403E-19
 11   275.0   0.3125E-01  0.3125E-01  0.3262E-20  0.3096E-20  0.4688E-01  0.4688E-01  0.1694E-20  0.1689E-20
 12   300.0   0.3125E-01  0.4688E-01  0.1200E-19  0.1277E-19  0.6250E-01  0.9375E-01  0.9001E-20  0.8945E-20
 13   325.0   0.4688E-01  0.6250E-01  0.5676E-19  0.7629E-19  0.9375E-01  0.1250      0.5762E-19  0.5736E-19
 14   350.0   0.7812E-01  0.6250E-01  0.1672E-18  0.1691E-18  0.1406      0.1719      0.1120E-18  0.1095E-18
 15   375.0   0.9375E-01  0.7812E-01  0.1625E-17  0.1808E-17  0.1562      0.2188      0.9533E-18  0.9562E-18
 16   400.0   0.1250      0.9375E-01  0.1240E-18  0.1074E-18  0.2031      0.2812      0.8809E-19  0.8789E-19
 17   425.0   0.1562      0.1250      0.8648E-19  0.9579E-19  0.2344      0.3281      0.5594E-19  0.5497E-19
 18   450.0   0.2031      0.1406      0.3484E-17  0.3320E-17  0.2812      0.4062      0.2044E-17  0.2065E-17
 19   475.0   0.2656      0.1719      0.5813E-17  0.5600E-17  0.3281      0.5312      0.3526E-17  0.3493E-17
 20   500.0   0.3438      0.1875      0.6721E-18  0.6349E-18  0.3906      0.6562      0.4566E-18  0.4522E-18
 21   525.0   0.4375      0.2500      0.2468E-16  0.2307E-16  0.4531      0.7969      0.1327E-16  0.1301E-16
 22   550.0   0.5469      0.2812      0.1045E-17  0.9424E-18  0.5469      0.9531      0.7014E-18  0.6929E-18
 23   575.0   0.6406      0.3438      0.2694E-19  0.2545E-19  0.6094      1.125       0.2018E-19  0.1994E-19
 24   600.0   0.7656      0.3750      0.3614E-14  0.3663E-14  0.7188      1.297       0.2943E-14  0.2858E-14


 25   625.0   0.8750      0.4375      0.3936E-16  0.4080E-16  0.7969      1.516       0.2895E-16  0.2929E-16
 26   650.0   1.016       0.4688      0.3966E-16  0.4584E-16  0.9062      1.672       0.3556E-16  0.3567E-16
 27   675.0   1.141       0.5625      0.1106E-16  0.1359E-16  1.000       1.891       0.9189E-17  0.9295E-17

For matrices above order 325 there are increasing speed gains from using the lapack Cholesky inverters. At 675 the gain was on the order of a 50.70% reduction in cost ((1.141-.5625)/1.141) if lapack was used. An uninformed user who went with a general matrix inverter and selected the linpack LU routine would have found that costs went up 1.66 times (1.891/1.141) over the linpack Cholesky solver and 3.363 times (1.891/.5625) over what could be obtained with the lapack Cholesky routine.
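
The built-in error check mentioned above can be illustrated with a small Python/SciPy sketch (not B34S code): a Cholesky factorization doubles as a positive definiteness test, since it fails outright on a matrix that is not positive definite.

import numpy as np
from scipy import linalg

pd_mat = np.array([[4.0, 1.0],
                   [1.0, 3.0]])            # positive definite
not_pd = np.array([[1.0, 2.0],
                   [2.0, 1.0]])            # indefinite (eigenvalues 3 and -1)

for name, m in [("pd_mat", pd_mat), ("not_pd", not_pd)]:
    try:
        c, low = linalg.cho_factor(m)
        inv_m = linalg.cho_solve((c, low), np.eye(2))
        print(name, "is positive definite; Cholesky inverse computed")
    except linalg.LinAlgError:
        print(name, "is not positive definite; fall back to an LU or QR based route")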

A number of researchers have stayed with linpack due to possible accuracy issues involving lapack. These appear to be related to not using the lapack subroutines correctly. For example, the lapack LU factoring routine dgetrf provides a return code INFO that, in their words, is set > 0 if "U(i,i) is exactly zero. The factorization has been completed, but the factor U is exactly singular, and division by zero will occur if it is used to solve a system of equations." Experience tells us that it is dangerous to proceed in near-singular cases that are not trapped by INFO and that a call to dgecon to get the condition number is in order. As an example, consider attempting to invert

x=matrix(3,3:1 2 3 4 5 6 7 8 9);

if the condition is not checked or is ignored. The linpack and lapack rcond values of this matrix are 0.20559686E-17 and 1.541976423090495E-18 respectively, and are not different from zero when tested as

if((rcond+1.0d+00).eq.1.0d+00)write(6,*)'Matrix is near Singular'

in Fortran. Letting the inverse calculation proceed is very dangerous, as is illustrated with the following MATLAB code:

>> x=[1 2 3;4 5 6; 7 8 9]
x =
     1     2     3
     4     5     6
     7     8     9
>> ix=inv(x)
Warning: Matrix is close to singular or badly scaled. Results may be inaccurate. RCOND = 1.541976e-018.
ix =
  -4.5036e+015   9.0072e+015  -4.5036e+015
   9.0072e+015  -1.8014e+016   9.0072e+015
  -4.5036e+015   9.0072e+015  -4.5036e+015
>> ix*x
ans =
     4     0     0
     0     8     0
     4     0     0

The product ix*x is nowhere near the identity matrix. If there is concern over the accuracy of an OLS model, the QR approach can be used.
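
A hedged Python/NumPy illustration of the same guard is sketched below; it estimates the reciprocal condition number of the matrix above and applies the (rcond + 1.0) == 1.0 test before trusting an inverse. The use of an SVD-based condition estimate is an assumption of this sketch, not a statement about what B34S or MATLAB do internally.

import numpy as np

x = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])            # rank 2, numerically singular

rcond = 1.0 / np.linalg.cond(x)            # SVD-based reciprocal condition estimate
if (rcond + 1.0) == 1.0:
    print("Matrix is near singular; an inverse should not be trusted")
else:
    print(np.linalg.inv(x))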

Another approach would be to use real*16 math, which is possible with the matrix command. The command r8tor16 will convert a matrix prior to a call to inv. Possible accuracy improvements can also be obtained in a matrix inversion using refinement, or refinement plus


equalization. These are done using the lapack routines dgesvx/zgesvx at the cost of a reduction in speed. The sample job gminv_4 in matrix.mac addresses accuracy questions.

b34sexec matrix;call echooff;n=100;* test1 and test3 use LAPACK ;x=rn(matrix(n,n:));* to show effect of balancing uncomment next statement; x(1,)=x(1,)*100000.;call gminv(x,xinv1,info);xinv2=inv(x);xinv3=inv(x:gmat);j=inv(x,rcond:gmat);j=inv(x,rcond2);xinv4=inv(x,rcond3 :refine);xinv5=inv(x,rcond4 :refinee);dtest=matrix(n,n:)+1.0;test1=x*xinv1;test2=x*xinv2;test3=x*xinv3;test4=x*xinv4;test5=x*xinv5;if(n.le.5)call print(x ,xinv1 ,xinv2,xinv3 ,test1,test2,test3);call print('Matrix is of order ',n:);call print('LAPACK 3 => refine':);call print('LAPACK 4 => refinee':);call print('Max Error for LAPACK 1', dmax(dabs(dtest-test1)):);call print('Max Error for LAPACK 2', dmax(dabs(dtest-test3)):);call print('Max Error for LAPACK 3', dmax(dabs(dtest-test4)):);call print('Max Error for LAPACK 4', dmax(dabs(dtest-test5)):);call print('Max Error for LINPACK ', dmax(dabs(dtest-test2)):);call print('Sum Error for LAPACK 1', sum(dabs(dtest-test1)):);call print('Sum Error for LAPACK 2', sum(dabs(dtest-test3)):);call print('Sum Error for LAPACK 3', sum(dabs(dtest-test4)):);call print('Sum Error for LAPACK 4', sum(dabs(dtest-test5)):);call print('Sum Error for LINPACK ', sum(dabs(dtest-test2)):);call print('Sumsq Error for LAPACK 1',sumsq(dtest-test1):);call print('Sumsq Error for LAPACK 2',sumsq(dtest-test3):);call print('Sumsq Error for LAPACK 3',sumsq(dtest-test4):);call print('Sumsq Error for LAPACK 4',sumsq(dtest-test5):);

call print('Sumsq Error for LINPACK ',sumsq(dtest-test2):);call print('rcond rcond2 rcond3,rcond4',rcond,rcond2,rcond3,rcond4);

cx=complex(x,dsqrt(dabs(x)));call gminv(cx,cxinv1,info);cxinv2=inv(cx);cxinv3=inv(cx:gmat);cxinv4=inv(cx,rcond3 :refine);cxinv5=inv(cx,rcond4 :refinee);dc=complex(dtest,0.0);test1=cx*cxinv1;test2=cx*cxinv2;test3=cx*cxinv3;test4=cx*cxinv4;test5=cx*cxinv5;j=inv(x,rcond:gmat);j=inv(x,rcond2);if(n.le.5)call print(cx,cxinv1,cxinv2,cxinv3,test1,test2,test3);call print('Matrix is of order ',n:);call print('Max Error for LAPACK 1 real', dmax(dabs(real(dc-test1))):);call print('Max Error for LAPACK 2 real', dmax(dabs(real(dc-test3))):);call print('Max Error for LAPACK 3 real', dmax(dabs(real(dc-test4))):);call print('Max Error for LAPACK 4 real', dmax(dabs(real(dc-test5))):);call print('Max Error for LINPACK real',


dmax(dabs(real(dc-test2))):);call print('Max Error for LAPACK 1 imag', dmax(dabs(imag(dc-test1))):);call print('Max Error for LAPACK 2 imag', dmax(dabs(imag(dc-test3))):);call print('Max Error for LAPACK 3 imag', dmax(dabs(imag(dc-test4))):);call print('Max Error for LAPACK 4 imag', dmax(dabs(imag(dc-test5))):);call print('Max Error for LINPACK imag', dmax(dabs(imag(dc-test2))):);call print('Sum Error for LAPACK 1 real',sum(dabs(real(dc-test1))):);call print('Sum Error for LAPACK 2 real',sum(dabs(real(dc-test3))):);call print('Sum Error for LAPACK 3 real',sum(dabs(real(dc-test4))):);call print('Sum Error for LAPACK 4 real',sum(dabs(real(dc-test5))):);call print('Sum Error for LINPACK real',sum(dabs(real(dc-test2))):);call print('Sum Error for LAPACK 1 imag',sum(dabs(imag(dc-test1))):);call print('Sum Error for LAPACK 2 imag',sum(dabs(imag(dc-test3))):);call print('Sum Error for LAPACK 3 imag',sum(dabs(imag(dc-test4))):);call print('Sum Error for LAPACK 4 imag',sum(dabs(imag(dc-test5))):);call print('Sum Error for LINPACK imag',sum(dabs(imag(dc-test2))):);call print('Sumsq Error for LAPACK 1 real',sumsq(real(dc-test1)):);call print('Sumsq Error for LAPACK 2 real',sumsq(real(dc-test3)):);call print('Sumsq Error for LAPACK 3 real',sumsq(real(dc-test4)):);call print('Sumsq Error for LAPACK 4 real',sumsq(real(dc-test5)):);call print('Sumsq Error for LINPACK real',sumsq(real(dc-test2)):);call print('Sumsq Error for LAPACK 1 imag',sumsq(imag(dc-test1)):);call print('Sumsq Error for LAPACK 2 imag',sumsq(imag(dc-test3)):);call print('Sumsq Error for LAPACK 3 imag',sumsq(imag(dc-test4)):);call print('Sumsq Error for LAPACK 4 imag',sumsq(imag(dc-test5)):);call print('Sumsq Error for LINPACK imag',sumsq(imag(dc-test2)):);call print('rcond rcond2 rcond3,rcond4',rcond,rcond2,rcond3,rcond4);

b34srun;

The test job forms a 100 by 100 matrix of random normal numbers. The job was run with and without multiplying the first row by 100000 to induce possible accuracy problems. The matrices xinv1 and xinv3 were calculated by the lapack LU inverters using the matrix commands gminv and inv respectively. These should run 100% the same and in the output are labeled LAPACK 1 and LAPACK 2. The matrix XINV2 was calculated with the linpack default LU inverter and is shown in the tables as LINPACK, while XINV4 and XINV5 were calculated using refinement and equalization/refinement and are shown as LAPACK 3 and LAPACK 4 respectively. The goal is to see how much of an improvement refinement and equalization/refinement make. The maintained hypothesis is that for a poorly conditioned matrix these added steps should make a difference in accuracy and pay for their added cost.
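
To see what refinement buys in principle, the following minimal Python/NumPy sketch performs one classical refinement step on a computed inverse of a badly scaled matrix. It is only an illustration of the idea; the lapack dgesvx path that B34S actually calls also equilibrates the matrix and handles the residual more carefully.

import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.standard_normal((n, n))
x[0, :] *= 1.0e5                                  # badly scale the first row, as in the test job

xinv = np.linalg.inv(x)                           # initial computed inverse
resid = np.eye(n) - x @ xinv                      # residual of x*xinv against the identity
xinv_refined = xinv + np.linalg.solve(x, resid)   # one refinement step

for label, m in [("plain  ", xinv), ("refined", xinv_refined)]:
    err = np.eye(n) - x @ m
    print(label, "max error", np.abs(err).max(), " sumsq error", np.sum(err * err))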

Real Matrix – Row 1 Adjusted

Matrix Command. Version January 2002.

=> CALL ECHOOFF$

Matrix is of order 100
LAPACK 3 => refine
LAPACK 4 => refinee
Max Error for LAPACK 1  7.334165275096893E-09
Max Error for LAPACK 2  7.334165275096893E-09
Max Error for LAPACK 3  8.585629984736443E-10
Max Error for LAPACK 4  7.275957614183426E-10
Max Error for LINPACK   5.918991519138217E-09
Sum Error for LAPACK 1  2.031745830718839E-07
Sum Error for LAPACK 2  2.031745830718839E-07
Sum Error for LAPACK 3  1.340332846354633E-08
Sum Error for LAPACK 4  1.186949409438610E-08


Sum Error for LINPACK   1.520938054071593E-07
Sumsq Error for LAPACK 1  6.346678588451170E-16
Sumsq Error for LAPACK 2  6.346678588451170E-16
Sumsq Error for LAPACK 3  4.170551230676075E-18
Sumsq Error for LAPACK 4  3.291420393045907E-18
Sumsq Error for LINPACK   3.502669224237807E-16

rcond rcond2 rcond3,rcond4

RCOND = 0.15657795E-07

RCOND2 = 0.32449440E-07

RCOND3 = 0.15657795E-07

RCOND4 = 0.36041204E-04

Complex Case – Row 1 adjusted

Matrix is of order 100
Max Error for LAPACK 1 real  1.089574652723968E-09
Max Error for LAPACK 2 real  1.089574652723968E-09
Max Error for LAPACK 3 real  7.730704965069890E-11
Max Error for LAPACK 4 real  9.436007530894130E-11
Max Error for LINPACK real   8.449205779470503E-10
Max Error for LAPACK 1 imag  1.143234840128571E-09
Max Error for LAPACK 2 imag  1.143234840128571E-09
Max Error for LAPACK 3 imag  1.132320903707296E-10
Max Error for LAPACK 4 imag  1.371063262922689E-10
Max Error for LINPACK imag   9.201812645187601E-10
Sum Error for LAPACK 1 real  2.892584898682689E-08
Sum Error for LAPACK 2 real  2.892584898682689E-08
Sum Error for LAPACK 3 real  2.080174401783049E-09
Sum Error for LAPACK 4 real  2.380964692715299E-09
Sum Error for LINPACK real   1.709247001571822E-08
Sum Error for LAPACK 1 imag  2.400662145467662E-08
Sum Error for LAPACK 2 imag  2.400662145467662E-08
Sum Error for LAPACK 3 imag  2.333172253018985E-09
Sum Error for LAPACK 4 imag  3.145330114537025E-09
Sum Error for LINPACK imag   1.685032463933840E-08
Sumsq Error for LAPACK 1 real  1.305137954525119E-17
Sumsq Error for LAPACK 2 real  1.305137954525119E-17
Sumsq Error for LAPACK 3 real  8.000884036556901E-20
Sumsq Error for LAPACK 4 real  1.005147393906282E-19
Sumsq Error for LINPACK real   4.711921249005595E-18
Sumsq Error for LAPACK 1 imag  9.729510489806058E-18
Sumsq Error for LAPACK 2 imag  9.729510489806058E-18
Sumsq Error for LAPACK 3 imag  1.030133637684152E-19
Sumsq Error for LAPACK 4 imag  2.061903353378799E-19
Sumsq Error for LINPACK imag   5.314085939146504E-18

rcond rcond2 rcond3,rcond4

RCOND = 0.15657795E-07

RCOND2 = 0.32449440E-07

RCOND3 = 0.19119694E-06

RCOND4 = 0.36654208E-03

b34s Matrix Command Ending. Last Command reached.

Space available in allocator 2882597, peak space used 583540 Number variables used 38, peak number used 38 Number temp variables used 226, # user temp clean 0

The next part of the job does not have the first row multiplied by 100000. The first section is for a real*8 matrix.


Matrix Command. Version January 2002.

=> CALL ECHOOFF$

Matrix is of order 100
LAPACK 3 => refine
LAPACK 4 => refinee
Max Error for LAPACK 1  1.365574320288943E-13
Max Error for LAPACK 2  1.365574320288943E-13
Max Error for LAPACK 3  3.086420008457935E-14
Max Error for LAPACK 4  3.086420008457935E-14
Max Error for LINPACK   1.767475055203249E-13
Sum Error for LAPACK 1  1.473775453617650E-10
Sum Error for LAPACK 2  1.473775453617650E-10
Sum Error for LAPACK 3  1.983027804464133E-11
Sum Error for LAPACK 4  1.983027804464133E-11
Sum Error for LINPACK   1.427848022651258E-10
Sumsq Error for LAPACK 1  4.135364007296756E-24
Sumsq Error for LAPACK 2  4.135364007296756E-24
Sumsq Error for LAPACK 3  1.074041726812546E-25
Sumsq Error for LAPACK 4  1.074041726812546E-25
Sumsq Error for LINPACK   3.829022654580863E-24

rcond rcond2 rcond3,rcond4

RCOND = 0.41205738E-04

RCOND2 = 0.84677306E-04

RCOND3 = 0.41205738E-04

RCOND4 = 0.41205738E-04

Complex*16 matrix. Row # 1 not adjusted.

Matrix is of order 100
Max Error for LAPACK 1 real  1.976196983832779E-14
Max Error for LAPACK 2 real  1.976196983832779E-14
Max Error for LAPACK 3 real  3.736594367254042E-15
Max Error for LAPACK 4 real  3.736594367254042E-15
Max Error for LINPACK real   2.207956040223280E-14
Max Error for LAPACK 1 imag  2.116362640691705E-14
Max Error for LAPACK 2 imag  2.116362640691705E-14
Max Error for LAPACK 3 imag  3.580469254416130E-15
Max Error for LAPACK 4 imag  3.580469254416130E-15
Max Error for LINPACK imag   2.059463710679665E-14
Sum Error for LAPACK 1 real  3.078720134733204E-11
Sum Error for LAPACK 2 real  3.078720134733204E-11
Sum Error for LAPACK 3 real  3.024266750114961E-12
Sum Error for LAPACK 4 real  3.024266750114961E-12
Sum Error for LINPACK real   2.831859478359157E-11
Sum Error for LAPACK 1 imag  3.166186637957452E-11
Sum Error for LAPACK 2 imag  3.166186637957452E-11
Sum Error for LAPACK 3 imag  2.939662402390297E-12
Sum Error for LAPACK 4 imag  2.939662402390297E-12
Sum Error for LINPACK imag   2.766293287102470E-11
Sumsq Error for LAPACK 1 real  1.652403344027760E-25
Sumsq Error for LAPACK 2 real  1.652403344027760E-25
Sumsq Error for LAPACK 3 real  1.862322216364066E-27
Sumsq Error for LAPACK 4 real  1.862322216364066E-27
Sumsq Error for LINPACK real   1.447918924517758E-25
Sumsq Error for LAPACK 1 imag  1.770763319588756E-25
Sumsq Error for LAPACK 2 imag  1.770763319588756E-25
Sumsq Error for LAPACK 3 imag  1.727395868198397E-27
Sumsq Error for LAPACK 4 imag  1.727395868198397E-27
Sumsq Error for LINPACK imag   1.354019631664373E-25

rcond rcond2 rcond3,rcond4


RCOND = 0.41205738E-04

RCOND2 = 0.84677306E-04

RCOND3 = 0.34681902E-03

RCOND4 = 0.34681902E-03

B34S Matrix Command Ending. Last Command reached.

Space available in allocator 2882597, peak space used 593517 Number variables used 38, peak number used 39 Number temp variables used 221, # user temp clean 0

In the real case with an adjustment to the first row, the linpack max error and sum of squared error of 5.9189E-09 and 3.503E-16 were slightly less than the comparable LAPACK 1 and LAPACK 2 values of 7.3341E-09 and 6.347E-16 respectively. For the refinement cases (LAPACK 3 and LAPACK 4) the max errors were 8.586E-10 and 7.276E-10 respectively. The sums of squared errors for the refinement and equalization/refinement cases were 4.171E-18 and 3.291E-18 respectively, which should be compared to the lapack and linpack sum of squared values of 6.347E-16 and 3.503E-16, indicating that these adjustments make a real difference.

Looking at the matrix where row 1 was not adjusted, we see a lapack max error of 1.366E-13, a linpack max error of 1.767E-13 and the two refinement cases getting the same 3.086E-14, since equalization of the matrix did not have to be done. Sums of squared errors for lapack, linpack and the refinement cases were 4.135E-24, 3.829E-24 and 1.0740E-25 respectively. Here refinement makes a small but detectable difference, and the accuracy is better than in the first, more difficult case. Users are invited to experiment with this job and try the real*16 add and real*16 multiply adjustments to the blas routines ddot and dsum, which provide a way to increase accuracy without moving the entire calculation to real*16 math.
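
The gain from accumulating in higher precision can be illustrated with the Python sketch below (an analogue of the idea, not the modified blas code itself): the same single precision data are summed with a running single precision accumulator, with double precision accumulation, and with compensated summation as a reference.

import math
import numpy as np

rng = np.random.default_rng(0)
x = (rng.standard_normal(100_000) * 1.0e4).astype(np.float32)

running32 = np.float32(0.0)
for v in x:                                   # sequential single precision accumulation
    running32 += v
acc64 = float(np.sum(x, dtype=np.float64))    # accumulate the same data in double precision
ref = math.fsum(float(v) for v in x)          # compensated summation as a reference

print("float32 running sum :", float(running32))
print("float64 accumulation:", acc64)
print("compensated (fsum)  :", ref)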

Results for the complex*16 case are listed above and follow the same pattern in showing a gain for refinement, especially on the real side using the sum of squared error criterion. For the adjusted case the linpack routines slightly outperform the lapack code on the max error criterion. For example, the max error for the real part of the matrix when inverted by lapack was 1.089574E-9, which is larger than the linpack value of 8.4492057E-10. For the un-adjusted case the pattern reverses, although the values are almost the same: 2.207956E-14 for linpack versus 1.9761968E-14 for lapack. The same pattern is observed for the imaginary part of the matrix.

The natural question to ask concerns the relative cost of refinement and/or equalization, which were shown to improve the inverse calculation, especially for problem matrices. The relative speeds of various inversion strategies are investigated in the test job INVSPEED in matrix.mac, which was run with matrices of order 200 to 600 in increments of 100, with the times recorded. The code run is:

b34sexec matrix;* By setting n to different values we test and compare inverse speed;call echooff;

do n=200,600,100;x=rec(matrix(n,n:));pdx=transpose(x)*x;dd= matrix(n,n:)+1.;cdd=complex(dd,0.0);nn=namelist(math,inv,gmat,smat,pdmat,pdmat2,refine,refinee,);


cpdx=complex(x,mfam(dsqrt(x)));scpdx=transpose(cpdx)*cpdx;cpdx=dconj(transpose(cpdx))*cpdx;if(n.le.5)call print(pdx,cpdx,scpdx,eig(pdx),eig(cpdx),eig(scpdx));call compress;

/; call print('Using LINPACK DGECO/DGRDI - ZGECO/ZGEDI':);call timer(base1);xinv=(1.0/pdx);call timer(base2);/; call print('Inverse using (1.0/pdx) took',(base2-base1):);realm(1)=base2-base1;error1(1)=sumsq((pdx*xinv)-dd);call compress;

call timer(base1);cinv=(complex(1.0,0.)/cpdx);call timer(base2);/; call print('Inverse using (1.0/cpdx) took',(base2-base1):);complexm(1)=base2-base1;error2a(1)=sumsq(real((cpdx*cinv)-cdd));error2b(1)=sumsq(imag((cpdx*cinv)-cdd));call compress;

call timer(base1);xinv=inv(pdx);call timer(base2);/; call print('Inverse using inv(pdx) took',(base2-base1):);realm(2)=base2-base1;error1(2)=sumsq((pdx*xinv)-dd);call compress;

call timer(base1);cinv=inv(cpdx);call timer(base2);/; call print('Inverse using inv(cpdx) took',(base2-base1):);complexm(2)=base2-base1;error2a(2)=sumsq(real((cpdx*cinv)-cdd));error2b(2)=sumsq(imag((cpdx*cinv)-cdd));

call compress;

/; call print('Using LAPACK ':);

call timer(base1);xinv=inv(pdx:GMAT);call timer(base2);/; call print('Inverse using inv(pdx:GMAT) took',(base2-base1):);realm(3)=base2-base1;error1(3)=sumsq((pdx*xinv)-dd);call compress;

call timer(base1);cinv=inv(cpdx:GMAT);call timer(base2);/; call print('Inverse using inv(cpdx:GMAT) took',(base2-base1):);complexm(3)=base2-base1;error2a(3)=sumsq(real((cpdx*cinv)-cdd));error2b(3)=sumsq(imag((cpdx*cinv)-cdd));

call compress;

/; call print('Using LINPACK':);

call timer(base1);xinv=inv(pdx:SMAT);call timer(base2);/; call print('Inverse using inv(pdx:SMAT) took',(base2-base1):);realm(4)=base2-base1;error1(4)=sumsq((pdx*xinv)-dd);call compress;


call timer(base1);cinv=inv(scpdx:SMAT);call timer(base2);/; call print('Inverse using inv(scpdx:SMAT) took',(base2-base1):);complexm(4)=base2-base1;error2a(4)=sumsq(real((scpdx*cinv)-cdd));error2b(4)=sumsq(imag((scpdx*cinv)-cdd));call compress;

/; call print('Using LINPACK':);

call timer(base1);xinv=inv(pdx:PDMAT);call timer(base2);/; call print('Inverse using inv(pdx:PDMAT) took',(base2-base1):);realm(5)=base2-base1;error1(5)=sumsq((pdx*xinv)-dd);call compress;

call timer(base1);cinv=inv(cpdx:PDMAT);call timer(base2);/; call print('Inverse using inv(cpdx:PDMAT) took',(base2-base1):);complexm(5)=base2-base1;error2a(5)=sumsq(real((cpdx*cinv)-cdd));error2b(5)=sumsq(imag((cpdx*cinv)-cdd));/; call compress;

/; call print('Using LAPACK':);call timer(base1);xinv=inv(pdx:PDMAT2);call timer(base2);/; call print('Inverse using inv(pdx:PDMAT2) took',(base2-base1):);realm(6)=base2-base1;error1(6)=sumsq((pdx*xinv)-dd);/; call compress;

call timer(base1);cinv=inv(cpdx:PDMAT2);call timer(base2);/; call print('Inverse using inv(cpdx:PDMAT2) took',(base2-base1):);complexm(6)=base2-base1;error2a(6)=sumsq(real((cpdx*cinv)-cdd));error2b(6)=sumsq(imag((cpdx*cinv)-cdd));

/; call print('Using LAPACK':);

call timer(base1);xinv=inv(pdx:REFINE);call timer(base2);/; call print('Inverse using inv(pdx:REFINE) took',(base2-base1):);realm(7)=base2-base1;error1(7)=sumsq((pdx*xinv)-dd);call compress;

call timer(base1);cinv=inv(cpdx:REFINE);call timer(base2);/; call print('Inverse using inv(cpdx:REFINE) took',(base2-base1):);complexm(7)=base2-base1;error2a(7)=sumsq(real((cpdx*cinv)-cdd));error2b(7)=sumsq(imag((cpdx*cinv)-cdd));

call compress;

/; call print('Using LAPACK':);

call timer(base1);xinv=inv(pdx:REFINEE);call timer(base2);


/; call print('Inverse using inv(pdx:REFINEE) took',(base2-base1):);realm(8)=base2-base1;error1(8)=sumsq((pdx*xinv)-dd);call compress;

call timer(base1);cinv=inv(cpdx:REFINEE);call timer(base2);/; call print('Inverse using inv(cpdx:REFINEE) took',(base2-base1):);complexm(8)=base2-base1;error2a(8)=sumsq(real((cpdx*cinv)-cdd));error2b(8)=sumsq(imag((cpdx*cinv)-cdd));

/; call print('Error2a and error2b = real and imag Complex*16 error':);call print(' ':);call print('Matrix Order',n:);call tabulate(nn,realm,error1,complexm,error2a,error2b);call compress;enddo;

The columns REALM and COMPLEXM refer to the times for real and complex matrices. For real matrices, ERROR1 is the sum of squared deviations of x*inv(x) from the identity matrix. For complex matrices, ERROR2A and ERROR2B refer to the corresponding errors for the real and imaginary parts of the complex X matrix. The general solvers, symmetric matrix solvers and positive definite solvers were all applied to the same positive definite matrix. SMAT and PDMAT refer to the linpack symmetric and Cholesky inverters, while PDMAT2 refers to the lapack Cholesky inverters. The results for the Dell Latitude computer were:

B34S(r) Matrix Command. d/m/y 4/ 7/07. h:m:s 14:12: 6.

=> * BY SETTING N TO DIFFERENT VALUES WE TEST AND COMPARE INVERSE SPEED$

=> CALL ECHOOFF$

Matrix Order 200

Obs  NN       REALM       ERROR1      COMPLEXM    ERROR2A     ERROR2B
  1  MATH     0.4006E-01  0.2575E-17  0.1803      0.2944E-17  0.1246E-17
  2  INV      0.4006E-01  0.2575E-17  0.1702      0.2944E-17  0.1246E-17
  3  GMAT     0.6009E-01  0.3765E-17  0.1702      0.1686E-17  0.1233E-17
  4  SMAT     0.3004E-01  0.6412E-18  0.9013E-01  0.1409E-18  0.9842E-19
  5  PDMAT    0.4006E-01  0.1755E-17  0.8011E-01  0.1063E-17  0.7556E-18
  6  PDMAT2   0.4006E-01  0.9240E-18  0.1001      0.1542E-17  0.1124E-17
  7  REFINE   0.5107      0.5322E-18  1.652       0.4179E-18  0.2918E-18
  8  REFINEE  0.5207      0.5322E-18  1.642       0.4179E-18  0.2918E-18

Matrix Order 300

Obs  NN       REALM       ERROR1      COMPLEXM    ERROR2A     ERROR2B
  1  MATH     0.2103      0.4420E-17  0.9514      0.2733E-16  0.1806E-16
  2  INV      0.2203      0.4420E-17  0.9013      0.2733E-16  0.1806E-16
  3  GMAT     0.2003      0.6948E-17  0.6509      0.2860E-16  0.1147E-16
  4  SMAT     0.9013E-01  0.1317E-17  0.3205      0.1013E-17  0.8467E-18
  5  PDMAT    0.8012E-01  0.4051E-17  0.3405      0.1892E-16  0.1285E-16
  6  PDMAT2   0.1102      0.1367E-16  0.3305      0.9589E-17  0.8387E-17
  7  REFINE   2.444       0.1249E-17  5.858       0.4138E-17  0.4494E-17
  8  REFINEE  2.343       0.1249E-17  5.868       0.4138E-17  0.4494E-17

Matrix Order 400

Obs  NN       REALM       ERROR1      COMPLEXM    ERROR2A     ERROR2B
  1  MATH     0.8412      0.8782E-16  2.614       0.7911E-16  0.7384E-16
  2  INV      0.9313      0.8782E-16  2.534       0.7911E-16  0.7384E-16
  3  GMAT     0.5808      0.1007E-15  1.562       0.1187E-15  0.5403E-16


  4  SMAT     0.2804      0.8933E-17  1.001       0.2147E-17  0.2602E-17
  5  PDMAT    0.2704      0.8864E-16  1.102       0.4047E-16  0.3070E-16
  6  PDMAT2   0.2403      0.5656E-16  0.7511      0.4240E-16  0.2621E-16
  7  REFINE   6.179       0.7450E-17  14.08       0.1410E-16  0.1249E-16
  8  REFINEE  5.999       0.7450E-17  14.41       0.1410E-16  0.1249E-16

Matrix Order 500

Obs  NN       REALM       ERROR1      COMPLEXM    ERROR2A     ERROR2B
  1  MATH     1.933       0.9835E-12  5.939       0.3184E-14  0.2082E-14
  2  INV      1.963       0.9835E-12  5.528       0.3184E-14  0.2082E-14
  3  GMAT     1.182       0.1467E-11  3.315       0.4225E-14  0.4437E-14
  4  SMAT     0.6910      0.1207E-12  2.153       0.2204E-16  0.2372E-16
  5  PDMAT    0.6810      0.3561E-12  2.874       0.4918E-14  0.3120E-14
  6  PDMAT2   0.5308      0.8006E-12  1.602       0.2881E-14  0.3356E-14
  7  REFINE   10.53       0.1149E-12  27.72       0.6522E-15  0.3650E-15
  8  REFINEE  10.85       0.1149E-12  27.45       0.6522E-15  0.3650E-15

Matrix Order 600

Obs  NN       REALM       ERROR1      COMPLEXM    ERROR2A     ERROR2B
  1  MATH     4.016       0.1054E-14  10.76       0.1935E-14  0.1553E-14
  2  INV      4.126       0.1054E-14  10.52       0.1935E-14  0.1553E-14
  3  GMAT     2.143       0.1307E-14  5.688       0.2383E-14  0.1676E-14
  4  SMAT     1.422       0.1689E-15  3.946       0.5304E-16  0.4960E-16
  5  PDMAT    1.913       0.5407E-15  5.718       0.1455E-14  0.1471E-14
  6  PDMAT2   0.9814      0.5722E-15  2.904       0.1774E-14  0.1299E-14
  7  REFINE   25.91       0.1457E-15  48.27       0.2620E-15  0.2328E-15
  8  REFINEE  26.41       0.1457E-15  48.26       0.2620E-15  0.2328E-15

For the Dell Workstation 650 the following was obtained:

B34S(r) Matrix Command. d/m/y 5/ 7/07. h:m:s 11:14: 1.

=> * BY SETTING N TO DIFFERENT VALUES WE TEST AND COMPARE INVERSE SPEED$

=> CALL ECHOOFF$

Matrix Order 200

Obs  NN       REALM       ERROR1      COMPLEXM    ERROR2A     ERROR2B
  1  MATH     0.1562E-01  0.2575E-17  0.7812E-01  0.2944E-17  0.1246E-17
  2  INV      0.1562E-01  0.2575E-17  0.6250E-01  0.2944E-17  0.1246E-17
  3  GMAT     0.3125E-01  0.3765E-17  0.7812E-01  0.1686E-17  0.1233E-17
  4  SMAT     0.1562E-01  0.6412E-18  0.3125E-01  0.1409E-18  0.9842E-19
  5  PDMAT    0.000       0.1755E-17  0.3125E-01  0.1063E-17  0.7556E-18
  6  PDMAT2   0.1562E-01  0.9240E-18  0.3125E-01  0.1542E-17  0.1124E-17
  7  REFINE   0.1875      0.5322E-18  0.5469      0.4179E-18  0.2918E-18
  8  REFINEE  0.1875      0.5322E-18  0.5312      0.4179E-18  0.2918E-18

Matrix Order 300

Obs  NN       REALM       ERROR1      COMPLEXM    ERROR2A     ERROR2B
  1  MATH     0.9375E-01  0.4420E-17  0.2656      0.2733E-16  0.1806E-16
  2  INV      0.9375E-01  0.4420E-17  0.2812      0.2733E-16  0.1806E-16
  3  GMAT     0.7812E-01  0.6948E-17  0.1875      0.2860E-16  0.1147E-16
  4  SMAT     0.3125E-01  0.1317E-17  0.1250      0.1013E-17  0.8467E-18
  5  PDMAT    0.4688E-01  0.4051E-17  0.1250      0.1892E-16  0.1285E-16
  6  PDMAT2   0.3125E-01  0.1367E-16  0.1094      0.9589E-17  0.8387E-17
  7  REFINE   0.7969      0.1249E-17  1.875       0.4138E-17  0.4494E-17
  8  REFINEE  0.7969      0.1249E-17  1.859       0.4138E-17  0.4494E-17

Matrix Order 400

Obs  NN       REALM       ERROR1      COMPLEXM    ERROR2A     ERROR2B
  1  MATH     0.2656      0.8782E-16  0.7344      0.7911E-16  0.7384E-16
  2  INV      0.2344      0.8782E-16  0.7031      0.7911E-16  0.7384E-16


  3  GMAT     0.1719      0.1007E-15  0.4844      0.1187E-15  0.5403E-16
  4  SMAT     0.1094      0.8933E-17  0.3125      0.2147E-17  0.2602E-17
  5  PDMAT    0.1250      0.8864E-16  0.4219      0.4047E-16  0.3070E-16
  6  PDMAT2   0.7812E-01  0.5656E-16  0.2656      0.4240E-16  0.2621E-16
  7  REFINE   2.047       0.7450E-17  4.375       0.1410E-16  0.1249E-16
  8  REFINEE  2.016       0.7450E-17  4.359       0.1410E-16  0.1249E-16

Matrix Order 500

Obs  NN       REALM       ERROR1      COMPLEXM    ERROR2A     ERROR2B
  1  MATH     0.5625      0.9835E-12  1.484       0.3184E-14  0.2082E-14
  2  INV      0.5781      0.9835E-12  1.438       0.3184E-14  0.2082E-14
  3  GMAT     0.3750      0.1467E-11  0.9531      0.4225E-14  0.4437E-14
  4  SMAT     0.2656      0.1207E-12  0.6406      0.2204E-16  0.2372E-16
  5  PDMAT    0.3281      0.3561E-12  0.8438      0.4918E-14  0.3120E-14
  6  PDMAT2   0.2344      0.8006E-12  0.5781      0.2881E-14  0.3356E-14
  7  REFINE   3.484       0.1149E-12  8.562       0.6522E-15  0.3650E-15
  8  REFINEE  3.531       0.1149E-12  8.406       0.6522E-15  0.3650E-15

Matrix Order 600

Obs  NN       REALM       ERROR1      COMPLEXM    ERROR2A     ERROR2B
  1  MATH     1.141       0.1054E-14  2.547       0.1935E-14  0.1553E-14
  2  INV      1.109       0.1054E-14  2.516       0.1935E-14  0.1553E-14
  3  GMAT     0.6719      0.1307E-14  1.688       0.2383E-14  0.1676E-14
  4  SMAT     0.4844      0.1689E-15  1.109       0.5304E-16  0.4960E-16
  5  PDMAT    0.6875      0.5407E-15  1.500       0.1455E-14  0.1471E-14
  6  PDMAT2   0.3906      0.5722E-15  0.9531      0.1774E-14  0.1299E-14
  7  REFINE   8.453       0.1457E-15  14.56       0.2620E-15  0.2328E-15
  8  REFINEE  8.516       0.1457E-15  15.14       0.2620E-15  0.2328E-15

MATH refers to using the form invx=1./x; while INV uses invx=inv(x);. Since both call the same linpack routines, they should and do run the same. GMAT uses lapack, while SMAT and PDMAT use the linpack routines for symmetric (DSICO, DSIFA and DSIDI) and positive definite (DPOCO, DPOFA and DPODI) matrices respectively. The errors are the sums of squared errors, where ERROR2A and ERROR2B refer to the real and imaginary parts of the complex matrix. Here, as expected, the refine and refinee options are superior, although at great cost, as shown in Table 16.1.


Table 16.1 Relative Cost of Equalization & Refinement of a General Matrix

Computer     Dell Latitude                         Dell 650 Workstation
             Real Matrix                           Real Matrix
Order   LU        Equal_Refine   Cost        LU        Equal_Refine   Cost
200     0.06009   0.5207          7.665      0.03125   0.1875          5.000
300     0.203     2.343          10.542      0.07812   0.7969          9.201
400     0.5808    5.999           9.329      0.1719    2.016          10.728
500     1.182     10.85           8.179      0.375     3.531           8.416
600     2.143     26.41          11.324      0.6719    8.516          11.675
Mean                              9.408                                9.004
             Complex Matrix                        Complex Matrix
200     0.1702    1.642           8.647      0.1702    0.642           2.772
300     0.6509    5.868           8.015      0.6509    5.868           8.015
400     1.562     14.41           8.225      1.562     14.41           8.225
500     3.315     27.45           7.281      3.315     27.45           7.281
600     5.688     48.26           7.485      5.688     48.26           7.485
Mean                              7.931                                6.756

The cost of equalization and refinement is around 9 times more for real matrices and 7 to 8 times more for complex matrices. The calculations in Table 16.1 were made with positive definite matrices since in many econometric applications the matrix involved is positive definite. While the Dell Latitude was substantially slower, the relative cost of equalization and refinement was relatively stable across machines. For matrices of order 300 or larger on the Dell workstation, lapack was faster. For the Dell Latitude the crossover point was at matrices of size 400 or larger. For the Cholesky inverters, at order 600 the gain from lapack over linpack was around 2 (1.949 = 1.913/.9814) for the Dell Latitude and 1.8 (1.76 = .6875/.3906) for the Dell Workstation.

The above tests were performed using the lapack default blocksize as calculated by the routine ilaenv. The job listed below (LAPACK_2 in matrix.mac) investigates the gains from alternative blocksizes using a Dell 650 running a 3.04 GHz chip and a Dell Latitude running a 1.0 GHz chip.

/$ Blocksize tests

b34sexec matrix;call echooff;

isize=12;Mat_ord =array(isize:);linpack =array(isize:);lapack1 =array(isize:);lapack4 =array(isize:);lapack7 =array(isize:);lapack10 =array(isize:);lapack13 =array(isize:);lapack16 =array(isize:);lapack19 =array(isize:);lapackd =array(isize:);

j=0;do i=1,19,3;


n=64;

top continue;j=j+1;if(n.gt.768)go to endit;

/; call print('Order of Matrix ',n:);mat_ord(j)=n;x=rec(matrix(n,n:));

/; set blocksize for lapack

/; LINPACK need only to be run one time

call lapack(1,i);

if(i.eq.1)then;call timer(t1);xx=inv(x);call timer(t2);/; call print('LINPACK time ',t2-t1:);linpack(j)=t2-t1;call compress;endif;

call timer(t1);xx=inv(x:gmat);call timer(t2);/; call print('LAPACK time ',t2-t1:);if(i.eq.1)lapack1(j)=t2-t1;if(i.eq.4)lapack4(j)=t2-t1;if(i.eq.7)lapack7(j)=t2-t1;if(i.eq.10)lapack10(j)=t2-t1;if(i.eq.13)lapack13(j)=t2-t1;if(i.eq.16)lapack16(j)=t2-t1;if(i.eq.19)lapack19(j)=t2-t1;call compress;

if(i.eq.1)then;call lapack(:reset);call timer(t1);xx=inv(x:gmat);call timer(t2);/; call print('LAPACK Defaults ',t2-t1:);lapackd(j)=t2-t1;call compress;endif;

n=n+64;go to top;

endit continue;j=0;

enddo;

call print(' ':);call print('Effects on Relative Speed of LAPACK blocksize':);call tabulate(mat_ord,linpack,lapack1,lapack4,lapack7, lapack10,lapack13,lapack16,lapack19, lapackd);

b34srun;

Matrices of order 64, 128, ..., 768 were generated using the rectangular IMSL random number generator. Since these numbers are in the range 0.0 to 1.0, the large values possible from a random normal generator are not observed and there is less likelihood of any one matrix being a problem to invert. The job was run with a 12,000,000 real*8 workspace. The variables


LAPACKi refers to a lapack inversion where the blocksize was set to i. LAPACKD uses the default recommended blocksize as calculated by the lapack routine ilaenv. Edited output from this job is listed next for both the Dell Latitude and the Dell Workstation to control for possible chip-related differences in addition to chip speed.

B34S(r) Matrix Command. d/m/y 4/ 7/07. h:m:s 16:17:50.

Run with Dell Latitude

=> CALL ECHOOFF$

Effects on Relative Speed of LAPACK blocksize

Obs  MAT_ORD  LINPACK     LAPACK1     LAPACK4     LAPACK7     LAPACK10    LAPACK13    LAPACK16    LAPACK19    LAPACKD
  1    64     0.000       0.000       0.000       0.1002E-01  0.000       0.1001E-01  0.000       0.000       0.000
  2   128     0.2003E-01  0.2003E-01  0.2003E-01  0.2003E-01  0.2003E-01  0.2003E-01  0.2003E-01  0.2002E-01  0.1001E-01
  3   192     0.3004E-01  0.5007E-01  0.5007E-01  0.6009E-01  0.5007E-01  0.6009E-01  0.6009E-01  0.6009E-01  0.5007E-01
  4   256     0.9013E-01  0.1302      0.1202      0.1402      0.1402      0.1302      0.1302      0.1302      0.1402
  5   320     0.3205      0.3505      0.3104      0.2904      0.3004      0.3004      0.3104      0.2904      0.2804
  6   384     0.7110      0.8212      0.6209      0.6009      0.5708      0.5708      0.6209      0.5508      0.5107
  7   448     1.312       1.472       1.062       0.9914      0.9614      0.9614      0.9714      0.9113      0.8813
  8   512     2.854       2.604       1.722       1.612       1.552       1.482       1.622       1.472       1.392
  9   576     3.435       3.725       2.363       2.223       2.313       2.163       2.273       2.063       2.063
 10   640     5.147       4.977       3.355       3.104       3.205       2.914       2.874       2.914       2.664
 11   704     7.160       6.710       4.737       4.116       4.006       3.936       3.805       3.805       3.605
 12   768     10.30       8.883       5.838       5.348       5.217       5.067       5.047       5.207       4.677

Run on Dell Precision Workstation 650

B34S(r) Matrix Command. d/m/y 5/ 7/07. h:m:s 11:30:52.

=> CALL ECHOOFF$

Effects on Relative Speed of LAPACK blocksize

Obs  MAT_ORD  LINPACK     LAPACK1     LAPACK4     LAPACK7     LAPACK10    LAPACK13    LAPACK16    LAPACK19    LAPACKD
  1    64     0.000       0.000       0.1562E-01  0.000       0.000       0.000       0.1562E-01  0.000       0.000
  2   128     0.1562E-01  0.000       0.000       0.1562E-01  0.000       0.1562E-01  0.000       0.1562E-01  0.1562E-01
  3   192     0.1562E-01  0.1562E-01  0.1562E-01  0.1562E-01  0.3125E-01  0.1562E-01  0.1562E-01  0.1562E-01  0.3125E-01
  4   256     0.3125E-01  0.4688E-01  0.3125E-01  0.4688E-01  0.4688E-01  0.4688E-01  0.4688E-01  0.4688E-01  0.4688E-01
  5   320     0.1094      0.1250      0.1094      0.1094      0.1094      0.9375E-01  0.1094      0.9375E-01  0.9375E-01
  6   384     0.2344      0.2656      0.2188      0.2031      0.1875      0.2031      0.1875      0.1875      0.1875
  7   448     0.3750      0.4375      0.3281      0.3281      0.3125      0.2969      0.2969      0.3125      0.2812
  8   512     0.7500      0.7344      0.5469      0.5156      0.5000      0.5156      0.5000      0.4844      0.4531
  9   576     0.9844      0.9844      0.7500      0.7031      0.7031      0.7031      0.6719      0.6562      0.6250
 10   640     1.422       1.375       1.031       0.9688      0.9688      0.9219      0.9375      0.9062      0.8906
 11   704     1.891       1.828       1.359       1.297       1.266       1.234       1.250       1.219       1.156
 12   768     2.516       2.328       1.781       1.656       1.656       1.594       1.609       1.578       1.516

The findings indicate that lapack with the default blocksize is faster than linpack for matrices of size greater than 320. With blocksize=1, lapack does not beat linpack until size 512, when the times were .7344 vs .7500 and 2.604 vs 2.854 on the workstation and the Latitude respectively. On the workstation at size 768, linpack was running at 2.516, while lapack was running at 2.328 with blocksize=1 or 1.516 with the default blocksize. For a general matrix the lapack defaults suggest a blocksize of 64, with a minimum blocksize of 2; the crossover point is set at 128. The above data suggest that the linpack/lapack crossover for the workstation using the default blocksize is between 256 and 320. Note that at 256 the linpack time of 3.125E-2 beat the lapack default of 4.6875E-2, while at 320 the comparison tipped in lapack's favor at .09375 versus the linpack .1094. To investigate whether a blocksize greater than 1 but far less than 64 would be of benefit, in the above job we set i = 4, 7, 10, 13, 16 and 19 and repeated the calculations. For a matrix of size 768 and i=4 the lapack time fell to 1.781 from 2.328. For i=7 the time was marginally better, falling to 1.656, which is close to the default setting of 1.516. The above experiment outlines the gains from blocksize adjustment and also suggests that even if the blocksize is increased only to a modest 4, thereby saving workspace, there are still substantial gains. For matrices in the usual range (under order 256) the space saving linpack code will


always beat lapack. These findings were not altered when the test job was run on the Dell Latitude computer and suggest that, although for many applications the linpack code is a better choice, the B34S user is given the capability to modify this choice. Readers are invited to rerun these examples on their own systems since the results may be chip-sensitive. The brief speed and accuracy tests reported above highlight the fact that selecting the right inverter can make a substantial difference. For most users running lapack it is best to use the default settings, although the usual "one size fits all" approach may carry with it substantial hidden costs.
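
Readers who want to repeat this kind of chip-sensitive timing outside B34S can adapt a small harness such as the following Python/SciPy sketch; the size grid mirrors the job above, and the absolute numbers will of course differ by machine and library build.

import time
import numpy as np
from scipy import linalg

rng = np.random.default_rng(0)
for n in range(64, 769, 64):
    x = rng.random((n, n))           # uniform (0,1) entries, as in the B34S job
    t0 = time.perf_counter()
    linalg.inv(x)                    # LAPACK-based LU inverse
    t1 = time.perf_counter()
    print(f"order {n:4d}   inverse time {t1 - t0:8.4f} seconds")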

The discussion of refinement capability carries the hidden assumption that the calculation remains in real*8. Another assumption is that the blas code is not modified to increase accuracy. In the next section these assumptions are relaxed: calculations are made in real*16 and VPA, and when real*8 routines are used, the blas routines are enhanced to give more accuracy.

16.7 Variable Precision Math

Many software systems allow real*4 data storage but move the data to real*8 to make a calculation. In many cases the resulting accuracy is not the same as what would be obtained with a direct read into real*8. The B34S matrix command supports real*4, real*8, real*16, complex*16, complex*32, integer*4 and integer*8 data types to facilitate research into the impact of data storage accuracy on the calculation. In addition, the variable precision subroutine library developed by Smith (1991) has been implemented to give accuracy to better than 1,700 digits. The use of this code is discussed next. It is not enough just to calculate in real*8; the precision with which the data were initially read makes a major difference, even in simple problems. A simple example from Stokes (2005) involving 2.00 / 4.11 will illustrate the problems of precision.

str=>vpa   .4866180048661800486618004866180048661800486618004866M+0
r*8=>vpa   .48661800486618001080454992664677M+0
r*8=>r*16  .48661800486618001080454992664677E+00
str=>r*16  .48661800486618004866180048661800E+00
r*8=>r*8   .48661800486618000000000000000000E+00
r*4=>r*4   .48661798200000000000000000000000E+00

The line str=>vpa lists the exact answer obtained when the data (2.0 and 4.11) are read from a string into a variable precision arithmetic (VPA) routine, while the line r*8=>vpa shows what happens to accuracy when the data are first read into real*8 or double precision, then moved to a vpa datatype. The line r*8=>r*16 shows what occurs when the data are first read into real*8, then converted to real*16 before making the calculation. In this case the results are the same as what is obtained with r*8=>vpa but are inferior to the line str=>r*16, where the data are read directly into real*16. The lines r*8=>r*8 and r*4=>r*4 show what can be expected using the usual double precision and single precision math, respectively. The importance of this simple example is that it can be used to disentangle the effect of data storage precision and data calculation precision in a very simple problem where each can be isolated. When there are many calculations needed to solve a problem (to invert a 100 by 100 matrix by elimination involves a third of a million operations), round-off error can mount, especially when numbers differ in size. Strang (1976, page 32) notes "if floating-point numbers are added, and their exponents c differ say by two, then the last two digits in the smaller number will be more or less lost..." Real*4 or


single precision on IEEE machines has a range of 1.18*10^-38 to 3.40*10^38. This gives a precision of 7-8 digits at best. Real*8 or double precision has a range of 2.23*10^-308 to 1.79*10^308 and at best gives a precision of 15-16 digits. Real*16 has a range of 10^-4931 to 10^4932 and gives up to 32 digits of precision. VPA or variable precision arithmetic allows calculations to be carried out at a user-selected precision.
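
The 2.00 / 4.11 example can be mimicked in Python with the decimal module, as in the hedged sketch below; Decimal stands in for the VPA library and is not the code B34S uses, but it shows the same storage precision effect, since Decimal("4.11") is exact while Decimal(4.11) inherits the binary real*8 representation.

from decimal import Decimal, getcontext
import numpy as np

getcontext().prec = 34                                 # roughly real*16-level decimal digits

str_to_high = Decimal("2.00") / Decimal("4.11")        # data read from strings, then divided
r8_to_high  = Decimal(2.00) / Decimal(4.11)            # data first stored as binary real*8
r8_only     = 2.00 / 4.11                              # ordinary double precision division
r4_only     = np.float32(2.00) / np.float32(4.11)      # single precision division

print("str  -> high precision:", str_to_high)
print("r*8  -> high precision:", r8_to_high)
print("r*8  -> r*8           :", repr(r8_only))
print("r*4  -> r*4           :", r4_only)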

To measure the effects of data precision and calculation method on accuracy requires a number of different test data sets. The first problem attempted was the StRD (Rogers-Filliben-Gill-Guthrie-Lagergren-Vangel 1998) Filippelli data set, which contains 82 observations on a

polynomial model of the form y = β0 + β1x + β2x^2 + ... + β10x^10 + ε, where x ranges from -3.13200249 to -8.781464495 and x^10 ranges from 90,828.258 to 2,726,901,792.451598. Answers to 15 digits are supplied by StRD. Table 16.2 reports 15 experiments involving various ways to estimate the model. The linpack Cholesky routines and general matrix routines detect rank problems and will not solve the problem if the data are not converted to real*16. The QR approach obtains an average LRE13 of 7.306, 7.415 and 8.368 on the coefficients, SE and residual sum of squares. The exact numbers obtained are listed in Table 16.3. If the suggested accuracy improvements for the BLAS routines are enabled, these LRE numbers jump to 8.118, 8.098 and 9.803, respectively. Note that both accuracy improvements result in the same gain. Experiments # 4 and # 5 first copy the data, which have been read into real*8, into a real*16 variable and attempt estimation with a Cholesky and a QR approach. The LREs are the same for both approaches (7.925, 8.708, 8.167). This experiment shows the effect of calculation precision and at first would lead one to believe that there is little gain from real*16 calculation, except for the fact that the Cholesky condition is not seen as 0.0. However, this interpretation would be premature without checking for data precision effects (i.e., at what precision the data were initially read), which we do below.
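
The reason the estimation method matters so much for this data set can be seen in a small Python/NumPy sketch using a synthetic 10th-order polynomial design over a similar x range (synthetic data, not the actual Filippelli values): solving the normal equations squares the condition number, while a QR solve works with the design matrix directly.

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-8.78, -3.13, 82)                  # x range similar to the Filippelli data
beta_true = rng.standard_normal(11)
X = np.vander(x, 11, increasing=True)              # columns 1, x, x**2, ..., x**10
y = X @ beta_true                                  # exact synthetic response, no noise

print("condition number of X:", np.linalg.cond(X))

beta_ne = np.linalg.solve(X.T @ X, X.T @ y)        # normal equations: squares the condition number
q, r = np.linalg.qr(X)
beta_qr = np.linalg.solve(r, q.T @ y)              # QR approach: works with X directly

print("max coefficient error, normal equations:", np.abs(beta_ne - beta_true).max())
print("max coefficient error, QR              :", np.abs(beta_qr - beta_true).max())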

Experiments 6-12 test various combinations of calculation precision and routine selection. In experiment # 6 we use the linpack SVD routines on real*8 data. The results are poor (LRE numbers of 2.195, 2.132 and 4.039).14 When the accuracy improvements are enabled (experiments 7 and 8), there is a slight loss of accuracy on the coefficients to 1.901 but a slight gain on the SE to 2.431. However, when the real*8 data are copied to real*16 in experiment 9, the SVD LRE numbers jump to 7.924, 8.708 and 8.167, respectively, which are similar to the results for the real*16 Cholesky experiment 4 and the real*16 QR experiment 5 and show clearly the effect of calculation precision conditional on the data having first been read into real*8 before being moved to real*16.

Experiments 10-12 study the effect of using lapack's SVD routine in place of linpack. For experiment 10, the coefficient LRE jumps to 7.490, which is quite good and in fact beats the QR LRE reported for experiment 1. This value is far better than the linpack LRE of 2.195.15

13 LRE is the Log Relative Error as discussed by McCullough (1999). Assume x is the value obtained and c is the correct value. Then LRE = -log10( |x - c| / |c| ).

14 The author has used the linpack code since 1979. These results were not expected and seem to be related to the extreme values in the X matrix in the Filippelli data. When real*16 is used, the accuracy of the linpack SVD routine improves.

15 McCullough (1999, 2000) used lapack QR and SVD routines to estimate the coefficients of the Filippelli data, finding that "QR generally returns more accurate digits than SVD." The LRE values found were 7.4 and 6.3


However, the LRE of the SE is poor at 1.910, which is less than that found with linpack of 2.132. The LRE of the residual sum of squares of 1.606 is also less than the linpack LRE of 3.258. Since the SE requires knowledge of (X'X)^-1, the SE of the ith coefficient being calculated as sqrt(s^2 [(X'X)^-1]_ii) with s^2 = e'e/(n-k), extreme values along the diagonal of (X'X)^-1 may be causing errors when forming the SE. However, this possibility does not explain the poor performance of the residual sum of squares LRE of 1.606.16 The reason may be related to the fact that the data set has such high values that minor coefficient differences will result in substantial changes in the relative residual sum of squares. Experiments 13-15 first load the data in real*16 and proceed to the same routines as used for experiments 4-6. Here we see LRE numbers of 14.68, 14.99 and 15.00 for the Cholesky experiment and 14.79, 14.96 and 15.00 for the QR experiment, which is the same as for the SVD calculated with linpack. These are close to perfect answers. Table 16.3 lists the coefficients obtained for experiment 1, which used real*8 data, while Table 16.4 lists the exact coefficients obtained for the QR using data read directly into real*16. Experiments 13-15 show the gain from reading the Filippelli data set in real*16. Since all three experiments produced similar LRE values, this suggests that if the data are read with enough precision, the results are less sensitive to the estimation method. This finding has important implications for database design. The next task is to study less extreme (stiff) data sets and observe the results.
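
For readers who wish to reproduce the LRE numbers in these tables, a small Python helper following the McCullough definition cited in footnote 13 is sketched below; capping the result at 15 digits is an assumption matching the precision of the certified values.

import math

def lre(x, c, max_digits=15.0):
    """Log relative error of an estimate x against a certified value c."""
    if x == c:
        return max_digits
    if c != 0.0:
        value = -math.log10(abs(x - c) / abs(c))
    else:
        value = -math.log10(abs(x))        # fall back to log absolute error when c is zero
    return max(0.0, min(value, max_digits))

# Example: the first Filippelli coefficient from Table 16.3
print(round(lre(-2772.179723094652, -2772.179591933420), 2))   # prints 7.33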

16 The sum of squares was tested against the published value of 0.795851382172941E-03. The lapack SVD routine obtained 0.8155689538070673E-03.


Table 16.2 LRE for Various Approaches to an OLS Model of the Filippelli Data
_____________________________________________________________
Various options of real*8 data

Experiment   TYPE        COEF    SE      RSS_LE
     1       QR          7.306   7.415   8.368
     2       ACC_1       8.118   8.098   9.803
     3       ACC_2       8.118   8.098   9.803
     4       R16_CHOL    7.924   8.708   8.167
     5       R16_QR      7.924   8.708   8.167
     6       SVD         2.195   2.132   4.039
     7       SVD_ACC1    1.901   2.431   3.258
     8       SVD_ACC2    1.901   2.431   3.258
     9       SVD_R16     7.924   8.708   8.167
    10       SVD_LAPK    7.490   1.910   1.606
    11       SVD2ACC1    7.490   1.910   1.606
    12       SVD2ACC2    7.490   1.910   1.606

Various options using data read directly into real*16

    13       R16_CHOL   14.68   14.99   15.00
    14       R16_QR     14.79   14.96   15.00
    15       R16_SVD    14.79   14.96   15.00
___________________________________________________________
Experiments 4, 5 and 9 involve reading data first into real*8 and then converting the data to real*16. Experiments 1-3, 6-8 and 10-12 involve real*8 data. Experiments 13-15 use data read directly into real*16. Experiments 2, 7 and 11 enhance BLAS routines by accumulating real*8 data using the IMSL routines DQADD and DQMULT. Experiments 3, 8 and 12 accumulate real*8 BLAS calculations using real*16 data as outlined in Stokes (2005). See Chapter 10 for a detailed discussion of the methods used. The coefficients obtained for experiments # 1 and # 14 are listed in Tables 16.3 and 16.4.
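
The broad pattern in Table 16.2, in which orthogonalization methods hold onto more digits than the normal equations as the design matrix becomes stiff, can be reproduced in a few lines of numpy. This is only an illustrative sketch on a randomly generated polynomial design (not the StRD Filippelli data), and it uses numpy's QR and solvers rather than the linpack/lapack routines discussed in the text.

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-9.0, -3.0, size=82)      # x values in roughly the Filippelli range
X = np.vander(x, 7, increasing=True)      # 6th order polynomial design, milder than Filippelli's 10th
beta_true = np.ones(X.shape[1])
y = X @ beta_true                         # exact linear relation, so beta_true is the "certified" answer

# Normal equations versus QR
XtX = X.T @ X
b_ne = np.linalg.solve(XtX, X.T @ y)      # solve X'X b = X'y
Q, R = np.linalg.qr(X)                    # thin QR of X
b_qr = np.linalg.solve(R, Q.T @ y)        # solve R b = Q'y

def worst_lre(b):
    # smallest number of agreeing digits across the coefficients (negative means none)
    return float(-np.log10(np.max(np.abs(b - beta_true))))

print("condition of X   :", np.linalg.cond(X))
print("condition of X'X :", np.linalg.cond(XtX))
print("worst LRE, normal equations:", worst_lre(b_ne))
print("worst LRE, QR              :", worst_lre(b_qr))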


Table 16.3 Coefficients and SE Estimated Using QR Models of the Real*8 Filippelli Data
__________________________________________________________________
            Test Value                  Value Obtained              LRE
Coef  1   -2772.179591933420          -2772.179723094652            7.33
Coef  2   -2316.371081608930          -2316.371192269638            7.32
Coef  3   -1127.973940983720          -1127.973995395338            7.32
Coef  4   -354.4782337033490          -354.4782509735776            7.31
Coef  5   -75.12420173937571          -75.12420543777237            7.31
Coef  6   -10.87531803553430          -10.87531857690271            7.30
Coef  7   -1.062214985889470          -1.062215039398714            7.30
Coef  8   -0.6701911545934081E-01     -0.6701911887876555E-01       7.29
Coef  9   -0.2467810782754790E-02     -0.2467810910390330E-02       7.29
Coef 10   -0.4029625250804040E-04     -0.4029625462234867E-04       7.28
Coef 11   -1467.489614229800          -1467.489683023960            7.33

Mean LRE      7.306448565286121
Variance LRE  2.587670394878226E-04
Minimum LRE   7.280096349919187
Maximum LRE   7.329023461850447

SE  1      559.7798654749500           559.7798867059487            7.42
SE  2      466.4775721277960           466.4775900975754            7.41
SE  3      227.2042744777510           227.2042833290517            7.41
SE  4      71.64786608759270           71.64786889794284            7.41
SE  5      15.28971787474000           15.28971847592676            7.41
SE  6      2.236911598160330           2.236911685945726            7.41
SE  7      0.2216243219342270          0.2216243305780890           7.41
SE  8      0.1423637631547240E-01      0.1423637686503493E-01       7.41
SE  9      0.5356174088898210E-03      0.5356174292732132E-03       7.42
SE 10      0.8966328373738681E-05      0.8966328708850490E-05       7.43
SE 11      298.0845309955370           298.0845420801842            7.43

Mean LRE      7.414701487211084
Variance LRE  7.386168559949404E-05
Minimum LRE   7.405390067654106
Maximum LRE   7.429617565744895

Residual sum of squares:
RSS        0.7958513821729410E-03      0.7958513787598208E-03       8.37
________________________________________________________________
Test values are reported on the left-hand side. LRE = log relative error. The coefficients are for experiment # 1 from Table 16.2. The same linpack QR routine was modified to run with real*16 data; results for that experiment are shown in Table 16.4.


Table 16.4 Coefficients estimated with QR using Real*16 Filippelli Data

Coefficients Using QR on Data Loaded into Real*16

                                                         LRE
 1.  -2772.1795919334239280284475535721                 14.85
 2.  -2316.3710816089307588219679140978                 15.00
 3.  -1127.9739409837156985716700141998                 14.42
 4.  -354.47823370334877161073848496470                 15.00
 5.  -75.124201739375713890522075522684                 15.00
 6.  -10.875318035534251085281081177145                 14.35
 7.  -1.0622149858894676645966112202356                 14.66
 8.  -0.67019115459340837592673412281191E-01            15.00
 9.  -0.24678107827547865084085445245647E-02            14.85
10.  -0.40296252508040367129713154870917E-04            15.00
11.  -1467.4896142297958822878485135961                 14.55

Mean LRE     14.788490320266543980835382276091684
Variance LRE  6.3569618908829012635712782954099325E-0002
Minimum LRE  14.347002403969724322813759016211991
Maximum LRE  15.000000000000000000000000000000000

SE Using QR on Data Loaded into Real*16
                                                         LRE
 1.   559.77986547494987457477254797527                 15.00
 2.   466.47757212779645269310982974610                 15.00
 3.   227.20427447775131062939817526228                 14.86
 4.   71.647866087592737261665720850718                 15.00
 5.   15.289717874740006503075678978592                 15.00
 6.   2.2369115981603327555186234039771                 14.91
 7.   0.22162432193422740206612983379340                14.74
 8.   0.14236376315472394891823309147959E-01            15.00
 9.   0.53561740888982093625865193118466E-03            15.00
10.   0.89663283737386822210041526987951E-05            15.00
11.   298.08453099553698520055234224439                 15.00

Mean LRE     14.955903576675283545444986642213045
Variance LRE  7.2174779096858864768608287934814669E-0003
Minimum LRE  14.741319930323011772906976043000329
Maximum LRE  15.000000000000000000000000000000000

Residual sum of squares
      0.79585138217294058848463068814293E-03            15.00

LRE = log relative error. This is experiment # 14 from Table 16.2 but uses the linpack QR routine modified by Stokes (2005) to run with real*16 data. For this experiment the data was read directly into real*16.

The Box-Jenkins (1976) Gas Furnace data have been widely studied and modeled and are close in difficulty to what is found in many applied time series models. While "correct" 15-digit agreed-upon answers are not available, it is possible to study the effect on the residual sum of squares of the 11 approaches reported in Table 16.5.17 Since OLS minimizes the sum of squared errors, a "better" answer is one with a smaller residual sum of squares. Using this criterion, the linpack general matrix solver DGECO, Experiment 3, is "best," followed closely by the lapack general matrix solver, Experiment 4, and the linpack SVD routine, Experiment 10. Experiments 5 and 6 use the lapack general matrix solver that allows refinement and, in the case of Experiment 6, refinement plus equilibration. These approaches did not do as well in determining a minimum and were substantially more expensive in terms of computer time. Of interest is why Experiment 1 and Experiment 8 did not produce the same answer, since they both used the linpack Cholesky routines. The answer relates to the way the coefficients are calculated. In the former case the Cholesky factor R is used to obtain the coefficients without explicitly forming inv(X'X), using the linpack routine DPOSL, while in the latter case inv(X'X) is formed from R using DPODI. In general, the answers are very close for this exercise.

17 Since this data set does not have the rank problems found with the Filippelli data, it is possible to attempt a number of alternative procedures. Not all of these procedures should be used.

Table 16.5 Residual Sum of Squares on a VAR model of Order 6 – Gas Furnace Data
________________________________________________________________
Residual sum of squares for various methods

 1. OLSQ using LINPACK Cholesky – solving from R         16.13858295915815
 2. OLSQ using LINPACK QR                                16.13858295915821
 3. OLSQ using LINPACK DGECO                             16.13858295915803
 4. OLSQ using LAPACK DGETRF-DGECON-DGETRI               16.13858295915806
 5. OLSQ using LAPACK DGESVX                             16.13858295935751
 6. OLSQ using LAPACK DGESVX with equilibration          16.13858295963500
 7. OLSQ using LAPACK DPOTRF-DPOCON-DPOTRI               16.13858295915812
 8. OLSQ using LINPACK DPOCO-DPODI                       16.13858295915811
 9. OLSQ using LINPACK DSICO-DSIDI                       16.13858295915814
10. OLSQ using SVD LINPACK                               16.13858295915808
11. OLSQ using SVD LAPACK                                16.13858295915810
_________________________________________________________________
Model estimated was gasout=f(gasout{1 to 6}, gasin{1 to 6}). Data from Box-Jenkins [3]. Data studied in Stokes [32]. Experiment 1 solves for the coefficients using the Cholesky R directly. Experiments 3-9 form inv(X'X).
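
The difference noted above between Experiment 1 and Experiment 8 (solving for the coefficients directly from the Cholesky factor versus first forming an explicit inverse) can be sketched as follows. This is an illustrative numpy/scipy example with random data of the same shape as the VAR(6) gas furnace regression, not the linpack DPOSL/DPODI code itself.

import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(1)
X = rng.normal(size=(290, 13))            # stand-in for 6 lags of gasout, 6 lags of gasin, and a constant
y = rng.normal(size=290)
XtX, Xty = X.T @ X, X.T @ y

# Path 1 (Experiment 1 style): use the Cholesky factor of X'X directly
factor = cho_factor(XtX)
b_direct = cho_solve(factor, Xty)

# Path 2 (Experiment 8 style): form an explicit inverse and multiply
# (here via numpy's general inverse; linpack DPODI builds it from the Cholesky factor)
b_inverse = np.linalg.inv(XtX) @ Xty

for tag, b in (("solve from factor", b_direct), ("explicit inverse ", b_inverse)):
    resid = y - X @ b
    print(f"{tag}  RSS = {resid @ resid:.14f}")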

The StRD Pontius data are classified as of a lower level of difficulty, although more challenging than the gas furnace data studied in the prior section. The Pontius data consist of 40 observations on a model of the form y = b0 + b1*x + b2*x**2, which is almost a perfect fit. The eigenvalues of X'X, as calculated by the eispack routine RG, were 0.8109E+13, 0.7317E+27 and 3.613, giving a condition estimate that tripped the condition tolerance in the linpack LU and Cholesky routines for both real*8 and real*4 data. Calculations were "forced" by ignoring this check.18 Results are reported for a number of experiments in Table 16.6 that vary precision, method of calculation and, for real*4 data, degree of Fortran optimization. The base method was the QR for real*8 data, which gives an LRE of 13.54 for the coefficients. When added accuracy was enabled, the LREs for the SE and the RSS increased slightly, from 12.39 to 12.51 and from 12.09 to 12.21 respectively, between experiments 1 and 2. The linpack SVD produced LREs of 13.92, 13.92 and 13.53 for the coefficients, the SE and the RSS, respectively, while for lapack these were 13.48, 12.74 and 12.93, respectively, in Experiments 3 and 4. Here, using accuracy as a criterion, linpack edged lapack. Since in the Filippelli data set the reverse was found, there appears to be no "best" SVD routine for all cases. In addition to accuracy, there are other aspects of the selection process, including relative speed of execution (tested in Table 16.7 and found to be a function of the size of the problem and the computer chip) and memory requirements, that are not tested here since they are published.19

18 The same data was estimated in Windows RATS (Doan 1992) version 6.0. While the reported coefficients agreed with the benchmark for 11, 11 and 14 digits, respectively, RATS unexpectedly produced a SE of 0.0 and a t of 0.0 for the x**2 term. The "certified" coefficients and standard errors are:

0.673565789473684E-03 0.107938612033077E-03

0.732059160401003E-06 0.157817399981659E-09

-0.316081871345029E-14 0.486652849992036E-16

which produce a t for the x**2 term of -64.95, not zero.

Experiments 5-8 show forced linpack LU and Cholesky models for real*8 data. In Experiments 7-8, added accuracy in the accumulators was enabled. Slight accuracy gains were observed, especially in the RSS calculation, where the LRE jumped from 12.77 and 12.73 to 13.23 and 13.39, respectively. What is interesting is that in this case, even though the condition of X'X was large, the LU and Cholesky approaches were able to get reasonable answers. The linpack condition check appears to be conservative, since in the usual case the software would not attempt the solution of this problem.

Experiments 9-14 concern real*4 data.20 Again, the QR was found to be the most accurate, with scores of 5.36, 6.01 and 5.65 for the coefficients, SEs and RSS, respectively. These runs were made with code compiled by Lahey Fortran version 7.10 running opt=1. When accuracy enhancement was enabled, the LRE for the SE fell from 6.01 to 4.37. This difference was traced to the fact that the BLAS real*4 routine SDOT is optimized to hold data in registers, while the higher accuracy routine SDSDOT does not optimize to the same extent. This shows up when the same calculation is done with opt=0: the QR SE accuracy was 4.21 and 4.37 for non-accuracy and accuracy-enabled code, respectively. Higher accuracy was also observed for the opt=1 LU-forced SE (5.27) than for the opt=0 calculation (4.80). Why the forced Cholesky experiment seems to run more accurately at opt=0 than at opt=1 (see Experiment 12) is not clear. What does seem to be the case is that the level of optimization, and the resulting changes in register use, makes a detectable difference only with real*4 precision data. A strong case can be made not to use this precision for this or any other econometric problem. When real*8 calculations are used, these knife-edge differences are not observed.
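
The SDOT/SDSDOT register effect can be mimicked with numpy: accumulating a real*4 dot product entirely in real*4, and accumulating the same real*4 data in real*8, generally disagree in the trailing digits. This sketch illustrates the general phenomenon only; it does not reproduce the Lahey compiler behavior or the BLAS code paths discussed above.

import numpy as np

rng = np.random.default_rng(2)
a = rng.normal(size=100_000).astype(np.float32)
b = rng.normal(size=100_000).astype(np.float32)

s_real4 = np.dot(a, b)                                        # SDOT-like: real*4 data, real*4 accumulation
s_real8 = np.dot(a.astype(np.float64), b.astype(np.float64))  # SDSDOT-like: real*4 data, real*8 accumulation

print("real*4 accumulation:", float(s_real4))
print("real*8 accumulation:", float(s_real8))
print("difference         :", float(s_real8) - float(s_real4))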

19 For lapack, memory was set to the suggested amount from the first call to the routine. Experimentation with alternative lapack memory settings, possible with the B34S implementation of lapack, was not attempted for this problem since it was discussed earlier.

20 Data was first read in real*8. The B34S routine RND( ) then checked against the maximum and minimum allowable real*4 values, using the Fortran functions HUGE( ) and TINY( ). Next, the real*8 data was written to a buffer using g25.16 and re-read into real*4 using the same format. This approach gives a close approximation to having read the data directly into real*4. Use of the Fortran function sngl( ) can be dangerous in that, among other things, range checking is not performed.
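
The write-and-re-read device in footnote 20 has a simple Python analogue (a sketch only; the function below is hypothetical and is not the B34S RND( ) routine): format the real*8 value to a text buffer and re-read it at real*4 precision.

import numpy as np

def to_real4_via_text(value):
    """Emulate writing a real*8 value to a character buffer and re-reading it as real*4."""
    buffer = format(value, ".16g")      # plays roughly the role of the g25.16 format
    return np.float32(float(buffer))

x8 = 0.732059160401003e-06              # one of the certified Pontius coefficients (footnote 18)
x4 = to_real4_via_text(x8)
print(x8, float(x4), float(x4) - x8)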


Table 16.6 LRE for Various Estimates of Coef, SE and RSS of Pontius Data
_______________________________________________________________________
Real*8 Data
 #  Method            COEF    SE      RSS
 1. QR                13.54   12.39   12.09
 2. QR_AC             13.52   12.51   12.21
 3. SVD-LINPACK       13.92   13.92   13.53
 4. SVD_LAPACK        13.48   12.74   12.93
 5. LU-Forced         12.61   13.02   12.77
 6. Chol-Forced       12.11   13.00   12.73
 7. LU-Forced_AC      12.77   13.61   13.23
 8. Chol-Forced_AC    12.17   13.63   13.39

Real*4 Data, Optimization = 1
 9. QR                 5.36    6.01    5.65
10. QR_AC              5.36    4.37    4.06
11. LU-Forced          3.93    5.27    5.36
12. Chol-Forced        3.97    3.36    3.06
13. LU-Forced_AC       3.95    5.30    4.78
14. Chol-Forced_AC     4.01    3.32    3.02

Real*4 Data, Optimization = 0
 9. QR                 5.36    4.21    3.91
10. QR_AC              5.36    4.37    4.06
11. LU_Forced          4.31    4.80    4.45
12. Chol_Forced        4.48    4.51    4.26
13. LU_Forced_AC       3.95    5.30    4.78
14. Chol_Forced_AC     4.16    3.79    3.48
_______________________________________________________________________
All data were initially read in real*8. For real*4 results data were then converted to real*4. Forced means that the LINPACK condition check has been bypassed for testing purposes. All reported LRE values are for the means. All real*4 tests have been done with LINPACK routines. Real*4 accumulators have not been enabled in cases where _AC is not added to the method name.

The Eberhardt data consist of 11 observations on a one-input, no-intercept model of the form y = b*x. The level of difficulty is rated as average. Results are shown in Table 16.7. Here the Cholesky, the linpack SVD and the lapack SVD all produce identical LRE values of 14.72, 15.00 and 14.91, respectively. For the QR the coefficient LRE was also 14.72, while the SE and residual LREs were marginally lower at 14.40 and 14.05. Here again the methods considered run very close together.

Table 16.7 LRE for QR, Cholesky, SVD linpack and lapack for Eberhardt Data
_______________________________________________________________________
Method           COEF    SE      RSS
QR               14.72   14.40   14.05
Chol             14.72   15.00   14.91
SVD-LINPACK      14.72   15.00   14.91
SVD-LAPACK       14.72   15.00   14.91
_______________________________________________________________________
All data read in real*8.

The above results suggest that for problems with a high degree of multicollinearity, the results are sensitive both to the precision of the calculation and to the method of calculation. A challenging example was the Filippelli polynomial data set discussed earlier. However, that discussion was not complete, because the real*16 QR results were compared only to the 15-digit "official" benchmark and not to a benchmark with more digits.


Since real*16 will give more than 15 digits of accuracy, an important final task is to extend the Filippelli benchmark using variable precision arithmetic and thus benchmark the accuracy of the real*16 results obtained.

The variable precision library developed by Smith [30] was implemented in B34S to extend the Filippelli benchmark and thus fully test the true accuracy of the reported real*16 results. The linpack LU inversion routines DGECO, DGEFA and DGEDI were rewritten to allow variable precision calculations. What was formerly a real*8 variable became a 328-element real*8 vector. Simple statements, such as A=A+B*C, had to be individually coded using a customized pointer routine, IVPAADD( ), that addresses the correct element to pass to a lower-level routine that makes the calculation.21 A simple example gives some insight into how this is done:

c
c     if (z(k) .ne. 0.0d0) ek = dsign(ek,-z(k))
c
      if(vpa_logic(kindr,
     *   z(ivpaadd(kindr,k,1,k,1)),'ne', vpa_work(i_zero)) )then
         call vpa_mul(kindr,vpa_work(i_mone),z(ivpaadd(kindr,k,1,k,1)),
     *        vpa_work(iwork(4)))
         call vpa_func_2(kindr,'sign',vpa_work(i_ek),
     *        vpa_work(iwork(4)),
     *        vpa_work(iwork(5)) )
         call vpa_equal(kindr,vpa_work(iwork(5)),vpa_work(i_ek))
      endif

vpa_work( ) is a 328 by 20 work array. The expression z(ivpaadd(kindr,k,1,k,1)) addresses the kth element of Z, which is 328 by k, and the vpa_logic call compares it to a constant 0.0 saved in vpa_work(i_zero). If these two values are not equal, the three calls are executed to solve ek = dsign(ek,-z(k)). The first call forms -z(k) and places it in vpa_work(iwork(4)); the variable vpa_work(i_mone) contains -1.0. Next, the SIGN function is called and the result placed in vpa_work(iwork(5)). Finally a copy is performed. This simple example shows what is involved in "converting" a real*8 program to do VPA math. The results can be spectacular.22 The test job vpainv is shown next:

/;
/; Shows gains in accuracy of the inverse with vpa
/;
b34sexec matrix;
call echooff;
n=6;
x=rn(matrix(n,n:));
ix=inv(x,rcond8);
r16x=r8tor16(x);
ir16x=inv(r16x,rcond16);
call print('Real*4 tests',sngl(x),inv(sngl(x)),sngl(x)*inv(sngl(x)));
call print('Real*8 tests',x, ix, x*ix);
call print('Real*16 tests',r16x,ir16x,r16x*ir16x);
vpax=vpa(x);
ivpax=inv(vpax,rcondvpa);
detvpa=%det;
call print(rcond8,rcond16,rcondvpa,det(x),det(r16x),detvpa);
call print('Default accuracy');
call print('VPA Inverse ',vpax,ivpax,vpax*ivpax);

/; call vpaset(:info);

do i=100,1850,100;
call vpaset(:ndigits i);
call vpaset(:jform2 10);
call print('*************************************************':);
vpax=mfam(dsqrt(dabs(vpa(x))));
call vpaset(:jform2 i);
call print('Looking at vpax(2,1) given ndigits was set as ',i:);
call print(vpax(2,1));
ivpax=inv(vpax);
call print('VPAX and Inverse VPAX at high accuracy ',vpax,ivpax,vpax*ivpax);
call print('*************************************************':);
enddo;
b34srun;

21 Stokes (2005) provides added detail on how this was accomplished.

22 The job vpainv, in paper_86.mac which is distributed with B34S, illustrates the gains in accuracy for alternative precision settings. Assuming a matrix X, X*inv(X) produces off-diagonal elements on the order of |.1e-1728|, which is far superior to what can be obtained with real*4, real*8 or real*16 results, which are also shown in the test problem. The B34S VPA implementation allows these high-accuracy calculations to be mixed with lower precision commands, using real*4, real*8 and real*16, since data can be moved from one precision to another. This allows experimentation concerning how sensitive the results are to accuracy settings.

Edited output from running this script is shown next. First the real*4 version of the matrix x is displayed, then its inverse, and then the product of the two. Errors are in the range of |1.e-6| and smaller.

Real*4 tests

Matrix of 6 by 6 elements (real*4)

1 2 3 4 5 6 1 2.05157 -1.32010 1.49779 0.409240 0.647330 1.92704 2 1.08325 -1.52445 -0.168215 1.06262 -0.595625E-01 0.146707 3 0.825589E-01 -0.459242 0.498469 -0.886348 1.14983 -0.271132E-01 4 1.27773 -0.605638 1.26792 0.396807 -0.579426 -0.138891 5 -1.22596 0.307389 0.741401 0.245425 1.63783 -0.137510 6 0.338526 -1.54789 -0.187157 0.527160 1.81526 -1.96044

Matrix of 6 by 6 elements (real*4)

1 2 3 4 5 6 1 0.821598 -1.25918 -0.958025 -0.255133 -0.665297 0.791364 2 0.778618 -1.58549 -1.34620 -0.310931 -0.316747 0.709570 3 -0.262284 0.249125 0.406023 0.676249 0.424313 -0.322461 4 0.250162 -0.950123E-01 -0.938542 -0.628610E-01 0.271452 0.237183 5 0.561824 -0.716494 -0.495201 -0.454585 -0.505871E-01 0.541239 6 0.139631 0.321648 0.147814 -0.300940 0.120854 -0.337968

Matrix of 6 by 6 elements (real*4)

1 2 3 4 5 6 1 1.00000 -0.586484E-07 0.147839E-07 -0.649433E-07 -0.117634E-06 0.191178E-06 2 -0.105908E-06 1.00000 -0.591370E-08 -0.114246E-06 -0.735698E-07 0.108380E-06 3 0.255151E-07 -0.116802E-06 1.00000 -0.251482E-07 -0.628774E-07 0.153868E-06


4 -0.364185E-07 -0.112549E-06 -0.528599E-07 1.00000 -0.118921E-06 0.111764E-06 5 0.125920E-06 -0.233107E-07 0.455514E-08 0.481459E-07 1.00000 0.365477E-07 6 -0.429915E-07 -0.220843E-06 -0.739041E-07 0.121912E-07 -0.798442E-07 1.00000

Next the experiment is repeated for real*8 and real*16 versions of the same matrix. Here errors are in the area of |.1e-15| and smaller and |.1e-33| and smaller, respectively. Using the default VPA setting and the same matrix, these errors become |.1e-62| and smaller.

Real*8 tests

X = Matrix of 6 by 6 elements

1 2 3 4 5 6 1 2.05157 -1.32010 1.49779 0.409240 0.647330 1.92704 2 1.08325 -1.52445 -0.168215 1.06262 -0.595625E-01 0.146707 3 0.825589E-01 -0.459242 0.498469 -0.886348 1.14983 -0.271132E-01 4 1.27773 -0.605638 1.26792 0.396807 -0.579426 -0.138891 5 -1.22596 0.307389 0.741401 0.245425 1.63783 -0.137510 6 0.338525 -1.54789 -0.187157 0.527160 1.81526 -1.96044

IX = Matrix of 6 by 6 elements

1 2 3 4 5 6 1 0.821598 -1.25918 -0.958025 -0.255133 -0.665297 0.791364 2 0.778618 -1.58550 -1.34620 -0.310931 -0.316747 0.709570 3 -0.262284 0.249125 0.406023 0.676249 0.424313 -0.322461 4 0.250162 -0.950124E-01 -0.938542 -0.628611E-01 0.271452 0.237183 5 0.561824 -0.716494 -0.495201 -0.454585 -0.505871E-01 0.541239 6 0.139631 0.321648 0.147814 -0.300940 0.120854 -0.337968

Matrix of 6 by 6 elements

1 2 3 4 5 6 1 1.00000 0.00000 -0.222045E-15 0.444089E-15 -0.222045E-15 -0.111022E-15 2 0.693889E-17 1.00000 -0.971445E-16 -0.111022E-15 -0.100614E-15 -0.249800E-15 3 -0.598480E-16 0.693889E-17 1.00000 0.107553E-15 -0.129237E-15 -0.329597E-16 4 -0.242861E-15 0.270617E-15 0.267147E-15 1.00000 -0.180411E-15 -0.180411E-15 5 -0.128370E-15 -0.159595E-15 -0.246331E-15 0.194289E-15 1.00000 0.277556E-15 6 0.00000 0.111022E-15 0.00000 -0.111022E-15 -0.555112E-16 1.00000

Real*16 tests

R16X = Matrix of 6 by 6 elements (real*16)

1 2 3 4 5 6 1 2.05157 -1.32010 1.49779 0.409240 0.647330 1.92704 2 1.08325 -1.52445 -0.168215 1.06262 -0.595625E-01 0.146707 3 0.825589E-01 -0.459242 0.498469 -0.886348 1.14983 -0.271132E-01 4 1.27773 -0.605638 1.26792 0.396807 -0.579426 -0.138891 5 -1.22596 0.307389 0.741401 0.245425 1.63783 -0.137510 6 0.338525 -1.54789 -0.187157 0.527160 1.81526 -1.96044

IR16X = Matrix of 6 by 6 elements (real*16)

1 2 3 4 5 6 1 0.821598 -1.25918 -0.958025 -0.255133 -0.665297 0.791364 2 0.778618 -1.58550 -1.34620 -0.310931 -0.316747 0.709570 3 -0.262284 0.249125 0.406023 0.676249 0.424313 -0.322461 4 0.250162 -0.950124E-01 -0.938542 -0.628611E-01 0.271452 0.237183 5 0.561824 -0.716494 -0.495201 -0.454585 -0.505871E-01 0.541239 6 0.139631 0.321648 0.147814 -0.300940 0.120854 -0.337968

Matrix of 6 by 6 elements (real*16)


1 2 3 4 5 6 1 1.00000 0.192593E-33 0.288889E-33 -0.192593E-33 0.120371E-33 -0.192593E-33 2 0.255788E-33 1.00000 0.120371E-34 -0.541668E-34 0.692131E-34 -0.662038E-34 3 -0.564237E-35 0.917826E-34 1.00000 0.165510E-33 0.184318E-34 -0.168519E-33 4 0.382177E-33 0.120371E-34 0.752316E-34 1.00000 0.120371E-34 -0.144445E-33 5 -0.328010E-33 0.601853E-35 -0.243751E-33 0.541668E-34 1.00000 0.156482E-33 6 -0.481482E-34 0.962965E-34 0.481482E-34 0.00000 0.481482E-34 1.00000

RCOND8 = 0.50111667E-01

RCOND16 = 0.5011166670247408E-01

RCONDVPA= 5.01116667024740759246941521228642361326495469435182039839368M-2

15.503129

15.50312907174408

DETVPA = 1.55031290717440844136019448415020291808052552694282172989314M+1

Default accuracy

VPA Inverse

VPAX = Matrix of 6 by 6 elements VPA - FM

1 2 3 4 5 6 1 .205157M+1 -.132010M+1 .149779M+1 .409240M+0 .647330M+0 .192704M+1 2 .108325M+1 -.152445M+1 -.168215M+0 .106262M+1 -.595625M-1 .146707M+0 3 .825589M-1 -.459242M+0 .498469M+0 -.886348M+0 .114983M+1 -.271132M-1 4 .127773M+1 -.605638M+0 .126792M+1 .396807M+0 -.579426M+0 -.138891M+0 5 -.122596M+1 .307389M+0 .741401M+0 .245425M+0 .163783M+1 -.137510M+0 6 .338525M+0 -.154789M+1 -.187157M+0 .527160M+0 .181526M+1 -.196044M+1

IVPAX = Matrix of 6 by 6 elements VPA - FM

1 2 3 4 5 6 1 .821598M+0 -.125918M+1 -.958025M+0 -.255133M+0 -.665297M+0 .791364M+0 2 .778618M+0 -.158550M+1 -.134620M+1 -.310931M+0 -.316747M+0 .709570M+0 3 -.262284M+0 .249125M+0 .406023M+0 .676249M+0 .424313M+0 -.322461M+0 4 .250162M+0 -.950124M-1 -.938542M+0 -.628611M-1 .271452M+0 .237183M+0 5 .561824M+0 -.716494M+0 -.495201M+0 -.454585M+0 -.505871M-1 .541239M+0 6 .139631M+0 .321648M+0 .147814M+0 -.300940M+0 .120854M+0 -.337968M+0

Matrix of 6 by 6 elements VPA - FM

1 2 3 4 5 6 1 .100000M+1 -.349857M-63 .223368M-63 -.757910M-63 -.276120M-63 -.542557M-63 2 .281368M-62 .100000M+1 .722028M-63 -.165087M-62 .128858M-63 -.416666M-63 3 .505466M-63 -.281391M-63 .100000M+1 -.312900M-63 -.794001M-64 .136114M-63 4 .155177M-62 .248980M-63 .754062M-64 .100000M+1 .139723M-63 -.699228M-63 5 -.614843M-63 -.423209M-63 -.400764M-63 -.685434M-64 .100000M+1 .736800M-63 6 .241052M-62 -.149789M-62 -.141051M-63 -.160277M-62 .169336M-64 .100000M+1

Next the VPA degree of accuracy is increased from 100 digits to 1700+ in steps of 100. Edited results for 100 and 1775 digits show accuracy in the range of |.1e-104| and an astounding |.1e-1784|, respectively, which illustrates what is possible with VPA math. A typical element, vpax(2,1), is also shown.

************************************************* Looking at vpax(2,1) given ndigits was set as 100


1.0407938475538507976540614667166763813116409216491186532751876440441549 90671195410174838727421096493M+0

VPAX and Inverse VPAX at high accuracy

VPAX = Matrix of 6 by 6 elements VPA - FM

1 2 3 4 5 6 1 .143233M+1 .114896M+1 .122384M+1 .639719M+0 .804568M+0 .138818M+1 2 .104079M+1 .123469M+1 .410141M+0 .103084M+1 .244054M+0 .383024M+0 3 .287331M+0 .677674M+0 .706023M+0 .941461M+0 .107230M+1 .164661M+0 4 .113037M+1 .778228M+0 .112602M+1 .629926M+0 .761200M+0 .372681M+0 5 .110723M+1 .554427M+0 .861046M+0 .495403M+0 .127978M+1 .370824M+0 6 .581829M+0 .124414M+1 .432617M+0 .726058M+0 .134732M+1 .140016M+1

IVPAX = Matrix of 6 by 6 elements VPA - FM

1 2 3 4 5 6 1 .727136M+0 .730328M+0 -.308940M+0 -.175557M+1 .158871M+1 -.837851M+0 2 -.342033M+1 -.219101M+0 -.233482M+1 .595158M+1 -.228244M+1 .274594M+1 3 -.109686M+0 -.901577M+0 .229806M+0 .219987M+1 -.137187M+1 .106148M+0 4 .283037M+1 .937660M+0 .272689M+1 -.526084M+1 .135031M+1 -.234069M+1 5 -.758210M+0 -.279579M+0 -.193377M+0 .311109M+0 .665131M+0 .591982M+0 6 .203284M+1 -.474316M-1 .904066M+0 -.280993M+1 .451570M+0 -.766261M+0

Matrix of 6 by 6 elements VPA - FM

1 2 3 4 5 6 1 .100000M+1 .921927M-105 .200000M-104 .300000M-104 -.932517M-105 .000000M 0 2 .679132M-105 .100000M+1 .221145M-104 .000000M 0 -.102494M-104 -.221304M-104 3 -.229392M-104 .622168M-106 .100000M+1 .230075M-104 .965828M-106 .115613M-104 4 -.122064M-104 .309049M-105 .120743M-104 .100000M+1 -.104756M-104 -.494081M-105 5 -.760298M-105 .576995M-105 .221558M-104 .100000M-104 .100000M+1 -.341342M-105 6 -.200000M-104 .374939M-105 .000000M 0 .400000M-104 -.113653M-104 .100000M+1

*************************************************
Note: Precision out of range when calling FMSET. NPREC = 1800
      Nearest valid precision used given ndig= 256
      Upperlimit on :ndigits = 1775
*************************************************
Looking at vpax(2,1) given ndigits was set as 1800

1.0407938475538507976540614667166763813116409216491186532751876440441549 9067119541017483872742109649284847786731954780618396337243557760905134253 6563520156336803061100714463959556366263662269873821308469464832965854167 0138171860068377342247552386420701260553904819498240561168336065413626401 0250669906640460786770367950409741912099989603137336731422032053417523721 3966297964166071008246385873024394829403802520042514409728651300143976957 2812397779120024875167135613154575066947601426816271130247154879147668923 9394356296009388425802127641937860496756504613699970799418421555804093055 3237120995762235218535766576766661537848740795670861988624303476444175233 7226176225798507544402708599420593094629811345970349094729351139588955728 5119428379300506359255141263770626457625458403838601615539373960461597816 0073349099782610400889436729448523898284270595056658203704057321081365781 8654413565539069157902511584029231833064933298214721741970329932207991290 6527078671610757572078982983114671804609743149945456219122385716278877328 1968071093789651792050131594215163153195866826910747687397553138316978173 9321736744421971655768821520112748477863475786714301867009421997175231617 0363238010614721981137580098809073513520578006575598464157845090062991068 3355234688196773978064825447224414805727326031381881663445151323266575394 3649227970422690469305135088104621928676013036474412367065826577456525169 9851349360204073235768035720102081408651568383935674777433519021016931172 0962506684179407159824884668004581146345625430570260083442728751522255963 6613251768712917596895179532314049464823036176041089662244758723460968285


4191733256205711195695710155099750814377036616922522753789328720253554539 2344474454352662945653575100394159947087765297030779022580375458745250914 46728867302226218905058287716250997200000000000000M+0

VPAX and Inverse VPAX at high accuracy

VPAX = Matrix of 6 by 6 elements VPA - FM

1 2 3 4 5 6 1 .143233M+1 .114896M+1 .122384M+1 .639719M+0 .804568M+0 .138818M+1 2 .104079M+1 .123469M+1 .410141M+0 .103084M+1 .244054M+0 .383024M+0 3 .287331M+0 .677674M+0 .706023M+0 .941461M+0 .107230M+1 .164661M+0 4 .113037M+1 .778228M+0 .112602M+1 .629926M+0 .761200M+0 .372681M+0 5 .110723M+1 .554427M+0 .861046M+0 .495403M+0 .127978M+1 .370824M+0 6 .581829M+0 .124414M+1 .432617M+0 .726058M+0 .134732M+1 .140016M+1

IVPAX = Matrix of 6 by 6 elements VPA - FM

1 2 3 4 5 6 1 .727136M+0 .730328M+0 -.308940M+0 -.175557M+1 .158871M+1 -.837851M+0 2 -.342033M+1 -.219101M+0 -.233482M+1 .595158M+1 -.228244M+1 .274594M+1 3 -.109686M+0 -.901577M+0 .229806M+0 .219987M+1 -.137187M+1 .106148M+0 4 .283037M+1 .937660M+0 .272689M+1 -.526084M+1 .135031M+1 -.234069M+1 5 -.758210M+0 -.279579M+0 -.193377M+0 .311109M+0 .665131M+0 .591982M+0 6 .203284M+1 -.474316M-1 .904066M+0 -.280993M+1 .451570M+0 -.766261M+0

Matrix of 6 by 6 elements VPA - FM

1 2 3 4 5 6 1 .100000M+1 -.168696M-1785 .000000M 0 -.200000M-1784 .196614M-1784 -.200000M-1784 2 .266236M-1784 .100000M+1 .143811M-1784 -.500000M-1784 .297167M-1784 -.177795M-1784 3 .105965M-1784 -.129815M-1785 .100000M+1 -.103406M-1784 -.856883M-1785 -.529863M-1785 4 -.489586M-1786 -.602335M-1785 -.793529M-1785 .100000M+1 .113342M-1784 -.942817M-1785 5 -.866758M-1785 -.612497M-1785 -.136012M-1784 .000000M 0 .100000M+1 -.160420M-1787 6 .000000M 0 -.503337M-1785 .100000M-1784 -.100000M-1784 .110410M-1784 .100000M+1

*************************************************
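
A similar precision sweep can be sketched outside B34S with the Python mpmath library, an arbitrary precision package that is unrelated to the FM/VPA code used here; as the number of working digits is raised, the largest off-diagonal element of X*inv(X) shrinks in step. This is an illustrative sketch only.

import mpmath
import numpy as np

rng = np.random.default_rng(3)
x64 = rng.normal(size=(6, 6))                   # a 6 by 6 test matrix, as in vpainv

for dps in (16, 50, 100, 400):
    mpmath.mp.dps = dps                         # decimal digits of working precision
    X = mpmath.matrix(x64.tolist())
    P = X * mpmath.inverse(X)                   # ideally the identity matrix
    off = max(abs(P[i, j]) for i in range(6) for j in range(6) if i != j)
    print(f"digits = {dps:4d}   largest off-diagonal of X*inv(X) = {mpmath.nstr(off, 3)}")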

Table 16.8 shows the Filippelli data set benchmark, an extended printout of the QR real*16 results and the expanded Filippelli benchmark calculated with VPA arithmetic to 40 digits. A ruler listed at the top of the table is designed to assist the reader in determining at which digit a difference occurs. Consider coefficient # 1. The VPA beta agrees with the real*16 QR beta up to the 28th digit, far beyond the 15 digits that were all that was listed for the "benchmark," which is shown again in Table 16.8. The VPA experiment documents that the real*16 calculation is in fact substantially more accurate than the best real*8 QR, which produced, on average, 7 digits, as reported in Table 16.3. Recall that the "converted" real*16 results (data converted from real*8 to real*16), reported in Table 16.2 Experiment 5, had an LRE of 7.924, only marginally better than the real*8 QR LRE of 7.31. Although Tables 16.2 and 16.3 reported that the "true" real*16 QR results (data loaded directly into real*16) had an LRE value of 14.79, once the 40-digit VPA benchmark was available it became apparent that the LRE was substantially larger. Even the calculations of the 10th and 11th coefficients, when compared with the VPA values, produced 27 digits of accuracy. It should be remembered that these impressive results for real*16 are due both to the accuracy of the calculation and to the fact that the data were directly read into real*16, not converted from real*8 to real*16. As we have shown, the data base precision makes a real difference in addition to the precision of the calculation.


The important implication is that the inherent precision of the calculation method will be of no help, and may in fact give misleadingly "accurate" results, unless the data are read with sufficient precision.23

Some of the key lessons of this chapter are listed in Table 16.9. The main finding is the accuracy tradeoff between the precision of the data and the calculation method used. In all cases, it is important to check for rank problems before proceeding with a calculation. The lower the precision of the data, the more appropriate it is to consider higher accuracy solution methods such as the QR and SVD approaches.24 Saving data in real*4 precision, and even more so calculating in real*4 precision, was found to be associated with a substantial loss of accuracy, even with less stiff data sets.

23 In order to fully isolate the VPA results from data reading issues, the loading of data into the VPA array proceeded as follows. The real*16 data was printed to a character*1 array using e50.32. Next, the VPA string input routine was used to convert this character*1 array into a VPA variable. This way both the real*16 and the VPA results were using the same data. Experiments were also conducted by reading the data in character form directly into the VPA routines. For this problem both methods of data input into VPA made no difference since there were relatively few digits. In results not reported here but available in paper_86.mac, the Filippelli problem was "extended" by adding a further term to the right-hand side to make the problem more difficult (stiff). Both the VPA and the native real*16 experiments were run and both successfully solved the problem, suggesting "reserve" capability to handle a substantially more stiff problem.

24 While the main thrust of this chapter has been to show the effect of various factors on the number of "correct" digits of a calculation, in applied econometric work an important consideration is how many digits to report. If the government data are known only to k digits, many researchers argue that only k digits of accuracy should be reported. In many situations this is appropriate, although such a practice makes it difficult to assess the underlying accuracy of the calculation routines used in the software system. Clearly, if derived quantities are to be calculated from the estimates, all estimated digits should be used to insure their accuracy.


Table 16.8 VPA Alternative Estimates of Filippelli Data set
_______________________________________________________________________
         10        20        30        40        50
12345678901234567890123456789012345678901234567890
--------------------------------------------------
VPA BETA 1        -.2772179591933423928028447556649596044434M+4
Real*16 QR beta   -0.2772179591933423928028447553572108500000E+04
Answer for coef   -0.2772179591933420E+04
VPA SE 1           .5597798654749498745747725508021651489727M+3
Real*16 QR SE      0.5597798654749498745747725479752748700000E+03
Answer for SE      0.5597798654749500E+03

VPA BETA 2        -.2316371081608930758821967916501044936138M+4
Real*16 QR beta   -0.2316371081608930758821967914097820200000E+04
Answer for coef   -0.2316371081608930E+04
VPA SE 2           .4664775721277964526931098320484471124838M+3
Real*16 QR SE      0.4664775721277964526931098297461005100000E+03
Answer for SE      0.4664775721277960E+03

VPA BETA 3        -.1127973940983715698571670015266249731414M+4
Real*16 QR beta   -0.1127973940983715698571670014199826100000E+04
Answer for coef   -0.1127973940983720E+04
VPA SE 3           .2272042744777513106293981763510244738352M+3
Real*16 QR SE      0.2272042744777513106293981752622826700000E+03
Answer for SE      0.2272042744777510E+03

VPA BETA 4        -.3544782337033487716107384852595281875294M+3
Real*16 QR beta   -0.3544782337033487716107384849646966900000E+03
Answer for coef   -0.3544782337033490E+03
VPA SE 4           .7164786608759273726166572118158443735326M+2
Real*16 QR SE      0.7164786608759273726166572085071780100000E+02
Answer for SE      0.7164786608759270E+02

VPA BETA 5        -.7512420173937571389052207557481187222874M+2
Real*16 QR beta   -0.7512420173937571389052207552268365400000E+02
Answer for coef   -0.7512420173937570E+02
VPA SE 5           .1528971787474000650307567904607140782062M+2
Real*16 QR SE      0.1528971787474000650307567897859220700000E+02
Answer for SE      0.1528971787474000E+02

VPA BETA 6        -.1087531803553425108528108118290083531722M+2
Real*16 QR beta   -0.1087531803553425108528108117714492600000E+02
Answer for coef   -0.1087531803553430E+02
VPA SE 6           .2236911598160332755518623413323850745016M+1
Real*16 QR SE      0.2236911598160332755518623403977080500000E+01
Answer for SE      0.2236911598160330E+01

VPA BETA 7        -.1062214985889467664596611220591597363944M+1
Real*16 QR beta   -0.1062214985889467664596611220235596600000E+01
Answer for coef   -0.1062214985889470E+01
VPA SE 7           .2216243219342274020661298346608897939687M+0
Real*16 QR SE      0.2216243219342274020661298337934033000000E+00
Answer for SE      0.2216243219342270E+00

VPA BETA 8        -.6701911545934083759267341228848844976973M-1
Real*16 QR beta   -0.6701911545934083759267341228119136200000E-01
Answer for coef   -0.6701911545934080E-01
VPA SE 8           .1423637631547239489182330919953278852498M-1
Real*16 QR SE      0.1423637631547239489182330914795936200000E-01
Answer for SE      0.1423637631547240E-01

VPA BETA 9        -.2467810782754786508408544524189188555839M-2
Real*16 QR beta   -0.2467810782754786508408544524564670500000E-02
Answer for coef   -0.2467810782754790E-02
VPA SE 9           .5356174088898209362586519329555783802279M-3
Real*16 QR SE      0.5356174088898209362586519311846583900000E-03
Answer for SE      0.5356174088898210E-03

VPA BETA 10       -.4029625250804036712971315485276426445821M-4
Real*16 QR beta   -0.4029625250804036712971315487091695800000E-04
Answer for coef   -0.4029625250804040E-04
VPA SE 10          .8966328373738682221004152725410272047808M-5
Real*16 QR SE      0.8966328373738682221004152698795102200000E-05
Answer for SE      0.8966328373738680E-05

VPA BETA 11       -.1467489614229795882287848515307287127546M+4
Real*16 QR beta   -0.1467489614229795882287848513596070800000E+04
Answer for coef   -0.1467489614229800E+04
VPA SE 11          .2980845309955369852005523437755166954313M+3
Real*16 QR SE      0.2980845309955369852005523422443903600000E+03
Answer for SE      0.2980845309955370E+03

_____________________________________________________


Table 16.9 Lessons to be learned from VPA and Other Accuracy Experiments_______________________________________________________________________

1. The QR method of solving an OLS regression model can provide 1-2 more digits of accuracy and in fact may be the only way to successfully solve a "stiff" or multicollinear model.

2. The precision in which data are initially loaded into memory (for example, single precision) impacts accuracy, even in cases where the data are later moved to a higher precision (for example, double precision) for the calculation. This suggests that data should be read in the precision in which the calculation is made, to avoid the numeric representation accuracy issues that occur when the precision of the data is increased after the fact. The current practice of a number of software systems of saving data in real*4 but moving the data into real*8 for calculations is a dangerous practice that unnecessarily induces accuracy problems (a short illustration follows this table).

3. In many cases, accuracy gains can be made by boosting the precision of accumulators such as the BLAS routines for sum, absolute sum and dot product. Such routines should be used throughout software systems and will increase the accuracy of the variance and other calculations. It is desirable to be able to switch on and off such accuracy improvements to test the sensitivity of the given problem to these changes. Accuracy improvements to these routines have a CPU cost.

4. Data base design should take into account the needs of users who may want to read data into higher-than-usual precision. For data that are not transformed in a data bank, the user should be able to get all reported digits of precision without rounding (numeric representation) loss. This means making a character representation accessible as well as a real*4 or real*8 representation. This suggestion has far-reaching implications, since most if not all data banks save data in some kind of real format. If character storage of all data is not possible, a partial solution would be to save all data in at least real*8.

5. The new 64-bit computers will make higher-precision calculations more viable and may prove useful for the estimation of problems requiring high precision for their successful solution. Real*16 and complex*32 will not have to be emulated in software by the compilers. These technological changes on the hardware side suggest that software designers may want to offer greater than double precision math in future releases of their products.

6. The lower the precision of the data, the more imperative it is to check for rank problems, use high-quality numeric routines (lapack/linpack etc.) and utilize inherently higher accuracy solution methods, such as the QR. For many problems, however, if data are read with sufficient accuracy, this may not be needed.

7. If data are not initially read with sufficient precision, high-accuracy methods of calculation, such as the QR, can provide misleadingly "accurate" results that are in fact tainted by numeric representation issues inherent in the initial data read. This initial data "corruption" cannot be "cured" by any subsequent increase in data precision. The more "stiff" the problem, the more this becomes an important consideration.
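
As a two-line illustration of lesson 2 (referred to in the table above), the value below is one of the certified Pontius coefficients; once it has passed through a real*4 representation, promoting it back to real*8 cannot restore the lost digits. The snippet is a Python sketch, not B34S code.

import numpy as np

certified = 0.673565789473684e-03        # certified Pontius coefficient (15 digits)
as_real4 = np.float32(certified)         # what survives a single precision read
promoted = np.float64(as_real4)          # later promotion to double keeps the damaged value

print(f"read directly in real*8 : {certified:.15e}")
print(f"read in real*4, promoted: {promoted:.15e}")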


16.8 Conclusion

In the 1960's, when econometric software was not generally available, users passed around Fortran programs, usually with crude, column-dependent command structures. In that era an applied econometrician needed to know the theory and the econometrics and, in addition, be able to program in Fortran. In the 1970's this practice gave way to commercially available procedure-driven software that could perform the usual analysis. In parallel, a number of 4th generation languages were developed. These included APL, the SAS® Matrix Language, Speakeasy® and later MATLAB®, Gauss® and Mathematica®. The B34S matrix command, while patterned on Speakeasy, is targeted at econometric analysis of time series, nonlinear detection and modeling and Monte Carlo analysis. The learning curve for its use is substantially shorter than that for Fortran, and the many built-in commands allow the user to program custom calculations relatively quickly. For routine analysis the built-in procedures are usually sufficient.

The examples in this chapter illustrate the wide range of problems that can be solved using data saved in a number of precisions. Inspection of matrix language programs, as well as programs written in other 4th generation languages, both documents the calculation being made and facilitates replication exercises with other software. Such systems make it easy to experiment, something substantially more difficult when Fortran and/or C programs had to be custom built for each research step.25 In many of the chapters there is further discussion of specific problems that were studied with the help of the matrix command.

25 Stokes (2004b) extensively discussed this aspect of modern econometric software languages.
