Nearest neighbor matching USING THE GREEDY MATCH MACRO
Note: Much of the code originally was written by Lori Parsons
http://www2.sas.com/proceedings/sugi26/p214-26.pdf
This code has been written with simplicity as a primary concern. If you do not have a large number of controls, you may want to modify it.
/* Define the library for formats */
LIBNAME saslib "G:\oldpeople\sasdata\" ;
OPTIONS NOFMTERR FMTSEARCH = (saslib) ;
%propen(libname, dsname, idvariable, dependent, propensity)
LIBNAME = directory for data sets
DSNAME = dataset with study data
IDVARIABLE = subject ID variable
DEPENDENT = dependent variable
PROPENSITY = propensity score produced in logistic regression
FOR EXAMPLE
%propen(study,allpropen,id,athome,prob);
Remember, we already have the study.allpropen dataset with the propensity score (prob) from the PROC LOGISTIC we just did.
Propensity scores rounded to 5, then 4, 3, 2 and 1 decimals
%Do countr = 1 %to 5 ;
  %let digits = %eval(6 - &countr) ;
  %let roundto = %eval(10**&digits) ;
  %let roundto = %sysevalf(1/&roundto) ;
  %let nextin = %eval(&digits - 1) ;
MACRO NOTES
%Do countr = 1 %to 5 ;/* Starts %DO loop */
Use %EVAL function to do integer arithmetic
%let digits = %eval(6 - &countr) ;
Use %SYSEVALF function to do non-integer arithmetic
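To make the loop arithmetic concrete, here is a small Python sketch (not part of the macro) that reproduces the values of digits, roundto and nextin on each pass. The function name rounding_schedule is hypothetical and only used for illustration.

```python
def rounding_schedule(passes=5):
    """Reproduce the macro arithmetic for each pass of the outer %DO loop."""
    schedule = []
    for countr in range(1, passes + 1):
        digits = 6 - countr          # integer arithmetic, as with %EVAL
        roundto = 1 / 10 ** digits   # non-integer result, as with %SYSEVALF
        nextin = digits - 1          # suffix of the data set fed to the next pass
        schedule.append((countr, digits, roundto, nextin))
    return schedule


if __name__ == "__main__":
    for countr, digits, roundto, nextin in rounding_schedule():
        print(countr, digits, roundto, nextin)
```

The first pass rounds to 5 decimals and each later pass drops one decimal, so the matching gets progressively looser.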
/* Output control to one data set, intervention to another */
/* Create random number to sort within group */
Create 2 data sets
DATA yes1 (KEEP = &prob id_y depend_y randnum)
     no1 (KEEP = &prob id_n depend_n randnum) ;
  SET in&digits ;
We go through this loop 5 times and create data sets of records matching to 5, 4, 3, 2 and 1 decimal places.
We only keep four variables.
Assignment statements
  randnum = RANUNI(0) ;
  &prob = ROUND(&prob,&roundto) ;
Create a random number and round the propensity score to a set number of digits
Output to Case Data set …
  IF &depend = 1 THEN DO ;
    id_y = &id ;
    depend_y = &depend ;
    OUTPUT yes1 ;
  END ;
We need to rename the dependent and ID variables or they'll get overwritten when the case and control data sets are merged
… Or output control data set
  ELSE IF &depend = 0 THEN DO ;
    id_n = &id ;
    depend_n = &depend ;
    OUTPUT no1 ;
  END ;
Notice the data sets were named no1 and yes1.
It becomes evident why shortly.
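The split step can be sketched in Python as follows. This is an illustration only, not the macro itself: the function name split_groups and the dictionary keys "id", "depend" and "prob" are assumptions, and Python's round() stands in for the SAS ROUND function.

```python
import random


def split_groups(records, digits, rng=None):
    """Split records into cases (yes) and controls (no) with a rounded score."""
    rng = rng or random.Random()
    yes, no = [], []
    for rec in records:
        row = {
            "prob": round(rec["prob"], digits),  # rounded propensity score
            "randnum": rng.random(),             # random sort key within a score
        }
        if rec["depend"] == 1:
            # Renamed so case values are not overwritten in the later merge
            row["id_y"] = rec["id"]
            row["depend_y"] = rec["depend"]
            yes.append(row)
        elif rec["depend"] == 0:
            row["id_n"] = rec["id"]
            row["depend_n"] = rec["depend"]
            no.append(row)
        # Records with any other value of depend go to neither group
    return yes, no


if __name__ == "__main__":
    sample = [
        {"id": 1, "depend": 1, "prob": 0.123456},
        {"id": 2, "depend": 0, "prob": 0.123454},
    ]
    cases, controls = split_groups(sample, digits=5, rng=random.Random(1))
    print(cases)
    print(controls)
```

As in the DATA step, each output row keeps only four values: the rounded score, the renamed ID, the renamed dependent variable and the random number.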
/* Runs through control and experimental and matches up to 20 subjects with identical propensity score */
%Do i = 1 %to 20 ;
  %let j = %eval(&i + 1) ;
  proc sort data = yes&i ;
    by &prob randnum ;
  data yes&i yes&j ;
    set yes&i ;
    by &prob ;
    if first.&prob then output yes&i ;
    else output yes&j ;
NOTE: Matching without replacement
Same thing for controls
  proc sort data = no&i ;
    by &prob randnum ;
  data no&i no&j ;
    set no&i ;
    by &prob ;
    if first.&prob then output no&i ;
    else output no&j ;
Sorting by randnum ensures that subjects with matching scores are pulled at random
Merge matches, end loop
  DATA match&i ;
    MERGE yes&i (in = ina) no&i (in = inb) ;
    BY &prob ;
    IF ina AND inb ;
  run ;
%END ;
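The inner loop above pairs, at each rounded score, the first case with the first control, the second with the second, and so on up to 20, in random order and without replacement. A minimal Python sketch of that idea follows. It is not the macro: the function name match_within_score and the row keys "prob", "randnum" and "id" are assumptions made for the example.

```python
from collections import defaultdict


def match_within_score(yes, no, max_per_score=20):
    """Pair the k-th case with the k-th control at each rounded score."""

    def by_score(rows):
        groups = defaultdict(list)
        for row in rows:
            groups[row["prob"]].append(row)
        for rows_at_score in groups.values():
            # Same role as sorting BY &prob randnum: random order within a score
            rows_at_score.sort(key=lambda r: r["randnum"])
        return groups

    yes_groups, no_groups = by_score(yes), by_score(no)
    matches = []
    # Only scores present in both groups can produce a match (IF ina AND inb)
    for score in sorted(yes_groups.keys() & no_groups.keys()):
        cases = yes_groups[score][:max_per_score]
        controls = no_groups[score][:max_per_score]
        # zip stops at the shorter list, so each subject is used at most once
        for case, control in zip(cases, controls):
            matches.append(
                {"prob": score, "id_y": case["id"], "id_n": control["id"]}
            )
    return matches


if __name__ == "__main__":
    yes = [
        {"id": "c1", "prob": 0.5, "randnum": 0.9},
        {"id": "c2", "prob": 0.5, "randnum": 0.1},
        {"id": "c3", "prob": 0.7, "randnum": 0.4},
    ]
    no = [
        {"id": "n1", "prob": 0.5, "randnum": 0.3},
        {"id": "n2", "prob": 0.8, "randnum": 0.2},
    ]
    print(match_within_score(yes, no))
```

In the sample data only one control has a score of 0.5, so only one of the two cases at that score is matched, and the case with the smaller random number is the one chosen.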
/* Adds all matches into a single data set */
DATA allmatches ;
  SET %DO k = 1 %TO 20 ; match&k %END ; ;
Concatenate all data sets with matches (N=20)
Create two data sets with IDs
DATA allyes (RENAME = (id_y = &id depend_y = &depend))
     allno (RENAME = (id_n = &id depend_n = &depend)) ;
SET allmatches ;
Create one file of all matched IDs
DATA matchfile ;
SET allyes allno ;
And sort it …
proc sort data = matchfile ;
  by &id &depend ;
DATA MATCHES&DIGITS IN&NEXTIN ;
MERGE IN&DIGITS (IN = INA)
MATCHFILE (IN= INB) ;
BY &ID &DEPEND ;
IF INA AND INB THEN OUTPUT
MATCHES&DIGITS ;
ELSE OUTPUT IN&NEXTIN ;
/* Creates a data set of all subjects with n-digit match */
/* Creates a second data set of subjects with no match */
TITLE "MATCHES &ROUNDTO " ;
PROC FREQ DATA = MATCHES&DIGITS ;
  TABLES &DEPEND ;
RUN ;
%END ;
JUST A GOOD HABIT TO CHECK AS THE LOOP RUNS THROUGH
End loop. Now match to 4 decimal places, etc
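Putting the passes together: subjects matched at 5 decimals are set aside, and only the unmatched ones are carried into the 4-decimal pass, and so on down to 1 decimal. The Python sketch below illustrates that cascade under stated assumptions. It is not a translation of the macro: the name greedy_match, the record keys "id", "depend" and "prob", the seed argument, and the assumption that IDs are unique are all introduced for the example.

```python
import random
from collections import defaultdict


def greedy_match(records, passes=(5, 4, 3, 2, 1), max_per_score=20, seed=0):
    """Match cases to controls on rounded scores, loosening precision each pass.

    Returns (matched, remaining): matched is a list of
    (case_id, control_id, digits) tuples, remaining holds unmatched records.
    """
    rng = random.Random(seed)
    remaining = list(records)
    matched = []
    for digits in passes:
        cases, controls = defaultdict(list), defaultdict(list)
        for rec in remaining:
            key = round(rec["prob"], digits)
            entry = (rng.random(), rec["id"])  # random number sorts within a score
            if rec["depend"] == 1:
                cases[key].append(entry)
            elif rec["depend"] == 0:
                controls[key].append(entry)
        used = set()
        for key in cases.keys() & controls.keys():
            case_rows = sorted(cases[key])[:max_per_score]
            control_rows = sorted(controls[key])[:max_per_score]
            for (_, case_id), (_, control_id) in zip(case_rows, control_rows):
                matched.append((case_id, control_id, digits))
                used.update((case_id, control_id))
        # Unmatched subjects feed the next, coarser pass (the IN&NEXTIN data set)
        remaining = [rec for rec in remaining if rec["id"] not in used]
    return matched, remaining


if __name__ == "__main__":
    sample = [
        {"id": "A", "depend": 1, "prob": 0.12345},
        {"id": "B", "depend": 0, "prob": 0.12345},
        {"id": "C", "depend": 1, "prob": 0.301},
        {"id": "D", "depend": 0, "prob": 0.304},
        {"id": "E", "depend": 1, "prob": 0.9},
    ]
    pairs, unmatched = greedy_match(sample)
    print(sorted(pairs))
    print([rec["id"] for rec in unmatched])
```

In the sample, A and B agree to 5 decimals and are paired on the first pass, C and D only agree once scores are rounded to 2 decimals, and E is never matched because no control has a similar score.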
/* Adds 1- to 5-digit matches into a single data set */
data &lib..finalset ;
  set %do m = 1 %to 5 ; matches&m %end ; ;
One final check & done !
Title "Distribution of Dependent Variable in &lib..finalset " ;
proc freq data = &lib..finalset ;
  tables &depend ;
run ;
%mend propen ;
run ;
Did it work?
Variable     QUINTILES                        NEAREST NEIGHBOR
             AT Home    NOT Home   Prob       AT Home   NOT Home   Prob
Age          79.2       79.3       .60        79.1      79.1       .76
ER visits    4.5 ****   3.8 ****   .0001      4.2       4.2        .88
Female       52%        54%        .36        50%       50%        .74
Race                               .97                             .67
** P < .01   **** P < .0001
Model Comparison
TEST               Without Matching   Quintile Matching   Nearest Neighbor
Likelihood Ratio   643.1              180.8               186.6
Score              582.4              176.0               181.4
Wald               485.6              165.7               170.4