The Statistical Administrative Records System and Administrative Records Experiment 2000: System Design, Successes, and Challenges

Dean H. Judson
Planning, Research and Evaluation Division, U.S. Census Bureau



Outline of Presentation

• General principles for using administrative records properly

• Overview of StARS/AREX history, goals and design

• Applications and evaluations: StARS 1999 and StARS 2000 versus Census 2000


General Principles for Using Administrative Records Properly

How Administrative Records Are Created and Used

Layers, from the real world up to what the user sees:

• Events and Objects (population)

• Observed Events and Objects (“sampling frame”)

• Recorded Events and Objects (administrative record)

• Database

• Presentation (query results and displays)

Issues arising between the layers:

• Data collection

• Policy changes which change the definition of events and objects

• “Ontologies” and thresholds for observation

• Data entry errors and coding schemes

• Data management issues

• Query structure and spurious structure

Some Important Principles

• Database ≠ Population!

• Database ≠ Truth!

• The “true” Data exist in the “real world”, as does the “true” Population.

• But the database gives us information that points to the Truth and to the Population.

[Figure: two diagrams contrasting a database population with the population it is meant to represent.]

• Population in StARS Database versus Resident U.S. Population on April 1, 2000: the database also contains Deceased persons, Non-U.S. Residents, and Accidental Duplication.

• Population in Employee Database versus “Current” employees of Company X, October 1, 2001: the database also contains contractors (“Oops! Accidentally included contractors!”), employees who were Terminated but whose termination is not yet entered in the database, and Accidental Duplication.

Ontologies and Data Quality

[Figure: four panels, each mapping “real world” states (State 1, State 2, and so on) to database states: Proper Representation, Incomplete Representation, Ambiguous Representation, and Meaningless States.]

Data Quality: The function that maps from the “real world” to the database allows one to reconstruct the “real world” from the database values. Source: Wand and Wang, 1996:90
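The four panels can be checked mechanically. The sketch below is an illustration under our own assumptions, not Census Bureau code: states are plain labels, the representation is a Python dict from each “real world” state to the database state it is recorded as, and the state names in the usage example are hypothetical.

```python
from collections import defaultdict


def classify_representation(mapping, real_states, db_states):
    """Classify a "real world" -> database mapping using Wand and Wang's categories.

    mapping:     dict from a real-world state to the database state recording it;
                 real-world states absent from the dict are not represented.
    real_states: every state the real world can be in.
    db_states:   every state the database can hold.
    Returns {"proper"} or the set of deficiencies found.
    """
    problems = set()
    # Incomplete representation: a real-world state with no database state.
    if any(r not in mapping for r in real_states):
        problems.add("incomplete")
    # Group real-world states by the database state they map to.
    preimages = defaultdict(list)
    for real, db in mapping.items():
        preimages[db].append(real)
    # Ambiguous representation: one database state stands for several
    # real-world states, so the real world cannot be reconstructed.
    if any(len(reals) > 1 for reals in preimages.values()):
        problems.add("ambiguous")
    # Meaningless states: database states that no real-world state maps to.
    if any(db not in preimages for db in db_states):
        problems.add("meaningless")
    return problems or {"proper"}


real = ["alive, at address", "deceased", "moved away"]
db = ["ACTIVE", "INACTIVE"]
recorded = {"alive, at address": "ACTIVE", "deceased": "INACTIVE", "moved away": "INACTIVE"}
print(classify_representation(recorded, real, db))  # {'ambiguous'}
```

In this hypothetical mapping a single INACTIVE value cannot distinguish a death from a move, so the real-world state cannot be recovered from the database value.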

Coverage versus Intensity/Content: How can we get the best of both?

[Figure: coverage of the target population (horizontal axis, low to high) against intensity/content of data collection (vertical axis, low to high). Administrative Records/Data Warehouse: high coverage, low intensity/content. Careful, well-done sample survey: low coverage, high intensity/content.]

A Model for “Borrowing Strength”

[Figure: a Representative Sample of X is drawn from the Original DW Database (X); Carefully Collected Data (Y) on that sample serve as “Ground Truth”; the Estimated Model Y=f(X) is then applied to produce an Augmented DW Database, with X and estimated Y’s.]
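A minimal sketch of the borrowing-strength idea, assuming a single numeric X and a straight-line f fitted by ordinary least squares on the carefully collected sample. The slide does not say what form f takes, and `fit_line` and `augment` are hypothetical names.

```python
def fit_line(xs, ys):
    """Ordinary least squares estimate of y = a + b*x with one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def augment(database_x, sample_x, sample_y):
    """Estimate Y = f(X) on the sample ("ground truth"), then attach an
    estimated Y to every record of the original database."""
    a, b = fit_line(sample_x, sample_y)
    return [(x, a + b * x) for x in database_x]


# Hypothetical data: X is on every database record, Y only on the sample.
sample_x, sample_y = [1, 2, 3], [3, 5, 7]
print(augment([0, 10], sample_x, sample_y))  # [(0, 1.0), (10, 21.0)]
```

The expensive Y is collected only on the sample, while the model carries it to every record in the database; how good the augmented values are depends on how well f holds outside the sample.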

Statistical Administrative Records System and Administrative Records Experiment

Background and History

• Statistical Administrative Records System

– Six large Federal input files: IRS 1040, IRS 1099, Selective Service, Medicare, Indian Health Service, HUD-TRACS/MTCS

– One lookup file: SSA/Census NUMIDENT

• AREX 2000

– Attempt to use StARS data to simulate an administrative records census

What Was the Purpose of StARS 1999 and AREX 2000?

• Test the feasibility of an administrative records census

– StARS: Nationwide

– AREX: two counties in Maryland, three in Colorado

• MD: 1.4M persons in 558K households

• CO: 1.2M persons in 459K households

• Test two methods for conducting an administrative records census

– Top-down method

– Bottom-up method (match to address list, additional operations)

Can We Do This?

• Title 13, U.S. Code (§6(a)-(c), abridged):

– “The Secretary…may call upon any other department…of the Federal Government…for information pertinent to the work provided for in this title…To the maximum extent possible, the Secretary…shall use [such] information instead of conducting direct inquiries”

• Privacy Act of 1974 (5 U.S.C. §552a, abridged):

– “No agency shall disclose any record…unless…to the Bureau of the Census for purposes of planning or carrying out a census or survey or related [title 13] activity”

– “Each agency that maintains a system of records shall…publish in the Federal Register upon establishment…the existence and character of the system of records” (StARS published in the Federal Register, January 1999)

The Statistical Administrative Records System-1999

Flow chart of StARS 1999 processing, with record counts at each stage:

Input files:

• TY98 IRS 1040: 119,946,193

• TY98 IRS 1099: 598,075,971

• Medicare: 56,837,022

• Selective Service: 13,176,234

• HUD TRACS: 3,342,234

• Indian Health Service: 3,106,821

Edited files: Edited IRS 1040 (243,260,776), Edited IRS 1099, Edited Medicare, Edited Selective Service, Edited HUD TRACS, Edited Indian Health Service

Lookup files:

• NUMIDENT: 676,589,439

• Census NUMIDENT: 396,185,872

• Person Characteristics File (PCF): 396,185,872

Address processing:

• Address Processing: 795,742,702

• Hygiene & Unduplication: 136,154,293

• Geocoding: 102,965,122 (75.6% coded); 33,189,171 (24.4% uncoded)

Person processing:

• Person Processing: 875,750,973

• SSN Validation (PVS): 844,945,296 valid (96.5%); 30,805,677 invalid SSNs (3.5%)

• Unduplication: 279,601,038

• Remove Deceased/Create Composite Record: 257,764,909

• Extraction of AREX Test Site Records: 1,459,760 in Baltimore Site; 1,229,274 in Colorado Site

Supporting inputs shown on the chart: Race Model, Gender Model, Mortality Model, TIGER, Code 1, ABI, “? Research”
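The percentages on the flow chart follow from its stage counts. The short check below re-derives them; it only redoes arithmetic on numbers already shown on the chart and is not part of StARS.

```python
# Stage counts transcribed from the StARS 1999 flow chart.
PERSON_RECORDS = 875_750_973
VALID_SSN = 844_945_296
INVALID_SSN = 30_805_677
ADDRESSES_AFTER_UNDUP = 136_154_293
GEOCODED = 102_965_122
UNCODED = 33_189_171


def pct(part, whole):
    """Share of `whole` in percent, rounded to one decimal as on the chart."""
    return round(100 * part / whole, 1)


# Valid and invalid SSNs partition the person records; coded and uncoded
# addresses partition the unduplicated address records.
assert VALID_SSN + INVALID_SSN == PERSON_RECORDS
assert GEOCODED + UNCODED == ADDRESSES_AFTER_UNDUP

print(pct(VALID_SSN, PERSON_RECORDS), pct(INVALID_SSN, PERSON_RECORDS))  # 96.5 3.5
print(pct(GEOCODED, ADDRESSES_AFTER_UNDUP), pct(UNCODED, ADDRESSES_AFTER_UNDUP))  # 75.6 24.4
```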

Statistical Administrative Records System-2000 (DRAFT)

Flow chart of StARS 2000 processing, with record counts at each stage:

Input files (edited counts in parentheses):

• TY99 IRS IMF: 124,729,862 (Edited IRS IMF: 253,825,653)

• TY99 IRS IRMF: 583,642,950 (Edited IRS IRMF: 568,109,788)

• Medicare: 59,198,432 (Edited Medicare: 59,197,759)

• Selective Service: 13,370,053 (Edited SSS: 14,538,895)

• HUD TRACS: 1,991,672 (Edited HUD TRACS: 1,991,655)

• HUD MTCS: 6,232,562 (Edited MTCS: 6,208,615)

• Indian Health Service: 2,730,407 (Edited IHS: 2,728,548)

Lookup files:

• NUMIDENT: 721,228,119

• Census NUMIDENT: 408,447,131

• Person Characteristics File (PCF): 408,447,131

Address processing:

• Address Processing: 725,230,009

• Hygiene & Unduplication: 158,593,956

• Geocoding: 125,647,359

Person processing:

• Person Processing: 905,432,071

• SSN Validation: 895,196,891 valid; 10,235,180 invalid SSNs

• Unduplication: 289,968,449

• Remove Deceased/Create Composite Record: 265,950,850

Supporting inputs shown on the chart: Race Model, Gender Model, Mortality Model, TIGER/MAF, Code 1, ABI, “?”

Administrative Records Experiment in 2000 (AREX 2000)

• Five selected sites in Maryland and Colorado

– MD: Baltimore city, Baltimore county

– CO: El Paso county, Douglas county, Jefferson county

• Attempt to simulate an Administrative Records Census

• Not all aspects of an Administrative Records Census are simulated

– Group Quarters survey

– Coverage measurement survey

• Special operations not included in StARS

– Request for physical address (PO Boxes/Rural Routes)

– Clerical hand geocoding

– Field verification of addresses not matched to the DMAF

AREX 2000 Evaluations

• Process: Analyzing selected components of the AREX implementation processing

• Outcomes: Block level analysis: Age/Race/Sex/Hispanicity comparisons to Census 2000

• Household level analysis:

– Comparing household distributions for matched addresses

– Assessing the feasibility of using administrative records in lieu of a field interview to obtain data on nonresponding households

• Available at www.census.gov/pred/www/rpts.html#AREX

• (Synthesis of results from the Administrative Records Experiment in 2000)

Characteristics of Files Included in the StARS System

• IRS Individual Master 1040 File:

– Tax year data; April 2000 refers to “tax year” 1999

– TY ’99 file arrives October 2000

– Business entities, estates, other institutions included

– ~120 million return records/year; maximum of six person records per return

– Households below the filing threshold do not need to file

– Late filers differ systematically from early filers

– Tax Filing Unit ≠ Housing Unit: 10-20% of addresses are PO Boxes, business addresses, tax preparers (Czajka, 2000)

– TY95+: SSN’s of dependents requested, recorded

– 0.5% of primary filers’, 1.6% of secondary filers’, and 3.4% of dependents’ SSN’s in error (Czajka, 1987)

– Age, race, sex, Hispanic origin microdata not available

Characteristics of Files Included in the StARS System, cont.

• IRS Information Returns Master File:

– Tax year data; April 2000 refers to “tax year” 1999

– TY ’99 file arrives October 2000

– Business entities, estates, other institutions included

– ~700 million records/year

– Recipient address ≠ Housing Unit

– 10-20% of addresses are PO Boxes, business addresses, tax preparers

– Extremely limited microdata content: Age, race, sex, Hispanic origin microdata not available; name information often truncated

– Possible source of information on undocumented persons

Characteristics of Files Included in the StARS System, cont.

• Selective Service File:

– Requested 4/1/99(00) file “cut date”

– ~13 million records

– Registration required in 1940, suspended in 1975, resumed in 1980

– Presumably, males 18-25 are required to inform SSS when they move

– Females, non-immigrant aliens, hospitalized, incarcerated, and institutionalized males, and members of the armed forces are exempt

– Limited microdata content: Race, Hispanic origin microdata not available

– Address information may not be current

Characteristics of Files Included in the StARS System, cont.

• Medicare Enrollment Database (EDB):

– Requested 4/1/99(00) file “cut date”: current and historical Medicare enrollment (“Active” and “Inactive” cases)

– ~40 million records at any one point in time

– Recipient Address ≠ Housing Unit

• Proxy recipients listed on the file (e.g., John Doe’s benefits c/o Jane Doe; John Doe’s benefits c/o nursing home)

– Used in population estimates system for 65+ household population estimates

– A small portion of records at any point in time are almost certainly deceased (Kim and Sater, 2000)

– Coverage is high (93-102%) but not perfect and unevenly distributed geographically

• “Snowbird” states appear to have lower ratios of Medicare to 65+ population than “non-snowbird” states (Kim and Sater, 2000)

Characteristics of Files Included in the StARS System, cont.

• Indian Health Service patient file:

– Requested 4/1/99(00) file “cut date”

– ~10 million patient/transaction records

– Transaction record ≠ person record

– Unduplication: about 10 million patient records, 2 million unduplicated SSN’s

– Many missing SSN’s (about 20%)

– Integral part of our race model

Characteristics of Files Included in the StARS System, cont.

• Housing and Urban Development Tenant Rental Assistance Certification System (HUD-TRACS/MTCS):

– Requested 4/1/99(00) file “cut date”

– HUD subsidy payments

– TRACS 1999: ~3.3 million records

– TRACS 2000: ~2 million records

– Short form data for all members of household (Race/Hispanic only for head of household)

– Address information may represent project or landlord address

Characteristics of Files Included in the StARS System, cont.

• Census NUMIDENT File:

– ~700 million transaction records, corresponding to ~400 million individual SSN records

– Post 1985: Enumeration at birth

– For each SSN: Date of birth, gender, race, place of birth

• About 50-60 million persons on the file are deceased but not identified as such

• No current residence information on the file

• Taxpayer ID Numbers (TINs) not on the file

• Demographic properties:

– About 35% of SSN’s on file have alternate names (marriage, divorce, etc.)

– About 6% missing gender

– Race coding has changed (prior to 1980, 3 races: White, Black, Other); 20% either “unknown” or “other”

– About 25% of SSN’s have transactions with different race codes

Creating Final StARS Database

• Select best address and demographics based on

– geocodability

– currency

– quality

• Impute missing demographics (from NUMIDENT/PERSON CHARACTERISTICS FILE)

• Flag records for deceased people

• Final database is structured like a census file
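One way to turn the three criteria into a rule is a lexicographic ranking. This is a sketch under stated assumptions: the slide lists geocodability, currency, and quality but not how they are weighed, so the priority order here (geocodable first, then most recent, then a quality score), the `AddressRecord` fields, and the 0-100 quality score are all hypothetical. The example addresses come from the “changing information states” slide; the IRS date is assumed, since that address is only known to have been filed sometime in 1998.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class AddressRecord:
    source: str       # source file, e.g. "Medicare" or "IRS 1040"
    address: str
    geocodable: bool  # could the address be placed in a census block?
    as_of: date       # when the source last observed this address
    quality: int      # hypothetical 0-100 quality score for the source/field


def best_address(candidates):
    """Choose one address per person: a geocodable address beats a
    non-geocodable one; ties go to the most current, then the highest quality."""
    return max(candidates, key=lambda r: (r.geocodable, r.as_of, r.quality))


candidates = [
    AddressRecord("Medicare", "BOX 2 RURAL ROUTE 37, WESTPORT, VA 32784",
                  geocodable=False, as_of=date(1998, 10, 14), quality=80),
    AddressRecord("IRS 1040", "486 MAIN STREET, FAIRFIELD, VA 33412",
                  geocodable=True, as_of=date(1998, 4, 15), quality=70),  # assumed date
]
print(best_address(candidates).source)  # IRS 1040
```

Under this ordering the older street address wins because it can be geocoded; a rule that put currency first would pick the Medicare rural-route box instead, so the ordering is a substantive decision rather than a technicality.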

Address Processing Results (StARS 1999)

• Almost 800 million addresses at start

• About 6 percent identified as potential businesses

• 136 million address records after unduplication

• About 75 percent geocoded

– 85 percent geocoding rate for city-style addresses

Person Processing Results (StARS 1999)

• 875 million records at start

• 845 million have valid SSN record (96.5%)

• 280 million after unduplication by SSN

• 261 million after removal of known deceased

• 257 million after removal of known deceased and persons residing in outlying territories

• StARS 2000: 266 million after removal of known deceased before April 1, 2000 and persons residing in outlying territories
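The person-processing funnel can be sketched as a chain of filters. This is a simplified illustration: the `PersonRecord` fields are hypothetical, unduplication here keeps the first record per SSN rather than building the composite record StARS creates, and the set of territory codes is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Assumed postal codes for outlying territories; not taken from StARS.
OUTLYING_TERRITORIES = {"PR", "GU", "VI", "AS", "MP"}


@dataclass
class PersonRecord:
    ssn: str
    state: str
    ssn_valid: bool             # outcome of SSN validation
    death_date: Optional[date]  # None when no death is recorded


def process_persons(records, reference_date=date(2000, 4, 1)):
    # 1. Keep records whose SSN validated.
    valid = [r for r in records if r.ssn_valid]
    # 2. Unduplicate by SSN, keeping the first record seen for each SSN.
    by_ssn = {}
    for r in valid:
        by_ssn.setdefault(r.ssn, r)
    # 3. Remove persons known to have died before the reference date.
    alive = [r for r in by_ssn.values()
             if r.death_date is None or r.death_date >= reference_date]
    # 4. Remove persons residing in outlying territories.
    return [r for r in alive if r.state not in OUTLYING_TERRITORIES]


records = [
    PersonRecord("111", "MD", True, None),
    PersonRecord("111", "MD", True, None),              # duplicate SSN
    PersonRecord("222", "CO", True, date(1999, 5, 1)),  # died before April 1, 2000
    PersonRecord("333", "PR", True, None),              # outlying territory
    PersonRecord("444", "CO", False, None),             # invalid SSN
    PersonRecord("555", "CO", True, date(2000, 6, 1)),  # alive on April 1, 2000
]
print([r.ssn for r in process_persons(records)])  # ['111', '555']
```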


Additional Operations of AREX 2000

• Clerical geocoding

• Request for physical address (for P.O. Boxes, Etc.)

• Match to Decennial Master Address File

• Field address verification

Major Analytic Issues with StARS Processing

• Ontologies

– The way in which an administrative agency “defines” the world may not match the way the Census Bureau “defines” the world, e.g.,

– A delivery address suitable for receiving a payment check may not suffice for putting individuals at a street address

– Difficult to distinguish individual units within the Basic Street Address

– Race coding: Hispanic Origin is a separate race on NUMIDENT

– Transaction data ≠ person data

– How many names does a person have (and in what order)?

• Proxies

– IRS & Medicare records, e.g.:

JOHN WILSON
C/O MARY SMITH
1004 LAUREL LANE
ROCKMONT, MD 22345

The address is (presumably) for Mary Smith. John Wilson may or may not live there.
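Proxy addresses can at least be flagged before persons are placed at an address. A minimal sketch, assuming care-of lines begin with “C/O”, “C.O.”, or “CARE OF”; the pattern and the `split_proxy` helper are hypothetical and would miss other proxy conventions.

```python
import re

# Assumed spellings of a care-of line; real files will contain others.
CARE_OF = re.compile(r"^\s*(?:C/O|C\.O\.|CARE OF)\s+(.+)$", re.IGNORECASE)


def split_proxy(address_lines):
    """Return (care_of_name, other_lines); care_of_name is None if no
    care-of line is present. A non-None result means the addressee may
    not live at the address."""
    care_of = None
    other = []
    for line in address_lines:
        match = CARE_OF.match(line)
        if match and care_of is None:
            care_of = match.group(1).strip()
        else:
            other.append(line)
    return care_of, other


record = ["JOHN WILSON", "C/O MARY SMITH", "1004 LAUREL LANE", "ROCKMONT, MD 22345"]
print(split_proxy(record))
# ('MARY SMITH', ['JOHN WILSON', '1004 LAUREL LANE', 'ROCKMONT, MD 22345'])
```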

Major Analytic Issues with StARS Processing, cont.

• Addresses that are difficult to place on the ground

– About 10% of addresses are rural style

– PO Boxes: 45% for IHS, 9.5% for Medicare, 7.5% for IRS 1040, 6.8% for SSS, 3.8% for IRS 1099, 0.4% for HUD-TRACS (Huang and Kim, 2000)

– 1995 IRS/CPS match: 86.5% of tax return cases had the same address as residence address, 94% coded to same county (Sater, 1995)

– For example:

John Smith
H&R BLOCK
P.O. BOX 12
GREENWAY, MD 29752

– Addresses with both business and residential components, e.g.:

Dean H. Judson
JUDSON OLD GROWTH LOGGING SERVICES
45850 BACKWOODS HIGHWAY
BOONDOCKS, OR 96432
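Addresses like these can be flagged with simple patterns before geocoding is attempted. The sketch below is illustrative only: the regular expressions and keyword lists are hypothetical, far smaller than any production list, and they flag candidates for review rather than decide anything.

```python
import re

# Hypothetical patterns; real address files need much richer rules.
PO_BOX = re.compile(r"\bP\.?\s*O\.?\s*BOX\b|\bBOX\s+\d+")
RURAL_ROUTE = re.compile(r"\b(?:RURAL ROUTE|RR)\s*\d+")
BUSINESS = re.compile(r"\b(?:INC|LLC|CORP|COMPANY|SERVICES)\b")
TAX_PREPARERS = ("H&R BLOCK",)


def address_flags(lines):
    """Return flags for features that make an address hard to place on the ground."""
    text = " ".join(lines).upper()
    flags = set()
    if PO_BOX.search(text):
        flags.add("po_box")
    if RURAL_ROUTE.search(text):
        flags.add("rural_route")
    if BUSINESS.search(text):
        flags.add("possible_business")
    if any(name in text for name in TAX_PREPARERS):
        flags.add("possible_tax_preparer")
    return flags


print(sorted(address_flags(["John Smith", "H&R BLOCK", "P.O. BOX 12", "GREENWAY, MD 29752"])))
# ['po_box', 'possible_tax_preparer']
```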

Major Analytic Issues with StARS Processing, cont.

• Unduplication and matching

– Addresses and personal characteristics are measured with substantial variation

• Often not obvious whether a particular pair of records represents a duplicate or not.

• Yet, with multiple files, unduplication decisions must be made.

– Address matching:

101 Elm Rd, # 1 97132
101 Elm St, apt 1 97701

Versus

101 Elm Rd, #1 97132
101 Elm St, apt 1 97132
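A small sketch shows why normalization helps but does not settle the question. Assuming each address has the form “number name suffix, unit ZIP”, and using hypothetical abbreviation tables, normalization makes “# 1”, “#1”, and “apt 1” identical; the Rd/St conflict remains in both pairs, and the pairs differ only in whether their ZIP codes agree. Deciding whether the second pair is a duplicate still needs a decision rule (for example, a record-linkage weight), which is beyond this sketch.

```python
import re

# Hypothetical normalization tables; production tables are far larger.
SUFFIXES = {"RD": "ROAD", "ROAD": "ROAD", "ST": "STREET", "STREET": "STREET"}
UNIT_WORDS = {"#", "APT", "UNIT"}
FIELDS = ("number", "name", "suffix", "unit", "zip")


def parse(address):
    """Normalize an address assumed to look like '101 Elm Rd, # 1 97132'."""
    tokens = re.findall(r"#|[A-Z0-9]+", address.upper())
    zip_code = tokens.pop()  # assume the ZIP code is the last token
    number, name = tokens[0], tokens[1]
    suffix = SUFFIXES.get(tokens[2], tokens[2])
    unit = "".join(t for t in tokens[3:] if t not in UNIT_WORDS)
    return number, name, suffix, unit, zip_code


def agreement(a, b):
    """Names of the fields on which two addresses agree after normalization."""
    return [f for f, x, y in zip(FIELDS, parse(a), parse(b)) if x == y]


print(agreement("101 Elm Rd, # 1 97132", "101 Elm St, apt 1 97701"))
# ['number', 'name', 'unit']
print(agreement("101 Elm Rd, #1 97132", "101 Elm St, apt 1 97132"))
# ['number', 'name', 'unit', 'zip']
```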

Major Analytic Issues with StARS Processing, cont.

• Variations in data from different sources

– Of the 50% of SSN’s found on multiple files:

• about 1% have more than one gender recorded

• about 32% have multiple addresses

• about 2% have multiple races (Huang and Kim, 2000)

• “Imputation” from the NUMIDENT

– Many files have limited microdata. For those that are found on the NUMIDENT, we can “impute” microdata from the approximately equivalent NUMIDENT fields.

• Race Model (Bye, 1998, 1999)

• Gender Model (Thompson, 1999)

• Mortality Model (Falkenstein, Resnick, and Judson, 2000)

– StARS 2002 “NUMIDENT Race Enhancement”

• Match NUMIDENT to Census 2000

• Use Census 2000 race response to improve imputation model
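Rates like those on this slide come from grouping records by SSN across files. A minimal sketch, assuming each record arrives as (SSN, source file, dict of attributes) with missing values as None; the function name and the toy records are hypothetical, and the toy rates say nothing about the real files.

```python
from collections import defaultdict


def conflict_rates(records, fields=("gender", "address", "race")):
    """Among SSNs found on more than one source file, return for each field
    the share whose non-missing values disagree across records."""
    sources = defaultdict(set)
    values = defaultdict(lambda: defaultdict(set))
    for ssn, source, attrs in records:
        sources[ssn].add(source)
        for field in fields:
            value = attrs.get(field)
            if value is not None:
                values[ssn][field].add(value)
    multi = [ssn for ssn, srcs in sources.items() if len(srcs) > 1]
    if not multi:
        return {field: 0.0 for field in fields}
    return {field: sum(len(values[ssn][field]) > 1 for ssn in multi) / len(multi)
            for field in fields}


records = [
    ("111", "IRS 1040", {"address": "486 MAIN STREET"}),
    ("111", "Medicare", {"gender": "M", "address": "BOX 2 RURAL ROUTE 37", "race": "White"}),
    ("222", "IRS 1040", {"address": "12 OAK AVE"}),
    ("222", "SSS", {"gender": "M", "address": "12 OAK AVE"}),
    ("333", "Medicare", {"gender": "F"}),
]
print(conflict_rates(records))  # {'gender': 0.0, 'address': 0.5, 'race': 0.0}
```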

Major Analytic Issues with StARS Processing, cont.

• Changing information states

– Distinct problem from “point in time” data collection

– Information states change over time/over databases

• Address information ages over time and varies over databases:

SAM SMITH
BOX 2 RURAL ROUTE 37
WESTPORT, VA 32784
(Dated 10/14/98 from Medicare)

SAM SMITH
486 MAIN STREET
FAIRFIELD, VA 33412
(From TY97 IRS file, filed sometime in 1998)

• Mortality information ages over time and varies over databases

• One database provides information about the other, provided that matching can be performed

• Data processing requires complex, and substantively important, decision logic at each step


Applications and Evaluations

Applications

• SSN search and validation with GEOkey

– Earlier: 90% found in validation step, 5% in search step

– 2001 Evaluation: 92% found in search (with GEOkey) alone

– Apparently, our computer search outperforms the SSA manual system

• CPS/NHIS/ACS to Census matching evaluations

– Compare different race responses

– Compare survey and Census coverage

– Compare variations in Poverty estimates

• Evaluation of synthetic estimation methods (Popoff, Judson and Fadali, 2001)

• Multiple-system Estimation for coverage evaluation

– Additional information to aid dual-system estimation (Asher and Fienberg, 2001)

– Erroneous enumerations (Biemer, Brown, Wiesen, and Judson, 2001)

Applications

• Nonresponse followup (NRFU) substitution (’04 simulation test)

• Imputation methods improvement (’04 simulation test)

• Master Address File (MAF) targeting

• Census unduplication confirmation

• Population estimation (postcensal estimates)

• Survey improvement (noninterview adjustments)

Evaluations

• Numident/PCF 1998 versus 1998 National estimates (Miller, Judson and Sater, 2000)

• State level comparisons of StARS 2000 versus Census 2000

• County StARS-synthetic methods versus county ratio estimates and Census 2000

• Detailed comparison by (fully crossed) age, race, sex, and Hispanic origin counts versus Census 2000, at the county level

• AREX tract, block, household evaluations on February 19th

Population Distribution by Age

[Figure: percent of population in each ten-year age group (Under 10 through 90+; vertical axis 0% to 20%) for the national estimates, the PCF population BEFORE applying the mortality model, and the PCF population AFTER applying the mortality model.]

Numident/PCF 1998 versus 1998 National Estimates

Percent Distribution

Numident/PCF 1998 versus 1998 National Estimates

Race and Hispanic origin | National Estimates | PCF File
White Non-Hispanic | 72.4% | 69.2%
Black Non-Hispanic | 12.1% | 12.5%
API Non-Hispanic | 3.6% | 4.3%
AI Non-Hispanic | 0.7% | 0.8%
White Hispanic | 10.1% | 12.7%
Black Hispanic | 0.6% | 0.3%
API Hispanic | 0.2% | 0.1%
AI Hispanic | 0.1% | 0.1%

State Level Comparisons of Census 2000 to StARS 2000

[Figure: ratio of Census 2000 to StARS 2000 population for each state, the District of Columbia, and the U.S. total; vertical axis from 0.95 to 1.13.]

Over the entire U.S., Census 2000 is about 6% higher than StARS 2000. Alaska is the only state where StARS 2000 exceeds Census 2000.

County StARS-synthetic Methods versus 1999 Estimates

Comparison of 99 Estimates and StARS 99 Race/Sex Distribution (Three Counties in Colorado)

[Figure: percent of population in each race/sex group (White, Black, AIAN, and API; male and female) from the 99 Estimates and StARS 99; vertical axis from 0% to 50%.]

County StARS-synthetic methods versus 1999 Estimates versus Census 2000

% Hispanic (StARS 99 vs. 99 Estimates vs. Census 2000, selected counties in Colorado where StARS and Estimates deviate by more than 4 percentage points)

[Figure: percent Hispanic (vertical axis 0 to 90) from StARS 99, Census 2000, and the 99 Estimates for Alamosa, Archuleta, Bent, Chaffee, Conejos, Costilla, Crowley, Fremont, Garfield, Huerfano, Kiowa, La Plata, Las Animas, Lincoln, Mineral, Morgan, Otero, Phillips, Pueblo, Saguache, and San Juan counties.]

Counties in which StARS 99 is closer to Census 2000 are marked with a star.

Fully crossed age, race, sex, and Hispanic Origin array (ARSH array)

• For every county in the U.S., count the number of nondeceased persons by:

– Single year of age (0 through 100, plus 101+)

– Race (four groups)

– Sex (two groups)

– Hispanic origin (Hispanic/non-Hispanic)

– Potentially 102 x 4 x 2 x 2 = 1,632 cells per county; 3,141 x 1,632 = 5,126,112 cells in the U.S.

• Error Measures, where C is the Census 2000 count and S the StARS count for a cell:

– Simple difference (C-S)

– Algebraic percent error (S-C)/C
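The cell count and the two error measures can be checked directly. In this sketch the four race labels are an assumption (they match the groups on the county race/sex chart), C is the Census 2000 count and S the StARS count for a cell, and the percent error is (S-C)/C scaled by 100; it is left undefined (None) when the Census cell is empty.

```python
from itertools import product

AGES = [str(a) for a in range(101)] + ["101+"]  # single years 0-100, plus 101+
RACES = ["White", "Black", "AIAN", "API"]        # assumed labels for the four groups
SEXES = ["M", "F"]
HISPANIC = ["Hispanic", "Non-Hispanic"]
COUNTIES = 3141

CELLS_PER_COUNTY = sum(1 for _ in product(AGES, RACES, SEXES, HISPANIC))
print(CELLS_PER_COUNTY, COUNTIES * CELLS_PER_COUNTY)  # 1632 5126112


def cell_errors(census, stars):
    """Return (simple difference C - S, algebraic percent error 100*(S - C)/C)."""
    difference = census - stars
    percent_error = 100 * (stars - census) / census if census else None
    return difference, percent_error


print(cell_errors(200, 180))  # (20, -10.0)
```

The two measures answer different questions: the simple difference is dominated by large counties and common cells, while the percent error is volatile in small cells, where a handful of persons can produce very large percentages.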

Note: Each data point is a single county’s ARSH cell.

Note: Each data point is a single county’s ARSH cell.

Age/Sex distributions, selected counties in Texas

[Figure: four panels, each plotting StARS % of total population and Census % of total population by single year of age (0 to 90), separately for females (F) and males (M). Panels: Anderson County (N of Houston); Andrews County (Far west, NM border); Atascosa County (Southern part of state); Brazos County (W of Houston).]

Concluding Thoughts

• Historians of science will say that there was an “explosion” of research into Administrative Records and Data Warehousing in the late 20th/early 21st century

• Using these databases in a statistically principled way requires a new statistical paradigm:

– Not survey sampling per se

– Not econometric modeling per se

– Not coverage measurement per se

– Something new

• These databases share some data quality issues with usual survey or census data, but have many different ones

• We are attacking these issues with real Census applications

Page 48: The Statistical Administrative Records System and Administrative Records Experiment 2000: System Design, Successes, and Challenges Dean H. Judson Planning,

For Further Reading• Alvey, W., and Scheuren, F. (1982). Background for an Administrative Records Census. Proceedings of the Social

Statistics Section. Alexandria, VA: American Statistical Association.• Asher, J., and Feinberg, S. (2001). Statistical Variations on an Administrative Records Census. Proceedings of the Social

Statistics Section. Alexandria, VA: American Statistical Association.• Biemer, P., Brown, G., Weisen, C., and Judson, D.H. (2001). Triple system estimation in the presence of erroneous

enumerations. Proceedings of the Social Statistics Section. Alexandria, VA: American Statistical Association. Under review at the Journal of Official Statistics.

• Bye, B. (1997). Administrative Record Census for 2010 Design Proposal, Final Report. Rockville, MD: Westat, Inc.• Bye, B. (1998). Race and ethnicity modeling with SSA Numident Data: Interim report: File development and tabulations.

Unpublished document available from the U.S. Bureau of the Census.• Bryant, C. (1995). Comparing the LUCA address list to “local records.” Paper presented at the 1995 State Data Center

Meeting, San Francisco, CA, April 4, 1995.• Czajka, J., Moreno, L., and Schirm, A.L. (1997). On the Feasibility of Using Internal Revenue Service Records to Count

the U.S. Population. Washington, DC: Mathematica Policy Research, Inc.• Czajka, J. (1999). Can we count on administrative records in future U.S. Censuses? Presentation at the Bureau of the

Census, December 15, 1999.• Falkenstein, Matthew, Resnick, Dean R., and Judson, Dean. H. (2000). The Mortality Module of the Statistical

Administrative Records System. Administrative Records Memorandum Series, U.S. Census Bureau.• Farber, Jim, and Shaw, Kevin M. (2002). Dual System Estimates of Housing Units Based on Administrative Records. To

appear in the 2002 Proceedings of the American Statistical Association, Government Statistics Section [CD-ROM], Alexandria, VA: American Statistical Association.

• Heimovitz, Harley K (2002). Administrative Records Experiment 2000: Outcomes. To appear in the 2002 Proceedings of the American Statistical Association, Government Statistics Section [CD-ROM], Alexandria, VA: American Statistical Association.

• Huang, E., and Kim, J. (2000). One Percent Sample Study Report (SRD-DRAFT). Unpublished document available from the U.S. Bureau of the Census, February 10, 2000.

Page 49: The Statistical Administrative Records System and Administrative Records Experiment 2000: System Design, Successes, and Challenges Dean H. Judson Planning,

For Further Reading

• Judson, D.H., and Popoff, C.L. (2000). Research Use of Administrative Records. University of Nevada: Nevada State Demographer’s Office.

• Judson, D.H. (2000). The Statistical Administrative Records System: System Design, Successes, and Challenges. Paper presented at the 2000 Data Quality Workshop, Morristown, NJ, November 30-December 1, 2000.

• Judson, D.H., Popoff, Carole L., and Batutis, Michael (2001). An Evaluation of the Accuracy of U.S. Census Bureau County Population Estimation Methods. Statistics in Transition, 5:185-215.

• Judson, D.H. (2001). A Partial Order Approach to Record Linkage. Paper presented at the Federal Committee on Statistical Methodology, Washington, DC, November 14, 2001.

• Judson, D.H. (2002). Adventures in Bayesian Record Linkage. Paper presented at the Classification Society of North America, June 11, 2002.

• Judson, Dean H. (2002). Merging Administrative Records Databases in the Absence of a Register: Data Quality Concerns and Outcomes of an Experiment in Administrative Records Use. Paper presented at the UNECE-EUROSTAT work session on registers and administrative records in social and demographic statistics, Geneva, Switzerland, 9-11 December 2002.

• Kim, M. O., and Sater, D. (2000). Defining the Medicare Data Universe for the U.S. Census Bureau's Population Estimates Program. Paper presented at the Southern Demographic Association meetings, New Orleans, LA, August 29, 2000.

• Leggieri, Charlene, and Prevost, Ron (1999). Expansion Of Administrative Records Uses At The Census Bureau: A Long-Range Research Plan. Paper presented at the November 1999 Meeting of the Federal Committee on Statistical Methodology, Washington, DC.

• Miller, E., Judson, D.H., and Sater, D. (2000). The 100% Census NUMIDENT: Demographic Analysis of Modeled Race and Hispanic Origin Estimates Based Exclusively on Administrative Records Data, Paper presented at the Southern Demographic Association meetings, New Orleans, LA, August 29, 2000.

• Popoff, C.L., Judson, D.H., and Fadali, Betsy (2001). Measuring the Number of People Without Health Insurance: A Test of a Synthetic Estimates Approach for Small Area Estimates using SIPP Microdata. Paper presented at the Federal Committee on Statistical Methodology, Washington, DC, November 14, 2001.


For Further Reading

• Sailer, P., Weber, M., and Yau, E. (1993). How Well Can IRS Count the Population? 1993 Proceedings of the Survey Research Methods Section. Alexandria, VA: American Statistical Association.

• Sater, D. (1995). Differences in Location of Households and Tax Filing Units. Paper presented at the 1995 meeting of the Population Association of America, San Francisco, CA, April 6, 1995.

• Stuart, E., and Zaslavsky, A.M. (2002). Using administrative records to predict census day residency. In Constantine Gatsonis, Robert E. Kass, Alicia Carriquiry, Andrew Gelman, David Higdon, Donna K. Pauler, Isabella Verdinelli (Eds.), Case Studies in Bayesian Statistics Volume VI. New York, NY: Springer.

• Thompson, Herbert (1999). The Development of a Gender Model with SSA Numident Data. Administrative Records Research Memorandum Series #32, U.S. Census Bureau.

• Wand, Y., and Wang, R. Y. (1996). Anchoring data quality dimensions in ontological foundations. Communications of the ACM, 39: 86-95.

• Zanutto, Elaine, and Zaslavsky, Alan M. (2001). Using Administrative Records to Impute for Nonresponse. In R. Groves, R.J.A. Little, and J. Eltinge (Eds.), Survey Nonresponse. New York: John Wiley.


Glossary of Terms

• Administrative records: Data collected primarily to administer a regulation or to record a transaction, rather than for data collection per se.

• Administrative Records Census: A Census of Population and Housing in which the predominant component of census-taking is carried out using administrative records databases. In practice, it is often combined with field operations (for example, for coverage measurement or for Group Quarters enumeration).

• AREX2000: Administrative Records Experiment in 2000, an experimental attempt to simulate an “Administrative Records Census” in two sites in the U.S.

• Basic Street Address: The primary street number and street name, omitting apartment numbers or other within-structure identifiers.

• CPS: Current Population Survey, an ongoing survey administered by the U.S. Census Bureau.

• Data Quality: The ability to construct a mapping from the ontological representation of a data item in a database to its appropriate ontological representation in the “real world.”

• Master Address File (MAF): A file of addresses maintained by the U.S. Census Bureau for the purpose of taking its decennial census, and acting as a frame for ongoing sample surveys. The Decennial Master Address File is referred to as the “DMAF.”

• Master Housing File: A file of addresses developed by the Statistical Administrative Records System.

• Microdata: Data on the characteristics of individual persons or housing units, e.g., race, sex, age, street address, ZIP code.

• Ontology: The study of “what is”, that is, the categories by which we understand the world.

• StARS: Statistical Administrative Records System, an experimental database that combines information from several major Federal databases into one database that can be used for census-taking purposes.
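The Basic Street Address definition above can be illustrated with a minimal Python sketch. It assumes a one-line address string and a short, hypothetical list of within-structure unit designators (`UNIT_DESIGNATORS`); the function name `basic_street_address` is likewise illustrative, and this is not the procedure StARS actually used.

```python
import re

# Hypothetical, abbreviated list of within-structure unit designators.
# A real address standardizer would use a much fuller list.
UNIT_DESIGNATORS = ("APT", "APARTMENT", "UNIT", "STE", "SUITE", "RM", "ROOM", "FL", "FLOOR")


def basic_street_address(address: str) -> str:
    """Reduce a one-line street address to its basic street address.

    Keeps the primary street number and street name and drops everything
    from the first within-structure identifier onward (e.g., 'APT 4B', '#12').
    """
    # Normalize case and treat periods and commas as token separators.
    tokens = re.sub(r"[.,]", " ", address.upper()).split()
    kept = []
    for tok in tokens:
        # Stop at a unit designator word or a '#'-prefixed unit number.
        if tok in UNIT_DESIGNATORS or tok.startswith("#"):
            break
        kept.append(tok)
    return " ".join(kept)


if __name__ == "__main__":
    for addr in ("123 Main St, Apt 4B", "123 Main St Apt 7", "456 Oak Ave #12"):
        print(addr, "->", basic_street_address(addr))
```

Under this definition, "123 Main St, Apt 4B" and "123 Main St Apt 7" reduce to the same basic street address, "123 MAIN ST", because only the within-structure identifier differs.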