
The Employer Universe: The Business Register and the Longitudinal Business Database

Javier Miranda, U.S. Census Bureau

Census Research Data Center Network

Overview

• Employer universe business data at the US Census Bureau

• Research

• Public use versions

The Business Register: The Census Bureau’s Business Master List

• Universe coverage of employers in the U.S. with IRS filings
• Transaction list of administrative records (income, payroll)
• Enhanced with Census Collections to provide detail

• Origin and Use
  • Enumeration list for census and frame for surveys
  • Central storage of admin data for statistical products
  • Source data for Census products (CBP, LBD, BDS, BITS…)

• Structure:
  • Annual snapshots back to 1974, Single/Multi unit files

• Statistical Units:
  • EIN (the admin unit), Establishments and firms

The Business Register: The Census Bureau Business Master List

• Data in the BR
  • Industry, Geography, Employment, Payroll, LFO, Sales, Name and Address…

• Data often require substantial value added to be utilized for research.

• Solution: The LBD

Longitudinal Business Database (LBD)

• Longitudinal Universe Database of US Employer Business Establishments
• Uses Census Business Register longitudinal linkages of both firms and establishments
  – Census uniquely tracks firms and establishments through Company Organization Survey and Economic Censuses (and other surveys)
• All employers in the U.S.
  – Complete sectoral coverage
  – Detailed geography and industry
  – Basic backbone to which all other Census business data can be linked
• Long time series 1976-2008
• Firm and establishment characteristics
  • Including size and age. Age is critical to understanding dynamics and entrepreneurship.

LBD: Large vs Small vs Young

Important to put job creation and destruction in context…

[Figure panels: Small (1-500), Large (500+)]

LBD: Life cycle dynamics of businesses and who creates jobs

[Figure: Net Employment Growth for Continuing Firms by Firm Age. Horizontal axis: Firm Age Class. Series: Age Only; With Base Year Size Controls; With Current Year Size Controls.]

LBD: Life cycle dynamics of businesses and who destroys jobs

[Figure: Job Destruction from Firm Exit by Firm Age. Horizontal axis: Firm Age Class. Series: Age Only; With Base Year Size Controls; With Current Year Size Controls.]

Census Data: Productivity Growth

[Figure: Productivity Relative to Mature Surviving Incumbents. Groups: Exits (Young Exits, Mature Exits) and Young Survivors (Young Survivors, Young Survivors Five Years Later). Data labels in the order shown: -32%, -27%, 3%, 5%.]

LBD: The effect of business cycle dynamics and credit conditions on firms and job creation

[Figure: Net Job Creation - Effects of Business Cycle* by Firm Size, 1981-2008. Series: Urate (ind*yr fixed effects); Urate; Urate (by itself).]

[Figure: Net Job Creation - Effects of FHFA* Prices by Firm Size, 1981-2008. Series: FHFA (ind*yr fixed effects); FHFA; FHFA (by itself).]

Forms of financing differ for small and large firms…

Private equity

[Figure: Employment under Buyout Targets: By Year and as a Percent of Economy, 1980-2005. Series: Percent of LBD Employment; Buyout Employment in thousands (right axis).]

Large growth of private equity since the 1980s

Net loss of jobs but consistent with restructuring and creative destruction

LBD: Entrepreneurial activity over time and across states

Entrepreneurial activity differs across states…


Available Data

• Confidential Microdata only through the RDCs

LBD: Public Use Products

• Business Dynamics Statistics
  – Basic data by firm size and age across sectors, states and time. Expansions to detailed industry and geography.
  – Data visualizations available
  – http://www.ces.census.gov/index.php/bds/bds_home
• Synthetic LBD (ver. 1) – public use microdata
  – To be deployed this coming year via the Cornell Virtual RDC (http://www.vrdc.cornell.edu/news/data/lbd-synthetic-data/)
• Sister program: ILBD

Summary

• Very rich data by itself and when linked to other products

• A NAS study “Understanding Business Dynamics” discusses the importance of these data for accurate and timely measurement of critical economic and social concepts

• Lots of research opportunities…

More information about LBD and BDS can be found at

Center for Economic Studies

http://www.ces.census.gov

You can email me at Javier.miranda@census.gov

An Overview of Data from the Economic Directorate

Shawn D. Klimek, U.S. Census Bureau

The Business Register

• The primary frame for establishment and firm level surveys

• Identifying Business Units
  – Employer Identification Number (EIN)
  – Survey Unit Identifier (SURVUID, CFN)
  – Firm Identifier (ALPHA, FIRMID)
  – Social Security Number (SSN, PIK)

Hierarchy of BR Identifiers

ENTERPRISE (1234560000)
  SURVUID1 (EIN1)
  SURVUID2 (EIN2)
  SURVUID3 (EIN2)

Structure of Identifiers

• Census File Numbers (CFN, pre-2001)
  – Single Units (0+EIN)
  – Multi-units (ALPHA+PLANT #, e.g. 123456 0001)
• Survey Unit ID (2002 and later)
  – 2XXXXXXXXX (Survey Unit)
  – 8XXXXXXXXX (Alternate Reporting Unit)
• Firm ID
  – SU (0+EIN)
  – MU (ALPHA+0000)
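The identifier rules on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `firm_id_from_cfn` is hypothetical, and it assumes a 10-digit CFN in which a leading 0 marks a single unit and the first six digits of a multi-unit CFN are the ALPHA (as in the slide's "123456 0001" example). Real Business Register files may need additional handling.

```python
def firm_id_from_cfn(cfn: str) -> str:
    """Derive a firm ID from a pre-2001 Census File Number (illustrative sketch).

    Assumptions (not confirmed by the slides): the CFN has 10 digits once
    spaces are removed, and only single-unit CFNs start with "0".
    """
    digits = cfn.replace(" ", "")
    if len(digits) != 10 or not digits.isdigit():
        raise ValueError(f"expected 10 digits, got {cfn!r}")
    if digits.startswith("0"):
        # Single unit: CFN is 0 + EIN, and the firm ID has the same form.
        return digits
    # Multi-unit: ALPHA (first 6 digits) + plant number (last 4 digits);
    # the firm ID replaces the plant number with 0000.
    return digits[:6] + "0000"


print(firm_id_from_cfn("123456 0001"))  # multi-unit plant -> 1234560000
print(firm_id_from_cfn("0987654321"))   # single unit -> 0987654321
```

The multi-unit result matches the ENTERPRISE value (1234560000) shown in the hierarchy slide.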

Business Register Data Sources

• NAICS Industry Codes
  – Economic Census and Surveys
  – Bureau of Labor Statistics
  – Internal Revenue Service
  – Social Security Administration

Business Register Data Sources

• Firm Ownership & Control
  – Annual Survey of Manufactures (~50,000 estabs)
  – Company Organization Survey
    • Annual
    • Firms >250 employees (40,000 firms)
    • List of establishments, basic frame information
  – Economic Census
    • Every 5 years
    • All MU establishments (~1.6 million)
    • Sample of SU firms (~2.9 million)
      – Long/short forms (1.9 million)
      – Classification forms (1 million)

Business Register Data Sources

• Geography
  – Address
    • Census Physical Address
      – Company Organization Survey
      – Economic Census
      – Other surveys
    • IRS Mailing Address
    • BLS Mailing Address

2007 Economic Census

• Mailed 4.5 out of 7 million establishments
  – All MU establishments
  – Sample of SU establishments
  – 86% response rate overall
• Roughly 600 forms designed
• Roughly 1200 NAICS industries

“Division” of Labor

• Manufacturing and Construction Division (MCD)
  – Manufacturing
  – Mining
  – Construction
• Service Sector Statistics Division (SSSD)
  – Retail
  – Wholesale
  – Services
  – Communications, Utilities & Transportation

2007 Economic Census Timeline

• Collection Activities
  – October 2007 to October 2008
• Publications
  – Advance Report – early 2009
  – Industry Series – December 2009
  – Geographic Areas Series – December 2010
  – Miscellaneous Subject Series – June 2011

The most detailed snapshot of the economy

• 20,000+ items collected
• Basic Data Items – e.g. Payroll, Employment, Revenue
• Industry
  – Six-digit NAICS (~1,100)
• Geography
  – State & County (~3,100)
  – MSAs (~900)
  – Census Places (~5,000 out of 18,000)
• Products & Revenue Lines
• Special Inquiries

2012 Economic Census Changes

• Proposed expansion of geography – publishing as many Census Places as feasible

• North American Product Classification System – changes to manufacturing, retail, and wholesale product detail.

• NAICS 2012 – significant reduction in the number of manufacturing industries (~260 down from ~470)

• Manufacturing “type of operations” may be coming
  – Integrated Manufacturer
  – Contract Manufacturer
  – Factoryless Goods Producers

2012 Enterprise Statistics Program

• Intellectual Property Revenue
  – All Multi-unit firms
  – Sample of Single-unit firms (100,000)
  – Different types of revenue
    • Royalties
    • Licensing Fees
    • Franchising
• Manufacturing Activities
  – Outsourcing
  – Offshoring

Business Sample Revision (BSR)

• Derived from the Business Register
• Frame for Service Sector Statistics Division (SSSD)
  – Services: Quarterly & Annual, Expenses
  – Retail: Monthly & Annual, Expenses
  – Wholesale: Monthly & Annual, Expenses
• Sampling Units
  – Firm Level, Industry Units
  – EIN Level, Industry Units

Complications – e.g. productivity

• Outside of manufacturing we collect input data in a number of programs
  – Capital
    • Annual Capital Expenditures Survey (ACES)
    • Firm Level
  – Employment & Payroll
    • Business Register, Census, Annuals
  – Other inputs (e.g. inventories, benefits)
    • Annuals
    • BSR Units
• Relatively few projects request these data, but replacement of the Assets and Expenditures Survey (1992) and Business Expenses Survey (BES) with the Annuals means demand should be increasing.

Concluding Remarks

• Core Programs for research
  – Business Register
  – Economic Census
  – Annual Survey of Manufactures
• Many other programs…
  – Company Organization Survey
  – Indicators (M3, Retail, Wholesale)

Longitudinal Employer-Household Dynamics (LEHD) Program

A Dynamic Data Source for the 21st Century

Erika McEntarfer, LEHD Economic Research Group

Center for Economic Studies, U.S. Census Bureau

Disclaimer: All data examples are fictional and do not reflect any individual or firm data. Any opinions and conclusions expressed herein are those of the authors and do not necessarily represent the views of the U.S. Census Bureau.

What is LEHD?

• At its core, LEHD is a National Longitudinal Job Frame
  – Based on UI-Wage and other administrative data sources
• Primary Products
  – Public use products: QWI, OnTheMap
  – Rich micro data for research in the RDCs

Where does LEHD fit within the Census Bureau’s data infrastructure?

• The Census Bureau maintains national frames of households and business establishments
• Household Frame: Master Address File
  • Decennial Census, ACS, CPS, SIPP, etc.
• Establishment Frame: Business Register
  • Economic Census, Monthly and Annual Surveys, Longitudinal Business Database, County Business Patterns, etc.

LEHD is a national jobs frame

• Jobs are the unit of analysis in LEHD data
  – A job is an employer-employee pair for a given time period
• Integrate with
  – Person and Household Data via “employee” information
  – Establishment and Firm data via “employer” information
• Integration permits:
  – Improved Public Use Products
  – Richer Microdata for Research (via the Research Data Centers)

The Concept – Data Integration

• Leverage existing data
• Create new data and products
• Make valid detailed data available while protecting confidentiality
• Cost-effective
• No respondent burden

[Diagram: Longitudinal National Frame of Jobs → New data and products]

Local Employment Dynamics

• A voluntary partnership between the states and the U.S. Census Bureau
• States supply quarterly worker (UI wage) and business (QCEW) records
• Census Bureau merges the state records with other data to produce new data and products about jobs, workers, industries and your local economy

LEHD microdata available for research in the RDCs

Employment History File (EHF)

PIK      SEIN    Q1    Q2    Q3    Q4    Q5
Person1  Firm A  7000  7000  3000  0     0
Person1  Firm B  0     0     4000  8000  8000
Person2  Firm A  500   0     0     0     0
Person2  Firm D  0     1000  1000  0     0
Person2  Firm F  0     0     3000  4000  4000

Callout on the table: Changes jobs in Q3

Unit of observation is a job

Universe is jobs covered by State UI
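As a rough illustration of how a job change can be read off EHF-style records, the sketch below uses the fictional rows from the slide. The "dominant employer" rule (the SEIN paying the most in a quarter) and the function names are assumptions made for this example, not the LEHD program's definitions.

```python
# Fictional EHF-style rows from the slide: (PIK, SEIN, earnings for Q1..Q5).
EHF = [
    ("Person1", "Firm A", [7000, 7000, 3000, 0, 0]),
    ("Person1", "Firm B", [0, 0, 4000, 8000, 8000]),
    ("Person2", "Firm A", [500, 0, 0, 0, 0]),
    ("Person2", "Firm D", [0, 1000, 1000, 0, 0]),
    ("Person2", "Firm F", [0, 0, 3000, 4000, 4000]),
]


def dominant_employer_by_quarter(rows, pik):
    """For each quarter, return the SEIN paying `pik` the most (None if no earnings)."""
    jobs = [(sein, earnings) for person, sein, earnings in rows if person == pik]
    n_quarters = len(jobs[0][1])
    dominant = []
    for q in range(n_quarters):
        sein, earnings = max(jobs, key=lambda job: job[1][q])
        dominant.append(sein if earnings[q] > 0 else None)
    return dominant


def job_change_quarters(rows, pik):
    """Return the 1-based quarters in which the dominant employer differs from the prior quarter."""
    dominant = dominant_employer_by_quarter(rows, pik)
    return [
        q + 1
        for q in range(1, len(dominant))
        if dominant[q] is not None
        and dominant[q - 1] is not None
        and dominant[q] != dominant[q - 1]
    ]


print(job_change_quarters(EHF, "Person1"))  # [3]
print(job_change_quarters(EHF, "Person2"))  # [2, 3]
```

Under this rule Person1 moves from Firm A to Firm B in Q3, which agrees with the callout on the slide.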

LEHD microdata available for research in the RDCs

Employer Characteristics File (ECF)

SEIN   SEINUNIT    Qtr  Industry  M1size  M2size  M3size
FirmA  Unit1       1    333333    302     335     330
FirmA  Unit2       1    666111    4030    4032    4031
FirmA  Unit3       1    444222    20      23      21
FirmB  Singleunit  1    771111    1       1       0
FirmC  Singleunit  1    666622    5       7       7

Unit of observation is a State UI taxpayer ID

Universe is employers reporting QCEW data

LEHD microdata available for research in the RDCs

Individual Characteristics File (ICF)

PIK      DOB         Sex  Race
Person1  MM/DD/YYYY  M/F  Race1
Person2  MM/DD/YYYY  M/F  Race4
Person3  MM/DD/YYYY  M/F  Race1

Demographic information from Census surveys and SSA administrative data.

Unit of observation is a Person ID (PIK)

Linking the data for analysis

EHF

PIK      SEIN    Q1    Q2    Q3
Person1  Firm A  7000  7000  3000
Person1  Firm B  0     0     4000
Person2  Firm A  500   0     0
Person2  Firm D  0     1000  1000
Person2  Firm F  0     0     3000

U2W: imputes PIK -> SEINUNIT

ECF

SEIN   SEINUNIT  Qtr  Industry  M1size  M2size  M3size
FirmA  Unit1     1    333333    302     335     330
FirmA  Unit2     1    666111    4030    4032    4031
FirmA  Unit3     1    444222    20      23      21

ICF

PIK      DOB     Sex  Race
Person1  1/3/73  F    White
Person2  3/1/37  F    Asian

Geo-coded Address List: Person and Firm address data
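A minimal sketch of the linkage idea, using the fictional values from the slide: jobs are joined to worker characteristics on PIK and to employer characteristics on SEIN. To stay short, it collapses the ECF to the SEIN level by summing month-1 employment across units; this is a simplification standing in for the U2W imputation, and the variable and function names are hypothetical.

```python
from collections import defaultdict

# Fictional rows from the slide (firm names normalized to one spelling).
ehf_q1 = [  # (PIK, SEIN, Q1 earnings)
    ("Person1", "FirmA", 7000),
    ("Person1", "FirmB", 0),
    ("Person2", "FirmA", 500),
]
ecf_q1 = [  # (SEIN, SEINUNIT, month-1 employment)
    ("FirmA", "Unit1", 302),
    ("FirmA", "Unit2", 4030),
    ("FirmA", "Unit3", 20),
]
icf = {  # PIK -> worker characteristics
    "Person1": {"sex": "F", "race": "White"},
    "Person2": {"sex": "F", "race": "Asian"},
}

# Collapse the ECF to firm (SEIN) level. Assumption: summing units is a
# stand-in for the unit-to-worker (U2W) imputation, not a replacement for it.
firm_size = defaultdict(int)
for sein, _unit, m1_size in ecf_q1:
    firm_size[sein] += m1_size


def link_jobs(ehf, firm_size, icf):
    """Attach firm size and worker characteristics to each job with positive earnings."""
    linked = []
    for pik, sein, earnings in ehf:
        if earnings <= 0:
            continue  # no job at this employer in the quarter
        linked.append({
            "pik": pik,
            "sein": sein,
            "q1_earnings": earnings,
            "firm_m1_size": firm_size.get(sein),  # None if the employer is not in the ECF extract
            **icf.get(pik, {}),
        })
    return linked


for job in link_jobs(ehf_q1, firm_size, icf):
    print(job)
```

Both Q1 jobs at FirmA pick up a firm size of 4352 (302 + 4030 + 20) along with the worker's sex and race.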

LEHD microdata available for research in the RDCs

• Employment History Files
  • PIK-level file, wage and employment history
• Employer Characteristics Files
  • SEIN-level file, information on employers
• Individual Characteristics File
  • Worker characteristics
• Geo-coded Address List
  • SEIN and PIK addresses
• Unit-to-Worker Imputation File
  • Impute from SEIN to establishment
• Business Register Bridge

Questions for Research: Example
Business Formation and Innovation

• Business formation is critical for job and productivity growth
• New firms are often small sole proprietorships, and an important fraction start as micro-enterprises (non-employer firms)
• By integrating LEHD microdata with business microdata, researchers can track business startups.
  – Where did the entrepreneur come from?
    • What type of firm was the entrepreneur working at?
    • Are some business types and locations especially effective incubators of new firms?
  – What kinds of jobs do start-ups create?
    • What kind of job paths are there at successful startups?
    • Do workers at startups come from the community or are the workers migrants?

Questions for Research: Example
Displaced worker outcomes

• What happens to the workers at establishments that have mass layoff events?
  – LEHD data allow researchers to follow these workers to their subsequent jobs
  – Can examine their wage outcomes and the characteristics of the businesses that reemploy them.
• Tracking employment outcomes for workers who are displaced
  • How long does it take to become re-employed?
  • What types of jobs are they hired into (location, industry)?
  • What are the earnings outcomes?
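The re-employment question can be illustrated with a quarterly earnings history in the EHF layout. The sketch below is hypothetical throughout: the worker, the firm names, and the rule "first quarter with positive earnings at a different employer" are assumptions chosen for the example, not a definition used by LEHD.

```python
def quarters_to_reemployment(history, layoff_sein, last_quarter_at_layoff_firm):
    """Return (quarters elapsed, new SEIN) for the first later quarter with
    positive earnings at a different employer, or None if never re-employed.

    history: dict mapping SEIN -> list of quarterly earnings (index 0 is Q1).
    last_quarter_at_layoff_firm: 1-based quarter of the final paycheck there.
    """
    n_quarters = len(next(iter(history.values())))
    # List index q corresponds to quarter q + 1, so start at the quarter after the layoff.
    for q in range(last_quarter_at_layoff_firm, n_quarters):
        for sein, earnings in history.items():
            if sein != layoff_sein and earnings[q] > 0:
                return q + 1 - last_quarter_at_layoff_firm, sein
    return None


# Hypothetical displaced worker: last paid by Firm X in Q2, next paid by Firm Y in Q5.
worker = {
    "Firm X": [5000, 5000, 0, 0, 0],
    "Firm Y": [0, 0, 0, 0, 4000],
}
print(quarters_to_reemployment(worker, "Firm X", 2))  # (3, 'Firm Y')
```

Comparing earnings at the new employer with earnings before the layoff (4000 versus 5000 here) is one way to approach the earnings-outcome question.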

Summary: Research using LEHD data in RDCs

• LEHD microdata offer many unique advantages for economic research:
  • Longitudinal linked employer-employee data
  • Follow employment histories of workers
  • Can identify nascent firms and follow them over time
  • Can identify co-workers
• Ability to link at the micro (individual, household, establishment, firm) level records from different census, survey and administrative programs, as well as researcher-provided data.
  – Dramatically increases the analytical power of the data.

Newly Discovered Microdata on U.S. Manufacturing Plants from the 1950s and 1960s

Randy A. Becker and Cheryl A. Grim
Center for Economic Studies
U.S. Bureau of the Census

January 2011

Disclaimer

The opinions and conclusions expressed here today are our own and do not necessarily represent the views of the U.S. Census Bureau. All results have been reviewed to ensure that no confidential information is disclosed.

Unisys Clearpath IX 4400

UNIVAC I and UNIVAC 1105


Historical Microdata Recovery from Unisys Mainframe

Challenges
– Arcane, proprietary file format (CENIO)
– Data completely unstructured
  • And record layout may no longer exist.
– Employed one or more (now) esoteric character sets (not ASCII)
  • E.g., FIELDATA, Excess-3, EBCDIC, Binary integer numeric

Recovered before decommissioning in Spring 2010:
– Over 2,500 tapes containing more than 7,000 files
– Business microdata
  • Covering nearly all sectors of the economy, including manufacturing, mining, retail, wholesale, services, construction, transportation, and agriculture.
  • From as early as 1953 (and perhaps 1947)

Making the Data “Usable”

For technical reasons, the data were downloaded from the Unisys in two forms:
– Assuming the data were all Excess-3
– Assuming the data were all FIELDATA

If the data are in another character set (e.g., Binary integer numeric), the “gibberish” in the above FIELDATA file must be converted to ASCII using the implied mapping.

Challenges in creating ASCII and SAS datasets:
– Data in a record might employ multiple character sets (e.g., both Excess-3 & Binary, depending on variable).
– Record layout (explaining variables, their lengths, and character set employed) may no longer exist, or it may be wrong.
  • An electronic file containing the first 100 records under 5 different character-set assumptions can help reveal variable length & character set.
  • Existing microdata (and published data) can potentially help in determining what the variables are.
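The idea of decoding the same records under several character-set assumptions and seeing which one looks sensible can be sketched as follows. This is an illustration only: it compares just ASCII and EBCDIC (Python's built-in `cp037` codec). Python ships no FIELDATA or Excess-3 codecs, so those would need hand-built lookup tables, which are not attempted here. The scoring rule and function names are assumptions for the example.

```python
import string

# Candidate character sets. "cp037" is Python's built-in EBCDIC (US/Canada) codec.
CANDIDATE_CODECS = ["ascii", "cp037"]
READABLE = set(string.ascii_letters + string.digits + " ")


def readable_fraction(raw: bytes, codec: str) -> float:
    """Share of decoded characters that are plain letters, digits, or spaces."""
    text = raw.decode(codec, errors="replace")
    if not text:
        return 0.0
    return sum(ch in READABLE for ch in text) / len(text)


def guess_codec(raw: bytes) -> str:
    """Pick the candidate codec under which the record looks most readable."""
    return max(CANDIDATE_CODECS, key=lambda codec: readable_fraction(raw, codec))


# A made-up record stored in EBCDIC; decoding it as ASCII yields gibberish.
record = "PLANT 0042".encode("cp037")
for codec in CANDIDATE_CODECS:
    print(codec, repr(record.decode(codec, errors="replace")),
          readable_fraction(record, codec))
print("best guess:", guess_codec(record))
```

Because a single record may mix character sets, a fuller version would apply the same scoring field by field rather than to whole records.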

Getting Access to these Data

If research does NOT hinge critically on recovered data
– Submit CES proposal as usual, requesting recovered data
  • Development of these data can be cited as a benefit
  • Within 6 months, the researcher provides internal technical note concerning the data quality, cleaned-up datasets, programs, and documentation

If research DOES hinge critically on recovered data
– Feasibility access, provide a brief description of intended research
  • Researcher will obtain SSS clearance
  • Access only to historical data of interest and (if requested) analogous data for later years for quality assurance purposes
  • RDC lab fees are waived
  • The project has no public output
  • Within 6 months, the researcher provides internal technical note concerning the data quality, cleaned-up datasets, programs, and documentation
– Research access, submit CES proposal as usual
  • Feasibility work can be cited as a benefit

Current U.S. Manufacturing Microdata

Annual Survey of Manufactures (ASM)
– 1972 to present

Census of Manufactures (CM)
– 1963, 1967, and every 5 years thereafter

Longitudinally-consistent establishment identifiers
– Plant entry, exit, growth, change

Cross-sectional links to data from other surveys of manufactures
– R&D, PACE, MECS, PCU, SMT

Expansion of These Data Now Possible

Individual ASMs from 1954-1964 and 1966-1971.

Longitudinally-linked, plant-level ASM data for:
– 25 selected 4-digit SIC industries
– 1954-1961
– Perhaps through 1963 and perhaps back to 1947.

Hundreds of rolls of 16 mm microfilm containing images of completed survey forms from:
– 1954-1958 ASM
– 1958 CM

Decades-old research datasets by Richard & Nancy Ruggles, Zvi Griliches, and Lawrence Klein.

1954-1958 ASM Shuttle Form

[Table: columns Years, Data items, Number of observations; rows not recoverable]

Match results reported for the recovered files:
• Match rate: 91%; weighted match rate: 97%; same SIC & county: ~100%
• Match rate: 48%; weighted match rate: 66%
• Match rate: 83%; weighted match rate: 93%; same SIC & county: 91%
• Match rate: 99.9%; weighted match rate: forthcoming; year-to-year match rate: 78% to 98%
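To make the difference between a match rate and a weighted match rate concrete, here is a small sketch. The slides do not say which weight was used; the example assumes a size weight such as plant employment, and the plant records are invented purely for illustration.

```python
def match_rates(records):
    """Return (unweighted, weighted) match rates.

    records: iterable of (matched, weight) pairs. The weight is assumed here
    to be a size measure such as employment; the slides do not specify it.
    """
    records = list(records)
    total_weight = sum(weight for _, weight in records)
    unweighted = sum(1 for matched, _ in records if matched) / len(records)
    weighted = sum(weight for matched, weight in records if matched) / total_weight
    return unweighted, weighted


# Invented example: three large plants match, two small plants do not.
plants = [(True, 500), (True, 300), (False, 20), (True, 150), (False, 30)]
unweighted, weighted = match_rates(plants)
print(f"match rate: {unweighted:.0%}, weighted match rate: {weighted:.0%}")
```

When unmatched records are mostly small units, the weighted rate exceeds the unweighted one, which is the pattern in the figures reported above (for example, 91% versus 97%).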

Conclusion

Much more work needs to be done
– Constructing linkages
– Differentiating between multiple versions of same
– “Proving in” the data (e.g., tab to published totals)

Why is this worthwhile?
– Data over a few more business cycles
– New baselines, for example:
  • The “heyday” of U.S. manufacturing
  • Before 1970s energy crisis
  • Before environmental regulation

Start to make plans to use these data!

Questions?

randy.a.becker@census.gov

cheryl.ann.grim@census.gov

www.census.gov/ces/

A Guide to the Proposal Process and Using an RDC

James C. Davis
Boston Census Research Data Center
Center for Economic Studies
US Bureau of the Census

Any opinions and conclusions expressed herein are those of the authors and do not necessarily represent the views of the U.S. Census Bureau. All results have been reviewed to ensure that no confidential information is disclosed.


Agenda

• Process for accessing restricted-use data
  – Research Data Center (RDC)
  – Using an RDC
  – Proposal Process
  – Research Examples

Research Data Center (RDC)

• Census Bureau – university partnerships
  – RDC fees
• Secure access to confidential microdata
  – Thin client access to Census Linux servers
  – Census Bureau and other Federal statistical data
• Authorized researchers on approved projects
  – Proposal
  – RDC analysis
  – Statistical estimates disclosure

Why Restrict Microdata Access?

Titles 13 (Census) / 26 (IRS) U.S.C. and CIPSEA protect confidentiality
– respondent cannot be identified
– only Census employees and temporary staff can access microdata
– use limited to statistical purpose
– access must potentially provide legitimate benefits to Census Bureau programs

Proposal Process

• Preliminary proposal
  – www.ces.census.gov
• Proposal development
  – Involve RDC staff
• Census Review
  – Feasibility
  – Requirement of benefits to Census
  – Scientific merit
  – Statistical purpose
  – Need for non-public data
  – Risk of disclosure
  – Availability of resources
• Other Agency Review
• Special Sworn Status application

Example Proposal Outline

• Overview
• Benefits to Census
• Methodology
  – Estimating equations
• Required Data
• Expected Output
• Duration and Funding

9 Criteria for Benefits

• Understanding/improving the quality of data
• Leading to new or improved methodology to collect, measure, or tabulate
• Enhancing the data collected (e.g. improving imputations for non-response, developing links across time or entities)
• Identifying limitations/improving the Business Register
• Documenting new data collection needs
• Constructing, verifying, improving sampling frames
• Preparing estimates/characteristics of population
• Developing methodology for estimating non-response
• Developing statistical weights for a survey

Data Availability

Census Bureau Data
– Economic Data
  • establishment or firm level
– Demographic Data
  • household or individual level
– Combined Econ/Demo Data
  • Longitudinal Employer-Household Dynamics (LEHD)

Other Agency Data
– National Center for Health Statistics (NCHS)
– Agency for Healthcare Research and Quality (AHRQ)

RDC Economic Data Advantages

• No publicly-available microdata
  – Internal data at establishment and firm level
  – Universal scope
  – Detailed industry and geography
• Linking Data
  – Consistent identifiers
  – Business register
• External data

Economic Research Examples

• Bernard, Redding, Schott
  – (2010), “Multiple-Product Firms and Product Switching,” American Economic Review
  – (forthcoming), “Multi-Product Firms and Trade Liberalization,” Quarterly Journal of Economics
  • Census of Manufactures, Longitudinal Business Database, Business Register
  • One half of firms alter their mix of products every five years
  • Firms exporting many products also serve many export destinations and export more of a given product to a given destination

Economic Research Examples

• Ellison, Glaeser, Kerr (2010), “What Causes Industry Agglomeration? Evidence from Coagglomeration Patterns,” American Economic Review
  – Economic Census and LBD
  – Construct pairwise coagglomeration indices for US manufacturing industries
  – Relate coagglomeration levels to the degree to which industry pairs share goods, labor, or ideas

Economic Research Examples

• Greenstone, Hornbeck, Moretti (2010), “Identifying Agglomeration Spillovers: Evidence from Winners and Losers of Large Plant Openings,” Journal of Political Economy
  – Economic Census and LBD
  – Winning and losing counties have similar trends in incumbents’ TFP prior to a large new plant opening.
  – Five years after the opening, incumbent plants’ TFP is 12 percent higher in winning counties.

Economic Research Examples

• Chemmanur, He, Nandy (2010), “The Going Public Decision and the Product Market,” Review of Financial Studies
  – Longitudinal Business Database (LBD), Census of Manufactures, Annual Survey of Manufactures
  – A private firm’s characteristics (e.g. TFP, sales growth) significantly affect its likelihood of going public after controlling for its access to private financing
  – IPOs of firms occur at the peak of their productivity cycle

Conclusions

• Start the process early
• Use standard data sets if time-constrained
• Write proposals geared towards multiple papers
• Use proposal development as research time
  – Understand the data & data limitations
  – Read on-line documentation
    • CES Working Papers
    • Sampling Methodology/Survey Forms
    • History of the Economic Census
• Time and data requests are crucial components – adding data and/or time is difficult for Census projects once underway
• Remember that the Predominant Purpose is to benefit Census
• www.ces.census.gov