A Practitioner’s Guide to Testing Regional Industrial Localization

Andrew J Cassey, School of Economic Sciences, Washington State University

Ben O Smith, Department of Economics, University of Nebraska-Omaha

February 2015

Testing Industrial Localization

• Thesis: Describe software for statistically testing for industrial localization
• Cassey & Smith (2014) provides theory & method
• Method: Simulation of economy to compare “random” data to real-world data
• Importance: Base activities important in regional economic development planning (Stinson 2006)
• Want to be sure localization is due to legitimate economic, natural, or political advantages
• Localization: Geographic concentration of industrial employment beyond general economic activity

• Localization is different from clustering
  • Clustering indicates high spatial correlation
  • Localization more important for development planning
• Natural and political advantages
  • Building of ships near water
  • Drilling of oil
  • Gambling
• Economic advantages (input pooling)
  • Detroit’s auto industry
  • Bay Area’s software industry
• Small numbers randomness

Measuring Localization

• Location quotient: Compare regional share of industry employment to regional share of employment
• Problem: Small numbers randomness
  • 75% of vacuum manufacturing employment in 4 states
  • 75% of vacuum manufacturing employment in 4 plants
  • If there are only 4 plants, they have to be somewhere!
  • And those places could just be random
• Is industry x localized given the space?
• Small numbers more problematic as region decreases in size
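The location quotient described above can be sketched in a few lines. This is a minimal illustration, not the authors' software: the function name `location_quotients` is hypothetical, and the numbers are the three-county forestry figures used in the deck's worked example (county employment 12, 18, 35; forestry employment 8, 3, 3).

```python
def location_quotients(industry_emp, total_emp):
    """Return one location quotient per subregion.

    LQ_i = (subregion i's share of industry employment)
           / (subregion i's share of total employment)
    """
    industry_total = sum(industry_emp)
    overall_total = sum(total_emp)
    return [
        (ind / industry_total) / (tot / overall_total)
        for ind, tot in zip(industry_emp, total_emp)
    ]


# Figures from the deck's three-county forestry example.
forestry_emp = [8, 3, 3]    # industry employment by county
county_emp = [12, 18, 35]   # total employment by county

lq = location_quotients(forestry_emp, county_emp)
print([round(v, 2) for v in lq])
```

County 1 comes out at roughly 3, in line with the LQ the deck reports for this example. With only 7 plants behind that number, the LQ alone cannot say whether the concentration is more than chance.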

Clustering: Moran’s I / Geary’s C

• Measures of spatial autocorrelation
• Tests if adjacent observations are correlated
• Popular measure of clustering across spatial boundaries
• Cannot distinguish industrial cluster beyond population distribution without correction
• Even so, a limited measure of localization

Clustering is a limited measure of localization

• Must define decay function
• Adjacency matters, so the partition of space matters
• Suppose one plant in each of the left subregions, none in the right subregions
• High spatial correlation, Moran’s I = 1 ⟹ clustering
• But not as localized as if all plants were in 1 subregion
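A small numerical sketch of this contrast follows. The slide's figure is not in the transcript, so the 4×4 grid, the rook (shared-edge) adjacency, and the binary weights are all assumptions, and the function name `morans_i` is hypothetical. On this particular grid the left-half layout does not reach the stylized value of 1 quoted on the slide, but the ordering is the point: the spread-out layout scores far higher on Moran's I than the layout with every plant in one subregion.

```python
def morans_i(grid):
    """Moran's I on a rectangular grid with binary rook-adjacency weights."""
    rows, cols = len(grid), len(grid[0])
    n = rows * cols
    mean = sum(sum(row) for row in grid) / n
    z = [[value - mean for value in row] for row in grid]

    cross_products = 0.0  # sum of w_ij * z_i * z_j over ordered neighbor pairs
    weight_total = 0      # sum of w_ij
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    cross_products += z[r][c] * z[rr][cc]
                    weight_total += 1

    sum_squares = sum(value * value for row in z for value in row)
    return (n / weight_total) * (cross_products / sum_squares)


# Assumed layouts: 8 plants on a 4x4 grid of subregions.
left_half = [[1, 1, 0, 0] for _ in range(4)]   # one plant in each left subregion
one_cell = [[8, 0, 0, 0]] + [[0, 0, 0, 0] for _ in range(3)]  # all in one subregion

i_left = morans_i(left_half)
i_one = morans_i(one_cell)
print(round(i_left, 3), round(i_one, 3))
```

Changing the grid or the adjacency rule changes both values, which is the sense in which the partition of space matters.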

The Ellison - Glaeser (1997) Index γ

• Corrects for the small numbers problem
• Measures localization beyond that expected from dart throws at an employment-weighted map
• Comparable across industries, time, partition of space, and levels of geographic aggregation
• Doesn’t distinguish natural or industrial advantages
• More data required than LQ
• E[γ] = 0 if no natural, political, or economic advantages
• National meat packaging: γ = 0.042
  • Is this industry localized (beyond that of darts)?
  • Is its localization statistically different from zero?
  • Autos: γ = 0.127
  • Computers: γ = 0.059
• Can’t analytically solve for confidence intervals
• Can’t compare to regional industries easily
• Nonetheless, EG index widely used in theory and practice
  • ≈ 2000 articles cite E&G on Google Scholar

Program to Calculate Confidence Intervals

• Input specifics of regional economy and industry
• Program simulates 10,000 or more outcomes based on randomness
• Outputs 95% critical values for the randomly simulated EG stat
• Compare real-world EG stat to critical values
• If the real-world EG stat is outside the critical values, the clustering is nonrandom
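The comparison step can be sketched as follows. This is an illustration of the decision rule only: the helper names are hypothetical, the percentile rule (nearest rank on the sorted values) is an assumption rather than the program's documented behavior, and `placeholder_stats` is an evenly spaced stand-in for real simulation output.

```python
def critical_values(simulated_stats, level=0.95):
    """Two-tailed empirical critical values from a list of simulated EG stats."""
    ordered = sorted(simulated_stats)
    tail = (1.0 - level) / 2.0
    last = len(ordered) - 1
    lower = ordered[int(round(tail * last))]
    upper = ordered[int(round((1.0 - tail) * last))]
    return lower, upper


def is_nonrandom(observed_stat, lower, upper):
    """True when the observed EG stat falls outside the critical values."""
    return observed_stat < lower or observed_stat > upper


# Placeholder values standing in for simulated EG stats (NOT program output).
placeholder_stats = [i / 1000 for i in range(-500, 501)]

lower_cv, upper_cv = critical_values(placeholder_stats)
print(lower_cv, upper_cv)
print(is_nonrandom(0.247, lower_cv, upper_cv))
```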

Data Requirements & An Example

• Regional planner: Is forestry clustered in Whatever State?
• Calculate EG stat in data
  • Partition Whatever State into its (3) counties
  • Weight each county by share of employment
    12, 18, 35: Share in Co. 1 (x_1) is 12/65 = .1846
  • No. of plants in forestry in state (not county): 7
  • Industry employment in each county
    8, 3, 3: Share in Co. 1 (s_1) is 8/14 = .5714
  • State plant Herfindahl index
    * Data may be available from state government
    * $H = \sum_k \left( \frac{\text{plant } k \text{ employment}}{\text{state forestry employment}} \right)^2$
    6, 2, 2, 1, 1, 1, 1: (6/14)^2 + 2 × (2/14)^2 + 4 × (1/14)^2 = .2449
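The arithmetic on this slide can be checked with a short script. It is a sketch with hypothetical function names, and `eg_stat` uses the formula given on the deck's "The Ellison-Glaeser Stat" slide, applied to the example's numbers.

```python
def shares(counts):
    """Convert raw employment counts into shares that sum to 1."""
    total = sum(counts)
    return [count / total for count in counts]


def herfindahl(plant_emp):
    """Plant Herfindahl index: sum of squared plant employment shares."""
    return sum(share ** 2 for share in shares(plant_emp))


def eg_stat(s, x, h):
    """Ellison-Glaeser stat from industry shares s, total shares x, Herfindahl h."""
    raw_concentration = sum((si - xi) ** 2 for si, xi in zip(s, x))
    scale = 1.0 - sum(xi ** 2 for xi in x)
    return (raw_concentration - scale * h) / (scale * (1.0 - h))


x = shares([12, 18, 35])                 # county shares of total employment
s = shares([8, 3, 3])                    # county shares of forestry employment
h = herfindahl([6, 2, 2, 1, 1, 1, 1])    # the 7 forestry plants
gamma = eg_stat(s, x, h)

print(round(x[0], 4), round(s[0], 4), round(h, 4), round(gamma, 3))
# prints: 0.1846 0.5714 0.2449 0.247
```

The result matches the EG value of 0.247 that the deck reports for this example.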

Example Continued

• Lots of random ways these 7 plants got placed
  • County 1: 8 employees: (6, 2) or (6, 1, 1)
  • County 2: 3 employees: (2, 1) or (2, 1)
  • County 3: 3 employees: (1, 1, 1) or (2, 1)
  • All permutations between plants
• Program not bound to match plants with locations in data
  • Subregion weight used to assess how likely each plant is to land there (18.46% for county 1)
  • Plant of size 6 as likely as plant of size 1 to land in county 1
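One random placement of the seven plants might look like the sketch below. The function name `place_plants` and the seed are hypothetical choices for illustration, and the authors' program may draw locations differently. Each plant lands in a county with probability equal to that county's employment weight, regardless of the plant's size.

```python
import random


def place_plants(plant_emp, weights, rng):
    """Drop each plant on one subregion, chosen with probability
    proportional to the subregion weights; plant size plays no role.

    Returns the industry employment that landed in each subregion.
    """
    landed = [0] * len(weights)
    for emp in plant_emp:
        region = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        landed[region] += emp
    return landed


rng = random.Random(0)              # arbitrary seed, for repeatability only
plants = [6, 2, 2, 1, 1, 1, 1]      # plant employment from the example
county_weights = [12, 18, 35]       # county employment; county 1 is 18.46%

draw = place_plants(plants, county_weights, rng)
print(draw)   # forestry employment by county in this one random draw
```

Repeating the draw many times and computing the EG stat each time gives the distribution of outcomes that randomness alone could produce.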

How the Program Works & Customization

• Calculate the odds randomness could generate an EG stat at least as large as in the real world
• Inputs user-specified data
• Geographic weights
• Number of plants
• Weights may be modified for natural advantages such as oil
• Inputs user-specified options
  • Number of iterations: 10,000 default
  • Level of significance: 95% default (two-tailed)

Interpreting the Output

• EG = 0.247; H = 0.244; LQ = 3

Industry Employment Distribution

• Industry’s employment distribution needed but unknown
• Herfindahl is best measure
• Program simulates many plausible distributions
• Use Herfindahl data to choose ones to compare to data

Interpreting the Output Continued

• EG = 0.247; H = 0.244; LQ = 3
• Not statistically clustered

Conclusion

• Identifying regional industrial localization important for basis of community economic development planning
• As the region gets smaller, increased problems with small numbers in LQ stat
• Ellison-Glaeser (1997) stat corrects for small numbers
• We provide interpretation and statistical test
• We develop method and provide free software
• Customizable
• Identify localization from natural or economic advantage
• Allows for better economic planning

Installation Instructions

• Does not have a graphical user interface
• Works by typing text into the command line
• Available for free
• Unless you’re tech savvy, download the binary code
• Download
  • Binary file: “EGSimulation”
  • Folder: “Data”
  • Text file: “command.txt”

http://goo.gl/n1N06

Finding and Opening the App

• Open Terminal
• cd desktop
• ls (see folder)
• cd egsimulation

Running, Understanding, and Personalizing

• tranche.txt
• plants.txt
• sigma.txt
• criticalvalues.txt
• size.text, values.txt
• 10000
• out.csv

The Ellison-Glaeser Stat

• s_i: share of industry employment in subregion i
• x_i: share of total employment in subregion i
• H: the regional industry plant Herfindahl

$$EG \equiv \gamma = \frac{\sum_i (s_i - x_i)^2 - \left(1 - \sum_i x_i^2\right) H}{\left(1 - \sum_i x_i^2\right)(1 - H)}$$

• EG controls for N and “lumpiness” of plant employment as realized in H
• E[EG] = 0 regardless of M, N, µ, and σ if no advantages
• Other moments DO depend on parameters, but no analytic solution

Our Simulation

Calculate how likely γ = c is, for all c, from randomness alone

• Assume lognormal distribution of firm employment
  • Used by E&G
  • Roughly matches overall data (Cabral & Mata 2005)
  • We specify µ and σ
• Draw fixed number of firms N from employment dist.
  • We draw to get H given N, µ, σ
• Randomly place firms on employment-weighted US map
  • We use data for x_i as probabilities
  • We draw to get s_i
• Calculate γ from this employment draw and placement
• Repeat 10,000 times
• Calculate interval for which 5% of simulated γ are outside
• Change one parameter value (σ) and repeat
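The steps above can be sketched as a small Monte Carlo loop. This is a simplified illustration under stated assumptions, not the authors' program: the function names, the seed, the values µ = 0 and σ = 0.5, the three-region weights, and the reduced iteration count are all placeholders, and the sketch does not condition the simulated draws on an observed Herfindahl. It requires at least two plants, since H = 1 would make the EG denominator zero.

```python
import random


def eg_stat(s, x, h):
    """Ellison-Glaeser stat from industry shares s, total shares x, Herfindahl h."""
    raw_concentration = sum((si - xi) ** 2 for si, xi in zip(s, x))
    scale = 1.0 - sum(xi ** 2 for xi in x)
    return (raw_concentration - scale * h) / (scale * (1.0 - h))


def simulate_eg(x, n_plants, mu, sigma, iterations=10_000, seed=0):
    """Simulate EG stats when plant locations are purely random."""
    rng = random.Random(seed)
    regions = range(len(x))
    stats = []
    for _ in range(iterations):
        # Step 1: draw plant employment from a lognormal distribution.
        sizes = [rng.lognormvariate(mu, sigma) for _ in range(n_plants)]
        total = sum(sizes)
        h = sum((size / total) ** 2 for size in sizes)
        # Step 2: place each plant, using the x_i as probabilities.
        landed = [0.0] * len(x)
        for size, region in zip(sizes, rng.choices(regions, weights=x, k=n_plants)):
            landed[region] += size
        s = [value / total for value in landed]
        # Step 3: record the EG stat for this draw and placement.
        stats.append(eg_stat(s, x, h))
    return stats


# Placeholder inputs: three regions, 7 plants, assumed mu and sigma.
x = [12 / 65, 18 / 65, 35 / 65]
stats = simulate_eg(x, n_plants=7, mu=0.0, sigma=0.5, iterations=2000)

# Interval leaving 5% of the simulated stats outside (2.5% in each tail).
ordered = sorted(stats)
lower = ordered[int(0.025 * len(ordered))]
upper = ordered[int(0.975 * len(ordered)) - 1]
mean_stat = sum(stats) / len(stats)
print(round(lower, 3), round(upper, 3), round(mean_stat, 3))
```

Rerunning with a different `sigma` or `n_plants` shows how the width of the interval responds to each parameter.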

Location Random Number Generator

• Computers cannot generate truly random numbers
• Standard random generator produces patterns in 2D

Stylized Numerical Results

Result 1: CI width increases in σ

[Figure: two panels, N = 20 and N = 100; horizontal axis σ, from 0.2 to 1.4]

Solid: width of 95% mass. Dashed: chance of Type I error using γ = 0.05

• Upward sloping, as a larger σ increases the chance of a “fat” dart

Stylized Numerical Results

Result 2: CI width decreases in N

[Figure: two panels; horizontal axis number of plants, from 10 to 50 and from 100 to 500]

Solid: width of 95% mass. Dashed: chance of Type I error using γ = 0.05

• Downward sloping, as a larger N decreases the chance of all plants locating together
