A Practitioner’s Guide to Testing Regional
Industrial Localization
Andrew J Cassey, School of Economic Sciences, Washington State University
Ben O Smith, Department of Economics, University of Nebraska-Omaha
February 2015
Testing Industrial Localization
I Thesis: Describe software for statistically testing for industrial localization
I Cassey & Smith (2014) provides theory & method
I Method: Simulation of economy to compare “random” data to real-world data
I Importance: Base activities important in regional economic development planning (Stinson 2006)
I Want to be sure localization is due to legit economic, natural, or political advantages
Industrial Localization
I Localization: Geographic concentration of industrial employment beyond general economic activity
I Localization is different from clustering
• Clustering indicates high spatial correlation
• Localization more important for development planning
I Natural and political advantages
• Building of ships near water
• Drilling of oil
• Gambling
I Economic advantages (input pooling)
• Detroit’s auto industry
• Bay Area’s software industry
I Small numbers randomness
Measuring Localization
I Location quotient: Compare regional share of industry employment to regional share of employment
I Problem: Small numbers randomness
• 75% of vacuum manufacturing employment in 4 states
• 75% of vacuum manufacturing employment in 4 plants
• If there are only 4 plants, they have to be somewhere!
• And those places could just be random
I Is industry x localized given the space?
I Small numbers more problematic as region decreases in size
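As a rough illustration of the location quotient, the sketch below divides a region's share of industry employment by its share of all employment. The function name `location_quotient` is made up for this example. The inputs are the three-county forestry figures from the worked example later in the deck.

```python
def location_quotient(industry_emp, total_emp, region):
    """Regional share of industry employment divided by regional share of all employment."""
    industry_share = industry_emp[region] / sum(industry_emp)
    total_share = total_emp[region] / sum(total_emp)
    return industry_share / total_share


# Forestry employment and total employment by county (figures from the deck's example).
forestry = [8, 3, 3]
everyone = [12, 18, 35]

lq_county_1 = location_quotient(forestry, everyone, region=0)
print(round(lq_county_1, 2))
```

An LQ above 1 says the county holds more of the industry than its size alone would suggest. It does not say whether that excess could have come about by chance when there are only a handful of plants.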
Clustering: Moran’s I / Geary’s C
I Measures of spatial autocorrelation
I Tests if adjacent observations are correlated
I Popular measure of clustering across spatial boundaries
I Cannot distinguish industrial cluster beyond population distribution without correction
I Even so, a limited measure of localization
Clustering is a limited measure of localization
I Must define decay function
I Adjacency matters so partition of space matters
I Suppose one plant in each of the left subregions, none in right subregions
I High spatial correlation, Moran’s I = 1 ⇒ clustering
I But not as localized as if all plants were in 1 subregion
The Ellison - Glaeser (1997) Index γ
I Corrects for the small numbers problem
I Measures localization beyond that expected from random dart throws at an employment-weighted map
I Comparable across industries, time, partition of space, and levels of geographic aggregation
I Doesn’t distinguish natural or industrial advantages
I More data required than LQ
The Ellison - Glaeser (1997) Index γ
I E[γ] = 0 if no natural, political, or economic advantages
I National meat packaging: γ = 0.042
• Is this industry localized (beyond that of darts)?
• Is its localization statistically different from zero?
• Autos: γ = 0.127
• Computers: γ = 0.059
I Can’t analytically solve for confidence intervals
I Can’t compare to regional industries easily
I Nonetheless, EG index widely used in theory and practice
≈ 2000 articles cite E&G on Google Scholar
Program to Calculate Confidence Intervals
I Input specifics of regional economy and industry
I Program simulates 10,000 or more outcomes based on randomness
I Outputs 95% critical values for the randomly simulated EG stat
I Compare real-world EG stat to critical values
I If real-world EG stat is outside the critical values, then clustering is nonrandom
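The comparison step can be sketched as below. This is not the program's own code: the function names `critical_values` and `outside_random_range` are hypothetical, and the nearest-rank percentile rule is an assumption made for the illustration.

```python
def critical_values(simulated, level=0.95):
    """Return (lower, upper) cutoffs that leave (1 - level) of the draws outside, split evenly."""
    ordered = sorted(simulated)
    tail = (1.0 - level) / 2.0
    last = len(ordered) - 1
    # Nearest-rank percentiles (an assumption; other percentile rules differ slightly).
    lower = ordered[round(tail * last)]
    upper = ordered[round((1.0 - tail) * last)]
    return lower, upper


def outside_random_range(observed, simulated, level=0.95):
    """True when the observed EG stat falls outside the simulated critical values."""
    lower, upper = critical_values(simulated, level)
    return observed < lower or observed > upper
```

Here `simulated` is the list of EG stats from the random draws and `observed` is the EG stat computed from the real-world data.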
Data Requirements & An Example
I Regional planner: Is forestry clustered in Whatever State?
I Calculate EG stat in data
• Partition Whatever State into its (3) counties
• Weight each county by share of employment
12, 18, 35: Share in Co. 1 ($x_1$) is 12/65 = .1846
• No. of plants in forestry in state (not county), 7
• Industry employment in each county
8, 3, 3: Share in Co. 1 ($s_1$) is 8/14 = .5714
• State plant Herfindahl index
* Data may be available from state government
* $H = \sum_k (\text{plant } k \text{ employment} / \text{state forest employment})^2$
6, 2, 2, 1, 1, 1, 1: $(6/14)^2 + 2 \times (2/14)^2 + 4 \times (1/14)^2 = .2449$
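The arithmetic on this slide can be reproduced in a few lines. The helper names `shares` and `herfindahl` are made up for this sketch; the numbers are the ones on the slide.

```python
def shares(counts):
    """Convert raw employment counts into shares that sum to one."""
    total = sum(counts)
    return [c / total for c in counts]


def herfindahl(plant_employment):
    """Sum of squared plant shares of industry employment."""
    return sum(share ** 2 for share in shares(plant_employment))


county_total_emp = [12, 18, 35]        # total employment by county
county_forestry_emp = [8, 3, 3]        # forestry employment by county
plant_emp = [6, 2, 2, 1, 1, 1, 1]      # employment at each of the 7 plants

x = shares(county_total_emp)
s = shares(county_forestry_emp)
H = herfindahl(plant_emp)
print(round(x[0], 4), round(s[0], 4), round(H, 4))  # 0.1846 0.5714 0.2449
```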
Example Continued
I Lots of random ways these 7 plants got placed
• County 1: 8 employees: (6, 2) or (6, 1, 1)
• County 2: 3 employees: (2, 1) or (2, 1)
• County 3: 3 employees: (1, 1, 1) or (2, 1)
• All permutations between plants
I Program not bound to match plants with locations in data
• Subregion weight used to assess how likely each plant is to land there (18.46% for county 1)
• Plant of size 6 as likely as plant of size 1 to land in county 1
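One random placement of the seven plants might look like the sketch below. It assumes each plant lands independently, with the county employment shares as probabilities, regardless of plant size. The function name `place_plants` and the fixed seed are choices made for this illustration only.

```python
import random


def place_plants(plant_sizes, county_weights, rng):
    """Drop each plant on a county with probability equal to the county's weight.

    Returns the simulated share of industry employment in each county.
    """
    counties = list(range(len(county_weights)))
    # Every plant, large or small, faces the same landing probabilities.
    landed = rng.choices(counties, weights=county_weights, k=len(plant_sizes))
    employment = [0] * len(county_weights)
    for size, county in zip(plant_sizes, landed):
        employment[county] += size
    total = sum(plant_sizes)
    return [e / total for e in employment]


rng = random.Random(0)  # fixed seed so the draw can be repeated
s_sim = place_plants([6, 2, 2, 1, 1, 1, 1], [12 / 65, 18 / 65, 35 / 65], rng)
print(s_sim)
```

Repeating this draw many times gives the spread of industry shares that randomness alone can produce.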
How the Program Works & Customization
I Calculate the odds randomness could generate an EG stat at least as large as in the real world
I Inputs user-specified data
I Geographic weights
I Number of plants
I Weights may be modified for natural advantages such as oil
I Inputs user-specified options
• Number of iterations - 10,000 default
• Level of significance - 95% default (two-tailed)
Interpreting the Output
I EG=0.247; H=0.244; LQ=3
Industry Employment Distribution
I Industry’s employment distribution needed but unknown
I Herfindahl is best measure
I Program simulates many plausible distributions
I Use Herfindahl data to choose ones to compare to data
Interpreting the Output
I EG=0.247; H=0.244; LQ=3
I Not clustered statistically
Conclusion
I Identifying regional industrial localization important for basis of community economic development planning
I As the region gets smaller, increased problems with small numbers in LQ stat
I Ellison-Glaeser (1997) stat corrects for small numbers
I We provide interpretation and statistical test
I We develop method and provide free software
I Customizable
I Identify localization from natural or economic advantage
I Allows for better economic planning
Installation Instructions
I Does not have graphical user interface
I Works by typing text into the command line
I Available for free
I Unless you’re tech savvy, download binary code
I Download
• Binary file: “EGSimulation”
• Folder: “Data”
• Text file: “command.txt”
Finding and Opening the App
I Open Terminal
I cd desktop
I ls (see folder)
I cd egsimulation
Running, Understanding, and Personalizing
I tranche.txt
I plants.txt
I sigma.txt
I criticalvalues.txt
I size.text, values.txt
I 10000
I out.csv
The Ellison-Glaeser Stat
I $s_i$: share of industry employment in subregion $i$
I $x_i$: share of total employment in subregion $i$
I $H$: the regional industry plant Herfindahl
$$EG = \frac{\sum_i (s_i - x_i)^2 - \left(1 - \sum_i x_i^2\right) H}{\left(1 - \sum_i x_i^2\right)(1 - H)}$$
I EG controls for N and “lumpiness” of plant employment as realized in H
I E[EG] = 0 regardless of M, N, µ, and σ if no advantages
I Other moments DO depend on parameters, but no analytic solution
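A direct transcription of the EG formula is sketched below and applied to the forestry example from the data slide. The function name `ellison_glaeser` is made up for this illustration.

```python
def ellison_glaeser(s, x, H):
    """EG stat from industry shares s, total employment shares x, and plant Herfindahl H."""
    raw_concentration = sum((si - xi) ** 2 for si, xi in zip(s, x))
    scale = 1.0 - sum(xi ** 2 for xi in x)
    return (raw_concentration - scale * H) / (scale * (1.0 - H))


x = [12 / 65, 18 / 65, 35 / 65]   # county shares of total employment
s = [8 / 14, 3 / 14, 3 / 14]      # county shares of forestry employment
H = 48 / 196                      # plant Herfindahl for plants of size 6, 2, 2, 1, 1, 1, 1

eg = ellison_glaeser(s, x, H)
print(round(eg, 3))  # 0.247
```

The result agrees with the EG=0.247 shown on the output slide.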
Our Simulation
Calculate how likely for γ = c ∀c from randomness alone
I Assume lognormal distribution of firm employment
• Used by E&G
• Sorta matches overall data (Cabral & Mata 2005)
• We specify µ and σ
I Draw fixed number of firms N from employment dist.
• We draw to get H given N, µ, σ
I Randomly place firms on employment-weighted US map
• We use data for $x_i$ as probabilities
• We draw to get $s_i$
I Calculate γ from this emp draw and placement
I Repeat 10,000 times
I Calculate interval for which 5% of simulated γ are outside
I Change one parameter value: σ and repeat
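The steps above can be sketched with the Python standard library alone. This is a simplified illustration, not the released program. The function names, the seed, the nearest-rank percentile rule, and the values µ = 0 and σ = 1 are all assumptions made for the example. H is simply computed from each lognormal draw, which may differ from how the program uses Herfindahl data to select draws.

```python
import random


def eg_stat(s, x, H):
    """EG stat from industry shares s, employment shares x, and plant Herfindahl H."""
    g = sum((si - xi) ** 2 for si, xi in zip(s, x))
    scale = 1.0 - sum(xi ** 2 for xi in x)
    return (g - scale * H) / (scale * (1.0 - H))


def simulate_gammas(x, n_plants, mu, sigma, iterations=10_000, seed=0):
    """Simulate the EG stat under purely random plant placement (needs n_plants >= 2)."""
    rng = random.Random(seed)
    regions = list(range(len(x)))
    gammas = []
    for _ in range(iterations):
        # Draw plant sizes from a lognormal distribution and compute their Herfindahl.
        sizes = [rng.lognormvariate(mu, sigma) for _ in range(n_plants)]
        total = sum(sizes)
        H = sum((z / total) ** 2 for z in sizes)
        # Place each plant on a region with probability x_i, then form industry shares s_i.
        landed = rng.choices(regions, weights=x, k=n_plants)
        employment = [0.0] * len(x)
        for z, r in zip(sizes, landed):
            employment[r] += z
        s = [e / total for e in employment]
        gammas.append(eg_stat(s, x, H))
    return gammas


def interval(gammas, level=0.95):
    """Nearest-rank cutoffs leaving (1 - level) of the simulated stats outside."""
    ordered = sorted(gammas)
    tail = (1.0 - level) / 2.0
    last = len(ordered) - 1
    return ordered[round(tail * last)], ordered[round((1.0 - tail) * last)]


x = [12 / 65, 18 / 65, 35 / 65]  # employment weights from the forestry example
gammas = simulate_gammas(x, n_plants=7, mu=0.0, sigma=1.0)
lo, hi = interval(gammas)
print(lo, hi)
```

Rerunning with a different `sigma` or `n_plants` shows how the interval widens or narrows.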
Location Random Number Generator
I Computers cannot generate truly random numbers
I Standard random generator produces patterns in 2D
Stylized Numerical Results
Result 1: CI width increases in σ
[Figure: two panels, N = 20 and N = 100; horizontal axis σ, from .2 to 1.4]
Solid: width of 95% mass. Dashed: chance of Type I using γ = 0.05
I Up sloping as σ increases chance of a “fat” dart
Stylized Numerical Results
Result 2: CI width decreases in N
[Figure: two panels; horizontal axis Plants (N), from 10 to 50 and from 100 to 500]
Solid: width of 95% mass. Dashed: chance of Type I using γ = 0.05
I Down sloping as N decreases chance of all plants locating together