The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and...

74
The Geography of Business Dynamism and Skill-Biased Technical Change Hannah Rubinton December 4, 2019 Click here for the latest version Abstract This paper seeks to explain three key components of the growing regional disparities in the U.S. since 1980, referred to as the Great Divergence by Moretti (2012). Namely, big cities saw a larger increase in the relative wages of skilled workers, a larger increase in the relative supply of skilled workers, and a smaller decline in business dynamism. These trends can be explained by dierences across cities in the extent to which firms adopt new skill-biased technologies. In response to the introduction of a new skill-biased, high fixed cost but low marginal cost technology, firms endogenously adopt more in big cities, cities that oer abundant amenities for high-skilled workers and cities that are more productive in using high-skilled labor. The dierences in adoption can account for the increasing relationship between skill intensity and city size, the divergence of the city size wage premium by skill group and changing cross-sectional patterns of business dynamism. I document a new fact that firms in big cities invest more in Information and Communication Technology per employee than firms in small cities, consistent with patterns of adoption in the model. I am extremely grateful to my advisor Richard Rogerson and to my committe, Ezra Oberfield, Teresa Fort and Stephen Redding, for their invaluable help and guidance. For helpful comments, I thank Esteban Rossi-Hansberg, Gene Grossman, Eduardo Morales, Kamran Bilir, Mark Aguiar, Gianluca Violante, Oleg Itskhoki, Nobu Kiyotaki, Gregor Jarosch, Jonathan Dingel, Rebecca Diamond, Chris Tonetti, Benjamin Pugsley, Andrew Bernard, David Chor, Peter Schott, Ethan Lewis, Bob Staiger, Nina Pavcnik, Treb Allen, Laura Castillo-Martinez, Elisa Giannone, Fabian Eckert, Gideon Bornstein, and other visitors to and participants of the IES Student Seminar Series. Thank you to my classmates for their continuous support and encouragement. This research benefited from financial support from the International Economics Section (IES) and the Simpson Center for the Study of Macroeconomics. Any opinions and conclusions expressed herein are those of the author(s) and do not necessarily represent the views of the U.S. Census Bureau. All results have been reviewed to ensure that no confidential information is disclosed. Department of Economics, Princeton University; Email: [email protected] 1

Transcript of The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and...

Page 1: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

The Geography of Business Dynamism and Skill-Biased TechnicalChange⇤

Hannah Rubinton†

December 4, 2019

Click here for the latest version

Abstract

This paper seeks to explain three key components of the growing regional disparities in the

U.S. since 1980, referred to as the Great Divergence by Moretti (2012). Namely, big cities saw

a larger increase in the relative wages of skilled workers, a larger increase in the relative supply

of skilled workers, and a smaller decline in business dynamism. These trends can be explained

by differences across cities in the extent to which firms adopt new skill-biased technologies.

In response to the introduction of a new skill-biased, high fixed cost but low marginal cost

technology, firms endogenously adopt more in big cities, cities that offer abundant amenities

for high-skilled workers and cities that are more productive in using high-skilled labor. The

differences in adoption can account for the increasing relationship between skill intensity and

city size, the divergence of the city size wage premium by skill group and changing cross-sectional

patterns of business dynamism. I document a new fact that firms in big cities invest more in

Information and Communication Technology per employee than firms in small cities, consistent

with patterns of adoption in the model.

⇤I am extremely grateful to my advisor Richard Rogerson and to my committe, Ezra Oberfield, Teresa Fort andStephen Redding, for their invaluable help and guidance. For helpful comments, I thank Esteban Rossi-Hansberg,Gene Grossman, Eduardo Morales, Kamran Bilir, Mark Aguiar, Gianluca Violante, Oleg Itskhoki, Nobu Kiyotaki,Gregor Jarosch, Jonathan Dingel, Rebecca Diamond, Chris Tonetti, Benjamin Pugsley, Andrew Bernard, David Chor,Peter Schott, Ethan Lewis, Bob Staiger, Nina Pavcnik, Treb Allen, Laura Castillo-Martinez, Elisa Giannone, FabianEckert, Gideon Bornstein, and other visitors to and participants of the IES Student Seminar Series. Thank you tomy classmates for their continuous support and encouragement. This research benefited from financial support fromthe International Economics Section (IES) and the Simpson Center for the Study of Macroeconomics.Any opinions and conclusions expressed herein are those of the author(s) and do not necessarily represent the viewsof the U.S. Census Bureau. All results have been reviewed to ensure that no confidential information is disclosed.

†Department of Economics, Princeton University; Email: [email protected]

1

Page 2: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

1 Introduction

Since 1980, economic growth in the United States has been concentrated in large, urban areaswhile small rural areas have fallen further behind. Moretti (2012) refers to this growing gap as theGreat Divergence. The Great Divergence has become an increasing concern for policy makers asthey have learned of its implications for gaps in health outcomes, disintegrating social connections,economic mobility and political tension1. For policies to effectively address these trends, it isnecessary to understand the underlying causes.

This paper seeks to explain several key components of the Great Divergence: big cities sawa larger increase in the relative wages of skilled workers and big cities saw a larger increase inthe relative supply of skilled workers. I also introduce a new fact to the literature on the GreatDivergence: big cities saw a smaller decline in business dynamism, as measured by the rates ofestablishment entry and exit and the rates of job creation and destruction. While in 1980 big andsmall cities had similar rates of business dynamism, today big cities are more dynamic than smallcities. The simultaneous increase in high-skilled wage growth and the abundance of high-skilledworkers is typically explained at the aggregate level by skill-biased technical change (SBTC) (Katzand Murphy, 1992). But, this aggregate story does not address the question of why SBTC shouldhave occurred differently across cities.

To explain these facts I develop a spatial model in which the amount of skill-biased technicalchange will endogenously vary across cities. The key insight from my model is that firms facedifferent incentives to adopt a new technology based on the characteristics of the city in which theyare located. I consider the introduction of a new technology that lowers the marginal cost for a firm,but has a higher fixed cost and uses high-skilled labor more intensively. Firms will adopt more incities that are big, cities with amenities attractive to high-skilled workers, and cities that are moreproductive in using high-skilled labor. The extent of adoption and therefore, skill-biased technicalchange, endogenously varies across cities, explaining the changing relationships between wages andskill intensity with city size.

Differences in adoption also affect rates of business dynamism across cities. In cities with highadoption rates, small and unproductive firms are less profitable, increasing their probability of exit(a selection effect similar to Melitz (2003)), which increases the equilibrium rate of turnover. Sincethese less productive firms are using the old technology which places more weight on low-skilledlabor, the increase in selection amplifies the differences across cities in the share of firms adoptingthe new technology, and therefore, skill-biased technical change.

A key feature of the response to the new technology is that adoption is higher in big cities. Theliterature has emphasized Information and Communication Technologies (ICT) as being intimatelyrelated to skill-biased technologies (Krueger, 1993; Autor et al., 1998). Thus, as evidence for this

1For papers that discuss these implications see Austin et al. (2018), Chetty and Hendren (2018), Chetty et al.(2016), Autor et al. (2019), and Autor et al. (2016).

2

Page 3: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

mechanism, I use data on investment in ICT to establish a novel empirical fact: firms in big citiesinvest more intensively in ICT.

The changing relationship of these economic variables with respect to city size has importantimplications for welfare inequality, economic mobility, and declining regional convergence (Diamond(2016), Moretti (2013), Giannone (2017), and Autor (2019)). Understanding the drivers of thesetrends is key to policy makers who wish to influence them. I make progress by showing thatdifferences across cities in technology adoption will amplify existing geographic inequalities alongthe dimensions for which a new technology is most suitable. Policies that attempt to reallocateeconomic activity across cities should target the incentives firms face to adopt new technologies,but they should also consider how such policies will affect the aggregate rate of technology adoptionand, therefore, aggregate growth.

I do three things in this project. My first contribution is to document empirical changes inthe distribution of economic activity across cities since 1980. I focus on three cross-sectional rela-tionships of interest. First, I introduce a new fact to the literature on the Great Divergence, thechanging relationship between business dynamism and city size since 1980. In 1980, these measureswere similar in big and small cities. By 2014, big cities exhibited much faster rates of dynamismthan small cities. The second relationship is the correlation between average wages and city size,referred to as the city size wage premium, by skill group. While the city size wage premium wassimilar for high- and low-skilled workers in 1980, by 2014, the city size wage premium for high-skilled workers was almost twice that of low-skilled workers. This divergence was driven by both anincreasing city size wage premium for high-skilled workers and a decreasing city size wage premiumfor low-skilled workers. Third, I document that the relationship between skill intensity, or the ratioof high- to low-skilled workers, and city size has increased since 1980.

My second contribution is to build a model that matches the salient features of the data in1980. I embed a rich model of firm dynamics into an otherwise standard spatial equilibrium modelwith high- and low-skilled workers, allowing a joint consideration of the geographic distribution ofrelative wage inequality and firm dynamics. The model features two types of workers, high- and low-skilled, who are freely mobile across space. Workers pay rent to live and work in a city and receivean amenity specific to the city where they live and their skill level. Workers have idiosyncraticpreferences for each city and choose to live in the city that gives them the highest utility. Withineach city there is a continuum of monopolistically competitive firms that use high- and low-skilledlabor to produce non-tradable intermediate goods. Firms pay a fixed cost in units of high- andlow-skilled labor, and they pay rent on a unit of floor space in the city where they produce. Firmsreceive idiosyncratic productivity shocks and, therefore, make dynamic entry and exit decisions.Entrants choose the city where they want to enter and produce. In equilibrium, firms are indifferentbetween entering in different cities. I calibrate the model steady state to the data in 1980 and showthat it can match the key features of the data.

My third contribution is to use the model to analyze the diffusion of a new technology that favorsskilled workers. I consider the introduction of a new technology that has an absolute productivity

3

Page 4: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

advantage, but is more skill-biased in that the marginal productivity of high-skilled labor is higherthan that of the old technology. Firms can choose to adopt the new technology, and, depending onthe factor costs in their city, adopting may lower their marginal cost. However, the new technologyrequires a higher fixed cost. Even though the new technology is available everywhere, firms thatare ex-ante similar will adopt differently depending on the environment in their city.

After introducing the new skill-biased technology and allowing the aggregate supply of high andlow-skilled labor to adjust as it has in the data, I solve for a new steady state equilibrium in themodel. I compare the model steady states before and after the introduction of the new technology,and show that the model can match the key changes in the data, including the changing relationshipof skill intensity and firm dynamics to city size and the divergence of the city size wage premiumby skill group. As evidence for differences in technology adoption across cities, I document a newcross-sectional fact. Using data on ICT investment between 2003 and 2013, I show that firms in bigcities have more ICT-related expenditures per employee than firms in small cities, and firms in bigcities spend a higher share of their total investment budget on ICT.

In the model, there are four channels that drive the higher rates of adoption in the big city. Thefirst channel is the market size effect. For a given productivity, firms in big cities receive a highervolume of sales and are more willing to pay the additional fixed cost to achieve the marginal costsaving from adopting the new technology. The second channel driving differences in adoption isex-ante differences in technology between the cities. In 1980, firms in big cities are already moreproductive at using high-skilled labor than firms in small cities. Even though the new technologykeeps the relative skill intensities of the old and new production functions constant across cities, thecost difference between the old and new technology will be smaller when the old technology is moreskill intensive, which increases the return to adoption in cities that are ex-ante more productivein using high-skilled labor. The third channel that drives differences in adoption are differences inamenities. If a city is rich in amenities for high-skilled workers, then all else equal, there will be alower relative wage for high-skilled workers, which further increases the return to adoption. Finally,a fourth channel affecting adoption rates is selection. In cities where firms adopt more, the exitthreshold below which firms exit the market shifts up, meaning that selection becomes tougher.Small, less productive firms are better at using low-skilled labor than big firms which have adoptedthe new technology. When selection becomes tougher, the small firms exit the market amplifyingthe amount of SBTC that the city experiences. This is the same force that drives up equilibriumdynamism rates in big cities relative to small cities.

Finally, I use the model for a series of counterfactuals. The first counterfactual examines theeffect of the increased sorting of high-skilled workers to big cities. Specifically, I shut down themigration channel in the model, and look at what would happen if cities maintained their 1980share of high-skilled labor. This counterfactual is motivated by one view of these trends

Second, I look at a counterfactual in which congestion forces are relaxed in big, frontier cities.Several recent papers have identified housing supply constraints as an important factor in drivingmisallocation across space and changing patterns of migration by high- and low-skilled workers

4

Page 5: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

(Hsieh and Moretti, 2019; Herkenhoff et al., 2018; Ganong and Shoag, 2017). Using data from Saiz(2010), I find that big cities have more inelastic housing supplies than small cities. I examine theeffect of equalizing housing supply elasticities across cities.

For each counterfactual, I consider the effect on the aggregate rate of technology adoption and,therefore, technical change in addition to the usual considerations of welfare. I find that there is atradeoff between technology adoption and welfare. While a policy of no migration is unambiguouslybad for welfare, it slightly increases aggregate adoption rates and, therefore, may be good fortechnological progress. On the other hand, equalizing housing supply elasticities is unambiguouslygood for welfare, but decreases aggregate technology adoption.

Literature Review

There is a large recent literature devoted to documenting and explaining what Moretti (2012)calls the Great Divergence, or the growing disparities across cities in the United States since 1980.However, the divergence of the city size wage premium by skill group has largely gone unrecognizeduntil Autor (2019) brought the fact to the forefront in his 2019 Ely Lecture (the main exceptionbeing Baum-Snow et al. (2018)). Furthermore, to the best of my knowledge, this paper is thefirst to document the changing relationship of business dynamism with city size and that the welldocumented aggregate decline in dynamism (Decker et al., 2016; Karahan et al., 2019; Pugsley andSahin, 2018) was more severe in small cities. Thus, an important contribution of this project isdocumenting these facts and showing that they are robust to controlling for industry compositionand changing demographics.

As in Giannone (2017), I build on the long literature on aggregate skill biased technical change(Autor et al., 2008; Katz and Murphy, 1992; Krusell et al., 2000), and asses the extent to which dif-ferences in SBTC across cities can account for the Great Divergence. However, instead of calibratingthe differences in SBTC to match the data, I allow the extent of SBTC to be endogenous acrosscities by considering the decision of heterogeneous firms to adopt a new skill-biased technology.

While there is a lot of work on documenting patterns of technology adoption (Fort, 2017; Fortet al., 2018) and some models of a skill biased technology choice where firms trade a high fixed costfor a low marginal cost (Bustos, 2011; Hsieh and Rossi-Hansberg, 2019; Yeaple, 2005), Beaudry et al.(2010) were the first to recognize that the incentives firms face to adopt new technologies will varyacross cities and that the extent of SBTC will endogenously vary as a result. They provide reducedform estimates of the causal impact of shocks to the relative supply of skilled labor on computerpurchases. I build on this insight by embedding this firm decision into a full general equilibriumspatial model in which several characteristics of the location affect the firm adoption decision.

Two recent papers, Davis et al. (2019) and Eeckhout et al. (2019), have a similar goal in thatthey show an aggregate technical change shock will endogenously have different effects in differentcities, but both focus on a polarization shock rather than SBTC. They show that firms incentivesto purchase capital that is substitutable with middle-skill jobs will be higher in big cities. Neitherhave a rich model of firm heterogeneity, and consider the decision of heterogeneous firms to adopt

5

Page 6: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

new technologies. Further, they do not consider how the adoption decisions will affect the city leveldynamism rates or that selection may amplify the differences in adoption.

A large set of papers in the spatial economics literature considers a similar set of facts relatedto the sorting of high- and low-skilled workers across space and the growing relationship betweencity size and the skill premium. Several papers (Baum-Snow et al. 2018; Rossi-Hansberg et al.2019; Michaels et al. 2013; Davis and Dingel 2014, 2019) argue that these facts can be explained byagglomeration economies that are biased towards high-skilled workers. Eckert (2019), Eckert et al.(2019) and Jiao and Tian (2019) argue that decreasing communication or trade costs will causeincreased geographic concentration of the high-skilled business services or high-skilled managersin large, frontier cities amplifying initial productivity differences. Diamond (2016) shows that ifamenities respond endogenously to an increase in the supply of skilled labor, that it will amplify theeffect of a SBTC shock that is asymmetric across space. However, while these theories can explainan increase in the high-skilled city size wage premium and an increase in sorting, they struggle toexplain the decrease in the city size wage premium for low-skilled workers. Instead, I propose thatthese trends can be explained by differences in adoption of a new skill-biased technology. While allof these mechanisms can be complementary, SBTC is crucial for a joint explanation of the empiricalfacts.

A relatively small strand of literature considers differences in business dynamism across cities.Nocke (2006), Asplund and Nocke (2006) and Gaubert (2014) build models in which establishmentturnover will be higher in big cities. Asplund and Nocke (2006) confirm these patterns for Swedishhair salons and Gaubert (2014) in the French micro-data. I am the first to document the relationshipof business dynamism to city size in the United States, and, I am the first to show that thesepatterns changed over time. Despite previous literature documenting a positive correlation betweendynamism and city size in other countries, it is not a stylized fact that big cities are more dynamic;in the 1980s, big cities were not more dynamic than small cities.

The rest of this paper is organized as follows. In section 2, I describe the data and documentchanges in economic activity across cities. In section 3, I present a spatial equilibrium model withfirm dynamics and show that it can match the relevant features of the data in 1980. Then insection 4, I describe the skill-biased technical change shock and the calibration of the new skill-biased technology. In section 5, I show the extent to which a skill-biased technical change shockcan account for the changes in economic activity across cities since 1980. In section 6, I discussfour channels that drive differences across cities in technology adoption in the model and validatethese mechanisms using data on ICT spending. Finally in section 7, I use the model to undertakecounterfactuals before concluding in section 8.

2 Descriptive Facts

In this section, I document several facts related to the changing relationship of economic activitywith city size over time. First, the correlation between average wages and city size, known as the

6

Page 7: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

city size wage premium, diverged by skill group. Second, the relationship between skill intensityand city size has grown steeper over time. In 1980, big and small cities had similar ratios of high-to low-skilled workers, but by 2014, big cities became much more skill intensive than small cities.Third, firms in big cities spend more on ICT per employee than firms in small cities and a highershare of their total investment is spent on ICT. This relationship holds even when controllingfor characteristics of the firms such as industry, size and age. Finally, I document the changingrelationship between business dynamism and city size. In 1980 big and small cities exhibited similarrates of business dynamics while today big cities are more dynamic than small cities. Relatedly, Ishow that the well documented decline in the aggregate start up rate (Decker et al., 2016; Pugsleyand Sahin, 2018) was more pronounced in small cities than big cities.

2.1 Data description

To document facts on the changing relationship of economic activity with city size over time, I useseveral micro datasets on individuals and establishments from the U.S. Census Bureau. ThroughoutI use Core Based Statistical Areas (CBSAs) as the definition of cities2. As a measure of city size,I use working age population (ages 20-64), though all results are robust to using total population.Estimates of county level working age population are available from the Census Bureau’s IntercensalPopulation Estimates, which I aggregate to the CBSA level. In the main results, I limit the sampleof cities to those with non-missing values of population, wages and measures of business dynamism.Measures of business dynamism are only available publicly in the Business Dynamic Statistics forMetropolitan Statistical Areas (MSAs, cities with a population of greater than 50, 000). The finalsample includes 366 MSAs. Where possible, I provide results using the full sample of CBSAs, whichincludes 942 cities.

After aggregating to the CBSA level, I further aggregate the cities into 30 bins based on 1980city size with 12 or 13 cities per bin. Binning by city size has several advantages. First, I cancalibrate and solve my model with 30 bins, which would be computationally infeasible with the fullset of cities. Second, the analysis with equally sized bins gives equal weight to cities across the sizedistribution. Alternatively, performing the analysis at the city level would give more weight to smallcities since there are more small cities than big cities while weighting by population would give moreweight to big cities since the biggest cities are home to a disproportionate share of workers3.

To document facts on the city size wage premium and skill intensity by city size, I use data fromthe 1980, 1990, and 2000 Decennial Censes and the American Community Surveys (ACS) from 2001to 20144 compiled by IPUMS (Ruggles et al., 2019). The finest geographic unit available in the ACSis the Public Use Micro Area (PUMA) which, unfortunately, does not map uniquely to a CBSA. Todeal with cases where there is no unique mapping, I follow the method of Autor and Dorn (2013)and Baum-Snow et al. (2018): when an individual resides in a PUMA that crosses CBSA lines, I

2I use the 2009 definitions following the public use Business Dynamic Statistics.3The top 10 cities contain 29.6 of the population in 1980.4I stop my analysis in 2014 because that is the last year the BDS is available.

7

Page 8: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

include the individual in both CBSAs and adjust their weights for the probability that they residein each using the Geographic Equivalency Files from the Census Bureau.

I limit the sample to civilian workers aged 16 to 64 who work at least 30 hours a week. Iuse the occupation codes from Autor and Dorn (2013) and the 1990 industry codes provided byIPUMS. I group the detailed race categories into four groups that are consistent over time: whitenon-hispanic, hispanic, black and asian. Wages include all wage and salary income including cashtips, commissions and bonuses collected in the previous calendar year. High-skilled workers arethose with at least four years of college while low-skilled workers are those with less than fouryears of college. In appendix A, I show the main facts are robust to alternative definitions of skill.For data on rents and housing prices, I follow Ganong and Shoag (2017) and calculate a housingcomposite measure which is 12 time monthly rent for renters and 5 percent of the value of the housefor homeowners. I use measures of housing supply elasticity from Saiz (2010).

To document facts on business dynamics and technology adoption I use several different sourcesof firm and establishment micro data from the U.S. Census. The primary dataset is the LongitudinalBusiness Database (LBD) which is a panel dataset of all U.S. establishments with at least oneemployee. The LBD includes information on establishment county, employment, industry, andpayroll from 1976 to 2014. Establishments are linked to their parent firm using a firm ID. Firmage is imputed from the first time its oldest establishment is observed in the LBD. Because theLBD starts in 1976, I only have meaningful age information starting in 1987 at which point I knowif a firm is age 0 to 10 or 11+. For firms with establishments across multiple sectors, I assign afirm industry from the establishment level industry codes using the industry with the firm’s highestshare of employment. I assign the two digit NAICS code first and then the three digit code withthe most employment that is consistent with the two digit codes, and so on, up to six digits. For aconsistent measure of industry codes going back to 1976, I use the NAICS codes available for theLBD from Fort and Klimek (2018). I also use the public Business Dynamic Statistics (BDS), whichis available at the MSA level and provides information on the number of firms and establishmentsand job creation and destruction by firm and establishment age.

I supplement the LBD with establishment level sales data from the Economic Censuses. Thereare 9 Economic Censuses5 available in years ending in two and seven (1977, 1982, etc.). However,the coverage varies by sector. As such, when using data on sales, I use data only from sectors thatwere available in 19776. This ensures the results are not driven by changes in sample selection.

For data on ICT expenditures, I use data from the Annual Capital Expenditure Survey (ACES),which has an annual ICT supplement starting in 2003. The survey is collected at the firm leveland, when a firm is sampled into the ICT supplement, contains annual ICT expenditures. A newsample of firms is chosen each year giving a series of repeated cross-sections. The data include

5Census of Construction Industries, Census of Services, Census of Retail Trade, Census of Manufacturers, Censusof Wholesale Trade, Census of Auxiliary Establishments, Census of Finance, Census of Mining, and the Census ofUtilities.

6All of the Economic Census were available in 1977 except the Census of Finance, Census of Mining and theCensus of Utilities.

8

Page 9: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

capitalized and non-capitalized expenditures, as well as rental, lease and maintenance paymentsfor software, computer equipment, and other peripheral ICT equipment. Software expenses includeprepackaged, customized or in-house built software, including payroll related to development. Wherepossible, I supplement the ACES data with information on employment, payroll, sales and industryfrom the LBD and the Economic Censuses. Unfortunately, only 95.5% of records contain the firmidentifiers necessary to match to the LBD. Appendix A.4 provides information on the characteristicsof matchers versus non-matchers. Because the ICT supplement is only available in later years, Iinterpret the data as being informative about the period in my model in which the new skill-biasedtechnology has already become available.

For the ACES data, which are at the firm level, I assign a firm location using the location ofthe firm’s establishment with the highest payroll per employee. I use payroll per employee insteadof the location with the highest share of employees because the number of employees is likely to bespuriously correlated with the size of the city. For example, I do not want to assign the location ofStarbucks as New York City just because it has a location on every block. Further, an establishmentwith higher than average pay per employee is likely to be associated with high paying managementactivities.

2.2 Demographic facts

I document two sets of facts about changing outcomes for skilled and unskilled workers by citysize. First, I show that the city size wage premium has diverged by skill group. Second, I showthe relationship between skill intensity and city size has grown steeper over time. Closely relatedfacts have been previously documented by researchers (including Autor (2019); Baum-Snow et al.(2018); Diamond (2016); Giannone (2017)) and I reproduce them in my sample as they will be animportant input in my quantitative exercise.

Fact 1: the city size wage premium diverged by skill group since 1980

Figure 1, shows the relationship between wages and city size by skill group. In 1980, theelasticity of wages with respect to city size was about .056 and .047 for high- and low-skilledworkers, respectively, meaning that high-skilled workers earned about 3.9 percent more for workingin cities twice as large and low skilled workers earned about 3.2 percent more7. By 2014, the wagepremium had substantially diverged. The city size wage premium increased to .072 for high-skilledworkers, while for low-skilled workers the city size wage premium fell to just .035 percent.

The divergence of the city size wage premium by skill group is robust to controlling for demo-graphics, occupation, and industry of the workers. To control for demographics, I estimate a Mincerregression in each year and for each skill group separately,

log(wit) = ↵t + �tXit + ✏it.

7I multiply the coefficient by 2 log points (approximately .7) to arrive at these numbers

9

Page 10: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

β1980=0.056(0.005)β2014=0.072(0.004)

−.1

0

.1

.2

.3

log(w

h),

dem

eaned

10 12 14 16

log(population)

1980 2014

(a) High skilled wages

β1980=0.047(0.007)β2014=0.035(0.004)

−.1

−.05

0

.05

.1

.15

log(w

l), dem

eaned

10 12 14 16

log(population)

1980 2014

(b) Low skilled wages

Figure 1: City size wage premium by skill group

Source: Wages are from the 1980 Decennial Census and 2014 ACS. Population is from the Intercensal PopulationEstimates. Figure displays the relationship between log average wages and city size. The unit of observation is oneof 30 city size categories. Wages are demeaned by year Specifically, for each skill type ⌧ and each year t I plot arelog(w⌧t)� log(w⌧t).

In the first set of regressions I include standard demographic controls typical in a Mincer regression:age, age squared, a gender dummy and full set of race fixed effects. In the second set of regressions,I add a full set of industry and occupation fixed effects. I then use the residuals to compute theaverage adjusted wages in a city, wc,t =

Pi2c exp(✏it)

8. I fit a separate Mincer regression for eachskill group and in each year to allow the coefficients to change over time and to differ by skill. Thisensures that the divergence in the city size wage premium is not driven by differences in demographiccomposition between big and small cities. In appendix A, Tables 10 and 11 reproduces these figuresusing more education groups and Table 9 reproduces them using the full set of cities instead of citybins.

In panels (a) and (b) of figure 2, I show the city size wage premium adjusted for standarddemographic controls. The divergence of the city size wage premium by skill group is unchanged.For both groups the city size wage premium is about .045 percent in 1980. By 2014, high-skilledworkers are earning a city size wage premium of .074 percent for while for low-skilled workers thecity size wage premium fell to .038 percent.

Finally in panels (c) and (d) of figure 2, I show the city size wage premium controlling foroccupation and industry fixed effects in addition to the standard demographic controls. Includingindustry and occupation fixed effects decreases the level of the city size wage premium substantiallysuggesting that workers are already engaged in different activities in big and small cities in 1980.

8I take the exponential of the residual because the Mincer regression uses log wages and the residual shouldtherefore be interpreted as the adjusted log wage. Instead one could compute the average of the residuals directly,but this would be interpreted as taking the average of the log(wit) in a city. In practice both methods producesimilar results. However, in theory, the difference between the mean of the log and the log of the mean will dependon variance of the wage distribution. Baum-Snow and Pavan (2013) and Santamaria (2018) show that inequality issystematically higher in big cities suggesting that using the mean of the log could bias the estimate of the city sizewage premium.

10

Page 11: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

β1980=0.046(0.003)β2014=0.074(0.004)

−.1

0

.1

.2

.3

log

(ad

j. w

h),

de

me

an

ed

10 12 14 16

log(population)

1980 2014

(a) High-skilled wages

β1980=0.043(0.006)β2014=0.038(0.004)

−.1

−.05

0

.05

.1

.15

log

(ad

j. w

l),

de

me

an

ed

10 12 14 16

log(population)

1980 2014

(b) Low-skilled wages

β1980=0.046(0.003)β2014=0.074(0.004)

−.1

0

.1

.2

.3

log

(ad

j. w

h),

de

me

an

ed

10 12 14 16

log(population)

1980 2014

(c) High-skilled wages

β1980=0.043(0.006)β2014=0.038(0.004)

−.1

−.05

0

.05

.1

.15

log

(ad

j. w

l),

de

me

an

ed

10 12 14 16

log(population)

1980 2014

(d) Low-skilled wages

Figure 2: City size wage premium by skill group, residual wages

Source: Wages are from the 1980 Decennial Census and 2014 ACS. Population is from the Intercensal PopulationEstimates. Figure displays the relationship between log average adjusted wages and city size. The unit of observationis one of 30 city size categories. Wages are demeaned by year. Specifically, for each skill type ⌧ , city j and eachyear t, I plot are log(w⌧jt)� log(w⌧t). Panels (a) and (b) show wages adjusted for demographics including age, agesquared, gender and race dummies. Panels (c) and (d) add further controls for occupation and industry fixed effects.

11

Page 12: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

β1980=0.026(0.004)β2014=0.074(0.009)

−.1

0

.1

.2

.3

skill

inte

nsi

ty,

de

me

an

ed

10 12 14 16

log(population)

1980 2014

Figure 3: Skill intensity and city size

Source: Skill intensity is from the 1980 Decennial Census and 2014 ACS. Population is from the Intercensal PopulationEstimates. Figure displays the relationship between demeaned skill intensity and city size. Specifically for a city jand year t demeaned skill intensity is H

L jt� H

L t. The unit of observation is one of 30 city size categories.

The occupation and industry controls also explain some, but not all of the divergence. In 1980,both groups earn a wage premium of approximately .025%. By 2014, it is .052% for high-skilledworkers and .030% for low-skilled workers. The city size wage premium for low-skilled workers isno longer lower in 2014 than in 1980, but the divergence between the high and low-skilled city sizewage premium still holds.

Fact 2: the relationship between skill intensity and city size has grown steeper over time

Figure 3 shows the relationship between skill intensity and city size. Skill intensity is the ratio ofhigh-skilled workers, those with 4 or more years of college, to low-skilled workers, those with lessthan 4 years of college. In 1980, the semi-elasticity of skill intensity with respect to city size was.026 percentage points. By 2014, the semi-elasticity increased to .074. Cities that were twice aslarge in 1980 were, on average, 1.82 percentage points more skill intensive. By 2014, the relationshipbetween skill intensity and city size had steepened; cities twice as large were 5.18 percentage pointsmore skill intensive. In appendix A.1 figure 11, I show the results are robust to using alternativedefinitions of high- and low-skilled and in figure 12 to using the full set of cities.

2.3 Firm facts

Fact 3: Today, firms in big cities invest more in ICT.

Table 1 shows the results from a regression of log of ICT expenses per employee and ICT shareof investment on city size and industry fixed effects. The elasticity of ICT spending per employeewith respect to city size is .054 percent meaning that firms in cities that are twice as large spend3.8 percent more per employee on ICT investment than firms in small cities. In column 2, I add

12

Page 13: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

log( ICT investmentemployment ) ICT investment

total investmentlog(pop) 0.0541⇤⇤⇤ 0.0568⇤⇤⇤ 0.0142⇤⇤⇤ 0.0145⇤⇤⇤

(0.0109) (0.00919) (0.00119) (0.00117)

log(emp) �0.430⇤⇤⇤ �0.0156⇤⇤⇤

(0.00841) (0.00147)

log(estabs) 0.450⇤⇤⇤ �0.00624⇤⇤⇤

(0.0124) (0.00214)

NAICS 4 FEs Y Y Y Y

Year FEs Y Y Y Y

Age FEs N Y N Y

N 239000 239000 269000 269000

R2 0.194 0.324 0.146 0.15

SEs clustered at the CBSA level*** p<0.01, ** p<0.05, * p<0.1

Table 1: Technology adoption and city size

Source: ICT data is from ACES and merged into the LBD for information on age, size and location of the firm.Population is from the Intercensal Population Estimates. Table displays the relationship between log of ICT spendingand ICT share of investment and city size. The unit of observation is a firm and city size is measured at the CBSAlevel. Standard errors are clustered at the CBSA level.

controls for the age and size of the firm with little effect on the correlation between city size andICT spending per employee. Similarly, columns 3 and 4 present the same regressions using the ICTshare of investment as the dependent variable. Firms in cities twice as large spend, on average, 1percentage points more of their total investment budget on ICT-related expenses such as computerequipment and software development. Controlling for size and age has little effect on the coefficient.I run these regressions on pooled data across the years for which the ICT data is available (2003-2013) and include yearly fixed effects in the regression. Because the data are only available forrecent years, I interpret these results as being informative about the second steady state in mymodel in which the new technology is available.

Fact 4: Big cities today are more dynamic than small cities. This was not true in 1980.

Throughout the analysis, I use several different measures of dynamism including the firm start upand exit rate, the establishment start up and exit rate and the job creation and destruction rates.Following Davis et al. (1996), the start up rate in time t is defined as srt = age 0 firms

t

(firmst+firmst�1)/2

; the exitrate is defined as ert = firm deathst

(firmst+firmst�1)/2

; the establishment start up rate and exit rates at time t are

defined analogously as esrt = age 0 establishmentst

(establishmentst+establishmentst�1)/2

and eert =establishments exitst

(establishmentst+establishmentst�1)/2

;and job creation and destruction are employment increases and decreases at expanding or contract-ing firms, respectively. Specifically, job creation is jcrt =

Pi(empti�empt�1,i)I[empti>empt�1,i]

(empt+empt�1)/2and

jdrt =P

i(empt�1,i�empti)I[empti<empt�1,i]

(empt+empt�1)/2. I take three year moving averages of all dynamism rates

to smooth out cyclical variation.

13

Page 14: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

β1980=−0.322(0.227)β2014=0.452(0.119)

−2

0

2

4

6

est

ablis

hm

ent st

art

up r

ate

, dem

eaned

10 12 14 16

log(population)

1980 2014

(a) Establishment start up rate

β1980=−0.142(0.090)β2014=0.229(0.079)−2

−1

0

1

2

est

ablis

hm

ent exi

t ra

te, dem

eaned

10 12 14 16

log(population)

1980 2014

(b) Establishment exit rate

β1980=−0.085(0.212)β2014=0.594(0.095)

−2

0

2

4

6

firm

sta

rt u

p r

ate

, dem

eaned

10 12 14 16

log(population)

1980 2014

(c) Firm start up rate

β1980=0.160(0.062)β2014=0.308(0.053)

−1

−.5

0

.5

1

firm

exi

t ra

te, dem

eaned

10 12 14 16

log(population)

1980 2014

(d) Firm exit rate

β1980=−0.275(0.300)β2014=0.451(0.101)

−4

−2

0

2

4

6

job c

reatio

n r

ate

, dem

eaned

10 12 14 16

log(population)

1980 2014

(e) Job creation rate

β1980=−0.182(0.160)β2014=0.241(0.061)

−2

−1

0

1

2

3

job d

est

ruct

ion r

ate

, dem

eaned

10 12 14 16

log(population)

1980 2014

(f) Job destruction rate

Figure 4: Dynamism and City size

Source: Business Dynamic Statistics and author calculations. Population is from the Intercensal Population Esti-mates. Figure displays the relationship between demeaned dynamism, as measured by the establishment start upand exit rate, the firm start up and exit rate and job creation and destruction, and city size. Specifically for a city jand year t demeaned dynamism rate Djt is Djt � Dt. The unit of observation is one of 30 city size categories.

14

Page 15: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

Figure 4 shows the relationship between dynamism and city size for 1980 and 2014 using datafrom the public Business Dynamic Statistics. In 1980, there is no relationship between measures ofdynamism and city size while by 2014 big cities are more dynamic than small cities. In 2014, thesemi-elasticity of the establishment start up rate with respect to city size is .45 percentage points,meaning that the establishment start up rate is .32 percent higher in cities twice as large.

In appendix A.2, I show that these results results are robust to controlling for city-level industrycomposition. I present the correlation between dynamism and city size over time controlling forindustry composition at the 3 digit NAICS level. To do this, I compute dynamism measures withina city-industry cell and then estimate the following specification

Dict = ↵it + �t ⇥X

t

�tlog(popct) + ✏ict

where Dict is the measure of dynamism and ↵it is a full set of industry-year fixed effects. Thepopulation variable is interacted with a full set of year dummies, �t, giving an estimate of the cross-sectional relationship between dynamism and city size, �t, for each year. Because city-industrybins become very small for detailed industry categories, I also perform an alternative analysis bygrouping cities into 7 size categories9. I then calculate the dynamism measures within a city sizecategory-industry bin, using 4 digit NAICS codes.

βfirm start up rate=0.597(0.135)βfirm exit rate=0.104(0.063)

−10

−8

−6

−4

−2

∆D

1980,2

014

10 12 14 16

log(population), 1980

firm start up rate firm exit rate

(a) Firm start up and exit rate

βestab. start up rate=0.684(0.129)βestab. exit rate=0.320(0.056)

−10

−8

−6

−4

−2

∆D

1980,2

014

10 12 14 16

log(population), 1980

estab. start up rate estab. exit rate

(b) Establishment start up and exit rate

βjob creation rate=0.652(0.219)βjob destruction rate=0.391(0.136)

−12

−10

−8

−6

−4

−2

∆D

1980,2

014

10 12 14 16

log(population), 1980

job creation rate job destruction rate

(c) Job creation and job destruction rate

Figure 5: Decline in dynamism versus initial city size

Source: Business Dynamic Statistics and author calculations. Population is from the Intercensal Population Esti-mates. Figure displays the relationship between the demeaned decline in dynamism, as measured by the establishmentstart up and exit rate, the firm start up and exit rate and job creation and destruction, between 1980 and 2014 andcity size. Specifically for a city j and year t demeaned dynamism rate Djt is �1980,2014Dj � ¯�1980,2014Dt. The unitof observation is one of 30 city size categories.

Next, I show that cities that were small in 1980 exhibited larger declines in dynamism thanbig cities. This ensures that the changing relationship of dynamism to city size is not just drivenby dynamic cities moving up the city size distribution. Figure 5 shows the relationship between

9Specifically, city size categories are based on population percentiles. City size category 1 is cities below the 50thpercentile, category 2 is the 50th to 75th percentile, 3 is the 75th to 90th percentile, 4 is the 90th to 95th percentile,5 is 95th percentile to 98th percentile, 6 is the 98th to 99th percentile and finally category 7 is cities above the 99thpercentile.

15

Page 16: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

� 1980 to 2014

sr er esr eer jcr jdr

log(pop1980) 0.349⇤⇤⇤ 0.0979⇤⇤⇤ 0.322⇤⇤⇤ 0.286⇤⇤⇤ 0.780⇤⇤⇤ 0.308⇤⇤⇤

(0.0557) (0.0387) (0.0586) 0.0444) (0.0822) (0.0567)

Observations 57, 500 57, 500 57, 500 57, 500 58, 500 58, 500

R2 0.141 0.066 0.101 0.070 0.073 0.061

*** p<0.01, ** p<0.05, * p<0.1

Table 2: Decline in Dynamism and City Size, industry controls

Source: LBD and author calculations. Population is from the Intercensal Population Estimates. Figure displays therelationship between the demeaned decline in dynamism, as measured by the establishment start up and exit rate,the firm start up and exit rate and job creation and destruction, between 1980 and 2014 and city size. Specificallyfor a city j and year t, demeaned decline in the dynamism rate Djt is �1980,2014Dj � ¯�1980,2014Dt. The unit ofobservation is a CBSA ⇥ 3-digit NAICS. Regression includes a full set of industry fixed effects. Standard errors areclustered at the CBSA level. Observations are rounded to the nearest 100 for disclosure avoidance.

the decline in dynamism versus 1980 city size in the public BDS data. To control for industrycomposition, I estimate the following specification

�1980,2014Dic = ↵i + �log(popc1980) + ✏ic

where �2014,1980Dic is the change in dynamism between 1980 and 2014 in city c and industry i, ↵i

is a full set of industry fixed effects and popc1980 is a city’s 1980 working age population. I estimatethis at the city-NAICS 3 level. The results are presented in table 2. In appendix A.2, I group citiesinto size bins and control for more detailed industry composition.

In cities twice as large, the establishment start up rate fell .5 percentage points less than incities half the size. A back of the envelope calculation suggests that if the decline in dynamism insmall cities had been the same as the decline in big cities, the aggregate establishment start up ratewould have declined by half as much.

3 A Spatial Equilibrium Model with Firm Dynamics

A key objective of the model is the explain the 1980 relationship of wages, skill intensity anddynamism with city size, as documented in the previous section. As such, the model imbeds a stan-dard model of firm dynamics (Hopenhayn, 1992) into a benchmark spatial equilibrium framework(Roback, 1982) allowing a joint consideration of firm dynamics and relative wage inequality. Thereare two types of workers, high- and low-skilled, who are freely mobile between J cities. Workers haveidiosyncratic preferences for each city, and they consume a freely traded final good and housing. Ineach city there is a final goods producer that aggregates a continuum of local intermediate varieties.The perfectly competitive final good is freely traded on the national market, and the price of thefinal good is normalized to one.

16

Page 17: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

In each city there is a continuum of monopolistically competitive firms that produce using aCES bundle of high- and low-skilled labor. They sell their output to a local final goods producer.Intermediate firms pay a fixed cost of production in units of high- and low-skilled labor weightedaccording their CES cost shares. Firms must also purchase a fixed unit of floor space in their city.Time is continuous and firms are subject to a stochastically varying productivity. Firms exit whentheir productivity falls below a threshold which varies endogenously across cities. Firms choose acity in which to enter and pay an entry cost denominated in units of labor. Firms cannot move oncethey enter. In the 1980 steady state, all firms within a city use the same technology to produce. Insection 3.7, I discuss the calibration of the initial steady state.

3.1 Final good producer

In each city j, a final goods producer uses a continuum of local intermediate varieties, indexed by!. The final goods producer aggregates the intermediates according to a standard CES demandfunction

Yj =

Z

⌦j

qj(!)��1� d!

! �

��1

.

Solving the profit maximization problem of the final goods producer, an intermediate firm faces thedemand function

qj(!) = YjP�

j p(!)�� = RjP

��1j

p(!)��,

where Pj =⇣R⌦j

pj(!)1��d!⌘ 1

1�� is the price of the final good. The final good is freely traded in anational market and the price is normalized to 1.

3.2 Intermediate producer

The intermediate firm’s problem can be separated into a static component and a dynamic compo-nent. In the static profit maximization problem, firms choose their optimal bundle of high- andlow-skilled labor to minimize their unit cost function and then choose price and output to maximizeprofits given demand from the local final goods producer. In the dynamic problem, incumbent firmschoose the optimal exit threshold given their stochastically varying productivity and a continuumof potential entrants choose whether or not to enter.

Static problem

All intermediate firms in a city j produce using the same CES bundle of high- and low-skilled labor

qjt = jzt(!)⇣(1� �j)ljt

✏�1✏ + �jhjt

✏�1✏

⌘✏

✏�1 ,

where zt(!) is a firm specific stochastically varying productivity term and j is a city specificproductivity term. The weight placed on high-skilled labor �j is city specific.

17

Page 18: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

First, firms choose the bundle of high- and low-skilled labor that minimizes their unit costfunction,

cj(z) = minljt,hjt

wljtljt + whjthjt

such that jzt(!)

⇣(1� �j)ljt

✏�1✏ + �jhjt

✏�1✏

⌘✏

✏�1 � 1.

Solving the minimization problem gives the Hicksian demand function for labor (dropping timesubscripts for ease of notation)

ldj (z) =w�✏lj(1� �j)✏

jz(�✏jw1�✏hj

+ (1� �j)✏w1�✏lj

)✏

✏�1,

hdj (z) =w�✏hj�✏j

jz(�✏jw1�✏hj

+ (1� �j)✏w1�✏lj

)✏

✏�1,

and the unit cost function

cj(z) = ( jz)�1 (�✏jw

1�✏hj

+ (1� �j)✏w1�✏

lj)

11�✏

for a firm in city j with productivity z. The share of labor that is high-skilled and low-skilled,respectively, that a firm uses to produce one unit of output is given by ✓h

j=

hd

j

ld

j+h

d

j

and ✓lj=

ld

j

ld

j+h

d

j

.Next, the intermediate firm chooses output and price to maximize profits subject to their down-

ward sloping demand curve from the final goods producer:

⇡j(z) = maxpj(!),qj(!)

pj(!)qj(!)� qj(!)cj(z)� (✓hjwhj + ✓ljwlj)fc

j � rjfb

j

such thatqj(!) = YjP

j pj(!)��,

where f c

jis the fixed cost paid by firms in units of high- and low-skilled labor and f b

jis the fixed

units of building space firms need to produce in city j. Firms choose their price pj(z) as a mark-upover their marginal cost (which, since the production function exhibits constant returns to scale, isthe same as the unit cost function cj(z) solved for in the previous step)

pj(z) = cj(z)�

� � 1.

Variable profits are thus

⇡vj (z) =

✓cj(z)

� � 1

◆1��YjP

j

1

�.

18

Page 19: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

Dynamic problem: exit

The productivity of intermediate firms varies according to a geometric Brownian motion, and firmsdiscount future profits at an interest rate ⇢10. The dynamic problem of the firm is as follows: firmschoose the optimal exit threshold zx below which they will exit the market

V (z) = maxzxjz

Ez

ZT (zxj)

0e�⇢t

⇣⇡j(z(t))� (✓hjwhj + ✓ljwlj)f

c

j � rjfb

⌘dt

dlog(zt) = µdt+ dW (t),

where T (zxj) is the time at exit. Defining

s = ln z��1

and using a change of variable, the value function can be rewritten as

V (s) = maxsxjs

Es

ZT (sxj)

0e�⇢t

hZje

s � (✓hjwhj + ✓ljwlj)fc

j � rjfb

idt

dst = µdt+ dW (t),

where Zj =⇣( j)

�1 (�✏jhw1�✏jh

+ �✏jlw1�✏jl

)1

1�✏�

��1

⌘1��YjP �

j

1�

and µ = (� � 1)�µ� 1

2 2�

and =

(� � 1). The stochastic process for s is derived using Ito’s Lemma11 and follows a standardBrownian motion.

The solution to the firm’s optimal stopping time problem solves the HJB equation

⇢V (s) = [Zjes � (whj + wlj)f

c

j � rf b

j ] + µV 0(s) +1

2 2V 00(s)

and the following boundary conditions

no bubble: lims!1

V (s)� Vp(s) = 0

value matching: V (sxj) = 0

smooth pasting: V 0(sxj) = 0.

10Household consumption is linear in wages, so the interest rate will be pinned down by the household’s discountrate.

11Ito’s lemma says that if x is a Brownian motion dx = µ(x)dt+�(x)dW then y = f(x) is also a Brownian motionof the form dy = df(x) =

�µ(x)f 0(x) + 1

2�2(x)f 00(x)

�dt+ �(x)f 0(x)dW . Thus,

ds = df(z) =1a(� � 1)

✓µz

1z� 1

2

2z2 1z2

◆dt+

1a(� � 1) dW (t)

= (� � 1)

✓µ� 1

2

2

◆dt+ (� � 1) dW (t) = µdt+ dW (t)

19

Page 20: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

An advantage of using continuous time is that it allows a closed for solution for the value function.I derive the solution to the value function in appendix B.1:

V (s) = Zj

⇢�µ� 12

2 es + 1

⇢[�(whj + wlj)f c

j� rjf b

j]...

+

�Zj

⇢�µ� 12

2 esx�⇠�sx � 1

⇢[�(✓h

jwhj + ✓l

jwlj)f c

j� rjf b]e�⇠

�sx

�e⇠

�s,

and the exit threshold

sxj = ln

"1

Zj

✓�⇠�

1� ⇠�

◆ ⇢� µ� 1

2 2

!⇣(✓hjwhj + ✓ljwlj)f

c

j + rjfb

⌘#

where ⇠� = �µ�p

µ2+2 2⇢

2 .

Dynamic problem: entry

There is an unbounded mass of potential entrants. Entrants pay a fixed entry cost in units of high-and low-skilled labor, weighted by the CES shares of high- and low-skilled labor, and start with aninitial productivity of ze. Firms choose the city in which to enter that will maximize their expectedvalue. Once they start in a city, they cannot move. As a result, a free entry condition holds in eachmarket

Vj(ze) = (✓hjwhj + ✓ljwlj)fe

j ,

such that entrants are indifferent between potential entry locations.

3.3 Firm size distribution

The distribution of firm size evolves according to the Kolmogorov Forward Equation

@gj(s)

@t= �µ

dgj(s)

ds+

1

2 2d

2gj(s)

ds2.

In steady state the distribution must be unchanging, @gj(s)dt

= 0, which implies the stationarydistribution must satisfy �� dgj(s)

ds= d

2gj(s)ds2

where � = �µ

2/2. In appendix B.2 I show that the

distribution can be solved for in closed form, and is described by the following equation:

gj(s|se) =

8<

:

Ej

µ

�e��(s�sxj) � 1

�s 2 (sxj , se)

Ej

µ

�e�sxj � e�se

�e��s s > se.

(1)

Integrating over the distribution of firms in city j gives the mass of firms Mj .

Mj =

Z 1

sxj

gj(s) =Ej

µ(sxj � se).

20

Page 21: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

Rearranging the expression for the mass of firms and noting that in steady state, the mass of exitors,Ej , must be equal to the mass of entrants, Nj , the start up rate in city j is given by

srj =Nj

Mj

=�µ

sej � sxj.

The start up rate in a city is a function of the drift in firm size, µ, (note that µ < 0) and thedifference between the entry and exit thresholds. As the exit threshold increases, the start up raterises. This is because as the exit threshold increases, the probability that a firm exits increases.Since the entry rate and exit rate must be equal in steady state, an increase in the probability ofexit implies an increase in the start up rate.

3.4 Households

Labor is perfectly mobile across cities. High- and low-skilled workers provide one unit of labor andconsume the final good and residential land. The utility of worker i of type ⌧ in city j is given by

Vi⌧j(w⌧j , qj) = maxc,b

log(c�⌧jd1��⌧j

) + log(A⌧j) + ⇣ij

subject to their budget constraintw⌧j = c⌧j + qjd⌧j ,

where qj is the price of residential floor space and ⇣ij is an idiosyncratic location preference drawnfrom a type I extreme value distribution with shape parameter ⌫, which determines the elasticityof labor supply with respect to the real wage. A⌧j is a type specific amenity received for living incity j. The indirect utility function is given by

Vi⌧j(w⌧j , qj) = log⇣��(1� �)(1��)w⌧jq

��1j

⌘+ log(A⌧j) + ⇣ij .

Define V⌧j ⌘ log⇣��(1� �)(1��)w⌧jq

��1j

⌘+ log(A⌧j). Then the probability that city j provides

the highest utility to worker i will follow a multinomial logit

⇡⌧j = P (Vi⌧j > Vi⌧k, 8k 6= j) =exp(⌫V⌧j)Pjexp(⌫V⌧j)

.

3.5 Land

A landlord builds floorspace and rents commercial space to firms and residential housing to workers.He chooses the amount of floor space, Bj , to build and the share of it that is rented to firms, ⇥j ,to solve

maxBj ,✓j

rj⇥jBj + qj(1�⇥j)Bj �B⌘j

j

b⌘�1j

⌘j,

21

Page 22: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

where rj is the rental price of a commercial floor space, qj is the rental price of residential housing,

and B⌘j

j

b⌘�1j

⌘j

is the cost of producing Bj units of floor space. In steady state, a city cannot existwithout strictly positive amounts of commercial and residential space. A no arbitrage conditionrequires that rents for both uses of floor space are equal, rj = qj , otherwise landlords would onlyrent to the users that pay the highest rent.

This gives an equation for the supply of buildings

Bj = bjr1

⌘j�1

j,

where 1⌘j�1 is the elasticity of building supply and bj is the productivity of the building sector.

Profits of the landlord are ⇧B

j= (bjrj)

⌘j

⌘j�1 1bj(⌘j�1⌘j

) and their costs, which are paid in units of the

final good, are cBj= 1

bj⌘j(bjrj)

⌘j

⌘j�1 . The market clearing condition for the building market is

Mjfb + (1� �)Hj

whj

rj+ (1� �)Lj

wlj

rj= Bs

j

where the left-hand side gives demand for buildings from firms and workers, respectively.

3.6 Equilibrium

Profits are made by the landlords and by the owners of the firms. I assume that profiteers consumeunits of the final good in their city, but do not consume housing. Thus, even though the final goodis freely traded, the final goods market will clear in each city since the final good is not differentiatedand each city makes only enough to satisfy local demand. The final goods market clearing conditionin each city is given by

�wHjHj + �wLjLj +⇣⇧v

j � (✓hjwhj + ✓ljwlj)Mjfc �Mjrjf

b � (✓hjwhj + ✓ljwlj)Njfe

j

...+⇧B

j + cBj = Yj .

Using the equations for revenue and costs of the building sector,

⇧b

j = rj

✓Mjf

b + (1� �)Hj

whj

rj+ (1� �)Lj

wlj

rj

◆� cBj ,

the final goods market clearing condition simplifies to

wHjHj + wLjLj +⇧v

j � (✓hjwhj + ✓ljwlj)�Mjf

c

j +Njfe

j

�= Yj .

A labor market clearing condition holds for each type of labor

⇡ljL = Mj

Z

z

ldj (z)q(z)g(z)dz + ✓hjMjfc

j + ✓hjNjfe

j

22

Page 23: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

⇡HjH = Mj

Z

z

hdj (z)q(z)g(z)dz + ✓ljMjfc

j + ✓ljNjfe

j ,

where ⇡⌧j is the share of labor type ⌧ that chooses to go to city j and the right hand side aggregatesover the distribution of firms to get city level labor demand.

A steady state equilibrium is a set of prices, labor allocations, output, the mass of firms andentrants, and exit thresholds, {wlj , whj , rj , Lj , Hj , Yj ,Mj , Nj , zx}, for each city, such that:

1. there is an invariant distribution of firms, gj(s), that satisfies the Kolmogorov Forward Equa-tion,

2. there is a constant mass of firms and entrants, Mjt and Njt,

3. exit policies, zxj , satisfy the firms HJB equation and the free entry condition holds in eachcity,

4. firms and landlords maximize profits,

5. workers choose the city that maximizes utility and the share of their income to spend onconsumption and housing,

6. the final goods market, the land market, and labor markets clear in each city.

3.7 Calibration

To calibrate the model, I first calibrate a set of parameters that are constant across cities. Someof these parameters are standard values that I take from the literature and others are calibrated tomatch moments in the data. Next, I recover the city-level fundamentals that can rationalize thecross-sectional patterns by city size in 1980. I show that conditional on the aggregate parameters,there is a unique set of city fundamentals that are consistent with the data in 1980 being a steady-state equilibrium of the model.

A. AggregatesH mass of high skill .24 dataL mass of low skill 1 normalization

B. Elasticities1� � housing share of income .4 Monte et al. (2018)� elasticity of substitution ! 6.5 Broda and Weinstein

(2006)⌫ scale parameter of Gumbel

distribution3 Allen et al. (2018)

Table 3: Aggregate parameters

Table 3 gives the aggregate parameters and the corresponding moments or sources that I useto calibrate them. The mass of low-skilled workers is normalized to 1 and the mass of high-skilled

23

Page 24: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

parameters description value moment target modelµ persistence of Brownian

motion

�.0016 pareto tail emp xx

sd of Brownian motion .023 sd emp. gr xx✏ elasticity of sub. L & H 1.64 Autor et al. (2008) 1.62 1.62f b

fixed building cost for

firms

1.08 3.5% of costs 3.5% 3.5%

Table 4: Firm productivity process parameters

workers is calculated using the aggregate share of workers with 4 or more years of college in 1980.� gives the share of income spent on rent, calibrated to be .4 following Monte et al. (2018). Theelasticity of substitution between varieties is also a standard parameter taken from the literature.I set it to 6.5 in line with the estimates from Broda and Weinstein (2006). Finally, I take the scaleparameter of the Gumbel distribution, which governs the migration elasticity, from Allen et al.(2018) and set it to 3.

In table 4, I show the calibration and model fit of the firm productivity parameters and thefirm production function. These four parameters are jointly calibrated using a method of momentsroutine A standard value for the aggregate elasticity of substitution between high- and low-skilledlabor is 1.62 as estimated by Autor et al. (2008). However, ✏ in my model is a firm level elasticityof substitution for variable factors of production (excluding the fixed costs). I calibrate the firmlevel ✏ to match an aggregate elasticity of substitution of 1.62. I calibrate the parameters of thefirm’s stochastic process for productivity, the drift µ and the variance , using moments calculatedin the establishment micro-data. The establishment size distribution will have a Pareto tail withtail parameter �. In the LBD, I calculate that � is equal to XX and the standard deviation ofemployment growth is XX12. I calibrate the fixed cost that firms pay in units of building space, fb,so that the share of firm’s costs spent on the fixed unit of building space is 3.5%. The fixed buildingspace should be interpreted as the minimum amount of real estate a firm needs to produce in thecity. 13

12These parameters are still undergoing the disclosure process from the Census Bureau13I could not find estimates from the literature to pin down this parameter. The share of costs paid in rent for

the minimum building space cannot be much higher than 4% without the value of entering falling below the cost ofentering in which case there is no entry in equilibrium. Thus, within the feasible range of this target (between 1 and4 percent of costs), it makes little difference to the results.

24

Page 25: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

parameters description targets�j, skill premium wj,h

w1,l

j city size premium wj,l

w1,l

Ahj skill intensity Hj

Lj

Alj relative city size Hj+Lj

H1+L1

f e

jestablishment start up rate esrj

f c

jestablishment size Hj+Lj

Mj

bj rent of city j rj

w1,l1

⌘j�1 elasticity of building supply Saiz, 2010

Normalizations�1,l, Ah1,Al1 1

Table 5: City level fundamentals

Next, I calibrate the parameters that vary across cities. Let Pc = {�, , Ah, Al, f e, f c, b, ⌘} bethe vector of city level fundamentals, that is the weight on high-skilled labor, �, the city specificproductivity term, , high- and low-skilled amenities, Ah and Al, entry costs, fe, fixed costs, fc,the productivity of the building sector b, and the elasticity of building supply, ⌘. Table 5 gives themoments from the data used to identify each of the fundamentals. In appendix C.1, I show thatthere are unique values of the fundamentals that rationalize the data in 1980 as being an equilibriumof the model and I describe the steps to back out the fundamentals from the data.

4 The Arrival of a New Technology

In this section, I consider what happens when a new technology is introduced into the economy ofSection 3. The new technology has an absolute productivity advantage, but use high-skilled labormore intensely, and so, represents skill-biased technical change. Firms can choose to adopt the newtechnology, but will pay an additional fixed cost in units of high-skilled labor. The adoption choiceis irreversible. Even though the new technology is available across all cities, firms that are otherwisethe same will potentially make different adoption choices based on the characteristics of the city inwhich they produce. I first describe the new technology and the technology adoption choice. Then Idefine the equilibrium in a model with endogenous technology adoption and describe the calibrationof the new technology. Finally, I describe the mechanisms that drive differences in the technologyadoption decisions across cities.

4.1 The new technology

The new technology uses a CES bundle of high- and low-skilled labor

qjt = � jzt(!)⇣(1� ���j)ljt

✏�1✏ + ���jhjt

✏�1✏

⌘✏

✏�1 ,

25

Page 26: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

has an absolute productivity advantage, � > 1, but places a higher weight on the use of high-skilled labor than the old technology, �� > 1. As in section 3.2, firms choose the optimal bundle ofhigh-skilled and low-skilled labor to minimize their unit cost function. The unit cost function forusing the new technology is given by

caj (z) = (� jz)�1 ((���j)

✏w1�✏hj

+ (1� ���j)✏w1�✏lj

)1

1�✏ .

Depending on the wages in a city, adopting the new technology may deliver marginal cost savingsto the firm. Conditional on adopting, the firm’s profit maximization problem is the same as thestatic profit maximization problem under the old technology described in section 3.2 except thatnow it has a lower marginal cost.

In order to use the new technology, firms pay an additional fixed cost in units of high-skilledlabor denoted as �fc, which is constant across cities and paid according to the cost shares of thenew CES technology.

4.2 The firm’s adoption decision

As long as adopting lowers the marginal cost, caj(z) < cj(z), there will always be a productivity

threshold, za, above which it is optimal to adopt the new technology. If wages are such that thenew technology does not lower the firm’s marginal cost, ca

j(z) > cj(z), no firm will adopt the new

technology.The dynamic optimization problem of the firm involves solving for three thresholds. First, firms

that have not yet adopted choose an exit threshold below which they will exit the market, zax.Second, conditional on the value of adopting the new technology, firms that have not yet adoptedchoose the technology adoption threshold, za, above which firms will adopt the new technology andan exit threshold, zx, below which they exit the market. Because the adoption choice is irreversible,firms that have already adopted no longer make a technology choice and only choose when to exit. Ifirst describe the dynamic exit decision of firms that have already made the irreversible technologyadoption choice before describing the dynamic problem of a firm that has not yet adopted.

Adopters

Firms that have adopted the new technology solve the static profit maximization problem describedin section 4.1 and an optimal stopping time problem analogous to the dynamic problem of a firmin the economy with no technology adoption choice described in section 3.2. Their value functionis given by:

V a(z) = maxzaxz

Ez

ZT (zax)

0e�⇢t

⇣⇡avj (z(t))� (✓hjwhj + ✓ljwlj)f

c

j � (✓haj whj + ✓laj wlj)�fc � rjfb

dlog(zt) = µdt+ dW (t)

26

Page 27: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

where ⇡avj(z) is variable profits of a firm in city j with productivity z that has adopted the new

technology. As before, I use a change of variable s(z) = ln z��1. The value function takes the sameform as in the case without adoption:

V a(s) =Za

j

⇢� µ� 12

2es +

1

⇢[�(✓hjwhj + ✓ljwlj)f

c

j � whj�fc � rjfb]...

+

"�Za

j

⇢� µ� 12

2es

ax�⇠�s

ax � 1

⇢[�(✓hjwhj + ✓ljwlj)f

c

j � (✓haj whj + ✓laj wlj)�fc � rjfb]e�⇠

�sax

#e⇠

�s

and the exit threshold for firms that have adopted is

sax = ln

"1

Za

j

✓�⇠�

1� ⇠�

◆ ⇢� µ� 1

2 2

!⇣(✓hjwhj + ✓ljwlj)f

c

j + (✓haj whj + ✓laj wlj)�fc + rjfb

⌘#

where Za

j=⇣(� j)

�1⇣(���j)✏w

1�✏jh

+ (1� ���j)✏w1�✏jl

⌘1

1�✏�

��1

⌘1��YjP �

j

1�.

Non-Adopters

Firms that have not yet adopted solve for two thresholds: an adoption threshold za and an exitthreshold zx. The value function for the firm is the expected discounted stream of profits plus theexpected discounted value of adopting, va(s(za)), times the probability that the firm reaches theadoption threshold za before the exit threshold zx. Let T = T (zx) ^ T (za) be the stopping time,the discounted stream of profits is given by:

Fj(z, zx, za) =

8>>><

>>>:

Ez[RT

0 e�⇢t⇣⇡j(z(t))� (✓h

jwhj + ✓l

jwlj)f c

j� rjf b

⌘dt...

+⇥e�⇢T |z(T ) = zaj

⇤Pr [z(T ) = zaj ]V a

j(zaj)]

V a(zaj)

, z 2 (zxj , zaj)

z > zaj

dlog(zt) = µdt+ dW (t).

As above I use a change of variable s(z) = ln z��1 to rewrite the expected profit function as

F (s, sx, sa) =

8>>>><

>>>>:

Es[RT

0 e�⇢t✓

Zj

⇢�µ� 12

2 es � (✓h

jwhj + ✓l

jwlj)f c

j� rjf b

◆dt...

+⇥e�⇢T |s(T ) = sa

⇤Pr [s(T ) = sa]

⇣V a

j(sa)

⌘]

V a(sa)

, s 2 (sx, sa)

s > sa

dst = µdt+ dW (t).

27

Page 28: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

The value function is defined asV (s) = max

sx,sa

F (s, sx, sa),

i.e. the discounted stream of profits when choosing the optimal exit and adoption thresholds. Thevalue function will satisfy the HJB equation

⇢V (s) =

8<

:

h⇡vj(s)� (✓h

jwhj + ✓l

jwlj)f c

j� rjf b

i+ µV 0(s) + 1

2�2V 00(s), s 2 (sx, sa)

⇢ [V a(sa)] , s � sa

and the conditions:

value matching: V (sx) = 0; V (sa) = V a(sa)

smooth pasting: V 0(sx) = 0; limz!za

V 0(sa)� V a0(sa) = 0

There is no closed form solution for the value function and the exit and adoption thresholds. Inappendix B.3, I derive the value function as a function of sx and sa and provide the two smoothpasting conditions that give the optimal thresholds when solved numerically.

4.3 The distribution of firms with adoption

I solve for the distribution of adopters and the distribution of non-adopters separately. As in section3.3, the stochastic process for firm size is ds

dt= µdt+ dW (t) and the distribution of firms satisfies

the Kolmogorov Forward Equation

@gj(s)

@t= �µ

dgj(s)

ds+

1

2 2d

2gj(s)

ds2.

In steady state @gj(s)dt

= 0 and � = �µ

2/2. In appendix B.4, I show that the distribution of adopters

is given by14

fa

j (z) =

8<

:

Ea

j

µ

⇣z�(1��)e�s

a

xj � 1⌘ �

��1z

�z 2 (za

xj, zaj)

Ea

j

µ

⇣e�s

a

xj � e�sa⌘z�(1��)

���1z

�z > zaj

and the distribution of non-adopters is given by

fn

j (z) =

8<

:

En

j

µ

�z�(1��)e�sxj � 1

� ���1z

�z 2 (zxj , ze)

En

j

µ

⇣e��(se�sxj)�1e��se�e

��saj

⌘ �z�(1��) � e��saj

� ���1z

�z 2 (ze, zaj)

.

14Note that here I am giving the distribution over productivities f(z) instead of the distribution over sizes g(s).In the appendix, I derive the distribution over size g(s) and then use a change of basis to find the distribution over z.

28

Page 29: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

Integrating both of these over z to get the mass of adopters and non-adopters and rearranging gives

sraj =Ea

j

Ma

j

=�µ

saj � saxj

andsrnj =

En

j

Mn

j

=�µ

(se � sxj) +⇣e��(se�sxj)�1

e��(se�saj)�1

⌘(saj � se)

where Ea

jand En

jare the mass of adopters who exit and non-adopters that exit, respectively. Finally

in appendix B.4, I show that En

j= �Ea

j

⇣e��(se�sxj)�1e��se�e

��saj

⌘e��saj which means both distributions can

be written as a function of Ea

j, and the mass of firms Mj = Ma

j+Mn

jcan be written as a function

of Ea

j.

In steady state, the mass of new adopters, A, must be equal to the mass of adopters that exit,Ea and the mass of entrants, N , must be equal to the mass of exitors, Nj = En

j+ Ea

j. Finally, the

share of adopters is given by

µj =Ma

j

Ma

j+Mn

j

=(sa

xj� saj)

(saxj

� saj) +⇣e��(se�sxj)�1e��se�e

��saj

⌘e��saj

h(se � sxj) +

⇣e��(se�sxj)�1e��se�e

��saj

⌘e��saj (saj � se)

i .

Thus, the share of adopters, the start up rate for adopters and the start up rate for non-adopters canall be characterized by three objects: the difference between the entry threshold and exit thresholdfor non-adopters, the difference between the adoption threshold and the exit threshold for adopters,and the difference between the entry threshold and the adoption threshold. In appendix D.3, Idiscuss how the churn rates and the share of adopters move with each threshold and I discuss howeach threshold moves with endogenous objects such as prices and market size.

4.4 Steady state equilibrium with new technology

The equilibrium in the model with the technology adoption decision is very similar to the modelwithout adoption. The final goods market clearing condition in each city is given by

�wHjHj+�wLjLj+⇣⇧v

j � (✓hjwhj + ✓ljwlj)Mn

j fc

j � (✓haj whj + ✓laj wlj)Ma

j �fc �Mjrjf

b � (✓hjwhj + ✓ljwlj)Njfe

j

...+⇧B

j + (✓hjwhj + ✓ljwlj)Mn

j fc

j + (✓haj whj + ✓laj wlj)Ma

j �fc + (✓hjwhj + ✓ljwlj)Njf

e

j + cBj = Yj ,

where the only difference with the model without adoption is the inclusion of the additional fixedcost for adopters. The final goods market clearing condition simplifies to

Yj = wHjHj+wLjLj+⇧v

j � (✓hjwhj+✓l

jwlj)Mn

j fc

j � (✓haj whj+✓la

j wlj)Ma

j �fc� (✓hjwhj+✓

l

jwlj)Njfe

j .

A steady state equilibrium is a set of prices, labor allocations, output and the mass of adopters,non-adopters, entrants and new adopters, and exit and adoption thresholds {wlj , whj , rj , Lj , Hj , Yj ,Ma

j,Mn

j, Nj , Aj , zxj , zaxj , zaj},

29

Page 30: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

for each city, such that

1. in each city, there is an invariant distribution of adopters and non-adopters that satisfies theKolmogorov Forward Equations with @g

a

j(s)

dt= 0 and @g

n

j(s)

dt= 0,

2. there is a constant mass of adopters, non-adopters, entrants and new adopters, Ma

j, Mn

j, Nj

and Aj ,

3. exit and adoption policies satisfy the firms HJB equations and the free entry conditions holdin each city,

4. firms and landlords maximize profits,

5. workers choose the city that maximizes utility and the share of their income to spend onconsumption and housing, and

6. the final goods market, the housing market, and labor markets clear in each city.

4.5 Calibration of the new technology

parameters description parametervalue

targets data model

�� new high skill share 1.7014 Aggregate growth in high-skilled wages 39.9 35.7� productivity

advantage of new

technology

1.4270 Aggregate growth in low-skilled wages 4.31 1.27

�fc additional fixed

cost

4.1727 Change in avg. establishment size 18.5 19.9

H Aggregate quantity

of high-skilled

labor

.6935 growth in supply of high-skilled labor .6935 .6935

L Aggregate quantity

of low-skilled labor

1.2431 growth in supply of low-skilled labor 1.2431 1.2431

Table 6: Calibration of the new technology

There are three parameters of the new technology that need to be calibrated, �� the weight on high-skilled labor, � the absolute productivity advantage of the new technology and �fc the additionalfixed cost paid in units of high-skilled labor. I calibrate the parameters to match moments onthe aggregate growth in high- and low-skilled wages between 1980 and 2014 and on the change inaverage firm size. In addition, Katz and Murphy (1992) show that in order to understand the extentof SBTC, it is important to consider changes in the relative supply of high-skilled workers. I changethe aggregate quantities of high- and low-skilled labor, H and L, to match the growth in high- andlow-skilled labor between 1980 and 2014.

30

Page 31: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

The five parameters and the corresponding targets are listed in table 6. The model slightlyunderstates wage growth for both high- and low-skilled workers. However, the model does a goodjob in matching the aggregate increase in the skill premium, or the ratio of high- to low-skilledwages, which rises to 2.08 in the data and 2.13 in the model.

5 Results

5.1 Model predictions versus the data

Figure 6 shows that the introduction of the skill-biased technology can reproduce the patterns inthe data described in section 2. In 1980, the model perfectly matches the relationship in the databetween wages, skill intensity, and dynamism with city size. I reproduce these relationships in the2014 data and in the second steady state in the model. In the second steady state, the only thingsthat changed are the aggregate quantities of high- and low-skilled labor and the availability of thenew technology.

In the data, the city size wage premium for high-skilled workers increased from .056 percentto .072 percent. In the model, the high-skilled city size wage premium increases from .056 to .065

percent. Thus the model can explain 53 percent of the increase in the city size wage premium forhigh-skilled workers. Similarly, the city size wage premium for low-skilled workers decreases from.047 to .035 percent in the data, while in the model it decreases from .047 to .032 percent. Thus,the model slightly over-explains the decline. The model is successful in matching the divergenceby skill group, the city size wage premium rises for high-skilled workers while it falls for low-skilledworkers.

The model also explains a substantial share of the changing relationship between the skill pre-mium and city size. In 1980, the skill premium is, on average, 1.00 percentage points higher in citiestwice as large. By 2014, the skill premium was 5.17 percentage points higher in cities twice as large.In the model, the relationship between the skill premium and city size increases from 1.00 to 4.90

percentage points. Thus the model explains 89 percent of the increase in the slope.The model also matches the changing relationship of skill intensity, or the ratio of high to low-

skilled workers, with city size. In 1980, the semi-elasticity between skill intensity and city size was.026 percentage points. By 2014, it had increased to .074 percentage points. The model matchesthis fact well. The relationship between skill intensity and city size increases from .026 in the initialsteady state to .085 in the final steady state. Thus the model over explains the changing relationshipbetween city size and skill intensity.

Turning to the establishment start up rate, in the data the semi-elasticity between the estab-lishment start up rate and city size increased from �.322 to .45 percentage points, while in themodel it increased from �.322 to �.018 percentage points. Thus, the model can explain 39 percentof the changing slope. Next, I discuss the mechanisms that are important for driving the changingrelationship between dynamism and city size.

One concern is that the success of the model in generating the changing cross-sectional patterns

31

Page 32: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

β1980=0.056(0.005)β2014=0.072(0.004)

βmodel,2014=0.065(0.006)

−.1

0

.1

.2

.3

log

(wh

), d

e−

me

an

ed

10 12 14 16

log(population)

lincwage_college_normlmodel_wh_normlincwage_college_normlmodel_wh_norm

(a) City size wage premium: high-skilled

β1980=0.047(0.007)β2014=0.035(0.004)

βmodel,2014=0.032(0.007)

−.1

−.05

0

.05

.1

.15

log

(wl),

de

−m

ea

ne

d

10 12 14 16

log(population)

lincwage_nc_norm lmodel_wl_norm lincwage_nc_norm lmodel_wl_norm

(b) City size wage premium: low-skilled

β1980=0.014(0.008)β2014=0.074(0.008)

βmodel,2014=0.067(0.011)

−.1

0

.1

.2

.3

skill

pre

miu

m,

de

−m

ea

ne

d

10 12 14 16

log(population)

skillpremium_dmmodel_skillpremium_dmskillpremium_dmmodel_skillpremium_dm

(c) Skill premium

β1980=0.026(0.004)β2014=0.074(0.009)

βmodel,2014=0.085(0.012)

−.2

−.1

0

.1

.2

.3

skill

inte

nsi

ty,

de

−m

ea

ne

d

10 12 14 16

log(population)

HL_dm model_HL_dm HL_dm model_HL_dm

(d) Skill intensity

β1980=−0.322(0.227)β2014=0.452(0.119)

βmodel,2014=−0.018(0.273)

−2

0

2

4

6

sta

rt u

p r

ate

10 12 14 16

log(population), de−meaned

esr_3yr_dm model_sr_dm esr_3yr_dm model_sr_dm

(e) Establishment start up rate

βdata=0.684(0.129)βmodel=0.317(0.043)

−4

−2

0

2

∆S

tart

up

ra

te1980,2

014,

de

−m

ea

ne

d

10 12 14 16

log(pop, 1980)

data

model

(f) Change in the establishment start up rate

Figure 6: Descriptive facts: data vs. model

Source: Wages and skill intensity are from the 1980 Decennial Census and 2014 ACS. Dynamism is from the BusinessDynamic Statistic. Population is from the Intercensal Population Estimates. Additional data from author calculatedmodel output. Figure displays the relationship between wages, skill intensity, and dynamism and city size in themodel and the data. By construction, the model and data align perfectly in 1980 and are both shown in blue. Redgives the 2014 data and green the 2014 steady state in the model. The unit of observation is one of 30 city sizecategories. All variables are demeaned by year.

32

Page 33: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

is driven by the persistence of variables such as population and wages. As a further test, in appendixD.1, I test the model in changes. For each of the four main variables, high-skilled wages, low-skilledwages, skill intensity and business dynamism, I show the relationship between the change in thedata between 1980 and 2014 and the change in the model between the first and second steady state.With the exception of high-skilled wages, the relationship between the change in the data and thechange in the model is positive and statistically significant at the 10% level. For high-skilled wages,the relationship is positive, but not significant.

5.2 The changing relationship between dynamism and city size

The start up rate in city j is given by

srj = (1� µj)srna

j + µjsra

j ,

a weighted average of the start up rate for non-adopters and adopters where the weight is the shareof firms that have adopted, µj . In appendix D.3, I characterize the three objects in this equationsrna

j, sra

jand µj , and I discuss in detail how they are determined by three thresholds: the exit

threshold for adopters, the exit threshold for non-adopters and the adoption threshold. I show howeach threshold and, therefore, the churn rates are affected by endogenous objects such as wages,rents, and market size and the parameter � which gives the weight on high-skilled labor in the initialequilibrium.

In this section, I quantify which factors are important for driving the changing equilibriumrelationship of dynamism with city size. In figure 7, I plot the counterfactual change in dynamismrates versus city size between the 1980 and 2014 steady states, holding each of the endogenousobjects fixed across cities (or if the case of market size I consider what would have happened if thegrowth rate had been equal across cities15).

Panel (a) shows what the change in dynamism would have been if rent had remained constant atthe 1980 level. In the baseline model, the change in dynamism is about 2 percentage points higherin the biggest city relative to the smallest cities. Holding rent constant, the difference across citieslargely disappears. Rent increases affect the fixed costs paid by the firm. When fixed costs are high,firms need a higher volume of sales to justify staying in the market. The exit threshold shifts upmeaning that for any given productivity, firms are more likely to exit the market. In steady state,entry and exit rates must be equal so an increase in the exit threshold means that both entry andexit rates will increase. If there had been no endogenous response of rent to the availability andadoption of the new technology, the start up rate would not increase more in big cities relative tosmall cities. Therefore, the increases in rent, which are larger in big cities than small cities and inturn drive changes in selection, are an important driver of the changing cross sectional pattern ofdynamism and city size.

15I do this because holding market size at the 1980 level, the exit threshold will rise above the entry threshold andthere will be no firms in the market.

33

Page 34: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

-6 -5 -4 -3 -2 -1 0

log(population)

0.5

1

1.5

2

2.5

sta

rt u

p r

ate

-14

-13

-12

-11

-10

-9

-8

-7

-6

-5

baseline model

Rent at 1980 level

(a) rent at 1980 level

-6 -5 -4 -3 -2 -1 0

log(population)

0.5

1

1.5

2

2.5

sta

rt u

p r

ate

-14

-12

-10

-8

-6

-4

-2

0

2

baseline model

Equal growth in Y

(b) equal growth in Y

-6 -5 -4 -3 -2 -1 0

log(population)

0.5

1

1.5

2

2.5

sta

rt u

p r

ate

-20

-19

-18

-17

-16

-15

-14

-13

-12

-11

-10

baseline model

Wages at 1980 level

(c) wages at 1980 level

-6 -5 -4 -3 -2 -1 0

log(population)

0.5

1

1.5

2

2.5

sta

rt u

p r

ate

-18

-17

-16

-15

-14

-13

-12

-11

-10

baseline model

Skill premium at 1980 level

(d) skill premium at 1980 level

Figure 7: Drivers of the cross sectional change in dynamism

Source: Model calculations of counterfactuals cross-sectional patterns of dynamism and city size if different variableswere held constant.

34

Page 35: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

Panel (b) of figure 7 shows what would have happened had every city experienced equal growthin the market size, Yj . Though market size grows everywhere, it actually grows more in smallcities than large cities. This is because of the inelastic housing supply in big cities which limitstheir growth. In appendix D.2, I show that the model matches the relationship between populationgrowth and initial city size in the data. An increase in market size increases profits for a firmand makes firms more willing to stay in the market for any given productivity. This lowers theexit threshold and decreases the city level churn rate. Thus if market size had increased the sameeverywhere, the change in dynamism rates would have been smaller in big cities since the growthin Y was bigger in small cities than big cities. Thus, the endogenous response of Y contributes tothe increasing relationship of dynamism and city size in the second steady state.

Panels (c) and (d) look at the effect of wages on the change in dynamism with city size. Inpanel (c), I hold both high- and low-skilled wages fixed at their 1980 level and in panel (d), I allowlow-skilled wages to grow as they did in the second steady state, but adjust high-skilled wages sothat the skill premium is held at its 1980 level. Lower wages imply weaker selection - firms are morewilling to stay in the market for any given productivity level. The two figures have similar effects onthe cross sectional pattern of dynamism, implying most of the effect on dynamism comes from thechange in high-skilled wages16. In both counterfactuals, the differences across cities in the changein dynamism would have been attenuated - big and small cities would have seen similar changes indynamism rates without the equilibrium response of wages, and particularly high-skilled wages, tothe introduction of the new technology.

6 Technology Adoption

A key result in the quantitative exercise is that technology adoption is higher in larger cities. Thishappens despite the fact that all firms in all cities have access to the new technology. The goal ofthis section is to determine what factors drive differences in adoption. Section 6.1 examines whichcity characteristics are important for driving the differences Section 6.2 uses data on ICT spendingto validate these mechanisms in the data.

6.1 The drivers of technology adoption

Here, I discuss four channels driving differences in adoption rates across cities: market size, initialtechnology, selection, and amenities. In figure 8, I show how the share of firms that adopt the newtechnology will change as each channel is strengthened or weakened. I do this for one city in thecalibrated model as an illustration.

The first channel affecting adoption rates across cities is the market size effect. In cities that

16In other words, both figures have the same 1980 skill premium, but panel (c) keeps low-skilled and low-skilledwages at the 1980 level while panel (d) allows only low-skilled wages to adjust to the 2014 level. Since we see mostof the change in dynamism rates when we only hold the skill premium constant (panel d), this implies most of theeffect is coming from the change in high-skilled wages.

35

Page 36: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

-0.05 -0.04 -0.03 -0.02 -0.01 0 0.01 0.02 0.03 0.04 0.05

% change in Y

-1.5

-1

-0.5

0

0.5

1

1.5

% c

hang

e in

thre

shold

s

0.0266

0.0268

0.027

0.0272

0.0274

0.0276

0.0278

0.028

0.0282

0.0284

0.0286

sha

re o

f adop

ters

exit threshold non-adopters

exit threshold adopters

adoption threshold

share of adopters

(a) market size

-0.05 -0.04 -0.03 -0.02 -0.01 0 0.01 0.02

% change in gm

-6

-5

-4

-3

-2

-1

0

1

2

3

% c

hang

e in

thre

shold

s

0.023

0.024

0.025

0.026

0.027

0.028

0.029

sha

re o

f adop

ters

exit threshold non-adopters

exit threshold adopters

adoption threshold

share of adopters

(b) gamma

-0.05 -0.04 -0.03 -0.02 -0.01 0 0.01 0.02 0.03 0.04 0.05

% change in rent

-0.5

-0.4

-0.3

-0.2

-0.1

0

0.1

0.2

0.3

0.4

0.5

% c

ha

nge

in thre

shold

s

0.0274

0.0275

0.0276

0.0277

0.0278

0.0279

0.028

share

of

ad

op

ters

exit threshold non-adopters

exit threshold adopters

adoption threshold

share of adopters

(c) selection

-0.05 -0.04 -0.03 -0.02 -0.01 0 0.01 0.02 0.03 0.04

% change in wh

-6

-4

-2

0

2

4

6

% c

ha

nge

in thre

shold

s

0.02

0.022

0.024

0.026

0.028

0.03

0.032

0.034

0.036

0.038

0.04

Sh

are

of

ad

op

ters

exit threshold non-adopters

exit threshold adopters

adoption threshold

share of adopters

(d) high-skilled wages

Figure 8: Drivers of adoption

Source: Model calculations of the exit threshold for non-adopters, the exit threshold for adopters, the adoptionthreshold and the share of adopters versus model objects such as wages, rents and the initial weight on high-skilledlabor �j . These relationships are shown for city size category 20.

36

Page 37: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

are large, firms will be more willing to pay the additional fixed cost associated with adopting thenew technology because the marginal cost savings are more valuable when there is a larger volumeof sales. Panel (a) of figure 8 shows what happens to the share of firms that adopt and the threethresholds governing the firm size distribution as the market size, given by Y , changes. The shareof firms that adopt the new technology increases with market size. As Y gets smaller, the exitthreshold rises above the entry threshold and no firm is willing to stay in the market; the lines endsat this point since there are no firms to adopt.

Panel (b) shows the change in the share of firms adopting as the skill intensity of the city-leveltechnology changes. As �j increases, the share of firms who adopt the new technology increases.This is true even though the weight on high-skilled labor in the new technology is a proportional tothe weight in the old technology. This is driven by two factors. First, as �j increases, non-adoptersuse high-skilled labor more intensively and have more expensive labor costs, but do not get theproductivity advantage associated with the new technology. As a result, the exit threshold increases,and the non-adopters exit the market. The market share of adopters increases, which increases thereturns to adopting and lowers the exit threshold for adopters. Eventually the exit threshold fornon-adopters rises above the entry threshold and there will be no firms in the market. Second, as�j increases the difference between the labor cost bundle for adopters and non-adopters increaseseven though the weight on high-skilled labor for the new technology is scaled proportionally. As thedifference between the cost bundles grows, so does the benefit to adopting and more firms becomeadopters.

The third factor affecting adoption rates is selection. This can be seen in panel (c) of figure8 which shows what happens to adoption rates and the exit thresholds as rent increases. As rentincreases, the fixed costs firms need to pay to stay in the market increase. Non-adopters are hurtespecially hard by the increase in rent since they have a lower volume of sales to cover the fixedcost. The exit threshold for non-adopters increases, meaning that less of them stay in the market,and the share of firms that have adopted the new technology increases.

Finally, a fourth factor affecting adoption rates is amenities for high-skilled workers, which lowerthe relative wages paid to high-skilled workers. If wages for high skilled workers are high, the returnsto adopting will be lower since the new technology uses high skilled labor more intensively. Panel(d) plots what happens to the share of firms that adopt the new technology and the adoptionand exit thresholds as high skilled wages increase. As high skilled wages increase, labor becomesmore expensive for both adopters and non-adopters and both types of firms are more likely to exitthe market. Thus, the change in adoption rates is primarily drive by the change in the adoptionthreshold. As high skilled wages increase, the adoption threshold increases and the share of adoptersdeclines.

6.2 Model validation

As discussed in the previous section, there are four channels that influence adoption rates: marketsize, initial technology, relative wages and selection. In table 7, I validate the mechanisms in the

37

Page 38: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

model by showing the relationship of ICT spending per employee today versus 1980 city charac-teristics, and that the same relationships hold for the share of firms adopting the new technologyin the model. While ICT spending is not a binary measure of firm technology adoption, it doescapture firms’ investments decisions in new technologies, and thus provides suggestive evidence onthe model’s mechanisms. In appendix E.1, table 13, I show that the same relationships hold usingICT share of investment, instead of ICT spending per employee, as the dependent variable.

As a measure of market size, I use 1980 working age population. The elasticity between ICTspending per employee today and 1980 population is 5.65 percent, meaning that ICT spending peremployee is 3.96 percent higher today in markets that were twice as large in 1980. Similarly, in themodel, the share of adopters is 1.3 percentage points higher in cities that were initially twice aslarge. This provides suggestive evidence that market size is an important factor in a firms decisionto adopt new technologies.

Two additional factors influencing the technology adoption decision of firms are the initial tech-nology used in the city and the amenities for high-skilled labor. Firms that are in locations whichare already good at using high skilled labor will be more likely to adopt the new technology. Firmsin locations that have abundant amenities for high-skilled labor will, all else equal, have a lowerskill premium. The initial technology and the amenities will jointly determine the initial skill mix ofthe city and the initial skill premium. Therefore, I examine the relationship between both measuresand ICT spending. There is a positive correlation between ICT spending and initial skill intensity.Cities with a one percentage point higher skill intensity in 1980 have about one percent more ICTspending per employee today. This provides suggestive evidence for both the initial technologymechanism and the labor supply mechanism. Both abundant amenities for high skilled workers andan initial technology that is more skill intensive will make the initial population more high-skilledintensive.

The effect of the initial skill premium on subsequent adoption is ambiguous. If the technologyis more skill intensive, it will increase the initial skill premium and we would expect to see apositive relationship between ICT spending and the initial skill premium. On the other hand, ifamenities are abundant for high-skilled workers, high-skilled workers will be willing to receive alower wage and we would expect a negative relationship between ICT spending and the initial skillpremium. The data show a positive relationship between ICT spending and initial skill premium,which provides suggestive evidence for the importance of initial technology on subsequent technologyadoption. Though, consistent with the ex-ante ambiguity of the two mechanisms, controlling forskill intensity decreases the size and significance of the relationship between ICT spending andinitial skill-premium.

7 Counterfactuals

In this section, I use the model to undertake two counterfactuals. In the first I ask what wouldhappen if there were no migration. The first counterfactual is motivated by policies that have the

38

Page 39: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

Data: log( ict

emp) Model: share of adopters

log(pop1980) 0.0565⇤⇤⇤ 0.0436⇤⇤⇤ 1.85⇤⇤⇤ 0.656⇤⇤⇤

(0.00929) (0.00955) (0.200) (0.106)H1980L1980

0.970⇤⇤⇤ 0.462⇤⇤⇤ 61.4⇤⇤⇤ 46.5⇤⇤⇤

(0.212) (0.142) (2.948) (3.016)wh1980wl1980

0.279⇤⇤ 0.116⇤ 10.98 �0.673

(0.115) (0.0668) (9.304) (1.632)

Observations 239000 239000 239000 239000 30 30 30 30

R2 0.324 0.320 0.322 0.324 0.754 0.939 0.047 0.976

*** p<0.01, ** p<0.05, * p<0.1

Table 7: Model vs. Data: ICT spending and adoption vs. initial city charac-

teristics

Source: The left panel uses ICT data from ACES merged into the LBD for information on the location of thefirm. Population is from the Intercensal Population Estimates. Table displays the relationship between log of ICTspending from 2003 to 2013 and 1980 city characteristics including city size, skill intensity and the skill premium.The regressions include a full set of 4 digit NAICS fixed effects and year fixed effects. The unit of observation is afirm and city size is measured at the CBSA level. Standard errors are clustered at the CBSA level. The right paneluses model generated output and shows the relationship between the share of firms that adopt in a city and 1980 citycharacteristics.

goal of attracting high-skilled workers and firms to small cities. I ask what would happen if a policymaker could do this perfectly, and keep a small city’s share of the labor force constant at its 1980level.

In the second counterfactual, I ask what would happen if housing supply elasticities were equal-ized across cities. This counterfactual is motivated by recent papers (Ganong and Shoag, 2017;Herkenhoff et al., 2018; Hsieh and Moretti, 2019) that show housing supply constraints keep work-ers from moving to cities that have experienced high TFP growth. Using data on housing supplyelasticities from Saiz, 2010, I find that big cities are more constrained in their housing supply thansmall cities. I ask what would happen if housing supply became more elastic in big frontier cities.

The two counterfactuals are motivated by opposite views on using mobility to affect the ge-ographic distribution of welfare. The first has the view that small cities have been hurt by out-migration of high-skilled workers (“brain drain”) and limits mobility to keep high-skilled workersand firms in small cities. The second has the view that low-skilled workers are hurt by being keptout of the big productive cities and seeks to reallocate more people to big cities even at the expenseof small cities.

In examining the counterfactuals it is important to understand the affect on the aggregate rateof technology adoption and, therefore, technical change in addition to the usual considerations ofwelfare. I find that this tradeoff is relevant when considering these two policies. While, a policy ofno migration is unambiguously bad for welfare, it slightly increases aggregate adoption rates and,therefore, may be good for technological progress. On the other hand, equalizing housing supplyelasticities is unambiguously good for welfare, but decreases aggregate technology adoption.

39

Page 40: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

High-skilled Low-skilled

�adoption wh gr. consumption equiv. wl gr. consumption equiv.baseline, 2014 . . . . .no migration 0.009 1.413 -0.294 0.278 -1.484equal HSE -0.130 2.177 8.468 0.715 6.019

Source: Model calculations. Table shows the aggregate share of firms adopting, average wages and averageconsumption equivalent measures. Aggregates are averages across cities weighted by the number of firms,high-skilled population and low-skilled population, respectively

Table 8: Counterfactuals: aggregates

7.1 No migration

Motivated by the view that small cities are suffering from “brain drain”, or the out migration of theirhigh-skilled workers, the first counterfactual I undertake is to ask what would happen if there wereno migration in the model. Specifically, I change the aggregate quantities of high- and low-skilledlabor as in the data, but I keep each city’s share of the population of high- and low-skilled workersconstant at its 1980 level. I compare the welfare of a worker who stays in the same city j betweena world with no migration and the 2014 steady state with free mobility. Specifically, I look at theindirect utility of a worker of type ⌧ in city j with idiosyncratic preference ⇣ij

V 2014⌧ji = A⌧j

�c⌧j(w

2014⌧j , q2014j )

�� �d⌧j(w

2014⌧j , q2014j )

�1��e⇣ij ,

and I ask how much consumption a worker would give up in order to avoid living in the counterfactualworld with no migration. The consumption equivalent, �c, solves

A⌧j��cc⌧j(w

2014⌧j , q2014j )

�� �d⌧j(w

2014⌧j , q2014j )

�1��e⇣ij = V no migration

⌧j. (2)

Only workers with a particularly high preference for a certain city will remain there once they areallowed to migrate, but since their preferences are fixed, they do not affect the calculation of theirconsumption equivalent. However, the welfare calculation ignores the welfare gains of the workerwho was allowed to move cities.

Table 8 shows what happens to the aggregate share of firms that adopt the new technology,aggregate wages and the aggregate consumption equivalent and Figure 9 shows the relationship ofthese measures with city size.

Average wages increase slightly for both high- and low-skilled workers in the model with nomigration versus the 2014 steady state. However, looking at the average consumption equivalentmeasure, the policy makes both types of workers worse off. For high-skilled workers, the averageconsumption equivalent is �.31 meaning that high-skilled workers would, on average, give up .31

percent of their consumption in the 2014 steady state to avoid living in the world with no migration.The welfare costs for low-skilled workers are much higher. They are, on average, willing to give up

40

Page 41: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

.2

.3

.4

.5

.6

.7

HL

10 12 14 16

log(population)

model, 2014 counterfactual

(a) Skill intensity

−15

−10

−5

0

5

con

sum

ptio

n e

qu

iva

len

t, %

11 12 13 14 15 16

log(population)

high−skilled low−skilled

(b) consumption equivalence

1

2

3

4

5

6

ad

op

tion

10 12 14 16

log(population)

model, 2014 counterfactual

(c) adoption

−10

−5

0

5

10

% c

ha

ng

e in

price

s

11 12 13 14 15 16

log(population)

high−skilled wages low−skilled wages rent

(d) high-skilled wages

Figure 9: No Migration

Source: Model generated output. Panel (a) shows what skill intensity (H

L) looks like in the no migration coun-

terfactual. Panel (b) shows the consumption equivalent for high- and low-skilled labor versus 2014 city size. Theconsumption equivalent gives the percent of consumption they would be willing to give up in order to avoid the coun-terfactual world. Panel (c) shows the relationship between adoption and city size. Panel (d) shows the relationshipbetween the percent change in wages and rents and city size between the no-migration counterfactual and the 2014steady state.

41

Page 42: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

1.49 percent of their consumption to remain in the 2014 steady state with free mobility.Panel (a) of figure 9, plots the consumption equivalent for high- and low-skilled workers versus

city size in the 2014 steady state. For high-skilled workers, the consumption equivalent is negativein the small cities, while positive in the largest city meaning the policy benefits high-skilled workersin the largest city only. This is not surprising since the high-skilled workers are forced to stay inthe small cities where there is less adoption. High-skilled workers in the big city benefit since thereis no in-migration that puts downward pressure on their wages. For low-skilled workers the welfarecosts are larger for workers in the big city. This is because of the effect of the no migration policy onrents. Wages for low-skilled workers are actually slightly higher in the big city in the no-migrationsteady state, but rents are higher as well. This is because population is slightly higher in the bigcity when low-skilled workers are not allowed to migrate to smaller, less congested cities17.

Looking to panel (b), this policy increases the adoption rates in the small city and decreasesadoption in the big city since high-skilled workers have become more abundant in small cities andless abundant in big cities. However, the increase in adoption in the small city does not increasehigh-skilled labor demand enough to offset the increase in supply. Panel (c) shows that wages forhigh-skilled workers in the small city fall in the world with no migration. Overall, there is a slightincrease in aggregate adoption rates, but even though adoption rates increase in the small cities,this does not result in wage increases in the small cities. Panels (c) and (d) show that wages, onaverage, decreased in small cities for both high- and low-skilled workers even though adoption rateswent up.

7.2 Elasticity of housing supply

Recent work from Ganong and Shoag (2017), Herkenhoff et al. (2018), and Hsieh and Moretti(2019) identifies housing supply constraints as an important drag on aggregate growth. In thiscounterfactual, I ask what would happen if we could equalize the housing supply elasticities acrosscities. Using data on housing supply elasticities from Saiz (2010), panel (a) of figure 10 shows thatbig cities have a lower housing supply elasticity than small cities. Housing supply in a city is givenby

Bj = bjr1

⌘j�1

j.

In the counterfactual with equalized housing supply elasticities, I set ⌘j = 1J

P⌘j 8j. I also adjust

the productivity of the building sector in each city, bj , to keep the building supply constant in the1980 steady state.

Panel (b) of figure 10 plots the consumption equivalence measures, defined in equation 2, forhigh- and low-skilled workers versus 2014 city size. For both high-and low-skilled workers, theconsumption equivalence measure is increasing in city size. Both worker types benefit from relaxing

17Though counterintuitive, in appendix 19, I show that there is a negative relationship between population growthand initial city size in the model and in the data. This is driven by the lower housing supply elasticities in big citiesthat keep their population from growing even though they are becoming more productive. This is consistent withevidence from Rappaport (2018).

42

Page 43: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

1

2

3

4

5

ela

stic

ity o

f h

ou

sin

g s

up

ply

10 12 14 16

log(population, 1980)

(a) Housing Supply Elasticity, Saiz (2010)

−5

0

5

10

15

con

sum

ptio

n e

qu

iva

len

t, %

11 12 13 14 15 16

log(population)

high−skilled low−skilled

(b) consumption equivalence

1

2

3

4

5

6

ad

op

tion

11 12 13 14 15 16

log(population)

model, 2014 counterfactual

(c) adoption

−10

−5

0

5

% c

ha

ng

e in

price

s

11 12 13 14 15 16

log(population)

high−skilled wages low−skilled wages rent

(d) change in prices

−.6

−.4

−.2

0

.2

∆ s

tart

−u

p r

ate

11 12 13 14 15 16

log(population)

(e) dynamism

Figure 10: Equalize Elasticity of Housing Supply

Source: Data on housing supply elasticity is from Saiz (2010). Population is from the Intercensal Population Esti-mates. Model generated output. Panel (a) shows the relationship between housing supply elasticity and 1980 city size.Panel (b) shows the consumption equivalent for high- and low-skilled labor versus 2014 city size. The consumptionequivalent gives the percent of consumption they would be willing to give up in order to avoid the counterfactualworld. Panel (c) shows the relationship between adoption and city size and panel (d) shows the relationship of thechange in wages and rent and city size between the counterfactual and the 2014 steady state. Panel (e) shows thechange in dynamism between the counterfactual and the 2014 steady state.

43

Page 44: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

the inelastic housing supply in big cities and they are hurt by a more inelastic supply in smallcities. However, as shown in Table 8, the average consumption equivalents are resoundingly positivefor both types of workers meaning workers prefer the steady state with equalized housing supplyelasticities. High- and low-skilled workers would need to be compensated with 8.6 and 6.0 percentmore consumption, respectively, in the 2014 steady state to be in indifferent between the 2014 steadystate and the one with equalized housing supply elasticities.

Both types are hurt by the policy in the smallest cities. This is because of the equilibrium effecton market size. When the housing supply elasticities are equalized, big cities get bigger and smallcities get smaller. There is an agglomeration force in the model through the CES specification ofthe final-goods producer. When the small cities get smaller, they become less productive and wagesgo down for both types of workers. Since housing supply became more inelastic in the small cityrents do not fall to compensate the workers for the fall in wages. Panel (d) shows what happensto rents in the model with equalized housing supply elasticities. Even though the small cities aresmaller than in the 2014 steady state, their rents are slightly higher because their housing suppliesbecome less elastic.

Table 8 shows what happens to aggregate adoption rates in the steady state with equal housingsupply elasticities, and Panel (c) of Figure 10 shows what happens to adoption versus city size.Adoption rates go down in the big city and up in small cities. In the aggregate, the adoption ratefalls slightly. This is because rents (shown in panel d) in the counterfactual world are much lowerthan in the second steady state. Lower rents decrease the fixed costs paid by firms who, as a result,are more willing to stay in the market even if they are less productive, non-adopters. Thus, thetougher selection in big cities, stemming from their inelastic housing supply, amplifies the differencesin adoption rates and, therefore, SBTC across cities.

Finally, panel (e) shows the effect on the cross-sectional pattern of dynamism rates from equal-izing housing supply elasticities. As shown in section 5.2, changes in selection driven by the equi-librium response of rent is an important driver of the cross sectional changes in dynamism. Thus,it is not surprising that a policy of equalizing housing supply elasticities is partially effective inrestoring the cross-sectional pattern of dynamism rates to its 1980 level. Between the initial andfinal steady state, the relationship between dynamism and city size increases from �.322 to �.018.In the model with equalized housing supply elasticities the relationship between dynamism and citysize is �.103. Thus, the increase in the relationship between dynamism and city size is about twothirds of the increase in the baseline model.

8 Conclusion

I document that, since 1980, big and small cities have diverged on several important dimensions. In1980, big cities paid a similar wage premium to both high-and low-skilled workers. By 2014, the citysize wage premium for high-skilled workers was twice that of low-skilled workers. Furthermore, therelationship between skill intensity and city size and between dynamism and city size both increased

44

Page 45: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

over time. While facts about the divergence of the city size wage premium and skill intensity andcity size have been previously documented in the literature, the changing relationship of dynamismand city size is a new fact that I introduce.

I build a spatial equilibrium model with a model of firm dynamics embedded in each city. Addinga rich model of firm dynamics to an otherwise benchmark spatial equilibrium model, allows the jointconsideration of relative wage inequality and business dynamism across cities. I calibrate the modelto the 1980 steady state and show that it can match the salient features of the 1980 data.

I then use the model to consider the introduction of a new skill-biased technology. The newtechnology has an absolute productivity advantage, but uses skill more intensively and incurs ahigher fixed cost. Firms that are otherwise the same will make different adoption decisions basedon the characteristics of the city in which they are located. Firms adopt more when they are in acity that is big, better at using high-skilled labor and has abundant amenities for high skilled labor.Cities in which firms adopt more become skill intensive, more competitive, and more congested,driving an increase in high-skilled wages and rents. Smaller, less productive firms that do notfind it profitable to adopt the new technology find it harder to compete and exit the market,which amplifies the differences in adoption rates across cities. Furthermore, the selection forceputs additional downward pressure on low-skilled wages since these firms use low-skilled labor moreproductively.

Even though a new technology is available everywhere, the introduction of the technology favorsspecific kinds of labor and market characteristics, amplifying existing differences across cities. Theintroduction of the a new skill-biased technology which is differently adopted across cities canquantitatively account for the divergence between big and small cities. Specifically, it can accountfor the changes I document across cities in the divergence of the city size wage premium by skillgroup, the increasing relationship between skill intensity and city size, and the increasing relationshipbetween dynamism and city size.

As evidence for this mechanism I use data on ICT spending to document that firms in big citiesspend more per worker on ICT investment than firms in small cities and they devote a higher shareof their investment budget to ICT expenses. With the exception of Beaudry et al. (2010), thegeography of ICT spending is relatively unstudied, and I am the first to leverage the ACES datafor this purpose.

Firms in big cities adopt more, and as they do so they increase their demand for skilled laborrelative to unskilled labor driving an increase in the city size wage premium for high-skilled workers, adecrease in the city size wage premium for low-skilled workers and an increasing relationship betweenskill intensity and city size. As big cities become more competitive and more congested, it becomesharder for small, less productive firms to stay in the market which increases the equilibrium rate ofturnover in big cities relative to small cities. Small firms are better at using low-skilled labor thanbig firms who have adopted the new technology. As they exit the market, it amplifies the differencesacross cities in the extent of skill-biased technical change.

45

Page 46: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

Works Cited

Allen, T., C. de Castro Dobbin, and M. Morten (2018): “Border Walls,” NBER WorkingPapers 25267, National Bureau of Economic Research, Inc.

Asplund, M. and V. Nocke (2006): “Firm Turnover in Imperfectly Competitive Markets,” Reviewof Economic Studies, 73, 295–327.

Austin, B., E. Glaeser, and L. Summers (2018): “Jobs for the Heartland: Place-Based Policiesin 21st-Century America,” Brookings Papers on Economic Activity, 49, 151–255.

Autor, D. (2019): “Work of the Past, Work of the Future,” NBER Working Papers 25588, NationalBureau of Economic Research, Inc.

Autor, D., D. Dorn, and G. Hanson (2019): “When Work Disappears: Manufacturing Declineand the Falling Marriage Market Value of Young Men,” American Economic Review: Insights, 1,161–178.

Autor, D., D. Dorn, G. Hanson, and K. Majlesi (2016): “Importing Political Polarization?The Electoral Consequences of Rising Trade Exposure,” NBER Working Papers 22637, NationalBureau of Economic Research, Inc.

Autor, D. H. and D. Dorn (2013): “The Growth of Low-Skill Service Jobs and the Polarizationof the US Labor Market,” American Economic Review, 103, 1553–1597.

Autor, D. H., L. F. Katz, and M. S. Kearney (2008): “Trends in U.S. Wage Inequality:Revising the Revisionists,” The Review of Economics and Statistics, 90, 300–323.

Autor, D. H., L. F. Katz, and A. B. Krueger (1998): “Computing Inequality: Have Com-puters Changed the Labor Market?” The Quarterly Journal of Economics, 113, 1169–1213.

Baum-Snow, N., M. Freedman, and R. Pavan (2018): “Why Has Urban Inequality Increased?”American Economic Journal: Applied Economics, 10, 1–42.

Baum-Snow, N. and R. Pavan (2013): “Inequality and City Size,” The Review of Economics andStatistics, 95, 1535–1548.

Beaudry, P., M. Doms, and E. Lewis (2010): “Should the Personal Computer Be Consid-ered a Technological Revolution? Evidence from U.S. Metropolitan Areas,” Journal of PoliticalEconomy, 118, 988–1036.

Broda, C. and D. E. Weinstein (2006): “Globalization and the Gains From Variety,” TheQuarterly Journal of Economics, 121, 541–585.

Bustos, P. (2011): “Trade Liberalization, Exports, and Technology Upgrading: Evidence on theImpact of MERCOSUR on Argentinian Firms,” American Economic Review, 101, 304–340.

46

Page 47: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

Chetty, R. and N. Hendren (2018): “The Impacts of Neighborhoods on IntergenerationalMobility II: County-Level Estimates,” The Quarterly Journal of Economics, 133, 1163–1228.

Chetty, R., M. Stepner, S. Abraham, S. Lin, B. Scuderi, N. Turner, A. Bergeron,and D. Cutler (2016): “The Association Between Income and Life Expectancy in the UnitedStates, 2001-2014,” JAMA, 315, 1750–1766.

Davis, D., E. Mengus, and T. Michalski (2019): “Labor Market Polarization and The GreatDivergence: Theory and Evidence,” .

Davis, D. R. and J. I. Dingel (2014): “The Comparative Advantage of Cities,” NBER WorkingPapers 20602, National Bureau of Economic Research, Inc.

——— (2019): “A Spatial Knowledge Economy,” American Economic Review, 109, 153–170.

Davis, S. J., J. Haltiwanger, and S. Schuh (1996): “Small Business and Job Creation: Dis-secting the Myth and Reassessing the Facts,” Small Business Economics, 8, 297–315.

Decker, R. A., J. Haltiwanger, R. S. Jarmin, and J. Miranda (2016): “Declining BusinessDynamism: What We Know and the Way Forward,” American Economic Review, 106, 203–207.

Diamond, R. (2016): “The Determinants and Welfare Implications of US Workers’ DivergingLocation Choices by Skill: 1980-2000,” American Economic Review, 106, 479–524.

Eckert, F. (2019): “Growing Apart: Tradable Services and the Fragmentation of the U.S. Econ-omy,” Ph.D. thesis, Yale University.

Eckert, F., S. Ganapati, and C. Walsh (2019): “Skilled Tradable Services: The Transformationof U.S. High-Skill Labor Markets,” Working Paper 25, Minneapolis Federal Reserve Opportunityand Inclusive Growth Institute.

Eeckhout, J., C. Hedtrich, and R. Pinheiro (2019): “Automation, Spatial Sorting, and JobPolarization,” .

Fort, T. C. (2017): “Technology and Production Fragmentation: Domestic versus Foreign Sourc-ing,” Review of Economic Studies, 84, 650–687.

Fort, T. C. and S. D. Klimek (2018): “The Effects of Industry Classification Changes on USEmployment Composition,” Working Papers 18-28, Center for Economic Studies, U.S. CensusBureau.

Fort, T. C., J. R. Pierce, and P. K. Schott (2018): “New Perspectives on the Decline of USManufacturing Employment,” Journal of Economic Perspectives, 32, 47–72.

Ganong, P. and D. Shoag (2017): “Why has regional income convergence in the U.S. declined?”Journal of Urban Economics, 102, 76–90.

47

Page 48: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

Gaubert, C. (2014): “Essays on Firms and Cities,” Ph.D. thesis, Princeton University.

Giannone, E. (2017): “Skill-Biased Technical Change and Regional Convergence,” Ph.D. thesis,University of Chicago.

Herkenhoff, K. F., L. E. Ohanian, and E. C. Prescott (2018): “Tarnishing the goldenand empire states: Land-use restrictions and the U.S. economic slowdown,” Journal of MonetaryEconomics, 93, 89–109.

Hopenhayn, H. A. (1992): “Entry, Exit, and Firm Dynamics in Long Run Equilibrium,” Econo-metrica, 60, 1127–1150.

Hsieh, C.-T. and E. Moretti (2019): “Housing Constraints and Spatial Misallocation,” Ameri-can Economic Journal: Macroeconomics, 11, 1–39.

Hsieh, C.-T. and E. Rossi-Hansberg (2019): “The Industrial Revolution in Services,” WorkingPaper 25968, National Bureau of Economic Research.

Jiao, Y. and L. Tian (2019): “Geographic Fragmentation in a Knowledge Economy,” .

Karahan, F., B. Pugsley, and A. Şahin (2019): “Demographic Orgins of the Startup Deficit,”Working Paper 25874, National Bureau of Economic Research.

Katz, L. F. and K. M. Murphy (1992): “Changes in Relative Wages, 1963–1987: Supply andDemand Factors,” The Quarterly Journal of Economics, 107, 35–78.

Krueger, A. B. (1993): “How Computers Have Changed the Wage Structure: Evidence fromMicrodata, 1984–1989,” The Quarterly Journal of Economics, 108, 33–60.

Krusell, P., L. E. Ohanian, J.-V. RÌos-Rull, and G. L. Violante (2000): “Capital-SkillComplementarity and Inequality: A Macroeconomic Analysis,” Econometrica, 68, 1029–1054.

Melitz, M. J. (2003): “The Impact of Trade on Intra-Industry Reallocations and Aggregate In-dustry Productivity,” Econometrica, 71, 1695–1725.

Michaels, G., F. Rauch, and S. J. Redding (2013): “Task Specialization in U.S. Cities from1880-2000,” NBER Working Papers 18715, National Bureau of Economic Research, Inc.

Monte, F., S. J. Redding, and E. Rossi-Hansberg (2018): “Commuting, Migration, and LocalEmployment Elasticities,” American Economic Review, 108, 3855–90.

Moretti, E. (2012): The New Geography of Jobs, Houghton Mifflin Harcourt.

——— (2013): “Real Wage Inequality,” American Economic Journal: Applied Economics, 5, 65–103.

48

Page 49: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

Nocke, V. (2006): “A Gap for Me: Entrepreneurs and Entry,” Journal of the European EconomicAssociation, 4, 929–956.

Pugsley, B. and A. Sahin (2018): “Grown-up business cycles,” The Review of Financial Studies,32, 1102–1147.

Rappaport, J. (2018): “The Faster Growth of Larger, Less Crowded Locations,” Kansas City FedEconomic Review, 4th Quarter 2018.

Roback, J. (1982): “Wages, Rents, and the Quality of Life,” Journal of Political Economy, 90,1257–1278.

Rossi-Hansberg, E., P.-D. Sarte, and F. Schwartzman (2019): “Cognitive Hubs and SpatialRedistribution,” Working Paper 26267, National Bureau of Economic Research.

Ruggles, S., S. Flood, R. Goeken, J. Grover, E. Meyer, J. Pacas, and M. Sobek (2019):“IPUMS USA Version 9.0 [dataset],” Minneapolis, MN: IPUMS.

Saiz, A. (2010): “The Geographic Determinants of Housing Supply,” The Quarterly Journal ofEconomics, 125, 1253–1296.

Santamaria, C. (2018): “Small Teams in Big Cities: Inequality, City Size and the Organizationof Production,” Ph.D. thesis, Princeton University.

Yeaple, S. R. (2005): “A simple model of firm heterogeneity, international trade, and wages,”Journal of International Economics, 65, 1 – 20.

49

Page 50: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

(1) (2) (3) (4) (5) (6)wh wl wh, res wl, res wh, res fs wl, res fs

year=1980 ⇥ lwapop 0.0633⇤⇤⇤ 0.0530⇤⇤⇤ 0.0459⇤⇤⇤ 0.0402⇤⇤⇤ 0.0254⇤⇤⇤ 0.0216⇤⇤⇤(0.00295) (0.00286) (0.00255) (0.00258) (0.00209) (0.00221)

year=1990 ⇥ lwapop 0.0754⇤⇤⇤ 0.0614⇤⇤⇤ 0.0715⇤⇤⇤ 0.0596⇤⇤⇤ 0.0423⇤⇤⇤ 0.0388⇤⇤⇤(0.00289) (0.00280) (0.00249) (0.00252) (0.00204) (0.00215)

year=2000 ⇥ lwapop 0.0814⇤⇤⇤ 0.0532⇤⇤⇤ 0.0772⇤⇤⇤ 0.0534⇤⇤⇤ 0.0466⇤⇤⇤ 0.0389⇤⇤⇤(0.00287) (0.00279) (0.00248) (0.00251) (0.00203) (0.00215)

year=2007 ⇥ lwapop 0.0821⇤⇤⇤ 0.0469⇤⇤⇤ 0.0738⇤⇤⇤ 0.0458⇤⇤⇤ 0.0453⇤⇤⇤ 0.0352⇤⇤⇤(0.00283) (0.00274) (0.00245) (0.00247) (0.00200) (0.00211)

year=2014 ⇥ lwapop 0.0696⇤⇤⇤ 0.0256⇤⇤⇤ 0.0643⇤⇤⇤ 0.0290⇤⇤⇤ 0.0418⇤⇤⇤ 0.0248⇤⇤⇤(0.00278) (0.00270) (0.00241) (0.00243) (0.00197) (0.00208)

Observations 13187 13187 13187 13187 13187 13187R

2 0.477 0.207 0.460 0.280 0.332 0.183Standard errors in parentheses⇤p < .1, ⇤⇤

p < .05, ⇤⇤⇤p < .01

Table 9: City size wage premium, all cities

Appendix

A Robustness checks

A.1 Demographic facts

A.2 Firm facts

In Figure 13, I present the correlation between dynamism and city size over time controlling forindustry composition at the 3 digit NAICS level. To do this, I compute dynamism measures withina city-industry cell and then estimate the following specification

Dict = ↵it +X

t

�tlog(popct)⇥ �t + ✏ict.

Figure 13 shows the correlation coefficients �t over time.Because city-industry cells can have very small firm counts for small cities, I perform an alter-

native analysis by grouping cities into 7 size categories18. I then calculate the dynamism measureswithin a city size category-industry bin, using 4 digit NAICS codes. Grouping cities into size cat-egories allows me to control for more detailed industry composition without worrying about smallfirm counts. In figure 15, I plot the city size category fixed effects from a regression of the dynamism

18Specifically, city size categories are based on population percentiles. City size category 1 is cities below the 50thpercentile, category 2 is the 50th to 75th percentile, 3 is the 75th to 90th percentile, 4 is the 90th to 95th percentile,5 is 95th percentile to 98th percentile, 6 is the 98th to 99th percentile and finally category 7 is cities above the 99thpercentile.

50

Page 51: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

(1) (2) (3) (4) (5)<HS HS SC C >C

year=1980 ⇥ log(pop) 0.0444⇤⇤⇤ 0.0387⇤⇤⇤ 0.0526⇤⇤⇤ 0.0628⇤⇤⇤ 0.0490⇤⇤⇤(0.00654) (0.00531) (0.00481) (0.00492) (0.00560)

year=1990 ⇥ log(pop) 0.0523⇤⇤⇤ 0.0517⇤⇤⇤ 0.0692⇤⇤⇤ 0.0678⇤⇤⇤ 0.0639⇤⇤⇤(0.00658) (0.00533) (0.00483) (0.00494) (0.00563)

year=2000 ⇥ log(pop) 0.0306⇤⇤⇤ 0.0437⇤⇤⇤ 0.0681⇤⇤⇤ 0.0803⇤⇤⇤ 0.0699⇤⇤⇤(0.00669) (0.00542) (0.00491) (0.00503) (0.00572)

year=2007 ⇥ log(pop) 0.0263⇤⇤⇤ 0.0383⇤⇤⇤ 0.0607⇤⇤⇤ 0.0817⇤⇤⇤ 0.0708⇤⇤⇤(0.00675) (0.00548) (0.00496) (0.00508) (0.00578)

year=2014 ⇥ log(pop) 0.0137⇤⇤ 0.0212⇤⇤⇤ 0.0471⇤⇤⇤ 0.0741⇤⇤⇤ 0.0652⇤⇤⇤(0.00671) (0.00544) (0.00493) (0.00504) (0.00574)

Observations 360 360 360 360 360R

2 0.853 0.763 0.857 0.908 0.924Standard errors in parentheses⇤p < .1, ⇤⇤

p < .05, ⇤⇤⇤p < .01

(a) City bins

(1) (2) (3) (4) (5)<HS HS SC C >C

year=1980 ⇥ lwapop 0.0511⇤⇤⇤ 0.0469⇤⇤⇤ 0.0504⇤⇤⇤ 0.0642⇤⇤⇤ 0.0605⇤⇤⇤(0.00408) (0.00290) (0.00294) (0.00318) (0.00368)

year=1990 ⇥ lwapop 0.0492⇤⇤⇤ 0.0506⇤⇤⇤ 0.0623⇤⇤⇤ 0.0700⇤⇤⇤ 0.0823⇤⇤⇤(0.00398) (0.00284) (0.00287) (0.00311) (0.00360)

year=2000 ⇥ lwapop 0.0275⇤⇤⇤ 0.0411⇤⇤⇤ 0.0600⇤⇤⇤ 0.0784⇤⇤⇤ 0.0822⇤⇤⇤(0.00397) (0.00283) (0.00286) (0.00309) (0.00358)

year=2007 ⇥ lwapop 0.0266⇤⇤⇤ 0.0340⇤⇤⇤ 0.0549⇤⇤⇤ 0.0768⇤⇤⇤ 0.0844⇤⇤⇤(0.00391) (0.00278) (0.00282) (0.00305) (0.00353)

year=2014 ⇥ lwapop 0.000573 0.0131⇤⇤⇤ 0.0346⇤⇤⇤ 0.0632⇤⇤⇤ 0.0703⇤⇤⇤(0.00384) (0.00274) (0.00277) (0.00300) (0.00347)

Observations 13187 13187 13187 13187 13187R

2 0.203 0.166 0.253 0.384 0.457Standard errors in parentheses⇤p < .1, ⇤⇤

p < .05, ⇤⇤⇤p < .01

(b) All cities

Table 10: City size wage premium, more education groups

51

Page 52: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

(1) (2) (3) (4) (5)<HS HS SC C >C

year=1980 ⇥ log(pop) 0.0208⇤⇤⇤ 0.0211⇤⇤⇤ 0.0273⇤⇤⇤ 0.0241⇤⇤⇤ 0.0260⇤⇤⇤(0.00647) (0.00424) (0.00427) (0.00454) (0.00407)

year=1990 ⇥ log(pop) 0.0406⇤⇤⇤ 0.0425⇤⇤⇤ 0.0515⇤⇤⇤ 0.0456⇤⇤⇤ 0.0433⇤⇤⇤(0.00650) (0.00426) (0.00429) (0.00456) (0.00409)

year=2000 ⇥ log(pop) 0.0317⇤⇤⇤ 0.0416⇤⇤⇤ 0.0510⇤⇤⇤ 0.0542⇤⇤⇤ 0.0511⇤⇤⇤(0.00661) (0.00433) (0.00436) (0.00463) (0.00416)

year=2007 ⇥ log(pop) 0.0262⇤⇤⇤ 0.0380⇤⇤⇤ 0.0457⇤⇤⇤ 0.0539⇤⇤⇤ 0.0491⇤⇤⇤(0.00668) (0.00437) (0.00440) (0.00468) (0.00420)

year=2014 ⇥ log(pop) 0.0148⇤⇤ 0.0242⇤⇤⇤ 0.0379⇤⇤⇤ 0.0517⇤⇤⇤ 0.0462⇤⇤⇤(0.00663) (0.00434) (0.00437) (0.00465) (0.00417)

Observations 360 360 360 360 360R

2 0.431 0.709 0.799 0.809 0.817Standard errors in parentheses⇤p < .1, ⇤⇤

p < .05, ⇤⇤⇤p < .01

(a) City bins

(1) (2) (3) (4) (5)<HS HS SC C >C

year=1980 ⇥ lwapop 0.0226⇤⇤⇤ 0.0211⇤⇤⇤ 0.0215⇤⇤⇤ 0.0196⇤⇤⇤ 0.0292⇤⇤⇤(0.00361) (0.00233) (0.00222) (0.00229) (0.00246)

year=1990 ⇥ lwapop 0.0401⇤⇤⇤ 0.0361⇤⇤⇤ 0.0430⇤⇤⇤ 0.0369⇤⇤⇤ 0.0469⇤⇤⇤(0.00353) (0.00228) (0.00217) (0.00224) (0.00241)

year=2000 ⇥ lwapop 0.0359⇤⇤⇤ 0.0376⇤⇤⇤ 0.0429⇤⇤⇤ 0.0426⇤⇤⇤ 0.0487⇤⇤⇤(0.00351) (0.00227) (0.00217) (0.00223) (0.00240)

year=2007 ⇥ lwapop 0.0307⇤⇤⇤ 0.0331⇤⇤⇤ 0.0412⇤⇤⇤ 0.0420⇤⇤⇤ 0.0477⇤⇤⇤(0.00346) (0.00223) (0.00213) (0.00219) (0.00236)

year=2014 ⇥ lwapop 0.0130⇤⇤⇤ 0.0223⇤⇤⇤ 0.0308⇤⇤⇤ 0.0387⇤⇤⇤ 0.0426⇤⇤⇤(0.00340) (0.00220) (0.00210) (0.00216) (0.00232)

Observations 13187 13187 13187 13187 13187R

2 0.059 0.179 0.275 0.270 0.284Standard errors in parentheses⇤p < .1, ⇤⇤

p < .05, ⇤⇤⇤p < .01

(b) All cities

Table 11: City size wage premium, more education groups, residual wages

52

Page 53: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

β1980=0.067(0.015)β2014=0.137(0.046)

−1

−.5

0

.5

1

skill

inte

nsi

ty,

de

me

an

ed

10 12 14 16

log(population)

1980 2014

(a) More than HS : HS or less

β1980=0.032(0.004)β2014=0.134(0.019)

−.2

0

.2

.4

skill

inte

nsi

ty,

de

me

an

ed

10 12 14 16

log(population)

1980 2014

(b) C : HS

Figure 11: Skill intensity and city size, alternate definitions of skill

measure on a full set of industry-year fixed effects and city size-year fixed effects:

Dict = ↵it + �ct + ✏ict

The city size category fixed effects represent the average difference, across industries, in the dy-namism rates between big cities and the smallest city size category.

In figure 16, I show that the results are robust to using a finer set of industry controls. I againgroup cities into size categories and calculate dynamism rates within a city size - NAICS 4 bin. Ithen estimate the following specification

�2007,1980Dic = ↵i + �c + ✏ic

Figure 16, plots the city fixed effects from this regression. The city fixed effect represents the averagedifference across industries between the decline in dynamism for a given city size category and thesmallest city size category. The fact that the line is increasing in city size means that the biggercity size categories experienced a smaller decline in dynamism than the smallest cities.

A.3 Selection

Finally, I examine how exit rates differ across city size. One interpretation of the facts presentedso far, is that big cities are growing faster than small cities. Because growing cities will havemore new entrants and new firms are more likely to exit, big cities will also have higher exit rates.This is part of the story, but I also show that big cities are exhibiting higher rates of churn evenwhen controlling for establishment characteristics such as age and size. To test this, I estimate the

53

Page 54: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

β1980=0.025(0.002)β2014=0.071(0.003)

−.5

0

.5

1sk

ill in

ten

sity

, d

em

ea

ne

d

8 10 12 14 16

log(population)

1980 2014

(a) More than college : Some college or less (baseline measure ofskill intensity)

β1980=0.076(0.006)β2014=0.229(0.015)

−2

0

2

4

skill

inte

nsi

ty,

de

me

an

ed

8 10 12 14 16

log(population)

1980 2014

(b) More than HS : HS or less

β1980=0.028(0.002)β2014=0.138(0.007)

−.5

0

.5

1

1.5

2

skill

inte

nsi

ty,

de

me

an

ed

8 10 12 14 16

log(population)

1980 2014

(c) College : High School

Figure 12: Skill intensity and city size, all cities

−1.5

−1

−.5

0

.5

1

regre

ssio

n c

oeffic

ient

esta

b. sta

rt up

rate

esta

b ex

it ra

te

firm

sta

rt up

rate

firm

exit r

ate

job

crea

tion

rate

job

destru

ction

rate

1980 2014

Figure 13: Correlation coefficients, dynamism and city size controlling for in-

dustry

54

Page 55: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

−1.5

−1

−.5

0

.5

reg

ress

ion c

oeff

icie

nt

esta

b. sta

rt up

rate

esta

b ex

it ra

te

firm

sta

rt up

rate

firm

exit r

ate

job

crea

tion

rate

job

destru

ction

rate

1980 2014

Figure 14: Correlation coefficients, dynamism and city size controlling for in-

dustry and demographics

Table 12: Matchers versus non-matchers

following specification

Eet = �i(e),t + �i(f(e)),t + �tlog(popc(e),t) + tXet + ✏et

where Eet is an indicator that establishment e exits between year t and t+1, �i(e),t are establishmentNAICS 4 fixed effects, �i(f(e)),t are firm NAICS 4 fixed effects and Xe,t are other characteristics ofthe establishment such as size and age. If �t is increasing over time, then the relationship betweenselection and city size became stronger. I present three sets of results. In the first set of resultsgoing back to 1977, I control for establishment and firm size. In the second set of results, I add acourse measure of productivity, sales per employee. Because sales is only available in the EconomicCensus, I limit the sample to firms who are in a sector whose Economic Census started in 1977.In the third set of results, I also control for age of the establishment and firm. In the LBD ageis imputed from the first time an establishment is observed. As a result, I can only meaningfullycontrol for age starting in 1987.

Figure 17 presents the results from these regressions. Looking across the years, one can see thecoefficient, �t, growing stronger. In 1977, �1977 is .0005 which means that in cities twice as largeestablishments are .035 percentage points more likely to exit than establishments in small cities. By2007, � had grown to .002percentage points meaning that establishments are .14 percentage pointsmore likely to exit in cities twice as large.

55

Page 56: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

(a) Firm start up rate (b) Firm exit rate

(c) Establishment start up rate (d) Establishment exit rate

(e) Job creation rate (f) Job destruction rate

Figure 15: Dynamism by city size category, NAICS 4 controls

56

Page 57: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

0

.5

1

1.5

2

2.5

∆D

1980 to 2

007

0 2 4 6 8

city size category

firm start up rate firm exit rate

(a) Firm start up and exit rate

0

.5

1

1.5

2

2.5

∆D

1980 to 2

007

0 2 4 6 8

city size category

establishment start up rate establishment exit rate

(b) Establishment start up and exit rate

0

.5

1

1.5

2

∆D

1980 to 2

007

0 2 4 6 8

city size category

job creation rate job destruction rate

(c) Job creation and destruction rate

Figure 16: Decline in Dynamism and City Size

−.002

0

.002

.004

β lo

g(p

op)

1980 1990 2000 2010

year

controls for industry

controls for industry and size

controls for industry, size and age

(a) no sales

.002

.004

.006

.008

.01

β lo

g(p

op)

1980 1990 2000 2010

year

controls for industry, size, sales/emp

controls for industry, size, sales/emp and age

(b) controls for sales per employee

Figure 17: Selection regressions

57

Page 58: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

A.4 ACES sample selection

B Derivations for firm problems

B.1 Value function: no adoption

To find the solution to the HJB equation I use the following theorem:

Theorem: The general solution is the sum of a particular solution to the general equation and thegeneral solution to the reduced equation.

So, one needs to find a particular solution to the general equation and a general solution to thereduced equation. The reduced equation is

0 = �⇢V (s) + µV 0(s) +1

2 2V 00(s)

To find a general solution to the reduced equation, I use the following theorem:

Theorem: the generalized solution to the reduced equation is a linear combination of any twolinearly independent particular solutions.

To find particular solutions, consider the class of equations e⇠s. Plugging this into the reducedequation

0 = �⇢e⇠s + µ⇠e⇠s +1

2 2⇠2e⇠s

dividing by e⇠s gives0 = �⇢+ µ⇠ +

1

2 2⇠2

=) ⇠ = �µ±

qµ2 + 2 2⇢

2

Let ⇠+ and ⇠� be the positive and negative roots, respectively. Then the general solution to thereduced equation is

C+e⇠+s + C�e⇠

�s.

The general solution is the sum of a particular solution to the general equation and a generalsolution to the reduced equation. I just found the general solution to the reduced equation so nowI find a particular solution to the general equation. Define this as Vp(m). I solve for Vp(m) usingthe method of undetermined coefficients. I look for a solution to

⇢V (s) = [Zes � f c � rf b] + µV 0(s) +1

2 2V 00(s)

guessing that the answer is V (s) = Aes +B and plugging the guess into the HJB equation

⇢ (Aes +B) = [Zes � f c � rf b] + µAes +1

2 2Aes

58

Page 59: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

Rearranging gives

Aes +B =1

⇢[�f c � rf b] +

1

Z + µA+

1

2 2A

�es

=) B =1

⇢[�f c � rf b]

and=) A =

1

Z + µA+

1

2 2A

=) A =Z

⇢� µ� 12

2

So the particular solution to the HJB is

Vp(s) =Z

⇢� µ� 12

2es +

1

⇢[�f c � rf b]

Thus a general solution to the value function will be

V (s) = Vp(s) + C+e⇠+s + C�e⇠

�s

whereVp(s) =

Z

⇢� µ� 12

2es +

1

⇢[�f c � rf b]

Using the no bubble condition, it must be the case that C+ = 0 otherwise V (s) will diverge fromVp(s) as s ! 1. Then I solve for C� using the value matching condition and the optimal thresholdusing the smooth pasting condition.

Value matching givesV (sx) = 0

V (sx) =Z

⇢� µ� 12

2esx +

1

⇢[�⇣✓hjwhj + ✓ljwlj

⌘f c � rf b] + C�e⇠

�sx = 0

=) C� =�Z

⇢� µ� 12

2esx�⇠

�sx � 1

⇢[�⇣✓hjwhj + ✓ljwlj

⌘f c � rf b]e�⇠

�sx

So now the value function is

V (s) =Z

⇢� µ� 12

2es+

1

⇢[�⇣✓hjwhj + ✓ljwlj

⌘f c�rf b]+

"�Z

⇢� µ� 12

2esx�⇠

�sx � 1

⇢[�⇣✓hjwhj + ✓ljwlj

⌘f c � rf b]e�⇠

�sx

#e⇠

�s

Then using the smooth pasting condition, I solve for sx

V 0(sx) = 0

V 0(s) =Z

⇢� µ� 12

2es + ⇠�

"�Z

⇢� µ� 12

2esx�⇠

�sx � 1

⇢[�⇣✓hjwhj + ✓ljwlj

⌘f c � rf b]e�⇠

�sx

#e⇠

�s

59

Page 60: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

V 0(sx) =Z

⇢� µ� 12

2esx+⇠�

"�Z

⇢� µ� 12

2esx�⇠

�sx � 1

⇢[�⇣✓hjwhj + ✓ljwlj

⌘f c � rf b]e�⇠

�sx

#e⇠

�sx = 0

Solving for sx gives

sx = ln

"1

Z

✓�⇠�

1� ⇠�

◆ ⇢� µ� 1

2 2

!⇣⇣✓hjwhj + ✓ljwlj

⌘f c + rf b

⌘#

B.2 Firm size distribution: no adoption

The corresponding Kolmogorov Forward Equation is

@g(s)

@t= �µ

dg(s)

ds+

1

2 2d

2g(s)

ds2

In steady state @g(s)dt

= 0 =) �� dg(s)ds

= d2g(s)ds2

where � = �µ

2/2. The solution is going to be

g(s) =

8<

:g�(s) s 2 (sx, se)

g+(s) s > se

andg�(s) = C1 + C2e��s

g+(s) = C3 + C4e��s.

There will be 4 boundary conditions:

1. the mass of firms is finite: M =Rse

sxg�(s)ds+

R1se

g+(s)ds < 1

(a)R1se

g+(s)ds =⇥C3s+ C4

1�e��s

⇤1se

=) C3 = 0

2. f(s) is continuous at entry: lims#se g+(s) = lims"se g

�(s)

(a) To show this use the discrete approximation to the Brownian motion

�N = (1� p)hg+(se + h) = phg�(se � h)

�N ⇡ (1�p)h�g+(se) + g+0(se)h+ g+00(se)h

2�⇡ ph

�g�(se) + g�0(se)(�h) + g�00(se)h

2�

Divide both sides byp� and take the limit as �! 0 which gives

g+(se) = g�(se)

(b) C1 + C2e��se = C4e��se =) C1 = (C4 � C2) e��se

3. there’s no mass at the exit threshold: g(sx) = 0

(a) (C4 � C2) e��se + C2e��sx = 0 =) C4 = C2(e��se�e

��sx)e��se

60

Page 61: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

(b) So far we haveg�(s) = �C2e��sx + C2e��s

g+(s) = C2(e��se�e

��sx)e��se

e��s

4. the mass of exitors is

�E = (1� p)hg�(sx + h) ⇡ (1� p)h⇥g�(sx) + hg�0(sx) +O(h2)

dividing by � and taking the limit as �! 0 implies E = 2

2 g�0(sx)

(a) f 0(sx) = ��C2e��sx = 2E 2 =) C2 =

E

µe��sx

B.3 Value function: non-adopters

As before the solution v(s) will be the sum of a particular solution to the general equation and twolinearly independent solutions to the reduced equation

V (s) = Vp(s) + C+e⇠+s + C�e⇠

�s

where ⇠ = �µ ±p

µ2+2 2⇢

2 and Vp(s) is the same as above and solved for using the method ofundetermined coefficients:

Vp(s) =Z

⇢� µ� 12

2es +

1

⇢[�⇣✓hjwhj + ✓ljwlj

⌘f c � rf b]

To make the notation easier, rewrite this as

Vp(s) = Aes +B

I use the boundary conditions to solve for C+, C� and the exit and adoption thresholds.

1. Use v(sx) = 0 to solve for C+

C+ =�Aesx �B � C�e⇠

�sx

e⇠+sx

2. Use v(sa) = va⇤ to solve for C�

va⇤ = Aesa +B +

"�Aesx �B � C�e⇠

�sx

e⇠+sx

#e⇠

+sa + C�e⇠

�sa

C�⇣e⇠

�sx+⇠+sa�⇠+sx � e⇠

�sa

⌘= Aesa +B +

✓�Aesx �B

e⇠+sx

◆e⇠

+sa � va⇤

C� =Aesa +B + (�Aesx �B) e⇠

+sa�⇠+sx � va⇤

e⇠�sx+⇠+sa�⇠+sx � e⇠�sa

61

Page 62: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

3. Plug in C� to solve for C+

C+ =�Aesx �B � C�e⇠

�sx

e⇠+sx

C+ =

✓�Aesx �B

e⇠+sx

◆� Aesa +B + (�Aesx �B) e⇠

+sa�⇠+sx � va⇤

e⇠�sx+⇠+sa�⇠+sx � e⇠�sa

!esx(⇠

��⇠+)

4. Use the smooth pasting conditions v0(sx) = 0 and v0(sa) = va0(sa) to solve for sx and sa

B.4 Firm size distribution: with adoption

The distribution conditional on adoption

The stochastic process for firm size is ds

dt= µdt + dW (t)) and the same Kolmogorov Forward

Equation as in the case with no adoption

@ga(s)

@t= �µ

dga(s)

ds+

1

2 2d

2ga(s)

ds2

In steady state @ga(s)dt

= 0 and � = �µ

2/2. The solution is going to be

ga(s) =

8<

:ga�(s) s 2 (sx, sa)

ga+(s) s > sa

andga�(s) = C1 + C2e��s

ga+(s) = C3 + C4e��s.

There will be 4 boundary conditions:

1. the mass of firms is finite: M =Rsa

saxga�(s)ds+

R1sa

ga+(s)ds < 1

(a) =) C3 = 0

2. ga(s) is continuous at the adoption threshold: lims#sa ga+(s) = lims"sa g

a�(s)

(a) C1 = (C4 � C2) e��sa

3. there’s no mass at the exit threshold: ga(sax) = 0

(a) C4 = C2

⇣e��sa�e

��sx

e��sa

(b) =) C1 = �C2e��sx

4. the mass of exitors is

�Ea = (1� p)hga�(sax + h) ⇡ (1� p)h⇥ga�(sax) + hga�0(sax) +O(h2)

62

Page 63: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

dividing by � and taking the limit as �! 0 implies Ea = 2

2 ga�0(sax)

(a) Now to recap we havega�(s) = �C2e��sx + C2e��s

ga+(s) = C2

⇣e��sa�e

��sax

e��sa

⌘e��s

(b) =) C2 =E

a2�� 2 e�s

ax = E

a

µe��sax

(c) Note that in steady state, the mass of new adopters must equal the mass of adoptersthat exit, A = Ea

5. So,

ga(s) =

8<

:

Ea

µ

�e��(s�s

ax) � 1

�s 2 (sax, sa)

Ea

µ

�e�s

ax � e�sa

�e��s s > sa

6. Integrating over the mass of firms to get MA gives MA = Ea

µ(sx � sa)

7. Putting this in terms of z

fa(z) =

8<

:

Ea

µ

�z�(1��)e�s

ax) � 1

� ���1z

�z 2 (zax, za)

Ea

µ

�e�s

ax � e�sa

�z�(1��)

���1z

�z > za

The distribution conditional on no adoption

The stochastic process for firm size is ds

dt= µdt + dW (t) and the same Kolmogorov Forward

Equation is@gn(s)

@t= �µ

dgn(s)

ds+

1

2 2d

2gn(s)

ds2

In steady state @gn(s)dt

= 0 =) �� dgn(s)ds

= d2gn(s)

ds2where � = �µ

2/2. The solution is going to be

gn(s) =

8<

:gn�(s) s 2 (sx, se)

gn+(s) s 2 (se, sa)

andgn�(s) = C1 + C2e��s

gn+(s) = C3 + C4e��s.

There will be 5 boundary conditions:

1. the mass of adopters is

�A = phgn+(sa � h) ⇡ ph⇥gn+(sa)� hgn+0(sa) + h2gn+00(sa)

= ph⇥�hgn+0(sa) + h2gn+00(sa)

63

Page 64: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

where the second equality follows because g(sa) = 0. Dividing both sides by � and takingthe limit as �! 0 gives

A = �1

2 2gn+0(sa)

2. there’s no mass at the adoption threshold: gn(sa) = 0

3. gn(s) is continuous at the entry threshold: lims#se gn+(s) = lims"se g

n�(s)

4. there’s no mass at the exit threshold: gn(sx) = 0

5. the mass of exitors is

�En = (1� p)hgn�(sx + h) ⇡ (1� p)h⇥gn�(sx) + hgn�0(sx) +O(h2)

Divide by � and taking the limit as �! 0 implies En = 2

2 gn�0(sx)

6. In steady state the mass of entrants must equal the mass of non-adopters who exit plus themass of firms that become adopters, N = En +A.

7. Applying 3 and 4 first gives

gn�(sx) = 0 =) C1 = �C2e��sx

Then

En = 2

2gn�0(sx) = ��

2

2C2e

��sx =) C2 =En

µe��sx

=) C1 = �En

µ

=) gn�(s) =En

µ

⇣e��(s�sx) � 1

Apply gn+(sa) = 0.C3 = �C4e

��sa

Next apply gn�(se) = gn+(se)

En

µ

⇣e��(se�sx) � 1

⌘= C4

⇣e��se � e��sa

=) C4 =En

µ

e��(se�sx) � 1

e��se � e��sa

!

=) gn+(s) =En

µ

e��(se�sx) � 1

e��se � e��sa

!⇣e��s � e��sa

64

Page 65: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

8. So the final distribution is:

gn(s) =

8<

:

En

µ

�e��(s�sx) � 1

�s 2 (sx, se)

En

µ

⇣e��(se�sx)�1e��se�e��sa

⌘ �e��s � e��sa

�s 2 (se, sa)

9. Integrating to get Mn gives

Mn =En

��µ

⇣e��(se�sx) � 1

⌘� En

µ(se � sx)

...+En

�µ

⇣e��(se�sx) � 1

⌘� En

µ

e��(se�sx) � 1

e��se � e��sa

!e��sa(sa � se)

= �En

µ(se � sx)�

En

µ

e��(se�sx) � 1

e��se � e��sa

!e��sa(sa � se)

10. Use

A = �1

2 2gn+0(sa) = �1

2 2

��E

n

µ

e��(se�sx) � 1

e��se � e��sa

!e��sa

!

= �En

e��(se�sx) � 1

e��se � e��sa

!e��sa

to solve for A = Ea as a function of En. (where I used � = �µ2 2 )

11. Distribution over z

gn(z) =

8<

:

En

µ

�z�(1��)e�sx � 1

� ���1z

�s 2 (sx, se)

En

µ

⇣e��(se�sx)�1e��se�e��sa

⌘ �z�(1��) � e��sa

� ���1z

�s 2 (se, sa)

C Calibration

C.1 Uniqueness of city fundamentals

In this section, I show that there are a unique set of fundamentals that rationalize the data being anequilibrium of the model in 1980. The vector of fundamentals is given by Pc = {�, , Ah, Al, fe, f c, b, ⌘},that is the weight on high-skilled labor, �, the city specific productivity term, , high- and low-skilled amenities, Ah and Al, entry costs, fe, fixed costs, fc, the productivity of the building sectorb, and the elasticity of building supply, ⌘. Table 5 gives the moments in the data used to recoverthe parameters. I solve for the parameters recursively in the following steps:

1. Using the start up rate, back out the implied exit threshold zx: sr = µ

sx�seand zx = e

sx

��1 .Knowing the exit threshold means the firm distribution given by equation 1 is determined.

65

Page 66: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

2. Solve for �j using the two labor market clearing conditions. The two labor market clearingconditions are

Hj = hdj

Yj +

Mjf c

j

hdj+ ld

j

+Ejf e

j

hdj+ ld

j

!

Lj = ldj

Yj +

Mjf c

j

hdj+ ld

j

+Ejf e

j

hdj+ ld

j

!

Divide Hj

Lj=⇣wh

wl

⌘�✏ ⇣�

1��

⌘✏

. The skill intensity of city j, Hj

Lj, and the relative wages are both

data. Thus, rearranging to solve for � gives �j =

✓Hj

Lj

◆ 1✏ w

h

wl

1+

✓Hj

Lj

◆ 1✏ w

h

wl

.

3. Next I use the condition that Pj = 1 to solve for the level of wages and j . In city 1, 1 isnormalized to be 1. Thus, I can solve

1 =

Z

⌦j

✓cj(z)

� � 1

◆1��d!

! 11��

where cj(z) = ( jz)�1 (�✏

jw1�✏hj

+(1� �j)✏w1�✏lj

)1

1�✏ and ⌦j was pinned down in step 1 for wl1.Rearranging gives

1 = w1

1��

lj(�✏j

w1�✏hj

w1�✏lj

+ (1� �j)✏)

1��

1�✏

Z

⌦j

✓1

z

� � 1

◆1��d!

!

which means there is a unique solution for wl1. Using wl1 pins down the level of wages in thewhole economy and using data on relative wages in city j, the city size wage premium andrents, I can solve for whj , wlj and rj for all j.

4. Knowing wages, in all other cities j � 2, I can use P = 1 to solve for j .

j =

Z

⌦j

✓(z)�1 (�✏jw

1�✏hj

+ (1� �j)✏w1�✏

lj)

11�✏

� � 1

◆1��d!

! 11��

5. Next, I use three equations to jointly solve for aggregate output Y , the fixed cost f c and theentry cost f e.

final goods: Yj = wHjHj + wLjLj +⇧v

j � (✓hjwhj + ✓ljwlj)�Mjf

c

j +Njfe

j

free entry: Vj(ze) =⇣✓hjwhj + ✓ljwlj

⌘f e

j

profit maximization (smooth pasting): V 0j (zx) = 0

66

Page 67: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

where the smooth pasting condition can be re-arranged to solve uniquely for f c,

⇢ (1�⇠�)�⇠�

Z

⇢�µ� 12

2 esx � rf b

⇣✓hjwhj + ✓l

jwlj

⌘ = f c.

Recall that Zj ⌘⇣( j)

�1 (�✏jhw1�✏jh

+ �✏jlw1�✏jl

)1

1�✏�

��1

⌘1��YjP �

j

1�

for ease of notation. Thenplugging the free entry condition and the equation for f c into the market clearing conditiongives a linear function of Y and therefore the solution for Yj will be unique. Knowing Yj thefree entry condition and smooth pasting condition can be used to solve for unique values off e

jand f c

j.

6. The data on the elasticity of housing supply from Saiz (2010) maps directly into the parameteron the elasticity of building supply ⌘j . Then use the building market clearing condition tosolve for bj

Mjfb + (1� �)Hj

whj

rj+ (1� �)Lj

wlj

rj= bjr

1⌘�1

7. Finally, I use the labor market clearing to solve for amenities for each skill type. DefineV⌧j ⌘

⇣A⌧j��(1� �)(1��)w⌧jq

��1j

⌘⌫

. Then the probability that city j provides the highestutility to worker i will follow a multinomial logit

⇡⌧j = P (Vi⌧j > Vi⌧k, 8k 6= j) =V⌧jPjV⌧j

.

In city 1 the amenities are normalized to be 1 so V⌧1 is known. Further, the ⇡⌧js are knownfrom the data. Define a vector x = [�⇡⌧2V⌧1, ...,�⇡⌧J V⌧1] and a matrix

P =

2

66664

⇡⌧2 � 1 ⇡⌧2 ... ⇡⌧2

⇡⌧3 ⇡⌧3 � 1 ... ⇡⌧3

... ... ... ...

⇡⌧J ⇡⌧J ... ⇡⌧J � 1

3

77775

then the the vector V = [V⌧2, ..., V⌧J ] can be solved for

x = PV =) V = P�1x.

The determinant of the matrix P is not equal to zero and, therefore, there will be a uniquesolution for V which can then be used to solve in closed form for A⌧j .

67

Page 68: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

β=0.246(0.417)

.1

.15

.2

.25

∆w

h, data

.28 .3 .32 .34

∆wh, model

(a) high-skilled wages

β=0.643(0.232)

−.1

−.08

−.06

−.04

−.02

0

∆w

l, data

−.02 0 .02 .04 .06 .08

∆wl, model

(b) low-skilled wages

β=0.741(0.085)

.1

.2

.3

.4

∆H

/L, data

.1 .2 .3 .4

∆H/L, model

(c) skill intensity

β=0.778(0.443)

−10

−8

−6

−4

−2

∆sr

, data

.5 1 1.5 2 2.5

∆sr, model

(d) establishment start up rate

Figure 18: Testing the model in changes

Note: Figures show the change in the data between 1980 and 2014 versus the change in the model between the first and secondsteady state.

D Additional Model Results

D.1 Testing the model in changes

As an additional test of the success of the model, in figure 18, I test the model in changes. Foreach of the four main variables, high-skilled wages, low-skilled wages, skill intensity and businessdynamism, I show the relationship between the change in the data between 1980 and 2014 and thechange in the model between the first and second steady state. With the exception of high-skilledwages, the relationship between the change in the data and the change in the model is positive andstatistically significant at the 10% level. For high-skilled wages, the relationship is positive, but notsignificant.

D.2 Changes in market size: model vs. data

In this section, I verify that the relationship between population growth and initial population inthe model are consistent with the data. This relationship is counter-intuitive since we would expectbig cities that are doing more adoption to grow faster. However, these cities are also much morelimited in their housing supply elasticity. Thus, even though they adopt more, the increase incongestion limits their population growth. Low-skilled workers, in particular, leave the big cities

68

Page 69: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

.5

1

1.5

2

2.5

3

annualiz

ed p

opula

tion g

row

th

10 12 14 16

log(population)

data model

Figure 19: Population growth vs initial population, model vs. data

for less constrained smaller cities. This is consistent with evidence from Rappaport (2018) whodocuments a non-monotonic relationship of population growth to initial city size. In particular,Rappaport (2018) finds that big cities grow less than less constrained and less crowded mediumsized cities. However he finds that medium sized and big cities grew faster than rural areas andmicropolitan statistical areas which are not in my model.

D.3 Understanding the drivers of the start up rate

To understand what is driving the changes in the city level churn rates in the second steady stateequilibrium, I proceed in three steps. First, I decompose the churn rates into three components: thechurn rate for adopters, the churn rate for non-adopters and the share of firms that are adopters.Each of these components are functions of three endogenous objects: the exit threshold for adopters,the exit threshold for non-adopters and the adoption threshold. I show how the churn rates andthe share of adopters are affected by each threshold. In the second step, I show how each of thethresholds, the churn rates and the share of adopters depend on wages, rents, market size and theavailability of the new technology. Finally, I undertake a series of partial equilibrium counterfactualsto understand the main drivers of the changing churn rate in the second equilibrium. I ask whatwould have happened to the churn rates and the share of adopters had rent and wages been heldconstant at their 1980 levels or if market size, Y , increased equally across cities.

The start up rate in the city is given by

srj = (1� µj)srna

j + µjsra

j ,

i.e. it is a weighted average of the start up rate of non-adopters and adopters where the weightis given by the share of firms that have adopted, µj . Now, I characterize the three objects in thisequation srna

j, sra

jand µj .

69

Page 70: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

Integrating the size distribution of adopters gives the start up rate for adopters

sraj =Ea

j

Ma

j

=�µ

saj � saxj

where

sax = ln

"1

Za

j

✓�⇠�

1� ⇠�

◆ ⇢� µ� 1

2 2

!⇣(✓hjwhj + ✓ljwlj)f

c

j + (✓haj whj + ✓laj wlj)�fc + rjf

b

⌘#

and Za

j=

✓⇣ a

j

⌘�1 ⇣(�a

j)✏w1�✏

jh+ (1� �a

j)✏w1�✏

jl

⌘1

1�✏�

��1

◆1��Yj

1�.

Similarly, the start up rate for non-adopters is given by

srnaj =En

j

Mn

j

=�µ

(se � sxj) +⇣e��(se�sxj)�1

e��(se�saj)�1

⌘(saj � se)

where sx and sa solve the smooth pasting conditions: v0(sx) = 0 and v0(sa) = va0(sa).Finally, the share of adopters is give by

µj =Ma

j

Ma

j+Mn

j

=(sa

xj� saj)

(saxj

� saj) +⇣e��(se�sxj)�1e��se�e

��saj

⌘e��saj

h(se � sxj) +

⇣e��(se�sxj)�1e��se�e

��saj

⌘e��saj (saj � se)

i

=1

1� e��(se�sxj) � 1

e��(se�saj) � 1

!

| {z }<0

⇥sra

srna

Three thresholds characterize the start up rate for adopters and non-adopters and the share ofadopters: the adoption threshold, sa

j, the exit threshold for adopters, sa

xj, and the exit threshold for

non-adopters, sxj . Thus, the start up rate for adopters is characterized by the difference between theexit threshold for adopters and the adoption threshold while the start up rate for non-adopters ischaracterized by the difference between the exit threshold for non-adopters and the entry thresholdas well as the difference between the entry threshold and the adoption threshold. Finally, the shareof adopters, µj , is characterized by the difference between the entry and exit thresholds and thedifference between the entry and adoption thresholds as well as the relative churn rates for adoptersand non-adopters.

In figure 20, I show how the churn rate for adopters and non-adopters and the share of adopterschanges with each exit and adoption threshold, holding the other thresholds fixed. These relation-ships are largely intuitive. The The churn rate for adopters is in panel 20a of figure 20. Each linegives the churn rate for adopters for a percentage change in the exit threshold for non-adopters, theexit threshold for adopter and the adoption threshold. As the the exit threshold for non-adopters

70

Page 71: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

-0.05 -0.04 -0.03 -0.02 -0.01 0 0.01 0.02 0.03 0.04 0.05

% change in threshold

0.5

0.55

0.6

0.65

0.7

0.75

0.8

0.85

0.9

start

up r

ate

ad

opte

rs

0.5

0.55

0.6

0.65

0.7

0.75

0.8

0.85

0.9

0.95

1

exit threshold: non-adopters

exit threshold: adopters

adoption threshold (right axis)

(a) Churn rate for adopters

-0.05 -0.04 -0.03 -0.02 -0.01 0 0.01 0.02 0.03 0.04 0.05

% change in threshold

15.6

15.7

15.8

15.9

16

16.1

16.2

16.3

16.4

16.5

start

up r

ate

no

n-a

dopte

rs0

50

100

150

200

250

exit threshold: adopters

adoption threshold

exit threshold: non-adopters (right axis)

(b) Churn rate for non-adopters

-0.05 -0.04 -0.03 -0.02 -0.01 0 0.01 0.02 0.03 0.04 0.05

% change in threshold

0.02

0.025

0.03

0.035

sha

re o

f ad

op

ters

exit threshold: non-adopters

exit threshold:adopters

adoption threshold

(c) Share of adopters

Figure 20: Churn rates and share of adopters versus exit and adoption thresh-

olds

71

Page 72: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

changes holding the other thresholds fixed, the churn rate for adopters is unchanged. This is un-surprising as sra

jonly depends on the exit threshold for adopters and the adoption threshold. As

the exit threshold for adopters increases, firms are more likely to hit the exit barrier and the churnrate for adopters goes up. As the adoption threshold decreases firms adopt earlier and, holding theexit threshold for adopters fixed, they are more likely to be near the exit threshold increasing thechurn rate for adopters.

Similarly, panel b plots the churn rate for non-adopters. The churn rate for non-adopters isunchanged by the exit threshold for adopters and increasing in the exit threshold for non-adopters.As the adoption threshold decreases, non-adopters are more likely to adopt and the churn rate fornon-adopters increases.

Panel c shows the share of firms that adopt versus a percent change in each threshold. As theexit threshold for non-adopters increases, there will be less non-adopters in market and the share ofadopters increases. Similarly as the exit threshold for adopters increases, the share of adopters goesdown since less of them remain in the market. The relationship between the adoption thresholdand the share of adopters in non-monotonic. For low values of the adoption threshold, the share ofadopters is increasing. This is because there is a tradeoff between a low adoption threshold whichmakes firms more likely to hit the threshold and adopt but also means the new adopters are closeto the exit threshold and do not stay in the market for long. As the adoption threshold rises, firmsare less likely to hit the threshold, but also less likely to exit once they hit it.

Figure 21 shows how the churn rates, the share of adopters and the relative thresholds areaffected by wages, relative wages, rents and market size. Panels a and b show how the churn ratesand adoption are affected by a proportional increase in both high- and low-skilled wages. As wagesincrease the churn rate for non-adopters increases. This is because both variable labor and fixedlabor costs are higher and the firms need a higher productivity to stay in the market. The share ofadopters goes down as wages increase because the additional fixed labor cost of the new technologybecomes more burdensome as wages rise. The churn rate for non-adopters is largely unchanged asthe exit threshold for adopters and the adoption threshold increase proportionally.

The second row shows what happens as high-skilled wages increase. As wages for high skilledlabor goes up the share of adopters goes down since the return to adoption is lower. The churn ratefor adopters also falls slightly since the adoption threshold relative to the exit threshold for adoptersincreases and, on average, adopters will be further away from their exit threshold. The churn ratefor non-adopters rises substantially as fixed and variable labor costs increase and selection becomestougher.

In the third row, I show the how the churn rates and the share of adopters move with rent. Asrent increases, so do the fixed costs paid by the firm. Selection becomes tougher and the share ofadopters increases as does the churn rate for non-adopters. The churn rate for adopters is largelyunaffected as the burden of the rent increase is much smaller.

Finally, row 4 shows what happens to churn and adoption rates and market size, given by Y ,increases. As market size rises, churn rates go down as profits increase and firms are more willing

72

Page 73: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

-0.05 -0.04 -0.03 -0.02 -0.01 0 0.01

% change in wages

0

20

40

60

80

100

120

140

160

180

200

start

up r

ate

0.026

0.028

0.03

0.032

0.034

0.036

0.038

share

of adopte

rs

sr adopters

sr non-adopters

share of adopters

(a) Churn rates, µ

-0.05 -0.04 -0.03 -0.02 -0.01 0 0.01

% change in w

1.1

1.2

1.3

1.4

1.5

1.6

1.7

1.8

1.9

2

thre

shold

s

sa/sx

sa/sxa

(b) thresholds

0.51 0.515 0.52 0.525 0.53 0.535 0.54 0.545 0.55 0.555 0.56

wh

0

100

200

300

400

500

600

700

start

up r

ate

0.02

0.022

0.024

0.026

0.028

0.03

0.032

0.034

0.036

0.038

0.04

share

of adopte

rs

sr adopters

sr non-adopters

share of adopters

(c) Churn rates, µ

-0.05 -0.04 -0.03 -0.02 -0.01 0 0.01 0.02 0.03 0.04

% change in wh

1.1

1.2

1.3

1.4

1.5

1.6

1.7

1.8

1.9

2

thre

shold

s

sa/sx

sa/sxa

(d) thresholds

-0.05 -0.04 -0.03 -0.02 -0.01 0 0.01 0.02 0.03 0.04 0.05

% change in rent

0

5

10

15

20

25

start

up r

ate

0.0274

0.0275

0.0276

0.0277

0.0278

0.0279

0.028

share

of adopte

rs

sr adopters

sr non-adopters

share of adopters

(e) Churn rates, µ

-0.05 -0.04 -0.03 -0.02 -0.01 0 0.01 0.02 0.03 0.04 0.05

% change in rent

1.1

1.2

1.3

1.4

1.5

1.6

1.7

1.8

1.9

2

thre

shold

s

sa/sx

sa/sxa

(f) thresholds

-0.05 -0.04 -0.03 -0.02 -0.01 0 0.01 0.02 0.03 0.04 0.05

% change in Y

0

100

200

300

400

500

600

700

start

up r

ate

0.0266

0.0268

0.027

0.0272

0.0274

0.0276

0.0278

0.028

0.0282

0.0284

0.0286

share

of adopte

rs

sr adopters

sr non-adopters

share of adopters

(g) Churn rates, µ

-0.05 -0.04 -0.03 -0.02 -0.01 0 0.01 0.02 0.03 0.04 0.05

% change in Y

1.1

1.2

1.3

1.4

1.5

1.6

1.7

1.8

1.9

2

thre

shold

s

sa/sx

sa/sxa

(h) thresholds

Figure 21: Churn rates, share of adopters, and thresholds versus endogenous

objects

73

Page 74: The Geography of Business Dynamism and Skill-Biased ...The Geography of Business Dynamism and Skill-Biased Technical ... first channel is the market size effect. For a given productivity,

Data: ict inv.tot inv.

log(pop1980) 0.0138⇤⇤⇤ 0.0110⇤⇤⇤

(0.00121) (0.00164)H1980L1980

0.238⇤⇤⇤ 0.112⇤⇤⇤

(0.0456) (0.0373)wh1980wl1980

0.0436⇤ 0.0027

(0.0239) (0.0158)

Observations 269000 269000 269000 269000

R2 0.150 0.145 0.148 0.150

SEs clustered at the CBSA level*** p<0.01, ** p<0.05, * p<0.1

Table 13: ICT spending vs. initial city characteristics

to stay in the market for any given productivity. In addition, the share of firms that adopt increaseas firms are more willing to absorb the additional fixed cost of adoption since they can spread itover a larger volume of sales.

E Model Validation

E.1 Model versus data: robustness check

74