SEWP Research Conference October 19, 2005

Creating a Longitudinal Research Worker-Establishment Matched Dataset from Patent Data: Description and Application to Understanding International Knowledge Flows. Jinyoung Kim (SUNY-Buffalo), Sangjoon John Lee (Alfred University), Gerald Marschke (SUNY-Albany)


Page 1: SEWP Research Conference        October 19, 2005

Creating a Longitudinal Research Worker-Establishment Matched Dataset from Patent Data: Description and Application to Understanding International Knowledge Flows

Jinyoung Kim (SUNY-Buffalo) Sangjoon John Lee (Alfred University)

Gerald Marschke (SUNY-Albany)

Page 2: SEWP Research Conference        October 19, 2005

Issues

• Construction of a longitudinal research worker-establishment matched panel dataset

• Knowledge flow across national borders

Page 3: SEWP Research Conference        October 19, 2005

Idea

• Policy implications for immigration, the labor market, and education

• Productivity of scientific researchers

• Transmission mechanisms of knowledge

• Technology spillover appears to be geographically limited

• Firms access externally-located technology partly through hiring of and collaboration with researchers from the outside.

Page 4: SEWP Research Conference        October 19, 2005

We examined:

1. Trends in U.S. firms’ access to researchers overseas and to those with foreign research experience from the late 1980s through the 1990s

2. Role of research personnel as a pathway for the diffusion of ideas from foreign countries to U.S. innovators

3. The firm-level determinants of accessing innovations developed overseas.

Page 5: SEWP Research Conference        October 19, 2005

Main findings:

a. In recent years, U.S. innovators have increasingly accessed researchers residing in foreign countries.

• The fraction of U.S. residents with foreign research experience in U.S. firms appears to be falling.

• U.S. pharmaceutical and semiconductor firms are increasingly going to foreign countries to employ such researchers.

b. Retaining researchers with overseas research experience seems to facilitate access to innovations developed overseas.

c. In the semiconductor industry, smaller firms and older firms are more likely to make use of the output of non-U.S. R&D.

d. In the pharmaceutical industry, younger firms are more likely to make use of the output of non-U.S. R&D.

Page 6: SEWP Research Conference        October 19, 2005

Outline

• Literature Review

• Data Construction Process

• Empirical findings

• Conclusions

Page 7: SEWP Research Conference        October 19, 2005

Literature

Various mechanisms for technology and knowledge transfer across institutional boundaries.

• Informal Contact

• Agrawal, Cockburn, and McHale (2003), Von Hippel (1988)

• Spillovers

• Henderson, Jaffe, and Trajtenberg (REStat 1998), Jaffe (AER 1989), Zucker, Darby, and Brewer (AER 1998), Audretsch and Feldman (AER 1996), Mowery and Ziedonis (NBER 2001).

Page 8: SEWP Research Conference        October 19, 2005

• Transmission of Tacit knowledge

Feldman (1994)

• Collaboration and Hiring

Cohen, Nelson, and Walsh (Mgt Science 2002), Almeida and Kogut (Mgt Science 1999), Zucker, Darby, and Armstrong (NBER 2001), Adams, Black, Clemmons, and Stephan (NBER 2004)

Page 9: SEWP Research Conference        October 19, 2005

Data

1. Patent Bibliographic data (Patents BIB)

• U.S. utility patents issued between January 1975 and February 2002.

• Patent ID number, patent application and grant dates, patent assignee, and geographic information (country, state, city, address) on all inventors involved.

• The number of patents during this period is 2,493,610 and the number of inventor records is 5,105,754.

Page 10: SEWP Research Conference        October 19, 2005

2. ProQuest Digital Dissertations Abstracts

• Author, title of dissertation, degree conferring institution, date of degree, academic field, and type of degree

• From over 1,000 North American graduate schools and European universities.

• For those who earned degrees in all natural science and engineering fields between 1945 and 2003

• 1,068,551 degree holders.

Page 11: SEWP Research Conference        October 19, 2005

3. The Compact D/SEC

• 12,000 publicly traded firms

• at least $5 million in assets and at least 500 shareholders

• Information obtained from Annual Reports, 10-K and 20-F filings, and Proxy Statements for those companies.

• We select pharmaceutical and semiconductor firms in the Compact D/SEC data by their primary SIC code.

• We use only the years 1989 through 1997 because of the patent grant lag.

Page 12: SEWP Research Conference        October 19, 2005

4. Standard & Poor’s Annual Guide to Stocks – Directory of Obsolete Securities

• histories of firm ownership changes due to mergers and acquisitions, bankruptcy, dissolution, and name changes, updated through December 2002.

5. NBER Patent-Citations

• collected by Hall, Jaffe and Trajtenberg (2001)

• all citations made and received by patents granted between 1975 and 1999. (16,522,438 citation records)

6. Thomas Register

• Firm founding year

Page 13: SEWP Research Conference        October 19, 2005

3 Steps in Data Construction

1. Identifying the same inventor among ‘same/similar’ names (Patent BIB)

2. Identifying the Ownership Structure of Subsidiaries (Compact D/SEC, S&P)

3. Combining Patent-Inventor Data with Firm Data and Patent Citation Data

[Diagram: Patent BIB + ProQuest + Compact D/SEC + S&P + Thomas Register + Citation data]

Page 14: SEWP Research Conference        October 19, 2005

Front page of patent

Page 15: SEWP Research Conference        October 19, 2005

Step 1: Identifying the Same Inventor

• Inventor name variants

Adam Smith vs. Adam Smith?

Adam E. Smith vs. Adam Smith?

Adam Smyth vs. Adam Smith?

...

Page 16: SEWP Research Conference        October 19, 2005

• The size of data (1975-2002)

2,493,610 patents

5,105,754 inventor names

• Name of the inventor (last, first, middle, surname modifier)

• Street address, zip

• City, state, country

Over 16 million patent citations (A. Jaffe)

Page 17: SEWP Research Conference        October 19, 2005

How to identify?

• Pair each name with other names and compare

N(N-1)/2 number of unique pairs.

= (5,105,754 x 5,105,753) / 2

≈ 13 trillion pairs

• Trajtenberg (2004)
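The pair count above is just N(N-1)/2 evaluated at the number of inventor records. The sketch below reproduces that arithmetic and, as an illustration only, shows how much smaller the job becomes if records are compared only within groups that share a key. That grouping step (often called blocking) is an assumption here, not something the slides describe; it is suggested by the fact that the matching rules on the following slides all require the same first name and the same (coded or full) last name. The function names and the toy keys are hypothetical.

```python
from collections import Counter


def pair_count(n: int) -> int:
    """Number of unordered pairs among n records: n(n-1)/2."""
    return n * (n - 1) // 2


def blocked_pair_count(block_keys) -> int:
    """Pairs left to compare when only records sharing a block key are paired."""
    return sum(pair_count(size) for size in Counter(block_keys).values())


# All-against-all comparisons over the 5,105,754 inventor records.
print(pair_count(5_105_754))

# Hypothetical block keys (first name, coded last name) for five toy records.
keys = [
    ("ADAM", "S530000"), ("ADAM", "S530000"), ("ADAM", "S530000"),
    ("JOHN", "K520000"), ("JOHN", "K520000"),
]
print(pair_count(len(keys)), blocked_pair_count(keys))
```

With the toy keys, the comparison count drops from every possible pair to only the pairs inside each name group.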

Page 18: SEWP Research Conference        October 19, 2005

How to Identify?

a. The pair is a ‘Match’ if

• Last names (SOUNDEX coded) and first names in the pair are the same, and

• at least one of the following holds:

i) Full address: same street address + city + country

ii) Self-citation: the same name appears on both the citing and the cited patent

iii) Shared partner(s): the two names in the pair share a co-inventor

cf. Strong Criteria (Trajtenberg 2004)

Page 19: SEWP Research Conference        October 19, 2005

SOUNDEX Coding Method

• Code on the way a last name sounds rather than the way it is spelled.

• Expand the list of similar last names to overcome the potential for inconsistent foreign name translations into English.

PETTIT (P330000), Chang (C520000), Chiang (C520000)

• Giving letters numerical values from 1 to 6

1 for B, F, P, V; 2 for C, G, J, K, Q, S, X, Z; 3 for D, T;

4 for L; 5 for M, N; 6 for R; 0 for punctuation, H, W, Y
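A minimal sketch of the coding scheme above, written to reproduce the three examples on this slide. Two details are assumptions rather than things the slide states: the code is padded with zeros to seven characters (inferred from examples such as P330000), and vowels are treated as uncoded separators, with H and W not separating repeated codes, as in the standard Soundex convention. The function name `soundex` is illustrative.

```python
# Letter-to-digit table from the slide: 1 for B, F, P, V; 2 for C, G, J, K, Q, S, X, Z; ...
CODES = {}
for digit, letters in enumerate(["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(name: str, length: int = 7) -> str:
    """Return the first letter plus digit codes, zero-padded to `length` characters."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    out = [letters[0]]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        # Emit a digit only when it differs from the previous coded letter.
        if code and code != prev:
            out.append(code)
        # Assumed standard rule: H and W do not break a run of identical codes,
        # while vowels and Y do.
        if ch not in "HW":
            prev = code
    return "".join(out)[:length].ljust(length, "0")


for last_name in ["Pettit", "Chang", "Chiang", "Smith", "Smyth"]:
    print(last_name, soundex(last_name))
```

Because the code depends on sound classes rather than spelling, Chang and Chiang (and Smith and Smyth) collapse to the same key, which is the behavior the matching step relies on.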

Page 20: SEWP Research Conference        October 19, 2005

b. The pair is a ‘Match’ if

• Full last names (not SOUNDEX coded) and first names in the pair are the same, and

• at least one of the following is the same:

i) Zip code

ii) Full middle name

cf. Medium Criteria (Trajtenberg 2004)

c. The pair is a ‘Mismatch’ if the middle name initials are different.
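Rules a, b, and c can be sketched as a pairwise predicate. This is one reading of the slides, not the authors' code: the slides do not say how rule c interacts with rules a and b, so the sketch assumes a middle-initial conflict overrides any match; it also assumes a "full" middle name means more than a single initial, and that the SOUNDEX code of the last name has already been computed. The `Record` fields and the sample records are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    first: str
    last: str
    last_soundex: str                    # precomputed SOUNDEX code of the last name
    middle: str = ""
    address: str = ""                    # street + city + country, already normalized
    zip_code: str = ""
    coinventors: frozenset = frozenset()


def middle_conflict(a: Record, b: Record) -> bool:
    """Rule c: both middle names are present and their initials differ."""
    return bool(a.middle and b.middle and a.middle[0].upper() != b.middle[0].upper())


def rule_a(a: Record, b: Record, self_cited: bool = False) -> bool:
    """Coded last name + first name agree, plus address, self-citation, or shared partner."""
    same_name = a.first == b.first and a.last_soundex == b.last_soundex
    same_address = bool(a.address) and a.address == b.address
    shared_partner = bool(a.coinventors & b.coinventors)
    return same_name and (same_address or self_cited or shared_partner)


def rule_b(a: Record, b: Record) -> bool:
    """Full last name + first name agree, plus zip code or full middle name."""
    same_name = a.first == b.first and a.last == b.last
    same_zip = bool(a.zip_code) and a.zip_code == b.zip_code
    same_middle = len(a.middle) > 1 and a.middle == b.middle
    return same_name and (same_zip or same_middle)


def is_match(a: Record, b: Record, self_cited: bool = False) -> bool:
    # Assumption: the mismatch rule takes precedence over both match rules.
    if middle_conflict(a, b):
        return False
    return rule_a(a, b, self_cited) or rule_b(a, b)


# Hypothetical records for illustration.
keynes = frozenset({"John Keynes"})
r1 = Record("Adam", "Smith", "S530000", zip_code="20012", coinventors=keynes)
r2 = Record("Adam", "Smyth", "S530000", zip_code="14228", coinventors=keynes)
r3 = Record("Adam", "Smith", "S530000", middle="Henry", zip_code="14214")
r4 = Record("Adam", "Smith", "S530000", middle="J", zip_code="14214")
print(is_match(r1, r2), is_match(r3, r4))
```

Whether `self_cited` is true has to come from the citation data, so it is passed in rather than derived from the two records.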

Page 21: SEWP Research Conference        October 19, 2005

Impose Transitivity

If A is matched to B and B is matched to C, then A is matched to C.
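One common way to impose transitivity on a list of pairwise matches is a union-find (disjoint-set) structure, which merges every chain of matches into a single group. The slides do not say how transitivity was implemented, so this is a sketch of one option; `group_by_transitivity` and its arguments are hypothetical names.

```python
def group_by_transitivity(ids, matched_pairs):
    """Merge pairwise matches into groups so that A~B and B~C imply A~C."""
    parent = {i: i for i in ids}

    def find(x):
        # Walk up to the group representative, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


print(group_by_transitivity(["A", "B", "C", "D"], [("A", "B"), ("B", "C")]))
```

Each resulting group can then be assigned one inventor identifier; any pair inside a group that violates the mismatch rule would need separate review.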

Page 22: SEWP Research Conference        October 19, 2005

An Example

ID  Inventor name  SOUNDEX       Middle name  Co-inventor  ZIP
1   Adam Smith     Adam S530000  -            John Keynes  20012
2   Adam Smith     Adam S530000  Henry        John Keynes  14228
3   Adam Smith     Adam S530000  H            -            14228
4   Adam Smith     Adam S530000  Henry        -            14214
5   Adam Smith     Adam S530000  J            John Keynes  14228
6   Adam Smyth     Adam S530000  -            John Keynes  14228

- Match: 1:2, 1:5, 1:6, 2:3, 2:4, 2:5, 2:6, 5:6, 3:6

- ID 5 is identified to be the same inventor through transitivity

Page 23: SEWP Research Conference        October 19, 2005

• 126 mismatches found after imposing transitivity

• 3 categories of Mismatches

i) from data error

‘Laszlo Andra Szporny’ vs. ‘Laszlo Eszter Szporny’

ii) Inventor with 2 Middle names

iii) same Last and First names appear in the same patent

Page 24: SEWP Research Conference        October 19, 2005

Matching Results

• 2.3 million unique inventors (45%) out of 5.1 million names

cf. Trajtenberg (2004)

• 1.6 million distinct inventors (37%) out of 4.3 million names. (Our patent database is larger because it includes additional years, 2000-2002.)

• Trajtenberg uses the same assignee as a matching criterion, which can bias measured mobility among inventors.

• Trajtenberg assigns a score to each matching criterion.

• Instead, we apply the criterion that two inventors are not treated as a match if their middle name initials differ.

• The SOUNDEX coding system sometimes specifies names so loosely that apparently different last names are considered a match.

Page 25: SEWP Research Conference        October 19, 2005

Add Dissertation Abstract Information to Inventor data

• Match degree holders in the Dissertation Abstracts data with the inventor data.

• The Dissertation Abstracts data contain each author’s full name as a single string.

• Convert the last, first, and middle names in the inventor data to a single aggregated name string.

• 64,507 (3 percent) Ph.D. or equivalent degree holders out of 2.3 million uniquely identified inventors
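The name-string step above can be sketched as follows. The layout of the dissertation author string ("Last, First Middle") and all field names are assumptions made for illustration; the slides only say that the inventor name parts are aggregated into one string and compared with the author string. The normalization keeps ASCII letters only, so accented characters would need extra handling in practice.

```python
import re


def normalize(text: str) -> str:
    """Upper-case, turn anything but A-Z into a space, and collapse whitespace."""
    return " ".join(re.sub(r"[^A-Z ]", " ", text.upper()).split())


def name_key(last: str, first: str, middle: str = "") -> str:
    """Aggregate the inventor's name parts into one comparable string."""
    return normalize(" ".join(part for part in (last, first, middle) if part))


def link_degrees(inventors, dissertations):
    """Map each inventor id to the dissertation records with the same name string."""
    index = {}
    for record in dissertations:
        index.setdefault(normalize(record["author"]), []).append(record)
    return {
        inv["id"]: index.get(name_key(inv["last"], inv["first"], inv.get("middle", "")), [])
        for inv in inventors
    }


# Hypothetical records for illustration.
inventors = [
    {"id": 1, "last": "Smith", "first": "Adam", "middle": "Henry"},
    {"id": 2, "last": "Keynes", "first": "John"},
]
dissertations = [{"author": "Smith, Adam Henry", "degree": "Ph.D.", "year": 1985}]
print(link_degrees(inventors, dissertations))
```

Exact string equality is the simplest possible rule; common names will produce multiple candidate degrees per inventor, which is why the function returns a list.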

Page 26: SEWP Research Conference        October 19, 2005

Step 2: Ownership Structure of Subsidiaries

• Necessary when combining firm-level information with the patent data file

• Patent assignee: either a parent firm or one of its subsidiaries.

• The patent data contain no firm identifier.

• Frequent changes in firm ownership and corporate names: between 1989 and 1997, 152 firms were merged, 15 firms were acquired, and 145 firms changed their names

• Firm ownership structure of subsidiaries, M&A, and name change history

• Relate each assignee to a firm

• Allows us to identify the firm for which each inventor is innovating

Page 27: SEWP Research Conference        October 19, 2005

1. Select firms in two industries from the Compact D/SEC

• Primary SIC 2834 (pharmaceutical preparations) or primary SIC 3674 (semiconductors and related devices)

2. Use S&P data

• Determine whether a change in an inventor’s firm is due to firm-level M&A and/or corporate name changes.

3. Use the list of subsidiaries in the Compact D/SEC throughout the period 1989-1997

• The list is not always complete.

• A firm that is ever listed as a subsidiary is treated as a subsidiary throughout 1989-1997.

4. Combine with firms’ founding years
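Relating each patent assignee to a firm amounts to following subsidiary and name-change links until a top-level firm is reached. The sketch below assumes two lookup tables built from the Compact D/SEC subsidiary lists and the S&P histories; the table layout, the function name, and the firm names are all hypothetical. In line with the rule above, the links are treated as fixed over 1989-1997.

```python
def resolve_firm(assignee, parent_of, renamed_to):
    """Follow name-change and subsidiary links from an assignee to its top-level firm."""
    name = assignee.strip().upper()
    seen = set()
    while name not in seen:          # the `seen` set guards against circular links
        seen.add(name)
        if name in renamed_to:
            name = renamed_to[name]
        elif name in parent_of:
            name = parent_of[name]
        else:
            break
    return name


# Hypothetical lookup tables for illustration.
parent_of = {"EXAMPLE LABS": "EXAMPLE PHARMA"}         # subsidiary -> parent
renamed_to = {"EXAMPLE PHARMA": "EXAMPLE HOLDINGS"}    # old name -> new name

print(resolve_firm(" Example Labs ", parent_of, renamed_to))
print(resolve_firm("Unrelated Inc", parent_of, renamed_to))
```

An assignee that appears in neither table is returned unchanged (upper-cased), so unmatched assignees remain visible rather than being silently dropped.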

Page 28: SEWP Research Conference        October 19, 2005

Step 3: Combining Inventor data with firm data and Patent Citation data

• Combine inventor file with firm-level data

• Patent-inventor-firm matched data

• Link to Hall, Jaffe, and Trajtenberg citation data (2001)

• 16,522,438 citation records for patents granted between 1975 and 1999.

Page 29: SEWP Research Conference        October 19, 2005

Descriptive Statistics

1975 - 2002

• 2,493,610 patents

• 2.05 inventors per patent

• 2,299,579 unique inventors

Page 30: SEWP Research Conference        October 19, 2005

Descriptive Statistics

                           Total       Pharmaceutical   Semiconductor
Inventors (a)              2,299,579   25,609           33,683
  Total no. of patents     2.22        2.8              2.60
  No. of patents per year  1.31        1.62             1.72
Degree holders (b)         122,168     3,399            3,941
  Total no. of patents     3.07        3.70             2.95
  No. of patents per year  1.52        1.84             1.91
(b/a)                      5.3%*       13.3%            11.7%

* Ph.D. or equivalent degree holders: 64,507 (3 percent)

Page 31: SEWP Research Conference        October 19, 2005

Number of Patents Granted by Year of Application

[Figure: number of patents granted, by year of application; vertical axis from 0 to 180,000]

* Grant lag: 97% of patents are granted within the first 4 years after the application date (Hall, Griliches, and Hausman 1986)

Page 32: SEWP Research Conference        October 19, 2005

International Knowledge Flow

1. Trends in U.S. firms’ access to the researchers with overseas research experience

2. Role of research personnel as a pathway for the diffusion of ideas from foreign to U.S.

3. The firm-level determinants of accessing innovations developed overseas.

Page 33: SEWP Research Conference        October 19, 2005

Inventors with Foreign Experience in US Domestic Patents

Columns: number of inventors, followed by the fraction of inventors by foreign-experience type (%): (A) current foreign residents; (B) current US residents with foreign experience †; (C) current US residents without foreign experience.

            Number of inventors                    (A)                    (B)                    (C)
Year        All  Pharma    Semi     All  Pharma   Semi     All  Pharma   Semi     All  Pharma   Semi
1985     42,368       -       -    8.15       -      -    0.99       -      -   90.86       -      -
1986     44,828       -       -    8.30       -      -    1.07       -      -   90.63       -      -
1987     48,810       -       -    8.21       -      -    1.13       -      -   90.66       -      -
1988     54,947       -       -    8.49       -      -    1.13       -      -   90.37       -      -
1989     59,164   2,143   1,139    8.60   14.47   9.04    1.17    2.01   1.14   90.23   83.53  89.82
1990     63,812   2,259   1,362    8.02   17.35   7.78    1.22    1.51   1.25   90.76   81.14  90.97
1991     67,657   3,332   2,791    7.76   19.09   6.02    1.26    1.23   1.22   90.98   79.68  92.76
1992     73,640   3,876   3,370    7.86   20.38   7.15    1.30    1.21   1.13   90.85   78.41  91.72
1993     80,428   4,505   4,190    8.06   25.88   7.06    1.21    1.31   1.03   90.73   72.81  91.91
1994     90,910   5,320   5,739    8.44   26.86  14.76    1.20    0.98   0.94   90.36   72.16  84.30
1995    104,775   6,629   7,450    8.78   28.87  15.18    1.13    0.87   0.86   90.08   70.25  83.96
1996    104,829   4,894   7,916    9.19   31.55  13.26    1.07    0.90   0.78   89.75   67.55  85.95
1997    119,556   6,093   9,993    9.11   29.71  15.31    1.01    0.75   0.80   89.87   69.54  83.89

† Resided in foreign countries in the previous 10 years
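The three foreign-experience types in the table can be sketched as a classification over an inventor's residence history. The 10-year look-back comes from the table footnote; everything else is an assumption made for illustration, including the input layout (a mapping from year to the country of residence recorded on that year's patents), whether the window boundaries are inclusive, and the function name.

```python
def classify(residences, year, window=10, home="US"):
    """Classify an inventor in `year` by current residence and recent foreign residence.

    residences: dict mapping a year to the country of residence observed in that year.
    Returns None when the inventor has no record in `year`.
    """
    current = residences.get(year)
    if current is None:
        return None
    if current != home:
        return "current foreign resident"
    # Look back over the previous `window` years (boundary handling is an assumption).
    recent = [country for y, country in residences.items() if year - window <= y < year]
    if any(country != home for country in recent):
        return "current US resident with foreign experience"
    return "current US resident without foreign experience"


# Hypothetical residence history: patents filed from Japan in 1990 and from the US in 1995.
history = {1990: "JP", 1995: "US"}
print(classify(history, 1990))
print(classify(history, 1995))
```

Because residence is only observed in years with a patent, foreign spells with no patenting activity would go undetected under this scheme.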

Page 34: SEWP Research Conference        October 19, 2005

Patent-Inventor Ratio by Foreign-Experience Type

[Figure: patents per inventor (vertical axis, 0.5 to 2.1) by year, 1985-1997, for three groups: current foreign residents; current US residents with foreign experience; current US residents without foreign experience]

Page 35: SEWP Research Conference        October 19, 2005

Variable Definition and Sample Statistics

Mean (standard deviation) by industry

Variable   Pharmaceutical    Semiconductor     Definition
CITE_FRGN  0.5505 (0.3319)   0.4760 (0.2850)   Fraction of citations to patents that are assigned to foreign assignees
FRGN_EXP   0.0734 (0.2609)   0.0290 (0.1677)   = 1 if at least one inventor is residing or used to reside in one of the foreign countries where foreign assignees of cited patents are located
INVENTOR   326.0 (195.7)     923.5 (728.6)     Number of all inventors in a patent assignee firm
EMPLOYEE   35,979 (21,833)   41,538 (52,501)   Number of employees in a patent assignee firm
R&D/INV    31.67 (24.51)     12.04 (27.34)     Real R&D expenditures in 1996 constant dollars over the number of inventors in a patent assignee firm
NSIC       3.791 (1.991)     3.154 (1.944)     Number of secondary SIC codes assigned to a patent assignee firm
MEXP       5.292 (1.582)     3.832 (1.067)     Median experience of all inventors in a patent assignee firm
FIRMAGE    77.40 (51.51)     36.17 (23.40)     Years elapsed since the founding year of a patent assignee firm

Page 36: SEWP Research Conference        October 19, 2005

Determinants of Citation to Foreign-Assigned Patents

                Pharmaceutical                Semiconductor
                (1)       (2)       (3)       (4)       (5)       (6)
FRGN_EXP        3.8950    3.3876    4.3832    5.8609    5.5730    6.4162
                (4.95)    (3.92)    (3.87)    (4.18)    (3.66)    (3.75)
Log INVENTOR              1.0813    1.1595              -1.1918   -1.1702
                          (1.10)    (1.19)              (-2.69)   (-2.64)
Log EMPLOYEE              0.2124    0.1885              0.3871    0.3550
                          (0.38)    (0.34)              (1.24)    (1.14)
Log R&D/INV               0.0557    0.0488              0.0658    0.0691
                          (0.66)    (0.59)              (1.14)    (1.18)
Log NSIC                  -0.2723   -0.4079             1.1469    1.1562
                          (-0.38)   (-0.57)             (1.57)    (1.56)
Log MEXP                  -6.5845   -6.4702             -6.8640   -6.8410
                          (-4.41)   (-4.40)             (-2.76)   (-2.66)
Log FIRMAGE               -1.0956   -1.1361             2.3439    2.3771
                          (-1.96)   (-2.06)             (2.88)    (2.83)
Observations    1430      1247      1215      4316      4186      4112
R2              0.0189    0.1462    0.1539    0.0283    0.1280    0.1306

Dependent variable = logit transform of CITE_FRGN

Note: Each regressor shows the estimated coefficient with the t statistic in parentheses below it. The result for a constant term is suppressed. The t statistic is based on the Huber-White sandwich estimator of variance.
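The dependent variable can be sketched in two steps: compute CITE_FRGN as the share of citations going to foreign-assigned patents, then apply the logit transform log(p / (1 - p)). The slides do not say at what level the fraction is computed or how values of exactly 0 or 1 (where the logit is undefined) were handled, so the sketch simply returns `None` in those cases; the function names and the country-code input are hypothetical.

```python
import math


def cite_frgn(cited_assignee_countries, home="US"):
    """Fraction of citations whose cited patent has a non-US assignee."""
    if not cited_assignee_countries:
        return None
    foreign = sum(country != home for country in cited_assignee_countries)
    return foreign / len(cited_assignee_countries)


def logit(p):
    """Logit transform log(p / (1 - p)); undefined at 0 and 1, so return None there."""
    if p is None or not 0.0 < p < 1.0:
        return None
    return math.log(p / (1.0 - p))


# Hypothetical list of assignee countries for the patents cited by one observation.
share = cite_frgn(["US", "JP", "DE", "US"])
print(share, logit(share))
```

The transform maps a fraction bounded between 0 and 1 onto the whole real line, which is why the coefficients in the table are not directly readable as changes in the citation share.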

Page 37: SEWP Research Conference        October 19, 2005

Conclusion

• In recent years, U.S. innovators have increasingly accessed researchers with foreign R&D experience.

• U.S. firms’ employment of foreign-residing researchers has increased.

• The fraction of research-active U.S. residents with foreign research experience appears to be falling.

• This may be an effort to capture geographically dispersed knowledge spillovers.

• Having researchers with research experience abroad seems to facilitate access to foreign-produced knowledge.

• In the semiconductor industry, smaller firms and older firms are more likely to make use of the output of non-U.S. R&D.

• In the pharmaceutical industry, younger firms are more likely to make use of the output of non-U.S. R&D.

Page 38: SEWP Research Conference        October 19, 2005

Future Extension

• The consequences of the mobility of R&D personnel on firm R&D.

• The impact of the arrival of a researcher with a particular set of R&D experiences on the character and quantity of R&D done by a firm

• The importance of inter-firm mobility for technological diffusion.

• How firms organize the R&D enterprise, the extent of collaboration among scientists geographically dispersed, and the extent of interaction among scientists with different backgrounds.