Scaling in the Geography of US Computer Science Rui Carvalho and Michael Batty
PHYSICS AND THE CITY. Carvalho and Batty: Scaling in the Geography of US Computer Science
Scaling in the Geography of US Computer Science
Rui Carvalho and Michael Batty, University College London
http://www.casa.ucl.ac.uk/
Thanks: Michael Gastner (SFI), Isaac Councill (PSU), Chris Brunsdon (Leicester), Ben Gimpert (UCL)
Motivation
• Why Geography?
– Scientists: who can I collaborate with in my city/country?
– Funding agencies: where are new research centres emerging? Is the regional distribution of funds optimal?
– Scientometrics: distinguishing between J. Smith (PSU) and J. Smith (UCL);
• Preprint server challenges:
– [USA] NIH-funded investigators are requested to submit their papers to PubMed Central within 12 months of publication (effective May 2, 2005);
– [UK] Wellcome Trust-funded papers will in future have to be placed in a central public archive within six months of publication;
• Data mining challenges:
– Processing large databases promises to uncover knowledge hidden in the mass of available data;
– It can dramatically speed up achievements formerly reached solely by human effort and provide new results that humans could not have reached unaided;
• Statistical challenges:
– Conventional wisdom holds that (geographical) spatial point processes have characteristic scales...
– Yet “real world” phenomena are often far from equilibrium.
PNAS, 6 April 2004
Plan
• Open Archives Datasets:
– Citeseer (Computer Science);
– arXiv.org (mainly Physics, but also Maths and CS);
• Geographical Datasets:
– The US Census Bureau makes geocoding datasets available on the web, but Europe lacks a unified ‘open-access’ database;
• Plan:
– Extract ZIP codes from authors’ addresses;
– Map research centres geographically;
• Questions about the research centres:
– How productive are they?
– Are there non-trivial spatial structures at a geographical scale?
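The first step of the plan, extracting ZIP codes from free-text author addresses, could be sketched with a regular expression. This is a minimal illustration, not the authors' parser: the pattern `ZIP_RE` and the function `extract_zip_codes` are hypothetical, and they assume a ZIP code is written as a two-letter state abbreviation followed by five digits (optionally ZIP+4).

```python
import re

# Hypothetical pattern: a two-letter state abbreviation, whitespace, then a
# 5-digit ZIP code with an optional "+4" suffix, e.g. "Pittsburgh, PA 15213-3890".
ZIP_RE = re.compile(r"\b([A-Z]{2})\s+(\d{5})(?:-\d{4})?\b")

def extract_zip_codes(address):
    """Return the 5-digit ZIP codes found in a free-text author address."""
    return [m.group(2) for m in ZIP_RE.finditer(address)]

print(extract_zip_codes("Carnegie Mellon University, Pittsburgh, PA 15213"))
print(extract_zip_codes("MIT, Cambridge, MA 02139-4307; Stanford, CA 94305"))
```

Requiring the state abbreviation in front of the digits trades recall for precision: bare five-digit numbers (street numbers, report numbers) are ignored, and non-US postcodes such as "WC1E 6BT" never match.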
Can Statistical Physics Help?
What is Citeseer?
• Founded by Steve Lawrence and C. Lee Giles in 1997 (NEC);
• Now at Penn State: http://citeseer.ist.psu.edu/
• Archive of computer science research papers harvested from the web and submitted by users;
• Currently (Dec 2005) contains over 730,000 documents;
• Citeseer was developed as a model for Autonomous Citation Indexing, i.e. citation indexes are created automatically;
• Can search content in PostScript and PDF files.
Data Collecting and Parsing
• Citeseer metadata:
– 525,055 computer science research papers;
– 399,757 (76.14%) of which are unique;
– 103,172 (25.81%) of the unique papers have one or more US authors;
– 2,975 different ZIP codes in the unique papers belong to the US conterminous states (the 48 contiguous states plus the District of Columbia);
• 5 most productive ZIP codes:
1. Count: 3950, ZIP: 15213, Carnegie Mellon Univ, Pittsburgh, PA;
2. Count: 3403, ZIP: 02139, MIT, Cambridge, MA;
3. Count: 2954, ZIP: 94305, Stanford Univ, CA;
4. Count: 2691, ZIP: 94720, Univ California at Berkeley, CA;
5. Count: 2309, ZIP: 61801, Univ Illinois at Urbana-Champaign, IL
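The percentages above follow directly from the raw counts, and a ranking like the top-five list can be produced with a frequency count. The slide does not say how a paper is attributed to ZIP codes, so the sketch below assumes (hypothetically) that each unique paper contributes one count to every distinct ZIP code among its US authors; `share`, `top_zip_codes` and the toy input are illustrative names and data only.

```python
from collections import Counter

def share(part, whole):
    """Percentage of `whole` represented by `part`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)

# Figures quoted on the slide.
total_papers, unique_papers, us_papers = 525_055, 399_757, 103_172
print(share(unique_papers, total_papers))  # 76.14
print(share(us_papers, unique_papers))     # 25.81

def top_zip_codes(paper_zips, k=5):
    """Rank ZIP codes by the number of papers listing them.

    `paper_zips` (hypothetical format) holds one set of ZIP codes per unique paper,
    so a paper with several co-authors at the same ZIP code counts once for it.
    """
    counts = Counter(z for zips in paper_zips for z in zips)
    return counts.most_common(k)

toy = [{"15213"}, {"15213", "02139"}, {"02139", "94305"}, {"15213"}]
print(top_zip_codes(toy, k=2))  # [('15213', 3), ('02139', 2)]
```

Counting papers per ZIP code rather than author affiliations is a design choice: it avoids inflating large collaborations, but an author-level count would rank centres differently.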
Q1: How productive are the research centres?
Q2: Non-trivial spatial structures?
The Geography of Citeseer
Cartograms
Michael T. Gastner and M. E. J. Newman, "Diffusion-based method for producing density-equalizing maps", Proc. Natl. Acad. Sci. USA, 101, 7499-7504 (2004).
Michael T. Gastner and M. E. J. Newman, "Density-equalizing map projections: Diffusion-based algorithm and applications", Geocomputation 2005 (to appear).
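For reference, the diffusion-based method cited above can be summarised by three equations, as we understand Gastner and Newman's PNAS paper. The density ρ (here, presumably, papers per unit area) flows down its own gradient by linear diffusion, each point of the map is carried along by the velocity v of that flow, and as t → ∞ the density becomes uniform, so the accumulated displacements define the density-equalizing map:

```latex
\begin{aligned}
\mathbf{J} &= -\nabla\rho = \mathbf{v}\,\rho
  &&\Rightarrow\quad \mathbf{v}(\mathbf{r},t) = -\frac{\nabla\rho}{\rho},\\
\frac{\partial\rho}{\partial t} &= -\nabla\cdot\mathbf{J} = \nabla^{2}\rho,\\
\mathbf{r}(t) &= \mathbf{r}(0) + \int_{0}^{t}\mathbf{v}\bigl(\mathbf{r}(t'),t'\bigr)\,dt'.
\end{aligned}
```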
Spatial Point Processes
• Moments:
– First moment: ρ, the expected number of points per unit area;
– Second moment: Ripley’s K function. ρK(r) is the expected number of other points within distance r of a typical point.
• For a Poisson process, $K(r) = \pi r^2$;
• But neither the first nor the second moment conveys how the spatial distribution changes within an area.
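Both moments can be estimated from a point pattern and checked against the Poisson result K(r) = πr². The sketch below is a simplification and not the method used in the talk: to sidestep edge effects it measures distances on a torus (opposite sides of the unit square are glued together), whereas the slides correct for edges with Ripley’s weights. The function name `ripley_k_torus` and the intensity of 1000 points per unit area are illustrative assumptions.

```python
import numpy as np

def ripley_k_torus(points, r, side=1.0):
    """Estimate Ripley's K(r) for points in a side x side square, using wrap-around
    (toroidal) distances so that no edge correction is needed (valid for r < side/2)."""
    n = len(points)
    diff = np.abs(points[:, None, :] - points[None, :, :])
    diff = np.minimum(diff, side - diff)          # shortest distance on the torus
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)                # a point is not its own neighbour
    pairs = np.count_nonzero(dist <= r)           # ordered pairs (i, j) with i != j
    area = side * side
    # rho*K(r) = mean number of other points within r of a point, with rho ~ (n-1)/area.
    return area * pairs / (n * (n - 1))

rng = np.random.default_rng(1)
n = rng.poisson(1000)                             # Poisson number of points, intensity 1000
pts = rng.uniform(0.0, 1.0, size=(n, 2))          # uniform positions in the unit square
rho_hat = n / 1.0                                 # first moment: points per unit area
r = 0.1
k_hat = ripley_k_torus(pts, r)
print(rho_hat, k_hat, np.pi * r ** 2)             # k_hat should be close to pi * r**2
```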
The Two-Point Correlation Function
• The two-point correlation function g(r) describes the probability of finding a point in volume dV(x1) and another point in dV(x2), at distance r = |x1 - x2|:
$\rho_2(x_1, x_2)\, dV(x_1)\, dV(x_2) = \rho^2\, g(r)\, dV(x_1)\, dV(x_2)$
• For a Poisson process, g(r) = 1;
• Edge corrections (Ripley’s weights): take a circle centred on point x passing through another point y. If the circle lies entirely within the domain D, y is counted once. If only a proportion p(x,y) of the circle’s circumference lies within D, y is counted as 1/p(x,y) points.
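The Ripley-weighted estimate of g(r) can be sketched by counting pairs whose separation falls in a thin annulus [r, r + dr), weighting each pair by 1/p(x,y), and normalising by ρ² times the domain area times the annulus area. This is a minimal sketch under stated assumptions: the domain D is taken to be the unit square (not the US outline), and p(x,y) is approximated numerically by sampling the circle at `n_angles` points, which also copes with circles that cross an irregular border several times. The names `ripley_weight` and `pair_correlation` are hypothetical.

```python
import numpy as np

def ripley_weight(x, r, n_angles=360):
    """Approximate p(x, y): the fraction of the circumference of the circle of radius r
    centred at x that lies inside the domain D (assumed here to be the unit square)."""
    theta = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    px = x[0] + r * np.cos(theta)
    py = x[1] + r * np.sin(theta)
    inside = (px >= 0.0) & (px <= 1.0) & (py >= 0.0) & (py <= 1.0)
    return inside.mean()

def pair_correlation(points, r, dr):
    """Edge-corrected estimate of g at separations in [r, r + dr) in the unit square."""
    n = len(points)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    i_idx, j_idx = np.nonzero((dist >= r) & (dist < r + dr))  # r > 0 excludes i == j
    # Each ordered pair (x, y) counts as 1/p(x, y) points (Ripley's weights).
    weighted = sum(1.0 / ripley_weight(points[i], dist[i, j]) for i, j in zip(i_idx, j_idx))
    shell = np.pi * ((r + dr) ** 2 - r ** 2)     # area of the annulus
    # Domain area is 1, so rho^2 * area is estimated by n * (n - 1).
    return weighted / (n * (n - 1) * shell)

rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 1.0, size=(800, 2))       # uniform points, so g(r) should be near 1
print(pair_correlation(pts, r=0.1, dr=0.02))
```

Without the weights, pairs near the boundary would be under-counted and g(r) would drift below 1 as r grows; with them, a uniform pattern stays close to the Poisson value g(r) = 1.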
Computation of the Two-Point Correlation Function
Intersection with border gives more than one polygon
Geographical range at which the two-point correlation function can be approximated by a power-law
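One common way to quantify a power-law range is a straight-line fit of log g against log r, restricted to the range of interest. The sketch below assumes g(r) ≈ A r^(-α); the bounds `r_min` and `r_max` are hypothetical inputs (the slide marks the range graphically), and the data are synthetic, not results from the talk.

```python
import numpy as np

def fit_power_law(r, g, r_min, r_max):
    """Fit g(r) ~ A * r**(-alpha) by least squares on log-log axes, using only
    r_min <= r <= r_max (and g > 0, since log(0) is undefined). Returns (A, alpha)."""
    r, g = np.asarray(r, dtype=float), np.asarray(g, dtype=float)
    mask = (r >= r_min) & (r <= r_max) & (g > 0)
    slope, intercept = np.polyfit(np.log(r[mask]), np.log(g[mask]), 1)
    return np.exp(intercept), -slope

# Synthetic check (not data from the talk): an exact power law is recovered.
r = np.logspace(0, 3, 30)
g = 5.0 * r ** -0.8
print(fit_power_law(r, g, r_min=10.0, r_max=500.0))  # approximately (5.0, 0.8)
```

A least-squares fit on log-log axes is simple but weights every bin equally; with noisy pair counts, the chosen range usually matters more than the fitting method.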
Two-Point Correlation Function
Speculation: knowledge diffusion?
Speculation: Universality?
To find out more
• http://www.casa.ucl.ac.uk/
• Spatially Embedded Complex Systems Engineering (SECSE): http://www.secse.net/ (members: UCL, Leeds, Southampton, Sussex)
Plot of state R&D expenditure (NSF) vs population
Poisson Point Process
• We say that a spatial process is completely random iff:
– The number of events N(A) in any planar region A with area |A| follows a Poisson distribution with mean λ|A|, where λ is the density of points;
– For any two disjoint regions A and B, the random variables N(A) and N(B) are independent.
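Both defining properties can be checked by simulation: draw a Poisson number of points with mean λ|W| for a window W, scatter them uniformly, and look at the counts in two disjoint regions across many realisations. The sketch assumes the unit square as W, λ = 200, and two quarter-area regions A and B; these values and the function names `simulate_csr` and `count_in` are illustrative. The mean and the variance of N(A) should both be close to λ|A| = 50, and N(A) and N(B) should be uncorrelated.

```python
import numpy as np

def simulate_csr(lam, rng, width=1.0, height=1.0):
    """One realisation of a homogeneous Poisson process on a width x height rectangle."""
    n = rng.poisson(lam * width * height)
    return rng.uniform([0.0, 0.0], [width, height], size=(n, 2))

def count_in(points, x0, x1, y0, y1):
    """Number of points N(A) in the rectangle A = [x0, x1) x [y0, y1)."""
    inside = ((points[:, 0] >= x0) & (points[:, 0] < x1)
              & (points[:, 1] >= y0) & (points[:, 1] < y1))
    return int(np.count_nonzero(inside))

rng = np.random.default_rng(42)
lam = 200.0
counts_a, counts_b = [], []
for _ in range(2000):
    pts = simulate_csr(lam, rng)
    counts_a.append(count_in(pts, 0.0, 0.5, 0.0, 0.5))  # region A, |A| = 0.25
    counts_b.append(count_in(pts, 0.5, 1.0, 0.0, 0.5))  # region B, disjoint from A
counts_a, counts_b = np.array(counts_a), np.array(counts_b)
print(counts_a.mean(), counts_a.var())        # both close to lam * |A| = 50
print(np.corrcoef(counts_a, counts_b)[0, 1])  # close to 0
```

Equal mean and variance is the signature of the Poisson distribution; a clustered pattern would show a variance well above the mean.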