Mapping Examples

28
Math CS Physics Chemistry Biology Engineering Medical Brain/Neuro Psych Earth Sciences Social Sciences Kevin W. Boyack Mapping Examples Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under Contract DE-AC04-94AL85000.

description

Math. Social Sciences. CS. Engineering. Earth Sciences. Physics. Psych. Chemistry. Brain/Neuro. Biology. Medical. Mapping Examples. Kevin W. Boyack. - PowerPoint PPT Presentation

Transcript of Mapping Examples

Page 1: Mapping Examples

Math

CS

Physics

Chemistry

Biology

Engineering

Medical

Brain/Neuro

Psych

EarthSciences

SocialSciences

Kevin W. Boyack

Mapping Examples

Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under Contract DE-AC04-94AL85000.

Page 2: Mapping Examples

2

Macromodel: “Best” Map

Each dot is one journal

Journals group by discipline

Labeled by hand

Comp Sci

PolySciLaw

LIS

Geogr

Hist

Econ

Sociol

Nursing

Educ

Comm

Psychol

Geront

Neurol

RadiolSport Sci

Oper Res

Math

Robot

AIStat

Psychol

Anthrop

Elect Eng

Physics

Mech Eng

ConstrMatSci

FuelsElectChemP Chem

Chemistry

AnalytChem

Astro

Env

Pharma

Neuro Sci

Chem Eng

Polymer

GeoSci

GeoSci

Paleo

Meteorol

EnvMarine

Social Sci

SoilPlant

Ecol

Agric

Earth Sciences

Psychol

OtoRh

HealthCare

BiomedRehab

Gen Med

GenetCardio

Ped

Food Sci

Zool

EntoVet Med

Parasit

Ophth

Dairy

Endocr

Ob/Gyn

Virol

Hemat

Oncol

Immun

BioChem

Nutr

Endocr

Urol

Dentist

Derm

Pathol

Gastro

Surg

Medicine

ApplMath

Aerosp

CondMatNuc

EmergMed Gen/Org

Boyack, K.W., Klavans, R., & Börner, K. (2005, in press). Mapping the backbone of science. Scientometrics.

Page 4: Mapping Examples

4

Macromodel: Structural Map

Clusters of journals denote disciplines

Lines denote strongest relationships between disciplines

Enables disciplinary diffusion studies

Enables comparison of institutions by disciplineBoyack, K.W., Klavans, R., & Börner, K. (2005, in press). Mapping the backbone of science. Scientometrics.

Page 5: Mapping Examples

5

Paper-level Map

Page 6: Mapping Examples

6

Disciplinary S&T Model - Details

Uses combined SCIE/ SSCI/ISI Proceedings from 2003– 7445 journals, 1198

proceedings– journals, proceedings

treated equivalently– Bib coupling of

journals

Initial ordination and clustering of journals gave 852 clusters

Coupling counts reaggregated at the journal cluster level; ordination of journal clusters– (x,y) positions for

each journal cluster, journal

Math

CS

Physics

Chemistry

Biology

Engineering

Medical

Brain/Neuro

Psych

EarthSciences

SocialSciences

Page 7: Mapping Examples

7

Research Community Model - Details

Uses combined SCIE/ SSCI/ISI Proceedings from 2003– 997,775 papers

from 8643 sources– Bib coupling of

papers

Initial ordination and clustering of journals gave 117,433 clusters

Cluster positions calculated using journal positions from the disciplinary map

Math

CS

Physics

Chemistry

Biology

Engineering

Medical

Brain/Neuro

Psych

EarthSciences

SocialSciences

Page 8: Mapping Examples

8

Page 9: Mapping Examples

9

Page 10: Mapping Examples

10

Accuracy

Page 11: Mapping Examples

11

Journal-level: Local Accuracy

For each similarity measure, journal pairs were assigned a 1/0 binary score if they were IN/OUT of the same ISI category

Accuracy vs. coverage curves were generated for each similarity measure

For each similarity measure, distances (in the VxOrd layouts) between journal pairs were calculated

Accuracy vs. coverage curves were generated for each re-estimated (distance) similarity measure

Results after running through VxOrd were more accurate than the raw measures

Inter-citation measures are best

Coverage (fraction of journals)

0.0 0.2 0.4 0.6 0.8 1.0

Acc

ura

cy (

cum

ula

tive

% y

es)

0.40

0.60

0.80

1.00FIG5_DAT.PDW

IC-CosineIC-K50IC-JaccardIC-RFavgIC-PearsonCC-CosineCC-K50CC-Pearson

0.90 0.95 1.000.70

0.75

0.80

0.85

Coverage (fraction of journals)

0.0 0.2 0.4 0.6 0.8 1.0

Acc

ura

cy (

cum

ula

tive

% y

es)

0.60

0.70

0.80

0.90

FIG5_DAT.PDW

IC-CosineIC-K50IC-JaccardIC-RFavgIC-PearsonCC-CosineCC-K50CC-Pearson

Similarity measures

After VxOrd

Klavans, R., & Boyack, K.W. (in press). Identifying a better measure of relatedness for mapping science. Journal of the American Society for Information Science and Technology.

Page 12: Mapping Examples

12

Journal-level: Regional Accuracy

For each similarity measure, the VxOrd layout was subjected to k-means clustering using different numbers of clusters

Resulting cluster/category memberships were compared to actual category memberships using entropy/mutual information method

Increasing Z-score indicates increasing distance from a random solution

Most similarity measures are within several percent of each other

Number of k-means clusters

100 150 200 250Z

-sco

re

280

300

320

340

360

380

400

IC RawIC CosineIC JaccardIC PearsonIC RFavgCC RawCC K50CC Pearson

Boyack, K.W., Klavans, R., & Börner, K., (submitted). Mapping the backbone of science. Scientometrics.

Page 13: Mapping Examples

13

Paper-level: Local Accuracy

Two maps (current and reference), two measures (raw and modified cosine), two aggregation levels

For each similarity measure, paper pairs, ordered by distance on the map, were assigned a 1/0 binary score if they were IN/OUT of the same ISI category

Accuracy vs. coverage curves were generated for each similarity measure

K50 measures have high accuracy at high coverage

Klavans, R., & Boyack, K.W. (under review). Quantitative evaluation of large maps of science. Scientometrics.

Local Accuracy (Current Papers)

K50 (aggregated) RawFreq (aggregated)K50 (paper) RawFreq (paper)

60%

Lo

cal A

ccu

racy

70%

60

%

Lo

cal A

ccu

racy

(%

yes)

70%

% Coverage

Local Accuracy (Reference Papers)

K50 (aggregated) RawFreq (aggregated)K50 (paper) RawFreq (paper)

% Coverage

60

%

Lo

cal A

ccu

racy

(%

yes)

70%

Page 14: Mapping Examples

14

Disciplinary Bias (Reference Papers)

Dis

cip

lina

ry B

ias Small[1999]

RawFreq (paper)RawFreq (aggregated)K50 (paper)K50 (aggregated)

Disciplinary Bias (Current Papers)

Dis

cip

lina

ry B

ias

RawFreq (paper)RawFreq (aggregated)K50 (paper)K50 (aggregated)

Paper-level: Disciplinary Bias

Two maps (current and reference), two measures (raw and modified cosine), two aggregation levels

Disciplinary bias measures effect of thresholding with coverage

K50 measures have lowest bias

Klavans, R., & Boyack, K.W. (under review). Quantitative evaluation of large maps of science. Scientometrics.

Page 15: Mapping Examples

15

Cluster Size (Reference Papers)

RawFreq (paper)K50 (paper)RawFreq (aggregated)K50 (aggregated)

Log (Rank Size)

Lo

g (S

ize

of

Clu

ste

r)

Cluster Size (Current Papers)

Lo

g (S

ize

of

Clu

ste

r)

Log (Rank Size)

K50 (paper)RawFreq (paper)RawFreq (aggregated)K50 (aggregated)

Paper-level: Cluster Distributions

Two maps (current and reference), two measures (raw and modified cosine), two aggregation levels

Cluster size distributions – smaller clusters are usually better – chaining can create communities that are too large

K50 measures have fewer large clusters

Klavans, R., & Boyack, K.W. (under review). Quantitative evaluation of large maps of science. Scientometrics.

Page 16: Mapping Examples

16

Paper-level: Text Analysis

Cluster size1 10 100

(1 -

CO

H)

0.0

0.1

0.2

0.3

0.4

0.5

0.6

Abstr-LgAbstr-SmTitle-LgRand-A-LgRand-A-SmRand-T-Lg

Cluster size1 10 100

(1 -

CO

H)

0.0

0.1

0.2

0.3

0.4

0.5

0.6

Abstr-LgRand-A-Lg

pdf at cluster size 100.00 0.05 0.10 0.15 0.20

Current map, K50 similarity

Multidocument summarization– cluster cohesiveness

calculated using abstracts for all clusters

– compared to values from random clusters

98.3% of the actual clusters of size 10 are more cohesive than random at p<0.0001 each

Page 17: Mapping Examples

17

Uses

Page 18: Mapping Examples

18

Worldwide S&T

Circle size – number of topics

Percent conference papers0-25%25-50%50-75%75-100%

Math

CS

Physics

Chemistry

Biology

Engineering

Medical

Brain/Neuro

Psych

EarthSciences

SocialSciences

Page 19: Mapping Examples

19

Math

CS

Physics

Chemistry

Biology

Engineering

Medical

Brain/Neuro

Psych

EarthSciences

SocialSciences

Circle size – number of topics

Vitality (use of newer ideas)>10% more vital than world0-10% more vital than world0-10% less vital than world>10% less vital than world

DOE Profile

Page 20: Mapping Examples

20

Math

CS

Physics

Chemistry

Biology

Engineering

Medical

Brain/Neuro

Psych

EarthSciences

SocialSciences

Circle size – number of topics

Vitality (use of newer ideas)>10% more vital than world0-10% more vital than world0-10% less vital than world>10% less vital than world

Sandia Profile

Page 21: Mapping Examples

21

Math

CS

Physics

Chemistry

Biology

Engineering

Medical

Brain/Neuro

Psych

EarthSciences

SocialSciences

Circle size – number of topics

Vitality (use of newer ideas)>10% more vital than world0-10% more vital than world0-10% less vital than world>10% less vital than world

A Specific University Profile

Page 22: Mapping Examples

22

Math

CS

Physics

Chemistry

Biology

Engineering

Medical

Brain/Neuro

Psych

EarthSciences

SocialSciences

Circle size – number of topics

Vitality (use of newer ideas)>10% more vital than world0-10% more vital than world0-10% less vital than world>10% less vital than world

Potential SNL/Univ Collaborations

Page 23: Mapping Examples

23

Identifying Opportunities/Threats

Policy

Economics

Statistics

Math

CompSci

Physics

Biology

GeoScience

Microbiology

BioChem

Brain

PsychiatryEnvironment

Vision

Virology Infectious Diseases

Cancer

MRI

Bio-Materials

Law

Plant

Animal

Phys-Chem

Chemistry

Psychology

Education

Computer Tech

GI

(36 Research Communities that will impact GI Research…that GI Researchers are least likely to be aware of)

Page 24: Mapping Examples

24

Identifying Core Competency

Policy

Economics

Statistics

Math

CompSci

Physics

Biology

GeoScience

Microbiology

BioChem

Brain

PsychiatryEnvironment

Vision

Virology Infectious Diseases

Cancer

MRI

Bio-Materials

Law

Plant

Animal

Phys-Chem

Chemistry

Psychology

Education

Computer Tech

GI

Funding patterns of the National Science Foundation (NSF)

Page 25: Mapping Examples

25

Identifying Core Competency

Policy

Economics

Statistics

Math

CompSci

Physics

Biology

GeoScience

Microbiology

BioChem

Brain

PsychiatryEnvironment

Vision

Virology Infectious Diseases

Cancer

MRI

Bio-Materials

Law

Plant

Animal

Phys-Chem

Chemistry

Psychology

Education

Computer Tech

GI

Funding patterns of the National Institutes of Health (NIH)

Page 26: Mapping Examples

26

Identifying Core Competency

Policy

Economics

Statistics

Math

CompSci

Physics

Biology

GeoScience

Microbiology

BioChem

Brain

PsychiatryEnvironment

Vision

Virology Infectious Diseases

Cancer

MRI

Bio-Materials

Law

Plant

Animal

Phys-Chem

Chemistry

Psychology

Education

Computer Tech

GI

Funding patterns of the US Department of Energy (DOE)

Page 27: Mapping Examples

27

Potential Uses of Science Maps

Overlays– Topic distribution– Opportunity/threat assessment– Core competency identification– Funding (amount) patterns– Impact patterns

Relationships– Interdisciplinary– Community level between disciplines

Page 28: Mapping Examples

28

Related Publications

– Boyack, K. W., Klavans, R. & Börner, K. (2005, in press). Mapping the backbone of science. Scientometrics.

– Klavans, R. & Boyack, K. W. (2005, in press). Identifying a better measure of relatedness for mapping science. Journal of the American Society for Information Science and Technology.

– Boyack, K. W., & Rahal, N. (2005, in press). Evaluation of LDRD investment areas at Sandia. Technological Forecasting and Social Change.

– Boyack, K. W., Mane, K. & Börner, K. (2004). Mapping Medline papers, genes and proteins related to melanoma research. IEEE Information Visualization 2004, 965-971.

– Boyack, K. W. (2004). Mapping knowledge domains: Characterizing PNAS. Proceedings of the National Academy of Sciences 101(S1), 5192-5199.

– Boyack, K. W., & Börner, K. (2003). Indicator-assisted evaluation and funding of research: Visualizing the influence of grants on the number and quality of research papers. Journal of the American Society for Information Science and Technology 54(5), 447.

– Börner, K., Chen, C., & Boyack, K. W. (2003). Visualizing knowledge domains. Annual Review of Information Science and Technology 37, 179-255.

– Werner-Washburne, M., Wylie, B., Boyack, K., Fuge, E., Galbraith, J., Fleharty, M., Weber, J., & Davidson, G.S. (2002). Concurrent analysis of multiple genome-scale datasets. Genome Research 12(10), 1564-1573.

– Boyack, K. W., Wylie, B. N., & Davidson, G. S. (2002). Information visualization, human-computer interaction, and cognitive psychology: Domain visualizations. Lecture Notes in Computer Science 2539, 145-160.

– Boyack, K. W., Wylie, B. N., & Davidson, G. S. (2002). Domain visualization using VxInsight for science and technology management. Journal of the American Society for Information Science and Technology, 53(9), 764-774.

– Davidson, G. S., Wylie, B. N., & Boyack, K. W. (2001). Cluster stability and the use of noise in interpretation of clustering. Proc. IEEE Information Visualization 2001, 23-30.

– Boyack, K.W., Wylie, B.N., Davidson, G.S. & Johnson, D.K., Analysis of patent databases using VxInsight. Presented at New Paradigms in Information Visualization and Manipulation 2000, McLean, VA, Nov. 10, 2000.