Mapping Examples
-
Upload
rashad-morin -
Category
Documents
-
view
27 -
download
0
description
Transcript of Mapping Examples
Math
CS
Physics
Chemistry
Biology
Engineering
Medical
Brain/Neuro
Psych
EarthSciences
SocialSciences
Kevin W. Boyack
Mapping Examples
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under Contract DE-AC04-94AL85000.
2
Macromodel: “Best” Map
Each dot is one journal
Journals group by discipline
Labeled by hand
Comp Sci
PolySciLaw
LIS
Geogr
Hist
Econ
Sociol
Nursing
Educ
Comm
Psychol
Geront
Neurol
RadiolSport Sci
Oper Res
Math
Robot
AIStat
Psychol
Anthrop
Elect Eng
Physics
Mech Eng
ConstrMatSci
FuelsElectChemP Chem
Chemistry
AnalytChem
Astro
Env
Pharma
Neuro Sci
Chem Eng
Polymer
GeoSci
GeoSci
Paleo
Meteorol
EnvMarine
Social Sci
SoilPlant
Ecol
Agric
Earth Sciences
Psychol
OtoRh
HealthCare
BiomedRehab
Gen Med
GenetCardio
Ped
Food Sci
Zool
EntoVet Med
Parasit
Ophth
Dairy
Endocr
Ob/Gyn
Virol
Hemat
Oncol
Immun
BioChem
Nutr
Endocr
Urol
Dentist
Derm
Pathol
Gastro
Surg
Medicine
ApplMath
Aerosp
CondMatNuc
EmergMed Gen/Org
Boyack, K.W., Klavans, R., & Börner, K. (2005, in press). Mapping the backbone of science. Scientometrics.
3
Modified from Boyack et al. by Ian Aliman, IU
4
Macromodel: Structural Map
Clusters of journals denote disciplines
Lines denote strongest relationships between disciplines
Enables disciplinary diffusion studies
Enables comparison of institutions by disciplineBoyack, K.W., Klavans, R., & Börner, K. (2005, in press). Mapping the backbone of science. Scientometrics.
5
Paper-level Map
6
Disciplinary S&T Model - Details
Uses combined SCIE/ SSCI/ISI Proceedings from 2003– 7445 journals, 1198
proceedings– journals, proceedings
treated equivalently– Bib coupling of
journals
Initial ordination and clustering of journals gave 852 clusters
Coupling counts reaggregated at the journal cluster level; ordination of journal clusters– (x,y) positions for
each journal cluster, journal
Math
CS
Physics
Chemistry
Biology
Engineering
Medical
Brain/Neuro
Psych
EarthSciences
SocialSciences
7
Research Community Model - Details
Uses combined SCIE/ SSCI/ISI Proceedings from 2003– 997,775 papers
from 8643 sources– Bib coupling of
papers
Initial ordination and clustering of journals gave 117,433 clusters
Cluster positions calculated using journal positions from the disciplinary map
Math
CS
Physics
Chemistry
Biology
Engineering
Medical
Brain/Neuro
Psych
EarthSciences
SocialSciences
8
9
10
Accuracy
11
Journal-level: Local Accuracy
For each similarity measure, journal pairs were assigned a 1/0 binary score if they were IN/OUT of the same ISI category
Accuracy vs. coverage curves were generated for each similarity measure
For each similarity measure, distances (in the VxOrd layouts) between journal pairs were calculated
Accuracy vs. coverage curves were generated for each re-estimated (distance) similarity measure
Results after running through VxOrd were more accurate than the raw measures
Inter-citation measures are best
Coverage (fraction of journals)
0.0 0.2 0.4 0.6 0.8 1.0
Acc
ura
cy (
cum
ula
tive
% y
es)
0.40
0.60
0.80
1.00FIG5_DAT.PDW
IC-CosineIC-K50IC-JaccardIC-RFavgIC-PearsonCC-CosineCC-K50CC-Pearson
0.90 0.95 1.000.70
0.75
0.80
0.85
Coverage (fraction of journals)
0.0 0.2 0.4 0.6 0.8 1.0
Acc
ura
cy (
cum
ula
tive
% y
es)
0.60
0.70
0.80
0.90
FIG5_DAT.PDW
IC-CosineIC-K50IC-JaccardIC-RFavgIC-PearsonCC-CosineCC-K50CC-Pearson
Similarity measures
After VxOrd
Klavans, R., & Boyack, K.W. (in press). Identifying a better measure of relatedness for mapping science. Journal of the American Society for Information Science and Technology.
12
Journal-level: Regional Accuracy
For each similarity measure, the VxOrd layout was subjected to k-means clustering using different numbers of clusters
Resulting cluster/category memberships were compared to actual category memberships using entropy/mutual information method
Increasing Z-score indicates increasing distance from a random solution
Most similarity measures are within several percent of each other
Number of k-means clusters
100 150 200 250Z
-sco
re
280
300
320
340
360
380
400
IC RawIC CosineIC JaccardIC PearsonIC RFavgCC RawCC K50CC Pearson
Boyack, K.W., Klavans, R., & Börner, K., (submitted). Mapping the backbone of science. Scientometrics.
13
Paper-level: Local Accuracy
Two maps (current and reference), two measures (raw and modified cosine), two aggregation levels
For each similarity measure, paper pairs, ordered by distance on the map, were assigned a 1/0 binary score if they were IN/OUT of the same ISI category
Accuracy vs. coverage curves were generated for each similarity measure
K50 measures have high accuracy at high coverage
Klavans, R., & Boyack, K.W. (under review). Quantitative evaluation of large maps of science. Scientometrics.
Local Accuracy (Current Papers)
K50 (aggregated) RawFreq (aggregated)K50 (paper) RawFreq (paper)
60%
Lo
cal A
ccu
racy
70%
60
%
Lo
cal A
ccu
racy
(%
yes)
70%
% Coverage
Local Accuracy (Reference Papers)
K50 (aggregated) RawFreq (aggregated)K50 (paper) RawFreq (paper)
% Coverage
60
%
Lo
cal A
ccu
racy
(%
yes)
70%
14
Disciplinary Bias (Reference Papers)
Dis
cip
lina
ry B
ias Small[1999]
RawFreq (paper)RawFreq (aggregated)K50 (paper)K50 (aggregated)
Disciplinary Bias (Current Papers)
Dis
cip
lina
ry B
ias
RawFreq (paper)RawFreq (aggregated)K50 (paper)K50 (aggregated)
Paper-level: Disciplinary Bias
Two maps (current and reference), two measures (raw and modified cosine), two aggregation levels
Disciplinary bias measures effect of thresholding with coverage
K50 measures have lowest bias
Klavans, R., & Boyack, K.W. (under review). Quantitative evaluation of large maps of science. Scientometrics.
15
Cluster Size (Reference Papers)
RawFreq (paper)K50 (paper)RawFreq (aggregated)K50 (aggregated)
Log (Rank Size)
Lo
g (S
ize
of
Clu
ste
r)
Cluster Size (Current Papers)
Lo
g (S
ize
of
Clu
ste
r)
Log (Rank Size)
K50 (paper)RawFreq (paper)RawFreq (aggregated)K50 (aggregated)
Paper-level: Cluster Distributions
Two maps (current and reference), two measures (raw and modified cosine), two aggregation levels
Cluster size distributions – smaller clusters are usually better – chaining can create communities that are too large
K50 measures have fewer large clusters
Klavans, R., & Boyack, K.W. (under review). Quantitative evaluation of large maps of science. Scientometrics.
16
Paper-level: Text Analysis
Cluster size1 10 100
(1 -
CO
H)
0.0
0.1
0.2
0.3
0.4
0.5
0.6
Abstr-LgAbstr-SmTitle-LgRand-A-LgRand-A-SmRand-T-Lg
Cluster size1 10 100
(1 -
CO
H)
0.0
0.1
0.2
0.3
0.4
0.5
0.6
Abstr-LgRand-A-Lg
pdf at cluster size 100.00 0.05 0.10 0.15 0.20
Current map, K50 similarity
Multidocument summarization– cluster cohesiveness
calculated using abstracts for all clusters
– compared to values from random clusters
98.3% of the actual clusters of size 10 are more cohesive than random at p<0.0001 each
17
Uses
18
Worldwide S&T
Circle size – number of topics
Percent conference papers0-25%25-50%50-75%75-100%
Math
CS
Physics
Chemistry
Biology
Engineering
Medical
Brain/Neuro
Psych
EarthSciences
SocialSciences
19
Math
CS
Physics
Chemistry
Biology
Engineering
Medical
Brain/Neuro
Psych
EarthSciences
SocialSciences
Circle size – number of topics
Vitality (use of newer ideas)>10% more vital than world0-10% more vital than world0-10% less vital than world>10% less vital than world
DOE Profile
20
Math
CS
Physics
Chemistry
Biology
Engineering
Medical
Brain/Neuro
Psych
EarthSciences
SocialSciences
Circle size – number of topics
Vitality (use of newer ideas)>10% more vital than world0-10% more vital than world0-10% less vital than world>10% less vital than world
Sandia Profile
21
Math
CS
Physics
Chemistry
Biology
Engineering
Medical
Brain/Neuro
Psych
EarthSciences
SocialSciences
Circle size – number of topics
Vitality (use of newer ideas)>10% more vital than world0-10% more vital than world0-10% less vital than world>10% less vital than world
A Specific University Profile
22
Math
CS
Physics
Chemistry
Biology
Engineering
Medical
Brain/Neuro
Psych
EarthSciences
SocialSciences
Circle size – number of topics
Vitality (use of newer ideas)>10% more vital than world0-10% more vital than world0-10% less vital than world>10% less vital than world
Potential SNL/Univ Collaborations
23
Identifying Opportunities/Threats
Policy
Economics
Statistics
Math
CompSci
Physics
Biology
GeoScience
Microbiology
BioChem
Brain
PsychiatryEnvironment
Vision
Virology Infectious Diseases
Cancer
MRI
Bio-Materials
Law
Plant
Animal
Phys-Chem
Chemistry
Psychology
Education
Computer Tech
GI
(36 Research Communities that will impact GI Research…that GI Researchers are least likely to be aware of)
24
Identifying Core Competency
Policy
Economics
Statistics
Math
CompSci
Physics
Biology
GeoScience
Microbiology
BioChem
Brain
PsychiatryEnvironment
Vision
Virology Infectious Diseases
Cancer
MRI
Bio-Materials
Law
Plant
Animal
Phys-Chem
Chemistry
Psychology
Education
Computer Tech
GI
Funding patterns of the National Science Foundation (NSF)
25
Identifying Core Competency
Policy
Economics
Statistics
Math
CompSci
Physics
Biology
GeoScience
Microbiology
BioChem
Brain
PsychiatryEnvironment
Vision
Virology Infectious Diseases
Cancer
MRI
Bio-Materials
Law
Plant
Animal
Phys-Chem
Chemistry
Psychology
Education
Computer Tech
GI
Funding patterns of the National Institutes of Health (NIH)
26
Identifying Core Competency
Policy
Economics
Statistics
Math
CompSci
Physics
Biology
GeoScience
Microbiology
BioChem
Brain
PsychiatryEnvironment
Vision
Virology Infectious Diseases
Cancer
MRI
Bio-Materials
Law
Plant
Animal
Phys-Chem
Chemistry
Psychology
Education
Computer Tech
GI
Funding patterns of the US Department of Energy (DOE)
27
Potential Uses of Science Maps
Overlays– Topic distribution– Opportunity/threat assessment– Core competency identification– Funding (amount) patterns– Impact patterns
Relationships– Interdisciplinary– Community level between disciplines
28
Related Publications
– Boyack, K. W., Klavans, R. & Börner, K. (2005, in press). Mapping the backbone of science. Scientometrics.
– Klavans, R. & Boyack, K. W. (2005, in press). Identifying a better measure of relatedness for mapping science. Journal of the American Society for Information Science and Technology.
– Boyack, K. W., & Rahal, N. (2005, in press). Evaluation of LDRD investment areas at Sandia. Technological Forecasting and Social Change.
– Boyack, K. W., Mane, K. & Börner, K. (2004). Mapping Medline papers, genes and proteins related to melanoma research. IEEE Information Visualization 2004, 965-971.
– Boyack, K. W. (2004). Mapping knowledge domains: Characterizing PNAS. Proceedings of the National Academy of Sciences 101(S1), 5192-5199.
– Boyack, K. W., & Börner, K. (2003). Indicator-assisted evaluation and funding of research: Visualizing the influence of grants on the number and quality of research papers. Journal of the American Society for Information Science and Technology 54(5), 447.
– Börner, K., Chen, C., & Boyack, K. W. (2003). Visualizing knowledge domains. Annual Review of Information Science and Technology 37, 179-255.
– Werner-Washburne, M., Wylie, B., Boyack, K., Fuge, E., Galbraith, J., Fleharty, M., Weber, J., & Davidson, G.S. (2002). Concurrent analysis of multiple genome-scale datasets. Genome Research 12(10), 1564-1573.
– Boyack, K. W., Wylie, B. N., & Davidson, G. S. (2002). Information visualization, human-computer interaction, and cognitive psychology: Domain visualizations. Lecture Notes in Computer Science 2539, 145-160.
– Boyack, K. W., Wylie, B. N., & Davidson, G. S. (2002). Domain visualization using VxInsight for science and technology management. Journal of the American Society for Information Science and Technology, 53(9), 764-774.
– Davidson, G. S., Wylie, B. N., & Boyack, K. W. (2001). Cluster stability and the use of noise in interpretation of clustering. Proc. IEEE Information Visualization 2001, 23-30.
– Boyack, K.W., Wylie, B.N., Davidson, G.S. & Johnson, D.K., Analysis of patent databases using VxInsight. Presented at New Paradigms in Information Visualization and Manipulation 2000, McLean, VA, Nov. 10, 2000.