Information Technology: The Foundation for 21st Century Research · 2018-04-16 · Information...
Information Technology: The Foundation for 21st Century Research
Robert J. Robbins
Fred Hutchinson Cancer Research Center
1100 Fairview Avenue North, LV-101
Seattle, Washington 98109
[email protected]
(206) 667 2920
Abstract
Over the next few years, the relentless exponential effect of Moore's Law will profoundly affect nearly all areas of science and technology. By 2005, analytical power previously available only at supercomputer centers will exist on every desktop and the volume of electronic data will be enormous. Even now, a standard Intel computer delivers more computational power than the first supercomputer and GenBank acquires more data every ten weeks than it did in its first ten years.
Science of the 21st Century will require an adequate information infrastructure. Those with access will participate in the transformation of science; those without may become irrelevant. If public support for information infrastructure is inadequate, some types of research may only be possible in the private sector.
Topics
• Biotechnology and information technology will be the “magic” technologies of the 21st Century.
• Moore’s Law constantly transforms IT (and everything else).
• Information Technology (IT) has a special relationship with biology.
• 21st-Century biology will be based on bioinformatics.
• Bioinformatics is emerging as an independent discipline.
• A connected, federated information infrastructure for biology is needed.
• Current support for public bio-information infrastructure seems inadequate.
Introduction
Magical Technology
Magic
To a person from 1897, much current technology would seem like magic.
What technology of 2097 would seem magical to a person from 1997?
Candidates:
• Biotechnology so advanced that the distinction between living and non-living is blurred.
• Information technology so advanced that access to information is immediate and universal.
Moore’s Law
Transforms InfoTech (and everything else)
Moore’s Law: The Statement
Every eighteen months, the number of transistors that can be placed on a chip doubles.
Gordon Moore, co-founder of Intel...
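The doubling rule stated above can be turned into a small calculation. The sketch below is illustrative only: it takes the eighteen-month period from the slide at face value, and the function names and the assumption that cost at constant performance falls exactly as fast as transistor count rises are ours, not the author's.

```python
DOUBLING_PERIOD_YEARS = 1.5  # "every eighteen months", as stated on the slide


def growth_factor(years: float) -> float:
    """Multiplicative increase in transistor count after `years`."""
    return 2.0 ** (years / DOUBLING_PERIOD_YEARS)


def relative_cost(years: float) -> float:
    """Cost of a fixed level of performance relative to year zero.

    Assumption: cost falls exactly as fast as transistor count rises.
    """
    return 1.0 / growth_factor(years)


if __name__ == "__main__":
    for years in (1.5, 3, 10, 20):
        print(f"{years:>5} years: x{growth_factor(years):,.0f}, "
              f"relative cost {relative_cost(years):.6f}")
```

Under these assumptions a decade yields roughly a hundredfold gain, which is why the charts that follow use logarithmic axes.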
Moore’s Law: The Effect
[Figure: Performance (constant cost) and Cost (constant performance) plotted against Time on logarithmic axes running from 10 to 100,000; a point labeled P is marked.]
Moore’s Law: The Effect
Three Phases of Novel IT Applications
• It’s Impossible
• It’s Impractical
• It’s Overdue
Moore’s Law: The Effect
[Figure: the performance and cost chart again, built up over several slides with points labeled D, P, C, and A.]
Relevance for biology?
Cost (constant performance)
[Figure: cost at constant performance, on a logarithmic axis from 1,000 to 10,000,000, plotted by year from 1975 to 2005. Labels added in successive builds: University Purchase, Department Purchase, RO1 Grant Purchase, Personal Purchase, Unplanned Purchases.]
IT-Biology Synergism
IT is Special
Information Technology:
• affects the performance and the management of tasks
• allows the manipulation of huge amounts of highly complex data
• is incredibly plastic (programming and poetry are both exercises in pure thought)
• improves exponentially (Moore’s Law)
Biology is Special
Life is Characterized by:
• individuality
• historicity
• contingency
• high (digital) information content
No law of large numbers, since every living thing is genuinely unique.
IT-Biology Synergism
• Physics needs calculus, the method for manipulating information about statistically large numbers of vanishingly small, independent, equivalent things.
• Biology needs information technology, the method for manipulating information about large numbers of dependent, historically contingent, individual things.
Biology is Special
For it is in relation to the statistical point of view that the structure of the vital parts of living organisms differs so entirely from that of any piece of matter that we physicists and chemists have ever handled in our laboratories or mentally at our writing desks.
Erwin Schrödinger. 1944. What is Life?
Genetics as Code
[The] chromosomes ... contain in some kind of code-script the entire pattern of the individual's future development and of its functioning in the mature state. ... [By] code-script we mean that the all-penetrating mind, once conceived by Laplace, to which every causal connection lay immediately open, could tell from their structure whether [an egg carrying them] would develop, under suitable conditions, into a black cock or into a speckled hen, into a fly or a maize plant, a rhododendron, a beetle, a mouse, or a woman.
Erwin Schrödinger. 1944. What is Life?
One Human Sequence
We now know that Schrödinger’s mysterious human “code-script” consists of 3.3 billion base pairs of DNA.
Typed in 10-pitch font, one human sequence would stretch for more than 5,000 miles. Digitally formatted, it could be stored on one CD-ROM. Biologically encoded, it fits easily within a single cell.
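The printed-length figure can be checked with simple arithmetic. This is a rough sketch: the 3.3 billion base pairs and the 10-pitch type come from the slide, while the two-bits-per-base packing is our assumption (four letters, no compression, no annotation). Whether the packed result fits on one disc depends on disc capacity and compression, which the slide does not specify.

```python
BASE_PAIRS = 3_300_000_000   # from the slide
CHARS_PER_INCH = 10          # 10-pitch type prints ten characters per inch
INCHES_PER_MILE = 63_360
BITS_PER_BASE = 2            # assumption: A, C, G, T only, uncompressed

# One printed character per base pair, laid out on a single line.
printed_miles = BASE_PAIRS / CHARS_PER_INCH / INCHES_PER_MILE

# Raw digital size when four bases are packed into each byte.
packed_bytes = BASE_PAIRS * BITS_PER_BASE // 8

if __name__ == "__main__":
    print(f"Printed length: {printed_miles:,.0f} miles")
    print(f"Packed size:    {packed_bytes:,} bytes")
```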
Bio-digital Information
DNA is a highly efficient digital storage device:
• There is more mass-storage capacity in the DNA of a side of beef than in all the hard drives of all the world’s computers.
• Storing all of the (redundant) information in all of the world’s DNA on computer hard disks would require that the entire surface of the Earth be covered to a depth of three miles in Conner 1.0 GB drives.
Genomics: An Example
Human Genome Project - Goals
– construction of a high-resolution genetic map of the human genome;
– production of a variety of physical maps of all human chromosomes and of the DNA of selected model organisms;
– determination of the complete sequence of human DNA and of the DNA of selected model organisms;
– development of capabilities for collecting, storing, distributing, and analyzing the data produced;
– creation of appropriate technologies necessary to achieve these objectives.
USDOE. 1990. Understanding Our Genetic Inheritance. The U.S. Human Genome Project: The First Five Years.
Infrastructure and the HGP
Progress towards all of the [Genome Project] goals will require the establishment of well-funded centralized facilities, including a stock center for the cloned DNA fragments generated in the mapping and sequencing effort and a data center for the computer-based collection and distribution of large amounts of DNA sequence information.
National Research Council. 1988. Mapping and Sequencing the Human Genome. Washington, DC: National Academy Press. p. 3
GenBank Totals (Release 103)

DIVISION                                    Entries  Per Cent     Base Pairs  Per Cent
Phage Sequences (PHG)                         1,313    0.074%      2,138,810    0.184%
Viral Sequences (VRL)                        45,355    2.568%     44,484,848    3.834%
Bacteria (BCT)                               38,023    2.153%     88,576,641    7.634%
Plant, Fungal, and Algal Sequences (PLN)     44,553    2.523%     92,259,434    7.951%
Invertebrate Sequences (INV)                 29,657    1.679%    105,703,550    9.110%
Rodent Sequences (ROD)                       36,967    2.093%     45,437,309    3.916%
Primate Sequences (PRI1–2)                   75,587    4.280%    134,944,314   11.630%
Other Mammals (MAM)                          12,744    0.722%     12,358,310    1.065%
Other Vertebrate Sequences (VRT)             17,713    1.003%     17,040,159    1.469%
High-Throughput Genome Sequences (HTG)        1,120    0.063%     72,064,395    6.211%
Genome Survey Sequences (GSS)                42,628    2.414%     22,783,326    1.964%
Structural RNA Sequences (RNA)                4,802    0.272%      2,487,397    0.214%
Sequence Tagged Sites Sequences (STS)        52,824    2.991%     18,161,532    1.565%
Patent Sequences (PAT)                       87,767    4.970%     27,593,724    2.378%
Synthetic Sequences (SYN)                     2,577    0.146%      5,698,945    0.491%
Unannotated Sequences (UNA)                   2,480    0.140%      1,933,676    0.167%
EST1-17                                   1,269,737   71.905%    466,634,317   40.217%
TOTALS                                    1,765,847  100.000%  1,160,300,687  100.000%

Each Per Cent column gives the share of the column to its left.
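The two Per Cent columns are shares of the totals, one for entries and one for base pairs. As a hedged check on how the columns pair up, the sketch below recomputes the shares for three divisions using figures copied from the table; the dictionary layout and function name are illustrative only.

```python
TOTAL_ENTRIES = 1_765_847
TOTAL_BASE_PAIRS = 1_160_300_687

# (entries, base pairs) for a few divisions, copied from the table.
DIVISIONS = {
    "EST": (1_269_737, 466_634_317),
    "PRI": (75_587, 134_944_314),
    "PHG": (1_313, 2_138_810),
}


def shares(division: str) -> tuple[float, float]:
    """Return (percent of entries, percent of base pairs) for a division."""
    entries, base_pairs = DIVISIONS[division]
    return (100 * entries / TOTAL_ENTRIES,
            100 * base_pairs / TOTAL_BASE_PAIRS)


if __name__ == "__main__":
    for name in DIVISIONS:
        pct_entries, pct_bp = shares(name)
        print(f"{name}: {pct_entries:.3f}% of entries, {pct_bp:.3f}% of base pairs")
```

The EST division stands out: it holds most of the entries but well under half of the base pairs, since each EST record is short.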
Base Pairs in GenBank
[Figure: total base pairs in GenBank, on an axis from 0 to 1,200,000,000, plotted against GenBank release number (0 to 105), with the years 1987 to 1997 marked along the axis.]
Growth in GenBank is exponential. More data were added in the last ten weeks than were added in the first ten years of the project.
At this rate, what’s next...
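ABI Bass-o-Matic Sequencer
In with the sample, out with the sequence...
[Figure: mock sequencer front panel showing the readout TGCGCATCGCGTATCGATAG, a speed control marked gB/sec, a numeric keypad with + and - keys, and Enter and Defrost buttons.]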
What’s Really Next
The post-genome era in biological research will take for granted ready access to huge amounts of genomic data.
The challenge will be understanding those data and using the understanding to solve real-world problems...
Base Pairs in GenBank
0
2 0 ,0 0 0 ,0 0 0
4 0 ,0 0 0 ,0 0 0
6 0 ,0 0 0 ,0 0 0
8 0 ,0 0 0 ,0 0 0
1 0 0 ,0 0 0 ,0 0 0
1 2 0 ,0 0 0 ,0 0 0
1 4 0 ,0 0 0 ,0 0 0
0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100 105
GenBank Release Numbers
9493929190898887 95 96 97
Net ChangesNet Changes
Base Pairs in GenBank (Percent Increase)
[Figure: percent increase in base pairs, on an axis from 0% to 100%, by year from 1985 to 1996.]
Percent increase: average = 56%
Projected Base Pairs
[Figure: projected base pairs, on a logarithmic axis from 1 to 100,000,000,000,000, by year from 1990 to 2025.]
Assumed annual growth rate: 50% (less than current rate)
Is this crazy? One trillion bp by 2015, 100 trillion by 2025.
Maybe not...
Projected database size, indicated as the number of base pairs per individual medical record in the US: 500,000; 50,000; 5,000.
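The projection can be reproduced approximately with compound growth. The sketch below assumes the Release 103 total (1,160,300,687 base pairs) as the starting point and treats 1997 as the base year; the base year is our assumption, since the slides give release numbers rather than dates for that total. The 50% annual rate is the one stated on the slide.

```python
import math

BASE_YEAR = 1997                 # assumption: year of GenBank Release 103
BASE_PAIRS_AT_BASE = 1_160_300_687  # Release 103 total
ANNUAL_GROWTH = 0.50             # assumed rate stated on the slide


def projected_base_pairs(year: float) -> float:
    """Base pairs in `year` under constant compound growth."""
    return BASE_PAIRS_AT_BASE * (1 + ANNUAL_GROWTH) ** (year - BASE_YEAR)


def year_reached(target_base_pairs: float) -> float:
    """Year in which the projection first reaches `target_base_pairs`."""
    years = math.log(target_base_pairs / BASE_PAIRS_AT_BASE) / math.log(1 + ANNUAL_GROWTH)
    return BASE_YEAR + years


if __name__ == "__main__":
    print(f"One trillion bp reached around {year_reached(1e12):.1f}")
    print(f"100 trillion bp reached around {year_reached(1e14):.1f}")
```

Under these assumptions the one-trillion mark falls a little before 2015 and the 100-trillion mark falls at about 2025, in line with the slide's round figures.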
21st Century Biology
Post-Genome Era
The Post-Genome Era
Post-genome research involves:
• applying genomic tools and knowledge to more general problems
• asking new questions, tractable only to genomic or post-genomic analysis
• moving beyond the structural genomics of the human genome project and into the functional genomics of the post-genome era
The Post-Genome Era
Suggested definition:
• functional genomics = biology
An early analysis:
Walter Gilbert. 1991. Towards a paradigm shift in biology. Nature, 349:99.
Paradigm Shift in Biology
To use [the] flood of knowledge, which will pour across the computer networks of the world, biologists not only must become computer literate, but also change their approach to the problem of understanding life.
Walter Gilbert. 1991. Towards a paradigm shift in biology. Nature, 349:99.
Paradigm Shift in Biology
The new paradigm, now emerging, is that all the ‘genes’ will be known (in the sense of being resident in databases available electronically), and that the starting point of a biological investigation will be theoretical. An individual scientist will begin with a theoretical conjecture, only then turning to experiment to follow or test that hypothesis.
Walter Gilbert. 1991. Towards a paradigm shift in biology. Nature, 349:99.
Paradigm Shift in Biology
Case of Microbiology
< 5,000 known and described bacteria
5,000,000 base pairs per genome
25,000,000,000 TOTAL base pairs
If a full, annotated sequence were available for all known bacteria, the practice of microbiology would match Gilbert’s prediction.
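The microbiology total is a straightforward product, and it can be set against the size of GenBank at the time. In the sketch below, the species count and genome size are the slide's round figures (the slide gives the count as an upper bound), and the comparison with the Release 103 total is our addition for scale.

```python
KNOWN_BACTERIA = 5_000               # slide gives "< 5,000", used as an upper bound
BASE_PAIRS_PER_GENOME = 5_000_000    # round figure from the slide
GENBANK_RELEASE_103 = 1_160_300_687  # total base pairs at Release 103

# Upper bound on the sequence needed to cover every described bacterium once.
total_base_pairs = KNOWN_BACTERIA * BASE_PAIRS_PER_GENOME

# How many Release-103-sized databases that would fill.
ratio_to_genbank = total_base_pairs / GENBANK_RELEASE_103

if __name__ == "__main__":
    print(f"Total: {total_base_pairs:,} base pairs")
    print(f"About {ratio_to_genbank:.1f} times GenBank Release 103")
```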
21st Century Biology
The Science
Fundamental Dogma
The fundamental dogma of molecular biology is that genes act to create phenotypes through a flow of information from DNA to RNA to proteins, to interactions among proteins (regulatory circuits and metabolic pathways), and ultimately to phenotypes.
Collections of individual phenotypes, of course, constitute a population.
DNA → RNA → Proteins → Circuits → Phenotypes → Populations
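The first two steps of this information flow, DNA to RNA to protein, are mechanical enough to sketch in code. The example below is a toy illustration, not a bioinformatics tool: the codon table holds only the handful of codons needed for the sample sequence, and the function names and sample sequence are hypothetical.

```python
# Deliberately partial codon table; "*" marks a stop codon.
CODON_TABLE = {
    "AUG": "M",  # methionine (start)
    "GCC": "A",  # alanine
    "UGG": "W",  # tryptophan
    "AAA": "K",  # lysine
    "UAA": "*", "UAG": "*", "UGA": "*",
}


def transcribe(coding_strand: str) -> str:
    """DNA coding strand to mRNA: replace thymine with uracil."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read mRNA three bases at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]  # KeyError for codons not in the toy table
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("ATGGCCTGGAAATAA")
    print(mrna, "->", translate(mrna))
```

The later steps in the chain (circuits, phenotypes, populations) have no comparably simple mapping.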
Fundamental Dogma
[Diagram: the sequence DNA, RNA, Proteins, Circuits, Phenotypes, Populations, annotated with existing databases (GenBank, EMBL, DDBJ; Map Databases; SwissPROT, PIR; PDB) and with candidate areas for new ones.]
Although a few databases already exist to distribute molecular information, the post-genomic era will need many more to collect, manage, and publish the coming flood of new findings.
• Gene Expression?
• Clinical Data?
• Regulatory Pathways?
• Metabolism?
• Biodiversity?
• Neuroanatomy?
• Development?
• Molecular Epidemiology?
• Comparative Genomics?
21st Century Biology
The People
Human Resources Issues
• Reduction in need for non-IT staff
• Increase in need for IT staff, especially “information engineers”
In modern biology, a general trend is to convert expert work into staff work and finally into computation. New expertise is required to design, carry out, and interpret continuing work.
89
Human Resources Issues
Elbert Branscomb: “You must recognize that some day you may need as many computer scientists as biologists in your labs.”
90
Human Resources Issues
Elbert Branscomb: “You must recognize that some day you may need as many computer scientists as biologists in your labs.”
Craig Venter: “At TIGR, we already have twice as many computer scientists on our staff.”
Exchange at DOE workshop on high-throughput sequencing.
New Discipline of Informatics
92
What is Informatics?
Informatics
Computer Science Research
Biological Application Programs
93
What is Informatics?
Informatics combines expertise from:
• domain science (e.g., biology)
• computer science
• library science
• management science
All tempered with an engineering mindset...
94
What is Informatics?
ISIS
Medical Informatics
BioInformatics
Other Informatics
Library Science
Computer Science
Mgt Science
Domain Knowledge
Engineering Principles
95
Engineering Mindset
Engineering is often defined as the use of scientific knowledge and principles for practical purposes. While the original usage restricted the word to the building of roads, bridges, and objects of military use, today's usage is more general and includes chemical, electronic, and even mathematical engineering.
Parnas, David Lorge. 1990. Computer, 23(1):17-22.
96
Engineering Mindset
Engineering is often defined as the use of scientific knowledge and principles for practical purposes. While the original usage restricted the word to the building of roads, bridges, and objects of military use, today's usage is more general and includes chemical, electronic, and even mathematical engineering.
Parnas, David Lorge. 1990. Computer, 23(1):17-22.
... or even information engineering.
97
Engineering Mindset
Engineering education ... stresses finding good, as contrasted with workable, designs. Where a scientist may be happy with a device that validates his theory, an engineer is taught to make sure that the device is efficient, reliable, safe, easy to use, and robust.
Parnas, David Lorge. 1990. Computer, 23(1):17-22.
98
Engineering Mindset
Engineering education ... stresses finding good, as contrasted with workable, designs. Where a scientist may be happy with a device that validates his theory, an engineer is taught to make sure that the device is efficient, reliable, safe, easy to use, and robust.
Parnas, David Lorge. 1990. Computer, 23(1):17-22.
The assembly of working, robust systems, on time and on budget, is the key requirement for a federated information infrastructure for biology.
Federated Information Infrastructure
100
National Information Infrastructure
analog
commercial uses
non-commercial uses
digital
Edu Lib other Res ETC
101
FIIST & NII
FIIST(science & technology)
FIIE(engineering)
FIIS(science)
FII(climatology)
FII(chemistry)
FII(geology)
FII(physics)
FII(biology)
FII(physiology)
FII(ecology)
FII(structural biology)
FII(• • •)
FII(geography)
FII(human)
FII(mouse)
FII(• • •)
FII(Arabidopsis)
FII(genomics)
FII(E. coli)
FII(zoology)
FII(botany)
FII(• • •)
FII(systematics)
FII(• • •)
analog
commercial uses
non-commercial uses
digital
Edu Lib other Res ETC
The research component of the NII contains a Federated Information Infrastructure for Science and Technology.
102
FIIST
FIIST(science & technology)
FIIE(engineering)
FIIS(science)
FII(climatology)
FII(chemistry)
FII(geology)
FII(physics)
FII(biology)
FII(• • •)
FII(geography)
103
FIIB
FII(biology)
FII(physiology)
FII(ecology)
FII(structural biology)
FII(human)
FII(mouse)
FII(• • •)
FII(Arabidopsis)
FII(genomics)
FII(E. coli)
FII(zoology)
FII(botany)
FII(• • •)
FII(systematics)
FII(• • •)
104
Public Funding of Databases
Stand-alone Criteria:
• Is there a need?
• Will this meet the need?
• Can they do it?
• Is it worth it?
105
Public Funding of Databases
Global Criteria:
• Does it adhere to standards?
• Will it interoperate?
• Is there commitment to federation?
• Is it worth it?
106
Information Resources and the GII
Guiding Principles:
• Global value explosion
• Componentry
• Anonymous interoperability
• Technical scalability
• Social scalability
• Value additivity
Funding for Bio-Information Infrastructure
108
Call for Change
Among the many new tools that are or will be needed (for 21st-century biology), some of those having the highest priority are:
• bioinformatics
• computational biology
• functional imaging tools using biosensors and biomarkers
• transformation and transient expression technologies
• nanotechnologies
Impact of Emerging Technologies on the Biological Sciences: Report of a Workshop. NSF-supported workshop, held 26-27 June 1995, Washington, DC.
109
The Problem
• IT moves at “Internet Speed” and responds rapidly to market forces.
110
The Problem
• IT moves at “Internet Speed” and responds rapidly to market forces.
• IT will play a central role in 21st Century biology.
111
The Problem
• IT moves at “Internet Speed” and responds rapidly to market forces.
• IT will play a central role in 21st Century biology.
• Current levels of support for public bio-information infrastructure are too low.
112
The Problem
• IT moves at “Internet Speed” and responds rapidly to market forces.
• IT will play a central role in 21st Century biology.
• Current levels of support for public bio-information infrastructure are too low.
• Reallocation of federal funding is difficult, and subject to political pressures.
113
The Problem
• IT moves at “Internet Speed” and responds rapidly to market forces.
• IT will play a central role in 21st Century biology.
• Current levels of support for public bio-information infrastructure are too low.
• Reallocation of federal funding is difficult, and subject to political pressures.
• Federal-funding decision processes are ponderously slow and inefficient.
114
Federal Funding of Bio-Databases
The challenges:
115
Federal Funding of Bio-Databases
The challenges:
• providing adequate funding levels
116
Federal Funding of Bio-Databases
The challenges:
• providing adequate funding levels
• making timely, efficient decisions
IT Budgets
A Reality Check
118
Rhetorical Question
Which is likely to be more complex:
• identifying, documenting, and tracking the whereabouts of all parcels in transit in the US at one time
119
Rhetorical Question
Which is likely to be more complex:
• identifying, documenting, and tracking the whereabouts of all parcels in transit in the US at one time
• identifying, documenting, and analyzing the structure and function of all individual genes in all economically significant organisms; then analyzing all significant gene-gene and gene-environment interactions in those organisms and their environments
120
Business Factoids
United Parcel Service:
• uses two redundant 3 Terabyte (yes, 3000 GB) databases to track all packages in transit.
• has 4,000 full-time employees dedicated to IT
• spends one billion dollars per year on IT
• has an income of 1.1 billion dollars, against revenues of 22.4 billion dollars
121
Business Comparisons
Company                     Revenues ($)   IT Budget ($)       Pct
Bristol-Myers Squibb      15,065,000,000     440,000,000    2.92 %
Pfizer                    11,306,000,000     300,000,000    2.65 %
Pacific Gas & Electric    10,000,000,000     250,000,000    2.50 %
K-Mart                    31,437,000,000     130,000,000    0.41 %
Wal-Mart                 104,859,000,000     550,000,000    0.52 %
Sprint                    14,235,000,000     873,000,000    6.13 %
MCI                       18,500,000,000   1,000,000,000    5.41 %
United Parcel             22,400,000,000   1,000,000,000    4.46 %
AMR Corporation           17,753,000,000   1,368,000,000    7.71 %
IBM                       75,947,000,000   4,400,000,000    5.79 %
Microsoft                 11,360,000,000     510,000,000    4.49 %
Chase-Manhattan           16,431,000,000   1,800,000,000   10.95 %
Nation’s Bank             17,509,000,000   1,130,000,000    6.45 %
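The Pct column can be rechecked from the other two columns. The following is a minimal sketch, not part of the original slides: the figures are copied from the table, and the names `COMPANIES`, `it_share_pct`, and `summarize` are illustrative assumptions.

```python
# Revenues and IT budgets (assumed to be US dollars), copied from the table.
COMPANIES = [
    ("Bristol-Myers Squibb", 15_065_000_000, 440_000_000),
    ("Pfizer", 11_306_000_000, 300_000_000),
    ("Pacific Gas & Electric", 10_000_000_000, 250_000_000),
    ("K-Mart", 31_437_000_000, 130_000_000),
    ("Wal-Mart", 104_859_000_000, 550_000_000),
    ("Sprint", 14_235_000_000, 873_000_000),
    ("MCI", 18_500_000_000, 1_000_000_000),
    ("United Parcel", 22_400_000_000, 1_000_000_000),
    ("AMR Corporation", 17_753_000_000, 1_368_000_000),
    ("IBM", 75_947_000_000, 4_400_000_000),
    ("Microsoft", 11_360_000_000, 510_000_000),
    ("Chase-Manhattan", 16_431_000_000, 1_800_000_000),
    ("Nation's Bank", 17_509_000_000, 1_130_000_000),
]


def it_share_pct(revenue, it_budget):
    """Return the IT budget as a percentage of revenue."""
    return 100.0 * it_budget / revenue


def summarize(companies):
    """Return (name, pct) pairs sorted from largest to smallest IT share."""
    rows = [(name, it_share_pct(rev, it)) for name, rev, it in companies]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, pct in summarize(COMPANIES):
        print(f"{name:<24} {pct:6.2f} %")
```

Sorting by share makes the spread visible: the retailers sit well under 1%, while the information-intensive firms (banks, airlines, telecommunications) cluster between roughly 5% and 11%.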
122
Federal Funding of Biomedical-IT
Appropriate funding level:
• approx. 5-10% of research funding
• i.e., 1 - 2 billion dollars per year
123
Federal Funding of Biomedical-IT
Appropriate funding level:
• approx. 5-10% of research funding
• i.e., 1 - 2 billion dollars per year
Source of estimate:
- Experience of IT-transformed industries.
- Current support for IT-rich biological research.
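The arithmetic behind the estimate can be sketched as follows. This is an illustration under a stated assumption: the slide does not give the research-funding base, so the 20 billion dollar figure below is back-computed from the slide's own pairing of 5-10% with 1 - 2 billion dollars. The names `it_funding_range` and `ASSUMED_RESEARCH_FUNDING` are hypothetical.

```python
# Assumption: annual research funding implied by the slide's numbers
# (1 billion / 5% = 2 billion / 10% = 20 billion dollars). Not stated in the slide.
ASSUMED_RESEARCH_FUNDING = 20_000_000_000


def it_funding_range(research_funding, low_pct=5, high_pct=10):
    """Return the (low, high) IT funding level for a given research budget.

    Percentages are whole numbers so the arithmetic stays exact for
    round dollar amounts.
    """
    low = research_funding * low_pct / 100
    high = research_funding * high_pct / 100
    return low, high


if __name__ == "__main__":
    low, high = it_funding_range(ASSUMED_RESEARCH_FUNDING)
    print(f"Suggested IT funding: {low / 1e9:.0f} to {high / 1e9:.0f} billion dollars per year")
```

The 5-10% band is consistent with the shares reported for IT-transformed industries in the Business Comparisons table.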
Basics
Business 101
125
Market Forces
Vendors
products, services
Buyers
$
purchases
In a simple market economy, vendors try to anticipate the needs of buyers and offer products and services to meet those needs.
Real users decide whether or not to buy a product or service, depending upon whether or not it meets a real need at a reasonable price.
Business 101 Insight:
Successful vendors target a niche and excel at meeting the needs of that niche.
126
Market Forces
Venture Capital
Vendors
$
Buyers
$
Stock Offerings
Funding to initiate the development of products and services comes from investors, not from buyers.
Investors decide whether or not to provide start-up funding based upon the estimated ability of the vendor to create products and services that will meet real needs at competitive prices.
$
Vendor Investment
products, services
$
purchases
127
Federal Funding
Investors
Database
$
Users
products, services
$
purchases
If biological databases were driven by market forces, individual users would choose what services they need and individual database providers would choose what services to make available.
Investors would provide start-up money on the likelihood of successful products and services being developed.
Ultimate success would depend on meeting the needs of real users. Decisions could be made rapidly, in response to changing needs and emerging opportunities.
128
Federal Funding
Agency
Database
Reviewers
Other Agencies
Agency Advisors
Congress
products, services
OMB $
$ Database Advisors
Users
Instead, funding decisions for grant-supported biological databases can follow a ponderously slow course, with almost no opportunity for real-time input from real users.
Even with the best of intentions at all levels, this process is slow, inefficient, risk-averse, and non-responsive to the real and changing needs of users.
129
Federal Funding of Bio-Databases
Possible solutions:
• increase the direct support of federal service organizations providing information infrastructure (e.g., NCBI).
• reduce support for investigator-initiated, grant-funded public database projects.
• create market forces, initially through subsidization, later simply through direct support for affected science (e.g., NSFnet into internet).
130
Federal Funding of Bio-Databases
Creating market forces:
• stop supporting the supply side of biodatabases through slow, inefficient processes.
131
Federal Funding of Bio-Databases
Creating market forces:
• stop supporting the supply side of biodatabases through slow, inefficient processes.
• start supporting the demand side through fast, efficient processes.
132
Federal Funding of Bio-Databases
Creating market forces:
• stop supporting the supply side of biodatabases through slow, inefficient processes.
• start supporting the demand side through fast, efficient processes.
• provide guaranteed supplementary funding, redeemable only for access to bio-databases.
133
Federal Funding of Bio-Databases
Creating market forces:
• stop supporting the supply side of biodatabases through slow, inefficient processes.
• start supporting the demand side through fast, efficient processes.
• provide guaranteed supplementary funding, redeemable only for access to bio-databases.
• data stamps
134
Federal Funding of Bio-Databases
Creating market forces:
• stop supporting the supply side of biodatabases through slow, inefficient processes.
• start supporting the demand side through fast, efficient processes.
• provide guaranteed supplementary funding, redeemable only for access to bio-databases.
• data stamps, AKA food (for-thought) stamps?!
135
Food (for thought) Stamps
Funding Agencies could:
• provide a 10% supplement to every research grant in the form of “stamps” redeemable only at database providers.
• allow the “stamps” to be transferable among scientists, so that a market for them could emerge.
• provide funding only after the stamps have been redeemed at a database provider.
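The three rules on this slide can be sketched as a small bookkeeping model. This is only an illustration under stated assumptions, not a design from the slides: stamps are counted in whole dollars, and the names `Scientist`, `Provider`, `award_grant`, `transfer`, and `redeem` are hypothetical.

```python
from dataclasses import dataclass

SUPPLEMENT_PCT = 10  # from the slide: a 10% supplement to every research grant


@dataclass
class Scientist:
    name: str
    stamps: int = 0  # unredeemed stamp balance, in dollars


@dataclass
class Provider:
    name: str
    redeemed: int = 0  # dollars the agency has paid out for redeemed stamps


def award_grant(scientist, grant_dollars):
    """Issue stamps worth SUPPLEMENT_PCT of the grant; no cash moves yet."""
    stamps = grant_dollars * SUPPLEMENT_PCT // 100
    scientist.stamps += stamps
    return stamps


def transfer(sender, receiver, amount):
    """Move stamps between scientists, which is what lets a market emerge."""
    if amount <= 0 or amount > sender.stamps:
        raise ValueError("invalid transfer amount")
    sender.stamps -= amount
    receiver.stamps += amount


def redeem(scientist, provider, amount):
    """Redeem stamps at a provider; the agency funds the provider only now."""
    if amount <= 0 or amount > scientist.stamps:
        raise ValueError("invalid redemption amount")
    scientist.stamps -= amount
    provider.redeemed += amount
    return amount


if __name__ == "__main__":
    alice, bob = Scientist("Alice"), Scientist("Bob")
    db = Provider("Example sequence database")
    award_grant(alice, 200_000)   # Alice receives 20,000 in stamps
    transfer(alice, bob, 5_000)   # stamps are transferable among scientists
    redeem(bob, db, 5_000)        # the provider is paid only at redemption
    print(alice.stamps, bob.stamps, db.redeemed)
```

Because money moves only at redemption, the agency's actual outlay is the sum of `redeemed` across providers, which may be less than the total value of stamps issued.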
136
Food (for thought) Stamps
Problems:
• how to estimate the amount of FFT stamps that would actually be redeemed (and thus the required budget set-aside).
• how to identify “approved” database providers.
• how to initiate the FFT system.
• etc.
137
Food (for thought) Stamps
Alternatives (if no solution emerges):
• increasingly inefficient research activities (abject failure will occur when it becomes simpler to repeat research than to obtain prior results).
• loss of access to bio-databases for public-sector research.
• movement of the majority of “important” biological research into the private sector.
• loss of American pre-eminence (if other countries solve the problems first).
138
Slides:
http://www.esp.org/rjr/self.pdf