Molecular biology in the information era
Molecular Biology in the Information Era
Winter School 2015
Andrés Aravena, PhD - Istanbul University
Department of Molecular Biology and Genetics - 7 March 2015
My name is Andrés Aravena
I don't speak Turkish! (Türkçe bilmiyorum!)
I am
New Assistant Professor at the Department of Molecular Biology and Genetics
Mathematical Engineer, U. of Chile
PhD Informatics, U Rennes 1, France
PhD Mathematical Modeling, U. of Chile
not a Biologist
but an Applied Mathematician who can speak "biologist language"
I will speak about
The Past, Present and Future
Facts, opinions and guesses
What I've done before, so you can understand why I'm here
What I'm doing now at Istanbul University
What I foresee from my "outsider" point of view
I've worked on
Big and small computers
Telecommunication Networks
Between 2003 and 2014 I was the chief research engineer
of the main bioinformatics group in my country
in the top research center (CMM)
in the top university (University of Chile)
I come from Chile
Chile
Small country of ~17 million people
University rankings similar to Turkish ones
Spanish colony 500 years ago (so the language is Spanish)
Independent Republic 200 years ago
First Latin American country to recognize the Turkish Republic
OECD member
Everyday life very similar to Turkey
Chilean Economy: Exports
1st world producer of copper
2nd world producer of salmon
Fruits: peaches, grapes, apples, avocados
Wine: exported worldwide
Official data for 2014
The natural question was
How can we improve these industries using Molecular Biology and Bioinformatics?
Fruits
Peach and Grapes
Gene expression analysis for industrial applications:
Peach: response to cold stress
Grape: development related to seed and grape size (Sultaniye)
Fish
Salmon
Farmed salmon are fed cheap vegetable protein
But wild salmon eat animal protein
How is salmon metabolism affected by the diet?
Which genes change their expression because of the changes in food?
Gene expression analysis using microarrays
Fish selection for breeding using microarrays (patent pending)
Fish
Salmon Genomic Sequence
... and sequencing of the whole Salmo salar genome
(a 10 million dollar project)
Wine
Chilean wine travels long distances to final markets
Any yeast contamination means big economic losses
(people stop buying all Chilean brands)
Quality control is usually done by growing samples for 3 days
But time is expensive: there are penalties for shipping delays
We designed a qPCR method for rapid detection of yeast contamination
It is currently used by one major wine producer in Chile. It may be sold to Roche.
Mining industry
molecular biology to extract copper
A little chemistry:
Copper is part of a compound, with sulfur and iron.
Ferric iron (Fe3+) separates it.
Cu2S + 4 Fe3+ → 2 Cu2+ + 4 Fe2+ + S
Resulting Cu2+ is soluble and is recovered.
But all the Fe3+ turns into Fe2+ and the reaction stops
There are bacteria that "eat" e- and keep the reaction going
Fe2+ → Fe3+ + e-
Why is it important?
The biological method is much better than the standard one
The goal is to understand and improve the bacteria involved so this technology can be used extensively
Enables building new mines
It is like discovering petrol reserves for the country
Reduced contamination
Cheaper
Most of the results are still industrial secrets
We had a research contract with the main mining company
State-owned, big enough to pay for long-term research
Few papers, many patents
Bioidentification
Monitoring the presence of good bacteria
We need to control the "ecosystem" in the mine
Molecular Biology methods are fast, sensitive and reliable
They can be used in place, with a metagenomic approach and no culture
Key problem: design probes that match a taxonomic branch, not a specific strain
The probes should be tolerant to mutations that occur in environmental samples with many strains
Classical tools don't work at large scale
Design of probes for complex samples
I designed and built a solution using a super-computer
The calculation took one day on 32 processors (about one processor-month)
Resulting probes worked as expected
They can be used in qPCR or in microarrays.
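To make the probe criterion concrete, here is a minimal sketch in Python. It is not the tool described in the talk: the sequences are invented, the mismatch limit is an assumption, and a real design would also consider melting temperature, secondary structure and much larger genomes. It only shows the idea of a probe that matches every strain of a branch while tolerating a mutation, and matches nothing outside the branch.

```python
def hamming(a, b):
    """Number of positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def probe_hits(probe, sequence, max_mismatches=1):
    """True if some window of the sequence matches the probe
    with at most max_mismatches substitutions."""
    k = len(probe)
    return any(
        hamming(probe, sequence[i:i + k]) <= max_mismatches
        for i in range(len(sequence) - k + 1)
    )


def is_branch_probe(probe, in_group, out_group, max_mismatches=1):
    """A candidate is useful if it hits every strain of the target
    branch and no strain outside it."""
    hits_all_targets = all(probe_hits(probe, s, max_mismatches) for s in in_group)
    hits_outsider = any(probe_hits(probe, s, max_mismatches) for s in out_group)
    return hits_all_targets and not hits_outsider


# Toy sequences, invented for illustration only.
in_group = ["TTACGGATCCGTAA", "GGACGGTTCCGTCC"]  # two strains of the target branch
out_group = ["TTAGGCATGGCTAA"]                   # a strain outside the branch

print(is_branch_probe("ACGGATCCGT", in_group, out_group))
print(is_branch_probe("ACGGATCCGT", in_group, out_group, max_mismatches=0))
```

The second call shows why tolerance matters: with zero mismatches allowed, the probe misses the second strain and is rejected.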
Automatic Interpretation of Results
using a Statistical Classification Model
Publications
The microarray was published in
N. Ehrenfeld, A. Aravena, A. Reyes-Jara, N. Barreto, R. Assar, A. Maass, P. Parada. Design and use of oligonucleotide microarrays for identification of biomining microorganisms. Advanced Materials Research 71-73 (2009) 155-158.
Patents
The method and the probes have been patented in
USA, Number: US 7 853 408 B2, Date: 14/12/2010;
South Africa, Number: 2006/06828, Date: 26/03/2008;
Australia, Number: 2006203551, Date: 15/09/2011;
Mexico, Number: PXMX 32/2006, Date: November 2012;
Peru, Number: PE 5838, Date: 29/10/2010;
China, Number: 200810095172.6, Date: 2013;
Chile, Number: DPI-660-2007, Date: 06/05/2013;
Argentina, Number: AR056179.
Functional genomics
How do the bacteria work?
To improve the process we need to see inside the black box.
We sequenced the complete genomes of 3 bacteria:
Acidithiobacillus ferrooxidans
Acidithiobacillus thiooxidans
Leptospirillum ferrooxidans
We paid over USD 150K. Today it costs USD 5K
Hint: Sequence assembly requires a big computer. It does not work on a regular PC
Modeling Metabolism
We predict which genes code for enzymes
Each enzyme catalyzes a reaction, with a known stoichiometry
Every reaction gives an equation
All equations plus boundary conditions give a model to predict metabolite concentrations
We can predict how the cell adapts to environmental changes
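As an illustration of "every reaction gives an equation", the sketch below writes a toy pathway as a stoichiometric matrix S and computes the rate of change of each metabolite as S times the flux vector. The three reactions, the metabolite names and the flux values are invented for the example; they are not taken from the models built for the bacteria above.

```python
# Rows are metabolites A and B; columns are reactions.
#   r1: (outside) -> A    uptake
#   r2: A -> B            enzyme-catalysed step
#   r3: B -> (outside)    export
S = [
    [1, -1, 0],   # A is produced by r1 and consumed by r2
    [0, 1, -1],   # B is produced by r2 and consumed by r3
]


def concentration_change(S, v):
    """Rate of change of each metabolite: dx/dt = S . v"""
    return [sum(s * flux for s, flux in zip(row, v)) for row in S]


def is_steady_state(S, v, tol=1e-9):
    """True when no metabolite accumulates or is depleted."""
    return all(abs(dx) <= tol for dx in concentration_change(S, v))


balanced = [2.0, 2.0, 2.0]     # same flux through every step
bottleneck = [2.0, 1.0, 1.0]   # uptake faster than the enzyme step

print(concentration_change(S, balanced), is_steady_state(S, balanced))
print(concentration_change(S, bottleneck), is_steady_state(S, bottleneck))
```

In the second case metabolite A accumulates, which is the kind of prediction such a model gives when the environment changes the uptake rate.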
Modeling Regulation
From the genome sequence we can predict which genes code for transcription factors and where they bind
They form a putative regulatory network.
But current methods produce too many false positives
We expected ~4K regulations. We got 25K regulations.
I integrate this model with microarray data to find the "most probable" regulatory network using a parsimony criterion
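The general flavour of a parsimony criterion can be shown with a toy example. The sketch below is not the method used in this work: it is a plain greedy set cover, with invented gene and transcription factor names, that keeps a small set of putative regulators sufficient to explain the genes whose expression changed.

```python
def parsimonious_regulators(candidates, changed_genes):
    """Greedy set cover: repeatedly keep the transcription factor whose
    putative targets explain the most still-unexplained genes."""
    unexplained = set(changed_genes)
    chosen = []
    while unexplained:
        # sorted() makes ties resolve alphabetically, so runs are repeatable
        best = max(sorted(candidates),
                   key=lambda tf: len(candidates[tf] & unexplained))
        gained = candidates[best] & unexplained
        if not gained:
            break  # the remaining genes have no candidate regulator
        chosen.append(best)
        unexplained -= gained
    return chosen, unexplained


# Hypothetical putative network: transcription factor -> predicted targets.
candidates = {
    "tfA": {"g1", "g2", "g3"},
    "tfB": {"g3", "g4"},
    "tfC": {"g1"},
    "tfD": {"g4", "g5"},
}
changed = {"g1", "g2", "g3", "g4"}   # genes that changed in the microarray

print(parsimonious_regulators(candidates, changed))
```

Here two of the four candidate regulators are enough to explain all observed changes, so the other predicted regulations are treated as likely false positives. A greedy choice is only an approximation of the smallest explanation.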
Systems Biology
beyond Bioinformatics
A very active research area that aims to understand the cell as a system with complex interactions
The focus is not on the genes, but on the genome
The key is to understand networks
regulatory
metabolic
signaling
protein-protein interaction
The present
Why Computers in Molecular Biology and Genetics?
DNA is digital information
All experimental values in science are measured with an observational error
(e.g. temperature is 10.2 ± 0.05 °C, pressure is 101215 ± 125 Pa)
Except genetic sequences: nucleotides are either A, C, T or G.
There is no "average" or "intermediate case"
So it is natural to use computers and information theory to model DNA
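A short sketch of what "digital" means here, using only the Python standard library. With four possible symbols, each nucleotide carries at most two bits, so a sequence can be stored exactly as an integer, and the Shannon entropy of its composition can be measured in bits per base. The function names are illustrative choices, not part of any library.

```python
from collections import Counter
from math import log2


def entropy_bits(sequence):
    """Shannon entropy of the nucleotide composition, in bits per base."""
    counts = Counter(sequence)
    total = len(sequence)
    return sum(n / total * log2(total / n) for n in counts.values())


def pack(sequence):
    """Encode a DNA string exactly as an integer, two bits per nucleotide."""
    code = {"A": 0, "C": 1, "G": 2, "T": 3}
    value = 0
    for base in sequence:
        value = value * 4 + code[base]
    return value


print(entropy_bits("ACGTACGT"))  # all four bases equally frequent
print(entropy_bits("AAAAAAAA"))  # a single repeated base
print(pack("ACGT"))
```

The first sequence reaches the maximum of two bits per base and the second carries none, and the packed value is exact: there is no measurement error to propagate.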
but there is another reason ...
Science converges to Molecular Biology
Physicists, mathematicians, computer scientists and engineers turned their attention to molecular biology questions.
They come looking with new eyes and creating new theoretical and practical tools.
Molecular Biology has always interacted with other disciplines
Just consider the word "Biochemistry"
The Internet makes Molecular Biology theory accessible to more people
Before Internet times
top science was accessible only to researchers with money to
make complex experiments or
buy expensive books and journals
finding references took several weeks by regular mail
Professors had the only copy of the textbooks
Today
all journals are accessible on-line
references are downloaded in minutes at low cost
free when the article is Open Access
experimental results of each article are also free
Anyone can analyze this data
Structured data is easy to process to discover new knowledge.
The software for this meta-analysis is also Open Source
Scientists can adapt the program's internal code to answer their specific question
Anyone can download these programs without cost.
If the analysis requires big computational power you can rent it at low cost
You don't need your own super-computer
You can rent Cloud computers
Companies like Amazon.com and Google sell their spare computer power at low prices
This enables researchers to carry out computations that would be impossible otherwise.
The World is Flat
This democratization of knowledge provides an exciting challenge.
Rich countries no longer have the monopoly of knowledge.
We can be players in the big leagues, on a level playing field.
We can read the same books and the same articles, use the same machines and the same programs.
Anyone could make the next scientific breakthrough, either in New York, New Delhi or Istanbul.
But the same opportunity presents itself to everyone else.
There are more PhD students than ever
And many of them will be in Molecular Biology
Cyranoski et al. 2011. “Education: The PhD Factory.” Nature 472: 276–79.
More players come to the game
Emerging economies push up the number of researchers worldwide
India graduates more than a million engineers each year. Many of them in biotechnology
Egypt has 35,000 PhD students and Israel 10,000.
Many of them will find jobs in Molecular Biology companies or academia
Hays, Thomas. 2011. “PhDs: Israel Also Trains Plenty.” Nature 473 (7347): 284.
How will we be different?
Success of Molecular Biology generates Big Data
Advances in molecular biology technology have produced
new generation sequencers
microarrays
mass spectrometers
real-time PCR
They produce
reproducible experimental results
in big volumes
at low cost
Data production cost is falling
National Human Genome Research Institute. http://genome.gov/sequencingcosts
Extracting Information from Raw Data
Surviving the Data Tsunami
In a few years we went from a lack of data to an excess of it
We need to learn how to extract biological meaning from big volumes of data
Classical methods are not enough
What is significant? What is the "null hypothesis"?
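One simple way to make the null hypothesis explicit is a permutation test. The sketch below uses invented expression values and assumes the null hypothesis "the labels treated and control are arbitrary". It enumerates every possible relabeling and asks how often a difference at least as large as the observed one appears by chance. It is an illustration of the question, not a recommendation for any particular dataset.

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Exact one-sided permutation test for a difference in means.
    Null hypothesis: the group labels are arbitrary."""
    pooled = group_a + group_b
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    observed = sum(group_a) / n_a - sum(group_b) / n_b
    extreme = count = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = sum_a / n_a - (total - sum_a) / n_b
        count += 1
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
    return extreme / count


# Invented expression values for one gene in two conditions.
treated = [5.1, 4.9, 5.6, 5.8]
control = [3.9, 4.2, 4.4, 4.0]

print(permutation_p_value(treated, control))
```

With four samples per group there are 70 possible relabelings, and only the real one puts the four highest values together, so the p-value is 1/70. Enumerating all relabelings is only feasible for small samples; larger ones would need random sampling of permutations.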
If we don't fully analyze our own experimental data, someone else will
And they will publish it
The plan
what we will teach
Teaching "Introduction to Data Science"
The students will learn
how to handle experimental data
how to communicate with scientists of other data-oriented disciplines
how to produce publication quality reports with reproducible results
How to get raw data, extract relevant information and filter it using several selection criteria.
How to store and retrieve it in efficient and useful ways.
How to transform it, organize it, categorize it, display it and understand the results.
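A minimal sketch of these steps, using only the Python standard library. The measurements, column names and the "cold" condition are invented for the example: the raw text is parsed and filtered, stored in a small SQLite database, and then queried to obtain one number per gene.

```python
import csv
import io
import sqlite3

# Invented measurements standing in for a raw data file.
RAW = """gene,cond,level
geneA,control,10.0
geneA,cold,42.5
geneB,control,8.0
geneB,cold,7.5
geneC,control,
"""


def load_rows(text):
    """Get the raw data and filter out rows with a missing measurement."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        if rec["level"]:
            rows.append((rec["gene"], rec["cond"], float(rec["level"])))
    return rows


def store(rows):
    """Store the rows in an in-memory SQLite database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE expr (gene TEXT, cond TEXT, level REAL)")
    db.executemany("INSERT INTO expr VALUES (?, ?, ?)", rows)
    return db


def fold_changes(db):
    """Retrieve and transform: ratio cold/control for each gene
    measured in both conditions."""
    query = """
        SELECT c.gene, s.level / c.level
        FROM expr AS c JOIN expr AS s ON c.gene = s.gene
        WHERE c.cond = 'control' AND s.cond = 'cold'
        ORDER BY c.gene
    """
    return db.execute(query).fetchall()


db = store(load_rows(RAW))
for gene, ratio in fold_changes(db):
    print(gene, ratio)
```

The gene with a missing value is dropped at the filtering step and never reaches the database, so every later result can be traced back to a complete measurement.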
Teaching "Scientific Computing"
Teach Python and BioPython to analyze, model, evaluate and predict the behavior of genomic and molecular biology entities.
The students should be able to interact with high end servers, use command line tools and be comfortable in computing environments other than Microsoft Windows.
Tools include Unix command line tools, SQL and the R statistical package.
The student should be able to understand how computer networks work and what their limitations are.
The idea is not to be experts on computers, but to have the concepts and language to work in interdisciplinary groups
Let's start learning Data Science
To test these ideas, next week we start an
Introduction to Data Science Workshop
The mathematical tools can be explored together with the biological context, so they make sense and are easier to learn.
I will give you a link at the end of this talk.
If you are interested, visit the webpage and send an email.
After all, maybe I'm just crazy
“Every normal student is capable of good mathematical reasoning if attention is directed to activities of his interest”
Jean Piaget, 1976
Swiss psychologist and philosopher
A Secret
You can also learn at home
Everything we will show is available on the Internet
You just need to look for it
But it is in English
Translation takes too long
Translated science is obsolete science
The Future
My personal prediction
“It is hard to make predictions, especially about the future”
Danish proverb
Molecular Biology has become mainstream
Genomic tools are also used outside academia.
Several companies provide "personalized DNA services".
23andMe, partially owned by Google.
The Genographic Project, created by the National Geographic Society and IBM.
Both offer to trace the ancestry and migrations of the human population.
Anyone can find out their true origins.
Molecular Biology will follow the path of computers
Today PCR thermocyclers are expensive devices found in universities and research centers, very much like desktop computers were in the 70's and 80's.
Nowadays computers are low-cost and found everywhere.
Will the same happen with PCR?
PCR future
Today only a few companies produce PCR thermocyclers, just as only a few companies produce smartphones such as the iPhone and Samsung phones.
Nevertheless you can see smartphones everywhere.
And this is a big opportunity for creators of software applications.
The value is in the apps. Ask Nokia or Blackberry.
“A computer on every desk and in every home, all running Microsoft software”
Bill Gates, Microsoft’s founding mission.
PCR is the new PC
Gates set this goal in the late 70's, when it was not obvious if people would even see a computer in their lives.
PCR technology is now in the same state that Personal Computers were in 1975.
If PCR machines become inexpensive,
and there is "a PCR on every desk and home",
in hospitals,
restaurants
and high schools,
then who will be making "software apps" for them?
If PCR machines are available everywhere
applications can be:
Determining ancestry (e.g. race horses, farm animals, fish)
Detection of unwanted organisms
Marker-assisted breeding
Food quality control (e.g. in a university canteen)
Security and control of Genetically Modified Organisms
Polymorphism detection
Clinical diagnosis
Personalized medicine
Police forensic analysis
Software for PCR
the specific parameters of an application
I think we should prepare our students to make these "apps".
They should have easy access to low-cost thermocyclers, and use them frequently and creatively.
Then, like in the computer industry, they may create completely new applications that we cannot foresee now.
DNA extraction protocols
Primer design
Amplification protocols
Detection methods
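As a hint of what such an "app" could contain, here is a small sketch of one primer design check. It uses the Wallace rule of thumb (2 °C for each A or T, 4 °C for each G or C), which is only a rough estimate for short oligonucleotides. The acceptance window and the example sequences are assumptions made for illustration, not values from this talk.

```python
def gc_fraction(primer):
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Rough melting temperature in °C by the Wallace rule:
    2 for each A or T, 4 for each G or C."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def acceptable(primer, tm_range=(50, 65), gc_range=(0.4, 0.6)):
    """Hypothetical acceptance window; both ranges are assumptions."""
    tm_ok = tm_range[0] <= wallace_tm(primer) <= tm_range[1]
    gc_ok = gc_range[0] <= gc_fraction(primer) <= gc_range[1]
    return tm_ok and gc_ok


# Invented candidate primers.
for primer in ["AGCGGATCACTTCACACAGG", "ATATATATATATATATATAT"]:
    print(primer, wallace_tm(primer), gc_fraction(primer), acceptable(primer))
```

A complete tool would add checks for primer pairs, self-complementarity and specificity against the target genome, but the structure stays the same: the protocol parameters become software.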
New tools for new science
New Instruments trigger advances in Molecular Biology
and in other sciences
They are usually named after their inventor
Galileo created modern science when he made his own telescope
Newton also invented a new kind of telescope, still used today
Bunsen enabled spectrometry analysis with his burner
Svedberg ultracentrifuge (16S)
Sanger DNA sequencing method
Southern blot method for specific DNA detection
PCR to amplify DNA samples
Scientific Instrumentation
I propose to create a course on "Scientific Instrumentation", initially using software tools.
Making instruments is now "software", not craftsmanship.
We can understand this with a biological analogy.
Designs in digital files are like genes.
3D printers are like ribosomes, producing physical versions of the design.
Online collaboration is like evolution: designs are changed to improve their fitness.
It is not rocket science
It is not heart surgery
Thank you (Teşekkür ederim)
http://anaraven.github.io/data-science-workshop/