2013 alumni-webinar
Uploaded by ctitusbrown
Outline
1. Genetics 101 and 102 - what you need to know.
2. Marek’s Disease – chicken cancer.
3. Generating lots of data – the sequencing revolution.
4. The problems of data analysis and data integration.
5. Some preliminary results on Marek’s Disease.
6. An apparent digression: chess and computers.
7. My actual research :)
Genetics 101: DNA to RNA to protein to phenotype…
http://commons.wikimedia.org/wiki/File:Spombe_Pop2p_protein_structure_rainbow.png; http://commons.wikimedia.org/wiki/File:Protein_CA2_PDB_12ca.png
…plus diploidy (2x each chromosome)
…plus regulation and interaction.
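As a toy illustration of the DNA to RNA to protein step above, here is a minimal Python sketch. It assumes the input is the coding strand, uses the standard genetic code, and ignores splicing, regulation, and diploidy entirely. The function names `transcribe` and `translate` are made up for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Coding-strand DNA to mRNA: same sequence with T replaced by U."""
    return dna.upper().replace("T", "U")


def translate(rna):
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        codon = rna[i:i + 3].replace("U", "T")  # table is keyed by DNA letters
        aa = CODON_TABLE[codon]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate(transcribe("ATGGCCTGA")))  # prints "MA"
```

Phenotype is the part that does not fit in a few lines of code, which is where the rest of the talk comes in.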
PHYSICAL AGENTS
INFECTIOUS AGENTS
HORMONES
RADIATION
CANCER
GENETIC FACTORS
CHEMICAL CARCINOGENS
LIFESTYLE FACTORS
(slide courtesy Suga Subramanian)
Herpesvirus and Cancer
• Epstein-Barr Virus
  – Burkitt’s lymphoma
  – Hodgkin’s lymphoma
  – Nasopharyngeal carcinoma
• Herpes Virus-8
  – Kaposi’s sarcoma
  – Multicentric lymphoma
• Mardivirus
  – Marek’s Disease
• Viral neoplastic disease
• Alpha-herpesvirus
• Model for Burkitt’s lymphoma
(slide courtesy Suga Subramanian)
Clinical Signs: Asymmetric Paralysis
http://partnersah.vet.cornell.edu/avian-atlas/
Visceral Lymphoma: Liver
(image labels: NORMAL, LYMPHOMA)
Courtesy: John Dunn, USDA
Importance of Marek’s Disease
• Agricultural Impact
  – Economic losses (2 billion)
  – Viral evolution: Increased virulence
  – Current Vaccines: Not enough
  – Long term viral persistence
• Model System
  – Human herpes viral infections
  – Viral induced lymphoma
(slide courtesy Suga Subramanian)
MAREK’S DISEASE VIRUS (MDV)
INBRED CHICKEN LINES
MD-RESISTANT LINE: LINE 62
MD-SUSCEPTIBLE LINE: LINE 73
GENETIC RESISTANCE TO MAREK’S DISEASE?
(slide courtesy Suga Subramanian)
What happens when we infect?
…how does the virus specifically interact with genes?
…and what are the mechanisms of resistance?
Digression: DNA sequencing
• Observation of actual DNA sequence
• Counting of molecules
Image: Werner Van Belle
Fast, cheap, and easy to generate.
Image: Werner Van Belle
Applying sequencing to Marek’s Disease
Differentially expressed genes (DEG) due to infection
Gene GO Analysis, IPA Pathway Analysis
DEGs in Md5-infected and not in Md5ΔMeq-infected groups?
  YES: Meq-dependent DEGs
  NO: DEGs not dependent on Meq
Among Meq-dependent DEGs:
  DEGs in Line 6 and not in Line 7? YES: Meq-dependent DEGs involved in MD resistance
  DEGs in Line 7 and not in Line 6? YES: Meq-dependent DEGs involved in MD susceptibility
  NO to both: Meq-dependent DEGs common to both lines
Back to Marek’s disease:
(slide courtesy Suga Subramanian)
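The DEG flowchart boils down to a few set operations. Here is a minimal sketch in Python, assuming each condition has already been reduced to a set of DEG names per chicken line. The gene names (`g1` to `g5`), the dictionary layout, and the function name are invented for illustration and are not from the actual analysis.

```python
def classify_meq_dependent(md5, md5_delta_meq):
    """Split Meq-dependent DEGs into resistance, susceptibility, and common.

    md5 and md5_delta_meq map a line name ("line6", "line7") to the set of
    DEGs seen after infection with Md5 or with Md5 lacking Meq.
    """
    # Meq-dependent: differentially expressed with Md5 but not with Md5ΔMeq.
    meq_dep = {
        line: md5[line] - md5_delta_meq[line] for line in ("line6", "line7")
    }
    return {
        "resistance": meq_dep["line6"] - meq_dep["line7"],
        "susceptibility": meq_dep["line7"] - meq_dep["line6"],
        "common": meq_dep["line6"] & meq_dep["line7"],
    }


# Hypothetical toy input, not real data.
md5 = {"line6": {"g1", "g2", "g3"}, "line7": {"g2", "g4", "g5"}}
delta = {"line6": {"g3"}, "line7": {"g5"}}
result = classify_meq_dependent(md5, delta)
print(result)
```

With this toy input, `g1` lands in "resistance", `g4` in "susceptibility", and `g2` in "common".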
LINE 6
MD-RESISTANCE: ROLE OF MEQ
MDV MDV-no Meq
Genes involved in MD-resistance that are regulated by Meq
Genes involved in MD-resistance that are not regulated by Meq
1031 1670
(slide courtesy Suga Subramanian)
Pathway Analysis: MD resistance
(slide courtesy Suga Subramanian)
LINE 7
MD-SUSCEPTIBILITY: ROLE OF MEQ
MDV MDV-no Meq
Genes involved in MD-susceptibility that are regulated by Meq
Genes involved in MD-susceptibility that are not regulated by Meq
650 540
(slide courtesy Suga Subramanian)
Pathway Analysis: MD susceptibility
(slide courtesy Suga Subramanian)
Next problem: data analysis & integration!
• Once you can generate virtually any data set you want…
• …the next problem becomes finding your answer in the data set!
• Think of it as a gigantic NSA treasure hunt: you know there are terrorists out there, but to find them you have to hunt through 1 bn phone calls a day…
Digression: “Heuristics”
• What do computers do when the answer is either really, really hard to compute exactly, or actually impossible?
• They approximate! Or guess!
• The term “heuristic” refers to a guess, or shortcut procedure, that usually returns a pretty good answer.
Often explicit or implicit tradeoffs between compute “amount” and quality of result
http://www.infernodevelopment.com/how-computer-chess-engines-think-minimax-tree
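To make the compute-versus-quality tradeoff concrete, here is a minimal sketch of depth-limited game-tree search. It assumes a toy game instead of chess: players alternately take 1, 2, or 3 stones, and whoever takes the last stone wins. When the search runs out of depth, the "heuristic" simply guesses 0 ("don't know"). The game, the function name, and the scoring convention are all choices made for this example.

```python
def negamax(stones, depth):
    """Return (score, nodes_visited) from the point of view of the player to move.

    score is +1 for a forced win, -1 for a forced loss, and 0 when the
    depth cutoff was hit and the heuristic could only guess.
    """
    if stones == 0:
        return -1, 1  # the previous player took the last stone and won
    if depth == 0:
        return 0, 1   # out of compute budget: heuristic guess
    best, nodes = -1, 1
    for take in (1, 2, 3):
        if take <= stones:
            child_score, child_nodes = negamax(stones - take, depth - 1)
            best = max(best, -child_score)  # the opponent's loss is our win
            nodes += child_nodes
    return best, nodes


shallow = negamax(10, depth=2)   # cheap, but only a guess
deep = negamax(10, depth=10)     # exact, but visits many more positions
print(shallow, deep)
```

The shallow search touches a handful of positions and cannot tell who is winning; the deep search gets the exact answer at a much higher cost. Real chess engines sit in between, with far better guesses at the cutoff.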
My actual research focus
What we do is think about ways to get computers to play chess better, by:
– Identifying better ways to guess;
– Speeding up the guessing process;
– Improving people’s ability to use the chess playing computer
Now, replace “play chess” with “analyze biological data”...
My actual research focus…
We build tools that help experimental biologists work efficiently and correctly with large amounts of data, to help answer their scientific questions.
This touches on many problems, including:
• Computational and scientific correctness.
• Computational efficiency.
• Cultural divides between experimental biologists and computational scientists.
• Lack of training (biology and medical curricula devoid of math and computing).
Not-so-secret sauce: “digital normalization”
• One primary step of one type of data analysis becomes 20-200x faster, 20-150x “cheaper”.
http://en.wikipedia.org/wiki/JPEG
Lossy compression
Restated:
Can we use lossy compression approaches to make downstream analysis faster and better? (Yes.)
~2 GB – 2 TB of single-chassis RAM
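The slides do not spell out how digital normalization decides what to throw away. One common description of the idea is: stream through the reads, and keep a read only if the median count of its k-mers seen so far is below a coverage cutoff. The sketch below assumes that description. It uses an exact in-memory counter, tiny made-up reads, and arbitrary values for `k` and `cutoff`, and it ignores reverse complements, so treat it as a toy and not as the lab's actual software.

```python
from collections import Counter
from statistics import median


def kmers(read, k):
    """All overlapping substrings of length k in a read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def digital_normalization(reads, k=4, cutoff=2):
    """Keep a read only while its estimated coverage is below the cutoff."""
    counts = Counter()
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k: nothing to estimate from
        # Median k-mer count so far is the coverage estimate for this read.
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)
    return kept


# Hypothetical input: one sequence sampled five times, another sampled once.
reads = ["ACGTTGCA"] * 5 + ["GGATCCTA"]
kept = digital_normalization(reads)
print(kept)
```

Here three of the five redundant copies are dropped while the rare read survives. That is the lossy-compression flavor: high-coverage redundancy goes away, low-coverage information stays.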
Some diginorm examples:
1. Assembly of the H. contortus parasitic nematode genome.
2. Assembly of two Midwest soil metagenomes, Iowa corn and Iowa prairie.
3. Reference-free assembly of the lamprey (P. marinus) transcriptome.
1. The H. contortus problem
• A sheep parasite.
• ~350 Mbp genome
• Sequenced DNA from 6 individuals after whole genome amplification, estimated 10% heterozygosity (!?)
• Significant bacterial contamination.
(w/Robin Gasser, Paul Sternberg, and Erich Schwarz)
H. contortus life cycle
Refs.: Nikolaou and Gasser (2006), Int. J. Parasitol. 36, 859-868; Prichard and Geary (2008), Nature 452, 157-158.
Assembly after digital normalization
• Diginorm readily enabled assembly of a 404 Mbp genome with N50 of 15.6 kb;
• Post-processing led to 73-94% complete genome.
• Diginorm helped by making analysis possible.
  – Highly variable population.
  – Lots of contamination from microbes.
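For readers who have not met N50 before: it is a standard assembly statistic, the contig length such that contigs of that length or longer contain at least half of all assembled bases. A minimal sketch, with made-up contig lengths and a function name chosen for this example:

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half the bases."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("need at least one contig")


print(n50([2, 3, 4, 5, 6]))  # prints 5
```

In the toy input the two longest contigs (6 + 5 = 11) already cover more than half of the 20 total bases, so N50 is 5.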
Next steps with H. contortus
• Publish the genome paper
• Identification of antibiotic targets for treatment in agricultural settings (animal husbandry).
• Serving as “reference approach” for a wide variety of parasitic nematodes, many of which have similar genomic issues.
2. Soil metagenome assembly
A “Grand Challenge” dataset (DOE/JGI)
Putting it in perspective:
Total equivalent of ~1200 bacterial genomes
Human genome ~3 billion bp
Assembly results for Iowa corn and prairie (2x ~300 Gbp soil metagenomes)

Total Assembly | Total Contigs (> 300 bp) | % Reads Assembled | Predicted protein coding
2.5 bill       | 4.5 mill                 | 19%               | 5.3 mill
3.5 bill       | 5.9 mill                 | 22%               | 6.8 mill
Adina Howe
3. Sea lamprey gene expression
• Non-native
• Parasite of medium to large fishes
• Caused populations of host fishes to crash
Li Lab / Y-W C-D
Transcriptome results
• Started with 5.1 billion reads from 50 different tissues.
(4 years of computational research, and about 1 month of compute time, GO HERE)
• Final assembly contains ~95% of genes (est.)
• This is an extra 40% over previous work.
• Enabling studies in:
  – Basal vertebrate phylogeny
  – Biliary atresia
  – Evolutionary origin of brown fat (previously thought to be mammalian only!) – J Exp Biol. 2013
  – Pheromonal response in adults
What are the tissue level changes in gene expression that support regeneration?
Transcriptome analysis of a regenerating vertebrate after SCI
brain, spinal cord
RNA-Seq to determine differential expression profile after injury
Sampling >weekly
-/+ Dex
Ona Bloom
Challenges ahead
• We need more people working at the interface
  – “Priesthood” model doesn’t scale!
  – Cultural shifts in biology needed…
• We need more data!
  – Data often only makes sense in context of other data
  – This is a hard sell: “if you give us 1000x as much data, we might start to develop some idea of what it means.”
• We actually know very little about biology still!
Open science & sharing
• Science, and biology in particular, is in the middle of a transition to a “data intensive” field.
• The sharing ethos is not incentivized properly; you get more credit for discovering new stuff than for discoveries resulting from sharing.
• We are focused on sharing: methods, programs, educational materials…
Being disruptive?
Possible initiative from my lab:
“We will analyze your data for you if we can make your data openly available in 1 yr.”
Will it work, or sink like a stone? Ask me in a year.
MSU’s role in my research
• MSU provides nice infrastructure, great administrative support, and a truly excellent community (students, profs, and other researchers).
• MSU is also uniquely interdisciplinary in many ways; very few “hard” boundaries in biology research.
Credits
• Marek’s Disease: Suga Subramanian and Hans Cheng (USDA)
• Haemonchus: Erich Schwarz (Caltech/Cornell), Paul Sternberg (Caltech), Robin Gasser (U. Melbourne)
• Lamprey: Weiming Li (MSU), Ona Bloom (Feinstein), Jen Morgan (MBL/Woods Hole)
• Great Prairie: Jim Tiedje (MSU), Janet Jansson (LBL), Susanna Tringe (Joint Genome Inst.)
Funding: MSU; USDA; NSF; NIH.
Drop me a line – [email protected]