Cracking the (bio)code -- Professional Development Session at SACNAS 2014

Posted on 18-Jul-2015



Transcript of Cracking the (bio)code -- Professional Development Session at SACNAS 2014

Cracking the (bio)code: Resources for research careers in computational biology & bioinformatics

Felipe Zapata, PhD, Brown University, @zapata_f

Conner Sandefur, PhD, Univ. North Carolina, @oshehoma

Emilia Huerta-Sanchez, PhD, Univ. California, Merced, @emiliahsc

Tracy Heath, PhD, Iowa State Univ., @trayc7

Visit our website: crackingthebiocode.github.io
● Information about the session
● Resources for learning to program: workshops, online courses, tutorials, etc.
● Links to many degree programs in the U.S. for studying computational biology/bioinformatics
● Profiles of computational biologists and bioinformaticians

How small changes can make a big difference

Bioinformatics @UNC-Pembroke: Investigating how changes in gene expression drive system-wide behavior

Computational Biology @UNC-Chapel Hill: Predicting therapies to improve mucus clearance in cystic fibrosis (CF) and chronic obstructive pulmonary disease (COPD)

[Figure: panels labeled 1 hr and 24 hrs, with a scale from -4 to 4]

Tools I use:

Dr. Conner I. Sandefur
SPIRE Postdoctoral Scholar at UNC-CH
Visiting Assistant Professor at UNCP

PhD Bioinformatics, University of Michigan, Ann Arbor, Michigan

BA Computer Science, George Washington University, Washington, DC

email: sandefur@email.unc.edu
web: http://www.unc.edu/~sandefur
twitter: @oshehoma

What is the evolutionary history of species? Using transcriptomes and genomes to resolve ancient animal radiations

Phylogeny of snails, slugs, and relatives

What genes are homologous? Using graph-based approaches to infer homology

Gene clusters inferred to be the “same” gene family across multiple species
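The graph-based idea can be illustrated with a minimal Python sketch. It assumes that pairwise similarity between genes has already been computed (for example, by a sequence-similarity search) and thresholded into a list of similar pairs. Genes become nodes, similar pairs become edges, and each connected component is treated as one putative gene family. This is a deliberate simplification rather than AGALMA's method: production pipelines typically use more refined graph clustering (Markov clustering is a common choice), and the gene IDs below are hypothetical.

```python
"""Group genes into putative gene families via connected components
of a similarity graph (a simplified stand-in for graph clustering)."""
from collections import defaultdict


def gene_families(similar_pairs):
    """Return gene families as a sorted list of sorted gene-ID lists.

    similar_pairs: iterable of (gene_a, gene_b) tuples judged similar.
    Genes that appear in no pair are not reported.
    """
    graph = defaultdict(set)
    for a, b in similar_pairs:
        graph[a].add(b)
        graph[b].add(a)

    seen = set()
    families = []
    for start in graph:
        if start in seen:
            continue
        # Iterative depth-first search collects one connected component.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        families.append(sorted(component))
    return sorted(families)


if __name__ == "__main__":
    # Hypothetical gene IDs of the form species_gene.
    pairs = [
        ("snail_g1", "slug_g7"),
        ("slug_g7", "limpet_g3"),
        ("snail_g2", "slug_g9"),
    ]
    for family in gene_families(pairs):
        print(family)
```

Connected components are easy to compute, but a single spurious edge can chain two unrelated families together, which is one reason dedicated clustering methods are preferred in practice.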

AGALMA: https://bitbucket.org/caseywdunn/agalma (BitBucket, Git)

Dr. Felipe Zapata
Postdoctoral Research Associate, Brown University

COLOMBIA

email: felipe_zapata@brown.edu
web: http://felipezapata.me
twitter: @zapata_f

PhD Ecology, Evolution & Systematics, University of Missouri-St. Louis, St. Louis, Missouri

BSc Biology, Universidad de Los Andes, Bogotá, Colombia

What does genetics tell us about human history?

Dr. Emilia Huerta-Sanchez
Assistant Professor, UC Merced

email: ehuerta-sanchez@ucmerced.edu
web: http://www.stat.berkeley.edu/~emiliahs
twitter: @emiliahsc

Postdoc in Integrative Biology and Statistics, UC Berkeley, Berkeley, CA

PhD Applied Mathematics, Cornell University, Ithaca, NY

BA Mathematics & French, Mills College, Oakland, CA

Modeling macro- & molecular evolutionary processes to infer phylogenetic relationships

● How have rates of molecular and morphological evolution changed across the tree of life?
● How do patterns of fossilization, preservation, and recovery change across different taxa?
● Can we detect relationships between geological events and species diversification?
● What are the evolutionary processes acting on different regions of the genome, and how have those factors shaped the evolution of different genes?

C++, RevBayes

Probabilistic graphical models

Dr. Tracy A. Heath
Assistant Professor (Jan. 2015), Iowa State University

email: trayc7@gmail.com
web: phyloworks.org
twitter: @trayc7

Postdoctoral Fellow, U. Kansas & U.C. Berkeley

PhD Ecology, Evolution & Behavior, University of Texas at Austin

BA Biology, Boston University

What is Computational Biology?

What is Bioinformatics?

http://crackingthebiocode.github.io/

Modeling infectious disease transmission

Compartmental models are one type of mathematical model used to investigate the spread of infectious disease

Susceptible (S) → Infected (I) → Recovered (R)
(S → I at the rate of infection, β; I → R at the rate of recovery)

Change in proportion of susceptible (S) people over time:

$$\frac{dS}{dt} = -\beta \, S \, I$$
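The full compartmental model can be simulated in a few lines of Python. The sketch below is a minimal version, not the presenters' code: it assumes the standard SIR equations dS/dt = -βSI, dI/dt = βSI - γI, and dR/dt = γI, where γ (a symbol introduced here) is the rate of recovery, and it integrates them with a simple fixed-step Euler method. The parameter values are illustrative only; with β = 0.2 and γ = 0.1 per day, R0 = β/γ = 2.

```python
"""Minimal SIR compartmental model integrated with a fixed-step Euler method."""


def simulate_sir(beta, gamma, s0=0.99, i0=0.01, days=100.0, dt=0.1):
    """Return a list of (t, S, I, R) tuples, with S, I, R as proportions.

    beta:  rate of infection (per day)
    gamma: rate of recovery (per day), an assumed symbol
    """
    s, i, r = s0, i0, 1.0 - s0 - i0
    history = [(0.0, s, i, r)]
    for step in range(1, int(round(days / dt)) + 1):
        new_infections = beta * s * i * dt  # flow S -> I: beta * S * I
        new_recoveries = gamma * i * dt     # flow I -> R: gamma * I
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((step * dt, s, i, r))
    return history


if __name__ == "__main__":
    # Illustrative parameters: R0 = beta / gamma = 0.2 / 0.1 = 2.
    trajectory = simulate_sir(beta=0.2, gamma=0.1)
    peak_t, _, peak_i, _ = max(trajectory, key=lambda row: row[2])
    print(f"Peak infected proportion {peak_i:.3f} on day {peak_t:.1f}")
```

Because every person who leaves one compartment enters another, S + I + R stays equal to 1 at every step. Euler integration is chosen for transparency; a library ODE solver would be more accurate for the same step size.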

Infection dynamics for different diseases can be simulated by selecting appropriate parameters

We can use models to predict how interventions change disease transmission dynamics

Infection dynamics with R0 = 2

Infection dynamics after an intervention at day 10 that reduced R0 to 0.8

R0 > 1: infection peaks, then disappears
R0 < 1: infection dies out
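The intervention scenario can be sketched with the standard SIR update by switching the transmission rate partway through the run. This is an illustrative reconstruction, not the code behind the slides: it assumes a recovery rate γ = 0.1 per day and sets β = R0 × γ, so that R0 drops from 2 to 0.8 on day 10. Once R0 is below 1, the growth term βS - γ is negative (because S ≤ 1), so the infected proportion shrinks at every step after the intervention.

```python
"""SIR sketch of an intervention that lowers R0 from 2 to 0.8 on day 10."""


def simulate_intervention(r0_before=2.0, r0_after=0.8, switch_day=10.0,
                          gamma=0.1, s0=0.99, i0=0.01, days=100.0, dt=0.1):
    """Return a list of (t, S, I, R) tuples.

    The transmission rate is beta = R0 * gamma, where R0 switches from
    r0_before to r0_after at switch_day (all values are illustrative).
    """
    s, i, r = s0, i0, 1.0 - s0 - i0
    history = [(0.0, s, i, r)]
    for step in range(1, int(round(days / dt)) + 1):
        t_start = (step - 1) * dt  # time at the start of this Euler step
        r0 = r0_before if t_start < switch_day else r0_after
        beta = r0 * gamma
        infections = beta * s * i * dt
        recoveries = gamma * i * dt
        s, i, r = s - infections, i + infections - recoveries, r + recoveries
        history.append((step * dt, s, i, r))
    return history


if __name__ == "__main__":
    with_intervention = simulate_intervention()
    # Passing r0_after == r0_before gives the no-intervention baseline.
    baseline = simulate_intervention(r0_after=2.0)
    print("Peak I with intervention:   ", max(row[2] for row in with_intervention))
    print("Peak I without intervention:", max(row[2] for row in baseline))
```

Modeling the intervention as a drop in β is one simple choice; the same function could instead lower R0 gradually or switch it back later to explore other scenarios.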

Simulations were run in Python 3.4 (installed as part of the Anaconda distribution: http://continuum.io/downloads)

Agalma: automated and reproducible phylogenetic analyses

From… a few key genes (e.g., 16S rRNA, mitochondrial and chloroplast genes) across many species

To… high-throughput sequencing of 1000s of genes across many species

[Figure: species × genes data matrices, before and after]

Phylogenetics

Challenges to phylogenetics
• Many steps
• Many programs must be used together
• Computationally intensive
• Difficult to reproduce


Automate!

Why automate?
• Results are reproducible
• Results can be easily explored and extended
• Methods can be compared in a controlled setting
• Method development is easier because you don't have to reinvent everything

The tool: https://bitbucket.org/caseywdunn/agalma

The paper

The example analysis: https://bitbucket.org/caseywdunn/dunnhowisonzapata2013/

For each transcriptome:
• Quality control
• Assemble transcriptome
• Translate and annotate genes
• Quantify gene expression
• Put sequences in database

Can also:
• Import DNA sequences from national databases (e.g., NCBI)
• Process externally produced assemblies

Across transcriptomes (many species):
• Identify homologous genes
• Build phylogenies using all genes!
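One common way to build a phylogeny from many genes is the supermatrix approach: per-gene alignments are concatenated into a single species-by-site matrix, with genes missing from a species coded as missing data. The Python sketch below illustrates only that concatenation step, assuming each gene is already aligned; it is not Agalma's implementation, and the gene and species names are hypothetical. The returned partition ranges (0-based, end-exclusive) record where each gene sits, so per-gene models can still be applied downstream.

```python
"""Concatenate per-gene alignments into a species-by-site supermatrix."""


def concatenate(alignments, missing="?"):
    """Build a supermatrix from per-gene alignments.

    alignments: dict of gene name -> {species: aligned sequence}; every
                sequence within one gene must have the same length.
    missing:    character used to pad species absent from a gene.
    Returns (matrix, partitions): matrix maps species -> concatenated string;
    partitions is a list of (gene, start, end), 0-based and end-exclusive.
    """
    species = sorted({sp for aln in alignments.values() for sp in aln})
    rows = {sp: [] for sp in species}
    partitions = []
    position = 0
    for gene in sorted(alignments):
        aln = alignments[gene]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: expected one alignment length, got {lengths}")
        length = lengths.pop()
        for sp in species:
            # Species without this gene get a block of missing-data characters.
            rows[sp].append(aln.get(sp, missing * length))
        partitions.append((gene, position, position + length))
        position += length
    matrix = {sp: "".join(parts) for sp, parts in rows.items()}
    return matrix, partitions


if __name__ == "__main__":
    # Hypothetical toy alignments for two genes across three species.
    alignments = {
        "geneA": {"snail": "ACGT", "slug": "ACGA"},
        "geneB": {"snail": "TTG", "limpet": "TTC"},
    }
    matrix, partitions = concatenate(alignments)
    for sp, seq in matrix.items():
        print(sp, seq)
    print(partitions)
```

Concatenation is only one strategy; an alternative is to infer a tree per gene and then combine the gene trees, which handles conflicting gene histories differently.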

Silhouette images from http://phylopic.org/

What tools do you need?


• A biological question
• Programming skills
• Statistical modeling
• C++
• A mathematical model

Questions?

• What programming language should I learn?
• How do I get started learning a programming language?
• What is the best way to become proficient in a programming language?
• What is the difference between C++ and Python and Java and R and MATLAB and Ruby and ...?
• What is version control? Do I need to know it?
• Do I need a GitHub account?
• Where are jobs or degree programs in computational biology/bioinformatics listed?
• What does it mean to be open source? Why is it important?
• and ...?


Take-Home Messages
• You don’t have to be an expert programmer to do computational biology.
• Anyone can learn to program; it’s just a matter of getting started.
• Computational skills are extremely helpful for streamlining biology research.
• The skills you need to learn depend heavily on your background and your research interests.
• Quantitative skills (a firm understanding of math and statistics) are important for any research field.
• Don’t be overwhelmed by all there is to know; these skills grow over time. If you consistently seek to improve them and use them in your work, you will be amazed at how your expertise develops.


Find out more! http://crackingthebiocode.github.io/profiles.html