Workflows for Biological Research at Notre Dame

Workflows for biological research at Notre Dame Sandra Gesing [email protected]

Transcript of Workflows for Biological Research at Notre Dame

Page 1: Workflows for Biological Research at Notre Dame

Workflows for biological research at Notre Dame
Sandra Gesing
[email protected]

Page 2: Workflows for Biological Research at Notre Dame

Biological Research at Notre Dame
• genomics
• proteomics
• molecular simulations
• docking
• disease modeling

[Figure: Black Swallowtail - larvae and butterfly]

Page 3: Workflows for Biological Research at Notre Dame

Molecular Simulations and Docking
• Prediction and analysis of molecular structures
• Numerous applications, e.g.
  – Materials science
  – Drug design

[Figure labels: ligands, target, docking, ?]

Page 4: Workflows for Biological Research at Notre Dame

Molecular Simulations and Docking
• Prediction and analysis of molecular structures
• Numerous applications, e.g.
  – Materials science
  – Drug design

[Figure labels: ligands, target, docking, binding energy, scoring functions, binding pocket]

Page 5: Workflows for Biological Research at Notre Dame

Disease Modeling
• vector-borne diseases, e.g., lymphatic filariasis, malaria
• mathematical models
• prediction of interventions
• data on weather, demographics, interventions

Page 6: Workflows for Biological Research at Notre Dame

Biological Research at Notre Dame
• technologies and methods for creating, analyzing, and predicting data are available
• immense amount of data, e.g.,
  – ZINC database: ~20 million molecular structures
  – Human genome: ~3 billion DNA base pairs
• compute-intensive tasks

Page 7: Workflows for Biological Research at Notre Dame

Workflows
• a sequence of connected steps in a defined order based on their control and data dependencies

[Figure, including a DNA sequence excerpt:]
12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt
12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt
12301 gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct
12361 gactaattat gttgagcttg ttaccattta gacaacttca ttagagaag atagaatcaa

Figure copied from: Stuart Owen "Workflows with Taverna"
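
The workflow definition on this slide can be made concrete with a minimal Python sketch. It is not taken from the talk: the step names are hypothetical, and the standard-library `graphlib.TopologicalSorter` (Python 3.9+) is used here only to illustrate how data dependencies determine a valid execution order.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each key is a step, each value is the set of
# steps whose output it needs (its data dependencies).
workflow = {
    "fetch_sequence": set(),
    "align": {"fetch_sequence"},
    "annotate": {"fetch_sequence"},
    "report": {"align", "annotate"},
}


def execution_order(steps):
    """Return one order in which every step runs after its dependencies."""
    return list(TopologicalSorter(steps).static_order())


order = execution_order(workflow)
print(order)
```

In this sketch `align` and `annotate` depend only on `fetch_sequence`, so a workflow engine would be free to run them in either order or concurrently.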

Page 8: Workflows for Biological Research at Notre Dame

Communities
• users are generally not IT specialists

Page 9: Workflows for Biological Research at Notre Dame

MoSGrid: Molecular Simulation Grid
• science gateway integrated with underlying compute and data management infrastructure
• distributed workflow management
• data repository

Page 10: Workflows for Biological Research at Notre Dame

MoSGrid  

Page 11: Workflows for Biological Research at Notre Dame

MoSGrid  

Page 12: Workflows for Biological Research at Notre Dame

MoSGrid – Application Areas
Molecular Dynamics
• Study and simulation of molecular motion
Quantum Chemistry
• Study and simulation of molecular electronic behavior relative to their chemical reactivity
Docking
• Main focus on evaluation of ligand-receptor interactions (e.g., for drug design)

Page 13: Workflows for Biological Research at Notre Dame

MoSGrid  

Page 14: Workflows for Biological Research at Notre Dame

MoSGrid  

Page 15: Workflows for Biological Research at Notre Dame

MoSGrid  

Page 16: Workflows for Biological Research at Notre Dame

VectorBase  

Page 17: Workflows for Biological Research at Notre Dame

VectorBase  

Page 18: Workflows for Biological Research at Notre Dame

VectorBase - Galaxy

Page 19: Workflows for Biological Research at Notre Dame

VectorBase - Galaxy

Page 20: Workflows for Biological Research at Notre Dame

Disease Modeling – Bayesian Model

Page 21: Workflows for Biological Research at Notre Dame

Disease Modeling – Bayesian Model

Page 22: Workflows for Biological Research at Notre Dame

An Old Idea: Makefiles

part1 part2 part3: input.data split.py
	./split.py input.data

out1: part1 mysim.exe
	./mysim.exe part1 >out1

out2: part2 mysim.exe
	./mysim.exe part2 >out2

out3: part3 mysim.exe
	./mysim.exe part3 >out3

result: out1 out2 out3 join.py
	./join.py out1 out2 out3 > result

Slide copied from: Douglas Thain "Toward a Common Model for Highly Concurrent Applications"
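
The idea behind the Makefile on this slide is that a target is rebuilt only when it is missing or older than one of its prerequisites. The Python sketch below illustrates that rule under simplifying assumptions: the function name `needs_rebuild` is hypothetical, all prerequisites are assumed to exist already, and the file names `part1` and `out1` are borrowed from the slide purely for illustration. It is not how Make itself is implemented.

```python
import os
import tempfile


def needs_rebuild(target, prerequisites):
    """Rebuild if the target is missing or older than any prerequisite."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)


def demo():
    """Exercise the rule on temporary files with explicit timestamps."""
    with tempfile.TemporaryDirectory() as workdir:
        part1 = os.path.join(workdir, "part1")
        out1 = os.path.join(workdir, "out1")

        open(part1, "w").close()
        missing = needs_rebuild(out1, [part1])  # out1 does not exist yet

        open(out1, "w").close()
        os.utime(part1, (1000, 1000))  # prerequisite older than target
        os.utime(out1, (2000, 2000))
        fresh = needs_rebuild(out1, [part1])

        os.utime(part1, (3000, 3000))  # prerequisite now newer than target
        stale = needs_rebuild(out1, [part1])
        return missing, fresh, stale


results = demo()
print(results)
```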

Page 23: Workflows for Biological Research at Notre Dame

Makeflow = Make + Workflow

[Diagram: Makeflow above Local, Condor, SGE, Work Queue]

• Provides portability across batch systems.
• Enable parallelism (but not too much!)
• Fault tolerance at multiple scales.
• Data and resource management.

Slide copied from: Douglas Thain "Toward a Common Model for Highly Concurrent Applications"

Page 24: Workflows for Biological Research at Notre Dame

Outlook
• creation of more workflows in science gateways
• integration of science gateways with ICTBioMed infrastructure
• integration of Makeflow and ICTBioMed infrastructure

Page 25: Workflows for Biological Research at Notre Dame

PARTNERS