Data Analysis Project Advanced Bioinformatics BIF-30806 2013.




Data Analysis Project

Advanced Bioinformatics BIF-30806

2013


Set Up

• Basic and Advanced Project
• Available data sets
• Deliverables
• Literature
• Groups
• Schedule week 3 & 4


Purpose

• Build a software pipeline to perform a transcriptome analysis

– Code to connect the tools and to do input/output conversions

– Code is developed on a specific data set, but should be able to run on different input (e.g. a different species)
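One way to keep the pipeline species-independent is to collect everything that changes between data sets in a single configuration object and generate the tool calls from it. This is a minimal sketch, assuming TopHat and Cufflinks are installed and on the PATH; `SpeciesConfig`, `build_commands`, `run_pipeline` and all file paths are hypothetical names chosen for illustration, and only the basic `-p`, `-G` and `-o` options of both tools are used.

```python
"""Sketch of a species-independent pipeline driver (hypothetical names).

Assumes TopHat and Cufflinks are on the PATH; all paths are placeholders.
"""
import os
import subprocess
from dataclasses import dataclass
from typing import List


@dataclass
class SpeciesConfig:
    # Hypothetical per-species settings: swapping this object should be
    # all that is needed to rerun the pipeline on another species.
    name: str
    bowtie_index: str    # prefix of the Bowtie index used by TopHat
    annotation_gtf: str  # reference annotation in GTF format
    reads: str           # FASTQ file with the RNA-seq reads


def build_commands(cfg: SpeciesConfig, out_root: str, threads: int = 4) -> List[List[str]]:
    """Return the tool invocations, in execution order, as argument lists."""
    tophat_dir = os.path.join(out_root, cfg.name, "tophat")
    cufflinks_dir = os.path.join(out_root, cfg.name, "cufflinks")
    return [
        # Map the reads to the genome with TopHat (options before index and reads).
        ["tophat", "-p", str(threads), "-G", cfg.annotation_gtf,
         "-o", tophat_dir, cfg.bowtie_index, cfg.reads],
        # Estimate FPKM values for the annotated transcripts with Cufflinks,
        # reading the BAM file that TopHat writes into its output directory.
        ["cufflinks", "-p", str(threads), "-G", cfg.annotation_gtf,
         "-o", cufflinks_dir, os.path.join(tophat_dir, "accepted_hits.bam")],
    ]


def run_pipeline(cfg: SpeciesConfig, out_root: str, dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands(cfg, out_root):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline as soon as one step fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder paths; replace with the real index, annotation and reads.
    yeast = SpeciesConfig("yeast", "index/genome", "annotation/genes.gtf", "reads/sample.fastq")
    run_pipeline(yeast, "results")  # dry run: only prints the commands
```

Switching to another species then only means creating another `SpeciesConfig`; the default `dry_run=True` prints the commands so they can be checked before anything is executed.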


Basic Project

• Which are the most highly expressed genes (top 100) in your species of interest under a single condition (or in a single tissue)?

• Can you find a correlation between gene expression and transcript properties, such as GC content, transcript length, intron length, codon usage, or others?

• [Optional] Can you visualize the highly expressed genes in an interaction network?

TOOLS: TopHat, Cufflinks, Perl scripts, and possibly others.


Why?


Advanced Project

• Which transcripts/genes show differential expression between the two conditions?

• Can you find out what the functions of these genes are?

• Can you give a biological explanation of why these genes are differentially expressed under the conditions in your experiment?

• [Optional] In your data set, can you find modules of co-expressed genes? Try to use the WGCNA package.

• [Optional] Can you find a functional description and explanation for the identified modules?

• [Optional] To what extent are the modules conserved in a closely related species?

TOOLS: TopHat, Cufflinks, Cuffdiff, WGCNA, Perl scripts, and possibly others
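For the differential-expression question, Cuffdiff writes a tab-separated `gene_exp.diff` file. The sketch below assumes its `gene_id`, `q_value` and `significant` columns, where `significant` is `yes` for genes that pass Cuffdiff's false discovery rate cutoff. The second function illustrates only the first step of a WGCNA analysis, the unsigned soft-threshold adjacency a_ij = |cor(x_i, x_j)|^β; module detection itself is left to the WGCNA package. All function names are hypothetical.

```python
"""Sketch (hypothetical helpers): differentially expressed genes from a
Cuffdiff gene_exp.diff file, and a WGCNA-style unsigned adjacency."""
import csv
import io
from typing import Dict, List, Sequence, Tuple


def significant_genes(handle) -> List[Dict[str, str]]:
    """Rows flagged significant by Cuffdiff, sorted by ascending q-value.

    Assumes the gene_exp.diff columns 'q_value' and 'significant' ('yes'/'no').
    """
    reader = csv.DictReader(handle, delimiter="\t")
    hits = [row for row in reader if row["significant"] == "yes"]
    return sorted(hits, key=lambda row: float(row["q_value"]))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError for a constant profile."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def soft_adjacency(profiles: Dict[str, Sequence[float]],
                   beta: int) -> Dict[Tuple[str, str], float]:
    """Unsigned adjacency |cor(x_i, x_j)| ** beta for every gene pair.

    profiles maps gene ID -> expression values across samples;
    beta is the soft-thresholding power chosen by the user.
    """
    genes = sorted(profiles)
    adjacency = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            adjacency[(g, h)] = abs(pearson(profiles[g], profiles[h])) ** beta
    return adjacency


if __name__ == "__main__":
    # Made-up three-row table, only to show the call.
    diff = ("gene_id\tq_value\tsignificant\n"
            "a\t0.01\tyes\n"
            "b\t0.9\tno\n"
            "c\t0.001\tyes\n")
    print([row["gene_id"] for row in significant_genes(io.StringIO(diff))])  # ['c', 'a']
```

Raising the correlation to the power β shrinks weak correlations much more than strong ones, which is why the network keeps mainly the strongly co-expressed gene pairs.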


Why?


You have a choice

• Start with the basic or the advanced project
– Of course, the basic project can be extended with elements of the advanced project

• Group members should talk to each other and discuss their choice with Harm/Sandra.


Deliverables per group

• Pipeline code: all input/output has to be stored in the “group directory” on the server

• Final presentation (20 minutes)
– Each group member must prepare and present some slides (5 min per person)


Deliverables per person

• Project report
– All the work done in the project (introduction, Materials & Methods, results, discussion/conclusion)
– Appendix A: your contribution to the group effort
– Appendix B: personal reflection on the project

• Contribution to the group presentation
– Prepare and present some slides (5 min per person)

• The code that you have written


Data

• On the server: /course/project/
– Arabidopsis
– Yeast

• Other data/species of your choice
– For example, use the NCBI Short Read Archive (SRA)


Literature

• See course website


Groups

• See course website


Schedule week 3 & 4

• Presentations
– Tue (26-2) afternoon: presenting project plan
– Fri (1-3) afternoon: presenting progress
– Fri (8-3) all day: final presentation

• Deadline report & code
– Sunday March 10, 23:59
– So, your report has to be in before Monday!
– Email your report to “[email protected]”