Data Analysis Project Advanced Bioinformatics BIF-30806 2013.

Post on 02-Jan-2016

Data Analysis Project

Advanced Bioinformatics BIF-30806

2013

Set Up

• Basic and Advanced Project

• Available data sets

• Deliverables

• Literature

• Groups

• Schedule week 3 & 4

Purpose

• Build software pipeline to perform a transcriptome analysis

– Code to connect tools and do input/output conversions

– Code developed on certain data set, but should be able to run on different input (e.g. different species)
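As an illustration of "code to connect tools", here is a minimal Python sketch that builds the command lines for one sample. It assumes the usual invocation forms `tophat -o <dir> <bowtie_index> <reads>` and `cufflinks -o <dir> [-G <gtf>] <alignments>`, and that TopHat writes its alignments to `accepted_hits.bam`. The function names, directory layout, and file names are hypothetical. Nothing is hard-coded for one species, so the same code can run on different input.

```python
import subprocess
from pathlib import Path


def build_pipeline_commands(bowtie_index, reads, out_root, annotation_gtf=None):
    """Return the TopHat and Cufflinks commands for one sample as argument lists."""
    out_root = Path(out_root)
    tophat_dir = out_root / "tophat"        # hypothetical layout
    cufflinks_dir = out_root / "cufflinks"  # hypothetical layout

    # Step 1: align the reads to the indexed genome.
    tophat_cmd = ["tophat", "-o", str(tophat_dir), str(bowtie_index), str(reads)]

    # Step 2: quantify expression from the alignments written by step 1.
    cufflinks_cmd = ["cufflinks", "-o", str(cufflinks_dir)]
    if annotation_gtf is not None:
        # -G restricts quantification to the supplied reference annotation.
        cufflinks_cmd += ["-G", str(annotation_gtf)]
    cufflinks_cmd.append(str(tophat_dir / "accepted_hits.bam"))

    return [tophat_cmd, cufflinks_cmd]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline as soon as one tool fails.
            subprocess.run(cmd, check=True)
```

Calling `run_pipeline(build_pipeline_commands("index/yeast", "reads.fastq", "results/yeast"))` only prints the two commands, which is a cheap way to check paths before starting a long run.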

Basic Project

• Which are the most highly expressed genes (top 100) in your species of interest under a single condition (or in a single tissue)?

• Can you find a correlation between gene expression and transcript properties, such as GC content, transcript length, intron length, codon usage, or others?

• [Optional] Can you visualize the highly expressed genes in an interaction network?

TOOLS: Tophat, cufflinks, perl scripts, and possibly others.
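The first two questions can be approached roughly as in the Python sketch below. It assumes a tab-separated Cufflinks `genes.fpkm_tracking` table with `gene_id`, `FPKM`, and `FPKM_status` columns (check the header of your own file), and it uses a plain Pearson correlation as one possible measure; a rank-based correlation may suit skewed expression values better. All function names are hypothetical.

```python
import csv
import io
import math


def top_expressed(fpkm_tracking_text, n=100):
    """Return the n (gene_id, FPKM) pairs with the highest FPKM."""
    reader = csv.DictReader(io.StringIO(fpkm_tracking_text), delimiter="\t")
    rows = []
    for row in reader:
        # Keep only genes that were quantified without problems.
        if row.get("FPKM_status", "OK") != "OK":
            continue
        rows.append((row["gene_id"], float(row["FPKM"])))
    rows.sort(key=lambda pair: pair[1], reverse=True)
    return rows[:n]


def gc_content(sequence):
    """Fraction of G and C bases in a nucleotide sequence (0.0 if empty)."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)
```

For example, given a list of FPKM values and a matching list of `gc_content` values for the same transcripts, `pearson(fpkm_values, gc_values)` gives a first impression of whether the two are related.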

Why?

Advanced Project

• Which transcripts/genes show differential expression between the two conditions?

• Can you find out what the functions of these genes are?

• Can you give a biological explanation of why these genes are differentially expressed under the conditions in your experiment?

• [Optional] In your data set, can you find modules of co-expressed genes? Try to use the WGCNA package.

• [Optional] Can you find a functional description and explanation for the identified modules?

• [Optional] To what extent are the modules conserved in a closely related species?

TOOLS: Tophat, cufflinks, cuffdiff, WGCNA, perl scripts, and possibly others
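A possible starting point for the first question is to filter the cuffdiff gene-level results, as in the Python sketch below. It assumes a tab-separated `gene_exp.diff` table with `gene_id`, `status`, `log2(fold_change)`, and `q_value` columns (check the header of your own output). The function name and the 0.05 cutoff are illustrative choices, not course requirements.

```python
import csv
import io


def significant_genes(gene_exp_diff_text, max_q=0.05):
    """Return (gene_id, log2 fold change, q-value) for genes with q <= max_q."""
    reader = csv.DictReader(io.StringIO(gene_exp_diff_text), delimiter="\t")
    hits = []
    for row in reader:
        # Only trust tests that cuffdiff reports as successfully performed.
        if row["status"] != "OK":
            continue
        q_value = float(row["q_value"])
        if q_value <= max_q:
            hits.append((row["gene_id"], float(row["log2(fold_change)"]), q_value))
    # Most significant genes first.
    hits.sort(key=lambda hit: hit[2])
    return hits
```

The resulting gene identifiers can then be looked up in functional annotation resources to address the second and third questions.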

Why?

You have a choice

• Start on basic or advanced project

– Of course the basic project can be extended with elements of the advanced project

• Group members should talk to each other and discuss their choice with Harm/Sandra.

Deliverables per group

• Pipeline code; all input/output has to be stored in the “group directory” on the server

• Final presentation (20 minutes)

– Each group member must prepare and present some slides (5 min per person)

Deliverables per person

• Project report

– All the work done in the project (intro, M&M, results, discussion/conclusion)

– Appendix A: your contribution to the group effort

– Appendix B: personal reflection on the project

• Contribution to group presentation

– Prepare and present some slides (5 min per person)

• The code that you have written

Data

• On server: /course/project/

– Arabidopsis

– Yeast

• Other data/species of your choice

– Use, for example, the NCBI Sequence Read Archive (SRA)

Literature

• See course website

Groups

• See course website

Schedule week 3 & 4

• Presentations

– Tue (26-2) afternoon: presenting project plan

– Fri (1-3) afternoon: presenting progress

– Fri (8-3) all day: final presentation

• Deadline report & code

– Sunday March 10, 23:59

– So, your report has to be in before Monday!

– Email your report to “project@bioinformatics.nl”