Data Analysis Project Advanced Bioinformatics BIF-30806 2013.
Data Analysis Project
Advanced Bioinformatics BIF-30806
2013
Set Up
• Basic and Advanced Project
• Available data sets
• Deliverables
• Literature
• Groups
• Schedule week 3 & 4
Purpose
• Build software pipeline to perform a transcriptome analysis
– Code to connect tools and do input/output conversions
– Code developed on certain data set, but should be able to run on different input (e.g. different species)
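A minimal sketch of such glue code, written in Python here although the same logic could live in a Perl script. It only builds the command lines for TopHat and Cufflinks; the function name and all paths are hypothetical, and the only tool options assumed are `-o` (output directory) for both tools and TopHat's `accepted_hits.bam` output file.

```python
import shlex
from pathlib import Path


def build_pipeline_commands(bowtie_index, reads, out_dir):
    """Return the TopHat and Cufflinks command lines for one sample.

    Nothing here is species-specific: index, reads and output directory
    are parameters, so the same code can be reused on another data set.
    """
    out_dir = Path(out_dir)
    tophat_dir = out_dir / "tophat"
    cufflinks_dir = out_dir / "cufflinks"
    # TopHat writes its alignments to <output dir>/accepted_hits.bam,
    # which is then passed to Cufflinks as input.
    tophat_cmd = ["tophat", "-o", str(tophat_dir), str(bowtie_index), str(reads)]
    cufflinks_cmd = ["cufflinks", "-o", str(cufflinks_dir),
                     str(tophat_dir / "accepted_hits.bam")]
    return [tophat_cmd, cufflinks_cmd]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    for cmd in build_pipeline_commands("index/yeast", "reads/sample1.fastq",
                                       "results/sample1"):
        print(shlex.join(cmd))
        # To actually run a step: subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes it easy to check the pipeline on a new input before spending compute time on it.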
Basic Project
• Which are the most highly expressed genes (top 100) in your species of interest under a single condition (or in a single tissue)?
• Can you find a correlation between gene expression and transcript properties, such as GC content, transcript length, intron length, codon usage, or others?
• [Optional] Can you visualize the highly expressed genes in an interaction network?
TOOLS: TopHat, Cufflinks, Perl scripts, and possibly others.
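The first two questions could be approached along these lines. This is a sketch only: it assumes a tab-delimited expression table with `gene_id` and `FPKM` columns (as in the Cufflinks `genes.fpkm_tracking` output), and the function names are hypothetical. Pearson correlation is used as one possible measure; a rank-based measure may suit skewed expression values better.

```python
import csv
import math


def top_expressed(fpkm_lines, n=100):
    """Return the n (gene_id, FPKM) pairs with the highest FPKM."""
    reader = csv.DictReader(fpkm_lines, delimiter="\t")
    genes = [(row["gene_id"], float(row["FPKM"])) for row in reader]
    return sorted(genes, key=lambda g: g[1], reverse=True)[:n]


def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

`top_expressed` accepts an open file handle or a list of lines. Pairing each gene's FPKM with `gc_content` of its transcript sequence and passing both lists to `pearson` gives one number per transcript property.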
Why?
Advanced Project
• Which transcripts/genes are differentially expressed between the two conditions?
• Can you find out what the functions of these genes are?
• Can you give a biological explanation of why these genes are differentially expressed under the conditions in your experiment?
• [Optional] In your data set, can you find modules of co-expressed genes? Try to use the WGCNA package.
• [Optional] Can you find a functional description and explanation for the identified modules?
• [Optional] To what extent are the modules conserved in a closely related species?
TOOLS: TopHat, Cufflinks, Cuffdiff, WGCNA, Perl scripts, and possibly others.
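For the first question, one possible starting point is to filter the Cuffdiff results table. The sketch below assumes the tab-delimited `gene_exp.diff` layout with `gene_id`, `status`, `log2(fold_change)` and `significant` columns; the function name is hypothetical.

```python
import csv


def significant_genes(diff_lines):
    """Return (gene_id, log2 fold change) for genes flagged as significant.

    Rows are kept only if the test completed (status "OK") and the
    'significant' column says "yes".
    """
    reader = csv.DictReader(diff_lines, delimiter="\t")
    hits = []
    for row in reader:
        if row["status"] == "OK" and row["significant"] == "yes":
            hits.append((row["gene_id"], float(row["log2(fold_change)"])))
    return hits
```

The resulting gene identifiers can then be looked up for functional annotation, which is where the biological interpretation starts.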
Why?
You have a choice
• Start on basic or advanced project
– Of course the basic project can be extended with elements of the advanced project
• Group members should talk to each other and discuss their choice with Harm/Sandra.
Deliverables per group
• Pipeline code; all input/output has to be stored in the “group directory” on the server
• Final presentation (20 minutes)
– Each group member must prepare and present some slides (5 min per person)
Deliverables per person
• Project report
– All the work done in the project (intro, M&M, results, discussion/conclusion)
– Appendix A: your contribution to the group effort
– Appendix B: personal reflection on the project
• Contribution to group presentation
– Prepare and present some slides (5 min per person)
• The code that you have written
Data
• On server: /course/project/
– Arabidopsis
– Yeast
• Other data/species of your choice
– Use, for example, the NCBI Sequence Read Archive (SRA)
Literature
• See course website
Groups
• See course website
Schedule week 3 & 4
• Presentations
– Tue (26-2) afternoon: presenting project plan
– Fri (1-3) afternoon: presenting progress
– Fri (8-3) all day: final presentation
• Deadline report & code
– Sunday March 10, 23:59
– So, your report has to be in before Monday!
– Email your report to “[email protected]”