Lecture1 1 Perl for bioinformatics Davide Pisani & James Cotton
![Page 1: Lecture1 1 Perl for bioinformatics Davide Pisani & James Cotton](https://reader034.fdocuments.us/reader034/viewer/2022042700/559aac4d1a28abfe688b45a3/html5/thumbnails/1.jpg)
Bioinformatics
Programming
(Perl Programming)
2010
Davide Pisani
![Page 2: Lecture1 1 Perl for bioinformatics Davide Pisani & James Cotton](https://reader034.fdocuments.us/reader034/viewer/2022042700/559aac4d1a28abfe688b45a3/html5/thumbnails/2.jpg)
Bioinformatics
• Using computers to store, organise and interpret biological data
• In particular, data from high-throughput technologies (-omics)
![Page 3: Lecture1 1 Perl for bioinformatics Davide Pisani & James Cotton](https://reader034.fdocuments.us/reader034/viewer/2022042700/559aac4d1a28abfe688b45a3/html5/thumbnails/3.jpg)
High-throughput technologies
• DNA & protein sequences and structure (genomics & proteomics)
• Yeast two-hybrid screens (interactomics)
• Microarrays (transcriptomics)
• Metabolic networks (metabolomics)
![Page 4: Lecture1 1 Perl for bioinformatics Davide Pisani & James Cotton](https://reader034.fdocuments.us/reader034/viewer/2022042700/559aac4d1a28abfe688b45a3/html5/thumbnails/4.jpg)
How much sequence data is there?
1371 published complete genomes
188 ongoing archaeal genomes
4941 ongoing bacterial genomes
1599 ongoing eukaryotic genomes
242 metagenomes
![Page 5: Lecture1 1 Perl for bioinformatics Davide Pisani & James Cotton](https://reader034.fdocuments.us/reader034/viewer/2022042700/559aac4d1a28abfe688b45a3/html5/thumbnails/5.jpg)
How much data in each genome?
ftp://ftp.ncbi.nih.gov/refseq/release/
![Page 6: Lecture1 1 Perl for bioinformatics Davide Pisani & James Cotton](https://reader034.fdocuments.us/reader034/viewer/2022042700/559aac4d1a28abfe688b45a3/html5/thumbnails/6.jpg)
The human genome
ftp://ftp.ncbi.nih.gov/refseq/release/
![Page 7: Lecture1 1 Perl for bioinformatics Davide Pisani & James Cotton](https://reader034.fdocuments.us/reader034/viewer/2022042700/559aac4d1a28abfe688b45a3/html5/thumbnails/7.jpg)
The human genome
ftp://ftp.ncbi.nih.gov/refseq/release/
![Page 8: Lecture1 1 Perl for bioinformatics Davide Pisani & James Cotton](https://reader034.fdocuments.us/reader034/viewer/2022042700/559aac4d1a28abfe688b45a3/html5/thumbnails/8.jpg)
The human genome
ftp://ftp.ncbi.nih.gov/refseq/release/
etc.
70 base pairs per line × 57 lines per page = 3,990 bases per page
Chromosome 1 is (about) 247,249,719 bases long, i.e. about 62,000 pages
Whole genome (3.2 × 10^9 bases) = about 802,000 pages
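The page counts above can be checked with a few lines of arithmetic. This is a minimal sketch in Python; the constant and function names (`BASES_PER_PAGE`, `pages_needed`) are illustrative and not part of the lecture, and the layout of 70 bases per line and 57 lines per page is taken from the slide.

```python
# Page layout assumed on the slide: 70 bases per line, 57 lines per page.
BASES_PER_LINE = 70
LINES_PER_PAGE = 57
BASES_PER_PAGE = BASES_PER_LINE * LINES_PER_PAGE  # 3,990 bases per page


def pages_needed(n_bases: int) -> int:
    """Return the number of printed pages needed for n_bases (rounded up)."""
    # Integer ceiling division avoids floating-point rounding on large inputs.
    return -(-n_bases // BASES_PER_PAGE)


CHROMOSOME_1 = 247_249_719    # approximate length quoted on the slide
WHOLE_GENOME = 3_200_000_000  # 3.2 x 10^9 bases

print(f"Chromosome 1: {pages_needed(CHROMOSOME_1):,} pages")
print(f"Whole genome: {pages_needed(WHOLE_GENOME):,} pages")
```

Both results round to the figures quoted on the slide (about 62,000 and about 802,000 pages).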
![Page 9: Lecture1 1 Perl for bioinformatics Davide Pisani & James Cotton](https://reader034.fdocuments.us/reader034/viewer/2022042700/559aac4d1a28abfe688b45a3/html5/thumbnails/9.jpg)
| Genome | Base pairs | No. of genes |
|---|---|---|
| Phi-X 174 | 5,386 | 10 |
| Nanoarchaeum equitans | 490,885 | 552 |
| E. coli | 4,639,221 | 4,377 |
| Saccharomyces cerevisiae | 12,495,682 | 5,800 |
| Drosophila melanogaster | 122,653,977 | 13,379 |
| Homo sapiens | 3.2 × 10^9 | 30,000 (? 20-25,000) |
| Protopterus aethiopicus | 1.3 × 10^11 | ? |
| Psilotum nudum | 2.5 × 10^11 | ? |
| Amoeba dubia | 6.7 × 10^11 | ? |
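One way to read the table is to divide genome size by gene count, which shows that larger genomes are not simply packed with proportionally more genes. The sketch below uses only the rows where the slide gives a gene count; the dictionary `genomes` and the helper `bases_per_gene` are illustrative names, and the human figure uses the slide's 30,000 estimate.

```python
# Genome size (base pairs) and gene count, copied from the table.
# Rows whose gene count is unknown ("?") are left out.
genomes = {
    "Phi-X 174": (5_386, 10),
    "Nanoarchaeum equitans": (490_885, 552),
    "E. coli": (4_639_221, 4_377),
    "Saccharomyces cerevisiae": (12_495_682, 5_800),
    "Drosophila melanogaster": (122_653_977, 13_379),
    "Homo sapiens": (3_200_000_000, 30_000),
}


def bases_per_gene(size_bp: int, n_genes: int) -> float:
    """Average number of base pairs of genome per annotated gene."""
    return size_bp / n_genes


for name, (size_bp, n_genes) in genomes.items():
    ratio = bases_per_gene(size_bp, n_genes)
    print(f"{name:<26} {ratio:>12,.0f} bp per gene")
```

The ratio climbs from a few hundred base pairs per gene in Phi-X 174 to roughly a hundred thousand in humans, so genome size grows much faster than gene number.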
![Page 10: Lecture1 1 Perl for bioinformatics Davide Pisani & James Cotton](https://reader034.fdocuments.us/reader034/viewer/2022042700/559aac4d1a28abfe688b45a3/html5/thumbnails/10.jpg)
GenBank contains much more than just sequence data
Information on the organism, the gene, where it is expressed and so forth.
![Page 11: Lecture1 1 Perl for bioinformatics Davide Pisani & James Cotton](https://reader034.fdocuments.us/reader034/viewer/2022042700/559aac4d1a28abfe688b45a3/html5/thumbnails/11.jpg)
Protein Structure
![Page 12: Lecture1 1 Perl for bioinformatics Davide Pisani & James Cotton](https://reader034.fdocuments.us/reader034/viewer/2022042700/559aac4d1a28abfe688b45a3/html5/thumbnails/12.jpg)
PDB: Protein Structure