InterPro/prosite UCSC Genome Browser Exercise 3. Turning information into knowledge The outcome of...

InterPro/prosite, UCSC Genome Browser: Exercise 3
Date posted: 19-Dec-2015

Turning information into knowledge

The outcome of a sequencing project is masses of raw data.

The challenge is to turn this raw data into biological knowledge.

A valuable tool for this challenge is an automated diagnostic pipeline through which newly determined sequences can be processed.

From sequence to function

Nature tends to innovate rather than invent.

Proteins are composed of functional elements: domains and motifs.

Domains are structural units that carry out a specific function. The same domains are shared between different proteins.

Motifs are shorter sequences associated with a specific biological activity.

http://www.ebi.ac.uk/interpro/

InterPro

An integrated documentation resource for protein families, domains and sites.

Groups signatures describing the same protein family or domain.

Combines a number of databases that use different methodologies to derive protein signatures:
UniProt: UniProtKB (Swiss-Prot, TrEMBL), UniRef, UniParc
prosite: a documented database of domains, families and functional sites
Pfam: a database of protein families represented by multiple sequence alignments (MSAs)

Member databases

Sequence-motif methods: protein signature databases, each with a different focus

Sequence-cluster methods: hierarchically clustered sequence/structure databases

InterPro search

http://www.expasy.ch/prosite/

prosite

A method for determining the function of uncharacterized translated protein sequences.

Consists of a database of annotated, biologically important sites, patterns, motifs, signatures and fingerprints.

prosite entries are represented with patterns or profiles.

Pattern: [AC]-A-[GC]-T-[TC]-[GC]

Profile (frequency of each residue at each position):

Position   1     2     3     4     5
A          0.66  1     0     0     ...
T          0     0     0     1     ...
C          0.33  0     0.66  0     ...
G          0     0     0.33  0     ...

Profiles are used in prosite when the motif is relatively divergent and therefore difficult to represent as a pattern.
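
To make the difference between the two representations concrete, here is a minimal sketch that slides a window along a sequence and scores it against the first four columns of the profile (column 5 is elided on the slide, so it is left out). It is a simplified position-frequency scorer, not prosite's own profile algorithm: the log-odds scoring, the uniform background of 0.25 per base and the pseudocount are assumptions, and the names PROFILE, score_window and best_window are hypothetical.

```python
import math

# Frequencies from the slide's profile, columns 1-4 (column 5 is elided).
# Each dict maps a residue to its observed frequency at that position.
PROFILE = [
    {"A": 0.66, "T": 0.0, "C": 0.33, "G": 0.0},
    {"A": 1.0, "T": 0.0, "C": 0.0, "G": 0.0},
    {"A": 0.0, "T": 0.0, "C": 0.66, "G": 0.33},
    {"A": 0.0, "T": 1.0, "C": 0.0, "G": 0.0},
]

BACKGROUND = 0.25   # assumed uniform base composition
PSEUDOCOUNT = 0.01  # assumed; keeps log2() finite where the frequency is 0


def score_window(window: str) -> float:
    """Sum of per-position log-odds scores, log2(frequency / background)."""
    total = 0.0
    for column, residue in zip(PROFILE, window):
        freq = column.get(residue, 0.0) + PSEUDOCOUNT
        total += math.log2(freq / BACKGROUND)
    return total


def best_window(sequence: str) -> tuple[int, float]:
    """Return (0-based start, score) of the highest-scoring window."""
    width = len(PROFILE)
    scores = [
        (i, score_window(sequence[i:i + width]))
        for i in range(len(sequence) - width + 1)
    ]
    return max(scores, key=lambda item: item[1])
```

For example, best_window("GGAACTGG") picks the window starting at index 2 ("AACT", the most frequent residue in each column). Unlike the pattern, which only says yes or no, the profile also gives a graded score to partial matches such as "CAGT".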

Scanning prosite

Query: a sequence. Result: all patterns found in the sequence.

Query: a pattern. Result: all sequences that match the pattern.
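
Both query modes can be imitated locally by translating a prosite pattern into a regular expression. The sketch below is an assumption-laden simplification, not ScanProsite itself: it handles only [..] (allowed residues), {..} (forbidden residues), x (any residue), repeat counts such as x(2) or x(2,4), the < and > terminal anchors and the trailing period, and it does not handle profiles. The function names and example sequences are hypothetical; N-{P}-[ST]-{P} is prosite's N-glycosylation site pattern (PS00001).

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a prosite pattern such as 'N-{P}-[ST]-{P}.' into a Python regex."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # match must start at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # match must end at the C-terminus
            suffix, element = "$", element[:-1]
        # Separate the residue part from an optional repeat count "(n)" or "(n,m)".
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            regex = "."  # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            regex = core  # a single residue or a [..] class (same syntax in regex)
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + regex + suffix)
    return "".join(parts)


def patterns_in_sequence(sequence: str, patterns: dict[str, str]) -> dict[str, list[int]]:
    """Sequence query: which patterns occur in the sequence, at which 0-based starts.
    Note that finditer reports non-overlapping matches only."""
    hits = {}
    for name, pattern in patterns.items():
        regex = re.compile(prosite_to_regex(pattern))
        starts = [match.start() for match in regex.finditer(sequence)]
        if starts:
            hits[name] = starts
    return hits


def sequences_matching(pattern: str, sequences: dict[str, str]) -> list[str]:
    """Pattern query: names of the sequences containing at least one match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [name for name, seq in sequences.items() if regex.search(seq)]
```

With these helpers, prosite_to_regex("N-{P}-[ST]-{P}.") gives "N[^P][ST][^P]", and patterns_in_sequence("MNGTAK", {"NGLYC": "N-{P}-[ST]-{P}."}) reports a match starting at index 1.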

Patterns with a high probability of occurrence

Entries describing commonly found post-translational modifications or compositionally biased regions.

They are found in the majority of known protein sequences, hence their high probability of occurrence.
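
A back-of-the-envelope calculation shows why such short patterns turn up so often. Assuming, unrealistically, that all 20 amino acids are equally frequent and that positions are independent, the chance that N-{P}-[ST]-{P} (prosite's N-glycosylation site pattern) matches at a given position is 1/20 × 19/20 × 2/20 × 19/20 ≈ 0.0045, so a 400-residue protein would contain about 1.8 matches by chance alone. The sketch below reproduces that arithmetic; the function names and the 400-residue length are illustrative.

```python
def match_probability(allowed_counts: list[int], alphabet_size: int = 20) -> float:
    """Chance that one window matches, given how many residues each position
    accepts (assumes uniform, independent residue frequencies)."""
    p = 1.0
    for allowed in allowed_counts:
        p *= allowed / alphabet_size
    return p


def expected_matches(allowed_counts: list[int], seq_length: int,
                     alphabet_size: int = 20) -> float:
    """Expected number of chance matches over all windows of a sequence."""
    windows = max(seq_length - len(allowed_counts) + 1, 0)
    return windows * match_probability(allowed_counts, alphabet_size)


# N-{P}-[ST]-{P}: N accepts 1 residue, {P} accepts 19, [ST] accepts 2, {P} accepts 19
NGLYC_ALLOWED = [1, 19, 2, 19]
print(round(match_probability(NGLYC_ALLOWED), 5))      # 0.00451
print(round(expected_matches(NGLYC_ALLOWED, 400), 2))  # 1.79
```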

prosite sequence queryprosite sequence query

prosite pattern queryprosite pattern query

UCSC Genome Browser

UCSC Genome Browser - Gateway

Reset all settings of the previous user

UCSC Genome Browser query results

UCSC Genome Browser annotation tracks

Labeled tracks: Vertebrate conservation, mRNA (GenBank), RefSeq, UCSC Genes, Base position, Single species compared, SNPs, Repeats

Labeled gene features: gene direction, exon, intron, UTR

UCSC Gene

UCSC Genome Browser - movement

Zoom x3 + Center
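
The "Zoom x3 + Center" action boils down to simple coordinate arithmetic. This is a minimal sketch, assuming the browser's displayed 1-based, inclusive coordinates and a zoom that keeps the window centered on the current midpoint (or on a clicked base); the function and its parameters are hypothetical, and the real browser may round or clamp at chromosome ends differently.

```python
from typing import Optional, Tuple


def zoom(start: int, end: int, factor: int = 3, zoom_in: bool = True,
         center: Optional[int] = None) -> Tuple[int, int]:
    """Return a new 1-based, inclusive (start, end) window.

    The width is divided (zoom in) or multiplied (zoom out) by `factor`, and the
    new window is centered on `center`, or on the current midpoint if none is
    given. Clamping at the chromosome end is not handled.
    """
    width = end - start + 1
    new_width = max(1, width // factor) if zoom_in else width * factor
    if center is None:
        center = (start + end) // 2
    new_start = max(1, center - new_width // 2)  # do not run past base 1
    return new_start, new_start + new_width - 1


print(zoom(1001, 4000))                 # (2000, 2999): 3x in around the midpoint
print(zoom(2000, 2999, zoom_in=False))  # (999, 3998): 3x out
```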

UCSC Genome Browser - Base view

Annotation track options

Display modes: dense, squish, full, pack

Another way to toggle between 'pack' and 'dense' view is to click on the track title.

Figure labels: Sickle-cell anemia distr.; Malaria distr.

BLAT

BLAT = BLAST-Like Alignment Tool

BLAT is designed to find sequences with >95% similarity for DNA and >80% for protein. It searches rapidly by indexing the entire genome.

Good for:

1. Finding the genomic coordinates of a cDNA
2. Determining exons/introns
3. Finding human (or chimp, dog, cow…) homologs of another vertebrate sequence
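
The idea behind "rapid search by indexing the entire genome" can be sketched in a few lines, following the published description of BLAT: the genome is cut into non-overlapping k-mers ("tiles") stored in an index, every overlapping k-mer of the query is looked up, and hits lying on the same diagonal (genome position minus query position) are grouped as a candidate alignment. This toy version uses exact k-mer matches, a tiny k and made-up sequences; BLAT's real parameters, its mismatch-tolerant and multiple-hit modes, and the alignment extension step are not reproduced. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_index(genome: str, k: int) -> dict[str, list[int]]:
    """Index the genome's non-overlapping k-mers (tiles) by their start positions."""
    index = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, k):
        index[genome[pos:pos + k]].append(pos)
    return index


def diagonal_hits(query: str, index: dict[str, list[int]], k: int) -> Counter:
    """Look up every overlapping query k-mer and count hits per diagonal."""
    diagonals = Counter()
    for qpos in range(len(query) - k + 1):
        for gpos in index.get(query[qpos:qpos + k], []):
            diagonals[gpos - qpos] += 1
    return diagonals


genome = "TTTTACGTACGGATCCTTTT"  # made-up example sequence
query = "ACGTACGGATCC"
hits = diagonal_hits(query, build_index(genome, 4), 4)
print(hits.most_common(1))  # [(4, 3)]: three tile hits on diagonal 4
```

The best diagonal, 4, is where the query starts in the genome (0-based). Indexing only non-overlapping tiles keeps the index small, while scanning every overlapping query k-mer still finds each tile that the query fully covers.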

BLAT on the UCSC Genome Browser

BLAT Results

Legend: Match; Non-match (mismatch/indel); Indel boundaries

BLAT Results on the browser

Getting the DNA sequence of a region
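
When taking a region's coordinates out of the browser, watch the coordinate convention: the position box shows 1-based, fully closed coordinates (chr:start-end, often with thousands separators), while BED files and UCSC's database tables use 0-based, half-open coordinates, which is also how Python slices work. A minimal sketch, assuming the chromosome sequence is already available as a string; the function names and the example positions are hypothetical.

```python
import re


def parse_position(position: str) -> tuple[str, int, int]:
    """Parse a browser position like 'chr7:117,120,017-117,308,718' into
    (chrom, start, end) as a 0-based, half-open interval."""
    match = re.fullmatch(r"(\w+):([\d,]+)-([\d,]+)", position.strip())
    if match is None:
        raise ValueError(f"unrecognized position: {position!r}")
    start1 = int(match.group(2).replace(",", ""))  # 1-based, inclusive
    end1 = int(match.group(3).replace(",", ""))    # 1-based, inclusive
    return match.group(1), start1 - 1, end1        # shift only the start


def region_sequence(chrom_seq: str, position: str) -> str:
    """Slice a region out of a chromosome sequence given a browser position."""
    _, start0, end = parse_position(position)
    return chrom_seq[start0:end]


print(region_sequence("ACGTACGT", "chrT:2-4"))  # CGT (bases 2, 3 and 4)
```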
