Marc Carlson · Annotations for Sequencing Technologies Annotations for Sequencing projects Other...
Outline Bioconductor Annotations for Sequencing Technologies rtracklayer biomaRt AnnotationDbi
Annotation
Marc Carlson
Fred Hutchinson Cancer Research Center
January 29, 2010
1 Bioconductor Annotations for Sequencing Technologies
2 rtracklayer
3 biomaRt
4 AnnotationDbi
Annotations for Sequencing Technologies
Annotations for Sequencing projects
Other packages:
rtracklayer – export to UCSC web browsers.
GenomicFeatures – coming soon for transcript annotations (will be released in spring)
biomaRt:
Query the web-based ‘biomart’ resource for genes, sequences, SNPs, etc.
AnnotationDbi packages:
Organism and chip packages – contain chromosome start and stop sites for most genes.
rtracklayer basics
What rtracklayer offers:
Web-accessible annotations
Source: the data come from UCSC Genome Browser tracks
finding resources with rtracklayer
How to find data from the UCSC Genome browser in R
create a browser session: browserSession.
list the genomes available from UCSC: ucscGenomes.
set the genome for the session: genome.
list the available tracks: trackNames.
> library(rtracklayer)
> session <- browserSession()
> head(ucscGenomes())
> genome(session) <- "hg18"
> head(trackNames(session))
obtaining resources with rtracklayer
Downloading the UCSC Genome browser data into R
generate a query for UCSC: ucscTableQuery.
retrieve the UCSC table for that query: getTable.
> ##can generate a query
> query <- ucscTableQuery(session, "refGene")
> ##which in turn can be used to get the data
> track <- getTable(query)
> head(track)
> colnames(track)
packaging chromosome data into a RangedData object
Next we can package this data into a RangedData object.
> library(IRanges)
> library(BSgenome)
> ## UCSC tables store 0-based starts, while IRanges is 1-based, so add 1
> rdAnn <- RangedData(IRanges(start = track[,"txStart"] + 1,
+ end = track[,"txEnd"]),
+ space = track[,"chrom"],
+ strand = track[,"strand"],
+ id = track[,"name"])
> rdAnn
BiomaRt basics
What biomaRt offers:
Web-accessible annotations
The data come from Ensembl
finding resources at biomaRt
biomaRt has several functions for discovering resources.
list available databases: listMarts.
list available datasets: listDatasets.
set up a database to use: useMart.
> library(biomaRt)
> head(listMarts())
> mart <- useMart("ensembl")
> head(listDatasets(mart))
> ens <- useMart("ensembl", dataset="scerevisiae_gene_ensembl")
> ens
extracting data from biomaRt
To call getBM you need to apply appropriate filters and attributes to a list of values that you supply. Attributes are what you want back from the query, and filters describe the kind of values you supply.
list filters from the DB/Dataset: listFilters.
list attributes from that DB/Dataset: listAttributes.
get selected data: getBM.
> head(listFilters(ens))
> head(listAttributes(ens))
> ## example query
> getBM(attributes=c("ensembl_gene_id","chromosome_name",
+ "strand","start_position","end_position"),
+ filters="entrezgene",
+ values=c(1466398,1466399,1466400), mart=ens)
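To make the roles of the three arguments concrete, here is a minimal sketch that runs offline on a made-up table. The names toyMart and toyGetBM, and all values in the table, are hypothetical stand-ins for illustration and are not part of biomaRt; the real getBM sends the query to the mart server instead of subsetting a local data frame.

```r
## Toy stand-in for a mart dataset (all values are made up)
toyMart <- data.frame(
  ensembl_gene_id = c("G1", "G2", "G3", "G4"),
  entrezgene      = c(101, 102, 103, 104),
  chromosome_name = c("I", "I", "II", "III"),
  start_position  = c(100, 500, 50, 900),
  end_position    = c(400, 800, 300, 1200),
  stringsAsFactors = FALSE
)

## Hypothetical helper with the same argument shape as a getBM() call:
##   filters    = which column the supplied values refer to
##   values     = the identifiers you already have
##   attributes = the columns you want back
toyGetBM <- function(attributes, filters, values, data) {
  keep <- data[[filters]] %in% values
  data[keep, attributes, drop = FALSE]
}

hits <- toyGetBM(attributes = c("ensembl_gene_id", "chromosome_name"),
                 filters = "entrezgene",
                 values = c(102, 104),
                 data = toyMart)
hits
```

The filter picks the rows (the genes whose entrezgene value was supplied) and the attributes pick the columns, which is the same division of labour as in the getBM query above.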
extracting data from biomaRt
Let's now call getBM without filters to get ALL of the data for these fields.
> BMres <- getBM(attributes=c("ensembl_gene_id",
+ "chromosome_name","strand",
+ "start_position","end_position"), mart=ens)
biomaRt exercise
Using what you just learned about biomaRt, try to construct a RangedData annotation object similar to what we did with rtracklayer.
packaging biomaRt data into a RangedData object
> library(IRanges)
> library(BSgenome)
> strand <- strand(ifelse(BMres[,"strand"] > 0, "+", "-"))
> rdAnno <- RangedData(IRanges(
+ start = abs(BMres[,"start_position"]),
+ end = abs(BMres[,"end_position"])),
+ space = BMres[,"chromosome_name"],
+ strand = strand,
+ gene_id = BMres[,"ensembl_gene_id"] )
> rdAnno
Using Annotation packages
What Annotation packages offer:
Pre-built and versioned annotation packages
The data is from NCBI
extracting chromosome data from Annotation packages
First let's just get the data from the package.
> library(org.Sc.sgd.db)
> start <- toTable(org.Sc.sgdCHRLOC)
> end <- toTable(org.Sc.sgdCHRLOCEND)
> ##must check that these are the SAME!
> table(start[,1]==end[,1])
> ##If that checks out ok, then we can cbind() them together:
> end <- end[,"stop"]
> res <- cbind(start,end)
> ##filter out autonomously replicating sequences...
> res <- res[abs(res[,"start"]) < abs(res[,"end"]),]
> head(res)
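The check, cbind, and filter steps above can be tried offline on a small made-up table. This is only a sketch: the gene names and coordinates are invented, and the column names simply mirror the ones used on this slide. It assumes the convention that minus-strand genes carry negative coordinates.

```r
## Made-up start and end tables shaped like the toTable() results above
start <- data.frame(systematic_name = c("GENE_A", "GENE_B", "GENE_C"),
                    start = c(-1000, 2500, 5000),
                    Chromosome = c("1", "2", "2"),
                    stringsAsFactors = FALSE)
end <- data.frame(systematic_name = c("GENE_A", "GENE_B", "GENE_C"),
                  stop = c(-1800, 3100, 4000),
                  Chromosome = c("1", "2", "2"),
                  stringsAsFactors = FALSE)

## The rows must describe the same genes in the same order
## before the two tables can be bound side by side
stopifnot(all(start[, 1] == end[, 1]))

## Attach the end coordinate as a new column called "end"
res <- cbind(start, end = end[, "stop"])

## Keep only rows whose start lies before the end, ignoring the strand sign
res <- res[abs(res[, "start"]) < abs(res[, "end"]), ]
res
```

Comparing absolute values matters here because a minus-strand gene has both coordinates negative, so a plain start < end test would give the wrong answer for it.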
Annotation package exercise
Using what you just learned about the annotation packages, try to construct a RangedData annotation object similar to what we did with biomaRt and rtracklayer.
packaging annotation package data into a RangedData object
> library(IRanges)
> library(BSgenome)
> chroms <- paste("chr", res[,"Chromosome"], sep="")
> strand <- strand(ifelse(res[,"start"] > 0, "+", "-"))
> rdAnnot <- RangedData(IRanges(start = abs(res[,"start"]),
+ end = abs(res[,"end"])),
+ space = chroms,
+ strand = strand,
+ id = res[,"systematic_name"])
> rdAnnot
This is the same as the contents of extractYeastGenesAsRangedData.
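The conversion from signed coordinates to a strand plus positive positions can be checked on its own with plain data frames. The sketch below uses made-up genes and coordinates and a base R data frame in place of a RangedData object, so it shows only the bookkeeping, not the IRanges API.

```r
## Made-up signed coordinates: negative values mark the minus strand
res <- data.frame(systematic_name = c("GENE_A", "GENE_B"),
                  start = c(-1000, 2500),
                  end = c(-1800, 3100),
                  Chromosome = c("1", "2"),
                  stringsAsFactors = FALSE)

ann <- data.frame(
  space  = paste("chr", res$Chromosome, sep = ""),  # UCSC-style chromosome names
  start  = abs(res$start),                          # drop the sign from the positions
  end    = abs(res$end),
  strand = ifelse(res$start > 0, "+", "-"),         # recover the strand from the sign
  id     = res$systematic_name,
  stringsAsFactors = FALSE
)

## Width of each closed interval, as a quick sanity check on start and end
ann$width <- ann$end - ann$start + 1
ann
```

The strand has to be derived before the sign is discarded, which is why the code on this slide computes strand from res[,"start"] and only takes abs() inside the IRanges call.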