

Galaxy: usegalaxy.org

Transparent, accessible, and robust functional analysis of SNPs

Galaxy is a web-based platform for computational biomedical research. The Galaxy history records each step of an analysis, with enough description to repeat it, much like a recipe for baking a cake. For this example, we use public data to search for SNPs that are associated with differences between two populations.

Most file formats can be easily converted within Galaxy for use with the available tools. Here we converted a file in Personal Genome SNP (pgSnp) format, imported from the BX Browser (a partial mirror of the UCSC Genome Browser), into gd_snp, a variant of the PLINK format used by the Genome Diversity tools. Shown here is a known SNP in SLC24A5, p.Thr111Ala (ACA > GCA), that is associated with light skin color in Caucasians.
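A minimal sketch of the first step of such a conversion, reading one pgSnp line, is shown below. It assumes the UCSC pgSnp column order (chrom, chromStart, chromEnd, slash-separated alleles, alleleCount, comma-separated alleleFreq, alleleScores) and integer per-allele values in alleleFreq. The record type and its field names are hypothetical, and the actual gd_snp column layout written by Galaxy is not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PgSnpRecord:
    """Hypothetical in-memory form of one pgSnp line."""
    chrom: str
    start: int          # 0-based chromStart, as in BED
    end: int            # chromEnd (exclusive)
    alleles: List[str]  # e.g. ["A", "G"]
    counts: List[int]   # per-allele values from the alleleFreq column


def parse_pgsnp_line(line: str) -> PgSnpRecord:
    """Parse one tab-separated pgSnp data line (assumed column order)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 columns, got {len(fields)}")
    chrom, start, end, name, allele_count, allele_freq = fields[:6]
    alleles = name.split("/")
    # Skip empty items so a trailing comma in the list is tolerated.
    counts = [int(x) for x in allele_freq.split(",") if x]
    n = int(allele_count)
    if len(alleles) != n or len(counts) != n:
        raise ValueError("alleleCount does not match alleles/alleleFreq")
    return PgSnpRecord(chrom, int(start), int(end), alleles, counts)


if __name__ == "__main__":
    rec = parse_pgsnp_line("chr1\t100\t101\tA/G\t2\t3,5\t0,0")
    print(rec.alleles, rec.counts)  # ['A', 'G'] [3, 5]
```

Checking alleleCount against both the allele list and the counts catches malformed lines early, before any values are written to another format.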

The Specify and Aggregate Individuals tools aggregate the allele counts by skin color; the Filter tool then finds SNPs with differently fixed alleles, such as p.Thr111Ala above. The Cluster tool finds regions of the genome that contain clusters of such SNPs.
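The aggregate, filter, and cluster steps can be sketched in Python as follows. This is an illustration of the idea, not the Galaxy tools' code: the per-individual allele counts, the tolerance `tol` that defines "nearly fixed", and the `max_gap` and `min_size` clustering thresholds are all hypothetical choices.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class Snp:
    chrom: str
    pos: int
    # Hypothetical layout: individual ID -> (copies of allele 1, copies of allele 2)
    counts: Dict[str, Tuple[int, int]]


def aggregate(snp: Snp, group: Sequence[str]) -> Tuple[int, int]:
    """Sum allele counts over the individuals in one group."""
    a1 = sum(snp.counts[i][0] for i in group)
    a2 = sum(snp.counts[i][1] for i in group)
    return a1, a2


def allele1_freq(a1: int, a2: int) -> Optional[float]:
    total = a1 + a2
    return a1 / total if total else None  # None when the group has no data


def differently_fixed(snp: Snp, group_a: Sequence[str], group_b: Sequence[str],
                      tol: float = 0.05) -> bool:
    """True if allele 1 is (nearly) fixed in one group and (nearly) absent in the other."""
    fa = allele1_freq(*aggregate(snp, group_a))
    fb = allele1_freq(*aggregate(snp, group_b))
    if fa is None or fb is None:
        return False
    return (fa >= 1 - tol and fb <= tol) or (fa <= tol and fb >= 1 - tol)


def cluster(snps: List[Snp], max_gap: int = 10_000, min_size: int = 3) -> List[List[Snp]]:
    """Chain SNPs on the same chromosome whose neighbours lie within max_gap bp;
    keep only chains with at least min_size SNPs."""
    clusters: List[List[Snp]] = []
    current: List[Snp] = []
    for s in sorted(snps, key=lambda x: (x.chrom, x.pos)):
        if current and s.chrom == current[-1].chrom and s.pos - current[-1].pos <= max_gap:
            current.append(s)
        else:
            if len(current) >= min_size:
                clusters.append(current)
            current = [s]
    if len(current) >= min_size:
        clusters.append(current)
    return clusters
```

Sorting by chromosome and position first lets one pass find every cluster; setting `tol=0` keeps only strictly fixed differences.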

Import and reformat data

Fixed and clustered SNPs in functional regions

Filter for SNPs that have different alleles nearly fixed in the two populations, and are clustered together

Belinda M. Giardine, Burhans R., Riemer C., Cheng K., Ratan A., Harris R., Von Kuster G., Galaxy Development Team, Hardison R.C., Zhang Y., Miller W.
The Pennsylvania State University, Center for Comparative Genomics and Bioinformatics, University Park, PA

[Figure: populations CEU, GBR, LWK, YRI]

Using the aaChanges tool to look for potentially phenotype-associated SNPs in the coding regions of our filtered dataset identifies two coding SNPs: the known SLC24A5 SNP shown above, and another in the CCNI2* gene, shown below in a larger set of populations.

[Figure: CCNI2 SNP in populations CEU, GBR, LWK, YRI, TSI, ASW]

Population key:
CEU: Caucasian, CEPH Collection
GBR: British in England and Scotland
LWK: Luhya in Webuye, Kenya
YRI: Yoruba in Ibadan, Nigeria
TSI: Toscani in Italy
ASW: African ancestry in SW USA
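The core of an amino-acid-change check can be sketched in Python: translate the reference and alternate codons with the standard genetic code and report the change in HGVS-style p. notation, as in p.Thr111Ala (ACA > GCA). This is not the aaChanges tool's implementation; the function `aa_change` is hypothetical and assumes the codon sequence and codon number are already known, whereas a real tool must first locate the SNP within a coding exon and account for strand.

```python
from typing import Dict, Optional

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; '*' marks a stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [x + y + z for x in BASES for y in BASES for z in BASES]
CODON_TABLE: Dict[str, str] = dict(zip(CODONS, AMINO))

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys", "Q": "Gln",
    "E": "Glu", "G": "Gly", "H": "His", "I": "Ile", "L": "Leu", "K": "Lys",
    "M": "Met", "F": "Phe", "P": "Pro", "S": "Ser", "T": "Thr", "W": "Trp",
    "Y": "Tyr", "V": "Val", "*": "Ter",
}


def aa_change(ref_codon: str, alt_codon: str, codon_number: int) -> Optional[str]:
    """Return e.g. 'p.Thr111Ala', or None if the change is synonymous."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return None
    return f"p.{THREE_LETTER[ref_aa]}{codon_number}{THREE_LETTER[alt_aa]}"


if __name__ == "__main__":
    print(aa_change("ACA", "GCA", 111))  # p.Thr111Ala
```

Returning None for synonymous changes mirrors the goal of keeping only SNPs that alter the protein.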

To look for potentially phenotype-associated SNPs in non-coding regions, we can use the Intersect tool to intersect our filtered dataset with DNase hypersensitive sites (DHS) and transcription factor occupied segments (TFos) from ENCODE. The history panel gives the count of SNPs that intersect DHS (blue arrow), and the center panel shows the results of the TFos intersection. Two of these SNPs are displayed in the browser shot below.
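The overlap test behind intersecting SNPs with DHS or TFos segments can be sketched as below. This is a generic interval-overlap sketch, not the Galaxy Intersect tool's code; it assumes SNPs are given as (chrom, 0-based position) pairs and regions as BED-style half-open (chrom, start, end) intervals, and the function names are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Tuple


def merge(intervals: List[Tuple[int, int]]) -> List[List[int]]:
    """Merge overlapping or touching half-open intervals."""
    merged: List[List[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def build_index(regions: List[Tuple[str, int, int]]) -> Dict[str, Tuple[List[int], List[int]]]:
    """Per chromosome: sorted starts and matching ends of merged regions."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        m = merge(ivs)
        index[chrom] = ([s for s, _ in m], [e for _, e in m])
    return index


def overlaps(index, chrom: str, pos: int) -> bool:
    """True if pos falls in [start, end) of some region on chrom."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1  # last region starting at or before pos
    return i >= 0 and pos < ends[i]


def intersect_snps(snps: List[Tuple[str, int]], regions: List[Tuple[str, int, int]]):
    """Keep the SNPs that fall inside any region."""
    index = build_index(regions)
    return [(c, p) for c, p in snps if overlaps(index, c, p)]
```

Merging first guarantees that, for any position, only the single region found by binary search needs to be checked, so each SNP lookup is logarithmic in the number of regions.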

* Homo sapiens cyclin I family, member 2


Tutorial: www.bx.psu.edu/miller_lab