Galaxy: usegalaxy.org
Transparent, accessible, and robust functional analysis of SNPs
Galaxy is a web-based platform for computational biomedical research. The Galaxy history records each step of an analysis and describes how to repeat it, much like a recipe for baking a cake. For this example, we use public data to search for SNPs that are associated with differences between two populations.
Most file formats can be converted within Galaxy for use with the available tools. Here we converted a file in Personal Genome SNP (pgSnp) format, imported from the BX Browser (a partial mirror of the UCSC Genome Browser), to gd_snp, a variant of the PLINK format used by the Genome Diversity tools. Shown here is a known SNP in SLC24A5, p.Thr111Ala (ACA > GCA), which is associated with light skin color in Caucasians.
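As a rough illustration of the starting format, the Python sketch below parses one pgSnp data line, assuming the seven tab-separated UCSC columns: chrom, chromStart, chromEnd, alleles joined by "/", alleleCount, comma-separated allele counts, and comma-separated allele scores. The names `PgSnpRecord` and `parse_pgsnp_line` are hypothetical. The actual pgSnp-to-gd_snp conversion is done by a Galaxy tool, and because the gd_snp column layout is not described here, the sketch stops at parsing.

```python
from dataclasses import dataclass


@dataclass
class PgSnpRecord:
    # Hypothetical container for one pgSnp line (not a Galaxy data type).
    chrom: str
    start: int                # 0-based start, as in BED
    end: int                  # exclusive end
    alleles: list[str]        # e.g. ["A", "G"] from the "A/G" name column
    allele_counts: list[int]  # how many times each allele was observed
    allele_scores: list[str]  # quality scores, kept as text in this sketch


def parse_pgsnp_line(line: str) -> PgSnpRecord:
    """Parse one tab-separated pgSnp data line (assumed 7 columns)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 7:
        raise ValueError(f"expected 7 pgSnp columns, got {len(fields)}")
    chrom, start, end, name, allele_count, freqs, scores = fields
    alleles = name.split("/")
    if len(alleles) != int(allele_count):
        raise ValueError("alleleCount does not match the alleles in the name column")
    return PgSnpRecord(
        chrom=chrom,
        start=int(start),
        end=int(end),
        alleles=alleles,
        # Skip empty items so a trailing comma ("3,1,") does not break parsing.
        allele_counts=[int(x) for x in freqs.split(",") if x],
        allele_scores=[s for s in scores.split(",") if s],
    )
```

For a made-up line such as `chrN\t100\t101\tA/G\t2\t3,1\t0,0`, the record holds alleles `["A", "G"]` with counts `[3, 1]`.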
The Specify and Aggregate Individuals tools aggregate the allele counts by skin color; the Filter tool then finds SNPs whose alleles are fixed (or nearly fixed) differently in the two groups, like p.Thr111Ala above. The Cluster tool then finds regions of the genome that contain clusters of such SNPs.
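The aggregate, filter, and cluster steps can be sketched in plain Python as follows. This is not the Galaxy tools' implementation; it assumes each genotype is coded as the number of copies (0, 1 or 2) of a SNP's second allele, treats "nearly fixed" as an allele frequency within a hypothetical `tolerance` of 0 or 1, and clusters SNPs by a hypothetical maximum gap between neighbors and minimum cluster size. All function names and parameter values are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    chrom: str
    pos: int
    genotypes: dict[str, int]  # individual -> copies of the second allele (0, 1 or 2)


def aggregate(snp: Snp, groups: dict[str, str]) -> dict[str, tuple[int, int]]:
    """Sum (first-allele, second-allele) counts over the individuals in each group."""
    totals: dict[str, tuple[int, int]] = {}
    for individual, copies in snp.genotypes.items():
        group = groups[individual]
        first, second = totals.get(group, (0, 0))
        totals[group] = (first + 2 - copies, second + copies)
    return totals


def second_allele_freq(counts: tuple[int, int]) -> float:
    first, second = counts
    return second / (first + second)


def fixed_difference(totals, group_a, group_b, tolerance=0.05) -> bool:
    """True if the two groups are (nearly) fixed for different alleles."""
    fa = second_allele_freq(totals[group_a])
    fb = second_allele_freq(totals[group_b])
    return (fa <= tolerance and fb >= 1 - tolerance) or (
        fb <= tolerance and fa >= 1 - tolerance
    )


def cluster(positions, max_gap, min_size):
    """Split sorted positions (one chromosome) into runs whose neighbors are at
    most max_gap apart; return (first, last, count) for runs of >= min_size SNPs."""
    clusters, run = [], []
    for pos in sorted(positions):
        if run and pos - run[-1] > max_gap:
            if len(run) >= min_size:
                clusters.append((run[0], run[-1], len(run)))
            run = []
        run.append(pos)
    if len(run) >= min_size:
        clusters.append((run[0], run[-1], len(run)))
    return clusters


# Usage with made-up individuals, labels and positions.
groups = {"i1": "light", "i2": "light", "i3": "dark", "i4": "dark"}
snp = Snp("chrN", 100, {"i1": 2, "i2": 2, "i3": 0, "i4": 0})
totals = aggregate(snp, groups)  # {"light": (0, 4), "dark": (4, 0)}
print(fixed_difference(totals, "light", "dark"))  # True
print(cluster([100, 150, 5000, 5100, 5150, 20000], max_gap=500, min_size=2))
# [(100, 150, 2), (5000, 5150, 3)]
```

Filtering before clustering keeps the cluster step simple: it only ever sees the positions of SNPs that already passed the fixed-difference test.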
Import and reformat data
Fixed and clustered SNPs in functional regions
Filter for SNPs that have different alleles nearly fixed in the two populations, and are clustered together
Belinda M. Giardine, Burhans R., Riemer C., Cheng K., Ratan A., Harris R., Von Kuster G., Galaxy Development Team, Hardison R.C., Zhang Y., Miller W.
The Pennsylvania State University, Center for Comparative Genomics and Bioinformatics, University Park, PA
[Figure: population panels CEU, GBR, LWK, YRI]
Using the aaChanges tool to look for potentially phenotype-associated SNPs in the coding regions of our filtered dataset, we identify two coding SNPs: the known one shown above, and another in the CCNI2* gene, shown below in a larger set of populations.
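The core of an amino-acid-change check can be sketched as a codon lookup: substitute the SNP base into the reference codon and translate both codons with the standard genetic code. This sketch is not the aaChanges tool; it assumes the codon, the SNP's position within it, and the residue number are already known on the coding strand (the real tool has to derive these from gene annotation, including genes on the minus strand). `aa_change` is a hypothetical name.

```python
# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def aa_change(ref_codon: str, codon_pos: int, alt_base: str, residue: int):
    """Return (label, alt_codon, kind) for a single-base change at codon_pos
    (0, 1 or 2) of ref_codon, read on the coding strand."""
    ref_codon = ref_codon.upper()
    alt_codon = ref_codon[:codon_pos] + alt_base.upper() + ref_codon[codon_pos + 1:]
    ref_aa = THREE_LETTER[CODON_TABLE[ref_codon]]
    alt_aa = THREE_LETTER[CODON_TABLE[alt_codon]]
    kind = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return f"p.{ref_aa}{residue}{alt_aa}", alt_codon, kind
```

For the SLC24A5 SNP above, `aa_change("ACA", 0, "G", 111)` returns `("p.Thr111Ala", "GCA", "nonsynonymous")`, whereas a change at the third position, `aa_change("ACA", 2, "G", 111)`, gives the synonymous `ACG` (still Thr).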
[Figure: CCNI2 coding SNP in populations CEU, GBR, LWK, YRI, TSI, ASW]
Population key:
CEU: Caucasian, CEPH Collection
GBR: British in England and Scotland
LWK: Luhya in Webuye, Kenya
YRI: Yoruba in Ibadan, Nigeria
TSI: Toscani in Italy
ASW: African ancestry in SW USA
To look for potentially phenotype-associated SNPs in the non-coding regions, we can use the Intersect tool to intersect our filtered dataset with DNase hypersensitive sites (DHSs) and transcription factor occupied segments (TFos) from ENCODE. The history panel gives the count of SNPs that intersect DHSs (blue arrow), and the center panel shows the results of the TFos intersection. Two of these SNPs are displayed in the browser shot below.
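The idea behind intersecting SNPs with DHS or TFos intervals can be sketched as a point-in-interval lookup. This is a simplified stand-in for Galaxy's Intersect tool, not its implementation: it assumes BED-style half-open intervals `[start, end)` with 0-based SNP positions, merges overlapping regions per chromosome, and then finds each SNP's candidate region by binary search. `merge_intervals` and `snps_in_regions` are hypothetical names, and the coordinates in the usage note are made up.

```python
from bisect import bisect_right
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping (chrom, start, end) intervals into sorted, disjoint spans per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    merged = {}
    for chrom, spans in by_chrom.items():
        spans.sort()
        out = []
        for start, end in spans:
            if out and start <= out[-1][1]:
                # Overlapping or touching: extend the previous span.
                out[-1] = (out[-1][0], max(out[-1][1], end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def snps_in_regions(snps, regions):
    """Return the (chrom, pos) SNPs that fall inside any merged region."""
    starts = {chrom: [s for s, _ in spans] for chrom, spans in regions.items()}
    hits = []
    for chrom, pos in snps:
        spans = regions.get(chrom)
        if not spans:
            continue
        # Last region starting at or before pos; it is the only possible match
        # because merged spans are disjoint.
        i = bisect_right(starts[chrom], pos) - 1
        if i >= 0 and pos < spans[i][1]:
            hits.append((chrom, pos))
    return hits
```

With made-up regions `[("chrA", 100, 200), ("chrA", 150, 300)]`, the SNPs at 100 and 299 fall inside the merged span `(100, 300)`, while the ones at 99 and 300 do not; `len(snps_in_regions(...))` then plays the role of the intersect count shown in the history panel.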
* Homo sapiens cyclin I family, member 2
Tutorial: www.bx.psu.edu/miller_lab