(Human) Genomics BIOM/PHAR206 – 05/19/2014 Olivier Harismendy, PhD Division of Genome Information...
![Page 1: (Human) Genomics BIOM/PHAR206 – 05/19/2014 Olivier Harismendy, PhD Division of Genome Information Sciences Department of Pediatrics Moores UCSD Cancer.](https://reader034.fdocuments.us/reader034/viewer/2022051401/56649cd95503460f949a3553/html5/thumbnails/1.jpg)
(Human) Genomics
BIOM/PHAR206 – 05/19/2014
Olivier Harismendy, PhD
Division of Genome Information Sciences
Department of Pediatrics
Moores UCSD Cancer Center
![Page 2: (Human) Genomics BIOM/PHAR206 – 05/19/2014 Olivier Harismendy, PhD Division of Genome Information Sciences Department of Pediatrics Moores UCSD Cancer.](https://reader034.fdocuments.us/reader034/viewer/2022051401/56649cd95503460f949a3553/html5/thumbnails/2.jpg)
UCSC Genome Browser
• isPCR
• BLAT
• LiftOver
• Track types
  – BED minimum
  – BED extended
  – WIG
• Track Display and Shuffle
• Browser Navigation
• Custom Session – Export Figure
• Custom Tracks
![Page 3: (Human) Genomics BIOM/PHAR206 – 05/19/2014 Olivier Harismendy, PhD Division of Genome Information Sciences Department of Pediatrics Moores UCSD Cancer.](https://reader034.fdocuments.us/reader034/viewer/2022051401/56649cd95503460f949a3553/html5/thumbnails/3.jpg)
0-based coordinates
Sequence A|C|C|G|G|T|C|G|A
1 based 1 2 3 4 5 6 7 8 9
0 based 0 1 2 3 4 5 6 7 8
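The two conventions can be related with simple arithmetic. A minimal sketch, assuming 0-based intervals are half-open (as in BED) and 1-based intervals are closed; the function names `one_to_zero` and `zero_to_one` are illustrative and not part of the lecture.

```shell
# Convert a 1-based, closed interval (start end) to a 0-based, half-open one.
# Only the start shifts by one; the end value is numerically unchanged.
one_to_zero() {
    echo "$(( $1 - 1 )) $2"
}

# Convert a 0-based, half-open interval back to 1-based, closed.
zero_to_one() {
    echo "$(( $1 + 1 )) $2"
}

# Bases 3-5 of ACCGGTCGA ("CGG") in 1-based, closed coordinates
one_to_zero 3 5
```

The interval length is end minus start in the 0-based, half-open form, and end minus start plus one in the 1-based, closed form.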
![Page 4: (Human) Genomics BIOM/PHAR206 – 05/19/2014 Olivier Harismendy, PhD Division of Genome Information Sciences Department of Pediatrics Moores UCSD Cancer.](https://reader034.fdocuments.us/reader034/viewer/2022051401/56649cd95503460f949a3553/html5/thumbnails/4.jpg)
Human Genome Assemblies
![Page 5: (Human) Genomics BIOM/PHAR206 – 05/19/2014 Olivier Harismendy, PhD Division of Genome Information Sciences Department of Pediatrics Moores UCSD Cancer.](https://reader034.fdocuments.us/reader034/viewer/2022051401/56649cd95503460f949a3553/html5/thumbnails/5.jpg)
BED Track Formats
track name="ItemRGBDemo" description="Item RGB demonstration" visibility=2 itemRgb="On"
chr7 127471196 127472363 Pos1 0 + 127471196 127472363 255,0,0
chr7 127472363 127473530 Pos2 0 + 127472363 127473530 255,0,0
chr7 127473530 127474697 Pos3 0 + 127473530 127474697 255,0,0
chr7 127474697 127475864 Pos4 0 + 127474697 127475864 255,0,0
chr7 127475864 127477031 Neg1 0 - 127475864 127477031 0,0,255
chr7 127477031 127478198 Neg2 0 - 127477031 127478198 0,0,255
chr7 127478198 127479365 Neg3 0 - 127478198 127479365 0,0,255
chr7 127479365 127480532 Pos5 0 + 127479365 127480532 255,0,0
chr7 127480532 127481699 Neg4 0 - 127480532 127481699 0,0,255
![Page 6: (Human) Genomics BIOM/PHAR206 – 05/19/2014 Olivier Harismendy, PhD Division of Genome Information Sciences Department of Pediatrics Moores UCSD Cancer.](https://reader034.fdocuments.us/reader034/viewer/2022051401/56649cd95503460f949a3553/html5/thumbnails/6.jpg)
BED Track Formats
Header: space-separated parameters
• name=<track_label>
• description=<center_label>
• type=<track_type> - Defines the track type. The track type attribute is required for BAM, BED detail, bedGraph, bigBed, bigWig, broadPeak, narrowPeak, Microarray, VCF and WIG tracks.
• visibility=<display_mode> - 0 (hide), 1 (dense), 2 (full), 3 (pack), 4 (squish).
• color=<RRR,GGG,BBB> - Defines the main color for the annotation track.
• itemRgb=On
• colorByStrand=<RRR,GGG,BBB RRR,GGG,BBB> - Sets colors for + and - strands, in that order.
• useScore=<use_score>
• group=<group>
• priority=<priority> - When the group attribute is set, defines the display position of the track relative to other tracks.
• db=<UCSC_assembly_name> - When set, indicates the specific genome assembly for which the annotation data is intended.
• offset=<offset> - Defines a number to be added to all coordinates in the annotation track. The default is "0".
• maxItems=<#> - Defines the maximum number of items the track can contain.
• url=<external_url> - Defines a URL for an external link associated with this track.
• htmlUrl=<external_url> - Defines a URL for an HTML description page to be displayed with this track.
• bigDataUrl=<external_url> - Defines a URL to the data file for BAM, bigBed, bigWig or VCF tracks.
![Page 7: (Human) Genomics BIOM/PHAR206 – 05/19/2014 Olivier Harismendy, PhD Division of Genome Information Sciences Department of Pediatrics Moores UCSD Cancer.](https://reader034.fdocuments.us/reader034/viewer/2022051401/56649cd95503460f949a3553/html5/thumbnails/7.jpg)
BED Track Formats
• For intervals
• Header: space-separated configuration parameters
• Data lines: one feature per line, with the following fields
  – chrom - The name of the chromosome
  – chromStart - The starting position of the feature in the chromosome or scaffold. The first base in a chromosome is numbered 0.
  – chromEnd - The ending position of the feature in the chromosome or scaffold. The chromEnd base is not included in the display of the feature.
  – name - Defines the name of the BED line.
  – score - A score between 0 and 1000.
  – strand - Defines the strand - either '+' or '-'.
  – thickStart - The starting position at which the feature is drawn thickly
  – thickEnd - The ending position at which the feature is drawn thickly
  – itemRgb - An RGB value of the form R,G,B (e.g. 255,0,0).
  – blockCount - The number of blocks (exons) in the BED line.
  – blockSizes - A comma-separated list of the block sizes.
  – blockStarts - A comma-separated list of block starts.
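Because chromStart is 0-based and chromEnd is excluded, the length of a feature is simply chromEnd minus chromStart. A minimal sketch with awk, assuming whitespace-separated BED lines with at least six fields; the function name `bed_lengths` is hypothetical.

```shell
# Print name, strand and length for each BED feature read from stdin.
# Track, browser and comment lines are skipped.
bed_lengths() {
    awk '$1 !~ /^(track|browser|#)/ && NF >= 6 { print $4, $6, $3 - $2 }'
}

# Two features taken from the ItemRGBDemo example track
bed_lengths <<'EOF'
track name="ItemRGBDemo" description="Item RGB demonstration" visibility=2 itemRgb="On"
chr7 127471196 127472363 Pos1 0 + 127471196 127472363 255,0,0
chr7 127475864 127477031 Neg1 0 - 127475864 127477031 0,0,255
EOF
```

Both features come out 1167 bases long; no "+1" correction is needed with half-open coordinates.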
![Page 8: (Human) Genomics BIOM/PHAR206 – 05/19/2014 Olivier Harismendy, PhD Division of Genome Information Sciences Department of Pediatrics Moores UCSD Cancer.](https://reader034.fdocuments.us/reader034/viewer/2022051401/56649cd95503460f949a3553/html5/thumbnails/8.jpg)
WIG track format
# 150 base wide bar graph at arbitrarily spaced positions,
# threshold line drawn at y=11.76
# autoScale off viewing range set to [0:25]
# priority = 10 positions this as the first graph
# Note, one-relative coordinate system in use for this format
track type=wiggle_0 name="variableStep" description="variableStep format" visibility=full autoScale=off viewLimits=0.0:25.0 color=50,150,255 yLineMark=11.76 yLineOnOff=on priority=10
variableStep chrom=chr19 span=150
49304701 10.0
49304901 12.5
49305401 15.0
49305601 17.5
49305901 20.0
49306081 17.5
49306301 15.0
49306691 12.5
49307871 10.0
# 200 base wide points graph at every 300 bases, 50 pixel high graph
# autoScale off and viewing range set to [0:1000]
# priority = 20 positions this as the second graph
# Note, one-relative coordinate system in use for this format
track type=wiggle_0 name="fixedStep" description="fixedStep format" visibility=full autoScale=off viewLimits=0:1000 color=0,200,100 maxHeightPixels=100:50:20 graphType=points priority=20
fixedStep chrom=chr19 start=49307401 step=300 span=200
1000
900
800
700
600
500
400
300
200
100
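Since WIG positions are one-relative while BED-style intervals are 0-based and half-open, converting between them needs the start shifted by one. A rough sketch for variableStep data only, assuming a bedGraph-like output of chrom, start, end, value; the function name `wig_to_bedgraph` is made up for this illustration.

```shell
# Convert variableStep WIG records (1-based starts) to 0-based, half-open
# intervals: chrom, start-1, start-1+span, value.
wig_to_bedgraph() {
    awk '
        /^variableStep/ {
            mode = 1
            span = 1                      # WIG default when span is omitted
            for (i = 2; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == "chrom") chrom = kv[2]
                if (kv[1] == "span")  span  = kv[2]
            }
            next
        }
        /^fixedStep/ { mode = 0; next }   # fixedStep blocks are not handled here
        /^(#|track|browser)/ { next }
        mode && NF == 2 { print chrom, $1 - 1, $1 - 1 + span, $2 }
    '
}

wig_to_bedgraph <<'EOF'
variableStep chrom=chr19 span=150
49304701 10.0
49304901 12.5
EOF
```

The first record becomes `chr19 49304700 49304850 10.0`: the start drops by one and the end is start plus span.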
![Page 9: (Human) Genomics BIOM/PHAR206 – 05/19/2014 Olivier Harismendy, PhD Division of Genome Information Sciences Department of Pediatrics Moores UCSD Cancer.](https://reader034.fdocuments.us/reader034/viewer/2022051401/56649cd95503460f949a3553/html5/thumbnails/9.jpg)
Specific Tracks of interest
• UCSC genes
• RefSeq Genes
• RepeatMasker
• Conservation
• TF motif predictions
• dbSNP
• ENCODE
• Roadmap
![Page 10: (Human) Genomics BIOM/PHAR206 – 05/19/2014 Olivier Harismendy, PhD Division of Genome Information Sciences Department of Pediatrics Moores UCSD Cancer.](https://reader034.fdocuments.us/reader034/viewer/2022051401/56649cd95503460f949a3553/html5/thumbnails/10.jpg)
Custom Sessions
• Create an account
• Customize the tracks displayed
• Add your own track (limited in size and time)
• Save and Share
![Page 11: (Human) Genomics BIOM/PHAR206 – 05/19/2014 Olivier Harismendy, PhD Division of Genome Information Sciences Department of Pediatrics Moores UCSD Cancer.](https://reader034.fdocuments.us/reader034/viewer/2022051401/56649cd95503460f949a3553/html5/thumbnails/11.jpg)
Table Browser
• Subset gene, region, genome
• Output BED or fasta
• Intersection
• Filters
![Page 12: (Human) Genomics BIOM/PHAR206 – 05/19/2014 Olivier Harismendy, PhD Division of Genome Information Sciences Department of Pediatrics Moores UCSD Cancer.](https://reader034.fdocuments.us/reader034/viewer/2022051401/56649cd95503460f949a3553/html5/thumbnails/12.jpg)
ENCODE / Roadmap Tracks
• Track search
• Cell Types / Tissue Types
• Raw
• Peaks
• HMM
![Page 13: (Human) Genomics BIOM/PHAR206 – 05/19/2014 Olivier Harismendy, PhD Division of Genome Information Sciences Department of Pediatrics Moores UCSD Cancer.](https://reader034.fdocuments.us/reader034/viewer/2022051401/56649cd95503460f949a3553/html5/thumbnails/13.jpg)
UNIX commands
• head
• more (press q to exit)
• cat
  – Example: cat file
  – Example: cat file1 file2
• grep
  – grep -v 'expression' (lines that do not match)
  – grep -A 1 'expression' (also print 1 line after each match)
  – grep -B 2 'expression' (also print 2 lines before each match)
  – Example: grep -v '#' file.txt to remove comments
• Expression metacharacters
  – $ end of line
  – ^ beginning of line
  – [AB] A or B
  – . any single character
  – * zero or more of the preceding character
  – Example: 'CDKN.*' or 'chr[1-7]'
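The anchors and character classes above can be tried on a few lines of text. A small sketch, assuming a two-column gene/chromosome listing held in a shell variable named `genes` (an illustrative name).

```shell
# Small gene/chromosome listing with one comment line.
genes='# gene chrom
CDKN2A chr9
CDKN1B chr12
TP53 chr17
EGFR chr7'

printf '%s\n' "$genes" | grep -v '^#'          # drop comment lines
printf '%s\n' "$genes" | grep '^CDKN'          # lines starting with CDKN
printf '%s\n' "$genes" | grep 'chr[1-7]$'      # lines ending in chr1 ... chr7
printf '%s\n' "$genes" | grep -c 'chr1[0-9]$'  # count lines ending in chr10 ... chr19
```

Without the trailing `$`, the pattern `chr[1-7]` would also match chr12 and chr17, because the expression only has to match somewhere in the line.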
![Page 14: (Human) Genomics BIOM/PHAR206 – 05/19/2014 Olivier Harismendy, PhD Division of Genome Information Sciences Department of Pediatrics Moores UCSD Cancer.](https://reader034.fdocuments.us/reader034/viewer/2022051401/56649cd95503460f949a3553/html5/thumbnails/14.jpg)
UNIX commands
• cut
  – cut -f 1
  – cut -f 3 -d ':'
• sort
  – sort -n
  – sort -nr (or sort -n -r)
  – sort -k 2
• uniq
  – uniq
  – uniq -c
• wc
  – wc -l file.txt
• Example: cut -f 1 file | sort | uniq -c
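The closing example chains three of these commands. A minimal sketch on a made-up tab-separated table; the variable `rows` and the function `count_col1` are illustrative names, and the final awk step only normalizes the padded counts that `uniq -c` prints.

```shell
# Tab-separated toy table: chromosome, position, strand (values made up).
rows=$(printf 'chr7\t100\t+\nchr7\t250\t-\nchr1\t90\t+\nchr7\t300\t+\n')

# Count rows per chromosome: cut takes column 1, sort groups identical
# values together, and uniq -c counts each run of identical lines.
count_col1() {
    cut -f 1 | sort | uniq -c | awk '{ print $2, $1 }'
}

printf '%s\n' "$rows" | count_col1

# Most frequent first: numeric (-n), reversed (-r), keyed on field 2.
printf '%s\n' "$rows" | count_col1 | sort -k 2 -nr
```

The sort step matters because `uniq` only collapses adjacent duplicate lines.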
![Page 15: (Human) Genomics BIOM/PHAR206 – 05/19/2014 Olivier Harismendy, PhD Division of Genome Information Sciences Department of Pediatrics Moores UCSD Cancer.](https://reader034.fdocuments.us/reader034/viewer/2022051401/56649cd95503460f949a3553/html5/thumbnails/15.jpg)
UNIX commands
• sed
  – sed 's/foo/bar/g' file: find and replace
• awk
  – awk '$3>2000' file: select rows with 3rd field > 2000
  – awk '{if ($3>2000) print $1,$2}' file: only print first 2 columns
  – awk '{sum+=$3} END {print sum}' file: print sum of column 3
  – awk '{sum+=$3} END {print sum/NR}' file: print average of column 3
• join
  – join -j 1 sorted_file1 sorted_file2
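The sed and awk one-liners above can be run on a small table to see what each prints. A sketch with made-up values held in a variable named `tbl` (an illustrative name); join is left out because it needs two sorted files.

```shell
# Toy three-column table: name, chromosome, length (values made up).
tbl=$(printf 'a chr1 1500\nb chr2 2500\nc chr2 3500\n')

# Rows whose 3rd field exceeds 2000, printing only the first two columns.
printf '%s\n' "$tbl" | awk '$3 > 2000 { print $1, $2 }'

# Sum and mean of column 3; in the END block NR is the number of rows read.
printf '%s\n' "$tbl" | awk '{ sum += $3 } END { print sum, sum / NR }'

# Replace every "chr" with "Chr" (the trailing g means all matches per line).
printf '%s\n' "$tbl" | sed 's/chr/Chr/g'
```

Note that dividing by NR counts every input line, so header or comment lines should be filtered out first.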
![Page 16: (Human) Genomics BIOM/PHAR206 – 05/19/2014 Olivier Harismendy, PhD Division of Genome Information Sciences Department of Pediatrics Moores UCSD Cancer.](https://reader034.fdocuments.us/reader034/viewer/2022051401/56649cd95503460f949a3553/html5/thumbnails/16.jpg)
Demo #1 and #2
![Page 17: (Human) Genomics BIOM/PHAR206 – 05/19/2014 Olivier Harismendy, PhD Division of Genome Information Sciences Department of Pediatrics Moores UCSD Cancer.](https://reader034.fdocuments.us/reader034/viewer/2022051401/56649cd95503460f949a3553/html5/thumbnails/17.jpg)
Human Genetic Variation
DNA variants (Sequence differences)
Highly Similar Genomes
Phenotypic Differences (Physical traits)
![Page 18: (Human) Genomics BIOM/PHAR206 – 05/19/2014 Olivier Harismendy, PhD Division of Genome Information Sciences Department of Pediatrics Moores UCSD Cancer.](https://reader034.fdocuments.us/reader034/viewer/2022051401/56649cd95503460f949a3553/html5/thumbnails/18.jpg)
Variant Types
Frazer et al. 2009
Rahim, Harismendy et al (2008)
![Page 19: (Human) Genomics BIOM/PHAR206 – 05/19/2014 Olivier Harismendy, PhD Division of Genome Information Sciences Department of Pediatrics Moores UCSD Cancer.](https://reader034.fdocuments.us/reader034/viewer/2022051401/56649cd95503460f949a3553/html5/thumbnails/19.jpg)
Variants from an individual genome
Within any given individual there are ~4 million genetic variants encompassing ~12 Mb
![Page 20: (Human) Genomics BIOM/PHAR206 – 05/19/2014 Olivier Harismendy, PhD Division of Genome Information Sciences Department of Pediatrics Moores UCSD Cancer.](https://reader034.fdocuments.us/reader034/viewer/2022051401/56649cd95503460f949a3553/html5/thumbnails/20.jpg)
Variants from multiple genomes
Within a given individual the majority of variants are common.
![Page 21: (Human) Genomics BIOM/PHAR206 – 05/19/2014 Olivier Harismendy, PhD Division of Genome Information Sciences Department of Pediatrics Moores UCSD Cancer.](https://reader034.fdocuments.us/reader034/viewer/2022051401/56649cd95503460f949a3553/html5/thumbnails/21.jpg)
Next Generation DNA analysis
• Whole genome sequencing
  – Mutations (coding and non-coding)
  – Translocations
  – Copy Number Variants
• Whole Exome Sequencing
  – Mutations (coding)
  – ~Copy number variants (trisomy, gene amplifications)
• Gene Panel
  – Mutations (coding)
![Page 22: (Human) Genomics BIOM/PHAR206 – 05/19/2014 Olivier Harismendy, PhD Division of Genome Information Sciences Department of Pediatrics Moores UCSD Cancer.](https://reader034.fdocuments.us/reader034/viewer/2022051401/56649cd95503460f949a3553/html5/thumbnails/22.jpg)
Variant Frequencies
• Common genetic variants
  – second allele present at greater than 3% frequency
• Rare genetic variants
  – present at less than 3% frequency, and commonly at very low frequencies
• Private variants
  – in limited families or single individuals
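The common/rare split is a simple threshold on the frequency of the second allele. A minimal sketch using the 3% cutoff quoted above; the function name `classify_maf` is hypothetical, and private variants cannot be identified from a frequency alone.

```shell
# Label a variant by its second-allele frequency using the 3% cutoff.
# A frequency of exactly 0.03 is treated as rare here, which is an
# arbitrary choice made for this sketch.
classify_maf() {
    awk -v maf="$1" 'BEGIN { print ((maf + 0 > 0.03) ? "common" : "rare") }'
}

classify_maf 0.25
classify_maf 0.004
```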
![Page 23: (Human) Genomics BIOM/PHAR206 – 05/19/2014 Olivier Harismendy, PhD Division of Genome Information Sciences Department of Pediatrics Moores UCSD Cancer.](https://reader034.fdocuments.us/reader034/viewer/2022051401/56649cd95503460f949a3553/html5/thumbnails/23.jpg)
Map of Genetic Variation
Relationships between common SNPs in the human genome
Frazer et al (2007)
HapMap Project
Genotyped ~3.1 million SNPs in 270 individuals
– 90 Yoruba in Ibadan, Nigeria (YRI)
– 90 European descent in Utah, USA (CEU)
– 45 Han Chinese in Beijing, China (CHB)
– 45 Japanese in Tokyo, Japan (JPT)
![Page 24: (Human) Genomics BIOM/PHAR206 – 05/19/2014 Olivier Harismendy, PhD Division of Genome Information Sciences Department of Pediatrics Moores UCSD Cancer.](https://reader034.fdocuments.us/reader034/viewer/2022051401/56649cd95503460f949a3553/html5/thumbnails/24.jpg)
1000G Project
![Page 25: (Human) Genomics BIOM/PHAR206 – 05/19/2014 Olivier Harismendy, PhD Division of Genome Information Sciences Department of Pediatrics Moores UCSD Cancer.](https://reader034.fdocuments.us/reader034/viewer/2022051401/56649cd95503460f949a3553/html5/thumbnails/25.jpg)
VCF format
##fileformat=VCFv4.1
##fileDate=20090805
##source=myImputationProgramV3.1
##reference=file:///seq/references/1000GenomesPilot-NCBI36.fasta
##contig=<ID=20,length=62435964,assembly=B36,md5=f126cdf8a6e0c7f379d618ff66beb2da,species="Homo sapiens",taxonomy=x>
##phasing=partial
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
##FILTER=<ID=q10,Description="Quality below 10">
##FILTER=<ID=s50,Description="Less than 50% of samples have data">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003
20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,.
20 17330 . T A 3 q10 NS=3;DP=11;AF=0.017 GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3 0/0:41:3
20 1110696 rs6040355 A G,T 67 PASS NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2 2/2:35:4
20 1230237 . T . 47 PASS NS=3;DP=13;AA=T GT:GQ:DP:HQ 0|0:54:7:56,60 0|0:48:4:51,51 0/0:61:2
20 1234567 microsat1 GTC G,GTCT 50 PASS NS=3;DP=9;AA=G GT:GQ:DP 0/1:35:4 0/2:17:2 1/1:40:3
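The fixed columns and the semicolon-separated INFO field of a VCF record can be picked apart with awk. A rough sketch, assuming tab-separated records as in real VCF files; the function name `vcf_pass_af` is invented for this example.

```shell
# Print chrom, pos, ref, alt and the AF value(s) for records whose FILTER
# column is PASS. Header lines (starting with #) are skipped; records
# without an AF entry in INFO get "NA".
vcf_pass_af() {
    awk -F '\t' '
        /^#/ { next }
        $7 == "PASS" {
            af = "NA"
            n = split($8, info, ";")
            for (i = 1; i <= n; i++)
                if (info[i] ~ /^AF=/) af = substr(info[i], 4)
            print $1, $2, $4, $5, af
        }
    '
}

# The first two records of the example, rebuilt with tab separators
# (first eight columns only).
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    20 14370 rs6054257 G A 29 PASS 'NS=3;DP=14;AF=0.5;DB;H2' \
    20 17330 . T A 3 q10 'NS=3;DP=11;AF=0.017' |
    vcf_pass_af
```

Only the first record is printed, because the second one carries the q10 filter flag instead of PASS.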
![Page 26: (Human) Genomics BIOM/PHAR206 – 05/19/2014 Olivier Harismendy, PhD Division of Genome Information Sciences Department of Pediatrics Moores UCSD Cancer.](https://reader034.fdocuments.us/reader034/viewer/2022051401/56649cd95503460f949a3553/html5/thumbnails/26.jpg)
Linkage Disequilibrium (LD)
Given two biallelic sites, there are four combinations that can be observed. Two example distributions:
SNP 1 = A/G
SNP 2 = A/C

| SNP1-SNP2 | Case r²=1 | Case r²=0 |
|---|---|---|
| AA | 70 | 25 |
| AC | 0 | 25 |
| GA | 0 | 25 |
| GC | 30 | 25 |

LD measures the level of correlation between SNPs.
LD is the consequence of recombination at preferential sites.
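The two cases in the table can be checked numerically. A minimal sketch, assuming the standard definition r² = D² / (pA(1−pA) pB(1−pB)) with D = pAB − pA·pB, which the slide does not spell out; the function name `ld_r2` is illustrative.

```shell
# r^2 from the four haplotype counts, given in the order used in the table:
# n11 = A-A, n12 = A-C, n21 = G-A, n22 = G-C (SNP1 allele, then SNP2 allele).
ld_r2() {
    awk -v n11="$1" -v n12="$2" -v n21="$3" -v n22="$4" 'BEGIN {
        n  = n11 + n12 + n21 + n22
        p1 = (n11 + n12) / n        # frequency of allele A at SNP 1
        q1 = (n11 + n21) / n        # frequency of allele A at SNP 2
        d  = n11 / n - p1 * q1      # disequilibrium coefficient D
        printf "%.2f\n", d * d / (p1 * (1 - p1) * q1 * (1 - q1))
    }'
}

ld_r2 70 0 0 30     # first case in the table
ld_r2 25 25 25 25   # second case in the table
```

The sketch does not guard against monomorphic sites, where an allele frequency of 0 or 1 makes the denominator zero.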
![Page 27: (Human) Genomics BIOM/PHAR206 – 05/19/2014 Olivier Harismendy, PhD Division of Genome Information Sciences Department of Pediatrics Moores UCSD Cancer.](https://reader034.fdocuments.us/reader034/viewer/2022051401/56649cd95503460f949a3553/html5/thumbnails/27.jpg)
LD Bin structure example
LD bin = groups of SNPs with r² ≥ 0.8
• The majority of common SNPs are in LD bins in the human genome
• Genotypes of a set of ~500,000 “tag SNPs” provide information (r² ≥ 0.8) regarding a large fraction (90%) of all 8 million common SNPs present in humans.
![Page 28: (Human) Genomics BIOM/PHAR206 – 05/19/2014 Olivier Harismendy, PhD Division of Genome Information Sciences Department of Pediatrics Moores UCSD Cancer.](https://reader034.fdocuments.us/reader034/viewer/2022051401/56649cd95503460f949a3553/html5/thumbnails/28.jpg)
GWAS principle
Tests if common SNPs tagging an interval in the human genome are “associated” with a disease
From phenotype to genotype
http://www.mpg.de
![Page 29: (Human) Genomics BIOM/PHAR206 – 05/19/2014 Olivier Harismendy, PhD Division of Genome Information Sciences Department of Pediatrics Moores UCSD Cancer.](https://reader034.fdocuments.us/reader034/viewer/2022051401/56649cd95503460f949a3553/html5/thumbnails/29.jpg)
GWAS results
WTCCC (2007)
PR interval
Large number of tests requires a low p-value threshold (5×10⁻⁸)
Sample size determines which variant frequencies and effect sizes can be detected (power)
Q1 2011: 221 traits, 1319 studies, >4000 associated SNPs
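The 5×10⁻⁸ threshold is consistent with a Bonferroni-style correction. A small sketch, assuming a family-wise error rate of 0.05 and roughly one million independent tests; both figures are assumptions for illustration and are not stated on the slide, and `gw_threshold` is a hypothetical name.

```shell
# Bonferroni-style threshold: family-wise alpha divided by the number of tests.
# One million independent tests is an assumed round figure.
gw_threshold() {
    awk -v alpha="$1" -v tests="$2" 'BEGIN { printf "%.0e\n", alpha / tests }'
}

gw_threshold 0.05 1000000
```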
![Page 30: (Human) Genomics BIOM/PHAR206 – 05/19/2014 Olivier Harismendy, PhD Division of Genome Information Sciences Department of Pediatrics Moores UCSD Cancer.](https://reader034.fdocuments.us/reader034/viewer/2022051401/56649cd95503460f949a3553/html5/thumbnails/30.jpg)
GWAS highlights
• Many genes/loci not previously known to be involved in the diseases studied
• Newly identified pathways suggest that molecular sub-phenotypes of common diseases may exist
• Many common diseases have the same associated genes suggesting similar etiologies
![Page 31: (Human) Genomics BIOM/PHAR206 – 05/19/2014 Olivier Harismendy, PhD Division of Genome Information Sciences Department of Pediatrics Moores UCSD Cancer.](https://reader034.fdocuments.us/reader034/viewer/2022051401/56649cd95503460f949a3553/html5/thumbnails/31.jpg)
GWAS limitations
– Genetic
  • Small effect sizes: only explain a small fraction (1-25%) of the heritability
  • Missing heritability can be hiding in
    – Rare variants with large effects
    – Epistasis (Gene x Gene interactions)
    – Gene x Environment interactions (overlooked in heritability studies)
– Clinical
  • Limited prognostic value: classic markers (family history, lifestyle) work better
  • Limited by ethnicity
– Functional
  • Proxy SNPs are not the functional ones
  • Genes associated by proximity: variants are mostly outside genes
  • Cell type and condition unknown
![Page 32: (Human) Genomics BIOM/PHAR206 – 05/19/2014 Olivier Harismendy, PhD Division of Genome Information Sciences Department of Pediatrics Moores UCSD Cancer.](https://reader034.fdocuments.us/reader034/viewer/2022051401/56649cd95503460f949a3553/html5/thumbnails/32.jpg)
Demo #3
![Page 33: (Human) Genomics BIOM/PHAR206 – 05/19/2014 Olivier Harismendy, PhD Division of Genome Information Sciences Department of Pediatrics Moores UCSD Cancer.](https://reader034.fdocuments.us/reader034/viewer/2022051401/56649cd95503460f949a3553/html5/thumbnails/33.jpg)
Cancer Types
![Page 34: (Human) Genomics BIOM/PHAR206 – 05/19/2014 Olivier Harismendy, PhD Division of Genome Information Sciences Department of Pediatrics Moores UCSD Cancer.](https://reader034.fdocuments.us/reader034/viewer/2022051401/56649cd95503460f949a3553/html5/thumbnails/34.jpg)
Clinical Data Collected
age_at_initial_pathologic_diagnosis 100%
ajcc_cancer_staging_handbook_edition 80%
anatomic_site_colorectal 88%
bcr_patient_uuid 100%
braf_gene_analysis_performed 87%
braf_gene_analysis_result 6%
circumferential_resection_margin 10%
colon_polyps_present 42%
date_of_form_completion 100%
date_of_initial_pathologic_diagnosis 100%
days_to_birth 100%
days_to_death 89%
days_to_initial_pathologic_diagnosis 100%
days_to_last_followup 96%
days_to_last_known_alive 61%
distant_metastasis_pathologic_spread 98%
ethnicity 55%
gender 100%
height 47%
histological_type 99%
history_of_colon_polyps 82%
icd_10 89%
icd_o_3_histology 99%
icd_o_3_site 99%
informed_consent_verified 100%
kras_gene_analysis_performed 89%
kras_mutation_codon 4%
kras_mutation_found 9%
loss_expression_of_mismatch_repair_proteins_by_ihc 74%
lymph_node_examined_count 98%
lymphatic_invasion 87%
lymphnode_pathologic_spread 100%
microsatellite_instability 16%
non_nodal_tumor_deposits 43%
number_of_abnormal_loci 12%
number_of_first_degree_relatives_with_cancer_diagnosis 85%
number_of_loci_tested 12%
number_of_lymphnodes_positive_by_he 94%
number_of_lymphnodes_positive_by_ihc 9%
patient_id 100%
perineural_invasion_present 33%
person_neoplasm_cancer_status 86%
preoperative_pretreatment_cea_level 60%
pretreatment_history 100%
primary_lymph_node_presentation_assessment 98%
primary_tumor_pathologic_spread 100%
prior_diagnosis 100%
race 57%
residual_tumor 82%
synchronous_colon_cancer_present 87%
tissue_source_site 100%
tumor_stage 96%
tumor_tissue_site 100%
venous_invasion 83%
vital_status 100%
weight 51%
anatomic_organ_subdivision 2%
loss_expression_of_mismatch_repair_proteins_by_ihc_result 18%
Categories: Personal and history, Histology, Clinical, Molecular
![Page 35: (Human) Genomics BIOM/PHAR206 – 05/19/2014 Olivier Harismendy, PhD Division of Genome Information Sciences Department of Pediatrics Moores UCSD Cancer.](https://reader034.fdocuments.us/reader034/viewer/2022051401/56649cd95503460f949a3553/html5/thumbnails/35.jpg)
Clinical Data Collected
[Figure; axis labels: "Days after Dx", "Patients", "Decreasing Intrinsic sensitivity"]
![Page 36: (Human) Genomics BIOM/PHAR206 – 05/19/2014 Olivier Harismendy, PhD Division of Genome Information Sciences Department of Pediatrics Moores UCSD Cancer.](https://reader034.fdocuments.us/reader034/viewer/2022051401/56649cd95503460f949a3553/html5/thumbnails/36.jpg)
Molecular Data Collected
| Molecule | Method | Measured entity | Data |
|---|---|---|---|
| RNA | microarrays | 15,000 transcripts | Expression levels |
| RNA | RNA-Seq | All known and novel transcripts | Expression levels, isoform quantification, editing, novel transcripts, fusion transcripts |
| DNA | microarrays | 100k to 1M SNP | Copy Number Aberrations, LoH, Polymorphisms |
| DNA | Sanger Sequencing | 30 M base pairs | Coding Mutations |
| DNA | whole exome sequencing | 50 M base pairs | Coding Mutations, Copy Number Aberrations |
| DNA | whole genome | 3 billion base pairs | Coding and Regulatory Mutations, Copy Number Aberrations, Rearrangements |
| DNA | Methylation Array | 450,000 CpG | Methylation levels |
| DNA | Methylation Array | 27,000 CpG | Methylation levels |
![Page 37: (Human) Genomics BIOM/PHAR206 – 05/19/2014 Olivier Harismendy, PhD Division of Genome Information Sciences Department of Pediatrics Moores UCSD Cancer.](https://reader034.fdocuments.us/reader034/viewer/2022051401/56649cd95503460f949a3553/html5/thumbnails/37.jpg)
Demo #4