Considerations for Analyzing Targeted NGS Data: Exome


Considerations for Analyzing Targeted NGS Data

Exome

Tim Hague, CTO

Exome Analysis

Three sets of whole-exome sequences from the same individual, each captured with a different kit

One set had data problems because its reads came from two different sequencers

The remaining two sets were analyzed both by the customer and by Omixon

Exome Targets

Illumina TruSeq: ~62 Mbp

NimbleGen SeqCap EZ Exome: ~64 Mbp

~35 Mbp overlap between the two target sets

Targets include exons, ORFs and putative translated regions

Captured 40M and 37M read pairs respectively, with 101 bp reads
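Target sizes and the overlap between two kits come from intersecting their target interval files. A minimal sketch, assuming the targets are already loaded as half-open, BED-style intervals per chromosome; the toy intervals and the function names are hypothetical, not taken from either vendor's files:

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), as in BED files
Targets = Dict[str, List[Interval]]  # chromosome -> intervals


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def total_bp(targets: Targets) -> int:
    """Bases covered by a target set, counting overlapping intervals once."""
    return sum(end - start for ivs in targets.values() for start, end in merge(ivs))


def overlap_bp(a: Targets, b: Targets) -> int:
    """Bases covered by both target sets, via a two-pointer sweep per chromosome."""
    shared = 0
    for chrom in a.keys() & b.keys():
        xs, ys = merge(a[chrom]), merge(b[chrom])
        i = j = 0
        while i < len(xs) and j < len(ys):
            lo = max(xs[i][0], ys[j][0])
            hi = min(xs[i][1], ys[j][1])
            if lo < hi:
                shared += hi - lo
            # Advance whichever interval ends first.
            if xs[i][1] < ys[j][1]:
                i += 1
            else:
                j += 1
    return shared


# Hypothetical toy targets standing in for two kits' BED files.
kit_a = {"chr1": [(100, 200), (150, 300)], "chr2": [(0, 50)]}
kit_b = {"chr1": [(250, 400)], "chr2": [(25, 75)]}
print(total_bp(kit_a), total_bp(kit_b), overlap_bp(kit_a, kit_b))  # 250 200 75
```

With real kits, each dictionary would be filled from the vendor's BED file; after sorting, the sweep is linear in the number of intervals.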

Full Analysis Pipelines

In this case we are comparing two full NGS analysis pipelines, including the mapping/alignment and a multi-step variant-calling pipeline

The Omixon pipeline for this analysis uses two variant callers

The Omixon pipeline also uses recalibration and indel realignment
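The slides do not say how the two callers' outputs are combined. One common way to compare two call sets is to match variants on chromosome, position, reference and alternate allele. The sketch below does that, assuming both sets were normalized first. `Variant` and `compare_calls` are illustrative names, and keeping only shared calls is just one possible rule:

```python
from typing import Dict, Iterable, NamedTuple, Set


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based, as in VCF
    ref: str
    alt: str


def compare_calls(a: Iterable[Variant], b: Iterable[Variant]) -> Dict[str, Set[Variant]]:
    """Split two call sets into shared and caller-specific variants.

    Exact matching assumes both sets were normalized beforehand
    (indels left-aligned, multi-allelic records split).
    """
    set_a, set_b = set(a), set(b)
    return {"both": set_a & set_b, "only_a": set_a - set_b, "only_b": set_b - set_a}


caller1 = [Variant("chr1", 1000, "A", "G"), Variant("chr1", 2000, "CT", "C")]
caller2 = [Variant("chr1", 1000, "A", "G"), Variant("chr2", 500, "G", "T")]
result = compare_calls(caller1, caller2)
# One possible combination rule: keep only calls that both callers agree on.
consensus = sorted(result["both"])
print(consensus)  # [Variant(chrom='chr1', pos=1000, ref='A', alt='G')]
```

An intersection trades sensitivity for specificity. A union does the opposite, so the right rule depends on which error matters more for the study.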

Finding long indels 1.

Better indel resolution 1.

Better indel resolution 2.

Indel Handling

If indels are important to an analysis, this needs to be taken into account from the planning stage onwards

BWA performs better when indel realignment is used in combination with paired-end data
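Indel realignment works on regions where reads suggest an indel. BWA records indels in each alignment's CIGAR string (`I` for insertions, `D` for deletions). Below is a simplified sketch of the first step, loosely modelled on target creation in GATK3-style indel realignment: collect indels from CIGAR strings and merge padded windows around them into candidate intervals. The 10 bp padding, the toy reads and the function names are assumptions for illustration. Real tools also consider known indel sites and clusters of mismatches.

```python
import re
from typing import List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def indels_from_cigar(pos: int, cigar: str) -> List[Tuple[str, int, int]]:
    """Return (kind, ref_pos, length) for each insertion or deletion in one alignment.

    `pos` is the 0-based reference position where the alignment starts.
    """
    events: List[Tuple[str, int, int]] = []
    ref = pos
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op == "I":
            events.append(("INS", ref, length))
        elif op == "D":
            events.append(("DEL", ref, length))
        if op in REF_CONSUMING:
            ref += length
    return events


def realignment_targets(reads: List[Tuple[int, str]], pad: int = 10) -> List[Tuple[int, int]]:
    """Merge padded windows around every observed indel into candidate intervals."""
    windows = []
    for pos, cigar in reads:
        for kind, start, length in indels_from_cigar(pos, cigar):
            end = start + (length if kind == "DEL" else 1)
            windows.append((max(0, start - pad), end + pad))
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(windows):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical 101 bp reads as (start, CIGAR): two support the same 3 bp
# deletion, and a third carries a 25 bp insertion.
reads = [(100, "50M3D51M"), (120, "30M3D71M"), (400, "20M25I56M")]
print(realignment_targets(reads))  # [(140, 163), (410, 431)]
```

Long insertions like the 25 bp one here are only visible this way if the aligner managed to place the read at all, which is one reason indel handling has to be planned up front.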

Fewer low-quality false positives

Quality and Coverage

Some of these low-quality variants can be removed by filtering after variant calling

Quality and coverage cut-offs have to be set appropriately in both the alignment and the variant-calling steps

Quality recalibration can also help to reduce low-quality false positives
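Post-call filtering and quality recalibration can both be illustrated in a few lines. In this sketch the cut-offs (QUAL >= 30, depth >= 10) are hypothetical placeholders, not values from this study. The recalibration is a heavily simplified version of the idea behind base quality score recalibration. It compares reported base qualities with the mismatch rate observed at sites assumed to be non-variant, and it uses the reported quality as the only covariate.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def filter_calls(calls: Iterable[dict], min_qual: float = 30.0, min_depth: int = 10) -> List[dict]:
    """Hard-filter calls on variant quality and read depth (placeholder cut-offs)."""
    return [c for c in calls if c["qual"] >= min_qual and c["depth"] >= min_depth]


def empirical_quality(observations: Iterable[Tuple[int, bool]]) -> Dict[int, float]:
    """Map each reported base quality to the Phred quality actually observed.

    `observations` are (reported_quality, is_mismatch) pairs from sites assumed
    to be non-variant, so every mismatch there is counted as a sequencing error.
    """
    totals: Dict[int, int] = defaultdict(int)
    errors: Dict[int, int] = defaultdict(int)
    for q, mismatch in observations:
        totals[q] += 1
        errors[q] += int(mismatch)
    # Pseudocounts (+1 error, +2 observations) keep the estimate finite for error-free bins.
    return {q: -10 * math.log10((errors[q] + 1) / (totals[q] + 2)) for q in totals}


calls = [{"qual": 50.0, "depth": 25}, {"qual": 12.0, "depth": 40}, {"qual": 60.0, "depth": 4}]
print(filter_calls(calls))  # [{'qual': 50.0, 'depth': 25}]

# 998 bases reported as Q30, of which 9 mismatch: an error rate near 1%, i.e. about Q20.
obs = [(30, i < 9) for i in range(998)]
print(round(empirical_quality(obs)[30], 1))  # 20.0
```

A variant caller fed the recalibrated (lower) qualities would weigh those bases less, which is how recalibration reduces low-quality false positives before any filtering happens.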

Variants next to coding regions

Splicing and Promoters

Most exome kits also capture sequence just outside the coding regions, so they yield variant calls there as well, for example near splice sites and promoters

These should be included in the analysis if possible
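Deciding whether a call falls in an exon, near a splice site, or further out only needs the exon boundaries. Below is a minimal sketch, assuming 1-based inclusive exon coordinates on one chromosome. The 8 bp `splice_window` and the labels are arbitrary illustrative choices. Promoter annotation is left out because it needs strand-aware upstream windows.

```python
from typing import List, Optional, Tuple

Exon = Tuple[int, int]  # 1-based, inclusive [start, end] on one chromosome


def classify(pos: int, exons: List[Exon], splice_window: int = 8) -> str:
    """Label a variant as exonic, near a splice site, or in flanking sequence."""
    nearest: Optional[int] = None
    for start, end in exons:
        if start <= pos <= end:
            return "exonic"
        # Distance from the variant to the closer boundary of this exon.
        distance = start - pos if pos < start else pos - end
        nearest = distance if nearest is None else min(nearest, distance)
    if nearest is not None and nearest <= splice_window:
        return "splice_region"
    return "flanking"


exons = [(1000, 1100), (2000, 2150)]
for pos in (1050, 1104, 1995, 1500):
    print(pos, classify(pos, exons))
# 1050 exonic
# 1104 splice_region
# 1995 splice_region
# 1500 flanking
```

Calls labelled `splice_region` or `flanking` are exactly the ones that would be lost if the analysis were restricted to coding sequence.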

Fewer false positives in complex regions 1.

Fewer false positives in complex regions 2.

Fewer false positives in complex regions 3.

Fewer false positives in complex regions 4. Higher coverage.

Fewer false positives in complex regions 5. Lower coverage.

Complex regions

Mismappings due to pseudogenes or repeats, or just complex regions?

Sometimes more coverage can actually be bad

Watch out for non-specific read mappings (reads that map to multiple places)
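Non-specific mappings and suspiciously high coverage can both be flagged per position. This sketch relies on BWA assigning mapping quality 0 to reads that align equally well to several locations. The per-position MAPQ lists, the 20% ambiguous-read cut-off and the 3x-median depth cut-off are hypothetical inputs chosen for illustration. In practice the MAPQ values would be read from the BAM file.

```python
from statistics import median
from typing import Dict, List


def flag_suspicious(mapqs_by_pos: Dict[int, List[int]],
                    max_ambiguous_frac: float = 0.2,
                    max_depth_ratio: float = 3.0) -> Dict[int, List[str]]:
    """Flag positions whose coverage looks non-specific.

    `mapqs_by_pos` maps a position to the MAPQ values of the reads covering it.
    The fraction of MAPQ-0 reads is used as a proxy for multi-mapping, and depth
    far above the median is treated as a sign of reads piling in from elsewhere.
    """
    depths = {pos: len(mapqs) for pos, mapqs in mapqs_by_pos.items()}
    typical = median(depths.values())
    flags: Dict[int, List[str]] = {}
    for pos, mapqs in mapqs_by_pos.items():
        reasons = []
        if mapqs and sum(q == 0 for q in mapqs) / len(mapqs) > max_ambiguous_frac:
            reasons.append("many_ambiguous_reads")
        if typical and depths[pos] > max_depth_ratio * typical:
            reasons.append("excess_depth")
        if reasons:
            flags[pos] = reasons
    return flags


data = {
    100: [60] * 20,             # clean, unique mappings
    101: [60] * 18 + [0] * 2,   # 10% ambiguous: below the cut-off
    102: [60] * 10 + [0] * 10,  # 50% ambiguous
    103: [60] * 80,             # 4x the median depth
}
print(flag_suspicious(data))  # {102: ['many_ambiguous_reads'], 103: ['excess_depth']}
```

Position 103 shows why more coverage is not automatically better. Its reads map uniquely, but the pile-up itself is a hint that reads from a paralogous region may have been placed there.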

Regions where both aligners are confused 1.

Regions where both aligners are confused 2.

Very Complex Regions

Some regions are extremely difficult to map with any technique

A different approach to mapping/alignment may be required

A different approach to variant calling may be required (e.g. local de novo assembly, phasing)
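Local de novo assembly rebuilds the sequence of a small region from its reads instead of relying on each read's individual alignment. The toy sketch below shows the core idea with a de Bruijn graph, where nodes are (k-1)-mers and each k-mer adds an edge. Real assemblers also handle sequencing errors, bubbles from heterozygous sites, and repeats. This sketch handles none of them. The reads and `k` are made up.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_graph(reads: List[str], k: int) -> Dict[str, List[str]]:
    """De Bruijn graph: every k-mer adds an edge from its (k-1)-prefix to its (k-1)-suffix."""
    edges: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges[kmer[:-1]].add(kmer[1:])
    return {node: sorted(successors) for node, successors in edges.items()}


def source_nodes(graph: Dict[str, List[str]]) -> List[str]:
    """Nodes with no incoming edges, i.e. natural starting points for a contig."""
    has_incoming = {n for successors in graph.values() for n in successors}
    return sorted(n for n in graph if n not in has_incoming)


def extend_contig(graph: Dict[str, List[str]], start: str) -> str:
    """Walk forward from `start` for as long as each node has exactly one successor."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, [])) == 1:
        node = graph[node][0]
        if node in seen:  # stop if the walk loops back (e.g. a tandem repeat)
            break
        seen.add(node)
        contig += node[-1]
    return contig


# Three overlapping toy reads from the hypothetical sequence ACGTTGCATGACCT.
reads = ["ACGTTGCA", "TTGCATGA", "CATGACCT"]
graph = build_graph(reads, k=4)
start = source_nodes(graph)[0]
print(start, extend_contig(graph, start))  # ACG ACGTTGCATGACCT
```

The assembled contig would then be aligned back to the reference to call variants. That sidesteps read-by-read mapping errors in regions where the aligners are confused.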

Problems with sex chromosomes

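There are many heterozygous calls on the X and Y chromosomes that are certainly false positives or incorrect calls. Outside the pseudoautosomal regions, Y (and X in a male) is present in a single copy, so heterozygous calls are not expected there. This is true for both pipelines; read specificity and the variant-calling procedure have to be improved for these chromosomes.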
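A quick check for this problem is to list heterozygous calls outside the pseudoautosomal regions (PARs). This sketch assumes a GRCh37 reference. The PAR coordinates below are the GRCh37 ones and must be checked against whatever build was actually used. The `male` flag is supplied by the caller, since the slides do not state the sample's sex. `Call` and `implausible_hets` are illustrative names.

```python
from typing import Dict, List, NamedTuple, Tuple

# Pseudoautosomal regions on GRCh37 (1-based, inclusive). Check these against
# the reference build actually used; other builds have different coordinates.
PAR_GRCH37: Dict[str, List[Tuple[int, int]]] = {
    "X": [(60001, 2699520), (154931044, 155260560)],
    "Y": [(10001, 2649520), (59034050, 59363566)],
}


class Call(NamedTuple):
    chrom: str
    pos: int
    genotype: str  # VCF-style GT, e.g. "0/1" or "1|1"


def is_het(genotype: str) -> bool:
    alleles = genotype.replace("|", "/").split("/")
    return "." not in alleles and len(set(alleles)) > 1


def implausible_hets(calls: List[Call], male: bool,
                     par: Dict[str, List[Tuple[int, int]]] = PAR_GRCH37) -> List[Call]:
    """Heterozygous calls in single-copy (non-PAR) sex-chromosome sequence."""
    flagged = []
    for call in calls:
        chrom = call.chrom[3:] if call.chrom.startswith("chr") else call.chrom
        if chrom not in ("X", "Y") or not is_het(call.genotype):
            continue
        if any(start <= call.pos <= end for start, end in par.get(chrom, [])):
            continue  # PAR sequence is present in two copies in both sexes
        if chrom == "Y" or male:
            flagged.append(call)
    return flagged


calls = [
    Call("chrX", 1_000_000, "0/1"),   # inside PAR1: not flagged
    Call("chrX", 50_000_000, "0/1"),  # non-PAR X: flagged only for a male sample
    Call("chrY", 15_000_000, "0/1"),  # non-PAR Y: flagged
    Call("chrX", 60_000_000, "1/1"),  # homozygous: not flagged
    Call("chr1", 5_000, "0/1"),       # autosome: ignored
]
print([c.pos for c in implausible_hets(calls, male=True)])   # [50000000, 15000000]
print([c.pos for c in implausible_hets(calls, male=False)])  # [15000000]
```

A large number of such calls usually points back to mapping specificity, for example reads from X/Y-homologous sequence landing on the wrong chromosome, rather than to the caller alone.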

Summary

These kinds of comparative studies can be useful in analyzing the effectiveness of exome sequencing

Different exome kits can give different results

The data analysis and variant-calling tools chosen for the analysis can also have a big impact

There is some potential to improve the quality of the customer's exome analysis pipeline