Introduction to CLC Main Workbench ILRI Training ...
![Page 1: Introduc)on*to*CLC*Main*Workbench* ILRI*Training ...](https://reader031.fdocuments.us/reader031/viewer/2022012510/618784ca6192e107fe34651f/html5/thumbnails/1.jpg)
Joyce Njoki Nzioki BecA-ILRI Hub, Nairobi, Kenya http://hub.africabiosciences.org/ http://www.Ilri.org/ [email protected]
Introduction to CLC Main Workbench ILRI Training / Ethiopia Training
27 August 2015
![Page 2: Introduc)on*to*CLC*Main*Workbench* ILRI*Training ...](https://reader031.fdocuments.us/reader031/viewer/2022012510/618784ca6192e107fe34651f/html5/thumbnails/2.jpg)
Getting started with CLC
CLC Main Workbench is a software package that supports the analysis of sequence data. Functions include:
✓ Sequence assembly
✓ Primer design
✓ Alignment and phylogeny
✓ BLAST / database searches
✓ Additional plugins
![Page 3: Introduc)on*to*CLC*Main*Workbench* ILRI*Training ...](https://reader031.fdocuments.us/reader031/viewer/2022012510/618784ca6192e107fe34651f/html5/thumbnails/3.jpg)
![Page 4: Introduc)on*to*CLC*Main*Workbench* ILRI*Training ...](https://reader031.fdocuments.us/reader031/viewer/2022012510/618784ca6192e107fe34651f/html5/thumbnails/4.jpg)
Getting around in CLC
✓ CLC has a main menu with the features shown above
✓ The File menu has options to manipulate data
✓ The most useful menu is the TOOLBOX, which has various analysis options to manipulate data
![Page 5: Introduc)on*to*CLC*Main*Workbench* ILRI*Training ...](https://reader031.fdocuments.us/reader031/viewer/2022012510/618784ca6192e107fe34651f/html5/thumbnails/5.jpg)
Sequenced Data
✓ You can view your sequence data by opening the sequence files (trace files), which have the extension .ab1 or .abi
✓ NOTE: In order to obtain good sequencing results, you MUST download and examine your sequencing chromatogram. If you are using just the text data, you could be publishing data that is completely invalid!
✓ Software used for viewing includes: CLC bio, BioEdit, TracerView
![Page 6: Introduc)on*to*CLC*Main*Workbench* ILRI*Training ...](https://reader031.fdocuments.us/reader031/viewer/2022012510/618784ca6192e107fe34651f/html5/thumbnails/6.jpg)
Manipulating Data in CLC
Creating folders
✓ It is best to organize data in the navigation area in folders.
✓ To create a folder go to File | New | Folder
✓ Or click on the new folder icon on the tool bar
✓ Name the folder
![Page 7: Introduc)on*to*CLC*Main*Workbench* ILRI*Training ...](https://reader031.fdocuments.us/reader031/viewer/2022012510/618784ca6192e107fe34651f/html5/thumbnails/7.jpg)
![Page 8: Introduc)on*to*CLC*Main*Workbench* ILRI*Training ...](https://reader031.fdocuments.us/reader031/viewer/2022012510/618784ca6192e107fe34651f/html5/thumbnails/8.jpg)
![Page 9: Introduc)on*to*CLC*Main*Workbench* ILRI*Training ...](https://reader031.fdocuments.us/reader031/viewer/2022012510/618784ca6192e107fe34651f/html5/thumbnails/9.jpg)
![Page 10: Introduc)on*to*CLC*Main*Workbench* ILRI*Training ...](https://reader031.fdocuments.us/reader031/viewer/2022012510/618784ca6192e107fe34651f/html5/thumbnails/10.jpg)
Manipulating Data in CLC
Importing Data
✓ Importing brings sequenced data into CLC from where it is stored on your computer.
✓ Go to File | Import or click the import icon on the tool bar.
✓ Navigate to where your sequences are stored on your computer
✓ Select the file format to import; for sequenced data, select Trace files (.abi/.ab1/.scf/.phd)
✓ Select the folder to save the sequences to
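Before importing, it can help to check which files in a folder are trace files at all. The sketch below is not part of CLC; it only compares file names against the extensions listed in the import dialog. The function names `is_trace_file` and `find_trace_files` are hypothetical.

```python
from pathlib import Path

# Extensions copied from the "Trace files" import option named on the slide.
TRACE_EXTENSIONS = {".abi", ".ab1", ".scf", ".phd"}


def is_trace_file(path):
    """Return True if the file name ends in one of the trace extensions."""
    return Path(path).suffix.lower() in TRACE_EXTENSIONS


def find_trace_files(folder):
    """List trace files directly inside `folder`, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir() if p.is_file() and is_trace_file(p)
    )
```

For example, `is_trace_file("sample_F.ab1")` is true, while a plain text export such as `notes.txt` is skipped.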
![Page 11: Introduc)on*to*CLC*Main*Workbench* ILRI*Training ...](https://reader031.fdocuments.us/reader031/viewer/2022012510/618784ca6192e107fe34651f/html5/thumbnails/11.jpg)
![Page 12: Introduc)on*to*CLC*Main*Workbench* ILRI*Training ...](https://reader031.fdocuments.us/reader031/viewer/2022012510/618784ca6192e107fe34651f/html5/thumbnails/12.jpg)
![Page 13: Introduc)on*to*CLC*Main*Workbench* ILRI*Training ...](https://reader031.fdocuments.us/reader031/viewer/2022012510/618784ca6192e107fe34651f/html5/thumbnails/13.jpg)
![Page 14: Introduc)on*to*CLC*Main*Workbench* ILRI*Training ...](https://reader031.fdocuments.us/reader031/viewer/2022012510/618784ca6192e107fe34651f/html5/thumbnails/14.jpg)
Troubleshoot sequenced data: “the good”
• Good quality peaks are smooth, distinct or well formed, evenly spaced and with little baseline noise
![Page 15: Introduc)on*to*CLC*Main*Workbench* ILRI*Training ...](https://reader031.fdocuments.us/reader031/viewer/2022012510/618784ca6192e107fe34651f/html5/thumbnails/15.jpg)
Troubleshoot sequenced data: “the bad”
✓ A failed sequencing reaction: the chromatograms look messy, with many ‘N’s in the sequence.
✓ Non-usable sequenced data: can be due to a low concentration of DNA template, or to no primer or the wrong primer being added.
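A quick numerical check for the "many N's" symptom is to count the fraction of ambiguous base calls in the text sequence. This is a minimal sketch and does not replace looking at the chromatogram; the 5% cutoff in `looks_failed` is an illustrative assumption, not a value from CLC or from these slides.

```python
def n_fraction(sequence):
    """Fraction of base calls that are 'N', from 0.0 to 1.0."""
    if not sequence:
        return 0.0
    return sequence.upper().count("N") / len(sequence)


def looks_failed(sequence, max_n_fraction=0.05):
    """Flag a read whose share of 'N' calls exceeds the cutoff.

    max_n_fraction is an arbitrary example threshold (assumption).
    """
    return n_fraction(sequence) > max_n_fraction
```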
![Page 16: Introduc)on*to*CLC*Main*Workbench* ILRI*Training ...](https://reader031.fdocuments.us/reader031/viewer/2022012510/618784ca6192e107fe34651f/html5/thumbnails/16.jpg)
Troubleshoot sequenced data: “double peaks”
✓ Double peaks: multiple peaks of the same or different height at the same position; this is due to clone contamination, a heterozygous position (SNP), or a contaminated PCR reaction
✓ Can be handled using degenerate (IUPAC) codes, for example N (a, c, t or g), Y (c or t), R (a or g)
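The degenerate codes follow the standard IUPAC nucleotide alphabet. As a small sketch, the lookup below turns the set of bases seen at a double peak into the matching code; the function name `degenerate_code` is hypothetical.

```python
# Standard IUPAC nucleotide codes, keyed by the set of bases they stand for.
IUPAC_CODES = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def degenerate_code(bases):
    """Return the IUPAC code for the bases observed at one position.

    `bases` is any iterable of A/C/G/T characters, upper or lower case.
    Raises KeyError for characters outside A, C, G, T.
    """
    return IUPAC_CODES[frozenset(b.upper() for b in bases)]
```

A double peak showing both c and t would be written as `degenerate_code("ct")`, which gives `Y`.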
![Page 17: Introduc)on*to*CLC*Main*Workbench* ILRI*Training ...](https://reader031.fdocuments.us/reader031/viewer/2022012510/618784ca6192e107fe34651f/html5/thumbnails/17.jpg)
Troubleshoot sequenced data: “stuttering”
✓ Sequence data quality is poor after stretches of 7 or more nucleotides of the same base. This is due to polymerase slippage during DNA synthesis; it is a limitation of Sanger sequencing
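Stretches that may cause stuttering can be located in the base calls with a regular expression. The sketch below reports every run of a single base that is at least `min_length` long, using the 7-base figure from the slide as the default; the function name is hypothetical.

```python
import re


def homopolymer_runs(sequence, min_length=7):
    """Find runs of one repeated base that are at least min_length long.

    Returns a list of (start, end, base) tuples, 0-based with end exclusive.
    """
    # One base followed by at least (min_length - 1) copies of itself.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_length - 1))
    return [
        (m.start(), m.end(), m.group(1))
        for m in pattern.finditer(sequence.upper())
    ]
```

Base calls downstream of a reported run deserve a closer look in the chromatogram.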
![Page 18: Introduc)on*to*CLC*Main*Workbench* ILRI*Training ...](https://reader031.fdocuments.us/reader031/viewer/2022012510/618784ca6192e107fe34651f/html5/thumbnails/18.jpg)
Troubleshoot sequenced data: “drop off”
✓ The DNA sequence suddenly stops or the peak intensity drops off substantially. This is caused by secondary structures such as hairpin loops, or by GC/GT-rich regions.
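One way to look for GC-rich stretches near a drop-off is a sliding-window GC fraction. This is a rough sketch: the window size and threshold are illustrative assumptions, the function names are hypothetical, and it only counts G and C, so GT-rich regions would need a similar count of G and T.

```python
def gc_fraction(sequence):
    """Fraction of bases that are G or C, from 0.0 to 1.0."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_rich_windows(sequence, window=20, threshold=0.7):
    """Start positions (0-based) of windows with GC fraction >= threshold.

    window and threshold are example values, not recommendations.
    """
    hits = []
    for start in range(len(sequence) - window + 1):
        if gc_fraction(sequence[start:start + window]) >= threshold:
            hits.append(start)
    return hits
```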
![Page 19: Introduc)on*to*CLC*Main*Workbench* ILRI*Training ...](https://reader031.fdocuments.us/reader031/viewer/2022012510/618784ca6192e107fe34651f/html5/thumbnails/19.jpg)
Troubleshoot sequenced data: “mis-called bases”
✓ Nucleotides that have been erroneously inserted into a sequence will appear oddly spaced relative to their neighboring bases
![Page 20: Introduc)on*to*CLC*Main*Workbench* ILRI*Training ...](https://reader031.fdocuments.us/reader031/viewer/2022012510/618784ca6192e107fe34651f/html5/thumbnails/20.jpg)
![Page 21: Introduc)on*to*CLC*Main*Workbench* ILRI*Training ...](https://reader031.fdocuments.us/reader031/viewer/2022012510/618784ca6192e107fe34651f/html5/thumbnails/21.jpg)
Trim 3’ and 5’ ends
At the 5’ end, sequences do not start off clearly until about base 20-30. This is due to Taq polymerase that is not yet fully activated, or to poor termination near the primer.
![Page 22: Introduc)on*to*CLC*Main*Workbench* ILRI*Training ...](https://reader031.fdocuments.us/reader031/viewer/2022012510/618784ca6192e107fe34651f/html5/thumbnails/22.jpg)
Trim 3’ and 5’ ends
At the 3’ end, towards bases 500-800, the quality degrades as well, due to diminishing bases.
![Page 23: Introduc)on*to*CLC*Main*Workbench* ILRI*Training ...](https://reader031.fdocuments.us/reader031/viewer/2022012510/618784ca6192e107fe34651f/html5/thumbnails/23.jpg)
Trimming sequences
✓ After carefully scrutinizing your sequence you can determine where your reliable sequence starts and ends.
✓ You can delete or trim the unreliable sequence from each end of your sequence file.
✓ As a gel run progresses it loses resolution and the reads contain more errors. Trim sequences where the errors become too frequent for your purpose
![Page 24: Introduc)on*to*CLC*Main*Workbench* ILRI*Training ...](https://reader031.fdocuments.us/reader031/viewer/2022012510/618784ca6192e107fe34651f/html5/thumbnails/24.jpg)
Quality Control using CLC
✓ The first step in sequence analysis is to check the quality of reads and trim sequences where need be to eliminate poor quality or vector contamination.
✓ When the trimming is done, the parts of the sequences that are trimmed are not actually removed; instead, trim annotations are saved to the sequences. These annotated sections are ignored in further analysis.
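The idea of annotating a trim instead of deleting bases can be sketched as follows. This is not CLC's trimming algorithm; it simply keeps the span between the first and last base whose quality score reaches a cutoff. The class name, the field names and the cutoff of 20 are assumptions made for the illustration.

```python
from dataclasses import dataclass


@dataclass
class TrimmedRead:
    bases: str            # full read, never shortened
    qualities: list       # one Phred-style score per base
    keep_start: int = 0   # index of the first base kept
    keep_end: int = 0     # one past the last base kept

    def usable(self):
        """The part of the read that downstream analysis would use."""
        return self.bases[self.keep_start:self.keep_end]


def annotate_trim(bases, qualities, min_quality=20):
    """Record which span to keep; the original bases stay untouched.

    min_quality is an example cutoff (assumption), not a CLC default.
    """
    good = [i for i, q in enumerate(qualities) if q >= min_quality]
    if not good:
        return TrimmedRead(bases, qualities, 0, 0)
    return TrimmedRead(bases, qualities, good[0], good[-1] + 1)
```

Because only the keep span changes, a trim can be revised later without re-importing the trace file.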
![Page 25: Introduc)on*to*CLC*Main*Workbench* ILRI*Training ...](https://reader031.fdocuments.us/reader031/viewer/2022012510/618784ca6192e107fe34651f/html5/thumbnails/25.jpg)
Assemble sequence
Sequence assembly refers to aligning and merging fragments of a much longer DNA sequence in order to reconstruct the original sequence
I. Reference assembly – reference guided assembly.
II. De novo assembly – assembling without the aid of a reference genome.
![Page 26: Introduc)on*to*CLC*Main*Workbench* ILRI*Training ...](https://reader031.fdocuments.us/reader031/viewer/2022012510/618784ca6192e107fe34651f/html5/thumbnails/26.jpg)
De novo assembly
✓ In most cases forward and reverse primers are used, hence you obtain both forward and reverse sequences.
✓ Assembling the two sequences aligns them at the point where they overlap to give a contiguous sequence called a contig.
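The overlap step can be illustrated with a toy merge of one forward and one reverse read. This sketch assumes error-free reads and an exact overlap, which real assemblers (including CLC) do not require; the function names and the `min_overlap` default are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(sequence):
    """Reverse-complement a DNA string made of A, C, G, T and N."""
    return sequence.upper().translate(_COMPLEMENT)[::-1]


def merge_reads(forward, reverse, min_overlap=10):
    """Join a forward read and a reverse read into a contig.

    The reverse read is reverse-complemented first, then the longest exact
    match between the end of the forward read and the start of the reverse
    read is used as the overlap. Returns None if no overlap of at least
    min_overlap bases is found.
    """
    fwd = forward.upper()
    rev = reverse_complement(reverse)
    for size in range(min(len(fwd), len(rev)), min_overlap - 1, -1):
        if fwd.endswith(rev[:size]):
            return fwd + rev[size:]
    return None
```

With real chromatogram data the overlap usually contains mismatches, which is where the conflicts on the following slides come from.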
![Page 27: Introduc)on*to*CLC*Main*Workbench* ILRI*Training ...](https://reader031.fdocuments.us/reader031/viewer/2022012510/618784ca6192e107fe34651f/html5/thumbnails/27.jpg)
![Page 28: Introduc)on*to*CLC*Main*Workbench* ILRI*Training ...](https://reader031.fdocuments.us/reader031/viewer/2022012510/618784ca6192e107fe34651f/html5/thumbnails/28.jpg)
Conflicts
The example shows a conflict in which the forward strand (F) shows the base call “A” and the reverse strand (R) shows a gap
![Page 29: Introduc)on*to*CLC*Main*Workbench* ILRI*Training ...](https://reader031.fdocuments.us/reader031/viewer/2022012510/618784ca6192e107fe34651f/html5/thumbnails/29.jpg)
Resolving conflicts
✓ We assess the quality of the reads at this position. The reverse sequence (R) has a low-quality chromatogram (this is often the case towards the ends of a sequence). However, the forward strand (F) clearly has good quality peaks and can be trusted.
![Page 30: Introduc)on*to*CLC*Main*Workbench* ILRI*Training ...](https://reader031.fdocuments.us/reader031/viewer/2022012510/618784ca6192e107fe34651f/html5/thumbnails/30.jpg)
![Page 31: Introduc)on*to*CLC*Main*Workbench* ILRI*Training ...](https://reader031.fdocuments.us/reader031/viewer/2022012510/618784ca6192e107fe34651f/html5/thumbnails/31.jpg)
![Page 32: Introduc)on*to*CLC*Main*Workbench* ILRI*Training ...](https://reader031.fdocuments.us/reader031/viewer/2022012510/618784ca6192e107fe34651f/html5/thumbnails/32.jpg)
Resolving conflicts
Other conflicts may occur between two nucleotides. Judgment on how to resolve such conflicts should be based on:
✓ Quality of reads on both strands (take data from the most consistent sequence)
✓ Two differing bases may be called on the two sequences because the position is genuinely a SNP, so judgment should be based on the quality of the reads but also on background knowledge of the sequences being analyzed.
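The decision rule above can be written down as a small helper for a single conflicting column. It is only a sketch of the reasoning, not how CLC resolves conflicts: the quality cutoff is an assumption, and for a gap ("-") the caller is assumed to pass the quality of the neighboring bases in that read.

```python
def resolve_conflict(fwd_base, fwd_quality, rev_base, rev_quality,
                     min_quality=20):
    """Suggest a base call for one aligned column.

    Returns (base, note). min_quality is an example cutoff (assumption).
    """
    if fwd_base == rev_base:
        return fwd_base, "agree"
    fwd_ok = fwd_quality >= min_quality
    rev_ok = rev_quality >= min_quality
    if fwd_ok and not rev_ok:
        return fwd_base, "forward read trusted"
    if rev_ok and not fwd_ok:
        return rev_base, "reverse read trusted"
    # Both good (possible SNP) or both poor: leave it to the analyst.
    return "N", "review manually"
```

The slide example, a clear "A" on the forward strand against a gap in a poor-quality reverse read, comes out as `("A", "forward read trusted")`.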
![Page 33: Introduc)on*to*CLC*Main*Workbench* ILRI*Training ...](https://reader031.fdocuments.us/reader031/viewer/2022012510/618784ca6192e107fe34651f/html5/thumbnails/33.jpg)
Consensus sequences
Once you have assembled the reads and resolved conflicts, you can extract a consensus sequence that is used in further analysis
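Extracting a consensus can be pictured as reading the aligned reads column by column. The sketch below assumes the reads are already aligned to equal length with "-" marking positions without data, takes the most common base in each column, and writes "N" where two bases tie. It treats every "-" as missing data, so it is a simplification of what an assembler does.

```python
from collections import Counter


def consensus(aligned_reads):
    """Build a consensus string from equal-length aligned reads."""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("aligned reads must have equal length")
    out = []
    for column in zip(*aligned_reads):
        counts = Counter(base.upper() for base in column if base != "-")
        if not counts:
            continue  # no read covers this column
        ranked = counts.most_common(2)
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            out.append("N")  # unresolved tie between two bases
        else:
            out.append(ranked[0][0])
    return "".join(out)
```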
![Page 34: Introduc)on*to*CLC*Main*Workbench* ILRI*Training ...](https://reader031.fdocuments.us/reader031/viewer/2022012510/618784ca6192e107fe34651f/html5/thumbnails/34.jpg)
The End