LSM3241: Bioinformatics and Biocomputing Lecture 2: Bioinformatics of viral genome Prof. Chen Yu...
LSM3241: Bioinformatics and Biocomputing
Lecture 2: Bioinformatics of viral genome
Prof. Chen Yu Zong
Tel: 6874-6877
Email: [email protected]
http://xin.cz3.nus.edu.sg
Room 07-24, level 7, SOC1, National University of Singapore
Resource of Viral Genomes
NCBI Genome Database: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=Search&DB=genome
2,226 viral genome entries (1,524 distinct virus strains) in the database. Early 2005 figures: 1,250 entries and 1,022 distinct strains.
1,193 entries are complete viral genomes. Early 2005 figure: 900.
12 entries of coronavirus genomes (8 in early 2005)
16 entries of influenza H5N1 genomes
Information on viral genomes in the database can also be retrieved by clicking the viruses link (marked "Click Here" on the slide).
List of viral genomes: (1,927 entries in Jan 2006, 1,461 in Jan 2005)
Viral taxonomy groups:
Viral genome list:
Bioinformatics of Viral Genomes
Viral name link:
Viral genome link
All entries
Viral protein link:
Limit to title search
SARS coronavirus PP1ab PID link: it returns multiple entries from different strains or from related species.
Viral strain
Different strains of SARS coronavirus
Note: A viral polyprotein is not a single functional protein; it is a precursor that combines several proteins. Information about these proteins can be difficult to read.
Suggestion: Look at a recent NCBI entry for the same virus from a reputable research group.
SARS coronavirus unknown sars3a PID link:
Alternative way to find the SARS coronavirus genome: look for the latest entry that has a complete genome and good functional annotation. Not all entries have these.
The latest good entry: AY572038, SARS coronavirus civet020, complete genome (in Jan 2005: AY310120, SARS coronavirus FRA).
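The same record can also be requested programmatically. The sketch below is an illustration, not part of the lecture: instead of the Entrez web pages shown on the slides, it uses NCBI's E-utilities efetch endpoint to build a URL for the GenBank flat file of AY572038. The function name genbank_record_url is hypothetical.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def genbank_record_url(accession):
    """Build an efetch URL that asks for a GenBank flat-file record."""
    params = {
        "db": "nuccore",     # nucleotide database
        "id": accession,     # accession number from the slide
        "rettype": "gb",     # GenBank flat-file format
        "retmode": "text",   # plain text rather than XML
    }
    return EUTILS + "/efetch.fcgi?" + urlencode(params)


url = genbank_record_url("AY572038")
print(url)
# To download the record (needs network access):
#   from urllib.request import urlopen
#   text = urlopen(url).read().decode()
```

The downloaded text should contain the same feature table (CDS, mat_peptide and so on) that the genome entry shows in the browser.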
SARS Coronavirus Genome
You are expected to find the info about each gene (genome location, sequence, function)
Function of SARS Coronavirus Genes
Where to find the proteins in the genome entry?
Source 1: mat_peptide
Protein name
Putative 3C-like protease mat_peptide link:
Protein name
Protein function
Source 2: CDS
Protein name
Nucleocapsid protein protein_id link:
Protein name
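The two sources above (mat_peptide and CDS features) can also be read out of a genome entry with a short script. The sketch below is a minimal illustration under stated assumptions: the feature table is a toy excerpt with invented coordinates, product names and protein IDs (not copied from AY572038), and the hypothetical parser parse_features handles only single-line locations and qualifiers.

```python
import re

# Toy excerpt in GenBank feature-table layout. Coordinates, product names
# and protein IDs are invented for illustration only.
TOY_FEATURES = """\
     CDS             10..90
                     /product="example polyprotein"
                     /protein_id="TOY00001.1"
     mat_peptide     10..45
                     /product="example protease"
     CDS             100..160
                     /product="example nucleocapsid protein"
                     /protein_id="TOY00002.1"
"""


def parse_features(text, wanted=("CDS", "mat_peptide")):
    """Return one dict per wanted feature, with its location and qualifiers."""
    features = []
    current = None
    for line in text.splitlines():
        # Feature lines start with exactly five spaces, then key and location.
        head = re.match(r"^ {5}(\S+)\s+(\S+)$", line)
        if head:
            current = None
            if head.group(1) in wanted:
                current = {"key": head.group(1), "location": head.group(2)}
                features.append(current)
            continue
        # Qualifier lines look like /name="value" and belong to the last feature.
        qual = re.match(r'^\s+/(\w+)="?([^"]*)"?$', line)
        if qual and current is not None:
            current[qual.group(1)] = qual.group(2)
    return features


for feature in parse_features(TOY_FEATURES):
    print(feature["key"], feature["location"], feature.get("product"))
```

Real entries also contain multi-line qualifiers and compound locations such as join(...), which this sketch does not handle; a dedicated GenBank parser is the safer choice for real work.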
How to find the name or function of a putative protein in a genome?
• Medline keyword search
• Google search
What if the function of a putative protein is unknown?
• Sequence alignment (BLAST, PSI-BLAST). This will be further discussed in lecture 4.
• Motif analysis (Conduct a PROSITE motif search)
• If sequence analysis fails or is in doubt, try a machine learning method (SVMProt, Nucleic Acids Res., 31: 3692-3697; ProtFun, Bioinformatics, 19:635-642). This will be studied in lecture 5.
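To make the motif-analysis step concrete, the sketch below translates a PROSITE-style pattern into a regular expression and scans a sequence with it. It is an illustration only, not the PROSITE search tool itself: the function name prosite_to_regex and the toy sequence are assumptions, and the anchors "<" and ">" of the full PROSITE syntax are not handled. The pattern used is the PROSITE N-glycosylation site pattern, N-{P}-[ST]-{P}.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Split off an optional repeat count such as x(2) or x(2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            regex = "."                       # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"   # any residue except these
        else:
            regex = core                      # one residue or a [..] set
        if lo:
            regex += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(regex)
    return "".join(parts)


regex = prosite_to_regex("N-{P}-[ST]-{P}.")
toy_sequence = "MKNASAANPSG"  # invented sequence, for illustration only
hits = [m.start() for m in re.finditer(regex, toy_sequence)]
print(regex, hits)
```

A hit only shows that the sequence fits the pattern; short motifs like this one match by chance quite often, so a hit is a hint about function rather than evidence.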
Drug design:
• Step 1: Finding the right target in the genome
  – A key protein involved in the viral cycle (to stop the disease process)
  – Different from human proteins (to reduce side effects)
• Step 2: Finding or making a chemical agent to stop the protein
  – In the majority of cases: protein inhibitors
• Step 3: Testing and clinical trials
SARS drug design:
The target: 3C-like protease
• Inhibitor design: Finding inhibitors of similar proteins, such as those of the same name (3C-like proteases or 3C proteases of other species), may offer clues to inhibitor design.
Search from NCBI
Search from NCBI finds 19 references.
Check each abstract to find the name of one or more inhibitors.
Be prepared to read the full paper to find inhibitors
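The literature search can also be scripted. The sketch below is an illustration rather than the search used in the lecture: it builds a PubMed query URL for NCBI's E-utilities esearch endpoint. The query string is an assumption (the slides do not state the exact search terms), and the function name pubmed_search_url is hypothetical.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(term, retmax=20):
    """Build an esearch URL that returns PubMed IDs matching the query."""
    params = {
        "db": "pubmed",    # search the PubMed literature database
        "term": term,      # query text, same syntax as the web search box
        "retmax": retmax,  # maximum number of IDs to return
    }
    return ESEARCH + "?" + urlencode(params)


# Assumed query; adjust the terms to the protein of interest.
url = pubmed_search_url('"3C-like protease" AND inhibitor')
print(url)
```

The response lists PubMed IDs only; the abstracts still have to be retrieved and read one by one, as the slide advises.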
Make sure the paper discusses inhibitors of the right protein.
This one actually discusses inhibitors of a protease family, and thus may not be suitable for the SARS 3C-like protease.
Search from Google
Search from Google finds numerous entries.
Check each entry to find the name of one or more inhibitors.
Be prepared to read the full paper to find inhibitors
Design of SARS 3C-like protease inhibitors using rhinovirus 3C-like protease inhibitors as templates
Summary of Today’s lecture
• Genome database at NCBI
• Viral genomes
  – SARS coronavirus genome as an example
• Finding proteins from a genome
• Therapeutic target identification from a genome and inhibitor design