Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner [email protected]...
-
Upload
brianne-greene -
Category
Documents
-
view
215 -
download
1
Transcript of Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner [email protected]...
![Page 1: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/1.jpg)
Introduction to Bioinformatics
Monday, November 15, 2010Jonathan Pevsner
M.E:800.707
![Page 2: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/2.jpg)
• People with very diverse backgrounds in biology
• Some people with backgrounds in computer
science and biostatistics
• Most people (will) have a favorite gene, protein, or disease
Who is taking this course?
![Page 3: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/3.jpg)
• To provide an introduction to bioinformatics with a focus on the National Center for Biotechnology Information (NCBI), UCSC, and EBI
• To focus on the analysis of DNA, RNA and proteins
• To introduce you to the analysis of genomes
• To combine theory and practice to help you solve research problems
What are the goals of the course?
![Page 4: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/4.jpg)
Textbook
The course textbook has no required textbook. I wrote Bioinformatics and Functional Genomics (Wiley-Blackwell, 2nd edition 2009). The lectures in this course correspond closely to chapters.
I will make pdfs of the chapters available to everyone.
You can also purchase a copy at the bookstore, at amazon.com (now $60), or at Wiley with a 20% discount through the book’s website www.bioinfbook.org.
![Page 5: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/5.jpg)
Web sites
The course website is reached via moodle: http://pevsnerlab.kennedykrieger.org/moodle(or Google “moodle bioinformatics”)--This site contains the powerpoints for each lecture, including black & white versions for printing--The weekly quizzes are here--You can ask questions via the forum--Audio files of each lecture will be posted here
The textbook website is:http://www.bioinfbook.orgThis has powerpoints, URLs, etc. organized by
chapter. This is most useful to find “web documents” corresponding to each chapter.
![Page 6: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/6.jpg)
Literature references
You are encouraged to read original source articles(posted on moodle). They will enhance your understanding of the material. Readings are optional but recommended.
![Page 7: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/7.jpg)
Themes throughout the course: the beta globin gene/protein family
We will use beta globin as a model gene/protein throughout the course. Globins including hemoglobin and myoglobin carry oxygen. We will study globins in a variety of contexts including
--sequence alignment--gene expression--protein structure--phylogeny--homologs in various species
![Page 8: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/8.jpg)
Computer labs
There are no computer labs, but the seven weekly quizzes function as a computer lab. To solve the questions, you will need to go to websites, use databases, and use software.
![Page 9: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/9.jpg)
Grading
60% moodle quizzes (your top 6 out of 7 quizzes). Quizzes are taken at the moodle website, andare due one week after the relevant lecture.
Special extended due date for quizzes due immediately after Thanksgiving and the New Year.
40% final exam Monday, January 10 (in class).Closed book, cumulative, no computer,short answer / multiple choice. Past exams will be made available ahead of time.
![Page 10: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/10.jpg)
Google “moodle bioinformatics” to get here;Click “Bioinformatics” to sign in;The enrollment key you need is…
![Page 11: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/11.jpg)
![Page 12: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/12.jpg)
The password to get the book chapter pdf is…
![Page 13: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/13.jpg)
Outline for the course (all on Mondays)
1. Accessing information about DNA and proteins Nov. 15
2. Pairwise alignment Nov. 22
3. BLAST Nov. 29
4. Multiple sequence alignment Dec. 6
5. Molecular phylogeny and evolution Dec. 13
6. Microarrays Dec. 20
7. Genomes Jan. 3
Final exam Jan. 10
![Page 14: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/14.jpg)
Outline for today
Definition of bioinformatics
Overview of the NCBI website
Accessing information: accession numbers and RefSeq
Entrez Gene (and UniGene, HomoloGene)
Protein Databases: UniProt, ExPASy
Three genome browsers: NCBI, UCSC, Ensembl
Access to biomedical literature
![Page 15: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/15.jpg)
• Interface of biology and computers
• Analysis of proteins, genes and genomes using computer algorithms and computer databases
• Genomics is the analysis of genomes. The tools of bioinformatics are used to make sense of the billions of base pairs of DNA that are sequenced by genomics projects.
What is bioinformatics?
![Page 16: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/16.jpg)
On bioinformatics
“Science is about building causal relations between natural phenomena (for instance, between a mutation in a gene and a disease). The development of instruments to increase our capacity to observe natural phenomena has, therefore, played a crucial role in the development of science - the microscope being the paradigmatic example in biology. With the human genome, the natural world takes an unprecedented turn: it is better described as a sequence of symbols. Besides high-throughput machines such as sequencers and DNA chip readers, the computer and the associated software becomes the instrument to observe it, and the discipline of bioinformatics flourishes.”
![Page 17: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/17.jpg)
On bioinformatics
“However, as the separation between us (the observers) and the phenomena observed increases (from organism to cell to genome, for instance), instruments may capture phenomena only indirectly, through the footprints they leave. Instruments therefore need to be calibrated: the distance between the reality and the observation (through the instrument) needs to be accounted for. This issue of Genome Biology is about calibrating instruments to observe gene sequences; more specifically, computer programs to identify human genes in the sequence of the human genome.”
Martin Reese and Roderic Guigó, Genome Biology 2006 7(Suppl I):S1,introducing EGASP, the Encyclopedia of DNA Elements (ENCODE) Genome Annotation Assessment Project
![Page 18: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/18.jpg)
Tool-users
Tool-makers
bioinformatics
public healthinformatics
medicalinformatics
infrastructure
databases algorithms
![Page 19: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/19.jpg)
Three perspectives on bioinformatics
The cell
The organism
The tree of life
Page 4
![Page 20: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/20.jpg)
![Page 21: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/21.jpg)
DNA RNA phenotypeprotein
Page 5
![Page 22: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/22.jpg)
Time ofdevelopment
Body region, physiology, pharmacology, pathology
Page 5
![Page 23: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/23.jpg)
After Pace NR (1997) Science 276:734
Page 6
![Page 24: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/24.jpg)
DNA RNA phenotypeprotein
![Page 25: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/25.jpg)
Growth of GenBank
Year
Bas
e p
airs
of
DN
A (
mil
lio
ns)
Seq
uen
ces
(mil
lio
ns)
1982 1986 1990 1994 1998 2002 Fig. 2.1Page 17
![Page 26: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/26.jpg)
Growth of GenBank + Whole Genome Shotgun(1982-November 2008): we reached 0.2 terabases
Nu
mb
er
of s
eq
uen
ces
in G
en
Ban
k (m
illio
ns)
Ba
se p
air
s o
f DN
A in
Gen
Ba
nk (
bill
ions
) B
ase
pa
irs
in G
en
Ban
k +
WG
S (
billi
ons
)
0
20
40
60
80
100
120
140
160
180
200
1982 1992 2002 2008
Fig. 2.1Page 15
![Page 27: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/27.jpg)
Arrival of next-generation sequencing: In two years we have gone from 0.2 terabases to
71 terabases (71,000 gigabases) (November 2010)
Fig. 2.1Page 15
![Page 28: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/28.jpg)
DNA RNA protein
Central dogma of molecular biology
genome transcriptome proteome
Central dogma of bioinformatics and genomics
![Page 29: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/29.jpg)
DNA RNA
cDNAESTsUniGene
phenotype
genomicDNAdatabases
protein sequence databases
protein
Fig. 2.2Page 18
![Page 30: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/30.jpg)
GenBankEMBL DDBJ
There are three major public DNA databases
The underlying raw DNA sequences are identical
Page 14
![Page 31: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/31.jpg)
GenBankEMBL DDBJ
Housedat EBI
EuropeanBioinformatics
Institute
There are three major public DNA databases
Housed at NCBINational
Center forBiotechnology
Information
Housed in Japan
Page 14
![Page 32: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/32.jpg)
Taxonomy at NCBI:>200,000 species are represented in GenBank
Page 16http://www.ncbi.nlm.nih.gov/Taxonomy/txstat.cgi10/10
![Page 33: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/33.jpg)
The most sequenced organisms in GenBank
Homo sapiens 14.9 billion basesMus musculus 8.9bRattus norvegicus 6.5bBos taurus 5.4bZea mays 5.0bSus scrofa 4.8bDanio rerio 3.1bStrongylocentrotus purpurata 1.4bOryza sativa (japonica) 1.2bNicotiana tabacum 1.2b
Updated Oct. 2010GenBank release 180.0Excluding WGS, organelles, metagenomics
Table 2-2Page 17
![Page 34: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/34.jpg)
Outline for today
Definition of bioinformatics
Overview of the NCBI website
Accessing information: accession numbers and RefSeq
Entrez Gene (and UniGene, HomoloGene)
Protein Databases: UniProt, ExPASy
Three genome browsers: NCBI, UCSC, Ensembl
Access to biomedical literature
![Page 35: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/35.jpg)
National Center for BiotechnologyInformation (NCBI)
www.ncbi.nlm.nih.gov
Page 23
![Page 36: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/36.jpg)
NCBI homepage
Fig. 2.4Page 24
![Page 37: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/37.jpg)
• National Library of Medicine's search service• 20 million citations in MEDLINE (as of 2010)• links to participating online journals• PubMed tutorial on the site or visit NLM:
http://www.nlm.nih.gov/bsd/disted/pubmed.html
Page 23
NCBI key features: PubMed
![Page 38: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/38.jpg)
Entrez integrates…
• the scientific literature; • DNA and protein sequence databases; • 3D protein structure data; • population study data sets; • assemblies of complete genomes
Page 24
NCBI key features: Entrez search and retrieval system
![Page 39: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/39.jpg)
![Page 40: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/40.jpg)
BLAST is…
• Basic Local Alignment Search Tool• NCBI's sequence similarity search tool• supports analysis of DNA and protein databases• 100,000 searches per day
Page 25
NCBI key features: BLAST
![Page 41: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/41.jpg)
OMIM is…
•Online Mendelian Inheritance in Man•catalog of human genes and genetic disorders•created by Dr. Victor McKusick; led by Dr. Ada Hamosh
at JHMI
Page 25
NCBI key features: OMIM
![Page 42: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/42.jpg)
TaxBrowser is…
• browser for the major divisions of living organisms (archaea, bacteria, eukaryota, viruses)• taxonomy information such as genetic codes• molecular data on extinct organisms• practically useful to find a protein or gene from a species
Page 26
NCBI key features: TaxBrowser
![Page 43: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/43.jpg)
Structure site includes…
• Molecular Modelling Database (MMDB)
• biopolymer structures obtained from
the Protein Data Bank (PDB)• Cn3D (a 3D-structure viewer)• vector alignment search tool (VAST)
Page 25
NCBI key features: Structure
![Page 44: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/44.jpg)
Outline for today
Definition of bioinformatics
Overview of the NCBI website
Accessing information: accession numbers and RefSeq
Entrez Gene (and UniGene, HomoloGene)
Protein Databases: UniProt, ExPASy
Three genome browsers: NCBI, UCSC, Ensembl
Access to biomedical literature
![Page 45: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/45.jpg)
Accession numbers are labels for sequences
NCBI includes databases (such as GenBank) that contain information on DNA, RNA, or protein sequences. You may want to acquire information beginning with a query such as the name of a protein of interest, or theraw nucleotides comprising a DNA sequence of interest. DNA sequences and other molecular data are tagged with accession numbers that are used to identify a sequenceor other record relevant to molecular data.
Page 26
![Page 46: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/46.jpg)
What is an accession number?
An accession number is label that used to identify a sequence. It is a string of letters and/or numbers that corresponds to a molecular sequence.
Examples (all for retinol-binding protein, RBP4):
X02775 GenBank genomic DNA sequenceNT_030059 Genomic contigRs7079946 dbSNP (single nucleotide polymorphism)
N91759.1 An expressed sequence tag (1 of 170)NM_006744 RefSeq DNA sequence (from a transcript)
NP_007635 RefSeq proteinAAC02945 GenBank proteinQ28369 SwissProt protein1KT7 Protein Data Bank structure record
protein
DNA
RNA
Page 27
![Page 47: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/47.jpg)
NCBI’s important RefSeq project: best representative sequences
RefSeq (accessible via the main page of NCBI)provides an expertly curated accession number thatcorresponds to the most stable, agreed-upon “reference”version of a sequence.
RefSeq identifiers include the following formats:
Complete genome NC_######Complete chromosome NC_######Genomic contig NT_######mRNA (DNA format) NM_###### e.g. NM_006744Protein NP_###### e.g. NP_006735
Page 27
![Page 48: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/48.jpg)
Accession Molecule Method NoteAC_123456 Genomic Mixed Alternate complete genomicAP_123456 Protein Mixed Protein products; alternateNC_123456 Genomic Mixed Complete genomic moleculesNG_123456 Genomic Mixed Incomplete genomic regionsNM_123456 mRNA Mixed Transcript products; mRNA NM_123456789 mRNA Mixed Transcript products; 9-digit NP_123456 Protein Mixed Protein products; NP_123456789 Protein Curation Protein products; 9-digit NR_123456 RNA Mixed Non-coding transcripts NT_123456 Genomic Automated Genomic assembliesNW_123456 Genomic Automated Genomic assemblies NZ_ABCD12345678 Genomic Automated Whole genome shotgun dataXM_123456 mRNA Automated Transcript productsXP_123456 Protein Automated Protein productsXR_123456 RNA Automated Transcript productsYP_123456 Protein Auto. & Curated Protein productsZP_12345678 Protein Automated Protein products
NCBI’s RefSeq project: many accession number formats for genomic, mRNA, protein sequences
![Page 49: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/49.jpg)
Outline for today
Definition of bioinformatics
Overview of the NCBI website
Accessing information: accession numbers and RefSeq
Entrez Gene (and UniGene, HomoloGene)
Protein Databases: UniProt, ExPASy
Three genome browsers: NCBI, UCSC, Ensembl
Access to biomedical literature
![Page 50: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/50.jpg)
Access to sequences: Entrez Gene at NCBI
Entrez Gene is a great starting point: it collectskey information on each gene/protein from major databases. It covers all major organisms.
RefSeq provides a curated, optimal accession number for each DNA (NM_000518 for beta globin DNA corresponding to mRNA) or protein (NP_000509)
Page 29
![Page 51: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/51.jpg)
From the NCBI homepage, type “beta globin”and hit “Search”
Fig. 2.5Page 28
![Page 52: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/52.jpg)
Fig. 2.5Page 28
Follow the link to “Gene”
![Page 53: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/53.jpg)
Entrez Gene is in the headerNote the “Official Symbol” HBB for beta globinNote the “limits” option
![Page 54: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/54.jpg)
Using “limits” you can restrict your search to human (or any other organism)
![Page 55: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/55.jpg)
By applying limits, there are now far fewer entries
![Page 56: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/56.jpg)
Page 30
Entrez Gene (top of page)
Note that links tomany other HBB database entries are available
![Page 57: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/57.jpg)
Entrez Gene (middle of page): genomic region, bibliography
![Page 58: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/58.jpg)
Entrez Gene (middle of page, continued): phenotypes, function
![Page 59: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/59.jpg)
Entrez Gene (bottom of page): RefSeq accession numbers
![Page 60: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/60.jpg)
Entrez Gene (bottom of page): non-RefSeq accessions(it’s unclear what these are, highlighting usefulness of RefSeq)
![Page 61: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/61.jpg)
Fig. 2.8Page 31
Entrez Protein:accession, organism,literature…
![Page 62: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/62.jpg)
Fig. 2.8Page 31
Entrez Protein:…features of a protein, and its sequence in the one-letter amino acid code
![Page 63: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/63.jpg)
You should learn the one-letter amino acid code!
Name 3-Letter 1-LetterAlanine Ala AArginine Arg R
Asparagine Asn NAspartic acid Asp D
Cysteine Cys CGlutamic Acid Glu E
Glutamine Gln QGlycine Gly G
Histidine His HIsoleucine Ile I
Name 3-Letter 1-LetterLeucine Leu LLysine Lys K
Methionine Met MPhenylalanine Phe F
Proline Pro PSerine Ser S
Threonine Thr TTryptophan Trp W
Tyrosine Tyr YValine Val V
![Page 64: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/64.jpg)
Page 31
Entrez Protein:You can change the display (as shown)…
![Page 65: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/65.jpg)
FASTA format:versatile, compact with one header line
followed by a string of nucleotides or amino acids in the single letter code
Fig. 2.9Page 32
![Page 66: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/66.jpg)
Outline for today
Definition of bioinformatics
Overview of the NCBI website
Accessing information: accession numbers and RefSeq
Entrez Gene (and UniGene, HomoloGene)
Protein Databases: UniProt, ExPASy
Three genome browsers: NCBI, UCSC, Ensembl
Access to biomedical literature
![Page 67: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/67.jpg)
Comparison of Entrez Gene to other resources
Entrez Gene, Entrez Nucleotide, Entrez Protein:closely inter-related
Entrez Gene versus UniGene: UniGene is a database with information on where in a body, when in development, and how abundantly a transcript is expressed
Entrez Gene versus HomoloGene:HomoloGene conveniently gathers
information on sets of related proteins
Page 32
![Page 68: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/68.jpg)
DNA RNA
complementary DNA(cDNA)
protein
UniGeneFig. 2.3Page 22
HomoloGene: an NCBI resource organized by organism to describe where genes are expressed
(i.e. from which library) and how abundantly
![Page 69: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/69.jpg)
HomologoGene: an excellent NCBI resource that conveniently groups homologous eukaryotic genes
(find links from Entrez search engine or Entrez gene)
![Page 70: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/70.jpg)
Outline for today
Definition of bioinformatics
Overview of the NCBI website
Accessing information: accession numbers and RefSeq
Entrez Gene (and UniGene, HomoloGene)
Protein Databases: UniProt, ExPASy
Three genome browsers: NCBI, UCSC, Ensembl
Access to biomedical literature
![Page 71: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/71.jpg)
ExPASy to access protein and DNA sequences
ExPASy sequence retrieval system(ExPASy = Expert Protein Analysis System)
Visit http://www.expasy.ch/
Page 33
![Page 72: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/72.jpg)
UniProt: a centralized protein database (uniprot.org)
This is separate from NCBI, and interlinked.
Page 33
![Page 73: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/73.jpg)
Fig. 2.10Page 34
ExPASy: vast proteomics resources (www.expasy.ch)
![Page 74: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/74.jpg)
Outline for today
Definition of bioinformatics
Overview of the NCBI website
Accessing information: accession numbers and RefSeq
Entrez Gene (and UniGene, HomoloGene)
Protein Databases: UniProt, ExPASy
Three genome browsers: NCBI, UCSC, Ensembl
Access to biomedical literature
![Page 75: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/75.jpg)
Genome Browsers:increasingly important resources
Genomic DNA is organized in chromosomes. Genome browsers display ideograms (pictures) of chromosomes, with user-selected “annotation tracks” that display many kinds of information.
The two most essential human genome browsers are at Ensembl and UCSC. We will focus on UCSC (but the two are equally important). The browser at NCBI is not commonly used.
![Page 76: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/76.jpg)
clickhuman
Ensembl genome browser (www.ensembl.org)
![Page 77: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/77.jpg)
enterbeta globin
![Page 78: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/78.jpg)
Ensembl output for beta globin includes views of chromosome 11 (top), the region (middle), and a detailed view (bottom).
There are various horizontal annotation tracks.
![Page 79: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/79.jpg)
The UCSC Genome Browser:an increasingly important resource
• This browser’s focus is on humans and other eukaryotes
• you can select which tracks to display (and how much information for each track)
• tracks are based on data generated by the UCSC team and by the broad research community
• you can create “custom tracks” of your own data! Just format a spreadsheet properly and upload it
• The Table Browser is equally important as the more visual Genome Browser, and you can move between the two
![Page 80: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/80.jpg)
[1] Visit http://genome.ucsc.edu/, click Genome Browser
[2] Choose organisms, enter query (beta globin), hit submit
Page 36
![Page 81: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/81.jpg)
[3] Choose the RefSeq beta globin gene
![Page 82: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/82.jpg)
[4] On the UCSC Genome Browser:--choose which tracks to display--add custom tracks--the Table Browser is complementary
![Page 83: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/83.jpg)
Example of how to access sequence data:HIV-1 pol
There are many possible approaches. Begin at the mainpage of NCBI, and type an Entrez query: hiv-1 pol
Page 36
![Page 84: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/84.jpg)
11/10
Searching for HIV-1 pol: 150,000 nucleotide, protein hits
![Page 85: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/85.jpg)
Example of how to access sequence data:HIV-1 pol
For the Entrez query: hiv-1 polthere are about 150,000 nucleotide or protein records(and >350,000 records for a search for “hiv-1”),but these can easily be reduced in two easy steps:
--specify the organism, e.g. hiv-1[organism]--limit the output to RefSeq!
Page 37
![Page 86: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/86.jpg)
Searching for HIV-1 pol:using the command hiv-1[organism] limits the
output to just one entry
Try Taxonomy Browser to easily limit your query to your favorite organism(s). Example: NCBI home Taxonomy Taxonomy browser human protein to find a human protein
![Page 87: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/87.jpg)
Entrez Nucleotide features over 360,000 nucleotide entries for HIV-1 (but only one RefSeq for that virus)
![Page 88: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/88.jpg)
Example of how to access sequence data: histone
query for “histone” # results
protein records 104,000RefSeq entries 39,000
RefSeq (limit to human) 1171NOT deacetylase 911
At this point, select a reasonable candidate (e.g.histone 2, H4) and follow its link to Entrez Gene.There, you can confirm you have the right protein.
11-10
![Page 89: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/89.jpg)
Entrez Gene result for a histone
![Page 90: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/90.jpg)
Outline for today
Definition of bioinformatics
Overview of the NCBI website
Accessing information: accession numbers and RefSeq
Entrez Gene (and UniGene, HomoloGene)
Protein Databases: UniProt, ExPASy
Three genome browsers: NCBI, UCSC, Ensembl
Access to biomedical literature
![Page 91: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/91.jpg)
PubMed at NCBIto find literatureinformation
![Page 92: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/92.jpg)
PubMed is the NCBI gateway to MEDLINE.
MEDLINE contains bibliographic citations and author abstracts from over 4,600 journals published in the United States and in 70 foreign countries.
It has >20 million records dating back to 1950s.
Page 38
![Page 93: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/93.jpg)
MeSH is the acronym for "Medical Subject Headings."
MeSH is the list of the vocabulary terms used for subject analysis of biomedical literature at NLM. MeSH vocabulary is used for indexing journal articles for MEDLINE.
The MeSH controlled vocabulary imposes uniformity and consistency to the indexing of biomedical literature.
Page 38
![Page 94: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/94.jpg)
PubMed result for HBB
![Page 95: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/95.jpg)
Use the pull-down menu to access related resources such as Medical Subject Headings (MeSH)
![Page 96: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/96.jpg)
A “how to” pull-down menu links to tutorials
![Page 97: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/97.jpg)
Use “Advanced search” to limit by author, year, language, etc.
![Page 98: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/98.jpg)
PubMed search strategies
Try the tutorial
Use boolean queries (capitalize AND, OR, NOT)lipocalin AND disease
Try using limits (see Advanced search)
There are links to find Entrez entries and external resources
Obtain articles on-line via Welch Medical Library(and download pdf files): http://www.welch.jhu.edu/
Page 39
![Page 99: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/99.jpg)
lipocalin AND disease(504 results)
lipocalin OR disease(2,500,000 results)
lipocalin NOT disease(2,370 results)
1 AND 2
1 OR 2
1 NOT 2
1
1
1
2
2
2
Page 40
![Page 100: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/100.jpg)
WelchWeb is available at http://www.welch.jhu.edu
![Page 101: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/101.jpg)
Welch Medical Library liasons to the basic sciences
WelchWeb is available at http://www.welch.jhu.edu
![Page 102: Introduction to Bioinformatics Monday, November 15, 2010 Jonathan Pevsner pevsner@kennedykrieger.org Bioinformatics M.E:800.707.](https://reader036.fdocuments.us/reader036/viewer/2022070409/56649e9c5503460f94b9d516/html5/thumbnails/102.jpg)
Reminder: Please enroll! Google “moodle bioinformatics” to get here; click “Bioinformatics” to sign in;The enrollment key is…