14 May 2013 Ganesha Associates 1
Basic Skills in Scientific Research and Publication (Competências Básicas de Investigação Científica e de Publicação)
Lecture 3: Searching the Literature
Types of scientific output
• Abstracts
• Primary journal articles
– peer-reviewed interpretations of original research
• Reviews
• Book chapters, monographs
• Conference proceedings
• Lectures, seminars
• Sequences, data sets
• Patents, other forms of intellectual property
• Blogs, tweets…
Usage of output differs
5 July 2012 Copyright: Ganesha Associates 2012 3
Some sources of scientific content
• Google
• PubMed/Medline (NLM)
• Scopus (Elsevier)
• Web of Science (Thomson Reuters)
• Google Scholar
• PubMed Central, Europe PubMed Central
• SciELO, Biblioteca Virtual em Saúde
• ScienceDirect, Ovid, SpringerLink, Wiley Online Library, BioMed Central, Public Library of Science, SWETSwise…
• CAPES Portal de Periódicos
Each source is different
• Free
– Google, Google Scholar, PubMed Central
• Subscription
– Scopus, ScienceDirect
• Abstracts and citations only
– PubMed, Web of Science
• Full text, single publisher
– SpringerLink
• Full text, many publishers
– PubMed Central, SwetsWise Online Content
Classify sources of content
[Grid: rows "Abstract only" and "Full text"; columns "Free access" and "Subscription"]
You can get access if…
• The journal is subscribed to by CAPES
• You have a personal subscription
• The journal is of the ‘Open Access’ type
– Note: some journals only make their content ‘Open Access’ after six months or longer. Some journals contain a mixture of OA and non-OA articles. See http://europepmc.org/journalList for more info.
• Journals in the ‘red’ categories are available anywhere.
• Most journals subscribed to by CAPES will be available from more than one source.
• CAPES journals are only available from computers within the University network unless you have remote access privileges.
So which sources should I use?
• No single source contains all of the articles relevant to your research
• Google has the broadest coverage, but not all of the documents you find will be peer-reviewed articles
• Scopus, WoS and PubMed give you the best balance between quality and quantity, and, in theory, should link to all the content subscribed to by CAPES, plus OA content.
24 August 2012 Ganesha Associates 9
So usually you will visit several sources to find the information you are looking for
[Diagram: a question mark (the query) linked to Google, Scopus, Web of Science, PubMed, SciELO, HighWire, ScienceDirect, SpringerLink, national literature, the CAPES Portal, OA publishers (BMC or PLoS), and other databases, e.g. NCBI]
Components of a bibliographic database
• Content such as abstracts and full-text articles [or a pointer to where these may be found]
• Metadata [data about data]
• Index
• Search engine
• Ranking/relevance algorithm
• Plus many additional features
Content (Basic PDF)
Content (HTML)
Content (Page source)
Content (metadata)
Sources of article metadata
• Journal name, publisher, ISSN
• Date of publication, volume and page numbers
• Digital object identifier [DOI]
• Article title
• Authors' names
• Address, affiliation, contact details
• Article section identifiers
• Sources of funding
• Semantic tagging, e.g. protein name
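As a rough illustration, the metadata fields listed above could be held in a simple record like the one below. This is only a sketch: the class name `ArticleMetadata`, its field names, and every sample value are hypothetical and do not reflect any particular publisher's or database's schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ArticleMetadata:
    """Hypothetical container for the metadata fields listed on the slide."""
    journal: str
    publisher: str
    issn: str
    publication_date: str
    volume: str
    pages: str
    doi: str
    title: str
    authors: List[str]
    affiliations: List[str] = field(default_factory=list)
    funding: List[str] = field(default_factory=list)
    semantic_tags: List[str] = field(default_factory=list)  # e.g. protein names


# Invented sample record, for illustration only (not a real article).
record = ArticleMetadata(
    journal="Journal of Examples",
    publisher="Example Press",
    issn="0000-0000",
    publication_date="2013-05-14",
    volume="1",
    pages="1-10",
    doi="10.0000/example.123",
    title="An invented article title",
    authors=["Silva A", "Costa B"],
    semantic_tags=["BRCA1"],
)

print(record.title, "by", ", ".join(record.authors))
```

Keeping each field separate is what later allows a search to be restricted to, say, the authors only.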
The basis of search: Indexing
• The purpose of an index is to optimize speed and performance in finding relevant documents for a search query.
• Without an index, the search engine would have to scan every document in the corpus, which would require considerable time and computing power.
• Metadata helps the indexing algorithm to select different classes of terminology from which to make an index, so a search can be carried out on just the authors' names, for example.
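The idea can be sketched with a toy inverted index. The example below is a minimal illustration, not how any real search engine is built: the three documents, the `DOCS` dictionary, and the `build_index` helper are all invented for this sketch. It builds one index per metadata field, so a search for "silva" can be limited to authors or to titles without scanning every document.

```python
from collections import defaultdict

# Tiny invented corpus: each document carries two metadata fields.
DOCS = {
    1: {"title": "Latrunculin arrests FM3A cells", "authors": "Silva A, Costa B"},
    2: {"title": "Cytokinin signalling in Arabidopsis", "authors": "Costa B"},
    3: {"title": "Silva staining of plant cells", "authors": "Jones C"},
}


def build_index(docs, fields):
    """Map each lower-cased word found in the chosen fields to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for name in fields:
            for word in doc[name].lower().replace(",", " ").split():
                index[word].add(doc_id)
    return index


title_index = build_index(DOCS, ["title"])
author_index = build_index(DOCS, ["authors"])

# Lookups are now dictionary accesses rather than scans of the whole corpus.
print("author = silva:", sorted(author_index["silva"]))
print("title  = silva:", sorted(title_index["silva"]))
```

The two lookups return different documents for the same word, which is the practical benefit of indexing fields separately.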
Search: how the result list is ranked
• Date of publication
• Relevance
– Frequency with which search terms occur in the document
– Proximity of search terms
• Google’s PageRank algorithm uses “link popularity”: a document is ranked higher if there are more links to it
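The two relevance signals named above, term frequency and proximity, can be combined in a toy scoring function. This is a sketch under stated assumptions: the documents are invented, the helper names are hypothetical, and the way the two signals are added together (frequency plus a small proximity bonus) is an arbitrary choice for illustration. Link popularity as used by PageRank is not modelled here.

```python
from itertools import product


def tokenize(text):
    return text.lower().split()


def term_frequency(words, terms):
    """Total number of times any query term occurs in the document."""
    return sum(words.count(t) for t in terms)


def smallest_span(words, terms):
    """Smallest distance (in words) covering one occurrence of every term, or None if a term is missing."""
    positions = [[i for i, w in enumerate(words) if w == t] for t in terms]
    if any(not p for p in positions):
        return None
    return min(max(combo) - min(combo) for combo in product(*positions))


def score(text, query):
    words, terms = tokenize(text), tokenize(query)
    span = smallest_span(words, terms)
    proximity_bonus = 0.0 if span is None else 1.0 / (1 + span)  # assumed weighting
    return term_frequency(words, terms) + proximity_bonus


# Invented documents, for illustration only.
docs = {
    "A": "latrunculin causes arrest in fm3a cells",
    "B": "fm3a cells were grown and later treated with latrunculin and latrunculin analogues",
    "C": "cytokinin signalling in arabidopsis",
}
query = "latrunculin fm3a"
ranked = sorted(docs, key=lambda d: score(docs[d], query), reverse=True)
print(ranked)
```

Changing the weighting changes the order, which is one reason the same query is ranked differently by different search engines.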
The question behind the query
• Search engines think in terms of words, but users think in terms of sentences!
– How do you spell Bousfield?
– What do we know about BRCA1?
– Given these symptoms, what is the most likely diagnosis?
– What are the side effects of aspirin?
– Has this chemical structure been synthesized before?
• “Cancer causes X” vs. “Y causes cancer”
What real queries look like - Google
• pharmacogenomics and disorders
• bacteria growth casein media effect
• waal pseudomonas
• TRPM2 PCR mouse
• Chitinases in carnivorous plants
• glycerophosphoinositol 4-phosphate
• Dai N, Gubler C, Hengstler P, Meyenberger C, Bauerfeind P. Improved capsule endoscopy after bowel preparation. Gastrointest Endosc 2005;61(1) 28-31.
Query changes people actually make
• Query series 1
– latrunculin
– latrunculin fm3a cell arrest
– latrunculin fm3a arrest
– latrunculin fm3a
– latrunculin FM3A
• Query series 2
– cytokinin signalling in arabidopsis
– "cytokinin signalling in arabidopsis"
– cytokinin delta
– spindly arabidopsis
• Results
– Remember to look beyond the first page. Compare the results of Query 1 in PubMed and Google (add the term PubMed)
Improving search accuracy
• Wild card characters
– "a * saved is a * earned"
• Operators
– jaguar speed -car
– Pandas -site:wikipedia.org
– “ribosome”
• Synonyms
– MeSH terms
• Boolean terms
– AND, OR, NOT
• Faceted search
– GO terms
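Boolean terms map naturally onto set operations over the lists of documents that contain each word. The sketch below assumes an invented `postings` table (term to document ids) and a hypothetical `docs_for` helper; it is meant only to show what AND, OR and NOT do to a result set, not how any particular database implements them.

```python
# Invented postings: each term maps to the ids of the documents containing it.
postings = {
    "jaguar": {1, 2, 3, 4},
    "speed": {2, 3, 5},
    "car": {3, 4},
    "panthera": {1, 6},
}


def docs_for(term):
    """Documents containing the term (empty set if the term is not indexed)."""
    return postings.get(term, set())


and_result = docs_for("jaguar") & docs_for("speed")     # jaguar AND speed
or_result = docs_for("jaguar") | docs_for("panthera")   # jaguar OR panthera (a synonym)
not_result = and_result - docs_for("car")               # jaguar AND speed NOT car

print("AND:", sorted(and_result))
print("OR: ", sorted(or_result))
print("NOT:", sorted(not_result))
```

AND narrows the result list, OR widens it (useful for synonyms), and NOT removes an unwanted sense of a word, much like `jaguar speed -car` above.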
Anatomy of a query - PubMed
• invasive fungal infections in young children
• invasive[All Fields] AND ("mycoses"[MeSH Terms] OR "mycoses"[All Fields] OR ("fungal"[All Fields] AND "infections"[All Fields]) OR "fungal infections"[All Fields]) AND ("Young Child"[Journal] OR ("young"[All Fields] AND "children"[All Fields]) OR "young children"[All Fields])
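One way to see this kind of expansion for your own query is to send it to PubMed programmatically through the NCBI E-utilities `esearch` service. The sketch below only builds the request URL and does not contact the network; the helper name `esearch_url` is hypothetical. When the URL is fetched, the JSON response should contain the expanded query (reported in a `querytranslation` field at the time of writing; treat that field name as an assumption to verify).

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities esearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, retmax=20):
    """Build an esearch request URL for a free-text PubMed query."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)


url = esearch_url("invasive fungal infections in young children")
print(url)

# To run the search (requires network access), something like:
#   import json, urllib.request
#   with urllib.request.urlopen(url) as response:
#       result = json.load(response)
```

Comparing the expanded query with what you typed shows which words were mapped to MeSH terms and which were searched as plain text.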
So…
• Using the same search terms will produce different results in different databases because:
– The content is different
– Preparation of search terms will be different, e.g. only PubMed uses MeSH terms
– The indexing process, implementation of stemming, and removal of stop words will be different
– Ranking algorithms will be different
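The effect of different term preparation can be shown with two imaginary databases that treat the same query differently. Everything here is an assumption made for illustration: the stop-word lists, the `prepare` helper, and the deliberately crude `naive_stem` function (real systems use more careful stemmers, such as the Porter stemmer).

```python
STOP_WORDS_A = {"in", "the", "of", "and"}  # hypothetical database A drops these words
STOP_WORDS_B = set()                       # hypothetical database B keeps every word


def naive_stem(word):
    """Crude plural stripping, for illustration only."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def prepare(query, stop_words, stem):
    """Lower-case the query, drop stop words, and optionally stem what is left."""
    terms = [w for w in query.lower().split() if w not in stop_words]
    return [naive_stem(w) for w in terms] if stem else terms


query = "invasive fungal infections in young children"
terms_a = prepare(query, STOP_WORDS_A, stem=True)
terms_b = prepare(query, STOP_WORDS_B, stem=False)

print("Database A searches for:", terms_a)
print("Database B searches for:", terms_b)
```

Because the two databases end up looking for different terms, they can return different results even if their content were identical.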
Quick tour
Break
Other types of database
• Some databases contain mainly text, but others contain image, sequence or structural data
• The technologies required to search and retrieve these different data types are very different.
• There is a growing amount of information in publicly available databases.
• For example, in 2013 the Nucleic Acids Research online Molecular Biology Database Collection listed 1512 databases.
• The National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI) host some of the most important databases used for biomedical research.
Linking different data types is a challenge
[Diagram: data types (gene expression, protein, disease, SNP, enzyme, pathway, known gene, sequence cluster, Affy fragment, sequence, metabolite) linked to source databases (Warehouse, LocusLink, MGD, ExPASy SwissProt, PDB, OMIM, NCBI dbSNP, ExPASy Enzyme, KEGG, SPAD, UniGene, GenBank, NMR)]
Databases available at NCBI
Other ways to search – BLAST, PubChem, UCSC Genome Browser
By sequence – BLAST:
>DinoDNA from JURASSIC PARK p. 103 nt 1-1200
GAATTCCGGAAGCGAGCAAGAGATAAGTCCTGGCATCAGATACAGTTGGAGATAAGGACGGACGTGTGGCAGCTCCCGCAGAGGATTCACTGGAAGTGCATTACCTATCCCATGGGAGCCATGGAGTTCGTGGCGCTGGGGGGGCCGGATGCGGGCTCCCCCACTCCGTTCCCTGATGAAGCCGGAGCCTTCCTGGGGCTGGGGGGGGGCG
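A sequence search starts from a record in FASTA format: a header line beginning with ">" followed by the sequence itself. The sketch below parses such a record and computes two simple properties before it would be pasted into a tool such as BLAST. The helper names `parse_fasta` and `gc_content` are hypothetical, only the first 44 nucleotides of the DinoDNA sequence shown on this slide are used, and the code does not submit anything to BLAST.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def gc_content(seq):
    """Fraction of G and C bases in the sequence (0.0 for an empty sequence)."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


# First 44 nucleotides of the record on the slide, split over two lines.
fasta = """>DinoDNA from JURASSIC PARK p. 103 nt 1-1200
GAATTCCGGAAGCGAGCAAGAG
ATAAGTCCTGGCATCAGATACA
"""

records = parse_fasta(fasta)
header, sequence = records[0]
print(header)
print(len(sequence), "nt, GC fraction", round(gc_content(sequence), 2))
```

Unlike a text search, the query here is the sequence itself, which is why a different search technology is needed.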
By structure – PubChem:
Example of BLAST search results
PC Compound Record
Ganesha Associates
Learning points
13/08/2013
• Google is a good place to start
• Learn to use several information resources
• Modify your search terms during the course of a search session
• Understand how the results are ranked and don’t just look on the first page