Search engines: shapes and sizes - German Cancer …



HUSAR

The Internet

Search engines

Biosci Newsgroups

Important entry sites

Link collections

Introduction

Comparison HUSAR / Internet


Introduction

All the indicated links can be found at: genome.dkfz-heidelberg.de

It is impossible to present all the resources on the entire WWW

Interesting sites are added every week

Our object for this session:
• Not primarily where to get information, but
• How to get to information (search strategies)

The ‘best’ method doesn’t exist; therefore we present a personal view!

As the internet changes and grows, many sites that are interesting today may be boring tomorrow


Several basic sources of information

• Search engines
• Metasearch engines
• Homepages / portals
• Medline
• Electronic journals
• Newsgroups (BIOSCI)
• White lists


The internet is a rich source of information

Sources: search engines, metasearch engines, Medline, electronic journals, newsgroups (BIOSCI), homepages / portals, white lists

Example questions:
• I need info on Dr. Complexname.
• I need info on baldness.
• How do I purify DNA from hair?
• Is there a database of hair growth related proteins?
• I need info on Dr. Smith, member of the National Baldness Society.
• I need info on baldness related diseases/syndromes.
• I need info on Dr. Smith, working at Baldness University.

... but you have to combine the right question with the right source!


Search engines: shapes and sizes

• AltaVista and Northern Light are two of the largest search engines on the web.
• FAST Search aims to index the entire web.
• Excite is a medium-sized index but uses concept searching.
• Companies can pay money to GoTo to be placed higher in the search results.
• Google is a search engine that makes use of link popularity to rank web sites.
• Yahoo is the largest human-compiled directory of the web and employs 150 editors.
• Specialized search engines: Biofinder, www.biologie.de, BioHunt, Pasteur NetBook
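The link-popularity idea mentioned for Google can be illustrated with a minimal sketch. It only counts how many other pages link to each page and sorts by that count. This is a deliberate simplification for illustration, not Google's actual ranking method, and the page names and the function `rank_by_link_popularity` are hypothetical.

```python
from collections import Counter

def rank_by_link_popularity(links):
    """Rank pages by the number of distinct other pages linking to them.

    `links` maps a page to the pages it links to (hypothetical data).
    """
    inbound = Counter()
    for source, targets in links.items():
        for target in set(targets):      # count each source once per target
            if target != source:         # ignore self-links
                inbound[target] += 1
    pages = set(links) | set(inbound)
    # Most-linked pages first; ties broken alphabetically for stable output
    return sorted(pages, key=lambda page: (-inbound[page], page))

# Hypothetical toy web of four pages
toy_web = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example", "a.example"],
}
ranking = rank_by_link_popularity(toy_web)
print(ranking)
```

In this toy web, the page that nobody links to ends up last, regardless of what it contains.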

Metasearch engines (multiple search engines) query several other search engines in parallel.
• Examples: Metacrawler, DogPile, MetaFind, Cyber 411, Savvysearch

cuiwww.unige.ch/meta-index.html
www.monash.com/spidap4.html
www.library.carleton.edu/staff/terry/websearch/
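How a metasearch engine queries several engines in parallel can be sketched as follows. This is only an illustration under stated assumptions: the two "engines" are stand-in functions with canned answers (no real search engine is contacted), and the merging rule (results found by more engines come first) is one possible choice, not the method of any of the services named above.

```python
from concurrent.futures import ThreadPoolExecutor

def metasearch(query, engines):
    """Send `query` to every engine in parallel and merge the results.

    `engines` maps an engine name to a callable returning a list of URLs.
    """
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(search, query) for name, search in engines.items()}
        results = {name: future.result() for name, future in futures.items()}

    merged = {}  # URL -> names of the engines that returned it
    for name, urls in results.items():
        for url in urls:
            merged.setdefault(url, []).append(name)
    # URLs found by more engines come first; ties sorted alphabetically
    return sorted(merged.items(), key=lambda item: (-len(item[1]), item[0]))

# Stand-in engines with canned answers (hypothetical; no network access)
def engine_one(query):
    return ["site-a.example", "site-b.example"]

def engine_two(query):
    return ["site-b.example", "site-c.example"]

hits = metasearch("TMHMM", {"one": engine_one, "two": engine_two})
for url, found_by in hits:
    print(url, found_by)
```

Duplicates are collapsed, so a page returned by both engines appears once, together with the names of the engines that found it.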


Do not rely on just one search engine

Search term:  TMHMM    Krogh      TMHMM & Krogh   HUSAR      Venter
Yahoo         15       2/0        4               5/0        4/0
FAST          99       23870/1    23              3965/0     56000/>1
Altavista     41       19186/?    19257/15        3946/>1    26000/>1
Excite        40/30    90/2       50/2            60/2       >150/3
Metacrawler   20       44/1       11              21/1       30/3

Entries are total hits / relevant hits (blue / red on the slide).

Search engines may employ different AND / OR rules
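The effect of different AND / OR rules on hit counts can be shown with a small sketch. The three page texts and the function `count_hits` are hypothetical; real engines also differ in indexing and ranking, so this only illustrates why a two-word query may return more hits on one engine (OR) and fewer on another (AND).

```python
def count_hits(documents, terms, rule):
    """Count documents matching `terms` under an "AND" or "OR" rule."""
    combine = all if rule == "AND" else any
    wanted = [term.lower() for term in terms]
    hits = 0
    for text in documents:
        words = set(text.lower().split())
        if combine(term in words for term in wanted):
            hits += 1
    return hits

# Hypothetical mini-index of three page texts
pages = [
    "TMHMM predicts transmembrane helices",
    "Krogh and colleagues describe TMHMM",
    "Krogh lecture notes on hidden Markov models",
]
and_hits = count_hits(pages, ["TMHMM", "Krogh"], "AND")
or_hits = count_hits(pages, ["TMHMM", "Krogh"], "OR")
print(and_hits, or_hits)
```

With the AND rule only the page containing both words counts; with the OR rule every page containing either word counts.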


Comparison of search engines

Engines compared (figure legend): Northern Light, AltaVista, Excite, Inktomi, Google, InfoSeek, Lycos, Yahoo, Microsoft, Netscape

http://searchenginewatch.com


Usenet Biosci Newsgroups

Object: to organize discussions on a large variety of topics

Advantages
• simple to complex questions
• resources are scientists
• Netscape and newslist format

Disadvantages
• traffic can be too high or too low
• resources are scientists
• spam!


Access via a newsreader (e.g. Pine) is also very convenient. Mailing lists or reading via Deja Vu are also possible.

Instructions on how to install Usenet newsgroups are provided by www.bio.net


Useful link collections

• Many, many links for molecular biology. Internet problem: last update 1996.
• Many, many links. Highly scientific.
• Many links to a wide variety of databases.


EMBL / EBI

Keywords: molecular biology

• Home of databases EMBL, TREMBL and Swissprot
• Mitochondrial database
• Ligand/Receptor database
• Home of European Drosophila Genome Project and Flybase
• Original home of SRS
• Macromolecule structure
• Large array of downloadable software


• Proteomics
• Regular newsletter about the EBI and bioinformatics
• http://industry.ebi.ac.uk/ : Datamining, ESTs, gene prediction, Java, microarrays, sequence analysis, visualisation and Web technology.

Database of databases


EBI's Biocatalog


http://www.sanger.ac.uk

Keywords: large scale sequencing and analysis
This includes major databases and analysis software

Home of Pfam (Protein Families Database of alignments and HMMs)

Home of AceDB (management of genome project data)

and EMBOSS (The European Molecular Biology Open Software Suite)

An abundance of tools, e.g. Victor Solovyev's gene prediction software


NCBI

Keyword: major US site for sequence analysis

• ENTREZ search and retrieval system
• Pubmed
• Home of BLAST
• Home of GENBANK
• Unigene database
• COGs (Clusters of Orthologous Groups)
• dbSNP Single Nucleotide Polymorphisms database
• 600+ genome maps
• Tools such as ORF Finder and e-PCR


NIH

Keywords: research, funding, USA science politics

• 25 separate institutes
• Huge amount of data
• Local search engine
• Still difficult to get to relevant data


The Institute for Genomic Research

Keyword: genome projects

http://www.tigr.org/tdb/

Genome databases, e.g.:
• >20 microbes
• Parasites: Trypanosoma brucei and Plasmodium falciparum
• Human
• Arabidopsis

Abundant software, including:
• A system for finding genes in microbial DNA
• MUMmer for aligning whole genome sequences
• Sequence clean-up program

Institut Pasteur: Bio Netbook

• The Bio Netbook is a search engine especially designed for biologists
• Its index contains only biological expressions (2945)
• Growing database
• The homepage www.pasteur.fr contains a large amount of additional information

GenomeNet

Keywords: metabolic pathways / proteomics / metabolomics

KEGG
• Metabolic pathways
• Regulatory pathways
• Disease Catalogs, Cell Catalogs
• Molecule Catalogs; compounds and enzymes
• Gene Catalogs
• Genome Maps
• Gene Expression Profiles
• Computational Tools
• Links to other pathway and compound sites


GenomeNet / KEGG

Metabolic Pathways
• Graphical pathway maps and ortholog group tables
• Maps are fully interactive

Regulatory Pathways

Gene Expression Profiles
• Still preliminary in character
• Clickable signals allow identification of the enzyme


More about proteomics

Gene Expression Profiles
• Still preliminary in character
• http://bodymap.ims.u-tokyo.ac.jp


ExPASy

Keywords: proteins / proteomics / applications

• Many applications
• High quality search engine for biologists
• Largest collection of biology links on the WWW (few outdated)


Software for 2D analysis

Swiss-PdbViewer is an application that provides a user-friendly interface for analysing several proteins at the same time.

SWISS-MODEL, an automated comparative protein modelling server


Pubcrawler

• It goes to the library. You go to the pub.
• Automatic system which searches PubMed or other databases as often as you want with your keywords or sequences
• Similar systems exist as well; links are indicated on the PubCrawler homepage
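The idea of a repeated, automatic keyword search can be sketched as below. This is not PubCrawler's implementation, which the slides do not describe. The sketch assumes the public NCBI E-utilities `esearch` endpoint as the way to query PubMed, and the helper names `build_query_url` and `new_records` as well as the record IDs are hypothetical. No network request is made here; a real run would fetch the URL on a schedule and pass the returned IDs to `new_records`.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (assumption: not necessarily what PubCrawler uses)
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_query_url(keywords, max_results=20):
    """Build a PubMed search URL that returns matching record IDs as JSON."""
    params = {
        "db": "pubmed",
        "term": " AND ".join(keywords),
        "retmax": max_results,
        "retmode": "json",
    }
    return ESEARCH + "?" + urlencode(params)

def new_records(seen_ids, current_ids):
    """Return IDs from the latest search that were not reported before."""
    return [record for record in current_ids if record not in seen_ids]

# Hypothetical run: the IDs are made up for illustration
seen = {"101", "102"}
latest = ["103", "102", "104"]
fresh = new_records(seen, latest)
seen.update(fresh)  # remember what has been reported

print(build_query_url(["baldness", "hair"]))
print(fresh)
```

Keeping the set of already reported IDs is what turns a plain search into an alert: each run reports only what is new since the previous one.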


MIPS

• Keyword: Proteins and more...
• Databases of proteins (Protfam), RNAs, mitochondrial sequences
• Genome projects of human, yeast and Arabidopsis
• Pathways, Proteomics
• Yeast ORFs and genes
• Small but comprehensive link list
• An alert utility emails you, once per week, new database entries related to your field of study.
• ORPHEUS is a software system for gene prediction in complete bacterial genomes and large genomic fragments.


IMB Jena

Keywords: biotech and molecular biology

• Many useful links, up to date
• Tools, databases, services


HUSAR

Keyword: Sequence analysis

• Sequence Retrieval System
• GDB
• OMIM
• AceDB
• Genecards is mirrored at the DKFZ
• FAQ, Bioinformatics information
• Link list (>200 links)
• Several free tools (e.g. Genscan)
• ... and the HUSAR package


Sequence analysis: Internet vs. HUSAR

                                      The Internet        HUSAR
Number of applications                many                >250
Up-to-date                            few                 all
Comprehensive                         no                  yes
Databases                             many                >90
Speed                                 low                 high
Security                              low                 high
Data storage                          none                40MB
Handling of multiple or large files   bad (copy/paste)    good
Batch job utilities                   mostly absent       yes
User support                          low                 high
Development of customized tools       no                  yes
Training                              low                 high
Bug removal                           slow                fast
Costs                                 no                  low


Conclusion

You can always contact us at genome @ dkfz-heidelberg.de if you have difficulty locating the information that you need.

The internet contains abundant information; the important thing is to use clever strategies to find it.
