Search Engines
19.04.23 Thomas Haidlas 2
Introduction
Directories and meta-search engines
How search engines work
What influences the ranking

Directories
Hand-constructed hierarchy of topics (e.g. Yahoo!)
Human editors select, index and classify the pages
Covers only a small part of the web
Updated infrequently
No ranking

Directories II
No search across a full-text index
The search runs across the editors' reviews instead
Sometimes partnered with search engines to increase coverage

Meta-Search Engine
Queries for rare keywords often require more than one web search engine
Submits the same query to many engines in parallel
Duplicate entries are eliminated
The results are shown in a uniform format
No harvesting or indexing of its own
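The meta-search steps above can be sketched roughly as follows. This is only an illustration: `engine_a`, `engine_b` and their canned results are hypothetical stand-ins for real search engines, and the duplicate test (lower-cased URL without trailing slash) is a deliberately crude assumption.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real engines: each maps a query to (url, title) hits.
def engine_a(query):
    return [("http://example.org/a", "Page A"), ("http://example.org/b", "Page B")]

def engine_b(query):
    return [("http://EXAMPLE.org/b/", "Page B (copy)"), ("http://example.org/c", "Page C")]

def normalize(url):
    """Crude duplicate key (an assumption): lower-case, drop a trailing slash."""
    return url.lower().rstrip("/")

def meta_search(query, engines):
    # Submit the same query to every engine in parallel.
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    for engine, hits in zip(engines, result_lists):
        for url, title in hits:
            key = normalize(url)
            if key in seen:          # duplicate entry: keep the first copy only
                continue
            seen.add(key)
            # Uniform output format regardless of which engine answered.
            merged.append({"url": url, "title": title, "source": engine.__name__})
    return merged

results = meta_search("rare keyword", [engine_a, engine_b])
```

Nothing is harvested or indexed here; the sketch only forwards the query and merges what comes back.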

Harvesting
Programs (robots, gatherers or crawlers) visit web sites and gather the web pages for indexing
Start with an initial page
Follow hyperlinks (<a href=…>)
Sometimes more than 2 sub-levels are visited
These programs are started periodically
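A minimal sketch of such a harvesting robot, assuming a breadth-first visit with a depth limit. To keep it runnable without network access, the "web" is a hypothetical in-memory dictionary `PAGES`; the names `LinkExtractor` and `harvest` are likewise made up for this example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Hypothetical in-memory "web" so the sketch runs without network access.
PAGES = {
    "http://site.test/": '<a href="/a.html">A</a> <a href="/b.html">B</a>',
    "http://site.test/a.html": '<a href="/c.html">C</a>',
    "http://site.test/b.html": '<a href="/">home</a>',
    "http://site.test/c.html": '<a href="/d.html">D</a>',
    "http://site.test/d.html": "no links here",
}

def harvest(start_url, max_depth=2, fetch=PAGES.get):
    """Breadth-first crawl; returns {url: html} for every page gathered."""
    gathered = {}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if url in gathered:
            continue                 # already visited
        html = fetch(url)
        if html is None:
            continue                 # page could not be fetched
        gathered[url] = html
        if depth == max_depth:
            continue                 # do not follow links below the depth limit
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            queue.append((urljoin(url, href), depth + 1))
    return gathered

pages = harvest("http://site.test/", max_depth=2)
```

With `max_depth=2` the page `d.html`, three links away from the start page, is never gathered.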

Harvesting II
Problems:
Links inside frames and image maps are not found
A search engine starts many robots => traffic

Robot Exclusion
Two methods:
Meta tags: <meta name="robots" content="noindex,nofollow">
robots.txt:
User-agent: Scooter
Disallow: /privat/geht_dich_gar_nix_an.html
Allow: /allesOffen

Robot Exclusion II
robots.txt (Example 2):
User-agent: *
Allow: /allesOffen
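How a well-behaved robot might evaluate the first robots.txt example can be sketched with Python's standard `urllib.robotparser` module. The rules are copied from the slide; the test paths and the agent name `SomeOtherBot` are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Rules from the first robots.txt example.
ROBOTS_TXT = """\
User-agent: Scooter
Disallow: /privat/geht_dich_gar_nix_an.html
Allow: /allesOffen
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The robot named in the file must skip the disallowed page ...
blocked = parser.can_fetch("Scooter", "/privat/geht_dich_gar_nix_an.html")
# ... but may fetch anything below the allowed path.
allowed = parser.can_fetch("Scooter", "/allesOffen/index.html")
# A robot with no matching entry (and no "*" entry) is not restricted by this file.
other = parser.can_fetch("SomeOtherBot", "/privat/geht_dich_gar_nix_an.html")
```

Both exclusion methods are only requests: they work because the robot chooses to check them before fetching a page.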

Indexing
The index table receives the harvesting results
The index table contains keywords
The table is held in main memory => fast access
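One common way to build such an index table is an inverted index that maps every keyword to the pages containing it. The sketch below assumes that structure (the slides do not specify one) and uses a plain dictionary as the in-memory table; `PAGES`, `tokenize` and `build_index` are hypothetical names.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lower-case keywords (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each keyword to {url: [word positions]}; a dict held in main memory."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for position, word in enumerate(tokenize(text)):
            index[word].setdefault(url, []).append(position)
    return index

# Hypothetical harvesting results.
PAGES = {
    "http://site.test/a": "Search engines index the web",
    "http://site.test/b": "Directories list the web by topic",
}
index = build_index(PAGES)
```

Storing word positions alongside the page makes it possible to reward keywords that appear early in a document when ranking.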

Analysing Requests
The search string is compared with the index table
The search string consists of a single word: => easy processing
The search string contains truncation or Boolean operators: => complex processing
If the search string is found in the index, the page is added to the hit list
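The difference between the easy and the complex case can be sketched as follows. The index table `INDEX`, the `*` truncation syntax and the strict left-to-right evaluation of AND/OR are all assumptions made for this example, not the behaviour of any particular engine.

```python
# Hypothetical index table: keyword -> set of pages containing it.
INDEX = {
    "search": {"p1", "p2"},
    "searching": {"p3"},
    "engine": {"p1", "p3"},
    "directory": {"p4"},
}

def lookup(term, index):
    """Resolve one term; a trailing '*' is treated as right truncation."""
    if term.endswith("*"):
        prefix = term[:-1]
        hits = set()
        for keyword, pages in index.items():   # truncation: scan every keyword
            if keyword.startswith(prefix):
                hits |= pages
        return hits
    return set(index.get(term, ()))            # single word: one table access

def search(query, index):
    """Evaluate 'term (AND|OR term)*' strictly left to right, no parentheses."""
    tokens = query.split()
    hits = lookup(tokens[0].lower(), index)
    for operator, term in zip(tokens[1::2], tokens[2::2]):
        pages = lookup(term.lower(), index)
        if operator.upper() == "AND":
            hits &= pages
        elif operator.upper() == "OR":
            hits |= pages
        else:
            raise ValueError(f"unknown operator: {operator}")
    return hits                                # the hit list (unranked)
```

For example, `search("search", INDEX)` needs a single lookup, while `search("search* AND engine", INDEX)` has to scan all keywords and then intersect two page sets.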

Ranking
Influences on the ranking:
How many keywords are found
Keyword frequency
Keyword position:
  Domain/URL
  Document name

Ranking II
Keyword position (continued):
  Headline
  Early in the text
  Meta tags
Ranking for cash
Page Rank
Click frequency / Hit Popularity Engine
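The keyword-based factors (number of keywords found, frequency, position) could be combined into a score along the following lines. The slides name the factors but give no numbers, so every weight here is a hypothetical value, and the page fields `url`, `headline`, `meta` and `body` are assumed names.

```python
import re

# Hypothetical weights per position; the source gives no concrete values.
FIELD_WEIGHTS = {"url": 4.0, "headline": 3.0, "meta": 2.0, "body": 1.0}
EARLY_BONUS = 1.0    # extra credit for a hit among the first EARLY_WORDS body words
EARLY_WORDS = 20

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def score(page, keywords):
    """page: dict with (some of) the text fields named in FIELD_WEIGHTS."""
    keywords = {k.lower() for k in keywords}
    total, found = 0.0, set()
    for field, weight in FIELD_WEIGHTS.items():
        for position, word in enumerate(tokenize(page.get(field, ""))):
            if word in keywords:
                found.add(word)
                total += weight                  # frequency, weighted by position
                if field == "body" and position < EARLY_WORDS:
                    total += EARLY_BONUS         # early in the text
    return total * len(found)                    # number of distinct keywords found

page_a = {
    "url": "http://site.test/search-engines.html",
    "headline": "How search engines work",
    "meta": "search, engines",
    "body": "A search engine harvests pages.",
}
page_b = {
    "url": "http://site.test/misc.html",
    "body": "Various notes. One mentions a search tool.",
}
score_a = score(page_a, ["search", "engines"])
score_b = score(page_b, ["search", "engines"])
```

Under these assumed weights `page_a` scores far higher than `page_b`, because both keywords occur in its URL, headline and meta tags.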

Ranking for cash
Capitalism principle
Paying money => high ranking level
The content is not relevant
Additional income

Ranking for cash II
Not used on its own
Mostly used by e-commerce companies
Second method: pay for faster indexing

Page Rank (Google)
Evaluation by the Internet community (web admins)
Relation between the quality of a page and the number of links that point to it
Links from popular web sites count for more
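The idea that a link from a popular page is worth more can be illustrated with a simplified PageRank iteration. This is a textbook-style sketch, not Google's production method: the damping factor of 0.85, the fixed number of iterations and the link graph `LINKS` are assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: {page: [pages it links to]}. Returns {page: rank}; ranks sum to 1."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on in equal shares to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Page without outgoing links: spread its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: "new" links out, but nobody links to it yet.
LINKS = {
    "hub": ["a", "b"],
    "a": ["hub"],
    "b": ["hub"],
    "new": ["hub"],
}
ranks = pagerank(LINKS)
```

In this graph `hub` ends up with the highest rank and `new`, which has no incoming links, with the lowest.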

Page Rank (Google) II
Disadvantage: new web sites have a bad ranking
Queries with many Boolean connectives and keywords are not easy to process

Hit Popularity Engine
The index already exists and is pre-sorted
A click on a link counts as a vote for the site concerned => the "click" is recorded in the database
Pages with many "clicks" are more popular
Developed by "Direct Hit"
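A rough sketch of click-based re-ranking, assuming one click equals one vote and that ties keep their pre-sorted order. The class `ClickPopularity` and the in-memory counter standing in for the click database are hypothetical; Direct Hit's actual implementation is not described in the source.

```python
from collections import Counter

class ClickPopularity:
    """Records clicks on result links and re-sorts an existing hit list by them."""
    def __init__(self):
        self.clicks = Counter()      # stands in for the click database

    def record_click(self, url):
        self.clicks[url] += 1        # one click = one vote for that page

    def rerank(self, hit_list):
        # sorted() is stable, so pages with equal clicks keep their pre-sorted order.
        return sorted(hit_list, key=lambda url: -self.clicks[url])

engine = ClickPopularity()
hits = ["p1", "p2", "p3"]            # hypothetical pre-sorted hit list
for url in ["p3", "p3", "p2"]:
    engine.record_click(url)
reranked = engine.rerank(hits)
```

A page that has never been shown cannot collect clicks, so it stays at the bottom of the re-ranked list.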

Hit Popularity Engine II
This method is usually combined with others
Disadvantage: new web sites have a bad ranking

Ranking Manipulation
Why? Commercial interest
Done by: search engine optimizers (SEOs)
Purpose: to boost the page rank

Link Farm
Many domains are registered
Programs generate thousands of pages that link to one another
Each page contains keywords
Some of these pages are even quite elaborate

Forwarding
An intermediate page contains the terms being searched for
HTML meta tags and simple JavaScript forwarding can be recognized
SEOs obfuscate the forwarding instructions => they are no longer recognized
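Recognizing the simple meta-tag case might look like the sketch below. It assumes the forwarding is done with the standard `<meta http-equiv="refresh" content="N; url=...">` tag; the names `RefreshDetector` and `forwarding_target` are hypothetical, and obfuscated JavaScript forwarding is deliberately out of scope.

```python
from html.parser import HTMLParser

class RefreshDetector(HTMLParser):
    """Flags <meta http-equiv="refresh" content="N; url=..."> forwarding."""
    def __init__(self):
        super().__init__()
        self.target = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag != "meta" or (attrs.get("http-equiv") or "").lower() != "refresh":
            return
        content = attrs.get("content") or ""
        _, _, rest = content.partition(";")          # drop the delay in seconds
        name, _, value = rest.strip().partition("=")
        if name.strip().lower() == "url" and value:
            self.target = value.strip().strip("'\"")

def forwarding_target(html):
    """Return the URL a page forwards to via meta refresh, or None."""
    detector = RefreshDetector()
    detector.feed(html)
    return detector.target

DOORWAY = ('<html><head><meta http-equiv="refresh" '
           'content="0; url=http://site.test/real.html"></head></html>')
target = forwarding_target(DOORWAY)
```

A check like this only catches the plain form of the tag, which is why more complicated forwarding instructions escape it.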

IP Delivery
The normal site is indexed by the robots
After this, the contents of the site are exchanged

IP Cloaking
Server programs determine who sent the request
Robot requests: "cloaked" content is delivered, which is designed to influence the ranking
Human visitors: do not see the "cloaked" content

Other simple tricks
Links in guestbooks, particularly effective with high-ranking guestbooks
"Blind text"
Text in the background color

Summary
Select suitable tools
The WWW is dynamic => consider new developments
Estimate the ranking correctly