Search Engines

25.03.22 Thomas Haidlas 1 Search Engines

19.04.23 Thomas Haidlas 1

Search Engines

Introduction

Directories and meta search engines

How search engines work

What influences the ranking

Directories

Hand-constructed hierarchy of topics (e.g. Yahoo!)

Human editors select, index and classify the pages

Cover only a small part of the web

Updated infrequently

No ranking

Directories II

No search across a full-text index

Search runs across the editors' reviews

Sometimes partnered with search engines to increase coverage

Meta Search Engine

Queries for rare keywords require the use of more than one web search engine

The same query is submitted to many engines in parallel

Duplicate entries are eliminated

The results are shown in a uniform format

No harvesting or indexing of its own
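
The steps on this slide (parallel submission, duplicate elimination, uniform output) can be sketched as follows. This is a minimal illustration, not how any particular meta search engine is built: `engine_a`, `engine_b` and the example URLs are hypothetical stand-ins for real engines.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real engines: each maps a query to result URLs.
def engine_a(query):
    return ["http://example.org/a", "http://example.org/b"]

def engine_b(query):
    return ["http://example.org/b", "http://example.org/c"]

def meta_search(query, engines):
    """Send the query to all engines in parallel and merge the results."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    for results in result_lists:
        for url in results:
            if url not in seen:  # eliminate duplicated entries
                seen.add(url)
                merged.append({"url": url})  # uniform output format
    return merged

hits = meta_search("rare keyword", [engine_a, engine_b])
```

The meta search engine keeps no index of its own here; it only forwards the query and merges what comes back.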

How search engines work

Harvesting

Indexing

Analysing requests

Ranking

Harvesting

Programs (robots, gatherers or crawlers) visit web sites and gather the web pages for indexing

Start with an initial page

Follow hyperlinks (<a href=…>)

Sometimes more than 2 sub-levels are visited

These programs are started periodically
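
A harvesting run as described above can be sketched like this. To keep the example runnable without network access, it crawls a hypothetical in-memory "web" (`PAGES`); the depth limit `max_depth` and all page names are assumptions made for illustration.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects the href values of all <a> tags in a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Hypothetical in-memory "web" so the sketch runs without network access.
PAGES = {
    "/start": '<a href="/a">A</a> <a href="/b">B</a>',
    "/a": '<a href="/deep">Deep</a>',
    "/b": '<a href="/start">Back</a>',
    "/deep": '<a href="/deeper">Deeper</a>',
    "/deeper": "no links here",
}

def harvest(start, max_depth=2):
    """Breadth-first visit from the start page, following links up to max_depth levels."""
    gathered, frontier = {}, [(start, 0)]
    while frontier:
        url, depth = frontier.pop(0)
        if url in gathered or url not in PAGES:
            continue
        gathered[url] = PAGES[url]  # keep the page for indexing
        if depth < max_depth:
            parser = LinkExtractor()
            parser.feed(PAGES[url])
            frontier.extend((link, depth + 1) for link in parser.links)
    return gathered

pages = harvest("/start", max_depth=2)
```

With `max_depth=2` the page `/deeper`, three levels below the start page, is never visited.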

Harvesting II

Problems:

Links are not found in frames and image maps

A search engine starts many robots => traffic

Robot Exclusion

Two methods:

Meta tags: <meta name="robots" content="noindex,nofollow">

robots.txt:

User-agent: Scooter
Disallow: /privat/geht_dich_gar_nix_an.html
Allow: /allesOffen

Robot Exclusion II

robots.txt (Example 2):

User-agent: *
Allow: /allesOffen
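
How a well-behaved robot might evaluate such a robots.txt can be tried out with Python's standard `urllib.robotparser` module. The rules are the ones from the first example; the host `example.org` is an assumption added only to form complete URLs.

```python
from urllib.robotparser import RobotFileParser

# Rules from the first robots.txt example on the slides.
ROBOTS_TXT = """\
User-agent: Scooter
Disallow: /privat/geht_dich_gar_nix_an.html
Allow: /allesOffen
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# example.org is a placeholder host, not taken from the slides.
blocked = parser.can_fetch(
    "Scooter", "http://example.org/privat/geht_dich_gar_nix_an.html"
)
allowed = parser.can_fetch("Scooter", "http://example.org/allesOffen/index.html")
```

Here `blocked` is false and `allowed` is true. Robot exclusion is voluntary: it only works for robots that actually check the file.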

Indexing

The index table receives the harvesting results

The index table contains keywords

The table is kept in main memory => fast access
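
One common way to realise such a keyword table is an inverted index that maps each keyword to the pages containing it. The sketch below assumes this structure; the slides do not say how the table is organised, and the page names and texts are made up.

```python
import re
from collections import defaultdict

def build_index(pages):
    """Map each keyword to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index

# Hypothetical harvesting results: URL -> page text.
pages = {
    "/engines": "Search engines rank web pages",
    "/directories": "Directories list web pages by topic",
}
index = build_index(pages)
```

Because the whole dictionary lives in memory, looking up a keyword such as `index["web"]` is a single hash lookup.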

Analysing Requests

The search string is compared with the index table

The search string consists of a single word: => easy processing

The search string contains truncation or Boolean operators: => complex processing

If the search string is found in the index, the page is added to the hit list
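
The difference between the easy and the complex case can be illustrated with a small query evaluator. It is only a sketch under stated assumptions: the index contents are invented, a trailing `*` is taken to mean truncation, and Boolean operators are evaluated strictly left to right without precedence or parentheses.

```python
# Hypothetical index table: keyword -> pages containing it.
INDEX = {
    "search": {"/engines"},
    "searching": {"/howto"},
    "engine": {"/engines", "/meta"},
    "directory": {"/yahoo"},
}

def lookup(term):
    """Resolve one term; a trailing '*' is treated as truncation (prefix match)."""
    if term.endswith("*"):
        prefix = term[:-1]
        hits = set()
        for keyword, pages in INDEX.items():
            if keyword.startswith(prefix):
                hits |= pages
        return hits
    return set(INDEX.get(term, set()))

def analyse(query):
    """Evaluate 'a', 'a AND b' or 'a OR b' strictly left to right."""
    tokens = query.split()
    hit_list = lookup(tokens[0].lower())
    for operator, term in zip(tokens[1::2], tokens[2::2]):
        pages = lookup(term.lower())
        if operator == "AND":
            hit_list &= pages
        elif operator == "OR":
            hit_list |= pages
        else:
            raise ValueError(f"unknown operator: {operator}")
    return hit_list
```

A single word needs one lookup, while a truncated term has to scan all keywords and a Boolean query has to combine several hit lists.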

Ranking

Influences on the ranking:

Number of keywords found

Keyword frequency

Keyword position: domain/URL, document name

Ranking II

Keyword position (continued): headline, early in the text, meta tags

Ranking for cash

Page Rank

Click frequency / Hit Popularity Engine
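
The keyword-based influences (number of keywords found, frequency, position) can be combined into a single score, for example as a weighted sum. The weights and field names below are assumptions chosen for illustration; the slides name the positions but give no numbers, and "early in the text" is not modelled here.

```python
# Assumed weights per position; the slides give no concrete values.
POSITION_WEIGHTS = {
    "url": 5,
    "document_name": 4,
    "headline": 3,
    "meta": 2,
    "body": 1,
}

def score(page, keywords):
    """Sum weighted keyword frequencies over the fields of a page."""
    total = 0
    for field, weight in POSITION_WEIGHTS.items():
        words = page.get(field, "").lower().split()
        for keyword in keywords:
            total += weight * words.count(keyword.lower())
    return total

# Hypothetical pages, already split into fields.
page_a = {
    "url": "search engines",
    "headline": "search engines explained",
    "body": "how search works",
}
page_b = {"url": "cooking", "headline": "recipes", "body": "search for a recipe"}

score_a = score(page_a, ["search"])  # 5 (url) + 3 (headline) + 1 (body)
score_b = score(page_b, ["search"])  # 1 (body only)
```

Both pages contain the keyword, but `page_a` scores higher because the keyword appears in more prominent positions.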

Ranking for cash

Capitalist principle

Paying money => high ranking

Content is not relevant

Additional income

Ranking for cash II

Not used on its own

Mostly used by e-commerce companies

Second method: pay for faster indexing

Page Rank (Google)

Evaluation by the internet community (web admins)

Relation between the quality of a page and the number of links that point to it

Links from popular web sites count for more
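
The idea that a page is rated by its incoming links, and that links from popular pages weigh more, can be sketched as a simple iterative computation. This is a simplified textbook-style version, not Google's actual implementation: the damping factor of 0.85, the iteration count and the tiny example graph are assumptions, and every link target must itself appear as a key in `links`.

```python
def page_rank(links, damping=0.85, iterations=50):
    """links maps each page to the pages it links to; returns a rank per page."""
    pages = list(links)
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / len(pages) for page in pages}
        for page, targets in links.items():
            if not targets:  # page without outgoing links: spread its rank evenly
                for other in pages:
                    new_rank[other] += damping * rank[page] / len(pages)
            else:
                for target in targets:  # each link passes on an equal share
                    new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank

# Hypothetical link graph: three pages link to "popular", nobody links to "new".
links = {
    "popular": ["a"],
    "a": ["popular"],
    "b": ["popular"],
    "new": ["popular"],
}
ranks = page_rank(links)
```

In this graph `popular` ends up with the highest rank, `a` benefits from being linked by `popular`, and `new` stays at the minimum because no page links to it yet.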

Page Rank (Google) II

Disadvantages:

New web sites have a bad ranking

Queries with many Boolean operators and keywords are not easy to process

Hit Popularity Engine

The index already exists and is pre-sorted

A click on a link counts as a vote for the site concerned => the "click" is recorded in the database

Pages with many "clicks" are more popular

Developed by "Direct Hit"
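
The click-voting idea can be sketched as a re-ordering step on top of an existing result list. This is only an illustration of the principle, not Direct Hit's implementation: the in-memory `Counter` stands in for the click database, and the URLs are made up.

```python
from collections import Counter

clicks = Counter()  # stands in for the click database

def record_click(url):
    """A click on a result counts as one vote for that page."""
    clicks[url] += 1

def rerank(results):
    """Order pre-sorted results by click count; ties keep their original order."""
    return sorted(results, key=lambda url: -clicks[url])

results = ["/first", "/second", "/third"]  # pre-sorted by the existing index
for url in ["/third", "/third", "/second"]:
    record_click(url)
reranked = rerank(results)
```

After two clicks on `/third` and one on `/second`, the order becomes `/third`, `/second`, `/first`.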

Hit Popularity Engine II

This method is usually combined with others

Disadvantage: new web sites have a bad ranking

Ranking-Manipulation

Why? Commercial interest

Done by: search engine optimizers (SEOs)

Purpose: to boost the page rank

Link Farm

Many domains are registered

Programs generate thousands of pages that link to each other

Each page contains keywords

Some of these pages are even designed quite elaborately

Forwarding

An intermediate page contains the search terms

Forwarding via HTML meta tags and simple JavaScript can be detected

SEOs obfuscate the forwarding instructions => no detection

IP Delivery

A normal site is indexed by the robots

After that, the contents of the site are swapped

IP Cloaking

Server programs determine who sent the request

Robot requests: "cloaked" content is delivered, which is designed to influence the ranking

Human visitors: do not see the "cloaked" content
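
The decision a cloaking server makes can be reduced to a few lines, which also shows why the trick is hard to spot from a browser. The sketch is purely illustrative: the IP address, the user-agent fragments and the response strings are hypothetical.

```python
# Hypothetical crawler addresses and user-agent fragments, for illustration only.
ROBOT_IPS = {"192.0.2.10"}
ROBOT_AGENTS = ("scooter", "googlebot")

def is_robot(ip, user_agent):
    """Guess from the IP address or user-agent string whether a robot is asking."""
    agent = user_agent.lower()
    return ip in ROBOT_IPS or any(name in agent for name in ROBOT_AGENTS)

def respond(ip, user_agent):
    """Deliver different content depending on who sent the request."""
    if is_robot(ip, user_agent):
        return "keyword-rich page built for the index"
    return "page shown to human visitors"
```

A request that is not recognised as a robot only ever receives the second response, so a human checking the page sees nothing unusual.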

Other simple tricks

Links in guestbooks

Particularly effective with high-ranking guestbooks

"Blind text"

Text in the background color

Trading in Web Links

Paying for links

Partnership => commission

Summary

Select suitable tools

The WWW is dynamic => keep up with new developments

Assess rankings correctly

Thank You!

Sources

[1] www.suchfibel.de
[2] Jo Bager, "Orientierungslose Infosammler", c't 23/99
[3] Stefan Karzauninkat, "Zielfahndung", c't 23/99
[4] Sven Lennartz, "Ich bin wichtig", c't 23/99
[5] Stefan Karzauninkat, "Google zugemüllt", c't 1/03
[6] www.google.com/webmasters
[7] Dr. Wolfgang Sander-Beuermann, "Schatzsucher", c't 13/98
[8] Arno Dittmar, "Suchmaschinen und Anfragen im WWW"
[9] Ralf Rudolf, "Suchmaschinen und Anfragen im WWW"