mining is the use of data mining techniques to Web Miningpaginas.fe.up.pt/~ec/files_1011/week 11 -...

WebMining

Based on several presentations found on the web: Shapiro, Ullman, Terziyan, Pedersen, Bing Liu ...

WhatisWebMining?

Web mining is the use of data mining techniques to automatically discover and extract information from Web documents/services

(Etzioni, 1996, CACM 39(11))

Web mining aims to discovery useful information or knowledge from the Web hyperlink structure, page content and usage data.

(Bing LIU 2007, Web Data Mining, Springer)

WhatisWebMining? Motivation / Opportunity

The WWW is huge, widely distributed, global information service centre and, therefore, constitutes a rich source for data mining

Intelligent Web Search

Personalization, Recommendation Engines

Web‐commerce applications

Building the Semantic Web

Web page classification and categorization

News classification and clustering

Information / trend monitoring

Analysis of online communities

Web and mail spam filtering

Abundanceandauthoritycrisis

Liberal and informal culture of content generation and dissemination

Redundancy and non‐standard form and content

Millions of qualifying pages for most broad queries

Example: java or kayaking

No authoritative information about the reliability of a site

Little support for adapting to the background of specific users

Pages added continuously and average page changes in a few weeks

SizeoftheWeb

Number of pages

11.5 billion indexable pages (http://www.cs.uiowa.edu/~asignori/web-size/ www2005)

Technically, infinite

Because of dynamically generated content

Lots of duplication (30‐40%)

Best estimate of “unique” static HTML pages comes from search engine claims

Yahoo = claimed 19.2 billion in Aug 2005

Number of unique web sites

Netcraft survey says 98 million sites

GYBA = Sorted on Google, Yahoo!, Bing and AskYGBA = Sorted on Yahoo!, Google, Bing and Ask

http://www.worldwidewebsize.com/8

Anotherwaytoestimatethewebsize

The number of web servers was estimated by sampling and testing random IP address numbers and determining the fraction of such tests that successfully located a web server

The estimate of the average number of pages per server was obtained by crawling a sample of the servers identified in the first experiment

Lawrence, S. and Giles, C. L. (1999). Accessibility of information on the web. Nature, 400(6740): 107–109.

WhyisWebInformationRetrievalDifficult?

The Abundance Problem (99% of information of no interest to 99% of people)

Hundreds of irrelevant documents returned in response to a search query

Limited Coverage of the Web (Internet sources hidden behind search interfaces)

Largest crawlers cover less than 18% of Web pages

The Web is extremely dynamic

Lots of pages added, removed and changed every day

Very high dimensionality (thousands of dimensions)

Limited query interface based on keyword‐oriented search

Limited customization to individual users

Ad indexes

Websearchbasics

The Web

Web Results 1 - 10 of about 7,310,000 for miele. (0.12 seconds)

Miele, Inc -- Anything else is a compromise At the heart of your home, Appliances by Miele. ... USA. to miele.com. Residential Appliances. Vacuum Cleaners. Dishwashers. Cooking Appliances. Steam Oven. Coffee System ... www.miele.com/ - 20k - Cached - Similar pages

Miele Welcome to Miele, the home of the very best appliances and kitchens in the world. www.miele.co.uk/ - 3k - Cached - Similar pages

Miele - Deutscher Hersteller von Einbaugeräten, Hausgeräten ... - [ Translate this page ] Das Portal zum Thema Essen & Geniessen online unter www.zu-tisch.de. Miele weltweit ...ein Leben lang. ... Wählen Sie die Miele Vertretung Ihres Landes. www.miele.de/ - 10k - Cached - Similar pages

Herzlich willkommen bei Miele Österreich - [ Translate this page ] Herzlich willkommen bei Miele Österreich Wenn Sie nicht automatisch weitergeleitet werden, klicken Sie bitte hier! HAUSHALTSGERÄTE ... www.miele.at/ - 3k - Cached - Similar pages

mining is the use of data mining techniques to Web Miningpaginas.fe.up.pt/~ec/files_1011/week 11 -...

Documents

Transcript of mining is the use of data mining techniques to Web Miningpaginas.fe.up.pt/~ec/files_1011/week 11 -...

Optimizing search engines

Travel search-engines

Semantic Search Engines

Search engines powerpoint

Using Search Engines to Market your Consultancy. What are Search Engines?

Search Engines!

Ao3 search engines

Search engines & eCommerce - Ravensbourne lecture - How Search Engines work & Foundations of SEO

Web search engines and search technology

Internet search engines

Search Engines Evaluation

cdn-cms.f-static.com · 2018. 3. 16. · (107) semantic search engines,xml search engines, web3 search engines, smart search engines, semantic–multimedia, audio, video, image-search

Module 3 - Internet. Search Engines Search engine anatomy Different search engines Effective searching techniques.

Search Engines & Search Engine Optimization (SEO).

Search engines

Biomedical Search engines

Academic Search Engines

Search Engines and Metasearch Engines

The Players The Majors Dead Search Engines International Search Engines Metasearch Engines.

Academic Search Engines - E-LISeprints.rclis.org/12721/1/Academic_Search_Engines.pdf · Search engines are about excitement, optimism, hope and enrichment. Search engines are also