Presented by Edgar Cornejo 03.03.14 LAMI Spring 2014 Search Engine and Services.



Outline

Mobile information search for location-based information

Web-a-Where: Geotagging Web Content

The design and implementation of SPIRIT: a spatially-aware search engine for information retrieval on the Internet

Mobile information search for location-based information

Department of Industrial Engineering, Tsinghua University

Beijing, China

April 2010

Chengyi Liu · Pei-Luen Patrick Rau · Fei Gao

Mobile search for location-based information

The study investigated the effects of location and information type in mobile searching for location-based information by carrying out two experiments in an airport

Mobile search scenario

High time pressure

Many environmental disturbances

Device limitations (screen size, input method)

Restricted users’ operations

Mobile searching context

Information queries + location -> Search Engine -> More suitable results

Since most of the information is location-based [1,2], the results can be improved by analyzing information queries and location

Features of mobile interaction [3]

Users' hands are often used to manipulate physical objects

Users may be involved in tasks that demand a high level of visual attention

Users may be highly mobile during the task and have high-speed interaction

Search queries

Query type          | Purpose                                                | Share*
Navigational query  | to reach a particular site                             | 29.4%
Informational query | to find information                                    | 10.2%
Transactional query | to visit a site and perform some web-mediated activity | 60.4%

*According to a large-scale study of European mobile search behavior carried out in 2008 [4]

Factors proposed that may influence the mobile information search

Experiment 1 - Hypotheses

Hypothesis 1

For information searches in mobile versus non-mobile contexts:

The average number of clicks in mobile is lower

The first search result is more important

Free recall is worse

Hypothesis 2

For searches for location-based versus non-location-based information:

The number of clicks is lower

The first search result is more important

Free recall is better


Experiment 1 - Tasks

Experiment 1 - Results

Hypothesis 1

The intention was to find how the user’s context (mobile vs. non-mobile) might affect the user’s information searching performance

The average number of clicks in mobile is lower: False

The first search result is more important: False

Free recall is worse: False

Hypothesis 2

The intention was to examine how the information type (location-based vs. non-location-based) might affect the user’s information searching performance

The average number of clicks is lower: True

The first search result is more important: True

Free recall is better: True

Experiment 2 - Hypotheses

Hypothesis 3

For mobile information searching under high versus low information requirement pressure:

The average number of clicks is lower

The first search result is more important

Free recall is worse

Hypothesis 4

For mobile information searching with informational or navigational queries versus transactional queries:

The number of clicks is greater

The first search result is less important

Free recall is worse


Experiment 2 - Tasks

Experiment 2 - Results

Hypothesis 3

The intention was to examine how the information requirement pressure (high vs. low) might affect a user’s mobile search performance

The average number of clicks is lower: True

The first search result is more important: False

Free recall is worse: False

Hypothesis 4

The intention was to examine how the location-based information type (informational, navigational vs. transactional) might affect a user’s mobile search performance

The average number of clicks is greater: True

The first search result is less important: True

Free recall is worse: True

Summary

Information type (location-based vs. non-location-based) was found to affect user performance during the information search process

Information requirement pressure and location-based information type (navigational, informational and transactional) affect the mobile search process

The first two search results were found to be very important to good search efficiency and good user satisfaction

Web-a-Where: Geotagging Web Content

Einat Amitay · Nadav Har’El · Ron Sivan · Aya Soffer

IBM Haifa Research Lab

Haifa 31905, Israel

July 2004

Web-a-Where is a system for associating geography with Web pages

It locates mentions of places and determines the place each name refers to

It assigns to each page a geographic focus: a locality that the page discusses as a whole

It is implemented within the framework of the IBM WebFountain data mining system

A page may have two types of geography associated with it: a source and a target

Source geography has to do with the origin of the page: the physical location, the address of its author, etc.

Target geography is determined by the contents of the page and relates to the topic the page is discussing

Ambiguities

Geo/non-geo ambiguity is the case of a place name having another, non-geographic meaning, e.g. Mobile (Alabama) or Reading (England)

Geo/geo ambiguity arises when two or more distinct places have the same name

System Components

Geotagger (main component)

Finds and disambiguates geographic names

Assigns a taxonomy node to each phrase in the text that refers to a place, e.g. Paris/France/Europe

The gazetteer

Database that keeps the list of geographic names, their canonical taxonomies and other information
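A minimal sketch of how a gazetteer lookup can expose geo/geo ambiguity. The `GAZETTEER` dictionary, its entries and the two helper functions are illustrative assumptions, not the system's actual data or interface.

```python
# Hypothetical toy gazetteer: lower-cased place name -> taxonomy nodes.
# The entries are illustrative only; a real gazetteer is far larger.
GAZETTEER = {
    "paris": ["Paris/France/Europe", "Paris/Texas/United States/North America"],
    "reading": ["Reading/England/United Kingdom/Europe"],
    "fort worth": ["Fort Worth/Texas/United States/North America"],
}


def candidates(name):
    """Return every taxonomy node a name could refer to (empty if unknown)."""
    return GAZETTEER.get(name.lower(), [])


def is_geo_geo_ambiguous(name):
    """A name is geo/geo ambiguous when it maps to more than one place."""
    return len(candidates(name)) > 1
```

Here `is_geo_geo_ambiguous("Paris")` is true because two places share the name. Geo/non-geo ambiguity (Reading the town versus reading a book) cannot be detected from the gazetteer alone.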

Tagging individual place names

The processing of a page is done in three phases:

Spotting

Disambiguation

Focus determination

1. Spotting place name candidates

Finding all the possible geographic names in each page

Short abbreviations, e.g. IN (for Indiana) or AT (for Austria), are not spotted on their own but are used to help disambiguate other spots, e.g. Gary, IN
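The spotting rule can be sketched as follows. This is only an illustration under stated assumptions: `PLACE_NAMES`, `ABBREVIATIONS` and the regular expression are hypothetical stand-ins for the gazetteer lookup, and only single-word names are handled.

```python
import re

# Hypothetical data for illustration only.
PLACE_NAMES = {"gary", "indiana", "austria", "vienna"}
ABBREVIATIONS = {"IN": "Indiana", "AT": "Austria"}

# A capitalized word, optionally followed by ", XX" (two capital letters).
SPOT_PATTERN = re.compile(r"\b([A-Z][a-z]+)(?:,\s*([A-Z]{2})\b)?")


def spot(text):
    """Return (name, qualifier) pairs; qualifier is None when absent.

    Short abbreviations are never returned as spots themselves. They are
    only read as a qualifier when they directly follow a spotted name
    and a comma, as in "Gary, IN".
    """
    spots = []
    for match in SPOT_PATTERN.finditer(text):
        name, abbreviation = match.group(1), match.group(2)
        if name.lower() in PLACE_NAMES:
            spots.append((name, ABBREVIATIONS.get(abbreviation)))
    return spots
```

For example, `spot("He moved to Gary, IN last year.")` yields one spot for Gary qualified by Indiana, while the capitalized word IN at the start of a sentence produces no spot at all.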

2. Disambiguating spots (Algorithm)

The geotagger assigns a unique meaning to spots that can be uniquely qualified (confidence 95%)

Combinations that are not unique are left unassigned

In a page with multiple spots with the same name where only one is qualified, its meaning is assigned to the others (confidence 80%)

Disambiguation contexts are also used to resolve unassigned spots with confidence less than 70%
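A rough sketch of the first steps of this procedure, using the confidence values quoted on the slide. The toy `GAZETTEER`, the `(name, qualifier)` input format and the reading of "only one is qualified" as "exactly one distinct qualified meaning" are assumptions made for illustration; the final step based on disambiguation contexts is not modeled.

```python
# Hypothetical toy gazetteer: name -> candidate taxonomy nodes.
GAZETTEER = {
    "London": ["London/England/United Kingdom", "London/Ontario/Canada"],
    "Paris": ["Paris/France", "Paris/Texas/United States"],
}


def disambiguate(spots):
    """spots: list of (name, qualifier or None) in page order.

    Returns one (taxonomy node or None, confidence) pair per spot.
    """
    results = [(None, 0.0)] * len(spots)
    qualified = {}  # name -> set of nodes fixed by a unique qualifier

    # Pass 1: a qualifier that selects exactly one candidate wins (0.95).
    for i, (name, qualifier) in enumerate(spots):
        if qualifier is None:
            continue
        matches = [node for node in GAZETTEER.get(name, [])
                   if qualifier in node.split("/")[1:]]
        if len(matches) == 1:
            results[i] = (matches[0], 0.95)
            qualified.setdefault(name, set()).add(matches[0])

    # Pass 2: copy that meaning to unqualified spots of the same name (0.80).
    for i, (name, _) in enumerate(spots):
        if results[i][0] is None and len(qualified.get(name, ())) == 1:
            results[i] = (next(iter(qualified[name])), 0.80)

    return results
```

Spots that no rule reaches keep `(None, 0.0)`, which corresponds to being left unassigned.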

2. Disambiguating spots (Data sources)

The Geographic Names Information System (GNIS) for U.S. locations

world-gazetteer.com for non-U.S. locations

United Nations Statistics Division (UNSD) for countries and continents

ISO 3166-1 for country and other abbreviations

3. Focus determination

The basic idea is that if several cities from the same region are mentioned, this region is probably the focus

Sometimes a page cannot be said to have only one focus

The confidence score should be taken into account when finding the focus, giving higher weight to information coming from locations with higher confidence

Example

A certain page contained four mentions of Orlando/Florida (assigned confidence 0.5), three of Texas (0.75), eight of Fort Worth/Texas (0.75), three of Dallas/Texas (0.75), one of Garland/Texas (0.75), and one of Iraq (0.5)

A human was asked to judge what the geographical focus of this page is and responded with “It’s about Texas and perhaps also Orlando”

Indeed, that page comes from the “Orlando Weekly” site, in a forum titled “Just a look at The Texas Local Music Scene...”
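The idea of confidence-weighted focus scoring can be tried on the mention counts from this example. The scoring rule below (count times confidence, passed up the taxonomy with an assumed `DECAY` of 0.5 per level) is a simplified illustration and should not be read as the scoring formula used by Web-a-Where.

```python
from collections import defaultdict

# Mentions from the example slide: (taxonomy node, count, confidence).
MENTIONS = [
    ("Orlando/Florida", 4, 0.5),
    ("Texas", 3, 0.75),
    ("Fort Worth/Texas", 8, 0.75),
    ("Dallas/Texas", 3, 0.75),
    ("Garland/Texas", 1, 0.75),
    ("Iraq", 1, 0.5),
]

DECAY = 0.5  # assumed discount for each step up the taxonomy


def focus_scores(mentions, decay=DECAY):
    """Score each node by weighted mentions of it and of places inside it."""
    scores = defaultdict(float)
    for node, count, confidence in mentions:
        parts = node.split("/")
        # parts[k:] is the node itself (k = 0), then each enclosing region.
        for k in range(len(parts)):
            scores["/".join(parts[k:])] += count * confidence * decay ** k
    return dict(scores)


scores = focus_scores(MENTIONS)
best = max(scores, key=scores.get)
print(best, scores[best])
```

Under these assumptions Texas collects weight from its own mentions and from Fort Worth, Dallas and Garland, so it ranks first, ahead of any single city.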

Evaluating geotagging precision

Collection                   | Number of pages | Accuracy
Arbitrary collection         | 200             | 81.7%
.GOV collection              | 200             | 73.3%
Open Directory Project (ODP) | 200             | 63.1%

Geotags assigned automatically versus defined manually

Evaluating focus

Comparison of the Web-a-Where-determined focus to the human-determined one (ODP) for ~1 million pages

Correct up to country level: 92% (precise match 38%, correct state or city 30%, correct country 24%)

Incorrect country: 8% (correct continent 4%, continent wrong 4%)

Summary

The system is able to correctly tag individual place name occurrences 80% of the time and determine the correct focus of a page 92% of the time

Accuracy can be further improved

The main source of errors is geo/non-geo ambiguity

The design and implementation of SPIRIT

Ross Purves, Paul Clough, Christopher Jones, Avi Arampatzis, Benedicte Bucher, David Finch, Gaihua Fu, Hideo Joho, Awase Khirni Syed, Subodh Vaid and Bisheng Yang

Department of Geography, University of Zurich, Switzerland

Department of Information Studies, University of Sheffield, UK

School of Computer Science, Cardiff University, UK

Institute of Information and Computing Sciences, Utrecht University, Netherlands

Laboratoire COGIT - Institut Geographique National, France

August 2007

This paper describes the design and implementation of a complete solution to geographic information retrieval

Requirements

Exhaustive retrieval of relevant documents in a specified area

Place names should be automatically identified, and interactively disambiguated

Ability to query for geographical areas whose boundaries are imprecise

Spatial concepts relating different geographic entities should be represented (outside, in)

It should be possible for users to specify the area of interest on a map

Ability to view query results on a map linked to relevant web documents

Document ranking should combine both spatial and thematic aspects of document relevance
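One simple way to meet the last requirement is a weighted mix of a text score and a spatial score. The slides do not say how SPIRIT combines the two, so the linear formula, the `weight` parameter and both function names below are assumptions for illustration.

```python
def combined_score(text_score, spatial_score, weight=0.5):
    """Weighted linear mix of thematic and spatial relevance, both in [0, 1]."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return weight * text_score + (1.0 - weight) * spatial_score


def rank(documents, weight=0.5):
    """documents: list of (doc_id, text_score, spatial_score) tuples."""
    return sorted(
        documents,
        key=lambda doc: combined_score(doc[1], doc[2], weight),
        reverse=True,
    )
```

With `weight=1.0` the ranking falls back to text-only relevance, which makes it easy to compare a spatially aware ranking against a plain text baseline.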

Architecture Overview

[Architecture diagram. Component labels: user interface, broker, relevance ranking, search engine, textual and spatial indexes, geographical ontology, metadata (doc-to-footprint mapping), web data collection (documents). Run-time steps: search request, query disambiguation, query expansion, access indexes, rank results. Pre-processing steps: geo-parsing, geo-coding, building the textual index and the spatial index.]

Functionality of the components

Pre-processing the document collection

Assigning spatial footprints to web documents:

Identify geographical references (geoparsing)

Assign them to spatial coordinates (geocoding)

Spatial footprint
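The two pre-processing steps can be sketched as below. The `COORDINATES` table (with approximate values), the substring-based `geoparse` and the choice of a bounding box as the footprint are all simplifying assumptions, not a description of SPIRIT's implementation.

```python
# Hypothetical geocoding table: place name -> approximate (latitude, longitude).
COORDINATES = {
    "Cardiff": (51.48, -3.18),
    "Sheffield": (53.38, -1.47),
    "Zurich": (47.37, 8.54),
}


def geoparse(text):
    """Geoparsing: return the known place names that occur in the text."""
    return [name for name in COORDINATES if name in text]


def footprint(text):
    """Geocode the parsed names and return their bounding box.

    The box is (min_lat, min_lon, max_lat, max_lon), or None when the
    text mentions no known place.
    """
    points = [COORDINATES[name] for name in geoparse(text)]
    if not points:
        return None
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return (min(lats), min(lons), max(lats), max(lons))
```

A document that mentions both Cardiff and Sheffield gets a footprint spanning the two cities, and a document with no recognized place gets no footprint.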

Building document indexes

Grid-based spatial indexing

For each cell of the grid, a list of document IDs was constructed, using the document footprints which resulted from the geo-tagging process
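A minimal sketch of a grid-based spatial index of this kind. The one-degree `CELL_SIZE`, the bounding-box footprints and the function names are assumptions chosen to keep the example small.

```python
from collections import defaultdict

CELL_SIZE = 1.0  # assumed cell width and height in degrees


def cells_for(box, cell_size=CELL_SIZE):
    """Yield (row, col) for every grid cell a bounding box overlaps."""
    min_lat, min_lon, max_lat, max_lon = box
    for row in range(int(min_lat // cell_size), int(max_lat // cell_size) + 1):
        for col in range(int(min_lon // cell_size), int(max_lon // cell_size) + 1):
            yield (row, col)


def build_grid_index(footprints):
    """footprints: dict of doc_id -> (min_lat, min_lon, max_lat, max_lon).

    Returns a dict of (row, col) -> list of document IDs.
    """
    index = defaultdict(list)
    for doc_id, box in footprints.items():
        for cell in cells_for(box):
            index[cell].append(doc_id)
    return index
```

A document whose footprint straddles a cell boundary is listed under every cell it touches, so a lookup by cell never misses it.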

Retrieving the results: “T” (Text) Scheme

Simplest approach

Retrieve all the documents that match the concept terms of the query and then filter to return only those which intersect the geographical scope of the place in the query (footprint)
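The text-then-filter idea can be sketched as follows. The in-memory `text_index`, the bounding-box footprints and the choice to require every query term are assumptions for illustration.

```python
def intersects(a, b):
    """True when two (min_lat, min_lon, max_lat, max_lon) boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search_t(terms, query_box, text_index, footprints):
    """Text first, then a spatial filter.

    text_index: dict of term -> set of doc ids
    footprints: dict of doc id -> bounding box
    """
    # Step 1: documents containing every concept term.
    postings = [text_index.get(term, set()) for term in terms]
    matched = set.intersection(*postings) if postings else set()
    # Step 2: keep only documents whose footprint meets the query footprint.
    return {
        doc for doc in matched
        if doc in footprints and intersects(footprints[doc], query_box)
    }
```

The whole text index is consulted before any spatial test, which is what makes this the simplest of the three schemes.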

Retrieving the results: “ST” (Space-Text) Scheme

More integrated approach

Regarded as a space-primary method

At search time the cells that intersect the query footprint are determined and then only the corresponding text indexes are searched
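A sketch of the space-primary lookup, assuming one small text index per grid cell. The `cell_text_indexes` layout, the one-degree cells and the function names are hypothetical.

```python
def cells_for(box, cell_size=1.0):
    """Return the set of (row, col) grid cells a bounding box overlaps."""
    min_lat, min_lon, max_lat, max_lon = box
    rows = range(int(min_lat // cell_size), int(max_lat // cell_size) + 1)
    cols = range(int(min_lon // cell_size), int(max_lon // cell_size) + 1)
    return {(r, c) for r in rows for c in cols}


def search_st(terms, query_box, cell_text_indexes):
    """Space first: only the text indexes of intersecting cells are read.

    cell_text_indexes: dict of (row, col) -> {term: set of doc ids}
    """
    results = set()
    for cell in cells_for(query_box):
        index = cell_text_indexes.get(cell)
        if index is None or not terms:
            continue  # nothing indexed under this cell, or an empty query
        postings = [index.get(term, set()) for term in terms]
        results |= set.intersection(*postings)
    return results
```

Text indexes of cells outside the query footprint are never opened, which is the point of putting space first.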

Retrieving the results: “TS” (Text-Space) Scheme

Better query response time

Regarded as a text-primary method

At search time, for each term, the associated documents are grouped according to the spatial index which they relate to
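A sketch of the text-primary variant, assuming each term's document list is stored already grouped by grid cell. The `term_index` layout and the function name are hypothetical.

```python
def search_ts(terms, query_cells, term_index):
    """Text first: each term's postings are already grouped by grid cell.

    term_index: dict of term -> {(row, col): set of doc ids}
    query_cells: set of (row, col) cells covered by the query footprint
    """
    per_term = []
    for term in terms:
        by_cell = term_index.get(term, {})
        docs = set()
        # Read only the groups that belong to cells in the query footprint.
        for cell in query_cells:
            docs |= by_cell.get(cell, set())
        per_term.append(docs)
    return set.intersection(*per_term) if per_term else set()
```

Compared with the space-primary sketch, the outer lookup is by term and the inner one by cell, so a single term lookup gives access to all of its spatial groups.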


Query interfaces


Results display


Evaluation

Performance analysis

To count as relevant to a query, a document had to be both thematically and spatially relevant.

In this sense, the key result of the work is that spatially aware search outperformed text-only search.

Usability analysis

[Bar chart, vertical axis 0 to 30: “It was easy to get started with the system and make my query” (Strongly disagree, Disagree, Neutral, Agree, Strongly agree)]

[Bar chart, vertical axis 0 to 30: “It was easy to find the locations of documents listed to the right of the map on the map” (No, not at all; A little; Yes, very much)]

Conclusions

The paper describes a unified approach, as well as the architecture, for introducing spatial awareness into search-engine technology

A prototype system demonstrated the effectiveness of the strategy

Personal Conclusions

The first study can lead to changes in search engines and devices that improve the mobile experience

The Web-a-Where system provides good insight for further improving location search, though it is not very precise

SPIRIT is a completely new paradigm in space-aware searching, but its interaction methods can be improved


Thank you

References

General References

[1] M. Sanderson, J. Kohler, Analyzing geographic queries, in: Proceedings of the SIGIR 2004 Workshop on Geographic Information Retrieval, Sheffield, UK, 2004.

[2] S. Asadi, Searching the World Wide Web for local services and facilities: a review on the patterns of location-based queries, in: WAIM’05, Hangzhou, China, 2005.

[3] S. Kristoffersen, F. Ljungberg, “Making Place” to make IT work: empirical explorations of HCI for mobile CSCW, in: International ACM SIGGROUP Conference on Supporting Group Work, 1999.

[4] K. Church, B. Smyth, K. Bradley, P. Cotter, A large scale study of European mobile search behavior, in: Proceedings of MobileHCI’08, 2008, pp. 13–22.

[5] M.A. Neerincx, J.W. Streefkerk, Interacting in desktop and mobile context: emotion, trust and task performance, in: Proceedings of the First European Symposium on Ambient Intelligence (EUSAI), Eindhoven, The Netherlands, 2003.