© Tefko Saracevic. Search strategy & tactics: governed by effectiveness & feedback


© Tefko Saracevic 1

Search strategy & tactics

Governed by effectiveness & feedback

© Tefko Saracevic 2

Some definitions

• Search statement (query):
  – set of search terms with logical connectors and attributes; file and system dependent

• Search strategy (big picture):
  – overall approach to searching of a question
  – selection of systems, files, search statements & tactics, sequence, output formats; cost, time aspects

© Tefko Saracevic 3

Some definitions (cont.)

• Search tactics (action choices):
  – choices & variations in search statements: terms, connectors, attributes

• Move:
  – modifications of search strategies or tactics that are aimed at improving the results

• Cycle (particularly applicable to systems such as DIALOG):
  – set of commands from start (begin) to viewing (type) results, or from a viewing to a viewing command

© Tefko Saracevic 4

Some definitions (cont.)

• Effectiveness:
  – performance as to objectives
  – to what degree did a search accomplish what was desired?
  – how well was it done in terms of relevance?

• Efficiency:
  – performance as to costs
  – at what cost and/or effort, time?

Both are KEY concepts & criteria for selection of strategy, tactics & evaluation

© Tefko Saracevic 5

Effectiveness criteria

• Search tactics are chosen & changed following some criteria of accomplishment, such as:
  – none (no thought given)
  – relevance (very often)
  – magnitude (also very often)
  – output attributes
  – topic/strategy

• Tactics are altered interactively
  – role & types of feedback

Knowing what tactics may produce what results is key to a professional searcher

© Tefko Saracevic 6

Relevance: key concept in IR

• Attribute/criterion reflecting effectiveness of exchange of information between people (users) & IR systems in communication contacts, based on valuation by people

• Some attributes:
  – in IR: user dependent
  – multidimensional or faceted
  – dynamic
  – measurable (somewhat)
  – intuitively well understood

© Tefko Saracevic 7

Types of relevance

• Several types considered:
  – Systems or algorithmic relevance: relation between a query as entered and objects in the file of a system as retrieved or failed to be retrieved by a given procedure or algorithm. Comparative effectiveness.
  – Topical or subject relevance: relation between topic in the query & topic covered by the retrieved objects, or objects in the file(s) of the system, or even in existence. Aboutness.

© Tefko Saracevic 8

Types of relevance (cont.)

  – Cognitive relevance or pertinence: relation between state of knowledge & cognitive information need of a user and the objects provided or in the file(s). Informativeness, novelty ...
  – Motivational or affective relevance: relation between intents, goals & motivations of a user & objects retrieved by a system or in the file, or even in existence. Satisfaction ...
  – Situational relevance or utility: relation between the task or problem-at-hand and the objects retrieved (or in the files). Relates to usefulness in decision-making, reduction of uncertainty ...

© Tefko Saracevic 9

Effectiveness measures

• Precision:
  – probability that, given that an object is retrieved, it is relevant; or the ratio of relevant items retrieved to all items retrieved

• Recall:
  – probability that, given that an object is relevant, it is retrieved; or the ratio of relevant items retrieved to all relevant items in a file

• Precision is easy to establish, recall is not
  – union of retrievals as a “trick” to establish recall
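The union-of-retrievals "trick" can be sketched as follows. This is only an illustrative sketch under stated assumptions: the function name `pooled_recall`, the search labels, and the item ids are all hypothetical, and the estimate is relative to the pooled items only, so relevant items that no search retrieved are not counted.

```python
def pooled_recall(search_results, relevant_judged):
    """Estimate recall of each search against the union (pool) of all retrievals.

    search_results: dict mapping a search label to the set of item ids it retrieved.
    relevant_judged: set of item ids that a person judged relevant.
    """
    # Pool everything that any of the searches retrieved.
    pool = set().union(*search_results.values())
    # Only relevant items inside the pool are known to us.
    relevant_pool = pool & relevant_judged
    estimates = {}
    for label, retrieved in search_results.items():
        if relevant_pool:
            estimates[label] = len(retrieved & relevant_pool) / len(relevant_pool)
        else:
            estimates[label] = None  # nothing relevant in the pool: undefined
    return estimates


# Hypothetical data: three searches on the same question.
searches = {
    "s1": {1, 2, 3, 4},
    "s2": {3, 4, 5, 6},
    "s3": {6, 7},
}
relevant = {1, 3, 5, 6, 9}  # item 9 is relevant but was never retrieved
estimates = pooled_recall(searches, relevant)
print(estimates)
```

Because item 9 never enters the pool, each figure is an optimistic estimate of true recall, which is one reason recall is hard to establish.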

© Tefko Saracevic 10

Calculation

                       Judged RELEVANT                 Judged NOT RELEVANT
Items RETRIEVED        a: relevant & retrieved         b: not relevant & retrieved
Items NOT RETRIEVED    c: relevant & not retrieved     d: not relevant & not retrieved

(a, b, c, d = no. of items in each cell)

$$\text{Precision} = \frac{a}{a + b} \qquad \text{Recall} = \frac{a}{a + c}$$

High precision = maximize a, minimize b

High recall = maximize a, minimize c
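As a small worked example of the calculation above, the two ratios can be computed directly from the cell counts. The counts used here are made up for illustration, and the function names are hypothetical.

```python
def precision(a, b):
    """a = relevant & retrieved, b = not relevant & retrieved."""
    retrieved = a + b
    return a / retrieved if retrieved else None  # undefined if nothing retrieved


def recall(a, c):
    """a = relevant & retrieved, c = relevant & not retrieved."""
    relevant = a + c
    return a / relevant if relevant else None  # undefined if nothing is relevant


# Made-up counts: 50 items retrieved, 30 of them relevant;
# 70 further relevant items stayed in the file.
a, b, c = 30, 20, 70
p = precision(a, b)
r = recall(a, c)
print(f"precision = {p:.0%}, recall = {r:.0%}")
```

With these counts, 60% of the answer is relevant, but only 30% of the relevant items in the file were found.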

© Tefko Saracevic 11

Interpretation: PRECISION

• Precision = percent of relevant stuff you have in your answer
  – or conversely percent of junk
  – high precision = most stuff relevant
  – low precision = a lot of junk

• Some users demand high precision
  – do not want to wade through much stuff
  – but it comes at a price: relevant stuff may be missed (tradeoff)

© Tefko Saracevic 12

Interpretation: RECALL

• A file may have a lot of relevant stuff

• Recall = percent of that relevant stuff in the file that you retrieved
  – conversely percent of stuff you missed
  – high recall = you missed little
  – low recall = you missed a lot

• Some users demand high recall (e.g. PhD students doing a dissertation)
  – want to make sure that important stuff is not missed
  – but will have to pay a price of wading through a lot of junk (tradeoff)

© Tefko Saracevic 13

Precision-recall trade-off

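• USUALLY: precision & recall are inversely related
  – higher recall usually means lower precision & vice versa

[Figure: precision (vertical axis, 0 to 100 %) plotted against recall (horizontal axis, 0 to 100 %), showing an "Ideal" case, the "Usual" curve, and the direction of "Improvements".]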

© Tefko Saracevic 14

Interpretation: TRADE-OFF

• It is like in life, usually:
  – you get some, you lose some

• Usually, but not always: keep in mind these are probabilities
  – when you have high precision, most stuff you got is relevant or on target, but you missed stuff that is also relevant; it was left behind
  – when you have high recall, you did not miss much, but you also got a lot of junk and have to wade through it

You use different tactics for high recall from those for high precision

© Tefko Saracevic 15

Search tactics

• What variations are possible?
  – several ‘things’ in a query can be selected or changed that affect effectiveness
  – each variation has a consequence in output: if I do X then Y will happen

1. LOGIC
  – choice of connectors among terms (AND, OR, NOT, W …)

2. SCOPE
  – no. of terms linked by ANDs (A AND B vs. A AND B AND C)

© Tefko Saracevic 16

Search tactics (cont.)

3. EXHAUSTIVITY
  – for each concept, no. of related terms linked by ORs (A OR B vs. A OR B OR C)

4. TERM SPECIFICITY
  – for each concept, level in hierarchy (broader vs. narrower terms)

5. SEARCHABLE FIELDS
  – choice for text terms & non-text attributes, e.g. titles only, limit as to years

6. FILE OR SYSTEM SPECIFIC CAPABILITIES
  – e.g. ranking, sorting

© Tefko Saracevic 17

Effectiveness “laws”

Tactic                              Output size   Recall   Precision
SCOPE: adding more ANDs             down          down     up
EXHAUSTIVITY: adding more ORs       up            up       down
USE OF NOTs: adding more NOTs       down          down     up
BROAD TERM USE: low specificity     up            up       down
PHRASE USE: high specificity        down          down     up
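The effect of connectors on output size can be illustrated with a toy Boolean search over sets. This is a minimal sketch: the five "documents", their index terms, and the helper `having` are invented for illustration, and it shows only output size, since recall and precision also depend on relevance judgments.

```python
# Hypothetical file: document id -> set of index terms.
DOCS = {
    1: {"cat", "dog", "care"},
    2: {"cat", "care"},
    3: {"dog", "training"},
    4: {"cat", "dog", "training"},
    5: {"feline", "care"},
}


def having(term):
    """Return the ids of all documents indexed with the given term."""
    return {doc_id for doc_id, terms in DOCS.items() if term in terms}


cat = having("cat")                          # single term
cat_and_dog = cat & having("dog")            # SCOPE: one more AND
cat_or_feline = cat | having("feline")       # EXHAUSTIVITY: one more OR
cat_not_training = cat - having("training")  # one more NOT

for label, result in [
    ("cat", cat),
    ("cat AND dog", cat_and_dog),
    ("cat OR feline", cat_or_feline),
    ("cat NOT training", cat_not_training),
]:
    print(f"{label}: {len(result)} items {sorted(result)}")
```

In this toy file, AND and NOT shrink the output (3 items down to 2) while OR enlarges it (3 items up to 4), in line with the output-size column above.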

© Tefko Saracevic 18

Tactics: What to do?

• To increase precision:
  – use precision devices

• To increase recall:
  – use recall devices

• Each will also affect magnitude of output

• With experience, use of these devices will become second nature

© Tefko Saracevic 19

Recall, precision devices

BROADENING (higher recall):
  – Fewer ANDs
  – More ORs
  – Fewer NOTs
  – More free text
  – Fewer controlled terms
  – More synonyms
  – Broader terms
  – Less specific
  – More truncation
  – Fewer qualifiers
  – Fewer limits
  – Citation growing

NARROWING (higher precision):
  – More ANDs
  – Fewer ORs
  – More NOTs
  – Less free text
  – More controlled terms
  – Fewer synonyms
  – Narrower terms
  – More specific
  – Less truncation
  – More qualifiers
  – More limits
  – Building blocks

© Tefko Saracevic 20

Other tactics

• Citation growing:
  – find a relevant document
  – look for documents cited in it
  – look for documents citing it
  – repeat on newly found relevant documents

• Building blocks:
  – find documents with term A
  – review
  – add term B & so on

• Using different feedbacks
  – a most important tool
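The citation growing steps listed above can be sketched as a bounded walk over a citation graph. Everything here is an assumption made for illustration: the tiny graph `CITES`, the function name `citation_growing`, the `max_rounds` cut-off, and the use of a simple callback in place of a person's relevance judgment.

```python
from collections import deque

# Hypothetical citation data: document -> documents it cites.
CITES = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": [],
    "D": [],
    "E": ["A"],
    "F": ["E"],
}


def citation_growing(seed, is_relevant, max_rounds=2):
    """Grow a set of relevant documents outward from one relevant seed."""
    # Invert the graph so we can also ask "which documents cite this one?"
    cited_by = {}
    for doc, refs in CITES.items():
        for ref in refs:
            cited_by.setdefault(ref, []).append(doc)

    found = {seed}
    queue = deque([(seed, 0)])
    while queue:
        doc, depth = queue.popleft()
        if depth == max_rounds:
            continue  # stop expanding after the chosen number of rounds
        # Documents cited in this one, then documents citing it.
        neighbours = CITES.get(doc, []) + cited_by.get(doc, [])
        for other in neighbours:
            if other not in found and is_relevant(other):
                found.add(other)
                queue.append((other, depth + 1))  # repeat on the new find
    return found


# Stand-in for human relevance judgments.
judged_relevant = {"A", "B", "D", "E"}
grown = citation_growing("A", lambda doc: doc in judged_relevant)
print(sorted(grown))
```

Only documents judged relevant are expanded further, which keeps the growth from drifting away from the question; in practice the judgment would come from the searcher or user.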

© Tefko Saracevic 21

Feedback in searching

• Any feedback implies loops
  – a completion of a process provides information for modification, if any, for the next process
  – information from output is used to change previous or create new input

• In searching:
  – some information taken from output of a search is used to do something with the next query (search statement): examine what you got to decide what to do next in searching
  – a basic tactic in searching

• Several feedback types are used in searching
  – each used for different decisions

© Tefko Saracevic 22

Feedback types

• Content relevance feedback
  – judge relevance of items retrieved
  – make decision what to do next: switch files, change exhaustivity …

• Term relevance feedback
  – find relevant documents
  – examine what other terms are used in those documents
  – search using additional terms
  – also called query modification & in some systems done automatically

• Magnitude feedback
  – on the basis of size of output make tactical decisions
  – often the size is so big that documents are not examined but the next search is done to limit size
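Term relevance feedback can be sketched as counting which terms, other than those already in the query, occur in the documents judged relevant. This is one simple way to do it, not the method of any particular system; the function name `suggest_terms` and the sample terms are hypothetical.

```python
from collections import Counter


def suggest_terms(query_terms, relevant_docs, top_n=3):
    """Suggest additional search terms taken from relevant documents.

    query_terms: terms already used in the search statement.
    relevant_docs: one list of terms per document judged relevant.
    """
    counts = Counter()
    for doc_terms in relevant_docs:
        # Count each new term once per document (document frequency).
        counts.update(set(doc_terms) - set(query_terms))
    # Most frequent first; ties broken alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_n]]


# Hypothetical relevant documents found by the query "retrieval".
query = ["retrieval"]
relevant_docs = [
    ["retrieval", "precision", "recall"],
    ["retrieval", "recall", "relevance"],
    ["retrieval", "recall", "precision", "feedback"],
]
suggestions = suggest_terms(query, relevant_docs, top_n=2)
print(suggestions)
```

The suggested terms would then be ORed into the next search statement (a recall device) or ANDed with it (a precision device), depending on what the searcher is after.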

© Tefko Saracevic 23

Feedback types (cont.)

• Tactical review feedback
  – after a number of queries (search statements) in the same search, review tactics as to getting desired outputs: review terms, logic, limits …
  – change tactics accordingly

• Strategic review feedback
  – after a while (or after consultation with user) review the “big” picture on what was searched and how: sources, terms, relevant documents, need satisfaction, changes in question, query …
  – do next searches accordingly
  – used in reiterative searching

• There is a difference between reviewing strategy & tactics
  – but they can be combined

© Tefko Saracevic 24

Bates' berry-picking model of searching

“…moving through many actions towards a general goal of satisfactory completion of research related to information need.”

  – query is shifting (continually): as search progresses, queries are changing and different tactics are used
  – searcher (user) may move through a variety of sources: new files, resources may be used; strategy may change

© Tefko Saracevic 25

Berry-picking …

  – new information may provide new ideas, new directions: feedback is used in various ways
  – question is not satisfied by a single set of answers, but by a series of selections & bits of information found along the way: results may vary & may have to be provided in appropriate ways & means