INFO624 - Week 4 Query Languages and Query Operations Dr. Xia Lin Associate Professor College of...

INFO624 - Week 4

Query Languages and Query Operations

Dr. Xia Lin

Associate Professor

College of Information Science and Technology

Drexel University

Query

• A query is a representation of the user’s information needs.
• It may not represent the information needs exactly because:
    • Information needs are difficult to describe (semantic difficulty).
    • The query must be in a format acceptable to the retrieval system (syntactic difficulty).

Content-based queries

• Words
• Phrases
• Proximity
• Pattern matching
    • Word matching
    • Prefix/suffix
    • Wildcard search
    • Error handling
    • Extended patterns
• Boolean
• Vector
• Natural language
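The pattern-matching query types listed above can be sketched as filters over an index vocabulary. This is a minimal sketch under stated assumptions: the vocabulary is hypothetical, `match_terms` and its mode names are illustrative helpers rather than any system's API, and error handling is modeled with Levenshtein edit distance, which is one common choice among several.

```python
import fnmatch


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions, or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep) ca
        prev = curr
    return prev[-1]


def match_terms(vocabulary, pattern, mode="word", max_errors=1):
    """Return vocabulary terms matching `pattern` (hypothetical helper with illustrative modes)."""
    if mode == "word":
        return [t for t in vocabulary if t == pattern]
    if mode == "prefix":
        return [t for t in vocabulary if t.startswith(pattern)]
    if mode == "suffix":
        return [t for t in vocabulary if t.endswith(pattern)]
    if mode == "wildcard":
        # '*' matches any run of characters, '?' exactly one character.
        return [t for t in vocabulary if fnmatch.fnmatchcase(t, pattern)]
    if mode == "errors":
        # Tolerate misspellings: accept terms within max_errors edits of the pattern.
        return [t for t in vocabulary if edit_distance(t, pattern) <= max_errors]
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical index vocabulary.
vocabulary = ["retrieval", "retrieve", "retrieved", "relieve", "query", "queries"]
print(match_terms(vocabulary, "retriev", mode="prefix"))  # ['retrieval', 'retrieve', 'retrieved']
print(match_terms(vocabulary, "ieve", mode="suffix"))     # ['retrieve', 'relieve']
print(match_terms(vocabulary, "quer*", mode="wildcard"))  # ['query', 'queries']
print(match_terms(vocabulary, "retreive", mode="errors", max_errors=2))  # ['retrieve']
```

The misspelled "retreive" still finds "retrieve" because swapping two adjacent letters costs two edits under Levenshtein distance.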

Boolean Queries

Request: What are the likely problems when someone gets hurt on his knees when playing basketball?

• Write your best Boolean query for this request:
• If the query returns zero hits, how do you modify the query?
• If the query returns too many hits, how do you modify the query?
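The zero-hits and too-many-hits questions come down to set operations over an inverted file. This is a minimal sketch over a hypothetical four-document collection; the `AND`/`OR`/`NOT` helper names are illustrative, and the queries are one possible answer to the exercise, not the intended one.

```python
# Hypothetical toy collection (doc id -> text); terms are lowercase words split on spaces.
docs = {
    1: "knee injury in basketball players",
    2: "common knee problems and treatment",
    3: "basketball shoes review",
    4: "ankle injuries when playing basketball",
}

# Inverted file for Boolean retrieval: term -> set of doc ids containing it.
postings = {}
for doc_id, text in docs.items():
    for word in text.lower().split():
        postings.setdefault(word, set()).add(doc_id)
ALL_IDS = set(docs)


def term(t):
    return set(postings.get(t, set()))


def AND(*sets):
    return set.intersection(*sets)


def OR(*sets):
    return set().union(*sets)


def NOT(s):
    return ALL_IDS - s


# A literal translation of the request returns zero hits ...
strict = AND(term("knees"), term("hurt"), term("basketball"))
# ... so broaden each concept with OR'd alternatives ...
broad = AND(OR(term("knee"), term("knees")),
            OR(term("hurt"), term("injury"), term("injuries"), term("problems")))
# ... and if that returns too many hits, narrow it with another AND'd concept.
narrow = AND(broad, term("basketball"))
print(strict, broad, narrow)  # set() {1, 2} {1}
```

OR-ing synonyms and variant forms raises recall; AND-ing an additional concept (or NOT-ing an unwanted one) trades recall for precision.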

How does AskJeeves translate the request?

“What are the likely problems when someone gets hurt on his knees when playing basketball?”

Construct your best Boolean query for this request:

“I am doing research on personal space boundaries. I want to know if there are any sex or race differences in personal space boundaries.”

Interaction with Queries

• Start with a SEED query.
• The system responds with a list of related terms.
• Add selected terms from the list to the query.
• The system updates the list of related terms.
• Repeat as needed.
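The interaction loop above can be sketched as follows. Assumptions: the `RELATED` table is a hypothetical stand-in for whatever source the system uses to suggest related terms, and the user is simulated by a `choose` callback; an empty selection ends the loop.

```python
# Hypothetical related-terms table standing in for the system's suggestion list.
RELATED = {
    "knee": ["knee injuries", "patella", "meniscus"],
    "meniscus": ["meniscus tear", "cartilage"],
    "patella": ["kneecap"],
}


def suggest(query):
    """System step: list related terms for the current query, minus terms already in it."""
    suggestions = []
    for t in query:
        for r in RELATED.get(t, []):
            if r not in query and r not in suggestions:
                suggestions.append(r)
    return suggestions


def refine(seed, choose, max_rounds=5):
    """Start from a seed query; each round the user picks terms and the list is updated."""
    query = [seed]
    for _ in range(max_rounds):
        picked = choose(suggest(query))
        if not picked:  # the user is satisfied: stop repeating
            break
        query.extend(picked)
    return query


# Simulated user who accepts only meniscus-related suggestions (an assumption for the demo).
wanted = {"meniscus", "meniscus tear"}
print(refine("knee", lambda options: [o for o in options if o in wanted]))
# ['knee', 'meniscus', 'meniscus tear']
```

Each accepted term pulls in its own related terms, so the second round offers "meniscus tear", which was not visible from the seed alone.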

Example: MedLine Search Assistant

Association-based Queries

• Find documents similar to this document.
• Find documents that link to this document:
    • explicitly
    • implicitly
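Two association-based queries from the slide can be sketched directly: explicit backlinks come from inverting an out-link map, and "similar to this document" needs some similarity measure. Assumptions: the link graph and texts are hypothetical, and Jaccard overlap of term sets is swapped in here as a simple measure; the implicit-link case is left open.

```python
# Hypothetical link graph: page -> set of pages it links to.
LINKS = {
    "a": {"b", "c"},
    "b": {"c"},
    "d": {"a", "c"},
}


def backlinks(target, links):
    """Pages that explicitly link to `target` (invert the out-link map)."""
    return {src for src, outs in links.items() if target in outs}


def jaccard(x, y):
    """Share of terms two documents have in common."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def similar_documents(doc_id, texts, k=3):
    """Other documents ranked by term-set overlap with `doc_id` (zero-overlap ones dropped)."""
    base = set(texts[doc_id].lower().split())
    scored = [(jaccard(base, set(text.lower().split())), other)
              for other, text in texts.items() if other != doc_id]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for score, other in scored[:k] if score > 0]


# Hypothetical documents.
texts = {
    "x": "query expansion with synonyms",
    "y": "synonyms for query expansion",
    "z": "citation analysis of authors",
}
print(sorted(backlinks("c", LINKS)))  # ['a', 'b', 'd']
print(similar_documents("x", texts))  # ['y']
```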

Field-based Queries

• Field-based queries will likely improve search precision.
• Field-based queries require that the data source have a fixed structure and be indexed by that structure.
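Both points can be seen in a small sketch: the index is keyed by (field, term), which is only possible because every record shares the same fields, and restricting a term to one field drops false matches. Assumptions: the records, field names, and helper functions are hypothetical.

```python
# Hypothetical structured records: every record has the same fields.
RECORDS = [
    {"id": 1, "author": "lin", "title": "visualization for information retrieval", "journal": "jasis"},
    {"id": 2, "author": "smith", "title": "retrieval models reviewed by lin", "journal": "ipm"},
    {"id": 3, "author": "lin", "title": "author cocitation maps", "journal": "ipm"},
]
FIELDS = ("author", "title", "journal")


def build_field_index(records):
    """Index each term under (field, term), i.e. index the data by its structure."""
    index = {}
    for rec in records:
        for field in FIELDS:
            for word in rec[field].split():
                index.setdefault((field, word), set()).add(rec["id"])
    return index


def any_field(index, word):
    """Unfielded search: the word may appear in any field."""
    return set().union(*(index.get((f, word), set()) for f in FIELDS))


def field_search(index, **criteria):
    """Fielded search: AND together field=word conditions."""
    result = None
    for field, word in criteria.items():
        hits = index.get((field, word), set())
        result = set(hits) if result is None else result & hits
    return result if result is not None else set()


index = build_field_index(RECORDS)
print(sorted(any_field(index, "lin")))                           # [1, 2, 3]
print(sorted(field_search(index, author="lin")))                 # [1, 3]
print(sorted(field_search(index, author="lin", journal="ipm")))  # [3]
```

Record 2 mentions "lin" only in its title, so the fielded query `author="lin"` excludes it while the unfielded search does not.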

Citation-based Queries

• Retrieve all documents that document A cites.
• Find all documents that cite document A.
• Find all documents that cite this author.
• Find all documents that cite both document A and document B.
• Find documents that cite both author A and author B.
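Each citation-based query above is a lookup or set operation on a citation graph. This is a minimal sketch; the citation data, the authorship table, and the function names are hypothetical.

```python
# Hypothetical citation data: citing document -> set of documents it cites.
CITES = {
    "P1": {"A", "B"},
    "P2": {"A"},
    "P3": {"A", "B", "C"},
    "A": {"C"},
}
# Hypothetical authorship of the cited documents.
AUTHOR_OF = {"A": "smith", "B": "jones", "C": "smith"}


def references(doc):
    """Documents that `doc` cites."""
    return set(CITES.get(doc, set()))


def cited_by(doc):
    """Documents that cite `doc`."""
    return {src for src, refs in CITES.items() if doc in refs}


def citing_author(author):
    """Documents that cite at least one work by `author`."""
    return {src for src, refs in CITES.items()
            if any(AUTHOR_OF.get(r) == author for r in refs)}


def citing_both_documents(a, b):
    return cited_by(a) & cited_by(b)


def citing_both_authors(x, y):
    return citing_author(x) & citing_author(y)


print(sorted(references("A")))                      # ['C']
print(sorted(cited_by("A")))                        # ['P1', 'P2', 'P3']
print(sorted(citing_both_documents("A", "B")))      # ['P1', 'P3']
print(sorted(citing_both_authors("smith", "jones")))  # ['P1', 'P3']
```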

Co-Citation

• The college has a tradition of more than 20 years of co-citation research.
• Co-citation is the mentioning of any two earlier documents in the bibliographic references of a later, third document.

[Diagram: a later Document 3 cites both Document 1 and Document 2, with a “?” asking how Documents 1 and 2 are related.]

Co-Citation Analysis

• The count of mentions may grow over time as new writings appear. Thus, co-citation counts can reflect citers’ changing perceptions of documents as more or less strongly related.
• Documents shown to be related by their co-citation counts can be mapped as proximate in intellectual space.
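A co-citation count is the number of later documents whose reference lists contain both earlier documents, so recomputing it with a later cutoff year shows the count growing as new writings appear. The citing documents and years below are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical citing documents: (publication year, set of earlier documents cited).
CITING = [
    (1995, {"D1", "D2"}),
    (1997, {"D1", "D2", "D3"}),
    (1999, {"D2", "D3"}),
    (2001, {"D1", "D2"}),
]


def cocitation_counts(citing, up_to_year):
    """Count, for each pair, the citing documents (up to a year) that cite both."""
    counts = Counter()
    for year, refs in citing:
        if year <= up_to_year:
            # Sort so each unordered pair is always keyed the same way.
            for pair in combinations(sorted(refs), 2):
                counts[pair] += 1
    return counts


for year in (1995, 1999, 2001):
    print(year, cocitation_counts(CITING, year)[("D1", "D2")])
# 1995 1
# 1999 2
# 2001 3
```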

Co-Citation Mapping

• Detects patterns in the frequency with which any works by any two authors are jointly cited in later works.
• Only recurrent co-citation is significant: the more times authors are cited together, the more strongly related they are in the eyes of citers.
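At the author level, each later work contributes at most one co-citation per author pair, however many works by those authors it cites, and a threshold keeps only recurrent pairs. Assumptions: the reference lists are already mapped to hypothetical authors A to E, and the cutoff `min_count=2` is an illustrative choice, not a standard value.

```python
from collections import Counter
from itertools import combinations

# Hypothetical later works, each given as the authors of the works it cites.
CITED_AUTHORS = [
    ["A", "B", "A"],  # cites two works by author A and one by author B
    ["A", "B"],
    ["B", "C"],
    ["A", "B", "C"],
    ["D", "E"],
]


def author_cocitation(works, min_count=2):
    """Count works citing both authors of a pair; keep only recurrent pairs."""
    counts = Counter()
    for authors in works:
        # set(): an author cited several times in one work still counts once for that work.
        for pair in combinations(sorted(set(authors)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


print(author_cocitation(CITED_AUTHORS))  # {('A', 'B'): 3, ('B', 'C'): 2}
```

Pairs co-cited only once, such as (A, C) and (D, E), fall below the cutoff and would not be drawn as related on a map.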

A Map of Information Scientists

AuthorLinks

Link-Based Queries

Hypertext structure
• Is a link a query?
    • http://www.google.com/search?hl=en&q=information+retrieval
    • This is called a query-mediated link. It is also called a “soft link.”
• Is a query a link?
    • Many pages are dynamically generated from a database or a search engine.
    • Your review pages
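A query-mediated link is an ordinary URL whose query string carries the search, so it can be built from a query and the query can be read back out of it. This sketch uses only the standard `urllib.parse` module; `query_link` and `query_from_link` are hypothetical helper names, and the base URL is the one from the slide.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def query_link(base, query, **params):
    """Build a 'soft link': a URL whose target page is generated by running a query."""
    return f"{base}?{urlencode({**params, 'q': query})}"


def query_from_link(url):
    """Recover the query carried by a query-mediated link."""
    return parse_qs(urlparse(url).query).get("q", [""])[0]


link = query_link("http://www.google.com/search", "information retrieval", hl="en")
print(link)                   # http://www.google.com/search?hl=en&q=information+retrieval
print(query_from_link(link))  # information retrieval
```

`urlencode` encodes spaces as `+` by default, which is why the generated link matches the example URL above.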

Queries, Links, Is there a difference? (SIGCHI’97)

An experiment was conducted to compare browsing behavior in query- and link-based interfaces. Results suggest that query-mediated links are as effective as explicit queries, and that strategies adopted by users affect performance. This work has implications for the design of information exploration interfaces.

Query Structure

Hierarchical structure
• What does the user want when searching for “substance abuse”?
• We may not know, but adding the narrower terms of “substance abuse” will likely get better results:
    Alcohol Abuse; Drug Abuse; Alcohol-Related Disorders; Amphetamine-Related Disorders; Cocaine-Related Disorders; Marijuana Abuse

Automatic Expansion

If there is a defined hierarchy, several search strategies may be defined to expand the query:
• Search with the query term only.
• Search with the query term and all the terms in its upper hierarchy.
• Search with the query term and all the terms in its lower hierarchy.
• Search with the query term and all its sibling terms.
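The four expansion strategies can be sketched over a child-to-parent map. Assumptions: the hierarchy below is a hypothetical toy built from some of the terms on the previous slide (its root "behavior problems" is invented for the demo) and is not an actual thesaurus or MeSH tree; strategy names are illustrative.

```python
# Hypothetical toy hierarchy (child -> parent); not an actual thesaurus or MeSH tree.
PARENT = {
    "substance abuse": "behavior problems",
    "alcohol abuse": "substance abuse",
    "drug abuse": "substance abuse",
    "cocaine-related disorders": "drug abuse",
    "marijuana abuse": "drug abuse",
}


def broader(term):
    """All terms in the upper hierarchy, nearest first."""
    out = []
    while term in PARENT:
        term = PARENT[term]
        out.append(term)
    return out


def narrower(term):
    """All terms in the lower hierarchy (depth-first)."""
    out = []
    for child, parent in PARENT.items():
        if parent == term:
            out.append(child)
            out.extend(narrower(child))
    return out


def siblings(term):
    """Terms sharing the same parent."""
    parent = PARENT.get(term)
    if parent is None:
        return []
    return [c for c, p in PARENT.items() if p == parent and c != term]


def expand(term, strategy):
    if strategy == "term_only":
        extra = []
    elif strategy == "upper":
        extra = broader(term)
    elif strategy == "lower":
        extra = narrower(term)
    elif strategy == "siblings":
        extra = siblings(term)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [term] + extra


print(expand("substance abuse", "lower"))
# ['substance abuse', 'alcohol abuse', 'drug abuse', 'cocaine-related disorders', 'marijuana abuse']
print(expand("drug abuse", "siblings"))  # ['drug abuse', 'alcohol abuse']
```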

Query Operations

• Query execution
• Query expansion
• Query translation

Query Expansion

Improve the initial query automatically by:
• restructuring the query,
• adding other new terms, or
• adjusting the weight of each term.

Restructuring the query:
• Identify key concepts through natural language processing.
• Identify any field information that may be contained in the query:
    • Is this an author?
    • Is this a journal?
• Reverse term orders in the query.
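Two of the restructuring steps, guessing a field and reversing term order, can be sketched with simple heuristics. Assumptions: the journal list and the "Lastname, F." author pattern are hypothetical stand-ins for a system's authority files, NLP concept identification is not attempted, and reversal is applied only when no field is detected.

```python
import re

# Hypothetical lookup data; a real system would consult its own author and journal lists.
KNOWN_JOURNALS = {"jasis", "information processing & management"}
AUTHOR_PATTERN = re.compile(r"^[A-Z][a-z]+,\s*[A-Z]\.?$")  # e.g. "Lin, X."


def restructure(query):
    """Guess which field the query targets and generate term-order variants."""
    q = query.strip()
    if q.lower() in KNOWN_JOURNALS:
        return "journal", [q]
    if AUTHOR_PATTERN.match(q):
        return "author", [q]
    words = q.split()
    variants = [q]
    if len(words) > 1:
        # Also try the terms in reverse order.
        variants.append(" ".join(reversed(words)))
    return "any", variants


print(restructure("Lin, X."))                # ('author', ['Lin, X.'])
print(restructure("JASIS"))                  # ('journal', ['JASIS'])
print(restructure("retrieval information"))  # ('any', ['retrieval information', 'information retrieval'])
```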

Adding new terms:
• Synonyms
• Hierarchical terms
• Scope terms
    • Does the query “Football” retrieve information on football or on soccer?
• Relevant terms
    • Selected terms from relevant documents
    • Terms that co-occur most often with the query terms
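Co-occurrence-based expansion can be sketched by counting, for each candidate term, how many documents it shares with a query term. Assumptions: the collection is hypothetical, co-occurrence is measured at the document level, and ties are broken alphabetically so the output is deterministic.

```python
from collections import Counter

# Hypothetical collection of short documents.
DOCS = [
    "football world cup soccer",
    "soccer world cup final",
    "football nfl quarterback",
    "nfl quarterback draft",
    "soccer league goalkeeper",
]


def cooccurring_terms(query_terms, docs, k=3):
    """Rank other terms by the number of documents they share with any query term."""
    query_terms = set(query_terms)
    counts = Counter()
    for doc in docs:
        terms = set(doc.split())
        if terms & query_terms:
            counts.update(terms - query_terms)
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [t for t, _ in ranked[:k]]


print(cooccurring_terms(["soccer"], DOCS))  # ['cup', 'world', 'final']
print(cooccurring_terms(["football"], DOCS, k=5))
# ['cup', 'nfl', 'quarterback', 'soccer', 'world']
```

The candidates for "football" mix soccer and NFL vocabulary, the same scope ambiguity the slide raises, so candidate terms may still need the user or a scope note to choose between senses.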

Adjusting term weighting:
• If relevant documents are known, increase the weights of terms assigned to the relevant documents and decrease the weights of terms assigned to non-relevant documents.
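One standard way to make this reweighting concrete is the Rocchio formula, swapped in here as an illustration: q' = α·q + (β/|R|)·Σ relevant d − (γ/|N|)·Σ non-relevant d. Assumptions: vectors are term-to-weight dicts, α = 1.0, β = 0.75, γ = 0.15 are commonly cited illustrative defaults rather than values from this course, and negative weights are dropped.

```python
from collections import defaultdict


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query toward the centroid of relevant docs and away from non-relevant ones.

    Every vector is a dict mapping term -> weight.
    """
    new = defaultdict(float)
    for t, w in query.items():
        new[t] += alpha * w
    for docs, coef in ((relevant, beta), (nonrelevant, -gamma)):
        for d in docs:
            for t, w in d.items():
                new[t] += coef * w / len(docs)  # coefficient times centroid weight
    # Drop terms whose weight ended up zero or negative.
    return {t: w for t, w in new.items() if w > 0}


# Hypothetical judged documents for the query "knee".
q = {"knee": 1.0}
rel = [{"knee": 1.0, "meniscus": 1.0}, {"knee": 1.0, "ligament": 1.0}]
nonrel = [{"knee": 1.0, "surgery": 1.0}]
print(rocchio(q, rel, nonrel))
```

Terms from the relevant documents ("meniscus", "ligament") enter the query with positive weight, while "surgery", seen only in the non-relevant document, goes negative and is dropped.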

• Adjust term weights in a topic tree (topic: Fruit):
    Fruit, 0.9; apple, 0.7; orange, 0.7; banana, 0.6; …; Macintosh, 0.1; Computer, -0.4.
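A weighted query like the "Fruit" example can be applied by summing the weights of the query terms a document contains, so the negative weight on "computer" pushes the computer sense of "apple" down the ranking. Assumptions: only the weights listed on the slide are used (the elided "…" terms are left out), and the two documents and the additive scoring rule are hypothetical.

```python
# Weights from the slide's "Fruit" topic; the elided "…" terms are not filled in.
FRUIT_QUERY = {"fruit": 0.9, "apple": 0.7, "orange": 0.7, "banana": 0.6,
               "macintosh": 0.1, "computer": -0.4}


def score(doc_text, weighted_query):
    """Sum the weights of the query terms that appear in the document."""
    terms = set(doc_text.lower().split())
    return sum(w for t, w in weighted_query.items() if t in terms)


# Hypothetical documents.
docs = {
    "d1": "apple and orange fruit salad",
    "d2": "apple macintosh computer review",
}
ranking = sorted(docs, key=lambda d: score(docs[d], FRUIT_QUERY), reverse=True)
print(ranking)  # ['d1', 'd2']
```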

Query Translation
• From natural language to queries
    • AskJeeves
• From queries in one system to queries in another system
• From one natural language to another natural language
    • AltaVista
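The first two kinds of translation can be sketched crudely: a natural-language request becomes an AND query by dropping stopwords, and a Boolean AND/NOT query is rewritten into a "+term -term" style. Assumptions: the stopword list is a small hypothetical one, the `+`/`-` target syntax is an assumed example of "another system", OR is deliberately unsupported, and this is not how AskJeeves or any particular engine works.

```python
import re

# Hypothetical stopword list; real systems use much larger, tuned lists.
STOPWORDS = {"what", "are", "the", "when", "on", "his", "someone", "gets", "likely",
             "is", "a", "of"}


def natural_language_to_boolean(request):
    """Crude NL-to-query translation: drop stopwords and AND the remaining words."""
    words = re.findall(r"[a-z]+", request.lower())
    return " AND ".join(w for w in words if w not in STOPWORDS)


def boolean_to_plus_minus(query):
    """Rewrite 'a AND b NOT c' into an assumed '+a +b -c' target syntax."""
    out, negate = [], False
    for token in query.split():
        if token == "AND":
            continue
        if token == "NOT":
            negate = True
            continue
        if token == "OR":
            raise ValueError("OR is not handled in this sketch")
        out.append(("-" if negate else "+") + token)
        negate = False
    return " ".join(out)


request = "What are the likely problems when someone gets hurt on his knees when playing basketball?"
print(natural_language_to_boolean(request))
# problems AND hurt AND knees AND playing AND basketball
print(boolean_to_plus_minus("knee AND injury NOT surgery"))  # +knee +injury -surgery
```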

Other types of representation for user’s needs?

• Mind-reading?
• Non-text queries?
• Gesture/motion?

IBM – Visualization Space

• This information system understands the user.
• It "hears" users' voice commands and "sees" their gestures and body positions. Interactions are natural, more like human-to-human interactions.

Multimedia Queries

• Content-based
    • Text indexing
• Attribute-based
    • Color, size, type, time period, …
• Structure-based
    • Location, shape, layout, etc.
• Cluster-based
    • Semantic groups, physical groups, structure groups

Example: find a photo that has the White House in the center.
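The White House example is a structure-based query: it constrains where an object appears, not just whether it appears. Assumptions: each photo already has hypothetical metadata listing detected objects with normalized bounding boxes (x0, y0, x1, y1) in [0, 1], and "in the center" is taken to mean the box center lies within a tolerance of the image center.

```python
# Hypothetical photo metadata: object name -> normalized bounding box (x0, y0, x1, y1).
PHOTOS = {
    "p1": {"White House": (0.35, 0.30, 0.65, 0.70)},
    "p2": {"White House": (0.00, 0.10, 0.30, 0.50)},
    "p3": {"Capitol": (0.40, 0.40, 0.60, 0.60)},
}


def centered(box, tolerance=0.1):
    """True if the box's center lies within `tolerance` of the image center (0.5, 0.5)."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    return abs(cx - 0.5) <= tolerance and abs(cy - 0.5) <= tolerance


def find_photos(obj, photos, tolerance=0.1):
    """Photos that contain `obj` with the object in the center."""
    return [pid for pid, objects in photos.items()
            if obj in objects and centered(objects[obj], tolerance)]


print(find_photos("White House", PHOTOS))  # ['p1']
```

A content-only query for "White House" would return both p1 and p2; the location constraint keeps only p1.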

Project Discussion

• Idea 1: Install and implement an IR system
    • Focus on system and technology
    • Need to have a collection
    • Need to have hands-on experience with systems
• Idea 2: Conduct an evaluation experiment on one or two selected IR systems
    • Focus on interfaces and users
• Idea 3: Customize an IR system
    • Focus on functionality and customization

Project Evaluation

Topics
• Relevance
• Problems identified
• Technical difficulties
• Solutions/ideas

The process
• Design
• Implementation

The report
• Background
• Written
• Oral

Midterm

Concepts
• What is information retrieval?
• Data, information, text, and documents
• Two abstraction principles
• User’s information needs
• Queries and query formats
• Precision and recall
• Relevance
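Precision and recall reduce to two ratios over the retrieved and relevant sets: precision = |retrieved ∩ relevant| / |retrieved| and recall = |retrieved ∩ relevant| / |relevant|. In this small worked sketch the document ids are hypothetical, and returning 0.0 for an empty set is an assumed convention.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were actually retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# 10 documents retrieved, 4 of them relevant; 8 relevant documents exist in total.
p, r = precision_recall(range(1, 11), [1, 3, 5, 7, 20, 21, 22, 23])
print(p, r)  # 0.4 0.5
```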

Midterm

Procedures & problem solving
• How to translate a request into a query?
• How to expand queries for better recall or better precision?
• How to create an inverted index?
• How to create a vector space?
• How to calculate similarities of documents?
• How to match a query to documents in a vector space?
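The last four procedures fit together in one pipeline: build an inverted index, derive weighted document vectors from it, and rank documents by cosine similarity to the query vector. Assumptions: the documents are hypothetical, tokenization is lowercase whitespace splitting, and the weight is tf × log(N/df), one common tf-idf variant among several.

```python
import math
from collections import Counter, defaultdict


def build_inverted_index(docs):
    """term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def tfidf_vectors(index, n_docs):
    """Document vectors with weight tf * log(N / df)."""
    vectors = defaultdict(dict)
    for term, postings in index.items():
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            vectors[doc_id][term] = tf * idf
    return vectors


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(query, docs):
    """Match a query to documents in the vector space, best first."""
    index = build_inverted_index(docs)
    n = len(docs)
    vectors = tfidf_vectors(index, n)
    q_tf = Counter(query.lower().split())
    # Query terms absent from the collection are ignored.
    q_vec = {t: tf * math.log(n / len(index[t])) for t, tf in q_tf.items() if t in index}
    scores = {d: cosine(q_vec, vectors.get(d, {})) for d in docs}
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical documents.
docs = {
    "d1": "query expansion adds terms to the query",
    "d2": "inverted index maps terms to documents",
    "d3": "vector space model ranks documents",
}
for doc_id, s in rank("query terms", docs):
    print(doc_id, round(s, 3))
```

"query" occurs in only one document, so it gets a higher idf than "terms", and d1, which contains "query" twice, ranks first.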

Discussions
• Challenges of IR
• Advantages and disadvantages of Boolean search (vector space, automatic indexing, association-based queries, etc.)
• Evaluation of IR systems
    • With or without using precision/recall
• Difference between data retrieval and information retrieval