INFO624 - Week 4
Query Languages and Query Operations
Dr. Xia Lin, Associate Professor
College of Information Science and Technology
Drexel University
Query
A query is a representation of the user’s information needs.
It may not represent the information needs exactly because:
• Information needs are difficult to describe (semantic difficulty).
• The query must be in a format acceptable to the retrieval system (syntactic difficulty).
Content-based queries
Words
Phrases
Proximity
Pattern matching
  Word matching
  Prefix/suffix
  Wildcard search
  Error handling
  Extended patterns
Boolean
Vector
Natural Language
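The pattern-matching and proximity query types listed above can be illustrated with a minimal Python sketch. It assumes a small invented vocabulary (VOCAB) and whitespace tokenization, and all function names are hypothetical. Wildcards use the standard library's fnmatch module. "Error handling" is approximated here with Levenshtein edit distance, which is one common choice among several.

```python
import fnmatch

# Hypothetical term list standing in for an index vocabulary.
VOCAB = ["retrieval", "retrieve", "retriever", "query", "queries", "indexing", "reindexing"]


def prefix_match(prefix, vocab=VOCAB):
    """Terms beginning with prefix (e.g. 'retriev' for retriev*)."""
    return [t for t in vocab if t.startswith(prefix)]


def suffix_match(suffix, vocab=VOCAB):
    """Terms ending with suffix (e.g. 'indexing' for *indexing)."""
    return [t for t in vocab if t.endswith(suffix)]


def wildcard_match(pattern, vocab=VOCAB):
    """Shell-style wildcards: '*' matches any run of characters, '?' exactly one."""
    return [t for t in vocab if fnmatch.fnmatchcase(t, pattern)]


def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def fuzzy_match(word, max_dist=1, vocab=VOCAB):
    """Error handling: accept terms within max_dist edits of the query word."""
    return [t for t in vocab if edit_distance(word, t) <= max_dist]


def positions(tokens, word):
    """Word positions in a tokenized document (a tiny positional index)."""
    return [i for i, t in enumerate(tokens) if t == word]


def within(tokens, w1, w2, k):
    """Proximity: do w1 and w2 occur within k words of each other?"""
    return any(abs(i - j) <= k for i in positions(tokens, w1) for j in positions(tokens, w2))
```

For example, in "information storage and retrieval systems", "information" and "retrieval" are three words apart. A proximity window of 3 therefore matches, and a window of 2 does not.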
Boolean Queries
Request: What are the likely problems when someone gets hurt on his knees when playing basketball?
Write your best Boolean query for this request.
If the query returns zero hits, how do you modify the query?
If the query returns too many hits, how do you modify the query?
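Below is one possible answer to the basketball exercise, run against a toy inverted index in Python. The four documents, the index layout, and the helper names (term, and_, or_, not_) are invented for illustration. A real system would also need stemming, for example to match knee/knees and injury/injuries.

```python
# Invented toy collection used only to illustrate Boolean operators.
DOCS = {
    1: "knee injury common in basketball players",
    2: "basketball rules and scoring",
    3: "knee pain after running",
    4: "treating a hurt knee after basketball",
}


def build_index(docs):
    """Map each lower-cased word to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


INDEX = build_index(DOCS)
ALL_IDS = set(DOCS)


def term(word):
    return INDEX.get(word, set())


def and_(*sets):
    return set.intersection(*sets)


def or_(*sets):
    return set.union(*sets)


def not_(s):
    return ALL_IDS - s


# knee AND basketball AND (injury OR hurt)
best = and_(term("knee"), term("basketball"), or_(term("injury"), term("hurt")))
# Zero hits: one extra required term makes the query too strict ...
too_strict = and_(term("knee"), term("basketball"), term("surgery"))
# ... so drop it, or OR in alternatives to broaden the query.
broadened = and_(term("knee"), or_(term("basketball"), term("running")))
# Too many hits: AND in another term, or NOT out an unwanted one.
narrowed = and_(term("knee"), not_(term("running")))
```

In general, OR (and dropping AND terms) raises recall, while AND and NOT raise precision.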
How does AskJeeves translate the request “What are the likely problems when someone gets hurt on his knees when playing basketball?”
Construct your best Boolean query for this request: “I am doing research on personal space boundaries. I want to know if there are any sex or race differences in personal space boundaries.”
Interaction with Queries
• The user starts with a SEED query.
• The system responds with a list of related terms.
• The user adds selected terms from the list to the query.
• The system updates the list of related terms.
• Repeat as needed.
Example: MedLine Search Assistant
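The seed-query loop can be sketched as follows. The sketch assumes that related terms are ranked by how often they share a document with a query term. This is not a description of how MedLine Search Assistant works internally, because the source does not say. The choose callback stands in for the user's selection, and all names and data are hypothetical.

```python
from collections import Counter

# Invented mini-collection; each string is one document.
DOCS = [
    "knee injury basketball",
    "knee injury ligament",
    "knee ligament surgery",
    "basketball shoes",
]


def related_terms(query, docs, k=3):
    """Suggest the k terms that most often share a document with any query term."""
    counts = Counter()
    for doc in docs:
        words = set(doc.lower().split())
        if words & query:              # the document mentions a query term
            counts.update(words - query)
    return [t for t, _ in counts.most_common(k)]


def refine(seed, docs, choose, rounds=1):
    """Seed query -> suggestions -> user picks terms -> updated suggestions, repeated."""
    query = set(seed)
    for _ in range(rounds):
        query |= set(choose(related_terms(query, docs)))
    return query
```

For the seed {"knee"}, "injury" and "ligament" tie as the strongest suggestions. If the user picks "ligament", the next round's suggestions are recomputed from the enlarged query.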
Association-based Queries
Find documents similar to this document.
Find documents that link to this document:
• Explicitly
• Implicitly
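A “find documents similar to this document” query is often implemented by treating the chosen document itself as the query. The minimal sketch below assumes raw term-frequency vectors and cosine similarity. Cosine is one common measure, not necessarily the one any particular system uses. The documents and function names are invented.

```python
import math
from collections import Counter


def tf_vector(text):
    """Raw term-frequency vector for a document."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def more_like_this(target_id, docs, k=2):
    """Rank the other documents by cosine similarity to docs[target_id]."""
    target = tf_vector(docs[target_id])
    scored = [(cosine(target, tf_vector(text)), doc_id)
              for doc_id, text in docs.items() if doc_id != target_id]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]
```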
Field-based Queries
Field-based queries will likely improve search precision.
Field-based queries require that the data source has a fixed structure and is indexed by that structure.
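The sketch below shows why restricting a search to one field helps precision. It indexes each word under its field name and assumes records with a fixed set of fields (title, author). The records and function names are hypothetical.

```python
# Invented structured records; each has the same fixed fields.
RECORDS = [
    {"title": "Smith chart applications", "author": "Jones"},
    {"title": "Boolean query formulation", "author": "Smith"},
    {"title": "Vector space models", "author": "Smith"},
]
FIELDS = ("title", "author")


def build_field_index(records, fields=FIELDS):
    """Index every word under (field, word) so searches can be field-restricted."""
    index = {}
    for rec_id, rec in enumerate(records):
        for field in fields:
            for word in rec[field].lower().split():
                index.setdefault((field, word), set()).add(rec_id)
    return index


INDEX = build_field_index(RECORDS)


def search(word, field=None):
    """Search one field, or every field when field is None."""
    fields = FIELDS if field is None else (field,)
    return set().union(*(INDEX.get((f, word.lower()), set()) for f in fields))
```

An unrestricted search for "Smith" also returns the record about Smith charts. Restricting the search to the author field returns only works by an author named Smith.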
Citation-based Queries
Retrieve all documents that document A cites.
Find all documents that cite document A.
Find all documents that cite this author.
Find all documents that cite both document A and document B.
Find documents that cite both author A and author B.
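The citation queries above reduce to set operations on a citation graph. The minimal sketch below uses invented data. CITES maps each citing document to the documents it references, and AUTHOR maps each document to a single author, which is a simplifying assumption.

```python
# Invented citation data: citing document -> set of documents it cites.
CITES = {
    "A": {"X", "Y"},
    "C": {"A", "B"},
    "D": {"A"},
    "E": {"A", "B"},
}
# Invented authorship: document -> author (one author per document, for simplicity).
AUTHOR = {"A": "Author1", "X": "Author1", "B": "Author2", "Y": "Author3"}


def references_of(doc):
    """Documents that doc cites."""
    return CITES.get(doc, set())


def cited_by(doc):
    """Documents that cite doc."""
    return {d for d, refs in CITES.items() if doc in refs}


def citing_author(author):
    """Documents that cite any work by author."""
    works = {d for d, a in AUTHOR.items() if a == author}
    return {d for d, refs in CITES.items() if refs & works}


def citing_both(a, b):
    return cited_by(a) & cited_by(b)


def citing_both_authors(a1, a2):
    return citing_author(a1) & citing_author(a2)
```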
Co-Citation
The college has a tradition of more than 20 years in co-citation research.
Co-citation is the mentioning of any two earlier documents in the bibliographic references of a later, third document.
[Figure: a later Document 3 cites both Document 1 and Document 2. Are Documents 1 and 2 related?]
Co-Citation Analysis
The count of mentions may grow over time as new writings appear. Thus, co-citation counts can reflect citers’ changing perceptions of documents as more or less strongly related.
Documents shown to be related by their co-citation counts can be mapped as proximate in intellectual space.
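Co-citation counts can be computed directly from reference lists. In this sketch, each inner list is the bibliography of one later (citing) document, and the document ids are invented. Adding a new citing document shows how a count grows over time.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often each pair of earlier documents is cited together.

    reference_lists: one list of cited document ids per citing (later) document.
    """
    counts = Counter()
    for refs in reference_lists:
        # Each unordered pair within one reference list is one co-citation.
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts
```

With reference lists [D1, D2], [D1, D2, D3], and [D2, D3], the pair (D1, D3) is co-cited once. If a new document citing D3 and D1 appears, that count rises to 2.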
Co-Citation Mapping
Detects patterns in the frequency with which any works by any two authors are jointly cited in later works.
Only recurrent co-citation is significant: the more times authors are cited together, the more strongly related they are in the eyes of citers.
A Map of Information Scientists
AuthorLinks
Link-Based Queries
Hypertext Structure
Is a link a query?
http://www.google.com/search?hl=en&q=information+retrieval
This is called a query-mediated link. It is also called a “soft link.”
Is a query a link?
Many pages are dynamically generated from a database or a search engine.
• Your review pages
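A query-mediated link is just a URL that carries a query. Python's urllib.parse can therefore recover the query from the link and rebuild the link from the query. Based only on the example URL, the sketch assumes the query travels in the q parameter; other engines may name it differently.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def query_from_link(url, param="q"):
    """Recover the query text carried by a query-mediated ("soft") link."""
    values = parse_qs(urlparse(url).query).get(param)
    return values[0] if values else ""


def link_from_query(query, base="http://www.google.com/search", lang="en"):
    """Turn a query back into a link of the same shape as the example URL."""
    return base + "?" + urlencode({"hl": lang, "q": query})
```

Applied to the example link, query_from_link returns "information retrieval". Passing that string to link_from_query reproduces the original URL.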
Queries, Links, Is there a difference? (SIGCHI’97)
An experiment was conducted to compare browsing behavior in query- and link-based interfaces. Results suggest that query-mediated links are as effective as explicit queries, and that strategies adopted by users affect performance. This work has implications for the design of information exploration interfaces.
Query Structure
Hierarchical Structure
What does the user want when searching for “substance abuse”?
We may not know, but adding narrower terms of “substance abuse” will likely get better results:
• Alcohol Abuse
• Drug Abuse
• Alcohol-Related Disorders
• Amphetamine-Related Disorders
• Cocaine-Related Disorders
• Marijuana Abuse
Automatic Expansion
If there is a defined hierarchy, several search strategies may be defined to expand the query:
• Search with the query term only.
• Search with the query term and all the terms in its upper hierarchy.
• Search with the query term and all the terms in its lower hierarchy.
• Search with the query term and all its sibling terms.
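The four expansion strategies can be sketched over a small hypothetical hierarchy stored as a child-to-parent map. The fruit names echo the lecture's Fruit topic tree. "Food", "vegetable", and "carrot" are invented, and the strategy labels are illustrative.

```python
# Hypothetical hierarchy: child term -> parent term.
PARENT = {
    "fruit": "food", "vegetable": "food",
    "apple": "fruit", "orange": "fruit", "banana": "fruit",
    "carrot": "vegetable",
}


def children(term):
    return [c for c, p in PARENT.items() if p == term]


def upper(term):
    """All broader terms, nearest first."""
    out = []
    while term in PARENT:
        term = PARENT[term]
        out.append(term)
    return out


def lower(term):
    """All narrower terms, breadth-first."""
    out, queue = [], children(term)
    while queue:
        t = queue.pop(0)
        out.append(t)
        queue.extend(children(t))
    return out


def siblings(term):
    parent = PARENT.get(term)
    return [c for c in children(parent) if c != term] if parent else []


STRATEGIES = {
    "term only": lambda t: [],
    "upper": upper,
    "lower": lower,
    "siblings": siblings,
}


def expand(term, strategy):
    return [term] + STRATEGIES[strategy](term)
```

Expanding downward ("lower") mirrors the substance-abuse example. Expanding upward or to siblings trades precision for recall.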
Query Operations
• Query execution
• Query expansion
• Query translation
Query Expansion
Improve the initial query automatically by:
• restructuring the query,
• adding other new terms, or
• adjusting the weight of each term.
Restructuring the query:
• Identify key concepts through natural language processing.
• Identify any field information that may be contained in the query: Is this an author? Is this a journal?
• Reverse term orders in the query.
Adding new terms:
• Synonyms
• Hierarchical terms
• Scope terms: Does the query “Football” retrieve information on football or on soccer?
• Relevant terms: selected terms from relevant documents, or the terms that co-occur most often with the query terms
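Two of these sources of new terms, a thesaurus of synonyms and terms selected from relevant documents, are sketched below. The thesaurus, the stopword list, and the sample documents are hypothetical. A production system would also weigh how informative each candidate term is.

```python
from collections import Counter

# Hypothetical thesaurus and stopword list.
SYNONYMS = {"hurt": ["injury", "injured"], "knee": ["knees"]}
STOPWORDS = {"a", "and", "in", "of", "on", "the"}


def add_synonyms(query_terms, thesaurus=SYNONYMS):
    """Append thesaurus synonyms of each query term, keeping order, no duplicates."""
    expanded = list(query_terms)
    for t in query_terms:
        for s in thesaurus.get(t, []):
            if s not in expanded:
                expanded.append(s)
    return expanded


def terms_from_relevant(query_terms, relevant_docs, k=2):
    """Pick the k most frequent non-stopword, non-query terms in relevant documents."""
    counts = Counter()
    for doc in relevant_docs:
        counts.update(w for w in doc.lower().split()
                      if w not in STOPWORDS and w not in query_terms)
    return [t for t, _ in counts.most_common(k)]
```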
Adjusting term weighting:
• If relevant documents are known, increase the weights for terms assigned to the relevant documents and decrease the weights for terms assigned to non-relevant documents.
• Adjust term weights in a topic tree, e.g., for the topic Fruit:
  Fruit, 0.9; apple, 0.7; orange, 0.7; banana, 0.6; …; Macintosh, 0.1; Computer, -0.4.
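Both kinds of weight adjustment can be sketched in Python. For known relevant and non-relevant documents, the sketch uses the Rocchio formula, a standard relevance-feedback method swapped in here as one concrete way to raise and lower weights. The alpha, beta, and gamma values are common textbook defaults, not values from this lecture. The topic-tree scoring reuses the Fruit weights listed above (the elided terms are left out). It assumes a document's score is the sum of the weights of the query terms it contains.

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio-style reweighting of a term -> weight query vector.

    new weight = alpha*query + beta*mean(relevant) - gamma*mean(nonrelevant)
    """
    terms = set(query)
    for doc in relevant + nonrelevant:
        terms |= set(doc)
    new_query = {}
    for t in terms:
        w = alpha * query.get(t, 0.0)
        if relevant:
            w += beta * sum(d.get(t, 0.0) for d in relevant) / len(relevant)
        if nonrelevant:
            w -= gamma * sum(d.get(t, 0.0) for d in nonrelevant) / len(nonrelevant)
        new_query[t] = w  # may be negative; many systems clip these to 0
    return new_query


# Topic-tree weights from the Fruit example; the elided "…" terms are omitted.
FRUIT = {"fruit": 0.9, "apple": 0.7, "orange": 0.7, "banana": 0.6,
         "macintosh": 0.1, "computer": -0.4}


def score(doc, weights=FRUIT):
    """Sum the weights of query terms present in the document."""
    words = set(doc.lower().split())
    return sum(w for t, w in weights.items() if t in words)
```

Under these weights, "Macintosh apple orchard harvest" scores about 0.8. "Apple Macintosh computer review" scores about 0.4, because the negative weight on "computer" pushes the computing sense down the ranking.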
Query Translation
• From natural language to queries (AskJeeves)
• From queries in one system to queries in another system
• From one natural language to another natural language (AltaVista)
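Two of these translations can be sketched with a toy pipeline. First, natural language becomes a keyword Boolean query by dropping stopwords. Second, the Boolean query is rewritten into a hypothetical "+term" syntax of a second system. The stopword list is tuned for the basketball request from the Boolean exercise and is an assumption. The sketch does not describe how AskJeeves or AltaVista actually worked.

```python
import re

# Hypothetical stopword list, tuned for question-style requests.
STOPWORDS = {"what", "are", "the", "likely", "problems", "when", "someone",
             "gets", "on", "his", "playing", "a", "is", "of"}


def to_boolean(request):
    """Natural language -> keyword Boolean query (drop stopwords, AND the rest)."""
    keywords = []
    for word in re.findall(r"[a-z]+", request.lower()):
        if word not in STOPWORDS and word not in keywords:
            keywords.append(word)
    return " AND ".join(keywords)


def to_plus_syntax(boolean_query):
    """One system's syntax -> another's: 'a AND b' becomes '+a +b'."""
    return " ".join("+" + t for t in boolean_query.split(" AND "))
```

The basketball request becomes "hurt AND knees AND basketball", which the second step rewrites as "+hurt +knees +basketball". The result still misses "injury" and "knee", which is why translation is usually paired with expansion.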
Other types of representation for the user’s needs?
Mind-reading? Non-text queries? Gesture/motion?
IBM – Visualization Space
• This information system understands the user.
• It "hears" users' voice commands and "sees" their gestures and body positions. Interactions are natural, more like human-to-human interactions.
Multimedia Queries
• Content-based
  Text indexing
• Attribute-based
  Color, size, type, time period, …
• Structure-based
  Location, shape, layout, etc.
• Cluster-based
  Semantic groups, physical groups, structure groups
Example: find a photo that has the White House in the center.
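The White House example combines content (which object appears) with structure (where it appears). The sketch assumes each photo already carries hypothetical object annotations with bounding boxes. It treats "in the center" as the box center lying within 20% of the image center. Both the annotation format and the tolerance are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Photo:
    name: str
    width: int
    height: int
    # Hypothetical annotations: label -> bounding box (x_min, y_min, x_max, y_max) in pixels.
    objects: dict = field(default_factory=dict)


def is_centered(photo, label, tolerance=0.2):
    """True if the labeled object's box center lies within tolerance of the image center."""
    box = photo.objects.get(label)
    if box is None:
        return False
    cx = (box[0] + box[2]) / 2 / photo.width
    cy = (box[1] + box[3]) / 2 / photo.height
    return abs(cx - 0.5) <= tolerance and abs(cy - 0.5) <= tolerance


def find_centered(photos, label):
    return [p.name for p in photos if is_centered(p, label)]
```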
Project Discussion
Idea 1: Install and implement an IR system
• Focus on system and technology
• Need to have a collection
• Need to have hands-on experience with systems
Idea 2: Conduct an evaluation experiment on one or two selected IR systems
• Focus on interfaces and users
Idea 3: Customize an IR system
• Focus on functionality and customization
Project Evaluation
Topics
• Relevance
• Problems identified
• Technical difficulties
• Solutions/ideas
The process
• Design
• Implementation
The report
• Background
• Written
• Oral
Midterm
Concepts
• What is information retrieval?
• Data, information, text, and documents
• Two abstraction principles
• User’s information needs
• Queries and query formats
• Precision and recall
• Relevance
Midterm
Procedures & problem solving
• How to translate a request into a query?
• How to expand queries for better recall or better precision?
• How to create an inverted index?
• How to create a vector space?
• How to calculate similarities of documents?
• How to match a query to documents in a vector space?
Discussions
• Challenges of IR
• Advantages and disadvantages of Boolean search (and of vector space, automatic indexing, association-based queries, etc.)
• Evaluation of IR systems, with or without using precision/recall
• Difference between data retrieval and information retrieval
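Several midterm procedures (creating an inverted index, creating document vectors, and matching a query in a vector space) fit in one short sketch. It assumes whitespace tokenization, tf × log(N/df) weights, and a weight of 1 for each query term. Dividing by the query's own length is skipped because it does not change the ranking. The documents and names are invented.

```python
import math
from collections import Counter, defaultdict


def build_inverted_index(docs):
    """term -> {doc_id: term frequency}"""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def tfidf_vectors(index, n_docs):
    """doc_id -> {term: tf * idf}, with idf = log(N / df)."""
    vectors = defaultdict(dict)
    for term, postings in index.items():
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            vectors[doc_id][term] = tf * idf
    return vectors


def rank(query, index, vectors):
    """Cosine-style ranking that only touches documents on the query terms' postings."""
    scores = Counter()
    for term in set(query.lower().split()):     # each query term weighted 1
        for doc_id in index.get(term, {}):
            scores[doc_id] += vectors[doc_id][term]
    for doc_id in scores:                         # normalize by document vector length
        norm = math.sqrt(sum(w * w for w in vectors[doc_id].values()))
        scores[doc_id] = scores[doc_id] / norm if norm else 0.0
    return [doc_id for doc_id, s in scores.most_common() if s > 0]
```

The inverted index lets ranking skip documents that share no terms with the query. This is the practical link between the inverted-index and vector-space questions above.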