LIS618 lecture 4
Thomas Krichel
2002-09-06
Structure of talk
• The blue sheet
• Working with Dialog
• Nexis.com
the rank command
• counts the occurrences of unique terms within a specified field and lists them in ranked order, with the most highly posted term appearing first
• example
– rank au, to find who has written the most papers
• does not work on word-indexed fields
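The counting behind rank can be sketched in a few lines of Python. This is only an illustration of the idea, not Dialog's implementation: the records, the author names and the `rank` function are hypothetical, and the "au" field is assumed to hold whole author names rather than single words.

```python
from collections import Counter

# Hypothetical result set; each record lists its authors in an "au" field.
records = [
    {"au": ["Smith, J.", "Lee, K."]},
    {"au": ["Smith, J."]},
    {"au": ["Lee, K.", "Smith, J.", "Wong, A."]},
]


def rank(records, field):
    """Count each unique value of `field` and list values by descending count."""
    counts = Counter(
        value for record in records for value in record.get(field, [])
    )
    return counts.most_common()


ranked = rank(records, "au")
for term, postings in ranked:
    print(postings, term)
```

The most highly posted author comes first, which is what "rank au" is used for on the slide.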
View command
Sort command
type command
type set/format/range [from base]
• set is a result set
• format is a format
• range can be
– start – end
  • start is a record number to start
  • end is a record number to end
– all
• base is a database file number or "each"
formats are defined
• 2 – full record except abstract
• 3 or medium – citation
• 5 or long – full except full text
• 6 or free – title and Dialog number
• 8 or short – title plus indexing terms
– useful to find other indexing terms
• 9 or full – everything
• KWIC or K – keywords in context
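To make the set/format/range pattern concrete, here is a small Python helper that assembles a type command string from its parts. It is a sketch only: `type_command` and `FORMATS` are hypothetical names, the helper does not talk to Dialog, and writing the range as "start-end" with a plain hyphen is an assumption about how the range is typed.

```python
# Format names and numbers as listed on the slide.
FORMATS = {
    "2", "3", "5", "6", "8", "9",
    "medium", "long", "free", "short", "full", "kwic", "k",
}


def type_command(result_set, fmt, start=None, end=None, base=None):
    """Build a 'type set/format/range [from base]' command string."""
    if str(fmt).lower() not in FORMATS:
        raise ValueError(f"unknown format: {fmt}")
    if start is None and end is None:
        record_range = "all"
    elif start is not None and end is not None:
        record_range = f"{start}-{end}"  # assumed spelling of the range
    else:
        raise ValueError("give both start and end, or neither")
    command = f"type {result_set}/{fmt}/{record_range}"
    if base is not None:
        command += f" from {base}"
    return command


print(type_command("s1", 8))
print(type_command("s1", "long", start=1, end=5, base="each"))
```

The first call reproduces the short form "t s1/8/all" in its spelled-out version, type s1/8/all.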
Using controlled vocabularies
• not all databases have it
• use of heterogeneous schemes across different databases
• A strategy to use them is
– s word
– s s1/ti,de
– t s1/8/all
– s term/cc
special name finders
• Dialog company name, file 416
• Dialog journal name, file 414
• Dialog product name, file 413
searching dialog index
• use broad term on file 415
• expand
– db= database type
– di= index category
– dt= document types indexed
– gc= geographic coverage
Finally: verdict on Dialog
• speed abominable
• interface clunky
• javascript stinks
• integration of sources leaves much to be desired
• Does not reach the pass mark
• Will be dead in five years' time.
Nexis
primarily a news service
• adds an important temporal component to all its contents
• restricts contents as compared to Dialog
• Six interfaces to be covered
– search
– personal news
– subject directory
– real time news
– power search
– search forms
• potentially bad competition from Google
limiting by dates
• does not use ISO format :--(
• any of the following are legal
– 07/24/2000
– 7/24/00
– July 24, 2000
– Jul 24, 2000
– July 2000
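Since none of these is ISO format (YYYY-MM-DD), it can help to normalize them before comparing or storing dates. The sketch below uses Python's standard `datetime.strptime` to try each of the five spellings in turn. `PATTERNS` and `to_iso` are hypothetical names, and the month names assume an English (default "C") locale.

```python
from datetime import datetime

# (input pattern, output pattern) pairs, one per spelling on the slide.
PATTERNS = [
    ("%m/%d/%Y", "%Y-%m-%d"),   # 07/24/2000
    ("%m/%d/%y", "%Y-%m-%d"),   # 7/24/00
    ("%B %d, %Y", "%Y-%m-%d"),  # July 24, 2000
    ("%b %d, %Y", "%Y-%m-%d"),  # Jul 24, 2000
    ("%B %Y", "%Y-%m"),         # July 2000 (no day, so keep month precision)
]


def to_iso(text):
    """Convert one of the accepted date spellings to an ISO-style string."""
    for in_pattern, out_pattern in PATTERNS:
        try:
            parsed = datetime.strptime(text.strip(), in_pattern)
        except ValueError:
            continue  # this spelling did not match, try the next one
        return parsed.strftime(out_pattern)
    raise ValueError(f"unrecognized date: {text!r}")


for example in ["07/24/2000", "7/24/00", "July 24, 2000", "Jul 24, 2000", "July 2000"]:
    print(example, "->", to_iso(example))
```

Two-digit years are interpreted by Python's own pivot rule, which may differ from what Nexis does with them.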
Nexis search
• no phrase terms
• implicit Boolean "or" between terms
• in fact searches
– keywords extracted
– HLEAD for news
– TITLE for legal documents
– WEB-SEARCH-TEXT for web pages
with the web comes all sorts
• TITLE: FarmSex FarmLove bestiality beasitialty animal sex animalsex dog fuckers horsecocks zebras chimps monkeys male fuck dogsex
• URL: http://www.farmsex.com
• DOMAIN-TYPE: Commercial
• WEB-LEAD-TEXT: WARNING! Farm Sex The following pages contain sexually oriented adult material intended for individuals 21 years of age or older.
• WEB-PAGE-STATUS: OK (200)
• LAST-UPDATE: September 7, 2002
• FIRST-REFERENCE: Jul 22 1999 12:00AM
• LATEST-REFERENCE: June 01, 2002
• SUBJECT: PORNOGRAPHY & OBSCENITY (78%);
TERMS segment
• a list of indexing terms has been compiled by Nexis staff
• a computer scans each document for these terms or variants of them, and replaces each match with the identified term
• the resulting collection of terms forms the index terms
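The scan-and-replace step can be pictured as a lookup from variants to the staff-chosen term. The Python sketch below illustrates that idea only and is not how Nexis does it: the `VARIANTS` table, its entries and the `index_terms` function are invented for the example, and matching is done on single lowercase words.

```python
import re

# Hypothetical variant table: each variant maps to one controlled index term.
VARIANTS = {
    "merger": "MERGERS & ACQUISITIONS",
    "mergers": "MERGERS & ACQUISITIONS",
    "acquisition": "MERGERS & ACQUISITIONS",
    "layoff": "LAYOFFS",
    "layoffs": "LAYOFFS",
}


def index_terms(text):
    """Return the controlled terms whose variants occur in `text`, in order of first appearance."""
    found = []
    for word in re.findall(r"[a-z]+", text.lower()):
        term = VARIANTS.get(word)
        if term is not None and term not in found:
            found.append(term)
    return found


print(index_terms("The merger led to layoffs; more mergers are expected."))
```

Both "merger" and "mergers" collapse to the same index term, so a search on that term finds documents regardless of which variant they used.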
relevance ranking
• where terms appear within the document
• how many occurrences of the terms appear in the document
• how often those search terms appear throughout the document
• apparently not how much they occur; example: search for "the"
• seems that they guard the algorithm as a secret
Subject directory
• you can follow the subject tree but
• there seems to be only a tiny number of documents
• categories are not particularly deep or developed
• there is a "more like this" feature of limited use, Thomas finds
Power search
• source selection, editing is possible
• use of connectors is possible here
– OR
– AND
– AND NOT
– PRE/n, n is a number, ordered proximity
– W/n, n is a number, unordered proximity
– W/S words in the same sentence
– W/P words in the same paragraph
• no use of double quotes for paragraphs
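The difference between W/n and PRE/n is easiest to see on word positions. The following Python sketch is a toy model, not the Nexis matcher: `positions` and `within` are hypothetical names, distance is assumed to be the difference between word positions, and tokenization is a simple split on word characters.

```python
import re


def positions(tokens, word):
    """Return the indexes at which `word` occurs in `tokens`."""
    return [i for i, token in enumerate(tokens) if token == word]


def within(text, first, second, n, ordered=False):
    """Toy W/n (ordered=False) or PRE/n (ordered=True) test on word positions."""
    tokens = re.findall(r"\w+", text.lower())
    for i in positions(tokens, first.lower()):
        for j in positions(tokens, second.lower()):
            gap = j - i
            if ordered and 0 < gap <= n:
                return True  # `first` precedes `second` by at most n words
            if not ordered and 0 < abs(gap) <= n:
                return True  # either order, at most n words apart
    return False


text = "The merger of the two banks was announced"
print(within(text, "banks", "merger", 4))                # like: banks W/4 merger
print(within(text, "banks", "merger", 4, ordered=True))  # like: banks PRE/4 merger
```

W/4 matches here because order does not matter, while PRE/4 fails because "banks" comes after "merger" in the text.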
Power search expressions
• Parentheses group terms together
• * for one or no letter
• ! for any number of letters
• ATLEAST n, where n is a minimum number of occurrences
• PLURAL (term) only the plural of term
• SINGULAR (term) only the singular of term
• ALLCAPS (term) only capitals of term
• NOCAPS (term) no capitals of term
• CAPS (term) capitalized term only
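The two wildcards as described above can be mimicked with regular expressions. This Python sketch translates "*" to "one or no letter" and "!" to "any number of letters"; `wildcard_to_regex` and `matches` are hypothetical names, only the letters a to z are treated as letters, and matching is case-insensitive by assumption.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a power-search style wildcard term into a compiled regex."""
    parts = []
    for char in pattern:
        if char == "*":
            parts.append("[a-z]?")   # one or no letter
        elif char == "!":
            parts.append("[a-z]*")   # any number of letters
        else:
            parts.append(re.escape(char))
    return re.compile("".join(parts), re.IGNORECASE)


def matches(pattern, word):
    """True if the whole `word` fits the wildcard `pattern`."""
    return wildcard_to_regex(pattern).fullmatch(word) is not None


for pattern, word in [("wom*n", "woman"), ("wom*n", "women"),
                      ("librar!", "libraries"), ("librar!", "book")]:
    print(pattern, word, matches(pattern, word))
```

Note that "*" here differs from the usual shell or web-search meaning, where it stands for any number of characters.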
power search for news
• uses power search expressions, plus
• hlead (expression)
• company (expression) for a company
• byline (expression) for the author
• show (expression) for a television show transcript
expression is a Boolean expression
power search for legal data
• uses power search expressions, plus
• name (expression) for the name of a party
• cite (expression) for a citation expression for case law
• title (expression) for the title of a law article
expression is a Boolean expression
other searches
• our evidence suggests that this is a delicate topic
• news alert
– use this for personal use
– do a search, then click on update to get to a screen where you can enter
  • periodicity
  • document type
a different query language
• terms are implicitly ANDed
• explicit AND and OR allowed
• phrases have to be put in quotes
• * stands for any number of characters, not just one as in power search
• parentheses can be used