Indexing

  • date post

    19-Sep-2014
  • Category

    Technology


Transcript of Indexing

Page 1: Indexing

INDEXING

Page 2: Indexing

Definition of Terms

• indexing - the process of providing in-depth access to information contained within a document or knowledge record.

• index - a guide to the contents of a document or collection of documents with the same format arranged in a searchable order such as alphabetical, classified, chronological or numerical.

• index entry – single record in an index that may consist of four parts: main heading, subheading, locator and/or cross reference/s.

• descriptor – a term designed for use by the thesaurus to represent the aboutness of a topic in a document.

• document – any item that contains information, either in print or non-print format, including digital forms.

• identifier - proper name of person, object, institution/organization, process, etc.

Page 3: Indexing

• indexing language - any vocabulary, controlled or uncontrolled, used for indexing along with the rules of usage.

• indexing system – a set of prescribed procedures (manual or machine-operated) intended for organizing the contents of a document or knowledge records for purposes of retrieval and dissemination.

• keyword - a raw word taken from a document and regarded as an indexable term.

• qualifier - a term or phrase added to a heading to distinguish among homographs or clarify meaning.

• translation – the process of converting concepts derived from the document into a particular set of index terms usually derived from a controlled vocabulary.

• vocabulary control - the process of organizing a list of terms for use in indexing, along with the rules of usage.

Page 4: Indexing

Development of Indexes and Indexing

• First systematic organization of written records occurred in Sumer around 3,000 B.C.

• Around 2,000 B.C. in China and India, record keeping became part of the society.

• Early civilizations proposed schemes of knowledge classification and document arrangement (e.g. Greeks used some sort of alphabetic order).

• In 900 A.D., an encyclopedia was arranged in alphabetical order.

• During the 15th century, books were published with blank pages and quite wide margins.

Page 5: Indexing

• The 17th century brought a new type of information tool, the periodical.

• In the 1850s, W.F. Poole published an index that covered numerous issues of many periodicals.

• Also during the 19th century, Paul Otlet and Henri La Fontaine founded the International Institute of Bibliography to improve indexing approaches to scholarly literature. This led to modern keyword and free-text indexing.

• In 1900, H.W. Wilson first published Reader’s Guide to Periodical Literature.

• By the 1950s, computers penetrated the indexing arena and efforts to evaluate indexing began.

Page 6: Indexing

Role of Indexing in Information Retrieval

[Figure: Relationship of Indexing, Abstracting and Searching (Cleveland and Cleveland, 2001, p. 31). Diagram labels: Document, Index, Abstract, Patron, Indexing Tool.]

Page 7: Indexing

Information Retrieval System

• An information retrieval system is a mechanism for carrying out the functions of the information retrieval process.

• Organization of information may take different forms (manual, computerized, or a combination of both).

• Most challenging problem: providing the nearest possible coincidence between the user's request and the system's response.

• Modern information retrieval systems: data retrieval, reference retrieval and text retrieval.

Page 8: Indexing

Functions involved:

1. The information is created and acquired for the system.
2. Knowledge records are analyzed and tagged by sets of index terms.
3. The knowledge records are stored physically and the index terms are stored in a structured file.
4. The user’s query is tagged with sets of index terms and is then matched against the tagged records.
5. Matched documents are retrieved for review.
6. Feedback may lead to several reiterations of the search.
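The tagging, storing and matching functions listed above can be illustrated with a small sketch. It is only one possible reading, assuming the "structured file" is an inverted index (term to record ids) and that a query matches a record when the record carries every query term; the names `build_index`, `search` and the sample records are hypothetical.

```python
from collections import defaultdict


def build_index(records: dict[str, set[str]]) -> dict[str, set[str]]:
    """Store index terms in a structured file: term -> ids of records tagged with it."""
    index: dict[str, set[str]] = defaultdict(set)
    for record_id, terms in records.items():
        for term in terms:
            index[term.lower()].add(record_id)
    return dict(index)


def search(index: dict[str, set[str]], query_terms: list[str]) -> set[str]:
    """Match the tagged query against tagged records (all query terms must be present)."""
    postings = [index.get(term.lower(), set()) for term in query_terms]
    if not postings:
        return set()
    return set.intersection(*postings)


# Hypothetical knowledge records, already analyzed and tagged with index terms.
records = {
    "doc1": {"indexing", "thesauri"},
    "doc2": {"indexing", "abstracting"},
    "doc3": {"classification"},
}
index = build_index(records)
print(sorted(search(index, ["indexing"])))
print(sorted(search(index, ["indexing", "thesauri"])))
```

If the user is not satisfied with the retrieved set, the query terms can be changed and `search` called again, which corresponds to the feedback step.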


Page 9: Indexing

Flowchart: feedback may lead to several reiterations of the search.

User expresses an information need
Request is conceptually analyzed
Request is translated into the system's index language
A searching strategy is composed
Search is carried out
Search is completed
Is user satisfied? If yes: Stop
If no: Are all searching options depleted? If yes: Stop; if no: Reformulation of the request

Page 10: Indexing

Purposes and Uses of Indexes

• Save time and effort in finding information.

• Identify potentially relevant information in the document or collection being indexed.

• Analyze concepts treated in a document to produce appropriate index headings based on the indexing language assigned.

• Indicate relationships among terms.

Page 11: Indexing

• Group together related topics.
• Direct users seeking information under terms not chosen as index headings to the headings that have been chosen.
• Suggest related topics.
• Serve as a tool for current awareness services.

Page 12: Indexing

Types of Indexes

by arrangement

a. Alphabetical index
Advantage:
• More convenient to use and follows an order that is familiar to users.
Drawbacks:
• synonymy
• scattering of entries

Page 13: Indexing

b. Classified index
Advantages:
• Useful for generic searches.
• Brings similar things together.
Drawbacks:
• Most users find them difficult to use.
• Needs a secondary file.
• One cannot enter it directly as one can with alphabetical sequences of names.

Page 14: Indexing

c. Concordance
Uses:
• Locate a partly or completely remembered passage
Drawback:
• Searching is difficult since this type of index spreads similar entries over many synonymous terms, ignores misspellings, and confuses any general-specific term relationships.

d. Numerical or serial order
e.g. Numerical Patent Index of Chemical Abstracts; American Statistics Office

Page 15: Indexing

Nelson’s Complete Concordance of the Revised Standard Version Bible

AARON
“Is there not A., your brother, the      Ex. 4.14
The Lord said to A., “Go into                4.27
And Moses told A. all the words              4.28
And A. spoke all the words which             4.30
Afterward Moses and A. went to               5.30

Page 16: Indexing

by type or form of material indexed

1. Book index
Reasons for preparing a book index:
• collects the different ways of wording the same concept.
• filters information for the reader.
• pinpoints information.

Page 17: Indexing

Components of a book index entry:
– main heading
– subheading
– locator
– cross references

World Wide Web (WWW)
    browsers, 78
    components, 89
    development, 100-156
    see also Internet

Page 18: Indexing

2. Periodical index
• consistency becomes the most challenging part
• open-ended projects
• scope is broader

3. Newspaper index
• vocabulary control becomes a paramount challenge

4. Audiovisual materials index
• textual labeling is needed along with image matching

Page 19: Indexing

Difference between Book & Periodical Indexes

Book index:
• Compiled only once and within a relatively short time and usually performed by a single person.
• Deals with a more or less well-defined central topic.

Periodical index:
• A continuous process and more often performed by a team of indexers and lasting for an extended period.
• Deals with a great variety of topics.

Page 20: Indexing

Difference between Book & Periodical Indexes (continued)

Book index:
• Indexing terms are almost always derived from the text.
• Specificity is largely governed by the text itself.

Periodical index:
• Terminology must be consistent and derived from a controlled vocabulary.
• Terms are prescribed by a controlled vocabulary and their level of specificity may be lower than in a book index.

Page 21: Indexing

Difference between Book & Periodical Indexes (continued)

Book index:
• Every single page of a book must be read.
• Virtually the entire text is subject to indexing.
• Always bound with the indexed text.

Periodical index:
• Articles are scanned for indexable items and the indexer may rely on an abstract or summary.
• A periodical index will depend on a number of policy decisions.
• Compiled separately.

Page 22: Indexing

by physical form
• card index
• printed index
• microform index
• computerized index
  – automatic indexing
  – computer-assisted indexing

Page 23: Indexing

Principles and Concepts of Indexing

1. Exhaustivity – refers to the extent to which concepts are made retrievable by means of index terms.
1.1 Summarization
1.2 Depth indexing

2. Specificity – refers to the extent to which a concept or topic in a document is identified by a precise term in the hierarchy of its genus-species relationship.
Example: An information resource about musicians should be entered under ‘Musicians’ and not under ‘Performing Artists’.

3. Consistency – refers to the extent to which agreement exists on the terms to be used to index a given document.
Types of consistency level:
• inter-indexer consistency
• intra-indexer consistency

Page 24: Indexing

Indexing Languages

Purposes and Uses
• a system for naming or identifying subjects contained in a document.
• a tool for communication

Features/Characteristics

Vocabulary – refers to the terms selected for the indexing of concepts.

Syntactics – refers to the combination and modification of terms to form headings and multilevel headings or to form search statements.
Example: Employees, Training of; Training of employees

Semantics – the study of meaning as expressed in communication such as words.

Page 25: Indexing

Semantic relationships are categorized into:

• Equivalence relationship – implies that there will be more than one term denoting the same concept.
– Synonyms
– Quasi-synonyms
– Preferred spelling
– Acronyms and abbreviations
– Current and established terms
– Translation

Page 26: Indexing

• Hierarchical relationship
– Genus-species relationship (represents class inclusion)
Example:
Agro industry
    Food industry
        Meat industry
– Whole-part relationship
Example:
Foot
    Toes

• Affinitive relationship – displayed with the use of related terms
Example:
Men – Women
Education – Teaching

Page 27: Indexing

Types of Indexing Languages

1. Natural language (derived-term system)
Characteristics:
• Improves recall because it provides more access points, but reduces precision
• Redundancy is greater
• Uses more current terms
• Tends to be favored by subject specialists or end-users
• May also be called indexing by extraction (or extractive indexing method)

Page 28: Indexing

2. Controlled vocabulary (assigned-term system)

Functions:
• Controls synonyms by choosing one form as the standard term
• Makes distinctions among homographs
• Brings or links together terms that are closely related
• Establishes the size or scope of a term
• Usually records hierarchical and affinitive/associative relations
• Controls variant spellings

Page 29: Indexing

Syndetic devices used by a controlled vocabulary:

• USE and UF (use for) for synonyms
• BT (broader term), NT (narrower term) and RT (related term) for differing levels of specificity and certain near synonyms and antonyms

Page 30: Indexing

Advantages of Controlled Vocabulary Language

• Increases the probability that both indexer and searcher will express a particular concept in the same way.

• Increases the probability that the same term will be used by different indexers or by the same indexer at different times.

• Helps searchers to focus their thoughts when they approach the information system without a full and precise realization of what information they need.

Page 31: Indexing

Disadvantages of Controlled Vocabulary Language:

• Incompatibility of different indexing languages.

• High input cost.

• The possibility of inadequate vocabulary.

Page 32: Indexing

Types of Controlled Vocabulary

1. Authority List / Subject Authority List
Examples:
• Library of Congress Subject Headings
• Sears List of Subject Headings
• Dewey Decimal Classification

2. Thesaurus
• From the Latin word (of Greek origin) meaning ‘treasure’
• Poly-hierarchical
Examples:
• The Art & Architecture Thesaurus
• ERIC (Education Resources Information Center) Thesaurus

Page 33: Indexing

Similarities between Authority Lists and Thesauri

• Both attempt to provide subject access to information resources by providing terminology that is consistent rather than uncontrolled and unpredictable.

• Both choose preferred terms and make references from non-used terms.

• Both provide hierarchies so that terms are presented in relation to their broader, narrower, and related terms.

Page 34: Indexing

Difference between Authority Lists and Thesauri

• Thesauri are made up of single terms and bound terms representing single concepts. Subject heading lists have phrases and other pre-coordinated terms in addition to single terms.

• Thesauri are more strictly hierarchical.
• Thesauri are narrow in scope.
• Thesauri are more likely to be multilingual.

Page 35: Indexing

Relationships of Terms

INTELLIGENCE
BT: Ability
NT: Comprehension
RT: Talent
    Aptitude

• Broader term (BT) reference shows hierarchical relationship upward in the classification tree.

• Narrower term (NT) reference is similar to the broader term reference, except it goes down in the classification tree.

• Related term (RT) reference refers to a descriptor that can be used in addition to the basic term but is not in a hierarchical relationship.

• Use reference directs the user from a non-usable term to the preferred descriptor.

• Use for (UF) reference deals primarily with synonymous or variant forms of the preferred descriptor. It is also used to lead the indexer to more general terms.

• Scope Note (SN) is used to inform users about the descriptor’s usage restrictions or to clarify ambiguity.
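The reference types above map naturally onto a small record structure. The following is a minimal sketch, not a standard thesaurus format: the class names `Descriptor` and `Thesaurus` are hypothetical, and the UF variant "Intellect" and the scope note text are invented for illustration; only the BT, NT and RT values come from the INTELLIGENCE example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Descriptor:
    term: str
    bt: list[str] = field(default_factory=list)  # broader terms
    nt: list[str] = field(default_factory=list)  # narrower terms
    rt: list[str] = field(default_factory=list)  # related terms
    uf: list[str] = field(default_factory=list)  # non-preferred forms this term is used for
    sn: str = ""                                 # scope note


class Thesaurus:
    def __init__(self) -> None:
        self.descriptors: dict[str, Descriptor] = {}
        self.use: dict[str, str] = {}  # USE references: non-preferred -> preferred

    def add(self, descriptor: Descriptor) -> None:
        self.descriptors[descriptor.term.lower()] = descriptor
        for variant in descriptor.uf:
            # Every UF entry implies the reciprocal USE reference.
            self.use[variant.lower()] = descriptor.term

    def preferred(self, term: str) -> Optional[str]:
        """Return the preferred descriptor for a term, or None if the term is unknown."""
        key = term.lower()
        if key in self.descriptors:
            return self.descriptors[key].term
        return self.use.get(key)


thesaurus = Thesaurus()
thesaurus.add(Descriptor(
    term="Intelligence",
    bt=["Ability"],
    nt=["Comprehension"],
    rt=["Talent", "Aptitude"],
    uf=["Intellect"],               # assumed variant, not in the source example
    sn="Hypothetical scope note.",  # placeholder text
))
print(thesaurus.preferred("intellect"))
```

Storing USE as the inverse of UF keeps the two references consistent, which is the property a controlled vocabulary relies on.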

Page 36: Indexing

Construction of a Thesaurus

1. Identify the subject field.
2. Identify the nature of literature to be indexed.
3. Identify the users.
4. Identify the file structure. Will this be a pre-coordinate or post-coordinate system?
5. Consult published indexes, glossaries, dictionaries, and other tools in the subject areas for the raw vocabulary.
6. Cluster the terms.
7. Establish term relationships.

Page 37: Indexing

Indexing Systems

1. Coordinate indexes – an indexing scheme that combines single index terms to create composite subject concepts

Types:
• post-coordinate indexing
• pre-coordinate indexing

Page 38: Indexing

2. Classified indexes – contents are arranged systematically by classes or subject headings.

2.1 Enumerative indexes
– DDC, LCC, and UDC are examples of enumerative classifications.
– Enumerative classifications are top-down methods of analysis.

Page 39: Indexing

2.2 Faceted indexes
• Often called analytico-synthetic systems. Facet analysis is a tightly controlled process by which simple concepts are organized into carefully defined categories by connecting class numbers of the basic concepts.
• Bottom-up systems.
• Pre-coordinated at the time of indexing and arranged in classification order rather than straight alphabetical order.
• Developed by Shiyali Ramamrita Ranganathan in the 1930s.

Example: When indexing a cookbook, some important facets might be:
• Holidays
• Ingredients
• Recipe titles
• Techniques

Page 40: Indexing

3. Chain indexes
• Provide that every concept becomes linked, or chained.
• Introduced by S.R. Ranganathan as part of his Colon Classification, the system uses “synthesis” or “number building”. The number that represents some complex subject is arrived at by joining the notational elements that represent more elemental subjects.

Page 41: Indexing

Example of a Chain Index

Topic: Victorian period English Poetry (821.8)

Hierarchy:
8 Literature
2 English
1 Poetry
.8 Victorian period

Chain index entries that will be generated are the following:

Victorian period: Poetry: English: Literature 821.8
Poetry: English: Literature 821
English: Literature 820
Literature 800
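The way the entries above are derived from the hierarchy can be sketched in a few lines. This is an illustrative sketch only: the function name `chain_entries` is hypothetical, and the chain is given as (notation, label) pairs from the broadest class to the most specific one.

```python
def chain_entries(chain: list[tuple[str, str]]) -> list[str]:
    """Generate one entry per link, starting from the most specific link.

    Each entry lists the labels from the current link back up to the
    broadest class, followed by the notation of the current link.
    """
    entries = []
    for i in range(len(chain) - 1, -1, -1):
        labels = [label for _, label in chain[: i + 1]]
        notation = chain[i][0]
        entries.append(": ".join(reversed(labels)) + " " + notation)
    return entries


chain = [
    ("800", "Literature"),
    ("820", "English"),
    ("821", "Poetry"),
    ("821.8", "Victorian period"),
]
for entry in chain_entries(chain):
    print(entry)
```

Each pass drops the most specific link, so the reader who enters the index at a broader term is still led to the same part of the classification.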

Page 42: Indexing

4. Permuted title indexes
Advantages:
• minimum cost
• does not need the expertise of a professional indexer because it is entirely done by a computer
Disadvantages:
• titles may not accurately reflect the content of the item
• the limited number of terms restricts complete subject indication
• most title indexes are unappealing to the eye
• can increase the retrieval of irrelevant documents
  – usually employ stop-lists
• scattering of synonyms and generic terms usually causes user frustration and missed entries

Page 43: Indexing

4.1 KWIC (keyword in context) – was introduced by Hans Peter Luhn in 1959. It is a rotated index most commonly derived from the titles of documents. Each keyword appearing in a title becomes an entry point and is highlighted in some way by setting it off at the center of the page.

Principles of KWIC Indexing
• Titles are generally informative
• Words extracted from the title can be used as an effective guide
• Although the meaning of an individual word viewed in isolation may be ambiguous or too general, the context surrounding the word helps to define and explain meaning.

Example:
     for Croatians.  Cataloging and classification
     Cataloging and  classification for Croatians
                for  Croatians. Cataloging and classification
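A KWIC display can be produced mechanically from titles, as the following minimal sketch suggests. It assumes a small hypothetical stop-list and a fixed column for the keyword; unlike the printed example above, it does not wrap the end of the title around to the left of the keyword.

```python
# Hypothetical stop-list; real systems use much longer ones.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the"}


def kwic(titles: list[str], width: int = 40) -> list[str]:
    """Return KWIC lines sorted by keyword, with every keyword starting in the same column."""
    rows = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            key = word.strip(".,;:").lower()
            if key in STOP_WORDS:
                continue
            left = " ".join(words[:i])   # context before the keyword
            right = " ".join(words[i:])  # keyword plus the context after it
            # Right-align (and, if needed, truncate) the left context to `width` characters.
            rows.append((key, f"{left[-width:]:>{width}}  {right}"))
    return [line for _, line in sorted(rows)]


for line in kwic(["Cataloging and classification for Croatians."]):
    print(line)
```

The stop-list is what keeps words such as "and" and "for" from becoming entry points.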

Page 44: Indexing

4.2 KWOC (keyword out of context) - A variation on the Keyword in Context (KWIC) index, in which keywords, removed from the context of the titles that contain them, appear as headings in a separate line index flush with the left margin.

Example:
Cataloging        Cataloging and classification for Croatians.
classification    Cataloging and classification for Croatians.
Croatians.        Cataloging and classification for Croatians.

*A keyword used as an entry point in a KWOC index is sometimes not repeated in the title but is replaced by an asterisk (*) or some other symbol.

Example:
Blue-eyed    * Cats in Texas ............ 25
Cat          The * and the Economy ...... 12
Cats         Blue-eyed * in Texas ....... 13
Economy      The Cat and the * .......... 56
Texas        Blue-eyed Cats in * ........ 76
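The asterisk variant of KWOC described above can be sketched as follows. The stop-list and the function name `kwoc` are assumptions for illustration, and locators are left out.

```python
# Hypothetical stop-list; real systems use much longer ones.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the"}


def kwoc(titles: list[str], replace_keyword: bool = True) -> list[tuple[str, str]]:
    """Return (heading, title) pairs sorted by heading.

    When replace_keyword is True, the keyword is replaced by '*' inside the title.
    """
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in STOP_WORDS:
                continue
            context = words.copy()
            if replace_keyword:
                context[i] = "*"
            entries.append((word, " ".join(context)))
    return sorted(entries, key=lambda entry: entry[0].lower())


titles = ["Blue-eyed Cats in Texas", "The Cat and the Economy"]
for heading, context in kwoc(titles):
    print(f"{heading:<12}{context}")
```

Calling `kwoc(titles, replace_keyword=False)` gives the plain form in which the full title is repeated under each heading.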

Page 45: Indexing

4.3 KWAC (keyword alongside context) - KWAC indexes, also produced by computer algorithm, are designed to preserve word pairs and phrases in the alphabetical sequence of keywords while at the same time imitating the traditional format with the lead term on the left.

Example:
Cataloging and classification for Croatians.
classification for Croatians. Cataloging and
Croatians. Cataloging and classification for
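The KWAC entries in the example are rotations of the title that start at each keyword. A minimal sketch, again assuming a hypothetical stop-list and function name:

```python
# Hypothetical stop-list; real systems use much longer ones.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the"}


def kwac(title: str) -> list[str]:
    """Rotate the title so each keyword leads, keeping the words that follow it alongside."""
    words = title.split()
    lines = []
    for i, word in enumerate(words):
        if word.strip(".,;:").lower() in STOP_WORDS:
            continue
        # Words after the keyword stay attached to it; the words before it wrap to the end.
        lines.append(" ".join(words[i:] + words[:i]))
    return sorted(lines, key=str.lower)


for line in kwac("Cataloging and classification for Croatians."):
    print(line)
```

Because the keyword is followed by its original right-hand context, phrases such as "classification for Croatians" stay intact in the alphabetical sequence.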

Page 46: Indexing

5. Citation indexes – lead users to papers by citations, rather than by index terms.

6. String indexes – a word-based system in which the indexer analyzes the various aspects of the subject treated in a document and records the aspects as words, along with “role operators”. The computer program combines these words into strings of terms that represent a brief summary of the document’s content.

Page 47: Indexing

6.1 PRECIS (Preserved Context Index System)
• developed by Derek Austin for the British National Bibliography (1971-1973) in order to produce printed alphabetical subject entries.
• based on the principle of “context-dependency”. It involves:
– Determining the subject content of the document
– Analyzing the subject statement to determine the role of each significant term (action term, location, agent or object of the action)
– Determining the relationship of a term to other terms in the database and how all these terms should be linked.

Page 48: Indexing

Below is an illustration of how a string of terms is organized according to the principle of context-dependency.

Topic: “Selection of personnel in paper industries in the Philippines”. The input string is:

A > B > C > D
or
Philippines > Paper industries > Personnel > Selection

Page 49: Indexing

The input string is:
(0) Philippines
(1) paper industries
(P) personnel
(2) selection

Where (2) represents the “transitive action”, (P) the “object of action”, (0) the “location”, and (1) the “key system” (object of transitive action). These operators show the role that a term plays in relation to other terms and thus can be regarded as “role indicators” or “role operators”.

Page 50: Indexing

Entries provided are:

Philippines. Paper industries. Personnel. Selection.

Paper industries. Philippines. Personnel. Selection.

Personnel. Paper industries. Philippines. Selection.

Selection. Personnel. Paper industries. Philippines.
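The four entries follow a regular pattern: each term takes the lead in turn, the terms that precede it in the input string follow in reverse order (the wider context), and the terms that come after it keep their order. The sketch below reproduces only that pattern on a single line; it is a simplification, since actual PRECIS entries use a two-line layout and the role operators can trigger other manipulations. The function name `precis_entries` is hypothetical.

```python
def precis_entries(string: list[str]) -> list[str]:
    """Generate one flattened entry per term of a context-dependent string."""
    entries = []
    for i, lead in enumerate(string):
        qualifier = list(reversed(string[:i]))  # wider context, nearest term first
        display = string[i + 1:]                # narrower context, original order
        entries.append(". ".join([lead] + qualifier + display) + ".")
    return entries


for entry in precis_entries(["Philippines", "Paper industries", "Personnel", "Selection"]):
    print(entry)
```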

Page 51: Indexing

6.2 POPSI (Postulate-based Permuted Subject Indexing)

• developed at the Documentation Research and Training Center (India)

• based on the classification ideas of S.R. Ranganathan
• coding used for the index string generator is based on the indicator system of Colon Classification. A comma “,” precedes the “entity” segment; a semicolon “;” is a “property” segment; a colon “:” is a “process” segment; a hyphen “-” is a qualifying sub-segment; and a greater-than sign “>” is a narrower term.

Page 52: Indexing

Example: The topic “study, using rabbits, of heart stimulation by antibiotics” will be placed under the discipline of pharmacology and will generate the following input string:

PHARMACOLOGY, CHEMICAL>DRUG>ANTIBIOTICS; STIMULATION-CIRCULATORY SYSTEM>HEART: STUDY-ANIMAL>RABBIT

Page 53: Indexing

Index strings that may be generated from the input string cited above are:

ANIMAL, STUDY, STIMULATION
PHARMACOLOGY, ANTIBIOTICS; STIMULATION-HEART: STUDY-RABBIT

ANTIBIOTICS, PHARMACOLOGY
PHARMACOLOGY, ANTIBIOTICS; STIMULATION-HEART: STUDY-RABBIT

Page 54: Indexing

6.3 NEPHIS (Nested Phrase Indexing System) – developed by Timothy C. Craven. The input string was designed to be a phrase in ordinary language.

Four different coding symbols are used:
• the left and the right angular brackets (“<” and “>”) - mark the beginning and end of a phrase embedded or nested within a larger phrase
• the question mark “?” - indicates that what follows is a connective to be included only in those index strings in which the connective has something to connect
• the at sign “@” - indicates that what follows is not an access term; this coding symbol is used at the beginning of the input string or at the beginning of a nested phrase.

Page 55: Indexing

Example: Topic is “measures from information theory of the information content of document surrogates”

@MEASURES? OF <INFORMATION CONTENT? OF <DOCUMENT SURROGATES>>? FROM <INFORMATION THEORY>

Sample index strings that may be generated from the above input string are:

DOCUMENT SURROGATES. INFORMATION CONTENT. MEASURES FROM INFORMATION THEORY

INFORMATION CONTENT OF DOCUMENT SURROGATES. MEASURES FROM INFORMATION THEORY

INFORMATION THEORY. MEASURES OF INFORMATION CONTENT OF DOCUMENT SURROGATES

Page 56: Indexing

6.4 CIFT (Contextual Indexing and Faceted Taxonomic Access System)

– developed for the Modern Language Association (MLA), alphabetical subject entries are created from strings provided by indexers who assign facets derived from literature, linguistics and folklore.

Example:

HENDIADYS
English literature. Tragedy. 1500-1599
Shakespeare, William. Hamlet. Use of HENDIADYS. Sources in Virgil. Linguistic approach

LINGUISTIC APPROACH
English literature. Tragedy. 1500-1599
Shakespeare, William. Hamlet. Use of hendiadys. Sources in Virgil. LINGUISTIC APPROACH

Page 57: Indexing

Measures of Effectiveness of the Indexing System

1. Recall measure – is a simple quantitative ratio of relevant documents retrieved to the total number of relevant documents potentially available. Recall depends on the level of exhaustivity allowed by the indexing policy.

Example: If the library holds 100 documents that are relevant to the user’s needs and the indexing system retrieves 75 of them, then the recall ratio is 75 out of 100 (75/100). Recall for this search is 75 percent.

Page 58: Indexing

2. Precision measure – is the ratio of relevant documents retrieved to the total number of documents retrieved. Relevance or precision depends on the terminology of the text being indexed and the specificity of the indexing language used.

Example: If 100 documents are retrieved and 50 of those items are relevant to the request, the precision ratio is 50 to 100 (50/100). Precision for this search is 50 percent.
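Both measures are simple ratios, so the two worked examples can be checked with a short sketch. The function and parameter names are illustrative assumptions, not part of any library.

```python
def recall(relevant_retrieved: int, relevant_available: int) -> float:
    """Relevant documents retrieved / all relevant documents potentially available."""
    if relevant_available == 0:
        raise ValueError("recall is undefined when no relevant documents exist")
    return relevant_retrieved / relevant_available


def precision(relevant_retrieved: int, total_retrieved: int) -> float:
    """Relevant documents retrieved / all documents retrieved."""
    if total_retrieved == 0:
        raise ValueError("precision is undefined when nothing was retrieved")
    return relevant_retrieved / total_retrieved


print(f"recall:    {recall(75, 100):.0%}")    # 75 of 100 relevant documents were found
print(f"precision: {precision(50, 100):.0%}")  # 50 of 100 retrieved documents were relevant
```

The numerator is the same in both ratios; only the denominator changes, which is why broadening a search tends to raise recall while lowering precision.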

Page 59: Indexing

Subject Indexing

Steps in subject indexing:
1. Recording bibliographic data
2. Subject determination
3. Conceptual analysis
4. Translation into standard terms using controlled vocabulary

Page 60: Indexing

1. Recording bibliographic data (author, title, publication data, etc.)

a. When indexing printed books, pamphlets, periodicals and other printed documents, use locators that refer to the page numbers, separating locators with a comma.
Example: Livingstone, Ken 1/3, 1/97, 3/56

b. When indexing several issues or volumes of one title of a periodical, the indexer should take the locators from the numbering of the issues at the time of publication.
Example:
54/3: 38              volume/part: page
53, April 1998: 38    volume, date: page
53: 38                volume: page
April 1998: 38        date: page

Page 61: Indexing

c. When indexing the contents of a collection of documents, locators should give complete information about each document (title of the article, the author(s), the title of the periodical, volume number and date, and the inclusive pagination for the article).
Example:
Automated Teller Machines
    Competition spurs development of innovative bank technologies. Bus Journ. 45, Jan-Mar 2004: 13.
    The new networks. Info Tech. Apr 2005: 76-89.

d. If a document treats a subject continuously in a consecutively numbered sequence, reference should be made to the first and last numbered elements only.

e. Exceptionally, where space constraints apply or where the locators are extremely long, e.g. 10002-10012, digits may be dropped so that only the changed digits of the second locator are given, e.g. 10002-12.

f. Conventionally, the digits 10-19 in each hundred are given in full, e.g. 412-18.
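Rules (e) and (f) can be combined into a small helper. The following is a sketch of one reading of the two rules, assuming both locators are plain page numbers with the same number of digits; the function name `abbreviate_range` is hypothetical.

```python
def abbreviate_range(first: int, last: int) -> str:
    """Elide the unchanged leading digits of the second locator.

    Digits 10-19 of each hundred are always kept in full (412-18, not 412-8).
    """
    a, b = str(first), str(last)
    if len(a) != len(b) or first >= last:
        return f"{a}-{b}"  # nothing safe to elide
    common = 0
    while a[common] == b[common]:
        common += 1
    keep = len(b) - common      # digits that actually changed
    if b[-2] == "1":            # second locator ends in 10-19
        keep = max(keep, 2)
    return f"{a}-{b[-keep:]}"


print(abbreviate_range(10002, 10012))
print(abbreviate_range(412, 418))
print(abbreviate_range(76, 89))
```

Locators of different lengths are returned unabbreviated because eliding digits there would be ambiguous.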

Page 62: Indexing

2. Subject determination
• “aboutness” of the material
• formulation of a concept list
– most appropriate to the given community of users
– if necessary, modify both indexing tools and procedures as a result of feedback from inquiries
– no arbitrary limit should be set to the number of terms or descriptors
– concepts should be identified as specifically as possible.

Page 63: Indexing

3. Content analysis

a. Factors that may affect content analysis:
• Environmental situation
• Policy decisions
• Decisions of the indexer

b. Parts of the document that have to be analyzed:
• Title
• Abstract
• List of contents
• Text itself
• Illustrations, diagrams, tables and captions
• Reference section

Page 64: Indexing

4. Translation into standard terms using controlled vocabulary

The following practices must be observed in the translation process:
– Concepts which are already represented by indexing terms should be translated into their preferred terms.
– Terms which represent new concepts should be checked for accuracy and acceptability in reference tools.
– If the concepts are not present in an existing thesaurus or classification scheme, these may be:
    – expressed by terms or descriptors which are admitted into the indexing language
    – represented temporarily by more general terms, the new concepts being proposed as candidates for later addition

Page 65: Indexing

Indexing Policies and Guidelines & Production of Indexes

Page 66: Indexing

Indexing Procedures for Books

1. Examine the text carefully.

2. Read the text several times, page by page, to be able to analyze the contents and determine the indexable topics.

3. Select the topics to be indexed taking into consideration their significance to the central theme of the book.

4. Name the topics that were chosen to be indexed and mark up page proofs.

Page 67: Indexing

5. Alphabetize the entries.

6. Edit the entries
• Decide which entries should be the main headings and which should be the subheadings
• Decide whether certain entries will be treated as main entries or subentries

Example:
handicrafts
    pottery making
    weaving
    wood carving
or
pottery making
weaving
wood carving

Page 68: Indexing

• Main entries unmodified by subentries should not be followed by long rows of page numbers.
• Subentries must be concise and informative
• Make a final choice among synonymous terms
• Provide adequate but not excessive cross-referencing

Examples:
Cars                        Trucks
    Chevrolet, 224              Dodge Ram, 219
    Mazda, 146                  GMC (Jimmy), 143
    Volkswagen                  Mercedes-Benz, 144
    See also trucks             See also cars

Page 69: Indexing

• Punctuation
a. The inversion of a phrase used as the heading in a main entry is punctuated by a comma.
b. If the heading is followed immediately by page references, a comma is used between the heading and the first numeral and between subsequent numerals.
c. If the heading is followed immediately by run-in subentries, a colon precedes the first subheading. All subsequent subentries are preceded by semicolons.
For example:
payments, balance of: definition of, 16; importance of, 19
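Rules (b) and (c) can be expressed as a small formatting helper. This is a sketch under the assumption that an entry has either plain locators or run-in subentries; the function name `run_in_entry` and its parameters are hypothetical.

```python
from typing import Sequence


def run_in_entry(
    heading: str,
    locators: Sequence[int] = (),
    subentries: Sequence[tuple[str, Sequence[int]]] = (),
) -> str:
    """Format one index entry in run-in style."""
    if subentries:
        # Rule (c): colon before the first subentry, semicolons between subentries.
        parts = [f"{sub}, {', '.join(map(str, locs))}" for sub, locs in subentries]
        return f"{heading}: " + "; ".join(parts)
    # Rule (b): commas between the heading and each page reference.
    return ", ".join([heading, *map(str, locators)])


print(run_in_entry(
    "payments, balance of",
    subentries=[("definition of", [16]), ("importance of", [19])],
))
print(run_in_entry("browsers", locators=[78, 89]))
```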

Page 70: Indexing

7. Determine the design of the index after the compilation of the entries

• Decide whether subentries will follow an indented or run-in style.

• Typography should be used to differentiate between types of headings and to distinguish them from numerals indicating volumes, parts and pages.

8. Typing, proofreading, and the final review.

Page 71: Indexing

Indexing Techniques for Periodical Articles

1. Always index names of persons honored by awards or prizes and those eulogized in obituaries.

2. Every article that has permanent value should be indexed under all topics and issues dealt with.

3. Editorials should be indexed under their topics as any other article but differentiated from the others by the addition of (Ed.) or (E). The titles of editorials may be indexed under a collective heading “Editorials”.

Page 72: Indexing

4. Letters to the editor, if considered indexable, should be indexed by topic, not under a caption that may have been assigned by the editor. It is advisable to index at least the name of the person who criticized an article as well as the author’s response. For example:

Doe, John. “Effect of magnetic fields” 37-43
    Errors (H. Smith) 75; correction 185 [author’s entry]
Smith, Henry. “Effect of magnetic fields” (John Doe pp. 37-43): errors 75 [letter writer’s index entry]

Page 73: Indexing

5. Book reviews are indexed by the title of the book, followed by the name of the author, the locator, and the designation (R), unless all book reviews are listed under the class heading “Book Reviews” or in a separate index,
e.g. Guide to reference books, 10th ed. (Sheehy) 68 (R)

*The name of the reviewer should be included in the author name index,
e.g. Dixon, Geoffrey 68 (R), 92-96, 123

Page 74: Indexing

Choice and Forms of Headings (ISO 999)

1. Personal Names
• should be given in as full a form as possible
• should take the form used in the document, but if the text is not consistent, the indexer should adopt one form
• choose the most recent, or the most commonly used, form of personal name as the heading and add “see” cross-references from other forms,
e.g. Clemens, Samuel Langhorne see Twain, Mark
• where surnames are in common use, the entry should be the surname followed by any given name or initials
• where surnames are not used, the name that customarily comes first should be used as the entry word
e.g. Imran Khan

Page 75: Indexing

• Persons identified only by a given name or forename should be indexed under that name, qualified if necessary by a title of office or other distinguishing epithet
e.g. Leonardo da Vinci
Boudicca, Queen of the Iceni
• Persons normally identified by a title of honor or nobility should be indexed under that title, expanded if necessary by their family name
e.g. Dalai Lama
First Duke of Marlborough, John Churchill
• Compound and multiple surnames, whether hyphenated or not, should be indexed under the first part
e.g. Layzell Ward, Patricia
Pérez de Cuéllar, Javier

Page 76: Indexing

2. Corporate Bodies

• Names of corporate bodies should normally be indexed without transposition,

e.g. British Museum

• Transposition may, however, be used if it is considered that this would help the users of the index,

e.g. Department of Agriculture see Agriculture, Department of

J. Whitaker & Sons see Whitaker (J) & Sons

• Choose the most recent or the most commonly used form of the corporate name as the main heading and add “see” cross-references from other forms,

e.g. John Moores University see Liverpool John Moores University

Liverpool John Moores University

Page 77: Indexing

3. Geographic Names

• Names should be as full as necessary for clarity, with additions to avoid confusion with otherwise identical names,

e.g. Alaminos (Laguna)

Alaminos (Pangasinan)

• An article or preposition should be retained in a geographic name of which it forms an integral part,

e.g. La Paz

Las Vegas

• Where the article or preposition does not form an integral part of a name, it should be omitted,

e.g. New Forest rather than The New Forest

Rheinfall rather than Der Rheinfall

Page 78: Indexing

4. Titles of documents

• Titles should normally be italicized, underlined or otherwise distinguished. If necessary for identification, names of creators, places of publication, dates or other qualifiers may be added in parentheses,

e.g. Ave Maria (Gounod)

Ave Maria (Schubert)

Ave Maria (Verdi)

• In an English index, articles in titles are conventionally transposed to the end of the heading so that filing order is explicit,

e.g. Hunting of the Snark, The

Kapital, Das

• A preposition at the beginning of the title should be retained,

e.g. To the Lighthouse
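The transposition rule for titles can be sketched in a few lines of Python. This is only an illustration: the function name `transpose_article` and the short article list are assumptions made for the example, not part of ISO 999, and a real index would need a fuller, language-aware list.

```python
# Hypothetical helper: move a leading article to the end of a title heading.
# The article list below is an illustrative assumption, not taken from ISO 999.
LEADING_ARTICLES = ("the", "a", "an", "das", "der", "die")

def transpose_article(title):
    """Return the title with a leading article moved to the end."""
    first, _, rest = title.partition(" ")
    if rest and first.lower() in LEADING_ARTICLES:
        return f"{rest}, {first}"
    # Prepositions and any other first word stay where they are.
    return title

for t in ["The Hunting of the Snark", "Das Kapital", "To the Lighthouse"]:
    print(transpose_article(t))
```

The first two titles come out as "Hunting of the Snark, The" and "Kapital, Das", while "To the Lighthouse" is left unchanged because it begins with a preposition.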

Page 79: Indexing

5. First lines of poems

Conventionally, in an index of first lines of poems, the article is retained without transposition and is recognized for purposes of alphabetical arrangement,

e.g. A little thing in the snow

The modest Rose puts forth a thorn

Page 80: Indexing

Evaluation of Indexes

Guidelines/Criteria

1. Subject error

• Errors in choosing subject descriptors

• Omission errors

• Use of a too broad or too narrow term

2. Generic searching – Alphabetical indexes have always presented difficulties in promoting generic searching.

Page 81: Indexing

3. Terminology

4. Internal guidance

• Cross-references

• Printed instructions on how to use the index

5. Accuracy in referring

• Bibliographic citation

• Cross-references

6. Entry scattering

Example:

College libraries

National libraries

Public libraries

School libraries

Special libraries

Page 82: Indexing

7. Entry differentiation

Example: Libraries, 1-2, 28-31, 42, 53-60, 82, 109-11, 131-40, 310, 342-50

8. Spelling and punctuation

9. Filing

• Letter by letter (Air base, Airborne, Air brake)

• Word by word (Air base, Air brake, Airborne)

10. Layout

• Main headings are in heavy print

• Subheadings are in lighter print and small letters, and are indented

• See references are italicized

11. Length and type

• Index length should be 3-5% of the pages of a typical nonfiction book, about 5-8% for a history or biography, and about 15-20% for reference books

12. Cost

13. Standards
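The two filing methods listed under criterion 9 differ only in how spaces are treated, which a short Python sketch can make concrete. The key functions below are simplified assumptions for illustration: they ignore the punctuation, number and symbol rules that a filing standard such as BS 1749 would also cover.

```python
def letter_by_letter_key(heading):
    # Ignore spaces and case: the heading files as one unbroken string.
    return heading.replace(" ", "").lower()

def word_by_word_key(heading):
    # Compare one word at a time: "Air" alone files before "Airborne".
    return heading.lower().split()

headings = ["Airborne", "Air brake", "Air base"]

print(sorted(headings, key=letter_by_letter_key))
print(sorted(headings, key=word_by_word_key))
```

Letter by letter, the headings file as Air base, Airborne, Air brake. Word by word, every heading beginning with the word "Air" comes first, so the order is Air base, Air brake, Airborne.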

Page 83: Indexing

Indexing Standards

International Organization for Standardization

ISO 2788: 1986 – Documentation – Guidelines for the establishment and development of monolingual thesauri

ISO 5964: 1985 – Documentation – Guidelines for the establishment and development of multilingual thesauri

ISO 5963: 1985 – Documentation – Methods for examining documents, determining their subjects, and selecting indexing terms

Page 84: Indexing

Indexing Standards

International Organization for Standardization

ISO 999: 1996 – Information and documentation – Guidelines for the content, organization and presentation of indexes

ISO 4: 1997 - Information and documentation – Rules for the abbreviation of title words and titles of publications. It publishes a List of Serial Title Word Abbreviations which includes title word abbreviations in over 50 languages.

Page 85: Indexing

British Standards Institution (BSI)

BS 1749: 1985 - Recommendations for alphabetical arrangement and the filing order of numbers and symbols

BS 6478: 1984 Guide to filing bibliographic information in libraries and documentation

BS 6529: 1984 Recommendations for examining documents, determining their subjects and selecting indexing terms

BS 6723: 1985 Guide to establishment and development of multilingual thesauri

BS 5723: 1987 - Guide to establishment and development of monolingual thesauri

BS ISO 999: 1996 Information and Documentation – Guidelines for the content, organization and presentation of indexes

Page 86: Indexing

Automatic Indexing

• refers to indexing by machine, or the analysis of text by means of computer algorithms. The focus is on automatic methods used behind the scenes with little or no input from individual searchers, with the exception of relevance feedback.

Page 87: Indexing

Four Types of Approaches (Cleveland & Cleveland, 2001, p. 211)

• Statistical – based on counts of words, statistical associations, and collation techniques that assign weights and cluster similar words.

• Syntactical – stresses grammar and parts of speech, identifying concepts found in designated grammatical combinations, such as noun phrases.

Page 88: Indexing

• Semantic systems – concerned with the context sensitivity of words in the text. What does “cat” mean in its context? House cats? Heavy earthmoving equipment?

• Knowledge-based systems – go beyond thesaurus or equivalence relationships to knowing the relationships between words, e.g. ‘tibia’ is part of a leg, thus the document is indexed under ‘leg injuries’.
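Of the four approaches, the statistical one is the easiest to sketch in code. The example below is a minimal illustration, assuming a TF-IDF style weighting (word count multiplied by the log of inverse document frequency); that particular formula, the tiny stopword list, and the name `candidate_terms` are assumptions for the example, not something prescribed by Cleveland & Cleveland.

```python
import math
import re
from collections import Counter

# Illustrative stopword list (an assumption; real systems use longer lists).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on"}

def tokenize(text):
    # Lowercase the text and keep alphabetic words that are not stopwords.
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def candidate_terms(docs, top_n=3):
    """Return the top_n highest-weighted words for each document."""
    tokenized = [tokenize(d) for d in docs]
    # Document frequency: in how many documents does each word occur?
    df = Counter(w for tokens in tokenized for w in set(tokens))
    n_docs = len(docs)
    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Weight = count in this document * log(inverse document frequency).
        weights = {w: c * math.log(n_docs / df[w]) for w, c in tf.items()}
        # Highest weight first; ties broken alphabetically.
        ranked = sorted(weights, key=lambda w: (-weights[w], w))
        results.append(ranked[:top_n])
    return results

docs = ["library index index", "library catalog"]
print(candidate_terms(docs, top_n=1))
```

Words that occur in every document (here "library") get a weight of zero, so the sketch favours words that are frequent in one document but rare across the collection. It does nothing about synonyms or word sense, which is where the syntactical, semantic and knowledge-based approaches come in.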

Page 89: Indexing

Human/Manual Indexing vs. Automatic Indexing

Human/Manual Indexing

• Needs more people

• Costly

• Human error

• Low in production

• Quality can range from excellent to appalling

Automatic Indexing

• Needs less human effort

• Cheaper

• Follows instructions automatically

• Accurate

• Fast in production

• Promotes meticulous problem analysis

• Dependent on human intelligence

• Power lies in how the computer is programmed

Page 90: Indexing

Human/Manual Indexing vs. Automatic Indexing

• Automatic methods have trouble handling synonyms, homonyms, and semantic relations. Their ability to conceptualize is very poor.

• Human indexers go through cognitive processes that may be influenced by their background experience, education, training, intelligence, and common sense.

• Computers can, and humans cannot, organize all the words in a text and in a given database and perform statistical operations on them.

Page 91: Indexing

Indexing and the Internet

Search Tools

• Search engines – computer software that scans the Web and selects pages to be indexed for the searching system. They are often referred to as Web indexes since they examine the content of the web pages. Examples: HotBot, InfoSeek, and Google.

• Directory-based systems – usually indexed by humans and thus tend to have a higher level of quality in the indexing. Indexing may be based on full text or on the most frequently used words; the material is organized for browsing in a way that is similar to traditional library browsing. Examples: Yahoo! Directory and Google Directory

• Metasearchers – allow the user to search across multiple search tools at once. They take the user’s query and submit it to a number of other search tools. Examples: Metacrawler and Surfmax
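The way a metasearcher works can be sketched as a loop over search tools. Everything in the example is hypothetical: `tool_a` and `tool_b` are stand-in functions that return made-up URLs, not the interfaces of any real search service, and merging by first appearance is just one simple assumption about how results might be combined.

```python
def metasearch(query, search_tools):
    """Send the query to every tool and merge the results without duplicates."""
    merged = []
    seen = set()
    for tool in search_tools:
        for url in tool(query):
            # Keep only the first occurrence of each result.
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

# Hypothetical stand-ins for real search tools.
def tool_a(query):
    return [f"https://a.example/{query}", "https://shared.example/page"]

def tool_b(query):
    return ["https://shared.example/page", f"https://b.example/{query}"]

print(metasearch("indexing", [tool_a, tool_b]))
```

The page returned by both tools appears only once in the merged list. A real metasearcher would also have to deal with ranking, timeouts, and the different result formats of each tool.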

Page 92: Indexing

GOOD LUCK!