
Erin Agobert, Holley Cornetto, Felix Davila III April 13, 2016 INFO 202 Information Retrieval System Design Project 2: Database Design & Subject Analysis

Table of Contents

Part A. Design Database
A1. Descriptor Vocabulary
A2. Statement of Purpose
A3. Data Structure
A4. Rules

Part B. Create Content
B1. Search Page URL
B2. Records

Part C. Query & Evaluate
C1. Topics & Queries
C2. Evaluation
C3. Reflections

A1. Descriptor Vocabulary

aboutness, access, aggregation, architecture, archives, behavior, bibliographies, bookmarking, browsing, catalogs, cataloging, classification, constructing, content, controlled vocabulary, data, databases, design, digital libraries, discrimination, evaluation, facets, history, indexes, indexing, information, information retrieval, information science, information seeking, keywords, libraries, medicine, metadata, methods, museums, natural language, navigation, organization, performance, practitioners, professional, queries, recall, records, relevance, reliability, research, retrieval, search, service, sharing, standards, structure, subject headings, systems, tagging, taxonomy, theory, thesauri, usability, users, vocabulary, web, XML

A2. Statement of Purpose

The purpose of our database is to provide students in a Library and Information Science graduate program easy access to scholarly articles within their field of study. Students may need to access these articles for scholarly research, required course readings, career development, or for other information needs. Our goal is to make the database accessible to students who may need to find articles based on the author’s name, the article’s title, or the article’s subject. Students should also be able to narrow their searches based on the year and source of the publication. As graduate students in an MLIS program, these individuals should have a working knowledge of the vocabulary of the profession, and an emerging knowledge of the acronyms commonly used in the profession.


A3. Data Structure

A4. Rules

Field Name: Title

Type: Comment

Mandatory: Yes

Description: Source Title

Entry Rules: The indexer is to enter the title by capitalizing the first letter of the first word and leaving the rest of the words in lowercase. APA style formatting should be followed.

Field Name: Author

Type: Textbox

Mandatory: Yes

Description: Author of the Source


Entry Rules: The indexer is to enter the author’s surname first, with its first letter capitalized, followed by a comma, a space, the first name with its first letter capitalized, and the capitalized initial of the middle name followed by a period. APA style formatting should be followed.
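As a rough illustration of the author rule, the sketch below builds the "Surname, First M." form from separate name parts. The function name `format_author` and its parameters are hypothetical and are not part of the project's database software; it covers only a single author with an optional middle name.

```python
def format_author(surname: str, first: str, middle: str = "") -> str:
    """Return 'Surname, First M.' per the Author entry rule (illustrative only)."""

    def cap(word: str) -> str:
        # Capitalize only the first letter and leave the rest as typed.
        word = word.strip()
        return word[:1].upper() + word[1:]

    name = f"{cap(surname)}, {cap(first)}"
    if middle.strip():
        # Reduce the middle name to a capitalized initial followed by a period.
        name += f" {middle.strip()[0].upper()}."
    return name


print(format_author("bates", "marcia", "j"))
print(format_author("Buckland", "Michael"))
```

An indexer-facing form could run a check like this before a record is saved, so that author searches match consistently.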

Field Name: Source

Type: Textbox

Mandatory: Yes

Description: Type of Source

Entry Rules: The indexer is to enter the complete title of the source, capitalizing the first letter of each word and separating words with spaces. The title should be followed by a comma and a space, then the journal number, another comma and a space, and the page range from the first page to the last.

Field Name: Year

Type: Textbox

Mandatory: Yes

Description: Year of Source

Entry Rules: The indexer should enter the year of the source in full, just as in APA style formatting.

Field Name: Abstract


Type: Comment

Mandatory: Yes

Description: Abstract of Source

Entry Rules: The indexer is to input the complete abstract with a space between each word, concluding with a period.

Field Name: Descriptor

Type: Textbox

Mandatory: Yes

Description: Postcoordinate Vocabulary Assigned to Source

Entry Rules: The indexer should input the descriptors in lowercase, separating terms with commas and no spaces after the commas.
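The descriptor rule can be checked mechanically. The following is a minimal sketch, assuming the A1 list is the complete controlled vocabulary; the function name `check_descriptors` is hypothetical and the check is not part of the project's database software. Because the A1 list itself contains the uppercase term XML, the sketch tests membership in the list rather than enforcing lowercase.

```python
import re

# Controlled vocabulary copied from section A1.
VOCABULARY = {
    "aboutness", "access", "aggregation", "architecture", "archives", "behavior",
    "bibliographies", "bookmarking", "browsing", "catalogs", "cataloging",
    "classification", "constructing", "content", "controlled vocabulary", "data",
    "databases", "design", "digital libraries", "discrimination", "evaluation",
    "facets", "history", "indexes", "indexing", "information",
    "information retrieval", "information science", "information seeking",
    "keywords", "libraries", "medicine", "metadata", "methods", "museums",
    "natural language", "navigation", "organization", "performance",
    "practitioners", "professional", "queries", "recall", "records", "relevance",
    "reliability", "research", "retrieval", "search", "service", "sharing",
    "standards", "structure", "subject headings", "systems", "tagging",
    "taxonomy", "theory", "thesauri", "usability", "users", "vocabulary", "web",
    "XML",
}


def check_descriptors(field: str) -> list[str]:
    """Return a list of problems found in a Descriptor field value."""
    problems = []
    # The entry rule asks for commas with no surrounding spaces.
    if re.search(r"\s,|,\s", field):
        problems.append("space next to a comma")
    for term in field.split(","):
        term = term.strip()
        if term not in VOCABULARY:
            problems.append(f"not in vocabulary: {term!r}")
    return problems


print(check_descriptors("information science,information retrieval,history"))
print(check_descriptors("browsing,research,behavior, search"))
print(check_descriptors("tagging,indexing,subject searching"))
```

A check of this kind flags both formatting slips (a stray space after a comma) and terms that fall outside the agreed vocabulary, which are the two ways a descriptor search can silently miss a record.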

B1. Search Page URL

https://libr202.sjsu.edu/webdata_pro/student/2241/cgi-bin/webdata_pro.pl?_cgifunction=user&_layout=group_23_articles_202

B2. Records

ID# Title Author Source Year Abstract Descriptor

1 The invisible substrate of information science

Bates, Marcia J.

Journal of the American Society for Information Science, 50(12), 1043-1

1999 The explicit, above-the-water-line paradigm of information science is well known and widely discussed. Every disciplinary paradigm, however, contains elements that are less conscious and explicit in the thinking of its practitioners. Elucidates the key elements of the below-the-water-line portion of the information science paradigm. Highlights the role of information science as a meta-science: conducting research and developing theory around the documentary products of other disciplines and activities. Views the mental activities of the professional practice of the field as centering around representation and organization of information, rather than knowing information. Argues that such representation engages fundamentally different talents and skills from those required in other professions and intellectual disciplines. Also considers methodological approaches and values of information science.

information science,information retrieval,history,research,theory,methods

2 Vocabulary as a central concept in library and information science

Buckland, Michael

Proceedings of the Third International Conference on Conceptions of Library and Information Science. Dubrovnik, Croatia, 23-26 May 1999

1999 The nature and role of vocabulary in information systems is examined. "Vocabulary" commonly refers to the stylized adaptation of natural language to form indexes and thesauri. Much of bibliographic access, filtering, and information retrieval can be viewed as matching or translating across vocabularies. Multiple vocabularies are simultaneously present. A simple query in an online catalog normally involves at least five distinct vocabularies: those of the authors; the cataloger; the syndetic structure; the searcher; and the formulated query.

vocabulary,information,systems,indexes,thesauri,digital libraries,information retrieval,access,bibliographies,catalogs,structure,search,user,queries

3 A brief history of information architecture

Resmini, Andrea & Rosati, Luca

Journal of Information Architecture, 3(2), 33-46

2012 Information architecture (IA) is a professional practice and field of studies focused on solving the basic problems of accessing, and using, the vast amounts of information available today. This article covers the history of information design from the 1960s, when the concept was first introduced in the computing world, to today, in a world where relationships with people, places, objects, and companies are shaped by semantics and not only by physical proximity. (This article is a reprint of parts of Chapter 2, “Towards a Pervasive Information Architecture”, from Andrea Resmini and Luca Rosati's “Pervasive Information Architecture”, a book published by Morgan Kaufmann. The text was partially edited for clarity by the authors.)

information,architecture,access,history,information,design,systems,web,organization,navigation

4 Information retrieval as a trial-and-error process

Swanson, Don R.

Library Quarterly, 47(2), 128-148.

1977 Recognition of the essential role of trial and error in access to scientific literature may point the way toward improved information services and may illuminate inconsistencies that have beset many retrieval experiments. This paper examines three important and well-known information retrieval experiments, with a focus on certain internal inconsistencies and on the high variability of search results. In these experiments, retrieval systems are evaluated in terms of their ability to select relevant documents and reject those that are irrelevant. It is suggested that this criterion is inadequate because of ambiguities inherent in the concept of relevance and that closer attention to trial-and-error processes may be helpful in developing better criteria. Specific examples of how one might improve document retrieval, library use, and citation indexing are offered.

information,service,evaluation,search,aggregation,discrimination,retrieval,systems,recall,indexing,relevance

5 User-centered design of information systems

Toms, Elaine

M.J. Bates (Ed.), Understanding Information Retrieval Systems: Management, Types, and Standards. Boca Raton, FL: CRC Press.

2012 User-Centered Design (UCD) was founded on the premise that knowledge of users and their participation in the way that systems are designed is essential. User-centered design is a “multidisciplinary design approach based on the active involvement of users to improve the understanding of user and task requirements, and the interaction of user design and evaluation.” This entry first provides background on the genesis of UCD, followed by a section on the philosophy and theoretical underpinning of UCD, and then a description of the method, as it is generally practiced.

systems,design,users,evaluation,methods,information retrieval

6 Design, use, and evaluation of information retrieval systems

Weedman, Judith

Brooke Sheldon & Kenneth Haycock (Eds.), The portable MLIS. Westport, CT: Libraries Unlimited

2008 Design is at the core of professional work. Information professionals not only design sources of information such as websites and collections, but also use them for information retrieval. Evaluation is intricately connected to both of these activities. This evaluation consists of evaluating the collection itself, the reliability of the sources, and the reliability and usability of the information. This chapter introduces these essential spheres of information work.

design,information,information retrieval,systems,evaluation,usability,reliability,professional

7 Indexing and access for digital libraries and the internet: Human, database, and domain factors

Bates, Marcia J.

Journal of the American Society for Information Science, 49(13), 1185-1205.

1998 Presents information on a study which looked at indexing and access to digital libraries and the Internet. Factors important in the design of access mechanisms; Skills of an indexer; Reference to previous literature; Information on folk classification.

tagging,indexing,subject searching,information retrieval,digital libraries,search,access,classification

8 Library data in a modern context

Coyle, Karen

Library Technology Reports, 46(1), 5-13.

2010 The article examines the state of library data in a modern context. It recounts the history of modern library cataloging practice, which included Anthony Panizzi's 91 rules and the creation of the Online Public Access Catalog (OPAC) in the 1980s. A description of metadata, which is processed by computers to be understandable to humans, is discussed as well as library bibliographical metadata. Also explained is the process of cataloging and the use of the library catalog. The role of the World Wide Web as an information platform, including social networking sites, is noted.

cataloging,metadata,web,data,search

9 Metadata for all: Descriptive standards and metadata sharing across libraries, archives, and museums

Elings, Mary W. & Waibel, Gunter

First Monday, 12(3).

2007 Integrating digital content from libraries, archives and museums represents a persistent challenge. While the history of standards development is rife with examples of cross-community experimentation, in the end, libraries, archives and museums have developed parallel descriptive strategies for cataloguing the materials in their custody. Applying in particular data content standards by material type, and not by community affiliation, could lead to greater data interoperability within the cultural heritage community. In making this argument, the article demystifies metadata by defining and categorizing types of standards, provides a brief historical overview of the rise of descriptive standards in museums, libraries and archives, and considers the current tensions and ambitions in making descriptive practice more economic [1].

museums,libraries,archives,XML,content,cataloging,aggregation,metadata,standards,sharing,data,structures

10 A cognitive process model of document indexing

Farrow, John F.

Journal of Documentation, 47(2), 149-166.

1991 Classification, indexing and abstracting can all be regarded as summarisations of the content of a document. A model of text comprehension by indexers (including classifiers and abstractors) is presented, based on task descriptions which indicate that the comprehension of text for indexing differs from normal fluent reading in respect of: operational time constraints, which lead to text being scanned rapidly for perceptual cues to aid gist comprehension; comprehension being task oriented rather than learning oriented, and being followed immediately by the production of an abstract, index, or classification; and the automaticity of processing of text by experienced indexers working within a restricted range of text types. The evidence for the interplay of perceptual and conceptual processing of text under conditions of rapid scanning is reviewed. The allocation of mental resources to text processing is discussed, and a cognitive process model of abstracting, indexing and classification is described.

classification,indexing,thesauri,subject headings,controlled vocabulary,natural language,aboutness

11 The structure of collaborative tagging systems.

Golder, Scott A. & Huberman, Bernardo A.

arXiv.org

2005 Collaborative tagging describes the process by which many users add metadata in the form of keywords to shared content. Recently, collaborative tagging has grown in popularity on the web, on sites that allow users to tag bookmarks, photographs and other content. In this paper we analyze the structure of collaborative tagging systems as well as their dynamical aspects. Specifically, we discovered regularities in user activity, tag frequencies, kinds of tags used, bursts of popularity in bookmarking and a remarkable stability in the relative proportions of tags within a given url. We also present a dynamical model of collaborative tagging that predicts these stable patterns and relates them to imitation and shared knowledge.

tagging,metadata,bookmarking,keywords,taxonomy,web,sharing,search,classification


12 Issues in the development of a thesaurus for patients' chief complaints in the emergency department

Haas, Stephanie W. & Travers, Debbie A.

67th Proceedings of the ASIS&T annual meeting, 41, 411-417.

2004 When a patient visits the Emergency Department (ED), the reason the patient is seeking care is recorded as the Chief Complaint (CC). Beyond its role in the patient's care, there is interest in the CC for secondary uses. Clinicians and epidemiologists can use CC for research. ED clinicians and administrators incorporate CC data into quality monitoring and improvement efforts. Public health officials can use it as data for health surveillance. But there is no controlled vocabulary for recording CC, or standard for a CC component in the patient record. Travers (2003) completed a crucial first step toward the creation of a thesaurus for CC by analyzing a corpus of CCs to determine the nature of the language used by triage nurses, and the concepts that were expressed. Her analysis also illuminated many issues concerning the content and structure of a CC thesaurus that must be discussed before the thesaurus can be developed. Using Cimino's 1998 article, “Desiderata for Controlled Medical Vocabularies in the Twenty-First Century”, as a framework, we discuss these issues and the resulting decisions that the thesaurus development team, along with other stakeholders, will encounter.

thesauri,controlled vocabulary,metadata,design,search

13 Developing a new thesaurus for art and architecture

Petersen, Toni

Library Trends 38(4).

1990 The Art and Architecture Thesaurus, currently consisting of almost 40,000 terms, is midway in its development. Methods for constructing the thesaurus were modeled on existing standards and on other thesauri such as the National Library of Medicine’s MeSH Thesaurus. It was designed to provide the “hinge” between the object, its images, and related bibliographic material. In the decade since it was begun, however, attitudes toward the use of terminology to describe visual images and museum objects have changed, impelling AAT constructors to develop policies that would make the thesaurus flexible enough to meet the needs of a new generation of database producers. This article describes the processes and policies that were developed to construct a language that would represent knowledge in the field of art and architecture as well as be surrogates for the images and objects being described. The AAT’s presentation of an “atomized” or faceted language is detailed.

architecture,thesauri,constructing,medicine,bibliographies,vocabulary,databases,facets,metadata

14 Design science in the information sciences

Weedman, Judith

Marcia J. Bates & Mary Niles Maack (Eds.), Encyclopedia of library and information sciences, 3rd ed.

2010 Design is a core professional responsibility in the information professions as in others. It may be local and idiosyncratic, or it may be the focus of a major research project. Design researchers have argued that there are fundamental design problems and design solutions that cross scale and domain boundaries; if this is so, then design science, the study of design, should build knowledge both about particular domains and also about what is true across domains. This entry examines design science in several fields to indicate where it might inform design in the library and information sciences. It concludes with two examples from the information sciences, one a design science study of vocabulary design and one a design research project, in which the research consisted of design of an information retrieval system.

design,information,vocabulary,research,information retrieval,systems

15 What is browsing—really? A model drawing from behavioural science research.

Bates, Marcia J.

Information Research, 12(4).

2007 Introduction. It is argued that the actual elements of typical browsing episodes have not been well captured by common approaches to the concept to date. Method. Empirical research results reported by previous researchers are presented and closely analysed. Analysis. Based on the issues raised by the above research review, the components of browsing are closely analysed and developed. Browsing is seen to consist of a series of four steps, iterated indefinitely until the end of a browsing episode: 1) glimpsing a field of vision, 2) selecting or sampling a physical or informational object within the field of vision, 3) examining the object, 4) acquiring the object (conceptually and/or physically) or abandoning it. Not all of these elements need be present in every browsing episode, though multiple glimpses are seen to be the minimum to constitute the act. Results. This concept of browsing is then shown to have persuasive support in the psychological and anthropological literature, where research on visual search, curiosity and exploratory behaviour all find harmony with this perspective. Conclusions. It is argued that this conception of browsing is closer to real human behaviour than other approaches. Implications for better information system design are developed.

browsing,research,behavior,search


16 What have we got to lose? The effect of controlled vocabulary on keyword searching results

Gross, T. & Taylor, A.

College & Research Libraries, 66(3)

2005 Using controlled vocabulary in the creation and searching of library catalogs has evoked a great deal of debate because it is expensive to provide. Leading to this study were suggestions that because most users seem to search by keyword, subject headings could be removed from catalog records to save space and cost. This study asked, what proportion of records retrieved by a keyword search has a keyword only in a subject heading field and thus would not be retrieved if there were no subject headings? It was found that more than one-third of records retrieved by successful keyword searches would be lost if subject headings were not present, and many individual cases exist in which 80, 90, and even 100 percent of the retrieved records would not be retrieved in the absence of subject headings.

controlled vocabulary,catalogs,keywords,records,subject headings,search

17 Evaluating the performance of information retrieval systems using test collections

Clough, C., & Sanderson, M.

Information Research, 18(2)

2013 Introduction. Evaluation is highly important for designing, developing and maintaining effective information retrieval or search systems as it allows the measurement of how successfully an information retrieval system meets its goal of helping users fulfil their information needs. But what does it mean to be successful? It might refer to whether an information retrieval system retrieves relevant (compared with non-relevant) documents; how quickly results are returned; how well the system supports users' interactions; whether users are satisfied with the results; how easily users can use the system; whether the system helps users carry out their tasks and fulfil their information needs; whether the system impacts on the wider environment; how reliable the system is etc. Evaluation of information retrieval systems has been actively researched for over 50 years and continues to be an area of discussion and controversy. Test collections. In this paper we discuss system-oriented evaluation that focuses on measuring system effectiveness: how well an information retrieval system can separate relevant from non-relevant documents for a given user query. We discuss the construction and use of standardised benchmarks - test collections - for evaluating information retrieval systems. Research directions. The paper also describes current and future research directions for test collection-based evaluation, including efficient gathering of relevance assessments, the relationship between system effectiveness and user utility, and evaluation across user sessions. Conclusions. This paper describes test collections which have been widely used in information retrieval evaluation and provide an approach for measuring system effectiveness.

evaluation,information retrieval,search,behavior,queries,usability

18 Comparative recall and precision of simple and expert searches in Google Scholar and eight other databases

Walters, W. H.

portal: Libraries and the Academy, 11(4), 971-1006

2011 This study evaluates the effectiveness of simple and expert searches in Google Scholar (GS), EconLit, GEOBASE, PAIS, POPLINE, PubMed, Social Sciences Citation Index, Social Sciences Full Text, and Sociological Abstracts. It assesses the recall and precision of 32 searches in the field of later-life migration: nine simple keyword searches and 23 expert searches constructed by demography librarians at three top universities. For simple searches, Google Scholar’s recall and precision are well above average. For expert searches, the relative effectiveness of GS depends on the number of results users are willing to examine. Although Google Scholar’s expert-search performance is just average within the first fifty search results, GS is one of the few databases that retrieves relevant results with reasonably high precision after the fiftieth hit. The results also show that simple searches in GS, GEOBASE, PubMed, and Sociological Abstracts have consistently higher recall and precision than expert searches. This can be attributed not to differences in expert-search effectiveness, but to the unusually strong performance of simple searches in those four databases.

search,indexes,keywords,users,databases,evaluation,systems,performance

19 Homepage real estate allocation.

Nielsen, Jakob

http://www.nngroup.com/articles/homepage-real-estate-allocation/

2013 Websites spend too little homepage screen space on content of interest to users and fail to utilize modern monitor sizes. And? It’s worse now than it was 12 years ago :-(

users,design,navigation,web,structures

20 Usability evaluation of web mapping sites.

Nivala, Annu-Maaria, Brewster, Stephen, & Sarjakoski, L. Tiina

The Cartographic Journal 45(2), 129-138.

2008 To identify the potential usability problems of Web mapping sites, four different sites were evaluated: Google Maps, MSN Maps & Directions, MapQuest, and Multimap. The experiment comprised a series of expert evaluations and user tests. During the expert evaluations, eight usability engineers and eight cartographers examined the Web mapping sites by paying attention to their features and functionality. Additionally, eight user tests were carried out by ordinary users in a usability laboratory. In all, 403 usability problems were identified during the trial and were grouped according to their severity. A qualitative description is given of these usability problems, many of which were related to search operations that the users performed at the Web mapping sites. There were also several problems relating to the user interface, map visualisation, and map tools. We suggest some design guidelines for Web mapping sites based on the problems we identified and close the paper with a discussion of the findings and some conclusions.

usability,web,evaluation,users,search,design,navigation,structures

C1. Topics & Queries

Holley Cornetto

Topic 1: I would like to find more information about theories of information retrieval.

● Queries in the descriptor field:

o Relevant Article(s): 1

o Theory: 1 result (# 1)

o Information retrieval: 7 results (# 1, 2, 5, 6, 7, 14, 17)

o Information retrieval AND theory: 1 result (# 1)

● Queries in the abstract field:


o Theory: 1 result (# 1)

o Information retrieval: 5 results (# 2, 4, 6, 14, 17)

o Information retrieval AND theory: 0 results

Topic 1 Articles #


Retrieved & Relevant 1 (Only retrieved in the descriptor field search).

Relevant & Not Retrieved (Recall 100%)

1 (Not retrieved in the abstract search.)

Retrieved & Not Relevant (Precision 14%)

2, 4, 5, 6, 7, 14, 17

Analysis: My search results yielded 7 articles that were not relevant; I believe this is because they were applicable to the larger field of information retrieval, but not to the narrowed search, which included theory. When the search was narrowed, none of the irrelevant articles were retrieved. However, when the search was narrowed and Boolean operators were included, the abstract field did not retrieve the relevant result.
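The precision and recall figures for this topic can be reproduced from the listed article IDs. The sketch below uses the standard definitions (precision is the share of retrieved records that are relevant; recall is the share of relevant records that were retrieved) and the result sets reported for the query "Information retrieval". The function name `precision_recall` is illustrative only.

```python
def precision_recall(retrieved: set[int], relevant: set[int]) -> tuple[float, float]:
    """Return (precision, recall) for one query, given sets of record IDs."""
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


relevant = {1}                               # relevant article for Topic 1
descriptor_hits = {1, 2, 5, 6, 7, 14, 17}    # "Information retrieval", descriptor field
abstract_hits = {2, 4, 6, 14, 17}            # same query, abstract field

for label, hits in (("descriptor", descriptor_hits), ("abstract", abstract_hits)):
    p, r = precision_recall(hits, relevant)
    print(f"{label}: precision {p:.0%}, recall {r:.0%}")
```

For the descriptor field this gives 1 relevant record out of 7 retrieved, which is the 14% precision and 100% recall shown in the table; the abstract field retrieves no relevant record for this query.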

Topic 2: I would like to find information about searching for information that is organized through tagging.

● Queries in the descriptor field:

o Relevant Article(s): 7, 11

o Search AND tagging: 2 results (#7, 11)

o Information AND tagging: 1 result (# 7)

o Organization AND tagging: 0 results

o Tagging AND classification: 2 results (# 7, 11)

● Queries in the abstract field:

o Search AND tagging: 0 results

o Information AND tagging: 0 results

o Organization AND tagging: 0 results

o Classification AND tagging: 0 results


Topic 2 Articles #





0

Analysis: Search for this topic was easier when using the descriptor fields, because then I had 100% precision. When I searched for the same terms in the abstract, I did not receive any documents, which was frustrating, because I knew that the database contained documents about tagging.
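The gap between the two fields can be illustrated with a small sketch. It assumes, as a guess about the search software rather than a documented fact, that an AND query returns records whose field contains every term as a case-insensitive substring. The descriptor strings for records 7 and 11 are copied from B2; the abstracts are shortened to their first sentence, and the function name `search_and` is hypothetical.

```python
records = {
    7: {
        "descriptor": "tagging,indexing,subject searching,information retrieval,"
                      "digital libraries,search,access,classification",
        "abstract": "Presents information on a study which looked at indexing and "
                    "access to digital libraries and the Internet.",
    },
    11: {
        "descriptor": "tagging,metadata,bookmarking,keywords,taxonomy,web,sharing,"
                      "search,classification",
        "abstract": "Collaborative tagging describes the process by which many users "
                    "add metadata in the form of keywords to shared content.",
    },
}


def search_and(field: str, *terms: str) -> list[int]:
    """IDs of records whose `field` contains every term (assumed substring match)."""
    return sorted(
        rec_id
        for rec_id, rec in records.items()
        if all(term.lower() in rec[field].lower() for term in terms)
    )


print(search_and("descriptor", "search", "tagging"))  # [7, 11]
print(search_and("abstract", "search", "tagging"))    # []
```

Under this assumption the descriptor field matches because the indexers assigned both terms to both records, while neither abstract excerpt happens to use both words, so the same query comes back empty.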

Erin Agobert

Topic 1: I would like to know more about the use of controlled vocabulary for information retrieval.

● Queries in the descriptor field:

o Relevant Article(s): 2, 12, 16

o Vocabulary: 6 results (# 2, 10, 12, 13, 14, 16)

o Controlled AND Vocabulary: 3 results (# 10, 12, 16)

o Vocabulary AND Information Retrieval: 2 results (# 2, 14)

o Controlled Vocabulary AND Information Retrieval: 0 results

● Queries in the abstract field:

o Relevant Article(s): 2, 12, 16

o Vocabulary: 4 results (# 2, 12, 14, 16 )


o Controlled AND Vocabulary: 2 results (# 12, 16)

o Vocabulary AND Information Retrieval: 2 results (#2, 14)

o Controlled Vocabulary AND Information Retrieval: 0 results

Topic 1 Articles #

Retrieved & Relevant 3 Retrieved and relevant from descriptor and abstract field search. ID# 2, 12, 16

Relevant & Not Retrieved (Recall: 100% Descriptor and Abstract)

0 Not retrieved from descriptor and abstract field search.

Retrieved & Not Relevant (Precision: 50% Descriptor, 75% Abstract)

3 Retrieved and not relevant from descriptor field search (ID# 10, 13, 14) and 1 retrieved and not relevant from abstract field search.

Analysis: Searching this topic achieved 100% recall of relevant records in both the descriptor and abstract searches. However, the abstract field search was more precise, returning fewer non-relevant records. This occurred perhaps because of the availability of more relevant terms in the abstract field. The descriptor field is based on the controlled vocabulary the team assigned to the record.

Topic 2: I would like to know about designing information retrieval systems.

● Queries in the descriptor field:

o Relevant Article(s): 5, 6, 14, 17

o Designing Information Retrieval Systems: 0 results

o Design AND Information Retrieval: 3 results (# 5, 6, 14)

o Design OR Designing AND Information Retrieval: 7 results (# 3, 5, 6, 12, 14, 19, 20)


o Design AND IR System: 0 results

● Queries in the abstract field:

o Relevant Article(s): 5, 6, 14, 17

o Designing Information Retrieval Systems: 0 results

o Design AND Information Retrieval: 3 results (# 6, 14, 17)

o Design OR Designing AND Information Retrieval: 9 results (# 3, 5, 6, 7, 13, 14, 15, 17, 20)

o Design AND IR System: 0 results

Topic 2 Articles #

Retrieved & Relevant 4 Retrieved and relevant from descriptor and abstract field search. ID# 5, 6, 14, 17

Relevant & Not Retrieved (Recall: 80% Descriptor, 100% Abstract)

1 Relevant and not retrieved from descriptor field search (ID# 17) and 0 relevant and not retrieved from abstract field search

Retrieved & Not Relevant (Precision: 50% Descriptor, 44% Abstract )

4 Retrieved and not relevant from descriptor search field (ID# 3, 12, 19, 20) and 5 retrieved and not relevant from abstract search field (ID# 3, 7, 13, 15, 20).

Analysis: Between them, the two searches retrieved all relevant files. Recall of relevant files was 80% for the descriptor search field and 100% for the abstract search field. The search precision was 50% for the descriptor search and 44% for the abstract search. Both fields were very good at pulling accurate and relevant files. However, the searches also returned erroneous records in addition to the relevant files. I am not sure whether this means that my chosen search terms should be strengthened, in addition to adding terms to the controlled vocabulary assigned to each record, to improve precision.
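One possible source of the erroneous records is how the mixed query "Design OR Designing AND Information Retrieval" is grouped. The seven descriptor results for that query are the same seven records reported in C1 for the single term "Design", which is consistent with AND binding more tightly than OR. Whether the search software actually parses queries this way is an assumption; the sketch below only shows that the two groupings give different sets. The descriptor hits for "Design" and "Information retrieval" are taken from the C1 query lists, and "Designing" is assumed to have no descriptor hits because it is not in the A1 vocabulary.

```python
design = {3, 5, 6, 12, 14, 19, 20}        # descriptor hits for "Design" (from C1)
designing: set[int] = set()               # assumed empty: term is not in the A1 vocabulary
info_retrieval = {1, 2, 5, 6, 7, 14, 17}  # descriptor hits for "Information retrieval" (from C1)

# AND evaluated first: Design OR (Designing AND Information Retrieval)
and_first = design | (designing & info_retrieval)

# Explicit grouping: (Design OR Designing) AND Information Retrieval
grouped = (design | designing) & info_retrieval

print(sorted(and_first))  # [3, 5, 6, 12, 14, 19, 20]
print(sorted(grouped))    # [5, 6, 14]
```

The first grouping reproduces the seven reported results, including records 3, 12, 19 and 20; the second returns the same three records as the plain "Design AND Information Retrieval" query.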


Felix Davila III

Topic 1: I would like to find more information about access to digital libraries.

● Queries in the descriptor field:

o Access: 3 results (# 2, 3, 7)

o Digital libraries: 2 results (# 2, 7)

o Digital libraries AND access: 2 results (# 2, 7)

● Queries in the abstract field:

o Access: 5 results (# 2, 3, 4, 7, 8)

o Digital libraries: 1 result (# 7)

o Digital libraries AND access: 1 result (# 7)

Topic 1 Articles #’s

Retrieved & Relevant 2 & 7 (Only retrieved in the descriptor field search).

Relevant & Not Retrieved

2 (Article 2 was not retrieved in the abstract search.)

Retrieved & Not Relevant

3, 4, 8

Analysis: My search results yielded 3 articles that were not relevant; I believe this is because they were more applicable to the more general idea of access, but not to the narrowed search, which included digital libraries. When the search was narrowed, none of the irrelevant articles was retrieved. However, when the search was narrowed and Boolean operators were included, the abstract field only retrieved half of the desired results.

Topic 2: I would like to find more information about the usability and design of systems.



Descriptor field search:

o Usability: 3 results (# 6, 17, 20)

o Design: 7 results (# 3, 5, 6, 12, 14, 19, 20)

o Systems: 7 results (# 2, 3, 4, 5, 6, 14, 18)

o Usability AND Design: 2 results (# 6, 20)

o Design AND Systems: 4 results (# 3, 5, 6, 14)

o Systems AND Usability: 1 result (# 6)

o Usability AND Design AND Systems: 1 result (# 6)



Abstract field search:

o Usability: 2 results (# 6, 20)

o Design: 9 results (# 3, 5, 6, 12, 13, 14, 15, 17, 20)

o Systems: 5 results (# 2, 4, 5, 11, 17)

o Usability AND Design: 2 results (# 6, 20)

o Design AND Systems: 2 results (# 5, 17)

o Systems AND Usability: 0 results

o Usability AND Design AND Systems: 0 results
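The combined searches behave like intersections of the single-term result lists. The following is a minimal Python sketch, assuming the database treats AND as "records that match every term" (which is consistent with the counts reported), and using the descriptor field results for Topic 2 as input.

```python
# Descriptor field results for Topic 2, copied from the single-term searches above.
usability = {6, 17, 20}
design = {3, 5, 6, 12, 14, 19, 20}
systems = {2, 3, 4, 5, 6, 14, 18}

# Boolean AND keeps only the records that appear in every result set.
usability_and_design = sorted(usability & design)
design_and_systems = sorted(design & systems)
systems_and_usability = sorted(systems & usability)
all_three = sorted(usability & design & systems)

print(usability_and_design)   # [6, 20]
print(design_and_systems)     # [3, 5, 6, 14]
print(systems_and_usability)  # [6]
print(all_three)              # [6]
```

Each added term can only shrink the result set, which is why the three-term search narrows the descriptor results down to a single article.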

Topic 2 Articles #’s

Retrieved & Not Relevant: 3, 5, 12, 13, 14, 15, 17, 20

Analysis: Searching for this topic was difficult in both fields. Using the descriptor field, I had 12.5% precision when including each of the search terms, whether individually or together. Using all three keyword terms together, I was able to find my desired article. When I searched for the same terms in the abstract field, the searches generated almost exactly the same results, for a small 11% yield.

C2. Evaluation

The key takeaway from this project is the importance and usefulness of a controlled vocabulary, mostly because we had such mixed results when searching the database for selected topics. When using the descriptor field, we found that we were able to retrieve most of the results we wanted, with recall of relevant articles ranging from 80% to 100%. The abstract field yielded similar results, with recall of 100%. Search precision, the database’s ability to retrieve only relevant articles, was all over the place, ranging from 11% to 100%. Our thoughts on this are twofold. First, we wonder how much of the database’s precision and recall had to do with the aboutness of the articles rather than bias in our controlled vocabulary. Having worked together as a group creating the vocabulary terms and entering them into the database, we had a strong familiarity with the controlled vocabulary used for the descriptor field. It is possible that the descriptor search proved more fruitful because we were more familiar with this vocabulary than with the vocabulary used in the article abstracts. Second, we noticed that the descriptor search allowed more modern language and terminology to be used to describe the aboutness of an article. For example, Holley searched for articles about “tagging.” Article 7 in our database includes information about tagging, but its abstract refers to this as “folk classification,” a term related to tagging or folksonomy. Because the article was written before these became the widely accepted terms, a search for “tagging” in the abstract field would not retrieve this article, despite its relevance.
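The “tagging” example can be sketched in a few lines of Python. This is an illustration only: the abstract wording and descriptor list below are invented stand-ins, not copied from article 7, and the two matching functions are simplified assumptions about how a free-text field and a controlled-vocabulary field might be searched.

```python
# Hypothetical record: the abstract wording and descriptor list are invented
# for illustration and are not copied from article 7.
record = {
    "id": 7,
    "abstract": "A study of folk classification in shared bookmark collections.",
    "descriptors": ["tagging", "classification", "bookmarking"],
}

def matches_abstract(rec, term):
    """Case-insensitive substring match against the free-text abstract."""
    return term.lower() in rec["abstract"].lower()

def matches_descriptor(rec, term):
    """Case-insensitive exact match against the controlled-vocabulary descriptors."""
    return term.lower() in [d.lower() for d in rec["descriptors"]]

print(matches_abstract(record, "tagging"))               # False
print(matches_descriptor(record, "tagging"))             # True
print(matches_abstract(record, "folk classification"))   # True
```

The abstract search depends on the author’s wording at the time of writing, while the descriptor search depends on the term the indexer assigned, which is why only the latter finds the record under the current term.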

C3. Reflections

The division of work in this group was organic – each member stepped in to help when

and where they could. As working professionals, we decided it was easier to divide the work

into tasks rather than by roles, so that each group member could take on work as his or her

schedule allowed.

When creating the subject term vocabulary, it was helpful to look at articles which had

keywords assigned by the authors in order to understand which concepts were considered

relevant and helpful. For example, in Virginia Tucker’s article, “The Expert Searcher’s

Information Experience,” after the abstract, the following keywords were included: “information

experience, expert searcher, expertise, search experience, threshold concepts” (Tucker, 2014).

Although we were advised to break many of these down into smaller (mostly one word)

concepts, having an example to follow was very helpful in creating the first draft of the

vocabulary.

The challenges that our group faced when determining the main subject of the articles

were in trying to make concepts that would lend themselves easily to aggregation and

discrimination. When creating concepts to anticipate a user’s search needs, one can be neither

too specific nor too vague. While using operators is useful when grouping terms during a search,

deciding on the most relevant terms without making them redundant proved to be challenging.

In terms of improvement to our database, it would be interesting to allow users to tag

articles with their own vocabulary to see how users label them. Another field

could then be added to the database called “user generated metadata” or simply “user created

tags.” Because this would most likely prove to be redundant to the descriptor fields, the

information could instead be used to evaluate and update the controlled vocabulary. It is in this

way that the designers could ensure that the language used in the descriptor field is up to date

and relevant for the user.
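One way such user tags might feed back into the controlled vocabulary is to count the tags that have no matching descriptor. The following is a minimal Python sketch under stated assumptions: the vocabulary is a small subset of the descriptor list in section A1, and the user tags are hypothetical values invented for illustration.

```python
from collections import Counter

# A subset of the descriptor vocabulary from section A1.
controlled_vocabulary = {"tagging", "taxonomy", "metadata", "classification", "indexing"}

# Hypothetical user-created tags, invented for illustration.
user_tags = ["folksonomy", "tagging", "folksonomy", "metadata",
             "social bookmarking", "folksonomy"]

# Count the tags that have no matching descriptor; frequent ones are
# candidates for review when the vocabulary is next updated.
unmatched = Counter(tag for tag in user_tags if tag not in controlled_vocabulary)

print(unmatched.most_common())
```

Tags that recur often but are absent from the vocabulary would be the first candidates for new or revised descriptors.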

Holley Cornetto

Kuhlthau’s ISP model reflects on the emotional states that a searcher experiences during

the information cycle, and the ability of the user to construct meaningful knowledge from the

information that is retrieved from a search (1991). It was important during the creation of this

database to reflect on the information search process as a whole, and to remember that the

determination and classification of these articles according to their subject reflected only a small

step in the information process as a whole. A user will go through many different stages during

the search process as he or she attempts to find information, narrows down the research query,

and even reformulates the question as new information on the topic is discovered (Kuhlthau,

1991). The database needs to be able to facilitate the evolving needs of the user, and in order to

do that, a user needs to be able to search in multiple ways to extract the information that he or

she is searching for.

Felix Davila III

In developing a research database, information professionals must take into

consideration the variety of ways in which users may go about searching. As Tucker stipulates,

“Students have come to rely on web search engine intelligence to such an extent that they may

fail to formulate a question before charging forward to search for its answer” (2015). It is this

lack of knowledge in searching that keeps even the best database from being used to its full

potential. Creating the database must factor in multiple methods of searching, understanding that

while students and other users may rely on simple keyword searches, they ideally must have a

plan of action before beginning research.

Tucker suggests that users “formulate a question” before moving into search, and then work to understand the concepts within that subject matter. One such idea is “concept fusion,” which allows users to combine similar subject terms into a search with Boolean operators to maximize the chance of retrieving relevant results. “Pearl growing” is another, involving keywords that go from specific to more general in an attempt to garner a larger variety of results (Tucker, 2015). In the database we created, the search fields were designed to give the user a chance to apply these concepts and find the articles that may be relevant to the intended topic.

Tucker explains that users severely underestimate the significance of having a plan for

searching. But using search concepts with the database’s variety of fields, including Title,

Abstract and Descriptor, could maximize the results.

Erin Agobert

I found the analysis portion of this project quite interesting and felt the need to delve

further into the recall and precision results of the database. Abstract searches tended to retrieve

fewer non-relevant articles than the descriptor field search. In addition, I retrieved all articles

deemed relevant to my searches.

Previous research indicates that subject headings are valuable tools for reducing irrelevant hits in searching (Gross & Taylor, 2005). Because abstracts summarize an article, including the abstract within the database provides relevant keyword information that may be missed in developing a controlled vocabulary. This is similar to Gross and Taylor’s finding that if subject headings were removed from or no longer included in catalog records, users performing keyword searches would miss more than one third of the hits they currently retrieve (Gross & Taylor, 2005). Our searches of the descriptor field, which is based on a controlled vocabulary, may support Gross and Taylor’s results.

It was useful to know the target users and focus on their potential needs in developing

vocabulary terms. In project 1, we learned that to develop a database we must understand the

user. This project was another reflection of that lesson. It was important to know we were

developing the database on behalf of MLIS students who have an interest in information

retrieval. I looked at the titles, abstracts, and keywords used within the articles to help develop the

controlled vocabulary.

References

Gross, T. & Taylor, A. (2005). What have we got to lose? The effect of controlled vocabulary on keyword searching results. College & Research Libraries, 66(3).

Kuhlthau, C. C. (1991). Inside the search process: Information seeking from the user's perspective. Journal of the American Society for Information Science, 42(5), 361.

Tucker, V. M. (2014). The expert searcher’s experience of information. Retrieved from http://eprints.qut.edu.au/78024/1/InfoExperience2014_Chapter15_Tucker.pdf

Tucker, V. M. (2015). Sharpening the search saw: Lessons from expert searchers. SJSU School of Information Student Research Journal, 5(1). Retrieved from http://scholarworks.sjsu.edu/slissrj/vol5/iss1/1/
