Relevance of classification and indexing
in the organization of internet resources
The general opinion is that the digital age is wiping out the centuries-old library system.
There is a feeling that libraries and librarians are obsolete in the present digital era.
Two questions generally faced by LIS professionals are: ‘What will be the future of libraries?’ and ‘Why organize information if you can find it on the internet?’
Will Sherman: 33 Reasons why libraries and librarians are still important (http://www.degreetutor.com/library)
- Not everything is available on the internet
- Digital libraries are not the internet
- The internet complements libraries but does not replace them
- The internet is not free
- Digitization does not mean destruction; in fact, it means survival
- Libraries are not just books
- Like businesses, digital libraries still need human beings
- Eliminating libraries would cut short cultural evolution
- The internet is a mess, while libraries organize knowledge
Librarians employed three important tools for knowledge organization (KO):
- a data element directory (cataloguing manual);
- a classification scheme for categorization of documents; and
- a thesaurus (vocabulary control tool) for consistent indexing (assigning index terms).
The web has grown without any of these tools, and so remains unorganized. (Devadason, F.J. Facet analysis and semantic Web: Musings of a student of Ranganathan. http://www.reocities.com/Athens/5041/FASEMWEB.html)
However, the issues are:
- the enormous quantity of information outside libraries;
- how to collect and organize the world’s knowledge.
TRADITIONAL
- Classification – shelf arrangement
- Catalogue – identification and location of information
- Analysis & consolidation – indexing / abstracting for micro documents
Result:
- improved precision or recall
- provide context for search terms
- enable browsing
- access to related information with meaningful relationships
- serve as a mechanism for switching between languages
WEB BASED
- Search engines
- Subject gateways
- Directories
Result: The web is a sea of all kinds of data
- difficult to find, access & retrieve pertinent information
- extremely unorganized data
- too many false and missing links
E.g. ‘Building and architecture’, ‘Travel and hotel’
Difference: use of subject descriptors.
Directories:
- could not cope with the scale of Web growth
- were often built by amateurs in classification and vocabulary management
- were biased by the commercial use of the Web
Vocabularies:
- Open Directory categories
- Wikipedia categories
Metadata in html <head>:
- spammed, not in sync with the content
- ignored by most search engines now
Bottom line: the Web is not, and will never be, an organized library. (Vatant, B. Porting library vocabularies to the Semantic Web, and back: A win-win round trip. IFLA 2010, Gothenburg)
E.g. works on M. K. Gandhi.
Library – the art of librarianship has been used for thousands of years to organise knowledge: catalogue / librarian – class no. – shelf.
Search engines:
- collections are built by robots; number count
- aim for exhaustive indexing
- offer automatically generated metadata
Subject gateways:
- collections are built by humans
- aim to develop catalogues of high-quality resources
- offer human-generated metadata
Can we apply classification principles? Can we apply metadata? Can we apply indexing techniques?
Two distinct ways of finding resources on the Internet emerged (Dodd 1996).
- the use of robot or spider based search engines and
- producing ‘hotlists’, which would encourage users to browse the Web.
This production of hierarchically arranged lists brought in the use of Library classification schemes
Subject directories like Yahoo! and other quality controlled subject gateways started use of classification schemes to enhance searching the Net.
They maximize the retrievability / visibility of information: clustering, browsing. e.g. LIS education through distance mode
Electronic versions of classification schemes (WebDewey, UDC Online) made it easy to adopt them on the web.
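The classification-based browsing these directories offer can be sketched in a few lines. In the sketch below, resources carry DDC-style notations and a browse request for a class returns everything filed at or below it; the titles and notations are illustrative examples, not records from any real catalogue.

```python
# A minimal sketch of classification-based browsing, as used by
# directories such as BUBL: each resource is tagged with a DDC-style
# notation, and browsing a class lists every resource whose notation
# falls under it (hierarchy is expressed by the notation prefix).

resources = {
    "020":    ["Library and information sciences portal"],
    "025.4":  ["Subject analysis tutorial"],
    "025.43": ["Guide to the Dewey Decimal Classification"],
    "600":    ["Technology gateway"],
}

def browse(class_number: str) -> list[str]:
    """Return all resources filed at or below the given class."""
    hits = []
    for notation, titles in sorted(resources.items()):
        if notation.startswith(class_number):
            hits.extend(titles)
    return hits

print(browse("025"))  # both resources under 025 (library operations)
```

Because hierarchy lives in the notation itself, clustering and browsing come almost for free; this is exactly what makes enumerative schemes attractive for directory interfaces.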
The Web, as an information environment, differs from the controlled setting of a traditional information retrieval system.
The question is how, and to what extent, a classification is actually used to support subject access on the web.
Many Web sites, like Google and Yahoo, use hierarchical classification trees to organize text resources in Web.
Subject gateways offer hierarchical browse structures based on subject classification schemes.
The DDC was adapted earlier and more quickly to usage in digital systems via the Internet.
It is completely and easily available as "WebDewey" for all Web browsers and platforms.
Examples:
- Library and Archives Canada (LAC) has capitalized on the Dewey Decimal Classification (DDC) potential for organizing Web resources in two projects.
- ADAM, the Art, Design, Architecture & Media Information Gateway
- Biz/ed, a subject gateway for business education
- BUBL, which uses the Dewey Decimal Classification system as the primary organisation structure for its catalogue of Internet resources
- The National Library of Canada's Canadian Information by Subject service
Since 1993, UDC has been used in subject gateways, and it has become more prevalent in East European SGs, portals and hubs since 2000.
UDC in SGs appeared to be linked to the following types of applications:
- manual classification of manually collected links in small to medium-size directories (from a few hundred to a few thousand resources)
- manual classification of a large number of automatically harvested resources using harvesting and metadata creation tools and more advanced technology (quality-controlled SGs)
- automatic harvesting and classification (quality-controlled SGs)
(Aida Slavic. UDC in subject gateways: experiment or opportunity? Knowledge Organization, 33, 2006)
Examples: WAIS (Wide Area Information Server), NISS (National Information Services and Systems), INTUTE, FVL (Finnish Virtual Library), GERHARD (German Harvest Automated Retrieval and Directory), PORT (Maritime Information Gateway), OKO (Slovenian catalogue of Web resources), etc.
However, they do not display the UDC structure on the interface or UDC numbers in the metadata.
The UDC is probably more "modern" and has made faster progress towards a faceted structure.
Descriptive metadata facilitates the discovery of relevant information.
In addition to resource discovery, metadata can help organize electronic resources, facilitate interoperability and legacy resource integration, provide digital identification, and support archiving and preservation.
The process is automatic and cost-effective. With descriptive metadata, the medium of a resource becomes a non-issue, which enables DC metadata to be used by any organization for cataloguing specialized types of mixed-media collections.
Indexing: pre- and post-coordinated; derived and assigned; context-based; thesaurus and classaurus (a classaurus is a faceted scheme of terms indicating hierarchy, enriched with synonyms).
Two concepts – semantics and syntax. The purpose is to achieve precision out of recalled information. Humans can do this since it is natural language; machines are ignorant of meaning and cannot make sense of it.
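Post-coordinate indexing, mentioned above, is simple to demonstrate: descriptors are assigned at indexing time and coordinated only at search time by intersecting posting sets. The documents and descriptors below are invented for illustration, not drawn from any real vocabulary.

```python
# A minimal sketch of post-coordinate indexing: each document gets a
# set of descriptors when indexed, and a query coordinates concepts
# at search time by intersecting the posting sets.

from collections import defaultdict

documents = {
    1: {"classification", "web"},
    2: {"classification", "libraries"},
    3: {"indexing", "web"},
    4: {"classification", "indexing", "web"},
}

# Build the inverted index: descriptor -> set of document ids.
index = defaultdict(set)
for doc_id, descriptors in documents.items():
    for term in descriptors:
        index[term].add(doc_id)

def search(*terms: str) -> set[int]:
    """Coordinate the query terms with set intersection (logical AND)."""
    postings = [index[t] for t in terms]
    return set.intersection(*postings) if postings else set()

print(search("classification", "web"))  # {1, 4}
```

The intersection is what raises precision out of the recalled set: each added descriptor narrows the result to documents about all the coordinated concepts.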
How to achieve precision out of recalled information on the Web?
Relationships are categorized as:
- hierarchical (internal) – whole–part, composition
- non-hierarchical (external) – associative and equivalent
Applications in different areas:
- design of classifications (thesauri)
- knowledge organization and information retrieval (search strategies)
- lexical cohesion, epistemology, etc.
- design and development of databases
- Web design and development
- artificial intelligence
- text analysis and summarization
- hypermedia
Creating representations of Web pages: providing standard identifiers (URIs) associated with an access protocol (http).
The WWW is based on HTML / XML hierarchies for coding a body of text and images (multimedia) and linking things together via the http protocol, hypertext, etc.
Use of vocabularies as subject descriptors to organize Web content as in libraries: taxonomies, subject headings, classifications.
- That’s where the library heritage is strong and the Web is weak.
- Such vocabularies can be structuring for the web of data as they are for libraries.
- But the scale is greater than in a library – the process should be automated.
Semantic enhancement of scholarly journal articles, by aiding the publication of data and metadata and providing ‘lively’ interactive access, is necessary.
Such semantic enhancements are already being undertaken by leading STM publishers.
The application of structured vocabularies, using artificial intelligence, is the ‘semantic Web’.
Tim Berners-Lee: computer scientist at MIT, USA; creator of the WWW; Director of the W3C (World Wide Web Consortium); developer of the Semantic Web.
Intention: to enhance the usability and usefulness of the web and its connected resources.
“I have a dream for the Web [in which computers] become capable of analysing all the data on the Web – the content, links, and transactions between people and computers. A ‘Semantic Web’, which should make this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The ‘intelligent agents’ people have touted for ages will finally materialize.”
—Tim Berners-Lee, 1999
Technologies enabling machines to make more sense of the Web make the Web more useful for humans.
This means radically improving the ability to find, sort, and classify information: an activity that takes up a large part of users’ time on the Web.
The Semantic Web is a project that intends to create a universal medium for information exchange by putting documents with computer-processable meaning (semantics) on the World Wide Web.
“The Semantic Web is an extension of the current Web that will allow you to find, share, and combine information more easily. It relies on machine-readable information and metadata expressed in RDF.”
www.noisebetweenstations.com/personal/essays/metadata_glossary/metadata_glossary.html
Humans can easily connect the data when browsing the Web: e.g. we disregard advertisements, and we know which links are interesting for our purpose (job – resume; air ticket – flights) – but machines can’t!
E.g. automatic airline reservation can be done (Ivan Herman, W3C) by combining local knowledge with remote services: airline preferences, dietary requirements, calendaring.
E.g. a computer can find the nearest plastic surgeon and book an appointment that fits a personal schedule.
XML provides a surface syntax for structured documents, but imposes no semantic constraints on the meaning of these documents.
XML SCHEMA is a language for restricting the structure of XML documents.
RDF is a simple data model for referring to objects (“resources") and how they are related. An RDF-based model can be represented in XML syntax.
RDF Schema is a vocabulary for describing properties and classes of RDF resources, with semantics for generalization-hierarchies of such properties and classes.
OWL adds more vocabulary for describing properties and classes: among others, relations between classes (e.g. disjointness), cardinality (e.g. "exactly one"), equality, richer typing of properties, characteristics of properties (e.g. symmetry), and enumerated classes.
URI – Uniform Resource Identifier – used as a universal naming tool, including for properties.
NAME SPACE is a context in which a group of one or more identifiers might exist. An identifier defined in a namespace is associated with that namespace. E.g. Employee ID 123. Many modern computer languages provide support for namespaces.
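The role of namespaces can be made concrete with a small sketch: the same local name means different things under different namespaces, and prefix expansion makes the distinction explicit. The prefix bindings below follow common RDF usage (Dublin Core and FOAF); the `expand` helper is an illustrative name, not a standard API.

```python
# A small sketch of namespace-qualified names: a prefixed name such
# as "dc:title" is expanded to a full URI by looking up the prefix,
# so identifiers from different vocabularies never collide.

namespaces = {
    "dc":   "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

def expand(qname: str) -> str:
    """Expand a prefixed name like 'dc:title' to a full URI."""
    prefix, local = qname.split(":", 1)
    return namespaces[prefix] + local

print(expand("dc:title"))   # http://purl.org/dc/elements/1.1/title
print(expand("foaf:name"))  # http://xmlns.com/foaf/0.1/name
```

This is exactly the mechanism XML namespaces and RDF syntaxes rely on: short prefixes for humans, globally unique URIs for machines.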
All these are based on knowledge representation algorithms – weak AI, so to say.
The primary facilitators of this technology are URIs, which identify resources, along with XML and namespaces.
These, with a bit of logic, form RDF, which can be used to say anything about anything.
FOAF: a popular application of the semantic web is Friend of a Friend (FOAF), which describes relationships among people and other agents in terms of RDF.
The web is changing and offering new possibilities for communication and interaction by combining the concepts on the web. This is made possible by XML.
XML provides an interoperable syntactical foundation that makes it possible to represent relationships and build meaning.
RDF is an XML-based standard for describing resources that exist on the web.
RDF is a model for such relationships and for interchange; RDF is the standard interchange format on the semantic web. Once information is in RDF form, it becomes easy to process, since RDF is a generic format.
It is a model of (s p o) triplets with p naming the relationship between s and o
RDF is a graph: i.e., a set of RDF statements is a directed, labeled graph
- the nodes represent the resources that are bound
- the labeled edges are the relationships with their names
With an RDF application, it is easy to know which bits of data are the semantics of the application, and which bits are just syntactic fluff.
RDF statements describe a resource, the resource's properties and the values of those properties.
RDF statements are often referred to as “triples”, consisting of a subject, predicate and object, which correspond to a resource (subject), a property (predicate) and a property value (object).
This piece of RDF basically says that this article has the title "The Semantic Web: An Introduction", and was written by someone whose name is "Sean B. Palmer". Here are the triples that this RDF produces:
<> <http://purl.org/dc/elements/1.1/creator> _:x0 .
<> <http://purl.org/dc/elements/1.1/title> "The Semantic Web: An Introduction" .
_:x0 <http://xmlns.com/0.1/foaf/name> "Sean B. Palmer" .
<rdf:Description rdf:about="http://www.ivan-herman.net">
<foaf:name>Ivan</foaf:name>
<abc:myCalendar rdf:resource="http://…/myCalendar"/>
<foaf:surname>Herman</foaf:surname>
</rdf:Description>
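The triple model itself is small enough to sketch without any RDF library: a statement is a (subject, predicate, object) tuple and a graph is a set of such tuples. The sketch below mirrors the FOAF example above; it is an illustration of the data model, not an implementation of RDF.

```python
# A sketch of the RDF data model in plain Python: each statement is a
# (subject, predicate, object) triple, and a graph is a set of triples.

FOAF = "http://xmlns.com/foaf/0.1/"
me = "http://www.ivan-herman.net"

graph = {
    (me, FOAF + "name",    "Ivan"),
    (me, FOAF + "surname", "Herman"),
}

def objects(graph, subject, predicate):
    """All objects of statements with the given subject and predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}

print(objects(graph, me, FOAF + "name"))  # {'Ivan'}
```

Note how the graph structure described earlier falls out naturally: subjects and objects are the nodes, predicates are the labelled edges between them.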
A URI is simply a web identifier, like the strings starting with “http:” or “ftp:”. Anyone can create a URI, and ownership of URIs is clearly delegated, so they form an ideal base technology on which to build a global web.
Resources on the web are identified by URIs, which use a global naming convention.
The W3C maintains a list of URI schemes.
- URIs made the merge possible
- URIs ground RDF into the Web
- URIs make this the Semantic Web
Ontological analysis clarifies the structure of knowledge.
- An ontology is defined as the terms used to describe and represent an area of knowledge; ontologies are explicit specifications of a conceptualization.
- Ontology is the study of the ‘categories of things that exist or may exist in some domain’.
- A common ontology defines the vocabulary with which queries and assertions are exchanged among agents.
- Ontologies are the rules that help integration and operate on a globally shared theory.
- They are often equated with taxonomic hierarchies of classes, but need not be limited to this form, as an ontology adds knowledge about the world.
The semantic Web is generally built on syntaxes which use URIs to represent data, usually in triples based structures i.e. many triples of URI data that can be held in databases, or interchanged on the WWW using a particular syntax developed especially for the task. These syntaxes are called “Resource Description Framework” Syntaxes.
The application of Semantic Web is to create relations among resources on the Web and to interchange those data, like (hyper) links on the traditional web, except that:
- there is no notion of a “current” document; i.e., a relationship can hold between any two resources
- a relationship must have a name: a link to my CV should be differentiated from a link to my calendar
- there is no attached user-interface action like for a hyperlink
- Map the various data onto an abstract data representation, making the data independent of its internal representation.
- Merge the resulting representations.
- Start making queries on the whole – queries that could not have been done on the individual data sets!
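This merge-then-query idea is easy to sketch with triples as plain tuples: two independent data sets that use the same URI for a resource can simply be unioned, and a query over the merged graph answers a question neither set could answer alone. All URIs, names and the query helper below are invented for illustration.

```python
# A sketch of merging two triple sets and querying the whole.
# Because both sets identify the library with the same URI-like
# string, merging is just set union, and a two-step query can
# follow links across data that came from different sources.

site_a = {
    ("ex:alice", "ex:worksAt", "ex:library"),
}
site_b = {
    ("ex:library", "ex:locatedIn", "ex:gothenburg"),
}

merged = site_a | site_b  # merging is just set union

def workplace_city(person):
    """Follow worksAt, then locatedIn, across the merged graph."""
    workplaces = {o for s, p, o in merged
                  if s == person and p == "ex:worksAt"}
    return {o for s, p, o in merged
            if s in workplaces and p == "ex:locatedIn"}

print(workplace_city("ex:alice"))  # {'ex:gothenburg'}
```

Neither `site_a` nor `site_b` alone can say which city Alice works in; the answer only exists on the merged graph, which is precisely the point of grounding RDF in shared URIs.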
The Web lacks the coordination and organization of a traditional library.
Practice has shown that the use of traditional library tools and techniques can be a great help in taming the Net.
The IFLA Information Technology section, with support of Cataloguing section, Classification and Indexing section, and Knowledge Management section, proposes the creation of a Semantic Web Special Interest Group (SWSIG) within IFLA.
The SWSIG intends to be a platform where interested professionals could gather, and undertake whatever tasks are needed to develop, enhance and facilitate the adoption of semantic Web technologies in the library community.
Librarians should start research projects to develop better techniques of organizing the web. Modern classification research must find order, especially in the context of the complexities of the Internet.