Wikipedia as source of collaboratively created Knowledge Organization Systems
Digitale Bibliothek
Jakob Voß
Fachhochschule Hannover, 25 June 2009
Wikipedia
de facto standard online reference
> 13 million articles, > 230 languages
run by Wikimedia, run with MediaWiki
open content (CC-BY-SA / GFDL)
it's a wiki!
dense hypertext
anyone can edit (but it's a medium of its own)
http://de.wikipedia.org/wiki/Portal:BID
Structure of Wikipedia
articles
internal and external links
redirects and disambiguation pages
lists, portals, and navigation templates
categories
infoboxes and geodata
(bibliographic) references
revisions, flags, featured content ...
Articles
text, intro, substructure
specific structure for specific article types
(years, people etc.)
Links
[[target]] or [[target|label]]
connect on textual and conceptual level
structure of hyperlinks encodes relations
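The two link forms above can be picked apart with a regular expression. This is a minimal sketch, assuming plain wikitext without nested templates or images; the name `extract_links` and the sample sentence are illustrative and not part of MediaWiki.

```python
import re

# Matches [[target]] and [[target|label]]; nested brackets are not handled.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")

def extract_links(wikitext):
    """Return (target, label) pairs; the label defaults to the target."""
    links = []
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip()
        label = match.group(2)
        label = label.strip() if label else target
        links.append((target, label))
    return links

sample = "The [[Cologne Cathedral|cathedral]] stands in [[Cologne]]."
links = extract_links(sample)
```

Each pair keeps the textual level (the label) and the conceptual level (the target page) side by side, which is the raw material for mining relations from the link structure.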
External links
links to references
links to other structured knowledge bases
authority files (for instance PND)
MusicBrainz, IMDB ...
interlanguage links to other wikipedias
Redirects and disambiguations
control synonyms and homonyms
Lists and Portals
list: lead section followed by a list of links to articles in a particular subject area, such as people or places, or a timeline of events
List of _, Outline of _, Glossary of _, Timeline of _, Index of _ ...
portal: intended to serve as Main Pages for specific topics or areas. May be associated with one or more WikiProjects.
en: ~140 featured portals of ~600 total
http://en.wikipedia.org/wiki/Portal:Featured_portals
Navigation templates
grouping of links used in multiple related articles to facilitate navigation between those articles.
Categories
Nordrhein-Westfalen nach Ort
Ort als Thema
Rheinland
Köln
Kultur (Köln)
Kölner Dom
Geschichte Kölns
Messe Köln
Multi-hierarchy of categories
Tagged article (social tagging, set model)
Kategorien: Katholische Bischofskirche (Deutschland) | Kölner Dom | Weltkulturerbe in Deutschland | Geschütztes Kulturgut | Architekturikone | Gotisches Bauwerk | Historisches Bauwerk | Stadtbezirk Köln-Innenstadt | Kultbau
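Because a category may have several parents, the category system is a graph rather than a tree. The sketch below collects all broader categories of one category by breadth-first search. The parent edges are assumptions made up for illustration from the category names above; they are not a copy of the real category graph.

```python
from collections import deque

# Child category -> parent categories (illustrative edges only).
PARENTS = {
    "Kölner Dom": ["Kultur (Köln)", "Geschichte Kölns"],
    "Kultur (Köln)": ["Köln"],
    "Geschichte Kölns": ["Köln"],
    "Köln": ["Rheinland", "Ort als Thema"],
}

def ancestors(category):
    """Collect all broader categories; the seen set also guards against cycles."""
    seen = set()
    queue = deque([category])
    while queue:
        current = queue.popleft()
        for parent in PARENTS.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

broader = ancestors("Kölner Dom")
```

"Köln" is reached along two paths but reported once, which is the practical difference between a multi-hierarchy and a strict tree.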
Infoboxes and Geodata
structured tables via MediaWiki Templates, a simple field-value-structure
used for cities, animals, bands, chemicals ...
qualifiers problematic: date, unit, source ...
special and popular case:
geographical coordinates
this and the following slides are based on Georgi Kobilarov's presentation.
Field values are not atomic
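A small sketch of what "not atomic" means in practice: a flat infobox call is split into field-value pairs, and one raw value still has to be separated into number and unit. The template name, field names and values are invented for illustration, and `parse_infobox` only handles the simple one-field-per-line layout.

```python
import re

def parse_infobox(wikitext):
    """Split a flat {{Infobox ...}} call into a field -> raw value dict."""
    body = wikitext.strip()[2:-2]          # drop the outer {{ and }}
    parts = body.split("\n|")
    fields = {}
    for part in parts[1:]:                 # parts[0] is the template name
        name, _, value = part.partition("=")
        fields[name.strip()] = value.strip()
    return fields

# Invented sample values, not real figures.
sample = """{{Infobox city
| name = Köln
| population = 1,000,000 (2008)
| area = 400.5 km²
}}"""
fields = parse_infobox(sample)

# The raw value mixes a number and a unit, so it needs a second parsing step.
match = re.match(r"([\d.]+)\s*(\S+)", fields["area"])
area_value, area_unit = float(match.group(1)), match.group(2)
```

The population field shows the qualifier problem as well: the year in parentheses is part of the value string and is lost unless a parser extracts it.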
References
vast amount of bibliographic data
Wikipedia cataloguing rules (sic!)
partly structured via templates:
Examples without templates
Revisions and other metadata
Information about articles
which user changed what at which time
flagged revisions
featured content
...
Interesting data available for wiki research
Wikipedia is not just articles but a structured system of knowledge management
And all of it is available for further processing!
Use as Knowledge Organization System (KOS)
WikiWord
DBpedia
Semantic Tagging
... it's up to you!
Summary
WikiWord
WikiWord builds a multilingual thesaurus
by mining the link structure
Every page describes a concept
Link labels are terms referring to those concepts
Links and categories define relations
Multilingual by merging languages
German Thesis by Daniel Kinzler
http://brightbyte.de/page/WikiWord
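The idea that every page is a concept and every link label a term for it can be sketched in a few lines. This is only an illustration of the principle, not WikiWord's actual implementation; the link pairs and the name `labels_by_concept` are invented.

```python
from collections import defaultdict

# (link target, link label) pairs as they might be extracted from article text.
LINKS = [
    ("Cologne", "Cologne"),
    ("Cologne", "Köln"),
    ("Cologne Cathedral", "cathedral"),
    ("Cologne Cathedral", "Cologne Cathedral"),
    ("Cologne", "Cologne"),
]

def labels_by_concept(links):
    """Group the distinct labels used to refer to each target page."""
    thesaurus = defaultdict(set)
    for target, label in links:
        thesaurus[target].add(label)
    return dict(thesaurus)

thesaurus = labels_by_concept(LINKS)
```

Counting how often each label occurs, instead of storing a set, would give a rough indication of which term is the preferred one.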
WikiWord Thesaurus
English, German, French, Dutch, Norwegian
>20 million labels
>11 million concepts
>2 million definitions
>75 million related links
>11 million hierarchical links
Available in SKOS/RDF
Source code available to generate more
Resource Description Framework
RDF is URI + Unicode + Triples [+ Rules]
triple: subject -- predicate -- object
literal objects: "Object"@lang or "Object"^^type-URI
RDF example (this: SKOS)
agro:c385 skos:prefLabel "Ananas"@de
agro:c385 rdf:type skos:Concept
URI namespaces for abbreviation
@prefix skos: .
@prefix agro: .
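RDF formats
graph
http://d-nb.info/96327841X -- dc:title --> "Zettelwirtschaft"
http://d-nb.info/96327841X -- dc:creator --> http://d-nb.info/gnd/13150794X
http://d-nb.info/gnd/13150794X -- foaf:firstName --> "Markus"
http://d-nb.info/gnd/13150794X -- foaf:secondName --> "Krajewski"
N3
@prefix foaf .
@prefix dc .
<http://d-nb.info/96327841X> dc:title "Zettelwirtschaft" ;
    dc:creator <http://d-nb.info/gnd/13150794X> .
<http://d-nb.info/gnd/13150794X> foaf:firstName "Markus" ;
    foaf:secondName "Krajewski" .
RDF/XML format
Zettelwirtschaft
Markus Krajewski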
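To make the point that the graph and its serializations carry the same triples, the book record can be held as plain Python tuples and written out in a third, line-based notation (N-Triples). This is a sketch only: the two namespace URIs are assumptions, since the slide's own prefix values are not shown, and the property names are copied from the slide as they stand.

```python
# Assumed namespace URIs (not taken from the slide).
DC = "http://purl.org/dc/elements/1.1/"
FOAF = "http://xmlns.com/foaf/0.1/"

BOOK = "http://d-nb.info/96327841X"
AUTHOR = "http://d-nb.info/gnd/13150794X"

# (subject URI, predicate URI, (kind, value)); kind is "uri" or "lit".
TRIPLES = [
    (BOOK, DC + "title", ("lit", "Zettelwirtschaft")),
    (BOOK, DC + "creator", ("uri", AUTHOR)),
    (AUTHOR, FOAF + "firstName", ("lit", "Markus")),
    (AUTHOR, FOAF + "secondName", ("lit", "Krajewski")),
]

def to_ntriples(triples):
    """Write one 'subject predicate object .' statement per line."""
    lines = []
    for subject, predicate, (kind, value) in triples:
        if kind == "uri":
            obj = "<%s>" % value
        else:
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            obj = '"%s"' % escaped
        lines.append("<%s> <%s> %s ." % (subject, predicate, obj))
    return "\n".join(lines)

output = to_ntriples(TRIPLES)
```

N3 and RDF/XML differ from this output only in how they abbreviate and nest the same four statements.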
initiative to connect and publish open collections of data with RDF on the Web
one of the largest collections and main hub:
DBpedia (http://dbpedia.org)
DBpedia Extraction framework
http://dbpedia.svn.sourceforge.net (Open Source)
Wikipedia -> Extraction -> Triple Store
core ontology
people
places
organizations
events
works
specific infoboxes
Parsers for each field
RDF Triples
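The step from infobox fields through per-field parsers to triples can be sketched as a small mapping table. This is not code from the DBpedia extraction framework; the `ex:` property names, the parsers and the sample values are hypothetical.

```python
def parse_int(raw):
    """Read the leading integer, ignoring thousands separators."""
    digits = raw.split()[0].replace(",", "")
    return int(digits)

def parse_text(raw):
    """Keep the text as it is, minus surrounding whitespace."""
    return raw.strip()

# Infobox field -> (hypothetical property name, parser for that field).
MAPPING = {
    "name": ("ex:name", parse_text),
    "population": ("ex:population", parse_int),
}

def extract(subject, fields):
    """Turn known infobox fields into (subject, property, value) triples."""
    triples = []
    for field, raw in fields.items():
        if field in MAPPING:
            prop, parser = MAPPING[field]
            triples.append((subject, prop, parser(raw)))
    return triples

# Invented sample values; the unmapped "mayor" field is skipped.
triples = extract(
    "ex:Cologne",
    {"name": " Köln ", "population": "1,000,000 (2008)", "mayor": "unknown"},
)
```

Keeping the mapping as data rather than code is what would let a community maintain it, in the spirit of the crowd-sourced extraction mentioned next.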
Crowd Sourced Extraction
Wikipedia -> Extraction -> Triple Store
Linked Data
Use URIs to identify things.
Use HTTP URIs so that things can be looked up.
When someone looks up a URI, provide useful information.
Offer links to other URIs so that further things can be looked up.
Tim Berners-Lee (2006): Linked Data Design Issues http://www.w3.org/DesignIssues/LinkedData.html
May 2007
September 2008
March 2009
More complex queries
examples
people born in 1965 that
contributed music in films
books about these people
notation in SPARQL (SQL for RDF)
one of several ways to access Semantic Web
Example query
Dancer in the Dark -- music -- Björk -- born -- 1965
? -- music -- ? -- born -- 1965
Films whose music was made by someone born in 1965?
Problem: the predicates music (made-music-in) and born (was-born-in-year) must be known and used consistently!
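The example query is a join over two triple patterns. The sketch below runs it over a tiny in-memory triple set instead of a SPARQL endpoint. It assumes that `music` points from the film to the person; the second film and its composer are invented filler data.

```python
# Predicate names follow the slide; the direction of "music" is an assumption.
TRIPLES = {
    ("Dancer in the Dark", "music", "Björk"),
    ("Björk", "born", 1965),
    ("Some Other Film", "music", "Some Composer"),   # invented filler data
    ("Some Composer", "born", 1950),                 # invented filler data
}

def films_with_music_by_people_born_in(year, triples):
    """Match ?film music ?person and ?person born year, then return the films."""
    born_that_year = {s for (s, p, o) in triples if p == "born" and o == year}
    return sorted(
        s for (s, p, o) in triples if p == "music" and o in born_that_year
    )

films = films_with_music_by_people_born_in(1965, TRIPLES)
```

The join only works because both patterns use exactly the same predicate names as the data, which is the problem named above: a source that says `composer` instead of `music` would silently return nothing.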
Linking sources
DBpedia: Dancer in the Dark -- dc:creator -- Björk -- dbpedia:birthYear -- 1965
OPAC: book about Björk -- dc:subject -- PND:119525054
Björk owl:sameAs PND:119525054
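The owl:sameAs link is what allows a question to cross from one source to the other. A minimal sketch, assuming two separate triple lists; the identifier `opac:book-1` and the `dbpedia:Björk` spelling are hypothetical placeholders, while the PND number is the one given above.

```python
DBPEDIA = [
    ("dbpedia:Björk", "dbpedia:birthYear", 1965),
    ("dbpedia:Björk", "owl:sameAs", "PND:119525054"),
]
OPAC = [
    ("opac:book-1", "dc:subject", "PND:119525054"),   # hypothetical record id
]

def books_about_people_born_in(year, dbpedia, opac):
    """Find people by birth year in DBpedia, then their books in the catalogue."""
    people = {s for (s, p, o) in dbpedia
              if p == "dbpedia:birthYear" and o == year}
    authority_ids = {o for (s, p, o) in dbpedia
                     if p == "owl:sameAs" and s in people}
    return [s for (s, p, o) in opac
            if p == "dc:subject" and o in authority_ids]

books = books_about_people_born_in(1965, DBPEDIA, OPAC)
```

Neither source alone can answer the question: the catalogue knows the subject but not the birth year, and DBpedia knows the birth year but not the book.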
Inference rules
if ... then also ...
a frbr:creator B => B rdf:type frbr:Work
danger of inference and discrimination
Bowker and Star (1999): Sorting Things out. Classification and its consequences
Voss (2007): The Semantic Web and
why Wikipedia should bother.
reality is fuzzy, data is not
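An "if ... then also ..." rule is mechanical to apply, which is both its strength and its danger. The sketch below applies the rule exactly as written above to a set of triples; the `ex:` identifiers are hypothetical.

```python
def apply_creator_rule(triples):
    """For every (A, frbr:creator, B) add (B, rdf:type, frbr:Work)."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        if predicate == "frbr:creator":
            inferred.add((obj, "rdf:type", "frbr:Work"))
    return inferred

facts = {("ex:a", "frbr:creator", "ex:b")}
closed = apply_creator_rule(facts)
```

The rule fires on every matching triple without checking whether the input was correct, so a single mistaken frbr:creator statement produces a mistaken type statement that looks just as authoritative as the rest of the data.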
Semantic Tagging
assign controlled concepts to resources
subject indexing reinvented
practised at BBC (!) with DBpedia concepts
SKOS and CommonTags ontology
Open Issues
user interfaces for query and display
data quality (needs humans)
fuzzy concepts and mapping (e.g. languages)
versioning and changes
underestimated regularly
interesting research topics
References
Wikipedia itself (practise editing and discussion!)
Kinzler (2008): Automatischer Aufbau eines multilingualen Thesaurus durch Extraktion semantischer und lexikalischer Relationen aus der Wikipedia.
Kobilarov, Bizer, Auer & Lehmann (2009): DBpedia - A Linked Data Hub and Data Source for Web and Enterprise Applications.
Voß (2006): Collaborative thesaurus tagging the Wikipedia way.
What do you think?
Jakob Voß: Wikipedia als Grundlage zur gemeinsamen Erstellung von Begriffsnetzen (Wikipedia as a basis for the collaborative creation of concept networks)
25 June 2009 at Fachhochschule Hannover
Unless stated otherwise, the contents of this presentation are released by Jakob Voß under the Creative Commons Attribution-Share Alike 3.0 Unported license.