Multilingualism ifla 2014 08

Post on 14-Dec-2014


OCLC's 3 overlapping projects aim to generate true multi-lingual displays and to generate translation records for sharing via VIAF.


IFLA - Lyon, France 19 August 2014

Janifer Gatenby

Multilingualism in WorldCat and VIAF

Working with Karen Smith-Yoshimura, Robert Bremer, Eric Childress, Jean Godby, Richard Greene, JD Shipengrover, Gail Thornburg, Jenny Toves, Diane Vizine-Goetz, Shenghui Wang, Jay Weitz

WorldCat Today

• Resources in nearly all languages

• Contributed by more than 20,000 libraries worldwide

• More than half the database is for works not in English

Languages

English, German, French, Spanish, Chinese, Dutch, Japanese, Russian, Arabic, 469 others

• Bibliographic Records
– Hybrid records
– Parallel records

• Clustered at Work level (FRBR)

WorldCat Today

Existing Architecture

[Diagram: a bibliographic record with its linked authors, subjects, classification and holdings, grouped into a work cluster, a content cluster and a manifestation cluster]

Complementary Initiatives

• Work Level Record
• GLIMIR: Manifestation & Content Clusters
• Multi-lingual Bibliographic Structure

Objective: Work Level Record

Create a consolidated metadata summary for the content of a work

Work Level Record: http://www.oclc.org/research/activities/workrecs.html

Coming Q1 2015

GLIMIR: Objective

Create better work presentations

• The Content Cluster
– Enables better work record displays by reducing the number of lines that display for large works
– Enables a choice of format and presents the formats that could be acceptable substitutes
– Consolidates holdings for identical content

• The Manifestation Cluster is important
– Consolidates holdings at manifestation level
– In the short term allows the record catalogued in the language of the interface to be chosen for display
– Reduces apparent duplication
– Allows a more accurate count of the number of manifestations in WorldCat (as opposed to the number of records)

GLIMIR

Users like C

Cataloguers & scholars like C

Manifestation Clustering

So far 103 million records processed (about 30%)

Manifestation Cluster Opened

SRU Search:

Loti Pêcheur d’islande (Work ID 21536567)

Level            Records   Holdings
Work             18        148
Content          14        143
Manifestation    7         115
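The three rows above count the same material at progressively finer clustering levels. A minimal sketch of how such counts could be produced, assuming each bibliographic record carries a work, content and manifestation cluster ID plus a holdings count. The `BibRecord` fields and the toy data are invented for illustration and are not the actual figures for the Loti work:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class BibRecord:
    """Hypothetical, simplified bibliographic record (field names are assumptions)."""
    record_id: str
    work_id: str
    content_id: str
    manifestation_id: str
    holdings: int


def summarise(records, level):
    """Group records by the cluster ID at `level`; return {cluster_id: (records, holdings)}."""
    totals = defaultdict(lambda: [0, 0])
    for rec in records:
        key = getattr(rec, f"{level}_id")
        totals[key][0] += 1             # one more record in this cluster
        totals[key][1] += rec.holdings  # holdings are consolidated per cluster
    return {key: tuple(value) for key, value in totals.items()}


# Toy data, invented for illustration only.
RECORDS = [
    BibRecord("r1", "w1", "c1", "m1", 10),
    BibRecord("r2", "w1", "c1", "m1", 5),   # same manifestation, catalogued twice
    BibRecord("r3", "w1", "c1", "m2", 3),   # same content, different manifestation
    BibRecord("r4", "w1", "c2", "m3", 2),   # same work, different content
]

work_summary = summarise(RECORDS, "work")
content_summary = summarise(RECORDS, "content")
manifestation_summary = summarise(RECORDS, "manifestation")
```

In this toy set the two records for manifestation `m1` collapse into one cluster with consolidated holdings, which is the "reduces apparent duplication" effect described for the Manifestation Cluster.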

Objective: Improve displays; surface translations

Multilingual Bibliographic Structure Project

• Creates true multi-lingual displays
– At work and manifestation levels
– Using all available data instead of “most appropriate record”
– Generates data
• Corrects many of the 28 million records coded “und”
• Better control and linking of translations
• Input to refinement of work clusters
• Smarter data storage

Multilingual Bibliographic Structure Project

• WorldCat.org selects the most appropriate record to show to a user as representative of the work in the short result list and beyond

• The end result will not be very satisfactory from a multi-lingual viewpoint… here’s why

“Most appropriate” questioned

Which record is better to present to a German speaker?

Incomplete Swedish Record

Hybrid record

Build the display from all available data

Most appropriate display

• Work level data, mined from all associated bibliographic records, will be displayed, supplemented with expression / manifestation level data as the user drills through from the short to the fuller versions of the metadata.
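Building the display from all available data, rather than from one "most appropriate" record, can be sketched as pooling language-tagged values across the records of a work cluster and then choosing, field by field, the best match for the user. This is only an illustration: the record layout, the field names and the fallback rule are assumptions, and the toy cluster is invented.

```python
from collections import defaultdict


def pool_fields(records):
    """Merge language-tagged field values from every record in a work cluster.

    Each record is a dict: {field_name: {language_code: value}}.
    The first value seen for a (field, language) pair is kept.
    """
    pooled = defaultdict(dict)
    for record in records:
        for field, by_language in record.items():
            for language, value in by_language.items():
                pooled[field].setdefault(language, value)
    return pooled


def build_display(records, preferred_languages):
    """Pick, per field, the value in the first preferred language that has data."""
    display = {}
    for field, by_language in pool_fields(records).items():
        for language in preferred_languages:
            if language in by_language:
                display[field] = by_language[language]
                break
        else:
            # No preferred language available: fall back deterministically
            # to the alphabetically first language code that has a value.
            first_language = sorted(by_language)[0]
            display[field] = by_language[first_language]
    return display


# Toy cluster, invented for illustration: an incomplete record and a fuller one.
cluster = [
    {"title": {"swe": "Krig och fred"}},
    {"title": {"ger": "Krieg und Frieden"},
     "summary": {"eng": "A novel set during the Napoleonic wars."}},
]

german_view = build_display(cluster, ["ger", "eng"])
french_view = build_display(cluster, ["fre"])
```

A German speaker gets the German title plus the English summary taken from whichever record supplies it, so no single record has to be complete.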

Multilingual Bibliographic Structure Project

The end user interface will show works and manifestations, not bibliographic records; the cataloguing client will also show bibliographic records

Proposed new architecture

[Diagram: Work; Manifestations (Manif), each tagged with a cataloguing language (eng, fre), with notes, contents and holdings; Subjects and Classification (Subj, Classif) and Authors, each with eng, fre, ger and jpn forms; Translations (language of work)]

• Language tagging of elements, particularly
– Summaries (MARC 21 field 520)
– Subject headings
• Display in script preferred by the user if data is available
• Improve translated interfaces
• Show consolidated holdings as appropriate

Important principles

Surfacing the “cream”

Translations

• The cream of the world’s cultural and knowledge heritage is shared by being translated

• WorldCat contains many rich cataloguing records for these translations

Great works are translated

GOAL: Data mine the really good records to improve clustering, presentation, authority records and linked data

Ιλιάδα: The Iliad

紅樓夢: Dream of the Red Chamber

Война и миръ: War and Peace

ঘরে বাইরে: The Home and the World

સત્યના પ્રયોગો અથવા આત્મકથા: The Story of My Experiments with Truth [Gandhi autobiography]

源氏物語: The Tale of Genji

דער בעל-תשובה: The Penitent

زقاق المدق: Midaq Alley

Leo Tolstoy: 32 languages
Homer: 28 languages
Rabindranath Tagore: 21 languages
Isaac Bashevis Singer: 17 languages
Najib Mahfuz: 12 languages
Cao Xueqin: 9 languages
Mahatma Gandhi: 7 languages
Murasaki Shikibu: 7 languages

Translations

• Inconsistencies cause work clusters to be incomplete, resulting in less than optimal search results
– Titles without subtitles
– Missing or different forms of uniform title
– Inverted title
– Different coding of original and translated information

Improving work clustering

Generated uniform title authority records will overcome most of these differences without needing to edit individual records
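To make the inconsistencies listed above concrete, here is a rough sketch of a match key that ignores subtitles, inverted articles and diacritics. It is an illustrative normalisation only, not the method used to generate uniform title authority records; the rules and the variant titles are assumptions chosen for the example.

```python
import unicodedata


def title_key(title):
    """Build a crude match key: drop the subtitle, undo 'Title, The' inversion,
    strip accents and punctuation, and lower-case. Illustrative rules only."""
    # Keep only the main title (text before the first colon).
    main = title.split(":", 1)[0].strip()
    # Undo an inverted leading article, e.g. "Iliad, The" -> "The Iliad".
    if "," in main:
        head, tail = main.rsplit(",", 1)
        if tail.strip().lower() in {"the", "a", "an"}:
            main = f"{tail.strip()} {head.strip()}"
    # Strip diacritics, then keep only letters, digits and spaces.
    decomposed = unicodedata.normalize("NFKD", main)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in plain)
    return " ".join(cleaned.lower().split())


# Hypothetical variant forms of two titles mentioned in this presentation.
variants = [
    "Pêcheur d'Islande",
    "Pecheur d'Islande : roman",
    "The Iliad",
    "Iliad, The",
]
keys = [title_key(v) for v in variants]
```

Records whose keys agree become candidates for the same work cluster. A key this crude would also merge unrelated works that share a short title, which is one reason an authority record is a sturdier basis than string matching.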

Addition of xR records to VIAF

Before

After

UNESCO Translation Database

XR VIAF Record

VIAF ID for Author

Translated title

Translator


VIAF Linked Data: New Information

Original work:

Title: 西遊記; Language: Chinese; Author: 吳承恩; Created: 1592; HasTranslation: each of the translations listed below

Translations (each IsTranslationOf the original work above):

• Title: Journey to the West; Language: English; Translator: Anthony C. Yu; Date: 1977

• Title: Journey to the West; Language: English; Translator: W. J. F. Jenner; Date: 1982-1984

• Title: Tay du ky binh khao; Language: Vietnamese; Translator: Phan Quan; Date: 1980

• Title: 西遊記; Language: Japanese; Translator: 中野美代子; Date: 1986

• Title: Monkeys Pilgerfahrt; Language: German; Translator: Georgette Boner; Date: 1983
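The IsTranslationOf / HasTranslation links on this slide can be sketched as simple subject-predicate-object triples, with HasTranslation derived as the inverse of IsTranslationOf. The titles, translators and dates come from the slide; the `Translation` class and the text labels used as stand-in identifiers are assumptions, since real records would use VIAF IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Translation:
    """One translated expression, using the fields shown on the slide."""
    title: str
    language: str
    translator: str
    date: str


# Text label used as a stand-in identifier for the original (assumption).
ORIGINAL = "西遊記 (Chinese)"

TRANSLATIONS = [
    Translation("Journey to the West", "English", "Anthony C. Yu", "1977"),
    Translation("Journey to the West", "English", "W. J. F. Jenner", "1982-1984"),
    Translation("Tay du ky binh khao", "Vietnamese", "Phan Quan", "1980"),
    Translation("西遊記", "Japanese", "中野美代子", "1986"),
    Translation("Monkeys Pilgerfahrt", "German", "Georgette Boner", "1983"),
]


def label(t):
    """Readable stand-in identifier for a translation (not a real VIAF ID)."""
    return f"{t.title} ({t.language}, {t.translator}, {t.date})"


def link(original, translations):
    """Emit IsTranslationOf triples and the inverse HasTranslation triples."""
    triples = []
    for t in translations:
        triples.append((label(t), "IsTranslationOf", original))
        triples.append((original, "HasTranslation", label(t)))
    return triples


triples = link(ORIGINAL, TRANSLATIONS)
```

Including the translator and date in the label keeps the two English translations apart even though they share a title.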

# Original Work (in Chinese)
<http://worldcat.org/entity/work/id/1215997>
    a schema:CreativeWork ;
    schema:creator <http://viaf.org/viaf/102266649> ; # "Gao, Xingjian"
    schema:inLanguage "zh" ;
    schema:name "靈山"@zh .

# Translated Work (in English)
<http://worldcat.org/entity/work/id/145209748>
    a schema:CreativeWork ;
    schema:creator <http://viaf.org/viaf/102266649> ; # "Gao, Xingjian"
    [new]:translator <http://viaf.org/viaf/81663420> ; # "Lee, Mabel"
    schema:inLanguage "en" ;
    schema:name "Soul Mountain"@en ;
    [new]:translationOfWork <http://worldcat.org/entity/work/id/1215997> .

Markup for the Semantic Web

Understanding information sharing across cultures

• What percentage of non-English works are translations of English works, and vice-versa?
• Which authors are translated the most?
• Which works have been translated into the most languages?
• Which countries translate the most English works, the most non-English works?
• Which countries translate a new work the fastest?
Etc.
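Once translations are linked to their originals, questions like these reduce to simple aggregations. A minimal sketch over the handful of translations named in this presentation follows. The tuple layout is an assumption, and the resulting counts describe only this small sample, not WorldCat.

```python
from collections import defaultdict

# (original work, author, language of translation), taken from the examples
# in this presentation; the tuple layout itself is an assumption.
TRANSLATION_RECORDS = [
    ("西遊記", "吳承恩", "English"),
    ("西遊記", "吳承恩", "English"),
    ("西遊記", "吳承恩", "Vietnamese"),
    ("西遊記", "吳承恩", "Japanese"),
    ("西遊記", "吳承恩", "German"),
    ("靈山", "Gao, Xingjian", "English"),
]


def languages_per_key(records, key_index):
    """Map each work (index 0) or author (index 1) to its set of target languages."""
    languages = defaultdict(set)
    for record in records:
        languages[record[key_index]].add(record[2])
    return languages


def rank_by_language_count(records, key_index):
    """Return (key, number of distinct target languages), most translated first."""
    counts = {k: len(v) for k, v in languages_per_key(records, key_index).items()}
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


works_ranked = rank_by_language_count(TRANSLATION_RECORDS, 0)
authors_ranked = rank_by_language_count(TRANSLATION_RECORDS, 1)
```

Counting distinct languages rather than records means the two English translations of 西遊記 count once, which matches the "translated into N languages" figures quoted earlier for Tolstoy, Homer and others.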

http://www.oclc.org/research/activities/multilingual-bib-structure.html

Where are we now?

Clustering
• Work clusters done; ongoing refinement
• GLIMIR clustering done for all [simple] text
– 103 million records have GLIMIR IDs
• Working on collected works

Displays
• Working on VIAF expression displays
• Work level displays in WorldCat.org ++

Data Mining for translations

Explore. Share. Magnify.

Janifer Gatenby
EMEA Program Manager Metadata
Janifer.gatenby@oclc.org
oclc.org