Leveraging WorldCat

Post on 24-Feb-2016

Transcript of Leveraging WorldCat

Data Mining the Largest Library Database in the World

Roy Tennant, OCLC Research

Leveraging WorldCat

EUROPE, MIDDLE EAST & AFRICA REGIONAL COUNCIL

Worldcat.org/identities/

Algorithmically constructed from WorldCat records

Viaf.org

A Union database of authority records

The Responsible Party

Thom Hickey, Chief Scientist

OCLC Research

290+ million records

Language Coverage

Percentage of records for non-English materials (30 June 2012): 60.2%

Records by language:

• Total: 274 million
• German: 36.5 million
• French: 25.5 million
• Spanish: 11.3 million
• Italian: 4.7 million
• Dutch: 4.3 million
• Russian: 3.6 million
• Latin: 3.5 million

Worldcat.org/identities/

(J.K. Rowling)

(Diana Gabaldon)

(Galileo)

Viaf.org

VIAF Participants

“Super” Authority File

Our Cataloging Future

“Moving from cataloging to catalinking”

Eric Miller, Zepheira

Some Lessons

• Widespread collaboration is essential
• Normalizing the data is essential
• Normalizing the data is complicated
• Everything is interrelated:
  – You can’t bring names together if titles don’t match
  – You can’t bring titles together if names don’t match
• Batch mode processing still rules (but we’re getting better and faster at it)
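
To make the normalization and "everything is interrelated" lessons concrete, here is a minimal Python sketch. It is not the processing used for WorldCat Identities or VIAF, which the slides do not describe. The `normalize` and `cluster_by_name_and_title` functions, the record layout, and the sample records are all assumptions made up for illustration.

```python
import unicodedata
from collections import defaultdict


def normalize(text: str) -> str:
    """Strip accents and punctuation, lowercase, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents.lower())
    return " ".join(cleaned.split())


def cluster_by_name_and_title(records):
    """Group record ids that share both a normalized name and a normalized title."""
    clusters = defaultdict(list)
    for record in records:
        key = (normalize(record["name"]), normalize(record["title"]))
        clusters[key].append(record["id"])
    return dict(clusters)


# Hypothetical sample records, invented for illustration only.
records = [
    {"id": 1, "name": "Galilei, Galileo", "title": "Sidereus nuncius"},
    {"id": 2, "name": "GALILEI, GALILEO.", "title": "Sidereus Nuncius."},
    {"id": 3, "name": "Galilei, Galileo", "title": "Il saggiatore"},
]
clusters = cluster_by_name_and_title(records)
```

Records 1 and 2 fall into one cluster only because both the name and the title agree after normalization. Record 3 has the same name but a different title, so this naive key keeps it separate; bringing it in would need further evidence, which is part of why normalizing is complicated.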

Conclusions

• Data mining isn’t just useful, it’s essential
• Extracting data from MARC that is useful in other contexts is possible, but will require sophisticated processing
• Only very large organizations (e.g., OCLC, national libraries) have the data and resources to do this work
• Thankfully, we are doing it, but there is much more to be done
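
As a rough illustration of the point about extracting data from MARC, the Python sketch below pulls a name, dates, and title out of a simplified record. The dictionary layout, the `first_subfield` and `extract_summary` helpers, and the sample values are assumptions for illustration, not a real MARC parser. Only the field meanings are standard MARC 21: 100 $a is the personal name, 100 $d the associated dates, and 245 $a the title.

```python
def first_subfield(record, tag, code):
    """Return the first value for a tag/subfield pair, or None if absent."""
    for subfields in record.get(tag, []):
        for sub_code, value in subfields:
            if sub_code == code:
                return value
    return None


def strip_punctuation(value):
    """Drop trailing cataloging punctuation such as ' /' or ','."""
    return value.rstrip(" ,/:;") if value is not None else None


def extract_summary(record):
    """Pull out the pieces most likely to be reused outside a catalog."""
    return {
        "name": strip_punctuation(first_subfield(record, "100", "a")),
        "dates": strip_punctuation(first_subfield(record, "100", "d")),
        "title": strip_punctuation(first_subfield(record, "245", "a")),
    }


# Hypothetical, simplified record: tag -> list of fields,
# each field a list of (subfield code, value) pairs.
record = {
    "100": [[("a", "Galilei, Galileo,"), ("d", "1564-1642")]],
    "245": [[("a", "Sidereus nuncius /"), ("c", "Galileo Galilei")]],
}
summary = extract_summary(record)
```

Even this toy case has to strip trailing punctuation before the values are usable elsewhere. Real records add repeated fields, variant forms, and inconsistent practice, which is where the sophisticated processing comes in.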

Roy Tennant

tennantr@oclc.org
@rtennant
roytennant.com