Leveraging WorldCat

Page 1: Leveraging WorldCat

Data Mining the Largest Library Database in the World

Roy Tennant, OCLC Research

Leveraging WorldCat

Page 2: Leveraging WorldCat

Europe, Middle East & Africa Regional Council

Worldcat.org/identities/

Algorithmically constructed from WorldCat records

Page 3: Leveraging WorldCat

Viaf.org

A union database of authority records

Page 4: Leveraging WorldCat

The Responsible Party

Thom Hickey, Chief Scientist

OCLC Research

Page 5: Leveraging WorldCat

290+ million records

Page 6: Leveraging WorldCat

Language Coverage

Percentage of records for non-English materials (30 June 2012): 60.2%

Records by language:

Total: 274 million
German: 36.5 million
French: 25.5 million
Spanish: 11.3 million
Italian: 4.7 million
Dutch: 4.3 million
Russian: 3.6 million
Latin: 3.5 million
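The chart figures can be turned into shares of the whole database with simple arithmetic. The sketch below assumes that the chart's values pair with its labels in order (Total 274 million, German 36.5 million, and so on down to Latin 3.5 million); the variable and function names are illustrative only.

```python
# Record counts in millions, read off the slide's chart (30 June 2012).
# ASSUMPTION: values pair with the chart labels in the order shown.
TOTAL_MILLIONS = 274.0
BY_LANGUAGE = {
    "German": 36.5,
    "French": 25.5,
    "Spanish": 11.3,
    "Italian": 4.7,
    "Dutch": 4.3,
    "Russian": 3.6,
    "Latin": 3.5,
}


def share(count, total=TOTAL_MILLIONS):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# Share of the whole database held by each listed language.
shares = {lang: share(n) for lang, n in BY_LANGUAGE.items()}

# Combined size and share of the seven listed languages.
listed_total = round(sum(BY_LANGUAGE.values()), 1)
listed_share = share(listed_total)

for lang, pct in shares.items():
    print(f"{lang}: {pct}%")
print(f"Seven listed languages: {listed_total} million ({listed_share}%)")
```

Under that pairing, the seven listed languages account for roughly a third of the total, so a large part of the 60.2% non-English figure would have to come from languages the chart does not list.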

Page 7: Leveraging WorldCat

Worldcat.org/identities/

Page 8: Leveraging WorldCat

Page 9: Leveraging WorldCat

Page 10: Leveraging WorldCat

(J.K. Rowling)

(Diana Gabaldon)

(Galileo)

Page 11: Leveraging WorldCat

Page 12: Leveraging WorldCat

Page 13: Leveraging WorldCat

Viaf.org

Page 14: Leveraging WorldCat

VIAF Participants

Page 15: Leveraging WorldCat

Page 16: Leveraging WorldCat

“Super” Authority File

Page 17: Leveraging WorldCat

Page 18: Leveraging WorldCat

Page 19: Leveraging WorldCat

Our Cataloging Future

“Moving from cataloging to catalinking”

Eric Miller, Zepheira

Page 20: Leveraging WorldCat

Page 21: Leveraging WorldCat

Some Lessons

• Widespread collaboration is essential
• Normalizing the data is essential
• Normalizing the data is complicated
• Everything is interrelated:
  – You can’t bring names together if titles don’t match
  – You can’t bring titles together if names don’t match
• Batch mode processing still rules (but we’re getting better and faster at it)
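The interplay between normalizing names and matching titles can be sketched in a few lines. This is a minimal illustration, not the processing OCLC actually runs: the functions `normalize`, `name_key`, and `same_identity`, and the rule "same normalized name plus at least one shared normalized title", are assumptions made for the example.

```python
import unicodedata


def normalize(text):
    """Lowercase, strip diacritics and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in no_marks)
    return " ".join(cleaned.lower().split())


def name_key(name):
    """Order-insensitive key so 'Rowling, J. K.' and 'J.K. Rowling' agree."""
    return " ".join(sorted(normalize(name).split()))


def same_identity(rec_a, rec_b):
    """Hypothetical rule: two (name, titles) records describe one identity
    only if the names normalize alike AND they share a normalized title."""
    name_a, titles_a = rec_a
    name_b, titles_b = rec_b
    if name_key(name_a) != name_key(name_b):
        return False
    shared = {normalize(t) for t in titles_a} & {normalize(t) for t in titles_b}
    return bool(shared)


# Two forms of one author, linked by a shared title.
a = ("Rowling, J. K.", ["Harry Potter and the Philosopher's Stone"])
b = ("J.K. Rowling", ["Harry Potter and the philosopher’s stone"])

# Same name, but no title in common: left apart in this sketch.
c = ("Smith, John", ["Gardening Basics"])
d = ("John Smith", ["Quantum Fields"])

print(same_identity(a, b))
print(same_identity(c, d))
```

The second pair shows why the name alone is not enough: without a matching title there is no evidence that the two records refer to the same person, and the reverse holds when clustering titles.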

Page 22: Leveraging WorldCat

Conclusions

• Data mining isn’t just useful, it’s essential
• Extracting data from MARC that is useful in other contexts is possible, but will require sophisticated processing
• Only very large organizations (e.g., OCLC, national libraries) have the data and resources to do this work
• Thankfully, we are doing it, but there is much more to be done
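As a small illustration of what extracting reusable data from MARC involves, the sketch below pulls a name, life dates, and a title out of a simplified record. The dictionary layout is a hypothetical stand-in for a parsed MARC record (real records need a proper MARC parser), and the helper names are invented for the example. It relies only on the standard MARC conventions that field 100 carries the personal name in $a and associated dates in $d, and that field 245 $a carries the title.

```python
import re

# HYPOTHETICAL simplified record: tag -> {subfield code -> value}.
record = {
    "100": {"a": "Galilei, Galileo,", "d": "1564-1642."},
    "245": {"a": "Dialogue concerning the two chief world systems /"},
}


def strip_trailing_punctuation(value):
    """Drop the trailing punctuation that cataloging rules leave on subfields."""
    return value.rstrip(" /:;,.")


def parse_dates(value):
    """Return (birth, death) years from a $d value such as '1564-1642.'.
    Either year may be missing (e.g. '1965-'); unparseable input gives Nones."""
    match = re.match(r"\s*(\d{4})?\s*-\s*(\d{4})?", value)
    if not match:
        return None, None
    birth, death = match.groups()
    return (int(birth) if birth else None, int(death) if death else None)


def extract_person(rec):
    """Collect name, dates, and title into a plain dict usable outside MARC."""
    name_field = rec.get("100", {})
    title_field = rec.get("245", {})
    birth, death = parse_dates(name_field.get("d", ""))
    return {
        "name": strip_trailing_punctuation(name_field.get("a", "")),
        "birth": birth,
        "death": death,
        "title": strip_trailing_punctuation(title_field.get("a", "")),
    }


person = extract_person(record)
print(person)
```

Even this toy case needs punctuation stripping and tolerant date parsing; at the scale of hundreds of millions of records, with inconsistent cataloging practice, the edge cases multiply, which is the sense in which the processing has to be sophisticated.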

Page 23: Leveraging WorldCat

Roy Tennant

[email protected]
@rtennant
roytennant.com