Data Mining the Largest Library Database in the World Roy Tennant OCLC Research Leveraging WorldCat.

Post on 13-Dec-2015

Data Mining the Largest Library Database in the World

Roy Tennant, OCLC Research

Leveraging WorldCat

EUROPE, MIDDLE EAST & AFRICA REGIONAL COUNCIL

Worldcat.org/identities/

Algorithmically constructed from WorldCat records
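
WorldCat Identities is described here only as "algorithmically constructed from WorldCat records." As a rough illustration of that kind of aggregation, here is a minimal sketch that rolls many records up into one profile per author heading. It assumes a hypothetical, heavily simplified record type (`Record` with `author`, `title`, `year`, `holdings`) rather than real MARC, and the helper names are illustrative only, not OCLC's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set


@dataclass
class Record:
    """Hypothetical, heavily simplified bibliographic record (not real MARC)."""
    author: str
    title: str
    year: int
    holdings: int  # number of libraries reporting a copy


@dataclass
class IdentitySummary:
    """Per-author profile assembled from many records."""
    name: str
    works: Set[str] = field(default_factory=set)
    record_count: int = 0
    total_holdings: int = 0
    first_year: Optional[int] = None
    last_year: Optional[int] = None


def build_identities(records: Iterable[Record]) -> Dict[str, IdentitySummary]:
    """Roll records up into one summary per (exact) author heading."""
    identities: Dict[str, IdentitySummary] = {}
    for rec in records:
        ident = identities.setdefault(rec.author, IdentitySummary(rec.author))
        ident.works.add(rec.title)
        ident.record_count += 1
        ident.total_holdings += rec.holdings
        ident.first_year = rec.year if ident.first_year is None else min(ident.first_year, rec.year)
        ident.last_year = rec.year if ident.last_year is None else max(ident.last_year, rec.year)
    return identities


# Made-up sample records, for illustration only.
sample = [
    Record("Doe, Jane", "A history of maps", 1990, 5),
    Record("Doe, Jane", "A history of maps", 1995, 2),  # a later edition
    Record("Doe, Jane", "Rivers and roads", 2001, 3),
    Record("Roe, Rick", "Harbour lights", 1985, 7),
]
for ident in build_identities(sample).values():
    print(ident.name, len(ident.works), ident.record_count,
          ident.total_holdings, ident.first_year, ident.last_year)
```

Keying on the exact heading string is the weak point of this sketch: variant forms of one name would split a person into several profiles, which is the normalization problem the "Some Lessons" slide raises.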

Viaf.org

A union database of authority records

The Responsible Party

Thom Hickey, Chief Scientist

OCLC Research

290+ million records

Language Coverage

Percentage of records for non-English materials, 30 June 2012: 60.2%

Records by language, 30 June 2012:
Total: 274 million
German: 36.5 million
French: 25.5 million
Spanish: 11.3 million
Italian: 4.7 million
Dutch: 4.3 million
Russian: 3.6 million
Latin: 3.5 million
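
The language-coverage figures can be turned into shares of the whole database with simple arithmetic. This sketch assumes the chart's values pair with its language labels in the order listed, and that the 60.2% non-English figure applies to the 274 million total; both are readings of the slide, not stated facts.

```python
# Record counts in millions, read from the 30 June 2012 "Language Coverage"
# chart. Pairing each figure with a language assumes the chart's values are
# listed in the same order as its labels.
TOTAL_RECORDS_M = 274.0
NON_ENGLISH_PCT = 60.2
RECORDS_BY_LANGUAGE_M = {
    "German": 36.5,
    "French": 25.5,
    "Spanish": 11.3,
    "Italian": 4.7,
    "Dutch": 4.3,
    "Russian": 3.6,
    "Latin": 3.5,
}


def share_of_total(count_m: float, total_m: float = TOTAL_RECORDS_M) -> float:
    """Return count_m as a percentage of total_m."""
    return 100.0 * count_m / total_m


# Each language's share of all records, rounded to one decimal place.
shares = {lang: round(share_of_total(n), 1) for lang, n in RECORDS_BY_LANGUAGE_M.items()}

# Implied number of non-English records (assumes 60.2% applies to the total).
non_english_m = TOTAL_RECORDS_M * NON_ENGLISH_PCT / 100.0

# Combined size of the seven listed languages and their share of all records.
listed_m = sum(RECORDS_BY_LANGUAGE_M.values())
listed_pct = share_of_total(listed_m)

for lang, pct in shares.items():
    print(f"{lang:8s} {pct:5.1f}%")
print(f"Non-English records: about {non_english_m:.0f} million")
print(f"Seven listed languages: {listed_m:.1f} million ({listed_pct:.1f}% of all records)")
```

Under those assumptions German alone is about 13.3% of all records, and the seven listed languages together come to about 89.4 million records (roughly 32.6%), a little over half of the roughly 165 million implied non-English records.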

Worldcat.org/identities/

(Screenshots: J.K. Rowling; Diana Gabaldon; Galileo)

Viaf.org

VIAF Participants

“Super” Authority File
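
A "super" authority file brings records for the same entity from many national files into one cluster. The toy sketch below shows the clustering idea only, not VIAF's actual matching, which is considerably more involved. It assumes a hypothetical `AuthorityRecord` shape with made-up source codes, and a match key built from a normalized heading plus birth year.

```python
import unicodedata
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional, Tuple


class AuthorityRecord(NamedTuple):
    source: str                # hypothetical code for the contributing file
    source_id: str             # identifier of the record in that file
    heading: str               # that file's preferred form of the name
    birth_year: Optional[int]  # taken from the heading's dates, if any


def normalize_name(heading: str) -> str:
    """Strip diacritics and punctuation, case-fold, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", heading)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    spaced = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in base)
    return " ".join(spaced.casefold().split())


ClusterKey = Tuple[str, Optional[int]]


def cluster_authorities(records: List[AuthorityRecord]) -> Dict[ClusterKey, List[AuthorityRecord]]:
    """Group records from different files whose match keys coincide."""
    clusters: Dict[ClusterKey, List[AuthorityRecord]] = defaultdict(list)
    for rec in records:
        clusters[(normalize_name(rec.heading), rec.birth_year)].append(rec)
    return dict(clusters)


# Made-up example: two files spell one composer's name differently.
sample = [
    AuthorityRecord("SRC-A", "a1", "Dvořák, Antonín", 1841),
    AuthorityRecord("SRC-B", "b9", "DVORAK, Antonin.", 1841),
    AuthorityRecord("SRC-C", "c4", "Galilei, Galileo", 1564),
]
for key, members in cluster_authorities(sample).items():
    print(key, [f"{m.source}:{m.source_id}" for m in members])
```

The two Dvořák records land in one cluster, but a file that used a different name order (for example "Galileo Galilei" instead of "Galilei, Galileo") would not match under this key, one small example of why normalizing the data is complicated.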

Our Cataloging Future

“Moving from cataloging to catalinking”

Eric Miller, Zepheira

Some Lessons

• Widespread collaboration is essential

• Normalizing the data is essential

• Normalizing the data is complicated

• Everything is interrelated:

– You can’t bring names together if titles don’t match

– You can’t bring titles together if names don’t match

• Batch-mode processing still rules (but we’re getting better and faster at it)
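
The "everything is interrelated" lesson can be made concrete as alternating batch passes: merge name forms using titles already judged equal, then merge title forms using names already judged equal, and repeat until nothing changes. This is a minimal sketch using a union-find structure; the `co_cluster` function, its surname and title-prefix heuristics, and the sample data are all hypothetical, and the pairwise comparison is quadratic, so real batch systems would need blocking keys to scale.

```python
import re
from itertools import combinations
from typing import Dict, Hashable, List, Tuple


class UnionFind:
    """Minimal disjoint-set (union-find) structure over hashable items."""

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}

    def find(self, x: Hashable) -> Hashable:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: Hashable, b: Hashable) -> bool:
        """Merge the sets holding a and b; return True if they were separate."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[rb] = ra
        return True


def tokens(text: str) -> List[str]:
    """Crude normalization: lower-case ASCII alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


def co_cluster(records: List[Tuple[str, str]], title_prefix: int = 3):
    """Alternately cluster name forms and title forms until nothing changes.

    Hypothetical heuristics, for illustration only:
      * two name forms merge if they share a surname (first token) AND
        appear with titles already in the same title cluster;
      * two title forms merge if their first `title_prefix` tokens agree AND
        they appear with names already in the same name cluster.
    """
    names, titles = UnionFind(), UnionFind()
    changed = True
    while changed:  # batch passes until a fixed point is reached
        changed = False
        for (n1, t1), (n2, t2) in combinations(records, 2):
            if titles.find(t1) == titles.find(t2) and tokens(n1)[:1] == tokens(n2)[:1]:
                changed |= names.union(n1, n2)
            if names.find(n1) == names.find(n2) and tokens(t1)[:title_prefix] == tokens(t2)[:title_prefix]:
                changed |= titles.union(t1, t2)
    return names, titles


# Made-up (name form, title form) pairs.
sample = [
    ("Doe, Jane", "A history of maps"),
    ("Doe, J.", "A history of maps"),
    ("Doe, Jane", "A history of maps and charts"),
    ("Doe, Jane A.", "A history of maps and charts"),
    ("Doe, John", "Rivers and roads"),
]
names, titles = co_cluster(sample)
```

Here "Doe, J." joins "Doe, Jane" only because their titles already match, and the two title variants merge only because their names already match; "Doe, John" stays separate because nothing links his title to the others.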

Conclusions

• Data mining isn’t just useful; it’s essential

• Extracting data from MARC that is useful in other contexts is possible, but will require sophisticated processing

• Only very large organizations (e.g., OCLC, national libraries) have the data and resources to do this work

• Thankfully, we are doing it, but there is much more to be done
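
As a small taste of what extracting reusable data from MARC involves, the sketch below pulls the personal-name heading (100 $a and $d), the title (245 $a), and the language code (008 positions 35-37) and strips trailing ISBD punctuation. It assumes a hypothetical, already-parsed and simplified record layout that models only the first occurrence of each tag; real work would start from ISO 2709 or MARCXML with a proper parser.

```python
import re
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical in-memory shape, NOT a real MARC parser: control fields map a
# tag to a string, data fields map a tag to (subfield code, value) pairs.
# Only the first occurrence of each tag is modeled.
Subfields = List[Tuple[str, str]]

TRAILING_PUNCT = re.compile(r"\s*[/:;,=]\s*$")


def clean(value: str) -> str:
    """Drop trailing ISBD-style punctuation such as ' /' or a final comma."""
    return TRAILING_PUNCT.sub("", value.strip())


def first_subfield(subfields: Subfields, code: str) -> Optional[str]:
    """Return the cleaned value of the first subfield with this code."""
    for sub_code, value in subfields:
        if sub_code == code:
            return clean(value)
    return None


def extract_summary(record: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Pull a few reusable facts out of a simplified MARC 21 record."""
    fixed = record.get("008", "")
    name = record.get("100", [])   # main entry, personal name
    title = record.get("245", [])  # title statement
    return {
        "name": first_subfield(name, "a"),
        "dates": first_subfield(name, "d"),
        "title": first_subfield(title, "a"),
        # 008 positions 35-37 hold the language code in MARC 21 bibliographic records.
        "language": fixed[35:38] if len(fixed) >= 38 else None,
    }


# Made-up record for illustration.
sample = {
    "008": "0" * 35 + "eng" + " d",
    "100": [("a", "Doe, Jane,"), ("d", "1950-")],
    "245": [("a", "A history of maps /"), ("c", "Jane Doe.")],
}
print(extract_summary(sample))
```

Even this small example needs punctuation rules that depend on cataloging conventions; scaling it to repeated fields, variant practices, and hundreds of millions of records is where the sophisticated processing comes in.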

Roy Tennant

tennantr@oclc.org

@rtennant

roytennant.com