Leo Lahti (University of Turku) 7 November 2019 ... · Actors: - 558,243 original - 92,044 (16%)...
HELDIG Summit, Helsinki
7 November 2019
Leo Lahti (University of Turku)
Bibliographic Data Harmonization in Research: open ecosystems for scalable collaboration
[email protected] | @openreslabs
Shakespeare was made big by small books!
Data: ESTC | Figure: DH2019 (best!) poster, Utrecht.
A drastic shift from large (2fo/4to) to small (8vo/12mo) books is observed around the 1700s.
… But how reliable and representative is this data set?
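The format shift on this slide could be quantified along the following lines. This is only a minimal sketch: the function name `small_share_by_decade`, the `(year, format)` record layout, and the sample records are assumptions made up for illustration, not ESTC data or the authors' actual workflow. Only the grouping of 2fo/4to as large and 8vo/12mo as small comes from the slide.

```python
from collections import Counter

# Format grouping as stated on the slide.
LARGE = {"2fo", "4to"}
SMALL = {"8vo", "12mo"}


def small_share_by_decade(records):
    """Return {decade: share of small-format books} for (year, format) pairs.

    Records whose format is neither large nor small are ignored.
    """
    totals = Counter()
    small = Counter()
    for year, fmt in records:
        if fmt not in LARGE and fmt not in SMALL:
            continue
        decade = (year // 10) * 10
        totals[decade] += 1
        if fmt in SMALL:
            small[decade] += 1
    return {decade: small[decade] / totals[decade] for decade in sorted(totals)}


# Invented toy records (hypothetical, for illustration only).
toy_records = [
    (1685, "2fo"), (1688, "4to"), (1689, "8vo"),
    (1734, "12mo"), (1735, "8vo"), (1736, "4to"), (1738, "12mo"),
]
shares = small_share_by_decade(toy_records)
```

A rising share across decades would correspond to the shift toward small books; whether the real data support that reading depends on the reliability question raised above.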
One (non-standard) XML file
~480 000 entries (1470-1800)
Designed for information retrieval rather than quantitative analysis
Not openly available
Browsable online: http://estc.bl.ac.uk
Figure: Subject catalogue of the University Library of Graz. Source: Wikimedia Commons.
Research potential of library catalogues has been debated for decades
Bibliography and Science, by G. Thomas Tanselle
Actors:
- 558,243 original
- 92,044 (16%) harmonized
Variants of Shakespeare in ESTC:
- ghost of shakespeare
- kenrick, william shakespeare
- shakespeare, john
- shakespeare room (birmingham, england)
- shakespeare, thomas, active 1598
- shakespeare, william
- shakespeare, william, 1564-1616
- shakespeare, william, 1564-1616., (adaptations)
- shakespeare, william, 1564-1616, (adaptations)
- shakespeare, william, 1564-1616., (adaptions)
- shakespeare, william, 1564-1616., (selections)
Original data not ready for analysis
Actor harmonization: Mark Hill, Ville Vaara
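To make the idea of actor harmonization concrete, here is a minimal sketch of rule-based name normalization applied to some of the Shakespeare variants listed on this slide. The function `normalize_actor` and its two cleaning rules are assumptions for illustration; the project's actual harmonization is not described in the slides and is likely more involved.

```python
import re
from collections import defaultdict


def normalize_actor(raw: str) -> str:
    """Hypothetical normalization: lowercase, drop a comma-separated trailing
    qualifier such as ", (adaptations)", and trim stray punctuation."""
    name = raw.strip().lower()
    # Only strip a parenthesised qualifier that follows a comma, so that
    # "shakespeare room (birmingham, england)" keeps its place qualifier.
    name = re.sub(r",\s*\([^)]*\)\s*$", "", name)
    # Collapse whitespace and remove trailing full stops or commas,
    # e.g. "1564-1616." becomes "1564-1616".
    name = re.sub(r"\s+", " ", name)
    return name.rstrip(" .,")


variants = [
    "shakespeare, william",
    "shakespeare, william, 1564-1616",
    "shakespeare, william, 1564-1616., (adaptations)",
    "shakespeare, william, 1564-1616, (adaptations)",
    "shakespeare, william, 1564-1616., (adaptions)",
    "shakespeare, william, 1564-1616., (selections)",
    "shakespeare room (birmingham, england)",
]

# Group the raw catalogue strings under their normalized form.
groups = defaultdict(list)
for variant in variants:
    groups[normalize_actor(variant)].append(variant)

# Share of harmonized actors relative to the original count on the slide.
harmonized_share = round(100 * 92044 / 558243)
```

These rules deliberately leave "shakespeare, william" separate from the dated form: deciding whether the two refer to the same person is a judgement that simple string rules cannot make.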
From library catalogues to research reports?
Research potential
Open bibliographic data science ecosystem
Research cases
Open data science ecosystem?
Authors, Publishers, Editions, Publication place, Gatherings, Page count, Language, Genre...
R for Data Science / H. Wickham
- Dedicated data science infrastructure
- Reproducible & automated workflows
- Open source (use/contribute/develop)
- Semi-automated curation
- Highly collaborative effort
“Standard” doc sizes vary across time and space
Data availability (HPB):
- Gatherings: 22.5%
- Height: 11.6%
- Width: 1.1%
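Availability figures like these can be read as the percentage of catalogue records in which a field is filled in. A minimal sketch of that calculation follows, assuming records are plain dictionaries; the `availability` helper, the field names, and the toy records are hypothetical and are not HPB data.

```python
def availability(records: list[dict], field: str) -> float:
    """Percentage of records with a non-empty value for `field`."""
    if not records:
        return 0.0
    present = sum(1 for record in records if record.get(field) not in (None, ""))
    return 100.0 * present / len(records)


# Invented toy records (hypothetical, for illustration only).
toy_records = [
    {"gatherings": "8vo", "height": 17, "width": None},
    {"gatherings": None, "height": None, "width": None},
    {"gatherings": "4to", "height": None, "width": None},
    {"gatherings": "", "height": None, "width": None},
]

coverage = {
    field: availability(toy_records, field)
    for field in ("gatherings", "height", "width")
}
```

Low coverage for height and width is one reason to work with gatherings when estimating document sizes, although gatherings themselves do not map to a fixed physical size.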
Counting editions by publishers in London 1637-1662
- Manual curation (David Gants)
- Automated analysis (Iiro Tiihonen)
Good correspondence supports our automated approach.
Boost curation & scalability by automation
Manually curated data from: David Gants. A Quantitative Analysis of the London Book Trade. Studies in Bibliography 55:185-213, 2002
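One possible way to put a number on the correspondence between manually curated and automatically derived edition counts is a Pearson correlation over publishers present in both sets. The slides do not state which measure was used, so this is an assumption; the `pearson` helper, the publisher labels, and all counts below are invented for illustration and are not taken from Gants (2002) or from the automated analysis.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Invented edition counts per publisher (hypothetical, for illustration only).
manual = {"publisher_a": 12, "publisher_b": 30, "publisher_c": 7, "publisher_d": 19}
automated = {"publisher_a": 11, "publisher_b": 32, "publisher_c": 7, "publisher_d": 18}

# Compare only publishers that appear in both sources, in a fixed order.
shared = sorted(manual.keys() & automated.keys())
r = pearson([manual[p] for p in shared], [automated[p] for p in shared])
```

A value of `r` close to 1 would indicate that the two approaches rank and scale publishers similarly; it would not by itself rule out a systematic offset between them.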
Thanks!
Material for the slides contributed by: Mikko Tolonen, Leo Lahti, Jani Marjanen, Mark Hill, Ali Ijaz, Ville Vaara, Hege Roivainen, Iiro Tiihonen
Helsinki Computational History Group: https://www.helsinki.fi/en/researchgroups/computational-history