Page 1

http://www.laudatio-repository.org

Dennis Zielke, Computer and Media Service, Humboldt-Universität zu Berlin, June 13, 2014

Open Access Research Data Repository for Corpus Linguistic Data – A Modular Approach

Open Repositories Conference 2014 in Helsinki

Session „IG3F: Interest Group Session 3F (Fedora / Islandora)“

Page 2

Agenda

• Project information
• Challenges
• Complex data structure
• Functionalities and research data access
• Technologies
• Outlook

Page 3

Project information

LAUDATIO: Long Term Access and Usage of Deeply Annotated Information

Long-term preservation, user-oriented storage, and re-use of research data (historical text corpora) for a subdiscipline of linguistics (historical linguistics), in accordance with Open Access principles

Page 4

Project information

• Funded by the German Research Foundation (DFG)
• Funding program "Research data infrastructures"
• Funding period: 2011 to 2014
• Project partners: Computer and Media Service, Department of Historical Linguistics and Corpus Linguistics (all HU Berlin), and INRIA, France
• Supported by: Berlin School of Library and Information Science (BSLIS), HU Berlin

Page 5

Research data

German historical texts and linguistic annotation, including all dialects, from the 9th to the 19th century

Page 6

Complex data structure

• Document Header
• Corpus Header
• Layer Header
• Preparation Header

Page 7

TEI XML P5
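To illustrate what a TEI P5 header carries, the following minimal sketch reads a title and an author out of a `teiHeader` using only Python's standard library. The sample document, the two chosen fields, and the function name `read_header` are assumptions made for illustration; they are not LAUDATIO's actual schema or code.

```python
import xml.etree.ElementTree as ET

# TEI P5 documents live in this XML namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, heavily shortened header; real corpus headers are far richer.
SAMPLE_HEADER = """\
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Example Corpus</title>
        <author>Jane Doe</author>
      </titleStmt>
      <publicationStmt><p>Unpublished sample</p></publicationStmt>
      <sourceDesc><p>Invented for illustration</p></sourceDesc>
    </fileDesc>
  </teiHeader>
</TEI>
"""


def read_header(xml_text):
    """Return the title and author found in a TEI header as a flat dict."""
    root = ET.fromstring(xml_text)
    title_stmt = root.find("tei:teiHeader/tei:fileDesc/tei:titleStmt", TEI_NS)
    if title_stmt is None:
        raise ValueError("no titleStmt found in TEI header")
    return {
        "title": title_stmt.findtext("tei:title", default="", namespaces=TEI_NS),
        "author": title_stmt.findtext("tei:author", default="", namespaces=TEI_NS),
    }


header = read_header(SAMPLE_HEADER)
print(header)
```

A flat dictionary like this is one convenient intermediate form, since it can be serialized to JSON without further conversion.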

Page 8

• Import
• Search
• Browse
• Modify (Configuration)
• Analyze

Page 9

Fedora Repository Architecture

JavaScript Client JSON mapping and indexing

Cake PHP5 framework
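The "JSON mapping and indexing" step can be pictured roughly as follows. This is a sketch under stated assumptions, not the project's implementation: it serializes already-mapped metadata records into the newline-delimited JSON body that Elasticsearch's bulk API accepts. The index name `laudatio-demo`, the document IDs, and the field names are hypothetical, and Elasticsearch releases of that period additionally expected a `_type`, which is omitted here.

```python
import json


def to_bulk_body(index_name, docs):
    """Serialize (doc_id, document) pairs as newline-delimited JSON for the bulk API."""
    lines = []
    for doc_id, doc in docs:
        # Action line: tells Elasticsearch where the following document goes.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc, ensure_ascii=False))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical records, e.g. produced by mapping TEI headers to flat dicts.
docs = [
    ("corpus-1", {"title": "Example Corpus", "layer": "lemma"}),
    ("corpus-2", {"title": "Second Example", "layer": "pos"}),
]
body = to_bulk_body("laudatio-demo", docs)
print(body)
```

The function only builds the request body; sending it to a running cluster is left out so the sketch stays runnable without a server.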


Page 10

Import

Page 11

Browse

Page 12

Search

Page 13

Configuration

Page 14

http://www.sfb632.uni-potsdam.de/annis/gallery.html

Search and visualization of annotations via ANNIS

Page 15

Technologies

• CakePHP 5
• Fedora 3.6 as backbone
• Fedora REST interface
• ElasticSearch (JSON)
• External EPIC PID web service v2 for PID assignment (Handle)
• Third-party open source libraries on GitHub
• Flat design (HTML5, CSS3), work in progress
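As a rough illustration of how a client might address research data through these components, the sketch below builds a datastream URL in the style of the Fedora 3 REST API (`/objects/{pid}/datastreams/{dsID}/content`) and a resolver URL for a Handle PID. The base URL, the PID `demo:1`, the datastream ID `TEI`, and the handle value are all hypothetical; the repository's real identifiers and deployment paths are not given in the slides.

```python
from urllib.parse import quote


def datastream_content_url(base_url, pid, ds_id):
    """Build a Fedora 3 style REST URL for the content of one datastream."""
    base = base_url.rstrip("/")
    # Keep the colon in PIDs such as "demo:1" readable; escape everything else.
    return f"{base}/objects/{quote(pid, safe=':')}/datastreams/{quote(ds_id, safe='')}/content"


def handle_url(handle):
    """Build a resolver URL for a Handle of the form prefix/suffix."""
    return "https://hdl.handle.net/" + quote(handle, safe="/")


# Hypothetical values for illustration only.
content_url = datastream_content_url("http://localhost:8080/fedora/", "demo:1", "TEI")
pid_url = handle_url("12345/abc-001")
print(content_url)
print(pid_url)
```

Both helpers only assemble strings, so they can be checked without a running Fedora instance or a registered handle.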

Page 16

Outlook

• Integration of other humanities disciplines working with historical (German) texts, e.g. musicology, history, literary studies
• Building a multidisciplinary repository infrastructure at HU Berlin?
• Compliance with guidelines and certificates (e.g. Data Seal of Approval)
• Metadata editor
• Metadata as Linked Open Data
• …

Page 17

Thank you for your attention!

Source code is available on GitHub: https://github.com/DZielke/laudatio

Any questions/feedback: [email protected]