COBHUNI | Universität Hamburg

Building a corpus of Islamic embryology based on a robust and elegant architecture

Alicia González Martínez, Tillmann Josua, Thomas Eich

Outline

1 Advent and challenges of digital humanities

2 The COBHUNI project

3 The source texts

4 The workflow:

(a) collecting and processing source data

(b) annotation

(c) visualization

5 Conclusions and future work

Advent and challenges of digital humanities

The COBHUNI project

COBHUNI: Contemporary Bioethics and the History of the Unborn in Islam

The COBHUNI project aims to diversify our understanding of how prenatal life is conceptualized in texts of Islamic normativity.

1.1 Before the unborn

1.2 The unborn

1.3 After the unborn

Philological exegesis

Hadith criticism

Latin script

Semen and similarity / heredity

Semen as colors

Semen and coitus interruptus or contraceptives

Semen and wet dream

Sex act itself & its timing

Conception / fertilization

General / larger debate about predestination

Embryology: 40 days

Embryology: Ensoulment

Embryology: Angel visits Embryo

Embryology: expressed in a series of numbers

Embryology: Macrocosm – microcosm

Embryology: Embryo and link to resurrection & afterlife

Embryology: Link to (modern) science

Pregnancy: duration: Definition

Miscarriage / abortion and legal status of slave mother

Miscarriage / abortion and legal status of free mother

Miscarriage / abortion and legal status of the siqt

Abortion compared to killing a new-born

Menstruation

Breast-feeding

Legal status questions concerning the child after birth

Annotation layers:

1 MOTIVES

2 METAMOTIVES

3 NAMED ENTITIES (Proper name →)
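
The three annotation layers named on this slide can be pictured as a small data structure. The following is only an illustrative sketch: the `Annotation` class, its fields, and the token offsets are hypothetical and are not taken from the project; only the layer names and the two tag labels come from the slide.

```python
from dataclasses import dataclass

# Layer names as they appear on the slide.
LAYERS = ("MOTIVES", "METAMOTIVES", "NAMED ENTITIES")


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record for one annotated span of tokens."""
    layer: str  # one of LAYERS
    tag: str    # tag label, e.g. "Embryology: Ensoulment"
    start: int  # index of the first token in the span
    end: int    # index one past the last token in the span

    def __post_init__(self):
        if self.layer not in LAYERS:
            raise ValueError(f"unknown layer: {self.layer}")
        if not 0 <= self.start < self.end:
            raise ValueError("span must be non-empty and non-negative")


def tags_by_layer(annotations):
    """Group tag labels by layer, keeping every layer in the result."""
    grouped = {layer: [] for layer in LAYERS}
    for ann in annotations:
        grouped[ann.layer].append(ann.tag)
    return grouped


# Made-up spans over an imaginary sentence, for illustration only.
example = [
    Annotation("MOTIVES", "Embryology: Ensoulment", 0, 5),
    Annotation("NAMED ENTITIES", "Proper name", 2, 3),
]
grouped = tags_by_layer(example)
```

Keeping the layers separate allows spans from different layers to overlap, as the two example spans do.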

The source texts

[Diagram: three inputs feed the COBHUNI Corpus: two "crawl and extract" pipelines and one "scan and OCR & post-correct" pipeline]

Source material                   No. tokens
altafsir.com (Quran exegesis)     12,601,880
hadith.al-islam.com               11,482,139
Scan and OCR texts                    36,132
TOTAL                             24,120,151
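
As a quick arithmetic check, the total can be recomputed from the three rows. This is a minimal sketch: the figures are copied from the table, while the variable names and the share calculation are additions for illustration.

```python
# Token counts per source, copied from the table.
token_counts = {
    "altafsir.com (Quran exegesis)": 12_601_880,
    "hadith.al-islam.com": 11_482_139,
    "Scan and OCR texts": 36_132,
}

# Recompute the total and each source's share of the corpus.
total = sum(token_counts.values())
shares = {source: count / total for source, count in token_counts.items()}

for source, share in shares.items():
    print(f"{source}: {share:.2%}")
print(f"TOTAL: {total:,}")
```

The recomputed sum matches the stated total of 24,120,151 tokens; the scanned and OCR-processed texts account for well under one percent of the corpus.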

The workflow

[Diagram: components: COBHUNI Corpus; Annotation Tool: WebAnno; Visualization Tool: Annis. Arrow labels: insert, export, import, query processed information. The slide also carries the label "Great ideas".]

Collecting and processing source data

[Diagram, built up over several slides: crawl (get source code) → .html → convert & filter → .json → insert → COBHUNI Corpus]
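
The stages named on these slides (crawl, convert & filter, insert) can be sketched in a few lines. The code below is an illustration under stated assumptions, not the project's implementation: the slides do not say which libraries or storage the project uses, so the standard-library HTML parser, the record fields, and the in-memory SQLite table are stand-ins, and the crawled page is a hard-coded hypothetical example.

```python
import json
import sqlite3
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> content."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def convert_and_filter(html, source):
    """Convert one crawled HTML page into a JSON-serialisable record."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    # Whitespace tokenisation is an assumption made for this sketch.
    return {"source": source, "text": text, "tokens": len(text.split())}


def insert(conn, record):
    """Store the record as a JSON string in a stand-in corpus table."""
    conn.execute("CREATE TABLE IF NOT EXISTS corpus (doc TEXT)")
    conn.execute(
        "INSERT INTO corpus VALUES (?)",
        (json.dumps(record, ensure_ascii=False),),
    )
    conn.commit()


# Hypothetical crawled page; a real crawler would fetch this over HTTP.
page = (
    "<html><head><style>p{}</style></head>"
    "<body><p>خلق الإنسان</p><script>x()</script></body></html>"
)
record = convert_and_filter(page, source="example")
conn = sqlite3.connect(":memory:")
insert(conn, record)
```

Keeping each stage as a separate function mirrors the slide's pipeline: a new source only needs its own crawler and filter, while the insert step stays the same.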

Annotation

Visualization

Technical workflow

Conclusions

✔ Created a reliable multi-source corpus of Islamic texts of about 24M tokens

✔ Annotated a subset of the corpus with semantic information

✔ Developed a robust pipeline architecture for integrating heterogeneous data, sanitising it, enriching it and ingesting it into a powerful corpus visualization tool (Annis)

Future work

✔ Continue to manually annotate the subcorpus for the COBHUNI project

✔ Add more texts to the OCR part

✔ Convert the data to other formats so that it can be integrated into other projects

✔ Work on strategies to enrich the texts with morphological information

Thank you very much (شكرا جزيلا)