October 6, 2015 WAHSP/BILAND Towards flexible and stable CLARIN-supported open-source...

Post on 12-Jan-2016


April 21, 2023

WAHSP/BILAND

Towards flexible and stable CLARIN-supported open-source web-applications for historical data-mining in public media

Research team: Stephen Snelders (UU), Pim Huijnen (UU), Daan Odijk (ISLA, UvA), Fons Laan (ISLA), Maarten de Rijke (ISLA), Toine Pieters (UU)

Research

Creating big-data resources

National Library of the Netherlands, Digital Newspaper Archive

> 10,000,000 pages

> 1200 titles

1618-1995

> 30,000,000 articles

Still growing...

How did/do you study 30 million newspaper articles?

Dutch press on Germany, Frank van Vree (1989)

> 1200 titles

1618-1995

> 31,000,000 articles

4

1930-1939

4,000

Sampling

Developing semantic document selection tools

WE NEED:

A semi-automatic and interactive open-source application.

An application that does not replace, but supports the intuition and insights of the historical researcher with expert knowledge of a specific topic or domain.

An application that is user-friendly.

Problem:

Context and background of Dutch drug and eugenics debates in time

Aim

Understanding and evaluation of public debates around drugs, addiction and eugenics in the Netherlands, 1900-1945

Research question

What are the dynamics (in terms of patterns and trends) of public debates and sentiments around drugs and addiction, and eugenics in the Dutch newspapers in the first half of the twentieth century?

Poe’s detective finds the truth by using data in those newspaper articles that do not concern the murder.

In a similar way we will find terms and sentiments in those newspaper articles that may seem irrelevant, but are not.

E-everything

Information-extraction

Recognize structure in text

Part of speech: noun, verb, …

Entities: people, organisations, locations, temporal expressions, …

Relations: who, what, with whom, how, why
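To make the idea of recognizing structure in text concrete, here is a minimal sketch. It is not the project's tooling: it is a hypothetical, rule-based stand-in that picks out temporal expressions (four-digit years) and naive entity candidates (runs of capitalized words). The function name `extract`, both patterns, and the sample sentence are invented for illustration.

```python
import re

# Assumption: a four-digit year between 1600 and 1999 counts as a
# temporal expression. Real systems recognize far more than this.
YEAR = re.compile(r"\b(1[6-9]\d{2})\b")

# Assumption: two or more consecutive capitalized words form an entity
# candidate (person, organisation, location). This is a crude heuristic.
CAPITALIZED_RUN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")


def extract(text):
    """Return naive temporal expressions and entity candidates found in text."""
    return {
        "temporal": YEAR.findall(text),
        "entity_candidates": CAPITALIZED_RUN.findall(text),
    }


# Invented sample sentence, not taken from the newspaper archive.
sample = "In 1924 Jan Jansen wrote about opium in the Nieuwe Rotterdamsche Courant."
result = extract(sample)
print(result)
```

A production pipeline would use trained part-of-speech taggers and named-entity recognizers for historical Dutch instead of regular expressions; the sketch only shows what kind of output such a step produces.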

Information-extraction (2)

Enjoyable but what does it tell us?

Start query: Opium

Drugs and drug policy

Odijk D., de Rooij O., Peetz M-H., Pieters T., de Rijke M., Snelders S. (2012). "Semantic Document Selection", TPDL 2012: Theory and Practice of Digital Libraries: Springer, September.

Combining and clustering queries

By carefully inspecting the word counts, we found quantitative evidence for historical turning points that indicated the criminalization of the drugs debate around 1924.
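The kind of word-count inspection described here can be sketched as a relative term frequency per year. This is a minimal illustration under stated assumptions, not the application's actual code: the function name `yearly_relative_frequency`, whitespace tokenization, and the four toy Dutch snippets are all invented.

```python
from collections import Counter


def yearly_relative_frequency(articles, term):
    """Share of all tokens per year that equal `term` (case-insensitive).

    `articles` is an iterable of (year, text) pairs. Tokenization is a plain
    whitespace split, which is an assumption made to keep the sketch short.
    """
    hits = Counter()
    totals = Counter()
    needle = term.lower()
    for year, text in articles:
        tokens = text.lower().split()
        totals[year] += len(tokens)
        hits[year] += tokens.count(needle)
    return {
        year: hits[year] / totals[year]
        for year in sorted(totals)
        if totals[year] > 0
    }


# Invented toy data, not taken from the newspaper archive.
toy_articles = [
    (1920, "opium als geneesmiddel"),
    (1920, "de apotheek verkoopt opium"),
    (1925, "politie vindt opium smokkel"),
    (1925, "smokkel van opium bestraft"),
]

frequencies = yearly_relative_frequency(toy_articles, "smokkel")
print(frequencies)
```

Normalizing by the total number of tokens per year matters because the archive's coverage differs from year to year; raw counts alone would mostly track how much text was digitized.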

Eugenics case: query "overerving" (heredity), 1867

Primarily associations with health-related terms/entities

Eugenics case: query "overerving", 1935

In 1935, however, the medical context in which the term inheritance was used had made way for a legal and racial context.
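A shift in context like this one can be made visible by comparing which words co-occur with the query term in different years. The sketch below is an assumption-laden illustration, not the project's method: the function name `cooccurring_terms`, article-level co-occurrence, whitespace tokenization, and the toy snippets are all invented.

```python
from collections import Counter


def cooccurring_terms(articles, query, year, stopwords=frozenset()):
    """Count words that appear in the same article as `query` in a given year.

    `articles` is an iterable of (year, text) pairs. Co-occurrence is measured
    at article level, which is an assumption; a window of a few words around
    the query term would be a common alternative.
    """
    query = query.lower()
    counts = Counter()
    for article_year, text in articles:
        tokens = text.lower().split()
        if article_year == year and query in tokens:
            counts.update(
                token for token in tokens
                if token != query and token not in stopwords
            )
    return counts


# Invented toy data, not taken from the newspaper archive.
toy_articles = [
    (1867, "overerving ziekte arts"),
    (1867, "overerving ziekte"),
    (1935, "overerving ras wet"),
    (1935, "overerving ras"),
]

context_1867 = cooccurring_terms(toy_articles, "overerving", 1867)
context_1935 = cooccurring_terms(toy_articles, "overerving", 1935)
print(context_1867.most_common(1), context_1935.most_common(1))
```

Comparing the top-ranked terms for two years side by side gives the researcher a starting point for close reading; it does not replace inspecting the articles themselves.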

Toine Pieters

E-Humanity Approaches to Reference Cultures: The Emergence of the United States in Public Discourse in the Netherlands, 1890-1990

Challenges:

1. OCR repair

2. Improving text-mining software and data infrastructure

3. Developing new historical research strategies

4. Educating historians and other humanities researchers

NEW HORIZONS in DIGITAL HUMANITIES