Connecting political data to media data

Laura Hollink, VU University Amsterdam, Web & Media group. ASCoR Spring Colloquium ‘Big Data at the University of Amsterdam’, February 18, 2014.

Page 1: Connecting political data to media data

Connecting political data to media data

Laura Hollink

VU University Amsterdam, Web & Media group

ASCoR Spring Colloquium ‘Big Data at the University of Amsterdam’, February 18, 2014

Page 2: Connecting political data to media data

Laura Hollink, Damir Juric, Geert-Jan Houben

Martijn Kleppe, Max Kemman, Henri Beunders

Johan Oomen, Jaap Blom

Funded by Clarin-NL

Page 3: Connecting political data to media data
Page 4: Connecting political data to media data
Page 5: Connecting political data to media data

Questions we want to answer

• Which events have attracted a lot of media attention?

• What are the differences between different media? E.g. in different newspapers, or newspapers vs. radio bulletins?

• Has the coverage changed over time?

• How are the events visualized (photos, layout of newspaper, etc.)?

Page 6: Connecting political data to media data
Page 7: Connecting political data to media data

Transcriptions of all 9,294 meetings of the Dutch parliament between 1945 and 1995, consisting of 1,208,903 speeches.

Page 8: Connecting political data to media data

Archives of hundreds of newspapers, covering a large number of issues and tens of millions of articles published between 1618 and 1995.

(We only use 1945-1995)

Page 9: Connecting political data to media data

Roughly 1.8 million radio news bulletins between 1937 and 1984.

(We only use 1945-1995)

Page 10: Connecting political data to media data

PoliMedia methods

Page 11: Connecting political data to media data

Step 1: Translate the Dutch parliamentary debates to the standard structured web format RDF

[Figure: RDF graph of one debate, nl.proc.sgd.d.194519460000002, dated 1945-11-20. Classes shown: Debate, PartOfDebate, DebateContext, Speech, Politician, Party. Properties shown: rdf:type, dc:id, dc:source, dc:publisher, dc:language, dc:date, hasPart, hasSubsequentPartOfDebate, hasSubsequentSpeech, hasText, hasSpokenText, sem:hasActor, hasSpeaker, hasParty, hasRole, hasAcronym, hasFullName, foaf:firstName, foaf:lastName, rdfs:label, coveredIn. The graph also links to http://resolver.politicalmashup.nl/nl.m.00064 via dc:source. The following pages show the parts of this graph in detail.]

XML by the War in Parliament project.
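
As a rough illustration of what Step 1 produces, the sketch below turns a small debate record into N-Triples lines. The base URI `http://example.org/polimedia/`, the dictionary layout and the placement of a `Debate` class in the polivoc namespace are assumptions made for illustration; only the debate identifier, date, language, title and the `dc:` property names come from the slide. It is not the project's actual conversion code.

```python
DC = "http://purl.org/dc/elements/1.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Namespace as it appears in the SPARQL example of this deck; putting the
# Debate class in it is an assumption.
POLIVOC = "http://purl.org/linkedpolitics/nl/polivoc#"
# Hypothetical base URI for debate resources.
BASE = "http://example.org/polimedia/"


def literal(value):
    """Format a plain string as an N-Triples literal."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return '"' + escaped + '"'


def debate_to_ntriples(debate):
    """Return one N-Triples line per known metadata field of the debate."""
    subject = "<" + BASE + debate["id"] + ">"
    lines = [f"{subject} <{RDF_TYPE}> <{POLIVOC}Debate> ."]
    for key in ("date", "language", "title"):
        if key in debate:
            lines.append(f"{subject} <{DC}{key}> {literal(debate[key])} .")
    return lines


example_debate = {
    "id": "nl.proc.sgd.d.194519460000002",
    "date": "1945-11-20",
    "language": "Dutch",
    "title": "Handelingen Verenigde Vergadering...",  # truncated on the slide
}
ntriples = debate_to_ntriples(example_debate)
print("\n".join(ntriples))
```

Plain string literals keep the sketch short; a fuller conversion would probably type the date and tag the language of the title.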

Page 12: Connecting political data to media data

Modeling the debates as events

• An event has a date, a location, actors, and possibly sub-events.

• We build on the Simple Event Model (SEM).

• Links to the original sources.

• Reusing existing vocabularies.

[Figure: the Debate node nl.proc.sgd.d.194519460000002 with its metadata. Values shown: http://resolver.politicalmashup.nl/nl.proc.sgd.d.194519460000002, http://statengeneraaldigitaal.nl/, http://resolver.kb.nl/resolve?urn=sgd:mpeg21:19451946:0000002:pdf, nl.proc.sgd.d.19720000002, "Handelingen Verenigde Vergadering...", Dutch, 1945-11-20. Properties shown: rdf:type (Debate), dc:id, dc:source (twice), dc:publisher, dc:language, dc:date, dc:title.]
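
The event view described on this page can be sketched as a small data structure: an event with a date, a location, actors and sub-events. The class and field names below are illustrative assumptions and are not the Simple Event Model's own vocabulary terms; the identifiers and the date are taken from the slides.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Event:
    identifier: str
    event_date: Optional[date] = None
    place: Optional[str] = None
    actors: List[str] = field(default_factory=list)
    sub_events: List["Event"] = field(default_factory=list)

    def walk(self):
        """Yield this event and all nested sub-events, depth first."""
        yield self
        for sub in self.sub_events:
            yield from sub.walk()


speech = Event("nl.proc.sgd.d.194519460000002.1.2", actors=["Speaker_00064"])
part = Event("nl.proc.sgd.d.194519460000002.1", sub_events=[speech])
debate = Event(
    "nl.proc.sgd.d.194519460000002",
    event_date=date(1945, 11, 20),
    sub_events=[part],
)

event_ids = [event.identifier for event in debate.walk()]
print(event_ids)
```

Treating a debate, its parts and its speeches as the same kind of object is what lets one traversal cover the whole hierarchy.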

Page 13: Connecting political data to media data

• The part-of structure and chronological order of the debates.

[Figure: part-of structure of debate nl.proc.sgd.d.194519460000002 (dc:title "Handelingen Verenigde Vergadering..."). Nodes shown: nl.proc.sgd.d.194519460000002.1 (PartOfDebate) and nl.proc.sgd.d.194519460000002.2; nl.proc.sgd.d.194519460000002.1.1 (DebateContext); nl.proc.sgd.d.194519460000002.1.2 (Speech); nl.proc.sgd.d.194519460000002.1.3. Text literals shown: "De voorzitter opent de vergadering…" (hasText) and "Mijnheer de Voorzitter, de Commissie van …" (hasSpokenText). Properties shown: rdf:type, dc:title, hasPart, hasSubsequentPartOfDebate, hasSubsequentSpeech, hasText, hasSpokenText.]
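
A minimal sketch of how the part-of and ordering links could be derived from an ordered list of identifiers. The function name and the tuple representation of triples are assumptions for illustration; the property names `hasPart` and `hasSubsequentPartOfDebate` are the ones shown in the figure.

```python
def structure_triples(parent, children, next_property):
    """Link a parent to its ordered children and each child to the next one."""
    triples = [(parent, "hasPart", child) for child in children]
    # Pair every child with its successor to record chronological order.
    for current, following in zip(children, children[1:]):
        triples.append((current, next_property, following))
    return triples


debate_id = "nl.proc.sgd.d.194519460000002"
parts = [debate_id + ".1", debate_id + ".2"]
triples = structure_triples(debate_id, parts, "hasSubsequentPartOfDebate")
for triple in triples:
    print(triple)
```

The same function could be called with `hasSubsequentSpeech` for the speeches inside one part of a debate.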

Page 14: Connecting political data to media data

• The different roles and parties that a speaker can have in his/her career.

[Figure: the Speech nl.proc.sgd.d.194519460000002.1.2 (hasSpokenText "Mijnheer de Voorzitter, de Commissie van …") with its speaker. Nodes shown: Speaker_00064; the role member_of_parliament; Party_kvp (a Party, with KVP and Katholieke Volkspartij as acronym and full name); a Politician named Joannes Antonius James Barge; the newspaper article http://resolver.kb.nl/resolve?urn=ddd:011198136:mpeg21:a0525:ocr. Properties shown: rdf:type, hasSpokenText, sem:hasActor, hasSpeaker, hasParty, hasRole, coveredIn, hasAcronym, hasFullName, foaf:firstName, foaf:lastName, rdfs:label.]
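
One way to read the figure is that a speech does not point at a politician directly but at a speaker record that combines a politician with the party and role held at that moment. The sketch below illustrates that reading; the class name, field names and all data values are hypothetical placeholders, not entries from the dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeakerContext:
    """A politician together with the party and role held when speaking."""
    politician: str
    party: str
    role: str


def parties_of(politician, speech_actors):
    """Return the sorted parties a politician has spoken for."""
    return sorted(
        {ctx.party for ctx in speech_actors.values() if ctx.politician == politician}
    )


# Hypothetical speeches and actors, for illustration only.
speech_actors = {
    "speech_1": SpeakerContext("politician_a", "party_x", "member_of_parliament"),
    "speech_2": SpeakerContext("politician_a", "party_y", "minister"),
    "speech_3": SpeakerContext("politician_b", "party_x", "member_of_parliament"),
}

print(parties_of("politician_a", speech_actors))
```

Keeping party and role on the intermediate record means a change of party never rewrites the politician's own entry.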

Page 15: Connecting political data to media data

Step 2: Linking speeches in the debate to the newspaper articles that cover them

We created a linking method to deal with our two challenges:

1. How to link documents that are so different in nature?

2. Can we use the structure of the debates: people, chronological order of speeches, introductions to each new topic, etc.?

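[Figure: the linking pipeline. Inputs from the debates: the speeches, the name of the speaker and the date of the debate. Steps: detect topics in speeches; detect named entities in speeches; create queries; search newspaper archive; rank candidate articles. Intermediate results: topics, named entities, queries, candidate articles. Output: links between speeches and articles.]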

Page 16: Connecting political data to media data

Step 2: Linking speeches in the debate to the newspaper articles that cover them

Intuition 1: The name of the speaker should appear in the article and the article should be published within a week of the debate
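
Intuition 1 can be sketched as a simple filter over articles. The dictionary layout, the function name and the case-insensitive substring match are assumptions; the slide also leaves open whether "within a week" includes days before the debate, so this sketch assumes publication on the debate date or up to seven days after it. The articles are invented examples.

```python
from datetime import date, timedelta


def is_candidate(article, speaker_name, debate_date, window_days=7):
    """True if the article names the speaker and appears soon after the debate."""
    delay = article["published"] - debate_date
    in_window = timedelta(0) <= delay <= timedelta(days=window_days)
    names_speaker = speaker_name.lower() in article["text"].lower()
    return in_window and names_speaker


# Hypothetical articles, for illustration only.
articles = [
    {"id": "a1", "published": date(1945, 11, 21), "text": "De heer Barge sprak gisteren."},
    {"id": "a2", "published": date(1945, 12, 15), "text": "De heer Barge sprak opnieuw."},
    {"id": "a3", "published": date(1945, 11, 22), "text": "Geen naam in dit bericht."},
]
candidates = [
    a["id"] for a in articles if is_candidate(a, "Barge", date(1945, 11, 20))
]
print(candidates)
```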

Page 17: Connecting political data to media data

Step 2: Linking speeches in the debate to the newspaper articles that cover them

Intuition 2: The more the article and the speech overlap in topics and named entities, the more closely they are related.
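
A minimal sketch of Intuition 2, assuming the overlap is measured as the number of speech terms (topics and named entities) that occur in the article text. The slides do not give the actual scoring function, so this count, the function names and the example data are stand-ins.

```python
def overlap_score(speech_terms, article_text):
    """Count how many speech terms occur in the article (case-insensitive)."""
    text = article_text.lower()
    return sum(1 for term in speech_terms if term.lower() in text)


def rank_candidates(speech_terms, candidate_articles):
    """Sort candidate articles by descending overlap with the speech."""
    return sorted(
        candidate_articles,
        key=lambda article: overlap_score(speech_terms, article["text"]),
        reverse=True,
    )


# Hypothetical terms and articles, for illustration only.
speech_terms = ["Indonesië", "begroting", "Commissie"]
candidate_articles = [
    {"id": "a1", "text": "De begroting werd besproken."},
    {"id": "a2", "text": "De Commissie sprak over Indonesië en de begroting."},
    {"id": "a3", "text": "Sportuitslagen van het weekend."},
]
ranked_ids = [a["id"] for a in rank_candidates(speech_terms, candidate_articles)]
print(ranked_ids)
```

Substring matching is crude, since a short term can match inside a longer word; a real ranking would more likely tokenize and weight terms.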

Page 18: Connecting political data to media data

Evaluation: what do we use to rank the candidate articles?

• Experiment on 150 <newspaper article, speech in debate> pairs, 2 raters, K = 0.5

• Compare text of candidate articles to:

• Setting 1: Named Entities in speech

• Setting 2: Named Entities + Topics in speech

• Setting 3: Named Entities + Topics in speech and larger part-of-debate

Score                                Setting 1   Setting 2   Setting 3
I don’t know                         0.14        0.15        0.08
0 - unrelated                        0.38        0.23        0.12
1 - related                          0.29        0.36        0.36
2 - explicit mention of the debate   0.19        0.26        0.44
1 + 2                                0.48        0.62        0.80
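
The agreement figure K = 0.5 is presumably Cohen's kappa for the two raters; that reading is an assumption, since the slide does not name the statistic. The sketch below computes Cohen's kappa for two lists of labels, using invented ratings rather than the project's data.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement: probability that both raters pick the same label
    # if each follows their own label distribution independently.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    # Note: kappa is undefined (division by zero) when expected equals 1.
    return (observed - expected) / (1 - expected)


# Hypothetical ratings on the 0/1 relatedness labels, for illustration only.
rater_1 = [0, 0, 1, 1]
rater_2 = [0, 1, 1, 1]
kappa = cohen_kappa(rater_1, rater_2)
print(kappa)
```

In this toy example the raters agree on three of four items, but half of that agreement is expected by chance, which is why kappa is lower than the raw agreement.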

Page 19: Connecting political data to media data

Results

• An open data set of Dutch parliamentary debates,

• with almost 3 million links between 450,000 speeches and the URLs of 1.5 million newspaper articles and radio bulletins at the National Library,

• accessible through a Web demonstrator and through a SPARQL endpoint.

Page 20: Connecting political data to media data

Demo

Page 21: Connecting political data to media data
Page 22: Connecting political data to media data
Page 23: Connecting political data to media data
Page 24: Connecting political data to media data
Page 25: Connecting political data to media data
Page 26: Connecting political data to media data
Page 27: Connecting political data to media data

SPARQL endpoint

• A service to query a knowledge base using the SPARQL query language.

“All speeches with more than 60 associated news items.”

SELECT ?speech ?no_news_items {
  {
    SELECT ?speech (COUNT(?news) AS ?no_news_items)
    WHERE {
      ?speech <http://purl.org/linkedpolitics/nl/polivoc#coveredAt> ?news .
    }
    GROUP BY ?speech
  }
  FILTER (?no_news_items > 60)
}
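
A hedged sketch of how such a query could be sent from a script, following the standard SPARQL protocol (a GET request with a `query` parameter and a JSON results format). The endpoint URL `http://example.org/sparql` is a placeholder, since the slides do not give the real address, and the helper names are invented. The query is the one from this slide, with the variable in the outer SELECT spelled the same as the alias in the inner one.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

QUERY = """
SELECT ?speech ?no_news_items {
  { SELECT ?speech (COUNT(?news) AS ?no_news_items)
    WHERE { ?speech <http://purl.org/linkedpolitics/nl/polivoc#coveredAt> ?news . }
    GROUP BY ?speech }
  FILTER (?no_news_items > 60)
}
""".strip()


def build_request(endpoint, query):
    """Build (but do not send) a SPARQL protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_counts(results_json):
    """Map each speech URI to its news item count from a JSON result set."""
    data = json.loads(results_json)
    return {
        row["speech"]["value"]: int(row["no_news_items"]["value"])
        for row in data["results"]["bindings"]
    }


request = build_request("http://example.org/sparql", QUERY)  # placeholder URL

# Hand-written sample response in the SPARQL JSON results format.
sample = json.dumps({
    "head": {"vars": ["speech", "no_news_items"]},
    "results": {"bindings": [
        {"speech": {"type": "uri", "value": "http://example.org/speech/1"},
         "no_news_items": {"type": "literal", "value": "61"}},
    ]},
})
counts = parse_counts(sample)
print(counts)
```

Passing `request` to `urllib.request.urlopen` would send it; that step is left out so the sketch runs without network access.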

Page 28: Connecting political data to media data
Page 29: Connecting political data to media data
Page 30: Connecting political data to media data
Page 31: Connecting political data to media data
Page 32: Connecting political data to media data

Reflection: to what extent can we answer these questions?

• Which events have attracted a lot of media attention?

• What are the differences between different media? E.g. in different newspapers, or newspapers vs. radio bulletins?

• Has the coverage changed over time?

• How are the events visualized (photos, layout of newspaper, etc.)?

Page 33: Connecting political data to media data

Future work

• More types of links

• From just “coveredIn” to “quotedIn”, “coveredIn”, “backgroundOf”, “talksAbout”

• More types of media

• More types of (political) events.

Page 34: Connecting political data to media data

Project ‘Talk of Europe / Traveling Clarin Campus’

2014-2015

Funded by CLARIN-ERIC

From left to right: Max Kemman, Marnix van Berchum, Laura Hollink, Astrid van Aggelen, Steven Krauwer, Henri Beunders. (Unfortunately, Martijn Kleppe and Johan Oomen were not present to join the group pic.)

Page 35: Connecting political data to media data

Plans of ‘ToE/TTC’

1. Publish proceedings of the EU parliamentary debates in RDF

• hosted by DANS

2. Organize 3 workshops/hackathons/‘Traveling Clarin Campuses’ in which we invite international partners to work with the data.

3. In collaboration with international partners:

• enrich with annotations, e.g. topics, structured data about people, parties, etc.

• link to national datasets, e.g. media or national parliaments

Page 36: Connecting political data to media data