ICA Slides
What is the problem? How can we deal with concept drift? Summary
Extensional Mapping-Chains for studying Concept Drift in Political Ontologies
Shenghui Wang (1), Stefan Schlobach (2), Janet Takens (3), Wouter van Atteveldt (3)
(1) The Network Institute
(2) Department of Computer Science
(3) Department of Communication Science
Vrije Universiteit Amsterdam
ICA 2010, Singapore
Content analysis in Communication Science
Communication scientists study all sorts of media content related to human communication
Content analysis based on the NET method
concepts: political actors and issues
relations: associations, opinions, or actions
Example
Het Openbaar Ministerie (OM) wil de komende vier jaar mensenhandel uitroeien.
(The Public Prosecution Service (OM) wants to eradicate human trafficking in the next four years.)
Coded relation: om -> human trafficking, value -1
Semantic network analysis
[Figure: semantic network graph with numerically labelled nodes]
Network-based communication science study
What information can we extract from these networks?
Politicians are networking
Politics is perceived by citizens via media
Media study by semantic network analysis
Who is determining the subjects?
Who is teaming up?
Who is more credible?
Who owns which topic?
Before network analysis
We first need to build the networks!
Requires: large corpora with annotated textual content
Manual coding against coding books (ontologies)
Automated content analysis in progress
What is the problem?
Problems with constructing annotated content
Data from different time periods or genres
Coded by different teams at different moments
Manifesto Research Group: 25 countries, from 1945 to 2006
Comparative Policy Agendas project: media content, manifestos, legislative texts, government press statements, etc.
Election campaign coverage from 1994 to 2006
What are the challenges?
Interoperability problem while sharing information
Different teams use different code books
Example
illegal immigration
labour migrants
Different coding books should be merged or at least connected
Not the focus of this paper
Follow the Fashion?
Women’s role?
Suffragettes said that women’s role in society is unacceptable
Pope says that women’s role in society is unacceptable
Concept drift
Our problem: Concept drift
Meaning of concepts changes over time
Analysis based on evolving concepts must consider temporal locality
Studying concept drift is useful in itself
Datasets
Five political ontologies that were used to annotate newspaper articles
23,639 manually annotated newspaper articles from five recent Dutch national election campaigns
Manual mappings between the ontologies even exist, but most of them link lexically very similar concepts
Detecting concept drift
We use extensional mapping techniques
Treat the same concept at different times as different concepts
Use an extensional method to detect links between concepts at different times
Assumption: similar sentences should be coded with similar concepts; therefore, similar concepts should have similar extensions.
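The assumption above can be illustrated with a small sketch. It is not the matcher used in this work: it assumes that a concept's extension is the set of sentences coded with it, summarises each extension as a bag of words, and compares extensions by cosine similarity. The function names and the toy sentences are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def extension_vector(sentences):
    """Summarise a concept's extension (its coded sentences) as word counts."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence.lower().split())
    return counts


def cosine(u, v):
    """Cosine similarity between two word-count vectors."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_mappings(source, target, top_k=2):
    """For each source concept, return the top_k most similar target concepts.

    source, target: dict mapping a concept label to its list of coded sentences.
    """
    target_vectors = {c: extension_vector(s) for c, s in target.items()}
    result = {}
    for concept, sentences in source.items():
        vec = extension_vector(sentences)
        scored = sorted(
            ((cosine(vec, tv), tc) for tc, tv in target_vectors.items()),
            reverse=True,
        )
        result[concept] = [(tc, score) for score, tc in scored[:top_k]]
    return result


# Toy, invented sentences; real extensions would come from the annotated corpus.
coded_1994 = {"94_asielzoekers": ["asylum seekers arrive at border",
                                  "refugees seek asylum"]}
coded_1998 = {"98_asielzoekers": ["asylum seekers housed in centre"],
              "98_welvaart": ["economic growth raises prosperity"]}

mappings = best_mappings(coded_1994, coded_1998, top_k=1)
```

Because concepts at different times are treated as different concepts, the same comparison can be run between every pair of consecutive election years.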
Representing concept drift using mapping chains
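One possible way to represent a mapping chain is as a path of concepts, one per period, obtained by following the best mappings from each period to the next. The sketch below assumes that representation; `build_chains`, the period labels and the scores are hypothetical placeholders rather than values from the slides.

```python
def build_chains(start, mappings_per_step):
    """Expand a start concept into all chains reachable through the mappings.

    mappings_per_step: one dict per pair of consecutive periods, mapping a
    concept label to a list of (target_label, score) pairs.
    A chain is a list of (label, score) pairs; the start concept has score None.
    Chains that reach a concept without any outgoing mapping are dropped.
    """
    chains = [[(start, None)]]
    for step in mappings_per_step:
        extended = []
        for chain in chains:
            last_label, _ = chain[-1]
            for target, score in step.get(last_label, []):
                extended.append(chain + [(target, score)])
        chains = extended
    return chains


# Hypothetical mappings between three consecutive periods p1 -> p2 -> p3.
steps = [
    {"p1_a": [("p2_a", 0.30), ("p2_b", 0.10)]},
    {"p2_a": [("p3_a", 0.20)],
     "p2_b": [("p3_a", 0.05), ("p3_c", 0.04)]},
]

chains = build_chains("p1_a", steps)
for chain in chains:
    print(" -> ".join(label for label, _ in chain))
```

Keeping the score on every link makes it possible to inspect weak links later, which matters because one wrong early link affects everything that follows it in the chain.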
Evaluating concept drift
What can we learn from those chains?
Do they agree with the political reality?
Do they tell us something we had not noticed before?
Are some concepts more stable/unstable than others?
Quantitative evaluation is interesting, but qualitative analysis seems to tell us something too.
Qualitative analysis of mapping chains
Association vs. similarity
Early erroneous associations can render large parts of the analysis practically useless.
“productiviteit” (Productivity)
[Figure: mapping chain starting from 94_productiviteit; edge weights range from 0.0315 to 0.1505]
1994: 94_productiviteit
1998: 98_welvaart valence
2002: 02_economische groei, 02_welvaart
2003: 03_economische groei, 03_financieringstekort, 03_spaarloon
2006: 06_economic growth, 06_begroting, 06_bezuinigingen, 06_spaarloon, 06_levensloopregeling
“euthanasie” (Euthanasia)
[Figure: mapping chain starting from 94_euthanasie; edge weights range from 0.0165 to 0.3491]
1994: 94_euthanasie
1998: 98_oeuthanasie, 98_hreferendum
2002: 02_euthanasie, 02_milieuactivist, 02_referendum, 02_cdavvdlpf
2003: 03_euthanasie, 03_homohuwelijk, 03_milieuactivist, 03_justitie, 03_referendum eu, 03_referendum, 03_zondagsrust, 03_scholieren
2006: 06_gay marriage, 06_abortion, 06_criminelen, 06_verbetering communicatie overheid burger, 06_asielzoekers, 06_gratis schoolboeken, 06_referendum, 06_burgerinitiatief, 06_werknemers, 06_sunday rest, 06_leerlingen, 06_education
If we know two end-point concepts have the same meaning
Kite-shaped chains
[Figure: kite-shaped mapping chains from 94_asielzoekers to 06_asielzoekers]
1994: 94_asielzoekers
1998: 98_rcriminaliteit, 98_avluchtelingen, 98_okerken, 98_asielzoekers, 98_kabinet kokmierlods
2002: 02_criminaliteit, 02_jusititie, 02_cellentekort, 02_drugkoeriers, 02_mensenrechten, 02_instroom beperking, 02_asielzoekers, 02_democratie, 02_buitenlanders, 02_bedrijfsleven
2003: 03_politie, 03_justitie, 03_criminaliteit, 03_asielzoekers, 03_opvang illegalen, 03_illegalen, 03_vluchtelingen
2006: 06_asielzoekers
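One reading of the kite shape: when the start and end concepts are known to have the same meaning, only the chains that connect them are kept, and the concepts in between are candidates for related meanings. The sketch below follows that reading. The function names are hypothetical, and the intermediate labels (`98_x`, `02_y`, and so on) are placeholders, not paths read from the figure.

```python
def kite_chains(chains, start, end):
    """Keep only the chains that run from `start` to `end`."""
    return [c for c in chains if c and c[0] == start and c[-1] == end]


def intermediate_concepts(chains):
    """Collect the concepts strictly between the end points, in first-seen order."""
    seen = []
    for chain in chains:
        for concept in chain[1:-1]:
            if concept not in seen:
                seen.append(concept)
    return seen


# Placeholder chains; only the two end-point labels come from the slides.
all_chains = [
    ["94_asielzoekers", "98_x", "02_y", "03_z", "06_asielzoekers"],
    ["94_asielzoekers", "98_x", "02_w", "03_z", "06_asielzoekers"],
    ["94_asielzoekers", "98_v", "02_u", "03_t", "06_other"],
]

kite = kite_chains(all_chains, "94_asielzoekers", "06_asielzoekers")
related = intermediate_concepts(kite)
```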
“christelijken” (Christians)
[Figure: mapping chain from 94_christelijken to 06_christenen]
1994: 94_christelijken
1998: 98_ochristelijk christenen, 98_oabortus
2002: 02_normen waarden, 02_multiculturele samenleving
2003: 03_multiculturele samenleving
2006: 06_christenen
“asielzoekers” (Asylum seekers)
[Figure: mapping chains from 94_asielzoekers to 06_asielzoekers]
1994: 94_asielzoekers
1998: 98_rcriminaliteit, 98_avluchtelingen, 98_okerken, 98_asielzoekers, 98_kabinet kokmierlods
2002: 02_criminaliteit, 02_jusititie, 02_cellentekort, 02_drugkoeriers, 02_mensenrechten, 02_instroom beperking, 02_asielzoekers, 02_democratie, 02_buitenlanders, 02_bedrijfsleven
2003: 03_politie, 03_justitie, 03_criminaliteit, 03_asielzoekers, 03_opvang illegalen, 03_illegalen, 03_vluchtelingen
2006: 06_asielzoekers
Summary
By looking at the extensions of concepts, we can detect concept drift
Domain experts found that the detected concept drift makes sense
Automated matching techniques can help domain experts find hidden links between concepts
More work needs to be done