Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and...
Honkela, Korhonen, Lagus & Saarinen, WSOM 2014
Timo Honkela, Jaakko Korhonen, Krista Lagus and Esa Saarinen
Five-dimensional sentiment analysis of corpora, documents and words
WSOM 2014, Mittweida, Germany
4th of June, 2014
Timo Honkela, Jaakko Korhonen, Krista Lagus, Esa Saarinen
Industrial Engineering and Management
Information and Computer Science
2014--
-2013
Original domain of interest:
Life-philosophical lecturing
– and how to understand its effects
[Figure: Before: student essays and survey → lecture series (up to approx. 1,000 students) → After: student essays and survey]
Different modes of education
● The dominant lecturing practices seek to function as a channel for predetermined knowledge and theories. The goal is then to make the listeners adopt the insights, scholarship or philosophy of the lecturer.
● In contrast, in life-philosophical lecturing “the paramount aim is to facilitate, stimulate and vitalize the participants’ own life-philosophical thinking in the first-person – his or her use of the reflective mind”
● Life-philosophical lecturing is a form of positive philosophical practice and seeks key inspiration from the breakthroughs of the positive psychology movement
Saarinen, Esa (2013): Life-Philosophical Lecturing as a Systems-Intelligent Technology of the Self. The XXIII World Congress of Philosophy, Athens, Greece.
THINK BEFORE YOU THINK
http://www.aalto.fi/fi/current/news/2014-03-24-004/
Academician Teuvo Kohonen, 23rd of March, 2014
http://systemsintelligence.aalto.fi/
“By Systems Intelligence we mean intelligent behaviour in the context of complex systems involving interaction and feedback. A subject acting with Systems Intelligence engages successfully and productively with the holistic feedback mechanisms of her environment. She perceives herself as part of a whole, the influence of the whole upon herself as well as her own influence upon the whole. By observing her own interdependence in the feedback intensive environment, she is able to act intelligently.”
Esa Saarinen and Raimo P. Hämäläinen (2004): Systems Intelligence: Connecting Engineering Thinking with Human Sensitivity.
Original research question
● How to assess the effects of life-philosophical lecturing among the students who participate in the lecture series?
● Data: surveys filled in and essays written before and after the lecture series
Challenge and reorientation
● How to evaluate the developments that potentially take place in individuals' abilities to use their reflective mind?
● After a series of experimental analyses and intensive research meetings on psychology, philosophy and methodology, we decided to focus on more specific research questions
● The original research question deserves a longer research program
A longish detour:
Perspectives on language, cognition and human knowing
and
How to model them with “our methods”
What do we know about language?
What can we achieve by making machines “read”?
Language as a system
● Considering natural language as a signal and dynamic system at cognitive and social levels (also in its written form) rather than a symbolic and logical system
● Importance of embodiment (cf. e.g. Harnad) and embeddedness (cf. e.g. Edelman)
● Learning and pattern recognition processes are essential (as opposed to the theories presented e.g. by Chomsky, Fodor, Pinker); much of the learning is bound to be unsupervised
● More than 6,000 languages, many more dialects
● Billions of people
● A large number of different cultures
● A vast number of ways to relate language, concepts and the world to each other
(Image sources: blogs.state.gov, en.wikipedia.org)
Example:
Complexity of Finnish at the level of word forms
Kimmo Koskenniemi (2013): Johdatus kieliteknologiaan, sen merkitykseen ja sovelluksiin (Introduction to language technology, its significance and applications)
https://helda.helsinki.fi/bitstream/handle/10138/38503/kt-johd.pdf?sequence=1
General communication system and measuring information (Shannon & Weaver)
[Diagram: information source → message → transmitter → signal → received signal (with a noise source acting on the channel) → receiver → message → destination]
$H = -\sum_i p_i \log p_i$
Noisy channel model
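The entropy formula on the slide can be made concrete in a few lines. Below is a minimal Python sketch, assuming base-2 logarithms (so H is measured in bits) and the usual convention that terms with p_i = 0 contribute nothing. The helper names entropy and text_entropy are hypothetical, introduced only for illustration.

```python
import math
from collections import Counter

def entropy(probabilities, base=2.0):
    """Shannon entropy H = -sum_i p_i log p_i; zero-probability terms are skipped."""
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)

def text_entropy(text, base=2.0):
    """Entropy of the empirical character distribution of a string."""
    counts = Counter(text)
    total = sum(counts.values())
    return entropy((c / total for c in counts.values()), base)

print(entropy([0.5, 0.5]))     # a fair coin: 1 bit
print(entropy([0.25] * 4))     # four equiprobable symbols: 2 bits
print(text_entropy("aaab"))    # skewed distribution: less than 1 bit
```

Uniform distributions maximise H for a given number of symbols, which is why the skewed string "aaab" carries less information per character than a fair coin toss.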
Weaver on Shannon
● “Relative to the broad subject of communication, there seem to be problems at three levels. [...]
– LEVEL A. How accurately can the symbols of communication be transmitted? (The technical problem)
– LEVEL B. How precisely do the transmitted symbols convey the desired meaning? (The semantic problem)
– LEVEL C. How effectively does the received meaning affect conduct in the desired way? (The effectiveness problem)”
● “The semantic problems are concerned with the identity, or satisfactorily close approximation, in the interpretation of meaning by the receiver, as compared with the intended meaning of the sender.” (1949, p. 4)
Point of view from cognitive linguistics
● The meaning of linguistic symbols in the mind of the language users derives from the users' sensory perceptions, their actions with the world and with each other.
● For example, the meaning of the word 'walk' involves
– what walking looks like
– what it feels like to walk and after having walked
– how the world looks when walking (e.g. objects approach at a certain speed, etc.).
– ...
Meaning is contextual
red wine, red skin, red shirt
Gärdenfors: Conceptual Spaces
Hardin: Color for Philosophers
Meaning is subjective
● Good
● Fair
● Useful
● Scientific
● Democratic
● Sustainable
● etc.
A proper theory of meaning has to take this into account
Distributional hypothesis
● Two words are semantically similar to the extent that their contextual representations are similar (Miller & Charles 1991)
● The meaning of words is in their use (Wittgenstein)
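The distributional hypothesis can be illustrated with raw co-occurrence counts: each word is represented by the words appearing near it, and similarity is measured between those context vectors. A minimal sketch, assuming a symmetric window of two tokens, cosine similarity and a made-up toy corpus; context_vectors and cosine are hypothetical helper names.

```python
import math
from collections import Counter, defaultdict

def context_vectors(tokens, window=2):
    """For every word, count the words occurring within +/- `window` positions."""
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[word][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[key] for key, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

corpus = (". the cat drinks milk . the dog drinks milk . "
          "the cat eats fish . the dog eats fish .").split()
vecs = context_vectors(corpus)
print(cosine(vecs["cat"], vecs["dog"]))    # identical contexts: ~1.0
print(cosine(vecs["cat"], vecs["milk"]))   # partly overlapping contexts
```

In this toy corpus "cat" and "dog" occur in exactly the same contexts, so their similarity is 1 up to rounding, while "cat" and "milk" share only part of their contexts.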
Modeling distributional similarity: word space models
● Word space models represent meaning as points or areas in a high-dimensional vector space
– Self-Organizing Semantic Maps (Ritter and Kohonen 1989)
– LSA (Landauer & Dumais 1997)
– HAL (Lund & Burgess 1996)
– Conceptual spaces (Gärdenfors 2000)
– Word ICA (Honkela, Hyvärinen & Väyrynen 2004)
– etc. etc.
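As an example of one of the listed models, the sketch below builds simplified HAL-style vectors. It assumes our reading of Lund & Burgess (1996): a moving window in which a word at distance d from the target gets weight window - d + 1, with preceding and following contexts kept in separate halves of the vector. Normalisation and the dimensionality reduction used in practice are omitted, and hal_vectors is a hypothetical name.

```python
from collections import Counter, defaultdict

def hal_vectors(tokens, window=10):
    """Simplified HAL-style word vectors (after Lund & Burgess 1996).

    A word at distance d (1 <= d <= window) from the target adds weight
    window - d + 1. Preceding and following contexts are counted separately
    and concatenated into one vector per vocabulary word.
    """
    before = defaultdict(Counter)   # target -> weighted counts of preceding words
    after = defaultdict(Counter)    # target -> weighted counts of following words
    for i, target in enumerate(tokens):
        for d in range(1, window + 1):
            weight = window - d + 1
            if i - d >= 0:
                before[target][tokens[i - d]] += weight
            if i + d < len(tokens):
                after[target][tokens[i + d]] += weight
    vocab = sorted(set(tokens))
    vectors = {w: [before[w][c] for c in vocab] + [after[w][c] for c in vocab]
               for w in vocab}
    return vectors, vocab

vectors, vocab = hal_vectors("the old man saw the old dog".split(), window=3)
print(vocab)
print(vectors["old"])
```

Keeping the two directions apart lets the vector distinguish, for example, words that typically precede a noun from words that typically follow it.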
Classical example: Learning meaning from context:
Maps of words in Grimm fairy tales
Honkela, Pulkki & Kohonen 1995
Automated learning of word relations using self-organizing map on text context data
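Word maps of this kind are produced with the self-organizing map. The following is a minimal, unoptimised sketch of the online Kohonen learning rule on a rectangular grid, assuming a Gaussian neighbourhood and a linearly decaying learning rate and radius. The parameter values and the two-dimensional toy word vectors are invented for illustration; in a study such as the Grimm tales map each word would instead be represented by its much higher-dimensional context statistics.

```python
import math
import random

def best_matching_unit(weights, x):
    """Return the map unit whose prototype vector is closest to x (squared Euclidean)."""
    return min(weights, key=lambda u: sum((w - v) ** 2 for w, v in zip(weights[u], x)))

def train_som(data, rows, cols, epochs=100, lr0=0.5, sigma0=None, seed=0):
    """Train a rectangular self-organizing map with the online Kohonen rule."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 or max(rows, cols) / 2.0
    # One randomly initialised prototype vector per grid unit (row, col).
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):            # shuffled presentation order
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                      # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)      # shrinking neighbourhood radius
            bmu = best_matching_unit(weights, x)
            for unit, w in weights.items():
                grid_d2 = (unit[0] - bmu[0]) ** 2 + (unit[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))   # Gaussian neighbourhood
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])       # move prototype towards x
            step += 1
    return weights

# Toy 2-D "context vectors" for four words (invented for illustration).
words = {"cat": [0.0, 0.0], "dog": [0.1, 0.0], "milk": [1.0, 1.0], "fish": [0.9, 1.0]}
som = train_som(list(words.values()), rows=1, cols=2)
for word, vec in words.items():
    print(word, best_matching_unit(som, vec))
```

Words with similar context vectors end up on the same or neighbouring units, which is what makes the resulting maps readable as semantic maps.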
Map of Finnish Science
[Figure: map with regions labeled Chemistry, Physics and engineering, Biosciences, Medicine, and Culture and society]
A fully automated process from terminology extraction (Likey) to semantic space construction (SOM) without any manually constructed resources.
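The slide names Likey as the terminology extraction step. As we understand it, Likey scores a phrase by comparing its frequency rank in a document with its frequency rank in a reference corpus; the sketch below implements that rank-ratio idea for single words only. Restricting to unigrams, the alphabetical tie-breaking and the rank assigned to words absent from the reference corpus are our assumptions, so this is an illustration of the idea rather than the Likey implementation. The function names are hypothetical.

```python
from collections import Counter

def frequency_ranks(tokens):
    """Rank words by descending frequency (1 = most frequent); ties broken alphabetically."""
    counts = Counter(tokens)
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {w: r for r, w in enumerate(ordered, start=1)}

def rank_ratio_keywords(document, reference, top_k=5):
    """Score words by document rank / reference rank; low scores mark characteristic terms.

    Words missing from the reference get rank len(reference vocabulary) + 1 (an assumption).
    """
    doc_ranks = frequency_ranks(document)
    ref_ranks = frequency_ranks(reference)
    unseen = len(ref_ranks) + 1
    scores = {w: r / ref_ranks.get(w, unseen) for w, r in doc_ranks.items()}
    return sorted(scores, key=lambda w: (scores[w], w))[:top_k]

reference = "the of and a in the of to the and is".split()
document = "the enzyme catalyses the reaction enzyme kinetics".split()
print(rank_ratio_keywords(document, reference, top_k=3))
```

Frequent function words such as "the" rank high in both texts and therefore get large ratios, while domain terms that are frequent in the document but rare or absent in the reference rise to the top.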
Language as dimensionality reduction?
ICA of word contexts; nonlinearity through thresholding
Comparison with SVD/LSA
Effect of sparseness and meaningful emergent components
Data: TOEFL tests
(Väyrynen, Lindqvist, Honkela 2007)
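TOEFL synonym questions are a common benchmark for word space models: for each stem word, the model picks the alternative whose vector is most similar to the stem. A minimal sketch of that scoring procedure, assuming cosine similarity; the three-dimensional vectors are invented and do not come from the ICA or SVD models compared in the study, and the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def answer_synonym_question(vectors, stem, alternatives):
    """Choose the alternative whose vector is most similar to the stem word."""
    return max(alternatives, key=lambda alt: cosine(vectors[stem], vectors[alt]))

def toefl_accuracy(vectors, questions):
    """Fraction of (stem, alternatives, correct answer) questions answered correctly."""
    hits = sum(answer_synonym_question(vectors, stem, alts) == gold
               for stem, alts, gold in questions)
    return hits / len(questions)

# Toy vectors, invented for illustration only.
vectors = {
    "enormous": [0.9, 0.1, 0.0], "huge": [0.8, 0.2, 0.1],
    "tiny": [0.0, 0.9, 0.2], "quick": [0.1, 0.0, 0.9],
    "rapid": [0.2, 0.1, 0.8], "slow": [0.1, 0.8, 0.3],
}
questions = [
    ("enormous", ["huge", "tiny", "quick"], "huge"),
    ("quick", ["slow", "rapid", "tiny"], "rapid"),
]
print(toefl_accuracy(vectors, questions))
```

Because the procedure only needs a similarity function, the same questions can score any word space, which is what makes it convenient for comparing representations such as ICA and SVD/LSA components.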
Why brains?
● What are the central differences between plants and animals?
“The original need for a nervous system was to coordinate movement, so an organism could go find food, instead of waiting for the food to come to it.” http://www.fi.edu/learn/brain/
(Förger, Honkela & Takala, 2013)
(Honkela & Förger, 2013)
Förger, Honkela & Takala (2013)
[Figure: motion examples labeled WALKING and RUNNING]
Timo Honkela, Juha Raitio, Krista Lagus, Ilari T. Nieminen, Nina Honkela, and Mika Pantzar: Subjects on objects in contexts: Using GICA method to quantify epistemological subjectivity (IJCNN 2012)
GICA: Analysis of Subject-Object-Context tensors
● Text mining is used in populating a Subject-Object-Context tensor
● The tensor was populated by counting how often a subject uses an object word in the context of a context word
– Context window of 30 words
GICA: State of the Union Addresses
(Honkela, Raitio, Lagus, Nieminen, Honkela & Pantzar, IJCNN 2012)
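Populating the Subject-Object-Context tensor amounts to windowed counting. A minimal sketch, assuming that each document is attributed to one subject (e.g. the speaker of an address), that the object and context vocabularies are fixed in advance, and that the window extends `window` tokens on each side of the object word; the slide does not say how the 30-word window is positioned. soc_tensor is a hypothetical name, and the tensor is stored sparsely as a Counter keyed by (subject, object, context).

```python
from collections import Counter

def soc_tensor(documents, objects, contexts, window=30):
    """Populate a sparse Subject-Object-Context count tensor.

    documents: iterable of (subject, tokens) pairs.
    Every time an object word occurs, each context-word token within
    `window` positions on either side is counted once.
    """
    tensor = Counter()                     # (subject, object_word, context_word) -> count
    objects, contexts = set(objects), set(contexts)
    for subject, tokens in documents:
        for i, word in enumerate(tokens):
            if word not in objects:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i and tokens[j] in contexts:
                    tensor[(subject, word, tokens[j])] += 1
    return tensor

docs = [
    ("A", "health care costs rise while health insurance".split()),
    ("B", "public health and security matter".split()),
]
t = soc_tensor(docs, objects=["health"], contexts=["care", "insurance", "security"], window=3)
print(dict(t))
```

Slicing the resulting tensor by subject gives each subject's own object-by-context matrix, which is the kind of structure GICA then analyses.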
Analysis of the word 'health'
(Honkela, Raitio, Lagus, Nieminen, Honkela & Pantzar, IJCNN 2012)
Quantifying the effect of “semantic noise”
● Sintonen, Raitio & Honkela: “Quantifying the effect of meaning variation in survey analysis”, forthcoming in ICANN 2014
Concept Formation and Communication: General Theory
Timo Honkela, Ville Könönen, Tiina Lindh-Knuutila, and Mari-Sanna Paukkeri. Simulating processes of concept formation and communication. Journal of Economic Methodology, 15(3):245–259, 2008.
$\lambda : C_i \times C_j \to \mathbb{R}$, $i \neq j$: a distance between two points in the concept spaces of different agents
$S_i$: symbol space, the vocabulary of an agent that consists of discrete symbols
$\xi_i : S_i \to C_i$: an individual mapping function from symbols to concepts
$\varphi_i : S_i \to D$: an individual mapping from agent i's vocabulary to the signal space $D$, and an inverse mapping $\varphi_i^{-1} : D \to S_i$ from the signal space to the symbol space
$C_i$: N-dimensional metric concept space
Observing $f_1$ and after a symbol selection process, agent 1 communicates a symbol $s^*$ to agent 2 as signal $d = \varphi_1(s^*)$. When agent 2 observes $d$, it maps it to some $s_2 \in S_2$ by using the function $\varphi_2^{-1}$. Then it maps the symbol to some point in its concept space by using $\xi_2$. If this point is close to its observation $f_2$ in the sense of $\lambda$, the communication process has succeeded.
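The communication process defined above can be simulated directly. A minimal sketch, assuming that both agents' concept spaces use the same coordinates so that closeness can be measured as Euclidean distance, that symbol selection picks the symbol whose concept lies nearest to the observation, and that success means a distance below a fixed threshold. Agent, communicate and the threshold value are hypothetical and not taken from Honkela et al. (2008).

```python
import math
from dataclasses import dataclass

@dataclass
class Agent:
    xi: dict    # xi_i: symbol -> point in the agent's concept space C_i
    phi: dict   # phi_i: symbol -> signal in the shared signal space D

    def phi_inverse(self, signal):
        """phi_i^{-1}: map a received signal back to one of the agent's own symbols."""
        for symbol, d in self.phi.items():
            if d == signal:
                return symbol
        return None

    def select_symbol(self, observation):
        """Symbol selection: the symbol whose concept is nearest to the observation."""
        return min(self.xi, key=lambda s: math.dist(self.xi[s], observation))

def communicate(speaker, hearer, f1, f2, threshold=0.5):
    """One round: success if xi_2(phi_2^{-1}(phi_1(s*))) lies within `threshold` of f2."""
    s_star = speaker.select_symbol(f1)
    d = speaker.phi[s_star]                  # signal sent: d = phi_1(s*)
    s2 = hearer.phi_inverse(d)
    if s2 is None:                           # signal not in the hearer's vocabulary
        return False
    return math.dist(hearer.xi[s2], f2) <= threshold   # Euclidean stand-in for lambda

agent1 = Agent(xi={"red": (1.0, 0.0), "blue": (0.0, 1.0)}, phi={"red": "ru", "blue": "si"})
agent2 = Agent(xi={"punainen": (0.9, 0.1), "sininen": (0.1, 0.9)},
               phi={"punainen": "ru", "sininen": "si"})
agent3 = Agent(xi={"punainen": (0.9, 0.1), "sininen": (0.1, 0.9)},
               phi={"punainen": "si", "sininen": "ru"})   # mismatched signal conventions
print(communicate(agent1, agent2, f1=(0.95, 0.05), f2=(1.0, 0.0)))
print(communicate(agent1, agent3, f1=(0.95, 0.05), f2=(1.0, 0.0)))
```

The two agents need not share symbols or identical concept coordinates; communication succeeds as long as their signal conventions line up and their concepts for the same signal are close enough.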
On digital humanities
Digital humanities
● Research within the humanities with the help of computers
– Digital resources
– Computational models
● Basic motivation
– One can already fly to the moon and build sophisticated factory products
– The most important open questions in the world are related to the humanities and social sciences
Digital humanities: content storage and transfer
Computational humanities: content analysis
Societal and Cultural Text Mining
Analyzing about 2 million pages in a historical newspaper collection digitized at the Center for Preservation and Digitisation, National Library of Finland, Mikkeli (“Mittweida of Finland”)
End of the detour
PERMA model
● Seligman and his colleagues have developed the PERMA model, which addresses different aspects of well-being.
● The model includes five components related to subjective well-being:
– Positive emotion (P),
– Engagement (E),
– Relationships (R),
– Meaning (M) and
– Achievement (A)
● Researchers have gathered a PERMA lexicon that is a collection of words that are associated with each of the components in a positive or negative manner
Using PERMA model and lexicon
1) PERMA profiling of document collections. This can provide an overall understanding of the nature of different corpora. We analyze the five-dimensional profile of corpora in six different genres.
2) PERMA profiling of individual documents. This second level of analysis is considered useful for the lecturer, who is given tools for familiarizing himself with certain aspects of hundreds of long essays written by the students.
3) Comparison of PERMA and non-PERMA words. This analysis can be conducted, for example, in order to find new PERMA word candidates. We use here the SOM for this purpose.
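For the first two levels of analysis, a document or corpus profile can be computed by counting lexicon hits per PERMA component. A minimal sketch, assuming each lexicon entry carries one component and a polarity (+1 or -1) and that the profile is the net count per 1,000 tokens; the normalisation is our assumption, and the tiny lexicon is made up for illustration and is not the PERMA lexicon. For Finnish text the tokens would first have to be lemmatised or matched by stem, which this sketch does not do.

```python
from collections import Counter

DIMENSIONS = ("P", "E", "R", "M", "A")

def perma_profile(tokens, lexicon):
    """Five-dimensional profile: net (positive - negative) lexicon hits per 1,000 tokens.

    lexicon maps a lowercase word to (dimension, polarity), polarity being +1 or -1.
    """
    scores = Counter()
    for token in tokens:
        entry = lexicon.get(token.lower())
        if entry:
            dimension, polarity = entry
            scores[dimension] += polarity
    n = max(len(tokens), 1)                  # avoid division by zero for empty input
    return {dim: 1000.0 * scores[dim] / n for dim in DIMENSIONS}

# Hypothetical mini-lexicon for illustration; the real PERMA lexicon is much larger.
lexicon = {
    "joy": ("P", +1), "sad": ("P", -1),
    "absorbed": ("E", +1), "bored": ("E", -1),
    "friends": ("R", +1), "lonely": ("R", -1),
    "purpose": ("M", +1), "pointless": ("M", -1),
    "succeeded": ("A", +1), "failed": ("A", -1),
}
essay = "I felt joy with friends but also lonely and bored at times".split()
print(perma_profile(essay, lexicon))
```

A corpus-level profile can be obtained the same way from the concatenated tokens of all its documents, so corpora of different sizes and genres become comparable on the same five axes.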
One challenge: Complexity of Finnish morphology
PERMA profiles of different corpora
Extending the coverage of a theory-based vocabulary
A map of sentiment words based on context statistics obtained from the WikipediaA corpus.
The words that belong to the PERMA lexicon are marked with a label that indicates the category
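A map like this suggests a simple way to propose new PERMA word candidates: unlabelled words that land on or next to map units occupied by PERMA words. A minimal sketch, assuming the best-matching unit of every word has already been computed with a trained SOM and using a Chebyshev grid distance of one; the function name, the category labels such as "P+" and the example positions are hypothetical. The output is only a list of candidates for manual review.

```python
from collections import defaultdict

def candidate_perma_words(bmu_of, perma_label, max_grid_distance=1):
    """Suggest PERMA categories for unlabelled words from their SOM positions.

    bmu_of: word -> (row, col) of its best-matching unit on a trained map.
    perma_label: word -> category for words already in the lexicon.
    An unlabelled word is proposed for every category that has a labelled word
    within `max_grid_distance` (Chebyshev distance) on the grid.
    """
    labelled_units = defaultdict(set)        # (row, col) -> categories present there
    for word, category in perma_label.items():
        if word in bmu_of:
            labelled_units[bmu_of[word]].add(category)
    suggestions = {}
    for word, (r, c) in bmu_of.items():
        if word in perma_label:
            continue
        nearby = set()
        for (ur, uc), categories in labelled_units.items():
            if max(abs(ur - r), abs(uc - c)) <= max_grid_distance:
                nearby |= categories
        if nearby:
            suggestions[word] = sorted(nearby)
    return suggestions

bmu_of = {"happy": (0, 0), "cheerful": (0, 1), "lonely": (4, 4),
          "isolated": (4, 3), "table": (9, 9)}
perma_label = {"happy": "P+", "lonely": "R-"}
print(candidate_perma_words(bmu_of, perma_label))
```

Words far from any labelled unit, such as "table" here, receive no suggestion, so the map restricts manual review to words whose contexts resemble those of existing PERMA words.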
Danke schön! Kiitos! Tack! Merci! 謝謝! Σας ευχαριστούμε! ¡Gracias!