Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and...

Honkela, Korhonen, Lagus & Saarinen, WSOM 2014 Timo Honkela, Jaakko Korhonen, Krista Lagus and Esa Saarinen Five-dimensional sentiment analysis of corpora, documents and words WSOM 2014, Mittweida, Germany 4th of June, 2014

Page 1: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

Timo Honkela, Jaakko Korhonen, Krista Lagus and Esa Saarinen

Five-dimensional sentiment analysis of corpora, documents and words

WSOM 2014, Mittweida, Germany

4th of June, 2014

Page 2: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

Timo Honkela, Jaakko Korhonen, Krista Lagus, Esa Saarinen

Industrial Engineering and Management

Information and Computer Science

2014--

-2013

Page 3: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

Original domain of interest:

Life-philosophical lecturing

– and how to understand its effects

Before: student essays, survey

Lecture series (up to approx. 1,000 students)

After: student essays, survey

Page 4: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

Different modes of education

● The dominant lecturing practices seek to function as a channel for predetermined knowledge and theories. The goal is then to make the listeners adopt the insights, scholarship or philosophy of the lecturer.

● In contrast, in life-philosophical lecturing “the paramount aim is to facilitate, stimulate and vitalize the participants’ own life-philosophical thinking in the first-person - his or her use of the reflective mind”

● Life-philosophical lecturing is a form of positive philosophical practice and seeks key inspiration from the breakthroughs of the positive psychology movement

Saarinen, Esa (2013): Life-Philosophical Lecturing as a Systems-Intelligent Technology of the Self. The XXIII World Congress of Philosophy, Athens, Greece.

Page 5: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

THINK BEFORE YOU THINK

http://www.aalto.fi/fi/current/news/2014-03-24-004/

Academician Teuvo Kohonen, 23rd of March, 2014

Page 6: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

http://systemsintelligence.aalto.fi/

“By Systems Intelligence we mean intelligent behaviour in the context of complex systems involving interaction and feedback. A subject acting with Systems Intelligence engages successfully and productively with the holistic feedback mechanisms of her environment. She perceives herself as part of a whole, the influence of the whole upon herself as well as her own influence upon the whole. By observing her own interdependence in the feedback intensive environment, she is able to act intelligently.”

Esa Saarinen and Raimo P. Hämäläinen (2004): Systems Intelligence: Connecting Engineering Thinking with Human Sensitivity.

Page 7: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

Original research question

● How to assess the effects of life-philosophical lecturing among the students who participate in the lecture series?

Data

● Surveys filled in and essays written before and after the lecture series

Page 8: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

Challenge and reorientation

● How to evaluate the developments that potentially take place related to individuals' abilities in using their reflective mind?

● After a series of experimental analyses and intensive research meetings on psychology, philosophy and methodology, we decided to focus on more specific research questions

● The original research question is such that it deserves a longer research program

Page 9: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

A longish detour:

Perspectives on language, cognition and human knowing

and

How to model them with “our methods”

Page 10: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

What do we know about language?

What can we achieve by making machines “read”?

Page 11: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

Language as a system

● Considering natural language as a signal and dynamic system at cognitive and social levels (also in its written form) rather than a symbolic and logical system

● Importance of embodiment (cf. e.g. Harnad) and embeddedness (cf. e.g. Edelman)

● Learning and pattern recognition processes are essential (as opposed to the theories presented e.g. by Chomsky, Fodor, Pinker); much of the learning is bound to be unsupervised

Page 12: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

> 6000 languages, many more dialects

Billions of people

A large number of different cultures

A vast number of ways to relate language, concepts and the world to each other

(Image sources: blogs.state.gov, en.wikipedia.org)

Page 13: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

Example: Complexity of Finnish at the level of word forms

Kimmo Koskenniemi (2013): Johdatus kieliteknologiaan, sen merkitykseen ja sovelluksiin (Introduction to language technology, its significance and applications)

https://helda.helsinki.fi/bitstream/handle/10138/38503/kt-johd.pdf?sequence=1

Page 14: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

General communication system and measuring information (Shannon & Weaver)

[Diagram: INFORMATION SOURCE → TRANSMITTER → RECEIVER → DESTINATION. A MESSAGE enters the transmitter and leaves the receiver; the SIGNAL picks up noise from a NOISE SOURCE and arrives as the RECEIVED SIGNAL.]

Noisy channel model

$H = -\sum_i p_i \log p_i$
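The entropy formula on this slide can be illustrated with a minimal Python sketch. The function names and the choice of base-2 logarithms (bits) are assumptions made for illustration; the slide does not specify a base.

```python
import math
from collections import Counter


def shannon_entropy(probabilities, base=2.0):
    """Return H = -sum(p_i * log(p_i)) for a discrete distribution.

    Zero-probability outcomes are skipped, since p * log(p) -> 0 as p -> 0.
    """
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)


def empirical_entropy(symbols, base=2.0):
    """Estimate entropy from the relative frequencies of observed symbols."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return shannon_entropy((c / total for c in counts.values()), base)


# A fair coin carries one bit per toss; a certain outcome carries none.
fair_coin = shannon_entropy([0.5, 0.5])
certain = shannon_entropy([1.0])
text_entropy = empirical_entropy("aabb")
```

The same function applies to word or character distributions estimated from a corpus, which is how the measure is typically used on text.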

Page 15: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

Weaver on Shannon

● “Relative to the broad subject of communication, there seem to be problems at three levels. [...]

– LEVEL A. How accurately can the symbols of communication be transmitted? (The technical problem)

– LEVEL B. How precisely do the transmitted symbols convey the desired meaning? (The semantic problem)

– LEVEL C. How effectively does the received meaning affect conduct in the desired way? (The effectiveness problem)”

● “The semantic problems are concerned with the identity, or satisfactorily close approximation, in the interpretation of meaning by the receiver, as compared with the intended meaning of the sender.” (1949, p. 4)

Page 16: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

Point of view from cognitive linguistics

● The meaning of linguistic symbols in the mind of the language users derives from the users' sensory perceptions, their actions with the world and with each other.

● For example: the meaning of the word 'walk' involves

– what walking looks like

– what it feels like to walk and after having walked

– how the world looks when walking (e.g. objects approach at a certain speed, etc.).

– ...

Page 17: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

Meaning is contextual

red wine

red skin

red shirt

Gärdenfors: Conceptual Spaces

Hardin: Color for Philosophers

Page 18: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

Meaning is subjective

● Good

● Fair

● Useful

● Scientific

● Democratic

● Sustainable

● etc.

A proper theory of meaning has to take this into account

Page 19: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

Distributional hypothesis

● Two words are semantically similar to the extent that their contextual representations are similar (Miller & Charles 1991)

● The meaning of words is in their use (Wittgenstein)
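As a rough illustration of the distributional hypothesis, the sketch below builds context-count vectors from a toy token sequence and compares them with cosine similarity. The toy sentences, the window size of 2 and the function names are assumptions for illustration only, not the setup used in the cited work.

```python
import math
from collections import Counter, defaultdict


def context_vectors(tokens, window=2):
    """Map each word to a Counter of the words seen within +/- window tokens."""
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy corpus (made up): "cat" and "dog" occur in similar contexts.
tokens = ("the cat drinks milk . the dog drinks water . "
          "the cat eats fish . the dog eats meat .").split()
vecs = context_vectors(tokens, window=2)
sim_cat_dog = cosine(vecs["cat"], vecs["dog"])
sim_cat_milk = cosine(vecs["cat"], vecs["milk"])
```

In this toy corpus "cat" and "dog" share more context words than "cat" and "milk", so their vectors come out more similar.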

Page 20: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

Modeling distributional similarity: word space models

● Word space models represent meaning as points or areas in a high dimensional vector space

– Self-Organizing Semantic Maps (Ritter and Kohonen 1989)

– LSA (Landauer & Dumais 1997)

– HAL (Lund & Burgess 1996)

– Conceptual spaces (Gärdenfors 2000)

– Word ICA (Honkela, Hyvärinen & Väyrynen 2004)

– etc. etc.

Page 21: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

Classical example: Learning meaning from context:

Maps of words in Grimm fairy tales

Honkela, Pulkki & Kohonen 1995

Automated learning of word relations using self-organizing map on text context data
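To make the idea of a self-organizing map over context data concrete, here is a minimal pure-Python sketch of SOM training. The map size, the linearly decaying learning rate and neighbourhood width, and the toy two-cluster data are all assumptions chosen for illustration; they are not the settings of the Grimm tales experiment.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid coordinates of the unit whose prototype is closest to x."""
    return min(weights,
               key=lambda u: sum((a - b) ** 2 for a, b in zip(weights[u], x)))


def train_som(data, rows=4, cols=4, epochs=50, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular SOM with a Gaussian neighbourhood function."""
    rng = random.Random(seed)
    dim = len(data[0])
    # One randomly initialised prototype vector per map unit.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood
            bmu = best_matching_unit(weights, x)
            for unit, w in weights.items():
                d2 = (unit[0] - bmu[0]) ** 2 + (unit[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


# Toy "context vectors" (made up) forming two clusters.
data = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
        [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]]
som = train_som(data)
unit_a = best_matching_unit(som, [0.0, 0.0])
unit_b = best_matching_unit(som, [1.0, 1.0])
```

With word context vectors as input, words used in similar contexts end up on the same or neighbouring map units, which is what a word map visualises.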

Page 22: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

Map of Finnish Science

Chemistry

Physics and engineering

Biosciences

Medicine

Culture and society

A fully automated process from terminology extraction (Likey) to semantic space construction (SOM) without any manually constructed resources.

Page 23: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

Language as dimensionality reduction?

ICA of word contexts; nonlinearity through thresholding

Comparison with SVD/LSA

Effect of sparseness and meaningful emergent components

Data: TOEFL tests

(Väyrynen, Lindqvist, Honkela 2007)

Page 24: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

Why brains?

● What are the central differences between plants and animals?

“The original need for a nervous system was to coordinate movement, so an organism could go find food, instead of waiting for the food to come to it.” http://www.fi.edu/learn/brain/

Page 25: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

(Förger, Honkela & Takala, 2013)

Page 26: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

(Honkela & Förger, 2013)

Page 27: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

Förger, Honkela & Takala (2013)

WALKING

RUNNING

Page 28: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

GICA: Analysis of Subject-Object-Context tensors

Timo Honkela, Juha Raitio, Krista Lagus, Ilari T. Nieminen, Nina Honkela, and Mika Pantzar: Subjects on objects in contexts: Using GICA method to quantify epistemological subjectivity (IJCNN 2012)

Page 29: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

GICA: State of the Union Addresses

● Text mining is used in populating a Subject-Object-Context tensor

● This took place by calculating the frequencies of how often a subject uses an object word in the context of a context word

– Context window of 30 words

Timo Honkela, Juha Raitio, Krista Lagus, Ilari T. Nieminen, Nina Honkela, and Mika Pantzar: Subjects on objects in contexts: Using GICA method to quantify epistemological subjectivity (IJCNN 2012)
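The tensor population step described above might look roughly like the following sketch. It stores the tensor sparsely as counts keyed by (subject, object, context) triples. Treating the 30-word window as 30 tokens on each side of the object word is an assumption, as are the function name and the made-up toy speeches; the slide does not say how the window is positioned.

```python
from collections import Counter


def populate_soc_tensor(speeches, object_words, context_words, window=30):
    """Count how often each subject uses an object word near a context word.

    speeches: dict mapping a subject (e.g. a speaker) to a list of tokens.
    Returns a Counter keyed by (subject, object_word, context_word).
    """
    counts = Counter()
    for subject, tokens in speeches.items():
        for i, token in enumerate(tokens):
            if token not in object_words:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i and tokens[j] in context_words:
                    counts[(subject, token, tokens[j])] += 1
    return counts


# Toy data (made up), not taken from the State of the Union corpus.
speeches = {
    "speaker_a": "we must improve health care for every family".split(),
    "speaker_b": "the health of the economy depends on jobs".split(),
}
tensor = populate_soc_tensor(
    speeches,
    object_words={"health"},
    context_words={"care", "family", "economy"},
)
```

Each (subject, object) pair then has a context profile that can be compared across subjects, for example to see how differently two speakers use the word 'health'.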

Page 30: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

Analysis of the word 'health'

Timo Honkela, Juha Raitio, Krista Lagus, Ilari T. Nieminen, Nina Honkela, and Mika Pantzar: Subjects on objects in contexts: Using GICA method to quantify epistemological subjectivity (IJCNN 2012)

Page 31: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

Quantifying the effect of “semantic noise”

● Sintonen, Raitio & Honkela: “Quantifying the effect of meaning variation in survey analysis”, forthcoming in ICANN 2014

Page 32: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

Concept Formation and Communication - General Theory

Timo Honkela, Ville Könönen, Tiina Lindh-Knuutila, and Mari-Sanna Paukkeri. Simulating processes of concept formation and communication. Journal of Economic Methodology, 15(3):245–259, 2008.

$C_i$: $N$-dimensional metric concept space

$\lambda : C_i \times C_j \to \mathbb{R},\ i \neq j$: a distance between two points in the concept spaces of different agents

$S_i$: symbol space, the vocabulary of an agent that consists of discrete symbols

$\xi_i : S_i \to C_i$: an individual mapping function from symbols to concepts

$\varphi_i : S_i \to D$: an individual mapping from agent $i$'s vocabulary to the signal space $D$, and an inverse mapping $\varphi_i^{-1}$ from the signal space to the symbol space

Observing $f_1$, and after a symbol selection process, agent 1 communicates a symbol $s^*$ to agent 2 as signal $d$. When agent 2 observes $d$, it maps it to some $s_2 \in S_2$ by using the function $\varphi_2^{-1}$. Then it maps the symbol to some point in its concept space by using $\xi_2$. If this point is close to its observation $f_2$ in the sense of $\lambda$, the communication process has succeeded.
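The communication process on this slide can be sketched in a few lines of Python. Several choices below are assumptions made only for illustration: the distance is taken to be Euclidean, symbol selection picks the symbol whose concept point is nearest to the observation, "close" means within a fixed threshold, and the agents, symbols and signals are made up.

```python
import math


def concept_distance(p, q):
    """Distance between points in two agents' concept spaces (Euclidean here)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


class Agent:
    def __init__(self, concepts, signals):
        self.concepts = concepts  # symbol -> point in the agent's concept space
        self.signals = signals    # symbol -> signal in the shared signal space
        self.inverse = {d: s for s, d in signals.items()}  # signal -> symbol

    def select_symbol(self, observation):
        """Pick the symbol whose concept point is nearest to the observation."""
        return min(self.concepts,
                   key=lambda s: concept_distance(self.concepts[s], observation))


def communicate(sender, receiver, f_sender, f_receiver, threshold=0.5):
    """Return True if the receiver's interpretation lands near its observation."""
    s_star = sender.select_symbol(f_sender)  # symbol selection by the sender
    d = sender.signals[s_star]               # symbol -> signal
    s2 = receiver.inverse[d]                 # signal -> receiver's symbol
    point = receiver.concepts[s2]            # receiver's symbol -> concept
    return concept_distance(point, f_receiver) <= threshold


agent1 = Agent({"red": (1.0, 0.0), "blue": (0.0, 1.0)},
               {"red": "sig-r", "blue": "sig-b"})
agent2 = Agent({"rouge": (0.9, 0.1), "bleu": (0.1, 0.9)},
               {"rouge": "sig-r", "bleu": "sig-b"})
# An agent whose signal conventions are swapped relative to agent1.
agent3 = Agent({"rouge": (0.9, 0.1), "bleu": (0.1, 0.9)},
               {"rouge": "sig-b", "bleu": "sig-r"})

aligned_ok = communicate(agent1, agent2, (0.95, 0.0), (0.9, 0.05))
swapped_ok = communicate(agent1, agent3, (0.95, 0.0), (0.9, 0.05))
```

Communication succeeds between agents whose signal and concept mappings are aligned and fails when the receiver decodes the same signal to a distant concept.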

Page 33: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

On digital humanities

Page 34: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

Digital humanities

● Research within humanities with the help of computers

– Digital resources

– Computational models

● Basic motivation

– One can already fly to the moon and build sophisticated factory products

– The most important open questions in the world are related to humanities and social sciences

Page 35: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

Digital humanities: content storage and transfer

Computational humanities: content analysis

Page 36: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

Societal and Cultural Text Mining

Page 37: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

Analyzing about 2 million pages in a historical newspaper collection digitized at the Center for Preservation and Digitisation, National Library of Finland, Mikkeli (“Mittweida of Finland”)

Page 38: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

Page 39: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

End of the detour

Page 40: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

PERMA model

● Seligman and his colleagues have developed the PERMA model, which addresses different aspects of well-being.

● The model includes five components related to subjective well-being:

– Positive emotion (P),

– Engagement (E),

– Relationships (R),

– Meaning (M) and

– Achievement (A)

● Researchers have gathered a PERMA lexicon that is a collection of words that are associated with each of the components in a positive or negative manner

Page 41: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

Using PERMA model and lexicon

1) PERMA profiling of document collections. This can provide an overall understanding of the nature of different corpora. We analyze the five-dimensional profile of corpora in six different genres.

2) PERMA profiling of individual documents. This second level of analysis is useful for the lecturer, who is provided with tools for familiarizing himself with certain aspects of hundreds of long essays written by the students.

3) Comparison of PERMA and non-PERMA words. This analysis can be conducted, for example, in order to find new PERMA word candidates. We use here the SOM for this purpose.
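Lexicon-based profiling of a document (level 2 above) could be sketched as follows. The toy lexicon, the scoring by relative frequency of positive and negative lexicon words per component, and the matching on lowercased surface forms are all assumptions for illustration; the actual PERMA lexicon and scoring are not given on the slides, and Finnish text would first need morphological normalisation.

```python
from collections import Counter

# Hypothetical toy lexicon: word -> (PERMA component, polarity).
TOY_LEXICON = {
    "joy": ("P", +1), "sad": ("P", -1),
    "absorbed": ("E", +1), "bored": ("E", -1),
    "friend": ("R", +1), "lonely": ("R", -1),
    "purpose": ("M", +1), "pointless": ("M", -1),
    "success": ("A", +1), "failure": ("A", -1),
}


def perma_profile(tokens, lexicon=TOY_LEXICON):
    """Return {component: (positive_rate, negative_rate)} for a token list.

    Rates are lexicon hits divided by the total number of tokens.
    """
    counts = Counter()
    for token in tokens:
        entry = lexicon.get(token.lower())
        if entry is not None:
            counts[entry] += 1
    n = max(len(tokens), 1)  # avoid division by zero for empty documents
    return {c: (counts[(c, +1)] / n, counts[(c, -1)] / n) for c in "PERMA"}


essay = "my friend felt joy and success but was bored".split()
profile = perma_profile(essay)
```

Profiling a whole corpus (level 1) would amount to applying the same function to the concatenated tokens of all its documents, under the same assumptions.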

Page 42: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

One challenge: Complexity of Finnish morphology

Page 43: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

PERMA profiles of different corpora

Page 44: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

Extending the coverage of a theory-based vocabulary

Page 45: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

A map of sentiment words based on context statistics obtained from the Wikipedia corpus.

The words that belong to the PERMA lexicon are marked with a label that indicates the category

Page 46: Honkela, Korhonen, Lagus & Saarinen: Five-Dimensional Sentiment Analysis of Corpora, Documents and Words

Danke schön! Kiitos! Tack! Merci! 謝謝! Σας ευχαριστούμε! ¡Gracias!