Data mining for analyzing social media


06/05/2013  

Data mining for analyzing the social media

Social networks, video/picture sharing, opinions, news websites, blogs, knowledge sharing, microblogging

Seminar at Dalhousie University – 4/18/2013 – Julien Velcin
Presentation: J. Velcin, http://mediamining.univ-lyon2.fr/people/velcin

Ecosystem of ERIC Lab (Lyon)

• BSc & MSc degrees
• BI, data mining, statistics
• 2 teams: SID & DMD
• Academics
• Companies

Research landscape

[Diagram labels: Data, Data warehouse, Knowledge, Decision; ETL, Online analysis, Data mining; Complex data integration; Multidimensional modeling; Data Mining & Decision (DMD)]

Data Mining & Decision (DMD)

Goal: coping with complex data

• Complex data, e.g. social media (social networks, microblogging, video/picture sharing, opinion sharing, news websites, blogs, knowledge sharing): heterogeneous, voluminous, interconnected, evolving
• Practical issue: recommendation, summarization, information retrieval, multicriteria analysis
• Approach: machine learning, graph analysis, complex data analysis, topological learning, text mining

Outline

• The big picture
• Modeling and analyzing online discussions
• Semi-supervised clustering
• Focus on Project ImagiWeb
• Future lines of research

Section 1: The big picture

From facts to people: the essential role of media

• A long-standing question
• Social representation through the media [Lippmann,22] [Moscovici,76] [Newman and Block,06]
• Digital monitoring of the Web [Chateauraynaud,03]

[Diagram label: public event]

Information overload

[Image credit: Go-Globe.com]

Data journalism

• Crucial need to capture the meaning of the voluminous data provided by modern social media, in order to design new search engine systems
• In particular (MSND workshop @ WWW'12):
  • "How to surface the best comments, videos and pictures from a variety of sources in real time, and then how to verify them?"
  • "How to quickly surface the best comments and work out which ones are worth investigating further?"
  • "How to quickly identify the key influencers on any particular story, so they can get inside information or interview them for their news outlets?"

Salvaged by (media) curation?

• Term originated from art, appears ~2011
• Three-step process:
  • Aggregate: gathering
  • Editorialize: sorting, categorizing, summarizing, presenting…
  • Disseminate: contextualizing, sharing
• Important role of the curator
• Difference between "full curation" and automatic editing (e.g., paper.li)
• Many platforms (Scoop.it!, Storify, Storyful, Hopflow, StumbleUpon, Patch…): http://socialcompare.com/fr/comparison/curation-platforms-amplify-knowledge-plaza-storify

[Rosenbaum,11]

A case study: the "HuffPost"

• Linked with social networks
• Topically indexed
• Available on various devices
• Commented news
• Community of bloggers
• Journalists can play both the roles of curator and community manager

Section 2: Modeling and analyzing online discussions

Online discussions

• Motivation:
  • Abundant, often underused data
  • Crucial for gauging people's opinions
• Contributions:
  • Recommending key messages [Stavrianou et al.,09,10]
  • Extracting the latent social network [Forestier et al.,11]
  • Detecting celebrities from online forums [Forestier et al.,12]
  • Surfacing roles with unsupervised mechanisms [Anokhin et al.,12]

Anatomy of an online discussion

[Figure: diagram of a discussion among participants A, B, C and D]

Recommending key messages [Stavrianou et al.,09,10]

• "Interesting" message: popular, opinionated, pioneer, etc.
• Formalization of 6 criteria + simple aggregation
• Comparison to manually labelled data on 8 French forums
• Results for a priori evaluation:
  • F1-measure ranges from 0.2 to 0.3 for a single criterion
  • F1-measure equals 0.48 for aggregated criteria (simple mean)
• Results for a posteriori evaluations:
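The aggregation by simple mean and the F1 evaluation mentioned for key-message recommendation can be sketched as follows. This is only an illustration: the per-criterion scores, the 0.5 selection threshold and the function names are assumptions, since the transcript does not list the 6 criteria or how messages were selected.

```python
from statistics import mean


def aggregate_scores(criteria_scores):
    """Average the per-criterion scores (each in [0, 1]) into one score per message."""
    return {msg: mean(scores) for msg, scores in criteria_scores.items()}


def f1_measure(predicted, relevant):
    """F1 between the set of recommended messages and the manually labelled set."""
    true_positives = len(predicted & relevant)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(relevant)
    return 2 * precision * recall / (precision + recall)


# Hypothetical scores for three messages under three criteria (toy values).
scores = {"m1": [0.9, 0.8, 0.7], "m2": [0.2, 0.4, 0.0], "m3": [0.6, 0.5, 0.7]}
aggregated = aggregate_scores(scores)
# The 0.5 threshold is an assumption made for this example.
recommended = {msg for msg, score in aggregated.items() if score >= 0.5}
labelled = {"m1", "m2"}  # messages marked "interesting" by a human annotator
f1 = f1_measure(recommended, labelled)
```

A single criterion can be evaluated with the same `f1_measure` by passing one-element score lists, which mirrors the single-criterion versus aggregated comparison on the slide.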

Extracting the (latent) social network [Forestier et al.,11]

• Latent SN = reply-to links + name citation + text quotation
• Name citation: misspellings, compound names, abbreviations… (what about "obama49"?)
  • Our solution: edit distance, Soundex, PoS tagging to detect nouns
• Text quotation: copy-paste without quotation marks, rephrasing…
  • Our solution: string matching, locality principle (comparing nearby messages), use of quotation marks if provided
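As an illustration of the edit-distance part of name-citation matching, the sketch below links a word found in a message to the closest known username. The function names and the `max_distance=2` tolerance are assumptions; the Soundex and PoS steps mentioned on the slide are not covered here.

```python
def edit_distance(a, b):
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def match_cited_name(token, usernames, max_distance=2):
    """Return the username closest to `token`, or None if none is close enough."""
    token = token.lower()
    best = min(usernames, key=lambda name: edit_distance(token, name.lower()))
    return best if edit_distance(token, best.lower()) <= max_distance else None


# A mention of "obama" is linked to the forum user "obama49" (distance 2).
cited_user = match_cited_name("obama", ["obama49", "maria"])
```

A fixed tolerance is a blunt instrument: it would also link a mention of the politician to an unrelated user who merely picked a similar handle, which is the ambiguity the "obama49" remark points at.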

Detecting celebrities [Forestier et al.,12]

• Modeling the forum discussion with a graph G = (V, E)
  • vertex v = forum participant
  • edge e = link (implicit or explicit) between two participants
• Weighted in-degree of v: deg⁻(v)
• Weighted out-degree of v: deg⁺(v)
• p(v) = set of messages posted by v
• p̃ = average number of messages
• thr(v) = set of threads not initiated by v
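The weighted in-degree and out-degree of each participant can be computed directly from a list of weighted links. The sketch below assumes links are given as (source, target, weight) triples, where the weight could be, for example, the number of replies or citations from source to target; this representation is an assumption, not taken from the slides.

```python
from collections import defaultdict


def weighted_degrees(links):
    """Compute weighted in- and out-degrees from (source, target, weight) links."""
    in_degree = defaultdict(float)
    out_degree = defaultdict(float)
    for source, target, weight in links:
        out_degree[source] += weight  # source addressed target
        in_degree[target] += weight   # target was addressed by source
    return dict(in_degree), dict(out_degree)


# Toy forum: alice and carol both address bob, bob answers alice once.
links = [("alice", "bob", 2.0), ("carol", "bob", 1.0), ("bob", "alice", 1.0)]
deg_in, deg_out = weighted_degrees(links)
```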

Detecting celebrities

• Extracting social roles from a SN is a key issue [Fisher et al.,06] [Himelboim et al.,09] [Forestier et al.,12]
• Some examples of roles:
  • Leader: very active user who initiates discussion threads and drives the discussion
  • Expert: user particularly active on a restricted number of topics
  • Celebrity: public figure well known by the participants
  • Flamer: user with negative behavior, who can generate conflicts
  • Lurker: user with low participation in the discussion
• In the following, we focus on the explicit "celebrity" role within online discussion forums

Detecting celebrities

• Formalize the criteria given by [Golder and Donath,04]

Detecting celebrities

• Based on these atomic criteria, we define 3 meta-criteria:
  • Meta-criterion 1: all the basic criteria must be satisfied (necessary conditions), and the selected users are ranked in descending order of their total number of posts
  • Meta-criterion 2: same, but with a ranking based on the user's average forum participation multiplied by the number of posts
  • Meta-criterion 3: same, but also taking into account name citation and text quotation
• Evaluation measure: compare the ranking produced by our meta-criteria with the number of fans of each user (>800) = gold standard
• Dataset:
  • 57 forums from the US version of the Huffington Post
  • 3 topics: politics, media, living
  • Overall 11,443 unique users and 35,175 posts

[Forestier et al.,12]
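The filter-then-rank logic of meta-criteria 1 and 2 can be sketched as below. The user records, the boolean `criteria` flags and the field names are hypothetical; the atomic criteria themselves are not listed in this transcript, so they are represented only as already-evaluated booleans.

```python
def rank_candidates(users, score=lambda user: user["posts"]):
    """Keep users meeting every basic criterion, then sort by descending score."""
    eligible = [user for user in users if all(user["criteria"])]
    return [user["name"] for user in sorted(eligible, key=score, reverse=True)]


# Hypothetical users; "criteria" holds the outcome of each atomic criterion.
users = [
    {"name": "u1", "criteria": [True, True], "posts": 10, "avg_participation": 0.5},
    {"name": "u2", "criteria": [True, False], "posts": 50, "avg_participation": 0.9},
    {"name": "u3", "criteria": [True, True], "posts": 30, "avg_participation": 0.1},
]

# Meta-criterion 1: rank eligible users by total number of posts.
ranking_1 = rank_candidates(users)
# Meta-criterion 2: rank by average participation multiplied by number of posts.
ranking_2 = rank_candidates(users, score=lambda u: u["avg_participation"] * u["posts"])
```

Either ranking can then be compared with the ordering induced by the number of fans, which the slide uses as gold standard.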

Surfacing roles

• New collaboration between [logo] and [logo]
• Bottom-up "emerging" roles:

Surfacing roles

• Discussions about 6 popular TV shows from TWOP forums
• Parent-child relationships are restored using the "quote" mechanism:
  • check the previous 20 messages in the thread;
  • a parent has to contain at least 95% of the quoted text.
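The quote-based parent restoration can be sketched as a backward search over the previous 20 messages. How "95% of the quoted text" is measured is not stated in the slides; the word-level overlap used here, along with the function and parameter names, is an assumption.

```python
def find_parent(messages, index, quoted_text, window=20, min_ratio=0.95):
    """Return the index of the nearest earlier message containing the quote, else None."""
    quoted_words = quoted_text.split()
    if not quoted_words:
        return None
    # Walk backwards over at most `window` previous messages of the thread.
    for candidate in range(index - 1, max(index - window, 0) - 1, -1):
        candidate_words = set(messages[candidate].split())
        found = sum(word in candidate_words for word in quoted_words)
        if found / len(quoted_words) >= min_ratio:
            return candidate
    return None


thread = [
    "I think the finale was great",
    "no it was terrible",
    "the finale was great indeed",
]
parent = find_parent(thread, 2, "the finale was great")
```

Searching from the most recent message backwards means that, when several earlier messages contain the quote, the closest one is chosen as parent.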

Surfacing roles

• Profiling users using time-aware features:
  • weighted in-degree,
  • weighted out-degree,
  • node in-g-index,
  • node out-g-index,
  • catalytic power,
  • number of posts,
  • cross-topic entropy.
• The role identification procedure is applied to the time series of feature vectors of 1,263 forum users.
• Using moving time windows (size = 1 week, shift = 1 day)
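A moving time window of one week shifted by one day can be generated as below, here used to turn one user's post dates into a time series for a single feature (number of posts). The half-open window convention and the helper names are assumptions made for this sketch.

```python
from datetime import date, timedelta


def moving_windows(start, end, size=timedelta(weeks=1), shift=timedelta(days=1)):
    """Yield (window_start, window_end) pairs lying fully inside [start, end)."""
    window_start = start
    while window_start + size <= end:
        yield window_start, window_start + size  # window_end is exclusive
        window_start += shift


def posts_per_window(post_dates, start, end):
    """Count one user's posts in each window: a one-feature time series."""
    return [
        sum(lo <= day < hi for day in post_dates)
        for lo, hi in moving_windows(start, end)
    ]


post_dates = [date(2012, 1, 1), date(2012, 1, 2), date(2012, 1, 9)]
series = posts_per_window(post_dates, date(2012, 1, 1), date(2012, 1, 10))
```

The other features on the slide would be recomputed per window in the same way, giving one feature vector per user and per window.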

Surfacing roles [Anokhin et al.,12]

• Clustering time series
  • Basic k-means algorithm
  • Hartigan's index used for estimating the best k
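A minimal sketch of basic k-means combined with Hartigan's index is given below, using the usual form H(k) = (W_k / W_{k+1} - 1)(n - k - 1), where W_k is the within-cluster sum of squares for k clusters. The toy points, the number of restarts and the common rule of thumb "stop at the smallest k with H(k) ≤ 10" are assumptions; the slides do not say which threshold or initialization was used.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=50, seed=0):
    """Basic Lloyd's k-means; returns (centroids, within-cluster sum of squares)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (unchanged if the cluster is empty).
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[c]
            for c, cluster in enumerate(clusters)
        ]
    wss = sum(min(squared_distance(p, c) for c in centroids) for p in points)
    return centroids, wss


def best_wss(points, k, restarts=5):
    """Lowest within-cluster sum of squares over several random initializations."""
    return min(kmeans(points, k, seed=s)[1] for s in range(restarts))


def hartigan_index(points, k):
    """H(k) = (W_k / W_{k+1} - 1) * (n - k - 1)."""
    return (best_wss(points, k) / best_wss(points, k + 1) - 1) * (len(points) - k - 1)


# Two well-separated groups of four points each (toy data).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
h1 = hartigan_index(points, 1)  # large: going from 1 to 2 clusters helps a lot
h2 = hartigan_index(points, 2)  # small: a third cluster brings little
```

For time series of feature vectors, each series would first have to be flattened or summarized into a fixed-length point; that step is not shown here.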

Surfacing roles

• Some observations:

Section 4: Semi-supervised clustering

Temporal-driven clustering

• Goal: detecting typical patterns over time
• How to deal with temporally described entities?
• Applications:
  • Evolution of nations' political states (proof of concept)
  • Trajectories over roles
  • Evolution of entities' images (cf. ImagiWeb)

[Figure: observations x1d to x6d of entities φ1 and φ2 at timestamps t1, t2, t3]

Temporal-driven clustering

• Detect typical evolution patterns of individuals in the dataset:
  • phases through which the entity collection went over time
  • trajectory of entities through the different phases

Temporal-aware constrained clustering

• The resulting partition must ensure:
  • descriptive coherence of clusters;
  • temporal coherence of clusters;
  • continuous segmentation of the observations belonging to an entity
• Objective function to minimize (inspired by semi-supervised clustering [Wagstaff and Cardie,00]) + use of a K-Means-like algorithm, combining:
  (a) a temporal-aware dissimilarity measure
  (b) a contiguity penalty measure
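The formula of the objective function did not survive in this transcript, so the sketch below only illustrates the general shape of such a penalized objective: a dissimilarity term between each observation and its centroid, plus a penalty whenever two observations of the same entity fall in different clusters. Both functional forms (the `alpha` mix and the Gaussian-shaped penalty) are assumptions made for illustration, not the TDCK-Means definitions; only the names β and δ echo parameters mentioned in the slides.

```python
import math


def dissimilarity(obs, centroid, alpha=0.5):
    """Assumed temporal-aware dissimilarity: descriptive and temporal gaps mixed by alpha."""
    descriptive = sum((a - b) ** 2 for a, b in zip(obs["x"], centroid["x"]))
    temporal = (obs["t"] - centroid["t"]) ** 2
    return (1 - alpha) * descriptive + alpha * temporal


def contiguity_penalty(obs_i, obs_j, beta=0.003, delta=3.0):
    """Assumed penalty for separating two time-close observations of one entity."""
    if obs_i["entity"] != obs_j["entity"]:
        return 0.0
    return beta * math.exp(-0.5 * ((obs_i["t"] - obs_j["t"]) / delta) ** 2)


def objective(observations, labels, centroids):
    """Sum of (a) dissimilarities to assigned centroids and (b) contiguity penalties."""
    cost = sum(dissimilarity(o, centroids[l]) for o, l in zip(observations, labels))
    for i in range(len(observations)):
        for j in range(i + 1, len(observations)):
            if labels[i] != labels[j]:
                cost += contiguity_penalty(observations[i], observations[j])
    return cost


# Two identical observations of the same entity and two identical centroids (toy case).
obs = [{"entity": "e", "t": 0, "x": (0.0,)}, {"entity": "e", "t": 0, "x": (0.0,)}]
cents = [{"t": 0, "x": (0.0,)}, {"t": 0, "x": (0.0,)}]
together = objective(obs, [0, 0], cents)
split = objective(obs, [0, 1], cents)
```

In this toy case the dissimilarity term is zero either way, so the only difference between the two assignments is the penalty paid for splitting the entity's observations.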

Experiments on political dataset

• 23 countries, 60 years
• 207 political, demographic, social and economic variables
• Running TDCK-Means (8 clusters, β = 0.003 and δ = 3)

Section 5: Focus on Project ImagiWeb
http://eric.univ-lyon2.fr/~jvelcin/imagiweb

Project ImagiWeb

• Goal of the ANR project ImagiWeb: analyzing the life cycle (production, diffusion, evolution) of images through the Web 2.0
• Strong points:
  • Joint analysis of opinions, topics, social networks…
  • Involvement of (true) researchers in LLSSH
• Partners:
  • ERIC: data mining, machine learning
  • LIA: text/opinion mining, information retrieval
  • CEPEL: social scientists, specialists in political studies
  • XRCE: information extraction, NLP
  • AMI Soft.: digital monitoring
  • EDF R&D: end user, semiology study

Project ImagiWeb

[Figure: overview diagram. Labels (translated from French): entities (e.g., companies, politicians); Internet users; emitted image; perceived image; comments; communication media (websites, brochures, etc.); analysis of expression data; analysis of populations; emitters; receivers; feedback (IMAGIWEB)]

Platform for performing the annotation

• Web applications designed for annotating ~10k tweets + 200 blog comments; 22 annotators are working on it right now!
• Output: (mφ ; mt ; mp ; ma ; mt ; ms)

Catching an image's evolution over time

• Input: set of tuples (mφ ; mt ; mp ; ma ; mt ; ms)
• Some good questions:
  • What is an image?
  • How to summarize the set of (temporally situated and spatially located) opinions?
• First insight: investigating time series analysis, temporally-driven clustering, graphical models…
• Fortunately, we'll have a full-time post-doc to work on it!

Recent work on opinion mining

• Participation in SemEval 2013
  • Task 2.B: discriminating positive (+) from negative (-) opinions (+ neutral)
• Very recent work: improving basic NB (naive Bayes) by using background knowledge (seed lists)
  • 6/35 and 3/16 on the official tweet dataset!
• Results on our own datasets:

[paper just submitted]
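One simple way to inject seed lists into a naive Bayes classifier is to treat each seed word as extra pseudo-observations for its polarity class. The sketch below shows that idea only; it is not the method of the submitted paper, which the slides do not describe, and the class name, `seed_weight` and the toy data are assumptions.

```python
import math
from collections import Counter


class SeededNaiveBayes:
    """Multinomial naive Bayes whose word counts are boosted by seed lists."""

    def __init__(self, seeds, seed_weight=5.0, smoothing=1.0):
        self.seeds = seeds              # e.g. {"positive": {"great"}, "negative": {"awful"}}
        self.seed_weight = seed_weight  # pseudo-count added for each seed word
        self.smoothing = smoothing      # Laplace smoothing constant

    def fit(self, documents, labels):
        self.classes = sorted(set(labels) | set(self.seeds))
        self.word_counts = {c: Counter() for c in self.classes}
        self.class_counts = Counter(labels)
        for text, label in zip(documents, labels):
            self.word_counts[label].update(text.lower().split())
        # Background knowledge: seed words count as extra observations of their class.
        for label, words in self.seeds.items():
            for word in words:
                self.word_counts[label][word] += self.seed_weight
        self.vocabulary = set().union(*self.word_counts.values())
        return self

    def _log_score(self, text, c):
        total = sum(self.word_counts[c].values()) + self.smoothing * len(self.vocabulary)
        n_docs = sum(self.class_counts.values())
        score = math.log((self.class_counts[c] + 1) / (n_docs + len(self.classes)))
        for word in text.lower().split():
            if word in self.vocabulary:  # unknown words are ignored
                score += math.log((self.word_counts[c][word] + self.smoothing) / total)
        return score

    def predict(self, text):
        return max(self.classes, key=lambda c: self._log_score(text, c))


seeds = {"positive": {"great", "love"}, "negative": {"awful", "hate"}}
model = SeededNaiveBayes(seeds).fit(["nice phone", "bad battery"], ["positive", "negative"])
prediction = model.predict("awful")  # "awful" never occurs in the training texts
```

The point of the seed lists is visible in the last line: a word absent from the labelled data still carries polarity information at prediction time.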

Section 6: Future lines of research

An integrated view

Research + tools + applications

• Ongoing research
  • Structured temporal-driven clustering (M. A. Rizoiu, PhD student)
  • Bridging the gap between topics and concepts (M. A. Rizoiu, PhD student)
  • Multi-document summarization of online discussions (C. Cercel, PhD student, in collaboration with the Polytechnic Institute of Bucharest)
  • Bottom-up, dynamic extraction of roles (A. Lumbreras, PhD student, in collaboration with Technicolor)
  • Dynamic joint extraction of topics and opinions (M. Dermouche, PhD student, in collaboration with AMI Software)
  • Extracting opinionated images from tweets and blogs in an unsupervised way (Y. Kim, post-doc, in collaboration with LIA)

An integrated view

• Tools
  • MediaMining: a fully open-access platform for analyzing online discussions
• Applications
  • Reputation management services
    => Project ImagiWeb, with specialists in political studies (2012-2015, ~860k)
  • Discourse analysis in public opinion
    => Project DANuM, with linguists (2013-2014, 23k)
    => Project ALICE, with social scientists and specialists in communication (just submitted)
• The next step: data-mining-based services for "curation support", with specialists in communication and journalists

Focus on the collaboration DAL/Lyon

• 3 possible scientific contributions:
  • Labeling hierarchical topic models
  • Labeling dynamic topic models
  • Visualization of hierarchical/dynamic topic models

[Figure: example topic labels: artificial neural network, neuroscience, optimization, efficiency (statistics), learning theory, vision chip, generative model, graphical models, neural networks, background, computer vision, Markov decision process, computational complexity theory]

References (excerpt)

• Anokhin, N., Lanagan, J. and Velcin, J. (2012), Social Citation: Finding Roles in Social Networks. An Analysis of TV-Series Web Forums. Second International Workshop on Mining Communities and People Recommenders (COMMPER), in conjunction with ECML/PKDD, Bristol, UK.
• Dermouche, M., Velcin, J., Loudcher, S. and Khouas, L. (2013), Une nouvelle mesure pour l'évaluation des méthodes d'extraction de thématiques : la Vraisemblance Généralisée. Actes de la 13ème Conférence Francophone sur l'Extraction et la Gestion des Connaissances (EGC), Toulouse, France.
• Forestier, M., Stavrianou, A., Velcin, J. and Zighed, D.A. (2012), Roles in Social Networks: Methodologies and Research Issues. Web Intelligence and Agent Systems: An International Journal (WIAS).
• Musat, C., Velcin, J., Rizoiu, M.A. and Trausan-Matu, S. (2011), Improving Topic Evaluation Using Conceptual Knowledge. Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), Barcelona, Spain.
• Rizoiu, M.A., Velcin, J. and Lallich, S. (2012), Structuring typical evolutions using Temporal-Driven Constrained Clustering. Proceedings of the 24th IEEE International Conference on Tools with Artificial Intelligence (ICTAI), Athens, Greece. Best student paper award.
• Stavrianou, A., Velcin, J. and Chauchat, J.H. (2009), A combination of opinion mining and social network techniques for discussion analysis. Revue des Nouvelles Technologies de l'Information (RNTI), Cepadues.

Thank you!