Harvesting Knowledge from Social Networks: Extracting Typed Relationships among Entities
Andrea Caielli, Marco Brambilla, Stefano Ceri, Florian Daniel
@marcobrambi
SoWeMine Workshop @ ICWE 2017, Rome, Italy
Agenda
(1) Context
(2) Objectives
(3) Method
(4) Experiments and Validation
(5) Visualization and Exploration
(6) Conclusions
(1) Context
Ontology is the philosophical study of the nature of being, becoming,
existence or reality, and the basic categories of being and their
relations.
Formalizing new knowledge is hard
Only high frequency emerges
The long tail challenge
Sourcing the Long Tail
Famous Emerging
…
(2) Objective
Objective
Extraction of relationships among entities
Reconstruct a typed graph of entities & relationships
Represent the knowledge contained in social data
No need for a-priori domain knowledge
Knowledge Enrichment Setting
[Figure: instances and types. High Frequency (HF) Entities and Low Frequency (LF) Entities are linked by <<instanceof>> to types (Type1, Type11, Type111, Type2) with properties (Property1, Property2); "??" marks the unknown elements to be found.]
Legend: Seed Entity, Seed Type, Type of interest
Expert inputs
Enrichment problems
Relations HF - LF entities
Relations LF - LF entities
Typing of LF entities
Extraction of new LF entities
Finding attribute values
A Practical Example
A Practical Example
Challenge and Innovation
Highly unstructured social data (tweets and Facebook posts)
No reliable grammar structures
(3) Method
Analysis Pipeline
(0) Preprocessing
(1) Entity Extraction
(2) Relationship Extraction
(3) Relationship Aggregation
(4) Relationship Typing
Evolution of the work presented in:
M. Brambilla, S. Ceri, E. Della Valle, R. Volonterio, and F. Acero Salazar.
“Extracting Emerging Knowledge from Social Media”, WWW 2017.
Pipeline Summary
(0) Preprocessing
Text cleaning and enrichment
+ Traditional text preprocessing (stemming, …)
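The cleaning and stemming steps listed above could look roughly like the following. This is a minimal sketch, not the authors' implementation: the function names (`clean_post`, `naive_stem`, `preprocess`), the regular expressions, and the toy suffix stripper (a stand-in for a real stemmer) are all assumptions made for illustration.

```python
import re
from typing import List

# Assumed patterns for typical social-media noise.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
WHITESPACE_RE = re.compile(r"\s+")


def clean_post(text: str) -> str:
    """Drop URLs and user mentions, keep the hashtag word, normalize spaces."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(r"\1", text)
    return WHITESPACE_RE.sub(" ", text).strip()


def naive_stem(token: str) -> str:
    """Toy suffix stripper; a placeholder for a real stemming algorithm."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> List[str]:
    """Clean a post, lowercase it, tokenize on alphanumerics, stem tokens."""
    tokens = re.findall(r"[A-Za-z0-9]+", clean_post(text).lower())
    return [naive_stem(t) for t in tokens]


if __name__ == "__main__":
    post = "Season 2 airs on May 24th! http://t.co/abc @History #BlackSails"
    print(clean_post(post))
    print(preprocess(post))
```

In a real pipeline the toy stemmer would be replaced by an established one; it is kept here only so the sketch runs without external dependencies.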
(1) Entity Extraction
Entity identification and semantic typing
Exploiting:
Stanford CoreNLP NER
Dandelion API
(2) Relationship Extraction
Baseline with Stanford OpenIE for triple extraction:
Several issues:
- Meaningless relations
- Wrong relations
- Multiple relations
(3) Relationship Aggregation
Sails fans. Season 2 airs on May 24th on History on D Stv Jag Comms
Too many answers for the same question!
Empirical rules
{"entity1":"Season 2",
"relationship":"air on",
"entity2":"May 24th"}
(4) Relationship Typing (A): Synonyms
Exploiting synsets based on WordNet 3.1
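To illustrate the idea of typing relationships through synonyms, the sketch below groups relation verbs that share a synonym set. The synonym sets are a small hand-made stand-in for WordNet synsets (an assumption, so that the example runs without external data), and `canonical_verb` and `group_by_synonym` are hypothetical helper names.

```python
from typing import Dict, Iterable, List, Set

# Hand-made synonym sets, used here only as a stand-in for WordNet synsets.
TOY_SYNSETS: List[Set[str]] = [
    {"air", "broadcast", "transmit"},
    {"begin", "start", "commence"},
    {"love", "adore"},
]


def canonical_verb(verb: str, synsets: List[Set[str]] = TOY_SYNSETS) -> str:
    """Return one representative per synonym set, or the verb itself."""
    for synset in synsets:
        if verb in synset:
            # Alphabetically first member gives a deterministic label.
            return min(synset)
    return verb


def group_by_synonym(verbs: Iterable[str]) -> Dict[str, List[str]]:
    """Group relation verbs under the representative of their synonym set."""
    groups: Dict[str, List[str]] = {}
    for verb in verbs:
        groups.setdefault(canonical_verb(verb), []).append(verb)
    return groups


if __name__ == "__main__":
    print(group_by_synonym(["air", "broadcast", "start", "watch", "begin"]))
```

With a real lexical resource, the lookup table would be replaced by synset queries; the grouping logic would stay the same.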
(4) Relationship Typing (B): Matching Types
(4) Relationship Typing (C): Linguistics
Based on VerbNet
Groupings of verbs based on syntactic and semantic properties
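As a rough illustration of class-based typing, the sketch below maps the leading verb of each relationship to a verb class and counts occurrences per class. The lookup table is a tiny hand-written placeholder: its labels only imitate the style of VerbNet class identifiers and were not checked against a VerbNet release. The name `count_verb_classes` and the "unclassified" fallback are likewise assumptions.

```python
from collections import Counter
from typing import Dict, Iterable

# Placeholder lookup; labels imitate VerbNet-style identifiers (unverified).
TOY_VERB_CLASSES: Dict[str, str] = {
    "say": "say-37.7",
    "announce": "say-37.7",
    "give": "give-13.1",
    "lend": "give-13.1",
    "put": "put-9.1",
}


def count_verb_classes(triples: Iterable[Dict[str, str]]) -> Counter:
    """Count relationship occurrences per verb class."""
    counts: Counter = Counter()
    for triple in triples:
        words = triple["relationship"].split()
        # Assumption: the verb is the first word of the relationship phrase.
        verb = words[0] if words else ""
        counts[TOY_VERB_CLASSES.get(verb, "unclassified")] += 1
    return counts


if __name__ == "__main__":
    triples = [
        {"entity1": "A", "relationship": "say to", "entity2": "B"},
        {"entity1": "C", "relationship": "announce", "entity2": "D"},
        {"entity1": "E", "relationship": "give", "entity2": "F"},
        {"entity1": "G", "relationship": "air on", "entity2": "H"},
    ]
    print(count_verb_classes(triples))
```

Counts of this kind are what a per-class occurrence chart would plot.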
Pipeline Implementation
(4) Validation
Experiments
TV Series: Black Sails, Teen Wolf, Vikings
Milan Fashion Week
Rugby games
Domains and quality of results: summary
Relationships and Verb Classes
Example: Teen Wolf
[Bar chart: occurrences (axis 0 to 800) per Teen Wolf synonym class]
[Bar chart: occurrences per Teen Wolf VerbNet class]
Overall Quality Indexes of Entity and Relationships Extraction
(5) Visualization
Motivation
Resulting semantic models extremely large and hard to interpret
Example:
Black Sails collection, containing 1243 entities and 2025 relations.
Exploration
Visualization
Filtering
Navigation
Exploration
Visualization
RELATIONSHIP Filtering
Navigation
Examples
Milano Fashion Week
Generated graph
(6) Conclusions
Conclusions
Extraction of relevant emerging relationships is feasible even for extremely unstructured
and informal content (social media)
Still a long way to perfect extraction:
- N-ary relations
- Time-dependency
- Poor typing of entities in ontologies
THANKS! QUESTIONS?
Marco Brambilla @marcobrambi
http://datascience.deib.polimi.it http://home.deib.polimi.it/marcobrambi