Integrating and Interpreting Social Data from Heterogeneous Sources


Page 1: Integrating and Interpreting Social Data from Heterogeneous Sources

Integrating and Interpreting Social Data from Heterogeneous Sources – LUPAS 2010

Integrating and Interpreting Social Data from Heterogeneous Sources

Matthew Rowe
Organisations, Information and Knowledge Group
University of Sheffield

Suvodeep Mazumdar
Department of Information Studies
University of Sheffield

Page 2: Integrating and Interpreting Social Data from Heterogeneous Sources

Outline

• Information overload
  – Increase in social data publication

• Interlinking social data
  – Metadata Generation
  – Integrating Social Data

• Application: Interpreting Social Data
  – Cumbrian Floods Use Case
  – Interacting with Social Data

• Conclusions

Page 3: Integrating and Interpreting Social Data from Heterogeneous Sources

Information Overload

• Masses of social data are published every day
  – E.g. 50 million tweets (600 per second)
    • http://blog.twitter.com
  – 22 million Facebook users in the UK
    • http://www.clickymedia.co.uk/2009/10/uk-facebook-user-statistics-october-2009/

• Too much information to deal with!

• Social data is multi-faceted:
  – Provenance
  – Topic
  – Geo

• Trend services (e.g. trendistic, blogpulse):
  – Focus on majority consensus
  – Need to listen in to a specific topic
  – Concentrate on a single source/platform
  – Do not consider geo facet

Page 6: Integrating and Interpreting Social Data from Heterogeneous Sources

Interlinking Social Data

• Consider multi-faceted nature of social data:
  – Allows fine-grained analysis
  – Show geo-localised social data
  – Relevant past social data

• Solution: Interlink social data from heterogeneous sources
  – Use semantics!
  – Consistent data interpretation

Page 7: Integrating and Interpreting Social Data from Heterogeneous Sources

Metadata Generation

• Web 2.0 platforms return data using:
  – Proprietary formats
  – Heterogeneous data schemas

• Need to link data together from disparate sources

• A social data fragment = a single piece of social data
  – E.g. a tweet, an image, a video

• Lift each social data fragment to RDF:
  1. Create an instance of sioc:Post and itr:LocalizedResource
     • Assign it a URI
  2. Assign the content to the instance (topic)
     • Use hashtags of the microblog
  3. Create an instance of gml:Geometry (geo)
     • Capture geo facet
  4. Assign timestamp of fragment creation (provenance)
     • Using dcterms:created
  5. Assign the fragment to its owner (provenance)
     • Create foaf:Person instance

Page 8: Integrating and Interpreting Social Data from Heterogeneous Sources

Twitter status response:

<status>
  <created_at>Sun Feb 28 12:22:47 +0000 2010</created_at>
  <id>9774519667</id>
  <text>Writing up our Geovation work for #lupas2010.</text>
  <truncated>false</truncated>
  <in_reply_to_status_id></in_reply_to_status_id>
  <in_reply_to_user_id></in_reply_to_user_id>
  <favorited>false</favorited>
  <in_reply_to_screen_name></in_reply_to_screen_name>
  <geo xmlns:georss="http://www.georss.org/georss">
    <georss:point>53.3833,-1.4722</georss:point>
  </geo>
</status>

Flickr photo response:

<photo id="949406913" media="photo">
  <owner nsid="54948696@N00"/>
  <title>DSC00171.JPG</title>
  <description></description>
  <dates posted="1205398307" taken="2009-01-09 09:16:31" lastupdate="1257421561" />
  <tags>
    <tag id="24539622-2330113101-400" author="54948696@N00" raw="arctic">arctic</tag>
    <tag id="24539622-2330113101-401" author="54948696@N00" raw="monkeys">monkeys</tag>
  </tags>
  <location latitude="53.4813" longitude="-2.2392" place_id="R8vDw_abBpSzUA">
    <locality place_id="R8vDw_abBpSzUA" woeid="27872">Manchester</locality>
    <region place_id="pn4MsiGbBZlXeplyXg" woeid="24554868">England</region>
    <country place_id="DevLebebApj4RVbtaQ" woeid="23424975">United Kingdom</country>
  </location>
</photo>
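The two responses carry the same facets (topic, geo, timestamp) under different element names, which is the heterogeneity the lifting step has to absorb. A minimal sketch, assuming trimmed copies of the two payloads above and using only Python's standard library, reads both into one common record. The `Fragment` tuple and the `from_tweet` / `from_photo` helpers are illustrative names, not part of the authors' system.

```python
import re
import xml.etree.ElementTree as ET
from typing import NamedTuple

GEORSS = "{http://www.georss.org/georss}"  # namespace declared in the tweet payload


class Fragment(NamedTuple):
    """Hypothetical common record for one social data fragment."""
    source: str
    topics: list
    lat: float
    lon: float
    created: str  # kept as the platform's own string; no normalisation here


TWEET_XML = """<status>
  <created_at>Sun Feb 28 12:22:47 +0000 2010</created_at>
  <id>9774519667</id>
  <text>Writing up our Geovation work for #lupas2010.</text>
  <geo xmlns:georss="http://www.georss.org/georss">
    <georss:point>53.3833,-1.4722</georss:point>
  </geo>
</status>"""

PHOTO_XML = """<photo id="949406913" media="photo">
  <dates posted="1205398307" taken="2009-01-09 09:16:31" lastupdate="1257421561"/>
  <tags>
    <tag raw="arctic">arctic</tag>
    <tag raw="monkeys">monkeys</tag>
  </tags>
  <location latitude="53.4813" longitude="-2.2392"/>
</photo>"""


def from_tweet(xml_text):
    root = ET.fromstring(xml_text)
    text = root.findtext("text")
    topics = re.findall(r"#(\w+)", text)  # hashtags stand in for topics
    lat, lon = root.findtext(f"geo/{GEORSS}point").split(",")
    return Fragment("twitter", topics, float(lat), float(lon),
                    root.findtext("created_at"))


def from_photo(xml_text):
    root = ET.fromstring(xml_text)
    topics = [tag.get("raw") for tag in root.iter("tag")]  # Flickr tags as topics
    loc = root.find("location")
    return Fragment("flickr", topics, float(loc.get("latitude")),
                    float(loc.get("longitude")), root.find("dates").get("taken"))


tweet = from_tweet(TWEET_XML)
photo = from_photo(PHOTO_XML)
```

Once both sources land in the same shape, the RDF lifting only has to be written once against that shape rather than once per platform.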

Page 14: Integrating and Interpreting Social Data from Heterogeneous Sources

<http://twitter.com/mattroweshow>
    rdf:type foaf:Person ;
    rdf:type itr:LocalizedResource ;
    foaf:name "Matthew Rowe" ;
    foaf:homepage <http://www.dcs.shef.ac.uk/~mrowe> .

<http://twitter.com/mattroweshow/9774519667>
    rdf:type sioc:Post ;
    rdf:type itr:LocalizedResource ;
    sioc:content "Writing up our Geovation work for #lupas2010." ;
    dcterms:subject "lupas2010" ;
    dcterms:created "2010-2-28 12:22:47.0" ;
    sioc:hasCreator <http://twitter.com/mattroweshow> ;
    itr:has_Localization _:a2 .

_:a2
    rdf:type gml:Geometry ;
    gml:pos "53.3833,-1.4722" .
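The five lifting steps can be sketched as a small function that renders one tweet as Turtle-style statements. This is an illustration under stated assumptions, not the authors' implementation: `lift_tweet` is a hypothetical name, the output omits prefix declarations, and the timestamp is written in ISO 8601 rather than the "2010-2-28 12:22:47.0" form shown on the slide.

```python
import re
from datetime import datetime


def lift_tweet(screen_name, status_id, text, created_at, point):
    """Render one tweet as Turtle-style statements (hypothetical helper)."""
    user_uri = f"<http://twitter.com/{screen_name}>"
    # Step 1: the post gets a URI and both types.
    post_uri = f"<http://twitter.com/{screen_name}/{status_id}>"
    # Escape backslashes and quotes so the literal stays well-formed.
    literal = text.replace("\\", "\\\\").replace('"', '\\"')
    # Step 4: parse Twitter's timestamp format (assumes an English locale).
    created = datetime.strptime(created_at, "%a %b %d %H:%M:%S %z %Y")

    lines = [
        f"{post_uri} rdf:type sioc:Post ;",
        "    rdf:type itr:LocalizedResource ;",
        f'    sioc:content "{literal}" ;',  # step 2: content
    ]
    for tag in re.findall(r"#(\w+)", text):  # step 2: hashtags as topics
        lines.append(f'    dcterms:subject "{tag}" ;')
    lines.append(f'    dcterms:created "{created.isoformat()}" ;')
    lines.append(f"    sioc:hasCreator {user_uri} ;")  # step 5: owner link
    lines.append("    itr:has_Localization _:geo .")
    lines.append("")
    lines.append("_:geo rdf:type gml:Geometry ;")  # step 3: geo facet
    lines.append(f'    gml:pos "{point}" .')
    lines.append("")
    lines.append(f"{user_uri} rdf:type foaf:Person .")  # step 5: owner instance
    return "\n".join(lines)


turtle = lift_tweet(
    "mattroweshow",
    "9774519667",
    "Writing up our Geovation work for #lupas2010.",
    "Sun Feb 28 12:22:47 +0000 2010",
    "53.3833,-1.4722",
)
print(turtle)
```

Building strings keeps the sketch dependency-free; a real pipeline would more likely build the graph with an RDF library and let it handle serialisation and escaping.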

Page 15: Integrating and Interpreting Social Data from Heterogeneous Sources

Integrated Social Data

• Triplify social data from multiple platforms
  – Flickr XML response -> RDF
  – Picasa XML response -> RDF

• Use common semantics
  – Can perform SPARQL queries

PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?item
WHERE {
  ?item dcterms:subject "iranelections" .
  ?item dcterms:created ?date
}
ORDER BY DESC(?date)

PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX itr: <http://www.dcs.shef.ac.uk/~gregoire/interaction/ns#>
PREFIX gml: <http://www.opengis.net/gml/>
SELECT DISTINCT ?post ?tag
WHERE {
  ?post dcterms:subject ?tag .
  ?post itr:has_Localization ?geo .
  ?geo gml:pos "53.4813,-2.2392"
}
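To make the join in the second query concrete, here is a minimal sketch that evaluates the same three triple patterns over a plain Python set of tuples instead of a SPARQL endpoint. The `tweet:` and `photo:` identifiers and the blank node labels are shorthand invented for the example; the tags and coordinates come from the two sample fragments shown earlier.

```python
# Each triple is (subject, predicate, object), written with prefixed names.
TRIPLES = {
    ("tweet:9774519667", "dcterms:subject", "lupas2010"),
    ("tweet:9774519667", "itr:has_Localization", "_:a2"),
    ("_:a2", "gml:pos", "53.3833,-1.4722"),
    ("photo:949406913", "dcterms:subject", "arctic"),
    ("photo:949406913", "dcterms:subject", "monkeys"),
    ("photo:949406913", "itr:has_Localization", "_:b1"),
    ("_:b1", "gml:pos", "53.4813,-2.2392"),
}


def posts_and_tags_at(triples, pos):
    """Return distinct (post, tag) pairs for posts localised at `pos`."""
    # ?geo gml:pos "<pos>"
    geos = {s for s, p, o in triples if p == "gml:pos" and o == pos}
    # ?post itr:has_Localization ?geo
    posts = {s for s, p, o in triples
             if p == "itr:has_Localization" and o in geos}
    # ?post dcterms:subject ?tag  (the set gives DISTINCT; sorted for stable output)
    return sorted({(s, o) for s, p, o in triples
                   if p == "dcterms:subject" and s in posts})


result = posts_and_tags_at(TRIPLES, "53.4813,-2.2392")
```

Because tweets and photos share the same predicates after lifting, one pattern retrieves fragments from either platform without source-specific handling.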

Page 16: Integrating and Interpreting Social Data from Heterogeneous Sources

Interpreting Social Data

• Cumbrian Use Case
  – UK region suffered worst floods in centuries
  – Observe the effects in social data
    • Rise in publication
    • Fine-grained geocoded social data

• Dataset:
  – Microblogs from 200 Cumbrian Twitter users
    • Published during 2009
    • 3513 microblogs
    • Produced 475,043 triples
  – Images from Flickr taken in Cumbria
    • 6663 images
    • Produced 182,304 triples

Page 17: Integrating and Interpreting Social Data from Heterogeneous Sources

Interacting with Social Data

• Built a visualisation application to analyse social data fragments
  – http://www.dcs.shef.ac.uk/~suvodeep/ViziSocial

• Filter by date
  – Lower slider

• Fine-grained focus
  – Zoom in

• Tag cloud
  – Shows fragment topics
  – Window controls tag cloud topics

• Markers contain number of fragments
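The interaction model above (date slider, map window, tag cloud, counted markers) can be sketched in a few functions. This is a guess at the logic, not the ViziSocial code: the `Fragment` record, the function names, and the idea of grouping markers by rounded coordinates are all assumptions made for illustration. The two sample records reuse the tweet and photo shown earlier.

```python
from collections import Counter
from datetime import date
from typing import NamedTuple


class Fragment(NamedTuple):
    """Hypothetical in-memory view of one lifted fragment."""
    created: date
    lat: float
    lon: float
    tags: tuple


FRAGMENTS = [
    Fragment(date(2010, 2, 28), 53.3833, -1.4722, ("lupas2010",)),
    Fragment(date(2009, 1, 9), 53.4813, -2.2392, ("arctic", "monkeys")),
]


def visible(fragments, start, end, south, west, north, east):
    """Keep fragments inside the date slider range and the map window."""
    return [
        f for f in fragments
        if start <= f.created <= end
        and south <= f.lat <= north
        and west <= f.lon <= east
    ]


def tag_cloud(fragments):
    """Count topics over the currently visible fragments only."""
    return Counter(tag for f in fragments for tag in f.tags)


def markers(fragments, precision=1):
    """Group fragments into markers by rounded coordinates, with counts."""
    return Counter(
        (round(f.lat, precision), round(f.lon, precision)) for f in fragments
    )


shown = visible(FRAGMENTS, date(2009, 1, 1), date(2009, 12, 31),
                53.0, -3.0, 54.0, -1.0)
cloud = tag_cloud(shown)
pins = markers(FRAGMENTS)
```

Zooming in corresponds to shrinking the window passed to `visible` and raising `precision`, so markers split apart and the tag cloud narrows to the topics in view.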

Page 18: Integrating and Interpreting Social Data from Heterogeneous Sources

Conclusions

• Consistent interpretation of social data
  – Across heterogeneous sources

• Application
  – Allows analyses of social data
    • To fine-grained detail
  – Utilises multiple facets of social data
  – Requires metadata
    • Issue of scalability

• Future Work
  – Adapting to real time data acquisition
    • Focussing on South Yorkshire region at present
    • Assess scalability issue

Page 19: Integrating and Interpreting Social Data from Heterogeneous Sources

Questions?

Twitter: @mattroweshow
Web: http://www.dcs.shef.ac.uk/~mrowe
Email: [email protected]