Modelling the Web: Examples of Modelling Text, Knowledge Networks and Physical-Social Systems


Page 1

Modelling the Web: Examples of Modelling Text, Knowledge Networks and Physical-Social Systems

Steffen Staab

Web Science & Technologies (WeST), University of Koblenz ▪ Landau, Germany

Page 2

What do people want from the Web?

• Web as storage: library, memory
• Web as tool: search, transaction
• Web as social medium: communication, cooperation
• Web as mirror of self: identification, outreach

Page 3

What are some of the footprints people leave?

Page 4

My Agenda in the Large

• Web Content: discovering patterns, building tools, understanding
• Web Interaction: monitoring, exploiting, guiding, understanding
• Web Evolution: monitoring, predicting, guiding, understanding

Page 5

My Agenda for Today

1. Modelling Text
2. Modelling Network Evolution
3. Modelling Physical-Social Data

[Diagram: the three topics placed across Web Content, Web Interaction and Web Evolution]

Page 6

1. Modelling Text

Page 7

Autocompletion of queries

"UK is"?

Page 8

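Language Models

What follows "UK is"?

Conditional probability:

$$P(w_n \mid w_1 \dots w_{n-1}) = \frac{c(w_1 \dots w_n)}{c(w_1 \dots w_{n-1})}$$

where $c(\cdot)$ is the number of times a word sequence occurs in the training corpus.

Issue: long word sequences are rarely observed, so their counts are sparse.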
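The count-based estimate above can be sketched in a few lines. This minimal example assumes a whitespace-tokenized toy corpus; the corpus and the helper names `ngram_counts` and `mle_prob` are hypothetical. It shows the maximum-likelihood estimate and how an unseen continuation simply gets probability zero.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count all contiguous n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def mle_prob(tokens, history, word):
    """Maximum-likelihood estimate of P(word | history) from raw counts.

    The denominator counts occurrences of `history` that are followed by
    some word, so the estimates for a seen history sum to 1.
    """
    history = tuple(history)
    grams = ngram_counts(tokens, len(history) + 1)
    den = sum(c for g, c in grams.items() if g[:-1] == history)
    return grams[history + (word,)] / den if den else 0.0

corpus = "the uk is an island the uk is in europe".split()
print(mle_prob(corpus, ["uk", "is"], "an"))     # 0.5
print(mle_prob(corpus, ["uk", "is"], "great"))  # 0.0: never observed
```

Lengthening the history (e.g. `["the", "uk", "is", "a"]`) quickly leaves no observations at all, which is exactly the sparsity issue the slide names.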

Page 9

Modified Kneser-Ney Smoothing of n-grams

If a sequence is hard to observe, then approximate its probability recursively from the marginal frequencies of …

Page 10

Modified Kneser-Ney Smoothing of n-grams

If a sequence is hard to observe, then approximate its probability recursively from the marginal frequencies of …

First recursion step:

Problem: if the last word in the sequence is rare, the overall sequence will be rare, and so will every shorter sequence in the recursion, since each of them still ends with that word. The approximation will then be of low quality.
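The first recursion step can be illustrated with a simplified relative of the method on the slide. The sketch below assumes interpolated Kneser-Ney for bigrams with a single absolute discount `d = 0.75`; the modified variant uses three discounts and higher orders, which are not shown. The function name `train_kn_bigram` and the toy corpus are hypothetical. The lower-order term uses continuation counts (in how many distinct contexts a word appears), and a rare last word keeps that term small, which is the problem described above.

```python
from collections import Counter

def train_kn_bigram(tokens, d=0.75):
    """Interpolated Kneser-Ney bigram model with one absolute discount d."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    context_count = Counter(tokens[:-1])                 # c(prev) as a left context
    followers = Counter(prev for prev, _ in bigrams)     # distinct words after prev
    continuation = Counter(word for _, word in bigrams)  # distinct words before word
    n_types = len(bigrams)                               # number of bigram types

    def prob(prev, word):
        # lower-order continuation probability: in how many contexts does word occur?
        p_cont = continuation[word] / n_types
        c_prev = context_count[prev]
        if c_prev == 0:
            return p_cont  # unseen context: rely on the lower order only
        discounted = max(bigrams[(prev, word)] - d, 0) / c_prev
        backoff_weight = d * followers[prev] / c_prev  # mass freed by discounting
        return discounted + backoff_weight * p_cont

    return prob

tokens = "the cat sat on the mat the cat ran".split()
p = train_kn_bigram(tokens)
print(round(p("the", "cat"), 3), round(p("the", "mat"), 3))  # 0.488 0.155
```

The discounted mass is redistributed exactly through `backoff_weight`, so the probabilities for a seen context still sum to 1.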

Page 11

Generalized Language Models [ACL14]

If a sequence is too hard to observe, then approximate it recursively based on the marginal probabilities of …

Core idea of the formal solution: recursively applicable, commutative skip operators.
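A skip operator can be pictured as replacing one word of an n-gram by a wildcard. The minimal sketch below assumes this reading; the wildcard symbol `_` and the helpers `skip` and `skipped_variants` are hypothetical and not taken from [ACL14]. It checks that two skips commute and enumerates all patterns reachable by skipping history words (the predicted last word is kept).

```python
from itertools import combinations

WILDCARD = "_"  # hypothetical placeholder for a skipped word

def skip(ngram, pos):
    """Skip operator: replace the word at position pos with a wildcard."""
    return ngram[:pos] + (WILDCARD,) + ngram[pos + 1:]

def skipped_variants(ngram):
    """All patterns obtained by skipping any subset of the history words.

    The last word (the one being predicted) is never skipped.
    """
    history = range(len(ngram) - 1)
    variants = set()
    for k in range(len(ngram)):
        for positions in combinations(history, k):
            g = ngram
            for pos in positions:
                g = skip(g, pos)
            variants.add(g)
    return variants

g = ("uk", "is", "a")
# commutativity: the order in which skips are applied does not matter
assert skip(skip(g, 0), 1) == skip(skip(g, 1), 0)
for v in sorted(skipped_variants(g)):
    print(v)
```

Because the operators commute, each subset of skipped positions yields one pattern, giving 2^(n-1) patterns for an n-gram.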

Page 12

Improvement of GLMs [ACL14]

Evaluation measure: perplexity

Data set: English Wikipedia, different sample sizes

Relative improvement: 2.6% (most training data, smallest model) to 13.9% (least training data, largest model)

[Figure: normalized perplexity]
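Perplexity and the relative improvement reported above can be computed as follows. This is a minimal sketch; the per-word probabilities and the perplexity values in the example are hypothetical and are not results from [ACL14].

```python
import math

def perplexity(probs):
    """Perplexity of a test sequence given the model's per-word probabilities."""
    return math.exp(-sum(math.log(p) for p in probs) / len(probs))

def relative_improvement(pp_baseline, pp_new):
    """Relative perplexity reduction of a new model over a baseline."""
    return (pp_baseline - pp_new) / pp_baseline

# a model that assigns probability 1/4 to every word is as uncertain
# as a uniform choice among 4 words
print(round(perplexity([0.25, 0.25, 0.25, 0.25]), 6))  # 4.0
print(round(relative_improvement(200.0, 194.8), 3))     # 0.026, i.e. 2.6%
```

Lower perplexity is better, so a positive relative improvement means the new model is less surprised by the test data.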

Page 13

Outlook for Generalized Language Models

• Correcting mistakes that are made in all tools (lack of appropriate models)
• Other operators, e.g. for "the wild black cat": Delete gives "the black cat"; Part-of-speech gives "the adj adj cat"
• Application: e.g. next word prediction
• Other data structures: tree-like data, graph data

proposal for Google

current focus

Semantic Web
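The two additional operators from the outlook can be sketched in the same style as the skip operator. The function names `delete_op` and `pos_op` and the tag set are hypothetical; a real system would obtain tags from a part-of-speech tagger.

```python
def delete_op(ngram, pos):
    """Delete operator: drop the word at position pos."""
    return ngram[:pos] + ngram[pos + 1:]

def pos_op(ngram, pos, tags):
    """Part-of-speech operator: replace the word at pos by its tag."""
    return ngram[:pos] + (tags[pos],) + ngram[pos + 1:]

phrase = ("the", "wild", "black", "cat")
tags = ("det", "adj", "adj", "noun")  # hypothetical tags for the phrase

print(delete_op(phrase, 1))                      # ('the', 'black', 'cat')
print(pos_op(pos_op(phrase, 1, tags), 2, tags))  # ('the', 'adj', 'adj', 'cat')
```

Unlike skipping, deletion shortens the sequence, so positions after the deleted word shift; the part-of-speech operator keeps the length and therefore commutes like a skip.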

Page 14

2. Modelling Network Evolution

Page 15

Evolution of Networks [ICWSM13]

[Figure: training network with edge additions and removals]

• Link prediction problem
• Unlink prediction problem
• Markov assumption: history irrelevant

Page 16

Related Work in Brief

A prediction feature $f$ assigns a score to each node pair $(i, j)$; $f(i, j) > f(i, k)$ implies that $(i, j)$ is ranked above $(i, k)$:

• Link prediction: the edge is likelier to be added
• Unlink prediction: the edge is likelier to be removed

Page 17

Related Work in Brief

Static features: degree, common neighbours, path3, local clustering coefficient/embeddedness, ...

A prediction feature $f$ assigns a score to each node pair $(i, j)$; $f(i, j) > f(i, k)$ implies that $(i, j)$ is ranked above $(i, k)$:

• Link prediction: the edge is likelier to be added
• Unlink prediction: the edge is likelier to be removed
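Two of the static features can be sketched on an undirected graph stored as adjacency sets. The edge list and the helper names are hypothetical; `degree_product` is the usual preferential-attachment score built from node degrees. Ranking follows the rule above: a higher score ranks the pair higher.

```python
def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def common_neighbours(adj, i, j):
    return len(adj[i] & adj[j])

def degree_product(adj, i, j):
    # preferential-attachment score built from the degree feature
    return len(adj[i]) * len(adj[j])

def rank_pairs(adj, i, candidates, feature):
    # f(i, j) > f(i, k) ranks the pair (i, j) above (i, k)
    return sorted(candidates, key=lambda k: feature(adj, i, k), reverse=True)

edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
adj = build_adjacency(edges)
print(rank_pairs(adj, "a", ["d", "e"], common_neighbours))  # ['d', 'e']
```

For link prediction the candidates are non-adjacent pairs; for unlink prediction the same features are computed on existing edges, which are then ranked by how likely they are to be removed.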

Page 18

The Snapshot View

Unlink prediction is much more difficult than link prediction.

[Figure: link and unlink prediction (ICWSM 2013)]

Page 19

Related Work in Brief

[Figure: training network with edge additions and removals]

• Link prediction problem
• Unlink prediction problem
• Markov assumption: history irrelevant

Advantage: general model. Disadvantage: general model.

Idea: keep generality, improve prediction.

Page 20

Our Approach - 1

[Figure: training network with edge additions and removals]

• Link prediction problem
• Unlink prediction problem
• Markov assumption: history irrelevant

Hypothesis: temporal information generally improves prediction.

Idea: 1. nodes concerned, 2. neighbourhood

Page 21

Our Approach - 2

Dynamic features: + recency, + longevity

Extrapolation for temporal preferential attachment:
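The exact extrapolation formula of [ICWSM13] is not recoverable here, so the following is only a minimal sketch under loudly stated assumptions. It assumes a timestamped event list of edge additions (+1) and removals (-1), defines recency as the time since a node's most recent edge event and longevity as the time since its first one, and extrapolates each node's degree linearly over a hypothetical `window` before multiplying the forecasts, preferential-attachment style. All function names and definitions are hypothetical.

```python
def _times(events, node, now):
    return [t for (t, u, v, _) in events if t <= now and node in (u, v)]

def degree_at(events, node, now):
    """Degree after replaying additions (+1) and removals (-1) up to time now."""
    return sum(sign for (t, u, v, sign) in events if t <= now and node in (u, v))

def recency(events, node, now):
    """Hypothetical: time since the node's most recent edge event."""
    times = _times(events, node, now)
    return now - max(times) if times else float("inf")

def longevity(events, node, now):
    """Hypothetical: time since the node's first edge event."""
    times = _times(events, node, now)
    return now - min(times) if times else 0

def extrapolated_pa(events, i, j, now, window):
    """Hypothetical: preferential attachment on linearly extrapolated degrees."""
    def forecast(node):
        d_now = degree_at(events, node, now)
        d_before = degree_at(events, node, now - window)
        return max(d_now + (d_now - d_before), 0)  # continue the recent trend
    return forecast(i) * forecast(j)

events = [(1, "a", "b", +1), (2, "a", "c", +1), (3, "a", "d", +1),
          (3, "b", "c", +1), (4, "a", "b", -1)]
print(recency(events, "a", 5), longevity(events, "a", 5))  # 1 4
print(extrapolated_pa(events, "a", "c", now=4, window=2))  # 6
```

The linear trend is one simple choice that lets a node whose degree is growing score higher than a node with the same current degree that is stagnating.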

Page 22

Evaluation & Discussion (excerpt)

• Temporal link prediction: significantly better, but only slightly
• Temporal unlink prediction: always significantly improved
• Temporal preferential attachment: best

[Figure: AUC for baseline, qualitative, quantitative and extrapolation]
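AUC, the measure in the figure, is the probability that a randomly chosen positive pair (an edge that was actually added, or removed) is scored above a randomly chosen negative pair, with ties counting one half. The scores below are hypothetical; the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

print(auc([3, 2], [1, 0]))  # 1.0: every positive outranks every negative
print(auc([2, 0], [1]))     # 0.5: no better than random
```

An AUC of 0.5 corresponds to random ranking, so improvements are read relative to that level and to the baseline.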

Page 23

Outlook for Evolution of Networks

• Temporal dynamics are still underexplored; there is a lack of datasets! Next experiments:
  • Twitter followers
  • Xing.de
• Unlinks lead to link recommendations: new Wikipedia link (reorganization of Wikipedia pages!), new job, new friend

Page 24

3. Modelling Physical-Social Data

Page 25

[Figure: map of photos with their tags, e.g. "fish, rice", "seafood, shrimp", "lobster, wine", "pizza, wine", "pasta, wine", "coffee"]

Tagged photos with geo-coordinates from Flickr

Page 26

[Figure: the same map with two tag clusters: seafood, fish, lobster, shrimp, crab, wine, salmon; and wine, pizza, coffee, italian, pasta]

Tasks: Discovering topics, finding clusters
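Topic discovery over tag sets is commonly done with Latent Dirichlet Allocation (LDA). The following minimal sketch is plain LDA with collapsed Gibbs sampling, not the geographical model discussed in the following slides: it ignores the coordinates entirely. The hyperparameters `alpha`, `beta`, `iters` and the example photos are assumptions for illustration.

```python
import random
from collections import defaultdict

def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA over lists of tags."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    z = [[rng.randrange(k) for _ in doc] for doc in docs]  # random initial topics
    n_dk = [[0] * k for _ in docs]               # tags per (photo, topic)
    n_kw = [defaultdict(int) for _ in range(k)]  # tags per (topic, word)
    n_k = [0] * k                                # tags per topic
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            t = z[d][i]
            n_dk[d][t] += 1
            n_kw[t][w] += 1
            n_k[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                # remove the current assignment from the counts
                n_dk[d][t] -= 1
                n_kw[t][w] -= 1
                n_k[t] -= 1
                # full conditional p(z = j | all other assignments)
                weights = [(n_dk[d][j] + alpha) * (n_kw[j][w] + beta)
                           / (n_k[j] + vocab_size * beta) for j in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t
                n_dk[d][t] += 1
                n_kw[t][w] += 1
                n_k[t] += 1
    # per-photo topic distributions (posterior mean estimate)
    theta = [[(n_dk[d][j] + alpha) / (len(doc) + k * alpha) for j in range(k)]
             for d, doc in enumerate(docs)]
    return theta, n_kw

photos = [["fish", "rice"], ["seafood", "shrimp"], ["lobster", "seafood", "shrimp"],
          ["pizza", "wine"], ["italian", "pizza", "wine"], ["pasta", "wine"], ["coffee"]]
theta, topic_words = lda_gibbs(photos, k=2)
for t, counts in enumerate(topic_words):
    print(t, sorted(counts, key=counts.get, reverse=True)[:3])
```

Because plain LDA treats every photo independently of where it was taken, it cannot express that nearby photos tend to share topics, which motivates the spatial models that follow.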

Page 27

Challenge

Cultural areas, country borders, geographical features and other geographical observations exhibit complex spatial distributions.

[Figure source: wikipedia.org]

Page 28

Existing approaches: Gaussian regions

[Figure: map of tagged photos with topic regions]

A. Ahmed, L. Hong and A. Smola, 2013 (following Yin et al. 2011; Sizov 2010)
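A Gaussian region summarizes where a topic's photos lie by a mean and a spread. The minimal sketch below assumes a 2-D Gaussian with diagonal covariance fitted to hypothetical (x, y) coordinates; it is not the model of Ahmed et al. It illustrates the limitation raised in the challenge: a single Gaussian is one elliptical bump and cannot follow a coastline or a border.

```python
import math

def fit_diag_gaussian(points):
    """Fit a 2-D Gaussian with diagonal covariance to (x, y) points.

    Assumes the points do not all share the same x or the same y value,
    otherwise a variance would be zero.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    vx = sum((x - mx) ** 2 for x, _ in points) / n
    vy = sum((y - my) ** 2 for _, y in points) / n
    return mx, my, vx, vy

def log_density(gaussian, point):
    """Log density of a point under the fitted diagonal Gaussian."""
    mx, my, vx, vy = gaussian
    x, y = point
    return -0.5 * (math.log(2 * math.pi * vx) + (x - mx) ** 2 / vx
                   + math.log(2 * math.pi * vy) + (y - my) ** 2 / vy)

# hypothetical coordinates of "seafood" photos along a coastline
seafood = [(0.0, 0.0), (1.0, 0.2), (2.0, 0.1), (3.0, 0.3)]
g = fit_diag_gaussian(seafood)
print(log_density(g, (1.5, 0.15)) > log_density(g, (1.5, 3.0)))  # True
```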

Page 29

MGTM 1: Global Topic Clustering

[Figure: map of tagged photos grouped into global topic clusters]

Page 30

MGTM 2: Determining Neighbourhoods

[Figure: map of tagged photos]
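How neighbourhoods between clusters are determined in MGTM is not described in this transcript. As a purely hypothetical illustration, the sketch below links each cluster centroid to its `k` nearest other centroids and symmetrizes the result; the centroid names, coordinates and the k-nearest-neighbour rule are assumptions, not the method of [WSDM14].

```python
import math

def cluster_neighbourhoods(centroids, k):
    """Hypothetical: link each cluster to its k nearest clusters (symmetrized)."""
    adj = {c: set() for c in centroids}
    for c, (x, y) in centroids.items():
        others = sorted((math.hypot(x - ox, y - oy), o)
                        for o, (ox, oy) in centroids.items() if o != c)
        for _, o in others[:k]:
            adj[c].add(o)
            adj[o].add(c)  # adjacency is symmetric
    return adj

centroids = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (10.0, 0.0), "D": (11.0, 0.0)}
print(cluster_neighbourhoods(centroids, k=1))
```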

Page 31

MGTM 3: Derived Topic Model

• Cluster adjacency: dependencies of document-specific topic distributions
• Exchange of topic information between clusters

Page 32

Exchange of topic information between clusters

MGTM 4: Exchange of Topic Information
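The inference used by MGTM to exchange topic information is not given in this transcript. A minimal, purely hypothetical way to picture the idea is to mix each cluster's topic distribution with the mean distribution of its adjacent clusters, weighted by an assumed parameter `lam`. Cluster names and numbers are invented for illustration.

```python
def exchange_topics(theta, adj, lam=0.5):
    """Hypothetical: mix each cluster's topic distribution with its neighbours' mean."""
    mixed = {}
    for c, dist in theta.items():
        neighbours = adj[c]
        if not neighbours:
            mixed[c] = list(dist)  # isolated cluster keeps its own topics
            continue
        avg = [sum(theta[o][t] for o in neighbours) / len(neighbours)
               for t in range(len(dist))]
        mixed[c] = [(1 - lam) * p + lam * a for p, a in zip(dist, avg)]
    return mixed

theta = {"harbour": [0.9, 0.1], "old_town": [0.2, 0.8], "hills": [0.5, 0.5]}
adj = {"harbour": {"old_town"}, "old_town": {"harbour"}, "hills": set()}
print(exchange_topics(theta, adj))
```

Since every input distribution sums to 1, each mixed distribution is a convex combination of distributions and sums to 1 as well.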


Page 35

Evaluation: Anecdotal, Perplexity, Gaming

Gaming study: intrusion detection

Precision (8 topics), avg / median:
LGTA: 0.60 / 0.58
Basic model: 0.64 / 0.58
MGTM: 0.78 / 0.75
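The avg / median precision figures can be computed from per-topic judgements. The sketch assumes, hypothetically, that each topic has a list of booleans recording whether a player spotted the intruder; the judgements below are invented and are not the study's data.

```python
from statistics import mean, median

def intrusion_precision(judgements):
    """judgements: for each topic, a list of booleans (was the intruder spotted?)."""
    per_topic = [sum(js) / len(js) for js in judgements]
    return mean(per_topic), median(per_topic)

# hypothetical judgements for 4 topics, 4 players each
judgements = [[True, True, False, True], [True, False, False, True],
              [True, True, True, True], [False, True, True, True]]
avg, med = intrusion_precision(judgements)
print(avg, med)  # 0.75 0.75
```

Reporting the median next to the average shows whether a few very easy or very hard topics dominate the mean.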

Page 36

Outlook for LDA with structure

• Texts + social network structures: scientometry, xing.de
• Web pages + user visits: chefkoch.de

Page 37

Future: Knowledge about social aspects needed

Future: CS-style models for the social sciences

Page 38

References

[ACL14] R. Pickhardt, T. Gottron, M. Körner, P. G. Wagner, T. Speicher, S. Staab. A Generalized Language Model as the Combination of Skipped n-grams and Modified Kneser Ney Smoothing. In: Proc. of ACL-2014, the 52nd Annual Meeting of the Association for Computational Linguistics, Baltimore, June 22-27, 2014.

[WSDM14] C. Kling, J. Kunegis, S. Sizov, S. Staab. Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections. In: Proc. of the 7th ACM Conference on Web Search and Data Mining (WSDM 2014), New York, US, February 24-28, 2014.

[ICWSM13] J. Preusse, J. Kunegis, M. Thimm, T. Gottron, S. Staab. Structural Changes in Collaborative Knowledge Networks. In: Proc. of the Seventh International AAAI Conference on Weblogs and Social Media (ICWSM 2013), Boston, July 8-10, 2013.

Page 39

Web Science & Technologies Team & Research

• Semantic Web
• Social Web & Web Retrieval
• Interactive Web & Human Computing
• Web & Economy
• Software & Services
• Computational Social Science

Thank You!

Page 40

Maslow's pyramid of needs