Modelling the Web: Examples of Modelling Text, Knowledge Networks and Physical-Social Systems


  • Steffen Staab (staab@uni-koblenz.de), WeST: Web Science & Technologies, University of Koblenz-Landau, Germany. Modelling the Web: Examples of Modelling Text, Knowledge Networks and Physical-Social Systems.
  • What do people want from the Web? Web as storage: library, memory. Web as tool: search, transaction. Web as social medium: communication, cooperation. Web as mirror of self: identification, outreach.
  • What are some of the footprints people leave?
  • My Agenda in the Large. Web Content: discovering patterns, building tools, understanding. Web Interaction: monitoring, exploiting, guiding, understanding. Web Evolution: monitoring, predicting, guiding, understanding.
  • My Agenda for Today (Web Content, Web Interaction, Web Evolution): 1. Modelling Text. 2. Modelling Network Evolution. 3. Modelling Physical-Social Data.
  • My Agenda for Today, part 1: Modelling Text.
  • Autocompletion of queries: "UK is ...?"
  • Language Models. What follows "UK is"? Conditional probability of the next word given the preceding words (the formula and its definitions did not survive extraction). Issue: long word sequences can rarely be observed.
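
The slide's formula is missing from the transcript. As a minimal sketch, assuming the usual maximum-likelihood estimate (count of history plus word, divided by count of the history followed by any word), the conditional probability could be computed as below. The function names and the toy corpus are illustrative assumptions, not taken from the talk.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every contiguous n-gram (as a tuple) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def mle_next_word_prob(tokens, history, word):
    """Estimate P(word | history) by relative frequency (assumed MLE form)."""
    history = tuple(history)
    full = ngram_counts(tokens, len(history) + 1)
    # Count the history only where some word follows it,
    # so the estimates over all next words sum to 1.
    hist_total = sum(c for gram, c in full.items() if gram[:-1] == history)
    if hist_total == 0:
        return 0.0  # history never observed: the sparsity issue on the slide
    return full[history + (word,)] / hist_total

# Toy corpus, invented for illustration only.
corpus = "uk is great uk is rainy uk is great london is rainy".split()
p_great = mle_next_word_prob(corpus, ["uk", "is"], "great")
p_sunny = mle_next_word_prob(corpus, ["uk", "is"], "sunny")
p_unseen_history = mle_next_word_prob(corpus, ["paris", "is"], "great")
```

Any continuation that never occurs in the corpus gets probability zero here, which is why smoothing is needed for longer sequences.
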
  • Modified Kneser-Ney Smoothing of n-grams. If a sequence is hard to observe, then approximate it recursively by observing the marginal frequencies of ... First recursion step: (formula not preserved). Problem: if the last word in the sequence is rare, the overall sequence will be rare, and the approximation will be of low quality.
  • Generalized Language Models [ACL14]. If a sequence is too hard to observe, then approximate it recursively, based on the marginal probabilities of ... Core idea of the formal solution: recursively applicable, commutative skip operators.
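
To illustrate what a recursively applicable, commutative skip operator might look like, here is a rough sketch that replaces one word of a sequence with a wildcard and enumerates all skipped variants. The wildcard symbol, the function names and the example phrase are assumptions for illustration; the exact operator definition in the ACL14 paper may differ.

```python
from itertools import combinations

SKIP = "*"  # hypothetical wildcard symbol, not taken from the slides

def skip(seq, position):
    """Replace the word at `position` with the wildcard."""
    return seq[:position] + (SKIP,) + seq[position + 1:]

def generalized_histories(history):
    """Return all variants of `history` with any subset of positions skipped."""
    history = tuple(history)
    variants = []
    for k in range(len(history) + 1):
        for positions in combinations(range(len(history)), k):
            variant = history
            for p in positions:
                variant = skip(variant, p)  # operators applied one after another
            variants.append(variant)
    return variants

h = ("the", "wild", "black")
# Commutativity: applying two skips in either order gives the same sequence.
same_either_order = skip(skip(h, 0), 2) == skip(skip(h, 2), 0)
variants = generalized_histories(h)
```

A history of three words yields eight variants, from the unmodified sequence down to the fully skipped one, each of which could supply marginal counts when the full sequence is too rare.
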
  • Improvement of GLMs [ACL14]. Evaluation measure: perplexity. Data set: English Wikipedia, different sample sizes. Relative improvement: 2.6% (most training data, smallest model) to 13.9% (least training data, largest model). Plot axis: perplexity (normalized).
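
For readers unfamiliar with the evaluation measure, the sketch below computes perplexity in its standard form (the exponential of the negative mean log-probability a model assigns to the test words) and a relative improvement between two perplexity values. The helper names and the numbers are made up for illustration and are not results from the talk.

```python
import math

def perplexity(probabilities):
    """Perplexity = exp(-mean(log p)) over the probabilities of the test words."""
    if not probabilities or any(p <= 0 for p in probabilities):
        raise ValueError("need a non-empty list of strictly positive probabilities")
    avg_log = sum(math.log(p) for p in probabilities) / len(probabilities)
    return math.exp(-avg_log)

def relative_improvement(baseline_ppl, new_ppl):
    """Fraction by which perplexity dropped relative to the baseline."""
    return (baseline_ppl - new_ppl) / baseline_ppl

# A model that spreads probability uniformly over 4 words has perplexity 4.
uniform_ppl = perplexity([0.25] * 4)
# Illustrative values only: a drop from 100 to 90 is a 10% relative improvement.
improvement = relative_improvement(100.0, 90.0)
```

Lower perplexity means the model is less surprised by the test text, so a positive relative improvement indicates a better model.
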
  • Outlook for Generalized Language Models. Correcting mistakes that are made in all tools; lack of appropriate models. Other operators (example: "the wild black cat"): Delete: "the black cat"; Part-of-speech: "the adj adj cat". Application: e.g. next-word prediction. Other data structures: tree-like data, graph data. Proposal for Google; current focus: Semantic Web.
  • My Agenda for Today, part 2: Modelling Network Evolution.
  • Evolution of Networks [ICWSM 2013]. Diagram labels: Training, Additions, Removals. Link Prediction Problem: predict additions. Unlink Prediction Problem: predict removals. Markov assumption: history irrelevant.
  • Related Work in Brief. A prediction feature f assigns a score to each node pair (i, j): f(i, j) > f(i, k) implies that (i, j) is ranked above (i, k). Link Prediction: edge likelier to be added. Unlink Prediction: edge likelier to be removed. Static features: degree, common-neighbours, path3, local-clustering-coefficient/embeddedness, ...
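
The ranking rule can be sketched with one of the static features named on the slide, common neighbours. The adjacency-dictionary representation, the function names and the toy graph are assumptions made for this example.

```python
def common_neighbours(adj, i, j):
    """Static feature f(i, j): number of neighbours shared by i and j."""
    return len(adj[i] & adj[j])

def rank_candidates(adj, i, feature=common_neighbours):
    """Rank nodes not yet linked to i; a higher f(i, k) is ranked first."""
    candidates = [k for k in adj if k != i and k not in adj[i]]
    # Sort by descending score, breaking ties by node name for determinism.
    return sorted(candidates, key=lambda k: (-feature(adj, i, k), k))

# Toy undirected graph as a dict of neighbour sets (illustrative only).
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}
ranking = rank_candidates(adj, "a")
```

Node "d" shares two neighbours with "a" and node "e" shares none, so the edge (a, d) is ranked as likelier to be added than (a, e). For unlink prediction the same scheme would rank existing edges instead of missing ones.
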
  • The Snapshot View: link and unlink prediction (ICWSM 2013). Unlink prediction is much more difficult than link prediction.
  • Related Work in Brief. Diagram labels: Training, Additions, Removals. Link Prediction Problem; Unlink Prediction Problem. Markov assumption: history irrelevant. Advantage: general model. Disadvantage: general model. Idea: keep generality, improve prediction.
  • Our Approach - 1. Diagram labels: Training, Additions, Removals. Link Prediction Problem; Unlink Prediction Problem. Markov assumption: history irrelevant. Hypothesis: temporal information generally improves prediction. Idea: 1. Nodes concerned. 2. Neighbourhood.
  • Our Approach - 2. Dynamic features: + recency, + longevity. Extrapolation for temporal preferential attachment: (formula not preserved).
  • Evaluation & Discussion (excerpt). Temporal link prediction is significantly better, but only slightly. Temporal unlink prediction is always significantly improved. Temporal preferential attachment performs best. Result labels: AUC; baseline, qualitative, quantitative, extrapolation.
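
AUC, the measure named on this slide, can be read as the probability that a randomly chosen positive example (an edge that really was added or removed) receives a higher score than a randomly chosen negative one. The sketch below computes it by direct pair counting; the function name and the sample scores are illustrative assumptions, not the evaluation code or data from the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Invented scores: label 1 marks node pairs whose edge actually changed.
perfect = auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
partial = auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect one, which makes AUC convenient for comparing a baseline feature against its temporal variants.
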
  • Outlook for Evolution of Networks. Temporal dynamics still underexplored: lack of datasets! Next experiments: Twitter followers, Xing.de. Unlinks lead to link recommendation: new Wikipedia link (reorganization of Wikipedia pages!), new job, new friend.
  • My Agenda for Today, part 3: Modelling Physical-Social Data.
  • Tagged photos with geo-coordinates from Flickr. [Figure: map of photos labelled with tag sets such as "fish, rice", "seafood, shrimp", "lobster, wine", "coffee, wine", "pizza, wine" and "pasta, shrimp".]
  • Tasks: discovering topics, finding clusters. [Figure: the tagged-photo map with two topic word lists: "seafood, fish, lobster, shrimp, crab, wine, salmon" and "wine, pizza, coffee, italian, pasta".]
  • Challenge: cultural areas, country borders, geographical features and other geographical observations exhibit complex spatial distributions. (Image source: wikipedia.org)
  • Existing approaches: Gaussian regions (A. Ahmed, L. Hong and A. Smola, 2013, following Yin et al. 2011 and Sizov 2010). [Figure: the tagged-photo map with the two topic word lists.]
  • MGTM 1: Global Topic Clustering. [Figure: the tagged-photo map with the two topic word lists.]
  • MGTM 2: Determining Neighbourhoods. [Figure: the tagged-photo map with the two topic word lists.]
  • Cluste