KDD 2005 Review Session Jure Leskovec. Query Incentive Networks Jon Kleinberg Prabhakar Raghavan.
KDD 2005 Review Session
Jure Leskovec
Query Incentive Networks
Jon Kleinberg
Prabhakar Raghavan
Query Incentive Networks
Networks are everywhere
  Decentralized peer-to-peer networks
  On-line communities
There is no central index
So users post queries to the network itself
Requests get propagated until the answer is found
Motivating example
On-line social networking sites
  Friendster, Orkut, LinkedIn, … maintain a social network of their members
  Large on-line community of off-line world friendships
Use the network to help find information and services:
  Find a job or apartment through friends of friends
  Queries propagate over the network of friendships (trust)
Networks as Marketplaces
Intuitively we want our network to have short, easily findable paths
More questions
  How do members of the network extract utility from their interactions with other members?
  What is the system behavior as members interact strategically to maximize their utilities?
The Setting
Formulation of a simple model of query propagation on a random network
  Node v* poses a query and offers a reward
  Query propagates and answer is found
How should other nodes behave?
How much reward should the query node v* offer?
Main Result
If a node has fewer than 2 neighbors on average (branching factor below 2), then the node has to invest an enormous amount to receive the answer
If the branching factor is more than 2, it needs to invest only O(log n)
Consequence
Known result
  Once the branching factor exceeds 1 the network has a giant connected component and short paths, O(log n)
Consequence
  The network achieves structural robustness at branching factor of 1
  At branching factor of at least 2 the network makes searching feasible in the presence of incentives
Formulating the Model – Big Picture
Node v* belonging to the network is seeking a piece of information held by certain nodes
Node v* offers a reward which will be paid when the answer is received
If a neighbor of v* does not have an answer
  It takes a piece of the reward for itself
  Offers a smaller reward to its neighbors, its “subcontractors” (hoping they would have the answer)
Query propagates and eventually finds the answer
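Tree Model – Example
[Figure: query tree rooted at v* (initial utility 9), which offers reward 5 for the answer. Node labels: b, c, d, e, f, g. Rewards offered further down the tree: 2, 2, 1, 0. Some nodes are marked as holding the answer.]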
Tree Model – Answer propagation
[Figure: the same tree after the answer is returned. Annotations: answer at d, reward = 2; Reward 5 - 2 = 3; Utility 9 - 5 = 4.]
Tree model
We model the underlying network as a tree T
The root v* of the tree T has a query for which it has utility r*
Each node holds an answer with probability 1-p
Node v* offers a reward for the answer
The query propagates down the tree
Tree model – “Subcontracting”
Each node takes its share of the reward
  Each node v has an integer-valued function fv
  If node v is offered a reward of r by its parent and node v does not possess the answer
  Then it offers a reward fv(r) < r to its children
The propagation of the query stops along a particular path in T when
  Offered rewards shrink to 0
  A node that holds the answer is reached
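The stopping rule above can be traced on a small example. The sketch below is a hypothetical illustration, not the authors' code: the tree shape, the set of answer-holders, and the helper name propagate are assumptions, and the schedule 5 → 2 → 1 → 0 is read off the reward labels in the example figure. It also assumes that a node only handles a query when it is offered a positive reward.

```python
from typing import Callable, Dict, List, Set, Tuple


def propagate(
    children: Dict[str, List[str]],
    holders: Set[str],
    root: str,
    root_reward: int,
    f: Callable[[int], int],
) -> List[Tuple[str, int, int]]:
    """Return (answer-holder, reward offered to it, depth) for each holder reached."""
    found = []
    # The root offers its reward to each of its children.
    frontier = [(child, root_reward, 1) for child in children.get(root, [])]
    while frontier:
        node, offered, depth = frontier.pop()
        if offered <= 0:
            continue  # assumed rule: the reward has shrunk to 0, the path stops
        if node in holders:
            found.append((node, offered, depth))
            continue  # an answer-holder does not forward the query
        passed_on = f(offered)
        if not 0 <= passed_on < offered:
            raise ValueError("f(r) must satisfy 0 <= f(r) < r")
        for child in children.get(node, []):
            frontier.append((child, passed_on, depth + 1))
    return sorted(found)


# Hypothetical tree and answer-holders; schedule taken from the figure labels.
schedule = {5: 2, 2: 1, 1: 0}
tree = {"v*": ["b", "c"], "b": ["d", "e"], "c": ["f"], "e": ["g"]}
found = propagate(tree, {"d", "g"}, "v*", 5, schedule.__getitem__)
print(found)  # [('d', 2, 2), ('g', 1, 3)]
```

With a root reward of 1 the same tree yields no answer, because the first-level nodes have nothing left to offer.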
Tree model – Getting the Reward
From among all the answer-holders that are discovered, the root v* selects an answer
The reward propagates down the path to the answer-holder
Each node on the answer path keeps its share of the reward
Forwarding the reward has unit cost
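The arithmetic in the answer-propagation figure (utility 9 - 5 = 4 at the root, reward 5 - 2 = 3 at the forwarding node) can be reproduced with a short sketch. The function name settle and its argument layout are assumptions made for illustration. The sketch also assumes that the unit cost is charged to each forwarding node and not to the answer-holder, which the slides do not state.

```python
from typing import List, Tuple


def settle(
    root_utility: int, offers: List[int], unit_cost: int = 1
) -> Tuple[int, List[int], List[int]]:
    """offers[0] is what the root offers; offers[i] is what the i-th node on
    the answer path offers to the next node. The last entry reaches the holder."""
    root_net = root_utility - offers[0]
    # Each forwarder keeps the difference between what it gets and what it pays.
    shares = [offers[i] - offers[i + 1] for i in range(len(offers) - 1)]
    shares.append(offers[-1])  # the answer-holder keeps what it was offered
    # Assumed: only forwarders pay the unit cost.
    forwarder_net = [share - unit_cost for share in shares[:-1]]
    return root_net, shares, forwarder_net


root_net, shares, forwarder_net = settle(9, [5, 2])
print(root_net, shares, forwarder_net)  # 4 [3, 2] [2]
```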
The Model as a Game (1)
Nodes behave strategically
  Each chooses how to offer a reward to maximize the payoff
Each player (node) v picks a strategy in the form of a function fv
[Figure: node u offers reward r to node v; node v offers reward fv(r)]
Note that fv(r) is integer valued, so propagation is not infinite
The Model as a Game (2)
Function fv(r) determines how much of the reward is passed forward
  Defines the strategy
Forwarding the query has unit cost: fv(r) ≤ r-1
  So each node keeps at least 1 unit for its effort
[Figure: node u offers reward r to node v; node v offers reward fv(r)]
Note that fv(r) is integer valued, so propagation is not infinite
Unique Nash Equilibrium
The game has a unique Nash Equilibrium
  fv(r) = the value of x that maximizes (r - x - 1) · αv(f, x)
  αv(f, x) … probability that the tree below v contains an answer
At Nash equilibrium no player gains anything by deviating from the current strategy
All nodes have the same strategy (function f)
Proof: Show that the maximum expected reward is exactly max_x (r - x - 1) · αv(f, x)
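The maximization above is a one-line search once the success probabilities are known. The sketch below is a hypothetical illustration: the function name best_response, the alpha values, and the rule of breaking ties toward the smallest x are all assumptions, not taken from the paper.

```python
from typing import Sequence


def best_response(r: int, alpha: Sequence[float]) -> int:
    """Return the x in 0..r-1 that maximizes (r - x - 1) * alpha[x].

    alpha[x] stands for the probability that the subtree below the node
    yields an answer when the node offers x to its children.
    """
    if r < 1:
        raise ValueError("the offered reward r must be at least 1")
    if len(alpha) < r:
        raise ValueError("alpha must cover x = 0 .. r-1")
    best_x, best_payoff = 0, (r - 1) * alpha[0]
    for x in range(1, r):
        payoff = (r - x - 1) * alpha[x]
        if payoff > best_payoff:  # strict: ties stay with the smaller x (assumed)
            best_x, best_payoff = x, payoff
    return best_x


# Made-up success probabilities for offers x = 0, 1, 2, 3, 4.
alpha = [0.0, 0.3, 0.5, 0.6, 0.65]
print(best_response(5, alpha))  # 2
```

With these numbers a node offered 5 passes on 2: offering more raises the chance of success, but leaves too little of the reward to keep.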
The Model – the Details
Rarity n
  One out of n nodes has the answer: n = (1 - p)^(-1)
We model the tree T with 2 parameters
  T is a d-ary tree
  Each node in T is on-line with probability q
  Average branching b of the tree T is then b = q · d
[Figure: example tree T rooted at v*, with d = 3 and q = 0.5]
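As a quick numeric check of the two parameters, the sketch below computes the rarity n and the average branching b, and reports which reward bound from the slides applies. The helper names and the value p = 0.99 are assumptions chosen for illustration. The slides state bounds only for 1 < b < 2 and for b > 2, so other values are reported as not covered.

```python
def rarity(p: float) -> float:
    """n = 1 / (1 - p), where 1 - p is the chance that a node holds the answer."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return 1.0 / (1.0 - p)


def branching(d: int, q: float) -> float:
    """Average branching b = q * d of a d-ary tree whose nodes are on-line w.p. q."""
    return q * d


def reward_regime(b: float) -> str:
    """Reward bound stated in the slides for this branching factor."""
    if 1 < b < 2:
        return "Omega(n^c)"
    if b > 2:
        return "O(log n)"
    return "not covered by the slides"


b = branching(3, 0.5)  # the figure's example: d = 3, q = 0.5
n = rarity(0.99)       # hypothetical p: about one node in a hundred has the answer
print(b, round(n), reward_regime(b))
```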
Structure of Rewards (1)
How large must the initial reward be to obtain an answer with high probability?
Given
  n … the rarity of an answer
  b … tree branching factor
Let Rσ(n,b) be the reward the root node v* must offer to get the answer with probability σ
What can we say about Rσ(n,b) as we change σ, the probability of getting an answer?
Structure of Rewards (2)
Reward Rσ(n,b) increases in steps >1
The steps occur exactly when the reward is sufficient to propagate a level deeper
[Figure: reward offered by the root, r* = Rσ(n,b), against σ, the probability of getting an answer]
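One way to get a feel for Rσ(n,b) is to compute it numerically for a small tree. The sketch below is not the paper's derivation. It assumes a recursion pieced together from the model as the slides describe it: each of the d child slots is on-line with probability q, an on-line child holds the answer with probability 1 - p, and otherwise it offers f(x) to its own children, where f maximizes (r - x - 1) · α(x) with ties broken toward the smallest x. The function names and the cap r_max are likewise assumptions.

```python
from typing import List, Optional


def success_probabilities(d: int, q: float, p: float, r_max: int) -> List[float]:
    """alpha[x]: chance that the subtree below a node offering x yields an answer."""
    alpha = [0.0] * (r_max + 1)  # alpha[0] = 0: offering nothing finds nothing
    f = [0] * (r_max + 1)
    for r in range(1, r_max + 1):
        # Best response of a node that was offered r (smallest x wins ties).
        f[r] = max(range(r), key=lambda x: ((r - x - 1) * alpha[x], -x))
        # A child slot succeeds if it is on-line and either holds the answer
        # or finds one in its own subtree after offering f[r].
        hit = q * ((1 - p) + p * alpha[f[r]])
        alpha[r] = 1 - (1 - hit) ** d
    return alpha


def min_reward(
    sigma: float, d: int, q: float, p: float, r_max: int = 1000
) -> Optional[int]:
    """Smallest root reward whose success probability reaches sigma, if any."""
    alpha = success_probabilities(d, q, p, r_max)
    for r in range(1, r_max + 1):
        if alpha[r] >= sigma:
            return r
    return None  # sigma not reached within r_max


print(min_reward(0.5, d=3, q=1.0, p=0.5, r_max=50))  # 1
print(min_reward(0.9, d=3, q=1.0, p=0.5, r_max=50))  # 3
```

In this toy setting a reward of 2 buys nothing over a reward of 1, because a node offered 2 gains nothing by passing 1 on. The required reward jumps straight from 1 to 3, the point at which the query reaches one level deeper.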
Growth rate of Rewards (1)
How does the reward offered by the root, Rσ(n,b), change as a function of rarity n and branching b?
For branching factor 1<b<2
  Rσ(n,b) = Ω(n^c)
  where c depends on 2-b
A node needs to offer a reward that grows polynomially in the rarity n, which is exponential in the O(log n) distance to the answer
Growth rate of Rewards (2)
For branching factor b>2
  Rσ(n,b) = O(log n)
  where the constant in O() depends on b-2
The offered reward increases logarithmically with the rarity
Consequences + Conclusion (1)
For branching factors b>1 the answer is close
  The distance to the nearest answer is O(log n)
  Each node on the path retains at least a unit of reward
For large branching factor the propagation of queries is very efficient in the use of reward
When 1<b<2 the distance to the nearest answer is still O(log n)
  But now the reward needed by the root is exponential in the distance
Consequences + Conclusion (2)
The result is a surprise since branching b=2 is not a critical point of the branching process
For b>1 the answer lies O(log n) steps away – network structure becomes robust
But once incentives are taken into account b=2 is a critical value – the network becomes efficient for incentive-based queries
Other interesting papers (1)
The Predictive Power of On-Line Chatter by Gruhl, Guha, Kumar, Novak and Tomkins
Can we say anything about book sales while observing blog postings?
Correlate the blog postings with book sales
  Crawl blogs to get blog postings
  Get sales data from Amazon (Sales-rank)
They automate the process
  Given a set of book titles
  Automatically design queries for blog posts
  Predict whether a spike in the book sales will occur
Predictive power of on-line chatter
[Figure: example for the book “The Lance Armstrong Performance Program”]
Other interesting papers (2)
Nomograms for Visualizing Support Vector Machines by Jakulin, Mozina, Demsar, Bratko and Zupan
Nomogram … place axes parallel to each other instead of at right angles
Can work with a large number of attributes
Supports either regression or classification
Can be generalized to visualize any Generalized Additive Model (decomposable kernels)
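Example – Boston Housing
[Figure: nomogram for the Boston Housing data with classes Expensive and Cheap. One axis per attribute shows the possible values and the values taken; the aggregate log odds ratio (the odds in favor of a cheap area) maps to the outcome probability, which gives the prediction.]
log(p_cheap / p_expensive) = intercept (prior) + effect (evidence)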
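The additive log-odds formula above turns into a probability through the logistic function. The sketch below shows only that arithmetic. The function name and the intercept and effect values are made up for illustration and are not read from the Boston Housing nomogram.

```python
import math
from typing import Iterable


def cheap_probability(intercept: float, effects: Iterable[float]) -> float:
    """Convert log(p_cheap / p_expensive) = intercept + sum(effects) to p_cheap."""
    log_odds = intercept + sum(effects)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical prior and per-attribute effects (one value per nomogram axis).
p_cheap = cheap_probability(-0.2, [0.9, -0.3, 0.4])
print(round(p_cheap, 3))
```

Reading a nomogram by hand does the same thing: add up the per-attribute contributions on the log-odds scale, then look up the probability on the outcome axis.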
Other interesting papers (3)
Unweaving a Web of Documents by Guha, Kumar, Sivakumar and Sundaram
Given a set of time-stamped documents (e.g. news)
We want to decompose them into semantically coherent threads
Algorithm
  Create a relevance graph pointing back in time
  Decompose the graph into node-disjoint paths
  Formulate the minimum cost flow problem
    Given a graph with costs and capacities
    Push as much flow as possible at minimum cost
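The decomposition step can be illustrated with a simplified stand-in. The sketch below does not implement the minimum cost flow formulation named above. It ignores edge costs and instead finds a minimum number of node-disjoint paths in the relevance graph using bipartite matching (a standard minimum path cover construction for acyclic graphs). The function name, the edge format, and the toy graph are assumptions. Documents are assumed to be numbered in time order.

```python
from typing import Dict, List, Set, Tuple


def threads_by_path_cover(n_docs: int, edges: List[Tuple[int, int]]) -> List[List[int]]:
    """edges are (later, earlier) pairs, i.e. relevance links pointing back in time."""
    successors: Dict[int, List[int]] = {u: [] for u in range(n_docs)}
    for later, earlier in edges:
        successors[earlier].append(later)
    next_doc: Dict[int, int] = {}  # earlier -> later document on the same thread
    prev_doc: Dict[int, int] = {}  # later -> earlier document on the same thread

    def try_extend(u: int, seen: Set[int]) -> bool:
        # Augmenting-path search: give u a successor, re-matching others if needed.
        for v in successors[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in prev_doc or try_extend(prev_doc[v], seen):
                prev_doc[v] = u
                next_doc[u] = v
                return True
        return False

    for u in range(n_docs):
        try_extend(u, set())

    threads = []
    for u in range(n_docs):
        if u not in prev_doc:  # u starts a thread
            path = [u]
            while path[-1] in next_doc:
                path.append(next_doc[path[-1]])
            threads.append(path)
    return threads


# Toy relevance graph: documents 1 and 3 both relate back to document 0.
threads = threads_by_path_cover(5, [(1, 0), (2, 1), (3, 0), (4, 3)])
print(threads)  # [[0, 1, 2], [3, 4]]
```

Every document ends up on exactly one path. A cost-aware version would weight the edges by relevance, which is the part the flow formulation on the slide addresses.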
Unweaving a Web of Documents
[Figure: thread found by the algorithm: IRA attacks]