SOCIAL QUERY ROUTING IN SOCIAL NETWORKS
14th June, 2010
FINAL SEMINAR
Dushyant Arora, B.I.T.S. Pilani
Praphul Chandra, Senior Research Scientist, HP Labs India
Agenda
Motivation
Exploiting Structural properties for routing
Simulation – Power-law networks
Implementation – Twitter
Exploiting non-structural properties for routing
Simulation – 3 new network evolution models
Motivation
Image courtesy of Aardvark http://vark.com/
Structural properties of networks
Small World Phenomenon: any two people are linked by short chains of acquaintances. Random graphs show the small-world effect but negligible clustering.
Power-law degree distribution: the “rich get richer” model, i.e. growth and preferential attachment
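The “rich get richer” mechanism can be illustrated with a minimal Barabási–Albert-style growth sketch. The function names and the parameter m (edges added per new node) are hypothetical choices for illustration, not taken from this work.

```python
import random
from collections import Counter


def grow_preferential(n, m, seed=None):
    """Grow an undirected graph to n nodes (hypothetical helper).

    Each new node links to m distinct existing nodes, chosen with
    probability proportional to their current degree.
    """
    rng = random.Random(seed)
    # Seed graph: a clique on m + 1 nodes, so every node starts with degree m.
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # Every node appears here once per incident edge, so a uniform draw
    # from this list is a degree-proportional (preferential) draw.
    endpoints = [v for edge in edges for v in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


def degrees(edges):
    """Count the degree of every node in an edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg
```

For a large run such as `degrees(grow_preferential(10_000, 2, seed=1))`, one should see many nodes of degree 2 alongside a few very high-degree hubs, the heavy tail typical of a power law.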
SoA: Social Search using Power Law
Efficient search algorithm that exploits the network structure, i.e. the power-law degree distribution.
Intuition: well-connected nodes provide access to a greater portion of the network.
No information about the global position of the target is needed: each node maintains local information about its neighbors and its neighbors’ neighbors.
Search time scales sub-linearly with the size of the graph.
Works best for graphs with an exponent close to 2.
Power-law graph: finds 31 nodes in 4 steps
Poisson graph: finds 14 nodes in 4 steps
Algo: forward the message to the next most highly connected unvisited neighbor
Adamic, Lukose, Puniyani, Huberman (2001)
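The high-degree search step can be sketched as follows, assuming the graph is a dict mapping each node to the set of its neighbors. This is a simplified reading of the strategy: the sketch gives up at a dead end instead of backtracking, and the function name is hypothetical.

```python
def high_degree_search(adj, source, target, max_steps=1000):
    """Route a message using only local knowledge (neighbors and
    neighbors' neighbors); return the route, or None if stuck."""
    route = [source]
    visited = {source}
    current = source
    for _ in range(max_steps):
        if current == target:
            return route
        # Target is a direct neighbor: deliver.
        if target in adj[current]:
            return route + [target]
        # Target is a neighbor's neighbor: deliver through that neighbor.
        for n in adj[current]:
            if target in adj[n]:
                return route + [n, target]
        # Otherwise forward to the most highly connected unvisited neighbor.
        candidates = [n for n in adj[current] if n not in visited]
        if not candidates:
            return None  # dead end (no backtracking in this sketch)
        current = max(candidates, key=lambda n: len(adj[n]))
        visited.add(current)
        route.append(current)
    return None
```

On a power-law graph the message quickly reaches a hub, whose large two-hop neighborhood makes it likely that the target is found within a few steps.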
Revised Algorithm (Our Work)
What about disconnected graphs?
What if the exponent is not ~2?
T node rewiring
ST and T node rewiring
Application: P2P networks
Twitter
Short updates, called “tweets”, of 140 characters or fewer are sent to followers and are searchable on Twitter Search.
Friends and Followers
@username – replies and mentions. An @reply is a public message that anyone can view, sent regardless of follow-ship.
Direct messages – private messages that can be sent only to a person who follows you.
Hashtags – an easy way to group tweets, i.e. to add a tweet to a category.
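A tweet’s @mentions and hashtags can be extracted with two regular expressions. This is a simplified sketch, assuming the 15-character screen-name limit and word-character hashtags; Twitter’s own extraction rules are more involved, and `parse_tweet` is a hypothetical helper.

```python
import re

# Simplified patterns (assumption): a mention or hashtag must not be glued
# to a preceding word character, so "bob@example.com" is not a mention.
MENTION = re.compile(r"(?<!\w)@(\w{1,15})")
HASHTAG = re.compile(r"(?<!\w)#(\w+)")


def parse_tweet(text):
    """Split a tweet into the Twitter features described above."""
    return {
        "is_reply": text.startswith("@"),   # @reply: tweet starts with @username
        "mentions": MENTION.findall(text),
        "hashtags": HASHTAG.findall(text),
        "fits_limit": len(text) <= 140,     # 140-character limit
    }
```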
Twitter exposes its web services as URLs following a REST-style architecture. Clients query these URLs using HTTP methods such as GET and POST, and REST calls generally return data in XML or JSON. The simplest way to use REST from PHP is XML + cURL.
http://api.twitter.com/1/users/show.xml?screen_name=dushyantarora13
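Building the request above and parsing its XML response can be sketched with the Python standard library instead of PHP + cURL. Two assumptions: the element names in the made-up SAMPLE response (for example statuses_count) are illustrative, and the v1 endpoint has since been retired by Twitter, so the sketch only builds the URL and parses XML rather than performing the request. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE_URL = "http://api.twitter.com/1/users/show.xml"

# Made-up response used only for illustration; element names are assumed.
SAMPLE = """<user>
  <id>12345</id>
  <screen_name>example_user</screen_name>
  <followers_count>42</followers_count>
  <friends_count>17</friends_count>
  <statuses_count>230</statuses_count>
  <location>Example City</location>
</user>"""


def users_show_url(screen_name):
    """Build the GET URL for one user's profile."""
    return BASE_URL + "?" + urlencode({"screen_name": screen_name})


def parse_user(xml_text):
    """Map a <user> document onto the Followers table columns."""
    root = ET.fromstring(xml_text)
    return {
        "uid": int(root.findtext("id")),
        "screen_name": root.findtext("screen_name"),
        "followers_count": int(root.findtext("followers_count")),
        "friends_count": int(root.findtext("friends_count")),
        "status_count": int(root.findtext("statuses_count")),
        "location": root.findtext("location", default=""),
    }
```

Against a live service one would fetch the document, for example with `urllib.request.urlopen`, and pass the text to `parse_user`; the current Twitter API additionally requires OAuth.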
Algorithm
Expertise – an information profile (list of keywords) derived from the Enron email exchange data. To distinguish a query from a normal tweet, we tag it with a hashtag: #SQR
BCS – Best Connected Search: A person with a high number of followers can be regarded as an expert, and a person with a high status-message count can be regarded as an active user. Problem: bots. We therefore need a balance of all three metrics. #algorithm #followers
STS – Strong Ties Search: Send the query to the node with the most replies to the current user. #algorithm
WTS – Weak Ties Search: The weak tie is a crucial bridge between two densely knit clumps of close friends. #algorithm #StrengthOfWeakTies
RWS – Random Walk Search
Zhang and Ackerman (2005), Mark Granovetter (1973)
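The four next-hop rules can be sketched as selectors over a list of follower records. Several parts are assumptions: the “three metrics” of BCS are taken to be followers_count, status_count and friends_count (the counts stored in the Followers table), combined with arbitrary weights that penalize bot-like accounts following very many users; replies_to_me is a hypothetical per-follower reply count; and WTS is approximated by the follower with the fewest replies, a crude stand-in for a bridging weak tie.

```python
import math
import random

# Hypothetical follower records: the counts mirror the Followers table,
# "replies_to_me" is an assumed number of replies this follower sent us.


def bcs_score(f, weights=(1.0, 0.5, 0.5)):
    """Best Connected Search score (assumed form): reward followers and
    activity, penalize following very many users, a common bot signal."""
    w_followers, w_status, w_friends = weights
    return (w_followers * math.log1p(f["followers_count"])
            + w_status * math.log1p(f["status_count"])
            - w_friends * math.log1p(f["friends_count"]))


def best_connected(followers, rng=None):
    return max(followers, key=bcs_score)


def strong_tie(followers, rng=None):
    # STS: the follower who has replied to us most often.
    return max(followers, key=lambda f: f["replies_to_me"])


def weak_tie(followers, rng=None):
    # WTS (crude proxy, assumption): the follower we interact with least.
    return min(followers, key=lambda f: f["replies_to_me"])


def random_walk(followers, rng=None):
    # RWS: any follower, uniformly at random.
    return (rng or random).choice(followers)


STRATEGIES = {"BCS": best_connected, "STS": strong_tie,
              "WTS": weak_tie, "RWS": random_walk}


def next_hop(strategy, followers, rng=None):
    """Pick the follower to forward a direct-message query to."""
    return STRATEGIES[strategy](followers, rng)
```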
Database
userdata
Followers: screen_name (string), uid (int), followers_count (int), friends_count (int), status_count (int), is_my_friend (bool), location (string)
Sent_Queries: qid (int), text (string), no_of_replies (int)
Received Queries: qid (int), sender_screen_name (string), text (string), is_query_processed (bool), dm_or_pt (bool)
Answered Queries: qid (int), sender_screen_name (string), text (string)
We use SQLite to store all the information. It is well suited to small databases: it is highly portable and can also be used to develop lightweight applications for mobiles. #database
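The schema listed on the Database slide can be created with Python’s built-in sqlite3 module. In this sketch the table names are lower-cased with underscores (an assumption), the bool columns are stored as 0/1 integers since SQLite has no separate boolean type, and `open_db` is a hypothetical helper.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS followers (
    screen_name TEXT, uid INTEGER PRIMARY KEY, followers_count INTEGER,
    friends_count INTEGER, status_count INTEGER,
    is_my_friend INTEGER,            -- bool stored as 0/1
    location TEXT);
CREATE TABLE IF NOT EXISTS sent_queries (
    qid INTEGER PRIMARY KEY, text TEXT, no_of_replies INTEGER DEFAULT 0);
CREATE TABLE IF NOT EXISTS received_queries (
    qid INTEGER PRIMARY KEY, sender_screen_name TEXT, text TEXT,
    is_query_processed INTEGER DEFAULT 0,   -- bool stored as 0/1
    dm_or_pt INTEGER);                      -- direct message or public tweet
CREATE TABLE IF NOT EXISTS answered_queries (
    qid INTEGER PRIMARY KEY, sender_screen_name TEXT, text TEXT);
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure all tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

Passing a file name, for example a hypothetical `open_db("userdata.db")`, keeps the data between runs in a single portable file.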
#HowThingsWork
Posting Queries: The user can post a query either as a direct message or as a tweet. Public tweets can be seen by all Twitter users, and anyone can reply to such queries. The routing algorithm is used when sending a direct message. #SQR is automatically appended to the query. #sent_queries #database
Replying to Queries: Scan the Twitter user space for the #SQR hashtag. We can parse public tweets posted by any user, but only those direct messages that were sent directly to us. #received_queries #database
Displaying Answers: We look up the unique query ID (qid) stored in the sent_queries table and extract all replies to that query using the API. However, we cannot retrieve the replies if the query was sent as a direct message. #sent_queries #database
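Posting and scanning can be sketched as plain string handling, with fetching tweets and direct messages from the API left out. The helper names and the message fields (text, is_dm, recipient) are hypothetical, and matching #SQR case-insensitively is an assumption.

```python
import re

SQR_TAG = "#SQR"
SQR_PATTERN = re.compile(r"(?<!\w)#SQR\b", re.IGNORECASE)


def compose_query(text):
    """Append #SQR to a query unless it is already tagged."""
    query = text if SQR_PATTERN.search(text) else f"{text} {SQR_TAG}"
    if len(query) > 140:
        raise ValueError("query exceeds 140 characters")
    return query


def scan_for_queries(messages, my_screen_name):
    """Keep #SQR messages we may process: any public tweet,
    but only direct messages addressed to us."""
    found = []
    for msg in messages:
        if not SQR_PATTERN.search(msg["text"]):
            continue
        if msg["is_dm"] and msg["recipient"] != my_screen_name:
            continue
        found.append(msg)
    return found
```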
Demo
Small Worlds
Path Length & Clustering Coefficient vs. p
There is a broad interval of p over which $L(p) \approx L_{\mathrm{random}}$ yet $C(p) \gg C_{\mathrm{random}}$, i.e. many small worlds with a high clustering coefficient. The short path lengths are attributed to the few long-range random contacts.
Model social networks (small world + high clustering): start with a regular lattice and randomly rewire some edges.
People socialize often with local contacts and sometimes with long-range contacts.
Watts & Strogatz (1998)
Starting from a ring lattice with n vertices and k edges per vertex, we rewire each edge to a random node with probability p; duplicate edges are forbidden.
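The Watts–Strogatz construction just described can be sketched as follows, together with the two quantities plotted against p: the clustering coefficient C and the average shortest-path length L. Assumptions: k is even, the graph is stored as a dict of neighbor sets, and L is averaged only over reachable pairs in case rewiring disconnects the graph. Function names are hypothetical.

```python
import random
from collections import deque
from itertools import combinations


def ring_lattice(n, k):
    """Ring of n vertices, each joined to its k nearest neighbors (k even)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for j in range(1, k // 2 + 1):
            u = (v + j) % n
            adj[v].add(u)
            adj[u].add(v)
    return adj


def rewire(adj, p, seed=None):
    """With probability p, move one end of each lattice edge to a random
    vertex; self-loops and duplicate edges are forbidden."""
    rng = random.Random(seed)
    n = len(adj)
    for v, u in [(v, u) for v in adj for u in adj[v] if v < u]:
        if rng.random() < p:
            choices = [w for w in range(n) if w != v and w not in adj[v]]
            if choices:
                w = rng.choice(choices)
                adj[v].discard(u)
                adj[u].discard(v)
                adj[v].add(w)
                adj[w].add(v)
    return adj


def clustering(adj):
    """Average local clustering coefficient C."""
    total = 0.0
    for v, nb in adj.items():
        d = len(nb)
        if d >= 2:
            links = sum(1 for a, b in combinations(nb, 2) if b in adj[a])
            total += links / (d * (d - 1) / 2)
    return total / len(adj)


def avg_path_length(adj):
    """Average shortest-path length L over reachable pairs (BFS from every node)."""
    total = pairs = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs
```

Sweeping p from 0 to 1 and plotting C(p)/C(0) and L(p)/L(0) on a logarithmic p axis should reproduce the qualitative picture described above.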
Dynamic Rewiring in Grid Based Networks (Our Work)
Thomas Schelling’s segregation model (1971): small preferences for same-color neighbors can lead to total segregation in society through the formation of ghettos.
Satisfaction condition: share of same-type neighbors in the Moore neighborhood > β (% similarity)
Wilensky,U. (1997). NetLogo Segregation model. http://ccl.northwestern.edu/netlogo/models/Segregation. Center for Connected Learning and Computer-Based Modeling, Northwestern University, Evanston, IL.
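One round of Schelling’s model with the Moore-neighborhood rule can be sketched as below. Assumptions: the grid wraps around, None marks an empty cell, an agent with no occupied neighbors counts as satisfied, and every unsatisfied agent jumps to a random empty cell. Function names are hypothetical.

```python
import random


def moore_neighbors(grid, r, c):
    """Occupants of the 8 surrounding cells; the grid wraps around (assumption)."""
    rows, cols = len(grid), len(grid[0])
    return [grid[(r + dr) % rows][(c + dc) % cols]
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)]


def is_satisfied(grid, r, c, beta):
    """True if the share of same-type occupied neighbors exceeds beta."""
    me = grid[r][c]
    occupied = [x for x in moore_neighbors(grid, r, c) if x is not None]
    if not occupied:
        return True  # assumption: an isolated agent has nothing to object to
    same = sum(1 for x in occupied if x == me)
    return same / len(occupied) > beta


def schelling_step(grid, beta, rng=random):
    """Move every unsatisfied agent to a random empty cell; return how many moved."""
    rows, cols = len(grid), len(grid[0])
    empty = [(r, c) for r in range(rows) for c in range(cols) if grid[r][c] is None]
    movers = [(r, c) for r in range(rows) for c in range(cols)
              if grid[r][c] is not None and not is_satisfied(grid, r, c, beta)]
    rng.shuffle(movers)
    moved = 0
    for r, c in movers:
        if not empty:
            break
        i = rng.randrange(len(empty))
        er, ec = empty[i]
        grid[er][ec], grid[r][c] = grid[r][c], None
        empty[i] = (r, c)  # the vacated cell becomes available
        moved += 1
    return moved
```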
Schelling with Link Retention (SLR)
$p_{io} = (1 - \beta\,|C_i - C_o|)\, f(t)$, where $f(t) = \dfrac{1 - e^{-t}}{1 + e^{-t}}$
t is the time a node has stayed in a particular neighborhood, and f(t) is an increasing function with values in [0, 1).
$p_{io} = f(t)$ when $C_i = C_o$
$p_{io} = (1 - \beta)\, f(t)$ when $C_i \neq C_o$
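The SLR formula can be evaluated directly. This sketch assumes the two colors are encoded as the integers 0 and 1, so that |C_i − C_o| is 0 for a same-color pair and 1 otherwise; the function names are hypothetical.

```python
import math


def f(t):
    """Tenure factor: (1 - e^-t) / (1 + e^-t), equal to tanh(t / 2).
    f(0) = 0 and f increases towards (but never reaches) 1."""
    return (1 - math.exp(-t)) / (1 + math.exp(-t))


def p_io(beta, c_i, c_o, t):
    """SLR formula p_io = (1 - beta * |C_i - C_o|) * f(t).

    Colors are encoded as 0/1 integers (an assumption), so |C_i - C_o|
    is 0 for a same-color pair and 1 otherwise."""
    return (1 - beta * abs(c_i - c_o)) * f(t)
```

For β = 0.7 and t = 2, f(2) = tanh(1) ≈ 0.762, so p_io ≈ 0.762 for same-colored nodes and 0.3 × 0.762 ≈ 0.228 for differently colored ones.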
Path Length & Clustering Coefficient
We observe a minimum at β = 0.7.
Degree Distribution & Routing Time
We observe a minimum at β = 0.7.
Log-normal distribution
Rewiring for Homophily (RfH)
Physical relocation has a high cost in terms of the social links broken and the effort required to create new social links.
Nodes do not ‘move’ at all but are allowed to form some long-distance links via rewiring to satisfy their homophily requirements.
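One way to picture RfH is a single rewiring move in which nodes keep their positions and only links change. The concrete rule below is a hypothetical illustration, not necessarily the model used here: an unsatisfied node drops one link to an unlike neighbor and links to a random like node anywhere in the network. The graph is assumed to be a dict of neighbor sets, and the function names are hypothetical.

```python
import random


def same_fraction(adj, kind, v):
    """Share of v's neighbors that have the same type as v."""
    if not adj[v]:
        return 1.0  # assumption: an isolated node counts as satisfied
    return sum(kind[u] == kind[v] for u in adj[v]) / len(adj[v])


def rfh_rewire(adj, kind, v, beta, rng=random):
    """If v is unsatisfied, drop one unlike neighbor and link to a random
    like node anywhere in the network (hypothetical rewiring rule).
    Nodes never change position; only links change."""
    if same_fraction(adj, kind, v) > beta:
        return False
    unlike = [u for u in adj[v] if kind[u] != kind[v]]
    like = [u for u in adj if u != v and u not in adj[v] and kind[u] == kind[v]]
    if not unlike or not like:
        return False
    old, new = rng.choice(unlike), rng.choice(like)
    adj[v].discard(old)
    adj[old].discard(v)
    adj[v].add(new)
    adj[new].add(v)
    return True
```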
Rewiring for Homophily with r-dimensional identity (RfHr)
Each node has an r-dimensional identity assigned at random. As in the RfH model, nodes do not “move” but rewire to satisfy their homophily requirements.
We use the Hamming distance H to measure the social distance between nodes.
Each unsatisfied node rewires such that 1 - H(new_neighbor, current_node)/r > β.
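The Hamming-distance rule can be sketched with r-bit identity tuples (binary entries, in line with the “r-bit identity vector” mentioned under Future Work). The candidate-selection step, picking uniformly among all non-neighbors that pass the threshold, is an assumption, as are the function names.

```python
import random


def random_identity(r, rng=random):
    """An r-bit identity vector assigned at random."""
    return tuple(rng.randint(0, 1) for _ in range(r))


def hamming(a, b):
    """Number of positions at which two identities differ."""
    return sum(x != y for x, y in zip(a, b))


def similarity(a, b):
    """1 - H(a, b) / r, the quantity compared with beta."""
    return 1 - hamming(a, b) / len(a)


def pick_new_neighbor(ids, adj, v, beta, rng=random):
    """Random non-neighbor u of v with 1 - H(u, v)/r > beta, or None."""
    candidates = [u for u in ids
                  if u != v and u not in adj[v]
                  and similarity(ids[u], ids[v]) > beta]
    return rng.choice(candidates) if candidates else None
```

For example, with r = 4 the identities (0, 1, 1, 0) and (1, 1, 0, 0) differ in 2 positions, giving a similarity of 1 − 2/4 = 0.5, which fails a threshold of β = 0.7.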
Future Work
Find better algorithms for SLR
OAuth implementation
SLR + r-bit identity vector
We presently use only keywords as queries
THANK YOU
Q & A