

SOCIAL QUERY ROUTING IN SOCIAL NETWORKS

14th June, 2010

FINAL SEMINAR

Dushyant Arora, B.I.T.S. Pilani

Praphul Chandra, Senior Research Scientist, HP Labs India

Agenda

Motivation

Exploiting structural properties for routing

    Simulation – Power-law networks

    Implementation – Twitter

Exploiting non-structural properties for routing

    Simulation – 3 new network evolution models

Motivation

Image courtesy of Aardvark http://vark.com/

Structural properties of networks

Small World Phenomenon

Any two people are linked by short chains of acquaintances. Random graphs show the small-world effect but no clustering.

Power Law degree distribution

"Rich get richer" model: growth and preferential attachment

SoA: Social Search using Power Law

Efficient search algorithm that exploits network structure, i.e. the power-law degree distribution.

Intuition: well-connected nodes provide access to a greater portion of the network.

The algorithm uses no information about the global position of the target. Each node maintains local information about its neighbors and its neighbors' neighbors.

Search time scales sub-linearly with the size of the graph. Works best for graphs with exponent close to 2.

Figure: in 4 steps, search on a power-law graph finds 31 nodes; search on a Poisson graph finds 14 nodes.

Algorithm: forward the message to the most highly connected neighbor that has not yet been visited.

Adamic, Lukose, Puniyani, Huberman (2001)
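The forwarding rule above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from the cited paper or from this work: the adjacency-dict representation, the function name `high_degree_search`, the step budget, and the choice to backtrack when every neighbor has been visited are all assumptions made for the example.

```python
def high_degree_search(adj, source, target, max_steps=100):
    """Count hops until `target` is within two links of the walker.

    adj maps each node to the set of its neighbours.
    Returns None if the walk gets stuck or the step budget runs out.
    """
    visited = {source}
    trail = [source]                 # lets the walker back out of dead ends
    steps = 0
    while steps <= max_steps:
        current = trail[-1]
        # Local knowledge only: own neighbours and neighbours' neighbours.
        known = set(adj[current])
        for nb in adj[current]:
            known |= adj[nb]
        if current == target or target in known:
            return steps
        unvisited = [nb for nb in adj[current] if nb not in visited]
        if unvisited:
            # Most highly connected unvisited neighbour; sorting makes ties deterministic.
            nxt = max(sorted(unvisited), key=lambda nb: len(adj[nb]))
            visited.add(nxt)
            trail.append(nxt)
        elif len(trail) > 1:
            trail.pop()              # assumption: backtrack when all neighbours are visited
        else:
            return None              # nothing left to explore from the source
        steps += 1
    return None


# Small hand-made graph: node 1 is a hub, node 7 sits at the end of a chain.
EXAMPLE = {
    0: {1, 2}, 1: {0, 3, 4, 5}, 2: {0}, 3: {1, 6},
    4: {1}, 5: {1}, 6: {3, 7}, 7: {6},
}
hops = high_degree_search(EXAMPLE, 0, 7)
```

On the example graph the walker goes from node 0 to the hub (node 1), then to node 3, at which point node 7 is a neighbor's neighbor and the search stops.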

Revised Algorithm (Our Work)

Disconnected graphs?

What if the exponent is not ~ 2?

T node rewiring

ST and T node rewiring

Application: P2P networks

Twitter

Short updates, called "tweets", of 140 characters or fewer are sent to followers and are searchable on Twitter search.

Friends and Followers

@username: replies and mentions. An @reply is a public message, sent regardless of follow-ship, that anyone can view.

Direct messages: private messages sent only to a person who follows you.

Hashtags: an easy way to group tweets by adding them to a category.

Twitter exposes its web services as URLs using a REST-style architecture. Users can query these URLs through HTTP methods like GET and POST. REST calls generally return data in XML or JSON. The simplest way to make REST calls from PHP is cURL with XML responses.

http://api.twitter.com/1/users/show.xml?screen_name=dushyantarora13
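As a rough illustration of such a call, the sketch below builds the request URL shown above and extracts a few counters from an XML reply. It is written in Python rather than PHP. The version 1 endpoint has since been retired, so the example parses a hand-written reply instead of touching the network. The helper names and the XML field names in `SAMPLE_REPLY` are assumptions for illustration, not captured API output.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

API_BASE = "http://api.twitter.com/1"    # version 1 base URL quoted on the slide


def user_show_url(screen_name):
    """Build the users/show request URL for one screen name."""
    query = urlencode({"screen_name": screen_name})
    return f"{API_BASE}/users/show.xml?{query}"


def parse_user(xml_text):
    """Extract the counters relevant to routing from an XML reply."""
    root = ET.fromstring(xml_text)
    return {
        "screen_name": root.findtext("screen_name"),
        "followers_count": int(root.findtext("followers_count")),
        "friends_count": int(root.findtext("friends_count")),
        "statuses_count": int(root.findtext("statuses_count")),
    }


# Hypothetical reply, written by hand for this example (field names assumed).
SAMPLE_REPLY = """
<user>
  <screen_name>dushyantarora13</screen_name>
  <followers_count>42</followers_count>
  <friends_count>17</friends_count>
  <statuses_count>256</statuses_count>
</user>
"""

url = user_show_url("dushyantarora13")
profile = parse_user(SAMPLE_REPLY)
```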

Algorithm

Enron email exchange data – information profile (list of keywords) – expertise

To distinguish a query from a normal tweet we use hashtags. #SQR

BCS – Best Connected Search: A person with a high number of followers can be regarded as an expert. Also, a person with a high status message count can be regarded as an active user. Problem: bots. Thus we need a balance of all three metrics. #algorithm #followers

STS – Strong Ties Search: Send to the node with the most replies to the current user. #algorithm

WTS – Weak Ties Search: The weak tie is a crucial bridge between the two densely knit clumps of close friends. #algorithm #StrengthOfWeakTies

RWS – Random Walk Search

Zhang and Ackerman (2005), Mark Granovetter (1973)
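A minimal sketch of how the four strategies might pick the next recipient is given below. Several simplifications are assumptions of this example and are not taken from the slides: BCS ranks by follower count alone instead of balancing several metrics, WTS treats the neighbor with the fewest replies as the weakest tie, and the record layout (`name`, `followers_count`, `replies`) is hypothetical.

```python
import random


def pick_next_hop(strategy, neighbours, rng=random):
    """Choose the neighbour to forward a query to.

    neighbours: list of dicts with keys 'name', 'followers_count' and
    'replies' (replies exchanged with the current user).
    """
    if not neighbours:
        return None
    if strategy == "BCS":
        # Simplification: rank by follower count only.
        chosen = max(neighbours, key=lambda n: n["followers_count"])
    elif strategy == "STS":
        # Strongest tie: most replies to the current user.
        chosen = max(neighbours, key=lambda n: n["replies"])
    elif strategy == "WTS":
        # Assumption: the weakest tie is the neighbour with the fewest replies.
        chosen = min(neighbours, key=lambda n: n["replies"])
    elif strategy == "RWS":
        chosen = rng.choice(neighbours)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return chosen["name"]


# Hypothetical neighbourhood of the current user.
NEIGHBOURS = [
    {"name": "alice", "followers_count": 10, "replies": 5},
    {"name": "bob", "followers_count": 900, "replies": 0},
    {"name": "carol", "followers_count": 50, "replies": 2},
]
best_connected = pick_next_hop("BCS", NEIGHBOURS)
strongest_tie = pick_next_hop("STS", NEIGHBOURS)
```

With this data BCS selects "bob" (most followers) and STS selects "alice" (most replies).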

Database

userdata

Followers
    screen_name - string
    uid - int
    followers_count - int
    friends_count - int
    status_count - int
    is_my_friend - bool
    location - string

Sent_Queries
    qid - int
    text - string
    no_of_replies - int

Received Queries
    qid - int
    sender_screen_name - string
    text - string
    is_query_processed - bool
    dm_or_pt - bool

Answered Queries
    qid - int
    sender_screen_name - string
    text - string

We use SQLite for storing all the information. It is an ideal choice for small databases because it is highly portable and can also be used for developing lightweight mobile applications. #database
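The four tables listed above could be created along the following lines with Python's built-in `sqlite3` module. This is a sketch under assumptions: the primary keys, the lowercase table names, and the use of INTEGER columns holding 0 or 1 for the bool fields are choices made for this example, not details given in the slides.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS followers (
    uid             INTEGER PRIMARY KEY,   -- key choice is an assumption
    screen_name     TEXT,
    followers_count INTEGER,
    friends_count   INTEGER,
    status_count    INTEGER,
    is_my_friend    INTEGER,               -- bool stored as 0/1
    location        TEXT
);
CREATE TABLE IF NOT EXISTS sent_queries (
    qid           INTEGER PRIMARY KEY,
    text          TEXT,
    no_of_replies INTEGER
);
CREATE TABLE IF NOT EXISTS received_queries (
    qid                INTEGER PRIMARY KEY,
    sender_screen_name TEXT,
    text               TEXT,
    is_query_processed INTEGER,            -- bool stored as 0/1
    dm_or_pt           INTEGER             -- direct message or public tweet
);
CREATE TABLE IF NOT EXISTS answered_queries (
    qid                INTEGER PRIMARY KEY,
    sender_screen_name TEXT,
    text               TEXT
);
"""


def create_database(path=":memory:"):
    """Open (or create) the database file and make sure all tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = create_database()
conn.execute(
    "INSERT INTO sent_queries (qid, text, no_of_replies) VALUES (?, ?, ?)",
    (1, "graph theory #SQR", 0),
)
```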

#HowThingsWork

Posting Queries: The user has the option to post a query as a direct message or as a tweet. Public tweets can be seen by all users of Twitter, and anyone can reply to such queries. We use the algorithm when sending a direct message. #SQR is appended to the query automatically. #sent_queries #database

Replying to Queries: Scan the Twitter user space for the #SQR hashtag. We can parse public tweets posted by any user, but only those direct messages that were sent directly to us. #received_queries #database

Displaying Answers: We look up the unique query ID (qid) stored in the sent_queries table and extract all the replies for a given query using the API. However, we cannot find the replies if the query was sent as a direct message. #sent_queries #database
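The tagging and scanning steps described above might look like the sketch below. The helper names, the case-insensitive match, and the decision to reject a query that no longer fits in 140 characters once tagged are assumptions of this example.

```python
import re

SQR_TAG = "#SQR"
MAX_TWEET_LEN = 140

# Match #SQR as a whole hashtag, so that e.g. "#SQRT" is not mistaken for it.
_TAG_RE = re.compile(r"(?<!\w)#SQR\b", re.IGNORECASE)


def is_query(text):
    """True if the tweet or direct message carries the #SQR hashtag."""
    return _TAG_RE.search(text) is not None


def tag_query(text):
    """Append #SQR to an outgoing query unless it is already present."""
    text = text.strip()
    if not is_query(text):
        text = f"{text} {SQR_TAG}"
    if len(text) > MAX_TWEET_LEN:
        raise ValueError("query does not fit in one tweet once tagged")
    return text


outgoing = tag_query("who knows graph theory?")
incoming = [t for t in ["lunch anyone?", "need a PHP expert #SQR"] if is_query(t)]
```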

Demo

Small Worlds

Figure: Path Length & Clustering Coefficient vs. p

There is a broad interval of p over which $L(p) \sim L_{random}$ yet $C(p) \gg C_{random}$, i.e. many small-world networks with a high clustering coefficient (the small-world effect is attributed to the few long-range random contacts).

Model social networks (small world + high clustering): start with a regular lattice and randomly rewire some edges.

People socialize often with local contacts and sometimes with long-range contacts.

Watts & Strogatz (1998):

Starting from a ring lattice with n vertices and k edges per vertex, we rewire each edge to a random node with probability p, with duplicate edges forbidden.
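The rewiring procedure quoted above can be sketched in plain Python as follows. The adjacency-set representation and function names are assumptions of this example, and self-loops are excluded in addition to duplicate edges.

```python
import random


def ring_lattice(n, k):
    """Ring of n vertices, each joined to its k nearest neighbours (k even, k < n)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k // 2 + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj


def rewire(adj, k, p, rng):
    """With probability p, move the far end of each lattice edge to a random node."""
    n = len(adj)
    for step in range(1, k // 2 + 1):
        for v in range(n):
            w = (v + step) % n
            if w in adj[v] and rng.random() < p:
                # Forbid self-loops and duplicate edges.
                candidates = [u for u in range(n) if u != v and u not in adj[v]]
                if candidates:
                    new = rng.choice(candidates)
                    adj[v].discard(w)
                    adj[w].discard(v)
                    adj[v].add(new)
                    adj[new].add(v)
    return adj


def edge_count(adj):
    """Each undirected edge appears in two adjacency sets."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2


graph = rewire(ring_lattice(20, 4), k=4, p=0.1, rng=random.Random(1))
```

Each rewiring removes one edge and adds one, so the number of edges stays at n·k/2 for any p. With p = 0 the regular lattice is returned unchanged.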

Dynamic Rewiring in Grid Based Networks (Our Work)

Thomas Schelling's segregation model (1971): small preferences for same-color neighbors could lead to total segregation in society through the formation of ghettos.

A node is satisfied when the share of same-type neighbors in its Moore neighborhood > β (% similarity).

Wilensky, U. (1997). NetLogo Segregation model. http://ccl.northwestern.edu/netlogo/models/Segregation. Center for Connected Learning and Computer-Based Modeling, Northwestern University, Evanston, IL.
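A minimal sketch of the Moore-neighborhood similarity test follows. It assumes that β is a fraction between 0 and 1, that the share is taken over occupied neighboring cells only, that the grid does not wrap around, and that an agent with no neighbors counts as satisfied. None of these details are stated in the slides.

```python
def moore_neighbours(grid, row, col):
    """Colours of the occupied cells among the 8 cells surrounding (row, col)."""
    rows, cols = len(grid), len(grid[0])
    found = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue                      # skip the cell itself
            r, c = row + dr, col + dc
            if 0 <= r < rows and 0 <= c < cols and grid[r][c] is not None:
                found.append(grid[r][c])
    return found


def is_satisfied(grid, row, col, beta):
    """True if the share of same-colour neighbours exceeds beta."""
    neighbours = moore_neighbours(grid, row, col)
    if not neighbours:
        return True                           # assumption: isolated agents are content
    same = sum(1 for colour in neighbours if colour == grid[row][col])
    return same / len(neighbours) > beta


# None marks an empty cell.
GRID = [
    ["R", "R", "G"],
    ["G", "R", None],
    ["G", "G", "R"],
]
centre_ok = is_satisfied(GRID, 1, 1, beta=0.3)
```

The center agent has 7 occupied neighbors, 3 of them of its own color, so it is satisfied for β = 0.3 but not for β = 0.5.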


Schelling with Link Retention (SLR)

$p_{io} = (1 - \beta\,|C_i - C_o|)\, f(t)$, where $f(t) = \dfrac{1 - e^{-t}}{1 + e^{-t}}$

t is the time a node has stayed in a particular neighborhood, and f(t) is an increasing function with values in [0, 1).

$p_{io} = f(t)$ when $C_i = C_o$

$p_{io} = (1 - \beta)\, f(t)$ when $C_i \neq C_o$
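The SLR formula can be evaluated directly, as in the sketch below. Encoding the two node colors C_i and C_o as 0 and 1 is an assumption of this example, chosen so that |C_i - C_o| is 0 for like-colored nodes and 1 otherwise, which reproduces the two special cases above.

```python
import math


def f(t):
    """Time factor (1 - e^-t) / (1 + e^-t): 0 at t = 0, approaching 1 as t grows."""
    return (1 - math.exp(-t)) / (1 + math.exp(-t))


def p_io(colour_i, colour_o, t, beta):
    """p_io = (1 - beta * |C_i - C_o|) * f(t), with colours encoded as 0 or 1."""
    return (1 - beta * abs(colour_i - colour_o)) * f(t)


same_colour = p_io(1, 1, t=3.0, beta=0.7)     # equals f(3)
mixed_colour = p_io(0, 1, t=3.0, beta=0.7)    # equals (1 - 0.7) * f(3)
```

Note that f(t) is algebraically the same as tanh(t/2), which makes its monotonic rise from 0 toward 1 easy to see.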

Path Length & Clustering Coefficient

We observe a minimum at β = 0.7.

Degree Distribution & Routing Time

We observe a minimum at β = 0.7.

Log-normal distribution

Rewiring for Homophily (RfH)

Physical relocation has a high cost in terms of the social links broken and the effort required to create new social links.

Nodes do not 'move' at all but are allowed to form some long-distance links to satisfy their homophily requirements via rewiring.

Rewiring for Homophily with r-dimensional identity (RfHr)

Each node has an r-dimensional identity assigned at random. As in the RfH model, nodes do not "move" but rewire to satisfy their homophily requirements.

Use Hamming distance to measure social distance between nodes.

Each unsatisfied node rewires such that 1 - H(new_neighbor, current_node)/r > β.
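The rewiring condition can be checked with a few lines of Python. Representing each identity as a tuple of r bits and the function names used here are assumptions of this sketch.

```python
def hamming(a, b):
    """Number of positions at which two equal-length identity vectors differ."""
    if len(a) != len(b):
        raise ValueError("identity vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def accepts(current_node, new_neighbour, beta):
    """True if 1 - H(new_neighbour, current_node) / r > beta."""
    r = len(current_node)
    return 1 - hamming(new_neighbour, current_node) / r > beta


me = (1, 0, 1, 1)
candidate = (1, 1, 1, 0)          # differs from `me` in 2 of r = 4 positions
ok = accepts(me, candidate, beta=0.4)
```

Here the similarity is 1 - 2/4 = 0.5, so the candidate is acceptable for β = 0.4 but would be rejected for β = 0.5.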

Future Work

Find better algorithms for SLR

OAuth implementation

SLR + r-bit identity vector

We use only keywords as queries presently

THANK YOU

Q & A