Peeking into massive Online Social Networks (aka “Walking on Facebook”)

Peeking into massive Online Social Networks

(aka “Walking on Facebook”)

Maciej Kurant

Miniprojects

Miniprojects

Essentially, measure

LastFM

www.last.fm/user/rj

LastFM API

www.last.fm/api

http://ws.audioscrobbler.com/2.0/?method=user.getfriends&user=rj&limit=10&page=1&api_key=1b4218629b50c1159e15a6b8285b90ba

LastFM API

import re
import urllib.request

api_key = '1b4218629b50c1159e15a6b8285b90ba'
user = "rj"
command = ("http://ws.audioscrobbler.com/2.0/?method=user.getfriends&user=" + user
           + "&limit=10&page=1&api_key=" + api_key)
data = urllib.request.urlopen(command).read().decode("utf-8")  # XML format
degree = int(re.search(r'total="(\d+)"', data).group(1))      # "total" attribute of the reply
friends = re.findall(r"<name>(.*?)</name>", data)              # one <name> element per friend

print(degree)   # number of friends of "rj"
print(friends)  # first 10 friends (because page=1 and limit=10)


LastFM API

For BFS, you need all friends: set “limit=500” and pull multiple pages if necessary.

For random walks, you need only the degree and one neighbor: set “limit=1” and 1) learn the degree (the “total” value in the reply), 2) select the index i of the neighbor uniformly at random, 3) get the name by setting “page=i”.

In Python
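The two access patterns for the friends list (all pages for BFS, one random page for random walks) might look roughly like the sketch below. The name fetch_page is hypothetical: it stands for any function that performs one user.getfriends call and returns the pair (total, names). The in-memory make_fake_fetcher is only there so the sketch runs without the network, and pages are assumed to be numbered from 1.

```python
import math
import random


def make_fake_fetcher(friends_by_user):
    """Hypothetical stand-in for one user.getfriends call (no network)."""
    def fetch_page(user, limit, page):
        friends = friends_by_user.get(user, [])
        start = (page - 1) * limit          # assumption: pages start at 1
        return len(friends), friends[start:start + limit]
    return fetch_page


def all_friends(user, fetch_page, limit=500):
    """BFS case: pull page after page until every friend is collected."""
    total, names = fetch_page(user, limit, 1)
    names = list(names)
    for page in range(2, math.ceil(total / limit) + 1):
        _, more = fetch_page(user, limit, page)
        names.extend(more)
    return names


def degree_and_random_neighbor(user, fetch_page, rng=random):
    """Random-walk case: two calls with limit=1 give degree and one neighbor."""
    total, _ = fetch_page(user, 1, 1)       # 1) learn the degree
    if total == 0:
        return 0, None                      # e.g. a banned user
    i = rng.randint(1, total)               # 2) pick the index i uniformly
    _, names = fetch_page(user, 1, i)       # 3) page=i holds the i-th friend
    return total, names[0]


fetch = make_fake_fetcher({"rj": ["a", "b", "c", "d", "e", "f", "g"]})
print(all_friends("rj", fetch, limit=3))
print(degree_and_random_neighbor("rj", fetch, random.Random(0)))
```

With limit=1 the random-walk step costs two requests per node no matter how many friends the user has, which is the reason for preferring it over downloading the full list.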

Surprises

• Banned user (once reached, seems to have 0 friends)

• Server not responding

• Friendship graph not connected (solution: consider only the component connected to user 'rj'.)

• Case sensitivity? (rj == RJ ??)

• …

Your program has to deal with them!
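Two of these surprises can be handled by small helpers, sketched below under stated assumptions. The names with_retries and canonical are hypothetical. The retry helper assumes that a non-responding server shows up as an OSError (urllib.error.URLError and socket timeouts are subclasses of it). The canonical helper assumes that user names turn out to be case-insensitive, which you should verify before relying on it.

```python
import time


def with_retries(call, attempts=3, delay=1.0, sleep=time.sleep):
    """Run call(); if the server does not respond, wait and try again."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except OSError:                 # covers URLError and timeouts
            if attempt == attempts:
                raise                   # give up after the last attempt
            sleep(delay * attempt)      # wait a little longer each time


def canonical(user):
    """Assumption: 'rj' and 'RJ' are the same user, so compare lowercased."""
    return user.strip().lower()


# Usage with a fake request that fails twice before answering.
state = {"calls": 0}

def flaky_request():
    state["calls"] += 1
    if state["calls"] < 3:
        raise OSError("server not responding")
    return ["friend1", "friend2"]

print(with_retries(flaky_request, delay=0))
print(canonical("RJ") == canonical("rj"))
```

A banned user that appears to have 0 friends needs separate care in a random walk: the walk cannot leave such a node, so one option is to step back to the previous node and draw again.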

[Figure: example graph with nodes labeled A to N]

Data: LastFM, the component connected to user 'rj'

1) Random node. Use MHRW (Metropolis-Hastings Random Walk) of length L=50 to select a node uniformly at random from LastFM. Repeat it 100 times. Report the average degree of the selected nodes and of their neighbors. What changes if L counts only unique nodes in MHRW? Why? What happens if you use RW (simple random walk) instead of MHRW?
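A minimal sketch of one MHRW run, assuming the standard Metropolis-Hastings rule for a uniform target: propose a neighbor uniformly at random and accept the move with probability min(1, deg(current)/deg(proposed)), otherwise stay. The toy graph is a made-up adjacency dictionary. On LastFM each proposal would cost an extra query to learn the degree of the proposed node.

```python
import random


def mhrw(graph, start, length, rng):
    """Return the node reached after `length` Metropolis-Hastings steps."""
    node = start
    for _ in range(length):
        candidate = rng.choice(graph[node])
        # Accept with probability min(1, deg(node) / deg(candidate)).
        if rng.random() < len(graph[node]) / len(graph[candidate]):
            node = candidate
        # Otherwise the walk stays put; this step still counts towards L.
    return node


# Toy graph (hypothetical): a hub with four leaves.
toy = {
    "hub": ["a", "b", "c", "d"],
    "a": ["hub"], "b": ["hub"], "c": ["hub"], "d": ["hub"],
}

rng = random.Random(42)
picks = [mhrw(toy, "hub", 50, rng) for _ in range(100)]
degrees = [len(toy[v]) for v in picks]
print(sum(degrees) / len(degrees))   # average degree of the selected nodes
```

In this sketch every step counts towards L, including the rejected ones where the walk does not move.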

2) RW vs RWRW. Run RW in LastFM. What are the average <playcount>, <playlists>, <age>, <id>, and number of friends observed in the sample? How do they change after correcting for the degree bias (RWRW, the re-weighted random walk)?
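A minimal sketch of the degree-bias correction, assuming the usual re-weighting: a simple random walk visits a node with probability proportional to its degree, so each sampled value is weighted by 1/degree before averaging. The toy graph is made up, and the node degree itself is used as the measured attribute.

```python
import random


def random_walk(graph, start, length, rng):
    """Simple RW: return the list of nodes visited after each step."""
    node, visited = start, []
    for _ in range(length):
        node = rng.choice(graph[node])
        visited.append(node)
    return visited


def plain_mean(values):
    """Uncorrected sample average (biased towards high-degree nodes)."""
    return sum(values) / len(values)


def reweighted_mean(values, degrees):
    """RWRW average: weight every sample by 1/degree."""
    weights = [1 / d for d in degrees]
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


# Toy graph (hypothetical): a hub with four leaves, true average degree 1.6.
toy = {
    "hub": ["a", "b", "c", "d"],
    "a": ["hub"], "b": ["hub"], "c": ["hub"], "d": ["hub"],
}

walk = random_walk(toy, "hub", 1000, random.Random(0))
degrees = [len(toy[v]) for v in walk]
print(plain_mean(degrees))                 # raw RW average
print(reweighted_mean(degrees, degrees))   # after correcting the degree bias
```

For any other attribute, such as <age>, pass the sampled values as the first argument and the degrees of the same nodes as the second.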

3) Component size. Based on RW, estimate the size of the component connected to user 'rj'. Use two approaches: [Katzir’11] and [Kurant’13?].
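A minimal sketch of a collision-counting size estimate, assuming the form n ≈ (Σ d_i)(Σ 1/d_i) / (2C), where d_i are the degrees of the sampled nodes and C is the number of sample pairs that hit the same node. This is how the estimator of [Katzir’11] is commonly stated; check it against the paper before relying on it. It also assumes the samples are roughly independent draws from the RW stationary distribution, so samples taken from a single walk should be spaced far apart.

```python
import random
from collections import Counter


def estimate_size(nodes, degrees):
    """Collision-based size estimate from RW samples and their degrees."""
    counts = Counter(nodes)
    # A node seen c times contributes c*(c-1)/2 colliding pairs.
    collisions = sum(c * (c - 1) // 2 for c in counts.values())
    if collisions == 0:
        raise ValueError("no repeated nodes yet; collect more samples")
    sum_deg = sum(degrees)
    sum_inv_deg = sum(1 / d for d in degrees)
    return sum_deg * sum_inv_deg / (2 * collisions)


# Toy check (hypothetical): a ring of 200 nodes, every degree equal to 2,
# so degree-proportional sampling is the same as uniform sampling.
rng = random.Random(7)
sample = rng.choices(range(200), k=400)
print(estimate_size(sample, [2] * len(sample)))
```

The estimate is only usable once repeated nodes start to appear, which needs a number of samples on the order of the square root of the component size.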

4) BFS. Collect a BFS sample starting from user 'rj' in LastFM. What node degrees, <playcount>, <playlists>, <age>, and <id> do you sample as you collect more nodes? How about implementing it on multiple threads?
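A minimal single-threaded sketch of the BFS sample. The name neighbors is hypothetical: it stands for any function that returns the full friends list of a user, for example a wrapper around paged user.getfriends calls. Here it is backed by a made-up dictionary so the sketch runs offline.

```python
from collections import deque


def bfs_sample(start, neighbors, max_nodes):
    """Return up to max_nodes users in the order BFS visits them."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue and len(order) < max_nodes:
        node = queue.popleft()
        order.append(node)
        for friend in neighbors(node):
            if friend not in seen:      # enqueue every user only once
                seen.add(friend)
                queue.append(friend)
    return order


# Toy friendship graph (hypothetical).
toy = {
    "rj": ["a", "b"],
    "a": ["rj", "c"],
    "b": ["rj"],
    "c": ["a"],
}

sample = bfs_sample("rj", toy.__getitem__, max_nodes=3)
print(sample)
print([len(toy[v]) for v in sample])    # degrees in sampling order
```

Keeping the visiting order makes it easy to plot how the average degree of the sample changes as more nodes are collected.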

5) Barbarian sampling. Try to download the entire component connected to user ‘rj’. You will probably need to use a cluster of machines, multiple threads, etc. Use your own API key, please. Once you have it, report basic properties: size, average degree, degree distribution, etc. (e.g., average <age>?). Compare with others.
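A minimal sketch of a multi-threaded crawl of one component, using a thread pool from the Python standard library. It downloads the graph level by level, with all users of a level fetched in parallel. The name neighbors is hypothetical (any function returning the full friends list of a user), and the toy graph is made up. Rate limits, retries, and spreading the work over several machines are left out.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def crawl_component(start, neighbors, workers=8):
    """Download the component of `start`; return {user: friends list}."""
    adjacency = {}
    frontier = [start]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            # Fetch the friends of every user in this level in parallel.
            results = list(pool.map(neighbors, frontier))
            for node, friends in zip(frontier, results):
                adjacency[node] = list(friends)
            # The next level: friends that have not been downloaded yet.
            frontier = sorted({f for friends in results for f in friends
                               if f not in adjacency})
    return adjacency


def basic_properties(adjacency):
    """Size, average degree and degree distribution of the crawled graph."""
    degrees = [len(friends) for friends in adjacency.values()]
    return {
        "size": len(degrees),
        "average_degree": sum(degrees) / len(degrees),
        "degree_distribution": dict(Counter(degrees)),
    }


# Toy graph (hypothetical) with a second component that 'rj' cannot reach.
toy = {
    "rj": ["a", "b"], "a": ["rj", "b"], "b": ["rj", "a", "c"], "c": ["b"],
    "x": ["y"], "y": ["x"],
}

component = crawl_component("rj", toy.__getitem__, workers=4)
print(basic_properties(component))
```

Threads fit here because the work is dominated by waiting for the server, not by computation.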
