Final Projects CIS 6930.007/SYA 6933.904 Spring 2011.

What you may take away

• Interdisciplinary collaboration experience
• Teamwork experience
• Learning about the other field
• The seed work of a publishable result
  – In CS only: a plethora of good-quality peer-reviewed conferences on social network topics
  – More significant results can be obtained by asking the right SNA questions.

• Fun!

This is an experiment for all of us!

• Please communicate well with your team and with us to make sure things go well:
  – Progress
  – A useful and pleasant experience

Project selection

• Significant components of both SNA and parallel/distributed computing
  – A program that runs for an hour on your laptop perhaps does not deserve the effort of being written in parallel
    • Unless it is an online service with “real-time” requirements
  – Can you contribute SNA knowledge?

• Dataset available or possible to collect
  – CS students: check APIs (and Terms of Use?)

One way to reason about parallelism: Data vs. Task Parallelism
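Data parallelism applies the same operation to different partitions of the data and then merges the partial results; task parallelism runs different, independent operations at the same time. The sketch below contrasts the two on a toy edge list using Python's standard `concurrent.futures`; the edge list and the helper functions are hypothetical placeholders, not part of any project. Threads are used only to keep the example simple: for CPU-bound pure-Python code the GIL prevents real speedup, and `ProcessPoolExecutor` (same interface) or a distributed framework would be needed.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Toy edge list standing in for a real dataset (hypothetical data).
EDGES = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5), (5, 3), (1, 4)]


def chunk(items, n_chunks):
    """Split a list into at most n_chunks contiguous slices of near-equal size."""
    if not items:
        return []
    size = -(-len(items) // n_chunks)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def partial_degrees(edge_chunk):
    """The single operation applied to every chunk: count node degrees."""
    counts = Counter()
    for u, v in edge_chunk:
        counts[u] += 1
        counts[v] += 1
    return counts


def data_parallel_degrees(edges, workers=4):
    """Data parallelism: same function on different partitions, then merge."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(partial_degrees, chunk(edges, workers)):
            total.update(partial)  # Counter.update adds counts
    return total


def count_nodes(edges):
    return len({node for edge in edges for node in edge})


def count_self_loops(edges):
    return sum(1 for u, v in edges if u == v)


def task_parallel_summary(edges):
    """Task parallelism: different, independent computations run concurrently."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        nodes = pool.submit(count_nodes, edges)
        n_edges = pool.submit(len, edges)
        loops = pool.submit(count_self_loops, edges)
        return {"nodes": nodes.result(), "edges": n_edges.result(),
                "self_loops": loops.result()}


if __name__ == "__main__":
    print(data_parallel_degrees(EDGES))
    print(task_parallel_summary(EDGES))
```

Data parallelism scales with the size of the dataset (more chunks, more workers), while task parallelism is limited by how many independent computations the analysis actually has.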

What’s next
• In this class: you’ll talk with potential teams
• In 2 weeks: your team submits a 1000-word project proposal that includes:
  – Objectives
  – Dataset description (or how to collect it)
  – Why the project is appropriate for parallel computation
  – (Rough ideas on how to parallelize the code)
  – Responsibilities of each team member
• In-class team meetings with the respective professors (02/28)
• …

Project 1

• For a set of real networks (some social, some biological or technological), compute the correlation between the betweenness of every edge (i.e., every pair of connected nodes) and the overlap between the neighborhoods of its two endpoints.

• Datasets are available:
  – Most are applicable
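A minimal sketch of the Project 1 computation, assuming unweighted, undirected graphs given as edge lists. Since edge betweenness is defined per edge, each edge's betweenness is paired with the overlap of its endpoints' neighborhoods. Overlap uses one common definition, n_uv / ((k_u - 1) + (k_v - 1) - n_uv), where n_uv is the number of common neighbors and k is degree; this definition, the use of Pearson correlation (Spearman would be a reasonable alternative), and the toy graph are assumptions. Betweenness follows Brandes' accumulation scheme.

```python
from collections import deque
from math import sqrt


def build_adj(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def edge_betweenness(adj):
    """Brandes-style edge betweenness for an unweighted, undirected graph.
    Returns {frozenset({u, v}): (fractional) number of shortest paths through the edge}."""
    eb = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:  # each source is independent: a natural unit of parallel work
        order, pred = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # BFS that counts shortest paths from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):  # accumulate dependencies, farthest nodes first
            for v in pred[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                eb[frozenset((v, w))] += c
                delta[v] += c
    return {e: b / 2 for e, b in eb.items()}  # each pair was counted from both ends


def overlap(adj, u, v):
    """Neighborhood overlap n_uv / ((k_u - 1) + (k_v - 1) - n_uv); 0 when undefined."""
    common = len(adj[u] & adj[v])
    denom = (len(adj[u]) - 1) + (len(adj[v]) - 1) - common
    return common / denom if denom > 0 else 0.0


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else float("nan")


# Toy data: two triangles joined by a single bridge (2, 3).
EDGES = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
adj = build_adj(EDGES)
eb = edge_betweenness(adj)
pairs = sorted(eb, key=sorted)
r = pearson([eb[e] for e in pairs], [overlap(adj, *e) for e in pairs])
print(f"correlation(betweenness, overlap) = {r:.3f}")
```

In the toy graph the bridge carries the most shortest paths and has zero overlap, so the correlation comes out negative. The outer loop over sources is where a parallel version would split the work.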

Project 2

• Path structure and strength of tie in the legislature data (a variant of the forbidden triad). The basic idea is that the greater the number of paths of length 2 connecting a pair of nodes, and the stronger the ties along these paths, the more likely it is that the pair is connected by a tie ...

• Datasets:
  – Legislature data
  – Other weighted graphs may be available
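A minimal sketch for Project 2, assuming an undirected weighted graph given as a dict `{(u, v): weight}` (a hypothetical input format). For every pair of nodes it counts the paths of length 2 between them and sums their strength, taking a path's strength as the weaker of its two ties; that aggregation rule is an assumption, and products or means are equally plausible choices. It then reports, for each path count, the share of pairs that are directly tied, which is the quantity the hypothesis predicts should grow. Pairs with no 2-paths are not included.

```python
from collections import defaultdict
from itertools import combinations


def two_path_stats(weights):
    """weights: {(u, v): w} for an undirected weighted graph (hypothetical format).
    Returns (adjacency, stats) with stats[(i, j)] = (number of 2-paths i-k-j,
    summed strength), where a path's strength is min(w_ik, w_kj) (an assumption)."""
    adj = defaultdict(dict)
    for (u, v), w in weights.items():
        adj[u][v] = w
        adj[v][u] = w
    stats = defaultdict(lambda: [0, 0.0])
    for k, nbrs in adj.items():
        # intermediary k closes a 2-path between every pair of its neighbors
        for i, j in combinations(sorted(nbrs), 2):
            entry = stats[(i, j)]
            entry[0] += 1
            entry[1] += min(nbrs[i], nbrs[j])
    return adj, {pair: tuple(entry) for pair, entry in stats.items()}


def tie_rate_by_path_count(weights):
    """Share of pairs that are directly tied, grouped by their number of 2-paths."""
    adj, stats = two_path_stats(weights)
    tied, total = defaultdict(int), defaultdict(int)
    for (i, j), (n_paths, _strength) in stats.items():
        total[n_paths] += 1
        tied[n_paths] += j in adj[i]  # bool counts as 0 or 1
    return {n: tied[n] / total[n] for n in sorted(total)}


# Toy weighted graph: a square a-b-c-d with the diagonal a-c.
W = {("a", "b"): 2, ("b", "c"): 3, ("c", "d"): 1, ("d", "a"): 4, ("a", "c"): 5}
print(tie_rate_by_path_count(W))
```

The loop over intermediaries `k` is independent per node, so a parallel version could assign nodes to workers and merge the per-pair counts afterwards.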

Project 3

• Compare the centrality measures of different nodes in a Facebook social graph to understand the topology of the graph and the position of each node in it. See the SNA course slide comparing centrality measures: for example, high degree combined with low closeness indicates a node with many ties in a cluster on the edge of the network.

• Datasets:
  – Facebook
  – Others as well
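A minimal sketch of the Project 3 comparison, assuming a connected, unweighted, undirected graph stored as adjacency sets. Degree is the number of neighbors and closeness is (n - 1) divided by the sum of BFS distances. Flagging nodes "at or above the median degree and at or below the median closeness" is an illustrative cut-off chosen for this sketch, not a rule from the course; the toy graph is hypothetical.

```python
from collections import deque
from statistics import median


def closeness(adj, source):
    """Closeness (n - 1) / sum of BFS distances; assumes a connected, unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return (len(adj) - 1) / sum(dist.values())


def high_degree_low_closeness(adj):
    """Nodes at or above the median degree but at or below the median closeness."""
    degree = {v: len(adj[v]) for v in adj}
    close = {v: closeness(adj, v) for v in adj}  # one BFS per node: parallelizable
    d_med, c_med = median(degree.values()), median(close.values())
    return {v for v in adj if degree[v] >= d_med and close[v] <= c_med}


# Toy graph: a 4-clique {a, b, c, d} attached by a short chain to a small star around f.
EDGES = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("f", "g"), ("f", "h")]
ADJ = {}
for u, v in EDGES:
    ADJ.setdefault(u, set()).add(v)
    ADJ.setdefault(v, set()).add(u)

print(sorted(high_degree_low_closeness(ADJ)))
```

In this toy graph the flagged nodes are the clique members far from the rest of the network, which is the "many ties in a cluster on the edge of the network" pattern; on a real Facebook graph the cut-offs would need to be chosen more carefully.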

Project 4

• Propose a hypothesis to test, collect data from a relevant online source, and evaluate the hypothesis.

• Already proposed in Projects 8, 10, 12

Project 5

• Confirm or refute a result from social network analysis (for example, Friedkin's horizon of observability or the forbidden triad hypothesis) using much larger, mediated social networks (e.g., online social networks or other content-producing/sharing systems). An overarching question for your investigation could be: do mediated networks have different characteristics than those accepted in traditional SNA?
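As one example of testing a classic SNA result at scale, the forbidden triad hypothesis says that if A has strong ties to both B and C, then B and C should be tied at least weakly. The minimal sketch below counts the strong-strong triads and lists the "forbidden" ones, assuming an undirected list of `(u, v, weight)` records and a weight cut-off for "strong"; both the input format and the threshold are assumptions, and choosing a defensible strength measure for an online network is part of the project itself.

```python
from itertools import combinations


def forbidden_triad_check(weighted_edges, strong_threshold):
    """weighted_edges: iterable of undirected (u, v, weight) records (hypothetical format).
    A tie is strong if weight >= strong_threshold (an assumed cut-off).
    Returns (violations, n_triads): violations lists (a, b, c) where a-b and a-c are
    strong but b and c have no tie at all; n_triads counts every strong-strong pair
    of ties sharing a node."""
    tied, strong = set(), {}
    for u, v, w in weighted_edges:
        tied.add(frozenset((u, v)))
        if w >= strong_threshold:
            strong.setdefault(u, set()).add(v)
            strong.setdefault(v, set()).add(u)
    violations, n_triads = [], 0
    for a, nbrs in strong.items():
        for b, c in combinations(sorted(nbrs), 2):
            n_triads += 1
            if frozenset((b, c)) not in tied:
                violations.append((a, b, c))
    return violations, n_triads


EDGES = [("a", "b", 5), ("a", "c", 4), ("b", "c", 1), ("a", "d", 5), ("d", "e", 1)]
violations, n = forbidden_triad_check(EDGES, strong_threshold=3)
print(f"{len(violations)} of {n} strong-strong triads are 'forbidden': {violations}")
```

The share `len(violations) / n` is the quantity one would compare between traditional and mediated networks; the loop over center nodes `a` partitions naturally across workers.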

Project 6

• Identifying ring voters in a community such as reddit.com or digg.com. The original problem was posed as a job interview question (http://www.thesixtyone.com/#/info/settings/jobs/). The problem is also relevant in other contexts, however, such as eBay or ratings of posts in any product review listing.
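One plausible starting heuristic for Project 6 (not necessarily the answer the interview question expected) is to link users whose sets of voted items overlap unusually strongly and then treat connected groups of such users as candidate rings. The sketch assumes a hypothetical input `{user: set(item_ids)}` and uses Jaccard similarity; the thresholds `min_jaccard`, `min_common`, and `min_size` are arbitrary illustrative values.

```python
from itertools import combinations


def candidate_rings(votes, min_jaccard=0.6, min_common=3, min_size=3):
    """votes: {user: set(item_ids)} (hypothetical format).
    Links users whose vote sets are highly similar, then returns connected groups
    of at least min_size users as candidate voting rings."""
    users = sorted(votes)
    adj = {u: set() for u in users}
    for u, v in combinations(users, 2):  # O(n^2) pairs: the expensive, parallelizable step
        common = len(votes[u] & votes[v])
        union = len(votes[u] | votes[v])
        if common >= min_common and common / union >= min_jaccard:
            adj[u].add(v)
            adj[v].add(u)
    seen, rings = set(), []
    for u in users:  # connected components via iterative DFS
        if u in seen or not adj[u]:
            continue
        comp, stack = set(), [u]
        while stack:
            x = stack.pop()
            if x not in comp:
                comp.add(x)
                stack.extend(adj[x] - comp)
        seen |= comp
        if len(comp) >= min_size:
            rings.append(sorted(comp))
    return rings


VOTES = {"u1": {1, 2, 3, 4}, "u2": {1, 2, 3, 4, 5}, "u3": {1, 2, 3, 4},
         "u4": {7, 8}, "u5": {1, 9, 10}}
print(candidate_rings(VOTES))
```

A real detector would also need a baseline (popular items attract many of the same voters legitimately) and probably timing information, since rings tend to vote within short windows.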

Project 7 (Shankar Prawesh)

• Study of social distance among Information Systems (IS) researchers during the period 1980-2010. Our analysis will be based on the premier journals published in this stream. This period covers almost all of the major development of the field of information systems and computer technologies.

• Notes:
  – Is there a large enough dataset to make use of parallel computing?
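A minimal sketch for Project 7, assuming "social distance" is operationalized as shortest-path length in a co-authorship network (one common choice; the project may define it differently). Papers are hypothetical `(year, [authors])` records; the sketch builds the co-authorship graph for a time window and averages BFS distances over all connected pairs, so windows such as decades can be compared.

```python
from collections import deque
from itertools import combinations


def coauthor_graph(papers, start, end):
    """papers: list of (year, [author, ...]) records (hypothetical format).
    Builds an undirected co-authorship graph from papers published in [start, end]."""
    adj = {}
    for year, authors in papers:
        if start <= year <= end:
            for a in authors:
                adj.setdefault(a, set())
            for a, b in combinations(set(authors), 2):
                adj[a].add(b)
                adj[b].add(a)
    return adj


def mean_social_distance(adj):
    """Average shortest-path length over all ordered pairs of connected authors."""
    total = pairs = 0
    for s in adj:  # one independent BFS per author: easy to split across workers
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("inf")


PAPERS = [(1985, ["A", "B"]), (1990, ["B", "C"]), (2005, ["C", "D"]), (2005, ["A", "D"])]
for start, end in [(1980, 1999), (2000, 2010), (1980, 2010)]:
    print(start, end, round(mean_social_distance(coauthor_graph(PAPERS, start, end)), 3))
```

All-pairs BFS costs roughly O(n·m) per window, which is the part of this project that could justify parallel computing if the author network is large.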

Project 8 (Ginger Johnson)

• This project seeks to analyze the role of social media networks in contemporary political events in North Africa and the Middle East.

• Datasets:
  – Twitter dataset available (collected until 2009): useful for testing code while the newer data is being collected
  – Twitter API available for collecting data

• A recent paper that analyzes communication on Twitter: Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. “What is Twitter, a Social Network or a News Media?” Proceedings of the 19th International World Wide Web (WWW) Conference, April 26-30, 2010, Raleigh, NC (USA).
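One simple statistic for asking whether a directed follow graph behaves more like a social network or a broadcast medium is reciprocity: the share of follow edges that are returned. The minimal sketch below computes it, together with the most-followed accounts, from hypothetical `(follower, followee)` pairs such as those one might collect through the Twitter API; the input format and sample data are assumptions.

```python
from collections import Counter


def reciprocity(follows):
    """follows: iterable of (follower, followee) pairs (hypothetical format).
    Fraction of directed follow edges whose reverse edge also exists."""
    edges = {(u, v) for u, v in follows if u != v}
    if not edges:
        return 0.0
    return sum((v, u) in edges for u, v in edges) / len(edges)


def top_followed(follows, k=10):
    """The k accounts with the most distinct followers (highest in-degree)."""
    edges = {(u, v) for u, v in follows if u != v}
    return Counter(v for _, v in edges).most_common(k)


FOLLOWS = [("a", "b"), ("b", "a"), ("a", "c"), ("d", "c"), ("b", "c")]
print(reciprocity(FOLLOWS), top_followed(FOLLOWS, k=1))
```

Low reciprocity combined with a few accounts holding most followers would point toward broadcast-style behavior; both counts are simple aggregations that split easily across partitions of the edge list.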

Project 9 (Oz Cimenler)

• The goal of this project is to contribute to our understanding of how individuals collaborate within social networks, as a way to measure network effectiveness. One network outcome that can serve as a viable alternative to directly measuring effectiveness is network innovation. Homophilous networks encourage the spread of innovation, but heterophilous connections provide unique opportunities for access to innovation. Focusing especially on value homophily, we will generate a network showing the flow of innovation among USF researchers.

• Notes:
  – Is it computationally challenging enough to warrant parallel computation?
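To quantify homophily in a network like the one Project 9 proposes, one standard measure is Krackhardt and Stern's E-I index, (E - I) / (E + I), where E counts ties between groups and I counts ties within groups. It ranges from -1 (all ties within groups, i.e., homophily) to +1 (all ties across groups). The minimal sketch below assumes undirected ties and a hypothetical mapping from each researcher to a value-orientation category; how to measure "value" categories is left to the project.

```python
def ei_index(edges, attribute):
    """edges: iterable of undirected (u, v) ties.
    attribute: {node: group label}, e.g. a hypothetical value-orientation category.
    Returns (E - I) / (E + I): -1 = all ties within groups, +1 = all ties across groups."""
    internal = external = 0
    for u, v in edges:
        if attribute[u] == attribute[v]:
            internal += 1
        else:
            external += 1
    total = internal + external
    return (external - internal) / total if total else 0.0


TIES = [(1, 2), (2, 3), (3, 4)]
GROUP = {1: "x", 2: "x", 3: "y", 4: "y"}
print(ei_index(TIES, GROUP))
```

The index is a single pass over the edges, so on its own it is not computationally demanding; that supports the note above about whether this project needs parallel computation.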

Project 10 (Amy Connolly)

• The Stanford SNAP database lists Amazon co-purchases from 2003, linking items as in "people who bought item i also bought item j".
  – First, can we collect current data and compare it to the 2003 data (and/or compare the 2003 data from different time periods)?
  – Then, does this network reflect the rich-get-richer phenomenon (i.e., Barabási & Albert's scale-free properties, including preferential attachment)?
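A minimal sketch of two checks for Project 10. The first estimates a power-law exponent for the degree distribution with the approximate discrete maximum-likelihood formula of Clauset, Shalizi & Newman (2009), alpha ≈ 1 + n / sum(ln(k_i / (k_min - 1/2))), over degrees k_i >= k_min. The second looks for preferential attachment directly by comparing two snapshots: if the rich get richer, nodes with higher earlier degree should gain more links. Treating co-purchase links as undirected, the choice of `x_min`, and the toy snapshots are all assumptions; a real analysis should also test goodness of fit.

```python
import math
from collections import defaultdict


def degrees(edges):
    """Undirected degree of every node in an edge list."""
    d = defaultdict(int)
    for u, v in edges:
        d[u] += 1
        d[v] += 1
    return d


def powerlaw_alpha(degree_values, x_min):
    """Approximate discrete MLE of the power-law exponent (Clauset, Shalizi & Newman 2009).
    Uses only values >= x_min; assumes x_min >= 1 and at least one such value."""
    xs = [x for x in degree_values if x >= x_min]
    return 1 + len(xs) / sum(math.log(x / (x_min - 0.5)) for x in xs)


def gain_by_initial_degree(edges_t0, edges_t1):
    """Mean number of new links per node, grouped by degree in the earlier snapshot.
    Under preferential attachment, gain should grow with initial degree."""
    d0, d1 = degrees(edges_t0), degrees(edges_t1)
    groups = defaultdict(list)
    for node, k in d0.items():
        groups[k].append(d1.get(node, 0) - k)
    return {k: sum(g) / len(g) for k, g in sorted(groups.items())}


T0 = [(1, 2), (2, 3)]
T1 = [(1, 2), (2, 3), (2, 4), (2, 5), (3, 6)]
print(powerlaw_alpha(degrees(T1).values(), x_min=1))
print(gain_by_initial_degree(T0, T1))
```

Nodes that appear only in the later snapshot are ignored by `gain_by_initial_degree`, since they have no earlier degree to condition on.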

Project 11 (Richard Salkowe)

• There is an extensive online database of disaster declaration requests from 1953-2004, including awards and denials. There is also an extensive online database of the voting patterns of congressional representatives, committee appointments, tenure, and party affiliation. A social network analysis of these relationships may reveal tendencies for disaster awards versus denials based on network ties.

Project 12 (Jeremy Blackburn)

• Analysis of a service rating dataset. The dataset is a bipartite graph of service providers and customers (the service providers are never customers). The project consists of large-scale data collection, feature extraction, and calculation.
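A minimal sketch of the feature-extraction step for Project 12, assuming the collected data are hypothetical `(customer, provider, score)` rating records. It first checks the stated bipartite property (no provider ever appears as a customer) and then computes a few simple per-provider features. The feature set itself is illustrative; the project's actual features are not specified.

```python
from collections import defaultdict


def provider_features(ratings):
    """ratings: iterable of (customer, provider, score) records (hypothetical format).
    Checks the bipartite assumption and returns simple per-provider features."""
    scores, customers = defaultdict(list), defaultdict(set)
    all_customers = set()
    for customer, provider, score in ratings:
        scores[provider].append(score)
        customers[provider].add(customer)
        all_customers.add(customer)
    both = all_customers & set(scores)
    if both:
        raise ValueError(f"nodes on both sides of the bipartite graph: {sorted(both)}")
    return {
        p: {
            "n_ratings": len(s),
            "n_customers": len(customers[p]),
            "mean_score": sum(s) / len(s),
            # share of ratings that come from customers who already rated this provider
            "repeat_ratio": 1 - len(customers[p]) / len(s),
        }
        for p, s in scores.items()
    }


RATINGS = [("c1", "p1", 5), ("c2", "p1", 3), ("c1", "p1", 4), ("c3", "p2", 2)]
print(provider_features(RATINGS))
```

Because every feature is a per-provider aggregate, records can be partitioned by provider (map-reduce style) when the collected dataset is too large for one machine.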

Project 13 (Larry Moore)

• Use the Internet Movie Database (IMDb, imdb.com) to infer relationships among actors.
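A natural first step for Project 13 is to treat movies and actors as a bipartite graph and project it onto actors, weighting each actor pair by the number of movies they share. The minimal sketch assumes cast lists have already been collected into a hypothetical `{movie_id: [actor, ...]}` mapping (check the data source's terms of use); it does not use any IMDb-specific API or file format.

```python
from collections import Counter
from itertools import combinations


def costar_graph(casts):
    """casts: {movie_id: [actor, ...]} (hypothetical format).
    One-mode projection of the movie-actor bipartite graph onto actors:
    weight of (a, b) = number of movies in which a and b both appear."""
    weights = Counter()
    for actors in casts.values():
        # sorted(set(...)) removes duplicate credits and gives a canonical pair order
        for a, b in combinations(sorted(set(actors)), 2):
            weights[(a, b)] += 1
    return weights


CASTS = {"m1": ["A", "B", "C"], "m2": ["A", "B"], "m3": ["C", "D"]}
W = costar_graph(CASTS)
print(W.most_common(1))
```

A movie with k credited actors contributes k(k - 1)/2 pairs, so large casts dominate the cost; splitting the movies across workers and summing the partial `Counter`s parallelizes the projection.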