Today’s Annual Review Meeting


P. Smyth: Networks MURI Meeting, Jan 10th 2012


Scalable Methods for the Analysis of Network-Based Data

Annual Review Meeting

Principal Investigator: Professor Padhraic Smyth
Department of Computer Science
University of California, Irvine

Additional project information online at www.datalab.uci.edu/muri


Today’s Annual Review Meeting

• Goals
  – Review our research progress
  – Discussion, questions, interaction
  – Feedback from visitors

• Format
  – Introduction
  – Research talks
    • 20 minute talks + 10 minutes/session for questions/discussion
  – Poster session
    • 1 to 2:30 (in this room)
  – Questions/discussion encouraged during talks
  – Several breaks



MURI Project Timeline

• Initial 3-year period
  – May 1 2008 to April 30th 2011
  – Funding arrived at universities in Oct 2008

• 2-year extension
  – May 1 2011 to April 30th 2013

• Meetings (all at UC Irvine)
  – Kickoff Meeting, November 2008
  – Working Meetings, April 2009, August 2009
  – Annual Review, December 2009
  – Working Meeting, May 2010
  – Annual Review, November 2010
  – Working Meeting, June 2011
  – Annual Review, January 2012
  – Plus many smaller meetings involving subsets of the research team


Motivation

2007: interdisciplinary interest in analysis of large network data sets

Many of the available techniques were descriptive and could not handle
- Prediction
- Missing data
- Covariates, etc

2007: significant body of statistical theory available on network modeling

Many of the available techniques did not scale up to large data sets and were not widely known, understood, or used

Goal of this MURI project

Develop new statistical network models and algorithms to broaden their scope of application to large, complex, dynamic real-world network data sets


Key Aspects of Our Technical Approach

– Foundational statistical theory for network data

– New methods to handle heterogeneous network data (with time, text, ..)

– Efficient algorithms and data structures for scalable statistical estimation

– Applications to large real-world data sets

– Open-source software for others to build on


Example: Network Dynamics in Classrooms

Chris DuBois, Carter Butts, Padhraic Smyth, Dan McFarland (Stanford)


Example: Email Communication Data

Data: Count matrix of 200,000 email messages among 3000 individuals over 3 months

Problem: Understand communication patterns and predict future communication activity

Challenges: sparse data, missing data, non-stationarity, unseen covariates

C. DuBois, J. Foulds, P. Smyth, ICWSM, 2011


Example: Time Evolution of Emergency Responder Organizational Network for Hurricane Katrina

C. T. Butts, R. Acton, and C. Marcum, Interorganizational collaboration in the hurricane Katrina response, Journal of Social Structure, 2010


MURI Team

Investigator        | University | Department       | Expertise                           | PhD Students | Postdocs
Padhraic Smyth (PI) | UC Irvine  | Computer Science | Machine learning                    | 6            | 1
Carter Butts        | UC Irvine  | Sociology        | Statistical social network analysis | 6            | -
Mark Handcock       | UCLA       | Statistics       | Statistical social network analysis | 2            | 1
Dave Hunter         | Penn State | Statistics       | Computational statistics            | 2            | 2
David Eppstein      | UC Irvine  | Computer Science | Graph algorithms                    | 2            | -
Michael Goodrich    | UC Irvine  | Computer Science | Algorithms and data structures      | 2            | 1
Dave Mount          | U Maryland | Computer Science | Algorithms and data structures      | 2            | -
TOTALS              |            |                  |                                     | 22           | 5

Collaboration Network (Circa 2007)

[Figure: collaboration network among the investigators. Nodes: Padhraic Smyth, Dave Hunter, Mark Handcock, Dave Mount, Mike Goodrich, David Eppstein, Carter Butts]

Collaboration Network

[Figure: collaboration network extended to include students and postdocs.
Investigators: Padhraic Smyth, Dave Hunter, Mark Handcock, Dave Mount, Mike Goodrich, David Eppstein, Carter Butts.
Students and postdocs: Emma Spiro, Lorien Jasny, Zack Almquist, Chris Marcum, Sean Fitzhugh, Ragupathyraj Vallyvan, Ryan Acton, Chris DuBois, Minkyoung Cho, Eunhui Park, Miruna Petrescu-Prahova, Arthur Asuncion, Jimmy Foulds, Duy Vu, Ruth Hummel, Michael Schweinberger, Ranran Wang, Nick Navaroli, Krista Gile, Darren Strash, Lowell Trott, Maarten Loffler, Joe Simons, Pavel Pszona, Ian Fellows, Romain Thibaux, Pavel Krivitsky]


Collaboration Network

[Figure: the same network of investigators, students, and postdocs as on the previous slide, with organization nodes added: Facebook, U Mass Amherst Computational Social Science Initiative, Google, Intel, RAND, University of Utrecht]

Mapping the Project Terrain

[Diagram, built up in stages.
First shown: Domain Theory, Data Collection, Network Modeling.
Then added: Statistical Theory, Data Structures and Algorithms, Inference Algorithms.
Then added: Simulation, Hypothesis Testing, Prediction/Forecasting, Decision Support]


Statistical Network Modeling Approaches

• Exponential Random Graph Models (ERGMs)
  – “Canonical” representation for statistical models of networks
  – Can model edge dependencies in very flexible ways
  – Fitting of the model can be computationally difficult

• Latent Variable Models
  – Edges are conditionally independent given the latent variables
  – Can lead to much simpler estimation algorithms than regular ERGMs
  – Model interpretation can be difficult

• Event-Based Models
  – Edges have time-stamps, models based on survival analysis
  – Surprisingly can be much easier to fit than models for “static” networks
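To make the ERGM bullet concrete, here is a minimal Python sketch, not taken from the project software. It assumes an undirected graph stored as a 0/1 adjacency matrix, with edge count and triangle count as the sufficient statistics; the function names and parameters are illustrative. An ERGM assigns P(Y = y) proportional to exp(theta · g(y)). The normalizing constant sums over all possible graphs, which is one reason fitting can be computationally difficult, so the sketch samples with a simple Metropolis edge-toggle step instead of computing it.

```python
import itertools
import math
import random


def edge_count(adj):
    """Number of edges in an undirected 0/1 adjacency matrix."""
    n = len(adj)
    return sum(adj[i][j] for i in range(n) for j in range(i + 1, n))


def triangle_count(adj):
    """Number of closed triangles (i, j, k)."""
    n = len(adj)
    return sum(
        adj[i][j] * adj[j][k] * adj[i][k]
        for i, j, k in itertools.combinations(range(n), 3)
    )


def log_weight(adj, theta_edges, theta_triangles):
    """Unnormalized ERGM log-probability theta . g(y)."""
    return theta_edges * edge_count(adj) + theta_triangles * triangle_count(adj)


def metropolis_sample(n, theta_edges, theta_triangles, steps, seed=0):
    """Draw one graph by repeatedly proposing to toggle a random dyad."""
    rng = random.Random(seed)
    adj = [[0] * n for _ in range(n)]
    current = log_weight(adj, theta_edges, theta_triangles)
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)          # two distinct nodes
        adj[i][j] = adj[j][i] = 1 - adj[i][j]   # propose the toggle
        proposed = log_weight(adj, theta_edges, theta_triangles)
        accept_prob = math.exp(min(0.0, proposed - current))
        if rng.random() < accept_prob:
            current = proposed                   # accept
        else:
            adj[i][j] = adj[j][i] = 1 - adj[i][j]  # reject: undo the toggle
    return adj


if __name__ == "__main__":
    graph = metropolis_sample(n=8, theta_edges=-1.0, theta_triangles=0.3, steps=2000)
    print(edge_count(graph), triangle_count(graph))
```

The sketch recomputes both statistics from scratch at every step, which is only workable for very small graphs; practical samplers update the statistics incrementally when a single dyad changes.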


Impact: Software

• R Language and Environment
  – Open-source, high-level environment for statistical computing
  – Default standard among research statisticians, increasingly being adopted by others
  – Estimated 250k to 1 million users

• Statnet
  – R libraries for analysis of network data
  – New contributions from this MURI project:
    • Missing data (Gile and Handcock, 2010)
    • Relational event models (Butts, 2008-2011)
    • Latent-class models (DuBois, 2010)
    • Fast clique-finding (Strash, 2011)
    • + more


Impact: Publications

• Approximately 60 peer-reviewed publications
  – Across computer science, statistics, and social science
  – High visibility
    • Science, Butts, 2009
    • Journal of the American Statistical Association, Schweinberger, in press
    • Annals of Applied Statistics, Gile and Handcock, 2010
    • Journal of the ACM, da Fonseca and Mount, 2010
    • Journal of Machine Learning Research, Asuncion, Smyth, et al., 2010
  – Highly selective conferences
    • ACM SIGKDD 2010 (16% accept rate)
    • Neural Information Processing (NIPS) Conference 2009, 2011 (25% accepts)
    • IEEE Infocom 2010 (17.5% accepts)
    • Best paper and best poster awards

• Cross-pollination
  – Exposing computer scientists to statistical and social networking ideas
  – Exposing social scientists and statisticians to computational modeling ideas


Impact: Workshops and Invited Talks

• 2010 Political Networks Conference
  – Workshop on Network Analysis
  – Presented and run by Butts and students Spiro, Fitzhugh, Almquist

• Invited Talks: Conferences and Workshops
  – R!2010 Conference at NIST (Handcock, 2010)
  – 2010 Summer School on Social Networks (Butts)
  – Mining and Learning with Graphs Workshop (Smyth, 2010)
  – NSF/SFI Workshop on Statistical Methods for the Analysis of Network Data (Handcock, 2009)
  – International Workshop on Graph-Theoretic Methods in Computer Science (Eppstein, 2009)
  – Quantitative Methods in Social Science (QMSS) Seminar, Dublin (Almquist, 2010)
  – + many more

• Invited Talks: Universities
  – Stanford, UCLA, Georgia Tech, U Mass, Brown, etc


Impact: the Next Generation

• Where students have gone…
  – Academia: University of Massachusetts, Karlsruhe, Utrecht
  – Research Labs/Industry: RAND, Google, Facebook

• Students speaking at major conferences
  – Sunbelt International Social Network Meetings
    • Jasny, Spiro, Fitzhugh, Almquist, DuBois
  – American Sociological Association Meetings
    • Marcum, Jasny, Spiro, Fitzhugh, Almquist
  – 2010 ACM SIGKDD Conference (DuBois)
  – 2011 International Conference on Machine Learning (Vu)
  – 2011 Neural Information Processing Conference (Asuncion)
    • Only 20 talks selected for presentation out of 1400 submissions

• Best paper awards or nominations (Spiro, Hummel, Almquist)

• National fellowships: DuBois (NDSEG), Asuncion (NSF), Navaroli (NDSEG)


…and the Old Generation

• Carter Butts
  – American Sociological Association, Leo A. Goodman award, 2010 (highest award to young methodological researchers in social science)

• David Eppstein
  – ACM Fellow, 2011

• Michael Goodrich
  – ACM Fellow, IEEE Fellow, 2009

• Mark Handcock
  – Fellow of the American Statistical Association, 2009

• Padhraic Smyth
  – ACM SIGKDD Innovation Award 2009
  – AAAI Fellow 2010


What Next?

• Extending algorithmic advances into statistical modeling
  – Will allow us to scale existing algorithms to much larger data sets

• Develop network models with richer representational power
  – Geographic data, temporal events, text data, actor covariates, heterogeneity, etc

• Systematically evaluate and test different approaches
  – Evaluate ability of models to predict over time, to impute missing values, etc

• Apply these approaches to high visibility problems and data sets
  – e.g., online social interaction such as email, Facebook, Twitter, blogs

• Make software publicly available

SESSION 1:

9:20 External-Memory Network Analysis Algorithms for Naturally Sparse Graphs
Michael Goodrich, Professor, Computer Science, UC Irvine

9:40 New Models for Exponential Family Random Networks
Ian Fellows, PhD student, Statistics, UCLA

10:00 Set-Differencing Data Structures
David Eppstein, Professor, Computer Science, UC Irvine

10:30 BREAK

SESSION 2:

10:50 Hierarchical Statistical Models for Event-Based Social Network Data
Chris DuBois, PhD student, Statistics, UC Irvine

11:10 Scalable Statistical Estimation Methods for Large Time-Varying Networks
Dave Hunter, Professor, Statistics, Penn State

11:30 Large-Scale Social Network Analysis of Facebook Data
Emma Spiro, PhD student, Sociology, UC Irvine


12:00 – 1:00 LUNCH
PIs + visitors at the University Club
Students + postdocs in 6011

1:00 to 2:30 POSTER SESSION with PhD students and postdoctoral fellows

2:30 – 3:40: SESSION 3

2:30 Order-Stable Parametrizations for ERGMs
Carter Butts, Professor, Sociology, UC Irvine

2:50 ERGMs for Rank-Order Statistics
Pavel Krivitsky, Postdoctoral Fellow, Statistics, Penn State

3:10 Estimating the Size of Hidden Populations based on Partially-Observed Network Data
Mark Handcock, Professor, Statistics, UCLA

3:40 WRAP-UP, CLOSING COMMENTS (+ BEVERAGE BREAK)

4:00 ADJOURN
5:00 ADJOURN


Logistics

• All talks and posters in this room

• Wireless

• Restrooms


Additional Resources

Project Web site: http://www.datalab.uci.edu/muri/

Slides and Posters from AHM: http://www.datalab.uci.edu/muri/june2011/

Publications: http://www.datalab.uci.edu/muri/publications.php

Software: http://csde.washington.edu/statnet/

Data Sets: http://networkdata.ics.uci.edu/resources.php


QUESTIONS?