P. Smyth: Networks MURI Meeting, Jan 10th 2012
Scalable Methods for the Analysis of Network-Based Data
Annual Review Meeting
Principal Investigator: Professor Padhraic Smyth
Department of Computer Science
University of California, Irvine
Additional project information online at www.datalab.uci.edu/muri
Today’s Annual Review Meeting
• Goals
– Review our research progress
– Discussion, questions, interaction
– Feedback from visitors
• Format
– Introduction
– Research talks
  • 20-minute talks + 10 minutes/session for questions/discussion
– Poster session
  • 1 to 2:30 (in this room)
– Questions/discussion encouraged during talks
– Several breaks
MURI Project Timeline
• Initial 3-year period
– May 1 2008 to April 30th 2011
– Funding arrived to universities in Oct 2008
• 2-year extension
– May 1 2011 to April 30th 2013
• Meetings (all at UC Irvine)
– Kickoff Meeting, November 2008
– Working Meetings, April 2009, August 2009
– Annual Review, December 2009
– Working Meeting, May 2010
– Annual Review, November 2010
– Working Meeting, June 2011
– Annual Review, January 2012
– … many smaller meetings involving subsets of the research team
Motivation
2007: interdisciplinary interest in analysis of large network data sets
Many of the available techniques were descriptive and could not handle
- Prediction
- Missing data
- Covariates, etc.
2007: significant body of statistical theory available on network modeling
Many of the available techniques did not scale up to large data sets, were not widely known/understood/used, etc.
Goal of this MURI project
Develop new statistical network models and algorithms to broaden their scope of application to large, complex, dynamic real-world network data sets
Key Aspects of Our Technical Approach
– Foundational statistical theory for network data
– New methods to handle heterogeneous network data (with time, text, ..)
– Efficient algorithms and data structures for scalable statistical estimation
– Applications to large real-world data sets
– Open-source software for others to build on
Example: Network Dynamics in Classrooms
Chris DuBois, Carter Butts, Padhraic Smyth, Dan McFarland (Stanford)
Example: Email Communication Data
Data: Count matrix of 200,000 email messages among 3000 individuals over 3 months
Problem: Understand communication patterns and predict future communication activity
Challenges: sparse data, missing data, non-stationarity, unseen covariates
C. DuBois, J. Foulds, P. Smyth, ICWSM, 2011
Example: Time Evolution of Emergency Responder Organizational Network for Hurricane Katrina
C. T. Butts, R. Acton, and C. Marcum, Interorganizational collaboration in the hurricane Katrina response, Journal of Social Structure, 2010
MURI Team

Investigator         University  Department        Expertise                            PhD Students  Postdocs
Padhraic Smyth (PI)  UC Irvine   Computer Science  Machine learning                     6             1
Carter Butts         UC Irvine   Sociology         Statistical social network analysis  6
Mark Handcock        UCLA        Statistics        Statistical social network analysis  2             1
Dave Hunter          Penn State  Statistics        Computational statistics             2             2
David Eppstein       UC Irvine   Computer Science  Graph algorithms                     2
Michael Goodrich     UC Irvine   Computer Science  Algorithms and data structures       2             1
Dave Mount           U Maryland  Computer Science  Algorithms and data structures       2
TOTALS                                                                                  22            5
Collaboration Network (Circa 2007)
[Figure: collaboration network among the investigators. Nodes: Padhraic Smyth, Dave Hunter, Mark Handcock, Dave Mount, Mike Goodrich, David Eppstein, Carter Butts]
Collaboration Network
[Figure: collaboration network of the full project team, with external organizations.]
Investigators: Padhraic Smyth, Dave Hunter, Mark Handcock, Dave Mount, Mike Goodrich, David Eppstein, Carter Butts
Other team members: Emma Spiro, Lorien Jasny, Zack Almquist, Chris Marcum, Sean Fitzhugh, Ragupathyraj Vallyvan, Ryan Acton, Chris DuBois, Minkyoung Cho, Eunhui Park, Miruna Petrescu-Prahova, Arthur Asuncion, Jimmy Foulds, Duy Vu, Ruth Hummel, Michael Schweinberger, Ranran Wang, Nick Navaroli, Krista Gile, Darren Strash, Lowell Trott, Maarten Loffler, Joe Simons, Pavel Pszona, Ian Fellows, Romain Thibaux, Pavel Krivitsky
External organizations shown: U Mass Amherst Computational Social Science Initiative, Intel, RAND, University of Utrecht
Mapping the Project Terrain
[Diagram components: Domain Theory, Data Collection, Network Modeling, Statistical Theory, Inference Algorithms, Data Structures and Algorithms, Simulation, Hypothesis Testing, Prediction/Forecasting, Decision Support]
Statistical Network Modeling Approaches
• Exponential Random Graph Models (ERGMs)
– “Canonical” representation for statistical models of networks
– Can model edge dependencies in very flexible ways
– Fitting of the model can be computationally difficult
• Latent Variable Models
– Edges are conditionally independent given the latent variables
– Can lead to much simpler estimation algorithms than regular ERGMs
– Model interpretation can be difficult
• Event-Based Models
– Edges have time-stamps, models based on survival analysis
– Surprisingly can be much easier to fit than models for “static” networks
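Illustration (not from the original slides): the conditional-independence property of latent variable models can be sketched with a simple latent-class (stochastic block) model, one possible instance of this model family. The function names, block probabilities, and toy class assignment below are hypothetical, and this is not the project's software (the project's contributions are R libraries in statnet). Because each edge depends only on the classes of its two endpoints, the log-likelihood is a plain sum over node pairs, which is why estimation can be simpler than for general ERGMs.

```python
import math
import random


def sample_sbm(z, block_probs, rng):
    """Sample an undirected graph from a latent-class (block) model.

    z[i] is the latent class of node i (assumed known here for illustration).
    Edge (i, j) is drawn independently as Bernoulli(block_probs[z[i]][z[j]]).
    """
    n = len(z)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < block_probs[z[i]][z[j]]:
                edges.add((i, j))
    return edges


def log_likelihood(edges, z, block_probs):
    """Log-likelihood of the graph given the latent classes.

    Conditional independence of edges given z means the joint likelihood
    factorizes, so this is a sum of per-pair Bernoulli log-probabilities.
    Assumes every block probability is strictly between 0 and 1.
    """
    n = len(z)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            p = block_probs[z[i]][z[j]]
            total += math.log(p) if (i, j) in edges else math.log(1.0 - p)
    return total


# Hypothetical toy setup: two classes, dense within a class, sparse between.
z = [0, 0, 1, 1]
block_probs = [[0.9, 0.1], [0.1, 0.9]]
graph = sample_sbm(z, block_probs, random.Random(0))
print(sorted(graph), log_likelihood(graph, z, block_probs))
```

In practice the class assignments z are unobserved and must be inferred (for example by EM or MCMC); the sketch only shows why the likelihood is easy to evaluate once they are given.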
Impact: Software
• R Language and Environment
– Open-source, high-level environment for statistical computing
– Default standard among research statisticians, increasingly being adopted by others
– Estimated 250k to 1 million users
• Statnet
– R libraries for analysis of network data
– New contributions from this MURI project:
  • Missing data (Gile and Handcock, 2010)
  • Relational event models (Butts, 2008-2011)
  • Latent-class models (DuBois, 2010)
  • Fast clique-finding (Strash, 2011)
  • + more…
Impact: Publications
• Approximately 60 peer-reviewed publications
– Across computer science, statistics, and social science
– High visibility
  • Science, Butts, 2009
  • Journal of the American Statistical Association, Schweinberger, in press
  • Annals of Applied Statistics, Gile and Handcock, 2010
  • Journal of the ACM, da Fonseca and Mount, 2010
  • Journal of Machine Learning Research, Asuncion, Smyth, etc, 2010
– Highly selective conferences
  • ACM SIGKDD 2010 (16% accept rate)
  • Neural Information Processing (NIPS) Conference 2009, 2011 (25% accepts)
  • IEEE Infocom 2010 (17.5% accepts)
  • Best paper and best poster awards
• Cross-pollination
– Exposing computer scientists to statistical and social networking ideas
– Exposing social scientists and statisticians to computational modeling ideas
Impact: Workshops and Invited Talks
• 2010 Political Networks Conference
– Workshop on Network Analysis
– Presented and run by Butts and students Spiro, Fitzhugh, Almquist
• Invited Talks: Conferences and Workshops
– useR! 2010 Conference at NIST (Handcock, 2010)
– 2010 Summer School on Social Networks (Butts)
– Mining and Learning with Graphs Workshop (Smyth, 2010)
– NSF/SFI Workshop on Statistical Methods for the Analysis of Network Data (Handcock, 2009)
– International Workshop on Graph-Theoretic Methods in Computer Science (Eppstein, 2009)
– Quantitative Methods in Social Science (QMSS) Seminar, Dublin (Almquist, 2010)
– + many more…
• Invited Talks: Universities
– Stanford, UCLA, Georgia Tech, U Mass, Brown, etc
Impact: the Next Generation
• Where students have gone…
– Academia: University of Massachusetts, Karlsruhe, Utrecht
– Research Labs/Industry: RAND, Google, Facebook
• Students speaking at major conferences
– Sunbelt International Social Network Meetings
  • Jasny, Spiro, Fitzhugh, Almquist, DuBois
– American Sociological Association Meetings
  • Marcum, Jasny, Spiro, Fitzhugh, Almquist
– 2010 ACM SIGKDD Conference (DuBois)
– 2011 International Conference on Machine Learning (Vu)
– 2011 Neural Information Processing Conference (Asuncion)
  • Only 20 talks selected for presentation out of 1400 submissions
• Best paper awards or nominations (Spiro, Hummel, Almquist)
• National fellowships: DuBois (NDSEG), Asuncion (NSF), Navaroli (NDSEG)
…and the Old Generation
• Carter Butts
– American Sociological Association, Leo A. Goodman Award, 2010 (highest award to young methodological researchers in social science)
• David Eppstein
– ACM Fellow, 2011
• Michael Goodrich
– ACM Fellow, IEEE Fellow, 2009
• Mark Handcock
– Fellow of the American Statistical Association, 2009
• Padhraic Smyth
– ACM SIGKDD Innovation Award 2009
– AAAI Fellow 2010
What Next?
• Extending algorithmic advances into statistical modeling
– Will allow us to scale existing algorithms to much larger data sets
• Develop network models with richer representational power
– Geographic data, temporal events, text data, actor covariates, heterogeneity, etc
• Systematically evaluate and test different approaches
– Evaluate ability of models to predict over time, to impute missing values, etc
• Apply these approaches to high visibility problems and data sets
– e.g., online social interaction such as email, Facebook, Twitter, blogs
• Make software publicly available
SESSION 1:
9:20 External-Memory Network Analysis Algorithms for Naturally Sparse Graphs
  Michael Goodrich, Professor, Computer Science, UC Irvine
9:40 New Models for Exponential Family Random Networks
  Ian Fellows, PhD student, Statistics, UCLA
10:00 Set-Differencing Data Structures
  David Eppstein, Professor, Computer Science, UC Irvine
10:30 BREAK
SESSION 2:
10:50 Hierarchical Statistical Models for Event-Based Social Network Data
  Chris DuBois, PhD student, Statistics, UC Irvine
11:10 Scalable Statistical Estimation Methods for Large Time-Varying Networks
  Dave Hunter, Professor, Statistics, Penn State
11:30 Large-Scale Social Network Analysis of Facebook Data
  Emma Spiro, PhD student, Sociology, UC Irvine
12:00 – 1:00 LUNCH
  PIs + visitors at the University Club
  Students + postdocs in 6011
1:00 – 2:30 POSTER SESSION with PhD students and postdoctoral fellows
SESSION 3: 2:30 – 3:40
2:30 Order-Stable Parametrizations for ERGMs
  Carter Butts, Professor, Sociology, UC Irvine
2:50 ERGMs for Rank-Order Statistics
  Pavel Krivitsky, Postdoctoral Fellow, Statistics, Penn State
3:10 Estimating the Size of Hidden Populations based on Partially-Observed Network Data
  Mark Handcock, Professor, Statistics, UCLA
3:40 WRAP-UP, CLOSING COMMENTS (+ BEVERAGE BREAK)
4:00 ADJOURN
5:00 ADJOURN
Logistics
• All talks and posters in this room
• Wireless
• Restrooms
Additional Resources
Project Web site: http://www.datalab.uci.edu/muri/
Slides and Posters from AHM: http://www.datalab.uci.edu/muri/june2011/
Publications: http://www.datalab.uci.edu/muri/publications.php
Software: http://csde.washington.edu/statnet/
Data Sets: http://networkdata.ics.uci.edu/resources.php
QUESTIONS?