The Computer Experiment in Computational Social Science

The Computer Experiment in Computational Social Science. Greg Madey and Yongqin Gao, Computer Science & Engineering, University of Notre Dame (http://www.nd.edu/~gmadey). Eighth Annual Swarm Users/Researchers Conference, University of Michigan, Ann Arbor, Michigan USA, May 9-11, 2004. This research was partially supported by the US National Science Foundation, CISE/IIS-Digital Society & Technology, under Grant No. 0222829.


Page 1

The Computer Experiment in Computational Social Science

Greg Madey
Yongqin Gao

Computer Science & Engineering
University of Notre Dame
http://www.nd.edu/~gmadey

Eighth Annual Swarm Users/Researchers Conference
University of Michigan
Ann Arbor, Michigan USA
May 9-11, 2004

This research was partially supported by the US National Science Foundation, CISE/IIS-Digital Society & Technology, under Grant No. 0222829

Page 2

Outline

• Background
• The epistemological questions
• Example research question
• Simulation
• Computer experiments
• Discussion

Page 3

Background

• Two NSF projects using agent-based simulation
  1) Molecules and microbes as agents
  2) Free/Open Source Software developers as agents
• Primarily scientific investigations, with IT tool building and simulation support
• How do you justify the use of simulation?
  – From a philosophy of science perspective (not engineering), what do simulation results tell us?

Page 4

Why Agent-based Approach for Molecules

[Figure: modeling approaches placed on two axes. Detail (structure) runs from low (atom numbers, percentages) to high (forces between atoms, electron density). Scale (size, temporal) runs from small (one molecule, nanoseconds) to large (large ecosystem, years). Labels on the chart: Elemental Cycling, Connectivity Maps, NOM1.0, Daisy, StochSim. Image credit: Copyright 1998, Thomas M. Terry, The University of Conn]

Page 5

The Epistemological Questions

• How do we come to know social science knowledge?
• What do we (or should we) accept as support for a proposition in social science research?
  – Often “real” experiments are not possible
    • Only one real history
    • Ethical issues
• What role can simulation play in answering the above?
• Does simulation have a role beyond “fishing expeditions”?
  – Does simulation just discover phenomena for “real experiments”?

Page 6

Classical Scientific Method

1. Observe the world
   a) Identify a puzzling phenomenon
2. Generate a falsifiable hypothesis (K. Popper)
3. Design and conduct an experiment with the goal of disproving the hypothesis
   a) If the experiment “fails”, then the hypothesis is accepted (until replaced)
   b) If the experiment “succeeds”, then reject the hypothesis, but additional insight into the phenomenon may be obtained and steps 2-3 repeated
4. Then add to the body of theory
   a) A new axiom/law
   b) A new model
   c) Then derive new deductions or model conclusions

(Note: Realism vs Instrumentalism)

Page 7

The Computer Experiment

Page 8

Agent-Based Simulation as a Component of the Scientific Method

[Diagram with labeled elements: Interesting Phenomenon, Observation, Conceptual Model, Hypothesis, Agent-Based Simulation, Computer Experiment]

Page 9

Open Source Software (OSS)

• Free …
  – to view source
  – to modify
  – to share
  – of cost
• Examples
  – Apache
  – Perl
  – GNU
  – Linux
  – Sendmail
  – Python
  – KDE
  – GNOME
  – Mozilla
  – Thousands more

[Logos: Linux, GNU, Savannah]

Page 10

Example: F/OSS Study

• Online data
  – Screen scraping
  – Database dumps
• Modeling
  – Social network theory
  – Evolutionary assumptions
• Simulation
  – Verification and validation
  – Computer experiments
• Variation of Classical Scientific Method

Page 11

Collaborative Social Networks

• Research-paper co-authorship, small world phenomenon, e.g., Erdos number (Barabasi 2001, Newman 2001)
• Movie actors, small world phenomenon, e.g., Kevin Bacon number (Watts 1999, 2003)
• Interlocking corporate directorships
• Terrorist networks
• Open-source software developers (Madey et al, AMCIS 2002)
• Collaborators are nodes in a graph, and collaborative relationships are the edges of the graph => a framework to model data/phenomena

Page 12

SourceForge

• VA Software
• Part of OSDN
• Started 12/1999
• Collaboration tools
• 70,000 Projects
• 90,000 Developers
• 800,000 Registered Users

Page 13

Observations

• Web mining
• Web crawler (scripts)
  – Python
  – Perl
  – AWK
  – Sed
• Monthly
• Since Jan 2001
• ProjectID
• DeveloperID
• Almost 2 million records
• Relational database

PROJ | DEVELOPER
8001 | dev378
8001 | dev8975
8001 | dev9972
8002 | dev27650
8005 | dev31351
8006 | dev12509
8007 | dev19395
8007 | dev4622
8007 | dev35611
8008 | dev8975
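The sample records above can be turned into a developer collaboration network in a few lines. This is a minimal sketch, not the authors' tooling: the function names `parse_records` and `developer_network` are hypothetical, and it assumes one `PROJ|DEVELOPER` pair per line, with two developers linked whenever they share a project.

```python
from collections import defaultdict
from itertools import combinations

# Sample records copied from the slide, one PROJ|DEVELOPER pair per line.
RECORDS = """\
8001|dev378
8001|dev8975
8001|dev9972
8002|dev27650
8005|dev31351
8006|dev12509
8007|dev19395
8007|dev4622
8007|dev35611
8008|dev8975
"""


def parse_records(text):
    """Return {project_id: set of developer ids} (hypothetical helper)."""
    members = defaultdict(set)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        project, developer = line.split("|")
        members[project].add(developer)
    return members


def developer_network(members):
    """Link two developers whenever they share at least one project."""
    adjacency = defaultdict(set)
    for developers in members.values():
        for dev in developers:
            adjacency[dev]  # touch the key so isolated developers are kept
        for a, b in combinations(sorted(developers), 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


members = parse_records(RECORDS)
network = developer_network(members)
for dev in sorted(network):
    print(dev, sorted(network[dev]))
```

In this sample, dev8975 belongs to projects 8001 and 8008, which is the kind of shared membership that connects projects in the network.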

Page 14

F/OSS Developers - Collaboration Social Network
Developers are nodes / Projects are links

[Figure: network diagram of 24 developers and 5 projects, with 2 linchpin developers and 1 cluster]

Edge list shown on the slide, grouped by project:

Project 15850: dev[46]-dev[83], dev[46]-dev[48], dev[46]-dev[56], dev[46]-dev[58]
Project 6882: dev[58]-dev[47], dev[47]-dev[79], dev[47]-dev[52], dev[47]-dev[55]
Project 7028: dev[46]-dev[99], dev[46]-dev[51], dev[46]-dev[57]
Project 7597: dev[46]-dev[45], dev[46]-dev[72], dev[46]-dev[55], dev[46]-dev[58], dev[46]-dev[61], dev[46]-dev[64], dev[46]-dev[67], dev[46]-dev[70]
Project 9859: dev[46]-dev[49], dev[46]-dev[53], dev[46]-dev[54], dev[46]-dev[59]

Page 15

Topological Analysis of the Data

• Statistics inspected
  – Diameter
  – Average degree
  – Clustering coefficient
  – Degree distribution
  – Cluster size distribution
  – Relative size of major cluster
  – Fitness and life cycle
• Evolution of these statistics
• Dual networks
  – developer network and project network
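Several of the statistics listed above can be computed directly from an adjacency structure. The following is a small sketch on a made-up six-node graph (the edges and all function names are illustrative assumptions, not the SourceForge data or the authors' code). It uses the mean of local clustering coefficients, one of several common definitions, and measures the diameter on the largest cluster only.

```python
from collections import Counter, deque
from itertools import combinations

# Toy undirected graph, invented for illustration only.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("e", "f")]


def make_graph(edges):
    """Build {node: set of neighbors} from an undirected edge list."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def average_degree(graph):
    return sum(len(nbrs) for nbrs in graph.values()) / len(graph)


def degree_distribution(graph):
    """Map each degree value to the number of nodes having it."""
    return Counter(len(nbrs) for nbrs in graph.values())


def clustering_coefficient(graph):
    """Mean local clustering; nodes with fewer than 2 neighbors count as 0."""
    total = 0.0
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
        total += 2 * links / (k * (k - 1))
    return total / len(graph)


def bfs_distances(graph, source):
    """Shortest hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def clusters(graph):
    """Connected components, largest first."""
    seen, result = set(), []
    for node in graph:
        if node not in seen:
            component = set(bfs_distances(graph, node))
            seen |= component
            result.append(component)
    return sorted(result, key=len, reverse=True)


def diameter(graph, component):
    """Longest shortest path inside one connected component."""
    return max(max(bfs_distances(graph, n).values()) for n in component)


graph = make_graph(EDGES)
major = clusters(graph)[0]
print("average degree:", average_degree(graph))
print("clustering coefficient:", clustering_coefficient(graph))
print("cluster sizes:", [len(c) for c in clusters(graph)])
print("relative size of major cluster:", len(major) / len(graph))
print("diameter of major cluster:", diameter(graph, major))
```

Running the same functions on monthly snapshots would give the evolution of these statistics over time.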

Page 16

Degree Distribution: Developers

Page 17

An Example Research Question

• What processes can explain the evolution of the developer social networks?
  – Randomly growing network (Erdos-Renyi, 1960)?
  – Evolving network with preferential attachment (Barabasi-Albert, 1999)?
  – Evolving network with preferential attachment and fitness (Barabasi-Albert, 2001)?
  – Evolving network with preferential attachment and dynamic fitness (Madey et al, 2003)?
• Can we use the computer experiment to test (falsify?) hypotheses about possible processes in the formation of the F/OSS developer network?
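The difference between the first two candidate processes can be sketched with a toy growth loop. This is an illustration under simplifying assumptions, not the Swarm model from the talk: each new node adds exactly one link, and the "random" case here is uniform attachment during growth, which stands in for the random-graph hypothesis rather than reproducing the classical ER construction. The function name `grow_network` is hypothetical.

```python
import random
from collections import Counter


def grow_network(n_nodes, preferential, seed=0):
    """Grow a network one node at a time and return the list of degrees.

    preferential=True  -> target chosen with probability proportional to degree
    preferential=False -> target chosen uniformly among existing nodes
    """
    rng = random.Random(seed)
    degree = [1, 1]      # start from one edge between nodes 0 and 1
    endpoints = [0, 1]   # node i appears once per unit of its degree
    for new in range(2, n_nodes):
        if preferential:
            target = rng.choice(endpoints)
        else:
            target = rng.randrange(new)
        degree[target] += 1
        degree.append(1)
        endpoints.extend((target, new))
    return degree


uniform_degrees = grow_network(2000, preferential=False)
preferential_degrees = grow_network(2000, preferential=True)
print("max degree, uniform attachment:", max(uniform_degrees))
print("max degree, preferential attachment:", max(preferential_degrees))
print("preferential degree counts:", sorted(Counter(preferential_degrees).items())[:5])
```

Sampling from the `endpoints` list is a common way to get degree-proportional selection without recomputing weights at every step.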

Page 18

Computer Experiments

• Agent-based simulations
• Java programs using Swarm class libraries
  – Validation (docking) exercises using Java/Repast
• Grow artificial SourceForges (Epstein & Axtell, 1996)
  – Parameterized with observed data, e.g., developer behaviors
    • Join rates
    • New project additions
    • Leave projects
  – Evaluation of multiple models (hypotheses)
• Verification/falsification (simulation and hypothesis)
  – Ensemble averages of time series data
  – Distributions
  – Chi-squared tests
  – t-Tests
  – Kolmogorov-Smirnov tests
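Two of the comparison steps named above are easy to sketch: averaging a time series over an ensemble of simulation runs, and the two-sample Kolmogorov-Smirnov statistic (the largest gap between two empirical cumulative distributions). The code below is a plain-Python illustration with made-up numbers; the function names are hypothetical, and it returns only the statistic, not a p-value.

```python
def ensemble_average(runs):
    """Average several equal-length time series point by point."""
    return [sum(values) / len(values) for values in zip(*runs)]


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max distance between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value <= x, then compare CDFs at x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


# Made-up example: three simulated runs of some network statistic over time.
runs = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [2.0, 3.0, 4.0]]
print(ensemble_average(runs))

# Made-up example: simulated versus "observed" degree samples.
print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))
```

A small statistic means the simulated and observed distributions are close; turning it into an accept/reject decision requires a critical value or p-value, which this sketch leaves out.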

Page 19

Cycles of Modeling & Simulation

[Diagram: a cycle linking three stages]

• Observation: analysis of SourceForge data (degree distribution, average degree, diameter, clustering coefficient, cluster size distribution)
• Modeling (Hypothesis): social network models, ER => BA => BA+Fitness => BA+Dynamic Fitness
• Agent-Based Simulation (Experiment): grow artificial SourceForge

Page 20

Model for SourceForge

• ABM: collaborative social network
• Model description
  – Agent: developer
  – Behaviors: create, join, abandon and idle
  – Preference: developer’s and project’s
  – Fitness
• Four models in iterations
  – ER, BA, BA with constant fitness and BA with dynamic fitness
• Comparison of empirical and simulated data

Page 21

ER Model – Degree Distribution

• The simulated degree distribution is a normal distribution, while the empirical data follow a power law
• Fit fails!

Page 22

BA Model – Degree Distribution

• Power laws in degree distributions, similar to empirical data (o for simulated data and x for empirical data).
• For developer distribution: simulated data has R² of 0.9798 and empirical data has R² of 0.9714.
• For project distribution: simulated data has R² of 0.6650 and empirical data has R² of 0.9838.
• Partial fit!
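The slide does not say how the R² values were obtained. One common approach, assumed here, is an ordinary least-squares line fitted to log(count) against log(degree); a power law appears as a straight line on those axes. The sketch below uses synthetic counts that follow an exact power law, and `loglog_fit` is a hypothetical name.

```python
import math


def loglog_fit(degree_counts):
    """Fit log(count) = slope * log(degree) + intercept; return (slope, r_squared)."""
    points = [
        (math.log(k), math.log(c))
        for k, c in degree_counts.items()
        if k > 0 and c > 0
    ]
    if len(points) < 2:
        raise ValueError("need at least two positive (degree, count) pairs")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, r_squared


# Synthetic data: count = 1000 * degree^-2, an exact power law.
synthetic = {k: 1000.0 * k ** -2 for k in range(1, 11)}
slope, r_squared = loglog_fit(synthetic)
print("slope:", slope, "R^2:", r_squared)
```

On real degree counts the fit is noisier in the tail, so a high R² on log-log axes is suggestive of a power law rather than proof of one.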

Page 23

BA Model with Constant Fitness

• Power laws in degree distributions, similar to empirical data (o for simulated data and x for empirical data).
• For developer distribution: simulated data has R² of 0.9742 and empirical data has R² of 0.9714.
• For project distribution: simulated data has R² of 0.7253 and empirical data has R² of 0.9838.
• Improved fit!

Page 24

Discovery: Project Life Cycle

Page 25

BA Model with Dynamic Fitness

• Power laws in degree distribution, similar to empirical data (o for simulated data and x for empirical data).
• For developer distribution: simulated data has R² of 0.9695 and empirical data has R² of 0.9714.
• For project distribution: simulated data has R² of 0.8051 and empirical data has R² of 0.9838.
• Somewhat better fit!

Page 26

Models of the F/OSS Social Network (Alternative Hypotheses)

• General model features
  – Agents are nodes on a graph (developers or projects)
  – Behaviors: create, join, abandon and idle
  – Edges are relationships (joint project participation)
  – Growth of network: random or types of preferential attachment, formation of clusters
  – Fitness
  – Network attributes: diameter, average degree, degree distribution, clustering coefficient
• Four specific models
  – ER (random graph) - (1960)
  – BA (preferential attachment) - (1999)
  – BA (+ constant fitness) - (2001)
  – BA (+ dynamic fitness) - (2003)
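The fitness variants can be sketched by weighting each node's attractiveness by degree times fitness. The slides do not give the fitness rules, so everything specific here is an assumption made for illustration: fitness is drawn uniformly at node creation, and "dynamic" fitness is modeled as a hypothetical multiplicative `decay` applied at every step (`decay=1.0` reproduces constant fitness). The function name `grow_with_fitness` is also hypothetical.

```python
import random


def grow_with_fitness(n_nodes, decay=1.0, seed=0):
    """Grow a network where P(attach to i) is proportional to degree[i] * fitness[i].

    decay=1.0 keeps each node's fitness constant; decay<1.0 makes older nodes
    less attractive over time (an assumed stand-in for dynamic fitness).
    """
    rng = random.Random(seed)
    degree = [1, 1]                                   # one starting edge
    fitness = [rng.uniform(0.1, 1.0), rng.uniform(0.1, 1.0)]
    for new in range(2, n_nodes):
        weights = [d * f for d, f in zip(degree, fitness)]
        target = rng.choices(range(new), weights=weights, k=1)[0]
        degree[target] += 1
        degree.append(1)
        fitness = [f * decay for f in fitness]        # age existing nodes
        fitness.append(rng.uniform(0.1, 1.0))         # fresh fitness for newcomer
    return degree


constant = grow_with_fitness(300, decay=1.0)
dynamic = grow_with_fitness(300, decay=0.95)
print("max degree, constant fitness:", max(constant))
print("max degree, decaying fitness:", max(dynamic))
```

Recomputing the weights each step is quadratic in the number of nodes, which is acceptable for a small illustration but would need a smarter sampling structure at SourceForge scale.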

Page 27

Discussion

• Is simulation better for falsification, but weaker at confirmation of hypotheses?
• Under what conditions can simulation results be accepted as confirmation of a hypothesis?
  – Need more validation/verification of simulations
    • Confidence in results
    • Case of computer proofs (four color problem in mathematics)
    • Need for open source/open data
      – For replication of results?
      – For docking and model-2-model comparisons
• Or is the real value of simulation in “fishing around” to develop new hypotheses? Discovery?
  – Hidden relationships/rules-of-operations
  – Hidden features of components
  – Black-box, grey-box, white-box models
  – Discovery by reverse engineering

Page 28

Summary

• Why Agent-Based Modeling and Simulation?
  – Can be used as components of the Scientific Method
  – A research approach for studying socio-technical systems
• Case study: F/OSS - Collaboration Social Networks
  – SourceForge conceptual models: ER, BA, BA with constant fitness and BA with dynamic fitness.
  – Simulations
    • Computer experiments rejected some and confirmed plausibility of one hypothesis
    • Provided insight into the phenomenon under study and guided data mining of collected observations
    • Provided focus for additional data collection and “real experiments”.

Page 29

Thank you