
Game Theoretic Analysis for Offense-Defense Challenges of Algorithm Contests on TopCoder

Zhenghui Hu and Wenjun Wu
State Key Laboratory of Software Development Environment

Department of Computer Science and Engineering, Beihang University

Beijing, China 100191

[email protected] [email protected]

Abstract—Software crowdsourcing platforms such as TopCoder have successfully adopted an offense-defense based quality assurance mechanism in the software development process to deliver high-quality software solutions. TopCoder algorithm contests run single-round matches (SRMs) with a challenge phase that allows participants to find bugs in submitted programs and eliminate their opponents. In this paper, we introduce a game theoretic model to study competitive behaviors in the challenge phase of an SRM. By analyzing the Nash Equilibrium of our multiple-person game model, we find that the probability of making a successful challenge and the effort cost are the major factors in contestants' decisions. To verify the theoretical result, we perform empirical data analysis on a dataset collected from the algorithm challenge phase on TopCoder. The results indicate that contestants with a high rating are more likely to launch challenges against lower-rated ones. However, contestants with the highest ratings may be unwilling to challenge, to avoid the risk of losing points in the contests.

1. INTRODUCTION

Crowdsourcing is a powerful paradigm that allows the wisdom of crowds to be applied to more problems with faster and better results [1]. As a new and efficient form of innovation, it has already demonstrated its capability as a problem-solving mechanism for various groups, such as governments, businesses, nonprofits, researchers, artists, and even software design and development teams [2]. Obtaining quality software is a common but challenging goal for software crowdsourcing projects. Software quality has many dimensions, including reliability, performance, security, safety, maintainability, controllability, and usability. Based on observation of successful crowdsourced software testing practices such as TopCoder (www.topcoder.com) and uTest (www.utest.com), we find that offense-defense based quality assurance is one of the most fundamental ways to eliminate potential defects in the documents, models, and code submitted by the crowd. Multiple groups from the community undertake different responsibilities in software development tasks and cooperate intensively with each other to find problems in their counterparts' work and reduce the bugs in their own.

Although research efforts have been made to study the theory and practice of software crowdsourcing, little has been done on offense-defense quality assurance. Many research questions remain, such as the major decision factors for participants to launch offenses against their peers and the effectiveness of the offense-defense mechanism for quality assurance. In order to investigate the behaviors of participants in the offense-defense scenario of software crowdsourcing, this paper focuses on the offense-defense interactions that occur during the algorithm contests on the TopCoder platform.

TopCoder, a crowdsourcing development community, had attracted 716,180 registered members by December 2014 [3]. It applies competitive and engaging programming contests as the kernel of the platform's software development pattern. On TopCoder, a project is divided into several stages and presents people with various categories of contests, such as conception, specification, architecture, design, development, assembly, and test suites. Among these types, the algorithm contest is regarded as the most important programming contest on TopCoder and serves as an effective and primary way to attract and retain community members.

Algorithm contests, hosted fortnightly on TopCoder, are designed to encourage wide participation by the TopCoder community. The single round match (SRM), the main form of algorithm competition, consists of three phases: the coding phase, the challenge phase, and the system test phase. The major goal of the challenge phase is to collect valid test cases for finding faulty programs submitted by participants during the coding phase. It reflects a strong min-max interaction in which players take offensive actions against their peers and eliminate the weaker ones. Such a contest therefore provides a great opportunity to observe and analyze the competitive behaviors of contestants in a real offense-defense scenario such as the algorithm challenge phase.

In our previous research on TopCoder's algorithm contests [4], we proposed a two-person competitive model on the basis of game theory. In order to investigate further the behaviors and performance of contestants during the algorithm challenge phase, in this paper we extend that model to a multiple-person game model, along with empirical analyses.

The rest of the paper is organized as follows. Section 2 provides an overview of related work. Section 3 introduces our multiple-person game model for software crowdsourcing. Section 4 presents an empirical data analysis of SRM history on TopCoder to validate the game theoretic model. We conclude the paper and discuss future work in Section 5.

2. RELATED WORK

This section reviews related work in two main areas: software development crowdsourcing and game theory's applications in crowdsourcing.



A. Software Development Crowdsourcing

Crowdsourcing has become more and more popular for dealing with various issues that require human intelligence. Researchers have accordingly been prompted to apply it to software engineering, one of the most challenging and creative activities. Many companies, such as TopCoder, uTest, and oDesk, have already adopted this new development mode, software crowdsourcing, in different forms. TopCoder and uTest select members to solve problems, while oDesk serves as an online liaison between clients and freelance software developers [5]. Additionally, successful crowdsourcing platforms such as the App Store and TopCoder have demonstrated the capability and potential of crowdsourcing to support various software development activities, coding and testing included [6].

Several studies of software crowdsourcing have already been conducted. Lakhani, Garvin, and Lonstein [5] discussed a variety of TopCoder issues in detail, such as the evolution of the community, the resource management mode, challenges, and the future. In [7], competitive and collaborative software frameworks for online crowdsourcing were investigated. Moreover, empirical analyses of TopCoder have examined the strategic behaviors of contestants and collaborative competition in specific software development projects [8, 9]. Ke Mao et al. [10] addressed the important pricing questions that arise during software crowdsourcing development, while Wenjun Wu et al. [6] presented an evaluation framework for software crowdsourcing. More recently, [11] analyzed the key factors for software quality in crowdsourcing development, [12] discussed the influence of the competition level and tournament size on performance, and the relationships between customer, business, and technical characteristics were described in [13]. Nevertheless, crowdsourcing software development is still in its infancy, and few research efforts have addressed offense-defense quality assurance. It is therefore essential to do further work on competitive behaviors during the offense-defense process in software crowdsourcing.

B. Game Theory’s Applications in CrowdsourcingGame theory, introduced by von Neumann and Morgen-

stern, studies mathematical models of conflict and cooperationbetween intelligent rational decision-makers [14]. It has beenused in economics, political science, psychology, logic, andbiology, as well as crowdsourcing, by offering techniques forformulating competitions between parties that wish to reachan optimal position.

[15] and [16] applied game theory to the design of optimal crowdsourcing contests. [17] used all-pay auctions to evaluate the effects of reward size and early high-quality submissions on overall participation and submission quality on Taskcn. Other work has adopted game theory or all-pay auctions to seek strong incentives for workers to exert effort, to study the relationship between rewards and participation, and to investigate how to obtain high-quality outputs [18-23]. Most current game theoretic models of software crowdsourcing focus on design and programming contests in which players attempt to outperform others for contest prizes. Few research papers tackle the issue of strong competitions through which players can eliminate their opponents. In this paper, we apply game theory to study and analyze competitive behaviors in SRM challenges by extending the two-person model defined in [4].

Fig. 1: Algorithm contests on TopCoder

3. MULTIPLE-PERSON GAME MODEL FOR SOFTWARE CROWDSOURCING

A. Overview of TopCoder Challenge in SRM

For each rated algorithm contest, there are three phases: the coding phase, the challenge phase, and the system-testing phase [24, 25].

(1) The Coding Phase

The Coding Phase is a timed event in which all contestants are presented with the same three algorithm questions, representing three levels of complexity and, accordingly, three levels of point-earning potential. Within the time constraint of 75 minutes, contestants have to design a solution and implement it in a programming language. Upon submission of a solution to a problem, the system checks whether the submitted code compiles successfully. If so, the contestant is awarded points calculated from the problem difficulty and the elapsed time for accomplishing the solution.

(2) The Challenge Phase

The Challenge Phase is an offense-defense process whose goal is to collect test cases against the submissions. It presents each competitor with a chance to review their peers' source code and challenge the functionality of the solutions. Within 15 minutes, a contestant needs to locate the most vulnerable code and decide whether to send a test case to challenge it. A successful challenge results in the defender losing the original problem submission points and in a 50-point reward for the challenger. An unsuccessful challenger incurs a 25-point penalty, applied against their total score in that round of competition.

(3) The System-testing Phase

The System-testing Phase is not an interactive process. It is run by an automated tester responsible for checking the submitted source code against specified inputs. All successful challenges from the challenge phase are added to the set of inputs for the System-testing Phase. With the test cases from both the TopCoder team and the contestants in the challenge phase, the automated tester applies the test cases to all submitted code that has not already been successfully challenged. If the tester finds that a piece of code is flawed, the author of that code submission loses all of the points originally earned for it.

Additionally, each algorithm contest consists of two divisions, division I and division II, each of which hosts contestants based on their rating values. In each division, three algorithm problems with different difficulty levels are presented to the participants, who are asked to design programs as proper solutions to the problems within the given time in the coding phase. Players in division I are more skilled at the contests, as they are required to have a rating greater than or equal to 1200, while the contests in division II, with their lower entry requirement, allow less-skilled members and even novices to participate. In each division, all registered participants are placed into virtual competitive rooms of around 20 contestants, and this room assignment really only matters for the challenge phase.

In order to observe what decisions a rational person makes in a competitive environment, we concentrate on the behaviors and phenomena in division I, because we assume that members allowed to enter division I are already familiar with TopCoder and the algorithm contest, and will therefore make decisions rationally and reasonably during algorithm challenges. The simultaneous game of interest occurs in the virtual room during the algorithm challenge phase, in the form of challenges.

B. Description of Multiple-person Game Model

Consider n participants playing a simultaneous game in the form of challenges for some tempting rewards. This game is modelled by a tuple ⟨P, {Σ_i}, {μ_i}⟩, where P represents the set of n players, Σ_i denotes the finite strategy space of player i ∈ P, and μ_i(·) is the utility function of player i. Furthermore, we define π = (π_1, ..., π_n) as the strategy profile, which is an association of the strategies made by the n players. The strategy profile π can be described by a directed graph G_π = (P, E), where the set of nodes P = {p_1, p_2, ..., p_n} is equivalent to the set of players, and the set of directed edges E = {(p_i, p_j) | i challenges j's code at the challenge phase} represents the challenges in π.

A player i is characterized by a skill level s_i and resources r_i in the form of the scores earned for his performance at the coding phase. We presume that a player can challenge only one rival at a time; that is, players are not allowed to challenge more than one person simultaneously. Thereby, each player has n available options or actions: do not challenge, or choose one of the remaining n - 1 players to challenge. During the game, a successful challenge is awarded a return R, while an unsuccessful one is penalized by a loss L. Here we assume the game is one of complete information, which means that every player knows the strategies and payoffs available to all players in the same contest. For simplicity, we presume that each contestant is associated with one solution of some difficulty level. We consider this assumption reasonable because challenges can happen only one at a time, and only one of a contestant's source code solutions can be challenged at once.

The beginning of the algorithm challenge phase is defined as the model's initial state π_0, in which there are no challenges, and every contestant i has an associated ability power s_i and resources r_i in the form of scores obtained in the algorithm coding phase. In every strategy profile π ≠ π_0, we say that there is a challenge between i and j if either contestant i challenges contestant j or contestant j challenges contestant i.
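As an illustration of the model's bookkeeping, the sketch below represents a strategy profile as the directed graph G_π = (P, E) defined above, under the one-challenge-at-a-time assumption. All class and field names are hypothetical, chosen only to mirror the notation.

# A strategy profile as a directed graph: an edge (i, j) means
# "player i challenges player j's code" (a sketch of the model, not paper code).
from dataclasses import dataclass, field

@dataclass
class Player:
    name: str
    skill: float      # skill level s_i
    resources: float  # coding-phase score r_i

@dataclass
class StrategyProfile:
    players: list[Player]
    edges: set[tuple[str, str]] = field(default_factory=set)  # (challenger, defender)

    def challenge(self, challenger: str, defender: str) -> None:
        # Enforce the assumption that a player challenges at most one rival at a time.
        assert all(src != challenger for src, _ in self.edges), \
            "a player cannot challenge more than one rival simultaneously"
        self.edges.add((challenger, defender))

# pi_0, the initial state, is simply the profile with no edges.
profile = StrategyProfile([Player("i", 2100, 480.0), Player("j", 1450, 210.0)])
profile.challenge("i", "j")  # player i launches an offense against player j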

When player i attempts to challenge player j, there are only two possible results: either i's effort succeeds or it fails. Let X_{i,j} be a Bernoulli random variable representing the outcome of the challenge between i and j:

X_{i,j} = \begin{cases} 1 & \text{if } i \text{ succeeds} \\ 0 & \text{otherwise} \end{cases}    (3.1)

Additionally, in the challenge phase any player can be a challenger and be challenged at the same time, and in our algorithm game the number of challenges between two contestants can be zero, one, or two.

The probability that a player i wins the challenge against j is denoted by P(X_{i,j}). In our game model, we assume that a challenge entails a cost for its initiator, such as the time or financial expense incurred by the effort of finding test cases against other contestants' programs. We define C(X_{i,j}) to represent the cost of the challenge between contestants i and j. Specifically, if contestant i tries to challenge contestant j with a success probability of P(X_{i,j} = 1), player i's effort incurs a cost C_i(X_{i,j}); conversely, if contestant j aims to challenge contestant i with a winning probability of P(X_{j,i} = 1), the cost to player j is denoted C_j(X_{j,i}).

In each virtual room of algorithm challenges, under the strategy profile π_0 where no challenge occurs, the utility of every player is zero. Otherwise, we assume that each participant tries to maximize his expected utility. During the challenge phase, the expected utility of every player i ∈ P is the expected gain from the challenges he is involved in, minus the cost; the gain a player can achieve from a challenge depends upon his role in it. In the following, we discuss the utility/payoff functions for contestants under the scenario in which player i initiates a challenge against player j.

Define π_{ij} as the strategy profile in which contestant i initiates an offense against j while contestant j does not launch an active offense back. Then the gain for contestant i is

Gain_i(X_{i,j}) = \begin{cases} 50 & \text{for } X_{i,j} = 1 \\ -25 & \text{for } X_{i,j} = 0 \end{cases}    (3.2)

Therefore, the (expected) utility of player i in π_{ij} is

\mu_i(\pi_{ij}) = P(X_{i,j}) \cdot Gain_i(X_{i,j}) - C_i(X_{i,j})
              = P(X_{i,j} = 1) \cdot Gain_i(X_{i,j} = 1) + P(X_{i,j} = 0) \cdot Gain_i(X_{i,j} = 0) - C_i(X_{i,j})    (3.3)

And the gain for contestant j is

Gain_j(X_{i,j}) = \begin{cases} -r_j & \text{for } X_{i,j} = 1 \\ 0 & \text{for } X_{i,j} = 0 \end{cases}    (3.4)

Therefore the (expected) utility of player j in π_{ij} is

\mu_j(\pi_{ij}) = P(X_{i,j}) \cdot Gain_j(X_{i,j})
              = P(X_{i,j} = 1) \cdot Gain_j(X_{i,j} = 1) + P(X_{i,j} = 0) \cdot Gain_j(X_{i,j} = 0)
              = P(X_{i,j} = 1) \cdot Gain_j(X_{i,j} = 1)    (3.5)


C. Equilibrium of Multiple-person Game Model

Which strategy will contestants tend to select during the algorithm challenge phase? Do they always adopt an attacking strategy or a defensive one? What are the optimal strategies for each of them? To answer these questions, we analyze the Nash Equilibrium of our multiple-person game model.

Definition 1 (Nash Equilibrium): A strategy profile π* = (π*_1, π*_2, ..., π*_n) is a Nash Equilibrium (NE) if, for each player i, no other strategy can achieve higher utility than π*, i.e., \forall i \in P, \mu_i(\pi^*_1, \ldots, \pi_i, \ldots, \pi^*_n) \le \mu_i(\pi^*).

Although it is difficult to solve the Nash Equilibrium of the multiple-person game directly, one can take advantage of the characteristics of the problem to simplify the computation. As previously discussed, the players in our game are not able to challenge more than one person at a time. Therefore, each player's utility consists of two major parts: the utility yielded by his offense against his rival, and the loss incurred by the offenses against him:

\mu_i(\pi) = \mu_i(\pi_{i,j}) + \frac{\sum_{j=1}^{n-1} Gain_i(X_{j,i})}{\sum_{j=1}^{n-1} X_{j,i}}    (3.6)

Apparently, player i has no way to change the second part, because it depends entirely upon the decisions made by the other players in the same room. The player can only decide whether to be a challenger; if his decision is positive, he needs to evaluate the capabilities of the other players and select the most promising target against whom to launch an offense for his own gain. Under this assumption, we can reduce the Nash Equilibrium analysis of the multiple-person game to that of two-player games (one player against N - 1 players). In this paper, we focus on the direct utility output of these two-player games, which is enough to achieve the main goal of our research: studying contestants' behaviors under competitive circumstances. We then extract the competitions between two players from our multiple-person game model; the one-vs-multi game's utility matrix is shown in Table I.

In the game matrix, {C_1, ..., C_{i-1}, C_{i+1}, ..., C_n} denotes the challenge strategies available to player i, C represents the offensive strategy that one contestant out of the N - 1 players can adopt to challenge player i, and D represents the defensive strategy.

In order to understand contestants' rational actions in the face of competition, we infer their optimal strategies by solving for the matrix's Nash Equilibria, including the pure strategy Nash Equilibrium and the mixed strategy Nash Equilibrium.

Theorem 1 (Optimal Strategy): The optimal strategy π_{ik} for player i satisfies the following constraints:

(1) \forall j, j \ne k: P(X_{i,j} = 1) \le P(X_{i,k} = 1) and C_i(X_{i,k}) \le C_i(X_{i,j})

(2) P(X_{i,k} = 1) > \frac{C_i(X_{i,k}) - Gain_i(X_{i,k} = 0)}{Gain_i(X_{i,k} = 1) - Gain_i(X_{i,k} = 0)}

According to the game form in Table I, there are four major cases for the utility of every player i. Case 1: μ_i(π_{ik}) + μ_i(π_{ji}), when player i challenges player k out of the N - 1 players and some player j out of the N - 1 players also challenges player i. Case 2: μ_i(π_{ik}), when player i challenges player k and no one among the N - 1 players challenges player i. Case 3: μ_i(π_{ji}), when player i adopts the defensive strategy and a player j out of the N - 1 players challenges player i. Case 4: 0, when neither player i nor any other player chooses to launch an offense.

No matter what strategy the rival out of the N - 1 players chooses, what player i needs to evaluate is whether μ_i(π_{ik}) is positive. If there exists such a player k, the strategy profile π_{ik} is optimal for player i. Otherwise, if μ_i(π_{ik}) < 0 for every player k, the best response of player i is to stay defensive instead of launching challenges. In the case where μ_i(π_{ik}) = 0, player i can take either the offensive strategy or the defensive one.

Using Eq. (3.3), we can further derive the expression of μ_i(π_{ik}) as follows:

\mu_i(\pi_{ik}) = P(X_{i,k}) \cdot Gain_i(X_{i,k}) - C_i(X_{i,k})
              = P(X_{i,k} = 1) \cdot Gain_i(X_{i,k} = 1) + P(X_{i,k} = 0) \cdot Gain_i(X_{i,k} = 0) - C_i(X_{i,k})
              = P(X_{i,k} = 1) \cdot Gain_i(X_{i,k} = 1) + [1 - P(X_{i,k} = 1)] \cdot Gain_i(X_{i,k} = 0) - C_i(X_{i,k})
              = P(X_{i,k} = 1) \cdot [Gain_i(X_{i,k} = 1) - Gain_i(X_{i,k} = 0)] + Gain_i(X_{i,k} = 0) - C_i(X_{i,k})    (3.7)

The rules of the challenge phase of the TopCoder algorithm contests fix the constant values of Gain_i(X_{i,k} = 1) and Gain_i(X_{i,k} = 0) in Eq. (3.2). Thus, the variables in Eq. (3.7) are the probability of winning the challenge, P(X_{i,k} = 1), and C_i(X_{i,k}), the normalized cost of the effort to find a valid test case. From Eq. (3.7), we obtain the condition for player i to derive a positive utility from the challenge: μ_i(π_{ik}) > 0 if and only if P(X_{i,k} = 1) > V_i(X_{i,k}), where

V_i(X_{i,k}) = \frac{C_i(X_{i,k}) - Gain_i(X_{i,k} = 0)}{Gain_i(X_{i,k} = 1) - Gain_i(X_{i,k} = 0)}

Player i evaluates all the possible strategies for which the winning probability P(X_{i,k} = 1) is higher than the threshold value V_i(X_{i,k}). To achieve the maximum expected utility, he needs to select the weakest rival k, against whom he can find a valid test case with the highest winning probability P(X_{i,k} = 1) and the lowest effort C_i(X_{i,k}).

When the probability of a successful challenge against every rival is lower than the threshold value, i.e., \forall k, k \ne i, P(X_{i,k} = 1) < V_i(X_{i,k}), player i will avoid the offensive strategy because the offense utility μ_i(π_{ik}) < 0, so the player has no incentive to challenge other players. If this condition holds for every player, i.e., \forall i, k, k \ne i, P(X_{i,k} = 1) < V_i(X_{i,k}), every player prefers to stay defensive, keeping the strategy profile at π_0.
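The decision rule implied by Theorem 1 and Eq. (3.7) can be summarized in the following sketch, assuming a player already holds estimates of his winning probability and cost against each rival; these estimates are inputs we invent for illustration, not quantities observable on the platform.

# Best response under Theorem 1: challenge the rival maximizing expected utility,
# but only if P(X_ik = 1) exceeds the threshold V_i (a sketch under assumed inputs).
from typing import Optional

GAIN_WIN, GAIN_LOSE = 50.0, -25.0

def threshold(cost: float) -> float:
    """V_i = (C_i - gain_lose) / (gain_win - gain_lose)"""
    return (cost - GAIN_LOSE) / (GAIN_WIN - GAIN_LOSE)

def best_response(rivals: dict[str, tuple[float, float]]) -> Optional[str]:
    """rivals maps each rival's id to (P(X_ik = 1), C_i(X_ik)).
    Returns the rival to challenge, or None to stay defensive."""
    best, best_utility = None, 0.0
    for rival, (p, cost) in rivals.items():
        utility = p * GAIN_WIN + (1 - p) * GAIN_LOSE - cost  # Eq. (3.7)
        if p > threshold(cost) and utility > best_utility:
            best, best_utility = rival, utility
    return best

# With a cost of 5 points the threshold is (5 + 25) / 75 = 0.4, so only the
# second rival is worth challenging here.
print(best_response({"j": (0.35, 5.0), "k": (0.55, 5.0)}))  # -> "k"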

4. DATA ANALYSIS OF TOPCODER SRM CHALLENGES

As discussed in the previous section, there are Nash Equilibria in our multiple-person game model. If the probability of a successful challenge and the cost of challenging are known, with other factors fixed, contestants can straightforwardly calculate their optimal strategies. To validate the theoretical result, we collect a dataset from the TopCoder platform and investigate the winning probability P(X_{i,j} = 1) and the challenge cost C_i(X_{i,j}).


TABLE I: One-vs-Multi Game. Rows are player i's strategies (C_1, ..., C_n, challenging the corresponding rival, or D, staying defensive); columns are the strategies C (challenge i) and D (defend) of a contestant j from the N - 1 players. Each cell lists contestant j's payoff above contestant i's payoff.

          |  C                              |  D
  C_1     |  μ_1(π_{i1}) + μ_j(π_{ji})      |  μ_1(π_{i1})
          |  μ_i(π_{i1}) + μ_i(π_{ji})      |  μ_i(π_{i1})
  ...     |  ...                            |  ...
  C_j     |  μ_j(π_{ij}) + μ_j(π_{ji})      |  μ_j(π_{ij})
          |  μ_i(π_{ij}) + μ_i(π_{ji})      |  μ_i(π_{ij})
  ...     |  ...                            |  ...
  C_n     |  μ_n(π_{in}) + μ_j(π_{ji})      |  μ_n(π_{in})
          |  μ_i(π_{in}) + μ_i(π_{ji})      |  μ_i(π_{in})
  D       |  μ_j(π_{ji})                    |  0
          |  μ_i(π_{ji})                    |  0

Because the winning probability is a latent variable in the TopCoder SRM contest process, it is not straightforward to tell whether a participant is willing to take the offensive strategy or the defensive one. We use statistical analysis of the SRM dataset to answer the following research questions:

Q1: What factors are relevant to the probability of winning a challenge?

Q2: What determines the winning probability? Can it simply be expressed as the ratio between the allocated skill levels of the contestants, such as P(X_{i,j} = 1) = \frac{s_i}{s_i + s_j} for a challenge between contestants i and j? Or does some other mathematical formula for the probability exist?

In order to answer these questions, we implemented a Web crawler to download data on the TopCoder algorithm contests held before February 20, 2014. For each contest, we focus on the data relevant to the algorithm challenge phase. This collection is composed of records with 19 attributes: contest ID, division ID, room ID, room name, challenger ID, challenger name, defender ID, defender name, the challenged solution, the difficulty level of the solution, the scores given for the challenged solution, the challenger's old rating, the defender's old rating, the challenger's coding time, the defender's coding time, the challenger's solution score, the defender's solution score, the challenge time, and the result of the challenge (succeeded or not). Note that each player has an algorithm rating, calculated by TopCoder from his historical performance in the algorithm contests, which can be taken as a measure of the player's algorithm programming ability.

After a data cleaning process to remove invalid data items such as null values, we obtained 95,477 challenge items involving 576 algorithm contests. Among these contests, contest 14514 has the largest number of challenges, 853, while the smallest number of challenges is 1. We replaced the values "Yes" and "No" of the challenge result with 1 and -1. In addition to the extracted attributes above, we conjectured that the difference between the ratings of the two players involved in the same challenge might also be important, so we computed this difference value for every challenge item and added it to the subsequent analyses. On this dataset, we apply correlation analysis to study the primary factors related to the result of a challenge.
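A sketch of this preprocessing and correlation step in Python/pandas is shown below. The file name and column names are hypothetical stand-ins for the 19-attribute schema described above; the actual crawler output is not public.

# Preprocessing and correlation analysis of the challenge records (a sketch).
import pandas as pd

df = pd.read_csv("srm_challenges.csv")                 # hypothetical export
df = df.dropna()                                       # remove invalid items
df["result"] = df["result"].map({"Yes": 1, "No": -1})  # encode the outcome
df["rating_diff"] = df["challenger_old_rating"] - df["defender_old_rating"]

factors = ["challenger_old_rating", "defender_old_rating", "rating_diff",
           "challenger_coding_time", "defender_coding_time",
           "challenger_solution_score", "defender_solution_score",
           "challenge_time"]
# Pearson correlation of each factor with the challenge result.
print(df[factors + ["result"]].corr()["result"].sort_values(ascending=False))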

Fig. 2: Correlation analysis output 1 for algorithm challenges

Fig. 3: Correlation analysis output 2 for algorithm challenges

A. Correlation Analysis for Algorithm Challenges

From Figure 2 and Figure 3, one can see that the result of a challenge (succeeded or not) has a significant correlation with the old ratings, coding times, and solution scores of both challenger and defender, the challenge time, and the difference between the two competitors' ratings. Among these factors, the rating difference and the challenger's old rating are the two with the strongest influence on the challenge result. The correlations for the challenger's rating and submission score are positive, while those for the defender's old rating and the challenge time are negative. These correlation results are in line with intuition. Specifically, as the challenger's rating is an indicator of his programming skill, a challenger with a high rating seems more likely to find the bugs in his rival's code and successfully launch an offense if his rival is a low-ranked defender.


Fig. 4: The relationship between the result and the skill ratio p

Fig. 5: Algorithm rating division

Moreover, the negative correlation of the challenge time with the challenge outcome implies that within a shorter challenge time the challenger i has more chances to find a valid test case that fails rival j's code, as the total challenge time is limited to 15 minutes. Given that the challenge time is proportional to the effort cost C_i(X_{i,j}), a lower cost reduces the threshold value for player i to decide on an offense.

Guided by the correlation analysis results, we removed some irrelevant factors and retained the rest for follow-up data analysis. We then attempted to find an explicit mathematical expression for the challenge result or the winning probability in order to make predictions. Specifically, if an explicit formula could be found to compute the winning probability and the challenge cost for challengers, one could infer contestants' decision-making under competitive circumstances in advance.

First, we examine whether the ratio between the allocated skill levels of the players can by itself express the winning probability well. We test this expectation by computing the skill-level ratio and running a correlation analysis between the challenge result and the skill ratio. The result (see Figure 4) shows a significant positive correlation between them, but with a relatively low coefficient of 0.209. Thus, the skill-level ratio alone is not enough to express the winning probability well. We then explored using all the retained significant factors to find a proper mathematical formula for the probability, applying several regression and classification methods from statistics, including linear regression, multi-variable regression, logistic regression, principal component analysis (PCA), and support vector machines (SVM). However, none of these methods yielded a reliable prediction formula describing the functional relationship between the challenge result and the primary factors. One possible explanation is that some key factors not open to the public, e.g., the submitted solution source code, are missing from our data collection. If so, we would need to ask TopCoder to share more data with us for further research.
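The two modeling attempts can be sketched as follows, continuing the hypothetical dataframe from the previous sketch; this is illustrative scikit-learn usage under our assumed column names, not the authors' original pipeline.

# (1) The simple skill-ratio hypothesis, using ratings as the skill proxy.
df["skill_ratio"] = df["challenger_old_rating"] / (
    df["challenger_old_rating"] + df["defender_old_rating"])
print(df["skill_ratio"].corr(df["result"]))  # weakly positive (~0.2 in the paper)

# (2) Logistic regression of the outcome on the retained factors.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = df[factors], (df["result"] == 1)
model = LogisticRegression(max_iter=1000)
print(cross_val_score(model, X, y, cv=5).mean())  # cross-validated accuracy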

B. Empirical Analysis for Algorithm Challenges

Based on the above outcomes of our multiple-person game model and correlation analysis, we propose the following three hypotheses to study contestants' competitive behaviors in TopCoder algorithm contests. According to the rating system on TopCoder [26], we divide contestants into three rating levels marked with different colors, red, yellow, and blue (see Figure 5), referred to in this paper as rank 1 (2200+), rank 2 (1500-2199), and rank 3 (1200-1499).

Fig. 6: Win/loss ratio of challengers

H1: A contestant with a high rating is more likely tomake a successful challenge.

As the correlation analysis indicates, the challenger's old rating may have a positive and significant impact on the result of a challenge. To further test hypothesis H1, we computed the number and ratio of challengers in the two cases (win and loss) according to the rating division rule; Figure 6 shows the results.

As shown in Figure 6, the percentage of successful challenges increases with the challengers' rating level. We therefore consider the assumption that a contestant with a high rating is more likely to make a successful challenge to be valid.
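The H1 computation amounts to a groupby over the paper's rating division; a sketch, reusing the hypothetical dataframe from above:

# Success ratio of challengers per rating level (a sketch of the H1 check).
def rank(rating: float) -> str:
    if rating >= 2200:
        return "rank1 (2200+)"
    if rating >= 1500:
        return "rank2 (1500-2199)"
    return "rank3 (1200-1499)"

df["challenger_rank"] = df["challenger_old_rating"].apply(rank)
win_ratio = (df["result"] == 1).groupby(df["challenger_rank"]).mean()
print(win_ratio)  # expected to increase with the rating level under H1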

H2: High-ranked contestants seem to challenge more, and relatively low-ranked contestants are more likely to be chosen as rivals.

According to the correlation analysis, the rating difference and the challenger's old rating are the two factors with the strongest influence on the challenge result. In other words, it seems easier for a challenger with a higher rating, and a greater rating difference over the defender, to make a successful challenge. Based on this observation, we have reason to believe that high-ranked contestants are more motivated to challenge, having a high winning probability, while low-ranked contestants are more likely to be chosen as rivals. On the basis of the rating division, we counted the numbers of challengers and defenders to validate hypothesis H2. The outcomes, together with the frequency distribution histogram of the rating difference, are shown below.

According to Figure 7(a), the number of challengers does not go up with rating. Instead, players in the middle rating level (rank 2) form the biggest group actively launching challenges. One possible explanation is that a player's decisions about offense are affected by psychological factors such as confidence. Specifically, a player in rank 3 often has less experience in TopCoder algorithm contests and feels less confident launching a risky challenge against programs submitted by more experienced peers. Moreover, a player in rank 1 may think that his performance in the coding phase is good enough to stand out from the rest of the competition, and thus opt out of making the extra effort in the challenge phase.

Fig. 7: The frequency distributions for (a) challengers, (b) defenders, and (c) the difference value

Figure 7(b) indicates that the distribution of defenders is quite different from that of the challengers: the lower a player's rating, the more likely he is to be challenged by players with superior ratings.

From Figure 7(c), one can see that the distribution of the difference between players' ratings is very similar to a normal distribution. In other words, players are not always apt to challenge the one with the lowest skill level; on the contrary, they may choose to challenge the strongest players in the same competition room. In our view, they behave this way because they are ambitious to compete, to prove themselves, and to see who is the best, as Jeff Howe once observed [27].

H3: Contestants are less likely to challenge as the difficulty level rises.

From the Nash Equilibrium solution of our multiple-person game model, we know that if the probability P(X_{i,j} = 1) of a player making a successful challenge exceeds the threshold value V_i(X_{i,j}), he will select the challenge strategy. The threshold value is mainly determined by the effort cost of finding a valid test case. Obviously, it takes more effort to challenge a submitted solution of high difficulty, resulting in a higher cost. The expensive cost therefore raises the threshold value, which discourages a player from launching offensive challenges against difficult programming problems.

Fig. 8: The number and ratio of challenges for the problems with different difficulty levels [panels (a), (b)]

Observing the behaviors of contestants in algorithm contests on TopCoder, we find that in general the number of challenges is low compared to the scale of participation: there are about 20 members in a virtual competitive room, while the mean number of challenges is about 10, and the most frequent number of challenges is 2. We attribute this phenomenon to the fact that, for contestants in division I with ratings above 1200, the challenge cost seems too expensive to justify offensive challenges. During the algorithm challenge phase there are solutions to three problems with different levels of difficulty, and we cross-examine the number and ratio of challenged solutions for the three categories.

Figure 8 illustrates that both the number and ratio of challenged solutions decrease as the difficulty level of the corresponding problems increases. This is easy to understand: the challenge cost inevitably increases with the difficulty of the challenged solutions, and it then becomes harder for the winning probability to exceed the threshold value V_i(X_{i,j}). Consequently, fewer contestants are willing to challenge others. That is to say, fewer contestants will participate in the algorithm challenge phase as the cost increases (leaving the other factors unchanged).
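The H3 counts reduce to grouping the challenge records by problem difficulty; a sketch with the same hypothetical columns as the earlier snippets:

# Challenge counts per problem difficulty level (a sketch of the H3 check).
by_level = df.groupby("difficulty_level").size()
print(by_level)  # counts should fall as difficulty rises under H3
# Note: the challenged *ratio* additionally needs per-level submission counts,
# which would come from a separate coding-phase table not sketched here.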

5. CONCLUSION AND FUTURE WORK

This paper presents a multiple-person game model, built on complete information game theory, to study the competitive behaviors and phenomena that occur during algorithm challenges on TopCoder. From the Nash Equilibrium solution, we find that if a contestant's probability of making a successful challenge exceeds a threshold value related to the cost of launching the challenge, he will always decide to challenge.


The theoretical model is validated by empirical analysis of a dataset on the challenge phase of TopCoder SRM contests. The analytical results indicate the following conclusions. (1) Both the rating difference between the challenger and the defender and the challenger's old rating have a relatively strong and positive influence on the result of a challenge; with proficient programming skills, a contestant with a high rating is more likely to deliver a successful challenge. (2) Contestants with mid-level rankings tend to challenge more, and relatively low-ranked contestants are more likely to be chosen as rivals, but contestants with the highest ratings may be unwilling to challenge due to psychological factors. (3) Fewer contestants tend to be active in the algorithm challenge phase as the cost of initiating challenges increases.

The research results in this paper provide a better understanding of competitive behaviors in the offense-defense process of software crowdsourcing. Nonetheless, many open questions remain. In future work, we can put more effort into data collection to find the key determinative factors that express the winning probability in an explicit mathematical formula. More attributes of challenges, especially the source code and the test cases used in them, need to be incorporated into our study to reveal the latent rationale of challenge decisions in terms of the winning probability and the challenge cost. More importantly, these program-related attributes may enable us to assess the effectiveness of offense-defense based quality assurance in the scenario of TopCoder SRM challenges.

ACKNOWLEDGMENT

This work is funded by the National High-Tech R&D Program of China (Grant No. 2013AA01A210) and the State Key Laboratory of Software Development Environment (Funding No. SKLSDE-2013ZX-03).

REFERENCES

[1] "Crowdsourcing in 2014: With Great Power Comes Great Responsibility." http://www.wired.com/2014/01/crowdsourcing-2014-great-power-comes-great-responsibility/

[2] "Crowdsourcing." http://en.wikipedia.org/wiki/Crowdsourcing

[3] TopCoder. http://www.topcoder.com/

[4] Z. Hu and W. Wu, "A game theoretic model of software crowdsourcing," in 2014 IEEE 8th International Symposium on Service Oriented System Engineering (SOSE), April 2014, pp. 446-453.

[5] K. R. Lakhani, D. A. Garvin, and E. Lonstein, "TopCoder (A): Developing software through crowdsourcing," Harvard Business School, Tech. Rep., January 2010.

[6] W. Wu, W.-T. Tsai, and W. Li, "An evaluation framework for software crowdsourcing," Frontiers of Computer Science, vol. 7, no. 5, pp. 694-709, 2013.

[7] D. Fried, "Crowdsourcing in the software development industry," 2010.

[8] N. Archak, "Money, glory and cheap talk: Analyzing strategic behavior of contestants in simultaneous crowdsourcing contests on topcoder.com," in Proceedings of the 19th International Conference on World Wide Web, 2010, pp. 21-30.

[9] S. Nag, I. Heffan, A. Saenz-Otero, and M. Lydon, "SPHERES Zero Robotics software development: Lessons on crowdsourcing and collaborative competition," in IEEE Conference Publications, 2012.

[10] K. Mao, Y. Yang, M. Li, and M. Harman, "Pricing crowdsourcing-based software development tasks," in Proceedings of the 2013 International Conference on Software Engineering, 2013, pp. 1205-1208.

[11] K. Li, J. Xiao, Y. Wang, and Q. Wang, "Analysis of the key factors for software quality in crowdsourcing development: An empirical study on topcoder.com," in Proceedings of the 2013 IEEE 37th Annual Computer Software and Applications Conference, 2013, pp. 812-817.

[12] K. J. Boudreau, K. Lakhani, and M. E. Menietti, "Performance responses to competition across skill-levels in rank order tournaments: Field evidence and implications for tournament design," 2014.

[13] A. Finkelstein, M. Harman, Y. Jia, W. Martin, F. Sarro, and Y. Zhang, "App store analysis: Mining app stores for relationships between customer, business and technical characteristics."

[14] "Game theory." http://en.wikipedia.org/wiki/Game_theory

[15] N. Archak and A. Sundararajan, "Optimal design of crowdsourcing contests," in ICIS, 2009.

[16] S. Chawla, J. D. Hartline, and B. Sivan, "Optimal crowdsourcing contests," in Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, 2012, pp. 856-868.

[17] T. X. Liu, J. Yang, L. A. Adamic, and Y. Chen, "Crowdsourcing with all-pay auctions: A field experiment on Taskcn," in Proceedings of the American Society for Information Science and Technology, 2011.

[18] D. DiPalantino and M. Vojnovic, "Crowdsourcing and all-pay auctions," in Proceedings of the 10th ACM Conference on Electronic Commerce, 2009, pp. 119-128.

[19] Y. Zhang and M. van der Schaar, "Reputation-based incentive protocols in crowdsourcing applications," CoRR, vol. abs/1108.2096, 2011.

[20] G. Ranade and L. Varshney, "To crowdsource or not to crowdsource?" 2012. http://www.aaai.org/ocs/index.php/WS/AAAIW12/paper/view/5241

[21] D. Yang, G. Xue, X. Fang, and J. Tang, "Crowdsourcing to smartphones: Incentive mechanism design for mobile phone sensing," in Proceedings of the 18th Annual International Conference on Mobile Computing and Networking, 2012, pp. 173-184.

[22] B. Hoh, T. Yan, D. Ganesan, K. Tracton, T. Iwuchukwu, and J.-S. Lee, "TruCentive: A game-theoretic incentive platform for trustworthy mobile crowdsourcing parking services," in Proceedings of IEEE ITSC, 2012.

[23] A. Ghosh, "Social computing and user-generated content: A game-theoretic approach," SIGecom Exchanges, vol. 11, no. 2, pp. 16-21, 2012.

[24] http://blog.csdn.net/touzani/article/details/1633572

[25] http://apps.topcoder.com/wiki/display/tc/Algorithm+Overview

[26] Algorithm Rating System. http://help.topcoder.com/data-science/srm-and-mm-rating-systems/algorithm-rating-system/

[27] A. Bingham and D. Spradlin, Case Study: Virtual Software Development: How TopCoder Is Rewriting the Code. FT Press, 2011, ch. 7.
