Predicting the Success of Kickstarter Campaignsaldous/157/Old_Projects/Haochen_Z… · average...
Transcript of Predicting the Success of Kickstarter Campaignsaldous/157/Old_Projects/Haochen_Z… · average...
1
PredictingtheSuccessofKickstarterCampaigns
Peter(Haochen)Zhou
STAT157FinalProject
ProfessorAldous
December5th,2017
2
1. Introduction
Overthepastdecade,wehavewitnessedanincreasingpopularityincrowdfunding.
Crowdfundingisthepracticeoffundingaprojectorventurebyraisingmanysmall
amountsofmoneyfromalargenumberofpeople,typicallyviatheInternet.Duetoitseasy
accessibilityandbroadexposuretothepublic,entrepreneursandaspiringcreatorsseekto
utilizetheplatformtoraisemoneyfortheirprojects.Accordingtocrowdfundingresearch
firmMassolution,crowdfundinggrew167percentin2014.Toputthatintodollars,
crowdfundingplatformsraised$16.2billionin2014,escalatingto$34.4billionin2015.
For2016,crowdfundingtrendsarepredictedtopassVCfundingforthefirsttime.
Kickstarter,launchedin2009,isoneofthemajorcrowdfundingplatformsalongwith
Indiegogo,RocketHubandGoFundMe.Withamissiontohelpbringcreativeprojectstolife,
Kickstarterhasattractedover$13millionbackerstosuccessfullyfund134,767projects
withatotalamountexceeding$3.4billiontodate.InKickstarter,projectsarelistedinto15
categories:Art,Comics,Crafts,Dance,Design,Fashion,FilmandVideo,Food,Games,
Journalism,Music,Photography,Publishing,TechnologyandTheater.InKickstarter,
“Creators”arepeoplebehindtheprojectswhoareseekingfunding.“Backers”arepeople
whopledgemoneyforprojectstheybelieveinand“pledges”aremonetarycontributions
towardstheprojects.
Kickstarterhasaunique“All-or-Nothing”model,meaningunlessaprojectreachesits
fundinggoal,nobackerwillbechargedanypledgetowardsaproject.Ontheonehand,itis
lessriskyforbackersbecauseprojectsthatarenotfullyfundedhavelesslikelihoodof
completion.Ontheotherhand,itaddsmoreincentivesforcreatorstoconnectwiththe
publicandtofinishtheirprojects.AnotheruniquefeatureofKickstarteristhatbackerswill
onlyberewardedinexperienceorcreativeproductsinsteadofequity.Inaddition,
Kickstarterclaimsnoequityincreators’projectsandcreatorsmaintainfullownershipof
theirwork.
3
AlthoughnumerousprojectsenjoyedtremendoussuccessonKickstarter,theaggregated
averagesuccessrateisdecliningovertheyears.Datahaveshownthattheaggregated
averagesuccessrateofKickstartercampaignswas43.70%in2011,43.56%in2013and
35.82%atMay2017.Inthispaper,Iseektoexplorefactorsthataffectthesuccessof
Kickstartercampaigns.Ibelievethatsuccessratediffersacrossdifferentcategoriesand
priorresearcheshaveshownthatFilm&VideoandMusicarethelargestcategoriesand
haveraisedthemostamountofmoney.Film&Video,MusicandGamescombinedaccount
foroverhalfofthemoneyraised.MyprojectattemptstopredictthesuccessofKickstarter
campaignsbyanalyzingthesignificanceoffactorsidentifiedinthedata.Ihypothesizethat
amongallfactorslisted,thecreator’spriorsuccess,goal,numbersofperks,creator’s
Facebookfriends,totalwordcountinthedescriptionaremoresignificantthanother
factors.
2. DataandMethodsIcollectedmydatafromKaggle.com,theopenonlinedatabase.Thedatacontain4000
most-backedprojectsand4000liveprojectswithlimitedinformationsuchasthepledged
amount,category,goal,location,blurbandnumberofbackers.Anotherdatasetcontains
3652Kickstarterprojectsin2017withcomprehensiveinformationsuchasaproject’s
creator,goal,maincategory,duration,numberofcomments,numberofupdatesand
creator’sFacebookfriends.Thereare51uniquecharacteristicsforeachprojectandI
intendtousethisasmymainsourceofdata.Hereisasnapshotofmydataset:
Sincethedataispre-processed,IwillfirstcleanthedatainRtoconvertthemintoa“ready-
to-use”format.Secondly,IwillperformlinearregressioninStatatoexplorethe
significanceofkeyvariablestotestmyoriginalhypothesisthatthecreator’spriorsuccess,
goal,numbersofperks,creator’sFacebookfriendsandtotalwordcountinthedescription
aremoresignificantthanotherfactors.Inadditiontolinearregression,Iwillgenerateplots
4
tovisualizethedataandfurtherexploretherelationshipbetweensuccessrateandkey
factors.DuetoKickstarter’s“All-or-Nothing”model,Iwilllimitthescopeofmyprojectand
onlyanalyzeprojectswithsuccess=1(“success”isadummyvariableandtakesthevalueof
1whentheprojectmeetsitsfundinggoaland0otherwise).
GiventhedecliningaggregatedaveragesuccessrateofKickstartercampaigns,Ihopethis
papercanuncoverthemythsbehindthesuccessandserveasapredictionofsuccessfor
creatorsandbackers.
3. LiteratureReviewPriorresearchershavestudiedsimilartopicsandprovidedinsightfulfindingstothe
topic.Thereisevenawebsite(http://sidekick.epfl.ch)thatshowsreal-timepredictionof
thesuccessofKickstartercampaigns.Inthissection,Iwillconductliteraturereviewoftwo
academicpapers“PredictingtheSuccessofKickstarterCampaigns”(Hussain,Kamel&
Radhakrishna)and“LaunchHardorGoHome”(Etter,Grossglauser&Thiran)inorderto
gainmoreinsightsformyownanalysis.
In“PredictingtheSuccessofKickstarterCampaigns”,theauthorsusedamixtureof
simpleandcomplexfeaturestoperformtheirpredictivetask.Simplefeaturesinclude:time
features,locationfeatures,categoryfeatures,textfeatures,goal,staffpickedandnumberof
rewardlevels.Thecomplexfeaturesinclude:previousprojectsbycreator,averagemoney
perrewardlevel,preparednessanddistancetomeanblurblength.Afteridentifyingthe
features,theauthorsusedk-nearestneighbors,logisticregression,supportvector
classificationandrandomforesttocarryoutthemodel.K-nearestNeighborsproducedthe
resultthatonly4ofthesimplefeatureshavepredictivevalue:thegoalamount,lengthof
theprojectdescription,thenumberofrewardlevelsandtheaveragedollaramountper
rewardlevel.Inaddition,RandomForestisprovedtobethebestclassifieryieldingthe
accuracyof80.37%.Itauthorsproposedabettermodeltopredictthesuccessandfailureof
Kickstartercampaigns.
5
In“LaunchHardorGoHome”,theauthorsstudiedthepredictionofsuccessontwo
features:thetime-seriesofmoneypledgedandsocialattributes.Theyshowedthat
predictorsthatusetime-seriesfeaturesreachahighpredictionaccuracyof85%.Thesocial
predictorsreachaloweraccuracybutthetwofeaturescombinedyieldhighprediction
accuracy.Thetwopapersarehighlysignificantandlaidthefoundationforfuture
researchesonsimilartopics.
4. Analysis 4.1AnalysisonKickstarterprojectstatistics
6
FromthedataontheKickstarterwebsite,weobservethatacross15categories,the5
categoriesthathavethehighestsuccessratesare:Dance,Theater,Comics,MusicandArt.
The5categoriesthathavethelowestsuccessratesare:Technology,Journalism,Crafts,
FashionandPublishing.Itisintuitivethatsuccessratevariesacrossdifferentcategories
duetotheuniquenatureofeachcategory.
However,thedataitselfisnotconvincingenoughbecausethenumberoflaunchprojects
differssignificantlyacrosscategories.“Dance”onlyhas3,765launchprojectswhereas
“Film&Video”has64,773projects.Thesmallprojectsizemaynotberepresentativeofthe
wholepictureandthusweneedtoconductmorein-depthanalysis.
4.2LinearRegressiononkeyvariablesNext,Iwillconductlinearregressiontoexplorethesignificanceofkeyvariables.Inmy
regression,thedependentvariableisnaturallogoftotalpledges:ln(totalPledge)andIwill
addimportantindependentvariablesoneatatimetoexplorethebestfittedmodel.
Intuitively,Ihypothesizethatcreator’sinitialgoal,numberofcommentsundertheprojects,
numberofFacebookfriendsthecreatorhas,creator’spastsuccessrate,numberof
competitors,numberofperksthecreatorprovidesforbackers,totalwordcountinthe
blurb,numberofimagesinthedescriptionsandnumberofupdatesareimportantfactors
forsuccess.Hereattachedaregressiontablewithdifferentindependentvariables.Intheregression
output,adjustedR-squaredmeasureshowwellourindependentvariablesjointlyexplain
thevariationindependentvariable.WenoticethattheadjustedR-squaredhasabigjump
(0.3938to0.4538)fromregression(5)to(6).Itshowsthatpastsuccessrateisan
importantindependentvariableinourmodel.
7
regression (1) (2) (3) (4) (5) (6) (7)
dependent lnpledge lnpledge lnpledge lnpledge lnpledge lnpledge lnpledgeintercept 8.2597
(0.0403)
p=0.000
8.2309
(0.0389)
p=0.000
8.1947
(0.0497)
p=0.000
7.8686
(0.0927)
p=0.000
7.6155
(0.9492)
p=0.000
7.0257
(0.1011)
p=0.000
6.8416
(0.0994)
p=0.000
goal 0.000043
(1.66e-
06)
p=0.000
0.00004
(1.63e-
06)
p=0.000
0.00004
(1.64e-
06)
p=0.000
0.00004
(1.63e-
06)
p=0.000
0.00004
(1.60e-
06)
p=0.000
0.00003
(1.58e-
06)
p=0.000
0.00003
(1.56e-
06)
p=0.000
comments -- 0.001426
(0.0001)
p=0.000
0.00142
(0.0001)
p=0.000
0.00139
(0.0001)
p=0.000
0.00129
(0.0001)
p=0.000
0.00134
(0.0001)
p=0.000
0.00124
(0.0001)
p=0.000
friends -- -- 0.08287
(0.0709)
p=0.243
0.0906
(0.0706)
p=0.199
0.0869
(0.0689)
p=0.207
0.1301
(0.0655)
p=0.047
0.1320
(0.0633)
p=0.037
pastsuccess -- -- -- 0.5689
(0.1367)
p=0.000
0.5329
(0.1335)
p=0.000
0.4205
(0.1270)
p=0.001
0.3509
(0.1230)
p=0.004
competitors -- -- -- -- 0.0037
(0.0004)
p=0.000
0.0037
(0.0004)
p=0.000
0.0031
(0.0004)
p=0.000
perks -- -- -- -- -- 0.0709
(0.0055)
p=0.000
0.0558
(0.0055)
p=0.000
wordcount -- -- -- -- -- -- 0.0005
(0.0001)
p=0.000
images -- -- -- -- -- -- --
updates -- -- -- -- -- -- --
8
2R 0.3057 0.3560 0.3562 0.3631 0.3938 0.4538 0.4890
regression (8) (9)
dependent lnpledge lnpledge
intercept 6.8957
(0.0971)
p=0.000
6.8548
(0.0963)
p=0.000
goal 0.00003
(1.55e-06)
p=0.000
0.00003
(1.53e-06)
p=0.000
comments 0.0010
(0.0001)
p=0.000
0.0009
(0.0001)
p=0.000
Friends
0.1110
(0.6718)
p=0.072
0.1416
(0.0613)
p=0.021
pastsuccess 0.3920
(0.1200)
p=0.001
0.3967
(0.1187)
p=0001
competitors 0.0021
(0.0004)
p=0.000
0.0017
(0.0004)
p=0.000
perks 0.0472
(0.0055)
p=0.000
0.0409
(0.0055)
p=0.000
wordcount 0.0003
(0.0001)
p=0.000
0.00025
(0.00005)
p=0.000
images 0.0220 0.01987
9
(0.0025)
p=0.000
(0.0025)
p=0.000
updates -- 0.0379
(0.0065)
p=0.0002R 0.5144 0.5249
Analyzingtheregressionoutput,regression(9)givesusthebestfitwiththeadjustedR-
squaredof0.5249.Inregression(9),thedependentvariableisln(totalPledge)andthe
independentvariablesare:goal,numComments,noFacebookFriends,pastSuccessRate,
numCompetitors,numPerks,totWordCount,numImagesandnumUpdates.
Thus,myinitialmodelcanbewrittenas:
ln 𝑡𝑜𝑡𝑎𝑙𝑃𝑙𝑒𝑑𝑔𝑒 = b, + b.𝑔𝑜𝑎𝑙 + b/𝑐𝑜𝑚𝑚𝑒𝑛𝑡𝑠 + b4𝑓𝑟𝑖𝑒𝑛𝑑𝑠 + b8𝑝𝑎𝑠𝑡𝑠𝑢𝑐𝑐𝑒𝑠𝑠 +
b;𝑐𝑜𝑚𝑝𝑒𝑡𝑖𝑡𝑜𝑟𝑠 + b<𝑝𝑒𝑟𝑘𝑠 + 𝛽?𝑡𝑜𝑡𝑎𝑙𝑤𝑜𝑟𝑑 + 𝛽A𝑖𝑚𝑎𝑔𝑒𝑠 + 𝛽B𝑢𝑝𝑑𝑎𝑡𝑒
Plugginginthevaluesfromtheregressionoutput,mymodelis:
ln 𝑡𝑜𝑡𝑎𝑙𝑃𝑙𝑒𝑑𝑔𝑒
= b, + 0.0000261𝑔𝑜𝑎𝑙 + 0.0008538𝑐𝑜𝑚𝑚𝑒𝑛𝑡𝑠 + 0.1416243𝑓𝑟𝑖𝑒𝑛𝑑𝑠
+ 0.3967113𝑝𝑎𝑠𝑡𝑠𝑢𝑐𝑐𝑒𝑠𝑠 + 0.0017468𝑐𝑜𝑚𝑝𝑒𝑡𝑖𝑡𝑜𝑟𝑠 + 0.0408655𝑝𝑒𝑟𝑘𝑠
+ 0.0002534𝑡𝑜𝑡𝑎𝑙𝑤𝑜𝑟𝑑 + 0.0198685𝑖𝑚𝑎𝑔𝑒𝑠 + 0.0379527𝑢𝑝𝑑𝑎𝑡𝑒
10
Iuselog-linmodelfortheregressioninordertocorrectskeweddataandprovidemore
intuitiveinterpretationsofthecoefficients.Fromtheregressionresult,wecanobservethat
𝛽., 𝛽/, 𝛽8, 𝛽;, 𝛽<, 𝛽?, 𝛽A𝑎𝑛𝑑𝛽Baresignificantbecausetheirrespectivep-valueislessthan
0.01anditmeansthattheyaresignificantat1%significancelevel.However,weshould
notethat𝛽4hasap-valueof0.021,whichisnotsignificantat1%significancelevelbutis
significantat5%significancelevel.
Firstly,lookingattheregressionresult,wecanobservethatadjustedR-squaredis0.5249.
Itshowsthat52.49%ofchangeinln(totalpledge)canbeexplainedbytheindependent
variablesinthemodel.52.49%isadecentfittobeginwithandtheresultvalidatesmy
choiceofindependentvariables.
Secondly,fromtheregressionresult:numbersofFacebookfriends,numbersofperks,
numbersofupdates,numbersofimagesandpastsuccessrateinfluenceln(totalPledge)the
mostandarethusthemostsignificantindependentvariablesinregression(9).
Totestwhetherthismodelisagoodfit,Iwillperformtwotests:observedvs.predicted
valuestestandhomoscedasticitytest.
Totestforobservedvs.predictedvalue,Iplotln(totalPledge)against
ln(totalPledge)_predictinordertoexplorethelinearrelationship.Weshouldexpecta45-
degreepatterninthedata.Fromtheplot,ifweignoreoutliers,thelinearrelationshipis
relativelycloseto45-degree.Itshowsthatourmodelisdoingagoodjobinpredicting
lntotalPledge.
11
Totestforhomoscedasticity,Iuseresidualvsfittedvalueplottostudyifthereareany
patternsoftheresidualsplottedagainstfittedvalues.Iftherearenopatterns,itindicates
thatthemodeldoesnotviolatetheMultipleLinearRegressionassumptionsandourmodel
iswell-fitted.Ignoringtheoutliers,wecannotobserveanypatternsofresidualsvsfitted
valuesandtheerrorsappeartobehomoscedastic.Thus,wecanconcludethatourlinear
modelisagoodfit.
12
Iunderstandthattheremightbeomittedvariablesinthemodel.However,giventhemodel
isagoodfitprovedbytwotestsaboveandourindependentvariablescanexplainupto
52.49%ofthechangeinln(totalPledge),Iwillusethismodeltocarryoutmyanalyses.One
potentialimprovementofthispaperistoidentifyandstudyotherindependentvariables
thataresignificantinexplainingln(totalPledge).
4.3 HypothesesFormulationGiventheresultofmyregression,Iwillmodifymyhypotheses.Beforeconductingany
research,myoriginalhypotheseswere:Amongallfactorslisted,thecreator’spriorsuccess,
goal,numbersofperks,numberofFacebookfriends,totalwordcountinthedescriptionare
moresignificantthanotherfactors.Mymodifiedhypothesesare:numbersofFacebook
friends,numbersofperks,numbersofupdates,numbersofimagesandpastsuccessrateare
themostimportantfactorsforacampaign’ssuccess.
ForthenumbersofFacebookfriends,Ihypothesizethatthereisapositiverelationship.
ThemoreFacebookfriendsthecreatorhas,themoresupportheislikelytoreceive.When
thecreatorlaunchesaproject,theirFacebookfriendsaremorelikelytosupportthem
initiallycomparedtostrangers.Inaddition,inthecurrentera,socialmediaismore
powerfulthaneverandFacebookisanidealplatformtoconnectwithsupportersand
garnersupportforcreativeprojects.
Fornumbersofperks,Ihypothesizethatperksareattractivetobackers.Sincebackers
cannotclaimanyequityfromtheproject,perksaretheonlyrewardstheycanreceivefrom
creators.TheperksinKickstartervaryinformsandmonetaryvaluesandserveas
incentivesforbackers.
Therelationshipbetweenthenumberofupdatesandthesuccessofcampaignsisnot
obviousatthebeginning.Onemayarguethateachupdateconveysmoreinformationofthe
projectandprovidesmorevaluableinformationtohelpbackersmaketheirdecisions.
13
However,itisdifficulttoquantifythisrelationshipandmoreresearchneedstobe
conductedtofurtherexplorethisrelationship.
ThenumberofimagesincreasestheprobabilityofsuccessofKickstartercampaigns.
Backersaremoreattractedbyvisualimagesandthenumberofimageshelpsthem
understandtheprojectsbetter.Thus,itisimportantforcreatorstouseimagesto
communicatewiththeKickstartercommunityandattractbackers.
Creator’spastsuccessrateisanimportantindicatorofthesuccessoftheircurrentprojects.
Pastsuccessrateindicatesthecreator’sexperienceandconnectionstotheKickstarter
community.Creatorsarelikelytocreatemultipleprojectsinthesamecategoryandifthey
haveconnectedwithbackersfortheirpreviousprojects,itislikelythatthebackerswill
continuetosupporttheircurrentendeavors.However,inthedataset,itisdifficultto
distinguishnewcreatorsandcreatorswithallpriorfailuressincethepriorsuccessrates
areboth0.
4.4 DataVisualizationandSummaryStatisticsMydatasetcontains3652Kickstarterprojectsin2017only.Ofthetotal3652projects,
1503projectsmettheirfundinggoalsandareconsidered“successful”.Forthispaper,Iwill
analyzetheattributesof1503successfulprojects.Asmentionedabove,Kickstarterhasa
unique“All-or-Nothing”modelandevenprojectswithhugeamountsoftotalpledgeswill
failiftheydonotmeettheirfundinggoals.Anotherpotentialimprovementofthispaperis
toanalyzetheattributesof“failed”projectssuchasfundinggoals.First,Iwillvisualizethe
successrateineachofthe15categoriesin2017:
14
Forprojectsin2017,thethreecategoriesthathavethehighestsuccessratesare:Comics
(72.92%),Theater(71.88%)andDance(51.43%).Thethreecategoriesthathavethe
lowestsuccessratesare:Crafts(28.71%),Technology(26.25%)andJournalism(17.78%).
Theresultisconsistentwiththeall-timedata.SincethelaunchofKickstarter,the5
categoriesthathavethehighestsuccessratesare:Dance,Theater,Comics,MusicandArt.
Originally,Iexpectedthatthesuccessratebycategoryvariesoveryearduetotheshiftof
trend.Theresultissurprisingandshowsthatthesuccessratebycategoryhardlychanges
overtime.
InKickstarter,aprojectisconsidered“successful”onceitmeetsitsinitialfundinggoal.
However,successrateonKickstarteraloneisnotindicativeoftheproject’ssuccessin
general.Therearemanynuancesbehindthesuccessrate.Afterobservingthesuccessrate
bycategoryin2017,thenextstepistostudytherelationshipbetweenpledgeamountand
category.Itisnaturaltothinkthattotalpledgeamountisrelatedtotheprojectsize,which
differsbycategory.Nowweusedatatofurtherexplorethisrelationship.
0.00%
10.00%
20.00%
30.00%
40.00%
50.00%
60.00%
70.00%
80.00%
Successratein2017
15
Sincethereareonly8successfulprojectsinthecategory“Journalism”,Iomitthatcategory
fromtheanalysis.Aswecanseefromthegraphabove,across14categories,Technology,
Design,Food,GamesandFashionhavethehighestmeannaturallogoftotalpledges.Crafts,
Theater,Art,DanceandComicshavethelowestmeannaturallogoftotalpledges.Itverifies
ourpreviousassumptionthatpledgeamountisrelatedtoprojectsize,whichdiffersacross
categories.ProjectsinTechnology,GamesandFoodmightrequiremoreResearchand
Developmentcostsandmorecapitalinvestments.Thus,creatorsrequiremorefundingin
ordertocompletetheprojectsinthesecategories.Inaddition,duetothelargesizeofthe
projectsinthesecategories,thecreatorsaremorelikelytoraisehighamountsofpledges.
Naturally,wethinkthattotalamountofpledgesisrelatedtothenumberofbackers.
However,backerswithlargeamountsofpledgesmightaffecttheaccuracyofouranalysis.
Now,Iwillexploretherelationshipbetweennaturallogoftotalpledgesandthetotal
numberofbackersbycategorytostudywhichcategoriesaremorepronetooutliers.I
expectthelargerthetotalnumberofbackers,themorepledgesthereare.
16
Fromtheresultabove,wecanconcludethat:Technology,Games,DesignandFashionare
morepronetooutliersthanothercategories.Itmeansthatthesefourcategoriesaremore
likelytoattractbackerswithhugeamountsofpledges.Previously,wefoundoutthat
Technology,Games,DesignandFashionarefouroutofthefivecategoriesthatreceivethe
mostpledgesin2017.Wecanthusconcludethattheyareindeedthemostpopular
categoriesin2017.Inaddition,thecategorieswithmoreoutlierscorrespondtothe
categoriesthatreceivehigheraverageamountofpledges.Wecanconcludethatthelarger
thesizeoftheproject,themorelikelythecreatorwillreceivelargeamountsofpledges.
InKickstarter’s“All-or-Nothing”model,creatorsneedtoreceivethefullamountoftheir
goalswithinacertaintimeperiod,otherwise,theywillreceivenothing.Whilenearlyhalfof
theprojectsfail,therearestillalargenumberprojectsthataresignificantlyoverfunded.
Nowwestudytheaveragepercentageofoverfundedbycategory.Inouranalysis,
overfundingstatusiscalculatedbythetotalpledgedividedbycreator’sgoalanditgivesus
apercentageamount.
17
Whenwelookatthesummarystatisticsofoverfundedprojects,wefindthatthemeanis
640.1.Itindicatesthatthemeanamountoftotalpledgesforsuccessfulprojectsis6times
theoriginalgoal.However,thestandarddeviationisextremelylargeat9233.767andit
showsthattherearealotofvariationsintheoverfundingprojects.
Fromthecalculationofmeanpercentagegoal,wefindthatMusichasthehighest
percentagegoalof2724.68.Itshowsthatonaverage,projectsintheMusiccategorycan
achieveover27timesoftheirfundinggoals.Dancehasthelowestpercentagegoalof
115.4907anditshowsthatsuccessfulprojectsintheDancecategorybarelymeettheir
fundinggoalsonaverage.Therankingofallcategoriesaccordingtotheiroverfunding
statusis:Music,Technology,Fashion,Games,Design,Publishing,Comics,Art,Photography,
Journalism,Crafts,Food,Film&Video,TheaterandDance.Therankingprovidescreators
anindicationonhowwelltheirprojectswillperformbasedontheircategoriesaccordingto
thedatain2017.
Thedataonlycoverprojectsin2017anddonotrepresenttheoveralloverfundingstatusof
projectssincetheinceptionofKickstarter.However,Idonotexpectabigdifferenceaswe
2724.68%
602.83%490.52%
115.49% 0.00%
500.00%
1000.00%
1500.00%
2000.00%
2500.00%
3000.00%
Percentagegoalachievedin2017
18
foundoutpreviouslythatthesuccessrateacrosscategoriesisconsistentthroughoutthe
years.
Afterexploringdifferentattributesofsuccessfulprojectsbycategory,Iwillnowuseatable
tosummarizethestatisticsofthe5independentvariablesinmyhypotheses:
Fromthesummary,wecanobservethemin,median,mean,maxandstandarddeviationof
eachindependentvariable.ThenumberofFacebookfriendshasthewidestrange(1to
5000)andlargeststandarddeviationof795.Thesuccessratehasthelowestrange(0to1)
andsmalleststandarddeviationof0.26.Theseindependentvariablesareprovenbythe
linearmodelidentifiedbeforeandtheycollectivelyexplainupto52.49%ofvariationsin
predictednaturallogofpledgeamounts.
19
ThenumberofFacebookfriendsisacrucialfactorforthesuccessofprojectssincethe
moreFacebookfriendsacreatorhas,themoreconnectionshehaswiththecommunity.
However,thenumberofFacebookfriendsisalsotheindependentvariablethathasthe
largeststandarderror.SomepeoplemayarguethatthemoreFacebookfriendsacreator
has,thebetter.Iwouldarguethatalthoughacreatorshouldhaveacertainnumberof
Facebookfriends,thequalityoftheirFacebookfriendsisalsoimportant.Fromtheplot
above,weobservethattherelationshipbetweenthenumberofFacebookfriendsandthe
naturallogoftotalpledgeamountisextremelyweak.Intheageofsocialmedia,people
tendtobefriendwithstrangersonsocialmediaplatforms.Inaddition,somepeoplemay
evenpurchase“friends”and“followers”onFacebookorTwittertoappeartobemore
popularthantheyactuallyare.Althoughitmayseemthatthecreatorhasawide
connectiononsocialmediawithalargenumberoffollowers,thetruthisthatthe“fake
friends”willnothavemuchinfluenceonthesuccessofprojects.Ingeneral,whilewe
recognizethesignificanceofthenumberofFacebookfriendsonthesuccessofprojects,we
shouldnotignorethenuancesbehindthenumberofFacebookfriends.
TostudywhetherthenumberofFacebookfriendsvariesbycategory,Igeneratethe
followinggraph:
20
Aswecanobservefromtheplotabove,theredoesnotseemtobeastrongrelationshipof
theaveragenumberofFacebookfriendsacrossdifferentcategories.However,Musicand
DancecategoriesstandoutsincecreatorshavemoreFacebookfriendsonaverage
comparedtoothercategories.Therecouldbevariousreasonsbehindthisfinding.For
example,creatorsinMusicandDancecategoriesmaytendtobemoreartisticandspend
moretimeandenergyonsocialmedia.However,thereasonisnotvalidenoughwithout
furtherresearchandIwillexcludethisdetailedpartofanalysisfrommypaper.
Asforthenumbersofperks,onaveragecreatorsprovidebackerswitharound9perksand
therangeisfrom1to74.Fortheproject,Iamusingthenumberoftheperksinmyanalysis
insteadofthevaluesoftheperks.Thenumbersofperksprovideanincentiveforbackers
whentheyconsideraboutbackingtheprojectssincetheycannotclaimanyequityfromthe
projects.Hereisthevisualizationofthenumberofperksandthenaturallogoftotal
pledges.Aswecanobservefromtheplotbelow,thereisapositiverelationshipbetween
thenumberofperksandthenaturallogoftotalpledge.Itshowsthatperksareattractiveto
backersandcaninfluencetheirdecisionsonKickstarter.
Tostudywhetherthenumberofperksvariesbycategory,Igeneratethefollowinggraph.
Fromthegraph,wecanobservethattheaveragenumberofperksaresimilaracross
categoriesexceptforComics.Inourpreviousanalysis,wefoundoutthatprojectsinthe
Comicscategoryhavethehighestsuccessratein2017.Thehighsuccessratecouldbedue
21
tothelargenumberofperkscreatorsprovideinComicscategory.Perksareindeed
importantforthesuccessofKickstartercampaigns.
Anotherimportantindependentvariableinmymodelispastsuccessrate.Themeanpast
successrateis0.5737andthemaxis1.Thedatavisualizationisattachedbelow:
Aswecanobservefromtheplot,themajorityofprojectshaveapastsuccessratebetween
0.5and1andonly2projectshavepastsuccessratebelow0.Wewouldexpectcreators
22
whohaveahighpastsuccessratetobemoreexperiencedinKickstarterandhavemore
supportersintheKickstartercommunity.Asaresult,theywillhaveahigherprobabilityof
successfortheirnewprojects.Creatorswhohaveahighpastsuccessratemayalsohave
moremotivationstostartnewprojects.Inaddition,pastsuccessratecanserveasthe
“resume”forcreatorsandmayinfluencethechoiceofbackers.Backerswilllikelytohave
morefaithincreatorswithhighpastsuccessrates.
Asfarasthenumberofsuccessfulcampaigns,creatorswhohaveasuccessfulprojectin
2017have1.993successfulprojectsonaverageinthepastwiththestandarddeviationof
2.89.Itfurthershowsthatnotallcreatorswithcurrentsuccessfulprojectshaveextensive
experiencesonKickstarter.However,moreexperiencesdefinitelyhelpcreatorsin
launchingtheirfuturecampaigns.
5. ConclusionandFutureWorkInmyproject,IexploredkeyfactorsthataffectthesuccessofKickstartercampaigns.Dueto
thedecliningsuccessrateovertheyears,IbelievethatmyprojectisrelevantandIhopeto
uncoverthemythsbehindthesuccessofKickstartercampaigns.
Iidentifiedalinearregressionmodelthatexplainsupto52.49%ofthevariationsinnatural
logoftotalpledge.Inmymodel,thedependentvariableisthenaturallogoftotalpledges
andtheindependentvariablesare:goal,numComments,noFacebookFriends,pasSuccessRate,
numCompetitors,numPerks,totWordCount,numImagesandnumUpdates.Amongthenine
independentvariables,thefivevariablesthathavethelargestcoefficientare:Facebook
friends,numberofperks,numberofupdates,numberofimagesandpastsuccessrate.
23
Inmygraphicanalysis,Ifirstfindoutthatthesuccessrateofcategoriesstaysconstant
throughtime.Inaddition,projectsinTechnology,GameandDesignhavethelargest
averagenumberoftotalmoneyraised.Coincidently,projectsinTechnology,Gameand
Designhavemoreoutliersthanothercategories.Itshowsthatprojectsinthesecategories
havelargersizesonaverageandrequiremorefunding.Moreover,forprojectsin2017,the
topthreeoverfundedcategoriesare:Music,TechnologyandFashion.
TostudydeeplyaboutthenumberofFacebookfriends,Iconcludethattheaveragenumber
ofFacebookfriendsstaysrelativelyconstantacrosscategories.However,creatorstendto
havemoreFacebookfriendsonaverageinMusicandDancecategories.Asforthenumber
ofperks,thereisapositiverelationshipbetweenthenumberofperksandthenaturallogof
totalpledge.Andthemorepastsuccessacreatorhas,themorelikelyhewillsucceedinthe
currentproject.
Ibelievethatthereisstillmuchroomforimprovementsformyproject.Tobeginwith,my
dataonlycoverprojectsin2017.MyconclusionwillbemoreaccurateandpredictiveifI
canobtaintheall-timedataofKickstarter.Inaddition,thereareothercharacteristicsof
theprojectsthatmightbeinterestingtoexploresuchastheduration,locationetc.Lastbut
notleast,Ihopetogainmoreknowledgeaboutmachinelearningandbuildamodelwith
moreaccuracyinpredictingthesuccessofKickstartercampaigns.
Thisprojectdefinitelybroadensmyhorizononstatisticalmethodsandanalysis.Iwould
liketothankProfessorAldousforhisguidanceandsupportontheproject.Ihopetokeep
learninginthefieldofStatisticsandequipmyselfwithmoreknowledgetoconductmore
in-depthanalysisinthefuture.
24
6. AppendixStatacommand:
clearall
importexcel"\\Client\H$\Desktop\STAT157\Kickstarter_success.xlsx",sheet("Sheet1")
firstrow
genlntotalPledge=ln(totalPledge)
reglntotalPledgegoal
reglntotalPledgegoalnumComments
reglntotalPledgegoalnumCommentsnoFacebookFriends
reglntotalPledgegoalnumCommentsnoFacebookFriendspastSuccessRate
reglntotalPledgegoalnumCommentsnoFacebookFriendspastSuccessRate
numCompetitors
reglntotalPledgegoalnumCommentsnoFacebookFriendspastSuccessRate
numCompetitorsnumPerks
reglntotalPledgegoalnumCommentsnoFacebookFriendspastSuccessRate
numCompetitorsnumPerkstotWordCount
reglntotalPledgegoalnumCommentsnoFacebookFriendspastSuccessRate
numCompetitorsnumPerkstotWordCountnumImages
reglntotalPledgegoalnumCommentsnoFacebookFriendspastSuccessRate
numCompetitorsnumPerkstotWordCountnumImagesnumUpdates
predictlntotalPledge_predict
labelvariablelntotalPledge_predict"lntotalPledge_predict"
scatterlntotalPledgelntotalPledge_predict
rvfplot,yline(0)
Rcode:
```{r}
library(readxl)
Kickstarter_success<-read_excel("~/Desktop/STAT157/Kickstarter_success.xlsx")
head(Kickstarter_success)
25
```
```{r}
a<-Kickstarter_success[Kickstarter_success$mainCategory=='Art',]
b<-Kickstarter_success[Kickstarter_success$mainCategory=='Food',]
c<-Kickstarter_success[Kickstarter_success$mainCategory=='Music',]
d<-Kickstarter_success[Kickstarter_success$mainCategory=='Games',]
e<-Kickstarter_success[Kickstarter_success$mainCategory=='Publishing',]
f<-Kickstarter_success[Kickstarter_success$mainCategory=='Photography',]
g<-Kickstarter_success[Kickstarter_success$mainCategory=='Design',]
h<-Kickstarter_success[Kickstarter_success$mainCategory=='Comics',]
i<-Kickstarter_success[Kickstarter_success$mainCategory=='Technology',]
j<-Kickstarter_success[Kickstarter_success$mainCategory=='Fashion',]
k<-Kickstarter_success[Kickstarter_success$mainCategory=='Theater',]
l<-Kickstarter_success[Kickstarter_success$mainCategory=='Journalism',]
m<-Kickstarter_success[Kickstarter_success$mainCategory=='Dance',]
n<-Kickstarter_success[Kickstarter_success$mainCategory=='Film&Video',]
o<-Kickstarter_success[Kickstarter_success$mainCategory=='Crafts',]
```
```{r}
A<-c(121/269,70/232,191/373,196/447,170/420,29/79,158/347,105/144,110/419,
79/254,45/64,8/45,18/35,174/423,29/101)
labels<-
c("Art","Food","Music","Games","Publishing","Photography","Design","Comics","Technolog
y","Fashion","Theater","Journal","Dance","Film&Video","Crafts")
pie(A,labels=labels,radius=1,cex=0.8)
```
```{r}
summary(Kickstarter_success$facebookFriends)
summary(Kickstarter_success$numPerks)
summary(Kickstarter_success$numUpdates)
summary(Kickstarter_success$numImages)
26
summary(Kickstarter_success$pastSuccessRate)
```
```{r}
sd(Kickstarter_success$facebookFriends)
sd(Kickstarter_success$numPerks)
sd(Kickstarter_success$numUpdates)
sd(Kickstarter_success$numImages)
sd(Kickstarter_success$pastSuccessRate)
```
```{r}
statistics<-matrix(c(1,574.5,824.1,5000,795.2921,1,8,9.617,74,6.144794,0,4,5.48,44,
5.653509,0,8,14.8,117,16.42112,-0.5,0.5,0.5737,1,0.2567152),ncol=5,byrow=TRUE)
colnames(statistics)<-c("Min","Median","Mean","Max","SD")
rownames(statistics)<-c("FacebookFriends","Perks","Updates","Images","Successrate")
statistics<-as.table(statistics)
statistics
```
```{r}
library(ggplot2)
p1<-ggplot(Kickstarter_success,aes(x=facebookFriends))
p1+geom_histogram()+ggtitle("NumberofFacebookfriendsinsuccessfulprojects")
p2<-ggplot(Kickstarter_success,aes(x=numPerks))
p2+geom_histogram()+ggtitle("Numberofperksinsuccessfulprojects")
p3<-ggplot(Kickstarter_success,aes(x=numUpdates))
p3+geom_histogram()+ggtitle("Numberofupdatesinsuccessfulprojects")
p4<-ggplot(Kickstarter_success,aes(x=numImages))
p4+geom_histogram()+ggtitle("Numberofimagesinsuccessfulprojects")
p5<-ggplot(Kickstarter_success,aes(x=pastSuccessRate))
p5+geom_histogram()+ggtitle("Pastsuccessrateinsuccessfulprojects")
```
27
```{r}
g6<-ggplot(Kickstarter_success,aes(x=facebookFriends,y=log10(totalPledge)))
g6+geom_point()+geom_smooth()
g7<-ggplot(Kickstarter_success,aes(x=numPerks,y=log10(totalPledge)))
g7+geom_point()+geom_smooth()
g8<-ggplot(Kickstarter_success,aes(x=numUpdates,y=log10(totalPledge)))
g8+geom_point()+geom_smooth()
g9<-ggplot(Kickstarter_success,aes(x=numImages,y=log10(totalPledge)))
g9+geom_point()+geom_smooth()
g10<-ggplot(Kickstarter_success,aes(x=pastSuccessRate,y=log10(totalPledge)))
g10+geom_point()+geom_smooth()
```
```{r}
g11<-ggplot(c,aes(x=facebookFriends,y=log10(totalPledge)))
g11+geom_point()+geom_smooth()
```
```{r}
Kickstarter_success%>%
group_by(mainCategory)%>%
filter(n()>10)%>%
ggplot()+geom_point(aes(x=mainCategory,y=numPerks),alpha=0.3,color="tomato")+
geom_boxplot(aes(x=mainCategory,y=numPerks),alpha=0)+
ggtitle("NumberofperksbyCategory(n>10)")+xlab("Category")+
theme(axis.text.x=element_text(angle=90,hjust=1))+
ylab("Numberofperks")+coord_flip()
```
```{r}
28
Kickstarter_success%>%
group_by(mainCategory)%>%
filter(n()>10)%>%
ggplot()+geom_point(aes(x=mainCategory,y=facebookFriends),alpha=0.3,
color="tomato")+
geom_boxplot(aes(x=mainCategory,y=facebookFriends),alpha=0)+
ggtitle("NumberoffacebookFriendsbyCategory(n>10)")+xlab("Category")+
theme(axis.text.x=element_text(angle=90,hjust=1))+
ylab("NumberoffacebookFriends")+coord_flip()
```
ggplot(Kickstarter_success,aes(totalNumberBackers,lntotalPledge))+geom_point()+geom_s
mooth()+facet_wrap(~mainCategory)
```{r}
summary(Kickstarter_success$percentageGoal)
sd(Kickstarter_success$percentageGoal)
summary(Kickstarter_success$numSuccessfulCampaigns)
sd(Kickstarter_success$numSuccessfulCampaigns)
```
29
7. ReferencesChen, K., Jones, B., Kim, I., & Schlamp, B. (n.d.). KickPredict : Predicting Kickstarter Success.
CrowdExpert. (n.d.). Total crowdfunding volume worldwide from 2012 to 2015 (in billion U.S. dollars).
Etter, V., Grossglauser, M., & Thiran, P. (2013). Launch Hard or Go Home! Predicting the Success of Kickstarter Campaigns.
Hussain, N., Kamel, K., & Radhakrishna, A. (n.d.). Predicting the success of Kickstarter campaigns.
Kickstarter. (2017). Kickstarter Stats.