
Approving Automation: Analyzing Requests for Permissions of Bots in Wikidata

Mariam Farda-Sarbas
Human-Centered Computing | Freie Universität Berlin
mariamfardasarbas@fu-berlin.de

Hong Zhu
Human-Centered Computing | Freie Universität Berlin
hongzhu@fu-berlin.de

Marisa Frizzi Nest
Human-Centered Computing | Freie Universität Berlin
marisanest@fu-berlin.de

Claudia Müller-Birn
Human-Centered Computing | Freie Universität Berlin
clmb@inf.fu-berlin.de

ABSTRACT
Wikidata, initially developed to serve as a central structured knowledge base for Wikipedia, is now a melting point for structured data for companies, research projects, and other peer production communities. Wikidata's community consists of humans and bots, and most edits in Wikidata come from these bots. Prior research has raised concerns regarding the challenges for editors to ensure the quality of bot-generated data, such as the lack of quality control and knowledge diversity. In this research work, we provide one way of tackling these challenges by taking a closer look at the approval process of bot activity on Wikidata. We collected all bot requests, i.e. requests for permissions (RfP), from October 2012 to July 2018. We analyzed these 683 bot requests by classifying them regarding activity focus, activity type, and source mentioned. Our results show that the majority of task requests deal with data additions to Wikidata from internal sources, especially from Wikipedia. However, we can also show the existing diversity of external sources used so far. Furthermore, we examined the reasons which caused the unsuccessful closing of RfPs. In some cases, the Wikidata community is reluctant to implement specific bots, even if they are urgently needed, because there is still no agreement in the community regarding the technical implementation. This study can serve as a foundation for studies that connect the approved tasks with the editing behavior of bots on Wikidata to understand the role of bots better for quality control and knowledge diversity.

KEYWORDS
Wikidata, Bot, Task Approval

ACM Reference Format:
Mariam Farda-Sarbas, Hong Zhu, Marisa Frizzi Nest, and Claudia Müller-Birn. 2019. Approving Automation: Analyzing Requests for Permissions of Bots in Wikidata. In The 15th International Symposium on Open Collaboration (OpenSym '19), August 20–22, 2019, Skövde, Sweden. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3306446.3340833

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
OpenSym '19, August 20–22, 2019, Skövde, Sweden
© 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-6319-8/19/08...$15.00
https://doi.org/10.1145/3306446.3340833

1 INTRODUCTION
Wikidata, the sister project of Wikipedia, was launched in October 2012. Similar to Wikipedia, Wikidata is a peer production community with more than 20 K active users^1. It serves as the primary source for structured data of Wikimedia projects, especially for Wikipedia. Wikidata is also currently being used by commercial services, such as Apple's assistant Siri, in research projects, such as WikiGenomes^2, and in other peer-production communities, such as OpenStreetMap^3. Thus, the quality of data in Wikidata is crucial, not only for Wikidata itself but also for the projects which rely on Wikidata as their source of data.

However, early research on Wikidata's community has already shown that, in addition to an active human editor community, bots are responsible for the majority of edits (e.g., [10, 16, 17]). Contrary to Wikipedia, bots were massively used in the Wikidata community from the beginning of the project; therefore, it seems they fulfill an essential duty in Wikidata. Bots have been studied for some time in the context of Wikipedia (e.g. [3, 5, 9]); however, their role in the context of the Wikidata community is less explored. Concerns were raised recently about the role of bots in Wikidata. The quality of bot-generated data, for example, and their influence on quality control and knowledge diversity have been questioned [17].

This motivation led to the research presented in this paper, which is guided by the question: What type of bot activities are approved by the Wikidata community? We are interested in understanding what type of tasks the Wikidata community permits to be performed through bots, i.e. to automate.

As opposed to previous work [10, 16, 20] or [18], which was based primarily on actual bot edits, we analyzed the requests for permission (RfP) process and the accompanying information in Wikidata. In the RfP process, a bot operator applies for task approval for each task a bot is intended to carry out. The community then decides on each task.

We collected all task descriptions from October 31, 2012, to July 01, 2018, and classified and analyzed them manually. Our results suggest that the community uses bots mainly to add data to Wikidata. These data originate primarily from Wikipedia. Furthermore, the main reasons for unsuccessful RfPs are either the operators themselves, because the specification of the task is insufficient, or the community, which could not agree on the design of the technical implementation. The contributions of our research are as follows:

^1 https://www.wikidata.org/wiki/Wikidata:Statistics
^2 Community-created genome data portal, available at www.wikigenomes.org.
^3 An online map with an open license, available at https://www.openstreetmap.org.


• We investigated bots from a community perspective based on RfPs.

• We provided a classification scheme for the categorization of RfPs.

• We used our defined scheme to identify bot-requested tasks and data sources.

• We analyzed how the community decides on RfPs and classified the reasons for unsuccessful RfPs.

• We developed a dataset of bots' RfPs.

In the following sections, we first present existing research on bots in peer production, focusing on Wikimedia projects, and explain how our work builds on it. We then describe our dataset and classification approach. Next come the findings of the study, followed by a discussion of the results. We conclude this article by highlighting possible directions for future research.

2 BOTS IN PEER PRODUCTION
Very different bots populate peer production communities. These bots collect information, execute functions independently, create content, or mimic humans. The effects of bots on peer production systems and our society are increasingly being discussed, for example, when influencing voting behaviour [1] or imitating human behaviour [13].

Bots have been used in Wikipedia from early on^4. Wikidata's community has profited from these experiences when handling their bots. We therefore review existing insights into the bot community on Wikipedia and, building on that, highlight research on Wikidata that considers bot activity.

2.1 Bots in Wikipedia
Bots are software programs that automate tasks, usually repetitive or routine tasks which humans consider time-consuming and tedious (e.g. [3, 4, 16, 21]). They are operated and controlled by humans.

Wikipedia has developed a stable and increasingly active bot community over the years, although bots were not widely accepted and trusted in the beginning [14]. Halfaker et al. distinguish four types of bots in Wikipedia [9]: (1) bots that transfer data from public databases into articles, (2) bots that monitor and curate articles, (3) bots that extend the existing functionality of the MediaWiki software underlying Wikipedia, and (4) bots that protect against vandalism.

The majority of research focuses on bots that protect against vandalism. Geiger et al. [7], for example, investigated the process of fighting vandalism in Wikipedia by using trace ethnography. They show how human editors and bots work together to fight vandalism in Wikipedia. They conjecture that such a distribution of concerns to human and algorithmic editors might change the moral order in Wikipedia. Halfaker et al. [8] show how a distributed cognitive network of social and algorithmic actors works efficiently together to detect and revert vandalism on Wikipedia. In another study, Geiger & Halfaker [5] investigated the impact of a counter-vandalism bot's downtime on the quality control network of Wikipedia and found that, during this downtime, the quality control network performed more slowly but was still effective.

^4 https://en.wikipedia.org/wiki/Wikipedia:History_of_Wikipedia_bots#rambot_and_other_small-town_bots


Another line of research focuses on how such tools transform the nature of editing and user interaction. Geiger shows how a weak but pre-existing social norm was controversially reified into a technological actor [3]. He refers to the example of the HagermanBot, which was implemented to sign unsigned discussion entries in Wikipedia. Halfaker et al. [9] show that bots are not only responsible for the enforcement of existing guidelines on a larger scale, but also that their activities can have unexpected effects. The number of reverts of newcomers' edits, for example, has risen, while (surprisingly) the quality of those edits has stayed almost constant. Editors increasingly apply algorithmic tools for monitoring the edits of newcomers; in 2010, 40 percent of rejections of newcomers' contributions were based on these algorithmic tools [9]. This contradicts the community's attempts to engage more new editors. Moreover, Geiger and Halfaker defined bots as assemblages of code and a human developer and show that bot activity is well aligned with Wikipedia's policy environment [6].

This research suggests that bots are more critical to the success of the Wikipedia project than previously expected, despite the reluctance of the Wikipedia community to allow bots at the beginning [14]. Bots have a significant role in maintaining this text-based knowledge base, especially in fighting vandalism. As bots in Wikidata have their roots in Wikipedia, we expect to see similarities between bots in both peer production systems, Wikipedia and Wikidata. Before we look closer to see whether the same areas of bot activity emerge in Wikidata, we give an overview of the existing insights into the bot community in Wikidata.

2.2 Bots in Wikidata
Wikidata inherited bots from its sister project Wikipedia, and bots started editing Wikidata with its launch by linking Wikidata item pages to their respective Wikipedia language pages. The current research on Wikidata bots shows that bots perform most of the edits in Wikidata [16, 20]. Steiner [20], in his research, aims to understand the editing distribution of editors on Wikidata and Wikipedia. He provides a web application to observe real-time edit activity on Wikidata for bots and logged-in and anonymous users. The study shows that the number of bot edits has grown linearly and that most of Wikidata's edits, i.e. 88%, are bot edits.

Müller-Birn et al. [16] confirm these insights in a later study by examining the community editing patterns of Wikidata through a cluster analysis of contributors' editing activities. They determine six editing patterns of the participating community (i.e. reference editor, item creator, item editor, item expert, property editor, and property engineer) and show that bots are responsible for simpler editing patterns, such as creating items or editing items, statements, or sitelinks.

Further studies focus on how bot edits contribute to data quality in Wikidata. In one study on Wikidata's external references, Piscopo et al. [18] find that the diversity of external sources in bot edits is lower than in human edits. In another study, Piscopo et al. [19] explore the influence of bot and human (registered and anonymous) contributions on the quality of data in Wikidata.


[Figure 1: Example of a request-for-permissions (RfP) page with the extracted fields: bot name, operator name, tasks, code, function details, decision date, decision.]

The research shows that equal contributions of humans and bots have a positive impact on data quality, while more anonymous edits lead towards lower quality.

Hall et al. [10] analyzed anonymous edits on Wikidata to detect bots in this group which had not previously been identified in Wikidata. The study shows that two to three percent of the contributions previously considered human contributions (more than 1 million edits) came from unidentified bots. They emphasize that this might be a concerning issue for Wikidata and all projects relying on these data: even if vandalism causes only a small portion of these edits, the damaging effect on Wikidata might be high.

This danger is reflected by Piscopo [17], who highlights significant challenges for Wikidata that might endanger its sustainability: namely, a lack of quality control because of the large amount of data added by bots, a lack of diversity because of the usage of only a few sources, and existing threats to user participation because of bot usage.

Existing research focuses primarily on the activity levels of bots in Wikidata based on their edits. Some work (e.g. [17]) conveys the impression that bots are independent of humans. However, this ignores the fact that humans operate bots. The community officially grants most bot activities in a well-defined process.

The literature shows that bots have so far been studied mainly from the activity angle, both in Wikipedia and in Wikidata. In Wikipedia, bots are used primarily for quality assurance tasks, i.e. vandalism detection, and for maintenance tasks, for example, removing or replacing templates on articles, while Wikidata's bots perform tasks mostly related to content editing. A possible reason could be the structured nature of Wikidata's content, which is less challenging to deal with than the unstructured data in Wikipedia. It is interesting to dig into the content editing activities which bots have requested most in Wikidata and which were approved by the community. While adding more content through bots seems to contribute to data quality from the completeness angle, this raises the question of whether these data are based on sources and improve data quality from the trustworthiness aspect. The following study provides the grounds for answering such questions in the future. We investigate the RfP process of bots in more detail.

We describe which tasks Wikidata's community finds useful for bots, and in which cases the community does not support a bot request.

3 APPROVING BOT TASKS
In this section, we describe the bot approval process on Wikidata, explain our data collection, detail the classification process, and finally give an overview of the resulting dataset.

3.1 Approval Process in Wikidata
Building on the experiences of the Wikipedia community, Wikidata has had a well-defined policy system for bots almost from the beginning (November 2012)^5. Except for cases like editing Wikidata's sandbox^6 or their own user pages, every (semi-)automatic task carried out by a bot needs to be approved by the community. This means that before operators can run their bots on Wikidata, they have to open an RfP for their bot^7. The RfP is thus the formal way of requesting bot rights for an account, where the decisions on RfPs are based on community consensus.

An RfP is triggered either by an editor's need or by a community request^8. Such a bot request is well documented and available to all interested community members on Wikidata.

Figure 1 shows a typical RfP page. It consists of a bot name^9, an operator name (bot owner), tasks (a brief description of what the bot intends to do), a link to the source code, function details (a detailed description of the bot's tasks and the sources of imported data), and the final decision (RfP approved or not approved, with a reason).

Bot owners have to use a template for providing all information for the decision-making process and need to clarify questions regarding their request during the decision-making process. They also often have to provide a number of test edits (50 to 250 edits).

^5 https://www.wikidata.org/w/index.php?title=Wikidata:Bots&oldid=549166
^6 Wikidata's sandbox can be used for test edits: https://www.wikidata.org/wiki/Wikidata:Sandbox
^7 All open requests are available at www.wikidata.org/wiki/Wikidata:Request_for_permissions/Bot.
^8 An example can be found at https://www.wikidata.org/w/index.php?title=Wikidata:Bot_requests&oldid=38203229#Polish_%22Italian_comune%22_descriptions.
^9 According to the bot policy, bot accounts are recommended to have the word bot in their names.


Table 1: Example of the information extracted from an RfP page

URL                www.wikidata.org/[...]/Bot/MastiBot
Bot Name           MastiBot
Operator Name      Masti
Task and Function  add basic personal information based on biography-related infoboxes from plwiki
Decision           approved
Decision Date      15.09.2017
First Edit         03.09.2017
Last Edit          21.02.2018
No. of Edits       8
No. of Editors     5

The community handles RfPs in accordance with the Bot Approval Process^10. The community discusses whether the requested task is needed and whether the implementation of the task functions properly. After the decision has been made, an administrator or a bureaucrat closes the request: if approved, a bureaucrat will flag the account; if not, the request is closed stating the reason for the unsuccessful request.

After a successful request, each bot operator has to list the task the bot performs, with a link to the RfP, on its user page. However, the bot operator is required to open a new RfP if there is a substantial change in the tasks the bot performs. Consequently, bots can have multiple requests, which are shown on the user page. Furthermore, in special cases, a bot is also allowed to carry out administrative tasks, such as blocking users or deleting or protecting pages. In this case, the operator needs to apply for both the RfP and the administrator status^11.

3.2 Data Collection
We collected all bot requests that were in the final approval stage in July 2018^12, i.e. we ignored all tasks without a final decision, from Wikidata's archive for requests^13. We collected our data based on web scraping, i.e. web data extraction programs implemented in Python. Bot requests which were listed several times in different archives were only parsed once^14. This resulted in 685 task approval pages.

We extracted the following information from each page (cf. Figure 1): URL, bot and operator name, decision, decision date, tasks, code, and function details. We additionally collected the date of the first and last page edit^15, the number of page edits, and the number of distinct editors who contributed to the request for each page by using the MediaWiki API^16. This extracted information (cf. Table 1) was processed and stored in a relational database^17.

^10 A detailed description is available at www.wikidata.org/wiki/Wikidata:Bots.
^11 Further information can be found at https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Administrator.
^12 The first request in our data set was opened on October 31, 2012, and the last one on June 29, 2018.
^13 www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Archive/Requests_for_bot_flags
^14 For example, www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/VIAFbot is listed in www.wikidata.org/wiki/Wikidata:Requests_for_permissions/RfBot/March_2013 and in www.wikidata.org/wiki/Wikidata:Requests_for_permissions/RfBot/April_2013, and only the latter one was used.
^15 The first edit can be interpreted as the request opening and the last edit as the request closing or the date of archiving.
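As an illustration of this collection step, the following Python sketch gathers the per-page metadata just described (first and last edit, number of edits, number of distinct editors) for a single RfP page via the standard MediaWiki revisions API. It is a minimal sketch for one page, not the authors' actual scraper (which is released on their GitHub repository); the function name rfp_page_stats and the omission of error handling are assumptions for brevity.

    import requests

    API = "https://www.wikidata.org/w/api.php"

    def rfp_page_stats(title):
        """Fetch revision metadata for one RfP page via the MediaWiki API."""
        revisions, cont = [], {}
        while True:  # follow API continuation until all revisions are read
            params = {
                "action": "query", "format": "json", "prop": "revisions",
                "titles": title, "rvprop": "timestamp|user",
                "rvlimit": "max", "rvdir": "newer", **cont,
            }
            data = requests.get(API, params=params).json()
            page = next(iter(data["query"]["pages"].values()))
            revisions += page.get("revisions", [])
            cont = data.get("continue", {})
            if not cont:
                break
        return {
            "first_edit": revisions[0]["timestamp"],   # request opening
            "last_edit": revisions[-1]["timestamp"],   # closing or archiving
            "no_of_edits": len(revisions),
            "no_of_editors": len({r.get("user") for r in revisions}),
        }

    print(rfp_page_stats("Wikidata:Requests for permissions/Bot/MastiBot"))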

3.3 Task Approval Data
We collected 683 distinct requests from Wikidata; however, two of them are no longer available, and consequently, we excluded them. Of the resulting 681 requests, 600 (88%) were successful and 81 (12%) were unsuccessful.

An average of five people (σ = 2.8) participated in an approval request, for example, by taking part in the discussion or stating the final decision. These people accounted for an average of slightly above ten edits per request (σ = 8.5).

Based on the requests collected, we identified 391 distinct bots and 366 distinct bot operators on Wikidata (cf. Table 2). Some operators applied for more than one task for their bot in one request (e.g. update label, update description, update alias), with five being the highest number. Furthermore, bot owners can operate more than one bot. The majority of operators (319 editors) have only one bot, while three editors were running three bots, and there was even one operator managing seven bots simultaneously. Similarly, one bot can also be operated by more than one user; for example, three editors were managing the ProteinBoxBot together.

Table 2: Overview of task requests (n = 683, number of all task requests)

Decision Result   Requests   No. of Bots   Operators
Successful        600        323           299
Unsuccessful      81         68            67
Both              681        391           366

3.4 Classification Process
We classified the collected data manually to gain a deeper understanding of the different tasks that bot operators carry out on Wikidata. In the following, we describe this process in detail.

We employed the qualitative method of content analysis, which is used to analyze unstructured content [12], in our case the text in the RfPs. We classified the data manually using the open coding approach of grounded theory. The data we were dealing with were the applicants' own sentences (in vivo codes); thus, we developed an emergent coding scheme and categorized the textual information. Two of the authors^18 coded the data in a three-phase process, ensuring the consistency of the codes and reducing possible bias during the coding.

We read a sample of the data collected and discussed the diverse request details. We noticed that some common categories could be found within all requests based on particular perspectives, such as the potential actions of the requesting bots and the main sources of the data to be imported by bots. We thus focused on the task and function details of the RfP page and extracted the intended use (task) and the data source of the bot edits which were approved or closed as successful.

^16 www.mediawiki.org/wiki/API:Main_page
^17 The database is released under a public license on GitHub: https://github.com/FUB-HCC/wikidata-bot-request-analysis
^18 The first and second authors of this paper.



In the first phase, we categorized 30 randomly chosen RfPs collaboratively to develop a shared understanding. Based on the first categories, we carried out a second round in which another set of 30 randomly chosen RfPs was categorized individually. We then compared the results and checked the agreement level^19. After discussing diverging cases and cross-validating our category set, we continued with another round of categorizing the data (40 RfPs) individually, in which we had 39 agreements out of the 40 cases. We continued with further rounds of individual coding; we met frequently on a regular basis, discussed new or unclear cases, and cross-validated our category sets to ensure the consistency of our data classification.
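For reference, the agreement level mentioned here and in footnote 19 is Cohen's kappa, which can be computed directly from the two coders' label sequences. The following is a minimal, self-contained Python sketch; the ten example codes are invented for illustration and are not taken from the actual dataset.

    from collections import Counter

    def cohens_kappa(coder_a, coder_b):
        """Cohen's kappa for two coders labeling the same items."""
        n = len(coder_a)
        # Observed agreement: share of items with identical codes.
        p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
        # Expected agreement from each coder's marginal label distribution.
        freq_a, freq_b = Counter(coder_a), Counter(coder_b)
        labels = set(coder_a) | set(coder_b)
        p_e = sum((freq_a[l] / n) * (freq_b[l] / n) for l in labels)
        return (p_o - p_e) / (1 - p_e)

    # Invented example: activity-type codes assigned to ten RfPs by two coders.
    a = ["add", "add", "update", "add", "unknown", "add", "delete", "add", "update", "add"]
    b = ["add", "update", "update", "add", "unknown", "add", "delete", "add", "add", "add"]
    print(round(cohens_kappa(a, b), 2))  # 0.66 for this toy sample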

We developed a codebook^20 as the primary reference of the classification process, starting from the first phase and updating it continually during the classification period. The codes are structured in verb-noun pairs denoting content entities and operations, which are inspired by the work of Müller-Birn [15] and were initially drafted after the first 30 RfP classifications. Based on this, we developed the codebook further by structuring all newly presented information into verb-noun pairs and adding them to the codebook. The first category describes the potential scope of bot activity. The sub-categories contain all the entity types of Wikidata, such as items, claims, statements, references, and sitelinks. The unknown category refers to tasks without a clear definition. We developed an overview which shows a scheme of all entity types and their respective activities, together with the number of requests for these edits (cf. Table 3).
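To make the verb-noun structure concrete, a single coded RfP can be thought of as a record like the following; the bot name and the field values are hypothetical and are not entries from the published codebook.

    # Hypothetical coded record for one RfP: each code is a verb-noun pair,
    # combining an activity type (verb) with an edit focus (Wikidata entity type).
    coded_rfp = {
        "url": "www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/ExampleBot",
        "codes": [
            ("add", "claim"),        # import new claims into items
            ("update", "sitelink"),  # fix sitelinks of moved pages
        ],
        "source": ("internal", "Wikipedia"),
        "decision": "approved",
    }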

Another dimension of our classification considers the origin of the data which the bot operators plan to include in Wikidata. This category contains three main sub-categories: internal, external, and unknown sources. Internal sources are those within the Wikimedia sphere, whereas external sources are outside Wikimedia. Those requests without a specific source of data were categorized as unknown.

After coding all successful RfPs, we became curious about the RfPs which were not approved: were there some unusual task requests, or did operators have difficulties in convincing the community of their bots' safe activities? To find this out, we did another round of coding, covering the 81 unsuccessfully closed RfPs. We classified these RfPs as before, except that we additionally included the reasons for the request failure. The category unsuccessful thus contains all RfPs that were not approved, for reasons such as being duplicates, inactive for a long time, not in line with the bot policy, or withdrawn (cf. Table 6).

4 RESULTS
We organized the results of our RfP analyses in two sections: firstly, we focus on the approved bot requests and their contents, and secondly, we look into the details of highly discussed task approval requests.

^19 We measured the inter-rater reliability by using Cohen's kappa. At this stage, we had a substantial agreement level, i.e. 0.65.
^20 The codebook, along with the data, is provided on GitHub: https://github.com/FUB-HCC/wikidata-bot-request-analysis

Table 3: Task requests categorized into edit focus, activity type, and the request result (approved/denied) (n = 681, number of all task requests)

Edit Focus    Activity      Approved   Denied
Item          add           85         12
              update        40         3
              merge         6          1
              mark-deleted  1          0
Term          add           22         5
              update        11         0
Label         add           31         3
              update        15         1
              delete        1          0
Description   add           33         5
              update        10         1
Alias         add           4          0
              update        1          0
              delete        1          1
Statement     add           141        19
              update        15         0
              delete        1          1
Claim         add           161        18
              update        33         7
              delete        13         3
Qualifier     add           7          0
Reference     add           9          3
              update        4          0
              delete        1          0
Rank          update        1          0
Sitelink      add           52         3
              update        23         4
              delete        1          0
              merge         1          0
              revert        1          0
Badge         add           4          0
Page          add           0          1
              update        8          1
              delete        1          0
Community     maintain      25         5
Misc          --            26         13

4.1 Approved Requests
In this part, we show the types of activities which were approved, the sources they used for editing, and the relationship between bots and their operators. We focus on three aspects of the tasks: the activity focus, the activity type, and the origin of the data used by the bots.

4.1.1 Activity Type and Focus. The activity focus considers what the bots have planned to edit in Wikidata. It can be seen in Table 3 that most task requests aim to edit items in Wikidata. The different categories represent the different parts of an item, which are organized into terms (i.e. labels, descriptions, and aliases), statements (i.e. claims with qualifiers), references, ranks, and sitelinks with badges^21.


[Figure 2: Approved task requests organized into activity type (upper half: 70% add, 20% update, 3% maintain, 3% delete, 3% unknown, 1% revert) and edit focus (lower half: 26% Claim, 20% Statement, 16% Item, 16% Term, 12% Sitelink, 5% Misc, 3% Unknown, 2% Reference). n = 794; 262 RfPs are requests with multiple tasks; the edit focus Term consists of Alias, Label, and Description; the edit focus Miscellaneous contains the smaller categories, such as Qualifier, Rank, Badge, and maintenance-related data.]

The number of approved task requests shows that the majority of bots seem to focus on adding data to statements in general, and to claims more specifically. Figure 2 shows the distribution of approved tasks into activity types and edit focuses. Again, the majority of task requests deal with the addition of data.

Furthermore, there are 30 requests dealing with community concerns regarding maintenance activities, such as archiving pages/sections, moving pages, reverting edits, removing duplicates, and generating statistical reports, to name some examples.

In addition to this, there are 24 requests classified as unknown tasks, half of which are written extremely briefly^22 or in an unclear way^23, so that it was difficult for us to identify their potential tasks. Some other cases could only be classified to a specific task based on particular assumptions; for instance, ShBot^24 uses the bot framework QuickStatements^25 to batch edits, which suggests a great likelihood of importing data to statements. However, this tool could also be used to remove statements or to import terms, so it remains a challenge to identify a request's focus without analyzing the edits. We categorized all tasks abiding strictly by our codebook; thus, all the RfPs which needed assumptions are classified as unknown.

We assumed that data addition to Wikidata is related primarily to Wikidata's inception, i.e. that the majority of task approvals would occur more at the beginning of Wikidata's lifetime. However, Figure 3 shows that the most often requested activity types in the first six months were add and update. At the end of 2017, these two activity types were often requested again. We found the peak in 2017 for requesting tasks such as adding and updating items unusual and investigated it further. Two operators^26 carried out 31 requests in 2017 for importing items for different languages. Since all these requests were permitted, this explains the sudden acceleration of importing and updating tasks shown in the graph.

^21 These badges indicate whether a sitelink to a specific article on a Wikipedia language version is an excellent article.
^22 e.g. www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/Glavkos_bot
^23 e.g. www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/BotMultichillT
^24 www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/ShBot
^25 Further information is available at meta.wikimedia.org/wiki/QuickStatements.
^26 Both editors are employed by Wikimedia Sverige and were engaged in a GLAM initiative.

[Figure 3: Activity types of approved task requests over time (n = 600; all 81 unsuccessful task requests were not included). Monthly counts of the activity types add, update, delete, merge, revert, maintain, and unknown, from 2012-12 to 2018-06.]

In the following step, we looked more closely at the sources of the data which bots are supposed to add to Wikidata.

[Figure 4: Sources of approved task requests over time (n = 600; all 81 unsuccessful task requests were not included). Monthly counts of internal, external, and unknown sources from 2012-12 to 2018-06.]


Table 4: All internal sources, the top 10 most-used external sources, and unknown sources

Origin           Source                     Task Requests
Internal (455)   Wikipedia                  263
                 Wikidata                   110
                 Wiki-loves-monuments       30
                 Wikicommons                13
                 Wikisource                 5
                 Wikivoyage                 4
                 Wikinews                   2
                 Wikiquote                  2
                 WikiPathways               2
                 Wikimedia Project          24
External (38)    MusicBrainz                8
                 VIAF                       7
                 OpenStreetMap              4
                 DBpedia                    4
                 Freebase                   4
                 GRID                       3
                 GND                        2
                 GitHub                     2
                 YouTube                    2
                 Disease Ontology Project   2
Unknown          Unknown                    107

4.1.2 Data Origin. In addition to the types of tasks in RfPs, the sources from which data are imported are also given in the RfP. We tried to identify all sources independently of the focus or activity type in the classification of all task approval pages. We differentiate internal, external, and unknown sources:

(1) Internal: all sources from the Wikimedia ecosystem,
(2) External: all sources outside Wikimedia's ecosystem, and
(3) Unknown: all cases where the source of data was a local file/database or was not clearly stated.

Table 4 shows the number of tasks approved per data source. The majority of approved task requests (454) deal with data from the Wikimedia ecosystem, with Wikipedia providing most of the data. In addition to Wikipedia, the Wiki Loves Monuments (WLM) initiative, organized worldwide by Wikipedia community members and Wikimedia projects, is an important source. The category Wikimedia project refers to requests where the operator provided information on the source being restricted to the Wikimedia projects' scope; however, a specific attribution to one of the projects was not possible.
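This three-way distinction can be approximated in code as follows; the domain suffix list and the dot-based fallback for unknown sources are simplifying assumptions for illustration, not the authors' manual coding rules.

    from urllib.parse import urlparse

    # Assumed (incomplete) list of domains counted as internal, i.e. Wikimedia.
    WIKIMEDIA_SUFFIXES = (
        "wikipedia.org", "wikidata.org", "wikimedia.org", "wikisource.org",
        "wikivoyage.org", "wikinews.org", "wikiquote.org",
    )

    def classify_source(text):
        """Roughly classify a stated data source as internal/external/unknown."""
        host = urlparse(text if "//" in text else "//" + text).hostname or ""
        if "." not in host:
            return "unknown"   # local files, vague descriptions, empty fields
        if any(host == s or host.endswith("." + s) for s in WIKIMEDIA_SUFFIXES):
            return "internal"
        return "external"

    print(classify_source("https://en.wikipedia.org/wiki/Berlin"))  # internal
    print(classify_source("www.musicbrainz.org"))                   # external
    print(classify_source("a local CSV file"))                      # unknown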

Another important source is Wikidata itself. We found that only a small number of these data are used in maintenance tasks^27; the majority of requests are for tasks such as retrieving information from Wikidata, adding descriptions, and updating or adding new labels by translating the existing labels into other languages.

There are 128 task approval requests that relate to importing external data into Wikidata. The external sources most often mentioned are MusicBrainz^28 and VIAF^29 (cf. Table 4). Both sources are used mainly to add identifiers to Wikidata items. Other external sources are OpenStreetMap^30, DBpedia^31, and Freebase^32.

^27 Among 104 requests with a data source of Wikidata, there are 16 requests holding maintenance tasks, such as archiving discussions or updating database reports.

[Figure 5: Wikipedia language versions used as sources mentioned on task approval pages (n = 195; the 488 other requests either did not provide a specific Wikipedia language version or did not have Wikipedia as the source). Bar chart over 43 language versions, led by English, French, German, Russian, and Farsi.]

As shown in Figure 4, internal data sources have remained those most often mentioned in the task requests. External sources and unknown sources are on the same level, and external sources show a slight increase over time.

Most data from Wikipedia comes from 43 different language versions (cf. Figure 5), with English, French, German, Russian, and Persian being the top five languages. The results show that bots' contributions to certain languages have made these languages more prominent than others in Wikidata.

There are 109 RfPs in total with unknown sources, 85 of which were approved; some of these requests mentioned a local file or database as a data source. Other RfPs do not add data to Wikidata but are, for example, moving pages.

4.2 Disputed Requests
There are 81 RfPs in our dataset which were closed unsuccessfully. We wanted to understand better why the community decided to decline these requests; thus, we investigated the main reasons for all unsuccessful RfPs.

Our data show that operators themselves are most commonly responsible for their RfPs being rejected. Most of the time, operators were not responding to questions in a timely manner or did not provide enough information when required. There are only a few cases where RfPs were not approved by the community, i.e. no community consensus was reached or the bot violated the bot policy. In one case^33, for instance, an editor was asking for bot rights for their own user account instead of one for the bot. This violates the bot policy, as operators are required to create a new account for their bots and include the word bot in the account name.

^28 MusicBrainz is an open and publicly available music encyclopedia that collects music metadata, available at www.musicbrainz.org.
^29 VIAF is an international authority file, which stands for Virtual International Authority File, available at www.viaf.org.
^30 OpenStreetMap is an open-license map of the world, available at www.openstreetmap.org.
^31 DBpedia is also a Wikipedia-based knowledge graph: wiki.dbpedia.org.
^32 Freebase was a collaborative knowledge base which was launched in 2007 and closed in 2014. All data from Freebase was subsequently imported into Wikidata.
^33 www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/Florentyna


Table 5: Top 5 most-edited task requests closed as unsuccessful

Reasons                                               No. of RfP Edits   No. of Editors   RfP Created At   Reference
Data source, introduced errors                        88                 10               2018-03          [26]
Bot name, bot edits, bot performance                  43                 6                2015-12          [24]
Automatic adding of implicit statements               41                 21               2013-04          [23]
Support from less active users, duplicate task        32                 16               2018-03          [27]
Bot name, conflict with users, running without flag   32                 11               2014-10          [25]

Table 6 shows the main reasons why tasks were rejected: RfPs for tasks which had already been implemented by other bots (duplicate), or RfPs requesting tasks which were not needed, such as one bot which wanted to remove obsolete claims related to property P107^34, when there were no more items associated with P107.

Furthermore, RfPs were closed as unsuccessful because the community could not trust the operators' behavior. One operator, for example, submitted many RfPs which were not accepted; the community questioned this user's editing behavior and required them to gain community trust first. It can be seen that unsuccessful closing is not only related to the task types that users had requested.

In the following, we describe the six RfPs which were edited and discussed the most (cf. Table 5). We used the number of edits on an RfP page as a proxy for challenging cases.

Phenobot [26] asked to correct species names based on the UniProt Taxonomy database. The community doubted this data source and insisted on more reliable information. The bot introduced some errors, which were reported; therefore, the community had to ask for edit rollbacks. Since then, the operator has been inactive for two years, which finally led to this request being closed unsuccessfully.

WikiLovesESBot [27], which wanted to add statements related to Spanish municipalities from the Spanish Wikipedia, gained continuous support from nine users until a user asked for more active Wikidata users to comment. After this, one active user checked the bot's contributions and pointed out that this bot was blocked. Moreover, another active user figured out that this request was asking for duplicate tasks, since another user was already working on Spanish municipalities. The RfP remained open for about half a year without any operator activity and came to a procedural close.

Another remarkable case is VlsergeyBot [25], which intended to update local dictionaries, transfer data from infoboxes to Wikidata, and update properties and reports in Wikidata. At first, the account was named 'Secretary', which was against the bot policy; after a community suggestion, it was renamed to VlsergeyBot. After that, a user opposed the request, saying that Vlsergey's activity 'generates a number of large conflicts with different users in ruwiki', and another user stated that 'maybe you have a personal conflict with Vlsergey', so the discussions looked like a personal conflict. The community then considered whether this account needed a bot flag at all and found that the bot was already running without a bot flag for hours and had even broken the speed limit for bots with a flag.

^34 P107 was a property representing the GND type.

The operator promised to update his code by adding restrictions before the next run. Finally, the operator withdrew his request, stating that he did not need a bot flag that much.

ImplicatorBot [23] was intended to add implicit statements to Wikidata automatically. Implicit statements are relationships between items which can be inferred from existing relationships. A property of an item stating that a famous mother has a daughter implies, for example, that the daughter has a mother, even though this statement is not represented yet. The task sounds quite straightforward, but the community hesitated to approve it, considering the possibility of vandalism. Furthermore, changes in properties are still common, and such a bot might cause tremendous further work in the case of those changes. Even though the operator had shared his code, the discussion proceeded with the highest number of editors discussing a request (i.e. 21), and finally the operator stated that he was 'sick and tired of the bickering'. Despite the community insisting that they needed the task, the operator withdrew it.
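To make the idea concrete, inferring such implicit statements amounts to applying an inverse-property table to the existing statements. The following Python sketch works on simplified (subject, property, object) triples with a hypothetical inverse mapping, rather than real Wikidata property IDs or the bot's actual code.

    # Hypothetical inverse-property table: "child" statements imply "mother"
    # statements in the opposite direction (simplified; real Wikidata uses
    # property IDs and distinguishes mother and father).
    INVERSE = {"child": "mother"}

    def infer_implicit(triples):
        """Yield statements implied by an inverse property but not yet stated."""
        stated = set(triples)
        for subj, prop, obj in triples:
            inv = INVERSE.get(prop)
            if inv and (obj, inv, subj) not in stated:
                yield (obj, inv, subj)

    triples = [("Marie", "child", "Irene")]
    print(list(infer_implicit(triples)))  # [('Irene', 'mother', 'Marie')]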

Similar to ImplicatorBot, Structor [24] was withdrawn by the operator because the operator was tired of the bureaucracy. The bot wanted to add claims, labels, and descriptions, particularly to provide structured information about species items. Many questions were raised by a small number of community members (i.e. a total of six different users participated in the discussion), including about the bot name, the source of edits, duplicate entries, the edit summary, and the test edit performance. The operator was actively responding to all these questions. However, the whole process was so surprisingly complex (the discussions continued for over seven months from the start) that the operator was inactive for a long time and then withdrew his request.

Requests denied because the user is not trusted are also interesting. ElphiBot_3^35 [22], for example, which is managed jointly by three operators, sent a request, along with the source code, for updating the interwiki links in Wikidata after a category in the Arabic Wikipedia has been moved. The community preferred, however, someone who is more responsible and more familiar with the technical aspects of Wikipedia and Wikidata. What concerned the discussants most was that the operators were not capable enough to fix the bugs introduced by their bot. The community did not feel comfortable approving this request because the operators' treatment for fixing bugs was simply to reverse the mistakes manually. The requirement to run a bot should not only be to know how to code, but surely also to respond in a timely manner and to notify those who can fix the issue.

^35 This RfP has a total of 29 edits, and 8 editors contributed to the discussions.


Table 6: Main reasons for unsuccessful task requests (n = 81, number of unsuccessful task requests)

Reason                      Description                                                                          Frequency
No activity from operator   Operators are not active or do not respond to the questions regarding their bots    35
Withdrawn                   Operators wanted to close the requested task                                        14
Duplicate                   Requested tasks are redundant and are already done through other bots or tools      9
No community consensus      Community opposed the requested tasks                                               6
User not trusted            Operator has questionable editing behavior and ability to fix any bugs introduced   6
Task not needed             Tasks do not need a bot or are not a significant problem for the time being         5
Don't need bot flag         The requested tasks can be performed without a bot flag                             3
No info provided            The fields in the RfP page are left empty                                           2
Against bot policy          Task is not in line with bot policy                                                 1

5 DISCUSSION
In this study, we collected existing data from the RfP pages on Wikidata to find out what kinds of tasks the community allows to be automated or shared with bots.

Looking at the type and focus of the tasks which were mentioned most often in RfPs, we found that the dominant request type over time is adding data. This observation suggests that Wikidata's community uses bots to increase the coverage of the provided data, which aligns with the vision of the Wikimedia Foundation: 'Imagine a world in which every single human being can freely share in the sum of all knowledge.'^36 Updating data is the second most requested task, which shows that bots also take part in correcting mistakes or refreshing data according to changes (e.g. updating the sitelinks of moved pages) and contribute to data completeness, which is defined as one data quality dimension [2].

However, we were surprised that there were fewer requests regarding adding or updating references, which could support the trustworthiness (a data quality dimension as well) of the data imported by bots. This result supports existing analyses of the edit activities of bots [18]. However, it would be interesting to bring these two lines of research, the task approval and the edit perspective, together to understand the development over time more closely. Thus, we can infer that bots are allowed to assist the Wikidata community in ensuring data quality from the completeness angle; however, this is less visible from the trustworthiness angle. In comparison to Wikipedia, where bots perform primarily maintenance tasks, bot requests in Wikidata concentrate mainly on the content, i.e. the data perspective.

The majority of data sources mentioned in RfPs come from inside Wikimedia projects, mainly Wikipedia (cf. Section 4.1.2). This observation implies that Wikidata is on its way to serving as the structured data source for Wikimedia projects, one of the main goals for the development of Wikidata. Among the five most-used language versions of Wikipedia, English, French, Russian, German, and Persian (cf. Figure 5), the first three languages show a dominance of Western knowledge in bot imports. In a study, Kaffee et al. [11] had earlier found the influence of Western languages in Wikidata, with the most dominant ones being English, Dutch, French, German, Spanish, Italian, and Russian. The similarity of the results in both studies could imply that the reason these languages have better coverage than other languages is a result of having bots.

^36 Information on the Wikimedia strategy can be found at https://wikimediafoundation.org/about/vision.

Kaffee et al. also found that language coverage is not related to the number of speakers of a language, which supports our assumption. We could not find evidence that languages other than Western ones are deliberately underrepresented in bot requests. We rather expect that the current bot requests are a representative sample of Wikidata's community.

We can see from the external sources that bots use these sources mostly for importing identifiers (e.g. VIAF) or for importing data from other databases (e.g. DBpedia, Freebase). This insight supports Piscopo's [17] argument that bot edits need to be more diverse. We suggest that further efforts should be made to import or link data from different sources, for example, from research institutions and libraries. With the increased usage of the Wikibase software^37, we assume that more data might be linked or imported to Wikidata. Our classification revealed that bots already import data from local files or databases. However, such data imports often rely on the community trusting the bot operators and do not seem large-scale.

The requested maintenance tasks within the RfPs show similar patterns to those in the Wikipedia community: bots are being used to overcome the limitations of the MediaWiki software [9]. Administrative tasks such as deletion are not often requested in RfPs; however, bots on Wikidata are allowed to delete sitelinks only. Similar to Wikipedia, the Wikidata community is coming closer to a common understanding of what bots are supposed to do and what not.

The issue of unknown tasks, i.e. tasks not clearly defined in RfPs, shows the trust the community has in individual operators, probably due to their previous participation history. There is a noticeable number of approved RfPs which we coded as unknown due to their vague task descriptions, while there are also cases where task descriptions were clear, but the community did not trust the operators, and thus the requests were not approved. These two observations indicate that, in addition to its defined policy and procedures, the community also gives weight to trust.

The success rate of RfPs is relatively high, since only a small number of RfPs are closed as unsuccessful. Our findings show that, among the 81 unsuccessful RfPs, only some were rejected by the community directly; the majority of them were unsuccessful because the operators were not responding or had withdrawn the request.

^37 Wikibase is based on the MediaWiki software with the Wikibase extension: https://www.mediawiki.org/wiki/Wikibase


Therefore, operator inactivity is a more common reason for failure than community refusal. In some cases (i.e. the RfPs discussed most), we can see that the community considers every detail of the tasks and then comes to a decision; however, in other cases, it approves the request without a detailed discussion, as can be seen in the cases of unknown tasks and unknown sources. This result could indicate that, in addition to the defined procedure for RfPs, the community applies a more flexible approach when deciding on RfPs, taking the context (e.g. the editor's experience) of the application into account.

In summary, the high approval rate of RfPs shows that the Wikidata community is positive towards bot operations in Wikidata and willing to take advantage of task automation through bots. Wikidata's community profited from the experiences of the Wikipedia community and built productive human-bot processes in quite a short period. However, we leave the impact of these processes on data quality to future research.

6 CONCLUSION
We studied the formal process of requesting bot rights in Wikidata to find out what kinds of tasks are allowed to be automated by bots. Our study provides a detailed description of the RfP process. We retrieved the closed RfPs from the Wikidata archive up to mid-2018. We defined a scheme in the form of a codebook to classify the RfPs and developed our dataset. The RfPs were studied mainly from two perspectives: 1) what information is provided at the time the bot rights are requested, and 2) how the community handles these requests. We found that the main tasks requested are adding claims, statements, terms, and sitelinks to Wikidata, and that the main sources of bot edits have their roots in Wikipedia. This contrasts with Wikipedia, where bots perform mostly maintenance tasks. Our findings also show that most of the RfPs were approved, and that a small number were unsuccessful, mainly because the operators had withdrawn their requests or were inactive.

Future directions of this research will focus on analyzing the actual bot activities and comparing them with the results of this study to find out whether bots really perform the tasks they had asked for. This could help us understand better how bots evolve over time and whether the community controls bot activities based on their permitted tasks, or whether bots can perform any edit once they get the permission to operate as a bot.

ACKNOWLEDGMENTS
We thank the Deutscher Akademischer Austauschdienst (DAAD) [research grant 57299294] and the German Research Foundation (DFG) [fund number 229241684], who supported this research financially, and the anonymous reviewers for their valuable comments, which helped to improve the paper.

REFERENCES[1] Emilio Ferrara Onur Varol Clayton A Davis Filippo Menczer and Alessandro

Flammini 2016 The rise of social bots Commun ACM 59 7 (2016) 96ndash104[2] Michael Faumlrber Frederic Bartscherer Carsten Menne and Achim Rettinger 2018

Linked data quality of DBpedia Freebase OpenCyc Wikidata and YAGO Se-mantic Web 9 1 (Jan 2018) 77ndash129 httpsdoiorg103233SW-170275

[3] R Stuart Geiger 2009 The social roles of bots and assisted editing programs InInt Sym Wikis

[4] R Stuart Geiger 2012 Bots are users too Rethinking the roles of software agentsin HCI Tiny Transactions on Computer Science 1 (2012) 1

[5] R Stuart Geiger and Aaron Halfaker 2013 When the levee breaks without botswhat happens to Wikipediarsquos quality control processes In Proceedings of the 9thInternational Symposium on Open Collaboration (WikiSym rsquo13) ACM New YorkNY USA 61ndash66 httpsdoiorg10114524910552491061

[6] R Stuart Geiger and Aaron Halfaker 2017 Operationalizing conflict and co-operation between automated software agents in Wikipedia a replication andexpansion of ldquoeven good bots fightrdquo ACM Hum-Comput Interact 1 2 (Nov 2017)33 httpsdoiorg1011453134684

[7] R Stuart Geiger andDavid Ribes 2010 Thework of sustaining order inWikipediathe banning of a vandal In Proceedings of the 2010 ACM Conference on ComputerSupported Cooperative Work (CSCW rsquo10) ACM New York NY USA 117ndash126httpsdoiorg10114517189181718941

[8] Aaron Halfaker R Stuart Geiger Jonathan T Morgan and John Riedl 2013 Therise and decline of an open collaboration system how Wikipediarsquos reaction topopularity is causing its decline 57(5) (2013) 664ndash688 httpsdoiorg1011770002764212469365

[9] Aaron Halfaker and John Riedl 2012 Bots and cyborgs Wikipediarsquos immunesystem Computer 45 3 (March 2012) 79ndash82 httpsdoiorg101109MC201282

[10] AndrewHall Loren Terveen and Aaron Halfaker 2018 Bot detection inWikidatausing behavioral and other informal cues Proceedings of the ACM on Human-Computer Interaction 2 CSCW (Nov 2018) 1ndash18 httpsdoiorg1011453274333

[11] Lucie-Aimeacutee Kaffee Alessandro Piscopo Pavlos Vougiouklis Elena SimperlLeslie Carr and Lydia Pintscher 2017 A glimpse into Babel an analysis ofmultilinguality in Wikidata In Proceedings of the 13th International Symposiumon Open Collaboration (OpenSym rsquo17) ACM New York NY USA 141ndash145httpsdoiorg10114531254333125465

[12] Jonathan Lazar JinjuanHeidi Feng andHarryHochheiser 2017 ResearchMethodsin Human-Computer Interaction (2nd ed) Morgan Kaufmann

[13] Brenda Leong and Evan Selinger 2019 Robot Eyes Wide Shut UnderstandingDishonest Anthropomorphism ACM New York New York USA

[14] Randall M Livingstone 4 January 2016 Population automation An interviewwith Wikipedia bot pioneer Ram-Man First Monday 21 1 (4 January 2016)httpsdoiorg105210fmv21i16027

[15] Claudia Muumlller-Birn 2019 Bots in Wikipedia Unfolding their duties TechnicalReports Serie B TR-B-19-01 Freie Universitaumlt Berlin httpsrefubiumfu-berlindehandlefub18824775

[16] Claudia Muumlller-Birn Benjamin Karran Janette Lehmann and Markus Luczak-Roumlsch 2015 Peer-production system or collaborative ontology engineering effortWhat is Wikidata In Proceedings of the 11th International Symposium on OpenCollaboration ACM 20

[17] Alessandro Piscopo 2018 Wikidata A New Paradigm of Human-Bot Collabora-tion arXiv181000931 [cs] (Oct 2018) httparxivorgabs181000931 arXiv181000931

[18] Alessandro Piscopo Lucie-Aimeacutee Kaffee Chris Phethean and Elena Simperl2017 Provenance information in a collaborative knowledge graph an evaluationof Wikidata external references In The Semantic Web ndash ISWC 2017 (LectureNotes in Computer Science) Springer Cham 542ndash558 httpsdoiorg101007978-3-319-68288-4_32

[19] Alessandro Piscopo Chris Phethean and Elena Simperl 2017 What makes a goodcollaborative knowledge graph group composition and quality in Wikidata InSocial Informatics (Lecture Notes in Computer Science) Giovanni Luca CiampagliaAfra Mashhadi and Taha Yasseri (Eds) Springer International Publishing 305ndash322

[20] Thomas Steiner 2014 Bots vs Wikipedians Anons vs Logged-Ins (Redux) aglobal study of edit activity on Wikipedia and Wikidata In Proceedings of TheInternational Symposium on Open Collaboration (OpenSym rsquo14) ACM New YorkNY USA 251ndash257 httpsdoiorg10114526415802641613

[21] Margaret-Anne Storey and Alexey Zagalsky 2016 Disrupting developer produc-tivity one bot at a time In Proceedings of the 2016 24th ACM SIGSOFT InternationalSymposium on Foundations of Software Engineering (FSE 2016) ACM New YorkNY USA 928ndash931 httpsdoiorg10114529502902983989

[22] Wikidata 2013 RfP ElphiBot_3 wwwwikidataorgwikiWikidataRequests_for_permissionsBotElphiBot_3 [Last accessed 01 August 2018]

[23] Wikidata 2013 RfP ImplicatorBot wwwwikidataorgwikiWikidataRequests_for_permissionsBotImplicatorBot [Last accessed 01 August 2018]

[24] Wikidata 2014 RfP Structor wwwwikidataorgwikiWikidataRequests_for_permissionsBotStructor [Last accessed 01 August 2018]

[25] Wikidata 2014 RfP VlsergeyBot wwwwikidataorgwikiWikidataRequests_for_permissionsBotVlsergeyBot [Last accessed 01 August 2018]

[26] Wikidata 2016 RfP Phenobot wwwwikidataorgwikiWikidataRequests_for_permissionsBotPhenobot [Last accessed 01 August 2018]

[27] Wikidata 2016 RfP WikiLovesESBot wwwwikidataorgwikiWikidataRequests_for_permissionsBotWikiLovesESBot [Last accessed 01 August 2018]

  • Abstract
  • 1 Introduction
  • 2 Bots in Peer Production
    • 21 Bots in Wikipedia
    • 22 Bots in Wikidata
      • 3 Approving Bot Tasks
        • 31 Approval Process in Wikidata
        • 32 Data Collection
        • 33 Task Approval Data
        • 34 Classification Process
          • 4 Results
            • 41 Approved Requests
            • 42 Disputed Requests
              • 5 Discussion
              • 6 Conclusion
              • Acknowledgments
              • References
Page 2: Approving Automation: Analyzing Requests for Permissions ... · • We used our defined scheme to identify bot-requested tasks and data sources. • We analyzed how the community


• We investigated bots from a community perspective based on RfPs.
• We provided a classification scheme for the categorization of RfPs.
• We used our defined scheme to identify bot-requested tasks and data sources.
• We analyzed how the community decides on RfPs and classified the reasons for unsuccessful RfPs.
• We developed a dataset of bots' RfPs.

In the following sections, we first present existing research on bots in peer production, focusing on Wikimedia projects, and explain how our work builds on it. We then describe our dataset and classification approach. Next come the findings of the study, followed by a discussion of the results. We conclude this article by highlighting possible directions for future research.

2 BOTS IN PEER PRODUCTION
Very different bots populate peer production communities. These bots collect information, execute functions independently, create content, or mimic humans. The effects of bots on peer production systems and on our society are increasingly being discussed, for example, when bots influence voting behaviour [1] or imitate human behaviour [13].

Bots have been used in Wikipedia from early on.4 Wikidata's community has profited from these experiences when handling its bots. We therefore review existing insights into the bot community on Wikipedia and, building on that, highlight research on Wikidata that considers bot activity.

2.1 Bots in Wikipedia
Bots are software programs that automate tasks, usually repetitive or routine tasks which humans consider time-consuming and tedious (e.g., [3, 4, 16, 21]). They are operated and controlled by humans.

Wikipedia has developed a stable and increasingly active bot community over the years, although bots were not widely accepted and trusted in the beginning [14]. Halfaker and Riedl [9] distinguish four types of bots in Wikipedia: (1) bots that transfer data from public databases into articles, (2) bots that monitor and curate articles, (3) bots that extend the functionality of MediaWiki, the software underlying Wikipedia, and (4) bots that protect against vandalism.

The majority of research focuses on bots that protect against vandalism. Geiger and Ribes [7], for example, investigated the process of fighting vandalism in Wikipedia by using trace ethnography. They show how human editors and bots work together to fight vandalism in Wikipedia and conjecture that such a distribution of concerns across human and algorithmic editors might change the moral order in Wikipedia. Halfaker et al. [8] show how a distributed cognitive network of social and algorithmic actors works efficiently together to detect and revert vandalism on Wikipedia. In another study, Geiger and Halfaker [5] investigated the impact of a counter-vandalism bot's downtime on the quality control network of Wikipedia and found that during this downtime the quality control network performed more slowly but was still effective.

4 https://en.wikipedia.org/wiki/Wikipedia:History_of_Wikipedia_bots#rambot_and_other_small-town_bots

Another line of research focuses on how such tools transform the nature of editing and user interaction. Geiger shows how a weak but pre-existing social norm was controversially reified into a technological actor [3]. He refers to the example of the HagermanBot, which was implemented to sign unsigned discussion entries in Wikipedia. Halfaker and Riedl [9] show that bots are not only responsible for enforcing existing guidelines on a larger scale but also that their activities can have unexpected effects: the number of reverts of newcomers' edits, for example, has risen, while (surprisingly) the quality of those edits has stayed almost constant. Editors increasingly apply algorithmic tools for monitoring the edits of newcomers; in 2010, 40 percent of rejections of newcomers' contributions were based on these algorithmic tools [9]. This contradicts the community's attempts to engage more new editors. Moreover, Geiger and Halfaker defined bots as assemblages of code and a human developer and show that bot activity is well aligned with Wikipedia's policy environment [6].

This research suggests that bots are more critical to the success of the Wikipedia project than previously expected, despite the reluctance of the Wikipedia community to allow bots at the beginning [14]. Bots have a significant role in maintaining this text-based knowledge base, especially in fighting vandalism. As bots in Wikidata have their roots in Wikipedia, we expect to see similarities between bots in both peer production systems. Before we examine whether the same areas of bot activity emerge in Wikidata, we give an overview of the existing insights into the bot community in Wikidata.

2.2 Bots in Wikidata
Wikidata inherited bots from its sister project Wikipedia, and bots started editing Wikidata with its launch by linking Wikidata item pages to their respective Wikipedia language pages. The current research on Wikidata bots shows that bots perform most of the edits in Wikidata [16, 20]. Steiner [20] aims to understand the editing distribution of editors on Wikidata and Wikipedia and provides a web application to observe real-time edit activity on Wikidata for bots as well as logged-in and anonymous users. The study shows that the number of bots versus the number of edits has grown linearly and that most of Wikidata's edits, i.e., 88%, are bot edits.

Müller-Birn et al. [16] confirm these insights in a later study by examining the community editing patterns of Wikidata through a cluster analysis of contributors' editing activities. They determine six editing patterns of the participating community (i.e., reference editor, item creator, item editor, item expert, property editor, and property engineer) and show that bots are responsible for the simpler editing patterns, such as creating items or editing items, statements, or sitelinks.

Further studies focus on how bot edits contribute to data quality in Wikidata. In one study on Wikidata's external references, Piscopo et al. [18] find that the diversity of external sources in bot edits is lower than in human edits. In another study, Piscopo et al. [19] explore the influence of bot and human (registered and anonymous) contributions on the quality of data in Wikidata. Their research shows that equal contributions of humans and bots have a positive impact on data quality, while more anonymous edits lead towards lower quality.

Figure 1: Example of a request-for-permissions (RfP) page with the extracted fields: bot name, operator name, tasks, code, function details, decision date, and decision.

Hall et al. [10] analyzed these anonymous edits on Wikidata to detect bots in this group that had not previously been identified in Wikidata. The study shows that two to three percent of the contributions (more than 1 million edits) previously considered human contributions actually came from unidentified bots. They emphasize that this might be a concerning issue for Wikidata and all projects relying on these data: even if vandalism causes only a small portion of these edits, the damaging effect on Wikidata might be high.

This danger is reflected by Piscopo [17], who highlights significant challenges for Wikidata that might endanger its sustainability: namely, a lack of quality control because of the large amount of data added by bots, a lack of diversity because only a few sources are used, and existing threats to user participation because of bot usage.

Existing research focuses primarily on the activity levels of bots in Wikidata based on their edits. Some work (e.g., [17]) conveys the impression that bots are independent of humans. However, this ignores the fact that humans operate bots: the community officially grants most bot activities in a well-defined process.

It is visible from the literature that bots have so far been studied mainly from the perspective of their activity, both in Wikipedia and in Wikidata. In Wikipedia, bots are used primarily for quality assurance tasks, i.e., vandalism detection, and maintenance tasks, for example, removing or replacing templates on articles, while Wikidata's bots perform tasks mostly related to content editing. A possible reason could be the structured nature of Wikidata's content, which is less challenging to deal with than the unstructured data in Wikipedia. It is therefore interesting to examine the content editing activities that bots have requested most in Wikidata and that the community has approved. While adding more content through bots seems to contribute to data quality from the completeness angle, this raises the question of whether these data are based on sources and improve data quality from the trustworthiness aspect. The following study provides the ground for answering such questions in the future. We investigate the RfP process of bots in more detail and describe which tasks Wikidata's community finds useful for bots and in which cases the community does not support a bot request.

3 APPROVING BOT TASKS
In this section, we describe the bot approval process on Wikidata, explain our data collection, detail the classification process, and finally give an overview of the resulting dataset.

3.1 Approval Process in Wikidata
Building on the experiences of the Wikipedia community, Wikidata has had a well-defined policy system for bots almost from the beginning (November 2012).5 Except for cases like editing Wikidata's sandbox6 or their own user pages, every (semi-)automatic task carried out by a bot needs to be approved by the community. This means that before operators can run their bots on Wikidata, they have to open an RfP for their bot.7 The RfP is thus the formal way of requesting bot rights for an account, where the decisions on RfPs are based on community consensus.

An RfP is triggered either by an editor's own need or by a community request.8 Such a bot request is well documented and available to all interested community members on Wikidata.

Figure 1 shows a typical RfP page. It consists of a bot name,9 an operator name (bot owner), tasks (a brief description of what the bot intends to do), a link to the source code, function details (a detailed description of bot tasks and sources of imported data), and the final decision (RfP approved or not approved, with a reason).

Bot owners have to use a template for providing all information for the decision-making process and need to clarify questions regarding their request during this process. They also often have to provide a number of test edits (50 to 250 edits).

5 https://www.wikidata.org/w/index.php?title=Wikidata:Bots&oldid=549166
6 Wikidata's sandbox can be used for test edits: https://www.wikidata.org/wiki/Wikidata:Sandbox
7 All open requests are available at www.wikidata.org/wiki/Wikidata:Request_for_permissions/Bot
8 An example can be found at https://www.wikidata.org/w/index.php?title=Wikidata:Bot_requests&oldid=38203229#Polish_%22Italian_comune%22_descriptions
9 According to the bot policy, bot accounts are recommended to have the word "bot" in their names.


Table 1: Example of the information extracted from an RfP page.

URL                 www.wikidata.org/[...]/Bot/MastiBot
Bot Name            mastiBot
Operator Name       Masti
Task and Function   add basic personal information based on biography-related infoboxes from plwiki
Decision            approved
Decision Date       15.09.2017
First Edit          03.09.2017
Last Edit           21.02.2018
No. of Edits        8
No. of Editors      5

The community handles RfPs in accordance with the bot approval process.10 The community discusses whether the requested task is needed and whether the implementation of the task functions properly. After the decision has been made, an administrator or a bureaucrat closes the request: if approved, a bureaucrat will flag the account; if not, the request is closed stating the reason for the unsuccessful request.

After a successful request, each bot operator has to list the task the bot performs, with a link to the RfP, on the bot's user page. However, the bot operator is required to open a new RfP if there is a substantial change in the tasks the bot performs. Consequently, bots can have multiple requests, which are shown on the user page. Furthermore, in special cases, a bot is also allowed to carry out administrative tasks such as blocking users or deleting or protecting pages. In this case, the operator needs to apply for both the RfP and administrator status.11

3.2 Data Collection
We collected all bot requests that were in the final approval stage in July 2018,12 i.e., we ignored all tasks without a final decision, from Wikidata's archive for requests.13 We collected our data by web scraping, i.e., with web data extraction programs implemented in Python. Bot requests which were listed several times in different archives were only parsed once.14 This resulted in 685 task approval pages.

We extracted the following information from each page (cf. Figure 1): URL, bot and operator name, decision, decision date, tasks, code, and function details. Using the MediaWiki API,16 we additionally collected for each page the date of the first and last page edit,15 the number of page edits, and the number of distinct editors who contributed to the request. This extracted information (cf. Table 1) was processed and stored in a relational database.17

10 A detailed description is available at www.wikidata.org/wiki/Wikidata:Bots
11 Further information can be found at https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Administrator
12 The first request in our data set was opened on October 31, 2012, and the last one on June 29, 2018.
13 www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Archive/Requests_for_bot_flags
14 For example, www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/VIAFbot is listed in www.wikidata.org/wiki/Wikidata:Requests_for_permissions/RfBot/March_2013 and in www.wikidata.org/wiki/Wikidata:Requests_for_permissions/RfBot/April_2013; only the later one was used.
15 The first edit can be interpreted as the request opening and the last edit as the request closing or the date of archiving.
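As an illustration of this collection step, the following minimal Python sketch derives the per-page revision metadata from the MediaWiki API. It is our own sketch, not the released collection code (see footnote 17); the function name and output fields are illustrative.

```python
import requests

API = "https://www.wikidata.org/w/api.php"

def revision_metadata(page_title):
    """Collect the revision-derived fields used in the study for one RfP page:
    first/last edit, number of edits, and number of distinct editors."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": page_title,
        "rvprop": "timestamp|user",
        "rvlimit": "max",   # up to 500 revisions per request, enough for RfP pages
        "format": "json",
    }
    pages = requests.get(API, params=params).json()["query"]["pages"]
    revisions = next(iter(pages.values()))["revisions"]
    timestamps = sorted(r["timestamp"] for r in revisions)
    return {
        "first_edit": timestamps[0],    # interpreted as the request opening
        "last_edit": timestamps[-1],    # interpreted as the closing/archiving
        "no_of_edits": len(revisions),
        "no_of_editors": len({r["user"] for r in revisions}),
    }

print(revision_metadata("Wikidata:Requests for permissions/Bot/MastiBot"))
```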

3.3 Task Approval Data
We collected 683 distinct requests from Wikidata; however, two of them are no longer available, and consequently we excluded them. Of the resulting 681 requests, 600 (88%) were successful and 81 (12%) were unsuccessful.

An average of five people (σ = 2.8) participated in an approval request, for example, by taking part in the discussion or stating the final decision. These people accounted for an average of slightly above ten edits for each request (σ = 8.5).

Based on the collected requests, we identified 391 distinct bots and 366 distinct bot operators on Wikidata (cf. Table 2). Some operators applied for more than one task for their bot in one request (e.g., update label, update description, update alias), with five being the highest number. Furthermore, bot owners can operate more than one bot: the majority of operators (319 editors) have only one bot, three editors were running three bots each, and one operator was even managing seven bots simultaneously. Similarly, one bot can also be operated by more than one user; for example, three editors were managing the ProteinBoxBot together.
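Such bot-operator multiplicities can be derived from the released dataset with a few grouped counts. A minimal sketch, assuming a hypothetical CSV export with illustrative bot and operator columns:

```python
import pandas as pd

# Hypothetical flat export of the RfP dataset; column names are illustrative.
rfps = pd.read_csv("rfp_dataset.csv")   # one row per request: bot, operator, decision, ...

bots_per_operator = rfps.groupby("operator")["bot"].nunique()
operators_per_bot = rfps.groupby("bot")["operator"].nunique()

# Distribution of how many bots each editor runs (e.g., 319 editors with one bot)
print(bots_per_operator.value_counts().sort_index())
# Bots managed by more than one user (e.g., ProteinBoxBot with three operators)
print(operators_per_bot[operators_per_bot > 1])
```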

Table 2: Overview of task requests (n = 683, number of all task requests).

Decision Result   Requests   No. of Bots   Operators
Successful        600        323           299
Unsuccessful      81         68            67
Both              681        391           366

3.4 Classification Process
We manually classified the collected data to gain a deeper understanding of the different tasks that bot operators carry out on Wikidata. In the following, we describe this process in detail.

We employed the qualitative method of content analysis, which is used to analyze unstructured content [12], in our case the text of the RfPs. We classified the data manually using the open coding approach of grounded theory. The data we were dealing with were the applicants' own sentences (in vivo codes); thus, we developed an emergent coding scheme and categorized the textual information. Two of the authors18 coded the data in a three-phase process, ensuring the consistency of the codes and reducing possible bias during the coding.

We read a sample of the collected data and discussed the diverse request details. We noticed that some common categories could be found within all requests based on particular perspectives, such as the potential actions of the requesting bots and the main sources of the data to be imported by bots. We thus focused on the task and function details of the RfP page and extracted the intended use (task) and the data source of the bot edits which were approved or closed as successful.

16 www.mediawiki.org/wiki/API:Main_page
17 The database is released under a public license on GitHub: https://github.com/FUB-HCC/wikidata-bot-request-analysis
18 The first and second authors of this paper.

In the first phase, we categorized 30 randomly chosen RfPs collaboratively to develop a shared understanding. Based on the first categories, we carried out a second round in which another set of 30 randomly chosen RfPs was categorized individually. We then compared the results and checked the agreement level.19 After discussing diverging cases and cross-validating our category set, we continued with another round of categorizing the data (40 RfPs) individually, reaching 39 agreements out of the 40 cases. We continued with further rounds of individual coding; we met frequently on a regular basis, discussed new or unclear cases, and cross-validated our category sets to ensure the consistency of our data classification.
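The agreement level mentioned here is measured with Cohen's kappa (see footnote 19). For two coders, it can be computed directly from the two parallel label sequences; a minimal sketch with invented codes:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical codes assigned by the two coders to the same ten RfPs.
coder_a = ["add-claim", "add-claim", "update-label", "add-sitelink", "unknown",
           "add-claim", "maintain", "add-claim", "update-label", "add-item"]
coder_b = ["add-claim", "add-statement", "update-label", "add-sitelink", "unknown",
           "add-claim", "maintain", "add-claim", "update-label", "add-item"]

# The study reports kappa = 0.65 at this stage, i.e., substantial agreement.
print(cohen_kappa_score(coder_a, coder_b))
```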

We developed a codebook20 as the primary reference of the classification process, starting from the first phase and updating it continually during the classification period. The codes are structured as verb-noun pairs denoting operations and content entities, inspired by the work of Müller-Birn [15] and initially drafted after the first 30 RfP classifications. Based on this, we developed the codebook further by structuring all newly encountered information into verb-noun pairs and adding them to the codebook. The first category describes the potential scope of bot activity; its sub-categories contain all the entity types of Wikidata, such as items, claims, statements, references, and sitelinks. The unknown category refers to tasks without a clear definition. We developed an overview which shows a scheme of all entity types and their respective activities, together with the number of requests for these edits (cf. Table 3).
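To illustrate the verb-noun structure, the sketch below models a handful of codes as (activity, edit focus) pairs. The pairs are taken from Table 3, but the data structure and the one-line descriptions are our own illustration, not the released codebook format (see footnote 20):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Code:
    activity: str    # verb: the operation the bot performs
    edit_focus: str  # noun: the Wikidata entity type being edited

CODEBOOK = {
    Code("add", "claim"):          "insert new claims into items",
    Code("add", "statement"):      "insert claims together with qualifiers/references",
    Code("update", "label"):       "modify labels, e.g., by translation",
    Code("add", "sitelink"):       "link items to Wikipedia language pages",
    Code("maintain", "community"): "maintenance work such as archiving pages",
    Code("unknown", "unknown"):    "task description too vague to classify",
}
```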

Another dimension of our classification considers the origin of the data which the bot operators plan to include in Wikidata. This category contains three main sub-categories: internal, external, and unknown sources. Internal sources are those within the Wikimedia sphere, whereas external sources are outside Wikimedia. Requests without a specific source of data were categorized as unknown.
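These coding rules are mechanical enough to express as a small helper, shown below as our own sketch; the domain list is illustrative, not the codebook's exhaustive definition of the Wikimedia sphere:

```python
from urllib.parse import urlparse

# Illustrative subset of domains counted as "internal" (the Wikimedia sphere).
WIKIMEDIA_SUFFIXES = ("wikipedia.org", "wikidata.org", "wikimedia.org",
                      "wikisource.org", "wikivoyage.org", "wikinews.org",
                      "wikiquote.org", "mediawiki.org")

def source_origin(source_url):
    """Classify a stated data source as internal, external, or unknown."""
    if not source_url:
        return "unknown"                      # no source stated in the RfP
    host = urlparse(source_url).netloc
    if not host:
        return "unknown"                      # a local file or database path
    if host.endswith(WIKIMEDIA_SUFFIXES):     # str.endswith accepts a tuple
        return "internal"
    return "external"

print(source_origin("https://en.wikipedia.org/wiki/Main_Page"))  # internal
print(source_origin("https://musicbrainz.org"))                  # external
print(source_origin(None))                                       # unknown
```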

After coding all successful RfPs, we became curious about the RfPs which were not approved: were there unusual task requests, or did operators have difficulties convincing the community of their bots' safe activities? To find this out, we carried out another round of coding covering the 81 unsuccessfully closed RfPs. We classified these RfPs as before, except that we additionally included the reasons for the request failure. The category unsuccessful thus contains all RfPs that were not approved, for reasons such as being duplicates, being inactive for a long time, not being in line with the bot policy, or being withdrawn (cf. Table 6).

4 RESULTS
We organized the results of our RfP analyses in two sections: first, we focus on the approved bot requests and their contents; second, we look into the details of highly discussed task approval requests.

19 We measured the inter-rater reliability by using Cohen's kappa. At this stage, we had a substantial agreement level, i.e., 0.65.
20 The codebook, along with the data, is provided on GitHub: https://github.com/FUB-HCC/wikidata-bot-request-analysis

Table 3: Task requests categorized into edit focus, activity type, and the request result (approved/denied) (n = 681, number of all task requests).

Edit Focus    Activity       Approved   Denied
Item          add            85         12
              update         40         3
              merge          6          1
              mark-deleted   1          0
Term          add            22         5
              update         11         0
Label         add            31         3
              update         15         1
              delete         1          0
Description   add            33         5
              update         10         1
Alias         add            4          0
              update         1          0
              delete         1          1
Statement     add            141        19
              update         15         0
              delete         1          1
Claim         add            161        18
              update         33         7
              delete         13         3
Qualifier     add            7          0
Reference     add            9          3
              update         4          0
              delete         1          0
Rank          update         1          0
Sitelink      add            52         3
              update         23         4
              delete         1          0
              merge          1          0
              revert         1          0
Badge         add            4          0
Page          add            0          1
              update         8          1
              delete         1          0
Community     maintain       25         5
Misc          –              26         13

4.1 Approved Requests
In this part, we show the types of activities which were approved, the sources they used for editing, and the relationship between bots and their operators. We focus on three aspects of the tasks: the activity focus, the activity type, and the origin of the data used by the bots.

4.1.1 Activity Type and Focus. The activity focus considers what the bots plan to edit in Wikidata. It can be seen in Table 3 that most task requests aim to edit items in Wikidata. The different categories represent the different parts of an item, which are organized into terms (i.e., labels, descriptions, and aliases), statements (i.e., claims with qualifiers), references, ranks, and sitelinks with badges.21 The numbers of approved task requests show that the majority of bots focus on adding data, to statements in general and to claims more specifically. Figure 2 shows the distribution of approved tasks into activity types and edit focuses; again, the majority of task requests deal with the addition of data.

Figure 2: Approved task requests organized into activity type (upper half: 70% add, 20% update, 3% maintain, 3% unknown, 3% delete, 1% revert) and edit focus (lower half: 26% claim, 20% statement, 16% item, 16% term, 12% sitelink, 5% misc, 3% unknown, 2% reference). n = 794; 262 RfPs are requests with multiple tasks. The edit focus Term consists of Alias, Label, and Description; the edit focus Miscellaneous contains the smaller categories such as Qualifier, Rank, Badge, and maintenance-related data.

Furthermore, there are 30 requests dealing with community concerns regarding maintenance activities, with archiving pages or sections, moving pages, reverting edits, removing duplicates, and generating statistical reports as some of the examples.

In addition, there are 24 requests we classified as unknown tasks, half of which are written extremely briefly22 or in an unclear way,23 so that it was difficult for us to identify their potential tasks. Some other cases could only be assigned to a specific task based on particular assumptions; for instance, ShBot24 uses the bot framework QuickStatements25 for batch edits, which suggests a great likelihood of importing data into statements. However, this tool can also be used to remove statements or to import terms, so it remains challenging to identify a request's focus without analyzing the edits. We categorized all tasks abiding strictly by our codebook; all RfPs which required assumptions are therefore classified as unknown.
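For context, QuickStatements consumes simple line-based batch commands, which is why such a request is ambiguous: the very same batch file can add statements, add terms, or remove statements. A hedged illustration of the tool's version-1 command format, written out from Python (Q4115189 is Wikidata's sandbox item):

```python
# Illustrative QuickStatements (version 1) batch, tab-separated commands.
commands = [
    "Q4115189\tP31\tQ5",         # add a statement: instance of (P31) -> human (Q5)
    'Q4115189\tLen\t"Alice"',    # add an English label: a term edit, not a statement
    "-Q4115189\tP31\tQ5",        # the "-" prefix removes the statement again
]
with open("batch.qs", "w") as f:
    f.write("\n".join(commands))
```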

We assumed that the addition of data to Wikidata was related primarily to Wikidata's inception, i.e., that the majority of task approvals would fall at the beginning of Wikidata's lifetime. However, Figure 3 shows that the most often requested activity types in the first six months were add and update, and that at the end of 2017 these two activity types were often requested again. We found the 2017 peak in requests for adding and updating items unusual and investigated it further: two operators26 carried out 31 requests in 2017 for importing items for different languages. Since all these requests were permitted, this explains the sudden acceleration of importing and updating tasks shown in the graph.

21 These badges indicate whether a sitelink points to an excellent article in a specific Wikipedia language version.
22 E.g., www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/Glavkos_bot
23 E.g., www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/BotMultichillT
24 www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/ShBot
25 Further information is available at meta.wikimedia.org/wiki/QuickStatements
26 Both editors are employed by Wikimedia Sverige and were engaged in a GLAM initiative.

Figure 3: Activity types of approved task requests over time, December 2012 to June 2018 (n = 600; the 81 unsuccessful task requests are not included). The y-axis shows approved requests per half year (0 to 120); the legend distinguishes add, update, delete, merge, revert, maintain, and unknown.

In the following step, we looked more closely at the sources of the data which bots are supposed to add to Wikidata.

Figure 4: Sources of approved task requests over time, December 2012 to June 2018 (n = 600; the 81 unsuccessful task requests are not included). The y-axis shows the sources of approved tasks per half year (0 to 100); the legend distinguishes internal, external, and unknown.


Table 4: All internal sources, the top 10 most-used external sources, and unknown sources.

Origin           Source                     Task Requests
Internal (455)   Wikipedia                  263
                 Wikidata                   110
                 Wiki Loves Monuments       30
                 Wikimedia Commons          13
                 Wikisource                 5
                 Wikivoyage                 4
                 Wikinews                   2
                 Wikiquote                  2
                 WikiPathways               2
                 Wikimedia Project          24
External (38)    MusicBrainz                8
                 VIAF                       7
                 OpenStreetMap              4
                 DBpedia                    4
                 Freebase                   4
                 GRID                       3
                 GND                        2
                 GitHub                     2
                 YouTube                    2
                 Disease Ontology Project   2
Unknown          Unknown                    107

4.1.2 Data Origin. In addition to the types of tasks, RfPs also state the sources from which data are imported. We tried to identify all sources independently of the focus or activity type in the classification of all task approval pages. We differentiate internal, external, and unknown sources:

(1) Internal: all sources from the Wikimedia ecosystem,
(2) External: all sources outside Wikimedia's ecosystem, and
(3) Unknown: all cases where the source of data was a local file or database or was not clearly stated.

Table 4 shows the number of approved tasks per data source. The majority of approved task requests (454) deal with data from the Wikimedia ecosystem, with Wikipedia providing most of the data. In addition to Wikipedia, the Wiki Loves Monuments (WLM) initiative, organized worldwide by Wikipedia community members and Wikimedia projects, is another notable source. The category Wikimedia project refers to requests where the operator indicated that the source was restricted to the scope of the Wikimedia projects; however, a specific attribution to one of the projects was not possible.

Another important source is Wikidata itself. We found that only a small number of these data are used in maintenance tasks;27 the majority of requests are for tasks such as retrieving information from Wikidata, adding descriptions, and updating or adding new labels by translating the existing labels into other languages.

There are 128 task approval requests that relate to importing external data into Wikidata. The external sources most often mentioned are MusicBrainz28 and VIAF29 (cf. Table 4); both are used mainly to add identifiers to Wikidata items. Other external sources are OpenStreetMap,30 DBpedia,31 and Freebase.32

27 Among the 104 requests with Wikidata as the data source, there are 16 requests holding maintenance tasks, such as archiving discussions or updating database reports.

Figure 5: Wikipedia language versions used as sources mentioned on task approval pages (n = 195; the 488 other requests either did not provide a specific Wikipedia language version or did not have Wikipedia as the source). The bar chart covers the 43 language versions used, with counts ranging from single mentions up to about 40.

As shown in Figure 4, internal data sources have remained the ones most often mentioned in task requests. External and unknown sources are on the same level, with external sources showing a slight increase over time.

Most data from Wikipedia comes from 43 different language versions (cf. Figure 5), with English, French, German, Russian, and Persian being the top five languages. The results show that bot contributions to certain languages have made these languages more prominent than others in Wikidata.

There are 109 RfPs in total with unknown sources, 85 of which were approved. Some of these requests mentioned a local file or database as a data source; other RfPs do not add data to Wikidata at all but, for example, move pages.

4.2 Disputed Requests
There are 81 RfPs in our dataset which were closed unsuccessfully. We wanted to understand better why the community decided to decline these requests; thus, we investigated the main reasons for all unsuccessful RfPs.

Our data show that operators themselves are most commonly responsible for their RfPs being rejected. Most of the time, operators did not respond to questions in a timely manner or did not provide enough information when required. There are only a few cases where RfPs were not approved by the community, i.e., where no community consensus was reached or the bot violated the bot policy. In one case,33 for instance, an editor was asking for bot rights for their own user account instead of an account for the bot. This violates the bot policy, as operators are required to create a new account for their bot and include the word "bot" in the account name.

28 MusicBrainz is an open and publicly available music encyclopedia that collects music metadata: www.musicbrainz.org
29 VIAF (Virtual International Authority File) is an international authority file: www.viaf.org
30 OpenStreetMap is an open-license map of the world: www.openstreetmap.org
31 DBpedia is also a Wikipedia-based knowledge graph: wiki.dbpedia.org
32 Freebase was a collaborative knowledge base which was launched in 2007 and closed in 2014; all data from Freebase was subsequently imported into Wikidata.
33 www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/Florentyna

Table 5: Top 5 most-edited task requests closed as unsuccessful.

Reasons                                               No. of RfP Edits   No. of Editors   RfP Created At   Reference
Data source introduced errors                         88                 10               2018-03          [26]
Bot name, bot edits, bot performance                  43                 6                2015-12          [24]
Automatic adding of implicit statements               41                 21               2013-04          [23]
Support from less active users, duplicate task        32                 16               2018-03          [27]
Bot name, conflict with users, running without flag   32                 11               2014-10          [25]

Table 6 shows the main reasons why tasks were rejected: for example, RfPs whose tasks had already been implemented by other bots (duplicate), or RfPs requesting tasks which were not needed, such as one bot which wanted to remove obsolete claims related to property P107,34 although there were no more items associated with P107.

34 P107 was a property representing the GND type.

Furthermore, RfPs were closed as unsuccessful because the community could not trust the operators' behavior. One operator, for example, submitted many RfPs which were not accepted; the community questioned this user's editing behavior and required them to gain community trust first. This shows that unsuccessful closing is not only related to the task types that users had requested.

In the following, we describe the six RfPs which were edited and discussed most (cf. Table 5), using the number of edits on an RfP page as a proxy for challenging cases.

Phenobot [26] asked for permission to correct species names based on the UniProt Taxonomy database. The community doubted this data source and insisted on more reliable information. The bot had introduced some errors, so the community had to ask for edit rollbacks. Since then, the operator has been inactive for two years, which finally led to this request being closed unsuccessfully.

WikiLovesESBot [27], which wanted to add statements related to Spanish municipalities from the Spanish Wikipedia, gained continuous support from nine users until a user asked for more active Wikidata users to comment. After this, one active user checked the bot's contributions and pointed out that this bot was blocked. Moreover, another active user figured out that this request was asking for duplicate tasks, since there was already another user working on Spanish municipalities. The RfP remained open for about half a year without any operator activity and came to a procedural close.

Another remarkable case is VlsergeyBot [25], which intended to update local dictionaries, transfer data from infoboxes to Wikidata, and update properties and reports in Wikidata. At first, the account was named Secretary, which was against the bot policy; after a community suggestion, it was renamed to VlsergeyBot. After that, a user opposed the request, saying that "Vlsergey's activity generates a number of large conflicts with different users in ruwiki", whereupon another user stated that "maybe you have a personal conflict with Vlsergey", and the discussion looked like a personal conflict. The community then considered whether this account needed a bot flag at all and found that the bot was already running without a bot flag for hours and had even broken the speed limit for bots with a flag. The operator promised to update his code by adding restrictions before the next run. Finally, the operator withdrew his request, stating that he did not need a bot flag that much.

ImplicatorBot [23] was intended to add implicit statements to Wikidata automatically. Implicit statements are relationships between items which can be inferred from existing relationships: a property of an item stating that a famous mother has a daughter implies, for example, that the daughter has a mother, even if this is not yet represented. The task sounds quite straightforward, but the community hesitated to approve it, considering the possibility of vandalism. Furthermore, changes in properties are still common, and such a bot might cause tremendous further work in the case of those changes. Even though the operator had shared his code, the discussion proceeded with the highest number of editors discussing a request (i.e., 21), and finally the operator stated that he was "sick and tired of the bickering". Despite the community insisting that it needed the task, the operator withdrew it.
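To make the inference concrete, the sketch below shows the kind of check such a bot would run, using the public wbgetclaims endpoint and Wikidata's mother (P25) / child (P40) property pair. The helper functions are our own illustration, not ImplicatorBot's actual code:

```python
import requests

API = "https://www.wikidata.org/w/api.php"

def claim_targets(entity, prop):
    """Item IDs that `entity` points to via property `prop`."""
    r = requests.get(API, params={"action": "wbgetclaims", "entity": entity,
                                  "property": prop, "format": "json"}).json()
    return {c["mainsnak"]["datavalue"]["value"]["id"]
            for c in r.get("claims", {}).get(prop, [])
            if c["mainsnak"]["snaktype"] == "value"}

def missing_inverse(mother):
    """Children listed on the mother's item (child, P40) whose own item
    lacks the inverse claim (mother, P25) and could thus be completed."""
    return {child for child in claim_targets(mother, "P40")
            if mother not in claim_targets(child, "P25")}
```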

Similar to ImplicatorBot, Structor [24] was withdrawn because the operator was tired of bureaucracy. The bot wanted to add claims, labels, and descriptions, particularly to provide structured information about species items. Many questions were raised by a small number of community members (a total of six different users participated in the discussion), including about the bot name, the source of the edits, duplicate entries, the edit summary, and the test edit performance. The operator actively responded to all these questions. However, the whole process was so surprisingly complex (the discussions continued for over seven months from the start) that the operator was inactive for a long time and then withdrew his request.

Requests denied because the user is not trusted are also interesting. ElphiBot_335 [22], for example, which is managed by three operators together, sent a request, along with the source code, for updating the interwiki links in Wikidata after a category in the Arabic Wikipedia had been moved. The community, however, preferred someone more responsible and more familiar with the technical aspects of Wikipedia and Wikidata. What concerned the discussants most was that the operators were not capable enough to fix the bugs introduced by their bot. The community did not feel comfortable approving this request because the operators' approach to fixing bugs was simply to reverse the mistakes manually. The requirement to run a bot should not only be to know how to code but surely also to respond in a timely manner and to notify those who can fix the issue.

35 This RfP has a total of 29 edits, and 8 editors contributed to the discussions.


Table 6: Main reasons for unsuccessful task requests (n = 81, number of unsuccessful task requests).

Reason                      Description                                                                         Frequency
No activity from operator   Operators are not active or do not respond to the questions regarding their bots   35
Withdrawn                   Operators wanted to close the requested task                                       14
Duplicate                   Requested tasks are redundant and already done by other bots or tools              9
No community consensus      Community opposed the requested tasks                                              6
User not trusted            Operator has questionable editing behavior and ability to fix bugs introduced      6
Task not needed             Tasks do not need a bot or are not a significant problem for the time being        5
Don't need bot flag         The requested tasks can be performed without a bot flag                            3
No info provided            The fields in the RfP page are left empty                                          2
Against bot policy          Task is not in line with the bot policy                                            1

5 DISCUSSION
In this study, we collected existing data from the RfP pages on Wikidata to find out what kinds of tasks the community allows to be automated or shared with bots.

Looking at the type and focus of the tasks mentioned most often in RfPs, we found that the dominant request type over time is adding data. This observation suggests that Wikidata's community uses bots to increase the coverage of the provided data, which aligns with the vision of the Wikimedia Foundation: "Imagine a world in which every single human being can freely share in the sum of all knowledge."36 Updating data is the second most requested task, which shows that bots also take part in correcting mistakes or refreshing data according to changes (e.g., updating the sitelinks of moved pages) and thus contribute to data completeness, which is defined as one data quality dimension [2].

However, we were surprised that there were fewer requests regarding adding or updating references, which could support the trustworthiness (also a data quality dimension) of data imported by bots. This result supports existing analyses of the edit activities of bots [18]. It would be interesting to bring these two lines of research together, the task approval and the edit perspective, to understand the development over time more closely. We can thus infer that bots are allowed to assist the Wikidata community in ensuring data quality from the completeness angle; however, this is less visible from the trustworthiness angle. In comparison to Wikipedia, where bots perform primarily maintenance tasks, bot requests in Wikidata concentrate mainly on the content, i.e., the data perspective.

The majority of data sources mentioned in RfPs come from inside Wikimedia projects, mainly Wikipedia (cf. Section 4.1.2). This observation implies that Wikidata is on its way to serving as the structured data source for Wikimedia projects, one of the main goals of Wikidata's development. Among the five most-used Wikipedia language versions, English, French, Russian, German, and Persian (cf. Figure 5), the first three languages show a dominance of Western knowledge in bot imports. In an earlier study, Kaffee et al. [11] found a similar influence of Western languages in Wikidata, with the most dominant being English, Dutch, French, German, Spanish, Italian, and Russian. The similarity of the results in both studies could imply that the better coverage of these languages compared to others is a result of having bots. Kaffee et al. also found that language coverage is not related to the number of speakers of a language, which supports our assumption. We could not find evidence that languages other than Western ones are underrepresented in bot requests on purpose; we rather expect that the current bot requests are a representative sample of Wikidata's community.

36 Information on the Wikimedia strategy can be found at https://wikimediafoundation.org/about/vision

We can see from the external sources that bots use these sources mostly for importing identifiers (e.g., VIAF) or for importing data from other databases (e.g., DBpedia, Freebase). This insight supports Piscopo's [17] argument that bot edits need to be more diverse. We suggest that further efforts should be made to import or link data from different sources, for example, from research institutions and libraries. With the increasing usage of the Wikibase software,37 we assume that more data might be linked or imported to Wikidata. Our classification revealed that bots already import data from local files or databases; however, such data imports often rely on the community trusting the bot operators and do not seem to be large-scale.

The requested maintenance tasks within the RfPs show patterns similar to those in the Wikipedia community: bots are being used to overcome the limitations of the MediaWiki software [9]. Administrative tasks such as deletion are not often requested in RfPs; moreover, bots on Wikidata are allowed to delete sitelinks only. Similar to Wikipedia, the Wikidata community is coming closer to a common understanding of what bots are supposed to do and what not.

The issue of unknown tasks, which were not clearly defined in RfPs, shows the trust the community has in individual operators, probably due to their previous participation history. There is a noticeable number of approved RfPs which we coded as unknown due to their vague task descriptions, while there are also cases where the task descriptions were clear but the community did not trust the operators, so the requests were not approved. These two observations indicate that, in addition to its defined policy and procedures, the community also gives importance to trust.

The success rate of RfPs is relatively high, since only a small number of RfPs are closed as unsuccessful. Our findings show that, among the 81 unsuccessful RfPs, only some were rejected by the community directly; the majority were unsuccessful because the operators were not responding or had withdrawn the request. Operator inactivity is therefore a more frequent reason for failure than community refusal. In some cases (i.e., the most-discussed RfPs) we can see that the community considers every detail of the tasks before coming to a decision; in other cases, it approves the request without a detailed discussion, as can be seen in the cases of unknown tasks and unknown sources. This result could indicate that, in addition to the defined procedure for RfPs, the community applies a more flexible approach when deciding on RfPs, taking the context (e.g., the editor's experience) of the application into account.

37 Wikibase is based on the MediaWiki software with the Wikibase extension: https://www.mediawiki.org/wiki/Wikibase

In summary, the high approval rate of RfPs shows that the Wikidata community is positive towards bot operations in Wikidata and willing to take advantage of task automation through bots. Wikidata's community profited from the experiences of the Wikipedia community and built productive human-bot processes in quite a short period. However, we leave the impact of these processes on data quality to future research.

6 CONCLUSION
We studied the formal process of requesting bot rights in Wikidata to find out which types of tasks are allowed to be automated by bots. Our study provides a detailed description of the RfP process. We retrieved the closed RfPs from the Wikidata archive up to mid-2018, defined a scheme in the form of a codebook to classify the RfPs, and developed our dataset. The RfPs were studied mainly from two perspectives: (1) what information is provided at the time the bot rights are requested, and (2) how the community handles these requests. We found that the main requested tasks are adding claims, statements, terms, and sitelinks to Wikidata and that the main sources of bot edits have their roots in Wikipedia. This contrasts with Wikipedia, where bots mostly perform maintenance tasks. Our findings also show that most of the RfPs were approved; a small number were unsuccessful, mainly because the operators had withdrawn them or showed no activity.

Future directions of this research will focus on analyzing the actual bot activities and comparing them with the results of this study to find out whether bots really perform the tasks they asked for. This could help us better understand how bots evolve over time and whether the community controls bot activities based on their permitted tasks or whether bots can perform any edit once they get permission to operate as a bot.

ACKNOWLEDGMENTS
We thank the Deutscher Akademischer Austauschdienst (DAAD) [research grant 57299294] and the German Research Foundation (DFG) [fund number 229241684], who supported this research financially, and the anonymous reviewers for their valuable comments, which helped to improve the paper.

REFERENCES
[1] Emilio Ferrara, Onur Varol, Clayton A. Davis, Filippo Menczer, and Alessandro Flammini. 2016. The rise of social bots. Commun. ACM 59, 7 (2016), 96–104.
[2] Michael Färber, Frederic Bartscherer, Carsten Menne, and Achim Rettinger. 2018. Linked data quality of DBpedia, Freebase, OpenCyc, Wikidata, and YAGO. Semantic Web 9, 1 (Jan. 2018), 77–129. https://doi.org/10.3233/SW-170275
[3] R. Stuart Geiger. 2009. The social roles of bots and assisted editing programs. In Int. Sym. Wikis.
[4] R. Stuart Geiger. 2012. Bots are users too: Rethinking the roles of software agents in HCI. Tiny Transactions on Computer Science 1 (2012), 1.
[5] R. Stuart Geiger and Aaron Halfaker. 2013. When the levee breaks: without bots, what happens to Wikipedia's quality control processes? In Proceedings of the 9th International Symposium on Open Collaboration (WikiSym '13). ACM, New York, NY, USA, 61–66. https://doi.org/10.1145/2491055.2491061
[6] R. Stuart Geiger and Aaron Halfaker. 2017. Operationalizing conflict and cooperation between automated software agents in Wikipedia: a replication and expansion of "even good bots fight". ACM Hum.-Comput. Interact. 1, 2 (Nov. 2017), 33. https://doi.org/10.1145/3134684
[7] R. Stuart Geiger and David Ribes. 2010. The work of sustaining order in Wikipedia: the banning of a vandal. In Proceedings of the 2010 ACM Conference on Computer Supported Cooperative Work (CSCW '10). ACM, New York, NY, USA, 117–126. https://doi.org/10.1145/1718918.1718941
[8] Aaron Halfaker, R. Stuart Geiger, Jonathan T. Morgan, and John Riedl. 2013. The rise and decline of an open collaboration system: how Wikipedia's reaction to popularity is causing its decline. 57(5) (2013), 664–688. https://doi.org/10.1177/0002764212469365
[9] Aaron Halfaker and John Riedl. 2012. Bots and cyborgs: Wikipedia's immune system. Computer 45, 3 (March 2012), 79–82. https://doi.org/10.1109/MC.2012.82
[10] Andrew Hall, Loren Terveen, and Aaron Halfaker. 2018. Bot detection in Wikidata using behavioral and other informal cues. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (Nov. 2018), 1–18. https://doi.org/10.1145/3274333
[11] Lucie-Aimée Kaffee, Alessandro Piscopo, Pavlos Vougiouklis, Elena Simperl, Leslie Carr, and Lydia Pintscher. 2017. A glimpse into Babel: an analysis of multilinguality in Wikidata. In Proceedings of the 13th International Symposium on Open Collaboration (OpenSym '17). ACM, New York, NY, USA, 14:1–14:5. https://doi.org/10.1145/3125433.3125465
[12] Jonathan Lazar, Jinjuan Heidi Feng, and Harry Hochheiser. 2017. Research Methods in Human-Computer Interaction (2nd ed.). Morgan Kaufmann.
[13] Brenda Leong and Evan Selinger. 2019. Robot Eyes Wide Shut: Understanding Dishonest Anthropomorphism. ACM, New York, NY, USA.
[14] Randall M. Livingstone. 2016. Population automation: An interview with Wikipedia bot pioneer Ram-Man. First Monday 21, 1 (4 January 2016). https://doi.org/10.5210/fm.v21i1.6027
[15] Claudia Müller-Birn. 2019. Bots in Wikipedia: Unfolding their duties. Technical Reports, Serie B, TR-B-19-01. Freie Universität Berlin. https://refubium.fu-berlin.de/handle/fub188/24775
[16] Claudia Müller-Birn, Benjamin Karran, Janette Lehmann, and Markus Luczak-Rösch. 2015. Peer-production system or collaborative ontology engineering effort: What is Wikidata? In Proceedings of the 11th International Symposium on Open Collaboration. ACM, 20.
[17] Alessandro Piscopo. 2018. Wikidata: A New Paradigm of Human-Bot Collaboration. arXiv:1810.00931 [cs] (Oct. 2018). http://arxiv.org/abs/1810.00931
[18] Alessandro Piscopo, Lucie-Aimée Kaffee, Chris Phethean, and Elena Simperl. 2017. Provenance information in a collaborative knowledge graph: an evaluation of Wikidata external references. In The Semantic Web – ISWC 2017 (Lecture Notes in Computer Science). Springer, Cham, 542–558. https://doi.org/10.1007/978-3-319-68288-4_32
[19] Alessandro Piscopo, Chris Phethean, and Elena Simperl. 2017. What makes a good collaborative knowledge graph: group composition and quality in Wikidata. In Social Informatics (Lecture Notes in Computer Science), Giovanni Luca Ciampaglia, Afra Mashhadi, and Taha Yasseri (Eds.). Springer International Publishing, 305–322.
[20] Thomas Steiner. 2014. Bots vs. Wikipedians, Anons vs. Logged-Ins (Redux): a global study of edit activity on Wikipedia and Wikidata. In Proceedings of The International Symposium on Open Collaboration (OpenSym '14). ACM, New York, NY, USA, 251–257. https://doi.org/10.1145/2641580.2641613
[21] Margaret-Anne Storey and Alexey Zagalsky. 2016. Disrupting developer productivity one bot at a time. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016). ACM, New York, NY, USA, 928–931. https://doi.org/10.1145/2950290.2983989
[22] Wikidata. 2013. RfP: ElphiBot_3. www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/ElphiBot_3 [Last accessed 01 August 2018].
[23] Wikidata. 2013. RfP: ImplicatorBot. www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/ImplicatorBot [Last accessed 01 August 2018].
[24] Wikidata. 2014. RfP: Structor. www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/Structor [Last accessed 01 August 2018].
[25] Wikidata. 2014. RfP: VlsergeyBot. www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/VlsergeyBot [Last accessed 01 August 2018].
[26] Wikidata. 2016. RfP: Phenobot. www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/Phenobot [Last accessed 01 August 2018].
[27] Wikidata. 2016. RfP: WikiLovesESBot. www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/WikiLovesESBot [Last accessed 01 August 2018].



Figure 1: Example of a request-for-permissions (RfP) page with the extracted fields (bot name, operator name, tasks, code, function details, decision date, decision).

research shows that equal contributions of humans and bots have a positive impact on data quality, while more anonymous edits lead towards lower quality.

Hall et al. [10] analyzed these anonymous edits on Wikidata to detect bots in this group which had not previously been identified in Wikidata. The study shows that two to three percent of the contributions previously considered to be human contributions (more than 1 million edits) came from unidentified bots. They emphasize that this might be a concerning issue for Wikidata and all projects relying on these data: even if vandalism causes only a small portion of these edits, the damaging effect on Wikidata might be high.

This danger is reflected by Piscopo [17], who highlights significant challenges for Wikidata that might endanger its sustainability: namely, a lack of quality control because of the large amount of data added by bots, a lack of diversity because only a few sources are used, and existing threats to user participation because of bot usage.

Existing research focuses primarily on the activity levels of bots in Wikidata based on their edits. Some work (e.g. [17]) conveys the impression that bots are independent of humans. However, this ignores the fact that humans operate bots. The community officially grants most bot activities in a well-defined process.

The literature shows that bots have so far been studied mainly from the angle of their activity, both in Wikipedia and Wikidata. In Wikipedia, bots are used primarily for quality assurance tasks, i.e., vandalism detection, and maintenance tasks, for example, removing or replacing templates on articles, while Wikidata's bots mostly perform tasks related to content editing. A possible reason could be the structured nature of Wikidata's content, which is less challenging to deal with than the unstructured data in Wikipedia. It is therefore interesting to examine the content-editing activities that bots have requested most often in Wikidata and that the community has approved. While adding more content through bots seems to contribute to data quality from the completeness angle, this raises the question of whether these data are based on sources and improve data quality from the trustworthiness aspect. The following study provides the ground for answering such questions in the future. We investigate the RfP process of bots in more detail. We describe which tasks Wikidata's community finds useful for bots and in which cases the community does not support a bot request.

3 APPROVING BOT TASKS

In this section, we describe the bot approval process on Wikidata, explain our data collection, detail the classification process, and finally give an overview of the resulting dataset.

3.1 Approval Process in Wikidata

Building on the experiences of the Wikipedia community, Wikidata has had a well-defined policy system for bots almost from the beginning (November 2012)⁵. Except for cases like editing Wikidata's sandbox⁶ or their own user pages, every (semi-)automatic task carried out by a bot needs to be approved by the community. This means that before operators can run their bots on Wikidata, they have to open an RfP for their bot⁷. The RfP is thus the formal way of requesting bot rights for an account, where the decisions on RfPs are based on community consensus.

An RfP is initiated either by an editor's own need or by a community request⁸. Such a bot request is well documented and available to all interested community members on Wikidata.

Figure 1 shows a typical RfP page. It consists of a bot name⁹, an operator name (bot owner), tasks (a brief description of what the bot intends to do), a link to the source code, function details (a detailed description of the bot's tasks and the sources of imported data), and the final decision (RfP approved or not approved, with a reason).

Bot owners have to use a template to provide all information for the decision-making process and need to clarify questions regarding their request during the decision-making process. They also often have to provide a number of test edits (50 to 250 edits).

5 https://www.wikidata.org/w/index.php?title=Wikidata:Bots&oldid=549166
6 Wikidata's sandbox can be used for test edits: https://www.wikidata.org/wiki/Wikidata:Sandbox
7 All open requests are available at www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot
8 An example can be found at https://www.wikidata.org/w/index.php?title=Wikidata:Bot_requests&oldid=38203229#Polish_.22Italian_comune.22_descriptions
9 According to the bot policy, bot accounts are recommended to have the word bot in their names.


Table 1: Example of the information extracted from an RfP page.

    URL:                www.wikidata.org/[...]/Bot/MastiBot
    Bot Name:           mastiBot
    Operator Name:      Masti
    Task and Function:  add basic personal information based on biography-related infoboxes from plwiki
    Decision:           approved
    Decision Date:      15/09/2017
    First Edit:         03/09/2017
    Last Edit:          21/02/2018
    No. of Edits:       8
    No. of Editors:     5

The community handles RfPs in accordance with the Bot Approval Process¹⁰. The community discusses whether the requested task is needed and whether the implementation of the task functions properly. After the decision has been made, an administrator or a bureaucrat closes the request: if approved, a bureaucrat will flag the account; if not, the request is closed stating the reason for the unsuccessful request.

After a successful request, each bot operator has to list the task the bot performs, with a link to the RfP, on its user page. However, the bot operator is required to open a new RfP if there is a substantial change in the tasks the bot performs. Consequently, bots can have multiple requests, which are shown on the user page. Furthermore, in special cases, a bot is also allowed to carry out administrative tasks, such as blocking users or deleting or protecting pages. In this case, the operator needs to apply for both the RfP and the administrator status¹¹.

3.2 Data Collection

We collected all bot requests that were in the final approval stage in July 2018¹², i.e., we ignored all tasks without a final decision, from Wikidata's archive for requests¹³. We collected our data based on web scraping, i.e., web data extraction programs implemented in Python. Bot requests that were listed several times in different archives were only parsed once¹⁴. This resulted in 685 task approval pages.

We extracted the following information from each page (cf. Figure 1): URL, bot and operator name, decision, decision date, tasks, code, and function details. We additionally collected the date of the first and last page edit¹⁵, the number of page edits, and the number of distinct editors who contributed to the request for each page by using the MediaWiki API¹⁶. This extracted information (cf. Table 1) was processed and stored in a relational database¹⁷.

10 A detailed description is available at www.wikidata.org/wiki/Wikidata:Bots
11 Further information can be found at https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Administrator
12 The first request in our data set was opened on October 31, 2012, and the last one on June 29, 2018.
13 www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Archive/Requests_for_bot_flags
14 For example, www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/VIAFbot is listed in www.wikidata.org/wiki/Wikidata:Requests_for_permissions/RfBot/March_2013 and in www.wikidata.org/wiki/Wikidata:Requests_for_permissions/RfBot/April_2013, and only the latter one was used.
15 The first edit can be interpreted as the request opening and the last edit as the request closing or date of archiving.
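As an illustration of this collection step, the following sketch shows how the per-page metadata (first/last edit, number of edits, distinct editors) could be retrieved via the standard MediaWiki query/revisions API. It is a minimal example, not the authors' actual scraper; the page title is taken from Table 1.

    import requests

    API = "https://www.wikidata.org/w/api.php"

    def revision_stats(page_title):
        # Collect all revisions of one RfP page (timestamp and user only).
        revisions = []
        params = {
            "action": "query", "prop": "revisions", "titles": page_title,
            "rvprop": "timestamp|user", "rvlimit": "max", "format": "json",
        }
        while True:
            data = requests.get(API, params=params).json()
            page = next(iter(data["query"]["pages"].values()))
            revisions.extend(page.get("revisions", []))
            if "continue" not in data:       # no further result pages
                break
            params.update(data["continue"])  # follow API pagination
        timestamps = sorted(r["timestamp"] for r in revisions)
        return {
            "first_edit": timestamps[0],     # request opening
            "last_edit": timestamps[-1],     # request closing/archiving
            "no_of_edits": len(revisions),
            "no_of_editors": len({r["user"] for r in revisions}),
        }

    print(revision_stats("Wikidata:Requests for permissions/Bot/MastiBot"))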

3.3 Task Approval Data

We collected 683 distinct requests from Wikidata; however, two of them are no longer available, and consequently we excluded them. Of the resulting 681 requests, 600 requests (88%) were successful and 81 (12%) were unsuccessful.

An average of five people (σ = 2.8) participated in an approval request, for example, by taking part in the discussion or stating the final decision. These people accounted for an average of slightly above ten edits for each request (σ = 8.5).

Based on the requests collected, we identified 391 distinct bots and 366 distinct bot operators on Wikidata (cf. Table 2). Some operators applied for more than one task for their bot in one request (e.g., update label, update description, update alias), with five being the highest number. Furthermore, bot owners can operate more than one bot. The majority of operators (319 editors) have only one bot, while three editors were running three bots, and there was even one operator managing seven bots simultaneously. Similarly, one bot can also be operated by more than one user; for example, three editors were managing the ProteinBoxBot together.

Table 2: Overview of task requests (n = 683, number of all task requests).

    Decision Result  Requests  No. of Bots  Operators
    Successful       600       323          299
    Unsuccessful     81        68           67
    Both             681       391          366

3.4 Classification Process

We classified the collected data manually to gain a deeper understanding of the different tasks that bot operators carry out on Wikidata. In the following, we describe this process in detail.

We employed the qualitative method of content analysis, which is used to analyze unstructured content [12], in our case the text in the RfPs. We classified the data manually using the open coding approach of grounded theory. The data we were dealing with were the applicants' own sentences (in vivo codes); thus, we developed an emergent coding scheme and categorized the textual information. Two of the authors¹⁸ coded the data in a three-phase process, ensuring the consistency of the codes and reducing possible bias during the coding.

We read a sample of the collected data and discussed the diverse request details. We noticed that some common categories could be found within all requests based on particular perspectives, such as the potential actions of the requesting bots and the main sources of the data to be imported by bots. We thus focused on the task and function details of the RfP page and extracted the intended use (task) and the data source of the bot edits that were approved or closed as successful.

16 www.mediawiki.org/wiki/API:Main_page
17 The database is released under a public license on GitHub: https://github.com/FUB-HCC/wikidata-bot-request-analysis
18 The first and second authors of this paper.

In the first phase, we categorized 30 randomly chosen RfPs collaboratively to develop a shared understanding. Based on the first categories, we carried out a second round in which another set of 30 randomly chosen RfPs was categorized individually. We then compared the results and checked the agreement level¹⁹. After discussing diverging cases and cross-validating our category set, we continued with another round of categorizing the data (40 RfPs) individually, in which we had 39 agreements out of the 40 cases. We continued with further rounds of individual coding; we met regularly to discuss new or unclear cases and cross-validated our category sets to ensure the consistency of our data classification.
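For reference, the agreement level reported in footnote 19 follows the standard definition of Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e). The sketch below is a generic implementation of that formula, not the authors' tooling; the label lists are hypothetical.

    from collections import Counter

    def cohens_kappa(labels_a, labels_b):
        # p_o: observed agreement; p_e: agreement expected by chance.
        n = len(labels_a)
        p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
        freq_a, freq_b = Counter(labels_a), Counter(labels_b)
        p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
        return (p_o - p_e) / (1 - p_e)

    # E.g., 39 matching codes out of 40 give p_o = 0.975; kappa is lower
    # because it discounts the agreement expected by chance alone.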

We developed a codebook²⁰ as the primary reference for the classification process, starting from the first phase and updating it continually during the classification period. The codes are structured in verb-noun pairs denoting content entities and operations, which is inspired by the work of Müller-Birn [15]; the codebook was initially drafted after the first 30 RfP classifications. Based on this, we developed it further by structuring all newly presented information into verb-noun pairs and adding them to the codebook. The first category describes the potential scope of bot activity. The sub-categories contain all the entity types of Wikidata, such as items, claims, statements, references, and sitelinks. The unknown category refers to tasks without a clear definition. We developed an overview which shows a scheme of all entity types and their respective activities, together with the number of requests for these edits (cf. Table 3).
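To make the verb-noun structure concrete, a hypothetical excerpt of such a codebook could look as follows; the pairs mirror categories from Table 3, while the descriptions are our own illustration rather than quotes from the actual codebook.

    # Each code is an (activity, edit focus) pair, i.e., a verb-noun pair.
    CODEBOOK = {
        ("add", "claim"): "bot adds new claims to existing items",
        ("update", "label"): "bot changes labels, e.g., via translation",
        ("delete", "sitelink"): "bot removes sitelinks, e.g., to deleted pages",
        ("maintain", "community"): "archiving, statistics, and similar chores",
    }

    def code_request(activity, focus):
        # Requests that match no defined pair fall into "unknown".
        return CODEBOOK.get((activity, focus), "unknown")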

Another dimension of our classification considers the origin of the data which the bot operators plan to include in Wikidata. This category contains three main sub-categories: internal, external, and unknown sources. Internal sources are those within the Wikimedia sphere, whereas external sources are outside Wikimedia. Those requests without a specific source of data were categorized as unknown.

After coding all successful RfPs, we became curious about the RfPs which were not approved: Were there some unusual task requests, or did operators have difficulties convincing the community of their bots' safe activities? To find this out, we carried out another round of coding, covering the 81 unsuccessfully closed RfPs. We classified these RfPs as before, except that we also included the reasons for the request failure. The category unsuccessful thus contains all RfPs that were not approved, for reasons such as being duplicates, being inactive for a long time, not being in line with the bot policy, or being withdrawn (cf. Table 6).

4 RESULTS

We organized the results of our RfP analyses in two sections: firstly, we focus on the approved bot requests and their contents, and secondly, we look into the details of highly discussed task approval requests.

19 We measured the inter-rater reliability by using Cohen's kappa. At this stage, we had a substantial agreement level, i.e., 0.65.
20 The codebook along with the data is provided on GitHub: https://github.com/FUB-HCC/wikidata-bot-request-analysis

Table 3: Task requests categorized into edit focus, activity type, and the request result (approved/denied) (n = 681, number of all task requests).

    Edit Focus    Activity      Approved  Denied
    Item          add           85        12
                  update        40        3
                  merge         6         1
                  mark-deleted  1         0
    Term          add           22        5
                  update        11        0
    Label         add           31        3
                  update        15        1
                  delete        1         0
    Description   add           33        5
                  update        10        1
    Alias         add           4         0
                  update        1         0
                  delete        1         1
    Statement     add           141       19
                  update        15        0
                  delete        1         1
    Claim         add           161       18
                  update        33        7
                  delete        13        3
    Qualifier     add           7         0
    Reference     add           9         3
                  update        4         0
                  delete        1         0
    Rank          update        1         0
    Sitelink      add           52        3
                  update        23        4
                  delete        1         0
                  merge         1         0
                  revert        1         0
    Badge         add           4         0
    Page          add           0         1
                  update        8         1
                  delete        1         0
    Community     maintain      25        5
    Misc          --            26        13

4.1 Approved Requests

In this part, we show the types of activities which were approved, the sources they used for editing, and the relationship between bots and their operators. We focus on three aspects of the tasks: the activity focus, the activity type, and the origin of the data used by the bots.

4.1.1 Activity Type and Focus. The activity focus considers what the bots have planned to edit in Wikidata. It can be seen in Table 3 that most task requests aim to edit items in Wikidata. The different categories represent the different parts of an item, which are organized into terms (i.e., labels, descriptions, and aliases), statements (i.e., claims with qualifiers), references, ranks, and sitelinks with badges²¹. The number of approved task requests shows that the majority of bots seem to focus on adding data to statements in general and claims more specifically. Figure 2 shows the distribution of approved tasks into activity types and edit focuses. Again, the majority of task requests deal with the addition of data.

Figure 2: Approved task requests organized into activity type (the upper half) and edit focus (the lower half) (n = 794; 262 RfPs are requests with multiple tasks; edit focus Term consists of Alias, Label, and Description; the edit focus Miscellaneous contains the smaller categories such as Qualifier, Rank, Badge, and maintenance-related data). [Chart data: activity type: 70% add, 20% update, 3% maintain, 3% delete, 3% unknown, 1% revert; edit focus: 26% Claim, 20% Statement, 16% Item, 16% Term, 12% Sitelink, 5% Misc, 3% Unknown, 2% Reference.]

Furthermore, 30 requests deal with community concerns regarding maintenance activities, such as archiving page sections, moving pages, reverting edits, removing duplicates, and generating statistical reports.

In addition, 24 requests were classified as unknown tasks, half of which are written extremely briefly²² or in an unclear way²³, so that it was difficult for us to identify their potential tasks. Some other cases could only be assigned to a specific task based on particular assumptions; for instance, ShBot²⁴ uses a bot framework, QuickStatements²⁵, for batch edits, which makes it very likely that it imports data into statements. However, this tool can also be used to remove statements or to import terms, so it remains a challenge to identify a request's focus without analyzing the edits. We categorized all tasks strictly according to our codebook; all RfPs that required such assumptions are therefore classified as unknown.
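The ambiguity noted above follows directly from the command format: a QuickStatements V1 batch is essentially tab-separated item-property-value lines, and the same format expresses both additions and removals. The sketch below generates such lines; the batch content is made up for illustration (Q42 = Douglas Adams, P31 = instance of, Q5 = human are real identifiers), and our reading of the V1 syntax should be checked against the tool's documentation.

    def qs_command(item, prop, value, remove=False):
        # One QuickStatements V1 command: a tab-separated triple;
        # a leading "-" marks the command as a statement removal.
        line = "\t".join([item, prop, value])
        return "-" + line if remove else line

    batch = [
        qs_command("Q42", "P31", "Q5"),               # add a statement
        qs_command("Q42", "P31", "Q5", remove=True),  # remove it again
    ]
    print("\n".join(batch))
    # From the RfP alone, both kinds of batches look like "uses
    # QuickStatements", which is why such tasks were coded as unknown.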

We assumed that data addition to Wikidata relates primarily to Wikidata's inception, i.e., that the majority of task approvals would fall at the beginning of Wikidata's lifetime. However, Figure 3 shows that the most often requested activity types in the first six months were add and update. At the end of 2017, these two activity types were frequently requested again. We found this 2017 peak in requests for tasks such as adding and updating items unusual and investigated it further. Two operators²⁶ carried out 31 requests in 2017 for importing items for different languages. Since all these requests were permitted, this explains the sudden acceleration of importing and updating tasks shown in the graph.

21 These badges indicate whether a sitelinked article in a specific Wikipedia language version is, for example, an excellent article.
22 E.g., www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/Glavkos_bot
23 E.g., www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/BotMultichillT
24 www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/ShBot
25 Further information is available at meta.wikimedia.org/wiki/QuickStatements
26 Both editors are employed by Wikimedia Sverige and were engaged in a GLAM initiative.

Figure 3: Activity types of approved task requests over time (n = 600; all 81 unsuccessful task requests were not included). [Line chart: monthly numbers of approved requests (0 to 120) from 2012-12 to 2018-06, broken down by activity type: add, update, delete, merge, revert, maintain, unknown.]

In the following step, we looked more closely at the sources of the data which bots are supposed to add to Wikidata.

Figure 4: Sources of approved task requests over time (n = 600; all 81 unsuccessful task requests were not included). [Line chart: monthly numbers of approved requests (0 to 100) from 2012-12 to 2018-06, broken down by source type: internal, external, unknown.]


Table 4: All internal sources, the top 10 most used external sources, and unknown sources.

    Origin          Source                    Task Requests
    Internal (455)  Wikipedia                 263
                    Wikidata                  110
                    Wiki-loves-monuments      30
                    Wikicommons               13
                    Wikisource                5
                    Wikivoyage                4
                    Wikinews                  2
                    Wikiquote                 2
                    WikiPathways              2
                    Wikimedia Project         24
    External (38)   MusicBrainz               8
                    VIAF                      7
                    OpenStreetMap             4
                    DBpedia                   4
                    Freebase                  4
                    GRID                      3
                    GND                       2
                    GitHub                    2
                    YouTube                   2
                    Disease Ontology Project  2
    Unknown         Unknown                   107

4.1.2 Data Origin. In addition to the types of tasks, the RfPs also state the sources from which the data are imported. In the classification of all task approval pages, we tried to identify all sources independently of the focus or activity type. We differentiate internal, external, and unknown sources:

(1) Internal: all sources from the Wikimedia ecosystem,
(2) External: all sources outside Wikimedia's ecosystem, and
(3) Unknown: all cases where the source of data was a local file/database or not clearly stated (a minimal classification sketch follows below).
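The actual categorization was done manually, but the decision rule can be summarized as a small function; the keyword list below is our own simplification, not taken from the codebook.

    WIKIMEDIA_HINTS = ("wikipedia", "wikidata", "wikisource", "wikivoyage",
                       "wikinews", "wikiquote", "commons", "wikimedia")

    def classify_origin(stated_source):
        # Mirrors the three sub-categories: unknown, internal, external.
        text = (stated_source or "").lower()
        if not text or "local file" in text or "local database" in text:
            return "unknown"
        if any(hint in text for hint in WIKIMEDIA_HINTS):
            return "internal"
        return "external"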

Table 4 shows the number of tasks approved per data source. The majority of approved task requests (454) deal with data from the Wikimedia ecosystem, with Wikipedia providing most of the data. In addition to Wikipedia, the Wiki Loves Monuments (WLM) initiative, organized worldwide by Wikipedia community members and Wikimedia projects, is an important source. The category Wikimedia project refers to requests where the operator indicated that the source lies within the Wikimedia projects' scope; however, a specific attribution to one of the projects was not possible.

Another important source is Wikidata itself. We found that only a small number of these data are used in maintenance tasks²⁷; the majority of requests are for tasks such as retrieving information from Wikidata, adding descriptions, and updating or adding new labels by translating the existing labels in other languages.

There are 128 task approval requests that relate to importing external data into Wikidata. The external sources most often mentioned are MusicBrainz²⁸ and VIAF²⁹ (cf. Table 4). Both sources are used mainly to add identifiers to Wikidata items. Other external sources are OpenStreetMap³⁰, DBpedia³¹, and Freebase³².

27 Among the 104 requests with Wikidata as a data source, there are 16 requests holding maintenance tasks, such as archiving discussions or updating database reports.
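For readers unfamiliar with identifier statements, the snippet below shows roughly what such a bot-added claim looks like in Wikidata's JSON data model; P214 is the real "VIAF ID" property, while the concrete item/value pair is only an example.

    # A VIAF identifier claim as it appears in a Wikidata item's JSON.
    # P214 is the "VIAF ID" property; the value is a plain string.
    viaf_claim = {
        "mainsnak": {
            "snaktype": "value",
            "property": "P214",
            "datavalue": {"value": "113230702", "type": "string"},
        },
        "type": "statement",
        "rank": "normal",
    }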

Figure 5: Wikipedia language versions used as sources mentioned on task approval pages (n = 195; all 488 other requests either did not provide a specific Wikipedia language version or did not have Wikipedia as the source). [Bar chart over 43 language versions with counts from 0 to 40, including English, French, German, Russian, Farsi, Spanish, Italian, Polish, Dutch, Chinese, and others.]


As shown in Figure 4, the internal data sources have remained those most often mentioned in the task requests. External sources and unknown sources are on the same level, and external sources show a slight increase over time.

Most data from Wikipedia comes from 43 different language versions (cf. Figure 5), with English, French, German, Russian, and Persian being the top five languages. The results show that bots' contributions to certain languages have made these languages more prominent than others in Wikidata.

There is a total of 109 RfPs with unknown sources, 85 of which were approved; some of these requests mentioned a local file or database as a data source. Other RfPs do not add data to Wikidata but, for example, move pages.

4.2 Disputed Requests

There are 81 RfPs in our dataset which were closed unsuccessfully. We wanted to understand better why the community decided to decline these requests; thus, we investigated the main reasons for all unsuccessful RfPs.

Our data show that operators themselves are most commonly responsible for their RfPs being rejected. Most of the time, operators did not respond to questions in a timely manner or did not provide enough information when required. There are only a few cases where RfPs were not approved by the community, i.e., where no community consensus was reached or the bot violated the bot policy. In one case³³, for instance, an editor was asking for bot rights for their own user account instead of one for the bot. This violates the bot policy, as operators are required to create a new account for their bots and include the word bot in the account name.

28 MusicBrainz is an open and publicly available music encyclopedia that collects music metadata, available at www.musicbrainz.org
29 VIAF stands for Virtual International Authority File, an international authority file available at www.viaf.org
30 OpenStreetMap is an openly licensed map of the world, available at www.openstreetmap.org
31 DBpedia is also a Wikipedia-based knowledge graph: wiki.dbpedia.org
32 Freebase was a collaborative knowledge base which was launched in 2007 and closed in 2014. All data from Freebase was subsequently imported into Wikidata.
33 www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/Florentyna

Table 5: Top 5 most edited task requests closed as unsuccessful.

    Reasons                                              No. of RfP Edits  No. of Editors  RfP Created At  Reference
    Data source, introduced errors                       88                10              2018-03         [26]
    Bot name, bot edits, bot performance                 43                6               2015-12         [24]
    Automatic adding of implicit statements              41                21              2013-04         [23]
    Support from less active users, duplicate task       32                16              2018-03         [27]
    Bot name, conflict with users, running without flag  32                11              2014-10         [25]

Table 6 shows the main reasons why tasks were rejected: for example, RfPs whose tasks had already been implemented by other bots (duplicate), or RfPs requesting tasks which were not needed, such as one bot which wanted to remove obsolete claims related to property P107³⁴ although there were no more items associated with P107.

Furthermore, RfPs were closed as unsuccessful because the community could not trust the operators' behavior. One operator, for example, submitted many RfPs which were not accepted; the community questioned this user's editing behavior and required them to gain community trust first. It can be seen that unsuccessful closing is not only related to the task types that users had requested.

In the following, we describe the six RfPs which were edited and discussed most of all (cf. Table 5). We used the number of edits on an RfP page as a proxy for challenging cases.

Phenobot [26] asked for permission to correct species names based on the UniProt Taxonomy database. The community doubted this data source and insisted on more reliable information. The bot introduced some errors, which were reported, and the community had to ask for edit rollbacks. Since then, the operator has been inactive for two years, which finally led to this request being closed unsuccessfully.

WikiLovesESBot [27], which wanted to add statements related to Spanish municipalities from the Spanish Wikipedia, gained continuous support from nine users until a user asked for more active Wikidata users to comment. After this, one active user checked the bot's contributions and pointed out that this bot was blocked. Moreover, another active user figured out that this request was asking for duplicate tasks, since another user was already working on Spanish municipalities. The RfP remained open for about half a year without any operator activities and came to a procedural close.

Another remarkable case is VlsergeyBot [25], which intended to update local dictionaries, transfer data from infoboxes to Wikidata, and update properties and reports in Wikidata. At first, the account was named Secretary, which was against the bot policy. After a community suggestion, it was renamed to VlsergeyBot. After that, a user opposed the request and said that Vlsergey's activity "generates a number of large conflicts with different users in ruwiki"; another user then stated that "maybe you have a personal conflict with Vlsergey", and the discussions looked like a personal conflict. The community then considered whether this account needed a bot flag at all and found that the bot was already running without a bot flag for hours and even broke the speed limit for bots with a flag. The operator promised to update his code by adding restrictions before the next run. Finally, the operator withdrew his request, stating that he does not need a bot flag that much.

34 P107 was a property representing the GND type.

ImplicatorBot [23] was intended to add implicit statements to Wikidata automatically. Implicit statements are relationships between items which can be inferred from existing relationships: a property of an item stating that a famous mother has a daughter implies, for example, that the daughter has a mother, even if this is not represented yet. The task sounds quite straightforward, but the community hesitated to approve it, considering the possibility of vandalism. Furthermore, changes in properties are still common, and such a bot might cause tremendous further work in the case of those changes. Even though the operator had shared his code, the discussion proceeded with the highest number of editors discussing a request (i.e., 21), and finally the operator stated that he is "sick and tired of the bickering". Despite the community insisting that they needed the task, the operator withdrew the request.
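The inference ImplicatorBot proposed can be pictured as a simple inverse-property rule. The following sketch is our schematic reading of the mother/child example, not the bot's actual code; P25 ("mother") and P40 ("child") are the real Wikidata properties involved, while the item IDs are placeholders.

    # Map a property to its inverse: a (child, P25, mother) statement
    # implies the inverse (mother, P40, child) statement.
    INVERSE = {"P25": "P40"}

    def infer_implicit(statements):
        existing = set(statements)
        inferred = []
        for subj, prop, obj in statements:
            if prop in INVERSE:
                implied = (obj, INVERSE[prop], subj)
                if implied not in existing:
                    inferred.append(implied)
        return inferred

    print(infer_implicit([("Q123", "P25", "Q456")]))
    # -> [('Q456', 'P40', 'Q123')]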

Similar to ImplicatorBot, Structor [24] was withdrawn by the operator because the operator was tired of the bureaucracy. The bot wanted to add claims, labels, and descriptions, particularly to provide structured information about species items. Many questions were raised by a small number of community members (i.e., a total of six different users participated in the discussion), including questions about the bot name, the source of the edits, duplicate entries, the edit summary, and the performance of the test edits. The operator was actively responding to all these questions. However, the whole process was so surprisingly complex (the discussions continued for over seven months from the start) that the operator was inactive for a long time and then withdrew his request.

Requests denied because the user is not trusted are also interesting. ElphiBot_3³⁵ [22], for example, which is managed jointly by three operators, was a request, submitted along with the source code, for updating the interwiki links in Wikidata after a category in the Arabic Wikipedia has been moved. The community preferred, however, someone more responsible and more familiar with the technical aspects of Wikipedia and Wikidata. What concerned the discussants most was that the operators were not capable enough to fix the bugs introduced by their bot. The community did not feel comfortable approving this request because the operators' treatment for fixing bugs was simply to reverse the mistakes manually. The requirement to run a bot should not only be to know how to code, but surely also to respond in a timely manner and to notify those who can fix the issue.

35 This RfP has a total of 29 edits, and 8 editors contributed to the discussions.


Table 6: Main reasons for unsuccessful task requests (n = 81, number of unsuccessful task requests).

    Reason                     Description                                                                       Frequency
    No activity from operator  Operators are not active or do not respond to the questions regarding their bots  35
    Withdrawn                  Operators wanted to close the requested task                                      14
    Duplicate                  Requested tasks are redundant and are already done through other bots or tools    9
    No community consensus     Community opposed the requested tasks                                             6
    User not trusted           Operator has questionable editing behavior and ability to fix any bugs introduced 6
    Task not needed            Tasks do not need a bot or are not a significant problem for the time being       5
    Don't need bot flag        The requested tasks can be performed without a bot flag                           3
    No info provided           The fields in the RfP page are left empty                                         2
    Against bot policy         Task is not in line with bot policy                                               1

5 DISCUSSION

In this study, we collected existing data from the RfP pages on Wikidata to find out what kind of tasks the community allows to be automated or shared with bots.

Looking at the type and focus of the tasks mentioned most often in RfPs, we found that the dominant request type over time is adding data. This observation suggests that Wikidata's community uses bots to increase the coverage of the provided data, which aligns with the vision of the Wikimedia Foundation: "Imagine a world in which every single human being can freely share in the sum of all knowledge."³⁶ Updating data is the second most requested task, which shows that bots also take part in correcting mistakes or refreshing data according to changes (e.g., updating the sitelinks of moved pages) and contribute to data completeness, which is defined as one data quality dimension [2].

However, we were surprised that there were fewer requests regarding adding or updating references, which could support the trustworthiness (a data quality dimension as well) of the data imported by bots. This result supports existing analyses of the edit activities of bots [18]. However, it would be interesting to bring these two lines of research together, the task approval and the edit perspective, to understand the development over time more closely. Thus, we can infer that bots are allowed to assist the Wikidata community in ensuring data quality from the completeness angle; however, this is less visible from the trustworthiness angle. In comparison to Wikipedia, where bots perform primarily maintenance tasks, bot requests in Wikidata concentrate mainly on the content, i.e., the data perspective.

The majority of data sources mentioned in RfPs come from inside the Wikimedia projects, mainly Wikipedia (cf. Section 4.1.2). This observation implies that Wikidata is on its way to serving as the structured data source for Wikimedia projects, one of the main goals of the development of Wikidata. Among the five most used language versions of Wikipedia, English, French, Russian, German, and Persian (cf. Figure 5), the first three languages show a dominance of Western knowledge in bot imports. In an earlier study, Kaffee et al. [11] had found a similar influence of Western languages in Wikidata, with the most dominant ones being English, Dutch, French, German, Spanish, Italian, and Russian. The similarity of the results in both studies could imply that the reason these languages have better coverage than other languages is a result of having bots. Kaffee et al. also found that language coverage is not related to the number of speakers of a language, which supports our assumption. We could not find evidence that languages other than Western ones are underrepresented in bot requests on purpose. We rather expect that the current bot requests are a representative sample of Wikidata's community.

36 Information on the Wikimedia strategy can be found at https://wikimediafoundation.org/about/vision

We can see from the external sources that bots use these sources mostly for importing identifiers (e.g., VIAF) or for importing data from other databases (e.g., DBpedia, Freebase). This insight supports Piscopo's [17] argument that bot edits need to be more diverse. We suggest that further efforts should be made to import or link data from different sources, for example, from research institutions and libraries. With the increased usage of the Wikibase software³⁷, we assume that more data might be linked or imported to Wikidata. Our classification revealed that bots already import data from local files or databases; however, such data imports often rely on the community trusting the bot operators and do not seem to be large-scale.

The maintenance tasks requested within the RfPs show patterns similar to those in the Wikipedia community: bots are being used to overcome the limitations of the MediaWiki software [9]. Administrative tasks such as deletion are not often requested in RfPs; however, bots on Wikidata are only allowed to delete sitelinks. Similar to Wikipedia, the Wikidata community is coming closer to a common understanding of what bots are and are not supposed to do.

The issue of unknown tasks, i.e., tasks not clearly defined in RfPs, shows the trust the community has in individual operators, probably due to their previous participation history. There is a noticeable number of approved RfPs which we coded as unknown due to their vague task descriptions, while there are also cases where the task descriptions were clear, but the community did not trust the operators, and the requests were thus not approved. These two observations indicate that, in addition to the defined policy and procedures, the community also attaches importance to trust.

The success rate of RfPs is relatively high, since only a small number of RfPs are closed as unsuccessful. Our findings show that, among the 81 unsuccessful RfPs, only some were rejected by the community directly; the majority were unsuccessful because the operators were not responding or had withdrawn the request. Therefore, an operator's inactivity is a more frequent reason for failure than community refusal. In some cases (i.e., the most discussed RfPs), we can see that the community considers every detail of the tasks and then comes to a decision; however, in other cases, the community approves the request without a detailed discussion, as can be seen in the cases of unknown tasks and unknown sources. This result could indicate that, in addition to the defined procedure for RfPs, the community applies a more flexible approach when deciding on RfPs, taking the context (e.g., the editor's experience) of the application into account.

37 Wikibase is the MediaWiki software with the Wikibase extension: https://www.mediawiki.org/wiki/Wikibase

In summary, the high approval rate of RfPs shows that the Wikidata community is positive towards bot operations in Wikidata and willing to take advantage of task automation through bots. Wikidata's community profited from the experiences of the Wikipedia community and built productive human-bot processes in quite a short period. However, we leave the impact of these processes on data quality to future research.

6 CONCLUSION

We studied the formal process of requesting bot rights in Wikidata to find out what kinds of tasks are allowed to be automated by bots. Our study provides a detailed description of the RfP process. We retrieved the closed RfPs from the Wikidata archive up to mid-2018, defined a scheme in the form of a codebook to classify the RfPs, and developed our dataset. The RfPs were studied mainly from two perspectives: 1) what information is provided at the time the bot rights are requested, and 2) how the community handles these requests. We found that the main tasks requested are adding claims, statements, terms, and sitelinks to Wikidata, and that the main sources of bot edits have their roots in Wikipedia. This contrasts with Wikipedia, where bots perform mostly maintenance tasks. Our findings also show that most of the RfPs were approved; a small number of them were unsuccessful, mainly because the operators had withdrawn the request or showed no activity.

Future directions of this research will focus on analyzing the actual bot activities and comparing them with the results of this study to find out whether bots are really performing the tasks they had asked for. This could help us understand better how bots evolve over time and whether the community controls bot activities based on their permitted tasks, or whether bots can perform any edit once they get the permission to operate as a bot.

ACKNOWLEDGMENTS

We thank the Deutscher Akademischer Austauschdienst (DAAD) [Research grant 57299294] and the German Research Foundation (DFG) [Fund number 229241684], who supported this research financially, and the anonymous reviewers for their valuable comments, which helped to improve the paper.

REFERENCES

[1] Emilio Ferrara, Onur Varol, Clayton A. Davis, Filippo Menczer, and Alessandro Flammini. 2016. The rise of social bots. Commun. ACM 59, 7 (2016), 96–104.

[2] Michael Färber, Frederic Bartscherer, Carsten Menne, and Achim Rettinger. 2018. Linked data quality of DBpedia, Freebase, OpenCyc, Wikidata, and YAGO. Semantic Web 9, 1 (Jan 2018), 77–129. https://doi.org/10.3233/SW-170275

[3] R. Stuart Geiger. 2009. The social roles of bots and assisted editing programs. In Int. Sym. Wikis.

[4] R. Stuart Geiger. 2012. Bots are users too: Rethinking the roles of software agents in HCI. Tiny Transactions on Computer Science 1 (2012), 1.

[5] R. Stuart Geiger and Aaron Halfaker. 2013. When the levee breaks: without bots, what happens to Wikipedia's quality control processes? In Proceedings of the 9th International Symposium on Open Collaboration (WikiSym '13). ACM, New York, NY, USA, 6:1–6:6. https://doi.org/10.1145/2491055.2491061

[6] R. Stuart Geiger and Aaron Halfaker. 2017. Operationalizing conflict and cooperation between automated software agents in Wikipedia: a replication and expansion of "Even Good Bots Fight". ACM Hum.-Comput. Interact. 1, 2 (Nov 2017), 33. https://doi.org/10.1145/3134684

[7] R. Stuart Geiger and David Ribes. 2010. The work of sustaining order in Wikipedia: the banning of a vandal. In Proceedings of the 2010 ACM Conference on Computer Supported Cooperative Work (CSCW '10). ACM, New York, NY, USA, 117–126. https://doi.org/10.1145/1718918.1718941

[8] Aaron Halfaker, R. Stuart Geiger, Jonathan T. Morgan, and John Riedl. 2013. The rise and decline of an open collaboration system: how Wikipedia's reaction to popularity is causing its decline. American Behavioral Scientist 57(5) (2013), 664–688. https://doi.org/10.1177/0002764212469365

[9] Aaron Halfaker and John Riedl. 2012. Bots and cyborgs: Wikipedia's immune system. Computer 45, 3 (March 2012), 79–82. https://doi.org/10.1109/MC.2012.82

[10] Andrew Hall, Loren Terveen, and Aaron Halfaker. 2018. Bot detection in Wikidata using behavioral and other informal cues. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (Nov 2018), 1–18. https://doi.org/10.1145/3274333

[11] Lucie-Aimée Kaffee, Alessandro Piscopo, Pavlos Vougiouklis, Elena Simperl, Leslie Carr, and Lydia Pintscher. 2017. A glimpse into Babel: an analysis of multilinguality in Wikidata. In Proceedings of the 13th International Symposium on Open Collaboration (OpenSym '17). ACM, New York, NY, USA, 14:1–14:5. https://doi.org/10.1145/3125433.3125465

[12] Jonathan Lazar, Jinjuan Heidi Feng, and Harry Hochheiser. 2017. Research Methods in Human-Computer Interaction (2nd ed.). Morgan Kaufmann.

[13] Brenda Leong and Evan Selinger. 2019. Robot Eyes Wide Shut: Understanding Dishonest Anthropomorphism. ACM, New York, NY, USA.

[14] Randall M. Livingstone. 2016. Population automation: An interview with Wikipedia bot pioneer Ram-Man. First Monday 21, 1 (4 January 2016). https://doi.org/10.5210/fm.v21i1.6027

[15] Claudia Müller-Birn. 2019. Bots in Wikipedia: Unfolding their duties. Technical Reports Serie B, TR-B-19-01. Freie Universität Berlin. https://refubium.fu-berlin.de/handle/fub188/24775

[16] Claudia Müller-Birn, Benjamin Karran, Janette Lehmann, and Markus Luczak-Rösch. 2015. Peer-production system or collaborative ontology engineering effort: What is Wikidata? In Proceedings of the 11th International Symposium on Open Collaboration. ACM, 20.

[17] Alessandro Piscopo. 2018. Wikidata: A New Paradigm of Human-Bot Collaboration? arXiv:1810.00931 [cs] (Oct 2018). http://arxiv.org/abs/1810.00931

[18] Alessandro Piscopo, Lucie-Aimée Kaffee, Chris Phethean, and Elena Simperl. 2017. Provenance information in a collaborative knowledge graph: an evaluation of Wikidata external references. In The Semantic Web – ISWC 2017 (Lecture Notes in Computer Science). Springer, Cham, 542–558. https://doi.org/10.1007/978-3-319-68288-4_32

[19] Alessandro Piscopo, Chris Phethean, and Elena Simperl. 2017. What makes a good collaborative knowledge graph? Group composition and quality in Wikidata. In Social Informatics (Lecture Notes in Computer Science), Giovanni Luca Ciampaglia, Afra Mashhadi, and Taha Yasseri (Eds.). Springer International Publishing, 305–322.

[20] Thomas Steiner. 2014. Bots vs. Wikipedians, Anons vs. Logged-Ins (Redux): a global study of edit activity on Wikipedia and Wikidata. In Proceedings of The International Symposium on Open Collaboration (OpenSym '14). ACM, New York, NY, USA, 25:1–25:7. https://doi.org/10.1145/2641580.2641613

[21] Margaret-Anne Storey and Alexey Zagalsky. 2016. Disrupting developer productivity one bot at a time. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016). ACM, New York, NY, USA, 928–931. https://doi.org/10.1145/2950290.2983989

[22] Wikidata. 2013. RfP ElphiBot_3. www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/ElphiBot_3. [Last accessed 01 August 2018].

[23] Wikidata. 2013. RfP ImplicatorBot. www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/ImplicatorBot. [Last accessed 01 August 2018].

[24] Wikidata. 2014. RfP Structor. www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/Structor. [Last accessed 01 August 2018].

[25] Wikidata. 2014. RfP VlsergeyBot. www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/VlsergeyBot. [Last accessed 01 August 2018].

[26] Wikidata. 2016. RfP Phenobot. www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/Phenobot. [Last accessed 01 August 2018].

[27] Wikidata. 2016. RfP WikiLovesESBot. www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/WikiLovesESBot. [Last accessed 01 August 2018].

  • Abstract
  • 1 Introduction
  • 2 Bots in Peer Production
    • 21 Bots in Wikipedia
    • 22 Bots in Wikidata
      • 3 Approving Bot Tasks
        • 31 Approval Process in Wikidata
        • 32 Data Collection
        • 33 Task Approval Data
        • 34 Classification Process
          • 4 Results
            • 41 Approved Requests
            • 42 Disputed Requests
              • 5 Discussion
              • 6 Conclusion
              • Acknowledgments
              • References
Page 4: Approving Automation: Analyzing Requests for Permissions ... · • We used our defined scheme to identify bot-requested tasks and data sources. • We analyzed how the community

OpenSym rsquo19 August 20ndash22 2019 Skoumlvde Sweden Farda-Sarbas et al

Table 1 Example of the information extracted from an RfPpage

URL wwwwikidataorg[]BotMastiBotBot Name mastiBotOperator Name MastiTask and Function add basic personal information based on

biografy related infoboxes from plwikiDecision approvedDecision Date 15092017First Edit 03092017Last Edit 21022018No of Edits 8No of Editors 5

The community handles RfPs in accordance with the Bot ApprovalProcess10 The community discuss whether the task requested isneeded and whether the implementation of the task functions prop-erly After the decision has been made an administrator or a bu-reaucrat close the request if approved a bureaucrat will flag theaccount and if not the request is closed stating the reason for theunsuccessful request

After a successful request each bot operator has to list the taskthe bot performswith a link to the RfP on its user page However thebot operator is required to open a new RfP if there is a substantialchange in the tasks the bot performs Consequently bots can havemultiple requests which are shown on the user page Furthermorein special cases a bot is also allowed to carry out administrativetasks such as blocking users or deleting or protecting pages Inthis case the operator needs to apply for both the RfP and theadministrator status11

32 Data CollectionWe collected all bot requests that were in the final approval stage inJuly 201812 ie we ignored all tasks without a final decision fromWikidatarsquos archive for requests13 We collected our data based onweb scraping ie web data extraction programs implemented inPython Bot requests which were listed several times in differentarchives were only parsed once14 This resulted in 685 task approvalpages

We extracted the following information from each page (cf Fig-ure 1) URL bot and operator name decision decision date taskscode and function details We additionally collected the date of thefirst and last page edit15 the number of page edits and the numberof distinct editors who contributed to the request for each page by10A detailed description is available at wwwwikidataorgwikiWikidataBots11 Further information can be found at httpswwwwikidataorgwikiWikidataRequests_for_permissionsAdministrator12 The first request in our data set was opened on October 31 2012 and the last onewas on June 29 201813 wwwwikidataorgwikiWikidataRequests_for_permissionsArchiveRequests_for_bot_flags14 For example wwwwikidataorgwikiWikidataRequests_for_permissionsBotVIAFbot is listed in wwwwikidataorgwikiWikidataRequests_for_permissionsRfBotMarch_2013 and in wwwwikidataorgwikiWikidataRequests_for_permissionsRfBotApril_2013 and only the later one was used15 The first edit can be interpreted as the request opening and last edit as the requestclosing or date of archiving

using the MediaWiki API16 This extracted information (cf Table 1)was processed and stored in a relational database17

33 Task Approval DataWe collected 683 distinct requests from Wikidata however two ofthem are no longer available and consequently we excluded themOf the resulting 681 requests 600 requests (88) were successfuland 81 (12) were unsuccessful

An average of five people (σ = 28) participated in an approvalrequest for example by taking part in the discussion or stating thefinal decision These people accounted for an average of slightlyabove ten edits for each request (σ = 85)

Based on the requests collected we identified 391 distinct botsand 366 distinct bot operators on Wikidata (cf Table 2) Someoperators applied for more than one task for their bot in one request(eg update label update description update alias) with five beingthe highest number Furthermore bot owners can operate morethan one bot The majority of operators (319 editors) have onlyone bot while three editors were running three bots and there waseven one operator managing seven bots simultaneously Similarlyone bot can also be operated by more than one user for examplethree editors were managing the ProteinBoxBot together

Table 2 Overview on task requests (n = 683 number of all taskrequests)

Decision Result Requests No of Bots Operators

Successful 600 323 299Unsuccessful 81 68 67Both 681 391 366

34 Classification ProcessWe classified the data collected manually to gain a deeper under-standing of the different tasks that bot operators carry out on Wiki-data In the following we describe this process in detail

We employed the qualitative method of content analysis whichis used to analyze unstructured content [12] which in our caseis the text in the RfPs We classified the data manually using theopen coding approach of the grounded theory The data we weredealing with were applicantsrsquo own sentences (in vivo codes) thuswe developed an emergent coding scheme and categorized thetextual information Two of the authors18 coded the data in a three-phase process by ensuring the consistency of the codes and reducingpossible bias during the coding

We read a sample of the data collected and discussed the diverserequest details We noticed that some common categories could befound within all requests based on some particular perspectivessuch as the potential actions of the requesting bots and the mainsources of the data to be imported from bots We focused thus ontask and function details of the RfP page and extracted the intended

16 wwwmediawikiorgwikiAPIMain_page17 The database is released under a public license on GitHub httpsgithubcomFUB-HCCwikidata-bot-request-analysis18 The first and second authors of this paper

Approving AutomationAnalyzing Requests for Permissions of Bots in Wikidata OpenSym rsquo19 August 20ndash22 2019 Skoumlvde Sweden

use (task) and the data source of the bot edits which were approvedor closed as successful

In the first phase we categorized 30 randomly chosen RfPs col-laboratively to develop a shared understanding Based on the firstcategories we carried out a second round in which another set ofrandomly chosen 30 RfPs were categorized individually We thencompared the results and checked the agreement level19 After dis-cussing diverging cases and cross-validating our category set wecontinued with another round of categorizing the data (40 RfPs)individually and we had 39 agreements out of the 40 cases Wecontinued with further rounds of coding individually we met fre-quently on a regular basis and discussed new or unclear cases andcross-validated our category sets to ensure the consistency of ourdata classification

We developed a codebook20 as the primary reference of theclassification process by starting from the first phase and updat-ing it continually during the classification period The codes arestructured in verb-noun pairs denoting the content entities andoperations which are inspired by the work of Muumlller-Birn [15] andinitially drafted after the first 30 RfP classifications Based on thiswe developed it further by structuring all newly presented infor-mation into the verb-noun pairs and adding them to the codebookThe first category describes the potential scope of bot activity Thesub-categories contain all the entity types of Wikidata such asitems claims statements references and sitelinks The unknowncategory refers to the tasks without a clear definition We developedan overview which shows a scheme of all entity types and theirrespective activities together with the number of requests for theseedits (cf Table 3)

Another dimension of our classification considers the originof the data which the bot operators plan to include in WikidataThis category contains three main sub-categories Internal exter-nal and unknown sources Internal sources are those within theWikimedia sphere whereas external sources are outsideWikimediaThose requests without a specific source of data were categorizedas unknown

After coding all successful RfPs we became curious about cer-tain RfPs which were not approved Were there some unusual taskrequests or operators having difficulties in convincing the commu-nity of their botsrsquo safe activities To find this out we did anotherprocess of coding which resulted in 81 unsuccessfully closed RfPsWe classified these RfPs as before except we included the reasonsfor the request failure The category unsuccessful thus contains allRfPs that are not approved for reasons such as being duplicatesinactive for a long time not in line with bot policy or withdrawn(cp Table 6)

4 RESULTSWe organized the results of our RfP analyses in two sections Firstlywe focused on the approved bots requests and their contents andsecondly we looked into the details of highly discussed task ap-proval requests

19We measured the inter-rater reliability by using Cohenrsquos kappa At this stage wehad a substantial agreement level ie 06520 The codebook along with the data is provided on GitHub httpsgithubcomFUB-HCCwikidata-bot-request-analysis

Table 3 Task requests categorized into edit focus activitytype and the request result (approveddenied) (n = 681 num-ber of all task requests)

Edit Focus Activity Approved Denied

Item add 85 12update 40 3merge 6 1mark-deleted 1 0

Term add 22 5update 11 0

Label add 31 3update 15 1delete 1 0

Description add 33 5update 10 1

Alias add 4 0update 1 0delete 1 1

Statement add 141 19update 15 0delete 1 1

Claim add 161 18update 33 7delete 13 3

Qualifier add 7 0Reference add 9 3

update 4 0delete 1 0

Rank update 1 0

Sitelink add 52 3update 23 4delete 1 0merge 1 0revert 1 0

Badge add 4 0

Page add 0 1update 8 1delete 1 0

Community maintain 25 5Misc minusminus 26 13

41 Approved RequestsIn this part we show the types of activities which were approvedthe sources they used for editing and the relationship between botsand their operators We focus on three aspects of the tasks Theactivity focus the activity type and the origin of the data used bythe bots

411 Activity Type and Focus The activity focus considers whatthe bots have planned to edit in Wikidata It can be seen in Table 3that most task requests aim to edit items in Wikidata The differentcategories represent the different parts of an item which are or-ganized into terms ie labels descriptions and aliases statements

OpenSym rsquo19 August 20ndash22 2019 Skoumlvde Sweden Farda-Sarbas et al

70 Add

20 Update

3 Maintain

3 Unknow

n

3 Delete

1 Revert

16 Item

16 Term

12 Sitelink

26 Claim

20 Statement

2 Reference5 M

isc

3 Unknow

n

Figure 2 Approved task requests organized into activity type (the upper half part) and edit focus (the lower half part)(n = 794 262 RfPs are requests with multiple tasks edit focus Term consists of Alias Label and Description The edit focus Miscellaneouscontains the smaller categories such as Qualifier Rank Badge and Maintenance related data)

(ie claims with qualifiers) references ranks and sitelinks withbadges21 The number of approved task requests show that themajority of bots seem to focus on adding data to statements in gen-eral and claims more specifically Figure 2 shows the distributionof approved tasks into activity types and edit focuses Again themajority of task requests deal with the addition of data

Furthermore there are 30 requests dealing with community con-cerns regarding maintenance activities such as archiving pagessections moving pages revert edits remove duplicates generatestatistical reports as some of the examples

In addition to this there are 24 requests concluded as unknowntasks half of which are written extremely briefly22 or in an unclearway23 so that it was difficult for us to identify their potential tasksSome other cases can only be classified to a specific task based onparticular assumptions for instance ShBot24 uses a Bot-FrameworkQuickStatements25 to batch edits which shows a great possibilityof importing data to statements However this tool could also beused to remove statements or importing terms so there is stilla challenge to identify the requestsrsquo focus without analyzing theedits We categorized all tasks abiding strictly by our codebookthus all the RfPs which needed assumptions are therefore classifiedas unknown

We assumed that the data addition to Wikidata is related primarily to Wikidata's inception, i.e., that the majority of task approvals would occur more at the beginning of Wikidata's lifetime. However, Figure 3 shows that the most often requested activity types in the first six months were add and update. At the end of 2017, these two activity types were often requested again. We found the peak in 2017 for requesting tasks such as adding and updating items unusual and investigated it further. Two operators26 carried out 31 requests in 2017 for importing items for different languages.

21 These badges indicate whether a sitelink to a specific Wikipedia language version points to an excellent article.
22 E.g., www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/Glavkos_bot
23 E.g., www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/BotMultichillT
24 www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/ShBot
25 Further information is available at meta.wikimedia.org/wiki/QuickStatements
26 Both editors are employed by Wikimedia Sverige and were engaged in a GLAM initiative.

[Figure 3: approved requests per half-year from 2012-12 to 2018-06, by activity type (add, update, delete, merge, revert, maintain, unknown); the y-axis ranges from 0 to 120 approved requests.]

Figure 3: Activity types of approved task requests over time (n = 600; all 81 unsuccessful task requests were not included).

Since all these requests were permitted, this explains the sudden acceleration of importing and updating tasks shown in the graph.
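For illustration, the temporal view in Figure 3 can be derived from the classified dataset with a simple monthly aggregation; the column names below are our assumption, not necessarily those of the published dataset:

```python
# Minimal sketch: monthly counts of approved requests per activity type,
# assuming (hypothetically) columns "closed" and "activity" in the dataset.
import pandas as pd

rfps = pd.DataFrame({
    "closed": pd.to_datetime(["2013-01-05", "2017-11-20", "2017-11-28"]),
    "activity": ["add", "add", "update"],
})
monthly = (rfps.groupby([rfps["closed"].dt.to_period("M"), "activity"])
               .size()
               .unstack(fill_value=0))
print(monthly)
```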

In the following step, we looked more closely at the sources of the data which bots are supposed to add to Wikidata.

[Figure 4: sources of approved tasks per half-year from 2012-12 to 2018-06, by origin (internal, external, unknown); the y-axis ranges from 0 to 100.]

Figure 4: Sources of approved task requests over time (n = 600; all 81 unsuccessful task requests were not included).


Table 4: All internal sources, the top 10 most-used external sources, and unknown sources.

Origin          Source                      Task Requests
Internal (455)  Wikipedia                             263
                Wikidata                              110
                Wiki Loves Monuments                   30
                Wikimedia Commons                      13
                Wikisource                              5
                Wikivoyage                              4
                Wikinews                                2
                Wikiquote                               2
                WikiPathways                            2
                Wikimedia Project                      24
External (38)   MusicBrainz                             8
                VIAF                                    7
                OpenStreetMap                           4
                DBpedia                                 4
                Freebase                                4
                GRID                                    3
                GND                                     2
                GitHub                                  2
                YouTube                                 2
                Disease Ontology Project                2
Unknown         Unknown                               107

4.1.2 Data Origin. In addition to the types of tasks, RfPs also state the sources from which data are imported. We tried to identify all sources independently of the focus or activity type in the classification of all task approval pages. We differentiate internal, external, and unknown sources:

(1) Internal: all sources from the Wikimedia ecosystem,
(2) External: all sources outside Wikimedia's ecosystem, and
(3) Unknown: all cases where the source of data was a local file/database or not clearly stated.

Table 4 shows the number of tasks approved per data source. The majority of approved task requests (454) deal with data from the Wikimedia ecosystem, with Wikipedia providing most of the data. In addition to Wikipedia, the Wiki Loves Monuments (WLM) initiative is organized worldwide by Wikipedia community members and Wikimedia projects. The category Wikimedia Project refers to requests where the operator provided information on the source being restricted to Wikimedia projects' scope; however, a specific attribution to one of the projects was not possible.

Another important source refers to Wikidata itself. We found that only a small number of these data are used in maintenance tasks;27 the majority of requests are for tasks such as retrieving information from Wikidata, adding descriptions, and updating or adding new labels by translating the existing labels into other languages.
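As an illustration of such a term edit, a bot built on the widely used Pywikibot framework could add a missing label roughly as follows; the item ID, language, and label are placeholders, and we do not claim that the requested bots were implemented this way:

```python
# Hedged Pywikibot sketch: adding a label in another language (a term edit).
# Item ID, language code, and label text are placeholders.
import pywikibot

site = pywikibot.Site("wikidata", "wikidata")
repo = site.data_repository()

item = pywikibot.ItemPage(repo, "Q42")
item.editLabels(labels={"de": "Douglas Adams"},
                summary="Bot: adding missing German label")
```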

There are 128 task approval requests that relate to a task of importing external data into Wikidata.

27 Among 104 requests with a data source of Wikidata, there are 16 requests holding maintenance tasks, such as archiving discussions or updating database reports.

[Figure 5: bar chart of the Wikipedia language versions mentioned as sources (counts from 0 to 40), including English, French, German, Russian, Farsi, Spanish, Italian, Polish, Norwegian, Chinese, Ukrainian, Catalan, Malay, Thai, Serbian, Bosnian, Belarusian, Japanese, Korean, Swedish, Romanian, Dutch, Min-Nan, Latvian, Hindi, Urdu, Danish, Turkish, Arabic, Georgian, Welsh, Bulgarian, Nepali, Czech, Greek, Estonian, Cebuano, Slovak, Maithili, Hebrew, and Vietnamese.]

Figure 5: Wikipedia language versions used as sources mentioned on task approval pages (n = 195; all 488 other requests either did not provide a specific Wikipedia language version or did not have Wikipedia as the source).

The external sources most often mentioned are MusicBrainz28 and VIAF29 (cf. Table 4). Both sources are used mainly to add identifiers to Wikidata items. Other external sources are OpenStreetMap,30 DBpedia,31 and Freebase.32
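A typical identifier import of this kind can be sketched with Pywikibot as below; P214 is the VIAF ID property, while the item and identifier value are placeholders:

```python
# Hedged Pywikibot sketch: attaching an external identifier (VIAF, P214)
# to an item. Item ID and identifier value are placeholders.
import pywikibot

site = pywikibot.Site("wikidata", "wikidata")
repo = site.data_repository()
item = pywikibot.ItemPage(repo, "Q42")

claim = pywikibot.Claim(repo, "P214")     # P214 = VIAF ID (external-id datatype)
claim.setTarget("113230702")              # external-id values are plain strings
item.addClaim(claim, summary="Bot: adding VIAF identifier")
```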

As shown in Figure 4, the internal data sources have remained those most often mentioned in the task requests. External sources and unknown sources are on the same level, and external sources show a slight increase over time.

Most data from Wikipedia comes from 43 different language versions (cf. Figure 5), with English, French, German, Russian, and Persian being the top five languages. The results show that bots' contributions to certain languages have made these languages more prominent than others in Wikidata.

There is a total of 109 RfPs with unknown sources; 85 of them were approved. Some of these requests mentioned a local file or database as a data source. Other RfPs are not adding data to Wikidata but are, for example, moving pages.

4.2 Disputed Requests
There are 81 RfPs in our dataset which were closed unsuccessfully. We wanted to understand better why the community decided to decline these requests; thus, we investigated the main reasons for all unsuccessful RfPs.

Our data show that operators themselves are most commonly responsible for their RfPs being rejected. Most of the time, operators were not responding to questions in a timely manner or did not provide enough information when required. There are only a few cases where RfPs were not approved by the community, i.e., no community consensus was reached or the bot violated the bot policy. In one case,33 for instance, an editor was asking for bot rights for their own user account instead of the one for the bot. This violates the bot policy, as operators are required to create a new account for their bots and include the word "bot" in the account name.

28 MusicBrainz is an open and publicly available music encyclopedia that collects music metadata, available at www.musicbrainz.org
29 VIAF stands for Virtual International Authority File, an international authority file, available at www.viaf.org
30 OpenStreetMap is an openly licensed map of the world, available at www.openstreetmap.org
31 DBpedia is also a Wikipedia-based knowledge graph: wiki.dbpedia.org
32 Freebase was a collaborative knowledge base which was launched in 2007 and closed in 2014. All data from Freebase was subsequently imported into Wikidata.
33 www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/Florentyna


Table 5: Top 5 most-edited task requests closed as unsuccessful.

Reasons                                              No. of RfP Edits  No. of Editors  RfP Created At  Reference
Data source, introduced errors                                     88              10         2018-03       [26]
Bot name, bot edits, bot performance                               43               6         2015-12       [24]
Automatic adding of implicit statements                            41              21         2013-04       [23]
Support from less active users, duplicate task                     32              16         2018-03       [27]
Bot name, conflict with users, running without flag                32              11         2014-10       [25]


Table 6 shows the main reasons why tasks were rejected: RfPs which had already been implemented by other bots (duplicate), or RfPs requesting tasks which were not needed, such as one bot which wanted to remove obsolete claims related to property P107,34 but there were no more items associated with P107.

Furthermore, RfPs were closed as unsuccessful because the community could not trust the operators' behavior. One bot, for example, requested many RfPs which were not accepted. The community was questioning its editing behavior and required this user to gain community trust first. It can be seen that unsuccessful closing is not only related to the task types that users had requested.

In the following, we describe the six RfPs which were edited and discussed the most (cf. Table 5). We defined the number of edits on an RfP page as a proxy for challenging cases.
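This proxy is straightforward to compute from the public MediaWiki API; a minimal sketch (using one of the RfP pages discussed below) could look like this:

```python
# Minimal sketch: counting edits and distinct editors of one RfP page via
# the MediaWiki API (capped at 500 revisions per request for simplicity).
import requests

API = "https://www.wikidata.org/w/api.php"
params = {
    "action": "query", "format": "json", "prop": "revisions",
    "titles": "Wikidata:Requests for permissions/Bot/Phenobot",
    "rvprop": "user", "rvlimit": "max",
}
pages = requests.get(API, params=params).json()["query"]["pages"]
revisions = next(iter(pages.values())).get("revisions", [])
editors = {rev["user"] for rev in revisions}
print(f"{len(revisions)} edits by {len(editors)} editors")
```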

Phenobot [26] asked for permission to correct species names based on the UniProt Taxonomy database. The community doubted the data source (the UniProt Taxonomy database) and insisted on more reliable information. The bot had introduced some errors which were reported; therefore, the community had to ask for edit rollbacks. Since then, the operator has been inactive for two years, which finally led to this request being closed unsuccessfully.

WikiLovesESBot [27], which wanted to add statements related to Spanish municipalities from the Spanish Wikipedia, gained continuous support from nine users until a user asked for more active Wikidata users to comment. After this, one active user checked the bot contributions and pointed out that this bot was blocked. Moreover, another active user figured out that this request was asking for duplicate tasks, since there was already another user also working on Spanish municipalities. The RfP remained open for about half a year without any operator activity and came to a procedural close.

Another remarkable case is VlsergeyBot [25], which intended to update local dictionaries, transfer data from infoboxes to Wikidata, and update properties and reports in Wikidata. At first, the account was named Secretary, which was against the bot policy. After a community suggestion, it was renamed to VlsergeyBot. After that, a user opposed the request and said that Vlsergey's activity "generates a number of large conflicts with different users in ruwiki", and then another user stated that "maybe you have a personal conflict with Vlsergey", and the discussions looked like a personal conflict. The community then considered whether this account needed a bot flag and found that the bot was already running without a bot flag for hours and even broke the speed limit for bots with a flag.

34 P107 was a property representing the GND type.

The operator promised to update his code by adding restrictions before the next run. Finally, the operator withdrew his request, stating that he does not need a bot flag that much.

ImplicatorBot [23] was intended to add implicit statements to Wikidata automatically. Implicit statements are relationships between items which can be inferred from existing relationships. A property of an item stating that a famous mother has a daughter implies, for example, that the daughter has a mother, even though this is not represented yet. The task sounds quite straightforward, but the community was hesitant to approve it, considering the possibility of vandalism. Furthermore, changes in properties are still common, and such a bot might cause tremendous further work in the case of those changes. Even though the operator had shared his code, discussions proceeded with the highest number of editors discussing a request (i.e., 21), and finally the operator stated that he is "sick and tired of the bickering". Despite the community insisting that they needed the task, the operator withdrew it.
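One way to find candidates for such implicit statements (though not necessarily how ImplicatorBot worked) is a SPARQL query against the public Wikidata endpoint, here for the mother/child example from above:

```python
# Hedged sketch: find items listed as a woman's child (P40) that lack the
# inverse mother statement (P25); not necessarily ImplicatorBot's method.
import requests

QUERY = """
SELECT ?mother ?child WHERE {
  ?mother wdt:P40 ?child .                        # mother -> child
  ?mother wdt:P21 wd:Q6581072 .                   # restrict to female items
  FILTER NOT EXISTS { ?child wdt:P25 ?mother . }  # inverse statement missing
}
LIMIT 10
"""
resp = requests.get("https://query.wikidata.org/sparql",
                    params={"query": QUERY, "format": "json"})
for row in resp.json()["results"]["bindings"]:
    print(row["child"]["value"], "lacks mother", row["mother"]["value"])
```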

Similar to ImplicatorBot, Structor [24] was withdrawn by the operator because the operator was tired of bureaucracy. The bot wanted to add claims, labels, and descriptions, particularly to provide structured information about species items. There were many questions raised by a small number of community members (i.e., a total of six different users participated in the discussion), including the bot name, the source of edits, duplicate entries, the edit summary, and edit test performances. The operator was actively responding to all these questions. However, the whole process was so surprisingly complex (the discussions continued for over seven months from the start) that the operator was inactive for a long time and then withdrew his request.

Requests denied because the user is not trusted are also interesting. ElphiBot_3,35 [22] for example, which was managed jointly by three operators, sent a request along with the source code for updating the interwiki links in Wikidata after a category in the Arabic Wikipedia had been moved. The community preferred, however, someone who is more responsible and more familiar with the technical aspects of Wikipedia and Wikidata. What concerned the discussants most was that the operators were not capable enough to fix the bugs introduced by their bot. The community did not feel comfortable approving this request because the operators' treatment for fixing bugs was simply to reverse the mistakes manually. The requirement to run a bot should not only be to know how to code, but surely also to respond in a timely manner and notify those who can fix the issue.

35 This RfP has a total of 29 edits, and 8 editors contributed to the discussions.


Table 6: Main reasons for unsuccessful task requests (n = 81, the number of unsuccessful task requests).

Reason                     Description                                                                        Frequency
No activity from operator  Operators are not active or do not respond to the questions regarding their bots          35
Withdrawn                  Operators wanted to close the requested task                                              14
Duplicate                  Requested tasks are redundant and are already done through other bots or tools             9
No community consensus     Community opposed the requested tasks                                                      6
User not trusted           Operator has questionable editing behavior and ability to fix any bugs introduced          6
Task not needed            Tasks do not need a bot or are not a significant problem for the time being                5
Don't need bot flag        The requested tasks can be performed without a bot flag                                    3
No info provided           The fields in the RfP page are left empty                                                  2
Against bot policy         Task is not in line with bot policy                                                        1

5 DISCUSSION
In this study, we collected existing data from the RfP pages on Wikidata to find out what kind of tasks the community allows to be automated or shared with bots.

Looking at the type and focus of the tasks which were mentioned most often in RfPs, we found that the dominant request type over time is adding data. This observation suggests that Wikidata's community uses bots to increase the coverage of the provided data, which aligns with the vision of the Wikimedia Foundation: "Imagine a world in which every single human being can freely share in the sum of all knowledge."36 Updating data is the second most requested task, which shows that bots also take part in correcting mistakes or refreshing data according to changes (e.g., updating the sitelinks of moved pages) and contribute to data completeness, which is defined as one data quality dimension [2].

However, we were surprised that there were fewer requests regarding adding or updating references, which could support the trustworthiness (a data quality dimension as well) of data imported by bots. This result supports existing analyses of the edit activities of bots [18]. However, it would be interesting to bring these two lines of research together - the task approval and the edit perspective - to understand the development over time more closely. Thus, we can infer that bots are allowed to assist the Wikidata community in ensuring data quality from the completeness angle; however, this is less visible from the trustworthiness angle. In comparison to Wikipedia, where bots perform primarily maintenance tasks, bot requests in Wikidata concentrate mainly on the content, i.e., the data perspective.

The majority of data sources mentioned in RfPs come from inside Wikimedia projects, mainly Wikipedia (cf. Section 4.1.2). This observation implies that Wikidata is on its way to serving as the structured data source for Wikimedia projects, one of the main goals for the development of Wikidata. Among the five most-used language versions of Wikipedia, English, French, Russian, German, and Persian (cf. Figure 5), the first three languages show a dominance of Western knowledge in bot imports. In a study, Kaffee et al. [11] had earlier found the influence of Western languages in Wikidata, with the most dominant ones being English, Dutch, French, German, Spanish, Italian, and Russian. The similarity of the results in both studies could imply that the reason these languages have better coverage than other languages is a result of having bots.

36 Information on the Wikimedia strategy can be found at https://wikimediafoundation.org/about/vision

Kaffee et al. also found that language coverage is not related to the number of speakers of a language, which supports our assumption. We could not find evidence that languages other than Western ones are underrepresented in bot requests on purpose. We rather expect that the current bot requests are a representative sample of Wikidata's community.

We can see from the external sources that bots use these sources mostly for importing identifiers (e.g., VIAF) or for importing data from other databases (e.g., DBpedia, Freebase). This insight supports Piscopo's [17] argument that bot edits need to be more diverse. We suggest that further efforts should be made to import or link data from different sources, for example, from research institutions and libraries. With the increased usage of the Wikibase software,37 we assume that more data might be linked or imported to Wikidata. Our classification revealed that bots already import data from local files or databases. However, such data imports often rely on the community trusting the bot operators and do not seem large-scale.

The requested maintenance tasks within the RfPs show similar patterns to those in the Wikipedia community. Bots are being used to overcome the limitations of the MediaWiki software [9]. An administrative task such as deletion is not often requested in RfPs; however, bots on Wikidata are allowed to delete sitelinks only. Similar to Wikipedia, the Wikidata community is coming closer to a common understanding of what bots are supposed to do and what not.

The issue of unknown tasks, which were not clearly defined in RfPs, shows the trust the community has in single operators, probably due to their previous participation history. There is a noticeable number of approved RfPs which we coded as unknown due to their vague task descriptions, while there are also cases where task descriptions were clear but the community did not trust the operators, and thus they were not approved. These two observations indicate that the community gives importance to trust in addition to its defined policy and procedures.

The success rate of RfPs is relatively high, since only a small number of RfPs are closed as unsuccessful. Our findings show that among the 81 unsuccessful RfPs, only some were rejected by the community directly; the majority of them were unsuccessful because the operators were not responding or had withdrawn the request.

37 Wikibase is based on the MediaWiki software with the Wikibase extension: https://www.mediawiki.org/wiki/Wikibase


Therefore, operator inactivity is a more frequent reason for failure than community refusal. In some cases (i.e., the RfPs discussed most), we can see that the community considers every detail of the tasks and then comes to a decision; however, in some cases, it approves the request without a detailed discussion, as can be seen in the cases of unknown tasks and unknown sources. This result could indicate that in addition to the defined procedure for RfPs, the community applies a more flexible approach when deciding on RfPs, taking the context (e.g., the editor's experience) of the application into account.

In summary, the high approval rate of RfPs shows that the Wikidata community is positive towards bot operations in Wikidata and willing to take advantage of task automation through bots. Wikidata's community profited from the experiences of the Wikipedia community and built productive human-bot processes in quite a short period. However, we leave the impact of these processes on data quality to future research.

6 CONCLUSION
We studied the formal process of requesting bot rights in Wikidata to find out what kinds of tasks are allowed to be automated by bots. Our study provides a detailed description of the RfP process. We retrieved the closed RfPs from the Wikidata archive up to mid-2018. We defined a scheme in the form of a codebook to classify the RfPs and developed our dataset. The RfPs were studied mainly from two perspectives: 1) what information is provided at the time the bot rights are requested, and 2) how the community handles these requests. We found that the main tasks requested are adding claims, statements, terms, and sitelinks to Wikidata, and that the main sources of bot edits have their roots in Wikipedia. This contrasts with Wikipedia, where bots perform mostly maintenance tasks. Our findings also show that most of the RfPs were approved, and a small number of them were unsuccessful, mainly because operators had withdrawn them or there was no activity from the operators.

Future directions of this research will focus on analyzing the actual bot activities and comparing them with the results of this study to find out whether bots are really performing the tasks that they had asked for. This could help us better understand how bots evolve over time and whether the community controls bot activities based on their permitted tasks, or bots can perform any edit once they get the permission to operate as a bot.

ACKNOWLEDGMENTS
We thank the Deutscher Akademischer Austauschdienst (DAAD) [Research grant 57299294] and the German Research Foundation (DFG) [Fund number 229241684], who supported this research financially, and the anonymous reviewers for their valuable comments, which helped to improve the paper.

REFERENCES
[1] Emilio Ferrara, Onur Varol, Clayton A. Davis, Filippo Menczer, and Alessandro Flammini. 2016. The rise of social bots. Commun. ACM 59, 7 (2016), 96–104.
[2] Michael Färber, Frederic Bartscherer, Carsten Menne, and Achim Rettinger. 2018. Linked data quality of DBpedia, Freebase, OpenCyc, Wikidata, and YAGO. Semantic Web 9, 1 (Jan 2018), 77–129. https://doi.org/10.3233/SW-170275
[3] R. Stuart Geiger. 2009. The social roles of bots and assisted editing programs. In Int. Sym. Wikis.
[4] R. Stuart Geiger. 2012. Bots are users too: Rethinking the roles of software agents in HCI. Tiny Transactions on Computer Science 1 (2012), 1.
[5] R. Stuart Geiger and Aaron Halfaker. 2013. When the levee breaks: without bots, what happens to Wikipedia's quality control processes? In Proceedings of the 9th International Symposium on Open Collaboration (WikiSym '13). ACM, New York, NY, USA, 61–66. https://doi.org/10.1145/2491055.2491061
[6] R. Stuart Geiger and Aaron Halfaker. 2017. Operationalizing conflict and cooperation between automated software agents in Wikipedia: a replication and expansion of "even good bots fight". ACM Hum.-Comput. Interact. 1, 2 (Nov 2017), 33. https://doi.org/10.1145/3134684
[7] R. Stuart Geiger and David Ribes. 2010. The work of sustaining order in Wikipedia: the banning of a vandal. In Proceedings of the 2010 ACM Conference on Computer Supported Cooperative Work (CSCW '10). ACM, New York, NY, USA, 117–126. https://doi.org/10.1145/1718918.1718941
[8] Aaron Halfaker, R. Stuart Geiger, Jonathan T. Morgan, and John Riedl. 2013. The rise and decline of an open collaboration system: how Wikipedia's reaction to popularity is causing its decline. 57(5) (2013), 664–688. https://doi.org/10.1177/0002764212469365
[9] Aaron Halfaker and John Riedl. 2012. Bots and cyborgs: Wikipedia's immune system. Computer 45, 3 (March 2012), 79–82. https://doi.org/10.1109/MC.2012.82
[10] Andrew Hall, Loren Terveen, and Aaron Halfaker. 2018. Bot detection in Wikidata using behavioral and other informal cues. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (Nov 2018), 1–18. https://doi.org/10.1145/3274333
[11] Lucie-Aimée Kaffee, Alessandro Piscopo, Pavlos Vougiouklis, Elena Simperl, Leslie Carr, and Lydia Pintscher. 2017. A glimpse into Babel: an analysis of multilinguality in Wikidata. In Proceedings of the 13th International Symposium on Open Collaboration (OpenSym '17). ACM, New York, NY, USA, 141–145. https://doi.org/10.1145/3125433.3125465
[12] Jonathan Lazar, Jinjuan Heidi Feng, and Harry Hochheiser. 2017. Research Methods in Human-Computer Interaction (2nd ed.). Morgan Kaufmann.
[13] Brenda Leong and Evan Selinger. 2019. Robot Eyes Wide Shut: Understanding Dishonest Anthropomorphism. ACM, New York, New York, USA.
[14] Randall M. Livingstone. 2016. Population automation: An interview with Wikipedia bot pioneer Ram-Man. First Monday 21, 1 (4 January 2016). https://doi.org/10.5210/fm.v21i1.6027
[15] Claudia Müller-Birn. 2019. Bots in Wikipedia: Unfolding their duties. Technical Reports Serie B, TR-B-19-01. Freie Universität Berlin. https://refubium.fu-berlin.de/handle/fub188/24775
[16] Claudia Müller-Birn, Benjamin Karran, Janette Lehmann, and Markus Luczak-Rösch. 2015. Peer-production system or collaborative ontology engineering effort: What is Wikidata? In Proceedings of the 11th International Symposium on Open Collaboration. ACM, 20.
[17] Alessandro Piscopo. 2018. Wikidata: A New Paradigm of Human-Bot Collaboration? arXiv:1810.00931 [cs] (Oct 2018). http://arxiv.org/abs/1810.00931
[18] Alessandro Piscopo, Lucie-Aimée Kaffee, Chris Phethean, and Elena Simperl. 2017. Provenance information in a collaborative knowledge graph: an evaluation of Wikidata external references. In The Semantic Web – ISWC 2017 (Lecture Notes in Computer Science). Springer, Cham, 542–558. https://doi.org/10.1007/978-3-319-68288-4_32
[19] Alessandro Piscopo, Chris Phethean, and Elena Simperl. 2017. What makes a good collaborative knowledge graph? Group composition and quality in Wikidata. In Social Informatics (Lecture Notes in Computer Science), Giovanni Luca Ciampaglia, Afra Mashhadi, and Taha Yasseri (Eds.). Springer International Publishing, 305–322.
[20] Thomas Steiner. 2014. Bots vs. Wikipedians, Anons vs. Logged-Ins (Redux): a global study of edit activity on Wikipedia and Wikidata. In Proceedings of The International Symposium on Open Collaboration (OpenSym '14). ACM, New York, NY, USA, 251–257. https://doi.org/10.1145/2641580.2641613
[21] Margaret-Anne Storey and Alexey Zagalsky. 2016. Disrupting developer productivity one bot at a time. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016). ACM, New York, NY, USA, 928–931. https://doi.org/10.1145/2950290.2983989
[22] Wikidata. 2013. RfP: ElphiBot_3. www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/ElphiBot_3 [Last accessed 01 August 2018].
[23] Wikidata. 2013. RfP: ImplicatorBot. www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/ImplicatorBot [Last accessed 01 August 2018].
[24] Wikidata. 2014. RfP: Structor. www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/Structor [Last accessed 01 August 2018].
[25] Wikidata. 2014. RfP: VlsergeyBot. www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/VlsergeyBot [Last accessed 01 August 2018].
[26] Wikidata. 2016. RfP: Phenobot. www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/Phenobot [Last accessed 01 August 2018].
[27] Wikidata. 2016. RfP: WikiLovesESBot. www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/WikiLovesESBot [Last accessed 01 August 2018].


[21] Margaret-Anne Storey and Alexey Zagalsky 2016 Disrupting developer produc-tivity one bot at a time In Proceedings of the 2016 24th ACM SIGSOFT InternationalSymposium on Foundations of Software Engineering (FSE 2016) ACM New YorkNY USA 928ndash931 httpsdoiorg10114529502902983989

[22] Wikidata 2013 RfP ElphiBot_3 wwwwikidataorgwikiWikidataRequests_for_permissionsBotElphiBot_3 [Last accessed 01 August 2018]

[23] Wikidata 2013 RfP ImplicatorBot wwwwikidataorgwikiWikidataRequests_for_permissionsBotImplicatorBot [Last accessed 01 August 2018]

[24] Wikidata 2014 RfP Structor wwwwikidataorgwikiWikidataRequests_for_permissionsBotStructor [Last accessed 01 August 2018]

[25] Wikidata 2014 RfP VlsergeyBot wwwwikidataorgwikiWikidataRequests_for_permissionsBotVlsergeyBot [Last accessed 01 August 2018]

[26] Wikidata 2016 RfP Phenobot wwwwikidataorgwikiWikidataRequests_for_permissionsBotPhenobot [Last accessed 01 August 2018]

[27] Wikidata 2016 RfP WikiLovesESBot wwwwikidataorgwikiWikidataRequests_for_permissionsBotWikiLovesESBot [Last accessed 01 August 2018]

  • Abstract
  • 1 Introduction
  • 2 Bots in Peer Production
    • 21 Bots in Wikipedia
    • 22 Bots in Wikidata
      • 3 Approving Bot Tasks
        • 31 Approval Process in Wikidata
        • 32 Data Collection
        • 33 Task Approval Data
        • 34 Classification Process
          • 4 Results
            • 41 Approved Requests
            • 42 Disputed Requests
              • 5 Discussion
              • 6 Conclusion
              • Acknowledgments
              • References
Page 6: Approving Automation: Analyzing Requests for Permissions ... · • We used our defined scheme to identify bot-requested tasks and data sources. • We analyzed how the community

OpenSym rsquo19 August 20ndash22 2019 Skoumlvde Sweden Farda-Sarbas et al

[Figure 2: Approved task requests organized into activity type (upper half: Add 70%, Update 20%, Maintain 3%, Delete 3%, Revert 1%, Unknown 3%) and edit focus (lower half: Claim 26%, Statement 20%, Item 16%, Term 16%, Sitelink 12%, Miscellaneous 5%, Reference 2%, Unknown 3%). n = 794; 262 RfPs are requests with multiple tasks. The edit focus Term consists of Alias, Label, and Description; Miscellaneous contains smaller categories such as Qualifier, Rank, Badge, and maintenance-related data.]

(i.e. claims with qualifiers), references, ranks, and sitelinks with badges.^21 The number of approved task requests shows that the majority of bots seem to focus on adding data to statements in general, and to claims more specifically. Figure 2 shows the distribution of approved tasks across activity types and edit focuses. Again, the majority of task requests deal with the addition of data.

Furthermore, there are 30 requests dealing with community concerns regarding maintenance activities; examples include archiving page sections, moving pages, reverting edits, removing duplicates, and generating statistical reports.

In addition, there are 24 requests concluded as unknown tasks, half of which are written extremely briefly^22 or in an unclear way,^23 so that it was difficult for us to identify their potential tasks. Some other cases can only be classified to a specific task based on particular assumptions: for instance, ShBot^24 uses the bot framework QuickStatements^25 to batch edits, which suggests a high likelihood of importing data into statements. However, this tool could also be used to remove statements or to import terms, so it remains a challenge to identify a request's focus without analyzing the edits. We categorized all tasks abiding strictly by our codebook; thus, all RfPs which required assumptions are classified as unknown.
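
To illustrate this ambiguity, the following minimal sketch (illustrative only, not taken from any RfP) assembles a small QuickStatements V1 batch in Python. The same tab-separated command format can add a claim, remove one, or set a label, so the batch content rather than the tool determines the activity type and edit focus:

    # Hedged illustration of QuickStatements V1 commands, built as strings.
    # Q4115189 is the Wikidata sandbox item, P31 "instance of", Q5 "human";
    # columns are TAB-separated.
    commands = [
        "Q4115189\tP31\tQ5",         # adds a claim
        "-Q4115189\tP31\tQ5",        # a leading minus removes the same claim
        'Q4115189\tLen\t"Sandbox"',  # Len sets the English label (a term edit)
    ]
    print("\n".join(commands))       # output could be pasted into the QuickStatements tool

An RfP that only names the tool therefore leaves both the activity type and the edit focus open.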

We assumed that data addition to Wikidata is related primarily to Wikidata's inception, i.e. that the majority of task approvals would occur towards the beginning of Wikidata's lifetime. However, Figure 3 shows that the most often requested activity types in the first six months were add and update, and at the end of 2017 these two activity types were often requested again. We found the 2017 peak in requests for tasks such as adding and updating items unusual and investigated it further: two operators^26 carried out 31 requests in 2017 for importing items for different languages.

^21 These badges indicate whether a sitelink points to an excellent article in a specific Wikipedia language version.
^22 E.g., www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/Glavkos_bot
^23 E.g., www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/BotMultichillT
^24 www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/ShBot
^25 Further information is available at meta.wikimedia.org/wiki/QuickStatements
^26 Both editors are employed by Wikimedia Sverige and were engaged in a GLAM initiative.

[Figure 3: Activity types of approved task requests over time (n = 600; all 81 unsuccessful task requests were not included). The y-axis counts approved requests (0-120); the x-axis runs half-yearly from 2012-12 to 2018-06; the plotted activity types are add, update, delete, merge, revert, maintain, and unknown.]

Since all these requests were permitted, this explains the sudden acceleration of importing and updating tasks shown in the graph.

In the following step, we looked more closely at the sources of the data which bots are supposed to add to Wikidata.

[Figure 4: Sources of approved task requests over time (n = 600; all 81 unsuccessful task requests were not included). The y-axis counts sources of approved tasks (0-100); the x-axis runs half-yearly from 2012-12 to 2018-06; the plotted source categories are internal, external, and unknown.]

Table 4: All internal sources, the top 10 most-used external sources, and unknown sources.

    Origin           Source                       Task Requests
    Internal (455)   Wikipedia                        263
                     Wikidata                         110
                     Wiki-loves-monuments              30
                     Wikicommons                       13
                     Wikisource                         5
                     Wikivoyage                         4
                     Wikinews                           2
                     Wikiquote                          2
                     WikiPathways                       2
                     Wikimedia Project                 24
    External (38)    MusicBrainz                        8
                     VIAF                               7
                     OpenStreetMap                      4
                     DBpedia                            4
                     Freebase                           4
                     GRID                               3
                     GND                                2
                     GitHub                             2
                     YouTube                            2
                     Disease Ontology Project           2
    Unknown          Unknown                          107

4.1.2 Data Origin. In addition to the types of tasks in RfPs, the sources from which data are imported are also stated in the RfP. We tried to identify all sources, independently of focus or activity type, in the classification of all task approval pages. We differentiate internal, external, and unknown sources:

(1) Internal: all sources from the Wikimedia ecosystem,
(2) External: all sources outside Wikimedia's ecosystem, and
(3) Unknown: all cases where the source of data was a local file/database or not clearly stated.

Table 4 shows the number of tasks approved per data source. The majority of approved task requests (454) deal with data from the Wikimedia ecosystem, with Wikipedia providing most of the data. In addition to Wikipedia, data comes from the Wiki Loves Monuments (WLM) initiative, which is organized worldwide by Wikipedia community members and Wikimedia projects. The category Wikimedia project refers to requests where the operator indicated that the source is restricted to the Wikimedia projects' scope, but a specific attribution to one of the projects was not possible.

Another important source is Wikidata itself. We found that only a small number of these data are used in maintenance tasks;^27 the majority of requests are for tasks such as retrieving information from Wikidata, adding descriptions, and updating or adding new labels by translating the existing labels into other languages.
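
To make the shape of such label tasks concrete, here is a minimal pywikibot sketch (illustrative only, assuming a configured bot account; the item is the Wikidata sandbox item, and translate_label is a hypothetical placeholder for an operator's translation step):

    import pywikibot

    def translate_label(label, lang):
        # Hypothetical placeholder: a real bot would plug in a dictionary,
        # an existing translation, or a human-reviewed mapping here.
        return label

    site = pywikibot.Site("wikidata", "wikidata")
    repo = site.data_repository()
    item = pywikibot.ItemPage(repo, "Q4115189")  # Wikidata sandbox item
    item.get()                                   # loads labels, descriptions, claims

    en_label = item.labels.get("en")
    if en_label and "da" not in item.labels:
        item.editLabels({"da": translate_label(en_label, "da")},
                        summary="Add Danish label (sketch)")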

There are 128 task approval requests that relate to importing external data into Wikidata.

^27 Among the 104 requests with Wikidata as the data source, there are 16 requests with maintenance tasks such as archiving discussions or updating database reports.

[Figure 5: Wikipedia language versions used as sources mentioned on task approval pages (n = 195; the 488 other requests either did not provide a specific Wikipedia language version or did not have Wikipedia as the source). Bars (0-40 mentions) cover 43 language versions, including English, French, German, Russian, Farsi, Spanish, Italian, Polish, and many others.]

The external sources most often mentioned are MusicBrainz^28 and VIAF^29 (cf. Table 4). Both sources are used mainly to add identifiers to Wikidata items. Other external sources are OpenStreetMap,^30 DBpedia,^31 and Freebase.^32
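
The typical shape of such an identifier import, sketched with pywikibot under the same assumptions as above (sandbox item, configured bot account; P214 is the VIAF ID property):

    import pywikibot

    site = pywikibot.Site("wikidata", "wikidata")
    repo = site.data_repository()
    item = pywikibot.ItemPage(repo, "Q4115189")  # sandbox item; a real bot maps its own IDs
    item.get()

    if "P214" not in item.claims:              # avoid duplicate identifier claims
        claim = pywikibot.Claim(repo, "P214")  # P214 = VIAF ID
        claim.setTarget("113230702")           # external-id values are plain strings
        item.addClaim(claim, summary="Import VIAF identifier (sketch)")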

As shown in Figure 4, internal data sources have remained those most often mentioned in the task requests. External and unknown sources are on the same level, with external sources showing a slight increase over time.

Most data from Wikipedia comes from 43 different language versions (cf. Figure 5), with English, French, German, Russian, and Persian being the top five languages. The results show that bots' contributions to certain languages have made these languages more prominent than others in Wikidata.

There are 109 RfPs with unknown sources, 85 of which were approved. Some of these requests mentioned a local file or database as the data source; others do not add data to Wikidata at all but, for example, move pages.

4.2 Disputed Requests
There are 81 RfPs in our dataset which were closed as unsuccessful. We wanted to understand better why the community declined these requests; thus, we investigated the main reasons for all unsuccessful RfPs.

Our data show that operators themselves are most commonly responsible for their RfPs being rejected. Most of the time, operators did not respond to questions in a timely manner or did not provide enough information when required. There are only a few cases where RfPs were not approved by the community, i.e., where no community consensus was reached or the bot violated the bot policy. In one case,^33 for instance, an editor asked for bot rights for their own user account instead of a separate one for the bot.

^28 MusicBrainz is an open and publicly available music encyclopedia that collects music metadata, available at www.musicbrainz.org
^29 VIAF stands for Virtual International Authority File, an international authority file, available at www.viaf.org
^30 OpenStreetMap is an openly licensed map of the world, available at www.openstreetmap.org
^31 DBpedia is also a Wikipedia-based knowledge graph, wiki.dbpedia.org
^32 Freebase was a collaborative knowledge base which was launched in 2007 and closed in 2014; all data from Freebase was subsequently imported into Wikidata.
^33 www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/Florentyna

Table 5: Top 5 most edited task requests closed as unsuccessful.

    Reasons                                               No. of RfP Edits   No. of Editors   RfP Created At   Reference
    Data source introduced errors                                88                10            2018-03          [26]
    Bot name, bot edits, bot performance                         43                 6            2015-12          [24]
    Automatic adding of implicit statements                      41                21            2013-04          [23]
    Support from less active users, duplicate task               32                16            2018-03          [27]
    Bot name, conflict with users, running without flag          32                11            2014-10          [25]

This violates the bot policy, as operators are required to create a new account for their bots and include the word "bot" in the account name.

Table 6 shows the main reasons why tasks were rejected. Among them are RfPs whose tasks had already been implemented by other bots (duplicate) and RfPs requesting tasks which were not needed, such as one bot which wanted to remove obsolete claims related to property P107,^34 although there were no more items associated with P107.

Furthermore, RfPs were closed as unsuccessful because the community could not trust the operators' behavior. One bot, for example, had requested many RfPs which were not accepted; the community questioned its editing behavior and required this user to gain community trust first. It can be seen that an unsuccessful closing is not only related to the task types that users had requested.

In the following, we describe the six RfPs which were edited and discussed most (cf. Table 5). We used the number of edits on an RfP page as a proxy for challenging cases.

Phenobot [26] asked for permission to correct species names based on the UniProt Taxonomy database. The community doubted this data source and insisted on more reliable information. The bot introduced some errors, so the community had to ask for edit rollbacks. Since then, the operator has been inactive for two years, which finally led to this request being closed as unsuccessful.

WikiLovesESBot [27], which wanted to add statements related to Spanish municipalities from the Spanish Wikipedia, gained continuous support from nine users until one user asked for more active Wikidata users to comment. After this, one active user checked the bot's contributions and pointed out that the bot was blocked; moreover, another active user noticed that the request asked for a duplicate task, since another user was already working on Spanish municipalities. The RfP remained open for about half a year without any operator activity and came to a procedural close.

Another remarkable case is VlsergeyBot [25], which intended to update local dictionaries, transfer data from infoboxes to Wikidata, and update properties and reports in Wikidata. At first, the account was named Secretary, which was against the bot policy; after a community suggestion, it was renamed to VlsergeyBot. After that, a user opposed the request, saying that Vlsergey's activity "generates a number of large conflicts with different users in ruwiki"; another user stated that "maybe you have a personal conflict with Vlsergey", and the discussion looked like a personal conflict. The community then considered whether this account needed a bot flag at all and found that the bot was already running without one for hours, even breaking the speed limit for bots with a flag.

^34 P107 was a property representing the GND type.

The operator promised to update his code by adding restrictions before the next run. Finally, the operator withdrew his request, stating that he did not need a bot flag that much.

ImplicatorBot [23] was intended to automatically add implicit statements to Wikidata. Implicit statements are relationships between items which can be inferred from existing relationships: a property of an item stating that a famous mother has a daughter implies, for example, that the daughter has a mother, even if this is not yet represented. The task sounds quite straightforward, but the community hesitated to approve it, considering the possibility of vandalism. Furthermore, changes to properties are still common, and such a bot might cause tremendous further work in the case of those changes. Even though the operator had shared his code, the discussion proceeded with the highest number of editors discussing a request (i.e., 21), and finally the operator stated that he was "sick and tired of the bickering". Despite the community insisting that the task was needed, the operator withdrew it.
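
The inference itself is simple, which makes the community's hesitation notable. A minimal pywikibot sketch of the idea (illustrative only, using P40 "child" and P25 "mother"; a production bot would additionally check gender, handle edit conflicts, and track property changes):

    import pywikibot

    site = pywikibot.Site("wikidata", "wikidata")
    repo = site.data_repository()

    def add_missing_mother_claims(item_id):
        # For each child (P40) of the given item, add the inverse
        # mother (P25) claim on the child if it is missing.
        mother = pywikibot.ItemPage(repo, item_id)
        mother.get()
        for child_claim in mother.claims.get("P40", []):
            child = child_claim.getTarget()
            if child is None:                  # skip novalue/somevalue claims
                continue
            child.get()
            existing = [c.getTarget() for c in child.claims.get("P25", [])]
            if mother not in existing:
                claim = pywikibot.Claim(repo, "P25")
                claim.setTarget(mother)
                child.addClaim(claim, summary="Add implicit mother statement (sketch)")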

Similar to ImplicatorBot, Structor [24] was withdrawn because the operator was tired of the bureaucracy. The bot wanted to add claims, labels, and descriptions, particularly to provide structured information about species items. Many questions were raised by a small number of community members (i.e., a total of six different users participated in the discussion), including about the bot's name, the source of the edits, duplicate entries, the edit summary, and edit test performance. The operator actively responded to all these questions. However, the whole process became so surprisingly complex (the discussions continued for over seven months from the start) that the operator was inactive for a long time and then withdrew his request.

Requests denied because the user is not trusted are also interesting. ElphiBot_3^35 [22], for example, managed jointly by three operators, was a request, submitted along with the source code, for updating interwiki links in Wikidata after a category on the Arabic Wikipedia had been moved. The community, however, preferred someone more responsible and more familiar with the technical aspects of Wikipedia and Wikidata. What concerned the discussants most was that the operators were not capable of fixing the bugs introduced by their bot; the community did not feel comfortable approving this request because the operators' approach to fixing bugs was simply to revert the mistakes manually. The requirement for running a bot should not only be knowing how to code but surely also responding in a timely manner and notifying those who can fix an issue.

^35 This RfP has a total of 29 edits, and 8 editors contributed to the discussions.

Table 6: Main reasons for unsuccessful task requests (n = 81, number of unsuccessful task requests).

    Reason                      Description                                                                      Frequency
    No activity from operator   Operators are not active or do not respond to questions regarding their bots        35
    Withdrawn                   Operators wanted to close the requested task                                        14
    Duplicate                   Requested tasks are redundant and already done by other bots or tools                9
    No community consensus      The community opposed the requested tasks                                            6
    User not trusted            Operator has questionable editing behavior and ability to fix bugs introduced        6
    Task not needed             Tasks do not need a bot or are not a significant problem for the time being          5
    Don't need bot flag         The requested tasks can be performed without a bot flag                              3
    No info provided            The fields in the RfP page are left empty                                            2
    Against bot policy          Task is not in line with the bot policy                                              1

5 DISCUSSION
In this study, we collected existing data from the RfP pages on Wikidata to find out what kinds of tasks the community allows to be automated or shared with bots.

Looking at the type and focus of the tasks mentioned most often in RfPs, we found that the dominant request type over time is adding data. This observation suggests that Wikidata's community uses bots to increase the coverage of the provided data, which aligns with the vision of the Wikimedia Foundation: "Imagine a world in which every single human being can freely share in the sum of all knowledge."^36 Updating data is the second most requested task, which shows that bots also take part in correcting mistakes or refreshing data after changes (e.g., updating the sitelinks of moved pages) and thereby contribute to data completeness, which is defined as one data quality dimension [2].

However, we were surprised that there were fewer requests regarding adding or updating references, which could support the trustworthiness (likewise a data quality dimension) of the data imported by bots. This result supports existing analyses of the edit activities of bots [18]. It would nevertheless be interesting to bring these two lines of research together, the task approval and the edit perspective, to understand the development over time more closely. Thus, we can infer that bots are allowed to assist the Wikidata community in ensuring data quality from the completeness angle, whereas this is less visible from the trustworthiness angle. In comparison to Wikipedia, where bots perform primarily maintenance tasks, bot requests in Wikidata concentrate mainly on the content, i.e., the data perspective.
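
One way to make the trustworthiness angle measurable is to count how many of an item's statements carry at least one reference. A sketch against the public MediaWiki API (illustrative only; Q42 is an arbitrary example item):

    import requests

    resp = requests.get(
        "https://www.wikidata.org/w/api.php",
        params={"action": "wbgetentities", "ids": "Q42", "format": "json"},
        headers={"User-Agent": "rfp-analysis-sketch/0.1 (example)"},
    )
    claims = resp.json()["entities"]["Q42"]["claims"]

    total = referenced = 0
    for statements in claims.values():
        for statement in statements:
            total += 1
            if statement.get("references"):    # statements may carry a list of references
                referenced += 1
    print(f"{referenced}/{total} statements carry at least one reference")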

The majority of data sources mentioned in RfPs come from inside Wikimedia projects, mainly Wikipedia (cf. Section 4.1.2). This observation implies that Wikidata is on its way to serving as the structured data source for Wikimedia projects, one of the main goals of Wikidata's development. Among the five Wikipedia language versions used most, English, French, Russian, German, and Persian (cf. Figure 5), the first three show a dominance of Western knowledge in bot imports. In an earlier study, Kaffee et al. [11] found the influence of Western languages in Wikidata, with the most dominant ones being English, Dutch, French, German, Spanish, Italian, and Russian. The similarity of the results in both studies could imply that the reason these languages

^36 Information on the Wikimedia strategy can be found at https://wikimediafoundation.org/about/vision

have better coverage than other languages is the presence of bots. Kaffee et al. also found that language coverage is not related to the number of speakers of a language, which supports our assumption. We could not find evidence that languages other than Western ones are underrepresented in bot requests on purpose; we rather expect that the current bot requests are a representative sample of Wikidata's community.

We can see from the external sources that bots use these sources mostly for importing identifiers (e.g., VIAF) or for importing data from other databases (e.g., DBpedia, Freebase). This insight supports Piscopo's [17] argument that bot edits need to be more diverse. We suggest that further efforts should be made to import or link data from different sources, for example from research institutions and libraries. With the increased usage of the Wikibase software,^37 we assume that more data might be linked or imported to Wikidata. Our classification revealed that bots already import data from local files or databases; however, such data imports often rely on the community trusting the bot operators and do not seem large-scale.

The requested maintenance tasks within the RfPs show patterns similar to those in the Wikipedia community: bots are being used to overcome the limitations of the MediaWiki software [9]. An administrative task such as deletion is not often requested in RfPs; however, bots on Wikidata are allowed to delete sitelinks only. Similar to Wikipedia, the Wikidata community is coming closer to a common understanding of what bots are supposed to do and what not.

The issue of unknown tasks, which were not clearly defined in RfPs, shows the trust the community has in single operators, probably due to their previous participation history. There is a noticeable number of approved RfPs which we coded as unknown due to their vague task descriptions, while there are also cases where task descriptions were clear but the community did not trust the operators, and thus the requests were not approved. These two observations indicate that the community gives importance to trust in addition to its defined policy and procedures.

The success rate of RfPs is relatively high, since only a small number of RfPs are closed as unsuccessful. Our findings show that, among the 81 unsuccessful RfPs, only some were rejected by the community directly; the majority were unsuccessful because the operators were not responding or had withdrawn the request.

^37 Wikibase is based on the MediaWiki software with the Wikibase extension, https://www.mediawiki.org/wiki/Wikibase

Operator inactivity is therefore a more common reason for failure than community refusal. In some cases (i.e., the RfPs discussed most), we can see that the community considers every detail of the tasks before coming to a decision; in other cases, it approves the request without a detailed discussion, as can be seen for unknown tasks and unknown sources. This result could indicate that, in addition to the defined procedure for RfPs, the community applies a more flexible approach when deciding on RfPs, taking the context of the application (e.g., the editor's experience) into account.

In summary, the high approval rate of RfPs shows that the Wikidata community is positive towards bot operations in Wikidata and willing to take advantage of task automation through bots. Wikidata's community profited from the experiences of the Wikipedia community and built productive human-bot processes in quite a short period. However, we leave the impact of these processes on data quality to future research.

6 CONCLUSION
We studied the formal process of requesting bot rights in Wikidata to find out what kinds of tasks are allowed to be automated by bots. Our study provides a detailed description of the RfP process. We retrieved the closed RfPs from the Wikidata archive up to mid-2018, defined a scheme in the form of a codebook to classify the RfPs, and developed our dataset. The RfPs were studied mainly from two perspectives: 1) what information is provided at the time the bot rights are requested, and 2) how the community handles these requests. We found that the main tasks requested are adding claims, statements, terms, and sitelinks to Wikidata, and that the main sources of bot edits have their roots in Wikipedia. This contrasts with Wikipedia, where bots perform mostly maintenance tasks. Our findings also show that most of the RfPs were approved; a small number were unsuccessful, mainly because the operators had withdrawn them or showed no activity.

Future directions of this research will focus on analyzing the actual bot activities and comparing them with the results of this study to find out whether bots really perform the tasks they asked for. This could help us better understand how bots evolve over time and whether the community controls bot activities based on their permitted tasks, or whether bots can perform any edit once they get permission to operate as a bot.

ACKNOWLEDGMENTS
We thank the Deutscher Akademischer Austauschdienst (DAAD) [research grant 57299294] and the German Research Foundation (DFG) [fund number 229241684], who supported this research financially, and the anonymous reviewers for their valuable comments, which helped to improve the paper.

REFERENCES
[1] Emilio Ferrara, Onur Varol, Clayton A. Davis, Filippo Menczer, and Alessandro Flammini. 2016. The rise of social bots. Commun. ACM 59, 7 (2016), 96–104.
[2] Michael Färber, Frederic Bartscherer, Carsten Menne, and Achim Rettinger. 2018. Linked data quality of DBpedia, Freebase, OpenCyc, Wikidata, and YAGO. Semantic Web 9, 1 (Jan. 2018), 77–129. https://doi.org/10.3233/SW-170275
[3] R. Stuart Geiger. 2009. The social roles of bots and assisted editing programs. In Int. Sym. Wikis.
[4] R. Stuart Geiger. 2012. Bots are users too: rethinking the roles of software agents in HCI. Tiny Transactions on Computer Science 1 (2012), 1.
[5] R. Stuart Geiger and Aaron Halfaker. 2013. When the levee breaks: without bots, what happens to Wikipedia's quality control processes? In Proceedings of the 9th International Symposium on Open Collaboration (WikiSym '13). ACM, New York, NY, USA, 61–66. https://doi.org/10.1145/2491055.2491061
[6] R. Stuart Geiger and Aaron Halfaker. 2017. Operationalizing conflict and cooperation between automated software agents in Wikipedia: a replication and expansion of "Even good bots fight". Proc. ACM Hum.-Comput. Interact. 1, 2 (Nov. 2017), 33. https://doi.org/10.1145/3134684
[7] R. Stuart Geiger and David Ribes. 2010. The work of sustaining order in Wikipedia: the banning of a vandal. In Proceedings of the 2010 ACM Conference on Computer Supported Cooperative Work (CSCW '10). ACM, New York, NY, USA, 117–126. https://doi.org/10.1145/1718918.1718941
[8] Aaron Halfaker, R. Stuart Geiger, Jonathan T. Morgan, and John Riedl. 2013. The rise and decline of an open collaboration system: how Wikipedia's reaction to popularity is causing its decline. American Behavioral Scientist 57, 5 (2013), 664–688. https://doi.org/10.1177/0002764212469365
[9] Aaron Halfaker and John Riedl. 2012. Bots and cyborgs: Wikipedia's immune system. Computer 45, 3 (March 2012), 79–82. https://doi.org/10.1109/MC.2012.82
[10] Andrew Hall, Loren Terveen, and Aaron Halfaker. 2018. Bot detection in Wikidata using behavioral and other informal cues. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (Nov. 2018), 1–18. https://doi.org/10.1145/3274333
[11] Lucie-Aimée Kaffee, Alessandro Piscopo, Pavlos Vougiouklis, Elena Simperl, Leslie Carr, and Lydia Pintscher. 2017. A glimpse into Babel: an analysis of multilinguality in Wikidata. In Proceedings of the 13th International Symposium on Open Collaboration (OpenSym '17). ACM, New York, NY, USA, 141–145. https://doi.org/10.1145/3125433.3125465
[12] Jonathan Lazar, Jinjuan Heidi Feng, and Harry Hochheiser. 2017. Research Methods in Human-Computer Interaction (2nd ed.). Morgan Kaufmann.
[13] Brenda Leong and Evan Selinger. 2019. Robot Eyes Wide Shut: Understanding Dishonest Anthropomorphism. ACM, New York, NY, USA.
[14] Randall M. Livingstone. 2016. Population automation: an interview with Wikipedia bot pioneer Ram-Man. First Monday 21, 1 (4 January 2016). https://doi.org/10.5210/fm.v21i1.6027
[15] Claudia Müller-Birn. 2019. Bots in Wikipedia: Unfolding Their Duties. Technical Reports Serie B, TR-B-19-01. Freie Universität Berlin. https://refubium.fu-berlin.de/handle/fub188/24775
[16] Claudia Müller-Birn, Benjamin Karran, Janette Lehmann, and Markus Luczak-Rösch. 2015. Peer-production system or collaborative ontology engineering effort: what is Wikidata? In Proceedings of the 11th International Symposium on Open Collaboration. ACM, 20.
[17] Alessandro Piscopo. 2018. Wikidata: a new paradigm of human-bot collaboration? arXiv:1810.00931 [cs] (Oct. 2018). http://arxiv.org/abs/1810.00931
[18] Alessandro Piscopo, Lucie-Aimée Kaffee, Chris Phethean, and Elena Simperl. 2017. Provenance information in a collaborative knowledge graph: an evaluation of Wikidata external references. In The Semantic Web – ISWC 2017 (Lecture Notes in Computer Science). Springer, Cham, 542–558. https://doi.org/10.1007/978-3-319-68288-4_32
[19] Alessandro Piscopo, Chris Phethean, and Elena Simperl. 2017. What makes a good collaborative knowledge graph: group composition and quality in Wikidata. In Social Informatics (Lecture Notes in Computer Science), Giovanni Luca Ciampaglia, Afra Mashhadi, and Taha Yasseri (Eds.). Springer International Publishing, 305–322.
[20] Thomas Steiner. 2014. Bots vs. Wikipedians, anons vs. logged-ins (redux): a global study of edit activity on Wikipedia and Wikidata. In Proceedings of The International Symposium on Open Collaboration (OpenSym '14). ACM, New York, NY, USA, 251–257. https://doi.org/10.1145/2641580.2641613
[21] Margaret-Anne Storey and Alexey Zagalsky. 2016. Disrupting developer productivity one bot at a time. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016). ACM, New York, NY, USA, 928–931. https://doi.org/10.1145/2950290.2983989
[22] Wikidata. 2013. RfP: ElphiBot_3. www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/ElphiBot_3. [Last accessed 01 August 2018].
[23] Wikidata. 2013. RfP: ImplicatorBot. www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/ImplicatorBot. [Last accessed 01 August 2018].
[24] Wikidata. 2014. RfP: Structor. www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/Structor. [Last accessed 01 August 2018].
[25] Wikidata. 2014. RfP: VlsergeyBot. www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/VlsergeyBot. [Last accessed 01 August 2018].
[26] Wikidata. 2016. RfP: Phenobot. www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/Phenobot. [Last accessed 01 August 2018].
[27] Wikidata. 2016. RfP: WikiLovesESBot. www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/WikiLovesESBot. [Last accessed 01 August 2018].


[27] Wikidata 2016 RfP WikiLovesESBot wwwwikidataorgwikiWikidataRequests_for_permissionsBotWikiLovesESBot [Last accessed 01 August 2018]

  • Abstract
  • 1 Introduction
  • 2 Bots in Peer Production
    • 21 Bots in Wikipedia
    • 22 Bots in Wikidata
      • 3 Approving Bot Tasks
        • 31 Approval Process in Wikidata
        • 32 Data Collection
        • 33 Task Approval Data
        • 34 Classification Process
          • 4 Results
            • 41 Approved Requests
            • 42 Disputed Requests
              • 5 Discussion
              • 6 Conclusion
              • Acknowledgments
              • References
Page 8: Approving Automation: Analyzing Requests for Permissions ... · • We used our defined scheme to identify bot-requested tasks and data sources. • We analyzed how the community

OpenSym rsquo19 August 20ndash22 2019 Skoumlvde Sweden Farda-Sarbas et al

Table 5: Top 5 most edited task requests closed as unsuccessful

Reasons                                            | No. of RfP Edits | No. of Editors | RfP Created At | Reference
Data source introduced errors                      | 88               | 10             | 2018-03        | [26]
Bot name, bot edits, bot performance               | 43               | 6              | 2015-12        | [24]
Automatic adding of implicit statements            | 41               | 21             | 2013-04        | [23]
Support from less active users, duplicate task     | 32               | 16             | 2018-03        | [27]
Bot name conflict with users, running without flag | 32               | 11             | 2014-10        | [25]

violates the bot policy, as operators are required to create a new account for their bots and to include the word "bot" in the account name.
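As a minimal sketch of this naming requirement (our own simplified reading of the policy, not an official implementation), a compliance check might look as follows; the function name is hypothetical.

```python
import re

def satisfies_bot_naming_policy(account_name: str) -> bool:
    # Simplified reading of the policy: the account name must contain
    # the word "bot", checked case-insensitively as a substring here.
    return re.search(r"bot", account_name, re.IGNORECASE) is not None

print(satisfies_bot_naming_policy("VlsergeyBot"))  # True
print(satisfies_bot_naming_policy("Secretary"))    # False; this account was later renamed
```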

Table 6 shows the main reasons why tasks were rejected: RfPs whose tasks had already been implemented by other bots (duplicate), or RfPs requesting tasks which were not needed, such as one bot which wanted to remove obsolete claims related to property (P107)34, although there were no items associated with P107 anymore.

34 P107 was a property representing the GND type.

Furthermore, RfPs were closed as unsuccessful because the community could not trust the operators' behavior. One operator, for example, submitted many RfPs which were not accepted; the community questioned the bot's editing behavior and required this user to gain community trust first. This shows that unsuccessful closings are not only related to the task types that users had requested.

In the following, we describe six RfPs which were edited and discussed the most (cf. Table 5). We defined the number of edits on an RfP page as a proxy for challenging cases.
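Both counts in this proxy are directly recoverable from a page's revision history. The following is a minimal sketch, assuming the standard MediaWiki Action API and the `requests` library (our own illustrative choice, not the authors' tooling), of how the edit and editor counts for a single RfP page could be computed.

```python
import requests

API = "https://www.wikidata.org/w/api.php"

def rfp_edit_stats(page_title: str):
    """Return (number of edits, number of distinct editors) for one RfP page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": page_title,
        "rvprop": "user",
        "rvlimit": "max",  # RfP pages are small; continuation handling omitted
        "format": "json",
    }
    headers = {"User-Agent": "rfp-stats-sketch/0.1 (research illustration)"}
    data = requests.get(API, params=params, headers=headers, timeout=30).json()
    page = next(iter(data["query"]["pages"].values()))
    revisions = page.get("revisions", [])
    editors = {rev["user"] for rev in revisions if "user" in rev}  # skips hidden users
    return len(revisions), len(editors)

edits, editors = rfp_edit_stats("Wikidata:Requests_for_permissions/Bot/ImplicatorBot")
print(edits, editors)  # the paper reports 41 edits and 21 editors for this RfP as of 2018
```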

Phenobot [26] asked for permission to correct species names based on the UniProt Taxonomy database. The community doubted this data source and insisted on more reliable information. The bot had introduced errors, so the community had to ask for edit rollbacks. Since then, the operator has been inactive for two years, which finally led to this request being closed as unsuccessful.

WikiLovesESBot [27], which wanted to add statements about Spanish municipalities from the Spanish Wikipedia, gained continuous support from nine users until a user asked for more active Wikidata users to comment. After this, one active user checked the bot's contributions and pointed out that the bot was blocked. Moreover, another active user figured out that this request was asking for a duplicate task, since another user was already working on Spanish municipalities. The RfP remained open for about half a year without any operator activity and came to a procedural close.

Another remarkable case is VlsergeyBot [25], which intended to update local dictionaries, transfer data from infoboxes to Wikidata, and update properties and reports in Wikidata. At first, the account was named Secretary, which was against the bot policy; after a community suggestion, it was renamed to VlsergeyBot. After that, a user opposed the request, saying that Vlsergey's activity "generates a number of large conflicts with different users in ruwiki", whereupon another user replied that "maybe you have a personal conflict with Vlsergey", and the discussion looked like a personal conflict. The community then considered whether this account needed a bot flag at all and found that the bot had already been running without a bot flag for hours and had even broken the speed limit for bots with a flag. The operator promised to update his code by adding restrictions before the next run. Finally, the operator withdrew his request, stating that he did not need a bot flag that much.

ImplicatorBot [23] was intended to add implicit statements to Wikidata automatically. Implicit statements are relationships between items which can be inferred from existing relationships: a statement that a famous mother has a daughter implies, for example, that the daughter has a mother, even if that statement is not yet represented. The task sounds quite straightforward, but the community hesitated to approve it, considering the possibility of vandalism. Furthermore, changes in properties are still common, and such a bot might cause tremendous further work in the case of such changes. Even though the operator had shared his code, the discussion proceeded with the highest number of editors discussing a request (i.e., 21), and finally the operator stated that he was "sick and tired of the bickering". Despite the community insisting that it needed the task, the operator withdrew the request.
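To make the notion of implicit statements concrete, here is a minimal sketch over a toy triple representation; the item IDs are hypothetical, and we infer only in the direction that holds without extra knowledge: a mother (P25) or father (P22) statement implies a child (P40) statement on the parent's item, whereas the reverse inference would require, e.g., gender information, which is part of what made the community cautious.

```python
# Real Wikidata property IDs: P25 = "mother", P22 = "father", P40 = "child".
PARENT_PROPERTIES = {"P25", "P22"}

def implied_child_claims(claims):
    """Yield (subject, 'P40', object) triples implied by existing parent
    claims but not yet stated explicitly. `claims` is an iterable of
    (subject, property, object) triples."""
    existing = set(claims)
    for subject, prop, obj in claims:
        if prop in PARENT_PROPERTIES:
            implied = (obj, "P40", subject)  # the parent has this item as a child
            if implied not in existing:
                yield implied

claims = [("Q100", "P25", "Q200")]         # Q100 has mother Q200 (hypothetical items)
print(list(implied_child_claims(claims)))  # [('Q200', 'P40', 'Q100')]
```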

Similar to ImplicatorBot, Structor [24] was withdrawn because the operator was tired of the bureaucracy. The bot wanted to add claims, labels, and descriptions, particularly to provide structured information about species items. Many questions were raised by a small number of community members (a total of six different users participated in the discussion), including questions about the bot name, the source of the edits, duplicate entries, the edit summary, and edit test performance. The operator actively responded to all these questions. However, the whole process was so surprisingly complex (the discussions continued for over seven months from the start) that the operator was inactive for a long time and then withdrew his request.

Requests denied because the user is not trusted are also interesting. ElphiBot_3,35 [22] for example, which was managed jointly by three operators, was a request, submitted along with the source code, for updating the interwiki links in Wikidata after a category in the Arabic Wikipedia had been moved. The community, however, preferred someone more responsible and more familiar with the technical aspects of Wikipedia and Wikidata. What concerned the discussants most was that the operators were not capable enough to fix the bugs introduced by their bot. The community did not feel comfortable approving this request because the operators' approach to fixing bugs was simply to reverse the mistakes manually. The requirement for running a bot should not only be knowing how to code but surely also responding in a timely manner and notifying those who can fix the issue.

35 This RfP has a total of 29 edits, and 8 editors contributed to the discussions.


Table 6: Main reasons for unsuccessful task requests (n = 81, number of unsuccessful task requests)

Reason                    | Description                                                                        | Frequency
No activity from operator | Operators are not active or do not respond to the questions regarding their bots  | 35
Withdrawn                 | Operators wanted to close the requested task                                       | 14
Duplicate                 | Requested tasks are redundant and are already done through other bots or tools    | 9
No community consensus    | Community opposed the requested tasks                                              | 6
User not trusted          | Operator has questionable editing behavior and ability to fix any bugs introduced | 6
Task not needed           | Tasks do not need a bot or are not a significant problem for the time being       | 5
Don't need bot flag       | The requested tasks can be performed without a bot flag                            | 3
No info provided          | The fields in the RfP page are left empty                                          | 2
Against bot policy        | Task is not in line with the bot policy                                            | 1

5 DISCUSSION

In this study, we collected existing data from the RfP pages on Wikidata to find out what kinds of tasks the community allows to be automated or shared with bots.

Looking at the type and focus of the tasks mentioned most often in RfPs, we found that the dominant request type over time is adding data. This observation suggests that Wikidata's community uses bots to increase the coverage of the provided data, which aligns with the vision of the Wikimedia Foundation: "Imagine a world in which every single human being can freely share in the sum of all knowledge."36 Updating data is the second most requested task, which shows that bots also take part in correcting mistakes or refreshing data according to changes (e.g., updating the sitelinks of moved pages) and contribute to data completeness, which is defined as one data quality dimension [2].

36 Information on the Wikimedia strategy can be found at https://wikimediafoundation.org/about/vision.

However, we were surprised that there were fewer requests regarding adding or updating references, which could support the trustworthiness (also a data quality dimension) of data imported by bots. This result supports existing analyses of the edit activities of bots [18]. It would nevertheless be interesting to bring these two lines of research together, i.e., the task approval and the edit perspective, to understand the development over time more closely. Thus, we can infer that bots are allowed to assist the Wikidata community in ensuring data quality from the completeness angle; this is, however, less visible from the trustworthiness angle. In comparison to Wikipedia, where bots perform primarily maintenance tasks, bot requests in Wikidata concentrate mainly on the content, i.e., the data perspective.

The majority of data sources mentioned in RfPs come from inside the Wikimedia projects, mainly Wikipedia (cf. Section 4.1.2). This observation implies that Wikidata is on its way to serving as the structured data source for Wikimedia projects, one of the main goals of Wikidata's development. Among the five most used language versions of Wikipedia (English, French, Russian, German, and Persian; cf. Figure 5), the first three languages show a dominance of Western knowledge in bot imports. In an earlier study, Kaffee et al. [11] had found a similar influence of Western languages in Wikidata, with the most dominant ones being English, Dutch, French, German, Spanish, Italian, and Russian. The similarity of the results in both studies could imply that the reason these languages have better coverage than other languages is the presence of bots. Kaffee et al. also found that language coverage is not related to the number of speakers of a language, which supports our assumption. We could not find evidence that languages other than Western ones are deliberately underrepresented in bot requests. We rather expect that the current bot requests are a representative sample of Wikidata's community.

We can see from the external sources that bots use these sources mostly for importing identifiers (e.g., VIAF) or for importing data from other databases (e.g., DBpedia, Freebase). This insight supports Piscopo's [17] argument that bot edits need to be more diverse. We suggest that further efforts should be made to import or link data from different sources, for example, from research institutions and libraries. With the increased usage of the Wikibase software,37 we assume that more data might be linked or imported to Wikidata. Our classification revealed that bots already import data from local files or databases. However, such data imports often rely on the community trusting the bot operators and do not seem to be large-scale.

37 Wikibase is based on the MediaWiki software with the Wikibase extension, https://www.mediawiki.org/wiki/Wikibase.
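As an illustration of such an identifier import, the following is a minimal sketch using the pywikibot library, which many Wikidata bots build on; the item ID and VIAF value are hypothetical placeholders, and a production bot would also attach a reference and respect the community's edit-rate expectations.

```python
# A sketch of adding an external identifier claim with pywikibot.
# Assumes a configured user-config.py with bot credentials.
import pywikibot

site = pywikibot.Site("wikidata", "wikidata")
repo = site.data_repository()

item = pywikibot.ItemPage(repo, "Q1234567")  # hypothetical item
claim = pywikibot.Claim(repo, "P214")        # P214 = VIAF ID (external identifier)
claim.setTarget("123456789")                 # hypothetical VIAF identifier value
item.addClaim(claim, summary="Add VIAF ID (illustrative bot task)")
```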

The maintenance tasks requested within the RfPs show patterns similar to those in the Wikipedia community: bots are being used to overcome the limitations of the MediaWiki software [9]. An administrative task such as deletion is not often requested in RfPs; moreover, bots on Wikidata are allowed to delete sitelinks only. Similar to Wikipedia, the Wikidata community is coming closer to a common understanding of what bots are supposed to do and what not.

The issue of unknown tasks, i.e., tasks which were not clearly defined in RfPs, shows the trust the community has in individual operators, probably due to their previous participation history. There is a noticeable number of approved RfPs which we coded as unknown due to their vague task descriptions, while there are also cases where the task descriptions were clear but the community did not trust the operators, and the requests were therefore not approved. These two observations indicate that the community attaches importance to trust, in addition to its defined policy and procedures.

The success rate of RfPs is relatively high, since only a small number of RfPs are closed as unsuccessful. Our findings show that among the 81 unsuccessful RfPs, only some were rejected by the community directly; the majority were unsuccessful because the operators were not responding or had withdrawn the request. Operator inactivity is therefore a more common reason for failure than community refusal. In some cases (i.e., the most discussed RfPs), we can see that the community considers every detail of the task before coming to a decision; in other cases, however, the community approves the request without a detailed discussion, as can be seen in the cases of unknown tasks and unknown sources. This result could indicate that, in addition to the defined procedure for RfPs, the community applies a more flexible approach when deciding on RfPs, taking the context of the application (e.g., the operator's experience) into account.

In summary, the high approval rate of RfPs shows that the Wikidata community is positive towards bot operations in Wikidata and willing to take advantage of task automation through bots. Wikidata's community profited from the experiences of the Wikipedia community and built productive human-bot processes in quite a short period. However, we leave the impact of these processes on data quality to future research.

6 CONCLUSION

We studied the formal process of requesting bot rights in Wikidata to find out what kinds of tasks are allowed to be automated by bots. Our study provides a detailed description of the RfP process. We retrieved the closed RfPs from the Wikidata archive up to mid-2018, defined a scheme in the form of a codebook to classify the RfPs, and developed our dataset. The RfPs were studied mainly from two perspectives: 1) what information is provided at the time the bot rights are requested, and 2) how the community handles these requests. We found that the main tasks requested are adding claims, statements, terms, and sitelinks to Wikidata, and that the main sources of bot edits have their roots in Wikipedia. This contrasts with Wikipedia, where bots mostly perform maintenance tasks. Our findings also show that most of the RfPs were approved; a small number were unsuccessful, mainly because operators had withdrawn the request or showed no activity.

Future directions of this research will focus on analyzing the actual bot activities and comparing them with the results of this study to find out whether bots really perform the tasks they asked for. This could help us better understand how bots evolve over time and whether the community controls bot activities based on their permitted tasks, or whether bots can perform any edit once they get the permission to operate as a bot.

ACKNOWLEDGMENTS

We thank the Deutscher Akademischer Austauschdienst (DAAD) [Research grant 57299294] and the German Research Foundation (DFG) [Fund number 229241684], who supported this research financially, and the anonymous reviewers for their valuable comments, which helped to improve the paper.

REFERENCES

[1] Emilio Ferrara, Onur Varol, Clayton A. Davis, Filippo Menczer, and Alessandro Flammini. 2016. The rise of social bots. Commun. ACM 59, 7 (2016), 96–104.
[2] Michael Färber, Frederic Bartscherer, Carsten Menne, and Achim Rettinger. 2018. Linked data quality of DBpedia, Freebase, OpenCyc, Wikidata, and YAGO. Semantic Web 9, 1 (Jan. 2018), 77–129. https://doi.org/10.3233/SW-170275
[3] R. Stuart Geiger. 2009. The social roles of bots and assisted editing programs. In Int. Sym. Wikis.
[4] R. Stuart Geiger. 2012. Bots are users too: rethinking the roles of software agents in HCI. Tiny Transactions on Computer Science 1 (2012), 1.
[5] R. Stuart Geiger and Aaron Halfaker. 2013. When the levee breaks: without bots, what happens to Wikipedia's quality control processes? In Proceedings of the 9th International Symposium on Open Collaboration (WikiSym '13). ACM, New York, NY, USA. https://doi.org/10.1145/2491055.2491061
[6] R. Stuart Geiger and Aaron Halfaker. 2017. Operationalizing conflict and cooperation between automated software agents in Wikipedia: a replication and expansion of "Even Good Bots Fight". Proc. ACM Hum.-Comput. Interact. 1, 2 (Nov. 2017), Article 33. https://doi.org/10.1145/3134684
[7] R. Stuart Geiger and David Ribes. 2010. The work of sustaining order in Wikipedia: the banning of a vandal. In Proceedings of the 2010 ACM Conference on Computer Supported Cooperative Work (CSCW '10). ACM, New York, NY, USA, 117–126. https://doi.org/10.1145/1718918.1718941
[8] Aaron Halfaker, R. Stuart Geiger, Jonathan T. Morgan, and John Riedl. 2013. The rise and decline of an open collaboration system: how Wikipedia's reaction to popularity is causing its decline. American Behavioral Scientist 57, 5 (2013), 664–688. https://doi.org/10.1177/0002764212469365
[9] Aaron Halfaker and John Riedl. 2012. Bots and cyborgs: Wikipedia's immune system. Computer 45, 3 (March 2012), 79–82. https://doi.org/10.1109/MC.2012.82
[10] Andrew Hall, Loren Terveen, and Aaron Halfaker. 2018. Bot detection in Wikidata using behavioral and other informal cues. Proc. ACM Hum.-Comput. Interact. 2, CSCW (Nov. 2018), 1–18. https://doi.org/10.1145/3274333
[11] Lucie-Aimée Kaffee, Alessandro Piscopo, Pavlos Vougiouklis, Elena Simperl, Leslie Carr, and Lydia Pintscher. 2017. A glimpse into Babel: an analysis of multilinguality in Wikidata. In Proceedings of the 13th International Symposium on Open Collaboration (OpenSym '17). ACM, New York, NY, USA. https://doi.org/10.1145/3125433.3125465
[12] Jonathan Lazar, Jinjuan Heidi Feng, and Harry Hochheiser. 2017. Research Methods in Human-Computer Interaction (2nd ed.). Morgan Kaufmann.
[13] Brenda Leong and Evan Selinger. 2019. Robot eyes wide shut: understanding dishonest anthropomorphism. ACM, New York, NY, USA.
[14] Randall M. Livingstone. 2016. Population automation: an interview with Wikipedia bot pioneer Ram-Man. First Monday 21, 1 (4 January 2016). https://doi.org/10.5210/fm.v21i1.6027
[15] Claudia Müller-Birn. 2019. Bots in Wikipedia: unfolding their duties. Technical Reports Serie B, TR-B-19-01. Freie Universität Berlin. https://refubium.fu-berlin.de/handle/fub188/24775
[16] Claudia Müller-Birn, Benjamin Karran, Janette Lehmann, and Markus Luczak-Rösch. 2015. Peer-production system or collaborative ontology engineering effort: what is Wikidata? In Proceedings of the 11th International Symposium on Open Collaboration. ACM, Article 20.
[17] Alessandro Piscopo. 2018. Wikidata: a new paradigm of human-bot collaboration? arXiv:1810.00931 [cs] (Oct. 2018). http://arxiv.org/abs/1810.00931
[18] Alessandro Piscopo, Lucie-Aimée Kaffee, Chris Phethean, and Elena Simperl. 2017. Provenance information in a collaborative knowledge graph: an evaluation of Wikidata external references. In The Semantic Web – ISWC 2017 (Lecture Notes in Computer Science). Springer, Cham, 542–558. https://doi.org/10.1007/978-3-319-68288-4_32
[19] Alessandro Piscopo, Chris Phethean, and Elena Simperl. 2017. What makes a good collaborative knowledge graph? Group composition and quality in Wikidata. In Social Informatics (Lecture Notes in Computer Science), Giovanni Luca Ciampaglia, Afra Mashhadi, and Taha Yasseri (Eds.). Springer International Publishing, 305–322.
[20] Thomas Steiner. 2014. Bots vs. Wikipedians, anons vs. logged-ins (redux): a global study of edit activity on Wikipedia and Wikidata. In Proceedings of The International Symposium on Open Collaboration (OpenSym '14). ACM, New York, NY, USA. https://doi.org/10.1145/2641580.2641613
[21] Margaret-Anne Storey and Alexey Zagalsky. 2016. Disrupting developer productivity one bot at a time. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016). ACM, New York, NY, USA, 928–931. https://doi.org/10.1145/2950290.2983989
[22] Wikidata. 2013. RfP: ElphiBot_3. www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/ElphiBot_3 [Last accessed 01 August 2018].
[23] Wikidata. 2013. RfP: ImplicatorBot. www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/ImplicatorBot [Last accessed 01 August 2018].
[24] Wikidata. 2014. RfP: Structor. www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/Structor [Last accessed 01 August 2018].
[25] Wikidata. 2014. RfP: VlsergeyBot. www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/VlsergeyBot [Last accessed 01 August 2018].
[26] Wikidata. 2016. RfP: Phenobot. www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/Phenobot [Last accessed 01 August 2018].
[27] Wikidata. 2016. RfP: WikiLovesESBot. www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/WikiLovesESBot [Last accessed 01 August 2018].
