This chapter is part of “The SAGE Handbook of Web History”, edited by Niels Brügger and Ian Milligan. This version is subsequent to peer review but before type-setting and proofing, so don’t cite this, cite the version of record, available from https://uk.sagepub.com/en-gb/eur/the-sage-handbook-of-web-history/book252251 This version can be reused under CC-BY-NC-ND 4.0: https://creativecommons.org/licenses/by-nc-nd/4.0/
Collecting Primary Sources from Web Archives:
A Tale of Scarcity and Abundance
Federico Nanni
Data and Web Science Group
University of Mannheim
La diversité des témoignages historiques
est presque infinie.
[The variety of historical evidence is nearly infinite.]
(Bloch, 1949)
The World Wide Web is the largest collection of human testimonies that we have ever had at our
fingertips. Spanning from institutional websites to digital libraries, from personal blogs to Twitter
accounts of prominent politicians, from online newspapers to large-scale knowledge bases, an
immense number of born-digital testimonies is waiting to be retrieved, selected and studied by future
historians. In addition to this, while these new resources are piling up steadily in front of our eyes,
they are also rapidly replacing their analogue counterparts, from printed news articles to personal
diaries, from letter correspondences to scientific publications.
Acknowledging this sudden transition in production from printed to digital documents, this chapter
presents and discusses some of the new methodological issues that arise when these
materials are to be employed as primary sources for studying the recent past. Firstly, an overview of
the debate on the historian’s craft is offered. Then, two different case studies that have dealt with the
difficulties of adopting born-digital materials in historical work are described: the first is focused
on reconstructing the past of university websites as a new way of studying the recent past of
academic institutions; the second retrieves materials from large-scale archives of the web in order to
study contemporary socio-political events. These descriptions highlight how a
fruitful combination of historical methods with approaches from other research areas, such as
internet studies and natural language processing, could support future historians in successfully
addressing these issues.
1. The Historical Method: Today and Tomorrow
In order to understand how the transition from analogue to digital sources is about to change the
historian’s craft, it is first of all essential to examine how the ‘historical method’ (Shafer, 1974) is
generally defined and what its major steps are.
Defining a Subject
In the first part of any historical research, the scholar broadly defines the subject of investigation and
- together with it - an initial question. The research question, firstly presented at a coarse-grained
level, will be sharpened through the recursive process of collecting sources, interpreting them and by
doing so discovering the underlying narrative.
Collecting the Evidence
In order to address the research question, the historian identifies the testimonies upon which she/he
builds a narrative through a complex process of collection, analysis and selection of the remains of
the past. These testimonies could be physical remains (e.g. buildings, statues), oral memories, printed
documents (e.g. chronicles, diaries, articles, census data) and will soon include born-digital
documents, such as websites, online forums, email threads, large-scale databases, etc. The process of
collecting primary sources has been shaped and sharpened by decades of discussions in
historiography both on how to establish the reliability of these materials, for example through source
criticism, and on how much ‘true knowledge’ can be derived from them (there are as many
interpretations of the same text as there are readers, as Barthes (1967) has taught us).
Interpreting the Evidence
The interpretation of the collected textual sources represents the core of any historical research. For
this reason, it has been the central focus of debate across 20th century historiography and has
experienced drastic transitions in methodology. As a matter of fact, the analysis and interpretation of
sources can be conducted in many different ways: traditional historiographical scholarship has
strongly relied on hermeneutics and on the careful qualitative examination of documents, while other
approaches – which emerged during the second part of the 20th century inspired by social science
methodologies (see the advent of Cliometrics; Greif, 1997) – have employed census data or
economic reports in order to conduct large-scale quantitative analyses.
Through the ’70s and the ’80s postmodern and deconstructionist theories (starting from the works of
Barthes, 1967; Derrida, 1967 and Lyotard, 1979, among others) posed major critiques to the
underlying assumption of both traditional and social-science historical scholarship that it is possible
to discover a ‘unique truth’ about the past through careful analysis of the remains. The enormous
impact of these critiques has been remarked upon by many historians (Munslow, 2006; Burke, 2008) and
has led to the so-called cultural turn in the profession, which is still reflected strongly today in the
community.¹
Presenting a Narrative
The final step of any historical research is to define a narrative and write a history. The creation of a
narrative, which is highly connected with the initial definition of the research question, gives the
historian the possibility of placing the work she/he is writing as part of a larger contribution to the
field. This is achieved in two interconnected ways: first of all, by offering a new/different perspective
on the topic under study; in addition to this, by participating in the larger debate in historiography
regarding the ways the past can be re-discovered, examined, described and - for certain authors² -
even modelled.
1.1 A Computational Turn of the Craft?
History has been part of the so-called digital humanities (Schreibman et al., 2004) since their very
beginning.³ In particular, during the second part of the 20th century the potential of computational
methods and their impact on the historian's craft have been recurrent topics in historiography. As
Thomas III (2004) remarked, already in 1945 Vannevar Bush, in his famous essay ‘As We May Think’,
pointed out that technology could be the solution that would enable us to manage the abundance of
scientific and humanistic data (Bush, 1945); in his vision, the Memex could become an extremely
useful instrument for historians.
The use of the computer in historical research, which grew significantly between the ’60s and the ’70s
thanks both to the efforts of the Annales school (see for example Daumard and Furet, 1959) and to its
application to the analysis of economic and census data (Greif, 1997), has been strongly related to the
adoption of social science practices in historical studies (Evans, 2001). A pioneering work on the use
of database technologies for historical research was conducted by Manfred Thaller during the ’80s
(Thaller, 1991).
However, as Milligan (2012) and Robertson (2016) have already remarked, a large majority of the
historian community has remained skeptical towards the adoption of computational methods in the
craft. This attitude has consolidated in opposition to other humanities disciplines: for example, in the
last thirty years the field of literary studies has experimented extensively with the potential of what
it has defined as ‘distant reading’ techniques, in order to extract quantifiable information from large
amounts of text (Moretti, 2013). Instead, during the same time, the so-called digital history
community (Cohen et al., 2008) has decided to focus primarily on the potentialities of the Web as a
platform for the collection, presentation, and dissemination of material (Cohen and Rosenzweig,
2005) and on the more ‘communicative aspects’ of doing research in the humanities (Robertson,
2016). This can be noticed by observing the importance given to digital public history topics (Noiret,
2015), the relevance of teaching in digital history (Cohen et al., 2008) and the tradition of digital
history mapping (Knowles and Hillier, 2008).
In the second part of the 2000s, thanks in particular to the prompt availability of digitized historical
primary sources and the potentialities of web technologies, this skeptical attitude towards
computational methods has slowly changed and a few interdisciplinary teams have developed tools in
order to help other traditionally trained historians to employ these methods in their work. As Nelson
(2016) remarked, the first fruitful applications of these methods for supporting historical narratives
can be found in the works of Wilkens (2013) and Blevins (2014), which are robust examples of the
beginning of a mature season of digital history.
While these early works of scholarship based on the use of computational approaches are essential for
refreshing the historiographic debate, it is argued in this chapter that the adoption of computational
methods cannot be considered per se a revolutionary turning point for the profession. In fact, the use
of these approaches is similar to other methodological turning points that historians have already
experienced before (Milligan (2012), for example, identifies ‘three waves’ of computational history);
moreover, during the last ten years the use of computational methods in humanities research has
been strongly sustained and encouraged by public and private institutions (from the NEH Digital
Humanities Advancement Grants to the VolkswagenStiftung on ‘Mixed Methods’ in the Humanities)
as well as private companies (e.g., Google’s ‘commitment’ to the Digital Humanities) and often
mainstream media sources (Rothman, 2014).
Nevertheless, it is argued in this chapter that historiography is about to experience a new and far
more conspicuous turning point and that this will have a very strong impact on a specific step of the
historian’s craft, namely the way sources are collected from now on. Born-digital documents shared
online, with their ephemerality, preservation, availability and access, are about to pose a large set of new
challenges for future historians. In the next decades, the methodological debate in historiography will
not only be centered around qualitative over quantitative, distant versus close, hermeneutics against
statistical significance, but it will also address the needs of the community in finding ways of acquiring
knowledge on our recent (digital) past.
1.2 The Born-Digital Turn
The transition from analogue to born-digital materials is influencing the way historians study the past:
materials such as websites, forums, blogs, tweets and emails are in fact very different compared to
traditional analogue and digitized primary sources. Born-digital materials have an extremely short life
compared to printed documents as they are significantly more difficult to archive and preserve
(LaFrance, 2015). This is due to a vast number of reasons (Brügger, 2005) and the consequence of it
has been summarized by Rosenzweig (2003) with the concept of ‘scarcity’ of digital primary sources.
Web pages disappear constantly from the live web (because they are removed by the author or by
the owner of the platform, for instance due to copyright issues), leaving a familiar trace of 404 status
code messages. Several scholars (Rosenzweig, 2003; Brügger, 2012 among others) have already
remarked on the great impact that the ephemerality of web materials will have on the sharing and
accessibility of the knowledge produced in the digital age for the next generations of historians. As
has already been said, in contrast to the fact that ‘paper survives benign neglect for a long time’
(Davis, 2014):
The life cycle of most web pages runs its course in a matter of months. In 1997, the average
lifespan of a web page was 44 days; in 2003, it was 100 days. Links go bad even faster. A
2008 analysis of links in 2,700 digital resources – the majority of which had no print
counterpart – found that about 8 percent of links stopped working after one year. By 2011,
when three years had passed, 30 percent of links in the collection were dead. (LaFrance,
2015)
Moreover, while some types of pages disappear more frequently than others (e.g. social media
messages as opposed to official statements on administrative websites), those that do survive tend to
change very frequently (Dougherty et al., 2010). For example, articles in newspapers (Nanni, 2013) as
well as official administrative pages have often been modified without a specific mention (Owen and
Davis, 2008). While initiatives such as the Internet Archive have a long tradition of preserving
born-digital materials for future research, several issues still exist and new issues continue to emerge,
not least due to constant innovations in web technologies. Therefore, researchers have to deal with
the collected materials in a highly critical way, as Brügger (2012) described when he introduced his
definition of web archive documents as reborn-digital materials:
One of the main characteristics of web archiving is that the process of
archiving itself may change what is archived, thus creating something that is
not necessarily identical to what was once online. [...] And, second, that a website may be
updated during the process of archiving, just as technical problems may occur whereby web
elements which were initially online are not archived. Thus, it can be argued that the process
of archiving creates the archived web on the basis of what was once online: the born-digital
web material is reborn in the archive. (Brügger, 2012)
The difficulties in the preservation of digital sources present a new set of issues for historians who
plan to employ them in their work; however, they remain only part of the overall problem. In fact,
already in 2003, Rosenzweig envisioned that future historians will not only deal with a consistent
scarcity of primary sources, but will also be challenged by a never-before-experienced abundance
of records of our past. The indispensable need for computational methods for processing and
retrieving materials from these huge collections of primary sources has been a central topic of
Milligan's publications (2012, 2016). From his works it emerges that, now that the community is
dealing with the abundance of born-digital sources, the use of computational approaches is no longer a
matter of choice for the digital humanities researcher. Therefore, it becomes essential that
researchers adopt these solutions critically, always knowing their potential and limitations, and learn
how to combine them fruitfully with the traditional historical method.
While the consequences of the advent of born-digital sources will be revolutionary for our profession,
so far ‘very little attention has been paid to the new digital media as historical sources’ (Brügger,
2012), highlighting the fact that, while ‘new media is not that new anymore’ (Milligan, 2016) for our
society, they remain a novelty for historians.
The next sections will remark further on this topic by describing two very different case studies that
have dealt with the use of born-digital documents as primary sources for historical research. The first
focuses on examining the online presence of the University of Bologna since
the early Nineties and remarks on the importance of combining the traditional historian’s craft with
approaches from the field of internet studies.
2. Studying the Recent Past of Academic Institutions: A Tale of Scarcity
Multiple historians have considered academic institutions as political, economic and social actors;
they have also argued how their power, role and influence changed over time, especially in relation to
other actors, such as the city, the church and the national government (Brockliss, 1978). In particular, the
comprehensive four-volume book series ‘A History of the University in Europe’, commissioned by the
European University Association, edited by Hilde de Ridder-Symoens and Walter Rüegg and published
between 1992 and 2011, offers an unprecedented overview of how universities have transformed
over centuries: what they have taught and researched, how they have been institutionalized and how
they have interacted with society.
Historians of higher education, who presented their research in the volume, have adopted a large
variety of primary and secondary sources in their works, from university-archive materials such as
matriculation and graduation statistics to academic dissertations, from public reports to large-scale
statistical analyses. Based on these data, researchers have described and drawn conclusions on the
history of universities on a large variety of topics, such as the way universities have managed
resources, the way the admission process has changed before and after 1970, and how sciences and
humanities have been taught and studied.
The current prompt availability of a large variety of born-digital materials such as syllabi (Cohen,
2005), bachelor, master and doctoral theses (Ramage, 2011), academic websites (Holzmann et al.,
2016b) and their hyperlinked structure (Hale et al., 2014) is about to become a new relevant
component of this field of research (Nanni, 2017b).
An emblematic example of the new challenges that born-digital documents will pose to historians of
higher education is a study on reconstructing the recent past of the University of Bologna through its
digital sources (Nanni, 2017a).
The University of Bologna's website (Unibo.it), initially created in 1993, represents a new category of
relevant resource for historians of higher education. The website collects and offers to the reader a
large variety of documents, from descriptions of educational projects to overviews of research
groups, from reports of collaboration with international institutions to information on opportunities
for interaction with the private sector. In addition, it also shows how different departments,
professors and research teams have been adopting the web – especially in its early days. Among the
many relevant examples, one that deserves special mention is that the Astronomy Department of the
university was already sharing preprints of its publications online in 1994 as HTML pages, in an early
attempt to benefit from the potential of the World Wide Web.
Nevertheless, while Unibo.it represents a useful collection of primary sources, the website has been
modified several times during its first twenty years and the majority of the pages that have been
published in the past are not available anymore on the live web. In particular, the transition to the
so-called ‘Portale D’Ateneo’, which started in the early 2000s, required that all department pages change
their structure and adopt a common layout and organization of their content. This has often forced
the creation of brand-new department subdomains and the removal of the previous versions of the
same from the live web. As an additional issue, the team that has managed the website during this
entire transition has not consistently archived the previous versions of the website or documented
their work.
Given the fact that as of 2017 the National Libraries of Florence and Rome are still not part of the
International Internet Preservation Consortium (IIPC) and no coordinated project with the specific
purpose of preserving the national web sphere currently exists in Italy, the Internet Archive remains
the only resource available for recollecting all the materials that are not available on the University of
Bologna website anymore. However, in 2002 a removal request⁴ from the administrative team of
Unibo.it was sent to the Internet Archive, and for this reason Unibo.it was inaccessible through
the Wayback Machine for more than thirteen years. This highly complex situation reflects a new level
of difficulties that future historians will encounter while attempting to collect born-digital sources.
What follows is an overview of the variety of sources and methods that have been used to deal with
this issue and to reconstruct the past of Unibo.it.
Library and Archive Materials
As an initial step of the research, materials available in the university library and archives were
consulted. Among many other documents, a very useful source has been the university yearbook. In
the early ’90s only a few pieces of information regarding the website were mentioned in the
yearbook; nevertheless, this source offered an initial diachronic overview of the official teams that
were managing Unibo.it and was useful for drawing a list of people to interview.
Interviews
In order to capture the rationale and the changing architecture of the website, the different teams
who managed the website were interviewed, together with technicians and researchers who worked
on the development of the pages of various departments, especially during the ’90s. An
interesting finding, presumably highly relevant for future historians, was that many times during the
interviews the subjects used public and private backups of emails in order to recollect the memories
of their experience in working on Unibo.it and to confirm passages of the historical reconstruction.
Newspapers
As already done in previous work (Brügger, 2011), where printed media were used to retrieve
information about the web of the past, information related to Unibo.it and the role of the website for
the University of Bologna has been identified in local and national newspaper archives. During the
’90s, newspapers such as La Repubblica and Il Resto del Carlino published a few short articles covering
the new functionalities on the website (e.g. free email accounts for all students, online fee payments,
etc.). These publications, together with materials collected from the university digital magazines
(Alma2000, AlmaNews, UniboMagazine), offered an additional overview of how the university
decided to promote the website to its audience.
Online Forums
To get a closer look at the everyday use of the website by students and researchers, other materials
have been collected and analyzed, starting from student forums (e.g. UniversiBo) and Usenet
discussions preserved by Google. These documents, especially in the ’90s, present the perspective
and enthusiasm of a rather small but specific subset of the university community, namely students,
researchers and professors in STEM fields, whose departments were among the first ones to offer
access to the web.
Live Web Materials
While the website has been restructured multiple times during its first 20 years online, many
resources are still available on the live web and can reveal the current role of the website in the
university's organization and management (e.g. attracting national and international students and
researchers, promoting collaborations with the private sector, etc.). Additionally, the social media
pages of the institution (such as Facebook, YouTube and Twitter profiles) are becoming key
components of its presence online, showing alternative and more informal ways of interaction with
the users.
Presence of Italian Websites in Other National Web Archives
Aside from the Internet Archive, since 1996 national libraries from all around the world have also
begun to preserve their national web past. PANDORA, started in 1996 by the National Library of
Australia, the UK Web Archive (2004), the Netarkivet (2005) in Denmark and the Portuguese Web
Archive (2011) are just a few examples of this international endeavor. Given the complexity of
defining and preserving what is called a ‘national web-sphere’ (Brügger, 2009), this research also
explored the use of foreign web archives as a proxy for studying Unibo.it. The practice of retrieving
primary sources related to an Italian university website in foreign web archives could seem rather odd,
as the goal of a national web archive is precisely to preserve the web of its country; however, from
time to time part of the non-national web also ends up being preserved, unintentionally, by these
digital archives.
For example, to archive national web spheres in an automatic way, archivists could set up crawlers
with a specific set of starting points and a maximum number of hyperlinks they can follow. A crawler
which is set to go at most ten links away from one of these URLs could also end up crawling
non-national content, as it will systematically follow all the hyperlinks. For this reason, if the University of
Bologna were to organize a Summer School and Aarhus University linked to it from its website, the
University of Bologna website (or at least part of it) would be unintentionally preserved in the Danish
Web Archive.
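The mechanism described above can be sketched as a breadth-first crawl with a hop limit. This is only a minimal illustration, not the configuration of any actual national web archive: the link graph, the URLs and the function name `crawl` are hypothetical, and the limit is set to three hops rather than ten to keep the example small.

```python
from collections import deque

# Hypothetical link graph (page URL -> outgoing hyperlinks), invented for illustration.
LINKS = {
    "https://www.au.dk/": ["https://www.au.dk/events", "https://www.kb.dk/"],
    "https://www.au.dk/events": ["https://www.unibo.it/summer-school"],
    "https://www.unibo.it/summer-school": ["https://www.unibo.it/"],
    "https://www.unibo.it/": ["https://www.unibo.it/astronomy"],
    "https://www.kb.dk/": [],
}

def crawl(seeds, max_depth):
    """Breadth-first crawl that follows every link up to max_depth hops from a seed."""
    seen = set(seeds)
    queue = deque((url, 0) for url in seeds)
    while queue:
        url, depth = queue.popleft()
        if depth == max_depth:
            continue  # hop limit reached: archive the page but do not follow its links
        for target in LINKS.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return seen

# The only seed is a Danish page, yet Italian pages end up in the crawl.
archived = crawl(["https://www.au.dk/"], max_depth=3)
foreign = sorted(url for url in archived if ".dk/" not in url)
print(foreign)
```

With these invented links, two pages of the Italian site fall within three hops of the Danish seed and are collected, while the deeper astronomy page is not. A real crawler would typically apply further scoping rules (for instance on domains), which are deliberately left out here.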
As a part of this work, it has been found that both the Portuguese (Arquivo) and Danish
(Netarkivet) web archives have preserved parts of Unibo.it several times since 2006.
Cloned Versions of the Website
Among the variety of sources available, one deserves a specific mention. In May 2007, a group of
activists decided to create a copy of the Unibo.it web interface, as part of a protest against the
European Credit Transfer and Accumulation System (ECTS) for the evaluation of the number of hours
of study. At the URL http://www.unibologna.eu an identical version of the website was available, with
a description of the reasons for the protest.
This source has been important in this study not only because it documented an innovative way of
conducting a protest against an academic institution (by targeting its website), but also because the
cloned website was preserved by the Internet Archive.
A Critical Combination of Sources and Methods
The combination of traditional archival practices with approaches from the field of internet studies is
essential in the attempt to face this emblematic example of scarcity of born-digital primary sources
and to reconstruct the past of the University of Bologna website. This new methodology for collecting
born-digital evidence has been especially useful in identifying the narrative behind the early years of
Unibo.it, which involves the arrival of a Turkish professor from the United States at the university in
1988, the establishment of the second Italian node to the Internet and the creation of arguably one of
the most relevant university websites of the country.⁵
While the difficulties in reconstructing the recent past of a university website could surprise the
reader, as less than 30 years have passed since its creation, they only represent one part of the
new issues that born-digital sources will pose to future historians.
As has been previously remarked and will be expanded in the next section, future historians will
in fact also be challenged by a never-before-experienced abundance of records of our past. The second
case study presented in this chapter focuses on obtaining small topic-specific collections from
large-scale archives of the web; by presenting the encountered challenges and describing the adopted
solutions, it will remark on the importance of fruitfully combining the traditional historical
method with approaches from the field of natural language processing.
3. Creating Political Event Collections: A Tale of Abundance
The World Wide Web provides the research community with an unprecedented abundance of
primary sources for diachronically tracing, examining and understanding major events and
transformations in our society. For two decades, public and private institutions have preserved these
born-digital materials for future analysis (Gomes and Costa, 2011). However, these collections are
now so large that – in the rare cases when they are fully available for research (Hockx-Yu, 2014) – it is
not feasible for scholars to study political and social phenomena by examining them in their entirety.
If we consider for instance the Internet Archive, during its first twenty years it has preserved almost
500 billion web pages, and as of 2017 it has a collection of around 25 petabytes of data. Since 2001,
this collection has been available for research through a URL search tool on the Wayback Machine.
In the most recent years, information retrieval systems supporting keyword search over the
diachronic layers of web archives have been developed by the research community and employed by
institutions such as the UK Web Archive and – since 2017 – also partially by the Internet Archive. In
addition to this, out-of-the-box tools such as ArchiveSpark (Holzmann et al., 2016a) and Warcbase (Lin
et al., 2017) have been developed by the research community with the specific goal of supporting
scholars in gathering information from large-scale web archive collections.
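To make the difference between URL-based access and keyword search concrete, the following minimal sketch builds the two kinds of address a researcher needs for URL-based lookup. It assumes the Wayback Machine's publicly documented replay pattern (`web.archive.org/web/<timestamp>/<url>`) and availability endpoint (`archive.org/wayback/available`); the helper names are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode

def wayback_replay_url(url, timestamp):
    """Address of the capture closest to timestamp (YYYYMMDD or YYYYMMDDhhmmss)."""
    return f"https://web.archive.org/web/{timestamp}/{url}"

def wayback_availability_query(url, timestamp):
    """Query asking whether a capture of url exists near the given timestamp."""
    params = urlencode({"url": url, "timestamp": timestamp})
    return "https://archive.org/wayback/available?" + params

# URL-based access presupposes that the researcher already knows the address.
print(wayback_replay_url("http://www.unibologna.eu", "20070501"))
print(wayback_availability_query("unibo.it", "20070501"))
```

Both helpers require the exact address of the page in advance, which is precisely why URL search alone is of limited use for discovering previously unknown sources about an event.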
One of the main endeavors of web archive institutions for fostering the use of these new resources is
to offer manually curated sub-collections regarding recent socio-political events. On Archive-It – a
subscription web archiving service provided by the Internet Archive – a few collections regarding
large-scale events such as the Boston Marathon bombing, the Black Lives Matter movement and the
Charlie Hebdo terrorist attack are available. The collections are curated by ‘the Archive-It team in
conjunction with curators and subject matter experts from institutions around the world’.
In addition to manual selection, another solution employed by digital archivists for creating and
sharing these event collections is to adopt a filtering approach that presents to the user only those
documents that mention the name of the event. This type of approach is common in event-harvesting
from Twitter, where researchers collect all tweets that – for example – mention the hashtag of the
event.
While both collecting documents from web archives through manual selection and retrieving
materials through name-filtering have already proved their usefulness in supporting researchers in
the humanities and social sciences (e.g., Small, 2011), they have a few crucial limitations. On the one
hand, manual selection is obviously a painstakingly long process – given the previously mentioned
difficulties of retrieving information from web archives. On the other hand, collecting documents
using the event-name heuristic presents the crucial limitation of often missing information on
background stories as well as premises of the examined events. To give a specific example, let us
imagine that the goal is to collect primary sources regarding the 2004 Ukraine Orange Revolution. If
the adopted method only retrieves documents that mention the name of the event, it will not collect
materials that connect the premises of the revolution to the previous controversial presidential
election in the country. The same issue will emerge when studying the first free Algerian elections
since independence (1990), which are a premise of the Algerian civil war, or even when
investigating the economic crisis behind Fujimori's auto-golpe in Peru, 1992. In this last case, the
documents that discuss the adoption of austerity measures will not be part of the collection.
Moreover, the name used for referring to an event might change over time or vary between countries
and languages: for example, one of the early hashtags used for the 2011 Egyptian Revolution was
#jan25, referring to the day it started.
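The limitation of the event-name heuristic can be seen in a few lines of code. The sketch below is illustrative only: the three one-sentence "documents" are invented for the example, and `name_filter` is a hypothetical helper implementing plain case-insensitive substring matching.

```python
# Three invented one-sentence documents standing in for archived web pages.
docs = {
    "d1": "Thousands join the Orange Revolution protests in Kyiv.",
    "d2": "Observers dispute the results of the Ukrainian presidential election run-off.",
    "d3": "Yulia Tymoshenko addresses demonstrators on Independence Square.",
}

def name_filter(documents, event_name):
    """Keep only the documents whose text contains the event name."""
    needle = event_name.lower()
    return [doc_id for doc_id, text in documents.items() if needle in text.lower()]

hits = name_filter(docs, "Orange Revolution")
missed = [doc_id for doc_id in docs if doc_id not in hits]
print(hits, missed)
```

Only the first document is retrieved; the pieces on the disputed election and on one of the protagonists, which a historian would consider part of the same story, are silently left out.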
The second case study presented in this chapter is an interdisciplinary project between computer
science and political history focused on building more comprehensive sub-collections regarding
events such as elections, protests and political crises from large-scale web archives. As part of this
research, a system that employs natural language processing methods and information retrieval
approaches has been developed, which is able to gather and organize a highly comprehensive
collection of sources describing a specific event (Nanni et al., 2017). The developed approach is
inspired by the fact that, when historians are conducting the same task manually (i.e., identifying
relevant materials across an entire archive), they do not necessarily search only for documents that
mention the name of the event. Historians will also try to collect those documents that talk
about related aspects which provide the context, involving for example some of the participants in
the event, but not others. If we consider the previous example regarding the Orange Revolution,
historians will also be interested in materials from the same period of time discussing the political
career of Yulia Tymoshenko or addressing the state of the political relations between Ukraine, Russia
and the European Union.
Identifying Related Concepts and Entities
In order to achieve this goal in an automated fashion, the first step is to be able to identify a set of
concepts and entities that are relevant to an event. To do so, DBpedia (Auer et al., 2007) has been
employed. This is a large-scale knowledge base extracted from Wikipedia, where events (such as the
Orange Revolution) are represented by nodes and connected through edges (i.e., hyperlinks in
Wikipedia) to other related entities.
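The idea of reading candidate entities off a knowledge graph can be sketched with a toy example. The graph below is hand-made for illustration and is not actual DBpedia data; the function `related_entities` is a hypothetical helper that simply collects every node linked to or from the event node.

```python
# Hand-made toy graph (node -> nodes it links to); not real DBpedia content.
GRAPH = {
    "Orange_Revolution": ["Yulia_Tymoshenko", "Viktor_Yushchenko", "Ukraine"],
    "Yulia_Tymoshenko": ["Orange_Revolution", "Ukraine"],
    "European_Union": ["Orange_Revolution"],
    "Ukraine": ["Kyiv"],
}

def related_entities(graph, event):
    """Return all nodes directly connected to the event, in either direction."""
    outgoing = set(graph.get(event, []))
    incoming = {node for node, targets in graph.items() if event in targets}
    return sorted((outgoing | incoming) - {event})

candidates = related_entities(GRAPH, "Orange_Revolution")
print(candidates)
```

In this toy graph the candidate set contains both central actors and loosely connected entities, which is why a subsequent relevance-scoring step is needed.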
Retrieving Contextual Passages
For each collected entity and concept, a textual passage presenting it in the context of the event was
also extracted from Wikipedia (for example: ‘Yulia Tymoshenko co-led the Orange Revolution and was
the first woman appointed Prime Minister of Ukraine’). This is an optimal solution for finding
other terms that could be useful to identify relevant documents.
Ranking Concepts and Entities
Having obtained an initial set of potentially relevant concepts and entities, the goal is to score each of
them on how relevant they are to the event. For example, while Yulia Tymoshenko is highly relevant
for the Orange Revolution, the European Union played only a marginal role in the event. Different
approaches for ranking entities and concepts for relevance were tested and the best performing
solution was to compute distances between entities and the event employing out-of-the-box RDF
vector-representations (Ristoski and Paulheim, 2016).
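Ranking by vector distance can be illustrated as follows. This is a sketch under stated assumptions: the three-dimensional vectors are invented toy values rather than real RDF embeddings, and cosine similarity is used as one common choice of measure; the chapter does not specify which distance the original system relied on.

```python
import math

# Invented toy vectors; real RDF embeddings have hundreds of dimensions.
VECTORS = {
    "Orange_Revolution": [0.9, 0.8, 0.1],
    "Yulia_Tymoshenko": [0.8, 0.9, 0.2],
    "Ukraine": [0.6, 0.5, 0.5],
    "European_Union": [0.1, 0.3, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_relevance(event, candidates):
    """Order candidate entities from most to least similar to the event vector."""
    return sorted(candidates,
                  key=lambda name: cosine(VECTORS[event], VECTORS[name]),
                  reverse=True)

ranking = rank_by_relevance(
    "Orange_Revolution", ["European_Union", "Ukraine", "Yulia_Tymoshenko"])
print(ranking)
```

With these toy values the ordering mirrors the example in the text: Yulia Tymoshenko comes first and the European Union last.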
Finding Mentions in Text
With the ranked set of entities and concepts in hand, other documents mentioning them in relevant
contexts were retrieved from the web archive. In order to go beyond simple string-matching of
concepts that are considered relevant (e.g., ‘protests’, ‘revolution’, ‘crisis’, ‘election’),
word-embedding representations (Mikolov et al., 2013) have been adopted. Embedding techniques
represent each word, entity or concept (e.g., ‘protest’) as a numeric vector of n dimensions. This
makes it possible to measure similarity across different words and to collect relevant materials even if
they talk about ‘demonstration’ or ‘crisis’, instead for example of mentioning ‘protest’ or ‘revolution’.
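How embeddings let retrieval go beyond string matching can be sketched as below. The two-dimensional vectors are invented for the example (trained embeddings would come from a model such as word2vec), and the function name `matches` and the 0.9 threshold are hypothetical choices.

```python
import math

# Invented two-dimensional vectors standing in for trained word embeddings.
EMBEDDINGS = {
    "protest": [0.9, 0.1],
    "demonstration": [0.85, 0.2],
    "revolution": [0.7, 0.4],
    "recipe": [0.05, 0.95],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def matches(text, seed, threshold=0.9):
    """True if any known word in the text is close enough to the seed concept."""
    tokens = text.lower().split()
    return any(
        token in EMBEDDINGS and cosine(EMBEDDINGS[token], EMBEDDINGS[seed]) >= threshold
        for token in tokens
    )

print(matches("A large demonstration filled the square", "protest"))
print(matches("A recipe for borscht", "protest"))
```

The first sentence never uses the word ‘protest’, yet it is retrieved because ‘demonstration’ lies close to it in the vector space; the second sentence is correctly ignored.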
Final Collection Building
It could happen that documents mention relevant entities and concepts out of context, for example
as part of a comparison: ‘The popular opposition to Ethiopia's current corrupt regime is comparable
to the Orange Revolution in Ukraine.’ In order to filter them out and select only the documents that
should be included in the event collection, a machine learning approach known as Learning to Rank (Liu,
2009) has been employed, which, given an initial set of relevant and not relevant documents, learns
how to abstract this property and to automate the ranking process.
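The principle of learning a ranking from labelled examples can be sketched with a pairwise perceptron, one of the simplest members of the Learning to Rank family. This is a stand-in chosen for brevity, not the model used in the original system; the three features (event name mentioned, number of ranked entities mentioned, presence of a comparison cue) and all values are hypothetical.

```python
def dot(weights, features):
    """Linear relevance score of a document."""
    return sum(w * x for w, x in zip(weights, features))

def train_pairwise(relevant, irrelevant, epochs=10):
    """Pairwise perceptron: push every relevant document above every irrelevant one."""
    weights = [0.0] * len(relevant[0])
    for _ in range(epochs):
        for pos in relevant:
            for neg in irrelevant:
                diff = [p - n for p, n in zip(pos, neg)]
                if dot(weights, diff) <= 0:  # pair is ordered wrongly (or tied)
                    weights = [w + d for w, d in zip(weights, diff)]
    return weights

# Hypothetical features: [mentions event name, ranked entities mentioned, comparison cue]
relevant = [[1, 3, 0], [0, 2, 0]]
irrelevant = [[1, 0, 1], [0, 0, 0]]
weights = train_pairwise(relevant, irrelevant)

candidates = {
    "comparison_article": [1, 1, 1],  # names the event, but only as a comparison
    "context_article": [0, 3, 0],     # never names the event, rich in related entities
}
ranked = sorted(candidates, key=lambda name: dot(weights, candidates[name]), reverse=True)
print(weights, ranked)
```

On this toy data the learned weights reward mentions of related entities and penalise comparison cues, so the contextual article is ranked above the one that merely cites the event in passing.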
A Critical Combination of Sources and Methods
The combination of traditional practices of historical research with methodologies and approaches
from the fields of natural language processing and information retrieval is essential for facing the
large abundance of born-digital primary sources. Some of the approaches presented in this chapter
have already been adopted in political science research. One of these first studies focuses on
retrieving documents which referred to political events (e.g., elections) from institutional web
collections of the United States government in order to define a new measure of ‘attention’ of the
U.S. Congress and the President to democratization and electoral practices in other countries, from
Zimbabwe to Haiti and Egypt (Elshehawy et al., 2017). By doing so, this initial work highlights both the
potential and challenges of using born-digital documents and computational methods for obtaining
new insights on the recent political past.
The two case studies presented in this chapter reveal the importance of adopting a highly
interdisciplinary approach when dealing with born-digital sources; methodologies from the field of
internet studies could support historians in reconstructing lost web pages, while natural language
processing methods could guide them in retrieving documents from large-scale web archives. The
final part of this chapter will remark further on this, by discussing the importance of offering this
interdisciplinary preparation to future historians in their educational programs.
4.Conclusion:ANewGenerationofHistorians
In recent years, researchers have argued that history, like other humanities disciplines, is reaching a
turning point in its methodology (Scheinfeldt, 2012; Graham, Milligan and Weingart, 2015; Nelson,
2016): sustained by the efforts of many digitization projects, the community has been employing
computational methods in order to examine these vast resources and obtain new insights. This
change in methodology has reopened a long-term debate regarding the ways textual evidence of the
past can and should be properly interpreted.
While for the historical profession it is of course beneficial to constantly debate and criticize the
validity of established practices of acquiring knowledge from sources, it is argued in this chapter that
the adoption of digitized datasets and computational methods cannot be considered, by itself, the
triggering factor of a fundamental turning point in our profession. In fact, adopting (or not) large-scale
datasets of digitized sources, together with computational methods, will always remain a choice for
the history scholar: Charles Darwin can still be studied without conducting text mining over the
collections presented on Darwin Online, just as eighteenth-century London can be examined
without distant reading the Proceedings of the Old Bailey Online.
However, it is also argued that history is in fact about to face a paradigm-shifting transition in its
methods, but the triggering cause of this transition lies in the born-digital nature of the large
majority of sources produced by contemporary societies. This change affects any type of document
we create and consume in our everyday life, from bureaucratic forms collected by the public sector to
newspaper articles to political mail correspondence to university websites, and it is about to
present its multifarious consequences for historical research.
Born-digital sources are significantly more complex to archive, collect, analyze and select compared to
traditional materials. Websites (such as Unibo.it) are large and variegated collections of documents,
which are often not preserved in their entirety by web archive initiatives and can be re-constructed
only through the meticulous combination of various pieces of information from different sources.
When a resource, such as the institutional website of an administration, is finally re-created, it is often
so vast that computational technologies (i.e., natural language processing methods and information
retrieval approaches) are necessary for identifying and retrieving specific documents.
The methodological steps overviewed in this chapter for collecting, analyzing and selecting
born-digital documents require strong interdisciplinary competences and a highly critical attitude towards
sources and methods. In this complex scenario, this chapter concludes by raising a very pressing
question: how can the new generations of historians be prepared to face these new challenges?
In recent years, the digital history community has already offered many educational activities on
computational methods to its students. From workshops to panels, from courses to summer schools,
from tutorials to hackathons, these initiatives have almost always been focused on presenting the
potential of new resources, tools and platforms to history students, following an attitude which
has been branded as ‘more hack, less yack’ (Nowviskie, 2014). While offering hands-on experiences
with computational tools is important in order to introduce history students to the digital humanities,
a critical approach is strongly needed in order to properly deal with born-digital sources and
computational methods.
For this reason, it is essential that students are first of all guided in shaping their research topics
and receive, early on in their studies, the preparation necessary to support a critical analysis of the
born-digital documents and computational methods at their disposal. This will be imperative for a
generation of historians who will be able to go beyond an unquestioned adoption of the new sources
and tools at their disposal and will instead critically employ them, in search of new historical
perspectives.
References
Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., & Ives, Z. (2007) ‘DBpedia: A Nucleus for a Web
of Open Data’, Proceedings of the 6th International and 2nd Asian Conference on Semantic Web: 722-735.
Barthes, R. (1967) ‘Discourse on History’, Social Science Information, 6(4): 65-75.
Blevins, C. (2014) ‘Space, Nation, and the Triumph of Region: A View of the World from Houston’, The
Journal of American History, 101(1): 122-147.
Bloch, M. (1949) Apologie pour l'histoire, ou, Métier d'historien, Armand Colin, Paris.
Brockliss, L. W. (1978) ‘Patterns of Attendance at the University of Paris, 1400–1800’, The Historical
Journal, 21(3): 503-544.
Brügger, N. (2005) ‘Archiving Websites: General Considerations and Strategies’, The Centre for
Internet Research, Aarhus.
Brügger, N. (2009) ‘Website History and the Website as an Object of Study’, New Media & Society,
11(1-2): 115-132.
Brügger, N. (2011) ‘Web Archiving – between Past, Present, and Future’, in M. Consalvo and C. Ess
(eds), The Handbook of Internet Studies, Wiley-Blackwell, Oxford.
Brügger, N. (2012) ‘When the Present Web is Later the Past: Web Historiography, Digital History, and
Internet Studies’, Historical Social Research / Historische Sozialforschung, 37(4): 102-117.
Burke, P. (2008) What is Cultural History?, Polity, Cambridge (UK).
Bush, V. (1945) ‘As We May Think’, The Atlantic Monthly, 176(1): 101-108.
Cohen, D. J., & Rosenzweig, R. (2005) Digital History: A Guide to Gathering, Preserving, and Presenting
the Past on the Web, University of Pennsylvania Press.
Cohen, D. J. (2005) ‘By the book: Assessing the Place of Textbooks in US Survey Courses’, The Journal
of American History, 91(4): 1405-1415.
Cohen, D. J., Frisch, M., Gallagher, P., Mintz, S., Sword, K., Taylor, A. M., & Turkel, W. J. (2008)
‘Interchange: The Promise of Digital History’, The Journal of American History, 95(2): 452-491.
Daumard, A., & Furet, F. (1959) ‘Méthodes de l'histoire sociale: les archives notariales et la
mécanographie’, Annales. Histoire, Sciences Sociales, 14(4): 676-693.
Davis, C. (2014) ‘Archiving the Web: A Case Study from the University of Victoria’, code{4}lib Journal,
26 (http://journal.code4lib.org/articles/10015)
Derrida, J. (1967) Of Grammatology, Les Éditions de Minuit, Paris.
Dougherty, M., Meyer, E. T., Madsen, C. M., Van den Heuvel, C., Thomas, A., & Wyatt, S. (2010)
‘Researcher Engagement with Web Archives: State of the Art’, Preprint on SSRN
(https://ssrn.com/abstract=1714997)
Elshehawy, A., Marinov, N., & Nanni, F. (2017) ‘Quantifying Attention to Foreign Elections with Text
Analysis of US Congress and the Presidency’, Preprint on SSRN (https://ssrn.com/abstract=2981486)
Evans, R. J. (2001) In Defence of History, Granta Books, London.
Fogel, R. W., & Engerman, S. L. (1974) Time on the Cross, University Press of America, Lanham,
Maryland.
Gomes, D., Miranda, J., & Costa, M. (2011) ‘A Survey on Web Archiving Initiatives’, Proceedings of the
15th international conference on Theory and practice of digital libraries: 408-420.
Graham, S., Milligan, I., & Weingart, S. (2015) Exploring Big Historical Data: The Historian's
Macroscope, Imperial College Press, London.
Greif, A. (1997) ‘Cliometrics After 40 Years’, The American Economic Review, 87(2): 400-403.
Hale, S. A., Yasseri, T., Cowls, J., Meyer, E. T., Schroeder, R., & Margetts, H. (2014) ‘Mapping the UK
Webspace: Fifteen Years of British Universities on the Web’, Proceedings of the 2014 ACM Conference
on Web Science: 62-70.
Hockx-Yu, H. (2014) ‘Access and Scholarly Use of Web Archives’, Alexandria: The Journal of National
and International Library and Information Issues, 25(1-2): 113-127.
Holzmann, H., Goel, V., & Anand, A. (2016a) ‘Archivespark: Efficient Web Archive Access, Extraction
and Derivation’, Proceedings of the 2016 IEEE/ACM Joint Conference on Digital Libraries (JCDL): 83-92.
Holzmann, H., Nejdl, W., & Anand, A. (2016b) ‘The Dawn of Today's Popular Domains: A Study of the
Archived German Web Over 18 Years’, Proceedings of the 2016 IEEE/ACM Joint Conference on Digital
Libraries (JCDL): 73-82.
Iggers, G. G. (2005) Historiography in the Twentieth Century: From Scientific Objectivity to the
Postmodern Challenge, Wesleyan University Press, Middletown (CT).
Knowles, A. K., & Hillier, A. (eds) (2008) Placing History: How Maps, Spatial Data, and GIS are
Changing Historical Scholarship, ESRI, New York.
LaFrance, A. (2015) ‘Raiders of the Lost Web’, The Atlantic, 14
(https://www.theatlantic.com/technology/archive/2015/10/raiders-of-the-lost-web/409210/)
Lin, J., Milligan, I., Wiebe, J., & Zhou, A. (2017) ‘Warcbase: Scalable Analytics Infrastructure for
Exploring Web Archives’, Journal on Computing and Cultural Heritage (JOCCH), 10(4): 22.
Liu, T. Y. (2009) ‘Learning to Rank for Information Retrieval’, Foundations and Trends in Information
Retrieval, 3(3): 225-331.
Lyotard, J. F. (1979) The Postmodern Condition: A Report on Knowledge, Minuit, Paris.
Munslow, A. (2006) Deconstructing History, Routledge, New York.
Milligan, I. (2012) ‘Mining the ‘Internet Graveyard’: Rethinking the Historians’ Toolkit’, Journal of the
Canadian Historical Association / Revue de la Société historique du Canada, 23(2): 21-64.
Milligan, I. (2016) ‘Lost in the Infinite Archive: The Promise and Pitfalls of Web Archives’, International
Journal of Humanities and Arts Computing, 10(1): 78-94.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013) ‘Distributed Representations of
Words and Phrases and their Compositionality’, Proceedings of the 26th International Conference on
Neural Information Processing Systems: 3111-3119.
Moretti, F. (2013) Distant Reading, Verso Books, London.
Nanni, F. (2013) ‘L’archiviazione delle pagine dei quotidiani online’, Diacronie. Studi di Storia
Contemporanea, 15(3) (http://www.studistorici.com/wp-content/uploads/2013/10/02_NANNI.pdf)
Nanni, F. (2017a) ‘Reconstructing a Website's Lost Past: Methodological Issues Concerning the History
of www.unibo.it’, Digital Humanities Quarterly, 11(2)
(http://www.digitalhumanities.org/dhq/vol/11/2/000292/000292.html)
Nanni, F. (2017b) ‘The Web as a Historical Corpus: Collecting, Analysing and Selecting Sources on the
Recent Past of Academic Institutions’, Ph.D. Dissertation, University of Bologna.
Nanni, F., Ponzetto, S. P., & Dietz, L. (2017) ‘Building Entity-Centric Event Collections’, Proceedings of
2017 IEEE/ACM Joint Conference on Digital Libraries (JCDL): 199-209.
Nelson, R. K. (2016) ‘Digital Humanities as Appendix’, American Quarterly, 68(1): 131-136.
Noiret, S. (2015) ‘Digital Public History: Bringing the Public Back In’, Public History Weekly, 3(13)
(http://hdl.handle.net/1814/38393).
Nowviskie, B. (2014) ‘On the Origin of “Hack” and “Yack”’, in M. K. Gold and L. F. Klein (eds) Debates
in Digital Humanities (2nd edn), University of Minnesota Press
(http://dhdebates.gc.cuny.edu/debates/text/58)
Owen, D., & Davis, R. (2008) ‘Presidential Communication in the Internet Era’, Presidential Studies
Quarterly, 38(4): 658-673.
Ramage, D. R. (2011) ‘Studying People, Organizations, and the Web with Statistical Text Models’,
Ph.D. Dissertation, Stanford University.
Ristoski, P., & Paulheim, H. (2016) ‘RDF2vec: RDF Graph Embeddings for Data Mining’, Proceedings of
the 2016 International Semantic Web Conference: 498-514.
Robertson, S. (2016) ‘The Differences Between Digital History and Digital Humanities’, in M. K. Gold
and L. F. Klein (eds) Debates in Digital Humanities (2nd edn), University of Minnesota Press
(http://dhdebates.gc.cuny.edu/debates/text/76).
Rosenzweig, R. (2003) ‘Scarcity or Abundance? Preserving the Past in a Digital Era’, The American
Historical Review, 108(3): 735-762.
Rothman, J. (2014) ‘An Attempt to Discover the Laws of Literature’, The New Yorker.
Rüegg, W., & de Ridder-Symoens, H. (eds) (1992) A History of the University in Europe, Cambridge
University Press, Cambridge.
Scheinfeldt, T. (2012) ‘Sunset for Ideology, Sunrise for Methodology’, in M. K. Gold and L. F. Klein
(eds) Debates in Digital Humanities (1st edn), University of Minnesota Press: 124-127.
Schreibman, S., Siemens, R., & Unsworth, J. (eds) (2004) A Companion to Digital Humanities, Blackwell
Publishing, Oxford.
Shafer, R. J. (1974) A Guide to Historical Method, Dorsey Press, Belmont (CA).
Small, T. A. (2011) ‘What the Hashtag? A Content Analysis of Canadian Politics on Twitter’,
Information, Communication & Society, 14(6): 872-895.
Thaller, M. (1991) ‘The Historical Workstation Project’, Computers and the Humanities, 25(2): 149-162.
Thomas III, W. G. (2004) ‘Computing and the Historical Imagination’, in Schreibman, S., Siemens, R., &
Unsworth, J. (eds) A Companion to Digital Humanities, Blackwell Publishing, Oxford: 56-68.
Wilkens, M. (2013) ‘The Geographic Imagination of Civil War-Era American Fiction’, American Literary
History, 25(4): 803-840.
1 It is also important to acknowledge that reactions to postmodern approaches are present as well in the historiographic debate
(see for example Evans, 2001).
2 See for example the adoption of social science methodologies in historical research in Fogel and Engerman (1974).
3 However, the relationship between history and computing on the one side and literary and linguistic computing on the other
side has always been complicated (see for example Robertson, 2016).
4 As described in the FAQ section of the Internet Archive, a website owner can request that crawling or archiving of a site be stopped, and
the Internet Archive will endeavor to comply with the request. This will be signaled by a ‘blocked site error’ message such as ‘This URL has
been excluded from the Wayback Machine’.
5 In 2001 the University of Bologna website won the ‘WWW’ prize from the Italian economic newspaper Il Sole 24 Ore for the
best website in the category ‘School, university and research’. Then, for three consecutive years (2005-2007) Unibo.it received
the ‘Osc@r del web’ prize as the best Italian public administration website. In 2007 Luigi Nicolais, the Italian Minister of Public
Administration, was also present to confer the prize.