2011.How to Compare One Million Images


7/27/2019 2011.How to Compare One Million Images

http://slidepdf.com/reader/full/2011how-to-compare-one-million-images 1/40


How to Compare One Million Images?

Dr. Lev Manovich,

Professor, Visual Arts Department, University of California, San Diego (UCSD).

Director, Software Studies Initiative (softwarestudies.com) at California Institute for Telecommunication and Information (Calit2).

Dr. Jeremy Douglass,

Post-doctoral Researcher, Software Studies Initiative.

Tara Zepel,

Ph.D. candidate, Art History, Theory, & Criticism, UCSD.

Exploring one million manga pages on the 287-megapixel HIPerSpace (The Highly Interactive Parallelized Display Space) at Calit2, San Diego. HIPerSpace offers a resolution of 35,840 x 8,000 pixels (287 megapixels) on a display wall 31.8 feet wide and 7.5 feet tall, made from 70 30-inch monitors.


INTRODUCTION

The description of the joint NEH/NSF Digging into Data competition (2009), organized by the Office of Digital Humanities at the National Endowment for the Humanities (the U.S. federal agency which funds humanities research), opened with these questions: “How does the notion of scale affect humanities and social science research? Now that scholars have access to huge repositories of digitized data – far more than they could read in a lifetime – what does that mean for research?” A year later, an article in the New York Times (November 16, 2010) stated: “The next big idea in language, history and the arts? Data.”

While digitized archives of historical documents certainly represent a jump in scale in comparison to the traditionally small corpora used by humanists, researchers and critics interested in contemporary culture face an even larger challenge. With the notable exception of Google Books, the size of digital historical archives pales in comparison to the quantity of digital media created by contemporary cultural producers and users – designs, motion graphics, websites, blogs, YouTube videos, Flickr photos, Facebook postings, Twitter messages, and other kinds of professional and participatory media. This quantitative change is at least as important as the other fundamental effects of the political, technological and social processes that started after the end of the Cold War (for instance, free long-distance multimedia communication). In an earlier article I described this in the following way:

The exponential growth of a number of both non-professional and professional media producers over the last decade has created a fundamentally new cultural situation and a challenge to our normal ways of tracking and studying culture. Hundreds of millions of people are routinely creating and sharing cultural content – blogs, photos, videos, online comments and discussions, etc. At the same time, the rapid growth of professional educational and cultural institutions in many newly globalized countries along with the instant availability of cultural news over the web and ubiquity of media and design software has also dramatically increased the number of culture professionals who participate in global cultural production and discussions. Before, cultural theorists and historians could generate theories and histories based on small data sets (for instance, “Italian Renaissance,” “classical Hollywood cinema,” “post-modernism,” etc.) But how can we track “global digital cultures,” with their billions of cultural objects, and hundreds of millions of contributors? Before, you could write about culture by following what was going on in a small number of world capitals and schools. But how can we follow the developments in tens of thousands of cities and educational institutions? (Manovich, Cultural Analytics, 2009).

While the availability of large digitized collections of humanities data certainly makes the case for humanists to use computational tools, the rise of social media and the globalization of professional culture leave us no other choice. But how can we explore patterns and relations between sets of photographs, designs, or video, which may number in the hundreds of thousands, millions, or billions? (By summer 2010, Facebook contained 48 billion photos; deviantArt.com, the leading site for non-professional art, housed 100 million submissions; coroflot.com, a site used by professional designers, had 200,000 portfolios.)

In 2007 we set up a new lab called Software Studies Initiative (http://www.softwarestudies.com) at the University of California, San Diego (UCSD) and the California Institute for Telecommunication and Information (Calit2) to address these challenges. We developed a number of methods and techniques for the analysis and visualization of large sets of images, video, and interactive visual media. This article describes our key method, which consists of two parts: 1) automatic digital image analysis that generates numerical descriptions of various visual characteristics of the images; 2) visualizations that show the complete image set organized by these dimensions.


We have already successfully applied this method to research many types of visual media: comics, web comics, video games, virtual worlds, video, films, cartoons, motion graphics, print magazines, paintings, and photographs. Examples include 167,000 images from the Art Now Flickr group, 100 hours of Kingdom Hearts video game play, and 20,000 pages of Science and Popular Science magazines (1872–1922). (For more details and other projects, see the Digital Humanities section of our lab blog http://www.softwarestudies.com; you can also find our visualizations on YouTube and Flickr at http://www.flickr.com/photos/culturevis/collections/.)

To test how this method would work with larger data sets, in the Fall of 2009 we downloaded the complete runs of 883 different manga series from OneManga.com, the most popular website for “scanlations.” (“Scanlation” refers to manga publications which are digitized and translated by fans.) We obtained all pages available for these series, along with the user-assigned tags indicating their genres and intended audiences (see Douglass, Huber, Manovich, Understanding Scanlation).

The resulting data set contains 1,074,790 manga pages. Each page is in the form of a JPEG image; the average image resolution is 850 x 1150 pixels. We used our custom digital image processing software to measure a number of visual features of every page (shading, presence of fine details, texture properties) using supercomputers at the National Energy Research Scientific Computing Center (NERSC). (A “feature” is the common term in digital image processing; it refers to a numerical description of some image property, such as the average grayscale value or the number of lines. For readers familiar with computational text analysis: image features are structurally similar to text features; they offer a compact description of the data.)
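To make the notion of a feature concrete, here is a minimal sketch in Python (standard library only) that reduces a grayscale page, given as rows of 0–255 values, to three numbers. The feature set, including the crude “fine detail” score, is our own hypothetical illustration; it is not the lab's custom software, whose features are named but not specified here.

```python
from statistics import mean, pstdev


def page_features(pixels):
    """Reduce a grayscale page (rows of 0-255 values) to a few numbers.

    Hypothetical feature set, for illustration only:
      mean   - average gray level (0 = black, 255 = white)
      std    - spread of gray levels around that average
      detail - mean absolute difference between horizontally adjacent
               pixels, a crude proxy for fine detail and texture
    Rows must contain at least two pixels.
    """
    flat = [p for row in pixels for p in row]
    detail = mean(abs(row[i + 1] - row[i])
                  for row in pixels
                  for i in range(len(row) - 1))
    return {"mean": mean(flat), "std": pstdev(flat), "detail": detail}


# A tiny 2 x 4 "page": left half white, right half black.
page = [[255, 255, 0, 0],
        [255, 255, 0, 0]]
features = page_features(page)
print(features)  # mean 127.5, std 127.5, detail 85
```

Applied to an average 850 x 1150 page, the same function turns 977,500 pixel values into three numbers; that compactness is what makes comparing a million pages practical.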

In this article we use the challenge of working with this set of one million manga pages to motivate the need for a computational approach to the exploration of collections of such size, and to explain our particular method, which combines digital image analysis and a novel visualization technique.

Any reflection on culture begins with an informal comparison between multiple objects in order to understand their similarities and differences. For instance, if we want our analysis to properly reflect the range of graphical techniques used today by manga artists across thousands of manga books, millions of pages in these books, and tens of millions of individual panels, we need to be able to examine details of individual images and to find patterns of difference and similarity across large numbers of images. To do this, we need a mechanism that would allow us to precisely compare sets of images of any size – from a few dozen to millions. We discuss how our method, which combines automatic digital image analysis and media visualization, addresses these requirements.

HOW TO COMPARE ONE MILLION IMAGES?

Today, a typical publication in the humanities is based on the detailed examination of a small number of artifacts (which, depending on the field, can be literary texts, TV programs, films, video games, etc.). Of course, this does not mean that the author considered these artifacts only in isolation. Usually the detailed analysis of these artifacts is performed against a larger horizon – the knowledge of the wider cultural field(s), which is acquired both directly (for instance, watching films) and indirectly (for instance, reading publications in film studies). But how reliable or complete is this background knowledge? For instance, to continue with the example of cinema, IMDb (www.imdb.com) contains information on over half a million films produced since the beginnings of cinema; how many of them were seen by academic film scholars and film critics? (The same database lists 1,137,074 TV episodes as of summer, 2001; see IMDb Database Statistics.)

The fact that using tiny samples has been the default method of the humanities until now does not mean that we should keep using it by default. If Google can analyze billions of web pages and over a trillion links several times each day, we should be able to do better than simply consider a handful of artifacts and generalize from them – even if we don't have the same resources. The key reason for the huge infrastructure maintained by Google is the need to index the web in real time; in the case of cultural analysis, we don't have the same requirements, so we should be able to analyze large cultural data sets with much smaller computer resources. Indeed, today computer scientists who study online social media typically capture and analyze dozens or hundreds of millions of separate objects (photos, Twitter updates, blog posts, etc.) with very small resources. (See Cha, Kwak, Rodriguez, Ahn, Moon, I Tube, You Tube, Everybody Tubes; Crandall, Backstrom, Huttenlocher, Kleinberg, Mapping the World's Photos; Kwak, Lee, Park, Moon, What is Twitter?)

Having at our disposal very large cultural data sets which can be analyzed automatically with computers and explored using interactive interfaces and visualization opens up exciting new possibilities. These possibilities can potentially transform our understanding of cultures in many important ways. Instead of being fuzzy and blurred, our horizon (knowledge of a cultural field as a whole) can become razor sharp and at the same time acquire a new depth (being able to sort and cluster millions of artifacts along multiple dimensions). This would enrich our understanding of any single artifact because we would see it in relation to precisely delineated larger patterns. It would also allow us to make more confident statements about the field at large. Perhaps most importantly, it will erase the distinction between the precision of “close reading” and the imprecision of a “zoomed out” view – between a detailed understanding of a few works and the very approximate ideas about the field as a whole which we normally form by mentally interpolating between the small number of facts and artifacts we studied. It will also erase the separation between “close reading” (detailed analysis of small parts of texts) and Franco Moretti's “distant reading” (as it is commonly understood, which is not the same as how Moretti defines it; see his Conjectures on World Literature, 2000): the analysis of large-scale patterns in the development of entire genres, the literary production of whole countries, and the like, using a whole novel as the atom of analysis (for instance, counting the number of novels in different genres published over a historical period). Rather than choosing one scale of analysis, we would be able to easily traverse between all of them at will, observing patterns at any scale.

Any automatic computational analysis of large samples of human cultures will have many limitations of its own, and therefore it will not replace human intuition and experience. However, while we should keep these various limitations in mind, the opportunities that it offers are still immense. For example, having access to a million manga pages should allow us, in principle, to reliably map the full spectrum of graphical possibilities used by contemporary commercial Japanese manga artists. Such a mapping would also allow us to understand which manga series are most typical and which are most unique stylistically; to find all series where the graphical language significantly changes over time (today all top series have been running for a number of years); to investigate whether shorter and longer series have different patterns; to separate the artists who significantly vary their graphical languages from series to series from the artists who do not; etc. We would also be able to take any hypothesis or observation we may make while informally looking through a small sample of images – for instance, we may observe that manga aimed at different genders and age groups have distinct graphical languages – and see if it holds across our whole set.

But how can we accomplish this in practice? How can we possibly compare one million manga pages?

WHAT EYES CAN’T SEE

1. Let us start by selecting only two pages from our manga image set and examining them directly, without any software tools. We will take these pages from the One Piece and Vampire Knight series. The first is one of the best-selling and top-rated shounen (teenage boys) manga; the second is among the top shoujo (teenage girls) manga.


Left image: sample page from Vampire Knight. Right image: sample page from One Piece. According to the OneManga.com June 2010 list of the top 50 series titles accessed by site users, One Piece was no. 2, while Vampire Knight was no. 13. According to www.icv2.com, during Q3 2010 One Piece was no. 2 in Japan, and Vampire Knight was no. 4. The difference in Vampire Knight's ranking is likely to reflect different proportions of male and female manga readers inside and outside of Japan.

We can certainly note many kinds of stylistic distinctions by comparing these two pages. For instance, we may observe that the page from Vampire Knight contains dramatic diagonal angles created by both the panel divisions and the lines inside the panels; the full palette of grey tones, from white to grey to black; and large black letters which represent sounds and contribute additional visual energy. In contrast, the page from One Piece has very little shading; the lines have curves; the panels are neat rectangles. However, how do we know if these stylistic distinctions hold for all of the 10,562 One Piece pages and 1,423 Vampire Knight pages which we have available? Similarly, if we want to talk about the graphical style of an artist who may have produced tens of thousands of pages across dozens of titles, would selecting and examining a few dozen pages be sufficient? And what if we want to compare all shounen and shoujo manga in our data set? How many pages should we pull out from each of these categories to reach authoritative conclusions about the possible differences in their graphical languages?

Here is another example. Let's say we want to compare One Piece with another highly popular shounen (teenage boys) manga: Naruto. Here are two sample pages from the two series.


Left image: sample page from Naruto. Right image: sample page from One Piece.

We can notice certain differences in graphical style between these pages – but how typical are these differences for all of the 10,562 One Piece pages and 8,037 Naruto pages we have available? In contrast to the first example, where the styles varied dramatically, here the differences are more subtle, which makes it even more problematic to generalize what we see in these two pages to the complete series.


10,461 scanlation pages from One Piece which were available on onemanga.com in the Fall of 2009, organized by sequence of publication (left to right, top to bottom). This composite image (we call such images “montages”) includes special pages inserted by scanlation groups (some of them appear as dark black squares when you look at a small version of the visualization). Note: To fit all pages in a rectangular grid, some of the pages were cropped.
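A montage of this kind is a grid of equally sized cells filled in publication order. The sketch below shows one way to compute such a layout; the near-square grid, the 60 x 80 cell size, and center cropping are our own assumptions, since the caption only says that some pages were cropped to fit.

```python
import math


def montage_layout(n_images, cell_w, cell_h):
    """Grid layout for a montage in publication order.

    Returns (columns, rows, positions); positions[i] is the top-left pixel
    of image i, filled left to right, top to bottom. The grid is kept
    roughly square by using ceil(sqrt(n)) columns.
    """
    cols = math.ceil(math.sqrt(n_images))
    rows = math.ceil(n_images / cols)
    positions = [((i % cols) * cell_w, (i // cols) * cell_h)
                 for i in range(n_images)]
    return cols, rows, positions


def center_crop(width, height, cell_w, cell_h):
    """Crop box (left, top, right, bottom) giving a page the cell's aspect ratio."""
    target = cell_w / cell_h
    if width / height > target:        # page too wide: trim left and right
        new_w = round(height * target)
        left = (width - new_w) // 2
        return (left, 0, left + new_w, height)
    new_h = round(width / target)      # page too tall: trim top and bottom
    top = (height - new_h) // 2
    return (0, top, width, top + new_h)


cols, rows, positions = montage_layout(10461, 60, 80)
print(cols, rows)                       # 103 102
print(center_crop(850, 1150, 60, 80))   # (0, 8, 850, 1141)
```

Pasting each cropped and scaled page at its position, with any imaging library, would then produce the montage.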

If we only examine a small number of images from a much larger image set at random, we can't be certain that our observations will hold across the complete set. For example, in the case of our one million pages collection (which itself is only a sample of all manga being published commercially), 100 pages is about 0.01% of all available pages. We don't have any reason to believe that whatever we may observe in these 100 pages is typical of our manga set as a whole.
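The fraction of the data set that 100 pages represent, together with a reproducible way of drawing such a random sample, takes only a few lines of Python (a sketch; the seed value is arbitrary):

```python
import random

TOTAL_PAGES = 1_074_790  # pages in the manga data set
SAMPLE_SIZE = 100

fraction = SAMPLE_SIZE / TOTAL_PAGES
print(f"{fraction:.4%} of the data set")  # 0.0093% of the data set

# A reproducible random sample of page indices (the seed is arbitrary).
rng = random.Random(2009)
sample = rng.sample(range(TOTAL_PAGES), SAMPLE_SIZE)
```

Even a perfectly random sample of this size says little about rare styles: a style that appears on 1,000 pages would be expected on fewer than 0.1 of the 100 sampled pages, so roughly nine samples out of ten would miss it entirely.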

Thus, the first problem with using our native perception to compare images in order to notice differences and similarities is that this approach does not scale. If we only examine a small selection of manga pages, this does not qualify us to make general statements about the graphical style of “best-selling manga,” “shounen manga,” or even of a single long-running series such as Hajime no Ippo (15,978 pages in our data set).

Examining only a small sample of a larger image set also precludes us from understanding detailed patterns of change and evolution. To illustrate this, we pulled out three pages from the best-selling One Piece manga series. The pages come from chapters 5, 200, and 400. The series started in 1997; approximately 600 chapters were published by the end of 2010, with new chapters appearing weekly. Thus, the time that passed between chapter 5 and chapter 400 is approximately eight years; during this period, the artists created over 7,000 pages. (Our data source did not have exact information on the publication date of every chapter, which is why we have to estimate them in this way.) As we can immediately see by comparing these three pages, the graphical language of One Piece apparently changed significantly during these eight years – but how did it change? Was it a gradual transition, a series of abrupt changes, or some other temporal pattern? Unless we have a mechanism to compare a much larger number of pages, we can't answer this question.

Sample pages from the One Piece manga series drawn from the 5th, 200th, and 400th chapters.

In this example, it will be beneficial to pull out a larger set of sample pages using some systematic procedure. For example, we can select every 100th page to get a better idea of how the series' visual language is changing over time. This approach can also be applied to comparing different manga series. For instance, we could take every 200th page of Naruto and every 200th page of One Piece. (Naruto is the most popular manga series around the world today.) Since our data set contains approximately 10,000 pages for each of these series, we would end up with 50 pages for each. We could then examine these 100 pages in order to describe the differences between the styles of the two series. Or, if our series are not very long, we can use a different procedure: we can select one page from each chapter of a series and then use such sets of pages to compare series to each other. However, such an approach will fail if the style within each chapter varies. The pages we select using our sampling procedure may not properly reflect all these variations. This is a fundamental problem with the idea of selecting a number of “representative pages” and only working with this smaller set.
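Both sampling procedures are easy to express in code. In this sketch, pages are hypothetical file names listed in publication order, the 18-page chapter length is invented, and “one page per chapter” is taken to mean the first page of each chapter, which is our choice; any fixed rule would run into the same limitation.

```python
def every_nth(pages, step):
    """Systematic sample: pages 0, step, 2*step, ... in publication order."""
    return pages[::step]


def one_per_chapter(chapters):
    """First page of each chapter; chapters is a list of page lists."""
    return [pages[0] for pages in chapters if pages]


# Hypothetical file names standing in for ~10,000 pages of one series.
naruto = [f"naruto_{i:05d}.jpg" for i in range(10_000)]
sample = every_nth(naruto, 200)
print(len(sample), sample[:3])
# 50 ['naruto_00000.jpg', 'naruto_00200.jpg', 'naruto_00400.jpg']

# Hypothetical chapters of 18 pages each (the last one is shorter).
chapters = [naruto[i:i + 18] for i in range(0, len(naruto), 18)]
per_chapter = one_per_chapter(chapters)
```

Whichever rule is used, each chapter is reduced to one page, so any variation between pages inside a chapter is invisible in the sample.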

For instance, consider all pages from one of Abara's chapters (see illustration below). Which page in the chapter best represents its style? Regardless of which page we select to stand in for the whole chapter, it will not adequately convey the stylistic variations across all the pages. Some pages consist mostly of black and white areas, with little texture; other pages have lots of shading and use mostly grey; still others combine panels in both styles.

Of course, if we are using a computer to browse our image set, we are no longer limited to pulling out a small sample of images for close examination. Modern operating systems (OS) such as Windows, Mac OS, iOS, and Android, image organizer software such as Google's Picasa and Apple's iPhoto, and web photo sharing services such as Photobucket and Flickr all provide options to browse through lots of images. So if we, for example, add all 10,461 One Piece pages to iPhoto, we can pan and zoom in and out at will, quickly moving between individual images and the whole set. This should help us notice additional visual differences and affinities across large sets of images beyond those picked up by a close analysis of a small sample. Unfortunately, the display modes offered by existing consumer software and web services are rather inflexible. Typically, the only modes available are a slide show and an image grid. Moreover, the images can usually be sorted by only a few parameters, such as upload dates or names, and the user can't easily change the order in which images are displayed. In order to display images in a different order, you first have to assign new keyword(s) to every image. This prevents spontaneous discovery of interesting patterns in an image set. Instead of reorganizing images along different dimensions and quickly trying different possibilities, a user would have to know the exact order in which to display images in advance.

Of course, if our image set has some very obvious patterns – let's say, it consists of images in only three distinct styles – these limited browsing modes and the fixed order would still be sufficient, and we would easily notice these patterns. But such cases are exceptions rather than the norm. (While Picasso worked in a number of dramatically different styles, he is not typical.)

An alternative to examining a set of images informally, regardless of whether we look closely at a few or use software to browse through many, is to systematically describe each image using a number of terms and then analyze the distributions of these terms. In the humanities, this process is called “annotation.” A researcher defines a dictionary of descriptive terms and then tags all images (or film shots, transitions between comic frames, or any other visual objects). The parallel to the popular practice of users tagging media objects on social media sites (think of tags in Flickr) or adding keywords to one's blog posts is obvious. However, while users are free to add any keywords they want, in academic studies researchers typically employ “closed vocabularies,” where the set of terms is defined beforehand. Once all images are annotated, we can look at all the images that have particular tags; we can plot and compare the tag frequencies and other statistical descriptions. For instance, if we annotate manga pages using a set of tags describing visual style, we can then compare how often each stylistic feature was used in shounen vs. shoujo pages.
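Once the tags exist, comparing categories is simple counting. The sketch below uses an invented closed vocabulary and five invented page annotations; rates are normalized by the number of pages in each category, because shounen and shoujo subsets would rarely be the same size.

```python
from collections import Counter

# A closed vocabulary, defined before annotation begins (hypothetical terms).
VOCABULARY = {"speed lines", "screentone", "heavy black", "flowers"}

# Hypothetical annotations: (audience category, tags assigned to one page).
annotations = [
    ("shounen", {"speed lines", "heavy black"}),
    ("shounen", {"speed lines"}),
    ("shoujo", {"screentone", "flowers"}),
    ("shoujo", {"screentone"}),
    ("shoujo", {"screentone", "heavy black"}),
]


def tag_rates(annotations, category):
    """Share of a category's pages that carry each tag."""
    pages = [tags for cat, tags in annotations if cat == category]
    for tags in pages:
        unknown = tags - VOCABULARY
        if unknown:
            raise ValueError(f"tags outside the vocabulary: {unknown}")
    counts = Counter(tag for tags in pages for tag in tags)
    return {tag: n / len(pages) for tag, n in counts.items()}


for category in ("shounen", "shoujo"):
    print(category, tag_rates(annotations, category))
```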

Barry Salt pioneered the use of this method to study visual media. He annotated all shots in the first 30 minutes of a few hundred 20th-century feature films using a number of characteristics: shot scale, camera movement, and angle of shot. Salt used a small number of categories for each characteristic. For example, possible camera movement types were pan, tilt, pan with tilt, track, etc. (Barry Salt's Database). He also recorded shot duration (Salt, The Statistical Style Analysis; Film Style and Technology). Salt then used descriptive statistics measures and graphs to analyze this data. In his very influential book Understanding Comics (1993), Scott McCloud employed a similar method to compare the visual language of Japanese manga and comics from the West. He annotated types of transitions between frames in a number of manga and comic books, and then used histograms to visually explore the data.

The fields of communication and media studies have a similar method called “content analysis.” While humanists are usually concerned with the works of particular author(s), communication researchers typically employ content analysis to analyze representations in mass media and, more recently, user-generated content. Therefore, they determine their samples more carefully; they also employ multiple people to “code” (the term used in content analysis to refer to annotation) media material, and then calculate the degree of agreement between the results of different coders. Here are a couple of recent applications of this method. Herring, Scheidt, Kouper, and Wright analyzed the content of 457 randomly selected blogs collected at roughly six-month intervals during 2003–2004 “to assess the extent to which the characteristics of blogs themselves remained stable or changed during this active period.” Williams, Martins, Consalvo, and Ivory analyzed characters in 150 video games; a total of 8,572 characters were coded to “answer questions about their representations of gender, race and age in comparison to the US population.”
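The text does not say which agreement statistic these studies used. Cohen's kappa is one standard choice for two coders, and the sketch below shows it as an illustration: it corrects the raw share of matching labels for the agreement expected by chance, given how often each coder uses each category. The camera-movement labels are invented.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labeled the same items in the same order."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty list of items")
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: probability that both coders independently pick the same category.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:  # both coders used one identical category throughout
        return 1.0
    return (observed - expected) / (1 - expected)


coder_a = ["pan", "tilt", "pan", "track", "pan", "tilt"]
coder_b = ["pan", "tilt", "tilt", "track", "pan", "pan"]
print(round(cohens_kappa(coder_a, coder_b), 3))  # 0.455
```

Here the coders agree on four of six shots (0.667), but once chance agreement is removed the kappa is only about 0.45, which is why content analysts report a chance-corrected figure rather than raw agreement.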

This method is more powerful than informal examination of media, but it suffers from the same problem – it does not scale to massive data sets. McCloud, for instance, only annotated a small number of comic books and manga titles. Would he obtain the same results with a much larger set, such as our collection of one million manga pages? And even if his results were confirmed, what about all possible exceptions? To find them, we need to tag every page.

Let us say we have a trained viewer who can examine a manga page and select the relevant tags. If this viewer spends 1 minute on each page and works 8 hours per day, it would take almost 6 years to annotate one million pages (about 2,083 eight-hour days, without a single day off).
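The estimate is easy to check. The sketch below reproduces it; note that “almost 6 years” holds only if the viewer works every day of the year, and the figure of 250 working days per year is our own assumption, added for comparison.

```python
PAGES = 1_000_000
MINUTES_PER_PAGE = 1
HOURS_PER_DAY = 8

hours = PAGES * MINUTES_PER_PAGE / 60
days = hours / HOURS_PER_DAY

print(f"{hours:,.0f} hours, or {days:,.0f} eight-hour days")
print(f"{days / 365:.1f} years, working every day of the year")
print(f"{days / 250:.1f} years, at an assumed 250 working days per year")
```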

(Recently, it became possible to use crowdsourcing in order to speed up this process. Since we cannot expect every person to have the same judgments about visual form or to use tags in the same way, researchers use statistical techniques to calculate the consistency between the judgments of all participating workers, and to disregard low-quality results. However, this approach has a fundamental limitation: like any other attempt to describe images using natural languages, it is much better at capturing image content than form. We discuss this in more detail below.)
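Crowdsourcing projects use a variety of quality-control schemes, and the text does not specify one. As a simple illustration only, the sketch below takes a majority vote per page, scores each worker by how often they agree with that majority, drops workers below a threshold (0.6, an arbitrary choice), and votes again. Worker IDs, page IDs, and labels are invented.

```python
from collections import Counter, defaultdict


def majority(labels_by_item):
    """Most common label for each item (ties go to the label seen first)."""
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in labels_by_item.items()}


def aggregate(judgments, min_agreement=0.6):
    """judgments: list of (worker, item, label). Returns (labels, trusted workers)."""
    by_item = defaultdict(list)
    for worker, item, label in judgments:
        by_item[item].append(label)
    first_pass = majority(by_item)

    # Score each worker by agreement with the first-pass majority.
    hits, totals = Counter(), Counter()
    for worker, item, label in judgments:
        totals[worker] += 1
        hits[worker] += label == first_pass[item]
    trusted = {w for w in totals if hits[w] / totals[w] >= min_agreement}

    # Vote again using only the trusted workers.
    kept = defaultdict(list)
    for worker, item, label in judgments:
        if worker in trusted:
            kept[item].append(label)
    return majority(kept), trusted


judgments = [
    ("w1", "p1", "dark"), ("w2", "p1", "dark"), ("w3", "p1", "dark"), ("w4", "p1", "light"),
    ("w1", "p2", "light"), ("w2", "p2", "light"), ("w3", "p2", "light"), ("w4", "p2", "dark"),
    ("w1", "p3", "dark"), ("w2", "p3", "dark"), ("w3", "p3", "dark"), ("w4", "p3", "light"),
]
labels, trusted = aggregate(judgments)
print(labels, sorted(trusted))
```

Filtering of this kind improves consistency, not expressiveness: the workers can still only choose among the words in the label set.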

Summary: When we examine only a small subset of a large image set, our sample may not be representative of the whole set; it may not properly reflect all variations in a set; and we may not be able to study gradual changes over time.

2. So what if we assume that our data set contains not a million images, but only a hundred? Now that the problem of scale goes away, is it sufficient to use only our eyes? No. The second problem with using our eyes is that we are not very good at registering subtle differences between images. If you are confronted with a small number of images that have substantial differences, your brain can easily organize these images according to their visual differences. (Here I am not concerned with differences in content, which are easy to see, but with differences in visual language.) This means being able to separate them into groups, rank them according to one or more kinds of visual characteristics, notice the outliers (images which stand out from the rest), and complete other tasks. For instance, we have no problem distinguishing between paintings by Piet Mondrian and Theo van Doesburg created after 1925. Mondrian used exclusively horizontal and vertical line orientations, while van Doesburg used only diagonals. These properties clearly mark Mondrian's and van Doesburg's paintings.


Left: Piet Mondrian. Composition II in Red, Blue and Yellow. 1930.

Right: Theo van Doesburg. Simultaneous Counter Composition. 1929.
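Line orientation is exactly the kind of difference that software can measure once lines have been found. The sketch below assumes the line segments are already available (for example from a Hough-transform line detector, not shown) and only classifies their angles; the 10-degree tolerance and the sample segments are our own illustrative choices, not measurements of the two paintings.

```python
import math


def orientation_shares(segments, tolerance=10.0):
    """Split a non-empty list of line segments into axis-aligned vs. diagonal.

    segments: ((x1, y1), (x2, y2)) pairs, e.g. from a line detector.
    A segment counts as horizontal/vertical if its angle is within
    `tolerance` degrees of 0 or 90; all other segments count as diagonal.
    Returns (share_axis_aligned, share_diagonal).
    """
    axis_aligned = 0
    for (x1, y1), (x2, y2) in segments:
        angle = math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180  # fold into [0, 180)
        off_axis = min(angle % 90, 90 - angle % 90)  # degrees to the nearest axis
        if off_axis <= tolerance:
            axis_aligned += 1
    n = len(segments)
    return axis_aligned / n, (n - axis_aligned) / n


# Invented segments in the spirit of the two paintings.
grid_like = [((0, 0), (100, 0)), ((50, 0), (50, 100)), ((0, 60), (100, 60))]
tilted = [((0, 0), (100, 100)), ((0, 100), (100, 0))]
print(orientation_shares(grid_like))  # (1.0, 0.0)
print(orientation_shares(tilted))     # (0.0, 1.0)
```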

But with a large number of images which have smaller differences, we can no longer perform these tasks. The following example illustrates this problem. The first composite image contains all pages from a single chapter of Abara by Tsutomu Nihei. The second composite image contains all pages from a single chapter of BioMega by the same artist. Do Abara and BioMega share the same style, or do they have subtle but important differences (besides the size of the panels)? For example, does one title have more stylistic variety than the other? Which page in each title is the most unusual stylistically? Even with this small number of manga pages, these questions are already impossible to answer.


A chapter from Abara manga by Tsutomu Nihei.

A chapter from BioMega manga by Tsutomu Nihei.

Summary: even with a small image sample, we may not be able to notice small visual differences between images.

3. Abara and BioMega have only a few hundred pages each. Out of 883 manga series in our collection, 297 series contain over 1,000 pages, while a number of series have more than 10,000 pages each. If we have difficulty comparing only a few dozen pages from Abara and BioMega, how can we possibly do this with manga series which are much longer?

Annotation/content analysis methods will not help here. To use them, we need to have enough tags to comprehensively describe the visual characteristics of manga. However, creating a vocabulary which we can use to label all types of visual differences in manga, or in any other form of visual culture, is problematic in itself. We do not have enough words in our natural languages to adequately describe the visual characteristics of even a single manga image, let alone all other kinds of human-created images. Consider the sample page from Vampire Knight (the image on the left in the first illustration above). Can we describe all the variations in background textures in its four panels? Or the differences between the renderings of hair in each of these panels?

This is the third problem with studying visual art, visual culture, and visual media using traditional humanities approaches. Regardless of the methodologies and theories being employed in a given case, all of them use one representational system (a natural language) to describe another (images). But as the last example shows, we will not be able to give names to all of the variations of textures, compositions, lines, and shapes used even in a single chapter of Abara, let alone in one million manga pages. We can proceed with traditional approaches so long as we limit ourselves to discussing manga iconography and other distinct visual elements which have standardized shapes and meanings: water drops signifying stress, throbbing veins signifying anger, and so forth. But if we actually want to start discussing the range of graphical and compositional possibilities used across manga, we need a new kind of instrument. This fundamental limitation applies to all other visual forms developed by human beings, be they paintings, drawings, photographs, graphic designs, web pages, visual interfaces, animations, etc.

Summary: natural languages do not allow us to properly describe all visual characteristics of images, or to name all their possible variations.

METHOD: DIGITAL IMAGE PROCESSING + VISUALIZATION

To address these challenges, we developed a set of methods and techniques which together we call Cultural Analytics. The key idea of cultural analytics is the use of visualization to explore large sets of images and video. These visualizations can use existing metadata (for example, publication dates or author names) and also new metadata added by researchers via annotation or coding. However, as we already discussed, adding tags or other annotations manually has serious limitations: our natural visual system can't notice subtle visual differences between a large number of images, and our natural languages do not have terms to describe all visual characteristics of images or to name their possible variations.

To overcome these limitations, our core method uses digital image processing and a new type of visualization. This section describes this method, and the next sections apply it to progressively larger numbers of images drawn from our one million manga data set. (For a description of our other visualization methods, see Manovich, Media Visualization.)

The method involves two steps:

1. We use digital image processing to automatically measure a number of visual characteristics (i.e., features) of our images. In this process, visual qualities are mapped into numbers. (In computer science, this step is often referred to as “feature extraction.”) For example, in the case of grey tones measured on a 0–255 scale, black is represented as 0, white as 255, and 50% grey as 127.5. Examples of dimensions that can be measured include contrast, presence of texture and fine details, number of lines and their curvature, number and type of edges, size and positions of shapes, and so on. In the case of color images, we can also measure the colors of all pixels (hue, saturation, brightness), determine the most frequently used colors, and calculate various image statistics for the red, green, and blue (R, G, B) color components separately.
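The mappings mentioned here can be sketched with Python's standard library. How the authors converted color to gray is not stated; the ITU-R BT.601 weights below are one common convention and are an assumption, as are the helper names.

```python
import colorsys


def to_gray(r, g, b):
    """Gray level of an 8-bit RGB pixel (0 = black, 255 = white).

    Uses the ITU-R BT.601 luma weights, one common convention (assumed here).
    """
    return 0.299 * r + 0.587 * g + 0.114 * b


def to_hsb(r, g, b):
    """Hue, saturation and brightness (HSV "value") of an 8-bit RGB pixel, each 0..1."""
    return colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)


def channel_means(pixels):
    """Mean of the R, G and B components, each computed separately."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


print(to_gray(255, 255, 255))                      # about 255, i.e. white
print(to_hsb(255, 0, 0))                           # (0.0, 1.0, 1.0): pure red
print(channel_means([(255, 0, 0), (0, 0, 255)]))   # (127.5, 0.0, 127.5)
```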

Typically, such measurements produce many numbers for each visual dimension of an image. For instance, every pixel will have its own grayscale value. If we measure the orientations of all lines in an image, we will end up with a separate number for every line. In order to be able to compare multiple images with each other along particular dimensions, it is convenient to use the averages of the measurements on each dimension.

For example, if we are interested in greyscale values, we sum the values of all pixels and divide the total by the number of pixels. In the case of line orientations, we similarly add the angles of all lines and divide the sum by the number of lines.

Besides this simple average, called the mean, we can also use other types of descriptive statistics to summarize image characteristics. They include different measures of the central tendency of the data (median, mode, etc.) and of data dispersion (variance, standard deviation, etc.).

Here are examples of such statistics calculated for the sample manga pages which already appeared in the previous section.


Left: a page from Vampire Knight. Mean: 164.68. Standard deviation: 102.25. Middle: a page from One Piece. Mean: 214.58. Standard deviation: 75.65. Right: a page from Naruto. Mean: 199.96. Standard deviation: 90.92.

Both mean and standard deviation are calculated using the greyscale values of all pixels in a page.

The Vampire Knight page (left) has the lowest average grayscale value because of its shaded areas, thick black contours and type, and the black area at the bottom. The One Piece page (middle) has the highest average grayscale value because it uses black quite sparingly. The standard deviation values are reversed: the Vampire Knight page has the highest, while the One Piece page has the lowest. The Vampire Knight page has lots of black, gray, and white, and this pushes up its standard deviation. The One Piece page has lots of white and just a little black, and this keeps its standard deviation low.

2. We create 2D visualizations that position the images according to their feature values. For example, we may use the horizontal dimension (X-axis) to represent grayscale mean, and the vertical dimension (Y-axis) to represent grayscale standard deviation. The image features calculated by software in step 1 become image coordinates in a 2D space. In this way, the differences between images along a particular visual dimension are translated into their positions in space, something that the human visual system can read very well.
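The two steps can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software: it assumes each page has already been decoded into a flat list of 0-255 grey values (file loading is omitted), and the names `features` and `pages` as well as the two toy pages are hypothetical.

```python
from statistics import mean, pstdev

def features(pixels):
    """Step 1: reduce a page (flat list of 0-255 grey values) to two numbers."""
    return mean(pixels), pstdev(pixels)  # grey mean and population standard deviation

# Two hypothetical toy "pages" of 100 pixels each.
pages = {
    "mostly_white": [255] * 90 + [0] * 10,   # white page with a little black ink
    "half_black": [255] * 50 + [0] * 50,     # equal areas of black and white
}

# Step 2: the two feature values become the page's (x, y) position in a 2D plot.
points = {name: features(pixels) for name, pixels in pages.items()}
for name, (x, y) in points.items():
    print(f"{name}: x (mean) = {x:.2f}, y (std) = {y:.2f}")
```

The population standard deviation (`pstdev`) treats all pixels of a page as the whole population. The chapter does not say which variant was used; for pages with many pixels the population and sample variants differ negligibly.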

To illustrate this, we visualize the three manga pages from the previous example using their mean and standard deviation features.


A plot of sample pages from Vampire Knight, One Piece, and Naruto. The mean and standard deviation of a page are used as its X and Y coordinates.
(Both mean and standard deviation are calculated using the greyscale values of all pixels in a page.)

This plot uses only two very basic features. Therefore we can't expect it to capture every visual difference between these three images. Still, even with only two features, the spatial positions of the images and the distances between them in 2D space reflect our sense of the overall visual differences between the images well: the One Piece and Naruto pages are close to each other; the Vampire Knight page is further away.
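Using the three mean and standard deviation pairs reported above, the distances can also be checked numerically. This sketch rests on one assumption: that plain Euclidean distance in raw feature units is a fair stand-in for distance in the plot (the chapter itself compares the positions visually).

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+

# (grey mean, grey standard deviation) of the three sample pages, from the text above
pages = {
    "Vampire Knight": (164.68, 102.25),
    "One Piece": (214.58, 75.65),
    "Naruto": (199.96, 90.92),
}

def pairwise_distances(points):
    """Euclidean distance between every pair of named points."""
    return {(a, b): dist(points[a], points[b]) for a, b in combinations(points, 2)}

distances = pairwise_distances(pages)
for (a, b), d in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{a} - {b}: {d:.2f}")
```

The One Piece and Naruto pages come out as the closest pair, matching the reading of the plot. Both features share the same 0-255 units here, so no rescaling is needed; features on different scales (for example, entropy and standard deviation) would usually be normalized before distances are compared.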

Measuring visual features and then mapping these features onto the X and Y axes allows us to separate the overall perceived difference into separate dimensions. In this plot, we use the average and the spread of grayscale values (i.e., mean and standard deviation), but many other dimensions can also be used. This process is not unlike how human visual perception functions. The human visual system analyzes visual input separately in terms of different characteristics: contrast, texture, shape, color, and motion. Most psychologists and neuroscientists believe that the brain combines this information to arrive at perceptual wholes. Various theories have been proposed to explain the details of this process. An influential theory of attention developed by Anne Treisman and Garry Gelade suggests that different features analyzed at the early stages of perception are “bound together” into consciously experienced wholes. Another theory, by L. Ward, proposes a neural mechanism responsible for the binding of features which code shape, motion, color, depth, and other perceptual aspects. Some of the features we measure, such as Gabor filters, are thought to correspond closely to the features analyzed by the brain; others can be understood as equivalent to combinations of multiple features computed by the brain.


As we discussed earlier, when the brain is confronted with a number of very similar images, or with a very large number of images, it can no longer compute these differences reliably. When we measure features in image sets and visualize images according to the feature values, we essentially augment human perception; that is, we scale up its capacity to judge visual differences.

The combination of digital image analysis and visualization makes it possible to bypass a problem which has haunted visual semiotics in particular, and all linguistic descriptions of the visual in general: the inability of language to adequately represent all the variations which images can contain. For instance, even if we use hundreds of words for colors, images can show millions of color variations. And color is not the worst case; for other dimensions such as texture or line character, the terms provided by natural languages are much more limited.

In other words, our senses are able to register a much larger set of values on any analog dimension (loudness, pitch, grayscale, color, motion, orientation, size, etc.) than our languages have words for. This makes sense because language developed much later in evolution, to supplement the senses. Language divides the continuous world into larger discrete categories that make possible abstract reasoning, metaphors, and other unique capacities. It is not designed to exactly map the wealth of our sensory experience into another representational system.

So if we can't rely on natural language to capture what our senses can register, and we can't rely on the senses because, as we discussed in the previous section, they are not able to register very subtle differences between images or other cultural artifacts, how can we talk about visual culture and visual media?

Our approach is to use visualization as a new descriptive system. In other words, we describe images with images. In doing this, we are taking advantage of the ability of images to register subtle differences on any visual dimension.

Note that our method does not imply that we are getting rid of discrete categories. Rather, instead of being limited to the few provided by language, we can now define as many as we need.

For example, let's say that we want to describe grayscale levels in an image. We use software to read pixel values from an image file and calculate an average value. This average is then used to position the image in the visualization along the X- or Y-axis.

Common 8-bit and 24-bit image formats such as JPEG and PNG use 256 discrete values to represent grayscale levels. This gives us 256 separate categories for grayscale values. These categories do not have distinct names, but they work: they allow us to compare multiple images in terms of their grayscale values.

We are not limited to 256 categories; if we want, we can use 1,000, 10,000, or any other number.

How does this work? When we calculate the average of integer grayscale pixel values, we get a real number. For instance, if our image contains four pixels with grayscale values 103, 106, 121, and 112, the average of these values is (103 + 106 + 121 + 112)/4 = 110.5. If we round such averages to one decimal place, we will have roughly 256 x 10 = 2,560 distinct categories. If we keep two decimal places, we will have roughly 25,600 distinct categories. In reality, we don't need to go that high, since human perception can't even see the difference between two adjacent gray levels (for instance, 127 and 128) on the 0-255 scale.
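The arithmetic in this paragraph can be reproduced directly. The sketch below is illustrative; `category_count` is a hypothetical helper that counts how many distinct values a rounded average can take between 0 and 255.

```python
from statistics import mean

pixels = [103, 106, 121, 112]
average = mean(pixels)  # 442 / 4: a real number, although every pixel value is an integer

def category_count(decimals, max_value=255):
    """Distinct values from 0 to max_value when averages keep `decimals` decimal places."""
    return max_value * 10 ** decimals + 1

print(average)  # → 110.5
for decimals in (0, 1, 2):
    print(decimals, category_count(decimals))
```

Counted exactly, one and two decimal places give 2,551 and 25,501 possible values, in line with the approximate figures above.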

While we utilize a variety of visualization techniques, the key technique used by our method is a scatter plot, i.e. a 2D visual representation that uses Cartesian coordinates to display two sets of numerical values describing data. In our case, each element of a dataset is an image, and the two values that determine its position in a plot are two measured visual qualities (features), such as average grayscale and standard deviation.


Equally often we use line graphs, where the X-axis represents the dates the images were created (or their positions in a narrative sequence such as a comic book), and the Y-axis represents some measured value (such as average saturation).

Along with regular scatter plots and line graphs, we also use a new visualization technique which we call an image plot. A normal scatter plot and a line graph display the data as points and lines. An image plot superimposes images over the data points in a graph.
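To make the idea of an image plot concrete, here is a minimal sketch of its layout logic. It is not the code of the authors' ImagePlot tool: it works on plain Python lists instead of real image files (a library such as Pillow would normally decode and paste the thumbnails), and the function names `place` and `image_plot` are hypothetical.

```python
def place(value, vmin, vmax, size):
    """Scale a feature value into a pixel offset along one axis of the canvas."""
    return round((value - vmin) / (vmax - vmin) * (size - 1))

def image_plot(items, x_range, y_range, width, height, background=0):
    """Superimpose small grey thumbnails on a canvas at their feature positions.

    items: list of (x_value, y_value, thumbnail); a thumbnail is a list of rows
    of 0-255 grey values. Later items are drawn on top of earlier ones.
    """
    canvas = [[background] * width for _ in range(height)]
    for x_value, y_value, thumb in items:
        cx = place(x_value, *x_range, width)
        # canvas rows count downward, so flip Y to draw larger values higher up
        cy = height - 1 - place(y_value, *y_range, height)
        th, tw = len(thumb), len(thumb[0])
        top, left = cy - th // 2, cx - tw // 2  # centre the thumbnail on its data point
        for r in range(th):
            for c in range(tw):
                y, x = top + r, left + c
                if 0 <= y < height and 0 <= x < width:  # clip at the canvas edges
                    canvas[y][x] = thumb[r][c]
    return canvas

# Usage: two hypothetical 3x3 thumbnails placed by (brightness median, saturation median)
thumb_a = [[200] * 3 for _ in range(3)]
thumb_b = [[60] * 3 for _ in range(3)]
canvas = image_plot([(0.2, 0.8, thumb_a), (0.9, 0.1, thumb_b)],
                    x_range=(0.0, 1.0), y_range=(0.0, 1.0), width=20, height=20)
```

Drawing later items over earlier ones is also why a dense image plot shows only the top-most images at each location.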

The following visualizations of 127 paintings by Piet Mondrian illustrate the difference between a scatter plot and an image plot.

127 paintings by Piet Mondrian (created between 1905 and 1917) visualized as a scatter plot (left) and as an image plot (right). X-axis = brightness median. Y-axis = saturation median.

(Technical details: We also use more advanced visualization techniques such as scatter plot matrices and parallel coordinates, and multivariate data analysis techniques such as PCA, cluster analysis, and so on. However, since the concepts of a multi-dimensional feature space and dimension reduction are more abstract, in this chapter all our examples are 2D visualizations where each axis corresponds to a single feature, such as mean grayscale value, or to metadata which was collected along with the data, such as the position of a page within the sequence of all pages in a manga title. In general, we prefer to use single features for the X and Y axes if their graph reveals interesting patterns and if their meaning is easy to explain; in contrast, it is often difficult to interpret the dimensions in a graph which uses PCA or other multivariate methods.)

While the technique we call “image plots” has already been described in a number of articles in computer science (see Peters, MultiMatch), it has not been made available in any graphing or visualization application. Our lab developed software to create image plots; we use this software in all our projects, and also distribute it as open source (Software Studies Initiative, ImagePlot). The software runs on regular Windows, Mac, and Linux desktops and laptops. Working with Gravity Lab at the California Institute for Telecommunication and Information (Calit2), we also developed an interactive application that can generate image plots containing thousands of individual images in real time. The application runs on scalable tiled displays such as HIPerSpace (The Highly Interactive Parallelized Display Space), which offers 35,840 x 8,000 pixels resolution (287 megapixels) on a display wall 31.8 feet wide and 7.5 feet tall, made from 70 30-inch monitors (Yamaoka, Manovich, Douglass, Kuester, Cultural Analytics in Large-Scale Visualization Environments). In this article, all image plots are made with ImagePlot software; scatter plots and line graphs are made with ImagePlot and Mondrian (free data visualization software).

In the next sections of this article, we will show how our method can be used to compare image sets ranging from a few hundred to one million images.

COMPARING ABARA AND NOISE (474 PAGES)

Having introduced our method (visualizing images as a scatter plot according to quantitative descriptions of their visual properties, or features, measured with digital image processing techniques), let's now apply it to our manga dataset.

To make our visualization examples easier to follow, we will use the same two visual features in most of the examples below. The first feature is the standard deviation of the greyscale values of all pixels in an image. Standard deviation is a commonly used measure of variability. It shows how much the data is dispersed around the average. If an image's greyscale values are spread widely around their average (for example, large areas of both black and white), it will have a large standard deviation. If its greyscale values cluster closely around the average, its standard deviation will be small.
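A small numeric check of this description, using three hypothetical 100-pixel "pages". It also illustrates one subtlety: standard deviation responds to how far grey values lie from the average, not to how many distinct values an image uses, so a page made only of pure black and pure white gets the largest value.

```python
from statistics import pstdev

pages = {
    "mid_greys": [120, 125, 130, 135] * 25,     # values close together
    "mostly_white": [255] * 95 + [0] * 5,       # white page with a little black ink
    "black_and_white": [255] * 50 + [0] * 50,   # only two values, far apart
}

# Population standard deviation of the grey values of each page
spread = {name: pstdev(pixels) for name, pixels in pages.items()}
for name, value in sorted(spread.items(), key=lambda item: item[1]):
    print(f"{name}: {value:.2f}")
```

For 0-255 values the population standard deviation can never exceed 127.5, the value reached by the half-black, half-white page.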

The second feature is entropy. In information theory, the concept of entropy was developed by Claude E. Shannon in his famous 1948 paper “A Mathematical Theory of Communication.” Entropy describes the degree of uncertainty in the data, i.e., how difficult or how easy it is to predict unknown data values given the values we already know. If an image consists of a few monochrome areas, its entropy will be low. In contrast, if an image has lots of texture and detail, and its colors (or greyscale values in the case of black and white images) vary significantly from place to place, its entropy will be high.
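The chapter does not spell out its entropy formula. A common choice, assumed in this sketch, is the Shannon entropy (in bits) of the histogram of grey values. Note that this version looks only at how often each grey value occurs, not at where it occurs, so it reflects texture and detail only indirectly; the function name is hypothetical.

```python
from collections import Counter
from math import log2

def grey_entropy(pixels):
    """Shannon entropy, in bits, of the distribution of 0-255 grey values in a page."""
    total = len(pixels)
    probabilities = [n / total for n in Counter(pixels).values()]
    return sum(p * log2(1 / p) for p in probabilities)

flat = [255] * 256                   # a single monochrome area
two_tone = [0] * 128 + [255] * 128   # two values, equally frequent
all_levels = list(range(256))        # every grey level exactly once

for name, page in [("flat", flat), ("two_tone", two_tone), ("all_levels", all_levels)]:
    print(name, grey_entropy(page))
```

With 256 grey levels this measure ranges from 0 bits (one flat tone) to log2(256) = 8 bits (all levels equally frequent). scikit-image's `skimage.measure.shannon_entropy` computes the same histogram-based quantity if a library function is preferred.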

In the following examples, we will map standard deviation values to the X-axis and entropy values to the Y-axis. We will start by creating image plots of Abara and NOISE pages. The first title has 291 pages; the second has 183 pages. (This count includes all pages which were available for these titles on onemanga.com, including title and credit pages.)


Abara pages. Artist: Tsutomu Nihei.
X-axis = standard deviation of greyscale values of all pixels in a page.
Y-axis = entropy calculated over greyscale values of all pixels in a page.


Abara pages (left) and NOISE pages (right). Artist: Tsutomu Nihei. X-axis = standard deviation of greyscale values of all pixels in a page.
Y-axis = entropy calculated over greyscale values of all pixels in a page.

These side-by-side visualizations of the two titles make it easy to answer the questions we asked earlier. Does one title have more stylistic variety than the other? Yes, Abara's style varies much more than NOISE's: the points in the left plot are more dispersed than the points in the right plot. Which page in each title is the most unusual stylistically? Each plot reveals a number of outliers, i.e. points that stand out from the rest. (Of course, we should keep in mind that the two measurements we are using in these plots, standard deviation and entropy, capture only some dimensions of a visual style. If we use other features, different pages may stand out as outliers.)

VISUALIZING TEMPORAL CHANGES IN ABARA AND NOISE

Manga is a sequential art. To understand if and how visual style in a title varies over the sequence of its chapters and individual pages, we can generate image plots where the X-axis represents a page's position in a sequence, and the Y-axis uses some visual feature. If we do this for two titles, we can then compare how they vary in time in relation to this feature.

When we make image plots using this approach, they can become very long. For example, let's say we want to graph a sequence of 10,000 images and make each image 100 pixels wide. Such a visualization will be 10,000 x 100 = 1,000,000 pixels wide. To address this problem, our visualizations of feature films represent each shot by a single frame. You can find examples of such plots we made for whole films on our Flickr gallery. For shorter manga series, we will use both image plots which show all pages, and line graphs that represent each page by a point. The former are easier to read; the latter are sometimes more effective in revealing patterns. For longer titles, we will use line graphs, since image plots would be very long. (A line graph does not have to use lines to connect the data. The difference between a line graph and a scatter plot is that the former assumes that the data values mapped onto the X-axis are separated by the same interval, i.e. 1, 2, 3, … A scatter plot does not make this assumption.)
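The arithmetic behind this choice is easy to sketch. Below, `image_plot_width` reproduces the 10,000 x 100 calculation, and `line_graph_points` places one point per page at equal X intervals on a canvas of fixed size, the property that distinguishes a line graph from a scatter plot in the sense used here. Both function names and the default 0-255 value range are assumptions for illustration.

```python
def image_plot_width(n_images, thumb_width):
    """Width in pixels of an image plot that shows every image side by side."""
    return n_images * thumb_width

def line_graph_points(values, width, height, vmin=0.0, vmax=255.0):
    """One point per page: equal X steps in sequence order, Y from the feature value.

    Pixel rows count downward, so Y is flipped to draw larger values higher up.
    """
    step = (width - 1) / (len(values) - 1) if len(values) > 1 else 0
    points = []
    for i, value in enumerate(values):
        x = round(i * step)
        y = round((1 - (value - vmin) / (vmax - vmin)) * (height - 1))
        points.append((x, y))
    return points

print(image_plot_width(10_000, 100))  # → 1000000
print(line_graph_points([0, 255, 127.5], width=101, height=101))  # → [(0, 100), (50, 0), (100, 50)]
```

However many pages there are, the line graph stays within its fixed width, whereas the image plot grows linearly with the number of pages.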


First, we compare Abara (318 pages) and NOISE (140 pages) using the simplest visual feature: the mean of the grayscale values of all pixels in a page (Y-axis). The mean value indicates the relative proportion of white, black, and grey areas in a page. We numbered all pages starting from the cover page and ending with the last page of the series. (We removed extra pages inserted by the fans who scanned and translated these series; we also removed “bonus” chapters.) The pages are positioned left to right using this linear sequence (X-axis).

In Japan, manga series are first published in magazines, which may come out weekly, bi-weekly, or monthly and contain new chapters of a number of series. Later, a number of already published chapters of a successful series are printed in a separate volume (tankōbon). Series translated into other languages are published in a similar format (a number of chapters collected in a single book). Global fans who read scanlated manga on the web can also go through a sequence of chapters all at once. A visualization which arranges all the pages of a short series such as Abara or NOISE assumes this reading mode.

Abara (318 consecutive pages in 11 chapters). X-axis = page position in the series. Y-axis = grayscale mean.

NOISE (140 consecutive pages in 6 chapters). X-axis = page position in the series. Y-axis = grayscale mean.

The two visualizations use the same scale for the X-axis, and this is why the first graph (Abara, 318 pages) is more than twice as long as the second (NOISE, 140 pages). The points corresponding to individual pages are connected by lines in order to make the patterns easier to see.

Comparing side by side the two series created by the same artist shows similarities and differences in their graphical development. Abara's graph gradually goes up, almost in a linear fashion, until the last few pages. The NOISE graph also goes up slightly until the middle of chapter 5, and then goes down. However, its average grayscale levels never rise as high as those of Abara.

To help us understand the graphical patterns in each chapter, graphic designer Ong Kian Peng (Multimodal Analysis Lab, Singapore National University) visualized the sequence of NOISE pages as a bar chart. Each chapter is marked in its own color; the chapters are also separated by horizontal intervals. The graph also omits the chapter covers to focus on the patterns in the regular narrative pages.

NOISE. X-axis = page position in the series. Y-axis = grayscale mean. Each chapter is shown in a separate color.
Chapters are separated by white gaps; the title pages are omitted.

The next two visualizations compare the stylistic development of Abara and NOISE along the low detail/texture to high detail/texture dimension, as represented by the entropy feature. (Here the plot of NOISE is scaled to the same width as the plot of Abara to make the patterns more visible.)

Abara pages.
X-axis = sequential position of the page in the title (left to right).
Y-axis = entropy calculated over greyscale values of all pixels in a page.


NOISE pages.
X-axis = sequential position of the page in the title (left to right).
Y-axis = entropy calculated over greyscale values of all pixels in a page.

In both titles, entropy values change over time. The pattern of these changes in NOISE can be described as a slight linear shift downward. The pattern in Abara is more complex. It can be characterized as a horizontal line, followed by a curve which first goes down and then goes up.

The two temporal patterns also have an interesting structural similarity. In each graph, the range between the top and bottom points (i.e., the difference between their values on the Y-axis) gradually increases. This means that stylistically the pages are at first pretty similar, but become more varied over time. (Again, keep in mind that we are describing only one stylistic dimension.)
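One way to put a number on this observation is to compute the range (maximum minus minimum) of the feature within a sliding window of consecutive pages. This is an illustrative technique rather than a measurement reported in the chapter; the helper name, the window size, and the toy values are hypothetical.

```python
def rolling_range(values, window):
    """Max minus min of the feature values inside each window of consecutive pages."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and the number of pages")
    return [
        max(values[i:i + window]) - min(values[i:i + window])
        for i in range(len(values) - window + 1)
    ]

# Hypothetical per-page entropy values, scaled by 10 to keep the toy data integer
entropy_x10 = [50, 51, 50, 52, 40, 60, 35, 70]
ranges = rolling_range(entropy_x10, window=4)
print(ranges)  # → [2, 12, 20, 25, 35]
print(all(a <= b for a, b in zip(ranges, ranges[1:])))  # → True: variation grows
```

A steadily increasing sequence of window ranges corresponds to the widening band visible in both graphs; with real data the window would typically span a chapter or more.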

To illustrate interesting temporal patterns that can be revealed using such line graphs of feature values, we will go outside of our one million manga pages for the next example. We will look at the webcomic Freakangels. Our dataset consists of 342 consecutive pages published over 15 months (2008-2009) in six-page chapters appearing weekly. As in the previous graphs, we use the Y-axis to represent a single visual feature, and reserve the X-axis for page publication order.

In the following visualization, the Y position of each page is determined by the mean (average) of the greyscale values of all pixels in the page.

342 consecutive pages of the webcomic Freakangels (2008-2009).
X-axis = page publication order.
Y-axis = mean of greyscale values of all pixels in a page. Greyscale value = (R value + G value + B value)/3.
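The greyscale formula in this caption can be written out directly. The sketch assumes pages are given as lists of (R, G, B) tuples with 0-255 channels; the function names are hypothetical. Common libraries often use a weighted luma instead: Pillow's `convert("L")`, for example, uses L = R * 299/1000 + G * 587/1000 + B * 114/1000, so its grey values will differ from the unweighted average used here.

```python
def grey_value(r, g, b):
    """Greyscale value as defined in the caption: the unweighted average of R, G and B."""
    return (r + g + b) / 3

def page_grey_mean(rgb_pixels):
    """Mean of the greyscale values of all pixels in a page."""
    return sum(grey_value(*pixel) for pixel in rgb_pixels) / len(rgb_pixels)

page = [(255, 0, 0), (0, 0, 0), (255, 255, 255), (30, 60, 90)]
print(page_grey_mean(page))  # → 100.0
```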


A close-up of the visualization of the webcomic Freakangels.

Despite the weekly intervals that separate the 6-page episodes of Freakangels, the visualization shows that the visual values of the pages are remarkably consistent. For the larger part of the publication period, the changes in average grayscale values follow a smooth curve (the same applies to hue and saturation if we plot them). While the overall change from light to dark images corresponds to the development of the story from day to night, the fact that the grayscale values shift very gradually and systematically over many months is a genuine discovery. Visualization reveals this unexpected pattern and allows us to see the exact shape of the curve.

COMPARING VAMPIRE KNIGHT AND ONE PIECE SAMPLES (2,744 PAGES)

The Abara and NOISE titles are quite short: 291 pages and 183 pages, respectively. How does our method scale to longer manga series such as Vampire Knight (57 chapters; 1,423 pages) and One Piece (563 chapters; 9,745 pages)?

Vampire Knight publication started in January 2005; One Piece started much earlier, on August 4, 1997. This explains the differences in the numbers of chapters and total pages in our download. To make the comparison more meaningful, we will use only part of the One Piece dataset: chapters 481-563, which contain 1,321 pages. As in the earlier example, we will visualize these two sets of pages according to standard deviation (X-axis) and entropy (Y-axis). We graph the data using scatter plots; each page is represented by a point.


1,423 Vampire Knight pages (left) and 1,321 One Piece pages (right). X-axis = standard deviation. Y-axis = entropy.
In each graph, the start and end values of the X-axis and Y-axis are set as follows:
Std: min = 0.000000, max = 126.602400 (largest possible value which can be observed).
Entropy: min = -0.000000, max = 7.962000 (largest possible value which can be observed).

We can notice that the center of the points in the left graph is higher than the center of the points in the right graph. Recall that the Y-axis corresponds to the low texture/detail to high texture/detail dimension. This confirms what we can see in the two sample pages: the Vampire Knight page has more shading and detail than the One Piece page. However, since each set of points also extends significantly along the Y-axis, it is clear that we were lucky in our choice. We could easily have selected different pages that would lead us to the opposite idea about the graphic difference between the two sets.

To confirm our observation about the vertical difference between the centers of the two sets of points, we calculate the actual averages of the data values projected onto the Y-axis (entropy).

Vampire Knight: mean of entropy measurements of 1,423 pages: 5.1.

One Piece: mean of entropy measurements of 1,321 pages: 5.6.

(The averages are rounded to one decimal place.)

To put this difference in perspective, it is useful to know that the maximum possible value of the entropy measurement of a single image is 7.962. This means that the difference represents 6.4% of the total possible range.
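The comparison can be expressed as a small calculation. In this sketch (the helper name is hypothetical) the difference between two set means is divided by the range of the feature axis. With the rounded means quoted above the result is about 6.3%; the slightly higher 6.4% in the text is consistent with computing from unrounded means.

```python
from statistics import mean

def share_of_range(values_a, values_b, axis_max, axis_min=0.0):
    """Difference between the means of two sets, as a percentage of the axis range."""
    return abs(mean(values_b) - mean(values_a)) / (axis_max - axis_min) * 100

# Using the rounded means directly (one value standing in for each set of pages)
print(round(share_of_range([5.1], [5.6], axis_max=7.962), 1))  # → 6.3
```

With real data, `values_a` and `values_b` would be the per-page entropy values of the two sets rather than their rounded means.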


1,423 Vampire Knight pages. X-axis = standard deviation. Y-axis = entropy.


1,430 One Piece pages. X-axis = standard deviation. Y-axis = entropy.

 

COMPARING NARUTO AND ONE PIECE (17,782 PAGES)

When we put side by side two sample pages from the Naruto and One Piece manga series, we noticed that the differences in their graphical styles were more subtle than those between the sample pages from Vampire Knight and One Piece. Can we better understand these more subtle differences using our method?

It is also interesting to compare Naruto (1999-) and One Piece (1997-) because they are among the most popular manga series among global readers. These titles are rated no. 1 and no. 3 among OneManga.com global readers (OneManga). When we downloaded the pages from OneManga.com in the fall of 2009, the first series had been published continuously for 10 years, and the second for 12 years. Accordingly, our download contained 8,037 Naruto pages and 9,745 One Piece pages.

The following plots compare these two sets of pages.

Left: 8,037 Naruto pages. Right: 9,745 One Piece pages. Each page is represented by a point.
X-axis = standard deviation of greyscale values of all pixels in a page.
Y-axis = entropy calculated over greyscale values of all pixels in a page.


8,037 Naruto pages. X-axis = standard deviation. Y-axis = entropy.


9,745 One Piece pages. X-axis = standard deviation. Y-axis = entropy.

Projecting a large number of pages from the two series into the same coordinate space helps us to better understand the similarities and the differences between their graphical styles. The visualizations show that, along the two visual dimensions used, the distinctions between the languages of the two series are quantitative rather than qualitative. That is, the “point cloud” of Naruto pages significantly overlaps with the “point cloud” of One Piece pages along both the horizontal and vertical axes.

At the same time, the differences between them are larger than a casual examination of only two pages implies. The visualizations reveal that both series cover a large range of graphical possibilities: from simple black and white pages with minimal detail and texture (lower part of each visualization) to highly detailed and textured ones (top part). But the center of the One Piece point cloud is slightly higher than the center of the Naruto point cloud. This means that One Piece has more pages with more texture and detail than Naruto does.


The visualizations also reveal significant differences in the graphical variability of the two series. Naruto's “point cloud” is much more dispersed than the One Piece “point cloud” in both the horizontal and vertical dimensions. This indicates that Naruto's visual language is more diverse than the visual language of One Piece. (We already saw a similar difference when we compared Abara and NOISE, but now we are seeing it in a much larger dataset.)
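The “center” and “dispersion” of a point cloud, as used in the last two paragraphs, can be summarized with the mean and standard deviation of each coordinate. The chapter judges these properties visually; the sketch below, with a hypothetical helper name and toy points, shows one way to compute them.

```python
from statistics import mean, pstdev

def cloud_summary(points):
    """Centre (mean) and spread (population std) of (x, y) points on each axis."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return {"centre": (mean(xs), mean(ys)), "spread": (pstdev(xs), pstdev(ys))}

# Hypothetical (standard deviation, entropy) values for two small sets of pages
tight = [(60, 5.5), (62, 5.6), (61, 5.7)]
wide = [(30, 3.0), (90, 7.0), (60, 5.0)]

for name, cloud in [("tight", tight), ("wide", wide)]:
    print(name, cloud_summary(cloud))
```

Comparing the "centre" values corresponds to comparing the heights of the clouds; comparing the "spread" values corresponds to comparing how dispersed they are.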

We can also examine the stylistic development of these long series over the time of publication in the same way we did for the much shorter Abara and NOISE. The following graph plots 9,745 One Piece pages left to right in the order of publication; the vertical position is determined by the grayscale mean of the page. Below the graph are the three sample pages we referred to earlier.

9,745 One Piece pages (562 chapters).
X-axis = page position in publication order (left to right).
Y-axis = mean of greyscale values of all pixels in a page.

Sample pages from the One Piece manga series drawn from the 5th, 200th, and 400th chapters.

Since we are now dealing with thousands of pages in 562 weekly chapters published over 12 years, we can discuss temporal patterns at a number of scales. On the scale of years, One Piece mean values gradually drift over the whole time period. Within this overall, almost linear pattern, we see periodic rises and falls that reverse direction anywhere between 7 and 13 months. Thus, we get the answer to the question we asked earlier when we compared three sample pages drawn from the 5th, 200th, and 400th chapters: how does the visual language of One Piece change over time?

The visualization also reveals patterns that may not be visible if we only use a small number of pages. The three pages we looked at earlier missed the periodic drops in grayscale values we can see when we plot all pages. The dips correspond to the flashback parts of the narrative, which place the panels over a black background, as can be seen in the close-up of the earlier montage of all One Piece pages.
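Drops like these can also be flagged automatically by comparing each page with the median of its neighbours. The sketch below is one simple way to do this, not the procedure used in the chapter; `find_dips`, the window of 15 pages, and the drop threshold of 40 grey levels are hypothetical choices.

```python
from statistics import median

def find_dips(values, window=15, drop=40.0):
    """Indices of pages whose grey mean falls at least `drop` below the local median."""
    half = window // 2
    dips = []
    for i, value in enumerate(values):
        neighbourhood = values[max(0, i - half):i + half + 1]  # pages around page i
        if median(neighbourhood) - value >= drop:
            dips.append(i)
    return dips

# Hypothetical sequence: mostly light pages with a short dark "flashback" at pages 10-11
grey_means = [200.0] * 30
grey_means[10], grey_means[11] = 120.0, 110.0
print(find_dips(grey_means))  # → [10, 11]
```

A median is used rather than a mean so that the dark pages themselves do not pull the reference level down.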

A close-up of the complete set of One Piece pages arranged in a grid in publication order (left to right, top to bottom).

VISUALIZING THE COMPLETE MANGA IMAGE SET (1,074,790 PAGES)

We can now finally provide an answer to the question in the chapter's title: how to see one million images? Using the same measurements and axis assignments (X-axis = standard deviation, Y-axis = entropy) as we did in the plots of individual titles and series, we visualize our complete set of one million pages. (Of course, we can also organize this image set in many other ways using many other measurements; this is just one possible mapping.)


One million manga pages organized according to selected visual characteristics.
X-axis = standard deviation of greyscale values of all pixels in a page.
Y-axis = entropy calculated over greyscale values of all pixels in a page.
Top image: complete visualization.
Middle image: close-up of the top part.
Bottom image: close-up of the bottom left corner.

Notes:

1) Some of the pages, such as all covers, are in color. In order to fit all images into a single large image (the original is 44,000 x 44,000 pixels, scaled to 10,000 x 10,000 for posting to Flickr), we rendered everything in grayscale.

2) Because pages are rendered on top of each other, you don't actually see one million distinct pages; rather, the visualization shows the distribution of all pages, with typical examples appearing on top.

One million pages cover the space of graphical possibilities more broadly and with more density than Naruto or One Piece alone. In between the four graphical extremes corresponding to the left, right, top, and bottom edges of the pages “cloud,” we find every possible intermediate graphic variation. This suggests that manga's graphic language should be understood as a continuous variable.

This, in turn, suggests that the very concept of style as it is normally used may become problematic when we consider very large cultural datasets. The concept assumes that we can partition a set of works into a small number of discrete categories. However, if we find a very large set of variations with very small differences between them (as in this case of one million manga pages), it is no longer possible to use this concept. Instead, it is better to use visualization and mathematical descriptions to characterize the space of possible and realized variations.

To better understand the distribution of our dataset within the space of all graphical possibilities, we can render the data from the last visualization using points. Such a scatter plot is not as easy to read as an image plot; however, it is better at showing the shape of the pages' distribution. The plot shows that the distribution follows a bell-curve-like pattern: a single dense cluster with a gradual fall-off to the sides. The parts of the plot which remain black represent the graphical possibilities not realized in our manga sample: images which are almost completely white (lower right corner), and images which have large areas of black and small areas of white (left third of the plot).
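A related way to quantify where pages are dense and where they are absent, sketched below with hypothetical names and toy data, is to count pages in a grid of cells over the (standard deviation, entropy) plane; empty cells correspond to the black, unrealized regions of the plot. NumPy's `numpy.histogram2d` performs the same binning on large arrays; plain Python is used here to keep the example self-contained.

```python
def density_grid(points, x_range, y_range, bins):
    """Count points falling into each cell of a bins x bins grid.

    Row 0 is the top of the plot (highest Y values), matching the image layout.
    Points are assumed to lie inside the given ranges.
    """
    (x0, x1), (y0, y1) = x_range, y_range
    grid = [[0] * bins for _ in range(bins)]
    for x, y in points:
        col = min(int((x - x0) / (x1 - x0) * bins), bins - 1)  # clamp the max edge
        row = bins - 1 - min(int((y - y0) / (y1 - y0) * bins), bins - 1)
        grid[row][col] += 1
    return grid

# Hypothetical (standard deviation, entropy) points, with axis limits as in the plots above
points = [(0.0, 0.0), (126.6, 7.9), (126.6, 7.9), (60.0, 5.0)]
grid = density_grid(points, x_range=(0.0, 126.6024), y_range=(0.0, 7.962), bins=2)
empty_cells = sum(row.count(0) for row in grid)
print(grid, empty_cells)
```

With a million pages and a finer grid (for example, 100 x 100 cells), the pattern of empty cells would trace the unrealized regions described above.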


One million manga pages rendered as points.
X-axis = standard deviation of greyscale values of all pixels in a page.
Y-axis = entropy calculated over greyscale values of all pixels in a page.

The fact that digital image processing and visualization of the one million manga pages dataset make us question a very basic concept of the humanities and cultural criticism, the concept of style, is at least as important as any particular discoveries we can make about this dataset. It illustrates how computational analysis of massive cultural datasets has the potential to transform our theoretical and methodological paradigms for studying culture.


DEFAMILIARISATION WITH COMPUTERS

Alexander Rodchenko. Pine Trees in Pushkin Park. 1927. (Gelatin silver print.)

Our methodology relies on standard techniques of digital image analysis which began to be developed in the second half of the 1950s and are now everywhere: in digital cameras, image editing software such as Photoshop, automated factories, medical imaging, and all the scientific fields which use images as sources of data (from astronomy to biology). However, when we adopt these techniques as tools for cultural research, we should be clear about how they analyze images and what it means in general to see through computer “eyes.” Since this chapter is focused on motivating and explaining our method in general terms, we will make only one observation. When we look at images normally, we experience all their visual dimensions at once. When we separate these dimensions using digital image analysis and visualization, we break this gestalt experience. Being able to examine a set of images along a single visual dimension is a powerful mechanism of defamiliarization (“ostranenie”), a device for seeing what we might not have noticed previously. If avant-garde photographers, designers, and filmmakers of the 1920s such as Rodchenko, Moholy-Nagy, Eisenstein, and Vertov were defamiliarizing the standard perception of visible reality using diagonal framing and unusual points of view, now we can use software to defamiliarize our perceptions of visual and media cultures.

ACKNOWLEDGMENTS

Software Studies Initiative research reported in this article was made possible by the generous support of the California Institute for Telecommunication and Information (Calit2), the Center for Research in Computing and the Arts at UCSD (CRCA), and the Chancellor's Office of the University of California, San Diego (UCSD). The development of the custom digital image analysis software and the processing of the manga image set on supercomputers at the National Energy Research Scientific Computing Center (NERSC) were funded by a 2009 NEH Humanities High Performance Computing Award, “Visualizing Patterns in Databases of Cultural Images and Video.” The development of the ImagePlot software was funded by an NEH 2010–2011 Digital Startup Level II grant, “Interactive Visualization of Image Collections for Humanities Research.”

CREDITS

The visualizations, and the data analysis and visualization tools, developed by the Software Studies Initiative are the result of systematic collaboration between the key lab members: Lev Manovich, Jeremy Douglass, William Huber, and Tara Zepel. HyperSpace software for interactive media visualization was developed by So Yamaoka (Gravitylab, Calit2). Batch image processing tools were developed by Sunsern Cheamanunkul (Ph.D. student, Computer Science and Engineering Department, UCSD).

REFERENCES

Salt, Barry. “The Statistical Style Analysis of Motion Pictures.” Film Quarterly 28.1 (1974): 13–22.

Salt, Barry. Film Style and Technology: History and Analysis. London: Starword, 1992.

Cha, Meeyoung, Haewoon Kwak, Pablo Rodriguez, Yong-Yeol Ahn, and Sue Moon. “I Tube, You Tube, Everybody Tubes: Analyzing the World’s Largest User Generated Content Video System.” 2007 ACM Internet Measurement Conference. Web. July 17, 2011. <http://an.kaist.ac.kr/traces/papers/imc131-cha.pdf>.

Cinemetrics.lv. Barry Salt’s Database. Web. August 11, 2011. <http://www.cinemetrics.lv/satltdb.php-cm>.

Crandall, David J., Lars Backstrom, Daniel Huttenlocher, and Jon Kleinberg. “Mapping the World’s Photos.” 18th International Conference on World Wide Web, 2009. Web. July 17, 2011. <http://www.cs.cornell.edu/~dph/papers/photomap-www09.pdf>.

Douglass, Jeremy, William Huber, and Lev Manovich. “Understanding Scanlation: How to Read One Million Fan-Translated Manga Pages.” Image and Narrative 12.1 (2011): 190–228. Brussels. Web. <http://lab.softwarestudies.com/2010/11/one-million-manga-pages.html>.

“Gabor Filter.” Wikipedia. Web. August 11, 2011. <http://en.wikipedia.org/wiki/Gabor_filter>.

Herring, Susan, Lois Ann Scheidt, Inna Kouper, and Elijah Wright. “A Longitudinal Content Analysis of Weblogs: 2003–2004.” Blogging, Citizenship and the Future of Media. London: Routledge, 2006.

IMDb. IMDb Database Statistics. Web. July 17, 2011. <http://www.imdb.com/stats>.

Kwak, Haewoon, Changhyun Lee, Hosung Park, and Sue Moon. “What is Twitter, a Social Network or a News Media?” The 19th International Conference on World Wide Web, 2010. Web. July 17, 2011. <an.kaist.ac.kr/~haewoon/papers/2010-www-twitter.pdf>.

Manovich, Lev. “Cultural Analytics: Visualizing Cultural Patterns in the Era of ‘More Media.’” Domus, Spring 2009. Web. <http://lab.softwarestudies.com/2009/06/publications.html#CA_Publications>.

Manovich, Lev. “Media Visualization: Visual Techniques for Exploring Large Media Collections.” 2010. Media Studies Futures. Ed. Kelly Gates. Blackwell, forthcoming 2012. Web. July 17, 2011. <http://manovich.net/DOCS/media_visualization.2011.pdf>.

McCloud, Scott. Understanding Comics: The Invisible Art. Kitchen Sink Press, 1993.

Moretti, Franco. “Conjectures on World Literature.” New Left Review 1 (2000): 55–67. Web. July 17, 2011. <http://www.newleftreview.org/A2094>.

OneManga. onemanga.com Top 50 Manga List. Web. August 12, 2011. <http://www.onemanga.com/directory/top/>.

Peters, Carol, ed. MultiMatch: Technology-enhanced Learning and Access to Cultural Heritage. D1.1.3 – State of the Art Report Section. 2006. Web. July 17, 2011. <http://www.multimatch.eu/docs/publicdels/D1.1.2.pdf>.

Software Studies Initiative. ImagePlot: Open Source Software for Visualization of Large Image and Video Collections. 2011. Web. <http://lab.softwarestudies.com/p/imageplot.html>.

Treisman, Anne, and Garry Gelade. “A Feature-Integration Theory of Attention.” Cognitive Psychology 12.1 (1980): 97–136.

Ward, L. M. “Synchronous Neural Oscillations and Cognitive Processes.” Trends in Cognitive Sciences 7.12 (December 2003): 553–559.

Williams, D., N. Martins, M. Consalvo, and J. Ivory. “The Virtual Census: Representations of Gender, Race and Age in Video Games.” New Media & Society 11 (2009): 815–834.

Yamaoka, S., L. Manovich, J. Douglass, and F. Kuester. “Cultural Analytics in Large-Scale Visualization Environments.” Submitted to Computer Graphics Forum, 2011.