Recorded Future – A White Paper on Temporal Analytics

Post on 02-Apr-2018

7/27/2019 Recorded Future – A White Paper on Temporal Analytics

http://slidepdf.com/reader/full/recorded-future-a-white-paper-on-temporal-analytics 1/19

on traditional text search, using various algorithms but really looking at individual documents in isolation.

Google changed that, with its public debut in 1998. Google’s second generation search engine is based on ideas from an experimental search engine called BackRub. At its heart is the PageRank algorithm, and this is the core of Google’s success (together with clever advertising-based revenue models!). The main idea of the PageRank algorithm is to analyze links between web pages, and to rank a page based on the number of links pointing to it, and (recursively) the rank of the pages pointing to it. This use of explicit link analysis has proven to be tremendously useful and surprisingly robust (even though Google continuously has to tweak its algorithms to combat attempts to manipulate the ranking algorithm).
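
The recursive ranking idea described above can be illustrated with a small sketch. This is the commonly published power-iteration formulation of PageRank, not Google's production algorithm; the damping factor of 0.85, the handling of pages without outgoing links, and the example graph are assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by the number and rank of the pages linking to them.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on in equal parts to its targets.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page without links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: most links point to page "c".
example_links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ranks = pagerank(example_links)
```

In this toy graph, page "c" ends up with the highest rank because three pages link to it, while page "d", which nothing links to, ends up with the lowest.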

Recorded Future is developing a third generation analytics engine, which goes beyond explicit link analysis and adds implicit link analysis, by looking at the “invisible links” between documents that talk about the same, or related, entities and events. We do this by separating the documents and their content from what they talk about – the “canonical” entities and events (yes, this model is heavily inspired by Plato and his distinction between the real world and the world of ideas).

Documents contain references to these canonical entities and events, and we use these references to rank canonical entities and events based on the number of references to them, the credibility of the documents (or document sources) containing these references, and several other factors (for example, co-occurrence of different events and entities in the same or in related documents is also used for ranking). This ranking measure – called momentum – is our aggregate judgment of how interesting or important an entity or event is at a certain point in time – note that over time, the momentum measure of course changes, reflecting a dynamic world.
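
As a rough illustration of what an “invisible link” could look like in data, the sketch below connects two documents whenever they refer to the same canonical entity. The function name, the input format, and the sample documents are hypothetical; the actual analysis also considers related entities, events, and other factors that are not modeled here.

```python
from collections import defaultdict
from itertools import combinations


def implicit_links(doc_entities):
    """Return pairs of documents that mention at least one common entity.

    `doc_entities` maps a document id to the set of canonical entities it
    refers to. The result maps each (doc_a, doc_b) pair to the shared entities.
    """
    # Invert the mapping: which documents refer to each entity?
    docs_by_entity = defaultdict(set)
    for doc, entities in doc_entities.items():
        for entity in entities:
            docs_by_entity[entity].add(doc)

    # Any two documents that share an entity are implicitly linked.
    links = defaultdict(set)
    for entity, docs in docs_by_entity.items():
        for a, b in combinations(sorted(docs), 2):
            links[(a, b)].add(entity)
    return dict(links)


# Hypothetical documents and the entities detected in them.
documents = {
    "doc1": {"Hillary Clinton", "Haiti"},
    "doc2": {"Hillary Clinton", "Ban Ki-Moon"},
    "doc3": {"Google"},
}
links = implicit_links(documents)
```

Here `doc1` and `doc2` become linked through the shared entity "Hillary Clinton", even though neither document contains a hyperlink to the other.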

In addition to extracting event and entity references, Recorded Future also analyzes the “time and space dimension” of documents – references to when and where an event has taken place, or even when and where it will take place – since many documents actually refer to events expected to take place in the future. We are also adding more components, e.g. sentiment analyses, which determine what attitude an author has towards his/her topic, and how strong that attitude is – the affective state of the author.

The semantic text analyses needed to extract entities, events, time, location, sentiment etc. can be seen as an example of a larger trend towards creating “the semantic web”.

The time and space analysis described above is the first way in which Recorded Future can make predictions about the future – by aggregating weighted opinions about the likely timing of future events using algorithmic crowd sourcing. In addition to this, we can use statistical models to predict future happenings based on historical records of chains of events of similar kinds.
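
One simple way to aggregate weighted opinions about timing is a weighted median of the predicted dates. The sketch below assumes that each opinion carries a weight standing in for source credibility; the choice of a weighted median, the function name, and the sample data are illustrative assumptions, not a description of the actual aggregation method.

```python
from datetime import date


def weighted_median_date(opinions):
    """Return the weighted median of predicted dates.

    `opinions` is a list of (predicted_date, weight) pairs, where the weight
    stands in for the credibility of the source making the prediction.
    """
    if not opinions:
        raise ValueError("at least one opinion is required")
    ordered = sorted(opinions, key=lambda pair: pair[0])
    half = sum(weight for _, weight in ordered) / 2.0
    running = 0.0
    for predicted, weight in ordered:
        running += weight
        # The first date at which half of the total weight is reached.
        if running >= half:
            return predicted


# Hypothetical predictions about when some future event will happen.
opinions = [
    (date(2010, 6, 1), 1.0),
    (date(2010, 7, 15), 3.0),   # the most credible source
    (date(2010, 12, 1), 0.5),
]
estimate = weighted_median_date(opinions)
```

A weighted median is less sensitive than a weighted mean to a single outlying prediction, which is one reason it might be preferred for this kind of aggregation.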

The combination of automatic event/entity/time/location extraction, implicit link analysis for novel ranking algorithms, and statistical prediction models forms the basis for Recorded Future’s temporal analytics engine. Our mission is not to help our customers find documents, but to enable them to understand what is happening in the world.

Recorded Future and Business Intelligence

There has been a long path of innovation in systems for business intelligence – trying to help decision makers in companies and organizations make better, data-driven decisions. We’d like to think of these in three generations as well:

•  First generation business intelligence tools (BI) were all about reporting and OLAP cubes, typically taking historical financial, sales, and manufacturing information and organizing it for analysis. Very helpful – but very focused on providing a rear-view mirror view of the world.

•  Second generation business intelligence was all about real time – hooking into real time data sources as well as real time user interaction – allowing decision makers to both look at very timely data as well as adjust and interact with such views at high pace.

•  Third generation business intelligence, we would like to believe, will be all about looking outside corporations and generating data and analytics for decision making based on the world, not just old historical enterprise data. This is Recorded Future.

Recorded Future at Work

To illustrate these ideas, we’ll present a simple example. Assume we have a set of different sources from the net, as illustrated in this picture:

From these sources, we harvest documents, either from RSS feeds or other forms of web harvesting. An example data set might contain the following documents with short text snippets in them:

Our analysis first detects entities mentioned in the document, and decides which entity category they belong to (in this example, blue for Companies, orange for Persons, and green for Cities):

Next, events involving these entities are detected; in this example five different kinds of events:

These are the canonical events; we now add event references/instances derived from the different documents (and the same for entity instances, but for the sake of graphical clarity these are not included in these pictures):

Once this analysis is completed, we can actually dispose [1] of the original texts, since we have completed the transition from the text to the data domain:

[1] We do keep references to the original documents, but we do not store any copy of the actual text.

System Architecture

The Recorded Future system contains many components, which are summarized in the following diagram:

The system is centered round the database, which contains information about all canonical events and entities, together with information about event and entity references (sometimes also called instances), documents containing these references, and the sources from which these documents were obtained.
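
To make the relationships between these kinds of records concrete, here is a minimal relational sketch using SQLite. The table and column names are hypothetical and chosen only for illustration; the real database design is not described in this paper. Documents are represented by a reference (a URL) only, with no stored copy of the text.

```python
import sqlite3

# Hypothetical schema: sources publish documents, and documents contain
# instances (references) that point to canonical entities or events.
SCHEMA = """
CREATE TABLE source    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE document  (id INTEGER PRIMARY KEY,
                        source_id INTEGER NOT NULL REFERENCES source(id),
                        url TEXT NOT NULL);
CREATE TABLE canonical (id INTEGER PRIMARY KEY,
                        kind TEXT NOT NULL CHECK (kind IN ('entity', 'event')),
                        name TEXT NOT NULL);
CREATE TABLE instance  (id INTEGER PRIMARY KEY,
                        canonical_id INTEGER NOT NULL REFERENCES canonical(id),
                        document_id INTEGER NOT NULL REFERENCES document(id));
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO source (id, name) VALUES (1, 'New York Times')")
conn.execute(
    "INSERT INTO document (id, source_id, url) "
    "VALUES (1, 1, 'http://example.com/a')"
)
conn.execute(
    "INSERT INTO canonical (id, kind, name) VALUES (1, 'entity', 'Haiti')"
)
conn.execute("INSERT INTO instance (canonical_id, document_id) VALUES (1, 1)")

# Which sources have referred to which canonical entities or events?
rows = conn.execute(
    """
    SELECT canonical.name, source.name
    FROM instance
    JOIN canonical ON canonical.id = instance.canonical_id
    JOIN document ON document.id = instance.document_id
    JOIN source ON source.id = document.source_id
    """
).fetchall()
```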

There are five major blocks of system components working with this database:

-  Harvesting – in which text documents are retrieved from various sources on the net and stored in the database (temporarily for analysis, longer term only if permitted by terms of use and IPR legislation).

-  Linguistic analysis – in which the retrieved texts are analyzed to detect event and entity instances, time and location, text sentiment etc. This is the step that takes us from the text domain to the data domain. This is also the only language dependent component of the system; as we are adding support for multiple languages, new modules are introduced here. We are using industry leading linguistics platforms for some of the underlying analyses, and combine them with our own analysis tools when necessary.

-  Refinement – in which data is analyzed to obtain more information; this includes calculating the momentum of entities, events, documents and even sources (see next section), calculation of sentiment, synonym detection, and ontology analysis.

-  Data analysis – in which different statistical and AI based models are applied to the data to detect anomalies in the data and to generate predictions about the future, based either on actual statements in the texts or other models for generalizing trends or hypothesizing from previous examples.

-  User experience – the different user interfaces to the system, including the web interface, overview dashboard, alert mechanisms, and the API for interfacing to other systems.
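
The linguistic analysis block, which moves from the text domain to the data domain, can be sketched in a deliberately simplified form. The example below uses a tiny hand-written lookup table (a gazetteer) to map surface forms to canonical entities. This is a toy stand-in and an assumption made for illustration; the system described here relies on full linguistics platforms, and the table contents and function name are hypothetical.

```python
import re

# Hypothetical gazetteer: surface form -> (canonical entity, category).
GAZETTEER = {
    "Hillary Clinton": ("Hillary Clinton", "Person"),
    "Clinton": ("Hillary Clinton", "Person"),
    "Haiti": ("Haiti", "Location"),
}


def extract_entity_instances(text):
    """Return (canonical entity, category, offset) for each mention in text."""
    instances = []
    taken = []  # character spans already matched by a longer surface form
    # Try longer surface forms first so "Hillary Clinton" wins over "Clinton".
    for surface in sorted(GAZETTEER, key=len, reverse=True):
        pattern = r"\b" + re.escape(surface) + r"\b"
        for match in re.finditer(pattern, text):
            start, end = match.span()
            if any(start < t_end and t_start < end for t_start, t_end in taken):
                continue  # overlaps a mention that was already recorded
            taken.append((start, end))
            canonical, category = GAZETTEER[surface]
            instances.append((canonical, category, start))
    return sorted(instances, key=lambda item: item[2])


text = "Hillary Clinton will be travelling to Haiti next week"
instances = extract_entity_instances(text)
```

After this step the text has been reduced to structured records: one Person instance and one Location instance, each pointing to a canonical entity.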

Momentum

To find relevant information in the sea of data produced by our system, we need some relevance measure. To this end, we have developed “momentum” – a relevance measure for events and entities which takes into account the flow of information about an entity/event, the credibility of the sources from which that information is obtained, the co-occurrence with other events and entities, and so on. Momentum is for example used to present results in most relevant order, and can also be utilized to find similarities between different events and entities.
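
The exact momentum formula is not given in this paper. Purely as an illustration of how flow of information and source credibility could be combined, the sketch below sums credibility weights over references and lets older references count for less. The exponential decay, the seven-day half-life, and all names are assumptions, and co-occurrence is left out entirely.

```python
def momentum(references, now, half_life_days=7.0):
    """Score an entity or event from its references (illustrative only).

    `references` is a list of (day, source_credibility) pairs, where `day` is
    the day number on which the reference was published.
    """
    score = 0.0
    for day, credibility in references:
        age = now - day
        if age < 0:
            continue  # ignore references published after `now`
        # Assumption: a reference loses half its weight every half-life.
        score += credibility * 0.5 ** (age / half_life_days)
    return score


# Hypothetical reference logs for two events.
reference_log = {
    "event A": [(10, 1.0), (10, 0.5)],  # two fresh references
    "event B": [(3, 1.0)],              # one reference, a week old
}
ranked = sorted(
    reference_log,
    key=lambda name: momentum(reference_log[name], now=10),
    reverse=True,
)
```

Because the score depends on `now`, the same event receives a different value on different days, which matches the idea that momentum changes over time.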

User Experience

End users interact with Recorded Future through a series of rich user experiences. The analytics query interface allows users to specify events (such as “PersonTravel”), entities (such as “Hu Jintao”) and time intervals (such as “2009” or “Anytime in the Future”):

The results can then be analyzed in several different views (details, charts, timelines):

Videos showing the use of the system are available at: http://www.youtube.com/recordedfuture

Finally, end users can easily subscribe to email alerts (called Futures) corresponding to interesting queries. Live visualizations with up-to-date data from Recorded Future can also be embedded in blogs, etc.

Futures

Futures are a way of storing analytic questions and having Recorded Future monitor them with respect to the continuous flow of data from the world. Any query in Recorded Future can be turned into a Future at the click of a green button:

When a Future is defined, the frequency of updates can be specified (and of course changed later), and the Future can also be shared with others:

Futures are then delivered as they are detected by Recorded Future, in a rich email format which works well on both large and small screen devices:

API

Developers can access Recorded Future data and analytics through a web services API (documentation available at http://code.google.com/p/recordedfuture). Queries to the system are expressed using JSON (http://json.org/) and results are provided as JSON or CSV text. The API can be used to interface Recorded Future with statistics software such as R (http://www.r-project.org/) or visualization software such as Spotfire (http://spotfire.tibco.com/), as well as proprietary analytics applications.
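
The general shape of such an exchange, a JSON query going in and CSV text coming back, might look like the sketch below. Every field name and the sample response are hypothetical placeholders; the actual query format is defined in the API documentation and may differ. No network request is made here.

```python
import csv
import io
import json

# Hypothetical query structure; the real field names are defined in the API
# documentation and may differ.
query = {
    "event_type": "PersonTravel",
    "entity": "Hu Jintao",
    "time_interval": {"min": "2009-01-01", "max": "2009-12-31"},
    "output_format": "csv",
}
request_body = json.dumps(query, sort_keys=True)

# Made-up CSV response with placeholder values, used only to show how
# results delivered as CSV text could be parsed.
sample_response = (
    "event_type,entity,location,start\n"
    "PersonTravel,Hu Jintao,Example City,2009-01-01\n"
)
rows = list(csv.DictReader(io.StringIO(sample_response)))
```

Parsing the CSV into dictionaries keyed by column name makes it straightforward to hand the rows on to statistics or visualization tools.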

Examples of applications of the Recorded Future API include:

•  Algorithmic trading – using the Recorded Future data stream to enhance automated trading/risk decision making, e.g. by monitoring momentum and sentiment development of companies in a portfolio.

•  Media monitoring – building new applications that monitor social as well as traditional media coverage of a company, industry sector, organization, or country.

•  Dashboards – using the Recorded Future data stream to display novel, externally oriented, indicators of the world, like the following very simple example:

•  Geographical information accessed through the API can easily be used to present results in 3rd party applications such as Google Maps and Google Earth:

A Final Word

Recorded Future brings a paradigm shift to analytics, by focusing on time as an essential aspect of the analyst’s work. Sophisticated linguistic and statistical analyses combined with innovative user interfaces and a powerful API bring new opportunities to both human analysts and developers of 3rd party analytics systems. We continuously develop all these aspects of our system to bring new tools into the analysts’ hands - the future has only just begun!

"Thus, what enables the wise sovereign and the good general to strike and conquer, and achieve things beyond the reach of ordinary men, is foreknowledge."

(from The Art of War by Sun Tzu, Section 13)

WHITE PAPER ADDENDUM

Plato, the Cave, and Recorded Future

Staffan Truvé, Ph.D.

To understand the philosophy behind Recorded Future, it is helpful to consider the famous “cave allegory” by Plato:

Plato imagines a group of people who have lived chained in a cave all of their lives, facing a blank wall. The people watch shadows projected on the wall by things passing in front of a fire behind them, and begin to ascribe forms to these shadows. According to Plato, the shadows are as close as the prisoners get to seeing reality. He then explains how the philosopher is like a prisoner who is freed from the cave and comes to understand that the shadows on the wall are not constitutive of reality at all, as he can perceive the true form of reality rather than the mere shadows seen by the prisoners. (en.wikipedia.org/wiki/Allegory_of_the_Cave)

(image from www.thatmarcusfamily.org/philosophy/Course_Websites/Phil_Math/Photos/Cave.jpg)

What we read in newspapers, blogs etc. is not unlike the shadows on the wall of the cave – we get reports about events in the real world, and attempt to use that information to get an idea about what is really happening. As good analysts, we naturally consult several sources, and weigh together the information obtained from them – always keeping in mind that some sources are more credible than others, and thus should be given higher weight. We call the evidence we get from different reports “event instances”, and the real world events they report on we refer to as “canonical events”.

A canonical event, in our system, is a representation of a particular happening in the real world. For example, assume we read the following statement in the New York Times:

“Barack Obama said yesterday that Hillary Clinton will be travelling to Haiti next week”

This statement describes two events: a canonical “Quotation” event and a canonical “PersonTravel” event.

The quotation event refers to a canonical entity, “Barack Obama”, and a statement “Hillary Clinton will be travelling to Haiti next week”. It has an associated time, “yesterday”.

The “PersonTravel” event includes references to two canonical entities, “Hillary Clinton” and “Haiti”, and has an associated time “next week”.

Note that “yesterday” and “next week” are relative times, and to place them on an absolute time axis we need to know when the entire statement was uttered. Let us assume that the statement was uttered on Wednesday, March 17th. Then we might represent the statement pictorially in the following way [2]:

[2] Note that “next week” is culturally dependent – in the US, weeks begin on Sundays whereas in many other countries they begin on Mondays!
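
Placing these relative times on an absolute axis can be sketched as follows. The text gives only "Wednesday, March 17th", so the year 2010 (in which March 17th fell on a Wednesday) is an assumption, as are the function names. The `week_starts_on_sunday` flag reflects the cultural difference mentioned in the footnote.

```python
from datetime import date, timedelta


def yesterday(uttered):
    """Return the day before the statement was uttered."""
    return uttered - timedelta(days=1)


def next_week(uttered, week_starts_on_sunday=False):
    """Return the (first_day, last_day) of the week after `uttered`."""
    if week_starts_on_sunday:
        days_since_start = (uttered.weekday() + 1) % 7  # Sunday -> 0
    else:
        days_since_start = uttered.weekday()  # Monday -> 0
    start_of_this_week = uttered - timedelta(days=days_since_start)
    first = start_of_this_week + timedelta(days=7)
    return first, first + timedelta(days=6)


uttered = date(2010, 3, 17)  # a Wednesday; the year is an assumption

quotation_time = yesterday(uttered)
travel_interval_monday = next_week(uttered)
travel_interval_sunday = next_week(uttered, week_starts_on_sunday=True)
```

With weeks starting on Monday, "next week" resolves to March 22nd through March 28th; with weeks starting on Sunday, it resolves to March 21st through March 27th.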

In our system, this statement will be represented in the following way:

We have three canonical entities: Barack Obama and Hillary Clinton, which are Person entities [blue rectangles], and Haiti, a Location entity [green rectangle].

There are two canonical events [red ovals] – “Quotation by Barack Obama” and “PersonTravel of Hillary Clinton to Haiti”.

Furthermore, there are instances of these events [pink ovals], which are tagged by the time or time interval during which they are expected to have occurred or will occur.

The Quotation instance also has a reference to the text of the quote and to the instance of the event referenced in the quote.

Finally, both instances refer to the text fragment representing the original statement, and the fragment refers to its source – the New York Times.
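
The structure just described can also be written down as a small data model. The class and field names below are hypothetical and only mirror the relationships listed above (entities, canonical events, event instances, text fragment, source); they are not the system's actual schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Entity:
    name: str
    kind: str  # e.g. "Person" or "Location"


@dataclass(frozen=True)
class Source:
    name: str


@dataclass(frozen=True)
class Fragment:
    text: str
    source: Source


@dataclass(frozen=True)
class CanonicalEvent:
    kind: str
    entities: Tuple[Entity, ...]


@dataclass(frozen=True)
class EventInstance:
    event: CanonicalEvent
    time: str  # still relative here, e.g. "yesterday"
    fragment: Fragment
    quote_text: Optional[str] = None
    quoted_instance: Optional["EventInstance"] = None


obama = Entity("Barack Obama", "Person")
clinton = Entity("Hillary Clinton", "Person")
haiti = Entity("Haiti", "Location")

nyt = Source("New York Times")
fragment = Fragment(
    "Barack Obama said yesterday that Hillary Clinton will be travelling "
    "to Haiti next week",
    nyt,
)

travel = CanonicalEvent("PersonTravel", (clinton, haiti))
quotation = CanonicalEvent("Quotation", (obama,))

travel_instance = EventInstance(travel, "next week", fragment)
quotation_instance = EventInstance(
    quotation,
    "yesterday",
    fragment,
    quote_text="Hillary Clinton will be travelling to Haiti next week",
    quoted_instance=travel_instance,
)
```

Keeping canonical events separate from their instances means that a second report of the same trip would add another `EventInstance` pointing at the existing `travel` event rather than creating a new event.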

Multiple text documents, retrieved from different sources, can of course be used to gather evidence of the same canonical event, i.e., to provide different instances of the canonical event. Several different canonical events – and instances – will also refer to the same entities. To extend our example, let’s add the statement:

“Hillary Clinton to meet with Ban Ki-Moon in Port au Prince on March 23rd”

The representation of our “world knowledge” will then be updated to:

Is this all we know? Not really! Recorded Future also maintains an ontology [3], with additional information about canonical entities and their relationships. In this particular example, the following information can be found in our database:

[3] Ontology is the philosophical study of the nature of being, existence or reality in general, as well as the basic categories of being and their relations. Traditionally listed as a part of the major branch of philosophy known as metaphysics, ontology deals with questions concerning what entities exist or can be said to exist, and how such entities can be grouped, related within a hierarchy, and subdivided according to similarities and differences. (http://en.wikipedia.org/wiki/Ontology)

Combining the information derived from analyzed text and the ontology gives us the following picture for this minimal example. In the real Recorded Future database, there are millions of event instances. This should give you an idea about how the richness of Recorded Future data can help you in analyzing events in the real world!

Additional reading on our blogs:

Company Updates: http://blog.recordedfuture.com

Government & Intelligence examples: http://www.AnalysisIntelligence.com

Finance & Statistics examples: http://www.PredictiveSignals.com