Wed 1130 aasman_jans_color

Post on 20-Aug-2015

Transcript of Wed 1130 aasman_jans_color

When a relational database doesn’t work

And why a graph database might help

Contents

• Franz and customers
• Two use cases
  – Amdocs: a real-time semantic platform for telecom that knows everything about everyone in real time
  – Real-time news and social network analysis using the Linked Open Data Cloud
• Scalability?
• Integration with other NoSQL databases – Solr, MongoDB

Franz Inc – Who We Are

• Private, founded 1984
• We are an AI and Semantic Technology company
• Out of Berkeley

(1 (2 3) (4 5) (6 7) (8 9) (10 11) (12 13) (14 15)(16 17) (18 19 20 21 22 23 24 27 28) (29 30))

[Figure: graph with nodes Bob, Alice, Craig and Bill]

How is it different from an RDB and why is it more flexible?

• No Schema
  – Say whatever you want to say, but
  – ontologies may constrain what you put in the triple store
• No Link Tables
  – because you can do one-to-many relationships directly
• No Indexing Choices
  – Can add new data attributes (predicates) on-the-fly that will be real-time available for querying, because everything is automatically indexed.
• Takes anything you give it: it is trivial to consume
  – Rows and columns from RDB, XML, RDF(S), OWL, Text and Extracted Entities, JSON
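The points above can be illustrated with a toy in-memory triple store. This is a minimal Python sketch and not AllegroGraph's API: the `TinyTripleStore` class, its methods, and the `knows` and `favoriteColor` predicates are all hypothetical. The person names are the node labels from the earlier graph figure.

```python
class TinyTripleStore:
    """Toy triple store (hypothetical, for illustration only)."""

    def __init__(self):
        self.triples = set()
        # Every triple is indexed by subject, predicate and object,
        # so there are no indexing choices to make up front.
        self.by_s, self.by_p, self.by_o = {}, {}, {}

    def add(self, s, p, o):
        t = (s, p, o)
        self.triples.add(t)
        self.by_s.setdefault(s, set()).add(t)
        self.by_p.setdefault(p, set()).add(t)
        self.by_o.setdefault(o, set()).add(t)

    def match(self, s=None, p=None, o=None):
        """Return sorted triples matching the pattern; None is a wildcard."""
        candidates = set(self.triples)
        if s is not None:
            candidates &= self.by_s.get(s, set())
        if p is not None:
            candidates &= self.by_p.get(p, set())
        if o is not None:
            candidates &= self.by_o.get(o, set())
        return sorted(candidates)


store = TinyTripleStore()
# One-to-many relationship stated directly, with no link table.
store.add("Bob", "knows", "Alice")
store.add("Bob", "knows", "Craig")
store.add("Bob", "knows", "Bill")
# A new predicate added on the fly; it is queryable immediately.
store.add("Bob", "favoriteColor", "blue")

friends = [o for _, _, o in store.match(s="Bob", p="knows")]
```

In a relational design the `knows` relationship would typically need a separate link table, and `favoriteColor` would need a schema change before it could be stored.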

AllegroGraph: RDF Graph Store

[Architecture diagram. Interfaces and languages: REST, Java, Sparql, Prolog, Rules, Clif++, Geo, SNA, Time, RDFS+, JavaScript. Core: Session Management, Query Engine, Federation. Storage layer (compression, indexing, freetext, transactions). Operations: Backup/Restore, Replication, Warm Failover, Security, Management.]

Use Case Amdocs

Build a semantic platform that knows everything about everyone in real time.

Telco Call Center Volume Quadruples Since 2007

• On average, each call
  – Lasts 10 minutes
  – Goes through 68 screens
• One call costs 3 months’ profit from that customer
• It’s getting worse every day!

Typical Interaction Begins in the Dark

[Figure: information sources around the agent: Bill, Plan, Past Payments, Device, Calculator (avg peak usage), Past Interactions (Memos), Statements]

The unknown – why calling? How to help?

No real-time context - insight & guidance

High AHT, poor FCR, low customer and agent satisfaction

AIDA Maps Events to Concepts

Events from many source systems are transformed into a set of related business concepts

[Figure: many events mapped into a triple store with business concepts]

Many events: Interactions, Bills, Orders, Payments, Collections, Charge dispute, Pay instructions, Device Activated, Device heartbeat, Device changes, Subscriptions

Triple store with business concepts (Individual, Customer):

• Subjective "good payer"
• Patterns "always pays 2 days late"
• Trends "improving payer"
• Geospatial "within 5 miles of the tower"
• Time "within 5 minutes of an outage"
• Chronology of events
• Probability "probably will call about the bill"
• Absence of occurrence "missed payment"
• Relationship between "friend of a friend"

[Architecture diagram. Labels: Operational Systems (CRM, RM, OMS); Event Data Sources (NW, Web 2.0); Amdocs Event Collector; Amdocs Integration Framework; Events and Scheduled Events; Event Ingestion; Inference Engine (Business Rules); Bayesian Belief Network; Decision Engine; Actions; Containers on the SBA Application Server; "Sesame"; AllegroGraph Triple Store DB]

AIDA Event Collection

[Figure: Amdocs Event Collector stages (Event Sources, Collection, Parsing, Mapping, Publishing), Ingestion, Inference & Decision]

• Events are collected from many heterogeneous, configured event sources
  – Phone calls, texting, video upload, roaming, etc.
  – iTunes download, web site interaction, media upload
  – Emails, support calls
  – Bill payment or non-payment
  – Phones stop working or disconnect
• All fused and mapped into a single event knowledge base
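The fusing step can be pictured as mapping each source-specific event record onto triples in one shared set. The following is a minimal Python sketch under stated assumptions: the `aida:` prefix, the field names (`id`, `kind`, `customer`, `time`) and the sample events are hypothetical and are not taken from the Amdocs Event Collector.

```python
CORE_FIELDS = {"id", "kind", "customer", "time"}


def event_to_triples(event):
    """Map one raw event dict to (subject, predicate, object) triples."""
    event_id = f"event:{event['id']}"
    triples = [
        (event_id, "rdf:type", f"aida:{event['kind']}"),
        (event_id, "aida:customer", f"customer:{event['customer']}"),
        (event_id, "aida:timestamp", event["time"]),
    ]
    # Source-specific fields become extra predicates; no per-source
    # table or schema change is needed.
    for key, value in event.items():
        if key not in CORE_FIELDS:
            triples.append((event_id, f"aida:{key}", value))
    return triples


# Two events from different (hypothetical) source systems.
raw_events = [
    {"id": 1, "kind": "PhoneCall", "customer": "c42",
     "time": "2010-06-01T10:00:00", "durationSec": 600},
    {"id": 2, "kind": "BillPayment", "customer": "c42",
     "time": "2010-06-03T09:30:00", "amount": 55.0},
]

knowledge_base = set()
for raw in raw_events:
    knowledge_base.update(event_to_triples(raw))
```

Both events end up in the same `knowledge_base`, linked through the shared `customer:c42` node.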

AIDA Semantic Inference

• Define rules that create higher-level concepts
  – Event (mapping) rules - map event data into the domain ontology
  – Automatic rules - compute new properties defined by the ontology
  – On-demand rules - perform inference for the services
• Rules triggered upon event ingestion, service request or schedule
• Semantic rule inference generates new triples from existing ones
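"Generates new triples from existing ones" can be sketched as simple forward chaining: apply every rule to the known triples and repeat until nothing new appears. This is an illustrative Python sketch only, not AIDA's inference engine; the predicates (`aida:billedTo`, `aida:paysBill`, `aida:paidBy`, `aida:hasPayment`) and both rules are hypothetical.

```python
def infer_payment_customer(triples):
    """Rule: a payment for a bill was made by that bill's customer."""
    owner = {s: o for s, p, o in triples if p == "aida:billedTo"}
    return {
        (payment, "aida:paidBy", owner[bill])
        for payment, p, bill in triples
        if p == "aida:paysBill" and bill in owner
    }


def infer_has_payment(triples):
    """Rule: the inverse link from a customer to each payment."""
    return {
        (customer, "aida:hasPayment", payment)
        for payment, p, customer in triples
        if p == "aida:paidBy"
    }


def forward_chain(triples, rules):
    """Apply rules repeatedly until no new triples are produced."""
    known = set(triples)
    while True:
        new = set()
        for rule in rules:
            new |= rule(known) - known
        if not new:
            return known
        known |= new


facts = {
    ("bill:7", "aida:billedTo", "customer:c42"),
    ("payment:3", "aida:paysBill", "bill:7"),
}
closure = forward_chain(facts, [infer_has_payment, infer_payment_customer])
```

The second rule only fires after the first has added its `aida:paidBy` triple, which is why the loop runs to a fixpoint instead of making a single pass.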

[Figure: ontology fragment with Customer, Bills, Charges, Amount, Due Date, Payments, Payment Pattern (Good, Bad, Improving, Worsening), "Timeliness" (OnTime, Early, Late), Devices (Make, Model, Status)]

Semantic Inference – Using Business Rules to generate high level concepts

• AIDA provides Workbench for business rule construction
• Utilizes a sophisticated magnetic block GUI for business analysts
• Rules triggered to infer and generate new business concepts

“Late Payment” defined in Workbench

rule PaymentDetails.timeliness
{
  if date within EarlyPeriod days after customerBill.billDate
  then timeliness = Early ;
  else if date not within LatePeriod days after customerBill.billDate
  then timeliness = Late ;
  else timeliness = OnTime ;
}

Each business rule defines an attribute. This rule defines an attribute of the PaymentDetails class called timeliness.

All classes and their attributes are defined in the application ontology.

Java code
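For readers who want to trace the logic of the timeliness rule, here is a minimal Python sketch of the same three-way decision. It is not the code the Workbench generates. The period values (10 and 30 days) are assumptions, since the slides do not give `EarlyPeriod` or `LatePeriod`, and "within N days after D" is read as the closed interval from D to D plus N days.

```python
from datetime import date

EARLY_PERIOD_DAYS = 10  # assumed value, not given in the slides
LATE_PERIOD_DAYS = 30   # assumed value, not given in the slides


def timeliness(payment_date, bill_date,
               early_period=EARLY_PERIOD_DAYS,
               late_period=LATE_PERIOD_DAYS):
    """Classify a payment as Early, Late or OnTime relative to the bill date."""
    days_after = (payment_date - bill_date).days
    if 0 <= days_after <= early_period:
        return "Early"
    # Literal reading of the rule: anything outside the late window is
    # Late, which also covers a payment dated before the bill date.
    if not (0 <= days_after <= late_period):
        return "Late"
    return "OnTime"


bill_date = date(2010, 6, 1)
early = timeliness(date(2010, 6, 5), bill_date)     # 4 days after
on_time = timeliness(date(2010, 6, 20), bill_date)  # 19 days after
late = timeliness(date(2010, 7, 15), bill_date)     # 44 days after
```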

Decisioning – Probabilistic Assessment

• AIDA also incorporates Bayesian Belief Networks (BBN)
• These are graphical models for reasoning under uncertainty
• Important part of decision making – the likelihood of something happening, estimated by how often it occurred in the past (primarily used in medical research until recently)
• Evidence consists of observations on certain nodes leading to conclusions

[Figure: belief network linking Evidence to Conclusions; node labels: Payment Pattern, Bill, Payment, Expect Payment Arrangement Setup, Expect Payment]
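The idea of estimating a likelihood from past frequency and then updating it on evidence can be shown with a single Bayes' rule step. This Python sketch covers one evidence node and one conclusion, not a full belief network, and it is not AIDA's implementation. The history counts are invented for illustration.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Bayes' rule for a binary hypothesis H and one observed piece of evidence."""
    joint_h = prior * p_evidence_given_h
    joint_not_h = (1.0 - prior) * p_evidence_given_not_h
    return joint_h / (joint_h + joint_not_h)


# Hypothetical history of (payment_was_late, customer_called_about_bill).
history = (
    [(True, True)] * 30
    + [(True, False)] * 10
    + [(False, True)] * 20
    + [(False, False)] * 140
)

# Split the "late payment" observations by outcome.
late_when_called = [late for late, called in history if called]
late_when_not_called = [late for late, called in history if not called]

# Probabilities estimated by how often things occurred in the past.
prior_call = len(late_when_called) / len(history)
p_late_given_call = sum(late_when_called) / len(late_when_called)
p_late_given_no_call = sum(late_when_not_called) / len(late_when_not_called)

# Evidence observed: the payment was late. Conclusion: chance of a bill call.
p_call_given_late = posterior(prior_call, p_late_given_call, p_late_given_no_call)
```

With these counts the prior probability of a bill-related call is 0.25, and observing a late payment raises it to 0.75, matching the direct frequency of 30 calls out of 40 late payments.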

Presenting insight to the CSR

• Prediction on reason for the call – ranked by probability
• Process opens relevant screen for reference and action
• Presentation of recent interactions and events
• Prioritized recommended treatment and script

First application: CRM

Amdocs Guided Interaction Advisor

First Call Resolution
• Increase up to 15%

Average Handling Time
• Reduce up to 30%

Training Costs
• Reduce up to 25%

Triples all the way down

So why a triple store

• Flexibility, flexibility and flexibility
  – Change the schema on a daily basis
  – Customers create new policies which in turn will create new schemas on the fly
• Needed to work with meaning
  – RDF describes data
• Needed to be declarative for everything
  – Most RTBI is a combination of data in the DB and Java variables in the application.

Text Intelligence for DOD/IS

How would you do this with your standard search engine

• Give me a newspaper text with a republican and a democrat that serve on two subcommittees that have the same parent committee.
• Which [democrat|republican] is most vocal in the oil spill disaster
• Given this text, find all the other texts that have the same people and the same main topics but not democrats in the text.
• Which newspaper favors [democrats|republicans]
• Which [democrat|republican|senator|representative] gets most of the attention in the last week.
• Give me the distribution of the most important topics yesterday
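The first question is a graph pattern, not a keyword search, which is why it is hard for a standard search engine. As a rough illustration, the Python sketch below answers it over a handful of hand-written triples. All data is invented, and the `party`, `serves-on` and `parent-committee` predicates are hypothetical; only `has-people` follows the naming used in these slides.

```python
from itertools import combinations

# Tiny hypothetical graph: texts, politicians, subcommittees, committees.
triples = {
    ("text:1", "has-people", "pol:A"),
    ("text:1", "has-people", "pol:B"),
    ("text:2", "has-people", "pol:A"),
    ("text:2", "has-people", "pol:C"),
    ("pol:A", "party", "republican"),
    ("pol:B", "party", "democrat"),
    ("pol:C", "party", "democrat"),
    ("pol:A", "serves-on", "sub:energy"),
    ("pol:B", "serves-on", "sub:water"),
    ("pol:C", "serves-on", "sub:aviation"),
    ("sub:energy", "parent-committee", "com:resources"),
    ("sub:water", "parent-committee", "com:resources"),
    ("sub:aviation", "parent-committee", "com:transport"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def share_parent(person_a, person_b):
    """True if the two serve on different subcommittees with a common parent."""
    for sub_a in objects(person_a, "serves-on"):
        for sub_b in objects(person_b, "serves-on"):
            if sub_a != sub_b and (
                objects(sub_a, "parent-committee")
                & objects(sub_b, "parent-committee")
            ):
                return True
    return False


def matching_texts():
    """Texts mentioning a republican and a democrat who satisfy share_parent."""
    hits = set()
    for text in {s for s, p, _ in triples if p == "has-people"}:
        for a, b in combinations(sorted(objects(text, "has-people")), 2):
            pa, pb = objects(a, "party"), objects(b, "party")
            mixed = ("republican" in pa and "democrat" in pb) or (
                "democrat" in pa and "republican" in pb
            )
            if mixed and share_parent(a, b):
                hits.add(text)
    return sorted(hits)


result = matching_texts()
```

Only `text:1` qualifies here: `text:2` also pairs a republican with a democrat, but their subcommittees have different parent committees.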

The process

• We spider daily > 300 on-line newspapers and thousands of blogs
• And search specifically for all the members of the senate and house of representatives and the executive branch
• Apply entity extractor to the text and extract main concepts
  – About 150 triples per text…
• Hook up these concepts with a detailed database of each politician and with information from the linked open data cloud

From News Article to

• People (has-people)
  – And their roles
• Places (has-places)
  – And the county, state, country they are in
• Organizations (has-organizations)
  – Government departments, company names, etc.
• Main Categories (has-domains)
  – Politics, sports, ministries, energy, finance, economics, ecology, oil, mining industry, etc.
• Main Concepts (has-main-groups)
  – Other important nouns and phrases in a text
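A minimal Python sketch of how extractor output might be turned into triples using the predicate names listed above. The shape of the `extracted` dictionary, the `article:1` identifier and the sample values are assumptions; the slides do not describe the entity extractor's actual output format.

```python
# Predicate names as listed on the slide; the dict keys are hypothetical.
PREDICATE_FOR = {
    "people": "has-people",
    "places": "has-places",
    "organizations": "has-organizations",
    "domains": "has-domains",
    "main_groups": "has-main-groups",
}


def article_to_triples(article_id, extracted):
    """Turn per-category extractor output into (article, predicate, value) triples."""
    return [
        (article_id, PREDICATE_FOR[kind], value)
        for kind, values in extracted.items()
        for value in values
    ]


# Invented extractor output for one article.
extracted = {
    "people": ["Jane Doe"],
    "places": ["Berkeley"],
    "organizations": ["Department of Energy"],
    "domains": ["energy", "politics"],
    "main_groups": ["oil spill"],
}

article_triples = article_to_triples("article:1", extracted)
```

Each extracted value becomes one triple, so a longer article with many entities naturally yields many more triples than this toy input.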

LOD cloud – Sept 22 2010

latest LOD cloud

AllegroText

• A little demo?

How scalable is this?

Loading

Queries

• Query planner now takes 99% of SPARQL 1.0, automatically compiles it into query graph flow language…

You can write this by hand if you want to optimize yourself.

This will actually work on Prolog with rules too!

Query performance notes: Wins

• Indices are small enough to fit in memory of conventional machines
• Simultaneous access to indices (see next slide)
• Pipeline architecture
  – Stream-based processing (all nodes can be active in parallel. Most nodes can begin before the end of data is reached.)
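The stream-based idea, where a downstream node starts work before its upstream node has seen all the data, can be illustrated with Python generators. This is only an analogy and not AllegroGraph's query graph flow language: the node names are hypothetical, and generators interleave on a single thread instead of running in parallel.

```python
log = []  # records the order in which nodes touch rows


def scan(rows):
    """Source node: emits rows one at a time."""
    for row in rows:
        log.append(f"scan {row}")
        yield row


def keep_even(rows):
    """Filter node: passes a row on as soon as it qualifies."""
    for row in rows:
        if row % 2 == 0:
            log.append(f"filter {row}")
            yield row


def double(rows):
    """Projection node: transforms each row it receives."""
    for row in rows:
        log.append(f"project {row}")
        yield row * 2


# Nodes are chained; nothing is materialized between them.
results = list(double(keep_even(scan([1, 2, 3, 4]))))
```

In `log`, the entry `project 2` appears before `scan 3`: the last node has already produced output while the first node is still partway through its input.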

The end