Graph Analytics - Titan and Cassandra @NJ Data Science Meetup
Titan
By Isaac Rieksts @IsaacRieksts
These thoughts are my own and do not represent the company
Database Triangle
http://blog.nahurst.com/visual-guide-to-nosql-systems
Cassandra
Tunable consistency
Multiple datacenter support
Built in replication and fault tolerance
CQL query language
Keyspace passwords
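As a rough illustration of what "tunable consistency" means in practice: a read is guaranteed to overlap the latest acknowledged write when the number of replicas written plus the number of replicas read exceeds the replication factor. The sketch below only does that arithmetic; the function names are made up for this example and are not part of any Cassandra driver.

```python
def quorum(replication_factor):
    """Replica count for a QUORUM operation: a strict majority of replicas."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor, write_replicas, read_replicas):
    """True when every read set must overlap every write set (R + W > RF)."""
    return write_replicas + read_replicas > replication_factor


rf = 3  # assumed replication factor, chosen only for illustration
quorum_both = reads_see_latest_write(rf, quorum(rf), quorum(rf))
one_and_one = reads_see_latest_write(rf, 1, 1)
print(quorum(rf), quorum_both, one_and_one)
```

With a replication factor of 3, QUORUM writes plus QUORUM reads overlap, while ONE plus ONE trades that guarantee for lower latency.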
Indexing
Built-in
Fast for exact matches
Lucene
More advanced queries
Good for single box
Elasticsearch
Advanced queries
Large-scale clusters
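To make the difference between the index types concrete, here is a toy sketch: an exact-match index keyed on the whole value next to a tokenized (inverted) index of the kind text-search engines build. It is plain Python with made-up records and helper names; it does not use the Titan, Lucene, or Elasticsearch APIs.

```python
from collections import defaultdict

# Hypothetical records, invented for this illustration.
RECORDS = {1: "General Hospital", 2: "City General Clinic", 3: "General Hospital"}


def build_exact_index(records):
    """Map each full value to the ids that hold exactly that value."""
    index = defaultdict(set)
    for record_id, value in records.items():
        index[value].add(record_id)
    return index


def build_token_index(records):
    """Map each lowercased word to the ids whose value contains it."""
    index = defaultdict(set)
    for record_id, value in records.items():
        for token in value.lower().split():
            index[token].add(record_id)
    return index


exact = build_exact_index(RECORDS)
tokens = build_token_index(RECORDS)

exact_hits = sorted(exact.get("General Hospital", set()))
partial_via_exact = sorted(exact.get("General", set()))
token_hits = sorted(tokens.get("general", set()))
print(exact_hits, partial_via_exact, token_hits)
```

The exact index answers only whole-value lookups, which is why a separate text index is needed for more advanced queries.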
Gremlin vs SPARQL
Gremlin
Support for complex queries
http://gremlindocs.com/
SPARQL
Easy query language
http://www.w3.org/TR/rdf-sparql-query/
Gremlin vs SPARQL example 1
Gremlin
g.v('tg:1')
.out('tg:knows')
SPARQL
SELECT ?x WHERE {
tg:1 tg:knows ?x
}
Gremlin vs SPARQL example 2
Gremlin
g.v('tg:1')
.out('tg:knows')
.out('tg:name')
SPARQL
SELECT ?y WHERE {
tg:1 tg:knows ?x .
?x tg:name ?y
}
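Both queries in the two examples describe the same walk: start at tg:1, follow tg:knows edges, then follow tg:name edges. A minimal in-memory sketch of that walk, assuming a small made-up set of triples (the data and the helper names are illustrative only and use neither Titan nor a SPARQL engine):

```python
from collections import defaultdict

# Hypothetical sample data as (subject, predicate, object) triples.
TRIPLES = [
    ("tg:1", "tg:name", "marko"),
    ("tg:1", "tg:knows", "tg:2"),
    ("tg:1", "tg:knows", "tg:4"),
    ("tg:2", "tg:name", "vadas"),
    ("tg:4", "tg:name", "josh"),
]


def build_index(triples):
    """Index objects by (subject, predicate) so each hop is a dict lookup."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[(subject, predicate)].append(obj)
    return index


def out(index, starts, label):
    """Follow outgoing edges with the given label from every start node."""
    return [obj for start in starts for obj in index.get((start, label), [])]


index = build_index(TRIPLES)
friends = out(index, ["tg:1"], "tg:knows")   # example 1: bindings for ?x
names = out(index, friends, "tg:name")       # example 2: bindings for ?y
print(friends)
print(names)
```

Each chained `.out(...)` step in Gremlin corresponds to one more triple pattern in the SPARQL WHERE clause; here both are the same `out` call applied to the previous result.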
Our Mission
Deliver the most current information on the U.S. healthcare provider universe using integrated solutions in order for customers to:
› Prevent fraud, waste and abuse across the healthcare system
› Comply with evolving state and federal regulations
› Improve market opportunity for non-retail drugs and devices
Health Market Science, a LexisNexis Company
The Business
Business Solutions
Health Care Provider & Facilities
Variety/Velocity
• >2000 sources
• 6 Million unique HCPs
• 10+ years history
Data Challenges
• Constant change in real world data
• Conflicting & partial info
• Frequent changes to source structure
• Authoritative sources vs. crowdsource
• Predicting source quality
Master Data Solutions
Medical Procedures & Diagnosis
Volume/Velocity
• ~1B claims annually
• +5B records annually
• 5+ years history
Data Challenges
• Sources have incomplete capture
• Overlapping source data
• Statistical projections & biases
• Social media type relationships
Medical Claims Data
Batch (CompleteView, Expense Manager)
Transactional (PRS/MDM/VerifyRx)
Big Data Relational DB & Analytics (Claims)
Master Data Management
Visualization
Dashboard / Reports
Structured Storage
Relational
Indexing
Flexible Storage
NoSQL Graph(s)
Interfacing
Web Services
Distributed Processing
Standardize
Validate
Match
Consolidate
Analytics
Data Sources
Government
Web
Customer
I’m happy
User Interface
Our use of Titan
Link storage
Analytics of links
Affiliation of business influences
Visualization of relationships