Graph Day Texas: Open Source Graph Projects from PokitDok
Transcript of Graph Day Texas: Open Source Graph Projects from PokitDok
A tour of the PokitDok Health Graph and some open source graph projects
Graph Day Texas, Jan 2016
Denise Gosnell, PhD
Twitter and Github: @pokitdok, @denisekgosnell
PokitDok APIs:
The business of health, for developers.
https://platform.pokitdok.com/
PokitDok APIs: Marketplace
Doctor on Demand: Powered by PokitDok
onward to graphs.
Talk Outline:
What we built.
• The HealthGraph
What we’ve open sourced.
• A Gremlin-Python Library
• Custom Titan Build
• Dynamic JSON Graph [WIP]
• HealthGraph DSL [WIP]
The PokitDok HealthGraph
X12 Data Standard: ETL hell from the 1970s
HealthGraph: Transactions as Trees
• We treat transactions as first-class objects in the graph
• Buried in the depths of an X12 transaction are the entities of interest
Interactive graph available at: https://fullmetalhealth.com/dsl/
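To make the "transactions as trees" idea concrete, here is a minimal sketch of flattening a nested transaction into vertices and edges. It is not the HealthGraph loader: the function name `tree_to_graph`, the vertex/edge dict layout, and the field names in the sample are all assumptions made for illustration.

```python
from itertools import count


def tree_to_graph(transaction, label="transaction"):
    """Flatten a nested dict into (vertices, edges) for a property graph.

    Hypothetical helper: nested dicts become vertices, scalar fields become
    vertex properties, and each parent-child link becomes a labeled edge.
    """
    ids = count()
    vertices, edges = [], []

    def visit(node, node_label):
        vid = next(ids)
        props = {}
        vertices.append({"id": vid, "label": node_label, "properties": props})
        for key, value in node.items():
            children = value if isinstance(value, list) else [value]
            if all(isinstance(child, dict) for child in children):
                # Nested entities are buried in the tree: promote each to a vertex.
                for child in children:
                    edges.append({"out": vid, "in": visit(child, key), "label": key})
            else:
                # Scalars (or lists of scalars) stay on the current vertex.
                props[key] = value
        return vid

    visit(transaction, label)
    return vertices, edges


# Illustrative input only; these field names are not taken from the X12 spec.
claim = {
    "trading_partner": "hypothetical_payer",
    "subscriber": {"last_name": "Doe"},
    "services": [{"cpt_code": "99213"}, {"cpt_code": "85025"}],
}
vertices, edges = tree_to_graph(claim)
```

The root vertex keeps the transaction itself addressable in the graph, while the subscriber and each service line become their own vertices that other transactions could later link to.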
HealthGraph: Property Graph Model
HealthGraph: Probabilistic Inferences
HealthGraph: Data Inferences
HealthGraph: Predictive Models
• What is the probability claim X will be denied?
• A new customer just searched for “family practice”; recommend the best provider within 10 miles.
• Given a CPT code, what is the expected reimbursement rate from insurance company A in zip code 37601?
HealthGraph: Top 100k Providers
Twitter and Github: @pokitdok, @MacraeAlec
PokitDok Open Source:
Gremlin Python
Our HealthGraph Production Stack
• Titan 0.5.3
• TinkerPop’s Blueprints 2.5.0
• Cassandra and Elasticsearch
Gremlin-Python
Gremlin-Python Motivation
• Lighter context switching between development tools and environments
• Incompatible syntax issues between Gremlin and Python
• Using Python.
Twitter and Github: @corbinbs, @denisekgosnell
Gremlin-Python Test Drive
Option 1: Grab our docker container
1. Install Docker: https://www.docker.com/docker-toolbox
2. Jump in the “Docker Quickstart Terminal”
3. Fire up our example container: docker run -i -t pokitdok/gremlin-python-test-drive
Option 2: Shell script install
1. Clone our repo: https://github.com/pokitdok/gremlin-python
2. Run the set-up scripts: $ ./test_drive/setup.sh && ./test_drive/run.sh
Bi-Partite Graph Recommendation System
Customer
viewed
scheduled_with
Doctor
g.E.has('edge_type', 'scheduled_with').in_v().group_count(ranked_docs, lambda it: it.full_name, lambda it: it.b + 1.0)
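Read step by step, the traversal above appears to keep only scheduled_with edges, step to the vertex each edge points at (the doctor), and increment a per-doctor counter keyed by full_name. The same ranking can be sketched in plain Python without a graph database; the edge tuples and the function name `rank_doctors` below are assumptions for illustration, not part of the gremlin-python library.

```python
from collections import Counter

# Hypothetical bipartite edges: (customer, edge_type, doctor full_name).
edges = [
    ("customer_1", "viewed", "Dr. A"),
    ("customer_1", "scheduled_with", "Dr. A"),
    ("customer_2", "scheduled_with", "Dr. A"),
    ("customer_2", "viewed", "Dr. B"),
    ("customer_3", "scheduled_with", "Dr. B"),
]


def rank_doctors(edges, edge_type="scheduled_with"):
    """Count edges of one type per doctor and return them most-booked first."""
    counts = Counter(doctor for _, label, doctor in edges if label == edge_type)
    return counts.most_common()


ranked_docs = rank_doctors(edges)
# ranked_docs == [("Dr. A", 2), ("Dr. B", 1)]
```

Ranking by scheduled_with rather than viewed weights the recommendation toward doctors that customers actually booked, not just browsed.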
PokitDok Open Source:
Custom Build of Titan 0.5.3 to Integrate with CDH5 Containers
Motivation for Release of Custom Build:
Graph Production Stack: Titan 0.5.x ships with Hadoop 2.2
API Production Stack: contains Cloudera’s CDH5 containers and Hadoop 2.6.0
You guessed it: infrastructure dependency errors upon integration. The Hadoop 2.6.0 API is not fully backwards compatible with Hadoop 2.2.
Released: A modification of the Titan 0.5.3 build to upgrade to Hadoop 2.6.0 and resolve numerous conflicts among transitive dependencies.
… someone had to do it.
Grab it here: https://github.com/pokitdok/titan/tree/0.5.3-hadoop2.6.0
Tested for Cassandra but not HBase.
HealthGraph Dynamic JSON Load
Open Source Version [WIP]
Dynamic JSONLoader:
Goal: Bulk load of JSON from sequenced HDFS files straight to a Titan DB
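The slide states only the goal, so the following is a rough sketch of the batching half of such a loader, not the PokitDok implementation. It assumes one JSON object per input line; `write_batch` is a hypothetical callback standing in for whatever commits a batch to Titan, and reading from HDFS sequence files is not shown.

```python
import json


def batched_records(lines, batch_size=1000):
    """Parse JSON lines lazily and yield them in fixed-size batches."""
    batch = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        batch.append(json.loads(line))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch


def bulk_load(lines, write_batch, batch_size=1000):
    """Feed every batch to write_batch and return the number of records loaded."""
    total = 0
    for batch in batched_records(lines, batch_size):
        write_batch(batch)  # hypothetical sink, e.g. one graph transaction per batch
        total += len(batch)
    return total


# Usage with an in-memory sink in place of a graph database.
sample_lines = ['{"id": 1}', "", '{"id": 2}', '{"id": 3}']
batches = []
loaded = bulk_load(sample_lines, batches.append, batch_size=2)
```

Committing per batch rather than per record keeps transaction overhead bounded while still limiting how much work is lost if a single batch fails.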
Dynamic JSONLoader Future Work
1. Extract PokitDok HealthGraph specific features
2. Move to Titan 1.0 and TP3 compatibility
3. Release on PokitDok GitHub
HealthGraph DSL
Open Source Version [WIP]
X12 Data Standard: ETL hell from the 1970s
X12 Spec Trees vs. Graph DSL:
Interactive graph available at: https://fullmetalhealth.com/dsl/
Graph DSL with TinkerPop 2.5:
DSL Future Work
1. Move to Titan 1.0 and TP3 compatibility
2. Release on PokitDok GitHub
3. Current Open Question: We are looking for(ward to) more documentation on implementing custom Gremlin steps (DSLs) in TP3
and there will be more…!