Graph Day Texas: Open Source Graph Projects from PokitDok

A tour of the PokitDok Health Graph and some open source graph projects. Graph Day Texas, Jan 2016. Denise Gosnell, PhD. Twitter and GitHub: @pokitdok, @denisekgosnell


Page 1: Graph Day Texas: Open Source Graph Projects from PokitDok

A tour of the PokitDok Health Graph and some open source graph projects

Graph Day Texas, Jan 2016

Denise Gosnell, PhD

Twitter and GitHub: @pokitdok, @denisekgosnell

Page 2: Graph Day Texas: Open Source Graph Projects from PokitDok

PokitDok APIs:

The business of health, for developers.

https://platform.pokitdok.com/

Page 3: Graph Day Texas: Open Source Graph Projects from PokitDok


PokitDok APIs: Marketplace

Page 4: Graph Day Texas: Open Source Graph Projects from PokitDok

Doctor on Demand: Powered by PokitDok

Page 5: Graph Day Texas: Open Source Graph Projects from PokitDok

onward to graphs.

Page 6: Graph Day Texas: Open Source Graph Projects from PokitDok

Talk Outline:

What we built.

• The HealthGraph

What we’ve open sourced.

• A Gremlin-Python Library

• Custom Titan Build

• Dynamic JSON Graph [WIP]

• HealthGraph DSL [WIP]

Page 7: Graph Day Texas: Open Source Graph Projects from PokitDok

The PokitDok HealthGraph

Page 8: Graph Day Texas: Open Source Graph Projects from PokitDok

X12 Data Standard: ETL hell from the 1970s

Page 9: Graph Day Texas: Open Source Graph Projects from PokitDok

X12 Data Standard: ETL hell from the 1970s

Page 10: Graph Day Texas: Open Source Graph Projects from PokitDok

HealthGraph: Transactions as Trees

• We treat transactions as first-class objects in the graph

• Buried deep within an X12 transaction are the entities of interest

Interactive graph available at: https://fullmetalhealth.com/dsl/
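
As a rough illustration of the "transactions as trees" idea, the sketch below walks a nested transaction (a plain Python dict standing in for a parsed X12 document) and emits vertices plus parent-to-child edges, with the transaction itself as the root vertex. Everything here is an assumption made for illustration: the field names (`claim`, `provider`, `npi`, and so on), the `contains` edge label, and the `transaction_to_tree` helper are hypothetical and are not taken from the PokitDok HealthGraph.

```python
from itertools import count


def transaction_to_tree(transaction, root_label="transaction"):
    """Flatten a nested dict into (vertices, edges) describing a tree.

    Each dict becomes a vertex that keeps its scalar fields as properties.
    Each nested dict (or dict inside a list) becomes a child vertex linked
    to its parent by a 'contains' edge (the label is an assumption).
    Lists of scalars are ignored in this sketch.
    """
    ids = count()
    vertices, edges = [], []

    def visit(node, label, parent_id=None):
        vertex_id = next(ids)
        props = {k: v for k, v in node.items()
                 if not isinstance(v, (dict, list))}
        vertices.append({"id": vertex_id, "label": label, "properties": props})
        if parent_id is not None:
            edges.append({"out": parent_id, "in": vertex_id, "label": "contains"})
        for key, value in node.items():
            children = value if isinstance(value, list) else [value]
            for child in children:
                if isinstance(child, dict):
                    visit(child, key, vertex_id)

    visit(transaction, root_label)
    return vertices, edges


# Hypothetical, heavily simplified claim; real X12 transactions nest far deeper.
example = {
    "control_number": "0001",
    "claim": {
        "total_charge": 150.0,
        "provider": {"npi": "1234567890", "name": "Example Clinic"},
        "services": [{"cpt_code": "99213"}, {"cpt_code": "87880"}],
    },
}

vertices, edges = transaction_to_tree(example)
for v in vertices:
    print(v["id"], v["label"], v["properties"])
```

The transaction is the root vertex, and the entities of interest (here the hypothetical provider and service lines) sit a few `contains` hops below it, which is the property the slide describes.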

Page 11: Graph Day Texas: Open Source Graph Projects from PokitDok

HealthGraph: Property Graph Model

Page 12: Graph Day Texas: Open Source Graph Projects from PokitDok


HealthGraph: Probabilistic Inferences

Page 13: Graph Day Texas: Open Source Graph Projects from PokitDok

HealthGraph: Data Inferences

Page 14: Graph Day Texas: Open Source Graph Projects from PokitDok


HealthGraph: Predictive Models

• What is the probability claim X will be denied?

• A new customer just searched for “family practice”; recommend the best provider within 10 miles.

• Given a CPT code, what is the expected reimbursement rate from insurance company A in zip code 37601?


Page 15: Graph Day Texas: Open Source Graph Projects from PokitDok

HealthGraph: Top 100k Providers

Twitter and GitHub: @pokitdok, @MacraeAlec

Page 16: Graph Day Texas: Open Source Graph Projects from PokitDok

PokitDok Open Source:

Gremlin Python

Page 17: Graph Day Texas: Open Source Graph Projects from PokitDok


Our HealthGraph Production Stack

• Titan 0.5.3

• TinkerPop’s Blueprints 2.5.0

• Cassandra and Elasticsearch

• Gremlin-Python


Page 18: Graph Day Texas: Open Source Graph Projects from PokitDok

Gremlin-Python Motivation

• Lighter context switching between development tools and environments

• Incompatible syntax issues between Gremlin and Python

• Using Python.

Twitter and GitHub: @corbinbs, @denisekgosnell

Page 19: Graph Day Texas: Open Source Graph Projects from PokitDok

Gremlin-Python Test Drive

Option 1: Grab our Docker container

1. Install Docker: https://www.docker.com/docker-toolbox

2. Jump in the “Docker Quickstart Terminal”

3. Fire up our example container: docker run -i -t pokitdok/gremlin-python-test-drive

Option 2: Shell script install

1. Clone our repo: https://github.com/pokitdok/gremlin-python

2. Run the set-up scripts: $ ./test_drive/setup.sh && ./test_drive/run.sh

Page 20: Graph Day Texas: Open Source Graph Projects from PokitDok

Bi-Partite Graph Recommendation System

[Diagram: Customer vertices connected to Doctor vertices by “viewed” and “scheduled_with” edges]

Page 21: Graph Day Texas: Open Source Graph Projects from PokitDok

Bi-Partite Graph Recommendation System

[Diagram: Customer vertices connected to Doctor vertices by “viewed” edges]

Page 22: Graph Day Texas: Open Source Graph Projects from PokitDok

Bi-Partite Graph Recommendation System

[Diagram: Customer vertices connected to Doctor vertices by “viewed” edges]

Page 23: Graph Day Texas: Open Source Graph Projects from PokitDok

Bi-Partite Graph Recommendation System

[Diagram: Customer vertices connected to Doctor vertices by “viewed” edges]

Page 24: Graph Day Texas: Open Source Graph Projects from PokitDok

Bi-Partite Graph Recommendation System

[Diagram: Customer vertices connected to Doctor vertices by “viewed” edges]

g.E.has('edge_type', 'scheduled_with').in_v().group_count(ranked_docs, lambda it: it.full_name, lambda it: it.b + 1.0)
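
To make the intent of this traversal concrete, here is a plain-Python sketch of the same ranking idea. It assumes the query counts, for each doctor, the number of incoming scheduled_with edges, keyed by the doctor's name. It does not use Gremlin-Python or Titan, and the edge tuples, names, and the `rank_doctors` helper are made up for illustration.

```python
from collections import Counter

# Hypothetical (customer, edge_type, doctor) edges of the bipartite graph.
edges = [
    ("alice", "viewed", "Dr. Adams"),
    ("alice", "scheduled_with", "Dr. Adams"),
    ("bob", "scheduled_with", "Dr. Adams"),
    ("bob", "viewed", "Dr. Baker"),
    ("carol", "scheduled_with", "Dr. Baker"),
    ("dave", "scheduled_with", "Dr. Adams"),
]


def rank_doctors(edges, edge_type="scheduled_with"):
    """Count incoming edges of one type per doctor, highest count first."""
    counts = Counter(doctor for _, etype, doctor in edges if etype == edge_type)
    return counts.most_common()


ranked_docs = rank_doctors(edges)
print(ranked_docs)
```

Run as written, this prints `[('Dr. Adams', 3), ('Dr. Baker', 1)]`: only scheduled_with edges contribute, so views alone do not raise a doctor's rank.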

Page 25: Graph Day Texas: Open Source Graph Projects from PokitDok

Gremlin-Python Test Drive

Page 26: Graph Day Texas: Open Source Graph Projects from PokitDok

PokitDok Open Source:

Custom Build of Titan 0.5.3 to Integrate with CDH5 Containers

Page 27: Graph Day Texas: Open Source Graph Projects from PokitDok

Motivation for Release of Custom Build:

Graph Production Stack:

• Titan 0.5.x ships with Hadoop 2.2

API Production Stack:

• contains Cloudera’s CDH5 containers and Hadoop 2.6.0

You guessed it:

• infrastructure dependency errors upon integration

• the Hadoop 2.6.0 API is not fully backwards compatible with Hadoop 2.2

Page 28: Graph Day Texas: Open Source Graph Projects from PokitDok

Released:

A modification of the Titan 0.5.3 build to upgrade to Hadoop 2.6.0 and resolve numerous conflicts among transitive dependencies.

… someone had to do it.

Grab it here: https://github.com/pokitdok/titan/tree/0.5.3-hadoop2.6.0

Tested for Cassandra but not HBase.

Page 29: Graph Day Texas: Open Source Graph Projects from PokitDok

HealthGraph Dynamic JSON Load

Open Source Version [WIP]

Page 30: Graph Day Texas: Open Source Graph Projects from PokitDok

Dynamic JSONLoader:

Goal: Bulk load of JSON from sequenced HDFS files straight to a Titan DB

Page 31: Graph Day Texas: Open Source Graph Projects from PokitDok

Dynamic JSONLoader Future Work

1. Extract PokitDok HealthGraph specific features

2. Move to Titan 1.0 and TP3 compatibility

3. Release on PokitDok GitHub

Page 32: Graph Day Texas: Open Source Graph Projects from PokitDok

HealthGraph DSL

Open Source Version [WIP]

Page 33: Graph Day Texas: Open Source Graph Projects from PokitDok

X12 Data Standard: ETL hell from the 1970s

Page 34: Graph Day Texas: Open Source Graph Projects from PokitDok

X12 Spec Trees vs. Graph DSL:

Interactive graph available at: https://fullmetalhealth.com/dsl/

Page 35: Graph Day Texas: Open Source Graph Projects from PokitDok

Graph DSL with TinkerPop 2.5:

Page 36: Graph Day Texas: Open Source Graph Projects from PokitDok

DSL Future Work

1. Move to Titan 1.0 and TP3 compatibility

2. Release on PokitDok GitHub

3. Current Open Question: We are looking for(ward to) more documentation on implementing custom Gremlin steps (DSLs) in TP3

Page 37: Graph Day Texas: Open Source Graph Projects from PokitDok

and there will be more…!

Page 38: Graph Day Texas: Open Source Graph Projects from PokitDok

Reach Out

Dev Blog: FullMetalHealth.com

@PokitDok @DeniseKGosnell


Page 39: Graph Day Texas: Open Source Graph Projects from PokitDok

A tour of the PokitDok Health Graph and some open source graph projects

Graph Day Texas, Jan 2016

Denise Gosnell, PhD

Twitter and GitHub: @pokitdok, @denisekgosnell