PHP in the graph (Gremlin 3)

Posted on 21-Feb-2017



PHP in the graph

FOSDEM 2017, Brussels, Belgium

Agenda

Discover the Graph

Steps with a gremlin

Gremlin and PHP

Speaker

Damien Seguy

CTO at Exakat

Static analysis for PHP

PHP code as a dataset

WordPress call graph

<?php

function _default_wp_die_handler( $message, $title = '', $args = array() ) {
    $defaults = array( 'response' => 500 );
    $r = wp_parse_args($args, $defaults);

    $have_gettext = function_exists('__');

    if ( function_exists( 'is_wp_error' ) && is_wp_error( $message ) ) {
        if ( empty( $title ) ) {
            $error_data = $message->get_error_data();
            if ( is_array( $error_data ) && isset( $error_data['title'] ) )
                $title = $error_data['title'];
            // (excerpt: the rest of the function is not shown on the slide)

WordPress call graph

What is Gremlin?

A domain-specific language for graphs

It is a programming language to traverse a graph

It is open source and vendor-agnostic

Simple traversal

V: vertices, also called nodes or objects

E: edges, also called links or relations

g: the graph, or the dataset

The Vertices

g represents the graph, or the gremlin

g.V() represents all the vertices

g.V(1) is one of the vertices

Vertices always have an id

g.V(1) => v(1)

g.V(1) => v(1)
g.V(1).values('name') => apply_filter
g.V(1).values('leaf') => true
g.V(1).values('compat') => ['7.0', '7.1']
g.V(1).id() => 1

// non-existing properties produce no value, and no error
g.V(1).values('isFedAfterMidnight') => (nothing)

Properties

The graph is schemaless

Vertex discovery

Use valueMap() to discover the graph

valueMap() returns every property, except the id and the label

g.V(2).valueMap() => {name=wp_die, leaf=true, glob}
g.V(2).valueMap('name') => {name=wp_die}
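Here is a minimal PHP sketch of the property model. It assumes a vertex is stored as an associative array, with its id and label kept apart from its properties. The layout, the `function` label and the helpers `vertexValues()` and `vertexValueMap()` are hypothetical and are only named after the Gremlin steps. Each property holds a list, because one property may carry several values, as `compat` does.

```php
<?php
// Hypothetical vertex layout: id and label sit outside the property bag.
$vertex = [
    'id'         => 1,
    'label'      => 'function',
    'properties' => [
        'name'   => ['apply_filter'],
        'leaf'   => [true],
        'compat' => ['7.0', '7.1'],  // one property, several values
    ],
];

// Like values(key): all values stored under the key, or none at all.
// The graph is schemaless, so a missing key is not an error.
function vertexValues(array $v, string $key): array
{
    return $v['properties'][$key] ?? [];
}

// Like valueMap(...keys): the properties only, never the id or the label.
function vertexValueMap(array $v, string ...$keys): array
{
    if ($keys === []) {
        return $v['properties'];
    }
    return array_intersect_key($v['properties'], array_flip($keys));
}

print_r(vertexValues($vertex, 'compat'));
var_dump(vertexValues($vertex, 'isFedAfterMidnight')); // an empty array
print_r(vertexValueMap($vertex, 'name'));
```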

The Edges

g.E() represents all the edges

g.E(1) is the edge with id 1

Edges have an id and properties

They also have a start vertex, an end vertex and a label

g.E(1) => e(1)
g.E(1).label() => 'CALLS'
g.E(1).id() => 1
g.E(1).values('count') => 3
g.E(1).valueMap() => {count=3}

Edge discovery

(Figure: an edge, wp_ajax_fetch_list CALLS wp_die)

Edges link the vertices

outV() is the vertex an edge starts from; inV() is the vertex it points to

g.E(5).inV() => v(2)
g.E(6).inV() => v(3)

g.E(7).outV() => v(4)
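An edge can be sketched as a small record that holds its two endpoints. The sample directions used here (edge 5 from 1 to 2, edge 6 from 1 to 3, edge 7 from 4 to 1) are an assumption, inferred from the `g.V(1).out()` and `g.V(1).in()` results on the following slides. The array layout and the helper names are hypothetical.

```php
<?php
// Hypothetical edge records: id => endpoints. Directions are inferred
// from g.V(1).out() => v(2) v(3) and g.V(1).in() => v(4).
$edges = [
    5 => ['outV' => 1, 'inV' => 2],
    6 => ['outV' => 1, 'inV' => 3],
    7 => ['outV' => 4, 'inV' => 1],
];

// outV(): the vertex the edge leaves from (its tail).
function edgeOutV(array $edges, int $id): int
{
    return $edges[$id]['outV'];
}

// inV(): the vertex the edge points to (its head).
function edgeInV(array $edges, int $id): int
{
    return $edges[$id]['inV'];
}

echo edgeOutV($edges, 5), ' -> ', edgeInV($edges, 5), PHP_EOL; // 1 -> 2
echo edgeOutV($edges, 7), ' -> ', edgeInV($edges, 7), PHP_EOL; // 4 -> 1
```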

Exiting the Edges

(Figure: a small sample graph with vertices 1, 2, 3, 4 and edges 5, 6, 7)

Directed graph

g.V(1).out() => v(2) v(3)

g.V(1).in() => v(4)

g.V(1).both() => v(2) v(3) v(4)
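Direction is the only difference between `out()`, `in()` and `both()`. A minimal PHP sketch over a plain edge list shows this. The edges 1 to 2, 1 to 3 and 4 to 1 are inferred from the results above, and the function names are hypothetical.

```php
<?php
// Sample graph as directed edges [from, to], inferred from the slide results.
$edges = [[1, 2], [1, 3], [4, 1]];

// out(): follow edges from their start to their end.
function outVertices(array $edges, int $v): array
{
    $result = [];
    foreach ($edges as [$from, $to]) {
        if ($from === $v) {
            $result[] = $to;
        }
    }
    return $result;
}

// in(): follow edges backwards, from their end to their start.
function inVertices(array $edges, int $v): array
{
    $result = [];
    foreach ($edges as [$from, $to]) {
        if ($to === $v) {
            $result[] = $from;
        }
    }
    return $result;
}

// both(): neighbours in either direction.
function bothVertices(array $edges, int $v): array
{
    return array_merge(outVertices($edges, $v), inVertices($edges, $v));
}

print_r(bothVertices($edges, 1));
```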

Following Edges


g.V(1).inE() => e(7)

g.V(1).out().id() => 2 3

g.V(2).in().in().id() => 4
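Chaining works because each step turns the current set of vertices into the next one, and the next step starts from there. The PHP sketch below makes that explicit with a hypothetical `chain()` helper that applies the steps one after another and flattens the results. Edge ids and directions are inferred from the slides, and PHP 7.4 or later is assumed for arrow functions.

```php
<?php
// Sample edges as id => [from, to], inferred from the slides.
$edges = [5 => [1, 2], 6 => [1, 3], 7 => [4, 1]];

// inE(): ids of the edges arriving at $v.
function inEdgeIds(array $edges, int $v): array
{
    return array_keys(array_filter($edges, fn(array $e) => $e[1] === $v));
}

// Apply each step to every current vertex and flatten the results,
// so the output of one step is the input of the next.
function chain(array $current, callable ...$steps): array
{
    foreach ($steps as $step) {
        $next = [];
        foreach ($current as $v) {
            array_push($next, ...$step($v));
        }
        $current = $next;
    }
    return $current;
}

// out() and in() as single steps.
$out = fn(int $v) => array_values(array_map(fn($e) => $e[1], array_filter($edges, fn($e) => $e[0] === $v)));
$in  = fn(int $v) => array_values(array_map(fn($e) => $e[0], array_filter($edges, fn($e) => $e[1] === $v)));

print_r(inEdgeIds($edges, 1));  // like g.V(1).inE()
print_r(chain([1], $out));      // like g.V(1).out()
print_r(chain([2], $in, $in));  // like g.V(2).in().in()
```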

Chaining


WordPress Calls Graph

The graph of all WordPress internal function calls

(Figure: each vertex is a function, with a name property; edges are labelled CALLS)

g.V(19).out('CALLS').values('name') => wp_hash_password wp_cache_delete

g.V(19).in('CALLS').values('name') => reset_password wp_check_password

Who's calling?

(Figure: reset_password and wp_check_password call wp_set_password, which calls wp_hash_password and wp_cache_delete; every edge is labelled CALLS)
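Edge labels act as a filter on navigation. `out('CALLS')` only follows edges labelled CALLS, and passing several labels follows any of them. A PHP sketch of the wp_set_password neighbourhood follows. The function names come from the slide, while the edge list layout and the helper names are hypothetical.

```php
<?php
// Labelled edges [from, label, to] around wp_set_password.
$edges = [
    ['reset_password',    'CALLS', 'wp_set_password'],
    ['wp_check_password', 'CALLS', 'wp_set_password'],
    ['wp_set_password',   'CALLS', 'wp_hash_password'],
    ['wp_set_password',   'CALLS', 'wp_cache_delete'],
];

// out(...labels): outgoing edges whose label is listed (any label if none given).
function outByLabel(array $edges, string $v, string ...$labels): array
{
    $result = [];
    foreach ($edges as [$from, $label, $to]) {
        if ($from === $v && ($labels === [] || in_array($label, $labels, true))) {
            $result[] = $to;
        }
    }
    return $result;
}

// in(...labels): the same test, walking edges backwards.
function inByLabel(array $edges, string $v, string ...$labels): array
{
    $result = [];
    foreach ($edges as [$from, $label, $to]) {
        if ($to === $v && ($labels === [] || in_array($label, $labels, true))) {
            $result[] = $from;
        }
    }
    return $result;
}

print_r(outByLabel($edges, 'wp_set_password', 'CALLS')); // who it calls
print_r(inByLabel($edges, 'wp_set_password', 'CALLS'));  // who calls it
```

Unknown labels simply match nothing, so `outByLabel($edges, 'wp_set_password', 'CALLS', 'CALLED', 'XXX')` returns the same callees as `'CALLS'` alone.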

Is it Recursive?

(Figure: get_category_parents, id 30, next to get_permalink and get_category_link, all linked by CALLS edges, one of them pointing from get_category_parents back to itself)

g.V(30).as('myself').out('CALLS').where(eq('myself')).values('name') => get_category_parents


Is it Recursive?

g.V(30).as('myself').in('CALLS').where(eq('myself')).values('name') => get_category_parents

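Direct recursion is a CALLS edge that starts and ends on the same vertex. Comparing the vertex reached by `out('CALLS')` with the one labelled `'myself'` is therefore the same as looking for self-loops. A PHP sketch on a hypothetical edge list follows. Only the self-call of get_category_parents reflects the slide, and the other edges are illustrative.

```php
<?php
// Call edges [caller, callee]; the self-call mirrors get_category_parents.
$calls = [
    ['get_permalink', 'get_category_parents'],
    ['get_category_link', 'get_category_parents'],
    ['get_category_parents', 'get_category_parents'],
];

// A function is directly recursive when one of its callees is itself.
function directlyRecursive(array $calls): array
{
    $found = [];
    foreach ($calls as [$caller, $callee]) {
        if ($caller === $callee) {
            $found[$caller] = true;  // keyed, so each function is reported once
        }
    }
    return array_keys($found);
}

print_r(directlyRecursive($calls));
```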

g.V(47).as('myself').out('CALLS').where(neq('myself')).out('CALLS').where(eq('myself')).values('name')

=> wp_trash_comment wp_delete_comment

Ping-Pong Functions

(Figure: wp_trash_comment, id 47, and wp_delete_comment, id 148, CALL each other)

g.V(47).as('myself').out('CALLS').where(neq('myself')).out('CALLS').where(neq('myself')).out('CALLS').where(eq('myself')).values('name')

Ping-Pong Functions


g.V(47).as('myself').out('CALLS').where(neq('myself')).out('CALLS').where(neq('myself')).out('CALLS').where(neq('myself')).out('CALLS').where(eq('myself')).values('name')

Ping-Pong Functions


g.V(47).as('myself').repeat(out('CALLS').where(neq('myself'))).emit().times(3).out('CALLS').where(eq('myself')).values('name')

Ping-Pong Functions

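The `repeat()` loop can be mirrored by a hop-by-hop walk in PHP. At each hop it moves to every callee except the start vertex. Because of `emit()`, it then checks every hop, not only the last one, for a call back to the start. The adjacency list and the name `cycleLengths()` are hypothetical. Like the Gremlin traversal, the sketch does not deduplicate, so on a dense graph the frontier can grow quickly.

```php
<?php
// Hypothetical call graph as adjacency lists: caller => callees.
$calls = [
    'wp_trash_comment'  => ['wp_delete_comment'],
    'wp_delete_comment' => ['wp_trash_comment'],
];

// Sketch of repeat(out minus 'myself').emit().times($times) followed by
// out back to 'myself': the lengths of the call cycles through $start.
function cycleLengths(array $calls, string $start, int $times): array
{
    $lengths  = [];
    $frontier = [$start];
    for ($hop = 1; $hop <= $times; $hop++) {
        $next = [];
        foreach ($frontier as $f) {
            foreach ($calls[$f] ?? [] as $callee) {
                if ($callee !== $start) {  // never walk through the start vertex
                    $next[] = $callee;
                }
            }
        }
        $frontier = $next;
        // emit(): test the frontier after every hop for a call back to $start.
        foreach ($frontier as $f) {
            if (in_array($start, $calls[$f] ?? [], true)) {
                $lengths[] = $hop + 1;
            }
        }
    }
    return $lengths;
}

print_r(cycleLengths($calls, 'wp_trash_comment', 3)); // a ping-pong: length 2
```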

Up to now

vertices and edges: the basic building blocks

in() and out() (and both()): navigation

retain() and except() (where(eq()) and where(neq()) in TinkerPop 3), in('label'): filtering

Loops with repeat()

Starting at the vertices

Traversing the graph


Filtering on edges

g.V().out('CALL') => v(25)

g.V().out('CALL', 'CALLED', 'XXX') => v(25)


Filtering on vertices

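g.V().has('name')                                  // vertices that have a name
g.V().has('name','wp_die')                         // name equals wp_die
g.V().has('name', neq('wp_die'))                   // name is anything but wp_die
g.V().has('name', within('wp_die', 'wp_header'))   // name is in the list
g.V().has('name', without('wp_die', 'wp_header'))  // name is not in the list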
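Each `has()` form above keeps a vertex only if it carries the property. The value is then compared either to a constant or through a predicate. A PHP sketch with closures as predicates follows. The vertices and the helpers `has()`, `neq()`, `within()` and `without()` are hypothetical and are only modelled on the Gremlin names. PHP 7.4 or later is assumed.

```php
<?php
// Hypothetical vertices, each one a flat array of properties.
$vertices = [
    ['name' => 'wp_die'],
    ['name' => 'wp_header'],
    ['name' => 'esc_html'],
    ['leaf' => true],  // no name property at all
];

// Predicates modelled on Gremlin's neq(), within() and without().
function neq($x): Closure { return fn($v) => $v !== $x; }
function within(...$xs): Closure { return fn($v) => in_array($v, $xs, true); }
function without(...$xs): Closure { return fn($v) => !in_array($v, $xs, true); }

// has(key [, value or predicate]): keep vertices that carry the property
// and, when a test is given, whose value equals it or satisfies it.
function has(array $vertices, string $key, $test = null): array
{
    return array_values(array_filter($vertices, function (array $v) use ($key, $test): bool {
        if (!array_key_exists($key, $v)) {
            return false;  // a missing property never matches
        }
        if ($test === null) {
            return true;
        }
        return $test instanceof Closure ? $test($v[$key]) : $v[$key] === $test;
    }));
}

echo count(has($vertices, 'name')), PHP_EOL;                                 // 3
echo count(has($vertices, 'name', 'wp_die')), PHP_EOL;                       // 1
echo count(has($vertices, 'name', neq('wp_die'))), PHP_EOL;                  // 2
echo count(has($vertices, 'name', within('wp_die', 'wp_header'))), PHP_EOL;  // 2
echo count(has($vertices, 'name', without('wp_die', 'wp_header'))), PHP_EOL; // 1
```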


Dying Functions

(Figure: an unknown function, ????, CALLS wp_die)

g.V().out('CALLS').has('name','wp_die').values('name') => wp_die wp_die wp_die wp_die wp_die wp_die wp_die wp_die wp_die

g.V().has('name','wp_die').in('CALLS').values('name') => wp_ajax_trash_post wp_ajax_delete_post wp_ajax_delete_meta wp_ajax_delete_link wp_ajax_delete_tag wp_ajax_delete_comment wp_ajax_oembed_cache wp_ajax_imgedit_preview wp_ajax_fetch_list


Dying Functions

g.V().out('CALLS').has('name','wp_die').count() => 84


g.V().has('name','wp_die').in('CALLS').count() => 84

Dying Functions

g.V().out('CALLS').has('name','wp_die').dedup().count() => 1


g.V().has('name','wp_die').in('CALLS').dedup().count() => 84
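The gap between `count()` and `dedup().count()` comes from traversers. Walking out of every vertex produces one `wp_die` per caller, while walking into `wp_die` produces callers that are already distinct. A minimal PHP sketch reproduces this with a hypothetical edge list (three callers instead of 84), using `array_unique()` in place of `dedup()`.

```php
<?php
// Hypothetical call edges [caller, callee]: three callers of wp_die.
$calls = [
    ['wp_ajax_trash_post', 'wp_die'],
    ['wp_ajax_delete_post', 'wp_die'],
    ['wp_ajax_delete_meta', 'wp_die'],
    ['wp_ajax_delete_meta', 'other_function'],  // hypothetical, filtered out
];

// g.V().out('CALLS').has('name','wp_die'): one result per matching edge.
$callees = [];
// g.V().has('name','wp_die').in('CALLS'): one result per caller.
$callers = [];
foreach ($calls as [$caller, $callee]) {
    if ($callee === 'wp_die') {
        $callees[] = $callee;
        $callers[] = $caller;
    }
}

echo count($callees), PHP_EOL;               // 3
echo count(array_unique($callees)), PHP_EOL; // 1, like dedup().count()
echo count(array_unique($callers)), PHP_EOL; // 3, callers were already distinct
```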

Sampling

g.V().limit(2).count() => 2
g.V().range(2,5).count() => 3
g.V().tail(4).count() => 4
g.V().coin(0.01).count() => 44

g.V().count() => 4373
g.E().count() => 55457
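The sampling steps map onto ordinary array slicing. The following sketch runs over a hypothetical list of 100 vertex ids. It assumes Gremlin's `range(low, high)` semantics (low inclusive, high exclusive), which is why `range(2,5)` yields 3 elements. `coin()` is approximated with `mt_rand()`, so its result size changes from run to run.

```php
<?php
// Stand-in for g.V(): 100 hypothetical vertex ids.
$vertices = range(1, 100);

// limit(2): the first 2 elements.
$limited = array_slice($vertices, 0, 2);
// range(2, 5): from index 2 (inclusive) to index 5 (exclusive).
$ranged = array_slice($vertices, 2, 5 - 2);
// tail(4): the last 4 elements.
$tailed = array_slice($vertices, -4);
// coin(0.1): keep each element independently with probability 0.1.
$p = 0.1;
$sampled = array_values(array_filter($vertices, fn() => mt_rand() / mt_getrandmax() < $p));

echo count($limited), ' ', count($ranged), ' ', count($tailed), PHP_EOL; // 2 3 4
echo count($sampled), PHP_EOL; // around 10 on average
```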


Dying Functions

g.V().as('start').out('CALLS').has('name','wp_die').select('start').by('name') => wp_ajax_trash_post wp_ajax_delete_post wp_ajax_delete_meta wp_ajax_delete_link wp_ajax_delete_tag wp_ajax_delete_comment wp_ajax_oembed_cache wp_ajax_imgedit_preview wp_ajax_fetch_list

Naming nodes

as(): gives a name to a step
select(): selects a named step
by(): sets how the selected value is displayed
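One way to picture `as()` and `select()` is that each traverser carries the vertices it has named so far, and `select()` reads one of them back. This PHP sketch simplifies a traverser to an array holding a current vertex and a label map. The edge list (including the esc_html calls) and the traverser layout are hypothetical. Since vertices are plain names here, `by('name')` is implicit.

```php
<?php
// Hypothetical call edges [caller, callee].
$calls = [
    ['wp_ajax_trash_post', 'wp_die'],
    ['wp_ajax_trash_post', 'esc_html'],
    ['wp_ajax_fetch_list', 'wp_die'],
    ['esc_html', '_e'],
];

// g.V().as('start'): one traverser per function, remembering itself as 'start'.
$functions  = array_values(array_unique(array_merge(...$calls)));
$traversers = [];
foreach ($functions as $f) {
    $traversers[] = ['current' => $f, 'labels' => ['start' => $f]];
}

// out('CALLS'): move to every callee; the labels travel along.
$moved = [];
foreach ($traversers as $t) {
    foreach ($calls as [$caller, $callee]) {
        if ($caller === $t['current']) {
            $moved[] = ['current' => $callee, 'labels' => $t['labels']];
        }
    }
}

// has('name','wp_die').select('start'): report the vertex named 'start'.
$selected = [];
foreach ($moved as $t) {
    if ($t['current'] === 'wp_die') {
        $selected[] = $t['labels']['start'];
    }
}

print_r($selected);
```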

Filtering on vertices

// Functions that call wp_die and esc_html
g.V().has('name','wp_die').in('CALLS').as('results').out('CALLS').has('name','esc_html').select('results')

(Figure: unknown functions, ????, that CALL both wp_die and esc_html)

Filtering on vertices

// Functions that call wp_die and esc_html
g.V().where(__.out('CALLS').has('name','wp_die')).as('results').out('CALLS').has('name','esc_html').select('results')


Filtering on vertices

// Functions that call wp_die and esc_html
g.V().where(__.out('CALLS').has('name','wp_die')).where(__.out('CALLS').has('name','esc_html'))


Filtering on vertices

// Functions that call wp_die and esc_html
g.V().where(__.out('CALLS').has('name','wp_die')).where(__.out('CALLS').has('name','esc_html').count().is(neq(0)))

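`where()` runs an inner traversal only to decide whether the current vertex survives. The vertex itself passes through unchanged, so two `where()` steps in a row act as a logical AND. A PHP sketch with hypothetical callers follows. The names `caller_a` to `caller_c` and the helper `callsFunction()` are illustrative, and PHP 7.4 or later is assumed.

```php
<?php
// Hypothetical adjacency lists: function => functions it calls.
$calls = [
    'caller_a' => ['wp_die', 'esc_html'],
    'caller_b' => ['wp_die'],
    'caller_c' => ['esc_html'],
];

// Like where(__.out('CALLS').has('name', $name)): true when the inner
// traversal finds at least one match; the vertex is not replaced.
function callsFunction(array $calls, string $name): Closure
{
    return fn(string $f): bool => in_array($name, $calls[$f] ?? [], true);
}

// Chained where() filters: each one narrows the previous result.
$result = array_keys($calls);
foreach ([callsFunction($calls, 'wp_die'), callsFunction($calls, 'esc_html')] as $where) {
    $result = array_values(array_filter($result, $where));
}

print_r($result); // only caller_a calls both
```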

Traversal full ahead

g.V().in()/out().has().where().as().select().by().by()

Advanced

Applied to properties

Non-standard functions

g.V().filter{ it.get().value('name') != it.get().value('name').toLowerCase() }.count() => 73

Closures

Many steps accept a closure

A closure is written between {}, uses it.get() to access the current element, and is written in Groovy

Closures should be replaced by steps, unless a special manipulation is needed
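The Groovy closure above keeps names that differ from their lowercase form. The same test can be written in PHP as a closure passed to `array_filter()`. The list of names below is hypothetical. PHP function names are case-insensitive, so mixed-case spellings are legal but worth spotting.

```php
<?php
// Hypothetical function names, as they might be spelled in calls.
$names = ['wp_die', 'WP_Die', 'esc_html', 'Esc_Html', 'get_option'];

// Same test as the Groovy closure: the name is not all lowercase.
$mixedCase = array_values(array_filter(
    $names,
    fn(string $name) => $name !== strtolower($name)
));

echo count($mixedCase), PHP_EOL; // 2
```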

GroupBy/GroupCount

g.V().groupCount('a').by('name').cap('a') ==> [wp_die:22, wp_header:24…]

g.V().groupCount('a').by('name').groupCount('b').by(out().count()).cap('a','b') ==> [a:[wp_die:22, wp_header:24], b:[22:1, 24:1]]
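`groupCount()` builds a map from a key (here the name, or the out-degree) to the number of elements that share it. In PHP that is `array_count_values()`. The sketch below uses a hypothetical three-vertex graph. Both maps are returned together, as `cap('a','b')` does.

```php
<?php
// Hypothetical graph: vertex names by id, and directed edges [from, to].
$names = [1 => 'wp_die', 2 => 'wp_header', 3 => 'wp_die'];
$edges = [[1, 2], [1, 3], [2, 3]];

// groupCount('a').by('name'): how many vertices share each name.
$a = array_count_values($names);

// groupCount('b').by(out().count()): how many vertices have each out-degree.
$outDegree = [];
foreach (array_keys($names) as $id) {
    $outDegree[$id] = count(array_filter($edges, fn($e) => $e[0] === $id));
}
$b = array_count_values($outDegree);

print_r(['a' => $a, 'b' => $b]);
```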

Gremlin and PHP

Gremlin for PHP

https://github.com/PommeVerte/gremlin-php

Using with Neo4j: REST API

Older APIs: neo4jPHP, rexpro-php

No Gremlin implementation in PHP (yet?)

<?php

require_once('vendor/autoload.php'); // depending on your project this may not be necessary
use \Brightzone\GremlinDriver\Connection;

$db = new Connection([
    'host'  => 'localhost',
    'graph' => 'graph',
]);
$db->open();

$result = $db->send('g.V(2)');
// do something with $result
$db->close();

Apache TinkerPop

http://tinkerpop.incubator.apache.org/

Version: 3.2.3

Databases: Stardog, Sqlg

Gremlin Server:
bin/gremlin-server.sh conf/gremlin-server-modern.yaml

Gremlin Console:
:install org.apache.tinkerpop neo4j-gremlin 3.2.3

Thanks

dseguy@exakat.io @exakat

http://www.slideshare.net/dseguy/

Leaves and roots

(Figure: LEAF and ROOT functions in the call graph)

g.V().where(out('CALLS')).count() => 407

g.V().where(__.in('CALLS').count().is(eq(0))).count() => 1304
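Leaves and roots fall out of set differences on the edge list. `where(out('CALLS'))` keeps vertices that call at least one function, and leaves are the complement. Roots are the vertices that no edge points to, which is what the `in('CALLS').count().is(eq(0))` filter selects. The following PHP sketch uses a hypothetical five-function graph.

```php
<?php
// Hypothetical call graph [caller, callee]; 'e' neither calls nor is called.
$calls    = [['a', 'b'], ['a', 'c'], ['b', 'c'], ['d', 'c']];
$vertices = ['a', 'b', 'c', 'd', 'e'];

$callers = array_unique(array_column($calls, 0));
$callees = array_unique(array_column($calls, 1));

// where(out('CALLS')): vertices with at least one outgoing call.
$withCalls = array_values(array_intersect($vertices, $callers));
// Leaves: vertices that call nothing.
$leaves = array_values(array_diff($vertices, $callers));
// Roots, like where(__.in('CALLS').count().is(eq(0))): never called.
$roots = array_values(array_diff($vertices, $callees));

print_r(['callers' => $withCalls, 'leaves' => $leaves, 'roots' => $roots]);
```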