Php in the graph (Gremlin 3)

PHP in the graph, FOSDEM 2017, Brussels, Belgium

Transcript of Php in the graph (Gremlin 3)

Page 1: Php in the graph (Gremlin 3)

PHP in the graph

FOSDEM 2017, Brussels, Belgium

Page 2: Php in the graph (Gremlin 3)

Agenda

Discover the Graph

Steps with a gremlin

Gremlin and PHP

Page 3: Php in the graph (Gremlin 3)

Damien Seguy

CTO at Exakat

Static analysis for PHP

PHP code as Dataset

Speaker

Page 4: Php in the graph (Gremlin 3)

WordPress call graph

<?php

function _default_wp_die_handler( $message, $title = '', $args = array() ) {
    $defaults = array( 'response' => 500 );
    $r = wp_parse_args($args, $defaults);

    $have_gettext = function_exists('__');

    if ( function_exists( 'is_wp_error' ) && is_wp_error( $message ) ) {
        if ( empty( $title ) ) {
            $error_data = $message->get_error_data();
            if ( is_array( $error_data ) && isset( $error_data['title'] ) )
                $title = $error_data['title'];

Page 5: Php in the graph (Gremlin 3)

WordPress call graph

Page 6: Php in the graph (Gremlin 3)

What is gremlin?

Domain specific language for graphs

It is a programming language to traverse a graph

It is open source, vendor-agnostic

Page 7: Php in the graph (Gremlin 3)

Simple traversal

V : Vertices, or nodes or objects

E : Edges, or links or relations

G : graph, or the dataset
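
As a rough illustration of these three notions, here is a minimal sketch using plain PHP arrays. It is not how Gremlin or a graph database stores data: the `$graph` variable, the array layout and the ids are assumptions made for this example only.

```php
<?php
// Toy model only: a graph (G) is a set of vertices (V) and edges (E).
$graph = [
    'V' => [
        1 => ['label' => 'function', 'name' => 'wp_set_password'],
        2 => ['label' => 'function', 'name' => 'wp_hash_password'],
    ],
    'E' => [
        // An edge has its own id, a label, and the ids of the two vertices it links.
        3 => ['label' => 'CALLS', 'out' => 1, 'in' => 2],
    ],
];

// Rough equivalents of g.V().count() and g.E().count()
$vertexCount = count($graph['V']);
$edgeCount   = count($graph['E']);

echo "vertices: $vertexCount, edges: $edgeCount\n";
```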

Page 8: Php in the graph (Gremlin 3)

The Vertices

g represents the graph or the gremlin

g.V() represents all the vertices

g.V(1) is one of the vertices

Vertices always have an id

g.V(1) => v(1)

[Diagram: a single vertex with id 1]

Page 9: Php in the graph (Gremlin 3)

Properties

Graph is schemaless

g.V(1) => v(1)
g.V(1).values('name') => apply_filter
g.V(1).values('leaf') => true
g.V(1).values('compat') => ['7.0', '7.1']
g.V(1).id() => 1

// non-existing properties
g.V(1).values('isFedAfterMidnight') => null

[Diagram: vertex apply_filter]

Page 10: Php in the graph (Gremlin 3)

Vertex discovery

Use valueMap to discover the graph

It returns all the properties, except id and label

g.V(2).valueMap() => {name=wp_die, leaf=true, glob}
g.V(2).valueMap('name') => {name=wp_die}

[Diagram: vertex wp_die]
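
To make the schemaless idea concrete, the sketch below mimics values() and valueMap() on plain PHP arrays. The helper names and the `$vertices` store are hypothetical. One assumption to check: here a missing property yields an empty list rather than null, which is how TinkerPop 3 is understood to behave.

```php
<?php
// Hypothetical vertex store: each vertex is a free-form array of properties.
$vertices = [
    1 => ['name' => 'apply_filter', 'leaf' => true, 'compat' => ['7.0', '7.1']],
    2 => ['name' => 'wp_die', 'leaf' => true],
];

// Rough analogue of g.V($id).values($key): a list of zero or one value.
function values(array $vertices, int $id, string $key): array
{
    $props = $vertices[$id] ?? [];
    return array_key_exists($key, $props) ? [$props[$key]] : [];
}

// Rough analogue of g.V($id).valueMap(...): all properties, or only the requested ones.
function valueMap(array $vertices, int $id, string ...$keys): array
{
    $props = $vertices[$id] ?? [];
    return $keys === [] ? $props : array_intersect_key($props, array_flip($keys));
}

$name     = values($vertices, 1, 'name');
$missing  = values($vertices, 1, 'isFedAfterMidnight'); // no schema, so no error
$onlyName = valueMap($vertices, 2, 'name');
$allProps = valueMap($vertices, 2);
```

No vertex has to declare its properties in advance: asking for an unknown key is simply a query with no result.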

Page 11: Php in the graph (Gremlin 3)

The Edges

g.E() represents all the edges

g.E(1) is the edge with id 1

Edges have id, properties

also : start, end and label

CALLS

Page 12: Php in the graph (Gremlin 3)

Edge discovery

Edges link the vertices

g.E(1) => e(1)
g.E(1).label() => 'CALLS'
g.E(1).id() => 1
g.E(1).values('count') => 3
g.E(1).valueMap() => {count=3}

[Diagram: wp_ajax_fetch_list and wp_die linked by a CALLS edge]

Page 13: Php in the graph (Gremlin 3)

Exiting the Edges

Directed graph

g.E(5).inV() => v(2)
g.E(6).inV() => v(3)

g.E(7).outV() => v(4)

[Diagram: vertices 1 to 4 linked by edges 5, 6 and 7]
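
The direction of an edge decides which end each step returns. The sketch below assumes the TinkerPop convention: outV() is the vertex the edge leaves, inV() the vertex it arrives at. The edge layout is an assumption, inferred from the results shown on the "Following Edges" slide.

```php
<?php
// Assumed layout: edge 5 goes 1 -> 2, edge 6 goes 1 -> 3, edge 7 goes 4 -> 1.
$edges = [
    5 => ['label' => 'CALLS', 'out' => 1, 'in' => 2],
    6 => ['label' => 'CALLS', 'out' => 1, 'in' => 3],
    7 => ['label' => 'CALLS', 'out' => 4, 'in' => 1],
];

// outV(): the vertex the edge starts from.
function outV(array $edges, int $edgeId): int
{
    return $edges[$edgeId]['out'];
}

// inV(): the vertex the edge ends at.
function inV(array $edges, int $edgeId): int
{
    return $edges[$edgeId]['in'];
}

printf("edge 5: %d -> %d\n", outV($edges, 5), inV($edges, 5));
printf("edge 7: %d -> %d\n", outV($edges, 7), inV($edges, 7));
```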

Page 14: Php in the graph (Gremlin 3)

Following Edges

g.V(1).out() => v(2) v(3)

g.V(1).in() => v(4)

g.V(1).both() => v(2) v(3) v(4)

g.V(1).inE() => e(7)

[Diagram: vertices 1 to 4 linked by edges 5, 6 and 7]
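
The four steps above can be sketched on a plain PHP array. The function names (stepOut, stepIn, stepBoth, stepInE) and the edge layout are assumptions chosen to reproduce the results on the slide; this is an illustration, not the Gremlin implementation.

```php
<?php
// Assumed layout: edge 5 goes 1 -> 2, edge 6 goes 1 -> 3, edge 7 goes 4 -> 1.
$edges = [
    5 => ['out' => 1, 'in' => 2],
    6 => ['out' => 1, 'in' => 3],
    7 => ['out' => 4, 'in' => 1],
];

// out(): vertices reached by following edges in their direction.
function stepOut(array $edges, int $v): array
{
    $found = [];
    foreach ($edges as $edge) {
        if ($edge['out'] === $v) {
            $found[] = $edge['in'];
        }
    }
    return $found;
}

// in(): vertices reached by following edges against their direction.
function stepIn(array $edges, int $v): array
{
    $found = [];
    foreach ($edges as $edge) {
        if ($edge['in'] === $v) {
            $found[] = $edge['out'];
        }
    }
    return $found;
}

// both(): the two previous steps combined.
function stepBoth(array $edges, int $v): array
{
    return array_merge(stepOut($edges, $v), stepIn($edges, $v));
}

// inE(): ids of the edges arriving at the vertex, not the vertices behind them.
function stepInE(array $edges, int $v): array
{
    $found = [];
    foreach ($edges as $id => $edge) {
        if ($edge['in'] === $v) {
            $found[] = $id;
        }
    }
    return $found;
}

echo implode(' ', stepBoth($edges, 1)), "\n";
```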

Page 15: Php in the graph (Gremlin 3)

Chaining

g.V(1).out().id() => 2 3

g.V(2).in().in().id() => 4

[Diagram: vertices 1 to 4 linked by edges 5, 6 and 7]
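
Chaining works because each step receives the list of vertices produced by the previous one. A minimal sketch, assuming the same edge layout as the diagram (1 -> 2, 1 -> 3, 4 -> 1); the helper names are hypothetical.

```php
<?php
$edges = [
    5 => ['out' => 1, 'in' => 2],
    6 => ['out' => 1, 'in' => 3],
    7 => ['out' => 4, 'in' => 1],
];

// Apply one step to every current vertex and collect the results.
function follow(array $edges, array $current, string $from, string $to): array
{
    $next = [];
    foreach ($current as $v) {
        foreach ($edges as $edge) {
            if ($edge[$from] === $v) {
                $next[] = $edge[$to];
            }
        }
    }
    return $next;
}

function stepOut(array $edges, array $current): array
{
    return follow($edges, $current, 'out', 'in');
}

function stepIn(array $edges, array $current): array
{
    return follow($edges, $current, 'in', 'out');
}

// g.V(1).out().id()
$afterOut = stepOut($edges, [1]);

// g.V(2).in().in().id(): the output of the first in() feeds the second one.
$afterTwoIn = stepIn($edges, stepIn($edges, [2]));
```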

Page 16: Php in the graph (Gremlin 3)

WordPress Calls Graph

The graph of all WordPress internal function calls

[Diagram: function:name vertices linked by CALLS edges]

Page 17: Php in the graph (Gremlin 3)
Page 18: Php in the graph (Gremlin 3)

Who's calling?

g.V(19).out('CALLS').values('name') => wp_hash_password wp_cache_delete

g.V(19).in('CALLS').values('name') => reset_password wp_check_password

[Diagram: wp_set_password, reset_password, wp_hash_password, wp_check_password and wp_cache_delete linked by CALLS edges]

Page 19: Php in the graph (Gremlin 3)

Is it Recursive?

[Diagram: get_permalink, get_category_link and get_category_parents (id: 30) linked by CALLS edges]

Page 20: Php in the graph (Gremlin 3)

Is it Recursive?

g.V(30).as('myself')
.out('CALLS')
.retain('myself')
.values('name') => get_category_parents

[Diagram: get_permalink, get_category_link and get_category_parents (id: 30) linked by CALLS edges]

Page 21: Php in the graph (Gremlin 3)

Is it Recursive?

g.V(30).as('myself')
.in('CALLS')
.retain('myself')
.values('name') => get_category_parents

[Diagram: get_permalink, get_category_link and get_category_parents (id: 30) linked by CALLS edges]
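
The recursion test boils down to "does one of my CALLS edges come back to me?". The sketch below shows that logic in plain PHP. Only the id 30 comes from the slide; the other ids and the edges are hypothetical. Note that the slides filter with retain('myself'); recent TinkerPop 3 releases are understood to write the same filter as where(eq('myself')), which is worth checking against the version in use.

```php
<?php
// Hypothetical excerpt of the call graph; only the id 30 comes from the slide.
$names = [
    30 => 'get_category_parents',
    31 => 'get_category_link',
    32 => 'get_permalink',
];

// Each pair is [caller id, callee id].
$calls = [
    [30, 30], // get_category_parents calls itself
    [30, 31],
    [32, 31],
];

// A function is directly recursive when one of its CALLS edges loops back to it.
function isRecursive(array $calls, int $id): bool
{
    foreach ($calls as [$caller, $callee]) {
        if ($caller === $id && $callee === $id) {
            return true;
        }
    }
    return false;
}

foreach ($names as $id => $name) {
    echo $name, ': ', isRecursive($calls, $id) ? 'recursive' : 'not recursive', "\n";
}
```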

Page 22: Php in the graph (Gremlin 3)

Ping-Pong Functions

g.V(47).as('myself')
.out('CALLS').except('myself')
.out('CALLS').retain('myself')
.values('name') => wp_trash_comment wp_delete_comment

[Diagram: wp_trash_comment (id: 47) and wp_delete_comment (id: 148) linked by two CALLS edges]

Page 23: Php in the graph (Gremlin 3)

Ping-Pong Functions

g.V(47).as('myself')
.out('CALLS').except('myself')
.out('CALLS').except('myself')
.out('CALLS').retain('myself')
.values('name')

[Diagram: a loop of three CALLS edges]

Page 24: Php in the graph (Gremlin 3)

Ping-Pong Functions

g.V(47).as('myself')
.out('CALLS').except('myself')
.out('CALLS').except('myself')
.out('CALLS').except('myself')
.out('CALLS').retain('myself')
.values('name')

[Diagram: a loop of four CALLS edges]

Page 25: Php in the graph (Gremlin 3)

Ping-Pong Functions

g.V(47).as('myself')
.repeat( out('CALLS').except('myself') ).emit().times(3)
.out('CALLS').retain('myself')
.values('name')

[Diagram: a loop of four CALLS edges]
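
The repeat() version replaces the copy-pasted steps with a loop. A minimal sketch of the same idea in plain PHP: hop through other functions up to a fixed number of times, and after each hop check whether one of them calls the starting function. The ids 47 and 148 come from the slides; id 200, the function name and the array layout are assumptions.

```php
<?php
// Call graph as caller id => list of callee ids.
$calls = [
    47  => [148],     // wp_trash_comment calls wp_delete_comment
    148 => [47, 200], // wp_delete_comment calls wp_trash_comment and a hypothetical function
    200 => [],
];

// Length of the shortest call loop that starts at $start and only goes
// through other functions, or null when none is found within
// $maxIntermediate intermediate hops (the times(3) of the slide).
function callsBackAfter(array $calls, int $start, int $maxIntermediate): ?int
{
    $current = [$start];
    for ($hop = 1; $hop <= $maxIntermediate; $hop++) {
        $next = [];
        foreach ($current as $v) {
            foreach ($calls[$v] ?? [] as $callee) {
                if ($callee !== $start) {     // except('myself')
                    $next[$callee] = $callee; // keep each function once
                }
            }
        }
        foreach ($next as $v) {
            if (in_array($start, $calls[$v] ?? [], true)) { // retain('myself')
                return $hop + 1;
            }
        }
        $current = $next;
    }
    return null;
}

$loop = callsBackAfter($calls, 47, 3);
echo $loop === null ? "no loop found\n" : "loop of $loop calls\n";
```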

Page 26: Php in the graph (Gremlin 3)

Up to now

Vertices and edges: basic blocks

in and out (and both): navigation

except(), retain(), in('label'): filtering

Loops with repeat()

Starting at the vertices

Page 27: Php in the graph (Gremlin 3)

Traversing the graph

[Diagram: a graph with vertices numbered 1 to 7]

Page 28: Php in the graph (Gremlin 3)

Filtering on edges

g.V().out('CALLS') => v(25);

g.V().out('CALLS', 'CALLED', 'XXX') => v(25);

[Diagram: wp_set_password, reset_password, wp_hash_password, wp_check_password and wp_cache_delete linked by CALLS edges]

Page 29: Php in the graph (Gremlin 3)

Filtering on vertices

g.V().has('name')
g.V().has('name','wp_die')
g.V().has('name', neq('wp_die'))
g.V().has('name', within('wp_die', 'wp_header'))
g.V().has('name', without('wp_die', 'wp_header'))

[Diagram: vertex wp_die]
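
Each has() variant is a predicate applied to one property. The sketch below reproduces the five forms on a plain PHP array; the hasName helper and the sample vertices (including the one without a name) are assumptions for illustration.

```php
<?php
$vertices = [
    1 => ['name' => 'wp_die'],
    2 => ['name' => 'wp_header'],
    3 => ['name' => 'esc_html'],
    4 => ['label' => 'file'], // hypothetical vertex without a 'name' property
];

// Keep the ids of the vertices whose 'name' satisfies $predicate.
// Vertices without the property never match.
function hasName(array $vertices, callable $predicate): array
{
    $ids = [];
    foreach ($vertices as $id => $props) {
        if (array_key_exists('name', $props) && $predicate($props['name'])) {
            $ids[] = $id;
        }
    }
    return $ids;
}

$list = ['wp_die', 'wp_header'];

// has('name')
$any = hasName($vertices, function ($n) { return true; });
// has('name', 'wp_die')
$eq = hasName($vertices, function ($n) { return $n === 'wp_die'; });
// has('name', neq('wp_die'))
$neq = hasName($vertices, function ($n) { return $n !== 'wp_die'; });
// has('name', within('wp_die', 'wp_header'))
$within = hasName($vertices, function ($n) use ($list) { return in_array($n, $list, true); });
// has('name', without('wp_die', 'wp_header'))
$without = hasName($vertices, function ($n) use ($list) { return !in_array($n, $list, true); });
```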

Page 30: Php in the graph (Gremlin 3)

Dying Functions

g.V().out('CALLS')
.has('name','wp_die')
.values('name') => wp_die wp_die wp_die wp_die wp_die wp_die wp_die wp_die wp_die

[Diagram: an unknown function (????) CALLS wp_die]

Page 31: Php in the graph (Gremlin 3)

Dying Functions

g.V().has('name','wp_die')
.in('CALLS')
.values('name') =>
wp_ajax_trash_post
wp_ajax_delete_post
wp_ajax_delete_meta
wp_ajax_delete_link
wp_ajax_delete_tag
wp_ajax_delete_comment
wp_ajax_oembed_cache
wp_ajax_imgedit_preview
wp_ajax_fetch_list

[Diagram: an unknown function (????) CALLS wp_die]

Page 32: Php in the graph (Gremlin 3)

Dying Functions

g.V().out('CALLS')
.has('name','wp_die')
.count() => 84

g.V().has('name','wp_die')
.in('CALLS')
.count() => 84

[Diagram: an unknown function (????) CALLS wp_die]

Page 33: Php in the graph (Gremlin 3)

Dying Functions

g.V().out('CALLS')
.has('name','wp_die')
.dedup()
.count() => 1

g.V().has('name','wp_die')
.in('CALLS')
.dedup()
.count() => 84

[Diagram: an unknown function (????) CALLS wp_die]
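
Why does dedup() give 1 in one direction and 84 in the other? Both traversals walk the same edges, but the first one ends on wp_die every time, while the second ends on the callers, which are all different. A small sketch with a hypothetical four-edge graph:

```php
<?php
// Hypothetical mini call graph: [caller, callee] pairs.
$calls = [
    ['wp_ajax_trash_post',  'wp_die'],
    ['wp_ajax_delete_post', 'wp_die'],
    ['wp_ajax_delete_tag',  'wp_die'],
    ['wp_ajax_delete_tag',  'esc_html'],
];

$arrivals = []; // g.V().out('CALLS').has('name','wp_die'): ends on wp_die
$callers  = []; // g.V().has('name','wp_die').in('CALLS'): ends on the callers
foreach ($calls as [$caller, $callee]) {
    if ($callee === 'wp_die') {
        $arrivals[] = $callee;
        $callers[]  = $caller;
    }
}

// count() versus dedup().count() for each traversal
echo count($arrivals), ' / ', count(array_unique($arrivals)), "\n";
echo count($callers), ' / ', count(array_unique($callers)), "\n";
```

Which end the traversal stops on matters as soon as results are de-duplicated or displayed.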

Page 34: Php in the graph (Gremlin 3)

Sampling

g.V().limit(2).count() => 2
g.V().range(2,5).count() => 3
g.V().tail(4).count() => 4
g.V().coin(0.01).count() => 44

g.V().count() => 4373
g.E().count() => 55457

[Diagram: g.V() sends vertices such as wp_parse_args (2), wp_get_object_terms (3) and wp_terms_checklist (4) to count()]
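
The sampling steps behave much like slicing a list. A minimal sketch with PHP array functions, where `$ids` is a stand-in for the stream of vertices produced by g.V(); the probability 0.5 is chosen only to make the effect visible on ten elements.

```php
<?php
$ids = range(1, 10); // stand-in for the vertices produced by g.V()

$limit = array_slice($ids, 0, 2);     // limit(2): the first two
$range = array_slice($ids, 2, 5 - 2); // range(2, 5): index 2 included, index 5 excluded
$tail  = array_slice($ids, -4);       // tail(4): the last four

// coin(0.5): keep each element with probability 0.5,
// so the size of the sample changes from one run to the next.
$coin = array_filter($ids, function () {
    return mt_rand() / mt_getrandmax() < 0.5;
});

echo count($limit), ' ', count($range), ' ', count($tail), "\n";
```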

Page 35: Php in the graph (Gremlin 3)

Dying Functions

g.V().as('start')
.out('CALLS')
.has('name','wp_die')
.select('start')
.by('name') =>
wp_ajax_trash_post
wp_ajax_delete_post
wp_ajax_delete_meta
wp_ajax_delete_link
wp_ajax_delete_tag
wp_ajax_delete_comment
wp_ajax_oembed_cache
wp_ajax_imgedit_preview
wp_ajax_fetch_list

[Diagram: an unknown function (????) CALLS wp_die]

Page 36: Php in the graph (Gremlin 3)

Naming nodes

as(): gives a name
select(): selects a node
by(): display options

[Diagram: g.V(), has('name') and CALLS edges around esc_html and _e]
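
The three steps can be read as "remember, come back, format". The sketch below follows one traverser per edge, stores its starting point under a name, and returns to it once the filter has passed. The ids, the `$path` array and the data are assumptions for illustration.

```php
<?php
$names = [
    1 => 'wp_ajax_trash_post',
    2 => 'wp_ajax_fetch_list',
    3 => 'esc_html',
    4 => 'wp_die',
];
$calls = [[1, 4], [2, 4], [2, 3]]; // [caller id, callee id]

$result = [];
foreach ($calls as [$caller, $callee]) {
    $path = ['start' => $caller];           // as('start'): remember where we came from
    if ($names[$callee] === 'wp_die') {     // out('CALLS').has('name','wp_die')
        $result[] = $names[$path['start']]; // select('start').by('name')
    }
}

echo implode("\n", $result), "\n";
```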

Page 37: Php in the graph (Gremlin 3)

Filtering on vertices

//Functions that call wp_die and esc_html

g.V().has('name','wp_die')
.in('CALLS')
.as('results')
.out('CALLS')
.has('name', 'esc_html')
.select('results')

[Diagram: unknown functions (????) with CALLS edges to wp_die and esc_html]

Page 38: Php in the graph (Gremlin 3)

Filtering on vertices

//Functions that call wp_die and esc_html

g.V().where( __.out('CALLS')
.has('name','wp_die') )
.as('results')
.out('CALLS')
.has('name', 'esc_html')
.select('results')

[Diagram: unknown functions (????) with CALLS edges to wp_die and esc_html]

Page 39: Php in the graph (Gremlin 3)

Filtering on vertices

//Functions that call wp_die and esc_html

g.V().where( __.out('CALLS')
.has('name','wp_die') )
.where( __.out('CALLS')
.has('name', 'esc_html') )

[Diagram: unknown functions (????) with CALLS edges to wp_die and esc_html]

Page 40: Php in the graph (Gremlin 3)

Filtering on vertices

//Functions that call wp_die and esc_html

g.V().where( __.out('CALLS')
.has('name','wp_die') )
.where( __.out('CALLS')
.has('name', 'esc_html')
.count().is(neq(0)) )

[Diagram: unknown functions (????) with CALLS edges to wp_die and esc_html]
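
A where() filter keeps the current vertex and only checks that a side traversal finds something. Chaining two of them is a logical AND. A minimal sketch in PHP, with a hypothetical excerpt of the call graph and a hypothetical whereCalls helper:

```php
<?php
// caller => callees (hypothetical excerpt)
$calls = [
    'wp_ajax_fetch_list' => ['wp_die', 'esc_html'],
    'wp_ajax_trash_post' => ['wp_die'],
    '_e'                 => ['esc_html'],
];

// where(__.out('CALLS').has('name', $callee)):
// keep the candidates that have at least one such call.
function whereCalls(array $calls, array $candidates, string $callee): array
{
    $kept = array_filter(
        $candidates,
        function (string $caller) use ($calls, $callee): bool {
            return in_array($callee, $calls[$caller] ?? [], true);
        }
    );
    return array_values($kept);
}

$all = array_keys($calls);

// Functions that call wp_die and esc_html
$both = whereCalls($calls, whereCalls($calls, $all, 'wp_die'), 'esc_html');

echo implode(', ', $both), "\n";
```

The traversal never leaves the candidate function, so no as()/select() pair is needed to come back to it.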

Page 41: Php in the graph (Gremlin 3)

Traversal full ahead

g.V().in/out() .has() .where() .as() .select().by().by()

Page 42: Php in the graph (Gremlin 3)

Advanced

Page 43: Php in the graph (Gremlin 3)

Applied to properties

Non standard functions

g.V().filter{ it.get().value('name') != it.get().value('name').toLowerCase() }
.count() => 73

Page 44: Php in the graph (Gremlin 3)

Closures

Steps often accept a closure

A closure is written between {}, in Groovy, and uses 'it.get()' as the current node

Closures should be replaced by steps, unless a special manipulation is needed
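
The same idea exists in PHP: a closure is a small anonymous function handed to a generic filter. The sketch below mirrors the "non standard functions" query, which keeps the names that differ from their lowercase version. The list of names is a hypothetical sample.

```php
<?php
// Hypothetical sample of function names.
$names = ['wp_die', 'WP_Filesystem', 'esc_html', 'get_the_ID'];

// The closure plays the role of { it.get().value('name') != ...toLowerCase() }:
// it receives the current element and answers keep or drop.
$nonStandard = array_values(array_filter($names, function (string $name): bool {
    return $name !== strtolower($name);
}));

echo count($nonStandard), "\n";
```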

Page 45: Php in the graph (Gremlin 3)

GroupBy/GroupCount

g.V().groupCount('a').by('name').cap('a')
==>[wp_die:22, wp_header:24…]

g.V().groupCount('a').by('name')
.groupCount('b').by(out().count())
.cap('a','b')
==>[a:[wp_die:22, wp_header:24], b:[22:1, 24:1]]
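
groupCount() builds a histogram, and by() chooses what is counted. A rough PHP equivalent uses array_count_values; the sample data and the 'out' key are assumptions for illustration.

```php
<?php
// One entry per vertex: its name and the functions it calls (hypothetical data).
$vertices = [
    ['name' => 'wp_die',    'out' => ['esc_html', '_e']],
    ['name' => 'wp_header', 'out' => ['esc_html']],
    ['name' => 'wp_die',    'out' => ['esc_html', '_e']],
];

// groupCount('a').by('name'): how many vertices share each name
$a = array_count_values(array_column($vertices, 'name'));

// groupCount('b').by(out().count()): how many vertices have each number of outgoing calls
$b = array_count_values(array_map(function (array $v): int {
    return count($v['out']);
}, $vertices));

// cap('a', 'b'): return both side effects together
$result = ['a' => $a, 'b' => $b];

print_r($result);
```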

Page 46: Php in the graph (Gremlin 3)

Gremlin and PHP

Page 47: Php in the graph (Gremlin 3)

Gremlin For PHP

https://github.com/PommeVerte/gremlin-php

Using with Neo4j : REST API

Older API : neo4jPHP, rexpro-php

No Gremlin implementation in PHP (yet?)

Page 48: Php in the graph (Gremlin 3)

<?php

require_once('vendor/autoload.php'); // depending on your project this may not be necessary

use \Brightzone\GremlinDriver\Connection;

$db = new Connection([
    'host' => 'localhost',
    'graph' => 'graph'
]);
$db->open();

$result = $db->send('g.V(2)');
// do something with result
$db->close();

Page 49: Php in the graph (Gremlin 3)

Apache TinkerPop

http://tinkerpop.incubator.apache.org/

Version : 3.2.3

Page 50: Php in the graph (Gremlin 3)

Database

Stardog

sqlg

Page 51: Php in the graph (Gremlin 3)

Gremlin

Server

Console

bin/gremlin-server.sh conf/gremlin-server-modern.yaml

:install org.apache.tinkerpop neo4j-gremlin 3.2.3-incubating

Page 52: Php in the graph (Gremlin 3)

Thanks

[email protected] @exakat

http://www.slideshare.net/dseguy/

Page 53: Php in the graph (Gremlin 3)

Leaf and Roots

LEAF

ROOT

g.V().where( out('CALLS') ) .count() => 407

g.V().where( __.in('CALLS').count().is(eq(0)) ) .count() => 1304
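
As a sketch of the idea, assume a leaf is a function that calls no other function and a root is a function that nobody calls. On a small list of [caller, callee] pairs, both sets are simple differences. The four functions come from the "Who's calling?" slide; the reduction to three edges is an assumption.

```php
<?php
$functions = ['reset_password', 'wp_set_password', 'wp_hash_password', 'wp_cache_delete'];

// [caller, callee] pairs
$calls = [
    ['reset_password',  'wp_set_password'],
    ['wp_set_password', 'wp_hash_password'],
    ['wp_set_password', 'wp_cache_delete'],
];

$callers = array_unique(array_column($calls, 0)); // functions with an outgoing CALLS edge
$callees = array_unique(array_column($calls, 1)); // functions with an incoming CALLS edge

// Leaves: no outgoing CALLS edge
$leaves = array_values(array_diff($functions, $callers));

// Roots: no incoming CALLS edge
$roots = array_values(array_diff($functions, $callees));

echo 'leaves: ', implode(', ', $leaves), "\n";
echo 'roots: ', implode(', ', $roots), "\n";
```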