PHP in the graph (Gremlin 3)
PHP in the graph
FOSDEM 2017, Brussels, Belgium
Agenda
Discover the Graph
Steps with a gremlin
Gremlin and PHP
Speaker
Damien Seguy
CTO at Exakat
Static analysis for PHP
PHP code as Dataset
WordPress call graph
<?php
function _default_wp_die_handler( $message, $title = '', $args = array() ) {
    $defaults = array( 'response' => 500 );
    $r = wp_parse_args($args, $defaults);
    $have_gettext = function_exists('__');
    if ( function_exists( 'is_wp_error' ) && is_wp_error( $message ) ) {
        if ( empty( $title ) ) {
            $error_data = $message->get_error_data();
            if ( is_array( $error_data ) && isset( $error_data['title'] ) )
                $title = $error_data['title'];
[Figure: WordPress call graph]
What is Gremlin?
A domain-specific language for graphs
A programming language for traversing a graph
Open source and vendor-agnostic
Simple traversal
V : Vertices, or nodes or objects
E : Edges, or links or relations
G : graph, or the dataset
The Vertices
g represents the graph or the gremlin
g.V() represents all the vertices
g.V(1) is one of the vertices
Vertices always have an id
g.V(1) => v(1)
Properties
Graph is schemaless
g.V(1) => v(1)
g.V(1).values('name') => apply_filter
g.V(1).values('leaf') => true
g.V(1).values('compat') => ['7.0', '7.1']
g.V(1).id() => 1
// non-existing properties
g.V(1).values('isFedAfterMidnight') => null
Vertex discovery
Use valueMap() to discover the graph
It returns every property except id and label
g.V(2).valueMap() => {name=wp_die, leaf=true, glob}
g.V(2).valueMap('name') => {name=wp_die}
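To make the property lookups above concrete, here is a minimal sketch in plain Python. It is a model of what values() and valueMap() return, not the Gremlin API: the `vertices` dictionary and both helper functions are hypothetical, and the missing-property case mirrors the null shown on the slide.

```python
# Hypothetical in-memory model of two vertices; this is not the Gremlin API.
vertices = {
    1: {"name": "apply_filter", "leaf": True, "compat": ["7.0", "7.1"]},
    2: {"name": "wp_die", "leaf": True},
}

def values(vertex_id, key):
    """Mimic g.V(id).values(key); a missing property yields None."""
    return vertices[vertex_id].get(key)

def value_map(vertex_id, *keys):
    """Mimic g.V(id).valueMap(keys...): every property, or only the listed ones."""
    props = vertices[vertex_id]
    if not keys:
        return dict(props)
    return {k: props[k] for k in keys if k in props}

print(values(1, "name"))                # apply_filter
print(values(1, "isFedAfterMidnight"))  # None
print(value_map(2, "name"))             # {'name': 'wp_die'}
```

Because the graph is schemaless, nothing has to be declared before a vertex carries a new property; the dictionary model behaves the same way.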
The Edges
g.E() represents all the edges
g.E(1) is the edge with id 1
Edges have an id and properties
also: start, end and label
Edge discovery
g.E(1) => e(1)
g.E(1).label() => 'CALLS'
g.E(1).id() => 1
g.E(1).values('count') => 3
g.E(1).valueMap() => {count=3}
[Diagram: wp_die and wp_ajax_fetch_list linked by a CALLS edge]
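Exiting the Edges
Edges link the vertices
outV() is the vertex an edge leaves, inV() is the vertex it enters
g.E(5).inV() => v(2)
g.E(6).inV() => v(3)
g.E(7).outV() => v(4)
[Diagram: vertices 1 to 4 linked by edges 5, 6 and 7]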
Following Edges
Directed graph
g.V(1).out() => v(2) v(3)
g.V(1).in() => v(4)
g.V(1).both() => v(2) v(3) v(4)
Chaining
g.V(1).inE() => e(7)
g.V(1).out().id() => 2 3
g.V(2).in().in().id() => 4
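The navigation steps can be modelled on a tiny directed graph. This is a sketch in plain Python, not Gremlin: the edge layout (5: 1 to 2, 6: 1 to 3, 7: 4 to 1) is inferred from the out(), in() and inE() results on the slides, and the helper names are made up for the illustration.

```python
# Edge id -> (source vertex, target vertex); layout inferred from the slide outputs.
edges = {5: (1, 2), 6: (1, 3), 7: (4, 1)}

def out(vs):
    """Follow outgoing edges from each vertex in vs, like .out()."""
    return [dst for v in vs for (src, dst) in edges.values() if src == v]

def in_(vs):
    """Follow incoming edges back to their source, like .in()."""
    return [src for v in vs for (src, dst) in edges.values() if dst == v]

def both(vs):
    """Neighbours in either direction, like .both()."""
    return out(vs) + in_(vs)

def in_e(vs):
    """Ids of the edges entering each vertex, like .inE()."""
    return [eid for v in vs for eid, (src, dst) in edges.items() if dst == v]

print(out([1]))       # [2, 3]
print(in_([1]))       # [4]
print(both([1]))      # [2, 3, 4]
print(in_e([1]))      # [7]
print(in_(in_([2])))  # [4]
```

Chaining is plain function composition here: each step takes the list produced by the previous one, which is how a traversal moves a set of positions through the graph.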
WordPress Calls Graph
The graph of all WordPress internal function calls
[Diagram: function:name --CALLS--> function:name]
Who's calling?
g.V(19).out('CALLS').values('name') => wp_hash_password wp_cache_delete
g.V(19).in('CALLS').values('name') => reset_password wp_check_password
[Diagram: wp_set_password, reset_password, wp_hash_password, wp_check_password and wp_cache_delete linked by CALLS edges]
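The two queries differ only in the direction they follow the CALLS label. A plain-Python sketch of that idea follows; it assumes vertex 19 is wp_set_password (suggested by the diagram, not stated on the slide), and every other id is invented for the example.

```python
# Vertex id -> function name. Only id 19 comes from the slides; the rest are hypothetical.
names = {
    19: "wp_set_password",
    20: "wp_hash_password",
    21: "wp_cache_delete",
    22: "reset_password",
    23: "wp_check_password",
}

# (source, label, target) triples, assumed from the diagram.
edges = [
    (22, "CALLS", 19),
    (23, "CALLS", 19),
    (19, "CALLS", 20),
    (19, "CALLS", 21),
]

def out_names(vertex, label):
    """Like g.V(vertex).out(label).values('name'): who does it call?"""
    return [names[dst] for src, lbl, dst in edges if src == vertex and lbl == label]

def in_names(vertex, label):
    """Like g.V(vertex).in(label).values('name'): who calls it?"""
    return [names[src] for src, lbl, dst in edges if dst == vertex and lbl == label]

print(out_names(19, "CALLS"))  # ['wp_hash_password', 'wp_cache_delete']
print(in_names(19, "CALLS"))   # ['reset_password', 'wp_check_password']
```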
Is it Recursive?
g.V(30).as('myself')
  .out('CALLS')
  .retain('myself')
  .values('name') => get_category_parents
[Diagram: get_permalink, get_category_link and get_category_parents (id: 30) linked by CALLS edges]
Is it Recursive?
g.V(30).as('myself')
  .in('CALLS')
  .retain('myself')
  .values('name') => get_category_parents
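The recursion test boils down to one question: after a single CALLS hop, is the starting function among the results? A rough plain-Python equivalent is sketched below. The `calls` mapping is hypothetical; only the three function names come from the slide, and the directions between them are assumed.

```python
# Caller -> list of callees. The edges are assumed for the illustration.
calls = {
    "get_permalink": ["get_category_parents"],
    "get_category_link": ["get_category_parents"],
    "get_category_parents": ["get_category_link", "get_category_parents"],
}

def is_recursive(name):
    """Mimic as('myself').out('CALLS').retain('myself'): keep the hop only if it lands on the start."""
    return name in calls.get(name, [])

def recursive_functions():
    """Apply the same test to every function in the graph."""
    return sorted(name for name in calls if is_recursive(name))

print(is_recursive("get_category_parents"))  # True
print(is_recursive("get_permalink"))         # False
print(recursive_functions())                 # ['get_category_parents']
```

Following the edge backwards with in('CALLS') gives the same answer, since a self-loop looks identical from both ends.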
Ping-Pong Functions
g.V(47).as('myself')
  .out('CALLS').except('myself')
  .out('CALLS').retain('myself')
  .values('name') => wp_trash_comment wp_delete_comment
[Diagram: wp_trash_comment (id: 47) and wp_delete_comment (id: 148) linked by CALLS edges in both directions]

One more hop:
g.V(47).as('myself')
  .out('CALLS').except('myself')
  .out('CALLS').except('myself')
  .out('CALLS').retain('myself')
  .values('name')

And one more:
g.V(47).as('myself')
  .out('CALLS').except('myself')
  .out('CALLS').except('myself')
  .out('CALLS').except('myself')
  .out('CALLS').retain('myself')
  .values('name')

The same with a loop:
g.V(47).as('myself')
  .repeat( out('CALLS').except('myself') ).emit().times(3)
  .out('CALLS').retain('myself')
  .values('name')
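The repeat() version can be read as: walk one to three hops away without stepping back on the start, and after each hop check whether one more call leads home. The following plain-Python sketch models that reading; it is not Gremlin, and the `calls` data beyond the two comment functions is hypothetical.

```python
# Caller -> callees. Only the wp_*_comment pair comes from the slides.
calls = {
    "wp_trash_comment": ["wp_delete_comment"],
    "wp_delete_comment": ["wp_trash_comment"],
    "a": ["b"], "b": ["c"], "c": ["a"],  # hypothetical three-function cycle
    "d": ["e"], "e": [],                 # hypothetical chain without a cycle
}

def ping_pong(calls, start, max_hops=3):
    """True if start is reached again after passing through 1..max_hops other functions."""
    frontier = {start}
    for _ in range(max_hops):
        # One repeat() iteration: follow CALLS and drop the start itself (except).
        frontier = {
            callee
            for caller in frontier
            for callee in calls.get(caller, [])
            if callee != start
        }
        # emit() plus the final hop: does anything reached so far call the start (retain)?
        if any(start in calls.get(caller, []) for caller in frontier):
            return True
    return False

print(ping_pong(calls, "wp_trash_comment"))  # True
print(ping_pong(calls, "a", max_hops=1))     # False
print(ping_pong(calls, "a", max_hops=2))     # True
print(ping_pong(calls, "d"))                 # False
```

Direct recursion is deliberately not reported here, because the first hop already excludes the start; that case is covered by the one-hop test.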
Up to now
vertices and edges: basic building blocks
in() and out() (and both()): navigation
except(), retain(), in('label'): filtering
repeat(): loops
Starting at the vertices
Traversing the graph
Filtering on edges
g.V().out('CALL') => v(25);
g.V().out('CALL', 'CALLED', 'XXX') => v(25);
Filtering on vertices
g.V().has('name')
g.V().has('name','wp_die')
g.V().has('name', neq('wp_die'))
g.V().has('name', within('wp_die', 'wp_header'))
g.V().has('name', without('wp_die', 'wp_header'))
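The has() variants above are a filter plus a predicate. As an illustration only, here is how the same predicates could be modelled in plain Python; the helper names copy the Gremlin ones but are ordinary functions defined here, and the vertex list is hypothetical.

```python
# Hypothetical vertices; the last one has no 'name' property at all.
vertices = [
    {"name": "wp_die"},
    {"name": "wp_header"},
    {"name": "esc_html"},
    {"label": "anonymous"},
]

def eq(value):
    return lambda x: x == value

def neq(value):
    return lambda x: x != value

def within(*allowed):
    return lambda x: x in allowed

def without(*refused):
    return lambda x: x not in refused

def has(items, key, predicate=None):
    """Keep the items that carry the key and, if a predicate is given, satisfy it."""
    kept = [v for v in items if key in v]
    if predicate is not None:
        kept = [v for v in kept if predicate(v[key])]
    return [v[key] for v in kept]

print(has(vertices, "name"))                                  # ['wp_die', 'wp_header', 'esc_html']
print(has(vertices, "name", eq("wp_die")))                    # ['wp_die']
print(has(vertices, "name", neq("wp_die")))                   # ['wp_header', 'esc_html']
print(has(vertices, "name", within("wp_die", "wp_header")))   # ['wp_die', 'wp_header']
print(has(vertices, "name", without("wp_die", "wp_header")))  # ['esc_html']
```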
Dying Functions
[Diagram: ???? --CALLS--> wp_die]

Walking forward ends on wp_die itself:
g.V().out('CALLS')
  .has('name','wp_die')
  .values('name') => wp_die wp_die wp_die wp_die wp_die wp_die wp_die wp_die wp_die

Walking backward from wp_die gives the callers:
g.V().has('name','wp_die')
  .in('CALLS')
  .values('name') => wp_ajax_trash_post wp_ajax_delete_post wp_ajax_delete_meta wp_ajax_delete_link wp_ajax_delete_tag wp_ajax_delete_comment wp_ajax_oembed_cache wp_ajax_imgedit_preview wp_ajax_fetch_list

Counting:
g.V().out('CALLS').has('name','wp_die').count() => 84
g.V().has('name','wp_die').in('CALLS').count() => 84

Counting after dedup():
g.V().out('CALLS').has('name','wp_die').dedup().count() => 1
g.V().has('name','wp_die').in('CALLS').dedup().count() => 84
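Why the two counts agree before dedup() and diverge after it is easier to see on a small example. The sketch below uses plain Python lists in place of traversers; the edge list is a hypothetical three-caller extract, not the real 84.

```python
# (caller, callee) pairs; a hypothetical extract of the call graph.
edges = [
    ("wp_ajax_trash_post", "wp_die"),
    ("wp_ajax_delete_post", "wp_die"),
    ("wp_ajax_fetch_list", "wp_die"),
    ("wp_ajax_fetch_list", "esc_html"),
]

def dedup(items):
    """Drop repeats while keeping first-seen order, like dedup()."""
    return list(dict.fromkeys(items))

# g.V().out('CALLS').has('name','wp_die'): one result per edge, all landing on wp_die.
reached = [callee for caller, callee in edges if callee == "wp_die"]

# g.V().has('name','wp_die').in('CALLS'): one result per edge, landing on each caller.
callers = [caller for caller, callee in edges if callee == "wp_die"]

print(len(reached), len(dedup(reached)))  # 3 1
print(len(callers), len(dedup(callers)))  # 3 3
```

Both directions cross the same edges, so the raw counts match; only the vertex the traversal ends on differs, and that is what dedup() collapses.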
Sampling
g.V().limit(2).count() => 2
g.V().range(2,5).count() => 3
g.V().tail(4).count() => 4
g.V().coin(0.01).count() => 44
g.V().count() => 4373
g.E().count() => 55457
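The sampling steps have close counterparts in ordinary list handling. A minimal plain-Python sketch follows, with a made-up list of 100 vertex ids; coin() is random, so its result size is not shown.

```python
import random
from itertools import islice

vertices = list(range(1, 101))  # hypothetical vertex ids

def limit(items, n):
    """First n items, like limit(n)."""
    return list(islice(items, n))

def range_(items, low, high):
    """Items from position low (inclusive) to high (exclusive), like range(low, high)."""
    return list(islice(items, low, high))

def tail(items, n):
    """Last n items, like tail(n)."""
    items = list(items)
    return items[-n:] if n > 0 else []

def coin(items, probability, rng):
    """Keep each item with the given probability, like coin(p)."""
    return [item for item in items if rng.random() < probability]

print(len(limit(vertices, 2)))      # 2
print(len(range_(vertices, 2, 5)))  # 3
print(len(tail(vertices, 4)))       # 4

sample = coin(vertices, 0.5, random.Random())
print(len(sample) <= len(vertices))  # True
```

With coin(0.01) on 4373 vertices, a count around 44 is what one would expect on average, and it changes from run to run.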
Dying Functions
g.V().as('start')
  .out('CALLS')
  .has('name','wp_die')
  .select('start')
  .by('name') => wp_ajax_trash_post wp_ajax_delete_post wp_ajax_delete_meta wp_ajax_delete_link wp_ajax_delete_tag wp_ajax_delete_comment wp_ajax_oembed_cache wp_ajax_imgedit_preview wp_ajax_fetch_list
[Diagram: ???? --CALLS--> wp_die]
Naming nodes
as(): gives a name to the current node
select(): goes back to a named node
by(): chooses what to display
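One way to picture as(), select() and by(): the traversal remembers where it was when the name was given, keeps walking and filtering, then reports the remembered node instead of the current one. A plain-Python sketch under that reading, with hypothetical ids and edges:

```python
# Hypothetical vertex ids and CALLS edges.
names = {1: "wp_ajax_trash_post", 2: "wp_ajax_fetch_list", 3: "wp_die", 4: "esc_html"}
calls = [(1, 3), (2, 3), (2, 4)]

def callers_of(target_name):
    """Mimic g.V().as('start').out('CALLS').has('name', target).select('start').by('name')."""
    results = []
    for start, callee in calls:            # as('start'), then out('CALLS')
        if names[callee] == target_name:   # has('name', target_name)
            results.append(names[start])   # select('start').by('name')
    return results

print(callers_of("wp_die"))    # ['wp_ajax_trash_post', 'wp_ajax_fetch_list']
print(callers_of("esc_html"))  # ['wp_ajax_fetch_list']
```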
Filtering on vertices
// Functions that call wp_die and esc_html
[Diagram: ???? calls both wp_die and esc_html]

Going back and forth:
g.V().has('name','wp_die')
  .in('CALLS')
  .as('results')
  .out('CALLS')
  .has('name', 'esc_html')
  .select('results')

Replacing the first half with where():
g.V().where( __.out('CALLS').has('name','wp_die') )
  .as('results')
  .out('CALLS')
  .has('name', 'esc_html')
  .select('results')

Using where() for both conditions:
g.V().where( __.out('CALLS').has('name','wp_die') )
  .where( __.out('CALLS').has('name', 'esc_html') )

With an explicit count:
g.V().where( __.out('CALLS').has('name','wp_die') )
  .where( __.out('CALLS').has('name', 'esc_html').count().is(neq(0)) )
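where() tests a side traversal without moving, so two where() steps amount to two conditions on the same node. Here is a plain-Python sketch of that idea; the `calls` mapping is hypothetical and `hypothetical_render` is an invented function name.

```python
# Caller -> set of callees; invented data for the illustration.
calls = {
    "wp_ajax_fetch_list": {"wp_die", "esc_html"},
    "wp_ajax_trash_post": {"wp_die"},
    "hypothetical_render": {"esc_html"},
}

def calling_all(calls, *required):
    """Functions whose callees include every required name (one where() per name)."""
    wanted = set(required)
    return sorted(name for name, callees in calls.items() if wanted <= callees)

print(calling_all(calls, "wp_die", "esc_html"))  # ['wp_ajax_fetch_list']
print(calling_all(calls, "wp_die"))              # ['wp_ajax_fetch_list', 'wp_ajax_trash_post']
```

The node under test never changes during the checks, which is why no as() and select() pair is needed in the where() form.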
Traversal full ahead
g.V().in/out() .has() .where() .as() .select().by().by()
Advanced
Closures
Applied to properties
Non standard functions
g.V().filter{ it.get().value('name') != it.get().value('name').toLowerCase() }
  .count() => 73
Steps often accept a closure
A closure is written between {} in Groovy and uses 'it.get()' as the current node
A closure should be replaced by a step, unless a special manipulation is needed
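The closure above keeps the functions whose name contains at least one uppercase letter. The same test, written as a plain-Python predicate over a hypothetical list of names (the two mixed-case names are invented):

```python
# Hypothetical function names; the mixed-case ones are made up.
names = ["wp_die", "esc_html", "_WP_Editors_helper", "wp_parse_args", "Requests_bootstrap"]

def has_uppercase(name):
    """Same test as the Groovy closure: the name differs from its lowercase form."""
    return name != name.lower()

mixed_case = [name for name in names if has_uppercase(name)]

print(mixed_case)       # ['_WP_Editors_helper', 'Requests_bootstrap']
print(len(mixed_case))  # 2
```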
GroupBy/GroupCount
g.V().groupCount('a').by('name').cap('a')
==>[wp_die:22, wp_header:24…]
g.V().groupCount('a').by('name')
  .groupCount('b').by(out().count())
  .cap('a','b')
==>[a:[wp_die:22, wp_header:24], b:[22:1, 24:1]]
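groupCount() builds a histogram keyed by whatever by() computes. A small plain-Python sketch with collections.Counter shows the two groupings side by side; the vertex list and its callees are hypothetical.

```python
from collections import Counter

# Hypothetical vertices: (name, list of callees). Two vertices share a name on purpose.
vertices = [
    ("wp_die", []),
    ("wp_die", ["esc_html"]),
    ("wp_header", ["wp_die", "esc_html"]),
    ("esc_html", []),
]

# groupCount('a').by('name'): how many vertices carry each name.
by_name = Counter(name for name, callees in vertices)

# groupCount('b').by(out().count()): how many vertices have each number of outgoing calls.
by_out_degree = Counter(len(callees) for name, callees in vertices)

# cap('a','b'): return both side effects together.
result = {"a": dict(by_name), "b": dict(by_out_degree)}

print(result["a"])  # {'wp_die': 2, 'wp_header': 1, 'esc_html': 1}
print(result["b"])  # {0: 2, 1: 1, 2: 1}
```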
Gremlin and PHP
Gremlin For PHP
https://github.com/PommeVerte/gremlin-php
Using with Neo4j : REST API
Older API : neo4jPHP, rexpro-php
No Gremlin implementation in PHP (yet?)
<?php
require_once('vendor/autoload.php'); // depending on your project this may not be necessary

use \Brightzone\GremlinDriver\Connection;

$db = new Connection([
    'host' => 'localhost',
    'graph' => 'graph'
]);
$db->open();
$result = $db->send('g.V(2)');
// do something with result
$db->close();
Apache TinkerPop
http://tinkerpop.incubator.apache.org/
Version : 3.2.3
Database: StarDog, sqlg
Gremlin Server
bin/gremlin-server.sh conf/gremlin-server-modern.yaml
Gremlin Console
:install org.apache.tinkerpop neo4j-gremlin 3.2.3-incubating
Leaf and Roots
LEAF
g.V().where( out('CALLS') ) .count() => 407
ROOT
g.V().where( __.in('CALLS').count().is(eq(0)) ) .count() => 1304
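For a concrete picture, the sketch below assumes the usual call-graph meaning of the two words: a leaf is a function that calls nothing, and a root is a function that nothing calls. It is plain Python on a hypothetical four-function graph, not the query from the slide.

```python
# Hypothetical call graph: (caller, callee) pairs plus the full vertex set.
vertices = {"a", "b", "c", "d"}
calls = [("a", "b"), ("a", "c"), ("b", "c")]

callers = {caller for caller, callee in calls}
callees = {callee for caller, callee in calls}

# Leaf: no outgoing CALLS edge, i.e. out('CALLS').count() is 0.
leaves = sorted(vertices - callers)

# Root: no incoming CALLS edge, i.e. in('CALLS').count() is 0.
roots = sorted(vertices - callees)

print(leaves)  # ['c', 'd']
print(roots)   # ['a', 'd']
```

An isolated function such as `d` is both a leaf and a root under these definitions.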