Overview of Modern Graph Analysis Tools

Overview of Modern Graph Analysis Tools Keiichiro Ono Cytoscape Core Developer Team UC, San Diego Trey Ideker Lab / National Resource for Network Biology 5/24/2016 Ideker Lab Meeting

Transcript of Overview of Modern Graph Analysis Tools

Page 1: Overview of Modern Graph Analysis Tools

Overview of Modern Graph Analysis Tools

Keiichiro Ono

Cytoscape Core Developer Team

UC, San Diego Trey Ideker Lab / National Resource for Network Biology

5/24/2016 Ideker Lab Meeting

Page 2: Overview of Modern Graph Analysis Tools

Recap

Cytoscape Session File — for sharing results

But what about process?

Page 3: Overview of Modern Graph Analysis Tools

http://www.the-scientist.com/?articles.view/articleNo/43632/title/Get-With-the-Program/

https://theconversation.com/how-computers-broke-science-and-what-we-can-do-to-fix-it-49938

http://www.nature.com/nature/journal/v483/n7391/full/483531a.html

Reproducibility…it’s a known issue

Page 4: Overview of Modern Graph Analysis Tools

Data Preparation

Analysis

Visualization

Page 5: Overview of Modern Graph Analysis Tools

Advanced Users: Cytoscape for Interactive Visualization

R/Python for Data Manipulation / Analysis

Page 6: Overview of Modern Graph Analysis Tools

Lab Notebook for in silico Experiments

Page 7: Overview of Modern Graph Analysis Tools

Interactive Command-Line +

Markdown-based Documents

Page 8: Overview of Modern Graph Analysis Tools

Question

• Cytoscape is a desktop application

• Point & click GUI operation

• Easy to use, but how can we make our workflow reproducible?

Page 9: Overview of Modern Graph Analysis Tools

REST

Page 10: Overview of Modern Graph Analysis Tools

What is cyREST?

- Platform-independent, RESTful API module for Cytoscape

- Means you can access basic Cytoscape data objects programmatically

- Now it’s a Cytoscape Core feature!


Page 11: Overview of Modern Graph Analysis Tools

Get full network with unique ID 52 as JSON

GET http://localhost:1234/v1/networks/52
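The same request can be issued from a script. A minimal sketch using only the Python standard library; the helper names `network_url` and `fetch_network` are hypothetical, and the fetch assumes Cytoscape is running locally with cyREST listening on port 1234:

```python
import json
from urllib.request import urlopen

# Host, port, and API version as shown on the slide.
BASE_URL = "http://localhost:1234/v1"


def network_url(suid, base_url=BASE_URL):
    """Build the URL of the network with the given unique ID."""
    return f"{base_url}/networks/{suid}"


def fetch_network(suid, base_url=BASE_URL):
    """GET the full network as JSON (requires a running Cytoscape)."""
    with urlopen(network_url(suid, base_url)) as response:
        return json.loads(response.read().decode("utf-8"))


print(network_url(52))
```

Only `fetch_network(52)` touches the network; it fails with a connection error if Cytoscape is not running.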

Page 12: Overview of Modern Graph Analysis Tools

But, don’t use cyREST (directly)!

Page 13: Overview of Modern Graph Analysis Tools

Language-Specific Shims

For Python

For R

Page 14: Overview of Modern Graph Analysis Tools

RCy3

• R wrapper for cyREST

• Now a part of Bioconductor

• Easy to install

• Natural API for R users

Page 15: Overview of Modern Graph Analysis Tools

py2cytoscape

• Python wrapper for cyREST

• Supports high-level API

• Cytoscape.js viewer included

• Supports iOS/Android

Page 16: Overview of Modern Graph Analysis Tools

Example

Page 17: Overview of Modern Graph Analysis Tools

Creating an empty network with raw cyREST API
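The code from this slide is not in the transcript. As a rough sketch of what the raw call involves, assuming cyREST accepts a POST to `/v1/networks` with a Cytoscape.js-style JSON body (the endpoint, the body layout, and the helper names are assumptions, not taken from the slide):

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:1234/v1"


def build_create_request(name, base_url=BASE_URL):
    """Prepare (but do not send) the POST that creates an empty network."""
    # Assumed body layout: network attributes under "data",
    # empty node and edge lists under "elements".
    body = {"data": {"name": name}, "elements": {"nodes": [], "edges": []}}
    return Request(
        f"{base_url}/networks",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_empty_network(name, base_url=BASE_URL):
    """Send the request (requires a running Cytoscape) and return the reply."""
    with urlopen(build_create_request(name, base_url)) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_create_request("Empty network")
print(request.get_method(), request.full_url)
```

Even for an empty network, the caller handles URLs, headers, and JSON encoding by hand, which is the overhead the language-specific wrappers remove.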

Page 18: Overview of Modern Graph Analysis Tools

…and with py2cytoscape

Page 19: Overview of Modern Graph Analysis Tools

http://nbviewer.jupyter.org/gist/keiono/73da21846b6f73de70122bdb545c1c14

Page 20: Overview of Modern Graph Analysis Tools

https://github.com/cytoscape/cyREST/wiki/Running-your-workflow-in-the-clouds

Page 21: Overview of Modern Graph Analysis Tools

Now you have…

• Programmatic access to Cytoscape functions

• Notebooks to run your workflows

• Remote machines (clusters/clouds) for CPU-intensive tasks

Page 22: Overview of Modern Graph Analysis Tools

Graph Libraries as Analytic Engine for Cytoscape

Page 23: Overview of Modern Graph Analysis Tools

In-Memory Graph Analysis

N < millions

Page 24: Overview of Modern Graph Analysis Tools

NetworkX

Pros:

- Easy to install

- Supports most basic graph operations

Cons:

- Slow!

Page 25: Overview of Modern Graph Analysis Tools

igraph

Pros:

- Has a lot of analysis features: standard graph statistics, community detection, label propagation, etc.

- Fast (compared to NetworkX)

Cons:

- Weird API (for Python Users)

Page 26: Overview of Modern Graph Analysis Tools

graph-tool

Pros:

- Fast (optimized with C++)

- Nice visualization features

Cons:

- Hard to install

Page 27: Overview of Modern Graph Analysis Tools

Parallel Graph Analytics (PGX)

- Oracle’s experimental project

- Many unknowns because it is an early experimental release, but it has lots of features, just like igraph

Page 28: Overview of Modern Graph Analysis Tools

Don’t use NetworkX for large data sets…

Page 29: Overview of Modern Graph Analysis Tools

FYI: GPU-Based Layouts

~100x faster

Page 30: Overview of Modern Graph Analysis Tools

Out-of-Core Graph Analysis

N > billions

Page 31: Overview of Modern Graph Analysis Tools

GraphX

• Part of Apache Spark Project

• Industry Standard

• Lots of documentation and support from the community

• You can use Python and R, but in the Spark world, Scala is still the first-class citizen…

End-to-end PageRank performance (20 iterations, 3.7B edges)
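For reference, the workload behind this benchmark is iterative PageRank. A small pure-Python sketch of 20 power iterations on a toy edge list; this is not GraphX's implementation, and the damping factor of 0.85 and the even redistribution of rank from nodes without outgoing edges are assumptions:

```python
def pagerank(edges, iterations=20, damping=0.85):
    """Run a fixed number of PageRank iterations on a directed edge list."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n
            for node in nodes
        }
        # Every other node splits its rank equally among its targets.
        for src in nodes:
            targets = out_links[src]
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        rank = new_rank
    return rank


toy_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")]
ranks = pagerank(toy_edges)
print(sorted(ranks, key=ranks.get, reverse=True))
```

Each iteration touches every edge once, so at 3.7B edges the data no longer fits the in-memory tools above and the work has to be distributed.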

Page 32: Overview of Modern Graph Analysis Tools

GraphLab Create

• Commercial Service by Dato

• High-level API and data structure

• SFrame/SGraph

• Their version of scalable-DataFrames

• (Semi) automatic parallel processing

Page 33: Overview of Modern Graph Analysis Tools

Neo4j v3

- Focuses on storing arbitrarily large graph data (billions of nodes/edges)

- Has some analysis features

- Now natively supports Python

Page 34: Overview of Modern Graph Analysis Tools

Summary

• Don’t use NetworkX unless it’s necessary!

• Don’t use the raw cyREST API if you are a Python/R user

• There are lots of new graph analysis tools

• Some of them are a bit hard to install / set up

• Candidates for CI services (?)

• We deploy them to servers, and you can access them through a simple API