Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
Transcript of Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph
![Page 1: Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph](https://reader031.fdocuments.us/reader031/viewer/2022030216/58880ed01a28ab083c8b49a7/html5/thumbnails/1.jpg)
http://blazegraph.com/
Global Knowledge Collaboration to Cure Cancer: How GPUs Impact Graph & Predictive Analytics
Database Camp, July 10, 2016
Brad Bebee, CEO, Blazegraph
![Page 2: Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph](https://reader031.fdocuments.us/reader031/viewer/2022030216/58880ed01a28ab083c8b49a7/html5/thumbnails/2.jpg)
Exploding Data Volumes Require a New Approach for Relationship Insight
SYSTAP™, LLC. © 2006-2015 All Rights Reserved
Graph databases are designed to analyze diverse entities and relationships. Today’s datasets have billions of edges and nodes.
A (very small) Knowledge Graph
![Page 3: Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph](https://reader031.fdocuments.us/reader031/viewer/2022030216/58880ed01a28ab083c8b49a7/html5/thumbnails/3.jpg)
Powers Business with Billion-Edge Datasets
Information Management Retrieval
Industrial IoT
Cyber Defense Intelligence
Financial Services Fraud Detection
Life Sciences Precision Medicine
Blazegraph Vision: The Scalable Solution for Graphs
• 500+ weekly downloads
• Thousands of active deployments
Blazegraph
• Enterprise High Availability
• Enterprise GPU Acceleration
• Embedded and Single Server Deployments
![Page 4: Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph](https://reader031.fdocuments.us/reader031/viewer/2022030216/58880ed01a28ab083c8b49a7/html5/thumbnails/4.jpg)
Precision Medicine and Hospitals
• 5,686 hospitals in the United States
• Top 10% have 100B+ edge graph problems
• 256 K80 GPUs for a 100B-edge graph integration application
• Syapse Precision Medicine video
5-10B Edge Graph Problems
![Page 5: Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph](https://reader031.fdocuments.us/reader031/viewer/2022030216/58880ed01a28ab083c8b49a7/html5/thumbnails/5.jpg)
Finding the Next Cure for Cancer is a Billion+ Edge Graph Challenge
Source: Shin, Dmitriy et al. "Uncovering influence links in molecular knowledge networks to streamline personalized medicine." Journal of Biomedical Informatics, Volume 52, 394-405.
![Page 6: Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph](https://reader031.fdocuments.us/reader031/viewer/2022030216/58880ed01a28ab083c8b49a7/html5/thumbnails/6.jpg)
Graphs Enable People to Find Knowledge
A Bunch of Pages! An Answer
![Page 7: Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph](https://reader031.fdocuments.us/reader031/viewer/2022030216/58880ed01a28ab083c8b49a7/html5/thumbnails/7.jpg)
Graph Analytic Applications: Cyber Defense Example
• Path algorithms (BFS, SSSP, APSP, CC) and their extensions (closeness, betweenness, etc.)
• Ranking algorithms (cardinality, PageRank, BadRank, SecureRank)
• Clustering algorithms (canopy, k-means, Jaccard, community detection, etc.)
• Many others
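To make the path-algorithm bullet concrete, here is a minimal sketch of single-source shortest paths (SSSP) using Dijkstra's algorithm in plain Python. The host names and link weights are invented for the example, and this is not Blazegraph code; it only illustrates the kind of computation the slide refers to.

```python
import heapq

def sssp(graph, source):
    """Dijkstra's algorithm: shortest distance from source to every reachable node.

    graph maps node -> list of (neighbor, non-negative weight).
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for neighbor, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist

# Hypothetical network: hosts as nodes, link costs as weights.
network = {
    "gateway": [("web", 1), ("db", 4)],
    "web": [("db", 2), ("cache", 5)],
    "db": [("cache", 1)],
    "cache": [],
}
distances = sssp(network, "gateway")
```

Extensions such as closeness centrality can be derived from these distances by running the same routine from each node.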
![Page 8: Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph](https://reader031.fdocuments.us/reader031/viewer/2022030216/58880ed01a28ab083c8b49a7/html5/thumbnails/8.jpg)
http://bit.ly/1UdP2Sn
Blazegraph Stands Out!
Wikimedia Evaluation:
SYSTAP, LLC DBA Blazegraph. © 2006-2016 All Rights Reserved
![Page 9: Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph](https://reader031.fdocuments.us/reader031/viewer/2022030216/58880ed01a28ab083c8b49a7/html5/thumbnails/9.jpg)
Blazegraph™: Embedded and Single Server
• High performance, scalable
  – 50B edges/node
  – RDF/SPARQL level query language
  – Efficient graph traversal
  – High 9s solution
• Property graphs
  – Blueprints, Gremlin, Rexster
• REST API (NSS)
• Extension points
  – Stored queries for custom application logic on the server
  – Custom services & indices
  – Custom functions
  – Vertex-centric programs
[Diagram: Embedded Server and Standalone Server deployments; component labels: JVM, Journal, WAR, Journal]
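As a rough illustration of talking to the REST API from a client, the sketch below builds a SPARQL 1.1 Protocol POST request with the Python standard library. The endpoint URL (host, port, and the `kb` namespace) is an assumption about a local default installation, not something stated on the slide; adjust it to your deployment. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint: change host, port, and namespace to match your server.
ENDPOINT = "http://localhost:9999/blazegraph/namespace/kb/sparql"

def build_sparql_request(query, endpoint=ENDPOINT):
    """Build (but do not send) a SPARQL 1.1 Protocol query via form-encoded POST."""
    body = urlencode({"query": query}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )

request = build_sparql_request("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10")
# To run it against a live server: urllib.request.urlopen(request)
```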
![Page 10: Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph](https://reader031.fdocuments.us/reader031/viewer/2022030216/58880ed01a28ab083c8b49a7/html5/thumbnails/10.jpg)
Blazegraph™: High Availability
• Shared nothing architecture
  – Same data on each node
  – Coordinate only at commit
  – Transparent load balancing
• Scaling
  – 50 billion triples or quads
  – Query throughput scales linearly
• Self healing
  – Automatic failover
  – Automatic resync after disconnect
  – Online single node disaster recovery
• Online backup
  – Online snapshots (full backups)
  – HA logs (incremental backups)
• Point in time recovery (offline)
[Diagram: three HAService nodes in a quorum (k=3, size=3), labeled leader and follower]
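The diagram's k=3 can be read as a replication factor. Assuming a simple-majority quorum rule (the slide does not spell out the rule, so treat this as an assumption), the arithmetic looks like this:

```python
def quorum_size(k):
    """Smallest number of services that forms a majority of k replicas."""
    return k // 2 + 1

def is_quorum_met(k, joined):
    """True when enough services have joined to form a majority."""
    return joined >= quorum_size(k)

# With k=3, two joined services are enough, so one node can fail.
needed = quorum_size(3)
```

Under that assumption a three-node cluster keeps accepting commits with one node down, which matches the "automatic failover" bullet.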
![Page 11: Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph](https://reader031.fdocuments.us/reader031/viewer/2022030216/58880ed01a28ab083c8b49a7/html5/thumbnails/11.jpg)
The Billion-Edge Graph Challenge: Scaling Up Requires the Right Paradigm and Hardware
[Figure: CPU Cache Access Latencies in Clock Cycles. Bar chart of access latency for L1, L2, and L3 cache (sequential, in-page random, and full random access) and main memory; values range from roughly 4 clock cycles (L1 cache) to 167 (main memory). Source: https://datatake.files.wordpress.com/2015/09/latency.png]
Graph Cache Thrash: The CPU just waits for graph data from main memory...
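A back-of-the-envelope model shows why cache misses dominate graph traversal. The two latencies are read from the chart (about 4 cycles for L1, 167 for main memory); the hit rates are hypothetical values chosen only to contrast a sequential scan with pointer-chasing over a large graph.

```python
L1_CYCLES = 4        # approximate L1 latency from the chart
MEMORY_CYCLES = 167  # main-memory latency from the chart

def average_access_cycles(cache_hit_rate):
    """Expected cycles per access in a two-level model: L1 hit or main-memory miss."""
    return cache_hit_rate * L1_CYCLES + (1.0 - cache_hit_rate) * MEMORY_CYCLES

sequential_scan = average_access_cycles(0.95)  # hypothetical hit rate
graph_traversal = average_access_cycles(0.10)  # hypothetical hit rate
slowdown = graph_traversal / sequential_scan
```

With these assumed hit rates the traversal spends more than ten times as many cycles per access as the scan, most of them waiting on memory.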
![Page 12: Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph](https://reader031.fdocuments.us/reader031/viewer/2022030216/58880ed01a28ab083c8b49a7/html5/thumbnails/12.jpg)
Blazegraph Multi-GPU: Extreme Scale
Traverse 1B Edges/Sec (GTEP) 40x more affordably!
1 GTEP = 1 Billion Traversed Edges Per Second
Cost per GTEP:
• Large Hadoop cluster: ~$18M per GTEP
• Cray XMT-2: ~$180K per GTEP
• Blazegraph with GPU clusters: $16K per GTEP (K40), $4K per GTEP (Pascal)
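The cost figures can be compared directly. The sketch below just divides the per-GTEP prices quoted on the slide; the slide does not say which pairing its "40x" headline refers to, so the pairings chosen here are illustrative.

```python
# Cost per GTEP in US dollars, as quoted on the slide.
COST_PER_GTEP_USD = {
    "Large Hadoop cluster": 18_000_000,
    "Cray XMT-2": 180_000,
    "Blazegraph GPU cluster (K40)": 16_000,
    "Blazegraph GPU cluster (Pascal)": 4_000,
}

def cost_ratio(baseline, candidate):
    """How many times cheaper the candidate is per GTEP than the baseline."""
    return COST_PER_GTEP_USD[baseline] / COST_PER_GTEP_USD[candidate]

cray_vs_pascal = cost_ratio("Cray XMT-2", "Blazegraph GPU cluster (Pascal)")
cray_vs_k40 = cost_ratio("Cray XMT-2", "Blazegraph GPU cluster (K40)")
hadoop_vs_k40 = cost_ratio("Large Hadoop cluster", "Blazegraph GPU cluster (K40)")
```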
![Page 13: Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph](https://reader031.fdocuments.us/reader031/viewer/2022030216/58880ed01a28ab083c8b49a7/html5/thumbnails/13.jpg)
Enabling GPU Acceleration without Code Changes
With familiar graph APIs: 200-300x acceleration with almost no code changes.
[Diagram: Graph DB, Blazegraph GPU Acceleration, NVIDIA Tesla GPU]
Blazegraph Plug-in for GPU Acceleration Solves “Graph Cache Thrash”
Funded by and co-developed with:
![Page 14: Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph](https://reader031.fdocuments.us/reader031/viewer/2022030216/58880ed01a28ab083c8b49a7/html5/thumbnails/14.jpg)
Just Add GPUs: Testing Shows 200-300x Speed-up
LUBM Query #9, U1000 (167,697,263 edges w/ inference)
• Graph database on Blazegraph CPU: 172,632 results in 53,960 ms
• Graph database on Blazegraph GPU: 172,632 results in 187 ms
• Speed-up: 290X
![Page 15: Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph](https://reader031.fdocuments.us/reader031/viewer/2022030216/58880ed01a28ab083c8b49a7/html5/thumbnails/15.jpg)
GPUs 700x-1800x Faster for Graphs Compared to Apache Spark on 700M and 1.9B Edge Graphs
[Figure: Speed-up over baseline Spark CPU configuration (higher is faster); Spark CPU baseline = 1. Bar values shown: 1403, 1843, 700. Configurations compared:
• 700M edges: single node Xeon 2650 (128G) vs 2 K80 (48G)
• 1.9B edges: 16 EC2 r3.xlarge (488G) vs 16 K40s (192G)
• 1.9B edges: 16 EC2 r3.4xlarge (1952G) vs 16 K40s (192G)]
![Page 16: Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph](https://reader031.fdocuments.us/reader031/viewer/2022030216/58880ed01a28ab083c8b49a7/html5/thumbnails/16.jpg)
Did you hear the one about the Analytics Developer who is also a Parallel Programming expert?
Need: Simpler, smarter algorithms that run efficiently on GPUs without requiring knowledge of parallel programming or device-specific optimization.
Algorithm Developer Parallel Programmer
![Page 17: Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph](https://reader031.fdocuments.us/reader031/viewer/2022030216/58880ed01a28ab083c8b49a7/html5/thumbnails/17.jpg)
Blazegraph DASL for GPU Acceleration
Blazegraph DASL – A Domain specific language for graphs with Accelerated Scala using Linear algebra
…you can see why we call it “DASL”
Automation Enabling Analytics Coders to Optimize for GPU
Delivering ease of use of Spark and Scala for graph algorithms and predictive analytics… your analytic code can work on GPUs and Blazegraph.
[Diagram: DASL Graph Algorithms, DASL Translator, DASL Executor Multi-GPU Extension]
…with the speed of CUDA…
…Graph and Machine Learning Algorithms…
![Page 18: Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph](https://reader031.fdocuments.us/reader031/viewer/2022030216/58880ed01a28ab083c8b49a7/html5/thumbnails/18.jpg)
DASL your analytics
• DASL is a functional, domain-specific language (DSL) for graph and machine learning algorithms
• Provides linear algebra primitives supporting the DSL, designed for efficient multi-GPU execution
• Reduces the barrier to running graphs on GPUs, enabling simpler and smarter algorithms
• Co-opts existing data ecosystems, such as Apache Spark and Hadoop, for ease of adoption and mission impact
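To show what "graph algorithms as linear algebra" means in practice, here is a sketch of breadth-first search written as repeated sparse matrix-vector products over a boolean semiring. It is plain Python, not DASL syntax (which these slides do not show), and the function and variable names are made up for the illustration.

```python
def bfs_levels(adjacency, source):
    """Level-synchronous BFS expressed as sparse matrix-vector products.

    adjacency[i] is the set of column indices j with A[i][j] = 1 (edge i -> j).
    The frontier is a sparse boolean vector stored as a set of indices.
    """
    levels = {source: 0}
    frontier = {source}
    depth = 0
    while frontier:
        depth += 1
        # next = A^T * frontier over the (OR, AND) semiring:
        # OR together the adjacency rows selected by the frontier.
        next_frontier = set()
        for i in frontier:
            next_frontier |= adjacency.get(i, set())
        # Mask out vertices that already have a level assigned.
        next_frontier.difference_update(levels)
        for j in next_frontier:
            levels[j] = depth
        frontier = next_frontier
    return levels

graph = {0: {1, 2}, 1: {3}, 2: {3}, 3: {4}, 4: set()}
levels = bfs_levels(graph, 0)
```

Each loop iteration is one bulk operation over the whole frontier, which is the shape of computation that maps well onto GPU hardware.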
[Diagram: Graph and Machine Learning Algorithms, 1000X; DASL Graph Algorithms, DASL Translator, DASL Executor Multi-GPU Extension]
![Page 19: Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph](https://reader031.fdocuments.us/reader031/viewer/2022030216/58880ed01a28ab083c8b49a7/html5/thumbnails/19.jpg)
Anatomy of a DASL algorithm
![Page 20: Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph](https://reader031.fdocuments.us/reader031/viewer/2022030216/58880ed01a28ab083c8b49a7/html5/thumbnails/20.jpg)
GPU-Accelerated DASL Algorithms
Graph
• Hierarchical partitioning/graph coarsening
• Louvain modularity
• Jaccard Similarity
• Triangle counting
Collaborative Filtering (Recommendation Systems)
• User-based, Item-based
• Matrix Factorization with ALS (alternating least squares)
• Weighted Matrix Factorization, SVD+
Supervised Neural Network Techniques
• Hidden Markov Models
• Multilayer Perceptron
Clustering
• Canopy, k-Means, Fuzzy k-Means, Streaming k-Means
• Spectral Clustering
Dimensionality Reduction
• Singular Value Decomposition (top-k)
• Lanczos Algorithm
• Stochastic SVD
• QR Decomposition
• Non-Negative matrix factorization (NNMF)
• Distance Geometry
…and many others!
![Page 21: Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph](https://reader031.fdocuments.us/reader031/viewer/2022030216/58880ed01a28ab083c8b49a7/html5/thumbnails/21.jpg)
Stay in Touch Brad Bebee, CEO
[email protected] http://blazegraph.com
@blazegraph
![Page 22: Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph](https://reader031.fdocuments.us/reader031/viewer/2022030216/58880ed01a28ab083c8b49a7/html5/thumbnails/22.jpg)
Case Study - NetFlow Data Analysis
• What is NetFlow data?
  – NetFlow is a network protocol developed by Cisco for collecting IP traffic information and monitoring network traffic. http://www.solarwinds.com/what-is-netflow.aspx
• Sample NetFlow record
  {"start_time":"2007-08-01 14:31:02.946","duration":0.000,"src_addr":"122.166.71.110","dst_addr":"9.155.118.136","src_port":10822,"dst_port":13567,"protocol":"UDP","tcp_flags":"......","input_packets":1,"input_bytes":46}
• How is NetFlow collected?
  – Generated by routers and forwarded to collection point
  – Processed out of pcap records using tools such as nfcapd/nfdump
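A NetFlow record maps naturally onto a graph edge: source address to destination address, with the remaining fields as edge properties. The sketch below parses the sample record from the slide with the standard `json` module; the choice of which fields to keep as properties and the name `flow_to_edge` are illustrative assumptions.

```python
import json

# The sample record from the slide, with straight quotes.
RECORD = (
    '{"start_time":"2007-08-01 14:31:02.946","duration":0.000,'
    '"src_addr":"122.166.71.110","dst_addr":"9.155.118.136",'
    '"src_port":10822,"dst_port":13567,"protocol":"UDP",'
    '"tcp_flags":"......","input_packets":1,"input_bytes":46}'
)

def flow_to_edge(record_json):
    """Turn one NetFlow record into (source vertex, target vertex, edge properties)."""
    flow = json.loads(record_json)
    properties = {
        "protocol": flow["protocol"].strip(),
        "packets": flow["input_packets"],
        "bytes": flow["input_bytes"],
        "start_time": flow["start_time"],
    }
    return flow["src_addr"], flow["dst_addr"], properties

src, dst, props = flow_to_edge(RECORD)
```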
![Page 23: Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph](https://reader031.fdocuments.us/reader031/viewer/2022030216/58880ed01a28ab083c8b49a7/html5/thumbnails/23.jpg)
Graph Algorithms Applicable to NetFlow
• Path algorithms (BFS, SSSP, APSP, CC) and their extensions (closeness, betweenness, etc.)
• Ranking algorithms (cardinality, PageRank, BadRank, SecureRank)
• Clustering algorithms (canopy, k-means, Jaccard, community detection, etc.)
• Many others!
![Page 24: Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph](https://reader031.fdocuments.us/reader031/viewer/2022030216/58880ed01a28ab083c8b49a7/html5/thumbnails/24.jpg)
Challenges of Network Flow Analysis
• The volume of NetFlow packets could not previously be efficiently analyzed or visualized to identify threats in real time
– One hour of collection on a modest network can easily generate more than 100M NetFlow records
– Visualizing this data to identify points of interest is nearly impossible due to the “hairball effect”, even after time windowing the data
– These issues largely led to attempts at batch processing the data on large clusters of machines using Hadoop and/or Spark
– Pushing the problem into the batch space puts organizations in the unenviable position of hoping to identify why and by whom their network was attacked last month rather than being able to identify an attack as it is occurring.
![Page 25: Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph](https://reader031.fdocuments.us/reader031/viewer/2022030216/58880ed01a28ab083c8b49a7/html5/thumbnails/25.jpg)
Initial Graph Filtered by PageRank using DASL
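The idea of filtering a flow graph by PageRank can be sketched in a few lines. This is a plain-Python power iteration on invented addresses, not the DASL implementation used for the slide; the damping factor, iteration count, and `top_k` helper are assumptions for the example.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of directed (src, dst) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread evenly over all nodes.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        new_rank = {n: (1.0 - damping + damping * dangling) / len(nodes) for n in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank

def top_k(rank, k):
    """Keep only the k highest-ranked nodes."""
    return sorted(rank, key=rank.get, reverse=True)[:k]

# Hypothetical flows: three hosts talk to 10.0.0.1, which talks to 10.0.0.5.
flows = [("10.0.0.2", "10.0.0.1"), ("10.0.0.3", "10.0.0.1"),
         ("10.0.0.4", "10.0.0.1"), ("10.0.0.1", "10.0.0.5")]
ranks = pagerank(flows)
busiest = top_k(ranks, 2)
```

Keeping only the top-ranked nodes and the edges between them is one way to thin out a graph too dense to draw.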
![Page 26: Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph](https://reader031.fdocuments.us/reader031/viewer/2022030216/58880ed01a28ab083c8b49a7/html5/thumbnails/26.jpg)
Interactive Visual Query Session
![Page 27: Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph](https://reader031.fdocuments.us/reader031/viewer/2022030216/58880ed01a28ab083c8b49a7/html5/thumbnails/27.jpg)
Visual Analysis - Identification of Exfiltrated Traffic
![Page 28: Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph](https://reader031.fdocuments.us/reader031/viewer/2022030216/58880ed01a28ab083c8b49a7/html5/thumbnails/28.jpg)
Other new capabilities for high-speed, large-scale graph processing
New technologies to deliver an HPC capability for processing very large scale graphs at high speed.
• Blazegraph DASL provides a high-level way to author graph analytics and integrate with Apache Spark and Hadoop data ecosystems.
• Power8 / OpenPOWER bring high-speed interconnects for moving data quickly (3X faster)
• New NVIDIA P100 GPUs provide stacked memory, 720 GB/s of memory bandwidth, and unified memory addressing.
![Page 29: Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph](https://reader031.fdocuments.us/reader031/viewer/2022030216/58880ed01a28ab083c8b49a7/html5/thumbnails/29.jpg)
Research Space
![Page 30: Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph](https://reader031.fdocuments.us/reader031/viewer/2022030216/58880ed01a28ab083c8b49a7/html5/thumbnails/30.jpg)
Research Space / British Museum Project Architecture and Use Cases
Users: Museum visitor, Website visitor, Data Expert
Frontend
Mobile App
• HTML5 Templates + CSS for mobile devices
• Google Glass App
• QR Code recognition
• Pattern / image recognition
• Context reasoning
Knowledge Graph Exploration
• Templates for visualization (CIDOC-CRM and external data)
• Timeline, Maps
• PivotViewer for visual exploration
• Semantic Search
• Graph Analytics (GAS)
Backend
Knowledge Graph Creation
• Data crawling
• Data transformation (CIDOC-CRM)
• Data Interlinking
• Data enrichment / Information extraction
Original data sources: Museums and other sources, Cards, Social networks, DBpedia, British Museum Data, User Data
![Page 31: Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph](https://reader031.fdocuments.us/reader031/viewer/2022030216/58880ed01a28ab083c8b49a7/html5/thumbnails/31.jpg)
Systems will either have their own terminology to qualify place or, more commonly, it will be implicit rather than explicit. Relationships both support and bypass this heterogeneity at the same time.
![Page 32: Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph](https://reader031.fdocuments.us/reader031/viewer/2022030216/58880ed01a28ab083c8b49a7/html5/thumbnails/32.jpg)
![Page 33: Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph](https://reader031.fdocuments.us/reader031/viewer/2022030216/58880ed01a28ab083c8b49a7/html5/thumbnails/33.jpg)
FROM means:
• Thing has been located at a place, or
• Thing was created at a place, or
• Thing was created by a person from a place, or
• Thing was part of an event that happened at a place, or
• Thing was acquired at a place
Research Space / British Museum Video
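One way to read the FROM definition above is as a union of several specific relationships. The sketch below models that with a tiny in-memory triple list. The predicate names and sample objects are hypothetical (the project's actual CIDOC-CRM mappings are not given on the slide), and the "created by a person from a place" case is flattened into a single predicate for brevity.

```python
# Hypothetical predicate names standing in for the relationships listed above.
FROM_PREDICATES = {
    "located_at",
    "created_at",
    "created_by_person_from",
    "part_of_event_at",
    "acquired_at",
}

# Hypothetical sample data as (subject, predicate, object) triples.
TRIPLES = [
    ("photograph_1", "located_at", "India"),
    ("sword_7", "created_at", "India"),
    ("lifeboat_3", "part_of_event_at", "India"),
    ("coin_9", "acquired_at", "Egypt"),
]

def things_from(place, triples=TRIPLES, predicates=FROM_PREDICATES):
    """Subjects linked to place by any of the given predicates (a union)."""
    return sorted({s for s, p, o in triples if p in predicates and o == place})

everything_from_india = things_from("India")
located_in_india = things_from("India", predicates={"located_at"})
```

Narrowing the predicate set gives a subset of FROM, such as only the things that are or were located at the place.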
![Page 34: Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph](https://reader031.fdocuments.us/reader031/viewer/2022030216/58880ed01a28ab083c8b49a7/html5/thumbnails/34.jpg)
This subset of FROM is just “is/was located at” – These are photographs that are located in India
![Page 35: Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph](https://reader031.fdocuments.us/reader031/viewer/2022030216/58880ed01a28ab083c8b49a7/html5/thumbnails/35.jpg)
This subset of FROM is just “refers to” India – These objects may not be in India, or from India, but make some reference to India. India is a subject.
![Page 36: Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph](https://reader031.fdocuments.us/reader031/viewer/2022030216/58880ed01a28ab083c8b49a7/html5/thumbnails/36.jpg)
Things from India created in the 17th century
![Page 37: Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph](https://reader031.fdocuments.us/reader031/viewer/2022030216/58880ed01a28ab083c8b49a7/html5/thumbnails/37.jpg)
This is a lifeboat from a ship (The Nancy Packet) wrecked while returning from India.
![Page 38: Database Camp 2016 @ United Nations, NYC - Brad Bebee, CEO, Blazegraph](https://reader031.fdocuments.us/reader031/viewer/2022030216/58880ed01a28ab083c8b49a7/html5/thumbnails/38.jpg)
Getting to 1 Trillion Edge Graphs on GPUs
- 1 GPU
  - K20 (6G): 125M edges
  - K40 (12G): 250M edges
  - K80 (24G): 500M edges (Gemini)
  - M40 (24G): 500M edges
  - Pascal (32G): 750M edges (Future)
- 1 Node
  - 8 x K80: 4B edges
  - 8 x Pascal: 6B edges
- Cluster with 1T edges
  - 2000 Pascal GPUs (242 nodes) (Future)
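The per-GPU capacities above give a simple lower bound on hardware for a given graph size. The sketch below computes it with integer arithmetic; the function names and the default of 8 GPUs per node are taken from the "1 Node" rows, and the result is only a capacity floor. The slide's own figure of 2000 Pascal GPUs for 1T edges is higher than this floor, and the slide does not explain the difference.

```python
# Edges that fit on one GPU, as listed on the slide.
EDGES_PER_GPU = {
    "K20": 125_000_000,
    "K40": 250_000_000,
    "K80": 500_000_000,
    "M40": 500_000_000,
    "Pascal": 750_000_000,
}

def min_gpus(total_edges, gpu):
    """Fewest GPUs whose combined capacity holds total_edges (ceiling division)."""
    capacity = EDGES_PER_GPU[gpu]
    return -(-total_edges // capacity)

def min_nodes(total_edges, gpu, gpus_per_node=8):
    """Fewest nodes needed when each node carries gpus_per_node GPUs."""
    return -(-min_gpus(total_edges, gpu) // gpus_per_node)

pascal_gpus_for_1t = min_gpus(10**12, "Pascal")
pascal_nodes_for_1t = min_nodes(10**12, "Pascal")
k80_gpus_for_4b = min_gpus(4 * 10**9, "K80")
```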