July 22, 2020

Victor Lee
Head of Product Strategy & Developer Relations

● BS in Electrical Engineering and Computer Science from UC Berkeley

● MS in Electrical Engineering from Stanford University

● PhD in Computer Science from Kent State University, focused on graph data mining

● 20+ years in the tech industry


Rayees Pasha
Principal Product Manager

● MS in Computer Science from the University of Memphis

● Prior lead PM and engineering positions at Workday, Hitachi, and HP

● Expertise in database management and big data technologies

Some Housekeeping Items


● Although your phone is muted, you can ask questions at any time using the Q&A tab in the Zoom menu

● The webinar is being recorded and will be emailed to you along with the slides

● If you have any issues with Zoom, please contact the organizer via chat

Agenda

1. System Architecture Overview

2. Data Ingestion and Storage

3. Data Processing

4. Non-functional Features: HA, Transaction Management, Security

Real-Time Deep-Link Querying (5 to 10+ hops deep)

Design Difference:
● Native graph design
● C++ engine for high performance
● Storage architecture

Benefit:
● Uncovers hard-to-find patterns
● Operational, real-time
● HTAP: transactions + analytics

Handling Massive Scale

Design Difference:
● Distributed DB architecture
● Massively parallel processing
● Compressed storage reduces footprint and messaging

Benefit:
● Integrates all your data
● Automatic partitioning
● Elastic scaling of resource usage

In-Database Analytics

Design Difference:
● GSQL: high-level yet Turing-complete language
● User-extensible graph algorithm library, runs in-DB
● ACID (OLTP) and accumulators (OLAP)

Benefit:
● Avoids transferring data
● Richer graph context
● In-DB machine learning


Advantages:

● Simple to set up and manage

● Unlimited scale-out; simple to expand

● Scalable OLAP: massively parallel processing

● Scalable OLTP: concurrent ACID transactions

● Economical

Simple setup, performant design:
● Setup: just tell TigerGraph how many servers.
● TigerGraph seamlessly distributes the data.
● Users see a single database, not shards.

Real-time active replication for High Availability (HA):

● Write to all
● Read from any
● Strong consistency


Step 1

Loaders take in user source data.

● Bulk load of data files or a Kafka stream in CSV or JSON format

● HTTP POSTs via REST services (JSON)

● GSQL INSERT commands

Step 2

The Dispatcher takes in the data ingestion requests in the form of updates to the database.

1. Query IDS to get internal IDs

2. Convert the data to internal format

3. Send the data to one or more corresponding GPEs

Step 3

Each GPE consumes the partial data updates, processes them, and writes them to disk.

Loading jobs and POSTs use UPSERT semantics (a loading-job sketch follows):

● If the vertex/edge doesn't yet exist, create it.

● If the vertex/edge already exists, update it.

● Idempotent
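As a concrete illustration, here is a minimal GSQL loading-job sketch. The Social graph, Person vertex type, and file path are assumptions for illustration, not from the webinar:

    // Hypothetical bulk load of a CSV file into a Person vertex type.
    // Each row upserts one Person: id, name, age, email.
    CREATE LOADING JOB load_people FOR GRAPH Social {
        DEFINE FILENAME f = "/data/people.csv";
        LOAD f TO VERTEX Person VALUES ($0, $1, $2, $3)
            USING SEPARATOR=",", HEADER="false";
    }
    RUN LOADING JOB load_people

Because loading is an upsert, re-running the same job on the same file is idempotent: existing ids are updated in place, new ids are created.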


[Diagram: data ingestion pipeline. Incoming CSV/JSON insert/update/delete requests for vertices and edges arrive at Nginx and RESTPP and are published to a Kafka cluster (one topic per server, Servers 1-3), which acknowledges them. GSE (IDS) performs ID translation. Each GPE listens to its corresponding topic for new messages, applies the incremental data to its in-memory copy of the data, synchronizes data to disk, and sends the outgoing response.]

On-disk data formats (a matching schema sketch follows):

● IDS: bidirectional external-ID to internal-ID mapping, e.g. "USER123" <---> 1234321

● Vertex Partitions: vertex internal ID and attributes, e.g. 1234321, John, 33, john@abc.com

● Edge Partitions: source vertex internal ID, target vertex internal ID, and edge attributes, e.g. 1234321, 1234322, 2020-04-23, 3.3
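The record layouts above follow directly from the graph schema. A hypothetical GSQL schema that would produce vertex and edge records like those shown (all names and types are assumptions):

    // Person: external id plus three attributes, as in the vertex example above.
    CREATE VERTEX Person (PRIMARY_ID id STRING, name STRING, age INT, email STRING)
    // Friend: two attributes, as in the edge example above (a date and a float).
    CREATE UNDIRECTED EDGE Friend (FROM Person, TO Person, since DATETIME, weight FLOAT)
    CREATE GRAPH Social (Person, Friend)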


[Diagram: segments 1-18 of IDS, VERTEX, and EDGE data distributed across Partition 1, Partition 2, and Partition 3.]

● Data is split into segments.

● The segments are distributed across the cluster.

● Segments with the same ID store data for the same set of vertices.

● The location of a vertex can be calculated from its internal ID.


Query processing steps (a usage sketch follows):

1. Users use GSQL, GraphStudio, or the RESTful API to submit a query as an HTTP GET/POST request.

2. One of the REST servers parses the request based on the graph schema and forwards it to a dispatcher.

3. The dispatcher queries the ID Service (IDS) to convert the request into internal format and routes it to a GPE.

4. The GPE performs the computation; the dispatcher collects the GPE results and returns the final result to the REST server.

5. The REST server answers the HTTP request in JSON format.
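For example, once a query has been compiled it can be invoked from the GSQL shell or over HTTP. The query name and arguments here are hypothetical (a matching khop query is sketched in the GSQL section below):

    // Compile the query, then invoke it. RESTPP also exposes installed
    // queries as HTTP endpoints (GET /query/<graph>/<query_name>).
    INSTALL QUERY khop
    RUN QUERY khop("USER123", 3)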

[Diagram: memory layout. System Memory is divided between TigerGraph Memory (the GPE, GSE (IDS), and MISC PROC processes) and idle headroom. The GPE holds the graph partitions, graph updates, and query memory, with one snapshot per running query (Query 1, Query 2, Query 3, ...); GSE (IDS) holds the bidirectional ID mapping. Each query snapshot contains global accumulators (global copy/local copy), local accumulators, shuffled partitions/accumulators, the current activated vertex set, and upserted vertices/edges.]


[Diagram: ID translation during query processing. Incoming: Nginx passes the request (query name, parameters, vertices) to RESTPP; GSE (IDS) translates external vertex IDs to internal vertex IDs; the GPE receives the query name and parameters and processes the query logic. Outgoing: the GPE returns internal vertex IDs in its response; GSE (IDS) translates them back to external vertex IDs; RESTPP combines them with the query result into the JSON response.]


Schema-Based

Optimizes storage efficiency and query speed. Supports data-independent app/query development.

Built-in High-Performance Parallelism

Achieves fast results while being easy to code. Accumulators turbocharge parallel computation.

SQL-Like

Familiar to 1 million users.

Conventional Control Flow (FOR, WHILE, IF/ELSE)

Makes it easy to implement conventional algorithms (a sample query follows).

Procedural Queries

Parameterized queries are flexible and can be used to build more complex queries.

Transactional Graph Updates

HTAP: Hybrid Transactional/Analytical Processing with real-time data updates.
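To give a feel for the language, here is a hypothetical k-hop neighborhood query combining parameters, a per-vertex accumulator, and WHILE control flow, assuming the Social graph and Friend edges sketched earlier:

    CREATE QUERY khop(VERTEX<Person> seed, INT k) FOR GRAPH Social {
        OrAccum @visited;                      // per-vertex flag, defaults to FALSE
        Start = {seed};
        Start = SELECT s FROM Start:s ACCUM s.@visited += TRUE;
        WHILE Start.size() > 0 LIMIT k DO      // conventional control flow
            Start = SELECT t
                    FROM Start:s -(Friend:e)- Person:t
                    WHERE t.@visited == FALSE
                    ACCUM t.@visited += TRUE;  // marks run in parallel
        END;
        PRINT Start.size();                    // size of the frontier after k hops
    }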

Single Server Mode

[Diagram: Servers 1-3; the single-server-mode query runs on one server.]

● The cluster elects one server to be the master for that query.

● All query computations take place on the query master.

● Vertex and edge data are copied to the query master as needed.

● Best for queries with one or a few starting vertices.


Distributed Mode

[Diagram: Servers 1-3 each run part of the distributed query, with one server acting as the master node; @@ global accumulator values are communicated across the cluster.]

● The server that receives the query becomes the master.

● Computations execute on all servers in parallel.

● Global accumulators are transferred across the cluster.

● If your query starts from all or most vertices, use this mode.


Single Server Mode versus Distributed Mode

Single Server Mode is better when:

1. Starting from a single vertex or a small number of vertices.

2. A modest number of vertices/edges is traversed.

3. There is heavy usage of global accumulators.

Example: point queries, single entity-based transactions/updates.

Distributed Mode is better when:

1. Starting from all or a large number of vertices.

2. A very large number of vertices/edges is traversed.

Example: most graph algorithms and global analytics (PageRank, Closeness Centrality, Louvain Community, etc.).
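Selecting distributed mode is a one-keyword change in GSQL. A minimal sketch, again assuming the hypothetical Social graph:

    // The DISTRIBUTED keyword makes the query execute on all servers in
    // parallel; @@total's partial values are merged across the cluster.
    CREATE DISTRIBUTED QUERY count_people() FOR GRAPH Social {
        SumAccum<INT> @@total;
        Start = {Person.*};                    // starts from all Person vertices
        S = SELECT s FROM Start:s ACCUM @@total += 1;
        PRINT @@total;
    }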



Accumulators are a special type of variable that accumulates information from multiple workers/threads during a query. The workers/threads operate asynchronously.

Accumulating phase 1: Receive messages and store them temporarily in a bucket belonging to the accumulator.

Accumulating phase 2: Aggregate the received messages according to the accumulator type. The aggregated value becomes the accumulator's value, which can be accessed by other parts of the query.

The GSQL language provides many different accumulators, all of which follow the same rules for receiving and accessing data. However, each has its own unique way of aggregating values (a short example follows).
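A minimal sketch of two accumulator types in use; SumAccum and MaxAccum are standard GSQL, while the graph, query, and attribute names are assumptions carried over from the earlier sketches:

    CREATE QUERY friend_stats(VERTEX<Person> p) FOR GRAPH Social {
        SumAccum<INT> @@friendCount;   // global: sums values from all threads
        MaxAccum<INT> @@oldestFriend;  // global: keeps the maximum value received
        Start = {p};
        Friends = SELECT t
                  FROM Start:s -(Friend:e)- Person:t
                  ACCUM @@friendCount += 1,
                        @@oldestFriend += t.age;  // for MaxAccum, += means "keep the max"
        PRINT @@friendCount, @@oldestFriend;
    }

Both accumulators receive messages the same way (via +=), but each aggregates them differently: one sums, the other takes the maximum.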


● TigerGraph HA replication provides both increased throughput and continuous operation.

● Cluster size = P × R (Partitions × Replicas)

● Any cluster size is allowed, except 1x2.


P = Partitioning Factor
R = Replication Factor

● Each server has T available workers for serving requests (GSQL queries, REST POSTs, etc.). T is a system configuration parameter that defaults to 8; consider the number of CPU cores when setting it.

● The cluster's total number of workers = T × P × R, e.g. 8 × 5 × 2 = 80.
○ A point mode query uses 1 worker.
○ A distributed mode query uses P workers.


● All replicas are read/write, always in sync with the latest updates.
● Writes go to all replicas (e.g. both 1A and 1B).
● Reads can be served from any one replica (e.g. either 1A or 1B).
● Distributed queries can mix replicas (e.g. {1A, 2B, 3B, 4A, 5B} is a valid active set for a request).


● If any single server is unavailable (expected or unexpected):
○ When it fails to respond after a certain number of tries, requests automatically divert to another replica (e.g. 3B is unavailable, so use 3A).
○ If it fails in the middle of a transaction, that transaction might be aborted.

● The system continues to operate, with reduced throughput, until the server is restored.


● Follows the SQL approach for roles.

○ GSQL: …

● TigerGraph roles can be mapped to external LDAP roles and groups.

● Share & Collaborate
○ Multiple groups share one master database ⇒ data integration, insights, productivity.

● Real-Time, Updatable
○ Shared updates, no copying ⇒ cleaner, faster, cheaper, safer.

● Fine-Grained Security
○ Each group is granted its own view.
○ Each group has its own admin user, who manages local users' privileges.

Built-In Roles (a role-grant sketch follows):

● Queryreader: run existing loading jobs & queries on its assigned graph.

● Querywriter: Queryreader privileges + create queries and run data-manipulation commands on its assigned graph.

● Designer: Querywriter privileges + modify the schema and create loading jobs for its assigned graph.

● Globaldesigner: Designer privileges + create global schema and objects; can also delete graphs they created.

● Admin: Designer privileges + create/drop users and grant/revoke roles for its assigned graph; that is, control the existence & privileges of its local users.

● Superuser: Admin privileges on all graphs; can create global vertex & edge types, create multiple graphs, and clear the database.

User-Defined Roles: in development
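Granting a built-in role is a one-line GSQL command. A hypothetical example, with an assumed user name and the Social graph from earlier sketches:

    // querywriter is one of the built-in roles listed above.
    GRANT ROLE querywriter ON GRAPH Social TO alice
    REVOKE ROLE querywriter ON GRAPH Social FROM alice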


● Encrypted Data at Rest
○ Choice of encryption levels (file, volume, partition, disk)
■ Kernel level: dm-crypt / cryptsetup
■ User level: FUSE (Filesystem in Userspace)
○ Automatically encrypted in TigerGraph Cloud

● Encrypted Data in Transit
○ SSL/TLS can be set up for the HTTPS protocol
○ Automatically encrypted in TigerGraph Cloud

● The TigerGraph distributed database provides full ACID transactions with sequential consistency.

● Transaction definition (an example follows):

○ Each GSQL query procedure is a transaction. Each query may have multiple SELECT, INSERT, or UPDATE statements.

○ Each REST++ GET, POST, or DELETE operation (which may contain multiple update operations) is a transaction.
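For instance, a query procedure containing several update statements commits as one atomic unit. A hypothetical sketch, assuming Person vertices carry an INT points attribute:

    CREATE QUERY transfer_points(VERTEX<Person> src, VERTEX<Person> dst, INT amt) FOR GRAPH Social {
        // Both attribute updates commit together or not at all,
        // because the whole query procedure is a single transaction.
        S = {src};
        S = SELECT s FROM S:s POST-ACCUM s.points = s.points - amt;
        D = {dst};
        D = SELECT d FROM D:d POST-ACCUM d.points = d.points + amt;
    }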


Atomicity

● A GSQL query, with or without updates, = a transaction.
● Transactions are "all or nothing": either all changes are successful, or none is.

Consistency

● Single-server consistency: a transaction obeys data validation rules.
● Distributed system sequential consistency: every replica of the data performs the same operations in the same order.

Isolation Level

● Repeatable read: each transaction sees the same data.
● No dirty/phantom reads: a transaction's updates are not visible to other transactions until the update is committed.

Durability

● The TigerGraph platform implements write-ahead logging (WAL) to disk to provide durability.
● Logs are consumed periodically to update the database on disk.

Resources:

● TigerGraph Platform Overview: https://docs.tigergraph.com/intro/tigergraph-platform-overview

● HA Cluster Configuration: https://docs.tigergraph.com/admin/admin-guide/installation-and-configuration/ha-cluster

● Transaction Processing and ACID Support: https://docs.tigergraph.com/dev/transaction-and-acid

● MultiGraph: https://docs.tigergraph.com/intro/multigraph-overview

● Try TigerGraph Cloud

● Download TigerGraph’s Developer Edition

● Take a Test Drive - Online Demo

● Get TigerGraph Certified

● Join the Community

Continue your journey at TigerGraph Cloud: TGCLOUD.IO!
