Scaling JSON Documents and Relational Data in Distributed Sharded Databases
Oracle Code New York
Christoph Bussler
Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |
Christoph Bussler, CMTS, March 21, 2017
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Presentation Agenda
1. Background, Context and Goals
2. Oracle 12c as Multi-Model Database
3. JSON OLTP
4. Analytics support for JSON
5. Sharding support for JSON
Background, Context and Goals
Background: Federated Application System Architecture
• Strategy for concurrent relational data and JSON data management?
• That was easy:
– Deploy one database management system supporting one data type each!
• Application system architecture
– Two databases
– One optional access layer for each
– Application accessing two access layers
[Figure: application accessing a RELATIONAL DBMS and a JSON DBMS]
Background: Federated Application System Architecture Evaluation
• Access
– Two systems, two sets of drivers, two interfaces, two query languages, two data type semantics
• Transactions
– Local to database, not distributed
– Possibly different transaction models
– Failure recovery to be done by application logic code
• Scalability
– Two approaches
• Analytics
– Separated, not on common data set
• Engineering knowledge
– Separated engineering knowledge, two communities, two test environments
• Management
– Separate systems, different backup functionality and strategies, non-coordinated backup
• Support
– Two support systems
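Because the two databases commit independently, a business operation that writes to both has no shared transaction: if the second write fails, the application itself must undo the first. A minimal sketch of such compensation logic follows, assuming two hypothetical in-memory stores (`RelationalStore`, `JsonStore`) that stand in for the two DBMSs rather than any real driver API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the relational DBMS; each call commits locally.
class RelationalStore {
    final Map<Integer, String> rows = new HashMap<>();
    void insert(int id, String name) { rows.put(id, name); }
    void delete(int id) { rows.remove(id); }
}

// Hypothetical stand-in for the JSON DBMS; can be told to fail to simulate an outage.
class JsonStore {
    final Map<Integer, String> docs = new HashMap<>();
    boolean failWrites = false;
    void insert(int id, String json) {
        if (failWrites) throw new IllegalStateException("JSON store unavailable");
        docs.put(id, json);
    }
}

class FederatedWriter {
    // Writes one product to both stores. There is no shared transaction, so if the
    // second write fails the application undoes the first one itself.
    // Returns true if both writes succeeded, false if the first write was compensated.
    static boolean saveProduct(RelationalStore rel, JsonStore json, int id, String name, String doc) {
        rel.insert(id, name);              // committed immediately in the relational store
        try {
            json.insert(id, doc);          // independent commit in the JSON store
            return true;
        } catch (IllegalStateException e) {
            rel.delete(id);                // compensating action written in application logic
            return false;
        }
    }

    public static void main(String[] args) {
        RelationalStore rel = new RelationalStore();
        JsonStore json = new JsonStore();
        System.out.println(saveProduct(rel, json, 1, "screw driver", "{\"status\": \"brand new\"}"));
        json.failWrites = true;
        System.out.println(saveProduct(rel, json, 2, "standard screw", "{\"status\": \"used\"}"));
        System.out.println(rel.rows.keySet());
    }
}
```

The sketch leaves out the harder cases real application code would also have to handle, such as the compensating delete failing or the process crashing between the two writes.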
Alternative Application System Architecture
• Two DBMSs supporting one data type each
• One DBMS supporting two (or more) data types concurrently, integrated and homogeneously
Context
• Recent versions of Oracle 12c
– Oracle 12c Release 1
– Oracle 12c Release 2 ("Oracle 12c")
• JSON data structure support is one area of major functional enhancement, spanning all areas of database functionality
– Storage
– Querying
– Analytics
– Sharding
Goals
• Goals
– JSON OLTP (Online Transaction Processing)
• Introduce "traditional" software system architecture for JSON processing
• Provide overview of Oracle 12c JSON support
– JSON Analytics
• Discuss "traditional" software system architecture for JSON analytics
• Provide overview of Oracle 12c JSON in-memory analytics support
– JSON Sharding
• Discuss Oracle 12c JSON sharding support
Oracle 12c as Multi-Model Database
Multi-Model Database
• Database management system that supports more than one data type
• Oracle 12c
– Relational model
– Object/relational model
– XML
– RDF
– Topology (Graph)
– JSON
• Independent of data model, the same non-functional properties are supported
– E.g., backup/restore, RAC database, Data Guard, In-Memory option, sharding, etc.
JSON OLTP
JSON
• JavaScript Object Notation (JSON) Data Interchange Format
{"firstName": "Chris","lastName": "Bussler","zip": 94065}
{"productId": 1011, "sizes": [4, 5, 6, "custom"]}
Standards (I)
• The JavaScript Object Notation (JSON) Data Interchange Format
– Internet Engineering Task Force (IETF)
– Request for Comments: 7159
– Obsoletes: 4627, 7158
– Category: Standards Track
– ISSN: 2070-1721
– T. Bray, Ed., Google, Inc., March 2014
• ECMA-404 The JSON Data Interchange Standard
– json.org
Standards (II)
• JSON Schema: core definitions and terminology
– draft-zyp-json-schema-04
– Internet Engineering Task Force
– Internet-Draft
– Intended status: Informational
– Expires: August 4, 2013
– F. Galiegue, Ed., K. Zyp, Ed., SitePen (USA), G. Court, January 31, 2013
JSON.org
Observations and Caveats
• JSON is an interchange format (only)
– Syntax only
– No operational semantics defined
• E.g., no comparison operations (>, <, =, etc.), no string operations, no Boolean operations, etc.
• E.g., no restrictions on arrays: array elements can be of any type
• Unknown value cannot be expressed (unlike, e.g., SQL NULL)
• Property order is undefined
• Duplicate properties are not restricted
• No type constructors (new types cannot be introduced by specification)
• Identifier sizes, array sizes, object sizes, etc., are undefined
• Case variations (TRUE vs. true vs. TrUe) are not supported
• Uniqueness (aka primary key(s)) is undefined
• Array base (zero or one?) is undefined
• Top-level value restriction (composite only?): RFC 4627 required an object or array, RFC 7159 allows any value
• Etc.
Know Your Semantics!
• Language libraries
– Back-end and/or user interface libraries
• Database behavior
– Driver and database functionality
• Establish knowledge and baseline of operational semantics
– Regression unit tests that cover all possible semantic aspects
– Differences in semantics between systems implementing JSON
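RFC 7159 only says that object names SHOULD be unique, so two systems can legitimately read the same text differently, which is exactly what such regression tests should pin down. A minimal sketch, assuming the object has already been split into name/value members (the `Member` type and the three policies are hypothetical and not taken from any particular library), shows three plausible readings of `{"zip": 94065, "zip": 10001}`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DuplicateKeySemantics {
    // One "name": value member of a parsed JSON object (hypothetical helper type;
    // values are kept as raw text to stay independent of any JSON library)
    static final class Member {
        final String name;
        final String value;
        Member(String name, String value) { this.name = name; this.value = value; }
    }

    // Policy A: a later duplicate overwrites an earlier one ("last wins")
    static Map<String, String> lastWins(List<Member> members) {
        Map<String, String> obj = new LinkedHashMap<>();
        for (Member m : members) obj.put(m.name, m.value);
        return obj;
    }

    // Policy B: the first occurrence is kept, later duplicates are ignored ("first wins")
    static Map<String, String> firstWins(List<Member> members) {
        Map<String, String> obj = new LinkedHashMap<>();
        for (Member m : members) obj.putIfAbsent(m.name, m.value);
        return obj;
    }

    // Policy C: reject the document, in the spirit of IS JSON ... WITH UNIQUE KEYS
    static boolean hasUniqueKeys(List<Member> members) {
        return members.stream().map(m -> m.name).distinct().count() == members.size();
    }

    public static void main(String[] args) {
        // {"zip": 94065, "zip": 10001}
        List<Member> doc = List.of(new Member("zip", "94065"), new Member("zip", "10001"));
        System.out.println(lastWins(doc) + " " + firstWins(doc) + " " + hasUniqueKeys(doc));
    }
}
```

A regression test asserting which of these readings each layer (UI library, back-end library, driver, database) actually implements turns the caveats above into checked facts.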
Oracle 12c JSON Support (Native)
• Oracle Database, JSON Developer's Guide, 12c Release 2 (12.2), E58287-10 (206 pages)
• Relational schema support
– Create table statement
– JSON column(s)
• CRUD support
– SQL
– JSON functions
• Transaction support
– ACID transactions
– "Multi-document" transactions ☺
• Additional topics
– Virtual columns
– Referential integrity
– Partitioning
– JSON data generation
– GeoJSON
– OSON
– Indexing
– Encoding
– External table (file access)
– SODA
– JSON Data Guide
JSON Relational Schema Support (I)
• Create table statement
– VARCHAR2(4000)
– VARCHAR2(32767) (requires MAX_STRING_SIZE = EXTENDED)
– BLOB (recommended), CLOB
• Optimization: LOB (<COLUMN_NAME>) STORE AS (CACHE)
• Constraints
– Well-formed JSON (lax syntax)
• CONSTRAINT <constraint_name> CHECK (<column_name> IS JSON)
– Well-formed JSON (strict syntax)
• CONSTRAINT <constraint_name> CHECK (<column_name> IS JSON (STRICT))
– No duplicate properties
• WITH UNIQUE KEYS
JSON Relational Schema Support (II)
[Figure from Oracle® Database SQL Language Reference, 12c Release 2 (12.2), E49448-12, January 2017]
Example – Create Table with JSON Column
CREATE TABLE one_coll (part VARCHAR2(4000)
  CONSTRAINT ensure_json CHECK (part IS JSON (STRICT WITH UNIQUE KEYS)));
INSERT INTO one_coll VALUES ('{"id": 1, "cost": 5, "inventory": 100,
  "description": "screw driver"}');
Example – Create Table with Two JSON Columns
CREATE TABLE two_coll (part VARCHAR2(4000)
  CONSTRAINT ensure_json_p CHECK (part IS JSON (STRICT WITH UNIQUE KEYS)),
  notes VARCHAR2(2000)
  CONSTRAINT ensure_json_n CHECK (notes IS JSON (STRICT WITH UNIQUE KEYS)));
INSERT INTO two_coll VALUES ('{"id": 1, "cost": 5, "inventory": 100,
  "description": "screw driver"}', '{"status": "brand new"}');
Example – Create Table with Mixed Columns
CREATE TABLE mixed_coll (id NUMBER,
  part VARCHAR2(4000)
  CONSTRAINT ensure_json_p2 CHECK (part IS JSON (STRICT WITH UNIQUE KEYS)),
  notes VARCHAR2(2000)
  CONSTRAINT ensure_json_n2 CHECK (notes IS JSON (STRICT WITH UNIQUE KEYS)));
INSERT INTO mixed_coll VALUES (1, '{"id": 1, "cost": 5, "inventory": 100,
  "description": "screw driver"}', '{"status": "brand new"}');
JSON CRUD Support
• Create – Read – Update – Delete
– Insert
• Standard SQL
– Update
• Standard SQL
• Update complete JSON value
– Delete
• Standard SQL
– Query
• Standard SQL
– Standardized by a standardization body (SQL/JSON, part of ISO/IEC SQL:2016)
– Extension of SQL for JSON data structure
• Not separate query language (!)
Example – Insert JSON
INSERT INTO two_coll VALUES('{"id": 1, "cost": 5, "inventory": 100,
"description": "screw driver"}','{"status": "brand new"}');
Example – Update JSON
INSERT INTO two_coll VALUES('{"id": 1, "cost": 5, "inventory": 100,
"description": "screw driver"}','{"status": "brand new"}');
UPDATE two_coll SET notes = '{"status": "used"}' WHERE json_value(part, '$.id') = 1;
Example – Delete JSON
INSERT INTO two_coll VALUES('{"id": 1, "cost": 5, "inventory": 100,
"description": "screw driver"}','{"status": "brand new"}');
DELETE FROM mixed_coll WHERE json_value(part, '$.id') = 1;
Query JSON – DOT Notation
• DOT notation
– <column>.<property_name>[.<property_name>|<array_step>]*
– In projection to select a property of a JSON document
– Requires a table alias and an IS JSON check constraint on the column
Example – DOT notation
SELECT mc.id, mc.part.cost FROM mixed_coll mc;
SELECT id, json_value(part, '$.cost') AS cost FROM mixed_coll;
Query JSON – JSON Functions
• Path expression
– Selects zero or more matching JSON values
– Each step must match for the expression to match
• Functions
– JSON_EXISTS()
• Returns true if at least one value matches
– JSON_VALUE()
• Returns value if scalar, error if non-scalar (by default the error is reported as SQL NULL; ERROR ON ERROR raises it)
• Returns SQL NULL if no match
– JSON_QUERY()
• Returns all matching values (WITH WRAPPER wraps multiple matches in an array)
– JSON_TABLE()
• Creates a relational view (JSON decomposition)
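The path semantics and the differences between the functions can be made concrete with a small evaluator. The following is a minimal sketch, assuming documents are already parsed into `java.util` maps and lists and supporting only property steps plus `[*]` and `[n]` array steps; `MiniJsonPath` and its methods are hypothetical names, and this is not Oracle's SQL/JSON path engine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class MiniJsonPath {
    // Evaluates a tiny path subset such as "$.id", "$.shipper[*].name" or "$.shipper[0].quality".
    // Paths are assumed to start with "$"; only property and [*] / [n] array steps are supported.
    static List<Object> eval(Object doc, String path) {
        List<Object> current = new ArrayList<>();
        current.add(doc);
        // "$.shipper[*].name" -> steps "shipper", "[*]", "name"
        for (String step : path.substring(1).replace("[", ".[").split("\\.")) {
            if (step.isEmpty()) continue;
            List<Object> next = new ArrayList<>();
            for (Object node : current) {
                if (step.startsWith("[") && node instanceof List) {
                    List<?> arr = (List<?>) node;
                    String idx = step.substring(1, step.length() - 1);
                    if (idx.equals("*")) {
                        next.addAll(arr);
                    } else {
                        int i = Integer.parseInt(idx);          // zero-based array index
                        if (i >= 0 && i < arr.size()) next.add(arr.get(i));
                    }
                } else if (!step.startsWith("[") && node instanceof Map) {
                    Map<?, ?> obj = (Map<?, ?>) node;
                    if (obj.containsKey(step)) next.add(obj.get(step));
                }
                // any other combination is a step that does not match
            }
            current = next;
        }
        return current;
    }

    // JSON_EXISTS-like: true if at least one value matches
    static boolean exists(Object doc, String path) {
        return !eval(doc, path).isEmpty();
    }

    // JSON_VALUE-like: the single scalar match; null for no match, several matches or a
    // non-scalar match (mirroring the default NULL ON ERROR behavior)
    static Object value(Object doc, String path) {
        List<Object> hits = eval(doc, path);
        if (hits.size() != 1) return null;
        Object v = hits.get(0);
        return (v instanceof Map || v instanceof List) ? null : v;
    }

    // JSON_QUERY-like with a wrapper: all matching values in one list
    static List<Object> query(Object doc, String path) {
        return eval(doc, path);
    }

    public static void main(String[] args) {
        Map<String, Object> part = Map.of("id", 1, "shipper", List.of(
                Map.of("name", "FAST Shipper", "quality", 5),
                Map.of("name", "SLOMO", "quality", 1)));
        System.out.println(exists(part, "$.shipper[*].name") + " " + value(part, "$.id")
                + " " + value(part, "$.shipper") + " " + query(part, "$.shipper[*].name"));
    }
}
```

For a document with a `shipper` array, `value(doc, "$.shipper")` yields null because the match is an array, whereas `query(doc, "$.shipper[*].name")` returns every shipper name.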
Example – JSON_TABLE()
INSERT INTO complex_coll VALUES (1, '{"id": 1, "cost": 5, "inventory": 100,
  "description": "screw driver", "shipper": [{"name": "FAST Shipper", "quality": 5},
  {"name": "SLOMO", "quality": 1}, {"name": "ALWAYS-ON-TIME", "quality": 10}]}');
Example – JSON_TABLE()
SELECT cc.id, jt.shipper, jt.quality
FROM complex_coll cc,
  json_table(part, '$.shipper[*]'
    COLUMNS (shipper VARCHAR2(32 CHAR) PATH '$.name',
             quality VARCHAR2(32 CHAR) PATH '$.quality')) jt;
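`JSON_TABLE` turns every array element matched by the row path `$.shipper[*]` into one relational row, with one column per `PATH` expression. The same decomposition can be sketched in Java as follows, assuming the document is already parsed into `java.util` maps and lists; `JsonTableSketch` and `ShipperRow` are hypothetical names, and quality is rendered as text because the example declares it `VARCHAR2(32 CHAR)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class JsonTableSketch {
    // One output row (id, shipper, quality), mirroring the COLUMNS clause
    static final class ShipperRow {
        final int id;
        final String shipper;
        final String quality;
        ShipperRow(int id, String shipper, String quality) {
            this.id = id; this.shipper = shipper; this.quality = quality;
        }
        @Override public String toString() { return id + " | " + shipper + " | " + quality; }
    }

    // Decomposes part.shipper[*] into rows; a missing property becomes null, like SQL NULL
    static List<ShipperRow> shipperRows(int id, Map<String, Object> part) {
        List<ShipperRow> rows = new ArrayList<>();
        Object shippers = part.get("shipper");
        if (!(shippers instanceof List)) return rows;      // row path matched nothing
        for (Object element : (List<?>) shippers) {
            if (!(element instanceof Map)) continue;
            Map<?, ?> s = (Map<?, ?>) element;
            Object name = s.get("name");
            Object quality = s.get("quality");
            rows.add(new ShipperRow(id,
                    name == null ? null : name.toString(),
                    quality == null ? null : quality.toString()));
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, Object> part = Map.of("id", 1, "shipper", List.of(
                Map.of("name", "FAST Shipper", "quality", 5),
                Map.of("name", "SLOMO", "quality", 1),
                Map.of("name", "ALWAYS-ON-TIME", "quality", 10)));
        shipperRows(1, part).forEach(System.out::println);
    }
}
```

For the inserted document this prints three rows, one per shipper, all carrying `id` 1 from the enclosing table row.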
Example – JSON Join
SELECT mc.id, tc.notes AS "tc notes", mc.notes AS "mc notes"
FROM two_coll tc, mixed_coll mc
WHERE json_value(tc.part, '$.id') = json_value(mc.part, '$.id');
Example – JSON and Relational Data Join
SELECT mc.id, tc.notes AS "tc notes", mc.notes AS "mc notes"
FROM two_coll tc, mixed_coll mc
WHERE json_value(tc.part, '$.id') = mc.id;
Transactions
• Oracle's transaction semantics applies unchanged
• One or more DML SQL statements referring to JSON columns can be in one transaction
• JSON as well as relational DML SQL statements can occur in any order in a transaction
Summary
• JSON
– Standardized interchange format
– Popular format for UI, backend programming as well as storage: one format across all application system layers
• Oracle 12c database
– Provides complete operational semantics
– Provides extensive functionality
– Includes JSON support in all non-functional features
Analytics support for JSON
Analytics
• Use of aggregation functions to gain insight and knowledge from a subset of the OLTP data
– Aggregation functions: avg, min, max, count, stddev, …
• Example query
– What is the average quality of all shippers?
• Analytics dashboard
– User interface collection of different analytic evaluations for given metrics
– Not discussed in the following
Classical Analytics Architecture
• Independent analytics system separate from OLTP system
– Optimized for analytics processing
• ETL (Extract-Transform-Load) from OLTP to analytics system
– Extract subset of OLTP data set required for analytics
– Transform extracted data set into form suitable for analytics system
• Possibly semantic transformation and "cleansing"
– Load into analytics system for analytics processing
[Figure: OLTP DBMS feeding an ANALYTICS DBMS through ETL]
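The three ETL steps can be sketched in Java for the shipper data used in the JSON examples. This is a minimal sketch, assuming documents are already parsed into `java.util` maps and lists and that a plain `List` stands in for the analytics DBMS; `ShipperEtl`, `QualityFact` and the cleansing rules (trimming and upper-casing names, dropping non-numeric qualities) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class ShipperEtl {
    // Row shape expected by the (hypothetical) analytics system
    static final class QualityFact {
        final String shipper;
        final int quality;
        QualityFact(String shipper, int quality) { this.shipper = shipper; this.quality = quality; }
    }

    // Extract: take only the shipper entries out of each OLTP document
    static List<Map<?, ?>> extract(List<Map<String, Object>> oltpDocs) {
        List<Map<?, ?>> shippers = new ArrayList<>();
        for (Map<String, Object> doc : oltpDocs) {
            Object arr = doc.get("shipper");
            if (!(arr instanceof List)) continue;
            for (Object s : (List<?>) arr) {
                if (s instanceof Map) shippers.add((Map<?, ?>) s);
            }
        }
        return shippers;
    }

    // Transform and cleanse: normalize names, drop entries without a numeric quality
    static List<QualityFact> transform(List<Map<?, ?>> shippers) {
        List<QualityFact> facts = new ArrayList<>();
        for (Map<?, ?> s : shippers) {
            Object name = s.get("name");
            Object quality = s.get("quality");
            if (name == null || !(quality instanceof Number)) continue;
            facts.add(new QualityFact(name.toString().trim().toUpperCase(Locale.ROOT),
                    ((Number) quality).intValue()));
        }
        return facts;
    }

    // Load: append into the analytics store (a List stands in for the target DBMS)
    static void load(List<QualityFact> facts, List<QualityFact> analyticsStore) {
        analyticsStore.addAll(facts);
    }
}
```

Every change to the analytics questions, for example a new attribute, means revisiting `extract` and `transform`, on top of scheduling and monitoring the pipeline itself.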
Classical Analytics Architecture Evaluation
• Separate systems
– Additional failure points, different infrastructure requirements, separate maintenance approaches, operations support required for several systems
• Data ETL
– Significant execution duration for overall data transfer (practical data volume limited)
– Significant execution duration for overall data transfer (practical data volume limited)
– Data snapshot (outdated, not up-to-date), continuous stream possible (still lagging)
– Different programming paradigm compared to OLTP
• Change in analytics requirements
– Might require change in ETL programming for extracting different data set and/or transforming differently
Ideal Architecture
• One database system used for OLTP as well as analytics processing
– One system and environment
– One programming and querying approach
– No data movement required through ETL
• Fundamental idea
– Same data can be represented in form optimized for OLTP as well as for analytics processing
• OLTP: row (tuple) format
• Analytics processing: columnar format
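The fundamental idea is a difference in physical layout only. A minimal sketch, assuming a toy product table with the `id`, `cost` and `inventory` attributes from the earlier examples (the class and method names are hypothetical, and this is not how the In-Memory column store is implemented), keeps the same data in both forms: an OLTP-style lookup reads one complete row, while an aggregate scans a single contiguous column array.

```java
import java.util.Arrays;

class RowVsColumn {
    // Row (tuple) format: all attributes of one product stored together
    static final class ProductRow {
        final int id, cost, inventory;
        ProductRow(int id, int cost, int inventory) {
            this.id = id; this.cost = cost; this.inventory = inventory;
        }
    }

    // Columnar format: one array per attribute, populated from the same rows
    static final class ProductColumns {
        final int[] id, cost, inventory;
        ProductColumns(ProductRow[] rows) {
            id = new int[rows.length];
            cost = new int[rows.length];
            inventory = new int[rows.length];
            for (int i = 0; i < rows.length; i++) {
                id[i] = rows[i].id;
                cost[i] = rows[i].cost;
                inventory[i] = rows[i].inventory;
            }
        }

        // Analytics: the aggregate touches only the inventory column
        long totalInventory() {
            return Arrays.stream(inventory).asLongStream().sum();
        }
    }

    // OLTP: a point lookup returns one complete row
    static ProductRow findById(ProductRow[] rows, int id) {
        for (ProductRow r : rows) if (r.id == id) return r;
        return null;
    }

    public static void main(String[] args) {
        ProductRow[] rows = { new ProductRow(1, 5, 100), new ProductRow(2, 77, 345) };
        System.out.println(findById(rows, 2).cost + " " + new ProductColumns(rows).totalInventory());
    }
}
```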
Columnar Format: Data Representation
[Figure from Oracle® Database In-Memory Guide, 12c Release 2 (12.2), E71458-06, January 2017]
Oracle 12c Analytics Support: In-Memory Option
• OLTP data represented in main memory in columnar format
• Data in main memory (columnar) transactionally consistent with OLTP data (row)
• Analytics processing expressed as regular SQL queries
– Optimizer decides if columnar format is advantageous over row (tuple) representation
– No special query language or query syntax elements required
Example – Configure Database
• ALTER SYSTEM SET INMEMORY_SIZE = 100M SCOPE=SPFILE;
• ALTER SYSTEM SET MAX_STRING_SIZE=EXTENDED; -- in upgrade mode (STARTUP UPGRADE)
• ALTER SYSTEM SET INMEMORY_EXPRESSIONS_USAGE='ENABLE';
• ALTER SYSTEM SET INMEMORY_VIRTUAL_COLUMNS=ENABLE SCOPE=SPFILE;
Analytics Support for JSON
• No difference with support for relational data
• Check execution plan for usage of In-Memory option (e.g., a TABLE ACCESS INMEMORY FULL operation)
Example – Create Table/Alter Table
CREATE TABLE im_coll (id NUMBER,
  part VARCHAR2(4000)
  CONSTRAINT ensure_json_p5 CHECK (part IS JSON (STRICT WITH UNIQUE KEYS)));
ALTER TABLE im_coll INMEMORY;
ALTER TABLE im_coll NO INMEMORY;
Example – Insert
INSERT INTO im_coll VALUES (1, '{"id": 1, "cost": 5, "inventory": 100,
  "description": "screw driver", "shipper": [{"name": "FAST Shipper", "quality": 5},
  {"name": "SLOMO", "quality": 1}, {"name": "ALWAYS-ON-TIME", "quality": 10}]}');
INSERT INTO im_coll VALUES (2, '{"id": 2, "cost": 77, "inventory": 345,
  "description": "standard screw", "shipper": [{"name": "QUICK Shipper", "quality": 5},
  {"name": "SLO", "quality": 1}, {"name": "ALWAYS-ON-TIME", "quality": 10}]}');
Example – Analytics Query
SELECT COUNT(st.shipper), SUM(st.quality), AVG(st.quality)
FROM (SELECT DISTINCT jt.shipper, jt.quality
      FROM im_coll imc,
           json_table(part, '$.shipper[*]'
             COLUMNS (shipper VARCHAR2(32 CHAR) PATH '$.name',
                      quality VARCHAR2(32 CHAR) PATH '$.quality')) jt
     ) st;
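The query builds the distinct set of (shipper, quality) pairs over all documents and aggregates over that set. For the two rows inserted above the distinct pairs are FAST Shipper/5, SLOMO/1, ALWAYS-ON-TIME/10, QUICK Shipper/5 and SLO/1, so the expected result is COUNT = 5, SUM = 22 and AVG = 22 / 5 = 4.4. A minimal Java sketch reproduces that arithmetic; `DistinctShipperStats` and `Pair` are hypothetical names, and qualities are treated as numbers, relying on Oracle's implicit conversion of the `VARCHAR2` column in `SUM` and `AVG`.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class DistinctShipperStats {
    // (shipper, quality) pair; equals/hashCode make the Set behave like SELECT DISTINCT
    static final class Pair {
        final String shipper;
        final int quality;
        Pair(String shipper, int quality) { this.shipper = shipper; this.quality = quality; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Pair)) return false;
            Pair p = (Pair) o;
            return shipper.equals(p.shipper) && quality == p.quality;
        }
        @Override public int hashCode() { return shipper.hashCode() * 31 + quality; }
    }

    // Returns {COUNT, SUM, AVG} over the distinct pairs (NaN stands in for SQL NULL on empty input)
    static double[] stats(List<List<Pair>> shippersPerDocument) {
        Set<Pair> distinct = new LinkedHashSet<>();
        for (List<Pair> doc : shippersPerDocument) distinct.addAll(doc);
        long count = distinct.size();
        long sum = 0;
        for (Pair p : distinct) sum += p.quality;
        double avg = count == 0 ? Double.NaN : (double) sum / count;
        return new double[] {count, sum, avg};
    }

    public static void main(String[] args) {
        List<Pair> doc1 = List.of(new Pair("FAST Shipper", 5), new Pair("SLOMO", 1), new Pair("ALWAYS-ON-TIME", 10));
        List<Pair> doc2 = List.of(new Pair("QUICK Shipper", 5), new Pair("SLO", 1), new Pair("ALWAYS-ON-TIME", 10));
        double[] r = stats(List.of(doc1, doc2));
        System.out.println("count=" + (long) r[0] + " sum=" + (long) r[1] + " avg=" + r[2]);  // count=5 sum=22 avg=4.4
    }
}
```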
Wait – There is More!
• Compression methods
– E.g., MEMCOMPRESS FOR QUERY LOW, MEMCOMPRESS FOR CAPACITY HIGH
• Priority (for loading)
– E.g., PRIORITY LOW, PRIORITY CRITICAL
• Main memory capacity conserved through selective representation of OLTP data
– Virtual columns, selective enabling of individual columns
• Advisors
– In-Memory advisor, compression advisor
• In-Memory Expressions
• …
Summary
• Single system with dual data representation optimized for OLTP as well as analytics processing
– Row format
– Columnar format
• JSON data format fully supported enabling JSON analytics processing
– Query against JSON data
– No ETL or pre-analytics transformation required
Sharding support for JSON
What is Sharding in Context of Databases?
• Separation of data and its storage into independent database management systems
– Each independent DBMS is called a "shard"
– Shards might be local or remote
– The set of all shards combined is the "sharded database"
• Disjoint separation
– Random (consistent hash) or based on "sharding" key
– Does not imply data replication
• Replication of sharded data
– For HA/DR support
– For read-only access
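Consistent hashing, the "random" scheme named above, places both shards and keys on a hash ring and assigns each key to the next shard position clockwise; adding or removing a shard therefore moves only the keys of the affected ring segments, which keeps resharding incremental. A minimal Java sketch follows, assuming a `TreeMap` ring with a fixed number of virtual nodes per shard and a simple bit-mixing hash over `String.hashCode()`; Oracle's own hash function and chunk management are not modeled, and all names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

class ConsistentHashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) { this.virtualNodes = virtualNodes; }

    // Illustrative bit-mixing over String.hashCode(); not a production-quality hash
    private static int hash(String s) {
        int h = s.hashCode();
        h ^= h >>> 16;
        h *= 0x45d9f3b;
        h ^= h >>> 16;
        return h;
    }

    // Each shard occupies several ring positions ("virtual nodes") to spread keys more evenly
    void addShard(String shard) {
        for (int v = 0; v < virtualNodes; v++) ring.putIfAbsent(hash(shard + "#" + v), shard);
    }

    // Removes only the ring positions owned by this shard
    void removeShard(String shard) {
        for (int v = 0; v < virtualNodes; v++) ring.remove(hash(shard + "#" + v), shard);
    }

    // Owner = first ring position at or after the key's hash, wrapping around to the start
    String shardFor(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("no shards");
        Map.Entry<Integer, String> e = ring.ceilingEntry(hash(key));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        ConsistentHashRing r = new ConsistentHashRing(100);
        r.addShard("shard1");
        r.addShard("shard2");
        r.addShard("shard3");
        System.out.println(r.shardFor("anna@example.com"));
    }
}
```

When a fourth shard is added, a key changes owner only if one of the new shard's positions now precedes its old owner on the ring, so every key that moves, moves to the new shard.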
Sharding
[Figure from Oracle® Database Administrator’s Guide, 12c Release 2 (12.2), E49631-09, December 2016]
Sharding – Replication
[Figure from Oracle® Database Administrator’s Guide, 12c Release 2 (12.2), E49631-09, December 2016]
Sharding – Sharding Criteria
• Criteria to distribute data between shards
• Automatic sharding (system managed)
– System decides how to distribute data over shards
• Composite sharding
– Data designer decides how to distribute data
– Partitionset
• Specifies value or range of values in column ("shard key")
• Specified when table is created
Example – Composite Sharding
[Figure from Oracle® Database Administrator’s Guide, 12c Release 2 (12.2), E49631-09, December 2016]
Example
CREATE SHARDED TABLE Customers
( CustId VARCHAR2(60) NOT NULL,
  Name VARCHAR2(60),
  Geo VARCHAR2(8),
  CustProfile VARCHAR2(4000),
  CONSTRAINT pk_customers PRIMARY KEY (CustId),
  CONSTRAINT json_customers CHECK (CustProfile IS JSON)
)
PARTITIONSET BY LIST (GEO)
PARTITION BY CONSISTENT HASH (CustId)
PARTITIONS AUTO
( PARTITIONSET america VALUES ('AMERICA') TABLESPACE SET tsp_set_1,
  PARTITIONSET europe VALUES ('EUROPE') TABLESPACE SET tsp_set_2
);
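With composite sharding, routing takes two steps: the list value of `GEO` selects a partitionset (and with it a tablespace set), and a hash of `CustId` selects a location inside that partitionset. A minimal Java sketch of this two-level decision follows; the shard names are hypothetical, and plain `Math.floorMod` over `hashCode()` stands in for Oracle's consistent-hash chunk mapping, so it shows the routing structure only.

```java
import java.util.List;
import java.util.Map;

class CompositeRouter {
    // Level 1 (partitionset by list(GEO)): GEO value -> shards of that partitionset
    private final Map<String, List<String>> shardsByGeo;

    CompositeRouter(Map<String, List<String>> shardsByGeo) { this.shardsByGeo = shardsByGeo; }

    // Level 2 (hash of CustId): pick one shard inside the selected partitionset
    String route(String geo, String custId) {
        List<String> shards = shardsByGeo.get(geo);
        if (shards == null || shards.isEmpty()) {
            throw new IllegalArgumentException("no partitionset for GEO=" + geo);
        }
        return shards.get(Math.floorMod(custId.hashCode(), shards.size()));
    }

    public static void main(String[] args) {
        // Hypothetical shard names for the two partitionsets of the example
        CompositeRouter router = new CompositeRouter(Map.of(
                "AMERICA", List.of("sh_america_1", "sh_america_2"),
                "EUROPE", List.of("sh_europe_1")));
        System.out.println(router.route("EUROPE", "anna@example.com"));
    }
}
```

Because the `GEO` level is an explicit list, the data designer controls which region's customers land on which group of shards, while the hash level spreads each region's customers across that group.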
Sharding Architecture
[Figure from Oracle® Database Administrator’s Guide, 12c Release 2 (12.2), E49631-09, December 2016]
Universal Connection Pool - JDBC
• UCP introduced shared pools
– One pool can have connections to more than one database
– One pool can have connections to different shards
• Connection creation
protected Connection getCustomerConnection(PoolDataSource pool, Customer customer) throws SQLException {
    // Build a sharding key from the customer's e-mail; UCP routes the connection to the matching shard
    return pool.createConnectionBuilder()
            .shardingKey(pool.createShardingKeyBuilder()
                    .subkey(customer.email, OracleType.VARCHAR2)
                    .build())
            .build();
}
Sharding Management
• Adding, removing shards
• Resharding
– Required by adding/removing shards
• Backup/recovery
• Monitoring
– Command line options
• Schema modification
– Orchestrated by shard catalog
• Patching
– Rolling patching supported
Wait – But Why?
• Linear scalability
• Fault containment
• Geographical data distribution
• Rolling upgrades
• Rolling upgrades
• Cloud deployment benefits
– Sizing, elasticity, mix of cloud/on-premise
• Cool
Summary
JSON
• Scaling JSON Documents and Relational Data in Distributed Sharded Databases
– Oracle as a multi-model database concurrently supports different data models, including JSON
– Oracle 12c provides a complete set of functional and non-functional capabilities for OLTP and analytics processing of JSON data (documents)
– Data modeler can choose from all data models within one database design
– Engineer can choose from all data models to implement OLTP and/or analytics
– Database ops can choose best deployment options for the scalability required