Why you don't need NOSQL solutions for time series data · 2017-06-16 · Why you don't need NOSQL...

29
Speaker : Date : E-mail : Why you don't need NOSQL solutions for time series data Making Oracle 12cR2 a “relational NOSQL store” using SQL/JSON Peter de Vaal 16-06-2017 [email protected]

Transcript of Why you don't need NOSQL solutions for time series data · 2017-06-16 · Why you don't need NOSQL...

Page 1: Why you don't need NOSQL solutions for time series data · 2017-06-16 · Why you don't need NOSQL solutions for time series data Making Oracle 12cR2 a “relational NOSQL store”

Speaker : Date : E-mail :

Why you don't need NOSQL solutions for time series data

Making Oracle 12cR2 a “relational NOSQL store” using SQL/JSON

Peter de Vaal 16-06-2017 [email protected]

Page 2: Why you don't need NOSQL solutions for time series data · 2017-06-16 · Why you don't need NOSQL solutions for time series data Making Oracle 12cR2 a “relational NOSQL store”

Who am I

² Peter de Vaal, Netherlands ² University of Leiden, PhD in Chemistry ² Hobby: Mountain Biking, Road cycling ² Working with Oracle technology since 1992

ª 5 years as developer, 19 years as consultant, 1 year as software architect

² SIG lead of Fusion Middleware CAF SIG ª So what am I doing on a SQL/PLSQL track?

2

Page 3: Why you don't need NOSQL solutions for time series data · 2017-06-16 · Why you don't need NOSQL solutions for time series data Making Oracle 12cR2 a “relational NOSQL store”

Subjects

² Time series: Relational vs NOSQL databases ² How to store time series data in an RDBMS ² A real life case ² Creating APIs for “citizen” developers

3

Page 4: Why you don't need NOSQL solutions for time series data · 2017-06-16 · Why you don't need NOSQL solutions for time series data Making Oracle 12cR2 a “relational NOSQL store”

Storing timeseries data in a database ² Relational database

ª Advantages: ª Full SQL language support ª Knowledge widely available ª Built-in analytical functions

ª Disadvantage: Said to be not scalable

² NOSQL database ª Advantages: Very scalable ª Disadvantages:

ª Poor query support ª Poor support for secondary indexes ª No built-in analytical functions

https://blog.timescale.com/time-series-data-why-and-how-to-use-a-relational-database-instead-of-nosql-d0cd6975e87c

4

Page 5: Why you don't need NOSQL solutions for time series data · 2017-06-16 · Why you don't need NOSQL solutions for time series data Making Oracle 12cR2 a “relational NOSQL store”

The NOSQL paradigm

² NOSQL databases are key-value pair databases ² Storage is simpler and more scalable ² No overhead for read-consistency, constraint

checking or table logic (triggers) ² dedicated time series databases typically store a

large series as one value, e.g. all values of one measuring session, or all values of a day, month ª Only feasible for fixed frequency time series (time

between points is the same for all)

5

Page 6: Why you don't need NOSQL solutions for time series data · 2017-06-16 · Why you don't need NOSQL solutions for time series data Making Oracle 12cR2 a “relational NOSQL store”

Storing Time Series Data in an RDBMS

² In a relational database values are usually stored as individual rows in a table

² The table holding the values should have a simple structure. Ideally it has 3 columns: ª A timestamp column ª A value column, in most cases a numeric ª A key column, holding an identifier pointing to a

descriptor of the value

6

Page 7: Why you don't need NOSQL solutions for time series data · 2017-06-16 · Why you don't need NOSQL solutions for time series data Making Oracle 12cR2 a “relational NOSQL store”

Storing Time Series Data in an RDBMS

² The key column has a FK to a dimension table ² The dimension table may have any number of

columns to describe the time series ² A multi-column Unique Key constraint may be

helpful to uniquely describe each time serie ² Making the table Index Organized may boost

performance

7

Page 8: Why you don't need NOSQL solutions for time series data · 2017-06-16 · Why you don't need NOSQL solutions for time series data Making Oracle 12cR2 a “relational NOSQL store”

The timestamp column

² Unconstrained or equally spaced grid points ² Do not use TIMESTAMP WITH TIME ZONE

ª It cannot be used as (part of) a primary key

² Avoid using local time ª Gives problems with daylight saving time changes ª Use a fixed UTC offset ª Use a pl/sql function (ideally with RESULT CACHE) to

convert the stored timestamp to a local time representation (for display) or for selecting all time points on a date (the function determines the first and last UTC timestamp of the day)

8

Page 9: Why you don't need NOSQL solutions for time series data · 2017-06-16 · Why you don't need NOSQL solutions for time series data Making Oracle 12cR2 a “relational NOSQL store”

The timestamp column

² Use TIMESTAMP rather than DATE ª The behaviour is slightly different ª Use TIMESTAMP(0) if you do not need fractional

seconds

² Considerations when using hours as unit ª When selecting values of a day make sure you can

distinguish 25 hours on the day going from summer to winter time

² Values on discrete times, or interval averages? ª Add an interval length column, unless the period

length is always fixed, e.g. hours or days

9

Page 10: Why you don't need NOSQL solutions for time series data · 2017-06-16 · Why you don't need NOSQL solutions for time series data Making Oracle 12cR2 a “relational NOSQL store”

When does the RDBMS not perform?

² Index Organization only helps when the number of time series in one table is limited, so the timepoints in a series are not scattered over storage blocks

10

Page 11: Why you don't need NOSQL solutions for time series data · 2017-06-16 · Why you don't need NOSQL solutions for time series data Making Oracle 12cR2 a “relational NOSQL store”

Why store individual values as rows?

² The simple structure of a time series table does not require to store just one value per row

² But it is not feasible to create columns for all time points (unless there only a few, e.g. 24 hours)

² So what now?

11

Page 12: Why you don't need NOSQL solutions for time series data · 2017-06-16 · Why you don't need NOSQL solutions for time series data Making Oracle 12cR2 a “relational NOSQL store”

Store multiple values in one row

² Use a CLOB or BLOB (or XML Type) column to store a text (e.g. XML, JSON or CSV) representation of the timestamp-value pairs ª In Oracle 11 you may use XMLType ª In Oracle 12.2 a use a BLOB/CLOB with JSON

² Both enable full usage of SQL to retrieve the contents as if it were individual rows

² In 11g you might also make use of a JSON parser to retrieve data from JSON in a CLOB, and retrurning the result using a pipelined PL/SQL function

12

Page 13: Why you don't need NOSQL solutions for time series data · 2017-06-16 · Why you don't need NOSQL solutions for time series data Making Oracle 12cR2 a “relational NOSQL store”

REAL LIFE CASE

13

Page 14: Why you don't need NOSQL solutions for time series data · 2017-06-16 · Why you don't need NOSQL solutions for time series data Making Oracle 12cR2 a “relational NOSQL store”

A database for storing Power Data

² The European energy market is an open market ² Any data on production and demand of power in

all European countries is available to the market ² Power production and demand data are available

in MWh per time interval, mostly quarter (15 min), hour or day averages

² Multiple time series must be stored ª Dimensions are a.o.: country, production source) ª Total of about 4000 time series

14

Page 15: Why you don't need NOSQL solutions for time series data · 2017-06-16 · Why you don't need NOSQL solutions for time series data Making Oracle 12cR2 a “relational NOSQL store”

Lets try it out

² Case 1: ª A Wind power forecast, 0-15 days ahead, one value

per 15 minutes. Number of values: 96*16 = 1536

² Case 2: ª Wind Power forecasts, 0-15 days ahead based on the

European weather models, containing 50 models, 14 averaged curves and 1 reference (= operational forecast). Number of values = 96*16*65=99850

15

Page 16: Why you don't need NOSQL solutions for time series data · 2017-06-16 · Why you don't need NOSQL solutions for time series data Making Oracle 12cR2 a “relational NOSQL store”

The Relational time series table

² Each value in a single row in the table ² Requires 15 time series (keys), one for each day

ª Cannot use the issue date of the forecast as a dimension column because it will ever grow

ª Instead we use a forecast label (FC00-FC15) to identify the 15 days

² Use issue_date (extra indexed column on the time series table) to select all rows belonging to an issue

16

Page 17: Why you don't need NOSQL solutions for time series data · 2017-06-16 · Why you don't need NOSQL solutions for time series data Making Oracle 12cR2 a “relational NOSQL store”

The table

CREATE TABLE capacity_values ( CPR_ID NUMBER, TIME_INTERVAL TIMESTAMP(0), INTERVAL_LENGTH NUMBER(4,0) CAPACITY NUMBER, ISSUE_DATE TIMESTAMP(0) ) ORGANIZATION INDEX;

17

Page 18: Why you don't need NOSQL solutions for time series data · 2017-06-16 · Why you don't need NOSQL solutions for time series data Making Oracle 12cR2 a “relational NOSQL store”

Query on the Relational Table SELECT cv.time_interval, cv.capacity, ch.series_code FROM capacity_values cv WHERE cv.issue_date = TO_DATE(:ISSUE,’DD-MM-YYYY HH24:MI’) AND cv.cpr_id IN ( SELECT d.cpr_id FROM power_descriptors_v d WHERE h.series_type = :SERIES_TYPE AND h.area_code = :COUNTRY AND h.category = ‘WIND’ AND h.phase = ‘F’ -- (Forecast) );

18

Page 19: Why you don't need NOSQL solutions for time series data · 2017-06-16 · Why you don't need NOSQL solutions for time series data Making Oracle 12cR2 a “relational NOSQL store”

The time series in a LOB table

² The issue date instead of the time_interval in de primary key

² All the time intervals and values stored as JSON in a LOB column

19

Page 20: Why you don't need NOSQL solutions for time series data · 2017-06-16 · Why you don't need NOSQL solutions for time series data Making Oracle 12cR2 a “relational NOSQL store”

The issue based time series table

CREATE TABLE power_forecast_issues( CPR_ID NUMBER, ISSUE_DATE TIMESTAMP(0), POWER_VALUES CLOB ) ORGANIZATION INDEX;

Note: BLOB would probably be better than CLOB when using SQL/JSON

20

Page 21: Why you don't need NOSQL solutions for time series data · 2017-06-16 · Why you don't need NOSQL solutions for time series data Making Oracle 12cR2 a “relational NOSQL store”

JSON structure in POWER_VALUES

{"series”:[ { "series_code":"REGULAR”, "dates”:[ {"date":"2017-06-15”, "intervals”:[ {"ti":”00:00”,"pv":125}, {"ti":”00:15”,"pv":129}, … ] } ] }] }

21

Page 22: Why you don't need NOSQL solutions for time series data · 2017-06-16 · Why you don't need NOSQL solutions for time series data Making Oracle 12cR2 a “relational NOSQL store”

The query using a pipelined function

² Query the single row ² Use a pipelined function to decompose the

JSON using a JSON parser (I use APEX_JSON), then output a row per value

² Query the pipelined function joined with the dimensions table (POWER_DESCRIPTORS_V)

22

Page 23: Why you don't need NOSQL solutions for time series data · 2017-06-16 · Why you don't need NOSQL solutions for time series data Making Oracle 12cR2 a “relational NOSQL store”

The query using SQL/JSON

See demo

23

Page 24: Why you don't need NOSQL solutions for time series data · 2017-06-16 · Why you don't need NOSQL solutions for time series data Making Oracle 12cR2 a “relational NOSQL store”

APIS FOR DEVELOPERS

24

Page 25: Why you don't need NOSQL solutions for time series data · 2017-06-16 · Why you don't need NOSQL solutions for time series data Making Oracle 12cR2 a “relational NOSQL store”

Developers and SQL/JSON

² SQL/JSON or XMLDB queries are complex ª Not many developers may like or be able to write it

² Use virtual columns ² Hide the queries behind APIs

25

Page 26: Why you don't need NOSQL solutions for time series data · 2017-06-16 · Why you don't need NOSQL solutions for time series data Making Oracle 12cR2 a “relational NOSQL store”

APIs for (citizen) developers

² Views ª Enables sesame street SQL, but it is still SQL

² Pipelined PL/SQL functions ª Parametrized function calls instead of WHERE clause ª But WHERE clause can be added ª Developer can make full use of analytical functions

² REST APIs ª Parametrized URLs instead of SQL quries ª SQL injection proof ª Users cannot write bad performing queries ª Also for CRUD operations

26

Page 27: Why you don't need NOSQL solutions for time series data · 2017-06-16 · Why you don't need NOSQL solutions for time series data Making Oracle 12cR2 a “relational NOSQL store”

CONCLUSION

27

Page 28: Why you don't need NOSQL solutions for time series data · 2017-06-16 · Why you don't need NOSQL solutions for time series data Making Oracle 12cR2 a “relational NOSQL store”

² A relational database is the perfect place to store time series data

² The SQL/JSON in Oracle 12.2 enables all query possibilities for time series stored in JSON

² Performance and scalability are no drawback anymore

28

Page 29: Why you don't need NOSQL solutions for time series data · 2017-06-16 · Why you don't need NOSQL solutions for time series data Making Oracle 12cR2 a “relational NOSQL store”

Welcome to the Relational NOSQL database

29