Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015
Why you don't need NOSQL solutions for time series data · 2017-06-16 · Why you don't need NOSQL...
Transcript of Why you don't need NOSQL solutions for time series data · 2017-06-16 · Why you don't need NOSQL...
Speaker : Date : E-mail :
Why you don't need NOSQL solutions for time series data
Making Oracle 12cR2 a “relational NOSQL store” using SQL/JSON
Peter de Vaal 16-06-2017 [email protected]
Who am I
² Peter de Vaal, Netherlands ² University of Leiden, PhD in Chemistry ² Hobby: Mountain Biking, Road cycling ² Working with Oracle technology since 1992
ª 5 years as developer, 19 years as consultant, 1 year as software architect
² SIG lead of Fusion Middleware CAF SIG ª So what am I doing on a SQL/PLSQL track?
2
Subjects
² Time series: Relational vs NOSQL databases ² How to store time series data in an RDBMS ² A real life case ² Creating APIs for “citizen” developers
3
Storing timeseries data in a database ² Relational database
ª Advantages: ª Full SQL language support ª Knowledge widely available ª Built-in analytical functions
ª Disadvantage: Said to be not scalable
² NOSQL database ª Advantages: Very scalable ª Disadvantages:
ª Poor query support ª Poor support for secondary indexes ª No built-in analytical functions
https://blog.timescale.com/time-series-data-why-and-how-to-use-a-relational-database-instead-of-nosql-d0cd6975e87c
4
The NOSQL paradigm
² NOSQL databases are key-value pair databases ² Storage is simpler and more scalable ² No overhead for read-consistency, constraint
checking or table logic (triggers) ² dedicated time series databases typically store a
large series as one value, e.g. all values of one measuring session, or all values of a day, month ª Only feasible for fixed frequency time series (time
between points is the same for all)
5
Storing Time Series Data in an RDBMS
² In a relational database values are usually stored as individual rows in a table
² The table holding the values should have a simple structure. Ideally it has 3 columns: ª A timestamp column ª A value column, in most cases a numeric ª A key column, holding an identifier pointing to a
descriptor of the value
6
Storing Time Series Data in an RDBMS
² The key column has a FK to a dimension table ² The dimension table may have any number of
columns to describe the time series ² A multi-column Unique Key constraint may be
helpful to uniquely describe each time serie ² Making the table Index Organized may boost
performance
7
The timestamp column
² Unconstrained or equally spaced grid points ² Do not use TIMESTAMP WITH TIME ZONE
ª It cannot be used as (part of) a primary key
² Avoid using local time ª Gives problems with daylight saving time changes ª Use a fixed UTC offset ª Use a pl/sql function (ideally with RESULT CACHE) to
convert the stored timestamp to a local time representation (for display) or for selecting all time points on a date (the function determines the first and last UTC timestamp of the day)
8
The timestamp column
² Use TIMESTAMP rather than DATE ª The behaviour is slightly different ª Use TIMESTAMP(0) if you do not need fractional
seconds
² Considerations when using hours as unit ª When selecting values of a day make sure you can
distinguish 25 hours on the day going from summer to winter time
² Values on discrete times, or interval averages? ª Add an interval length column, unless the period
length is always fixed, e.g. hours or days
9
When does the RDBMS not perform?
² Index Organization only helps when the number of time series in one table is limited, so the timepoints in a series are not scattered over storage blocks
10
Why store individual values as rows?
² The simple structure of a time series table does not require to store just one value per row
² But it is not feasible to create columns for all time points (unless there only a few, e.g. 24 hours)
² So what now?
11
Store multiple values in one row
² Use a CLOB or BLOB (or XML Type) column to store a text (e.g. XML, JSON or CSV) representation of the timestamp-value pairs ª In Oracle 11 you may use XMLType ª In Oracle 12.2 a use a BLOB/CLOB with JSON
² Both enable full usage of SQL to retrieve the contents as if it were individual rows
² In 11g you might also make use of a JSON parser to retrieve data from JSON in a CLOB, and retrurning the result using a pipelined PL/SQL function
12
REAL LIFE CASE
13
A database for storing Power Data
² The European energy market is an open market ² Any data on production and demand of power in
all European countries is available to the market ² Power production and demand data are available
in MWh per time interval, mostly quarter (15 min), hour or day averages
² Multiple time series must be stored ª Dimensions are a.o.: country, production source) ª Total of about 4000 time series
14
Lets try it out
² Case 1: ª A Wind power forecast, 0-15 days ahead, one value
per 15 minutes. Number of values: 96*16 = 1536
² Case 2: ª Wind Power forecasts, 0-15 days ahead based on the
European weather models, containing 50 models, 14 averaged curves and 1 reference (= operational forecast). Number of values = 96*16*65=99850
15
The Relational time series table
² Each value in a single row in the table ² Requires 15 time series (keys), one for each day
ª Cannot use the issue date of the forecast as a dimension column because it will ever grow
ª Instead we use a forecast label (FC00-FC15) to identify the 15 days
² Use issue_date (extra indexed column on the time series table) to select all rows belonging to an issue
16
The table
CREATE TABLE capacity_values ( CPR_ID NUMBER, TIME_INTERVAL TIMESTAMP(0), INTERVAL_LENGTH NUMBER(4,0) CAPACITY NUMBER, ISSUE_DATE TIMESTAMP(0) ) ORGANIZATION INDEX;
17
Query on the Relational Table SELECT cv.time_interval, cv.capacity, ch.series_code FROM capacity_values cv WHERE cv.issue_date = TO_DATE(:ISSUE,’DD-MM-YYYY HH24:MI’) AND cv.cpr_id IN ( SELECT d.cpr_id FROM power_descriptors_v d WHERE h.series_type = :SERIES_TYPE AND h.area_code = :COUNTRY AND h.category = ‘WIND’ AND h.phase = ‘F’ -- (Forecast) );
18
The time series in a LOB table
² The issue date instead of the time_interval in de primary key
² All the time intervals and values stored as JSON in a LOB column
19
The issue based time series table
CREATE TABLE power_forecast_issues( CPR_ID NUMBER, ISSUE_DATE TIMESTAMP(0), POWER_VALUES CLOB ) ORGANIZATION INDEX;
Note: BLOB would probably be better than CLOB when using SQL/JSON
20
JSON structure in POWER_VALUES
{"series”:[ { "series_code":"REGULAR”, "dates”:[ {"date":"2017-06-15”, "intervals”:[ {"ti":”00:00”,"pv":125}, {"ti":”00:15”,"pv":129}, … ] } ] }] }
21
The query using a pipelined function
² Query the single row ² Use a pipelined function to decompose the
JSON using a JSON parser (I use APEX_JSON), then output a row per value
² Query the pipelined function joined with the dimensions table (POWER_DESCRIPTORS_V)
22
The query using SQL/JSON
See demo
23
APIS FOR DEVELOPERS
24
Developers and SQL/JSON
² SQL/JSON or XMLDB queries are complex ª Not many developers may like or be able to write it
² Use virtual columns ² Hide the queries behind APIs
25
APIs for (citizen) developers
² Views ª Enables sesame street SQL, but it is still SQL
² Pipelined PL/SQL functions ª Parametrized function calls instead of WHERE clause ª But WHERE clause can be added ª Developer can make full use of analytical functions
² REST APIs ª Parametrized URLs instead of SQL quries ª SQL injection proof ª Users cannot write bad performing queries ª Also for CRUD operations
26
CONCLUSION
27
² A relational database is the perfect place to store time series data
² The SQL/JSON in Oracle 12.2 enables all query possibilities for time series stored in JSON
² Performance and scalability are no drawback anymore
28
Welcome to the Relational NOSQL database
29