© 2018 Dbvisit Software | dbvisit.com
Through O Shaped Glasses
Introducing Kafka to the Oracle DBA
Mike Donovan
CTO
Dbvisit Software
Mike Donovan
Chief Technology Officer, Dbvisit Software
• Multi-platform DBA (Oracle, MSSQL…)
• Conference speaker: OOW, RMOUG, dbTech Showcase, Collaborate
• NZOUG member
• Technical Writer and Editor
• Kafka enthusiast
• Say that I am an Oracle ACE ☺
Professional not-knower of things
Stream Data Platform with Kafka
• Distributed
• Fault Tolerant
• Stream Processing
• Data Integration
• Message Store
Agenda
• What is Kafka
• Looking at this new technology as an Oracle DBA
• Why should an Oracle professional care?
• How do I get started with Kafka?
The New World of Data
• Data centralization
• Real time delivery
• Integration
• Stream data processing
• New data end points/stores
What is Kafka?
A scalable, fault-tolerant, distributed system where messages are kept in
topics that are partitioned and replicated across multiple nodes.
• Developed at LinkedIn ~2010
• Confluent and the open-source (Apache Kafka) project
An open-source publish-subscribe messaging system implemented as a
distributed commit log
What is Kafka?
• Data is written to Kafka in the form of key-value pair messages (the key or value can be null)
• Each message belongs to a topic
• Messages as a continuous flow (stream) of events
• Producers (writers) decoupled from Consumers (readers)
• A delivery channel/platform (if you like) – crossing systems (data integration)
• TOPICS (Kafka) ≈ TABLES (Oracle)
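A toy in-memory model makes the producer/consumer decoupling concrete. This is a minimal sketch, not the Kafka client API: `Topic` and `Consumer` are hypothetical names, and a real topic is partitioned, replicated and persisted by the brokers. What it shows is that reading does not remove messages; each consumer simply remembers its own offset.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Message = Tuple[Optional[str], Optional[str]]  # (key, value); either may be None


@dataclass
class Topic:
    """Toy single-partition topic: an append-only list of key-value messages."""
    name: str
    log: List[Message] = field(default_factory=list)

    def append(self, key: Optional[str], value: Optional[str]) -> int:
        """Producer side: append a message and return its offset."""
        self.log.append((key, value))
        return len(self.log) - 1


class Consumer:
    """Each consumer keeps its own offset; reading never deletes messages."""

    def __init__(self, topic: Topic) -> None:
        self.topic = topic
        self.offset = 0

    def poll(self, max_messages: int = 10) -> List[Message]:
        batch = self.topic.log[self.offset:self.offset + max_messages]
        self.offset += len(batch)
        return batch


topic = Topic("employees")
topic.append("2", '{"name": "Jim", "salary": 300}')
topic.append("2", '{"name": "Jim", "salary": 350}')
topic.append(None, "message with a null key")

fast, slow = Consumer(topic), Consumer(topic)
print(fast.poll())   # all three messages
print(slow.poll(1))  # just the first one; slow still has two to read
```

Because the producer only appends and each consumer tracks its own position, a slow reader never holds up a fast one, which is the decoupling the slide describes.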
Kafka - components
• Zookeeper
• Schema Registry
• Kafka
• REST Proxy
• Kafka Connect
What about KSQL and Kafka Streams?
Kafka – basic operations demo
1. Download the Confluent Platform
2. Run the CLI (or use the scripts as an alternative)
CLI = SQL*Plus? (or svrmgr)
3. Push data into a Kafka topic (bundled producer)
4. Read some data out of a Kafka topic (bundled consumer)
Kafka – why would you use it?
3 propositions:
• Messaging system
• Data streaming platform
• Data storage
➢ Messaging
➢ Website Activity Tracking
➢ Metrics
➢ Log Aggregation
➢ Stream Processing
➢ Event Sourcing
➢ Commit Log
Apples and Oranges? Kafka and Oracle
Kafka
• Messaging system
  - transmission channel
  - integration priority
• Data streaming (always on) platform
  - in-line transformations
  - push
• Data storage (topics)

Oracle
• X (no messaging system equivalent)
• Data delivery end point (periodic/batch)
  - materialised views?
  - logical replication?
  - pull?
• Data store (source of truth – tables)
Oracle Database tables: State active record model
Persistent Store - retains current known STATE.
ID Name Salary
1 Chris 100
2 Jim 350
3 Bob 500
Source: Oracle Database
select * from employees where ID = 2;
Event Streaming: INSERT ALL ROWS mode

Source (emp):
ID Name Salary
1 Chris 100
2 Jim 350
3 Bob 500
update emp set salary = 350 where id = 2;

Target (stage_emp):
ID | Name | Old Salary | New Salary | Machine ID | User | TRANS_TYPE | Commit Timestamp
2 | Jim | 300 | 350 | Machine_2 | QA | U | 2016-May-12 14:22:03
insert into stage_emp values
(2, 'Jim', 300, 350, 'Machine_2', 'QA', 'U', TIMESTAMP '2016-05-12 14:22:03');
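The INSERT ALL ROWS idea can be sketched in a few lines: the source keeps only the latest salary, while the target gains one row per change carrying the old value, the new value and metadata. The helper `update_salary` and the dictionary keys are hypothetical; a replication product captures the change from the redo log rather than from an application call.

```python
from datetime import datetime

# Source table: only the current STATE of each row, keyed by ID
emp = {
    1: {"name": "Chris", "salary": 100},
    2: {"name": "Jim", "salary": 300},
    3: {"name": "Bob", "salary": 500},
}

# Target table: one appended row per captured change
stage_emp = []


def update_salary(emp_id, new_salary, machine_id, user, commit_ts):
    """Hypothetical helper: update the source row and record the change as an insert."""
    row = emp[emp_id]
    old_salary = row["salary"]
    row["salary"] = new_salary  # the source overwrites the old value
    stage_emp.append({          # the target keeps the change itself
        "id": emp_id,
        "name": row["name"],
        "old_salary": old_salary,
        "new_salary": new_salary,
        "machine_id": machine_id,
        "user": user,
        "trans_type": "U",
        "commit_timestamp": commit_ts,
    })


update_salary(2, 350, "Machine_2", "QA", datetime(2016, 5, 12, 14, 22, 3))
print(emp[2])        # only the new salary survives in the source
print(stage_emp[0])  # the change row keeps both old and new salary
```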
[Figure: Map of Europe]
Train No | Start Location | End Location | Passengers | Engineer | Status | TRANS_TYPE | Commit Timestamp
1 | London | Cardiff | 100 | Smythe | Good | I | 2016-Nov-12 14:22:00
1 | Cardiff | Cardiff | 0 | Smythe | Good | U | 2016-Nov-12 17:24:03
1 | Cardiff | Edinburgh | 312 | Johnson | Good | U | 2016-Nov-12 18:00:09
1 | Edinburgh | Edinburgh | 0 |  | Rest | U | 2016-Nov-13 04:02:33
What’s missing with state alone?
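One answer to the question is history: with the change rows you can ask what train 1 was doing at any point in time, which a state-only table cannot answer. A minimal sketch using the rows above; the helper `state_as_of` is hypothetical, and the engineer on the final row is assumed to be empty (`None`) because the table shows no value there.

```python
from datetime import datetime

# Change rows for train 1 in commit order:
# (start, end, passengers, engineer, status, trans_type, commit timestamp)
events = [
    ("London", "Cardiff", 100, "Smythe", "Good", "I", datetime(2016, 11, 12, 14, 22, 0)),
    ("Cardiff", "Cardiff", 0, "Smythe", "Good", "U", datetime(2016, 11, 12, 17, 24, 3)),
    ("Cardiff", "Edinburgh", 312, "Johnson", "Good", "U", datetime(2016, 11, 12, 18, 0, 9)),
    ("Edinburgh", "Edinburgh", 0, None, "Rest", "U", datetime(2016, 11, 13, 4, 2, 33)),
]


def state_as_of(ts):
    """Temporal query: the latest change committed at or before ts, or None."""
    current = None
    for event in events:
        if event[-1] > ts:
            break  # events are in commit order, so nothing later can qualify
        current = event
    return current


final_state = events[-1]  # all that a state-only table would keep
print(final_state)
print(state_as_of(datetime(2016, 11, 12, 18, 30)))  # the Cardiff to Edinburgh leg
```

The state-only table would only ever hold `final_state`; the passenger count and engineer of the earlier legs are gone unless the changes themselves are kept.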
The Online Redo Log Files
The redo log stores a continuous chain in chronological order of every
change vector applied to the database. This will be the bare minimum of
information required to reconstruct, or redo, all the work that has been
done.
If a datafile (or the whole database) is damaged or destroyed, these change
vectors can be applied to datafile backups to redo the work, bringing them
forward in time until the moment that the damage occurred.
(OCA exam guide, p. 89)
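The roll-forward described in the quote can be illustrated with a deliberately simplified model. Real change vectors describe block-level changes and recovery is driven by SCNs in the redo stream; here each change is a hypothetical `(SCN, row id, new value)` tuple, so that the mechanism (skip what the backup already contains, re-apply the rest in order) is easy to see.

```python
# Hypothetical, row-level "change vectors": (SCN, row id, new value), in SCN order
redo = [
    (100, 1, "Chris:100"),
    (105, 2, "Jim:300"),
    (110, 3, "Bob:500"),
    (120, 2, "Jim:350"),
]


def recover(backup_rows, backup_scn, redo_stream, until_scn=None):
    """Roll a restored backup forward by re-applying changes newer than the backup."""
    rows = dict(backup_rows)
    for scn, row_id, value in redo_stream:
        if scn <= backup_scn:
            continue  # already contained in the backup
        if until_scn is not None and scn > until_scn:
            break     # stop at the requested point in time
        rows[row_id] = value
    return rows


backup = {1: "Chris:100", 2: "Jim:300"}  # restored backup taken at SCN 105
print(recover(backup, 105, redo))                 # complete recovery
print(recover(backup, 105, redo, until_scn=110))  # point-in-time recovery
```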
The Redo Log!
CHANGE #3 TYP:0 CLS:1 AFN:4 DBA:0x0100061b OBJ:27521 SCN:0x0000.00ab0ab9 SEQ:2
OP:11.2 ENC:0 RBL:0
KTB Redo
op: 0x01 ver: 0x01
compat bit: 4 (post-11) padding: 1
op: F xid: 0x0003.00c.00001047 uba: 0x00c31f4a.043f.15
KDO Op code: IRP row dependencies Disabled
xtype: XA flags: 0x00000000 bdba: 0x0100061b hdba: 0x0100061a
itli: 1 ispac: 0 maxfr: 4858
tabn: 0 slot: 0(0x0) size/delt: 24
fb: --H-FL-- lb: 0x1 cc: 6
null: ------
col 0: [ 2] c1 07
col 1: [ 5] 50 65 72 72 79
col 2: [ 2] c1 16
col 3: [ 2] c1 02
col 4: [ 2] 49 54
col 5: [ 2] c1 07
insert into HR.EMPLOYEES values (6,'Perry',21,1,'IT',6);
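The hex column values in the dump can be checked against the decoded insert statement. This sketch assumes Oracle's internal NUMBER layout for zero and positive values (the first byte is the base-100 exponent plus 193, each following byte is a base-100 digit plus 1, and zero is the single byte 0x80) and a single-byte character set for the string columns; negative numbers are deliberately not handled. The column types are inferred from the insert statement.

```python
from decimal import Decimal


def decode_oracle_number(hex_bytes: str) -> Decimal:
    """Decode a zero or positive Oracle NUMBER from its internal bytes (assumed layout)."""
    data = bytes.fromhex(hex_bytes)
    if data == b"\x80":
        return Decimal(0)  # zero is stored as the single byte 0x80
    if data[0] < 0x80:
        raise ValueError("negative NUMBER encoding is not handled in this sketch")
    exponent = data[0] - 193  # base-100 exponent of the first digit
    value = Decimal(0)
    for i, byte in enumerate(data[1:]):
        digit = byte - 1  # each base-100 digit is stored as digit + 1
        value += digit * Decimal(100) ** (exponent - i)
    return value


def decode_varchar(hex_bytes: str) -> str:
    """Decode a character column, assuming a single-byte (ASCII-compatible) character set."""
    return bytes.fromhex(hex_bytes).decode("ascii")


# Column bytes from the redo dump (col 0 to col 5), types inferred from the insert
columns = [
    ("number", "c1 07"),
    ("varchar", "50 65 72 72 79"),
    ("number", "c1 16"),
    ("number", "c1 02"),
    ("varchar", "49 54"),
    ("number", "c1 07"),
]

row = [decode_oracle_number(h) if kind == "number" else decode_varchar(h) for kind, h in columns]
print(row)  # values 6, 'Perry', 21, 1, 'IT', 6
```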
An old methodology: Event Sourcing
“Event Sourcing ensures that all changes to application state are stored as a sequence of events...
The fundamental idea of Event Sourcing is that of ensuring every change to the state of an application is captured in an
event object, and that these event objects are themselves stored in the sequence they were applied for the same lifetime as
the application state itself.”
Martin Fowler: https://www.martinfowler.com/eaaDev/EventSourcing.html
• Don't save the current state of objects
• Instead write the events that lead to the current state
• An APPEND-ONLY log
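Rebuilding state from an append-only log can be sketched as a fold over the events. The event names (`EmployeeHired`, `SalaryChanged`, `EmployeeLeft`) and helpers are hypothetical examples, not part of any framework.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, int, dict]  # (event type, employee id, payload)

# The append-only log: events are only ever added, never updated or deleted
event_log: List[Event] = [
    ("EmployeeHired", 2, {"name": "Jim", "salary": 300}),
    ("SalaryChanged", 2, {"salary": 350}),
    ("EmployeeHired", 3, {"name": "Bob", "salary": 500}),
    ("EmployeeLeft", 3, {}),
]


def apply_event(state: Dict[int, dict], event: Event) -> Dict[int, dict]:
    """Apply one event to a state snapshot."""
    kind, emp_id, payload = event
    if kind == "EmployeeHired":
        state[emp_id] = dict(payload)  # copy so the logged event is never mutated
    elif kind == "SalaryChanged":
        state[emp_id]["salary"] = payload["salary"]
    elif kind == "EmployeeLeft":
        del state[emp_id]
    else:
        raise ValueError(f"unknown event type: {kind}")
    return state


def rebuild(log: List[Event]) -> Dict[int, dict]:
    """Rehydrate current state by replaying the whole log from the start."""
    state: Dict[int, dict] = {}
    for event in log:
        apply_event(state, event)
    return state


print(rebuild(event_log))      # Bob has left; Jim is on 350
print(rebuild(event_log[:3]))  # replaying a prefix gives an earlier state
```

The current state is never stored, only derived; any number of secondary copies can be rebuilt from the same log.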
An old methodology: Event Sourcing
Martin Kleppmann - Designing Data-Intensive Applications
EVENT
• Details
• Meta-data
Event Sourcing Benefits
Fowler suggests:
• Complete Rebuild - rehydrate secondary systems
• Temporal Queries
• Event Replay - forward and reverse
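Temporal queries and forward/reverse replay both fall out of keeping before and after images in each event, much like the old and new salary columns in the stage_emp example. A minimal sketch with hypothetical data and helper names: replaying a prefix answers "what was the state at sequence N", and reverse replay undoes events by restoring their before images.

```python
# Each change event carries the before and after image of the changed column:
# (sequence, row id, column, old value, new value)
changes = [
    (1, 2, "salary", None, 300),
    (2, 2, "salary", 300, 350),
    (3, 2, "salary", 350, 400),
]


def replay_forward(state, events):
    """Apply events oldest-first using their after images."""
    for _, row_id, col, _old, new in events:
        state.setdefault(row_id, {})[col] = new
    return state


def replay_reverse(state, events):
    """Undo events newest-first by restoring their before images."""
    for _, row_id, col, old, _new in reversed(events):
        state[row_id][col] = old
    return state


def as_of(events, seq):
    """Temporal query: rebuild state using only events up to sequence number seq."""
    return replay_forward({}, [e for e in events if e[0] <= seq])


now = replay_forward({}, changes)                                  # salary 400
then = as_of(changes, 2)                                           # salary 350
undone = replay_reverse(replay_forward({}, changes), changes[1:])  # back to 300
print(now, then, undone)
```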
Event Streaming
• Capture all changes in the database and record these as events
Every change becomes an insert: even updates and deletes are recorded as inserts
• Adds additional information (metadata) about these changes, such as who,
where, what and when
• Turning the database “inside out” (turn the redo log into a normal log)
• See the full lifecycle of the data, now possible in real time!
The Online Redo Log Files
I created a topic (in Confluent 4.0) called connect-dbmessage.
It boils down to files on disk here (the location comes from the broker's log.dirs setting):
/tmp/confluent.Sz1GdA5f/kafka/data/connect-dbmessage-0
We can run a strings command on it:
And we can also dump it using the bundled Kafka tools.
Oracle Change Data – delivered to Kafka
• INSERT... into SCOTT.TEST9
• metadata
Kafka - a log writer/reader
[Figure: Partition 0, Partition 1 and Partition 2, each an ordered log running from old to new messages]
• Organized by topics
• Sub-categorization by partitions (log files on disk)
• Replicated between nodes for redundancy
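Ordering in Kafka is guaranteed within a partition, and messages with the same key land in the same partition. The sketch shows the idea with an in-memory dictionary; `produce` and `choose_partition` are hypothetical helpers, and `zlib.crc32` is a stand-in hash, whereas Kafka's default partitioner hashes keyed messages with murmur2, so the partition numbers will not match a real cluster.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3

# partition number -> append-only list of (offset, key, value)
partitions = defaultdict(list)


def choose_partition(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Same key, same partition. Stand-in hash (CRC32), not Kafka's murmur2."""
    return zlib.crc32(key) % num_partitions


def produce(key: str, value: str):
    """Append to the key's partition and return (partition, offset)."""
    p = choose_partition(key.encode("utf-8"))
    offset = len(partitions[p])  # offsets are per partition and start at 0
    partitions[p].append((offset, key, value))
    return p, offset


for salary in (300, 350, 400):
    produce("emp-2", f"salary={salary}")
produce("emp-3", "salary=500")

for p in sorted(partitions):
    print(p, partitions[p])
```

All changes for `emp-2` stay in one partition in the order they were produced, which is what lets a consumer replay one row's history correctly.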
Indexes, Offsets and Data Files
(Kafka - a log writer/reader)
[oracle@dbvrep01 REP-TX.META-0]$ ll
total 4172
-rw-r--r-- 1 oracle oinstall 10485760 Jun 15 18:51 00000000000000000000.index
-rw-r--r-- 1 oracle oinstall 4236052 Jun 15 18:56 00000000000000000000.log
-rw-r--r-- 1 oracle oinstall 10485756 Jun 15 18:51 00000000000000000000.timeindex
Dump log segments command:
kafka-run-class kafka.tools.DumpLogSegments --print-data-log --files /tmp/kafka-logs/REP-TX.META-0/00000000000000000000.log
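The listing above suggests how a read at a given offset works: segment files are named after the first (base) offset they contain, zero-padded to 20 digits, and the `.index` file is a sparse map from offsets to byte positions in the `.log` file. A minimal sketch with hypothetical offsets, positions and helper names; in the real index file the offsets are stored relative to the segment's base offset, while here they are absolute for readability.

```python
import bisect


def segment_file_name(base_offset: int, suffix: str = ".log") -> str:
    """Segment files are named after their first offset, zero-padded to 20 digits."""
    return f"{base_offset:020d}{suffix}"


def find_segment(base_offsets, target_offset):
    """Pick the segment whose base offset is the largest one <= target_offset."""
    i = bisect.bisect_right(base_offsets, target_offset) - 1
    if i < 0:
        raise ValueError("offset is older than the oldest segment")
    return base_offsets[i]


def index_lookup(sparse_index, target_offset):
    """Sparse (offset, byte position) index: where to start scanning the .log file."""
    offsets = [o for o, _ in sparse_index]
    i = bisect.bisect_right(offsets, target_offset) - 1
    return 0 if i < 0 else sparse_index[i][1]


# Hypothetical values for illustration
segments = [0, 5000, 12000]
index = [(0, 0), (42, 4100), (87, 8230), (131, 12301)]

seg = find_segment(segments, 6123)  # offset 6123 lives in the segment starting at 5000
start = index_lookup(index, 100)    # scan forward from byte 8230 to reach offset 100
print(segment_file_name(seg), start)
```

A sparse index keeps the file small: the lookup lands close to the wanted record and a short sequential scan of the `.log` does the rest.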
Topic vs Table Creation
Create a topic
bin/kafka-topics --create --zookeeper localhost:2181 --topic TOPIC_NAME --replication-factor 1 --partitions 1
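The `--partitions` and `--replication-factor` options decide how many partition logs exist and how many brokers hold a copy of each. A simplified round-robin placement illustrates the idea; it is not Kafka's own assignment algorithm (which, for example, starts from a randomly chosen broker), and `assign_replicas` is a hypothetical helper. Kafka does refuse a replication factor larger than the number of available brokers, which the sketch mirrors with a `ValueError`.

```python
from typing import Dict, List


def assign_replicas(num_partitions: int, replication_factor: int,
                    brokers: List[int]) -> Dict[int, List[int]]:
    """Naive round-robin replica placement (a simplification, not Kafka's algorithm)."""
    if replication_factor > len(brokers):
        raise ValueError(
            f"replication factor {replication_factor} is larger than "
            f"the {len(brokers)} available broker(s)"
        )
    assignment = {}
    for p in range(num_partitions):
        # The first broker in each list plays the role of the preferred leader.
        assignment[p] = [brokers[(p + r) % len(brokers)] for r in range(replication_factor)]
    return assignment


print(assign_replicas(1, 1, [0]))              # the command above on a single broker
print(assign_replicas(3, 2, [101, 102, 103]))  # three partitions, two copies each
```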
Kafka Connect - export/import tool
Data Pump anyone?
• Cassandra
• Elasticsearch
• Google BigQuery
• HBase
• HDFS
• JDBC
• Kudu
• MongoDB
• Postgres
• S3
• SAP HANA
• Solr
• Vertica
SMTs (Single Message Transforms) and KStreams (Kafka Streams)
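Single Message Transforms apply small per-record changes inside Kafka Connect, while Kafka Streams is a library for richer processing such as filtering, joins and aggregations. A rough sketch of a per-record transform chain in plain Python; the functions are hypothetical and only similar in spirit to the InsertField and MaskField SMTs (the real MaskField substitutes a type-appropriate empty value rather than `None`), and the source name `ORCL` is made up.

```python
from typing import Callable, Iterable, Iterator, List, Optional

Record = dict
Transform = Callable[[Record], Optional[Record]]  # returning None drops the record


def insert_field(name: str, value) -> Transform:
    """Add a static field to every record (similar in spirit to InsertField)."""
    return lambda rec: {**rec, name: value}


def mask_field(name: str) -> Transform:
    """Blank out a sensitive field (similar in spirit to MaskField; None is a simplification)."""
    return lambda rec: {**rec, name: None} if name in rec else rec


def drop_deletes(rec: Record) -> Optional[Record]:
    """A filter step, the kind of thing a Kafka Streams filter() would do."""
    return None if rec.get("trans_type") == "D" else rec


def run_chain(records: Iterable[Record], chain: List[Transform]) -> Iterator[Record]:
    """Apply each transform in order, one record at a time."""
    for rec in records:
        for transform in chain:
            rec = transform(rec)
            if rec is None:
                break  # dropped: skip the remaining transforms
        else:
            yield rec  # survived every transform


changes = [
    {"id": 2, "name": "Jim", "salary": 350, "trans_type": "U"},
    {"id": 3, "name": "Bob", "salary": 500, "trans_type": "D"},
]
chain = [insert_field("source", "ORCL"), mask_field("salary"), drop_deletes]
print(list(run_chain(changes, chain)))
```

Each transform sees one record at a time and returns a new dictionary, so the input records are left untouched.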
Get started with Kafka
• Kafka and Kafka Connect: www.confluent.io
• Download the Confluent Platform (bundled connectors)
• Check out the available community connectors
• Try running it in Docker
About Dbvisit Software
• Real-time Oracle Database Streaming software solutions
• In the Cloud | Hybrid | On-Premise
• New Zealand-based, US office, Asia Sales office, EU office (Prague)
• Unique offering: disaster recovery solutions for Oracle Standard Edition
• Logical replication for moving data wherever and whenever you wish
• Flexible licensing, cost-effective pricing models available
• Exceptional growth, 1300+ customers
• Peerless customer support
Thank you