Supercharge your RDBMS with Elasticsearch
Arthur Gimpel, Director of DataZone
About Me
Name: Arthur Gimpel
Position: Technology Evangelist, Solutions Architect, Trainer
Tech Stack: Elastic Stack, SQL Server, MongoDB, Couchbase, Redis, Kafka, StreamSets, Python, .NET…
Free Time: Motorcycles, Skydiving…
Relational Database Management Systems
• The first RDBMSs were introduced in the late 1970s
• They exist in all possible flavors but share one thing: ACID
• They still dominate the database market
RDBMS in Theory - ACID
• Atomicity: All-or-nothing approach, transactions
• Consistency: Hard state; every transaction moves the whole database from one valid state to another
• Isolation: Transactions cannot interfere with each other
• Durability: Every committed transaction is persisted
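As a rough illustration of the atomicity bullet, here is a minimal sketch using Python's built-in `sqlite3` module. The `accounts` table, the `transfer` helper and the balances are hypothetical and exist only for this example. The credit is applied first on purpose, so that a failing debit has something to undo.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # This debit violates the CHECK constraint when funds are missing.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))

# Hypothetical schema and data, assumed for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

ok = transfer(conn, src=1, dst=2, amount=500)  # account 1 only holds 100
print(ok, balances(conn))
```

The failed transfer leaves both balances untouched: the credit that had already been applied is rolled back together with the rejected debit.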
ACID Is Not Perfect
• Everything is persisted synchronously, so throughput is limited by I/O performance
• All data is bound to a tabular schema, which is hard to change in big databases
• ACID makes horizontal scaling nearly* impossible
• A complex schema slows down aggregations and queries drastically
NoSQL - New Kid in Town
• Distributed / Horizontal Scalability
• Mostly Open Source
• Mostly schema-less:
  • Key-Value
  • Document
  • Graph
• Each serves a specific purpose
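To make the three data models a little more concrete, the sketch below represents the same two users in a key-value, a document and a graph shape, using plain Python structures. All names (`kv_store`, `documents`, `edges`, the helper functions) are made up for illustration and do not correspond to any particular product's API.

```python
import json

# Key-value: the store only understands "key -> opaque value".
kv_store = {"user:42": json.dumps({"name": "Ada", "city": "Paris"})}

# Document: nested, self-describing records; fields may differ per document.
documents = [
    {"_id": 42, "name": "Ada", "address": {"city": "Paris"}, "tags": ["admin"]},
    {"_id": 43, "name": "Bob"},  # no address and no tags, still a valid document
]

# Graph: entities are nodes, relationships are first-class edges.
nodes = {42: {"name": "Ada"}, 43: {"name": "Bob"}}
edges = [(42, "FOLLOWS", 43)]

def kv_get(key):
    """Key-value access: fetch by exact key, decode the value client-side."""
    return json.loads(kv_store[key])

def find_by_city(city):
    """Document access: query on a nested field that may be absent."""
    return [d["_id"] for d in documents
            if d.get("address", {}).get("city") == city]

def followed_by(node_id):
    """Graph access: traverse outgoing FOLLOWS edges."""
    return [dst for src, rel, dst in edges
            if src == node_id and rel == "FOLLOWS"]

print(kv_get("user:42"), find_by_city("Paris"), followed_by(42))
```

Each shape makes one access pattern cheap (lookup by key, query by field, traversal) and the others awkward, which is one way to read "serves specific purposes".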
NoSQL - Challenges
• Every data store has its purpose. There is no single solution to all database needs
• NoSQL does not implement all of an RDBMS’s abilities (CDC, Jobs, Stored Procedures, Triggers)
• Every data store has its own language and APIs. There is no ANSI SQL equivalent
NoSQL = Not Only SQL | Polyglot Persistence
Elasticsearch
• Search platform and data store based on Apache Lucene
• Supports various search types: Filtered, Full-text, Geography, Aggregation (Facet, Nested, Pipeline), Graph
• Distributed - every index is split into shards, each (potentially) residing on a different node
• Document store - JSON
• “Optimistic” schema-less architecture
• Supports Replication by nature
• Supports Unsupervised Machine Learning by nature (Prelert, in beta)
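The "Distributed" bullet can be sketched as hash-based routing: a document's ID is hashed and the result, modulo the number of primary shards, picks the shard. This is a simplified stand-in, not Elasticsearch's actual code. It assumes `zlib.crc32` in place of the murmur3 hash Elasticsearch uses internally, and the shard count and document IDs are hypothetical.

```python
import zlib
from collections import Counter

NUM_PRIMARY_SHARDS = 3  # hypothetical index setting, fixed at index creation

def shard_for(doc_id: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    """Pick a shard for a document ID.

    crc32 is only a deterministic stand-in here; the principle
    (hash of the routing value, modulo shard count) is what matters.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards

# Route a batch of made-up document IDs and count documents per shard.
doc_ids = [f"order-{i}" for i in range(1000)]
distribution = Counter(shard_for(d) for d in doc_ids)
print(sorted(distribution.items()))
```

Because the shard is derived from the ID alone, any node can compute where a document lives without a central lookup. The same property explains why the number of primary shards cannot simply be changed on an existing index.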
Search != SQL Querying
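One way to see the difference is to compare a substring match (roughly what a SQL `LIKE '%…%'` predicate does) with a lookup in a tiny inverted index, the core structure behind Lucene-style search. The sketch below is a toy. The sample documents, the tokenizer and the term-frequency scoring are assumptions for illustration and are far simpler than real analyzers and relevance scoring.

```python
import re
from collections import defaultdict

# Made-up sample documents.
docs = {
    1: "Elasticsearch is a distributed search engine",
    2: "SQL Server is a relational database",
    3: "Search engines rank documents by relevance",
}

def tokenize(text):
    """Lowercase and split on non-alphanumerics (no stemming, no stop words)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each term to {doc_id: term frequency}."""
    index = defaultdict(lambda: defaultdict(int))
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] += 1
    return index

def search(index, query):
    """Return doc IDs matching any query term, best score first."""
    scores = defaultdict(int)
    for term in tokenize(query):
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    return sorted(scores, key=lambda d: (-scores[d], d))

def like_match(docs, pattern):
    """Substring match, similar in spirit to SQL LIKE '%pattern%'."""
    return [doc_id for doc_id, text in docs.items()
            if pattern.lower() in text.lower()]

index = build_index(docs)
print(like_match(docs, "engine search"))  # word order breaks substring matching
print(search(index, "engine search"))     # term lookup ignores word order, ranks hits
```

The substring match finds nothing for "engine search", while the index returns document 1 (both terms) ahead of document 3 (one term).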
Reference Architecture #1
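The comparison table describes this architecture as data store based, with a data pipeline (StreamSets) moving data out of the RDBMS. The sketch below is a hand-rolled stand-in for one pipeline run, not StreamSets itself. It assumes a hypothetical `products` table with an `updated_at` watermark column, reads the rows changed since the last run, and formats them as a newline-delimited body in the shape the Elasticsearch bulk API expects. Sending the body over HTTP is left out.

```python
import json
import sqlite3

def changed_rows(conn, since):
    """Fetch rows modified after the `since` watermark, as dicts."""
    cur = conn.execute(
        "SELECT id, name, price, updated_at FROM products "
        "WHERE updated_at > ? ORDER BY updated_at",
        (since,),
    )
    cols = [c[0] for c in cur.description]
    return [dict(zip(cols, row)) for row in cur]

def to_bulk_body(rows, index_name):
    """Build an NDJSON bulk body: one action line plus one source line per row."""
    lines = []
    for row in rows:
        lines.append(json.dumps(
            {"index": {"_index": index_name, "_id": str(row["id"])}}
        ))
        lines.append(json.dumps(row))
    return ("\n".join(lines) + "\n") if lines else ""

# Hypothetical source table, assumed for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products ("
    "id INTEGER PRIMARY KEY, name TEXT, price REAL, updated_at INTEGER)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [(1, "Laptop", 999.0, 1), (2, "Phone", 599.0, 2), (3, "Tablet", 399.0, 3)],
)
conn.commit()

rows = changed_rows(conn, since=1)   # only rows touched after watermark 1
body = to_bulk_body(rows, "products")
print(body)
```

The application is untouched in this design: the RDBMS stays the single write target, and the pipeline polls it for changes.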
Reference Architecture #2
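The comparison table describes this architecture as application based, with a message queue (Kafka) as the distribution component. One plausible reading, sketched below, is that the application's data access layer publishes change events to a log and each data store consumes them at its own pace. `EventLog` and `Consumer` are hypothetical in-memory stand-ins for a Kafka topic and consumer group, and a plain dict stands in for the search index.

```python
import sqlite3

class EventLog:
    """Append-only event log; an in-memory stand-in for a Kafka topic."""
    def __init__(self):
        self._events = []

    def publish(self, event):
        self._events.append(event)

    def read_from(self, offset):
        return self._events[offset:]

class Consumer:
    """Tracks its own offset, so each sink consumes independently."""
    def __init__(self, log, apply):
        self.log, self.apply, self.offset = log, apply, 0

    def poll(self):
        events = self.log.read_from(self.offset)
        for event in events:
            self.apply(event)
        self.offset += len(events)
        return len(events)

# Sink 1: the RDBMS (hypothetical `products` table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")

def write_to_rdbms(event):
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO products (id, name) VALUES (?, ?)",
            (event["id"], event["name"]),
        )

# Sink 2: the search index (a dict keyed by document ID).
search_index = {}

def write_to_index(event):
    search_index[str(event["id"])] = {"name": event["name"]}

log = EventLog()
rdbms_consumer = Consumer(log, write_to_rdbms)
index_consumer = Consumer(log, write_to_index)

# The application's data access layer publishes instead of writing directly.
log.publish({"id": 1, "name": "Laptop"})
log.publish({"id": 2, "name": "Phone"})

print(rdbms_consumer.poll(), index_consumer.poll())
```

Since neither sink depends on the other, the indexing side can scale or fall behind without affecting the RDBMS. The cost is the data access layer refactor the table mentions.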
Architecture Comparison
                                 Architecture #1                  Architecture #2
Data distribution strategy       Data store based                 Application based
Data distribution component      Data Pipeline (StreamSets)       Message Queue (Kafka)
Implementation Team              Data Engineers / DevOps          DevOps / Developers
Implementation Complexity        Low: data pipeline development   High: data access layer refactor
Potential additional licensing   Elasticsearch, StreamSets        None
Scalability                      Limited to RDBMS scale           Fully scalable regardless of RDBMS
Thank You!