Database Engineering

Velocity Conference, Santa Clara

Laine Campbell, Chief Unicorn

May 28th, 2015

content

© 2014 Pythian

● reliability and operations engineering - history and context

● design - architecture and objectives

● deploy - delivery pipelines, capacity and automation

● maintain - data stewardship, business continuity and change

Who am I?

● first of a series

● focuses on operations engineering for DBEs

● laine gives perfectionist and enterprise focus

● charity gives real world start-up view

● crazy breakdancing

O’Reilly doesn’t allow unicorns on their covers

in early release by OSCON Portland

me with a jaguar…

engineering, not administration

engineering

● quantitative

● interdisciplinary

● results focused

● repeatable and code-driven

systems engineering

● designing and managing complex systems for complete life-cycle

● translation from business to system, with focus on communications, resiliency, flexibility and interoperability

● ensuring the system generates necessary data for measuring its performance against KPIs

operations engineering

● allocate resources to balance competing objectives of cost, service level and productivity

● designing and building processes and infrastructure to maintain peak flexibility and quality to serve the business

● gathering and utilizing data to effectively provide continuous improvement to systems

database engineering

● participates in systems and operations engineering processes by providing expertise to facilitate all work

● understands and teaches dataset nuances to ensure all KPIs are met

● anchors teams with expertise for troubleshooting, recovery and other tasks requiring depth, not breadth

reliability engineering

● focuses on the glue common to all services, platform-centric

● deployment, efficiency, scale, performance, observability

● often done by systems and operations engineering, rather than being their own function

yesterday’s DBA

● gatekeeper

● master builder

● superhero

● siloed

● specialized

paradigm shifts

virtualization and cloud

● forces horizontal scaling

● forces designing for resilience

● elasticity drives new data storage

● management by API

infrastructure as code

● forces standardization

● forces us to learn to code

● we start building platforms

● changes become deployments

devops cultures

● lean manufacturing defines our workflows

● tighter feedback loops require organizational shifts

● experimentation and controlled failure shift architecture and process design

● integration drives empathy

continuous delivery

● brings us to the source code control paradigm

● we must be teachers, not gatekeepers

● testing and compliance become top priorities

polyglot persistence

● relational is not the end of the line

● data must be looked at end to end

● function dictates form

● we cannot predict all uses

today’s database engineer

specialty = depth

● analytical thinking

● problem solving

● edge case knowledge

● exhibited in at least one system

○ MySQL/Solaris
○ Cassandra/RHEL
○ PostgreSQL/Windows

breadth, crossing disciplines

● understands core concepts

● can communicate with experts

● applies knowledge at the intersection with core disciplines

● database engineers need:

○ TCP/IP networking concepts
○ scripting
○ virtualization, automation and orchestration

db engineer’s manifesto

● it’s about the mission

● protect the data

● eliminate waste

● data-driven decision making

● databases are not special

● eliminate the barriers between software and ops

bringing it to the table

design

what do we design for?

● mission KPIs

● function, not form

● operational processes and management

designing for the business

velocity

efficiency

security

performance

availability

https://derpicdn.net/img/view/2012/8/3/65841__safe_fluttershy_tank_scooter_artist-colon-giantmosquito_tortoise_vespa.jpg

velocity

● how fragile is the datastore?

● what features allow for rapid change?

● how rapidly can fallback be performed?

● does it integrate with CD pipelines?

● how rapidly can developers learn to use it?

efficiency

● how elastic is the datastore?

● how configurable is resource allocation?

● does the datastore require vendor lock-in?

● can we measure our cost per transaction?
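One rough way to approach the cost-per-transaction question is to divide what the datastore tier costs over a period by the transactions it served in that period. The sketch below assumes that simple model; the function name and both figures are hypothetical and do not come from the talk.

```python
def cost_per_transaction(period_cost_usd: float, transactions: int) -> float:
    """Return the average infrastructure cost of a single transaction."""
    if transactions <= 0:
        raise ValueError("transactions must be positive")
    return period_cost_usd / transactions


# Hypothetical figures, for illustration only.
monthly_cluster_cost = 12_000.0      # monthly spend on the datastore tier, in USD
monthly_transactions = 300_000_000   # transactions served in the same month

unit_cost = cost_per_transaction(monthly_cluster_cost, monthly_transactions)
print(f"{unit_cost:.6f} USD per transaction")
```

Tracking this number over time is likely more useful than any single reading, since it shows whether efficiency improves or degrades as load grows.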

security

● is there robust and granular user management?

● is there an audit trail?

● is data and connection encryption supported?

● history of vulnerabilities?

performance

● how does the datastore handle concurrency? large data size per node?

● at what points do consistency requirements get violated?

● what are the performance curves and cliffs at scale?

● how tunable and instrumented is the datastore?

availability

● what are the single points of failure?

● can you take consistent backups?

● how are network partitions handled, and recovered from?

● is failure and rebalancing built in, or do you have to build it?

● how do node, partition and system failures impact consistency?

choosing your datastore(s)

“partitions always occur, whether from network outages or overload of components”

“design for performance, consistency, durability and availability, and make tradeoffs along the way”

polyglot love

so, how do we classify and choose our datastores?

traditional relational stores (RDBMS):

● e.f. codd’s 12 rules of relations
○ simplified to tables with key relationships and constraints

● SQL accessible

● ACID levels
○ consistency and isolation may be tunable, or fuzzy

distributed DBMS functionality

relational? | strict EF Codd | limited | key/value | JSON/BSON
data access language | SQL | mapreduce | SQLesque (e.g. CQL) | custom
atomicity | strict ACID | node only | cluster level | specific ops only
consistency (ACID) | strict ACID (three rules) | distributed (quorum) | eventual, weak (w/ conflict res.) | eventual, strong (e.g. CRDT)
isolation | strict ACID | relaxed isolation
durability | strict ACID | relaxed durability | write repair
availability | one writer:many readers | write:read anywhere | allow lagged read
partition tolerance | shut down whole cluster | shut down minority | read only minority | continue in parallel

understanding the spectrum

compromise | reason | tunable | example
availability | consistency | boolean | replica out of service
durability | latency | yes | disk flush intervals
partition tolerance | availability | yes | read only minority
isolation | concurrency | yes | dirty reads allowed
consistency | performance | yes | quorums, CRDT
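The quorum tunable in the last row can be made concrete with the usual Dynamo-style rule of thumb: with N replicas, a write quorum W and a read quorum R, every read overlaps the latest acknowledged write only when R + W > N. The sketch below assumes that model; the function name and the sample settings are illustrative and are not taken from the talk or from any specific datastore.

```python
def quorums_overlap(n: int, w: int, r: int) -> bool:
    """True when every read quorum shares at least one replica with every
    write quorum, i.e. R + W > N."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("quorum sizes must be between 1 and n")
    return r + w > n


# Hypothetical settings: (replicas, write quorum, read quorum).
for n, w, r in [(3, 2, 2), (3, 1, 1), (5, 3, 3), (5, 1, 5)]:
    label = "quorums overlap" if quorums_overlap(n, w, r) else "stale reads possible"
    print(f"N={n} W={w} R={r}: {label}")
```

Lower W or R values trade consistency for latency and availability, which is the compromise the table describes.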

designing for operations

component | patterns | anti-patterns
configuration mgmt | config files, APIs | in-memory changes, binaries
degradation mgmt | read-only modes, queue draining, timeouts, dynamic configuration changes | bad defaults, long timeouts, static configurations
change mgmt | online changes, fast alters, atomic operations, instrumented DDL, data dictionaries | lack of data governance, table and schema level locking, config based DDL
elasticity | auto session failover, pre-load cache, easy node add/removes, auto-registration | cold cache, slow starts, complicated member management, impactful hash redistribution

designing for operations (cont.)


component | patterns | anti-patterns
availability | online backups, incremental backups, auto-failover, auto-registration, read/write anywhere, rate limits | single points of failure, common partitions, cold backups, complicated registration, special snowflakes
instrumentation | good data dictionary, comprehensive status instrumentation, APIs, dynamic logging levels, audit trails, documentation | lack of documentation, lack of API, minimal status management, no plugins for common tooling
security | granular access controls, encryption options, audit trails, ldap integration, granular privileges, policy enforcement, SSL | no roles, no object level permissions, no encryption, no SSL, no audit/access logging

designing for operations (cont.)


component | patterns | anti-patterns
support | extensive documentation, user community, responsive vendor, transparent bug databases, active committers, exception collection, clean source code | the opposites of the patterns

disciplines and systems

discipline | concept | systems
operating system concepts | disk, virtual and network I/O | linux, solaris, windows, etc...
operating system concepts | kernel behaviors |
networking | tcp/ip | core theory and constructs
networking | network services | dhcp, dns, load balancing
distributed systems | dynamo based stores | riak, cassandra, voldemort
distributed systems | hadoop and mapreduce | hadoop, hbase, hive
distributed systems | document stores | mongodb, couchbase
distributed systems | shared state engines | zookeeper, consul

a day in the life...

● selecting datastores to add to production platform catalogs

● dbms and feature education to software, systems and operations engineers

● integrating with company-wide services


● validating acceptable configurations

● testing and benchmarking new versions, features and configurations

● documenting and sharing standards

deploying infrastructure

● configuration management

● orchestration

● test automation

● self-service to the team

defining “done”

● standards and documentation of acceptable use

● integration into operational visibility platforms

● playbooks for operations staff

● repeatable tests, manual and automated, to verify quality

deploying software

● agile participation with developers

● using version control for schemas and metadata

● more flexible data model approaches

● teach your engineering teams how to assess risk, performance and impact

disciplines and systems


discipline | concept | systems
config mgmt and orchestration | building playbooks/recipes | chef, puppet, ansible...
config mgmt and orchestration | scripting | python, ruby, etc...
virtualization, cloud | architecture basics | compute, storage, network etc...
operational visibility | core components, writing checks, etc... | sensu, graphite, ELK, grafana, etc...
operational visibility | statistics, anomaly tests | moving averages, predictive analytics
test infrastructures | testing theory/principles | jenkins, travis, code repo etc...
test infrastructures | scripting, automation | jenkins, travis, code repo etc...
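As one illustration of the "statistics, anomaly tests" and "moving averages" entries, the sketch below flags samples that stray too far from a trailing moving average. The function name, the window size, the three-sigma threshold and the sample data are all assumptions made for the example; a production check would need tuning against real metrics.

```python
from statistics import mean, pstdev


def moving_average_anomalies(values, window=5, threshold=3.0):
    """Return the indexes of values that deviate from the mean of the
    preceding `window` samples by more than `threshold` standard deviations."""
    anomalies = []
    for i in range(window, len(values)):
        trailing = values[i - window:i]
        avg = mean(trailing)
        spread = pstdev(trailing)
        if spread == 0:
            # A perfectly flat window: treat any change at all as anomalous.
            if values[i] != avg:
                anomalies.append(i)
        elif abs(values[i] - avg) > threshold * spread:
            anomalies.append(i)
    return anomalies


# Hypothetical queries-per-second samples containing a single spike.
qps = [100, 102, 98, 101, 99, 100, 103, 250, 101, 100]
print(moving_average_anomalies(qps))
```

Note that the spike inflates the spread of the windows that follow it, so the samples right after an anomaly are judged more leniently. That is a known weakness of this simple approach.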

disciplines and systems (cont)


discipline | concept | systems
agile software development | requirements, epics, stories | any tool…
agile software development | process | any tool…
software repositories | checking out, coding, integration | github, bitbucket etc...


● attending scrums, grooming and planning

● committing DDL and DML to the codebase

● monitoring automated builds and tests, performing manual ones
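Committing DDL to the codebase usually means keeping ordered migrations next to the application code and recording which ones have run. The sketch below is one minimal way to do that, using SQLite from the Python standard library as a stand-in datastore; the `MIGRATIONS` list, the `schema_version` table and the `apply_migrations` function are hypothetical names, not a tool mentioned in the talk.

```python
import sqlite3

# Hypothetical migrations, as they might be committed alongside application code.
MIGRATIONS = [
    (1, "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)"),
    (2, "CREATE UNIQUE INDEX idx_users_email ON users (email)"),
    (3, "ALTER TABLE users ADD COLUMN created_at TEXT"),
]


def apply_migrations(conn, migrations=MIGRATIONS):
    """Apply, in version order, every migration newer than the recorded
    schema version. Returns the list of versions applied in this run."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
    current = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM schema_version"
    ).fetchone()[0]
    applied = []
    for version, ddl in sorted(migrations):
        if version > current:
            conn.execute(ddl)
            conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
            applied.append(version)
    conn.commit()
    return applied


conn = sqlite3.connect(":memory:")
print(apply_migrations(conn))  # first run applies every migration
print(apply_migrations(conn))  # second run finds nothing new to apply
```

Because the runner is idempotent, it can be called from an automated build on every deploy, which is what makes the schema change reviewable and testable like any other commit.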

a day in the life… (cont)

● providing new or modified recipes for CM and automation

● pairing with and teaching engineers, iterating on the schema

maintaining the datastores

● maintaining dataflow

● backup and recovery

● change management

● incident management

● capacity planning
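For the capacity planning item, a first-pass estimate can be as simple as projecting recent growth forward. The sketch below assumes one usage sample per day and straight-line growth, which real workloads often violate; the function name and the numbers are hypothetical.

```python
def days_until_full(daily_used_gb, capacity_gb):
    """Estimate the days left until `capacity_gb` is reached, using the average
    daily growth between the first and last sample.
    Returns None when usage is flat or shrinking."""
    if len(daily_used_gb) < 2:
        raise ValueError("need at least two samples")
    growth_per_day = (daily_used_gb[-1] - daily_used_gb[0]) / (len(daily_used_gb) - 1)
    if growth_per_day <= 0:
        return None
    return (capacity_gb - daily_used_gb[-1]) / growth_per_day


# Hypothetical disk usage, one sample per day, on a 500 GB volume.
usage = [400, 404, 408, 412, 416]
print(days_until_full(usage, capacity_gb=500))
```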

dataflow

● lambda architectures
○ data backbones: pubsub
○ batch processing
○ hadoop

● cache population, use, and flushing

● search stores

backup and recovery

● this has not changed, we live and die by the safety of our data

● borrow ideas of continuous deployment, for continuous recovery testing

● build backup and recovery into every possible process
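Continuous recovery testing can be sketched as a job that restores each backup into a scratch database and checks it, in the same way a CI pipeline checks each build. The example below uses SQLite from the Python standard library purely as a stand-in; `backup_and_verify`, the `orders` table and the sample rows are hypothetical, and a real check would compare far more than a row count.

```python
import sqlite3


def backup_and_verify(source: sqlite3.Connection, table: str) -> bool:
    """Copy `source` into a fresh scratch database, then verify the copy:
    the integrity check must pass and the row count must match the source."""
    restored = sqlite3.connect(":memory:")
    source.backup(restored)  # SQLite online backup API (Python 3.7+)
    integrity = restored.execute("PRAGMA integrity_check").fetchone()[0]
    # `table` is interpolated directly, so only pass trusted identifiers.
    query = f"SELECT COUNT(*) FROM {table}"
    same_rows = source.execute(query).fetchone() == restored.execute(query).fetchone()
    restored.close()
    return integrity == "ok" and same_rows


# Hypothetical production data, stood in for by an in-memory database.
prod = sqlite3.connect(":memory:")
prod.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
prod.executemany("INSERT INTO orders (total) VALUES (?)", [(9.99,), (24.50,), (3.25,)])
prod.commit()

print(backup_and_verify(prod, "orders"))
```

Run on a schedule, a check like this turns "we have backups" into "we restored one recently and it was usable".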

change management

● business velocity often demands that developers and DBEs work together to assess risk, rather than gatekeep

● where possible, we are pushing changes through code deployments and automated scripting

● immutable architectures will force us to create change at the template layer and redeploy

incident management

● where possible, we are identifying items in need of work before they become problems

● we sometimes share pagers with ops and dev groups, often as tier 2 escalation


● reviewing current workloads and tuning

● managing escalations on DB issues

● continuous improvement

● writing, testing and performing change plans as part of the deployment process
