Database engineering

Database Engineering. Velocity Conference, Santa Clara. Laine Campbell, Chief Unicorn. May 28th, 2015

Transcript of Database engineering

Page 1: Database engineering

Database Engineering

Velocity Conference, Santa Clara

Laine Campbell, Chief Unicorn

May 28th, 2015

Page 2: Database engineering

content

© 2014 Pythian

● reliability and operations engineering - history and context

● design - architecture and objectives

● deploy - delivery pipelines, capacity and automation

● maintain - data stewardship, business continuity and change

Page 3: Database engineering

Who am I?

Page 4: Database engineering

Who am I?

Page 5: Database engineering

● first of a series

● focuses on operations engineering for DBEs

● Laine gives a perfectionist, enterprise focus

● Charity gives a real-world start-up view

● crazy breakdancing

Page 6: Database engineering

O’Reilly doesn’t allow unicorns on their covers

in early release by OSCON Portland

Page 7: Database engineering

me with a jaguar…

Page 8: Database engineering

engineering, not administration

Page 9: Database engineering

engineering

● quantitative

● interdisciplinary

● results focused

● repeatable and code-driven

Page 10: Database engineering

systems engineering

● designing and managing complex systems for complete life-cycle

● translation from business to system, with focus on communications, resiliency, flexibility and interoperability

● ensuring the system generates necessary data for measuring its performance against KPIs

Page 11: Database engineering

operations engineering

● allocate resources to balance competing objectives of cost, service level and productivity

● designing and building processes and infrastructure to maintain peak flexibility and quality to serve the business

● gathering and utilizing data to effectively provide continuous improvement to systems

Page 12: Database engineering

database engineering

● participates in systems and operations engineering processes by providing expertise to facilitate all work

● understands and teaches dataset nuances to ensure all KPIs are met

● anchors teams with expertise for troubleshooting, recovery and other tasks requiring depth, not breadth

Page 13: Database engineering

reliability engineering

● focuses on the glue common to all services, platform-centric

● deployment, efficiency, scale, performance, observability

● often done by systems and operations engineering, rather than being their own function

Page 14: Database engineering

yesterday’s DBA

● gatekeeper

● master builder

● superhero

● siloed

● specialized

Page 15: Database engineering

paradigm shifts

Page 16: Database engineering

virtualization and cloud

● forces horizontal scaling

● forces designing for resilience

● elasticity drives new data storage

● management by API

Page 17: Database engineering

infrastructure as code

● forces standardization

● forces us to learn to code

● we start building platforms

● changes become deployments

Page 18: Database engineering

devops cultures

● lean manufacturing defines our workflows

● tighter feedback loops require organizational shifts

● experimentation and controlled failure shift architecture and process design

● integration drives empathy

Page 19: Database engineering

continuous delivery

● brings us to the source code control paradigm

● we must be teachers, not gatekeepers

● testing and compliance become top priorities

Page 20: Database engineering

polyglot persistence

● relational is not the end of the line

● data must be looked at end to end

● function dictates form

● we cannot predict all uses

Page 21: Database engineering

today’s database engineer

Page 22: Database engineering

specialty = depth

● analytical thinking

● problem solving

● edge case knowledge

● exhibited in at least one system

○ MySQL/Solaris

○ Cassandra/RHEL

○ PostgreSQL/Windows

Page 23: Database engineering

breadth, crossing disciplines

● understands core concepts

● can communicate with experts

● applies knowledge at the intersection with core disciplines

● database engineers need:

○ TCP/IP networking concepts

○ scripting

○ virtualization, automation and orchestration

Page 24: Database engineering

db engineer’s manifesto

● it’s about the mission

● protect the data

● eliminate waste

● data-driven decision making

● databases are not special

● eliminate the barriers between software and ops

Page 25: Database engineering

bringing it to the table

Page 26: Database engineering

design

Page 27: Database engineering

what do we design for?

● mission KPIs

● function, not form

● operational processes and management

Page 28: Database engineering

designing for the business

velocity

efficiency

security

performance

availability

https://derpicdn.net/img/view/2012/8/3/65841__safe_fluttershy_tank_scooter_artist-colon-giantmosquito_tortoise_vespa.jpg

Page 29: Database engineering

velocity

● how fragile is the datastore?

● what features allow for rapid change?

● how rapidly can fallback be performed?

● does it integrate with CD pipelines?

● how rapidly can developers learn to use it?

Page 30: Database engineering

efficiency

● how elastic is the datastore?

● how configurable is resource allocation?

● does the datastore require vendor lock-in?

● can we measure our cost per transaction?

Page 31: Database engineering

security

● is there robust and granular user management?

● is there an audit trail?

● are data and connection encryption supported?

● what is its history of vulnerabilities?

Page 32: Database engineering

performance

● how does the datastore handle concurrency? large data size per node?

● at what points do consistency requirements get violated?

● what are the performance curves and cliffs at scale?

● how tunable and instrumented is the datastore?

Page 33: Database engineering

availability

● what are the single points of failure?

● can you take consistent backups?

● how are network partitions handled, and recovered from?

● are failure handling and rebalancing built in, or do you have to build them?

● how do node, partition and system failures impact consistency?

Page 34: Database engineering

choosing your datastore(s)

“partitions always occur, whether from network outages or overload of components”

“design for performance, consistency, durability and availability, and make tradeoffs along the way”

Page 35: Database engineering

polyglot love

so, how do we classify and choose our datastores?

traditional relational stores (RDBMS):

● E.F. Codd’s 12 rules of relations

○ simplified to tables with key relationships and constraints

● SQL accessible

● ACID levels

○ consistency and isolation may be tunable, or fuzzy

Page 36: Database engineering

distributed DBMS functionality

relational? | strict E.F. Codd | limited | key/value | JSON/BSON

data access language | SQL | mapreduce | SQLesque (e.g. CQL) | custom

atomicity | strict ACID | node only | cluster level | specific ops only

consistency (ACID) | strict ACID (three rules) | distributed (quorum) | eventual, weak (w/ conflict res.) | eventual, strong (e.g. CRDT)

isolation | strict ACID | relaxed isolation

durability | strict ACID | relaxed durability | write repair

availability | one writer:many readers | write:read anywhere | allow lagged read

partition tolerance | shut down whole cluster | shut down minority | read only minority | continue in parallel
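
The "eventual, strong" consistency option (CRDT) in the table above can be illustrated with a grow-only counter. The following is a minimal Python sketch and is not taken from the talk; the class name `GCounter` and the node ids are assumptions made for illustration. Each node increments only its own slot, and replicas merge by taking the per-node maximum, so merges can arrive in any order, or more than once, and still converge.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GCounter:
    """Grow-only counter CRDT: one slot per node, merged by element-wise max."""

    counts: Dict[str, int] = field(default_factory=dict)

    def increment(self, node_id: str, amount: int = 1) -> None:
        # A node only ever raises its own slot; decrements are not allowed.
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[node_id] = self.counts.get(node_id, 0) + amount

    def value(self) -> int:
        # The counter's value is the sum of all per-node slots.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> "GCounter":
        # Element-wise max is commutative, associative and idempotent.
        nodes = self.counts.keys() | other.counts.keys()
        return GCounter(
            {n: max(self.counts.get(n, 0), other.counts.get(n, 0)) for n in nodes}
        )


# Two replicas accept writes independently (hypothetical node ids).
a, b = GCounter(), GCounter()
a.increment("node-a", 3)
b.increment("node-b", 2)
b.increment("node-b", 1)

merged = a.merge(b)
print(merged.value())
```

Because the merge needs no coordination, this style of datastore can keep accepting writes on both sides of a partition, at the cost of supporting only operations that can be expressed this way.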

Page 37: Database engineering

understanding the spectrum

compromise | reason | tunable | example

availability | consistency | boolean | replica out of service

durability | latency | yes | disk flush intervals

partition tolerance | availability | yes | read only minority

isolation | concurrency | yes | dirty reads allowed

consistency | performance | yes | quorums, CRDT
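
The consistency-for-performance row mentions quorums as a tunable. A small Python sketch of the usual quorum arithmetic, assuming a Dynamo-style store with N replicas, a write quorum W and a read quorum R (the function names are hypothetical): reads are guaranteed to overlap the latest acknowledged write only when R + W > N.

```python
def quorum_overlaps(n: int, w: int, r: int) -> bool:
    """True when every read quorum intersects every write quorum (R + W > N)."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    return r + w > n


def tolerated_failures(n: int, w: int, r: int) -> int:
    """Replicas that can be down while both reads and writes still reach quorum."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    return n - max(w, r)


# Compare a few settings for a 3-replica cluster.
for w, r in [(2, 2), (1, 1), (3, 1)]:
    print(w, r, quorum_overlaps(3, w, r), tolerated_failures(3, w, r))
```

Lowering W or R reduces latency because fewer replicas must answer, which is exactly the trade the table describes: W=1, R=1 is fast but gives up the overlap guarantee.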

Page 38: Database engineering

designing for operations

component | patterns | anti-patterns

configuration mgmt | config files, APIs | in-memory changes, binaries

degradation mgmt | read-only modes, queue draining, timeouts, dynamic configuration changes | bad defaults, long timeouts, static configurations

change mgmt | online changes, fast alters, atomic operations, instrumented DDL, data dictionaries | lack of data governance, table and schema level locking, config based DDL

elasticity | auto session failover, pre-load cache, easy node add/removes, auto-registration | cold cache, slow starts, complicated member management, impactful hash redistribution
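
To make the "impactful hash redistribution" anti-pattern concrete, here is a hedged Python sketch comparing naive modulo placement with a consistent-hash ring when a fifth node joins a four-node cluster. None of this comes from the talk: the node names, the key set, the 100 virtual nodes per node and the use of SHA-256 are all assumptions chosen for illustration.

```python
import bisect
import hashlib
from typing import Callable, List, Sequence


def _hash(value: str) -> int:
    """Stable integer hash (Python's built-in hash() is salted per process)."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


def modulo_owner(key: str, nodes: Sequence[str]) -> str:
    """Naive placement: hash modulo the node count."""
    return nodes[_hash(key) % len(nodes)]


class HashRing:
    """Consistent-hash ring with virtual nodes (vnodes) per physical node."""

    def __init__(self, nodes: Sequence[str], vnodes: int = 100) -> None:
        ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points: List[int] = [point for point, _ in ring]
        self._owners: List[str] = [node for _, node in ring]

    def owner(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owners[idx]


def moved_fraction(
    before: Callable[[str], str], after: Callable[[str], str], keys: Sequence[str]
) -> float:
    """Share of keys whose owning node changes between two placements."""
    return sum(1 for k in keys if before(k) != after(k)) / len(keys)


keys = [f"key-{i}" for i in range(10_000)]
old_nodes = ["n1", "n2", "n3", "n4"]
new_nodes = old_nodes + ["n5"]

modulo_moved = moved_fraction(
    lambda k: modulo_owner(k, old_nodes), lambda k: modulo_owner(k, new_nodes), keys
)
old_ring, new_ring = HashRing(old_nodes), HashRing(new_nodes)
ring_moved = moved_fraction(old_ring.owner, new_ring.owner, keys)
print(f"modulo: {modulo_moved:.0%} of keys moved; ring: {ring_moved:.0%}")
```

With modulo placement most keys change owner when the node count changes, while on the ring only the keys that land on the new node move. That difference is what makes node adds and removes cheap or painful in practice.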

Page 39: Database engineering

designing for operations (cont.)

component | patterns | anti-patterns

availability | online backups, incremental backups, auto-failover, auto-registration, read/write anywhere, rate limits | single points of failure, common partitions, cold backups, complicated registration, special snowflakes

instrumentation | good data dictionary, comprehensive status instrumentation, APIs, dynamic logging levels, audit trails, documentation | lack of documentation, lack of API, minimal status management, no plugins for common tooling

security | granular access controls, encryption options, audit trails, LDAP integration, granular privileges, policy enforcement, SSL | no roles, no object level permissions, no encryption, no SSL, no audit/access logging

Page 40: Database engineering

designing for operations (cont.)

component | patterns | anti-patterns

support | extensive documentation, user community, responsive vendor, transparent bug databases, active committers, exception collection, clean source code | the opposites of the patterns

Page 41: Database engineering

disciplines and systems

discipline | concept | systems

operating system concepts | disk, virtual and network I/O | linux, solaris, windows, etc.

operating system concepts | kernel behaviors |

networking | tcp/ip core theory and constructs |

networking | network services | dhcp, dns, load balancing

distributed systems | dynamo based stores | riak, cassandra, voldemort

distributed systems | hadoop and mapreduce | hadoop, hbase, hive

distributed systems | document stores | mongodb, couchbase

distributed systems | shared state engines | zookeeper, consul

Page 42: Database engineering

a day in the life...

● selecting datastores to add to production platform catalogs

● DBMS and feature education for software, systems and operations engineers

● integrating with company-wide services

Page 43: Database engineering

a day in the life...

● validating acceptable configurations

● testing and benchmarking new versions, features and configurations

● documenting and sharing standards

Page 44: Database engineering

deploy

Page 45: Database engineering

deploying infrastructure

● configuration management

● orchestration

● test automation

● self-service to the team

Page 46: Database engineering

defining “done”

● standards and documentation of acceptable use

● integration into operational visibility platforms

● playbooks for operations staff

● repeatable tests, manual and automated, to verify quality

Page 47: Database engineering

deploying software

● agile participation with developers

● using version control for schemas and metadata

● more flexible data model approaches

● teach your engineering teams how to assess risk, performance and impact
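
One way to picture "using version control for schemas and metadata" is a numbered list of migrations that lives in the repository and is applied in order by the deploy pipeline. The sketch below is an assumption-laden illustration, not the speaker's tooling: it uses Python's built-in `sqlite3` module, a hypothetical `schema_version` bookkeeping table, and two made-up migrations.

```python
import sqlite3
from typing import List, Sequence, Tuple

# Hypothetical migrations; in practice these would be files checked into the repo.
MIGRATIONS: List[Tuple[int, str]] = [
    (1, "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)"),
    (2, "ALTER TABLE users ADD COLUMN created_at TEXT"),
]


def apply_migrations(
    conn: sqlite3.Connection, migrations: Sequence[Tuple[int, str]]
) -> List[int]:
    """Apply migrations newer than the recorded version; return those applied."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER PRIMARY KEY)"
    )
    current = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM schema_version"
    ).fetchone()[0]

    applied: List[int] = []
    for version, ddl in sorted(migrations):
        if version <= current:
            continue  # already applied on this database
        with conn:  # commits on success, rolls back an open transaction on error
            conn.execute(ddl)
            conn.execute(
                "INSERT INTO schema_version (version) VALUES (?)", (version,)
            )
        applied.append(version)
    return applied


conn = sqlite3.connect(":memory:")
print(apply_migrations(conn, MIGRATIONS))  # first run applies everything pending
print(apply_migrations(conn, MIGRATIONS))  # second run finds nothing to do
```

Running the same function twice is safe, which is the property that lets schema changes ride along with ordinary code deployments instead of being hand-applied.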

Page 48: Database engineering

disciplines and systems

discipline | concept | systems

config mgmt and orchestration | building playbooks/recipes | chef, puppet, ansible...

config mgmt and orchestration | scripting | python, ruby, etc.

virtualization, cloud | architecture basics | compute, storage, network, etc.

operational visibility | core components, writing checks, etc. | sensu, graphite, ELK, grafana, etc.

operational visibility | statistics, anomaly tests | moving averages, predictive analytics

test infrastructures | testing theory/principles | jenkins, travis, code repo, etc.

test infrastructures | scripting, automation | jenkins, travis, code repo, etc.
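
The table lists moving averages under statistics and anomaly tests. As a rough Python sketch of what such a check could look like (the window size, the three-sigma threshold and the sample latencies are assumptions, not values from the talk), each new sample is compared against the mean and standard deviation of the previous window:

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque, Iterable, List, Tuple


def moving_average_anomalies(
    samples: Iterable[float], window: int = 5, threshold: float = 3.0
) -> List[Tuple[int, float]]:
    """Flag samples further than `threshold` std devs from the trailing mean."""
    recent: Deque[float] = deque(maxlen=window)
    flagged: List[Tuple[int, float]] = []
    for index, value in enumerate(samples):
        if len(recent) == window:
            mu = mean(recent)
            sigma = pstdev(recent)
            # A perfectly flat window has sigma == 0; this sketch skips it
            # rather than flagging every tiny change.
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                flagged.append((index, value))
        recent.append(value)
    return flagged


# Hypothetical query latencies in milliseconds, with one spike.
latencies = [10, 11, 10, 12, 11, 10, 11, 50, 11, 10]
print(moving_average_anomalies(latencies))
```

Note that the spike itself enters the window afterwards and inflates the deviation for the next few samples, so a production check would likely want a more robust baseline than this sketch uses.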

Page 49: Database engineering

disciplines and systems (cont)

discipline | concept | systems

agile software development | requirements, epics, stories | any tool…

agile software development | process | any tool…

software repositories | checking out, coding, integration | github, bitbucket, etc.

Page 50: Database engineering

database engineers: O.G. devops

[diagram labels: DEV, OPS, DBA, DBE]

shared goals, tools and processes

Page 51: Database engineering

a day in the life...

● attending scrums, grooming and planning

● committing DDL and DML to the codebase

● monitoring automated builds and tests, performing manual ones

Page 52: Database engineering

a day in the life… (cont)

● providing new or modified recipes for CM and automation

● pairing with, and teaching, engineers; iterating on the schema

Page 53: Database engineering

maintain

Page 54: Database engineering

maintaining the datastores

● maintaining dataflow

● backup and recovery

● change management

● incident management

● capacity planning

Page 55: Database engineering

dataflow

● lambda architectures

○ data backbones: pubsub

○ batch processing

○ hadoop

● cache population, use, and flushing

● search stores

Page 56: Database engineering

backup and recovery

● this has not changed: we live and die by the safety of our data

● borrow ideas from continuous deployment for continuous recovery testing

● build backup and recovery into every possible process
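
Continuous recovery testing can be as simple as restoring every backup somewhere disposable and checking it before trusting it. The following Python sketch shows the idea using the standard library's `sqlite3` online backup API; the function name, the `orders` table and the row-count check are assumptions for illustration, and a real pipeline would restore to a separate host and run deeper validation.

```python
import sqlite3


def backup_and_verify(source: sqlite3.Connection, table: str) -> bool:
    """Back up `source` into a scratch database and verify the restored copy."""
    restored = sqlite3.connect(":memory:")
    try:
        # Online backup: copies the database while the source stays usable.
        source.backup(restored)

        # Structural check on the restored copy.
        intact = restored.execute("PRAGMA integrity_check").fetchone()[0] == "ok"

        # Content check: `table` is assumed to be a trusted identifier.
        query = f'SELECT COUNT(*) FROM "{table}"'
        same_rows = (
            source.execute(query).fetchone()[0]
            == restored.execute(query).fetchone()[0]
        )
        return intact and same_rows
    finally:
        restored.close()


# Hypothetical source database standing in for production.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
source.executemany(
    "INSERT INTO orders (total) VALUES (?)", [(9.5,), (20.0,), (3.25,)]
)
source.commit()
print(backup_and_verify(source, "orders"))
```

Wiring a check like this into the same scheduler that takes the backups means a failed restore shows up as a failed build, not as a surprise during an incident.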

Page 57: Database engineering

change management

● business velocity often demands that developers and DBEs work together to assess risk, rather than gatekeep

● where possible, we are pushing changes through code deployments and automated scripting

● immutable architectures will force us to create change at the template layer and redeploy

Page 58: Database engineering

incident management

● where possible, we are identifying items in need of work before they become problems

● we sometimes share pagers with ops and dev groups, often as tier 2 escalation

Page 59: Database engineering

a day in the life...

● reviewing current workloads and tuning

● managing escalations on DB issues

● continuous improvement

● writing, testing and performing change plans as part of the deployment process