Database Engineering
Velocity Conference, Santa Clara
Laine Campbell, Chief Unicorn
May 28th, 2015
content
© 2014 Pythian
● reliability and operations engineering - history and context
● design - architecture and objectives
● deploy - delivery pipelines, capacity and automation
● maintain - data stewardship, business continuity and change
Who am I?
● first of a series
● focuses on operations engineering for DBEs
● laine gives perfectionist and enterprise focus
● charity gives real world start-up view
● crazy breakdancing
O’Reilly doesn’t allow unicorns on their covers
in early release by OSCON Portland
me with a jaguar…
engineering, not administration
engineering
● quantitative
● interdisciplinary
● results focused
● repeatable and code-driven
systems engineering
● designing and managing complex systems for complete life-cycle
● translation from business to system, with focus on communications, resiliency, flexibility and interoperability
● ensuring the system generates necessary data for measuring its performance against KPIs
operations engineering
● allocate resources to balance competing objectives of cost, service level and productivity
● designing and building processes and infrastructure to maintain peak flexibility and quality to serve the business
● gathering and utilizing data to effectively provide continuous improvement to systems
database engineering
● participates in systems and operations engineering processes by providing expertise to facilitate all work
● understands and teaches dataset nuances to ensure all KPIs are met
● anchors teams with expertise for troubleshooting, recovery and other tasks requiring depth, not breadth
reliability engineering
● focuses on the glue common to all services, platform-centric
● deployment, efficiency, scale, performance, observability
● often done by systems and operations engineering, rather than being their own function
yesterday’s DBA
● gatekeeper
● master builder
● superhero
● siloed
● specialized
paradigm shifts
virtualization and cloud
● forces horizontal scaling
● forces designing for resilience
● elasticity drives new data storage
● management by API
infrastructure as code
● forces standardization
● forces us to learn to code
● we start building platforms
● changes become deployments
devops cultures
● lean manufacturing defines our workflows
● tighter feedback loops require organizational shifts
● experimentation and controlled failure shift architecture and process design
● integration drives empathy
continuous delivery
● brings us to the source code control paradigm
● we must be teachers, not gatekeepers
● testing and compliance become top priorities
polyglot persistence
● relational is not the end of the line
● data must be looked at end to end
● function dictates form
● we cannot predict all uses
today’s database engineer
specialty = depth
● analytical thinking
● problem solving
● edge case knowledge
● exhibited in at least one system
○ MySQL/Solaris
○ Cassandra/RHEL
○ PostgreSQL/Windows
breadth, crossing disciplines
● understands core concepts
● can communicate with experts
● apply knowledge to intersection with core disciplines
● database engineers need:
○ TCP/IP networking concepts
○ scripting
○ virtualization, automation and orchestration
db engineer’s manifesto
● it's about the mission
● protect the data
● eliminate waste
● data-driven decision making
● databases are not special
● eliminate the barriers between software and ops
bringing it to the table
design
what do we design for?
● mission KPIs
● function, not form
● operational processes and management
designing for the business
velocity
efficiency
security
performance
availability
velocity
● how fragile is the datastore?
● what features allow for rapid change?
● how rapidly can fallback be performed?
● does it integrate with CD pipelines?
● how rapidly can developers learn to use it?
efficiency
● how elastic is the datastore?
● how configurable is resource allocation?
● does the datastore require vendor lock-in?
● can we measure our cost per transaction?
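One way to answer the last question is plain arithmetic: divide what the datastore costs over a period by the transactions it served in that period. The sketch below assumes both numbers are already available from billing and monitoring; the function name and the sample figures are hypothetical.

```python
def cost_per_transaction(period_cost_usd: float, transactions: int) -> float:
    """Return the average datastore cost of a single transaction."""
    if transactions <= 0:
        raise ValueError("transactions must be positive")
    return period_cost_usd / transactions


# Hypothetical month: 12,000 USD of datastore spend, 48 million transactions.
monthly_cost = cost_per_transaction(12_000.0, 48_000_000)
print(f"{monthly_cost:.6f} USD per transaction")
```

Tracked over time, this figure shows whether a scaling or tuning change made the datastore cheaper or more expensive per unit of work.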
security
● is there robust and granular user management?
● is there an audit trail?
● is data and connection encryption supported?
● history of vulnerabilities?
performance
● how does the datastore handle concurrency? large data size per node?
● at what points do consistency requirements get violated?
● what are the performance curves and cliffs at scale?
● how tunable and instrumented is the datastore?
availability
● what are the single points of failure?
● can you take consistent backups?
● how are network partitions handled, and recovered from?
● is failure and rebalancing built in, or do you have to build it?
● how do node, partition and system failures impact consistency?
choosing your datastore(s)
“partitions always occur, whether from network outages or overload of components”
“design for performance, consistency, durability and availability, and make tradeoffs along the way”
polyglot love
so, how do we classify and choose our datastores?
traditional relational stores (RDBMS):
● e.f. codd’s 12 rules of relations
○ simplified to tables with key relationships and constraints
● SQL accessible
● ACID levels
○ consistency and isolation may be tunable, or fuzzy
distributed DBMS functionality
| relational? | strict EF Codd | limited | key/value | JSON/BSON |
| data access language | SQL | mapreduce | SQLesque (e.g. CQL) | custom |
| atomicity | strict ACID | node only | cluster level | specific ops only |
| consistency (ACID) | strict ACID (three rules) | distributed (quorum) | eventual, weak (w/conflict res.) | eventual, strong (e.g. CRDT) |
| isolation | strict ACID | relaxed isolation |
| durability | strict ACID | relaxed durability | write repair |
| availability | one writer:many readers | write:read anywhere | allow lagged read |
| partition tolerance | shut down whole cluster | shut down minority | read only minority | continue in parallel |
understanding the spectrum
| compromise | reason | tunable | example |
| availability | consistency | boolean | replica out of service |
| durability | latency | yes | disk flush intervals |
| partition tolerance | availability | yes | read only minority |
| isolation | concurrency | yes | dirty reads allowed |
| consistency | performance | yes | quorums, CRDT |
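The "quorums" example in the last row can be made concrete. In Dynamo-style stores, a commonly cited rule of thumb is that reads see the latest acknowledged write when the read and write quorums overlap, that is when R + W > N for N replicas. The sketch below only checks that inequality; the function name is hypothetical, and real stores add caveats (sloppy quorums, hinted handoff) that it ignores.

```python
def quorums_overlap(n: int, w: int, r: int) -> bool:
    """True when every read quorum must intersect every write quorum (R + W > N)."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must each be between 1 and n")
    return r + w > n


# Three replicas, with different write/read quorum sizes.
for w, r in [(1, 1), (2, 2), (3, 1)]:
    label = "overlapping quorums" if quorums_overlap(3, w, r) else "stale reads possible"
    print(f"N=3 W={w} R={r}: {label}")
```

Lowering W or R buys latency and availability at the cost of consistency, which is the trade the table describes.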
designing for operations
| component | patterns | anti-patterns |
| configuration mgmt | config files, APIs | in-memory changes, binaries |
| degradation mgmt | read-only modes, queue draining, timeouts, dynamic configuration changes | bad defaults, long timeouts, static configurations |
| change mgmt | online changes, fast alters, atomic operations, instrumented DDL, data dictionaries | lack of data governance, table and schema level locking, config based DDL |
| elasticity | auto session failover, pre-load cache, easy node add/removes, auto-registration | cold cache, slow starts, complicated member management, impactful hash redistribution |
designing for operations (cont.)
| component | patterns | anti-patterns |
| availability | online backups, incremental backups, auto-failover, auto-registration, read/write anywhere, rate limits | single points of failure, common partitions, cold backups, complicated registration, special snowflakes |
| instrumentation | good data dictionary, comprehensive status instrumentation, APIs, dynamic logging levels, audit trails, documentation | lack of documentation, lack of API, minimal status management, no plugins for common tooling |
| security | granular access controls, encryption options, audit trails, ldap integration, granular privileges, policy enforcement, SSL | no roles, no object level permissions, no encryption, no SSL, no audit/access logging |
designing for operations (cont.)
| component | patterns | anti-patterns |
| support | extensive documentation, user community, responsive vendor, transparent bug databases, active committers, exception collection, clean source code | the opposite of each pattern |
disciplines and systems
| discipline | concept | systems |
| operating system concepts | disk, virtual and network I/O | linux, solaris, windows, etc... |
| | kernel behaviors | |
| networking | tcp/ip core theory and constructs | |
| | network services | dhcp, dns, load balancing |
| distributed systems | dynamo based stores | riak, cassandra, voldemort |
| | hadoop and mapreduce | hadoop, hbase, hive |
| | document stores | mongodb, couchbase |
| | shared state engines | zookeeper, consul |
a day in the life...
● selecting datastores to add to production platform catalogs
● dbms and feature education to software, systems and operations engineers
● integrating with company-wide services
a day in the life...
● validating acceptable configurations
● testing and benchmarking new versions, features and configurations
● documenting and sharing standards
deploy
deploying infrastructure
● configuration management
● orchestration
● test automation
● self-service to the team
defining “done”
● standards and documentation of acceptable use
● integration into operational visibility platforms
● playbooks for operations staff
● repeatable tests, manual and automated to verify quality
deploying software
● agile participation with developers
● using version control for schemas and metadata
● more flexible data model approaches
● teach your engineering teams how to assess risk, performance and impact
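Keeping schemas in version control usually means treating each change as a numbered migration that a script applies in order and records as done. The following is a minimal sketch of that idea using SQLite from the Python standard library. The `MIGRATIONS` list, the `schema_version` table and the `migrate` function are all hypothetical names, and a real pipeline would load the statements from files in the repository.

```python
import sqlite3
from typing import List

# Hypothetical migrations; in practice each would be a file under version control.
MIGRATIONS = [
    (1, "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)"),
    (2, "CREATE UNIQUE INDEX users_email_idx ON users (email)"),
]


def migrate(conn: sqlite3.Connection) -> List[int]:
    """Apply migrations not yet recorded in schema_version; return the versions applied."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER PRIMARY KEY)"
    )
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_version")}
    newly_applied = []
    for version, statement in sorted(MIGRATIONS):
        if version in applied:
            continue  # already run against this database
        conn.execute(statement)
        conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
        conn.commit()
        newly_applied.append(version)
    return newly_applied


conn = sqlite3.connect(":memory:")
print(migrate(conn))  # first run applies every migration
print(migrate(conn))  # second run finds nothing left to apply
```

Because the runner is idempotent, the same script can run on every deploy, which is what lets schema changes ride the delivery pipeline alongside application code.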
disciplines and systems
| discipline | concept | systems |
| config mgmt and orchestration | building playbooks/recipes | chef, puppet, ansible... |
| | scripting | python, ruby, etc... |
| virtualization, cloud | architecture basics | compute, storage, network etc... |
| operational visibility | core components, writing checks, etc... | sensu, graphite, ELK, grafana, etc... |
| | statistics, anomaly tests | moving averages, predictive analytics |
| test infrastructures | testing theory/principles | jenkins, travis, code repo etc... |
| | scripting, automation | jenkins, travis, code repo etc... |
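The "moving averages" entry under operational visibility can be illustrated with a small anomaly test: compare each new sample with the mean and standard deviation of the samples just before it. This is a sketch under assumed parameters (a window of 5 and a threshold of 3 standard deviations); the function name and the latency figures are hypothetical.

```python
from collections import deque
from statistics import mean, stdev


def find_anomalies(samples, window=5, threshold=3.0):
    """Return indexes of samples more than `threshold` standard deviations
    away from the mean of the preceding `window` samples."""
    recent = deque(maxlen=window)
    anomalies = []
    for index, value in enumerate(samples):
        if len(recent) == window:
            history = list(recent)
            average = mean(history)
            spread = stdev(history)
            if spread > 0 and abs(value - average) > threshold * spread:
                anomalies.append(index)
        recent.append(value)
    return anomalies


# Hypothetical query latencies in milliseconds, with one spike at index 7.
latencies_ms = [12, 11, 13, 12, 14, 12, 13, 95, 12, 13]
print(find_anomalies(latencies_ms))
```

Note that the spike itself enters the window afterwards and widens the spread for the next few samples, which is one reason production checks often use more robust statistics.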
disciplines and systems (cont)
| discipline | concept | systems |
| agile software development | requirements, epics, stories | any tool… |
| | process | any tool… |
| software repositories | checking out, coding, integration | github, bitbucket etc... |
database engineers: O.G. devops
[diagram labels: DEV, OPS, DBA, DBE]
shared goals, tools and processes
a day in the life...
● attending scrums, grooming and planning
● committing DDL and DML to the codebase
● monitoring automated builds and tests, performing manual ones
a day in the life… (cont)
● providing new or modified recipes for CM and automation
● pairing with, and teaching, engineers while iterating on the schema
maintain
maintaining the datastores
● maintaining dataflow
● backup and recovery
● change management
● incident management
● capacity planning
dataflow
● lambda architectures
○ data backbones: pubsub
○ batch processing
○ hadoop
● cache population, use, and flushing
● search stores
backup and recovery
● this has not changed: we live and die by the safety of our data
● borrow ideas of continuous deployment, for continuous recovery testing
● build backup and recovery into every possible process
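Continuous recovery testing means a backup only counts once it has been restored and checked, automatically and on every run. The sketch below shows the shape of such a check using SQLite's online backup API from the Python standard library. The `backup_and_verify` function, the `orders` table and the two checks chosen (integrity check plus row counts) are assumptions for illustration; a production job would restore to a separate host and run deeper validation.

```python
import sqlite3


def backup_and_verify(source: sqlite3.Connection) -> bool:
    """Copy `source` into a scratch database and confirm the copy is usable."""
    restored = sqlite3.connect(":memory:")
    source.backup(restored)  # online backup from the standard library
    # Check 1: the restored copy passes SQLite's own integrity check.
    integrity = restored.execute("PRAGMA integrity_check").fetchone()[0]
    # Check 2: every table has the same row count in source and copy.
    tables = [
        row[0]
        for row in source.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    ]
    counts_match = all(
        source.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()
        == restored.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()
        for name in tables
    )
    restored.close()
    return integrity == "ok" and counts_match


# Hypothetical production data.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
db.executemany("INSERT INTO orders (total) VALUES (?)", [(9.99,), (24.50,)])
db.commit()
print(backup_and_verify(db))
```

Wired into the same scheduler as the backups themselves, a failing check becomes an alert long before anyone needs the restore for real.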
change management
● business velocity often demands that developers and DBEs work together to assess risk, rather than gatekeep
● where possible, we are pushing changes through code deployments and automated scripting
● immutable architectures will force us to create change at the template layer and redeploy
incident management
● where possible, we are identifying items in need of work before they become problems
● we sometimes share pagers with ops and dev groups, often as tier 2 escalation
a day in the life...
● reviewing current workloads and tuning
● managing escalations on DB issues
● continuous improvement
● writing, testing and performing change plans as part of the deployment process