Honey, I Shrunk the Database
For Test and Development Environments
Postgres Open, September 2011
Vanessa Hurst
Paperless Post
User Data
Why Shrink?
Accuracy
You don’t truly know how your app will behave in production unless you use real data.
Production data is the ultimate in accuracy.
Freshness
New data should be available regularly.
Full database refreshes should be timely.
Resource Limitations
Staging and developer machines cannot handle production load.
Data Protection
Limit spread of sensitive user or client data.
Case Study: Paperless Post
Requirements
  Freshness – Daily, on command for non-developers
  Shrinkage – Slices, Mutations
Resources
  Source – extra disk space, RAM, and CPUs
  Destination – limited, often entirely unoptimized
  Development – constrained DBA resources
Shrink Strategies
Copies
Restored backups or live replicas of entire production database
Slices
Select portions of exact data
Mutations
Sanitized, anonymized, or otherwise changed data
Assumptions
Seed databases, fixtures, test data
Slices
Vertical Slice
  Difficult to obtain a valid, useful subset of data.
  Example: Include some entire tables, exclude others
Horizontal Slice
  Difficult to write and maintain.
  Example: SQL or application code to determine subset of data
PG Tools – Vertical Slice
Flexibility at Source (Production)
pg_dump
  Include data only [-a --data-only]
  Include schema definitions only [-s --schema-only]
  Select tables [-t table --table=table], repeated once per table
  Select schemas [-n schema --schema=schema]
  Exclude schemas [-N schema --exclude-schema=schema]
PG Tools – Vertical Slice
Flexibility at Destination (Staging, Development)
pg_restore
  Include data only [-a --data-only]
  Select indexes [-I index --index=index]
  Tune processing [-j number-of-jobs --jobs=number-of-jobs]
  Select schemas [-n schema --schema=schema]
  Select triggers [-T trigger --trigger=trigger]
  Exclude privileges [-x --no-privileges --no-acl]
Mutations
External Data Protection
  HIPAA Regulations
  PCI Compliance
  API Terms of Use
Internal Data Protection
  Protecting your users’ personal data
  Protecting your users from accidents, e.g. staging emails
  Your Terms of Service
User Data
Case Study: Paperless Post
Composite Slice including
Vertical Slice – All application object schemas
Vertical Slice – Entire tables of static content
Horizontal Slice – Subset of users and their data
Mutation – Changed user email addresses
Vertical Slice – All application object schemas
pg_dump --clean --schema-only --schema public db-01 > slice.sql
Vertical Slice – Entire tables of static content
pg_dump --data-only --schema public -t cards db-01 >> slice.sql
Case Study: Paperless Post
CREATE SCHEMA staging;
Case Study: Paperless Post
Horizontal Slice
  Custom SQL
SELECT * INTO staging.users
FROM users
WHERE EXISTS (subset of users);
  Dynamic relative to full data set or newly created slice
SELECT * INTO staging.stuff
FROM stuff
WHERE EXISTS (stuff per staging.users);
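The two-step pattern above (pick the users first, then pull only the rows that belong to them) can be tried out with a small runnable sketch. It uses Python's built-in sqlite3 module purely as a stand-in for PostgreSQL. The columns, the "admins only" selection rule, and the staging_users / staging_stuff table names are hypothetical. SQLite has neither schemas nor SELECT ... INTO, so the sketch uses CREATE TABLE ... AS instead.

```python
import sqlite3


def build_slice(conn):
    """Copy a subset of users, then only the rows that belong to them."""
    cur = conn.cursor()
    # Step 1: choose the subset of users (hypothetical rule: admins only).
    cur.execute(
        "CREATE TABLE staging_users AS "
        "SELECT * FROM users WHERE is_admin = 1"
    )
    # Step 2: dependent rows, selected relative to the newly created slice
    # rather than the full data set.
    cur.execute(
        "CREATE TABLE staging_stuff AS "
        "SELECT * FROM stuff s WHERE EXISTS "
        "(SELECT 1 FROM staging_users u WHERE u.id = s.user_id)"
    )
    conn.commit()


def demo():
    """Build a tiny in-memory data set, slice it, and return the slice contents."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, is_admin INTEGER);
        CREATE TABLE stuff (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT);
        INSERT INTO users VALUES (1, 'a@example.com', 1), (2, 'b@example.com', 0);
        INSERT INTO stuff VALUES (10, 1, 'kept'), (11, 2, 'dropped');
        """
    )
    build_slice(conn)
    users = conn.execute("SELECT id FROM staging_users ORDER BY id").fetchall()
    stuff = conn.execute("SELECT id, body FROM staging_stuff ORDER BY id").fetchall()
    return users, stuff
```

Selecting the dependent rows against the slice rather than the full table is what keeps the subset referentially consistent: no row in the second table points at a user that was left behind.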
Case Study: Paperless Post
Mutations
  Email Addresses
    Use regular expressions to clean non-admin addresses, e.g. [email protected] => [email protected]
  Cached Data
    Clear cached short link from link-shortening API
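One way the email mutation might look, as a hedged sketch rather than the rule Paperless Post actually used (the slide's own example addresses did not survive extraction). The admin domain, the catch-all mailbox, and the plus-tag format below are all hypothetical. The idea is only that non-admin addresses get rewritten so staging can never mail a real user.

```python
import re

# Hypothetical rule: addresses at this domain count as admin and are kept.
ADMIN_DOMAIN = "example.com"
# Hypothetical catch-all mailbox that receives every rewritten address.
SINK_LOCAL = "staging"
SINK_DOMAIN = "example.org"

# Deliberately loose pattern: one "@" with no whitespace on either side.
_EMAIL = re.compile(r"^([^@\s]+)@([^@\s]+)$")


def clean_email(address):
    """Rewrite a non-admin address so mail sent from staging cannot reach it."""
    match = _EMAIL.match(address)
    if match is None:
        return address  # leave malformed values untouched
    local, domain = match.group(1), match.group(2)
    if domain.lower() == ADMIN_DOMAIN:
        return address  # admin addresses stay usable in staging
    # Fold the original address into a plus-tag so rows remain distinguishable
    # in most cases (two addresses differing only in punctuation can collide).
    tag = re.sub(r"[^A-Za-z0-9]+", "_", f"{local}_{domain}")
    return f"{SINK_LOCAL}+{tag}@{SINK_DOMAIN}"
```

For example, under these assumptions `clean_email("jane.doe@gmail.com")` yields `staging+jane_doe_gmail_com@example.org`, while an address at the admin domain is returned unchanged.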
Case Study: Paperless Post
Horizontal Slice – Subset of users and their data
Mutation – Changed user email addresses
pg_dump --data-only --schema staging db-01 >> slice.sql
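The three dump steps can be strung together by a small script. The sketch below only builds the command lines and does not run them. The function name and parameters are hypothetical, while the pg_dump flags are the ones shown on the slides. Ordering matters: the first command uses `>` to start a fresh file and the rest append with `>>`.

```python
import shlex


def slice_commands(db, static_tables, outfile="slice.sql"):
    """Return the shell command lines that build the composite slice, in order."""
    q = shlex.quote
    cmds = [
        # 1. Object definitions for the application schema; '>' truncates the file.
        f"pg_dump --clean --schema-only --schema public {q(db)} > {q(outfile)}",
    ]
    # 2. Whole tables of static content, one pg_dump call per table.
    for table in static_tables:
        cmds.append(
            f"pg_dump --data-only --schema public -t {q(table)} {q(db)} >> {q(outfile)}"
        )
    # 3. The horizontal slice and mutations held in the staging schema.
    cmds.append(f"pg_dump --data-only --schema staging {q(db)} >> {q(outfile)}")
    return cmds
```

Calling `slice_commands("db-01", ["cards"])` reproduces the three commands from the slides.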
Case Study: Paperless Post
Rebuild
  Prepare new database as standby
  Gracefully close connections
  Rotate by renaming databases
Security
  Dedicated database build user
  Membership in application user role
  Application user role & privileges remain
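The rotate-by-renaming step might be sketched as follows. The slides do not show the actual statements, so this is an assumption-laden illustration: the function and the database names passed to it are hypothetical, and `pg_terminate_backend` is a blunt way to close connections (a graceful setup would more likely pause a connection pooler first). The `pid` column of `pg_stat_activity` is the PostgreSQL 9.2+ name; earlier releases call it `procpid`. PostgreSQL will not rename a database that has open connections, and the statements must be run while connected to a different database.

```python
def rotation_sql(live, standby, retired):
    """Return SQL statements that swap the standby database in for the live one."""

    def ident(name):
        # Double-quote an identifier, doubling any embedded quotes.
        return '"' + name.replace('"', '""') + '"'

    def literal(value):
        # Single-quote a string literal, doubling any embedded quotes.
        return "'" + value.replace("'", "''") + "'"

    return [
        # Close remaining sessions on both databases (assumes PostgreSQL 9.2+).
        "SELECT pg_terminate_backend(pid) FROM pg_stat_activity "
        f"WHERE datname IN ({literal(live)}, {literal(standby)}) "
        "AND pid <> pg_backend_pid();",
        # Move the current database out of the way, then promote the standby.
        f"ALTER DATABASE {ident(live)} RENAME TO {ident(retired)};",
        f"ALTER DATABASE {ident(standby)} RENAME TO {ident(live)};",
    ]
```

Keeping the retired database around under a new name gives a quick rollback path until the next rebuild drops it.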
Case Study: Paperless Post
Rebuild
  $ bzcat slice.sql.bz2 | psql db-new
  Staging schema has not been created, so all data loads to default schema
Case Study: Paperless Post
We hacked our rebuild by importing across schemas!
Now our sequences are wrong, causing duplicate data errors every time we try to insert into tables.
Secret Weapon
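-- Updates all serial sequences for ID columns only.
-- The slide shows only the loop body; the CREATE FUNCTION wrapper and
-- DECLARE block are an assumed reconstruction.
CREATE OR REPLACE FUNCTION update_id_sequences() RETURNS void AS $$
DECLARE
  table_record RECORD;
  table_name text;
BEGIN
  FOR table_record IN
    SELECT pc.relname
    FROM pg_class pc
    WHERE pc.relkind = 'r'
      AND EXISTS (SELECT 1 FROM pg_attribute pa
                  WHERE pa.attname = 'id' AND pa.attrelid = pc.oid)
  LOOP
    table_name := table_record.relname::text;
    -- Set each table's id sequence to its current MAX(id); empty tables are skipped.
    EXECUTE 'SELECT setval(pg_get_serial_sequence('
      || quote_literal(quote_ident(table_name)) || ', ' || quote_literal('id')
      || '), MAX(id)) FROM ' || quote_ident(table_name)
      || ' WHERE EXISTS (SELECT 1 FROM ' || quote_ident(table_name) || ')';
  END LOOP;
END;
$$ LANGUAGE plpgsql;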
Case Study: Paperless Post
Rebuild
  $ bzcat slice.sql.bz2 | psql db-new
  Staging schema has not been created, so all data loads to default schema
  echo "select 1 from update_id_sequences();" >> slice.sql
  Vacuum
  Reindex
Case Study: Paperless Post
Security
  Database build user
    CREATEDB privilege
    Member of application user role
  Application user remains database owner
  Application user privileges remain limited
  Build only works in predetermined environments
Questions?
Postgres Open, September 2011
Vanessa Hurst
Paperless Post
@DBNess
More Tools
Copies -- LVM Snapshots
  See talk by Jon Erdman at PG Conf EU
  Great for all reads
  Data stays virtualized & doesn’t take up space until changed
  Ideal for DDL changes without actual data changes
More Tools
Copies, Slices -- pg_staging by dimitri
http://github.com/dimitri/pg_staging
  Simple -- pauses pgbouncer & restores backup
  Efficient -- leverages bulk loading
  Flexible -- supports varying psql files
  Custom -- limited
Slices -- replicate by rtomayko of GitHub
http://github.com/rtomayko/replicate
  Simple -- preserves object relations via ActiveRecord
  Inefficient -- creates text-based .dump
  Inflexible -- corrupts id sequences on data insert
  Custom -- highly