MySQL 101
-
Upload
jason-nguyen -
Category
Data & Analytics
-
view
46 -
download
0
Transcript of MySQL 101
AGENDA
• Introduction
• Storage engine
• SQL Language
• Data types
• Index
• Normalization
• Guidelines for good schema design
INTRODUCTION
• What’s MySQL?
• Since 1995
• Written in C/C++
• RDBMS (Relational Database Management System)
STORAGE ENGINE
• What is it?
• Stores, handles, and retrieves information from a table
• There are two types of storage engines in MySQL: Transactional and non-Transactional
• MySQL provides support for thirteen difference storage engines.
• Most people who use two storage engines: MyISAM and InnoDB
• Default storage engine for MySQL <= 5.5 was MyISAM and other was InnoDB
• Comparison between InnoDB and MyISAM (ref: http://en.wikipedia.org/wiki/Comparison_of_MySQL_database_engines)
• Choosing the right engine
CHOOSE YOUR NUMERIC DATA TYPES
• Favorites signs of poor design:
• INT(1)
• BIGINT AUTO_INCREMENT
• No UNSIGNED used
• DECIMAL(31,0)
CHOOSE YOUR NUMERIC DATA TYPES
• INT(1) – 1 does not mean 1 digit
• 1 represents client output display format only
• INT is 4 bytes, TINYINT is 1 byte
• TINYINT UNSIGNED can store from 0 – 255
• BIT is even better when values are 0 - 1
CHOOSE YOUR NUMERIC DATA TYPES
• BIGINT is not need for AUTO_INCREMENT
• INT UNSIGNED can store 4.3 billion values
• BIGINT is applicable for some columns
• E.g: Summation of values
CHOOSE YOUR NUMERIC DATA TYPES
• Best practice
• All integer columns UNSIGNED unless there is a reason otherwise
• Adds a level of data integrity for negative values
OTHER DATA TYPE EFFICIENCIES
• TIMESTAMP v DATETIME
• Suitable for EPOCH only values
• TIMESTAMP is 4 bytes
• DATETIME is 8 bytes
OTHER DATA TYPE EFFICIENCIES
• CHAR(n)
• Use VARCHAR(n) for variable values
• E.g. CHAR(128) when storing ~ 10 bytes
APPLICATION DATA TYPE EFFICIENCIES
• Using Codes or ENUM
• A description is a presentation layer function
• e.g. 'M', 'F' instead of 'Male', 'Female’
• e.g. 'A', 'I' instead of 'Active', 'Inactive’
• BINARY(16/20) v CHAR(32/40)
• MD5() or HASH() Hex value with twice the length
• INT UNSIGNED for IPv4 address
• VARCHAR(15) results in average 12 bytes v 4 bytes
NOT NULL
• Saves up to a byte per column per row of data
• Double benefit for indexed columns
• NOT NULL DEFAULT '' is bad design
• Always use NOT NULL unless there is a reason why not
INDEX
• What are indexes for?
• Speed up access for database
• Help to enforce constrains (UNIQUE, FOREIGN KEY)
Queries can ran without any indexes
But it can take a really long time
OVERHEAD OF THE INDEXING
• Indexes are costly; Do not add more than you need
• In most cases extending index is better then adding new one
• Writes: Updating indexes is often major cost of database writes
• Reads: Wasted space on disk and in memory; additional overhead during query optimization
GUIDELINES FOR GOOD SCHEMA DESIGN
• These are only guidelines, not rules, and your code will work if they're not followed. But you'll have an easier path if you adopt them
• DO name your database tables in the singular, not plural
• DON'T prepend db table names to field names
• DON'T include a table prefix in the model class name
• DO name a table's own ID column "id"
• AVOID semantically-meaningful primary key names
• DO define foreign-key relationships in the database schema
• DO name your foreign key fields ending in "id”