Who Moved My Tuple—Columnstore Indexes in SQL Server 2014
Joe D’Antoni, Philadelphia SQL Server Users Group, 25 March 2014
Joe D’Antoni
Joe has over 15 years of experience with a wide variety of data platforms, in both Fortune 50 companies and smaller organizations
He is a frequent speaker on database administration, big data, and career management
He is the co-president of the Philadelphia SQL Server User’s Group
He wants you to make sure you can restore your data
Joedantoni.wordpress.com – Blog, Slides
http://bit.ly/SQLColumnstore -- Slides, Resources
Agenda
Indexes—a basic overview
Columnstore—an introduction
Query Performance—Demo
2012 and 2014—What’s Changing?
2014—Demo
Questions
Indexes
• Data structure that allows us to speed data retrieval by maintaining an extra copy of data
• Can be filtered
• Can be function based, or ordered
• Penalty is that writes become more expensive
• More storage required
Indexes in SQL Server
• Clustered vs. Nonclustered
• Clustered Index—Index Organized Table
• Non-clustered index “just an index”
Clustered Index
• Data is ordered as it is inserted into pages
• Data in a clustered index is stored on disk only once (it is the table data itself)
• A table without a clustered index is called a heap: no order at all
LastName    FirstName  Address          PhoneNumber
Gates       Bill       101 Money Ln     (206)-555-1111
Smith       John       101 Anywhere Rd  (212)-566-1112
Smith       John       181 Uphill Way   (215)-555-2425
Zuckerberg  Mark       1 Hacker Way     (650)-555-9999
Clustered Index Layout
New record to be inserted: Ellison Larry 1 Oracle Way (650)-555-1245
LastName    FirstName  Address          PhoneNumber
Ellison     Larry      1 Oracle Way     (650)-555-1245
Gates       Bill       101 Money Ln     (206)-555-1111
Smith       John       101 Anywhere Rd  (212)-566-1112
Smith       John       181 Uphill Way   (215)-555-2425
Zuckerberg  Mark       1 Hacker Way     (650)-555-9999
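The ordered insert shown in the layout above can be sketched in a few lines of Python. This is only an illustration: a sorted Python list stands in for the index pages, and the whole row tuple is assumed to act as the clustered key. SQL Server's actual B-tree page handling is not modeled.

```python
import bisect

# Rows kept in key order; the full tuple stands in for the clustered key (an assumption).
table = [
    ("Gates", "Bill", "101 Money Ln", "(206)-555-1111"),
    ("Smith", "John", "101 Anywhere Rd", "(212)-566-1112"),
    ("Smith", "John", "181 Uphill Way", "(215)-555-2425"),
    ("Zuckerberg", "Mark", "1 Hacker Way", "(650)-555-9999"),
]

new_row = ("Ellison", "Larry", "1 Oracle Way", "(650)-555-1245")

# Insert at the position that keeps the rows ordered, not at the end.
bisect.insort(table, new_row)

last_names = [row[0] for row in table]
print(last_names)  # ['Ellison', 'Gates', 'Smith', 'Smith', 'Zuckerberg']
```

The new row lands first because "Ellison" sorts before "Gates", which matches the second table in the layout.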
Non-Clustered Index
• Duplicate copy of data from the table
• Provides a pointer from the index to the table data
• The index is ordered by its key, but the table data itself is in no specific order
LastName    FirstName  Address          PhoneNumber
Gates       Bill       101 Money Ln     (206)-555-1111
Smith       John       101 Anywhere Rd  (212)-566-1112
Smith       John       181 Uphill Way   (215)-555-2425
Zuckerberg  Mark       1 Hacker Way     (650)-555-9999
Non-Clustered Index Layout
New record to be inserted: Ellison Larry 1 Oracle Way (650)-555-1245
LastName    FirstName  Address          PhoneNumber
Gates       Bill       101 Money Ln     (206)-555-1111
Smith       John       101 Anywhere Rd  (212)-566-1112
Smith       John       181 Uphill Way   (215)-555-2425
Zuckerberg  Mark       1 Hacker Way     (650)-555-9999
Ellison     Larry      1 Oracle Way     (650)-555-1245
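A rough Python sketch of the layout above, under stated assumptions: the table is a heap (a plain list in arrival order), and the nonclustered index is a separate sorted list of (key, row position) pairs on LastName. The names `heap`, `index`, `insert`, and `lookup` are made up for this illustration and are not SQL Server structures.

```python
import bisect

# Heap: rows stay in arrival order.
heap = [
    ("Gates", "Bill", "101 Money Ln", "(206)-555-1111"),
    ("Smith", "John", "101 Anywhere Rd", "(212)-566-1112"),
    ("Smith", "John", "181 Uphill Way", "(215)-555-2425"),
    ("Zuckerberg", "Mark", "1 Hacker Way", "(650)-555-9999"),
]

# Hypothetical nonclustered index on LastName: (key, position in heap), sorted by key.
index = sorted((row[0], pos) for pos, row in enumerate(heap))

def insert(row):
    heap.append(row)                               # table: new row goes at the end
    bisect.insort(index, (row[0], len(heap) - 1))  # index: extra write, kept in key order

def lookup(last_name):
    # Positions are never negative, so (last_name, -1) sorts before every match.
    start = bisect.bisect_left(index, (last_name, -1))
    hits = []
    for key, pos in index[start:]:
        if key != last_name:
            break
        hits.append(heap[pos])  # follow the pointer back to the table row
    return hits

insert(("Ellison", "Larry", "1 Oracle Way", "(650)-555-1245"))
print(heap[-1][0])           # Ellison
print(len(lookup("Smith")))  # 2
```

Each insert touches both structures, which is the write penalty mentioned in the index overview.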
So Why All This Talk About Indexes?
Data Warehouse Queries
• Data warehouses have a lot of data
• Querying lots of data can take a really long time
• Processing data row by row may not be the most efficient way to perform aggregations
Traditional Approaches To Improving Performance
• Partitioned Tables
• Indexed Views
• Data Compression
LastName    FirstName  Address          PhoneNumber
Ellison     Larry      1 Oracle Way     (650)-555-1245
Gates       Bill       101 Money Ln     (206)-555-1111
Smith       John       101 Anywhere Rd  (212)-566-1112
Smith       John       181 Uphill Way   (215)-555-2425
Zuckerberg  Mark       1 Hacker Way     (650)-555-9999
Compression in SQL Server
Uncompressed Table
LastName    FirstName  Address          PhoneNumber
Ellison     Larry      1 Oracle Way     (650)-555-1245
Gates       Bill       101 Money Ln     (206)-555-1111
Smith       John       101 Anywhere Rd  (212)-566-1112
Smith       John       181 Uphill Way   (215)-555-2425
Zuckerberg  Mark       1 Hacker Way     (650)-555-9999
Row Compressed Table
LastName    FirstName  Address          PhoneNumber
Ellison     Larry      1 ***c** W**     (650)-555-*245
G*t**       B***       *0* M**** **     *2***********
S***h       J***       *** ******** **  *************
*****       ****       *8* Up**** ***   *************
Z********   ****       * ******* ***    *************
Page Compressed Table
Introducing Columnstore Indexes (SQL 2012)
• Data is stored in columns, as opposed to rows
• This allows a much higher rate of compression
• Columns not used in a query are simply not scanned or returned
• Recommended practice is to add most columns in a table to the index
Fn     Ln      AreaCode  Phone     StNum  StName    StType  City         State
A      Disney  661       872-4547  111    Wilson    Dr      Bakersfield  CA
Al     Disney  530       778-3737  222    Main      St      Lewiston     CA
Amy    Disney  209       577-5824  410    Park      Av      Santa Rosa   CA
Anita  Disney  559       642-4472  89     Ahwahnee  St      San Diego    CA
Anita  Disney  209       966-4472  781    Mariposa  Dr      Napa         CA
Ann    Disney  949       830-1883  3      Amato     Ct      Yountville   CA
Original Table
Fn: A, Al, Amy, Anita, Anita, Ann
Ln: Disney, Disney, Disney, Disney, Disney, Disney
AreaCode: 661, 530, 209, 559, 209, 949
Phone: 872-4547, 778-3737, 577-5824, 642-4472, 966-4472, 830-1883
StNum: 111, 222, 410, 89, 781, 3
StName: Wilson, Main, Park, Ahwahnee, Mariposa, Amato
StType: Dr, St, Av, St, Dr, Ct
City: Bakersfield, Lewiston, Santa Rosa, San Diego, Napa, Yountville
State: CA, CA, CA, CA, CA, CA
Split in Columns
Fn: A, *l, *my, *nita, *****, ***
Ln: Disney, ******, ******, ******, ******, ******
AreaCode: 661, 530, 2*9, ***, ***, *4*
Phone: 872-4547, ***-3*3*, ***-****, 6**-****, 9**-****, **0-1***
StNum: 111, 222, 4*0, 89, 7**, 3
StName: Wilson, Ma**, P*rk, *hw***e*, ***i****, ***t*
StType: Dr, St, Av, **, **, C*
City: Bakersfield, L*wi*ton, S**** ****, *** Diego, Napa, Yountville
State: CA, **, **, **, **, **
Columnstore Compressed
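To make the three views above concrete, here is a small Python sketch that pivots a few of the sample rows into columns and applies run-length encoding. Run-length encoding is used purely as an easy-to-read stand-in. It is not presented as SQL Server's actual columnstore encoding, and `run_length_encode` is a name made up for this example.

```python
from itertools import groupby

names = ["Fn", "Ln", "AreaCode", "State"]
rows = [
    ("A", "Disney", "661", "CA"),
    ("Al", "Disney", "530", "CA"),
    ("Amy", "Disney", "209", "CA"),
    ("Anita", "Disney", "559", "CA"),
    ("Anita", "Disney", "209", "CA"),
    ("Ann", "Disney", "949", "CA"),
]

# Pivot: one list per column instead of one tuple per row.
columns = {name: [row[i] for row in rows] for i, name in enumerate(names)}

def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

encoded = {name: run_length_encode(vals) for name, vals in columns.items()}

print(encoded["Ln"])     # [('Disney', 6)]
print(encoded["State"])  # [('CA', 6)]
print(encoded["Fn"])     # [('A', 1), ('Al', 1), ('Amy', 1), ('Anita', 2), ('Ann', 1)]
```

Columns with many repeated values (Ln, State) collapse to a single pair, while a column like AreaCode barely shrinks. A query that only needs State would read only that one list.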
Columnar Data Storage
From Microsoft SIGMOD Paper
So How are Columnstores So Much Faster?
• Very good compression ratio for column-oriented data
• Better use of memory
• Segment elimination skips large chunks of data
• Batch mode
• Processes data in batches of about 1,000 rows rather than row by row
• 7-40x CPU savings with batch mode
“The key to getting the best performance is to make sure your queries process the large majority of data in batch mode.”
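Segment elimination can be sketched as follows, assuming each segment keeps the minimum and maximum of the values it holds so a range filter can skip segments that cannot match. The segment size, the integer date values, and the names `make_segment` and `count_in_range` are illustrative assumptions, not SQL Server internals.

```python
def make_segment(values):
    # Store min/max metadata next to the data (assumed layout for this sketch).
    return {"min": min(values), "max": max(values), "values": values}

# Three tiny segments of order dates encoded as integers (yyyymmdd).
segments = [
    make_segment([20130105, 20130220, 20130331]),
    make_segment([20130402, 20130515, 20130630]),
    make_segment([20130701, 20130812, 20130930]),
]

def count_in_range(segments, low, high):
    """Count values in [low, high], reading only segments that can contain matches."""
    matches = 0
    segments_read = 0
    for seg in segments:
        if seg["max"] < low or seg["min"] > high:
            continue  # eliminated using metadata alone, data never touched
        segments_read += 1
        matches += sum(1 for v in seg["values"] if low <= v <= high)
    return matches, segments_read

result = count_in_range(segments, 20130401, 20130630)
print(result)  # (3, 1)
```

Only one of the three segments is read for the second-quarter filter. The benefit depends on the data being loaded in an order that keeps each segment's value range narrow.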
Columnstore All The Things?
• Awesome performance, so what’s the negative?
• Can’t update/insert in 2012
• Can only be a nonclustered index, so we are storing more data on disk
• Data types are somewhat limited
• One columnstore index per table
• Can’t be a sorted index
Update Process (2012)
Fact Table: Partition 1, Partition 2, Partition 3
Staging Table: data to be loaded
Build columnstore index on the staging table
Partition switch: data moves from staging to the fact table as Partition 4
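The load pattern above can be mimicked in Python as a toy model: load rows into a staging structure, build the column-wise copy once, then attach it to the fact table as a new partition. The dictionaries and the `build_columnstore` helper are hypothetical stand-ins for the flow only, not for SQL Server's partition metadata.

```python
def build_columnstore(rows, column_names):
    """Build a read-only, column-wise copy of the rows (toy stand-in for the index build)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(column_names)}

column_names = ["order_id", "amount"]

# Existing fact table: three partitions, each already column-wise and read-only.
fact_partitions = {
    1: build_columnstore([(1, 10.0), (2, 20.0)], column_names),
    2: build_columnstore([(3, 30.0), (4, 40.0)], column_names),
    3: build_columnstore([(5, 50.0), (6, 60.0)], column_names),
}

# Step 1: new data lands in a separate staging table.
staging_rows = [(7, 70.0), (8, 80.0)]

# Step 2: build the columnstore index on staging while the fact table is untouched.
staging_index = build_columnstore(staging_rows, column_names)

# Step 3: "switch" the staging data in as partition 4 by handing over the
# already-built structure, then leave staging empty.
fact_partitions[4] = staging_index
staging_rows = []

print(sorted(fact_partitions))        # [1, 2, 3, 4]
print(fact_partitions[4]["amount"])   # [70.0, 80.0]
```

Existing partitions are never modified, which is how the 2012 read-only restriction is worked around.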
So Where To Use Columnstore Indexes?
• Only on large tables: fact tables and dimension tables > 3 million rows
• Include every column
• Structure queries as star joins with grouping and aggregation
More details here
Columnstore 2014
Columnstore in 2014
• Fewer Data Type Limitations
• Updateable
• Can be Clustered Index
• New Archival Compression Mode
• Batch Mode Improvements
Columnstore Trickle Updates (2014)
Updates to the index are collected until they reach 1,048,576 (2^20) rows
The tuple mover then moves them into the index
This is the process when loading 102,399 rows or fewer
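A minimal Python sketch of the trickle-insert flow, assuming a row-wise buffer that collects inserts until it fills and a tuple mover that later converts full buffers into column-wise row groups. The class name, its attributes, and the 4-row limit are made up for illustration. The real threshold is far larger, and the real tuple mover runs as a background process.

```python
ROWGROUP_LIMIT = 4  # tiny stand-in so the example stays readable

class ColumnstoreSketch:
    def __init__(self, column_names, limit=ROWGROUP_LIMIT):
        self.column_names = column_names
        self.limit = limit
        self.open_rows = []   # row-wise buffer collecting trickle inserts
        self.closed = []      # full buffers waiting for the tuple mover
        self.compressed = []  # column-wise row groups (the index proper)

    def insert(self, row):
        self.open_rows.append(row)
        if len(self.open_rows) == self.limit:
            # Buffer is full: close it and start a new one.
            self.closed.append(self.open_rows)
            self.open_rows = []

    def tuple_mover(self):
        # Convert every closed buffer into a column-wise row group.
        while self.closed:
            rows = self.closed.pop(0)
            self.compressed.append(
                {name: [r[i] for r in rows] for i, name in enumerate(self.column_names)}
            )

store = ColumnstoreSketch(["id", "name"])
for i in range(10):
    store.insert((i, f"row{i}"))

pending = len(store.closed)   # two full buffers wait for the tuple mover
store.tuple_mover()
print(pending, len(store.compressed), len(store.open_rows))  # 2 2 2
```

After the tuple mover runs, eight rows sit in two column-wise row groups and the last two rows remain in the open buffer until it fills.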
Columnstore Bulk Insert
Columnstore Updates (2014)
• Bulk inserts go through a special API
• Updates are processed as inserts and deletes, so they are an expensive operation
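The "update equals delete plus insert" point can be illustrated with a short Python sketch. It assumes compressed rows are never changed in place: an update marks the old row as deleted and appends the new version to a separate row-wise buffer. The names `deleted`, `delta_rows`, `update`, and `visible_rows` are hypothetical and chosen for this example.

```python
# Compressed rows are treated as immutable in this sketch.
compressed_rows = [(1, "Gates"), (2, "Smith"), (3, "Zuckerberg")]

deleted = set()   # positions of compressed rows marked as deleted
delta_rows = []   # new row versions waiting outside the compressed data

def update(position, new_row):
    deleted.add(position)       # the "delete" half: mark the old row, do not rewrite it
    delta_rows.append(new_row)  # the "insert" half: store the new version separately

def visible_rows():
    # Every read has to filter out deleted rows and merge in the buffered ones.
    live = [row for pos, row in enumerate(compressed_rows) if pos not in deleted]
    return live + delta_rows

update(1, (2, "Smythe"))
print(visible_rows())        # [(1, 'Gates'), (3, 'Zuckerberg'), (2, 'Smythe')]
print(len(compressed_rows))  # 3
```

The old row still occupies space until the data is rebuilt, and each read pays for the extra filtering, which is one way to see why frequent updates are costly.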
Columnstore Compression Effect
[Chart: Columnstore Compression. Series: No CS, Clustered CS, Archival CS. X-axis: 1 to 7. Y-axis: 0 to 300.]
[Chart: Columnstore Archival Compression. Series: Clustered CS, Archival CS. X-axis: 1 to 7. Y-axis: 0 to 80.]
• Average space savings of columnstore versus no compression—69%
• Average space savings of columnstore Archival versus regular columnstore—29%
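As a rough back-of-the-envelope check, the two averages above can be chained to estimate archival compression against no compression at all. This assumes the two percentages compose multiplicatively, which averages taken over different tables need not do exactly, so the result is only indicative.

```python
columnstore_savings = 0.69     # columnstore versus no compression (from the slide)
archival_extra_savings = 0.29  # archival versus regular columnstore (from the slide)

# Fraction of the original size left after each step.
remaining_after_columnstore = 1 - columnstore_savings
remaining_after_archival = remaining_after_columnstore * (1 - archival_extra_savings)

combined_savings = 1 - remaining_after_archival
print(f"{combined_savings:.0%}")  # 78%
```

Under that assumption, archival columnstore would leave roughly 22% of the uncompressed size.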
Columnstore 2014 Demo
What Do We Do Differently in 2014
• Best practices are mostly the same
• Batch mode gets enhanced and gains more query types
• No need to worry about dropping and rebuilding indexes, just append data
• Still focus on large tables where data is not frequently updated
• Archival compression is good for old, unused data
Questions
Contact
Joedantoni.wordpress.com
@jdanton
http://bit.ly/SQLColumnstore -- Slides, Resources