Visualizing database performance hotsos 13-v2

63
Visualizing Database Performance with R Gwen Shapira, Senior Consultant February, 2013

description

 

Transcript of Visualizing database performance hotsos 13-v2

Page 1: Visualizing database performance   hotsos 13-v2

Visualizing Database Performance with R

Gwen Shapira, Senior ConsultantFebruary, 2013

Page 2: Visualizing database performance   hotsos 13-v2

About Me– Oracle ACE Director– Member of Oak Table– 14 years of IT

– Performance Tuning– Troubleshooting– Hadoop

– Presents, Blogs, Tweets

– @gwenshap

© 2013 Pythian2

Page 3: Visualizing database performance   hotsos 13-v2

About Pythian• Recognized Leader:

– Global industry-leader in remote database administration services and consulting for Oracle, Oracle Applications, MySQL and Microsoft SQL Server

– Work with over 250 multinational companies such as Forbes.com, Fox Sports, Nordion and Western Union to help manage their complex IT deployments

• Expertise:

– Pythian’s data experts are the elite in their field. We have the highest concentration of Oracle ACEs on staff—9 including 2 ACE Directors—and 2 Microsoft MVPs.

– Pythian holds 7 Specializations under Oracle Platinum Partner program, including Oracle Exadata, Oracle GoldenGate & Oracle RAC

• Global Reach & Scalability:

– Around the clock global remote support for DBA and consulting, systems administration, special projects or emergency response

© 2013 Pythian3

Page 4: Visualizing database performance   hotsos 13-v2

Will Talk About:

• Data pre-processing tools• Visualization tools and techniques• How to make great looking charts• What makes visuals effective• How to avoid visualization mistakes

Page 5: Visualizing database performance   hotsos 13-v2

Will NOT Talk About:

• How to collect performance data• Cool ASH queries• How to program in R• Statistics• Machine Learning• What the data actually means• How to explain the results to your boss

Page 6: Visualizing database performance   hotsos 13-v2

Why Visualize?

• Yet another analysis tool• But more fun• Highly effective

• Communications tool, too• But not at the same time

© 2013 Pythian6

Page 7: Visualizing database performance   hotsos 13-v2

Reveal Structure in Data

Page 8: Visualizing database performance   hotsos 13-v2

Visualization Tools

Page 9: Visualizing database performance   hotsos 13-v2

R Studio

© 2013 Pythian9

Page 10: Visualizing database performance   hotsos 13-v2

© 2013 Pythian10

Getting Data In Shape

Page 11: Visualizing database performance   hotsos 13-v2

Use the DB, Luke

© 2013 Pythian11

Aggregate

Scale

Filter

Page 12: Visualizing database performance   hotsos 13-v2

Getting DB Data to Rlibrary(RJDBC)drv <-JDBC("oracle.jdbc.driver.OracleDriver",

"/Users/grahn/code/jdbc/ojdbc6.jar")

conn<-dbConnect(drv,

"jdbc:oracle:thin:@zulu.us.oracle.com1521:orcl","grahn","grahn")

# import the data into a data.framelfs <-dbGetQuery(conn,

"select SAMPLE_ID, TIME_WAITED from ashdump where EVENT='log file sync’ order by SAMPLE_ID")

© 2013 Pythian12

Page 13: Visualizing database performance   hotsos 13-v2

With R"NAME","SNAP_TIME","BYTES""free memory",12-03-09 00:00:00,645935368"KGH: NO ACCESS",12-03-09 00:00:00,325214880"db_block_hash_buckets",12-03-09 00:00:00,186650624"free memory",12-03-09 00:00:00,134211304"shared_io_pool",12-03-09 00:00:00,536870912"log_buffer",12-03-09 00:00:00,16924672"buffer_cache",12-03-09 00:00:00,21676163072"fixed_sga",12-03-09 00:00:00,2238472"JOXLE",12-03-10 04:00:01,27349056"free memory",12-03-10 04:00:01,105800192"free memory",12-03-10 04:00:01,192741376"PX msg pool",12-03-10 04:00:01,8192000

© 2013 Pythian13

Page 14: Visualizing database performance   hotsos 13-v2

Reshapeshared_pool <- read.csv(~/shapira/shared_pool.csv")install.packages("reshape")library(reshape)max_shared_pool<-

cast(shared_pool,SNAP_TIME ~ NAME,max)

© 2013 Pythian14

Time free memory log_buffer buffer_cache12-03-09 00:00:00 645935368 16924672 21676163072

12-03-09 04:00:00 192741376

Page 15: Visualizing database performance   hotsos 13-v2

With R

© 2013 Pythian15

out of scale

Page 16: Visualizing database performance   hotsos 13-v2

Select Subset of data

max_shared_pool <- subset(max_shared_pool, select = -c(buffer_cache))

boxplot((max_shared_pool)/1024/1024,xlab="Size in MBytes",horizontal=TRUE,las=1,par(mar=c(4,6,2,1))

)© 2013 Pythian16

Page 17: Visualizing database performance   hotsos 13-v2

With R

© 2013 Pythian17

Page 18: Visualizing database performance   hotsos 13-v2

More SubsetsSAMPLE_ID TIME_WAITED WAIT_CLASS EVENT

10526629 14929 User I/O cell single block physical read

10526629 5015 User I/O cell single block physical read

10465699 21572 Concurrency library cache: mutex X

10465699 65938 Concurrency library cache: mutex X

© 2013 Pythian18

Page 19: Visualizing database performance   hotsos 13-v2

new <- subset (old, row filter, column filter)

phys_io <- subset(ash, WAIT_CLASS ==

“User I/O”, select = -

c(EVENT))

Filtering Data

© 2013 Pythian19

SAMPLE_ID TIME_WAITED WAIT_CLASS

10526629 14929 User I/O

10526629 5015 User I/O

Page 20: Visualizing database performance   hotsos 13-v2

Another Filtering Syntax

© 2013 Pythian20

short_waits <- subset(ash, ash$TIME_WAITED < 10000)

short_waits <- ash[ash$TIME_WAITED < 10000,]

SAMPLE_ID TIME_WAITED WAIT_CLASS EVENT

10526629 5015 User I/O cell single block physical read

Not a Typo!

Page 21: Visualizing database performance   hotsos 13-v2

Summarize with DDPLYinstall.packages(”plyr")library(plyr)

ash2 <- ddply(ash, ”SAMPLE_ID”, summarise,N=length(TIME_WAITED),

mean=mean(TIME_WAITED),max=max(TIME_WAITED));

© 2013 Pythian21

SAMPLE_ID N MEAN MAX

10526629 2 9972 14929

10465699 2 43755 65938

Page 22: Visualizing database performance   hotsos 13-v2

Cheating for DBAs

© 2013 Pythian22

library(sqldf)

ash2 = sqldf('select SAMPLE_ID, count(*) N, mean(TIME_WAITED), max(TIME_WAITED)from ash where WAIT_CLASS=“User I/O”group by SAMPLE_ID')

Page 23: Visualizing database performance   hotsos 13-v2

When all else fails

Text is text.Frits Hoogland converts 10046 trace to CSV for R with SED:

s/^\(WAIT\)\ #\([0-9]*\):\ nam='\(.*\)'\ ela=\ *\([0-9]*\)\ [0-9a-z\ #|]*=\([0-9]*\)\ [0-9a-z\ #|]*=\([0-9]*\)\ [0-9a-z\ #|]*=\([0-9]*\)\ obj#=\([0-9\-]*\)\ tim=\([0-9]*\)$/\1|\2|\3|\4|\5|\6|\7|\8|\9/

© 2013 Pythian23

Page 24: Visualizing database performance   hotsos 13-v2

Exploring Data

© 2013 Pythian24

Page 25: Visualizing database performance   hotsos 13-v2

Directions to Explore

• Shape of data• Correlations• Changes over time

© 2013 Pythian25

Page 26: Visualizing database performance   hotsos 13-v2

The Goal of Analysis is a Story

• Who • What• When• Where• Why • Why• Why• Why• Why

© 2013 Pythian26

Page 27: Visualizing database performance   hotsos 13-v2

Boxplot

• Initial step• Identify outliers• Compare groups• Summarize

© 2013 Pythian27

Fail?

75% of exports take

less than 600m

Page 28: Visualizing database performance   hotsos 13-v2

For Example:

© 2013 Pythian28

WHAT?

Page 29: Visualizing database performance   hotsos 13-v2

How its done?

ash <- read.csv('~/Downloads/ash1.csv')

boxplot(ash$TIME_WAITED/1000000 ~ ash$WAIT_CLASS, xlab="Wait Class",ylab="Time Waited (s)",cex.axis=1.2)

© 2013 Pythian29

Page 30: Visualizing database performance   hotsos 13-v2

Scatter Plot

• Incredibly versatile• Use to:

– Show changes over time– Show correlations– Highlight trends– Find model– Pretty much everything

© 2013 Pythian30

Page 31: Visualizing database performance   hotsos 13-v2

© 2013 Pythian31

WHAT?

Page 32: Visualizing database performance   hotsos 13-v2

Log Data

© 2013 Pythian32

Page 33: Visualizing database performance   hotsos 13-v2

How its done?

install.packages("ggplot2")library(ggplot2)ggplot(ash,

aes(SAMPLE_ID,TIME_WAITED, color=factor(WAIT_CLASS)))+geom_point();ggplot(ash,

aes(SAMPLE_ID,log(TIME_WAITED), color=factor(WAIT_CLASS)))

+geom_point();

© 2013 Pythian33

Page 34: Visualizing database performance   hotsos 13-v2

Only ”Small Waits”

© 2013 Pythian34

500us Physical

IO?

Page 35: Visualizing database performance   hotsos 13-v2

Filtering

small_waits <- ash[ash$TIME_WAITED<15000,]

ggplot(small_waits,aes(SAMPLE_ID,TIME_WAITED,color=factor(WAIT_CLASS))) + geom_point()

© 2013 Pythian35

Page 36: Visualizing database performance   hotsos 13-v2

Smoothing

© 2013 Pythian36

Page 37: Visualizing database performance   hotsos 13-v2

Smoothing

ggplot(ash,aes(SAMPLE_ID,TIME_WAITED/1000000,color=factor(WAIT_CLASS))) + geom_smooth()

ggplot(ash,aes(SAMPLE_ID,TIME_WAITED/1000000,color=factor(WAIT_CLASS))) + geom_point() + geom_smooth()

© 2013 Pythian37

Page 38: Visualizing database performance   hotsos 13-v2

Data over Time

© 2013 Pythian38

11gR2!

Page 39: Visualizing database performance   hotsos 13-v2

Finding Correlation

© 2013 Pythian39

Page 40: Visualizing database performance   hotsos 13-v2

© 2013 Pythian40

Regression (is not Causation)

Page 41: Visualizing database performance   hotsos 13-v2

How?concurr2 <- ddply(concurr,.(SAMPLE_ID), summarise,

N=length(TIME_WAITED), mean=mean(TIME_WAITED),

max=max(TIME_WAITED));

ggplot(concurr2,aes(N,max/1000000))+geom_point()+geom_smooth(method=lm)+xlab("Number of Samples")+ylab("Max Time Waited (s)")

© 2013 Pythian41

Page 42: Visualizing database performance   hotsos 13-v2

Heatmap

• Values as “blocks” in a matrix

• Clearer than scatter plot for large amounts of data

• Shows less information

• Performance data made sexy

© 2013 Pythian42

Page 43: Visualizing database performance   hotsos 13-v2

Heatmap

© 2013 Pythian43

Page 44: Visualizing database performance   hotsos 13-v2

How?ash2 <- ddply(concurr,.(SAMPLE_ID), summarise,N=length(TIME_WAITED), mean=mean(TIME_WAITED),

max=max(TIME_WAITED))ash2 <- ash2[ash2$WAIT_CLASS %in% c("Concurrency","User I/O","Other"),]

ggplot(ash2, aes(SAMPLE_ID, WAIT_CLASS)) + geom_tile(aes(fill = log(N))) + scale_fill_gradient(low = ”green”, high = ”red")

© 2013 Pythian44

Page 45: Visualizing database performance   hotsos 13-v2

Presenting Your Data

© 2013 Pythian45

Page 46: Visualizing database performance   hotsos 13-v2

FACT

“Even irrelevant neuroscience information in an explanation of a psychological phenomenon may interfere with people’s abilities to critically consider the underlying logic of this explanation.”

© 2013 Pythian46

Page 47: Visualizing database performance   hotsos 13-v2

47

Numerical quantities focus on expected values –

graphical summaries on unexpected values

--John Tukey

Page 48: Visualizing database performance   hotsos 13-v2

Our goal is an interesting presentation.

What is “Interesting”?

• Surprise• Beauty• Stories• Visuals• Counterintuitive• Variety

© 2013 Pythian48

Page 49: Visualizing database performance   hotsos 13-v2

Bad Visualizations Lie

1. Omit important data2. Distort data3. Misleading 4. Confusing 5. Fake correlations and Bad models

© 2013 Pythian49

Page 50: Visualizing database performance   hotsos 13-v2

Bad vs. Good Visuals

© 2013 Pythian50

Page 51: Visualizing database performance   hotsos 13-v2

Eye-API

• Good:– distances– locations– length– high contrast

• Bad:– shades– relative area– angles

© 2013 Pythian51

Page 52: Visualizing database performance   hotsos 13-v2

Good or Bad?

© 2013 Pythian52

Page 53: Visualizing database performance   hotsos 13-v2

© 2013 Pythian53

Page 54: Visualizing database performance   hotsos 13-v2

#1 Mistake – Throw a line on Data

© 2013 Pythian54

Page 55: Visualizing database performance   hotsos 13-v2

© 2013 Pythian55

Page 56: Visualizing database performance   hotsos 13-v2

Avoid Pie Charts

© 2013 Pythian56

Page 57: Visualizing database performance   hotsos 13-v2

Infographics always have Pie Charts

© 2013 Pythian57

Page 58: Visualizing database performance   hotsos 13-v2

Which is better?

© 2013 Pythian58

Page 59: Visualizing database performance   hotsos 13-v2

Creativity is Allowed

© 2013 Pythian59

Page 60: Visualizing database performance   hotsos 13-v2

Make it Beautiful – for Geeks

• Contrast• Reduce noise• Few colors• Few fonts• Lots of Data• More Signal• Less Noise

© 2013 Pythian60

Page 61: Visualizing database performance   hotsos 13-v2

IMPORTant R Libraries

• reshape• plyr• ggplot2• sqldf• http://blog.revolutionanalytics.com/

2013/02/10-r-packages-every-data-scientist-should-know-about.html

© 2013 Pythian61

Page 62: Visualizing database performance   hotsos 13-v2

Other Visualization Tools

• R + R Studio• Excel• Gephi• JIT, D3.js• Excel• ggobi

© 2013 Pythian62

Page 63: Visualizing database performance   hotsos 13-v2

Thank you – Q&A

To contact us

[email protected]

1-877-PYTHIAN

To follow us

http://www.pythian.com/blog

http://www.facebook.com/pages/The-Pythian-Group/163902527671

@pythian

http://www.linkedin.com/company/pythian

© 2013 Pythian63