Visualizing database performance hotsos 13-v2
Visualizing Database Performance with R
Gwen Shapira, Senior Consultant
February 2013
About Me
– Oracle ACE Director
– Member of Oak Table
– 14 years of IT
– Performance Tuning
– Troubleshooting
– Hadoop
– Presents, Blogs, Tweets
– @gwenshap
© 2013 Pythian
About Pythian
• Recognized Leader:
– Global industry-leader in remote database administration services and consulting for Oracle, Oracle Applications, MySQL and Microsoft SQL Server
– Work with over 250 multinational companies such as Forbes.com, Fox Sports, Nordion and Western Union to help manage their complex IT deployments
• Expertise:
– Pythian’s data experts are the elite in their field. We have the highest concentration of Oracle ACEs on staff—9 including 2 ACE Directors—and 2 Microsoft MVPs.
– Pythian holds 7 Specializations under the Oracle Platinum Partner program, including Oracle Exadata, Oracle GoldenGate & Oracle RAC
• Global Reach & Scalability:
– Around-the-clock global remote support for DBA and consulting, systems administration, special projects or emergency response
Will Talk About:
• Data pre-processing tools
• Visualization tools and techniques
• How to make great-looking charts
• What makes visuals effective
• How to avoid visualization mistakes
Will NOT Talk About:
• How to collect performance data
• Cool ASH queries
• How to program in R
• Statistics
• Machine Learning
• What the data actually means
• How to explain the results to your boss
Why Visualize?
• Yet another analysis tool
• But more fun
• Highly effective
• Communications tool, too
• But not at the same time
Reveal Structure in Data
Visualization Tools
R Studio
Getting Data In Shape
Use the DB, Luke
Aggregate
Scale
Filter
Getting DB Data to R
library(RJDBC)
# load the Oracle JDBC driver from a local ojdbc6.jar
drv <- JDBC("oracle.jdbc.driver.OracleDriver",
            "/Users/grahn/code/jdbc/ojdbc6.jar")
conn <- dbConnect(drv,
                  "jdbc:oracle:thin:@zulu.us.oracle.com:1521:orcl", "grahn", "grahn")
# import the data into a data.frame
lfs <- dbGetQuery(conn,
  "select SAMPLE_ID, TIME_WAITED from ashdump where EVENT='log file sync' order by SAMPLE_ID")
dbDisconnect(conn)
With R
"NAME","SNAP_TIME","BYTES"
"free memory",12-03-09 00:00:00,645935368
"KGH: NO ACCESS",12-03-09 00:00:00,325214880
"db_block_hash_buckets",12-03-09 00:00:00,186650624
"free memory",12-03-09 00:00:00,134211304
"shared_io_pool",12-03-09 00:00:00,536870912
"log_buffer",12-03-09 00:00:00,16924672
"buffer_cache",12-03-09 00:00:00,21676163072
"fixed_sga",12-03-09 00:00:00,2238472
"JOXLE",12-03-10 04:00:01,27349056
"free memory",12-03-10 04:00:01,105800192
"free memory",12-03-10 04:00:01,192741376
"PX msg pool",12-03-10 04:00:01,8192000
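A minimal base-R sketch of loading rows like these, assuming the query output was saved as CSV; here the sample rows are inlined with `read.csv(text = ...)` in place of a real file path. One detail worth noticing: buffer_cache (21676163072 bytes) is larger than R's 32-bit integer limit, so `read.csv()` reads the whole BYTES column as double, which is what you want before dividing into MB.

```r
# Sample rows from the slide above; for a real export use read.csv("path/to/file.csv")
csv_text <- '"NAME","SNAP_TIME","BYTES"
"free memory",12-03-09 00:00:00,645935368
"KGH: NO ACCESS",12-03-09 00:00:00,325214880
"db_block_hash_buckets",12-03-09 00:00:00,186650624
"free memory",12-03-09 00:00:00,134211304
"shared_io_pool",12-03-09 00:00:00,536870912
"log_buffer",12-03-09 00:00:00,16924672
"buffer_cache",12-03-09 00:00:00,21676163072
"fixed_sga",12-03-09 00:00:00,2238472
"JOXLE",12-03-10 04:00:01,27349056
"free memory",12-03-10 04:00:01,105800192
"free memory",12-03-10 04:00:01,192741376
"PX msg pool",12-03-10 04:00:01,8192000'

shared_pool <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# BYTES holds a value above .Machine$integer.max, so the column is double, not integer
str(shared_pool)
```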
Reshape
shared_pool <- read.csv("~/shapira/shared_pool.csv")
install.packages("reshape")
library(reshape)
# one row per SNAP_TIME, one column per NAME, max(BYTES) in each cell
max_shared_pool <- cast(shared_pool, SNAP_TIME ~ NAME, max)
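If the reshape package isn't available, base R can produce the same wide shape. This is a sketch of an alternative to `cast()`, not what the slide uses: `tapply()` over the sample rows listed earlier gives one row per SNAP_TIME, one column per NAME, and max(BYTES) in each cell, with NA where a component has no row in that snapshot. Its first row reproduces the free memory, log_buffer and buffer_cache values in the next table.

```r
# Sample rows from shared_pool.csv, as listed on the earlier slide
shared_pool <- data.frame(
  NAME = c("free memory", "KGH: NO ACCESS", "db_block_hash_buckets", "free memory",
           "shared_io_pool", "log_buffer", "buffer_cache", "fixed_sga",
           "JOXLE", "free memory", "free memory", "PX msg pool"),
  SNAP_TIME = rep(c("12-03-09 00:00:00", "12-03-10 04:00:01"), times = c(8, 4)),
  BYTES = c(645935368, 325214880, 186650624, 134211304, 536870912, 16924672,
            21676163072, 2238472, 27349056, 105800192, 192741376, 8192000),
  stringsAsFactors = FALSE
)

# Rows = SNAP_TIME, columns = NAME, cells = max(BYTES); empty combinations become NA
max_shared_pool <- tapply(shared_pool$BYTES,
                          list(shared_pool$SNAP_TIME, shared_pool$NAME), max)

max_shared_pool[, c("free memory", "log_buffer", "buffer_cache")]
```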
Time               free memory  log_buffer  buffer_cache
12-03-09 00:00:00  645935368    16924672    21676163072
12-03-09 04:00:00  192741376
With R
buffer_cache is out of scale
Select Subset of data
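# drop buffer_cache (out of scale) and the non-numeric SNAP_TIME column
max_shared_pool <- subset(max_shared_pool, select = -c(SNAP_TIME, buffer_cache))
# widen the left margin so the component names fit beside the horizontal boxes
par(mar = c(4, 6, 2, 1))
boxplot(max_shared_pool / 1024 / 1024, xlab = "Size in MBytes", horizontal = TRUE, las = 1)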
With R
More Subsets
SAMPLE_ID  TIME_WAITED  WAIT_CLASS   EVENT
10526629   14929        User I/O     cell single block physical read
10526629   5015         User I/O     cell single block physical read
10465699   21572        Concurrency  library cache: mutex X
10465699   65938        Concurrency  library cache: mutex X
Filtering Data
# general form: new <- subset(old, <row filter>, select = <column filter>)
phys_io <- subset(ash, WAIT_CLASS == "User I/O", select = -c(EVENT))
SAMPLE_ID  TIME_WAITED  WAIT_CLASS
10526629   14929        User I/O
10526629   5015         User I/O
Another Filtering Syntax
short_waits <- subset(ash, ash$TIME_WAITED < 10000)
short_waits <- ash[ash$TIME_WAITED < 10000,]
SAMPLE_ID  TIME_WAITED  WAIT_CLASS  EVENT
10526629   5015         User I/O    cell single block physical read
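Not a Typo! The trailing comma in ash[ash$TIME_WAITED < 10000,] leaves the column index empty, which selects all columns.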
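Both filtering styles can be tried on the four sample ASH rows from the "More Subsets" slide. A minimal base-R sketch: `subset()` takes a row condition plus an optional `select` column filter, while the bracket form takes `[rows, columns]`, and with an empty column slot the two give the same result.

```r
# The four sample ASH rows from the "More Subsets" slide
ash <- data.frame(
  SAMPLE_ID   = c(10526629, 10526629, 10465699, 10465699),
  TIME_WAITED = c(14929, 5015, 21572, 65938),
  WAIT_CLASS  = c("User I/O", "User I/O", "Concurrency", "Concurrency"),
  EVENT       = c("cell single block physical read", "cell single block physical read",
                  "library cache: mutex X", "library cache: mutex X"),
  stringsAsFactors = FALSE
)

# subset(): row filter on WAIT_CLASS, column filter drops EVENT
phys_io <- subset(ash, WAIT_CLASS == "User I/O", select = -c(EVENT))

# Bracket form: [rows, columns]; the empty column slot keeps every column
short_waits <- ash[ash$TIME_WAITED < 10000, ]

# Same row filter through subset(), which looks up column names inside ash
short_waits2 <- subset(ash, TIME_WAITED < 10000)
```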
Summarize with ddply
install.packages("plyr")
library(plyr)
ash2 <- ddply(ash, "SAMPLE_ID", summarise, N = length(TIME_WAITED),
              mean = mean(TIME_WAITED), max = max(TIME_WAITED))
SAMPLE_ID  N  MEAN   MAX
10526629   2  9972   14929
10465699   2  43755  65938
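For comparison, a base-R sketch of the same per-sample summary using `aggregate()` in place of plyr's `ddply()` (a swapped-in function, not the slide's). Run on the four sample rows it reproduces the N, mean and max columns in the table above; for example 9972 = (14929 + 5015) / 2.

```r
# TIME_WAITED values from the four sample ASH rows
ash <- data.frame(
  SAMPLE_ID   = c(10526629, 10526629, 10465699, 10465699),
  TIME_WAITED = c(14929, 5015, 21572, 65938)
)

# One row per SAMPLE_ID; FUN returns three named statistics for each group
ash2 <- aggregate(TIME_WAITED ~ SAMPLE_ID, data = ash,
                  FUN = function(x) c(N = length(x), mean = mean(x), max = max(x)))

# aggregate() packs the statistics into a single matrix column; flatten and rename
ash2 <- do.call(data.frame, ash2)
names(ash2) <- c("SAMPLE_ID", "N", "mean", "max")
ash2
```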
Cheating for DBAs
library(sqldf)
ash2 <- sqldf("select SAMPLE_ID, count(*) as N, avg(TIME_WAITED) as mean, max(TIME_WAITED) as max
               from ash where WAIT_CLASS = 'User I/O' group by SAMPLE_ID")
When all else fails
Text is text. Frits Hoogland converts 10046 trace files into delimited text for R with sed:
s/^\(WAIT\)\ #\([0-9]*\):\ nam='\(.*\)'\ ela=\ *\([0-9]*\)\ [0-9a-z\ #|]*=\([0-9]*\)\ [0-9a-z\ #|]*=\([0-9]*\)\ [0-9a-z\ #|]*=\([0-9]*\)\ obj#=\([0-9\-]*\)\ tim=\([0-9]*\)$/\1|\2|\3|\4|\5|\6|\7|\8|\9/
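The same regular expression also works inside R with `sub()`, which skips the round trip through sed. A minimal sketch: the three trace lines below are hypothetical, written in the `WAIT #cursor: nam='...' ela= ... obj#=... tim=...` layout the expression expects, and the P1/P2/P3 column names are labels chosen here. Lines that don't match, such as the PARSE line, are filtered out first.

```r
# Hypothetical 10046 trace lines (not taken from a real trace file)
trace <- c(
  "PARSE #140:c=0,e=25,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=0,tim=1360000000000000",
  "WAIT #140: nam='db file sequential read' ela= 513 file#=4 block#=1234 blocks=1 obj#=75000 tim=1360000000000513",
  "WAIT #140: nam='log file sync' ela= 2044 buffer#=1204 sync scn=3398752 p3=0 obj#=-1 tim=1360000000002600"
)

# The sed expression translated to an extended regex (no backslash-escaped groups)
pat <- "^(WAIT) #([0-9]*): nam='(.*)' ela= *([0-9]*) [0-9a-z #|]*=([0-9]*) [0-9a-z #|]*=([0-9]*) [0-9a-z #|]*=([0-9]*) obj#=([0-9-]*) tim=([0-9]*)$"

# Keep only WAIT lines, then emit the nine captured fields joined with "|"
waits <- trace[grepl(pat, trace)]
pipe_rows <- sub(pat, "\\1|\\2|\\3|\\4|\\5|\\6|\\7|\\8|\\9", waits)

# Read the pipe-delimited rows straight into a data.frame
wait_df <- read.table(text = pipe_rows, sep = "|", quote = "", comment.char = "",
                      stringsAsFactors = FALSE,
                      col.names = c("TYPE", "CURSOR", "EVENT", "ELA",
                                    "P1", "P2", "P3", "OBJ", "TIM"))
wait_df
```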
Exploring Data
Directions to Explore
• Shape of data
• Correlations
• Changes over time
The Goal of Analysis is a Story
• Who
• What
• When
• Where
• Why
• Why
• Why
• Why
• Why
Boxplot
• Initial step
• Identify outliers
• Compare groups
• Summarize
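R's `boxplot()` draws each box from `boxplot.stats()`, and calling that function directly gives the numbers behind the picture: the lower and upper hinges (roughly the 25th and 75th percentiles, which is where a reading like "75% take less than X" comes from), the median, the whisker ends, and the points flagged as outliers because they lie more than 1.5 box-heights beyond the box (the default `coef = 1.5`). The wait times below are made up for illustration.

```r
# Hypothetical wait times in microseconds: six ~0.5 ms waits and one ~15 ms wait
waits <- c(480, 495, 505, 510, 520, 530, 14929)

b <- boxplot.stats(waits)
b$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
b$out    # values beyond 1.5 * (upper hinge - lower hinge) from the box
```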
Fail?
75% of exports take less than 600m
For Example:
WHAT?
How is it done?
ash <- read.csv('~/Downloads/ash1.csv')
boxplot(ash$TIME_WAITED/1000000 ~ ash$WAIT_CLASS, xlab="Wait Class",ylab="Time Waited (s)",cex.axis=1.2)
Scatter Plot
• Incredibly versatile
• Use to:
– Show changes over time
– Show correlations
– Highlight trends
– Find a model
– Pretty much everything
WHAT?
Log Data
How is it done?
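install.packages("ggplot2")
library(ggplot2)
# raw wait times: a few huge waits flatten everything else
ggplot(ash, aes(SAMPLE_ID, TIME_WAITED, color = factor(WAIT_CLASS))) + geom_point()
# log scale spreads out the small waits
ggplot(ash, aes(SAMPLE_ID, log(TIME_WAITED), color = factor(WAIT_CLASS))) + geom_point()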
Only "Small Waits"
500us Physical IO?
Filtering
small_waits <- ash[ash$TIME_WAITED<15000,]
ggplot(small_waits,aes(SAMPLE_ID,TIME_WAITED,color=factor(WAIT_CLASS))) + geom_point()
Smoothing
Smoothing
ggplot(ash,aes(SAMPLE_ID,TIME_WAITED/1000000,color=factor(WAIT_CLASS))) + geom_smooth()
ggplot(ash,aes(SAMPLE_ID,TIME_WAITED/1000000,color=factor(WAIT_CLASS))) + geom_point() + geom_smooth()
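In ggplot2, `geom_smooth()` with no `method` argument fits a local regression with `stats::loess()` when the largest group has fewer than 1,000 points (and a GAM otherwise). A rough base-R sketch of what that curve is, on synthetic data rather than real ASH samples: `loess()` fits a weighted regression around each point, so the fitted curve follows the trend while ignoring the sample-to-sample jitter.

```r
set.seed(42)

# Synthetic data: a slow upward trend in wait time plus random noise
sample_id <- 1:200
time_waited <- 1000 + 5 * sample_id + rnorm(200, sd = 300)

# Local regression with the default span (0.75)
fit <- loess(time_waited ~ sample_id)
smoothed <- predict(fit)

# The smoothed curve varies much less than the raw points
c(raw_sd = sd(time_waited), smooth_sd = sd(smoothed))
```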
Data over Time
11gR2!
Finding Correlation
Regression (is not Causation)
How?
# concurr is assumed to hold only the Concurrency rows, e.g. subset(ash, WAIT_CLASS == "Concurrency")
concurr2 <- ddply(concurr, .(SAMPLE_ID), summarise,
                  N = length(TIME_WAITED), mean = mean(TIME_WAITED),
                  max = max(TIME_WAITED))
ggplot(concurr2, aes(N, max / 1000000)) + geom_point() + geom_smooth(method = lm) +
  xlab("Number of Samples") + ylab("Max Time Waited (s)")
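`geom_smooth(method = lm)` draws an ordinary least-squares line. To get the numbers behind that line, fit the same model with `lm()`. The per-sample counts and maximum waits below are hypothetical, chosen so the fit is easy to check by hand; the column name `max_s` is likewise made up for this sketch.

```r
# Hypothetical summary: N sessions waiting per sample, max wait in seconds
concurr2 <- data.frame(N     = c(1, 2, 3, 4, 5),
                       max_s = c(0.12, 0.19, 0.33, 0.41, 0.50))

fit <- lm(max_s ~ N, data = concurr2)

# Intercept, and extra seconds of max wait per additional waiting session
coef(fit)

# Share of variance explained; a high value still says nothing about causation
summary(fit)$r.squared
```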
Heatmap
• Values as “blocks” in a matrix
• Clearer than scatter plot for large amounts of data
• Shows less information
• Performance data made sexy
Heatmap
How?
# summarise per sample and wait class, so WAIT_CLASS survives for the filter and the y axis
ash2 <- ddply(ash, .(SAMPLE_ID, WAIT_CLASS), summarise, N = length(TIME_WAITED),
              mean = mean(TIME_WAITED), max = max(TIME_WAITED))
ash2 <- ash2[ash2$WAIT_CLASS %in% c("Concurrency", "User I/O", "Other"), ]
ggplot(ash2, aes(SAMPLE_ID, WAIT_CLASS)) + geom_tile(aes(fill = log(N))) +
  scale_fill_gradient(low = "green", high = "red")
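Each tile in the heatmap is one (SAMPLE_ID, WAIT_CLASS) cell colored by log(N). A base-R sketch of just that aggregation step, on the sample rows from the earlier slides, using `table()` to count ASH rows per cell. Empty cells count as 0, whose log is -Inf, so they are set to NA here; in the ddply version those combinations simply have no row and therefore no tile.

```r
# SAMPLE_ID and WAIT_CLASS from the four sample ASH rows
ash <- data.frame(
  SAMPLE_ID  = c(10526629, 10526629, 10465699, 10465699),
  WAIT_CLASS = c("User I/O", "User I/O", "Concurrency", "Concurrency"),
  stringsAsFactors = FALSE
)

# Count ASH rows per cell: rows = SAMPLE_ID, columns = WAIT_CLASS
N <- table(ash$SAMPLE_ID, ash$WAIT_CLASS)

# Fill value for the color scale: log of the count, empty cells left blank (NA)
log_n <- log(unclass(N))
log_n[is.infinite(log_n)] <- NA
log_n
```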
Presenting Your Data
FACT
“Even irrelevant neuroscience information in an explanation of a psychological phenomenon may interfere with people’s abilities to critically consider the underlying logic of this explanation.”
Numerical quantities focus on expected values, graphical summaries on unexpected values.
-- John Tukey
Our goal is an interesting presentation.
What is “Interesting”?
• Surprise
• Beauty
• Stories
• Visuals
• Counterintuitive
• Variety
Bad Visualizations Lie
1. Omit important data
2. Distort data
3. Mislead
4. Confuse
5. Show fake correlations and bad models
Bad vs. Good Visuals
Eye-API
• Good:
– distances
– locations
– length
– high contrast
• Bad:
– shades
– relative area
– angles
Good or Bad?
#1 Mistake – Throw a line on Data
Avoid Pie Charts
Infographics always have Pie Charts
Which is better?
Creativity is Allowed
Make it Beautiful – for Geeks
• Contrast
• Reduce noise
• Few colors
• Few fonts
• Lots of Data
• More Signal
• Less Noise
IMPORTant R Libraries
• reshape
• plyr
• ggplot2
• sqldf
• http://blog.revolutionanalytics.com/2013/02/10-r-packages-every-data-scientist-should-know-about.html
Other Visualization Tools
• R + R Studio
• Excel
• Gephi
• JIT, D3.js
• ggobi
Thank you – Q&A
To contact us
1-877-PYTHIAN
To follow us
http://www.pythian.com/blog
http://www.facebook.com/pages/The-Pythian-Group/163902527671
@pythian
http://www.linkedin.com/company/pythian