Common Mistakes and How to Avoid Them - Wayne State University
Performance Evaluation: Common Mistakes and How to Avoid Them
Hongwei Zhang
http://www.cs.wayne.edu/~hzhang
Acknowledgement: this lecture is partially based on the slides of Dr. Raj Jain.
Wise men learn by other men’s mistakes, fools by their own.
--- H. G. Wells
Outline
• Common Mistakes in Performance Evaluation
• A Systematic Approach to Performance Evaluation
• Case Study: Remote Pipes vs. RPC
Common Mistakes in Performance Evaluation
• No Goals
  ◦ Goals => Metrics, workloads, methodology
  ◦ Setting goals is nontrivial and requires a thorough understanding of the problem
  ◦ E.g., TCP: is the study about the timeout algorithm (for retransmission) or about congestion control?
• Biased Goals
  ◦ E.g., “To show that OUR system is better than OTHERS”
  ◦ May lead to mistakes in selecting metrics, methodology, etc.
• Unsystematic Approach
  ◦ E.g., ad hoc selection of metrics, system parameters, factors, workload, etc.
  ◦ May lead to wrong conclusions
• Analysis Without Understanding the Problem
  ◦ Playing with models familiar to us, instead of solving real problems
Common mistakes (contd.)
• Incorrect Performance Metrics
  ◦ E.g., “use Millions of Instructions Per Second (MIPS) to compare RISC and CISC CPUs”?
• Overlooking Important Parameters
  ◦ Parameters: system and workload characteristics
• Ignoring Significant Factors
  ◦ Factors: parameters that are varied in the study
  ◦ It is important to identify those parameters that, if varied, would have a significant impact on the system properties
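To see why MIPS can mislead when comparing RISC and CISC CPUs, the sketch below uses made-up numbers (the instruction counts and MIPS ratings are hypothetical, not measurements). The same task may compile to more instructions on a RISC CPU, so a higher MIPS rating does not by itself imply a shorter execution time.

```python
# Hypothetical numbers for illustration only; not measurements of real CPUs.

def execution_time_s(instructions_millions, mips):
    """Time to run a program: instruction count divided by instruction rate."""
    return instructions_millions / mips

# Assumption: the same task needs more (simpler) instructions on the RISC CPU.
risc = {"mips": 100.0, "instructions_millions": 150.0}
cisc = {"mips": 50.0, "instructions_millions": 60.0}

risc_time = execution_time_s(risc["instructions_millions"], risc["mips"])
cisc_time = execution_time_s(cisc["instructions_millions"], cisc["mips"])

print(f"RISC: {risc['mips']:.0f} MIPS, task takes {risc_time:.2f} s")
print(f"CISC: {cisc['mips']:.0f} MIPS, task takes {cisc_time:.2f} s")
```

With these assumed numbers the CPU with the higher MIPS rating finishes the task later, which is why time per task is usually the safer metric here.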
Common mistakes (contd.)
• Wrong Evaluation Technique
  ◦ Three evaluation techniques: measurement, simulation, analytical modeling
  ◦ E.g., sticking to the one technique we are familiar with, despite the limitations of individual techniques
• Unrepresentative Workload
  ◦ Workload: service requests to the system
  ◦ Needs to represent actual usage of the system
• Inappropriate Experiment Design
  ◦ Number of measurement/simulation experiments, choice of parameter values
• Inappropriate Level of Detail
  ◦ Level of detail should depend on the systems being studied
  ◦ The more similar two systems are, the more detailed the study should be
Common mistakes (contd.)
• No Analysis
  ◦ E.g., simply collecting data
• Erroneous (Measurement/Simulation) Analysis
  ◦ E.g., taking the average of ratios, running simulations that are too short, etc.
• No Sensitivity Analysis
• Ignoring Errors in Input
  ◦ E.g., errors in parameter values that have to be estimated
  ◦ Need to adjust the confidence level on the model output according to the errors in input
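The "average of ratios" pitfall mentioned above can be illustrated with a small sketch. The execution times are hypothetical; the point is that averaging per-workload ratios can make each system look slower than the other, depending on which one is used as the base.

```python
from statistics import mean

# Hypothetical execution times (seconds) of systems A and B on two workloads.
times_a = [10.0, 20.0]
times_b = [20.0, 10.0]

def mean_of_ratios(xs, ys):
    """Average the per-workload ratios x/y (the questionable practice)."""
    return mean([x / y for x, y in zip(xs, ys)])

def ratio_of_totals(xs, ys):
    """Ratio of total times; swapping the base simply inverts the result."""
    return sum(xs) / sum(ys)

print("mean of A/B ratios:", mean_of_ratios(times_a, times_b))  # suggests A is slower
print("mean of B/A ratios:", mean_of_ratios(times_b, times_a))  # suggests B is slower
print("ratio of totals A/B:", ratio_of_totals(times_a, times_b))
```

Both averaged ratios exceed 1 for this data, so the conclusion flips with the choice of base, while the ratio of totals shows the two systems taking the same total time.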
Common mistakes (contd.)
• Improper Treatment of Outliers
  ◦ Outliers: values that are too high or too low compared to the majority of values in a set
  ◦ Outliers may be invalid values (e.g., due to environmental perturbations or experiment errors), but may also be valid values
• Assuming No Change in the Future
  ◦ Need to verify whether this assumption holds in practice, and, if so, how far into the future it holds
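One possible way to handle outliers carefully is to flag them for inspection rather than drop them silently. The sketch below uses the interquartile-range rule (Tukey's fences), which is a common convention and not something prescribed by this lecture; the sample values and the name `flag_outliers` are hypothetical.

```python
from statistics import mean, quantiles

def flag_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (IQR rule, an assumed choice)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical response times in milliseconds.
samples_ms = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0]

suspects = flag_outliers(samples_ms)
kept = [v for v in samples_ms if v not in suspects]

# Report both views; whether a suspect is invalid needs a look at the experiment.
print("flagged for inspection:", suspects)
print("mean with all values:", round(mean(samples_ms), 2))
print("mean without flagged values:", round(mean(kept), 2))
```

Reporting the result both with and without the flagged values keeps the decision visible, since a flagged value may still be a valid observation.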
Common mistakes (contd.)
• Ignoring Variability
  ◦ Need to study variability, in addition to the mean
• Omitting Assumptions and Limitations
  ◦ Need to clearly report the assumptions and the consequent limitations of the conclusions
• Overly Complex Analysis, Improper Presentation of Results, Ignoring Social Aspects
  ◦ Need to tune analysis and presentation to the problem context and audience
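As a small illustration of why variability matters in addition to the mean, the sketch below compares two hypothetical sets of response times that share the same mean but differ widely in spread. The data and names are made up for the example.

```python
from statistics import mean, stdev

# Hypothetical response times (ms) of two systems with identical means.
steady = [9, 10, 11, 10, 10]
bursty = [1, 19, 2, 18, 10]

for name, data in (("steady", steady), ("bursty", bursty)):
    print(f"{name}: mean = {mean(data)}, std dev = {stdev(data):.2f}, "
          f"min = {min(data)}, max = {max(data)}")
```

Reporting only the mean would make the two systems look identical, although a user of the second one would see far less predictable response times.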
Checklist for Avoiding Common Mistakes
• Is the system correctly defined and are the goals clearly stated?
• Are the goals stated in an unbiased manner?
• Have all the steps of the analysis been followed systematically?
• Is the problem clearly understood before analyzing it?
• Are the performance metrics relevant for this problem?
• Is the workload correct for this problem?
• Is the evaluation technique appropriate?
• Is the list of parameters that affect performance complete?
• Have all parameters that affect performance been chosen as factors to be varied?
Checklist (contd.)
• Is the experimental design efficient in terms of time and results?
• Is the level of detail proper?
• Is the measured data presented with analysis and interpretation?
• Is the analysis statistically correct?
• Has the sensitivity analysis been done?
• Would errors in the input cause an insignificant change in the results?
• Have the outliers in the input or output been treated properly?
• Have the future changes in the system and workload been modeled?
• Has the variance of input been taken into account?
Checklist (contd.)
• Has the variance of the results been analyzed?
• Is the analysis easy to explain?
• Is the presentation style suitable for its audience?
• Have the results been presented graphically as much as possible?
• Are the assumptions and limitations of the analysis clearly documented?
A Systematic Approach to Performance Evaluation
• State Goals and Define the System
  ◦ Identify system boundaries
• List Services and Outcomes
  ◦ Helps in selecting the right metrics and workload
• Select Metrics
  ◦ Metric: a criterion used to compare performance
A Systematic Approach (contd.)
• List Parameters
  ◦ System parameters and workload parameters
• Select Factors to Study
  ◦ Factors: parameters to be varied in experiments
  ◦ Levels: values of a factor
  ◦ The selection should consider technological, economic, and social constraints
• Select Evaluation Technique
  ◦ Analytical modeling, simulation, and/or measurement
• Select Workload
  ◦ Should represent system usage in real life
A Systematic Approach (contd.)
• Design Experiments
  ◦ Usually a two-phase process
  ◦ First phase: consider “a large number of factors but a small number of levels”
  ◦ Second phase: consider “a smaller number of factors but a larger number of levels”
• Analyze and Interpret Data
• Present Results
• Start over, if necessary (e.g., re-define system boundaries and include other factors and performance metrics)
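The benefit of the two-phase experiment design can be seen by counting experiments. The sketch below assumes full factorial designs and uses hypothetical factor and level counts (6 factors, then 2 surviving factors, with 2 or 8 levels); none of these numbers come from the lecture.

```python
from math import prod

def full_factorial_size(levels_per_factor):
    """Number of experiments when every combination of levels is run once."""
    return prod(levels_per_factor)

# Phase 1 (assumed): screen 6 factors at only 2 levels each.
phase1 = full_factorial_size([2] * 6)

# Phase 2 (assumed): keep the 2 most significant factors, study 8 levels each.
phase2 = full_factorial_size([8] * 2)

# Single-phase alternative: all 6 factors at 8 levels each.
single_phase = full_factorial_size([8] * 6)

print("two-phase total:", phase1 + phase2)
print("single-phase total:", single_phase)
```

Under these assumptions the two phases together need 128 experiments, while studying every factor at every level in one pass would need 262144.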
Case study: RPC vs. remote pipes
• RPC
  ◦ The calling program blocks/waits until the procedure is complete and the result is returned
• Remote pipes
  ◦ Similar to RPC, but the calling program does not block; that is, the execution of the pipe occurs concurrently with the continued execution of the calling program
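The difference between the two calling styles can be mimicked locally. The sketch below is only an analogy: it uses a thread pool in one process instead of a network, and `remote_procedure`, `rpc_call`, and `pipe_call` are hypothetical names. Returning a future from the pipe-style call is also an assumption made so the outcome can be inspected.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# A single worker thread stands in for the remote host (assumption).
_executor = ThreadPoolExecutor(max_workers=1)

def remote_procedure(x):
    """Stand-in for work done on the remote host."""
    time.sleep(0.05)
    return 2 * x

def rpc_call(x):
    """RPC style: the caller waits until the result is available."""
    return remote_procedure(x)

def pipe_call(x):
    """Remote-pipe style: return immediately; the work runs concurrently."""
    return _executor.submit(remote_procedure, x)

result = rpc_call(3)              # caller is blocked for the whole call
pending = pipe_call(3)            # caller continues right away
local_work = sum(range(1000))     # overlaps with the "remote" execution
piped_result = pending.result()   # collect the outcome when it is needed

print(result, piped_result, local_work)
```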
Case study: system definition & services
• System Definition
• Services
  ◦ Small data transfer or large data transfer
Case study: metrics
• Assume no errors or failures; thus consider correct operation only
• We want to compare the maximum allowable call rate, the time, and the resource (client, network, and server) consumption per service
• Thus the following metrics:
  ◦ Elapsed time per call
  ◦ Maximum call rate per unit of time, or equivalently, the time required to complete a block of n successive calls
  ◦ Local and remote CPU time per call
  ◦ Number of bytes sent on the link per call
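A minimal sketch of how the first two metrics above might be collected: time a block of n successive calls and derive the elapsed time per call. The `null_call` function is a hypothetical placeholder for one channel request, and the use of `time.perf_counter` is an assumption about the measurement tool, not the method used in the original study.

```python
import time

def null_call():
    """Placeholder for one channel request (hypothetical; does no work)."""
    return None

def time_block(call, n):
    """Return (total elapsed seconds, elapsed seconds per call) for n successive calls."""
    start = time.perf_counter()
    for _ in range(n):
        call()
    total = time.perf_counter() - start
    return total, total / n

total_s, per_call_s = time_block(null_call, 1024)
print(f"block of 1024 calls: {total_s:.6f} s, {per_call_s * 1e6:.3f} us per call")
```

CPU time per call and bytes sent per call would need other instrumentation, which this sketch does not cover.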
Case study: system parameters
• Speed of the local CPU
• Speed of the remote CPU
• Speed of the network
• Operating system overhead for interfacing with the channels (i.e., a procedure or a pipe)
• Operating system overhead for interfacing with the networks
• Reliability of the network affecting the number of retransmissions required
Case study: workload parameters
• Type of channel (i.e., RPC vs. remote pipe)
• Time between successive calls
• Number and sizes of the call parameters
• Number and sizes of the results
• Other loads on the local and remote CPUs
• Other loads on the network
Case study: factors
• Type of channel: remote pipes and remote procedure calls
• Size of the call parameters: small and large
• Size of the network: short distance and long distance
• Number n of consecutive calls (i.e., block size): 1, 2, 4, 8, 16, 32, …, 512, and 1024
• Note:
  ◦ Fixed: type of CPUs and operating systems
  ◦ Ignore retransmissions due to network errors
  ◦ Measurement without any other load on the hosts and the network
Case study: technique, workload, expt. design
• Evaluation Technique
  ◦ Prototypes implemented => Measurements
  ◦ But use analytical modeling for validation
• Workload
  ◦ Synthetic program generating the specified types of channel requests
  ◦ Null channel requests (i.e., without actual work) => to identify resources used in monitoring and logging
• Experimental Design
  ◦ A full factorial experimental design with 2^3 × 11 = 88 experiments will be used (three factors with 2 levels each and one factor with 11 levels)
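The count of 88 experiments can be checked by enumerating the full factorial design. In the sketch below the level names are taken from the factors slide; filling the elided block sizes with the powers of two from 1 to 1024 is an assumption, chosen because it gives the 11 levels that the count implies.

```python
from itertools import product

channel_types = ["remote pipe", "RPC"]
parameter_sizes = ["small", "large"]
network_distances = ["short", "long"]
# Assumption: block sizes are the powers of two from 1 to 1024 (11 levels).
block_sizes = [2 ** i for i in range(11)]

# Full factorial design: every combination of levels is one experiment.
experiments = list(product(channel_types, parameter_sizes,
                           network_distances, block_sizes))

print(len(experiments))   # 2 * 2 * 2 * 11
print(experiments[0])
```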
Case study: data analysis & presentation
• Data Analysis
  ◦ Analysis of Variance (ANOVA) for the first three factors
  ◦ Regression for number n of successive calls
• Data Presentation
  ◦ The final results will be plotted as a function of the block size n
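For the regression step, a simple least-squares line of elapsed time against block size n is one plausible model; the lecture does not state which regression form is used, so the linear form is an assumption. The data below is hypothetical and constructed to follow 3 ms of fixed overhead plus 2 ms per call.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical measurements: block size n and elapsed time in milliseconds.
block_sizes = [1, 2, 4, 8, 16]
elapsed_ms = [5.0, 7.0, 11.0, 19.0, 35.0]

slope, intercept = fit_line(block_sizes, elapsed_ms)
print(f"per-call cost = {slope:.2f} ms, fixed overhead = {intercept:.2f} ms")
```

In such a fit the slope estimates the incremental cost per call and the intercept the fixed overhead per block, which is what a plot against block size n would show visually.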
Summary
• Common Mistakes in Performance Evaluation
• A Systematic Approach to Performance Evaluation
• Case Study: Remote Pipes vs. RPC
Exercise
• Read the paper
  Hongwei Zhang, Anish Arora, Prasun Sinha, “Link Estimation and Routing in Sensor Network Backbones: Beacon-based or Data-driven?”, IEEE Transactions on Mobile Computing, 2009
  (http://www.cs.wayne.edu/~hzhang/group/publications/LOF-TMC.pdf)
  Make a list of strong and weak points of the study. What would you do differently, if you were asked to repeat the study?
Further reading
• Balachander Krishnamurthy, Walter Willinger, “What are our standards for validation of measurement-based networking research?”, ACM HotMETRICS’08