Common Mistakes and How to Avoid Them - Wayne State University

Common Mistakes and How to Avoid Them

Hongwei Zhang

http://www.cs.wayne.edu/~hzhang

Acknowledgement: this lecture is partially based on the slides of Dr. Raj Jain.

Wise men learn by other men’s mistakes, fools by their own.

--- H. G. Wells

Performance Evaluation:

Outline

� Common Mistakes in Performance Evaluation

� A Systematic Approach to Performance Evaluation

� Case Study: Remote Pipes vs. RPC

Common Mistakes in Performance Evaluation

� No Goals

� Goals => Metrics, workloads, methodology

� Non trivial, and need thorough understanding of the problem

� E.g., TCP: timeout algorithm (for retransmission) vs. congestion control?

� Biased Goals

� E.g., “To show that OUR system is better than OTHERS”

� May lead to mistakes in selecting metrics, methodology, etc

� Unsystematic Approach

� E.g., ad hoc selection of metrics, system parameters, factors, workload, etc

� May lead to wrong conclusions

� Analysis Without Understanding the problem

� Play with models familiar to us, instead of solving real problems

Common mistakes (contd.)

� Incorrect Performance Metrics

� E.g., “use Millions of Instructions Per Second (MIPS) to compare RISC

and CISC CPUs”?

� Overlook Important Parameters

� Parameters: system and workload characteristics

� Ignore Significant Factors

� Factors: parameters that are varied in study

� It is important to identify those parameters that, if varied, will make a

significant impact on the system properties


� Wrong Evaluation Technique

� Three evaluation techniques: measurement, simulation, analyticalmodeling

� E.g., stick to one technique we are familiar with, despite the limitations of individual techniques

� Unrepresentative Workload

� Workload: service requests to the system

� Need to represent actual usage of the system

� Inappropriate Experiment Design

� # of measurement/simulation experiments, parameter values

� Inappropriate Level of Detail

� Level of detail should depend on systems being studied

� The more similar two systems are, the more detailed the study should be


� No Analysis

� E.g., simply collect data

� Erroneous (Measurement/Simulation/)Analysis

� E.g., taking average of ratios, too short simulation, etc

� No Sensitivity Analysis

� Ignoring Errors in Input

� E.g., errors in parameter values that have to be estimated

� Need to adjust the confidence level on the model output according

to errors in input


� Improper Treatment of Outliers

� Outliers: values that is too high or too low compared to a majority

of values in a set

� Outliers may be invalid values (e.g., due to environmental

perturbations or experiment errors), but may also be valid values

� Assuming No Change in the Future

� Need to verify whether this assumption holds in practice, and, if so,

how far into the future it holds


� Ignoring Variability

� Need to study variability, in addition to mean

� Omitting Assumptions and Limitations

� Need to clearly report the assumptions and the consequent

limitations of the conclusions

� Too Complex Analysis, Improper Presentation of Results,

Ignoring Social Aspects

� Need to tune analysis and presentation to the problem context and

audience

Checklist for Avoiding Common Mistakes

� Is the system correctly defined and the goals clearly stated?

� Are the goals stated in an unbiased manner?

� Have all the steps of the analysis followed systematically?

� Is the problem clearly understood before analyzing it?

� Are the performance metrics relevant for this problem?

� Is the workload correct for this problem?

� Is the evaluation technique appropriate?

� Is the list of parameters that affect performance complete?

� Have all parameters that affect performance been chosen as factors to be varied?

Checklist (contd.)

� Is the experimental design efficient in terms of time and results?

� Is the level of detail proper?

� Is the measured data presented with analysis and interpretation?

� Is the analysis statistically correct?

� Has the sensitivity analysis been done?

� Would errors in the input cause an insignificant change in the results?

� Have the outliers in the input or output been treated properly?

� Have the future changes in the system and workload been modeled?

� Has the variance of input been taken into account?

Checklist (contd.)

� Has the variance of the results been analyzed?

� Is the analysis easy to explain?

� Is the presentation style suitable for its audience?

� Have the results been presented graphically as much as

possible?

� Are the assumptions and limitations of the analysis clearly

documented?

Outline




A Systematic Approach to Performance Evaluation

� State Goals and Define the System

� Identify system boundaries

� List Services and Outcomes

� Helps in selecting the right metrics and workload

� Select Metrics

� Metric: criteria used to compare performance

A Systematic Approach (contd.)

� List Parameters

� System workload parameters

� Select Factors to Study

� Factors: parameters to be varied in experiments

� Levels: values of a factor

� The selection should consider technological, economic, and social

constraints

� Select Evaluation Technique

� Analytical modeling, simulation, and/or measurement

� Select Workload

� Should represent system usage in real life

A Systematic Approach (contd.)

� Design Experiments

� Usually a two-phase process

� First phase: consider “a large number of factors but a small number of levels”

� Second phase: consider “a smaller number of factors but a larger number of

levels”

� Analyze and Interpret Data

� Present Results

� Start over, if necessary (e.g., re-define system boundaries and

include other factors and performance metrics)

Outline




Case study: RPC vs. remote pipes

� RPC

� The calling program blocks/waits until the procedure is

complete and the result is returned

� Remote pipes

� Similar to RPC, but the calling program does not block; that

is, the execution of the pipe occurs concurrently with the

continued execution of the calling program

Case study: system definition & services

� System Definition

� Services

� Small data transfer or large data transfer

Case study: metrics

� Assuming no errors and failures. Thus consider correct

operations only

� Want to compare max. allowable rate, time and resource (client,

network, and server) consumption per service

� Thus the following metrics:

� Elapsed time per call

� Maximum call rate per unit of time, or equivalently, the time

required to complete a block of n successive calls

� Local and remote CPU time per call

� Number of bytes sent on the link per call

Case study: system parameters

� Speed of the local CPU

� Speed of the remote CPU

� Speed of the network

� Operating system overhead for interfacing with the channels (i.e., a procedure or a pipe)

� Operating system overhead for interfacing with the networks

� Reliability of the network affecting the number of retransmissions required

Case study: workload parameters

� Type of channel (i.e., RPC vs. remote pipe)

� Time between successive calls

� Number and sizes of the call parameters

� Number and sizes of the results

� Other loads on the local and remote CPUs

� Other loads on the network

Case study: factors

� Type of channel: Remote pipes and remote procedure calls

� Size of the call parameters: small and large

� Size of the Network: Short distance and long distance

� Number n of consecutive calls (i.e., Block size): 1, 2, 4, 8, 16, 32,

…, 512, and 1024

� Note:

� Fixed: type of CPUs and operating systems

� Ignore retransmissions due to network errors

� Measurement without any other load on the hosts and the network

Case study: technique, workload, expt. design

� Evaluation Technique

� Prototypes implemented => Measurements

� But use analytical modeling for validation

� Workload

� Synthetic program generating the specified types of channel

requests

� Null channel requests (i.e., without actual work) => to identify

resources used in monitoring and logging

� Experimental Design

� A full factorial experimental design with 23×11=88 experiments will

be used

Case study: data analysis & presentation

� Data Analysis

� Analysis of Variance (ANOVA) for the first three factors

� Regression for number n of successive calls

� Data Presentation

� The final results will be plotted as a function of the block

size n

Summary




Exercise

� Read the paper

Hongwei Zhang, Anish Arora, Prasun Sinha, ”Link Estimation and Routing

in Sensor Network Backbones: Beacon-based or Data-driven?”, IEEE

Transactions on Mobile Computing, 2009

(http://www.cs.wayne.edu/~hzhang/group/publications/LOF-TMC.pdf)

Make a list of strong and weak points of the study. What

would you do differently, if you were asked to repeat the

study?

Further reading

� Balachander Krishnamurthy, Walter Willinger, “What

are our standards for validation of measurement-

based networking research?”, ACM HotMETRICS’08

Common Mistakes and How to Avoid Them - Wayne State University

Documents

Transcript of Common Mistakes and How to Avoid Them - Wayne State University