Statistics Presentation Ch En 475 Unit Operations.



Page 1: Statistics Presentation Ch En 475 Unit Operations.

Statistics Presentation

Ch En 475 Unit Operations

Page 2: Statistics Presentation Ch En 475 Unit Operations.

Quantifying variables (i.e., answering a question with a number)

1. Directly measure the variable. - referred to as “measured” variable

ex. Temperature measured with thermocouple

2. Calculate variable from “measured” or “tabulated” variables - referred to as “calculated” variable

ex. Flow rate m = ρ A v (ρ, A, and v are measured or tabulated)

Each has some error or uncertainty

Page 3: Statistics Presentation Ch En 475 Unit Operations.

Some definitions:

x̄ = sample mean
s = sample standard deviation

μ = exact mean
σ = exact standard deviation

As the sampling becomes larger:

x̄ → μ and s → σ

t chart → z chart

not valid if bias exists (i.e. calibration is off)

A. Error of Measured Variable

Several measurements are obtained for a single variable (i.e., T).

Questions

• What is the true value?
• How confident are you?
• Is the value different on different days?

Page 4: Statistics Presentation Ch En 475 Unit Operations.

• Let’s assume “normal” Gaussian distribution
• For small sampling: s is known
• For large sampling: σ is assumed

How do you determine the error?

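Sample mean:

$\bar{x} = \frac{1}{n} \sum_i x_i$

Small sampling:

$s^2 = \frac{1}{n-1} \sum_i (x_i - \bar{x})^2$

Use t tables for this approach (we’ll pursue this approach).

Large sampling (n > 30):

$\sigma^2 = \frac{1}{n-1} \sum_i (x_i - \bar{x})^2$

Use z tables for this approach. Don’t often have this much data.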

Page 5: Statistics Presentation Ch En 475 Unit Operations.

Example

n Temp

1 40.1

2 39.2

3 43.2

4 47.2

5 38.6

6 40.4

7 37.7

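$\bar{x} = \frac{40.1 + 39.2 + 43.2 + 47.2 + 38.6 + 40.4 + 37.7}{7} = 40.9$

$s^2 = \frac{1}{7-1} \left[ (40.1 - 40.9)^2 + (39.2 - 40.9)^2 + (43.2 - 40.9)^2 + (47.2 - 40.9)^2 + (38.6 - 40.9)^2 + (40.4 - 40.9)^2 + (37.7 - 40.9)^2 \right] = 10.7$

$s = 3.27$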
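The arithmetic in this example can be checked with a few lines of Python. This is only a sketch of the small-sampling formulas (n - 1 in the denominator) applied to the seven temperatures in the table; the variable names are illustrative.

```python
import math

temps = [40.1, 39.2, 43.2, 47.2, 38.6, 40.4, 37.7]  # data from the example table

n = len(temps)
x_bar = sum(temps) / n                                  # sample mean
s_sq = sum((x - x_bar) ** 2 for x in temps) / (n - 1)   # sample variance (n - 1 denominator)
s = math.sqrt(s_sq)                                     # sample standard deviation

print(f"mean = {x_bar:.1f}, s^2 = {s_sq:.1f}, s = {s:.2f}")
```

The printed values round to the 40.9, 10.7, and 3.27 shown on the slide.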

Page 6: Statistics Presentation Ch En 475 Unit Operations.

Standard Deviation Summary

(normal distribution)

40.9 ± (3.27) 1s: 68.3% of data are within this range
40.9 ± (3.27 × 2) 2s: 95.4% of data are within this range
40.9 ± (3.27 × 3) 3s: 99.7% of data are within this range

If normal distribution is questionable, use Chebyshev's inequality:

At least 50% of the data are within 1.4 s from the mean.
At least 75% of the data are within 2 s from the mean.
At least 89% of the data are within 3 s from the mean.
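The three Chebyshev figures follow from the bound that at least 1 - 1/k² of the data lie within k standard deviations of the mean, for any distribution. A minimal sketch, with a hypothetical helper name, reproduces them (the slide's 1.4 is √2 rounded):

```python
import math

def chebyshev_min_fraction(k: float) -> float:
    """Lower bound on the fraction of data within k standard deviations of the mean."""
    if k <= 1:
        return 0.0  # the bound carries no information for k <= 1
    return 1.0 - 1.0 / k**2

for k in (math.sqrt(2), 2, 3):
    print(f"k = {k:.1f}: at least {100 * chebyshev_min_fraction(k):.0f}% of the data")
```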

The above ranges don’t state how accurate the mean is - only the % of data within the given range

Page 7: Statistics Presentation Ch En 475 Unit Operations.

Student t-test (gives confidence of where μ (not data) is located)

$\mu = \bar{x} \pm t \, \frac{s}{\sqrt{n}}$, where $t = f(\alpha, n-1)$

2α = 1 - probability
r = n - 1 = 6

Prob.   α       t       ± (t s/√n)
90%     .05     1.943   2.40
95%     .025    2.447   3.02
99%     .005    3.707   4.58

μ = 40.9 ± ___

[Figure: 2-tail t distribution with 5% in each tail; labels: t, true mean, measured mean]
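The ± column can be reproduced from the sample statistics of the temperature example. The sketch below assumes the two-tail t values for r = 6 are read from a t table (here, the values listed on this slide) rather than computed; the dictionary name is illustrative.

```python
import math

x_bar, s, n = 40.9, 3.27, 7      # sample statistics from the temperature example
std_err = s / math.sqrt(n)       # standard error of the mean, s / sqrt(n)

# two-tail t values for r = n - 1 = 6 degrees of freedom, copied from the slide
t_table = {0.90: 1.943, 0.95: 2.447, 0.99: 3.707}

half_widths = {prob: t * std_err for prob, t in t_table.items()}
for prob, hw in half_widths.items():
    print(f"{prob:.0%}: mu = {x_bar} ± {hw:.2f}")
```

The half-widths come out as 2.40, 3.02, and 4.58, matching the table.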

Page 8: Statistics Presentation Ch En 475 Unit Operations.

T-test Summary

41 ± 2: 90% confident μ is somewhere in this range
41 ± 3: 95% confident μ is somewhere in this range
41 ± 5: 99% confident μ is somewhere in this range

μ = exact mean; 40.9 is the sample mean

Page 9: Statistics Presentation Ch En 475 Unit Operations.

Comparing averages of measured variables

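Day 1: x̄ = 40.9, s_x = 3.27, n_x = 7

Day 2: ȳ = 37.2, s_y = 2.67, n_y = 9

What is your confidence that μx ≠ μy (i.e., they are different)?

$t = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{(n_x - 1) s_x^2 + (n_y - 1) s_y^2}{n_x + n_y - 2} \left( \frac{1}{n_x} + \frac{1}{n_y} \right)}} = 2.5$

r = n_x + n_y - 2

[Figure: 1-tail t distribution; area 1 - α = confident different, area α = confident same]

Larger t: more likely different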
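As a rough check of the comparison above, the pooled t statistic can be computed from the two sets of summary statistics. In place of a t-table lookup, this sketch obtains the 1-tail probability by numerically integrating the Student t density with Simpson's rule; `pooled_t` and `t_cdf` are hypothetical helper names, and the integration is an assumption of this sketch rather than the slide's method.

```python
import math

def pooled_t(x_bar, s_x, n_x, y_bar, s_y, n_y):
    """Two-sample t statistic with a pooled variance; returns (t, degrees of freedom)."""
    dof = n_x + n_y - 2
    pooled_var = ((n_x - 1) * s_x**2 + (n_y - 1) * s_y**2) / dof
    t = (x_bar - y_bar) / math.sqrt(pooled_var * (1 / n_x + 1 / n_y))
    return t, dof

def t_cdf(t, dof, steps=2000):
    """Student t cumulative probability, integrating the density from 0 to t (Simpson's rule)."""
    c = math.gamma((dof + 1) / 2) / (math.sqrt(dof * math.pi) * math.gamma(dof / 2))

    def pdf(u):
        return c * (1 + u * u / dof) ** (-(dof + 1) / 2)

    h = t / steps  # steps must be even for Simpson's rule
    total = pdf(0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3

t, dof = pooled_t(40.9, 3.27, 7, 37.2, 2.67, 9)
confidence = t_cdf(t, dof)  # 1 - alpha for the 1-tail test
print(f"t = {t:.2f}, r = {dof}, confidence the means differ = {confidence:.1%}")
```

With r = 14, a t near 2.5 falls between the tabulated 1-tail values for α = 0.025 and α = 0.01, so the confidence that the two days differ lies between 97.5% and 99%.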

Page 10: Statistics Presentation Ch En 475 Unit Operations.

Problem 1

1. Calculate average and s for both sets of data
2. Find range in which 95.4% of the data falls (for each set).
3. Determine range for μ for each set at 95% probability
4. At what confidence are pressures different each day?

Data point   Pressure Day 1   Pressure Day 2
1            750              730
2            760              750
3            752              762
4            747              749
5            754              737

Page 11: Statistics Presentation Ch En 475 Unit Operations.

B. Uncertainty of Calculated Variable

Calculate variable from multiple input (measured, tabulated, ...) variables (i.e., m = ρAv)

Each input variable has its own error

What is the uncertainty of your “calculated” value?

Example: You take measurements of ρ, A, v to determine m = ρAv. What is the range of m and its associated uncertainty?

Details provided in Applied Engineering Statistics, Chapters 8 and 14 (R.M. Bethea and R.R. Rhinehart, 1991).

Page 12: Statistics Presentation Ch En 475 Unit Operations.

To obtain uncertainty of “calculated” variable

• DO NOT just calculate variable for each set of data and then average and take standard deviation

• DO calculate uncertainty using error from input variables: use uncertainty for “calculated” variables and error for input variables

Plan: obtain max error (δ) for each input variable, then obtain uncertainty of calculated variable

Method 1: Propagation of max error - brute force
Method 2: Propagation of max error - analytical
Method 3: Propagation of variance - analytical
Method 4: Propagation of variance - brute force - Monte Carlo simulation

Page 13: Statistics Presentation Ch En 475 Unit Operations.

Value and Uncertainty

• Value used to make decisions - need to know uncertainty of value
• Potential ethical and societal impact
• How do you determine the uncertainty of the value?

Sources of uncertainty (from Rhinehart, Applied Engineering Statistics, 1991):
1. Estimation - we guess!
2. Discrimination - device accuracy (single data point)
3. Calibration - may not be exact (error of curve fit)
4. Technique - i.e., measure ID rather than OD
5. Constants and data - not always exact!
6. Noise - which reading do we take?
7. Model and equations - i.e., ideal gas law vs. real gas
8. Humans - transposing, …

Page 14: Statistics Presentation Ch En 475 Unit Operations.

Estimates of Error (δ) for input variables (δ’s are propagated to find uncertainty)

1. Measured: measure multiple times; obtain s; δ ≈ 2.5s
   Reason: about 99% of data is within ± 2.5s
   Example: s = 2.3 ºC for thermocouple, δ = 5.8 ºC

2. Tabulated: δ ≈ 2.5 times the place value of the last reported significant digit (i.e., that digit taken as 1)
   Reason: Assumes last digit is ± 2.5 (± 0 assumes perfect, ± 5 assumes next left digit is fuzzy)
   Example: ρ = 1.3 g/ml at 0 ºC, δ = 0.25 g/ml
   Example: People = 127,000, δ = 2500 people

Page 15: Statistics Presentation Ch En 475 Unit Operations.

Estimates of Error (δ) for input variables

3. Manufacturer spec or calibration accuracy: use given spec or accuracy data
   Example: Pump spec is ± 1 ml/min, δ = 1 ml/min

4. Variable from regression (i.e., calibration curve): δ ≈ 2.5 × standard error (std error is stdev of residual)
   Example: Velocity is slope with std error = 2 m/s

5. Judgment for a variable: use judgment for δ
   Example: Read pressure to ± 1 psi, δ = 1 psi

Page 16: Statistics Presentation Ch En 475 Unit Operations.

Estimates of Error (δ) for input variables

If none of the above rules apply, give your best guess

Example: Data from a computer show that the flow rate is 562 ml/min ± 3 ml/min (stdev of computer noise). Your calibration shows 510 ml/min ± 8 ml/min (stdev). What flow rate do you use and what is δ?

In the following propagation methods, it’s assumed that there is no bias in the values used - let’s assume this for all lab projects.

Page 17: Statistics Presentation Ch En 475 Unit Operations.

Method 4: Monte Carlo Simulation (propagation of variance - brute force)

• Choose N (N is very large, e.g. 100,000) random ±δi from a normal distribution of standard deviation σi for each variable and add to the mean to obtain N values with errors: x'i = x̄i + δi
  - rnorm(N,μ,σ) in Mathcad generates N random numbers from a normal distribution with mean μ and std dev σ
• Find N values of the calculated variable using the generated x'i values.
• Determine mean and standard deviation of the N calculated variables.
  - y = yavg ± 1.96 SQRT(s²y) 95%
  - y = yavg ± 2.57 SQRT(s²y) 99%
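The Monte Carlo steps above can be sketched in Python using only the standard library, with `random.gauss` playing the role of Mathcad's rnorm. The function name `monte_carlo` and the test case (y = a + b with made-up means and standard deviations) are illustrative assumptions; the sum is chosen only because its exact standard deviation, sqrt(0.3² + 0.4²) = 0.5, is known.

```python
import random
import statistics

def monte_carlo(func, means, sigmas, n_trials=100_000, seed=0):
    """Propagate normally distributed input errors through func by brute force.

    Returns (mean, standard deviation) of the n_trials calculated values.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    results = []
    for _ in range(n_trials):
        # x'_i = x_i + delta_i, with delta_i drawn from a normal distribution N(0, sigma_i)
        trial_inputs = [rng.gauss(mu, sigma) for mu, sigma in zip(means, sigmas)]
        results.append(func(*trial_inputs))
    return statistics.fmean(results), statistics.stdev(results)

# Illustrative check with hypothetical numbers: y = a + b
y_avg, s_y = monte_carlo(lambda a, b: a + b, means=[10.0, 20.0], sigmas=[0.3, 0.4])
print(f"y = {y_avg:.2f} ± {1.96 * s_y:.2f} (95%)")
```

Any calculated variable can be passed as `func`, provided the inputs are independent and unbiased, as the slides assume.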

Page 18: Statistics Presentation Ch En 475 Unit Operations.

Monte Carlo Simulation Example

Estimate the uncertainty in the critical compressibility factor of a fluid if Tc = 514 ± 2 K, Pc = 61.37 ± 0.6 bar, and Vc = 0.168 ± 0.002 m³/kmol.
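One way to set up this example is sketched below. It assumes the definition Zc = Pc·Vc/(R·Tc) with R = 0.0831446 bar·m³/(kmol·K), and it assumes each "±" is one standard deviation, which the slide does not state; if the ± values are 95% intervals, set `COVERAGE = 1.96`.

```python
import random
import statistics

R = 0.0831446  # gas constant in bar·m³/(kmol·K)

def z_c(t_c, p_c, v_c):
    """Critical compressibility factor Zc = Pc*Vc / (R*Tc)."""
    return p_c * v_c / (R * t_c)

# Values from the example. ASSUMPTION: each "±" is one standard deviation.
COVERAGE = 1.0
inputs = {"t_c": (514.0, 2.0), "p_c": (61.37, 0.6), "v_c": (0.168, 0.002)}

rng = random.Random(0)  # fixed seed for repeatability
N = 100_000
trials = [
    z_c(**{name: rng.gauss(mean, pm / COVERAGE) for name, (mean, pm) in inputs.items()})
    for _ in range(N)
]
z_avg = statistics.fmean(trials)
s_z = statistics.stdev(trials)
print(f"Zc = {z_avg:.4f} ± {1.96 * s_z:.4f} (95%)")
```

Under these assumptions Zc comes out near 0.241 with a standard deviation of roughly 0.004; a different reading of the ± values changes the interval accordingly.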

Page 19: Statistics Presentation Ch En 475 Unit Operations.

Example: Propagation of variance

Calculate ρ and its 95% probable error

$\rho = \frac{M}{\pi L D^2 / 4}$

All independent variables were measured multiple times (Rule 1); averages and s are given

M = 5.0 kg, s = 0.05 kg
L = 0.75 m, s = 0.01 m
D = 0.14 m, s = 0.005 m
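For comparison with the brute-force result, the analytical route (Method 3) can be worked by hand. The following is a sketch using the standard first-order propagation-of-variance formula for independent inputs, which is an assumption here since the slide does not write it out.

```latex
\rho = \frac{4M}{\pi L D^2} = \frac{4(5.0)}{\pi (0.75)(0.14)^2} \approx 433.1\ \mathrm{kg/m^3}

s_\rho^2 = \left(\frac{\partial \rho}{\partial M}\right)^2 s_M^2
         + \left(\frac{\partial \rho}{\partial L}\right)^2 s_L^2
         + \left(\frac{\partial \rho}{\partial D}\right)^2 s_D^2
\quad\Longrightarrow\quad
\left(\frac{s_\rho}{\rho}\right)^2 = \left(\frac{s_M}{M}\right)^2
                                   + \left(\frac{s_L}{L}\right)^2
                                   + \left(\frac{2 s_D}{D}\right)^2

\left(\frac{s_\rho}{\rho}\right)^2 = (0.0100)^2 + (0.0133)^2 + (0.0714)^2 \approx 5.38 \times 10^{-3}

s_\rho \approx 0.0733 \times 433.1 \approx 31.8\ \mathrm{kg/m^3},
\qquad 1.96\, s_\rho \approx 62\ \mathrm{kg/m^3}
```

The diameter term dominates because D enters squared, so its relative error counts twice.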

Page 20: Statistics Presentation Ch En 475 Unit Operations.

Monte Carlo Example Problem

ρ = 4 M / (π L D²)

Mav = 5        Lav = 0.75      Dav = 0.14
sM = 0.05      sL = 0.01       sD = 0.005
dM = 2.5 sM    dL = 2.5 sL     dD = 2.5 sD

Monte Carlo

Mtrial = rnorm(N, Mav, sM)
Ltrial = rnorm(N, Lav, sL)
Dtrial = rnorm(N, Dav, sD)

ρav = mean(4 Mtrial / (π Ltrial Dtrial²)) = 434.738
s = stdev(4 Mtrial / (π Ltrial Dtrial²)) = 32.091
uncertainty = 1.96 s = 62.899
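The same worksheet can be approximated in Python with the standard library, as a sketch rather than a translation of the Mathcad file: `random.gauss` stands in for rnorm, and N = 100,000 and the seed are assumptions. Because the draws are random, the results match the worksheet only to within sampling noise.

```python
import math
import random
import statistics

N = 100_000
rng = random.Random(1)  # fixed seed for repeatability

M_av, L_av, D_av = 5.0, 0.75, 0.14   # kg, m, m
s_M, s_L, s_D = 0.05, 0.01, 0.005    # standard deviation of each input

def density(m, length, d):
    """rho = 4 M / (pi L D^2): mass divided by the volume pi L D^2 / 4."""
    return 4 * m / (math.pi * length * d**2)

rho_trials = [
    density(rng.gauss(M_av, s_M), rng.gauss(L_av, s_L), rng.gauss(D_av, s_D))
    for _ in range(N)
]
rho_av = statistics.fmean(rho_trials)
s_rho = statistics.stdev(rho_trials)
uncertainty = 1.96 * s_rho           # 95% probable error
print(f"rho = {rho_av:.1f} ± {uncertainty:.1f} kg/m^3 (95%)")
```

The mean lands slightly above the value computed from the averages alone (about 433 kg/m³) because ρ depends nonlinearly on D.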

Page 21: Statistics Presentation Ch En 475 Unit Operations.

Problem 2

The DIPPR Database reports the following measured property values with associated uncertainties for methyl propionate:
• TC = 530.6 K ± 1%
• PC = 40.04 bar ± 3%
• VC = 0.282 m³/kmol ± 5%

Uncertainties are essentially 95% confidence intervals or 1.96s.

Determine ZC and use a Monte Carlo calculation to determine a 95% confidence interval.

Page 22: Statistics Presentation Ch En 475 Unit Operations.

Overall Summary

• measured variables: use average, std dev (data range), and student t-test (mean range and mean comparison)

• calculated variable: determine uncertainty
  -- Max error: propagating error with brute force
  -- Max error: propagating error analytically
  -- Probable error: propagating variance analytically
  -- Probable error: propagating variance with brute force (Monte Carlo)

Page 23: Statistics Presentation Ch En 475 Unit Operations.

Data and Statistical Expectations

1. Summary of raw data (table format)
2. Sample calculations, including statistical calculations
3. Summary of all calculations - table format is helpful
4. If measured variable: average and standard deviation, confidence of mean
5. If calculated variable: uncertainty using Monte Carlo (if possible)