Sampling and Variability (Chapter 5.1 - 5.4)

Sampling and Variability (Chapter 5.1 - 5.4) Chengyuan Peng 92777A [email protected]


Transcript of Sampling and Variability (Chapter 5.1 - 5.4)

Page 1: Sampling and Variability (Chapter 5.1 - 5.4)

Sampling and Variability

(Chapter 5.1 - 5.4)

Chengyuan Peng

92777A

[email protected]

Page 2: Sampling and Variability (Chapter 5.1 - 5.4)

Purpose of Sampling

• What is a data population?

• Problems with using all of the data

– The whole data set is not available

– Too much data

– Necessary to sample the data when building models

• Capture a sample:

– To represent only some part of the population

Page 3: Sampling and Variability (Chapter 5.1 - 5.4)

Variability of Variables

• Main feature of a variable

– Takes on a variety of values

– Its values follow a pattern, the distribution

• Numerical variables

• Categorical variables

• Graphical display of a distribution pattern

– Histogram, curve

• Problems

– Convergence: the true population distribution pattern is unknown

– Measuring variability: which distribution curve is the right one to use?

Page 4: Sampling and Variability (Chapter 5.1 - 5.4)

Converging

• To create a distribution curve for the sample

– Select instance values one at a time, at random

– Recalculate the curve each time a new instance value is added

• Convergence

– At first: each new value causes a large change

– After a while: the curve settles down and converges to its final shape

• Summary

– What is measured is not the shape of the curve but the variability of the sample
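The convergence behaviour described on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method from the chapter: the population is synthetic (normally distributed, chosen only for the example), the helper name `running_std` is hypothetical, and the sample standard deviation stands in for "the curve" as the quantity that is recalculated after each new instance.

```python
import random
import statistics


def running_std(population, sample_size, seed=0):
    """Draw instances one at a time at random and recompute the sample
    standard deviation after each addition (hypothetical helper)."""
    rng = random.Random(seed)
    sample = []
    history = []
    for _ in range(sample_size):
        sample.append(rng.choice(population))  # random draw, with replacement
        if len(sample) >= 2:  # stdev needs at least two values
            history.append(statistics.stdev(sample))
    return history


# Assumed, synthetic population: 10,000 normally distributed values.
_rng = random.Random(42)
population = [_rng.gauss(50.0, 10.0) for _ in range(10_000)]

history = running_std(population, sample_size=1_000)

# Largest change between consecutive estimates, early versus late.
early_change = max(abs(b - a) for a, b in zip(history[:50], history[1:51]))
late_change = max(abs(b - a) for a, b in zip(history[-50:], history[-49:]))

print(f"largest early change: {early_change:.4f}")
print(f"largest late change:  {late_change:.4f}")
```

Early estimates jump around because each new instance is a large fraction of the sample; later ones barely move, which is the "settling down" the slide refers to.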

Page 5: Sampling and Variability (Chapter 5.1 - 5.4)

Measuring Variability

• Requires some method of measuring variability

– Without being sensitive to column width or smoothing method

• What is variability

– How far the individual instances are from the mean of the sample

• Standard deviation: one popular measure

– One formula, where x is an instance value, m is the sample mean, and n is the number of instances:

$$s = \sqrt{\frac{\sum (x - m)^2}{n - 1}}$$

– Another formula, important for the data preparation process:

$$s = \sqrt{\frac{\sum x^2 - n m^2}{n - 1}}$$
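The two standard deviation formulas are algebraically equivalent, which a short Python sketch can confirm. The names `std_two_pass` and `RunningStd` and the sample values are illustrative assumptions. One plausible reason the second form matters for data preparation is that it needs only the running totals of x and x², so it can be updated as each instance arrives; the slide itself does not state this reason.

```python
import math


def std_two_pass(values):
    """First formula: sqrt(sum((x - m)^2) / (n - 1)); needs the mean first."""
    n = len(values)
    m = sum(values) / n
    return math.sqrt(sum((x - m) ** 2 for x in values) / (n - 1))


class RunningStd:
    """Second formula: sqrt((sum(x^2) - n * m^2) / (n - 1)), kept up to
    date from running totals so no instance has to be stored."""

    def __init__(self):
        self.n = 0
        self.sum_x = 0.0
        self.sum_x2 = 0.0

    def add(self, x):
        self.n += 1
        self.sum_x += x
        self.sum_x2 += x * x

    def value(self):
        m = self.sum_x / self.n
        # max(..., 0.0) guards against a tiny negative result from rounding.
        return math.sqrt(max(self.sum_x2 - self.n * m * m, 0.0) / (self.n - 1))


# Illustrative values only.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

running = RunningStd()
for x in values:
    running.add(x)

print(std_two_pass(values), running.value())
```

The sum-of-squares form can lose precision when values are very large relative to their spread, so the first form (or a dedicated numerically stable algorithm) is the safer choice when all the data can be held at once.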

Page 6: Sampling and Variability (Chapter 5.1 - 5.4)

• Why confidence

– An alternative to sampling the whole population

– To establish some acceptable degree of confidence

• 95% as a satisfactory level of confidence

Variability of Numeric and Alpha Variables

• Distinction

– Alpha: nominal / categorical; measured on nonnumeric scales

– Numeric: measured on numeric scales

– Variability is measured differently for each

Page 7: Sampling and Variability (Chapter 5.1 - 5.4)

• Measuring variability of numeric variables

– Covered above

– Random sampling without introducing bias

• Measuring variability of alpha variables

– Instead of standard deviation

– Rate of Discovery (ROD):

• Measures the rate of change of the relative proportion of values discovered

• As the sample size increases, the ROD of new alpha values falls
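A rough Python sketch of the rate-of-discovery idea follows. It is a simplified interpretation, not the chapter's exact definition: here the ROD is taken to be the fraction of instances in each consecutive window that reveal a category not seen before. The function name `rate_of_discovery`, the window size, and the synthetic 26-category variable are all assumptions made for illustration.

```python
import random


def rate_of_discovery(values, window, seed=0):
    """Shuffle the values, then report, for each consecutive window of
    instances, the fraction that revealed a previously unseen category."""
    rng = random.Random(seed)
    shuffled = list(values)
    rng.shuffle(shuffled)  # random order, to avoid order-related bias

    seen = set()
    rates = []
    for start in range(0, len(shuffled) - window + 1, window):
        new_values = 0
        for v in shuffled[start:start + window]:
            if v not in seen:
                seen.add(v)
                new_values += 1
        rates.append(new_values / window)
    return rates, seen


# Assumed, synthetic alpha variable: 26 categories with unequal frequencies.
_rng = random.Random(1)
categories = [chr(ord("a") + i) for i in range(26)]
weights = [1.0 / (i + 1) for i in range(26)]
data = _rng.choices(categories, weights=weights, k=5_000)

rates, seen = rate_of_discovery(data, window=100)

print("ROD in first window:", rates[0])
print("ROD in last window: ", rates[-1])
print("distinct values discovered:", len(seen))
```

Common categories are found almost immediately, while rare ones take longer to appear, so the rate drops toward zero as the sample grows. A low, stable ROD suggests the sample has captured most of the variability of the alpha variable.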