Frequency Distribution Statistics

Posted on 29-May-2017


Transcript of Frequency Distribution Statistics

FREQUENCY DISTRIBUTIONS

How to organize, present and analyze data

[Figure: Content of 60s Pop Songs, with the categories Yeah, Actual Lyrics, Baby, Oooh]


Consider the following example:
How old is John?
How old is Mary?
How old is Frank?
…
How old am I?

FREQUENCY DISTRIBUTIONS


EXAMPLE: DISCRETE VARIABLE

Consider a sample of 40 values representing the ages of EHL students (in whole years, hence a discrete variable).

40Ages manual
Count the number of times each age appears in the sample and tally it on the diagram provided.


ABSOLUTE FREQUENCY DISTRIBUTION

Here the y-values represent the absolute frequency, i.e. the number of times each age occurs in the sample.
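A minimal Python sketch of the tally step, assuming a small hypothetical list of ages (the 40 EHL values themselves are not reproduced in this transcript). The standard `collections.Counter` counts how many times each value occurs, which is exactly the absolute frequency of that value.

```python
from collections import Counter

# Hypothetical sample of ages in years (NOT the original 40-value EHL sample)
ages = [20, 21, 21, 22, 19, 21, 23, 22, 20, 21]

# Absolute frequency n_i: number of times each age occurs
abs_freq = Counter(ages)

# Print the distribution in increasing order of age
for age in sorted(abs_freq):
    print(age, abs_freq[age])
```

The frequencies always add up to the sample size, which is a quick check that no value was missed in the tally.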


RELATIVE FREQUENCY DISTRIBUTION

Here the y-values represent the relative frequency, i.e. the frequency as a percentage of the sample size ($f_i = n_i / n$). For example: 2/40 = 5%, 4/40 = 10%, 3/40 = 7.5%.
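Converting an absolute frequency into a relative one is a single division by the sample size. The sketch below uses a hypothetical helper `relative_frequency` to reproduce the fractions 2/40, 4/40 and 3/40, then checks that a hypothetical complete distribution of 40 ages sums to 100%.

```python
def relative_frequency(count: int, n: int) -> float:
    """Relative frequency as a percentage: 100 * n_i / n (hypothetical helper)."""
    return 100 * count / n

# The three example fractions, for a sample of n = 40
for count in (2, 4, 3):
    print(f"{count}/40 = {relative_frequency(count, 40)}%")

# Hypothetical complete distribution of ages (counts n_i sum to n = 40)
abs_freq = {19: 3, 20: 7, 21: 12, 22: 10, 23: 8}
n = sum(abs_freq.values())
rel_freq = {age: relative_frequency(c, n) for age, c in abs_freq.items()}
print(rel_freq)
```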


THE MOST FREQUENT VALUE: THE MODE

The MODE can be found with the Excel function MODE(ranges). Result: 21 years.

There are 8 students aged 21 in this sample. This is the LARGEST frequency, so 21 years is the MODE.

The set of these 8 students aged 21 is called the MODAL CLASS.
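Outside Excel, Python's standard `statistics.multimode` (Python 3.8+) returns every value that reaches the largest frequency. The ages below are hypothetical; a result containing more than one value flags a data set whose highest peaks are exactly tied.

```python
from statistics import multimode

# Hypothetical ages (not the original EHL sample)
ages = [20, 21, 21, 22, 19, 21, 23, 22, 20, 21]

# All values with the largest frequency (here a single mode)
modes = multimode(ages)
print(modes)  # [21]
```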


SPECIAL CASE

This frequency distribution has two (nearly equal) peaks: it is a bimodal distribution.


THE MEDIAN VALUE: A “DEMOCRATIC” VALUE

The median divides the data into two EQUAL parts:
• 50% of the data values lie BELOW the MEDIAN value
• 50% of the data values lie ABOVE the MEDIAN value

Excel function: MEDIAN(ranges)


POSITION OF THE MEDIAN

The MEDIAN value is 21.5 years (found with Excel). Notice that 20 students are younger and 20 students are older than the MEDIAN.


WHAT IS THE MEDIAN?

Median: the central data point of a data set after sorting.
• If the data set has an odd number of values, the median is the value at the center of the sorted data set.
• If the data set has an even number of values, the median is the average of the two values closest to the center of the sorted data set.

Example: annual precipitation in Geneva between 1976 and 1993 (mm)

583 890 777 958 875 926 524 756 619 730 688 528 901 884 969 1258 850 939

After sorting:

524 528 583 619 688 730 756 777 850 875 884 890 901 926 939 958 969 1258

To find the position of the median: $\frac{n+1}{2}$

Here: $\frac{18+1}{2} = 9.5$, i.e. the median sits halfway between the 9th and the 10th of the 18 sorted values, at the center of the data set: $\frac{850 + 875}{2} = 862.5$ mm.
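The definition above (sort, then take the middle value or the average of the two middle values) can be sketched in Python as follows; the function name `median` is our own, and Python's `statistics.median` gives the same result.

```python
# Annual precipitation in Geneva, 1976-1993 (mm)
precip = [583, 890, 777, 958, 875, 926, 524, 756, 619,
          730, 688, 528, 901, 884, 969, 1258, 850, 939]

def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                   # odd n: the single central value
    return (s[mid - 1] + s[mid]) / 2    # even n: average of the two central values

print(median(precip))  # 862.5
```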


THE AVERAGE (AVG) VALUE: A “BALANCED” MEASURE

Symbol: $\bar{x}$

Formula: $\bar{x} = \frac{\sum x_i}{n}$

where:
$x_i$: the values of the variable
$\sum$: SUM
$\sum x_i$: the SUM of ALL the given values
$n$: number of values

Excel function: AVERAGE(ranges)

NB: In many textbooks the average is called the “mean”. This gives the honest average a poor image, so the term is not used in this course.
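The formula translates directly into code. The Python sketch below applies it to the Geneva annual precipitation values (1976-1993, mm) from the median example; the function name `average` is our own.

```python
# Annual precipitation in Geneva, 1976-1993 (mm)
precip = [583, 890, 777, 958, 875, 926, 524, 756, 619,
          730, 688, 528, 901, 884, 969, 1258, 850, 939]

def average(values):
    """Average: the sum of all values divided by the number of values n."""
    return sum(values) / len(values)

avg = average(precip)
print(round(avg, 2))  # 814.17
```

Here the average (about 814 mm) is below the median (862.5 mm), because the single very wet year (1258 mm) is outweighed by several dry years well below the center.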


POSITION OF THE AVG

The AVG value is 21.65 years (found with Excel). This point on the Age axis can be considered the CENTROID (balance point) of the distribution, hence the idea of a “balanced” value.


QUICK QUIZ

You surveyed 10 different families to find out how many children they have. You obtained the following observations: 0, 0, 1, 1, 2, 2, 2, 3, 4, 5

Indicate whether each statement is true or false.
• The mode is 5
• The average is 2.5
• The median is 2
• The variable is quantitative
• The variable is quantitative continuous


THE AVG OF CLASSIFIED DATA

When data are classified or otherwise grouped, the average is calculated as follows:

Formula: $\bar{x} = \frac{\sum n_i x_i}{\sum n_i} = \frac{\sum n_i x_i}{n}$

where:
$x_i$ = the value of the variable at the MIDDLE of the frequency class
$n_i$ = the frequency of the class

40Ages computer
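A minimal sketch of the classified-data formula, with hypothetical age classes written as (lower bound, upper bound, frequency). The class midpoint (lower + upper) / 2 stands in for every value in its class, so the result approximates the average of the raw data rather than reproducing it exactly.

```python
# Hypothetical grouped ages: (lower bound, upper bound, frequency n_i)
classes = [(18, 20, 6), (20, 22, 20), (22, 24, 10), (24, 26, 4)]

def grouped_average(classes):
    """Average of classified data: sum(n_i * x_i) / sum(n_i), x_i = class midpoint."""
    total = sum(n_i for _, _, n_i in classes)
    weighted = sum(n_i * (low + high) / 2 for low, high, n_i in classes)
    return weighted / total

grouped_avg = grouped_average(classes)
print(grouped_avg)  # 21.6
```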


SYMMETRICAL DISTRIBUTIONS

In a perfectly symmetrical (single-peaked) frequency distribution, the MODE, MEDIAN and AVG coincide.


ASYMMETRICAL DISTRIBUTIONS

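In an asymmetrical frequency distribution, these three parameters separate. The distribution shown is skewed to the right (its long tail points toward large values), and the parameters typically appear in the order MODE < MEDIAN < AVG. The mirror image of this situation (skewed to the left, AVG < MEDIAN < MODE) is also possible.

[Figure: right-skewed distribution with the positions of AVG, MEDIAN and MODE marked]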
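To see the ordering on actual numbers, here is a hypothetical right-skewed sample in which a few large values pull the average up. Python's standard `statistics` module supplies the mode, median and average; the order MODE < MEDIAN < AVG is a common pattern for right skew, not a rule that holds for every data set.

```python
from statistics import mean, median, multimode

# Hypothetical right-skewed sample: mostly small values, a long tail of large ones
data = [1, 2, 2, 2, 3, 3, 4, 5, 8, 10]

mode_value = multimode(data)[0]   # most frequent value
print("mode:", mode_value, "median:", median(data), "average:", mean(data))
```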


THE RANGE OF A GROUP OF VALUES

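The range is the difference between the largest and the smallest value of the data set: Range = Maximum − Minimum.

[Figure: Age distribution of 40 students]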
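The range (largest value minus smallest value) takes one line to compute. As an illustration, here it is in Python for the Geneva annual precipitation values (mm); the variable is named `value_range` to avoid shadowing Python's built-in `range`.

```python
# Annual precipitation in Geneva, 1976-1993 (mm)
precip = [583, 890, 777, 958, 875, 926, 524, 756, 619,
          730, 688, 528, 901, 884, 969, 1258, 850, 939]

# Range = maximum - minimum
value_range = max(precip) - min(precip)
print(value_range)  # 734
```

Because it depends only on the two extreme values, the range is very sensitive to a single unusual observation such as the 1258 mm year.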


QUICK QUIZ

From the following frequency distribution, indicate whether each statement is true or false.

[Figure: frequency distribution, frequencies 0 to 60 on the vertical axis, values 1 to 17 on the horizontal axis]

• The distribution is left-skewed
• The mode is smaller than the median and the average
• Mode = Median = Average
• The mode is between 50 and 60
• The average is higher than 5
• The median is between 4 and 5


EXERCISE 1

You are given the burger sizes of the last 20 burgers sold at a fast-food restaurant. Answer the following questions.
• What is the type of the variable “Burger Size”?
• Compute the range.
• Calculate the mode, median and average.
• Classify the data into 4 classes and compute the frequency distribution.
• Represent the relative frequency distribution graphically and comment on it.


QUICK QUIZ

The table below reports the number of clients who came to your restaurant on each of the last 50 days.

Compute the missing values (marked ?):

x_i  | n_i | f_i    | F_i
25   | 5   | 10.00% | 10.00%
26   | ?   | 12.00% | ?
27   | ?   | ?      | 32.00%
28   | 9   | 18.00% | 50.00%
29   | 11  | 22.00% | 72.00%
30   | ?   | ?      | ?
> 30 | 5   | 10.00% | 100.00%

Indicate whether each statement is true or false.
• x3 = 27 clients
• The sample size is 50 clients
• f4 = 18%: on 18% of the days, 28 clients came to your restaurant
• The median is 28 clients
• The average cannot be calculated


EXERCISE 2

Using the customer satisfaction feedback data for a service, answer the following questions:
• What is the type of the variable?
• Compute the absolute and relative frequency distributions.
• Graph the relative frequency distribution and comment on your results.


GRAPHICAL TOOLS

Use of different graphical representations depends on the nature (qualitative or quantitative) of the variable being studied.

Qualitative variable
• Circle diagram
• Bar chart

Quantitative variable
• Discrete
  • Bar chart
  • Stem and leaf
  • Box plot
• Continuous
  • Histogram
  • Density curve
  • Box plot


GRAPHICAL TOOLS: CIRCLE DIAGRAM

Represents the categories of the variable as sectors of a disc. The size of each category's sector is set by its angle, which is proportional to the observed relative frequency $f_i$:

$\alpha_i = 360° \times f_i$
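The angle formula can be checked with a short sketch. The satisfaction categories and counts below are hypothetical; each angle is 360° times the relative frequency $f_i = n_i / n$, so the angles always add up to 360°.

```python
# Hypothetical counts n_i for a qualitative variable (sample size n = 50)
counts = {"Satisfied": 25, "Neutral": 15, "Unsatisfied": 10}
n = sum(counts.values())

# alpha_i = 360 degrees * f_i, with f_i = n_i / n
angles = {category: 360 * n_i / n for category, n_i in counts.items()}

for category, angle in angles.items():
    print(f"{category}: {angle:.1f} degrees")
```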


GRAPHICAL TOOLS: BAR CHART

Represents each possible value of the variable by a bar whose height is its absolute or relative frequency.


GRAPHICAL TOOLS: STEM AND LEAF PLOTS

Annual precipitation in Geneva between 1976 and 1993 (mm):

583 890 777 958 875 926 524 756 619 730 688 528 901 884 969 1258 850 939

Procedure:
• Separate each number into a stem and a leaf. Here, each value is first rounded to the nearest ten (e.g. 528 → 530); the number of hundreds is the stem and the tens digit is the leaf.
• Group the numbers with the same stems.

Stem | Leaf
5 | 2 3 8
6 | 2 9
7 | 3 6 8
8 | 5 8 8 9
9 | 0 3 4 6 7
10 |
11 |
12 | 6

Remarks:
• Stem and leaf plots show both the distribution of the data and the data values themselves.
• The leaves are sorted in increasing order.
• The most difficult step is choosing the scale: tens/hundreds, sometimes 5/50, 2/20, etc.
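The stem-and-leaf procedure for the Geneva data can be sketched in Python as follows. It assumes the rounding convention implied by the plot on this slide (round half up to the nearest ten, then split into hundreds and tens); the function name `stem_and_leaf` is our own. Empty stems (10 and 11) are kept so that the gap before 1258 stays visible.

```python
from collections import defaultdict

# Annual precipitation in Geneva, 1976-1993 (mm)
precip = [583, 890, 777, 958, 875, 926, 524, 756, 619,
          730, 688, 528, 901, 884, 969, 1258, 850, 939]

def stem_and_leaf(values):
    """Stem = hundreds, leaf = tens digit, after rounding each value to the nearest ten."""
    plot = defaultdict(list)
    for v in sorted(values):              # sorting first keeps the leaves in increasing order
        tens = (v + 5) // 10              # round half up, expressed in units of 10 (528 -> 53)
        stem, leaf = divmod(tens, 10)     # e.g. 126 -> stem 12, leaf 6
        plot[stem].append(leaf)
    low, high = min(plot), max(plot)
    # Include every stem between the extremes, even those without leaves
    return {stem: plot.get(stem, []) for stem in range(low, high + 1)}

for stem, leaves in stem_and_leaf(precip).items():
    print(f"{stem:>2} | {' '.join(map(str, leaves))}")
```

Running it should give stems 5 to 12 with the same leaves as the plot on this slide.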


QUICK QUIZ

As a marketing consultant, you observed 50 consecutive shoppers at a grocery store and recorded how much money each shopper spent in the store. The following graph provides this information.

Reading scale: 1 | 0 represents 10 francs

0 | 2 7 7 8 9
1 | 0 1 2 3 3 4 4 4 5 5 5 5 7 7 8 8 9
2 | 0 0 1 1 1 1 4 6 7 9 9
3 | 1 2 3 3 4 5 6 8 9
4 | 1 4 6
5 | 2
6 | 2 4 4 9

Indicate whether each statement is true or false.
• This graphical representation is called a histogram.
• The average expenditure cannot be calculated.
• The expenditure distribution is skewed to the left.
• The median is at 21.


QUICK QUIZ

The scores of a team in the last Statistics quiz are given in the stem-and-leaf plot below. The quiz was graded out of 70 points.

Reading scale: 1 | 5 represents 15 points

1 | 0 7 9
2 | 1 1 3 6 8
3 | 0 1 3 5 6 7 7
4 | 1 1 1 2
5 |
6 | 9

Indicate whether each statement is true or false.
• Team 2 is made up of 6 students.
• The range of the scores is 59.
• The highest obtained score is 70.
• The median is 32.
• 40% of the students totaled less than 30 points.
• The average cannot be calculated.
• The variable is quantitative discrete.
• 25% of the students have more than 36 points.
• The circle diagram could be a good graphical representation of the observations.


GRAPHICAL TOOLS: HISTOGRAM

Represents the distribution of the variable, taking into account both the frequency and the width (amplitude) of each class.

[Figure: Distribution of employees' wages by salary class, Switzerland 2008. Monthly net salary, private and public sector (Confederation) together]
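When classes have unequal widths, a histogram bar's height is usually the frequency divided by the class width (a frequency density), so that the bar's area, not its height, is proportional to the frequency. The sketch below assumes hypothetical salary-like classes with relative frequencies in %; the helper `histogram_heights` is our own, and heights are expressed per 1000 units of the variable.

```python
UNIT = 1000  # heights are expressed per 1000 units of the variable (e.g. per 1000 CHF)

# Hypothetical classes: (lower bound, upper bound, relative frequency f_i in %)
classes = [(0, 2000, 10), (2000, 3000, 20), (3000, 5000, 40), (5000, 10000, 30)]

def histogram_heights(classes, unit=UNIT):
    """Bar height = relative frequency / class width (width measured in `unit`s)."""
    return [(low, high, f_i / ((high - low) / unit)) for low, high, f_i in classes]

for low, high, height in histogram_heights(classes):
    print(f"{low}-{high}: {height:.1f}% per {UNIT}")
```

In this example the 3000-5000 class holds twice the frequency of the 2000-3000 class but gets the same bar height, because it is twice as wide.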


GRAPHICAL TOOLS: BOX PLOT

A great visual representation of many important characteristics of a data set.

Data needed:
• Minimum and Maximum
• Average
• Median
• First and Third quartiles (Q1 and Q3)
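The values listed for a box plot can be computed with Python's `statistics` module (3.8+). Quartile definitions differ between textbooks and software; this sketch uses `quantiles(..., method="inclusive")`, which interpolates at position (n − 1)p and, to our understanding, matches Excel's QUARTILE.INC. It is applied to the Geneva annual precipitation values (mm) and also reports the interquartile range IQR = Q3 − Q1; the function name `box_plot_summary` is our own.

```python
from statistics import mean, quantiles

# Annual precipitation in Geneva, 1976-1993 (mm)
precip = [583, 890, 777, 958, 875, 926, 524, 756, 619,
          730, 688, 528, 901, 884, 969, 1258, 850, 939]

def box_plot_summary(values):
    """Min, Q1, median, Q3, max, plus the average and the IQR."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "Q1": q1,
        "median": q2,
        "Q3": q3,
        "max": max(values),
        "average": mean(values),
        "IQR": q3 - q1,
    }

for name, value in box_plot_summary(precip).items():
    print(f"{name}: {value:.2f}")
```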


BOX PLOT ILLUSTRATION


QUICK QUIZ

The box plot below represents Swiss civil aviation airport traffic in 2009.

From the box plot, indicate whether each statement is true or false.
• 75% of airports have an annual traffic lower than 100'000 flights.
• Half of the airports have an annual traffic greater than 70'000 flights.
• The skew is positive.
• Two airports in particular have most of the traffic.


GRAPH EXAMPLES


In October 2012, a well-known newspaper reported that “the average salary in Switzerland is ranked 6th among the 29 countries used for the study”. Below is the reference graph published by the OFS (Office fédéral de la statistique, the Swiss Federal Statistical Office). What can you conclude?


QUICK QUIZ

We would like to study the distribution of net monthly salary for Swiss employees in 2013. The relative frequencies per class are given in the table below:

Salary class (CHF) | Relative frequency
0-3000 | 2%
3000-4000 | 14%
4000-5000 | 24%
5000-6000 | 20%
6000-7000 | 13%
7000-8000 | 9%
8000 and more | 19%
Total | 100%

Given this information, indicate whether each statement is true or false.
• The data cannot be graphically represented in terms of relative frequency because the last class “8000 and more” is open.
• The most suitable graph is the circle diagram because the variable “Salary” is quantitative continuous.
• A histogram would be the best graphical representation of the data.
• The stem-and-leaf graph is not possible because the variable “Salary” is classified.


EXERCISE 3

The life cycle of 20 bulbs from the company Superligth SA was measured during a quality control. The results obtained are shown in a stem-and-leaf plot (see Excel file).
• Find the quartiles of this distribution and compute the IQR.
• Find the average life cycle, knowing that the values in the plot sum to 18800 hours.
• Find the mode.


EXERCISE 4

Answer the following questions using the available distribution of exam grades.
• How many students took the exam?
• Compute the 5-number summary of the exam results.
• What is the average grade?
• Draw the graph of the distribution and comment on it.