Statistics and probability theory mth 262

Statistics and Probability Theory MTH-262 BS(CS)-V Course Instructor: Sajdah Hassan

Transcript of Statistics and probability theory mth 262

Page 1: Statistics and probability theory mth 262

Statistics and Probability Theory MTH-262

BS(CS)-V

Course Instructor: Sajdah Hassan

Page 2: Statistics and probability theory mth 262

Lecture 4 Descriptive Statistics

Descriptive Statistics

Presenting Data

Describing Data

Page 3: Statistics and probability theory mth 262

Presentation of Data

1. Classification

2. Tabulation

3. Frequency Distribution

4. Stem and Leaf Display

5. Graphical Presentation

Page 4: Statistics and probability theory mth 262

Classification

Classification is the process of dividing a set of observations or objects into classes or groups in such a way that

1. Observations or objects in the same class or group are similar

2. Observations or objects in one class or group are dissimilar to observations or objects in other classes or groups

Page 5: Statistics and probability theory mth 262

Types of Classification

When the data are sorted according to one criterion only, it is called a “simple classification” or a “one-way classification”.

Classification is called two-way classification when the data are sorted according to two criteria.

Similarly, manifold classification is made according to several criteria.

Data may be classified according to qualitative, temporal and geographical characteristics.

Page 6: Statistics and probability theory mth 262

Example:

Page 7: Statistics and probability theory mth 262

2. Tabulation

The process of placing classified data into tabular form is known as tabulation. A table is a systematic arrangement of statistical data in rows and columns. Rows are horizontal arrangements, whereas columns are vertical arrangements. A table may be simple, double or complex depending upon the type of classification.

Page 8: Statistics and probability theory mth 262

Main Parts of Tables

A statistical table has at least four major parts and some other minor parts.

(1) The Title

(2) The Box Head (column captions)

(3) The Stub (row captions)

(4) The Body

(5) Prefatory Notes

(6) Foot Notes

(7) Source Notes

Page 9: Statistics and probability theory mth 262

Cont…

(1) The Title: The title is the main heading, written in capital letters and shown at the top of the table. It must explain the contents of the table and throw light on the table as a whole. Different parts of the heading can be separated by commas; no full stop should be used in the title.

(2) The Box Head (column captions): The vertical headings and subheadings of the columns are called column captions. The space where these column headings are written is called the box head. Only the first letter of the box head is capitalized; the remaining words must be written in small letters.

(3) The Stub (row captions): The horizontal headings and subheadings of the rows are called row captions, and the space where these row headings are written is called the stub.

(4) The Body: This is the main part of the table, which contains the numerical information classified with respect to the row and column captions.

Page 10: Statistics and probability theory mth 262

(5) Prefatory Notes: A statement given below the title and enclosed in brackets, usually describing the units of measurement, is called a prefatory note.

(6) Foot Notes: These appear immediately below the body of the table and provide further explanation.

(7) Source Notes: The source note is given at the end of the table and indicates the source from which the information has been taken. It includes information about the compiling agency, publication, etc.

Page 11: Statistics and probability theory mth 262

Cont….

General Rules of Tabulation:

A table should be simple and attractive. There should be no need of further explanations (details).

Proper and clear headings for columns and rows should be used.

Suitable approximation may be adopted and figures may be rounded off.

The unit of measurement should be well defined.

If the observations are large in number they can be broken into two or three tables.

Thick lines should be used to separate the data under big classes and thin lines to separate the sub classes of data.

Page 12: Statistics and probability theory mth 262

Table format:

----THE TITLE----

----Prefatory Notes----

----Box Head----

----Row Captions---- ----Column Captions----

----Stub Entries----  ----The Body----

 

Page 13: Statistics and probability theory mth 262

Example

Page 14: Statistics and probability theory mth 262

Frequency Distribution

Page 15: Statistics and probability theory mth 262

Frequency Distribution

One method for simplifying and organizing data is to construct a frequency distribution.

A frequency distribution is an organized tabulation showing exactly how many individuals are located in each category on the scale of measurement. A frequency distribution presents an organized picture of the entire set of scores, and it shows where each individual is located relative to others in the distribution.

A tabular summary of data into classes or groups, together with the number of observations in each class or group, is called a frequency distribution.

Page 16: Statistics and probability theory mth 262

Example: Marada Inn

Guests staying at Marada Inn were asked to rate the quality of their accommodations as being excellent, above average, average, below average, or poor. The ratings provided by a sample of 20 guests are:

Below Average, Poor, Average, Above Average

Above Average, Below Average, Above Average, Poor

Above Average, Average, Below Average, Above Average

Average, Above Average, Excellent, Above Average

Above Average, Average, Above Average, Average

Page 17: Statistics and probability theory mth 262

Frequency Distribution

Rating Frequency

Poor 2

Below Average 3

Average 5

Above Average 9

Excellent 1

Total=20
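The tally above can be reproduced with a short Python sketch. The variable names (`ratings`, `scale`, `frequency_table`) are illustrative choices, not part of the lecture; the data are the 20 Marada Inn ratings as listed.

```python
from collections import Counter

# The 20 guest ratings from the Marada Inn example, in the order listed.
ratings = [
    "Below Average", "Poor", "Average", "Above Average",
    "Above Average", "Below Average", "Above Average", "Poor",
    "Above Average", "Average", "Below Average", "Above Average",
    "Average", "Above Average", "Excellent", "Above Average",
    "Above Average", "Average", "Above Average", "Average",
]

# Fixed category order (worst to best) so the table reads like the slide.
scale = ["Poor", "Below Average", "Average", "Above Average", "Excellent"]

# Count how many times each rating occurs.
counts = Counter(ratings)
frequency_table = [(category, counts[category]) for category in scale]

for category, frequency in frequency_table:
    print(f"{category:<15}{frequency:>3}")
print(f"{'Total':<15}{sum(counts.values()):>3}")
```

Listing the categories in a fixed order matters for ordinal data: a plain count would otherwise print the ratings in the order they first appear.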

Page 18: Statistics and probability theory mth 262

Grouped frequency distribution

Grouped frequency distributions can be used when the range of values in the data set is very large. The data must be grouped into classes that are more than one unit in width.

Example: Blood samples were taken from 36 male volunteers as part of a study to determine the natural variation in CK concentration. The serum CK concentrations, measured in U/L, are as follows:

Page 19: Statistics and probability theory mth 262

Cont…

121 82 100 151 68 58

95 145 64 201 101 163

84 57 139 60 78 94

119 104 110 113 118 203

62 83 67 93 92 110

25 123 70 48 95 42

Page 20: Statistics and probability theory mth 262

Grouped Frequency distribution

Serum CK (U/L) class limits Frequency Cumulative Frequency

20-39 1 1

40-59 4 5

60-79 7 12

80-99 8 20

100-119 8 28

120-139 3 31

140-159 2 33

160-179 1 34

180-199 0 34

200-219 2 36

Total 36
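As a check on the table, the following Python sketch counts the 36 serum CK values into the same classes (width 20, starting at 20) and accumulates the frequencies. The names `ck`, `lower_limits`, `frequencies` and `cumulative` are illustrative.

```python
from itertools import accumulate

# The 36 serum CK measurements from the example.
ck = [121, 82, 100, 151, 68, 58,
      95, 145, 64, 201, 101, 163,
      84, 57, 139, 60, 78, 94,
      119, 104, 110, 113, 118, 203,
      62, 83, 67, 93, 92, 110,
      25, 123, 70, 48, 95, 42]

width = 20
lower_limits = list(range(20, 220, width))  # 20, 40, ..., 200

# A value belongs to the class whose limits lo .. lo + width - 1 contain it.
frequencies = [sum(lo <= x <= lo + width - 1 for x in ck) for lo in lower_limits]

# Cumulative frequency is the running total of the frequencies.
cumulative = list(accumulate(frequencies))

for lo, f, cf in zip(lower_limits, frequencies, cumulative):
    print(f"{lo}-{lo + width - 1:<6}{f:>4}{cf:>5}")
```

The last cumulative frequency must equal the number of observations (36), which is a quick way to catch a tallying mistake.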

Page 21: Statistics and probability theory mth 262

Terms Associated with a Grouped Frequency Distribution

Class limits represent the smallest and largest data values that can be included in a class.

In the serum CK example, the values 20 and 39 of the first class are the class limits.

The lower class limit is 20 and the upper class limit is 39.

The class boundaries are used to separate the classes so that there are no gaps in the frequency distribution. For the first class the boundaries are 19.5 and 39.5.

The class width for a class in a frequency distribution is found by subtracting the lower (or upper) class limit of the previous class from the lower (or upper) class limit of the class. In the serum CK example the class width is 40 – 20 = 20.
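These terms can be illustrated on the first two serum CK classes. The sketch below is only an illustration and assumes whole-number data, so that consecutive class limits are one unit apart; the variable names are made up for the example.

```python
# Class limits of the first two serum CK classes from the example.
first_class = (20, 39)
second_class = (40, 59)

# The gap between consecutive classes is 1 unit (39 to 40), so each
# boundary sits half of that gap outside the class limits.
gap = second_class[0] - first_class[1]
boundaries = (first_class[0] - gap / 2, first_class[1] + gap / 2)

# Class width: lower limit of one class minus lower limit of the previous class.
class_width = second_class[0] - first_class[0]

print(boundaries)   # (19.5, 39.5)
print(class_width)  # 20
```

Note that the width also equals the difference between the two boundaries of a class (39.5 minus 19.5), not the difference between its two limits (39 minus 20).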

Page 22: Statistics and probability theory mth 262

Guidelines for Constructing a Frequency Distribution

There should be between 5 and 20 classes.

The classes must be mutually exclusive.

The classes must be equal in width.

Page 23: Statistics and probability theory mth 262

Procedure for Constructing a Grouped Frequency Distribution

Find the highest and lowest value.

Find the range

Select the number of classes desired.

Formula for number of classes is

Where

Find the width by dividing the range by the number of classes and rounding up
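The formula for the number of classes is not reproduced above. As an assumption only (it may not be the rule intended in the lecture), one widely used rule of thumb is Sturges' rule, k = 1 + log2(n), where n is the number of observations. The sketch below applies it, rounding up, and then computes the class width as described in the last step. Both function names are hypothetical.

```python
import math

def sturges_classes(n: int) -> int:
    """Number of classes suggested by Sturges' rule (an assumed choice), rounded up."""
    return math.ceil(1 + math.log2(n))

def class_width(highest: float, lowest: float, k: int) -> int:
    """Range divided by the number of classes, rounded up to a whole number."""
    return math.ceil((highest - lowest) / k)

print(sturges_classes(20))    # 6
print(sturges_classes(36))    # 7
print(class_width(22, 5, 6))  # 3
```

The rule only gives a starting point; the guideline of using between 5 and 20 classes still applies, and the analyst may adjust the number for convenience.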

Page 24: Statistics and probability theory mth 262

Procedure for Constructing a Grouped Frequency Distribution

Select a starting point (usually the lowest value); add the width to get the lower limits.

Find the upper class limits.

Find the boundaries.

Tally the data, find the frequencies and find the cumulative frequency

Page 25: Statistics and probability theory mth 262

Example: Grouped Frequency distribution

In a survey of 20 patients who smoked, the following data were obtained. Each value represents the number of cigarettes the patient smoked per day. Construct a frequency distribution using six classes. (The data is given on the next slide.)

Page 26: Statistics and probability theory mth 262

Example: Grouped frequency distribution

Page 27: Statistics and probability theory mth 262

Example: Grouped Frequency distribution

Step 1: Find the highest and lowest values: H = 22 and L = 5.

Step 2: Find the range: R = H – L = 22 – 5 = 17.

Step 3: Select the number of classes desired. In this case it is equal to 6.

Step 4: Find the class width by dividing the range by the number of classes. Width = 17/6 = 2.83. This value is rounded up to 3.

Step 5: Select a starting point for the lowest class limit. For convenience, this value is chosen to be 5, the smallest data value. The lower class limits will be 5, 8, 11, 14, 17 and 20.

Page 28: Statistics and probability theory mth 262

Example: Grouped Frequency distribution

Step 6: The upper class limits will be 7, 10, 13, 16, 19 and 22. For example, the upper limit for the first class is computed as 8 - 1, etc.

Step 7: Find the class boundaries by subtracting 0.5 from each lower class limit and adding 0.5 to the upper class limit.

Step 8: Tally the data, write the numerical values for the tallies in the frequency column and find the cumulative frequencies.

The grouped frequency distribution is shown on the next slide.
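Steps 1 to 7 can be sketched in Python as follows. The raw data are not needed for these steps, only the highest value, the lowest value and the chosen number of classes. The sketch assumes whole-number data, so each upper limit is one less than the next lower limit; the variable names are illustrative.

```python
import math

# Steps 1 and 3: values taken from the worked example.
highest, lowest, num_classes = 22, 5, 6

# Step 2: range.
data_range = highest - lowest                 # 17

# Step 4: class width, rounded up.
width = math.ceil(data_range / num_classes)   # 17 / 6 = 2.83 -> 3

# Step 5: lower class limits, starting from the lowest value.
lower_limits = [lowest + i * width for i in range(num_classes)]

# Step 6: upper class limits (next lower limit minus 1 for integer data).
upper_limits = [lo + width - 1 for lo in lower_limits]

# Step 7: class boundaries, half a unit outside the limits.
boundaries = [(lo - 0.5, up + 0.5) for lo, up in zip(lower_limits, upper_limits)]

print(lower_limits)  # [5, 8, 11, 14, 17, 20]
print(upper_limits)  # [7, 10, 13, 16, 19, 22]
print(boundaries)
```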

Page 29: Statistics and probability theory mth 262

Example: Grouped Frequency distribution

Class Limits Class Boundaries Frequency Cumulative Frequency

05 to 07 4.5 - 7.5 2 2

08 to 10 7.5 - 10.5 3 5

11 to 13 10.5 - 13.5 6 11

14 to 16 13.5 - 16.5 5 16

17 to 19 16.5 - 19.5 3 19

20 to 22 19.5 - 22.5 1 20

Page 30: Statistics and probability theory mth 262

Mid points or class marks

We can also find the mid point of each class by averaging the lower and upper class limit or class boundary of that class.

Class limits Mid points or class marks

5-7 6

8-10 9

11-13 12

14-16 15

17-19 18

20-22 21
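The mid points in the table can be checked with a few lines of Python. The sketch also confirms that averaging the class boundaries gives the same result as averaging the class limits; the variable names are illustrative.

```python
# Class limits from the cigarette example.
class_limits = [(5, 7), (8, 10), (11, 13), (14, 16), (17, 19), (20, 22)]

# Mid point = (lower limit + upper limit) / 2.
midpoints = [(lo + up) / 2 for lo, up in class_limits]

# The same calculation using class boundaries (limits widened by 0.5).
boundary_midpoints = [((lo - 0.5) + (up + 0.5)) / 2 for lo, up in class_limits]

print(midpoints)  # [6.0, 9.0, 12.0, 15.0, 18.0, 21.0]
```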

Page 31: Statistics and probability theory mth 262

Question

Make a frequency distribution of the given data, which relate to the weights, recorded to the nearest gram, of 60 apples picked at random from a consignment.

106,107,76,82,109,107,115,93,187,95,123,125,111,92,86,70,126,68,130,129,139,119,115,128,100,186,84,99,113,204,111,141,136,123,90,115,98,110,78,185,162,178,140,152,173,146,158,194,148,90,107,181,131,75,184,104,110,80,118,82

Page 32: Statistics and probability theory mth 262

Stem and Leaf Display

A clear disadvantage of using a frequency table is that the identity of individual observations is lost in the grouping process. To overcome this drawback, a stem and leaf display is used, which offers a quick way of sorting and displaying data where each number in the data is divided into two parts.

I. STEM: The stem is the leading digit(s) of each number and is used in sorting.

II. LEAF: The leaf is the rest of the number, i.e. the trailing digit(s), and is shown in the display.

Page 33: Statistics and probability theory mth 262

Stem and Leaf Display

A vertical line separates the leaf (or leaves) from the stem. For example, the number 243 can be split in two ways:

Leading digit | Trailing digits or Leading digits | Trailing digit

2 | 43 or 24 | 3

Stem | Leaf or Stem | Leaf

The resulting display provides an organized picture of the entire distribution. The number of leaves beside each stem corresponds to the frequency, and the individual leaves identify the individual scores.
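A minimal Python sketch of the idea follows. The helper names `stem_and_leaf` and `render` are hypothetical, and the sample values are the first twelve serum CK measurements from the grouped frequency example. The `leaf_unit` argument decides where a number is split: 10 gives a one-digit leaf (243 becomes 24 | 3) and 100 gives a two-digit leaf (243 becomes 2 | 43).

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit=10):
    """Group sorted values by stem; each stem maps to its list of leaves."""
    stems = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, leaf_unit)
        stems[stem].append(leaf)
    return dict(stems)

def render(stems):
    """Lay out one line per stem (one-digit leaves), including empty stems."""
    lines = []
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}".rstrip())
    return "\n".join(lines)

# First twelve serum CK values from the earlier example.
sample = [121, 82, 100, 151, 68, 58, 95, 145, 64, 201, 101, 163]
display = stem_and_leaf(sample)
print(render(display))
```

Stems with no leaves are still printed, so that gaps in the data remain visible in the display.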

Page 34: Statistics and probability theory mth 262

Example: stem and leaf display

Page 35: Statistics and probability theory mth 262

Graphical Representation of data

Page 36: Statistics and probability theory mth 262

Graphical Representation

The visual display of statistical data in the form of points, lines, areas and other geometrical forms and symbols is, in the most general terms, known as graphical representation.

Statistical data can be studied with this method without going through figures presented in the form of tables.

Page 37: Statistics and probability theory mth 262

Graphs for Nominal or Ordinal Data

1. Bar chart

A bar chart displays the distribution of a categorical variable, showing the counts for each category next to each other for easy comparison.

For non-numerical values (scores), a bar chart is used.

Spaces between adjacent bars indicate discrete categories without order (nominal) or of unmeasurable width (ordinal).

Multiple bar charts, component bar charts and sub-divided rectangles are also used for graphical representation of categorical data.
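As a rough illustration, a bar chart can be sketched in plain text, with one bar per category and bar length equal to the count. The example below uses the Marada Inn rating frequencies from the frequency distribution section; the function name `text_bar_chart` is hypothetical.

```python
# Rating frequencies from the Marada Inn example.
frequencies = {
    "Poor": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}

def text_bar_chart(freqs, symbol="#"):
    """Return one text line per category: label, bar, and count."""
    label_width = max(len(label) for label in freqs)
    return [
        f"{label:<{label_width}} | {symbol * count} {count}"
        for label, count in freqs.items()
    ]

chart = text_bar_chart(frequencies)
print("\n".join(chart))
```

Each category gets its own separate line, which mirrors the spaces left between bars for nominal or ordinal data.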

Page 38: Statistics and probability theory mth 262

Example: Bar Graph

Page 39: Statistics and probability theory mth 262

Example

Data on the total number of persons on a ship can be shown in a table according to class.

Page 40: Statistics and probability theory mth 262

Example: Bar Chart

Page 41: Statistics and probability theory mth 262

2.Pie Charts

When we are interested in parts of the whole, a pie chart might be our display of choice.

Pie charts show the whole group of cases as a circle. They slice the circle into pieces whose size is proportional to the fraction of the whole in each category.

Pie chart for ship data

Page 42: Statistics and probability theory mth 262

How to Accurately Create Pie Charts

Convert Your Data: Convert all data values to percentages of the whole data set.

For example, four radishes, three cucumbers, two carrots and one pepper equals 40 percent radishes, 30 percent cucumbers, 20 percent carrots and 10 percent peppers.

Convert the percentages into angles. Since a full circle is 360 degrees, multiply this by each percentage (written as a decimal) to get the angle for each section of the pie. For the radishes, 0.4 × 360 = 144 degrees. For the cucumbers, 0.3 × 360 = 108 degrees. For the carrots, 0.2 × 360 = 72 degrees. For the peppers, 0.1 × 360 = 36 degrees.

Make sure the angle calculations are correct by adding all the angles. The total should be 360: 144 + 108 + 72 + 36 = 360. You may be off by a tenth or so due to rounding, so be careful.
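The conversion from counts to percentages and angles can be sketched in Python using the vegetable example above. The variable names are illustrative; the final check uses a tolerance because rounding can leave the total slightly off 360.

```python
import math

# Counts from the example: four radishes, three cucumbers, two carrots, one pepper.
counts = {"radishes": 4, "cucumbers": 3, "carrots": 2, "peppers": 1}
total = sum(counts.values())

# Percentage of the whole for each category.
percentages = {name: 100 * count / total for name, count in counts.items()}

# Angle of each slice: the same fraction of a full 360-degree circle.
angles = {name: 360 * count / total for name, count in counts.items()}

for name in counts:
    print(f"{name:<10}{percentages[name]:>6.1f}%{angles[name]:>8.1f} degrees")

# The angles should add up to a full circle.
assert math.isclose(sum(angles.values()), 360)
```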

Page 43: Statistics and probability theory mth 262

How to Accurately Create Pie Charts (cont.)

Draw the Chart:

Draw a circle on a blank sheet of paper, using a compass. While a compass is not necessary, using one will make the chart much neater and clearer by ensuring the circle is even.

Draw a radius, from the center to the right edge of the circle, using the ruler or straight edge. This will be the first base line.

Measure the largest angle in the data with the protractor, starting at the baseline, and mark it on the edge of the circle. Use the ruler to draw another radius to that point. Use this new radius as a base line for your next largest angle and continue this process until you get to the last data point. You will only need to measure the last angle to verify its value since both lines will already be drawn.

Label and shade the sections of the pie chart to highlight whatever data is important for your use.

Page 44: Statistics and probability theory mth 262

Graphical presentation of Quantitative Data

Histogram

Another common graphical presentation of quantitative data is a histogram.

In a histogram the variable of interest is placed on the horizontal axis.

A rectangle is drawn above each class interval with its height corresponding to the interval’s frequency.

Unlike a bar graph, a histogram has no natural separation between rectangles of adjacent classes.
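The rectangles of a histogram can be described by their left edge, width and height. The sketch below builds them from the class boundaries and frequencies of the cigarette example and checks that adjacent rectangles touch; the variable names are illustrative.

```python
# Class boundaries and frequencies from the grouped frequency example.
boundaries = [(4.5, 7.5), (7.5, 10.5), (10.5, 13.5),
              (13.5, 16.5), (16.5, 19.5), (19.5, 22.5)]
frequencies = [2, 3, 6, 5, 3, 1]

# One rectangle per class: (left edge, width, height).
rectangles = [
    (left, right - left, freq)
    for (left, right), freq in zip(boundaries, frequencies)
]

# Adjacent classes share a boundary, so the rectangles leave no gaps.
touching = all(
    boundaries[i][1] == boundaries[i + 1][0]
    for i in range(len(boundaries) - 1)
)

for left, width, height in rectangles:
    print(f"from {left:>4} to {left + width:>4}: height {height}")
print("no gaps between rectangles:", touching)
```

Class boundaries rather than class limits are used for the edges; with limits (5 to 7, 8 to 10, ...) there would be a one-unit gap between neighbouring rectangles. If a plotting library is available, these values could be passed to it, for example to matplotlib's `plt.bar` with `align="edge"`.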

Page 45: Statistics and probability theory mth 262

Example: Histogram

An example of a frequency distribution histogram. The same set of quiz scores is presented in a frequency distribution table and in a histogram.

Page 46: Statistics and probability theory mth 262

Example: Histogram

Page 47: Statistics and probability theory mth 262

Frequency Polygons

In a frequency polygon, a dot is centered above each score (use mid points if class intervals are given) so that the height of the dot corresponds to the frequency. The dots are then connected by straight lines. An additional line is drawn at each end to bring the graph back to a zero frequency.
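The points of a frequency polygon can be listed with a short Python sketch. It uses the mid points and frequencies of the cigarette example and adds one extra point with zero frequency at each end, one class width beyond the first and last mid points. The variable names are illustrative.

```python
# Mid points and frequencies from the grouped frequency example.
midpoints = [6, 9, 12, 15, 18, 21]
frequencies = [2, 3, 6, 5, 3, 1]

# Distance between consecutive mid points equals the class width.
width = midpoints[1] - midpoints[0]

# Extra points at zero frequency close the polygon at both ends.
xs = [midpoints[0] - width] + midpoints + [midpoints[-1] + width]
ys = [0] + frequencies + [0]

# Joining these points in order with straight lines gives the polygon.
points = list(zip(xs, ys))
print(points)
```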

Page 48: Statistics and probability theory mth 262

Example: Frequency Polygon

Page 49: Statistics and probability theory mth 262

Example: Frequency Polygon