The reasons for the appearance of Statistics: Statistics ... · the total number of its...


Naseebah Melaibari - Page (1)

Statistics

The reasons for the appearance of statistics:
• Community census.
• Inventory of the wealth of individuals.
• Data on births, deaths, production, and consumption.

Statistics is the science of collecting, organizing, analyzing, and interpreting data in order to make decisions.

A population is the collection of all outcomes, responses, measurements, or counts that are of interest.

Finite population: the total number of its observations is a finite number.

Infinite population: the total number of its observations is an infinite number.

A sample is a subset of a population that is representative of the population.

Reasons to draw a sample rather than study the whole population:
• We cannot study the whole population: it is huge or spread over many destinations.
• Preservation from loss.
• Less cost.
• Saves time.
• More inclusive and accurate.

Branches of Statistics:

Descriptive statistics: the branch of statistics that involves the organization, summarization, and display of data.

Statistical inference: the branch of statistics that involves using a sample to draw conclusions about a population.

Data consist of information coming from observations, counts, measurements, or responses. The singular of data is datum.

Types of data:

Quantitative data: data measured in the usual sense, like length, weight, and age.

Qualitative data: data that cannot be measured in the usual sense but can be classified into categories, and in some cases ordered or ranked. For example: marital status, blood group, and eye color.

A frequency distribution (frequency table) is a table that shows classes or intervals of data entries with a count of the number of entries in each class. The frequency f of a class is the number of data entries in the class.

Frequency distribution for quantitative data:

The number of intervals (k)

Too few intervals are not good because information will be lost. Too many intervals do not help to summarize the data. A commonly followed rule is that 5 ≤ k ≤ 20, or the following formula may be used: $k = 1 + 3.322 \log_{10} n$, where n is the number of observations.

The range (R)

It is the difference between the maximum and the minimum observations (entries) in the data set.
R = the maximum entry - the minimum entry
$R = X_{\max} - X_{\min}$

The Width of the interval (w)

Class intervals generally should be of the same width. Thus, if we want k intervals, then w is chosen such that $w > \frac{R}{k}$.

Choose the minimum observation to be the lower limit of the first interval, and add the width of the interval to get the lower limit of the second interval, and so on.

To find the upper limit of any interval, add w - 1 to the lower limit of that interval.

The true class intervals
The true lower limit = the lower limit - 0.5
The true upper limit = the upper limit + 0.5

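The midpoint of the interval
$\text{Midpoint} = \frac{\text{lower limit} + \text{upper limit}}{2}$

The relative frequency of the interval
$\text{Relative frequency} = \frac{\text{frequency of the interval}}{n}$, where n is the total number of observations.

The percentage frequency
$\text{Percentage frequency} = \text{relative frequency} \times 100\%$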
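The construction steps above can be sketched in Python. This is only a minimal illustration: it assumes integer-valued data and a base-10 logarithm in the formula for k, and both the helper name `frequency_table` and the sample data are made up for the example.

```python
import math

def frequency_table(data, k=None):
    """Build a frequency table for integer-valued data (illustrative helper)."""
    n = len(data)
    if k is None:
        # Rule from the notes, assuming "log" means log base 10
        k = round(1 + 3.322 * math.log10(n))
    r = max(data) - min(data)      # range R = X_max - X_min
    w = r // k + 1                 # smallest integer width with w > R / k
    rows = []
    lower = min(data)              # first lower limit is the minimum observation
    for _ in range(k):
        upper = lower + w - 1      # upper limit = lower limit + (w - 1)
        f = sum(lower <= x <= upper for x in data)
        rows.append({
            "limits": (lower, upper),
            "true_limits": (lower - 0.5, upper + 0.5),
            "midpoint": (lower + upper) / 2,
            "frequency": f,
            "relative": f / n,
            "percent": 100 * f / n,
        })
        lower += w                 # next lower limit
    return rows

# Hypothetical sample of 20 integer observations
data = [12, 15, 17, 18, 20, 21, 21, 23, 24, 25,
        26, 27, 29, 30, 31, 33, 35, 36, 38, 40]
table = frequency_table(data)
for row in table:
    print(row)
```

For non-integer data the "w - 1" and "0.5" adjustments would have to follow the measurement unit of the data, which this sketch does not attempt.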


Descriptive Statistics

Data representation:

Quantitative data:
1. A frequency histogram. Requires: a frequency table and the true class intervals.
2. A frequency polygon. Requires: a frequency table and the true class intervals.
3. A frequency curve. Requires: a frequency table and the true class intervals.
4. A stem-and-leaf plot.

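Qualitative data:
1. Line chart
2. Pie chart, where the central angle of each sector is
$\text{Central angle} = \frac{f}{n} \times 360^\circ$, with f the frequency of the category and n the total number of entries.
3. Bar chart: simple bar chart or clustered bar chart.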

A measure of central tendency is a value that represents a typical, or central, entry of a data set.


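The three most commonly used measures of central tendency:

The mean:
Property: the sum of the deviations of a set of values from their mean is 0.

Ungrouped data: the sum of the data entries divided by the number of entries:
$\bar{x} = \frac{\sum x}{n}$

Grouped data:
$\bar{x} = \frac{\sum x f}{\sum f}$
where $x$ are the midpoints and $f$ are the frequencies of the classes.

The weighted mean:
$\bar{x} = \frac{\sum x w}{\sum w}$
where $w$ is the weight of each entry $x$.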
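As a rough check of the three mean formulas and of the zero-sum property of deviations, here is a small Python sketch. The function names and the numbers are invented for the illustration.

```python
def mean(values):
    """Ungrouped mean: sum of the entries divided by the number of entries."""
    return sum(values) / len(values)

def grouped_mean(midpoints, frequencies):
    """Grouped mean: sum(x * f) / sum(f), with x the class midpoints."""
    return sum(x * f for x, f in zip(midpoints, frequencies)) / sum(frequencies)

def weighted_mean(values, weights):
    """Weighted mean: sum(x * w) / sum(w)."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

scores = [2, 4, 6, 8, 10]                       # made-up data
m = mean(scores)
deviation_sum = sum(x - m for x in scores)      # property: should be 0

gm = grouped_mean([14.5, 20.5, 26.5], [3, 5, 2])
wm = weighted_mean([80, 90], [3, 1])
print(m, deviation_sum, gm, wm)
```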

The median: The median of a data set is the value that lies in the middle of the data when the data set is ordered.

Odd number of entries: the median is the middle data entry.

Even number of entries: the median is the mean of the two middle data entries.

The mode: The mode of a data set is the data entry that occurs with the greatest frequency.

A set of data may have:
• one mode
• more than one mode (bimodal when there are two)
• no mode
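The median and mode rules above can be tried out with Python's standard `statistics` module. The `modes` helper below is a hypothetical function written for this sketch; it assumes the convention that a data set in which no entry is repeated has no mode.

```python
import statistics
from collections import Counter

odd = [7, 1, 5, 3, 9]      # ordered: 1 3 5 7 9, middle entry is 5
even = [7, 1, 5, 3]        # ordered: 1 3 5 7, middle entries are 3 and 5

median_odd = statistics.median(odd)
median_even = statistics.median(even)

def modes(data):
    """Return all modes; an empty list means 'no mode' (no entry repeated)."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1 and len(data) > 1:
        return []                      # assumed convention: nothing repeats
    return [x for x, c in counts.items() if c == top]

print(median_odd, median_even)
print(modes([1, 2, 2, 3]), modes([1, 1, 2, 2, 3]), modes([1, 2, 3]))
```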

The relation between the mean, median, and mode

Advantages and disadvantages:

The mean / standard deviation (the same)

Advantages:
1. For a given set of data there is one and only one mean (uniqueness).
2. It takes every entry into account.
3. It is easy to understand and to compute.

Disadvantages:
1. It is affected by extreme values, since all values enter into the computation.
2. It cannot be calculated from an open-ended frequency table.
3. It cannot be used with qualitative data.

The median

Advantages:
1. It is not affected by extreme values.
2. It can be calculated from an open-ended frequency table.
3. It can be used with qualitative data that can be ranked.

Disadvantages:
1. It does not take every entry into account.
2. It is not easy to use in statistical analyses.

The mode

Advantages:
1. It is not affected by extreme values.
2. It can be calculated from an open-ended frequency table.
3. It can be used with qualitative data.
4. It is an easy measurement.

Disadvantages:
1. It does not take every entry into account.
2. In some cases the mode may not be very meaningful.
3. Some data have no mode.


Definition of Measures of variation: The measure of variation in a set of observations refers to how spread out the observations are from each other.

When the measure of variation is small, this means that the values are close together (but not the same).

Measures (definition and formula):

Range
The range of a data set is the difference between the maximum and the minimum data entries in the set.
R = maximum data entry - minimum data entry

Deviation
The deviation of an entry x in a population data set is the difference between the entry and the mean μ of the data set.
Deviation of x = x - μ

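Variance
The population variance of a population data set of N entries:
$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$

The sample variance and sample standard deviation of a sample data set of n entries:
$\text{Sample variance} = s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

Or, equivalently:
$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$

Pooled two-sample variance:
$s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$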

Standard deviation
The population standard deviation of a population data set of N entries is the square root of the population variance:
$\text{Population standard deviation} = \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$

$\text{Sample standard deviation} = s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

Standard deviation for grouped data: large data sets are usually best represented by a frequency distribution. The sample standard deviation for a frequency distribution is:
$s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}}$
where $n = \sum f$ is the number of entries.

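Coefficient of variation
The coefficient of variation (C.V.) describes the standard deviation as a percent of the mean:
$C.V. = \frac{\sigma}{\mu} \times 100\%$ for a population, or $C.V. = \frac{s}{\bar{x}} \times 100\%$ for a sample.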

The standard score
The standard score, or z-score, represents the number of standard deviations a given value x falls from the mean μ:
$z = \frac{x - \mu}{\sigma}$
A z-score is used to compare data values within the same data set or to compare data values from different data sets.

A z-score can be negative, positive, or zero.
If z is negative, $x < \mu$.
If z is zero, $x = \mu$.
If z is positive, $x > \mu$.

Coefficient of skewness
It is a measurement that describes the shape of the data. Types of skewness:
• Skewed left: a distribution is skewed left (negatively skewed) if its tail extends to the left.
• Skewed right: a distribution is skewed right (positively skewed) if its tail extends to the right.
• Symmetric: a distribution is symmetric if one side of the mean is a mirror image of the other side.

If a distribution is skewed left, the coefficient of skewness is negative.
If a distribution is symmetric, the coefficient of skewness is zero.
If a distribution is skewed right, the coefficient of skewness is positive.
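The variance, standard deviation, coefficient of variation, and z-score from this table can be checked numerically. The following is a sketch with invented data and hypothetical function names, not a prescribed method.

```python
import math

def population_variance(data):
    """Sum of squared deviations from the mean, divided by N."""
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data) / len(data)

def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    xbar = sum(data) / len(data)
    return sum((x - xbar) ** 2 for x in data) / (len(data) - 1)

def coefficient_of_variation(data):
    """Sample standard deviation as a percent of the sample mean."""
    xbar = sum(data) / len(data)
    return math.sqrt(sample_variance(data)) / xbar * 100

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

data = [2, 4, 4, 4, 5, 5, 7, 9]                 # made-up entries, mean 5
sigma = math.sqrt(population_variance(data))    # population standard deviation
s = math.sqrt(sample_variance(data))            # sample standard deviation
cv = coefficient_of_variation(data)
z_high = z_score(9, 5, sigma)                   # positive: 9 is above the mean
z_low = z_score(2, 5, sigma)                    # negative: 2 is below the mean
print(sigma, s, cv, z_high, z_low)
```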


The Range:

Advantages of the range:
• It is easy to calculate.
• It gives a quick idea about the nature of the data; it is often used in quality control and to describe the weather.

Disadvantages of the range:
• It uses only two entries from the data set.
• It is affected by extreme values; therefore it is an approximate measurement.

Notes:
• If the sample size n is large (greater than 30), then the sample standard deviation $s$ and the population standard deviation $\sigma$ are approximately equal.
• The standard deviation is never negative.
• If all the data entries are equal, then the standard deviation is 0.
• You can use the coefficient of variation to compare data with different units.

Advantages and disadvantages of the standard deviation: they are the same as those of the mean.

Interpreting standard deviation: when interpreting the standard deviation, remember that it is a measure of the typical amount an entry deviates from the mean. The more the entries are spread out, the greater the standard deviation.

Reasons to use the coefficient of variation (the relative variation) rather than an absolute measure of variation:
• The two variables involved might be measured in different units.
• The means of the two may be quite different in size.

Correlation and Regression:
A correlation is a relationship between two variables. The data can be represented by the ordered pairs (x, y), where x is the independent (or explanatory) variable and y is the dependent (or response) variable.

Examples:

A. The relation between the number of hours a group of students spent studying for a test and their scores on that test.
x = hours spent studying, y = scores on that test
[Scatter plot]

B. The relation between an individual's weight (in pounds) and daily water consumption (in ounces).
x = an individual's weight (in pounds), y = water consumption (in ounces)
[Scatter plot]

C. The relation between the high outdoor temperature (in degrees Fahrenheit) and coffee sales (in hundreds of dollars) for a coffee shop for eight randomly selected days.
x = temperature (in degrees Fahrenheit), y = coffee sales (in hundreds of dollars)
[Scatter plot]

D. The relation between income per year (in thousands of dollars) and the amount spent on milk per year (in dollars).
x = income per year (in thousands of dollars), y = amount spent on milk per year (in dollars)
[Scatter plot]


Linear Correlation Coefficient of Pearson: The correlation coefficient is a measure of the strength and the direction of a linear relationship between two variables. The symbol r represents the sample correlation coefficient:

$r = \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{n \sum x^2 - (\sum x)^2}\,\sqrt{n \sum y^2 - (\sum y)^2}}$

where n is the number of pairs of data.

Notes:
• The range of the correlation coefficient is -1 to 1.
• A weak linear correlation coefficient does not mean that there is no relationship at all; the relationship may be nonlinear.
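To make the computation of r concrete, the sketch below evaluates the sample correlation coefficient from the sums of x, y, xy, x squared, and y squared. The function name `pearson_r` and the study-hours data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r computed from raw sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    numerator = n * sxy - sx * sy
    denominator = math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    return numerator / denominator

hours = [1, 2, 3, 4, 5]            # made-up hours spent studying
scores = [52, 60, 61, 70, 77]      # made-up test scores
r = pearson_r(hours, scores)
print(round(r, 4))                 # close to 1: strong positive linear correlation
```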

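Simple Linear Regression:

The equation of a regression line for an independent variable x and a dependent variable y is:

$\hat{y} = m x + b$

where $\hat{y}$ is the predicted y-value for a given x-value. The slope m and y-intercept b are given by:

$m = \frac{n \sum xy - (\sum x)(\sum y)}{n \sum x^2 - (\sum x)^2}$ and $b = \bar{y} - m \bar{x} = \frac{\sum y}{n} - m \frac{\sum x}{n}$

where $\bar{y}$ is the mean of the y-values in the data set and $\bar{x}$ is the mean of the x-values.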
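The slope and intercept formulas can be sketched as follows. The helper name `regression_line` and the data are assumptions made for the example.

```python
def regression_line(xs, ys):
    """Return slope m and intercept b of the least-squares regression line."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    m = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    b = sy / n - m * sx / n            # b = mean of y minus m times mean of x
    return m, b

hours = [1, 2, 3, 4, 5]                # made-up hours spent studying
scores = [52, 60, 61, 70, 77]          # made-up test scores
m, b = regression_line(hours, scores)
predicted = m * 6 + b                  # predicted score for 6 hours of study
print(m, b, predicted)
```

Predictions of this kind are only meaningful for x-values in or near the range of the observed data.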


Concepts of Probabilities:

Random experiment:
• Can be repeated (theoretically) an infinite number of times.
• Has a well-defined set of possible outcomes.
• Its result cannot be predetermined.

We use three building blocks: a sample space, a set of events, and a probability.

Sample space: the set of all possible outcomes of a statistical experiment is called the sample space and is represented by the symbol S.

Event: a subset of a sample space, denoted by a capital letter.

The complement: the complement of an event A with respect to S is the subset of all elements of S that are not in A. We denote the complement of A by the symbol Aˊ.

The intersection: the intersection of two events A and B, denoted by the symbol A ∩ B, is the event containing all elements that are common to A and B.

The union: the union of the two events A and B, denoted by the symbol A U B, is the event containing all the elements that belong to A or B or both.

Mutually exclusive events: two events A and B are mutually exclusive, or disjoint, if A ∩ B = Ø, that is, if A and B have no elements in common.

Some important results related to the set operations:

A ∩ Ø = Ø.

A U Ø = A.

A ∩ Aˊ = Ø.

A U Aˊ = S.

Sˊ = Ø.

Ø ˊ= S.

(Aˊ)ˊ = A.

(A ∩ B)ˊ = Aˊ U Bˊ

(A U B)ˊ = Aˊ ∩ Bˊ

Probability functions: We want to assign a probability to an experiment's outcomes (and in general to events).

1. Let A be an event defined on the sample space S.
2. P(A) denotes the probability of A occurring.
3. P is the probability function.

The probability of an event A is the sum of the weights of all sample points in A, such that:

Axiom 1: 0 ≤ P(A) ≤ 1,

Axiom 2: P(S) = 1 and P(Ø) = 0,

Axiom 3: if A, B, C, … is a sequence of mutually exclusive events, then P(A U B U C U …) = P(A) + P(B) + P(C) + ….

Theorems:

(1) Additive rule: If A and B are two events, then P(A U B) = P(A) + P(B) - P(A ∩ B).

Results:
• If A and B are mutually exclusive, then P(A U B) = P(A) + P(B).
• If A₁, A₂, A₃, …, Aᵣ is a partition of the sample space S, then P(A₁ U A₂ U A₃ U … U Aᵣ) = P(A₁) + P(A₂) + P(A₃) + … + P(Aᵣ) = P(S) = 1.

(2) Additive rule: If A and Aˊ are complementary events, then P(A) + P(Aˊ) = 1.

(3) If an experiment can result in any one of N different equally likely outcomes, and if exactly n of these outcomes correspond to event A, then the probability of event A is

$P(A) = \frac{n}{N}$
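The set results and the additive rules can be verified on a small sample space. The sketch below assumes one roll of a fair die with equally likely outcomes; the events A and B and the helper `prob` are chosen only for illustration.

```python
from fractions import Fraction

S = set(range(1, 7))        # sample space: one roll of a fair die
A = {2, 4, 6}               # event: the roll is even
B = {4, 5, 6}               # event: the roll is greater than 3

def prob(event):
    """P(event) = n / N for N equally likely outcomes."""
    return Fraction(len(event), len(S))

complement_A = S - A        # elements of S that are not in A

# De Morgan's laws from the list of set results
assert S - (A & B) == (S - A) | (S - B)
assert S - (A | B) == (S - A) & (S - B)

# Additive rule and complement rule
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)
assert prob(A) + prob(complement_A) == 1

print(prob(A), prob(B), prob(A & B), prob(A | B))
```

Exact fractions are used instead of floats so that the equalities hold without rounding error.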


Conditional Probability: We often want to find the probability of an event A occurring given that B has occurred.

In these cases, we are conditioning on B, and we write P(A|B).

B serves as a new (reduced) sample space.

P(A|B) is the fraction of P(B) that corresponds to P(A ∩ B).

The conditional probability of A, given B, denoted by P(A|B), is defined by

$P(A|B) = \frac{P(A \cap B)}{P(B)}$

where P(B) > 0.

Independence: A and B are independent if knowledge that one event has occurred does not change the probability that the other will occur:

P(A|B) = P(A) or P(B|A) = P(B),

provided that the conditional probabilities exist. Otherwise, A and B are dependent.

Multiplicative Rules:

Theorems:

(1) If in an experiment the events A and B can both occur, then P(A ∩ B) = P(A)P(B|A), provided that P(A) > 0.

(2) Two events A and B are independent if and only if P(A ∩ B) = P(A)P(B). Therefore, to obtain the probability that two independent events both occur, we simply find the product of their individual probabilities.
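Conditional probability, the multiplicative rule, and the independence test can be illustrated with two rolls of a fair die. The events and helper names below are assumptions made for this sketch.

```python
from fractions import Fraction
from itertools import product

S = set(product(range(1, 7), repeat=2))      # 36 equally likely (first, second) rolls

def prob(event):
    """P(event) for equally likely outcomes."""
    return Fraction(len(event), len(S))

def cond_prob(a, b):
    """P(a | b) = P(a and b) / P(b), defined only when P(b) > 0."""
    return prob(a & b) / prob(b)

A = {s for s in S if s[0] == 6}              # first roll is 6
B = {s for s in S if s[1] % 2 == 0}          # second roll is even
C = {s for s in S if s[0] + s[1] >= 10}      # sum is at least 10

# A and B are independent: P(A | B) = P(A) and P(A and B) = P(A) P(B)
independent_AB = prob(A & B) == prob(A) * prob(B)

# A and C are dependent: knowing A changes the probability of C
independent_AC = prob(A & C) == prob(A) * prob(C)

# Multiplicative rule: P(A and C) = P(A) P(C | A)
rule_holds = prob(A & C) == prob(A) * cond_prob(C, A)

print(cond_prob(A, B), cond_prob(C, A), independent_AB, independent_AC, rule_holds)
```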

Random Variables: A random variable is a function that associates a real number with each element in the sample space.

Remark: We shall use a capital letter, say X, to denote a random variable and its corresponding small letter, x in this case, for one of its values.

Sample Space:

Discrete sample space: a sample space that contains a finite number of possibilities or an unending sequence with as many elements as there are whole numbers.

Continuous sample space: a sample space that contains an infinite number of possibilities equal to the number of points on a line segment.

Types of Random Variable:

Discrete random variable: a random variable whose set of possible outcomes is countable.

Continuous random variable: a random variable that can take on values on a continuous scale.

Discrete Probability Distributions: The set of ordered pairs (x, f(x)) is a probability function, probability mass function, or probability distribution of the discrete random variable X if, for each possible outcome x:
1. $f(x) \ge 0$,
2. $\sum_x f(x) = 1$,
3. $P(X = x) = f(x)$.

Continuous Probability Distributions:
• A continuous random variable has a probability of zero of assuming exactly any of its values.
• Its probability distribution cannot be given in tabular form, but it can be stated as a formula f(x), usually called the probability density function or the density function of X.
• A probability density function is constructed so that the area under its curve bounded by the x axis is equal to 1 when computed over the range of X for which f(x) is defined.
• The probability that X assumes a value between a and b is equal to the shaded area under the density function between the ordinates at x = a and x = b, and from integral calculus is given by $P(a < X < b) = \int_a^b f(x)\,dx$.


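Discrete random variable and continuous random variable:

Probability distribution
Discrete: $P(X = x) = f(x)$
Continuous: $P(a < X < b) = \int_a^b f(x)\,dx$ (density given) $= F(b) - F(a)$ (cumulative given)

Density function
Discrete: not applicable
Continuous: $f(x) = \frac{dF(x)}{dx}$ (cumulative given)

The cumulative distribution function
Discrete: $F(x) = P(X \le x) = \sum_{t \le x} f(t)$
Continuous: $F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$ (density given)

Mean or expected value of X
Discrete: $\mu = E(X) = \sum_x x f(x)$
Continuous: $\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$

Mean or expected value of g(X)
Discrete: $E[g(X)] = \sum_x g(x) f(x)$
Continuous: $E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx$

The variance
Let X be a random variable with probability distribution f(x) and mean μ. The variance of X is
Discrete: $\sigma^2 = E[(X - \mu)^2] = \sum_x (x - \mu)^2 f(x)$
Continuous: $\sigma^2 = E[(X - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$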

The variance and mean theorems (some properties of means and variances of random variables):

(1) The variance of a random variable X is
$\sigma^2 = E(X^2) - \mu^2$

(2) If a and b are constants, then
$E(aX + b) = aE(X) + b$
Corollary (1): Setting a = 0, we see that E(b) = b.
Corollary (2): Setting b = 0, we see that E(aX) = aE(X).

(3) If a and b are constants, then
$\sigma^2_{aX + b} = a^2 \sigma^2_X$
Corollary (1): Setting a = 1, we see that $\sigma^2_{X + b} = \sigma^2_X$.
Corollary (2): Setting b = 0, we see that $\sigma^2_{aX} = a^2 \sigma^2_X$.
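The mean and variance theorems can be checked on a small discrete distribution. The sketch below assumes X is the number of heads in two tosses of a fair coin; the helper `expect` and the constants a and b are chosen only for illustration.

```python
from fractions import Fraction

# Assumed example: X = number of heads in two fair coin tosses
f = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def expect(g, dist):
    """E[g(X)] = sum of g(x) * f(x) over all possible values x."""
    return sum(g(x) * p for x, p in dist.items())

mu = expect(lambda x: x, f)                          # mean of X
var = expect(lambda x: (x - mu) ** 2, f)             # variance by definition
var_shortcut = expect(lambda x: x ** 2, f) - mu ** 2 # E(X^2) - mu^2

a, b = 3, 2
# Distribution of Y = aX + b (each value of X maps to a distinct value of Y)
g = {a * x + b: p for x, p in f.items()}
mu_y = expect(lambda y: y, g)
var_y = expect(lambda y: (y - mu_y) ** 2, g)

print(mu, var, var_shortcut, mu_y, var_y)
```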


Probability Distribution:

Discrete Probability Distribution Some Continuous Probability Distributions

n = the number of times the experiment is repeated (the larger number)
x = the number of successes of the experiment (the smaller number)
p = the probability of success
q = 1 - p = the probability of failure

The Bernoulli process must possess the following properties:
1. The experiment consists of n repeated trials.
2. Each trial results in an outcome that may be classified as a success or a failure.
3. The probability of success, denoted by p, remains constant from trial to trial.
4. The repeated trials are independent.

The number X of successes in n Bernoulli trials is called a binomial random variable. The probability distribution of this discrete random variable is called the binomial distribution.

Binomial Distribution: A Bernoulli trial can result in a success with probability p and a failure with probability q = 1 - p. Then the probability distribution of the binomial random variable X, the number of successes in n independent trials, is

$b(x; n, p) = \binom{n}{x} p^x q^{n - x}, \quad x = 0, 1, 2, \ldots, n$

The mean and variance of the binomial distribution b(x; n, p) are

$\mu = np$ and $\sigma^2 = npq$
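As a numerical check of the binomial distribution and of its mean and variance, the sketch below computes the probabilities directly and compares the resulting mean and variance with np and npq. The values n = 10 and p = 0.3 are arbitrary choices for the example.

```python
import math

def binomial_pmf(x, n, p):
    """b(x; n, p): probability of exactly x successes in n independent trials."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.3                      # assumed example values
q = 1 - p
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(pmf)                                             # should be 1
mean = sum(x * fx for x, fx in enumerate(pmf))               # should match n * p
variance = sum((x - mean) ** 2 * fx for x, fx in enumerate(pmf))  # should match n * p * q

print(total, mean, n * p, variance, n * p * q)
```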

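Normal Distribution:
Definition: The density of the normal random variable X, with mean μ and variance σ², is

$n(x; \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty$

where π = 3.14159... and e = 2.71828...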

Some properties of the normal curve:

The mode, which is the point on the horizontal axis where the curve is a maximum, occurs at x = μ.

The curve is symmetric about a vertical axis through the mean.

The curve has its points of inflection at x = μ ± σ; it is concave downward if μ - σ < x < μ + σ and is concave upward otherwise.

The normal curve approaches the horizontal axis asymptotically as we proceed in either direction away from the mean.

The total area under the curve and above the horizontal axis is equal to one.

The standard normal distribution: The distribution of a normal random variable with mean 0 and variance 1 is called a standard normal distribution.

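Using the standard normal table:

• P(Z < k): read the value directly from the table. For k = x.yr, find x.y in the left column and 0.0r in the top row.
• P(Z > k): read the value for k from the table and subtract it from 1.
• P(a < Z < k): read the values for a and k from the table, and subtract the value for a from the value for k.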
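Areas under the standard normal curve can also be computed instead of being read from a printed table. The sketch below uses the error function from Python's `math` module; the function names `phi` and `normal_pdf` and the example values of k and a are assumptions for the illustration.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal random variable with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def phi(k):
    """P(Z < k) for a standard normal Z, via the error function."""
    return 0.5 * (1 + math.erf(k / math.sqrt(2)))

k, a = 1.25, -0.5                   # assumed example values
p_less = phi(k)                     # P(Z < k)
p_greater = 1 - phi(k)              # P(Z > k): subtract from 1
p_between = phi(k) - phi(a)         # P(a < Z < k): difference of two values

# A general normal X is standardized with z = (x - mu) / sigma
mu, sigma, x = 50.0, 10.0, 62.5
p_x_less = phi((x - mu) / sigma)    # P(X < 62.5), same as P(Z < 1.25)

print(round(p_less, 4), round(p_greater, 4), round(p_between, 4))
```

Printed tables usually round to four decimal places, so small differences from the computed values are expected.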