Post on 28-Nov-2014
Statistics is the science of collecting, analysing and presenting numerical data. It is used for
decision-making and for drawing inferences in different situations.
It deals with:
• Large groups of values, not a single entity or value
• Measuring uncertainty (probability)
• Identifying patterns in values
• Aspects of information that can be described numerically
Branches of statistics:
• Descriptive Statistics deals with concepts and methods for summarizing and describing the
important aspects of numerical data. It consists of the condensation of data, their
graphical display and the computation of a few numerical quantities that provide information
about the centre of the data and indicate the spread of the observations.
• Inferential Statistics deals with procedures for making inferences about the characteristics that
describe the larger group of data, or the whole, called the population, from the knowledge
derived from only a part of the data, called the sample. It includes the estimation of population
parameters and the testing of statistical hypotheses. This part is based on probability theory.
Population is the set of all outcomes of an event. It can also be considered as a collection of all the
observations regarding any phenomenon or entity. It can be finite or infinite.
Parameters are numerical values that describe a population, e.g. the population mean.
Sample is a subset of the population.
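The relationship between population, parameter and sample can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population (the integers 1 to 1000), the sample size of 50 and the fixed seed are all hypothetical choices, not part of the notes.

```python
import random
import statistics

# Hypothetical finite population: the integers 1..1000 (illustrative data only).
population = list(range(1, 1001))

# Parameter: a numerical value that describes the whole population.
population_mean = statistics.mean(population)

# Sample: a subset of the population, drawn at random without replacement.
rng = random.Random(0)  # fixed seed (an assumption) so the sketch is repeatable
sample = rng.sample(population, 50)

# The same quantity computed from the sample serves as an estimate of the parameter.
sample_mean = statistics.mean(sample)

print("population mean (parameter):", population_mean)
print("sample mean (estimate):", sample_mean)
```

The sample mean will usually differ somewhat from the population mean; quantifying that difference is the job of inferential statistics.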
Quantitative variable: numerical data
1. Discrete: takes only integer (whole-number) values
2. Continuous: any value within a given range is possible, whether it is a whole number, a
decimal number or a fraction.
Qualitative variable: non-numerical data e.g. eye color, gender
Scales:
1. Nominal: numbers define classes, but the ranking or ordering of the numbers has no significance.
2. Ordinal: numbers define classes, and the ranking or ordering of the numbers is significant.
3. Interval: a scale with a constant interval size, so differences between values are meaningful.
Collection of data:
1. Personal direct investigation
2. Indirect investigation
3. Questionnaires and surveys
4. Local sources (no formal investigation)
5. Enumerators
The main aims of classification are:
• To reduce a large set of data to an easily understood summary
• To display the points of similarity and dissimilarity
• To reflect the important aspects of the data
• To make comparison and inference of data easier
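As a rough illustration of classification, the sketch below groups raw observations into classes and counts the frequency of each class. The marks, the class width of 10 and the helper name class_label are assumptions made for this example only.

```python
from collections import Counter

# Hypothetical raw observations (illustrative marks out of 50).
marks = [12, 47, 33, 25, 8, 41, 36, 29, 18, 22, 39, 44, 31, 27, 15]

class_width = 10  # assumed class width; classes are 0-9, 10-19, ...


def class_label(value, width=class_width):
    """Return the label of the class that contains the given value."""
    lower = (value // width) * width
    return f"{lower}-{lower + width - 1}"


# Frequency distribution: the number of observations falling in each class.
frequency = Counter(class_label(m) for m in marks)

# Print the classes in ascending order of their lower limit.
for label in sorted(frequency, key=lambda s: int(s.split("-")[0])):
    print(label, frequency[label])
```

Fifteen separate values are reduced to five class frequencies, which is easier to read and compare than the raw list.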
Frequency curves come in a variety of shapes. A unimodal curve is one that rises to a single peak and
then declines. A bimodal curve has two different peaks.
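One simple way to tell the two shapes apart is to count the peaks in a list of class frequencies. The sketch below is a deliberately naive check: the function name count_peaks and both frequency lists are hypothetical, and it only counts values strictly higher than both neighbours, so flat tops and noisy data would need more care.

```python
def count_peaks(frequencies):
    """Count values that are strictly higher than both of their neighbours."""
    # Pad with zeros so that the first and last classes can also be peaks.
    padded = [0] + list(frequencies) + [0]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )


# Hypothetical class frequencies, chosen only for illustration.
unimodal = [1, 3, 6, 9, 6, 3, 1]
bimodal = [1, 5, 9, 4, 2, 6, 10, 3]

print("peaks in unimodal data:", count_peaks(unimodal))
print("peaks in bimodal data:", count_peaks(bimodal))
```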
Advantages and disadvantages of the measures of location:
MEAN
Advantages:
• Easy to compute and comprehend
• All observations taken into account
• Can be determined for any set
Disadvantages:
• Accuracy affected by outliers
• Can give misleading results
• In a highly skewed distribution, the mean is not a good measure of location
GEOMETRIC MEAN
Advantages:
• Rigorously defined mathematical formula
• All observations taken into account
• Not affected by sampling variability
Disadvantages:
• Cannot be computed for all sets
• Difficult to comprehend
HARMONIC MEAN
Advantages:
• Rigorously defined mathematical formula
• Not affected by sampling variability
• All observations have a bearing on its value
Disadvantages:
• Difficult to comprehend
• Cannot be computed for all types of sets
MEDIAN
Advantages:
• Easy to compute and comprehend
• Not affected by outliers
• In a highly skewed distribution, it is a good measure of location
Disadvantages:
• Has no strict definition
• Cannot be treated mathematically any further
• Requires the data to be arranged in order, which is time consuming
MODE
Advantages:
• Simple calculation
• Not affected by outliers
• Can be evaluated for both qualitative and quantitative data
Disadvantages:
• No further mathematical treatment
• No strict definition
• Does not take into account all observations
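Several of these points can be checked with Python's standard statistics module. The following is a minimal sketch, assuming Python 3.8 or later (needed for geometric_mean); the data values and variable names are invented for illustration.

```python
import statistics

# Hypothetical observations, chosen only for illustration.
data = [2, 4, 4, 8, 16]

arithmetic = statistics.mean(data)
geometric = statistics.geometric_mean(data)  # requires Python 3.8+
harmonic = statistics.harmonic_mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

print("mean:", arithmetic)
print("geometric mean:", geometric)
print("harmonic mean:", harmonic)
print("median:", median)
print("mode:", mode)

# Adding one extreme value moves the mean far more than the median or the mode.
with_outlier = data + [1000]
mean_out = statistics.mean(with_outlier)
median_out = statistics.median(with_outlier)
mode_out = statistics.mode(with_outlier)
print("with outlier -> mean:", mean_out, "median:", median_out, "mode:", mode_out)

# The geometric mean cannot be computed for every set: negative values are rejected.
try:
    statistics.geometric_mean([-1, 2, 3])
    geometric_defined = True
except statistics.StatisticsError:
    geometric_defined = False
print("geometric mean defined for [-1, 2, 3]:", geometric_defined)

# The mode also works for qualitative (non-numerical) data.
eye_colours = ["brown", "blue", "brown", "green"]
modal_colour = statistics.mode(eye_colours)
print("modal eye colour:", modal_colour)
```

For this particular data set the harmonic mean is smaller than the geometric mean, which in turn is smaller than the arithmetic mean, while the single outlier pulls the mean well away from the bulk of the data and leaves the median and mode close to where they were.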
An experiment that can result in different outcomes, even though it is repeated in the same manner
every time, is called a random experiment.
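Sample space: the set of all possible outcomes of a random experiment (analogous to the population).
Event: a subset of the sample space (analogous to a sample).
When two events A and B have no outcomes in common, they are said to be mutually exclusive
or disjoint events.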
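These definitions map naturally onto Python sets. The sketch below uses a single roll of a six-sided die as the random experiment; the events A, B and C, the seed and the number of rolls are assumptions chosen for illustration.

```python
import random

# Sample space: all possible outcomes of one roll of a six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}

# Events are subsets of the sample space.
A = {1, 3, 5}  # the roll is odd
B = {2, 4, 6}  # the roll is even
C = {4, 5, 6}  # the roll is greater than 3

# Mutually exclusive (disjoint) events share no outcomes.
a_b_disjoint = A.isdisjoint(B)
a_c_disjoint = A.isdisjoint(C)
common_a_c = A & C

print("A and B mutually exclusive:", a_b_disjoint)
print("A and C mutually exclusive:", a_c_disjoint, "common outcomes:", common_a_c)

# Random experiment: repeated in the same manner, yet the outcome can differ each time.
rng = random.Random(1)  # fixed seed (an assumption) so the sketch is repeatable
rolls = [rng.choice(sorted(sample_space)) for _ in range(10)]
print("ten rolls:", rolls)
```

A and B can never occur on the same roll, whereas A and C both occur whenever the roll is 5.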