Lecture Stat GS

  • 8/3/2019 Lecture Stat GS

    Introduction

    STATISTICS is the most important science in the whole world; for upon it depends the practical application of every science and of every art; the one science essential to all political and social administration, all education, all organization based on experience, for it only gives results of our experience.

    (Florence Nightingale)

    Basic Concepts

    The term statistics originated from the Latin word status, which means state.

    The original definition was the science dealing with data about the condition of a state or community.

    Statistics is the branch of science that deals with the collection, presentation, organization, analysis, and interpretation of data.

    Basic Concepts

    The population is the collection of all elements under consideration in a statistical inquiry.

    The sample is a subset of a population.

    The variable is a characteristic or attribute of the elements in a collection that can assume different values for the different elements.

    An observation is a realized value of a variable.

    Data is the collection of observations.

    Basic Concepts

    Example

    Variable                                        Possible Observations
    S = sex of a student                            Male, Female
    E = employment status of an employee            Permanent, Temporary, Contractual
    I = monthly income of a person (in pesos)       greater than or equal to zero
    N = number of children of a teacher             n = 0, 1, 2, 3, ...
    H = height of a basketball player (cm or in.)   h > 0

    Basic Concepts

    Example

    The Office of Admissions is studying the relationship between the score in the entrance examination taken during application and the general weighted average (GWA) upon graduation among graduates of the university from 2005 to 2010.

    Population: collection of all graduates of the university from 2005 to 2010.

    Variables of interest: score in the entrance examination and GWA

    Basic Concepts

    The Department of Health is interested in determining the percentage of children below 12 years old infected by the Hepatitis B virus in Laguna in 2010.

    Population: set of all children below 12 years old in Laguna in 2010.

    Variable of interest: whether or not the child has ever been infected by the Hepatitis B virus.

    Basic Concepts

    The parameter is a summary measure describing a specific characteristic of the population.

    The statistic is a summary measure describing a specific characteristic of the sample.
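    Here is a small illustration of the difference. It assumes a hypothetical population of ten exam scores that is fully known, which real populations rarely are. The mean of all ten scores is a parameter, while the mean of a randomly drawn subset is a statistic.

```python
import random
import statistics

# Hypothetical population: exam scores of all 10 students in a class
population = [72, 85, 90, 64, 78, 88, 95, 70, 81, 77]

# Parameter: summary measure computed from every element of the population
population_mean = statistics.mean(population)

# Statistic: the same summary measure computed from a sample only
random.seed(1)  # fixed seed so the illustration is reproducible
sample = random.sample(population, k=4)
sample_mean = statistics.mean(sample)

print(f"Parameter (population mean): {population_mean}")
print(f"Statistic (mean of sample {sample}): {sample_mean}")
```

    Drawing a different sample would generally give a different statistic, while the parameter stays fixed.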

    Fields of Statistics

    Two Major Fields of Statistics

    1. Applied Statistics is concerned with the procedures and techniques used in the collection, presentation, organization, analysis, and interpretation of data. We study applied statistics in order to learn how to select and properly implement the most appropriate statistical methods that will provide answers to our research problem.

    2. Theoretical or Mathematical Statistics is concerned with the development of the mathematical foundations of the methods used in applied statistics. We study mathematical statistics in order to understand the rationale behind the statistical methods we use in analysis, and to establish new theories that will validate the use of new statistical methods, or modifications of existing ones, in solving more complex problems.

    Two Major Areas of Interest in Applied Statistics

    1. Descriptive Statistics

    Descriptive statistics includes all the techniques used in organizing, summarizing, and presenting the data on hand.

    The data on hand may have come from all the elements of the population, so that the analysis using descriptive statistics will allow us to describe the population.

    The data on hand may also come from the elements of a selected sample. In this case, the analysis using descriptive statistics will only allow us to describe the sample. The methods used in descriptive statistics will not allow us to generalize about the population using the sample data.

    Two Major Areas of Interest in Applied Statistics

    In descriptive statistics, we use tables and charts and compute summary measures such as averages, proportions, and percentages.

    2. Inferential Statistics

    We do not simply describe the sample data. Rather, we use the sample data to form conclusions about the population. Since the sample is only a subset of the population, we arrive at conclusions about the population using inferential statistics under conditions of uncertainty.

    Two Major Areas of Interest in Applied Statistics

    Example

    1. A badminton player wants to know his average score for the past 10 games. (Descriptive)

    2. Joseph wants to determine the variability of his seven exam scores in Statistics. (Descriptive)

    Two Major Areas of Interest in Applied Statistics

    3. Mrs. Mercado would like to forecast the average monthly electricity bill she will pay next year based on her average monthly bill in the past year. (Inferential)

    4. Efren Bata wants to estimate his chance of winning the next World Championship game in billiards based on his average scores in the last championship and the averages of the competing players. (Inferential)

    5. Dr. Escape wants to determine the proportion of her expenses spent on transportation during the past four months using the daily records of expenditure that she keeps. (Descriptive)

    6. A politician wants to determine the total number of votes his rival obtained in the sample used in the exit poll. (Descriptive)
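    Examples 1 and 2 are descriptive because they only summarize the data on hand. The sketch below performs those two computations with Python's standard `statistics` module. The scores are hypothetical, and the range and standard deviation stand in as two common measures of variability. Joseph's seven scores are the complete set being described, so the population form `pstdev` is used rather than the sample form `stdev`.

```python
import statistics

# Example 1 (hypothetical data): badminton scores in the past 10 games
badminton_scores = [21, 18, 21, 15, 21, 19, 21, 12, 21, 20]
average_score = statistics.mean(badminton_scores)

# Example 2 (hypothetical data): Joseph's seven exam scores in Statistics
exam_scores = [78, 85, 92, 70, 88, 81, 95]
score_range = max(exam_scores) - min(exam_scores)
# pstdev treats the seven scores as the complete set being described
score_sd = statistics.pstdev(exam_scores)

print(f"Average badminton score: {average_score}")
print(f"Range of exam scores: {score_range}")
print(f"Standard deviation of exam scores: {score_sd:.2f}")
```

    No conclusion about any larger population is drawn here, which is what separates these computations from examples 3 and 4.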

    Steps in a Statistical Inquiry

    A statistical inquiry is designed research that provides the information needed to solve a research problem. It may be carried out to:

    1. Describe the characteristics of the elements in the population under study through the computation or estimation of a parameter such as the proportion, total, or average.

    2. Compare the characteristics of the elements in different subgroups of the population through contrasts of their respective summary measures.

    3. Justify an assertion made by the researcher about a particular characteristic of the population or of subgroups in the population.

    4. Determine the nature and strength of relationships among the different variables of interest.

    5. Identify the different groups of interrelated variables under study.

    Steps in a Statistical Inquiry

    6. Reveal the natural groupings of the elements in the population based on the values of a set of variables.

    7. Determine the effects of one or more variables on a response variable.

    8. Clarify patterns and trends in the values of a variable over time or space.

    9. Predict the value of a variable based on its relationship with another variable.

    10. Forecast future values of a variable using a sequence of observations on the same variable taken over time.

    Basic Steps in Performing a Statistical Inquiry

    1. Identify the problem.

    The researchers need to define and state the problem clearly so that they can arrive at appropriate solutions and recommendations later on.

    2. Plan the study.

    Some statistical inquiries do not reach completion, or do not succeed in arriving at useful information for sound decision making, because of the researchers' failure to plan the study carefully.

    3. Collect the data.

    The investigators take extra measures to ensure the quality of the data collected. If the collected data are incomplete, outdated, inaccurate, or worse, fabricated, then it is useless to proceed with data analysis.

    There are different ways of collecting data: surveys, observation, experiments, and the use of available documented data.

    Basic Steps in Performing a Statistical Inquiry

    4. Explore the data.

    Prior to data analysis, the investigators need to explore and understand the essential features of their data. This process allows them to determine whether their data satisfy the assumptions made in the derivation of the statistical techniques that they will use for analysis.

    5. Analyze the data and interpret the results.

    After collecting and organizing the data, analysis follows. The investigators once more carry out the plans specified in the research design, this time for data analysis. They then examine all the results in tables, charts, estimated summary measures, and tests of hypotheses, and check that they have met all of the specific objectives. Based on the analysis carried out, the investigators must be able to answer the research problem and give recommendations on how the results can be useful in decision making.

    The investigators must double-check results that contradict existing theory or earlier hypotheses, since they may have committed errors in data collection or analysis. If not, they have to propose possible explanations for these results or suggest future statistical inquiries that could help explain the inconsistency.

    Basic Steps in Performing a Statistical Inquiry

    6. Present the results.

    After analyzing the data and interpreting the results, the investigators must present these results in a clear and concise manner to the users of the research.

    Measurement

    Measurement is the process of determining the value or label of the variable based on what has been observed.

    The ratio level of measurement has all of the following properties:

    A) the numbers in the system are used to classify a person or object into distinct, nonoverlapping, and exhaustive categories;

    B) the system arranges the categories according to magnitude;

    C) the system has a fixed unit of measurement representing a set size throughout the scale; and

    D) the system has an absolute zero.

    Examples:
    1. allowance of a student (in pesos)
    2. distance traveled by an airplane (in km)
    3. speed of a car (in km/h)
    4. height of an adult (in cm)
    5. weight of a newborn baby (in kg)

    The interval level of measurement satisfies only the first three properties of the ratio level.

    The only difference between the interval and the ratio levels is the interpretation of the value 0 in their scales. The zero point in the interval level is not an absolute zero. Unlike in the ratio scale, the zero value in the interval scale has an arbitrary interpretation and does not mean the absence of the property we are measuring.

    Examples: temperature readings in degrees Celsius (°C), Intelligence Quotient (IQ)

    If the temperature is 0 °C, do we say that there is no temperature? Of course not, since 0 °C is not an absolute zero.
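    Expressing the same two temperatures in kelvins, a scale whose zero is absolute, shows why ratios are not meaningful on an interval scale. The sketch below uses arbitrary readings of 20 °C and 10 °C. The Celsius ratio suggests "twice as hot," while the kelvin ratio is only about 1.035. Differences, on the other hand, agree on both scales.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert degrees Celsius to kelvins (0 K is absolute zero)."""
    return celsius + 273.15

warm_c, cool_c = 20.0, 10.0  # arbitrary illustrative readings

# On the interval (Celsius) scale the ratio misleadingly looks like "twice as hot"
ratio_celsius = warm_c / cool_c
# On a scale with an absolute zero, the same two temperatures differ by only about 3.5%
ratio_kelvin = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

# Differences are preserved: both scales give a gap of 10 units
diff_celsius = warm_c - cool_c
diff_kelvin = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

print(f"Celsius ratio: {ratio_celsius:.3f}")
print(f"Kelvin ratio:  {ratio_kelvin:.3f}")
print(f"Differences: {diff_celsius:.2f} °C vs {diff_kelvin:.2f} K")
```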

    The ordinal level of measurement satisfies only the first two properties of the ratio level.

    Examples:

    1. Size of shirt (small, medium, large, extra large)
    2. Performance rating of a salesperson measured as follows: excellent, very good, good, satisfactory, poor
    3. Faculty rank: Professor, Associate Professor, Assistant Professor, Instructor

    The nominal level of measurement satisfies only the first property of the ratio level.

    Examples:

    1. Gender
    2. Civil status
    3. Type of movie (action, romance, comedy, others)
    4. Major island group (Luzon, Visayas, Mindanao)
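    Each level keeps the properties of the level below it and adds one more, so the four levels can be sketched as an ordered type. The following are illustrative assumptions, not part of the lecture: the short paraphrases of properties A to D, the hypothetical `allows` helper, and the mapping of comparisons to levels (equality needs A, order needs B, differences need C, ratios need D).

```python
from enum import IntEnum

class Level(IntEnum):
    """Levels of measurement, ordered by how many of properties A-D they satisfy."""
    NOMINAL = 1   # A only
    ORDINAL = 2   # A and B
    INTERVAL = 3  # A, B, and C
    RATIO = 4     # A, B, C, and D

# Short paraphrases of properties A-D, in order
PROPERTIES = (
    "A: distinct, nonoverlapping, exhaustive categories",
    "B: categories arranged by magnitude",
    "C: fixed unit of measurement",
    "D: absolute zero",
)

def properties_of(level: Level) -> tuple[str, ...]:
    """Return the properties a level satisfies (the first `level` of A-D)."""
    return PROPERTIES[: int(level)]

def allows(level: Level, operation: str) -> bool:
    """Hypothetical helper: is this kind of comparison meaningful at this level?"""
    minimum = {"equality": Level.NOMINAL, "order": Level.ORDINAL,
               "difference": Level.INTERVAL, "ratio": Level.RATIO}
    return level >= minimum[operation]

examples = {
    "type of movie": Level.NOMINAL,
    "size of shirt": Level.ORDINAL,
    "temperature in °C": Level.INTERVAL,
    "weight of a newborn baby in kg": Level.RATIO,
}
for variable, level in examples.items():
    print(f"{variable}: {level.name}, ratios meaningful? {allows(level, 'ratio')}")
```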

    Data Collection Methods

    1. Use of available documented data in published or unpublished studies

    2. Surveys

    3. Experiments

    4. Observations

    Collection of Data

    Primary Data

    are data documented by the primary source: the data collectors themselves documented the data.

    Secondary Data

    are data documented by a secondary source: an individual or agency other than the data collectors documented the data.

    Collection of Data

    Survey

    is a method of collecting data on the variable of interest by asking people questions. When the data come from asking all the people in the population, the study is called a census. On the other hand, when the data come from asking a sample of people selected from a well-defined population, the study is called a sample survey.

    Collection of Data

    Experiment

    is a method of collecting data where there is direct human intervention on the conditions that may affect the values of the variable of interest.

    Collection of Data

    Observation method

    is a method of collecting data on the phenomenon of interest by recording the observations made about the phenomenon as it actually happens.

    Collection of Data

    Examples:

    1) A local TV network asked voters to indicate whom they voted for as they exited the polling booth. (Survey)

    2) A private hospital divides terminally ill patients into two groups, with one group receiving medication A and the other group receiving medication B. After a month, the hospital measures each subject's improvement. (Experiment)

    3) A researcher investigates the level of pollution at key points in Metro Manila by setting up pollution-measuring devices at selected intersections. (Observation)

    Collection of Data

    Questionnaire

    is a measurement instrument used in various data collection methods, particularly surveys. We use a questionnaire to determine and record the measurements of characteristics of the elements in a study, such as height, weight, color, size, attitude, past and present behavior, and opinions.

    Two types of questionnaire used in surveys:

    1. Self-administered questionnaire
    2. Interview schedule

    Collection of Data

    Types of Questions

    1. Closed-ended question

    is a type of question that includes a list of response categories from which the respondent selects his or her answer.

    2. Open-ended question

    is a type of question that does not include response categories.

    Collection of Data

    Advantages of open-ended questions:
    - Respondent can freely answer
    - Can elicit feelings and emotions of the respondent
    - Can reveal new ideas and views that the researcher might not have considered
    - Good for complex issues
    - Good for questions whose possible responses are unknown
    - Allows respondent to clarify answers
    - Gets detailed answers
    - Shows how respondents think

    Advantages of closed-ended questions:
    - Facilitates tabulation of responses
    - Easy to code and analyze
    - Saves time and money
    - High response rate since it is simple and quick to answer
    - Response categories make questions easy to understand
    - Can repeat the study and easily make comparisons

    Collection of Data

    Disadvantages of open-ended questions:
    - Difficult to tabulate and code
    - High refusal rate because it requires more time and effort from the respondent
    - Respondent needs to be articulate
    - Responses can be inappropriate or vague
    - May threaten the respondent
    - Responses have different levels of detail

    Disadvantages of closed-ended questions:
    - Increases respondent burden when there are too many or too few response categories
    - Biases responses against categories excluded from the list of choices
    - Difficult to detect if the respondent misinterpreted the question

    Collection of Data

    Pitfalls To Avoid in Wording Questions

    1. Avoid Vague Questions

    2. Avoid Biased Questions

    3. Avoid Confidential and Sensitive Questions

    4. Avoid Questions that are Difficult to Answer

    5. Avoid Questions that are Confusing or Perplexing to Answer

    6. Keep the question short and simple

    Sampling and Sampling Techniques

    Advantages of Sampling

    1. Sampling is more economical.

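    2. A study based on a sample requires less time to accomplish.

    3. Sampling allows for a wider scope for the study.

    4. Results of studies based on a sample can even be more accurate, since a smaller data collection effort can be supervised more closely, reducing nonsampling errors.

    5. Sampling is sometimes the only feasible method, for example when measuring an element destroys it.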

    Sampling and Sampling Techniques

    Target population is the population we want to study.

    Sampled population is the population from which we actually select the sample.

    Elementary unit or element is a member of the population whose measurement on the variable of interest is what we wish to examine.

    Sampling unit is a unit of the population that we select in our sample.

    Sampling and Sampling Techniques

    Target population: set of all establishments in the manufacturing, mining, and agriculture industries

    Elementary unit: an establishment (an economic unit that engages, under a single ownership or control, in one or predominantly one kind of economic activity at a single fixed physical location) in the manufacturing, mining, and agriculture industries

    Sampling unit: an enterprise (an economic unit consisting of one or more establishments under a single ownership or control) in the manufacturing, mining, and agriculture industries

    Sampling and Sampling Techniques

    Sampling Frame or Frame

    is a list or map showing all the sampling units in the population.

    Example

    Suppose a researcher is interested in getting the opinion of eligible voters on the media campaigns of candidates running for top positions in the government.

    Target population: set of all eligible voters

    Sampling frame: Commission on Elections (COMELEC) list of registered voters

    Sampled population: set of registered voters in the COMELEC list. This sampled population excludes the eligible voters who did not register or did not revalidate their registration with COMELEC during the registration period.

    Sampling and Sampling Techniques

    Sampling Error

    is the error attributed to the variation present among the computed values of the statistic from the different possible samples consisting of n elements.

    Nonsampling Error

    is the error from sources other than sampling fluctuations.

    Sampling and Sampling Techniques

    Sampling error occurs when we collect data from a sample and not from all the elements in the population. It is an error inherent in results based on a sample.
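    Sampling error becomes visible in a simulation where the whole population is known. Draw several simple random samples of the same size, then compare each sample mean with the population mean. The population of incomes below is randomly generated and purely hypothetical. Nothing else can go wrong in this setup, so every deviation is sampling error.

```python
import random
import statistics

random.seed(42)  # reproducible illustration
# Hypothetical population of N = 1,000 monthly incomes (in pesos)
population = [round(random.uniform(8_000, 60_000)) for _ in range(1_000)]
mu = statistics.mean(population)  # parameter: known here only because the data are simulated

n = 30
sample_means = []
for _ in range(5):
    sample = random.sample(population, n)         # simple random sample without replacement
    sample_means.append(statistics.mean(sample))  # statistic

# With no measurement or frame problems in this setup, each deviation is sampling error
for xbar in sample_means:
    print(f"sample mean = {xbar:10.2f}   error = {xbar - mu:+9.2f}")
```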

    Classifications of Nonsampling Error

    1. Measurement error
    2. Error in the implementation of the sampling design

    Sampling and Sampling Techniques

    Measurement error

    is the difference between the true value of the variable and the observed value used in the study. This occurs when we use a faulty measurement instrument or when we do not use the instrument properly.

    Error in the implementation of the sampling design

    occurs when we do not adhere to the procedures and requirements specified in the sampling design.

    Sampling and Sampling Techniques

    Total Error
        Sampling Error
        Nonsampling Error
            Error in the Implementation of the Sampling Design
                Selection Error
                Frame Error
                Population Specification Error
                Surrogate Information Error
            Measurement Error
                Instrument Error
                Response Error
                Response Bias
                Nonresponse Bias
                Processing Error
                Interviewer Bias

    Sampling and Sampling Techniques

    Measurement Errors

    1. Interviewer bias may occur when an enumerator reacts to a respondent's reply.

    2. Errors in editing and coding

    3. Bias occurs when a respondent tends to answer items in a socially acceptable manner instead of truthfully.

    4. Errors in conversion from one unit of measurement to another

    5. Response set occurs when a respondent agrees with all the statements without careful consideration of each one.

    6. Faulty measurement devices, such as a weighing scale that is not properly calibrated

    Sampling and Sampling Techniques

    Errors in the Implementation of the Sampling Design

    1. The sampling frame defines a sampled population that is too far from the target population.

    2. The sampling frame is outdated.

    3. A complicated sample selection procedure is done in the field by confused enumerators who incorrectly select the respondents included in the sample.

    4. Lazy enumerators do not follow the specified sample selection procedure.

    5. The target population is the target consumers of a particular brand, but the researchers incorrectly define the qualifications of the target consumers.

    Sampling and Sampling Techniques

    Probability Sampling

    is a method of selecting a sample wherein each element in the population has a known, nonzero chance of being included in the sample; otherwise, it is nonprobability sampling.

    Sampling and Sampling Techniques

    Probability Sampling Methods

    1. Simple Random Sampling
    2. Stratified Sampling
    3. Systematic Sampling
    4. Cluster Sampling
    5. Multistage Sampling

    Probability Sampling Methods

    Simple random sampling is a probability sampling method wherein all possible subsets consisting of n elements selected from the N elements of the population have the same chance of selection.

    In simple random sampling without replacement (SRSWOR), all the n elements in the sample must be distinct from each other.

    In simple random sampling with replacement (SRSWR), the n elements in the sample need not be distinct; that is, an element can be selected more than once to be a part of the sample.
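    Python's standard `random` module provides both schemes: `random.sample` selects without replacement and `random.choices` selects with replacement. The frame of ten numbered elements below is hypothetical. `math.comb` gives the number of distinct SRSWOR samples, each of which has the same chance of selection.

```python
import math
import random

# Hypothetical sampling frame: N = 10 elements numbered 1 to 10
frame = list(range(1, 11))
n = 4

random.seed(7)
# SRSWOR: random.sample never selects the same element twice
srswor = random.sample(frame, k=n)
# SRSWR: random.choices draws independently, so an element may repeat
srswr = random.choices(frame, k=n)

# Under SRSWOR, each of the C(N, n) possible samples is equally likely
possible_samples = math.comb(len(frame), n)

print("SRSWOR:", srswor)
print("SRSWR: ", srswr)
print("Possible SRSWOR samples:", possible_samples,
      "each with probability", 1 / possible_samples)
```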

    Probability Sampling Methods

    Systematic sampling is a probability sampling method wherein the first element is selected at random and the other elements in the sample are selected systematically, by taking every kth element from the random start, where k is the sampling interval.
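    Below is a minimal sketch of the selection step. It assumes the elements are numbered 1 to N and takes k = N // n (rounded down), one common convention when N is not a multiple of n. When that convention leaves more than n candidates, this version simply keeps the first n. The function name `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(N: int, n: int, rng: random.Random) -> list[int]:
    """Return element numbers chosen by 1-in-k systematic sampling."""
    k = N // n                 # sampling interval (rounded down)
    start = rng.randint(1, k)  # random start between 1 and k, inclusive
    # the random start, then every kth element after it; keep the first n
    return list(range(start, N + 1, k))[:n]

selected = systematic_sample(N=100, n=10, rng=random.Random(3))
print(selected)  # ten element numbers spaced 10 apart
```

    Only the random start involves chance. Once it is drawn, the rest of the sample is fixed, which is why periodic patterns in the list can make estimates unreliable.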

    Probability Sampling Methods

    Cluster sampling is a probability sampling method wherein we divide the population into nonoverlapping groups or clusters consisting of one or more elements, and then select a sample of clusters. The sample will consist of all the elements in the selected clusters.
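    The sketch below applies one-stage cluster sampling to a hypothetical population of class sections. It draws a simple random sample of clusters, and every element in each chosen cluster enters the sample.

```python
import random

# Hypothetical population grouped into nonoverlapping clusters (class sections)
clusters = {
    "Section A": ["A1", "A2", "A3"],
    "Section B": ["B1", "B2"],
    "Section C": ["C1", "C2", "C3", "C4"],
    "Section D": ["D1", "D2", "D3"],
}

rng = random.Random(11)
# Select a simple random sample of clusters (here, 2 of the 4)
chosen = rng.sample(sorted(clusters), k=2)
# Every element of each chosen cluster is included
sample = [element for name in chosen for element in clusters[name]]

print("Chosen clusters:", chosen)
print("Sample:", sample)
```

    Only a list of clusters is needed to make the selection. The elements are listed only for the clusters that were chosen.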

    Probability Sampling Methods

    Multistage sampling is a probability sampling method where there is a hierarchical configuration of sampling units and we select a sample of these units in stages.
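    Here is a two-stage sketch, assuming a hypothetical hierarchy of barangays (first-stage units) that contain households (second-stage units). Unlike one-stage cluster sampling, only a random subset of households is taken within each selected barangay.

```python
import random

# Hypothetical hierarchy: 8 barangays, each containing 20 households
households = {
    f"Barangay {b}": [f"HH-{b}-{h}" for h in range(1, 21)]
    for b in range(1, 9)
}

rng = random.Random(5)
# Stage 1: simple random sample of 3 barangays
stage1 = rng.sample(sorted(households), k=3)
# Stage 2: simple random sample of 4 households within each selected barangay
stage2 = {b: rng.sample(households[b], k=4) for b in stage1}

for barangay, selected in stage2.items():
    print(barangay, selected)
```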

    Basic Methods of Probability Sampling

    1. Simple Random Sampling

    Procedure: List the elements and number them from 1 to N. Select n numbers from 1 to N using a randomization mechanism. The sample will consist of the elements corresponding to the numbers selected.

    Advantages: The design is simple and easy to understand. Estimation methods are simple and easy.

    Disadvantages: It needs a list of all elements in the population. The sample size must be very large for heterogeneous populations in order to get reliable results. Transportation cost is high if the elements are widely spread geographically.

    When to use: If the elements are homogeneous with respect to the characteristic under study. If the elements are not so spread out geographically.

    Basic Methods of Probability Sampling

    2. Stratified Random Sampling

    Procedure: Divide the population into nonoverlapping strata. Obtain a simple random sample from each stratum. The sample consists of the selected samples in all the strata.

    Advantages: Estimates are more reliable than those from SRS of the same sample size if the population has been divided into strata whose elements are homogeneous but which are very different from each other. Estimation of the parameter for each subpopulation is easier when compared to other sampling methods. It can facilitate the administration and supervision of data collection, especially when the stratification variable is a geographic subdivision.

    Disadvantages: It needs a list of all elements of the population, including their values of the stratification variable. Transportation cost is high if the elements are widely spread geographically, unless there are field offices in each geographic area.

    When to use: If the population is heterogeneous with respect to the characteristic under study. If we want to perform separate analyses for certain subpopulations. If we wish to facilitate the administration of data collection.
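    The stratified procedure can be sketched as a simple random sample drawn separately within each stratum. The procedure does not say how many units to take from each stratum. This sketch assumes proportional allocation, where each stratum's share of the sample matches its share of the population. The strata of students by year level are hypothetical.

```python
import random

def stratified_sample(strata: dict[str, list], n: int, rng: random.Random) -> dict[str, list]:
    """SRS within each stratum, with sizes proportional to stratum sizes (rounded)."""
    N = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        n_h = round(n * len(units) / N)          # proportional allocation
        sample[name] = rng.sample(units, k=n_h)  # SRS without replacement within the stratum
    return sample

# Hypothetical strata: 1,000 students grouped by year level
strata = {
    "1st year": [f"S1-{i}" for i in range(400)],
    "2nd year": [f"S2-{i}" for i in range(300)],
    "3rd year": [f"S3-{i}" for i in range(200)],
    "4th year": [f"S4-{i}" for i in range(100)],
}
result = stratified_sample(strata, n=50, rng=random.Random(2))
print({name: len(units) for name, units in result.items()})
# {'1st year': 20, '2nd year': 15, '3rd year': 10, '4th year': 5}
```

    With other stratum sizes, rounding can make the allocated sizes sum to slightly more or less than n. Here they add up exactly.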

    Basic Methods of Probability Sampling

    3. Systematic Sampling

    Procedure: Assign a unique number from 1 to N to each element of the population. Determine the sampling interval, k. Obtain the first element in the sample using a randomization mechanism. Get the rest of the elements in the sample by taking every kth element from the random start.

    Advantages: Identifying the units in the sample is easy. The design does not require a list of all elements in the population. The sample is distributed evenly over the entire population. It gives more reliable estimates than simple random sampling when the elements in the sampling frame are arranged according to magnitude.

    Disadvantages: Estimates may not be reliable when there are periodic regularities in the list. It requires information on the arrangement of the elements in the sampling frame to determine the reliability of the estimates.

    When to use: If there is no available list of elements in the population. If the elements in the sampling frame are arranged according to magnitude.

    Basic Methods of Probability Sampling

    4. Cluster Sampling

    Procedure: Divide the population into nonoverlapping clusters. Select a sample of clusters using simple random sampling. The sample consists of all the elements in the selected clusters.

    Advantages: The design needs only a list of clusters and not a list of elements. Transportation and listing costs are usually lower.

    Disadvantages: Estimates are usually less reliable when compared to other sampling designs. It is not cost-efficient if the clusters are large and the elements are homogeneous with respect to the characteristic under study.

    When to use: If there is no available list of elements. If cost is more important than the reliability of the estimates.

    5. Multistage Sampling

    Procedure: Select the sample in several stages.

    Advantages: Reduced transportation and listing costs.

    Disadvantages: Difficult estimation procedures. The design needs thorough planning before performing sample selection.

    When to use: If the geographic coverage of the population of interest is wide. If no listing of the elementary units in the population is available.

    Methods of Nonprobability Sampling

    Nonprobability sampling methods do not make use of any randomization mechanism in identifying the sampling units included in the sample. Rather, they allow the researcher to choose the units in the sample subjectively. Since the selection of the sample is subjective, there is consequently no objective way of assessing the reliability of the results without making assumptions that are oftentimes difficult to verify.

    Methods of Nonprobability Sampling

    Haphazard or Convenience Sampling

    The sample consists of elements that are most accessible or easiest to contact. This usually includes friends, acquaintances, volunteers, and subjects who are available and willing to participate at the time of the study.

    Example: The adviser of a student organization is conducting research on the study habits of students in the university. To select a sample, the adviser includes the members of the student organization because it is easy to reach them and get data from them.

    Methods of Nonprobability Sampling

    Judgement or Purposive Sampling

    The researcher chooses a sample that agrees with his or her subjective judgement of a representative sample.

    Methods of Nonprobability Sampling

    Quota Sampling is the nonprobability sampling version of stratified sampling. In quota sampling, the researcher also chooses the groupings or strata in the study, but the selection of the sampling units within each stratum does not make use of a probability sampling method. The researcher just sets a quota, or number of sampling units to be included in each grouping, but uses convenience sampling to select the units within each grouping.
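    Here is a sketch of the quota idea. It assumes convenience is modeled as accepting people in the order they happen to arrive, with no randomization at all. The arrival stream, the grouping by sex, and the quotas are hypothetical, and so is the function name `quota_sample`.

```python
from collections import Counter

def quota_sample(arrivals, group_of, quotas):
    """Accept arrivals in order (convenience) until every grouping's quota is filled."""
    taken = Counter()
    sample = []
    for person in arrivals:
        group = group_of(person)
        if taken[group] < quotas.get(group, 0):  # quota for this grouping not yet met
            sample.append(person)
            taken[group] += 1
        if all(taken[g] >= q for g, q in quotas.items()):
            break                                # all quotas filled
    return sample

# Hypothetical stream of mall-goers as (name, sex) pairs, in order of arrival
arrivals = [("Ana", "F"), ("Ben", "M"), ("Carla", "F"), ("Dan", "M"),
            ("Ella", "F"), ("Fe", "F"), ("Gio", "M"), ("Hana", "F")]
quotas = {"F": 3, "M": 2}

chosen = quota_sample(arrivals, group_of=lambda p: p[1], quotas=quotas)
print([name for name, _ in chosen])
# ['Ana', 'Ben', 'Carla', 'Dan', 'Ella']
```

    Fe, Gio, and Hana are never considered because they happened to arrive after the quotas were filled. No known selection probability exists here, which is what makes the method nonprobability sampling.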

    Sampling and Sampling Techniques

    Sample Size Determination

    An important component of the sampling design is the sample size. The number of elements that you include in the sample must not be too small, because this will not allow you to come up with reliable estimates.

    In determining the sample size, you should always consider the reliability of the results of the study and, at the same time, the cost involved in doing the study.

    Presentation of Data

    Textual Presentation

    Textual presentation of data incorporates important figures in a paragraph of text.

    Tabular Presentation

    Tabular presentation of data arranges figures in a systematic manner in rows and columns.

    Graphical Presentation

    Graphical presentation of data portrays numerical figures or relationships among variables in pictorial form.


    Organization of Data

    Raw Data

    are data in their original form.

    Array

    is an ordered arrangement of data according to magnitude. We also refer to the array as sorted data or ordered data.


    Organization of Data

    Frequency Distribution

    Frequency distribution is a way of summarizing data by showing the number of observations that belong in the different categories or classes. We also refer to this as grouped data.


    Frequency Distribution

    (a) Class size 10

    Final Grade   No. of Students
    40-49                       8
    50-59                      23
    60-69                      42
    70-79                      62
    80-89                      58
    90-99                      17
    Total                     210

    (b) Class size 7

    Final Grade   No. of Students
    40-46                       7
    47-53                       9
    54-60                      18
    61-67                      30
    68-74                      41
    75-81                      48
    82-88                      39
    89-95                      13
    96-102                      5
    Total                     210

    (c) Class size 6

    Final Grade   No. of Students
    40-45                       7
    46-51                       6
    52-57                      10
    58-63                      24
    64-69                      26
    70-75                      35
    76-81                      45
    82-87                      34
    88-93                      13
    94-99                      10
    Total                     210


    Frequency Distribution

    Class Interval is the range of values that belong in the class or category.

    Class Frequency is the number of observations that belong in a class interval.

    Class Limits are the end numbers used to define the class interval. The lower class limit (LCL) is the lower end number, while the upper class limit (UCL) is the upper end number.

    Class Boundaries are the true limits. If the observations are rounded figures, then we identify the class boundaries based on the standard rules of rounding as follows: the lower class boundary (LCB) is halfway between the lower class limit of the class and the upper class limit of the preceding class, while the upper class boundary (UCB) is halfway between the upper class limit of the class and the lower class limit of the next class.


    Frequency Distribution

    Class Size is the size of the class interval. It is the difference between the upper class boundaries of the class and the preceding class, or the difference between the lower class boundaries of the next class and the class. We can also use the class limits in place of the class boundaries.

    Class Mark is the midpoint of a class interval. It is the average of the lower class limit and the upper class limit.


    Frequency Distribution

    Final Grade   No. of Students
    40-49                       8
    50-59                      23
    60-69                      42
    70-79                      62
    80-89                      58
    90-99                      17
    Total                     210

    40-49: the first class interval; 40 is the lower class limit and 49 is the upper class limit.

    39.5-49.5: the lower and upper class boundaries of the first class.

    8: the class frequency for the class interval 40-49.

    Class size: 50 - 40 = 10.

    Class mark or midpoint: (40 + 49)/2 = 44.5.
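The same quantities can be computed for every class of this table. Below is a minimal Python sketch. It assumes, as holds for this table, whole-number grades and equal class widths, so each boundary lies halfway across the one-point gap between the UCL of one class and the LCL of the next. The helper name `class_summary` is hypothetical and not part of any library.

```python
# (lower class limit, upper class limit, frequency) for the final-grade table
CLASSES = [(40, 49, 8), (50, 59, 23), (60, 69, 42),
           (70, 79, 62), (80, 89, 58), (90, 99, 17)]


def class_summary(classes):
    """Return the class size and rows of (LCL, UCL, LCB, UCB, class mark, frequency).

    Assumes equal class widths and the same gap between the UCL of one class
    and the LCL of the next (1 for whole-number grades).
    """
    gap = classes[1][0] - classes[0][1]   # 50 - 49 = 1
    size = classes[1][0] - classes[0][0]  # difference of consecutive lower limits
    rows = []
    for lcl, ucl, freq in classes:
        lcb = lcl - gap / 2               # halfway to the previous class's UCL
        ucb = ucl + gap / 2               # halfway to the next class's LCL
        mark = (lcl + ucl) / 2            # midpoint of the class interval
        rows.append((lcl, ucl, lcb, ucb, mark, freq))
    return size, rows


size, rows = class_summary(CLASSES)
print("class size:", size)
for lcl, ucl, lcb, ucb, mark, freq in rows:
    print(f"{lcl}-{ucl}: boundaries {lcb}-{ucb}, class mark {mark}, f = {freq}")
```

The first printed row reads `40-49: boundaries 39.5-49.5, class mark 44.5, f = 8`, matching the values worked out by hand.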


    Frequency Distribution

    Steps in the Construction of a Frequency Distribution

    1. Determine the adequate number of classes K. This is usually between 5 and 20, or K = 1 + 3.322 log n (Sturges' rule), where log is the common logarithm (base 10) and n is the number of observations.

    2. Determine the range: R = highest observed value - lowest observed value.

    3. Compute C' = R/K.

    4. Determine the class size C by rounding off C' to a convenient number.

    5. Choose the lower class limit of the first class. This is usually based on the lowest observed value.

    6. Tally all the observed values in each class interval.

    7. Sum the frequency column and check it against the total number of observations.
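The seven steps can be sketched in Python as follows. This is an illustration under stated assumptions, not a prescribed procedure. The observations are assumed to be whole numbers. Step 4 rounds the value R/K from step 3 up with `math.ceil`, which is one of several "convenient" choices. The grade list is hypothetical. Because rounding changes how far the classes reach, the loop keeps adding classes until the largest value is covered, so the final number of classes can differ slightly from K.

```python
import math


def sturges_k(n):
    """Step 1: Sturges' rule, K = 1 + 3.322 log10(n), rounded to a whole number."""
    return round(1 + 3.322 * math.log10(n))


def build_frequency_distribution(data, first_lcl=None):
    """Steps 1-7 for whole-number observations; returns [(LCL, UCL, frequency), ...]."""
    n = len(data)
    k = sturges_k(n)                        # step 1: number of classes
    r = max(data) - min(data)               # step 2: range
    c_prime = r / k                         # step 3: R / K
    c = max(1, math.ceil(c_prime))          # step 4: assumption, round R / K up
    lcl = min(data) if first_lcl is None else first_lcl  # step 5
    table = []
    while lcl <= max(data):                 # keep adding classes until max is covered
        ucl = lcl + c - 1                   # whole numbers: next class starts at ucl + 1
        freq = sum(1 for x in data if lcl <= x <= ucl)  # step 6: tally
        table.append((lcl, ucl, freq))
        lcl += c
    # Step 7: the frequencies must add up to the number of observations.
    assert sum(f for _, _, f in table) == n, "some observations were not tallied"
    return table


grades = [40, 45, 52, 58, 61, 63, 67, 70, 72, 75, 78, 81, 85, 88, 93, 99]  # hypothetical
for lcl, ucl, freq in build_frequency_distribution(grades):
    print(f"{lcl}-{ucl}: {freq}")
```

For these 16 hypothetical grades, Sturges' rule gives K = 5. The range is 59, and R/K = 11.8 rounds up to 12. This produces the classes 40-51, 52-63, 64-75, 76-87, and 88-99 with frequencies 2, 4, 4, 3, and 3. For n = 210, the rule gives 1 + 3.322(2.322) ≈ 8.71, which rounds to 9 classes.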


    Frequency Distribution

    Less Than Cumulative Frequency Distribution (<CFD) shows the number of observations with values less than or equal to the upper class boundary. Greater Than Cumulative Frequency Distribution (>CFD) shows the number of observations with values greater than or equal to the lower class boundary.

    Cumulative Frequency Distribution is another variation of the frequency distribution. We use this to determine how many observations have values smaller than or greater than a specified class boundary. It shows the accumulated frequencies of successive classes, either at the beginning or at the end of the distribution.


    Frequency Distribution

    Final Grade   No. of Students    <CFD    >CFD
    40-49                       8       8     210
    50-59                      23      31     202
    60-69                      42      73     179
    70-79                      62     135     137
    80-89                      58     193      75
    90-99                      17     210      17
    Total                     210
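Both cumulative columns can be reproduced from the class frequencies alone. Below is a short Python sketch that uses `itertools.accumulate` from the standard library. The less than CFD is a running total that starts from the first class. The greater than CFD is a running total that starts from the last class and is then read back in the original order.

```python
from itertools import accumulate

freqs = [8, 23, 42, 62, 58, 17]                      # classes 40-49 ... 90-99
less_than = list(accumulate(freqs))                  # running total from the first class
greater_than = list(accumulate(freqs[::-1]))[::-1]   # running total from the last class

print(less_than)     # [8, 31, 73, 135, 193, 210]
print(greater_than)  # [210, 202, 179, 137, 75, 17]
```

The last entry of the less than column and the first entry of the greater than column both equal the total, 210. This gives the same check as step 7 of the construction.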


    Graphical Presentation of the Frequency Distribution

    1. Frequency Histogram

    2. Frequency Polygon

    3. Less Than Ogive

    4. Greater Than Ogive

    5. Pie Chart


    Measures of Central Tendency

    Measures of central tendency, or "location", attempt to quantify what we mean when we think of the "typical" or "average" score in a data set. The concept is extremely important, and we encounter it frequently in daily life.

    For example, before purchasing a car we often want to know its average distance per litre of petrol. Or, before accepting a job, you might want to know what a typical salary is for people in that position, so you will know whether or not you are going to be paid what you are worth. Or, if you are a smoker, you might often think about how many cigarettes you smoke "on average" per day.

    Statistics geared toward measuring central tendency all focus on this concept of "typical" or "average." As we will see, we often ask questions in psychological science revolving around how groups differ from each other "on average". Answers to such a question tell us a lot about the phenomenon or process we are studying.

    We also use measures of central tendency to facilitate the comparison of two or more data sets. For example, a teacher may want to answer the question, "Who performed better in the exam, the girls or the boys?" The teacher can then compare the average score of the girls with the average score of the boys. If the average score of the girls is higher, then the teacher can conclude that the girls generally performed better than the boys in the exam.


    Measures of Central Tendency

    The Arithmetic Mean

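    The arithmetic mean, or simply the mean, is the most common type of average. It is the sum of all the observed values divided by the number of observations.

    Population mean: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$, where N is the population size.

    Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, where n is the sample size.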
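Below is a minimal Python sketch of the formula. The scores are hypothetical. The same arithmetic gives μ when the list holds a whole population and x̄ when it holds a sample. The standard library's `statistics.mean` is shown only as a cross-check.

```python
import statistics


def arithmetic_mean(values):
    """Sum of all observed values divided by the number of observations."""
    if not values:
        raise ValueError("the mean of an empty collection is undefined")
    return sum(values) / len(values)


scores = [85, 90, 78, 92, 88]        # hypothetical exam scores
print(arithmetic_mean(scores))       # 86.6
print(statistics.mean(scores))       # 86.6
```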


    Measures of Central Tendency

    The Weighted Mean

    When all the individual observed values have equal importance, we compute the arithmetic mean. On the other hand, if we believe that the individual observed values vary in their degree of importance, then it is advisable to use a modification of the mean that we call the weighted mean. The weighted mean assigns weights to the observations depending on their relative importance.
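Below is a sketch of the weighted mean, x̄_w = Σwᵢxᵢ / Σwᵢ, in Python. The grades and units are hypothetical. They are chosen to mimic a general weighted average (GWA), where each grade is weighted by the number of units of the course. The helper name is illustrative only.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights w_i."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


grades = [90, 85, 80]   # hypothetical course grades
units = [3, 3, 5]       # hypothetical course units, used as weights
print(round(weighted_mean(grades, units), 2))   # 84.09
print(round(sum(grades) / len(grades), 2))      # 85.0 (unweighted, for comparison)
```

The 5-unit course counts more heavily, so its lower grade pulls the weighted mean below the simple mean.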


    Measures of Central Tendency

    The Trimmed Mean

    We have noted that the mean may not be a good measure of central tendency whenever there are outliers. An outlier pulls the value of the mean in its direction and farther away from the rest of the observations. However, a modification of the arithmetic mean, called the trimmed mean, addresses this problem: it discards a fixed percentage of the smallest and of the largest observations and computes the mean of the remaining values.
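The text does not fix how much to trim. This sketch therefore assumes a `proportion` parameter, 10% from each end by default, and cuts floor(proportion × n) observations from each end of the array. The data set is hypothetical, with 50 acting as an outlier.

```python
import math


def trimmed_mean(values, proportion=0.10):
    """Drop floor(proportion * n) smallest and largest values, then average the rest."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)                      # the array
    k = math.floor(proportion * len(ordered))     # number cut from each end
    kept = ordered[k:len(ordered) - k]            # explicit end index also works when k == 0
    return sum(kept) / len(kept)


data = [2, 3, 3, 4, 5, 5, 6, 7, 8, 50]   # hypothetical; 50 is an outlier
print(sum(data) / len(data))             # 9.3   ordinary mean, pulled toward 50
print(trimmed_mean(data, 0.10))          # 5.125 after dropping 2 and 50
```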


    Measures of Central Tendency

    The Median

    The median divides an ordered set of observations into two equal parts. In other words, it is the measure occupying the positional center of the array. If an observation is smaller than the median, then it belongs in the lower half of the array; and if an observation is larger than the median, then it belongs in the upper half of the array.
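Below is a minimal Python sketch of the median. It uses the usual convention of averaging the two middle values when the number of observations is even. The inputs are hypothetical. The last call shows that an extreme value does not pull the median the way it pulls the mean.

```python
def median(values):
    """Middle value of the array; average of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty collection is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 3]))         # 3
print(median([4, 1, 3, 2]))      # 2.5
print(median([7, 1, 3, 1000]))   # 5.0, unaffected by the size of 1000
```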


    Measures of Central Tendency

    The Mode

    The mode is the most frequent observed value in the data set. It is the observed value that occurs the greatest number of times. If the data set is small, we easily see the mode through inspection. However, as the data set becomes large, finding the mode is quite tedious without a computer. Generally, the mode is a less popular measure of central tendency than the mean and the median.
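The mode might not exist or might not be unique, so this Python sketch returns a list of modes. It assumes one common convention: when every value occurs equally often there is no mode, and the list is empty. The standard library's `statistics.multimode` instead returns every value in that case. The sample lists are hypothetical.

```python
from collections import Counter


def modes(values):
    """All most-frequent values; [] when every value occurs equally often (no mode)."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []   # assumed convention: no value stands out, so there is no mode
    return [v for v, c in counts.items() if c == top]


print(modes([1, 2, 2, 3, 3, 4]))              # [2, 3]      not unique (bimodal)
print(modes([5, 8, 9]))                       # []          no mode
print(modes(["Female", "Male", "Female"]))    # ['Female']  works for nominal data
```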


    Summary of the Different Measures of Central Tendency

    Mean (center of mass)
      Definition: sum of all the values in the collection divided by the total number of elements in the collection
      Data requirement: at least interval scale, and values that are close to each other
      Existence/Uniqueness: always exists / always unique
      Takes into account every value? Yes
      Affected by outliers? Yes
      Can treat formula algebraically? Yes

    Median (center of the array)
      Definition: divides the array into two equal parts
      Data requirement: at least ordinal scale
      Existence/Uniqueness: always exists / always unique
      Takes into account every value? No
      Affected by outliers? No
      Can treat formula algebraically? No

    Mode (typical value)
      Definition: most frequent value
      Data requirement: even if nominal scale only
      Existence/Uniqueness: might not exist / not always unique
      Takes into account every value? No
      Affected by outliers? No
      Can treat formula algebraically? No


    Measures of Location

    A measure of location provides information on the percentage of observations in the collection whose values are less than or equal to it. We also commonly refer to these measures of location as quantiles or fractiles.

    The three measures of location are percentiles, quartiles, and deciles.


    Measures of Location

    Percentiles divide the ordered observations into

    100 equal parts.

    Quartiles divide the ordered observations into 4

    equal parts.

    Deciles divide the ordered observations into 10

    equal parts.
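Python's standard library can compute all three measures with `statistics.quantiles` (Python 3.8+). The call `statistics.quantiles(data, n=k)` returns the k - 1 cut points that divide the ordered data into k equal parts. Textbooks use several slightly different positioning rules. The `method="exclusive"` option used here places the cut for a fraction p at position (N + 1)p of the array, where N is the number of observations, and interpolates linearly between neighbours. Values from other rules may therefore differ slightly. The data are hypothetical.

```python
import statistics

data = list(range(1, 10))   # hypothetical ordered observations 1, 2, ..., 9

quartiles = statistics.quantiles(data, n=4, method="exclusive")      # Q1, Q2, Q3
deciles = statistics.quantiles(data, n=10, method="exclusive")       # D1 ... D9
percentiles = statistics.quantiles(data, n=100, method="exclusive")  # P1 ... P99

print(quartiles)                     # [2.5, 5.0, 7.5]
print(deciles[4], percentiles[49])   # 5.0 5.0, since D5 = P50 = Q2 = median
```

Q1 = 2.5 because (9 + 1)(0.25) = 2.5 falls halfway between the 2nd and 3rd ordered observations.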


    Measures of Dispersion

    A measure of dispersion is a descriptive summary measure that helps us characterize the data set in terms of how varied the observations are from each other. This measure allows us to determine the degree of dispersion of the observations about the center of the distribution. If its value is small, then this indicates that the observations are not too different from each other, so that there is a concentration of observations about the center. On the other hand, if its value is large, then this indicates that the observations are very different from each other, so that they are widely spread out from the center.


    Measures of Dispersion

    The Variance and Standard Deviation

    The variance is a measure of dispersion, so we can use it to describe the variation of the measurements in the collection. It is defined as the average squared deviation, or difference, of each observation from the mean. The squared difference of an observation from the mean gives us an idea of how close this observation is to the mean. A large squared difference indicates that the observation and the mean are far from each other, while a small squared difference indicates that the observation and the mean are close to each other. In fact, when the squared difference is zero (0), the observation and the mean are equal to each other. The standard deviation is the positive square root of the variance, so it is expressed in the same unit as the observations.
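Below is a Python sketch of the definition. The population variance divides the sum of squared deviations by the number of observations, exactly as "average squared deviation" describes. The sample version divides by n - 1. It is the common textbook convention for estimating a population variance from a sample and is included only for comparison. The data are hypothetical.

```python
import math
import statistics


def population_variance(values):
    """Average squared deviation of each observation from the mean."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Common sample version: divide the squared deviations by n - 1 instead of n."""
    xbar = sum(values) / len(values)
    return sum((x - xbar) ** 2 for x in values) / (len(values) - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical; mean = 5, squared deviations sum to 32
print(population_variance(data), math.sqrt(population_variance(data)))  # 4.0 2.0
print(round(sample_variance(data), 4))                                  # 4.5714
# Standard-library equivalents: statistics.pvariance / pstdev and statistics.variance / stdev
print(statistics.pvariance(data), statistics.variance(data))
```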


    Measures of Dispersion

    We can also use the variance to determine if the mean is a good measure of central tendency. A small variance indicates that the observations are highly concentrated about the mean, so it is appropriate to use the mean to represent all of the values in the collection. On the other hand, if the variance is large, then this indicates that, on average, the observations are far from, or very different from, the mean. In this case, we cannot consider the mean a good measure of central tendency, because it will not be a suitable representative of all the values in the collection.


    Measures of Dispersion

    The Coefficient of Variation

    The coefficient of variation is a measure of relative variation. We can use it to compare the variability of two or more data sets even if they have different means or different units of measurement, because the coefficient of variation has no unit.

    The coefficient of variation (CV) is the ratio of the standard deviation to the mean, expressed as a percentage.
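Below is a Python sketch of the CV as (standard deviation / mean) × 100%. It assumes the population standard deviation (`statistics.pstdev`); `statistics.stdev` gives the sample version. The height and weight lists are hypothetical. They were chosen to have the same standard deviation but different means and units.

```python
import statistics


def coefficient_of_variation(values):
    """CV = (standard deviation / mean) x 100%, a unit-free measure of relative spread."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("the CV is undefined when the mean is zero")
    return 100 * statistics.pstdev(values) / mean


heights_cm = [150, 160, 170]   # hypothetical heights, mean 160 cm
weights_kg = [50, 60, 70]      # hypothetical weights, mean 60 kg
print(round(coefficient_of_variation(heights_cm), 2))   # 5.1
print(round(coefficient_of_variation(weights_kg), 2))   # 13.61
```

Both lists have a standard deviation of about 8.16 in their own units. The weights still vary more relative to their mean, which the CV shows without needing to compare centimetres with kilograms.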


    Measures of Shape: Skewness and Kurtosis

    A measure of skewness is a single value that indicates the degree and direction of asymmetry.

    If it is possible to divide the histogram at the center into two identical halves, wherein each half is a mirror image of the other, then the distribution is called a symmetric distribution. Otherwise, it is called a skewed distribution.


    Measures of Shape: Skewness and Kurtosis

    If the concentration of the values is at the left end of the distribution and the upper tail of the distribution stretches out more than the lower tail, then the distribution is said to be positively skewed or skewed to the right. Conversely, if the concentration of the values is at the right end of the distribution and the lower tail of the distribution stretches out more than the upper tail, then the distribution is said to be negatively skewed or skewed to the left.


    Measures of Shape: Skewness and Kurtosis

    Skewed to the right


    Measures of Shape: Skewness and Kurtosis

    Skewed to the left


    Measures of Shape: Skewness and Kurtosis

    Interpreting skewness: If skewness is positive, the data are positively skewed or skewed right, meaning that the right tail of the distribution is longer than the left. If skewness is negative, the data are negatively skewed or skewed left, meaning that the left tail is longer.

    If skewness = 0, the data are perfectly symmetrical. But a skewness of exactly zero is quite unlikely for real-world data, so how can you interpret the skewness number? Bulmer, M. G., Principles of Statistics (Dover, 1979), a classic, suggests this rule of thumb:

    If skewness is less than -1 or greater than +1, the distribution is highly skewed.

    If skewness is between -1 and -1/2 or between +1/2 and +1, the distribution is moderately skewed.

    If skewness is between -1/2 and +1/2, the distribution is approximately symmetric.

    For a unimodal distribution, the mean, median, and mode are typically related as follows:

    Mean = Median = Mode ---- Symmetric Distribution

    Mean > Median > Mode ---- Positively Skewed

    Mean < Median < Mode ---- Negatively Skewed
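The text does not say which skewness coefficient to compute. This Python sketch assumes the moment coefficient g₁ = m₃ / m₂^(3/2), built from population moments about the mean. It then applies Bulmer's rule of thumb: highly skewed beyond ±1, moderately skewed between ±1/2 and ±1, and approximately symmetric within ±1/2. The three data sets are hypothetical.

```python
def moment_skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


def bulmer_label(g):
    """Bulmer's rule of thumb for interpreting a skewness value."""
    if g < -1 or g > 1:
        return "highly skewed"
    if g < -0.5 or g > 0.5:
        return "moderately skewed"
    return "approximately symmetric"


for sample in ([1, 2, 3, 4, 5], [1, 1, 1, 1, 10], [10, 10, 10, 10, 1]):
    g = moment_skewness(sample)
    print(sample, round(g, 2), bulmer_label(g))
```

The second set has its values concentrated at the left with a long right tail, giving g₁ = 1.5 (highly skewed, positive). Its mirror image gives -1.5.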


    Measures of Shape: Skewness and Kurtosis

    Karl Pearson (1857-1936), the founder of biometrics and a major contributor to the theory of modern applied statistics, coined the term kurtosis in 1905. The term came from the Greek word kurtos, meaning convex. Pearson used it to describe the shape of the hump of a relative frequency distribution as compared to the normal distribution.


    Measures of Shape: Skewness and Kurtosis

    Mesokurtic: the hump is the same as that of the normal curve. It is neither too flat nor too peaked. (Section C)

    Leptokurtic: the curve is more peaked, and the hump is narrower or sharper than the normal curve. The prefix lepto came from the Greek leptos, meaning small or thin. (Section E)

    Platykurtic: the curve is less peaked, and the hump is flatter than the normal curve. The prefix platy came from the Greek word platus, meaning wide or flat. (Section D)
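Kurtosis is commonly measured as excess kurtosis, g₂ = m₄ / m₂² - 3, which is 0 for the normal curve. This choice of measure is an assumption: Pearson's original coefficient m₄ / m₂² equals 3 for the normal curve instead. The sketch below computes g₂ from population moments and labels the curve. The tolerance `tol` is a hypothetical cut-off, needed because sample values are rarely exactly 0. The data sets are hypothetical.

```python
def excess_kurtosis(values):
    """g2 = m4 / m2**2 - 3 (population moments); the normal curve has g2 = 0."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3


def kurtosis_label(g2, tol=0.1):
    """tol is a hypothetical cut-off for 'close enough to the normal curve'."""
    if g2 > tol:
        return "leptokurtic"
    if g2 < -tol:
        return "platykurtic"
    return "mesokurtic"


flat = [1, 2, 3, 4, 5]                       # hypothetical, evenly spread
peaked = [0, 0, 0, 0, 0, 0, 0, 0, 10, -10]   # hypothetical, tall centre with long tails
for sample in (flat, peaked):
    g2 = excess_kurtosis(sample)
    print(round(g2, 2), kurtosis_label(g2))  # -1.3 platykurtic, then 2.0 leptokurtic
```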


    Measures of Shape: Skewness and Kurtosis


    Sampling Distributions

    Basic Concepts

    In Inferential Statistics, we come up with

    generalizations about the population using the

    information that we collect from a sample. We

    will require this sample to be a random

    sample.


    Sampling Distributions

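    The sampling distribution of a statistic is its probability distribution. It is described by a probability mass function (pmf) if the statistic is discrete and by a probability density function (pdf) if the statistic is continuous.

    The standard deviation of a statistic is called its standard error.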
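Below is a small exact illustration in Python. It assumes a hypothetical population {2, 4, 6} and all samples of size n = 2 drawn with replacement. Enumerating every sample gives the sampling distribution of the sample mean as a discrete pmf, and its standard deviation is the standard error. For sampling with replacement, the squared standard error equals σ²/n. The last lines check this with exact fractions from the standard `fractions` module.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
import math
import statistics

population = [Fraction(2), Fraction(4), Fraction(6)]   # hypothetical population, N = 3
n = 2                                                  # sample size

# Every ordered sample of size n drawn WITH replacement is equally likely.
samples = list(product(population, repeat=n))
means = [sum(s) / n for s in samples]

# Sampling distribution of the sample mean: a discrete pmf.
pmf = {m: Fraction(c, len(means)) for m, c in sorted(Counter(means).items())}
for m, p in pmf.items():
    print(f"x-bar = {m}: P = {p}")

# Standard error = standard deviation of the sampling distribution.
se_squared = statistics.pvariance(means)
print(se_squared, statistics.pvariance(population) / n)   # 4/3 4/3
print(math.sqrt(se_squared))                              # about 1.1547
```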


    Tests of Hypotheses

    Basic Concepts in Testing Statistical Hypotheses

    The first step in hypothesis testing is to identify and state the statistical hypotheses to be tested.

    A statistical hypothesis is a conjecture concerning one or more populations whose veracity can be established using sample data. The null hypothesis, denoted as H₀, is a statistical hypothesis which the researcher doubts to be true. The alternative hypothesis, denoted as Hₐ, is the operational statement of the theory that the researcher believes to be true and wishes to prove; it contradicts the null hypothesis.