RM_5___6_Group_no_10_

  • 8/2/2019 RM_5___6_Group_no_10_

    RESEARCH METHODOLOGY : MFM SEM II GROUP 10

    Group 10

    Roll No. Name

    8 Sarvesh Desai

    17 Pooja Gupta

    24 Nilesh Jadhav

    41 Rupesh Phalke

    55 Venugopalan Swaminathan

    RM Assignment: RM 5

    Q1 Differentiate between the following:

    1. Parameter and statistic.
    2. Level of significance and level of confidence.
    3. Null and alternate hypothesis.
    4. Type-I and type-II error.
    5. One-tailed and two-tailed test of hypothesis.
    6. Testing of hypothesis and estimation.
    7. Point estimate and interval estimate.
    8. Parametric and non-parametric test of hypothesis.
    9. Z-test and t-test of hypothesis.
    10. Test of goodness of fit and test of independence, under chi-square test.
    11. 1-way ANOVA and 2-way ANOVA.
    12. Test of confirmation and test of comparison.

    Solution:

    Q1.1 Parameter vs. Statistic

    1. Parameter: describes a full population.
       Statistic: describes a sample.
    2. Parameter: a property of the underlying population distribution.
       Statistic: a function of the sample observations.
    3. Parameter: the population mean is a parameter.
       Statistic: the sample mean is a statistic; as the sample becomes large, it approaches the population mean.

    Q1.2 Level of significance vs. Level of confidence

    1. Level of significance: indicates the likelihood that the answer will fall outside the stated range.
       Level of confidence: the expected percentage of times that the actual value will fall within the stated precision limits.
    2. Level of significance: a 1% significance level corresponds to a 99% confidence level.
       Level of confidence: a 95% confidence level means there are 95 chances in 100 that the sample result represents the true condition.

    Q1.3 Null hypothesis vs. Alternate hypothesis

    1. Null hypothesis (H0): the finding occurred by chance.
       Alternate hypothesis (H1): the finding did not occur by chance.
    2. Null hypothesis: assumed to be true unless we find evidence to the contrary.
       Alternate hypothesis: if the evidence is too unlikely given the null hypothesis, we conclude that the alternate hypothesis is more likely to be correct.

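    Q1.4 Type-I error vs. Type-II error

    1. Type-I error: rejecting a hypothesis that should have been accepted.
       Type-II error: accepting a hypothesis that should have been rejected.
    2. Type-I error: its probability is denoted by alpha.
       Type-II error: its probability is denoted by beta.
    3. Type-I error: can be controlled by fixing alpha at a lower level.
       Type-II error: depends on the type-I error; for a fixed sample size, lowering alpha raises beta.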

    Q1.5 One-tailed test vs. Two-tailed test of hypothesis

    1. One-tailed: the rejection area lies on only one side of the sampling distribution.
       Two-tailed: the rejection area is split between both sides of the sampling distribution.

    2

    Q1.6 Testing of hypothesis vs. Estimation

    Testing of hypothesis: carried out to test an assumed criterion about the population.
    Estimation: the population parameters are unknown, so they have to be estimated from the sample.

    Q1.7 Point estimate vs. Interval estimate

    The estimate of a population parameter may be one single value or it could be a range.

    Point estimate: as the name suggests, the estimation of the population parameter with one number.
    Interval estimate: estimating the parameter alone is not sufficient. It is necessary to analyse how confident we can be about the estimate, and one way of doing so is to define a confidence interval. If we have estimated q, we want to know whether the true parameter is close to our estimate. In other words, we want to find an interval (q_L, q_U) that satisfies a relation of the form P(q_L < q < q_U) = 1 - alpha, where 1 - alpha is the level of confidence.

    Q1.8 Parametric test vs. Non-parametric test of hypothesis

    1. Parametric: the observations must be independent.
       Non-parametric: the observations must be independent.
    2. Parametric: the observations must be drawn from normally distributed populations.
       Non-parametric: the variable under study has underlying continuity.
    3. Parametric: these populations must have the same variances.
    4. Parametric: the means of these normal and homoscedastic populations must be linear combinations of effects due to columns and/or rows.

    Q1.9 Z-test vs. t-test

    1. Z-test: a statistical hypothesis test whose test statistic follows a normal distribution.
       t-test: the test statistic follows Student's t-distribution.
    2. Z-test: appropriate when handling moderate to large samples (n > 30).
       t-test: appropriate when handling small samples (n < 30).
    3. Z-test: requires certain conditions to be reliable, such as a known population standard deviation.
       t-test: more adaptable than the Z-test.
    4. Z-test: less commonly used than the t-test.
       t-test: more commonly used than the Z-test.
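    As a rough illustration of the Z-test side of this comparison, the sketch below computes the z statistic and its one- and two-tailed p-values for a sample mean. The figures (sample mean 52, hypothesised mean 50, population SD 8, n = 64) and the function name z_test_mean are made up for the example. Only the Z-test is shown because Python's standard library provides the normal distribution but not Student's t-distribution.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(sample_mean, pop_mean, pop_sd, n, tails=2):
    """Return (z, p_value) for a large-sample z-test of a single mean.

    tails=1 assumes the sample deviates in the direction stated by H1.
    """
    z = (sample_mean - pop_mean) / (pop_sd / sqrt(n))
    # Area in one tail beyond |z| under the standard normal curve.
    one_tail_area = 1.0 - NormalDist().cdf(abs(z))
    return z, one_tail_area * tails


# Hypothetical figures, chosen only for illustration.
z, p_two = z_test_mean(sample_mean=52.0, pop_mean=50.0, pop_sd=8.0, n=64)
_, p_one = z_test_mean(sample_mean=52.0, pop_mean=50.0, pop_sd=8.0, n=64, tails=1)
print(f"z = {z:.2f}, two-tailed p = {p_two:.4f}, one-tailed p = {p_one:.4f}")
```

    With these figures the two-tailed p-value is below 0.05, so H0 would be rejected at the 5% level of significance but not at the 1% level.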

    Q1.10 Test of goodness of fit vs. Test of independence, under chi-square

    1. Goodness of fit: a one-variable chi-square test.
       Independence: a two-variable chi-square test.
    2. Goodness of fit: the goal is to determine whether a set of frequencies or proportions is similar to, and therefore fits, a hypothesized set of frequencies or proportions.
       Independence: the goal is to determine whether the first variable is related to, or independent of, the second variable.
    3. Goodness of fit: analogous to a one-sample t-test.
       Independence: similar to the test for an interaction effect in ANOVA.
    4. Goodness of fit: determines whether a sample is similar to, and representative of, a population.
       Independence: asks whether the outcome in one variable is related to the outcome in some other variable.
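    The difference between the two chi-square tests shows up in how the expected frequencies are obtained. The sketch below is a minimal illustration with made-up counts: for goodness of fit the expected frequencies come from the hypothesis, while for independence they are derived from the row and column totals of the two-way table. The function names are hypothetical, and only the test statistic and degrees of freedom are computed (no p-value).

```python
def chi_square_gof(observed, expected):
    """Chi-square statistic for one variable against hypothesised frequencies."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_independence(table):
    """Chi-square statistic and degrees of freedom for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts, for illustration only.
gof_stat = chi_square_gof(observed=[18, 22, 20], expected=[20, 20, 20])
ind_stat, ind_dof = chi_square_independence([[30, 10], [20, 40]])
print(round(gof_stat, 3), round(ind_stat, 3), ind_dof)
```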

    Q1.11 1-way ANOVA vs. 2-way ANOVA

    1. 1-way ANOVA: the purpose is to verify whether the data collected from different sources converge on a common mean.
       2-way ANOVA: the purpose is to verify whether the data collected from different sources converge on a common mean based on two categories of defining characteristics.

    2. 1-way ANOVA: used to find out whether the groups carried out the same procedures in conducting research.
       2-way ANOVA: used in the comparison of treatment means, which involves the introduction of a randomized block design. The experiment conducted in a 2-way ANOVA normally gets split into many mini experiments. In short, 2-way ANOVA is employed for designs with two or more treatment factors, which can be called factorial designs.
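    To make the 1-way case concrete, the sketch below computes the F ratio (variance between groups divided by variance within groups) by hand. The data and the function name one_way_anova_f are invented for the example, and the groups are deliberately of unequal size. It also checks that adding a constant to every value leaves F unchanged. The 2-way case is not shown.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the F ratio for a one-way classification."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)        # number of groups (columns)
    n = len(all_values)    # total number of observations
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical data; note the unequal group sizes.
groups = [[4, 5, 6], [6, 7, 8, 9], [9, 10, 11]]
shifted = [[x + 5 for x in g] for g in groups]

f_ratio = one_way_anova_f(groups)
f_shifted = one_way_anova_f(shifted)
print(round(f_ratio, 3), round(f_shifted, 3))
```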

    Q1.12 Test of confirmation vs. Test of comparison

    1

    2

    Q2 State whether the following statements are true or false, giving reasons:

    1) Level of significance is type-I error.
    2) In 1-way ANOVA, we need all samples to be of equal size.
    3) Point estimate is often insufficient because it is either right or wrong.
    4) In Z distribution, area contained between +/- 3 * standard deviation is equal to 100%.
    5) In fixing critical value of t, we need to specify level of significance or degrees of freedom or one/two tailed.
    6) All tests of hypothesis are repetitive and hence universal.
    7) If the test fails to support null hypothesis, it also indicates why test fails.
    8) (1 - beta error) is called power of test.
    9) 1% level of significance gives greater confidence to decision maker than 5% level of significance.
    10) In 1-way ANOVA, if F calculated is lesser than 1, it means the factor which differentiates columns is the strong reason explaining variation in data.
    11) If all data values are increased by 5, ANOVA inference drawn earlier will change.
    12) Client is supposed to give beta error to researcher in advance.
    13) In chi-square test, we want to confirm whether chi-square value is zero or not.
    14) Level of significance is rejection area under the sampling distribution beyond critical value of test statistic.
    15) Good hypothesis can result into type-II error only.
    16) Alternate hypothesis can decide whether test is one tailed or two tailed in case of large sample Z test.
    17) Randomised block experimental design results into one-way ANOVA.
    18) Difference between sample statistic and population parameter is always significant.
    19) We use chi-square test of goodness of fit on nominal data 2-way classified.
    20) Latin square experimental design will lead to 3-way ANOVA.

    Solution:

    Q2 State whether the following statements are true or false, giving reasons

    Q 2.1 Level of significance is type-I error.
    Answer: TRUE
    Reason: The level of significance is the maximum probability of rejecting the hypothesis even though it is true, which is the type-I error.

    Q 2.2 In 1-way ANOVA, we need all samples to be of equal size.
    Answer: FALSE
    Reason: Not necessary. 1-way ANOVA can also be carried out for unequal sample sizes.

    Q 2.3 Point estimate is often insufficient because it is either right or wrong.
    Answer: TRUE
    Reason: A point estimate gives one value, which can only be right or wrong, whereas an interval estimate gives a range within which to check the answer.

    Q 2.4 In Z distribution, area contained between +/- 3 * standard deviation is equal to 100%.
    Answer: FALSE
    Reason: In the Z distribution, the area contained between +/- 3 * SD is 99.73%.

    Q 2.5 In fixing critical value of t, we need to specify level of significance or degrees of freedom or one/two tailed.
    Answer: TRUE
    Reason: To fix the critical value of t, we need to specify the level of significance (LOS), the degrees of freedom (DOF), and whether the test is one- or two-tailed.

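    Q 2.6 All tests of hypothesis are repetitive and hence universal.
    Answer: TRUE
    Reason: When the sample changes, we need to repeat the test of hypothesis.

    Q 2.7 If the test fails to support null hypothesis, it also indicates why test fails.
    Answer: FALSE
    Reason: No. The test does not tell why it fails.

    Q 2.8 (1 - beta error) is called power of test.
    Answer: TRUE
    Reason: Beta is the probability of a type-II error, in which a false H0 is accepted. 1 - beta is therefore the probability of correctly rejecting a false H0, which is the power of the test.

    Q 2.9 1% level of significance gives greater confidence to decision maker than 5% level of significance.
    Answer: TRUE
    Reason: A 1% LOS corresponds to a 99% confidence level, which is greater than the 95% confidence level given by a 5% LOS.

    Q 2.10 In 1-way ANOVA, if F calculated is lesser than 1, it means the factor which differentiates columns is the strong reason explaining variation in data.
    Answer: FALSE
    Reason: F is the ratio of the variance between columns to the variance within columns. An F value less than 1 means the variation between columns is smaller than the variation within them, so the column factor does not explain the variation in the data.

    Q 2.11 If all data values are increased by 5, ANOVA inference drawn earlier will change.
    Answer: FALSE
    Reason: Adding a constant to every value shifts all the means equally and leaves every sum of squares unchanged, so the F ratio and the inference stay the same.

    Q 2.12 Client is supposed to give beta error to researcher in advance.
    Answer: TRUE
    Reason: The researcher should know the success rate the client expects.

    Q 2.13 In chi-square test, we want to confirm whether chi-square value is zero or not.
    Answer: TRUE

    Q 2.14 Level of significance is rejection area under the sampling distribution beyond critical value of test statistic.
    Answer: TRUE
    Reason: The LOS is the probability that the test statistic falls in the rejection region when the null hypothesis is true.

    Q 2.15 Good hypothesis can result into type-II error only.
    Answer: TRUE
    Reason: Here a false H0 is accepted, indicating that failures are accepted, hence a good hypothesis.

    Q 2.16 Alternate hypothesis can decide whether test is one tailed or two tailed in case of large sample Z test.
    Answer: TRUE
    Reason: Alternate hypothesis tells the

    Q 2.17 Randomised block experimental design results into one-way ANOVA.
    Answer: FALSE
    Reason: A completely randomized (CR) design results in one-way ANOVA; a randomized block design is analysed by two-way ANOVA.

    Q 2.18 Difference between sample statistic and population parameter is always significant.
    Answer: FALSE
    Reason: Not always. For example, if the population has a seasonality factor and the sampling is not done in the proper way, the sample statistic and the population parameter can be different.

    Q 2.19 We use chi-square test of goodness of fit on nominal data 2-way classified.
    Answer: FALSE
    Reason: The goodness-of-fit test is a one-variable test, so it applies to nominal data classified one way. Nominal data classified two ways calls for the chi-square test of independence.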

    Q 2.20 Latin square experimental design will lead to 3-way ANOVA.
    Answer: TRUE
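    The area asked about in Q 2.4 can be checked numerically. The sketch below, using a hypothetical helper named area_within, integrates the standard normal curve between -k and +k standard deviations with the standard library's NormalDist. The area approaches, but never reaches, 100%.

```python
from statistics import NormalDist


def area_within(k):
    """Area under the standard normal curve between -k and +k SD."""
    z = NormalDist()  # mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)


for k in (1, 2, 3):
    print(f"+/- {k} SD: {area_within(k) * 100:.2f}%")
```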

    Q3 State whether the following statements are true or false, giving reasons:

    1. Partial correlation analysis is same as multiple correlation analysis.
    2. If byx = 0.8, bxy = -0.2, hence r = -0.4.
    3. If byx = 0.8, bxy = 1.6, hence r = 1.13.
    4. byx and bxy must be less than 1, always.
    5. y = a + bx: this equation can be used to estimate value of x for a given value of y always.
    6. If two regression lines are perpendicular to each other, correlation coefficient is 1.
    7. If r = 0.7, amount of variation in y because of x is 70%.
    8. Coefficient of determination can be negative sometimes.
    9. If one variable is constant, correlation between x and y is positive perfect.
    10. If coefficient of determination is less, stronger will be relationship between x and y.
    11. Coefficient of indetermination and standard error of estimate are same in concepts.
    12. Variance and co-variance mean the same thing.
    13. If correlation coefficient between x and y is 0.90, this definitely proves that relationship is always causal.
    14. If two regression lines coincide, coefficient of correlation is always +1.
    15. Intersection of two regression lines is the mean of each variable.

    Solution:

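    Q3 State whether the following statements are true or false, giving reasons

    Q 3.1 Partial correlation analysis is same as multiple correlation analysis.
    Answer: FALSE
    Reason: Partial correlation measures the effect of one independent variable on the dependent variable while the other independent variables are held constant, whereas multiple correlation takes into account the combined effect of two or more independent variables on one dependent variable.

    Q 3.2 If byx = 0.8, bxy = -0.2, hence r = -0.4.
    Answer: FALSE
    Reason: r^2 = byx * bxy, and both regression coefficients must carry the same sign as r. Here the signs differ and the product is -0.16, so no valid r exists.

    Q 3.3 If byx = 0.8, bxy = 1.6, hence r = 1.13.
    Answer: FALSE
    Reason: byx * bxy = 0.8 * 1.6 = 1.28, whose square root is 1.13. Since r cannot exceed 1, these two coefficients cannot occur together.

    Q 3.4 byx and bxy must be less than 1, always.
    Answer: FALSE
    Reason: Only the product byx * bxy = r^2 must not exceed 1. One coefficient can be greater than 1 if the other is small enough.

    Q 3.5 y = a + bx: this equation can be used to estimate value of x for a given value of y always.
    Answer: FALSE
    Reason: y = a + bx is the regression line of y on x and estimates y for a given x. To estimate x for a given y, the regression line of x on y is used; the two lines coincide only when r = +/- 1.

    Q 3.6 If two regression lines are perpendicular to each other, correlation coefficient is 1.
    Answer: FALSE
    Reason: When the two regression lines are perpendicular to each other, r = 0.

    Q 3.7 If r = 0.7, amount of variation in y because of x is 70%.
    Answer: FALSE
    Reason: The proportion of variation in y explained by x is r^2 = 0.49, that is, 49%.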

    Q 3.8 Coefficient of determination can be negative sometimes.
    Answer: TRUE
    Reason: Negative values of R^2 may occur when fitting non-linear trends to data.

    Q 3.9 If one variable is constant, correlation between x and y is positive perfect.
    Answer: FALSE

    Q 3.10 If coefficient of determination is less, stronger will be relationship between x and y.
    Answer: FALSE

    Q 3.11 Coefficient of indetermination and standard error of estimate are same in concepts.
    Answer: FALSE

    Q 3.12 Variance and co-variance mean the same thing.
    Answer: FALSE

    Q 3.13 If correlation coefficient between x and y is 0.90, this definitely proves that relationship is always causal.
    Answer: FALSE

    Q 3.14 If two regression lines coincide, coefficient of correlation is always +1.
    Answer: FALSE
    Reason: When r = +/- 1, there is an exact linear relationship between X and Y and the two regression lines coincide with each other, so r may also be -1.

    Q 3.15 Intersection of two regression lines is the mean of each variable.
    Answer: TRUE
    Reason: The two regression lines always intersect each other at the point (mean of X, mean of Y).
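    Several of the statements above rest on the relation r^2 = byx * bxy. The sketch below is one way to check a pair of regression coefficients against it. The function name correlation_from_slopes is hypothetical; it returns r when the pair is consistent and raises an error when the signs differ or the product exceeds 1.

```python
from math import sqrt


def correlation_from_slopes(byx, bxy):
    """Return r from the two regression coefficients, or raise if impossible."""
    product = byx * bxy
    if product < 0:
        raise ValueError("byx and bxy must have the same sign")
    if product > 1:
        raise ValueError("byx * bxy cannot exceed 1, since r^2 <= 1")
    r = sqrt(product)
    # r carries the common sign of the two regression coefficients.
    return -r if byx < 0 else r


print(round(correlation_from_slopes(0.8, 0.2), 4))    # both positive
print(round(correlation_from_slopes(-0.8, -0.2), 4))  # both negative
print(round(correlation_from_slopes(1.6, 0.4), 4))    # one slope above 1 is allowed

for pair in [(0.8, -0.2), (0.8, 1.6)]:
    try:
        correlation_from_slopes(*pair)
    except ValueError as err:
        print(pair, "->", err)
```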

    Q4 Explain the importance of the following in statistical analysis (under what circumstances will you recommend the following in analyzing data collected?):

    1. Mode as measure of central tendency.
    2. Coefficient of variation.
    3. Interquartile range.
    4. Measures of skewness and kurtosis.
    5. Syx: standard error of estimate of y because of x.
    6. Coefficient of determination (r^2).
    7. Co-variance in bivariate analysis.
    8. Interval estimate.
    9. Classification, tabulation, presentation of data.
    10. Frequency curve and histogram.
    11. Correlation and regression analysis.
    12. Yule's coefficient of association.

    Solution:

    1) Mode as measure of central tendency.

    The mode is the most frequently occurring value in the data set. The mode in a distribution

    is that item around which there is maximum concentration. In general mode is the size of

    the item which has the maximum frequency.

    For example, in the data set {1,2,3,4,4}, the mode is equal to 4. A data set can have more

    than a single mode, in which case it is multimodal. In the data set {1,1,2,3,3} there are two

    modes: 1 and 3.

    The mode can be very useful for dealing with categorical data. For example, if a sandwich

    shop sells 10 different types of sandwiches, the mode would represent the most popular

    sandwich. The mode also can be used with ordinal, interval, and ratio data. However, in

    interval and ratio scales, the data may be spread thinly with no data points having the same

    value. In such cases, the mode may not exist or may not be very meaningful.

    2) Coefficient of variation

    The coefficient of variation measures variability in relation to the mean (or average) and is

    used to compare the relative dispersion in one type of data with the relative dispersion in

    another type of data. The data to be compared may be in the same units, in different units,

    with the same mean, or with different means.

    Suppose you want to evaluate the relative dispersion of grades for two classes of students:

    Class A and Class B. The coefficient of variation can be used to compare these two groups

    and determine how the grade dispersion in Class A compares to the grade dispersion in

    Class B. This is one example of how the coefficient of variation can be applied.

    The coefficient of variation is a calculation built on other calculations, the standard
    deviation and the mean, as follows:

    $$CV = \frac{s}{\bar{x}} \times 100$$

    This reads as "the coefficient of variation is equal to the standard deviation divided by the
    mean, multiplied by 100" (to produce a percentage).

    The steps required for calculating the coefficient of variation are:

    1. Calculate the mean for the data set.
    2. Calculate the standard deviation.
    3. Divide the standard deviation by the mean.
    4. Multiply the result of step 3 by 100.
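    The four steps can be sketched in a few lines. The grades for Class A and Class B below are invented for illustration, and the population standard deviation (pstdev) is an assumption; statistics.stdev would be used instead if the data were treated as a sample.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Standard deviation divided by the mean, as a percentage."""
    average = mean(values)       # step 1
    spread = pstdev(values)      # step 2 (population SD, an assumption)
    ratio = spread / average     # step 3
    return ratio * 100           # step 4


# Hypothetical grades: same mean, different dispersion.
class_a = [60, 70, 80]
class_b = [40, 70, 100]

cv_a = coefficient_of_variation(class_a)
cv_b = coefficient_of_variation(class_b)
print(f"Class A: {cv_a:.1f}%  Class B: {cv_b:.1f}%")
```

    Both classes have the same mean, so the larger coefficient for Class B reflects only its wider spread of grades.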

    3) Interquartile range.

    The interquartile range (IQR) is the distance between the 75th percentile and the 25th
    percentile. The IQR is essentially the range of the middle 50% of the data. Because it uses
    the middle 50%, the IQR is not affected by outliers or extreme values.

    The IQR is also equal to the length of the box in a box plot.
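    A small sketch of the resistance of the IQR to extreme values, using made-up data. The quartiles are taken with statistics.quantiles and its "inclusive" method, which is one of several quartile conventions; other conventions can give slightly different values.

```python
from statistics import quantiles


def interquartile_range(values):
    """Distance between the 75th and 25th percentiles."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


# Hypothetical data: the second list replaces the largest value with an outlier.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 900]

print(interquartile_range(data), max(data) - min(data))
print(interquartile_range(with_outlier), max(with_outlier) - min(with_outlier))
```

    The full range jumps when the outlier is introduced, while the IQR stays the same.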

    4) Measures of skewness and kurtosis

    Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution,
    or data set, is symmetric if it looks the same to the left and right of the center point.

    For univariate data Y1, Y2, ..., YN, the formula for skewness is:

    $$\text{skewness} = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^3 / N}{s^3}$$

    where $\bar{Y}$ is the mean, $s$ is the standard deviation, and N is the number of data points. The
    skewness for a normal distribution is zero, and any symmetric data should have a skewness
    near zero. Negative values for the skewness indicate data that are skewed left and positive
    values for the skewness indicate data that are skewed right. By skewed left, we mean that
    the left tail is long relative to the right tail. Similarly, skewed right means that the right tail is
    long relative to the left tail. Some measurements have a lower bound and are skewed right.
    For example, in reliability studies, failure times cannot be negative.

    Kurtosis is a measure of whether the data are peaked or flat relative to a normal
    distribution. That is, data sets with high kurtosis tend to have a distinct peak near the mean,
    decline rather rapidly, and have heavy tails. Data sets with low kurtosis tend to have a flat
    top near the mean rather than a sharp peak. A uniform distribution would be the extreme
    case.

    For univariate data Y1, Y2, ..., YN, the formula for kurtosis is:

    $$\text{kurtosis} = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y})^4 / N}{s^4}$$

    where $\bar{Y}$ is the mean, $s$ is the standard deviation, and N is the number of data points.
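    As a minimal sketch, both measures can be computed as the average third or fourth power of the deviations from the mean, divided by the matching power of the standard deviation. The data sets are invented, and the sketch assumes the standard deviation is computed with N (not N - 1) in the denominator.

```python
from statistics import mean, pstdev


def skewness(values):
    """Third standardised moment; zero for perfectly symmetric data."""
    m, s, n = mean(values), pstdev(values), len(values)
    return sum((y - m) ** 3 for y in values) / n / s ** 3


def kurtosis(values):
    """Fourth standardised moment; a normal distribution gives 3."""
    m, s, n = mean(values), pstdev(values), len(values)
    return sum((y - m) ** 4 for y in values) / n / s ** 4


# Hypothetical data sets.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]

print(round(skewness(symmetric), 3), round(kurtosis(symmetric), 3))
print(round(skewness(right_skewed), 3))
```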

    5) Syx: standard error of estimate of y because of x.

    Let us consider y_est as the estimated value of y for a given value of x. This estimated value
    can be obtained from the regression curve of y on x. From this, the measure of the scatter
    about the regression curve is supplied by the quantity:

    $$S_{yx} = \sqrt{\frac{\sum (y - y_{est})^2}{n}}$$

    The above equation is called the Standard Error of Estimate of y on x. It is important to note
    that this Standard Error of Estimate has properties analogous to those of the standard
    deviation.
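    A minimal sketch of the calculation, under stated assumptions: the regression curve is taken to be a least-squares straight line, the sum of squared deviations is divided by n (some texts divide by n - 2), and the data points and function names are invented for illustration.

```python
from math import sqrt
from statistics import mean


def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def standard_error_of_estimate(xs, ys):
    """Root mean squared scatter of y about the fitted line."""
    a, b = fit_line(xs, ys)
    residual_ss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return sqrt(residual_ss / len(xs))


# Hypothetical observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
s_yx = standard_error_of_estimate(xs, ys)
print(round(s_yx, 4))
```

    When every point lies exactly on the line, the quantity is zero, just as a standard deviation is zero when all values equal the mean.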

    6) Coefficient of determination (r^2)

    The coefficient of determination, r^2, is useful because it gives the proportion of
    the variance (fluctuation) of one variable that is predictable from the other variable.
    It is a measure that allows us to determine how certain one can be in making
    predictions from a certain model/graph.

    The coefficient of determination is the ratio of the explained variation to the total
    variation.

    The coefficient of determination is such that 0 <= r^2 <= 1, and denotes the strength
    of the linear association between x and y.

    The coefficient of determination represents the percentage of the variation in the data
    that is accounted for by the line of best fit. For example, if r = 0.922, then r^2 = 0.850, which means that
    85% of the total variation in y can be explained by the linear relationship between x
    and y (as described by the regression equation). The other 15% of the total variation
    in y remains unexplained.

    The coefficient of determination is a measure of how well the regression line
    represents the data. If the regression line passes exactly through every point on the
    scatter plot, it would be able to explain all of the variation. The further the line is
    away from the points, the less it is able to explain.
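    The "explained over total variation" reading can be sketched directly. The example assumes a least-squares straight line of y on x and uses invented data; the function name is hypothetical.

```python
from statistics import mean


def coefficient_of_determination(xs, ys):
    """Explained variation divided by total variation for a fitted line."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx                      # slope of y on x
    a = y_bar - b * x_bar              # intercept
    explained = sum((a + b * x - y_bar) ** 2 for x in xs)
    total = sum((y - y_bar) ** 2 for y in ys)
    return explained / total


# Hypothetical observations.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r_squared = coefficient_of_determination(xs, ys)
print(round(r_squared, 3))
```

    Here 60% of the variation in y is explained by the line and 40% remains unexplained.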

    7) Co-variance in bivariate analysis

    8) Interval estimate.

    An interval estimate is defined by two numbers, between which a population parameter is
    said to lie. For example, a < μ < b is an interval estimate of the population mean μ. It
    indicates that the population mean is greater than a but less than b.

    9) Classification, tabulation, presentation of data

    Tabulation refers to the systematic arrangement of the information in rows and columns.
    Rows are the horizontal arrangement and columns are the vertical arrangement. In simple words, tabulation is a layout of figures in
    rectangular form with appropriate headings to explain different rows and columns. The
    main purpose of the table is to simplify the presentation and to facilitate comparisons.

    "A statistical table is a systematic organisation of data in columns and rows."

    "Tabulation involves the orderly and systematic presentation of numerical data in a form designed to elucidate the problem under consideration."

    10) Frequency curve and histogram

    A frequency curve is obtained by joining the points of a frequency polygon with a freehand
    smoothed curve. Unlike the frequency polygon, where the points are joined by straight lines, we
    make use of freehand joining of those points in order to get a smoothed frequency curve. It
    is used to remove the ruggedness of the polygon and to present it in a good form or shape. We

    smoothen the angularities of the polygon only without making any basic change in the

    shape of the curve. In this case also the curve begins and ends at base line, as is in case of

    polygon. Area under the curve must remain almost the same as in the case of polygon.

    A histogram is a way of summarising data that are measured on an interval scale (either

    discrete or continuous). It is often used in exploratory data analysis to illustrate the major

    features of the distribution of the data in a convenient form. It divides up the range of

    possible values in a data set into classes or groups. For each group, a rectangle is

    constructed with a base length equal to the range of values in that specific group, and an

    area proportional to the number of observations falling into that group. This means that the

    rectangles might be drawn of non-uniform height.

    The histogram is only appropriate for variables whose values are numerical and measured

    on an interval scale. It is generally used when dealing with large data sets (>100

    observations), when stem and leaf plots become tedious to construct. A histogram can also

    help detect any unusual observations or any gaps in the data set.

    11) Correlation and regression analysis

    Regression analysis is the mathematical process of using observations to find the line of best
    fit through the data in order to make estimates and predictions about the behaviour of the
    variables. This line of best fit may be linear (straight) or curvilinear, following some mathematical
    formula.

    Correlation analysis is the process of finding how well (or badly) the line fits the
    observations, such that if all the observations lie exactly on the line of best fit, the
    correlation is considered to be perfect, with a coefficient of +1 or -1.

    12) Yule's coefficient of association

    In order to find the degree or intensity of association between two or more sets of
    attributes, we should work out the coefficient of association. Professor Yule's coefficient of
    association is:

    QAB = {(AB)(ab) - (Ab)(aB)} / {(AB)(ab) + (Ab)(aB)}

    where
    QAB = Yule's coefficient of association between attributes A & B
    (AB) = Frequency of class AB, in which both A & B are present
    (Ab) = Frequency of class Ab, in which A is present & B is absent
    (aB) = Frequency of class aB, in which A is absent & B is present
    (ab) = Frequency of class ab, in which both A & B are absent
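    The formula translates directly into code. In the sketch below the parameter names and the class frequencies are invented for illustration; the four arguments correspond to (AB), (Ab), (aB) and (ab) in that order.

```python
def yules_q(both, a_only, b_only, neither):
    """Yule's coefficient of association from the four class frequencies.

    both = (AB), a_only = (Ab), b_only = (aB), neither = (ab).
    """
    agree = both * neither       # (AB)(ab)
    disagree = a_only * b_only   # (Ab)(aB)
    return (agree - disagree) / (agree + disagree)


# Hypothetical frequencies.
q_associated = yules_q(both=40, a_only=10, b_only=20, neither=30)
q_independent = yules_q(both=20, a_only=20, b_only=20, neither=20)
print(round(q_associated, 3), q_independent)
```

    Q ranges from -1 (perfect negative association) through 0 (no association) to +1 (perfect positive association).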

    RM Assignment: RM 6

    Q1 Differentiate between the following:

    1. Completely randomized (CR) and randomized block (RB) experimental design.
    2. Stratified sampling and cluster sampling.
    3. Sampling and non-sampling errors.
    4. Probability and non-probability sampling.
    5. Survey and experiment.
    6. Simple random sampling and systematic sampling.
    7. Nominal data and ratio data.
    8. Exploratory and diagnostic research.
    9. Validity and reliability in attitude measurement.
    10. Bias and error in research.
    11. Structured and un-structured interview.
    12. Latin square and factorial experimental design.
    13. Principle of randomizing and principle of replication.
    14. Multi-stage sampling and multi-phase sampling.
    15. Informal experimental and formal experimental design.

    Solution:

    Q1.1 Completely randomized (CR) vs. Randomized block (RB) experimental design

    1. CR: a simpler design than RB.
       RB: an improvement over CR.
    2. CR: involves two principles, viz. the principle of replication and the principle of randomization.
       RB: the principle of local control can be applied along with the other two principles of experimental design.
    3. CR: subjects are randomly assigned to experimental treatments.
       RB: subjects are divided into groups (blocks), such that within each group the subjects are relatively homogeneous in respect to some other variable.
    4. CR: analysed by 1-way ANOVA.
       RB: analysed by 2-way ANOVA.

    Q1.2 Stratified sampling vs. Cluster sampling

    1. Stratified: used when the population from which a sample is to be drawn does not constitute a homogeneous group.
       Cluster: for bigger samples, the area is divided into a number of smaller non-overlapping areas, and a number of these smaller areas (clusters) are then randomly selected.
    2. Stratified: generally used to obtain a representative sample.
    3. Stratified: the sampling population is divided into several sub-populations (strata) that are individually more homogeneous than the total population, and items are then selected from each stratum.
       Cluster: the population is divided into clusters, and the selected clusters themselves make up the sample.
    4. Stratified: sample size for stratum i is ni = (n x Ni x si) / (N1 x s1 + N2 x s2 + ... + Nk x sk), where n is the total sample size, Ni is the size of stratum i and si is its standard deviation.

    5. Stratified: high cost required.
       Cluster: low cost involved.
    6. Stratified: more precise.
       Cluster: less precise.
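    The stratum sample-size formula in point 4 can be sketched as follows. The reading of the symbols (n as total sample size, Ni as stratum size, si as stratum standard deviation) is an assumption, and the stratum figures and function name are invented for illustration.

```python
def optimum_allocation(n, stratum_sizes, stratum_sds):
    """Split a total sample of n across strata in proportion to Ni * si."""
    weights = [size * sd for size, sd in zip(stratum_sizes, stratum_sds)]
    total_weight = sum(weights)
    return [n * w / total_weight for w in weights]


# Hypothetical population of three strata.
allocation = optimum_allocation(
    n=84,
    stratum_sizes=[5000, 2000, 3000],
    stratum_sds=[15, 18, 5],
)
print([round(a) for a in allocation])
```

    Larger and more variable strata receive a bigger share of the sample, which is why the third stratum gets fewer units than the smaller but more variable second one.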

    Q1.3 Probability sampling vs. Non-probability sampling

    1. Probability: also known as random sampling or chance sampling.
       Non-probability: also known as deliberate sampling.
    2. Probability: every item of the universe has an equal chance of inclusion in the sample.
       Non-probability: the organisers of the inquiry purposively choose the particular units of the universe for constituting a sample, on the basis that the small mass that they so select out of a huge one will be typical or representative of the whole.
    3. Probability: the probability of any particular sample being drawn is 1 / NCn.
       Non-probability: there is no probability basis for selection; quota sampling is an example.

    Q1.4 Survey vs. Experiment

    1. Experiment: the process of examining the truth of a statistical hypothesis relating to some research problem is known as an experiment.
    2. Experiment: of two types, absolute and comparative.
    3. Survey: conducted in the case of descriptive research studies.
       Experiment: part of experimental research studies.
    4. Survey: larger samples.
       Experiment: small samples.
    5. Survey: normally used in the social and behavioural sciences.
       Experiment: used to measure the effects of an experiment which the researcher conducts intentionally.
    6. Survey: example, field research.
       Experiment: example, laboratory research.

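    Q1.5 Simple random sampling vs. Systematic sampling

    1. Simple random: just a random sample.
       Systematic: various systematic approaches.
    2. Simple random: every entity from the universe may become part of the sample.
       Systematic: a selection logic is defined in order to have better control over the sample.
    3. Simple random: low cost.
       Systematic: high cost is involved.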
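    The two selection rules can be contrasted in a short sketch. It assumes a numbered population of 100 units and a sample of 10, and it takes "systematic" to mean choosing every k-th unit after a random start, which is one common form of the method. The function names and the fixed seed are illustrative choices.

```python
import random


def simple_random_sample(population, n, seed=0):
    """Draw n units so that every unit has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def systematic_sample(population, n, seed=0):
    """Pick a random start, then take every k-th unit after it."""
    k = len(population) // n           # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)           # random start within the first interval
    return [population[start + i * k] for i in range(n)]


population = list(range(1, 101))       # hypothetical units numbered 1..100
srs = simple_random_sample(population, 10)
sys_sample = systematic_sample(population, 10)
print(sorted(srs))
print(sys_sample)
```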

    Q1.6 Nominal data vs. Ratio data

    1. Nominal: simply a system of assigning number symbols to events in order to label them.
       Ratio: has an absolute or true zero of measurement.
    2. Nominal: convenient for keeping track.
       Ratio: represents the actual amounts of variables.
    3. Nominal: only the mode is a measure of central tendency.
       Ratio: geometric or harmonic means can be used as measures of central tendency.
    4. Nominal: widely used in surveys.
       Ratio: used for physical measurement.

    Q1.7 Exploratory research vs. Diagnostic research

    1. Exploratory: carried out for exploring new ideas, with support.
       Diagnostic: carried out for diagnosing a certain problem.
    2. Exploratory: general research leading to surveys.
       Diagnostic: extensive research that involves in-depth study and statistical tools.
    3. Exploratory: low to moderate cost compared to diagnostic research.
       Diagnostic: high cost compared to exploratory research.

    Q1.8 Validity in attitude measurement Reliability in attitude measurement

    Q1.9 Bias in research | Error in research

    1 This may impact the results of the research | This greatly impacts the results of the research

    2 This is attitude related | This is system related

    Q1.10 Structured interview | Unstructured interview

    1 Involves a set of predetermined questions | Questions are not fixed

    2 Highly standardised techniques of recording | Normal standards for recording

    3 Rigid interview procedure | Freedom to conduct the interview

    4 Question order is fixed | Question sequence may sometimes be changed

    Q1.11 Latin square | Factorial experimental design

    1 Very frequently used in agricultural research | Used in experiments where the effects of varying more than one factor are to be determined

    2 Assumes that there is no interaction between the row factor and the column factor | There is interaction between row and column entities

    3 The number of rows and columns must be equal | More complex problems are examined, with multiple rows and columns

    4 Accuracy is low compared to factorial design | Provides equivalent accuracy with less labour and as such is a source of economy

    Q1.12 Principle of randomizing | Principle of replication

    1

    2

    Q1.13 Multi-stage sampling | Multi-phase sampling

    1 It is a further development of cluster sampling | (no entry)

    2 Easier to administer | (no entry)

    3 A large number of units can be sampled for a given cost under multi-stage sampling | (no entry)

    Q1.14 Informal experimental design | Formal experimental design

    1 Of 3 types: before-and-after without control design; after-only with control design; before-and-after with control design | Of 4 types: completely randomized design (CR); randomized block design (RB); Latin square design (LS); factorial design

    2 Less sophisticated | Offers more control

    3 Based on differences of magnitude | Uses precise statistical procedures for analysis

    Q2 Justify the following statements

    1. Quota sampling is a non-probability sampling.
    2. We don't need hypothesis firmed up in diagnostic research.
    3. Wording of questionnaire can cause ineffective instrument.
    4. In Latin square experimental design it is assumed that factors are independent of each other.
    5. Stratified sampling method assumes strata to be homogeneous within and heterogeneous between.
    6. Convenience sampling is a method of probability sampling.
    7. Semantic differential scale requires identifying bi-polar adjectives describing the object.
    8. Likert scale is a summative model for attitude measurement.
    9. Principle of replication in experimental design is aimed at increasing statistical accuracy.
    10. Principle of local control in experimental design is identifying effect of known source of variation in data.
    11. Non-sampling errors cannot be totally avoided in research.
    12. Word association test is a projective method of data collection.
    13. Defining the problem involves identifying unit of analysis and characteristic of interest, time and space references and environmental conditions.
    14. Projective methods of data collection are used for inferred characteristics.
    15. On ordinal data, we can do all mathematical operations.
    16. Optimal sample size is based on degree of accuracy and level of confidence expected.
    17. Cluster sampling needs each cluster to be homogeneous between and heterogeneous within.
    18. Systematic sampling is not truly probability sampling.
    19. Parameters of quality data are same whether it is primary data or secondary data.
    20. We firm up hypothesis based on exploratory, descriptive and diagnostic research.

    Solution:

    1) Quota sampling is a non-probability sampling.

    The first step in non-probability quota sampling is to divide the population into exclusive

    subgroups. Then, the researcher must identify the proportions of these subgroups in the

    population; this same proportion will be applied in the sampling process. Finally, the

    researcher selects subjects from the various subgroups while taking into consideration the

    proportions noted in the previous step. This final step is intended to make the sample

    representative of the entire population. It also allows the researcher to study traits and

    characteristics that are noted for each subgroup. Because the researcher, rather than a random

    mechanism, chooses the subjects within each subgroup, the probability of any unit being selected

    is not known; hence quota sampling is called non-probability sampling.
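
    The three steps described above can be sketched in code. This is only a minimal illustration under stated assumptions: the function name `quota_sample`, the urban/rural population, and the use of "first encountered" as a stand-in for the researcher's own choice of subjects are all hypothetical and not taken from the assignment.

```python
from collections import Counter


def quota_sample(population, key, sample_size):
    """Fill proportional quotas per subgroup, taking subjects in encounter order.

    No random mechanism is used, which is what makes this non-probability sampling.
    """
    counts = Counter(key(unit) for unit in population)
    total = len(population)
    # Step 2: quota for each subgroup, proportional to its population share.
    # (Rounding may make quotas sum to slightly more or less than sample_size.)
    quotas = {group: round(sample_size * n / total) for group, n in counts.items()}

    sample = []
    filled = Counter()
    # Step 3: accept subjects as they are encountered until each quota is full.
    for unit in population:
        group = key(unit)
        if filled[group] < quotas[group]:
            sample.append(unit)
            filled[group] += 1
    return sample, quotas


# Hypothetical population: 30 urban and 20 rural respondents.
population = [("P%02d" % i, "urban" if i % 5 < 3 else "rural") for i in range(50)]
sample, quotas = quota_sample(population, key=lambda unit: unit[1], sample_size=10)
print(quotas)
print(sample)
```

    Because the loop always takes the first available subjects, units late in the list can never be chosen, so no selection probability can be attached to them.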

    2) We don't need hypothesis firmed up in diagnostic research.

    This is because diagnostic research (DR) aims to identify the causes of a problem and its possible solutions.

    3) Wording of questionnaire can cause ineffective instrument.

    Consistent wording and order of questions ensure that each respondent receives the same

    stimuli; otherwise the purpose of the survey will not be served.

    4) In Latin square experimental design it is assumed that factors are independent of each other.

    A Latin square is used in experimental designs in which one wishes to compare

    treatments and to control for two other known sources of variation. It was recognized

    that within a field there would be fertility trends running both across the field and up

    and down the field. So in an experiment to test, say, four different fertilizers, A, B, C and

    D, the field would be divided into four horizontal strips and four vertical strips, thus

    producing 16 smaller plots. A Latin square design will give a random allocation of

    fertilizer type to a plot in such a way that each fertilizer type is used once in each

    horizontal strip (row) and once in each vertical strip (column).
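
    The fertilizer layout described above can be sketched as follows. This is a hedged illustration, not the assignment's own method: the function names are hypothetical, and the construction (shuffling the rows, columns and labels of a cyclic square) gives a valid random Latin square but does not draw uniformly from all possible Latin squares.

```python
import random


def random_latin_square(treatments, seed=None):
    """Return an n x n layout in which each treatment occurs once per row and column."""
    rng = random.Random(seed)
    n = len(treatments)
    rows = list(range(n))
    cols = list(range(n))
    labels = list(treatments)
    # Randomize row order, column order and treatment labels of the cyclic square.
    rng.shuffle(rows)
    rng.shuffle(cols)
    rng.shuffle(labels)
    # Cell (r, c) of the cyclic base square holds label index (r + c) mod n.
    return [[labels[(r + c) % n] for c in cols] for r in rows]


def is_latin_square(square):
    """Check that every row and every column contains each symbol exactly once."""
    n = len(square)
    symbols = set(square[0])
    if len(symbols) != n:
        return False
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == symbols for c in range(n))
    return rows_ok and cols_ok


# Four fertilizers on a field divided into 4 horizontal and 4 vertical strips.
layout = random_latin_square(["A", "B", "C", "D"], seed=1)
for row in layout:
    print(" ".join(row))
```

    Each printed row corresponds to a horizontal strip and each column to a vertical strip, giving the 16 plots.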

    5) Stratified sampling method assumes strata to be homogeneous within and heterogeneous between.

    6) Convenience sampling is a method of probability sampling.

    Convenience sampling is a non-probability sampling technique where subjects are

    selected because of their convenient accessibility and proximity to the researcher.

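    7) Semantic differential scale requires identifying bi-polar adjectives describing the object.

    Yes, the semantic differential is a type of rating scale designed to measure the connotative

    meaning of objects, events, and concepts. Respondents rate the object on scales anchored at each

    end by a pair of bipolar adjectives (for example, good and bad).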

    8) Likert scale is a summative model for attitude measurement.

    Likert (1932) developed the principle of measuring attitudes by asking people to respond

    to a series of statements about a topic, in terms of the extent to which they agree with

    them, and so tapping into the cognitive and affective components of attitudes. The scores on the

    individual statements are summed to give a total attitude score, which is why it is called a

    summative (summated) scale.
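
    A Likert score is conventionally obtained by adding up the responses to the individual statements. The sketch below illustrates that summation only; the function name, the 5-point scale and the optional reverse-scored items are assumptions made for illustration and do not come from the assignment.

```python
def likert_total(responses, reverse_items=(), scale_max=5):
    """Sum item responses on a 1..scale_max agreement scale.

    reverse_items holds the (hypothetical) zero-based positions of negatively
    worded statements, whose scores are flipped before summing.
    """
    total = 0
    for position, score in enumerate(responses):
        if not 1 <= score <= scale_max:
            raise ValueError("response out of range: %r" % (score,))
        if position in reverse_items:
            score = scale_max + 1 - score  # e.g. 2 becomes 4 on a 5-point scale
        total += score
    return total


# One respondent answering four statements (1 = strongly disagree, 5 = strongly agree).
answers = [4, 5, 2, 4]
print(likert_total(answers))                      # plain sum
print(likert_total(answers, reverse_items={2}))   # third statement reverse-scored
```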

    9) Principle of replication in experimental design is aimed at increasing statistical accuracy

    Measurements are usually subject to variation and uncertainty. Measurements are

    repeated and full experiments are replicated to help identify the sources of variation, to

    better estimate the true effects of treatments, to further strengthen the experiment's

    reliability and validity, and to add to the existing knowledge about the topic.

    However, certain conditions must be met before the replication of the experiment is

    commenced: the original research question has been published in a peer-reviewed

    journal or widely cited, the researcher is independent of the original experiment, the

    researcher must first try to replicate the original findings using the original data, and the

    write-up should state that the study conducted is a replication study that tried to follow

    the original study as strictly as possible.

    10) Principle of local control in experimental design is identifying effect of known source of variation in data.

    Local control refers to grouping of the experimental units in such a way that the units

    within a group (i.e., block) are more homogeneous than are units in different groups.

    The experimental materials or conditions are more alike within a group. Thus, the

    variation among experimental units within a group is less than the variation would have

    been without grouping.

    11) Non-sampling errors cannot be totally avoided in research.

    Non-sampling errors are part of the total error that can arise from doing a statistical

    analysis. The remainder of the total error arises from sampling error. Unlike sampling

    error, increasing the sample size will not have any effect on reducing non-sampling

    error. Unfortunately, it is virtually impossible to eliminate non-sampling errors entirely.

    12) Word association test is a projective method of data collection.

    Word Association Test: An individual is given a clue or hint and asked to respond with the

    first thing that comes to mind. The association can take the shape of a picture or a word.

    There can be many interpretations of the same thing. A list of words is given, and the

    respondent does not know which word the researcher is most interested in.

    13) Defining the problem involves identifying unit of analysis and characteristic of interest, time and space references and environmental conditions.

    14) Projective methods of data collection are used for inferred characteristics

    The underlying premise is that an individual puts structure on an ambiguous situation in a way that

    is consistent with their own conscious and unconscious needs.

    15) On ordinal data, we can do all mathematical operations.

    Ordinal data is the second level of measurement. The experimental (scientific)

    method depends on physically measuring things. The concept of measurement has been

    developed in conjunction with the concepts of numbers and units of measurement.

    Statisticians categorize measurements according to levels. Each level corresponds to

    how this measurement can be treated mathematically. Ordinal data can be ranked and compared,

    but the intervals between ranks are not known to be equal, so arithmetic operations such as

    addition or averaging are not meaningful.

    16) Optimal sample size is based on degree of accuracy and level of confidence expected.
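
    One common textbook way to see this dependence is the formula n = (z * sigma / e)^2 for estimating a mean, where e is the acceptable error (degree of accuracy) and z is set by the confidence level. The sketch below assumes a large population and a known standard deviation sigma; the function name and the example figures are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, margin_of_error, confidence=0.95):
    """Smallest n with z * sigma / sqrt(n) <= margin_of_error (large population assumed)."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin_of_error) ** 2)


# Assumed figures: standard deviation 10, acceptable error of 2 units.
n_95 = sample_size_for_mean(sigma=10, margin_of_error=2, confidence=0.95)
n_99 = sample_size_for_mean(sigma=10, margin_of_error=2, confidence=0.99)
print(n_95, n_99)
```

    Raising the confidence level or tightening the acceptable error both increase the required sample size.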

    17) Cluster sampling needs each cluster to be homogeneous between and heterogeneous within.

    A common motivation for cluster sampling is to reduce the average cost per interview.

    Given a fixed budget, this can allow an increased sample size.

    18) Systematic sampling is not truly probability sampling.

    Systematic sampling is still thought of as being random, as long as the periodic interval is

    determined beforehand and the starting point is random. For example, if you wanted to

    select a random group of 1,000 people from a population of 50,000 using systematic

    sampling, you would simply select every 50th person, since 50,000/1,000 = 50.
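
    The 50,000 / 1,000 example can be sketched in a few lines. This is a minimal illustration assuming the population is held in a list and its size is an exact multiple of the sample size; the function name is hypothetical.

```python
import random


def systematic_sample(population, sample_size, seed=None):
    """Pick a random start within the first interval, then take every k-th unit."""
    if not 0 < sample_size <= len(population):
        raise ValueError("sample_size must be between 1 and the population size")
    interval = len(population) // sample_size          # k = N / n, fixed beforehand
    start = random.Random(seed).randrange(interval)    # random starting point
    return population[start::interval][:sample_size]


# People numbered 1..50,000; select 1,000 of them, i.e. every 50th person.
people = list(range(1, 50001))
chosen = systematic_sample(people, 1000, seed=0)
print(len(chosen), chosen[:3])
```

    Only the starting point is random; once it is fixed, every other selection is determined by the interval.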

    19) Parameters of quality data are same whether it is primary data or secondary data.

    Data that has been collected from first-hand experience is known as primary data.

    Primary data has not been published yet and is more reliable, authentic and objective.

    Primary data has not been changed or altered by human beings; therefore its validity is

    greater than that of secondary data. The review of literature in any research is based on

    secondary data, mostly from books, journals and periodicals.

    20) We firm up hypothesis based on exploratory, descriptive and diagnostic research.