Statistics


Statistics problems and solutions


  • STATISTICS ASSIGNMENT NO 1

    SUBMITTED TO PROFESSOR SACHIN S KAMBLE

    NITIE MUMBAI

    ON 10TH JULY 2012

    Prepared by:

    Rahul Ranganathan (121)

    Rajiv V (125)

    Prakash Sethu (107)

    Prateek Kumar Kureel (108)

    Pruthvi Raj (114)

  • Statistics Assignment No 1 Page 2

    Data Collected:

    The data with which we are going to do statistical analysis is: Consumption of Conventional Energy in

    India (Peta Joules).

    Dataset 1

    Year Energy consumption (Peta Joules)

    1970-71 3860

    1975-76 5074

    1980-81 6394

    1985-86 9471

    1990-91 13313

    1995-96 18188

    2000-01 22198

    2005-06 28299

    2006-07 31040

    2007-08 34428

    2008-09 36330

    2009-10 40353

    2010-11 42664

    Data Brief:

    The data collected for consumption of conventional energy was reported in the 19th issue of Energy Statistics (for the year 2012) released by the Central Statistics Office, Government of India. The data was compiled in collaboration with the following entities:

    1. Office of Coal Controller, Ministry of Coal

    2. Ministry of Petroleum & Natural Gas

    3. Central Electricity Authority

    Why the data was maintained by the source and its importance?

    The Central Statistics Office, as part of the Ministry of Statistics and Programme Implementation in

    India maintains data not just with respect to energy but also industrial, social and price statistics.

    Data released periodically with respect to CPI and WPI index, IIP index and energy statistics from the

    Central Statistics Office provide the basis for decision making for government policy and for the

    government to track progress and take a stance.

    The data for consumption of conventional energy sources was published as part of a wider report

    covering both renewable and non-renewable sources of energy and to monitor trends in

    consumption and future outlook. For instance, the particular data used for this assignment provides

    valuable insight into the rate at which the energy consumption is increasing in India over the recent

    past.

  • Statistics Assignment No 1 Page 3

    Type of data:

    The type of data used for analysis is NUMERICAL data.

    Statistical Analysis:

    Concept Name: Frequency Distribution

    Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)

    X-Axis: Class intervals

    Y-Axis: The frequency of occurrence of the value in the corresponding class interval

    Formula and calculation steps:

    We will use Dataset 1 to construct a frequency distribution chart. With the range of data from 0 to

    around 43000 Peta Joules, the class intervals are formed with an interval of 8000 as shown below.

    The frequency distribution table thus looks:

    Class Intervals Frequency

    0-8000 3

    8001-16000 2

    16001-24000 2

    24001-32000 2

    32001-40000 2

    40001-48000 2

    Total 13

    And the frequency distribution chart:

  • Statistics Assignment No 1 Page 4

    Findings and Interpretation of results:

    With the frequency distribution chart, we can identify that the data for energy consumption is

    uniformly spread out across all class intervals defined.
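    The frequency table above can be reproduced with a short sketch. It assumes, as in the table, that upper class limits are inclusive and that no value exceeds 48000; the function and variable names are illustrative only.

```python
# Dataset 1: consumption of conventional energy in India (Peta Joules).
consumption = [3860, 5074, 6394, 9471, 13313, 18188, 22198,
               28299, 31040, 34428, 36330, 40353, 42664]

CLASS_WIDTH = 8000  # class width used in the text
NUM_CLASSES = 6     # 0-8000 up to 40001-48000


def class_frequencies(values, width=CLASS_WIDTH, num_classes=NUM_CLASSES):
    """Count how many values fall in each class.

    Class k covers (k * width, (k + 1) * width], so upper limits are
    inclusive; the first class also includes 0.
    """
    counts = [0] * num_classes
    for value in values:
        index = max(0, (value - 1) // width)
        counts[index] += 1
    return counts


frequencies = class_frequencies(consumption)
for k, count in enumerate(frequencies):
    lower = 0 if k == 0 else k * CLASS_WIDTH + 1
    print(f"{lower}-{(k + 1) * CLASS_WIDTH}: {count}")
print("Total:", sum(frequencies))
```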

    Concept Name: Relative Frequency

    Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)

    X-Axis- Class intervals

    Y-Axis- The relative frequency of occurrence of the value in the corresponding class interval

    Formula and calculation steps:

    We will use Dataset 1, present in page 2 to construct a relative frequency distribution chart. Building

    on the frequency distribution table, we can categorize the relative frequency of occurrence of data

    in each interval as follows:

    Class Intervals Frequency Relative frequency

    0-8000 3 0.23

    8001-16000 2 0.15

    16001-24000 2 0.15

    24001-32000 2 0.15

    32001-40000 2 0.15

    40001-48000 2 0.15

    Total 13 1.00

    And the relative frequency plot of the data:

  • Statistics Assignment No 1 Page 5

    Findings and Interpretation of results:

    We can interpret that the relative frequency of occurrence of data has been consistent across all

    class intervals.
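    As a rough check of the relative frequency column, each class frequency can be divided by the total number of observations. The dictionary below simply restates the frequency table; the names are illustrative.

```python
# Class frequencies taken from the frequency distribution table.
frequencies = {
    "0-8000": 3,
    "8001-16000": 2,
    "16001-24000": 2,
    "24001-32000": 2,
    "32001-40000": 2,
    "40001-48000": 2,
}

total = sum(frequencies.values())

# Relative frequency = class frequency / total number of observations.
relative = {label: count / total for label, count in frequencies.items()}

for label, share in relative.items():
    print(f"{label}: {share:.2f}")
```

    The rounded values sum to slightly less than 1 because of rounding; the unrounded relative frequencies sum to exactly 1.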

    Concept Name: Cumulative Frequency Distribution

    Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)

    X-Axis- Class intervals

    Y-Axis- The cumulative frequency of occurrence of the value in the corresponding class interval

    Formula and calculation steps:

    We will use Dataset 1, present in page 2 to construct a cumulative frequency distribution chart.

    Building on the frequency distribution table, we can categorize the cumulative frequency of

    occurrence of data in each interval as follows:

    Class Intervals Cumulative frequency

    0-8000 3

    8001-16000 5

    16001-24000 7

    24001-32000 9

    32001-40000 11

    40001-48000 13

    Total 13

    And the cumulative frequency distribution chart will look like:

  • Statistics Assignment No 1 Page 6

    Findings and Interpretation of results:

    As we can see, the cumulative frequency rises steadily from one class interval to the next and reaches its maximum, the total of 13 observations, in the last class interval.
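    The cumulative column is a running total of the class frequencies. A minimal sketch, assuming the class frequencies from the frequency distribution table:

```python
from itertools import accumulate

# Upper class limits and class frequencies from the frequency table.
upper_limits = [8000, 16000, 24000, 32000, 40000, 48000]
frequencies = [3, 2, 2, 2, 2, 2]

# Running total of the frequencies gives the cumulative frequency.
cumulative = list(accumulate(frequencies))

for limit, total_so_far in zip(upper_limits, cumulative):
    print(f"up to {limit}: {total_so_far}")
```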

    Concept Name: Histogram

    Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)

    X-Axis- The class mid-point is chosen as the X axis variable for the histogram

    Y-Axis- The frequency of occurrence in that particular class is used as the Y axis.

    Formula and calculation steps:

    Class mid-point is obtained by finding the mean of the two end values of the class, and thus the

    histogram table is obtained:

    Class Mid-point Frequency

    4000 3

    12000 2

    20000 2

    28000 2

    36000 2

    44000 2

    More 0

    And the histogram will look like:

  • Statistics Assignment No 1 Page 7

    Findings and Interpretation of results:

    A histogram represents the frequencies of the values present in each class, with the class represented by its mid-point. If there are no occurrences of data for a particular class, there will be no bar for that class. In our example, it is the class "More" that has no entries.

    Concept Name: Frequency Polygon

    Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)

    X-Axis- The class range is chosen in the X axis.

    Y-Axis- The frequency of occurrence is represented in the Y axis.

    Formula and calculation steps:

    We will use Dataset 1 to construct a frequency distribution chart. With the range of data from 0 to

    around 43000 Peta Joules, the class intervals are formed with an interval of 8000 as shown below.

    The frequency distribution table thus looks:

    Class Intervals Frequency

    0-8000 3

    8001-16000 2

    16001-24000 2

    24001-32000 2

    32001-40000 2

    40001-48000 2

    Total 13

    And the frequency polygon:

  • Statistics Assignment No 1 Page 8

    Findings and Interpretation of results:

    With the frequency polygon, we can identify that the data is uniformly spread out across all class

    intervals.

    Concept Name: Ogive

    Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)

    X-Axis- The class range is chosen in the X axis.

    Y-Axis- The cumulative frequency of occurrence is represented in the Y axis.

    Formula and calculation steps:

    An ogive is derived from the frequency polygon: it is drawn as a cumulative frequency polygon. There are two kinds of ogives: the Less than Ogive and the More than Ogive.

  • Statistics Assignment No 1 Page 9

    This is plotted by using the frequency distribution table that we have been using. The difference lies in the type of information that can be derived from an Ogive: it tells us the number of values less than, or more than, a particular value.

    Findings and Interpretation of results:

    An Ogive tells us how many data points are less than, or greater than, a particular value. For example, the Less than Ogive shows that there are 5 values which are less than 16000 in the data that we have.
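    The points of both ogives can be counted directly from Dataset 1. The sketch below assumes the upper class limits as the plotting positions; it counts values at or below each limit for the Less than Ogive and values above each limit for the More than Ogive.

```python
# Dataset 1: consumption of conventional energy in India (Peta Joules).
consumption = [3860, 5074, 6394, 9471, 13313, 18188, 22198,
               28299, 31040, 34428, 36330, 40353, 42664]

# Upper class limits 8000, 16000, ..., 48000.
upper_limits = [8000 * k for k in range(1, 7)]

# Less than ogive: number of values at or below each upper limit.
less_than = [sum(1 for v in consumption if v <= u) for u in upper_limits]

# More than ogive: number of values above each upper limit.
more_than = [len(consumption) - count for count in less_than]

for u, lt, mt in zip(upper_limits, less_than, more_than):
    print(f"{u}: {lt} values at or below, {mt} values above")
```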

    Concept Name: Pie Chart

    Selection of Variable: Break-up of the Total Energy Consumption from conventional sources in 2010-11

    The highest value that we have in our data is 42664 Peta Joules. This number represents the Consumption of Conventional Energy in India in the year 2010-11. It can be divided into 4 parts: Coal & Lignite, Crude Petroleum, Natural Gas and Electricity.

    Pie chart can be drawn for the classification of the consumption.

    Formula and calculation steps:

    With 42664 Peta Joules representing the consumption of Conventional Energy as a whole, the 4 sub-heads (Coal & Lignite, Crude Petroleum, Natural Gas and Electricity) each represent a part of the whole circle, in proportion to the amount of consumption under each head:

    Coal & Lignite   Crude Petroleum   Natural Gas   Electricity

    10179   8632   1974   21879

  • Statistics Assignment No 1 Page 10

    Findings and Interpretation of results:

    With the pie chart we can identify the pattern of consumption in various heads. For example in the

    above pie, we can say more than 50% of the energy consumption is in the form of Electricity which

    includes thermal, hydro & nuclear electricity from utilities.
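    The share of each head, and the angle of its slice in the pie, follow from dividing by the total. A minimal sketch using the four values quoted above (names are illustrative):

```python
# Break-up of conventional energy consumption (Peta Joules).
sources = {
    "Coal & Lignite": 10179,
    "Crude Petroleum": 8632,
    "Natural Gas": 1974,
    "Electricity": 21879,
}

total = sum(sources.values())

# Percentage share and slice angle (in degrees) for each source.
shares = {name: 100 * value / total for name, value in sources.items()}
angles = {name: 360 * value / total for name, value in sources.items()}

for name in sources:
    print(f"{name}: {shares[name]:.1f}% ({angles[name]:.1f} degrees)")
```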

    Concept Name: Stem and Leaf Diagram

    Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)

    Formula and calculation steps:

    In order to create the Stem & Leaf Diagram, each data point is rounded off to the nearest thousand; the ten-thousands digit forms the stem and the thousands digit forms the leaf. Hence, for example, the first data point of 3860 Peta Joules is rounded off to 4000 and has a stem of 0 and a leaf of 4.

    In accordance with the above step, the Stem and Leaf Diagram and the associated plot are

    represented below:

    Stem Leaves

    0 4 5 6 9

    1 3 8

    2 2 8

    3 1 4 6

    4 0 3

  • Statistics Assignment No 1 Page 11

    Findings and Interpretation of results:

    The Stem and Leaf Diagram provides a combination of inferences in terms of a distribution of

    frequencies and the data points in the original data set.
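    The stems and leaves listed above can be derived mechanically. This sketch assumes the rounding described in the text (nearest thousand, ten-thousands digit as stem); the function name is hypothetical.

```python
from collections import defaultdict

# Dataset 1: consumption of conventional energy in India (Peta Joules).
consumption = [3860, 5074, 6394, 9471, 13313, 18188, 22198,
               28299, 31040, 34428, 36330, 40353, 42664]


def stem_and_leaf(values):
    """Round each value to the nearest thousand, then split it into
    a stem (ten-thousands digit) and a leaf (thousands digit)."""
    plot = defaultdict(list)
    for value in sorted(values):
        thousands = round(value / 1000)
        stem, leaf = divmod(thousands, 10)
        plot[stem].append(leaf)
    return dict(plot)


diagram = stem_and_leaf(consumption)
for stem, leaves in diagram.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```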

    Concept Name: Pareto Diagram

    Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)

    X-Axis- The year which is considered

    Y-Axis- For a Pareto diagram, Y axis is used to denote both the Consumption per year and the

    cumulative percentage of the consumption

    Formula and calculation steps:

    First, the data needs to be arranged in the descending order of consumption and the corresponding

    cumulative percentage contribution is obtained.

    S No Consumption Qty Relative Distribution Cumulative %

    1 42664 0.146303993 14.6303993

    2 40353 0.138379079 28.4683072

    3 36330 0.12458335 40.92664225

    4 34428 0.118060985 52.73274077

    5 31040 0.106442808 63.37702152

    6 28299 0.097043332 73.08135468

    7 22198 0.076121696 80.69352427

    8 18188 0.062370547 86.93057899

    9 13313 0.045653128 91.4958918

  • Statistics Assignment No 1 Page 12

    10 9471 0.032478087 94.74370053

    11 6394 0.021926395 96.93634007

    12 5074 0.017399833 98.67632333

    13 3860 0.013236767 100

    With this data, we can read off the cumulative percentage of consumption, taken in decreasing order of consumption. The Pareto diagram is drawn thus:

    Findings and Interpretation of results:

    The Pareto Principle is named after Vilfredo Pareto, who observed in Italy in the 19th Century that 80% of the land was owned by 20% of the people. He then developed the principle further by observing that 20% of the pea pods in the garden contained 80% of the peas. The Pareto Principle is also known as the 80-20 rule, which is a general principle referring to the observation that 80% of outcomes come from 20% of causes.

    In the case of this particular example, about 80% of the total of the tabulated consumption values comes from the seven most recent observations (2000-01 to 2010-11), although the data span 40 years.
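    The cumulative percentage column of the Pareto table can be recomputed as follows. This is a sketch over Dataset 1 only; the 80% threshold is taken from the 80-20 rule described above.

```python
# Dataset 1: consumption of conventional energy in India (Peta Joules).
consumption = [3860, 5074, 6394, 9471, 13313, 18188, 22198,
               28299, 31040, 34428, 36330, 40353, 42664]

# Arrange in descending order, as the Pareto diagram requires.
ordered = sorted(consumption, reverse=True)
total = sum(ordered)

# Cumulative percentage contribution after each item.
cumulative_pct = []
running = 0
for value in ordered:
    running += value
    cumulative_pct.append(100 * running / total)

# Number of items needed before the cumulative share reaches 80%.
items_for_80 = next(i + 1 for i, pct in enumerate(cumulative_pct) if pct >= 80)

for rank, (value, pct) in enumerate(zip(ordered, cumulative_pct), start=1):
    print(f"{rank:2d} {value:6d} {pct:6.2f}%")
print("Items needed to reach 80%:", items_for_80)
```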

    Concept Name: Scatter Plot

    Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)

    X-Axis- The year of consumption is in the X axis.

    Y-Axis- The amount of consumption in that particular year is set in the Y Axis.

  • Statistics Assignment No 1 Page 13

    Formula and calculation steps:

    A scatter plot is a type of mathematical diagram using Cartesian coordinates to display values for

    two variables for a set of data. The data is displayed as a collection of points, each having the value

    of one variable determining the position on the horizontal axis and the value of the other variable

    determining the position on the vertical axis

    With the 2 axes/ variables decided, drawing a scatter plot is a fairly simple exercise. The data that we

    use to draw the scatter graph:

    Year Consumption

    1970-71 3860

    1975-76 5074

    1980-81 6394

    1985-86 9471

    1990-91 13313

    1995-96 18188

    2000-01 22198

    2005-06 28299

    2006-07 31040

    2007-08 34428

    2008-09 36330

    2009-10 40353

    2010-11 42664

  • Statistics Assignment No 1 Page 14

    Findings and Interpretation of results:

    With the Scatter plot, we can infer that the consumption of Conventional Energy has been

    constantly increasing over all the years. Also, if the years are indexed to numbers, then the Scatter

    Diagram allows a prediction of the future energy consumption.

    Concept Name: Mode

    Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)

    Formula and calculation steps:

    The series of data for which MODE needs to be estimated is:

    Class Intervals Frequency

    0-8000 3

    8001-16000 2

    16001-24000 2

    24001-32000 2

    32001-40000 2

    40001-48000 2

    Total 13

    Formula for mode:

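    $$Mo = L_{Mo} + \left(\frac{d_1}{d_1 + d_2}\right) w$$

    LMo = lower limit of the modal class

    d1 = frequency of the modal class minus the frequency of the class directly below it

    d2 = frequency of the modal class minus the frequency of the class directly above it

    w = width of the modal class interval

    Using the formula with the modal class 0-8000 (LMo = 0, d1 = 3 - 0 = 3, d2 = 3 - 2 = 1, w = 8000), we can calculate the mode as 0 + (3/4) * 8000 = 6000.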

    Findings and Interpretation of results:

    We can see that, obtaining mode through this methodology is not an accurate method.

    Nevertheless, with data constraints, we can utilize this formula to obtain the MODE of the series.
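    The grouped-data mode formula can be sketched in code. The sketch assumes class boundaries at 0, 8000, 16000 and so on, and treats a missing neighbouring class as having frequency zero; both are assumptions, and the function name is hypothetical.

```python
def grouped_mode(lower_limits, frequencies, width):
    """Estimate the mode of grouped data as L + d1 / (d1 + d2) * w."""
    # Index of the modal class (the first class with the highest frequency).
    i = max(range(len(frequencies)), key=lambda k: frequencies[k])
    # Neighbouring class frequencies; a missing neighbour counts as zero.
    below = frequencies[i - 1] if i > 0 else 0
    above = frequencies[i + 1] if i < len(frequencies) - 1 else 0
    d1 = frequencies[i] - below
    d2 = frequencies[i] - above
    if d1 + d2 == 0:
        raise ValueError("all classes have equal frequency; mode undefined")
    return lower_limits[i] + d1 / (d1 + d2) * width


lower_limits = [0, 8000, 16000, 24000, 32000, 40000]
frequencies = [3, 2, 2, 2, 2, 2]

mode_estimate = grouped_mode(lower_limits, frequencies, width=8000)
print("Estimated mode:", mode_estimate)
```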

  • Statistics Assignment No 1 Page 15

    Concept Name: Median

    Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)

    Formula and calculation steps:

    The series of data for which MEDIAN needs to be estimated is:

    Class Intervals Frequency

    0-8000 3

    8001-16000 2

    16001-24000 2

    24001-32000 2

    32001-40000 2

    40001-48000 2

    Total 13

    Median of Grouped data:

    $$\text{Median} = \left(\frac{(n+1)/2 - (F+1)}{f_m}\right) w + L_m$$

    Where, n = number of values in the series

    F = sum of all the class frequencies up to, but not including, the median class

    fm = frequency of the median class

    w = class interval width

    Lm = lower limit of the median class interval

    Using the formula with n = 13, the median is the 7th value, which falls in the class 16001-24000, so F = 5, fm = 2, w = 8000 and Lm = 16000. The median is therefore estimated as ((7 - 6)/2) * 8000 + 16000 = 20000.

    Findings and Interpretation of results:

    With the original series of data available with us, we know that the median for this set of data is 22198. The median obtained through the formula therefore deviates from the true median by about 2200. Thus, this formula is useful when only grouped data is available, but some deviation in the result is to be expected.
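    The grouped-data median can be sketched alongside the median of the raw series for comparison. The sketch assumes class boundaries at 0, 8000, 16000 and so on, with F taken as the cumulative frequency before the median class; the function name is hypothetical.

```python
import statistics


def grouped_median(lower_limits, frequencies, width):
    """Estimate the median as ((n + 1) / 2 - (F + 1)) / fm * w + Lm."""
    n = sum(frequencies)
    position = (n + 1) / 2
    cumulative = 0  # F: total frequency of the classes before the current one
    for lower, fm in zip(lower_limits, frequencies):
        if cumulative + fm >= position:
            return (position - (cumulative + 1)) / fm * width + lower
        cumulative += fm
    raise ValueError("frequencies must not be empty")


lower_limits = [0, 8000, 16000, 24000, 32000, 40000]
frequencies = [3, 2, 2, 2, 2, 2]
median_estimate = grouped_median(lower_limits, frequencies, width=8000)

# Median of the original, ungrouped series for comparison.
consumption = [3860, 5074, 6394, 9471, 13313, 18188, 22198,
               28299, 31040, 34428, 36330, 40353, 42664]
true_median = statistics.median(consumption)

print("Grouped estimate:", median_estimate)
print("Median of raw data:", true_median)
```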

    Concept Name: Arithmetic Mean

    Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)

    Formula and calculation steps:

  • Statistics Assignment No 1 Page 16

    The series of data for which ARITHMETIC MEAN needs to be estimated is Dataset 1.

    Arithmetic Mean = $AM = \frac{a_1 + a_2 + \dots + a_n}{n}$

    Where, n = number of values in the series

    a = all the values in the set, such that a1 is the first value

    AM = Arithmetic mean.

    Using the formula, we can estimate the Arithmetic Mean as 22431.

    Findings and Interpretation of results:

    With the original series of data available with us, we know that the AM for this set of data is 22431. As we can see the AM gets affected by the extreme values. Thus, Arithmetic Mean is an excellent tool to determine the mean when the frequencies are closer to the Central Tendency and the extreme points are not that far from it.

    Concept Name: Percentiles

    Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)

    Formula and calculation steps:

    Percentiles divide the data into 100 equal parts. So, for calculating the Percentile range, with 13 data points as in this case, we need to find the ((n+1)/100)th data point.

    In this case, the data point corresponding to the 0.14th data point is,

    Range = 0.14 * (1st data point) = 0.14 * 3860 = 540.4

    Now this data range when divided into 100 equal parts will give 1 percentile.

    Findings and Interpretation of results:

    With the original series of data available with us, we know that each Percentile is 540.4. Percentiles are used to divide the entire data set into 100 equal parts. This is primarily used to judge the relative position of a particular datum w.r.t. the entire set. It is majorly used in competitive examinations to judge the relative performance of participants.

    Concept Name: Deciles

    Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)

  • Statistics Assignment No 1 Page 17

    Formula and calculation steps:

    The series of data for which Deciles need to be estimated is:

    Deciles divide the data into 10 equal parts. So for calculating the Deciles, with 13 data points as in this case, we need to find the ((n+1)/10)th data point.

    In this case, the data point corresponding to the 1.4th data point is,

    Range = 1st term + 0.4 * (2nd data point - 1st data point) = 3860 + 0.4 * (5074 - 3860) = 4345.6

    This value is the first decile, which corresponds to the 10th percentile.

    Findings and Interpretation of results:

    With the original series of data available with us, we know that the first Decile is 4345.6. Deciles are used to divide the entire data set into 10 equal parts. Deciles are majorly used in measuring the amount of rainfall, where the entire rainfall is divided into 10 parts and then categorized as low, medium and so on.

    Concept Name: Quartiles

    Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)

    Formula and calculation steps:

    The series of data for which Quartiles need to be estimated is Dataset 1.

    Quartiles divide the data into 4 equal parts. So for calculating the Quartiles, with 13 data points as in this case, we need to find the ((n+1)/4)th data point.

    In this case, the data point corresponding to the 3.5th data point is,

    Range = 3rd term + 0.5 * (4th term - 3rd term) = 6394 + 0.5 * (9471 - 6394) = 7932.5

    This value is the first quartile, which corresponds to the 25th percentile.

    Findings and Interpretation of results:

    With the original series of data available with us, we know that the first Quartile is 7932.5. Quartiles are used to divide the entire data set into 4 equal parts.
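    The decile and quartile calculations both interpolate between two neighbouring terms at a fractional position. A minimal sketch of that step, assuming 1-based positions of the form (n+1)/k within the sorted series (the function name is hypothetical):

```python
# Dataset 1, sorted in ascending order (Peta Joules).
consumption = sorted([3860, 5074, 6394, 9471, 13313, 18188, 22198,
                      28299, 31040, 34428, 36330, 40353, 42664])


def value_at_position(sorted_values, position):
    """Interpolate linearly at a 1-based, possibly fractional, position."""
    if not 1 <= position <= len(sorted_values):
        raise ValueError("position must lie between 1 and n")
    whole = int(position)          # index of the lower term (1-based)
    fraction = position - whole    # how far to move towards the next term
    lower = sorted_values[whole - 1]
    if fraction == 0:
        return float(lower)
    upper = sorted_values[whole]
    return lower + fraction * (upper - lower)


n = len(consumption)
first_decile = value_at_position(consumption, (n + 1) / 10)   # 1.4th term
first_quartile = value_at_position(consumption, (n + 1) / 4)  # 3.5th term

print("First decile:", round(first_decile, 1))
print("First quartile:", first_quartile)
```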

    Concept Name: Range

    Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)

    Formula and calculation steps:

  • Statistics Assignment No 1 Page 18

    The series of data for which Range needs to be estimated is:

    Range is the span of the data.

    Range = Value of highest observation - Value of lowest observation

    Range = 42664 - 3860 = 38804

    Findings and Interpretation of results:

    With the original series of data available with us, we know that the Range is 38804. Range provides us with a good picture of the span of the data but gives very little information about the variance in the data.

    Concept Name: Inter Quartile Range

    Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)

    Formula and calculation steps:

    The series of data for which Quartiles need to be estimated is:

    Inter Quartile range measures how far from the median we must go on either side before we can

    include one half of the dataset. So, the Inter Quartile range is the difference between the third

    quartile and the first quartile.

    Now, we know that the first quartile (Q1) is 7932.5.

    The third quartile (Q3) is located at the 3*(14/4) = 10.5th term.

    So, the third quartile = 10th term + 0.5 * (11th term - 10th term) = 34428 + 0.5 * (36330 - 34428) = 35379

    So the Interquartile range is Q3 - Q1 = 35379 - 7932.5 = 27446.5

    Findings and Interpretation of results:

    With the original series of data available with us, we know that the Inter Quartile range is 27446.5. Inter Quartile range gives us the amount by which we have to move away from median on either side to include one half of the dataset.
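    The range and the inter quartile range can be checked together. The sketch assumes the k(n+1)/4 position rule used in the text, with linear interpolation between neighbouring terms; the helper name is hypothetical.

```python
# Dataset 1, sorted in ascending order (Peta Joules).
data = sorted([3860, 5074, 6394, 9471, 13313, 18188, 22198,
               28299, 31040, 34428, 36330, 40353, 42664])


def quartile(sorted_values, k):
    """k-th quartile located at the k * (n + 1) / 4 th term (1-based)."""
    position = k * (len(sorted_values) + 1) / 4
    whole = int(position)
    fraction = position - whole
    lower = sorted_values[whole - 1]
    # Guard the upper index so a position equal to n does not overflow.
    upper = sorted_values[min(whole, len(sorted_values) - 1)]
    return lower + fraction * (upper - lower)


data_range = data[-1] - data[0]
q1 = quartile(data, 1)  # 3.5th term
q3 = quartile(data, 3)  # 10.5th term
iqr = q3 - q1

print("Range:", data_range)
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
```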

    Concept Name: Mean Absolute Deviation

    Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)

    Formula and calculation steps:

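    Mean Absolute Deviation = mean of $|x_i - \bar{x}|$, where $x_i$ is an element and $\bar{x}$ is the arithmetic mean of the data.

  • Statistics Assignment No 1 Page 19

    Mean: $\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n}$

    The steps involved:

    1. $\bar{x} = \frac{3860 + 5074 + 6394 + 9471 + 13313 + 18188 + 22198 + 28299 + 31040 + 34428 + 36330 + 40353 + 42664}{13} = 22431.69$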

    x Deviation Absolute Deviation

    3860 18571.69 18571.69

    5074 17357.69 17357.69

    6394 16037.69 16037.69

    9471 12960.69 12960.69

    13313 9118.692 9118.692

    18188 4243.692 4243.692

    22198 233.6923 233.6923

    28299 -5867.308 5867.31

    31040 -8608.308 8608.31

    34428 -11996.31 11996.31

    36330 -13898.31 13898.31

    40353 -17921.31 17921.31

    42664 -20232.31 20232.31

    Mean Absolute Deviation = 12080.6

    Findings and Interpretation of results:

    With the original series of data available with us, we know that the mean for this set of data is 22431. From this value we got the Mean Absolute Deviation as 12080.59. From this value we can deduce that the deviation is very large and the data is fluctuating over a large range.
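    The mean and the Mean Absolute Deviation in the table above can be reproduced in a few lines. This is a sketch over Dataset 1; variable names are illustrative.

```python
# Dataset 1: consumption of conventional energy in India (Peta Joules).
consumption = [3860, 5074, 6394, 9471, 13313, 18188, 22198,
               28299, 31040, 34428, 36330, 40353, 42664]

n = len(consumption)
mean = sum(consumption) / n

# Absolute deviation of each observation from the mean.
absolute_deviations = [abs(x - mean) for x in consumption]

# Mean Absolute Deviation = average of the absolute deviations.
mad = sum(absolute_deviations) / n

print(f"Mean: {mean:.2f}")
print(f"Mean Absolute Deviation: {mad:.2f}")
```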

    Concept Name: Variance

    Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)

    Formula and calculation steps:

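    Variance = $\sigma^2 = \frac{\sum (x - \mu)^2}{N} = 180670816$, where

    1. x = Observation

  • Statistics Assignment No 1 Page 20

    2. $\mu$ = Population mean

    3. N = Number of items in the population

    4. $\sum$ = Sum of all the values $(x - \mu)^2$

    5. $\sigma$ = Population standard deviation

    6. $\sigma^2$ = Population variance

    x   $\mu$   $\mu - x$   $(x - \mu)^2$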

    3860 22432 18572 344907755

    5074 22432 17358 301289482

    6394 22432 16038 257207575

    9471 22432 12961 167979545

    13313 22432 9119 83150549.4

    18188 22432 4244 18008924.4

    22198 22432 234 54612.0947

    28299 22432 -5867 34425299.6

    31040 22432 -8608 74102961.3

    34428 22432 -11996 143911398

    36330 22432 -13898 193162957

    40353 22432 -17921 321173269

    42664 22432 -20232 409346275

    Summation 2348720603

    Findings and Interpretation of results:

    With the original series of data available with us, we know that the mean for this set of data is

    22431. From this value we got the Variance as 180670816. From this value we can deduce that the

    deviation is very large and the data is fluctuating over a large range.

    Concept Name: Standard Deviation

    Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)

    Formula and calculation steps:

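    Standard Deviation = $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}}$, where

    1. x = Observation

    2. $\mu$ = Population mean

    3. N = Number of items in the population

    4. $\sum$ = Sum of all the values $(x - \mu)^2$

    5. $\sigma$ = Population standard deviation

    Standard Deviation = Square root of variance = 13441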

  • Statistics Assignment No 1 Page 21

    Findings and Interpretation of results:

    With the original series of data available with us, we know that the mean for this set of data is

    22431. From this value we got the standard deviation = 13441. From this value we can deduce that

    the deviation is very large and the data is fluctuating over a large range.

    Concept Name: Coefficient of Variation

    Formula and calculation steps:

    Coefficient of Variation = $CV = \frac{\sigma}{\mu} \times 100 = \frac{13441}{22431} \times 100 = 59.92$, where

    1. $\mu$ = Population mean

    2. $\sigma$ = Population standard deviation

    Findings and Interpretation of results:

    With the coefficient of variation at 60%, it can be concluded that the distribution of data is highly

    dispersed with respect to the mean.
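    Variance, standard deviation and coefficient of variation can be checked together. The sketch assumes the population (divide by N) definitions used in the tables above, which correspond to `pvariance` and `pstdev` in Python's standard `statistics` module.

```python
import statistics

# Dataset 1: consumption of conventional energy in India (Peta Joules).
consumption = [3860, 5074, 6394, 9471, 13313, 18188, 22198,
               28299, 31040, 34428, 36330, 40353, 42664]

mean = statistics.fmean(consumption)

# Population variance and standard deviation (divide by N, not N - 1).
variance = statistics.pvariance(consumption)
std_dev = statistics.pstdev(consumption)

# Coefficient of variation, expressed as a percentage of the mean.
cv_percent = std_dev / mean * 100

print(f"Variance: {variance:.0f}")
print(f"Standard deviation: {std_dev:.0f}")
print(f"Coefficient of variation: {cv_percent:.2f}%")
```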

    Concept Name: Coefficient of Skewness

    Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)

    Formula and calculation steps:

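    Coefficient of Skewness = Sk = 1.65, computed from the cubed deviations $(x - \mu)^3$, $N$ and $\sigma^3$, where

    1. x = Observation

    2. $\mu$ = Population mean

    3. N = Number of items in the population

    4. $\sum$ = Sum of all the values $(x - \mu)^3$

    5. $\sigma$ = Population standard deviation

    6. $\sigma^2$ = Population variance

    x   $\mu$   $\mu - x$   $(\mu - x)^3$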

    3860 22432 18572 6405520703583.78

    5074 22432 17358 5229690128413.75

    6394 22432 16038 4125015939940.37

    9471 22432 12961 2177131197958.20

    13313 22432 9119 758224275215.75

  • Statistics Assignment No 1 Page 22

    18188 22432 4244 76424333956.14

    22198 22432 234 12762426.43

    28299 22432 -5867 -201983824896.17

    31040 22432 -8608 -637901092000.60

    34428 22432 -11996 -1726405413819.48

    36330 22432 -13898 -2684638207112.31

    40353 22432 -17921 -5755844983504.25

    42664 22432 -20232 -8282019779521.16

    Summation -516773959359.54

    Findings and Interpretation of results:

    From this value we got the Coefficient of Skewness as 1.65 by way of which it can be concluded that

    the data is quite symmetric in nature with little skewness. This is corroborated by the fact that the

    mean and the median values are not far off.

    Concept Name: Coefficient of Kurtosis

    Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)

    Formula and calculation steps:

    Coefficient of Kurtosis = K = -0.018, computed from the fourth-power deviations $(x - \mu)^4$, $N$ and $\sigma^4$

    x   $\mu$   $\mu - x$   $(x - \mu)^4$

    3860 22432 18572 118961359577511000.00

    5074 22432 17358 90775352113581700.00

    6394 22432 16038 66155736409089900.00

    9471 22432 12961 28217127570213800.00

    13313 22432 9119 6914013865915450.00

    18188 22432 4244 324321358130165.00

    22198 22432 234 2982480884.74

    28299 22432 -5867 1185101249535000.00

    31040 22432 -8608 5491248877200220.00

    34428 22432 -11996 20710490545844300.00

    36330 22432 -13898 37311927844972200.00

    40353 22432 -17921 103152268978605000.00

  • Statistics Assignment No 1 Page 23

    42664 22432 -20232 167564372493050000.00

    Summation 646763323866130000.00

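    Where

    1. x = Observation

    2. $\mu$ = Population mean

    3. N = Number of items in the population

    4. $\sum$ = Sum of all the values $(x - \mu)^4$

    5. $\sigma$ = Population standard deviation

    6. $\sigma^2$ = Population variance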

    Findings and Interpretation of results:

    From this value we got the Coefficient of Kurtosis as -0.018, by way of which it can be concluded that the distribution has little excess peakedness relative to a normal curve.
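    The exact formulas used for the coefficients of skewness and kurtosis are not legible in this transcript, so the sketch below assumes the standard population moment definitions: skewness as the third central moment divided by $\sigma^3$, and excess kurtosis as the fourth central moment divided by $\sigma^4$, minus 3. These are assumptions, and the results need not match the figures quoted above if a different definition or scaling was used.

```python
# Dataset 1: consumption of conventional energy in India (Peta Joules).
consumption = [3860, 5074, 6394, 9471, 13313, 18188, 22198,
               28299, 31040, 34428, 36330, 40353, 42664]

n = len(consumption)
mean = sum(consumption) / n
deviations = [x - mean for x in consumption]


def central_moment(order):
    """Population central moment: average of (x - mean) ** order."""
    return sum(d ** order for d in deviations) / n


m2 = central_moment(2)  # population variance
m3 = central_moment(3)
m4 = central_moment(4)

# Assumed definitions (moment coefficients), not taken from the source.
skewness = m3 / m2 ** 1.5
excess_kurtosis = m4 / m2 ** 2 - 3

print(f"Moment coefficient of skewness: {skewness:.4f}")
print(f"Excess kurtosis: {excess_kurtosis:.4f}")
```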

    Data Collected:

    The data with which we are going to do statistical analysis is: Exports and Imports of India (Rs crore) and the Index of Industrial Production (IIP).

    Dataset 2

    Year   Exports (Rs crore)   Imports (Rs crore)   IIP index

    2011-12 1454066 2342217 170.2

    2010-11 1142922 1683467 165.5

    2009-10 845534 1363736 152.9

    2008-09 840754 1374434 145.2

    2007-08 655863 1012312 141.7

    Data Brief:

    The data collected gives a measure of the imports and exports from India which, in turn, gives an

    account of the trade deficit for the Indian economy over the last five years. Also included in the data

    set is the Index of Industrial Production which is calculated with a base of 100 for the year 2004-05.

    Why the data was maintained by the source and its importance?

    The data was collected from the Macro Economic Indicators section of Economic and Political Weekly, a popular weekly journal which publishes articles, commentary and editorials on current topics related to economy and politics. The articles are written by eminent academicians and members of the industry.

  • Statistics Assignment No 1 Page 24

    The Macro Economic Indicators section is maintained by the Economic and Political Weekly as a regular section in the journal, giving an overall view of the Indian economy with respect to the trade balance, money and banking and index numbers of wholesale prices.

    Type of data:

    The type of data used for analysis is NUMERICAL data.

    --------------------------------------------------------------------------------------------------------------------------------------

    Concept Name: Geometric mean

    Selection of Variable: Index of Industrial Production (IIP) data

    Formula and calculation steps:

    The Index of Industrial Production, as mentioned before, is relative to a base of 100 for the year

    2004-05. Hence, in order to find the average index over the course of 5 years, the arithmetic mean is

    not a suitable measure whereas the Geometric mean is.

    In order to calculate the geometric mean, the index numbers are divided by 100 to factor to a base

    of 1 instead of 100 resulting in the table below:

    Year IIP index Scaled to 1

    2011-12 170.2 1.702

    2010-11 165.5 1.655

    2009-10 152.9 1.529

    2008-09 145.2 1.452

    2007-08 141.7 1.417

    Geometric mean is then calculated as the 5th root of the product of all index numbers across 5 years.

    Hence, geometric mean = (1.702 * 1.655 * 1.529 * 1.452 * 1.417)^0.2 = 1.547

    Findings and Interpretation of results:

    Hence, from the geometric mean, we can conclude that the average index for industrial production across 5 years starting from 2007-08 is 154.7 scaled to a base of 100.
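    The geometric mean calculation can be sketched as below, using the IIP values from Dataset 2 scaled to a base of 1. The variable names are illustrative; `math.prod` requires Python 3.8 or later.

```python
import math

# IIP index values from Dataset 2, 2011-12 back to 2007-08.
iip_index = [170.2, 165.5, 152.9, 145.2, 141.7]

# Scale each index to a base of 1 instead of 100.
scaled = [value / 100 for value in iip_index]

# Geometric mean = n-th root of the product of the n values.
geometric_mean = math.prod(scaled) ** (1 / len(scaled))

print(f"Geometric mean (base 1): {geometric_mean:.3f}")
print(f"Geometric mean (base 100): {geometric_mean * 100:.1f}")
```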

    Concept Name: Scatter Plot

    Selection of Variable: Data for exports and imports

    Formula and calculation steps:

  • Statistics Assignment No 1 Page 25

    The Scatter plot is a graph which describes the relationship between two variables. In this case, the

    Scatter Plot is plotted with the Exports on the x-axis and the Imports on the y-axis to provide a

    relationship between the exports and imports of India.

    Findings and Interpretation of results:

    From the Scatter Plot, it can be observed that as Exports increase, the Imports increase as well as

    seen from the trend over the last 5 years from 2007-08.