Statistics
STATISTICS ASSIGNMENT NO 1
SUBMITTED TO PROFESSOR SACHIN S KAMBLE
NITIE MUMBAI
ON 10TH JULY 2012
Prepared by:
Rahul Ranganathan (121)
Rajiv V (125)
Prakash Sethu (107)
Prateek Kumar Kureel (108)
Pruthvi Raj (114)
-
Statistics Assignment No 1 Page 2
Data Collected:
The data with which we are going to do statistical analysis is: Consumption of Conventional Energy in
India (Peta Joules).
Dataset 1
Year Energy consumption (Peta Joules)
1970-71 3860
1975-76 5074
1980-81 6394
1985-86 9471
1990-91 13313
1995-96 18188
2000-01 22198
2005-06 28299
2006-07 31040
2007-08 34428
2008-09 36330
2009-10 40353
2010-11 42664
Data Brief:
The data collected for consumption of conventional energy was reported in the 19th issue of Energy
Statistics (for the year 2012) released by the Central Statistics Office, Government of India. The
data itself is collected in collaboration with the following entities:
1. Office of Coal Controller, Ministry of Coal
2. Ministry of Petroleum & Natural Gas
3. Central Electricity Authority
Why the data was maintained by the source and its importance?
The Central Statistics Office, as part of the Ministry of Statistics and Programme Implementation in
India maintains data not just with respect to energy but also industrial, social and price statistics.
Data released periodically with respect to CPI and WPI index, IIP index and energy statistics from the
Central Statistics Office provide the basis for decision making for government policy and for the
government to track progress and take a stance.
The data for consumption of conventional energy sources was published as part of a wider report
that covers both renewable and non-renewable sources of energy and monitors trends in
consumption and the future outlook. For instance, the particular data used for this assignment provides
valuable insight into the rate at which energy consumption has been increasing in India over the recent
past.
Type of data:
The type of data used for analysis is NUMERICAL data.
Statistical Analysis:
Concept Name: Frequency Distribution
Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)
X-Axis: Class intervals
Y-Axis: The frequency of occurrence of the value in the corresponding class interval
Formula and calculation steps:
We will use Dataset 1 to construct a frequency distribution chart. With the data ranging from about
3900 to about 43000 Peta Joules, class intervals of width 8000 are formed, covering 0 to 48000, as
shown below. The frequency distribution table thus looks as follows:
Class Intervals Frequency
0-8000 3
8001-16000 2
16001-24000 2
24001-32000 2
32001-40000 2
40001-48000 2
Total 13
And the frequency distribution chart:
Findings and Interpretation of results:
With the frequency distribution chart, we can identify that the data for energy consumption is
spread out almost uniformly across the class intervals defined: every class holds 2 observations
except the lowest, which holds 3.
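The class counts in the table above can be reproduced with a short sketch. It assumes, as the class labels suggest, that the upper limit of each class is inclusive; the helper name `class_label` is illustrative and not part of the assignment.

```python
# Dataset 1: consumption of conventional energy in India (Peta Joules)
consumption = [3860, 5074, 6394, 9471, 13313, 18188, 22198,
               28299, 31040, 34428, 36330, 40353, 42664]

WIDTH = 8000  # class width used in the table


def class_label(value, width=WIDTH):
    """Return the class interval (0-8000, 8001-16000, ...) containing value."""
    # Assumption: upper class limits are inclusive, so 8000 falls in 0-8000.
    index = max((value - 1) // width, 0)
    lower = 0 if index == 0 else index * width + 1
    return f"{lower}-{(index + 1) * width}"


frequency = {}
for value in consumption:
    label = class_label(value)
    frequency[label] = frequency.get(label, 0) + 1

for label, count in frequency.items():
    print(label, count)
```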
Concept Name: Relative Frequency
Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)
X-Axis- Class intervals
Y-Axis- The relative frequency of occurrence of the value in the corresponding class interval
Formula and calculation steps:
We will use Dataset 1 to construct a relative frequency distribution chart. Building on the
frequency distribution table, the relative frequency of each interval is its frequency divided by
the total number of observations (13), as follows:
Class Intervals Frequency Relative frequency
0-8000 3 0.23
8001-16000 2 0.15
16001-24000 2 0.15
24001-32000 2 0.15
32001-40000 2 0.15
40001-48000 2 0.15
Total 13 1.00
And the relative frequency plot of the data:
Findings and Interpretation of results:
We can interpret that the relative frequency of occurrence of data is nearly consistent across the
class intervals: 0.15 in every class except the lowest, which has 0.23.
Concept Name: Cumulative Frequency Distribution
Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)
X-Axis- Class intervals
Y-Axis- The cumulative frequency of occurrence of the value in the corresponding class interval
Formula and calculation steps:
We will use Dataset 1 to construct a cumulative frequency distribution chart. Building on the
frequency distribution table, the cumulative frequency of each interval is the sum of the
frequencies of that interval and of all the intervals before it, as follows:
Class Intervals Cumulative frequency
0-8000 3
8001-16000 5
16001-24000 7
24001-32000 9
32001-40000 11
40001-48000 13
Total 13
And the cumulative frequency distribution chart will look like:
Findings and Interpretation of results:
As we can see, the cumulative frequency increases progressively across the class intervals and
reaches its maximum of 13, the total number of observations, in the last class interval.
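As a quick check, the relative and cumulative frequency columns can be derived directly from the class frequencies. This is a minimal sketch; rounding the relative frequencies to two decimals is an assumption made to match the table.

```python
from itertools import accumulate

classes = ["0-8000", "8001-16000", "16001-24000",
           "24001-32000", "32001-40000", "40001-48000"]
frequency = [3, 2, 2, 2, 2, 2]

total = sum(frequency)
# Relative frequency: share of all observations that fall in each class
relative = [round(f / total, 2) for f in frequency]
# Cumulative frequency: running total of the class frequencies
cumulative = list(accumulate(frequency))

for name, f, r, c in zip(classes, frequency, relative, cumulative):
    print(name, f, r, c)
```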
Concept Name: Histogram
Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)
X-Axis- The class mid-point is chosen as the X axis variable for the histogram
Y-Axis- The frequency of occurrence in that particular class is used as the Y axis.
Formula and calculation steps:
Class mid-point is obtained by finding the mean of the two end values of the class, and thus the
histogram table is obtained:
Class Mid-point Frequency
4000 3
12000 2
20000 2
28000 2
36000 2
44000 2
More 0
And the histogram will look like:
Findings and Interpretation of results:
A histogram represents the frequencies of values present in a particular class, represented by the class
mid-point. If, for a particular class, there are no occurrences of data, there will be no bar for that
class. In our example, it is the class "More" that has no entries.
Concept Name: Frequency Polygon
Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)
X-Axis- The class range is chosen in the X axis.
Y-Axis- The frequency of occurrence is represented in the Y axis.
Formula and calculation steps:
We will use Dataset 1 to construct a frequency polygon. With the data ranging from about 3900 to
about 43000 Peta Joules, class intervals of width 8000 are formed, covering 0 to 48000, as shown
below. The frequency distribution table thus looks as follows:
Class Intervals Frequency
0-8000 3
8001-16000 2
16001-24000 2
24001-32000 2
32001-40000 2
40001-48000 2
Total 13
And the frequency polygon:
Findings and Interpretation of results:
With the frequency polygon, we can identify that the data is uniformly spread out across all class
intervals.
Concept Name: Ogive
Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)
X-Axis- The class range is chosen in the X axis.
Y-Axis- The cumulative frequency of occurrence is represented in the Y axis.
Formula and calculation steps:
An ogive is a derived form of the frequency polygon chart: it is drawn as a cumulative frequency
polygon. There are two kinds of ogives: the Less than Ogive and the More than Ogive.
The ogive is plotted using the frequency distribution table that we have been using. The difference
lies in the type of information that can be derived from it: an Ogive tells us the number of
values less than or more than a particular value.
Findings and Interpretation of results:
An Ogive gives us information about how many data points are less than or greater than a
particular value. For example, the Less than Ogive shows that there are 5 values
which are less than 16000 in the data that we have.
Concept Name: Pie Chart
Selection of Variable: Break-up of the Total Energy Consumption from conventional sources in 2010-11
The highest value that we have in our data is 42664 Peta Joules. This number represents the
Consumption of Conventional Energy in India in the year 2010-11. It can be divided into 4 parts: Coal &
Lignite, Crude Petroleum, Natural Gas and Electricity.
A pie chart can be drawn for this classification of the consumption.
Formula and calculation steps:
With 42664 representing the consumption of Conventional Energy as a whole, the 4 sub-heads (Coal
& Lignite, Crude Petroleum, Natural Gas and Electricity) each represent a part of the whole circle,
based on the amount of consumption under each head:
Source Consumption (Peta Joules)
Coal & Lignite 10179
Crude Petroleum 8632
Natural Gas 1974
Electricity 21879
Total 42664
Findings and Interpretation of results:
With the pie chart we can identify the pattern of consumption under the various heads. For example, in the
above pie, we can say that slightly more than 50% of the energy consumption (21879 of 42664 Peta Joules,
about 51%) is in the form of Electricity, which includes thermal, hydro & nuclear electricity from utilities.
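The share of each head, and the angle of its slice in the pie, follow from dividing by the total. The sketch below uses the four figures from the break-up above; the dictionary and variable names are illustrative.

```python
# Break-up of conventional energy consumption in 2010-11 (Peta Joules)
sources = {
    "Coal & Lignite": 10179,
    "Crude Petroleum": 8632,
    "Natural Gas": 1974,
    "Electricity": 21879,
}

total = sum(sources.values())

# Percentage share and slice angle (out of 360 degrees) for each source
shares = {name: 100 * value / total for name, value in sources.items()}
angles = {name: 360 * value / total for name, value in sources.items()}

for name in sources:
    print(f"{name}: {shares[name]:.1f}% ({angles[name]:.1f} degrees)")
```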
Concept Name: Stem and Leaf Diagram
Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)
Formula and calculation steps:
In order to create the Stem & Leaf Diagram, each data point is rounded off to the nearest 1000 and
then split, with the ten-thousands digit as the stem and the thousands digit as the leaf. Hence, for
example, the first data point of 3860 Peta Joules is rounded off to 4000 and has a stem of 0 and a leaf of 4.
In accordance with the above step, the Stem and Leaf Diagram is represented below:
Stem Leaves
0 4 5 6 9
1 3 8
2 2 8
3 1 4 6
4 0 3
Findings and Interpretation of results:
The Stem and Leaf Diagram shows both the shape of the frequency distribution and the (rounded)
data points of the original data set. Here, the stem 0 holds the most observations, 4 of the 13.
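The rounding-and-splitting step can be sketched as follows, assuming each value is rounded to the nearest 1000 before the ten-thousands digit is taken as the stem.

```python
consumption = [3860, 5074, 6394, 9471, 13313, 18188, 22198,
               28299, 31040, 34428, 36330, 40353, 42664]

stems = {}
for value in consumption:
    thousands = round(value / 1000)      # e.g. 3860 -> 4, 42664 -> 43
    stem, leaf = divmod(thousands, 10)   # ten-thousands digit, thousands digit
    stems.setdefault(stem, []).append(leaf)

for stem in sorted(stems):
    print(stem, " ".join(str(leaf) for leaf in stems[stem]))
```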
Concept Name: Pareto Diagram
Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)
X-Axis- The year which is considered
Y-Axis- For a Pareto diagram, Y axis is used to denote both the Consumption per year and the
cumulative percentage of the consumption
Formula and calculation steps:
First, the data needs to be arranged in the descending order of consumption and the corresponding
cumulative percentage contribution is obtained.
S No Consumption Qty Relative Distribution Cumulative %
1 42664 0.146303993 14.6303993
2 40353 0.138379079 28.4683072
3 36330 0.12458335 40.92664225
4 34428 0.118060985 52.73274077
5 31040 0.106442808 63.37702152
6 28299 0.097043332 73.08135468
7 22198 0.076121696 80.69352427
8 18188 0.062370547 86.93057899
9 13313 0.045653128 91.4958918
10 9471 0.032478087 94.74370053
11 6394 0.021926395 96.93634007
12 5074 0.017399833 98.67632333
13 3860 0.013236767 100
With this data, we can identify the cumulative percentage of consumption, taken in decreasing
order of consumption. The Pareto diagram is drawn thus:
Findings and Interpretation of results:
The Pareto Principle is named after Vilfredo Pareto, who observed in Italy in the 19th Century that
80% of the land was owned by 20% of the people. He then developed the principle further by
observing that 20% of the pea pods in the garden contained 80% of the peas. The Pareto Principle is also
known as the 80-20 rule, a general principle referring to the observation that 80% of
outcomes come from 20% of causes.
In the case of this particular example, about 80% of the total consumption recorded over the 13
observations, which span the 40 years from 1970-71 to 2010-11, occurred in the 7 observations from
2000-01 onwards.
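The cumulative percentage column of the Pareto table can be recomputed with a running total over the values sorted in descending order. This is a minimal sketch using only Dataset 1.

```python
consumption = [3860, 5074, 6394, 9471, 13313, 18188, 22198,
               28299, 31040, 34428, 36330, 40353, 42664]

ordered = sorted(consumption, reverse=True)  # largest consumption first
total = sum(ordered)

running = 0
cumulative_pct = []
for value in ordered:
    running += value
    cumulative_pct.append(100 * running / total)

for rank, (value, pct) in enumerate(zip(ordered, cumulative_pct), start=1):
    print(rank, value, round(value / total, 4), round(pct, 2))
```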
Concept Name: Scatter Plot
Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)
X-Axis- The year of consumption is in the X axis.
Y-Axis- The amount of consumption in that particular year is set in the Y Axis.
Formula and calculation steps:
A scatter plot is a type of mathematical diagram using Cartesian coordinates to display values for
two variables for a set of data. The data is displayed as a collection of points, each having the value
of one variable determining the position on the horizontal axis and the value of the other variable
determining the position on the vertical axis.
With the 2 axes/variables decided, drawing a scatter plot is a fairly simple exercise. The data that we
use to draw the scatter graph is Dataset 1:
Year Consumption
1970-71 3860
1975-76 5074
1980-81 6394
1985-86 9471
1990-91 13313
1995-96 18188
2000-01 22198
2005-06 28299
2006-07 31040
2007-08 34428
2008-09 36330
2009-10 40353
2010-11 42664
Findings and Interpretation of results:
With the Scatter plot, we can infer that the consumption of Conventional Energy has increased in
every successive observation. Also, if the years are indexed to numbers, a trend line fitted to the
Scatter Diagram allows a prediction of the future energy consumption.
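One simple way to turn the scatter into a prediction is a straight-line least-squares fit. The sketch below is only an illustration under stated assumptions: each observation is indexed by the starting calendar year of its fiscal year in Dataset 1, the helper `fit_line` is hypothetical, and a straight line is a crude model for data that appears to grow faster in later years.

```python
# Assumption: index each observation by the start year of its fiscal year
years = [1970, 1975, 1980, 1985, 1990, 1995, 2000,
         2005, 2006, 2007, 2008, 2009, 2010]
consumption = [3860, 5074, 6394, 9471, 13313, 18188, 22198,
               28299, 31040, 34428, 36330, 40353, 42664]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(years, consumption)
forecast_2011 = slope * 2011 + intercept  # extrapolated value for 2011-12
print(f"slope = {slope:.1f} PJ per year, forecast for 2011-12 = {forecast_2011:.0f} PJ")
```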
Concept Name: Mode
Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)
Formula and calculation steps:
The series of data for which MODE needs to be estimated is:
Class Intervals Frequency
0-8000 3
8001-16000 2
16001-24000 2
24001-32000 2
32001-40000 2
40001-48000 2
Total 13
Formula for mode:
$Mo = L_{Mo} + \frac{d_1}{d_1 + d_2} w$
Where, LMo = lower limit of the modal class
d1 = frequency of the modal class minus the frequency of the class directly below it
d2 = frequency of the modal class minus the frequency of the class directly above it
w = width of the modal class interval
The modal class is 0-8000, with a frequency of 3. There is no class below it, so d1 = 3 - 0 = 3,
and d2 = 3 - 2 = 1.
Using the formula, we can calculate the mode as 0 + (3/4) * 8000 = 6000.
Findings and Interpretation of results:
We can see that obtaining the mode through this methodology is not accurate, since the result
depends on the choice of class intervals.
Nevertheless, with data constraints, we can utilize this formula to obtain the MODE of the series.
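The grouped-data mode formula can be sketched as a small function. It assumes that a missing neighbouring class counts as a frequency of 0, and the function name `grouped_mode` is illustrative.

```python
def grouped_mode(lower_limit, width, f_modal, f_before, f_after):
    """Mode of grouped data: L + d1 / (d1 + d2) * w."""
    d1 = f_modal - f_before  # modal class vs. the class directly below it
    d2 = f_modal - f_after   # modal class vs. the class directly above it
    return lower_limit + d1 / (d1 + d2) * width


lower_limits = [0, 8001, 16001, 24001, 32001, 40001]
frequencies = [3, 2, 2, 2, 2, 2]

i = frequencies.index(max(frequencies))  # position of the modal class
# Assumption: a neighbour that does not exist has a frequency of 0
f_before = frequencies[i - 1] if i > 0 else 0
f_after = frequencies[i + 1] if i + 1 < len(frequencies) else 0

mode = grouped_mode(lower_limits[i], 8000, frequencies[i], f_before, f_after)
print(mode)
```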
Concept Name: Median
Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)
Formula and calculation steps:
The series of data for which MEDIAN needs to be estimated is:
Class Intervals Frequency
0-8000 3
8001-16000 2
16001-24000 2
24001-32000 2
32001-40000 2
40001-48000 2
Total 13
Median of Grouped data:
$Median = \left( \frac{(n + 1)/2 - (F + 1)}{f_m} \right) w + L_m$
Where, n = number of values in the series
F = sum of all the class frequencies up to, but not including, the median class
fm = frequency of the median class
w = class interval width
Lm = lower limit of the median class interval
With n = 13, the median is the (13 + 1)/2 = 7th value, which falls in the class 16001-24000. Hence
F = 5, fm = 2, w = 8000 and Lm = 16001.
Using the formula, we can estimate the median as ((7 - 6)/2) * 8000 + 16001 = 20001.
Findings and Interpretation of results:
With the original series of data available with us, we know that the median for this set of data is
22198. There is a noticeable deviation of about 2200 (roughly 10%) between the Median obtained
through the formula and the proper median. Thus, this formula is useful when there are data
constraints, but the result can deviate from the true value, particularly with few and wide classes.
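A minimal sketch of the grouped-data median follows. It assumes F counts the frequencies of the classes before the median class and uses the printed lower class limits; the function name `grouped_median` is illustrative.

```python
def grouped_median(lower_limits, frequencies, width):
    """Median of grouped data: ((n + 1)/2 - (F + 1)) / fm * w + Lm."""
    n = sum(frequencies)
    position = (n + 1) / 2  # rank of the median observation
    cumulative = 0          # F: frequencies of the classes before the current one
    for lower, f in zip(lower_limits, frequencies):
        if cumulative + f >= position:  # the median falls in this class
            return (position - (cumulative + 1)) / f * width + lower
        cumulative += f
    raise ValueError("frequencies must not be empty")


lower_limits = [0, 8001, 16001, 24001, 32001, 40001]
frequencies = [3, 2, 2, 2, 2, 2]

median_estimate = grouped_median(lower_limits, frequencies, 8000)
print(median_estimate)
```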
Concept Name: Arithmetic Mean
Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)
Formula and calculation steps:
The series of data for which ARITHMETIC MEAN needs to be estimated is Dataset 1.
Arithmetic Mean = $AM = \frac{a_1 + a_2 + \dots + a_n}{n}$
Where, n = number of values in the series
a = all the values in the set, such that a1 is the first value
AM = Arithmetic mean
Using the formula, we can estimate the Arithmetic Mean as 291612 / 13 = 22431.
Findings and Interpretation of results:
With the original series of data available with us, we know that the AM for this set of data is 22431.
As we can see, the AM gets affected by the extreme values. Thus, Arithmetic Mean is an excellent
tool to determine the mean when the frequencies are closer to the Central Tendency and the
extreme points are not that far from it.
Concept Name: Percentiles
Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)
Formula and calculation steps:
Percentiles divide the data into 100 equal parts. So, for calculating the 1st Percentile, with 13 data
points as in this case, we need to find the ((n+1)/100)th data point, i.e. the 0.14th data point.
In this case, the value corresponding to the 0.14th data point is,
1st Percentile = 0.14 * (1st data point) = 0.14 * 3860 = 540.4
Since position 0.14 falls before the 1st data point, this value is an extrapolation below the
smallest observation.
Findings and Interpretation of results:
With the original series of data available with us, we know that the 1st Percentile is 540.4.
Percentiles are used to divide the entire data set into 100 equal parts. This is primarily used to judge
the relative position of a particular datum w.r.t. the entire set. It is mainly used in Competitive
examinations to judge the relative performance of participants.
Concept Name: Deciles
Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)
Formula and calculation steps:
The series of data for which Deciles need to be estimated is Dataset 1.
Deciles divide the data into 10 equal parts. So for calculating the first Decile, with 13 data points as in
this case, we need to find the ((n+1)/10)th data point.
In this case, the value corresponding to the 1.4th data point is,
D1 = 1st data point + 0.4 * (2nd data point - 1st data point) = 3860 + 0.4 * (5074 - 3860) = 4345.6
This value is the first decile, which corresponds to the 10th percentile.
Findings and Interpretation of results:
With the original series of data available with us, we know that the first Decile is 4345.6. Deciles are
used to divide the entire data set into 10 equal parts. Deciles are commonly used in measuring the
amount of rainfall, where the entire rainfall is divided into 10 parts and then categorized as low,
medium and so on.
Concept Name: Quartiles
Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)
Formula and calculation steps:
The series of data for which Quartiles need to be estimated is Dataset 1.
Quartiles divide the data into 4 equal parts. So for calculating the first Quartile, with 13 data points as in this
case, we need to find the ((n+1)/4)th data point.
In this case, the value corresponding to the 3.5th data point is,
Q1 = 3rd term + 0.5 * (4th term - 3rd term) = 6394 + 0.5 * (9471 - 6394) = 7932.5
This value is the first quartile, which corresponds to the 25th percentile.
Findings and Interpretation of results:
With the original series of data available with us, we know that the first Quartile is 7932.5.
Quartiles are used to divide the entire data set into 4 equal parts.
Concept Name: Range
Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)
Formula and calculation steps:
The series of data for which Range needs to be estimated is Dataset 1.
Range is the span of the data.
Range = Value of Highest Observation - Value of Lowest Observation
Range = 42664 - 3860 = 38804
Findings and Interpretation of results:
With the original series of data available with us, we know that the Range is 38804. Range
provides us with a good picture of the span of the data but gives very little information about the
variation within the data.
Concept Name: Inter Quartile Range
Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)
Formula and calculation steps:
The series of data for which Quartiles need to be estimated is Dataset 1.
Inter Quartile range measures how far from the median we must go on either side before we can
include one half of the dataset. So, the Inter Quartile range is the difference between the third
quartile and the first quartile.
Now, we know that the first quartile (Q1) is 7932.5
The position of the third quartile (Q3) is calculated as 3 * (14/4) = 10.5th term
So, the third quartile = 10th term + 0.5 * (11th term - 10th term) = 34428 + 0.5 * (36330 - 34428) = 35379
So the Interquartile range is Q3 - Q1 = 35379 - 7932.5 = 27446.5
Findings and Interpretation of results:
With the original series of data available with us, we know that the Inter Quartile range is 27446.5.
Inter Quartile range gives us the amount by which we have to move away from median on either
side to include one half of the dataset.
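The positional rule used for the deciles and quartiles, taking the p * (n + 1)th term and interpolating linearly between neighbouring terms, can be sketched as below together with the range and the inter quartile range. The helper `value_at` is hypothetical and only handles positions from 1 to n.

```python
data = sorted([3860, 5074, 6394, 9471, 13313, 18188, 22198,
               28299, 31040, 34428, 36330, 40353, 42664])


def value_at(sorted_data, position):
    """Value at a 1-based, possibly fractional position (linear interpolation)."""
    if not 1 <= position <= len(sorted_data):
        raise ValueError("position must lie between 1 and n")
    whole = int(position)
    fraction = position - whole
    lower = sorted_data[whole - 1]
    if fraction == 0:
        return lower
    upper = sorted_data[whole]
    return lower + fraction * (upper - lower)


n = len(data)
d1 = value_at(data, (n + 1) / 10)     # first decile, the 1.4th term
q1 = value_at(data, (n + 1) / 4)      # first quartile, the 3.5th term
q3 = value_at(data, 3 * (n + 1) / 4)  # third quartile, the 10.5th term
iqr = q3 - q1
data_range = data[-1] - data[0]
print(d1, q1, q3, iqr, data_range)
```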
Concept Name: Mean Absolute Deviation
Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)
Formula and calculation steps:
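Mean Absolute Deviation = mean of $|x_i - \bar{x}|$ = $\frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}|$
Where $x_i$ is the element and $\bar{x}$ is the Arithmetic mean of the data.
Mean $\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$
The steps involved
1. $\bar{x}$ = (3860 + 5074 + 6394 + 9471 + 13313 + 18188 + 22198 + 28299 + 31040 + 34428 + 36330 + 40353 + 42664) / 13 = 22431.69
x Deviation (mean - x) Absolute Deviation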
3860 18571.69 18571.69
5074 17357.69 17357.69
6394 16037.69 16037.69
9471 12960.69 12960.69
13313 9118.692 9118.692
18188 4243.692 4243.692
22198 233.6923 233.6923
28299 -5867.308 5867.31
31040 -8608.308 8608.31
34428 -11996.31 11996.31
36330 -13898.31 13898.31
40353 -17921.31 17921.31
42664 -20232.31 20232.31
Mean Absolute Deviation = 12080.6
Findings and Interpretation of results:
With the original series of data available with us, we know that the mean for this set of data is
22431. From this value we got the Mean Absolute Deviation as 12080.59. Since this is more than half
of the mean, we can deduce that the deviation is very large and the data is spread over a large range.
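The mean absolute deviation can be checked in a few lines, using the same 13 observations.

```python
data = [3860, 5074, 6394, 9471, 13313, 18188, 22198,
        28299, 31040, 34428, 36330, 40353, 42664]

mean = sum(data) / len(data)
# Average of the absolute distances of each observation from the mean
mad = sum(abs(x - mean) for x in data) / len(data)
print(round(mean, 2), round(mad, 2))
```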
Concept Name: Variance
Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)
Formula and calculation steps:
Variance = $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$ = 180670816
Where
1. X = Observation
2. μ = population mean
3. N = total number of elements in the population
4. Σ = sum of all the values (x - μ)²
5. σ² = population variance
6. σ = population standard deviation
x μ μ - x (x - μ)²
3860 22432 18572 344907755
5074 22432 17358 301289482
6394 22432 16038 257207575
9471 22432 12961 167979545
13313 22432 9119 83150549.4
18188 22432 4244 18008924.4
22198 22432 234 54612.0947
28299 22432 -5867 34425299.6
31040 22432 -8608 74102961.3
34428 22432 -11996 143911398
36330 22432 -13898 193162957
40353 22432 -17921 321173269
42664 22432 -20232 409346275
Summation 2348720603
Findings and Interpretation of results:
With the original series of data available with us, we know that the mean for this set of data is
22431. From this value we got the Variance as 180670816. From this value we can deduce that the
deviation is very large and the data is fluctuating over a large range.
Concept Name: Standard Deviation
Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)
Formula and calculation steps:
Standard Deviation = $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
Where
1. X = Observation
2. μ = population mean
3. N = total number of elements in the population
4. Σ = sum of all the values (x - μ)²
5. σ² = population variance
Standard Deviation = Square root of variance = 13441
Findings and Interpretation of results:
With the original series of data available with us, we know that the mean for this set of data is
22431. From this value we got the standard deviation = 13441. From this value we can deduce that
the deviation is very large and the data is fluctuating over a large range.
Concept Name: Coefficient of Variation
Formula and calculation steps:
Coefficient of Variation = $CV = \frac{\sigma}{\mu} \times 100$ = 13441/22431 * 100 = 59.92
Where
1. μ = population mean
2. σ = population standard deviation
Findings and Interpretation of results:
With the coefficient of variation at 60%, it can be concluded that the distribution of data is highly
dispersed with respect to the mean.
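The population variance, standard deviation and the coefficient derived from them can be reproduced with Python's standard `statistics` module. This sketch treats the 13 observations as the whole population, so it divides by N.

```python
from statistics import fmean, pstdev, pvariance

data = [3860, 5074, 6394, 9471, 13313, 18188, 22198,
        28299, 31040, 34428, 36330, 40353, 42664]

mu = fmean(data)             # population mean
variance = pvariance(data)   # population variance (divides by N)
sigma = pstdev(data)         # population standard deviation
cv = sigma / mu * 100        # dispersion relative to the mean, in percent

print(round(variance), round(sigma), round(cv, 2))
```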
Concept Name: Coefficient of Skewness
Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)
Formula and calculation steps:
Coefficient of Skewness = $S_k = \frac{\sum (x - \mu)^3}{(N - 1) \sigma^3}$ = 0.018
Where
1. X = Observation
2. μ = population mean
3. N = total number of elements in the population
4. Σ = sum of all the values (x - μ)³
5. σ² = population variance
6. σ = population standard deviation
x μ μ - x (μ - x)³
3860 22432 18572 6405520703583.78
5074 22432 17358 5229690128413.75
6394 22432 16038 4125015939940.37
9471 22432 12961 2177131197958.20
13313 22432 9119 758224275215.75
18188 22432 4244 76424333956.14
22198 22432 234 12762426.43
28299 22432 -5867 -201983824896.17
31040 22432 -8608 -637901092000.60
34428 22432 -11996 -1726405413819.48
36330 22432 -13898 -2684638207112.31
40353 22432 -17921 -5755844983504.25
42664 22432 -20232 -8282019779521.16
Summation -516773959359.54
The table lists μ - x, so the sum of (x - μ)³ has the opposite sign: 516773959359.54. Dividing it by
(N - 1) σ³ = 12 * 13441³ gives 0.018.
Findings and Interpretation of results:
From this value we got the Coefficient of Skewness as 0.018, by way of which it can be concluded that
the data is quite symmetric in nature with little skewness. This is corroborated by the fact that the
mean (22431) and the median (22198) values are not far off.
Concept Name: Coefficient of Kurtosis
Selection of Variable: Consumption of Conventional Energy in India (Peta Joules)
Formula and calculation steps:
Coefficient of Kurtosis = $K = \frac{\sum (x - \mu)^4}{(N - 1) \sigma^4}$ = 1.65
x μ μ - x (x - μ)⁴
3860 22432 18572 118961359577511000.00
5074 22432 17358 90775352113581700.00
6394 22432 16038 66155736409089900.00
9471 22432 12961 28217127570213800.00
13313 22432 9119 6914013865915450.00
18188 22432 4244 324321358130165.00
22198 22432 234 2982480884.74
28299 22432 -5867 1185101249535000.00
31040 22432 -8608 5491248877200220.00
34428 22432 -11996 20710490545844300.00
36330 22432 -13898 37311927844972200.00
40353 22432 -17921 103152268978605000.00
42664 22432 -20232 167564372493050000.00
Summation 646763323866130000.00
Where
1. X = Observation
2. μ = population mean
3. N = total number of elements in the population
4. Σ = sum of all the values (x - μ)⁴
5. σ² = population variance
6. σ = population standard deviation
Findings and Interpretation of results:
From this value we got the Coefficient of Kurtosis as 1.65, which is below the value of 3 for a normal
distribution, by way of which it can be concluded that the data is relatively flat in nature with
little peakedness.
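Both coefficients can be recomputed from the third and fourth powers of the deviations. The sketch below is an assumption-laden illustration: it uses moment-based coefficients with an (N - 1) denominator and the population standard deviation, which is one of several conventions in use, and other conventions give somewhat different values.

```python
from statistics import fmean, pstdev

data = [3860, 5074, 6394, 9471, 13313, 18188, 22198,
        28299, 31040, 34428, 36330, 40353, 42664]

n = len(data)
mu = fmean(data)      # population mean
sigma = pstdev(data)  # population standard deviation

# Assumption: (N - 1) in the denominator, deviations taken as x - mu
skewness = sum((x - mu) ** 3 for x in data) / ((n - 1) * sigma ** 3)
kurtosis = sum((x - mu) ** 4 for x in data) / ((n - 1) * sigma ** 4)

print(round(skewness, 3), round(kurtosis, 2))
```

A skewness close to 0 points to a roughly symmetric spread, while a kurtosis well below 3 points to a distribution flatter than the normal curve.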
Data Collected:
The second set of data with which we are going to do statistical analysis is: Exports and Imports of
India (Rs crore) and the Index of Industrial Production (IIP).
Dataset 2
Year Exports (Rs crore) Imports (Rs crore) IIP index
2011-12 1454066 2342217 170.2
2010-11 1142922 1683467 165.5
2009-10 845534 1363736 152.9
2008-09 840754 1374434 145.2
2007-08 655863 1012312 141.7
Data Brief:
The data collected gives a measure of the imports and exports from India which, in turn, gives an
account of the trade deficit for the Indian economy over the last five years. Also included in the data
set is the Index of Industrial Production which is calculated with a base of 100 for the year 2004-05.
Why the data was maintained by the source and its importance?
The data was collected from the Macro Economic Indicators section of Economic and Political
Weekly, a popular weekly journal which publishes articles, commentary and editorials on
current topics related to economy and politics. The articles are written by eminent academicians and
members of the industry.
The Macro Economic Indicators section is maintained by the Economic and Political Weekly as a
regular section of the journal, giving an overall view of the Indian economy with respect to the
trade balance, money and banking and index numbers of wholesale prices.
Type of data:
The type of data used for analysis is NUMERICAL data.
--------------------------------------------------------------------------------------------------------------------------------------
Concept Name: Geometric mean
Selection of Variable: Index of Industrial Production (IIP) data
Formula and calculation steps:
The Index of Industrial Production, as mentioned before, is relative to a base of 100 for the year
2004-05. Hence, in order to find the average index over the course of 5 years, the arithmetic mean is
not a suitable measure whereas the Geometric mean is.
In order to calculate the geometric mean, the index numbers are divided by 100 to factor to a base
of 1 instead of 100 resulting in the table below:
Year IIP index Scaled to 1
2011-12 170.2 1.702
2010-11 165.5 1.655
2009-10 152.9 1.529
2008-09 145.2 1.452
2007-08 141.7 1.417
Geometric mean is then calculated as the 5th root of the product of all index numbers across 5 years.
Hence, geometric mean = (1.702 * 1.655 * 1.529 * 1.452 * 1.417)^0.2 = 1.547
Findings and Interpretation of results:
Hence, from the geometric mean, we can conclude that the average index for industrial production
across 5 years starting from 2007-08 is 154.7 scaled to a base of 100.
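The fifth-root calculation can be verified with a short sketch using the five IIP values of Dataset 2.

```python
import math

iip = [170.2, 165.5, 152.9, 145.2, 141.7]  # IIP index, base 100 in 2004-05

scaled = [value / 100 for value in iip]    # rescale to a base of 1
# Geometric mean: n-th root of the product of the n scaled index numbers
geometric_mean = math.prod(scaled) ** (1 / len(scaled))

print(round(geometric_mean, 3), round(geometric_mean * 100, 1))
```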
Concept Name: Scatter Plot
Selection of Variable: Data for exports and imports
Formula and calculation steps:
The Scatter plot is a graph which describes the relationship between two variables. In this case, the
Scatter Plot is plotted with the Exports on the x-axis and the Imports on the y-axis to provide a
relationship between the exports and imports of India.
Findings and Interpretation of results:
From the Scatter Plot, it can be observed that as Exports increase, the Imports increase as well as
seen from the trend over the last 5 years from 2007-08.
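One way to put a number on the relationship seen in the scatter plot is the Pearson correlation coefficient. It is not part of the assignment itself and is offered here only as an illustrative sketch; the helper `pearson_r` is hypothetical, and five observations are too few for firm conclusions.

```python
import math

# Dataset 2, ordered from 2007-08 to 2011-12 (Rs crore)
exports_rs = [655863, 840754, 845534, 1142922, 1454066]
imports_rs = [1012312, 1374434, 1363736, 1683467, 2342217]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(exports_rs, imports_rs)
print(round(r, 2))
```

A value close to 1 would indicate that imports tend to rise with exports, even though individual years (such as 2008-09 to 2009-10) can move in the opposite direction.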