EXERCISES Exercise 1 (Chapter Two): 1.1: Household...


Page 1: EXERCISES Exercise 1 (Chapter Two): 1.1: Household ...siteresources.worldbank.org/.../Resources/exercises_spss.pdfUnlike STATA, SPSS involves more procedures in merging files. In the

EXERCISES

Exercise 1 (Chapter Two):

1.1: Household Characteristics

Open c:\intropov\data\Hh.sav that contains household level variables. There will be 496

households in the file. Note that each column corresponds to each variable, whereas each

row represents each observation or household. All the variables included in the data are

described in Appendix 1.

Among the variables is one called ‘weight’, the sampling weight given to each

household. From this weight, we can calculate the population weight by multiplying

weight by the size of the household (see also Chapter 2 of the ‘Poverty Manual’). Answer the

following questions:

(a) Generate the population weight called ‘pop’.

(b) Compare the total number of households and the sum of population.

(c) There are four regions in the survey: Dhaka, Chittagong, Khulna, and Rajshahi.

‘region’ is a string variable. Recode these regional names into a different variable called

‘reg1’, coding Dhaka, Chittagong, Khulna, and Rajshahi as 1, 2, 3, and 4, respectively.

To convert a string variable into a numeric one, type the following commands in the Syntax

Editor.

STRING reg1 (A8).
IF (region = "Dhaka") reg1 = "1".
IF (region = "Chittagong") reg1 = "2".
IF (region = "Khulna") reg1 = "3".
IF (region = "Rajshahi") reg1 = "4".
VARIABLE LABELS reg1 'four regions (numeric)'.
EXECUTE.


Having executed the commands above, go to the Variable View tab and then click on

Type to change ‘reg1’ to a numeric variable. The string variable ‘region’ has now been

recoded into the numeric variable ‘reg1’. Fill in the following table:

Household characteristics                          Region 1   Region 2   Region 3   Region 4   Total
Number of households in the population
Total population
Average distance of a household to paved road
Average distance of a household to nearest bank
% of households with electricity
% of households with toilet
% of population with electricity
% of population with toilet
Average household size

Can you conclude from the results that one region is more affluent than the others?

Explain why.
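The difference between the household rows and the population rows of the table can be checked by hand. The following is a minimal Python sketch, not part of the SPSS workflow; the three households and the helper `weighted_share` are invented for illustration, and only the rule pop = weight × household size comes from the exercise.

```python
# Toy household records; the values are invented for illustration only.
households = [
    {"weight": 100.0, "famsize": 4, "electricity": 1},
    {"weight": 200.0, "famsize": 6, "electricity": 0},
    {"weight": 100.0, "famsize": 10, "electricity": 1},
]

def weighted_share(rows, flag, weight_key):
    """Percentage of total weight carried by rows where `flag` equals 1."""
    total = sum(r[weight_key] for r in rows)
    return 100.0 * sum(r[weight_key] for r in rows if r[flag] == 1) / total

# Population weight = household weight x household size, as in question (a).
for r in households:
    r["pop"] = r["weight"] * r["famsize"]

n_households = sum(r["weight"] for r in households)  # households in the population
n_people = sum(r["pop"] for r in households)         # total population
pct_hh = weighted_share(households, "electricity", "weight")
pct_pop = weighted_share(households, "electricity", "pop")
avg_size = n_people / n_households
```

In this toy case half of the households have electricity, but slightly more than half of the people do, because one of the connected households is large.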

(d) Household characteristics also vary with the gender of household head. Compute the

means of the variables in the following table.

Household characteristics                                     Male-headed households   Female-headed households
Average household size
Average years of schooling of household head
Average household assets
Average household land holding
Number of households in the population
Sample size
Ratio of sample households to households in the population


Do you think the female-headed households are underrepresented in the sample? [Hint :

Compare the ratio of sample households to households in the population for male- and

female-headed households.] Can you still conclude whether the female-headed

households are more (or less) educated or less (or more) affluent than their counterparts?

Discuss.

1.2: Individual Characteristics

Open c:\intropov\data\ind.sav. This file has information on individual members of

households. Sort this file by ‘hcode’ and then merge it with the household level data

(c:\intropov\data\Hh.sav). Remember that the household level data is also sorted by

‘hcode’. As a result, you will get the new merged data.

SPSS requires more steps than STATA to merge files. In the merged file, you

will find many missing values represented by dots (.). When the household level data is

merged with the individual level data, there are fewer observations in the former data set

than in the latter. For instance, the age of the household head will be given for the first

member of the household, who is the head of the household. Dots will appear for the

other members within the same household. SPSS does not automatically fill the dots with

the same value as the first member of the household. Note that STATA does this

automatically. In SPSS, you need to fill the dots in two steps.

-The first step is to use Split File by ‘hcode’. This will group the data by

household code. Note that Split File On sign will appear on the right corner of

the Data Editor window.

-The second step is to use Replace Missing Values from the Transform menu.

Replace dots with the series mean of the first value of the variable in which you

are interested.

SORT CASES BY hcode .
SPLIT FILE LAYERED BY hcode .
RMV /famsize = SMEAN(famsize) /toilet = SMEAN(toilet) .
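The intended result of the two steps is that every member of a household carries the same household level values. A minimal Python sketch of that end state follows; it is an illustration outside SPSS, and the records and the names `hh_rows` and `individuals` are hypothetical stand-ins for Hh.sav and ind.sav.

```python
# Hypothetical miniature versions of Hh.sav and ind.sav, keyed by 'hcode'.
hh_rows = {
    1: {"famsize": 3, "toilet": 1},
    2: {"famsize": 2, "toilet": 0},
}
individuals = [
    {"hcode": 1, "age": 40}, {"hcode": 1, "age": 35}, {"hcode": 1, "age": 8},
    {"hcode": 2, "age": 60}, {"hcode": 2, "age": 55},
]

# One-to-many merge: every member receives the household's values, so no
# member is left with a missing household-level value.
merged = [{**person, **hh_rows[person["hcode"]]} for person in individuals]
```

If a check of the SPSS result is wanted, each household's members should show one and the same value of ‘famsize’ and ‘toilet’, as they do in `merged`.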

Having gone through these two steps, we will have a merged file without any missing

values or dots. Save this file as c:\intropov\data\hsurvey.sav. Complete the following

questions.

(a) Answer the following questions and discuss how the results vary across regions.

Regional variation                                       Region 1   Region 2   Region 3   Region 4   Total
Average years of education of the population
% of female population
% of working population (with positive working hours)
% of working population (working in farm)

Are the results very different between male and female?

Gender differences                    Male   Female   Total
Average years of schooling
Average age
Average working hours
Average working hours in farm
Average working hours in non-farm

[Note: The individual file also has a weight variable, which is in fact the household

weight, so that the total weight is equal to the total population. A detailed discussion of

‘weight’ is provided in Chapter 2 of the Poverty Manual.]


1.3: Expenditure

Open the data c:\intropov\data\consume.sav. It contains quantity and expenditure of each

food item at household level. Note that total expenditure (hhexp) is the sum of total food

(expfd) and non-food (expnfd) expenditures for a household. These are household

monthly expenditures. To get per capita expenditure per month, thus, ‘hhexp’, ‘expfd’,

and ‘expnfd’ have to be divided by the size of household. But the household size is not in

c:\intropov\data\consume.sav file but in another file called c:\intropov\data\Hh.sav. Thus,

the two files have to be merged together. Sort the ‘c:\intropov\data\consume.sav’ data by

‘hcode’ and then merge it with c:\intropov\data\Hh.sav.

(i) Compute ‘per capita food expenditure (pcfood)’, ‘per capita non-food

expenditure (pcnfood)’, and ‘per capita total expenditure (pcexp)’.

(ii) Repeat (i) with weight and without weight. How do they differ?

(Note: When weighted, population should be used as a weight, where

population weight = weight given to each household × size of household)*

(iii) Which region has the highest and the lowest per capita food and per capita

total expenditure ?

(iv) Does per capita total expenditure differ between male-headed and female-

headed households?

(v) Do you find any positive correlation between number of years of schooling of

household head and per capita total expenditure? Explain the answer by using

a graph.

(vi) Is per capita total expenditure declining with the size of household? Generate

the square of household size.

COMPUTE size_sq = famsize**2 .

EXECUTE .

* Two estimates can differ widely. The correct procedure is to use the population weight.
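To see why the weighted and unweighted estimates in (ii) can differ, consider the following Python sketch. It is an illustration with two invented households, not a substitute for the SPSS commands; the variable names follow the exercise.

```python
# Two invented households: monthly expenditure, size, and household weight.
rows = [
    {"hhexp": 4000.0, "famsize": 4, "weight": 100.0},
    {"hhexp": 3000.0, "famsize": 6, "weight": 300.0},
]
for r in rows:
    r["pcexp"] = r["hhexp"] / r["famsize"]    # per capita total expenditure
    r["pop"] = r["weight"] * r["famsize"]     # population weight

# Simple average over sample households.
unweighted = sum(r["pcexp"] for r in rows) / len(rows)
# Average over people in the population, using the population weight.
weighted = sum(r["pop"] * r["pcexp"] for r in rows) / sum(r["pop"] for r in rows)
```

Here the poorer household is both larger and more heavily weighted, so the population-weighted mean falls well below the simple average of the two households.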


Exercise 2 (Chapter Three):

The focus of this part of exercises is on constructing a poverty line. The poverty line

specifies the society’s minimum standard of living to which everyone should be entitled.

The poverty line, used as a yardstick to identify the poor, is thus the baseline for any poverty

analysis. Once the poverty line is determined, one can construct poverty profiles, the

distribution of poverty across sectors, geographical regions, and socioeconomic groups,

and a comparison of key characteristics of the poor with those of the non-poor.

This exercise discusses three methods that have been used to derive the poverty line in

Bangladesh: direct calorie intake, food energy intake, and cost of basic

needs. The three methods are practiced in turn. (Note: The food basket considered for

the healthy survival of a typical family in rural Bangladesh is the same as the one used in

the STATA exercise.)

2.1: Direct Calorie Intake

The file ‘c:\intropov\data\consume.sav’ provides information on quantities of 10 food

items consumed by the households included in the data. Note that ‘potato’ and ‘other

vegetables’ are lumped together into one item called ‘veg’. These quantities of 10 food

items can be converted into calories based on food calorie conversion factors that are

provided in the table below.

Note also that the quantities in the data are expressed in kg per week and thus have to be

converted into gram per day. Based on calories of food basket and quantities of food

consumed by each household in the survey, we can obtain the household’s calorie intake.

In order to get per capita calorie intake, we need to merge c:\intropov\data\consume.sav

with c:\intropov\data\Hh.sav which has the size of household. Generate a variable called

‘pccal’ indicating ‘per capita calorie intake’.


                    Per capita normative daily requirements
Food item           Calorie     Quantity (gm)     Average rural consumer price (taka/kg)
Rice                1386        397               15.19
Wheat               139         40                12.81
Pulse               153         40                30.84
Milk (cow)          39          58                15.9
Oil (mustard)       180         20                58.24
Meat (beef)         14          12                66.39
Fish                51          48                46.02
Potato              26          27                8.18
Other vegetable     26          150               38.3
Sugar               82          20                30.49
Fruits              6           20                28.86
Total               2112
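The unit conversion can be sketched as follows. This Python fragment is only an illustration: it assumes that the conversion factor for an item is the calories per gram implied by the table (Calorie divided by Quantity), and the household quantities are invented.

```python
# Calories per gram implied by two rows of the table (Calorie / Quantity).
# Treating this ratio as the conversion factor is an assumption of this sketch.
KCAL_PER_GRAM = {"rice": 1386 / 397, "wheat": 139 / 40}

def per_capita_calories(kg_per_week, famsize):
    """Daily per capita calories from household quantities in kg per week."""
    total = 0.0
    for item, kg in kg_per_week.items():
        grams_per_day = kg * 1000.0 / 7.0   # kg per week -> grams per day
        total += grams_per_day * KCAL_PER_GRAM[item]
    return total / famsize

# Hypothetical household of 5 consuming 14 kg of rice and 1.4 kg of wheat a week.
pccal = per_capita_calories({"rice": 14.0, "wheat": 1.4}, famsize=5)
```

With these invented quantities the household falls short of 2112 calories per person per day, so it would be counted as poor under the direct calorie intake method.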

Classify an individual as poor if his or her per capita calorie intake is less than the

nutritional requirement of 2112 calories per day, and as non-poor otherwise. Create a new

variable called ‘z_dci’ equal to 100 if the household is poor and zero otherwise. Save the

file as ‘c:\intropov\data\pline.sav’.

IF (pccal < 2112 ) z_dci = 100 .

IF (pccal >= 2112) z_dci = 0 .

EXECUTE.

Discuss the percentage of poor by regions. Which region is the poorest?

2.2: Food-Energy Intake Method

The food energy intake (FEI) method is simple. Since separate poverty lines are

estimated for each region, it takes into account the differences in regional costs of living

and food preference. A classic method of FEI has been proposed by Greer and Thorbecke


(1986). They provide a method that computes the food poverty line at which an

individual’s food energy intake is just sufficient to satisfy his or her calorie requirement

per day. Their proposed cost-of-calorie function is

Ln(E) = a + b C + u

where E is the per capita total expenditure, C is the number of calories obtained from the

food basket, and u is the error term. Once the equation is estimated, we are able to

construct a poverty line for each region. Since the calorie requirement is the same for all

regions at 2112, the poverty line is estimated separately for each region as

pline = exp( a + b × 2112)

where exp stands for the exponential function and a and b are the estimated coefficients of the log

equation above.
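The two formulas can be traced in a few lines of Python. This is a sketch under stated assumptions, not the SPSS procedure: the helper `weighted_ols` is hypothetical, and the data are invented to lie exactly on Ln(E) = 5 + 0.001 C so that the estimates can be verified by eye.

```python
import math

def weighted_ols(x, y, w):
    """Weighted least squares of y on x with an intercept; returns (a, b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return ybar - b * xbar, b

# Invented data lying exactly on ln(E) = 5 + 0.001 * C, so the fit is checkable.
pccal = [1500.0, 2000.0, 2500.0, 3000.0]
pop = [400.0, 300.0, 200.0, 100.0]
lpcexp = [5.0 + 0.001 * c for c in pccal]

a, b = weighted_ols(pccal, lpcexp, pop)
feipline = math.exp(a + b * 2112)   # poverty line at the calorie requirement
```

Because the invented points lie on a straight line, the weights do not change the estimates here; with real survey data they generally will.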

We now apply this methodology to the data. Open c:\intropov\data\pline.sav.

(i) Generate the logarithm of per capita total expenditure.

COMPUTE lpcexp = LN(pcexp) .
EXECUTE .

(ii) Regress the log of per capita total expenditure (‘lpcexp’) against per capita calorie

intake (‘pccal’). Use the weighted least squares method, where the weight is the

population.

REGRESSION
/REGWGT = pop
/STATISTICS COEFF OUTS R ANOVA
/DEPENDENT lpcexp
/METHOD=ENTER pccal .

(iii) What are the estimates of the slope and the constant term?


(iv) Create a variable called ‘feipline’, which is equal to the exponential of

(estimated constant + estimated slope multiplied by 2112). In the command below,

replace a and b with the estimated constant and slope.

COMPUTE feipline = EXP(a + b*2112) .
EXECUTE .

Other than this method, there is a simpler way of calculating a poverty line under the FEI

method. The steps are as follows:

(i) Obtain the weighted mean of per capita total expenditure (with weight = pop)

within the range where per capita calorie intake lies between its lower bound

(=2112*0.9) and its upper bound (=2112*1.1).

(ii) Name this weighted average of per capita total expenditure ‘feipline’.

(iii) Create a variable ‘feipoor’=100 if ‘feipline’ is greater than per capita total

expenditure and zero otherwise.

COMPUTE feipoor = 0 .
EXECUTE .
IF (feipline > pcexp) feipoor = 100 .
EXECUTE .

(iv) Compute the percentage of poor by regions. Which region is the poorest by

the FEI method?

2.3: Cost of Basic Needs

Rowntree’s (1901) approach to specifying poverty lines, based on the concept of ‘physical

efficiency’, measures poverty in terms of lack of command over basic consumption needs

essential for maintaining physical efficiency. This approach is known as the cost of basic

needs (CBN) method of constructing poverty lines. This method involves determining

the food and non-food costs of basic consumption baskets; adding up the two costs

gives the poverty line. We provide exercises on the food poverty line and the non-food poverty

line separately.

A: Food Poverty Line

First of all, we choose a basket of a reference group. Open C:\intropov\data\Hh.sav and

merge it with c:\intropov\data\consume.sav after sorting the files by ‘hcode’. Call this

merged file ‘c:\intropov\data\exp.sav’. Create per capita food, non-food, and total

expenditure. Generate the cumulative proportion of population (‘cpop’), whose last value

must be one (how to create ‘cpop’ is explained in Exercise 5). Create a reference

group, the bottom 20 percent in the distribution of per capita total expenditure. Type the

following command in the Syntax Editor:

COMPUTE ref = 0 .

IF (cpop<=0.2) ref =1 .

EXECUTE .

Having defined the reference group, merge c:\intropov\data\hsurvey.sav with

c:\intropov\data\vprice.sav. In this case, arrange variables (‘thana’ and ‘vill’) in their

ascending order before merging the two data sets. In the file c:\intropov\data\vprice.sav,

there is village level price information on all 11 food items. Under the method of cost of

basic needs, it is assumed that all individuals belonging to the bottom 20 percent

nationally enjoy the same standard of living but have different consumption patterns.

Given the basket of total expenditure and calories and average prices of each food item in

the basket for the reference group, compute the quantity of each food item in the basket.

Convert quantities into calories by using the calorie conversion factors. Make sure that the units

are converted correctly. Compute the cost per calorie by dividing the average of

per capita expenditure on the food basket by the average of per capita total calories of the

basket for the reference group. Create a variable called ‘costcal’ for the cost per

calorie. Calculate the food poverty line by multiplying ‘costcal’ by 2112. Note that

there is only one food poverty line in this case.

(i) What is the cost per calorie for the reference group?

(ii) Create 5 different quintile groups. Compare the cost per calorie for each of

these quintiles.

(iii) What is the monthly food poverty line ? Label this food poverty line as

‘fline’. Save this file as ‘c:\intropov\data\pline.sav’.
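The arithmetic behind ‘costcal’ and ‘fline’ can be sketched in Python as below. The two reference-group records are invented, and weighting the averages by population is an assumption of this sketch rather than something the exercise states.

```python
# Invented reference-group records: monthly per capita food spending (taka),
# daily per capita calories, and population weight.
ref_group = [
    {"pcfood": 300.0, "pccal": 1800.0, "pop": 500.0},
    {"pcfood": 360.0, "pccal": 2000.0, "pop": 500.0},
]

def wmean(rows, key):
    """Population-weighted mean of `key` (weighting is an assumption here)."""
    return sum(r["pop"] * r[key] for r in rows) / sum(r["pop"] for r in rows)

# Cost per calorie: average food spending divided by average calories.
costcal = wmean(ref_group, "pcfood") / wmean(ref_group, "pccal")
fline = costcal * 2112   # monthly food poverty line
```

Because food spending is monthly and calories are daily, ‘costcal’ here is the monthly cost of one daily calorie, which is why multiplying by 2112 gives a monthly line.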

B: Non-food Poverty Line

Parametric methods of setting non-food poverty lines can be readily estimated using a

food-share Engel curve of the regression form, which is illustrated in ‘Poverty Lines in

Theory and Practice’ by M. Ravallion (1999). In this exercise, we practice non-

parametric ways of defining non-food poverty lines which do not impose a functional

form on the Engel curve. We will illustrate constructing both the upper and the lower

poverty line.

(i) Open file c:\intropov\data\pline.sav.

(ii) Arrange per capita food expenditure in ascending order. This is an important

step to follow. Otherwise, you will get an incorrect result.

SORT CASES BY pcfood .

(iii) Compute the weighted average of per capita non-food expenditure for those

households whose per capita food expenditure lies within plus or minus 10

percent around the food poverty line.

COMPUTE filter_$ = (pcfood >= fline*0.9 & pcfood <= fline*1.1) .
VARIABLE LABELS filter_$ 'pcfood>=fline*0.9 & pcfood<=fline*1.1 (FILTER)' .
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected' .
FORMATS filter_$ (f1.0) .
FILTER BY filter_$ .
EXECUTE .
WEIGHT BY pop .
DESCRIPTIVES VARIABLES = pcnfood /STATISTICS=MEAN .

(iv) Call this mean of per capita non-food expenditure (which is weighted by the

population weight) as ‘nfline1’

(v) Compute the upper poverty line (‘upline’) by summing the food poverty line

and ‘nfline1’.
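Steps (iii) to (v) amount to a weighted mean taken inside a band around the food poverty line. The following Python sketch illustrates the idea with invented households; the helper `band_weighted_mean` and the value of `fline` are hypothetical.

```python
def band_weighted_mean(rows, by, target, centre, width=0.10):
    """Weighted mean of `target` for rows whose `by` lies within +/- width of centre."""
    band = [r for r in rows if centre * (1 - width) <= r[by] <= centre * (1 + width)]
    return sum(r["pop"] * r[target] for r in band) / sum(r["pop"] for r in band)

fline = 400.0   # hypothetical food poverty line
rows = [
    {"pcfood": 300.0, "pcnfood": 100.0, "pop": 400.0},   # below the band
    {"pcfood": 380.0, "pcnfood": 150.0, "pop": 600.0},   # inside the band
    {"pcfood": 420.0, "pcnfood": 250.0, "pop": 200.0},   # inside the band
    {"pcfood": 500.0, "pcnfood": 400.0, "pop": 300.0},   # above the band
]
nfline1 = band_weighted_mean(rows, "pcfood", "pcnfood", fline)
upline = fline + nfline1   # upper poverty line
```

The same helper would serve for the lower poverty line by selecting the band on per capita total expenditure instead of per capita food expenditure.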

We can apply the same approach to setting the lower poverty line described above, with

the difference that we compute the non-food expenditure of households in the

neighborhood of the point where per capita total expenditure is equal to the food poverty

line. Answer the following questions. (Note: In this case, per capita total expenditure has

to be sorted in its ascending order)

(i) What is the non-food poverty line obtained from this method?

(ii) Compute the lower poverty line (‘cbnpline’).

(iii) Compare the upper poverty line with the lower poverty line. Which one would

you use? Why?

(iv) Calculate the incidence of poverty using the upper poverty line and the lower

poverty line. How are they different? Discuss.

We have constructed poverty lines based on the three different methods described above.

Discuss how the percentage of people living below the poverty line differs across

the three methodologies. Also discuss which method you would adopt in setting an

official poverty line for your own country.


Exercise 3 (Chapter Four):

3.1: Getting Started

Open the data file c:\intropov\data\example.sav. The file contains the individual

consumption information of three countries. The figures are all monthly consumption. All

three countries have 10 citizens.

(i) Compare the means of consumption for three countries.

(ii) Suppose that a poverty line is set at 126 per month. Given this poverty line,

compute the following poverty estimates for each country.

a : the head-count index

b: the poverty gap index

c: the squared poverty gap index (or the severity of poverty index)

(iii) Repeat (ii) when the poverty line is 130. Which country has the highest

poverty? Why?

(iv) Why would you use the poverty gap index and its squared poverty gap index

rather than the head-count index even though the latter is extremely simple

and widely used?
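The three indices in (ii) are members of the Foster, Greer and Thorbecke family and differ only in the power applied to the proportional shortfall. A minimal Python sketch follows; the ten consumption figures are invented and are not those in example.sav.

```python
def fgt(values, z, alpha):
    """Foster-Greer-Thorbecke index P_alpha for poverty line z."""
    total = 0.0
    for y in values:
        if y < z:
            total += ((z - y) / z) ** alpha
    return total / len(values)

# Invented consumption figures for a ten-person country (not from example.sav).
country = [100, 110, 120, 125, 130, 140, 150, 160, 180, 200]
z = 126
headcount = fgt(country, z, 0)   # share of people below the line
gap = fgt(country, z, 1)         # poverty gap index
severity = fgt(country, z, 2)    # squared poverty gap index
```

Moving a poor person further below the line leaves `headcount` unchanged but raises `gap` and `severity`, which is one way to approach question (iv).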

3.2: Poverty Measures

Now we work with the data file c:\intropov\data\pline.sav. Make sure that you have

variables including per capita total expenditure (‘pcexp’) and poverty lines (‘fline’ and

‘cbnpline’) constructed by the cost of basic needs.

(i) Compute four poverty measures (the head-count ratio, the poverty gap

index, the squared poverty gap index or severity of poverty index, and the Watts

measure) for per capita total expenditure, using both the food poverty line and

the non-food poverty line derived from the cost of basic needs method.

The following program calculates the four poverty measures for the whole population.

IF (pcexp < pline) hcount = 100 .
IF (pcexp >= pline) hcount = 0 .
EXECUTE .
COMPUTE gap = hcount*(pline-pcexp)/pline .
COMPUTE severity = hcount*((pline-pcexp)/pline)**2 .
COMPUTE Watts = hcount*(LN(pline)-LN(pcexp)) .
EXECUTE .
WEIGHT BY pop .
DESCRIPTIVES VARIABLES=hcount gap severity Watts /STATISTICS=MEAN .

(ii) Estimate the incidence of poverty, the poverty gap index (PGR), and the

severity of poverty index (FGT) for specific subgroups using the food poverty

line and the total poverty line.

Subgroup                                       Headcount index   PGR   FGT
(a) 4 Regions
(b) Male-headed households
(c) Female-headed households
(d) Households with more than 5 members
(e) Households with 5 or fewer members

(iii) Poverty calculations are based on a sample of households rather than the

population. Thus, we must compute the standard error of each poverty measure.

When poverty measures have large standard errors, small changes in poverty

may be statistically insignificant and should be carefully interpreted. To

compute corrected standard errors, we suggest two methods.

One way to correct standard errors is simply to divide the standard deviation

of a poverty measure by the square root of the sample size (n = 496 in our

example). Go to Analyze and choose Descriptive Statistics. Alternatively, type

the following command in the Syntax Editor:

WEIGHT BY pop .
DESCRIPTIVES VARIABLES = HCOUNT PGR FGT /STATISTICS = MEAN STDDEV .

Having obtained the standard deviations of the poverty measures, simply divide them by

the square root of 496 to get their corrected standard errors.

The other method is to adjust the population weight, taking into account the

sample size. Type the following command in the Syntax Editor:

COMPUTE pop1 = pop * n / spop .
EXECUTE .
WEIGHT BY pop1 .
DESCRIPTIVES VARIABLES = HCOUNT PGR FGT /STATISTICS = MEAN SEMEAN .

where n, spop, and SEMEAN are the size of the sample, the sum of population, and the

standard error of the mean, respectively. In our example, n is equal to 496 and spop is

equal to 13280. Having adjusted the population weight to take into account the size of the

sample, simply compute the poverty measures and their standard errors,

weighted by the adjusted population. Having computed these, fill in the following

table.


                        Headcount ratio   Poverty gap ratio   FGT ratio
Region 1
  (Standard errors)
Region 2
  (Standard errors)
Region 3
  (Standard errors)
Region 4
  (Standard errors)
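The effect of rescaling the weights can be illustrated in Python. This sketch assumes that weights are treated as case-replication counts when the standard error is computed, which is an assumption made here and not a statement from the manual; the four households are invented.

```python
import math

def weighted_mean_se(x, w):
    """Mean and standard error when weights are treated as replication counts."""
    W = sum(w)
    mean = sum(wi * xi for wi, xi in zip(w, x)) / W
    var = sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, x)) / (W - 1)
    return mean, math.sqrt(var / W)

# Invented sample of n = 4 households with poverty indicator hcount (0 or 100).
hcount = [100.0, 0.0, 100.0, 0.0]
pop = [500.0, 1500.0, 1000.0, 1000.0]
n, spop = len(pop), sum(pop)

pop1 = [p * n / spop for p in pop]   # adjusted weights, summing to n
mean_raw, se_raw = weighted_mean_se(hcount, pop)
mean_adj, se_adj = weighted_mean_se(hcount, pop1)
```

The rescaling leaves the mean untouched but stops the standard error from being computed as if thousands of people, rather than n households, had been observed.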


Exercise 4 (Chapter Five)

4.1: Stochastic Dominance

There is no general consensus on the poverty line. Thus, it might be appropriate to measure

poverty using all possible poverty lines in a given range. Note that the choice of poverty

measures has a significant implication on the direction of changes in poverty. Hence, it

will be useful to find conditions under which all members of the class of poverty

measures give the same ranking. These issues are dealt with using the idea of stochastic

dominance.

The first-order stochastic dominance test compares the percentage of poor across

regions at every poverty line, that is, it compares the cumulative distribution functions of the regions. A simple way

of testing first-order dominance for each of the four regions is to plot the percentage of

poor on the vertical axis and the poverty lines on the horizontal axis.

Percentage of poor

Poverty line   Region 1   Region 2   Region 3   Region 4
3000           7.08       0.62       1.80       2.31
4000           12.04      9.58       11.78      13.93
5000           27.38      27.27      42.89      32.36
6000           45.25      49.84      58.44      48.52
7000           57.34      61.40      74.04      65.90

Poverty gap ratio

Poverty line   Region 1   Region 2   Region 3   Region 4
3000           0.40       2.39       5.90       11.17
4000           0.09       1.38       4.88       10.42
5000           0.29       1.77       6.68       13.77
6000           0.38       2.08       6.34       12.14
7000           0.31       1.96       6.02       11.95

We have calculated the head-count ratio and the poverty gap ratio for all four regions.

These poverty measures are estimated for various poverty lines as shown in the table


above. The first order dominance curve is the relationship between poverty line (x-axis)

and the corresponding head-count ratio (y-axis).

PLOT FORMAT = OVERLAY

/PLOT = hc1 hc2 hc3 hc4 with pline .

After formatting the graph by interpolation, the graph will look like this:

[Figure: head-count ratio (y-axis) against poverty line (x-axis), one curve per region]

(i) Does one distribution dominate the others?

(ii) Does any one of the lines cross another line?

(iii) Can you conclude from the graph that one region has a higher incidence of

poverty than another region? Is it true for other poverty measures?

If the two curves do not intersect at all, we do not need to test second- or third-order

dominance, because first-order dominance implies higher poverty on the basis of all

poverty measures including the head-count ratio. Otherwise, we move on to testing

second-order stochastic dominance. It is the relationship between the poverty line (x-axis)

and the corresponding poverty gap ratio (y-axis). This curve is also called the ‘poverty

deficit curve’. If the second-order dominance condition is satisfied (when the curves do

not intersect), we can say unambiguously that poverty measured by the entire class of Foster,

Greer and Thorbecke poverty measures, with the exception of the head-count ratio, will be

higher in one region than in another region. Given the table above, simply plot the

poverty gap ratio (y-axis) against the poverty lines (x-axis) for each of the regions.

Repeat questions (i), (ii) and (iii).

If the poverty deficit curves also intersect, then we move on to third-order stochastic

dominance, which is the relationship between the poverty line (x-axis) and the severity of

poverty (or squared poverty gap ratio).
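Whether two curves cross can also be checked numerically on the tabulated poverty lines. The Python sketch below uses the head-count figures from the table above; the helpers `dominates` and `curves_cross` are hypothetical, and a check on five points says nothing about what happens between or beyond them.

```python
# Head-count ratios copied from the table above, indexed by poverty line.
lines = [3000, 4000, 5000, 6000, 7000]
headcount = {
    "Region 1": [7.08, 12.04, 27.38, 45.25, 57.34],
    "Region 2": [0.62, 9.58, 27.27, 49.84, 61.40],
    "Region 3": [1.80, 11.78, 42.89, 58.44, 74.04],
    "Region 4": [2.31, 13.93, 32.36, 48.52, 65.90],
}

def dominates(a, b):
    """True if curve a is never above curve b and is strictly below it somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def curves_cross(a, b):
    """True if neither curve dominates the other on the tabulated lines."""
    return a != b and not dominates(a, b) and not dominates(b, a)
```

For example, `dominates(headcount["Region 2"], headcount["Region 3"])` holds on these five lines, whereas the curves for Region 1 and Region 2 cross.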


Exercise 5 (Chapter Six)

5.1: Lorenz curve

The Lorenz curve is a simple device that has been used widely to describe and analyze

data on income distribution. This curve has become important in recent times because it

provides a useful method of ranking income distribution from the welfare point of view.

The Lorenz curve is defined as the relationship between the proportion of people with

income less than or equal to a specified amount, and the proportion of total income

received by those people.

More generally, the Lorenz curve is represented by a function L(p), which is interpreted

as the fraction of total income received by the bottom pth fraction of people, when the

people are arranged in ascending order of their incomes. The curve is drawn in a unit

square. Thus, if p=0, L(p)=0 and if p=1, L(p)=1. The slope of the curve is positive and

increases monotonically: the curve is convex to the p axis. From this, it follows that L(p) ≤

p. The straight line represented by the equation L(p)=p is called the egalitarian line.

In constructing the Lorenz curve, we need to compute the cumulative proportions of per

capita total expenditure and population. The following commands are used, given

that you have computed the mean of per capita expenditure (‘mpcexp’) and the sum of

population (‘spop’).

COMPUTE cpcexp = pop*pcexp/mpcexp/spop .
COMPUTE cpop = pop/spop .
EXECUTE .
SORT CASES BY pcexp (A) .
CREATE
 /cpcexp = CSUM(cpcexp)
 /cpop = CSUM(cpop) .


Note that ‘cpcexp’, ‘cpop’ and ‘CSUM’ are the cumulative proportion of per capita

expenditure, the cumulative proportion of population and the cumulative sum,

respectively.

Check point: Are the last values of ‘cpcexp’ and ‘cpop’ equal to 1 ? If not, there is a

problem.

After having created ‘cpcexp’ and ‘cpop’, we need the following commands:

COMPUTE p = cpop - pop/spop/2 .
COMPUTE q = cpcexp - pcexp*pop/spop/mpcexp/2 .

Note that we have just made a continuity correction.

(i) Go to the ‘Graphs’ menu and select ‘Interactive’ and ‘Line’. Graph q on the

vertical axis and p on the horizontal axis. Does the Lorenz curve have a

positive slope? Is the curve convex to the p axis? Can you say that its slope is

increasing monotonically?

(ii) Construct the Lorenz curve for Dhaka and Chittagong. Does one curve lie

above the other? Which one is closer to the egalitarian line? Can you

conclude that the distribution of expenditure in the Dhaka region is more

equal than in the Chittagong region? Discuss.

(iii) Do these two Lorenz curves intersect each other? If the two curves intersect,

we cannot say that one region is more equal than the other. In this respect, the

Lorenz curve provides only a partial ranking of distributions.

5.2: Inequality Measures

We exercise four inequality measures in this section – including the Gini index,

generalized Gini index, Atkinson measure, and Theil’s index.


(i) Gini Index

Of all the inequality measures, the Gini index is used most widely. It became popular

because of its direct relationship with the Lorenz curve. The Gini index measures the

extent to which the Lorenz curve departs from the egalitarian line. It is defined as twice

the area between the Lorenz curve and the egalitarian line. This definition ensures that

the value of the Gini index lies between zero (for complete equality) and one (for

complete or most extreme inequality).

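Having created the cumulative proportion of per capita total expenditure and population,

the following commands generate the terms needed for the Gini index and quintile shares.

The Gini index is the mean of ‘gini’ weighted by ‘pop’, and each quintile share is the sum

of ‘share’ over the households in that quintile.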

COMPUTE gini = 100*(1-2*q) .
EXECUTE .

IF (p<=.20) quint = 1 .
IF (p>.20 and p<=.40) quint = 2 .
IF (p>.40 and p<=.60) quint = 3 .
IF (p>.60 and p<=.80) quint = 4 .
IF (p>.80) quint = 5 .
EXECUTE .

COMPUTE share = 100*pcexp*pop/(spop*mpcexp) .
EXECUTE .
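The arithmetic behind these commands can be sketched in Python as a cross-check. This is a minimal sketch under stated assumptions: the function name `gini_and_shares` and the example data are hypothetical, and the Gini index is taken as the population-weighted mean of 100*(1-2*q), which follows from approximating the area under the Lorenz curve by the midpoint values q.

```python
def gini_and_shares(pcexp, pop):
    """Return the Gini index (in percent) and the five quintile shares."""
    spop = sum(pop)
    total = sum(w * x for x, w in zip(pcexp, pop))   # equals spop * mpcexp
    cuts = (0.20, 0.40, 0.60, 0.80)                  # quintile boundaries for p
    gini = 0.0
    shares = [0.0] * 5
    cpop = cpcexp = 0.0
    for x, w in sorted(zip(pcexp, pop)):             # ascending expenditure
        dp = w / spop
        dq = w * x / total
        cpop += dp
        cpcexp += dq
        p = cpop - dp / 2                            # continuity-corrected p
        q = cpcexp - dq / 2                          # continuity-corrected q
        gini += dp * 100 * (1 - 2 * q)               # weighted mean of 'gini'
        quint = sum(1 for c in cuts if p > c)        # 0 for p <= .20, ..., 4 for p > .80
        shares[quint] += 100 * dq                    # sum of 'share' in the quintile
    return gini, shares


# Hypothetical example data (assumption, not taken from the survey).
gini, shares = gini_and_shares([100, 300, 200, 400], [2, 1, 1, 1])
```

To obtain a regional Gini index, the same function would be applied to the households of one region only.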

Compute the Gini index for the four regions. Which region is the most unequal among

the four regions?


(ii) Atkinson’s Measure

The inequality measure proposed by Atkinson is

$$A = 1 - \frac{x^*}{\mu}$$

which is in fact a measure of the loss of welfare caused by inequality in

society. x* is called the ‘equally distributed equivalent level of income’: the level of

per capita income that, if received by everyone, would make the total welfare exactly

equal to the total welfare generated by the actual income distribution.

With a homothetic utility function, Atkinson’s index is equal to

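$$A(\varepsilon) = 1 - \frac{1}{\mu}\left[\sum_{i=1}^{n} f_i\, x_i^{1-\varepsilon}\right]^{\frac{1}{1-\varepsilon}}, \qquad \varepsilon \neq 1$$

$$A(\varepsilon) = 1 - \frac{1}{\mu}\exp\left(\sum_{i=1}^{n} f_i \log_e(x_i)\right), \qquad \varepsilon = 1$$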

ε is a measure of degree of inequality-aversion.

The following program can be used to compute Atkinson’s measures for pcexp when ε

is 1, 1.5, and 2.

Compute lpcexp=Ln(pcexp).

Compute pcexp1=(pcexp)**(-0.5).

Compute pcexp2= (pcexp)**(-1).

Execute.

Calculate the weighted mean (weight = pop) of lpcexp, pcexp1 and pcexp2 using the

descriptive command:


WEIGHT BY pop .
DESCRIPTIVES VARIABLES = lpcexp pcexp1 pcexp2 / STATISTICS = MEAN .

The following calculations give Atkinson’s measures for each value of the inequality

aversion parameter.

A(ε = 1.5) = 1 - (mean of pcexp1)**(-2)/mpcexp.

A(ε = 2) = 1 - (mean of pcexp2)**(-1)/mpcexp.

A(ε = 1) = 1 - exp(mean of lpcexp)/mpcexp.
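These three calculations can be cross-checked outside SPSS with a short Python sketch. The function name `atkinson` is hypothetical; the expenditures and relative frequencies are the ones printed in the worked table of this exercise, and the sketch assumes the frequencies sum to one.

```python
import math


def atkinson(x, f, eps):
    """Atkinson index for incomes x with relative frequencies f.

    eps is the inequality aversion parameter; f is assumed to sum to 1.
    """
    mu = sum(fi * xi for xi, fi in zip(x, f))          # weighted mean income
    if eps == 1:
        # Equally distributed equivalent income is the geometric mean.
        ede = math.exp(sum(fi * math.log(xi) for xi, fi in zip(x, f)))
    else:
        # Weighted mean of x**(1-eps), raised to the power 1/(1-eps).
        m = sum(fi * xi ** (1 - eps) for xi, fi in zip(x, f))
        ede = m ** (1 / (1 - eps))
    return 1 - ede / mu


# Per capita expenditure and relative frequency as printed in the table.
expend = [1, 1000, 2000, 3000, 4000, 5000, 6000, 10000, 12000, 14000]
feq = [0.03, 0.03, 0.03, 0.07, 0.17, 0.2, 0.2, 0.17, 0.07, 0.03]

a_1 = atkinson(expend, feq, 1)
a_15 = atkinson(expend, feq, 1.5)
a_2 = atkinson(expend, feq, 2)
```

Because the printed frequencies are rounded, the results need not agree with the table to every decimal place; the value for ε = 1.5 comes out at roughly 0.91.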

Repeat the following example to compute the Atkinson’s inequality measure. Fill in the

following gaps.

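| Households | Per capita expenditure (exp) (1) | Relative frequency (feq) (2) | log(exp) (3) | (1) × (3) | (exp)^(-0.5) | (exp)^(-1) |
|---|---|---|---|---|---|---|
| 1 | 1 | 0.03 | | | | |
| 2 | 1000 | 0.03 | | | | |
| 3 | 2000 | 0.03 | | | | |
| 4 | 3000 | 0.07 | | | | |
| 5 | 4000 | 0.17 | 8.29 | 33176 | 0.016 | 0.000 |
| 6 | 5000 | 0.2 | | | | |
| 7 | 6000 | 0.2 | | | | |
| 8 | 10000 | 0.17 | | | | |
| 9 | 12000 | 0.07 | 9.39 | 112712 | 0.009 | 0.000 |
| 10 | 14000 | 0.03 | | | | |
| Weighted Mean | 6100 | | 8.36 | 54498 | 0.044 | 0.030 |

Atkinson index (ε = 1):

Atkinson index (ε = 1.5): 0.91

Atkinson index (ε = 2):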


Does Atkinson’s inequality measure increase as the inequality aversion parameter

increases?

(iv) Theil’s Index

Theil (1967) proposed two inequality measures that are based on the notion of entropy in

information theory. The two entropy measures are defined as

$$T_0 = \log(\mu) - \int_0^{\infty} \log(x)\, f(x)\, dx$$

$$T_1 = \frac{1}{\mu}\int_0^{\infty} x \log(x)\, f(x)\, dx - \log(\mu)$$

where µ is the mean income and f(x) is the density function.

Compute the two entropy measures from the table presented above. Is your result for T0

equal to 0.36? Is your result for T1 equal to 0.22?
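For grouped data such as the table, the integrals become weighted sums over the relative frequencies. The following Python sketch illustrates this discrete version; the function name `theil` and the three-group example are hypothetical, and the frequencies are assumed to sum to one.

```python
import math


def theil(x, f):
    """Return Theil's two entropy measures (T0, T1) for incomes x
    with relative frequencies f (assumed to sum to 1)."""
    mu = sum(fi * xi for xi, fi in zip(x, f))                   # mean income
    mean_log = sum(fi * math.log(xi) for xi, fi in zip(x, f))   # mean of log(x)
    mean_xlog = sum(fi * xi * math.log(xi) for xi, fi in zip(x, f))
    t0 = math.log(mu) - mean_log
    t1 = mean_xlog / mu - math.log(mu)
    return t0, t1


# Hypothetical three-group example (assumption, not the table data).
t0, t1 = theil([1000, 2000, 6000], [0.5, 0.3, 0.2])
```

Both measures are zero when everyone has the same income and positive otherwise, which is a quick sanity check on any hand calculation.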


Exercise 6 (Chapter Seven)

Poverty profiles describe the nature and extent of poverty. They provide a breakdown of

aggregate poverty according to various socioeconomic and demographic characteristics

of households. They show how poverty varies with subgroups of society, such as regions,

household size, age, etc. Poverty profiles can also show the impact of the sectoral and

regional patterns of economic changes on aggregate poverty.

6.1: Characteristics of the poor

Open the file c:\intropov\data\pline.sav. Note that people whose per capita expenditure is

less than the per capita monthly poverty line defined by the cost of basic needs are classified

as ‘poor’, and as ‘non-poor’ otherwise. Answer the following questions.

| | Poor | Non-poor |
|---|---|---|
| Average distance of a household to paved road | | |
| Average distance of a household to nearest bank | | |
| % households with electricity | | |
| % households with sanitary toilets | | |
| Average household assets | | |
| Average household land holding | | |
| Average household size | | |
| % households headed by female | | |
| % households headed by male | | |
| Average years of schooling of head | | |
| Average age of household head | | |
| Average total working hours in farming | | |
| Average total working hours in non-farming | | |

Calculate the head-count ratio, the poverty gap ratio, and the severity of poverty by all

household characteristics shown above. Construct graphs for each of these subgroups.

Discuss the poverty profile of rural Bangladesh based on your findings.
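The three poverty measures belong to one family, which can be sketched in Python as a cross-check on the SPSS results. This is a minimal sketch under stated assumptions: the function name `fgt`, the single scalar poverty line, and the example data are all hypothetical. The parameter alpha = 0 gives the head-count ratio, alpha = 1 the poverty gap ratio, and alpha = 2 the severity of poverty.

```python
def fgt(pcexp, pop, pline, alpha):
    """Population-weighted poverty measure for a given alpha.

    pcexp: per capita expenditure of each household
    pop:   population weight of each household
    pline: per capita poverty line (assumed here to be one number)
    """
    spop = sum(pop)
    total = 0.0
    for x, w in zip(pcexp, pop):
        if x < pline:                                  # household is poor
            total += w * ((pline - x) / pline) ** alpha
    return total / spop


# Hypothetical example data (assumption, not taken from pline.sav).
pcexp = [50, 100, 150, 200]
pop = [1, 1, 1, 1]
headcount = fgt(pcexp, pop, 120, 0)
gap = fgt(pcexp, pop, 120, 1)
severity = fgt(pcexp, pop, 120, 2)
```

For a subgroup, such as one region or female-headed households, the function would be applied to the households of that subgroup only.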


Exercise 7 (Chapter Eight)

Suppose that we want to explain per capita total expenditure in terms of socioeconomic

and demographic household characteristics in the data. We estimate a regression model

with the logarithm of per capita expenditure as the dependent variable. The explanatory

variables can include:

- gender of household head

- age of household head

- age-square of household head

- size of household

- size-square of household

- education and employment status of household head

- access to basic infrastructure such as distance to a paved road or to bank

- asset positions such as land holding

- region or urban/rural

- and other variables

We generate variables that do not exist in the data.

COMPUTE lpcexp = ln(pcexp) .
COMPUTE sq_age = age**2 .
COMPUTE sq_size = size**2 .
EXECUTE .

There are categorical variables in the regression model, such as gender and region. In this

case, we have to convert these categorical variables into dummy variables. For instance,

if the head of household is male, create a new variable equal to 1 and 0 otherwise. Note

that in the regression model, only one of two dummies has to be included.

IF (gender = 1) male = 1 .
IF (gender = 2) male = 0 .
EXECUTE .
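The reason only one of the two dummies is included can be seen in a small Python sketch. The gender codes below are hypothetical example values, assuming 1 denotes a male head and 2 a female head as in the command above.

```python
# Hypothetical gender codes for five households (assumption: 1 = male, 2 = female).
gender = [1, 2, 1, 1, 2]

# Dummy variables: 1 if the condition holds and 0 otherwise.
male = [1 if g == 1 else 0 for g in gender]
female = [1 if g == 2 else 0 for g in gender]

# The constant term of a regression is a column of ones.
constant = [1] * len(gender)

# The two dummies always add up to the constant, so including both
# together with the constant makes the regressors perfectly collinear.
collinear = all(m + f == c for m, f, c in zip(male, female, constant))
```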


Similarly, create regional dummy variables called ‘reg1’, ‘reg2’, and so on. There will be four dummies,

yet only three dummy variables should be included. To run a regression model, open the

Syntax Editor, write the following command, and then click on the Run button to execute

the analysis.

REGRESSION
 /DESCRIPTIVES
 /REGWGT=pop
 /STATISTICS COEFF OUTS R ANOVA
 /DEPENDENT lpcexp
 /METHOD=ENTER male age sq_age size sq_size edu road land reg1 reg2 reg3 (include more variables).

The Regression command is used to produce both simple and multiple regression equations and associated

statistics.

The /DESCRIPTIVES subcommand tells SPSS to produce descriptive statistics for all the variables

included in the analysis. These statistics include means, standard deviations, a correlation matrix, and so

on.

The /REGWGT subcommand indicates that the regression model is weighted (by population in our

example).

The /STATISTICS subcommand produces statistical results of the model – including R-square, adjusted

R-square, sum of squares, degrees of freedom, estimated coefficients, t and F statistics, etc.

The /DEPENDENT subcommand is used to identify the dependent variable in the regression model. In this

example, our dependent variable is the log of per capita expenditure.

The /METHOD subcommand must immediately follow the /DEPENDENT subcommand. This

subcommand is used to tell SPSS the way you want your independent variables to be added to the

regression equation. ENTER is the most direct method used to build a regression equation; it tells SPSS

simply to enter all the independent variables that you indicate for inclusion in the regression equation.

If the dummy variables for every group of a categorical variable are included, the model

cannot be estimated by the ordinary least squares (OLS) method. This is because the dummies

for a category sum to one, so there is perfect multicollinearity between the dummy variables

and the constant term in the regression model. To overcome this problem, a constraint on the

coefficients has to be imposed; omitting one dummy from each category, as in the command

above, is one such constraint. Another problem you have to deal with is heteroskedasticity.

Since each sampled household has a different population weight attached due to the sampling

design used, the OLS method will give inefficient coefficient estimates because of

heteroskedasticity. The coefficients in the model, however, can be estimated efficiently

using the weighted least squares (WLS) method, where population is used as the weight.

Thus, the model is estimated by the restricted weighted least squares method.
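To make the idea of weighting concrete, here is a minimal Python sketch of weighted least squares with a single explanatory variable. It is an illustration only, not what SPSS does internally: the function name `wls_line` and the data are hypothetical, and the full model with many regressors would need matrix algebra.

```python
def wls_line(x, y, w):
    """Fit y = intercept + slope * x by minimising the weighted
    sum of squared residuals, sum of w * (y - intercept - slope * x)**2."""
    sw = sum(w)
    xbar = sum(wi * xi for xi, wi in zip(x, w)) / sw     # weighted mean of x
    ybar = sum(wi * yi for yi, wi in zip(y, w)) / sw     # weighted mean of y
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for xi, yi, wi in zip(x, y, w))
    sxx = sum(wi * (xi - xbar) ** 2 for xi, wi in zip(x, w))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope


# Hypothetical data: household size, log per capita expenditure, population weight.
size = [2, 3, 4, 5, 6]
lpcexp = [8.9, 8.7, 8.6, 8.3, 8.2]
pop = [200, 450, 800, 500, 300]
intercept, slope = wls_line(size, lpcexp, pop)
```

Households with a larger population weight pull the fitted line more strongly toward their own observation.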

(i) What is the R-square of this model?

(ii) Do the signs of the coefficients match your hypotheses?

(iii) Are the coefficients significant at the 5% significance level?

It is a good idea to visually examine the scatter plot of the two variables when

interpreting a regression analysis. To do so, you may need to type:

PLOT / FORMAT REGRESSION / PLOT lpcexp WITH (explanatory variables) .

In addition, the scatter plot of the residuals against the fitted values will help you to see

visually whether the model is a good fit. To carry out this task, we need additional

subcommands in the regression.

REGRESSION
 /DESCRIPTIVES
 /STATISTICS COEFF OUTS R ANOVA
 /DEPENDENT lpcexp
 /METHOD=ENTER male age sq_age size sq_size edu road land reg1 reg2 reg3 (include more variables)
 /SCATTERPLOT = (*ZRESID, *ZPRED)
 /SAVE ZRESID ZPRED .

After having saved ZPRED and ZRESID, simply type the following command in the

Syntax Editor:

PLOT / FORMAT REGRESSION / PLOT ZRESID WITH ZPRED .
