
Estimation of Finite Population Mean Using Ranked Set Two-stage Sampling Design

By

U C Sud and Dwidesh Mishra

IASRI, New Delhi-110012

Introduction

• The method of Ranked Set Sampling (RSS) was first introduced by McIntyre (1952) as a cost-efficient alternative to simple random sampling for situations where outside information is available, allowing one to rank small sets of sampling units according to the character of interest without actually quantifying the units.

• McIntyre was concerned with estimating agricultural yields, where the ranking could be done on the basis of visual inspection.

• One of the strengths of the method, however, is that its implementation and performance require only that ranking be possible; they do not depend in any way on how the ranking is accomplished.

The Method of RSS

• A basic cycle of the method involves the random selection of $m^2$ units from the population. These units are randomly partitioned into $m$ subsets, each containing $m$ sampling units. The members of every subset are ranked according to the character of interest.

• Then the lowest ranked member is quantified from the first set, the second lowest ranked member is quantified from the second set, and so on until the highest ranked member of the last set is quantified.

• This yields $m$ quantifications from among the $m^2$ selected units. Since $m$ is usually kept small to facilitate the ranking, there may not be enough measurements for reasonable inference, so the basic cycle is repeated $r$ times to give $n = mr$ quantifications out of the $m^2 r$ selected units.
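The cycle described above can be sketched in a few lines of Python. This is only an illustration: the function name `ranked_set_sample` and the `rank_key` argument are hypothetical, and ranking defaults to the true values (perfect ranking), which the method itself does not require.

```python
import random

def ranked_set_sample(population, m, r, rank_key=None, rng=None):
    """Draw n = m*r quantified units by ranked set sampling.

    population : list of unit values (character of interest)
    m          : set size
    r          : number of cycles
    rank_key   : function used only for ranking; defaults to the value
                 itself, i.e. perfect ranking (an assumption of this sketch)
    """
    rng = rng or random.Random()
    rank_key = rank_key or (lambda v: v)
    # m*m units per cycle are selected without replacement; the random
    # order returned by sample() provides the random partition into sets.
    drawn = rng.sample(population, m * m * r)
    quantified = []
    for cycle in range(r):
        block = drawn[cycle * m * m:(cycle + 1) * m * m]
        for i in range(m):
            subset = sorted(block[i * m:(i + 1) * m], key=rank_key)
            quantified.append(subset[i])  # (i+1)-th ranked unit of set i+1
    return quantified

# Illustrative call: 12 quantifications out of 36 drawn units (m = 3, r = 4).
sample = ranked_set_sample(list(range(100)), m=3, r=4, rng=random.Random(0))
```

Passing a covariate-based `rank_key` would mimic ranking on an auxiliary variable rather than on the character of interest.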

• Let us take a set size $m = 3$ with $r = 4$ cycles.

• Then the sampling scheme can be shown by the following diagram.

• Here each row indicates a judgement-ordered sample (set) within a cycle. Units marked (X) are quantified. Out of 36 units drawn, 12 units have been quantified.

Cycle   Rank 1   Rank 2   Rank 3
1       (X)      -        -
        -        (X)      -
        -        -        (X)
2       (X)      -        -
        -        (X)      -
        -        -        (X)
3       (X)      -        -
        -        (X)      -
        -        -        (X)
4       (X)      -        -
        -        (X)      -
        -        -        (X)

Contd.

• Let $X_{11}, X_{12}, \dots, X_{1m}, X_{21}, X_{22}, \dots, X_{2m}, \dots, X_{m1}, \dots, X_{mm}$ be independent random variables all having the same cumulative distribution function $F(x)$.

• Also let $X_{i(1)}, X_{i(2)}, \dots, X_{i(m)}$ denote the corresponding order statistics of $X_{i1}, X_{i2}, \dots, X_{im}$ $(i = 1, 2, \dots, m)$.

• Then $X_{1(1)}, X_{2(2)}, \dots, X_{m(m)}$ is the ranked set sample (considering one cycle only), since $X_{i(i)}$ is the $i$-th order statistic in the $i$-th sample.

• The values $X_{ij}$ for the randomly drawn units can be arranged as in the following diagram:

$$
\begin{array}{c|cccc}
\text{Set} & & & & \\ \hline
1 & X_{11} & X_{12} & \cdots & X_{1m} \\
2 & X_{21} & X_{22} & \cdots & X_{2m} \\
\vdots & \vdots & \vdots & & \vdots \\
m & X_{m1} & X_{m2} & \cdots & X_{mm}
\end{array}
$$

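Contd.

• After ranking the units appear as:

$$
\begin{array}{c|cccc}
1 & X_{1(1)} & X_{1(2)} & \cdots & X_{1(m)} \\
2 & X_{2(1)} & X_{2(2)} & \cdots & X_{2(m)} \\
\vdots & \vdots & \vdots & & \vdots \\
m & X_{m(1)} & X_{m(2)} & \cdots & X_{m(m)}
\end{array}
$$

The quantified units appear as

$$
\begin{array}{c|cccc}
1 & X_{1(1)} & * & \cdots & * \\
2 & * & X_{2(2)} & \cdots & * \\
\vdots & \vdots & \vdots & & \vdots \\
m & * & * & \cdots & X_{m(m)}
\end{array}
$$

The $i$-th quantified unit $X_{i(i)}$ is also written $X_{(i:m)}$.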

Examples

• RSS is very useful in environmental and ecological sampling where exact measurement (or quantification) of a selected unit is either difficult or expensive in terms of time, money or labor, but where ranking of a small set of selected units according to the characteristic of interest can be done with reasonable success on the basis of visual inspection or another rough method not requiring actual measurement.

• Thus if the interest lies in estimating the mean height of the sampled trees, then measurement of the height of the trees could pose a problem, but it would be relatively easy to rank small sets of trees on the basis of visual inspection.

• In situations where visual inspection is not directly available, ranking can be done on the basis of a covariate that is more accessible and also correlated with the character of interest.

• Thus for estimating the volume of trees one can carry out ranking on the basis of the diameter of the trees.

Theory of RSS

• Performance of the RSS estimator is generally benchmarked against that of the simple random sampling (SRS) estimator with the same number of quantifications. For this purpose, one may employ either the relative precision,

$$RP = \frac{\operatorname{var}(\hat{\mu}_{SRS})}{\operatorname{var}(\hat{\mu}_{RSS})},$$

• or the relative savings,

$$RS = 1 - \frac{1}{RP}.$$

• There was little follow-up on McIntyre's (1952) proposal until the late 1960s, when Halls and Dell (1966) published a field evaluation and Takahasi and Wakimoto (1968) developed the statistical theory for the RSS method. When sampling is from a continuous population and the ranking is perfect, Takahasi and Wakimoto proved that $\hat{\mu}_{RSS}$ is unbiased for $\mu$ and is at least as efficient as $\hat{\mu}_{SRS}$.

Contd.

• They also obtained the variance of the RSS estimator as

$$\operatorname{var}(\hat{\mu}_{RSS}) = \frac{1}{mr}\left[\sigma^2 - \frac{1}{m}\sum_{i=1}^{m}\left(\mu_{i:m} - \mu\right)^2\right],$$

• where $\sigma^2$ is the population variance and $\mu_{i:m}$ is the expected value of the $i$-th out of $m$ order statistic from the population. They also established the bound

$$1 \le RP \le \frac{m+1}{2}$$

• or

$$0 \le RS \le \frac{m-1}{m+1}.$$

• The upper bound indicates that ranked set sampling can result in very substantial savings when compared with simple random sampling. Specifically, the method can result in savings in the number of quantifications by as much as 33, 50, 60, 67 percent when $m = 2, 3, 4, 5$ respectively.
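The quoted savings follow directly from the upper bounds $RP \le (m+1)/2$ and $RS = 1 - 1/RP$. A small sketch of that arithmetic (the function names are illustrative, not from the source):

```python
from fractions import Fraction

def rp_upper_bound(m):
    """Upper bound on relative precision for set size m: (m + 1) / 2."""
    return Fraction(m + 1, 2)

def rs_upper_bound(m):
    """Upper bound on relative savings: 1 - 1/RP = (m - 1) / (m + 1)."""
    return 1 - 1 / rp_upper_bound(m)

for m in (2, 3, 4, 5):
    percent = round(float(100 * rs_upper_bound(m)))
    print(f"m={m}: RP <= {rp_upper_bound(m)}, RS <= {percent}%")
```

Exact fractions are used so that the rounding to whole percentages is not affected by floating-point error.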

Review

• Stokes (1977) considered the use of a concomitant variable at the estimation stage in the context of RSS.

• Stokes (1980) dealt with the problem of estimation of the population variance.

• Dell and Clutter (1972) considered the problem of ranking errors.

• Yu and Lam (1997) developed a regression estimator for RSS.

RSS in the Context of Finite Population Sampling

• Early developments in RSS were concerned with sampling from an infinite population.

• Patil et al. (1994) were the first to consider the situation of sampling from a finite population.

• Explicit expressions were obtained for the variance of the RSS estimator and for its precision relative to that of simple random sampling without replacement.

• Krishna (2002) extended the theory of RSS to the case of sampling from a finite population by utilising a Horvitz-Thompson estimator for the estimation of the finite population mean.

• Calculation of the inclusion probabilities $\pi_i$, $\pi_{ij}$ is tedious.

RSS for Two-stage Sampling Designs

However, the contributions made by Patil et al. (1994) and Krishna (2002) were limited to the case of uni-stage sampling designs.

Three different cases have been studied. In the first case, SRS is used at the first stage of sampling and RSS at the second stage. In the second case, RSS is used at the first stage and SRS at the second stage. In the third case, RSS is used at both stages of sampling. In each case, the efficiency of the RSS-based estimator is compared, with the help of real data, with that of the SRS-based estimator obtained when SRS is used at both stages of sampling.

Let there be a finite population of $N$ primary stage units (psu's), each of size $M$. Let $x_{ab}$ be the value of the $b$-th secondary stage unit (ssu) of the $a$-th psu.

Contd.

$$\bar{X}_a = \frac{1}{M}\sum_{b=1}^{M} x_{ab}$$

= mean per ssu in the $a$-th psu

$$\bar{X} = \frac{1}{NM}\sum_{a=1}^{N}\sum_{b=1}^{M} x_{ab}$$

= population mean

Case 1: SRS at first stage and RSS at second stage

Let a sample of $n$ psu's be drawn from the $N$ psu's by SRSWOR. Also, let a set of size $m$ be selected at random and without replacement from the $M$ ssu's of a selected psu using RSS.

Without any loss of generality we assume that

$$x_{a1} \le x_{a2} \le \dots \le x_{aM}, \qquad a = 1, 2, \dots, N.$$

• Define the event

$$\{k \to s\}$$

such that the $k$-th ranked unit in the subset is the $s$-th ranked unit in the population of ssu's.

Also write

$$A^{s}_{ak} = \Pr\{k \to s\}$$

and let $A_{ak}$ denote the $M$-dimensional column vector having $A^{s}_{ak}$ as its $s$-th component.

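Contd.

$$A'_{ak} = \left(A^{1}_{ak},\; A^{2}_{ak},\; \dots,\; A^{s}_{ak},\; \dots,\; A^{M}_{ak}\right)$$

It may be noted that $A^{s}_{ak}$ is given by

$$A^{s}_{ak} = \binom{s-1}{k-1}\binom{M-s}{m-k} \bigg/ \binom{M}{m}, \qquad s = 1, 2, \dots, M.$$

If $x_{a(k:m)}$ is the quantification of the $k$-th ranked unit from the set, then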

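Contd.

$$
\begin{aligned}
E[x_{a(k:m)}] = \bar{X}_{a(k:m)} &= \sum_{s=1}^{M} x_{as}\Pr[x_{a(k:m)} = x_{as}] \\
&= \sum_{s=1}^{M} x_{as}\Pr(\{k \to s\}) \\
&= \sum_{s=1}^{M} x_{as}A^{s}_{ak} \\
&= A'_{ak}x_a,
\end{aligned}
$$

where $x_a = (x_{a1}, x_{a2}, \dots, x_{aM})'$, and

$$V[x_{a(k:m)}] = \sigma^2_{a(k:m)} = A'_{ak}x_a^2 - \left(A'_{ak}x_a\right)^2.$$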
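The rank probabilities $A^{s}_{ak}$ and the resulting mean and variance of the $k$-th ranked unit can be evaluated numerically. The sketch below assumes the psu values are already sorted in increasing order; the function names and the four plot values are hypothetical and chosen only for illustration.

```python
from math import comb

def rank_probabilities(M, m, k):
    """Return [A_ak^1, ..., A_ak^M]: probability that the k-th ranked unit
    of a set of size m (SRSWOR from M ordered units) has population rank s."""
    total = comb(M, m)
    return [comb(s - 1, k - 1) * comb(M - s, m - k) / total
            for s in range(1, M + 1)]

def order_stat_moments(x_sorted, m, k):
    """Mean A'x and variance A'x^2 - (A'x)^2 of the k-th ranked unit."""
    A = rank_probabilities(len(x_sorted), m, k)
    mean = sum(a * x for a, x in zip(A, x_sorted))
    second_moment = sum(a * x * x for a, x in zip(A, x_sorted))
    return mean, second_moment - mean ** 2

# Hypothetical psu with M = 4 ssu values, set size m = 2.
x_a = [1.0, 2.0, 4.0, 7.0]
for k in (1, 2):
    print(k, order_stat_moments(x_a, m=2, k=k))
```

For each fixed $s$, the probabilities summed over $k = 1, \dots, m$ give $m/M$, the chance that the $s$-th unit falls in the set; this identity is what makes the RSS estimator unbiased.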

Contd.

$x_a^2$ is the component-wise square of $x_a$.

Next, we study the joint distribution of the order statistics from two disjoint sets. Let two disjoint sets, each of size $m$, be drawn without replacement from the $M$ ssu's. Write $\{k \to s,\; j \to t\}$ for the event that the $k$-th ranked unit from set 1 has rank $s$ and the $j$-th ranked unit from set 2 has rank $t$ in the population of size $M$.

We define

$$B^{st}_{akj} = \Pr\{k \to s,\; j \to t\}$$

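Contd.

Following Patil et al. (1994), it may be seen that, for $s < t$,

$$B^{st}_{akj} = \sum_{\lambda=0}^{m-k} \binom{s-1}{k-1}\binom{t-s-1}{\lambda}\binom{M-t}{m-k-\lambda}\binom{t-k-\lambda-1}{j-1}\binom{M-t-m+k+\lambda}{m-j} \bigg/ \binom{M}{m,\,m,\,M-2m},$$

where $\binom{M}{m,\,m,\,M-2m} = \dfrac{M!}{m!\,m!\,(M-2m)!}$.

Let $B_{akj}$ be the $M \times M$ matrix with $B^{st}_{akj}$ as its $(s,t)$-th component.

Notice that $B_{akj} = B'_{ajk}$, since $B^{st}_{akj} = B^{ts}_{ajk}$. Let $x_{a(k:m)1}$ and $x_{a(j:m)2}$ be the quantifications of the $k$-th and $j$-th ranked units from set 1 and set 2, respectively.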

Contd.

The covariance between $x_{a(k:m)1}$ and $x_{a(j:m)2}$ is given by

$$\operatorname{Cov}[x_{a(k:m)1},\, x_{a(j:m)2}] = C_{akj} = x'_a\left(B_{akj} - A_{ak}A'_{aj}\right)x_a$$
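The joint probabilities and the covariance can be checked numerically on a small psu. The sketch below evaluates $\Pr\{k \to s,\; j \to t\}$ by direct counting of the units placed in each set, handles $s > t$ through the symmetry $B^{st}_{akj} = B^{ts}_{ajk}$, and compares the result with brute-force enumeration of all ordered pairs of disjoint sets. All function names are illustrative, and the counting argument is a reconstruction rather than a quotation of Patil et al.

```python
from itertools import combinations
from math import comb

def rank_prob(M, m, k, s):
    """A_ak^s = Pr{k -> s} for a set of size m from M ranked units."""
    return comb(s - 1, k - 1) * comb(M - s, m - k) / comb(M, m)

def joint_rank_prob(M, m, k, j, s, t):
    """B_akj^{st} = Pr{k -> s, j -> t} for two disjoint sets of size m."""
    if s == t:
        return 0.0                      # disjoint sets cannot share a unit
    if s > t:                           # symmetry: B_akj^{st} = B_ajk^{ts}
        return joint_rank_prob(M, m, j, k, t, s)
    count = 0
    for lam in range(m - k + 1):        # lam = set-1 units ranked between s and t
        free_below_t = t - k - lam - 1  # units below t still available to set 2
        free_above_t = M - t - m + k + lam
        if free_below_t < 0 or free_above_t < 0:
            continue
        count += (comb(s - 1, k - 1) * comb(t - s - 1, lam)
                  * comb(M - t, m - k - lam)
                  * comb(free_below_t, j - 1) * comb(free_above_t, m - j))
    return count / (comb(M, m) * comb(M - m, m))

def joint_rank_prob_brute(M, m, k, j, s, t):
    """Same probability by enumerating ordered pairs of disjoint sets."""
    units = range(1, M + 1)
    hits = total = 0
    for set1 in combinations(units, m):
        rest = [u for u in units if u not in set1]
        for set2 in combinations(rest, m):
            total += 1
            hits += (set1[k - 1] == s and set2[j - 1] == t)
    return hits / total

def covariance(x_sorted, m, k, j):
    """C_akj = x'(B_akj - A_ak A_aj')x for an increasingly ordered psu."""
    M = len(x_sorted)
    return sum(x_sorted[s - 1] * x_sorted[t - 1]
               * (joint_rank_prob(M, m, k, j, s, t)
                  - rank_prob(M, m, k, s) * rank_prob(M, m, j, t))
               for s in range(1, M + 1) for t in range(1, M + 1))
```

The enumeration is only practical for small $M$, but it gives an independent check on the closed form before it is used on larger psu's.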

Contd.

Let $mr$ sets, each of size $m$, be selected randomly and without replacement from the $a$-th psu for RSS. Let the lowest ranked unit be quantified in each of the first $r$ sets:

$$X_{a(1:m)1},\; X_{a(1:m)2},\; \dots,\; X_{a(1:m)r}$$

In each of the next $r$ sets, the second ranked unit is quantified to give:

$$X_{a(2:m)1},\; X_{a(2:m)2},\; \dots,\; X_{a(2:m)r}$$

This process continues until the highest ranked unit is quantified in each of the last $r$ sets:

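Contd.

$$X_{a(m:m)1},\; X_{a(m:m)2},\; \dots,\; X_{a(m:m)r}$$

Theorem 1. The estimator

$$\hat{\bar{X}}_{RSS1} = \frac{1}{nrm}\sum_{a=1}^{n}\sum_{o=1}^{r}\sum_{k=1}^{m} x_{a(k:m)o}$$

is unbiased, and the variance of $\hat{\bar{X}}_{RSS1}$ is given by

$$\operatorname{Var}(\hat{\bar{X}}_{RSS1}) = \left(\frac{1}{n} - \frac{1}{N}\right)S_x^2 + \frac{1}{nrmN}\sum_{a=1}^{N}\left[\left\{\frac{M-1-mr}{M-1}\right\}\sigma^2_{ax} - \bar{\gamma}_a\right],$$

where

$$S_x^2 = \frac{1}{N-1}\sum_{a=1}^{N}\left(\bar{X}_a - \bar{X}\right)^2.$$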

Proof of the results

Here

$$\bar{\gamma}_a = (x_a - \bar{X}_a)'\,\Gamma_a\,(x_a - \bar{X}_a), \qquad \Gamma_a = \frac{m!\,(m-1)!}{M(M-1)\cdots(M-2m+1)}\,\Gamma^{*}_a.$$

The matrix $\Gamma_a$ is symmetric with zeroes on the diagonal; it is calculated from

$$\Gamma^{*}_a = \binom{M}{m,\,m,\,M-2m}\sum_{k=1}^{m} B_{akk}.$$

A program has been written in Turbo C to calculate $\Gamma_a$.

Proof: To prove that the estimator $\hat{\bar{X}}_{RSS1}$ is unbiased, we proceed as follows:

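Contd.

$$
\begin{aligned}
E(\hat{\bar{X}}_{RSS1}) &= E_1E_2\left[\frac{1}{nrm}\sum_{a=1}^{n}\sum_{o=1}^{r}\sum_{k=1}^{m} x_{a(k:m)o}\right] \\
&= E_1\frac{1}{n}\sum_{a=1}^{n} E_2\left[\frac{1}{rm}\sum_{o=1}^{r}\sum_{k=1}^{m} x_{a(k:m)o}\right] \\
&= E_1\frac{1}{n}\sum_{a=1}^{n}\frac{1}{rm}\sum_{o=1}^{r}\sum_{k=1}^{m} E_2\left[x_{a(k:m)o}\right] \\
&= E_1\frac{1}{n}\sum_{a=1}^{n}\frac{1}{rm}\sum_{o=1}^{r}\sum_{k=1}^{m} A'_{ak}x_a \\
&= E_1\frac{1}{n}\sum_{a=1}^{n}\frac{1}{rm}\sum_{o=1}^{r}\sum_{k=1}^{m}\sum_{s=1}^{M} A^{s}_{ak}x_{as} \\
&= E_1\frac{1}{n}\sum_{a=1}^{n}\sum_{s=1}^{M}\frac{1}{m}\left(\sum_{k=1}^{m} A^{s}_{ak}\right)x_{as}
\end{aligned}
$$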

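Contd.

Since $\sum_{k=1}^{m} A^{s}_{ak} = m/M$ for every $s$ (each ssu has probability $m/M$ of falling in a set of size $m$),

$$
\begin{aligned}
E(\hat{\bar{X}}_{RSS1}) &= E_1\frac{1}{n}\sum_{a=1}^{n}\frac{1}{M}\sum_{s=1}^{M} x_{as} \\
&= E_1\frac{1}{n}\sum_{a=1}^{n}\bar{X}_a \\
&= \frac{1}{N}\sum_{a=1}^{N}\bar{X}_a = \bar{X}.
\end{aligned}
$$

For the variance,

$$V(\hat{\bar{X}}_{RSS1}) = E_1V_2(\hat{\bar{X}}_{RSS1}) + V_1E_2(\hat{\bar{X}}_{RSS1}),$$

$$
\begin{aligned}
V_2(\hat{\bar{X}}_{RSS1}) &= V_2\left[\frac{1}{nrm}\sum_{a=1}^{n}\sum_{o=1}^{r}\sum_{k=1}^{m} x_{a(k:m)o}\right] \\
&= \frac{1}{n^2}\sum_{a=1}^{n}\left[\frac{1}{m^2r^2}\left\{r\sum_{k=1}^{m}\sigma^2_{a(k:m)} + r^2\sum_{k=1}^{m}\sum_{j=1}^{m} C_{akj} - r\sum_{k=1}^{m} C_{akk}\right\}\right] \\
&= \frac{1}{n^2}\sum_{a=1}^{n}\left[\frac{1}{rm^2}\left\{\frac{m(M-1-mr)}{M-1}\sigma^2_{ax} - \sum_{k=1}^{m}\left(\bar{X}_{a(k:m)} - \bar{X}_a\right)^2 - \sum_{k=1}^{m} C_{akk}\right\}\right].
\end{aligned}
$$

After centering,

$$V_2(\hat{\bar{X}}_{RSS1}) = \frac{1}{n^2}\sum_{a=1}^{n}\left[\frac{1}{rm}\left\{\frac{M-1-mr}{M-1}\sigma^2_{ax} - \frac{1}{m}(x_a - \bar{X}_a)'\left(\sum_{k=1}^{m} B_{akk}\right)(x_a - \bar{X}_a)\right\}\right].$$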

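Contd.

$$V_2(\hat{\bar{X}}_{RSS1}) = \frac{1}{n^2}\sum_{a=1}^{n}\left[\frac{1}{rm}\left\{\frac{M-1-mr}{M-1}\sigma^2_{ax} - \bar{\gamma}_a\right\}\right]$$

$$E_1V_2(\hat{\bar{X}}_{RSS1}) = \frac{1}{nN}\sum_{a=1}^{N}\left[\frac{1}{rm}\left\{\frac{M-1-mr}{M-1}\sigma^2_{ax} - \bar{\gamma}_a\right\}\right]$$

$$V_1E_2(\hat{\bar{X}}_{RSS1}) = V_1\left[\frac{1}{n}\sum_{a=1}^{n} E_2\left\{\frac{1}{rm}\sum_{o=1}^{r}\sum_{k=1}^{m} x_{a(k:m)o}\right\}\right] = \left(\frac{1}{n} - \frac{1}{N}\right)S_x^2$$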
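The variance in Theorem 1 can be evaluated for a small population as follows. This is a sketch under stated assumptions: $\sigma^2_{ax}$ is taken as the within-psu variance with divisor $M$, $\bar{\gamma}_a$ is computed as $\frac{1}{m}(x_a - \bar{X}_a)'\big(\sum_k B_{akk}\big)(x_a - \bar{X}_a)$ by enumerating pairs of disjoint sets instead of using the closed form, and the function names and data are hypothetical.

```python
from itertools import combinations

def gamma_bar(x, m):
    """gamma_bar_a = (1/m) (x - mean)' [sum_k B_akk] (x - mean).

    The B_akk probabilities are obtained implicitly by averaging over all
    ordered pairs of disjoint sets of size m (feasible only for small M)."""
    xs = sorted(x)
    M = len(xs)
    mean = sum(xs) / M
    d = [v - mean for v in xs]
    pairs = 0
    acc = 0.0
    for set1 in combinations(range(M), m):
        rest = [u for u in range(M) if u not in set1]
        for set2 in combinations(rest, m):
            pairs += 1
            # units holding the same rank k in the two sets
            acc += sum(d[set1[k]] * d[set2[k]] for k in range(m))
    return acc / (pairs * m)

def var_rss1(psus, n, m, r):
    """Variance of the Case 1 estimator (SRS of n psu's, RSS within psu)."""
    N = len(psus)
    M = len(psus[0])
    means = [sum(p) / M for p in psus]
    grand_mean = sum(means) / N
    S2_x = sum((v - grand_mean) ** 2 for v in means) / (N - 1)
    within = 0.0
    for p, mu in zip(psus, means):
        sigma2 = sum((v - mu) ** 2 for v in p) / M   # divisor M (assumption)
        within += (M - 1 - m * r) / (M - 1) * sigma2 - gamma_bar(p, m)
    return (1 / n - 1 / N) * S2_x + within / (n * r * m * N)

# Hypothetical population: N = 3 psu's with M = 4 ssu's each, m = 2, r = 1.
population = [[1.0, 2.0, 4.0, 7.0], [3.0, 3.5, 5.0, 6.0], [2.0, 2.5, 2.8, 9.0]]
print(var_rss1(population, n=2, m=2, r=1))
```

For $M = 4$, $m = 2$, $r = 1$ the within-psu term can be confirmed by listing the six ordered pairs of disjoint sets by hand, which is a useful sanity check on any implementation of $\Gamma_a$.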

Case 2: RSS at first stage and SRS at second stage

Assume that a sample of size $m$ is selected by SRSWOR from the $a$-th psu, $a = 1, 2, \dots, N$. Further, we assume that a set of size $n$ is selected from the $N$ psu's by RSS. Also, as in Case 1, we assume that the psu's are arranged in increasing order. Define the event

$$\{a \to s\}$$

such that the $a$-th ranked unit in the subset is the $s$-th ranked unit in the population of psu's.

Define $A^{s}_{a} = \Pr\{a \to s\}$ and let

$$A'_{a} = \left(A^{1}_{a},\; A^{2}_{a},\; \dots,\; A^{s}_{a},\; \dots,\; A^{N}_{a}\right)$$

be the $1 \times N$ row vector having

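Contd.

$A^{s}_{a}$ as its $s$-th component, where

$$A^{s}_{a} = \binom{s-1}{a-1}\binom{N-s}{n-a} \bigg/ \binom{N}{n}, \qquad s = 1, 2, \dots, N;\; a = 1, 2, \dots, n.$$

$$\bar{x}_{(a:n)} = \frac{1}{m}\sum_{b=1}^{m} x_{(a:n)b}$$

= sample mean for the $a$-th psu.

$$E_2\left[x_{(a:n)b}\right] = \frac{1}{M}\sum_{b=1}^{M} x_{(a:n)b} = \bar{X}_{(a:n)}$$

$$E_1\left[\bar{X}_{(a:n)}\right] = \sum_{s=1}^{N} A^{s}_{a}\bar{X}_s = A'_{a}\bar{X}$$

$$V_1\left(\bar{X}_{(a:n)}\right) = \sigma^2_{x(a:n)} = A'_{a}\bar{X}^2 - \left(A'_{a}\bar{X}\right)^2$$

In the last two expressions $\bar{X}$ denotes the vector of psu means $(\bar{X}_1, \dots, \bar{X}_N)'$ and $\bar{X}^2$ its component-wise square.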

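Contd.

To study the joint distribution of the order statistics from two disjoint sets, each of size $n$, drawn without replacement using RSS, let $\{a \to s,\; c \to t\}$ be the event that the $a$-th ranked unit from set 1 has rank $s$ in the population and the $c$-th ranked unit from set 2 has rank $t$ in the population. Then, for $s < t$,

$$B^{st}_{ac} = \Pr\{a \to s,\; c \to t\} = \sum_{\lambda=0}^{n-a} \binom{s-1}{a-1}\binom{t-s-1}{\lambda}\binom{N-t}{n-a-\lambda}\binom{t-a-\lambda-1}{c-1}\binom{N-t-n+a+\lambda}{n-c} \bigg/ \binom{N}{n,\,n,\,N-2n},$$

where $\binom{N}{n,\,n,\,N-2n} = \dfrac{N!}{n!\,n!\,(N-2n)!}$.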

Contd.

Let $x_{(a:n)1}$ and $x_{(c:n)2}$ be the quantifications of the $a$-th and $c$-th ranked units from set 1 and set 2, respectively. Then,

$$E\left[x_{(a:n)1}\,x_{(c:n)2}\right] = x'B_{ac}x$$

$$\operatorname{Cov}\left[x_{(a:n)1},\, x_{(c:n)2}\right] = C_{ac} = x'\left(B_{ac} - A_{a}A'_{c}\right)x$$

Moments of the estimator of population mean:

Let $nr$ sets, each of size $n$, be selected randomly and without replacement from a population of $N$ psu's. Let the lowest ranked unit be quantified in each of the first $r$ sets:

$$X_{(1:n)1},\; X_{(1:n)2},\; \dots,\; X_{(1:n)r}$$

Contd.

Similarly, in each of the next $r$ sets, the second ranked unit is quantified to give

$$X_{(2:n)1},\; X_{(2:n)2},\; \dots,\; X_{(2:n)r}$$

This process continues until the highest ranked unit is quantified in each of the last $r$ sets:

$$X_{(n:n)1},\; X_{(n:n)2},\; \dots,\; X_{(n:n)r}$$

Thus, the proposed estimator of the population mean, when the sample at the first stage is selected by RSS and at the second stage by SRS, is given by

$$\hat{\bar{X}}_{RSS2} = \frac{1}{nrm}\sum_{a=1}^{n}\sum_{o=1}^{r}\sum_{b=1}^{m} x_{((a:n)o,\,b)}$$

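Contd.

On the same lines as in Case 1, it can be shown that $\hat{\bar{X}}_{RSS2}$ is unbiased and that the variance of $\hat{\bar{X}}_{RSS2}$ is

$$V(\hat{\bar{X}}_{RSS2}) = \frac{1}{rn}\left\{\frac{N-1-nr}{N-1}\sigma^2_{x} - \bar{\gamma}\right\} + \frac{1}{rnN}\sum_{a=1}^{N}\left(\frac{1}{m} - \frac{1}{M}\right)S^2_{x(a:n)},$$

where $\sigma^2_x$ and $\bar{\gamma}$ are defined for the psu means in the same way as $\sigma^2_{ax}$ and $\bar{\gamma}_a$ in Case 1, and

$$S^2_{x(a:n)} = \frac{1}{M-1}\sum_{b=1}^{M}\left(x_{(a:n)b} - \bar{X}_{(a:n)}\right)^2.$$

Case 3: RSS at both the stages

$$\hat{\bar{X}}_{RSS3} = \frac{1}{nr_1}\sum_{k=1}^{n}\sum_{o=1}^{r_1}\frac{1}{mr_2}\sum_{i=1}^{m}\sum_{v=1}^{r_2} X_{[(k:n)o;\,(i:m)v]}$$

$$V(\hat{\bar{X}}_{RSS3}) = \frac{1}{r_1n}\left[\frac{N-1-nr_1}{N-1}\sigma^2_{1x} - \bar{\gamma}_1\right] + \frac{1}{nr_1N}\sum_{k=1}^{N}\frac{1}{mr_2}\left[\frac{M-1-mr_2}{M-1}\sigma^2_{2kx} - \bar{\gamma}_{2k}\right]$$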

3. Empirical Study

For the purpose of comparing the RSS-based and the SRS-based estimators, an empirical study was carried out using part of the wheat crop data for an experimental station given in Singh et al. (1979). The data comprised 9 fields, each field having 4 plots (Set I). (The population values of the two reported parameters were 4.163 and 0.306 respectively.)

For the RSS protocol, plots in each field were ranked according to the perceived weight of wheat yield. Using these data, estimators of the population mean based on RSS and SRS were considered for the three cases dealt with earlier.

Another data set, given in Singh and Mangat (1996), on outstanding loans of farmers affiliated to cooperatives was utilized to compare the performance of the RSS-based and SRS-based estimators (Set II). (The population values of the two reported parameters were 38.05 and 11.23 respectively.) The data comprised 9 blocks and 4 societies in each block.

Finally, data on the number of persons in a household given in Raj (1971) were also utilized to compare the performance of the RSS-based and SRS-based estimators (Set III). (The population values of the two reported parameters were 7052 and 0.093 respectively.) Here also the data comprised 9 households and 4 persons in a household.

Table 2.1 Per cent gain in precision of RSS based estimators over SRS based estimators

Set I

Case  Stage  Design  Estimator                S.E. of the estimator  Per cent gain in precision
1     1      SRS     $\hat{\bar{X}}_{RSS1}$   5.39                   10.21
      2      RSS
2     1      RSS     $\hat{\bar{X}}_{RSS2}$   5.60                   1.85
      2      SRS
3     1      RSS     $\hat{\bar{X}}_{RSS3}$   5.33                   12.46
      2      RSS
4     1      SRS     $\bar{X}_{SRS}$          5.94                   --
      2      SRS

Set II

Case  Stage  Design  Estimator                S.E. of the estimator  Per cent gain in precision
1     1      SRS     $\hat{\bar{X}}_{RSS1}$   1.03                   10.67
      2      RSS
2     1      RSS     $\hat{\bar{X}}_{RSS2}$   1.11                   2.70
      2      SRS
3     1      RSS     $\hat{\bar{X}}_{RSS3}$   1.01                   12.87
      2      RSS
4     1      SRS     $\bar{X}_{SRS}$          1.14                   --
      2      SRS

Set III

Case  Stage  Design  Estimator                S.E. of the estimator  Per cent gain in precision
1     1      SRS     $\hat{\bar{X}}_{RSS1}$   0.199                  15.57
      2      RSS
2     1      RSS     $\hat{\bar{X}}_{RSS2}$   0.205                  12.19
      2      SRS
3     1      RSS     $\hat{\bar{X}}_{RSS3}$   0.194                  18.55
      2      RSS
4     1      SRS     $\bar{X}_{SRS}$          0.230                  --
      2      SRS

References:

• Dell, T.R. and Clutter, J.L. (1972). Ranked set sampling theory with order statistics background. Biometrics, 28, 545-553.

• Halls, L.K. and Dell, T.R. (1966). Trial of ranked set sampling for forage yields. Forest Science, 12, 22-26.

• Krishna, Pravin (2002). Some aspects of ranked set sampling from finite population. M.Sc. Thesis, I.A.R.I., New Delhi-12.

• McIntyre, G.A. (1952). A method of unbiased selective sampling using ranked sets. Australian Journal of Agricultural Research, 3, 385-390.

• Patil, G.P., Sinha, A.K. and Taillie, C. (1993). Ranked set sampling from a finite population in the presence of a trend on a site. Journal of Applied Statistical Science, Vol. 1, No. 1, 51-65.

• Patil, G.P., Sinha, A.K. and Taillie, C. (1994). Ranked set sampling. Handbook of Statistics, 12 (eds. Patil, G.P. and Rao, C.R.), 167-198, North-Holland, Amsterdam.

• Patil, G.P., Sinha, A.K. and Taillie, C. (1995). Finite population corrections for ranked set sampling. Annals of the Institute of Statistical Mathematics, Vol. 47, No. 4, 621-636.

• Raj, D. (1971). The Design of Sample Surveys. McGraw-Hill Book Co., New York.

• Singh, D., Singh, P. and Kumar, P. (1979). Hand Book on Sampling Methods. Indian Agricultural Statistics Research Institute, New Delhi.

• Singh, R. and Mangat, N.P.S. (1996). Elements of Survey Sampling. Kluwer Academic Publishers, pp 388.

• Stokes, S.L. (1977). Ranked set sampling with concomitant variables. Communications in Statistics - Theory and Methods, 6, 1207-1211.

• Stokes, S.L. (1980). Estimation of variance using judgement ordered ranked set samples. Biometrics, 36, 35-42.

• Takahasi, K. and Wakimoto, K. (1968). On unbiased estimates of the population mean based on the sample stratified by means of ordering. Annals of the Institute of Statistical Mathematics, 20, 1-31.

• Yu, Philip L.H. and Lam, K. (1997). Regression estimator in ranked set sampling. Biometrics, 53, 1070-1080.


THANKS