Efficient Sampling

  • 8/8/2019 Efficient Sampling

    User guidance

    General principles

    For an introduction to this style of statistical sampling please see the article "Efficient samples for internal control and audit testing" at http://www.internalcontrolsdesign.co.uk/samples/index.html.

    This workbook offers two formats for using these techniques. The Test Schedule format is a classic auditor's test schedule, with space for a list of sample items and up to three tests on each item. As you answer Y (yes) or N (no) for each test on each item, the statistical inferences you can make from your testing are immediately updated at the top of the page.

    Although it makes sense to have some idea of how many items you are likely to test before you start testing, it is not necessary to make a firm decision before you start. Just start testing and stop when the conclusions are clear enough for your purposes.
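The "test until the conclusions are clear" approach can be sketched in Python. This is a minimal illustration, not the workbook's own formulae: the function names, the 95% target and the 5% threshold are assumptions chosen for the example, and the Beta distribution with a uniform starting point is inferred from the alpha and beta figures shown on the example worksheets.

```python
from math import comb


def confidence_below(rate, alpha, beta):
    """Probability that the exception rate is below `rate` under Beta(alpha, beta).

    For whole-number alpha and beta the Beta distribution function equals a
    binomial tail probability, so only the standard library is needed.
    """
    n = alpha + beta - 1
    lower_tail = sum(
        comb(n, j) * rate**j * (1 - rate) ** (n - j) for j in range(alpha)
    )
    return 1 - lower_tail


def items_until_confident(answers, rate=0.05, target=0.95):
    """Return how many answers were needed to reach `target`, or None if never reached."""
    alpha, beta = 1, 1  # assumed start: all exception rates equally likely
    for tested, answer in enumerate(answers, start=1):
        if answer == "N":
            alpha += 1  # one more exception found
        else:
            beta += 1  # one more item passed the test
        if confidence_below(rate, alpha, beta) >= target:
            return tested
    return None


# With no prior evidence and no exceptions, how many items are needed
# before being 95% confident that the exception rate is below 5%?
print(items_until_confident(["Y"] * 100))  # 58
```

The confidence is checked after every answer, so no sample size has to be fixed in advance; a run of answers that never reaches the target simply returns None.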

    The other format is a Summary format, where you can just enter summary results (number of items tested and number of exceptions found) from several samples or tests, and see statistics inferred from your evidence.

    These formulae are appropriate where you want to assess the exception rate of a process at a point in time using sampled items processed at around that time. They are also suitable for sampling from very large populations. They are NOT appropriate for sampling from small populations.

    One of the big advantages of Bayesian statistics is that it helps us combine evidence from more than one source. To do that, both formats give you a way to express what you already believe about the exception rate using a technique called Equivalent Prior Samples. You don't have to do this, but if you do then the end result from these spreadsheets is statements about the exception rate that combine what you knew before with what you have learned from your new sample.

    Unless you bring in other evidence, the formulae in these spreadsheets make the democratic assumption that all exception rates are equally likely. However, the results of other tests, the continued trading of a company, and our general experience of processes designed to work tell us that lower error rates are usually more likely than very high ones. For example, if you are testing telephone bills and want to know what proportion of them are wrong, it is much more likely that 1% are wrong than 99%.

    The idea of Equivalent Prior Samples is that you imagine that your views about the exception rate before starting on the new sample are the result of having tested a previous sample. You then enter a number of items tested and a number of exceptions found in that imaginary previous sample.
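How the imaginary previous sample and the new sample combine can be sketched as follows. The rule used here (start from alpha = 1 and beta = 1, then add exceptions to alpha and passed items to beta) is an inference from the alpha and beta figures on the example worksheets, not a formula stated in the guidance, and the function name is hypothetical.

```python
def posterior_parameters(prior_tested, prior_exceptions, new_tested, new_exceptions):
    """Combine an equivalent prior sample with a new sample into (alpha, beta)."""
    # Assumed rule: alpha counts exceptions, beta counts passes, each starting at 1.
    alpha = 1 + prior_exceptions + new_exceptions
    beta = 1 + (prior_tested - prior_exceptions) + (new_tested - new_exceptions)
    return alpha, beta


# Test 1 on the Test Schedule example: an imagined prior sample of 20 items
# with no exceptions, then 40 items tested with 1 exception.
print(posterior_parameters(20, 0, 40, 1))  # (2, 60)
```

Leaving the prior sample figures at zero gives alpha and beta that reflect the new sample alone.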

    In practice, of course, it's hard to use your imagination like that. However, you can think how confident you are that the exception rate is less than 5% and then adjust your prior sample numbers to match that view. If you're not sure, always err on the side of smaller prior sample sizes, or just leave the prior sample figures at zero and let all the evidence come from your new sample.
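One way to turn "how confident am I that the rate is below 5%?" into a prior sample size is sketched below. It covers only the simplest case, an imagined prior sample with no exceptions, where the confidence has the closed form 1 - (1 - rate)^(n + 1) under the uniform starting assumption. The function names are hypothetical, and the search deliberately rounds down, in line with the advice to err on the side of smaller prior samples.

```python
def prior_confidence(prior_tested, rate=0.05):
    """Confidence that the exception rate is below `rate` implied by an
    imagined prior sample of `prior_tested` items with no exceptions."""
    return 1 - (1 - rate) ** (prior_tested + 1)


def largest_prior_not_exceeding(belief, rate=0.05, limit=10_000):
    """Largest exception-free prior sample whose implied confidence stays
    at or below `belief` (so the prior never overstates what you believe)."""
    size = 0
    while size < limit and prior_confidence(size + 1, rate) <= belief:
        size += 1
    return size


# If you are about 50% confident that the exception rate is below 5%:
print(largest_prior_not_exceeding(0.5))  # 12
```

A belief weaker than the 5% implied by the uniform starting point returns zero, which corresponds to leaving the prior sample figures blank.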

    Both formats tell you the confidence that the exception rate is below 5%, below 1% and below 0.1%. This is to give you a feel for what the complete distribution looks like. The complete distribution can be viewed on the Test Schedule worksheet to see its general shape.
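The three confidence figures can be reproduced with a short sketch. It assumes the worksheets evaluate a Beta distribution function at each threshold, which agrees with the alpha, beta and confidence values printed on the Test Schedule example; the function names are illustrative only.

```python
from math import comb


def confidence_below(rate, alpha, beta):
    """Beta(alpha, beta) distribution function at `rate`, for whole-number
    parameters, computed as a binomial tail probability."""
    n = alpha + beta - 1
    lower_tail = sum(
        comb(n, j) * rate**j * (1 - rate) ** (n - j) for j in range(alpha)
    )
    return 1 - lower_tail


def confidence_summary(alpha, beta, thresholds=(0.05, 0.01, 0.001)):
    """Confidence that the exception rate is below each threshold."""
    return {t: confidence_below(t, alpha, beta) for t in thresholds}


# Test 1 on the Test Schedule example has alpha = 2 and beta = 60.
for threshold, confidence in confidence_summary(2, 60).items():
    print(f"< {threshold:.1%}: {confidence:.4%}")
```

For alpha = 2 and beta = 60 the results should agree with the 81.5721%, 12.4549% and 0.1760% shown for Test 1 on the example worksheet.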

    Now here are detailed points on each format.

    Test Schedule format

    For an example of the Test Schedule format filled in see the Test Schedule example worksheet.

    Areas that you might want to type in are surrounded by a black line.

    Summary format

    For an example of the Summary format see the Summary example worksheet.

    Once again, areas that you might want to type in are surrounded by a black line.

    If you need to prove very low exception rates then you will find that the equivalent prior sample doesn't usually save you much testing, relatively. However, if you are doing a small number of very expensive tests and the error rate sought is not particularly low then you may find that bringing in other evidence helps you cut sample testing costs considerably.
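The relative saving from an equivalent prior sample can be illustrated with a small sketch. It considers only exception-free samples under the uniform starting assumption, where the confidence is 1 - (1 - rate)^(1 + prior items + new items). The prior sample of 20 items, the 95% target and the function name are assumptions made for the example.

```python
def exception_free_items_needed(rate, target, prior_tested=0):
    """New exception-free items needed to reach `target` confidence that the
    exception rate is below `rate`, given an exception-free prior sample."""
    tested = 0
    while 1 - (1 - rate) ** (1 + prior_tested + tested) < target:
        tested += 1
    return tested


for rate in (0.05, 0.001):
    without_prior = exception_free_items_needed(rate, 0.95)
    with_prior = exception_free_items_needed(rate, 0.95, prior_tested=20)
    print(f"below {rate:.1%}: {without_prior} items without prior, {with_prior} with prior")
```

For a 5% threshold the prior sample cuts the new testing from 58 items to 38, roughly a third. For a 0.1% threshold it cuts 2994 items to 2974, which is under 1%.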

    The only part of this that's not obvious from the general introduction and example is how the graph works.

    The graph shows the probability density of beliefs about the exception rate. Put another way, the line is high over exception rates that are likely. The exception rate can range from 0 to 1 (i.e. 0 to 100%) but most real exception rates are quite low so I've given you two cells (C64 and C65) to set the upper and lower boundary of the range you want to see. Don't forget to keep these between 0 and 1.

    If you change the range from its starting value then the Excel chart that shows the curves will also have to be adjusted. Change the scale for the X axis to the range you want to see.
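A rough equivalent of the graph data can be sketched in Python. It assumes the curve is a Beta probability density evaluated at 21 evenly spaced exception rates between the lowest and highest values. The worksheet does not state how its own graph data are scaled, so the numbers below need not match the worksheet column by column; only the shape is comparable. The function names are hypothetical.

```python
from math import exp, lgamma


def beta_density(x, alpha, beta):
    """Probability density of Beta(alpha, beta) at exception rate x (0 <= x < 1)."""
    log_norm = lgamma(alpha + beta) - lgamma(alpha) - lgamma(beta)
    return exp(log_norm) * x ** (alpha - 1) * (1 - x) ** (beta - 1)


def density_curve(alpha, beta, lowest=0.0, highest=0.1, points=21):
    """(exception rate, density) pairs across the chosen range."""
    step = (highest - lowest) / (points - 1)
    rates = [lowest + i * step for i in range(points)]
    return [(rate, beta_density(rate, alpha, beta)) for rate in rates]


# Test 1 on the Test Schedule example: alpha = 2, beta = 60.
curve = density_curve(2, 60)
peak_rate, peak_density = max(curve, key=lambda point: point[1])
print(f"curve is highest near an exception rate of {peak_rate:.3f}")
```

Changing `lowest` and `highest` plays the same role as changing the two range cells: the curve is recomputed over the new range.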

    The three test lines in the example illustrate the sort of thing that can be achieved. In the first test the process involved is manual and some exceptions are expected, captured in the equivalent prior sample which imagines 1 exception found in a test of 20 items. The actual results appear to be better than this and the auditor has decided to stop once it is established that the exception rate is lower than 5% with a confidence of just over 90%.

    In the second test a much higher level of reliability is expected and sought from a fully automated process and largely automated test. Prior testing has given quite a narrow expected range for the exception rate. In the new testing the error rate turns out to be better than in the past and the auditor decides that being 99% confident that the exception rate is below 0.1% is good enough.

    For the third and final test shown there is no previous evidence and the auditor stops testing having reached 95% confidence that the exception rate is below 1%.
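The three stopping decisions described above can be checked against the figures on the Summary example worksheet with a sketch like the following. The combination rule and the use of a Beta distribution function are inferred from the worksheet's alpha, beta and confidence columns, and the function names are illustrative.

```python
from math import comb


def confidence_below(rate, alpha, beta):
    """Beta(alpha, beta) distribution function at `rate` (whole-number parameters)."""
    n = alpha + beta - 1
    lower_tail = sum(
        comb(n, j) * rate**j * (1 - rate) ** (n - j) for j in range(alpha)
    )
    return 1 - lower_tail


def summary_line(prior_tested, prior_exceptions, new_tested, new_exceptions, rate):
    """Confidence that the exception rate is below `rate` for one Summary row."""
    exceptions = prior_exceptions + new_exceptions
    passes = (prior_tested + new_tested) - exceptions
    return confidence_below(rate, 1 + exceptions, 1 + passes)


# (prior items, prior exceptions, new items, new exceptions) and the
# threshold each auditor cared about, taken from the Summary example.
examples = [
    ("Test 1", (20, 1, 86, 1), 0.05),
    ("Test 2", (1000, 2, 8100, 0), 0.001),
    ("Test 3", (0, 0, 298, 0), 0.01),
]
for name, counts, rate in examples:
    print(f"{name}: {summary_line(*counts, rate):.4%} confident the rate is below {rate:.1%}")
```

The results should agree with the worksheet: just over 90% for Test 1, just over 99% for Test 2 and just over 95% for Test 3.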

    If you think these sample sizes seem rather large or the inferences are weak then you need to adjust your expectations! These samples and inferences are correct and if your department's policy is to do 30 or 50 items and the belief is that this means 95% confidence in something or other then the policy is wrong. But don't assume this means you have to increase your sample sizes. It probably means you have to be more explicit about the value of other evidence.
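As a worked illustration of why 30 or 50 items fall short, assume no prior evidence (all exception rates equally likely) and a sample of n items with no exceptions at all. Under those assumptions, which are mine rather than the worksheet's stated formulae, the confidence that the exception rate is below r has a simple closed form:

```latex
P(\text{rate} < r) = 1 - (1 - r)^{n + 1}

n = 30,\ r = 0.05:\quad 1 - 0.95^{31} \approx 0.796

n = 50,\ r = 0.05:\quad 1 - 0.95^{51} \approx 0.927

n = 58,\ r = 0.05:\quad 1 - 0.95^{59} \approx 0.952
```

So even a perfectly clean sample of 30 items gives only about 80% confidence that the exception rate is below 5%, and 50 items about 93%. Reaching 95% takes 58 exception-free items unless other evidence is brought in.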

    Confidence that exception rate is:

                    Test 1      Test 2      Test 3
    < 5%          81.5721%    47.2522%    25.0586%
    < 1%          12.4549%     1.4573%     0.1718%
    < 0.1%         0.1760%     0.0020%     0.0000%

    Equivalent prior sample:

                    Test 1      Test 2      Test 3
    Total tests         20          10          10
    Exceptions           0           0           0

    Sample  Identification of     Test 1          Test 2        Test 3
    item    sample item           Initials test   Date check    Casting
    1       Invoice number 1082   Y               Y             Y
    2       Invoice number 1465   Y               Y             Y
    3       Invoice number 1921   Y               Y             Y
    4       Invoice number 203    Y               Y             Y
    5       Invoice number 2129   Y               Y             Y
    6       Invoice number 2164   Y               Y             Y
    7       Invoice number 2408   Y               Y             Y
    8       Invoice number 2549   Y               Y             Y
    9       Invoice number 267    N               N             N
    10      Invoice number 2793   Y               Y             Y
    11      Invoice number 2795   Y               Y             Y
    12      Invoice number 2852   Y               Y             Y
    13      Invoice number 3205   Y               N             N
    14      Invoice number 3497   Y               Y             Y
    15      Invoice number 4112   Y               Y             Y
    16      Invoice number 4148   Y               Y             Y
    17      Invoice number 461    Y               Y             N
    18      Invoice number 4746   Y               Y             Y
    19      Invoice number 5121   Y               Y             Y
    20      Invoice number 5127   Y               Y             Y
    21      Invoice number 515    Y               Y             Y
    22      Invoice number 5233   Y               Y             Y
    23      Invoice number 6094   Y               Y             Y
    24      Invoice number 6240   Y               Y             Y
    25      Invoice number 6618   Y               Y             Y
    26      Invoice number 6761   Y               Y             Y
    27      Invoice number 6896   Y               Y             Y
    28      Invoice number 6973   Y               Y             Y
    29      Invoice number 7481   Y               Y             Y
    30      Invoice number 7578   Y               Y             Y
    31      Invoice number 7677   Y               Y             Y
    32      Invoice number 809    Y               Y             Y
    33      Invoice number 8102   Y               Y             Y
    34      Invoice number 8199   Y               Y             Y
    35      Invoice number 8627   Y               Y             Y
    36      Invoice number 9021   Y               Y             Y
    37      Invoice number 9142   Y               Y             Y
    38      Invoice number 924    Y               Y             Y
    39      Invoice number 9682   Y               Y             Y
    40      Invoice number 9813   Y               Y             Y

    Count of Y                    39              38            37
    Count of N                    1               2             3
    alpha                         2               3             4
    beta                          60              49            48

    Range for graph:

    Lowest exception rate 0

    Highest exception rate 0.1

    Graph data:

    Exception rate   Test 1   Test 2   Test 3
    0                0        0        0
    0.01             0.01     0        0
    0.01             0.02     0        0
    0.02             0.02     0.01     0
    0.02             0.02     0.01     0
    0.03             0.02     0.01     0
    0.03             0.02     0.01     0.01
    0.04             0.02     0.01     0.01
    0.04             0.01     0.01     0.01
    0.05             0.01     0.01     0.01
    0.05             0.01     0.01     0.01
    0.06             0.01     0.01     0.01
    0.06             0.01     0.01     0.01
    0.06             0        0.01     0.01
    0.07             0        0.01     0.01
    0.08             0        0.01     0.01
    0.08             0        0.01     0.01
    0.09             0        0.01     0.01
    0.09             0        0.01     0.01
    0.1              0        0        0.01
    0.1              0        0        0.01

    [Chart: probability density curves for the three tests (legend: Column C, Test 2, Test 3); x-axis: Exception rate, from 0 to 0.1]

                                              Equivalent Prior Sample      New sample results
    Test                                      Count of      Count of       Count of      Count of
    ref   Test description                    items tested  exceptions     items tested  exceptions
    1     Authorisation initialled correctly  20            1              86            1
    2     Matched to customer database        1000          2              8100          0
    3     Correctly split                     0             0              298           0

    4 0 0

    5 0 0

    6 0 0

    7 0 0

    8 0 0

    9 0 0

    10 0 0

    Confidence that exception rate is:

    Alpha  Beta   < 5%        < 1%        < 0.1%     < 0.01%
    3      105    90.7632%    9.2695%     0.0184%    0.0000%
    3      9099   100.0000%   100.0000%   99.4270%   6.4521%
    1      299    100.0000%   95.0464%    25.8552%   2.9459%
    1      1      5.0000%     1.0000%     0.1000%    0.0100%
    1      1      5.0000%     1.0000%     0.1000%    0.0100%
    1      1      5.0000%     1.0000%     0.1000%    0.0100%
    1      1      5.0000%     1.0000%     0.1000%    0.0100%
    1      1      5.0000%     1.0000%     0.1000%    0.0100%
    1      1      5.0000%     1.0000%     0.1000%    0.0100%
    1      1      5.0000%     1.0000%     0.1000%    0.0100%

    Confidence that exception rate is:

                    Test 1      Test 2      Test 3
    < 5%           5.0000%     5.0000%     5.0000%
    < 1%           1.0000%     1.0000%     1.0000%
    < 0.1%         0.1000%     0.1000%     0.1000%

    Equivalent prior sample:

                    Test 1      Test 2      Test 3
    Total tests          0           0           0
    Exceptions           0           0           0

    Sample  Identification of     Test 1   Test 2   Test 3
    item    sample item           OK?      OK?      OK?
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40

    Count of Y                    0        0        0
    Count of N                    0        0        0
    alpha                         1        1        1
    beta                          1        1        1

    Range for graph:

    Lowest exception rate 0

    Highest exception rate 0.1

    Graph data:

    Exception rate   Test 1   Test 2   Test 3
    0                0        0        0
    0.01             0        0        0
    0.01             0        0        0
    0.02             0        0        0
    0.02             0        0        0
    0.03             0        0        0
    0.03             0        0        0
    0.04             0        0        0
    0.04             0        0        0
    0.05             0        0        0
    0.05             0        0        0
    0.06             0        0        0
    0.06             0        0        0
    0.06             0        0        0
    0.07             0        0        0
    0.08             0        0        0
    0.08             0        0        0
    0.09             0        0        0
    0.09             0        0        0
    0.1              0        0        0
    0.1              0        0        0

    [Chart: probability density curves for the three tests on the blank Test Schedule (legend: Column, Test 2, Test 3); x-axis: Exception rate, from 0 to 0.1]

                             Equivalent Prior Sample      New sample results
    Test                     Count of      Count of       Count of      Count of
    ref   Test description   items tested  exceptions     items tested  exceptions
    1                        0             0              0             0
    2                        0             0              0             0
    3                        0             0              0             0
    4                        0             0              0             0
    5                        0             0              0             0
    6                        0             0              0             0
    7                        0             0              0             0
    8                        0             0              0             0
    9                        0             0              0             0
    10                       0             0              0             0

    Confidence that exception rate is:

    Alpha  Beta   < 5%        < 1%        < 0.1%     < 0.01%
    1      1      5.0000%     1.0000%     0.1000%    0.0100%
    1      1      5.0000%     1.0000%     0.1000%    0.0100%
    1      1      5.0000%     1.0000%     0.1000%    0.0100%
    1      1      5.0000%     1.0000%     0.1000%    0.0100%
    1      1      5.0000%     1.0000%     0.1000%    0.0100%
    1      1      5.0000%     1.0000%     0.1000%    0.0100%
    1      1      5.0000%     1.0000%     0.1000%    0.0100%
    1      1      5.0000%     1.0000%     0.1000%    0.0100%
    1      1      5.0000%     1.0000%     0.1000%    0.0100%
    1      1      5.0000%     1.0000%     0.1000%    0.0100%