This function takes a data frame and performs statistical selection according to one of four algorithms: fixed interval sampling, cell sampling, random sampling, and modified sieve sampling. Selection operates on one of two possible sampling units: items (records / rows) or monetary units. The function returns an object of class jfaSelection which can be used with the associated summary() and plot() methods.

For more details on how to use this function, see the package vignette: vignette('jfa', package = 'jfa')

selection(data, size, units = c('items', 'values'),
          method = c('interval', 'cell', 'random', 'sieve'), values = NULL,
          order = NULL, decreasing = FALSE, randomize = FALSE,
          replace = FALSE, start = 1)

Arguments

data

a data frame containing the population of items the auditor wishes to sample from.

size

an integer larger than 0 specifying the number of sampling units that need to be selected from the population. Can also be an object of class jfaPlanning.

units

a character specifying the sampling units used. Possible options are items (default) for selection on the level of items (rows) or values for selection on the level of monetary units.

method

a character specifying the sampling algorithm used. Possible options are interval (default) for fixed interval sampling, cell for cell sampling, random for random sampling, or sieve for modified sieve sampling.

values

a character specifying the name of a column in data containing the book values of the items.

order

a character specifying the name of a column in data containing the ranks of the items. The items in the data are ordered according to these values in the order indicated by decreasing.

decreasing

if order is specified, a logical specifying whether to order the items from largest to smallest (TRUE) instead of from smallest to largest (FALSE). Defaults to FALSE.

randomize

a logical specifying whether the items in the data should be randomly shuffled before selection. Defaults to FALSE. Note that randomize = TRUE overrules order.

replace

if method = 'random', a logical specifying whether sampling should be performed with replacement. Defaults to FALSE.

start

if method = 'interval', an integer larger than 0 specifying the starting point of the algorithm.

Value

An object of class jfaSelection containing:

data

a data frame containing the input data.

sample

a data frame containing the selected sample of items.

n.req

an integer indicating the requested sample size.

n.units

an integer indicating the total number of obtained sampling units.

n.items

an integer indicating the total number of obtained sample items.

N.units

an integer indicating the total number of sampling units in the population.

N.items

an integer indicating the total number of items in the population.

interval

if method = 'interval', a numeric value indicating the size of the selection interval.

units

a character indicating the sampling units that were used to create the selection.

method

a character indicating the algorithm that was used to create the selection.

values

if values is specified, a character indicating the name of the book value column.

start

if method = 'interval', an integer indicating the starting point in the interval.

data.name

a character string giving the name of the data.

Details

The first part of this section elaborates on the two possible options for the units argument:

  • items: In record sampling each item in the population is seen as a sampling unit. An item of $5000 is therefore equally likely to be selected as an item of $500.

  • values: In monetary unit sampling each monetary unit in the population is seen as a sampling unit. An item of $5000 is therefore ten times more likely to be selected than an item of $500.
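The difference between the two options above can be checked with a few lines of base R. This is a minimal sketch, not code from the package: the population `values` and the helper names `p_items` and `p_values` are hypothetical, and the probabilities refer to a single draw of one sampling unit.

```r
# Hypothetical population of three items with their book values
values <- c(A = 5000, B = 500, C = 4500)

# units = 'items': every item is one sampling unit, so each item
# has the same chance of being hit by a single draw
p_items <- rep(1 / length(values), length(values))
names(p_items) <- names(values)

# units = 'values': every monetary unit is one sampling unit, so an
# item's chance is its share of the total book value
p_values <- values / sum(values)

# Ratio of selection probabilities for the $5000 and the $500 item
ratio_items  <- p_items[["A"]] / p_items[["B"]]    # equal chances
ratio_values <- p_values[["A"]] / p_values[["B"]]  # ten to one, up to rounding
```

Under record sampling the ratio is one, while under monetary unit sampling it equals the ratio of the book values.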

The second part of this section elaborates on the four possible options for the method argument:

  • interval: In fixed interval sampling the sampling units in the population are divided into a number (equal to the sample size) of intervals. From each interval one sampling unit is selected according to a fixed starting point (specified by start).

  • cell: In cell sampling the sampling units in the population are divided into a number (equal to the sample size) of intervals. From each interval one sampling unit is selected with equal probability.

  • random: In random sampling each sampling unit in the population is drawn with equal probability.

  • sieve: In modified sieve sampling each item in the population is selected with a probability proportional to its value (Hoogduin, Hall, & Tsay, 2010).
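To make the contrast between the interval and cell methods concrete, the following base R sketch applies both to monetary units. It is an illustration under stated assumptions and not the package's implementation: the book values are made up, and the names `units_fixed`, `units_cell`, `items_fixed`, and `items_cell` are hypothetical.

```r
# Hypothetical book values of six items
values <- c(500, 5000, 1200, 300, 2000, 1000)
n <- 3                    # requested sample size
N <- sum(values)          # total number of monetary units
interval <- N / n         # width of each selection interval

# Fixed interval sampling: the same starting point in every interval
start <- 1
units_fixed <- start + (0:(n - 1)) * interval

# Cell sampling: an independent random position within every interval
set.seed(1)
units_cell <- (0:(n - 1)) * interval + runif(n, min = 0, max = interval)

# Map each selected monetary unit to the item that contains it.
# Item i covers the units in (upper[i - 1], upper[i]].
upper <- cumsum(values)
items_fixed <- findInterval(units_fixed, upper, left.open = TRUE) + 1
items_cell  <- findInterval(units_cell, upper, left.open = TRUE) + 1
```

Both methods pick exactly one monetary unit per interval. They differ only in whether the position within the interval is fixed by start or drawn at random.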

References

Hoogduin, L. A., Hall, T. W., & Tsay, J. J. (2010). Modified sieve sampling: A method for single- and multi-stage probability-proportional-to-size sampling. Auditing: A Journal of Practice & Theory, 29(1), 125-148.

Leslie, D. A., Teitlebaum, A. D., & Anderson, R. J. (1979). Dollar-unit Sampling: A Practical Guide for Auditors. Copp Clark Pitman; Belmont, Calif.: distributed by Fearon-Pitman.

Wampler, B., & McEacharn, M. (2005). Monetary-unit sampling using Microsoft Excel. The CPA Journal, 75(5), 36.

Author

Koen Derks, k.derks@nyenrode.nl

Examples

data("BuildIt")

# Select 100 items using random sampling
selection(data = BuildIt, size = 100, method = "random")
#> 
#> 	Audit Sample Selection
#> 
#> data:  BuildIt
#> number of sampling units = 100, number of items = 100
#> sample selected via method 'items' + 'random'

# Select 150 monetary units using fixed interval sampling
selection(
  data = BuildIt, size = 150, units = "values",
  method = "interval", values = "bookValue"
)
#> 
#> 	Audit Sample Selection
#> 
#> data:  BuildIt
#> number of sampling units = 150, number of items = 150
#> sample selected via method 'values' + 'interval'
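
The returned object can also be stored and inspected through the elements listed under Value. The sketch below assumes the jfa package is installed and that the element names match the Value section of this page. The variable name `result` and the seed are arbitrary choices for illustration.

```r
library(jfa)
data("BuildIt")

# Fix the seed so the random selection is reproducible
set.seed(1)
result <- selection(data = BuildIt, size = 100, method = "random")

# The selected items are stored as a data frame
head(result$sample)

# Number of items that ended up in the sample
result$n.items

# A more detailed overview via the associated summary() method
summary(result)
```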