Select sampling units from a set of potential sampling units using propensity score matching (PSM)

create_sample_psm(
  x,
  svy,
  psu = "psu",
  match = NULL,
  pop = "pop",
  sampling_type = c("simple", "systematic")
)

Arguments

x

A data.frame of all potential primary sampling units (PSUs) to sample from, each uniquely identified and with its corresponding population.

svy

A data.frame of survey data drawn using a probability proportional to population size (PPS) approach from the potential PSUs in x.

psu

A single character value or a vector of character values giving the variable name(s) for the primary sampling unit in x and svy. Default is "psu".

match

A single character value giving the name of the variable for the primary sampling unit in x that matches the primary sampling unit in svy. Default is NULL, which indicates that the primary sampling unit in x is the same as the primary sampling unit in svy.

pop

A single character value corresponding to the variable name for the population figures in x. Default is "pop".

sampling_type

Which sampling type to use: simple random sampling ("simple") or systematic sampling ("systematic"). Default is "simple".
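The difference between the two sampling_type options can be illustrated with a minimal base R sketch. The helpers sample_simple() and sample_systematic() are hypothetical names introduced only for this illustration and are not part of the package. How create_sample_psm() draws units internally is not described on this page, so treat this as a sketch of the two general schemes rather than the package's implementation.

```r
# Hypothetical helpers (not part of the package) illustrating the two schemes.

# Simple random sample: draw n of N unit indices without replacement.
sample_simple <- function(N, n) {
  sort(sample.int(N, size = n))
}

# Systematic sample: pick a random start within the first sampling interval,
# then take every k-th unit, where k = N / n is the sampling interval.
sample_systematic <- function(N, n) {
  k <- N / n
  start <- runif(1, min = 0, max = k)
  floor(start + k * (seq_len(n) - 1)) + 1
}

set.seed(1)
sample_simple(N = 100, n = 10)      # indices scattered irregularly
sample_systematic(N = 100, n = 10)  # indices evenly spaced across 1..100
```

A systematic sample spreads the selected units evenly across the ordered list, whereas a simple random sample may leave parts of the list unrepresented by chance.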

Value

A data.frame drawn from svy that contains the primary sampling units selected by propensity score matching.
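For readers unfamiliar with propensity score matching, the general idea can be sketched in base R. This is a generic nearest-neighbour illustration on invented toy data and is not necessarily what create_sample_psm() does internally. The data frame units, its columns, and the choice of population as the only covariate are all assumptions made for the example.

```r
# Toy data (invented for illustration): 30 candidate PSUs, 8 of them surveyed.
set.seed(1)
units <- data.frame(
  psu = 1:30,
  pop = round(runif(30, min = 200, max = 5000))
)
units$surveyed <- as.integer(units$psu %in% sample(units$psu, 8))

# Step 1: estimate propensity scores with a logistic regression of
# survey membership on population size (assumed covariate).
fit <- glm(surveyed ~ pop, data = units, family = binomial())
units$ps <- fitted(fit)

# Step 2: for each surveyed PSU, pick the unsurveyed PSU with the closest
# propensity score (nearest neighbour, matching without replacement).
treated <- units[units$surveyed == 1, ]
pool <- units[units$surveyed == 0, ]
matched <- integer(0)
for (i in seq_len(nrow(treated))) {
  j <- which.min(abs(pool$ps - treated$ps[i]))
  matched <- c(matched, pool$psu[j])
  pool <- pool[-j, ]  # remove the matched unit so it cannot be reused
}

data.frame(surveyed_psu = treated$psu, matched_psu = matched)
```

Matching without replacement keeps each candidate PSU from being selected more than once, at the cost of slightly worse matches for units processed later in the loop.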

Author

Mark Myatt and Ernest Guevarra

Examples

create_sample_psm(
  x = village_list,
  svy = sample_data,
  psu = c("id", "psu"),
  match = "cluster",
  pop = "population"
)
#> # A tibble: 276 × 11
#>    surveydate   psu   sex birthdate    age weight height oedema  muac measure
#>    <date>     <int> <int> <date>     <int>  <dbl>  <dbl>  <int> <int> <chr>
#>  1 2020-12-16    16     1 2020-05-22     6    7.8   65.6      0   140 l
#>  2 2020-12-19    13     2 2020-05-26     6    5.2   62.6      0   131 l
#>  3 2020-12-19     3     2 2020-06-19     6    7     69.1      0   133 l
#>  4 2020-12-19    15     2 2020-06-13     6    5.4   62.4      0   127 l
#>  5 2020-12-20    14     1 NA             6    7.1   66.3      0   152 l
#>  6 2020-12-20     9     1 2020-05-23     6    7.1   65.5      0   123 l
#>  7 2020-12-23    30     2 NA             6    8.5   63.2      0   168 l
#>  8 2020-12-18    10     1 NA             7    6.9   69        0   119 l
#>  9 2020-12-22    29     2 2020-05-10     7    6.3   63.6      0   127 l
#> 10 2020-12-22    29     2 2020-04-24     7    7.4   66.6      0   134 l
#> # … with 266 more rows, and 1 more variable: clothes <chr>