Select sampling units from a set of potential sampling units using propensity score matching (PSM)

create_sample_psm(
  x,
  svy,
  psu = "psu",
  match = NULL,
  pop = "pop",
  sampling_type = c("simple", "systematic")
)

Arguments

x	A data.frame of all potential primary sampling units (PSUs) to sample from, each uniquely identified and with its corresponding population.
svy	A data.frame of survey data collected from PSUs drawn from x using a probability proportional to population size (PPS) approach.
psu	A single character value or a vector of values giving the variable name(s) for the primary sampling unit in x and svy. Default is "psu".
match	A single character value giving the variable name for the primary sampling unit in x that matches the primary sampling unit in svy. Default is NULL, which indicates that the primary sampling unit in x is the same as the primary sampling unit in svy.
pop	A single character value corresponding to the variable name for the population figures in x. Default is "pop".
sampling_type	Sampling type to use: either a simple random sample ("simple") or a systematic sample ("systematic"). Default is "simple".
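The two sampling_type options differ in how units are drawn from the frame. The sketch below illustrates the general techniques only and is not the package's internal code; the helper name draw_psus is hypothetical. A simple random sample makes every set of n units equally likely, while a systematic sample takes a random start within the first sampling interval k = N / n and then every k-th unit.

```r
# Hypothetical helper, for illustration only (not part of the package).
# Draws n PSU identifiers from a sampling frame by simple random or
# systematic sampling.
draw_psus <- function(frame_ids, n, type = c("simple", "systematic")) {
  type <- match.arg(type)
  N <- length(frame_ids)
  stopifnot(n >= 1, n <= N)
  if (type == "simple") {
    # Simple random sample without replacement
    idx <- sample.int(N, size = n)
  } else {
    # Sampling interval (may be fractional) and a random start within it
    k <- N / n
    start <- runif(1, min = 0, max = k)
    # Positions start, start + k, ..., start + (n - 1) * k mapped to 1..N
    idx <- floor(start + k * (0:(n - 1))) + 1
  }
  frame_ids[idx]
}

set.seed(1)
draw_psus(1:100, n = 10, type = "systematic")
```

Because k >= 1 whenever n <= N, the systematic positions never repeat, so both options return n distinct units.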

Value

A data.frame (a tibble in the example below) of records drawn from svy for the primary sampling units selected by propensity score matching.
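This page does not describe how the matching is performed internally. As a rough illustration of the general technique, the following is a minimal, hypothetical sketch of one common form of propensity score matching: a logistic regression estimates each PSU's probability of having been surveyed given its population, a fresh sample of PSUs is drawn from the frame, and each drawn PSU is matched (with replacement) to the surveyed PSU with the nearest score. The function name match_psus_psm and the column names id, pop and psu are assumptions for this example, not the package's API or method.

```r
# Hypothetical sketch of nearest-neighbour propensity score matching.
# Assumed (not from the package): `frame` has columns `id` and `pop`;
# `survey` has a column `psu` holding frame ids.
match_psus_psm <- function(frame, survey, n) {
  # 1. Flag which frame PSUs appear in the survey data
  frame$surveyed <- as.integer(frame$id %in% survey$psu)
  stopifnot(any(frame$surveyed == 1))
  # 2. Propensity score: P(surveyed | population) from a logistic model
  fit <- glm(surveyed ~ log(pop), family = binomial, data = frame)
  frame$ps <- fitted(fit)
  # 3. Draw n PSUs from the whole frame by simple random sampling
  drawn <- frame[sample.int(nrow(frame), n), ]
  # 4. Match each drawn PSU to the surveyed PSU with the closest score
  pool <- frame[frame$surveyed == 1, ]
  nearest <- vapply(drawn$ps, function(p) which.min(abs(pool$ps - p)), integer(1))
  matched <- pool$id[nearest]
  # 5. Return survey records of matched PSUs (repeated if matched twice)
  do.call(rbind, lapply(matched, function(m) survey[survey$psu == m, , drop = FALSE]))
}
```

Matching with replacement means a surveyed PSU that closely resembles several drawn PSUs contributes its records more than once.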

Author

Mark Myatt and Ernest Guevarra

Examples

create_sample_psm(
  x = village_list,
  svy = sample_data,
  psu = c("id", "psu"),
  match = "cluster",
  pop = "population"
)
#> # A tibble: 276 × 11
#>    surveydate   psu   sex birthdate    age weight height oedema  muac measure
#>    <date>     <int> <int> <date>     <int>  <dbl>  <dbl>  <int> <int> <chr>  
#>  1 2020-12-16    16     1 2020-05-22     6    7.8   65.6      0   140 l      
#>  2 2020-12-19    13     2 2020-05-26     6    5.2   62.6      0   131 l      
#>  3 2020-12-19     3     2 2020-06-19     6    7     69.1      0   133 l      
#>  4 2020-12-19    15     2 2020-06-13     6    5.4   62.4      0   127 l      
#>  5 2020-12-20    14     1 NA             6    7.1   66.3      0   152 l      
#>  6 2020-12-20     9     1 2020-05-23     6    7.1   65.5      0   123 l      
#>  7 2020-12-23    30     2 NA             6    8.5   63.2      0   168 l      
#>  8 2020-12-18    10     1 NA             7    6.9   69        0   119 l      
#>  9 2020-12-22    29     2 2020-05-10     7    6.3   63.6      0   127 l      
#> 10 2020-12-22    29     2 2020-04-24     7    7.4   66.6      0   134 l      
#> # … with 266 more rows, and 1 more variable: clothes <chr>