R/create_sample_psm.R
create_sample_psm.Rd
Select sampling units from a set of potential sampling units using propensity score matching (PSM).
create_sample_psm(
  x,
  svy,
  psu = "psu",
  match = NULL,
  pop = "pop",
  sampling_type = c("simple", "systematic")
)
Argument | Description |
---|---|
x | A data.frame of all potential primary sampling units (PSUs) to sample from, each uniquely identified and with its corresponding population. |
svy | A data.frame of survey data drawn using a probability proportional to population size (PPS) approach from the potential PSUs in x. |
psu | A single character value or a vector of values giving the variable name(s) for the primary sampling unit in x and svy. Default is "psu". |
match | A single character value giving the name of the variable in x that matches the primary sampling unit in svy. Default is NULL, which indicates that the primary sampling unit in x is the same as the primary sampling unit in svy. |
pop | A single character value corresponding to the variable name for the population figures in x. Default is "pop". |
sampling_type | Which sampling type to use: either a simple random sample ("simple") or a systematic sample ("systematic"). Default is "simple". |
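
To make the two sampling_type options concrete, here is a minimal base R sketch of how a simple random sample and a systematic sample differ when picking units from a list. It is an illustration only and not the implementation used by create_sample_psm(); the data frame psu_list, the sample size n, and all other names are hypothetical.

```r
# Hypothetical list of 20 PSUs with made-up population figures
set.seed(1)
psu_list <- data.frame(psu = 1:20, pop = sample(100:1000, 20))
n <- 5  # assumed number of PSUs to select

# Simple random sample: n rows drawn without replacement,
# each row equally likely to be chosen
simple_idx <- sample(seq_len(nrow(psu_list)), size = n)

# Systematic sample: pick a random start within the first interval,
# then take every k-th row after it
k <- floor(nrow(psu_list) / n)          # sampling interval
start <- sample(seq_len(k), size = 1)   # random start between 1 and k
systematic_idx <- seq(from = start, by = k, length.out = n)

psu_list[simple_idx, ]
psu_list[systematic_idx, ]
```

The systematic selection spreads the chosen rows evenly across the list, so the ordering of the list matters; the simple random selection ignores ordering.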
A data.frame drawn from svy that contains the primary sampling units selected by propensity score matching.
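
This page does not describe how create_sample_psm() computes or uses propensity scores internally. For readers unfamiliar with the technique, the following generic base R sketch shows one common form of propensity score matching: a logistic regression to estimate scores, followed by greedy 1:1 nearest-neighbour matching without replacement. The data, the variable names (treated, pop, score), and the matching rule are all assumptions made for illustration and should not be read as the package's method.

```r
set.seed(42)

# Hypothetical data: 'treated' flags the units that need a match,
# 'pop' is the single covariate used to estimate the propensity score
d <- data.frame(
  id = 1:60,
  treated = rep(c(1, 0), times = c(15, 45)),
  pop = c(rnorm(15, mean = 600, sd = 150), rnorm(45, mean = 500, sd = 150))
)

# Step 1: estimate propensity scores with logistic regression
fit <- glm(treated ~ pop, data = d, family = binomial())
d$score <- predict(fit, type = "response")

# Step 2: greedy 1:1 nearest-neighbour matching without replacement
treated <- d[d$treated == 1, ]
controls <- d[d$treated == 0, ]
matched <- integer(0)
for (i in seq_len(nrow(treated))) {
  # Only controls that have not been used yet remain available
  available <- controls[!(controls$id %in% matched), ]
  nearest <- available$id[which.min(abs(available$score - treated$score[i]))]
  matched <- c(matched, nearest)
}

matched_pairs <- data.frame(treated_id = treated$id, control_id = matched)
matched_pairs
```

Greedy matching depends on the order in which units are processed; other matching rules (for example, matching with replacement or with a caliper) would give different pairs.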
Mark Myatt and Ernest Guevarra
create_sample_psm(
  x = village_list, svy = sample_data, psu = c("id", "psu"),
  match = "cluster", pop = "population"
)
#> # A tibble: 276 × 11
#>    surveydate   psu   sex birthdate    age weight height oedema  muac measure
#>    <date>     <int> <int> <date>     <int>  <dbl>  <dbl>  <int> <int> <chr>
#>  1 2020-12-16    16     1 2020-05-22     6    7.8   65.6      0   140 l
#>  2 2020-12-19    13     2 2020-05-26     6    5.2   62.6      0   131 l
#>  3 2020-12-19     3     2 2020-06-19     6    7     69.1      0   133 l
#>  4 2020-12-19    15     2 2020-06-13     6    5.4   62.4      0   127 l
#>  5 2020-12-20    14     1 NA             6    7.1   66.3      0   152 l
#>  6 2020-12-20     9     1 2020-05-23     6    7.1   65.5      0   123 l
#>  7 2020-12-23    30     2 NA             6    8.5   63.2      0   168 l
#>  8 2020-12-18    10     1 NA             7    6.9   69        0   119 l
#>  9 2020-12-22    29     2 2020-05-10     7    6.3   63.6      0   127 l
#> 10 2020-12-22    29     2 2020-04-24     7    7.4   66.6      0   134 l
#> # … with 266 more rows, and 1 more variable: clothes <chr>