Harmonize a data frame

Usage

harmonize_df(
  .data,
  ...,
  .spec = NULL,
  .unspecified_columns = c("error", "drop", "keep")
)

Arguments

.data

(data.frame) A data frame to harmonize.

...

Named harmonization expressions evaluated on .data before .spec is applied, such as size = harmonize_fct(size, .lookup = ...) in the Examples.

.spec

(hrmn_spec_df) A data frame harmonization specification.

.unspecified_columns

("error", "drop", or "keep") How to handle columns in .data that are not present in .spec.
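
The three options can be pictured with a short base-R sketch. This only illustrates the behavior described above and is not the package's implementation. The helper handle_unspecified() and its arguments are hypothetical names introduced for the example.

```r
# Hypothetical helper (not part of the package): mimics the three documented
# options for columns that are missing from the spec.
handle_unspecified <- function(data, spec_cols,
                               how = c("error", "drop", "keep")) {
  # match.arg() selects the first choice when `how` is left at its default.
  how <- match.arg(how)
  extra <- setdiff(names(data), spec_cols)
  if (length(extra) == 0 || how == "keep") {
    return(data)
  }
  if (how == "error") {
    stop("Columns not present in spec: ", paste(extra, collapse = ", "))
  }
  # how == "drop": keep only the columns named in the spec.
  data[, intersect(names(data), spec_cols), drop = FALSE]
}

df <- data.frame(size = c("Small", "Large"), id = 1:2)
names(handle_unspecified(df, "size", how = "drop"))
#> [1] "size"
names(handle_unspecified(df, "size", how = "keep"))
#> [1] "size" "id"
```

In this sketch, "error" is the strictest choice: it stops as soon as .data carries a column the spec does not describe, so unexpected columns cannot pass through unnoticed.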

Value

The input .data, harmonized and returned as a tibble::tibble().

See also

Other harmonization functions: harmonize_fct()

Examples

df <- data.frame(
  size = c("Small", "Medium", "S", "M", "Large", "Lrg", "Sm"),
  id = 1:7
)

# This spec will coerce values to NA if they are not "Small", "Medium",
# or "Large".
spec <- specify_df(
  size = specify_fct(levels = c("Small", "Medium", "Large"))
)

# Harmonization rules can be applied to the data before the spec is applied.
# Here, harmonize_fct() converts "S", "M", "Sm", and "Lrg" in the size column
# to valid values.
harmonize_df(
  df,
  size = harmonize_fct(
    size,
    .lookup = c("S" = "Small", "M" = "Medium", "Sm" = "Small", "Lrg" = "Large")
  ),
  .spec = spec,
  .unspecified_columns = "keep"
)
#> # A tibble: 7 × 2
#>   size      id
#>   <fct>  <int>
#> 1 Small      1
#> 2 Medium     2
#> 3 Small      3
#> 4 Medium     4
#> 5 Large      5
#> 6 Large      6
#> 7 Small      7
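
To see why the lookup matters, the same idea can be sketched in base R without the package. This is an illustration under the assumption that the spec behaves like factor() with fixed levels, as the comment on the spec above describes. It is not how harmonize_fct() is implemented.

```r
size <- c("Small", "Medium", "S", "M", "Large", "Lrg", "Sm")
levels_ok <- c("Small", "Medium", "Large")
lookup <- c("S" = "Small", "M" = "Medium", "Sm" = "Small", "Lrg" = "Large")

# Without recoding, values outside the allowed levels become NA.
without_lookup <- factor(size, levels = levels_ok)
sum(is.na(without_lookup))
#> [1] 4

# Recode values that appear as names in the lookup; leave the rest unchanged.
recoded <- ifelse(size %in% names(lookup), lookup[size], size)
with_lookup <- factor(recoded, levels = levels_ok)
sum(is.na(with_lookup))
#> [1] 0
```

Applying the lookup first keeps all seven rows usable, which matches the output shown above, where no size value is NA.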