-
Notifications
You must be signed in to change notification settings - Fork 3
Add customizable sex coding to checkParentIDs #106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
d212334
83c5857
4060a62
63182ab
f0ae841
e37cd10
55c2afc
fb30e5b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -12,11 +12,12 @@ | |||||
| #' @param repairsex A logical flag indicating whether to attempt repairs on sex of the parents | ||||||
| #' @param addphantoms A logical flag indicating whether to add phantom parents for missing parent IDs. | ||||||
| #' @param parentswithoutrow A logical flag indicating whether to add parents without a row in the pedigree. | ||||||
| #' @param famID Character. Column name for family IDs. | ||||||
| #' @param personID Character. Column name for individual IDs. | ||||||
| #' @param famID Character. Column name for family IDs. | ||||||
| #' @param personID Character. Column name for individual IDs. | ||||||
| #' @param momID Character. Column name for maternal IDs. | ||||||
| #' @param dadID Character. Column name for paternal IDs. | ||||||
| #' | ||||||
| #' @param code_male The code value used to represent male sex in the 'sex' column of \code{ped}. | ||||||
| #' @param code_female The code value used to represent female sex in the 'sex' column of \code{ped}. | ||||||
| #' | ||||||
| #' @return Depending on the value of `repair`, either a list containing validation results or a repaired dataframe is returned. | ||||||
| #' @examples | ||||||
|
|
@@ -32,7 +33,10 @@ checkParentIDs <- function(ped, verbose = FALSE, repair = FALSE, | |||||
| famID = "famID", | ||||||
| personID = "ID", | ||||||
| momID = "momID", | ||||||
| dadID = "dadID") { | ||||||
| dadID = "dadID", | ||||||
| code_male = NULL, | ||||||
| code_female = NULL | ||||||
| ) { | ||||||
|
||||||
| ) { | |
| ) { |
Copilot
AI
Jan 2, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new code_male and code_female parameters in checkParentIDs lack test coverage. Tests should verify that when these parameters are provided, they are properly used to set validation_results$male_var and validation_results$female_var instead of inferring them from the data.
Copilot
AI
Jan 2, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Empty lines here serve no purpose and reduce code readability. Consider removing these blank lines.
| Original file line number | Diff line number | Diff line change | ||
|---|---|---|---|---|
|
|
@@ -17,13 +17,14 @@ | |||
| #' | ||||
| #' @details This function uses the terms 'male' and 'female' in a biological context, referring to chromosomal and other biologically-based characteristics necessary for constructing genetic pedigrees. The biological aspect of sex used in genetic analysis (genotype) is distinct from the broader, richer concept of gender identity (phenotype). | ||||
| #' | ||||
| #' We recognize the importance of using language and methodologies that affirm and respect the full spectrum of gender identities. | ||||
| #' We recognize the importance of using language and methodologies that affirm and respect the full spectrum of gender identities. | ||||
| #' The developers of this package express unequivocal support for folx in the transgender | ||||
| #' and LGBTQ+ communities. | ||||
| #' | ||||
| #' @param ped A dataframe representing the pedigree data with a 'sex' column. | ||||
| #' @param code_male The current code used to represent males in the 'sex' column. | ||||
| #' @param code_female The current code used to represent females in the 'sex' column. If both are NULL, no recoding is performed. | ||||
| #' @param code_unknown The current code used to represent unknown or ambiguous sex in the 'sex' column. Can be NA to indicate that missing values should be treated as unknown. If NULL and both code_male and code_female are provided, values not matching either will be inferred as unknown. | ||||
| #' @param verbose A logical flag indicating whether to print progress and validation messages to the console. | ||||
| #' @param repair A logical flag indicating whether to attempt repairs on the sex coding. | ||||
| #' @param momID The column name for maternal IDs. Default is "momID". | ||||
|
|
@@ -37,7 +38,10 @@ | |||
| #' } | ||||
| #' @export | ||||
| #' | ||||
| checkSex <- function(ped, code_male = NULL, code_female = NULL, verbose = FALSE, repair = FALSE, | ||||
| checkSex <- function(ped, code_male = NULL, | ||||
| code_female = NULL, | ||||
| code_unknown = NULL, | ||||
| verbose = FALSE, repair = FALSE, | ||||
| momID = "momID", | ||||
| dadID = "dadID") { | ||||
| # Standardize column names in the input dataframe | ||||
|
|
@@ -61,7 +65,6 @@ checkSex <- function(ped, code_male = NULL, code_female = NULL, verbose = FALSE, | |||
| } | ||||
|
|
||||
|
|
||||
|
|
||||
| # Are there multiple sexes/genders in the list of dads and moms? | ||||
|
|
||||
| dad_results <- checkParentSex(ped, parent_col = dadID, verbose = verbose) | ||||
|
|
@@ -92,7 +95,11 @@ checkSex <- function(ped, code_male = NULL, code_female = NULL, verbose = FALSE, | |||
|
|
||||
| if (validation_results$sex_length == 2) { | ||||
| # Recode all dads to the most frequent male value | ||||
| ped <- recodeSex(ped, code_male = validation_results$most_frequent_sex_dad) | ||||
| ped <- recodeSex(ped, | ||||
| code_male = validation_results$most_frequent_sex_dad, | ||||
| code_female = validation_results$most_frequent_sex_mom, | ||||
| code_unknown = code_unknown | ||||
| ) | ||||
|
||||
| # Count and record the change | ||||
| num_changes <- sum(original_ped$sex != ped$sex) | ||||
| # Record the change and the count | ||||
|
|
@@ -128,8 +135,16 @@ checkSex <- function(ped, code_male = NULL, code_female = NULL, verbose = FALSE, | |||
| #' @export | ||||
| #' | ||||
| #' @seealso \code{\link{checkSex}} | ||||
| repairSex <- function(ped, verbose = FALSE, code_male = NULL, code_female = NULL) { | ||||
| checkSex(ped = ped, verbose = verbose, repair = TRUE, code_male = code_male, code_female = code_female) | ||||
| repairSex <- function(ped, verbose = FALSE, | ||||
| code_male = NULL, | ||||
| code_female = NULL, | ||||
| code_unknown = NULL) { | ||||
| checkSex( | ||||
| ped = ped, verbose = verbose, repair = TRUE, | ||||
| code_male = code_male, | ||||
| code_female = code_female, | ||||
| code_unknown = code_unknown | ||||
| ) | ||||
|
Comment on lines
+138
to
+147
|
||||
| } | ||||
|
|
||||
| #' Recodes Sex Variable in a Pedigree Dataframe | ||||
|
|
@@ -142,51 +157,76 @@ repairSex <- function(ped, verbose = FALSE, code_male = NULL, code_female = NULL | |||
| #' @param recode_na The value to use for missing values. Default is NA_character_ | ||||
| #' @param recode_male The value to use for males. Default is "M" | ||||
| #' @param recode_female The value to use for females. Default is "F" | ||||
| #' @param recode_unknown The value to use for unknown values. Default is "U" | ||||
| #' @inherit checkSex details | ||||
| #' @return A modified version of the input data.frame \code{ped}, containing an additional or modified 'sex_recode' column where the 'sex' values are recoded according to \code{code_male}. NA values in the 'sex' column are preserved. | ||||
| #' @export | ||||
|
|
||||
| recodeSex <- function( | ||||
| ped, verbose = FALSE, code_male = NULL, code_na = NULL, code_female = NULL, | ||||
| recode_male = "M", recode_female = "F", recode_na = NA_character_) { | ||||
| ped, verbose = FALSE, code_male = NULL, code_na = NULL, code_female = NULL, | ||||
| code_unknown = NULL, | ||||
| recode_male = "M", | ||||
| recode_female = "F", | ||||
| recode_unknown = "U", | ||||
| recode_na = NA_character_ | ||||
| ) { | ||||
| if (is.null(code_male) && is.null(code_female)) { | ||||
| if (verbose == TRUE) { | ||||
| warning("Both code male and code female are empty. No recoding was done.") | ||||
| } | ||||
| return(ped) | ||||
| } | ||||
| # First, set any code_na values to NA | ||||
| if (!is.null(code_na)) { | ||||
| ped$sex[ped$sex == code_na] <- NA | ||||
| } | ||||
| # Recode as "F" or "M" based on code_male, preserving NAs | ||||
| if (!is.null(code_male) && !is.null(code_female)) { | ||||
| # Initialize sex_recode as NA, preserving the length of the 'sex' column | ||||
| ped$sex_recode <- recode_na | ||||
| ped$sex_recode[ped$sex == code_female] <- recode_female | ||||
| ped$sex_recode[ped$sex == code_male] <- recode_male | ||||
| # Overwriting temp recode variable | ||||
| ped$sex <- ped$sex_recode | ||||
| ped$sex_recode <- NULL | ||||
| } else if (!is.null(code_male) && is.null(code_female)) { | ||||
| # Initialize sex_recode as NA, preserving the length of the 'sex' column | ||||
| ped$sex_recode <- recode_na | ||||
| ped$sex_recode[ped$sex != code_male & !is.na(ped$sex)] <- recode_female | ||||
|
|
||||
| # Initialize sex_recode as NA, preserving the length of the 'sex' column | ||||
| ped$sex_recode <- recode_na | ||||
|
|
||||
|
|
||||
|
||||
| if (!is.null(code_male)) { | ||||
| ped$sex_recode[ped$sex == code_male] <- recode_male | ||||
| # Overwriting temp recode variable | ||||
| ped$sex <- ped$sex_recode | ||||
| ped$sex_recode <- NULL | ||||
| } else if (is.null(code_male) && !is.null(code_female)) { | ||||
| # Initialize sex_recode as NA, preserving the length of the 'sex' column | ||||
| ped$sex_recode <- recode_na | ||||
| ped$sex_recode[ped$sex != code_female & !is.na(ped$sex)] <- recode_male | ||||
| } | ||||
| if (!is.null(code_female)) { | ||||
| ped$sex_recode[ped$sex == code_female] <- recode_female | ||||
| # Overwriting temp recode variable | ||||
| ped$sex <- ped$sex_recode | ||||
| ped$sex_recode <- NULL | ||||
| } else { | ||||
| if (verbose == TRUE) { | ||||
| warning("Both code male and code female are empty. No recoding was done.") | ||||
| } | ||||
|
|
||||
| # handle unknown codes | ||||
| if (!is.null(code_unknown) && !is.na(code_unknown)) { | ||||
| ped$sex_recode[ped$sex == code_unknown] <- recode_unknown | ||||
| } else if (!is.null(code_unknown) && is.na(code_unknown)) { | ||||
| ped$sex_recode[is.na(ped$sex)] <- recode_unknown | ||||
|
Comment on lines
+198
to
+199
|
||||
| } else if (!is.null(code_male) && !is.null(code_female)) { | ||||
| ped$sex_recode[!ped$sex %in% c(code_male, code_female) & !is.na(ped$sex)] <- recode_unknown | ||||
| } | ||||
|
|
||||
|
|
||||
|
||||
Copilot
AI
Jan 2, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a logical conflict in the recoding flow. When both code_male and code_female are provided but code_unknown is NULL, line 201 assigns recode_unknown to non-matching values. However, lines 207-221 attempt to handle the case where only one sex code is provided, which will never execute after line 201 has already recoded those values. The logic should either: (1) not assign recode_unknown in line 200-201 when code_unknown is NULL, or (2) remove lines 207-221 since they're unreachable in this scenario.
Copilot
AI
Jan 2, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new code_unknown parameter and recode_unknown parameter lack test coverage. Tests should verify the behavior when code_unknown is provided, including the edge cases where code_unknown is NA (lines 198-199) and when it needs to infer unknown values from the data (lines 200-201).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot open a new pull request to apply changes based on this feedback
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The NEWS entry doesn't accurately describe the primary change in this PR. While handling unknown sex is part of the change, the main feature is adding code_male and code_female parameters to checkParentIDs to allow customizable sex coding. Consider updating this to something like: "Added customizable sex coding parameters (code_male, code_female, code_unknown) to checkParentIDs and related functions to provide more flexible handling of sex values".