Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion NEWS.md
Original file line number Diff line number Diff line change
@@ -1,9 +1,10 @@
# BGmisc beta 1.5.2
# BGmisc 1.5.2
* More flexible ID generation for simulatePedigree
* Created ped2gen function to extract generation information from pedigree data.frames
* Added tests for ped2gen
* Fixed handling of character ID variables leading to a warning in ped2fam
* Added famIDs to phantom parents
* Tweaked how sex coding is handled to allow for unknown sex
Copy link

Copilot AI Jan 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The NEWS entry doesn't accurately describe the primary change in this PR. While handling unknown sex is part of the change, the main feature is adding code_male and code_female parameters to checkParentIDs to allow customizable sex coding. Consider updating this to something like: "Added customizable sex coding parameters (code_male, code_female, code_unknown) to checkParentIDs and related functions to provide more flexible handling of sex values".

Suggested change
* Tweaked how sex coding is handled to allow for unknown sex
* Added customizable sex coding parameters (code_male, code_female, code_unknown) to checkParentIDs and related functions to provide more flexible handling of sex values

Copilot uses AI. Check for mistakes.

# BGmisc 1.5.1
## CRAN submission
Expand Down
28 changes: 21 additions & 7 deletions R/checkParents.R
Original file line number Diff line number Diff line change
Expand Up @@ -12,11 +12,12 @@
#' @param repairsex A logical flag indicating whether to attempt repairs on sex of the parents
#' @param addphantoms A logical flag indicating whether to add phantom parents for missing parent IDs.
#' @param parentswithoutrow A logical flag indicating whether to add parents without a row in the pedigree.
#' @param famID Character. Column name for family IDs.
#' @param personID Character. Column name for individual IDs.
#' @param famID Character. Column name for family IDs.
#' @param personID Character. Column name for individual IDs.
#' @param momID Character. Column name for maternal IDs.
#' @param dadID Character. Column name for paternal IDs.
#'
#' @param code_male The code value used to represent male sex in the 'sex' column of \code{ped}.
#' @param code_female The code value used to represent female sex in the 'sex' column of \code{ped}.
#'
#' @return Depending on the value of `repair`, either a list containing validation results or a repaired dataframe is returned.
#' @examples
Expand All @@ -32,7 +33,10 @@ checkParentIDs <- function(ped, verbose = FALSE, repair = FALSE,
famID = "famID",
personID = "ID",
momID = "momID",
dadID = "dadID") {
dadID = "dadID",
code_male = NULL,
code_female = NULL
) {
Copy link

Copilot AI Jan 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indentation is inconsistent with the closing parenthesis. It should align with the function name or be properly indented.

Suggested change
) {
) {

Copilot uses AI. Check for mistakes.
Comment on lines +37 to +39
Copy link

Copilot AI Jan 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new code_male and code_female parameters in checkParentIDs lack test coverage. Tests should verify that when these parameters are provided, they are properly used to set validation_results$male_var and validation_results$female_var instead of inferring them from the data.

Copilot uses AI. Check for mistakes.
# Standardize column names in the input dataframe
ped <- standardizeColnames(ped, verbose = verbose)

Expand Down Expand Up @@ -87,18 +91,28 @@ checkParentIDs <- function(ped, verbose = FALSE, repair = FALSE,
cat("Step 2: Determining the if moms are the same sex and dads are same sex\n")
}
# Determine modal sex values for moms and dads



Comment on lines +94 to +96
Copy link

Copilot AI Jan 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty lines here serve no purpose and reduce code readability. Consider removing these blank lines.

Copilot uses AI. Check for mistakes.
mom_results <- checkParentSex(ped, parent_col = "momID", verbose = verbose)
dad_results <- checkParentSex(ped, parent_col = "dadID", verbose = verbose)

validation_results$mom_sex <- mom_results$unique_sexes
validation_results$dad_sex <- dad_results$unique_sexes
validation_results$female_var <- mom_results$modal_sex
validation_results$male_var <- dad_results$modal_sex

validation_results$wrong_sex_moms <- mom_results$inconsistent_parents
validation_results$wrong_sex_dads <- dad_results$inconsistent_parents
validation_results$female_moms <- mom_results$all_same_sex
validation_results$male_dads <- dad_results$all_same_sex

if (!is.null(code_male) && !is.null(code_female)) {
validation_results$male_var <- code_male
validation_results$female_var <- code_female
validation_results$sex_code_source <- "user_provided_codes"
} else {
validation_results$female_var <- mom_results$modal_sex
validation_results$male_var <- dad_results$modal_sex
validation_results$sex_code_source <- "modal_parent_sex"
}
# Are any parents in both momID and dadID?
momdad <- intersect(ped$dadID, ped$momID)
if (length(momdad) > 0 && !is.na(momdad)) {
Expand Down
116 changes: 81 additions & 35 deletions R/checkSex.R
Original file line number Diff line number Diff line change
Expand Up @@ -17,13 +17,14 @@
#'
#' @details This function uses the terms 'male' and 'female' in a biological context, referring to chromosomal and other biologically-based characteristics necessary for constructing genetic pedigrees. The biological aspect of sex used in genetic analysis (genotype) is distinct from the broader, richer concept of gender identity (phenotype).
#'
#' We recognize the importance of using language and methodologies that affirm and respect the full spectrum of gender identities.
#' We recognize the importance of using language and methodologies that affirm and respect the full spectrum of gender identities.
#' The developers of this package express unequivocal support for folx in the transgender
#' and LGBTQ+ communities.
#'
#' @param ped A dataframe representing the pedigree data with a 'sex' column.
#' @param code_male The current code used to represent males in the 'sex' column.
#' @param code_female The current code used to represent females in the 'sex' column. If both are NULL, no recoding is performed.
#' @param code_unknown The current code used to represent unknown or ambiguous sex in the 'sex' column. Can be NA to indicate that missing values should be treated as unknown. If NULL and both code_male and code_female are provided, values not matching either will be inferred as unknown.
#' @param verbose A logical flag indicating whether to print progress and validation messages to the console.
#' @param repair A logical flag indicating whether to attempt repairs on the sex coding.
#' @param momID The column name for maternal IDs. Default is "momID".
Expand All @@ -37,7 +38,10 @@
#' }
#' @export
#'
checkSex <- function(ped, code_male = NULL, code_female = NULL, verbose = FALSE, repair = FALSE,
checkSex <- function(ped, code_male = NULL,
code_female = NULL,
code_unknown = NULL,
verbose = FALSE, repair = FALSE,
momID = "momID",
dadID = "dadID") {
# Standardize column names in the input dataframe
Expand All @@ -61,7 +65,6 @@ checkSex <- function(ped, code_male = NULL, code_female = NULL, verbose = FALSE,
}



# Are there multiple sexes/genders in the list of dads and moms?

dad_results <- checkParentSex(ped, parent_col = dadID, verbose = verbose)
Expand Down Expand Up @@ -92,7 +95,11 @@ checkSex <- function(ped, code_male = NULL, code_female = NULL, verbose = FALSE,

if (validation_results$sex_length == 2) {
# Recode all dads to the most frequent male value
ped <- recodeSex(ped, code_male = validation_results$most_frequent_sex_dad)
ped <- recodeSex(ped,
code_male = validation_results$most_frequent_sex_dad,
code_female = validation_results$most_frequent_sex_mom,
code_unknown = code_unknown
)
Copy link

Copilot AI Jan 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indentation for closing parenthesis should be consistent with the opening line or properly aligned. This closing parenthesis should align with the opening function call on line 98.

Copilot uses AI. Check for mistakes.
# Count and record the change
num_changes <- sum(original_ped$sex != ped$sex)
# Record the change and the count
Expand Down Expand Up @@ -128,8 +135,16 @@ checkSex <- function(ped, code_male = NULL, code_female = NULL, verbose = FALSE,
#' @export
#'
#' @seealso \code{\link{checkSex}}
repairSex <- function(ped, verbose = FALSE, code_male = NULL, code_female = NULL) {
checkSex(ped = ped, verbose = verbose, repair = TRUE, code_male = code_male, code_female = code_female)
repairSex <- function(ped, verbose = FALSE,
code_male = NULL,
code_female = NULL,
code_unknown = NULL) {
checkSex(
ped = ped, verbose = verbose, repair = TRUE,
code_male = code_male,
code_female = code_female,
code_unknown = code_unknown
)
Comment on lines +138 to +147
Copy link

Copilot AI Jan 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new code_unknown parameter lacks test coverage. Tests should verify that the parameter correctly handles unknown sex values and that the recoding logic works as expected when code_unknown is provided, both when it's NA and when it's a specific value.

Copilot uses AI. Check for mistakes.
}

#' Recodes Sex Variable in a Pedigree Dataframe
Expand All @@ -142,51 +157,76 @@ repairSex <- function(ped, verbose = FALSE, code_male = NULL, code_female = NULL
#' @param recode_na The value to use for missing values. Default is NA_character_
#' @param recode_male The value to use for males. Default is "M"
#' @param recode_female The value to use for females. Default is "F"
#' @param recode_unknown The value to use for unknown values. Default is "U"
#' @inherit checkSex details
#' @return A modified version of the input data.frame \code{ped}, containing an additional or modified 'sex_recode' column where the 'sex' values are recoded according to \code{code_male}. NA values in the 'sex' column are preserved.
#' @export

recodeSex <- function(
ped, verbose = FALSE, code_male = NULL, code_na = NULL, code_female = NULL,
recode_male = "M", recode_female = "F", recode_na = NA_character_) {
ped, verbose = FALSE, code_male = NULL, code_na = NULL, code_female = NULL,
code_unknown = NULL,
recode_male = "M",
recode_female = "F",
recode_unknown = "U",
recode_na = NA_character_
) {
if (is.null(code_male) && is.null(code_female)) {
if (verbose == TRUE) {
warning("Both code male and code female are empty. No recoding was done.")
}
return(ped)
}
# First, set any code_na values to NA
if (!is.null(code_na)) {
ped$sex[ped$sex == code_na] <- NA
}
# Recode as "F" or "M" based on code_male, preserving NAs
if (!is.null(code_male) && !is.null(code_female)) {
# Initialize sex_recode as NA, preserving the length of the 'sex' column
ped$sex_recode <- recode_na
ped$sex_recode[ped$sex == code_female] <- recode_female
ped$sex_recode[ped$sex == code_male] <- recode_male
# Overwriting temp recode variable
ped$sex <- ped$sex_recode
ped$sex_recode <- NULL
} else if (!is.null(code_male) && is.null(code_female)) {
# Initialize sex_recode as NA, preserving the length of the 'sex' column
ped$sex_recode <- recode_na
ped$sex_recode[ped$sex != code_male & !is.na(ped$sex)] <- recode_female

# Initialize sex_recode as NA, preserving the length of the 'sex' column
ped$sex_recode <- recode_na


Copy link

Copilot AI Jan 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty line here serves no purpose and reduces code readability. Consider removing this blank line.

Copilot uses AI. Check for mistakes.
if (!is.null(code_male)) {
ped$sex_recode[ped$sex == code_male] <- recode_male
# Overwriting temp recode variable
ped$sex <- ped$sex_recode
ped$sex_recode <- NULL
} else if (is.null(code_male) && !is.null(code_female)) {
# Initialize sex_recode as NA, preserving the length of the 'sex' column
ped$sex_recode <- recode_na
ped$sex_recode[ped$sex != code_female & !is.na(ped$sex)] <- recode_male
}
if (!is.null(code_female)) {
ped$sex_recode[ped$sex == code_female] <- recode_female
# Overwriting temp recode variable
ped$sex <- ped$sex_recode
ped$sex_recode <- NULL
} else {
if (verbose == TRUE) {
warning("Both code male and code female are empty. No recoding was done.")
}

# handle unknown codes
if (!is.null(code_unknown) && !is.na(code_unknown)) {
ped$sex_recode[ped$sex == code_unknown] <- recode_unknown
} else if (!is.null(code_unknown) && is.na(code_unknown)) {
ped$sex_recode[is.na(ped$sex)] <- recode_unknown
Comment on lines +198 to +199
Copy link

Copilot AI Jan 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a logical error in this condition. The condition checks "!is.null(code_unknown) && is.na(code_unknown)", but if code_unknown is NULL, is.na(code_unknown) will return a logical(0) vector, not TRUE or FALSE. This condition will never execute as intended. Consider using "!is.null(code_unknown) && length(code_unknown) == 1 && is.na(code_unknown)" if the intent is to check for an explicit NA value.

Copilot uses AI. Check for mistakes.
} else if (!is.null(code_male) && !is.null(code_female)) {
ped$sex_recode[!ped$sex %in% c(code_male, code_female) & !is.na(ped$sex)] <- recode_unknown
}


Copy link

Copilot AI Jan 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty line here serves no purpose and reduces code readability. Consider removing this blank line.

Suggested change

Copilot uses AI. Check for mistakes.
# Handle cases where only one of code
# just male
if (!is.null(code_male) && is.null(code_female)) {
if (!is.null(code_unknown)) {
ped$sex_recode[ped$sex != code_male & !is.na(ped$sex) & ped$sex != code_unknown] <- recode_female
} else if (is.null(code_unknown)) {
ped$sex_recode[ped$sex != code_male & !is.na(ped$sex)] <- recode_female
}
}
# just female
if (is.null(code_male) && !is.null(code_female)) {
if (!is.null(code_unknown)) {
ped$sex_recode[ped$sex != code_female & !is.na(ped$sex) & ped$sex != code_unknown] <- recode_male
} else if (is.null(code_unknown)) {
ped$sex_recode[ped$sex != code_female & !is.na(ped$sex)] <- recode_male
}
}
Comment on lines +200 to +221
Copy link

Copilot AI Jan 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a logical conflict in the recoding flow. When both code_male and code_female are provided but code_unknown is NULL, line 201 assigns recode_unknown to non-matching values. However, lines 207-221 attempt to handle the case where only one sex code is provided, which will never execute after line 201 has already recoded those values. The logic should either: (1) not assign recode_unknown in line 200-201 when code_unknown is NULL, or (2) remove lines 207-221 since they're unreachable in this scenario.

Copilot uses AI. Check for mistakes.
Comment on lines 149 to +221
Copy link

Copilot AI Jan 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new code_unknown parameter and recode_unknown parameter lack test coverage. Tests should verify the behavior when code_unknown is provided, including the edge cases where code_unknown is NA (lines 198-199) and when it needs to infer unknown values from the data (lines 200-201).

Copilot uses AI. Check for mistakes.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot open a new pull request to apply changes based on this feedback


# Overwriting temp recode variable
ped$sex <- ped$sex_recode
ped$sex_recode <- NULL
return(ped)
}



#' Check Parental Role Sex Consistency
#'
#' Validates sex coding consistency for a given parental role (momID or dadID).
Expand Down Expand Up @@ -225,9 +265,15 @@ checkParentSex <- function(ped, parent_col, sex_col = "sex", verbose = FALSE) {
# Store the most frequent sex for moms and dads
modal_sex <- names(sort(table(parent_sexes), decreasing = TRUE))[1]

if (all(is.na(modal_sex)) && verbose == TRUE) {
cat(paste0("All parents in role ", parent_col, " have missing sex values.\n"))
}

# Type coercion based on ped$sex type
if (is.numeric(ped[[sex_col]])) {
modal_sex <- as.numeric(modal_sex)
} else if (is.character(ped[[sex_col]])) {
modal_sex <- as.character(modal_sex)
}

# List ids for dads that are female, moms that are male
Expand Down
8 changes: 7 additions & 1 deletion man/checkParentIDs.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

5 changes: 4 additions & 1 deletion man/checkSex.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

8 changes: 7 additions & 1 deletion man/recodeSex.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

12 changes: 10 additions & 2 deletions man/repairSex.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading
Loading