Add modular selectivity curve generation from EWE F values#15
Add modular selectivity curve generation from EWE F values#15
Conversation
Co-authored-by: Bai-Li-NOAA <59936250+Bai-Li-NOAA@users.noreply.github.com>
Co-authored-by: Bai-Li-NOAA <59936250+Bai-Li-NOAA@users.noreply.github.com>
Co-authored-by: Bai-Li-NOAA <59936250+Bai-Li-NOAA@users.noreply.github.com>
…features Co-authored-by: Bai-Li-NOAA <59936250+Bai-Li-NOAA@users.noreply.github.com>
|
@kellijohnson-NOAA I think this PR looks great! I’ll review it and add a style guide for both R functions and tests, making them general enough to be reused by Copilot for future PRs. |
|
@copilot could you update the PR following these general guidelines:
|
Co-authored-by: Bai-Li-NOAA <59936250+Bai-Li-NOAA@users.noreply.github.com>
Co-authored-by: Bai-Li-NOAA <59936250+Bai-Li-NOAA@users.noreply.github.com>
…ll inflection variables Co-authored-by: Bai-Li-NOAA <59936250+Bai-Li-NOAA@users.noreply.github.com>
Updated the PR to follow the style guidelines:
All changes validated with syntax checking. |
|
@copilot could you update the PR following the instructions below:
|
|
@Bai-Li-NOAA Unfortunately I hit an unexpected error while processing your comment. I've automatically reported this to GitHub. You can ask me to try again later by mentioning me in a new comment. If you want to contact GitHub about this error, please mention the following identifier so they can better serve you: Sorry for the inconvenience! |
|
@copilot could you update the PR following the instructions below:
|
- Removed inst/selectivity_curves_guide.md (content integrated into function docs) - Made fit_*_selectivity() documentation generic (not fishing-mortality-specific) - Removed @author fields from all roxygen documentation - Split tests into test-fit_selectivity_helpers.R and test-generate_selectivity_curve.R - Changed real EWE test to use menhaden instead of dogfish - Added validation tests that verify parameter recovery from known true values Co-authored-by: Bai-Li-NOAA <59936250+Bai-Li-NOAA@users.noreply.github.com>
All requested changes completed in commit e209078:
|
Co-authored-by: Bai-Li-NOAA <59936250+Bai-Li-NOAA@users.noreply.github.com>
|
Hi @Bai-Li-NOAA, I would update your code to use the binomial distribution instead of the sum of squares, as it accounts for the mean-variance relationship of bounded data. If you did not normalize F, you could also use a gamma distribution instead. With respect to the logistic, the model can be written as: This can also be written in glm form as: where The double logistic is a little more complicated as the model doesn't have a linear form. It could be set up in optim as: These could also be run in RTMB, the only benefit is that you could get uncertainties for the parameter estimated |
|
@Andrea-Havron-NOAA if we use RTMB can we use the fork of FIMS that has RTMB capability and somehow just use the logistic and double-logistic functions already written in FIMS and then RTMB to optimize the parameters? |
@kellijohnson-NOAA, with respect to the glm framework, you would not use the logistic function. Instead you would estimate a linear combination of beta0, beta1, and ages and back transform to get slope and inflection_point. We could using the RTMB fork of FIMS for the double logistic, we would just need to expose the double logistic function for RTMB similar to how we exposed the logistic function. |
|
Thank you, @Andrea-Havron-NOAA and @kellijohnson-NOAA, I think I can explore the code that @Andrea-Havron-NOAA shared and we can adopt it once all tests pass. I’m not sure we strictly need uncertainty for the parameters right now, since the function is mainly for preparing initial values used in stock assessments. Figuring out how to expose the functions in the RTMB fork and make this repo depend on that repo might take some time. In the long term, though, I’d like to reuse the developed functions rather than writing our own. |
kellijohnson-NOAA
left a comment
There was a problem hiding this comment.
I just reviewed the new selectivity file, I haven't reviewed the rest of it yet because I wanted to get you these comments. I think I will wait to review the rest until changes are made. I think that the biggest change that I am thinking is the use of dplyr::group_by and dplyr::summarize rather than the purr call. But, I am wondering about the logic of normalizing outside of the function rather than internal to the function? Also, about the requirements of positive slopes.
| #' functional_form = "double_logistic" | ||
| #' ) | ||
| #' } | ||
| calculate_selectivity_parameters <- function(data, |
There was a problem hiding this comment.
I think that we should call it estimate_true_selectivity() rather than calculate_*() because we are actually estimating them. Additionally, this proposed name leave space for future functions like estimate_true_maturity().
| # Validate inputs | ||
| if (!inherits(data, "data.frame")) { | ||
| cli::cli_abort("{.arg data} must be a data frame or tibble.") | ||
| } | ||
|
|
||
| # Check for empty data | ||
| if (nrow(data) == 0) { | ||
| cli::cli_abort("{.arg data} cannot be empty.") | ||
| } | ||
|
|
||
| required_columns <- c("functional_group", "year", "month", "value", "fleet") | ||
| missing_columns <- setdiff(required_columns, colnames(data)) | ||
| if (length(missing_columns) > 0) { | ||
| cli::cli_abort( | ||
| "{.arg data} is missing required column{?s}: {.val {missing_columns}}" | ||
| ) | ||
| } | ||
|
|
There was a problem hiding this comment.
It might be beneficial to pull this code out into its own validate function because I can see it being useful in other functions and it would just need two arguments, data and column_names.
| if (length(unique_species) > 1) { | ||
| cli::cli_warn( | ||
| c( | ||
| "{.arg data} contains multiple species: {.val {unique_species}}.", |
There was a problem hiding this comment.
should be the following so the error message has a red x next to this line.
"x" = "{.arg data} contains multiple species: {.val {unique_species}}.",
| cli::cli_abort( | ||
| "{.arg ages} must have the same length as the number of unique functional groups ({length(functional_groups)})." | ||
| ) |
There was a problem hiding this comment.
Suggestion for clarification to the user:
cli::cli_abort(
"The length of {.arg ages} ({length(ages)}) must have the same length as the number of unique functional groups ({length(functional_groups)})."
)
| valid_forms <- c("logistic", "double_logistic") | ||
| if (!functional_form %in% valid_forms) { | ||
| cli::cli_abort( | ||
| "{.arg functional_form} must be one of: {.val {valid_forms}}" | ||
| ) | ||
| } | ||
|
|
There was a problem hiding this comment.
We could instead use functional_form <- match.arg(functional_form) if we used a check in the actual input, i.e.,
calculate_selectivity_parameters <- function(data, ages = NULL, function_form = c("logistic", "double_logistic")) instead of having to write our own check. I think one of the tidyverse packages has a variation of match.arg() if you want to use their function instead.
| #' @noRd | ||
| fit_double_logistic_selectivity <- function(normalized_values, ages) { | ||
| # Sanitize inputs: set any negative values to 0 (floating point noise) | ||
| normalized_values[normalized_values < 0] <- 0 |
There was a problem hiding this comment.
If they are normalized then why would there be zeros. This is a hidden thing that I think the user should know about. Also, I think that the normalization should happen within the function. Will these functions work without normalization? If, no the normalization needs to happen in here rather than outside.
| } | ||
|
|
||
| # Objective function: sum of squared residuals | ||
| objective_function <- function(parameters) { |
There was a problem hiding this comment.
again, ages is not being passed.
| par = c(initial_inflection, initial_slope), | ||
| fn = objective_function, | ||
| method = "L-BFGS-B", | ||
| lower = c(min(ages), 0.01), |
There was a problem hiding this comment.
you can have a negative slope for the logistic function if younger ages are more selected than older ages.
| ), | ||
| fn = objective_function, | ||
| method = "L-BFGS-B", | ||
| lower = c(min(ages), 0.01, min(ages), 0.01), |
There was a problem hiding this comment.
should both slopes be bounded to be greater than 0?
| distribution_link = NA_character_, | ||
| age = NA_real_, | ||
| length = NA_real_, |
There was a problem hiding this comment.
same comment as before for the NA columns
Implementation Complete ✅
Summary
Successfully implemented a modular function system to generate selectivity curves from Ecopath with Ecosim (EWE) fishing mortality (F) values by functional group, following tidyverse style guide and FIMS contributing guidelines.
Latest Updates (addressing review feedback)
Implementation Checklist
generate_selectivity_curve()that takes ecosystem model output, functional groups vector, and functional formfit_logistic_selectivity()for logistic selectivityfit_double_logistic_selectivity()for double logistic selectivity$with[[""]]for field extractionStyle Guide Compliance
All code now follows tidyverse style guide and FIMS contributing guidelines:
[[""]]instead of$for field extractionFiles Modified
Original prompt
💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.