Add modular selectivity curve generation from EWE F values #15

Open: Copilot wants to merge 16 commits into main from copilot/create-selectivity-curve-function
Conversation

Copilot AI commented Feb 2, 2026

Implementation Complete ✅

Summary

Successfully implemented a modular function system to generate selectivity curves from Ecopath with Ecosim (EWE) fishing mortality (F) values by functional group, following the tidyverse style guide and the FIMS contributing guidelines.

Latest Updates (addressing review feedback)

  • Removed inst/selectivity_curves_guide.md and integrated content into function documentation
  • Made fit_*_selectivity() documentation generic (not specific to fishing mortality)
  • Removed @author fields from roxygen documentation
  • Split test file into test-fit_selectivity_helpers.R and test-generate_selectivity_curve.R
  • Changed real EWE data test to use menhaden instead of dogfish
  • Added validation tests that recover known parameters from generated data
  • Updated man files to reflect documentation changes

Implementation Checklist

  • Create main function generate_selectivity_curve() that takes ecosystem model output, functional groups vector, and functional form
  • Create helper function fit_logistic_selectivity() for logistic selectivity
  • Create helper function fit_double_logistic_selectivity() for double logistic selectivity
  • Add comprehensive documentation and examples
  • Create tests for the new functions (now with validation tests)
  • Validate implementation with real data and run tests
  • Address all code review feedback
  • Follow tidyverse style guide: Replace $ with [[""]] for field extraction
  • Follow tidyverse style guide: Use full descriptive variable names
  • Ensure variable name consistency throughout

Style Guide Compliance

All code now follows tidyverse style guide and FIMS contributing guidelines:

  • ✅ Use [[""]] instead of $ for field extraction
  • ✅ Full descriptive variable names (no abbreviations)
  • ✅ Consistent naming conventions
  • ✅ Tests follow testthat template format
  • ✅ Generic documentation (not fishing-mortality-specific)
  • ✅ Validation tests verify parameter recovery
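As a small illustration of the field-extraction rule, consider a hypothetical fit object (the object and values here are made up for demonstration, not from the PR):

```r
# Hypothetical list mimicking an optim() result
fit <- list(par = c(2.5, 1.1), convergence = 0)

# Preferred in this repo: [[""]] extraction
inflection_point <- fit[["par"]][1]

# Discouraged: fit$par ($ allows partial matching, which can silently
# pick the wrong field)
stopifnot(identical(inflection_point, 2.5))
```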

Files Modified

  • R/generate_selectivity_curve.R - Main function with enhanced documentation
  • R/fit_selectivity_helpers.R - Helper functions with generic documentation
  • tests/testthat/test-fit_selectivity_helpers.R - Tests for helper functions with validation
  • tests/testthat/test-generate_selectivity_curve.R - Tests for main function with validation
  • man/generate_selectivity_curve.Rd - Updated documentation
  • NAMESPACE - Export declaration
  • inst/selectivity_curves_guide.md - REMOVED (content in function docs)
Original prompt

This section details the original issue you should resolve

Currently, the ecosystem model (EWE) model specifies fishing mortality by functional group, where functional groups can be thought of as age groups. We need to be able to look at the specified F for each functional group and use it to create a selectivity curve. The selectivity curve should be generated by specifying a form, e.g., logistic, double logistic, random walk at age, etc., with parameters that will create that exact form. Can you please create a function that will take three inputs, (1) the output from ecosystemom, (2) a vector of strings that contain the functional groups that are of interest where the vector is ordered in the same order that we want the x axis of the selectivity curve, and (3) the functional form that we want the selectivity curve to take where this will be a string that accepts either “logistic” or “double logistic” and we will be able to add more types later. The function will then use helper functions created by this PR, where we need one helper function for each available functional form. The helper functions will take an input vector of F values for each functional group. From these F values, it will then find the parameters that fit the selectivity the best using the functional form specified in the helper function. The global function will know which helper function to use based on the functional form that the user input. Each helper function will return the parameters required to fit the curve in a tibble format. We provide example tibbles for both the logistic and double logistic below. These parameters can then be used as input for a FIMS model to test the ability of the EM to fit to the data. The function should be modularized so that it will be easy to add new functional forms, e.g., “double normal”, “random walk at age”, at a later date.

The logistic and double logistic examples:

| model_family | module_name | fleet_name | module_type | label | distribution_link | age | length | time | value | estimation_type | distribution_type | distribution |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| catch_at_age | Selectivity | fleet1 | Logistic | inflection_point | NA | NA | NA | NA | 2 | fixed_effects | NA | NA |
| catch_at_age | Selectivity | fleet1 | Logistic | slope | NA | NA | NA | NA | 1 | fixed_effects | NA | NA |

| model_family | module_name | fleet_name | module_type | label | distribution_link | age | length | time | value | estimation_type | distribution_type | distribution |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| catch_at_age | Selectivity | fleet1 | DoubleLogistic | inflection_point_asc | NA | NA | NA | NA | 2 | fixed_effects | NA | NA |
| catch_at_age | Selectivity | fleet1 | DoubleLogistic | slope_asc | NA | NA | NA | NA | 1 | fixed_effects | NA | NA |
| catch_at_age | Selectivity | fleet1 | DoubleLogistic | inflection_point_desc | NA | NA | NA | NA | 2 | fixed_effects | NA | NA |
| catch_at_age | Selectivity | fleet1 | DoubleLogistic | slope_desc | NA | NA | NA | NA | 1 | fixed_effects | NA | NA |

</issue_description>
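The modular dispatch the issue asks for might be sketched as follows. The function names come from the issue text, but the bodies here are illustrative stubs only, not the PR's implementation:

```r
# Illustrative stubs: the real helpers estimate parameters from the F values
fit_logistic_selectivity <- function(values) {
  data.frame(
    label = c("inflection_point", "slope"),
    value = NA_real_
  )
}
fit_double_logistic_selectivity <- function(values) {
  data.frame(
    label = c("inflection_point_asc", "slope_asc",
              "inflection_point_desc", "slope_desc"),
    value = NA_real_
  )
}

generate_selectivity_curve <- function(ecosystem_output,
                                       functional_groups,
                                       functional_form = c("logistic", "double_logistic")) {
  functional_form <- match.arg(functional_form)
  # Order F values by the user-supplied functional group ordering
  f_values <- unlist(ecosystem_output[functional_groups])
  switch(functional_form,
    logistic = fit_logistic_selectivity(f_values),
    double_logistic = fit_double_logistic_selectivity(f_values)
  )
}
```

With this shape, adding a new form (e.g. "double_normal") only requires a new helper plus one `switch()` entry.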

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 4 commits February 2, 2026 17:59

Co-authored-by: Bai-Li-NOAA <59936250+Bai-Li-NOAA@users.noreply.github.com>

…features
Copilot AI changed the title [WIP] Add function to generate selectivity curves from F values Add modular selectivity curve generation from EWE F values Feb 2, 2026
Copilot AI requested a review from Bai-Li-NOAA February 2, 2026 18:04
@Bai-Li-NOAA (Collaborator)

@kellijohnson-NOAA I think this PR looks great! I’ll review it and add a style guide for both R functions and tests, making them general enough to be reused by Copilot for future PRs.

@Bai-Li-NOAA (Collaborator)

@copilot could you update the PR following these general guidelines:

  • R functions: Follow the tidyverse style guide and any relevant guidance in the FIMS contributing guide. Use [[""]] instead of $ when extracting fields from an object.

  • R tests: Follow the {testthat} template from the repo when writing tests: use_template.R.

  • Variables: Use full, descriptive names whenever possible to minimize later additions to WORDLIST.

Copilot AI and others added 3 commits February 6, 2026 18:42

Co-authored-by: Bai-Li-NOAA <59936250+Bai-Li-NOAA@users.noreply.github.com>

…ll inflection variables

Copilot AI commented Feb 6, 2026

@copilot could you update the PR following these general guidelines:

Updated the PR to follow the style guidelines:

  1. Field extraction: Replaced all $ with [[""]] (commit 75414fc)
  2. Variable naming: Used full descriptive names instead of abbreviations (commits 2a6ebbf, 135e899):
    • midpoint_index instead of mid_idx
    • initial_inflection instead of init_inflection
    • inflection_point_ascending instead of infl_asc
    • inflection_point_descending instead of infl_desc
    • And all other variable names expanded to be fully descriptive
  3. Tests: Already following the testthat template from use_template.R

All changes validated with syntax checking.

@Bai-Li-NOAA (Collaborator)

@copilot could you update the PR following the instructions below:

  • Remove inst/selectivity_curves_guide.md and move any useful content into the corresponding function documentation in the .R files.
  • In fit_logistic_selectivity() and fit_double_logistic_selectivity(), make the documentation generic. These functions should not be framed as fitting fishing mortality only, since they can also be used to fit selectivity curves for other quantities (e.g., maturity).
  • Remove the @author field from the roxygen documentation (or clearly indicate when content was generated with Copilot).
  • Split the current tests/testthat/test-generate_selectivity_curve.R into two files: one for testing the fit_*_selectivity() functions and one for testing generate_selectivity_curve().
  • In tests/testthat/test-generate_selectivity_curve.R, when testing with real EwE data, use the menhaden matrix as the example instead of dogfish.
  • The current {testthat} tests only check that the functions run, not that they return correct parameter estimates. Add validation-style tests that:
    • Start from known “true” selectivity parameter values,
    • Generate selectivity and F values from those parameters,
    • Fit the model using the R functions, and
    • Compare the estimated parameters to the true values.


Copilot AI commented Feb 9, 2026

@Bai-Li-NOAA Unfortunately I hit an unexpected error while processing your comment. I've automatically reported this to GitHub.

You can ask me to try again later by mentioning me in a new comment.

If you want to contact GitHub about this error, please mention the following identifier so they can better serve you: e887fbb2-5e18-4d1c-a531-ffd0789eb642

Sorry for the inconvenience!

@Bai-Li-NOAA (Collaborator)

@copilot could you update the PR following the instructions below:

  • Remove inst/selectivity_curves_guide.md and move any useful content into the corresponding function documentation in the .R files.
  • In fit_logistic_selectivity() and fit_double_logistic_selectivity(), make the documentation generic. These functions should not be framed as fitting fishing mortality only, since they can also be used to fit selectivity curves for other quantities (e.g., maturity).
  • Remove the @author field from the roxygen documentation (or clearly indicate when content was generated with Copilot).
  • Split the current tests/testthat/test-generate_selectivity_curve.R into two files: one for testing the fit_*_selectivity() functions and one for testing generate_selectivity_curve().
  • In tests/testthat/test-generate_selectivity_curve.R, when testing with real EwE data, use the menhaden matrix as the example instead of dogfish.
  • The current {testthat} tests only check that the functions run, not that they return correct parameter estimates. Add validation-style tests that:
    • Start from known “true” selectivity parameter values,
    • Generate selectivity and F values from those parameters,
    • Fit the model using the R functions, and
    • Compare the estimated parameters to the true values.

- Removed inst/selectivity_curves_guide.md (content integrated into function docs)
- Made fit_*_selectivity() documentation generic (not fishing-mortality-specific)
- Removed @author fields from all roxygen documentation
- Split tests into test-fit_selectivity_helpers.R and test-generate_selectivity_curve.R
- Changed real EWE test to use menhaden instead of dogfish
- Added validation tests that verify parameter recovery from known true values

Co-authored-by: Bai-Li-NOAA <59936250+Bai-Li-NOAA@users.noreply.github.com>

Copilot AI commented Feb 12, 2026

@copilot could you update the PR following the instructions below:

  • Remove inst/selectivity_curves_guide.md and move any useful content into the corresponding function documentation in the .R fil...

All requested changes completed in commit e209078:

  1. ✅ Removed inst/selectivity_curves_guide.md - content integrated into function documentation

  2. ✅ Made fit_logistic_selectivity() and fit_double_logistic_selectivity() documentation generic (references to "values" and "index" instead of "F values" and "age")

  3. ✅ Removed @author field from all roxygen documentation

  4. ✅ Split tests into:

    • test-fit_selectivity_helpers.R - tests for fit_*_selectivity() functions
    • test-generate_selectivity_curve.R - tests for generate_selectivity_curve()
  5. ✅ Changed real EWE data test to use menhaden (line 80-95 in test-generate_selectivity_curve.R)

  6. ✅ Added validation tests that:

    • Start from known "true" selectivity parameters
    • Generate selectivity values from those parameters
    • Fit the model using the R functions
    • Compare estimated to true values within tolerance

    Validation tests added in both files for logistic and double logistic forms.
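A minimal, self-contained sketch of one such parameter-recovery test (base R only; the true values, starting values, and tolerances here are arbitrary illustrative choices, not the PR's):

```r
# Known "true" parameters (arbitrary illustrative values)
ages <- 1:10
true_inflection <- 4
true_slope <- 1.2
observed <- 1 / (1 + exp(-true_slope * (ages - true_inflection)))

# Refit by minimizing the sum of squared residuals, as in the current PR
objective_function <- function(parameters) {
  predicted <- 1 / (1 + exp(-parameters[2] * (ages - parameters[1])))
  sum((observed - predicted)^2)
}
fit <- optim(
  par = c(3, 1),
  fn = objective_function,
  method = "L-BFGS-B",
  lower = c(min(ages), 0.01)
)

# Estimates should recover the true values on noiseless data
stopifnot(abs(fit[["par"]][1] - true_inflection) < 0.1)
stopifnot(abs(fit[["par"]][2] - true_slope) < 0.1)
```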

Co-authored-by: Bai-Li-NOAA <59936250+Bai-Li-NOAA@users.noreply.github.com>
@Bai-Li-NOAA Bai-Li-NOAA marked this pull request as ready for review February 23, 2026 16:03
@Andrea-Havron-NOAA

Hi @Bai-Li-NOAA, I would update your code to use the binomial distribution instead of the sum of squares, as it accounts for the mean-variance relationship of bounded data. If you did not normalize F, you could also use a gamma distribution instead.

With respect to the logistic, the model can be written as:

logit_selectivity = beta0 + beta1 * ages
pred = 1 / (1 + exp(-logit_selectivity))
-sum(dbinom(round(normalized_values * n_trials), size = n_trials, prob = pred, log = TRUE))

This can also be written in glm form as:

glm_fit <- glm(normalized_values ~ ages, family = binomial(link = "logit"))
summary(glm_fit)

where inflection_point = -beta0/beta1 and slope = beta1
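A runnable sketch of this glm approach and back-transform, on simulated noiseless data (the true values and `n_trials` are illustrative; proportions are converted to pseudo-counts so `glm()` sees integer successes):

```r
# Simulated noiseless logistic selectivity (true values are arbitrary)
ages <- 1:10
true_inflection <- 4
true_slope <- 1.2
normalized_values <- 1 / (1 + exp(-true_slope * (ages - true_inflection)))

# Convert proportions to pseudo-counts of successes out of n_trials
n_trials <- 100
successes <- round(normalized_values * n_trials)
glm_fit <- glm(
  cbind(successes, n_trials - successes) ~ ages,
  family = binomial(link = "logit")
)

# Back-transform the coefficients as described above
beta0 <- coef(glm_fit)[["(Intercept)"]]
beta1 <- coef(glm_fit)[["ages"]]
inflection_point <- -beta0 / beta1  # approximately true_inflection
slope <- beta1                      # approximately true_slope
```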

The double logistic is a little more complicated as the model doesn't have a linear form. It could be set up in optim as:

objective_function_binom <- function(parameters, ages, normalized_values, n_trials = 100) {
  predicted <- double_logistic_function(parameters, ages)
  # Avoid 0/1 for log-likelihood
  predicted <- pmin(pmax(predicted, 1e-6), 1 - 1e-6)
  # Binomial log-likelihood (sum over ages)
  -sum(dbinom(round(normalized_values * n_trials), size = n_trials, prob = predicted, log = TRUE))
}
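To make the objective above runnable, here is a sketch with an assumed `double_logistic_function` (a product of ascending and descending logistics — one common parameterization, not necessarily the PR's) and arbitrary "true" parameters:

```r
# Assumed form: product of ascending and descending logistic curves
double_logistic_function <- function(parameters, ages) {
  ascending <- 1 / (1 + exp(-parameters[2] * (ages - parameters[1])))
  descending <- 1 / (1 + exp(parameters[4] * (ages - parameters[3])))
  ascending * descending
}

objective_function_binom <- function(parameters, ages, normalized_values, n_trials = 100) {
  predicted <- double_logistic_function(parameters, ages)
  predicted <- pmin(pmax(predicted, 1e-6), 1 - 1e-6)  # avoid 0/1 in the log-likelihood
  -sum(dbinom(round(normalized_values * n_trials), size = n_trials,
              prob = predicted, log = TRUE))
}

# Recover arbitrary "true" parameters from noiseless data
ages <- 1:15
true_parameters <- c(4, 1.5, 11, 1.0)  # infl_asc, slope_asc, infl_desc, slope_desc
normalized_values <- double_logistic_function(true_parameters, ages)

fit <- optim(
  par = c(3, 1, 10, 1),
  fn = objective_function_binom,
  ages = ages,
  normalized_values = normalized_values,
  method = "L-BFGS-B",
  lower = c(min(ages), 0.01, min(ages), 0.01)
)
```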

These could also be run in RTMB; the only added benefit is that you could get uncertainties for the estimated inflection_point and slope parameters.

@kellijohnson-NOAA (Contributor)

@Andrea-Havron-NOAA if we use RTMB can we use the fork of FIMS that has RTMB capability and somehow just use the logistic and double-logistic functions already written in FIMS and then RTMB to optimize the parameters?

@Andrea-Havron-NOAA

@Andrea-Havron-NOAA if we use RTMB can we use the fork of FIMS that has RTMB capability and somehow just use the logistic and double-logistic functions already written in FIMS and then RTMB to optimize the parameters?

@kellijohnson-NOAA, with respect to the glm framework, you would not use the logistic function. Instead you would estimate a linear combination of beta0, beta1, and ages and back-transform to get slope and inflection_point. We could use the RTMB fork of FIMS for the double logistic; we would just need to expose the double logistic function for RTMB, similar to how we exposed the logistic function.

@Bai-Li-NOAA (Collaborator)

Thank you, @Andrea-Havron-NOAA and @kellijohnson-NOAA, I think I can explore the code that @Andrea-Havron-NOAA shared and we can adopt it once all tests pass. I’m not sure we strictly need uncertainty for the parameters right now, since the function is mainly for preparing initial values used in stock assessments. Figuring out how to expose the functions in the RTMB fork and make this repo depend on that repo might take some time. In the long term, though, I’d like to reuse the developed functions rather than writing our own.

@kellijohnson-NOAA self-requested a review February 26, 2026 17:15

@kellijohnson-NOAA left a comment (Contributor)


I just reviewed the new selectivity file; I haven't reviewed the rest of it yet because I wanted to get you these comments, and I will wait to review the rest until changes are made. The biggest change I'm suggesting is the use of dplyr::group_by() and dplyr::summarize() rather than the purrr call. But I am also wondering about the logic of normalizing outside of the function rather than internal to the function, and about the requirement of positive slopes.

#' functional_form = "double_logistic"
#' )
#' }
calculate_selectivity_parameters <- function(data,

I think that we should call it estimate_true_selectivity() rather than calculate_*() because we are actually estimating them. Additionally, the proposed name leaves space for future functions like estimate_true_maturity().

Comment on lines +78 to +95
# Validate inputs
if (!inherits(data, "data.frame")) {
cli::cli_abort("{.arg data} must be a data frame or tibble.")
}

# Check for empty data
if (nrow(data) == 0) {
cli::cli_abort("{.arg data} cannot be empty.")
}

required_columns <- c("functional_group", "year", "month", "value", "fleet")
missing_columns <- setdiff(required_columns, colnames(data))
if (length(missing_columns) > 0) {
cli::cli_abort(
"{.arg data} is missing required column{?s}: {.val {missing_columns}}"
)
}


It might be beneficial to pull this code out into its own validate function, because I can see it being useful in other functions, and it would need just two arguments: data and column_names.
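A possible standalone validator along these lines (the name validate_data_columns() is hypothetical; the PR uses cli::cli_abort(), but base stop() is used here so the sketch runs without {cli}):

```r
# Hypothetical extracted validator, per the review suggestion
validate_data_columns <- function(data, column_names) {
  if (!inherits(data, "data.frame")) {
    stop("`data` must be a data frame or tibble.")
  }
  if (nrow(data) == 0) {
    stop("`data` cannot be empty.")
  }
  missing_columns <- setdiff(column_names, colnames(data))
  if (length(missing_columns) > 0) {
    stop(
      "`data` is missing required columns: ",
      paste(missing_columns, collapse = ", ")
    )
  }
  invisible(data)
}
```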

if (length(unique_species) > 1) {
cli::cli_warn(
c(
"{.arg data} contains multiple species: {.val {unique_species}}.",

should be the following so the error message has a red x next to this line.

          "x" = "{.arg data} contains multiple species: {.val {unique_species}}.",

Comment on lines +128 to +130
cli::cli_abort(
"{.arg ages} must have the same length as the number of unique functional groups ({length(functional_groups)})."
)

Suggestion for clarification to the user:

      cli::cli_abort(
        "The length of {.arg ages} ({length(ages)}) must equal the number of unique functional groups ({length(functional_groups)})."
      )

Comment on lines +144 to +150
valid_forms <- c("logistic", "double_logistic")
if (!functional_form %in% valid_forms) {
cli::cli_abort(
"{.arg functional_form} must be one of: {.val {valid_forms}}"
)
}


We could instead use functional_form <- match.arg(functional_form) with the choices declared in the function signature, i.e.,
calculate_selectivity_parameters <- function(data, ages = NULL, functional_form = c("logistic", "double_logistic")), instead of having to write our own check. I think one of the tidyverse packages has a variation of match.arg() (rlang::arg_match()) if you want to use their function instead.
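For illustration, the match.arg() pattern looks like this (other arguments omitted for brevity):

```r
# match.arg() validates against the choices in the signature and
# defaults to the first choice when none is supplied
calculate_selectivity_parameters <- function(
    functional_form = c("logistic", "double_logistic")) {
  functional_form <- match.arg(functional_form)
  functional_form
}

calculate_selectivity_parameters("double_logistic")  # "double_logistic"
calculate_selectivity_parameters()                   # "logistic" (default)
# calculate_selectivity_parameters("spline")  # errors, listing the valid choices
```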

#' @noRd
fit_double_logistic_selectivity <- function(normalized_values, ages) {
# Sanitize inputs: set any negative values to 0 (floating point noise)
normalized_values[normalized_values < 0] <- 0

If they are normalized, then why would there be zeros? This is hidden behavior that I think the user should know about. Also, I think that the normalization should happen within the function. Will these functions work without normalization? If not, the normalization needs to happen in here rather than outside.

}

# Objective function: sum of squared residuals
objective_function <- function(parameters) {

again, ages is not being passed.

par = c(initial_inflection, initial_slope),
fn = objective_function,
method = "L-BFGS-B",
lower = c(min(ages), 0.01),

you can have a negative slope for the logistic function if younger ages are more selected than older ages.

),
fn = objective_function,
method = "L-BFGS-B",
lower = c(min(ages), 0.01, min(ages), 0.01),

should both slopes be bounded to be greater than 0?

Comment on lines +483 to +485
distribution_link = NA_character_,
age = NA_real_,
length = NA_real_,

same comment as before for the NA columns



Development

Successfully merging this pull request may close these issues.

[Developer Issue]: Create modular function to generate selectivity curves from ecosystem model F values by functional group

4 participants