Add modular selectivity curve generation from EWE F values #15

Open: Copilot wants to merge 16 commits into main from copilot/create-selectivity-curve-function
Conversation

Copilot AI commented Feb 2, 2026

Implementation Complete ✅

Summary

Successfully implemented a modular function system to generate selectivity curves from Ecopath with Ecosim (EWE) fishing mortality (F) values by functional group, following the tidyverse style guide and the FIMS contributing guidelines.

Latest Updates (addressing review feedback)

  • Removed inst/selectivity_curves_guide.md and integrated content into function documentation
  • Made fit_*_selectivity() documentation generic (not specific to fishing mortality)
  • Removed @author fields from roxygen documentation
  • Split test file into test-fit_selectivity_helpers.R and test-generate_selectivity_curve.R
  • Changed real EWE data test to use menhaden instead of dogfish
  • Added validation tests that recover known parameters from generated data
  • Updated man files to reflect documentation changes

Implementation Checklist

  • Create main function generate_selectivity_curve() that takes ecosystem model output, functional groups vector, and functional form
  • Create helper function fit_logistic_selectivity() for logistic selectivity
  • Create helper function fit_double_logistic_selectivity() for double logistic selectivity
  • Add comprehensive documentation and examples
  • Create tests for the new functions (now with validation tests)
  • Validate implementation with real data and run tests
  • Address all code review feedback
  • Follow tidyverse style guide: Replace $ with [[""]] for field extraction
  • Follow tidyverse style guide: Use full descriptive variable names
  • Ensure variable name consistency throughout

Style Guide Compliance

All code now follows tidyverse style guide and FIMS contributing guidelines:

  • ✅ Use [[""]] instead of $ for field extraction
  • ✅ Full descriptive variable names (no abbreviations)
  • ✅ Consistent naming conventions
  • ✅ Tests follow testthat template format
  • ✅ Generic documentation (not fishing-mortality-specific)
  • ✅ Validation tests verify parameter recovery
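As a small illustration of the field-extraction rule, consider a hypothetical fit object (the object and values here are made up for demonstration, not from the PR):

```r
# Hypothetical list mimicking an optim() result
fit <- list(par = c(2.5, 1.1), convergence = 0)

# Preferred in this repo: [[""]] extraction
inflection_point <- fit[["par"]][1]

# Discouraged: fit$par ($ allows partial matching, which can silently
# pick the wrong field)
stopifnot(identical(inflection_point, 2.5))
```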

Files Modified

  • R/generate_selectivity_curve.R - Main function with enhanced documentation
  • R/fit_selectivity_helpers.R - Helper functions with generic documentation
  • tests/testthat/test-fit_selectivity_helpers.R - Tests for helper functions with validation
  • tests/testthat/test-generate_selectivity_curve.R - Tests for main function with validation
  • man/generate_selectivity_curve.Rd - Updated documentation
  • NAMESPACE - Export declaration
  • inst/selectivity_curves_guide.md - REMOVED (content in function docs)
Original prompt

This section details the original issue you should resolve

Currently, the ecosystem model (EWE) model specifies fishing mortality by functional group, where functional groups can be thought of as age groups. We need to be able to look at the specified F for each functional group and use it to create a selectivity curve. The selectivity curve should be generated by specifying a form, e.g., logistic, double logistic, random walk at age, etc., with parameters that will create that exact form. Can you please create a function that will take three inputs, (1) the output from ecosystemom, (2) a vector of strings that contain the functional groups that are of interest where the vector is ordered in the same order that we want the x axis of the selectivity curve, and (3) the functional form that we want the selectivity curve to take where this will be a string that accepts either “logistic” or “double logistic” and we will be able to add more types later. The function will then use helper functions created by this PR, where we need one helper function for each available functional form. The helper functions will take an input vector of F values for each functional group. From these F values, it will then find the parameters that fit the selectivity the best using the functional form specified in the helper function. The global function will know which helper function to use based on the functional form that the user input. Each helper function will return the parameters required to fit the curve in a tibble format. We provide example tibbles for both the logistic and double logistic below. These parameters can then be used as input for a FIMS model to test the ability of the EM to fit to the data. The function should be modularized so that it will be easy to add new functional forms, e.g., “double normal”, “random walk at age”, at a later date.

The logistic and double logistic examples:

| model_family | module_name | fleet_name | module_type | label | distribution_link | age | length | time | value | estimation_type | distribution_type | distribution |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| catch_at_age | Selectivity | fleet1 | Logistic | inflection_point | NA | NA | NA | NA | 2 | fixed_effects | NA | NA |
| catch_at_age | Selectivity | fleet1 | Logistic | slope | NA | NA | NA | NA | 1 | fixed_effects | NA | NA |

| model_family | module_name | fleet_name | module_type | label | distribution_link | age | length | time | value | estimation_type | distribution_type | distribution |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| catch_at_age | Selectivity | fleet1 | DoubleLogistic | inflection_point_asc | NA | NA | NA | NA | 2 | fixed_effects | NA | NA |
| catch_at_age | Selectivity | fleet1 | DoubleLogistic | slope_asc | NA | NA | NA | NA | 1 | fixed_effects | NA | NA |
| catch_at_age | Selectivity | fleet1 | DoubleLogistic | inflection_point_desc | NA | NA | NA | NA | 2 | fixed_effects | NA | NA |
| catch_at_age | Selectivity | fleet1 | DoubleLogistic | slope_desc | NA | NA | NA | NA | 1 | fixed_effects | NA | NA |

</issue_description>
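The modular dispatch the issue asks for might be sketched as follows. The function names come from the issue text, but the bodies here are illustrative stubs only, not the PR's implementation:

```r
# Illustrative stubs: the real helpers estimate parameters from the F values
fit_logistic_selectivity <- function(values) {
  data.frame(
    label = c("inflection_point", "slope"),
    value = NA_real_
  )
}
fit_double_logistic_selectivity <- function(values) {
  data.frame(
    label = c("inflection_point_asc", "slope_asc",
              "inflection_point_desc", "slope_desc"),
    value = NA_real_
  )
}

generate_selectivity_curve <- function(ecosystem_output,
                                       functional_groups,
                                       functional_form = c("logistic", "double_logistic")) {
  functional_form <- match.arg(functional_form)
  # Order F values by the user-supplied functional group ordering
  f_values <- unlist(ecosystem_output[functional_groups])
  switch(functional_form,
    logistic = fit_logistic_selectivity(f_values),
    double_logistic = fit_double_logistic_selectivity(f_values)
  )
}
```

With this shape, adding a new form (e.g. "double_normal") only requires a new helper plus one `switch()` entry.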

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 4 commits February 2, 2026 17:59

Co-authored-by: Bai-Li-NOAA <59936250+Bai-Li-NOAA@users.noreply.github.com>

…features
Copilot AI changed the title [WIP] Add function to generate selectivity curves from F values Add modular selectivity curve generation from EWE F values Feb 2, 2026
Copilot AI requested a review from Bai-Li-NOAA February 2, 2026 18:04
@Bai-Li-NOAA (Collaborator)

@kellijohnson-NOAA I think this PR looks great! I’ll review it and add a style guide for both R functions and tests, making them general enough to be reused by Copilot for future PRs.

@Bai-Li-NOAA (Collaborator)

@copilot could you update the PR following these general guidelines:

  • R functions: Follow the tidyverse style guide and any relevant guidance in the FIMS contributing guide. Use [[""]] instead of $ when extracting fields from an object.

  • R tests: Follow the {testthat} template from the repo when writing tests: use_template.R.

  • Variables: Use full, descriptive names whenever possible to minimize later additions to WORDLIST.

Copilot AI and others added 3 commits February 6, 2026 18:42

Co-authored-by: Bai-Li-NOAA <59936250+Bai-Li-NOAA@users.noreply.github.com>

…ll inflection variables

Copilot AI commented Feb 6, 2026

@copilot could you update the PR following these general guidelines:

Updated the PR to follow the style guidelines:

  1. Field extraction: Replaced all $ with [[""]] (commit 75414fc)
  2. Variable naming: Used full descriptive names instead of abbreviations (commits 2a6ebbf, 135e899):
    • midpoint_index instead of mid_idx
    • initial_inflection instead of init_inflection
    • inflection_point_ascending instead of infl_asc
    • inflection_point_descending instead of infl_desc
    • And all other variable names expanded to be fully descriptive
  3. Tests: Already following the testthat template from use_template.R

All changes validated with syntax checking.

@Bai-Li-NOAA (Collaborator)

@copilot could you update the PR following the instructions below:

  • Remove inst/selectivity_curves_guide.md and move any useful content into the corresponding function documentation in the .R files.
  • In fit_logistic_selectivity() and fit_double_logistic_selectivity(), make the documentation generic. These functions should not be framed as fitting fishing mortality only, since they can also be used to fit selectivity curves for other quantities (e.g., maturity).
  • Remove the @author field from the roxygen documentation (or clearly indicate when content was generated with Copilot).
  • Split the current tests/testthat/test-generate_selectivity_curve.R into two files: one for testing the fit_*_selectivity() functions and one for testing generate_selectivity_curve().
  • In tests/testthat/test-generate_selectivity_curve.R, when testing with real EwE data, use the menhaden matrix as the example instead of dogfish.
  • The current {testthat} tests only check that the functions run, not that they return correct parameter estimates. Add validation-style tests that:
    • Start from known “true” selectivity parameter values,
    • Generate selectivity and F values from those parameters,
    • Fit the model using the R functions, and
    • Compare the estimated parameters to the true values.


Copilot AI commented Feb 9, 2026

@Bai-Li-NOAA Unfortunately I hit an unexpected error while processing your comment. I've automatically reported this to GitHub.

You can ask me to try again later by mentioning me in a new comment.

If you want to contact GitHub about this error, please mention the following identifier so they can better serve you: e887fbb2-5e18-4d1c-a531-ffd0789eb642

Sorry for the inconvenience!

@Bai-Li-NOAA (Collaborator)

@copilot could you update the PR following the instructions below:

  • Remove inst/selectivity_curves_guide.md and move any useful content into the corresponding function documentation in the .R files.
  • In fit_logistic_selectivity() and fit_double_logistic_selectivity(), make the documentation generic. These functions should not be framed as fitting fishing mortality only, since they can also be used to fit selectivity curves for other quantities (e.g., maturity).
  • Remove the @author field from the roxygen documentation (or clearly indicate when content was generated with Copilot).
  • Split the current tests/testthat/test-generate_selectivity_curve.R into two files: one for testing the fit_*_selectivity() functions and one for testing generate_selectivity_curve().
  • In tests/testthat/test-generate_selectivity_curve.R, when testing with real EwE data, use the menhaden matrix as the example instead of dogfish.
  • The current {testthat} tests only check that the functions run, not that they return correct parameter estimates. Add validation-style tests that:
    • Start from known “true” selectivity parameter values,
    • Generate selectivity and F values from those parameters,
    • Fit the model using the R functions, and
    • Compare the estimated parameters to the true values.

- Removed inst/selectivity_curves_guide.md (content integrated into function docs)
- Made fit_*_selectivity() documentation generic (not fishing-mortality-specific)
- Removed @author fields from all roxygen documentation
- Split tests into test-fit_selectivity_helpers.R and test-generate_selectivity_curve.R
- Changed real EWE test to use menhaden instead of dogfish
- Added validation tests that verify parameter recovery from known true values

Co-authored-by: Bai-Li-NOAA <59936250+Bai-Li-NOAA@users.noreply.github.com>

Copilot AI commented Feb 12, 2026

@copilot could you update the PR following the instructions below:

  • Remove inst/selectivity_curves_guide.md and move any useful content into the corresponding function documentation in the .R fil...

All requested changes completed in commit e209078:

  1. ✅ Removed inst/selectivity_curves_guide.md - content integrated into function documentation

  2. ✅ Made fit_logistic_selectivity() and fit_double_logistic_selectivity() documentation generic (references to "values" and "index" instead of "F values" and "age")

  3. ✅ Removed @author field from all roxygen documentation

  4. ✅ Split tests into:

    • test-fit_selectivity_helpers.R - tests for fit_*_selectivity() functions
    • test-generate_selectivity_curve.R - tests for generate_selectivity_curve()
  5. ✅ Changed real EWE data test to use menhaden (line 80-95 in test-generate_selectivity_curve.R)

  6. ✅ Added validation tests that:

    • Start from known "true" selectivity parameters
    • Generate selectivity values from those parameters
    • Fit the model using the R functions
    • Compare estimated to true values within tolerance

    Validation tests added in both files for logistic and double logistic forms.
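A minimal, self-contained sketch of one such parameter-recovery test (base R only; the true values, starting values, and tolerances here are arbitrary illustrative choices, not the PR's):

```r
# Known "true" parameters (arbitrary illustrative values)
ages <- 1:10
true_inflection <- 4
true_slope <- 1.2
observed <- 1 / (1 + exp(-true_slope * (ages - true_inflection)))

# Refit by minimizing the sum of squared residuals, as in the current PR
objective_function <- function(parameters) {
  predicted <- 1 / (1 + exp(-parameters[2] * (ages - parameters[1])))
  sum((observed - predicted)^2)
}
fit <- optim(
  par = c(3, 1),
  fn = objective_function,
  method = "L-BFGS-B",
  lower = c(min(ages), 0.01)
)

# Estimates should recover the true values on noiseless data
stopifnot(abs(fit[["par"]][1] - true_inflection) < 0.1)
stopifnot(abs(fit[["par"]][2] - true_slope) < 0.1)
```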

Co-authored-by: Bai-Li-NOAA <59936250+Bai-Li-NOAA@users.noreply.github.com>
@Bai-Li-NOAA Bai-Li-NOAA marked this pull request as ready for review February 23, 2026 16:03
@Andrea-Havron-NOAA

Hi @Bai-Li-NOAA, I would update your code to use the binomial distribution instead of the sum of squares, as it accounts for the mean-variance relationship of bounded data. If you did not normalize F, you could also use a gamma distribution instead.

With respect to the logistic, the model can be written as:

logit_selectivity = beta0 + beta1 * ages
pred = 1 / (1 + exp(-logit_selectivity))
-sum(dbinom(round(normalized_values * n_trials), size = n_trials, prob = pred, log = TRUE))

This can also be written in glm form as:

glm_fit <- glm(normalized_values ~ ages, family = binomial(link = "logit"))
summary(glm_fit)

where inflection_point = -beta0/beta1 and slope = beta1
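A runnable sketch of this glm approach and back-transform, on simulated noiseless data (the true values and `n_trials` are illustrative; proportions are converted to pseudo-counts so `glm()` sees integer successes):

```r
# Simulated noiseless logistic selectivity (true values are arbitrary)
ages <- 1:10
true_inflection <- 4
true_slope <- 1.2
normalized_values <- 1 / (1 + exp(-true_slope * (ages - true_inflection)))

# Convert proportions to pseudo-counts of successes out of n_trials
n_trials <- 100
successes <- round(normalized_values * n_trials)
glm_fit <- glm(
  cbind(successes, n_trials - successes) ~ ages,
  family = binomial(link = "logit")
)

# Back-transform the coefficients as described above
beta0 <- coef(glm_fit)[["(Intercept)"]]
beta1 <- coef(glm_fit)[["ages"]]
inflection_point <- -beta0 / beta1  # approximately true_inflection
slope <- beta1                      # approximately true_slope
```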

The double logistic is a little more complicated as the model doesn't have a linear form. It could be set up in optim as:

objective_function_binom <- function(parameters, ages, normalized_values, n_trials = 100) {
  predicted <- double_logistic_function(parameters, ages)
  # Avoid 0/1 for log-likelihood
  predicted <- pmin(pmax(predicted, 1e-6), 1 - 1e-6)
  # Binomial log-likelihood (sum over ages)
  -sum(dbinom(round(normalized_values * n_trials), size = n_trials, prob = predicted, log = TRUE))
}
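To make the objective above runnable, here is a sketch with an assumed `double_logistic_function` (a product of ascending and descending logistics — one common parameterization, not necessarily the PR's) and arbitrary "true" parameters:

```r
# Assumed form: product of ascending and descending logistic curves
double_logistic_function <- function(parameters, ages) {
  ascending <- 1 / (1 + exp(-parameters[2] * (ages - parameters[1])))
  descending <- 1 / (1 + exp(parameters[4] * (ages - parameters[3])))
  ascending * descending
}

objective_function_binom <- function(parameters, ages, normalized_values, n_trials = 100) {
  predicted <- double_logistic_function(parameters, ages)
  predicted <- pmin(pmax(predicted, 1e-6), 1 - 1e-6)  # avoid 0/1 in the log-likelihood
  -sum(dbinom(round(normalized_values * n_trials), size = n_trials,
              prob = predicted, log = TRUE))
}

# Recover arbitrary "true" parameters from noiseless data
ages <- 1:15
true_parameters <- c(4, 1.5, 11, 1.0)  # infl_asc, slope_asc, infl_desc, slope_desc
normalized_values <- double_logistic_function(true_parameters, ages)

fit <- optim(
  par = c(3, 1, 10, 1),
  fn = objective_function_binom,
  ages = ages,
  normalized_values = normalized_values,
  method = "L-BFGS-B",
  lower = c(min(ages), 0.01, min(ages), 0.01)
)
```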

These could also be run in RTMB; the only added benefit is that you could get uncertainties for the estimated inflection_point and slope parameters.

@kellijohnson-NOAA (Contributor)

@Andrea-Havron-NOAA if we use RTMB can we use the fork of FIMS that has RTMB capability and somehow just use the logistic and double-logistic functions already written in FIMS and then RTMB to optimize the parameters?

@Andrea-Havron-NOAA

@Andrea-Havron-NOAA if we use RTMB can we use the fork of FIMS that has RTMB capability and somehow just use the logistic and double-logistic functions already written in FIMS and then RTMB to optimize the parameters?

@kellijohnson-NOAA, with respect to the glm framework, you would not use the logistic function. Instead you would estimate a linear combination of beta0, beta1, and ages and back-transform to get slope and inflection_point. We could use the RTMB fork of FIMS for the double logistic; we would just need to expose the double logistic function for RTMB, similar to how we exposed the logistic function.

@Bai-Li-NOAA (Collaborator)

Thank you, @Andrea-Havron-NOAA and @kellijohnson-NOAA, I think I can explore the code that @Andrea-Havron-NOAA shared and we can adopt it once all tests pass. I’m not sure we strictly need uncertainty for the parameters right now, since the function is mainly for preparing initial values used in stock assessments. Figuring out how to expose the functions in the RTMB fork and make this repo depend on that repo might take some time. In the long term, though, I’d like to reuse the developed functions rather than writing our own.

@kellijohnson-NOAA self-requested a review February 26, 2026 17:15

@kellijohnson-NOAA left a comment (Contributor)


I just reviewed the new selectivity file; I haven't reviewed the rest of it yet because I wanted to get you these comments, and I will wait to review the rest until changes are made. The biggest change I'm suggesting is the use of dplyr::group_by() and dplyr::summarize() rather than the purrr call. But I am also wondering about the logic of normalizing outside of the function rather than internal to the function, and about the requirement of positive slopes.

#' functional_form = "double_logistic"
#' )
#' }
calculate_selectivity_parameters <- function(data,

I think that we should call it estimate_true_selectivity() rather than calculate_*() because we are actually estimating them. Additionally, the proposed name leaves space for future functions like estimate_true_maturity().

Comment on lines +78 to +95
# Validate inputs
if (!inherits(data, "data.frame")) {
cli::cli_abort("{.arg data} must be a data frame or tibble.")
}

# Check for empty data
if (nrow(data) == 0) {
cli::cli_abort("{.arg data} cannot be empty.")
}

required_columns <- c("functional_group", "year", "month", "value", "fleet")
missing_columns <- setdiff(required_columns, colnames(data))
if (length(missing_columns) > 0) {
cli::cli_abort(
"{.arg data} is missing required column{?s}: {.val {missing_columns}}"
)
}


It might be beneficial to pull this code out into its own validate function, because I can see it being useful in other functions, and it would need just two arguments: data and column_names.
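A possible standalone validator along these lines (the name validate_data_columns() is hypothetical; the PR uses cli::cli_abort(), but base stop() is used here so the sketch runs without {cli}):

```r
# Hypothetical extracted validator, per the review suggestion
validate_data_columns <- function(data, column_names) {
  if (!inherits(data, "data.frame")) {
    stop("`data` must be a data frame or tibble.")
  }
  if (nrow(data) == 0) {
    stop("`data` cannot be empty.")
  }
  missing_columns <- setdiff(column_names, colnames(data))
  if (length(missing_columns) > 0) {
    stop(
      "`data` is missing required columns: ",
      paste(missing_columns, collapse = ", ")
    )
  }
  invisible(data)
}
```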

if (length(unique_species) > 1) {
cli::cli_warn(
c(
"{.arg data} contains multiple species: {.val {unique_species}}.",

should be the following so the error message has a red x next to this line.

          "x" = "{.arg data} contains multiple species: {.val {unique_species}}.",

Comment on lines +128 to +130
cli::cli_abort(
"{.arg ages} must have the same length as the number of unique functional groups ({length(functional_groups)})."
)

Suggestion for clarification to the user:

      cli::cli_abort(
        "The length of {.arg ages} ({length(ages)}) must equal the number of unique functional groups ({length(functional_groups)})."
      )

Comment on lines +144 to +150
valid_forms <- c("logistic", "double_logistic")
if (!functional_form %in% valid_forms) {
cli::cli_abort(
"{.arg functional_form} must be one of: {.val {valid_forms}}"
)
}


We could instead use functional_form <- match.arg(functional_form) with the choices declared in the function signature, i.e.,
calculate_selectivity_parameters <- function(data, ages = NULL, functional_form = c("logistic", "double_logistic")), instead of having to write our own check. I think one of the tidyverse packages has a variation of match.arg() (rlang::arg_match()) if you want to use their function instead.
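For illustration, the match.arg() pattern looks like this (other arguments omitted for brevity):

```r
# match.arg() validates against the choices in the signature and
# defaults to the first choice when none is supplied
calculate_selectivity_parameters <- function(
    functional_form = c("logistic", "double_logistic")) {
  functional_form <- match.arg(functional_form)
  functional_form
}

calculate_selectivity_parameters("double_logistic")  # "double_logistic"
calculate_selectivity_parameters()                   # "logistic" (default)
# calculate_selectivity_parameters("spline")  # errors, listing the valid choices
```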

#' @noRd
fit_double_logistic_selectivity <- function(normalized_values, ages) {
# Sanitize inputs: set any negative values to 0 (floating point noise)
normalized_values[normalized_values < 0] <- 0

If they are normalized, then why would there be zeros? This is hidden behavior that I think the user should know about. Also, I think that the normalization should happen within the function. Will these functions work without normalization? If not, the normalization needs to happen in here rather than outside.

}

# Objective function: sum of squared residuals
objective_function <- function(parameters) {

again, ages is not being passed.

par = c(initial_inflection, initial_slope),
fn = objective_function,
method = "L-BFGS-B",
lower = c(min(ages), 0.01),

you can have a negative slope for the logistic function if younger ages are more selected than older ages.

),
fn = objective_function,
method = "L-BFGS-B",
lower = c(min(ages), 0.01, min(ages), 0.01),

should both slopes be bounded to be greater than 0?

Comment on lines +483 to +485
distribution_link = NA_character_,
age = NA_real_,
length = NA_real_,

same comment as before for the NA columns



Development

Successfully merging this pull request may close these issues.

[Developer Issue]: Create modular function to generate selectivity curves from ecosystem model F values by functional group

4 participants