
Add big file diann #151

Open

Rudhik1904 wants to merge 15 commits into devel from Add-big-file-DIANN

Conversation

Contributor

Rudhik1904 commented Feb 3, 2026

Motivation and Context

This PR extends MSstatsShiny to support large file handling for DIANN and Spectronaut data formats, addressing limitations when processing big files that cannot be fully loaded into memory. The solution introduces a modular, server-driven UI architecture that separates standard and large-file workflows, coupled with integration of MSstatsBig converters (bigSpectronauttoMSstatsFormat and bigDIANNtoMSstatsFormat) for memory-efficient data processing. The implementation supports both web-server and local-server deployment modes with appropriate file browsing and handling logic.

Detailed Changes

Core Server Logic (R/module-loadpage-server.R)

  • Added is_web_server parameter to loadpageServer() function signature (default: FALSE) to support conditional server/client mode handling
  • Implemented branching logic for file handling:
    • Local mode: Initializes shinyFiles volumes, enables server-side big file browsing, parses file paths into local_file_info and local_big_file_path, exposes datapath and filename for UI
    • Web server mode: Bypasses local shinyFiles logic and uses standard UI paths
  • Introduced conditional UI rendering for Spectronaut workflow:
    • spectronaut_header_ui, spectronaut_file_selection_ui, and spectronaut_options_ui now adapt to is_web_server and big_file_spec flags with corresponding standard/large UI components
    • spectronaut_options_ui renders additional filter/bottom UI components only in non-web-server big-file modes
  • Introduced conditional UI rendering for DIANN workflow:
    • diann_header_ui, diann_file_selection_ui, and diann_options_ui similarly adapt to is_web_server and big_file_diann flags with appropriate large/standard UI components
    • diann_options_ui renders additional filter/bottom UI components only in non-web-server big-file modes
  • Expanded proceed/enable logic to support both regular and big-file modes:
    • Spectronaut: Proceed enabled if (regular_spec_ok OR big_file_spec_ok) AND sep_specdata present; big-file path derived from local_big_file_path
    • DIANN: Proceed enabled if (regular_diann_ok OR big_file_diann_ok) AND sep_dianndata present; big-file path derived similarly
  • Introduced reactive definitions for local file path and file-info to support end processing when big-file mode is active
  • Updated conditional logic blocks in observe flow to utilize new spec/diann big-file readiness checks
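
The branching described above can be sketched as follows. This is a hedged illustration: the function and input names (loadpageServer, big_file_spec, specdata, sep_specdata, proceed1, local_big_file_path) come from the summary, but the bodies are assumptions, not the PR's actual code.

```r
# Illustrative sketch only -- not the PR's exact implementation.
loadpageServer <- function(id, is_web_server = FALSE) {
  shiny::moduleServer(id, function(input, output, session) {
    if (!is_web_server) {
      # Local mode: server-side browsing of big files via shinyFiles
      volumes <- shinyFiles::getVolumes()()
      shinyFiles::shinyFileChoose(input, "local_big_file", roots = volumes)
      local_big_file_path <- shiny::reactive({
        info <- shinyFiles::parseFilePaths(volumes, input$local_big_file)
        as.character(info$datapath)
      })
    } else {
      # Web server mode: no local browsing; standard upload paths only
      local_big_file_path <- shiny::reactive(character(0))
    }

    # Proceed enabled if (regular OK OR big-file OK) AND separator chosen
    shiny::observe({
      regular_spec_ok  <- !isTRUE(input$big_file_spec) && !is.null(input$specdata)
      big_file_spec_ok <- isTRUE(input$big_file_spec) &&
                          length(local_big_file_path()) > 0
      if ((regular_spec_ok || big_file_spec_ok) && !is.null(input$sep_specdata)) {
        shinyjs::enable("proceed1")
      }
    })
  })
}
```

The DIANN path follows the same pattern with its own big_file_diann flag and sep_dianndata separator input.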

UI Components (R/module-loadpage-ui.R)

  • Replaced static conditionalPanels for DIANN and Spectronaut upload sections with dynamic UI composition using tagList and uiOutput placeholders (diann_header_ui, diann_file_selection_ui, diann_options_ui, spectronaut_header_ui, spectronaut_file_selection_ui, spectronaut_options_ui)
  • Added 13 new UI constructor functions:
    • Headers: create_spectronaut_header(), create_diann_header()
    • Mode selectors: create_spectronaut_mode_selector(), create_diann_mode_selector()
    • Standard UI: create_spectronaut_standard_ui(), create_diann_standard_ui()
    • Large file UI: create_spectronaut_large_file_ui(), create_diann_large_file_ui()
    • Filter options: create_spectronaut_large_filter_options(), create_diann_large_filter_options()
    • Q-value and bottom components: create_spectronaut_qvalue_cutoff_ui(), create_spectronaut_large_bottom_ui(), create_diann_large_bottom_ui()
  • Updated create_standard_annotation_uploads() condition to exclude Spectronaut when big_file_spec is active, preserving compatibility with large file handling
  • Updated create_label_free_options() condition to exclude Spectronaut path when big_file_spec is active
  • Modular UI composition enables flexible assembly of upload workflows based on file type and mode
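
As a hedged sketch of what this composition might look like (only the function and output names come from the list above; the widget choices are illustrative assumptions):

```r
# Hypothetical constructor; the actual widgets in the PR may differ.
create_diann_large_file_ui <- function(ns) {
  shiny::tagList(
    shinyFiles::shinyFilesButton(ns("local_big_file"),
                                 label = "Browse server for DIA-NN report",
                                 title = "Select a large DIA-NN file",
                                 multiple = FALSE),
    shiny::verbatimTextOutput(ns("local_big_file_path"))
  )
}

# The module UI composes uiOutput placeholders instead of static
# conditionalPanels; the server decides what each placeholder renders:
create_diann_uploads <- function(ns) {
  shiny::tagList(
    shiny::uiOutput(ns("diann_header_ui")),
    shiny::uiOutput(ns("diann_file_selection_ui")),
    shiny::uiOutput(ns("diann_options_ui"))
  )
}
```

Because the server owns the rendering decision, the same placeholders can serve both the standard and large-file workflows without duplicating panel markup.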

Data Loading Logic (R/utils.R)

  • Removed explicit MSstatsBig namespace prefixes for Spectronaut big-file path processing
  • Added comprehensive big-file handling branches in getData():
    • Big Spectronaut: Validates qvalue_cutoff and max_feature_count, resolves local file path, calls bigSpectronauttoMSstatsFormat() with multiple filter parameters, loads results via dplyr::collect() with memory-error handling
    • Big DIANN: Validates max_feature_count, resolves local file path, calls bigDIANNtoMSstatsFormat() with MBR, quantificationColumn, max_feature_count and filter options, loads results via dplyr::collect() with memory-error handling
  • Falls back to existing non-big-file paths when big-file conditions are not met, preserving existing read paths for standard spec/diann workflows
  • Updated getDataCode() to emit MSstatsBig usage blocks for big Spectronaut/DIANN processing, including version reporting (displays "Not Installed" if unavailable)
  • Added memory-conscious collection calls after big-file conversions with fallback error handling
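
Putting these pieces together, the big-DIANN branch of getData() plausibly looks like the following. Argument names match the call quoted later in this review thread; the tempfile() output name and the tryCatch wrapper reflect the reviewers' suggested pattern and are not necessarily the merged code.

```r
# Sketch of the big-DIANN branch in getData() (reviewer-suggested variant).
out_file <- tempfile(pattern = "diann_", fileext = ".csv")
converted_data <- MSstatsBig::bigDIANNtoMSstatsFormat(
  input_file = local_big_file_path,
  output_file_name = out_file,
  backend = "arrow",
  MBR = isTRUE(input$diann_MBR),
  quantificationColumn = input$diann_quantificationColumn,
  max_feature_count = input$max_feature_count,
  filter_unique_peptides = input$filter_unique_peptides,
  aggregate_psms = input$aggregate_psms,
  filter_few_obs = input$filter_few_obs
)
# Materialize the lazy (arrow-backed) result, with a memory-error fallback
data <- tryCatch(
  dplyr::collect(converted_data),
  error = function(e) {
    warning("Could not load converted data into memory: ", conditionMessage(e))
    NULL
  }
)
```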

Package Dependencies (R/MSstatsShiny.R)

  • Added imports: bigDIANNtoMSstatsFormat and bigSpectronauttoMSstatsFormat from MSstatsBig package via Roxygen @importFrom block
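
In roxygen terms this is a one-line addition (a sketch; the surrounding package-documentation block in R/MSstatsShiny.R will differ):

```r
#' @importFrom MSstatsBig bigDIANNtoMSstatsFormat bigSpectronauttoMSstatsFormat
NULL
```

After re-running devtools::document(), the NAMESPACE gains the corresponding importFrom() directives and the converters can be called without the MSstatsBig:: prefix, as one reviewer requested below.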

Unit Tests

New Test File (tests/testthat/test_parallel_simulation.R)

  • Added new test script for parallel execution memory behavior simulation
  • Implements mocked internal functions: .fitSurvival() (pseudo-model object generator), .isSummarizable() (pass-through), .runTukey() (data table result)
  • Provides two simulation variants:
    • Leaky version (MSstatsSummarizeSingleTMP_Leaky_Sim): Retains memory by not cleaning up survival_fit
    • Fixed version (MSstatsSummarizeSingleTMP_Fixed_Sim): Cleans up survival_fit before sleeping
  • Both variants perform prediction steps and log memory usage to parallel_log.txt
  • Uses mclapply on two cores for parallel workload simulation
  • Includes shared data table setup with randomized observations to trigger left-truncated survival modeling
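
A condensed sketch of the leaky-vs-fixed comparison (the two-variant structure, mclapply on two cores, and the parallel_log.txt logging mirror the description above; the stand-in model, sleep times, and memory metric here are simplified assumptions):

```r
library(parallel)

# Simplified stand-in for one summarization task; `leaky` controls whether
# the fitted model is freed before the worker idles.
simulate_tmp <- function(i, leaky) {
  survival_fit <- lm(rnorm(1e5) ~ 1)   # stand-in for .fitSurvival()
  pred <- predict(survival_fit)[1]     # prediction step
  if (!leaky) rm(survival_fit)         # fixed variant cleans up early
  Sys.sleep(0.05)                      # window in which leaked memory persists
  mem_mb <- sum(gc()[, 2])             # memory currently in use (Mb)
  cat(sprintf("task %d: %.1f Mb\n", i, mem_mb),
      file = "parallel_log.txt", append = TRUE)
  mem_mb
}

# mclapply forks worker processes; on Windows mc.cores must be 1
leaky_mem <- unlist(mclapply(1:4, simulate_tmp, leaky = TRUE,  mc.cores = 2))
fixed_mem <- unlist(mclapply(1:4, simulate_tmp, leaky = FALSE, mc.cores = 2))
```

In the leaky variant each forked worker still holds the fitted model while sleeping, so logged memory stays high; the fixed variant releases it first.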

Coding Guideline Violations

No explicit coding guideline violations are documented in the raw summary. The changes follow consistent naming conventions for new UI constructor functions (snake_case prefix pattern like create_*), maintain backward compatibility through optional parameters with sensible defaults, and preserve existing error handling patterns while extending functionality for big-file support.

Rudhik1904 requested a review from tonywu1999 on February 3, 2026 at 06:55

coderabbitai bot commented Feb 3, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

This PR extends the loadpage module to support big file handling for Spectronaut and DIANN workflows. It introduces an is_web_server parameter to toggle between local and web server modes, adds modular UI helper functions for dynamic composition, and integrates MSstatsBig conversion functions for large file processing via dplyr lazy evaluation.

Changes

Cohort / File(s) — Summary

  • Server-side module logic (R/module-loadpage-server.R): Added is_web_server parameter and branching logic to handle local file browsing vs. web server modes. Introduced reactive definitions for big file path resolution and conditional rendering of UI for Spectronaut/DIANN large-file workflows, with updated proceed/enable logic supporting both regular and big-file modes.
  • UI module composition (R/module-loadpage-ui.R): Replaced static conditionalPanels with modular uiOutput placeholders for Spectronaut and DIANN sections. Added 13 new helper functions (create_spectronaut_*, create_diann_*) to assemble UI pieces dynamically. Updated condition logic to exclude standard annotation uploads when big file mode is active.
  • Data processing pipelines (R/utils.R): Integrated MSstatsBig conversion paths within getData for big Spectronaut and DIANN files, including parameter validation, progress indicators, lazy evaluation via dplyr::collect, and memory-error fallbacks. Updated getDataCode to emit MSstatsBig usage blocks and version information.
  • Package imports (R/MSstatsShiny.R): Added imports for bigDIANNtoMSstatsFormat and bigSpectronauttoMSstatsFormat from the MSstatsBig package.
  • Memory behavior testing (tests/testthat/test_parallel_simulation.R): New test script simulating parallel execution with mocked MSstats functions to compare memory behavior between leaky and fixed cleanup variants using mclapply on dual cores.

Sequence Diagram

sequenceDiagram
    participant User as User/Browser
    participant Shiny as Shiny Server
    participant UI as UI Layer
    participant MSB as MSstatsBig
    participant Data as Data Processing

    User->>Shiny: Upload file (Spectronaut/DIANN)
    Shiny->>Shiny: Check is_web_server flag
    alt Big File Mode
        Shiny->>Shiny: Resolve local_big_file_path
        Shiny->>UI: Render big file UI elements
        User->>Shiny: Select file & options
        Shiny->>Data: Call getData with big file path
        Data->>MSB: Invoke bigSpectronauttoMSstatsFormat/bigDIANNtoMSstatsFormat
        MSB->>MSB: Process large file (lazy eval)
        Data->>Data: dplyr::collect with memory handling
        Data-->>Shiny: Return results
    else Standard Mode
        Shiny->>UI: Render standard UI elements
        User->>Shiny: Upload file
        Shiny->>Data: Call getData (standard path)
        Data-->>Shiny: Return results
    end
    Shiny-->>User: Display processed data

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • PR #114: Introduces DIANN UI and processing components that this PR extends with big file support
  • PR #144: Adds initial Spectronaut big file handling, which this PR generalizes across both Spectronaut and DIANN workflows

Suggested labels

Review effort 3/5, enhancement

Suggested reviewers

  • devonjkohler

Poem

🐰 A rabbit hops through big files with glee,
Web servers and modes, now perfectly free!
Modular UI blocks stack high in the warren,
MSstatsBig's power—let data sizes tarryin'! 🌽

🚥 Pre-merge checks: ✅ 2 passed | ❌ 2 failed

❌ Failed checks (2 warnings)

Title check — ⚠️ Warning: The title mentions only DIANN big file support, but the changeset adds big file handling for both Spectronaut and DIANN, plus supporting infrastructure changes. Resolution: update the title to reflect both tools, e.g. 'Add big file support for Spectronaut and DIANN' or 'Support large file handling for Spectronaut and DIANN workflows'.

Merge Conflict Detection — ⚠️ Warning: merge conflicts detected (13 files):

⚔️ DESCRIPTION (content)
⚔️ NAMESPACE (content)
⚔️ R/MSstatsShiny.R (content)
⚔️ R/constants.R (content)
⚔️ R/module-loadpage-server.R (content)
⚔️ R/module-loadpage-ui.R (content)
⚔️ R/module-statmodel-server.R (content)
⚔️ R/statmodel-ui-options-modeling.R (content)
⚔️ R/statmodel-ui-results.R (content)
⚔️ R/utils.R (content)
⚔️ tests/testthat/test-module-loadpage-ui.R (content)
⚔️ tests/testthat/test-utils-statmodel-server.R (content)
⚔️ tests/testthat/test-utils.R (content)

These conflicts must be resolved before merging into devel.
Resolve conflicts locally and push changes to this branch.
✅ Passed checks (2)

Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.

Docstring Coverage — ✅ Passed: no functions found in the changed files to evaluate docstring coverage; check skipped.




coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
R/module-loadpage-ui.R (1)

226-231: ⚠️ Potential issue | 🔴 Critical

Add && !input['loadpage-big_file_diann'] to the annotation file condition for consistency.

DIANN big file mode (using MSstatsBig::bigDIANNtoMSstatsFormat) does not accept annotation parameters, unlike regular DIANN mode which supports optional annotations. The annotation file input should be excluded for DIANN big file mode, matching the pattern already established for Spectronaut (!input['loadpage-big_file_spec']).

Suggested fix

Update line 226 condition from:

condition = "(input['loadpage-filetype'] == 'sky' || input['loadpage-filetype'] == 'prog' || input['loadpage-filetype'] == 'PD' || (input['loadpage-filetype'] == 'spec' && !input['loadpage-big_file_spec']) || input['loadpage-filetype'] == 'open'|| input['loadpage-filetype'] =='spmin' || input['loadpage-filetype'] == 'phil' || input['loadpage-filetype'] == 'diann') && input['loadpage-BIO'] != 'PTM'",

To:

condition = "(input['loadpage-filetype'] == 'sky' || input['loadpage-filetype'] == 'prog' || input['loadpage-filetype'] == 'PD' || (input['loadpage-filetype'] == 'spec' && !input['loadpage-big_file_spec']) || input['loadpage-filetype'] == 'open'|| input['loadpage-filetype'] =='spmin' || input['loadpage-filetype'] == 'phil' || (input['loadpage-filetype'] == 'diann' && !input['loadpage-big_file_diann'])) && input['loadpage-BIO'] != 'PTM'",
🤖 Fix all issues with AI agents
In `@R/module-loadpage-server.R`:
- Around line 75-98: The DIANN big-file UI is missing the separator buttons, so
update output$diann_file_selection_ui to append
create_separator_buttons(session$ns, "sep_dianndata") when building the UI for
big-file mode; locate the branch that adds
create_diann_large_file_ui(session$ns) (function create_diann_large_file_ui) and
add the call to create_separator_buttons(session$ns, "sep_dianndata") alongside
it so both standard (create_diann_standard_ui) and large-file modes include
separators.

In `@R/utils.R`:
- Around line 533-545: The call to MSstatsBig::bigSpectronauttoMSstatsFormat
uses a hardcoded output_file_name ("output_file.csv") which can collide for
concurrent runs; change it to generate a unique temporary filename (e.g., via
tempfile with a .csv extension) and pass that variable as output_file_name in
the converted_data assignment (keep references to local_big_file_path and
converted_data unchanged), and optionally ensure the temp file is cleaned up
after use.
- Around line 617-627: The call to MSstatsBig::bigDIANNtoMSstatsFormat uses a
hardcoded output_file_name ("output_file.csv"); replace that literal with a
unique temporary filename (e.g. generate via tempfile() with an appropriate file
extension) and pass that variable as output_file_name in the converted_data
assignment so the function uses a temp file for local_big_file_path conversion;
ensure the tempfile is created before the call and removed/managed after use to
avoid collisions.

In `@tests/testthat/test-loadpage-server.R`:
- Around line 1-21: Implement the missing helper .get_data_source_type to accept
(file_type, is_big_flag) and return "big_spectronaut" only when file_type ==
"spec" and is_big_flag is TRUE; in every other case (including NULL or any
non-"spec" file_type) return "standard"; add the function to the source (e.g.,
alongside other loadpage server helpers) and ensure it handles NULL/NA boolean
inputs safely and is exported/internal as needed so tests can call
.get_data_source_type.
🧹 Nitpick comments (2)
R/module-loadpage-server.R (1)

100-143: Consider extracting common default-value initialization pattern.

Both diann_options_ui (lines 104-110) and spectronaut_options_ui (lines 125-133) use the same pattern for initializing default values from potentially-null inputs. This repetition could be reduced with a helper function.

♻️ Optional: Extract helper for default value initialization
# Helper function to get input value or default
get_input_or_default <- function(input_value, default) {
  if (is.null(input_value)) default else input_value
}

# Usage in diann_options_ui:
max_feature_def <- get_input_or_default(input$max_feature_count, 100)
unique_peps_def <- get_input_or_default(input$filter_unique_peptides, FALSE)
R/module-loadpage-ui.R (1)

377-396: Consider consolidating duplicate bottom UI functions.

create_spectronaut_large_bottom_ui and create_diann_large_bottom_ui are nearly identical, differing only in the default value for max_feature_def (20 vs 100). These could be consolidated into a single function.

♻️ Optional: Consolidate into single function
-#' Create Spectronaut large file options (Bottom part)
-#' `@noRd`
-create_spectronaut_large_bottom_ui <- function(ns, max_feature_def = 20, unique_peps_def = FALSE, agg_psms_def = FALSE, few_obs_def = FALSE) {
-  tagList(
-    numericInput(ns("max_feature_count"), "Max feature count", value = max_feature_def, min = 1),
-    checkboxInput(ns("filter_unique_peptides"), "Use unique peptides", value = unique_peps_def),
-    checkboxInput(ns("aggregate_psms"), "Aggregate PSMs to peptides", value = agg_psms_def),
-    checkboxInput(ns("filter_few_obs"), "Filter features with few observations", value = few_obs_def)
-  )
-}
-
-#' Create DIANN large file options (Bottom part)
-#' `@noRd`
-create_diann_large_bottom_ui <- function(ns, max_feature_def = 100, unique_peps_def = FALSE, agg_psms_def = FALSE, few_obs_def = FALSE) {
-  tagList(
-    numericInput(ns("max_feature_count"), "Max feature count", value = max_feature_def, min = 1),
-    checkboxInput(ns("filter_unique_peptides"), "Use unique peptides", value = unique_peps_def),
-    checkboxInput(ns("aggregate_psms"), "Aggregate PSMs to peptides", value = agg_psms_def),
-    checkboxInput(ns("filter_few_obs"), "Filter features with few observations", value = few_obs_def)
-  )
-}
+#' Create large file bottom options (shared by Spectronaut and DIANN)
+#' `@noRd`
+create_large_file_bottom_ui <- function(ns, max_feature_def, unique_peps_def = FALSE, agg_psms_def = FALSE, few_obs_def = FALSE) {
+  tagList(
+    numericInput(ns("max_feature_count"), "Max feature count", value = max_feature_def, min = 1),
+    checkboxInput(ns("filter_unique_peptides"), "Use unique peptides", value = unique_peps_def),
+    checkboxInput(ns("aggregate_psms"), "Aggregate PSMs to peptides", value = agg_psms_def),
+    checkboxInput(ns("filter_few_obs"), "Filter features with few observations", value = few_obs_def)
+  )
+}
+
+# Then use with tool-specific defaults:
+# create_large_file_bottom_ui(ns, max_feature_def = 20, ...)  # Spectronaut
+# create_large_file_bottom_ui(ns, max_feature_def = 100, ...) # DIANN

R/utils.R Outdated
Comment on lines +617 to +627
converted_data <- MSstatsBig::bigDIANNtoMSstatsFormat(
  input_file = local_big_file_path,
  output_file_name = "output_file.csv",
  backend = "arrow",
  MBR = isTRUE(input$diann_MBR),
  quantificationColumn = input$diann_quantificationColumn,
  max_feature_count = input$max_feature_count,
  filter_unique_peptides = input$filter_unique_peptides,
  aggregate_psms = input$aggregate_psms,
  filter_few_obs = input$filter_few_obs
)

⚠️ Potential issue | 🟡 Minor

Same hardcoded output file name issue as Spectronaut path.

The output_file_name = "output_file.csv" is also hardcoded here. Apply the same fix using tempfile() for consistency and to avoid file collisions.

🛠️ Suggested fix
         converted_data <- MSstatsBig::bigDIANNtoMSstatsFormat(
           input_file = local_big_file_path,
-          output_file_name = "output_file.csv",
+          output_file_name = tempfile(pattern = "diann_", fileext = ".csv"),
           backend = "arrow",
📝 Committable suggestion


Suggested change

Before:

converted_data <- MSstatsBig::bigDIANNtoMSstatsFormat(
  input_file = local_big_file_path,
  output_file_name = "output_file.csv",
  backend = "arrow",
  MBR = isTRUE(input$diann_MBR),
  quantificationColumn = input$diann_quantificationColumn,
  max_feature_count = input$max_feature_count,
  filter_unique_peptides = input$filter_unique_peptides,
  aggregate_psms = input$aggregate_psms,
  filter_few_obs = input$filter_few_obs
)

After:

converted_data <- MSstatsBig::bigDIANNtoMSstatsFormat(
  input_file = local_big_file_path,
  output_file_name = tempfile(pattern = "diann_", fileext = ".csv"),
  backend = "arrow",
  MBR = isTRUE(input$diann_MBR),
  quantificationColumn = input$diann_quantificationColumn,
  max_feature_count = input$max_feature_count,
  filter_unique_peptides = input$filter_unique_peptides,
  aggregate_psms = input$aggregate_psms,
  filter_few_obs = input$filter_few_obs
)
🤖 Prompt for AI Agents
In `@R/utils.R` around lines 617 - 627, The call to
MSstatsBig::bigDIANNtoMSstatsFormat uses a hardcoded output_file_name
("output_file.csv"); replace that literal with a unique temporary filename (e.g.
generate via tempfile() with an appropriate file extension) and pass that
variable as output_file_name in the converted_data assignment so the function
uses a temp file for local_big_file_path conversion; ensure the tempfile is
created before the call and removed/managed after use to avoid collisions.

coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@R/module-loadpage-ui.R`:
- Around line 274-277: Add three renderUI handlers in R/module-loadpage-server.R
mirroring the Spectronaut pattern: implement output$diann_header_ui,
output$diann_file_selection_ui, and output$diann_options_ui. Each renderUI
should call the corresponding DIANN UI builder function (the same functions used
to construct the DIANN widgets in the UI module) and include the same
conditional logic used for Spectronaut (e.g., only show file selection/options
when the selected file type is DIANN); wire these outputs to the names used in
create_diann_uploads() (diann_header_ui, diann_file_selection_ui,
diann_options_ui) so the UI renders correctly.

coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
R/module-loadpage-ui.R (2)

589-619: ⚠️ Potential issue | 🟠 Major

Standard label-free options are not hidden for DIANN big-file mode.

Lines 592 and 599 exclude Spectronaut big-file mode (input['loadpage-filetype'] != 'spec' || !input['loadpage-big_file_spec']), but do not exclude DIANN big-file mode. When a user selects DIANN with large-file mode enabled, they'll see both the standard label-free options (e.g., "Use unique peptides" at line 594) and the big-file-specific options rendered by create_diann_large_bottom_ui (which also has "Use unique peptides"). This creates a confusing duplicate UI.

Proposed fix: add DIANN big-file exclusion
-      condition = "input['loadpage-filetype'] && input['loadpage-DDA_DIA'] == 'LType' && input['loadpage-filetype'] != 'sample' && input['loadpage-filetype'] != 'MRF' && (input['loadpage-filetype'] != 'spec' || !input['loadpage-big_file_spec'])",
+      condition = "input['loadpage-filetype'] && input['loadpage-DDA_DIA'] == 'LType' && input['loadpage-filetype'] != 'sample' && input['loadpage-filetype'] != 'MRF' && (input['loadpage-filetype'] != 'spec' || !input['loadpage-big_file_spec']) && (input['loadpage-filetype'] != 'diann' || !input['loadpage-big_file_diann'])",

Apply the same pattern to the condition on line 599.


224-232: ⚠️ Potential issue | 🟠 Major

Exclude annotation upload for DIANN big-file mode.

The annotation file is only required for regular DIANN (passed to DIANNtoMSstatsFormat on line 674), but is not used in the big-file workflow (bigDIANNtoMSstatsFormat on lines 617–627 has no annotation parameter). Update the condition on line 226 to exclude annotation upload for big-file DIANN, consistent with the Spectronaut pattern: change input['loadpage-filetype'] == 'diann' to (input['loadpage-filetype'] == 'diann' && !input['loadpage-big_file_diann']).

🤖 Fix all issues with AI agents
In `@R/module-loadpage-server.R`:
- Around line 233-248: The separator requirement is inconsistent between
Spectronaut and DIANN big-file modes; update the Spectronaut checks to match
DIANN by moving the sep check into spec_regular_file_ok (i.e. set
spec_regular_file_ok <- !isTRUE(input$big_file_spec) && !is.null(input$specdata)
&& !is.null(input$sep_specdata) and keep spec_big_file_ok <-
isTRUE(input$big_file_spec) && length(local_big_file_path()) > 0) and then
change the if to if(spec_regular_file_ok || spec_big_file_ok) {
enable("proceed1") } so big-file Spectronaut no longer requires sep selection
(mirrors diann behavior).
🧹 Nitpick comments (1)
R/module-loadpage-server.R (1)

50-98: Separator buttons are rendered even in large-file mode, where they may be irrelevant.

When big_file_spec or big_file_diann is TRUE, the user browses for a local file (likely Parquet). The CSV/TSV column-separator radio buttons are still shown (lines 72, 97) but serve no purpose in that path. Consider conditionally rendering separators only when the standard (non-big-file) UI is active to avoid user confusion.

Example for Spectronaut (similar change for DIANN)
-      tagList(ui_elements, create_separator_buttons(session$ns, "sep_specdata"))
+      if (is_web_server || !isTRUE(input$big_file_spec)) {
+        tagList(ui_elements, create_separator_buttons(session$ns, "sep_specdata"))
+      } else {
+        ui_elements
+      }

coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@R/module-loadpage-server.R`:
- Around line 100-143: The shared input IDs (max_feature_count,
filter_unique_peptides, aggregate_psms, filter_few_obs) used by
create_diann_large_bottom_ui() and create_spectronaut_large_bottom_ui() cause
state leakage between DIANN and Spectronaut; fix by giving each UI its own
namespaced IDs (e.g., diann_max_feature_count, diann_filter_unique_peptides,
diann_aggregate_psms, diann_filter_few_obs for DIANN and spec_max_feature_count,
spec_filter_unique_peptides, spec_aggregate_psms, spec_filter_few_obs for
Spectronaut) inside those functions and update all references in
module-loadpage-server.R (the input checks and local default variables used in
output$diann_options_ui and output$spectronaut_options_ui) to read the new
per-tool input names so defaults are applied correctly when switching file
types.

Comment on lines +100 to +143
output$diann_options_ui <- renderUI({
  req(input$filetype == 'diann', input$BIO != 'PTM')

  if (!is_web_server && isTRUE(input$big_file_diann)) {
    mbr_def <- if (is.null(input$diann_MBR)) TRUE else input$diann_MBR
    quant_col_def <- if (is.null(input$diann_quantificationColumn)) "Fragment.Quant.Corrected" else input$diann_quantificationColumn

    max_feature_def <- if (is.null(input$max_feature_count)) 100 else input$max_feature_count
    unique_peps_def <- if (is.null(input$filter_unique_peptides)) FALSE else input$filter_unique_peptides
    agg_psms_def <- if (is.null(input$aggregate_psms)) FALSE else input$aggregate_psms
    few_obs_def <- if (is.null(input$filter_few_obs)) FALSE else input$filter_few_obs

    tagList(
      create_diann_large_filter_options(session$ns, mbr_def, quant_col_def),
      create_diann_large_bottom_ui(session$ns, max_feature_def, unique_peps_def, agg_psms_def, few_obs_def)
    )
  } else {
    NULL
  }
})

output$spectronaut_options_ui <- renderUI({
  req(input$filetype == 'spec', input$BIO != 'PTM')

  if (!is_web_server && isTRUE(input$big_file_spec)) {
    qval_def <- if (is.null(input$filter_by_qvalue)) TRUE else input$filter_by_qvalue
    excluded_def <- if (is.null(input$filter_by_excluded)) FALSE else input$filter_by_excluded
    identified_def <- if (is.null(input$filter_by_identified)) FALSE else input$filter_by_identified
    cutoff_def <- if (is.null(input$qvalue_cutoff)) 0.01 else input$qvalue_cutoff

    max_feature_def <- if (is.null(input$max_feature_count)) 20 else input$max_feature_count
    unique_peps_def <- if (is.null(input$filter_unique_peptides)) FALSE else input$filter_unique_peptides
    agg_psms_def <- if (is.null(input$aggregate_psms)) FALSE else input$aggregate_psms
    few_obs_def <- if (is.null(input$filter_few_obs)) FALSE else input$filter_few_obs

    tagList(
      create_spectronaut_large_filter_options(session$ns, excluded_def, identified_def, qval_def),
      if (qval_def) create_spectronaut_qvalue_cutoff_ui(session$ns, cutoff_def),
      create_spectronaut_large_bottom_ui(session$ns, max_feature_def, unique_peps_def, agg_psms_def, few_obs_def)
    )
  } else {
    NULL
  }
})

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find the definitions of the bottom UI functions
fd -t f -e R . | xargs rg -l "create_diann_large_bottom_ui|create_spectronaut_large_bottom_ui"

Repository: Vitek-Lab/MSstatsShiny

Length of output: 156


🏁 Script executed:

#!/bin/bash
# Search for these function definitions
rg -n "create_diann_large_bottom_ui\s*<-|create_diann_large_bottom_ui\s*=" R/ --type=r -A 30

Repository: Vitek-Lab/MSstatsShiny

Length of output: 2591


🏁 Script executed:

#!/bin/bash
# Search for spectronaut function
rg -n "create_spectronaut_large_bottom_ui\s*<-|create_spectronaut_large_bottom_ui\s*=" R/ --type=r -A 30

Repository: Vitek-Lab/MSstatsShiny

Length of output: 2487


Shared input IDs between DIANN and Spectronaut large-file options cause stale defaults when switching file types.

Both create_diann_large_bottom_ui() and create_spectronaut_large_bottom_ui() create inputs with identical namespaced IDs: max_feature_count, filter_unique_peptides, aggregate_psms, and filter_few_obs. When a user switches from DIANN big-file mode (e.g., setting max_feature_count to 100) to Spectronaut big-file mode, the if (is.null(input$max_feature_count)) guard on lines 107 and 130 fails because the input is not NULL—it retains the DIANN value of 100. As a result, Spectronaut's default of 20 is never applied, and the user sees an unexpected carryover value.

While this is a minor UX issue in practice (switching between file types is an uncommon workflow), consider isolating the input IDs per file type (e.g., "diann_max_feature_count" vs "spec_max_feature_count") to prevent state leakage.

🤖 Prompt for AI Agents
In `@R/module-loadpage-server.R` around lines 100 - 143, The shared input IDs
(max_feature_count, filter_unique_peptides, aggregate_psms, filter_few_obs) used
by create_diann_large_bottom_ui() and create_spectronaut_large_bottom_ui() cause
state leakage between DIANN and Spectronaut; fix by giving each UI its own
namespaced IDs (e.g., diann_max_feature_count, diann_filter_unique_peptides,
diann_aggregate_psms, diann_filter_few_obs for DIANN and spec_max_feature_count,
spec_filter_unique_peptides, spec_aggregate_psms, spec_filter_few_obs for
Spectronaut) inside those functions and update all references in
module-loadpage-server.R (the input checks and local default variables used in
output$diann_options_ui and output$spectronaut_options_ui) to read the new
per-tool input names so defaults are applied correctly when switching file
types.
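The per-tool namespacing suggested above could be sketched as follows. The function names mirror the helpers mentioned in the review, but the argument lists and widget labels are assumptions, not the actual implementation:

```r
library(shiny)

# Sketch: DIANN-specific input IDs (diann_*) so values can't leak into
# the Spectronaut options when the user switches big-file modes.
create_diann_large_bottom_ui <- function(ns, max_feature_def, unique_peps_def,
                                         agg_psms_def, few_obs_def) {
  tagList(
    numericInput(ns("diann_max_feature_count"),
                 "Max feature count", value = max_feature_def),
    checkboxInput(ns("diann_filter_unique_peptides"),
                  "Filter unique peptides", value = unique_peps_def),
    checkboxInput(ns("diann_aggregate_psms"),
                  "Aggregate PSMs", value = agg_psms_def),
    checkboxInput(ns("diann_filter_few_obs"),
                  "Filter features with few observations", value = few_obs_def)
  )
}

# Spectronaut counterpart with spec_* IDs: the is.null() default guards in
# the server now see a fresh input on first render of each mode.
create_spectronaut_large_bottom_ui <- function(ns, max_feature_def, unique_peps_def,
                                               agg_psms_def, few_obs_def) {
  tagList(
    numericInput(ns("spec_max_feature_count"),
                 "Max feature count", value = max_feature_def),
    checkboxInput(ns("spec_filter_unique_peptides"),
                  "Filter unique peptides", value = unique_peps_def),
    checkboxInput(ns("spec_aggregate_psms"),
                  "Aggregate PSMs", value = agg_psms_def),
    checkboxInput(ns("spec_filter_few_obs"),
                  "Filter features with few observations", value = few_obs_def)
  )
}
```

The server-side reads in `diann_options_ui` and `spectronaut_options_ui` would then reference `input$diann_max_feature_count` and `input$spec_max_feature_count` respectively.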


# Call the big file conversion function from MSstatsConvert
converted_data <- MSstatsBig::bigDIANNtoMSstatsFormat(
input_file = local_big_file_path,
Two things:

  1. The annotation file will need to be incorporated into this function
  2. Add bigDIANNtoMSstatsFormat and bigSpectronauttoMSstatsFormat as dependencies in this file. And then you won't need to do MSstatsBig:: when calling the function.

Updating MSstatsShiny.R with dependencies
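The dependency wiring the reviewer asks for might look like this in MSstatsShiny.R (the roxygen placement is an assumption about the file's layout); MSstatsBig would also need an entry under `Imports:` in DESCRIPTION:

```r
# Sketch: declaring the MSstatsBig converters as imports so the calls
# no longer need the MSstatsBig:: prefix. Placement is hypothetical.
#' @importFrom MSstatsBig bigDIANNtoMSstatsFormat bigSpectronauttoMSstatsFormat
#' @importFrom dplyr collect
NULL
```

After `devtools::document()` regenerates NAMESPACE, `bigDIANNtoMSstatsFormat(...)` can be called unqualified in the server code.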
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@R/utils.R`:
- Around line 938-951: The generated big DIANN code omits the annotation
argument for MSstatsBig::bigDIANNtoMSstatsFormat; update the code building block
that constructs the bigDIANN call (where codes is appended with "data =
MSstatsBig::bigDIANNtoMSstatsFormat(") to include annotation = getAnnot(input)
as a parameter (matching how the runtime uses annotation = getAnnot(input)),
ensuring proper comma placement and quoting/concatenation consistent with the
surrounding parameters (e.g., alongside input_file, backend, MBR,
quantificationColumn, etc.) so the emitted call contains annotation =
getAnnot(input).
- Around line 595-645: The generated code in getDataCode omits the annotation
argument when emitting the bigDIANNtoMSstatsFormat() call; update getDataCode to
include a line that appends the annotation parameter (e.g., codes = paste(codes,
"  annotation = \"insert your annotation file path\",\\n", sep = "")) so the
generated call matches the runtime call (bigDIANNtoMSstatsFormat with annotation
= getAnnot(input)); ensure the string is properly escaped and includes the
trailing comma/newline and references input$big_file_browse/shared UI as before.

In `@tests/testthat/test_parallel_simulation.R`:
- Around line 1-159: This file is not a real test and triggers heavy
work/side-effects during R CMD check: top-level mclapply sleeps and allocates
large objects, writes parallel_log.txt, and has no expectations; fix by either
moving the script out of tests/testthat (e.g., inst/scripts/) or converting it
into a proper test_that block that (1) wraps execution in testthat::skip_on_ci()
and testthat::skip_on_os("windows") or similar guards, (2) uses
withr::local_tempdir()/tempfile() for any log files instead of writing to
working dir, (3) reduces or parameterizes the heavy allocation/sleep (or mock
.fitSurvival to avoid large numeric vectors) and (4) includes explicit expect_*
assertions (or expect_silent/expect_true) to produce pass/fail signals; update
functions MSstatsSummarizeSingleTMP_Leaky_Sim and
MSstatsSummarizeSingleTMP_Fixed_Sim usage in the test accordingly.
- Line 146: The test currently creates and writes to a hard-coded file
"parallel_log.txt" (via file.create and subsequent cat(..., file =
"parallel_log.txt", append = TRUE)), causing unwanted I/O side effects; update
the test to generate a temp path (e.g. tempfile() or withr::local_tempfile())
and use that variable for all file.create and cat calls (and any reads) so no
artifact is left in the working directory—ensure every reference to
"parallel_log.txt" in the test (file.create, cat, etc.) is replaced with the
temp file variable.

Comment on lines +595 to +645
if (isTRUE(input$big_file_diann)) {
# Logic for big DIANN files
# Parse the file path from shinyFiles input
volumes <- shinyFiles::getVolumes()()
path_info <- shinyFiles::parseFilePaths(volumes, input$big_file_browse)
local_big_file_path <- if (nrow(path_info) > 0) path_info$datapath else NULL

if (!is.numeric(input$max_feature_count) || is.na(input$max_feature_count) || input$max_feature_count <= 0) {
showNotification("Error: max_feature_count must be a positive number.", type = "error")
shinybusy::remove_modal_spinner()
return(NULL)
}

if (is.null(local_big_file_path) || !file.exists(local_big_file_path)) {
showNotification("Error: The selected file does not exist or is not readable.", type = "error")
shinybusy::remove_modal_spinner()
return(NULL)
}

shinybusy::update_modal_spinner(text = "Processing large DIANN file...")

# Call the big file conversion function from MSstatsConvert
converted_data <- bigDIANNtoMSstatsFormat(
input_file = local_big_file_path,
annotation = getAnnot(input),
output_file_name = "output_file.csv",
backend = "arrow",
MBR = isTRUE(input$diann_MBR),
quantificationColumn = input$diann_quantificationColumn,
max_feature_count = input$max_feature_count,
filter_unique_peptides = input$filter_unique_peptides,
aggregate_psms = input$aggregate_psms,
filter_few_obs = input$filter_few_obs
)

# Attempt to load the data into memory.
mydata <- tryCatch({
dplyr::collect(converted_data)
}, error = function(e) {
showNotification(
paste("Memory Error: The dataset is too large to process in-memory.", e$message),
type = "error",
duration = NULL
)
return(NULL)
})

if (is.null(mydata)) {
shinybusy::remove_modal_spinner()
return(NULL)
}
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

rg -n 'big_file_browse' --type=r -C2

Repository: Vitek-Lab/MSstatsShiny

Length of output: 2953


🏁 Script executed:

rg -n 'bigDIANNtoMSstatsFormat' --type=r -C5

Repository: Vitek-Lab/MSstatsShiny

Length of output: 4398


🏁 Script executed:

sed -n '938,955p' R/utils.R

Repository: Vitek-Lab/MSstatsShiny

Length of output: 1306


🏁 Script executed:

# Check if both Spectronaut and DIANN big file UIs are conditionally rendered or always active
rg -n 'create_spectronaut_large_file_ui|create_diann_large_file_ui' -C3

Repository: Vitek-Lab/MSstatsShiny

Length of output: 2795


Generated code for DIANN big-file conversion is missing the annotation parameter.

The runtime code at line 619 correctly includes annotation = getAnnot(input) in the bigDIANNtoMSstatsFormat() call. However, the generated code (lines 938–955 in the getDataCode function) omits this parameter. The generated code should include:

codes = paste(codes, "  annotation = \"insert your annotation file path\",\n", sep = "")

The shared input$big_file_browse input ID is appropriate—both Spectronaut and DIANN large-file UIs are conditionally rendered (only one active at a time based on input$big_file_spec or input$big_file_diann), so the single shinyFileChoose binding works correctly.

🤖 Prompt for AI Agents
In `@R/utils.R` around lines 595 - 645, The generated code in getDataCode omits
the annotation argument when emitting the bigDIANNtoMSstatsFormat() call; update
getDataCode to include a line that appends the annotation parameter (e.g., codes
= paste(codes, "  annotation = \"insert your annotation file path\",\\n", sep =
"")) so the generated call matches the runtime call (bigDIANNtoMSstatsFormat
with annotation = getAnnot(input)); ensure the string is properly escaped and
includes the trailing comma/newline and references input$big_file_browse/shared
UI as before.

Comment on lines +938 to +951
if (isTRUE(input$big_file_diann)) {
codes = paste(codes, "library(MSstatsBig)\n", sep = "")
codes = paste(codes, "data = MSstatsBig::bigDIANNtoMSstatsFormat(\n", sep = "")
codes = paste(codes, " input_file = \"insert your large DIANN file path\",\n", sep = "")
codes = paste(codes, " output_file_name = \"output_file.csv\",\n", sep = "")
codes = paste(codes, " backend = \"arrow\",\n", sep = "")
codes = paste(codes, " MBR = ", isTRUE(input$diann_MBR), ",\n", sep = "")
codes = paste(codes, " quantificationColumn = \"", input$diann_quantificationColumn, "\",\n", sep = "")
codes = paste(codes, " max_feature_count = ", input$max_feature_count, ",\n", sep = "")
codes = paste(codes, " filter_unique_peptides = ", input$filter_unique_peptides, ",\n", sep = "")
codes = paste(codes, " aggregate_psms = ", input$aggregate_psms, ",\n", sep = "")
codes = paste(codes, " filter_few_obs = ", input$filter_few_obs, "\n", sep = "")
codes = paste(codes, ")\n", sep = "")
codes = paste(codes, "data = dplyr::collect(data)\n", sep = "")
⚠️ Potential issue | 🟡 Minor

Generated code for big DIANN is missing the annotation parameter.

The runtime call at line 619 passes annotation = getAnnot(input), but the code snippet emitted here omits it. Users attempting to reproduce the analysis with the generated code will get incorrect results or an error if annotation is required.

Proposed fix
         codes = paste(codes, "data = MSstatsBig::bigDIANNtoMSstatsFormat(\n", sep = "")
         codes = paste(codes, "  input_file = \"insert your large DIANN file path\",\n", sep = "")
+        codes = paste(codes, "  annotation = read.csv(\"insert your annotation filepath\"),\n", sep = "")
         codes = paste(codes, "  output_file_name = \"output_file.csv\",\n", sep = "")
🤖 Prompt for AI Agents
In `@R/utils.R` around lines 938 - 951, The generated big DIANN code omits the
annotation argument for MSstatsBig::bigDIANNtoMSstatsFormat; update the code
building block that constructs the bigDIANN call (where codes is appended with
"data = MSstatsBig::bigDIANNtoMSstatsFormat(") to include annotation =
getAnnot(input) as a parameter (matching how the runtime uses annotation =
getAnnot(input)), ensuring proper comma placement and quoting/concatenation
consistent with the surrounding parameters (e.g., alongside input_file, backend,
MBR, quantificationColumn, etc.) so the emitted call contains annotation =
getAnnot(input).
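Combining the two comments, the emitting block could be revised as sketched below. This is an illustration of the proposed change, not the merged code; the placeholder strings follow the existing snippet's convention:

```r
# Sketch: insert the annotation line so the generated snippet stays in
# sync with the runtime call, which passes annotation = getAnnot(input).
codes = ""
codes = paste(codes, "library(MSstatsBig)\n", sep = "")
codes = paste(codes, "data = MSstatsBig::bigDIANNtoMSstatsFormat(\n", sep = "")
codes = paste(codes, "  input_file = \"insert your large DIANN file path\",\n", sep = "")
codes = paste(codes, "  annotation = read.csv(\"insert your annotation file path\"),\n", sep = "")
codes = paste(codes, "  output_file_name = \"output_file.csv\",\n", sep = "")
codes = paste(codes, "  backend = \"arrow\",\n", sep = "")
# ... remaining parameters (MBR, quantificationColumn, max_feature_count,
# filter_unique_peptides, aggregate_psms, filter_few_obs) unchanged ...
codes = paste(codes, ")\n", sep = "")
codes = paste(codes, "data = dplyr::collect(data)\n", sep = "")
```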

Comment on lines +1 to +159
library(data.table)
library(survival)
library(parallel)

# -------------------------------------------------------------------------
# 1. Mock Internal MSstats Functions
# -------------------------------------------------------------------------

.fitSurvival <- function(data, iterations) {
tryCatch({
# Create a real survreg object to simulate memory usage
# Use a subset of data to ensure stability/speed, as we only care about object size
fit <- survreg(Surv(newABUNDANCE, !cen, type="left") ~ 1,
data = head(data, 500), dist = "gaussian")
# Artificially bloat the object to simulate a complex model (approx 100MB)
# Real MSstats models can be very large due to model frames and environments
fit$bloat <- numeric(12.5 * 1024 * 1024)
return(fit)
}, error = function(e) return(NULL))
}

.isSummarizable <- function(data, remove50missing) return(data)

.runTukey <- function(data, is_labeled, censored_symbol, remove50missing) {
return(data.table(Protein = "TestProtein", LogIntensities = mean(data$newABUNDANCE, na.rm=TRUE)))
}

# -------------------------------------------------------------------------
# 2. Define Functions with "Work Simulation" (Sleep)
# -------------------------------------------------------------------------

# LEAKY VERSION
MSstatsSummarizeSingleTMP_Leaky_Sim <- function (single_protein, impute, censored_symbol, remove50missing, aft_iterations = 90) {
# ... Setup ...
newABUNDANCE = n_obs = n_obs_run = RUN = FEATURE = LABEL = NULL
predicted = censored = NULL
cols = intersect(colnames(single_protein), c("newABUNDANCE", "cen", "RUN", "FEATURE", "ref"))
single_protein = single_protein[(n_obs > 1 & !is.na(n_obs)) & (n_obs_run > 0 & !is.na(n_obs_run))]
if (nrow(single_protein) == 0) return(list(NULL, NULL))
single_protein[, `:=`(RUN, factor(RUN))]
single_protein[, `:=`(FEATURE, factor(FEATURE))]

if (impute & any(single_protein[["censored"]])) {
converged = TRUE
survival_fit = withCallingHandlers({
.fitSurvival(single_protein[LABEL == "L", cols, with = FALSE], aft_iterations)
}, warning = function(w) { if (grepl("converge", conditionMessage(w), ignore.case = TRUE)) converged <<- FALSE })

if (converged && !is.null(survival_fit)) {
single_protein[, `:=`(predicted, predict(survival_fit, newdata = .SD))]
} else {
single_protein[, `:=`(predicted, NA_real_)]
}

# --- LEAK SIMULATION ---
# The object 'survival_fit' is still in memory here.
# We simulate "doing other work" (predictions, formatting) by sleeping.
Sys.sleep(1)

# Report Memory Usage of this Worker
mem_used <- sum(gc()[,2])
msg <- sprintf("[Worker %d] LEAKY State - Holding Memory: %.2f MB\n", Sys.getpid(), mem_used)
cat(msg)
cat(msg, file = "parallel_log.txt", append = TRUE)

single_protein[, `:=`(predicted, ifelse(censored & (LABEL == "L"), predicted, NA))]
single_protein[, `:=`(newABUNDANCE, ifelse(censored & LABEL == "L", predicted, newABUNDANCE))]
survival = single_protein[, c(cols, "predicted"), with = FALSE]
} else {
survival = single_protein[, cols, with = FALSE]
survival[, `:=`(predicted, NA)]
}
# ... Finalize ...
single_protein = .isSummarizable(single_protein, remove50missing)
if (is.null(single_protein)) return(list(NULL, NULL))
result = .runTukey(single_protein, TRUE, censored_symbol, remove50missing)
list(result, survival)
}

# FIXED VERSION
MSstatsSummarizeSingleTMP_Fixed_Sim <- function (single_protein, impute, censored_symbol, remove50missing, aft_iterations = 90) {
# ... Setup ...
newABUNDANCE = n_obs = n_obs_run = RUN = FEATURE = LABEL = NULL
predicted = censored = NULL
cols = intersect(colnames(single_protein), c("newABUNDANCE", "cen", "RUN", "FEATURE", "ref"))
single_protein = single_protein[(n_obs > 1 & !is.na(n_obs)) & (n_obs_run > 0 & !is.na(n_obs_run))]
if (nrow(single_protein) == 0) return(list(NULL, NULL))
single_protein[, `:=`(RUN, factor(RUN))]
single_protein[, `:=`(FEATURE, factor(FEATURE))]

if (impute & any(single_protein[["censored"]])) {
converged = TRUE
survival_fit = withCallingHandlers({
.fitSurvival(single_protein[LABEL == "L", cols, with = FALSE], aft_iterations)
}, warning = function(w) { if (grepl("converge", conditionMessage(w), ignore.case = TRUE)) converged <<- FALSE })

if (converged && !is.null(survival_fit)) {
single_protein[, `:=`(predicted, predict(survival_fit, newdata = .SD))]
} else {
single_protein[, `:=`(predicted, NA_real_)]
}

# --- FIX APPLIED ---
rm(survival_fit)

# --- FIXED SIMULATION ---
# We simulate "doing other work" by sleeping.
Sys.sleep(1)

# Report Memory Usage of this Worker
mem_used <- sum(gc()[,2])
msg <- sprintf("[Worker %d] FIXED State - Holding Memory: %.2f MB\n", Sys.getpid(), mem_used)
cat(msg)
cat(msg, file = "parallel_log.txt", append = TRUE)

single_protein[, `:=`(predicted, ifelse(censored & (LABEL == "L"), predicted, NA))]
single_protein[, `:=`(newABUNDANCE, ifelse(censored & LABEL == "L", predicted, newABUNDANCE))]
survival = single_protein[, c(cols, "predicted"), with = FALSE]
} else {
survival = single_protein[, cols, with = FALSE]
survival[, `:=`(predicted, NA)]
}
# ... Finalize ...
single_protein = .isSummarizable(single_protein, remove50missing)
if (is.null(single_protein)) return(list(NULL, NULL))
result = .runTukey(single_protein, TRUE, censored_symbol, remove50missing)
list(result, survival)
}

# -------------------------------------------------------------------------
# 3. Run Simulation
# -------------------------------------------------------------------------

set.seed(123)
n_rows <- 20000
dt <- data.table(
newABUNDANCE = rnorm(n_rows, 20, 5),
censored = sample(c(TRUE, FALSE), n_rows, replace=TRUE, prob=c(0.3, 0.7)),
LABEL = "L", RUN = sample(1:20, n_rows, replace=TRUE), FEATURE = sample(1:500, n_rows, replace=TRUE),
n_obs = 5, n_obs_run = 5, cen = FALSE, ref = "ref"
)
dt$cen <- dt$censored
dt$newABUNDANCE[dt$censored] <- dt$newABUNDANCE[dt$censored] - 5

# Clear log file
file.create("parallel_log.txt")

cat("\n--- Simulating LEAKY Parallel Execution ---\n")
# We run 2 cores. Both will hit the 'sleep' at the same time.
# Both will report HIGH memory because they haven't cleaned up yet.
invisible(mclapply(1:2, function(i) MSstatsSummarizeSingleTMP_Leaky_Sim(copy(dt), TRUE, "NA", FALSE), mc.cores = 2))

cat("\n--- Simulating FIXED Parallel Execution ---\n")
# We run 2 cores. Both will hit the 'sleep' at the same time.
# Both will report LOW memory because they cleaned up BEFORE sleeping.
invisible(mclapply(1:2, function(i) MSstatsSummarizeSingleTMP_Fixed_Sim(copy(dt), TRUE, "NA", FALSE), mc.cores = 2))

cat("\n--- Log File Content ---\n")
cat(readLines("parallel_log.txt"), sep = "\n")
⚠️ Potential issue | 🟠 Major

This script has no test_that() blocks or assertions — it isn't an actual test.

Since it lives under tests/testthat/, testthat will source() it during R CMD check, causing it to:

  1. Execute all top-level code unconditionally — including two mclapply calls with Sys.sleep(1) each, adding ~2+ seconds of wall-clock time per run.
  2. Allocate ~200 MB of bloated objects (numeric(12.5 * 1024 * 1024) ≈ 100 MB each, two workers) which could cause CI OOM failures.
  3. Write parallel_log.txt into the working directory — a side effect that violates testthat conventions (use withr::local_tempdir() or tempfile()).
  4. Silently degrade on Windows: mclapply with mc.cores > 1 falls back to serial execution on Windows, so the parallel memory behavior isn't actually exercised.
  5. Produce no pass/fail signal — without expect_*() calls, this provides no regression protection.

Additionally, this file appears unrelated to the stated PR objective (adding big-file DIANN support). If this is a diagnostic tool for a separate memory-leak investigation, consider moving it out of the test suite (e.g., into inst/scripts/ or a standalone vignette) or converting it into a proper test_that() with skip guards and assertions.

🤖 Prompt for AI Agents
In `@tests/testthat/test_parallel_simulation.R` around lines 1 - 159, This file is
not a real test and triggers heavy work/side-effects during R CMD check:
top-level mclapply sleeps and allocates large objects, writes parallel_log.txt,
and has no expectations; fix by either moving the script out of tests/testthat
(e.g., inst/scripts/) or converting it into a proper test_that block that (1)
wraps execution in testthat::skip_on_ci() and testthat::skip_on_os("windows") or
similar guards, (2) uses withr::local_tempdir()/tempfile() for any log files
instead of writing to working dir, (3) reduces or parameterizes the heavy
allocation/sleep (or mock .fitSurvival to avoid large numeric vectors) and (4)
includes explicit expect_* assertions (or expect_silent/expect_true) to produce
pass/fail signals; update functions MSstatsSummarizeSingleTMP_Leaky_Sim and
MSstatsSummarizeSingleTMP_Fixed_Sim usage in the test accordingly.
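One way the prompt above could be realized, sketched under the assumption that the simulation functions are refactored to accept a log-file path (a hypothetical signature change, not the current code):

```r
library(testthat)

test_that("fixed worker releases the survreg fit before the simulated work phase", {
  skip_on_cran()
  skip_on_ci()
  skip_on_os("windows")  # mclapply with mc.cores > 1 is serial on Windows

  # Temp log instead of a hard-coded parallel_log.txt in the working dir
  log_file <- withr::local_tempfile(fileext = ".txt")
  file.create(log_file)

  # Assumes MSstatsSummarizeSingleTMP_Fixed_Sim gains a log_file argument
  # so workers write to the temp path rather than the working directory.
  invisible(parallel::mclapply(
    1:2,
    function(i) MSstatsSummarizeSingleTMP_Fixed_Sim(
      data.table::copy(dt), TRUE, "NA", FALSE, log_file = log_file),
    mc.cores = 2))

  # Explicit pass/fail signal instead of eyeballing printed memory numbers
  log_lines <- readLines(log_file)
  expect_true(any(grepl("FIXED State", log_lines)))
})
```

Reducing the `numeric(12.5 * 1024 * 1024)` bloat and the `Sys.sleep(1)` calls (or parameterizing them) would keep the check fast on CI.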

dt$newABUNDANCE[dt$censored] <- dt$newABUNDANCE[dt$censored] - 5

# Clear log file
file.create("parallel_log.txt")
⚠️ Potential issue | 🟡 Minor

File I/O side effect: parallel_log.txt written to working directory.

file.create("parallel_log.txt") and the subsequent cat(..., file = "parallel_log.txt", append = TRUE) calls (lines 64, 114) write to the current working directory. During R CMD check, this is typically a temp directory managed by the test harness, but it still leaves artifacts. Use tempfile() and pass the path explicitly, or use withr::local_tempfile() if this is converted to a proper test.

🤖 Prompt for AI Agents
In `@tests/testthat/test_parallel_simulation.R` at line 146, The test currently
creates and writes to a hard-coded file "parallel_log.txt" (via file.create and
subsequent cat(..., file = "parallel_log.txt", append = TRUE)), causing unwanted
I/O side effects; update the test to generate a temp path (e.g. tempfile() or
withr::local_tempfile()) and use that variable for all file.create and cat calls
(and any reads) so no artifact is left in the working directory—ensure every
reference to "parallel_log.txt" in the test (file.create, cat, etc.) is replaced
with the temp file variable.
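The temp-file substitution is a small mechanical change; a minimal sketch of the pattern (variable name is illustrative):

```r
# One temp path, reused by every log write/read in place of "parallel_log.txt"
log_file <- tempfile(fileext = ".txt")
file.create(log_file)
msg <- sprintf("[Worker %d] example entry\n", Sys.getpid())
cat(msg, file = log_file, append = TRUE)
cat(readLines(log_file), sep = "\n")
unlink(log_file)  # explicit cleanup when not using withr helpers
```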
