@bieryAtFnal bieryAtFnal commented Oct 8, 2025

Description

A few weeks ago, I noticed messages in one or more Slack channels reporting that the dfmodules integtests consumed all of the memory on np04-srv-016 when they were run on that computer. I believe that the biggest offender was hdf5_compression_test.

The first change in this set was to reduce the latency buffer size that is used in the hdf5_compression_test from a very large value to a more reasonable value. I don't remember where that very large value came from, but it was probably left over from a debugging session when the test was first developed. The test seems to run fine with the new smaller value.

Beyond that, I realized that we could/should make the dfmodules tests more user friendly when they are run on computers that have limited resources. In particular, we should make consistent use of the pattern that we have in other integtests in which we check for sufficient resources and skip the test if insufficient resources are available on the current computer.

To avoid copy/pasting all of the resource-checking code among all of the integtests, I created a helper class in the integrationtest repo that does the work of checking the requested resources and formatting strings that describe any problems that are found. This means that these changes are dependent on ones in the integrationtest repo.

The general pattern for using the helper class is the following:

  • create an instance of the class
  • set minimum values for the quantities that we want to ensure are sufficient, e.g. one or more of CPU count, free memory, total memory, free disk space, and total disk space.
  • skip the running of the usual drunc command if insufficient resources are available (run a very simple command like "wait 1" instead)
  • in each of the pytest tests, check whether sufficient resources were found, and use pytest.skip to skip the test if not
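
As a rough illustration of the pattern in the list above, here is a minimal, self-contained sketch. It is not the helper class from the integrationtest repo: the class name `ResourceCheck`, its methods, the `choose_commands` function, and the numeric minimums are all hypothetical, and only CPU count and free disk space are checked (using the Python standard library). The memory checks would follow the same shape.

```python
import os
import shutil


class ResourceCheck:
    """Hypothetical stand-in for the helper class (not the integrationtest API)."""

    def __init__(self):
        # Human-readable descriptions of every shortfall found so far.
        self.problems = []

    def require_cpu_count(self, minimum):
        found = os.cpu_count() or 0
        if found < minimum:
            self.problems.append(f"CPU count {found} is below the required {minimum}")

    def require_free_disk_gb(self, path, minimum):
        found = shutil.disk_usage(path).free / 1024**3
        if found < minimum:
            self.problems.append(
                f"free disk space on {path} ({found:.1f} GB) is below the required {minimum} GB"
            )

    @property
    def sufficient(self):
        return not self.problems

    def report(self):
        return "; ".join(self.problems)


def choose_commands(check, usual_commands):
    """Return the usual command list, or a trivial one if resources are short."""
    return usual_commands if check.sufficient else ["wait 1"]


# First two bullets: create the checker and set the minimums (illustrative values).
check = ResourceCheck()
check.require_cpu_count(2)
check.require_free_disk_gb(".", 1)

# Third bullet: the list passed in is a placeholder for the test's real drunc commands.
commands = choose_commands(check, ["<usual drunc commands>"])
```

For the fourth bullet, each pytest test function would then start with something like `if not check.sufficient: pytest.skip(check.report())`, so that the skip reason names the resource that fell short.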

I've run these changes on daq.fnal.gov, np04-srv-002, np04-srv-005, and np04-srv-016, and everything worked, with certain tests being appropriately skipped when resources were limited. The one caveat is that hdf5_compression_test and max_file_size_test currently produce logfile complaints about gRPC ping messages.

Testing Suggestions

Here are suggested steps for testing these changes:

DATE_PREFIX=`date '+%d%b'`
TIME_SUFFIX=`date '+%H%M'`

source /cvmfs/dunedaq.opensciencegrid.org/setup_dunedaq.sh
setup_dbt latest
dbt-create -n NFD_DEV_251008_A9 ${DATE_PREFIX}FDDevTest_${TIME_SUFFIX}
cd ${DATE_PREFIX}FDDevTest_${TIME_SUFFIX}/sourcecode

git clone https://github.com/DUNE-DAQ/daqsystemtest.git -b develop
git clone https://github.com/DUNE-DAQ/dfmodules.git -b kbiery/integtest_resource_protections
cd ..

dbt-workarea-env

git clone https://github.com/DUNE-DAQ/integrationtest.git -b kbiery/integtest_resource_protections
cd integrationtest
pip install -U .
cd ..

dbt-build -j 12
dbt-workarea-env

dfmodules_integtest_bundle.sh

echo
echo " *** Note that the tests worked, except for maybe hdf5_compression_test and max_file_size_test,"
echo " *** which have been reporting a warning message in the logfiles from gRPC."

Correlated Issues and/or PRs

The same feature branch name, kbiery/integtest_resource_protections, is used in both repositories (dfmodules and integrationtest).

Useful coordination

No special coordination will be needed for merging these changes to develop branches beyond announcing the merges on the appropriate Slack channel.

@eflumerf eflumerf left a comment

Ran dfmodules_integtest_bundle.sh, things look good

@bieryAtFnal bieryAtFnal marked this pull request as ready for review October 9, 2025 11:32
@bieryAtFnal bieryAtFnal merged commit 6143e06 into develop Oct 11, 2025
3 checks passed
@bieryAtFnal bieryAtFnal deleted the kbiery/integtest_resource_protections branch October 11, 2025 19:56