Draft
90 commits
43570a4
use consistent server name
oleschwen Aug 13, 2025
4c2ab1a
scripts to start a dummy training from the startup kits
oleschwen Aug 13, 2025
92ccb25
skeleton for further tests
oleschwen Aug 14, 2025
cc2f8e5
added preflight checks (without checking their output so far)
oleschwen Aug 15, 2025
8f12780
check if (source code for) license is available on github and if READ…
oleschwen Aug 18, 2025
76ebb4c
check if second startup kit can be built and contains expected files
oleschwen Aug 18, 2025
e4864b5
check output of preflight checks
oleschwen Aug 18, 2025
58cfd91
check captured console output of swarm training and files created
oleschwen Aug 18, 2025
b5f9ab6
Merge branch 'main' into 104-testing-startup-kits
oleschwen Aug 20, 2025
a67405e
test pushing and pulling image to/from local docker registry
oleschwen Aug 20, 2025
1e9cdcf
Revert "test pushing and pulling image to/from local docker registry"…
oleschwen Aug 20, 2025
c039c10
use defined variable rather than hard-coded name
oleschwen Aug 20, 2025
7dcb893
Merge branch 'main' into 104-testing-startup-kits
oleschwen Aug 21, 2025
53be096
Merge branch 'main' into 104-testing-startup-kits
oleschwen Aug 25, 2025
1cb5ad0
renamed file for running integration tests
oleschwen Aug 27, 2025
2d3bdac
refactored tests to be run in Docker: moved to separate scripts
oleschwen Aug 27, 2025
93a0c4e
Merge branch 'main' into 104-testing-startup-kits
oleschwen Aug 27, 2025
79cf7d6
moved method to script for running integration tests
oleschwen Aug 27, 2025
89b6d59
moved generating two sets of startup kits to script for integration t…
oleschwen Aug 27, 2025
7590b26
dedicated function to generate synthetic data
oleschwen Aug 27, 2025
4a7c078
more meaningful name for method
oleschwen Aug 27, 2025
e02911c
moved/merged cleanup to script for integration tests
oleschwen Aug 27, 2025
d67fa92
moved test for running Docker/GPU preflight check, i.e., extended exi…
oleschwen Aug 28, 2025
c9ba141
moved method for running data access preflight check
oleschwen Aug 28, 2025
b1c72f5
refactored so that individual steps (including cleanup) can be run se…
oleschwen Aug 28, 2025
580b0c0
integrated 3dcnn training in simulation mode in Docker in test
oleschwen Aug 28, 2025
0d85a88
let scripts fail on error
oleschwen Aug 28, 2025
9602dda
Merge branch 'main' into 104-testing-startup-kits
oleschwen Sep 8, 2025
f015f8b
moved (last remaining) test of server and clients to script for integ…
oleschwen Sep 8, 2025
c6b9ec6
removed unnecessary block
oleschwen Sep 8, 2025
da14feb
completed "all" section
oleschwen Sep 8, 2025
c30a39f
consistently output what is being run
oleschwen Sep 8, 2025
452621d
running simulation mode of 3D CNN training does not work yet, comment…
oleschwen Sep 9, 2025
92a2460
expanded run_local_tests and moved unit test script to more suitable …
oleschwen Sep 9, 2025
ba7363d
disabled NVFlare unit tests as before
oleschwen Sep 9, 2025
ddc3e7e
updated developer readme
oleschwen Sep 9, 2025
6ae32c2
run integration tests in CI in one go
oleschwen Sep 9, 2025
5809387
renamed expect script and moved it to more suitable location
oleschwen Sep 9, 2025
6a4f5e9
removed step using script that no longer exists
oleschwen Sep 9, 2025
1d5c43f
trying to enable test of 3D CNN in simulation mode
oleschwen Sep 9, 2025
efc53b3
moved check of name resolution to where it is needed
oleschwen Sep 9, 2025
ef9150e
removed unnecessary step
oleschwen Sep 9, 2025
e9117fa
made tests that do not use the startup kits callable individually
oleschwen Sep 9, 2025
ebaba99
call tests as separate steps in workflow
oleschwen Sep 9, 2025
1faeefb
arguments for docker run like in docker.sh from startup scripts to cr…
oleschwen Sep 9, 2025
2b89a83
write coverage file to location outside code directory
oleschwen Sep 9, 2025
b1d3102
ensure directory exists
oleschwen Sep 9, 2025
e0b6927
renamed server "localhost" so that it does not need mapping to an IP …
oleschwen Sep 9, 2025
fb60c8e
allow local user to create home directory
oleschwen Sep 9, 2025
ce492e1
avoid name clashes of Docker containers
oleschwen Sep 9, 2025
80c1428
fixed replacement of version identifiers
oleschwen Sep 9, 2025
89de533
fixed missing closing "
oleschwen Sep 10, 2025
f1df349
wait longer so that sys_info sees both clients
oleschwen Sep 10, 2025
450603e
check that models for dummy training are small
oleschwen Sep 10, 2025
5af06b8
added check whether job ID is logged by server
oleschwen Sep 10, 2025
17d8d77
use defined container name for container running admin console
oleschwen Sep 10, 2025
8d527d0
use correct container names in `docker kill`
oleschwen Sep 10, 2025
40abbe6
updated instructions on building startup kits
oleschwen Sep 10, 2025
1c9bbc9
check for keywords in documentation
oleschwen Sep 10, 2025
2d06b68
clean up temp dir in case more than this test is run in a container
oleschwen Sep 10, 2025
b38863c
updated documentation of test output
oleschwen Sep 10, 2025
5e2e3bd
Merge branch 'main' into 104-testing-startup-kits
oleschwen Sep 11, 2025
9ca51d3
check that aggregation and metrics are communicated
oleschwen Sep 15, 2025
b87f534
check number of rounds
oleschwen Sep 15, 2025
1030d6c
check that dummy training ApC is available
oleschwen Sep 15, 2025
ce1207e
temporarily removed failing test from CI workflow
oleschwen Sep 16, 2025
e086619
test listing licenses
oleschwen Sep 16, 2025
79b1b0b
Added test of pushing image to local registry (in separate Docker con…
oleschwen Sep 17, 2025
90794d8
removed lengthy test step that does not provide much value from CI pi…
oleschwen Sep 17, 2025
8278720
more speaking names of the CI test steps
oleschwen Sep 17, 2025
f4470b9
fixed syntax of workflow
oleschwen Sep 17, 2025
095c1b7
do not need -it for listing licenses
oleschwen Sep 17, 2025
02ff284
implemented test that client with incorrect startup kit cannot connect
oleschwen Sep 19, 2025
e3533ce
Merge branch 'main' into 104-testing-startup-kits
oleschwen Sep 22, 2025
71cc567
added file forgotten in 02ff2848adb34e11e1ae10caeb68579a5c7bb965
oleschwen Sep 23, 2025
e027f6f
Merge branch 'main' into 104-testing-startup-kits
oleschwen Sep 23, 2025
01ce7a1
Merge branch 'main' into 104-testing-startup-kits
oleschwen Sep 24, 2025
25e7322
Implemented test setup for swarm nodes to connect to locally hosted V…
oleschwen Sep 24, 2025
cc7f7d8
ensure that VPN docker image exists before starting server
oleschwen Sep 24, 2025
f802a09
avoid need for admin rights for client nodes: allow local user to sta…
oleschwen Sep 30, 2025
f59465e
updated pinned versions
oleschwen Sep 30, 2025
04516dc
added VPN IPs for production server, use same "--add-host" mechanism …
oleschwen Sep 30, 2025
9780daf
made path from where VPN credentials are copied more configurable
oleschwen Sep 30, 2025
2b84145
pass directory with VPN credentials as command-line argument
oleschwen Sep 30, 2025
0d1f098
extended documentation how to build startup kits with VPN credentials
oleschwen Sep 30, 2025
4ebad3a
CI script now needs additional argument for building image
oleschwen Sep 30, 2025
cf0dec9
updated apt package versions
oleschwen Sep 30, 2025
b34c83b
Merge branch 'main' into dev-122-vpn-from-within-container
oleschwen Oct 2, 2025
29561e1
build only one test image (without Docker cache), added argument (fol…
oleschwen Oct 2, 2025
837c42d
corrected comment
oleschwen Oct 2, 2025
73 changes: 41 additions & 32 deletions .github/workflows/pr-test.yaml
@@ -22,7 +22,6 @@ jobs:
SITE_NAME: UKA
PYTHONUNBUFFERED: 1


steps:
- name: Checkout repository (with submodules)
uses: actions/checkout@v3
@@ -37,50 +36,60 @@ jobs:
echo "VERSION=$VERSION"
echo "version=$VERSION" >> $GITHUB_OUTPUT

- name: Build Docker image for real project (MEVIS)
run: |
chmod +x buildDockerImageAndStartupKits.sh
./buildDockerImageAndStartupKits.sh -p application/provision/project_MEVIS_test.yml
- name: Build Docker image and startup kits for test project
run: ./buildDockerImageAndStartupKits.sh -p tests/provision/dummy_project_for_testing.yml -c tests/local_vpn/client_configs

- name: Show workspace path for MEVIS project
- name: Show workspace path for test project
run: |
echo "WORKSPACE_PATH: ${{ env.WORKSPACE_PATH }}"
find workspace -maxdepth 1 -type d -name "odelia_*_MEVIS_test" || echo "No workspace found"
find workspace -maxdepth 1 -type d -name "odelia_*_dummy_project_for_testing" || echo "No workspace found"

- name: Run integration test checking documentation on github
continue-on-error: false
run: |
./runIntegrationTests.sh check_files_on_github

- name: Build Docker image and dummy startup kits
run: ./buildDockerImageAndStartupKits.sh -p tests/provision/dummy_project_for_testing.yml --use-docker-cache
- name: Run controller unit tests
continue-on-error: false
run: |
./runIntegrationTests.sh run_unit_tests_controller

- name: Prepare dummy trainings
continue-on-error: true
- name: Run dummy training standalone
continue-on-error: false
run: |
./runTestsInDocker.sh prepare_dummy_trainings
echo "Dummy training project prepared"
./runIntegrationTests.sh run_dummy_training_standalone

- name: Run dummy training
- name: Run dummy training in simulation mode
continue-on-error: false
run: |
./runTestsInDocker.sh run_dummy_training
echo "Dummy training finished"
echo "=== Checking log output ==="
ls -lh workspace/*/prod_00/client_A/logs || echo "No logs found for dummy training"
./runIntegrationTests.sh run_dummy_training_simulation_mode

- name: Run 3D CNN tests
- name: Run dummy training in proof-of-concept mode
continue-on-error: false
run: |
./runTestsInDocker.sh run_3dcnn_tests
echo "3D CNN tests check finished"
echo "=== Checking synthetic log output ==="
ls -lh workspace/*/prod_00/client_A/logs || echo "No logs found for 3D CNN tests"
./runIntegrationTests.sh run_dummy_training_poc_mode

- name: Run Unit Tests inside Docker
continue-on-error: true
- name: Run 3DCNN training in simulation mode
continue-on-error: false
run: |
./runTestsInDocker.sh run_tests
echo "=== [LOG CHECK] ==="
docker logs $(docker ps -a -q --latest) | grep -i "error" && echo "Error found in logs" || echo "No error found"
./runIntegrationTests.sh run_3dcnn_simulation_mode

- name: Cleanup training artifacts
continue-on-error: true
- name: Run integration test creating startup kits
continue-on-error: false
run: |
./runIntegrationTests.sh create_startup_kits

- name: Run integration test listing licenses
continue-on-error: false
run: |
./runIntegrationTests.sh run_list_licenses

- name: Run integration test Docker GPU preflight check
continue-on-error: false
run: |
./runIntegrationTests.sh run_docker_gpu_preflight_check

- name: Run integration test Data access preflight check
continue-on-error: false
run: |
./runTestsInDocker.sh cleanup
echo "Cleanup finished"
./runIntegrationTests.sh run_data_access_preflight_check
21 changes: 14 additions & 7 deletions _buildStartupKits.sh
@@ -2,25 +2,32 @@

set -euo pipefail

if [ "$#" -ne 2 ]; then
echo "Usage: _buildStartupKits.sh SWARM_PROJECT.yml VERSION_STRING"
if [ "$#" -lt 3 ]; then
echo "Usage: _buildStartupKits.sh SWARM_PROJECT.yml VERSION_STRING CONTAINER_NAME [VPN_CREDENTIALS_DIR]"
exit 1
fi

PROJECT_YML=$1
VERSION=$2
CONTAINER_NAME=$3
MOUNT_VPN_CREDENTIALS_DIR=""
if [ "$#" -eq 4 ]; then
MOUNT_VPN_CREDENTIALS_DIR="-v $4:/vpn_credentials/"
fi

sed -i 's#__REPLACED_BY_CURRENT_VERSION_NUMBER_WHEN_BUILDING_STARTUP_KITS__#'$VERSION'#' $PROJECT_YML
echo "Building startup kits for project $PROJECT_YML with version $VERSION"

ARGUMENTS="$PROJECT_YML $VERSION"

echo "Building startup kits: $ARGUMENTS"
docker run --rm \
-u $(id -u):$(id -g) \
-v /etc/passwd:/etc/passwd \
-v /etc/group:/etc/group \
-v ./:/workspace/ \
$MOUNT_VPN_CREDENTIALS_DIR \
-w /workspace/ \
-e PROJECT_YML=$PROJECT_YML \
-e VERSION=$VERSION \
jefftud/odelia:$VERSION \
/bin/bash -c "nvflare provision -p \$PROJECT_YML && ./_generateStartupKitArchives.sh \$PROJECT_YML \$VERSION"|| { echo "Docker run failed"; exit 1; }
$CONTAINER_NAME \
/bin/bash -c "nvflare provision -p $PROJECT_YML && ./_generateStartupKitArchives.sh $ARGUMENTS"|| { echo "Docker run failed"; exit 1; }

sed -i 's#'$VERSION'#__REPLACED_BY_CURRENT_VERSION_NUMBER_WHEN_BUILDING_STARTUP_KITS__#' $PROJECT_YML
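The optional VPN mount above is passed as an unquoted string (`$MOUNT_VPN_CREDENTIALS_DIR`), which would split a credentials path containing spaces into several words. One possible alternative, shown only as a sketch and not as part of this PR, builds the `docker run` words in the positional parameters so that each argument stays intact. The function name `print_docker_command`, the example path, and the image name are hypothetical; the sketch prints the words rather than running Docker.

```bash
#!/bin/sh
# Hypothetical helper (not part of the PR): assembles the words of a
# `docker run` command, appending the VPN credentials mount only when a
# directory is given, and prints one word per line (dry run).
print_docker_command() {
    credentials_dir="$1"
    image="$2"

    # Start with the fixed part of the command.
    set -- docker run --rm

    # Append the optional mount as two separate, properly quoted words.
    if [ -n "$credentials_dir" ]; then
        set -- "$@" -v "${credentials_dir}:/vpn_credentials/"
    fi

    # The image name must come after all options.
    set -- "$@" "$image"

    printf '%s\n' "$@"
}

# Dry run with a made-up path that contains spaces and a made-up image name.
print_docker_command "/tmp/my vpn credentials" "example/image:1.0.0"
```

In a real script, the final `printf` would be replaced by executing `"$@"`; the mount path then reaches Docker as a single argument regardless of whitespace.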
7 changes: 7 additions & 0 deletions _generateStartupKitArchives.sh
@@ -7,7 +7,14 @@ TARGET_FOLDER=`ls -d $OUTPUT_FOLDER/prod_* | tail -n 1`
LONG_VERSION=$2

cd $TARGET_FOLDER

for startupkit in `ls .`; do
VPN_CREDENTIALS_FILE=/vpn_credentials/${startupkit}_client.ovpn
if [[ -f $VPN_CREDENTIALS_FILE ]]; then
cp $VPN_CREDENTIALS_FILE ${startupkit}/startup/vpn_client.ovpn
else
echo "$VPN_CREDENTIALS_FILE does not exist, omitting VPN credentials for ${startupkit} in startup kit"
fi
zip -rq ${startupkit}_$LONG_VERSION.zip $startupkit
echo "Generated startup kit $TARGET_FOLDER/${startupkit}_$LONG_VERSION.zip"
done
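The loop above silently omits VPN credentials when `<node>_client.ovpn` is missing and only prints a message. An operator who wants to catch this before building could run a check along the following lines. This is a sketch under assumptions: the helper name `missing_vpn_credentials` is hypothetical, and the node names in the demo (`client_A`, `client_B`) are illustrative rather than taken from a specific project file.

```bash
#!/bin/sh
# Hypothetical helper (not part of the PR): prints every node for which
# DIR/<node>_client.ovpn does not exist and returns non-zero if any is missing.
missing_vpn_credentials() {
    credentials_dir="$1"
    shift
    missing=0
    for node in "$@"; do
        # Same naming scheme as in _generateStartupKitArchives.sh
        if [ ! -f "${credentials_dir}/${node}_client.ovpn" ]; then
            echo "$node"
            missing=1
        fi
    done
    return "$missing"
}

# Demo with a temporary directory that holds credentials for one node only.
demo_dir=$(mktemp -d)
touch "${demo_dir}/client_A_client.ovpn"
missing_vpn_credentials "$demo_dir" client_A client_B \
    || echo "some VPN credentials are missing"
rm -rf "$demo_dir"
```

Whether a missing file should abort the build or merely warn, as the script does now, is a design choice; the sketch leaves that decision to the caller via the return code.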
54 changes: 0 additions & 54 deletions _runTestsInsideDocker.sh

This file was deleted.

22 changes: 14 additions & 8 deletions assets/readme/README.developer.md
@@ -24,22 +24,24 @@ The project description specifies the swarm nodes etc. to be used for a swarm tr
kits, running local trainings in the startup kit), you can manually push the image to DockerHub, provided you have
the necessary rights. Make sure you are not re-using a version number for this purpose.

## Running Local Tests
## Running Tests

```bash
./runTestsInDocker.sh
./runIntegrationTests.sh
```

You should see

1. several expected errors and warnings printed from unit tests that should succeed overall, and a coverage report
2. output of a successful simulation run with two nodes
3. output of a successful proof-of-concept run run with two nodes
4. output of a set of startup kits being generated
5. output of a dummy training run using one of the startup kits
6. TODO update this to what the tests output now
2. output of a successful simulation run of a dummy training with two nodes
3. output of a successful proof-of-concept run of a dummy training with two nodes
4. output of a successful simulation run of a 3D CNN training using synthetic data with two nodes
5. output of a set of startup kits being generated
6. output of a Docker/GPU preflight check using one of the startup kits
7. output of a data access preflight check using one of the startup kits
8. output of a dummy training run in a swarm consisting of one server and two client nodes

Optionally, uncomment running NVFlare unit tests in `_runTestsInsideDocker.sh`.
Optionally, uncomment running NVFlare unit tests.

## Distributing Startup Kits

@@ -93,3 +95,7 @@ export CONFIG=original
run in the swarm
3. Use the local tests to check if the code is swarm-ready
4. TODO more detailed instructions

## Continuous Integration

Tests to be executed after pushing to GitHub are defined in `.github/workflows/pr-test.yaml`.
16 changes: 12 additions & 4 deletions assets/readme/README.operator.md
@@ -19,17 +19,25 @@ For example, add the following line (replace `<IP>` with the server's actual IP
<IP> dl3.tud.de dl3
```

TODO describe this in participant README if needed

## Create Startup Kits

### Via Script (recommended)

1. Use, e.g., the file `application/provision/project_MEVIS_test.yml`, adapt as needed (network protocol etc.)
2. Call `buildStartupKits.sh /path/to/project_configuration.yml` to build the startup kits
2. Call `buildDockerImageAndStartupKits.sh -p /path/to/project_configuration.yml -c /path/to/directory/with/VPN/credentials` to build the Docker image and the startup kits
- swarm nodes (admin, server, clients) are configured in `project_configuration.yml`
- the directory with VPN credentials should contain one `.ovpn` file per node
- use `-c tests/local_vpn/client_configs/` to build startup kits for the integration tests
3. Startup kits are generated to `workspace/<name configured in the .yml>/prod_00/`
4. Deploy startup kits to the respective server/clients
4. Deploy startup kits to the respective server/client operators
5. Push the Docker image to the registry

### Via the Dashboard (not recommended)

Build the Docker image as described above.

```bash
docker run -d --rm \
--ipc=host -p 8443:8443 \
@@ -69,14 +77,14 @@ Access the dashboard at `https://localhost:8443` log in with the admin credentia
2. Client Sites > approve client sites
3. Project Home > freeze project

## Download startup kits
#### Download startup kits

After setting up the project admin configuration, server and clients can download their startup kits. Store the
passwords somewhere, they are only displayed once (or you can download them again).

## Starting a Swarm Training

1. Connect the *server* host to the VPN as described above.
1. Connect the *server* host to the VPN as described above. (TODO update documentation, this step is not needed if the Docker container handles the VPN connection.)
2. Start the *server* startup kit using the respective `startup/docker.sh` script with the option to start the server
3. Provide the *client* startup kits to the swarm participants (be aware that email providers or other channels may
prevent encrypted archives)
33 changes: 19 additions & 14 deletions buildDockerImageAndStartupKits.sh
@@ -13,20 +13,20 @@ DOCKER_BUILD_ARGS="--no-cache --progress=plain";
while [[ "$#" -gt 0 ]]; do
case $1 in
-p) PROJECT_FILE="$2"; shift ;;
-c) VPN_CREDENTIALS_DIR="$2"; shift ;;
--use-docker-cache) DOCKER_BUILD_ARGS="";;
*) echo "Unknown parameter passed: $1"; exit 1 ;;
esac
shift
done

if [ -z "$PROJECT_FILE" ]; then
echo "Usage: buildDockerImageAndStartupKits.sh -p <swarm_project.yml> [--use-docker-cache]"
if [[ -z "$PROJECT_FILE" || -z "$VPN_CREDENTIALS_DIR" ]]; then
echo "Usage: buildDockerImageAndStartupKits.sh -p <swarm_project.yml> -c <VPN credentials directory> [--use-docker-cache]"
exit 1
fi

VERSION=`./getVersionNumber.sh`
DOCKER_IMAGE=jefftud/odelia:$VERSION

CONTAINER_VERSION_ID=`git rev-parse --short HEAD`

# prepare clean version of source code repository clone for building Docker image

@@ -41,14 +41,15 @@ git clean -x -q -f .
cd ../..
rm .git -rf
chmod a+rX . -R
sed -i 's#__REPLACED_BY_CURRENT_VERSION_NUMBER_WHEN_BUILDING_DOCKER_IMAGE__#'$VERSION'#' docker_config/master_template.yml
cd $CWD

# replacements in copy of source code
sed -i 's#__REPLACED_BY_CURRENT_VERSION_NUMBER_WHEN_BUILDING_DOCKER_IMAGE__#'$VERSION'#' docker_config/master_template.yml
sed -i 's#__REPLACED_BY_CONTAINER_VERSION_IDENTIFIER_WHEN_BUILDING_DOCKER_IMAGE__#'$CONTAINER_VERSION_ID'#' docker_config/master_template.yml

# prepare pre-trained model weights for being included in Docker image

MODEL_WEIGHTS_FILE='docker_config/torch_home_cache/hub/checkpoints/dinov2_vits14_pretrain.pth'
MODEL_LICENSE_FILE='docker_config/torch_home_cache/hub/facebookresearch_dinov2_main/LICENSE'
MODEL_WEIGHTS_FILE=$CWD'/docker_config/torch_home_cache/hub/checkpoints/dinov2_vits14_pretrain.pth'
MODEL_LICENSE_FILE=$CWD'/docker_config/torch_home_cache/hub/facebookresearch_dinov2_main/LICENSE'
if [[ ! -f $MODEL_WEIGHTS_FILE || ! -f $MODEL_LICENSE_FILE ]]; then
echo "Pre-trained model not available. Attempting download"
HUBDIR=$(dirname $(dirname $MODEL_LICENSE_FILE))
@@ -61,22 +62,26 @@ if [[ ! -f $MODEL_WEIGHTS_FILE || ! -f $MODEL_LICENSE_FILE ]]; then
fi

if echo 2e405cee1bad14912278296d4f42e993 $MODEL_WEIGHTS_FILE | md5sum --check - && echo 153d2db1c329326a2d9f881317ea942e $MODEL_LICENSE_FILE | md5sum --check -; then
cp -r ./docker_config/torch_home_cache $CLEAN_SOURCE_DIR/torch_home_cache
cp -r $CWD/docker_config/torch_home_cache $CLEAN_SOURCE_DIR/torch_home_cache
else
exit 1
fi
chmod a+rX $CLEAN_SOURCE_DIR/torch_home_cache -R

cd $CWD

# build and print follow-up steps
CONTAINER_NAME=`grep " docker_image: " $PROJECT_FILE | sed 's/ docker_image: //' | sed 's#__REPLACED_BY_CURRENT_VERSION_NUMBER_WHEN_BUILDING_STARTUP_KITS__#'$VERSION'#'`
echo $CONTAINER_NAME

docker build $DOCKER_BUILD_ARGS -t $DOCKER_IMAGE $CLEAN_SOURCE_DIR -f docker_config/Dockerfile_ODELIA
docker build $DOCKER_BUILD_ARGS -t $CONTAINER_NAME $CLEAN_SOURCE_DIR -f docker_config/Dockerfile_ODELIA

echo "Docker image $DOCKER_IMAGE built successfully"
echo "./_buildStartupKits.sh $PROJECT_FILE $VERSION"
./_buildStartupKits.sh $PROJECT_FILE $VERSION
echo "Docker image $CONTAINER_NAME built successfully"
echo "./_buildStartupKits.sh $PROJECT_FILE $VERSION $CONTAINER_NAME"
VPN_CREDENTIALS_DIR=$(realpath $VPN_CREDENTIALS_DIR)
./_buildStartupKits.sh $PROJECT_FILE $VERSION $CONTAINER_NAME $VPN_CREDENTIALS_DIR
echo "Startup kits built successfully"

rm -rf $CLEAN_SOURCE_DIR

echo "If you wish, manually push $DOCKER_IMAGE now"
echo "If you wish, manually push $CONTAINER_NAME now"
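The script now derives the image name from the `docker_image:` entry of the project file and substitutes the version placeholder. The following sketch isolates that step for illustration. It is an assumption-laden variant, not the script's actual code: the function name `resolve_image_name` is hypothetical, it tolerates any amount of leading whitespace, it takes only the first match, and it assumes the version string contains no `#` character. The YAML snippet in the demo is made up.

```bash
#!/bin/sh
# Hypothetical helper (not part of the PR): prints the first docker_image
# value found in a project file, with the version placeholder replaced.
resolve_image_name() {
    project_file="$1"
    version="$2"
    # Print only lines that contain a docker_image key, stripped of the key,
    # then substitute the placeholder and keep the first result.
    sed -n 's/^[[:space:]]*docker_image:[[:space:]]*//p' "$project_file" \
        | sed "s#__REPLACED_BY_CURRENT_VERSION_NUMBER_WHEN_BUILDING_STARTUP_KITS__#${version}#" \
        | head -n 1
}

# Demo with a made-up project file and version number.
demo_yml=$(mktemp)
printf '%s\n' \
    'builders:' \
    '  - args:' \
    '      docker_image: example/image:__REPLACED_BY_CURRENT_VERSION_NUMBER_WHEN_BUILDING_STARTUP_KITS__' \
    > "$demo_yml"
resolve_image_name "$demo_yml" "1.2.3"
rm -f "$demo_yml"
```

Text matching of this kind is fragile if the project file ever contains several different `docker_image` values or comments on the same line; a YAML-aware tool would be more robust but adds a dependency.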