Contributor

@vandah vandah commented Nov 12, 2025

Description

This PR adds a provider for testing devices with an NPU.
It is currently focused on Intel NPUs, which use the accel kernel interface.
The driver for these NPUs is distributed through the intel-npu-driver snap, which also includes a gtest-based testing utility, npu-umd-test. The tests in this provider check that the appropriate firmware version is loaded and that the user has the appropriate permissions; the remaining jobs run individual tests from the npu-umd-test utility.
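
For readers unfamiliar with the accel interface, the permission check described above might look roughly like the sketch below. This is not the code from check_accel_permissions.py; the function name `find_inaccessible` is hypothetical, and the sketch assumes the device nodes appear as /dev/accel/accel<N>.

```python
import os
from pathlib import Path
from typing import Iterable, List


def find_inaccessible(nodes: Iterable[Path]) -> List[Path]:
    """Return the nodes the current user cannot open for reading and writing."""
    return [
        node for node in nodes
        if not os.access(str(node), os.R_OK | os.W_OK)
    ]


def main() -> int:
    # Assumption: accel device nodes are exposed as /dev/accel/accel<N>.
    nodes = sorted(Path("/dev/accel").glob("accel*"))
    if not nodes:
        print("No accel device nodes found")
        return 1
    denied = find_inaccessible(nodes)
    for node in denied:
        print("No read/write access to {}".format(node))
    # A non-zero return value is meant to be used as the job's exit code.
    return 1 if denied else 0
```

A job script would typically end with `raise SystemExit(main())` so that a missing node or missing permission is reported as a failure.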

Known issues

Some of the test names coming from the npu-umd-test test suite are longer than 80 characters which triggers a warning in checkbox.

Tests

Tests have been run on Meteor Lake and Arrow Lake devices.

@vandah vandah changed the title from "New: Add NPU provider" to "Add NPU provider (New)" Nov 12, 2025
@vandah vandah force-pushed the npu-provider branch 5 times, most recently from 40b763c to 3c0ef4f November 13, 2025 09:29
@vandah vandah marked this pull request as draft November 13, 2025 15:24
@vandah vandah marked this pull request as ready for review November 14, 2025 12:54
@@ -0,0 +1,5 @@
# Checkbox Provider - NPU
Contributor

Is there anything that should be done when creating a new provider? CI-wise, packaging-wise?

Contributor Author

I have been told by @farshidtz that to run the coverage and different python version tests, the provider needs to be added in the tox-checkbox workflow.
However, I am not sure if there is more to do for a new provider.

estimated_duration: 2s
command: check_accel_permissions.py
imports: from com.canonical.plainbox import manifest
requires: manifest.has_npu == 'True'
Contributor

I think that new manifest entries must now be submitted to the certification team to be added to C3's feature set

Collaborator

C3 gets them from https://github.com/canonical/blueprints so you could also make a PR there, I think.

Contributor Author

vandah commented Dec 12, 2025

@fernando79513 and @tomli380576, can you please review this PR?

Contributor

@pseudocc pseudocc left a comment

Just a passerby reviewer; see inline comments.

@@ -0,0 +1,5 @@
# Checkbox Provider - NPU

This provider includes tests for devices with an NPU. As of right now, it is intended only for Intel NPUs. The tests only run as long as the manifest entry `has_npu` is set to `true`.
Contributor

Should check this instead of defining has_npu

modinfo -Falias intel_vpu
pci:v00008086d0000FD3Esv*sd*bc*sc*i*
pci:v00008086d0000B03Esv*sd*bc*sc*i*
pci:v00008086d0000643Esv*sd*bc*sc*i*
pci:v00008086d0000AD1Dsv*sd*bc*sc*i*
pci:v00008086d00007D1Dsv*sd*bc*sc*i*

Contributor Author

Do you mean there should be no manifest entry at all?
I have now added a job to check the modinfo output. Is that what you had in mind?

Contributor

I was thinking about reading the modinfo output in a resource job, but now I think it may not be the ideal way.

Say we have:

  • 6.17 kernel and a Panther Lake CPU (modaliases for Panther Lake were added in 6.18).

Then all NPU tests would be skipped, which shouldn't be what we expect. Let's just keep the manifest implementation; it's good.
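
To illustrate the concern, here is a minimal sketch of how a resource job could match a PCI device against the driver's alias patterns. The alias list is copied from the modinfo output quoted earlier in this thread; the two device modalias strings are made up for illustration, and `driver_claims` is a hypothetical helper, not part of the provider.

```python
from fnmatch import fnmatchcase

# Alias patterns copied from the `modinfo -Falias intel_vpu` output above.
DRIVER_ALIASES = [
    "pci:v00008086d0000FD3Esv*sd*bc*sc*i*",
    "pci:v00008086d0000B03Esv*sd*bc*sc*i*",
    "pci:v00008086d0000643Esv*sd*bc*sc*i*",
    "pci:v00008086d0000AD1Dsv*sd*bc*sc*i*",
    "pci:v00008086d00007D1Dsv*sd*bc*sc*i*",
]


def driver_claims(device_modalias, aliases=DRIVER_ALIASES):
    """Return True if any alias glob of the installed driver matches the device."""
    return any(fnmatchcase(device_modalias, alias) for alias in aliases)


# Hypothetical device strings: the first uses a device ID from the list above,
# the second uses a made-up ID that the installed driver does not list.
listed = "pci:v00008086d00007D1Dsv00000000sd00000000bc12sc00i00"
unlisted = "pci:v00008086d0000FFFFsv00000000sd00000000bc12sc00i00"

print(driver_claims(listed))    # True
print(driver_claims(unlisted))  # False
```

A device whose ID is missing from the running kernel's alias list is indistinguishable from a machine without an NPU, so a resource job built this way would skip the tests rather than fail them.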

Contributor

@tomli380576 tomli380576 left a comment

Test scripts LGTM! Could you document the expectations for NPU_UMD_TEST_CONFIG: where it should be placed, which permissions it needs, and what exactly its contents should be (for example, the tree output of a correct setup)?

One small question: is the driver snap expected to come preinstalled? If not, I think we should print an error somewhere or mention it in the manifest.



def main():
    config_path = os.environ.get("NPU_UMD_TEST_CONFIG")
Contributor

I think NPU_UMD_TEST_CONFIG also needs to be an absolute path since it's passed to dirname in one of the jobs

assert Path(config_path).is_absolute()
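
A slightly more defensive variant of this suggestion is sketched below. It raises an explicit error instead of using `assert`, since assertions are stripped when Python runs with `-O`. The helper name `read_config_path` is hypothetical, and returning `None` for an unset variable is an assumption that leaves the decision to the caller.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def read_config_path(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return NPU_UMD_TEST_CONFIG as an absolute Path, or None if it is unset."""
    raw = env.get("NPU_UMD_TEST_CONFIG")
    if not raw:
        # Unset or empty: the caller decides whether that is an error.
        return None
    path = Path(raw)
    if not path.is_absolute():
        raise SystemExit(
            "NPU_UMD_TEST_CONFIG must be an absolute path, got: {}".format(raw)
        )
    return path
```

Passing the environment as a parameter keeps the function easy to unit test without patching `os.environ`.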

Contributor

This file (along with the model file) might have to be placed inside the driver snap's current directory or the umd tests will say "the file is bad" and trigger an apparmor deny message. For example if I put the config file in $HOME, this happens:

[Tue Dec 23 10:47:03 2025] audit: type=1400 audit(1766458023.464:331): apparmor="DENIED" operation="open" class="file" profile="snap.intel-npu-driver.npu-umd-test" name="/home/ubuntu/basic.yaml" pid=31451 comm="npu-umd-test" requested_mask="r" denied_mask="r" fsuid=1000 ouid=1000
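
When debugging this kind of failure, it can help to pull the relevant fields out of the kernel log. The sketch below parses the key=value pairs of the audit line quoted above; both helper names are hypothetical, and the regular expression is only assumed to cover lines shaped like this sample.

```python
import re

# Matches key=value pairs where the value is either quoted or a bare token.
_FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')

# The audit line quoted in the comment above.
SAMPLE = (
    '[Tue Dec 23 10:47:03 2025] audit: type=1400 '
    'audit(1766458023.464:331): apparmor="DENIED" operation="open" '
    'class="file" profile="snap.intel-npu-driver.npu-umd-test" '
    'name="/home/ubuntu/basic.yaml" pid=31451 comm="npu-umd-test" '
    'requested_mask="r" denied_mask="r" fsuid=1000 ouid=1000'
)


def parse_audit_fields(line):
    """Return the key=value pairs of an audit line as a dict, quotes removed."""
    return {key: value.strip('"') for key, value in _FIELD.findall(line)}


def is_apparmor_denial(line, profile):
    """Return True if the line is an AppArmor denial for the given profile."""
    fields = parse_audit_fields(line)
    return fields.get("apparmor") == "DENIED" and fields.get("profile") == profile


fields = parse_audit_fields(SAMPLE)
print(fields["name"], fields["denied_mask"])
```

For the sample line, the denied file is /home/ubuntu/basic.yaml and the denied mask is "r", which matches the read failure reported by the test utility.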

Contributor Author

Currently this file is placed in the current directory by the script which installs the intel-npu-driver snap. The file is pretty much static but we're planning to have the file located directly inside the intel-npu-driver snap (since the format could change between versions of npu-umd-test which is already distributed as a binary in the intel-npu-driver snap).

Contributor

@tomli380576 tomli380576 Jan 23, 2026

Does this mean the model files would be bundled with the snap in the future? If so, I think you can use the path of the bundled files as the default inside the test case and only use the environment variable as an override. This would also make it easier to run the test since we won't have to specify NPU_UMD_TEST_CONFIG every time.

Contributor Author

The config file and the model files are now bundled with the intel-npu-driver snap on the latest/edge channel. Furthermore, npu-umd-test now uses the bundled config file by default when none is supplied, while passing a different config file with --config is still possible.

Contributor Author

I have updated the code to make NPU_UMD_TEST_CONFIG optional.
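
As a rough sketch of what "optional" could mean here (not the code in this PR): the command line only gains `--config` when the variable is set, so the utility otherwise falls back to its bundled default. The command name `intel-npu-driver.npu-umd-test`, the helper `build_command`, and the use of the standard GoogleTest `--gtest_filter` flag to select a single test are assumptions.

```python
import os
from typing import List, Mapping


def build_command(test_filter: str,
                  env: Mapping[str, str] = os.environ) -> List[str]:
    """Build the npu-umd-test command line for one gtest filter."""
    # Assumption: the snap exposes the utility under this command name.
    cmd = [
        "intel-npu-driver.npu-umd-test",
        "--gtest_filter={}".format(test_filter),
    ]
    config = env.get("NPU_UMD_TEST_CONFIG")
    if config:
        # Override the bundled default only when explicitly requested.
        cmd += ["--config", config]
    return cmd
```

For example, `build_command("Suite.Case", {})` (with a placeholder test name) yields a command without `--config`, which leaves the choice of config file to the utility.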

Contributor Author

vandah commented Jan 23, 2026

One small question: is the driver snap expected to come preinstalled? If not, I think we should print an error somewhere or mention it in the manifest.

The snap is not pre-installed but ideally the devices that do have the has_npu manifest entry should install the snap before running checkbox... Is there any way in checkbox to define this dependency and maybe even have the snap auto-installed by checkbox?

@vandah vandah requested a review from pseudocc January 23, 2026 07:55
pseudocc previously approved these changes Jan 23, 2026
Contributor

@pseudocc pseudocc left a comment

Looks good to me now, thanks!

@tomli380576
Contributor

To make this case depend on whether the driver snap exists, add snap.name == "intel-npu-driver" in the requires: section of the test job; but note that this would make checkbox "silently" skip the job and put it in the "job with failed dependencies" pile if the snap is not installed even if the manifest is set to true.


codecov bot commented Jan 23, 2026

Codecov Report

❌ Patch coverage is 96.85185% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 55.66%. Comparing base (278c069) to head (1a6595b).
⚠️ Report is 79 commits behind head on main.

Files with missing lines                               Patch %   Lines
...roviders/npu/tests/test_check_accel_permissions.py  94.73%    7 Missing ⚠️
providers/npu/bin/check_accel_permissions.py           86.95%    3 Missing ⚠️
providers/npu/bin/check_firmware_version.py            94.28%    2 Missing ⚠️
providers/npu/bin/intel_npu_gtest_resource.py          97.91%    1 Missing ⚠️
providers/npu/bin/min_kernel_version.py                95.65%    1 Missing ⚠️
providers/npu/tests/test_check_firmware_version.py     99.27%    1 Missing ⚠️
...oviders/npu/tests/test_intel_npu_gtest_resource.py  98.59%    1 Missing ⚠️
providers/npu/tests/test_min_kernel_version.py         98.50%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2198      +/-   ##
==========================================
+ Coverage   53.34%   55.66%   +2.32%     
==========================================
  Files         399      422      +23     
  Lines       42907    45440    +2533     
  Branches     7945     8190     +245     
==========================================
+ Hits        22887    25295    +2408     
- Misses      19214    19401     +187     
+ Partials      806      744      -62     
Flag                           Coverage Δ
checkbox-ng                    71.69% <ø> (+0.28%) ⬆️
checkbox-support               69.55% <ø> (+4.26%) ⬆️
provider-base                  32.17% <ø> (+2.34%) ⬆️
provider-certification-client  57.14% <ø> (ø)
provider-certification-server  57.14% <ø> (ø)
provider-genio                 96.90% <ø> (ø)
provider-gpgpu                 93.14% <ø> (ø)
provider-iiotg                 100.00% <ø> (ø)
provider-npu                   53.45% <96.85%> (?)
provider-resource              39.57% <ø> (+0.26%) ⬆️
provider-sru                   97.97% <ø> (ø)


Contributor Author

vandah commented Jan 29, 2026

To make this case depend on whether the driver snap exists, add snap.name == "intel-npu-driver" in the requires: section of the test job; but note that this would make checkbox "silently" skip the job and put it in the "job with failed dependencies" pile if the snap is not installed even if the manifest is set to true.

I see. I deliberately avoided using the dependency so that it would be an explicit failure if the snap is not present but has_npu is set. In our testing pipeline, we install the snap with testflinger before running checkbox, but I was wondering how to make this more obvious when the tests are run by someone else (and how dependencies work when a device gets certified).

As a side note, I think it would be useful to have something like Ansible's failed_when, which would allow tests to fail based on the value of checkbox variables, since, as I understand it, skipped tests have a slightly different meaning than failed tests (not applicable vs. something is wrong).
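
One way to get the explicit failure described above, without a requires: expression, is for each job script to look up the test binary first and exit with a clear message if it is missing. The sketch below assumes the snap exposes the utility as `intel-npu-driver.npu-umd-test`; `require_test_binary` is a hypothetical helper, not code from this PR.

```python
import shutil
from typing import Callable, Optional


def require_test_binary(
    name: str = "intel-npu-driver.npu-umd-test",  # assumed snap command name
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Return the path of the test binary, or exit with a clear error."""
    path = which(name)
    if path is None:
        # Exiting with a message makes the job fail instead of being skipped.
        raise SystemExit(
            "{} not found; install the intel-npu-driver snap "
            "before running these jobs".format(name)
        )
    return path
```

With this approach, a device that has has_npu set but no snap installed shows up as a failed job with an actionable message rather than landing among the jobs with failed dependencies.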
