Add sbom generation tooling (#2232) #106

Open
Lukasz-Juranek wants to merge 1 commit into eclipse-score:main from Lukasz-Juranek:feat/issue-2232-sbom-init


@Lukasz-Juranek Lukasz-Juranek commented Jan 31, 2026

This PR adds SBOM Bazel rules for SCORE modules. For setup details see the SBOM_Readme.md.

Generated SBOM files (reference_integration — sbom_kyron_module, 2026-02-27)

Built with bazel build //:sbom_kyron_module after all compliance fixes.

sbom_kyron_module.spdx.json — SPDX 2.3 (96 KB, 93 packages)
sbom_kyron_module.cdx.json — CycloneDX 1.6 (75 KB, 93 components)

Both files pass sbomgenerator.com/tools/validator with zero errors and zero warnings.

Coverage (93 packages)

| Metric | Count | % |
| --- | --- | --- |
| With license | 71 | 76% |
| With checksum (SHA-256) | 74 | 80% |
| With description | 93 | 100% |
| With supplier | 91 | 98% |

Missing licenses: 22 packages — iceoryx2-*-qnx8 platform forks (not in Eclipse Foundation / ClearlyDefined) + a handful of other crates not yet indexed.
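
The coverage figures above can be recomputed from a generated CycloneDX file. The following is a minimal sketch, not code from this PR: the function name `coverage` is hypothetical, and it assumes the standard CycloneDX component fields `licenses`, `hashes`, `description` and `supplier`.

```python
def coverage(components):
    """Count how many CycloneDX components carry each metadata field.

    Returns a dict mapping metric name to (count, rounded percentage).
    """
    total = len(components)
    checks = {
        "license": lambda c: bool(c.get("licenses")),
        "checksum": lambda c: any(
            h.get("alg") == "SHA-256" for h in c.get("hashes", [])
        ),
        "description": lambda c: bool(c.get("description")),
        "supplier": lambda c: bool(c.get("supplier")),
    }
    result = {}
    for name, has_field in checks.items():
        count = sum(1 for c in components if has_field(c))
        pct = round(100 * count / total) if total else 0
        result[name] = (count, pct)
    return result
```

To apply it to a real file, load the JSON and pass its `components` array, for example `coverage(json.load(open("sbom_kyron_module.cdx.json"))["components"])`.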

Generated file formats

| Output file | Format | Notes |
| --- | --- | --- |
| `.spdx.json` | SPDX 2.3 | `checksums` SHA-256 for crates from lockfile; boolean operators uppercase-normalized (e.g. `Apache-2.0 OR MIT`) |
| `.cdx.json` | CycloneDX 1.6 | Schema `https://cyclonedx.org/schema/bom-1.6.schema.json`; compound expressions use `expression` field |
| `_crates_metadata.json` | Internal cache | Generated by `dash-license-scan` + crates.io API when `auto_crates_cache = True` |
| `_cdxgen.cdx.json` | CycloneDX (cdxgen) | C++ dependency scan output when `auto_cdxgen = True` |

CISA 2025 element coverage (both formats)

Element SPDX 2.3 CycloneDX 1.6
Component Name
Component Version
Software Identifiers (PURL) ✅ `pkg:cargo/`, `pkg:generic/`, `pkg:github/`
Component Hash (SHA-256) ✅ `checksums` field ✅ `hashes` field
License ✅ Rust via dash-license-scan
Supplier
Description ✅ Rust via crates.io API
Dependency Relationship ✅ `DEPENDS_ON` ✅ `dependencies` graph
SBOM Author ✅ `creators` ✅ `metadata.authors`
Tool Name ✅ `creators` ✅ `metadata.tools`
Timestamp
Generation Context ✅ `metadata.lifecycles`

Compliance fixes (commits e443267 9acc306)

  • PURL type: `pkg:bazel/` → `pkg:generic/` (unregistered PURL type)
  • SPDX checksums: `checksums` field now emitted for packages with known SHA-256
  • CycloneDX `$schema`: `http://` → `https://`
  • crates.io `externalReferences`: distribution URL emitted for all lockfile-derived crate components
  • BCR known licenses: `BCR_KNOWN_LICENSES` + `apply_known_licenses()` — auto-fills license/supplier for `boost.*`, `abseil-cpp`, `zlib`, etc.
  • License operator case: `_normalize_spdx_license()` in both formatters — `Apache-2.0 or MIT` → `Apache-2.0 OR MIT`; compound expressions correctly routed to CycloneDX `expression` field
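
The license operator fix in the last bullet can be illustrated with a short sketch. This is not the PR's `_normalize_spdx_license()`; the helper names `normalize_operators` and `cyclonedx_license_entry` are hypothetical. It assumes operators are always surrounded by whitespace, so identifiers such as `GPL-2.0-or-later` are left alone.

```python
import re

# Match and/or/with only when surrounded by whitespace, so the "or" inside
# an identifier like "GPL-2.0-or-later" is never touched.
_OPERATOR = re.compile(r"(?<=\s)(and|or|with)(?=\s)", re.IGNORECASE)


def normalize_operators(expr: str) -> str:
    """Uppercase SPDX boolean operators, leaving license IDs unchanged."""
    return _OPERATOR.sub(lambda m: m.group(1).upper(), expr.strip())


def cyclonedx_license_entry(raw: str) -> dict:
    """Route a license string to the matching CycloneDX field.

    Compound expressions go into "expression"; a single SPDX identifier
    goes into "license.id".
    """
    expr = normalize_operators(raw)
    if re.search(r"\s(?:AND|OR|WITH)\s", expr):
        return {"expression": expr}
    return {"license": {"id": expr}}
```

With this routing, `Apache-2.0 or MIT` ends up as `{"expression": "Apache-2.0 OR MIT"}`, while a plain `MIT` stays in `license.id`.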

Dependabot / GitHub Dependency Graph integration (f2cc71f)

The generated SPDX 2.3 SBOMs can feed GitHub Dependabot via the Dependency Submission API.

New: `sbom/scripts/spdx_to_github_snapshot.py`

Converts SPDX 2.3 → GitHub Dependency Submission snapshot (handles SPDX 2.3 — GitHub's official action only supports 2.2):

  • Extracts all packages with PURLs (`pkg:cargo`, `pkg:generic`, `pkg:github`, etc.)
  • Test run on `sbom_kyron_module.spdx.json`: 103 packages → 102 snapshot packages
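
A rough sketch of what such a conversion involves, under stated assumptions: this is not the content of `spdx_to_github_snapshot.py`. The function names, the `job` and `detector` values, and the manifest name are placeholders, and the snapshot field names reflect my reading of the Dependency Submission API. Packages without a PURL are skipped.

```python
from datetime import datetime, timezone


def extract_purls(spdx_doc: dict) -> list[str]:
    """Return the first PURL of every SPDX package that declares one."""
    purls = []
    for pkg in spdx_doc.get("packages", []):
        for ref in pkg.get("externalRefs", []):
            if ref.get("referenceType") == "purl":
                purls.append(ref["referenceLocator"])
                break  # one PURL per package is enough for the snapshot
    return purls


def build_snapshot(spdx_doc: dict, sha: str, ref: str,
                   manifest_name: str = "sbom.spdx.json") -> dict:
    """Build a dependency snapshot payload from an SPDX 2.3 JSON document."""
    resolved = {p: {"package_url": p} for p in extract_purls(spdx_doc)}
    return {
        "version": 0,
        "sha": sha,    # full commit SHA the snapshot belongs to
        "ref": ref,    # e.g. "refs/heads/main"
        # Placeholder job/detector values, assumptions for illustration only.
        "job": {"correlator": "sbom-dependency-submission", "id": "local"},
        "detector": {
            "name": "spdx-to-snapshot-sketch",
            "version": "0.0.0",
            "url": "https://example.invalid/sketch",
        },
        "scanned": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "manifests": {
            manifest_name: {"name": manifest_name, "resolved": resolved}
        },
    }
```

The resulting dict would be serialized to JSON and POSTed to the repository's `dependency-graph/snapshots` endpoint by the workflow.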

New: `.github/workflows/sbom_dependency_submission.yml`

Reusable workflow: builds SBOM → converts `*.spdx.json` → POSTs to `dependency-graph/snapshots`.

Prerequisite: enable Dependency Graph in repo Settings → Code security → Dependency graph.

jobs:
  sbom:
    uses: eclipse-score/tooling/.github/workflows/sbom_dependency_submission.yml@main
    with:
      sbom_target: '//:sbom_all'
      release_tag: ${{ github.ref_name }}
    permissions:
      contents: write

Dash-license-scan results (288 crates from `MODULE.bazel.lock`)

| Metric | Count | % |
| --- | --- | --- |
| With license | 261 | 90.6% |
| With description | 288 | 100% |
| With supplier | 285 | 99.0% |
| With checksum | 269 | 93.4% |

27 crates lack license data: 19 `iceoryx2-*-qnx8` platform forks + 8 others (`tonic`, `ipc-channel`, `argh`, etc.) not yet indexed by Eclipse Foundation / ClearlyDefined.

Test results: 79 passed

Related

SBOM is automatically generated — no modifications are needed for SCORE modules.

@AlexanderLanin
Member

@Lukasz-Juranek this looks interesting! Can you describe a little why we need custom code? No native bazel support etc.

@Lukasz-Juranek
Author

Lukasz-Juranek commented Feb 3, 2026

Hi, right now I'm not aware of any out-of-the-box support for Bazel that would really cover all C++ imports and Rust code.

But IMO the tooling itself is not that important; it can be replaced later with anything.

What is important is to start collecting SBOM data for 3rd-party dependencies in some meaningful format while there are still few deps, and to build the habit among SCORE developers of thinking about the SBOM whenever they import something into their project.

Once the SBOM data is available, it can be converted to a more mature solution.

Lukasz-Juranek force-pushed the feat/issue-2232-sbom-init branch from ffca1f1 to e308870 on February 7, 2026 09:37
@Lukasz-Juranek
Author

Lukasz-Juranek commented Feb 7, 2026

Updated the PR according to the discussion we had: https://github.com/orgs/eclipse-score/discussions/2226#discussioncomment-15669973

Manual generation is now removed; SBOM data is generated via

  • cdxgen for C++
  • for rust @score_crates//:MODULE.bazel.lock is used

Added mandatory fields from

  • CISA 2025 Element Coverage for CycloneDX

For details, see the updated readme: https://github.com/Lukasz-Juranek/score-tooling/blob/8e9cb957c32b7b2da9b4faf607be172ba9b14a6f/sbom/SBOM_Readme.md

@masc2023 you asked for a list of tools; here it is: https://github.com/Lukasz-Juranek/score-tooling/blob/8e9cb957c32b7b2da9b4faf607be172ba9b14a6f/sbom/SBOM_Readme.md#core-tools

Lukasz-Juranek force-pushed the feat/issue-2232-sbom-init branch 2 times, most recently from cad2c59 to 8e9cb95 on February 8, 2026 14:04
Lukasz-Juranek force-pushed the feat/issue-2232-sbom-init branch 4 times, most recently from 9382341 to c8d2a39 on February 9, 2026 19:51
| [@cyclonedx/cdxgen](https://github.com/CycloneDX/cdxgen) | C++ dependency scanner and license discovery tool | C++ metadata extraction when `auto_cdxgen = True` |
| [Node.js / npm](https://nodejs.org) | Runtime for cdxgen | C++ metadata extraction when `auto_cdxgen = True` |

### Five-Phase Architecture
Member

Which of these actually depend on bazel? Can we use this tool without bazel?

Author

@Lukasz-Juranek Feb 10, 2026

@AlexanderLanin thanks for the review.

cyclonedx/cdxgen can be used without Bazel. Both npm and cdxgen currently have to be installed on the system; they are not deployed via Bazel.

Bazel is only needed to collect the dependencies required to build the target.
This data is held in sbom_aspect; the tool then matches it against the data from cdxgen and the Bazel lockfile.

But if dash-license-scan already provides all the data we need, we should use it as the data source instead of cdxgen and the Bazel lockfile.

Maybe let's have a short sync on this; just ping me on Slack when you have time.

I think we can't rely on dash-license-scan alone, but we can extract license data from it:

| Field | crates_metadata.json provides | dash-license-scan provides |
| --- | --- | --- |
| version | Yes | Yes (from Cargo.lock parse) |
| license | Yes (from crates.io API) | Yes (from Eclipse/ClearlyDefined) |
| checksum (SHA-256) | Yes | No |
| purl | Yes (pkg:cargo/name@ver) | No (uses crate/cratesio/-/ format) |
| repository | Yes (from crates.io API) | No |
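
Combining the two sources from the table above could look roughly like the sketch below. It is an illustration only: the function names are hypothetical, and both input shapes are assumptions (cache entries keyed by PURL, dash results keyed by `crate/cratesio/-/<name>/<version>` coordinates), not the actual file layouts. PURL percent-encoding of unusual version strings is ignored.

```python
def clearlydefined_to_purl(coordinate: str) -> str:
    """Convert 'crate/cratesio/-/<name>/<version>' to 'pkg:cargo/<name>@<version>'."""
    parts = coordinate.split("/")
    if len(parts) != 5 or parts[:3] != ["crate", "cratesio", "-"]:
        raise ValueError(f"not a crates.io coordinate: {coordinate!r}")
    name, version = parts[3], parts[4]
    return f"pkg:cargo/{name}@{version}"


def merge_licenses(cache_entries: dict, dash_licenses: dict) -> dict:
    """Overlay dash-provided licenses onto lockfile-derived cache entries.

    cache_entries: PURL -> metadata dict (checksum, repository, ...).
    dash_licenses: ClearlyDefined coordinate -> SPDX license expression.
    Entries are keyed by name *and* version, so two versions of the same
    crate can carry different licenses.
    """
    merged = {purl: dict(entry) for purl, entry in cache_entries.items()}
    for coordinate, license_expr in dash_licenses.items():
        purl = clearlydefined_to_purl(coordinate)
        if purl in merged and license_expr:
            merged[purl]["license"] = license_expr
    return merged
```

Keying on the full PURL rather than the crate name keeps checksum and repository from the cache while taking the verified license from dash.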

#!/usr/bin/env python3
"""Generate crates.io metadata cache for SBOM generation.

This script parses Cargo.lock files and/or MODULE.bazel.lock files,
Member

Cargo.lock parsing has also been implemented in dash-license-scan. This probably needs alignment.

lockfile_path: Path to MODULE.bazel.lock file

Returns:
Dict mapping crate name to {version, checksum, source}
Member

How do you know which version to use? That's the main reason I have not implemented Bazel lockfile parsing in dash-license-scan, since there is no way to know which version is actually relevant?!

Author

No, I was not aware of this problem.

@@ -0,0 +1,142 @@
"""Tests for CycloneDX 1.6 formatter."""

import unittest
Member

Would you consider pytest? I don't know if it's better or worse, but we use pytest everywhere.

Author

OK, I will change the framework.

@@ -0,0 +1,78 @@
{
"serde": {"license": "MIT OR Apache-2.0"},
Member

note that different versions can have different licenses

Author

@Lukasz-Juranek Feb 10, 2026

This file is not used.

The actual license data now comes from crates_metadata.json (generated by the cache script from crates.io API + MODULE.bazel.lock).

I need to clean this up.

targets, traversing their transitive dependencies and generating output
in SPDX 2.3 and/or CycloneDX 1.6 format.

License metadata is collected automatically:
Member

This is where dash-license-scan could fit in. It collects verified license information.

Author

I will check whether taking the license field from the dash output makes sense.

# bazel_dep module (version from module graph)
sbom_ext.license(
name = "googletest",
license = "BSD-3-Clause",
Member

We should use dash-license-scan and not write hand-collected license information.

Lukasz-Juranek force-pushed the feat/issue-2232-sbom-init branch 2 times, most recently from 3c0af62 to 35c92b1 on February 10, 2026 17:04
@Lukasz-Juranek
Author

@AlexanderLanin I've updated the SBOM tooling to get licenses from dash. You can take a look at some generated files:

sbom_communication.cdx.json
sbom_kyron_module.cdx.json
sbom_baselibs.cdx.json

Two things to confirm:

  1. Dash only supports Rust, right? I could not get C++ licenses.
  2. License IDs like this are expected, right?
 "id": "Apache-2.0 AND MIT AND Apache-2.0 AND MIT"

@evinoth1206

Thanks for the work on the SBOM generation! I reviewed the CycloneDX output and have some feedback:

Must fix:

  • Invalid license format for compound SPDX expressions: When a component has multiple licenses combined with AND/OR, the current format uses "license": { "id": "Apache-2.0 AND MIT" }, which fails schema validation. The id field only accepts a single SPDX identifier. Compound expressions must use the expression field instead:
    // ❌ Current (invalid):
    "licenses": [{ "license": { "id": "Apache-2.0 AND MIT" } }]

    // ✅ Correct:
    "licenses": [{ "expression": "Apache-2.0 AND MIT" }]

  • Missing description field on all components. Every component should have at least a short description for SBOM usability

Nice to have (can be addressed in a follow-up):

  • Missing cpe identifiers for OSS components. Adding CPE entries (e.g. cpe:2.3:a:google:flatbuffers:25.2.10:*:*:*:*:*:*:*) would improve vulnerability matching against the NVD.

  • Flat dependency graph. Currently all dependsOn arrays are empty except for the root component, which lists everything as a direct dependency. Ideally the graph should reflect actual transitive relationships (e.g. serde-derive depends on syn, quote, proc-macro2).
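
For the flat dependency graph point, a minimal sketch of emitting a non-flat CycloneDX `dependencies` array is shown below. It assumes a map of direct dependencies per `bom-ref` is already available (for example from the lockfile); the function name `build_dependencies` and the bare crate names used as refs are hypothetical.

```python
def build_dependencies(root_ref: str, direct_deps: dict) -> list:
    """Build a CycloneDX `dependencies` array from direct-dependency edges.

    direct_deps maps a bom-ref to the bom-refs it depends on directly.
    Every referenced component gets its own entry, so transitive
    relationships are preserved instead of hanging everything off the root.
    """
    refs = {root_ref}
    for ref, deps in direct_deps.items():
        refs.add(ref)
        refs.update(deps)
    return [
        {"ref": ref, "dependsOn": sorted(direct_deps.get(ref, []))}
        for ref in sorted(refs)
    ]


# Example edges taken from the review comment above.
graph = build_dependencies(
    "root",
    {
        "root": ["serde-derive"],
        "serde-derive": ["syn", "quote", "proc-macro2"],
    },
)
```

Here only `serde-derive` is listed under the root, and `syn`, `quote` and `proc-macro2` appear under `serde-derive` rather than as direct dependencies of the root.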

Lukasz-Juranek force-pushed the feat/issue-2232-sbom-init branch from 7b00264 to 3d5b273 on February 12, 2026 19:21
@Lukasz-Juranek
Author

I updated the PR and fixed the licenses. I validated the output with https://tools.spdx.org/app/ntia_checker/ and it passed. You can check these files:

sbom_kyron_module.cdx.json
sbom_kyron_module.spdx.json

@masc2023

@Lukasz-Juranek, you can also check here; both files above seem valid:
https://sbomgenerator.com/tools/validator

Lukasz-Juranek marked this pull request as ready for review on February 27, 2026 10:33
config: dict[str, Any],
timestamp: str,
) -> dict[str, Any]:
"""Generate SPDX 2.3 JSON document.
Member

Latest version of SPDX is 3.0 at least.

Author

If we are sure that the tooling we want to use for vulnerability scanning and other tasks supports SPDX 3.0, then fine, I can generate SPDX 3.0.

@Lukasz-Juranek
Author

Attaching regenerated SBOM files (post license-normalization fix):

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Lukasz-Juranek force-pushed the feat/issue-2232-sbom-init branch from 1f779eb to 1aca6d2 on February 27, 2026 18:57