Add shared C2PA text normalization helpers#18

Closed
erik-sv wants to merge 22 commits into main from codex/update-c2pa-module-for-full-spec-compliance

@erik-sv (Contributor) commented Oct 23, 2025

Summary

  • add a shared text_hashing helper that performs NFC normalization, exclusion filtering, and SHA-256 hashing for C2PA text assets
  • refactor the C2PA embedding/verification flows to call the helper and stabilise recorded exclusion spans
  • document the new helper across the C2PA guides and add regression tests for the hashing workflow
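
The hashing workflow summarised above (NFC normalization, exclusion filtering, SHA-256) could look roughly like the following. This is a minimal sketch, not the PR's actual helper: the function name `hash_text_with_exclusions` and the `(start, length)` byte-offset convention for exclusions are assumptions made for illustration.

```python
import hashlib
import unicodedata
from typing import Iterable, Tuple


def hash_text_with_exclusions(
    text: str, exclusions: Iterable[Tuple[int, int]] = ()
) -> str:
    """Return the SHA-256 hex digest of NFC-normalized text, skipping excluded bytes.

    Hypothetical convention: each exclusion is a (start, length) pair measured
    in bytes of the NFC-normalized UTF-8 encoding.
    """
    # Normalize first so canonically equivalent strings hash identically.
    data = unicodedata.normalize("NFC", text).encode("utf-8")

    kept = bytearray()
    cursor = 0
    for start, length in sorted(exclusions):
        if length < 0 or start < cursor or start + length > len(data):
            raise ValueError("exclusions overlap or fall outside the text")
        kept += data[cursor:start]  # keep bytes before the excluded span
        cursor = start + length     # skip the excluded span itself
    kept += data[cursor:]           # keep whatever follows the last span

    return hashlib.sha256(bytes(kept)).hexdigest()
```

Normalizing before computing offsets means the recorded spans always refer to the same byte layout at embedding and verification time, which is one plausible reading of "stabilise recorded exclusion spans".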

Testing

  • pytest tests/interop/test_text_hashing.py tests/integration/test_c2pa_text_embedding.py

erik-sv and others added 21 commits March 22, 2025 13:22
… examples, clarify secret_key usage, add demo video, and apply Black formatting
….0.0

This release introduces major architectural changes including a refactor for C2PA alignment, the adoption of Ed25519 digital signatures, and API consolidation, alongside bug fixes and enhanced examples.

Major Changes & Features (v2.0.0):
- feat(security): Replaced HMAC-based verification with Ed25519 digital signatures for enhanced security and non-repudiation. Keys are managed via private_key and public_key_resolver.
- refactor(api): Deprecated MetadataEncoder and consolidated all core functionality into the UnicodeMetadata class for a unified API surface.
- feat(c2pa): Refactored manifest structure and added encypher.interop.c2pa module to align more closely with C2PA principles for text-based content provenance.
- feat(core): Added explicit comments and docstrings highlighting C2PA concepts and Ed25519 usage.
- docs: Updated all documentation (README, guides, API refs) to reflect Ed25519 signatures, the UnicodeMetadata API, and C2PA alignment.

Supporting Changes:
- fix(core): Resolved critical signature verification failures in UnicodeMetadata by correcting payload serialization and structure handling.
- fix(tests): Updated test_unicode_metadata.py assertions for the refactored manifest structure and Ed25519 verification.
- feat(examples): Added examples/encypher_v2_demo.ipynb notebook and create_demo_notebook.py script demonstrating v2.0.0 features.
- docs: Updated documentation links to point to the new notebook example, including a Colab link.
- fix(deps): Added 'deprecated' package to dependencies to fix import errors.
- fix(types): Added type stubs for 'deprecated' package to fix mypy errors.
- fix(types): Fixed TypedDict definitions and mypy errors throughout the codebase.
…ocs for v2.1.0

Refactors the `UnicodeMetadata` class and `StreamingHandler` for improved clarity and consistency in version 2.1.0. Key changes include:

- `UnicodeMetadata.embed_metadata`:
    - Requires `signer_id` directly.
    - Accepts other metadata components as direct parameters.
- `UnicodeMetadata.verify_metadata`:
    - Renames `public_key_resolver` to `public_key_provider`.
    - `public_key_provider` now receives `signer_id`.
    - Standardizes return tuple.
- `UnicodeMetadata.extract_metadata`:
    - Consistently returns `BasicPayload` or `ManifestPayload` objects.
- Replaces `key_id` with `signer_id` throughout API and documentation.
- `StreamingHandler` constructor updated to accept direct metadata parameters.

Documentation:
- Updated quickstart, basic usage, user guides (extraction, encoding), and API reference for `UnicodeMetadata`.
- Revised all examples to reflect new API signatures and `signer_id` usage.
- Ensured all code examples are runnable and accurate for v2.1.0.

Fixes:
- Resolved import errors and runtime issues in fastapi_example.py
- Corrected `encypher.__version__` to report "2.1.0".
- Updated changelog date for v2.1.0.
This commit introduces a comprehensive refactoring of the EncypherAI core package to improve code organization, maintainability, and clarity. The changes follow the Single Responsibility Principle by splitting functionality into dedicated modules.

Key Changes:
- Split crypto_utils.py into three specialized modules:
  - keys.py: Key generation, loading, and saving functions
  - payloads.py: Payload data structures and serialization
  - signing.py: Digital signature operations
- Added backward compatibility layer in crypto_utils.py
- Renamed functions for clarity and consistency
- Updated all internal imports across the codebase
- Created comprehensive test suite for signing functionality
- Updated documentation including migration guide
- Applied code formatting standards with black, isort, and ruff
This release significantly improves C2PA interoperability and adds CBOR support for more compact and standards-aligned manifest storage.

Key Changes:
- Fixed timestamp handling in C2PA conversion to ensure timestamp is properly included in c2pa.created assertion data
- Added CBOR manifest format for embedding full C2PA-compliant manifests
- Implemented conversion utilities between EncypherAI and C2PA-like structures
- Added support for nested JSON-LD assertions (e.g., schema.org CreativeWork)
- Created comprehensive example demonstrating C2PA embedding and verification workflows
- Updated documentation with detailed C2PA relationship guide
- Added tests for C2PA and CBOR manifest round-trips
- Resolved mypy static type checking errors in unicode_metadata module

This release maintains backward compatibility while extending functionality to better align with C2PA standards and support more complex provenance metadata structures.
…treamingHandler fixes

- Added C2PA v2.2 compliance with proper manifest structure, hard/soft binding, and advanced signing
- Added conditional hard binding control for streaming content verification
- Fixed critical bug in StreamingHandler metadata embedding to preserve all user-supplied metadata fields
- Fixed StreamingHandler initialization for proper C2PA manifest creation
- Fixed multiple mypy type errors and added types-requests to dev dependencies
- Added conftest.py to skip Gemini integration test when API key is not available
- Updated documentation for streaming verification with require_hard_binding=False
- Added comprehensive Jupyter notebook demo showcasing all v2.4.0 features
…bedding

- implemented a tiny JUMBF box serializer/deserializer
- mentioned new format in README and technical guide
- documented JUMBF support in the changelog
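
For context, JUMBF boxes follow the ISO base media box layout: a 4-byte big-endian length that includes the 8-byte header, a 4-byte type, then the payload. A tiny serializer/deserializer in that spirit might look like the sketch below. The names `pack_box` and `unpack_boxes` are hypothetical and are not taken from the commit, and the sketch handles only the compact 32-bit length form (no extended 64-bit lengths).

```python
import struct
from typing import List, Tuple

HEADER_SIZE = 8  # 4-byte length + 4-byte type


def pack_box(box_type: bytes, payload: bytes) -> bytes:
    """Serialize one box: big-endian length (header included), type, payload."""
    if len(box_type) != 4:
        raise ValueError("box type must be exactly 4 bytes")
    size = HEADER_SIZE + len(payload)
    if size > 0xFFFFFFFF:
        raise ValueError("payload too large for the compact 32-bit length")
    return struct.pack(">I4s", size, box_type) + payload


def unpack_boxes(data: bytes) -> List[Tuple[bytes, bytes]]:
    """Split a byte string into consecutive (type, payload) pairs."""
    boxes = []
    offset = 0
    while offset < len(data):
        if len(data) - offset < HEADER_SIZE:
            raise ValueError("truncated box header")
        size, box_type = struct.unpack_from(">I4s", data, offset)
        if size < HEADER_SIZE or offset + size > len(data):
            raise ValueError("invalid box length")
        boxes.append((box_type, data[offset + HEADER_SIZE : offset + size]))
        offset += size
    return boxes
```

Nested structures, such as a `jumb` superbox wrapping a `jumd` description box, fall out of calling `pack_box` on an already packed payload and `unpack_boxes` on the extracted payload.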
add optional omit_keys to UnicodeMetadata.embed_metadata
apply omit filtering in StreamingHandler and CLI
document the new feature in README and docs
bump version to 2.5.0
test coverage for omit key handling
* Normalize C2PA format references

* Updated version and changelog
… to Encypher Corporation

- Added new embedding targets to docs and examples:
  - FILE_END ("file_end") — appends variation selectors at end of text
  - FILE_END_ZWNBSP ("file_end_zwnbsp") — prefixes U+FEFF then appends selectors at end
- Updated API and streaming references to include new targets and behavior:
  - docs/package/api-reference/unicode_metadata.md
  - docs/package/api-reference/streaming-metadata-encoder.md
  - docs/package/streaming/handlers.md
  - docs/package/examples/advanced-usage.md
  - docs/package/examples/c2pa_text_demo.md
- Updated changelogs with 2.8.0 "Added" entry documenting new targets:
  - docs/package/CHANGELOG.md
  - docs/package/changelog.md
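
The two end-of-file targets described above can be illustrated with a short sketch. The byte-to-selector mapping used here (bytes 0-15 to U+FE00..U+FE0F, bytes 16-255 to U+E0100..U+E01EF) is an assumption chosen for illustration, and the function names are hypothetical rather than the package's API.

```python
ZWNBSP = "\ufeff"  # U+FEFF, the prefix used by the FILE_END_ZWNBSP target


def byte_to_vs(value: int) -> str:
    """Map one byte to a variation selector (assumed mapping, see lead-in)."""
    if not 0 <= value <= 255:
        raise ValueError("value must fit in one byte")
    return chr(0xFE00 + value) if value < 16 else chr(0xE0100 + value - 16)


def vs_to_byte(ch: str):
    """Inverse of byte_to_vs; returns None for non-selector characters."""
    cp = ord(ch)
    if 0xFE00 <= cp <= 0xFE0F:
        return cp - 0xFE00
    if 0xE0100 <= cp <= 0xE01EF:
        return cp - 0xE0100 + 16
    return None


def append_at_file_end(text: str, payload: bytes, zwnbsp: bool = False) -> str:
    """Append the payload as selectors at the end, optionally after U+FEFF."""
    selectors = "".join(byte_to_vs(b) for b in payload)
    return text + (ZWNBSP if zwnbsp else "") + selectors


def read_from_file_end(text: str) -> bytes:
    """Collect the trailing run of selectors and decode it back to bytes."""
    collected = []
    for ch in reversed(text):
        value = vs_to_byte(ch)
        if value is None:
            break  # reached visible text or the U+FEFF prefix
        collected.append(value)
    return bytes(reversed(collected))
```

For example, `append_at_file_end("hi", b"\x01", zwnbsp=True)` yields `"hi\ufeff\ufe01"`, and `read_from_file_end` on that string returns `b"\x01"`.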

Branding and license:
- Updated logos and branding assets to Encypher Corporation (docs/assets/*)
- Updated license holder and references to Encypher Corporation (LICENSE.md, README.md)

No breaking changes.
… extraction

tests: split success vs tamper; expand FILE_END matrix (32 cases)

docs: update changelog for 2.8.1; improve diagnostics and robustness
…traction; flatten manifest extraction

Fix IndentationError in unicode_metadata.py blocking test collection
Ensure custom_metadata is embedded for manifest and cbor_manifest
Flatten extracted manifest payloads so custom_metadata is surfaced
Improve robustness: strip BOM pre-parse; ignore stray trailing VS; better tail marker handling
test(integration): split success vs tamper; expand FILE_END matrix (32 cases)
test_extract_and_verify_file_end_success (PASS)
test_tamper_detection_file_end (XFAIL by design: non-hard-binding)
Cover format ∈ {manifest,cbor_manifest}, BOM ∈ {T,F}, newline ∈ {LF,CRLF}, markers ∈ {T,F}
docs(changelog): add 2.8.1 entry documenting fixes and test split
build(version): bump package version to 2.8.1