From 63a666568e7e4a6027bc317e382989b8b05fb88a Mon Sep 17 00:00:00 2001 From: Tania Mathern Date: Mon, 9 Mar 2026 20:11:57 -0700 Subject: [PATCH 1/6] fix: Managed resources docs --- docs/native-resources-management.md | 304 +++++++++++++++++++++++++++- src/c2pa/c2pa.py | 11 +- 2 files changed, 307 insertions(+), 8 deletions(-) diff --git a/docs/native-resources-management.md b/docs/native-resources-management.md index 0acb5153..9bfb00c2 100644 --- a/docs/native-resources-management.md +++ b/docs/native-resources-management.md @@ -1,7 +1,299 @@ -# TBD +# Native resource management (ManagedResource class) -- Explain what ManagedResource is -- Why base class (multiple inheritance) -- What it does -- Class diagram (Mermaid) -- Lifecycle/state changes (mermaid) \ No newline at end of file +## Why is native resources management needed? + +The C2PA Python SDK is a wrapper around a native Rust library that exposes a C FFI. When the SDK creates a `Reader`, `Builder`, `Signer`, `Context`, or `Settings` object, that object holds a **pointer** to memory allocated on the native side. + +Python manages its own objects' memory automatically through garbage collection. In CPython (the standard interpreter), this works primarily through reference counting: each object has a counter tracking how many references point to it, and when that counter reaches zero the object is deallocated. A secondary cycle-detecting collector handles the case where objects reference each other in a loop and their counts never reach zero on their own. + +This system works well for pure Python objects, but native memory sits outside of it entirely. The garbage collector sees the Python wrapper object (e.g. a `Reader` instance) and tracks references to it, but it has no visibility into the native memory that the wrapper's `_handle` attribute points to. 
It does not know the size of that native allocation, cannot tell when it is no longer needed, and will not call the native library's `c2pa_free` function to release it. If the Python wrapper is collected without first calling `c2pa_free`, the native memory leaks. If `c2pa_free` is called twice on the same pointer, the process crashes. + +Python does offer `__del__` as a hook that runs when an object is collected, and `ManagedResource` uses it as a fallback. But `__del__` cannot be relied on as the primary cleanup mechanism: its timing is unpredictable, it may not run at all during interpreter shutdown, and other Python implementations (PyPy, GraalPy) that do not use reference counting make its behavior even less deterministic. + +In CPython, `__del__` runs synchronously when the last reference to an object disappears, which in simple cases happens at a predictable point (e.g. when a local variable goes out of scope). But if the object is part of a reference cycle, its reference count never reaches zero on its own. The cycle collector must discover and break the cycle first, and it runs periodically rather than immediately. An object caught in a cycle might sit in memory for an arbitrary amount of time before `__del__` fires. CPython's cycle collector does not guarantee an order when finalizing groups of objects in a cycle, so `__del__` methods that depend on other objects in the same cycle may find those objects already partially torn down. During interpreter shutdown, the situation is even less reliable: CPython clears module globals and may collect objects in an arbitrary order, and `__del__` methods that reference global state (like the `_lib` handle to the native library) can fail silently because those globals have already been set to `None`. 
PyPy and GraalPy use tracing garbage collectors (which periodically walk the object graph to find unreachable objects, rather than tracking individual reference counts) instead of reference counting, so `__del__` does not run when the last reference disappears. It runs at some later point when the GC happens to trace that region of the heap, which could be seconds or minutes later, or not at all if the process exits first. + +`ManagedResource` is the internal base class that handles this. Every class that holds a native pointer inherits from it. + +## Class hierarchy + +```mermaid +classDiagram + class ManagedResource { + <> + } + + class ContextProvider { + <> + } + + ManagedResource <|-- Settings + ManagedResource <|-- Context + ManagedResource <|-- Reader + ManagedResource <|-- Builder + ManagedResource <|-- Signer + + ContextProvider <|-- Context + ContextProvider <|-- Settings +``` + +`ContextProvider`, by contrast, is an ABC. It defines two abstract properties (`is_valid` and `execution_context`) that subclasses must implement. `Context` and `Settings` both inherit from it. Since Python supports multiple inheritance, `Context` gets lifecycle management from `ManagedResource` and the context-provider interface from `ContextProvider`. The `is_valid` property required by `ContextProvider` is implemented once, on `ManagedResource`, and inherited by both. + +## Preventing garbage collection of live references + +When a Python object passes a callback or pointer to the native library, that reference must stay alive for as long as the native side might use it. Python's garbage collector has no way to know that native code is still holding a reference to a Python callback. + +The SDK solves this by storing these references as instance attributes on the owning object. For example, `Stream` stores its four callback objects (`_read_cb`, `_seek_cb`, `_write_cb`, `_flush_cb`) as instance attributes. 
As long as the `Stream` object is alive, its callbacks have a nonzero reference count and will not be collected. Similarly, when a `Signer` is consumed by a `Context`, the Context copies the signer's `_callback_cb` to its own `_signer_callback_cb` attribute so the callback survives even though the Signer object is now closed. + +During cleanup, `_release()` sets these attributes to `None`, which drops the reference count on the callback objects and allows them to be collected. In the cleanup sequence, `_release()` runs first, then `c2pa_free` frees the native pointer. `_release()` goes first so that subclass-specific resources (open file handles, stream wrappers) are torn down before the native pointer they depend on is freed. + +## How native memory is freed + +The native Rust library exposes a single C FFI function, `c2pa_free`, that deallocates memory it previously allocated. `ManagedResource` wraps this in a static method: + +```python +@staticmethod +def _free_native_ptr(ptr): + _lib.c2pa_free(ctypes.cast(ptr, ctypes.c_void_p)) +``` + +All native pointers are freed through this single path, regardless of which constructor created them (`c2pa_reader_from_stream`, `c2pa_builder_from_json`, `c2pa_signer_from_info`, etc.). The `ctypes.cast` to `c_void_p` is needed because the C function accepts a generic void pointer regardless of the original type. + +`ManagedResource` guarantees that `c2pa_free` is called exactly once per pointer: not zero times (leak), not twice (double-free). + +## Lifecycle states + +Each `ManagedResource` tracks its state with a `LifecycleState` enum: + +```mermaid +stateDiagram-v2 + [*] --> UNINITIALIZED : __init__() + UNINITIALIZED --> ACTIVE : native pointer created + ACTIVE --> CLOSED : close() / __exit__ / __del__ + ACTIVE --> CLOSED : _mark_consumed() +``` + +- `UNINITIALIZED`: The Python object exists but the native pointer has not been set yet. This is a transient state during construction. +- `ACTIVE`: The native pointer is valid. 
The object can be used. +- `CLOSED`: The native pointer has been freed (or ownership was transferred). Any further use raises `C2paError`. + +The transition from ACTIVE to CLOSED is one-way. Once closed, an object cannot be reactivated. + +Every public method calls `_ensure_valid_state()` before doing any work. Besides checking the lifecycle state, this method also calls `_clear_error_state()`, which resets any stale error left over from a previous native library call. Without this, an error from one operation could leak into the next one and produce a misleading error message. + +## Ways to clean up + +### Context manager (`with` statement) + +```python +with Reader("image.jpg") as reader: + print(reader.json()) +# reader is automatically closed here, even if an exception occurs +``` + +When the `with` block exits, `__exit__` calls `close()`, which frees the native pointer. This is the safest approach because cleanup happens even if the code inside the block raises an exception. + +### Explicit `.close()` + +```python +reader = Reader("image.jpg") +try: + print(reader.json()) +finally: + reader.close() +``` + +Calling `.close()` directly is equivalent to exiting a `with` block. It is idempotent: calling it multiple times is safe and does nothing after the first call. + +### Destructor fallback (`__del__`) + +If neither of the above is used, `__del__` attempts to free the native pointer when Python garbage-collects the object. As described above, `__del__` timing is unpredictable and it may not run at all, so it is a safety net rather than a primary cleanup mechanism. + +## Error handling during cleanup + +Cleanup must never raise an exception. A failure during cleanup (for example, the native library crashing on free) should not mask the original exception that caused the `with` block to exit. 
`ManagedResource` enforces this: + +- `close()` delegates to `_cleanup_resources()`, which wraps the entire cleanup sequence in a try/except that catches and silences all exceptions. +- If freeing the native pointer fails, the error is logged via Python's `logging` module but not re-raised. +- The state is set to `CLOSED` as the very first step, before attempting to free anything. If cleanup fails halfway, the object is still marked closed, preventing a second attempt from doing further damage. +- Cleanup is idempotent. Calling `close()` on an already-closed object returns immediately. + +## Nesting resources + +When multiple native resources are in play at once, they can share a single `with` statement or use nested blocks. Either way, Python cleans them up in reverse order (right to left, or inner to outer). + +```python +with open("photo.jpg", "rb") as file, Reader("image/jpeg", file) as reader: + manifest = reader.json() +# reader is closed first, then file +``` + +The same can be written with nested blocks if readability is better: + +```python +with open("photo.jpg", "rb") as file: + with Reader("image/jpeg", file) as reader: + manifest = reader.json() +``` + +The order matters. The Reader depends on the file, so the Reader must be closed before the file handle. Python's `with` statement guarantees this: resources listed later (or nested deeper) are torn down first. If the file were closed while the Reader still held a pointer into it, the native library could read freed memory. + +## Reader lifecycle + +A `Reader` wraps a stream (or opens a file), passes it to the native library, and holds the returned pointer. While active, callers can use `.json()`, `.detailed_json()`, `.resource_to_stream()`, and other methods. Each of these checks state via `_ensure_valid_state()` before making the FFI call. + +```mermaid +stateDiagram-v2 + [*] --> ACTIVE : Reader("image.jpg") + ACTIVE --> ACTIVE : .json(), .detailed_json(), etc. 
+ ACTIVE --> CLOSED : close() / exit with block + CLOSED --> CLOSED : close() (no-op) + CLOSED --> [*] + + note right of CLOSED + Any method call raises + C2paError("Reader is closed") + end note +``` + +When the Reader is closed, it first releases its own resources (open file handles, stream wrappers) via `_release()`, then frees the native pointer via `c2pa_free`. + +## Builder lifecycle + +A `Builder` follows the same pattern as Reader, with one difference: **signing consumes the builder**. The native library takes ownership of the builder's pointer during the sign operation. After signing, the builder is closed and cannot be reused. + +```mermaid +stateDiagram-v2 + [*] --> ACTIVE : Builder.from_json(manifest) + ACTIVE --> ACTIVE : .add_ingredient(), .add_action(), etc. + ACTIVE --> CLOSED : .sign() (pointer consumed by native library) + ACTIVE --> CLOSED : close() without signing + CLOSED --> [*] + + note right of CLOSED + Builder cannot be reused + after signing + end note +``` + +After `.sign()`, the builder calls `_mark_consumed()`, which sets the handle to `None` and the state to `CLOSED`. Because the native library now owns the pointer, `ManagedResource` does not call `c2pa_free`. That would double-free memory the native library already manages. + +## Ownership transfer + +Some operations transfer a native pointer from one object to another. When this happens, the original object must stop managing the pointer (e.g. so it is not freed twice). + +`_mark_consumed()` handles this. It sets `_handle = None` and `_lifecycle_state = CLOSED` in one step. + +There are two cases where this is relevant: + +- When a `Signer` is passed to a `Context`, the Context takes ownership of the Signer's native pointer. The Signer is marked consumed and must not be used again. + +- When `Builder.sign()` is called, the native library consumes the Builder's pointer. 
The Builder marks itself consumed regardless of whether the sign operation succeeds or fails, because in both cases the native library has taken the pointer. + +## Consume-and-return + +`_mark_consumed()` closes an object permanently. A different pattern exists where an FFI call consumes the current pointer and returns a new one (pointer swap), and the same Python object keeps working with the replacement pointer. + +`Reader.with_fragment()` and `Builder.with_archive()` both do this: + +```mermaid +stateDiagram-v2 + state "ACTIVE (ptr A)" as A + state "ACTIVE (ptr B)" as B + + A --> B : FFI call consumes ptr A, returns ptr B + note right of B + Same Python object, + new native pointer + end note +``` + +```python +# Reader.with_fragment() internally does: +new_ptr = _lib.c2pa_reader_with_fragment(self._handle, ...) +# self._handle (old pointer) is now invalid +self._handle = new_ptr +``` + +The object stays `ACTIVE` throughout. This is different from `_mark_consumed()`, where the object transitions to `CLOSED`. The old pointer must not be freed by `ManagedResource` because the native library already consumed it as part of the FFI call. + +## Subclass-specific cleanup with `_release()` + +Each subclass can override `_release()` to clean up its own resources before the native pointer is freed. The base implementation does nothing. + +Examples from the codebase: + +| Class | What `_release()` cleans up | +|---|---| +| Reader | Closes owned file handles and stream wrappers | +| Context | Drops the reference to the signer callback | +| Signer | Drops the reference to the signing callback | +| Settings | (no override, nothing extra to clean up) | +| Builder | (no override, nothing extra to clean up) | + +The cleanup order matters: `_release()` runs first (closing streams, dropping callbacks), then `c2pa_free` frees the native pointer. This order prevents the native library from accessing Python objects that no longer exist. + +## Why is `Stream` not a `ManagedResource`? 
+ +`Stream` wraps a Python stream-like object (file stream or memory stream) so the native library can read from and write to it via callbacks. It does not inherit from `ManagedResource`, and it uses `c2pa_release_stream()` instead of `c2pa_free()` for cleanup. + +The reason is that ownership runs in the opposite direction. A `Reader` or `Builder` holds a native resource that Python code calls methods on. A `Stream` holds a native handle that the native library calls *back into* (read, seek, write, flush). The native library needs a different release function to tear down the callback machinery. + +`Stream` tracks its own state with `_closed` and `_initialized` flags rather than `LifecycleState`, but it supports the same three cleanup paths: context manager, explicit `.close()`, and `__del__` fallback. + +## Implementing a subclass of `ManagedResource` + +To wrap a new native resource, inherit from `ManagedResource` and follow these rules: + +```python +class MyResource(ManagedResource): + def __init__(self, arg): + super().__init__() + + # 1. Initialize ALL instance attributes before any code + # that can raise. If __init__ fails partway through, + # __del__ will call _release(), which accesses these + # attributes. If they don't exist, _release() raises AttributeError. + self._my_stream = None + self._my_cache = None + + # 2. Create the native pointer. + ptr = _lib.c2pa_my_resource_new(arg) + _check_ffi_operation_result(ptr, "Failed to create MyResource") + + # 3. Only set _handle and activate AFTER the FFI call + # succeeded. If it raised, _lifecycle_state stays + # UNINITIALIZED and cleanup won't try to free a + # pointer that doesn't exist. + self._handle = ptr + self._lifecycle_state = LifecycleState.ACTIVE + + def _release(self): + # 4. Clean up class-specific resources. + # Never let this method raise. Use try/except with + # logging if needed. 
+ if self._my_stream: + try: + self._my_stream.close() + except Exception: + logger.error("Failed to close MyResource stream") + finally: + self._my_stream = None + + def do_something(self): + # 5. Check state at the start of every public method. + # This raises C2paError if the resource is closed. + self._ensure_valid_state() + return _lib.c2pa_my_resource_do_something(self._handle) +``` + +### Troubleshooting + +- If `self._my_callback = None` is set after the FFI call that can raise, and the call fails, `_release()` will try to access `self._my_callback` and crash with `AttributeError`. Always initialize attributes right after `super().__init__()`. + +- If `_lifecycle_state = ACTIVE` is set before the FFI call and the call fails, cleanup will try to free a null or invalid pointer. Activation should happen only after a valid handle exists. + +- If `_release()` raises, the exception is silently swallowed by `_cleanup_resources()`. It will not be visible unless logs are checked. Wrap risky operations in try/except. + +- `_release()` can be called more than once (via `close()` then `__del__`, or multiple `close()` calls). Make sure it handles being called on an already-cleaned-up object. Setting attributes to `None` after closing them is the standard pattern. + +- Calling `c2pa_free` directly is not recommended. `ManagedResource` handles this. If the pointer is freed manually and `ManagedResource` frees it again, the process crashes (double-free). diff --git a/src/c2pa/c2pa.py b/src/c2pa/c2pa.py index 40365d2e..b1b59fd8 100644 --- a/src/c2pa/c2pa.py +++ b/src/c2pa/c2pa.py @@ -2235,7 +2235,11 @@ def __init__( # we may have opened ourselves, and that we need to close later self._backing_file = None - # Caches for manifest JSON string and parsed data + # Caches for manifest JSON string and parsed data. 
+ # These are invalidated when with_fragment() is called, because each + # new BMFF fragment can refine or update the manifest content as the + # reader progressively builds its understanding of the fragmented stream. + # They are also cleared on close() to release memory. self._manifest_json_str_cache = None self._manifest_data_cache = None @@ -2489,7 +2493,10 @@ def with_fragment(self, format: str, stream, ].format("Unknown error")) self._handle = new_ptr - # Invalidate caches: fragment may change manifest data + # Invalidate caches: processing a new BMFF fragment updates the native + # reader's state, which can change the manifest data it returns. + # The cached JSON string and parsed dict may now be stale, so clear + # them to force a fresh read from the native layer on next access. self._manifest_json_str_cache = None self._manifest_data_cache = None From 340cd1b1f147b1c0d7a959089abea682f528397b Mon Sep 17 00:00:00 2001 From: Tania Mathern Date: Mon, 9 Mar 2026 20:20:00 -0700 Subject: [PATCH 2/6] fix: Docs --- docs/native-resources-management.md | 38 +++++++++++++++-------------- 1 file changed, 20 insertions(+), 18 deletions(-) diff --git a/docs/native-resources-management.md b/docs/native-resources-management.md index 9bfb00c2..c27f84a9 100644 --- a/docs/native-resources-management.md +++ b/docs/native-resources-management.md @@ -2,12 +2,20 @@ ## Why is native resources management needed? +### Native pointers in a Python wrapper + The C2PA Python SDK is a wrapper around a native Rust library that exposes a C FFI. When the SDK creates a `Reader`, `Builder`, `Signer`, `Context`, or `Settings` object, that object holds a **pointer** to memory allocated on the native side. +### How Python's garbage collector works + Python manages its own objects' memory automatically through garbage collection. 
In CPython (the standard interpreter), this works primarily through reference counting: each object has a counter tracking how many references point to it, and when that counter reaches zero the object is deallocated. A secondary cycle-detecting collector handles the case where objects reference each other in a loop and their counts never reach zero on their own. +### Why garbage collection is not enough for native memory + This system works well for pure Python objects, but native memory sits outside of it entirely. The garbage collector sees the Python wrapper object (e.g. a `Reader` instance) and tracks references to it, but it has no visibility into the native memory that the wrapper's `_handle` attribute points to. It does not know the size of that native allocation, cannot tell when it is no longer needed, and will not call the native library's `c2pa_free` function to release it. If the Python wrapper is collected without first calling `c2pa_free`, the native memory leaks. If `c2pa_free` is called twice on the same pointer, the process crashes. +### Why `__del__` is not reliable enough + Python does offer `__del__` as a hook that runs when an object is collected, and `ManagedResource` uses it as a fallback. But `__del__` cannot be relied on as the primary cleanup mechanism: its timing is unpredictable, it may not run at all during interpreter shutdown, and other Python implementations (PyPy, GraalPy) that do not use reference counting make its behavior even less deterministic. In CPython, `__del__` runs synchronously when the last reference to an object disappears, which in simple cases happens at a predictable point (e.g. when a local variable goes out of scope). But if the object is part of a reference cycle, its reference count never reaches zero on its own. The cycle collector must discover and break the cycle first, and it runs periodically rather than immediately. 
An object caught in a cycle might sit in memory for an arbitrary amount of time before `__del__` fires. CPython's cycle collector does not guarantee an order when finalizing groups of objects in a cycle, so `__del__` methods that depend on other objects in the same cycle may find those objects already partially torn down. During interpreter shutdown, the situation is even less reliable: CPython clears module globals and may collect objects in an arbitrary order, and `__del__` methods that reference global state (like the `_lib` handle to the native library) can fail silently because those globals have already been set to `None`. PyPy and GraalPy use tracing garbage collectors (which periodically walk the object graph to find unreachable objects, rather than tracking individual reference counts) instead of reference counting, so `__del__` does not run when the last reference disappears. It runs at some later point when the GC happens to trace that region of the heap, which could be seconds or minutes later, or not at all if the process exits first. @@ -36,7 +44,7 @@ classDiagram ContextProvider <|-- Settings ``` -`ContextProvider`, by contrast, is an ABC. It defines two abstract properties (`is_valid` and `execution_context`) that subclasses must implement. `Context` and `Settings` both inherit from it. Since Python supports multiple inheritance, `Context` gets lifecycle management from `ManagedResource` and the context-provider interface from `ContextProvider`. The `is_valid` property required by `ContextProvider` is implemented once, on `ManagedResource`, and inherited by both. +`Context` and `Settings` inherit from both `ManagedResource` and `ContextProvider` (Python supports multiple inheritance). `ContextProvider` is an ABC that requires two properties: `is_valid` and `execution_context`. The `is_valid` implementation lives on `ManagedResource`, so `Context` and `Settings` satisfy the `ContextProvider` contract without duplicating the property. 
## Preventing garbage collection of live references @@ -135,7 +143,7 @@ with open("photo.jpg", "rb") as file: manifest = reader.json() ``` -The order matters. The Reader depends on the file, so the Reader must be closed before the file handle. Python's `with` statement guarantees this: resources listed later (or nested deeper) are torn down first. If the file were closed while the Reader still held a pointer into it, the native library could read freed memory. +The order matters because resources often depend on each other. In the example above, the `Reader` holds a native pointer that references the file's data through a `Stream` wrapper. If the file handle were closed first, the native library would still hold a pointer into the stream's read callbacks, and any subsequent access (including cleanup) could read freed memory or trigger a segfault. By closing the Reader first, the native pointer is freed while the underlying file is still open and valid. Python's `with` statement guarantees this ordering: resources listed later (or nested deeper) are torn down first. ## Reader lifecycle @@ -143,18 +151,14 @@ A `Reader` wraps a stream (or opens a file), passes it to the native library, an ```mermaid stateDiagram-v2 + direction LR [*] --> ACTIVE : Reader("image.jpg") - ACTIVE --> ACTIVE : .json(), .detailed_json(), etc. ACTIVE --> CLOSED : close() / exit with block - CLOSED --> CLOSED : close() (no-op) CLOSED --> [*] - - note right of CLOSED - Any method call raises - C2paError("Reader is closed") - end note ``` +While `ACTIVE`, callers can use `.json()`, `.detailed_json()`, etc. repeatedly without changing state. Calling `.close()` on an already-closed Reader is a no-op. Any other method call on a closed Reader raises `C2paError`. + When the Reader is closed, it first releases its own resources (open file handles, stream wrappers) via `_release()`, then frees the native pointer via `c2pa_free`. 
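The close sequence described above (mark closed, run `_release()`, then free the handle, all without raising) can be sketched roughly as follows. This is a minimal illustration, not the SDK's implementation: `SketchResource`, `RecordingReader`, `demo`, and the injected `free_fn` are hypothetical names, and `free_fn` merely stands in for `c2pa_free`. One deliberate difference from the text: the sketch guards `_release()` and the free call separately, so a failing `_release()` does not skip the free.

```python
import enum
import logging

logger = logging.getLogger(__name__)


class LifecycleState(enum.Enum):
    UNINITIALIZED = enum.auto()
    ACTIVE = enum.auto()
    CLOSED = enum.auto()


class SketchResource:
    """Hypothetical stand-in for ManagedResource (not the SDK class)."""

    def __init__(self, handle, free_fn):
        self._lifecycle_state = LifecycleState.UNINITIALIZED
        self._free_fn = free_fn  # assumption: stands in for c2pa_free
        self._handle = handle    # assumption: stands in for the native pointer
        self._lifecycle_state = LifecycleState.ACTIVE

    def _release(self):
        """Subclass hook; the base implementation does nothing."""

    def close(self):
        if self._lifecycle_state is LifecycleState.CLOSED:
            return  # idempotent: later calls do nothing
        # Mark closed first so a failure below cannot trigger a second attempt.
        self._lifecycle_state = LifecycleState.CLOSED
        try:
            self._release()  # subclass resources go first
        except Exception:
            logger.exception("_release() failed during cleanup")
        handle, self._handle = self._handle, None
        if handle is not None:
            try:
                self._free_fn(handle)  # then the handle is freed, once
            except Exception:
                logger.exception("freeing the handle failed")

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # never suppress the caller's exception

    def __del__(self):
        # Safety net only; also tolerates a partially constructed object.
        try:
            self.close()
        except Exception:
            pass


class RecordingReader(SketchResource):
    """Records the order of cleanup steps for demonstration purposes."""

    def __init__(self, handle, events):
        self._events = events
        super().__init__(handle, lambda h: events.append(f"free:{h}"))

    def _release(self):
        self._events.append("release")


def demo():
    events = []
    with RecordingReader(7, events) as reader:
        pass
    reader.close()  # second close is a no-op
    return events
```

Calling `demo()` returns the recorded steps in the order they ran, which makes the "release first, free second, exactly once" rule easy to check in a unit test.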
## Builder lifecycle @@ -163,18 +167,16 @@ A `Builder` follows the same pattern as Reader, with one difference: **signing c ```mermaid stateDiagram-v2 + direction LR [*] --> ACTIVE : Builder.from_json(manifest) - ACTIVE --> ACTIVE : .add_ingredient(), .add_action(), etc. - ACTIVE --> CLOSED : .sign() (pointer consumed by native library) + ACTIVE --> CLOSED_BY_SIGN : .sign() ACTIVE --> CLOSED : close() without signing + CLOSED_BY_SIGN --> [*] CLOSED --> [*] - - note right of CLOSED - Builder cannot be reused - after signing - end note ``` +While `ACTIVE`, callers can use `.add_ingredient()`, `.add_action()`, etc. repeatedly. `.sign()` consumes the native pointer (ownership transfers to the native library), so the Builder cannot be reused afterward. Closing without signing frees the pointer normally. + After `.sign()`, the builder calls `_mark_consumed()`, which sets the handle to `None` and the state to `CLOSED`. Because the native library now owns the pointer, `ManagedResource` does not call `c2pa_free`. That would double-free memory the native library already manages. ## Ownership transfer @@ -191,7 +193,7 @@ There are two cases where this is relevant: ## Consume-and-return -`_mark_consumed()` closes an object permanently. A different pattern exists where an FFI call consumes the current pointer and returns a new one (pointer swap), and the same Python object keeps working with the replacement pointer. +`_mark_consumed()` closes an object permanently. A different pattern exists where a FFI call consumes the current pointer and returns a new one (pointer swap), and the same Python object keeps working with the replacement pointer. 
`Reader.with_fragment()` and `Builder.with_archive()` both do this: @@ -200,7 +202,7 @@ stateDiagram-v2 state "ACTIVE (ptr A)" as A state "ACTIVE (ptr B)" as B - A --> B : FFI call consumes ptr A, returns ptr B + A --> B : C FFI call consumes ptr A, returns ptr B note right of B Same Python object, new native pointer From 22d6be43673785e0e7c87fcf03c1e4d9ea44d65a Mon Sep 17 00:00:00 2001 From: Tania Mathern Date: Mon, 9 Mar 2026 20:21:53 -0700 Subject: [PATCH 3/6] fix: Docs --- docs/native-resources-management.md | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/docs/native-resources-management.md b/docs/native-resources-management.md index c27f84a9..1687009b 100644 --- a/docs/native-resources-management.md +++ b/docs/native-resources-management.md @@ -74,10 +74,10 @@ Each `ManagedResource` tracks its state with a `LifecycleState` enum: ```mermaid stateDiagram-v2 + direction LR [*] --> UNINITIALIZED : __init__() UNINITIALIZED --> ACTIVE : native pointer created - ACTIVE --> CLOSED : close() / __exit__ / __del__ - ACTIVE --> CLOSED : _mark_consumed() + ACTIVE --> CLOSED : close() / __exit__ / __del__ / _mark_consumed() ``` - `UNINITIALIZED`: The Python object exists but the native pointer has not been set yet. This is a transient state during construction. @@ -169,10 +169,13 @@ A `Builder` follows the same pattern as Reader, with one difference: **signing c stateDiagram-v2 direction LR [*] --> ACTIVE : Builder.from_json(manifest) - ACTIVE --> CLOSED_BY_SIGN : .sign() - ACTIVE --> CLOSED : close() without signing - CLOSED_BY_SIGN --> [*] + ACTIVE --> CLOSED : .sign() or close() CLOSED --> [*] + + note left of CLOSED + .sign() consumes the pointer + close() frees it + end note ``` While `ACTIVE`, callers can use `.add_ingredient()`, `.add_action()`, etc. repeatedly. `.sign()` consumes the native pointer (ownership transfers to the native library), so the Builder cannot be reused afterward. 
Closing without signing frees the pointer normally. From 26b792f510482cef44fc904a4bb4073bedb43cc5 Mon Sep 17 00:00:00 2001 From: Tania Mathern Date: Mon, 9 Mar 2026 20:22:41 -0700 Subject: [PATCH 4/6] fix: Docs --- docs/native-resources-management.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/docs/native-resources-management.md b/docs/native-resources-management.md index 1687009b..0345bab1 100644 --- a/docs/native-resources-management.md +++ b/docs/native-resources-management.md @@ -152,7 +152,8 @@ A `Reader` wraps a stream (or opens a file), passes it to the native library, an ```mermaid stateDiagram-v2 direction LR - [*] --> ACTIVE : Reader("image.jpg") + [*] --> UNINITIALIZED : __init__() + UNINITIALIZED --> ACTIVE : Reader("image.jpg") ACTIVE --> CLOSED : close() / exit with block CLOSED --> [*] ``` @@ -168,7 +169,8 @@ A `Builder` follows the same pattern as Reader, with one difference: **signing c ```mermaid stateDiagram-v2 direction LR - [*] --> ACTIVE : Builder.from_json(manifest) + [*] --> UNINITIALIZED : __init__() + UNINITIALIZED --> ACTIVE : Builder.from_json(manifest) ACTIVE --> CLOSED : .sign() or close() CLOSED --> [*] From 219936b778da66d8570ed5b52acc6bc4d44a16ab Mon Sep 17 00:00:00 2001 From: Tania Mathern Date: Mon, 9 Mar 2026 20:25:17 -0700 Subject: [PATCH 5/6] fix: Docs --- docs/native-resources-management.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/docs/native-resources-management.md b/docs/native-resources-management.md index 0345bab1..5dc30c2a 100644 --- a/docs/native-resources-management.md +++ b/docs/native-resources-management.md @@ -198,9 +198,11 @@ There are two cases where this is relevant: ## Consume-and-return -`_mark_consumed()` closes an object permanently. A different pattern exists where a FFI call consumes the current pointer and returns a new one (pointer swap), and the same Python object keeps working with the replacement pointer. 
+`_mark_consumed()` closes an object permanently. A different pattern is needed when the native library must replace an object's internal state without discarding the Python-side object. This happens with fragmented media: `Reader.with_fragment()` feeds a new BMFF fragment (used in DASH/HLS streaming) into an existing Reader, and the native library must rebuild its internal representation to account for the new data. The native API does this by consuming the old pointer and returning a new one. Creating a fresh `Reader` from scratch would not work because the native library needs the accumulated state from prior fragments.

-`Reader.with_fragment()` and `Builder.with_archive()` both do this:
+`Builder.with_archive()` follows the same pattern: it loads an archive into an existing Builder, replacing the manifest definition while preserving the Builder's context and settings.
+
+In both cases the FFI call consumes the current pointer and returns a replacement:

 ```mermaid
 stateDiagram-v2
@@ -221,7 +223,7 @@ new_ptr = _lib.c2pa_reader_with_fragment(self._handle, ...)
 self._handle = new_ptr
 ```

-The object stays `ACTIVE` throughout. This is different from `_mark_consumed()`, where the object transitions to `CLOSED`. The old pointer must not be freed by `ManagedResource` because the native library already consumed it as part of the FFI call.
+The object stays `ACTIVE` throughout because the Python-side object is still valid: it has a live native pointer, its public methods still work, and callers may continue using it (e.g. reading the updated manifest or feeding in another fragment). The lifecycle state does not change because from `ManagedResource`'s perspective nothing has closed. Only the underlying native pointer has been swapped. This is different from `_mark_consumed()`, where the object transitions to `CLOSED` and becomes unusable. The old pointer must not be freed by `ManagedResource` because the native library already consumed it as part of the FFI call.

 ## Subclass-specific cleanup with `_release()`

From 929f76bf7ab76e0dee657b05ed1812bc3d4982d8 Mon Sep 17 00:00:00 2001
From: Tania Mathern
Date: Tue, 10 Mar 2026 10:18:53 -0700
Subject: [PATCH 6/6] fix: Update docs

---
 docs/native-resources-management.md | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/docs/native-resources-management.md b/docs/native-resources-management.md
index 5dc30c2a..ba7496a2 100644
--- a/docs/native-resources-management.md
+++ b/docs/native-resources-management.md
@@ -1,5 +1,18 @@
 # Native resource management (ManagedResource class)

+`ManagedResource` is the internal base class used by the C2PA Python SDK to wrap native (Rust/FFI) pointers. New wrappers around native resources should subclass `ManagedResource` and follow the documented lifecycle rules.
+
+## Why `ManagedResource`?
+
+`ManagedResource` is the internal base class responsible for managing native pointers owned by the C2PA Python SDK. It guarantees:
+
+- Native memory is freed exactly once.
+- Resources are cleaned up deterministically via context managers or explicit `close()`.
+- Ownership transfers (e.g. signer to context) are handled safely so the same pointer is never freed twice.
+- Cleanup never raises or masks real exceptions.
+
+Developers wrapping new native resources must inherit from `ManagedResource` and follow the documented lifecycle rules.
+
 ## Why is native resources management needed?

 ### Native pointers in a Python wrapper
@@ -46,6 +59,19 @@ classDiagram

 `Context` and `Settings` inherit from both `ManagedResource` and `ContextProvider` (Python supports multiple inheritance). `ContextProvider` is an ABC that requires two properties: `is_valid` and `execution_context`. The `is_valid` implementation lives on `ManagedResource`, so `Context` and `Settings` satisfy the `ContextProvider` contract without duplicating the property.

+## Guarantees provided by ManagedResource
+
+`ManagedResource` provides the following guarantees. Subclasses and callers can rely on them. These invariants must be maintained when subclassing `ManagedResource` for new native resource handlers.
+
+| Guarantee | Description |
+| --------- | ----------- |
+| **Pointer freed exactly once** | Each native pointer is passed to `c2pa_free` at most once. No leak (zero frees) and no double-free. |
+| **Cleanup is idempotent** | Calling `close()` (or exiting a `with` block) multiple times is safe; after the first successful cleanup, further calls do nothing. |
+| **Cleanup never raises** | The cleanup path (including `_release()` and `c2pa_free`) is wrapped so that exceptions are caught and logged, never re-raised. The original exception from the `with` block (if any) is never masked. |
+| **State transitions are one-way** | Lifecycle moves only from UNINITIALIZED → ACTIVE → CLOSED. A closed resource cannot be reactivated. |
+| **Ownership transfer is safe** | When a pointer is transferred elsewhere (e.g. via `_mark_consumed()`), the object stops managing it and does not call `c2pa_free` on it. |
+| **Public methods validate lifecycle state** | Every public API calls `_ensure_valid_state()` before use; closed or invalid state yields `C2paError` instead of undefined behavior or crashes. |
+
 ## Preventing garbage collection of live references

 When a Python object passes a callback or pointer to the native library, that reference must stay alive for as long as the native side might use it. Python's garbage collector has no way to know that native code is still holding a reference to a Python callback.
@@ -232,7 +258,7 @@ Each subclass can override `_release()` to clean up its own resources before the
 Examples from the codebase:

 | Class | What `_release()` cleans up |
-|---|---|
+| --- | --- |
 | Reader | Closes owned file handles and stream wrappers |
 | Context | Drops the reference to the signer callback |
 | Signer | Drops the reference to the signing callback |