youtube-vis: video dataset #486

JonasWurst · 2026-02-02T06:51:20Z

What has changed and why?

Adds a loading method for the youtube-vis format to the video dataset.

How has it been tested?

Tests are implemented.

Did you update CHANGELOG.md?

Yes
Not needed (internal change)

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: af08e50151

lightly_studio/src/lightly_studio/core/video_dataset.py

lightly_studio/tests/core/test_dataset__video.py

lightly_studio/src/lightly_studio/core/video_dataset.py

JonasWurst · 2026-02-02T07:19:05Z

/review

michal-lightly

Thanks! I think we can simplify this PR.

I have not looked at the tests yet.

lightly_studio/src/lightly_studio/core/video_dataset.py

lightly_studio/src/lightly_studio/examples/example_video_annotations.py

michal-lightly · 2026-02-02T09:21:26Z

CHANGELOG.md

 - Enabled editing of segmentation masks and deletion of annotations in the details view.
 - Allowed users to customize toolbar shortcuts.
 - Added `Sample.add_annotation()` method, adds annotations to samples.
+- Loading videos with annotations from youtube-vis format via `dataset.add_videos_from_youtube_vis`.


We will also need to add info to the docs. I assume that will be in a separate PR?

We will need to highlight that this is not the standard youtube-vis format.

Follow up PR

lightly_studio/src/lightly_studio/core/video_dataset.py

michal-lightly · 2026-02-03T16:27:20Z

lightly_studio/src/lightly_studio/core/video_dataset.py

+            allowed_extensions: An iterable container of allowed video file
+                extensions in lowercase, including the leading dot. If None,
+                uses default VIDEO_EXTENSIONS.


Please add information about how this is used; this is a user-facing interface.

Naively, annotations.json should have all the info about the loaded videos, but there is a reason we need to supply path and allowed_extensions.

What is not clear to me is what happens if the info from annotations.json and the list of video files do not match.
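
To make the mismatch question concrete, here is a minimal sketch of how the two sources could be compared before loading. It is not code from this PR: `find_mismatches` is a hypothetical helper, and matching by file stem is an assumption about how annotation entries relate to files on disk.

```python
from pathlib import PurePosixPath


def find_mismatches(annotated_names, video_files):
    """Hypothetical helper: compare annotated video names with discovered files.

    Assumption: an annotation entry and a file refer to the same video when
    their file stems are equal.
    """
    annotated = {PurePosixPath(name).stem for name in annotated_names}
    on_disk = {PurePosixPath(path).stem for path in video_files}
    # Annotated videos that have no matching file.
    missing_files = sorted(annotated - on_disk)
    # Files that no annotation entry refers to.
    unannotated_files = sorted(on_disk - annotated)
    return missing_files, unannotated_files


missing, unannotated = find_mismatches(
    ["a.mp4", "b.mp4"],
    ["/data/a.mp4", "/data/c.mp4"],
)
```

Whichever behavior is chosen for each case (raise, warn, or skip), the docstring could state it explicitly.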

michal-lightly · 2026-02-03T16:28:08Z

lightly_studio/src/lightly_studio/core/video_dataset.py

+    def add_videos_from_youtube_vis(
+        self,
+        annotations_json: PathLike,
+        path: PathLike,


Optional: We could call it videos_path.

michal-lightly · 2026-02-03T16:29:14Z

lightly_studio/src/lightly_studio/core/video_dataset.py

    )
+
+
+def _collect_video_file_paths(


Optional: Consider moving this to add_videos.py too.

michal-lightly · 2026-02-03T16:31:08Z

lightly_studio/tests/core/test_add_videos.py


 def test_load_video_annotations_from_labelformat(
-    db_session: Session,
+    patch_collection: None,  # noqa: ARG001


Why has this file changed to depend on the VideoDataset class? It would be better to keep it independent if possible, then we also don't need the patch.

michal-lightly · 2026-02-03T16:35:40Z

lightly_studio/src/lightly_studio/core/add_videos.py

        session: The database session.
        dataset_id: The ID of the video dataset to load annotations into.
+        video_paths: An iterable of file paths to the videos to load.
        input_labels: The labelformat input containing video annotations.


Also here, please add info on how the parameters interact (see the comment in VideoDataset).

michal-lightly · 2026-02-03T16:43:33Z

lightly_studio/src/lightly_studio/core/add_videos.py

+        file_path = Path(video_file)
+        video_name_to_path[file_path.name] = str(file_path.absolute())


Will this also work with fsspec paths?
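
A small sketch of why the question matters, assuming remote videos are passed as URL-style strings such as `s3://bucket/videos/a.mp4` (the bucket name is made up). `pathlib` collapses the double slash after the scheme, and `Path.absolute()` would additionally prepend the working directory because such a string is not an absolute local path. `video_name` is a hypothetical helper, not part of this PR.

```python
import posixpath
from pathlib import PurePosixPath

remote = "s3://bucket/videos/a.mp4"

# pathlib normalizes "//" to "/", so the URL no longer round-trips.
mangled = str(PurePosixPath(remote))


def video_name(path: str) -> str:
    """Hypothetical helper: file name of a local or URL-style path, without pathlib."""
    return posixpath.basename(path)


name_remote = video_name(remote)
name_local = video_name("/data/videos/a.mp4")
```

Keeping the original string as the dictionary value and deriving only the lookup key from it would avoid rewriting remote paths.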

michal-lightly · 2026-02-03T16:56:43Z

lightly_studio/src/lightly_studio/core/add_videos.py

+    for video_file in video_paths:
+        file_path = Path(video_file)
+        video_name_to_path[file_path.name] = str(file_path.absolute())
+        if file_path.stem in video_stem_to_path:


There is a problem with .stem: it strips any directory prefix from the video path. I'd suggest .with_suffix("").

This assumes labelformat might return the video paths with a folder prefix, which I think is the case. Can we add a test with videos in some nested folder structure (e.g. a/vid.mp4, b/vid.mp4)? Can be a follow-up.
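
The difference can be sketched with the nested layout mentioned above (a/vid.mp4, b/vid.mp4). This is an illustration using only `pathlib`, not code from the PR; the dictionary names are made up.

```python
from pathlib import PurePosixPath

paths = [PurePosixPath("a/vid.mp4"), PurePosixPath("b/vid.mp4")]

# Keyed by .stem: both videos map to "vid", so the second overwrites the first.
by_stem = {p.stem: str(p) for p in paths}

# Keyed by .with_suffix(""): the folder prefix is kept, so the keys stay distinct.
by_relative = {str(p.with_suffix("")): str(p) for p in paths}
```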

michal-lightly · 2026-02-03T16:59:59Z

lightly_studio/src/lightly_studio/core/add_videos.py

+        if resolved_path is None:
+            raise FileNotFoundError(f"No video file found for '{filename}'.")
+        video_paths.append(resolved_path)
+    return list(dict.fromkeys(video_paths))


Suggested change

-    return list(dict.fromkeys(video_paths))
+    return video_paths
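
For context, the line being replaced is an order-preserving de-duplication. The thread does not state why it should be dropped, so the sketch below only shows what the two variants return for a list with a repeated entry (the paths are made up).

```python
video_paths = ["/data/a.mp4", "/data/b.mp4", "/data/a.mp4"]

# dict keys are unique and keep insertion order, so this drops later repeats.
deduplicated = list(dict.fromkeys(video_paths))

# Returning the list unchanged keeps one entry per annotated video, repeats included.
unchanged = video_paths
```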

michal-lightly · 2026-02-03T17:15:28Z

lightly_studio/tests/core/test_dataset__video.py

+                    "length": 3,
+                },
+            ],
+            "annotations": [


Optional: We can keep this empty since it is irrelevant for the test.

michal-lightly · 2026-02-03T17:18:17Z

lightly_studio/tests/core/test_dataset__video.py

+            embed=True,
+        )
+
+        # Verify embeddings were created


There is an easier way, from a test with images:

assert len(image1.sample.embeddings) == 1

JonasWurst added 4 commits February 2, 2026 07:09

add youtube method

1633828

video dataset complete

a6b59cd

format

4bf18fe

adding tests

af08e50

chatgpt-codex-connector bot reviewed Feb 2, 2026

JonasWurst added 2 commits February 2, 2026 07:55

changelog

36ea9e6

Format

10dac23

JonasWurst commented Feb 2, 2026

reorder tests

6572fcb

fix masks

719619b

michal-lightly requested changes Feb 2, 2026

michal-lightly reviewed Feb 3, 2026

JonasWurst added 3 commits February 3, 2026 15:01

Push functionality to the add_videos method

cb8be29

refactor, update tests, remove path

417d0f9

simplify example

f290ce5

michal-lightly requested changes Feb 3, 2026

2 participants