Merged
1 change: 1 addition & 0 deletions docs/_apps/smolvlm2-captioner/index.md
@@ -5,4 +5,5 @@ title: smolvlm2-captioner
date: 1970-01-01T00:00:00+00:00
---
Applies SmolVLM2-2.2B-Instruct multimodal model to video frames for image captioning.
- [v0.2](v0.2) ([`@kelleyl`](https://github.com/kelleyl))
- [v0.1](v0.1) ([`@kelleyl`](https://github.com/kelleyl))
134 changes: 134 additions & 0 deletions docs/_apps/smolvlm2-captioner/v0.2/index.md
@@ -0,0 +1,134 @@
---
layout: posts
classes: wide
title: "SmolVLM2 Captioner (v0.2)"
date: 2026-01-28T03:14:45+00:00
---
## About this version

- Submitter: [kelleyl](https://github.com/kelleyl)
- Submission Time: 2026-01-28T03:14:45+00:00
- Prebuilt Container Image: [ghcr.io/clamsproject/app-smolvlm2-captioner:v0.2](https://github.com/clamsproject/app-smolvlm2-captioner/pkgs/container/app-smolvlm2-captioner/v0.2)
- Release Notes

(no notes provided by the developer)

## About this app (See raw [metadata.json](metadata.json))

**Applies SmolVLM2-2.2B-Instruct multimodal model to video frames for image captioning.**

- App ID: [http://apps.clams.ai/smolvlm2-captioner/v0.2](http://apps.clams.ai/smolvlm2-captioner/v0.2)
- App License: Apache 2.0
- Source Repository: [https://github.com/clamsproject/app-smolvlm2-captioner](https://github.com/clamsproject/app-smolvlm2-captioner) ([source tree of the submitted version](https://github.com/clamsproject/app-smolvlm2-captioner/tree/v0.2))


#### Inputs
(**Note**: "*" as a property value means that the property is required but can be any value.)

- [http://mmif.clams.ai/vocabulary/VideoDocument/v1](http://mmif.clams.ai/vocabulary/VideoDocument/v1) (required)
(of any properties)

- [http://mmif.clams.ai/vocabulary/ImageDocument/v1](http://mmif.clams.ai/vocabulary/ImageDocument/v1) (required)
(of any properties)

- [http://mmif.clams.ai/vocabulary/TimeFrame/v6](http://mmif.clams.ai/vocabulary/TimeFrame/v6) (required)
(of any properties)



#### Configurable Parameters
(**Note**: _Multivalued_ means the parameter can have one or more values.)

- `frameInterval`: optional, defaults to `30`

- Type: integer
- Multivalued: False


> The interval at which to extract frames from the video if there are no timeframe annotations. Default is every 30 frames.
- `defaultPrompt`: optional, defaults to `Describe what is shown in this video frame. Analyze the purpose of this frame in the context of a news video. Transcribe any text present.`

- Type: string
- Multivalued: False


> Default prompt to use for timeframes not specified in the promptMap. If set to `-`, timeframes not specified in the promptMap will be skipped.
- `promptMap`: optional, defaults to `[]`

- Type: map
- Multivalued: True


> Mapping of labels of the input timeframe annotations to new prompts. Must be formatted as "IN_LABEL:PROMPT" (with a colon). To pass multiple mappings, use this parameter multiple times. By default, any timeframe labels not mapped to a prompt will be used with the defaultPrompt. To skip timeframes with a particular label, pass `-` as the prompt value. To skip all timeframes not specified in the promptMap, set the defaultPrompt parameter to `-`.
- `defaultSystemPrompt`: optional, defaults to `""`

- Type: string
- Multivalued: False


> Default system prompt to use for all timeframes. System prompts are passed to the model using the messages format with role="system", providing context or instructions that guide the model's behavior. The processor will format this properly using its chat template.
- `systemPromptMap`: optional, defaults to `[]`

- Type: map
- Multivalued: True


> Mapping of labels of the input timeframe annotations to system prompts. Must be formatted as "IN_LABEL:SYSTEM_PROMPT" (with a colon). To pass multiple mappings, use this parameter multiple times. System prompts are passed to the model using the messages format with role="system", providing context or instructions that guide the model's behavior.
- `config`: optional, defaults to `config/default.yaml`

- Type: string
- Multivalued: False


> Name of the config file to use.
- `num_beams`: optional, defaults to `1`

- Type: integer
- Multivalued: False


> Number of beams for beam search during text generation. Default is 1. Higher values may improve quality but increase generation time.
- `batchSize`: optional, defaults to `12`

- Type: integer
- Multivalued: False


> Number of images to process in each batch. Default is 12. Higher values may improve throughput but require more memory.
- `pretty`: optional, defaults to `false`

- Type: boolean
- Multivalued: False
- Choices: **_`false`_**, `true`


> The JSON body of the HTTP response will be re-formatted with 2-space indentation
- `runningTime`: optional, defaults to `false`

- Type: boolean
- Multivalued: False
- Choices: **_`false`_**, `true`


> The running time of the app will be recorded in the view metadata
- `hwFetch`: optional, defaults to `false`

- Type: boolean
- Multivalued: False
- Choices: **_`false`_**, `true`


> The hardware information (architecture, GPU and vRAM) will be recorded in the view metadata
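The `promptMap` / `defaultPrompt` semantics above can be sketched in a few lines. This is an illustrative reimplementation for clarity, not code from the app repository; the function name `resolve_prompts` and the label names are assumptions made up for the example:

```python
# Illustrative sketch (not the app's actual code): resolving "IN_LABEL:PROMPT"
# mappings, as accepted by `promptMap` / `systemPromptMap`, against a default
# prompt, with `-` marking labels (or the default) to skip.
def resolve_prompts(labels, prompt_map, default_prompt):
    """Return {label: prompt}, omitting labels whose effective prompt is '-'."""
    # Split each mapping on the FIRST colon only, so prompt text may itself
    # contain colons.
    mapping = dict(m.split(":", 1) for m in prompt_map)
    resolved = {}
    for label in labels:
        prompt = mapping.get(label, default_prompt)
        if prompt != "-":
            resolved[label] = prompt
    return resolved

print(resolve_prompts(
    ["slate", "chyron", "credits"],
    ["slate:Transcribe the slate text.", "credits:-"],
    "Describe what is shown in this video frame.",
))
# "slate" gets its mapped prompt, "chyron" falls back to the default,
# and "credits" is skipped because its prompt is "-".
```

Setting `default_prompt` to `"-"` reproduces the documented behavior of `defaultPrompt=-`: every label absent from the map is skipped.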


#### Outputs
(**Note**: "*" as a property value means that the property is required but can be any value.)

(**Note**: Not all output annotations are always generated.)

- [http://mmif.clams.ai/vocabulary/Alignment/v1](http://mmif.clams.ai/vocabulary/Alignment/v1)
(of any properties)

- [http://mmif.clams.ai/vocabulary/TextDocument/v1](http://mmif.clams.ai/vocabulary/TextDocument/v1)
(of any properties)

110 changes: 110 additions & 0 deletions docs/_apps/smolvlm2-captioner/v0.2/metadata.json
@@ -0,0 +1,110 @@
{
"name": "SmolVLM2 Captioner",
"description": "Applies SmolVLM2-2.2B-Instruct multimodal model to video frames for image captioning.",
"app_version": "v0.2",
"mmif_version": "1.1.0",
"app_license": "Apache 2.0",
"identifier": "http://apps.clams.ai/smolvlm2-captioner/v0.2",
"url": "https://github.com/clamsproject/app-smolvlm2-captioner",
"input": [
{
"@type": "http://mmif.clams.ai/vocabulary/VideoDocument/v1",
"required": true
},
{
"@type": "http://mmif.clams.ai/vocabulary/ImageDocument/v1",
"required": true
},
{
"@type": "http://mmif.clams.ai/vocabulary/TimeFrame/v6",
"required": true
}
],
"output": [
{
"@type": "http://mmif.clams.ai/vocabulary/Alignment/v1"
},
{
"@type": "http://mmif.clams.ai/vocabulary/TextDocument/v1"
}
],
"parameters": [
{
"name": "frameInterval",
"description": "The interval at which to extract frames from the video if there are no timeframe annotations. Default is every 30 frames.",
"type": "integer",
"default": 30,
"multivalued": false
},
{
"name": "defaultPrompt",
    "description": "Default prompt to use for timeframes not specified in the promptMap. If set to `-`, timeframes not specified in the promptMap will be skipped.",
"type": "string",
"default": "Describe what is shown in this video frame. Analyze the purpose of this frame in the context of a news video. Transcribe any text present.",
"multivalued": false
},
{
"name": "promptMap",
    "description": "Mapping of labels of the input timeframe annotations to new prompts. Must be formatted as \"IN_LABEL:PROMPT\" (with a colon). To pass multiple mappings, use this parameter multiple times. By default, any timeframe labels not mapped to a prompt will be used with the defaultPrompt. To skip timeframes with a particular label, pass `-` as the prompt value. To skip all timeframes not specified in the promptMap, set the defaultPrompt parameter to `-`.",
"type": "map",
"default": [],
"multivalued": true
},
{
"name": "defaultSystemPrompt",
    "description": "Default system prompt to use for all timeframes. System prompts are passed to the model using the messages format with role=\"system\", providing context or instructions that guide the model's behavior. The processor will format this properly using its chat template.",
"type": "string",
"default": "",
"multivalued": false
},
{
"name": "systemPromptMap",
    "description": "Mapping of labels of the input timeframe annotations to system prompts. Must be formatted as \"IN_LABEL:SYSTEM_PROMPT\" (with a colon). To pass multiple mappings, use this parameter multiple times. System prompts are passed to the model using the messages format with role=\"system\", providing context or instructions that guide the model's behavior.",
"type": "map",
"default": [],
"multivalued": true
},
{
"name": "config",
"description": "Name of the config file to use.",
"type": "string",
"default": "config/default.yaml",
"multivalued": false
},
{
"name": "num_beams",
"description": "Number of beams for beam search during text generation. Default is 1. Higher values may improve quality but increase generation time.",
"type": "integer",
"default": 1,
"multivalued": false
},
{
"name": "batchSize",
"description": "Number of images to process in each batch. Default is 12. Higher values may improve throughput but require more memory.",
"type": "integer",
"default": 12,
"multivalued": false
},
{
"name": "pretty",
"description": "The JSON body of the HTTP response will be re-formatted with 2-space indentation",
"type": "boolean",
"default": false,
"multivalued": false
},
{
"name": "runningTime",
"description": "The running time of the app will be recorded in the view metadata",
"type": "boolean",
"default": false,
"multivalued": false
},
{
"name": "hwFetch",
"description": "The hardware information (architecture, GPU and vRAM) will be recorded in the view metadata",
"type": "boolean",
"default": false,
"multivalued": false
}
]
}
5 changes: 5 additions & 0 deletions docs/_apps/smolvlm2-captioner/v0.2/submission.json
@@ -0,0 +1,5 @@
{
"time": "2026-01-28T03:14:45+00:00",
"submitter": "kelleyl",
"image": "ghcr.io/clamsproject/app-smolvlm2-captioner:v0.2"
}
24 changes: 14 additions & 10 deletions docs/_data/app-index.json
@@ -1,4 +1,18 @@
{
"http://apps.clams.ai/smolvlm2-captioner": {
"description": "Applies SmolVLM2-2.2B-Instruct multimodal model to video frames for image captioning.",
"latest_update": "2026-01-28T03:14:45+00:00",
"versions": [
[
"v0.2",
"kelleyl"
],
[
"v0.1",
"kelleyl"
]
]
},
"http://apps.clams.ai/swt-detection": {
"description": "Detects scenes with text, like slates, chyrons and credits. This app can run in three modes, depending on `useClassifier`, `useStitcher` parameters. When `useClassifier=True`, it runs in the \"TimePoint mode\" and generates TimePoint annotations. When `useStitcher=True`, it runs in the \"TimeFrame mode\" and generates TimeFrame annotations based on existing TimePoint annotations -- if no TimePoint is found, it produces an error. By default, it runs in the 'both' mode and first generates TimePoint annotations and then TimeFrame annotations on them.",
"latest_update": "2025-12-14T01:08:09+00:00",
@@ -109,16 +123,6 @@
]
]
},
"http://apps.clams.ai/smolvlm2-captioner": {
"description": "Applies SmolVLM2-2.2B-Instruct multimodal model to video frames for image captioning.",
"latest_update": "2025-11-20T15:41:04+00:00",
"versions": [
[
"v0.1",
"kelleyl"
]
]
},
"http://apps.clams.ai/tonedetection": {
"description": "Detects spans of monotonic audio within an audio file",
"latest_update": "2025-11-20T08:01:02+00:00",
2 changes: 1 addition & 1 deletion docs/_data/apps.json

Large diffs are not rendered by default.
