Merged
1 change: 1 addition & 0 deletions docs/_apps/smolvlm2-captioner/index.md
@@ -5,4 +5,5 @@ title: smolvlm2-captioner
date: 1970-01-01T00:00:00+00:00
---
Applies SmolVLM2-2.2B-Instruct multimodal model to video frames for image captioning.
- [v0.2](v0.2) ([`@kelleyl`](https://github.com/kelleyl))
- [v0.1](v0.1) ([`@kelleyl`](https://github.com/kelleyl))
134 changes: 134 additions & 0 deletions docs/_apps/smolvlm2-captioner/v0.2/index.md
@@ -0,0 +1,134 @@
---
layout: posts
classes: wide
title: "SmolVLM2 Captioner (v0.2)"
date: 2026-01-28T03:14:45+00:00
---
## About this version

- Submitter: [kelleyl](https://github.com/kelleyl)
- Submission Time: 2026-01-28T03:14:45+00:00
- Prebuilt Container Image: [ghcr.io/clamsproject/app-smolvlm2-captioner:v0.2](https://github.com/clamsproject/app-smolvlm2-captioner/pkgs/container/app-smolvlm2-captioner/v0.2)
- Release Notes

(no notes provided by the developer)

## About this app (See raw [metadata.json](metadata.json))

**Applies SmolVLM2-2.2B-Instruct multimodal model to video frames for image captioning.**

- App ID: [http://apps.clams.ai/smolvlm2-captioner/v0.2](http://apps.clams.ai/smolvlm2-captioner/v0.2)
- App License: Apache 2.0
- Source Repository: [https://github.com/clamsproject/app-smolvlm2-captioner](https://github.com/clamsproject/app-smolvlm2-captioner) ([source tree of the submitted version](https://github.com/clamsproject/app-smolvlm2-captioner/tree/v0.2))


#### Inputs
(**Note**: "*" as a property value means that the property is required but can be any value.)

- [http://mmif.clams.ai/vocabulary/VideoDocument/v1](http://mmif.clams.ai/vocabulary/VideoDocument/v1) (required)
(of any properties)

- [http://mmif.clams.ai/vocabulary/ImageDocument/v1](http://mmif.clams.ai/vocabulary/ImageDocument/v1) (required)
(of any properties)

- [http://mmif.clams.ai/vocabulary/TimeFrame/v6](http://mmif.clams.ai/vocabulary/TimeFrame/v6) (required)
(of any properties)



#### Configurable Parameters
(**Note**: _Multivalued_ means the parameter can have one or more values.)

- `frameInterval`: optional, defaults to `30`

- Type: integer
- Multivalued: False


> The interval at which to extract frames from the video if there are no timeframe annotations. Default is every 30 frames.
- `defaultPrompt`: optional, defaults to `Describe what is shown in this video frame. Analyze the purpose of this frame in the context of a news video. Transcribe any text present.`

- Type: string
- Multivalued: False


> Default prompt to use for timeframes not specified in the promptMap. If set to `-`, timeframes not specified in the promptMap will be skipped.
- `promptMap`: optional, defaults to `[]`

- Type: map
- Multivalued: True


> Mapping of labels of the input timeframe annotations to new prompts. Must be formatted as "IN_LABEL:PROMPT" (with a colon). To pass multiple mappings, use this parameter multiple times. By default, any timeframe labels not mapped to a prompt will be used with the defaultPrompt. To skip timeframes with a particular label, pass `-` as the prompt value. To skip all timeframes not specified in the promptMap, set the defaultPrompt parameter to `-`.
- `defaultSystemPrompt`: optional, defaults to `""`

- Type: string
- Multivalued: False


> Default system prompt to use for all timeframes. System prompts are passed to the model using the messages format with role="system", providing context or instructions that guide the model's behavior. The processor will format this properly using its chat template.
- `systemPromptMap`: optional, defaults to `[]`

- Type: map
- Multivalued: True


> Mapping of labels of the input timeframe annotations to system prompts. Must be formatted as "IN_LABEL:SYSTEM_PROMPT" (with a colon). To pass multiple mappings, use this parameter multiple times. System prompts are passed to the model using the messages format with role="system", providing context or instructions that guide the model's behavior.
- `config`: optional, defaults to `config/default.yaml`

- Type: string
- Multivalued: False


> Name of the config file to use.
- `num_beams`: optional, defaults to `1`

- Type: integer
- Multivalued: False


> Number of beams for beam search during text generation. Default is 1. Higher values may improve quality but increase generation time.
- `batchSize`: optional, defaults to `12`

- Type: integer
- Multivalued: False


> Number of images to process in each batch. Default is 12. Higher values may improve throughput but require more memory.
- `pretty`: optional, defaults to `false`

- Type: boolean
- Multivalued: False
- Choices: **_`false`_**, `true`


> The JSON body of the HTTP response will be re-formatted with 2-space indentation
- `runningTime`: optional, defaults to `false`

- Type: boolean
- Multivalued: False
- Choices: **_`false`_**, `true`


> The running time of the app will be recorded in the view metadata
- `hwFetch`: optional, defaults to `false`

- Type: boolean
- Multivalued: False
- Choices: **_`false`_**, `true`


> The hardware information (architecture, GPU and vRAM) will be recorded in the view metadata
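The `promptMap` / `defaultPrompt` semantics above can be sketched in a few lines. This is an illustrative reimplementation for clarity, not code from the app repository; the function name `resolve_prompts` and the label names are assumptions made up for the example:

```python
# Illustrative sketch (not the app's actual code): resolving "IN_LABEL:PROMPT"
# mappings, as accepted by `promptMap` / `systemPromptMap`, against a default
# prompt, with `-` marking labels (or the default) to skip.
def resolve_prompts(labels, prompt_map, default_prompt):
    """Return {label: prompt}, omitting labels whose effective prompt is '-'."""
    # Split each mapping on the FIRST colon only, so prompt text may itself
    # contain colons.
    mapping = dict(m.split(":", 1) for m in prompt_map)
    resolved = {}
    for label in labels:
        prompt = mapping.get(label, default_prompt)
        if prompt != "-":
            resolved[label] = prompt
    return resolved

print(resolve_prompts(
    ["slate", "chyron", "credits"],
    ["slate:Transcribe the slate text.", "credits:-"],
    "Describe what is shown in this video frame.",
))
# "slate" gets its mapped prompt, "chyron" falls back to the default,
# and "credits" is skipped because its prompt is "-".
```

Setting `default_prompt` to `"-"` reproduces the documented behavior of `defaultPrompt=-`: every label absent from the map is skipped.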


#### Outputs
(**Note**: "*" as a property value means that the property is required but can be any value.)

(**Note**: Not all output annotations are always generated.)

- [http://mmif.clams.ai/vocabulary/Alignment/v1](http://mmif.clams.ai/vocabulary/Alignment/v1)
(of any properties)

- [http://mmif.clams.ai/vocabulary/TextDocument/v1](http://mmif.clams.ai/vocabulary/TextDocument/v1)
(of any properties)

110 changes: 110 additions & 0 deletions docs/_apps/smolvlm2-captioner/v0.2/metadata.json
@@ -0,0 +1,110 @@
{
"name": "SmolVLM2 Captioner",
"description": "Applies SmolVLM2-2.2B-Instruct multimodal model to video frames for image captioning.",
"app_version": "v0.2",
"mmif_version": "1.1.0",
"app_license": "Apache 2.0",
"identifier": "http://apps.clams.ai/smolvlm2-captioner/v0.2",
"url": "https://github.com/clamsproject/app-smolvlm2-captioner",
"input": [
{
"@type": "http://mmif.clams.ai/vocabulary/VideoDocument/v1",
"required": true
},
{
"@type": "http://mmif.clams.ai/vocabulary/ImageDocument/v1",
"required": true
},
{
"@type": "http://mmif.clams.ai/vocabulary/TimeFrame/v6",
"required": true
}
],
"output": [
{
"@type": "http://mmif.clams.ai/vocabulary/Alignment/v1"
},
{
"@type": "http://mmif.clams.ai/vocabulary/TextDocument/v1"
}
],
"parameters": [
{
"name": "frameInterval",
"description": "The interval at which to extract frames from the video if there are no timeframe annotations. Default is every 30 frames.",
"type": "integer",
"default": 30,
"multivalued": false
},
{
"name": "defaultPrompt",
    "description": "Default prompt to use for timeframes not specified in the promptMap. If set to `-`, timeframes not specified in the promptMap will be skipped.",
"type": "string",
"default": "Describe what is shown in this video frame. Analyze the purpose of this frame in the context of a news video. Transcribe any text present.",
"multivalued": false
},
{
"name": "promptMap",
    "description": "Mapping of labels of the input timeframe annotations to new prompts. Must be formatted as \"IN_LABEL:PROMPT\" (with a colon). To pass multiple mappings, use this parameter multiple times. By default, any timeframe labels not mapped to a prompt will be used with the defaultPrompt. To skip timeframes with a particular label, pass `-` as the prompt value. To skip all timeframes not specified in the promptMap, set the defaultPrompt parameter to `-`.",
"type": "map",
"default": [],
"multivalued": true
},
{
"name": "defaultSystemPrompt",
    "description": "Default system prompt to use for all timeframes. System prompts are passed to the model using the messages format with role=\"system\", providing context or instructions that guide the model's behavior. The processor will format this properly using its chat template.",
"type": "string",
"default": "",
"multivalued": false
},
{
"name": "systemPromptMap",
    "description": "Mapping of labels of the input timeframe annotations to system prompts. Must be formatted as \"IN_LABEL:SYSTEM_PROMPT\" (with a colon). To pass multiple mappings, use this parameter multiple times. System prompts are passed to the model using the messages format with role=\"system\", providing context or instructions that guide the model's behavior.",
"type": "map",
"default": [],
"multivalued": true
},
{
"name": "config",
"description": "Name of the config file to use.",
"type": "string",
"default": "config/default.yaml",
"multivalued": false
},
{
"name": "num_beams",
"description": "Number of beams for beam search during text generation. Default is 1. Higher values may improve quality but increase generation time.",
"type": "integer",
"default": 1,
"multivalued": false
},
{
"name": "batchSize",
"description": "Number of images to process in each batch. Default is 12. Higher values may improve throughput but require more memory.",
"type": "integer",
"default": 12,
"multivalued": false
},
{
"name": "pretty",
"description": "The JSON body of the HTTP response will be re-formatted with 2-space indentation",
"type": "boolean",
"default": false,
"multivalued": false
},
{
"name": "runningTime",
"description": "The running time of the app will be recorded in the view metadata",
"type": "boolean",
"default": false,
"multivalued": false
},
{
"name": "hwFetch",
"description": "The hardware information (architecture, GPU and vRAM) will be recorded in the view metadata",
"type": "boolean",
"default": false,
"multivalued": false
}
]
}
5 changes: 5 additions & 0 deletions docs/_apps/smolvlm2-captioner/v0.2/submission.json
@@ -0,0 +1,5 @@
{
"time": "2026-01-28T03:14:45+00:00",
"submitter": "kelleyl",
"image": "ghcr.io/clamsproject/app-smolvlm2-captioner:v0.2"
}
24 changes: 14 additions & 10 deletions docs/_data/app-index.json
@@ -1,4 +1,18 @@
{
"http://apps.clams.ai/smolvlm2-captioner": {
"description": "Applies SmolVLM2-2.2B-Instruct multimodal model to video frames for image captioning.",
"latest_update": "2026-01-28T03:14:45+00:00",
"versions": [
[
"v0.2",
"kelleyl"
],
[
"v0.1",
"kelleyl"
]
]
},
"http://apps.clams.ai/swt-detection": {
"description": "Detects scenes with text, like slates, chyrons and credits. This app can run in three modes, depending on `useClassifier`, `useStitcher` parameters. When `useClassifier=True`, it runs in the \"TimePoint mode\" and generates TimePoint annotations. When `useStitcher=True`, it runs in the \"TimeFrame mode\" and generates TimeFrame annotations based on existing TimePoint annotations -- if no TimePoint is found, it produces an error. By default, it runs in the 'both' mode and first generates TimePoint annotations and then TimeFrame annotations on them.",
"latest_update": "2025-12-14T01:08:09+00:00",
@@ -109,16 +123,6 @@
]
]
},
"http://apps.clams.ai/smolvlm2-captioner": {
"description": "Applies SmolVLM2-2.2B-Instruct multimodal model to video frames for image captioning.",
"latest_update": "2025-11-20T15:41:04+00:00",
"versions": [
[
"v0.1",
"kelleyl"
]
]
},
"http://apps.clams.ai/tonedetection": {
"description": "Detects spans of monotonic audio within an audio file",
"latest_update": "2025-11-20T08:01:02+00:00",
2 changes: 1 addition & 1 deletion docs/_data/apps.json

Large diffs are not rendered by default.
