From eda7b2c7a4f84ba2facffc3eb15bc6235b463b91 Mon Sep 17 00:00:00 2001
From: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Date: Fri, 2 Jan 2026 07:00:49 +0000
Subject: [PATCH 1/3] docs(leap-bundle): update documentation for GGUF bundling support

- Update quick-start.mdx to document GGUF as the default inference engine
- Update cli-spec.mdx with the --executorch flag and GGUF quantization options
- Add v0.9.0 changelog entry for GGUF bundling features

Co-Authored-By: Liren
---
 leap/leap-bundle/changelog.md    | 13 +++++++++++
 leap/leap-bundle/cli-spec.mdx    | 40 ++++++++++++++++++++++----------
 leap/leap-bundle/quick-start.mdx | 28 ++++++++++------------
 3 files changed, 53 insertions(+), 28 deletions(-)

diff --git a/leap/leap-bundle/changelog.md b/leap/leap-bundle/changelog.md
index 1775a1a..3b32602 100644
--- a/leap/leap-bundle/changelog.md
+++ b/leap/leap-bundle/changelog.md
@@ -4,6 +4,19 @@ sidebar_position: 4

# Changelog

+## `v0.9.0` - 2026-01-02
+
+**New features**
+
+- GGUF is now the default inference engine for model bundling, generating `.gguf` files for llama.cpp inference.
+- Add `--executorch` flag to use ExecuTorch bundling instead of GGUF.
+- Add `--mmproj-quantization` option for GGUF bundling of vision-language and audio models.
+- Support downloading multiple `.gguf` files for GGUF bundle requests.
+
+**Improvements**
+
+- Update `--quantization` option to support GGUF quantization types (e.g., `Q4_K_M`, `Q8_0`, `F16`).
+
## `v0.8.0` - 2025-12-16

**Improvements**

diff --git a/leap/leap-bundle/cli-spec.mdx b/leap/leap-bundle/cli-spec.mdx
index 3f0c64e..571d984 100644
--- a/leap/leap-bundle/cli-spec.mdx
+++ b/leap/leap-bundle/cli-spec.mdx
@@ -8,7 +8,9 @@ sidebar_position: 2

The Model Bundling Service provides a command-line interface (CLI) with two main features:

-1. **LEAP Bundle Requests**: Upload model directories, create bundle requests, monitor processing status, and download completed bundles for the LEAP (Liquid Edge AI Platform)
+1. **LEAP Bundle Requests**: Upload model directories, create bundle requests, monitor processing status, and download completed bundles for LEAP (Liquid Edge AI Platform). Supports two inference engines:
+   - **GGUF (default)**: Generates `.gguf` files for llama.cpp inference
+   - **ExecuTorch**: Generates `.bundle` files for ExecuTorch inference
2. **Manifest Downloads**: Download pre-packaged GGUF models from JSON manifest URLs without authentication

## Requirements
@@ -209,7 +211,11 @@ leap-bundle create
- `--sequential`: Upload files sequentially. This is the fallback option if parallel upload fails.
  - If neither `--parallel` nor `--sequential` is specified, the CLI will attempt parallel upload first, and fall back to sequential if it fails.
  - If both `--parallel` and `--sequential` are specified, `--parallel` takes precedence.
-- `--quantization <type>`: Specify the quantization type for the model bundle. Valid options: `8da4w_output_8da8w` (default), `8da8w_output_8da8w`.
+- `--executorch`: Use ExecuTorch bundling instead of GGUF. By default, the CLI uses GGUF bundling.
+- `--quantization <type>`: Specify the quantization type for the model bundle.
+  - For GGUF (default): `Q4_K_M` (default), `Q8_0`, `F16`, and [other llama.cpp quantization types](https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-quants.c).
+  - For ExecuTorch: `8da4w_output_8da8w` (default), `8da8w_output_8da8w`.
+- `--mmproj-quantization <type>`: (GGUF only) Specify the mmproj quantization type for vision-language or audio models. Valid options: `q4`, `q8` (default), `f16`.

**Behavior**

@@ -245,8 +251,17 @@ leap-bundle create ./my-model-directory --json

# Example JSON output when request already exists
{"error": "A bundle request with the same input hash already exists: req_xyz789abc123", "status": "exists"}

-# Create bundle with specific quantization
-leap-bundle create ./my-model-directory --quantization 8da8w_output_8da8w
+# Create GGUF bundle with specific quantization
+leap-bundle create ./my-model-directory --quantization Q8_0
+
+# Create ExecuTorch bundle
+leap-bundle create ./my-model-directory --executorch
+
+# Create ExecuTorch bundle with specific quantization
+leap-bundle create ./my-model-directory --executorch --quantization 8da8w_output_8da8w
+
+# Create GGUF bundle for VL model with mmproj quantization
+leap-bundle create ./my-vl-model-directory --mmproj-quantization f16
```

**Validation**
@@ -588,7 +603,7 @@ This command supports two modes of operation:

#### Mode 1: Bundle Request Download

-Download the bundle file for a completed request.
+Download the model files for a completed request.

```sh
leap-bundle download <request-id> [--output-path <path>]
@@ -600,26 +615,27 @@

**Options**

-- `--output-path <path>`: Directory to save the downloaded file (default: current directory)
+- `--output-path <path>`: Directory to save the downloaded files (default: current directory)

**Behavior**

-- Requests a signed download URL from the LEAP platform
-- Downloads the bundle file using the signed URL
-- Saves the file with a default name or to the specified output path
+- Requests signed download URLs from the LEAP platform
+- Downloads the model files using the signed URLs
+- Saves files with default names or to the specified output path
+- GGUF requests may produce multiple `.gguf` files; ExecuTorch requests produce a single `.bundle` file
- **Requires authentication** via `leap-bundle login`

**Examples**

```sh
-# Download bundle request to current directory
+# Download GGUF bundle request to current directory
leap-bundle download 18734

# Example output
ℹ Requesting download for bundle request 18734...
✓ Download URL obtained for request 18734
Downloading bundle output... ✓
-✓ Download completed successfully! File saved to: input-8da4w_output_8da8w-seq_8196.bundle
+✓ Download completed successfully! File saved to: model-Q4_K_M.gguf

# Download to specific directory
leap-bundle download 18734 --output-path ./downloads/

ℹ Requesting download for bundle request 18734...
✓ Download URL obtained for request 18734
Downloading bundle output... ✓
-✓ Download completed successfully! File saved to: downloads/input-8da4w_output_8da8w-seq_8196.bundle
+✓ Download completed successfully! File saved to: downloads/model-Q4_K_M.gguf
```

**Error Cases**

diff --git a/leap/leap-bundle/quick-start.mdx b/leap/leap-bundle/quick-start.mdx
index bb97d6f..3dc11d6 100644
--- a/leap/leap-bundle/quick-start.mdx
+++ b/leap/leap-bundle/quick-start.mdx
@@ -6,11 +6,10 @@ sidebar_position: 1

The Bundling Service helps users create and manage model bundles for Liquid Edge AI Platform (LEAP). Currently users interact with it through `leap-bundle`, a command-line interface (CLI).

-Here is a typical user workflow:
+The CLI supports two inference engines for model bundling:

-- Download an open source base model.
-- Customize the base model with your own dataset e.g. by finetuning.
-- Create a model bundle using the `leap-bundle` CLI for LEAP SDK.
+- **GGUF (default)**: Generates `.gguf` files for llama.cpp inference
+- **ExecuTorch**: Generates `.bundle` files for ExecuTorch inference (use the `--executorch` flag)

The CLI also supports downloading GGUF models directly from JSON manifest files.

@@ -52,13 +51,7 @@ Manifest downloads don't require authentication with `leap-bundle login`. They w
the model architecture comes from a base model that is part of the LEAP model library.
:::

-If you have a custom-trained or fine-tuned model, you can create a model bundle for use with LEAP SDK.
-
-Here is a typical user workflow:
-
-- Download an open source base model.
-- Customize the base model with your own dataset e.g. by finetuning.
-- Create a model bundle using the `leap-bundle` CLI for LEAP SDK.
+If you have a custom-trained or fine-tuned model, you can create a model bundle for use with LEAP SDK. By default, the CLI generates GGUF files for llama.cpp inference. Use the `--executorch` flag to generate ExecuTorch bundles instead.

### Authenticate
@@ -151,10 +144,10 @@ Example output:
ℹ Requesting download for bundle request 1...
✓ Download URL obtained for request 1
Downloading bundle output... ✓
-✓ Download completed successfully! File saved to: input-8da4w_output_8da8w-seq_8196.bundle
+✓ Download completed successfully! File saved to: model-Q4_K_M.gguf
```

-The model bundle file will be saved in the current directory with a `.bundle` extension.
+The model files will be saved in the current directory. GGUF bundling produces `.gguf` files, while ExecuTorch bundling produces `.bundle` files.

### Complete Example

@@ -166,17 +159,20 @@
# 1. Install and authenticate
pip install leap-bundle
leap-bundle login
leap-bundle whoami

-# 2. Create a bundle request
+# 2. Create a bundle request (GGUF by default)
leap-bundle create <model-path>

+# Or create an ExecuTorch bundle
+leap-bundle create <model-path> --executorch
+
# 3. Monitor the request (repeat until completed)
leap-bundle list

# 4. Download when ready
leap-bundle download <request-id>

-# 5. Your bundle file is now ready to use!
-ls -la
+# 5. Your model files are now ready to use!
+ls -la
```

### Managing Requests

From 5bb4edb0ac1a05728370a4aceeb93b27d7719783 Mon Sep 17 00:00:00 2001
From: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Date: Fri, 2 Jan 2026 07:08:07 +0000
Subject: [PATCH 2/3] docs(leap-bundle): add ExecuTorch deprecation note and fix quantization link

Co-Authored-By: Liren
---
 leap/leap-bundle/changelog.md    | 2 +-
 leap/leap-bundle/cli-spec.mdx    | 8 ++++----
 leap/leap-bundle/quick-start.mdx | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/leap/leap-bundle/changelog.md b/leap/leap-bundle/changelog.md
index 3b32602..c8e4b8b 100644
--- a/leap/leap-bundle/changelog.md
+++ b/leap/leap-bundle/changelog.md
@@ -9,7 +9,7 @@ sidebar_position: 4
**New features**

- GGUF is now the default inference engine for model bundling, generating `.gguf` files for llama.cpp inference.
-- Add `--executorch` flag to use ExecuTorch bundling instead of GGUF.
+- Add `--executorch` flag to use ExecuTorch bundling instead of GGUF. ExecuTorch inference is deprecated and may be removed in a future version.
- Add `--mmproj-quantization` option for GGUF bundling of vision-language and audio models.
- Support downloading multiple `.gguf` files for GGUF bundle requests.
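Taken together, the entries above change the commands users will actually type day to day. The sketch below exercises only flags that are documented in cli-spec.mdx; the directory paths are hypothetical, and combining `--quantization` with `--mmproj-quantization` in a single invocation is an assumption rather than documented behavior.

```sh
# GGUF is the default engine; pick a llama.cpp quantization type explicitly
leap-bundle create ./my-model-directory --quantization Q4_K_M

# Vision-language model: also choose a projector (mmproj) quantization
# (combining both flags in one call is assumed, not confirmed by the spec)
leap-bundle create ./my-vl-model-directory --quantization Q4_K_M --mmproj-quantization q8

# Deprecated ExecuTorch path, retained for existing workflows
leap-bundle create ./my-model-directory --executorch --quantization 8da4w_output_8da8w
```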
diff --git a/leap/leap-bundle/cli-spec.mdx b/leap/leap-bundle/cli-spec.mdx
index 571d984..ebdd49b 100644
--- a/leap/leap-bundle/cli-spec.mdx
+++ b/leap/leap-bundle/cli-spec.mdx
@@ -10,7 +10,7 @@ The Model Bundling Service provides a command-line interface (CLI) with two main

1. **LEAP Bundle Requests**: Upload model directories, create bundle requests, monitor processing status, and download completed bundles for LEAP (Liquid Edge AI Platform). Supports two inference engines:
   - **GGUF (default)**: Generates `.gguf` files for llama.cpp inference
-   - **ExecuTorch**: Generates `.bundle` files for ExecuTorch inference
+   - **ExecuTorch** (deprecated): Generates `.bundle` files for ExecuTorch inference. This option may be removed in a future version.
2. **Manifest Downloads**: Download pre-packaged GGUF models from JSON manifest URLs without authentication

## Requirements
@@ -211,10 +211,10 @@ leap-bundle create
- `--sequential`: Upload files sequentially. This is the fallback option if parallel upload fails.
  - If neither `--parallel` nor `--sequential` is specified, the CLI will attempt parallel upload first, and fall back to sequential if it fails.
  - If both `--parallel` and `--sequential` are specified, `--parallel` takes precedence.
-- `--executorch`: Use ExecuTorch bundling instead of GGUF. By default, the CLI uses GGUF bundling.
+- `--executorch` (deprecated): Use ExecuTorch bundling instead of GGUF. By default, the CLI uses GGUF bundling. This option may be removed in a future version.
- `--quantization <type>`: Specify the quantization type for the model bundle.
-  - For GGUF (default): `Q4_K_M` (default), `Q8_0`, `F16`, and [other llama.cpp quantization types](https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-quants.c).
-  - For ExecuTorch: `8da4w_output_8da8w` (default), `8da8w_output_8da8w`.
+  - For GGUF (default): `Q4_K_M` (default), `Q8_0`, `F16`, and [other llama.cpp quantization types](https://github.com/ggml-org/llama.cpp/blob/0a0bba05e8390ab7e4a54bb8c0ed0a25da64cf62/tools/quantize/quantize.cpp#L22-L58).
+  - For ExecuTorch (deprecated): `8da4w_output_8da8w` (default), `8da8w_output_8da8w`.
- `--mmproj-quantization <type>`: (GGUF only) Specify the mmproj quantization type for vision-language or audio models. Valid options: `q4`, `q8` (default), `f16`.

**Behavior**
diff --git a/leap/leap-bundle/quick-start.mdx b/leap/leap-bundle/quick-start.mdx
index 3dc11d6..4a8300e 100644
--- a/leap/leap-bundle/quick-start.mdx
+++ b/leap/leap-bundle/quick-start.mdx
@@ -9,7 +9,7 @@ The Bundling Service helps users create and manage model bundles for Liquid Edge

The CLI supports two inference engines for model bundling:

- **GGUF (default)**: Generates `.gguf` files for llama.cpp inference
-- **ExecuTorch**: Generates `.bundle` files for ExecuTorch inference (use the `--executorch` flag)
+- **ExecuTorch** (deprecated): Generates `.bundle` files for ExecuTorch inference (use the `--executorch` flag). This option may be removed in a future version.

The CLI also supports downloading GGUF models directly from JSON manifest files.
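With the deprecation notes in place, a scripted run of the documented workflow is a useful smoke test. This is a minimal sketch, not a supported script: it assumes the successful `--json` output of `create` carries the request id in a `request_id` field, and that the request's status is visible as plain text in `leap-bundle list` output; both are assumptions to verify locally.

```sh
#!/bin/sh
# Hypothetical end-to-end GGUF bundling script built from documented commands.
set -e

leap-bundle login
leap-bundle create ./my-model-directory --json > create.json

# The "request_id" field name is an assumption; inspect create.json locally.
REQUEST_ID=$(python3 -c "import json; print(json.load(open('create.json'))['request_id'])")

# Poll until the request shows as completed (status wording assumed).
until leap-bundle list | grep -q "${REQUEST_ID}.*completed"; do
  sleep 30
done

leap-bundle download "${REQUEST_ID}" --output-path ./downloads/
ls -la ./downloads/
```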
From ef93b6efd4593b94aaa6873f9cedd482ae114331 Mon Sep 17 00:00:00 2001
From: Liren Tu
Date: Thu, 1 Jan 2026 23:11:13 -0800
Subject: [PATCH 3/3] Update changelog

---
 leap/leap-bundle/changelog.md | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/leap/leap-bundle/changelog.md b/leap/leap-bundle/changelog.md
index c8e4b8b..196f0fa 100644
--- a/leap/leap-bundle/changelog.md
+++ b/leap/leap-bundle/changelog.md
@@ -4,7 +4,7 @@ sidebar_position: 4

# Changelog

-## `v0.9.0` - 2026-01-02
+## `v0.9.0` - unreleased

**New features**

- GGUF is now the default inference engine for model bundling, generating `.gguf` files for llama.cpp inference.
- Add `--executorch` flag to use ExecuTorch bundling instead of GGUF. ExecuTorch inference is deprecated and may be removed in a future version.
- Add `--mmproj-quantization` option for GGUF bundling of vision-language and audio models.
- Support downloading multiple `.gguf` files for GGUF bundle requests.

-**Improvements**
-
-- Update `--quantization` option to support GGUF quantization types (e.g., `Q4_K_M`, `Q8_0`, `F16`).
-
## `v0.8.0` - 2025-12-16

**Improvements**
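Since `.gguf` output is now the default, a natural last step after `leap-bundle download` is loading the file in llama.cpp. A minimal sketch: the request id and file name mirror the examples in cli-spec.mdx, and `llama-cli` is llama.cpp's stock CLI binary, not part of `leap-bundle`.

```sh
# Download a completed GGUF request (id and file name from the docs examples)
leap-bundle download 18734 --output-path ./downloads/

# A vision-language request may yield more than one .gguf file
# (model weights plus an mmproj projector); list what actually arrived
ls ./downloads/

# Smoke-test the main model file with llama.cpp
llama-cli -m ./downloads/model-Q4_K_M.gguf -p "Hello" -n 32
```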