From eda7b2c7a4f84ba2facffc3eb15bc6235b463b91 Mon Sep 17 00:00:00 2001
From: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Date: Fri, 2 Jan 2026 07:00:49 +0000
Subject: [PATCH 1/3] docs(leap-bundle): update documentation for GGUF bundling support

- Update quick-start.mdx to document GGUF as the default inference engine
- Update cli-spec.mdx with the --executorch flag and GGUF quantization options
- Add v0.9.0 changelog entry for GGUF bundling features

Co-Authored-By: Liren
---
 leap/leap-bundle/changelog.md    | 13 +++++++++++
 leap/leap-bundle/cli-spec.mdx    | 40 ++++++++++++++++++++++----------
 leap/leap-bundle/quick-start.mdx | 28 ++++++++++------------
 3 files changed, 53 insertions(+), 28 deletions(-)

diff --git a/leap/leap-bundle/changelog.md b/leap/leap-bundle/changelog.md
index 1775a1a..3b32602 100644
--- a/leap/leap-bundle/changelog.md
+++ b/leap/leap-bundle/changelog.md
@@ -4,6 +4,19 @@ sidebar_position: 4

# Changelog

+## `v0.9.0` - 2026-01-02
+
+**New features**
+
+- GGUF is now the default inference engine for model bundling, generating `.gguf` files for llama.cpp inference.
+- Add `--executorch` flag to use ExecuTorch bundling instead of GGUF.
+- Add `--mmproj-quantization` option for GGUF bundling of vision-language and audio models.
+- Support downloading multiple `.gguf` files for GGUF bundle requests.
+
+**Improvements**
+
+- Update `--quantization` option to support GGUF quantization types (e.g., `Q4_K_M`, `Q8_0`, `F16`).
+
## `v0.8.0` - 2025-12-16

**Improvements**

diff --git a/leap/leap-bundle/cli-spec.mdx b/leap/leap-bundle/cli-spec.mdx
index 3f0c64e..571d984 100644
--- a/leap/leap-bundle/cli-spec.mdx
+++ b/leap/leap-bundle/cli-spec.mdx
@@ -8,7 +8,9 @@ sidebar_position: 2

The Model Bundling Service provides a command-line interface (CLI) with two main features:

-1. **LEAP Bundle Requests**: Upload model directories, create bundle requests, monitor processing status, and download completed bundles for the LEAP (Liquid Edge AI Platform)
+1. **LEAP Bundle Requests**: Upload model directories, create bundle requests, monitor processing status, and download completed bundles for LEAP (Liquid Edge AI Platform). Supports two inference engines:
+   - **GGUF (default)**: Generates `.gguf` files for llama.cpp inference
+   - **ExecuTorch**: Generates `.bundle` files for ExecuTorch inference
2. **Manifest Downloads**: Download pre-packaged GGUF models from JSON manifest URLs without authentication

## Requirements
@@ -209,7 +211,11 @@ leap-bundle create
- `--sequential`: Upload files sequentially. This is the fallback option if parallel upload fails.
  - If neither `--parallel` nor `--sequential` is specified, the CLI will attempt parallel upload first, and fall back to sequential if it fails.
  - If both `--parallel` and `--sequential` are specified, `--parallel` takes precedence.
-- `--quantization <type>`: Specify the quantization type for the model bundle. Valid options: `8da4w_output_8da8w` (default), `8da8w_output_8da8w`.
+- `--executorch`: Use ExecuTorch bundling instead of GGUF. By default, the CLI uses GGUF bundling.
+- `--quantization <type>`: Specify the quantization type for the model bundle.
+  - For GGUF (default): `Q4_K_M` (default), `Q8_0`, `F16`, and [other llama.cpp quantization types](https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-quants.c).
+  - For ExecuTorch: `8da4w_output_8da8w` (default), `8da8w_output_8da8w`.
+- `--mmproj-quantization <type>`: (GGUF only) Specify the mmproj quantization type for vision-language or audio models. Valid options: `q4`, `q8` (default), `f16`.

**Behavior**

@@ -245,8 +251,17 @@ leap-bundle create ./my-model-directory --json

# Example JSON output when request already exists
{"error": "A bundle request with the same input hash already exists: req_xyz789abc123", "status": "exists"}

-# Create bundle with specific quantization
-leap-bundle create ./my-model-directory --quantization 8da8w_output_8da8w
+# Create GGUF bundle with specific quantization
+leap-bundle create ./my-model-directory --quantization Q8_0
+
+# Create ExecuTorch bundle
+leap-bundle create ./my-model-directory --executorch
+
+# Create ExecuTorch bundle with specific quantization
+leap-bundle create ./my-model-directory --executorch --quantization 8da8w_output_8da8w
+
+# Create GGUF bundle for VL model with mmproj quantization
+leap-bundle create ./my-vl-model-directory --mmproj-quantization f16
```

**Validation**
@@ -588,7 +603,7 @@ This command supports two modes of operation:

#### Mode 1: Bundle Request Download

-Download the bundle file for a completed request.
+Download the model files for a completed request.

```sh
leap-bundle download <request-id> [--output-path <path>]
@@ -600,26 +615,27 @@

**Options**

-- `--output-path <path>`: Directory to save the downloaded file (default: current directory)
+- `--output-path <path>`: Directory to save the downloaded files (default: current directory)

**Behavior**

-- Requests a signed download URL from the LEAP platform
-- Downloads the bundle file using the signed URL
-- Saves the file with a default name or to the specified output path
+- Requests signed download URLs from the LEAP platform
+- Downloads the model files using the signed URLs
+- Saves files with default names or to the specified output path
+- GGUF requests may produce multiple `.gguf` files; ExecuTorch requests produce a single `.bundle` file
- **Requires authentication** via `leap-bundle login`

**Examples**

```sh
-# Download bundle request to current directory
+# Download GGUF bundle request to current directory
leap-bundle download 18734

# Example output
ℹ Requesting download for bundle request 18734...
✓ Download URL obtained for request 18734
Downloading bundle output... ✓
-✓ Download completed successfully! File saved to: input-8da4w_output_8da8w-seq_8196.bundle
+✓ Download completed successfully! File saved to: model-Q4_K_M.gguf

# Download to specific directory
leap-bundle download 18734 --output-path ./downloads/

ℹ Requesting download for bundle request 18734...
✓ Download URL obtained for request 18734
Downloading bundle output... ✓
-✓ Download completed successfully! File saved to: downloads/input-8da4w_output_8da8w-seq_8196.bundle
+✓ Download completed successfully! File saved to: downloads/model-Q4_K_M.gguf
```

**Error Cases**

diff --git a/leap/leap-bundle/quick-start.mdx b/leap/leap-bundle/quick-start.mdx
index bb97d6f..3dc11d6 100644
--- a/leap/leap-bundle/quick-start.mdx
+++ b/leap/leap-bundle/quick-start.mdx
@@ -6,11 +6,10 @@ sidebar_position: 1

The Bundling Service helps users create and manage model bundles for Liquid Edge AI Platform (LEAP). Currently users interact with it through `leap-bundle`, a command-line interface (CLI).

-Here is a typical user workflow:
+The CLI supports two inference engines for model bundling:

-- Download an open source base model.
-- Customize the base model with your own dataset e.g. by finetuning.
-- Create a model bundle using the `leap-bundle` CLI for LEAP SDK.
+- **GGUF (default)**: Generates `.gguf` files for llama.cpp inference
+- **ExecuTorch**: Generates `.bundle` files for ExecuTorch inference (use the `--executorch` flag)

The CLI also supports downloading GGUF models directly from JSON manifest files.

@@ -52,13 +51,7 @@ Manifest downloads don't require authentication with `leap-bundle login`. They w
the model architecture comes from a base model that is part of the LEAP model library.
:::

-If you have a custom-trained or fine-tuned model, you can create a model bundle for use with LEAP SDK.
-
-Here is a typical user workflow:
-
-- Download an open source base model.
-- Customize the base model with your own dataset e.g. by finetuning.
-- Create a model bundle using the `leap-bundle` CLI for LEAP SDK.
+If you have a custom-trained or fine-tuned model, you can create a model bundle for use with LEAP SDK. By default, the CLI generates GGUF files for llama.cpp inference. Use the `--executorch` flag to generate ExecuTorch bundles instead.

### Authenticate
@@ -151,10 +144,10 @@ Example output:
ℹ Requesting download for bundle request 1...
✓ Download URL obtained for request 1
Downloading bundle output... ✓
-✓ Download completed successfully! File saved to: input-8da4w_output_8da8w-seq_8196.bundle
+✓ Download completed successfully! File saved to: model-Q4_K_M.gguf
```

-The model bundle file will be saved in the current directory with a `.bundle` extension.
+The model files will be saved in the current directory. GGUF bundling produces `.gguf` files, while ExecuTorch bundling produces `.bundle` files.

### Complete Example

@@ -166,17 +159,20 @@
# 1. Install and authenticate
pip install leap-bundle
leap-bundle login
leap-bundle whoami

-# 2. Create a bundle request
+# 2. Create a bundle request (GGUF by default)
leap-bundle create <model-path>

+# Or create an ExecuTorch bundle
+leap-bundle create <model-path> --executorch
+
# 3. Monitor the request (repeat until completed)
leap-bundle list

# 4. Download when ready
leap-bundle download <request-id>

-# 5. Your bundle file is now ready to use!
-ls -la
+# 5. Your model files are now ready to use!
+ls -la
```

### Managing Requests

From 5bb4edb0ac1a05728370a4aceeb93b27d7719783 Mon Sep 17 00:00:00 2001
From: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Date: Fri, 2 Jan 2026 07:08:07 +0000
Subject: [PATCH 2/3] docs(leap-bundle): add ExecuTorch deprecation note and fix quantization link

Co-Authored-By: Liren
---
 leap/leap-bundle/changelog.md    | 2 +-
 leap/leap-bundle/cli-spec.mdx    | 8 ++++----
 leap/leap-bundle/quick-start.mdx | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/leap/leap-bundle/changelog.md b/leap/leap-bundle/changelog.md
index 3b32602..c8e4b8b 100644
--- a/leap/leap-bundle/changelog.md
+++ b/leap/leap-bundle/changelog.md
@@ -9,7 +9,7 @@ sidebar_position: 4
**New features**

- GGUF is now the default inference engine for model bundling, generating `.gguf` files for llama.cpp inference.
-- Add `--executorch` flag to use ExecuTorch bundling instead of GGUF.
+- Add `--executorch` flag to use ExecuTorch bundling instead of GGUF. ExecuTorch inference is deprecated and may be removed in a future version.
- Add `--mmproj-quantization` option for GGUF bundling of vision-language and audio models.
- Support downloading multiple `.gguf` files for GGUF bundle requests.
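Taken together, the entries above change the commands users will actually type day to day. The sketch below exercises only flags that are documented in cli-spec.mdx; the directory paths are hypothetical, and combining `--quantization` with `--mmproj-quantization` in a single invocation is an assumption rather than documented behavior.

```sh
# GGUF is the default engine; pick a llama.cpp quantization type explicitly
leap-bundle create ./my-model-directory --quantization Q4_K_M

# Vision-language model: also choose a projector (mmproj) quantization
# (combining both flags in one call is assumed, not confirmed by the spec)
leap-bundle create ./my-vl-model-directory --quantization Q4_K_M --mmproj-quantization q8

# Deprecated ExecuTorch path, retained for existing workflows
leap-bundle create ./my-model-directory --executorch --quantization 8da4w_output_8da8w
```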
diff --git a/leap/leap-bundle/cli-spec.mdx b/leap/leap-bundle/cli-spec.mdx
index 571d984..ebdd49b 100644
--- a/leap/leap-bundle/cli-spec.mdx
+++ b/leap/leap-bundle/cli-spec.mdx
@@ -10,7 +10,7 @@ The Model Bundling Service provides a command-line interface (CLI) with two main

1. **LEAP Bundle Requests**: Upload model directories, create bundle requests, monitor processing status, and download completed bundles for LEAP (Liquid Edge AI Platform). Supports two inference engines:
   - **GGUF (default)**: Generates `.gguf` files for llama.cpp inference
-   - **ExecuTorch**: Generates `.bundle` files for ExecuTorch inference
+   - **ExecuTorch** (deprecated): Generates `.bundle` files for ExecuTorch inference. This option may be removed in a future version.
2. **Manifest Downloads**: Download pre-packaged GGUF models from JSON manifest URLs without authentication

## Requirements
@@ -211,10 +211,10 @@ leap-bundle create
- `--sequential`: Upload files sequentially. This is the fallback option if parallel upload fails.
  - If neither `--parallel` nor `--sequential` is specified, the CLI will attempt parallel upload first, and fall back to sequential if it fails.
  - If both `--parallel` and `--sequential` are specified, `--parallel` takes precedence.
-- `--executorch`: Use ExecuTorch bundling instead of GGUF. By default, the CLI uses GGUF bundling.
+- `--executorch` (deprecated): Use ExecuTorch bundling instead of GGUF. By default, the CLI uses GGUF bundling. This option may be removed in a future version.
- `--quantization <type>`: Specify the quantization type for the model bundle.
-  - For GGUF (default): `Q4_K_M` (default), `Q8_0`, `F16`, and [other llama.cpp quantization types](https://github.com/ggml-org/llama.cpp/blob/master/ggml/src/ggml-quants.c).
-  - For ExecuTorch: `8da4w_output_8da8w` (default), `8da8w_output_8da8w`.
+  - For GGUF (default): `Q4_K_M` (default), `Q8_0`, `F16`, and [other llama.cpp quantization types](https://github.com/ggml-org/llama.cpp/blob/0a0bba05e8390ab7e4a54bb8c0ed0a25da64cf62/tools/quantize/quantize.cpp#L22-L58).
+  - For ExecuTorch (deprecated): `8da4w_output_8da8w` (default), `8da8w_output_8da8w`.
- `--mmproj-quantization <type>`: (GGUF only) Specify the mmproj quantization type for vision-language or audio models. Valid options: `q4`, `q8` (default), `f16`.

**Behavior**
diff --git a/leap/leap-bundle/quick-start.mdx b/leap/leap-bundle/quick-start.mdx
index 3dc11d6..4a8300e 100644
--- a/leap/leap-bundle/quick-start.mdx
+++ b/leap/leap-bundle/quick-start.mdx
@@ -9,7 +9,7 @@ The Bundling Service helps users create and manage model bundles for Liquid Edge

The CLI supports two inference engines for model bundling:

- **GGUF (default)**: Generates `.gguf` files for llama.cpp inference
-- **ExecuTorch**: Generates `.bundle` files for ExecuTorch inference (use the `--executorch` flag)
+- **ExecuTorch** (deprecated): Generates `.bundle` files for ExecuTorch inference (use the `--executorch` flag). This option may be removed in a future version.

The CLI also supports downloading GGUF models directly from JSON manifest files.
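With the deprecation notes in place, a scripted run of the documented workflow is a useful smoke test. This is a minimal sketch, not a supported script: it assumes the successful `--json` output of `create` carries the request id in a `request_id` field, and that the request's status is visible as plain text in `leap-bundle list` output; both are assumptions to verify locally.

```sh
#!/bin/sh
# Hypothetical end-to-end GGUF bundling script built from documented commands.
set -e

leap-bundle login
leap-bundle create ./my-model-directory --json > create.json

# The "request_id" field name is an assumption; inspect create.json locally.
REQUEST_ID=$(python3 -c "import json; print(json.load(open('create.json'))['request_id'])")

# Poll until the request shows as completed (status wording assumed).
until leap-bundle list | grep -q "${REQUEST_ID}.*completed"; do
  sleep 30
done

leap-bundle download "${REQUEST_ID}" --output-path ./downloads/
ls -la ./downloads/
```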
From ef93b6efd4593b94aaa6873f9cedd482ae114331 Mon Sep 17 00:00:00 2001
From: Liren Tu
Date: Thu, 1 Jan 2026 23:11:13 -0800
Subject: [PATCH 3/3] Update changelog

---
 leap/leap-bundle/changelog.md | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/leap/leap-bundle/changelog.md b/leap/leap-bundle/changelog.md
index c8e4b8b..196f0fa 100644
--- a/leap/leap-bundle/changelog.md
+++ b/leap/leap-bundle/changelog.md
@@ -4,7 +4,7 @@ sidebar_position: 4

# Changelog

-## `v0.9.0` - 2026-01-02
+## `v0.9.0` - unreleased

**New features**

- GGUF is now the default inference engine for model bundling, generating `.gguf` files for llama.cpp inference.
- Add `--executorch` flag to use ExecuTorch bundling instead of GGUF. ExecuTorch inference is deprecated and may be removed in a future version.
- Add `--mmproj-quantization` option for GGUF bundling of vision-language and audio models.
- Support downloading multiple `.gguf` files for GGUF bundle requests.

-**Improvements**
-
-- Update `--quantization` option to support GGUF quantization types (e.g., `Q4_K_M`, `Q8_0`, `F16`).
-
## `v0.8.0` - 2025-12-16

**Improvements**
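Since `.gguf` output is now the default, a natural last step after `leap-bundle download` is loading the file in llama.cpp. A minimal sketch: the request id and file name mirror the examples in cli-spec.mdx, and `llama-cli` is llama.cpp's stock CLI binary, not part of `leap-bundle`.

```sh
# Download a completed GGUF request (id and file name from the docs examples)
leap-bundle download 18734 --output-path ./downloads/

# A vision-language request may yield more than one .gguf file
# (model weights plus an mmproj projector); list what actually arrived
ls ./downloads/

# Smoke-test the main model file with llama.cpp
llama-cli -m ./downloads/model-Q4_K_M.gguf -p "Hello" -n 32
```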