diff --git a/leap/leap-bundle/changelog.md b/leap/leap-bundle/changelog.md
index 1775a1a..196f0fa 100644
--- a/leap/leap-bundle/changelog.md
+++ b/leap/leap-bundle/changelog.md
@@ -4,6 +4,15 @@ sidebar_position: 4
 
 # Changelog
 
+## `v0.9.0` - unreleased
+
+**New features**
+
+- GGUF is now the default inference engine for model bundling, generating `.gguf` files for llama.cpp inference.
+- Add `--executorch` flag to use ExecuteTorch bundling instead of GGUF. ExecuteTorch inference is deprecated and may be removed in a future version.
+- Add `--mmproj-quantization` option for GGUF bundling of vision-language and audio models.
+- Support downloading multiple `.gguf` files for GGUF bundle requests.
+
 ## `v0.8.0` - 2025-12-16
 
 **Improvements**
diff --git a/leap/leap-bundle/cli-spec.mdx b/leap/leap-bundle/cli-spec.mdx
index 3f0c64e..ebdd49b 100644
--- a/leap/leap-bundle/cli-spec.mdx
+++ b/leap/leap-bundle/cli-spec.mdx
@@ -8,7 +8,9 @@ sidebar_position: 2
 
 The Model Bundling Service provides a command-line interface (CLI) with two main features:
 
-1. **LEAP Bundle Requests**: Upload model directories, create bundle requests, monitor processing status, and download completed bundles for the LEAP (Liquid Edge AI Platform)
+1. **LEAP Bundle Requests**: Upload model directories, create bundle requests, monitor processing status, and download completed bundles for the LEAP (Liquid Edge AI Platform). Supports two inference engines:
+   - **GGUF (default)**: Generates `.gguf` files for llama.cpp inference
+   - **ExecuteTorch** (deprecated): Generates `.bundle` files for ExecuteTorch inference. This option may be removed in a future version.
 2. **Manifest Downloads**: Download pre-packaged GGUF models from JSON manifest URLs without authentication
 
 ## Requirements
@@ -209,7 +211,11 @@ leap-bundle create
 - `--sequential`: Upload files sequentially. This is the fallback option if parallel upload fails.
 - If neither `--parallel` nor `--sequential` is specified, the CLI will attempt parallel upload first, and fall back to sequential if it fails.
 - If both `--parallel` and `--sequential` are specified, `--parallel` takes precedence.
-- `--quantization `: Specify the quantization type for the model bundle. Valid options: `8da4w_output_8da8w` (default), `8da8w_output_8da8w`.
+- `--executorch` (deprecated): Use ExecuteTorch bundling instead of GGUF. By default, the CLI uses GGUF bundling. This option may be removed in a future version.
+- `--quantization `: Specify the quantization type for the model bundle.
+  - For GGUF (default): `Q4_K_M` (default), `Q8_0`, `F16`, and [other llama.cpp quantization types](https://github.com/ggml-org/llama.cpp/blob/0a0bba05e8390ab7e4a54bb8c0ed0a25da64cf62/tools/quantize/quantize.cpp#L22-L58).
+  - For ExecuteTorch (deprecated): `8da4w_output_8da8w` (default), `8da8w_output_8da8w`.
+- `--mmproj-quantization `: (GGUF only) Specify the mmproj quantization type for vision-language or audio models. Valid options: `q4`, `q8` (default), `f16`.
 
 **Behavior**
 
@@ -245,8 +251,17 @@ leap-bundle create ./my-model-directory --json
 # Example JSON output when request already exists
 {"error": "A bundle request with the same input hash already exists: req_xyz789abc123", "status": "exists"}
 
-# Create bundle with specific quantization
-leap-bundle create ./my-model-directory --quantization 8da8w_output_8da8w
+# Create GGUF bundle with specific quantization
+leap-bundle create ./my-model-directory --quantization Q8_0
+
+# Create ExecuteTorch bundle
+leap-bundle create ./my-model-directory --executorch
+
+# Create ExecuteTorch bundle with specific quantization
+leap-bundle create ./my-model-directory --executorch --quantization 8da8w_output_8da8w
+
+# Create GGUF bundle for VL model with mmproj quantization
+leap-bundle create ./my-vl-model-directory --mmproj-quantization f16
 ```
 
 **Validation**
@@ -588,7 +603,7 @@ This command supports two modes of operation:
 
 #### Mode 1: Bundle Request Download
 
-Download the bundle file for a completed request.
+Download the model files for a completed request.
 
 ```sh
 leap-bundle download [--output-path ]
@@ -600,26 +615,27 @@ leap-bundle download [--output-path ]
 
 **Options**
 
-- `--output-path `: Directory to save the downloaded file (default: current directory)
+- `--output-path `: Directory to save the downloaded files (default: current directory)
 
 **Behavior**
 
-- Requests a signed download URL from the LEAP platform
-- Downloads the bundle file using the signed URL
-- Saves the file with a default name or to the specified output path
+- Requests signed download URLs from the LEAP platform
+- Downloads the model files using the signed URLs
+- Saves files with default names or to the specified output path
+- GGUF requests may produce multiple `.gguf` files; ExecuteTorch requests produce a single `.bundle` file
 - **Requires authentication** via `leap-bundle login`
 
 **Examples**
 
 ```sh
-# Download bundle request to current directory
+# Download GGUF bundle request to current directory
 leap-bundle download 18734
 
 # Example output
 ℹ Requesting download for bundle request 18734...
 ✓ Download URL obtained for request 18734
 Downloading bundle output... ✓
-✓ Download completed successfully! File saved to: input-8da4w_output_8da8w-seq_8196.bundle
+✓ Download completed successfully! File saved to: model-Q4_K_M.gguf
 
 # Download to specific directory
 leap-bundle download 18734 --output-path ./downloads/
@@ -628,7 +644,7 @@ leap-bundle download 18734 --output-path ./downloads/
 ℹ Requesting download for bundle request 18734...
 ✓ Download URL obtained for request 18734
 Downloading bundle output... ✓
-✓ Download completed successfully! File saved to: downloads/input-8da4w_output_8da8w-seq_8196.bundle
+✓ Download completed successfully! File saved to: downloads/model-Q4_K_M.gguf
 ```
 
 **Error Cases**
diff --git a/leap/leap-bundle/quick-start.mdx b/leap/leap-bundle/quick-start.mdx
index bb97d6f..4a8300e 100644
--- a/leap/leap-bundle/quick-start.mdx
+++ b/leap/leap-bundle/quick-start.mdx
@@ -6,11 +6,10 @@ sidebar_position: 1
 
 The Bundling Service helps users create and manage model bundles for Liquid Edge AI Platform (LEAP). Currently users interact with it through `leap-bundle`, a command-line interface (CLI).
 
-Here is a typical user workflow:
+The CLI supports two inference engines for model bundling:
 
-- Download an open source base model.
-- Customize the base model with your own dataset e.g. by finetuning.
-- Create a model bundle using the `leap-bundle` CLI for LEAP SDK.
+- **GGUF (default)**: Generates `.gguf` files for llama.cpp inference
+- **ExecuteTorch** (deprecated): Generates `.bundle` files for ExecuteTorch inference (use `--executorch` flag). This option may be removed in a future version.
 
 The CLI also supports downloading GGUF models directly from JSON manifest files.
 
@@ -52,13 +51,7 @@ Manifest downloads don't require authentication with `leap-bundle login`. They w
 the model architecture comes from a base model that is part of the LEAP model library.
 :::
 
-If you have a custom-trained or fine-tuned model, you can create a model bundle for use with LEAP SDK.
-
-Here is a typical user workflow:
-
-- Download an open source base model.
-- Customize the base model with your own dataset e.g. by finetuning.
-- Create a model bundle using the `leap-bundle` CLI for LEAP SDK.
+If you have a custom-trained or fine-tuned model, you can create a model bundle for use with LEAP SDK. By default, the CLI generates GGUF files for llama.cpp inference. Use the `--executorch` flag to generate ExecuteTorch bundles instead.
 
 ### Authenticate
 
@@ -151,10 +144,10 @@ Example output:
 ℹ Requesting download for bundle request 1...
 ✓ Download URL obtained for request 1
 Downloading bundle output... ✓
-✓ Download completed successfully! File saved to: input-8da4w_output_8da8w-seq_8196.bundle
+✓ Download completed successfully! File saved to: model-Q4_K_M.gguf
 ```
 
-The model bundle file will be saved in the current directory with a `.bundle` extension.
+The model files will be saved in the current directory. GGUF bundling produces `.gguf` files, while ExecuteTorch bundling produces `.bundle` files.
 
 ### Complete Example
 
@@ -166,17 +159,20 @@ pip install leap-bundle
 leap-bundle login
 leap-bundle whoami
 
-# 2. Create a bundle request
+# 2. Create a bundle request (GGUF by default)
 leap-bundle create
 
+# Or create an ExecuteTorch bundle
+leap-bundle create --executorch
+
 # 3. Monitor the request (repeat until completed)
 leap-bundle list
 
 # 4. Download when ready
 leap-bundle download
 
-# 5. Your bundle file is now ready to use!
-ls -la
+# 5. Your model files are now ready to use!
+ls -la
 ```
 
 ### Managing Requests
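Note for consumers of the download step: the docs above state that a GGUF request may produce multiple `.gguf` files, while an ExecuteTorch request produces a single `.bundle` file. A script that previously expected exactly one `.bundle` file may need to sort outputs by extension instead. The following is a minimal sketch under that assumption. The helper name `collect_bundle_outputs`, the `demo` function, and the file name `mmproj-f16.gguf` are hypothetical and used only for illustration; none of this is part of the `leap-bundle` CLI or its API.

```python
import tempfile
from pathlib import Path


def collect_bundle_outputs(output_dir):
    """Group downloaded files by inference engine, keyed on file extension.

    Hypothetical helper: assumes the download step saved its files directly
    in output_dir, using `.gguf` for GGUF output and `.bundle` for
    ExecuteTorch output, as the updated docs describe.
    """
    root = Path(output_dir)
    return {
        "gguf": sorted(p.name for p in root.glob("*.gguf")),
        "executorch": sorted(p.name for p in root.glob("*.bundle")),
    }


def demo():
    """Simulate a GGUF download directory with empty placeholder files."""
    with tempfile.TemporaryDirectory() as tmp:
        # File names are illustrative; only model-Q4_K_M.gguf appears in the docs.
        for name in ("model-Q4_K_M.gguf", "mmproj-f16.gguf", "notes.txt"):
            (Path(tmp) / name).touch()
        return collect_bundle_outputs(tmp)
```

Returning a dictionary with both keys always present lets a caller handle either engine without first checking which flag was used at creation time.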