From a041bcd745eb2cb41f42970614a1a1689e4a285f Mon Sep 17 00:00:00 2001 From: Joseph Schlesinger Date: Thu, 1 Jan 2026 07:39:10 +0000 Subject: [PATCH] Add Modal documentation - Source: llms.txt (modal.com/llms.txt) - Files: 1 file (modal-full.md) - Size: 1.7 MB - Path: docs/llms-txt/modal/ - Description: Modal serverless platform documentation covering functions, GPU acceleration, deployment, scheduled jobs, volumes, and containerization Generated with Claude Code --- docs/llms-txt/modal/modal-full.md | 50791 ++++++++++++++++++++++++++++ index.yaml | 859 +- scripts/llms-sites.yaml | 3 + 3 files changed, 51253 insertions(+), 400 deletions(-) create mode 100644 docs/llms-txt/modal/modal-full.md diff --git a/docs/llms-txt/modal/modal-full.md b/docs/llms-txt/modal/modal-full.md new file mode 100644 index 0000000000..c2e98c6318 --- /dev/null +++ b/docs/llms-txt/modal/modal-full.md @@ -0,0 +1,50791 @@ +# Modal Documentation + +Source: https://modal.com/llms-full.txt + +--- + +# Modal llms-full.txt + +> Modal is a platform for running Python code in the cloud with minimal +> configuration, especially for serving AI models and high-performance batch +> processing. It supports fast prototyping, serverless APIs, scheduled jobs, +> GPU inference, distributed volumes, and sandboxes. + +Important notes: + +- Modal's primitives are embedded in Python and tailored for AI/GPU use cases, + but they can be used for general-purpose cloud compute. +- Modal is a serverless platform, meaning you are only billed for resources used + and can spin up containers on demand in seconds. + +You can sign up for free at [https://modal.com] and get $30/month of credits. + +## Guides + +### Custom container images + +#### Defining Images + +# Images + +This guide walks you through how to define a Modal Image, the environment your Modal code runs in. + +The typical flow for defining an Image in Modal is +[method chaining](https://jugad2.blogspot.com/2016/02/examples-of-method-chaining-in-python.html) +starting from a base Image, like this: + +```python +image = ( + modal.Image.debian_slim(python_version="3.13") + .apt_install("git") + .uv_pip_install("torch<3") + .env({"HALT_AND_CATCH_FIRE": "0"}) + .run_commands("git clone https://github.com/modal-labs/agi && echo 'ready to go!'") +) +``` + +If you have your own container image defintions, like a Dockerfile or a registry link, you can use those too! +See [this guide](https://modal.com/docs/guide/existing-images). + +This page is a high-level guide to using Modal Images. +For reference documentation on the `modal.Image` object, see +[this page](https://modal.com/docs/reference/modal.Image). + +## What are Images? + +Your code on Modal runs in _containers_. Containers are like light-weight +virtual machines -- container engines use +[operating system tricks](https://earthly.dev/blog/chroot/) to isolate programs +from each other ("containing" them), making them work as though they were +running on their own hardware with their own filesystem. This makes execution +environments more reproducible, for example by preventing accidental +cross-contamination of environments on the same machine. For added security, +Modal runs containers using the sandboxed +[gVisor container runtime](https://cloud.google.com/blog/products/identity-security/open-sourcing-gvisor-a-sandboxed-container-runtime). + +Containers are started up from a stored "snapshot" of their filesystem state +called an _image_. Producing the image for a container is called _building_ the +image. 
+ +By default, Modal Functions and Sandboxes run in a +[Debian Linux](https://en.wikipedia.org/wiki/Debian) container with a basic +Python installation of the same minor version `v3.x` as your local Python +interpreter. + +To make your Apps and Functions useful, you will probably need some third party system packages +or Python libraries. Modal provides a number of options to customize your container images at +different levels of abstraction and granularity, from high-level convenience +methods like `pip_install` through wrappers of core container image build +features like `RUN` and `ENV`. We'll cover each of these in this guide, +along with tips and tricks for building Images effectively when using each tool. + +## Add Python packages + +The simplest and most common Image modification is to add a third party +Python package, like [`pandas`](https://pandas.pydata.org/). + +You can add Python packages to the environment by passing all the packages you +need to the [`Image.uv_pip_install`](https://modal.com/docs/reference/modal.Image#uv_pip_install) method, +which installs packages with [`uv`](https://docs.astral.sh/uv/): + +```python +import modal + +datascience_image = ( + modal.Image.debian_slim() + .uv_pip_install("pandas==2.2.0", "numpy") +) + +@app.function(image=datascience_image) +def my_function(): + import pandas as pd + import numpy as np + + df = pd.DataFrame() + ... +``` + +You can include +[Python dependency version specifiers](https://peps.python.org/pep-0508/), +like `"torch<3"`, in the arguments. But we recommend pinning dependencies +tightly, like `"torch==2.8.0"`, to improve the reproducibility and robustness +of your builds. + +If you run into any issues with +[`Image.uv_pip_install`](https://modal.com/docs/reference/modal.Image#uv_pip_install), then +you can fallback to [`Image.pip_install`](https://modal.com/docs/reference/modal.Image#pip_install) which +uses standard [`pip`](https://pip.pypa.io/en/stable/user_guide/): + +```python +datascience_image = ( + modal.Image.debian_slim(python_version="3.13") + .pip_install("pandas==2.2.0", "numpy") +) +``` + +Note that because you can define a different environment for each and every +function if you so choose, you don't need to worry about virtual +environment management. Containers make for much better separation of concerns! + +If you want to run a specific version of Python remotely rather than just +matching the one you're running locally, provide the `python_version` as a +string when constructing the base image, like we did above. + +## Add local files with `add_local_dir` and `add_local_file` + +Sometimes your containers need a dependency that's not available on the Internet, +like configuration files or code on your laptop. + +To forward files from your local system use the +`image.add_local_dir` and `image.add_local_file` Image methods. + +```python +image = modal.Image.debian_slim().add_local_dir("/user/erikbern/.aws", remote_path="/root/.aws") +``` + +By default, these files are added to your container as it starts up rather than introducing +a new Image layer. This means that the redeployment after making changes is really quick, but +also means you can't run additional build steps after. You can specify a `copy=True` argument +to the `add_local_` methods to instead force the files to be included in the built Image. 
+ +### Add local Python code with `add_local_python_source` + +You can add Python code that's importable locally to your container +by providing the module name to +[`Image.add_local_python_source`](https://modal.com/docs/reference/modal.Image#add_local_python_source). + +```python +image_with_module = modal.Image.debian_slim().add_local_python_source("local_module") + +@app.function(image=image_with_module) +def f(): + import local_module + + local_module.do_stuff() +``` + +The difference from `add_local_dir` is that `add_local_python_source` takes module names as arguments +instead of a file system path and looks up the local package's or module's location via Python's importing +mechanism. The files are then added to directories that make them importable in containers in the +same way as they are locally. + +This is intended for pure Python auxiliary modules that are part of your project and that your code imports. +Third party packages should be installed via +[`Image.uv_pip_install`](https://modal.com/docs/reference/modal.Image#uv_pip_install) or similar. + +### What if I have different Python packages locally and remotely? + +You might want to use packages inside your Modal code that you don't have on +your local computer. In the example above, we build a container that uses +`pandas`. But if we don't have `pandas` locally, on the computer building the +Modal App, we can't put `import pandas` at the top of the script, since it would +cause an `ImportError`. + +The easiest solution to this is to put `import pandas` in the function body +instead, as you can see above. This means that `pandas` is only imported when +running inside the remote Modal container, which has `pandas` installed. + +Be careful about what you return from Modal Functions that have different +packages installed than the ones you have locally! Modal Functions return Python +objects, like `pandas.DataFrame`s, and if your local machine doesn't have +`pandas` installed, it won't be able to handle a `pandas` object (the error +message you see will mention +[serialization](https://hazelcast.com/glossary/serialization/)/[deserialization](https://hazelcast.com/glossary/deserialization/)). + +If you have a lot of Functions and a lot of Python packages, you might want to +keep the imports in the global scope so that every function can use the same +imports. In that case, you can use the +[`Image.imports`](https://modal.com/docs/reference/modal.Image#imports) context manager: + +```python +pandas_image = modal.Image.debian_slim().pip_install("pandas", "numpy") + +with pandas_image.imports(): + import pandas as pd + import numpy as np + +@app.function(image=pandas_image) +def my_function(): + df = pd.DataFrame() + ... +``` + +Because these imports happen before a new container processes its first input, +you can combine this decorator with [memory snapshots](https://modal.com/docs/guide/memory-snapshot) +to improve [cold start performance](https://modal.com/docs/guide/cold-start#share-initialization-work-across-cold-starts-with-memory-snapshots) +for Functions that frequently scale from zero. 
+ +## Install system packages with `.apt_install` + +You can install Linux packages with the [`apt` package manager](https://www.debian.org/doc/manuals/apt-guide/index.en.html) +using [`Image.apt_install`](https://modal.com/docs/reference/modal.Image#apt_install): + +```python +image = modal.Image.debian_slim().apt_install("git", "curl") +``` + +## Set environment variables with `.env` + +You can change the environment variables that your code sees +(in, e.g., [`os.environ`](https://docs.python.org/3/library/os.html#os.environ)) +by passing a dictionary to [`Image.env`](https://modal.com/docs/reference/modal.Image#env): + +```python +image = modal.Image.debian_slim().env({"PORT": "6443"}) +``` + +Environment variable names and values must be strings. + +## Run shell commands with `.run_commands` + +You can supply shell commands that should be executed when building the +Image to [`Image.run_commands`](https://modal.com/docs/reference/modal.Image#run_commands): + +```python +image_with_repo = ( + modal.Image.debian_slim().apt_install("git").run_commands( + "git clone https://github.com/modal-labs/gpu-glossary" + ) +) +``` + +## Run a Python function during your build with `.run_function` + +You can run Python code as a build step using the +[`Image.run_function`](https://modal.com/docs/reference/modal.Image#run_function) method. + +For example, you can use this to download model parameters from Hugging Face into +your Image: + +```python +import os + +def download_models() -> None: + import diffusers + + model_name = "segmind/small-sd" + pipe = diffusers.StableDiffusionPipeline.from_pretrained( + model_name, use_auth_token=os.environ["HF_TOKEN"] + ) + +hf_cache = modal.Volume.from_name("hf-cache") + +image = ( + modal.Image.debian_slim() + .pip_install("diffusers[torch]", "transformers", "ftfy", "accelerate") + .run_function( + download_models, + secrets=[modal.Secret.from_name("huggingface-secret")], + volumes={"/root/.cache/huggingface": hf_cache}, + ) +) +``` + +For details on storing model weights on Modal, see +[this guide](https://modal.com/docs/guide/model-weights). + +Essentially, this is equivalent to running a Modal Function and snapshotting the +resulting filesystem as a new Image. Any kwargs accepted by [`@app.function`](https://modal.com/docs/reference/modal.App#function) +([`Volume`s](https://modal.com/docs/guide/volumes), [`Secret`s](https://modal.com/docs/guide/secrets), specifications of +resources like [GPUs](https://modal.com/docs/guide/gpu)) can be supplied here. + +Whenever you change other features of your Image, like the base Image or the +version of a Python package, the Image will automatically be rebuilt the next +time it is used. This is a bit more complicated when changing the contents of +functions. See the +[reference documentation](https://modal.com/docs/reference/modal.Image#run_function) for details. + +## Attach GPUs during setup + +If a step in the setup of your Image should be run on an instance with +a GPU (e.g., so that a package can query the GPU to set compilation flags), pass the +desired GPU type when defining that step: + +```python +image = ( + modal.Image.debian_slim() + .pip_install("bitsandbytes", gpu="H100") +) +``` + +## Use `mamba` instead of `pip` with `micromamba_install` + +`pip` installs Python packages, but some Python workloads require the +coordinated installation of system packages as well. The `mamba` package manager +can install both. 
Modal provides a pre-built +[Micromamba](https://mamba.readthedocs.io/en/latest/user_guide/micromamba.html) +base image that makes it easy to work with `micromamba`: + +```python +app = modal.App("bayes-pgm") + +numpyro_pymc_image = ( + modal.Image.micromamba() + .micromamba_install("pymc==5.10.4", "numpyro==0.13.2", channels=["conda-forge"]) +) + +@app.function(image=numpyro_pymc_image) +def sample(): + import pymc as pm + import numpyro as np + + print(f"Running on PyMC v{pm.__version__} with JAX/numpyro v{np.__version__} backend") + ... +``` + +## Image caching and rebuilds + +Modal uses the definition of an Image to determine whether it needs to be +rebuilt. If the definition hasn't changed since the last time you ran or +deployed your App, the previous version will be pulled from the cache. + +Images are cached per layer (i.e., per `Image` method call), and breaking +the cache on a single layer will cause cascading rebuilds for all subsequent +layers. You can shorten iteration cycles by defining frequently-changing +layers last so that the cached version of all other layers can be used. + +In some cases, you may want to force an Image to rebuild, even if the +definition hasn't changed. You can do this by adding the `force_build=True` +argument to any of the Image building methods. + +```python +image = ( + modal.Image.debian_slim() + .apt_install("git") + .pip_install("slack-sdk", force_build=True) + .run_commands("echo hi") +) +``` + +As in other cases where a layer's definition changes, both the `pip_install` and +`run_commands` layers will rebuild, but the `apt_install` will not. Remember to +remove `force_build=True` after you've rebuilt the Image, or it will +rebuild every time you run your code. + +Alternatively, you can set the `MODAL_FORCE_BUILD` environment variable (e.g. +`MODAL_FORCE_BUILD=1 modal run ...`) to rebuild all images attached to your App. +But note that when you rebuild a base layer, the cache will be invalidated for _all_ +Images that depend on it, and they will rebuild the next time you run or deploy +any App that uses that base. If you're debugging an issue with your Image, a better +option might be using `MODAL_IGNORE_CACHE=1`. This will rebuild the Image from the +top without breaking the Image cache or affecting subsequent builds. + +## Image builder updates + +Because changes to base images will cause cascading rebuilds, Modal is +conservative about updating the base definitions that we provide. But many +things are baked into these definitions, like the specific versions of the Image +OS, the included Python, and the Modal client dependencies. + +We provide a separate mechanism for keeping base images up-to-date without +causing unpredictable rebuilds: the "Image Builder Version". This is a workspace +level-configuration that will be used for every Image built in your workspace. +We release a new Image Builder Version every few months but allow you to update +your workspace's configuration when convenient. After updating, your next +deployment will take longer, because your Images will rebuild. You may also +encounter problems, especially if your Image definition does not pin the version +of the third-party libraries that it installs (as your new Image will get the +latest version of these libraries, which may contain breaking changes). + +You can set the Image Builder Version for your workspace by going to your +[workspace settings](https://modal.com/settings/image-config). This page also documents the +important updates in each version. 
+ +#### Using existing container images + +# Using existing images + +This guide walks you through how to use an existing container image as a Modal Image. + +```python notest +sklearn_image = modal.Image.from_registry("huanjason/scikit-learn") +custom_image = modal.Image.from_dockerfile("./src/Dockerfile") +``` + +## Load an image from a public registry with `.from_registry` + +To load an image from a public registry, just pass the image name, including any tags, to [`Image.from_registry`](https://modal.com/docs/reference/modal.Image#from_registry): + +```python +sklearn_image = modal.Image.from_registry("huanjason/scikit-learn") + +@app.function(image=sklearn_image) +def fit_knn(): + from sklearn.neighbors import KNeighborsClassifier + ... +``` + +The `from_registry` method can load images from all public registries, such as +[Nvidia's `nvcr.io`](https://catalog.ngc.nvidia.com/containers), +[AWS ECR](https://aws.amazon.com/ecr/), and +[GitHub's `ghcr.io`](https://docs.github.com/en/packages/working-with-a-github-packages-registry/working-with-the-container-registry). + +You can further modify the image [just like any other Modal Image](https://modal.com/docs/guide/images): + +```python continuation +data_science_image = sklearn_image.uv_pip_install("polars", "datasette") +``` + +You can use external images so long as + +- The image is built for the + [`linux/amd64` platform](https://unix.stackexchange.com/questions/53415/why-are-64-bit-distros-often-called-amd64) +- The image has a [compatible `ENTRYPOINT`](#entrypoint) + +Additionally, to be used with a Modal Function, the image needs to have `python` and `pip` +installed and available on the `$PATH`. +If an existing image does not have either `python` or `pip` set up compatibly, you +can still use it. Just provide a version number as the `add_python` argument to +install a reproducible +[standalone build](https://github.com/indygreg/python-build-standalone) +of Python: + +```python +ubuntu_image = modal.Image.from_registry("ubuntu:22.04", add_python="3.11") +valhalla_image = modal.Image.from_registry("gisops/valhalla:latest", add_python="3.12") +``` + +There are some additional restrictions for older versions of the Modal image builder. +Image builder version is set at a workspace level via the settings page [here](https://modal.com/settings/image-config). +See the migration guides on that page for details on any additional restrictions on images. + +## Load images from private registries + +You can also use images defined in private container registries on Modal. +The exact method depends on the registry you are using. + +### Docker Hub (Private) + +To pull container images from private Docker Hub repositories, +[create an access token](https://docs.docker.com/security/for-developers/access-tokens/) +with "Read-Only" permissions and use this token value and your Docker Hub +username to create a Modal [Secret](https://modal.com/docs/guide/secrets). + +``` +REGISTRY_USERNAME=my-dockerhub-username +REGISTRY_PASSWORD=dckr_pat_REDACTED_FOR_SECURITY +``` + +Use this Secret with the +[`modal.Image.from_registry`](https://modal.com/docs/reference/modal.Image#from_registry) method. 
+ +### Elastic Container Registry (ECR) + +You can pull images from your AWS ECR account by specifying the full image URI +as follows: + +```python +import modal + +aws_secret = modal.Secret.from_name("my-aws-secret") +image = ( + modal.Image.from_aws_ecr( + "000000000000.dkr.ecr.us-east-1.amazonaws.com/my-private-registry:latest", + secret=aws_secret, + ) + .pip_install("torch", "huggingface") +) + +app = modal.App(image=image) +``` + +As shown above, you also need to use a [Modal Secret](https://modal.com/docs/guide/secrets) +containing the environment variables `AWS_ACCESS_KEY_ID`, +`AWS_SECRET_ACCESS_KEY`, and `AWS_REGION`. The AWS IAM user account associated +with those keys must have access to the private registry you want to access. + +Alternatively, you can use [OIDC token authentication](https://modal.com/docs/guide/oidc-integration#pull-images-from-aws-elastic-container-registry-ecr). + +The user needs to have the following read-only policies: + +```json +{ + "Version": "2012-10-17", + "Statement": [ + { + "Action": ["ecr:GetAuthorizationToken"], + "Effect": "Allow", + "Resource": "*" + }, + { + "Effect": "Allow", + "Action": [ + "ecr:BatchCheckLayerAvailability", + "ecr:GetDownloadUrlForLayer", + "ecr:GetRepositoryPolicy", + "ecr:DescribeRepositories", + "ecr:ListImages", + "ecr:DescribeImages", + "ecr:BatchGetImage", + "ecr:GetLifecyclePolicy", + "ecr:GetLifecyclePolicyPreview", + "ecr:ListTagsForResource", + "ecr:DescribeImageScanFindings" + ], + "Resource": "" + } + ] +} +``` + +You can use the IAM configuration above as a template for creating an IAM user. +You can then +[generate an access key](https://aws.amazon.com/premiumsupport/knowledge-center/create-access-key/) +and create a Modal Secret using the AWS integration option. Modal will use your +access keys to generate an ephemeral ECR token. That token is only used to pull +image layers at the time a new image is built. We don't store this token but +will cache the image once it has been pulled. + +Images on ECR must be private and follow +[image configuration requirements](https://modal.com/docs/reference/modal.Image#from_aws_ecr). + +### Google Artifact Registry and Google Container Registry + +For further detail on how to pull images from Google's image registries, see +[`modal.Image.from_gcp_artifact_registry`](https://modal.com/docs/reference/modal.Image#from_gcp_artifact_registry). + +## Bring your own image definition with `.from_dockerfile` + +You can define an Image from an existing Dockerfile by passing its path to +[`Image.from_dockerfile`](https://modal.com/docs/reference/modal.Image#from_dockerfile): + +```python +dockerfile_image = modal.Image.from_dockerfile("Dockerfile") + +@app.function(image=dockerfile_image) +def fit(): + import sklearn + ... +``` + +Note that you can still extend this Image using image builder methods! +See [the guide](https://modal.com/docs/guide/images) for details. + +### Dockerfile command compatibility + +Since Modal doesn't use Docker to build containers, we have our own +implementation of the +[Dockerfile specification](https://docs.docker.com/engine/reference/builder/). +Most Dockerfiles should work out of the box, but there are some differences to +be aware of. + +First, a few minor Dockerfile commands and flags have not been implemented yet. +These include `ONBUILD`, `STOPSIGNAL`, and `VOLUME`. +Please reach out to us if your use case requires any of these. + +Next, there are some command-specific things that may be useful when porting a +Dockerfile to Modal. 
+ +#### `ENTRYPOINT` + +While the +[`ENTRYPOINT`](https://docs.docker.com/engine/reference/builder/#entrypoint) +command is supported, there is an additional constraint to the entrypoint script +provided: when used with a Modal Function, it must also `exec` the arguments passed to it at some point. +This is so the Modal Function runtime's Python entrypoint can run after your own. Most entrypoint +scripts in Docker containers are wrappers over other scripts, so this is likely +already the case. + +If you wish to write your own entrypoint script, you can use the following as a +template: + +```bash +#!/usr/bin/env bash + +# Your custom startup commands here. + +exec "$@" # Runs the command passed to the entrypoint script. +``` + +If the above file is saved as `/usr/bin/my_entrypoint.sh` in your container, +then you can register it as an entrypoint with +`ENTRYPOINT ["/usr/bin/my_entrypoint.sh"]` in your Dockerfile, or with +[`entrypoint`](https://modal.com/docs/reference/modal.Image#entrypoint) as an +Image build step. + +```python +import modal + +image = ( + modal.Image.debian_slim() + .pip_install("foo") + .entrypoint(["/usr/bin/my_entrypoint.sh"]) +) +``` + +#### `ENV` + +We currently don't support default values in +[interpolations](https://docs.docker.com/compose/compose-file/12-interpolation/), +such as `${VAR:-default}` + +#### Fast pull from registry + +# Fast pull from registry + +The performance of pulling public and private images from registries into Modal +can be significantly improved by adopting the [eStargz](https://github.com/containerd/stargz-snapshotter/blob/main/docs/estargz.md) compression format. + +By applying eStargz compression during your image build and push, Modal will be much +more efficient at pulling down your image from the registry. + +## How to use estargz + +If you have [Buildkit](https://docs.docker.com/build/buildkit/) version greater than `0.10.0`, adopting `estargz` is as simple as +adding some flags to your `docker buildx build` command: + +- `type=registry` flag will instruct BuildKit to push the image after building. + - If you do not push the image from immediately after build and instead attempt to push it later with docker push, the image will be converted to a standard gzip image. +- `compression=estargz` specifies that we are using the [eStargz](https://github.com/containerd/stargz-snapshotter/blob/main/docs/estargz.md) compression format. +- `oci-mediatypes=true` specifies that we are using the OCI media types, which is required for eStargz. +- `force-compression=true` will recompress the entire image and convert the base image to eStargz if it is not already. + +```bash +docker buildx build --tag "//:" \ +--output type=registry,compression=estargz,force-compression=true,oci-mediatypes=true \ +. +``` + +Then reference the container image as normal in your Modal code. + +```python notest +app = modal.App( + "example-estargz-pull", + image=modal.Image.from_registry( + "public.ecr.aws/modal/estargz-example-images:text-generation-v1-esgz" + ) +) +``` + +At build time you should see the eStargz-enabled puller activate: + +``` +Building image im-TinABCTIf12345ydEwTXYZ + +=> Step 0: FROM public.ecr.aws/modal/estargz-example-images:text-generation-v1-esgz +Using estargz to speed up image pull (index loaded in 1.86s)... +Progress: 10% complete... (1.11s elapsed) +Progress: 20% complete... (3.10s elapsed) +Progress: 30% complete... (4.18s elapsed) +Progress: 40% complete... (4.76s elapsed) +Progress: 50% complete... 
(5.51s elapsed) +Progress: 62% complete... (6.17s elapsed) +Progress: 74% complete... (6.99s elapsed) +Progress: 81% complete... (7.23s elapsed) +Progress: 99% complete... (8.90s elapsed) +Progress: 100% complete... (8.90s elapsed) +Copying image... +Copied image in 5.81s +``` + +## Supported registries + +Currently, Modal supports fast estargz pulling images with the following registries: + +- AWS Elastic Container Registry (ECR) +- Docker Hub (docker.io) +- Google Artifact Registry (gcr.io, pkg.dev) + +We are working on adding support for GitHub Container Registry (ghcr.io). + +### GPUs and other resources + +#### GPU acceleration + +# GPU acceleration + +Modal makes it easy to run your code on [GPUs](https://modal.com/gpu-glossary/readme). + +## Quickstart + +Here's a simple example of a Function running on an A100 in Modal: + +```python +import modal + +image = modal.Image.debian_slim().pip_install("torch") +app = modal.App(image=image) + +@app.function(gpu="A100") +def run(): + import torch + + assert torch.cuda.is_available() +``` + +## Specifying GPU type + +You can pick a specific GPU type for your Function via the `gpu` argument. +Modal supports the following values for this parameter: + +- `T4` +- `L4` +- `A10` +- `A100` +- `A100-40GB` +- `A100-80GB` +- `L40S` +- `H100`/`H100!` +- `H200` +- `B200` + +For instance, to use a B200, you can use `@app.function(gpu="B200")`. + +Refer to our [pricing page](https://modal.com/pricing) for the latest pricing on each GPU type. + +## Specifying GPU count + +You can specify more than 1 GPU per container by appending `:n` to the GPU +argument. For instance, to run a Function with eight H100s: + +```python + +@app.function(gpu="H100:8") +def run_llama_405b_fp8(): + ... +``` + +Currently B200, H200, H100, A100, L4, T4 and L40S instances support up to 8 GPUs (up to 1,536 GB GPU RAM), +and A10 instances support up to 4 GPUs (up to 96 GB GPU RAM). Note that requesting +more than 2 GPUs per container will usually result in larger wait times. These +GPUs are always attached to the same physical machine. + +## Picking a GPU + +For running, rather than training, neural networks, we recommend starting off +with the [L40S](https://resources.nvidia.com/en-us-l40s/l40s-datasheet-28413), +which offers an excellent trade-off of cost and performance and 48 GB of GPU +RAM for storing model weights and activations. + +For more on how to pick a GPU for use with neural networks like LLaMA or Stable +Diffusion, and for tips on how to make that GPU go brrr, check out +[Tim Dettemers' blog post](https://timdettmers.com/2023/01/30/which-gpu-for-deep-learning/) +or the +[Full Stack Deep Learning page on Cloud GPUs](https://fullstackdeeplearning.com/cloud-gpus/). + +## B200 GPUs + +Modal's most powerful GPUs are the [B200s](https://www.nvidia.com/en-us/data-center/dgx-b200/), +NVIDIA's flagship data center chip for the Blackwell [architecture](https://modal.com/gpu-glossary/device-hardware/streaming-multiprocessor-architecture). + +To request a B200, set the `gpu` argument to `"B200"` + +```python +@app.function(gpu="B200:8") +def run_deepseek(): + ... +``` + +Check out [this example](https://modal.com/docs/examples/llm_inference) to see how you can use B200s to max out vLLM serving performance for LLaMA 3.1-8B. + +Before you jump for the most powerful (and so most expensive) GPU, make sure you +understand where the bottlenecks are in your computations. For example, running +language models with small batch sizes (e.g. 
one prompt at a time) results in a +[bottleneck on memory, not arithmetic](https://kipp.ly/transformer-inference-arithmetic/). +Since arithmetic throughput has risen faster than memory throughput in recent +hardware generations, speedups for memory-bound GPU jobs are not as extreme and +may not be worth the extra cost. + +## H200 and H100 GPUs + +[H200s](https://www.nvidia.com/en-us/data-center/h200/) and [H100s](https://www.nvidia.com/en-us/data-center/h100/) are the previous +generation of top-of-the-line data center chips from NVIDIA, based on the Hopper [architecture](https://modal.com/gpu-glossary/device-hardware/streaming-multiprocessor-architecture). +These GPUs have better software support than do Blackwell GPUs (e.g. popular libraries include pre-compiled kernels for Hopper, but not Blackwell), +and they often get the job done at a competitive cost, so they are a common choice of accelerator, on and off Modal. + +All H100 GPUs on the Modal platform are of the SXM variant, as can be verified by examining the +[power draw](https://modal.com/docs/guide/gpu-metrics) in the dashboard or with `nvidia-smi`. + +### Automatic upgrades to H200s + +Modal may automatically upgrade a `gpu="H100"` request to run on an H200. +This automatic upgrade does _not_ change the cost of the GPU. + +Kernels [compatible](https://modal.com/gpu-glossary/device-software/compute-capability) with H200s are also compatible with H100s, +so your code will still run, just faster, so long as it doesn't make strict assumptions about memory capacity. +An H200’s [HBM3e memory](https://modal.com/gpu-glossary/device-hardware/gpu-ram) +has a capacity of 141 GB and a bandwidth of 4.8TB/s, 1.75x larger and 1.4x faster than an NVIDIA H100 with HBM3. + +In cases where an automatic upgrade to H200 would not be helpful (for instance, benchmarking) you can pass +`gpu=H100!` to avoid it. + +## A100 GPUs + +[A100s](https://www.nvidia.com/en-us/data-center/a100/) are based on NVIDIA's Ampere [architecture](https://modal.com/gpu-glossary/device-hardware/streaming-multiprocessor-architecture). +Modal offers two versions of the A100: one with 40 GB of RAM and another with 80 GB of RAM. + +To request an A100 with 40 GB of [GPU memory](https://modal.com/gpu-glossary/device-hardware/gpu-ram), use `gpu="A100"`: + +```python +@app.function(gpu="A100") +def qwen_7b(): + ... +``` + +Modal may automatically upgrade a `gpu="A100"` request to run on an 80 GB A100. +This automatic upgrade does _not_ change the cost of the GPU. + +You can specifically request a 40GB A100 with the string `A100-40GB`. +To specifically request an 80 GB A100, use the string `A100-80GB`: + +```python +@app.function(gpu="A100-80GB") +def llama_70b_fp8(): + ... +``` + +## GPU fallbacks + +Modal allows specifying a list of possible GPU types, suitable for Functions that are +compatible with multiple options. Modal respects the ordering of this list and +will try to allocate the most preferred GPU type before falling back to less +preferred ones. + +```python +@app.function(gpu=["H100", "A100-40GB:2"]) +def run_on_80gb(): + ... +``` + +See [this example](https://modal.com/docs/examples/gpu_fallbacks) for more detail. + +## Multi GPU training + +Modal currently supports multi-GPU training on a single node, with multi-node training in closed beta ([contact us](https://modal.com/slack) for access). +Depending on which framework you are using, you may need to use different techniques to train on multiple GPUs. 
+ +If the framework re-executes the entrypoint of the Python process (like [PyTorch Lightning](https://lightning.ai/docs/pytorch/stable/index.html)) you need to either set the strategy to `ddp_spawn` or `ddp_notebook` if you wish to invoke the training directly. Another option is to run the training script as a subprocess instead. + +```python +@app.function(gpu="A100:2") +def run(): + import subprocess + import sys + subprocess.run( + ["python", "train.py"], + stdout=sys.stdout, stderr=sys.stderr, + check=True, + ) +``` + +## Examples and more resources + +For more information about GPUs in general, check out our [GPU Glossary](https://modal.com/gpu-glossary/readme). + +Or take a look some examples of Modal apps using GPUs: + +- [Fine-tune a character LoRA for your pet](https://modal.com/docs/examples/diffusers_lora_finetune) +- [Fast LLM inference on big GPUs](https://modal.com/docs/examples/llm_inference) +- [Stable Diffusion with a CLI, API, and web UI](https://modal.com/docs/examples/text_to_image) +- [Rendering Blender videos](https://modal.com/docs/examples/blender_video) + +#### Using CUDA on Modal + +# Using CUDA on Modal + +Modal makes it easy to accelerate your workloads with datacenter-grade NVIDIA GPUs. + +To take advantage of the hardware, you need to use matching software: the CUDA stack. +This guide explains the components of that stack and how to install them on Modal. +For more on which GPUs are available on Modal and how to choose a GPU for your use case, +see [this guide](https://modal.com/docs/guide/gpu). For a deep dive on both the +[GPU hardware](https://modal.com/gpu-glossary/device-hardware) and [software](https://modal.com/gpu-glossary/device-software) +and for even more detail on [the CUDA stack](https://modal.com/gpu-glossary/host-software/), +see our [GPU Glossary](https://modal.com/gpu-glossary/readme). + +Here's the tl;dr: + +- The [NVIDIA Accelerated Graphics Driver for Linux-x86_64](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/#driver-installation), version 575.57.08, + and [CUDA Driver API](https://docs.nvidia.com/cuda/archive/12.9.0/cuda-driver-api/index.html), version 12.9, are already installed. + You can call `nvidia-smi` or run compiled CUDA programs from any Modal Function with access to a GPU. +- That means you can install many popular libraries like `torch` that bundle their other CUDA dependencies [with a simple `pip_install`](#install-gpu-accelerated-torch-and-transformers-with-pip_install). +- For bleeding-edge libraries like `flash-attn`, you may need to install CUDA dependencies manually. + To make your life easier, [use an existing image](#for-more-complex-setups-use-an-officially-supported-cuda-image). + +## What is CUDA? + +When someone refers to "installing CUDA" or "using CUDA", +they are referring not to a library, but to a +[stack](https://modal.com/gpu-glossary/host-software/cuda-software-platform) with multiple layers. +Your application code (and its dependencies) can interact +with the stack at different levels. + +![The CUDA stack](../../assets/docs/cuda-stack-diagram.png) + +This leads to a lot of confusion. To help clear that up, the following sections explain each component in detail. + +### Level 0: Kernel-mode driver components + +At the lowest level are the [_kernel-mode driver components_](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/#nvidia-open-gpu-kernel-modules). +The Linux kernel is essentially a single program operating the entire machine and all of its hardware. 
+To add hardware to the machine, this program is extended by loading new modules into it. +These components communicate directly with hardware -- in this case the GPU. + +Because they are kernel modules, these driver components are tightly integrated with the host operating system +that runs your containerized Modal Functions and are not something you can inspect or change yourself. + +### Level 1: User-mode driver API + +All action in Linux that doesn't occur in the kernel occurs in [user space](https://en.wikipedia.org/wiki/User_space). +To talk to the kernel drivers from our user space programs, we need _user-mode driver components_. + +Most prominently, that includes: + +- the [CUDA Driver API](https://modal.com/gpu-glossary/host-software/cuda-driver-api), + a [shared object](https://en.wikipedia.org/wiki/Shared_library) called `libcuda.so`. + This object exposes functions like [`cuMemAlloc`](https://docs.nvidia.com/cuda/archive/12.8.0/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1gb82d2a09844a58dd9e744dc31e8aa467), + for allocating GPU memory. +- the [NVIDIA management library](https://developer.nvidia.com/management-library-nvml), `libnvidia-ml.so`, and its command line interface [`nvidia-smi`](https://developer.nvidia.com/system-management-interface). + You can use these tools to check the status of the system's GPU(s). + +These components are installed on all Modal machines with access to GPUs. +Because they are user-level components, you can use them directly: + +```python runner:ModalRunner +import modal + +app = modal.App() + +@app.function(gpu="any") +def check_nvidia_smi(): + import subprocess + output = subprocess.check_output(["nvidia-smi"], text=True) + assert "Driver Version:" in output + assert "CUDA Version:" in output + print(output) + return output +``` + +### Level 2: CUDA Toolkit + +Wrapping the CUDA Driver API is the [CUDA Runtime API](https://modal.com/gpu-glossary/host-software/cuda-runtime-api), the `libcudart.so` shared library. +This API includes functions like [`cudaLaunchKernel`](https://docs.nvidia.com/cuda/archive/12.8.0/cuda-runtime-api/group__CUDART__HIGHLEVEL.html#group__CUDART__HIGHLEVEL_1g7656391f2e52f569214adbfc19689eb3) +and is more commonly used in CUDA programs (see [this HackerNews comment](https://news.ycombinator.com/item?id=20616385) for color commentary on why). +This shared library is _not_ installed by default on Modal. + +The CUDA Runtime API is generally installed as part of the larger [NVIDIA CUDA Toolkit](https://docs.nvidia.com/cuda/index.html), +which includes the [NVIDIA CUDA compiler driver](https://modal.com/gpu-glossary/host-software/nvcc) (`nvcc`) and its toolchain +and a number of [useful goodies](https://modal.com/gpu-glossary/host-software/cuda-binary-utilities) for writing and debugging CUDA programs (`cuobjdump`, `cudnn`, profilers, etc.). + +Contemporary GPU-accelerated machine learning workloads like LLM inference frequently make use of many components of the CUDA Toolkit, +such as the run-time compilation library [`nvrtc`](https://docs.nvidia.com/cuda/archive/12.8.0/nvrtc/index.html). + +So why aren't these components installed along with the drivers? +A compiled CUDA program can run without the CUDA Runtime API installed on the system, +by [statically linking](https://en.wikipedia.org/wiki/Static_library) the CUDA Runtime API into the program binary, +though this is fairly uncommon for CUDA-accelerated Python programs. 
+Additionally, older versions of these components are needed for some applications +and some application deployments even use several versions at once. +Both patterns are compatible with the host machine driver provided on Modal. + +## Install GPU-accelerated `torch` and `transformers` with `pip_install` + +The components of the CUDA Toolkit can be installed via `pip`, +via PyPI packages like [`nvidia-cuda-runtime-cu12`](https://pypi.org/project/nvidia-cuda-runtime-cu12/) +and [`nvidia-cuda-nvrtc-cu12`](https://pypi.org/project/nvidia-cuda-nvrtc-cu12/). +These components are listed as dependencies of some popular GPU-accelerated Python libraries, like `torch`. + +Because Modal already includes the lower parts of the CUDA stack, you can install these libraries +with [the `pip_install` method of `modal.Image`](https://modal.com/docs/guide/images#add-python-packages-with-pip_install), just like any other Python library: + +```python +image = modal.Image.debian_slim().pip_install("torch") + +@app.function(gpu="any", image=image) +def run_torch(): + import torch + has_cuda = torch.cuda.is_available() + print(f"It is {has_cuda} that torch can access CUDA") + return has_cuda +``` + +Many libraries for running open-weights models, like `transformers` and `vllm`, +use `torch` under the hood and so can be installed in the same way: + +```python +image = modal.Image.debian_slim().pip_install("transformers[torch]") +image = image.apt_install("ffmpeg") # for audio processing + +@app.function(gpu="any", image=image) +def run_transformers(): + from transformers import pipeline + transcriber = pipeline(model="openai/whisper-tiny.en", device="cuda") + result = transcriber("https://modal-cdn.com/mlk.flac") + print(result["text"]) # I have a dream that one day this nation will rise up live out the true meaning of its creed +``` + +## For more complex setups, use an officially-supported CUDA image + +The disadvantage of installing the CUDA stack via `pip` is that +many other libraries that depend on its components being installed as normal system packages cannot find them. + +For these cases, we recommend you use an image that already has the full CUDA stack installed as system packages +and all environment variables set correctly, like the [`nvidia/cuda:*-devel-*` images on Docker Hub](https://hub.docker.com/r/nvidia/cuda). + +[TensorRT-LLM](https://nvidia.github.io/TensorRT-LLM/overview.html) is an inference engine that accelerates and optimizes performance for the large language models. It requires the full CUDA toolkit for installation. 
+ +```python +cuda_version = "12.8.1" # should be no greater than host CUDA version +flavor = "devel" # includes full CUDA toolkit +operating_sys = "ubuntu24.04" +tag = f"{cuda_version}-{flavor}-{operating_sys}" +HF_CACHE_PATH = "/cache" + +image = ( + modal.Image.from_registry(f"nvidia/cuda:{tag}", add_python="3.12") + .entrypoint([]) # remove verbose logging by base image on entry + .apt_install("libopenmpi-dev") # required for tensorrt + .pip_install("tensorrt-llm==0.19.0", "pynvml", extra_index_url="https://pypi.nvidia.com") + .pip_install("hf-transfer", "huggingface_hub[hf_xet]") + .env({"HF_HUB_CACHE": HF_CACHE_PATH, "HF_HUB_ENABLE_HF_TRANSFER": "1", "PMIX_MCA_gds": "hash"}) +) + +app = modal.App("tensorrt-llm", image=image) +hf_cache_volume = modal.Volume.from_name("hf_cache_tensorrt", create_if_missing=True) + +@app.function(gpu="A10G", volumes={HF_CACHE_PATH: hf_cache_volume}) +def run_tiny_model(): + from tensorrt_llm import LLM, SamplingParams + + sampling_params = SamplingParams(temperature=0.8, top_p=0.95) + + llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0") + + output = llm.generate("The capital of France is", sampling_params) + print(f"Generated text: {output.outputs[0].text}") + return output.outputs[0].text +``` + +Make sure to choose a version of CUDA that is no greater than the version provided by the host machine. +Older minor (`12.*`) versions are guaranteed to be compatible with the host machine's driver, +but older major (`11.*`, `10.*`, etc.) versions may not be. + +## What next? + +For more on accessing and choosing GPUs on Modal, check out [this guide](https://modal.com/docs/guide/gpu). +To dive deep on GPU internals, check out our [GPU Glossary](https://modal.com/gpu-glossary/readme). + +To see these installation patterns in action, check out these examples: + +- [Fast LLM inference on big GPUs](https://modal.com/docs/examples/llm_inference) +- [Finetune a character LoRA for your pet](https://modal.com/docs/examples/diffusers_lora_finetune) +- [Optimized Flux inference](https://modal.com/docs/examples/flux) + +#### Reserving CPU and memory + +# Reserving CPU and memory + +Each Modal container has a default reservation of 0.125 CPU cores and 128 MiB of memory. +Containers can exceed this minimum if the worker has available CPU or memory. +You can also guarantee access to more resources by requesting a higher reservation. + +## CPU cores + +If you have code that must run on a larger number of cores, you can +request that using the `cpu` argument. This allows you to specify a +floating-point number of CPU cores: + +```python +import modal + +app = modal.App() + +@app.function(cpu=8.0) +def my_function(): + # code here will have access to at least 8.0 cores + ... +``` + +Note that this value corresponds to physical cores, not vCPUs. + +Modal also will set several environment variables that control multi-threading +behavior in linear algebra libraries (e.g., `OPENBLAS_NUM_THREADS`, +`OMP_NUM_THREADS`, `MKL_NUM_THREADS`) based on your CPU reservation. + +## Memory + +If you have code that needs more guaranteed memory, you can request it using the +`memory` argument. This expects an integer number of megabytes: + +```python +import modal + +app = modal.App() + +@app.function(memory=32768) +def my_function(): + # code here will have access to at least 32 GiB of RAM + ... +``` + +## How much can I request? + +For both CPU and memory, a maximum is enforced at Function creation time to +ensure your containers can be scheduled for execution. 
Requests exceeding the +maximum will be rejected with an +[`InvalidError`](https://modal.com/docs/reference/modal.exception#modalexceptioninvaliderror). + +## Billing + +For CPU and memory, you'll be charged based on whichever is higher: your reservation or actual usage. + +Disk requests are billed by increasing the memory request at a 20:1 ratio. For example, requesting 500 GiB of disk will increase the memory request to 25 GiB, if it is not already set higher. + +## Resource limits + +### CPU limits + +Modal containers have a default soft CPU limit that is set at 16 physical cores above the CPU request. +Given that the default CPU request is 0.125 cores, the default soft CPU limit is 16.125 cores. +Above this limit, the host will begin to throttle the CPU usage of the container. + +You can alternatively set the CPU limit explicitly: + +```python +cpu_request = 1.0 +cpu_limit = 4.0 +@app.function(cpu=(cpu_request, cpu_limit)) +def f(): + ... +``` + +### Memory limits + +Modal containers can have a hard memory limit which will 'Out of Memory' (OOM) kill +containers which attempt to exceed the limit. This functionality is useful when a process +has a serious memory leak. You can set the limit and have the container killed to avoid paying +for the leaked GBs of memory. + +Specify this limit using the [`memory` parameter](https://modal.com/docs/reference/modal.App#function): + +```python +mem_request = 1024 +mem_limit = 2048 +@app.function( + memory=(mem_request, mem_limit), +) +def f(): + ... +``` + +### Disk limits + +Running Modal containers have access to many GBs of SSD disk, but the amount +of writes is limited by: + +1. The size of the underlying worker's SSD disk capacity +2. A per-container disk quota that is set in the 100s of GBs. + +Hitting either limit will cause the container's disk writes to be rejected, which +typically manifests as an `OSError`. + +Increased disk sizes can be requested with the [`ephemeral_disk` parameter](https://modal.com/docs/reference/modal.App#function). The maximum +disk size is 3.0 TiB (3,145,728 MiB). Larger disks are intended to be used for [dataset processing](https://modal.com/docs/guide/dataset-ingestion). + +### Scaling out + +#### Scaling out + +# Scaling out + +Modal makes it trivially easy to scale compute across thousands of containers. +You won't have to worry about your App crashing if it goes viral or need to wait +a long time for your batch jobs to complete. + +For the the most part, scaling out will happen automatically, and you won't need +to think about it. But it can be helpful to understand how Modal's autoscaler +works and how you can control its behavior when you need finer control. + +## How does autoscaling work on Modal? + +Every Modal Function corresponds to an autoscaling pool of containers. The size +of the pool is managed by Modal's autoscaler. The autoscaler will spin up new +containers when there is no capacity available for new inputs, and it will spin +down containers when resources are idling. By default, Modal Functions will +scale to zero when there are no inputs to process. + +Autoscaling decisions are made quickly and frequently so that your batch jobs +can ramp up fast and your deployed Apps can respond to any sudden changes in +traffic. + +## Configuring autoscaling behavior + +Modal exposes a few settings that allow you to configure the autoscaler's +behavior. 
These settings can be passed to the `@app.function` or `@app.cls` +decorators: + +- `max_containers`: The upper limit on containers for the specific Function. +- `min_containers`: The minimum number of containers that should be kept warm, + even when the Function is inactive. +- `buffer_containers`: The size of the buffer to maintain while the Function is + active, so that additional inputs will not need to queue for a new container. +- `scaledown_window`: The maximum duration (in seconds) that individual + containers can remain idle when scaling down. + +In general, these settings allow you to trade off cost and latency. Maintaining +a larger warm pool or idle buffer will increase costs but reduce the chance that +inputs will need to wait for a new container to start. + +Similarly, a longer scaledown window will let containers idle for longer, which +might help avoid unnecessary churn for Apps that receive regular but infrequent +inputs. Note that containers may not wait for the entire scaledown window before +shutting down if the App is substantially overprovisioned. + +## Dynamic autoscaler updates + +It's also possible to update the autoscaler settings dynamically (i.e., without redeploying +the App) using the [`Function.update_autoscaler()`](https://modal.com/docs/reference/modal.Function#update_autoscaler) +method: + +```python notest +f = modal.Function.from_name("my-app", "f") +f.update_autoscaler(max_containers=100) +``` + +The autoscaler settings will revert to the configuration in the function +decorator the next time you deploy the App. Or they can be overridden by +further dynamic updates: + +```python notest +f.update_autoscaler(min_containers=2, max_containers=10) +f.update_autoscaler(min_containers=4) # max_containers=10 will still be in effect +``` + +A common pattern is to run this method in a [scheduled function](https://modal.com/docs/guide/cron) +that adjusts the size of the warm pool (or container buffer) based on the time of day: + +```python +@app.function() +def inference_server(): + ... + +@app.function(schedule=modal.Cron("0 6 * * *", timezone="America/New_York")) +def increase_warm_pool(): + inference_server.update_autoscaler(min_containers=4) + +@app.function(schedule=modal.Cron("0 22 * * *", timezone="America/New_York")) +def decrease_warm_pool(): + inference_server.update_autoscaler(min_containers=0) +``` + +When you have a [`modal.Cls`](https://modal.com/docs/reference/modal.Cls), `update_autoscaler` +is a method on an _instance_ and will control the autoscaling behavior of +containers serving the Function with that specific set of parameters: + +```python notest +MyClass = modal.Cls.from_name("my-app", "MyClass") +obj = MyClass(model_version="3.5") +obj.update_autoscaler(buffer_containers=2) # type: ignore +``` + +Note that it's necessary to disable type checking on this line, because the +object will appear as an instance of the class that you defined rather than the +Modal wrapper type. + +## Parallel execution of inputs + +If your code is running the same function repeatedly with different independent +inputs (e.g., a grid search), the easiest way to increase performance is to run +those function calls in parallel using Modal's +[`Function.map()`](https://modal.com/docs/reference/modal.Function#map) method. + +Here is an example if we had a function `evaluate_model` that takes a single +argument: + +```python +import modal + +app = modal.App() + +@app.function() +def evaluate_model(x): + ... 
+ +@app.local_entrypoint() +def main(): + inputs = list(range(100)) + for result in evaluate_model.map(inputs): # runs many inputs in parallel + ... +``` + +In this example, `evaluate_model` will be called with each of the 100 inputs +(the numbers 0 - 99 in this case) roughly in parallel and the results are +returned as an iterable with the results ordered in the same way as the inputs. + +### Exceptions + +By default, if any of the function calls raises an exception, the exception will +be propagated. To treat exceptions as successful results and aggregate them in +the results list, pass in +[`return_exceptions=True`](https://modal.com/docs/reference/modal.Function#map). + +```python +@app.function() +def my_func(a): + if a == 2: + raise Exception("ohno") + return a ** 2 + +@app.local_entrypoint() +def main(): + print(list(my_func.map(range(3), return_exceptions=True, wrap_returned_exceptions=False))) + # [0, 1, Exception('ohno'))] +``` + +Note: prior to version 1.0.5, the returned exceptions inadvertently leaked an internal +wrapper type (`modal.exceptions.UserCodeException`). To avoid breaking any user code that +was checking exception types, we're taking a gradual approach to fixing this bug. Adding +`wrap_returned_exceptions=False` will opt-in to the future default behavior and return the +underlying exception type without a wrapper. + +### Starmap + +If your function takes multiple variable arguments, you can either use +[`Function.map()`](https://modal.com/docs/reference/modal.Function#map) with one input iterator +per argument, or [`Function.starmap()`](https://modal.com/docs/reference/modal.Function#starmap) +with a single input iterator containing sequences (like tuples) that can be +spread over the arguments. This works similarly to Python's built in `map` and +`itertools.starmap`. + +```python +@app.function() +def my_func(a, b): + return a + b + +@app.local_entrypoint() +def main(): + assert list(my_func.starmap([(1, 2), (3, 4)])) == [3, 7] +``` + +### Gotchas + +Note that `.map()` is a method on the modal function object itself, so you don't +explicitly _call_ the function. + +Incorrect usage: + +```python notest +results = evaluate_model(inputs).map() +``` + +Modal's map is also not the same as using Python's builtin `map()`. While the +following will technically work, it will execute all inputs in sequence rather +than in parallel. + +Incorrect usage: + +```python notest +results = map(evaluate_model, inputs) +``` + +## Asynchronous usage + +All Modal APIs are available in both blocking and asynchronous variants. If you +are comfortable with asynchronous programming, you can use it to create +arbitrary parallel execution patterns, with the added benefit that any Modal +functions will be executed remotely. See the [async guide](https://modal.com/docs/guide/async) or +the examples for more information about asynchronous usage. + +## GPU acceleration + +Sometimes you can speed up your applications by utilizing GPU acceleration. See +the [gpu section](https://modal.com/docs/guide/gpu) for more information. + +## Scaling Limits + +Modal enforces the following limits for every function: + +- 2,000 pending inputs (inputs that haven't been assigned to a container yet) +- 25,000 total inputs (which include both running and pending inputs) + +For inputs created with `.spawn()` for async jobs, Modal allows up to 1 million pending inputs instead of 2,000. 
+ +If you try to create more inputs and exceed these limits, you'll receive a `Resource Exhausted` error, and you should retry your request later. If you need higher limits, please reach out! + +Additionally, each `.map()` invocation can process at most 1000 inputs concurrently. + +#### Input concurrency + +# Input concurrency + +As traffic to your application increases, Modal will automatically scale up the +number of containers running your Function: + +
+ +By default, each container will be assigned one input at a time. Autoscaling +across containers allows your Function to process inputs in parallel. This is +ideal when the operations performed by your Function are CPU-bound. + +For some workloads, though, it is inefficient for containers to process inputs +one-by-one. Modal supports these workloads with its _input concurrency_ feature, +which allows individual containers to process multiple inputs at the same time: + +
+ +When used effectively, input concurrency can reduce latency and lower costs. + +## Use cases + +Input concurrency can be especially effective for workloads that are primarily +I/O-bound, e.g.: + +- Querying a database +- Making external API requests +- Making remote calls to other Modal Functions + +For such workloads, individual containers may be able to concurrently process +large numbers of inputs with minimal additional latency. This means that your +Modal application will be more efficient overall, as it won't need to scale +containers up and down as traffic ebbs and flows. + +Another use case is to leverage _continuous batching_ on GPU-accelerated +containers. Frameworks such as [vLLM](https://modal.com/docs/examples/llm_inference) can +achieve the benefits of batching across multiple inputs even when those +inputs do not arrive simultaneously (because new batches are formed for each +forward pass of the model). + +Note that for CPU-bound workloads, input concurrency will likely not be as +effective (or will even be counterproductive), and you may want to use +Modal's [_dynamic batching_ feature](https://modal.com/docs/guide/dynamic-batching) instead. + +## Enabling input concurrency + +To enable input concurrency, add the `@modal.concurrent` decorator: + +```python +@app.function() +@modal.concurrent(max_inputs=100) +def my_function(input: str): + ... + +``` + +When using the class pattern, the decorator should be applied at the level of +the _class_, not on individual methods: + +```python +@app.cls() +@modal.concurrent(max_inputs=100) +class MyCls: + + @modal.method() + def my_method(self, input: str): + ... +``` + +Because all methods on a class will be served by the same containers, a class +with input concurrency enabled will concurrently run distinct methods in +addition to multiple inputs for the same method. + +**Note:** The `@modal.concurrent` decorator was added in v0.73.148 of the Modal +Python SDK. Input concurrency could previously be enabled by setting the +`allow_concurrent_inputs` parameter on the `@app.function` decorator. + +## Setting a concurrency target + +When using the `@modal.concurrent` decorator, you must always configure the +maximum number of inputs that each container will concurrently process. If +demand exceeds this limit, Modal will automatically scale up more containers. + +Additional inputs may need to queue up while these additional containers cold +start. To help avoid degraded latency during scaleup, the `@modal.concurrent` +decorator has a separate `target_inputs` parameter. When set, Modal's autoscaler +will aim for this target as it provisions resources. If demand increases faster +than new containers can spin up, the active containers will be allowed to burst +above the target up to the `max_inputs` limit: + +```python +@app.function() +@modal.concurrent(max_inputs=120, target_inputs=100) # Allow a 20% burst +def my_function(input: str): + ... +``` + +It may take some experimentation to find the right settings for these parameters +in your particular application. Our suggestion is to set the `target_inputs` +based on your desired latency and the `max_inputs` based on resource constraints +(i.e., to avoid GPU OOM). You may also consider the relative latency cost of +scaling up a new container versus overloading the existing containers. + +## Concurrency mechanisms + +Modal uses different concurrency mechanisms to execute your Function depending +on whether it is defined as synchronous or asynchronous. 
Each mechanism imposes +certain requirements on the Function implementation. Input concurrency is an +advanced feature, and it's important to make sure that your implementation +complies with these requirements to avoid unexpected behavior. + +For synchronous Functions, Modal will execute concurrent inputs on separate +threads. _This means that the Function implementation must be thread-safe._ + +```python +# Each container can execute up to 10 inputs in separate threads +@app.function() +@modal.concurrent(max_inputs=10) +def sleep_sync(): + # Function must be thread-safe + time.sleep(1) +``` + +For asynchronous Functions, Modal will execute concurrent inputs using +separate `asyncio` tasks on a single thread. This does not require thread +safety, but it does mean that the Function needs to participate in +collaborative multitasking (i.e., it should not block the event loop). + +```python +# Each container can execute up to 10 inputs with separate async tasks +@app.function() +@modal.concurrent(max_inputs=10) +async def sleep_async(): + # Function must not block the event loop + await asyncio.sleep(1) +``` + +## Gotchas + +Input concurrency is a powerful feature, but there are a few caveats that can +be useful to be aware of before adopting it. + +### Input cancellations + +Synchronous and asynchronous Functions handle input cancellations differently. +Modal will raise a `modal.exception.InputCancellation` exception in synchronous +Functions and an `asyncio.CancelledError` in asynchronous Functions. + +When using input concurrency with a synchronous Function, a single input +cancellation will terminate the entire container. If your workflow depends on +graceful input cancellations, we recommend using an asynchronous +implementation. + +### Concurrent logging + +The separate threads or tasks that are executing the concurrent inputs will +write any logs to the same stream. This makes it difficult to associate logs +with a specific input, and filtering for a specific function call in Modal's web +dashboard will show logs for all inputs running at the same time. + +To work around this, we recommend including a unique identifier in the messages +you log (either your own identifier or the `modal.current_input_id()`) so that +you can use the search functionality to surface logs for a specific input: + +```python +@app.function() +@modal.concurrent(max_inputs=10) +async def better_concurrent_logging(x: int): + logger.info(f"{modal.current_input_id()}: Starting work with {x}") +``` + +#### Batch processing + +# Batch Processing + +Modal is optimized for large-scale batch processing, allowing functions to scale to thousands of parallel containers with zero additional configuration. Function calls can be submitted asynchronously for background execution, eliminating the need to wait for jobs to finish or tune resource allocation. + +This guide covers Modal's batch processing capabilities, from basic invocation to integration with existing pipelines. + +## Background Execution with `.spawn_map` + +The fastest way to submit multiple jobs for asynchronous processing is by invoking a function with `.spawn_map`. When combined with the [`--detach`](https://modal.com/docs/reference/cli/run) flag, your App continues running until all jobs are completed. + +Here's an example of submitting 100,000 videos for parallel embedding. 
You can disconnect after submission, and the processing will continue to completion in the background: + +```python +# Kick off asynchronous jobs with `modal run --detach batch_processing.py` +import modal + +app = modal.App("batch-processing-example") +volume = modal.Volume.from_name("video-embeddings", create_if_missing=True) + +@app.function(volumes={"/data": volume}) +def embed_video(video_id: int): + # Business logic: + # - Load the video from the volume + # - Embed the video + # - Save the embedding to the volume + ... + +@app.local_entrypoint() +def main(): + embed_video.spawn_map(range(100_000)) +``` + +This pattern works best for jobs that store results externally—for example, in a [Modal Volume](https://modal.com/docs/guide/volumes), [Cloud Bucket Mount](https://modal.com/docs/guide/cloud-bucket-mounts), or your own database\*. + +_\* For database connections, consider using [Modal Proxy](https://modal.com/docs/guide/proxy-ips) to maintain a static IP across thousands of containers._ + +## Parallel Processing with `.map` + +Using `.map` allows you to offload expensive computations to powerful machines while gathering results. This is particularly useful for pipeline steps with bursty resource demands. Modal handles all infrastructure provisioning and de-provisioning automatically. + +Here's how to implement parallel video similarity queries as a single Modal function call: + +```python +# Run jobs and collect results with `modal run gather.py` +import modal + +app = modal.App("gather-results-example") + +@app.function(gpu="L40S") +def compute_video_similarity(query: str, video_id: int) -> tuple[int, int]: + # Embed video with GPU acceleration & compute similarity with query + return video_id, score + +@app.local_entrypoint() +def main(): + import itertools + + queries = itertools.repeat("Modal for batch processing") + video_ids = range(100_000) + + for video_id, score in compute_video_similarity.map(queries, video_ids): + # Process results (e.g., extract top 5 most similar videos) + pass +``` + +This example runs `compute_video_similarity` on an autoscaling pool of L40S GPUs, returning scores to a local process for further processing. + +## Integration with Existing Systems + +The recommended way to use Modal Functions within your existing data pipeline is through [deployed function invocation](https://modal.com/docs/guide/trigger-deployed-functions). After deployment, you can call Modal functions from external systems: + +```python +def external_function(inputs): + compute_similarity = modal.Function.from_name( + "gather-results-example", + "compute_video_similarity" + ) + for result in compute_similarity.map(inputs): + # Process results + pass +``` + +You can invoke Modal Functions from any Python context, gaining access to built-in observability, resource management, and GPU acceleration. + +#### Job queues + +# Job processing + +Modal can be used as a scalable job queue to handle asynchronous tasks submitted +from a web app or any other Python application. This allows you to offload up to 1 million +long-running or resource-intensive tasks to Modal, while your main application +remains responsive. + +## Creating jobs with .spawn() + +The basic pattern for using Modal as a job queue involves three key steps: + +1. Defining and deploying the job processing function using `modal deploy`. +2. Submitting a job using + [`modal.Function.spawn()`](https://modal.com/docs/reference/modal.Function#spawn) +3. 
Polling for the job's result using + [`modal.FunctionCall.get()`](https://modal.com/docs/reference/modal.FunctionCall#get) + +Here's a simple example that you can run with `modal run my_job_queue.py`: + +```python +# my_job_queue.py +import modal + +app = modal.App("my-job-queue") + +@app.function() +def process_job(data): + # Perform the job processing here + return {"result": data} + +def submit_job(data): + # Since the `process_job` function is deployed, need to first look it up + process_job = modal.Function.from_name("my-job-queue", "process_job") + call = process_job.spawn(data) + return call.object_id + +def get_job_result(call_id): + function_call = modal.FunctionCall.from_id(call_id) + try: + result = function_call.get(timeout=5) + except modal.exception.OutputExpiredError: + result = {"result": "expired"} + except TimeoutError: + result = {"result": "pending"} + return result + +@app.local_entrypoint() +def main(): + data = "my-data" + + # Submit the job to Modal + call_id = submit_job(data) + print(get_job_result(call_id)) +``` + +In this example: + +- `process_job` is the Modal function that performs the actual job processing. + To deploy the `process_job` function on Modal, run + `modal deploy my_job_queue.py`. +- `submit_job` submits a new job by first looking up the deployed `process_job` + function, then calling `.spawn()` with the job data. It returns the unique ID + of the spawned function call. +- `get_job_result` attempts to retrieve the result of a previously submitted job + using [`FunctionCall.from_id()`](https://modal.com/docs/reference/modal.FunctionCall#from_id) and + [`FunctionCall.get()`](https://modal.com/docs/reference/modal.FunctionCall#get). + [`FunctionCall.get()`](https://modal.com/docs/reference/modal.FunctionCall#get) waits indefinitely + by default. It takes an optional timeout argument that specifies the maximum + number of seconds to wait, which can be set to 0 to poll for an output + immediately. Here, if the job hasn't completed yet, we return a pending + response. +- The results of a `.spawn()` are accessible via `FunctionCall.get()` for up to + 7 days after completion. After this period, we return an expired response. + +[Document OCR Web App](https://modal.com/docs/examples/doc_ocr_webapp) is an example that uses +this pattern. + +## Integration with web frameworks + +You can easily integrate the job queue pattern with web frameworks like FastAPI. +Here's an example, assuming that you have already deployed `process_job` on +Modal with `modal deploy` as above. This example won't work if you haven't +deployed your app yet. 
+ +```python +# my_job_queue_endpoint.py +import modal + +image = modal.Image.debian_slim().pip_install("fastapi[standard]") +app = modal.App("fastapi-modal", image=image) + +@app.function() +@modal.asgi_app() +@modal.concurrent(max_inputs=20) +def fastapi_app(): + from fastapi import FastAPI + + web_app = FastAPI() + + @web_app.post("/submit") + async def submit_job_endpoint(data): + process_job = modal.Function.from_name("my-job-queue", "process_job") + + call = await process_job.spawn.aio(data) + return {"call_id": call.object_id} + + @web_app.get("/result/{call_id}") + async def get_job_result_endpoint(call_id: str): + function_call = modal.FunctionCall.from_id(call_id) + try: + result = await function_call.get.aio(timeout=0) + except modal.exception.OutputExpiredError: + return fastapi.responses.JSONResponse(content="", status_code=404) + except TimeoutError: + return fastapi.responses.JSONResponse(content="", status_code=202) + + return result + + return web_app +``` + +In this example: + +- The `/submit` endpoint accepts job data, submits a new job using + `await process_job.spawn.aio()`, and returns the job's ID to the client. +- The `/result/{call_id}` endpoint allows the client to poll for the job's + result using the job ID. If the job hasn't completed yet, it returns a 202 + status code to indicate that the job is still being processed. If the job + has expired, it returns a 404 status code to indicate that the job is not found. + +You can try this app by serving it with `modal serve`: + +```shell +modal serve my_job_queue_endpoint.py +``` + +Then interact with its endpoints with `curl`: + +```shell +# Make a POST request to your app endpoint with. +$ curl -X POST $YOUR_APP_ENDPOINT/submit?data=data +{"call_id":"fc-XXX"} + +# Use the call_id value from above. +$ curl -X GET $YOUR_APP_ENDPOINT/result/fc-XXX +``` + +## Scaling and reliability + +Modal automatically scales the job queue based on the workload, spinning up new +instances as needed to process jobs concurrently. It also provides built-in +reliability features like automatic retries and timeout handling. + +You can customize the behavior of the job queue by configuring the +`@app.function()` decorator with options like +[`retries`](https://modal.com/docs/guide/retries#function-retries), +[`timeout`](https://modal.com/docs/guide/timeouts#timeouts), and +[`max_containers`](https://modal.com/docs/guide/scale#configuring-autoscaling-behavior). + +#### Dynamic batching (beta) + +# Dynamic batching (beta) + +Modal's `@batched` feature allows you to accumulate requests +and process them in dynamically-sized batches, rather than one-by-one. + +Batching increases throughput at a potential cost to latency. +Batched requests can share resources and reuse work, reducing the time and cost per request. +Batching is particularly useful for GPU-accelerated machine learning workloads, +as GPUs are designed to maximize throughput and are frequently bottlenecked on shareable resources, +like weights stored in memory. + +Static batching can lead to unbounded latency, as the function waits for a fixed number of requests to arrive. +Modal's dynamic batching waits for the lesser of a fixed time _or_ a fixed number of requests before executing, +maximizing the throughput benefit of batching while minimizing the latency penalty. + +## Enable dynamic batching with `@batched` + +To enable dynamic batching, apply the +[`@modal.batched` decorator](https://modal.com/docs/reference/modal.batched) to the target +Python function. 
Then, wrap it in `@app.function()` and run it on Modal, +and the inputs will be accumulated and processed in batches. + +Here's what that looks like: + +```python +import modal + +app = modal.App() + +@app.function() +@modal.batched(max_batch_size=2, wait_ms=1000) +async def batch_add(xs: list[int], ys: list[int]) -> list[int]: + return [x + y for x, y in zip(xs, ys)] +``` + +When you invoke a function decorated with `@batched`, you invoke it asynchronously on individual inputs. +Outputs are returned where they were invoked. + +For instance, the code below invokes the decorated `batch_add` function above three times, but `batch_add` +only executes twice: + +```python continuation +@app.local_entrypoint() +async def main(): + inputs = [(1, 300), (2, 200), (3, 100)] + async for result in batch_add.starmap.aio(inputs): + print(f"Sum: {result}") + # Sum: 301 + # Sum: 202 + # Sum: 103 +``` + +The first time it is executed with `xs` batched to `[1, 2]` +and `ys` batched to `[300, 200]`. After about a one second delay, it is executed with `xs` +batched to `[3]` and `ys` batched to `[100]`. +The result is an iterator that yields `301`, `202`, and `101`. + +## Use `@batched` with functions that take and return lists + +For a Python function to be compatible with `@modal.batched`, it must adhere to +the following rules: + +- ** The inputs to the function must be lists. ** + In the example above, we pass `xs` and `ys`, which are both lists of `int`s. +- ** The function must return a list**. In the example above, the function returns + a list of sums. +- ** The lengths of all the input lists and the output list must be the same. ** + In the example above, if `L == len(xs) == len(ys)`, then `L == len(batch_add(xs, ys))`. + +## Modal `Cls` methods are compatible with dynamic batching + +Methods on Modal [`Cls`](https://modal.com/docs/guide/lifecycle-functions)es also support dynamic batching. + +```python +import modal + +app = modal.App() + +@app.cls() +class BatchedClass(): + @modal.batched(max_batch_size=2, wait_ms=1000) + async def batch_add(self, xs: list[int], ys: list[int]) -> list[int]: + return [x + y for x, y in zip(xs, ys)] +``` + +One additional rule applies to classes with Batched Methods: + +- If a class has a Batched Method, it **cannot have other Batched Methods or [Methods](https://modal.com/docs/reference/modal.method#modalmethod)**. + +## Configure the wait time and batch size of dynamic batches + +The `@batched` decorator takes in two required configuration parameters: + +- `max_batch_size` limits the number of inputs combined into a single batch. +- `wait_ms` limits the amount of time the Function waits for more inputs after + the first input is received. + +The first invocation of the Batched Function initiates a new batch, and subsequent +calls add requests to this ongoing batch. If `max_batch_size` is reached, +the batch immediately executes. If the `max_batch_size` is not met but `wait_ms` +has passed since the first request was added to the batch, the unfilled batch is +executed. + +### Selecting a batch configuration + +To optimize the batching configurations for your application, consider the following heuristics: + +- Set `max_batch_size` to the largest value your function can handle, so you + can amortize and parallelize as much work as possible. + +- Set `wait_ms` to the difference between your targeted latency and the execution time. Most applications + have a targeted latency, and this allows the latency of any request to stay + within that limit. 
+ +## Serve web endpoints with dynamic batching + +Here's a simple example of serving a Function that batches requests dynamically +with a [`@modal.fastapi_endpoint`](https://modal.com/docs/guide/webhooks). Run +[`modal serve`](https://modal.com/docs/reference/cli/serve), submit requests to the endpoint, +and the Function will batch your requests on the fly. + +```python +import modal + +app = modal.App(image=modal.Image.debian_slim().pip_install("fastapi")) + +@app.function() +@modal.batched(max_batch_size=2, wait_ms=1000) +async def batch_add(xs: list[int], ys: list[int]) -> list[int]: + return [x + y for x, y in zip(xs, ys)] + +@app.function() +@modal.fastapi_endpoint(method="POST", docs=True) +async def add(body: dict[str, int]) -> dict[str, int]: + result = await batch_add.remote.aio(body["x"], body["y"]) + return {"result": result} +``` + +Now, you can submit requests to the web endpoint and process them in batches. For instance, the three requests +in the following example, which might be requests from concurrent clients in a real deployment, +will be batched into two executions: + +```python notest +import asyncio +import aiohttp + +async def send_post_request(session, url, data): + async with session.post(url, json=data) as response: + return await response.json() + +async def main(): + # Enter the URL of your web endpoint here + url = "https://workspace--app-name-endpoint-name.modal.run" + + async with aiohttp.ClientSession() as session: + # Submit three requests asynchronously + tasks = [ + send_post_request(session, url, {"x": 1, "y": 300}), + send_post_request(session, url, {"x": 2, "y": 200}), + send_post_request(session, url, {"x": 3, "y": 100}), + ] + results = await asyncio.gather(*tasks) + for result in results: + print(f"Sum: {result['result']}") + +asyncio.run(main()) +``` + +#### Multi-node clusters (beta) + +# Multi-node clusters (beta) + +> 🚄 Multi-node clusters with RDMA are in **private beta.** Please contact us via the [Modal Slack](https://modal.com/slack) or support@modal.com to get access. + +Modal supports running a training job across several coordinated containers. Each container can saturate the available GPU devices on its host (aka node) and communicate with peer containers which do the same. By scaling a training job from a single GPU to 16 GPUs you can achieve nearly 16x improvements in training time. + +### Cluster compute capability + +Modal H100 clusters provide: + +- A 50 Gbps [IPv6 private network](https://modal.com/docs/guide/private-networking) for orchestration, dataset downloading, etc. +- A 3,200 Gbps RDMA scale-out network ([RoCE](https://en.wikipedia.org/wiki/RDMA_over_Converged_Ethernet)). +- Up to 64 H100 SXM devices. +- At least 1 TB of RAM and 4 TB of local NVMe SSD per node. +- Deep burn-in testing. +- Interoperability with all Modal platform functionality ([Volumes](https://modal.com/docs/guide/volumes), [Dicts](https://modal.com/docs/guide/dicts), [Tunnels](https://modal.com/docs/guide/tunnels), etc.). + +The guide will walk you through how the Modal client library enables multi-node training and integrates with `torchrun`. + +### `@clustered` + +Unlike standard Modal Function containers, containers in a multi-node training job must be able to: + +1. Perform fast, direct network communication between each other. +2. Be scheduled together, all or nothing, at the same time. + +The `@clustered` decorator enables this behavior. 
+ +```python +import modal.experimental + +@app.function( + gpu="H100:8", + timeout=60 * 60 * 24, + retries=modal.Retries(initial_delay=0.0, max_retries=10), +) +@modal.experimental.clustered(size=4) +def train_model(): + cluster_info = modal.experimental.get_cluster_info() + + container_rank = cluster_info.rank + world_size = len(cluster_info.container_ips) + main_addr = cluster_info.container_ips[0] + is_main = "(main)" if container_rank == 0 else "" + + print(f"{container_rank=} {is_main} {world_size=} {main_addr=}") + ... +``` + +Applying this decorator under `@app.function` modifies the Function so that remote calls to it are serviced by a multi-node container group. The above configuration creates a group of four containers each having 8 H100 GPU devices, for a total of 32 devices. + +## Scheduling + +A `modal.experimental.clustered` Function runs on multiple nodes in our cloud, but executes like a normal function call. For example, all nodes are scheduled together ([gang scheduling](https://en.wikipedia.org/wiki/Gang_scheduling)) so that your code runs on all of the requested hardware or not at all. + +Traditionally this kind of cluster and scheduling management would be handled by SLURM, Kubernetes, or manually. But with Modal it's all provided serverlessly with just a Python decorator! + +### Rank & input broadcast + +![diagram](https://modal-cdn.com/cdnbot/multinodepmgnla70_4b57a155.webp) + +You may notice above that a single `.remote` Function call created three input executions but returned only one output. This is how input-output is structured for multi-node training jobs on Modal. The Function call’s arguments are replicated to each container, but only the rank zero container’s is returned to the caller. + +A container’s rank is a key concept in multi-node jobs. Rank zero is the 'leader' rank and typically coordinates the job. Rank zero is also known as the "main" container. Rank zero's output will always be the output of a multi-node training run. + +## Networking + +Function containers cannot normally make direct network connections to other Function containers, but this is a requirement for multi-node training communication. So, along with gang scheduling, the `@clustered` decorator enables Modal’s workspace-private inter-container networking called [i6pn](https://www.notion.so/Multi-node-docs-1281e7f16949806f966adedfe8b2cb74?pvs=21). + +The [cluster networking guide](https://modal.com/docs/guide/private-networking) goes into more detail on i6pn, but the upshot is that each container in the cluster is made aware of the network address of all the other containers in the cluster, enabling them to communicate with each other quickly via [TCP](https://pytorch.org/docs/stable/elastic/rendezvous.html). + +### RDMA (Infiniband) + +H100 clusters are equipped with Infiniband providing up to 3,200 Gbps scale-out bandwidth for inter-node communication. +RDMA scale-out networking is enabled with the `rdma` parameter of `modal.experimental.clustered`. + +```python notest +@modal.experimental.clustered(size=2, rdma=True) +def train(): + ... +``` + +To run a simple Infiniband RDMA performance test see the [this sample code](https://github.com/modal-labs/multinode-training-guide/tree/main/benchmark). + +## Cluster Info + +`modal.experimental.get_cluster_info()` exposes the following information about the cluster: + +- `rank: int` is the current container's order within the cluster, starting from `0`, the leader. 
+- `container_ips: list[str]` contains the IPv6 addresses of each container in the cluster, sorted by rank. +- `container_ipv4_ips: list[str]` contains the IPv4 addresses of each container in the cluster, sorted by rank. + +## Fault Tolerance + +For a clustered Function, failures in inputs and containers are handled differently. + +If an input fails on any container, this failure **is not propagated** to other containers in the cluster. Containers are responsible for detecting and responding to input failures on other containers. + +Only rank 0's output matters: if an input fails on the leader container (rank 0), the input is marked as failed, even if the input succeeds on another container. Similarly, if an input succeeds on the leader container but fails on another container, the input will still be marked as successful. + +If a container in the cluster is preempted, Modal will terminate all remaining containers in the cluster, and retry the input. + +### Input Synchronization + +_**Important:**_ synchronization is not relevant for single training runs, and applies mostly to inference use-cases. + +Modal does not synchronize input execution across containers. Containers are responsible for ensuring that they do not process inputs faster than other containers in their cluster. + +In particular, it is important that the leader container (rank 0) only starts processing the next input after all other containers have finished processing the current input. + +## Examples + +To get hands-on with multi-node training you can jump into the [`multinode-training-guide` repository](https://github.com/modal-labs/multinode-training-guide) or [`modal-examples` repository](https://github.com/modal-labs/modal-examples/tree/main/14_clusters) and `modal run` something! + +- [Simple ‘hello world’ 4 X 1 H100 torch cluster example](https://github.com/modal-labs/modal-examples/blob/main/14_clusters/simple_torch_cluster.py) +- [Infiniband RDMA performance test](https://github.com/modal-labs/multinode-training-guide/tree/main/benchmark) +- [Use 2 x 8 H100s to train a ResNet50 model on the ImageNet dataset](https://github.com/modal-labs/multinode-training-guide/tree/main/resnet50) +- [Speedrun GPT-2 training with modded-nanogpt](https://github.com/modal-labs/multinode-training-guide/tree/main/nanoGPT) + + +### Torchrun Example + +```python +import modal +import modal.experimental + +image = ( + modal.Image.debian_slim(python_version="3.12") + .pip_install("torch~=2.5.1", "numpy~=2.2.1") + .add_local_dir( + "training", remote_path="/root/training" + ) +) +app = modal.App("example-simple-torch-cluster", image=image) + +n_nodes = 4 + +@app.function(gpu=f"H100:8", timeout=60 * 60 * 24) +@modal.experimental.clustered(size=n_nodes, rdma=True) +def launch_torchrun(): + # import the 'torchrun' interface directly. + from torch.distributed.run import parse_args, run + + cluster_info = modal.experimental.get_cluster_info() + + run( + parse_args( + [ + f"--nnodes={n_nodes}", + f"--node-rank={cluster_info.rank}", + f"--master-addr={cluster_info.container_ips[0]}", + "--nproc-per-node=8", + "--master-port=1234", + "training/train.py", + ] + ) + ) +``` + +### Deployment + +#### Apps, Functions, and entrypoints + +# Apps, Functions, and entrypoints + +An [`App`](https://modal.com/docs/reference/modal.App) represents an application running on Modal. It groups one or more Functions for atomic deployment and acts as a shared namespace. All Functions and Clses are associated with an +App. 
+ +A [`Function`](https://modal.com/docs/reference/modal.Function) acts as an independent unit once it is deployed, and [scales up and down](https://modal.com/docs/guide/scale) independently from other Functions. If there are no live inputs to the Function then by default, no containers will run and your account will not be charged for compute resources, even if the App it belongs to is deployed. + +An App can be ephemeral or deployed. You can view a list of all currently running Apps on the [`apps`](https://modal.com/apps) page. + +The code for a Modal App defining two separate Functions might look something like this: + +```python + +import modal + +app = modal.App(name="my-modal-app") + +@app.function() +def f(): + print("Hello world!") + +@app.function() +def g(): + print("Goodbye world!") + +``` + +## Ephemeral Apps + +An ephemeral App is created when you use the +[`modal run`](https://modal.com/docs/reference/cli/run) CLI command, or the +[`app.run`](https://modal.com/docs/reference/modal.App#run) method. This creates a temporary +App that only exists for the duration of your script. + +Ephemeral Apps are stopped automatically when the calling program exits, or when +the server detects that the client is no longer connected. +You can use +[`--detach`](https://modal.com/docs/reference/cli/run) in order to keep an ephemeral App running even +after the client exits. + +By using `app.run` you can run your Modal apps from within your Python scripts: + +```python +def main(): + ... + with app.run(): + some_modal_function.remote() +``` + +By default, running your app in this way won't propagate Modal logs and progress bar messages. To enable output, use the [`modal.enable_output`](https://modal.com/docs/reference/modal.enable_output) context manager: + +```python +def main(): + ... + with modal.enable_output(): + with app.run(): + some_modal_function.remote() +``` + +## Deployed Apps + +A deployed App is created using the [`modal deploy`](https://modal.com/docs/reference/cli/deploy) +CLI command. The App is persisted indefinitely until you stop it via the +[web UI](https://modal.com/apps) or the [`modal app stop`](https://modal.com/docs/reference/cli/app#modal-app-stop) command. Functions in a deployed App that have an attached +[schedule](https://modal.com/docs/guide/cron) will be run on a schedule. Otherwise, you can +invoke them manually using +[web endpoints or Python](https://modal.com/docs/guide/trigger-deployed-functions). + +Deployed Apps are named via the [`App`](https://modal.com/docs/reference/modal.App#modalapp) +constructor. Re-deploying an existing `App` (based on the name) will update it +in place. + +## Entrypoints for ephemeral Apps + +The code that runs first when you `modal run` an App is called the "entrypoint". + +You can register a local entrypoint using the +[`@app.local_entrypoint()`](https://modal.com/docs/reference/modal.App#local_entrypoint) +decorator. You can also use a regular Modal function as an entrypoint, in which +case only the code in global scope is executed locally. + +### Argument parsing + +If your entrypoint function takes arguments with primitive types, `modal run` +automatically parses them as CLI options. 
For example, the following function +can be called with `modal run script.py --foo 1 --bar "hello"`: + +```python +# script.py + +@app.local_entrypoint() +def main(foo: int, bar: str): + some_modal_function.remote(foo, bar) +``` + +If you wish to use your own argument parsing library, such as `argparse`, you can instead accept a variable-length argument list for your entrypoint or your function. In this case, Modal skips CLI parsing and forwards CLI arguments as a tuple of strings. For example, the following function can be invoked with `modal run my_file.py --foo=42 --bar="baz"`: + +```python +import argparse + +@app.function() +def train(*arglist): + parser = argparse.ArgumentParser() + parser.add_argument("--foo", type=int) + parser.add_argument("--bar", type=str) + args = parser.parse_args(args = arglist) +``` + +### Manually specifying an entrypoint + +If there is only one `local_entrypoint` registered, +[`modal run script.py`](https://modal.com/docs/reference/cli/run) will automatically use it. If +you have no entrypoint specified, and just one decorated Modal function, that +will be used as a remote entrypoint instead. Otherwise, you can direct +`modal run` to use a specific entrypoint. + +For example, if you have a function decorated with +[`@app.function()`](https://modal.com/docs/reference/modal.App#function) in your file: + +```python +# script.py + +@app.function() +def f(): + print("Hello world!") + +@app.function() +def g(): + print("Goodbye world!") + +@app.local_entrypoint() +def main(): + f.remote() +``` + +Running [`modal run script.py`](https://modal.com/docs/reference/cli/run) will execute the `main` +function locally, which would call the `f` function remotely. However you can +instead run `modal run script.py::app.f` or `modal run script.py::app.g` to +execute `f` or `g` directly. + +## Apps were once Stubs + +The `modal.App` class in the client was previously called `modal.Stub`. The +old name was kept as an alias for some time, but from Modal 1.0.0 onwards, +using `modal.Stub` will result in an error. + +#### Managing deployments + +# Managing deployments + +Once you've finished using `modal run` or `modal serve` to iterate on your Modal +code, it's time to deploy. A Modal deployment creates and then persists an +application and its objects, providing the following benefits: + +- Repeated application function executions will be grouped under the deployment, + aiding observability and usage tracking. Programmatically triggering lots of + ephemeral App runs can clutter your web and CLI interfaces. +- Function calls are much faster because deployed functions are persistent and + reused, not created on-demand by calls. Learn how to trigger deployed + functions in + [Invoking deployed functions](https://modal.com/docs/guide/trigger-deployed-functions). +- [Scheduled functions](https://modal.com/docs/guide/cron) will continue scheduling separate from + any local iteration you do, and will notify you on failure. +- [Web endpoints](https://modal.com/docs/guide/webhooks) keep running when you close your laptop, + and their URL address matches the deployment name. + +## Creating deployments + +Deployments are created using the +[`modal deploy` command](https://modal.com/docs/reference/cli/app#modal-app-list). + +``` + % modal deploy -m whisper_pod_transcriber.main +✓ Initialized. View app page at https://modal.com/apps/ap-PYc2Tb7JrkskFUI8U5w0KG. +✓ Created objects. +├── 🔨 Created populate_podcast_metadata. 
+├── 🔨 Mounted /home/ubuntu/whisper_pod_transcriber at /root/whisper_pod_transcriber +├── 🔨 Created fastapi_app => https://modal-labs-whisper-pod-transcriber-fastapi-app.modal.run +├── 🔨 Mounted /home/ubuntu/whisper_pod_transcriber/whisper_frontend/dist at /assets +├── 🔨 Created search_podcast. +├── 🔨 Created refresh_index. +├── 🔨 Created transcribe_segment. +├── 🔨 Created transcribe_episode.. +└── 🔨 Created fetch_episodes. +✓ App deployed! 🎉 + +View Deployment: https://modal.com/apps/modal-labs/whisper-pod-transcriber +``` + +Running this command on an existing deployment will redeploy the App, +incrementing its version. For detail on how live deployed apps transition +between versions, see the [Updating deployments](#updating-deployments) section. + +Deployments can also be created programmatically using Modal's +[Python API](https://modal.com/docs/reference/modal.App#deploy). + +## Viewing deployments + +Deployments can be viewed either on the [apps](https://modal.com/apps) web page or by using the +[`modal app list` command](https://modal.com/docs/reference/cli/app#modal-app-list). + +## Updating deployments + +A deployment can deploy a new App or redeploy a new version of an existing +deployed App. It's useful to understand how Modal handles the transition between +versions when an App is redeployed. In general, Modal aims to support +zero-downtime deployments by gradually transitioning traffic to the new version. + +If the deployment involves building new versions of the Images used by the App, +the build process will need to complete succcessfully. The existing version of +the App will continue to handle requests during this time. Errors during the +build will abort the deployment with no change to the status of the App. + +After the build completes, Modal will start to bring up new containers running +the latest version of the App. The existing containers will continue handling +requests (using the previous version of the App) until the new containers have +completed their cold start. + +Once the new containers are ready, old containers will stop accepting new +requests. However, the old containers will continue running any requests they +had previously accepted. The old containers will not terminate until they have +finished processing all ongoing requests. + +Any warm pool containers will also be cycled during a deployment, as the +previous version's warm pool are now outdated. + +## Deployment rollbacks + +To quickly reset an App back to a previous version, you can perform a deployment +_rollback_. Rollbacks can be triggered from either the App dashboard or the CLI. +Rollback deployments look like new deployments: they increment the version number +and are attributed to the user who triggered the rollback. But the App's functions +and metadata will be reset to their previous state independently of your current +App codebase. + +Note that deployment rollbacks are supported only on the Team and Enterprise plans. + +## Stopping deployments + +Deployed apps can be stopped in the web UI by clicking the red "Stop app" button on +the App's "Overview" page, or alternatively from the command line using the +[`modal app stop` command](https://modal.com/docs/reference/cli/app#modal-app-stop). + +Stopping an App is a destructive action. Apps cannot be restarted from this state; +a new App will need to be deployed from the same source files. Objects associated +with stopped deployments will eventually be garbage collected. 
+ +#### Invoking deployed functions + +# Invoking deployed functions + +Modal lets you take a function created by a +[deployment](https://modal.com/docs/guide/managing-deployments) and call it from other contexts. + +There are two ways of invoking deployed functions. If the invoking client is +running Python, then the same +[Modal client library](https://pypi.org/project/modal/) used to write Modal code +can be used. HTTPS is used if the invoking client is not running Python and +therefore cannot import the Modal client library. + +## Invoking with Python + +Some use cases for Python invocation include: + +- An existing Python web server (eg. Django, Flask) wants to invoke Modal + functions. +- You have split your product or system into multiple Modal applications that + deploy independently and call each other. + +### Function lookup and invocation basics + +Let's say you have a script `my_shared_app.py` and this script defines a Modal +app with a function that computes the square of a number: + +```python +import modal + +app = modal.App("my-shared-app") + +@app.function() +def square(x: int): + return x ** 2 +``` + +You can deploy this app to create a persistent deployment: + +``` +% modal deploy shared_app.py +✓ Initialized. +✓ Created objects. +├── 🔨 Created square. +├── 🔨 Mounted /Users/erikbern/modal/shared_app.py. +✓ App deployed! 🎉 + +View Deployment: https://modal.com/apps/erikbern/my-shared-app +``` + +Let's try to run this function from a different context. For instance, let's +fire up the Python interactive interpreter: + +```bash +% python +Python 3.9.5 (default, May 4 2021, 03:29:30) +[Clang 12.0.0 (clang-1200.0.32.27)] on darwin +Type "help", "copyright", "credits" or "license" for more information. +>>> import modal +>>> f = modal.Function.from_name("my-shared-app", "square") +>>> f.remote(42) +1764 +>>> +``` + +This works exactly the same as a regular modal `Function` object. For example, +you can `.map()` over functions invoked this way too: + +```bash +>>> f = modal.Function.from_name("my-shared-app", "square") +>>> f.map([1, 2, 3, 4, 5]) +[1, 4, 9, 16, 25] +``` + +#### Authentication + +The Modal Python SDK will read the token from `~/.modal.toml` which typically is +created using `modal token new`. + +Another method of providing the credentials is to set the environment variables +`MODAL_TOKEN_ID` and `MODAL_TOKEN_SECRET`. If you want to call a Modal function +from a context such as a web server, you can expose these environment variables +to the process. + +#### Lookup of lifecycle functions + +[Lifecycle functions](https://modal.com/docs/guide/lifecycle-functions) are defined on classes, +which you can look up in a different way. Consider this code: + +```python +import modal + +app = modal.App("my-shared-app") + +@app.cls() +class MyLifecycleClass: + @modal.enter() + def enter(self): + self.var = "hello world" + + @modal.method() + def foo(self): + return self.var +``` + +Let's say you deploy this app. You can then call the function by doing this: + +```bash +>>> cls = modal.Cls.from_name("my-shared-app", "MyLifecycleClass") +>>> obj = cls() # You can pass any constructor arguments here +>>> obj.foo.remote() +'hello world' +``` + +### Asynchronous invocation + +In certain contexts, a Modal client will need to trigger Modal functions without +waiting on the result. This is done by spawning functions and receiving a +[`FunctionCall`](https://modal.com/docs/reference/modal.FunctionCall) as a +handle to the triggered execution. 
+ +The following is an example of a Flask web server (running outside Modal) which +accepts model training jobs to be executed within Modal. Instead of the HTTP +POST request waiting on a training job to complete, which would be infeasible, +the relevant Modal function is spawned and the +[`FunctionCall`](https://modal.com/docs/reference/modal.FunctionCall) +object is stored for later polling of execution status. + +```python +from uuid import uuid4 +from flask import Flask, jsonify, request + +app = Flask(__name__) +pending_jobs = {} + +... + +@app.route("/jobs", methods = ["POST"]) +def create_job(): + predict_fn = modal.Function.from_name("example", "train_model") + job_id = str(uuid4()) + function_call = predict_fn.spawn( + job_id=job_id, + params=request.json, + ) + pending_jobs[job_id] = function_call + return { + "job_id": job_id, + "status": "pending", + } +``` + +### Importing a Modal function between Modal apps + +You can also import one function defined in an app from another app: + +```python +import modal + +app = modal.App("another-app") + +square = modal.Function.from_name("my-shared-app", "square") + +@app.function() +def cube(x): + return x * square.remote(x) + +@app.local_entrypoint() +def main(): + assert cube.remote(42) == 74088 +``` + +### Comparison with HTTPS + +Compared with HTTPS invocation, Python invocation has the following benefits: + +- Avoids the need to create web endpoint functions. +- Avoids handling serialization of request and response data between Modal and + your client. +- Uses the Modal client library's built-in authentication. + - Web endpoints are public to the entire internet, whereas function `lookup` + only exposes your code to you (and your org). +- You can work with shared Modal functions as if they are normal Python + functions, which might be more convenient. + +## Invoking with HTTPS + +Any application that can make HTTPS requests can interact with deployed Modal +applications via [web endpoint functions](https://modal.com/docs/guide/webhooks). Note that +all deployed web endpoint functions have [a stable HTTPS +URL](https://modal.com/docs/guide/webhook-urls). + +Some use cases for HTTPS invocation include: + +- Calling Modal functions from a web browser client running JavaScript +- Calling Modal functions from backend services in languages we don't yet have + official SDKs for (Java, Ruby, etc.) +- Calling Modal functions using UNIX tools (`curl`, `wget`) + +However, if the client of your Modal deployment is running Python, JavaScript, +or Go, it's better to use the [Modal Python +SDK](https://pypi.org/project/modal/) or [libmodal SDKs for JavaScript and +Go](https://modal.com/docs/guide/sdk-javascript-go) to invoke your Modal code. + +For more detail on setting up functions for invocation over HTTP see the +[web endpoints guide](https://modal.com/docs/guide/webhooks). + +#### Continuous deployment + +# Continuous deployment + +It's a common pattern to auto-deploy your Modal App as part of a CI/CD pipeline. +To get you started, below is a guide to doing continuous deployment of a Modal +App in GitHub. + +## GitHub Actions + +Here's a sample GitHub Actions workflow that deploys your App on every push to +the `main` branch. + +This requires you to create a [Modal token](https://modal.com/settings/tokens) and add it as a +[secret for your Github Actions workflow](https://docs.github.com/en/actions/how-tos/write-workflows/choose-what-workflows-do/use-secrets). 
+ +After setting up secrets, create a new workflow file in your repository at +`.github/workflows/ci-cd.yml` with the following contents: + +```yaml +name: CI/CD + +on: + push: + branches: + - main + +jobs: + deploy: + name: Deploy + runs-on: ubuntu-latest + env: + MODAL_TOKEN_ID: ${{ secrets.MODAL_TOKEN_ID }} + MODAL_TOKEN_SECRET: ${{ secrets.MODAL_TOKEN_SECRET }} + + steps: + - name: Checkout Repository + uses: actions/checkout@v4 + + - name: Install Python + uses: actions/setup-python@v5 + with: + python-version: "3.10" + + - name: Install Modal + run: | + python -m pip install --upgrade pip + pip install modal + + - name: Deploy job + run: | + modal deploy -m my_package.my_file +``` + +Be sure to replace `my_package.my_file` with your actual entrypoint. + +If you use multiple Modal [Environments](https://modal.com/docs/guide/environments), you can +additionally specify the target environment in the YAML using +`MODAL_ENVIRONMENT=xyz`. + +#### Running untrusted code in Functions + +# Running untrusted code in Functions + +Modal provides two primitives for running untrusted code: Restricted Functions and [Sandboxes](https://modal.com/docs/guide/sandboxes). While both can be used for running untrusted code, they serve different purposes: Sandboxes provide a container-like interface while Restricted Functions provide an interface similar to a traditional Function. + +Restricted Functions are useful for executing: + +- Code generated by language models (LLMs) +- User-submitted code in interactive environments +- Third-party plugins or extensions + +## Using `restrict_modal_access` + +To restrict a Function's access to Modal resources, set `restrict_modal_access=True` on the Function definition: + +```python +import modal + +app = modal.App() + +@app.function(restrict_modal_access=True) +def run_untrusted_code(code_input: str): + # This function cannot access Modal resources + return eval(code_input) +``` + +When `restrict_modal_access` is enabled: + +- The Function cannot access Modal resources (Queues, Dicts, etc.) +- The Function cannot call other Functions +- The Function cannot access Modal's internal APIs + +## Comparison with Sandboxes + +While both `restrict_modal_access` and [Sandboxes](https://modal.com/docs/guide/sandboxes) can be used for running untrusted code, they serve different purposes: + +| Feature | Restricted Function | Sandbox | +| --------- | ------------------------------ | ---------------------------------------------- | +| State | Stateless | Stateful | +| Interface | Function-like | Container-like | +| Setup | Simple decorator | Requires explicit creation/termination | +| Use case | Quick, isolated code execution | Interactive development, long-running sessions | + +## Best Practices + +When running untrusted code, consider these additional security measures: + +1. Use `single_use_containers=True` to ensure each container only handles one request. Containers that get reused could cause information leakage between users. + +```python +@app.function(restrict_modal_access=True, single_use_containers=True) +def isolated_function(input_data): + # Each input gets a fresh container + return process(input_data) +``` + +2. Set appropriate timeouts to prevent long-running operations: + +```python +@app.function( + restrict_modal_access=True, + timeout=30, # 30 second timeout + single_use_containers=True +) +def time_limited_function(input_data): + return process(input_data) +``` + +3. 
Consider using `block_network=True` to prevent the container from making outbound network requests: + +```python +@app.function( + restrict_modal_access=True, + block_network=True, + single_use_containers=True +) +def network_isolated_function(input_data): + return process(input_data) +``` + +4. Minimize the App source that's included in the container + +A restricted Modal Function will have read access to its source files in the +container, so you'll want to avoid including anything that would be harmful +if exfiltrated by the untrusted process. + +If deploying an App from within a [larger package](https://modal.com/docs/guide/project-structure), +the entire package source may be automatically included by default. A best +practice would be to make the untrusted Function part of a standalone App that +includes the minimum necessary files to run: + +```python +restricted_app = modal.App("restricted-app", include_source=False) + +image = ( + modal.Image.debian_slim() + .add_local_file("restricted_executor.py", "/root/restricted_executor.py") +) + +@restricted_app.function( + restrict_modal_access=True, + block_network=True, + single_use_containers=True +) +def isolated_function(input_data): + return process(input_data) +``` + +## Example: Running LLM-generated Code + +Below is a complete example of running code generated by a language model: + +```python +import modal + +app = modal.App("restricted-access-example") + +@app.function(restrict_modal_access=True, single_use_containers=True, timeout=30, block_network=True) +def run_llm_code(generated_code: str): + try: + # Create a restricted environment + execution_scope = {} + + # Execute the generated code + exec(generated_code, execution_scope) + + # Return the result if it exists + return execution_scope.get("result", None) + except Exception as e: + return f"Error executing code: {str(e)}" + +@app.local_entrypoint() +def main(): + # Example LLM-generated code + code = """ +def calculate_fibonacci(n): + if n <= 1: + return n + return calculate_fibonacci(n-1) + calculate_fibonacci(n-2) + +result = calculate_fibonacci(10) + """ + + result = run_llm_code.remote(code) + print(f"Result: {result}") + +``` + +This example locks down the container to ensure that the code is safe to execute by: + +- Restricting Modal access +- Using a fresh container for each execution +- Setting a timeout +- Blocking network access +- Catching and handling potential errors + +## Error Handling + +When a restricted Function attempts to access Modal resources, it will raise an `AuthError`: + +```python +@app.function(restrict_modal_access=True) +def restricted_function(q: modal.Queue): + try: + # This will fail because the Function is restricted + return q.get() + except modal.exception.AuthError as e: + return f"Access denied: {e}" +``` + +The error message will indicate that the operation is not permitted due to restricted Modal access. + +### Modal Sandboxes + +#### Sandboxes + +# Sandboxes + +In addition to the Function interface, Modal has a direct +interface for defining containers _at runtime_ and securely running arbitrary code +inside them. + +This can be useful if, for example, you want to: + +- Execute code generated by a language model. +- Create isolated environments for running untrusted code. +- Check out a git repository and run a command against it, like a test suite, or + `npm lint`. +- Run containers with arbitrary dependencies and setup scripts. 
+ +Each individual job is called a **Sandbox** and can be created using the +[`Sandbox.create`](https://modal.com/docs/reference/modal.Sandbox#create) constructor: + + {#snippet python()} + +```python notest +import modal + +app = modal.App.lookup("my-app", create_if_missing=True) + +sb = modal.Sandbox.create(app=app) + +p = sb.exec("python", "-c", "print('hello')", timeout=3) +print(p.stdout.read()) + +p = sb.exec("bash", "-c", "for i in {1..10}; do date +%T; sleep 0.5; done", timeout=5) +for line in p.stdout: + # Avoid double newlines by using end="". + print(line, end="") + +sb.terminate() +``` + +{/snippet} + +{#snippet javascript()} + +```javascript notest + +const modal = new ModalClient(); +const app = await modal.apps.fromName("my-app", { + createIfMissing: true, +}); +const image = modal.images.fromRegistry("python:3.13-slim"); + +const sb = await modal.sandboxes.create(app, image); + +const p = await sb.exec(["python", "-c", "print('hello')"], { + timeout: 3 * 1000, +}); +console.log(await p.stdout.readText()); + +const p2 = await sb.exec( + ["bash", "-c", "for i in {1..10}; do date +%T; sleep 0.5; done"], + { timeout: 5 * 1000 }, +); +for await (const line of p2.stdout) { + process.stdout.write(line); +} + +await sb.terminate(); +``` + +{/snippet} + +{#snippet go()} + +```go notest +package main + +import ( + "context" + "fmt" + "io" + "os" + "time" + + "github.com/modal-labs/libmodal/modal-go" +) + +func main() { + ctx := context.Background() + mc, _ := modal.NewClient() + + app, _ := mc.Apps.FromName(ctx, "my-app", &modal.AppFromNameParams{ + CreateIfMissing: true, + }) + image := mc.Images.FromRegistry("python:3.13-slim", nil) + + sb, _ := mc.Sandboxes.Create(ctx, app, image, nil) + defer sb.Terminate(context.Background()) + + p, _ := sb.Exec(ctx, []string{"python", "-c", "print('hello')"}, &modal.SandboxExecParams{ + Timeout: 3 * time.Second, + }) + stdout, _ := io.ReadAll(p.Stdout) + fmt.Println(string(stdout)) + + p2, _ := sb.Exec(ctx, []string{"bash", "-c", "for i in {1..10}; do date +%T; sleep 0.5; done"}, &modal.SandboxExecParams{ + Timeout: 5 * time.Second, + }) + io.Copy(os.Stdout, p2.Stdout) +} +``` + +{/snippet} + + +**Note:** you can run the above example as a script directly with `python my_script.py`. `modal run` is not needed here since there is no [entrypoint](https://modal.com/docs/guide/apps#entrypoints-for-ephemeral-apps). + +Sandboxes require an [`App`](https://modal.com/docs/guide/apps) to be passed when spawned from outside +of a Modal container. You may pass in a regular `App` object or look one up by name with +[`App.lookup`](https://modal.com/docs/reference/modal.App#lookup). The `create_if_missing` flag on `App.lookup` +will create an `App` with the given name if it doesn't exist. + +## Lifecycle + +### Timeouts + +Sandboxes have a default maximum lifetime of 5 minutes. You can change this by passing +a `timeout` of up to 24 hours to the `Sandbox.create(...)` function. 
+ + {#snippet python()} + +```python notest +sb = modal.Sandbox.create(app=my_app, timeout=10*60) # 10 minutes +``` + +{/snippet} + +{#snippet javascript()} + +```javascript notest +const sb = await modal.sandboxes.create(app, image, { + timeout: 10 * 60 * 1000, // 10 minutes +}); +``` + +{/snippet} + +{#snippet go()} + +```go notest +sb, err := mc.Sandboxes.Create(ctx, app, image, &modal.SandboxCreateParams{ + Timeout: 10 * time.Minute, +}) +``` + +{/snippet} + + +If you need a Sandbox to run for more than 24 hours, we recommend using +[Filesystem Snapshots](https://modal.com/docs/guide/sandbox-snapshots) to preserve its state, +and then restore from that snapshot with a subsequent Sandbox. + +### Idle Timeouts + +Sandboxes can also be automatically terminated after a period of inactivity - you can do this by setting the `idle_timeout` parameter. A Sandbox is considered active if any of the following are true: + +1. It has an active [command](https://modal.com/docs/guide/sandbox-spawn) running (via [`sb.exec(...)`](https://modal.com/docs/reference/modal.Sandbox#exec)) +2. Its stdin is being written to (via [`sb.stdin.write()`](https://modal.com/docs/reference/modal.Sandbox#stdin)) +3. It has an open TCP connection over one of its [Tunnels](https://modal.com/docs/guide/tunnels) + +## Configuration + +Sandboxes support nearly all configuration options found in regular `modal.Function`s. +Refer to [`Sandbox.create`](https://modal.com/docs/reference/modal.Sandbox#create) for further documentation +on Sandbox configs. + +For example, Images and Volumes can be used just as with functions: + + {#snippet python()} + +```python notest +sb = modal.Sandbox.create( + image=modal.Image.debian_slim().pip_install("pandas"), + volumes={"/data": modal.Volume.from_name("my-volume")}, + workdir="/repo", + app=my_app, +) +``` + +{/snippet} + +{#snippet javascript()} + +```javascript notest +const image = modal.images.fromRegistry("python:3.13-slim"); +const volume = modal.volumes.fromName("my-volume"); +const sb = await modal.sandboxes.create(app, image, { + volumes: { "/data": volume }, + workdir: "/repo", +}); +``` + +{/snippet} + +{#snippet go()} + +```go notest +image := mc.Images.FromRegistry("python:3.13-slim", nil) +volume := mc.Volumes.FromName("my-volume", nil) +sb, err := mc.Sandboxes.Create(ctx, app, image, &modal.SandboxCreateParams{ + Volumes: map[string]*modal.Volume{"/data": volume}, + Workdir: "/repo", +}) +``` + +{/snippet} + + +## Environments + +### Environment variables + +You can set environment variables using inline secrets: + + {#snippet python()} + +```python notest +secret = modal.Secret.from_dict({"MY_SECRET": "hello"}) + +sb = modal.Sandbox.create( + secrets=[secret], + app=my_app, +) +p = sb.exec("bash", "-c", "echo $MY_SECRET") +print(p.stdout.read()) +``` + +{/snippet} + +{#snippet javascript()} + +```javascript notest +const secret = modal.secrets.fromObject({ MY_SECRET: "hello" }); +const image = modal.images.fromRegistry("python:3.13-slim"); + +const sb = await modal.sandboxes.create(app, image, { + secrets: [secret], +}); +const p = await sb.exec(["bash", "-c", "echo $MY_SECRET"]); +console.log(await p.stdout.readText()); +``` + +{/snippet} + +{#snippet go()} + +```go notest +secret, err := mc.Secrets.FromMap(ctx, map[string]string{"MY_SECRET": "hello"}, nil) +image := mc.Images.FromRegistry("python:3.13-slim", nil) + +sb, err := mc.Sandboxes.Create(ctx, app, image, &modal.SandboxCreateParams{ + Secrets: []*modal.Secret{secret}, +}) +p, err := sb.Exec(ctx, []string{"bash", 
"-c", "echo $MY_SECRET"}, nil) +stdout, err := io.ReadAll(p.Stdout) +fmt.Println(string(stdout)) +``` + +{/snippet} + + +### Custom Images + +Sandboxes support custom images just as Functions do. However, while you'll typically +invoke a Modal Function with the `modal run` cli, you typically spawn a Sandbox +with a simple script call. As such, you may need to manually enable output streaming +to see your image build logs: + + {#snippet python()} + +```python notest +image = modal.Image.debian_slim().pip_install("pandas", "numpy") + +with modal.enable_output(): + sb = modal.Sandbox.create(image=image, app=my_app) +``` + +{/snippet} + +{#snippet javascript()} + +```javascript notest +const image = modal.images + .fromRegistry("python:3.13-slim") + .dockerfileCommands(["RUN pip install pandas numpy"]); + +const sb = await modal.sandboxes.create(app, image); +``` + +{/snippet} + +{#snippet go()} + +```go notest +image := mc.Images.FromRegistry("python:3.13-slim", nil). + DockerfileCommands([]string{"RUN pip install pandas numpy"}, nil) + +// Note: Image build logs are automatically streamed in Go +sb, err := mc.Sandboxes.Create(ctx, app, image, nil) +``` + +{/snippet} + + +### Dynamically defined environments + +Note that any valid `Image` or `Mount` can be used with a Sandbox, even if those +images or mounts have not previously been defined. This also means that Images and +Mounts can be built from requirements at **runtime**. For example, you could +use a language model to write some code and define your image, and then spawn a +Sandbox with it. Check out [devlooper](https://github.com/modal-labs/devlooper) +for a concrete example of this. + +## Running a Sandbox with an entrypoint + +In most cases, Sandboxes are treated as a generic container that can run arbitrary +commands. However, in some cases, you may want to run a single command or script +as the entrypoint of the Sandbox. You can do this by passing command arguments to the +Sandbox constructor: + + {#snippet python()} + +```python notest +sb = modal.Sandbox.create("python", "-m", "http.server", "8080", app=my_app, timeout=10) +for line in sb.stdout: + print(line, end="") +``` + +{/snippet} + +{#snippet javascript()} + +```javascript notest +const sb = await modal.sandboxes.create(app, image, { + entrypoint: ["python", "-m", "http.server", "8080"], + timeout: 10 * 1000, +}); +``` + +{/snippet} + +{#snippet go()} + +```go notest +sb, err := mc.Sandboxes.Create(ctx, app, image, &modal.SandboxCreateParams{ + Entrypoint: []string{"python", "-m", "http.server", "8080"}, + Timeout: 10 * time.Second, +}) +``` + +{/snippet} + + +This functionality is most useful for running long-lived services that you want +to keep running in the background. See our [Jupyter notebook example](https://modal.com/docs/examples/jupyter_sandbox) +for a more concrete example of this. + +## Referencing Sandboxes from other code + +If you have a running Sandbox, you can retrieve it using the `from_id` method. + + {#snippet python()} + +```python notest +sb = modal.Sandbox.create(app=my_app) +sb_id = sb.object_id + +# ... later in the program ... + +sb2 = modal.Sandbox.from_id(sb_id) + +p = sb2.exec("echo", "hello") +print(p.stdout.read()) +sb2.terminate() +``` + +{/snippet} + +{#snippet javascript()} + +```javascript notest +const sb = await modal.sandboxes.create(app, image); +const sbId = sb.sandboxId; + +// ... later in the program ... 
+ +const sb2 = await modal.sandboxes.fromId(sbId); + +const p = await sb2.exec(["echo", "hello"]); +console.log(await p.stdout.readText()); +await sb2.terminate(); +``` + +{/snippet} + +{#snippet go()} + +```go notest +sb, err := mc.Sandboxes.Create(ctx, app, image, nil) +sbId := sb.SandboxID + +// ... later in the program ... + +sb2, err := mc.Sandboxes.FromID(ctx, sbId) + +p, err := sb2.Exec(ctx, []string{"echo", "hello"}, nil) +stdout, err := io.ReadAll(p.Stdout) +fmt.Println(string(stdout)) +sb2.Terminate(ctx) +``` + +{/snippet} + + +A common use case for this is keeping a pool of Sandboxes available for executing tasks +as they come in. You can keep a list of `object_id`s of Sandboxes that are "open" and +reuse them, closing over the `object_id` in whatever function is using them. + +## Logging + +You can see Sandbox execution logs using the `verbose` option. For example: + + {#snippet python()} + +```python notest +sb = modal.Sandbox.create(app=my_app, verbose=True) + +p = sb.exec("python", "-c", "print('hello')") +print(p.stdout.read()) + +with sb.open("test.txt", "w") as f: + f.write("Hello World\n") +``` + +{/snippet} + +{#snippet javascript()} + +```javascript notest +const sb = await modal.sandboxes.create(app, image, { verbose: true }); +``` + +{/snippet} + +{#snippet go()} + +```go notest +sb, err := mc.Sandboxes.Create(ctx, app, image, &modal.SandboxCreateParams{ + Verbose: true, +}) +``` + +{/snippet} + + +shows Sandbox logs: + +``` +Sandbox exec started: python -c print('hello') +Opened file 'test.txt': fd-yErSQzGL9sig6WAjyNgTPR +Wrote to file: fd-yErSQzGL9sig6WAjyNgTPR +Closed file: fd-yErSQzGL9sig6WAjyNgTPR +``` + +## Named Sandboxes + +You can assign a name to a Sandbox when creating it. Each name must be unique within an app - +only one _running_ Sandbox can use a given name at a time. Note that the associated app must be +a deployed app. Once a Sandbox completely stops running, its name becomes available for reuse. +Some applications find Sandbox Names to be useful for ensuring that no more than one Sandbox is +running per resource or project. If a Sandbox with the given name is already running, `create()` +will raise an error. + + {#snippet python()} + +```python notest +sb1 = modal.Sandbox.create(app=my_app, name="my-name") +# This will raise a modal.exception.AlreadyExistsError. +sb2 = modal.Sandbox.create(app=my_app, name="my-name") +``` + +{/snippet} + +{#snippet javascript()} + +```javascript notest +const sb1 = await modal.sandboxes.create(app, image, { name: "my-name" }); +// this will raise an AlreadyExistsError +const sb2 = await modal.sandboxes.create(app, image, { name: "my-name" }); +``` + +{/snippet} + +{#snippet go()} + +```go notest +sb1, err := mc.Sandboxes.Create(ctx, app, image, &modal.SandboxCreateParams{ + Name: "my-name", +}) +// this will return an error +sb2, err := mc.Sandboxes.Create(ctx, app, image, &modal.SandboxCreateParams{ + Name: "my-name", +}) +``` + +{/snippet} + + +A named Sandbox may be fetched from a deployed app using `from_name()` _but only +if the Sandbox is currently running_. If no running Sandbox is found, `from_name()` will raise +an error. + + {#snippet python()} + +```python notest +my_app = modal.App.lookup("my-app", create_if_missing=True) +sb1 = modal.Sandbox.create(app=my_app, name="my-name") +# Returns the currently running Sandbox with the name "my-name" from the +# deployed app named "my-app". 
sb2 = modal.Sandbox.from_name("my-app", "my-name")
assert sb1.object_id == sb2.object_id  # sb1 and sb2 refer to the same Sandbox
```

{/snippet}

{#snippet javascript()}

```javascript notest
const app = await modal.apps.fromName("my-app", { createIfMissing: true });
const sb1 = await modal.sandboxes.create(app, image, { name: "my-name" });
// returns the currently running Sandbox with the name "my-name" from the
// deployed app named "my-app".
const sb2 = await modal.sandboxes.fromName("my-app", "my-name");
console.assert(sb1.sandboxId === sb2.sandboxId); // sb1 and sb2 refer to the same Sandbox
```

{/snippet}

{#snippet go()}

```go notest
app, err := mc.Apps.FromName(ctx, "my-app", &modal.AppFromNameParams{
	CreateIfMissing: true,
})
sb1, err := mc.Sandboxes.Create(ctx, app, image, &modal.SandboxCreateParams{
	Name: "my-name",
})
// returns the currently running Sandbox with the name "my-name" from the
// deployed app named "my-app".
sb2, err := mc.Sandboxes.FromName(ctx, "my-app", "my-name", nil)
// sb1 and sb2 refer to the same Sandbox
fmt.Println(sb1.SandboxID == sb2.SandboxID)
```

{/snippet}


Sandbox Names may contain only alphanumeric characters, dashes, periods, and underscores, and must
be shorter than 64 characters.

## Tagging

Sandboxes can also be tagged with arbitrary key-value pairs. These tags can be used
to filter results in `Sandbox.list`.

 {#snippet python()}

```python notest
sandbox_v1_1 = modal.Sandbox.create("sleep", "10", app=my_app)
sandbox_v1_2 = modal.Sandbox.create("sleep", "20", app=my_app)

sandbox_v1_1.set_tags({"major_version": "1", "minor_version": "1"})
sandbox_v1_2.set_tags({"major_version": "1", "minor_version": "2"})

for sandbox in modal.Sandbox.list(app_id=my_app.app_id):  # All sandboxes.
    print(sandbox.object_id)

for sandbox in modal.Sandbox.list(
    app_id=my_app.app_id,
    tags={"major_version": "1"},
):  # Also all sandboxes.
    print(sandbox.object_id)

for sandbox in modal.Sandbox.list(
    app_id=my_app.app_id,
    tags={"major_version": "1", "minor_version": "2"},
):  # Just the latest sandbox.
    print(sandbox.object_id)
```

{/snippet}

{#snippet javascript()}

```javascript notest
const sandboxV1_1 = await modal.sandboxes.create(app, image, {
  command: ["sleep", "10"],
});
const sandboxV1_2 = await modal.sandboxes.create(app, image, {
  command: ["sleep", "20"],
});

await sandboxV1_1.setTags({ major_version: "1", minor_version: "1" });
await sandboxV1_2.setTags({ major_version: "1", minor_version: "2" });

// All sandboxes.
for await (const sandbox of modal.sandboxes.list({ appId: app.appId })) {
  console.log(sandbox.sandboxId);
}

// Also all sandboxes.
for await (const sandbox of modal.sandboxes.list({
  appId: app.appId,
  tags: { major_version: "1" },
})) {
  console.log(sandbox.sandboxId);
}

// Just the latest sandbox.
+for await (const sandbox of modal.sandboxes.list({ + appId: app.appId, + tags: { major_version: "1", minor_version: "2" }, +})) { + console.log(sandbox.sandboxId); +} +``` + +{/snippet} + +{#snippet go()} + +```go notest +sandboxV1_1, err := mc.Sandboxes.Create(ctx, app, image, &modal.SandboxCreateParams{ + Command: []string{"sleep", "10"}, +}) +sandboxV1_2, err := mc.Sandboxes.Create(ctx, app, image, &modal.SandboxCreateParams{ + Command: []string{"sleep", "20"}, +}) + +sandboxV1_1.SetTags(ctx, map[string]string{"major_version": "1", "minor_version": "1"}) +sandboxV1_2.SetTags(ctx, map[string]string{"major_version": "1", "minor_version": "2"}) + +// All sandboxes. +it, _ := mc.Sandboxes.List(ctx, &modal.SandboxListParams{ + AppID: app.AppID, +}) +for sandbox := range it { + fmt.Println(sandbox.SandboxID) +} + +// Also all sandboxes. +it, _ = mc.Sandboxes.List(ctx, &modal.SandboxListParams{ + AppID: app.AppID, + Tags: map[string]string{"major_version": "1"}, +}) +for sandbox := range it { + fmt.Println(sandbox.SandboxID) +} + +// Just the latest sandbox. +it, _ = mc.Sandboxes.List(ctx, &modal.SandboxListParams{ + AppID: app.AppID, + Tags: map[string]string{"major_version": "1", "minor_version": "2"}, +}) +for sandbox := range it { + fmt.Println(sandbox.SandboxID) +} +``` + +{/snippet} + + +#### Running commands + +# Running commands in Sandboxes + +Once you have created a Sandbox, you can run commands inside it using the +[`Sandbox.exec`](https://modal.com/docs/reference/modal.Sandbox#exec) method. + +```python notest +sb = modal.Sandbox.create(app=my_app) + +process = sb.exec("echo", "hello", timeout=3) +print(process.stdout.read()) + +process = sb.exec("python", "-c", "print(1 + 1)", timeout=3) +print(process.stdout.read()) + +process = sb.exec( + "bash", + "-c", + "for i in $(seq 1 10); do echo foo $i; sleep 0.1; done", + timeout=5, +) +for line in process.stdout: + print(line, end="") + +sb.terminate() +``` + +`Sandbox.exec` returns a [`ContainerProcess`](https://modal.com/docs/reference/modal.container_process#modalcontainer_processcontainerprocess) +object, which allows access to the process's `stdout`, `stderr`, and `stdin`. +The `timeout` parameter ensures that the `exec` command will run for at most +`timeout` seconds. + +## Input + +The Sandbox and ContainerProcess `stdin` handles are [`StreamWriter`](https://modal.com/docs/reference/modal.io_streams#modalio_streamsstreamwriter) +objects. This object supports flushing writes with both synchronous and asynchronous APIs: + +```python notest +import asyncio + +sb = modal.Sandbox.create(app=my_app) + +p = sb.exec("bash", "-c", "while read line; do echo $line; done") +p.stdin.write(b"foo bar\n") +p.stdin.write_eof() +p.stdin.drain() +p.wait() +sb.terminate() + +async def run_async(): + sb = await modal.Sandbox.create.aio(app=my_app) + p = await sb.exec.aio("bash", "-c", "while read line; do echo $line; done") + p.stdin.write(b"foo bar\n") + p.stdin.write_eof() + await p.stdin.drain.aio() + await p.wait.aio() + await sb.terminate.aio() + +asyncio.run(run_async()) +``` + +## Output + +The Sandbox and ContainerProcess `stdout` and `stderr` handles are [`StreamReader`](https://modal.com/docs/reference/modal.io_streams#modalio_streamsstreamreader) +objects. These objects support reading from the stream in both synchronous and asynchronous manners. +These handles also respect the timeout given to `Sandbox.exec`. 
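
For instance, `stderr` offers the same interface as `stdout`. A minimal sketch, reusing the `my_app` placeholder from the snippets above:

```python notest
sb = modal.Sandbox.create(app=my_app)

# stderr is a StreamReader too, so you can iterate over it just like stdout;
# iteration ends when the process exits or its exec timeout expires.
p = sb.exec("bash", "-c", "echo oops >&2", timeout=5)
for line in p.stderr:
    print(line, end="")

sb.terminate()
```
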
+ +To read from a stream after the underlying process has finished, you can use the `read` +method, which blocks until the process finishes and returns the entire output stream. + +```python notest +sb = modal.Sandbox.create(app=my_app) +p = sb.exec("echo", "hello") +print(p.stdout.read()) +sb.terminate() +``` + +To stream output, take advantage of the fact that `stdout` and `stderr` are +iterable: + +```python notest +import asyncio + +sb = modal.Sandbox.create(app=my_app) + +p = sb.exec("bash", "-c", "for i in $(seq 1 10); do echo foo $i; sleep 0.1; done") + +for line in p.stdout: + # Lines preserve the trailing newline character, so use end="" to avoid double newlines. + print(line, end="") +p.wait() +sb.terminate() + +async def run_async(): + sb = await modal.Sandbox.create.aio(app=my_app) + p = await sb.exec.aio("bash", "-c", "for i in $(seq 1 10); do echo foo $i; sleep 0.1; done") + async for line in p.stdout: + # Avoid double newlines by using end="". + print(line, end="") + await p.wait.aio() + await sb.terminate.aio() + +asyncio.run(run_async()) +``` + +### Stream types + +By default, all streams are buffered in memory, waiting to be consumed by the +client. You can control this behavior with the `stdout` and `stderr` parameters. +These parameters are conceptually similar to the `stdout` and `stderr` +parameters of the [`subprocess`](https://docs.python.org/3/library/subprocess.html#subprocess.DEVNULL) module. + +```python notest +from modal.stream_type import StreamType + +sb = modal.Sandbox.create(app=my_app) + +# Default behavior: buffered in memory. +p = sb.exec( + "bash", + "-c", + "echo foo; echo bar >&2", + stdout=StreamType.PIPE, + stderr=StreamType.PIPE, +) +print(p.stdout.read()) +print(p.stderr.read()) + +# Print the stream to STDOUT as it comes in. +p = sb.exec( + "bash", + "-c", + "echo foo; echo bar >&2", + stdout=StreamType.STDOUT, + stderr=StreamType.STDOUT, +) +p.wait() + +# Discard all output. +p = sb.exec( + "bash", + "-c", + "echo foo; echo bar >&2", + stdout=StreamType.DEVNULL, + stderr=StreamType.DEVNULL, +) +p.wait() + +sb.terminate() +``` + +#### Networking and security + +# Networking and security + +Sandboxes are built to be secure-by-default, meaning that a default Sandbox has +no ability to accept incoming network connections or access your Modal resources. + +## Networking + +Since Sandboxes may run untrusted code, they have options to restrict their network access. +To block all network access, set `block_network=True` on [`Sandbox.create`](https://modal.com/docs/reference/modal.Sandbox#create). + +For more fine-grained networking control, a Sandbox's outbound network access +can be restricted using the `cidr_allowlist` parameter. This parameter takes a +list of CIDR ranges that the Sandbox is allowed to access, blocking all other +outbound traffic. + +### Connecting to Sandboxes with HTTP and WebSockets + +You can make authenticated HTTP and WebSocket requests to a Sandbox by generating +Sandbox Connect Tokens. They work like this: + +```python notest +# Start a Sandbox with a server running on port 8080. +sb = modal.Sandbox.create( + "bash", "-c", "python3 -m http.server 8080", + app=my_app, +) + +# Create a connect token, optionally including arbitrary user metadata. +creds = sb.create_connect_token(user_metadata={"user_id": "foo"}) + +# Make an HTTP request, passing the token in the Authorization header. 
+requests.get(creds.url, headers={"Authorization": f"Bearer {creds.token}"}) + +# You can also put the token in a `_modal_connect_token` query param. +url = f"{creds.url}/?_modal_connect_token={creds.token}" +ws_url = url.replace("https://", "wss://") +with websockets.connect(ws_url) as socket: + socket.send("Hello world!") +``` + +The server running on port 8080 in the container will receive an authenticated +request with an unspoofable `X-Verified-User-Data` header whose value is the +JSON-serialized Python dict that was passed as `user_metadata` to the +`create_connect_token()` function. This can be used by the application to +determine access control, for example. + +There are a few things to remember with Sandbox Connect Tokens: + +1. The server inside the container must be listening on port 8080. +2. The token may be sent in an `Authorization` header, in a `_modal_connect_token` + query param, or in a `_modal_connect_token` cookie. +3. If `_modal_connect_token` is set as a query param, the resulting response will + include a `Set-Cookie` header that sets it as a cookie. +4. The `user_metadata` must be JSON-serializable and must be less than 512 + characters after serialization. + +### Forwarding ports + +While it is recommended to use [Sandbox Connect Tokens](#connecting-to-sandboxes-with-http-and-websockets) +for HTTP requests and WebSocket connections to the container, you can also expose +raw TCP ports to the internet. This is useful if, for example, you want to run a +server inside the Sandbox that expects a raw TCP connection and handles +authentication itself. + +Use the `encrypted_ports` and `unencrypted_ports` parameters of `Sandbox.create` +to specify which ports to forward. You can then access the public URL of a tunnel +using the [`Sandbox.tunnels`](https://modal.com/docs/reference/modal.Sandbox#tunnels) method: + +```python notest +import requests +import time + +sb = modal.Sandbox.create( + "python", + "-m", + "http.server", + "12345", + encrypted_ports=[12345], + app=my_app, +) + +tunnel = sb.tunnels()[12345] + +time.sleep(1) # Wait for server to start. + +print(f"Connecting to {tunnel.url}...") +print(requests.get(tunnel.url, timeout=5).text) +``` + +It is also possible to create an encrypted port that uses `HTTP/2` rather than `HTTP/1.1` with the `h2_ports` option. This will return +a URL that you can make H2 (HTTP/2 + TLS) requests to. If you want to run an `HTTP/2` server inside a sandbox, this feature may be useful. +Here is an example: + +```python notest +import time + +port = 4359 +sb = modal.Sandbox.create( + app=my_app, + image=my_image, + h2_ports = [port], +) +p = sb.exec("python", "my_http2_server.py") + +tunnel = sb.tunnels()[port] +time.sleep(1) +print(f"Tunnel URL: {tunnel.url}") +``` + +For more details on how tunnels work, see the [tunnels guide](https://modal.com/docs/guide/tunnels). + +## Security model + +Sandboxes are built on top of [gVisor](https://gvisor.dev/), a container runtime +by Google that provides strong isolation properties. gVisor has custom logic to +prevent Sandboxes from making malicious system calls, giving you stronger isolation +than standard [runc](https://github.com/opencontainers/runc) containers. + +Additionally, Sandboxes are not authorized to access other resources in your Modal +workspace the way that Modal Functions are [by default](https://modal.com/docs/guide/restricted-access). +As a result, the blast radius of any malicious code will be limited to the Sandbox +container itself. 
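
For reference, here is a minimal sketch of the `cidr_allowlist` option described above; the address range and the `my_app` placeholder are illustrative:

```python notest
# Allow outbound connections only to addresses in 10.0.0.0/8;
# all other egress is blocked.
sb = modal.Sandbox.create(
    app=my_app,
    cidr_allowlist=["10.0.0.0/8"],
)

# A request to a public address outside the allowlist should now fail.
p = sb.exec(
    "python",
    "-c",
    "import urllib.request; urllib.request.urlopen('https://example.com', timeout=3)",
    timeout=10,
)
p.wait()
print(p.stderr.read())

sb.terminate()
```
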

#### File access

# Filesystem Access

There are multiple options for uploading files to a Sandbox and accessing them
from outside the Sandbox.

## Efficient file syncing

To efficiently upload local files to a Sandbox, you can use the
[`add_local_file`](https://modal.com/docs/reference/modal.Image#add_local_file) and
[`add_local_dir`](https://modal.com/docs/reference/modal.Image#add_local_dir) methods on the
[`Image`](https://modal.com/docs/reference/modal.Image) class:

```python notest
sb = modal.Sandbox.create(
    app=my_app,
    image=modal.Image.debian_slim().add_local_dir(
        local_path="/home/user/my_dir",
        remote_path="/app"
    )
)
p = sb.exec("ls", "/app")
print(p.stdout.read())
p.wait()
```

Alternatively, it's possible to use Modal [Volume](https://modal.com/docs/reference/modal.Volume)s or
[CloudBucketMount](https://modal.com/docs/guide/cloud-bucket-mounts)s. These have the benefit that
files created from inside the Sandbox can easily be accessed outside the
Sandbox.

To efficiently upload files to a Sandbox using a Volume, you can use the
[`batch_upload`](https://modal.com/docs/reference/modal.Volume#batch_upload) method on the
`Volume` class - for instance, using an ephemeral Volume that
will be garbage collected when the App finishes:

```python notest
with modal.Volume.ephemeral() as vol:
    import io
    with vol.batch_upload() as batch:
        batch.put_file("local-path.txt", "/remote-path.txt")
        batch.put_directory("/local/directory/", "/remote/directory")
        batch.put_file(io.BytesIO(b"some data"), "/foobar")

    sb = modal.Sandbox.create(
        volumes={"/cache": vol},
        app=my_app,
    )
    p = sb.exec("cat", "/cache/remote-path.txt")
    print(p.stdout.read())
    p.wait()
    sb.terminate()
```

The caller can also access files created in the Volume from inside the Sandbox, even after the Sandbox is terminated:

```python notest
with modal.Volume.ephemeral() as vol:
    sb = modal.Sandbox.create(
        volumes={"/cache": vol},
        app=my_app,
    )
    p = sb.exec("bash", "-c", "echo foo > /cache/a.txt")
    p.wait()
    sb.terminate()
    sb.wait(raise_on_termination=False)
    for data in vol.read_file("a.txt"):
        print(data)
```

Alternatively, if you want to persist files between Sandbox invocations (useful
if you're building a stateful code interpreter, for example), you can create
a persisted `Volume` with a dynamically assigned label:

```python notest
session_id = "example-session-id-123abc"
vol = modal.Volume.from_name(f"vol-{session_id}", create_if_missing=True)
sb = modal.Sandbox.create(
    volumes={"/cache": vol},
    app=my_app,
)
p = sb.exec("bash", "-c", "echo foo > /cache/a.txt")
p.wait()
sb.terminate()
sb.wait(raise_on_termination=False)
for data in vol.read_file("a.txt"):
    print(data)
```

File syncing behavior differs between Volumes and CloudBucketMounts. For
Volumes, files are only synced back to the Volume when the Sandbox terminates.
For CloudBucketMounts, files are synced automatically.

## Filesystem API (Alpha)

If you're less concerned with efficiency of uploads and want a convenient way
to pass data in and out of the Sandbox during execution, you can use our
filesystem API to easily read and write files. The API supports reading files
of up to 100 MiB and writing files of up to 1 GiB.

This API is currently in Alpha, and we don't recommend using it for production
workloads.
+ +```python +import modal + +app = modal.App.lookup("sandbox-fs-demo", create_if_missing=True) + +sb = modal.Sandbox.create(app=app) + +with sb.open("test.txt", "w") as f: + f.write("Hello World\n") + +f = sb.open("test.txt", "rb") +print(f.read()) +f.close() +``` + +The filesystem API is similar to Python's built-in [io.FileIO](https://docs.python.org/3/library/io.html#io.FileIO) and supports many of the same methods, including `read`, `readline`, `readlines`, `write`, `flush`, `seek`, and `close`. + +We additionally provide commands [`mkdir`](https://modal.com/docs/reference/modal.Sandbox#mkdir), [`rm`](https://modal.com/docs/reference/modal.Sandbox#rm), and [`ls`](https://modal.com/docs/reference/modal.Sandbox#ls) to make interacting with the filesystem more ergonomic. + + + + +#### Snapshots + +# Snapshots + +Sandboxes support snapshotting, allowing you to save your Sandbox's state +and restore it later. This is useful for: + +- Creating custom environments for your Sandboxes to run in +- Backing up your Sandbox's state for debugging +- Running large-scale experiments with the same initial state +- Branching your Sandbox's state to test different code changes independently + +## Filesystem Snapshots + +Filesystem Snapshots are copies of the Sandbox's filesystem at a given point in time. +These Snapshots are [Images](https://modal.com/docs/reference/modal.Image) and can be used to create +new Sandboxes. + +To create a Filesystem Snapshot, you can use the +[`Sandbox.snapshot_filesystem()`](https://modal.com/docs/reference/modal.Sandbox#snapshot_filesystem) method: + +```python notest +import modal + +app = modal.App.lookup("sandbox-fs-snapshot-test", create_if_missing=True) + +sb = modal.Sandbox.create(app=app) +p = sb.exec("bash", "-c", "echo 'test' > /test") +p.wait() +assert p.returncode == 0, "failed to write to file" +image = sb.snapshot_filesystem() +sb.terminate() + +sb2 = modal.Sandbox.create(image=image, app=app) +p2 = sb2.exec("bash", "-c", "cat /test") +assert p2.stdout.read().strip() == "test" +sb2.terminate() +``` + +Filesystem Snapshots are optimized for performance: they are calculated as the difference +from your base image, so only modified files are stored. Restoring a Filesystem Snapshot +utilizes the same infrastructure we use to get fast cold starts for your Sandboxes. + +Filesystem Snapshots will generally persist indefinitely. + +## Memory Snapshots + +[Sandboxes memory snapshots](https://modal.com/docs/guide/sandbox-memory-snapshots) are in early preview. +Contact us if this is something you're interested in! + +### Modal Notebooks + +# Modal Notebooks + +Notebooks allow you to write and execute Python code in Modal's cloud, within your browser. It's a hosted Jupyter notebook with: + +- Serverless pricing and automatic idle shutdown +- Access to Modal GPUs and compute +- Real-time collaborative editing +- Python Intellisense/LSP support and AI autocomplete +- Support for rich and interactive outputs like images, widgets, and plots + +

## Getting started

Open [modal.com/notebooks](https://modal.com/notebooks) in your browser and create a new notebook. You can also upload an `.ipynb` file from your computer.

Once you create a notebook, you can start running cells. Try a simple statement like

```python
print("Hello, Modal!")
```

Or, import a library and create a plot:

```python notest
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-20, 20, 500)
plt.plot(np.cos(x / 3.7 + 0.3), x * np.sin(x))
```

The default notebook image comes with a number of Python packages pre-installed, so you can get started right away. Popular ones include PyTorch, NumPy, Pandas, JAX, Transformers, and Matplotlib. You can find the full image definition [here](https://github.com/modal-labs/modal-client/blob/v1.1.3/modal/experimental/__init__.py#L234-L342). If you need another package, just install it:

```shell
%uv pip install [my-package]
```

All output types work out-of-the-box, including rich HTML, images, [Jupyter Widgets](https://ipywidgets.readthedocs.io/en/latest/), and interactive plots.

## Kernel resources

Just like with Modal Functions, notebooks run in serverless containers. This means you pay only for the CPU cores and memory you use.

If you need more resources, you can change kernel settings in the sidebar. This lets you set the number of CPU cores, memory, and GPU type for your notebook. You can also set a timeout for idle shutdown, which defaults to 10 minutes.

Use any GPU type available in Modal, including up to 8 Nvidia A100s or H100s. You can switch the kernel configuration in seconds!

![Compute profile tab in notebook sidebar](https://modal-cdn.com/cdnbot/compute-profilev9rvmmvw_365a1197.webp)

Note that the CPU and memory settings are _reservations_, so you can usually burst above the request. For example, if you've set the notebook to have 0.5 CPU cores, you'll be billed for that reservation continuously, but you can burst up to however many cores are available on the machine (e.g., 32 CPUs), and you'll be billed for the usage above the reservation only while you use it.

### Notebook pricing

Modal Notebooks are priced simply, by compute usage while the kernel is running. See the [pricing page](https://modal.com/pricing) for rates. Currently, CPU and memory usage is priced at the same rates as Sandboxes. They appear in your [usage dashboard](https://modal.com/settings/usage) under "Sandboxes" as well.

Inactive notebooks do not incur any cost. You are only billed for time the notebook is actively running.

## Custom images, volumes, secrets, and cloud storage

Modal Notebooks support custom images, volumes, and secrets, just like Modal Functions. You can use these to install additional packages, mount persistent storage, or access secrets.

- To use a custom image, you need to have a [deployed Modal Function](https://modal.com/docs/guide/managing-deployments) using that image. Then, search for that function in the sidebar.
- To use a Secret, simply create a [Modal Secret](https://modal.com/secrets) using our wizard and attach it to the notebook, so it can be injected as an environment variable automatically.
- To use a Volume, create a [Modal Volume](https://modal.com/docs/guide/volumes) and attach it to the notebook. This lets you mount high-performance, persistent storage that can be shared across multiple notebooks or functions. They will appear as folders in the `/mnt` directory by default (see the sketch below).
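
For example, once a Volume is attached, anything written under its mount point persists across kernel restarts. A minimal sketch, where the `my-volume` name is hypothetical:

```python notest
# Assumes a Volume was attached in the sidebar under the hypothetical
# name "my-volume", so it appears at /mnt/my-volume in the notebook.
with open("/mnt/my-volume/results.csv", "w") as f:
    f.write("epoch,loss\n1,0.25\n")
```
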

### Creating a Custom Image

If you don't have a suitable deployed Modal App already, you can set up your environment to deploy custom images in under a minute using the Modal CLI. First, run `pip install modal`, and define your image in a file like:

```python
import modal

# Image definition here:
image = (
    modal.Image.from_registry("python:3.13-slim")
    .pip_install("requests", "numpy")
    .apt_install("curl", "wget")
    .run_commands(
        "echo 'foo' > /root/hello.txt",
        # ... other commands
    )
)

app = modal.App("notebook-images")

@app.function(image=image)  # You need a Function object to reference the image.
def notebook_image():
    pass
```

Then, run this command to build and deploy the image:

```bash
modal deploy notebook_images.py
```

For more information on custom images in Modal, see our [guide on defining images](https://modal.com/docs/guide/images).

(Advanced) Note that if you use the [`add_local_file()` or `add_local_dir()` functions](https://modal.com/docs/guide/images#add-local-files-with-add_local_dir-and-add_local_file), you'll need to pass `copy=True` for them to work in Modal Notebooks. This is because they skip creating a custom image and instead mount the files into the function at startup, which won't work in notebooks.

### Creating a Secret

Secrets can be created from the dashboard at [modal.com/secrets](https://modal.com/secrets). We have templates for common credential types, and they are saved as encrypted objects until container startup.

Attached secrets become available as environment variables in your notebook.

### Creating a Volume

[Volumes](https://modal.com/docs/guide/volumes) can be created via the files panel on the filesystem tab. This panel can also be used to attach existing Volumes from your Apps or Functions, including those created via the Modal CLI.

Any volumes are attached in the `/mnt` folder in your notebook, and files saved there will be persisted across kernel startups and elsewhere on Modal.

### Mounting Cloud Buckets

Modal Notebooks now support mounting cloud storage buckets, initially S3 buckets, directly to your notebook filesystem. This allows you to easily access large datasets stored in cloud storage from your notebooks.

To mount an S3 bucket:

1. Create a [Modal Secret](https://modal.com/secrets) containing your AWS credentials (AWS Access Key ID and Secret Access Key)
2. In the notebook sidebar's Files panel, use the Cloud Buckets section to attach your bucket
3. Specify:
   - The S3 bucket name
   - Mount path (e.g., `/mnt/s3/my-data`)
   - The AWS credentials secret stored in that environment
   - Optional: A key prefix to mount only a subset of objects (e.g., `datasets/`)
   - Optional: Set the mount as read-only

Once attached, your S3 bucket will be mounted at the specified path and accessible just like any other directory in your notebook.

For more information on using cloud bucket mounts with Modal, see the [CloudBucket mounts guide](https://modal.com/docs/guide/cloud-bucket-mounts).

## Access and sharing

Need a colleague—or the whole internet—to see your work? Just click **Share** in the top‑right corner of the notebook editor.

Notebooks are editable by you and teammates in your workspace. To make the notebook view-only to collaborators, the creator of the notebook can change access settings in the "Share" menu. Workspace managers are also allowed to change this setting.
+ +You can also turn on sharing by public, unlisted link. If you toggle this, it allows _anyone with the link_ to open the notebook, even if they are not logged in. Pick **Can view** (default) or **Can view and run** based on your preference. Viewers don’t need a Modal account, so this is perfect for collaborating with stakeholders outside your workspace. + +No matter how the notebook is shared, anyone with access can fork and run their own version of it. + +## Interactive file viewer + +The panel on the left-hand side of the notebook shows a **live view of the container’s filesystem**: + +| Feature | Details | +| ----------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| **Browse & preview** | Click through folders to inspect any file that your code has created or downloaded. | +| **Upload & download** | Drag-and-drop files from your desktop, or click the **⬆** / **⬇** icons to add new data sets, notebooks, or models—or to save results back to your machine. | +| **One-click refresh** | Changes made by your code (for example, writing a CSV) appear instantly; hit the refresh icon if you want to force an update. | +| **Context-aware paths** | The viewer always reflects _exactly_ what your code sees (e.g. `/root`, `/mnt/…`), so you can double-check that that file you just wrote really landed where you expected. | + +**Important:** the underlying container is **ephemeral**. Anything stored outside an attached [Volume](https://modal.com/docs/guide/volumes) disappears when the kernel shuts down (after your idle-timeout or when you hit **Stop kernel**). Mount a Volume for data you want to keep across sessions. + +The viewer itself is only active while the kernel is running—if the notebook is stopped you’ll see an “empty” state until you start it again. + +## Editor features + +Modal Notebooks bundle the same productivity tooling you’d expect from a modern IDE. + +With Pyright, you get autocomplete, signature help, and on-hover documentation for every installed library. + +We also implemented AI-powered code completion using Anthropic's **Claude 4** model. This keeps you in the flow for everything from small snippets to multi-line functions. Just press `Tab` to accept suggestions or `Esc` to dismiss them. + +Familiar Jupyter shortcuts (`A`, `B`, `X`, `Y`, `M`, etc.) all work within the notebook, so you can quickly add new cells, delete existing ones, or change cell types. + +Finally, we have real-time collaborative editing, so you can work with your team in the same notebook. You can see other users' cursors and edits in real-time, and you can see when others are running cells with you. This makes it easy to pair program or review code together. + +## Widgets + +Modal Notebooks support [Jupyter Widgets](https://ipywidgets.readthedocs.io/en/latest/), which can be used to create interactive components living in the browser. Currently, Notebooks support all the widgets in the base `ipywidgets` package, except the following: + +- Media Widgets (`Audio`, `Video`), try using `IPython.display` outputs instead. +- `Play` +- Controllers (`ControllerAxis`, `ControllerButton`, `Controller`) + +Modal Notebooks do not support custom widget packages. + +## Cell magic + +Modal Notebooks have built-in support for the `%modal` cell magic. 
This lets you run code in any [deployed Modal Function or Cls](https://modal.com/docs/guide/trigger-deployed-functions), right from your notebook.

For example, if you have previously run `modal deploy` for an app like:

```python notest
import modal

app = modal.App("my-app")

@app.function()
def my_function(s: str):
    return len(s)
```

Then you could access this function from your notebook:

```python notest
%modal from my-app import my_function

my_function.remote("hello, world!")  # returns 13
```

Run `%modal` to see all options. This works for Cls as well, and you can import from different environments or alias them with the `as` keyword.

## Roadmap

The product is in beta, and we're planning to make a lot of improvements over the coming months. Some bigger features we have in mind:

- **Modal cloud integrations**
  - Expose ports with [Tunnels](https://modal.com/docs/guide/tunnels)
  - Memory snapshots to restore from past notebook sessions
  - Create notebooks from the `modal` CLI
  - Custom image registry
- **Notebook editor**
  - Interactive outline, collapsing sections by headings
  - Reactive cell execution
  - Edit history
  - Integrated debugger (pdb and `%debug`)
- **Documents and sharing**
  - Restore recently deleted notebooks
  - Folders and tags for grouping notebooks
  - Sync with Git repositories

Let us know via [Slack](https://modal.com/slack) if you have any feedback.

### Secrets and environment variables

#### Secrets

# Secrets

Securely provide credentials and other sensitive information to your Modal Functions with Secrets.

You can create and edit Secrets via
the [dashboard](https://modal.com/secrets),
the command line interface ([`modal secret`](https://modal.com/docs/reference/cli/secret)), and
programmatically from Python code ([`modal.Secret`](https://modal.com/docs/reference/modal.Secret)).

To inject Secrets into the container running your Function, add the
`secrets=[...]` argument to your `app.function` or `app.cls` decoration.

## Deploy Secrets from the Modal Dashboard

The most common way to create a Modal Secret is to use the
[Secrets panel of the Modal dashboard](https://modal.com/secrets),
which also shows any existing Secrets.

When you create a new Secret, you'll be prompted with a number of templates to help you get started.
These templates demonstrate standard formats for credentials for everything from Postgres and MongoDB
to Weights & Biases and Hugging Face.

## Use Secrets in your Modal Apps

You can then use your Secret by constructing it `from_name` when defining a Modal App
and then accessing its contents as environment variables.
For example, if you have a Secret called `secret-keys` containing the key
`MY_PASSWORD`:

```python
@app.function(secrets=[modal.Secret.from_name("secret-keys")])
def some_function():
    import os

    secret_key = os.environ["MY_PASSWORD"]
    ...
```

Each Secret can contain multiple keys and values but you can also inject
multiple Secrets, allowing you to separate Secrets into smaller reusable units:

```python
@app.function(secrets=[
    modal.Secret.from_name("my-secret-name"),
    modal.Secret.from_name("other-secret"),
])
def other_function():
    ...
```

The Secrets are applied in order, so key-values from later `modal.Secret`
objects in the list will overwrite earlier key-values in the case of a clash.
For example, if both `modal.Secret` objects above contained the key `FOO`, then
the value from `"other-secret"` would always be present in `os.environ["FOO"]`.

## Create Secrets programmatically

In addition to defining Secrets on the web dashboard, you can
programmatically create a Secret directly in your script and send it along to
your Function using `Secret.from_dict(...)`. This can be useful if you want to
send Secrets from your local development machine to the remote Modal App.

```python
import os

if modal.is_local():
    local_secret = modal.Secret.from_dict({"FOO": os.environ["LOCAL_FOO"]})
else:
    local_secret = modal.Secret.from_dict({})

@app.function(secrets=[local_secret])
def some_function():
    import os

    print(os.environ["FOO"])
```

If you have [`python-dotenv`](https://pypi.org/project/python-dotenv/) installed,
you can also use `Secret.from_dotenv()` to create a Secret from the variables in a `.env`
file:

```python
@app.function(secrets=[modal.Secret.from_dotenv()])
def some_other_function():
    print(os.environ["USERNAME"])
```

## Interact with Secrets from the command line

You can create, list, and delete your Modal Secrets with the `modal secret` command line interface.

View your Secrets and their timestamps with

```bash
modal secret list
```

Create a new Secret by passing `{KEY}={VALUE}` pairs to `modal secret create`:

```bash
modal secret create database-secret PGHOST=uri PGPORT=5432 PGUSER=admin PGPASSWORD=hunter2
```

or using environment variables (assuming below that the `PGPASSWORD` environment variable is set
e.g. by your CI system):

```bash
modal secret create database-secret PGHOST=uri PGPORT=5432 PGUSER=admin PGPASSWORD="$PGPASSWORD"
```

Remove Secrets by passing their name to `modal secret delete`:

```bash
modal secret delete database-secret
```

#### Environment variables

# Environment variables

The Modal runtime sets several environment variables during initialization. The
keys for these environment variables are reserved and cannot be overridden by
your Function or Sandbox configuration.

These variables provide information about the container's runtime
environment.

## Container runtime environment variables

The following variables are present in every Modal container:

- **`MODAL_CLOUD_PROVIDER`** — Modal executes containers across a number of cloud
  providers ([AWS](https://aws.amazon.com/), [GCP](https://cloud.google.com/),
  [OCI](https://www.oracle.com/cloud/)). This variable specifies which cloud
  provider the Modal container is running within.
- **`MODAL_IMAGE_ID`** — The ID of the
  [`modal.Image`](https://modal.com/docs/reference/modal.Image) used by the Modal container.
- **`MODAL_REGION`** — This will correspond to a geographic area identifier from
  the cloud provider associated with the Modal container (see above). For AWS, the
  identifier is a "region". For GCP it is a "zone", and for OCI it is an
  "availability domain". Example values are `us-east-1` (AWS), `us-central1`
  (GCP), `us-ashburn-1` (OCI). See the [full list here](https://modal.com/docs/guide/region-selection#region-options).
- **`MODAL_TASK_ID`** — The ID of the container running the Modal Function or Sandbox.

## Function runtime environment variables

The following variables are present in containers running Modal Functions:

- **`MODAL_ENVIRONMENT`** — The name of the
  [Modal Environment](https://modal.com/docs/guide/environments) the container is running within.
- **`MODAL_IS_REMOTE`** — Set to `1` to indicate that Modal Function code is running in
  a remote container.
- **`MODAL_IDENTITY_TOKEN`** — An [OIDC token](https://modal.com/docs/guide/oidc-integration)
  encoding the identity of the Modal Function.

## Sandbox environment variables

The following variables are present within [`modal.Sandbox`](https://modal.com/docs/reference/modal.Sandbox) instances.

- **`MODAL_SANDBOX_ID`** — The ID of the Sandbox.

## Container image environment variables

The container image layers used by a `modal.Image` may set
environment variables. These variables will be present within your container's runtime
environment. For example, the
[`debian_slim`](https://modal.com/docs/reference/modal.Image#debian_slim) image sets the
`GPG_KEY` variable.

To override image variables or set new ones, use the
[`.env`](https://modal.com/docs/reference/modal.Image#env) method provided by
`modal.Image`.

### Scheduling and cron jobs

# Scheduling remote cron jobs

A common requirement is to perform some task at a given time every day or week
automatically. Modal facilitates this through function schedules.

## Basic scheduling

Let's say we have a Python module `heavy.py` with a function,
`perform_heavy_computation()`.

```python
# heavy.py
def perform_heavy_computation():
    ...

if __name__ == "__main__":
    perform_heavy_computation()
```

To schedule this function to run once per day, we create a Modal App and attach
our function to it with the `@app.function` decorator and a schedule parameter:

```python
# heavy.py
import modal

app = modal.App()

@app.function(schedule=modal.Period(days=1))
def perform_heavy_computation():
    ...
```

To activate the schedule, deploy your app, either through the CLI:

```shell
modal deploy --name daily_heavy heavy.py
```

Or programmatically:

```python
if __name__ == "__main__":
    app.deploy()
```

Now the function will run every day, at the time of the initial deployment,
without any further interaction on your part.

When you make changes to your function, just rerun the deploy command to
overwrite the old deployment.

Note that when you redeploy your function, the `modal.Period` schedule resets,
and the next run will take place one full period after the most recent
deployment (e.g., 24 hours later for `Period(days=1)`).

If you want to run your function at a regular schedule not disturbed by deploys,
`modal.Cron` (see below) is a better option.

## Monitoring your scheduled runs

To see past execution logs for the scheduled function, go to the
[Apps](https://modal.com/apps) section on the Modal website.

Schedules currently cannot be paused. Instead, the schedule should be removed and
the app redeployed. Schedules can be started manually on the app's dashboard
page, using the "run now" button.

## Schedule types

There are two kinds of base schedule values -
[`modal.Period`](https://modal.com/docs/reference/modal.Period) and
[`modal.Cron`](https://modal.com/docs/reference/modal.Cron).

[`modal.Period`](https://modal.com/docs/reference/modal.Period) lets you specify an interval
between function calls, e.g. `Period(days=1)` or `Period(hours=5)`:

```python
# runs once every 5 hours
@app.function(schedule=modal.Period(hours=5))
def perform_heavy_computation():
    ...
```

[`modal.Cron`](https://modal.com/docs/reference/modal.Cron) gives you finer control using
[cron](https://en.wikipedia.org/wiki/Cron) syntax:

```python
# runs at 8 am (UTC) every Monday
@app.function(schedule=modal.Cron("0 8 * * 1"))
def perform_heavy_computation():
    ...

# runs daily at 6 am (New York time)
@app.function(schedule=modal.Cron("0 6 * * *", timezone="America/New_York"))
def send_morning_report():
    ...
```

For more details, see the API reference for
[Period](https://modal.com/docs/reference/modal.Period), [Cron](https://modal.com/docs/reference/modal.Cron), and
[Function](https://modal.com/docs/reference/modal.Function).

### Web endpoints

#### Web endpoints

# Web endpoints

This guide explains how to set up web endpoints with Modal.

All deployed Modal Functions can be [invoked from any other Python application](https://modal.com/docs/guide/trigger-deployed-functions)
using the Modal client library. We additionally provide multiple ways to expose
your Functions over the web for non-Python clients.

You can [turn any Python function into a web endpoint](#simple-endpoints) with a single line
of code, you can [serve a full app](#serving-asgi-and-wsgi-apps) using
frameworks like FastAPI, Django, or Flask, or you can
[serve anything that speaks HTTP and listens on a port](#non-asgi-web-servers).

Below we walk through each method, assuming you're familiar with web applications outside of Modal.
For a detailed walkthrough of basic web endpoints on Modal aimed at developers new to web applications,
see [this tutorial](https://modal.com/docs/examples/basic_web).

## Simple endpoints

The easiest way to create a web endpoint from an existing Python function is to use the
[`@modal.fastapi_endpoint` decorator](https://modal.com/docs/reference/modal.fastapi_endpoint).

```python
image = modal.Image.debian_slim().pip_install("fastapi[standard]")

@app.function(image=image)
@modal.fastapi_endpoint()
def f():
    return "Hello world!"
```

This decorator wraps the Modal Function in a
[FastAPI application](#how-do-web-endpoints-run-in-the-cloud).

_Note: Prior to v0.73.82, this function was named `@modal.web_endpoint`_.

### Developing with `modal serve`

You can run this code as an ephemeral app by running the command

```shell
modal serve server_script.py
```

where `server_script.py` is the file name of your code. This will create an
ephemeral app for the duration of your script (until you hit Ctrl-C to stop it).
It creates a temporary URL that you can use like any other REST endpoint. This
URL is on the public internet.

The `modal serve` command will live-update an app when any of its supporting
files change.

Live updating is particularly useful when working with apps containing web
endpoints, as any changes made to web endpoint handlers will show up almost
immediately, without requiring a manual restart of the app.

### Deploying with `modal deploy`

You can also deploy your app and create a persistent web endpoint in the cloud
by running `modal deploy` on the same file.

### Passing arguments to an endpoint

When using `@modal.fastapi_endpoint`, you can add
[query parameters](https://fastapi.tiangolo.com/tutorial/query-params/) which
will be passed to your Function as arguments.
For instance + +```python +image = modal.Image.debian_slim().pip_install("fastapi[standard]") + +@app.function(image=image) +@modal.fastapi_endpoint() +def square(x: int): + return {"square": x**2} +``` + +If you hit this with a URL-encoded query string with the `x` parameter present, +the Function will receive the value as an argument: + +``` +$ curl https://modal-labs--web-endpoint-square-dev.modal.run?x=42 +{"square":1764} +``` + +If you want to use a `POST` request, you can use the `method` argument to +`@modal.fastapi_endpoint` to set the HTTP verb. To accept any valid JSON object, +[use `dict` as your type annotation](https://fastapi.tiangolo.com/tutorial/body-nested-models/?h=dict#bodies-of-arbitrary-dicts) +and FastAPI will handle the rest. + +```python +image = modal.Image.debian_slim().pip_install("fastapi[standard]") + +@app.function(image=image) +@modal.fastapi_endpoint(method="POST") +def square(item: dict): + return {"square": item['x']**2} +``` + +This now creates an endpoint that takes a JSON body: + +``` +$ curl -X POST -H 'Content-Type: application/json' --data-binary '{"x": 42}' https://modal-labs--web-endpoint-square-dev.modal.run +{"square":1764} +``` + +This is often the easiest way to get started, but note that FastAPI recommends +that you use +[typed Pydantic models](https://fastapi.tiangolo.com/tutorial/body/) in order to +get automatic validation and documentation. FastAPI also lets you pass data to +web endpoints in other ways, for instance as +[form data](https://fastapi.tiangolo.com/tutorial/request-forms/) and +[file uploads](https://fastapi.tiangolo.com/tutorial/request-files/). + +## How do web endpoints run in the cloud? + +Note that web endpoints, like everything else on Modal, only run when they need +to. When you hit the web endpoint the first time, it will boot up the container, +which might take a few seconds. Modal keeps the container alive for a short +period in case there are subsequent requests. If there are a lot of requests, +Modal might create more containers running in parallel. + +For the shortcut `@modal.fastapi_endpoint` decorator, Modal wraps your function in a +[FastAPI](https://fastapi.tiangolo.com/) application. This means that the +[Image](https://modal.com/docs/guide/images) +your Function uses must have FastAPI installed, and the Functions that you write +need to follow its request and response +[semantics](https://fastapi.tiangolo.com/tutorial). Web endpoint Functions can use +all of FastAPI's powerful features, such as Pydantic models for automatic validation, +typed query and path parameters, and response types. + +Here's everything together, combining Modal's abilities to run functions in +user-defined containers with the expressivity of FastAPI: + +```python +import modal +from fastapi.responses import HTMLResponse +from pydantic import BaseModel + +image = modal.Image.debian_slim().pip_install("fastapi[standard]", "boto3") +app = modal.App(image=image) + +class Item(BaseModel): + name: str + qty: int = 42 + +@app.function() +@modal.fastapi_endpoint(method="POST") +def f(item: Item): + import boto3 + # do things with boto3... 
    return HTMLResponse(f"Hello, {item.name}!")
```

This endpoint definition would be called like so:

```bash
curl -d '{"name": "Erik", "qty": 10}' \
    -H "Content-Type: application/json" \
    -X POST https://ecorp--web-demo-f-dev.modal.run
```

Or in Python with the [`requests`](https://pypi.org/project/requests/) library:

```python
import requests

data = {"name": "Erik", "qty": 10}
requests.post("https://ecorp--web-demo-f-dev.modal.run", json=data, timeout=10.0)
```

## Serving ASGI and WSGI apps

You can also serve any app written in an
[ASGI](https://asgi.readthedocs.io/en/latest/) or
[WSGI](https://en.wikipedia.org/wiki/Web_Server_Gateway_Interface)-compatible
web framework on Modal.

ASGI provides support for async web frameworks. WSGI provides support for
synchronous web frameworks.

### ASGI apps - FastAPI, FastHTML, Starlette

For ASGI apps, you can create a function decorated with
[`@modal.asgi_app`](https://modal.com/docs/reference/modal.asgi_app) that returns a reference to
your web app:

```python
image = modal.Image.debian_slim().pip_install("fastapi[standard]")

@app.function(image=image)
@modal.concurrent(max_inputs=100)
@modal.asgi_app()
def fastapi_app():
    from fastapi import FastAPI, Request

    web_app = FastAPI()

    @web_app.post("/echo")
    async def echo(request: Request):
        body = await request.json()
        return body

    return web_app
```

Now, as before, when you deploy this script as a Modal App, you get a URL for
your app that you can hit.

The `@modal.concurrent` decorator enables a single container
to process multiple inputs at once, taking advantage of the asynchronous
event loops in ASGI applications. See [this guide](https://modal.com/docs/guide/concurrent-inputs)
for details.

#### ASGI Lifespan

While we recommend using [`@modal.enter`](https://modal.com/docs/guide/lifecycle-functions#enter) for defining container lifecycle hooks, we also support the [ASGI lifespan protocol](https://asgi.readthedocs.io/en/latest/specs/lifespan.html). Lifespans begin when containers start, typically at the time of the first request. Here's an example using [FastAPI](https://fastapi.tiangolo.com/advanced/events/#lifespan):

```python
import modal

app = modal.App("fastapi-lifespan-app")

image = modal.Image.debian_slim().pip_install("fastapi[standard]")

@app.function(image=image)
@modal.asgi_app()
def fastapi_app_with_lifespan():
    from contextlib import asynccontextmanager

    from fastapi import FastAPI, Request

    @asynccontextmanager
    async def lifespan(wapp: FastAPI):
        print("Starting")
        yield
        print("Shutting down")

    web_app = FastAPI(lifespan=lifespan)

    @web_app.get("/")
    async def hello(request: Request):
        return "hello"

    return web_app
```

### WSGI apps - Django, Flask

You can serve WSGI apps using the
[`@modal.wsgi_app`](https://modal.com/docs/reference/modal.wsgi_app) decorator:

```python
image = modal.Image.debian_slim().pip_install("flask")

@app.function(image=image)
@modal.concurrent(max_inputs=100)
@modal.wsgi_app()
def flask_app():
    from flask import Flask, request

    web_app = Flask(__name__)

    @web_app.post("/echo")
    def echo():
        return request.json

    return web_app
```

See [Flask's docs](https://flask.palletsprojects.com/en/2.1.x/deploying/asgi/)
for more information on using Flask as a WSGI app.

Because WSGI apps are synchronous, concurrent inputs will be run on separate
threads. See [this guide](https://modal.com/docs/guide/concurrent-inputs) for details.
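
Although the example above uses Flask, Django follows the same pattern: return the
project's WSGI application object from the decorated Function. Here's a minimal
sketch, assuming a hypothetical Django project package `mysite` (containing a
`mysite/settings.py`) that has been added to the Image with `add_local_dir`:

```python
image = (
    modal.Image.debian_slim()
    .pip_install("django")
    .add_local_dir("mysite", remote_path="/root/mysite")
)

@app.function(image=image)
@modal.concurrent(max_inputs=100)
@modal.wsgi_app()
def django_app():
    import os
    import sys

    sys.path.insert(0, "/root")  # make the `mysite` package importable
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "mysite.settings")

    from django.core.wsgi import get_wsgi_application
    return get_wsgi_application()
```
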
## Non-ASGI web servers

Not all web frameworks offer an ASGI or WSGI interface. For example,
[`aiohttp`](https://docs.aiohttp.org/) and [`tornado`](https://www.tornadoweb.org/)
use their own asynchronous network binding, while others like
[`text-generation-inference`](https://github.com/huggingface/text-generation-inference)
actually expose a Rust-based HTTP server running as a subprocess.

For these cases, you can use the
[`@modal.web_server`](https://modal.com/docs/reference/modal.web_server) decorator to "expose" a
port on the container:

```python
@app.function()
@modal.concurrent(max_inputs=100)
@modal.web_server(8000)
def my_file_server():
    import subprocess
    subprocess.Popen("python -m http.server -d / 8000", shell=True)
```

Just like all web endpoints on Modal, this is only run on-demand. The function
is executed on container startup, creating a file server at the root directory.
When you hit the web endpoint URL, your request will be routed to the file
server listening on port `8000`.

For `@web_server` endpoints, you need to make sure that the application binds to
the external network interface, not just localhost. This usually means binding
to `0.0.0.0` instead of `127.0.0.1`.

See our examples of how to serve [Streamlit](https://modal.com/docs/examples/serve_streamlit) and
[ComfyUI](https://modal.com/docs/examples/comfyapp) on Modal.

## Serve many configurations with parametrized functions

Python functions that launch ASGI/WSGI apps or web servers on Modal
cannot take arguments.

One simple pattern for allowing client-side configuration of these web endpoints
is to use [parametrized functions](https://modal.com/docs/guide/parametrized-functions).
Each different choice for the values of the parameters will create a distinct
auto-scaling container pool.

```python
@app.cls()
@modal.concurrent(max_inputs=100)
class Server:
    root: str = modal.parameter(default=".")

    @modal.web_server(8000)
    def files(self):
        import subprocess
        subprocess.Popen(f"python -m http.server -d {self.root} 8000", shell=True)
```

The values are provided in URLs as query parameters:

```bash
curl https://ecorp--server-files.modal.run  # use the default value
curl https://ecorp--server-files.modal.run?root=.cache  # use a different value
curl https://ecorp--server-files.modal.run?root=%2F  # don't forget to URL encode!
```

For details, see [this guide to parametrized functions](https://modal.com/docs/guide/parametrized-functions).

## WebSockets

Functions annotated with `@web_server`, `@asgi_app`, or `@wsgi_app` also support
the WebSocket protocol. Consult your web framework's documentation for how to
use WebSockets with it.

WebSockets on Modal maintain a single function call per connection, which can be
useful for keeping state around. Most of the time, you will want to set your
handler function to [allow concurrent inputs](https://modal.com/docs/guide/concurrent-inputs),
which allows multiple simultaneous WebSocket connections to be handled by the
same container.

We support the full WebSocket protocol as per
[RFC 6455](https://www.rfc-editor.org/rfc/rfc6455), but we do not yet have
support for [RFC 8441](https://www.rfc-editor.org/rfc/rfc8441) (WebSockets over
HTTP/2) or [RFC 7692](https://datatracker.ietf.org/doc/html/rfc7692)
(`permessage-deflate` extension). WebSocket messages can be up to 2 MiB each.
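
For example, here's a minimal sketch of a WebSocket echo endpoint using FastAPI's
WebSocket support with the `@modal.asgi_app` pattern from above (the `/ws` route
name and echo behavior are illustrative, not from the Modal docs):

```python
image = modal.Image.debian_slim().pip_install("fastapi[standard]")

@app.function(image=image)
@modal.concurrent(max_inputs=100)  # allow many connections per container
@modal.asgi_app()
def websocket_app():
    from fastapi import FastAPI, WebSocket, WebSocketDisconnect

    web_app = FastAPI()

    @web_app.websocket("/ws")
    async def echo(websocket: WebSocket):
        await websocket.accept()
        try:
            # Each connection holds one function call; loop until the client leaves.
            while True:
                message = await websocket.receive_text()
                await websocket.send_text(f"echo: {message}")
        except WebSocketDisconnect:
            pass

    return web_app
```
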
## Performance and scaling

If you have no active containers when the web endpoint receives a request, it will
experience a "cold start". Consult the guide page on
[cold start performance](https://modal.com/docs/guide/cold-start) for more information on when
Functions will cold start and advice on how to mitigate the impact.

If your Function uses `@modal.concurrent`, multiple requests to the same
endpoint may be handled by the same container, up to its configured input
concurrency. Beyond that limit, additional
containers will start up to scale your App horizontally. When you reach the
Function's limit on containers, requests will queue for handling.

Each workspace on Modal has a rate limit on total operations. For a new account,
this is set to 200 function inputs or web endpoint requests per second, with a
burst multiplier of 5 seconds. If you reach the rate limit, excess requests to
web endpoints will return a
[429 status code](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429),
and you'll need to [get in touch](mailto:support@modal.com) with us about
raising the limit.

Web endpoint request bodies can be up to 4 GiB, and their response bodies are
unlimited in size.

## Authentication

Modal offers first-class web endpoint protection via [proxy auth tokens](https://modal.com/docs/guide/webhook-proxy-auth).
Proxy auth tokens protect web endpoints by requiring a key and token combination to be passed
in the `Modal-Key` and `Modal-Secret` headers.
Modal works as a proxy, rejecting requests that aren't authorized to access
your endpoint.

We also support standard techniques for securing web servers.

### Token-based authentication

This is easy to implement in whichever framework you're using. For example, if
you're using `@modal.fastapi_endpoint` or `@modal.asgi_app` with FastAPI, you
can validate a Bearer token like this:

```python
from fastapi import Depends, HTTPException, status, Request
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials

import modal

image = modal.Image.debian_slim().pip_install("fastapi[standard]")
app = modal.App("auth-example", image=image)

auth_scheme = HTTPBearer()

@app.function(secrets=[modal.Secret.from_name("my-web-auth-token")])
@modal.fastapi_endpoint()
async def f(request: Request, token: HTTPAuthorizationCredentials = Depends(auth_scheme)):
    import os

    print(os.environ["AUTH_TOKEN"])

    if token.credentials != os.environ["AUTH_TOKEN"]:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Incorrect bearer token",
            headers={"WWW-Authenticate": "Bearer"},
        )

    # Function body
    return "success!"
```

This assumes you have a [Modal Secret](https://modal.com/secrets) named
`my-web-auth-token` created, with contents `{AUTH_TOKEN: secret-random-token}`.
Now, your endpoint will return a 401 status code except when you hit it with the
correct `Authorization` header set (note that you have to prefix the token with
`Bearer `):

```bash
curl --header "Authorization: Bearer secret-random-token" https://modal-labs--auth-example-f.modal.run
```
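
The same authorized request can be made from Python with the
[`requests`](https://pypi.org/project/requests/) library; a minimal sketch,
reusing the placeholder token from above:

```python
import requests

response = requests.get(
    "https://modal-labs--auth-example-f.modal.run",
    headers={"Authorization": "Bearer secret-random-token"},
    timeout=10.0,
)
response.raise_for_status()
print(response.json())
```
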
### Client IP address

You can access the IP address of the client making the request. This can be used
for geolocation, whitelists, blacklists, and rate limits.

```python
from fastapi import Request

import modal

image = modal.Image.debian_slim().pip_install("fastapi[standard]")
app = modal.App(image=image)

@app.function()
@modal.fastapi_endpoint()
def get_ip_address(request: Request):
    return f"Your IP address is {request.client.host}"
```

#### Streaming endpoints

# Streaming endpoints

Modal web endpoints support streaming responses using FastAPI's
[`StreamingResponse`](https://fastapi.tiangolo.com/advanced/custom-response/#streamingresponse)
class. This class accepts asynchronous generators, synchronous generators, or
any Python object that implements the
[_iterator protocol_](https://docs.python.org/3/library/stdtypes.html#typeiter),
and can be used with Modal Functions!

## Simple example

This simple example combines Modal's `@modal.fastapi_endpoint` decorator with a
`StreamingResponse` object to produce a real-time SSE response.

```python
import time

def fake_event_streamer():
    for i in range(10):
        yield f"data: some data {i}\n\n".encode()
        time.sleep(0.5)

@app.function(image=modal.Image.debian_slim().pip_install("fastapi[standard]"))
@modal.fastapi_endpoint()
def stream_me():
    from fastapi.responses import StreamingResponse
    return StreamingResponse(
        fake_event_streamer(), media_type="text/event-stream"
    )
```

If you serve this web endpoint and hit it with `curl`, you will see the ten SSE
events progressively appear in your terminal over a ~5 second period.

```shell
curl --no-buffer https://modal-labs--example-streaming-stream-me.modal.run
```

The MIME type of `text/event-stream` is important in this example: it tells the
downstream web server to pass responses along immediately rather than buffering
them into larger byte chunks (which is more efficient for compression but would
break real-time delivery).

You can still return other content types like large files in streams, but they
are not guaranteed to arrive as real-time events.

## Streaming responses with `.remote_gen`

A Modal Function wrapping a generator function body can have its response passed
directly into a `StreamingResponse`. This is particularly useful if you want to
do some GPU processing in one Modal Function that is called by a CPU-based web
endpoint Modal Function.

```python
@app.function(gpu="any")
def fake_video_render():
    for i in range(10):
        yield f"data: finished processing some data from GPU {i}\n\n".encode()
        time.sleep(1)

@app.function(image=modal.Image.debian_slim().pip_install("fastapi[standard]"))
@modal.fastapi_endpoint()
def hook():
    from fastapi.responses import StreamingResponse
    return StreamingResponse(
        fake_video_render.remote_gen(), media_type="text/event-stream"
    )
```

## Streaming responses with `.map` and `.starmap`

You can also combine Modal Function parallelization with streaming responses,
enabling applications to service a request by farming out to dozens of
containers and iteratively returning result chunks to the client.

```python
@app.function()
def map_me(i):
    return f"segment {i}\n"

@app.function(image=modal.Image.debian_slim().pip_install("fastapi[standard]"))
@modal.fastapi_endpoint()
def mapped():
    from fastapi.responses import StreamingResponse
    return StreamingResponse(map_me.map(range(10)), media_type="text/plain")
```

This snippet will spread the ten `map_me(i)` executions across containers, and
return each string response part as it completes. By default, the results will be
ordered; if that isn't necessary, pass `order_outputs=False` as a keyword
argument to the `.map` call.
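
`.starmap` works the same way for Functions that take multiple arguments,
unpacking each tuple from the input iterable. Here's a minimal sketch along the
same lines; the two-argument `render_segment` Function is hypothetical:

```python
@app.function()
def render_segment(prefix: str, i: int):
    return f"{prefix} {i}\n"

@app.function(image=modal.Image.debian_slim().pip_install("fastapi[standard]"))
@modal.fastapi_endpoint()
def starmapped():
    from fastapi.responses import StreamingResponse

    args = [("segment", i) for i in range(10)]
    return StreamingResponse(render_segment.starmap(args), media_type="text/plain")
```
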

### Asynchronous streaming

The example above uses a synchronous generator, which automatically runs on its
own thread. In asynchronous applications, however, a loop over a `.map` or `.starmap`
call can block the event loop. This will stop the `StreamingResponse` from
returning response parts iteratively to the client.

To avoid this, you can use the `.aio()` method to convert a synchronous `.map`
into its async version. Also, other blocking calls should be offloaded to a
separate thread with `asyncio.to_thread()`. For example:

```python
import asyncio

# A hypothetical GPU Function that transcribes one video segment;
# `split_video` below is likewise a stand-in for your own segmentation logic.
@app.function(gpu="any")
def transcribe_segment(segment):
    ...

@app.function(image=modal.Image.debian_slim().pip_install("fastapi[standard]"))
@modal.fastapi_endpoint()
async def transcribe_video(request):
    from fastapi.responses import StreamingResponse

    segments = await asyncio.to_thread(split_video, request)
    return StreamingResponse(wrapper(segments), media_type="text/event-stream")

# Notice that this is an async generator.
async def wrapper(segments):
    async for partial_result in transcribe_segment.map.aio(segments):
        yield "data: " + partial_result + "\n\n"
```

## Further examples

- Complete code for the simple examples given above is available
  [in our modal-examples GitHub repository](https://github.com/modal-labs/modal-examples/blob/main/07_web_endpoints/streaming.py).
- [An end-to-end example of streaming YouTube video transcriptions with OpenAI's Whisper model.](https://github.com/modal-labs/modal-examples/blob/main/06_gpu_and_ml/openai_whisper/streaming/main.py)

#### Web endpoint URLs

# Web endpoint URLs

This guide documents the behavior of URLs for [web endpoints](https://modal.com/docs/guide/webhooks)
on Modal: automatic generation, configuration, programmatic retrieval, and more.

## Determine the URL of a web endpoint from code

Modal Functions with the
[`fastapi_endpoint`](https://modal.com/docs/reference/modal.fastapi_endpoint),
[`asgi_app`](https://modal.com/docs/reference/modal.asgi_app),
[`wsgi_app`](https://modal.com/docs/reference/modal.wsgi_app),
or [`web_server`](https://modal.com/docs/reference/modal.web_server) decorator
are made available over the Internet when they are
[`serve`d](https://modal.com/docs/reference/cli/serve) or [`deploy`ed](https://modal.com/docs/reference/cli/deploy),
and so they have a URL.

This URL is displayed in the `modal` CLI output
and is available in the Modal [dashboard](https://modal.com/apps) for the Function.

To determine a Function's URL programmatically,
call its [`get_web_url()`](https://modal.com/docs/reference/modal.Function#get_web_url)
method:

```python
@app.function(image=modal.Image.debian_slim().pip_install("fastapi[standard]"))
@modal.fastapi_endpoint(docs=True)
def show_url() -> str:
    return show_url.get_web_url()
```

For deployed Functions, this also works from other Python code!
You just need to look the Function up with [`from_name`](https://modal.com/docs/reference/modal.Function#from_name),
using the name of the Function and its [App](https://modal.com/docs/guide/apps):

```python notest
import modal
import requests

remote_function = modal.Function.from_name("app", "show_url")
assert remote_function.get_web_url() == requests.get(remote_function.get_web_url()).json()
```

## Auto-generated URLs

By default, Modal Functions
will be served from the `modal.run` domain.
The full URL will be constructed from a number of pieces of information
to uniquely identify the endpoint.

At a high level, web endpoint URLs for deployed applications have the
following structure: `https://<source>--<label>.modal.run`.