diff --git a/README.md b/README.md index 9e8f50f..e3e8ae3 100644 --- a/README.md +++ b/README.md @@ -18,6 +18,7 @@ Samples are organized by use case (training, inference) below: | --- | --- | --- | | [BERT Inference on SageMaker](inference/inf2-bert-on-sagemaker) | Sample inference notebook using Hugging Face BERT model | Inf2, Trn1, Trn1n | | [Stable Diffusion Inference on SageMaker](inference/stable-diffusion/) | How to compile and run HF Stable Diffusion model on SageMaker | Inf2, Trn1, Trn1n | +| [CLIP Inference on SageMaker](inference/inf2-clip-on-sagemaker/) | Sample notebook to compile and deploy a pretrained CLIP model | Inf2, Trn1, Trn1n | ## Getting Help diff --git a/inference/inf2-clip-on-sagemaker/inf2_clip_sagemaker.ipynb b/inference/inf2-clip-on-sagemaker/inf2_clip_sagemaker.ipynb new file mode 100644 index 0000000..dcfe7a9 --- /dev/null +++ b/inference/inf2-clip-on-sagemaker/inf2_clip_sagemaker.ipynb @@ -0,0 +1,1392 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "1849fdfc-f4e9-422d-9561-1886491bccd9", + "metadata": {}, + "source": [ + "# Compiling and Deploying HuggingFace Pretrained CLIP on Inf2 on Amazon SageMaker" + ] + }, + { + "cell_type": "markdown", + "id": "fc79137e-3548-4119-907b-f2a932664101", + "metadata": {}, + "source": [ + "## Overview" + ] + }, + { + "cell_type": "markdown", + "id": "b5f4db48-e101-43eb-8b5c-9d8abebdfb9a", + "metadata": {}, + "source": [ + "In this notebook, we will compile and deploy a pretrained CLIP model from HuggingFace Transformers, using the [AWS Deep Learning Containers](https://github.com/aws/deep-learning-containers). We use AWS Deep Learning Containers as they offer a convenient, pre-configured environment with the necessary deep learning framework and AWS Neuron dependencies. \n", + "\n", + "CLIP (Contrastive Language–Image Pre-training) is a model designed to understand and relate text and image data. 
It is especially known to perform well on zero-shot classification tasks, where it can classify images into categories it hasn't been explicitly trained on, by understanding the relationship between text and image content. You can find more information on the model architecture in this paper: https://arxiv.org/abs/2103.00020, and a full list of CLIP models on this page: https://huggingface.co/models?sort=trending&search=clip.\n", + "\n", + "By the end of this tutorial, you will have a clear understanding of how to optimize a CLIP model for AWS infrastructure, including model compilation and deployment on Inferentia 2.\n", + "\n", + "This Jupyter Notebook was tested on an ml.t3.medium SageMaker Notebook instance with the PyTorch 2.0.0 Python 3.10 CPU kernel, in the us-east-1 region." + ] + }, + { + "cell_type": "markdown", + "id": "5522ca52-6ca0-4087-8ed5-ecff50f91397", + "metadata": { + "tags": [] + }, + "source": [ + "## Install Dependencies" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fed0160c-28c5-4815-a423-85b5d905eaee", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "%pip install --upgrade pip" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "600aa3d9-344e-414f-a0e1-6da2d4428955", + "metadata": { + "scrolled": true, + "tags": [] + }, + "outputs": [], + "source": [ + "%pip install --upgrade sagemaker boto3 awscli" + ] + }, + { + "cell_type": "markdown", + "id": "7bb0bb14-8fc0-4076-aa5e-956ed71ec3a0", + "metadata": {}, + "source": [ + "## Compile the model into an AWS Neuron optimized TorchScript\n", + "\n", + "In the following section we will compile the model into an AWS Neuron optimized TorchScript. 
We start with a src directory where we create the following files:\n", + "\n", + "- A **'compile_clip.py'** file: In this script we perform several key tasks:\n", + " - Loading the CLIP Model: we import and load the CLIP model (specifically, the 'openai/clip-vit-large-patch14' version) from the Hugging Face Transformers library. \n", + " - Sample input preparation: we retrieve a sample input comprising both text and image data from the CIFAR100 dataset. The CIFAR100 dataset is a collection of images classified into 100 different classes, and these classes provide the textual component.\n", + " - Processing the sample Input: We then process this input using the CLIPProcessor to ensure the data is in the correct tensor format for the model. This step involves tokenizing the text and appropriately formatting the image data.\n", + " - Model Compilation for AWS Neuron: Using torch_neuronx.trace(), we compile the CLIP model for optimized execution on AWS Neuron hardware. This step is crucial for deploying the model on AWS Inferentia chips.\n", + " - Saving the Optimized Model: Finally, the compiled model is saved as a TorchScript file, allowing the model to be executed in a variety of environments.\n", + "\n", + " " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7abcef04-8f53-4489-a1cf-e30c2e73ea19", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "os.makedirs(\"src\", exist_ok=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bbb47d75-2ea7-4519-82d1-27412929f618", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "%%writefile src/compile_clip.py\n", + "import os\n", + "import tarfile\n", + "import torch\n", + "import torch_neuronx\n", + "from transformers import CLIPProcessor, CLIPModel\n", + "from torchvision.datasets import CIFAR100\n", + "\n", + "# Disable parallelism in Hugging Face's tokenizer to avoid potential issues. 
\n", + "os.environ[\"TOKENIZERS_PARALLELISM\"] = \"false\"\n", + "\n", + "model_name = 'openai/clip-vit-large-patch14'\n", + "\n", + "\n", + "if __name__ == '__main__':\n", + " # Create the input preprocessor and model\n", + " processor = CLIPProcessor.from_pretrained(model_name)\n", + " model = CLIPModel.from_pretrained(model_name, return_dict=False)\n", + " model.eval()\n", + "\n", + " # Load cifar100 dataset\n", + " cifar100 = CIFAR100(root=os.path.expanduser(\"~/.cache\"), download=True, train=False)\n", + "\n", + " # Get text captions for the model to classify the image against\n", + " text = cifar100.classes\n", + "\n", + " # Get sample input (the first image in the CIFAR-100 dataset)\n", + " image = cifar100[0][0]\n", + "\n", + " # Process sample input text and image data from the CIFAR100 dataset\n", + " inputs = processor(text=text, images=image, return_tensors=\"pt\", padding=True)\n", + " \n", + " # Example input that the function will use to trace the model's execution\n", + " example = (inputs['input_ids'], inputs['pixel_values'])\n", + "\n", + " # Compile the PyTorch model for optimized execution on inf2, using the AWS Neuron SDK\n", + " model_neuron = torch_neuronx.trace(model, example, compiler_args='--enable-saturate-infinity --target=inf2')\n", + "\n", + " # Save the TorchScript for inference deployment\n", + " torch.jit.save(model_neuron, '/tmp/neuron_compiled_model.pt')\n", + " with tarfile.open(os.path.join(\"/opt/ml/model/\", 'model.tar.gz'), \"w:gz\") as tar:\n", + " tar.add('/tmp/neuron_compiled_model.pt', \"neuron_compiled_model.pt\")\n", + " " + ] + }, + { + "cell_type": "markdown", + "id": "2e29610e-7c1b-4b72-a96a-4921030fb15d", + "metadata": {}, + "source": [ + "## Get SageMaker execution role\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0a34f8e6-2844-4f3e-978c-f91aa9609d51", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "import boto3\n", + "import sagemaker\n", + "\n", + "sess = 
sagemaker.Session()\n", + "role = sagemaker.get_execution_role()\n", + "sess_bucket = sess.default_bucket()\n", + "\n", + "prefix = \"inf2_compiled_model\"\n", + "\n", + "print(f\"sagemaker role arn: {role}\")\n", + "print(f\"sagemaker bucket: {sess_bucket}\")\n", + "print(f\"sagemaker session region: {sess.boto_region_name}\")" + ] + }, + { + "cell_type": "markdown", + "id": "981c1772-0def-43d1-9813-36d6d2795815", + "metadata": {}, + "source": [ + "## Create PyTorch estimator\n", + "\n", + "In this section we create a PyTorch estimator, using the PyTorch class from the 'sagemaker.pytorch' module. This estimator is a high-level abstraction for running PyTorch jobs in SageMaker, simplifying the process of training and deploying PyTorch models. It allows you to execute a custom Python script (compile_clip.py in this case, which contains the steps to compile the CLIP model using AWS Neuron) on a SageMaker-managed instance. Here we compile the model on an ml.trn1.2xlarge instance.\n", + "\n", + "Configuration of the Estimator:\n", + "- entry_point: The path to your Python script (compile_clip.py) that contains the code for compiling the CLIP model.\n", + "- source_dir: Specifies the directory (src) where additional code and files related to the entry_point script are located.\n", + "- role, sagemaker_session: These parameters pass the AWS role and the SageMaker session information, respectively, which are essential for accessing AWS resources.\n", + "- instance_count: Set to 1, indicating that the job will run on a single instance.\n", + "- output_path: Specifies the S3 bucket path where the output of the job (the compiled model) will be stored.\n", + "- disable_profiler and disable_output_compression: These are specific configurations to control the SageMaker job behavior, like disabling the built-in profiler and output compression for efficiency or debugging purposes.\n", + "- image_uri: This specifies the Docker image to be used for the compilation job. 
It points to an AWS Deep Learning Container image with PyTorch and the AWS Neuron SDK, optimized for model compilation and training. \n", + "- volume_size: Defines the size of the EBS volume attached to the instance.\n", + "\n", + "estimator.fit() initiates the SageMaker training job, which will take around 10 minutes to complete. It will use the configurations specified above to launch a SageMaker training instance, run your compile_clip.py script on this instance, and output the results to the specified S3 path. \n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "cf9d7d3d-7457-4961-b7c3-6d6969a42d76", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from sagemaker.pytorch import PyTorch\n", + "\n", + "instance_type = \"ml.trn1.2xlarge\"\n", + "\n", + "estimator = PyTorch(\n", + " entry_point=\"compile_clip.py\",\n", + " source_dir=\"src\",\n", + " role=role,\n", + " sagemaker_session=sess,\n", + " instance_count=1,\n", + " instance_type=instance_type,\n", + " output_path=f\"s3://{sess_bucket}/{prefix}\",\n", + " disable_profiler=True,\n", + " disable_output_compression=True,\n", + " image_uri=f\"763104351884.dkr.ecr.{sess.boto_region_name}.amazonaws.com/pytorch-training-neuronx:1.13.1-neuronx-py310-sdk2.15.0-ubuntu20.04\",\n", + " volume_size=128,\n", + ")\n", + "\n", + "estimator.framework_version = \"1.13.1\" # workaround when using image_uri" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6fb83246-465d-48e0-a519-2c92f622cb0d", + "metadata": { + "collapsed": true, + "jupyter": { + "outputs_hidden": true + }, + "tags": [] + }, + "outputs": [], + "source": [ + "%%time\n", + "estimator.fit()" + ] + }, + { + "cell_type": "markdown", + "id": "debd4d7f-038d-429e-9ab9-2eda17a6ee65", + "metadata": {}, + "source": [ + "## Deploy Container and run inference based on the pretrained model" + ] + }, + { + "cell_type": "markdown", + "id": "872bd679-0694-4b96-925a-75fe675ff3a4", + "metadata": {}, + "source": [ 
+ "To deploy a pretrained PyTorch model, you'll need to use the PyTorch estimator object to create a PyTorchModel object and set a different entry_point.\n", + "\n", + "The entry_point will be the inference script (inference.py). The inference.py script contains several key functions that SageMaker invokes during the inference process:\n", + "- **model_fn**: This function is called once when the SageMaker endpoint is first started. It loads the model from the provided directory (typically from the path where model artifacts are unarchived) and returns the model object.\n", + "- **input_fn**: Each time an inference request is made, this function is invoked to process the incoming data (e.g., JSON payload, images) into a format that the model can work with.\n", + "- **predict_fn**: After input_fn, the predict_fn function is called with the processed data and the model loaded by model_fn. This is where the actual inference (prediction) happens.\n", + "- **output_fn**: Finally, the output_fn function formats the output of predict_fn into the response format that will be returned to the client.\n", + "\n", + "After the inference script is prepared, you use the **PyTorchModel** class from the SageMaker Python SDK to create a model object. This object requires the S3 URI of the compiled model artifacts, the role for SageMaker to access AWS resources, and the Docker image URI for the inference container. The entry_point parameter points to your inference.py script, which contains the logic for handling inference requests.\n", + "\n", + "Lastly, the deploy method of the PytorchModel object is used to create a SageMaker Endpoint -- a hosted prediction service that we can use to perform inference." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dedfd83f-257c-4e49-933f-dc55d3058fc7", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "import os\n", + "\n", + "os.makedirs(\"code\", exist_ok=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b683e624-ac72-4dd6-9181-8b300969476c", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "%%writefile code/inference.py\n", + "\n", + "import os\n", + "import io\n", + "import json\n", + "import base64\n", + "\n", + "import torch\n", + "import torch_neuronx\n", + "\n", + "from PIL import Image\n", + "from transformers import CLIPProcessor\n", + "\n", + "os.environ[\"TOKENIZERS_PARALLELISM\"] = \"false\"\n", + "\n", + "JSON_CONTENT_TYPE = 'application/json'\n", + "\n", + "def model_fn(model_dir):\n", + " \"\"\"Loads the model from the provided directory.\"\"\"\n", + " model_file = os.path.join(model_dir, 'neuron_compiled_model.pt')\n", + " model_neuron = torch.jit.load(model_file)\n", + " return model_neuron\n", + "\n", + "def input_fn(serialized_input_data, content_type=JSON_CONTENT_TYPE):\n", + " \"\"\"Processes incoming data into a format the model can work with.\"\"\"\n", + " if content_type == JSON_CONTENT_TYPE:\n", + " input_data = json.loads(serialized_input_data)\n", + " \n", + " base_64_img_str = input_data['image']\n", + " image = Image.open(io.BytesIO(base64.decodebytes(bytes(base_64_img_str, \"utf-8\"))))\n", + " text = input_data['candidate_labels']\n", + " \n", + " return (image, text)\n", + "\n", + " else:\n", + " raise Exception('Requested unsupported ContentType in Accept: ' + content_type)\n", + " return\n", + "\n", + "def predict_fn(input_data, models):\n", + " \"\"\"Takes the model and input data, and generates the prediction.\"\"\"\n", + " model_neuron = models\n", + " processor = CLIPProcessor.from_pretrained('openai/clip-vit-large-patch14')\n", + " image, text = input_data\n", + " \n", + " inputs = processor(text=text, 
images=image, return_tensors=\"pt\", padding=True)\n", + " input_data = (inputs['input_ids'], inputs['pixel_values'])\n", + " \n", + " output_neuron = model_neuron(*input_data)\n", + " \n", + " softmax_probs = output_neuron[0][0].softmax(dim=-1)\n", + " \n", + " label_probabilities = {text[i]: 100 * prob.item() for i, prob in enumerate(softmax_probs)}\n", + " sorted_label_probabilities = dict(sorted(label_probabilities.items(), key=lambda item: item[1], reverse=True))\n", + " formatted_label_probabilities = {label: f\"{prob}%\" for label, prob in sorted_label_probabilities.items()}\n", + " \n", + " return formatted_label_probabilities\n", + "\n", + "def output_fn(prediction_output, accept=JSON_CONTENT_TYPE):\n", + " \"\"\"formats the output of predict_fn into the response format that will be returned to the client.\"\"\"\n", + " if accept == JSON_CONTENT_TYPE:\n", + " return json.dumps(prediction_output), accept\n", + "\n", + " raise Exception('Requested unsupported ContentType in Accept: ' + accept)" + ] + }, + { + "cell_type": "markdown", + "id": "1918127a-fcd4-4f45-9782-3f06a416fb75", + "metadata": {}, + "source": [ + "Path of compiled pretrained model in S3:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "46245367-83dc-4af0-8b2d-5f134f3ebe18", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "s3_model_uri = f\"{estimator.model_data['S3DataSource']['S3Uri']}model.tar.gz\"" + ] + }, + { + "cell_type": "markdown", + "id": "ff8a01d9-e7e5-4ba4-baf5-08f5ff184823", + "metadata": {}, + "source": [ + "Note, **image_uri** is from the [Neuron Containers](https://github.com/aws/deep-learning-containers/blob/master/available_images.md#neuron-containers). This is a Docker image that is optimized for running PyTorch inference on AWS Neuron. This image includes the necessary dependencies and configurations for PyTorch and AWS Neuron SDK." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7ba75d76-bb74-4059-a058-c7f9c9a4720e", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from sagemaker.pytorch.model import PyTorchModel\n", + "\n", + "ecr_image = f\"763104351884.dkr.ecr.{sess.boto_region_name}.amazonaws.com/pytorch-inference-neuronx:1.13.1-neuronx-py310-sdk2.15.0-ubuntu20.04\"\n", + "\n", + "pytorch_model = PyTorchModel(\n", + " model_data=s3_model_uri,\n", + " role=role,\n", + " source_dir=\"code\",\n", + " entry_point=\"inference.py\",\n", + " image_uri=ecr_image,\n", + ")\n", + "\n", + "# Let SageMaker know that we've already compiled the model via neuron-cc\n", + "pytorch_model._is_compiled_model = True" + ] + }, + { + "cell_type": "markdown", + "id": "96130d27-77bf-4a3d-bcf1-eddb5f0a2669", + "metadata": {}, + "source": [ + "The arguments to the deploy function allow us to set the number and type of instances that will be used for the Endpoint.\n", + "\n", + "Here you will deploy the model to a single **ml.inf2.xlarge** instance.\n", + "It may take 6-10 min to deploy." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c7e314c8-a99b-4f76-b5b6-5c854d5e6530", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "%%time\n", + "\n", + "predictor = pytorch_model.deploy(\n", + " instance_type=\"ml.inf2.xlarge\",\n", + " initial_instance_count=1,\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "06c8afcc-01e6-49b6-8040-dedd71257459", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "print(predictor.endpoint_name)" + ] + }, + { + "cell_type": "markdown", + "id": "649e5669-cc1b-4f9a-ba7d-757adc1d4b2c", + "metadata": {}, + "source": [ + "## Perform inference on your deployed endpoint. \n", + "\n", + "In order to perform inference on the deployed endpoint, we will need to convert the image we want to classify into a base64 encoded string, with the **image_to_base64** function. 
It can handle both a filepath (a string pointing to an image file) and a PIL Image object. \n", + "This encoding is necessary because the inference request needs to be serialized into a JSON format before we send it to the endpoint. \n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "69d25b09-1491-41ee-93a8-13d1640695ab", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "import base64\n", + "import io\n", + "import os\n", + "\n", + "from PIL import Image\n", + "\n", + "\n", + "def image_to_base64(img) -> str:\n", + " \"\"\"Convert a PIL Image or local image file path to a base64 string\"\"\"\n", + " if isinstance(img, str):\n", + " if os.path.isfile(img):\n", + " with open(img, \"rb\") as f:\n", + " return base64.b64encode(f.read()).decode(\"utf-8\")\n", + " else:\n", + " raise FileNotFoundError(f\"File {img} does not exist\")\n", + "\n", + " elif isinstance(img, Image.Image):\n", + " buffer = io.BytesIO()\n", + " img.save(buffer, format=\"PNG\")\n", + " return base64.b64encode(buffer.getvalue()).decode(\"utf-8\")\n", + "\n", + " else:\n", + " raise ValueError(f\"Expected str (filename) or PIL Image. 
Got {type(img)}\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "472fcaae-38e4-43c1-a096-ea8789e5a533", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Take example image from CIFAR-100\n", + "image = Image.open(\n", + " io.BytesIO(\n", + " base64.decodebytes(\n", + " bytes(\n", + " \"iVBORw0KGgoAAAANSUhEUgAAACAAAAAgCAIAAAD8GO2jAAAGUUlEQVR4nGVW3Y5cRxms7+s+8+PZtXdNHMexY0wimaAQCXFhuEVccoHEC3DBK/ESPAL3kEiJFGQkB5tYDt4YO95dj3e8Oztzzumvios+M15gRpqfMzpdXfVVVY/94Y8P3LJEwDA8DAZJETF/efDs689OXjxavHiE0jaT3TfzJ9nHP7j5M0tN6Xu47d28u3frp/s3P5zuXIUlgyTVBcnIo+RmDhlsCwDAKI5yunHro6vXbp6fzp8/+vLxF3+K9TI3MyfN0S5f0RKglw//cjZ/5qPfXr581T0JlCg5zBTKTU7uSRTM7L8wkiABo/Huzu4uGEff/f3lN5+JnTydvn46nu6PR5PSLpvZ/u61O9dvfjiZTCUKeWBgTrfcZHM3CTDfiLSFkQQJkq7fuvOTe7+ZP3+wXh416dJqcdivlmgmsT4bT/d2lneabKNRJgtkkiQzszDPTZPMk6QqUX25wKMCYGx+5+OfL3/9+8X3j7vz0zevnkUpo8kVAbP9G7d//Mudnd0mG5kgQJAEWAJz05gNDCqA/R8JqzA7l2f3fvU7SB1Lt1pGVzw1pe9Csbv/TvLMCDFJAiASMJrnUTZLFWB4Aibgf0jUd8ABGyFjdgkwkdUxQUKCOyXJIIEOIQy5yebZJcIMkJkBDgMEQXj7yTa06mBKJS1KUOMmQJJogiRTyGBhlscjeBIHm1aV6rrbjVuF2jq4zlBVa0C0jRcgV71Og0kGDRKRMt9KL0BmEAzDAAS8RZGsjlEVoK7ILRJFVggT8yTDXHTBTaI0iLKZ9mbZ7TcNHqkAAxJB31yTBNSFzJFnDd2NGGZGiVXFbe60IQWIqDfrwi8C6tZp9VZSMhCCocsNiltd0ZEggYIMNtxub0nI5MOM31KBRNBUgrSIYZMBBYPt+ixn0pySQG1DZheFqRHUZsCbMhNEMUIlpCAiwB5dtz59c3T4bL1adm1p1+c5Su8aqmijxkXT2NsUbMaqQQj2fb/uSldoiIy+9H0pPH3z+uGDrx7+42+vX71cn5/lrg8PUDJ4XYjbqEPVRYCZBNaLkFgi+r5ftV1bglH68+P7X/x5Pl/Mruz0xY6OXhweHiwXJ+7KUYoSSG3zW0t0cAMMGsxHimQp0Zfo+1L6QrFbL7599PnBk/tHLx93qzKdjNxsverevTx576NPD4/mue87Y5CS+WaDW5EB2wKIIskIsrBfL5P1u9NRx+7qLOPW7Q9u3Si9mjxOCX27nmbsTJu/fv5VXrdr80xpqCBdGMSmYoeCrI0hmbMZN2Q67dDZlekH9/avreYnc3Z9b1qV0kxXOZ0++fab714c5a7rakdtfO2sjufFBFSXqaoYIqUSioiI6Ps2uhYs0Z+beRL7wuOuSZPdvVmTz9etWSEpWC+GzGBkJWRmmzDV1A2tK9XQS1QQgPtoNJZk7iUi5TGjvP++rv/i0/zs+Mxg7ubJaXC3ZElSSkOUBZAyq8AcKFntH/WMIMOy8sSVAJgFXJ7UGc/PS85ZycxAM2hwYwgCbfh3ULMtiIPNQgLq0U5SUQsGKYYycYqFVMt/Pvk+t12Bp6AAmilkrHM2DFWKQRAOBThEIkhC2jpMohAMDmXar1698Gx52XaCh2wbJgCBW
nkDmGw4H+qe63SG+nvbz9V4NAiwkVaX4th2c16cdzAnBPONuhSMqt0pXghF9drQy5vzdGDCbVtHiTJB3L1x41+vDnLb9zQPklviAZjLkdzczH3Ya4Ri3UUQSJ7N3IIETFSQMBkBkGaKWAmjcXO6WOSj+YJw1YPGzJK5u5u5W0CkRALMyV0oXQHMUpQiSixRC5KQmaIoIXZSG21r5vuz6d0fXsvL1bnnnNxz9pQSDAYyWNpChkwmGdlTkLEQySyBhGQGpeQwM9GTj73caY6v6/iU63Xhu6OP3/vkdp5Nk5mJEV3blmAJkjCTaG5e0yCQhAFKOehk45rGcurr00W3Kmwskjez1K2X/37h3NmZvTMdlZN5BvP5yZwRLAGzbaDMYMlSTiJIutc/AMgq13Q8Kmdd6fuz15evjLFsF4cnq67du7yzHk1evjkb5XTzRvrkR9evXEr3vz7Iq9evILOUUpNrsXlKLMG20OoYaQgzQmlSjtuzp+FpsWoXZ210V8fj5G4t/ODwdR8yIDn292alj6fPT0D7D+4UwZGygvYMAAAAAElFTkSuQmCC\",\n", + " \"utf-8\",\n", + " )\n", + " )\n", + " )\n", + ")\n", + "\n", + "image" + ] + }, + { + "cell_type": "markdown", + "id": "4dbeea2e-6ff1-4eee-8fc6-c7a779c81c20", + "metadata": {}, + "source": [ + "Since in the input_fn we declared that the incoming requests are json-encoded, we need to use a json serializer, to make sure the data is converted to JSON format before sending it to the model. \n", + "\n", + "Also, we declared the return content type to be a JSON string, so we need to use a json deserializer to parse the response." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "a8dd1078-8bfc-41ec-970d-24c4915c0f00", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "predictor.serializer = sagemaker.serializers.JSONSerializer()\n", + "predictor.deserializer = sagemaker.deserializers.JSONDeserializer()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9dd0c406-073c-4cf7-a626-6f0f585161a5", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "from torchvision.datasets import CIFAR100\n", + "\n", + "# Define CIFAR100 dataset and candidate labels from the dataset.\n", + "cifar100 = CIFAR100(root=os.path.expanduser(\"~/.cache\"), download=True, train=False)\n", + "candidate_labels = cifar100.classes\n", + "\n", + "data = {\n", + " \"image\": image_to_base64(image),\n", + " \"candidate_labels\": candidate_labels,\n", + "}\n", + "\n", + "predictor.predict(data)" + ] + }, + { + "cell_type": "markdown", + "id": "5c5d3127-aaf0-444a-a797-b0a7952f1973", + "metadata": {}, + "source": [ + "\n", + "## Benchmarking your endpoint\n", + "\n", + "The following cells create a load test for your endpoint. You first define some helper functions:\n", + "\n", + "- **inference_latency** runs the endpoint request, and collects client-side latency and any errors.\n", + "- **random_image_from_cifar100** builds a request payload from a random cifar100 image (base64-encoded) together with the candidate labels.\n", + "\n", + "We use parallel processing (with Parallel and delayed from the joblib library) to simulate multiple clients sending inference requests to the endpoint. 
This is done to measure how the model performs under load.\n", + "number_of_clients and number_of_runs define the parallelism level and the total number of inference requests, respectively.\n", + "\n", + "We then calculate throughput (number of inferences per second) and various latency metrics (like the 50th, 90th, and 95th percentile latencies) based on the collected data. Percentile latencies give an idea of how long a certain percentage of requests take to be processed, which is useful for understanding performance under different load conditions.\n", + "\n", + "The script also uses AWS CloudWatch to get additional metrics on the model's performance, including average latency and percentile latencies directly from CloudWatch." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "303338d6-b41d-48b9-94ce-d7ea0c231e28", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "import datetime\n", + "import math\n", + "import random\n", + "import time\n", + "\n", + "import numpy as np\n", + "from joblib import Parallel, delayed\n", + "from tqdm import tqdm" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "4381613b-6363-4381-ae59-07763cbe975b", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "def inference_latency(model, *data):\n", + " \"\"\"Run a single inference call and return its client-side latency and any error.\"\"\"\n", + " error = False\n", + " start = time.time()\n", + " try:\n", + " results = model(*data)\n", + " except Exception:\n", + " error = True\n", + " results = []\n", + " return {\"latency\": time.time() - start, \"error\": error, \"result\": results}" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c99ec1d1-0ddf-45bd-b1b6-131c44f2a52c", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "def random_image_from_cifar100():\n", + " \"\"\"random_image_from_cifar100() randomly selects an image from the cifar100 dataset together with the 
candidate labels, and returns a dict with the base64-encoded image and the candidate labels.\"\"\"\n", + " # Randomly select an image\n", + " random_index = random.randint(0, len(cifar100) - 1)\n", + " image, _ = cifar100[random_index]\n", + "\n", + " # Convert image to base64\n", + " image_base64 = image_to_base64(image)\n", + "\n", + " return {\n", + " \"image\": image_base64,\n", + " \"candidate_labels\": candidate_labels,\n", + " }" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b45dbcec-c151-465a-a328-e2c1ebbdc011", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Define auxiliary variables\n", + "number_of_clients = 2\n", + "number_of_runs = 1000\n", + "t = tqdm(range(number_of_runs), position=0, leave=True)\n", + "\n", + "# Start parallel clients\n", + "cw_start = datetime.datetime.utcnow()\n", + "\n", + "results = Parallel(n_jobs=number_of_clients, prefer=\"threads\")(\n", + " delayed(inference_latency)(predictor.predict, random_image_from_cifar100())\n", + " for _ in t\n", + ")\n", + "avg_throughput = t.total / t.format_dict[\"elapsed\"]\n", + "\n", + "cw_end = datetime.datetime.utcnow()\n", + "\n", + "# Compute and print metrics\n", + "latencies = [res[\"latency\"] for res in results]\n", + "errors = [res[\"error\"] for res in results]\n", + "error_p = sum(errors) / len(errors) * 100\n", + "p50 = np.quantile(latencies, 0.50) * 1000\n", + "p90 = np.quantile(latencies, 0.90) * 1000\n", + "p95 = np.quantile(latencies, 0.95) * 1000\n", + "\n", + "print(f\"Avg Throughput: {avg_throughput:.1f} inferences/s\\n\")\n", + "print(f\"50th Percentile Latency:{p50:.1f} ms\")\n", + "print(f\"90th Percentile Latency:{p90:.1f} ms\")\n", + "print(f\"95th Percentile Latency:{p95:.1f} ms\\n\")\n", + "print(f\"Errors percentage: {error_p:.1f} %\\n\")\n", + "\n", + "# Querying CloudWatch\n", + "print(\"Getting CloudWatch metrics:\")\n", + "cloudwatch = boto3.client(\"cloudwatch\")\n", + "statistics = [\"SampleCount\", \"Average\", 
\"Minimum\", \"Maximum\"]\n", + "extended = [\"p50\", \"p90\", \"p95\", \"p100\"]\n", + "\n", + "# Give 5 minute buffer to end\n", + "cw_end += datetime.timedelta(minutes=5)\n", + "\n", + "# Period must be 1, 5, 10, 30, or multiple of 60\n", + "# Calculate closest multiple of 60 to the total elapsed time\n", + "factor = math.ceil((cw_end - cw_start).total_seconds() / 60)\n", + "period = factor * 60\n", + "print(f\"Time elapsed: {(cw_end - cw_start).total_seconds()} seconds\")\n", + "print(f\"Using period of {period} seconds\\n\")\n", + "\n", + "cloudwatch_ready = False\n", + "# Keep polling CloudWatch metrics until datapoints are available\n", + "while not cloudwatch_ready:\n", + " time.sleep(30)\n", + " print(\"Waiting 30 seconds ...\")\n", + " # Must use default units of microseconds\n", + " model_latency_metrics = cloudwatch.get_metric_statistics(\n", + " MetricName=\"ModelLatency\",\n", + " Dimensions=[\n", + " {\"Name\": \"EndpointName\", \"Value\": predictor.endpoint_name},\n", + " {\"Name\": \"VariantName\", \"Value\": \"AllTraffic\"},\n", + " ],\n", + " Namespace=\"AWS/SageMaker\",\n", + " StartTime=cw_start,\n", + " EndTime=cw_end,\n", + " Period=period,\n", + " Statistics=statistics,\n", + " ExtendedStatistics=extended,\n", + " )\n", + "\n", + " if len(model_latency_metrics[\"Datapoints\"]) > 0:\n", + " print(\n", + " \"{} latency datapoints ready\".format(\n", + " model_latency_metrics[\"Datapoints\"][0][\"SampleCount\"]\n", + " )\n", + " )\n", + " side_avg = model_latency_metrics[\"Datapoints\"][0][\"Average\"] / 1000\n", + " side_p50 = (\n", + " model_latency_metrics[\"Datapoints\"][0][\"ExtendedStatistics\"][\"p50\"] / 1000\n", + " )\n", + " side_p90 = (\n", + " model_latency_metrics[\"Datapoints\"][0][\"ExtendedStatistics\"][\"p90\"] / 1000\n", + " )\n", + " side_p95 = (\n", + " model_latency_metrics[\"Datapoints\"][0][\"ExtendedStatistics\"][\"p95\"] / 1000\n", + " )\n", + " side_p100 = (\n", + " 
model_latency_metrics[\"Datapoints\"][0][\"ExtendedStatistics\"][\"p100\"] / 1000\n", + " )\n", + "\n", + " print(f\"50th Percentile Latency:{side_p50:.1f} ms\")\n", + " print(f\"90th Percentile Latency:{side_p90:.1f} ms\")\n", + " print(f\"95th Percentile Latency:{side_p95:.1f} ms\\n\")\n", + "\n", + " cloudwatch_ready = True" + ] + }, + { + "cell_type": "markdown", + "id": "52e867d9-b7ac-4121-adfd-e675879fa182", + "metadata": {}, + "source": [ + "## Cleanup\n", + "Endpoints should be deleted when no longer in use, to avoid costs." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "803e779c-12c2-4851-9790-6575d1620785", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "predictor.delete_model()\n", + "predictor.delete_endpoint()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ddc55323-4217-4952-a95e-2cd020445ad4", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "availableInstances": [ + { + "_defaultOrder": 0, + "_isFastLaunch": true, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 4, + "name": "ml.t3.medium", + "vcpuNum": 2 + }, + { + "_defaultOrder": 1, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.t3.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 2, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.t3.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 3, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.t3.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 4, + "_isFastLaunch": true, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.m5.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 5, + 
"_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.m5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 6, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.m5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 7, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.m5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 8, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.m5.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 9, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.m5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 10, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.m5.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 11, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.m5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 12, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.m5d.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 13, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.m5d.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 14, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.m5d.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 15, + "_isFastLaunch": false, 
+ "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.m5d.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 16, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.m5d.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 17, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.m5d.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 18, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.m5d.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 19, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.m5d.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 20, + "_isFastLaunch": false, + "category": "General purpose", + "gpuNum": 0, + "hideHardwareSpecs": true, + "memoryGiB": 0, + "name": "ml.geospatial.interactive", + "supportedImageNames": [ + "sagemaker-geospatial-v1-0" + ], + "vcpuNum": 0 + }, + { + "_defaultOrder": 21, + "_isFastLaunch": true, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 4, + "name": "ml.c5.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 22, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 8, + "name": "ml.c5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 23, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.c5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 24, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.c5.4xlarge", + 
"vcpuNum": 16 + }, + { + "_defaultOrder": 25, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 72, + "name": "ml.c5.9xlarge", + "vcpuNum": 36 + }, + { + "_defaultOrder": 26, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 96, + "name": "ml.c5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 27, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 144, + "name": "ml.c5.18xlarge", + "vcpuNum": 72 + }, + { + "_defaultOrder": 28, + "_isFastLaunch": false, + "category": "Compute optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.c5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 29, + "_isFastLaunch": true, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.g4dn.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 30, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.g4dn.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 31, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.g4dn.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 32, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.g4dn.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 33, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.g4dn.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 34, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + 
"memoryGiB": 256, + "name": "ml.g4dn.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 35, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 61, + "name": "ml.p3.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 36, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 244, + "name": "ml.p3.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 37, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 488, + "name": "ml.p3.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 38, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 768, + "name": "ml.p3dn.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 39, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.r5.large", + "vcpuNum": 2 + }, + { + "_defaultOrder": 40, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.r5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 41, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.r5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 42, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.r5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 43, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.r5.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 44, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + 
"hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.r5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 45, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 512, + "name": "ml.r5.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 46, + "_isFastLaunch": false, + "category": "Memory Optimized", + "gpuNum": 0, + "hideHardwareSpecs": false, + "memoryGiB": 768, + "name": "ml.r5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 47, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 16, + "name": "ml.g5.xlarge", + "vcpuNum": 4 + }, + { + "_defaultOrder": 48, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 32, + "name": "ml.g5.2xlarge", + "vcpuNum": 8 + }, + { + "_defaultOrder": 49, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 64, + "name": "ml.g5.4xlarge", + "vcpuNum": 16 + }, + { + "_defaultOrder": 50, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 128, + "name": "ml.g5.8xlarge", + "vcpuNum": 32 + }, + { + "_defaultOrder": 51, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 1, + "hideHardwareSpecs": false, + "memoryGiB": 256, + "name": "ml.g5.16xlarge", + "vcpuNum": 64 + }, + { + "_defaultOrder": 52, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 192, + "name": "ml.g5.12xlarge", + "vcpuNum": 48 + }, + { + "_defaultOrder": 53, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 4, + "hideHardwareSpecs": false, + "memoryGiB": 384, + "name": "ml.g5.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 54, + "_isFastLaunch": false, + "category": 
"Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 768, + "name": "ml.g5.48xlarge", + "vcpuNum": 192 + }, + { + "_defaultOrder": 55, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 1152, + "name": "ml.p4d.24xlarge", + "vcpuNum": 96 + }, + { + "_defaultOrder": 56, + "_isFastLaunch": false, + "category": "Accelerated computing", + "gpuNum": 8, + "hideHardwareSpecs": false, + "memoryGiB": 1152, + "name": "ml.p4de.24xlarge", + "vcpuNum": 96 + } + ], + "instance_type": "ml.t3.medium", + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.4" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}