diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.de-de.md b/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.de-de.md
index 6bd3026e577..cedd199e5d7 100644
--- a/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.de-de.md
+++ b/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.de-de.md
@@ -1,7 +1,7 @@
 ---
 title: AI Deploy - Getting started
 excerpt: Discover AI Deploy and unfold your first application
-updated: 2025-12-17
+updated: 2026-02-17
 ---
 
 > [!primary]
@@ -101,7 +101,7 @@ Then you can modify the **Number of replicas** on which your AI Deploy app will
 
 The **static scaling** strategy allows you to choose a fixed number of replicas on which the app will be deployed. For this method, the minimum number of replicas is **1** and the maximum is **10**. This strategy is useful when your consumption or inference load is fixed. Moreover, it allows you to have fixed costs.
 
-With the **autoscaling strategy**, it is possible to choose both the minimum number of replicas (1 by default) and the maximum number of replicas. **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold. Conversely, it will remove instances when this average resource utilisation falls below the threshold. The monitored metric can either be `CPU` or `RAM`, or a custom metric. This solution might be better if you have irregular or sawtooth inference loads.
+With the **autoscaling strategy**, it is possible to choose both the minimum number of replicas (1 by default) and the maximum number of replicas. **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold. Conversely, it will remove instances when this average resource utilisation falls below the threshold. You can even downscale to 0 if you have no usage, thereby limiting costs. The monitored metric can either be `CPU` or `RAM`, or a custom metric. This solution might be better if you have irregular or sawtooth inference loads.
 
 For more detailed information about scaling strategies, please refer to our dedicated guide: [AI Deploy - Scaling strategies](/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies).
diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.en-asia.md b/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.en-asia.md
index 6bd3026e577..cedd199e5d7 100644
--- a/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.en-asia.md
+++ b/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.en-asia.md
@@ -1,7 +1,7 @@
 ---
 title: AI Deploy - Getting started
 excerpt: Discover AI Deploy and unfold your first application
-updated: 2025-12-17
+updated: 2026-02-17
 ---
 
 > [!primary]
@@ -101,7 +101,7 @@ Then you can modify the **Number of replicas** on which your AI Deploy app will
 
 The **static scaling** strategy allows you to choose a fixed number of replicas on which the app will be deployed. For this method, the minimum number of replicas is **1** and the maximum is **10**. This strategy is useful when your consumption or inference load is fixed. Moreover, it allows you to have fixed costs.
 
-With the **autoscaling strategy**, it is possible to choose both the minimum number of replicas (1 by default) and the maximum number of replicas. **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold. Conversely, it will remove instances when this average resource utilisation falls below the threshold. The monitored metric can either be `CPU` or `RAM`, or a custom metric. This solution might be better if you have irregular or sawtooth inference loads.
+With the **autoscaling strategy**, it is possible to choose both the minimum number of replicas (1 by default) and the maximum number of replicas. **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold. Conversely, it will remove instances when this average resource utilisation falls below the threshold. You can even downscale to 0 if you have no usage, thereby limiting costs. The monitored metric can either be `CPU` or `RAM`, or a custom metric. This solution might be better if you have irregular or sawtooth inference loads.
 
 For more detailed information about scaling strategies, please refer to our dedicated guide: [AI Deploy - Scaling strategies](/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies).
diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.en-au.md b/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.en-au.md
index 6bd3026e577..cedd199e5d7 100644
--- a/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.en-au.md
+++ b/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.en-au.md
@@ -1,7 +1,7 @@
 ---
 title: AI Deploy - Getting started
 excerpt: Discover AI Deploy and unfold your first application
-updated: 2025-12-17
+updated: 2026-02-17
 ---
 
 > [!primary]
@@ -101,7 +101,7 @@ Then you can modify the **Number of replicas** on which your AI Deploy app will
 
 The **static scaling** strategy allows you to choose a fixed number of replicas on which the app will be deployed. For this method, the minimum number of replicas is **1** and the maximum is **10**. This strategy is useful when your consumption or inference load is fixed. Moreover, it allows you to have fixed costs.
 
-With the **autoscaling strategy**, it is possible to choose both the minimum number of replicas (1 by default) and the maximum number of replicas. **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold. Conversely, it will remove instances when this average resource utilisation falls below the threshold. The monitored metric can either be `CPU` or `RAM`, or a custom metric. This solution might be better if you have irregular or sawtooth inference loads.
+With the **autoscaling strategy**, it is possible to choose both the minimum number of replicas (1 by default) and the maximum number of replicas. **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold. Conversely, it will remove instances when this average resource utilisation falls below the threshold. You can even downscale to 0 if you have no usage, thereby limiting costs. The monitored metric can either be `CPU` or `RAM`, or a custom metric. This solution might be better if you have irregular or sawtooth inference loads.
 
 For more detailed information about scaling strategies, please refer to our dedicated guide: [AI Deploy - Scaling strategies](/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies).
diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.en-ca.md b/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.en-ca.md
index 6bd3026e577..cedd199e5d7 100644
--- a/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.en-ca.md
+++ b/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.en-ca.md
@@ -1,7 +1,7 @@
 ---
 title: AI Deploy - Getting started
 excerpt: Discover AI Deploy and unfold your first application
-updated: 2025-12-17
+updated: 2026-02-17
 ---
 
 > [!primary]
@@ -101,7 +101,7 @@ Then you can modify the **Number of replicas** on which your AI Deploy app will
 
 The **static scaling** strategy allows you to choose a fixed number of replicas on which the app will be deployed. For this method, the minimum number of replicas is **1** and the maximum is **10**. This strategy is useful when your consumption or inference load is fixed. Moreover, it allows you to have fixed costs.
 
-With the **autoscaling strategy**, it is possible to choose both the minimum number of replicas (1 by default) and the maximum number of replicas. **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold. Conversely, it will remove instances when this average resource utilisation falls below the threshold. The monitored metric can either be `CPU` or `RAM`, or a custom metric. This solution might be better if you have irregular or sawtooth inference loads.
+With the **autoscaling strategy**, it is possible to choose both the minimum number of replicas (1 by default) and the maximum number of replicas. **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold. Conversely, it will remove instances when this average resource utilisation falls below the threshold. You can even downscale to 0 if you have no usage, thereby limiting costs. The monitored metric can either be `CPU` or `RAM`, or a custom metric. This solution might be better if you have irregular or sawtooth inference loads.
 
 For more detailed information about scaling strategies, please refer to our dedicated guide: [AI Deploy - Scaling strategies](/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies).
diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.en-gb.md b/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.en-gb.md
index 6bd3026e577..cedd199e5d7 100644
--- a/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.en-gb.md
+++ b/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.en-gb.md
@@ -1,7 +1,7 @@
 ---
 title: AI Deploy - Getting started
 excerpt: Discover AI Deploy and unfold your first application
-updated: 2025-12-17
+updated: 2026-02-17
 ---
 
 > [!primary]
@@ -101,7 +101,7 @@ Then you can modify the **Number of replicas** on which your AI Deploy app will
 
 The **static scaling** strategy allows you to choose a fixed number of replicas on which the app will be deployed. For this method, the minimum number of replicas is **1** and the maximum is **10**. This strategy is useful when your consumption or inference load is fixed. Moreover, it allows you to have fixed costs.
 
-With the **autoscaling strategy**, it is possible to choose both the minimum number of replicas (1 by default) and the maximum number of replicas. **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold. Conversely, it will remove instances when this average resource utilisation falls below the threshold. The monitored metric can either be `CPU` or `RAM`, or a custom metric. This solution might be better if you have irregular or sawtooth inference loads.
+With the **autoscaling strategy**, it is possible to choose both the minimum number of replicas (1 by default) and the maximum number of replicas. **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold. Conversely, it will remove instances when this average resource utilisation falls below the threshold. You can even downscale to 0 if you have no usage, thereby limiting costs. The monitored metric can either be `CPU` or `RAM`, or a custom metric. This solution might be better if you have irregular or sawtooth inference loads.
 
 For more detailed information about scaling strategies, please refer to our dedicated guide: [AI Deploy - Scaling strategies](/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies).
diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.en-ie.md b/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.en-ie.md
index 6bd3026e577..cedd199e5d7 100644
--- a/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.en-ie.md
+++ b/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.en-ie.md
@@ -1,7 +1,7 @@
 ---
 title: AI Deploy - Getting started
 excerpt: Discover AI Deploy and unfold your first application
-updated: 2025-12-17
+updated: 2026-02-17
 ---
 
 > [!primary]
@@ -101,7 +101,7 @@ Then you can modify the **Number of replicas** on which your AI Deploy app will
 
 The **static scaling** strategy allows you to choose a fixed number of replicas on which the app will be deployed. For this method, the minimum number of replicas is **1** and the maximum is **10**. This strategy is useful when your consumption or inference load is fixed. Moreover, it allows you to have fixed costs.
 
-With the **autoscaling strategy**, it is possible to choose both the minimum number of replicas (1 by default) and the maximum number of replicas. **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold. Conversely, it will remove instances when this average resource utilisation falls below the threshold. The monitored metric can either be `CPU` or `RAM`, or a custom metric. This solution might be better if you have irregular or sawtooth inference loads.
+With the **autoscaling strategy**, it is possible to choose both the minimum number of replicas (1 by default) and the maximum number of replicas. **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold. Conversely, it will remove instances when this average resource utilisation falls below the threshold. You can even downscale to 0 if you have no usage, thereby limiting costs. The monitored metric can either be `CPU` or `RAM`, or a custom metric. This solution might be better if you have irregular or sawtooth inference loads.
 
 For more detailed information about scaling strategies, please refer to our dedicated guide: [AI Deploy - Scaling strategies](/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies).
diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.en-sg.md b/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.en-sg.md
index 6bd3026e577..cedd199e5d7 100644
--- a/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.en-sg.md
+++ b/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.en-sg.md
@@ -1,7 +1,7 @@
 ---
 title: AI Deploy - Getting started
 excerpt: Discover AI Deploy and unfold your first application
-updated: 2025-12-17
+updated: 2026-02-17
 ---
 
 > [!primary]
@@ -101,7 +101,7 @@ Then you can modify the **Number of replicas** on which your AI Deploy app will
 
 The **static scaling** strategy allows you to choose a fixed number of replicas on which the app will be deployed. For this method, the minimum number of replicas is **1** and the maximum is **10**. This strategy is useful when your consumption or inference load is fixed. Moreover, it allows you to have fixed costs.
 
-With the **autoscaling strategy**, it is possible to choose both the minimum number of replicas (1 by default) and the maximum number of replicas. **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold. Conversely, it will remove instances when this average resource utilisation falls below the threshold. The monitored metric can either be `CPU` or `RAM`, or a custom metric. This solution might be better if you have irregular or sawtooth inference loads.
+With the **autoscaling strategy**, it is possible to choose both the minimum number of replicas (1 by default) and the maximum number of replicas. **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold. Conversely, it will remove instances when this average resource utilisation falls below the threshold. You can even downscale to 0 if you have no usage, thereby limiting costs. The monitored metric can either be `CPU` or `RAM`, or a custom metric. This solution might be better if you have irregular or sawtooth inference loads.
 
 For more detailed information about scaling strategies, please refer to our dedicated guide: [AI Deploy - Scaling strategies](/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies).
diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.en-us.md b/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.en-us.md
index 6bd3026e577..cedd199e5d7 100644
--- a/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.en-us.md
+++ b/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.en-us.md
@@ -1,7 +1,7 @@
 ---
 title: AI Deploy - Getting started
 excerpt: Discover AI Deploy and unfold your first application
-updated: 2025-12-17
+updated: 2026-02-17
 ---
 
 > [!primary]
@@ -101,7 +101,7 @@ Then you can modify the **Number of replicas** on which your AI Deploy app will
 
 The **static scaling** strategy allows you to choose a fixed number of replicas on which the app will be deployed. For this method, the minimum number of replicas is **1** and the maximum is **10**. This strategy is useful when your consumption or inference load is fixed. Moreover, it allows you to have fixed costs.
 
-With the **autoscaling strategy**, it is possible to choose both the minimum number of replicas (1 by default) and the maximum number of replicas. **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold. Conversely, it will remove instances when this average resource utilisation falls below the threshold. The monitored metric can either be `CPU` or `RAM`, or a custom metric. This solution might be better if you have irregular or sawtooth inference loads.
+With the **autoscaling strategy**, it is possible to choose both the minimum number of replicas (1 by default) and the maximum number of replicas. **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold. Conversely, it will remove instances when this average resource utilisation falls below the threshold. You can even downscale to 0 if you have no usage, thereby limiting costs. The monitored metric can either be `CPU` or `RAM`, or a custom metric. This solution might be better if you have irregular or sawtooth inference loads.
 
 For more detailed information about scaling strategies, please refer to our dedicated guide: [AI Deploy - Scaling strategies](/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies).
diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.es-es.md b/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.es-es.md
index 6bd3026e577..cedd199e5d7 100644
--- a/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.es-es.md
+++ b/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.es-es.md
@@ -1,7 +1,7 @@
 ---
 title: AI Deploy - Getting started
 excerpt: Discover AI Deploy and unfold your first application
-updated: 2025-12-17
+updated: 2026-02-17
 ---
 
 > [!primary]
@@ -101,7 +101,7 @@ Then you can modify the **Number of replicas** on which your AI Deploy app will
 
 The **static scaling** strategy allows you to choose a fixed number of replicas on which the app will be deployed. For this method, the minimum number of replicas is **1** and the maximum is **10**. This strategy is useful when your consumption or inference load is fixed. Moreover, it allows you to have fixed costs.
 
-With the **autoscaling strategy**, it is possible to choose both the minimum number of replicas (1 by default) and the maximum number of replicas. **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold. Conversely, it will remove instances when this average resource utilisation falls below the threshold. The monitored metric can either be `CPU` or `RAM`, or a custom metric. This solution might be better if you have irregular or sawtooth inference loads.
+With the **autoscaling strategy**, it is possible to choose both the minimum number of replicas (1 by default) and the maximum number of replicas. **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold. Conversely, it will remove instances when this average resource utilisation falls below the threshold. You can even downscale to 0 if you have no usage, thereby limiting costs. The monitored metric can either be `CPU` or `RAM`, or a custom metric. This solution might be better if you have irregular or sawtooth inference loads.
 
 For more detailed information about scaling strategies, please refer to our dedicated guide: [AI Deploy - Scaling strategies](/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies).
diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.es-us.md b/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.es-us.md
index 6bd3026e577..cedd199e5d7 100644
--- a/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.es-us.md
+++ b/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.es-us.md
@@ -1,7 +1,7 @@
 ---
 title: AI Deploy - Getting started
 excerpt: Discover AI Deploy and unfold your first application
-updated: 2025-12-17
+updated: 2026-02-17
 ---
 
 > [!primary]
@@ -101,7 +101,7 @@ Then you can modify the **Number of replicas** on which your AI Deploy app will
 
 The **static scaling** strategy allows you to choose a fixed number of replicas on which the app will be deployed. For this method, the minimum number of replicas is **1** and the maximum is **10**. This strategy is useful when your consumption or inference load is fixed. Moreover, it allows you to have fixed costs.
 
-With the **autoscaling strategy**, it is possible to choose both the minimum number of replicas (1 by default) and the maximum number of replicas. **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold. Conversely, it will remove instances when this average resource utilisation falls below the threshold. The monitored metric can either be `CPU` or `RAM`, or a custom metric. This solution might be better if you have irregular or sawtooth inference loads.
+With the **autoscaling strategy**, it is possible to choose both the minimum number of replicas (1 by default) and the maximum number of replicas. **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold. Conversely, it will remove instances when this average resource utilisation falls below the threshold. You can even downscale to 0 if you have no usage, thereby limiting costs. The monitored metric can either be `CPU` or `RAM`, or a custom metric. This solution might be better if you have irregular or sawtooth inference loads.
 
 For more detailed information about scaling strategies, please refer to our dedicated guide: [AI Deploy - Scaling strategies](/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies).
diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.fr-ca.md b/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.fr-ca.md
index ef94c2b7baf..aa2d675faba 100644
--- a/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.fr-ca.md
+++ b/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.fr-ca.md
@@ -1,7 +1,7 @@
 ---
 title: AI Deploy - Premiers pas (EN)
 excerpt: Découvrez AI Deploy et lancez votre première application
-updated: 2025-12-17
+updated: 2026-02-17
 ---
 
 > [!primary]
@@ -101,7 +101,7 @@ Then you can modify the **Number of replicas** on which your AI Deploy app will
 
 The **static scaling** strategy allows you to choose a fixed number of replicas on which the app will be deployed. For this method, the minimum number of replicas is **1** and the maximum is **10**. This strategy is useful when your consumption or inference load is fixed. Moreover, it allows you to have fixed costs.
 
-With the **autoscaling strategy**, it is possible to choose both the minimum number of replicas (1 by default) and the maximum number of replicas. **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold. Conversely, it will remove instances when this average resource utilisation falls below the threshold. The monitored metric can either be `CPU` or `RAM`, or a custom metric. This solution might be better if you have irregular or sawtooth inference loads.
+With the **autoscaling strategy**, it is possible to choose both the minimum number of replicas (1 by default) and the maximum number of replicas. **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold. Conversely, it will remove instances when this average resource utilisation falls below the threshold. You can even downscale to 0 if you have no usage, thereby limiting costs. The monitored metric can either be `CPU` or `RAM`, or a custom metric. This solution might be better if you have irregular or sawtooth inference loads.
 
 For more detailed information about scaling strategies, please refer to our dedicated guide: [AI Deploy - Scaling strategies](/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies).
diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.fr-fr.md b/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.fr-fr.md
index ef94c2b7baf..aa2d675faba 100644
--- a/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.fr-fr.md
+++ b/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.fr-fr.md
@@ -1,7 +1,7 @@
 ---
 title: AI Deploy - Premiers pas (EN)
 excerpt: Découvrez AI Deploy et lancez votre première application
-updated: 2025-12-17
+updated: 2026-02-17
 ---
 
 > [!primary]
@@ -101,7 +101,7 @@ Then you can modify the **Number of replicas** on which your AI Deploy app will
 
 The **static scaling** strategy allows you to choose a fixed number of replicas on which the app will be deployed. For this method, the minimum number of replicas is **1** and the maximum is **10**. This strategy is useful when your consumption or inference load is fixed. Moreover, it allows you to have fixed costs.
 
-With the **autoscaling strategy**, it is possible to choose both the minimum number of replicas (1 by default) and the maximum number of replicas. **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold. Conversely, it will remove instances when this average resource utilisation falls below the threshold. The monitored metric can either be `CPU` or `RAM`, or a custom metric. This solution might be better if you have irregular or sawtooth inference loads.
+With the **autoscaling strategy**, it is possible to choose both the minimum number of replicas (1 by default) and the maximum number of replicas. **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold. Conversely, it will remove instances when this average resource utilisation falls below the threshold. You can even downscale to 0 if you have no usage, thereby limiting costs. The monitored metric can either be `CPU` or `RAM`, or a custom metric. This solution might be better if you have irregular or sawtooth inference loads.
 
 For more detailed information about scaling strategies, please refer to our dedicated guide: [AI Deploy - Scaling strategies](/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies).
diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.it-it.md b/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.it-it.md
index 6bd3026e577..cedd199e5d7 100644
--- a/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.it-it.md
+++ b/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.it-it.md
@@ -1,7 +1,7 @@
 ---
 title: AI Deploy - Getting started
 excerpt: Discover AI Deploy and unfold your first application
-updated: 2025-12-17
+updated: 2026-02-17
 ---
 
 > [!primary]
@@ -101,7 +101,7 @@ Then you can modify the **Number of replicas** on which your AI Deploy app will
 
 The **static scaling** strategy allows you to choose a fixed number of replicas on which the app will be deployed. For this method, the minimum number of replicas is **1** and the maximum is **10**. This strategy is useful when your consumption or inference load is fixed. Moreover, it allows you to have fixed costs.
 
-With the **autoscaling strategy**, it is possible to choose both the minimum number of replicas (1 by default) and the maximum number of replicas. **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold. Conversely, it will remove instances when this average resource utilisation falls below the threshold. The monitored metric can either be `CPU` or `RAM`, or a custom metric. This solution might be better if you have irregular or sawtooth inference loads.
+With the **autoscaling strategy**, it is possible to choose both the minimum number of replicas (1 by default) and the maximum number of replicas. **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold. Conversely, it will remove instances when this average resource utilisation falls below the threshold. You can even downscale to 0 if you have no usage, thereby limiting costs. The monitored metric can either be `CPU` or `RAM`, or a custom metric. This solution might be better if you have irregular or sawtooth inference loads.
 
 For more detailed information about scaling strategies, please refer to our dedicated guide: [AI Deploy - Scaling strategies](/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies).
diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.pl-pl.md b/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.pl-pl.md
index 6bd3026e577..cedd199e5d7 100644
--- a/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.pl-pl.md
+++ b/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.pl-pl.md
@@ -1,7 +1,7 @@
 ---
 title: AI Deploy - Getting started
 excerpt: Discover AI Deploy and unfold your first application
-updated: 2025-12-17
+updated: 2026-02-17
 ---
 
 > [!primary]
@@ -101,7 +101,7 @@ Then you can modify the **Number of replicas** on which your AI Deploy app will
 
 The **static scaling** strategy allows you to choose a fixed number of replicas on which the app will be deployed. For this method, the minimum number of replicas is **1** and the maximum is **10**. This strategy is useful when your consumption or inference load is fixed. Moreover, it allows you to have fixed costs.
 
-With the **autoscaling strategy**, it is possible to choose both the minimum number of replicas (1 by default) and the maximum number of replicas. **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold. Conversely, it will remove instances when this average resource utilisation falls below the threshold. The monitored metric can either be `CPU` or `RAM`, or a custom metric. This solution might be better if you have irregular or sawtooth inference loads.
+With the **autoscaling strategy**, it is possible to choose both the minimum number of replicas (1 by default) and the maximum number of replicas. **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold. Conversely, it will remove instances when this average resource utilisation falls below the threshold. You can even downscale to 0 if you have no usage, thereby limiting costs. The monitored metric can either be `CPU` or `RAM`, or a custom metric. This solution might be better if you have irregular or sawtooth inference loads.
 
 For more detailed information about scaling strategies, please refer to our dedicated guide: [AI Deploy - Scaling strategies](/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies).
diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.pt-pt.md b/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.pt-pt.md
index 6bd3026e577..cedd199e5d7 100644
--- a/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.pt-pt.md
+++ b/pages/public_cloud/ai_machine_learning/deploy_guide_02_getting_started/guide.pt-pt.md
@@ -1,7 +1,7 @@
 ---
 title: AI Deploy - Getting started
 excerpt: Discover AI Deploy and unfold your first application
-updated: 2025-12-17
+updated: 2026-02-17
 ---
 
 > [!primary]
@@ -101,7 +101,7 @@ Then you can modify the **Number of replicas** on which your AI Deploy app will
 
 The **static scaling** strategy allows you to choose a fixed number of replicas on which the app will be deployed. For this method, the minimum number of replicas is **1** and the maximum is **10**. This strategy is useful when your consumption or inference load is fixed. Moreover, it allows you to have fixed costs.
 
-With the **autoscaling strategy**, it is possible to choose both the minimum number of replicas (1 by default) and the maximum number of replicas. **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold. Conversely, it will remove instances when this average resource utilisation falls below the threshold. The monitored metric can either be `CPU` or `RAM`, or a custom metric. This solution might be better if you have irregular or sawtooth inference loads.
+With the **autoscaling strategy**, it is possible to choose both the minimum number of replicas (1 by default) and the maximum number of replicas. **High availability** will measure the average resource usage across its replicas and add instances if this average exceeds the specified average usage percentage threshold. Conversely, it will remove instances when this average resource utilisation falls below the threshold. You can even downscale to 0 if you have no usage, thereby limiting costs. The monitored metric can either be `CPU` or `RAM`, or a custom metric. This solution might be better if you have irregular or sawtooth inference loads.
 
 For more detailed information about scaling strategies, please refer to our dedicated guide: [AI Deploy - Scaling strategies](/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies).
diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.de-de.md b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.de-de.md index 9c9224fb2f5..6cbb725ef0b 100644 --- a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.de-de.md +++ b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.de-de.md @@ -1,7 +1,7 @@ --- title: AI Deploy - Scaling strategies excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy and learn how to use them -updated: 2025-12-17 +updated: 2026-02-17 --- > [!primary] @@ -68,21 +68,37 @@ The minimum number of replicas is **1** and the maximum is **10**. Autoscaling dynamically adjusts the number of application replicas based on **real-time metrics**, such as CPU or RAM usage. This is optimized for **workloads with varying demand**. +> [!primary] +> +> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are added after the scaling up delay; if it falls below, replicas are removed after the scaling down delay. + ### Autoscaling Key Configuration Parameters -Using this strategy, it is possible to choose: +Using this strategy, it is possible to choose: | Parameter | Description | |----------------------------|-----------------------------------------------------------------------------------------------| -| **Minimum Replicas** | Lowest number of running replicas. | +| **Minimum Replicas** | Lowest number of running replicas. When set to 0, the number of replicas will be reduced to 0 when your application no longer receives calls during the defined period, limiting costs of your app. | | **Maximum Replicas** | Upper bound for replica count (define based on usage expectations). | -| **Monitored Metric** | The metric to be monitored. Choose between `CPU` or `RAM` for triggering autoscaling actions. 
| +| **Time before scaling down (s)** | Number of seconds before scaling from N to N-1 replicas. Default value is 300s. Must be greater than or equal to 0 and less than or equal to 3600 (one hour). | +| **Time before scaling to 0 (s)** | Number of seconds before reducing from 1 to 0 replicas. **Only applies** when minimum replicas is set to 0. When enabled, this time must be considered in addition to the `Time before scaling down` parameter. | +| **Time before scaling up (s)** | Number of seconds before scaling from N to N+1 replicas. Default value is 0s. Must be greater than or equal to 0 and less than or equal to 3600 (one hour). | +| **Monitored Metric** | The metric to be monitored. Choose between `CPU`, `RAM` or a custom metric for triggering autoscaling actions. | | **Trigger Threshold (%)** | Average usage percentage used to trigger scaling up or down. Range: 1–100%. | -> [!primary] +> [!warning] +> +> For **High Availability**, it is strongly recommended to deploy a **minimum of 2 replicas**. +> +> If you set the minimum number of replicas to 0, please consider the following: > -> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are spun up; if it falls below, replicas are removed. +> - **Scaling behavior**: If your app receives no traffic, it will scale down to zero replicas. > +> - **Cold start latency**: If a request comes in while no replicas are serving your app, there will be a cold start delay before the app starts serving requests again, varying from 30 seconds to several minutes depending on your image and volume weight. +> +> - **Resource availability risk**: If you use a popular flavor, there is a risk that your app will NOT be able to scale up again if the flavor is unavailable, preventing your app from handling incoming requests. +> +> - **Parameter interaction**: The time before scaling to 0 is applied in addition to the time before scaling down.
This means the total time before an app scales down to 0 is the sum of both parameters. ### When to Choose Autoscaling? @@ -107,8 +123,10 @@ Using this strategy, it is possible to choose: >> ovhai app run /: \ >> --auto-min-replicas 1 \ >> --auto-max-replicas 5 \ ->> --auto-resource-type CPU \ ->> --auto-resource-usage-threshold 75 +>> --auto-scale-down-stabilization-window-seconds \ +>> --auto-scale-up-stabilization-window-seconds \ +>> --auto-resource-type \ +>> --auto-resource-usage-threshold >> ``` >> @@ -171,6 +189,7 @@ You can also modify the scaling strategy after the app has been created using th >> - Switch between auto scaling and static scaling >> - Change replica values >> - Modify monitored metric and associated values +>> - Update your scaling window (time to scale up, to scale down or to scale to 0) >> >> ![Update application scaling step 2](images/update-autoscaling-2.png){.thumbnail} >> @@ -193,11 +212,14 @@ You can also modify the scaling strategy after the app has been created using th >> **Updating Autoscaling** >> >> To change the autoscaling parameters, use the `ovhai app scale` command with the following parameters: ->> +>> >> ```bash >> ovhai app scale \ >> --auto-min-replicas \ >> --auto-max-replicas \ +>> --auto-scale-down-stabilization-window-seconds \ +>> --auto-scale-up-stabilization-window-seconds \ +>> --auto-cooldown-period-seconds \ >> --auto-resource-type \ >> --auto-resource-usage-threshold \ >> @@ -209,6 +231,11 @@ You can also modify the scaling strategy after the app has been created using th >> >> ```bash >> ovhai app scale \ +>> --auto-min-replicas \ +>> --auto-max-replicas \ +>> --auto-scale-down-stabilization-window-seconds \ +>> --auto-scale-up-stabilization-window-seconds \ +>> --auto-cooldown-period-seconds \ >> --auto-custom-api-url \ >> --auto-custom-value-location \ >> --auto-custom-target-value \ diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-asia.md 
b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-asia.md index 9c9224fb2f5..6cbb725ef0b 100644 --- a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-asia.md +++ b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-asia.md @@ -1,7 +1,7 @@ --- title: AI Deploy - Scaling strategies excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy and learn how to use them -updated: 2025-12-17 +updated: 2026-02-17 --- > [!primary] @@ -68,21 +68,37 @@ The minimum number of replicas is **1** and the maximum is **10**. Autoscaling dynamically adjusts the number of application replicas based on **real-time metrics**, such as CPU or RAM usage. This is optimized for **workloads with varying demand**. +> [!primary] +> +> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are added after the scaling up delay; if it falls below, replicas are removed after the scaling down delay. + ### Autoscaling Key Configuration Parameters -Using this strategy, it is possible to choose: +Using this strategy, it is possible to choose: | Parameter | Description | |----------------------------|-----------------------------------------------------------------------------------------------| -| **Minimum Replicas** | Lowest number of running replicas. | +| **Minimum Replicas** | Lowest number of running replicas. When set to 0, the number of replicas will be reduced to 0 when your application no longer receives calls during the defined period, limiting costs of your app. | | **Maximum Replicas** | Upper bound for replica count (define based on usage expectations). | -| **Monitored Metric** | The metric to be monitored. Choose between `CPU` or `RAM` for triggering autoscaling actions. | +| **Time before scaling down (s)** | Number of seconds before scaling from N to N-1 replicas. 
Default value is 300s. Must be greater than or equal to 0 and less than or equal to 3600 (one hour). | +| **Time before scaling to 0 (s)** | Number of seconds before reducing from 1 to 0 replicas. **Only applies** when minimum replicas is set to 0. When enabled, this time must be considered in addition to the `Time before scaling down` parameter. | +| **Time before scaling up (s)** | Number of seconds before scaling from N to N+1 replicas. Default value is 0s. Must be greater than or equal to 0 and less than or equal to 3600 (one hour). | +| **Monitored Metric** | The metric to be monitored. Choose between `CPU`, `RAM` or a custom metric for triggering autoscaling actions. | | **Trigger Threshold (%)** | Average usage percentage used to trigger scaling up or down. Range: 1–100%. | -> [!primary] +> [!warning] +> +> For **High Availability**, it is strongly recommended to deploy a **minimum of 2 replicas**. +> +> If you set the minimum number of replicas to 0, please consider the following: > -> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are spun up; if it falls below, replicas are removed. +> - **Scaling behavior**: If your app receives no traffic, it will scale down to zero replicas. > +> - **Cold start latency**: If a request comes in while no replicas are serving your app, there will be a cold start delay before the app starts serving requests again, varying from 30 seconds to several minutes depending on your image and volume weight. +> +> - **Resource availability risk**: If you use a popular flavor, there is a risk that your app will NOT be able to scale up again if the flavor is unavailable, preventing your app from handling incoming requests. +> +> - **Parameter interaction**: The time before scaling to 0 is applied in addition to the time before scaling down. This means the total time before an app scales down to 0 is the sum of both parameters.
### When to Choose Autoscaling? @@ -107,8 +123,10 @@ Using this strategy, it is possible to choose: >> ovhai app run /: \ >> --auto-min-replicas 1 \ >> --auto-max-replicas 5 \ ->> --auto-resource-type CPU \ ->> --auto-resource-usage-threshold 75 +>> --auto-scale-down-stabilization-window-seconds \ +>> --auto-scale-up-stabilization-window-seconds \ +>> --auto-resource-type \ +>> --auto-resource-usage-threshold >> ``` >> @@ -171,6 +189,7 @@ You can also modify the scaling strategy after the app has been created using th >> - Switch between auto scaling and static scaling >> - Change replica values >> - Modify monitored metric and associated values +>> - Update your scaling window (time to scale up, to scale down or to scale to 0) >> >> ![Update application scaling step 2](images/update-autoscaling-2.png){.thumbnail} >> @@ -193,11 +212,14 @@ You can also modify the scaling strategy after the app has been created using th >> **Updating Autoscaling** >> >> To change the autoscaling parameters, use the `ovhai app scale` command with the following parameters: ->> +>> >> ```bash >> ovhai app scale \ >> --auto-min-replicas \ >> --auto-max-replicas \ +>> --auto-scale-down-stabilization-window-seconds \ +>> --auto-scale-up-stabilization-window-seconds \ +>> --auto-cooldown-period-seconds \ >> --auto-resource-type \ >> --auto-resource-usage-threshold \ >> @@ -209,6 +231,11 @@ You can also modify the scaling strategy after the app has been created using th >> >> ```bash >> ovhai app scale \ +>> --auto-min-replicas \ +>> --auto-max-replicas \ +>> --auto-scale-down-stabilization-window-seconds \ +>> --auto-scale-up-stabilization-window-seconds \ +>> --auto-cooldown-period-seconds \ >> --auto-custom-api-url \ >> --auto-custom-value-location \ >> --auto-custom-target-value \ diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-au.md b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-au.md index 
9c9224fb2f5..6cbb725ef0b 100644 --- a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-au.md +++ b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-au.md @@ -1,7 +1,7 @@ --- title: AI Deploy - Scaling strategies excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy and learn how to use them -updated: 2025-12-17 +updated: 2026-02-17 --- > [!primary] @@ -68,21 +68,37 @@ The minimum number of replicas is **1** and the maximum is **10**. Autoscaling dynamically adjusts the number of application replicas based on **real-time metrics**, such as CPU or RAM usage. This is optimized for **workloads with varying demand**. +> [!primary] +> +> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are added after the scaling up delay; if it falls below, replicas are removed after the scaling down delay. + ### Autoscaling Key Configuration Parameters -Using this strategy, it is possible to choose: +Using this strategy, it is possible to choose: | Parameter | Description | |----------------------------|-----------------------------------------------------------------------------------------------| -| **Minimum Replicas** | Lowest number of running replicas. | +| **Minimum Replicas** | Lowest number of running replicas. When set to 0, the number of replicas will be reduced to 0 when your application no longer receives calls during the defined period, limiting costs of your app. | | **Maximum Replicas** | Upper bound for replica count (define based on usage expectations). | -| **Monitored Metric** | The metric to be monitored. Choose between `CPU` or `RAM` for triggering autoscaling actions. | +| **Time before scaling down (s)** | Number of seconds before scaling from N to N-1 replicas. Default value is 300s. Must be greater than or equal to 0 and less than or equal to 3600 (one hour). 
| +| **Time before scaling to 0 (s)** | Number of seconds before reducing from 1 to 0 replicas. **Only applies** when minimum replicas is set to 0. When enabled, this time must be considered in addition to the `Time before scaling down` parameter. | +| **Time before scaling up (s)** | Number of seconds before scaling from N to N+1 replicas. Default value is 0s. Must be greater than or equal to 0 and less than or equal to 3600 (one hour). | +| **Monitored Metric** | The metric to be monitored. Choose between `CPU`, `RAM` or a custom metric for triggering autoscaling actions. | | **Trigger Threshold (%)** | Average usage percentage used to trigger scaling up or down. Range: 1–100%. | -> [!primary] +> [!warning] +> +> For **High Availability**, it is strongly recommended to deploy a **minimum of 2 replicas**. +> +> If you set the minimum number of replicas to 0, please consider the following: > -> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are spun up; if it falls below, replicas are removed. +> - **Scaling behavior**: If your app receives no traffic, it will scale down to zero replicas. > +> - **Cold start latency**: If a request comes in while no replicas are serving your app, there will be a cold start delay before the app starts serving requests again, varying from 30 seconds to several minutes depending on your image and volume weight. +> +> - **Resource availability risk**: If you use a popular flavor, there is a risk that your app will NOT be able to scale up again if the flavor is unavailable, preventing your app from handling incoming requests. +> +> - **Parameter interaction**: The time before scaling to 0 is applied in addition to the time before scaling down. This means the total time before an app scales down to 0 is the sum of both parameters. ### When to Choose Autoscaling?
@@ -107,8 +123,10 @@ Using this strategy, it is possible to choose: >> ovhai app run /: \ >> --auto-min-replicas 1 \ >> --auto-max-replicas 5 \ ->> --auto-resource-type CPU \ ->> --auto-resource-usage-threshold 75 +>> --auto-scale-down-stabilization-window-seconds \ +>> --auto-scale-up-stabilization-window-seconds \ +>> --auto-resource-type \ +>> --auto-resource-usage-threshold >> ``` >> @@ -171,6 +189,7 @@ You can also modify the scaling strategy after the app has been created using th >> - Switch between auto scaling and static scaling >> - Change replica values >> - Modify monitored metric and associated values +>> - Update your scaling window (time to scale up, to scale down or to scale to 0) >> >> ![Update application scaling step 2](images/update-autoscaling-2.png){.thumbnail} >> @@ -193,11 +212,14 @@ You can also modify the scaling strategy after the app has been created using th >> **Updating Autoscaling** >> >> To change the autoscaling parameters, use the `ovhai app scale` command with the following parameters: ->> +>> >> ```bash >> ovhai app scale \ >> --auto-min-replicas \ >> --auto-max-replicas \ +>> --auto-scale-down-stabilization-window-seconds \ +>> --auto-scale-up-stabilization-window-seconds \ +>> --auto-cooldown-period-seconds \ >> --auto-resource-type \ >> --auto-resource-usage-threshold \ >> @@ -209,6 +231,11 @@ You can also modify the scaling strategy after the app has been created using th >> >> ```bash >> ovhai app scale \ +>> --auto-min-replicas \ +>> --auto-max-replicas \ +>> --auto-scale-down-stabilization-window-seconds \ +>> --auto-scale-up-stabilization-window-seconds \ +>> --auto-cooldown-period-seconds \ >> --auto-custom-api-url \ >> --auto-custom-value-location \ >> --auto-custom-target-value \ diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-ca.md b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-ca.md index 9c9224fb2f5..6cbb725ef0b 100644 --- 
a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-ca.md +++ b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-ca.md @@ -1,7 +1,7 @@ --- title: AI Deploy - Scaling strategies excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy and learn how to use them -updated: 2025-12-17 +updated: 2026-02-17 --- > [!primary] @@ -68,21 +68,37 @@ The minimum number of replicas is **1** and the maximum is **10**. Autoscaling dynamically adjusts the number of application replicas based on **real-time metrics**, such as CPU or RAM usage. This is optimized for **workloads with varying demand**. +> [!primary] +> +> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are added after the scaling up delay; if it falls below, replicas are removed after the scaling down delay. + ### Autoscaling Key Configuration Parameters -Using this strategy, it is possible to choose: +Using this strategy, it is possible to choose: | Parameter | Description | |----------------------------|-----------------------------------------------------------------------------------------------| -| **Minimum Replicas** | Lowest number of running replicas. | +| **Minimum Replicas** | Lowest number of running replicas. When set to 0, the number of replicas will be reduced to 0 when your application no longer receives calls during the defined period, limiting costs of your app. | | **Maximum Replicas** | Upper bound for replica count (define based on usage expectations). | -| **Monitored Metric** | The metric to be monitored. Choose between `CPU` or `RAM` for triggering autoscaling actions. | +| **Time before scaling down (s)** | Number of seconds before scaling from N to N-1 replicas. Default value is 300s. Must be greater than or equal to 0 and less than or equal to 3600 (one hour). 
| +| **Time before scaling to 0 (s)** | Number of seconds before reducing from 1 to 0 replicas. **Only applies** when minimum replicas is set to 0. When enabled, this time must be considered in addition to the `Time before scaling down` parameter. | +| **Time before scaling up (s)** | Number of seconds before scaling from N to N+1 replicas. Default value is 0s. Must be greater than or equal to 0 and less than or equal to 3600 (one hour). | +| **Monitored Metric** | The metric to be monitored. Choose between `CPU`, `RAM` or a custom metric for triggering autoscaling actions. | | **Trigger Threshold (%)** | Average usage percentage used to trigger scaling up or down. Range: 1–100%. | -> [!primary] +> [!warning] +> +> For **High Availability**, it is strongly recommended to deploy a **minimum of 2 replicas**. +> +> If you set the minimum number of replicas to 0, please consider the following: > -> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are spun up; if it falls below, replicas are removed. +> - **Scaling behavior**: If your app receives no traffic, it will scale down to zero replicas. > +> - **Cold start latency**: If a request comes in while no replicas are serving your app, there will be a cold start delay before the app starts serving requests again, varying from 30 seconds to several minutes depending on your image and volume weight. +> +> - **Resource availability risk**: If you use a popular flavor, there is a risk that your app will NOT be able to scale up again if the flavor is unavailable, preventing your app from handling incoming requests. +> +> - **Parameter interaction**: The time before scaling to 0 is applied in addition to the time before scaling down. This means the total time before an app scales down to 0 is the sum of both parameters. ### When to Choose Autoscaling?
@@ -107,8 +123,10 @@ Using this strategy, it is possible to choose: >> ovhai app run /: \ >> --auto-min-replicas 1 \ >> --auto-max-replicas 5 \ ->> --auto-resource-type CPU \ ->> --auto-resource-usage-threshold 75 +>> --auto-scale-down-stabilization-window-seconds \ +>> --auto-scale-up-stabilization-window-seconds \ +>> --auto-resource-type \ +>> --auto-resource-usage-threshold >> ``` >> @@ -171,6 +189,7 @@ You can also modify the scaling strategy after the app has been created using th >> - Switch between auto scaling and static scaling >> - Change replica values >> - Modify monitored metric and associated values +>> - Update your scaling window (time to scale up, to scale down or to scale to 0) >> >> ![Update application scaling step 2](images/update-autoscaling-2.png){.thumbnail} >> @@ -193,11 +212,14 @@ You can also modify the scaling strategy after the app has been created using th >> **Updating Autoscaling** >> >> To change the autoscaling parameters, use the `ovhai app scale` command with the following parameters: ->> +>> >> ```bash >> ovhai app scale \ >> --auto-min-replicas \ >> --auto-max-replicas \ +>> --auto-scale-down-stabilization-window-seconds \ +>> --auto-scale-up-stabilization-window-seconds \ +>> --auto-cooldown-period-seconds \ >> --auto-resource-type \ >> --auto-resource-usage-threshold \ >> @@ -209,6 +231,11 @@ You can also modify the scaling strategy after the app has been created using th >> >> ```bash >> ovhai app scale \ +>> --auto-min-replicas \ +>> --auto-max-replicas \ +>> --auto-scale-down-stabilization-window-seconds \ +>> --auto-scale-up-stabilization-window-seconds \ +>> --auto-cooldown-period-seconds \ >> --auto-custom-api-url \ >> --auto-custom-value-location \ >> --auto-custom-target-value \ diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-gb.md b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-gb.md index 9c9224fb2f5..6cbb725ef0b 100644 --- 
a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-gb.md +++ b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-gb.md @@ -1,7 +1,7 @@ --- title: AI Deploy - Scaling strategies excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy and learn how to use them -updated: 2025-12-17 +updated: 2026-02-17 --- > [!primary] @@ -68,21 +68,37 @@ The minimum number of replicas is **1** and the maximum is **10**. Autoscaling dynamically adjusts the number of application replicas based on **real-time metrics**, such as CPU or RAM usage. This is optimized for **workloads with varying demand**. +> [!primary] +> +> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are added after the scaling up delay; if it falls below, replicas are removed after the scaling down delay. + ### Autoscaling Key Configuration Parameters -Using this strategy, it is possible to choose: +Using this strategy, it is possible to choose: | Parameter | Description | |----------------------------|-----------------------------------------------------------------------------------------------| -| **Minimum Replicas** | Lowest number of running replicas. | +| **Minimum Replicas** | Lowest number of running replicas. When set to 0, the number of replicas will be reduced to 0 when your application no longer receives calls during the defined period, limiting costs of your app. | | **Maximum Replicas** | Upper bound for replica count (define based on usage expectations). | -| **Monitored Metric** | The metric to be monitored. Choose between `CPU` or `RAM` for triggering autoscaling actions. | +| **Time before scaling down (s)** | Number of seconds before scaling from N to N-1 replicas. Default value is 300s. Must be greater than or equal to 0 and less than or equal to 3600 (one hour). 
| +| **Time before scaling to 0 (s)** | Number of seconds before reducing from 1 to 0 replicas. **Only applies** when minimum replicas is set to 0. When enabled, this time must be considered in addition to the `Time before scaling down` parameter. | +| **Time before scaling up (s)** | Number of seconds before scaling from N to N+1 replicas. Default value is 0s. Must be greater than or equal to 0 and less than or equal to 3600 (one hour). | +| **Monitored Metric** | The metric to be monitored. Choose between `CPU`, `RAM` or a custom metric for triggering autoscaling actions. | | **Trigger Threshold (%)** | Average usage percentage used to trigger scaling up or down. Range: 1–100%. | -> [!primary] +> [!warning] +> +> For **High Availability**, it is strongly recommended to deploy a **minimum of 2 replicas**. +> +> If you set the minimum number of replicas to 0, please consider the following: > -> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are spun up; if it falls below, replicas are removed. +> - **Scaling behavior**: If your app receives no traffic, it will scale down to zero replicas. > +> - **Cold start latency**: If a request comes in while no replicas are serving your app, there will be a cold start delay before the app starts serving requests again, varying from 30 seconds to several minutes depending on your image and volume weight. +> +> - **Resource availability risk**: If you use a popular flavor, there is a risk that your app will NOT be able to scale up again if the flavor is unavailable, preventing your app from handling incoming requests. +> +> - **Parameter interaction**: The time before scaling to 0 is applied in addition to the time before scaling down. This means the total time before an app scales down to 0 is the sum of both parameters. ### When to Choose Autoscaling?
@@ -107,8 +123,10 @@ Using this strategy, it is possible to choose:
 >> ovhai app run /: \
 >> --auto-min-replicas 1 \
 >> --auto-max-replicas 5 \
->> --auto-resource-type CPU \
->> --auto-resource-usage-threshold 75
+>> --auto-scale-down-stabilization-window-seconds \
+>> --auto-scale-up-stabilization-window-seconds \
+>> --auto-resource-type \
+>> --auto-resource-usage-threshold
 >> ```
 >>
@@ -171,6 +189,7 @@ You can also modify the scaling strategy after the app has been created using th
 >> - Switch between auto scaling and static scaling
 >> - Change replica values
 >> - Modify monitored metric and associated values
+>> - Update your scaling window (time to scale up, to scale down, or to scale to 0)
 >>
 >> ![Update application scaling step 2](images/update-autoscaling-2.png){.thumbnail}
 >>
@@ -193,11 +212,14 @@ You can also modify the scaling strategy after the app has been created using th
 >> **Updating Autoscaling**
 >>
 >> To change the autoscaling parameters, use the `ovhai app scale` command with the following parameters:
->>
+>> 
 >> ```bash
 >> ovhai app scale \
 >> --auto-min-replicas \
 >> --auto-max-replicas \
+>> --auto-scale-down-stabilization-window-seconds \
+>> --auto-scale-up-stabilization-window-seconds \
+>> --auto-cooldown-period-seconds \
 >> --auto-resource-type \
 >> --auto-resource-usage-threshold \
 >>
@@ -209,6 +231,11 @@ You can also modify the scaling strategy after the app has been created using th
 >>
 >> ```bash
 >> ovhai app scale \
+>> --auto-min-replicas \
+>> --auto-max-replicas \
+>> --auto-scale-down-stabilization-window-seconds \
+>> --auto-scale-up-stabilization-window-seconds \
+>> --auto-cooldown-period-seconds \
 >> --auto-custom-api-url \
 >> --auto-custom-value-location \
 >> --auto-custom-target-value \
diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-ie.md b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-ie.md
index 9c9224fb2f5..6cbb725ef0b 100644
--- a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-ie.md
+++ b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-ie.md
@@ -1,7 +1,7 @@
 ---
 title: AI Deploy - Scaling strategies
 excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy and learn how to use them
-updated: 2025-12-17
+updated: 2026-02-17
 ---
 
 > [!primary]
@@ -68,21 +68,37 @@ The minimum number of replicas is **1** and the maximum is **10**.
 
 Autoscaling dynamically adjusts the number of application replicas based on **real-time metrics**, such as CPU or RAM usage. This is optimized for **workloads with varying demand**.
 
+> [!primary]
+>
+> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are added after the scaling up delay; if it falls below, replicas are removed after the scaling down delay.
+
 ### Autoscaling Key Configuration Parameters
 
-Using this strategy, it is possible to choose:
+Using this strategy, it is possible to choose: 
 
 | Parameter | Description |
 |----------------------------|-----------------------------------------------------------------------------------------------|
-| **Minimum Replicas** | Lowest number of running replicas. |
+| **Minimum Replicas** | Lowest number of running replicas. When set to 0, the number of replicas is reduced to 0 once your application no longer receives calls during the defined period, limiting your app's costs. |
 | **Maximum Replicas** | Upper bound for replica count (define based on usage expectations). |
-| **Monitored Metric** | The metric to be monitored. Choose between `CPU` or `RAM` for triggering autoscaling actions. |
+| **Time before scaling down (s)** | Number of seconds before scaling from N to N-1 replicas. Default value is 300s. Must be greater than or equal to 0 and less than or equal to 3600 (one hour). |
+| **Time before scaling to 0 (s)** | Number of seconds before reducing from 1 replica to 0. **Only applies** when minimum replicas is set to 0. When enabled, this time must be considered in addition to the `Time before scaling down` parameter. |
+| **Time before scaling up (s)** | Number of seconds before scaling from N to N+1 replicas. Default value is 0s. Must be greater than or equal to 0 and less than or equal to 3600 (one hour). |
+| **Monitored Metric** | The metric to be monitored. Choose between `CPU`, `RAM`, or a custom metric for triggering autoscaling actions. |
 | **Trigger Threshold (%)** | Average usage percentage used to trigger scaling up or down. Range: 1–100%. |
 
-> [!primary]
+> [!warning]
+>
+> For **High Availability**, it is strongly recommended to deploy a **minimum of 2 replicas**.
+>
+> If you set the minimum number of replicas to 0, please consider the following:
 >
-> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are spun up; if it falls below, replicas are removed.
+> - **Scaling behavior**: If your app receives no traffic, it will scale down to zero replicas.
 >
+> - **Cold start latency**: If a request comes in while no replicas are serving your app, there will be a cold start delay before the app starts serving requests again, varying from 30 seconds to several minutes depending on the size of your image and volumes.
+>
+> - **Resource availability risk**: If you use a popular flavor, there is a risk that your app will NOT be able to scale up again if the flavor is unavailable, preventing your app from handling incoming requests.
+>
+> - **Parameter interaction**: The time before scaling to 0 is applied in addition to the time before scaling down. This means the total time before an app scales down to 0 is the sum of both parameters.
 
 ### When to Choose Autoscaling?
 
@@ -107,8 +123,10 @@ Using this strategy, it is possible to choose:
 >> ovhai app run /: \
 >> --auto-min-replicas 1 \
 >> --auto-max-replicas 5 \
->> --auto-resource-type CPU \
->> --auto-resource-usage-threshold 75
+>> --auto-scale-down-stabilization-window-seconds \
+>> --auto-scale-up-stabilization-window-seconds \
+>> --auto-resource-type \
+>> --auto-resource-usage-threshold
 >> ```
 >>
@@ -171,6 +189,7 @@ You can also modify the scaling strategy after the app has been created using th
 >> - Switch between auto scaling and static scaling
 >> - Change replica values
 >> - Modify monitored metric and associated values
+>> - Update your scaling window (time to scale up, to scale down, or to scale to 0)
 >>
 >> ![Update application scaling step 2](images/update-autoscaling-2.png){.thumbnail}
 >>
@@ -193,11 +212,14 @@ You can also modify the scaling strategy after the app has been created using th
 >> **Updating Autoscaling**
 >>
 >> To change the autoscaling parameters, use the `ovhai app scale` command with the following parameters:
->>
+>> 
 >> ```bash
 >> ovhai app scale \
 >> --auto-min-replicas \
 >> --auto-max-replicas \
+>> --auto-scale-down-stabilization-window-seconds \
+>> --auto-scale-up-stabilization-window-seconds \
+>> --auto-cooldown-period-seconds \
 >> --auto-resource-type \
 >> --auto-resource-usage-threshold \
 >>
@@ -209,6 +231,11 @@ You can also modify the scaling strategy after the app has been created using th
 >>
 >> ```bash
 >> ovhai app scale \
+>> --auto-min-replicas \
+>> --auto-max-replicas \
+>> --auto-scale-down-stabilization-window-seconds \
+>> --auto-scale-up-stabilization-window-seconds \
+>> --auto-cooldown-period-seconds \
 >> --auto-custom-api-url \
 >> --auto-custom-value-location \
 >> --auto-custom-target-value \
diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-sg.md b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-sg.md
index 9c9224fb2f5..6cbb725ef0b 100644
--- a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-sg.md
+++ b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-sg.md
@@ -1,7 +1,7 @@
 ---
 title: AI Deploy - Scaling strategies
 excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy and learn how to use them
-updated: 2025-12-17
+updated: 2026-02-17
 ---
 
 > [!primary]
@@ -68,21 +68,37 @@ The minimum number of replicas is **1** and the maximum is **10**.
 
 Autoscaling dynamically adjusts the number of application replicas based on **real-time metrics**, such as CPU or RAM usage. This is optimized for **workloads with varying demand**.
 
+> [!primary]
+>
+> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are added after the scaling up delay; if it falls below, replicas are removed after the scaling down delay.
+
 ### Autoscaling Key Configuration Parameters
 
-Using this strategy, it is possible to choose:
+Using this strategy, it is possible to choose: 
 
 | Parameter | Description |
 |----------------------------|-----------------------------------------------------------------------------------------------|
-| **Minimum Replicas** | Lowest number of running replicas. |
+| **Minimum Replicas** | Lowest number of running replicas. When set to 0, the number of replicas is reduced to 0 once your application no longer receives calls during the defined period, limiting your app's costs. |
 | **Maximum Replicas** | Upper bound for replica count (define based on usage expectations). |
-| **Monitored Metric** | The metric to be monitored. Choose between `CPU` or `RAM` for triggering autoscaling actions. |
+| **Time before scaling down (s)** | Number of seconds before scaling from N to N-1 replicas. Default value is 300s. Must be greater than or equal to 0 and less than or equal to 3600 (one hour). |
+| **Time before scaling to 0 (s)** | Number of seconds before reducing from 1 replica to 0. **Only applies** when minimum replicas is set to 0. When enabled, this time must be considered in addition to the `Time before scaling down` parameter. |
+| **Time before scaling up (s)** | Number of seconds before scaling from N to N+1 replicas. Default value is 0s. Must be greater than or equal to 0 and less than or equal to 3600 (one hour). |
+| **Monitored Metric** | The metric to be monitored. Choose between `CPU`, `RAM`, or a custom metric for triggering autoscaling actions. |
 | **Trigger Threshold (%)** | Average usage percentage used to trigger scaling up or down. Range: 1–100%. |
 
-> [!primary]
+> [!warning]
+>
+> For **High Availability**, it is strongly recommended to deploy a **minimum of 2 replicas**.
+>
+> If you set the minimum number of replicas to 0, please consider the following:
 >
-> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are spun up; if it falls below, replicas are removed.
+> - **Scaling behavior**: If your app receives no traffic, it will scale down to zero replicas.
 >
+> - **Cold start latency**: If a request comes in while no replicas are serving your app, there will be a cold start delay before the app starts serving requests again, varying from 30 seconds to several minutes depending on the size of your image and volumes.
+>
+> - **Resource availability risk**: If you use a popular flavor, there is a risk that your app will NOT be able to scale up again if the flavor is unavailable, preventing your app from handling incoming requests.
+>
+> - **Parameter interaction**: The time before scaling to 0 is applied in addition to the time before scaling down. This means the total time before an app scales down to 0 is the sum of both parameters.
 
 ### When to Choose Autoscaling?
 
@@ -107,8 +123,10 @@ Using this strategy, it is possible to choose:
 >> ovhai app run /: \
 >> --auto-min-replicas 1 \
 >> --auto-max-replicas 5 \
->> --auto-resource-type CPU \
->> --auto-resource-usage-threshold 75
+>> --auto-scale-down-stabilization-window-seconds \
+>> --auto-scale-up-stabilization-window-seconds \
+>> --auto-resource-type \
+>> --auto-resource-usage-threshold
 >> ```
 >>
@@ -171,6 +189,7 @@ You can also modify the scaling strategy after the app has been created using th
 >> - Switch between auto scaling and static scaling
 >> - Change replica values
 >> - Modify monitored metric and associated values
+>> - Update your scaling window (time to scale up, to scale down, or to scale to 0)
 >>
 >> ![Update application scaling step 2](images/update-autoscaling-2.png){.thumbnail}
 >>
@@ -193,11 +212,14 @@ You can also modify the scaling strategy after the app has been created using th
 >> **Updating Autoscaling**
 >>
 >> To change the autoscaling parameters, use the `ovhai app scale` command with the following parameters:
->>
+>> 
 >> ```bash
 >> ovhai app scale \
 >> --auto-min-replicas \
 >> --auto-max-replicas \
+>> --auto-scale-down-stabilization-window-seconds \
+>> --auto-scale-up-stabilization-window-seconds \
+>> --auto-cooldown-period-seconds \
 >> --auto-resource-type \
 >> --auto-resource-usage-threshold \
 >>
@@ -209,6 +231,11 @@ You can also modify the scaling strategy after the app has been created using th
 >>
 >> ```bash
 >> ovhai app scale \
+>> --auto-min-replicas \
+>> --auto-max-replicas \
+>> --auto-scale-down-stabilization-window-seconds \
+>> --auto-scale-up-stabilization-window-seconds \
+>> --auto-cooldown-period-seconds \
 >> --auto-custom-api-url \
 >> --auto-custom-value-location \
 >> --auto-custom-target-value \
diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-us.md b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-us.md
index 9c9224fb2f5..6cbb725ef0b 100644
--- a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-us.md
+++ b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.en-us.md
@@ -1,7 +1,7 @@
 ---
 title: AI Deploy - Scaling strategies
 excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy and learn how to use them
-updated: 2025-12-17
+updated: 2026-02-17
 ---
 
 > [!primary]
@@ -68,21 +68,37 @@ The minimum number of replicas is **1** and the maximum is **10**.
 
 Autoscaling dynamically adjusts the number of application replicas based on **real-time metrics**, such as CPU or RAM usage. This is optimized for **workloads with varying demand**.
 
+> [!primary]
+>
+> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are added after the scaling up delay; if it falls below, replicas are removed after the scaling down delay.
+
 ### Autoscaling Key Configuration Parameters
 
-Using this strategy, it is possible to choose:
+Using this strategy, it is possible to choose: 
 
 | Parameter | Description |
 |----------------------------|-----------------------------------------------------------------------------------------------|
-| **Minimum Replicas** | Lowest number of running replicas. |
+| **Minimum Replicas** | Lowest number of running replicas. When set to 0, the number of replicas is reduced to 0 once your application no longer receives calls during the defined period, limiting your app's costs. |
 | **Maximum Replicas** | Upper bound for replica count (define based on usage expectations). |
-| **Monitored Metric** | The metric to be monitored. Choose between `CPU` or `RAM` for triggering autoscaling actions. |
+| **Time before scaling down (s)** | Number of seconds before scaling from N to N-1 replicas. Default value is 300s. Must be greater than or equal to 0 and less than or equal to 3600 (one hour). |
+| **Time before scaling to 0 (s)** | Number of seconds before reducing from 1 replica to 0. **Only applies** when minimum replicas is set to 0. When enabled, this time must be considered in addition to the `Time before scaling down` parameter. |
+| **Time before scaling up (s)** | Number of seconds before scaling from N to N+1 replicas. Default value is 0s. Must be greater than or equal to 0 and less than or equal to 3600 (one hour). |
+| **Monitored Metric** | The metric to be monitored. Choose between `CPU`, `RAM`, or a custom metric for triggering autoscaling actions. |
 | **Trigger Threshold (%)** | Average usage percentage used to trigger scaling up or down. Range: 1–100%. |
 
-> [!primary]
+> [!warning]
+>
+> For **High Availability**, it is strongly recommended to deploy a **minimum of 2 replicas**.
+>
+> If you set the minimum number of replicas to 0, please consider the following:
 >
-> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are spun up; if it falls below, replicas are removed.
+> - **Scaling behavior**: If your app receives no traffic, it will scale down to zero replicas.
 >
+> - **Cold start latency**: If a request comes in while no replicas are serving your app, there will be a cold start delay before the app starts serving requests again, varying from 30 seconds to several minutes depending on the size of your image and volumes.
+>
+> - **Resource availability risk**: If you use a popular flavor, there is a risk that your app will NOT be able to scale up again if the flavor is unavailable, preventing your app from handling incoming requests.
+>
+> - **Parameter interaction**: The time before scaling to 0 is applied in addition to the time before scaling down. This means the total time before an app scales down to 0 is the sum of both parameters.
 
 ### When to Choose Autoscaling?
 
@@ -107,8 +123,10 @@ Using this strategy, it is possible to choose:
 >> ovhai app run /: \
 >> --auto-min-replicas 1 \
 >> --auto-max-replicas 5 \
->> --auto-resource-type CPU \
->> --auto-resource-usage-threshold 75
+>> --auto-scale-down-stabilization-window-seconds \
+>> --auto-scale-up-stabilization-window-seconds \
+>> --auto-resource-type \
+>> --auto-resource-usage-threshold
 >> ```
 >>
@@ -171,6 +189,7 @@ You can also modify the scaling strategy after the app has been created using th
 >> - Switch between auto scaling and static scaling
 >> - Change replica values
 >> - Modify monitored metric and associated values
+>> - Update your scaling window (time to scale up, to scale down, or to scale to 0)
 >>
 >> ![Update application scaling step 2](images/update-autoscaling-2.png){.thumbnail}
 >>
@@ -193,11 +212,14 @@ You can also modify the scaling strategy after the app has been created using th
 >> **Updating Autoscaling**
 >>
 >> To change the autoscaling parameters, use the `ovhai app scale` command with the following parameters:
->>
+>> 
 >> ```bash
 >> ovhai app scale \
 >> --auto-min-replicas \
 >> --auto-max-replicas \
+>> --auto-scale-down-stabilization-window-seconds \
+>> --auto-scale-up-stabilization-window-seconds \
+>> --auto-cooldown-period-seconds \
 >> --auto-resource-type \
 >> --auto-resource-usage-threshold \
 >>
@@ -209,6 +231,11 @@ You can also modify the scaling strategy after the app has been created using th
 >>
 >> ```bash
 >> ovhai app scale \
+>> --auto-min-replicas \
+>> --auto-max-replicas \
+>> --auto-scale-down-stabilization-window-seconds \
+>> --auto-scale-up-stabilization-window-seconds \
+>> --auto-cooldown-period-seconds \
 >> --auto-custom-api-url \
 >> --auto-custom-value-location \
 >> --auto-custom-target-value \
diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.es-es.md b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.es-es.md
index 9c9224fb2f5..6cbb725ef0b 100644
--- a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.es-es.md
+++ b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.es-es.md
@@ -1,7 +1,7 @@
 ---
 title: AI Deploy - Scaling strategies
 excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy and learn how to use them
-updated: 2025-12-17
+updated: 2026-02-17
 ---
 
 > [!primary]
@@ -68,21 +68,37 @@ The minimum number of replicas is **1** and the maximum is **10**.
 
 Autoscaling dynamically adjusts the number of application replicas based on **real-time metrics**, such as CPU or RAM usage. This is optimized for **workloads with varying demand**.
 
+> [!primary]
+>
+> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are added after the scaling up delay; if it falls below, replicas are removed after the scaling down delay.
+
 ### Autoscaling Key Configuration Parameters
 
-Using this strategy, it is possible to choose:
+Using this strategy, it is possible to choose: 
 
 | Parameter | Description |
 |----------------------------|-----------------------------------------------------------------------------------------------|
-| **Minimum Replicas** | Lowest number of running replicas. |
+| **Minimum Replicas** | Lowest number of running replicas. When set to 0, the number of replicas is reduced to 0 once your application no longer receives calls during the defined period, limiting your app's costs. |
 | **Maximum Replicas** | Upper bound for replica count (define based on usage expectations). |
-| **Monitored Metric** | The metric to be monitored. Choose between `CPU` or `RAM` for triggering autoscaling actions. |
+| **Time before scaling down (s)** | Number of seconds before scaling from N to N-1 replicas. Default value is 300s. Must be greater than or equal to 0 and less than or equal to 3600 (one hour). |
+| **Time before scaling to 0 (s)** | Number of seconds before reducing from 1 replica to 0. **Only applies** when minimum replicas is set to 0. When enabled, this time must be considered in addition to the `Time before scaling down` parameter. |
+| **Time before scaling up (s)** | Number of seconds before scaling from N to N+1 replicas. Default value is 0s. Must be greater than or equal to 0 and less than or equal to 3600 (one hour). |
+| **Monitored Metric** | The metric to be monitored. Choose between `CPU`, `RAM`, or a custom metric for triggering autoscaling actions. |
 | **Trigger Threshold (%)** | Average usage percentage used to trigger scaling up or down. Range: 1–100%. |
 
-> [!primary]
+> [!warning]
+>
+> For **High Availability**, it is strongly recommended to deploy a **minimum of 2 replicas**.
+>
+> If you set the minimum number of replicas to 0, please consider the following:
 >
-> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are spun up; if it falls below, replicas are removed.
+> - **Scaling behavior**: If your app receives no traffic, it will scale down to zero replicas.
 >
+> - **Cold start latency**: If a request comes in while no replicas are serving your app, there will be a cold start delay before the app starts serving requests again, varying from 30 seconds to several minutes depending on the size of your image and volumes.
+>
+> - **Resource availability risk**: If you use a popular flavor, there is a risk that your app will NOT be able to scale up again if the flavor is unavailable, preventing your app from handling incoming requests.
+>
+> - **Parameter interaction**: The time before scaling to 0 is applied in addition to the time before scaling down. This means the total time before an app scales down to 0 is the sum of both parameters.
 
 ### When to Choose Autoscaling?
 
@@ -107,8 +123,10 @@ Using this strategy, it is possible to choose:
 >> ovhai app run /: \
 >> --auto-min-replicas 1 \
 >> --auto-max-replicas 5 \
->> --auto-resource-type CPU \
->> --auto-resource-usage-threshold 75
+>> --auto-scale-down-stabilization-window-seconds \
+>> --auto-scale-up-stabilization-window-seconds \
+>> --auto-resource-type \
+>> --auto-resource-usage-threshold
 >> ```
 >>
@@ -171,6 +189,7 @@ You can also modify the scaling strategy after the app has been created using th
 >> - Switch between auto scaling and static scaling
 >> - Change replica values
 >> - Modify monitored metric and associated values
+>> - Update your scaling window (time to scale up, to scale down, or to scale to 0)
 >>
 >> ![Update application scaling step 2](images/update-autoscaling-2.png){.thumbnail}
 >>
@@ -193,11 +212,14 @@ You can also modify the scaling strategy after the app has been created using th
 >> **Updating Autoscaling**
 >>
 >> To change the autoscaling parameters, use the `ovhai app scale` command with the following parameters:
->>
+>> 
 >> ```bash
 >> ovhai app scale \
 >> --auto-min-replicas \
 >> --auto-max-replicas \
+>> --auto-scale-down-stabilization-window-seconds \
+>> --auto-scale-up-stabilization-window-seconds \
+>> --auto-cooldown-period-seconds \
 >> --auto-resource-type \
 >> --auto-resource-usage-threshold \
 >>
@@ -209,6 +231,11 @@ You can also modify the scaling strategy after the app has been created using th
 >>
 >> ```bash
 >> ovhai app scale \
+>> --auto-min-replicas \
+>> --auto-max-replicas \
+>> --auto-scale-down-stabilization-window-seconds \
+>> --auto-scale-up-stabilization-window-seconds \
+>> --auto-cooldown-period-seconds \
 >> --auto-custom-api-url \
 >> --auto-custom-value-location \
 >> --auto-custom-target-value \
diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.es-us.md b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.es-us.md
index 9c9224fb2f5..6cbb725ef0b 100644
--- a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.es-us.md
+++ b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.es-us.md
@@ -1,7 +1,7 @@
 ---
 title: AI Deploy - Scaling strategies
 excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy and learn how to use them
-updated: 2025-12-17
+updated: 2026-02-17
 ---
 
 > [!primary]
@@ -68,21 +68,37 @@ The minimum number of replicas is **1** and the maximum is **10**.
 
 Autoscaling dynamically adjusts the number of application replicas based on **real-time metrics**, such as CPU or RAM usage. This is optimized for **workloads with varying demand**.
 
+> [!primary]
+>
+> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are added after the scaling up delay; if it falls below, replicas are removed after the scaling down delay.
+
 ### Autoscaling Key Configuration Parameters
 
-Using this strategy, it is possible to choose:
+Using this strategy, it is possible to choose: 
 
 | Parameter | Description |
 |----------------------------|-----------------------------------------------------------------------------------------------|
-| **Minimum Replicas** | Lowest number of running replicas. |
+| **Minimum Replicas** | Lowest number of running replicas. When set to 0, the number of replicas is reduced to 0 once your application no longer receives calls during the defined period, limiting your app's costs. |
 | **Maximum Replicas** | Upper bound for replica count (define based on usage expectations). |
-| **Monitored Metric** | The metric to be monitored. Choose between `CPU` or `RAM` for triggering autoscaling actions. |
+| **Time before scaling down (s)** | Number of seconds before scaling from N to N-1 replicas. Default value is 300s. Must be greater than or equal to 0 and less than or equal to 3600 (one hour). |
+| **Time before scaling to 0 (s)** | Number of seconds before reducing from 1 replica to 0. **Only applies** when minimum replicas is set to 0. When enabled, this time must be considered in addition to the `Time before scaling down` parameter. |
+| **Time before scaling up (s)** | Number of seconds before scaling from N to N+1 replicas. Default value is 0s. Must be greater than or equal to 0 and less than or equal to 3600 (one hour). |
+| **Monitored Metric** | The metric to be monitored. Choose between `CPU`, `RAM`, or a custom metric for triggering autoscaling actions. |
 | **Trigger Threshold (%)** | Average usage percentage used to trigger scaling up or down. Range: 1–100%. |
 
-> [!primary]
+> [!warning]
+>
+> For **High Availability**, it is strongly recommended to deploy a **minimum of 2 replicas**.
+>
+> If you set the minimum number of replicas to 0, please consider the following:
 >
-> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are spun up; if it falls below, replicas are removed.
+> - **Scaling behavior**: If your app receives no traffic, it will scale down to zero replicas.
 >
+> - **Cold start latency**: If a request comes in while no replicas are serving your app, there will be a cold start delay before the app starts serving requests again, varying from 30 seconds to several minutes depending on the size of your image and volumes.
+>
+> - **Resource availability risk**: If you use a popular flavor, there is a risk that your app will NOT be able to scale up again if the flavor is unavailable, preventing your app from handling incoming requests.
+>
+> - **Parameter interaction**: The time before scaling to 0 is applied in addition to the time before scaling down. This means the total time before an app scales down to 0 is the sum of both parameters.
 
 ### When to Choose Autoscaling?
 
@@ -107,8 +123,10 @@ Using this strategy, it is possible to choose:
 >> ovhai app run /: \
 >> --auto-min-replicas 1 \
 >> --auto-max-replicas 5 \
->> --auto-resource-type CPU \
->> --auto-resource-usage-threshold 75
+>> --auto-scale-down-stabilization-window-seconds \
+>> --auto-scale-up-stabilization-window-seconds \
+>> --auto-resource-type \
+>> --auto-resource-usage-threshold
 >> ```
 >>
@@ -171,6 +189,7 @@ You can also modify the scaling strategy after the app has been created using th
 >> - Switch between auto scaling and static scaling
 >> - Change replica values
 >> - Modify monitored metric and associated values
+>> - Update your scaling window (time to scale up, to scale down, or to scale to 0)
 >>
 >> ![Update application scaling step 2](images/update-autoscaling-2.png){.thumbnail}
 >>
@@ -193,11 +212,14 @@ You can also modify the scaling strategy after the app has been created using th
 >> **Updating Autoscaling**
 >>
 >> To change the autoscaling parameters, use the `ovhai app scale` command with the following parameters:
->>
+>> 
 >> ```bash
 >> ovhai app scale \
 >> --auto-min-replicas \
 >> --auto-max-replicas \
+>> --auto-scale-down-stabilization-window-seconds \
+>> --auto-scale-up-stabilization-window-seconds \
+>> --auto-cooldown-period-seconds \
 >> --auto-resource-type \
 >> --auto-resource-usage-threshold \
 >>
@@ -209,6 +231,11 @@ You can also modify the scaling strategy after the app has been created using th
 >>
 >> ```bash
 >> ovhai app scale \
+>> --auto-min-replicas \
+>> --auto-max-replicas \
+>> --auto-scale-down-stabilization-window-seconds \
+>> --auto-scale-up-stabilization-window-seconds \
+>> --auto-cooldown-period-seconds \
 >> --auto-custom-api-url \
 >> --auto-custom-value-location \
 >> --auto-custom-target-value \
diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.fr-ca.md b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.fr-ca.md
index d2c0bea4693..79db78444f6 100644
--- a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.fr-ca.md
+++ b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.fr-ca.md
@@ -1,7 +1,7 @@
 ---
 title: "AI Deploy - Stratégies de mise à l'échelle (EN)"
 excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy and learn how to use them
-updated: 2025-12-17
+updated: 2026-02-17
 ---
 
 > [!primary]
@@ -68,21 +68,37 @@ The minimum number of replicas is **1** and the maximum is **10**.
 
 Autoscaling dynamically adjusts the number of application replicas based on **real-time metrics**, such as CPU or RAM usage. This is optimized for **workloads with varying demand**.
 
+> [!primary]
+>
+> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are added after the scaling up delay; if it falls below, replicas are removed after the scaling down delay.
+
 ### Autoscaling Key Configuration Parameters
 
-Using this strategy, it is possible to choose:
+Using this strategy, it is possible to choose: 
 
 | Parameter | Description |
 |----------------------------|-----------------------------------------------------------------------------------------------|
-| **Minimum Replicas** | Lowest number of running replicas. |
+| **Minimum Replicas** | Lowest number of running replicas. When set to 0, the number of replicas is reduced to 0 once your application no longer receives calls during the defined period, limiting your app's costs. |
 | **Maximum Replicas** | Upper bound for replica count (define based on usage expectations). |
-| **Monitored Metric** | The metric to be monitored. Choose between `CPU` or `RAM` for triggering autoscaling actions. |
+| **Time before scaling down (s)** | Number of seconds before scaling from N to N-1 replicas. Default value is 300s. Must be greater than or equal to 0 and less than or equal to 3600 (one hour). |
+| **Time before scaling to 0 (s)** | Number of seconds before reducing from 1 replica to 0. **Only applies** when minimum replicas is set to 0. When enabled, this time must be considered in addition to the `Time before scaling down` parameter. |
+| **Time before scaling up (s)** | Number of seconds before scaling from N to N+1 replicas. Default value is 0s. Must be greater than or equal to 0 and less than or equal to 3600 (one hour). |
+| **Monitored Metric** | The metric to be monitored. Choose between `CPU`, `RAM`, or a custom metric for triggering autoscaling actions. |
 | **Trigger Threshold (%)** | Average usage percentage used to trigger scaling up or down. Range: 1–100%. |
 
-> [!primary]
+> [!warning]
+>
+> For **High Availability**, it is strongly recommended to deploy a **minimum of 2 replicas**.
+>
+> If you set the minimum number of replicas to 0, please consider the following:
 >
-> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are spun up; if it falls below, replicas are removed.
+> - **Scaling behavior**: If your app receives no traffic, it will scale down to zero replicas.
 >
+> - **Cold start latency**: If a request comes in while no replicas are serving your app, there will be a cold start delay before the app starts serving requests again, varying from 30 seconds to several minutes depending on the size of your image and volumes.
+>
+> - **Resource availability risk**: If you use a popular flavor, there is a risk that your app will NOT be able to scale up again if the flavor is unavailable, preventing your app from handling incoming requests.
+>
+> - **Parameter interaction**: The time before scaling to 0 is applied in addition to the time before scaling down. This means the total time before an app scales down to 0 is the sum of both parameters.
 
 ### When to Choose Autoscaling?
 
@@ -107,8 +123,10 @@ Using this strategy, it is possible to choose: >> ovhai app run /: \ >> --auto-min-replicas 1 \ >> --auto-max-replicas 5 \ ->> --auto-resource-type CPU \ ->> --auto-resource-usage-threshold 75 +>> --auto-scale-down-stabilization-window-seconds \ +>> --auto-scale-up-stabilization-window-seconds \ +>> --auto-resource-type \ +>> --auto-resource-usage-threshold >> ``` >> @@ -171,6 +189,7 @@ You can also modify the scaling strategy after the app has been created using th >> - Switch between auto scaling and static scaling >> - Change replica values >> - Modify monitored metric and associated values +>> - Update your scaling window (time to scale up, to scale down or to scale to 0) >> >> ![Update application scaling step 2](images/update-autoscaling-2.png){.thumbnail} >> @@ -193,11 +212,14 @@ You can also modify the scaling strategy after the app has been created using th >> **Updating Autoscaling** >> >> To change the autoscaling parameters, use the `ovhai app scale` command with the following parameters: ->> +>> >> ```bash >> ovhai app scale \ >> --auto-min-replicas \ >> --auto-max-replicas \ +>> --auto-scale-down-stabilization-window-seconds \ +>> --auto-scale-up-stabilization-window-seconds \ +>> --auto-cooldown-period-seconds \ >> --auto-resource-type \ >> --auto-resource-usage-threshold \ >> @@ -209,6 +231,11 @@ You can also modify the scaling strategy after the app has been created using th >> >> ```bash >> ovhai app scale \ +>> --auto-min-replicas \ +>> --auto-max-replicas \ +>> --auto-scale-down-stabilization-window-seconds \ +>> --auto-scale-up-stabilization-window-seconds \ +>> --auto-cooldown-period-seconds \ >> --auto-custom-api-url \ >> --auto-custom-value-location \ >> --auto-custom-target-value \ diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.fr-fr.md b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.fr-fr.md index d2c0bea4693..79db78444f6 100644 --- 
a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.fr-fr.md +++ b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.fr-fr.md @@ -1,7 +1,7 @@ --- title: "AI Deploy - Stratégies de mise à l'échelle (EN)" excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy and learn how to use them -updated: 2025-12-17 +updated: 2026-02-17 --- > [!primary] @@ -68,21 +68,37 @@ The minimum number of replicas is **1** and the maximum is **10**. Autoscaling dynamically adjusts the number of application replicas based on **real-time metrics**, such as CPU or RAM usage. This is optimized for **workloads with varying demand**. +> [!primary] +> +> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are added after the scaling up delay; if it falls below, replicas are removed after the scaling down delay. + ### Autoscaling Key Configuration Parameters -Using this strategy, it is possible to choose: +Using this strategy, it is possible to choose: | Parameter | Description | |----------------------------|-----------------------------------------------------------------------------------------------| -| **Minimum Replicas** | Lowest number of running replicas. | +| **Minimum Replicas** | Lowest number of running replicas. When set to 0, the number of replicas will be reduced to 0 when your application no longer receives calls during the defined period, limiting the cost of your app. | | **Maximum Replicas** | Upper bound for replica count (define based on usage expectations). | -| **Monitored Metric** | The metric to be monitored. Choose between `CPU` or `RAM` for triggering autoscaling actions. | +| **Time before scaling down (s)** | Number of seconds before scaling from N to N-1 replicas. Default value is 300s. Must be greater than or equal to 0 and less than or equal to 3600 (one hour). | +| **Time before scaling to 0 (s)** | Number of seconds before reducing from 1 to 0 replicas. **Only applies** when minimum replicas is set to 0. When enabled, this time must be considered in addition to the `Time before scaling down` parameter. | +| **Time before scaling up (s)** | Number of seconds before scaling from N to N+1 replicas. Default value is 0s. Must be greater than or equal to 0 and less than or equal to 3600 (one hour). | +| **Monitored Metric** | The metric to be monitored. Choose between `CPU`, `RAM` or a custom metric for triggering autoscaling actions. | | **Trigger Threshold (%)** | Average usage percentage used to trigger scaling up or down. Range: 1–100%. | -> [!primary] +> [!warning] +> +> For **High Availability**, it is strongly recommended to deploy a **minimum of 2 replicas**. +> +> If you set the minimum number of replicas to 0, please consider the following: > -> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are spun up; if it falls below, replicas are removed. +> - **Scaling behavior**: If your app receives no traffic, it will scale down to zero replicas. > +> - **Cold start latency**: If a request comes in while no replicas are serving your app, there will be a cold start delay before the app starts serving requests again, ranging from 30 seconds to several minutes depending on the size of your image and volumes. +> +> - **Resource availability risk**: If you use a popular flavor, there is a risk that it will be unavailable when your app needs to scale up again, preventing your app from handling incoming requests. +> +> - **Parameter interaction**: The time before scaling to 0 is applied in addition to the time before scaling down. This means the total time before an app scales down to 0 is the sum of both parameters. ### When to Choose Autoscaling?
@@ -107,8 +123,10 @@ Using this strategy, it is possible to choose: >> ovhai app run /: \ >> --auto-min-replicas 1 \ >> --auto-max-replicas 5 \ ->> --auto-resource-type CPU \ ->> --auto-resource-usage-threshold 75 +>> --auto-scale-down-stabilization-window-seconds \ +>> --auto-scale-up-stabilization-window-seconds \ +>> --auto-resource-type \ +>> --auto-resource-usage-threshold >> ``` >> @@ -171,6 +189,7 @@ You can also modify the scaling strategy after the app has been created using th >> - Switch between auto scaling and static scaling >> - Change replica values >> - Modify monitored metric and associated values +>> - Update your scaling window (time to scale up, to scale down or to scale to 0) >> >> ![Update application scaling step 2](images/update-autoscaling-2.png){.thumbnail} >> @@ -193,11 +212,14 @@ You can also modify the scaling strategy after the app has been created using th >> **Updating Autoscaling** >> >> To change the autoscaling parameters, use the `ovhai app scale` command with the following parameters: ->> +>> >> ```bash >> ovhai app scale \ >> --auto-min-replicas \ >> --auto-max-replicas \ +>> --auto-scale-down-stabilization-window-seconds \ +>> --auto-scale-up-stabilization-window-seconds \ +>> --auto-cooldown-period-seconds \ >> --auto-resource-type \ >> --auto-resource-usage-threshold \ >> @@ -209,6 +231,11 @@ You can also modify the scaling strategy after the app has been created using th >> >> ```bash >> ovhai app scale \ +>> --auto-min-replicas \ +>> --auto-max-replicas \ +>> --auto-scale-down-stabilization-window-seconds \ +>> --auto-scale-up-stabilization-window-seconds \ +>> --auto-cooldown-period-seconds \ >> --auto-custom-api-url \ >> --auto-custom-value-location \ >> --auto-custom-target-value \ diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.it-it.md b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.it-it.md index 9c9224fb2f5..6cbb725ef0b 100644 --- 
a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.it-it.md +++ b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.it-it.md @@ -1,7 +1,7 @@ --- title: AI Deploy - Scaling strategies excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy and learn how to use them -updated: 2025-12-17 +updated: 2026-02-17 --- > [!primary] @@ -68,21 +68,37 @@ The minimum number of replicas is **1** and the maximum is **10**. Autoscaling dynamically adjusts the number of application replicas based on **real-time metrics**, such as CPU or RAM usage. This is optimized for **workloads with varying demand**. +> [!primary] +> +> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are added after the scaling up delay; if it falls below, replicas are removed after the scaling down delay. + ### Autoscaling Key Configuration Parameters -Using this strategy, it is possible to choose: +Using this strategy, it is possible to choose: | Parameter | Description | |----------------------------|-----------------------------------------------------------------------------------------------| -| **Minimum Replicas** | Lowest number of running replicas. | +| **Minimum Replicas** | Lowest number of running replicas. When set to 0, the number of replicas will be reduced to 0 when your application no longer receives calls during the defined period, limiting the cost of your app. | | **Maximum Replicas** | Upper bound for replica count (define based on usage expectations). | -| **Monitored Metric** | The metric to be monitored. Choose between `CPU` or `RAM` for triggering autoscaling actions. | +| **Time before scaling down (s)** | Number of seconds before scaling from N to N-1 replicas. Default value is 300s. Must be greater than or equal to 0 and less than or equal to 3600 (one hour). | +| **Time before scaling to 0 (s)** | Number of seconds before reducing from 1 to 0 replicas. **Only applies** when minimum replicas is set to 0. When enabled, this time must be considered in addition to the `Time before scaling down` parameter. | +| **Time before scaling up (s)** | Number of seconds before scaling from N to N+1 replicas. Default value is 0s. Must be greater than or equal to 0 and less than or equal to 3600 (one hour). | +| **Monitored Metric** | The metric to be monitored. Choose between `CPU`, `RAM` or a custom metric for triggering autoscaling actions. | | **Trigger Threshold (%)** | Average usage percentage used to trigger scaling up or down. Range: 1–100%. | -> [!primary] +> [!warning] +> +> For **High Availability**, it is strongly recommended to deploy a **minimum of 2 replicas**. +> +> If you set the minimum number of replicas to 0, please consider the following: > -> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are spun up; if it falls below, replicas are removed. +> - **Scaling behavior**: If your app receives no traffic, it will scale down to zero replicas. > +> - **Cold start latency**: If a request comes in while no replicas are serving your app, there will be a cold start delay before the app starts serving requests again, ranging from 30 seconds to several minutes depending on the size of your image and volumes. +> +> - **Resource availability risk**: If you use a popular flavor, there is a risk that it will be unavailable when your app needs to scale up again, preventing your app from handling incoming requests. +> +> - **Parameter interaction**: The time before scaling to 0 is applied in addition to the time before scaling down. This means the total time before an app scales down to 0 is the sum of both parameters. ### When to Choose Autoscaling?
@@ -107,8 +123,10 @@ Using this strategy, it is possible to choose: >> ovhai app run /: \ >> --auto-min-replicas 1 \ >> --auto-max-replicas 5 \ ->> --auto-resource-type CPU \ ->> --auto-resource-usage-threshold 75 +>> --auto-scale-down-stabilization-window-seconds \ +>> --auto-scale-up-stabilization-window-seconds \ +>> --auto-resource-type \ +>> --auto-resource-usage-threshold >> ``` >> @@ -171,6 +189,7 @@ You can also modify the scaling strategy after the app has been created using th >> - Switch between auto scaling and static scaling >> - Change replica values >> - Modify monitored metric and associated values +>> - Update your scaling window (time to scale up, to scale down or to scale to 0) >> >> ![Update application scaling step 2](images/update-autoscaling-2.png){.thumbnail} >> @@ -193,11 +212,14 @@ You can also modify the scaling strategy after the app has been created using th >> **Updating Autoscaling** >> >> To change the autoscaling parameters, use the `ovhai app scale` command with the following parameters: ->> +>> >> ```bash >> ovhai app scale \ >> --auto-min-replicas \ >> --auto-max-replicas \ +>> --auto-scale-down-stabilization-window-seconds \ +>> --auto-scale-up-stabilization-window-seconds \ +>> --auto-cooldown-period-seconds \ >> --auto-resource-type \ >> --auto-resource-usage-threshold \ >> @@ -209,6 +231,11 @@ You can also modify the scaling strategy after the app has been created using th >> >> ```bash >> ovhai app scale \ +>> --auto-min-replicas \ +>> --auto-max-replicas \ +>> --auto-scale-down-stabilization-window-seconds \ +>> --auto-scale-up-stabilization-window-seconds \ +>> --auto-cooldown-period-seconds \ >> --auto-custom-api-url \ >> --auto-custom-value-location \ >> --auto-custom-target-value \ diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.pl-pl.md b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.pl-pl.md index 9c9224fb2f5..6cbb725ef0b 100644 --- 
a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.pl-pl.md +++ b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.pl-pl.md @@ -1,7 +1,7 @@ --- title: AI Deploy - Scaling strategies excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy and learn how to use them -updated: 2025-12-17 +updated: 2026-02-17 --- > [!primary] @@ -68,21 +68,37 @@ The minimum number of replicas is **1** and the maximum is **10**. Autoscaling dynamically adjusts the number of application replicas based on **real-time metrics**, such as CPU or RAM usage. This is optimized for **workloads with varying demand**. +> [!primary] +> +> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are added after the scaling up delay; if it falls below, replicas are removed after the scaling down delay. + ### Autoscaling Key Configuration Parameters -Using this strategy, it is possible to choose: +Using this strategy, it is possible to choose: | Parameter | Description | |----------------------------|-----------------------------------------------------------------------------------------------| -| **Minimum Replicas** | Lowest number of running replicas. | +| **Minimum Replicas** | Lowest number of running replicas. When set to 0, the number of replicas will be reduced to 0 when your application no longer receives calls during the defined period, limiting the cost of your app. | | **Maximum Replicas** | Upper bound for replica count (define based on usage expectations). | -| **Monitored Metric** | The metric to be monitored. Choose between `CPU` or `RAM` for triggering autoscaling actions. | +| **Time before scaling down (s)** | Number of seconds before scaling from N to N-1 replicas. Default value is 300s. Must be greater than or equal to 0 and less than or equal to 3600 (one hour). | +| **Time before scaling to 0 (s)** | Number of seconds before reducing from 1 to 0 replicas. **Only applies** when minimum replicas is set to 0. When enabled, this time must be considered in addition to the `Time before scaling down` parameter. | +| **Time before scaling up (s)** | Number of seconds before scaling from N to N+1 replicas. Default value is 0s. Must be greater than or equal to 0 and less than or equal to 3600 (one hour). | +| **Monitored Metric** | The metric to be monitored. Choose between `CPU`, `RAM` or a custom metric for triggering autoscaling actions. | | **Trigger Threshold (%)** | Average usage percentage used to trigger scaling up or down. Range: 1–100%. | -> [!primary] +> [!warning] +> +> For **High Availability**, it is strongly recommended to deploy a **minimum of 2 replicas**. +> +> If you set the minimum number of replicas to 0, please consider the following: > -> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are spun up; if it falls below, replicas are removed. +> - **Scaling behavior**: If your app receives no traffic, it will scale down to zero replicas. > +> - **Cold start latency**: If a request comes in while no replicas are serving your app, there will be a cold start delay before the app starts serving requests again, ranging from 30 seconds to several minutes depending on the size of your image and volumes. +> +> - **Resource availability risk**: If you use a popular flavor, there is a risk that it will be unavailable when your app needs to scale up again, preventing your app from handling incoming requests. +> +> - **Parameter interaction**: The time before scaling to 0 is applied in addition to the time before scaling down. This means the total time before an app scales down to 0 is the sum of both parameters. ### When to Choose Autoscaling?
@@ -107,8 +123,10 @@ Using this strategy, it is possible to choose: >> ovhai app run /: \ >> --auto-min-replicas 1 \ >> --auto-max-replicas 5 \ ->> --auto-resource-type CPU \ ->> --auto-resource-usage-threshold 75 +>> --auto-scale-down-stabilization-window-seconds \ +>> --auto-scale-up-stabilization-window-seconds \ +>> --auto-resource-type \ +>> --auto-resource-usage-threshold >> ``` >> @@ -171,6 +189,7 @@ You can also modify the scaling strategy after the app has been created using th >> - Switch between auto scaling and static scaling >> - Change replica values >> - Modify monitored metric and associated values +>> - Update your scaling window (time to scale up, to scale down or to scale to 0) >> >> ![Update application scaling step 2](images/update-autoscaling-2.png){.thumbnail} >> @@ -193,11 +212,14 @@ You can also modify the scaling strategy after the app has been created using th >> **Updating Autoscaling** >> >> To change the autoscaling parameters, use the `ovhai app scale` command with the following parameters: ->> +>> >> ```bash >> ovhai app scale \ >> --auto-min-replicas \ >> --auto-max-replicas \ +>> --auto-scale-down-stabilization-window-seconds \ +>> --auto-scale-up-stabilization-window-seconds \ +>> --auto-cooldown-period-seconds \ >> --auto-resource-type \ >> --auto-resource-usage-threshold \ >> @@ -209,6 +231,11 @@ You can also modify the scaling strategy after the app has been created using th >> >> ```bash >> ovhai app scale \ +>> --auto-min-replicas \ +>> --auto-max-replicas \ +>> --auto-scale-down-stabilization-window-seconds \ +>> --auto-scale-up-stabilization-window-seconds \ +>> --auto-cooldown-period-seconds \ >> --auto-custom-api-url \ >> --auto-custom-value-location \ >> --auto-custom-target-value \ diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.pt-pt.md b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.pt-pt.md index 9c9224fb2f5..6cbb725ef0b 100644 --- 
a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.pt-pt.md +++ b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/guide.pt-pt.md @@ -1,7 +1,7 @@ --- title: AI Deploy - Scaling strategies excerpt: Understand the scaling strategies (static scaling vs autoscaling) of AI Deploy and learn how to use them -updated: 2025-12-17 +updated: 2026-02-17 --- > [!primary] @@ -68,21 +68,37 @@ The minimum number of replicas is **1** and the maximum is **10**. Autoscaling dynamically adjusts the number of application replicas based on **real-time metrics**, such as CPU or RAM usage. This is optimized for **workloads with varying demand**. +> [!primary] +> +> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are added after the scaling up delay; if it falls below, replicas are removed after the scaling down delay. + ### Autoscaling Key Configuration Parameters -Using this strategy, it is possible to choose: +Using this strategy, it is possible to choose: | Parameter | Description | |----------------------------|-----------------------------------------------------------------------------------------------| -| **Minimum Replicas** | Lowest number of running replicas. | +| **Minimum Replicas** | Lowest number of running replicas. When set to 0, the number of replicas will be reduced to 0 when your application no longer receives calls during the defined period, limiting the cost of your app. | | **Maximum Replicas** | Upper bound for replica count (define based on usage expectations). | -| **Monitored Metric** | The metric to be monitored. Choose between `CPU` or `RAM` for triggering autoscaling actions. | +| **Time before scaling down (s)** | Number of seconds before scaling from N to N-1 replicas. Default value is 300s. Must be greater than or equal to 0 and less than or equal to 3600 (one hour). | +| **Time before scaling to 0 (s)** | Number of seconds before reducing from 1 to 0 replicas. **Only applies** when minimum replicas is set to 0. When enabled, this time must be considered in addition to the `Time before scaling down` parameter. | +| **Time before scaling up (s)** | Number of seconds before scaling from N to N+1 replicas. Default value is 0s. Must be greater than or equal to 0 and less than or equal to 3600 (one hour). | +| **Monitored Metric** | The metric to be monitored. Choose between `CPU`, `RAM` or a custom metric for triggering autoscaling actions. | | **Trigger Threshold (%)** | Average usage percentage used to trigger scaling up or down. Range: 1–100%. | -> [!primary] +> [!warning] +> +> For **High Availability**, it is strongly recommended to deploy a **minimum of 2 replicas**. +> +> If you set the minimum number of replicas to 0, please consider the following: > -> Autoscaling adjusts by calculating the **average resource usage** across all replicas. If the average exceeds the threshold, new replicas are spun up; if it falls below, replicas are removed. +> - **Scaling behavior**: If your app receives no traffic, it will scale down to zero replicas. > +> - **Cold start latency**: If a request comes in while no replicas are serving your app, there will be a cold start delay before the app starts serving requests again, ranging from 30 seconds to several minutes depending on the size of your image and volumes. +> +> - **Resource availability risk**: If you use a popular flavor, there is a risk that it will be unavailable when your app needs to scale up again, preventing your app from handling incoming requests. +> +> - **Parameter interaction**: The time before scaling to 0 is applied in addition to the time before scaling down. This means the total time before an app scales down to 0 is the sum of both parameters. ### When to Choose Autoscaling?
@@ -107,8 +123,10 @@ Using this strategy, it is possible to choose: >> ovhai app run /: \ >> --auto-min-replicas 1 \ >> --auto-max-replicas 5 \ ->> --auto-resource-type CPU \ ->> --auto-resource-usage-threshold 75 +>> --auto-scale-down-stabilization-window-seconds \ +>> --auto-scale-up-stabilization-window-seconds \ +>> --auto-resource-type \ +>> --auto-resource-usage-threshold >> ``` >> @@ -171,6 +189,7 @@ You can also modify the scaling strategy after the app has been created using th >> - Switch between auto scaling and static scaling >> - Change replica values >> - Modify monitored metric and associated values +>> - Update your scaling window (time to scale up, to scale down or to scale to 0) >> >> ![Update application scaling step 2](images/update-autoscaling-2.png){.thumbnail} >> @@ -193,11 +212,14 @@ You can also modify the scaling strategy after the app has been created using th >> **Updating Autoscaling** >> >> To change the autoscaling parameters, use the `ovhai app scale` command with the following parameters: ->> +>> >> ```bash >> ovhai app scale \ >> --auto-min-replicas \ >> --auto-max-replicas \ +>> --auto-scale-down-stabilization-window-seconds \ +>> --auto-scale-up-stabilization-window-seconds \ +>> --auto-cooldown-period-seconds \ >> --auto-resource-type \ >> --auto-resource-usage-threshold \ >> @@ -209,6 +231,11 @@ You can also modify the scaling strategy after the app has been created using th >> >> ```bash >> ovhai app scale \ +>> --auto-min-replicas \ +>> --auto-max-replicas \ +>> --auto-scale-down-stabilization-window-seconds \ +>> --auto-scale-up-stabilization-window-seconds \ +>> --auto-cooldown-period-seconds \ >> --auto-custom-api-url \ >> --auto-custom-value-location \ >> --auto-custom-target-value \ diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/images/set-autoscaling.png b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/images/set-autoscaling.png index 
7656139ff46..4b82590dd15 100644 Binary files a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/images/set-autoscaling.png and b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/images/set-autoscaling.png differ diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/images/set-custom-autoscaling.png b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/images/set-custom-autoscaling.png index 0cb36c0f3cd..8fb2dea7d72 100644 Binary files a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/images/set-custom-autoscaling.png and b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/images/set-custom-autoscaling.png differ diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/images/update-autoscaling-1.png b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/images/update-autoscaling-1.png index c0f802e3792..8f3fbc01a19 100644 Binary files a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/images/update-autoscaling-1.png and b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/images/update-autoscaling-1.png differ diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/images/update-autoscaling-2.png b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/images/update-autoscaling-2.png index 631562888b0..2b69eb898b5 100644 Binary files a/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/images/update-autoscaling-2.png and b/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies/images/update-autoscaling-2.png differ diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.de-de.md b/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.de-de.md index 
9ca557768c7..1e9cdaf6cea 100644 --- a/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.de-de.md +++ b/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.de-de.md @@ -1,7 +1,7 @@ --- title: AI Deploy - Billing and lifecycle excerpt: Learn how we bill AI Deploy -updated: 2025-02-18 +updated: 2026-02-17 --- > [!primary] @@ -24,16 +24,17 @@ AI Deploy is linked to a Public Cloud project. The whole project is billed at th OVHcloud AI Deploy allows deployment of Docker images, and each deployment is called an `app`. During its lifetime, the app will go through the following status: -- `QUEUED`: the app deployment request is about to be processed. -- `INITIALIZING`: the app is being started and, if any, the remote data is synchronized from the Object Storage. To learn more about data synchronization, please check out the [Data - Concept and best practices](/pages/public_cloud/ai_machine_learning/gi_02_concepts_data#how-it-works) documentation. +- `QUEUED`: The app deployment request is about to be processed. +- `INITIALIZING`: The app is being started and remote data, if any, is synchronized from the Object Storage. To learn more about data synchronization, please check out the [Data - Concept and best practices](/pages/public_cloud/ai_machine_learning/gi_02_concepts_data#how-it-works) documentation. - `SCALING`: First, the system allocates the necessary compute resources (CPU/GPU) for the app. Then, the specified Docker image is pulled for use in the app. This status is also entered when the number of app replicas is being increased or decreased. - `RUNNING`: At least one replica of the app is available and accessible via its endpoint. As the app scales up to create new replicas, the status transitions back to `SCALING`. However, there is no interruption in service, and the original replica(s) remain accessible during this time. -- `STOPPING`: the app is stopping, your compute resources are freed. Ephemeral data is deleted. -- `STOPPED`: the app ended normally. You can restart it whenever you want or delete it. It will keep the same endpoint. -- `FAILED`: the app ended in error, e.g. the Docker image is invalid (unreachable, built with linux/arm, ...). -- `ERROR`: the app ended due to a backend error (issue on OVHcloud side). You may reach our support. -- `DELETING`: the app is being removed. When it is deleted, you will no longer see it, it will no longer exist. -- `DELETED`: the app is fully deleted. +- `STANDBY`: The app has no running replicas but is ready to scale back up when traffic arrives. **This happens after a period with no incoming requests when scaling to 0 is enabled**. When traffic arrives, the app transitions from `STANDBY` to `RUNNING` through the `INITIALIZING` and `SCALING` states. +- `STOPPING`: The app is stopping; your compute resources are freed. Ephemeral data is deleted. +- `STOPPED`: The app ended normally. You can restart it whenever you want or delete it. It will keep the same endpoint. +- `FAILED`: The app ended in error, e.g. the Docker image is invalid (unreachable, built with linux/arm, ...). +- `ERROR`: The app ended due to a backend error (issue on OVHcloud side). You may [reach our support](/links/support-contact). +- `DELETING`: The app is being removed. When it is deleted, you will no longer see it; it will no longer exist. +- `DELETED`: The app is fully deleted. ![image](images/ai.deploy.lifecycle.png){.thumbnail} @@ -68,10 +69,10 @@ Their official pricing is available in the [OVHcloud Control Panel](/links/manag Rates for compute are mentioned per hour to facilitate reading of the prices, but the billing granularity remains **per minute**.
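As a quick illustration of per-minute granularity: a displayed hourly rate is metered at 1/60 of that rate per minute of use. The rate below is hypothetical, not an actual OVHcloud price:

```python
# Hedged sketch: prices are displayed per hour, but usage is billed per
# minute. The hourly rate here is a made-up example value.

def cost(hourly_rate, minutes_used):
    """Bill per minute at 1/60 of the displayed hourly rate."""
    return round(hourly_rate / 60 * minutes_used, 4)

# A replica at a hypothetical 1.20 (currency units) per hour, running 95 min:
print(cost(1.20, 95))  # 1.9
```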
-Once you select the compute resources, you can specify the scaling strategy: +Once you select the compute resources, you can specify the [scaling strategy](/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies): - **Fixed scaling**: you can specify a fixed amount of replicas, starting at one. Please note that with one replica, you will not benefit from high-availability. -- **Auto-scaling**: you can specify a minimum and maximum amount of replicas, and a metric that will act as a trigger for scaling up or down (CPU or RAM usage). Each replica will benefit from the compute resource selected before. +- **Auto-scaling**: you can specify a minimum and maximum number of replicas, and a metric that will act as a trigger for scaling up or down (CPU, RAM usage or a custom metric). Each replica will benefit from the compute resource selected before. ### Storage details diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.en-asia.md b/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.en-asia.md index 9ca557768c7..1e9cdaf6cea 100644 --- a/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.en-asia.md +++ b/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.en-asia.md @@ -1,7 +1,7 @@ --- title: AI Deploy - Billing and lifecycle excerpt: Learn how we bill AI Deploy -updated: 2025-02-18 +updated: 2026-02-17 --- > [!primary] @@ -24,16 +24,17 @@ AI Deploy is linked to a Public Cloud project. The whole project is billed at th OVHcloud AI Deploy allows deployment of Docker images, and each deployment is called an `app`. During its lifetime, the app will go through the following status: -- `QUEUED`: the app deployment request is about to be processed. -- `INITIALIZING`: the app is being started and, if any, the remote data is synchronized from the Object Storage.
To learn more about data synchronization, please check out the [Data - Concept and best practices](/pages/public_cloud/ai_machine_learning/gi_02_concepts_data#how-it-works) documentation. +- `QUEUED`: The app deployment request is about to be processed. +- `INITIALIZING`: The app is being started and, if any, the remote data is synchronized from the Object Storage. To learn more about data synchronization, please check out the [Data - Concept and best practices](/pages/public_cloud/ai_machine_learning/gi_02_concepts_data#how-it-works) documentation. - `SCALING`: First, the system allocates the necessary compute resources (CPU/GPU) for the app. Then, the specified Docker image is pulled for use in the app. This status is also entered when the number of app replicas is being increased or decreased. - `RUNNING`: At least one replica of the app is available and accessible via its endpoint. As the app scales up to create new replicas, the status transitions back to `SCALING`. However, there is no interruption in service, and the original replica(s) remain accessible during this time. -- `STOPPING`: the app is stopping, your compute resources are freed. Ephemeral data is deleted. -- `STOPPED`: the app ended normally. You can restart it whenever you want or delete it. It will keep the same endpoint. -- `FAILED`: the app ended in error, e.g. the Docker image is invalid (unreachable, built with linux/arm, ...). -- `ERROR`: the app ended due to a backend error (issue on OVHcloud side). You may reach our support. -- `DELETING`: the app is being removed. When it is deleted, you will no longer see it, it will no longer exist. -- `DELETED`: the app is fully deleted. +- `STANDBY`: The app has no running replicas but is ready to scale back up when traffic arrives. **This happens after a period with no incoming requests when scaling to 0 is enabled**. When traffic arrives, the app transitions from `STANDBY` to `RUNNING` through the `INITIALIZING` and `SCALING` states. 
+- `STOPPING`: The app is stopping, your compute resources are freed. Ephemeral data is deleted. +- `STOPPED`: The app ended normally. You can restart it whenever you want or delete it. It will keep the same endpoint. +- `FAILED`: The app ended in error, e.g. the Docker image is invalid (unreachable, built with linux/arm, ...). +- `ERROR`: The app ended due to a backend error (issue on OVHcloud side). You may [reach our support](/links/support-contact). +- `DELETING`: The app is being removed. When it is deleted, you will no longer see it, it will no longer exist. +- `DELETED`: The app is fully deleted. ![image](images/ai.deploy.lifecycle.png){.thumbnail} @@ -68,10 +69,10 @@ Their official pricing is available in the [OVHcloud Control Panel](/links/manag Rates for compute are mentioned per hour to facilitate reading of the prices, but the billing granularity remains **per minute**. -Once you select the compute resources, you can specify the scaling strategy: +Once you select the compute resources, you can specify the [scaling strategy](/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies): - **Fixed scaling**: you can specify a fixed amount of replicas, starting at one. Please note that with one replica, you will not benefit from high-availability. -- **Auto-scaling**: you can specify a minimum and maximum amount of replicas, and a metric that will act as a trigger for scaling up or down (CPU or RAM usage). Each replica will benefit from the compute resource selected before. +- **Auto-scaling**: you can specify a minimum and maximum amount of replicas, and a metric that will act as a trigger for scaling up or down (CPU, RAM usage or a custom metric). Each replica will benefit from the compute resource selected before. 
### Storage details diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.en-au.md b/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.en-au.md index 9ca557768c7..1e9cdaf6cea 100644 --- a/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.en-au.md +++ b/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.en-au.md @@ -1,7 +1,7 @@ --- title: AI Deploy - Billing and lifecycle excerpt: Learn how we bill AI Deploy -updated: 2025-02-18 +updated: 2026-02-17 --- > [!primary] @@ -24,16 +24,17 @@ AI Deploy is linked to a Public Cloud project. The whole project is billed at th OVHcloud AI Deploy allows deployment of Docker images, and each deployment is called an `app`. During its lifetime, the app will go through the following status: -- `QUEUED`: the app deployment request is about to be processed. -- `INITIALIZING`: the app is being started and, if any, the remote data is synchronized from the Object Storage. To learn more about data synchronization, please check out the [Data - Concept and best practices](/pages/public_cloud/ai_machine_learning/gi_02_concepts_data#how-it-works) documentation. +- `QUEUED`: The app deployment request is about to be processed. +- `INITIALIZING`: The app is being started and, if any, the remote data is synchronized from the Object Storage. To learn more about data synchronization, please check out the [Data - Concept and best practices](/pages/public_cloud/ai_machine_learning/gi_02_concepts_data#how-it-works) documentation. - `SCALING`: First, the system allocates the necessary compute resources (CPU/GPU) for the app. Then, the specified Docker image is pulled for use in the app. This status is also entered when the number of app replicas is being increased or decreased. - `RUNNING`: At least one replica of the app is available and accessible via its endpoint. 
As the app scales up to create new replicas, the status transitions back to `SCALING`. However, there is no interruption in service, and the original replica(s) remain accessible during this time. -- `STOPPING`: the app is stopping, your compute resources are freed. Ephemeral data is deleted. -- `STOPPED`: the app ended normally. You can restart it whenever you want or delete it. It will keep the same endpoint. -- `FAILED`: the app ended in error, e.g. the Docker image is invalid (unreachable, built with linux/arm, ...). -- `ERROR`: the app ended due to a backend error (issue on OVHcloud side). You may reach our support. -- `DELETING`: the app is being removed. When it is deleted, you will no longer see it, it will no longer exist. -- `DELETED`: the app is fully deleted. +- `STANDBY`: The app has no running replicas but is ready to scale back up when traffic arrives. **This happens after a period with no incoming requests when scaling to 0 is enabled**. When traffic arrives, the app transitions from `STANDBY` to `RUNNING` through the `INITIALIZING` and `SCALING` states. +- `STOPPING`: The app is stopping, your compute resources are freed. Ephemeral data is deleted. +- `STOPPED`: The app ended normally. You can restart it whenever you want or delete it. It will keep the same endpoint. +- `FAILED`: The app ended in error, e.g. the Docker image is invalid (unreachable, built with linux/arm, ...). +- `ERROR`: The app ended due to a backend error (issue on OVHcloud side). You may [reach our support](/links/support-contact). +- `DELETING`: The app is being removed. When it is deleted, you will no longer see it, it will no longer exist. +- `DELETED`: The app is fully deleted. ![image](images/ai.deploy.lifecycle.png){.thumbnail} @@ -68,10 +69,10 @@ Their official pricing is available in the [OVHcloud Control Panel](/links/manag Rates for compute are mentioned per hour to facilitate reading of the prices, but the billing granularity remains **per minute**. 
-Once you select the compute resources, you can specify the scaling strategy: +Once you select the compute resources, you can specify the [scaling strategy](/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies): - **Fixed scaling**: you can specify a fixed amount of replicas, starting at one. Please note that with one replica, you will not benefit from high-availability. -- **Auto-scaling**: you can specify a minimum and maximum amount of replicas, and a metric that will act as a trigger for scaling up or down (CPU or RAM usage). Each replica will benefit from the compute resource selected before. +- **Auto-scaling**: you can specify a minimum and maximum amount of replicas, and a metric that will act as a trigger for scaling up or down (CPU, RAM usage or a custom metric). Each replica will benefit from the compute resource selected before. ### Storage details diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.en-ca.md b/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.en-ca.md index 9ca557768c7..1e9cdaf6cea 100644 --- a/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.en-ca.md +++ b/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.en-ca.md @@ -1,7 +1,7 @@ --- title: AI Deploy - Billing and lifecycle excerpt: Learn how we bill AI Deploy -updated: 2025-02-18 +updated: 2026-02-17 --- > [!primary] @@ -24,16 +24,17 @@ AI Deploy is linked to a Public Cloud project. The whole project is billed at th OVHcloud AI Deploy allows deployment of Docker images, and each deployment is called an `app`. During its lifetime, the app will go through the following status: -- `QUEUED`: the app deployment request is about to be processed. -- `INITIALIZING`: the app is being started and, if any, the remote data is synchronized from the Object Storage. 
To learn more about data synchronization, please check out the [Data - Concept and best practices](/pages/public_cloud/ai_machine_learning/gi_02_concepts_data#how-it-works) documentation. +- `QUEUED`: The app deployment request is about to be processed. +- `INITIALIZING`: The app is being started and, if any, the remote data is synchronized from the Object Storage. To learn more about data synchronization, please check out the [Data - Concept and best practices](/pages/public_cloud/ai_machine_learning/gi_02_concepts_data#how-it-works) documentation. - `SCALING`: First, the system allocates the necessary compute resources (CPU/GPU) for the app. Then, the specified Docker image is pulled for use in the app. This status is also entered when the number of app replicas is being increased or decreased. - `RUNNING`: At least one replica of the app is available and accessible via its endpoint. As the app scales up to create new replicas, the status transitions back to `SCALING`. However, there is no interruption in service, and the original replica(s) remain accessible during this time. -- `STOPPING`: the app is stopping, your compute resources are freed. Ephemeral data is deleted. -- `STOPPED`: the app ended normally. You can restart it whenever you want or delete it. It will keep the same endpoint. -- `FAILED`: the app ended in error, e.g. the Docker image is invalid (unreachable, built with linux/arm, ...). -- `ERROR`: the app ended due to a backend error (issue on OVHcloud side). You may reach our support. -- `DELETING`: the app is being removed. When it is deleted, you will no longer see it, it will no longer exist. -- `DELETED`: the app is fully deleted. +- `STANDBY`: The app has no running replicas but is ready to scale back up when traffic arrives. **This happens after a period with no incoming requests when scaling to 0 is enabled**. When traffic arrives, the app transitions from `STANDBY` to `RUNNING` through the `INITIALIZING` and `SCALING` states. 
+- `STOPPING`: The app is stopping, your compute resources are freed. Ephemeral data is deleted. +- `STOPPED`: The app ended normally. You can restart it whenever you want or delete it. It will keep the same endpoint. +- `FAILED`: The app ended in error, e.g. the Docker image is invalid (unreachable, built with linux/arm, ...). +- `ERROR`: The app ended due to a backend error (issue on OVHcloud side). You may [reach our support](/links/support-contact). +- `DELETING`: The app is being removed. When it is deleted, you will no longer see it, it will no longer exist. +- `DELETED`: The app is fully deleted. ![image](images/ai.deploy.lifecycle.png){.thumbnail} @@ -68,10 +69,10 @@ Their official pricing is available in the [OVHcloud Control Panel](/links/manag Rates for compute are mentioned per hour to facilitate reading of the prices, but the billing granularity remains **per minute**. -Once you select the compute resources, you can specify the scaling strategy: +Once you select the compute resources, you can specify the [scaling strategy](/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies): - **Fixed scaling**: you can specify a fixed amount of replicas, starting at one. Please note that with one replica, you will not benefit from high-availability. -- **Auto-scaling**: you can specify a minimum and maximum amount of replicas, and a metric that will act as a trigger for scaling up or down (CPU or RAM usage). Each replica will benefit from the compute resource selected before. +- **Auto-scaling**: you can specify a minimum and maximum amount of replicas, and a metric that will act as a trigger for scaling up or down (CPU, RAM usage or a custom metric). Each replica will benefit from the compute resource selected before. 
### Storage details diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.en-gb.md b/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.en-gb.md index 75d7dad942c..d68c73654e4 100644 --- a/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.en-gb.md +++ b/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.en-gb.md @@ -1,7 +1,7 @@ --- title: AI Deploy - Billing and lifecycle excerpt: Learn how we bill AI Deploy -updated: 2025-02-18 +updated: 2026-02-17 --- > [!primary] @@ -24,16 +24,17 @@ AI Deploy is linked to a Public Cloud project. The whole project is billed at th OVHcloud AI Deploy allows deployment of Docker images, and each deployment is called an `app`. During its lifetime, the app will go through the following status: -- `QUEUED`: the app deployment request is about to be processed. -- `INITIALIZING`: the app is being started and, if any, the remote data is synchronized from the Object Storage. To learn more about data synchronization, please check out the [Data - Concept and best practices](/pages/public_cloud/ai_machine_learning/gi_02_concepts_data#how-it-works) documentation. +- `QUEUED`: The app deployment request is about to be processed. +- `INITIALIZING`: The app is being started and, if any, the remote data is synchronized from the Object Storage. To learn more about data synchronization, please check out the [Data - Concept and best practices](/pages/public_cloud/ai_machine_learning/gi_02_concepts_data#how-it-works) documentation. - `SCALING`: First, the system allocates the necessary compute resources (CPU/GPU) for the app. Then, the specified Docker image is pulled for use in the app. This status is also entered when the number of app replicas is being increased or decreased. - `RUNNING`: At least one replica of the app is available and accessible via its endpoint. 
As the app scales up to create new replicas, the status transitions back to `SCALING`. However, there is no interruption in service, and the original replica(s) remain accessible during this time. -- `STOPPING`: the app is stopping, your compute resources are freed. Ephemeral data is deleted. -- `STOPPED`: the app ended normally. You can restart it whenever you want or delete it. It will keep the same endpoint. -- `FAILED`: the app ended in error, e.g. the Docker image is invalid (unreachable, built with linux/arm, ...). -- `ERROR`: the app ended due to a backend error (issue on OVHcloud side). You may reach our support. -- `DELETING`: the app is being removed. When it is deleted, you will no longer see it, it will no longer exist. -- `DELETED`: the app is fully deleted. +- `STANDBY`: The app has no running replicas but is ready to scale back up when traffic arrives. **This happens after a period with no incoming requests when scaling to 0 is enabled**. When traffic arrives, the app transitions from `STANDBY` to `RUNNING` through the `INITIALIZING` and `SCALING` states. +- `STOPPING`: The app is stopping, your compute resources are freed. Ephemeral data is deleted. +- `STOPPED`: The app ended normally. You can restart it whenever you want or delete it. It will keep the same endpoint. +- `FAILED`: The app ended in error, e.g. the Docker image is invalid (unreachable, built with linux/arm, ...). +- `ERROR`: The app ended due to a backend error (issue on OVHcloud side). You may [reach our support](/links/support-contact). +- `DELETING`: The app is being removed. When it is deleted, you will no longer see it, it will no longer exist. +- `DELETED`: The app is fully deleted. ![image](images/ai.deploy.lifecycle.png){.thumbnail} @@ -68,10 +69,10 @@ Their official pricing is available in the [OVHcloud Control Panel](/links/manag Rates for compute are mentioned per hour to facilitate reading of the prices, but the billing granularity remains **per minute**. 
-Once you select the compute resources, you can specify the scaling strategy: +Once you select the compute resources, you can specify the [scaling strategy](/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies): - **Fixed scaling**: you can specify a fixed amount of replicas, starting at one. Please note that with one replica, you will not benefit from high-availability. -- **Auto-scaling**: you can specify a minimum and maximum amount of replicas, and a metric that will act as a trigger for scaling up or down (CPU or RAM usage). Each replica will benefit from the compute resource selected before. +- **Auto-scaling**: you can specify a minimum and maximum amount of replicas, and a metric that will act as a trigger for scaling up or down (CPU, RAM usage or a custom metric). Each replica will benefit from the compute resource selected before. ### Storage details diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.en-ie.md b/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.en-ie.md index 9ca557768c7..1e9cdaf6cea 100644 --- a/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.en-ie.md +++ b/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.en-ie.md @@ -1,7 +1,7 @@ --- title: AI Deploy - Billing and lifecycle excerpt: Learn how we bill AI Deploy -updated: 2025-02-18 +updated: 2026-02-17 --- > [!primary] @@ -24,16 +24,17 @@ AI Deploy is linked to a Public Cloud project. The whole project is billed at th OVHcloud AI Deploy allows deployment of Docker images, and each deployment is called an `app`. During its lifetime, the app will go through the following status: -- `QUEUED`: the app deployment request is about to be processed. -- `INITIALIZING`: the app is being started and, if any, the remote data is synchronized from the Object Storage. 
To learn more about data synchronization, please check out the [Data - Concept and best practices](/pages/public_cloud/ai_machine_learning/gi_02_concepts_data#how-it-works) documentation. +- `QUEUED`: The app deployment request is about to be processed. +- `INITIALIZING`: The app is being started and, if any, the remote data is synchronized from the Object Storage. To learn more about data synchronization, please check out the [Data - Concept and best practices](/pages/public_cloud/ai_machine_learning/gi_02_concepts_data#how-it-works) documentation. - `SCALING`: First, the system allocates the necessary compute resources (CPU/GPU) for the app. Then, the specified Docker image is pulled for use in the app. This status is also entered when the number of app replicas is being increased or decreased. - `RUNNING`: At least one replica of the app is available and accessible via its endpoint. As the app scales up to create new replicas, the status transitions back to `SCALING`. However, there is no interruption in service, and the original replica(s) remain accessible during this time. -- `STOPPING`: the app is stopping, your compute resources are freed. Ephemeral data is deleted. -- `STOPPED`: the app ended normally. You can restart it whenever you want or delete it. It will keep the same endpoint. -- `FAILED`: the app ended in error, e.g. the Docker image is invalid (unreachable, built with linux/arm, ...). -- `ERROR`: the app ended due to a backend error (issue on OVHcloud side). You may reach our support. -- `DELETING`: the app is being removed. When it is deleted, you will no longer see it, it will no longer exist. -- `DELETED`: the app is fully deleted. +- `STANDBY`: The app has no running replicas but is ready to scale back up when traffic arrives. **This happens after a period with no incoming requests when scaling to 0 is enabled**. When traffic arrives, the app transitions from `STANDBY` to `RUNNING` through the `INITIALIZING` and `SCALING` states. 
+- `STOPPING`: The app is stopping, your compute resources are freed. Ephemeral data is deleted. +- `STOPPED`: The app ended normally. You can restart it whenever you want or delete it. It will keep the same endpoint. +- `FAILED`: The app ended in error, e.g. the Docker image is invalid (unreachable, built with linux/arm, ...). +- `ERROR`: The app ended due to a backend error (issue on OVHcloud side). You may [reach our support](/links/support-contact). +- `DELETING`: The app is being removed. When it is deleted, you will no longer see it, it will no longer exist. +- `DELETED`: The app is fully deleted. ![image](images/ai.deploy.lifecycle.png){.thumbnail} @@ -68,10 +69,10 @@ Their official pricing is available in the [OVHcloud Control Panel](/links/manag Rates for compute are mentioned per hour to facilitate reading of the prices, but the billing granularity remains **per minute**. -Once you select the compute resources, you can specify the scaling strategy: +Once you select the compute resources, you can specify the [scaling strategy](/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies): - **Fixed scaling**: you can specify a fixed amount of replicas, starting at one. Please note that with one replica, you will not benefit from high-availability. -- **Auto-scaling**: you can specify a minimum and maximum amount of replicas, and a metric that will act as a trigger for scaling up or down (CPU or RAM usage). Each replica will benefit from the compute resource selected before. +- **Auto-scaling**: you can specify a minimum and maximum amount of replicas, and a metric that will act as a trigger for scaling up or down (CPU, RAM usage or a custom metric). Each replica will benefit from the compute resource selected before. 
### Storage details diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.en-sg.md b/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.en-sg.md index 9ca557768c7..1e9cdaf6cea 100644 --- a/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.en-sg.md +++ b/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.en-sg.md @@ -1,7 +1,7 @@ --- title: AI Deploy - Billing and lifecycle excerpt: Learn how we bill AI Deploy -updated: 2025-02-18 +updated: 2026-02-17 --- > [!primary] @@ -24,16 +24,17 @@ AI Deploy is linked to a Public Cloud project. The whole project is billed at th OVHcloud AI Deploy allows deployment of Docker images, and each deployment is called an `app`. During its lifetime, the app will go through the following status: -- `QUEUED`: the app deployment request is about to be processed. -- `INITIALIZING`: the app is being started and, if any, the remote data is synchronized from the Object Storage. To learn more about data synchronization, please check out the [Data - Concept and best practices](/pages/public_cloud/ai_machine_learning/gi_02_concepts_data#how-it-works) documentation. +- `QUEUED`: The app deployment request is about to be processed. +- `INITIALIZING`: The app is being started and, if any, the remote data is synchronized from the Object Storage. To learn more about data synchronization, please check out the [Data - Concept and best practices](/pages/public_cloud/ai_machine_learning/gi_02_concepts_data#how-it-works) documentation. - `SCALING`: First, the system allocates the necessary compute resources (CPU/GPU) for the app. Then, the specified Docker image is pulled for use in the app. This status is also entered when the number of app replicas is being increased or decreased. - `RUNNING`: At least one replica of the app is available and accessible via its endpoint. 
As the app scales up to create new replicas, the status transitions back to `SCALING`. However, there is no interruption in service, and the original replica(s) remain accessible during this time. -- `STOPPING`: the app is stopping, your compute resources are freed. Ephemeral data is deleted. -- `STOPPED`: the app ended normally. You can restart it whenever you want or delete it. It will keep the same endpoint. -- `FAILED`: the app ended in error, e.g. the Docker image is invalid (unreachable, built with linux/arm, ...). -- `ERROR`: the app ended due to a backend error (issue on OVHcloud side). You may reach our support. -- `DELETING`: the app is being removed. When it is deleted, you will no longer see it, it will no longer exist. -- `DELETED`: the app is fully deleted. +- `STANDBY`: The app has no running replicas but is ready to scale back up when traffic arrives. **This happens after a period with no incoming requests when scaling to 0 is enabled**. When traffic arrives, the app transitions from `STANDBY` to `RUNNING` through the `INITIALIZING` and `SCALING` states. +- `STOPPING`: The app is stopping, your compute resources are freed. Ephemeral data is deleted. +- `STOPPED`: The app ended normally. You can restart it whenever you want or delete it. It will keep the same endpoint. +- `FAILED`: The app ended in error, e.g. the Docker image is invalid (unreachable, built with linux/arm, ...). +- `ERROR`: The app ended due to a backend error (issue on OVHcloud side). You may [reach our support](/links/support-contact). +- `DELETING`: The app is being removed. When it is deleted, you will no longer see it, it will no longer exist. +- `DELETED`: The app is fully deleted. ![image](images/ai.deploy.lifecycle.png){.thumbnail} @@ -68,10 +69,10 @@ Their official pricing is available in the [OVHcloud Control Panel](/links/manag Rates for compute are mentioned per hour to facilitate reading of the prices, but the billing granularity remains **per minute**. 
-Once you select the compute resources, you can specify the scaling strategy: +Once you select the compute resources, you can specify the [scaling strategy](/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies): - **Fixed scaling**: you can specify a fixed amount of replicas, starting at one. Please note that with one replica, you will not benefit from high-availability. -- **Auto-scaling**: you can specify a minimum and maximum amount of replicas, and a metric that will act as a trigger for scaling up or down (CPU or RAM usage). Each replica will benefit from the compute resource selected before. +- **Auto-scaling**: you can specify a minimum and maximum amount of replicas, and a metric that will act as a trigger for scaling up or down (CPU, RAM usage or a custom metric). Each replica will benefit from the compute resource selected before. ### Storage details diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.en-us.md b/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.en-us.md index 9ca557768c7..1e9cdaf6cea 100644 --- a/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.en-us.md +++ b/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.en-us.md @@ -1,7 +1,7 @@ --- title: AI Deploy - Billing and lifecycle excerpt: Learn how we bill AI Deploy -updated: 2025-02-18 +updated: 2026-02-17 --- > [!primary] @@ -24,16 +24,17 @@ AI Deploy is linked to a Public Cloud project. The whole project is billed at th OVHcloud AI Deploy allows deployment of Docker images, and each deployment is called an `app`. During its lifetime, the app will go through the following status: -- `QUEUED`: the app deployment request is about to be processed. -- `INITIALIZING`: the app is being started and, if any, the remote data is synchronized from the Object Storage. 
To learn more about data synchronization, please check out the [Data - Concept and best practices](/pages/public_cloud/ai_machine_learning/gi_02_concepts_data#how-it-works) documentation.
+- `QUEUED`: The app deployment request is about to be processed.
+- `INITIALIZING`: The app is being started and, if any, the remote data is synchronized from the Object Storage. To learn more about data synchronization, please check out the [Data - Concept and best practices](/pages/public_cloud/ai_machine_learning/gi_02_concepts_data#how-it-works) documentation.
 - `SCALING`: First, the system allocates the necessary compute resources (CPU/GPU) for the app. Then, the specified Docker image is pulled for use in the app. This status is also entered when the number of app replicas is being increased or decreased.
 - `RUNNING`: At least one replica of the app is available and accessible via its endpoint. As the app scales up to create new replicas, the status transitions back to `SCALING`. However, there is no interruption in service, and the original replica(s) remain accessible during this time.
-- `STOPPING`: the app is stopping, your compute resources are freed. Ephemeral data is deleted.
-- `STOPPED`: the app ended normally. You can restart it whenever you want or delete it. It will keep the same endpoint.
-- `FAILED`: the app ended in error, e.g. the Docker image is invalid (unreachable, built with linux/arm, ...).
-- `ERROR`: the app ended due to a backend error (issue on OVHcloud side). You may reach our support.
-- `DELETING`: the app is being removed. When it is deleted, you will no longer see it, it will no longer exist.
-- `DELETED`: the app is fully deleted.
+- `STANDBY`: The app has no running replicas but is ready to scale back up when traffic arrives. **This happens after a period with no incoming requests when scaling to 0 is enabled**. When traffic arrives, the app transitions from `STANDBY` to `RUNNING` through the `INITIALIZING` and `SCALING` states.
+- `STOPPING`: The app is stopping, your compute resources are freed. Ephemeral data is deleted.
+- `STOPPED`: The app ended normally. You can restart it whenever you want or delete it. It will keep the same endpoint.
+- `FAILED`: The app ended in error, e.g. the Docker image is invalid (unreachable, built with linux/arm, ...).
+- `ERROR`: The app ended due to a backend error (issue on OVHcloud side). You may [reach our support](/links/support-contact).
+- `DELETING`: The app is being removed. When it is deleted, you will no longer see it, it will no longer exist.
+- `DELETED`: The app is fully deleted.

 ![image](images/ai.deploy.lifecycle.png){.thumbnail}

@@ -68,10 +69,10 @@ Their official pricing is available in the [OVHcloud Control Panel](/links/manag

 Rates for compute are mentioned per hour to facilitate reading of the prices, but the billing granularity remains **per minute**.

-Once you select the compute resources, you can specify the scaling strategy:
+Once you select the compute resources, you can specify the [scaling strategy](/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies):

 - **Fixed scaling**: you can specify a fixed amount of replicas, starting at one. Please note that with one replica, you will not benefit from high-availability.
-- **Auto-scaling**: you can specify a minimum and maximum amount of replicas, and a metric that will act as a trigger for scaling up or down (CPU or RAM usage). Each replica will benefit from the compute resource selected before.
+- **Auto-scaling**: you can specify a minimum and maximum amount of replicas, and a metric that will act as a trigger for scaling up or down (CPU, RAM usage or a custom metric). Each replica will benefit from the compute resource selected before.

 ### Storage details

diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.es-es.md b/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.es-es.md
index 9ca557768c7..1e9cdaf6cea 100644
--- a/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.es-es.md
+++ b/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.es-es.md
@@ -1,7 +1,7 @@
 ---
 title: AI Deploy - Billing and lifecycle
 excerpt: Learn how we bill AI Deploy
-updated: 2025-02-18
+updated: 2026-02-17
 ---

 > [!primary]
@@ -24,16 +24,17 @@ AI Deploy is linked to a Public Cloud project. The whole project is billed at th

 OVHcloud AI Deploy allows deployment of Docker images, and each deployment is called an `app`. During its lifetime, the app will go through the following status:

-- `QUEUED`: the app deployment request is about to be processed.
-- `INITIALIZING`: the app is being started and, if any, the remote data is synchronized from the Object Storage. To learn more about data synchronization, please check out the [Data - Concept and best practices](/pages/public_cloud/ai_machine_learning/gi_02_concepts_data#how-it-works) documentation.
+- `QUEUED`: The app deployment request is about to be processed.
+- `INITIALIZING`: The app is being started and, if any, the remote data is synchronized from the Object Storage. To learn more about data synchronization, please check out the [Data - Concept and best practices](/pages/public_cloud/ai_machine_learning/gi_02_concepts_data#how-it-works) documentation.
 - `SCALING`: First, the system allocates the necessary compute resources (CPU/GPU) for the app. Then, the specified Docker image is pulled for use in the app. This status is also entered when the number of app replicas is being increased or decreased.
 - `RUNNING`: At least one replica of the app is available and accessible via its endpoint. As the app scales up to create new replicas, the status transitions back to `SCALING`. However, there is no interruption in service, and the original replica(s) remain accessible during this time.
-- `STOPPING`: the app is stopping, your compute resources are freed. Ephemeral data is deleted.
-- `STOPPED`: the app ended normally. You can restart it whenever you want or delete it. It will keep the same endpoint.
-- `FAILED`: the app ended in error, e.g. the Docker image is invalid (unreachable, built with linux/arm, ...).
-- `ERROR`: the app ended due to a backend error (issue on OVHcloud side). You may reach our support.
-- `DELETING`: the app is being removed. When it is deleted, you will no longer see it, it will no longer exist.
-- `DELETED`: the app is fully deleted.
+- `STANDBY`: The app has no running replicas but is ready to scale back up when traffic arrives. **This happens after a period with no incoming requests when scaling to 0 is enabled**. When traffic arrives, the app transitions from `STANDBY` to `RUNNING` through the `INITIALIZING` and `SCALING` states.
+- `STOPPING`: The app is stopping, your compute resources are freed. Ephemeral data is deleted.
+- `STOPPED`: The app ended normally. You can restart it whenever you want or delete it. It will keep the same endpoint.
+- `FAILED`: The app ended in error, e.g. the Docker image is invalid (unreachable, built with linux/arm, ...).
+- `ERROR`: The app ended due to a backend error (issue on OVHcloud side). You may [reach our support](/links/support-contact).
+- `DELETING`: The app is being removed. When it is deleted, you will no longer see it, it will no longer exist.
+- `DELETED`: The app is fully deleted.

 ![image](images/ai.deploy.lifecycle.png){.thumbnail}

@@ -68,10 +69,10 @@ Their official pricing is available in the [OVHcloud Control Panel](/links/manag

 Rates for compute are mentioned per hour to facilitate reading of the prices, but the billing granularity remains **per minute**.
-Once you select the compute resources, you can specify the scaling strategy:
+Once you select the compute resources, you can specify the [scaling strategy](/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies):

 - **Fixed scaling**: you can specify a fixed amount of replicas, starting at one. Please note that with one replica, you will not benefit from high-availability.
-- **Auto-scaling**: you can specify a minimum and maximum amount of replicas, and a metric that will act as a trigger for scaling up or down (CPU or RAM usage). Each replica will benefit from the compute resource selected before.
+- **Auto-scaling**: you can specify a minimum and maximum amount of replicas, and a metric that will act as a trigger for scaling up or down (CPU, RAM usage or a custom metric). Each replica will benefit from the compute resource selected before.

 ### Storage details

diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.es-us.md b/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.es-us.md
index 9ca557768c7..1e9cdaf6cea 100644
--- a/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.es-us.md
+++ b/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.es-us.md
@@ -1,7 +1,7 @@
 ---
 title: AI Deploy - Billing and lifecycle
 excerpt: Learn how we bill AI Deploy
-updated: 2025-02-18
+updated: 2026-02-17
 ---

 > [!primary]
@@ -24,16 +24,17 @@ AI Deploy is linked to a Public Cloud project. The whole project is billed at th

 OVHcloud AI Deploy allows deployment of Docker images, and each deployment is called an `app`. During its lifetime, the app will go through the following status:

-- `QUEUED`: the app deployment request is about to be processed.
-- `INITIALIZING`: the app is being started and, if any, the remote data is synchronized from the Object Storage. To learn more about data synchronization, please check out the [Data - Concept and best practices](/pages/public_cloud/ai_machine_learning/gi_02_concepts_data#how-it-works) documentation.
+- `QUEUED`: The app deployment request is about to be processed.
+- `INITIALIZING`: The app is being started and, if any, the remote data is synchronized from the Object Storage. To learn more about data synchronization, please check out the [Data - Concept and best practices](/pages/public_cloud/ai_machine_learning/gi_02_concepts_data#how-it-works) documentation.
 - `SCALING`: First, the system allocates the necessary compute resources (CPU/GPU) for the app. Then, the specified Docker image is pulled for use in the app. This status is also entered when the number of app replicas is being increased or decreased.
 - `RUNNING`: At least one replica of the app is available and accessible via its endpoint. As the app scales up to create new replicas, the status transitions back to `SCALING`. However, there is no interruption in service, and the original replica(s) remain accessible during this time.
-- `STOPPING`: the app is stopping, your compute resources are freed. Ephemeral data is deleted.
-- `STOPPED`: the app ended normally. You can restart it whenever you want or delete it. It will keep the same endpoint.
-- `FAILED`: the app ended in error, e.g. the Docker image is invalid (unreachable, built with linux/arm, ...).
-- `ERROR`: the app ended due to a backend error (issue on OVHcloud side). You may reach our support.
-- `DELETING`: the app is being removed. When it is deleted, you will no longer see it, it will no longer exist.
-- `DELETED`: the app is fully deleted.
+- `STANDBY`: The app has no running replicas but is ready to scale back up when traffic arrives. **This happens after a period with no incoming requests when scaling to 0 is enabled**. When traffic arrives, the app transitions from `STANDBY` to `RUNNING` through the `INITIALIZING` and `SCALING` states.
+- `STOPPING`: The app is stopping, your compute resources are freed. Ephemeral data is deleted.
+- `STOPPED`: The app ended normally. You can restart it whenever you want or delete it. It will keep the same endpoint.
+- `FAILED`: The app ended in error, e.g. the Docker image is invalid (unreachable, built with linux/arm, ...).
+- `ERROR`: The app ended due to a backend error (issue on OVHcloud side). You may [reach our support](/links/support-contact).
+- `DELETING`: The app is being removed. When it is deleted, you will no longer see it, it will no longer exist.
+- `DELETED`: The app is fully deleted.

 ![image](images/ai.deploy.lifecycle.png){.thumbnail}

@@ -68,10 +69,10 @@ Their official pricing is available in the [OVHcloud Control Panel](/links/manag

 Rates for compute are mentioned per hour to facilitate reading of the prices, but the billing granularity remains **per minute**.

-Once you select the compute resources, you can specify the scaling strategy:
+Once you select the compute resources, you can specify the [scaling strategy](/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies):

 - **Fixed scaling**: you can specify a fixed amount of replicas, starting at one. Please note that with one replica, you will not benefit from high-availability.
-- **Auto-scaling**: you can specify a minimum and maximum amount of replicas, and a metric that will act as a trigger for scaling up or down (CPU or RAM usage). Each replica will benefit from the compute resource selected before.
+- **Auto-scaling**: you can specify a minimum and maximum amount of replicas, and a metric that will act as a trigger for scaling up or down (CPU, RAM usage or a custom metric). Each replica will benefit from the compute resource selected before.

 ### Storage details

diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.fr-ca.md b/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.fr-ca.md
index 8594c199cf4..0b340d68737 100644
--- a/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.fr-ca.md
+++ b/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.fr-ca.md
@@ -1,7 +1,7 @@
 ---
 title: AI Deploy - Facturation et cycle de vie (EN)
 excerpt: Découvrez comment la solution AI Deploy est facturée
-updated: 2025-02-18
+updated: 2026-02-17
 ---

 > [!primary]
@@ -24,16 +24,17 @@ AI Deploy is linked to a Public Cloud project. The whole project is billed at th

 OVHcloud AI Deploy allows deployment of Docker images, and each deployment is called an `app`. During its lifetime, the app will go through the following status:

-- `QUEUED`: the app deployment request is about to be processed.
-- `INITIALIZING`: the app is being started and, if any, the remote data is synchronized from the Object Storage. To learn more about data synchronization, please check out the [Data - Concept and best practices](/pages/public_cloud/ai_machine_learning/gi_02_concepts_data#how-it-works) documentation.
+- `QUEUED`: The app deployment request is about to be processed.
+- `INITIALIZING`: The app is being started and, if any, the remote data is synchronized from the Object Storage. To learn more about data synchronization, please check out the [Data - Concept and best practices](/pages/public_cloud/ai_machine_learning/gi_02_concepts_data#how-it-works) documentation.
 - `SCALING`: First, the system allocates the necessary compute resources (CPU/GPU) for the app. Then, the specified Docker image is pulled for use in the app. This status is also entered when the number of app replicas is being increased or decreased.
 - `RUNNING`: At least one replica of the app is available and accessible via its endpoint. As the app scales up to create new replicas, the status transitions back to `SCALING`. However, there is no interruption in service, and the original replica(s) remain accessible during this time.
-- `STOPPING`: the app is stopping, your compute resources are freed. Ephemeral data is deleted.
-- `STOPPED`: the app ended normally. You can restart it whenever you want or delete it. It will keep the same endpoint.
-- `FAILED`: the app ended in error, e.g. the Docker image is invalid (unreachable, built with linux/arm, ...).
-- `ERROR`: the app ended due to a backend error (issue on OVHcloud side). You may reach our support.
-- `DELETING`: the app is being removed. When it is deleted, you will no longer see it, it will no longer exist.
-- `DELETED`: the app is fully deleted.
+- `STANDBY`: The app has no running replicas but is ready to scale back up when traffic arrives. **This happens after a period with no incoming requests when scaling to 0 is enabled**. When traffic arrives, the app transitions from `STANDBY` to `RUNNING` through the `INITIALIZING` and `SCALING` states.
+- `STOPPING`: The app is stopping, your compute resources are freed. Ephemeral data is deleted.
+- `STOPPED`: The app ended normally. You can restart it whenever you want or delete it. It will keep the same endpoint.
+- `FAILED`: The app ended in error, e.g. the Docker image is invalid (unreachable, built with linux/arm, ...).
+- `ERROR`: The app ended due to a backend error (issue on OVHcloud side). You may [reach our support](/links/support-contact).
+- `DELETING`: The app is being removed. When it is deleted, you will no longer see it, it will no longer exist.
+- `DELETED`: The app is fully deleted.

 ![image](images/ai.deploy.lifecycle.png){.thumbnail}

@@ -68,10 +69,10 @@ Their official pricing is available in the [OVHcloud Control Panel](/links/manag

 Rates for compute are mentioned per hour to facilitate reading of the prices, but the billing granularity remains **per minute**.
-Once you select the compute resources, you can specify the scaling strategy:
+Once you select the compute resources, you can specify the [scaling strategy](/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies):

 - **Fixed scaling**: you can specify a fixed amount of replicas, starting at one. Please note that with one replica, you will not benefit from high-availability.
-- **Auto-scaling**: you can specify a minimum and maximum amount of replicas, and a metric that will act as a trigger for scaling up or down (CPU or RAM usage). Each replica will benefit from the compute resource selected before.
+- **Auto-scaling**: you can specify a minimum and maximum amount of replicas, and a metric that will act as a trigger for scaling up or down (CPU, RAM usage or a custom metric). Each replica will benefit from the compute resource selected before.

 ### Storage details

diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.fr-fr.md b/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.fr-fr.md
index 8594c199cf4..0b340d68737 100644
--- a/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.fr-fr.md
+++ b/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.fr-fr.md
@@ -1,7 +1,7 @@
 ---
 title: AI Deploy - Facturation et cycle de vie (EN)
 excerpt: Découvrez comment la solution AI Deploy est facturée
-updated: 2025-02-18
+updated: 2026-02-17
 ---

 > [!primary]
@@ -24,16 +24,17 @@ AI Deploy is linked to a Public Cloud project. The whole project is billed at th

 OVHcloud AI Deploy allows deployment of Docker images, and each deployment is called an `app`. During its lifetime, the app will go through the following status:

-- `QUEUED`: the app deployment request is about to be processed.
-- `INITIALIZING`: the app is being started and, if any, the remote data is synchronized from the Object Storage. To learn more about data synchronization, please check out the [Data - Concept and best practices](/pages/public_cloud/ai_machine_learning/gi_02_concepts_data#how-it-works) documentation.
+- `QUEUED`: The app deployment request is about to be processed.
+- `INITIALIZING`: The app is being started and, if any, the remote data is synchronized from the Object Storage. To learn more about data synchronization, please check out the [Data - Concept and best practices](/pages/public_cloud/ai_machine_learning/gi_02_concepts_data#how-it-works) documentation.
 - `SCALING`: First, the system allocates the necessary compute resources (CPU/GPU) for the app. Then, the specified Docker image is pulled for use in the app. This status is also entered when the number of app replicas is being increased or decreased.
 - `RUNNING`: At least one replica of the app is available and accessible via its endpoint. As the app scales up to create new replicas, the status transitions back to `SCALING`. However, there is no interruption in service, and the original replica(s) remain accessible during this time.
-- `STOPPING`: the app is stopping, your compute resources are freed. Ephemeral data is deleted.
-- `STOPPED`: the app ended normally. You can restart it whenever you want or delete it. It will keep the same endpoint.
-- `FAILED`: the app ended in error, e.g. the Docker image is invalid (unreachable, built with linux/arm, ...).
-- `ERROR`: the app ended due to a backend error (issue on OVHcloud side). You may reach our support.
-- `DELETING`: the app is being removed. When it is deleted, you will no longer see it, it will no longer exist.
-- `DELETED`: the app is fully deleted.
+- `STANDBY`: The app has no running replicas but is ready to scale back up when traffic arrives. **This happens after a period with no incoming requests when scaling to 0 is enabled**. When traffic arrives, the app transitions from `STANDBY` to `RUNNING` through the `INITIALIZING` and `SCALING` states.
+- `STOPPING`: The app is stopping, your compute resources are freed. Ephemeral data is deleted.
+- `STOPPED`: The app ended normally. You can restart it whenever you want or delete it. It will keep the same endpoint.
+- `FAILED`: The app ended in error, e.g. the Docker image is invalid (unreachable, built with linux/arm, ...).
+- `ERROR`: The app ended due to a backend error (issue on OVHcloud side). You may [reach our support](/links/support-contact).
+- `DELETING`: The app is being removed. When it is deleted, you will no longer see it, it will no longer exist.
+- `DELETED`: The app is fully deleted.

 ![image](images/ai.deploy.lifecycle.png){.thumbnail}

@@ -68,10 +69,10 @@ Their official pricing is available in the [OVHcloud Control Panel](/links/manag

 Rates for compute are mentioned per hour to facilitate reading of the prices, but the billing granularity remains **per minute**.

-Once you select the compute resources, you can specify the scaling strategy:
+Once you select the compute resources, you can specify the [scaling strategy](/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies):

 - **Fixed scaling**: you can specify a fixed amount of replicas, starting at one. Please note that with one replica, you will not benefit from high-availability.
-- **Auto-scaling**: you can specify a minimum and maximum amount of replicas, and a metric that will act as a trigger for scaling up or down (CPU or RAM usage). Each replica will benefit from the compute resource selected before.
+- **Auto-scaling**: you can specify a minimum and maximum amount of replicas, and a metric that will act as a trigger for scaling up or down (CPU, RAM usage or a custom metric). Each replica will benefit from the compute resource selected before.

 ### Storage details

diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.it-it.md b/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.it-it.md
index 9ca557768c7..1e9cdaf6cea 100644
--- a/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.it-it.md
+++ b/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.it-it.md
@@ -1,7 +1,7 @@
 ---
 title: AI Deploy - Billing and lifecycle
 excerpt: Learn how we bill AI Deploy
-updated: 2025-02-18
+updated: 2026-02-17
 ---

 > [!primary]
@@ -24,16 +24,17 @@ AI Deploy is linked to a Public Cloud project. The whole project is billed at th

 OVHcloud AI Deploy allows deployment of Docker images, and each deployment is called an `app`. During its lifetime, the app will go through the following status:

-- `QUEUED`: the app deployment request is about to be processed.
-- `INITIALIZING`: the app is being started and, if any, the remote data is synchronized from the Object Storage. To learn more about data synchronization, please check out the [Data - Concept and best practices](/pages/public_cloud/ai_machine_learning/gi_02_concepts_data#how-it-works) documentation.
+- `QUEUED`: The app deployment request is about to be processed.
+- `INITIALIZING`: The app is being started and, if any, the remote data is synchronized from the Object Storage. To learn more about data synchronization, please check out the [Data - Concept and best practices](/pages/public_cloud/ai_machine_learning/gi_02_concepts_data#how-it-works) documentation.
 - `SCALING`: First, the system allocates the necessary compute resources (CPU/GPU) for the app. Then, the specified Docker image is pulled for use in the app. This status is also entered when the number of app replicas is being increased or decreased.
 - `RUNNING`: At least one replica of the app is available and accessible via its endpoint. As the app scales up to create new replicas, the status transitions back to `SCALING`. However, there is no interruption in service, and the original replica(s) remain accessible during this time.
-- `STOPPING`: the app is stopping, your compute resources are freed. Ephemeral data is deleted.
-- `STOPPED`: the app ended normally. You can restart it whenever you want or delete it. It will keep the same endpoint.
-- `FAILED`: the app ended in error, e.g. the Docker image is invalid (unreachable, built with linux/arm, ...).
-- `ERROR`: the app ended due to a backend error (issue on OVHcloud side). You may reach our support.
-- `DELETING`: the app is being removed. When it is deleted, you will no longer see it, it will no longer exist.
-- `DELETED`: the app is fully deleted.
+- `STANDBY`: The app has no running replicas but is ready to scale back up when traffic arrives. **This happens after a period with no incoming requests when scaling to 0 is enabled**. When traffic arrives, the app transitions from `STANDBY` to `RUNNING` through the `INITIALIZING` and `SCALING` states.
+- `STOPPING`: The app is stopping, your compute resources are freed. Ephemeral data is deleted.
+- `STOPPED`: The app ended normally. You can restart it whenever you want or delete it. It will keep the same endpoint.
+- `FAILED`: The app ended in error, e.g. the Docker image is invalid (unreachable, built with linux/arm, ...).
+- `ERROR`: The app ended due to a backend error (issue on OVHcloud side). You may [reach our support](/links/support-contact).
+- `DELETING`: The app is being removed. When it is deleted, you will no longer see it, it will no longer exist.
+- `DELETED`: The app is fully deleted.

 ![image](images/ai.deploy.lifecycle.png){.thumbnail}

@@ -68,10 +69,10 @@ Their official pricing is available in the [OVHcloud Control Panel](/links/manag

 Rates for compute are mentioned per hour to facilitate reading of the prices, but the billing granularity remains **per minute**.
-Once you select the compute resources, you can specify the scaling strategy:
+Once you select the compute resources, you can specify the [scaling strategy](/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies):

 - **Fixed scaling**: you can specify a fixed amount of replicas, starting at one. Please note that with one replica, you will not benefit from high-availability.
-- **Auto-scaling**: you can specify a minimum and maximum amount of replicas, and a metric that will act as a trigger for scaling up or down (CPU or RAM usage). Each replica will benefit from the compute resource selected before.
+- **Auto-scaling**: you can specify a minimum and maximum amount of replicas, and a metric that will act as a trigger for scaling up or down (CPU, RAM usage or a custom metric). Each replica will benefit from the compute resource selected before.

 ### Storage details

diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.pl-pl.md b/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.pl-pl.md
index 9ca557768c7..1e9cdaf6cea 100644
--- a/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.pl-pl.md
+++ b/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.pl-pl.md
@@ -1,7 +1,7 @@
 ---
 title: AI Deploy - Billing and lifecycle
 excerpt: Learn how we bill AI Deploy
-updated: 2025-02-18
+updated: 2026-02-17
 ---

 > [!primary]
@@ -24,16 +24,17 @@ AI Deploy is linked to a Public Cloud project. The whole project is billed at th

 OVHcloud AI Deploy allows deployment of Docker images, and each deployment is called an `app`. During its lifetime, the app will go through the following status:

-- `QUEUED`: the app deployment request is about to be processed.
-- `INITIALIZING`: the app is being started and, if any, the remote data is synchronized from the Object Storage. To learn more about data synchronization, please check out the [Data - Concept and best practices](/pages/public_cloud/ai_machine_learning/gi_02_concepts_data#how-it-works) documentation.
+- `QUEUED`: The app deployment request is about to be processed.
+- `INITIALIZING`: The app is being started and, if any, the remote data is synchronized from the Object Storage. To learn more about data synchronization, please check out the [Data - Concept and best practices](/pages/public_cloud/ai_machine_learning/gi_02_concepts_data#how-it-works) documentation.
 - `SCALING`: First, the system allocates the necessary compute resources (CPU/GPU) for the app. Then, the specified Docker image is pulled for use in the app. This status is also entered when the number of app replicas is being increased or decreased.
 - `RUNNING`: At least one replica of the app is available and accessible via its endpoint. As the app scales up to create new replicas, the status transitions back to `SCALING`. However, there is no interruption in service, and the original replica(s) remain accessible during this time.
-- `STOPPING`: the app is stopping, your compute resources are freed. Ephemeral data is deleted.
-- `STOPPED`: the app ended normally. You can restart it whenever you want or delete it. It will keep the same endpoint.
-- `FAILED`: the app ended in error, e.g. the Docker image is invalid (unreachable, built with linux/arm, ...).
-- `ERROR`: the app ended due to a backend error (issue on OVHcloud side). You may reach our support.
-- `DELETING`: the app is being removed. When it is deleted, you will no longer see it, it will no longer exist.
-- `DELETED`: the app is fully deleted.
+- `STANDBY`: The app has no running replicas but is ready to scale back up when traffic arrives. **This happens after a period with no incoming requests when scaling to 0 is enabled**. When traffic arrives, the app transitions from `STANDBY` to `RUNNING` through the `INITIALIZING` and `SCALING` states.
+- `STOPPING`: The app is stopping, your compute resources are freed. Ephemeral data is deleted.
+- `STOPPED`: The app ended normally. You can restart it whenever you want or delete it. It will keep the same endpoint.
+- `FAILED`: The app ended in error, e.g. the Docker image is invalid (unreachable, built with linux/arm, ...).
+- `ERROR`: The app ended due to a backend error (issue on OVHcloud side). You may [reach our support](/links/support-contact).
+- `DELETING`: The app is being removed. When it is deleted, you will no longer see it, it will no longer exist.
+- `DELETED`: The app is fully deleted.

 ![image](images/ai.deploy.lifecycle.png){.thumbnail}

@@ -68,10 +69,10 @@ Their official pricing is available in the [OVHcloud Control Panel](/links/manag

 Rates for compute are mentioned per hour to facilitate reading of the prices, but the billing granularity remains **per minute**.

-Once you select the compute resources, you can specify the scaling strategy:
+Once you select the compute resources, you can specify the [scaling strategy](/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies):

 - **Fixed scaling**: you can specify a fixed amount of replicas, starting at one. Please note that with one replica, you will not benefit from high-availability.
-- **Auto-scaling**: you can specify a minimum and maximum amount of replicas, and a metric that will act as a trigger for scaling up or down (CPU or RAM usage). Each replica will benefit from the compute resource selected before.
+- **Auto-scaling**: you can specify a minimum and maximum amount of replicas, and a metric that will act as a trigger for scaling up or down (CPU, RAM usage or a custom metric). Each replica will benefit from the compute resource selected before.

 ### Storage details

diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.pt-pt.md b/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.pt-pt.md
index 9ca557768c7..1e9cdaf6cea 100644
--- a/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.pt-pt.md
+++ b/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/guide.pt-pt.md
@@ -1,7 +1,7 @@
 ---
 title: AI Deploy - Billing and lifecycle
 excerpt: Learn how we bill AI Deploy
-updated: 2025-02-18
+updated: 2026-02-17
 ---

 > [!primary]
@@ -24,16 +24,17 @@ AI Deploy is linked to a Public Cloud project. The whole project is billed at th

 OVHcloud AI Deploy allows deployment of Docker images, and each deployment is called an `app`. During its lifetime, the app will go through the following status:

-- `QUEUED`: the app deployment request is about to be processed.
-- `INITIALIZING`: the app is being started and, if any, the remote data is synchronized from the Object Storage. To learn more about data synchronization, please check out the [Data - Concept and best practices](/pages/public_cloud/ai_machine_learning/gi_02_concepts_data#how-it-works) documentation.
+- `QUEUED`: The app deployment request is about to be processed.
+- `INITIALIZING`: The app is being started and, if any, the remote data is synchronized from the Object Storage. To learn more about data synchronization, please check out the [Data - Concept and best practices](/pages/public_cloud/ai_machine_learning/gi_02_concepts_data#how-it-works) documentation.
 - `SCALING`: First, the system allocates the necessary compute resources (CPU/GPU) for the app. Then, the specified Docker image is pulled for use in the app. This status is also entered when the number of app replicas is being increased or decreased.
 - `RUNNING`: At least one replica of the app is available and accessible via its endpoint. As the app scales up to create new replicas, the status transitions back to `SCALING`. However, there is no interruption in service, and the original replica(s) remain accessible during this time.
-- `STOPPING`: the app is stopping, your compute resources are freed. Ephemeral data is deleted.
-- `STOPPED`: the app ended normally. You can restart it whenever you want or delete it. It will keep the same endpoint.
-- `FAILED`: the app ended in error, e.g. the Docker image is invalid (unreachable, built with linux/arm, ...).
-- `ERROR`: the app ended due to a backend error (issue on OVHcloud side). You may reach our support.
-- `DELETING`: the app is being removed. When it is deleted, you will no longer see it, it will no longer exist.
-- `DELETED`: the app is fully deleted.
+- `STANDBY`: The app has no running replicas but is ready to scale back up when traffic arrives. **This happens after a period with no incoming requests when scaling to 0 is enabled**. When traffic arrives, the app transitions from `STANDBY` to `RUNNING` through the `INITIALIZING` and `SCALING` states.
+- `STOPPING`: The app is stopping, your compute resources are freed. Ephemeral data is deleted.
+- `STOPPED`: The app ended normally. You can restart it whenever you want or delete it. It will keep the same endpoint.
+- `FAILED`: The app ended in error, e.g. the Docker image is invalid (unreachable, built with linux/arm, ...).
+- `ERROR`: The app ended due to a backend error (issue on OVHcloud side). You may [reach our support](/links/support-contact).
+- `DELETING`: The app is being removed. When it is deleted, you will no longer see it, it will no longer exist.
+- `DELETED`: The app is fully deleted.

 ![image](images/ai.deploy.lifecycle.png){.thumbnail}

@@ -68,10 +69,10 @@ Their official pricing is available in the [OVHcloud Control Panel](/links/manag

 Rates for compute are mentioned per hour to facilitate reading of the prices, but the billing granularity remains **per minute**.
-Once you select the compute resources, you can specify the scaling strategy:
+Once you select the compute resources, you can specify the [scaling strategy](/pages/public_cloud/ai_machine_learning/deploy_guide_04_scaling_strategies):

 - **Fixed scaling**: you can specify a fixed amount of replicas, starting at one. Please note that with one replica, you will not benefit from high-availability.
-- **Auto-scaling**: you can specify a minimum and maximum amount of replicas, and a metric that will act as a trigger for scaling up or down (CPU or RAM usage). Each replica will benefit from the compute resource selected before.
+- **Auto-scaling**: you can specify a minimum and maximum amount of replicas, and a metric that will act as a trigger for scaling up or down (CPU, RAM usage or a custom metric). Each replica will benefit from the compute resource selected before.

 ### Storage details

diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/images/ai.deploy.billing.png b/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/images/ai.deploy.billing.png
index 3b45765eed9..15cd6376654 100644
Binary files a/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/images/ai.deploy.billing.png and b/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/images/ai.deploy.billing.png differ
diff --git a/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/images/ai.deploy.lifecycle.png b/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/images/ai.deploy.lifecycle.png
index f1379a6fc61..b19c03d8da8 100644
Binary files a/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/images/ai.deploy.lifecycle.png and b/pages/public_cloud/ai_machine_learning/deploy_guide_06_billing_concept/images/ai.deploy.lifecycle.png differ