## Progress Indicators
diff --git a/workflows/concepts/jobs.mdx b/workflows/concepts/jobs.mdx
index b05e01c..4b337f3 100644
--- a/workflows/concepts/jobs.mdx
+++ b/workflows/concepts/jobs.mdx
@@ -176,13 +176,47 @@ job, err := client.Jobs.Get(ctx, uuid.MustParse("018dd029-58ca-74e5-8b58-b4f99d6
`find` is also a useful tool for fetching a job's state after a while, to check if it's still running or has already completed.
+In interactive environments such as Jupyter notebooks, the job object also provides a rich display of the job's state and progress, if it's used as the last expression in a cell.
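+
+For example, ending a notebook cell with the job object itself triggers this display:
+
+```python Python
+job = job_client.find("018dd029-58ca-74e5-8b58-b4f99d610f9a")
+job  # as the last expression in the cell, this renders the rich state and progress display
+```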
+
## States
-A job can be in one of the following states:
+Every Job is always in exactly one of the following states:
+
+| Job State | Description |
+|-----------|-------------|
+| **Submitted** | The job hasn't started yet: all its tasks are queued and it hasn't been canceled by the user. |
+| **Running** | At least one task of the job is currently running. |
+| **Started** | The job has started and some tasks are already `COMPUTED`, but others are still `QUEUED`, waiting for an [eligible task runner](/workflows/concepts/task-runners#task-selection) to pick them up. However, no task is currently `RUNNING`. |
+| **Completed** | The job has successfully completed. Every task of the job succeeded and is `COMPUTED`. |
+| **Failed** | At least one task of the job has failed, causing the execution of the remaining tasks to be halted. You can [retry](#retries) the job to resume execution from the point of failure. |
+| **Canceled** | The job was canceled upon user request. You can [retry](#retries) the job to resume execution from the point of cancellation. |
+
-- `QUEUED`: the job is queued and waiting for execution
-- `STARTED`: at least one task of the job has been started
-- `COMPLETED`: all tasks of the job have been completed
+
+ The state of a job is determined by the states of all its tasks. For a list of possible task states, see the [task state](/workflows/concepts/tasks#task-states) documentation.
+
+
+You can programmatically check the state of a job by inspecting its `state` field.
```python Python
@@ -190,17 +224,17 @@ from tilebox.workflows.data import JobState
job = job_client.find("018dd029-58ca-74e5-8b58-b4f99d610f9a")
-print("Job is queued:", job.state == JobState.QUEUED)
+print("Job is running:", job.state == JobState.RUNNING)
```
```go Go
job, err := client.Jobs.Get(ctx, uuid.MustParse("018dd029-58ca-74e5-8b58-b4f99d610f9a"))
-fmt.Println("Job is queued:", job.State == workflows.JobQueued)
+fmt.Println("Job is running:", job.State == workflows.JobRunning)
```
```plaintext Output
-Job is queued: True
+Job is running: True
```
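
For longer running jobs, this check lends itself to a simple polling loop. Below is a minimal sketch of that idea; the terminal `JobState` members used (`COMPLETED`, `FAILED`, `CANCELED`) are assumed to mirror the job states listed above, so verify the exact member names in `tilebox.workflows.data` before relying on them.

```python Python
import time

from tilebox.workflows.data import JobState

# assumption: JobState exposes COMPLETED, FAILED and CANCELED members
# corresponding to the terminal job states documented above
TERMINAL_STATES = {JobState.COMPLETED, JobState.FAILED, JobState.CANCELED}

job = job_client.find("018dd029-58ca-74e5-8b58-b4f99d610f9a")
while job.state not in TERMINAL_STATES:
    time.sleep(30)  # avoid polling the API too frequently
    job = job_client.find("018dd029-58ca-74e5-8b58-b4f99d610f9a")

print("Job finished in state:", job.state)
```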
## Visualization
@@ -223,22 +257,13 @@ job_client.display(job)
```
-The following diagram represents the job execution as a graph. Each task is shown as a node, with edges indicating sub-task relationships. The diagram also uses color coding to display the status of each task.
+The following diagram represents the job execution as a graph. Each task is shown as a node, with edges indicating sub-task relationships. The diagram also uses color coding to display the state of each task.
-The color codes for task states are:
-
-| Task State | Color | Description |
-|------------|-------|-------------|
-| Queued | SalmonYellow | The task is queued and waiting for execution. |
-| Running | Blue | The task is currently being executed. |
-| Computed | Green | The task has successfully been computed. If a task is computed, and all it's sub-tasks are also computed, the task is considered completed. |
-| Failed | Red | The task has been executed but encountered an error. |
-
Below is another visualization of a job currently being executed by multiple task runners.
@@ -246,9 +271,9 @@ Below is another visualization of a job currently being executed by multiple tas
-This visualization shows:
+From the diagram, the following can be inferred:
-- The root task, `MyTask`, has executed and spawned three sub-tasks.
+- The root task, `MyTask`, has been executed, is marked as `COMPUTED`, and has submitted three sub-tasks.
- At least three task runners are available, as three tasks are currently being executed simultaneously.
- The `SubTask` that is still executing has not generated any sub-tasks yet, as sub-tasks are queued for execution only after the parent task finishes and becomes computed.
- The queued `DependentTask` requires the `LeafTask` to complete before it can be executed.
diff --git a/workflows/concepts/tasks.mdx b/workflows/concepts/tasks.mdx
index 2efb8f9..d5697cf 100644
--- a/workflows/concepts/tasks.mdx
+++ b/workflows/concepts/tasks.mdx
@@ -354,18 +354,118 @@ In total, six tasks are executed: the `DownloadRandomDogImages` task and five `D
Check out [job_client.display](/workflows/concepts/jobs#visualization) to learn how this visualization was automatically generated from the task executions.
-Currently, a limit of `64` subtasks per task is in place to discourage creating workflows where individual parent tasks submit a large number of subtasks, which can lead to performance issues since those parent tasks are not parallelized. If you need to submit more than `64` subtasks, consider using [recursive subtask submission](#recursive-subtasks) instead.
+## Task States
-## Recursive subtasks
+Every task goes through a set of states during its lifetime. When submitted, either as a job or as a subtask, it starts in the `QUEUED` state and transitions to `RUNNING` when a task runner picks it up. If the task executes successfully, it transitions to `COMPUTED`. If the task fails, it transitions to `FAILED`.
+As soon as all subtasks of a task are also `COMPUTED`, the task is considered `COMPLETED`, allowing dependent tasks to be executed.
+The table below summarizes the different task states and their meanings.
+
+| Task State | Description |
+|------------|-------------|
+| **Queued** | The task is queued and waiting for execution. Any [eligible](/workflows/concepts/task-runners#task-selection) task runner can pick it up and execute it, as soon as its parent task is `COMPUTED` and all its dependencies are `COMPLETED`. |
+| **Running** | The task is currently being executed by a task runner. |
+| **Computed** | The task has successfully been computed, but still has outstanding subtasks. |
+| **Completed** | The task has successfully been computed, and all its subtasks are also computed, making it `COMPLETED`. This is the final state of a task. Only once a task is `COMPLETED` can dependent tasks be executed. |
+| **Failed** | The task has been executed but encountered an error. |
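+
+To make the distinction between `COMPUTED` and `COMPLETED` concrete, consider the following minimal sketch (the task names here are purely illustrative):
+
+```python Python
+from tilebox.workflows import Task, ExecutionContext
+
+class Parent(Task):
+    def execute(self, context: ExecutionContext) -> None:
+        context.submit_subtask(Child())
+        # once execute returns, Parent is COMPUTED, but it only becomes
+        # COMPLETED after Child has completed as well
+
+class Child(Task):
+    def execute(self, context: ExecutionContext) -> None:
+        print("doing the actual work")  # no subtasks, so Child is COMPLETED as soon as it is COMPUTED
+```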
+
+## Map-Reduce Pattern
+Oftentimes the input to a task is a list whose elements should be **mapped** to individual subtasks, with their results later aggregated in a **reduce** step. This pattern is commonly known as [MapReduce](https://en.wikipedia.org/wiki/MapReduce). In Tilebox, the reduce step is typically defined as a separate task that depends on all the map tasks.
+
+For example, the workflow below applies this pattern to a list of numbers to calculate the sum of their squares. The `Square` task takes a single number and squares it, and the `Sum` task reduces the list of squared numbers to a single sum.
+
+
+```python Python
+from tilebox.workflows import Task, ExecutionContext
+
+class SumOfSquares(Task):
+    numbers: list[int]
+
+    def execute(self, context: ExecutionContext) -> None:
+        # 1. Map
+        square_tasks = context.submit_subtasks(
+            [Square(num) for num in self.numbers]
+        )
+        # 2. Reduce
+        sum_task = context.submit_subtask(Sum(), depends_on=square_tasks)
+
+
+class Square(Task):  # The map step
+    num: int
+
+    def execute(self, context: ExecutionContext) -> None:
+        result = self.num ** 2
+        # typically the output of a task is a large dataset,
+        # so we save individual results into a shared cache
+        context.job_cache.group("squares")[str(self.num)] = str(result).encode()
+        context.current_task.display = f"Square({self.num})"
+
+class Sum(Task):  # The reduce step
+    def execute(self, context: ExecutionContext) -> None:
+        result = 0
+        # access our cached results from the map step
+        squares = context.job_cache.group("squares")
+        for key in squares:
+            result += int(squares[key].decode())
+
+        print("Sum of squares is:", result)
+```
+
-Tasks can submit other tasks as subtasks, allowing for complex workflows. Sometimes the input to a task is a list, with elements that can be **mapped** to individual subtasks, whose outputs are then aggregated in a **reduce** step. This pattern is commonly known as **MapReduce**.
+Submitting a job of the `SumOfSquares` task and running it with a task runner can be done as follows:
-Often times the initial map step—submitting the individual subtasks—might already be an expensive operation. Since this is executed within a single task, it's not parallelizable, which can bottleneck the entire workflow.
+
+```python Python
+from tilebox.workflows import Client
+from tilebox.workflows.cache import InMemoryCache
+
+client = Client()
+jobs = client.jobs()
+job = jobs.submit(
+ "sum-of-squares",
+ SumOfSquares([12, 345, 453, 21, 45, 98]),
+)
-Fortunately, Tilebox Workflows offers a solution through **recursive subtask submission**. A task can submit instances of itself as subtasks, enabling a recursive breakdown into smaller tasks.
+client.runner(tasks=[SumOfSquares, Square, Sum], cache=InMemoryCache()).run_all()
+jobs.display(job)
+```
+
+
+```plaintext Output
+Sum of squares is: 336448
+```
+
+## Recursive subtasks
+
+Tasks can not only submit other tasks as subtasks, but also instances of themselves. This allows for a recursive breakdown of a task into smaller chunks. Such recursive decomposition algorithms are referred to as [divide and conquer algorithms](https://en.wikipedia.org/wiki/Divide-and-conquer_algorithm).
For example, the `RecursiveTask` below is a valid task that submits smaller instances of itself as subtasks.
+When implementing a recursive task, it's important to define a base case that stops the recursion. Otherwise, the task will keep submitting subtasks indefinitely, resulting in an infinite loop.
+
```python Python
class RecursiveTask(Task):
@@ -373,6 +473,7 @@ For example, the `RecursiveTask` below is a valid task that submits smaller inst
def execute(self, context: ExecutionContext) -> None:
print(f"Executing RecursiveTask with num={self.num}")
+ # if num < 2, we reached the base case and stop recursion
if self.num >= 2:
context.submit_subtask(RecursiveTask(self.num // 2))
```
@@ -383,6 +484,7 @@ For example, the `RecursiveTask` below is a valid task that submits smaller inst
func (t *RecursiveTask) Execute(ctx context.Context) error {
slog.Info("Executing RecursiveTask", slog.Int("num", t.Num))
+ // if num < 2, we reached the base case and stop recursion
if t.Num >= 2 {
_, err := workflows.SubmitSubtask(ctx, &RecursiveTask{Num: t.Num / 2})
if err != nil {
diff --git a/workflows/progress.mdx b/workflows/progress.mdx
index 1b60277..e8b93b5 100644
--- a/workflows/progress.mdx
+++ b/workflows/progress.mdx
@@ -2,7 +2,6 @@
title: Progress
description: Add progress indicators to provide visibility into the execution of a job
icon: bars-progress
-tag: NEW
---
Tilebox supports user-defined progress indicators during the execution of a job. This can be useful to provide visibility into the execution and the expected duration of a job, especially for longer-running jobs.