@ELin2025 ELin2025 commented Feb 6, 2026

What changes were proposed in this PR?

This change relates to the addition of a ternary contour plot operator, which visualizes how a scalar value varies as a function of three normalized components that sum to a constant (typically 1 or 100%).

In a ternary contour plot:

  • Each vertex of the triangular plot represents 100% of one component.
  • Any interior point represents a mixture of the three components.
  • Contour lines or color gradients indicate equal values of the measured quantity across different mixtures.

This visualization is useful for identifying regions where the output is optimized or insensitive to changes in the component proportions, as well as for understanding trade-offs between the three variables.

The operator takes four inputs. The first three are the components, and the fourth is the output value measured for that mixture of the three components.
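
As a rough illustration of the geometry described above, the sketch below maps a three-component mixture onto 2-D coordinates inside an equilateral triangle. The function name `ternary_to_cartesian` and the vertex layout (A at the bottom left, B at the bottom right, C at the top) are assumptions made for this example, not code from the operator.

```python
import math


def ternary_to_cartesian(a, b, c):
    """Map a mixture (a, b, c) to (x, y) inside an equilateral triangle.

    Assumed layout (not taken from the PR): component A sits at (0, 0),
    B at (1, 0), and C at (0.5, sqrt(3) / 2).
    """
    total = a + b + c
    if total <= 0:
        raise ValueError("components must sum to a positive constant")
    # Normalize so the sketch accepts fractions (sum 1) or percentages (sum 100).
    a, b, c = a / total, b / total, c / total
    x = b + 0.5 * c
    y = (math.sqrt(3) / 2) * c
    return x, y
```

Each vertex corresponds to 100% of one component, so `ternary_to_cartesian(0, 100, 0)` lands on the B vertex and an equal mixture lands on the centroid. The fourth input would then be contoured over these `(x, y)` points.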

Any related issues, documentation, discussions?

Requires the Python library scikit-image, which can be installed with: pip install scikit-image

How was this PR tested?

Tested with existing test cases

Was this PR authored or co-authored using generative AI tooling?

No

github-actions bot added the frontend and common labels on Feb 6, 2026
carloea2 and others added 5 commits February 8, 2026 23:54
…thon based operators (apache#4189)

### What changes were proposed in this PR?
This PR introduces a PythonTemplateBuilder mechanism for creating
Texera’s Python native operators. It refactors how Python code is
generated around a new template concept, addressing prior issues with
string formatting. Previously, Python-based operators were created via
raw string formatting, which is fragile: user text can contain `{}`,
`%`, quotes, or newlines that break formatting. This PR makes codegen
deterministic and safer by treating interpolated values as data
segments.
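
To make the fragility concrete, here is a minimal Python sketch of the two approaches. It is not the PR's implementation: `render_raw`, `render_encoded`, `compiles`, and the `decode(...)` call emitted into the generated source are hypothetical names chosen for illustration.

```python
import base64


def render_raw(column: str) -> str:
    # Fragile: user text is pasted straight into the generated source.
    return 'y_train = self.dataset["%s"]' % column


def render_encoded(column: str) -> str:
    # User text travels as base64 data, whose alphabet contains no quotes,
    # braces, percent signs, or newlines. `decode` is a hypothetical helper
    # that the generated code would call at runtime.
    payload = base64.b64encode(column.encode("utf-8")).decode("ascii")
    return f"y_train = self.dataset[decode('{payload}')]"


def compiles(source: str) -> bool:
    # Syntax check only; the generated code is never executed here.
    try:
        compile(source, "<generated>", "exec")
        return True
    except SyntaxError:
        return False
```

A single stray quote in the column name is enough to show the difference:

```python
compiles(render_raw('price"'))      # False
compiles(render_encoded('price"'))  # True
```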

#### Design
**Diagram 1** (compile-time `pyb` expansion and validation)

This diagram describes the Scala compile-time flow when a developer
writes a `pyb"..."` template: the `pyb` macro receives the literal parts
and argument trees, verifies that literal segments are safe, classifies
each interpolated argument (plain text vs. encodable vs. nested
builder), and applies boundary validation to ensure encodable content
cannot “break out” of its intended Python context. Each argument is
evaluated once, runtime guards are injected when a nested builder is
spliced in, and the pieces are concatenated into a
`PythonTemplateBuilder`, which compacts adjacent text chunks and renders
an `encode()` output where encodable values become decode-at-runtime
segments before the generated Python is embedded into the operator
payload.


```mermaid
sequenceDiagram
    participant Dev as Scala code
    participant SC as StringContext
    participant M as pyb macro
    participant EI as EncodableInspector
    participant BV as BoundaryValidator
    participant PTB as PythonTemplateBuilder

    Dev->>SC: pyb"t0 $a0 t1 $a1 t2"
    SC->>M: parts + arg trees
    M->>M: verify literal parts
    M->>EI: classify args
    loop each direct encodable arg
        M->>BV: validateCompileTime(left,right,prefixLine)
        BV-->>M: ok / abort
    end
    M->>M: eval each arg once into __pyb_argN
    loop each nested builder arg
        M->>BV: runtimeChecksForNestedBuilder(ctx,__pyb_argN)
        BV-->>M: injected guard if unsafe
    end
    M->>PTB: concat parts + __pyb_argN
    PTB-->>Dev: returns PythonTemplateBuilder
    PTB->>PTB: compact adjacent Text chunks
    PTB->>PTB: render Encode (encodable -> decode(base64))
    PTB-->>Dev: encode() returns python source string
    Dev->>Dev: embed generated python into operator payload

``` 
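
The chunk handling at the end of Diagram 1 can be modeled in a few lines of Python. This is only a sketch of the idea, assuming two chunk kinds (`Text` and `Value`) and free functions `compact`, `encode`, and `plain`; the actual builder is Scala, and the emitted `decode('...')` call is a hypothetical stand-in for whatever the rendered source really contains.

```python
import base64
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Text:
    value: str  # literal template text written by the developer


@dataclass(frozen=True)
class Value:
    value: str  # user-controlled data, never treated as code


Chunk = Union[Text, Value]


def compact(chunks: List[Chunk]) -> List[Chunk]:
    """Merge neighbouring Text chunks; Value chunks are left untouched."""
    out: List[Chunk] = []
    for chunk in chunks:
        if isinstance(chunk, Text) and out and isinstance(out[-1], Text):
            out[-1] = Text(out[-1].value + chunk.value)
        else:
            out.append(chunk)
    return out


def encode(chunks: List[Chunk]) -> str:
    """Render source in which every Value becomes a decode-at-runtime call."""
    parts = []
    for chunk in compact(chunks):
        if isinstance(chunk, Text):
            parts.append(chunk.value)
        else:
            b64 = base64.b64encode(chunk.value.encode("utf-8")).decode("ascii")
            parts.append(f"decode('{b64}')")
    return "".join(parts)


def plain(chunks: List[Chunk]) -> str:
    """Render values verbatim, without encoding or added quotes."""
    return "".join(chunk.value for chunk in chunks)
```

Keeping values as separate chunks until render time is what lets the same template produce either an encoded form for execution or a plain form that is easier to assert against in unit tests.
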
**Diagram 2** (end-to-end runtime flow: UI → descriptor → worker
decoding with cache)

This diagram illustrates the end-to-end pipeline from UI input to
execution: the UI submits parameters (including user-controlled strings)
to the Scala descriptor, where `pyb` expansion and
`PythonTemplateBuilder` assembly produce a deterministic Python source
string in “encode mode.” The encoded Python is embedded into the
workflow plan payload, dispatched by the workflow service to the Python
worker, and executed by the operator; during execution, the operator
uses `PythonTemplateDecoder` to recover user text by decoding each
encoded segment. An LRU cache (size 256) backs the decoder so repeated
encoded strings decode once and subsequently reuse cached UTF-8 strings,
reducing overhead while preserving strict decoding semantics.

```mermaid
sequenceDiagram
    autonumber
    participant UI as UI Web
    participant DESC as Descriptor (Scala)
    participant MAC as pyb macro (compile time)
    participant PTB as PythonTemplateBuilder
    participant PLAN as Plan payload
    participant SVC as Workflow service
    participant WK as Python worker
    participant OP as Python Operator
    participant DEC as PythonTemplateDecoder
    participant CACHE as lru_cache 256

    note over DESC,PTB: PyB related (Scala compile time codegen)
    UI->>DESC: submit params + code strings
    DESC->>MAC: pyb interpolation expands
    MAC-->>DESC: expanded builder + validation logic
    DESC->>PTB: assemble chunks (Text + Value)
    PTB-->>DESC: rendered python source (encode mode)

    note over DESC,WK: Plan + dispatch
    DESC->>PLAN: embed python source into payload
    PLAN->>SVC: submit workflow plan
    SVC->>WK: dispatch operator payload

    note over WK,DEC: Python runtime (worker executes generated source)
    WK->>OP: start operator with python source

    loop each encoded segment
        OP->>DEC: decode(base64)

        DEC->>CACHE: lookup(base64)
        alt cache hit
            CACHE-->>DEC: cached str
        else cache miss
            CACHE-->>DEC: miss
            DEC->>DEC: base64 decode + utf8 strict
            DEC->>CACHE: store(base64,str)
        end

        DEC-->>OP: recovered user text
    end

    OP-->>WK: execution continued
``` 
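
The decode loop in Diagram 2 can be sketched with Python's standard library alone. The function name `decode` is an assumption; only the cache size of 256, the base64 step, and the strict UTF-8 decoding come from the description above.

```python
import base64
from functools import lru_cache


@lru_cache(maxsize=256)
def decode(encoded: str) -> str:
    """Recover user text from one base64-encoded segment.

    Repeated segments are served from the LRU cache, so each distinct
    encoded string is decoded at most once while it stays cached.
    """
    # validate=True rejects characters outside the base64 alphabet
    # instead of silently discarding them.
    raw = base64.b64decode(encoded, validate=True)
    # errors="strict" raises UnicodeDecodeError on malformed UTF-8.
    return raw.decode("utf-8", errors="strict")
```

Because `lru_cache` keys on the argument, the cache-hit and cache-miss branches in the diagram need no explicit code; `decode.cache_info()` reports the hit and miss counts if they need to be inspected.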

**Diagram 3** (test harness: generate code, reject raw-invalid,
`py_compile`)

This diagram shows the automated verification path for Python native
operators: ScalaTest uses ClassGraph to discover every
`PythonOperatorDescriptor`, instantiates each descriptor, injects invalid
raw strings into class fields marked with `Json` properties, and calls
`generatePythonCode()` to produce the final Python source string. The
test asserts that no “RawInvalid” marker appears in the generated output
(indicating unsafe raw text did not leak), writes the source to a
temporary `source.py`, and runs `python -m py_compile` to ensure the
code is syntactically valid and compilable. Any raw-invalid leakage,
compile error, or timeout causes the test to fail, enforcing consistent
template-based code generation across operators.

```mermaid
sequenceDiagram
  autonumber
  participant TS as ScalaTest
  participant CG as ClassGraph scanner
  participant DESC as PythonOperatorDescriptor
  participant GEN as generatePythonCode
  participant SPEC as PythonCodeRawInvalidTextSpec
  participant PY as python -m py_compile
  participant FS as temp file (source.py)

  TS->>CG: scan descriptors in packages
  CG-->>TS: list of PythonOperatorDescriptor classes

  loop each descriptor class
    TS->>DESC: instantiate descriptor
    TS->>GEN: call generatePythonCode(descriptor)
    GEN-->>TS: python source string

    TS->>SPEC: assert RawInvalid marker not present
    alt marker leaked
      SPEC-->>TS: FAIL (invalid raw text leaked)
    else marker clean
      SPEC-->>TS: OK
      TS->>FS: write source to temp file
      TS->>PY: py_compile(temp file)
      alt compile error or timeout
        PY-->>TS: FAIL (compile/timeout)
      else compile ok
        PY-->>TS: PASS
      end
    end
  end
``` 
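
The per-descriptor check in Diagram 3 can be approximated in Python as follows. This is a sketch rather than the ScalaTest code: `check_generated_source`, the 30-second default timeout, and the returned status strings are assumptions, and the marker is taken to be the literal text `RawInvalid`.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Marker name from the description; the exact injected text is an assumption.
RAW_INVALID_MARKER = "RawInvalid"


def check_generated_source(source: str, timeout_s: float = 30.0) -> str:
    """Return "PASS" or a "FAIL: ..." reason for one generated source string."""
    if RAW_INVALID_MARKER in source:
        return "FAIL: invalid raw text leaked"
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "source.py"
        path.write_text(source, encoding="utf-8")
        try:
            # py_compile exits with a non-zero status when compilation fails.
            result = subprocess.run(
                [sys.executable, "-m", "py_compile", str(path)],
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return "FAIL: timeout"
    return "PASS" if result.returncode == 0 else "FAIL: compile error"
```

Checking the marker before compiling mirrors the order in the diagram: a leaked raw string fails fast without touching the file system.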

#### As a developer, how to use `pyb` to create your Python-based operators

1. **Use `EncodableString` for any UI/user-controlled text**

Before (raw `String`)
```scala
@JsonSchemaTitle("Ground Truth Attribute Column")
@AutofillAttributeName
var groundTruthAttribute: String = ""

@JsonSchemaTitle("Selected Features")
@AutofillAttributeNameList
var selectedFeatures: List[String] = _
```

After (`EncodableString`)
```scala
import org.apache.texera.amber.pybuilder.PyStringTypes.EncodableString

@JsonSchemaTitle("Ground Truth Attribute Column")
@AutofillAttributeName
var groundTruthAttribute: EncodableString = ""

@JsonSchemaTitle("Selected Features")
@AutofillAttributeNameList
var selectedFeatures: List[EncodableString] = _
```

---

2. **Write Python using `pyb"""..."""` and interpolate values with
`$param`**

Before (string interpolation with manual quoting)
```scala
val code =
  s"""
     |y_train = self.dataset[\"$groundTruthAttribute\"]
     |""".stripMargin
```

After (template + data: no manual quoting)
```scala
import org.apache.texera.amber.pybuilder.PythonTemplateBuilder.PythonTemplateBuilderStringContext

val code = pyb"""
  |y_train = self.dataset[$groundTruthAttribute]
  |""".encode //Automatic stripMargin applied inside the builder
```

---

3. **For optional arguments, represent them as small `pyb` fragments,
then put them in the code template**

Before (manual string concatenation + quote juggling)
```scala
val colorArg   = if (color.nonEmpty) s", color='$color'" else ""
val patternArg = if (pattern.nonEmpty) s", pattern_shape='$pattern'" else ""

val fig = s"fig = px.timeline(table, x_start='start', x_end='finish', y='task'$colorArg$patternArg)"
```

After (optional fragments are builders too)
```scala
val colorArg   = if (color.nonEmpty) pyb", color=$color" else pyb""
val patternArg = if (pattern.nonEmpty) pyb", pattern_shape=$pattern" else pyb""

val fig = pyb"""fig = px.timeline(table, x_start=$start, x_end=$finish, y=$task$colorArg$patternArg)"""
```

---

4. **Return `.encode` from `generatePythonCode()`**

Before (returns raw string)
```scala
override def generatePythonCode(): String = {
  val finalCode =
    s"""
       |from pytexera import *
       |y_train = self.dataset[\"$groundTruthAttribute\"]
       |""".stripMargin
  finalCode
}
```

After (returns encoded output from builder)
```scala
override def generatePythonCode(): String = {
  val finalCode = pyb"""
    |from pytexera import *
    |y_train = self.dataset[$groundTruthAttribute]
    |"""
  finalCode.encode
}
```

---

5. **Try to avoid the use of `s"..."`, `.format`, or `%` formatting for
Python codegen**

Before (`s` / `String.format` / `.format` patterns)
```scala
// s"..."
return s"""table[\"${ele.attribute}\"].values.shape[0]"""

// String.format / "{}" placeholders
workflowParam = workflowParam + String.format("%s = {},", ele.parameter.getName)
portParam = portParam + String.format("%s(table['%s'].values[i]),", ele.parameter.getType, ele.attribute)
```

After (`pyb` templates end-to-end)
```scala
return pyb"""table[${ele.attribute}].values.shape[0]"""

workflowParam = pyb"$workflowParam${ele.parameter.getName} = {},"
portParam = pyb"$portParam${ele.parameter.getType}(table[${ele.attribute}].values[i]),"
```

---

6. **Develop the unit tests in the new way**

Before (expects quoted literals like `'start'`)
```scala
assert(
  opDesc.createPlotlyFigure().plain.contains(
    "fig = px.timeline(table, x_start='start', x_end='finish', y='task' , color='color' )"
  )
)
```

After (expects template output using variables, no embedded quotes)
```scala
assert(
  opDesc.createPlotlyFigure().plain.contains(
    "fig = px.timeline(table, x_start=start, x_end=finish, y=task , color=color )"
  )
)
```

### Any related issues, documentation, discussions?
No

### How was this PR tested?
The PR includes a comprehensive set of tests to ensure the new
functionality works and that it doesn’t break existing workflows:

Unit Tests for PythonTemplateBuilder: New unit tests were added to
verify that PythonTemplateBuilder correctly classifies and encodes
segments. For example, tests likely feed in code strings with various
edge cases (braces, percentage signs, quotes, etc.) and assert that the
builder produces the expected spec output.

Unit Tests for PythonCodeRawInvalidTextSpec: Two new unit tests
instantiate each Python native operator, call its `generatePythonCode`
method, and check that the generated Python code compiles and that the
string format is consistent.

### Was this PR authored or co-authored using generative AI tooling?
Reviewed by ChatGPT 5.2
github-actions bot added the engine, dependencies, python, and ci labels on Feb 9, 2026