feat: add a new ternary contour plot operator #4193

ELin2025 · 2026-02-06T23:30:14Z

What changes were proposed in this PR?

This PR adds a ternary contour plot operator, which visualizes how a scalar value varies as a function of three normalized components that sum to a constant (typically 1 or 100%).

In a ternary contour plot:

- Each vertex of the triangular plot represents 100% of one component.
- Any interior point represents a mixture of the three components.
- Contour lines or color gradients connect mixtures that produce equal values of the measured quantity.
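To make the geometry concrete, here is a minimal sketch of how a mixture maps to a point inside the triangle using barycentric coordinates: the point is the weighted average of the three vertices, with weights equal to the normalized component fractions. The vertex layout (A bottom-left, B bottom-right, C on top) and the helper name `ternary_to_cartesian` are assumptions for illustration; plotting libraries may orient the axes differently.

```python
import math


def ternary_to_cartesian(a: float, b: float, c: float) -> tuple[float, float]:
    """Map a mixture (a, b, c) onto an equilateral triangle with unit side.

    Assumed vertex layout (hypothetical, for illustration only):
    A = (0, 0), B = (1, 0), C = (0.5, sqrt(3) / 2).
    """
    total = a + b + c
    if total <= 0:
        raise ValueError("components must sum to a positive value")
    # Normalize so the weights sum to 1 (works for fractions or percentages).
    a, b, c = a / total, b / total, c / total
    # Weighted average of the vertex coordinates; A contributes (0, 0).
    x = b * 1.0 + c * 0.5
    y = c * math.sqrt(3) / 2
    return x, y


if __name__ == "__main__":
    print(ternary_to_cartesian(100, 0, 0))  # pure A sits on vertex A
    print(ternary_to_cartesian(1, 1, 1))    # equal mixture sits at the centroid
```

A pure component lands exactly on its vertex, and an equal three-way mixture lands on the centroid, which matches the vertex and interior-point rules in the list above.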

This visualization helps identify regions where the output reaches an optimum or is insensitive to changes in the component proportions, and it makes the trade-offs among the three components visible.
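One way to picture the "optimal region" idea is to scan a regular grid of mixtures on the simplex and keep the best one. This is a hypothetical illustration, not part of the operator: `simplex_grid`, `best_mixture`, and `toy_response` are made-up names, and `toy_response` is an invented surface that peaks at the mixture (0.2, 0.3, 0.5).

```python
from itertools import product


def simplex_grid(steps: int):
    """Yield every mixture (a, b, c) with a + b + c == 1 on a grid of spacing 1/steps."""
    for i, j in product(range(steps + 1), repeat=2):
        k = steps - i - j
        if k >= 0:  # only keep points that lie inside the triangle
            yield i / steps, j / steps, k / steps


def best_mixture(response, steps: int = 20):
    """Return the grid mixture with the highest response value."""
    return max(simplex_grid(steps), key=lambda m: response(*m))


def toy_response(a: float, b: float, c: float) -> float:
    """Hypothetical response surface, used only for illustration."""
    return -((a - 0.2) ** 2 + (b - 0.3) ** 2 + (c - 0.5) ** 2)


if __name__ == "__main__":
    print(best_mixture(toy_response, steps=10))
```

A contour plot shows the same information visually: instead of reporting only the maximum, it shows how quickly the response falls off around it, which is what reveals flat (insensitive) regions.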

The operator takes four inputs: the first three are the component columns, and the fourth is the output value measured for that combination of component proportions.
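Because the three components are expected to sum to a constant, input rows usually need to be normalized before plotting. The sketch below assumes rows shaped as `(component_a, component_b, component_c, value)` and skips rows that cannot be placed on the triangle. Both the row shape and the skip policy are assumptions; this PR does not describe how the operator handles such rows.

```python
from typing import Iterable

# Hypothetical row shape: three component columns followed by the output value.
Row = tuple[float, float, float, float]


def normalize_rows(rows: Iterable[Row]) -> list[Row]:
    """Rescale the three component columns of each row so they sum to 1.

    Rows with a negative component or a non-positive component sum cannot be
    placed on the triangle, so they are skipped (an assumed policy).
    """
    normalized = []
    for a, b, c, value in rows:
        total = a + b + c
        if min(a, b, c) < 0 or total <= 0:
            continue
        normalized.append((a / total, b / total, c / total, value))
    return normalized


if __name__ == "__main__":
    print(normalize_rows([(50, 30, 20, 7.5), (0, 0, 0, 1.0)]))
```

Normalizing per row means the components may be supplied as fractions, percentages, or raw amounts, and all of them end up on the same 0 to 1 scale.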

Any related issues, documentation, discussions?

Requires the Python library scikit-image, which can be installed with `pip install scikit-image`.
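The package is installed as `scikit-image` but imported as `skimage`. A minimal availability check for the Python environment might look like this. The helper name `has_scikit_image` is hypothetical and is not part of this PR.

```python
import importlib.util


def has_scikit_image() -> bool:
    """Return True if scikit-image is importable in the current environment.

    The lookup uses the import name ``skimage``, not the pip name.
    """
    return importlib.util.find_spec("skimage") is not None


if __name__ == "__main__":
    if has_scikit_image():
        print("scikit-image is available")
    else:
        print("scikit-image is missing; install it with: pip install scikit-image")
```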

How was this PR tested?

Tested with existing test cases

Was this PR authored or co-authored using generative AI tooling?

No

…thon based operators (apache#4189)

### What changes were proposed in this PR?

This PR introduces a PythonTemplateBuilder mechanism for creating Texera’s Python native operators. It refactors how Python code is generated using a new template concept, addressing earlier problems with string formatting. Previously, Python-based operators were created through raw string formatting, which is fragile: user text can contain `{}`, `%`, quotes, or newlines that break the formatting. This PR makes codegen deterministic and safer by treating interpolated values as data segments.

#### Design

**Diagram 1** (compile-time `pyb` expansion and validation)

This diagram describes the Scala compile-time flow when a developer writes a `pyb"..."` template. The `pyb` macro receives the literal parts and argument trees, verifies that the literal segments are safe, classifies each interpolated argument (plain text, encodable, or nested builder), and applies boundary validation so that encodable content cannot “break out” of its intended Python context. Each argument is evaluated once, runtime guards are injected when a nested builder is spliced in, and the pieces are concatenated into a `PythonTemplateBuilder`. The builder compacts adjacent text chunks and renders an `encode()` output in which encodable values become decode-at-runtime segments; the generated Python is then embedded into the operator payload.

```mermaid
sequenceDiagram
    participant Dev as Scala code
    participant SC as StringContext
    participant M as pyb macro
    participant EI as EncodableInspector
    participant BV as BoundaryValidator
    participant PTB as PythonTemplateBuilder
    Dev->>SC: pyb"t0 $a0 t1 $a1 t2"
    SC->>M: parts + arg trees
    M->>M: verify literal parts
    M->>EI: classify args
    loop each direct encodable arg
        M->>BV: validateCompileTime(left,right,prefixLine)
        BV-->>M: ok / abort
    end
    M->>M: eval each arg once into __pyb_argN
    loop each nested builder arg
        M->>BV: runtimeChecksForNestedBuilder(ctx,__pyb_argN)
        BV-->>M: injected guard if unsafe
    end
    M->>PTB: concat parts + __pyb_argN
    PTB-->>Dev: returns PythonTemplateBuilder
    PTB->>PTB: compact adjacent Text chunks
    PTB->>PTB: render Encode (encodable -> decode(base64))
    PTB-->>Dev: encode() returns python source string
    Dev->>Dev: embed generated python into operator payload
```

**Diagram 2** (end-to-end runtime flow: UI → descriptor → worker decoding with cache)

This diagram illustrates the end-to-end pipeline from UI input to execution. The UI submits parameters (including user-controlled strings) to the Scala descriptor, where `pyb` expansion and `PythonTemplateBuilder` assembly produce a deterministic Python source string in “encode mode.” The encoded Python is embedded into the workflow plan payload, dispatched by the workflow service to the Python worker, and executed by the operator. During execution, the operator uses `PythonTemplateDecoder` to recover user text by decoding each encoded segment. An LRU cache (size 256) backs the decoder, so a repeated encoded string is decoded once and later lookups reuse the cached UTF-8 string, reducing overhead while preserving strict decoding semantics.

```mermaid
sequenceDiagram
    autonumber
    participant UI as UI Web
    participant DESC as Descriptor (Scala)
    participant MAC as pyb macro (compile time)
    participant PTB as PythonTemplateBuilder
    participant PLAN as Plan payload
    participant SVC as Workflow service
    participant WK as Python worker
    participant OP as Python Operator
    participant DEC as PythonTemplateDecoder
    participant CACHE as lru_cache 256
    note over DESC,PTB: PyB related (Scala compile time codegen)
    UI->>DESC: submit params + code strings
    DESC->>MAC: pyb interpolation expands
    MAC-->>DESC: expanded builder + validation logic
    DESC->>PTB: assemble chunks (Text + Value)
    PTB-->>DESC: rendered python source (encode mode)
    note over DESC,WK: Plan + dispatch
    DESC->>PLAN: embed python source into payload
    PLAN->>SVC: submit workflow plan
    SVC->>WK: dispatch operator payload
    note over WK,DEC: Python runtime (worker executes generated source)
    WK->>OP: start operator with python source
    loop each encoded segment
        OP->>DEC: decode(base64)
        DEC->>CACHE: lookup(base64)
        alt cache hit
            CACHE-->>DEC: cached str
        else cache miss
            CACHE-->>DEC: miss
            DEC->>DEC: base64 decode + utf8 strict
            DEC->>CACHE: store(base64,str)
        end
        DEC-->>OP: recovered user text
    end
    OP-->>WK: execution continued
```

**Diagram 3** (test harness: generate code, reject raw-invalid, `py_compile`)

This diagram shows the automated verification path for Python native operators. ScalaTest uses ClassGraph to discover every `PythonOperatorDescriptor`, instantiates each descriptor, injects invalid raw strings into class fields annotated with `Json` properties, and calls `generatePythonCode()` to produce the final Python source string. The test asserts that no “RawInvalid” marker appears in the generated output (indicating that unsafe raw text did not leak), writes the source to a temporary `source.py`, and runs `python -m py_compile` to ensure the code is syntactically valid and compilable.
Any raw-invalid leakage, compile error, or timeout causes the test to fail, which enforces consistent template-based code generation across operators.

```mermaid
sequenceDiagram
    autonumber
    participant TS as ScalaTest
    participant CG as ClassGraph scanner
    participant DESC as PythonOperatorDescriptor
    participant GEN as generatePythonCode
    participant SPEC as PythonCodeRawInvalidTextSpec
    participant PY as python -m py_compile
    participant FS as temp file (source.py)
    TS->>CG: scan descriptors in packages
    CG-->>TS: list of PythonOperatorDescriptor classes
    loop each descriptor class
        TS->>DESC: instantiate descriptor
        TS->>GEN: call generatePythonCode(descriptor)
        GEN-->>TS: python source string
        TS->>SPEC: assert RawInvalid marker not present
        alt marker leaked
            SPEC-->>TS: FAIL (invalid raw text leaked)
        else marker clean
            SPEC-->>TS: OK
            TS->>FS: write source to temp file
            TS->>PY: py_compile(temp file)
            alt compile error or timeout
                PY-->>TS: FAIL (compile/timeout)
            else compile ok
                PY-->>TS: PASS
            end
        end
    end
```

#### As a developer, how to use `pyb` to create your Python-based operators

1. **Use `EncodableString` for any UI/user-controlled text**

Before (raw `String`)

```scala
@JsonSchemaTitle("Ground Truth Attribute Column")
@AutofillAttributeName
var groundTruthAttribute: String = ""

@JsonSchemaTitle("Selected Features")
@AutofillAttributeNameList
var selectedFeatures: List[String] = _
```

After (`EncodableString`)

```scala
import org.apache.texera.amber.pybuilder.PyStringTypes.EncodableString

@JsonSchemaTitle("Ground Truth Attribute Column")
@AutofillAttributeName
var groundTruthAttribute: EncodableString = ""

@JsonSchemaTitle("Selected Features")
@AutofillAttributeNameList
var selectedFeatures: List[EncodableString] = _
```

---

2. **Write Python using `pyb"""..."""` and interpolate values with `$param`**

Before (string interpolation with manual quoting)

```scala
val code =
  s"""
     |y_train = self.dataset[\"$groundTruthAttribute\"]
     |""".stripMargin
```

After (template + data: no manual quoting)

```scala
import org.apache.texera.amber.pybuilder.PythonTemplateBuilder.PythonTemplateBuilderStringContext

val code =
  pyb"""
       |y_train = self.dataset[$groundTruthAttribute]
       |""".encode // stripMargin is applied automatically inside the builder
```

---

3. **For optional arguments, represent them as small `pyb` fragments, then put them in the code template**

Before (manual string concatenation + quote juggling)

```scala
val colorArg = if (color.nonEmpty) s", color='$color'" else ""
val patternArg = if (pattern.nonEmpty) s", pattern_shape='$pattern'" else ""
val fig = s"fig = px.timeline(table, x_start='start', x_end='finish', y='task'$colorArg$patternArg)"
```

After (optional fragments are builders too; an empty fragment is `pyb""`)

```scala
val colorArg = if (color.nonEmpty) pyb", color=$color" else pyb""
val patternArg = if (pattern.nonEmpty) pyb", pattern_shape=$pattern" else pyb""
val fig = pyb"""fig = px.timeline(table, x_start=$start, x_end=$finish, y=$task$colorArg$patternArg)"""
```

---

4. **Return `.encode` from `generatePythonCode()`**

Before (returns raw string)

```scala
override def generatePythonCode(): String = {
  val finalCode =
    s"""
       |from pytexera import *
       |y_train = self.dataset[\"$groundTruthAttribute\"]
       |""".stripMargin
  finalCode
}
```

After (returns encoded output from builder)

```scala
override def generatePythonCode(): String = {
  val finalCode =
    pyb"""
         |from pytexera import *
         |y_train = self.dataset[$groundTruthAttribute]
         |"""
  finalCode.encode
}
```

---

5. **Avoid `s"..."`, `.format`, and `%` formatting for Python codegen**

Before (`s` / `String.format` / `.format` patterns)

```scala
// s"..."
return s"""table[\"${ele.attribute}\"].values.shape[0]"""

// String.format / "{}" placeholders
workflowParam = workflowParam + String.format("%s = {},", ele.parameter.getName)
portParam = portParam + String.format("%s(table['%s'].values[i]),", ele.parameter.getType, ele.attribute)
```

After (`pyb` templates end-to-end)

```scala
return pyb"""table[${ele.attribute}].values.shape[0]"""

workflowParam = pyb"$workflowParam${ele.parameter.getName} = {},"
portParam = pyb"$portParam${ele.parameter.getType}(table[${ele.attribute}].values[i]),"
```

---

6. **Write unit tests in the new style**

Before (expects quoted literals like `'start'`)

```scala
assert(
  opDesc.createPlotlyFigure().plain.contains(
    "fig = px.timeline(table, x_start='start', x_end='finish', y='task' , color='color' )"
  )
)
```

After (expects template output using variables, no embedded quotes)

```scala
assert(
  opDesc.createPlotlyFigure().plain.contains(
    "fig = px.timeline(table, x_start=start, x_end=finish, y=task , color=color )"
  )
)
```

### Any related issues, documentation, discussions?

No

### How was this PR tested?

The PR includes a comprehensive set of tests to ensure that the new functionality works and does not break existing workflows:

- Unit tests for PythonTemplateBuilder: new unit tests verify that PythonTemplateBuilder correctly classifies and encodes segments. For example, tests likely feed in code strings with various edge cases (braces, percent signs, quotes, etc.) and assert that the builder produces the expected spec output.
- Unit tests for PythonCodeRawInvalidTextSpec: two new unit tests instantiate each Python native operator, call its `generatePythonCode` method, and check that the Python code compiles and the string format is consistent.

### Was this PR authored or co-authored using generative AI tooling?

Reviewed by ChatGPT 5.2
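The `pyb` pipeline described in apache#4189 is a Scala compile-time macro plus a Python-side decoder. The sketch below is a small Python analogue of the round trip, for illustration only. It assumes only what the diagrams state: encodable values are rendered as `decode(base64)` segments, decoding is strict UTF-8, and an LRU cache of size 256 backs the decoder. The names `decode`, `encode_segment`, and `render` are hypothetical; the real `PythonTemplateDecoder` API and the exact shape of the encoded segments may differ.

```python
import base64
from functools import lru_cache


@lru_cache(maxsize=256)
def decode(b64: str) -> str:
    """Recover user text from a base64 segment.

    validate=True rejects non-alphabet characters, and errors="strict" makes
    invalid UTF-8 raise instead of being silently replaced. The LRU cache means
    a repeated segment is decoded once and served from the cache afterwards.
    """
    return base64.b64decode(b64, validate=True).decode("utf-8", errors="strict")


def encode_segment(user_text: str) -> str:
    """Render user text as a decode("...") call instead of a raw string literal.

    The base64 alphabet (A-Z, a-z, 0-9, +, /, =) has no quotes, braces,
    percent signs, or newlines, so the segment cannot break out of the
    surrounding Python string literal.
    """
    b64 = base64.b64encode(user_text.encode("utf-8")).decode("ascii")
    return f'decode("{b64}")'


def render(parts: list[str], values: list[str]) -> str:
    """Interleave literal template parts with encoded user values.

    Mirrors a string interpolator: there is one more literal part than values.
    """
    if len(parts) != len(values) + 1:
        raise ValueError("expected exactly one more literal part than values")
    chunks = [parts[0]]
    for value, part in zip(values, parts[1:]):
        chunks.append(encode_segment(value))
        chunks.append(part)
    return "".join(chunks)


if __name__ == "__main__":
    column = 'odd "col" {0} %s\nname'  # hostile user text: quotes, braces, %, newline
    source = render(["y_train = dataset[", "]\n"], [column])
    print(source)  # the user text appears only in base64 form
    compile(source, "<generated>", "exec")  # syntax check, similar in spirit to py_compile
```

Because the user text never appears verbatim in the generated source, no combination of quotes, braces, or newlines in a column name can change the structure of the Python code; the cost is one decode call per segment, which the cache amortizes.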

ELin2025 added 2 commits February 6, 2026 13:09

added ternary contour op

a0a3a65

scala format fix

0b8451d

github-actions bot added the frontend and common labels Feb 6, 2026

carloea2 and others added 5 commits February 8, 2026 23:54

added ternary contour op

edf1988

scala format fix

85bbe02

reconfigured ternary contour op to most recent PR merge

71ae194

merge to most recent PR merge

1894729

github-actions bot added the engine, dependencies, python, and ci labels Feb 9, 2026

2 participants

feat: add a new ternary contour plot operator #4193

Are you sure you want to change the base?

feat: add a new ternary contour plot operator #4193

Uh oh!

Conversation

ELin2025 commented Feb 6, 2026

What changes were proposed in this PR?

Any related issues, documentation, discussions?

How was this PR tested?

Was this PR authored or co-authored using generative AI tooling?

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants