feat(RFC): An expression language for format strings #113
mwiebe wants to merge 1 commit into OpenJobDescription:mainline
I'm curious how you'd compare the approach of adding new syntax to using external tools for features that OpenJD doesn't cover yet. A couple ways external tools could be integrated:
I'd think at a high level, the new syntax lets more expressive OpenJD templates stay portable, since we wouldn't need an external tool (like Python) to preprocess the job template. It would also let us build more features on top of OpenJD, like conditionally hiding job submitter UI elements. The cost is added complexity in OpenJD that alternative implementations need to support, both for parsing job templates and for interacting with them (e.g. building an OpenJD-based submitter). With agents getting better, the complexity cost is probably lower than it was a year ago. I'm also curious about your experience adding support to OpenCue. The new expression syntax seems like the right approach to me. I'm asking because I'm sure you've thought about the tradeoffs and I'd like to understand them more.
Force-pushed from ea6155b to 68cd4a9.
When we first made the spec, we also started the pattern in Deadline Cloud submitters because we weren't able to fit in enough features to represent every variation the submitters need. That pattern uses code to generate a template that is mostly a general template with parameters for the specific submission, but with some parts hardcoded. The result is that the template isn't a general reusable entity; it is just for the specific submission. When we can express everything in the template and have the submitter just set parameters, the result is available to read and re-apply to other scenes directly.
This approach adds constraints on the environment you're submitting a job template from that an extension to the spec does not. Think of job submissions from the desktop vs. from a browser, with and without a file system, etc. Expressions inside the template don't have to be understood in the submission context, but a preprocessor forces the submission context to be able to run the preprocessor commands.
Yes, I think this expression language hits the design tenets well.
The OpenCue support I wrote is a very incomplete prototype. It adds OpenJD support to the pyoutline library, which is for defining jobs to submit to OpenCue. I think I got the basic structure right, but it needs a lot of validation before it would be ready to use. The opposite, running an OpenJD job in OpenCue, is not something I've looked at yet.
Thanks!
Force-pushed from 54dc9cf to 9f231a3.
Force-pushed from 521b10b to 4f5d29a.
| `__not__(a: bool) -> bool` | `not a` logical NOT | |

The `and` and `or` operators are value-returning: they return one of their operands, not
necessarily a `bool`. Only `null` and `false` are considered falsy — unlike Python, values
This break from Python feels like it could be confusing since the rest of the syntax is basically Python (though the motivation for a nice fallback mechanism makes sense).
What do you think about either:
- using Python's semantics. People and LLMs understand them already
- or making `or`/`and` always return `bool` (so forcing a more explicit `X if X else Y` pattern)
I did this based on @epmog's feedback above (see #113 (comment)). This choice lets you express null-coalescing very easily, which in Python syntax is not possible without a longer expression.
Let's extend the example from that comment. So `Area` is a job parameter with `int?` type (assuming an extension to optional job parameters), while `Width` and `Height` are integers. The expression `Param.Area or (Param.Width * Param.Height)` says "use the area if provided, otherwise calculate from width and height." To use Python semantics, you have to say `Param.Width * Param.Height if Param.Area == None else Param.Area`, otherwise you run into trouble with the `Param.Area == 0` case.
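To make the `Param.Area == 0` case concrete, here is a minimal Python sketch contrasting the two behaviours. The helper names (`is_falsy`, `expr_or`, `area_proposed`, `area_python_or`) are hypothetical and not part of the RFC, `None` stands in for `null`, and lazy evaluation of the right operand is an assumption rather than something this thread confirms.

```python
from typing import Any, Callable, Optional


def is_falsy(value: Any) -> bool:
    """Proposed rule from the quoted RFC text: only null and false are falsy."""
    return value is None or value is False


def expr_or(left: Any, right: Callable[[], Any]) -> Any:
    """Value-returning `or`: return `left` unless it is falsy.

    `right` is a thunk so it is only evaluated when needed
    (short-circuiting is an assumption of this sketch).
    """
    return right() if is_falsy(left) else left


def area_proposed(area: Optional[int], width: int, height: int) -> int:
    # Models `Param.Area or (Param.Width * Param.Height)` under the proposed rules.
    return expr_or(area, lambda: width * height)


def area_python_or(area: Optional[int], width: int, height: int) -> int:
    # The same expression under plain Python truthiness, where 0 is falsy.
    return area or (width * height)
```

With `area=0, width=3, height=4`, the proposed semantics keep the explicit `0`, while Python's `or` falls through to `12`. Both agree when `area` is `None` or a non-zero integer.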
Have a read of https://numpy.org/neps/nep-0026-missing-data-summary.html if you want to dive deeper into a similar-ish topic! There the missing values are for calculation, here they are for type checking.
args: ["-p", "{{ work_dir }}"]
```

## Specification
Do we need to specify the order of operations? Or maybe just spec that it follows Python? Might be noted somewhere and I missed it.
Which order of operations do you mean? Your comment is on the top level specification header.
This is a set of 3 RFCs to propose an expression language to use in template format strings based on a subset of the Python language. The three proposals are:
RFC 5 - Expression Language
RFC 6 - Expression Function Library
RFC 7 - Extended Parameter Types
Signed-off-by: Mark <399551+mwiebe@users.noreply.github.com>
**Value Size Calculation:**

The size of a value is implementation-defined. Implementations should try to match the
actual memory usage of each value as closely as practical in their language and runtime.

**Memory Tracking During Evaluation:**

When evaluating a binary operation like `left + right`:
1. Evaluate `left` → add `size(left)` to current memory
If memory constraints are up to the implementation, should we punt the memory tracking details from the RFC to the model PR?
The sizes of the values are implementation-defined, but a conforming implementation still needs to track them. I'm not sure why we would want to exclude this from the spec. Maybe I can describe it as illustrative, with the key point being that any implementation needs to track memory in a reasonable way.
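As a rough illustration of what "tracking in a reasonable way" could look like, here is a small Python sketch. Only the first step (evaluate `left`, add `size(left)`) appears in the quoted hunk; the remaining steps, the `MemoryTracker` class, the `eval_add` helper, and the use of `sys.getsizeof` as the implementation-defined size are all assumptions of this sketch, not the RFC's text.

```python
import sys
from typing import Any, Callable


class MemoryLimitExceeded(Exception):
    """Raised when tracked memory goes over the configured limit."""


def size(value: Any) -> int:
    # Implementation-defined size; sys.getsizeof is one plausible choice in CPython.
    return sys.getsizeof(value)


class MemoryTracker:
    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.current = 0
        self.peak = 0

    def add(self, value: Any) -> None:
        # Account for a newly produced value and fail fast if over the limit.
        self.current += size(value)
        self.peak = max(self.peak, self.current)
        if self.current > self.limit:
            raise MemoryLimitExceeded(f"{self.current} bytes exceeds limit {self.limit}")

    def release(self, value: Any) -> None:
        # Account for a value that is no longer needed.
        self.current -= size(value)


def eval_add(left: Callable[[], Any], right: Callable[[], Any], tracker: MemoryTracker) -> Any:
    lhs = left()
    tracker.add(lhs)        # step 1 from the quoted hunk
    rhs = right()
    tracker.add(rhs)        # assumed: same accounting for the right operand
    result = lhs + rhs
    tracker.add(result)     # assumed: the result is counted while operands are still live
    tracker.release(lhs)    # assumed: operands are released once the result exists
    tracker.release(rhs)
    return result
```

After `eval_add` returns, only the result remains counted in `tracker.current`, while `tracker.peak` records the moment when both operands and the result were live.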
### Operation-Bounded Evaluation

Expression evaluation must operate within a bounded number of operations to prevent
unbounded computation from deeply nested or combinatorially explosive expressions.

Implementations accept an optional `operation_limit` parameter (default: 10 million
recommended). During evaluation, the evaluator maintains a running operation count.
If the count exceeds the limit at any point, evaluation fails with an error.
I have similar concerns here as with the memory counting idea. Counting operations adds complexity to the spec and constrains implementations, but it still doesn't give strong guarantees against runtime timeouts. For example, 10M regex operations might take an unreasonable amount of time.
I think the spec should be open to an implementation capping memory and runtime with a cgroup and a timeout which would offer firmer guarantees. Or run the job parser in a Lambda function with a timeout.
I see what I've proposed here as stronger and more portable than something based on cgroups and timeouts. There's a lot of operating system variation for that kind of mechanism, but what's specified here is quite simple to implement everywhere.
Re: the regexes, that's a fair point for the current Python implementation. Rust's regex doesn't use unbounded backtracking per https://docs.rs/regex/latest/regex/#untrusted-input, and I'm interested in getting a Rust implementation of this spec. I see no obstacles for either the memory or operation tracking in a Rust port.
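For readers weighing the two approaches in this thread, the operation-limit mechanism from the quoted hunk can be sketched in a few lines of Python. This is a toy evaluator for integer `+` and `*` only. What counts as one operation (here, one visited AST node) and the names `OperationCounter`, `OperationLimitExceeded`, and `evaluate` are assumptions of this sketch; only the `operation_limit` parameter, the 10 million default, and the "fail when the count exceeds the limit" rule come from the quoted text.

```python
import ast

DEFAULT_OPERATION_LIMIT = 10_000_000  # recommended default from the quoted hunk


class OperationLimitExceeded(Exception):
    """Raised when the running operation count exceeds the limit."""


class OperationCounter:
    def __init__(self, limit: int = DEFAULT_OPERATION_LIMIT) -> None:
        self.limit = limit
        self.count = 0

    def tick(self, n: int = 1) -> None:
        # Fail as soon as the count exceeds the limit, at any point in evaluation.
        self.count += n
        if self.count > self.limit:
            raise OperationLimitExceeded(f"exceeded {self.limit} operations")


def evaluate(source: str, operation_limit: int = DEFAULT_OPERATION_LIMIT) -> int:
    counter = OperationCounter(operation_limit)

    def visit(node: ast.AST) -> int:
        counter.tick()  # assumed cost model: one operation per visited node
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
            return visit(node.left) + visit(node.right)
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Mult):
            return visit(node.left) * visit(node.right)
        raise ValueError(f"unsupported expression node: {type(node).__name__}")

    return visit(ast.parse(source, mode="eval").body)
```

Under this cost model `1 + 2 * 3` visits five nodes, so it succeeds with `operation_limit=5` and fails with `operation_limit=4`. A real implementation would presumably charge more for expensive built-ins such as regex matching, which is the concern raised above.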
- Operations that would produce NaN (e.g., `0.0 / 0.0`) are errors.
- `float('inf')`, `float('nan')`, and `float('-inf')` are errors.

### 3. 64-bit Signed Integer Type
I think it'd help to add a bullet list summary in the RFC of divergences from Python's expressions. It'd be useful for checking design choices and as a list of things to double check in an implementation.
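One divergence visible in the hunk quoted above is the treatment of non-finite floats. A minimal Python sketch of that rule might look like the following. The names `ExpressionError`, `checked_float`, `parse_float`, and `divide` are hypothetical, and rejecting every non-finite result (not just the cases listed in the hunk) is an assumption of this sketch.

```python
import math


class ExpressionError(Exception):
    """Hypothetical error type for failed expression evaluation."""


def checked_float(value: float) -> float:
    # Reject NaN and +/- infinity instead of letting them propagate.
    if math.isnan(value) or math.isinf(value):
        raise ExpressionError(f"non-finite float result: {value!r}")
    return value


def parse_float(text: str) -> float:
    # Models `float('inf')`, `float('nan')`, and `float('-inf')` being errors.
    return checked_float(float(text))


def divide(a: float, b: float) -> float:
    # Python itself raises ZeroDivisionError for 0.0 / 0.0 rather than returning NaN;
    # either way the sketch surfaces it as an expression error.
    try:
        return checked_float(a / b)
    except ZeroDivisionError as exc:
        raise ExpressionError("division by zero") from exc
```

Plain Python accepts `float('inf')` and `float('nan')`, so this is one item that would belong on such a divergence list.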
PR for RFC
Tracking Issue: #112
This is a request for comments about the proposed RFC 5, 6, and 7.
This is a set of 3 RFCs to propose an expression language to use in template format strings based on a subset of the Python language. The three proposals are:
RFC 5 - Expression Language
RFC 6 - Expression Function Library
RFC 7 - Extended Parameter Types
I have a fully-functioning prototype implementation across these branches:
This branch contains edits to show how things look when using the extension:
This branch in my fork of OpenCue is an experiment implementing a pyoutline backend for OpenJD:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.