Implement xarray-like labeled tensors and semantics #1411
Merged
ricardoV94 merged 18 commits into main on Jun 21, 2025
Conversation
Codecov Report (patch coverage flagged ❌)
@@ Coverage Diff @@
## main #1411 +/- ##
==========================================
- Coverage 82.01% 81.99% -0.03%
==========================================
Files 214 231 +17
Lines 50439 52173 +1734
Branches 8907 9178 +271
==========================================
+ Hits 41370 42779 +1409
- Misses 6861 7088 +227
- Partials 2208 2306 +98
Member (Author)
If anybody wants to fix mypy, that's very welcome :)
twiecki reviewed on Jun 4, 2025
@@ -0,0 +1,219 @@
# HERE LIE DRAGONS
Member (Author)
@lucianopaz can we bring your pre-commit hook over?
Member (Author)
Can be a separate PR; wouldn't be surprised if we have files missing it in main.
Strategy
We implement xarray-like dummy Ops that respect / propagate dims semantics, and lower them to regular PyTensor graphs with rewrites.
Note that in the example above the dummy TensorFromXtensor and XTensorFromTensor Ops remain in the final graph. If we had instead created a function with Tensor inputs and outputs that are only converted (symbolically) to and from xtensor, respectively, the final graph would show no signs of the dimension operations, other than in how it was constructed.
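The example referenced above is not shown here, but a minimal sketch of the kind of graph being described might look like the following. The `xtensor` constructor and the ability to compile a function directly from xtensor inputs/outputs are assumptions about the API this PR introduces:

```python
import pytensor
from pytensor.xtensor import xtensor  # assumed import path for the labeled tensor constructor

# Symbolic labeled tensors: broadcasting and alignment are driven by dim names
x = xtensor("x", dims=("city", "year"))
y = xtensor("y", dims=("city",))

# Dim-aware elementwise addition; the result has dims ("city", "year")
out = x + y

# Compiling directly from xtensor inputs/outputs leaves the dummy conversion Ops
# (TensorFromXtensor / XTensorFromTensor) at the boundaries of the lowered graph
fn = pytensor.function([x, y], out)
pytensor.dprint(fn)
```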
I suggest registering those rewrites in an xtensor_lowering database.
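A rough sketch of what such a database could look like, built on the existing rewrite-database machinery; the name, tags, and position used here are placeholders, not necessarily what the PR registers:

```python
from pytensor.compile.mode import optdb
from pytensor.graph.rewriting.db import EquilibriumDB

# Dedicated database collecting all xtensor -> tensor lowering rewrites
xtensor_lowering = EquilibriumDB()

# Run it early so later tensor-level rewrites only ever see lowered graphs
# (the tags and position are guesses)
optdb.register("xtensor_lowering", xtensor_lowering, "fast_run", "fast_compile", position=0.1)


def register_xtensor_lowering(rewrite, name=None):
    """Convenience decorator so each lowering rewrite can self-register."""
    xtensor_lowering.register(name or rewrite.__name__, rewrite, "basic")
    return rewrite
```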
Coordinates
For now I'm playing with how far we can get without coordinates. This means the graphs produced by an xarray-like syntax are much more amenable to the numpy-like backend of PyTensor. Otherwise it involves a lot of Pandas-like stuff (e.g., MultiIndex) that we don't really have. It may be feasible, especially if nothing is symbolic, but I fear a rabbit hole of edge cases.
Gradients
These Ops are currently not differentiable, but one can lower the graph and then call the gradient. I do want to try the lazy grad approach from #788.
Help implementing more Ops so we have an MVP to try out with PyMC next. We need some Ops:
Open a PR on top of this branch and I'll try to merge quickly! Try to make it clean (one commit per Op, unless it's a factory of related Ops).
Implementing means:
3.1 The rewrites "box" the lower-level tensor operations between TensorFromXTensor and XTensorFromTensor calls, so that the replacements are valid in terms of types (see the sketch below). There are rewrites to remove chains of useless TensorFromXTensor/XTensorFromTensor that should clean up everything in the middle of the graph.
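A hypothetical sketch of that boxing pattern for an imaginary dims-aware `double(x)` Op. The conversion helpers are assumed to live in `pytensor.xtensor.basic`, mirroring the Ops named above; in practice the function would be wrapped with `node_rewriter` and registered in the lowering database:

```python
from pytensor.xtensor.basic import tensor_from_xtensor, xtensor_from_tensor  # assumed path


def lower_double(fgraph, node):
    """Lower an imaginary dims-aware `double(x)` Op to plain tensor operations."""
    [x] = node.inputs
    [old_out] = node.outputs

    # Unbox: strip the dims and work on the underlying tensor
    x_tensor = tensor_from_xtensor(x)
    out_tensor = 2 * x_tensor

    # Rebox: reattach the dims so the replacement matches the original XTensorType
    new_out = xtensor_from_tensor(out_tensor, dims=old_out.type.dims)
    return [new_out]
```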
Interplay between XTensorTypes and TensorTypes / weakly typed inputs
__add__ and the like, so you can do x + x)
Meta Ops
math.switch probably can't support drop=True)
time dim to the outputs, and perhaps use that to also align the sequences)
Math stuff
Shape stuff
Array creation stuff
self.x * 0, self.x * 0 + 1)? PyTensor will do the right thing when it gets lowered)
Indexing stuff
__getitem__ + isel (Implement indexing operations for XTensorVariables #1429)
__getitem__ + isel for boolean indices (should work fine, just need to test and lift the raise error)
It probably makes sense to convert the non-XTensor indices to XTensor indices if they can be rendered equivalent, to reduce the logic needed.
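For reference, a sketch of the xarray-style selection these items target; whether XTensorVariable exposes exactly these methods at this point is an assumption:

```python
from pytensor.xtensor import xtensor  # assumed import path

x = xtensor("x", dims=("time", "city"))

# Positional, dim-aware selection with xarray's isel semantics
a = x.isel(time=0)                # scalar index drops the "time" dim
b = x.isel(time=slice(None, 10))  # slice keeps a (length-10) "time" dim
c = x.isel({"city": [0, 2]})      # list indexers select along the named dim

# __getitem__ is meant to mirror this, e.g. x[0] indexing the leading "time" dim
```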
RandomVariables
This is quite important, as we'll need those for PyMC models! They are a mix of blockwise + a size argument (which may or may not be redundant).
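To make the "size may or may not be redundant" point concrete at today's tensor level (no xtensor involved), using the existing RandomVariable interface:

```python
import pytensor.tensor as pt

mu = pt.tensor("mu", shape=(3,))

# size omitted: the draw's shape follows the broadcast parameters -> (3,)
x1 = pt.random.normal(mu)

# explicit but redundant size: the same (3,) draw as above
x2 = pt.random.normal(mu, size=(3,))

# non-redundant size: independent replicates beyond the parameters' shape -> (10, 3)
x3 = pt.random.normal(mu, size=(10, 3))
```

A dims-aware wrapper would presumably let parameter dims and requested output dims play the roles that broadcasting and size play here.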
Graph transformations
📚 Documentation preview 📚: https://pytensor--1411.org.readthedocs.build/en/1411/