
Add utility functions for recentering data #23

Merged
JamesMcClung merged 22 commits into psc-code:main from JamesMcClung:recenter on Mar 24, 2025
Conversation

@JamesMcClung (Collaborator) commented Mar 20, 2025

Adds two new utility functions:

  • get_recentered: creates a new DataArray by recentering a given array along a given dimension
  • auto_recenter: automatically recenters variables in a given Dataset in-place according to given parameters

The intended workflow (for now) is for the user to call decode_psc and then, if desired, call auto_recenter. The get_recentered function might be useful when auto_recenter fails, and is thus also made available.

Both functions are also tested.

Optimization was not a consideration for this first implementation.
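
A minimal sketch of that workflow, assuming decode_psc takes and returns a Dataset, both functions are exposed at the package level, and that the file name, dimension name, and "periodic" boundary value are all illustrative:

    import xarray as xr

    import pscpy

    ds = xr.open_dataset("pfd.000000400.bp")  # hypothetical PSC output file
    ds = pscpy.decode_psc(ds)                 # decode as usual
    # then, if desired, recenter in-place ("periodic" stands in for whatever
    # BoundaryInterpMethod values actually exist)
    pscpy.auto_recenter(ds, to_centering="nc", x="periodic")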

@germasch (Contributor) left a comment

Looks good to me (as usual, without going through all of the details), except for some minor comments.

Not something for this PR, but more for the usual long to-do list: I suspect there's a problem with not even writing all of the actually needed data in the case of non-periodic boundary conditions. That is, in a situation with $n$ cells, certain quantities should have $n+1$ data points in certain dimensions because of these staggering issues. I know I've thought about this before, but I kinda doubt it's a solved issue... (I think that data may be available in some variation of the xdmf writer, where the arrays are written separately for each patch, including ghost points, but that of course doesn't help unless one is doing just that.)

Comment on lines +36 to +38
def _rename_var(ds: xr.Dataset, old_name: str, new_name: str) -> None:
ds[new_name] = ds[old_name].rename(new_name)
del ds[old_name]
@germasch (Contributor)

Why don't you use Dataset.rename()? I suppose that's what it's there for -- it does return a new Dataset, but I think that's actually not bad, since it kinda treats Datasets as immutable, so one doesn't have to worry about changing an existing object and possibly messing something up.

I suppose all of that xarray stuff is implemented efficiently, i.e., doesn't make copies and waste memory.
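
For reference, a minimal sketch of the rename-based alternative (the variable names here are illustrative):

    # Dataset.rename returns a new Dataset; the underlying arrays are shared, not copied.
    ds = ds.rename({"jx_ec": "jx_nc"})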

@JamesMcClung (Collaborator, Author)

I didn't use rename because I wanted to do it in-place. I personally just don't like the pattern of ds = ds.do_something() in Python, but I don't feel that strongly, and wouldn't mind changing the API to that.

However, I don't trust xarray to do anything efficiently anymore, and even if rename uses views, the recentering itself can't. Using the built-in rename would necessitate that all the changes happen to a copy of the original ds. At best, the original ds would then be garbage-collected, but that all still seems wasteful when these datasets can become so huge.

I'd actually like to make get_recentered work in-place, too, or at least to not make multiple copies (or views? who knows). You can always pass a copy to an in-place function, but you can't do something in-place with a function that makes a copy.

@germasch (Contributor)

I agree that ds = ds.do_something() isn't great; in particular, it's iffy if used in notebooks. But if used as ds_cc = ds.psc.recenter('cc'), I think it's a pattern that's more flexible, in that it allows keeping the original data around unchanged, and that again is something that's useful for not breaking other cells in a notebook.

This now actually reminds me of the marimo notebooks I mentioned before -- which IIRC does have restrictions on mutating objects since that makes it hard to automatically figure out a dependency graph.

It is foreign to me to not mutate objects, since that's what one usually does for efficiency reasons, rather than copying and then modifying. But it seems to be xarray's model to not mutate objects, and so I'm kinda inclined to give that approach the benefit of the doubt.

In any case, as far as this PR is concerned, it's fine as-is, it's not worth holding up progress for questions that can be resolved later -- it's not like this defines an interface we expect to be set in stone.
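
For context, xarray does support the ds.psc.recenter(...) style via registered accessors; a minimal sketch of that hypothetical pattern (none of this exists in pscpy yet):

    import xarray as xr

    @xr.register_dataset_accessor("psc")
    class PscAccessor:
        """Hypothetical accessor; recenter is just a stub."""

        def __init__(self, ds: xr.Dataset) -> None:
            self._ds = ds

        def recenter(self, to_centering: str) -> xr.Dataset:
            # would return a new, recentered Dataset, leaving self._ds unchanged,
            # used as e.g. ds_cc = ds.psc.recenter("cc")
            raise NotImplementedError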

Comment on lines +41 to +45
def auto_recenter(
ds: xr.Dataset,
to_centering: Literal["nc", "cc"],
**boundaries: BoundaryInterpMethod,
) -> None:
@germasch (Contributor)

I think my preference would be to just call this recenter() (also in that I think eventually this could become ds2 = ds.psc.recenter()). It's automatic, I guess, in that it uses knowledge about the existing staggering, but well, that's more or less an implementation detail.

@JamesMcClung (Collaborator, Author)

It's also automatic in the sense that it can fail to do the correct thing. It detects which variables need to be recentered based on their names and the passed dimension names (the keys of boundaries), but that's a heuristic. For example, it fails on the gauss data, since neither rho nor dive ends with _nc or _cc. It similarly fails on pfd_moments (although that could potentially be fixed by appending _nc or _cc to variable names during the decoding step, based on the all_1st_*c super-variable name).

If recenter existed, I would expect it to take a map of variable names to their current centerings instead of guessing. That actually seems like a good idea, and I'd be happy to make that so (and auto_recenter would just call recenter with the guessed map). There would be the question of whether/how to rename recentered variables, however.
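
To make the idea concrete, a hypothetical signature for that explicit variant (nothing here is implemented, and the example map is illustrative):

    def recenter(
        ds: xr.Dataset,
        centerings: dict[str, Literal["nc", "cc"]],  # e.g. {"rho": "nc", "dive": "nc"}
        to_centering: Literal["nc", "cc"],
        **boundaries: BoundaryInterpMethod,
    ) -> None:
        """Recenter exactly the variables listed in centerings, in-place.

        auto_recenter would build the map heuristically and delegate here.
        """
        ...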

@germasch (Contributor)

Again, not worth the argument at this point. In the longer run, it'd certainly be nice to store the centering information in the output directly, rather than relying on field names / heuristics, making this mostly moot.

Also, IMHO there's nothing wrong with having recenter() try to do the right thing based on heuristics for now, and later extending it to follow a map of centerings if one is passed.

@germasch (Contributor)

You will have to fix that CI error, though (it looks pretty trivial).

FWIW, if you run pre-commit install (IIRC), it'll install a git pre-commit hook that won't let you commit unless you fix issues like this first.

pre-commit run -a should help to reproduce the problem locally. (Though I recurrently have a heck of a time with something getting into an inconsistent state with pre-commit, where I still haven't figured out what the underlying problem is...)

@JamesMcClung (Collaborator, Author)

The CI error is weird. CI passes on my machine, and I didn't even change the line of code that's failing. I was going to check whether there's some discrepancy between my CI and this CI, or whether the CI somehow updated since the last PR (I'm not yet knowledgeable about how the CI works). I suppose for now I can just fix it manually.

@germasch (Contributor)

Looks like you're right that this error happens independently of your PR -- on the main branch I get locally:

    mypy.....................................................................Failed
    - hook id: mypy
    - exit code: 1

    src/pscpy/psc.py:35: error: Unused "type: ignore" comment  [unused-ignore]
    Found 1 error in 1 file (checked 3 source files)

Now, this would make more sense to me if there actually was a type: ignore comment there... 🤷

@germasch (Contributor)

I take the comment above back; I had local unsaved changes. So that comment is there, and it may now be unneeded, because maybe numpy got updated.

        return np.linspace(  # type: ignore[no-any-return]
            start=self.corner[coord_idx],
            stop=self.corner[coord_idx] + self.length[coord_idx],
            num=self.gdims[coord_idx],
            endpoint=False,
        )
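
The [unused-ignore] error comes from mypy's warn_unused_ignores setting; if newer numpy stubs now give np.linspace a precise return type, the fix is presumably just to drop the stale comment:

        return np.linspace(
            start=self.corner[coord_idx],
            stop=self.corner[coord_idx] + self.length[coord_idx],
            num=self.gdims[coord_idx],
            endpoint=False,
        )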

@codecov bot commented Mar 21, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

    Files with missing lines       Coverage  Δ
    src/pscpy/__init__.py          100.00%   <100.00%> (ø)
    src/pscpy/postprocessing.py    100.00%   <100.00%> (ø)
    src/pscpy/psc.py                98.07%   <100.00%> (ø)

@JamesMcClung (Collaborator, Author)

Thanks. That change still fails on my machine, though, and updating numpy didn't seem to help. How can I make it work?

Also, that 1 line of code not covered is pretty irrelevant and not worth testing; I think it can be ignored.

@germasch (Contributor)

Since somehow my virtual env had disappeared, I think all I did was

    uv venv
    source .venv/bin/activate
    uv pip install pre-commit
    pre-commit run -a

The numpy thing is just a guess, and in fact it's not all that easy to figure out what numpy was used by pre-commit since its mypy venv is hiding in some cache directory somewhere.

When I manually installed pscpy, I ended up with numpy==2.2.4, and when I manually ran mypy, I ended up with a bunch of complaints about test_postprocessing.py -- that might be because the rules for tests/ are relaxed in pyproject.toml, but it may not have picked that up. I also got complaints about adios2.

Running nox passed, though -- that's I think pretty close to running the actual CI.

@germasch (Contributor)

On the codecov -- it's not a big deal, but I wouldn't mind keeping it at 100%. Do you ever get something that's not a str? (I think strictly speaking, those keys are Hashable, so it's possible, but I'm not sure it's worth worrying about. They could be something else that's not a str but still understands .endswith(), in theory...)

In any case, I wouldn't mind just removing that continue, and if it ever triggers, then well, at least you'll have a test to add ;)
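
For reference, a sketch of the name-based heuristic in question (illustrative, not the actual implementation); str.endswith accepts a tuple, so both suffixes can be checked at once:

    for name in ds.data_vars:
        if not isinstance(name, str):
            continue  # the uncovered line: keys are Hashable, so non-str is possible
        if name.endswith(("_nc", "_cc")):
            ...  # candidate for recentering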

JamesMcClung added a commit: this seemed like the "correct" way of handling the coverage problem

@JamesMcClung (Collaborator, Author)

For future reference/posterity's sake: removing my ~/.cache/pre-commit/ directory resolved the CI discrepancy.

@germasch (Contributor)

Do you have permissions to merge this yourself?

@JamesMcClung merged commit a2f40bc into psc-code:main on Mar 24, 2025 (4 checks passed)
@JamesMcClung deleted the recenter branch on Mar 24, 2025, 20:17