Stepcompress cache#7191

Draft
nefelim4ag wants to merge 2 commits into Klipper3d:master from nefelim4ag:stepcompress-cache
Conversation

@nefelim4ag
Collaborator

@nefelim4ag nefelim4ag commented Feb 7, 2026

This is a derivative of my work around the step compression code.
I hope those changes make sense on their own.

The first change moves where we compute max_error; it produces exactly the same result as the current master code.
It is a "bit" faster, but I would say within the measurement error of ~0.9%.

I think this is because the current compressor works close to O(n) in practice, or stepcompress takes too little time relative to the batch computations.

The second change exploits that knowledge and simply removes the occasional increase of max_error.
This simplifies the logic - we no longer compute it during compression - and decreases speed jumps a bit.

The first commit makes any future changes slightly simpler, and makes more complicated repetitive computations cheaper.

If we worry about increased memory usage, it is possible to simply halve the maximum buffer size.
Due to itersolve quantization, in practice most "count" values will fit within the reduced maximum.
We could probably limit it to a specific value: https://www.klipper3d.org/Protocol.html#variable-length-quantities
Within the compressor: 65535 -> 12287.
Total step count in memory: halved from 65535 to 32768, to keep memory usage as before.
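The 12287 cutoff follows from the VLQ ranges in the linked protocol doc: one byte covers -32..95, and each additional byte widens the range by 7 bits. A small sketch of that arithmetic (the helper name `vlq_len` is mine, not from Klipper):

```python
def vlq_len(v):
    # Bytes needed for integer v under Klipper's VLQ scheme, per the
    # documented ranges: 1 byte covers -32..95, and every additional
    # byte widens the range by 7 bits.
    lo, hi, n = -32, 95, 1
    while not (lo <= v <= hi):
        lo <<= 7
        hi = (hi << 7) | 0x7f
        n += 1
    return n

print(vlq_len(12287))  # 2 -- largest count that still fits in two bytes
print(vlq_len(12288))  # 3
print(vlq_len(65535))  # 3
```

So capping "count" at 12287 keeps every queue_step count within a two-byte VLQ.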

Some step frequency data

So, if one decodes the serial data, the peak values for count per cmd are:

$ ./klippy/parsedump.py stepcompress/150mhz.dict stepcompress/150Mhz/master.serial > stepcompress/150Mhz/master.log
...
$ grep queue_step stepcompress/012Mhz/master.log | grep count= | cut -d' ' -f4 | sed 's/=/ /g' | sort -h -k2 -r | head -n 20
count 6723
count 6391
count 6303
$ grep queue_step stepcompress/064Mhz/master.log | grep count= | cut -d' ' -f4 | sed 's/=/ /g' | sort -h -k2 -r | head -n 20
count 10002
count 9205
count 9170
$ grep queue_step stepcompress/150Mhz/master.log | grep count= | cut -d' ' -f4 | sed 's/=/ /g' | sort -h -k2 -r | head -n 20
count 9184
count 9150
count 9044

How to extract graph data:

grep queue_step stepcompress/150Mhz/master.log | grep count= | cut -d' ' -f4 | sed 's/=/ /g' | sort -h -k2 -r | uniq -c | sort -h -k1 -r > /tmp/count_freq_150mhz.stats
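For plotting, the resulting `uniq -c` stats can be parsed into pairs with a few lines of Python (the helper name `parse_stats` is illustrative, not from the PR):

```python
def parse_stats(lines):
    # Parse lines like "  57 count 6723" (the `uniq -c` output above)
    # into (steps_per_cmd, occurrences) pairs ready for plotting.
    pairs = []
    for line in lines:
        freq, _, count = line.split()
        pairs.append((int(count), int(freq)))
    return pairs

sample = ["  57 count 6723", "   3 count 6391"]
print(parse_stats(sample))  # [(6723, 57), (6391, 3)]
```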

If I graph steps per cmd vs occurrence frequency, it will look like this:
[graphs: steps per cmd vs occurrence frequency, for 12, 64, and 150 MHz]

Thanks,
-Timofey


The above data was collected under the same circumstances as here: https://klipper.discourse.group/t/improved-stepcompress-implementation/3203/50?u=nefelim4ag
So: batch mode, RP2040, adjusted MCU frequency (12, 64, 150 MHz), and 64 microsteps.

I can collect data for any MCU frequency, any microstepping, or any model if it is necessary.


Btw, technically, the max error is often half of the interval, so it could be used for velocity/knot analysis, but I don't have a good/cheap way to do so right now.

@KevinOConnor
Collaborator

Thanks. I'm a bit confused though. Why would we want to expand the memory size to store a value that is already reliably calculated from existing memory? I feel like I'm missing something - why store this value? What's the end user visible change?

The stepcompress.c queues were optimized for memory size. That is why it plays the weird games with int32_t instead of using the more natural int64_t.

-Kevin

@KevinOConnor
Collaborator

Just to be clear, the queues were optimized for memory - both for overall memory size and for cache-line size. So, for example, if the code has to analyze 4000 steps, that would be 16000 bytes in the current code - which could use a good chunk of the L1 cache on a typical SBC. This change would double the size to 32000 bytes and notably increase the pressure on the L1 cache.
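The arithmetic behind those numbers, as a quick check (the 32 KiB L1 data cache is an assumed typical size, not from the thread):

```python
steps = 4000
old_bytes = steps * 4   # one uint32_t clock per step (current code)
new_bytes = steps * 8   # clock plus a cached max_error (this PR)
l1 = 32 * 1024          # assumed typical SBC L1 data cache size

print(old_bytes, new_bytes)         # 16000 32000
print(round(new_bytes / l1 * 100))  # 98 -- nearly the whole 32 KiB L1
```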

So, I feel like I'm missing something.

Cheers,
-Kevin

@nefelim4ag
Collaborator Author

I feel like I'm missing something - why store this value?

My general logic about those changes is:

  • There is a bisect loop that repeatedly recalculates values.
  • I can't optimize the loop.
  • I can reduce the amount of branching and the fixed cost per step by computing max_error once.
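A schematic Python sketch of that idea (names like Step and queue_step here are illustrative; the real code is C in stepcompress.c): the error bound is computed once when a step is queued, so the compression loop only reads a cached value.

```python
from dataclasses import dataclass

@dataclass
class Step:
    clock: int       # absolute step clock
    max_error: int   # allowed deviation, precomputed once

def queue_step(queue, clock, ext_limit):
    # The bound is mostly static: an external limit, capped by half
    # the inter-step interval (schematic rule, not the exact
    # stepcompress.c formula).
    interval = clock - queue[-1].clock if queue else 2 * ext_limit
    queue.append(Step(clock, min(ext_limit, interval // 2)))

def compress_window(queue):
    # The hot loop now just reads the cached bound instead of
    # branching to recompute it on every iteration.
    return [(s.clock - s.max_error, s.clock) for s in queue]

q = []
queue_step(q, 100, 25)
queue_step(q, 140, 25)
print(compress_window(q))  # [(75, 100), (120, 140)]
```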

What's the end user visible change?

It is too low-level for any user-visible changes.
So, to explain my "Big Picture" view:
I think that any (future) increase in compression precision (reduction of speed jumps) can benefit:

  • IS smoothers
  • Any stepper-related shapers, because they produce "small" output changes.
  • (speculation on my side) Driver's logic, closed loop driver's behaviour.

Any increase in precision will increase the computational cost.
These changes reduce that cost beforehand.

... which could use a good chunk of the L1 cache on a typically SBC. This change would double the size to 32000 and notably increase the pressure on the L1 cache.

I did think about cache hits.
Since we mostly read linearly, prefetching should mostly hide the difference.
At least on my RPi 5 hardware, there is no measurable impact.
Because of resolution limitations, a long compressed sequence seems unlikely, but it is the worst-case scenario for CPU caches.

That said, it is possible that it will behave differently on a weak SBC; I cannot know how different it will be.

If you think this is not worth it, I'm okay with that.

Regards,
-Timofey

@KevinOConnor
Collaborator

Ah, okay. So, it's an optimization to reduce cpu instructions.

For what it is worth, in my experience optimizing for memory cache yields much larger benefits than optimizing for cpu instructions.

If this change can be shown to measurably improve performance then I'm fine with making the change, but I'd be surprised if it notably improves performance. C compilers and modern CPUs are really good at this type of math and small branches.

Cheers,
-Kevin

@KevinOConnor
Collaborator

KevinOConnor commented Feb 7, 2026

That said, introducing struct step as you have done here is probably a good idea. (That is, struct step with just the time member.) It gives the compiler more information (better ability to detect aliases) and would likely make the code a little easier to read.

Cheers,
-Kevin

EDIT: If you decide to do that (no pressure), how about struct qstep with member clock32 for names?

@nefelim4ag
Collaborator Author

I will split it out along with the name refactoring, because I agree it will make it easier to understand what is happening here.
I think I'll adopt your naming suggestion.

Thanks,
-Timofey

This change was initially made to cheapen my repetitive "fit" computations: where they cost +17%, with the cache it is ~15%.
I will try to collect performance data from different SBCs to get an idea of the impact of this specific change.

@nefelim4ag nefelim4ag marked this pull request as draft February 7, 2026 22:09
Max error is mostly a static value,
defined by either an external value or the inter-step time.
Precompute it once to reduce the time cost of each compressor loop.

It works the same as before.

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
On every cycle, the last step happens at or before its time.
The new max_error is equal to or larger than before.
Simplify the logic by avoiding re-computing it.
It has an insignificant impact on the compressed output.

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
@github-actions

Thank you for your contribution to Klipper. Unfortunately, a reviewer has not assigned themselves to this GitHub Pull Request. All Pull Requests are reviewed before merging, and a reviewer will need to volunteer. Further information is available at: https://www.klipper3d.org/CONTRIBUTING.html

There are some steps that you can take now:

  1. Perform a self-review of your Pull Request by following the steps at: https://www.klipper3d.org/CONTRIBUTING.html#what-to-expect-in-a-review
    If you have completed a self-review, be sure to state the results of that self-review explicitly in the Pull Request comments. A reviewer is more likely to participate if the bulk of a review has already been completed.
  2. Consider opening a topic on the Klipper Discourse server to discuss this work. The Discourse server is a good place to discuss development ideas and to engage users interested in testing. Reviewers are more likely to prioritize Pull Requests with an active community of users.
  3. Consider helping out reviewers by reviewing other Klipper Pull Requests. Taking the time to perform a careful and detailed review of others work is appreciated. Regular contributors are more likely to prioritize the contributions of other regular contributors.

Unfortunately, if a reviewer does not assign themselves to this GitHub Pull Request then it will be automatically closed. If this happens, then it is a good idea to move further discussion to the Klipper Discourse server. Reviewers can reach out on that forum to let you know if they are interested and when they are available.

Best regards,
~ Your friendly GitIssueBot

PS: I'm just an automated script, not a human being.
