Thanks. I'm a bit confused though. Why would we want to expand the memory size to store a value that is already reliably calculated from existing memory? I feel like I'm missing something - why store this value? What's the end-user-visible change? The stepcompress.c queues were optimized for memory size. That is why the code plays weird games with int32_t instead of using the more natural int64_t. -Kevin

Just to be clear, the queues were optimized for memory - both for overall memory size and for cache-line size. So, for example, if the code has to analyze 4000 steps, that would be 16000 bytes in the current code - which could use a good chunk of the L1 cache on a typical SBC. This change would double the size to 32000 bytes and notably increase the pressure on the L1 cache. So, I feel like I'm missing something. Cheers,
My general logic about those changes is:
It is too low-level for any user-visible changes.
Any increase in precision will increase the computational cost.
I did think about the cache hits. That said, it is possible that it will behave differently on a weak SBC; I cannot know how different it will be. If you think this is not worth it, I'm okay with that. Regards,
Ah, okay. So, it's an optimization to reduce cpu instructions. For what it is worth, in my experience optimizing for memory cache yields much larger benefits than optimizing for cpu instructions. If this change can be shown to measurably improve performance then I'm fine with making the change, but I'd be surprised if it notably improves performance. C compilers and modern CPUs are really good at this type of math and small branches. Cheers,
That said, introducing Cheers, EDIT: If you decide to do that (no pressure), how about
I will simply split it off from the name refactoring, because I agree: it will simplify the understanding of what is happening here. Thanks. This change was initially done to make my repetitive "fit" computations cheaper - e.g. where the overhead was +17%, with the cache it is ~15%.
Max error is mostly a static value, defined by either an external value or the inter-step time. Precompute it once to reduce the time cost of each compressor loop. It works the same as before. Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
On every cycle the last step happens at or before that time, so the new max_error is equal to or larger than before. Simplify the logic by avoiding re-computing it. This has an insignificant impact on the compressed output. Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
Thank you for your contribution to Klipper. Unfortunately, a reviewer has not assigned themselves to this GitHub Pull Request. All Pull Requests are reviewed before merging, and a reviewer will need to volunteer. Further information is available at: https://www.klipper3d.org/CONTRIBUTING.html There are some steps that you can take now:
Unfortunately, if a reviewer does not assign themselves to this GitHub Pull Request then it will be automatically closed. If this happens, then it is a good idea to move further discussion to the Klipper Discourse server. Reviewers can reach out on that forum to let you know if they are interested and when they are available. Best regards, PS: I'm just an automated script, not a human being.
This is a derivative of my work around the step compression code.
I hope those changes make sense on their own.
The first change moves where we compute max_error, and produces exactly the same result as the current master code. It is a "bit" faster, but I would say within the measurement error of ~0.9%.
I think this is because the current compressor works close to O(n) in practice, or because stepcompress takes too small a share of the time compared to the batch computations.
The second change exploits this knowledge and simply removes the occasional increase of max_error.
This simplifies the logic - we no longer compute it during compression - and decreases speed jumps a bit.
The first commit makes any future changes slightly simpler, and if those changes add more complicated repetitive computations, it makes them cheaper.
If we worry about increased memory usage, it is possible to simply cut the maximum buffer size in half.
Due to itersolve quantization, in practice most "count" values will fit within "max_error" as the max count.
Probably we can limit it to some specific value: https://www.klipper3d.org/Protocol.html#variable-length-quantities
Within the compressor: 65535 -> 12287.
Total step count in memory cut in half: 65535 -> 32768, to keep the memory usage as before.
Some step frequency data
So, if one decodes the serial data, the peak values for count per cmd are: (data not shown)
How to extract graph data: (commands not shown)
If I graph steps per cmd vs. occurrence frequency, it looks like this: (graph images not shown)
Thanks,
-Timofey
The above data was done in the same circumstances as here: https://klipper.discourse.group/t/improved-stepcompress-implementation/3203/50?u=nefelim4ag
So, batch mode, RP2040, adjusted MCU frequency (12, 64, 150MHz) and 64 microsteps.
I can collect data for any MCU frequency, any microstepping, or any model if it is necessary.
Btw, technically, the max error is often half of the interval, so it would be possible to use it for velocity/knot analysis, but I haven't found a good/cheap way to do so yet.