big speedup for "Last N" statistics #41
Hi! Just to let you know I appreciate your contributions, and I'll try to review and merge them eventually. But it definitely won't happen in the next week(s). I'll try to squeeze in some time, but I can't promise a deadline, sorry. If you have a specific order for the PRs to be applied, please point it out. Otherwise, if they can be merged in any order because they don't conflict with each other, that's also great.
Thanks for continuing to work on prettyping! It's one of my favorite hidden gems out there. This PR is the most intense (and coolest), so maybe start with this while it's fresh in my head? I think there are only a few conflicting lines. As you merge them, I can correct conflicts that arise.
Hi @denilsonsa, can you please consider accepting my PRs? I'd like to keep making improvements to your great script, but the PRs are starting to stack up and it's getting difficult to cleanly make new ones. I'm happy to share any explanations, testing methodologies, etc. to help you gain confidence in them.
Use O(1) algorithms to compute the "Last N" min/max/mean/stddev statistics.
I often use `--last` with a high N, e.g. `prettyping -i 0.1 --last 6000 ...`, to watch for microbursts of loss or latency during the last 10 minutes.

Currently, those "Last N" statistics (min/max/mean/mean-absolute-dev) are computed by passing through the whole Last N arrays each time, which becomes very slow and CPU intensive for high values of `--last`.

Luckily, there are relatively simple constant-time algorithms for these, which this PR switches to.
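To illustrate the idea, here is a minimal Python sketch of one common way to keep the mean, min, and max of the last N samples up to date in (amortized) constant time per sample: a running sum for the mean and two monotonic queues for min and max. This is not the awk code from the PR; the class name `LastNStats` and its methods are hypothetical and exist only for this example.

```python
from collections import deque


class LastNStats:
    """Mean/min/max over the last n samples (hypothetical illustration)."""

    def __init__(self, n):
        self.n = n
        self.window = deque()  # the last n samples, oldest first
        self.total = 0.0       # sum of the samples currently in the window
        self.min_q = deque()   # non-decreasing candidates; front is the window min
        self.max_q = deque()   # non-increasing candidates; front is the window max

    def push(self, x):
        # Evict the oldest sample once the window is full.
        if len(self.window) == self.n:
            old = self.window.popleft()
            self.total -= old
            if self.min_q[0] == old:
                self.min_q.popleft()
            if self.max_q[0] == old:
                self.max_q.popleft()
        self.window.append(x)
        self.total += x
        # Drop candidates that can no longer become the min (or max),
        # because the newer sample x beats them and outlives them.
        while self.min_q and self.min_q[-1] > x:
            self.min_q.pop()
        self.min_q.append(x)
        while self.max_q and self.max_q[-1] < x:
            self.max_q.pop()
        self.max_q.append(x)

    def mean(self):
        return self.total / len(self.window)

    def minimum(self):
        return self.min_q[0]

    def maximum(self):
        return self.max_q[0]


stats = LastNStats(3)
for rtt in [5, 1, 4, 2, 8]:
    stats.push(rtt)
# The window now holds the last three samples: 4, 2, 8.
print(stats.minimum(), stats.maximum(), stats.mean())
```

Each sample enters and leaves each queue at most once, so the cost per sample does not grow with N. With floating-point samples the running sum can accumulate a little rounding error over a long session, which a real implementation may want to account for.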
(It also needs to change from Mean Absolute Deviation to Standard Deviation, because I'm not aware of an online, constant-time algo for Mean Absolute Deviation. But Standard Deviation will be very close anyway, and that's how most other `ping` tools do it.)

Tested fairly extensively for correctness with nawk (macOS), gawk, mawk, and Busybox awk.
Benchmarking shows dramatic speedups as `--last` grows.
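As a rough illustration of why Standard Deviation fits a constant-time update while Mean Absolute Deviation does not, the Python sketch below keeps a running sum and a running sum of squares for the last N samples. It is a sketch under stated assumptions, not the PR's awk implementation: the names `LastNStdDev` and `mean_abs_dev` are hypothetical, and it computes the population standard deviation.

```python
import math
from collections import deque


class LastNStdDev:
    """Population standard deviation over the last n samples (hypothetical)."""

    def __init__(self, n):
        self.n = n
        self.window = deque()  # the last n samples, oldest first
        self.total = 0.0       # sum of samples in the window
        self.total_sq = 0.0    # sum of squared samples in the window

    def push(self, x):
        # Evicting the oldest sample only needs two subtractions.
        if len(self.window) == self.n:
            old = self.window.popleft()
            self.total -= old
            self.total_sq -= old * old
        self.window.append(x)
        self.total += x
        self.total_sq += x * x

    def stddev(self):
        k = len(self.window)
        mean = self.total / k
        # Var = E[x^2] - (E[x])^2; clamp tiny negatives caused by rounding.
        variance = self.total_sq / k - mean * mean
        return math.sqrt(max(variance, 0.0))


def mean_abs_dev(samples):
    # Every term |x - mean| depends on the current mean, so this
    # straightforward form revisits all samples whenever the window changes.
    mean = sum(samples) / len(samples)
    return sum(abs(x - mean) for x in samples) / len(samples)


sd = LastNStdDev(4)
for rtt in [2, 4, 4, 4, 5, 5, 7, 9]:
    sd.push(rtt)
# The window now holds 5, 5, 7, 9.
print(sd.stddev(), mean_abs_dev(list(sd.window)))
```

The sum-of-squares form is simple but can lose precision when the samples are large relative to their spread; Welford-style updates are a common alternative when that matters.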