Conversation
|
Hi @pveber! Thanks for submitting this! 👍 Sorry in advance for the long reply. Here are my detailed thoughts on the proposed implementation and on the grouping operations in general. Thoughts on
|
|
I think providing a hashtable of |
|
@rizo Impressive summary, thanks! First, I like the idea of representing inner groups as simple lists rather than a stream. Second, while I followed Core's naming because the library is used widespread, I also like some of the alternatives you propose, notably When you've settled on a naming convention, I'll happily rework this PR. |
|
Hi @pveber! Really sorry for such a delay in getting back to you on this. Life has been pushing me into some other directions... I have been trying to do some thorough and future-proof thinking about the API and came up with a few conventions for both naming and semantics for the streaming operations. You can find the full analysis here: https://docs.google.com/spreadsheets/d/101Xe69CBLcdupRKXRaBk1DaFbMvjwpqwr_z7t-vdnXo One main idea I had was to avoid committing to a particular group representation for val group_by : ('a -> 'key) -> ?hash:('key -> int) -> into:('a, 'r) sink -> 'a t -> ('key * 'r) tAnd, of course, producing list groups would work like: Stream.group_by String.length ~into:Sink.list streamThe linked spreadsheet also includes other operations I mentioned previously that split the stream "by" and "between" elements. In addition to that, I have been thinking about adding more operators, some of which are described in the following diagram (also included in the linked spreadsheet): Would love to know your thoughts on this and do let me know if you would still happy to implement the |

Here is a tentative implementation of
Stream.group, similar to Core.List.group.This PR has two commits, each with a different implementation. The first is very close to
Stream.split, but since I noticed the comment above asking how efficient this implementation would be, I tried an alternative where the elements of each group are accumulated in a list. I believe it should behave the same but I'd be glad to have a confirmation. Meanwhile, I made a quick check to verify that performance-wise the second version is approximately twice as fast.The PR certainly needs more work, please let me know what I can do.