Contributor

@leekeiabstraction leekeiabstraction commented Jan 3, 2026

Purpose

Linked issue: close #119

Brief change log

  • Added CompactedKeyEncoder
  • Added CompactedKeyWriter
  • Added FieldGetter
  • Added Value enum (not in Java) to allow more graceful/polymorphic BinaryWriter implementations
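To illustrate why a Value enum can make writers more polymorphic, here is a minimal sketch. It is not the code from this PR: the variant set, the trait methods, `VecWriter`, and `encode_all` are all hypothetical, and the byte layout (fixed-width little-endian integers, length-prefixed bytes) is a toy layout rather than the compacted format.

```rust
/// Hypothetical stand-in for the `Value` enum: one variant per supported type.
#[derive(Debug, Clone, PartialEq)]
pub enum Value<'a> {
    Boolean(bool),
    Int(i32),
    Long(i64),
    String(&'a str),
    Bytes(&'a [u8]),
}

/// Hypothetical writer trait; the real `BinaryWriter` may look quite different.
pub trait BinaryWriter {
    fn write_boolean(&mut self, v: bool);
    fn write_int(&mut self, v: i32);
    fn write_long(&mut self, v: i64);
    fn write_bytes(&mut self, v: &[u8]);

    /// Single polymorphic entry point: dispatch on the enum variant.
    fn write_value(&mut self, value: &Value<'_>) {
        match value {
            Value::Boolean(b) => self.write_boolean(*b),
            Value::Int(i) => self.write_int(*i),
            Value::Long(l) => self.write_long(*l),
            Value::String(s) => self.write_bytes(s.as_bytes()),
            Value::Bytes(b) => self.write_bytes(b),
        }
    }
}

/// Toy writer backed by a Vec<u8>. The layout is an assumption for this sketch only.
#[derive(Default)]
pub struct VecWriter {
    pub buf: Vec<u8>,
}

impl BinaryWriter for VecWriter {
    fn write_boolean(&mut self, v: bool) {
        self.buf.push(v as u8);
    }
    fn write_int(&mut self, v: i32) {
        self.buf.extend_from_slice(&v.to_le_bytes());
    }
    fn write_long(&mut self, v: i64) {
        self.buf.extend_from_slice(&v.to_le_bytes());
    }
    fn write_bytes(&mut self, v: &[u8]) {
        // Length prefix followed by the raw bytes.
        self.write_int(v.len() as i32);
        self.buf.extend_from_slice(v);
    }
}

/// Encodes a slice of values with the toy writer and returns the bytes.
pub fn encode_all(values: &[Value<'_>]) -> Vec<u8> {
    let mut writer = VecWriter::default();
    for value in values {
        writer.write_value(value);
    }
    writer.buf
}
```

With this shape, callers only need `write_value`, and adding a new data type means adding one variant and one match arm.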

Several TODOs that may be better split into smaller follow-up tasks to keep this PR from becoming too large:

  1. Make BinaryWriter functions return Result<?> for error handling instead of calling panic!
  2. Implement the BinaryWriter functions, FieldGetter, etc. for Decimal, Timestamp, Date, InternalArray, Row and the other remaining data types
  3. Add CompactedKeyEncoder unit tests covering all data types
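As a rough sketch of the first TODO, the function below returns a `Result` rather than panicking when a key field is null. `EncodeError`, `encode_key`, and the fixed-width big-endian layout are assumptions made for illustration; they are not taken from this PR.

```rust
use std::fmt;

/// Hypothetical error type for key encoding failures.
#[derive(Debug, PartialEq)]
pub enum EncodeError {
    /// A key field was null; key columns must not contain nulls.
    NullKeyField { pos: usize },
}

impl fmt::Display for EncodeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            EncodeError::NullKeyField { pos } => {
                write!(f, "null value at key position {pos}")
            }
        }
    }
}

impl std::error::Error for EncodeError {}

/// Encodes integer key fields, reporting a null as an error instead of panicking.
/// Each field is written as 4 big-endian bytes (toy layout for this sketch).
pub fn encode_key(fields: &[Option<i32>]) -> Result<Vec<u8>, EncodeError> {
    let mut buf = Vec::with_capacity(fields.len() * 4);
    for (pos, field) in fields.iter().copied().enumerate() {
        // `?` propagates the error to the caller, who decides how to handle it.
        let value = field.ok_or(EncodeError::NullKeyField { pos })?;
        buf.extend_from_slice(&value.to_be_bytes());
    }
    Ok(buf)
}
```

Callers can then use `?` or `match` on the result, so a bad row no longer aborts the whole process.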

Tests

  • Added CompactedKeyEncoder UTs similar to Java UTs

@leekeiabstraction leekeiabstraction marked this pull request as ready for review January 4, 2026 12:40
@leekeiabstraction
Contributor Author

@luoyuxia @Kelvinyu1117 Would appreciate review here.

Copilot AI left a comment

Pull request overview

This PR introduces the CompactedKeyEncoder infrastructure, which encodes the key columns of a row into a compact binary format. The implementation closely follows the Java reference implementation and provides a foundation for encoding primary keys for various data lake formats.

Key Changes

  • Added CompactedKeyEncoder with support for basic data types (integers, floats, strings, bytes, binary, boolean)
  • Introduced BinaryWriter trait and ValueWriter abstraction for extensible binary serialization
  • Added FieldGetter trait to extract typed field values from rows
  • Extended Datum enum with BorrowedBlob variant to support borrowed byte slices
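The sketch below shows one way a field getter and a borrowed-blob variant could fit together. Only the names `Datum`, `BorrowedBlob`, and `FieldGetter` come from the summary above; the variant set, the `Row` struct, `PositionGetter`, `key_fields`, and every signature are hypothetical and will differ from the actual code in this PR.

```rust
/// Simplified stand-in for the crate's `Datum`; the variant set is an assumption.
#[derive(Debug, Clone, PartialEq)]
pub enum Datum<'a> {
    Null,
    Int(i32),
    String(&'a str),
    Blob(Vec<u8>),
    /// Borrows the bytes instead of copying them into an owned Vec.
    BorrowedBlob(&'a [u8]),
}

impl<'a> From<&'a [u8]> for Datum<'a> {
    fn from(bytes: &'a [u8]) -> Self {
        Datum::BorrowedBlob(bytes)
    }
}

/// Hypothetical row type: an ordered list of fields.
pub struct Row<'a> {
    pub fields: Vec<Datum<'a>>,
}

/// Hypothetical getter trait: returns None for a null or out-of-range field.
pub trait FieldGetter {
    fn get_field<'r, 'a>(&self, row: &'r Row<'a>) -> Option<&'r Datum<'a>>;
}

/// Getter bound to a fixed field position.
pub struct PositionGetter {
    pub pos: usize,
}

impl FieldGetter for PositionGetter {
    fn get_field<'r, 'a>(&self, row: &'r Row<'a>) -> Option<&'r Datum<'a>> {
        match row.fields.get(self.pos) {
            Some(Datum::Null) | None => None,
            Some(datum) => Some(datum),
        }
    }
}

/// Extracts the fields at the given key positions, in order.
pub fn key_fields<'r, 'a>(
    row: &'r Row<'a>,
    key_positions: &[usize],
) -> Vec<Option<&'r Datum<'a>>> {
    key_positions
        .iter()
        .map(|&pos| PositionGetter { pos }.get_field(row))
        .collect()
}
```

An encoder built on this could hold one getter per key column and feed each extracted field to a writer.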

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 19 comments.

Show a summary per file
| File | Description |
| --- | --- |
| crates/fluss/src/row/mod.rs | Added module declarations for binary, encode, and field_getter; added helper method to create GenericRow from data |
| crates/fluss/src/row/field_getter.rs | New file implementing FieldGetter trait and type-specific getters for extracting fields from rows |
| crates/fluss/src/row/encode/mod.rs | New file defining KeyEncoder trait and factory method for creating encoders based on data lake format |
| crates/fluss/src/row/encode/compacted_key_encoder.rs | Core implementation of CompactedKeyEncoder with comprehensive unit tests |
| crates/fluss/src/row/datum.rs | Added BorrowedBlob variant and corresponding From implementation for borrowed byte slices |
| crates/fluss/src/row/compacted/mod.rs | Exposed CompactedKeyWriter module |
| crates/fluss/src/row/compacted/compacted_key_writer.rs | Wrapper around CompactedRowWriter that rejects null values for key encoding |
| crates/fluss/src/row/binary/mod.rs | New module defining BinaryRowFormat enum and re-exporting binary writer types |
| crates/fluss/src/row/binary/binary_writer.rs | Defines BinaryWriter and ValueWriter traits with implementations for all basic data types |
| crates/fluss/src/metadata/datatype.rs | Added test helper methods to create RowType from data types and field names |
| crates/fluss/Cargo.toml | Added delegate crate dependency for delegation pattern |

@leekeiabstraction
Contributor Author

Copilot comments have been addressed

Contributor

@luoyuxia luoyuxia left a comment

@leekeiabstraction Thanks for the PR. Left a minor comment. PTAL.

@leekeiabstraction
Contributor Author

@luoyuxia Addressed all comments, PTAL!

Development

Successfully merging this pull request may close these issues.

Introduce CompactedKeyEncoder

3 participants