Skip to content

POC FS LAYOUT#266

Open
Adit2607 wants to merge 8 commits intodevelopfrom
poc/fs_layout
Open

POC FS LAYOUT#266
Adit2607 wants to merge 8 commits intodevelopfrom
poc/fs_layout

Conversation

@Adit2607
Copy link
Contributor

@Adit2607 Adit2607 commented Jan 7, 2026

🔁 Pull Request Template – BharatMLStack

Please fill out the following sections to help us review your changes efficiently.


📌 Summary

e.g., Adds optimizes Redis fetch latency in online-feature-store, or improves search UI responsiveness in trufflebox-ui.


📂 Modules Affected

  • horizon (Real-time systems / networking)
  • online-feature-store (Feature serving infra)
  • trufflebox-ui (Admin panel / UI)
  • infra (Docker, CI/CD, GCP/AWS setup)
  • docs (Documentation updates)
  • Other: ___________

✅ Type of Change

  • Feature addition
  • Bug fix
  • Infra / build system change
  • Performance improvement
  • Refactor
  • Documentation
  • Other: ___________

📊 Benchmark / Metrics (if applicable)

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced Layout V2 storage format with bitmap-based optimization for sparse feature data, improving compression efficiency for features with default values.
    • Enhanced feature retrieval to support default value handling for features not explicitly set.
  • Tests

    • Added comprehensive layout comparison tests validating Layout V2 performance improvements over Layout V1.
    • Extended test coverage for bitmap optimization validation across multiple data types and sparsity patterns.

@coderabbitai
Copy link

coderabbitai bot commented Feb 24, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

The pull request implements bitmap-based default-value tracking across PSDB Layout 2 format. Core changes include: adding bitmap metadata to deserialized and persistent storage blocks, extending all feature getter methods with defaultValue parameters, updating serialization and deserialization logic to handle bitmaps, and propagating bitmap data through retrieval and persistence layers. New comprehensive layout comparison tests validate Layout 2 optimizations against Layout 1.

Changes

Cohort / File(s) Summary
PSDB Deserialization with Layout 2 Bitmap Support
deserialized_psdb_v2.go, deserialized_psdb_v2_test.go
Added BitmapMeta field to DeserializedPSDB. All GetXxxFeature methods (numeric, string, bool scalars and vectors) now accept defaultValue []byte parameter and include bitmap-aware retrieval paths using countSetBitsBefore helper. Implemented deserializePSDBForLayout2 to parse header, bitmap metadata, and data. Test calls updated with defaultValue parameters (nil or seed bytes).
Persistent Storage Block Serialization
perm_storage_datablock_v2.go, perm_storage_datablock_v2_test.go
Added bitmap []byte field and PSDBLayout2ExtraBytes constant. New SetBitmap() and SetupBitmapMeta() methods on builder. Serialize paths updated to filter data by bitmap and prepend bitmap to payload for Layout 2. Updated setupHeadersV2 to include bitmap-present flag. Comment clarification in test for byte extraction logic.
Feature Persistence Layer
persist.go, persist_test.go
Updated BuildPSDBBlock signature to accept featureBitmap parameter. ParseFeatureValue now returns (data, bitmap, error) triple. BuildPSDBBlock propagates bitmap via SetBitmap() and conditionally calls SetupBitmapMeta() for Layout 2. Test cases expanded with extra_feature_1 and extra_feature_2 to validate 9-feature scenarios.
Feature Retrieval Layer
retrieve.go
Extended GetFeature signature to accept defaultValue []byte and propagate it through all numeric/string/bool feature extraction paths. Default value retrieval added upstream with error handling and early return on failure.
Core System API
system.go
Refactored all public GetXxx and GetXxxVector methods (GetInt32, GetUInt32, GetInt64, GetUInt64, GetFP32, GetFP64, GetUInt8, GetString, and all vector variants) to return (data, []byte, error) instead of (data, error). Added getMetaBySequence and slicesEqual{Type} helpers for bitmap population. ParseFeatureValue signature updated to return triple with bitmap. All vector getters now compute bitmap by comparing values against per-feature defaults.
Layout Comparison Testing
layout_comparison_test.go, layout_comparison_results.txt
New comprehensive test suite with TestLayout1VsLayout2Compression and TestLayout2BitmapOptimization validating Layout 2 bitmap optimizations. Exported TestResult struct captures metrics (feature counts, sparsity, data type, compression, layout sizes, reductions). Generated ASCII report documents per-data-type compression improvements and recommends Layout 2 as default.
Cache Storage Block Tests
cache_storage_datablock_v2_test.go
Updated all GetNumericScalarFeature, GetNumericVectorFeature, GetStringScalarFeature, GetStringVectorFeature, and GetBoolVectorFeature test calls with new signature parameters (extra int and []byte seed).
🚥 Pre-merge checks | ✅ 1 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Krd Checker ⚠️ Warning The PR description contains only template boilerplate without KRD link or exemption information. Add either a KRD Google Doc link or a KRD exemption block with Pod Type and justification (minimum 20 characters).
✅ Passed checks (1 passed)
Check name Status Explanation
Dynamic Configuration Validation ✅ Passed No files matching pattern application-dyn-*.yml were found in the repository; all modified files are Go source code and one text report file.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

🧹 Nitpick comments (2)
online-feature-store/internal/handler/feature/retrieve.go (1)

883-889: Consider caching default values instead of per-feature lookups.

GetDefaultValueByte is now called for every feature in the hot path. If config access isn’t O(1), this could add latency. Consider caching defaults per FG/version (or reuse values preloaded in preProcessForKeys) and pass them into GetFeature.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@online-feature-store/internal/handler/feature/retrieve.go` around lines 883 -
889, GetDefaultValueByte is being called for each feature in the hot path
(h.config.GetDefaultValueByte in the loop) which can be expensive; change the
code to cache default values per feature-group/version (keyed by fgId and
version) — populate that cache once (preferably in preProcessForKeys or at the
start of the retrieval flow) and then look up the defaultValue from the cache
inside the loop instead of calling GetDefaultValueByte repeatedly, and update
the call to GetFeature to accept the cached defaultValue so you no longer call
h.config.GetDefaultValueByte for every featureLabel.
online-feature-store/internal/data/blocks/layout_comparison_test.go (1)

303-446: Results file should go to a temp dir or be opt‑in.

Tests always write layout_comparison_results.txt to the repo root, which can pollute CI or fail in read‑only environments. Consider t.TempDir() or gating behind an env flag.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@online-feature-store/internal/data/blocks/layout_comparison_test.go` around
lines 303 - 446, The generateResultsFile function always writes
layout_comparison_results.txt to the repo root which can pollute CI or fail in
read‑only environments; change generateResultsFile to accept an output path
(e.g., outputDir string) or honor an env flag (e.g., LAYOUT_RESULTS_DIR or
LAYOUT_RESULTS_ENABLED) and, in tests, use t.TempDir() to pass a temp directory
(or skip writing when the flag is off); update callers/tests that invoke
generateResultsFile to provide the temp dir or check the flag so file creation
is opt‑in and not written to the repo root.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@LAYOUT_TEST_RESULTS.md`:
- Around line 11-22: The table in LAYOUT_TEST_RESULTS.md has inconsistent
"Improvement" percentages compared to layout_comparison_results.txt; regenerate
the results so both files come from the same successful test run and match
exactly: rerun the layout comparison test suite, produce fresh outputs, and
update LAYOUT_TEST_RESULTS.md entries (e.g., rows "High sparsity | 500 | 80%",
"Very high sparsity | 850 | 95%", "Low sparsity | 1000 | 23%", "Medium sparsity
| 100 | 50%", "Low sparsity | 200 | 20%", "FP16 high sparsity | 500 | 70%") to
the values generated by layout_comparison_results.txt (or vice versa) ensuring
date stamps remain identical and no manual edits alter the numeric percentages.
- Line 131: The summary line stating "10 different scenarios" is inconsistent
with the compressed-size table (9 rows) and layout_comparison_results.txt (13
scenarios); pick the authoritative source used by the test suite (preferably
layout_comparison_results.txt if it’s the executed output) and update the
phrasing in LAYOUT_TEST_RESULTS.md (replace the "10 different scenarios" text)
to match that authoritative scenario count, or alternatively update the
compressed-size table to include the missing scenarios so all three sources (the
"compressed-size" table, the "10 different scenarios" sentence, and
layout_comparison_results.txt) consistently report the same number.

In
`@online-feature-store/internal/data/blocks/cache_storage_datablock_v2_test.go`:
- Line 67: Tests call ddb.GetNumericScalarFeature(i, 3, []byte{0,0,0}) with
hardcoded numFeatures and default bytes which are wrong; replace the placeholder
3 and []byte{0,0,0} with the actual feature count and a default-sized slice
derived from the feature's dataType.Size(). Locate calls to
GetNumericScalarFeature (and similar numeric scalar lookup helpers) and pass the
real feature count variable used in the test and construct the default value as
make([]byte, dataType.Size()) or equivalent so bitmap-path and larger numeric
types get correct-sized defaults.

In `@online-feature-store/internal/data/blocks/deserialized_psdb_v2_test.go`:
- Line 394: Calls to d.GetNumericScalarFeature are using a hardcoded
numFeatures=3 and a 3‑byte default which is incorrect; compute the real feature
count from the test's values slice (e.g. numFeatures := len(values)) and create
defaultValue with the proper byte length from the feature data type (e.g.
defaultValue := make([]byte, dataType.Size())). Then pass those into
GetNumericScalarFeature(pos, numFeatures, defaultValue). Update every occurrence
(the call sites that currently pass 3 and a 3‑byte default) to use the
corresponding values length and dataType.Size() for the default.

In `@online-feature-store/internal/data/blocks/deserialized_psdb_v2.go`:
- Around line 46-51: Tests that assert an "invalid layout version" error must be
updated because layoutVersion 2 is now handled by deserializePSDBForLayout2;
find the switch on layoutVersion in deserialized_psdb_v2.go (cases for 1 and 2
call deserializePSDBForLayout1/deserializePSDBForLayout2) and update any unit
tests that previously used version 2 as the invalid value to use a truly
unsupported value (e.g., 3) so they exercise the default error path instead of
the new layout-2 handler.
- Around line 19-20: The DeserializedPSDB.Copy() implementation currently omits
the new BitmapMeta flag (and any bitmap-related state), causing cloned instances
to lose bitmap metadata; update the Copy() method on DeserializedPSDB to copy
BitmapMeta from the source into the destination and also deep-copy any
bitmap-related fields (e.g., bitmap slices, pointers or flags that indicate
presence/layout) so cloned objects preserve bitmap presence and layout-2 payload
semantics.
- Around line 329-388: The bitmap branch in
DeserializedPSDB.GetNumericScalarFeature fails to validate pos, allowing
negative or >= numFeatures values to panic or misbehave; add explicit bounds
checks near the top of the bitmap path (before computing byteIdx/bitIdx or
indexing bitmap) to return an error (e.g., "position out of bounds") if pos < 0
or pos >= numFeatures, and then proceed with the existing
bitmapSize/byteIdx/bitIdx logic so countSetBitsBefore and bitmap indexing cannot
be invoked with an invalid pos.

In `@online-feature-store/internal/data/blocks/layout_comparison_results.txt`:
- Around line 1-337: The committed static snapshot layout_comparison_results.txt
is stale because the Go test suite for internal/data/blocks is failing (exit
code 1); fix by running and resolving the failing tests (go test
./internal/data/blocks), then regenerate the snapshot file (overwrite
layout_comparison_results.txt with the new output) or remove and recreate it as
a CI artifact only after tests pass, and include the regenerated file in the
same commit; ensure the commit message references the fixed tests and
regenerated snapshot so CI can validate the updated numbers.

In `@online-feature-store/internal/data/blocks/perm_storage_datablock_v2.go`:
- Around line 268-305: The loop uses p.bitmap[i/8] without bounds checking which
can panic if the bitmap is too short; before the loop in the p.layoutVersion ==
2 && len(p.bitmap) > 0 branch compute needed := (len(values)+7)/8 and validate
len(p.bitmap) >= needed, and if not return a clear error (e.g.,
fmt.Errorf("invalid bitmap length: need %d bytes", needed)); add this guard so
indexing p.bitmap[i/8] is always safe.
- Around line 124-130: Serialize() incorrectly routes layoutVersion==2 through
serializeLayout1(), causing non-FP32 serializers to omit the layout‑2
header/bitmapMeta and corrupt deserialization; update Serialize() to handle case
2 by calling the layout‑2 serializer(s) (e.g., call serializeFP32AndLessV2 for
FP32 paths and implement or route other types to a serializeLayout2 variant that
appends bitmapMeta) or explicitly return an error for unsupported types; ensure
serializeFP32AndLessV2’s bitmapMeta format is applied consistently for all
layoutVersion==2 code paths and update any serializer functions referenced
(serializeLayout1, serializeFP32AndLessV2, or new serializeLayout2) so the
deserializer sees the header byte for every layout‑2 block.

---

Nitpick comments:
In `@online-feature-store/internal/data/blocks/layout_comparison_test.go`:
- Around line 303-446: The generateResultsFile function always writes
layout_comparison_results.txt to the repo root which can pollute CI or fail in
read‑only environments; change generateResultsFile to accept an output path
(e.g., outputDir string) or honor an env flag (e.g., LAYOUT_RESULTS_DIR or
LAYOUT_RESULTS_ENABLED) and, in tests, use t.TempDir() to pass a temp directory
(or skip writing when the flag is off); update callers/tests that invoke
generateResultsFile to provide the temp dir or check the flag so file creation
is opt‑in and not written to the repo root.

In `@online-feature-store/internal/handler/feature/retrieve.go`:
- Around line 883-889: GetDefaultValueByte is being called for each feature in
the hot path (h.config.GetDefaultValueByte in the loop) which can be expensive;
change the code to cache default values per feature-group/version (keyed by fgId
and version) — populate that cache once (preferably in preProcessForKeys or at
the start of the retrieval flow) and then look up the defaultValue from the
cache inside the loop instead of calling GetDefaultValueByte repeatedly, and
update the call to GetFeature to accept the cached defaultValue so you no longer
call h.config.GetDefaultValueByte for every featureLabel.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 11efebf and af24654.

📒 Files selected for processing (10)
  • LAYOUT_TEST_RESULTS.md
  • online-feature-store/internal/data/blocks/cache_storage_datablock_v2_test.go
  • online-feature-store/internal/data/blocks/deserialized_psdb_v2.go
  • online-feature-store/internal/data/blocks/deserialized_psdb_v2_test.go
  • online-feature-store/internal/data/blocks/layout_comparison_results.txt
  • online-feature-store/internal/data/blocks/layout_comparison_test.go
  • online-feature-store/internal/data/blocks/perm_storage_datablock_v2.go
  • online-feature-store/internal/handler/feature/persist.go
  • online-feature-store/internal/handler/feature/retrieve.go
  • online-feature-store/internal/system/system.go

Comment on lines 11 to 22
| Test Scenario | Features | Default Ratio | Compression | Improvement |
|---------------|----------|---------------|-------------|-------------|
| High sparsity | 500 | 80% | ZSTD | **21.66%** ✅ |
| Very high sparsity | 850 | 95% | ZSTD | **10.23%** ✅ |
| Low sparsity | 1000 | 23% | ZSTD | **6.39%** ✅ |
| Medium sparsity | 100 | 50% | ZSTD | **24.47%** ✅ |
| Low sparsity | 200 | 20% | ZSTD | **8.90%** ✅ |
| Edge case: All non-zero | 50 | 0% | ZSTD | **-3.50%** ⚠️ |
| Edge case: All zeros | 100 | 100% | ZSTD | **18.75%** ✅ |
| FP16 high sparsity | 500 | 70% | ZSTD | **28.54%** ✅ |
| No compression | 500 | 60% | None | **56.85%** ✅ |

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Compressed size improvement percentages are inconsistent with layout_comparison_results.txt.

Six of the nine comparable rows in this table show different percentages from the matching entries in layout_comparison_results.txt, despite both files sharing the same generation date (January 7, 2026):

Scenario This file .txt file
500 features, 80% defaults 21.66% 23.72%
850 features, 95% defaults 10.23% 6.85%
1000 features, 23% defaults 6.39% 6.02%
100 features, 50% defaults 24.47% 23.66%
200 features, 20% defaults 8.90% 7.77%
500 FP16, 70% defaults 28.54% 27.11%

This suggests the two files were authored from different test runs or edited manually after generation. Both should be regenerated from a single, passing test run to ensure they agree.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@LAYOUT_TEST_RESULTS.md` around lines 11 - 22, The table in
LAYOUT_TEST_RESULTS.md has inconsistent "Improvement" percentages compared to
layout_comparison_results.txt; regenerate the results so both files come from
the same successful test run and match exactly: rerun the layout comparison test
suite, produce fresh outputs, and update LAYOUT_TEST_RESULTS.md entries (e.g.,
rows "High sparsity | 500 | 80%", "Very high sparsity | 850 | 95%", "Low
sparsity | 1000 | 23%", "Medium sparsity | 100 | 50%", "Low sparsity | 200 |
20%", "FP16 high sparsity | 500 | 70%") to the values generated by
layout_comparison_results.txt (or vice versa) ensuring date stamps remain
identical and no manual edits alter the numeric percentages.


### Test Coverage

- ✅ 10 different scenarios covering sparsity from 0% to 100%
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Scenario count in Test Coverage doesn't match either document.

Line 131 states "10 different scenarios" but the compressed-size table above (lines 11–21) has 9 rows, and layout_comparison_results.txt documents 13 scenarios. Please align the count with whichever authoritative set is actually exercised by the test suite.

✏️ Suggested fix
-  - ✅ 10 different scenarios covering sparsity from 0% to 100%
+  - ✅ 13 different scenarios covering sparsity from 0% to 100%
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-10 different scenarios covering sparsity from 0% to 100%
-13 different scenarios covering sparsity from 0% to 100%
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@LAYOUT_TEST_RESULTS.md` at line 131, The summary line stating "10 different
scenarios" is inconsistent with the compressed-size table (9 rows) and
layout_comparison_results.txt (13 scenarios); pick the authoritative source used
by the test suite (preferably layout_comparison_results.txt if it’s the executed
output) and update the phrasing in LAYOUT_TEST_RESULTS.md (replace the "10
different scenarios" text) to match that authoritative scenario count, or
alternatively update the compressed-size table to include the missing scenarios
so all three sources (the "compressed-size" table, the "10 different scenarios"
sentence, and layout_comparison_results.txt) consistently report the same
number.

// Verify all values
for i, expected := range []int32{1, 2, 3} {
feature, err := ddb.GetNumericScalarFeature(i)
feature, err := ddb.GetNumericScalarFeature(i, 3, []byte{0, 0, 0})
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Pass the correct feature count and default size.

numFeatures=3 and []byte{0,0,0} are placeholders that don’t match actual feature counts or data type sizes. This can hide bitmap-path defects and will fail when defaults are returned for larger numeric types. Please pass the real count and a default sized via dataType.Size().

Also applies to: 124-124, 279-279, 336-336, 492-492, 549-549, 705-705, 762-762, 917-917, 978-978, 1146-1146, 1203-1203, 1359-1359, 1416-1416

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@online-feature-store/internal/data/blocks/cache_storage_datablock_v2_test.go`
at line 67, Tests call ddb.GetNumericScalarFeature(i, 3, []byte{0,0,0}) with
hardcoded numFeatures and default bytes which are wrong; replace the placeholder
3 and []byte{0,0,0} with the actual feature count and a default-sized slice
derived from the feature's dataType.Size(). Locate calls to
GetNumericScalarFeature (and similar numeric scalar lookup helpers) and pass the
real feature count variable used in the test and construct the default value as
make([]byte, dataType.Size()) or equivalent so bitmap-path and larger numeric
types get correct-sized defaults.

// Test each position
for pos := 0; pos < 3; pos++ {
feature, err := d.GetNumericScalarFeature(pos)
feature, err := d.GetNumericScalarFeature(pos, 3, []byte{0, 0, 0})
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Use real numFeatures and correctly sized defaultValue.

Many calls pass numFeatures=3 and a 3‑byte default regardless of actual feature count/type. This hides layout‑2 bitmap behavior and will break if a default is returned for int32/float64 (size mismatch). Please pass the real feature count and a default sized to dataType.Size().

Example pattern:

defaultValue := make([]byte, types.DataTypeInt32.Size())
feature, err := d.GetNumericScalarFeature(pos, len(values), defaultValue)

Also applies to: 566-566, 590-590, 614-614, 638-638, 662-662, 999-999, 1104-1104, 1131-1131, 1158-1158, 1185-1185, 1212-1212

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@online-feature-store/internal/data/blocks/deserialized_psdb_v2_test.go` at
line 394, Calls to d.GetNumericScalarFeature are using a hardcoded numFeatures=3
and a 3‑byte default which is incorrect; compute the real feature count from the
test's values slice (e.g. numFeatures := len(values)) and create defaultValue
with the proper byte length from the feature data type (e.g. defaultValue :=
make([]byte, dataType.Size())). Then pass those into
GetNumericScalarFeature(pos, numFeatures, defaultValue). Update every occurrence
(the call sites that currently pass 3 and a 3‑byte default) to use the
corresponding values length and dataType.Size() for the default.

Comment on lines 46 to 51
switch layoutVersion {
case 1:
ddb, err = deserializePSDBForLayout1(data)
case 2:
ddb, err = deserializePSDBForLayout2(data)
default:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Update “invalid layout version” tests after enabling layout‑2.

With layout version 2 now supported, tests expecting an error for version 2 should be updated (e.g., use version 3) to avoid CI failures.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@online-feature-store/internal/data/blocks/deserialized_psdb_v2.go` around
lines 46 - 51, Tests that assert an "invalid layout version" error must be
updated because layoutVersion 2 is now handled by deserializePSDBForLayout2;
find the switch on layoutVersion in deserialized_psdb_v2.go (cases for 1 and 2
call deserializePSDBForLayout1/deserializePSDBForLayout2) and update any unit
tests that previously used version 2 as the invalid value to use a truly
unsupported value (e.g., 3) so they exercise the default error path instead of
the new layout-2 handler.

Comment on lines 268 to 305
if p.layoutVersion == 2 && len(p.bitmap) > 0 {

for i, v := range values {
if (p.bitmap[i/8] & (1 << (i % 8))) == 0 {
continue
}
putFloat(p.originalData[idx:idx+unitSize], v)
idx += unitSize
}

p.originalData = p.originalData[:idx]
} else {
for _, v := range values {
putFloat(p.originalData[idx:idx+unitSize], v)
idx += unitSize
}
}

// ─────────────────────────────
// Step 2: layout-2 payload handling
// ─────────────────────────────
if p.layoutVersion == 2 {
// prepend bitmap to payload if present
if len(p.bitmap) > 0 {
p.bitmapMeta = p.bitmapMeta | 1<<3 // bitmapPresent = 1
tmp := make([]byte, 0, len(p.bitmap)+len(p.originalData))
tmp = append(tmp, p.bitmap...)
tmp = append(tmp, p.originalData...)
p.originalData = tmp
}

// append bitmapMeta to header
if len(p.buf) != PSDBLayout1LengthBytes {
return nil, fmt.Errorf("invalid base header length for layout-2")
}
p.buf = append(p.buf, p.bitmapMeta)
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Validate bitmap length before indexing.

The loop indexes p.bitmap[i/8] without checking size; a malformed bitmap can panic. Add a guard that the bitmap has at least ceil(len(values)/8) bytes.

🐛 Suggested fix
 	if p.layoutVersion == 2 && len(p.bitmap) > 0 {
+		if len(p.bitmap)*8 < len(values) {
+			return nil, fmt.Errorf("bitmap too small for %d values", len(values))
+		}
 
 		for i, v := range values {
 			if (p.bitmap[i/8] & (1 << (i % 8))) == 0 {
 				continue
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if p.layoutVersion == 2 && len(p.bitmap) > 0 {
for i, v := range values {
if (p.bitmap[i/8] & (1 << (i % 8))) == 0 {
continue
}
putFloat(p.originalData[idx:idx+unitSize], v)
idx += unitSize
}
p.originalData = p.originalData[:idx]
} else {
for _, v := range values {
putFloat(p.originalData[idx:idx+unitSize], v)
idx += unitSize
}
}
// ─────────────────────────────
// Step 2: layout-2 payload handling
// ─────────────────────────────
if p.layoutVersion == 2 {
// prepend bitmap to payload if present
if len(p.bitmap) > 0 {
p.bitmapMeta = p.bitmapMeta | 1<<3 // bitmapPresent = 1
tmp := make([]byte, 0, len(p.bitmap)+len(p.originalData))
tmp = append(tmp, p.bitmap...)
tmp = append(tmp, p.originalData...)
p.originalData = tmp
}
// append bitmapMeta to header
if len(p.buf) != PSDBLayout1LengthBytes {
return nil, fmt.Errorf("invalid base header length for layout-2")
}
p.buf = append(p.buf, p.bitmapMeta)
}
if p.layoutVersion == 2 && len(p.bitmap) > 0 {
if len(p.bitmap)*8 < len(values) {
return nil, fmt.Errorf("bitmap too small for %d values", len(values))
}
for i, v := range values {
if (p.bitmap[i/8] & (1 << (i % 8))) == 0 {
continue
}
putFloat(p.originalData[idx:idx+unitSize], v)
idx += unitSize
}
p.originalData = p.originalData[:idx]
} else {
for _, v := range values {
putFloat(p.originalData[idx:idx+unitSize], v)
idx += unitSize
}
}
// ─────────────────────────────
// Step 2: layout-2 payload handling
// ─────────────────────────────
if p.layoutVersion == 2 {
// prepend bitmap to payload if present
if len(p.bitmap) > 0 {
p.bitmapMeta = p.bitmapMeta | 1<<3 // bitmapPresent = 1
tmp := make([]byte, 0, len(p.bitmap)+len(p.originalData))
tmp = append(tmp, p.bitmap...)
tmp = append(tmp, p.originalData...)
p.originalData = tmp
}
// append bitmapMeta to header
if len(p.buf) != PSDBLayout1LengthBytes {
return nil, fmt.Errorf("invalid base header length for layout-2")
}
p.buf = append(p.buf, p.bitmapMeta)
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@online-feature-store/internal/data/blocks/perm_storage_datablock_v2.go`
around lines 268 - 305, The loop uses p.bitmap[i/8] without bounds checking
which can panic if the bitmap is too short; before the loop in the
p.layoutVersion == 2 && len(p.bitmap) > 0 branch compute needed :=
(len(values)+7)/8 and validate len(p.bitmap) >= needed, and if not return a
clear error (e.g., fmt.Errorf("invalid bitmap length: need %d bytes", needed));
add this guard so indexing p.bitmap[i/8] is always safe.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (4)
online-feature-store/internal/data/blocks/cache_storage_datablock_v2_test.go (1)

67-67: ⚠️ Potential issue | 🟠 Major

Use actual feature counts and correctly sized numeric defaults.

Line 67 (and the listed lines) still pass hardcoded 3 and []byte{0,0,0} into GetNumericScalarFeature. This can hide bitmap/default-path defects and is byte-width incorrect for types like int32/int64/fp64.

Suggested fix pattern
- feature, err := ddb.GetNumericScalarFeature(pos, 3, []byte{0, 0, 0})
+ numFeatures := /* test-specific actual feature count */
+ defaultValue := make([]byte, ddb.DataType.Size())
+ feature, err := ddb.GetNumericScalarFeature(pos, numFeatures, defaultValue)

Also applies to: 124-124, 279-279, 336-336, 492-492, 549-549, 705-705, 762-762, 917-917, 978-978, 1146-1146, 1203-1203, 1359-1359, 1416-1416

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@online-feature-store/internal/data/blocks/cache_storage_datablock_v2_test.go`
at line 67, Replace the hardcoded element count and 3-byte zero default in calls
to GetNumericScalarFeature with the actual feature count and a zero byte slice
sized to the feature's numeric width: compute the correct count (use the test's
existing variable that holds the feature/vector length instead of the literal 3)
and create the default bytes with make([]byte, numericWidth) where numericWidth
is derived from the feature's configured byte width (or the helper that returns
element width) before calling ddb.GetNumericScalarFeature(i, count,
defaultBytes). Update all occurrences that currently pass 3 and []byte{0,0,0} so
the second argument and the default slice size match the real feature count and
numeric byte width.
online-feature-store/internal/data/blocks/deserialized_psdb_v2_test.go (1)

394-394: ⚠️ Potential issue | 🟡 Minor

Use real numFeatures and type-sized defaultValue buffers in numeric scalar tests.

These calls still use hardcoded 3 and []byte{0,0,0}. That masks layout-2/default behavior and is size-invalid for types like int64/fp64.

Suggested test pattern
- feature, err := d.GetNumericScalarFeature(pos, 3, []byte{0, 0, 0})
+ numFeatures := len(expectedValues)
+ defaultValue := make([]byte, types.DataTypeInt32.Size()) // use the test's concrete dtype
+ feature, err := d.GetNumericScalarFeature(pos, numFeatures, defaultValue)

Also applies to: 566-566, 590-590, 614-614, 638-638, 662-662, 999-999, 1104-1104, 1131-1131, 1158-1158, 1185-1185, 1212-1212

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@online-feature-store/internal/data/blocks/deserialized_psdb_v2_test.go` at
line 394, Several tests call GetNumericScalarFeature with hardcoded numFeatures
(3) and a 3-byte default ([]byte{0,0,0}), which hides layout-2/default behavior
and is invalid for wider types; update each call (e.g., the call in
deserialized_psdb_v2_test.go where feature, err :=
d.GetNumericScalarFeature(pos, 3, []byte{0,0,0})) to pass the actual numFeatures
variable used in the test and a defaultValue buffer sized to the numeric type
(e.g., defaultBytes := make([]byte, dtype.ByteSize()) or make([]byte,
expectedTypeSize) and then d.GetNumericScalarFeature(pos, numFeatures,
defaultBytes)); apply the same change to the other listed call sites so
defaultValue length matches the scalar byte-size and numFeatures reflects the
test's configured feature count.
online-feature-store/internal/data/blocks/perm_storage_datablock_v2.go (1)

253-257: ⚠️ Potential issue | 🟠 Major

Validate bitmap length before bitmap indexing in layout-2 serializers.

p.bitmap[i/8] is used in multiple layout-2 branches without checking required bitmap size. A short bitmap will panic.

Suggested guard pattern
+ needed := (len(values) + 7) / 8
+ if len(p.bitmap) < needed {
+   return nil, fmt.Errorf("invalid bitmap length: need %d bytes, got %d", needed, len(p.bitmap))
+ }
  for i, v := range values {
    if (p.bitmap[i/8] & (1 << (i % 8))) == 0 {
      continue
    }

Also applies to: 302-304, 343-345, 384-385, 424-425, 464-465, 509-510, 604-605, 657-658, 710-711, 762-763, 813-814, 865-866, 928-929, 1002-1003

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@online-feature-store/internal/data/blocks/perm_storage_datablock_v2.go`
around lines 253 - 257, The code uses p.bitmap[i/8] inside layoutVersion==2
branches (e.g., the loop over values where p.bitmap[(i/8)] & (1<<(i%8)) is
checked) without validating that p.bitmap has at least (i/8)+1 bytes, which can
panic on a short bitmap; add a guard before indexing (for example ensure
len(p.bitmap) > i/8) and skip or handle the missing-bit case consistently; apply
the same validation to every layout-2 bitmap access sites in this file (the
other occurrences around the indicated branches) so all loops that reference
p.bitmap check bitmap length before using p.bitmap[i/8].
online-feature-store/internal/data/blocks/deserialized_psdb_v2.go (1)

253-263: ⚠️ Potential issue | 🟠 Major

Add explicit pos bounds checks before bitmap/vector indexing.

Bitmap-aware getters compute bit/byte indexes and read vectorLengths[pos] without first validating pos. Invalid positions can panic instead of returning an error.

Suggested guard pattern
+ if pos < 0 || pos >= numFeatures {
+   return nil, fmt.Errorf("position out of bounds")
+ }
  bitmapSize := (numFeatures + 7) / 8

For vector getters, validate against len(vectorLengths) before reading vectorLengths[pos].

Also applies to: 347-357, 449-463, 519-529, 573-576

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@online-feature-store/internal/data/blocks/deserialized_psdb_v2.go` around
lines 253 - 263, Add explicit bounds checks for pos before any bitmap/vector
indexing: validate pos >= 0 (if signed) and pos < len(vectorLengths) at the
start of bitmap-aware getters (e.g. the block using d.LayoutVersion,
bitmapPresentMask, bitmapSize, bitmap, dense, byteIdx, bitIdx and before reading
vectorLengths[pos]); if out of range return a descriptive error rather than
allowing a panic. Apply the same guard where vectorLengths[pos] is used in the
other blocks referenced (around lines 347-357, 449-463, 519-529, 573-576) so you
check pos against len(bitmap) and len(vectorLengths) before computing
byteIdx/bitIdx or indexing the vectors.
🧹 Nitpick comments (1)
online-feature-store/internal/data/blocks/layout_comparison_test.go (1)

385-385: Consider deterministic report metadata to reduce churn.

Using time.Now() in generated artifacts causes noisy, non-semantic diffs whenever tests are rerun. Prefer stable metadata (or omit timestamp) if these files are committed.

Also applies to: 400-400

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@online-feature-store/internal/data/blocks/layout_comparison_test.go` at line
385, Replace the non-deterministic time.Now() in the report metadata call so
test output is stable: update the b.WriteString(fmt.Sprintf("**Generated:**
%s\n", time.Now().Format("2006-01-02 15:04:05"))) call to use a deterministic
value (e.g. a fixed timestamp string or a test-controlled variable like
generatedTimestamp) or omit the timestamp entirely; change every similar
occurrence (the same b.WriteString + fmt.Sprintf pattern) so the generated
artifacts no longer embed the current time.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@online-feature-store/internal/data/blocks/layout_comparison_test.go`:
- Around line 497-505: The code currently computes validCases as len(results)-1
which assumes exactly one zero-default scenario; replace this with a dynamic
count of non-default cases by iterating over results and counting entries that
are not the 0% default (e.g., check a field or metric on each result that
indicates default/zero) and use that count when computing averages for
totalOriginalReduction and totalCompressedReduction; also guard against zero
validCases to avoid divide-by-zero and adjust the fmt.Fprintf blocks (the uses
of validCases, results, betterCount, totalOriginalReduction,
totalCompressedReduction) to only print averages when validCases > 0 and
otherwise print an appropriate message or zeros.
- Around line 701-733: generateSparseData is unused causing a staticcheck U1000;
wire it into the FP32 test path instead of leaving it dead. Find the FP32 branch
in layout_comparison_test.go where sparse float32 test data is currently created
inline and replace that inline generation with a call to
generateSparseData(numFeatures, defaultRatio), using the returned []float32 and
[]byte for the test inputs (preserve any downstream variable names or adapt as
needed); alternatively if you prefer removal, delete the generateSparseData
function entirely to eliminate the dead code.

In `@online-feature-store/internal/data/blocks/perm_storage_datablock_v2.go`:
- Around line 1020-1025: The code currently truncates the last partial byte for
non-byte-aligned bool vectors by doing p.originalData = p.originalData[:idx];
change the slice to include the partial byte: compute the actual byte count
(e.g., byteCount := (bitLen+7)/8 or use idx+1 when there are remaining bits) and
set p.originalData = p.originalData[:byteCount] (ensure byteCount ≤
len(p.originalData)), then rebuild tmp from p.bitmap and p.originalData and call
encodeData(p, enc) as before; update the tmp capacity calculation to use
len(p.bitmap)+len(p.originalData). Ensure this change is applied in the layout-2
bool vector serialization branch where p.originalData, idx, p.bitmap, and
encodeData are used.

In `@online-feature-store/internal/system/system.go`:
- Around line 1094-1095: In GetFP32Vector (where fp32Vectors[meta.Sequence] is
set), replace the incorrect FP16 decoder call
ByteOrder.FP16Vector(meta.DefaultValuesInBytes) with the FP32 decoder (e.g.,
ByteOrder.FP32Vector or the library's FP32 decoding function) so default FP32
vectors are decoded with the correct type; also update the identical occurrences
around the other block (the spots referenced at lines ~1103-1104) to use the
FP32 decoder instead of FP16 to ensure correct default vector and bitmap
behavior.

---

Duplicate comments:
In
`@online-feature-store/internal/data/blocks/cache_storage_datablock_v2_test.go`:
- Line 67: Replace the hardcoded element count and 3-byte zero default in calls
to GetNumericScalarFeature with the actual feature count and a zero byte slice
sized to the feature's numeric width: compute the correct count (use the test's
existing variable that holds the feature/vector length instead of the literal 3)
and create the default bytes with make([]byte, numericWidth) where numericWidth
is derived from the feature's configured byte width (or the helper that returns
element width) before calling ddb.GetNumericScalarFeature(i, count,
defaultBytes). Update all occurrences that currently pass 3 and []byte{0,0,0} so
the second argument and the default slice size match the real feature count and
numeric byte width.

In `@online-feature-store/internal/data/blocks/deserialized_psdb_v2_test.go`:
- Line 394: Several tests call GetNumericScalarFeature with hardcoded
numFeatures (3) and a 3-byte default ([]byte{0,0,0}), which hides
layout-2/default behavior and is invalid for wider types; update each call
(e.g., the call in deserialized_psdb_v2_test.go where feature, err :=
d.GetNumericScalarFeature(pos, 3, []byte{0,0,0})) to pass the actual numFeatures
variable used in the test and a defaultValue buffer sized to the numeric type
(e.g., defaultBytes := make([]byte, dtype.ByteSize()) or make([]byte,
expectedTypeSize) and then d.GetNumericScalarFeature(pos, numFeatures,
defaultBytes)); apply the same change to the other listed call sites so
defaultValue length matches the scalar byte-size and numFeatures reflects the
test's configured feature count.

In `@online-feature-store/internal/data/blocks/deserialized_psdb_v2.go`:
- Around line 253-263: Add explicit bounds checks for pos before any
bitmap/vector indexing: validate pos >= 0 (if signed) and pos <
len(vectorLengths) at the start of bitmap-aware getters (e.g. the block using
d.LayoutVersion, bitmapPresentMask, bitmapSize, bitmap, dense, byteIdx, bitIdx
and before reading vectorLengths[pos]); if out of range return a descriptive
error rather than allowing a panic. Apply the same guard where
vectorLengths[pos] is used in the other blocks referenced (around lines 347-357,
449-463, 519-529, 573-576) so you check pos against len(bitmap) and
len(vectorLengths) before computing byteIdx/bitIdx or indexing the vectors.

In `@online-feature-store/internal/data/blocks/perm_storage_datablock_v2.go`:
- Around line 253-257: The code uses p.bitmap[i/8] inside layoutVersion==2
branches (e.g., the loop over values where p.bitmap[(i/8)] & (1<<(i%8)) is
checked) without validating that p.bitmap has at least (i/8)+1 bytes, which can
panic on a short bitmap; add a guard before indexing (for example ensure
len(p.bitmap) > i/8) and skip or handle the missing-bit case consistently; apply
the same validation to every layout-2 bitmap access sites in this file (the
other occurrences around the indicated branches) so all loops that reference
p.bitmap check bitmap length before using p.bitmap[i/8].

---

Nitpick comments:
In `@online-feature-store/internal/data/blocks/layout_comparison_test.go`:
- Line 385: Replace the non-deterministic time.Now() in the report metadata call
so test output is stable: update the b.WriteString(fmt.Sprintf("**Generated:**
%s\n", time.Now().Format("2006-01-02 15:04:05"))) call to use a deterministic
value (e.g. a fixed timestamp string or a test-controlled variable like
generatedTimestamp) or omit the timestamp entirely; change every similar
occurrence (the same b.WriteString + fmt.Sprintf pattern) so the generated
artifacts no longer embed the current time.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between af24654 and 272b8db.

📒 Files selected for processing (12)
  • online-feature-store/internal/data/blocks/cache_storage_datablock_v2_test.go
  • online-feature-store/internal/data/blocks/deserialized_psdb_v2.go
  • online-feature-store/internal/data/blocks/deserialized_psdb_v2_test.go
  • online-feature-store/internal/data/blocks/layout_comparison_results.md
  • online-feature-store/internal/data/blocks/layout_comparison_results.txt
  • online-feature-store/internal/data/blocks/layout_comparison_test.go
  • online-feature-store/internal/data/blocks/perm_storage_datablock_v2.go
  • online-feature-store/internal/data/blocks/perm_storage_datablock_v2_test.go
  • online-feature-store/internal/handler/feature/persist.go
  • online-feature-store/internal/handler/feature/persist_test.go
  • online-feature-store/internal/handler/feature/retrieve.go
  • online-feature-store/internal/system/system.go
✅ Files skipped from review due to trivial changes (1)
  • online-feature-store/internal/data/blocks/perm_storage_datablock_v2_test.go

Comment on lines +701 to +733
func generateSparseData(numFeatures int, defaultRatio float64) ([]float32, []byte) {
rand.Seed(time.Now().UnixNano())

data := make([]float32, numFeatures)
bitmap := make([]byte, (numFeatures+7)/8)

numDefaults := int(float64(numFeatures) * defaultRatio)

// Create a list of indices
indices := make([]int, numFeatures)
for i := range indices {
indices[i] = i
}

// Shuffle indices
rand.Shuffle(len(indices), func(i, j int) {
indices[i], indices[j] = indices[j], indices[i]
})

// Set first numDefaults indices to 0.0 (default), rest to random values
for i := 0; i < numFeatures; i++ {
idx := indices[i]
if i < numDefaults {
data[idx] = 0.0
// bitmap bit remains 0
} else {
data[idx] = rand.Float32()
bitmap[idx/8] |= 1 << (idx % 8)
}
}

return data, bitmap
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Fix staticcheck blocker: generateSparseData is dead code.

Line 701 introduces an unused function, and CI is already failing with U1000.

One minimal fix option (use the helper in FP32 path)
-	case types.DataTypeFP32, types.DataTypeFP16:
+	case types.DataTypeFP32:
+		data, bitmap := generateSparseData(numFeatures, defaultRatio)
+		return &sparseDataResult{data: data, bitmap: bitmap, nonZeroCount: nonZeroCount}, nil
+	case types.DataTypeFP16:
 		data := make([]float32, numFeatures)
 		for i := 0; i < numFeatures; i++ {
 			idx := indices[i]
 			if i < numDefaults {
 				data[idx] = 0.0
🧰 Tools
🪛 GitHub Actions: Online Feature Store CI

[error] 701-701: staticcheck: internal/data/blocks/layout_comparison_test.go:701:6: func generateSparseData is unused (U1000)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@online-feature-store/internal/data/blocks/layout_comparison_test.go` around
lines 701 - 733, generateSparseData is unused causing a staticcheck U1000; wire
it into the FP32 test path instead of leaving it dead. Find the FP32 branch in
layout_comparison_test.go where sparse float32 test data is currently created
inline and replace that inline generation with a call to
generateSparseData(numFeatures, defaultRatio), using the returned []float32 and
[]byte for the test inputs (preserve any downstream variable names or adapt as
needed); alternatively if you prefer removal, delete the generateSparseData
function entirely to eliminate the dead code.

Comment on lines +1020 to +1025
p.originalData = p.originalData[:idx]
tmp := make([]byte, 0, len(p.bitmap)+len(p.originalData))
tmp = append(tmp, p.bitmap...)
tmp = append(tmp, p.originalData...)
p.originalData = tmp
return encodeData(p, enc)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Fix bool-vector dense payload truncation for non-byte-aligned vectors.

In layout-2 bool vector serialization, p.originalData = p.originalData[:idx] drops the last partial byte when total bits are not a multiple of 8.

Suggested fix
- p.originalData = p.originalData[:idx]
+ bytesUsed := idx
+ if shift != 7 { // partially used current byte
+   bytesUsed++
+ }
+ p.originalData = p.originalData[:bytesUsed]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
p.originalData = p.originalData[:idx]
tmp := make([]byte, 0, len(p.bitmap)+len(p.originalData))
tmp = append(tmp, p.bitmap...)
tmp = append(tmp, p.originalData...)
p.originalData = tmp
return encodeData(p, enc)
bytesUsed := idx
if shift != 7 { // partially used current byte
bytesUsed++
}
p.originalData = p.originalData[:bytesUsed]
tmp := make([]byte, 0, len(p.bitmap)+len(p.originalData))
tmp = append(tmp, p.bitmap...)
tmp = append(tmp, p.originalData...)
p.originalData = tmp
return encodeData(p, enc)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@online-feature-store/internal/data/blocks/perm_storage_datablock_v2.go`
around lines 1020 - 1025, The code currently truncates the last partial byte for
non-byte-aligned bool vectors by doing p.originalData = p.originalData[:idx];
change the slice to include the partial byte: compute the actual byte count
(e.g., byteCount := (bitLen+7)/8 or use idx+1 when there are remaining bits) and
set p.originalData = p.originalData[:byteCount] (ensure byteCount ≤
len(p.originalData)), then rebuild tmp from p.bitmap and p.originalData and call
encodeData(p, enc) as before; update the tmp capacity calculation to use
len(p.bitmap)+len(p.originalData). Ensure this change is applied in the layout-2
bool vector serialization branch where p.originalData, idx, p.bitmap, and
encodeData are used.

Comment on lines 1094 to 1095
fp32Vectors[meta.Sequence] = ByteOrder.FP16Vector(meta.DefaultValuesInBytes)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Use FP32 decoding for FP32 vector defaults (not FP16).

GetFP32Vector currently decodes defaults via ByteOrder.FP16Vector, which is type-mismatched and will produce incorrect default vectors/bitmap decisions.

Suggested fix
- fp32Vectors[meta.Sequence] = ByteOrder.FP16Vector(meta.DefaultValuesInBytes)
+ fp32Vectors[meta.Sequence] = ByteOrder.Float32Vector(meta.DefaultValuesInBytes)

- defaultVec := ByteOrder.FP16Vector(meta.DefaultValuesInBytes)
+ defaultVec := ByteOrder.Float32Vector(meta.DefaultValuesInBytes)

Also applies to: 1103-1104

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@online-feature-store/internal/system/system.go` around lines 1094 - 1095, In
GetFP32Vector (where fp32Vectors[meta.Sequence] is set), replace the incorrect
FP16 decoder call ByteOrder.FP16Vector(meta.DefaultValuesInBytes) with the FP32
decoder (e.g., ByteOrder.FP32Vector or the library's FP32 decoding function) so
default FP32 vectors are decoded with the correct type; also update the
identical occurrences around the other block (the spots referenced at lines
~1103-1104) to use the FP32 decoder instead of FP16 to ensure correct default
vector and bitmap behavior.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (3)
online-feature-store/internal/data/blocks/layout_comparison_test.go (2)

727-759: ⚠️ Potential issue | 🔴 Critical

generateSparseData is still unused — CI pipeline is blocked.

The function at Line 727 is unreferenced and staticcheck emits U1000, failing CI. This was flagged in a previous review cycle and remains unresolved.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@online-feature-store/internal/data/blocks/layout_comparison_test.go` around
lines 727 - 759, The function generateSparseData is unused and triggers
staticcheck U1000; remove the entire generateSparseData function declaration
from layout_comparison_test.go (or if you intended to keep it, add a direct,
meaningful usage such as calling generateSparseData(...) within an existing test
or a small helper test function so it’s referenced), ensuring references are
updated (search for generateSparseData) and run go vet/staticcheck to confirm
the U1000 warning is gone.

523-523: ⚠️ Potential issue | 🟡 Minor

Hardcoded validCases denominator still incorrect.

validCases := len(results) - 1 assumes exactly one 0%-default entry. With the current 65-scenario run, there are 23 zero-default entries, so averages are divided by 64 instead of 42, silently skewing the output in both the .txt and .md artifact files.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@online-feature-store/internal/data/blocks/layout_comparison_test.go` at line
523, Replace the hardcoded denominator validCases := len(results) - 1 with a
computed count that filters out all entries that represent the 0% defaults case
(i.e., iterate results and increment validCases only for entries where the
defaults/defaultPercent field is non-zero); update any uses of validCases (e.g.,
average calculations) to use this filtered count so the average divides by the
actual number of non-zero-default scenarios. Use the existing results slice and
its field that indicates default percentage (e.g., DefaultPercent/Defaults) to
identify zero-default entries and compute validCases accordingly.
online-feature-store/internal/data/blocks/layout_comparison_results.txt (1)

1453-1458: ⚠️ Potential issue | 🟡 Minor

Average improvement figures are computed with the wrong denominator.

The "Average Improvements (excluding 0% defaults)" block uses validCases = len(results) - 1 = 64 as the divisor, but there are 23 zero-default scenarios in this run (42 actual valid cases). Both the original-size and compressed-size average reductions shown here are understated as a result. These numbers will self-correct once the validCases computation in generateResultsFile is fixed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@online-feature-store/internal/data/blocks/layout_comparison_results.txt`
around lines 1453 - 1458, The average-improvement denominators are computed
incorrectly in generateResultsFile: validCases is set to len(results) - 1
instead of counting only non-zero-default scenarios, causing understated
averages; update generateResultsFile to compute validCases by filtering results
for entries that are considered "valid" (e.g., where the default reduction/
delta is non-zero or a specific flag on each result indicates non-default) and
use that count when dividing to compute Original Size and Compressed Size
average reductions; ensure you reference the same filtered collection when
summing reductions so the numerator and denominator match (use the results
filter rather than len(results)-1) and keep the special-case exclusion logic
consistent with how zero-default scenarios are detected.
🧹 Nitpick comments (1)
online-feature-store/internal/data/blocks/layout_comparison_test.go (1)

1114-1126: printComparison silently no-ops on type-assertion failure.

tc interface{} is asserted to a specific anonymous struct. Any future caller that uses a structurally identical but separately declared anonymous struct, or any change to the struct fields, will cause ok == false and the function will silently return nothing — with no compilation error. Consider defining a named struct type for the test case (which already has a clear shape in TestLayout1VsLayout2Compression) and passing that directly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@online-feature-store/internal/data/blocks/layout_comparison_test.go` around
lines 1114 - 1126, The function printComparison currently type-asserts tc
interface{} to an anonymous struct and silently returns if the assertion fails;
to fix, define a named test-case type (e.g., type LayoutTestCase struct { name
string; numFeatures int; defaultRatio float64; dataType types.DataType;
compressionType compression.Type; expectedImprovement string }) matching the
shape used in TestLayout1VsLayout2Compression, update printComparison signature
to accept that named LayoutTestCase (instead of interface{}) and update all
callers (including TestLayout1VsLayout2Compression) to pass the typed value so
the runtime type-assertion is removed and compile-time checks ensure
correctness.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@online-feature-store/internal/data/blocks/layout_comparison_results.md`:
- Line 193: The Key Findings text is reporting the raw percentages instead of
the actual sparsity ratios used after truncating default counts; update
generateResultsMarkdown to compute the real ratios from the integer default
counts (use the same truncated values derived from defaultRatiosForCatalog) and
render those computed ratios in the "With ..." sentence (or alternatively change
the wording template to say “various tested defaults”); locate and adjust code
in generateResultsMarkdown where defaultRatiosForCatalog is read and the summary
sentence is formed so the displayed percentages match the post-truncation,
per-feature-group ratios actually used in the tests.

In `@online-feature-store/internal/data/blocks/layout_comparison_test.go`:
- Line 728: Remove the deprecated global seeding calls in
layout_comparison_test.go (the rand.Seed(time.Now().UnixNano()) calls around the
test setup) because Go auto-seeds the global source; delete those lines or
replace them with a local source if deterministic/non-global randomness is
required (use rand.New(rand.NewSource(...)) and update any uses that expect the
global rand to use that local instance instead).
- Around line 548-556: The Conclusion block uses
totalCompressedReduction/float64(validCases) without checking validCases,
causing a divide-by-zero when validCases==0; update the code around the
Conclusion (the fmt.Fprintf calls that reference validCases,
totalCompressedReduction, maxOriginalReduction, betterCount, results, and f) to
compute averageCompressedReduction and maxOriginalReduction conditionally: if
validCases>0 calculate and print the numeric percentages, otherwise print a safe
placeholder like "N/A" or "0.00%" (or move the whole Conclusion inside the
existing if validCases>0 guard) so no division by zero can occur and the results
file is still written when validCases==0.

---

Duplicate comments:
In `@online-feature-store/internal/data/blocks/layout_comparison_results.txt`:
- Around line 1453-1458: The average-improvement denominators are computed
incorrectly in generateResultsFile: validCases is set to len(results) - 1
instead of counting only non-zero-default scenarios, causing understated
averages; update generateResultsFile to compute validCases by filtering results
for entries that are considered "valid" (e.g., where the default reduction/
delta is non-zero or a specific flag on each result indicates non-default) and
use that count when dividing to compute Original Size and Compressed Size
average reductions; ensure you reference the same filtered collection when
summing reductions so the numerator and denominator match (use the results
filter rather than len(results)-1) and keep the special-case exclusion logic
consistent with how zero-default scenarios are detected.

In `@online-feature-store/internal/data/blocks/layout_comparison_test.go`:
- Around line 727-759: The function generateSparseData is unused and triggers
staticcheck U1000; remove the entire generateSparseData function declaration
from layout_comparison_test.go (or if you intended to keep it, add a direct,
meaningful usage such as calling generateSparseData(...) within an existing test
or a small helper test function so it’s referenced), ensuring references are
updated (search for generateSparseData) and run go vet/staticcheck to confirm
the U1000 warning is gone.
- Line 523: Replace the hardcoded denominator validCases := len(results) - 1
with a computed count that filters out all entries that represent the 0%
defaults case (i.e., iterate results and increment validCases only for entries
where the defaults/defaultPercent field is non-zero); update any uses of
validCases (e.g., average calculations) to use this filtered count so the
average divides by the actual number of non-zero-default scenarios. Use the
existing results slice and its field that indicates default percentage (e.g.,
DefaultPercent/Defaults) to identify zero-default entries and compute validCases
accordingly.

---

Nitpick comments:
In `@online-feature-store/internal/data/blocks/layout_comparison_test.go`:
- Around line 1114-1126: The function printComparison currently type-asserts tc
interface{} to an anonymous struct and silently returns if the assertion fails;
to fix, define a named test-case type (e.g., type LayoutTestCase struct { name
string; numFeatures int; defaultRatio float64; dataType types.DataType;
compressionType compression.Type; expectedImprovement string }) matching the
shape used in TestLayout1VsLayout2Compression, update printComparison signature
to accept that named LayoutTestCase (instead of interface{}) and update all
callers (including TestLayout1VsLayout2Compression) to pass the typed value so
the runtime type-assertion is removed and compile-time checks ensure
correctness.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 272b8db and ff6223e.

📒 Files selected for processing (3)
  • online-feature-store/internal/data/blocks/layout_comparison_results.md
  • online-feature-store/internal/data/blocks/layout_comparison_results.txt
  • online-feature-store/internal/data/blocks/layout_comparison_test.go


// generateSparseData creates test data with specified sparsity (default ratio) for FP32
func generateSparseData(numFeatures int, defaultRatio float64) ([]float32, []byte) {
rand.Seed(time.Now().UnixNano())
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the Go module version
fd go.mod --exec grep '^go ' {}

Repository: Meesho/BharatMLStack

Length of output: 144


🏁 Script executed:

#!/bin/bash
# Read the specific file at the mentioned lines to verify the content
if [ -f "online-feature-store/internal/data/blocks/layout_comparison_test.go" ]; then
  echo "=== Lines 720-740 (around line 728) ==="
  sed -n '720,740p' online-feature-store/internal/data/blocks/layout_comparison_test.go
  echo ""
  echo "=== Lines 765-780 (around line 773) ==="
  sed -n '765,780p' online-feature-store/internal/data/blocks/layout_comparison_test.go
else
  echo "File not found, searching for it..."
  fd layout_comparison_test.go --exec cat {}
fi

Repository: Meesho/BharatMLStack

Length of output: 1471


🏁 Script executed:

#!/bin/bash
# Search for all rand.Seed calls in the file to understand usage pattern
rg "rand\.Seed" online-feature-store/internal/data/blocks/layout_comparison_test.go -n

Repository: Meesho/BharatMLStack

Length of output: 139


Remove deprecated rand.Seed calls — automatic seeding is enabled in Go 1.20+.

rand.Seed(time.Now().UnixNano()) at lines 728 and 773 is deprecated in Go 1.24. The global random source is automatically seeded; these calls are no-ops and should be removed. Alternatively, if determinism per call is needed, use rand.New(rand.NewSource(time.Now().UnixNano())) for a local source.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@online-feature-store/internal/data/blocks/layout_comparison_test.go` at line
728, Remove the deprecated global seeding calls in layout_comparison_test.go
(the rand.Seed(time.Now().UnixNano()) calls around the test setup) because Go
auto-seeds the global source; delete those lines or replace them with a local
source if deterministic/non-global randomness is required (use
rand.New(rand.NewSource(...)) and update any uses that expect the global rand to
use that local instance instead).

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (3)
online-feature-store/internal/data/blocks/layout_comparison_test.go (3)

781-781: ⚠️ Potential issue | 🟡 Minor

Remove deprecated rand.Seed call.

rand.Seed(time.Now().UnixNano()) is deprecated since Go 1.20—the global random source is automatically seeded. This call is a no-op and should be removed.

🧹 Proposed fix
 func generateSparseDataByType(dataType types.DataType, numFeatures int, defaultRatio float64) (*sparseDataResult, error) {
-	rand.Seed(time.Now().UnixNano())
 	bitmap := make([]byte, (numFeatures+7)/8)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@online-feature-store/internal/data/blocks/layout_comparison_test.go` at line
781, Remove the deprecated seeding call by deleting the
rand.Seed(time.Now().UnixNano()) invocation in the test; the global random
source is auto-seeded in Go 1.20+, so locate the rand.Seed(...) line (in
layout_comparison_test.go) and remove it to eliminate the no-op.

734-767: ⚠️ Potential issue | 🔴 Critical

Remove unused generateSparseData function — CI is failing.

This function is unused (staticcheck U1000) and blocking the pipeline. generateSparseDataByType handles all data types including FP32, making this function redundant.

🐛 Proposed fix: delete the unused function
-// generateSparseData creates test data with specified sparsity (default ratio) for FP32
-func generateSparseData(numFeatures int, defaultRatio float64) ([]float32, []byte) {
-	rand.Seed(time.Now().UnixNano())
-
-	data := make([]float32, numFeatures)
-	bitmap := make([]byte, (numFeatures+7)/8)
-
-	numDefaults := int(float64(numFeatures) * defaultRatio)
-
-	// Create a list of indices
-	indices := make([]int, numFeatures)
-	for i := range indices {
-		indices[i] = i
-	}
-
-	// Shuffle indices
-	rand.Shuffle(len(indices), func(i, j int) {
-		indices[i], indices[j] = indices[j], indices[i]
-	})
-
-	// Set first numDefaults indices to 0.0 (default), rest to random values
-	for i := 0; i < numFeatures; i++ {
-		idx := indices[i]
-		if i < numDefaults {
-			data[idx] = 0.0
-			// bitmap bit remains 0
-		} else {
-			data[idx] = rand.Float32()
-			bitmap[idx/8] |= 1 << (idx % 8)
-		}
-	}
-
-	return data, bitmap
-}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@online-feature-store/internal/data/blocks/layout_comparison_test.go` around
lines 734 - 767, The function generateSparseData is unused and triggers
staticcheck U1000; remove the entire generateSparseData function declaration
(including its signature and body) so that generateSparseDataByType remains the
single implementation handling FP32 and other types; search for
generateSparseData to ensure no callers exist before deleting.

556-564: ⚠️ Potential issue | 🔴 Critical

Divide-by-zero: validCases used unconditionally in Conclusion block.

Line 560 divides by validCases outside the if validCases > 0 guard (which ends at line 548). If there are no valid cases, this causes a runtime panic and the results file is never written.

🐛 Proposed fix: guard the division
 	fmt.Fprintf(f, "✅ Layout2 should be used as the default layout version.\n\n")
 	fmt.Fprintf(f, "Rationale:\n")
 	fmt.Fprintf(f, "  • Consistent improvements in %d out of %d scenarios (%.1f%%)\n",
 		betterCount, len(results), float64(betterCount)/float64(len(results))*100)
-	fmt.Fprintf(f, "  • Average compressed size reduction: %.2f%%\n", totalCompressedReduction/float64(validCases))
+	if validCases > 0 {
+		fmt.Fprintf(f, "  • Average compressed size reduction: %.2f%%\n", totalCompressedReduction/float64(validCases))
+	} else {
+		fmt.Fprintf(f, "  • Average compressed size reduction: N/A (no non-zero default cases)\n")
+	}
 	fmt.Fprintf(f, "  • Maximum original size reduction: %.2f%%\n", maxOriginalReduction)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@online-feature-store/internal/data/blocks/layout_comparison_test.go` around
lines 556 - 564, The Conclusion block unconditionally divides by validCases
(totalCompressedReduction/float64(validCases)) which can panic when validCases
== 0; modify the code in the Conclusion/printing section so it checks if
validCases > 0 before performing the division and printing compressed-size
statistics (totalCompressedReduction and maxOriginalReduction), otherwise print
a sensible fallback (e.g., "N/A" or "no valid cases") for the average/max
reduction lines; ensure you still print the other lines that don't depend on
validCases (betterCount, results, sparsity note) so the results file is always
written.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@online-feature-store/internal/data/blocks/layout_comparison_test.go`:
- Line 781: Remove the deprecated seeding call by deleting the
rand.Seed(time.Now().UnixNano()) invocation in the test; the global random
source is auto-seeded in Go 1.20+, so locate the rand.Seed(...) line (in
layout_comparison_test.go) and remove it to eliminate the no-op.
- Around line 734-767: The function generateSparseData is unused and triggers
staticcheck U1000; remove the entire generateSparseData function declaration
(including its signature and body) so that generateSparseDataByType remains the
single implementation handling FP32 and other types; search for
generateSparseData to ensure no callers exist before deleting.
- Around line 556-564: The Conclusion block unconditionally divides by
validCases (totalCompressedReduction/float64(validCases)) which can panic when
validCases == 0; modify the code in the Conclusion/printing section so it checks
if validCases > 0 before performing the division and printing compressed-size
statistics (totalCompressedReduction and maxOriginalReduction), otherwise print
a sensible fallback (e.g., "N/A" or "no valid cases") for the average/max
reduction lines; ensure you still print the other lines that don't depend on
validCases (betterCount, results, sparsity note) so the results file is always
written.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ff6223e and 7b8bc3d.

📒 Files selected for processing (3)
  • online-feature-store/internal/data/blocks/layout_comparison_results.md
  • online-feature-store/internal/data/blocks/layout_comparison_results.txt
  • online-feature-store/internal/data/blocks/layout_comparison_test.go
✅ Files skipped from review due to trivial changes (1)
  • online-feature-store/internal/data/blocks/layout_comparison_results.md

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
online-feature-store/internal/data/blocks/layout_comparison_test.go (2)

669-702: ⚠️ Potential issue | 🔴 Critical

Remove or wire generateSparseData to unblock staticcheck.

generateSparseData is currently dead code and CI is already failing with U1000. Please either delete it or reuse it in the FP32 path.

💡 Minimal fix (delete unused helper)
-// generateSparseData creates test data with specified sparsity (default ratio) for FP32
-func generateSparseData(numFeatures int, defaultRatio float64) ([]float32, []byte) {
-	rand.Seed(time.Now().UnixNano())
-
-	data := make([]float32, numFeatures)
-	bitmap := make([]byte, (numFeatures+7)/8)
-
-	numDefaults := int(float64(numFeatures) * defaultRatio)
-
-	// Create a list of indices
-	indices := make([]int, numFeatures)
-	for i := range indices {
-		indices[i] = i
-	}
-
-	// Shuffle indices
-	rand.Shuffle(len(indices), func(i, j int) {
-		indices[i], indices[j] = indices[j], indices[i]
-	})
-
-	// Set first numDefaults indices to 0.0 (default), rest to random values
-	for i := 0; i < numFeatures; i++ {
-		idx := indices[i]
-		if i < numDefaults {
-			data[idx] = 0.0
-			// bitmap bit remains 0
-		} else {
-			data[idx] = rand.Float32()
-			bitmap[idx/8] |= 1 << (idx % 8)
-		}
-	}
-
-	return data, bitmap
-}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@online-feature-store/internal/data/blocks/layout_comparison_test.go` around
lines 669 - 702, The helper generateSparseData is unused and triggers
staticcheck U1000; either delete generateSparseData entirely, or wire it into
the FP32 test path by replacing the inline FP32 data-generation code with a call
to generateSparseData(numFeatures, defaultRatio) and use its returned data and
bitmap values (data, bitmap) where the test currently constructs fp32 values and
bitmaps; reference the generateSparseData function name and the returned symbols
data and bitmap when making the change.

498-501: ⚠️ Potential issue | 🟠 Major

Guard conclusion percentages against zero denominators.

Line 499 and Line 500 can emit +Inf/NaN when results or validCases is zero, which makes the report invalid.

✅ Suggested fix
+	improvementPct := 0.0
+	if len(results) > 0 {
+		improvementPct = float64(betterCount) / float64(len(results)) * 100
+	}
+	avgCompressedReduction := 0.0
+	if validCases > 0 {
+		avgCompressedReduction = totalCompressedReduction / float64(validCases)
+	}
 	fmt.Fprintf(f, "Rationale:\n")
 	fmt.Fprintf(f, "  • Consistent improvements in %d out of %d scenarios (%.1f%%)\n",
-		betterCount, len(results), float64(betterCount)/float64(len(results))*100)
-	fmt.Fprintf(f, "  • Average compressed size reduction: %.2f%%\n", totalCompressedReduction/float64(validCases))
+		betterCount, len(results), improvementPct)
+	fmt.Fprintf(f, "  • Average compressed size reduction: %.2f%%\n", avgCompressedReduction)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@online-feature-store/internal/data/blocks/layout_comparison_test.go` around
lines 498 - 501, The Fprintf lines compute percentages using len(results) and
validCases which can be zero; update the reporting to guard against
division-by-zero in the fmt.Fprintf calls: for the consistency percentage (using
betterCount and results) check if len(results) == 0 and emit a safe value or
"N/A" (or 0.0) instead of performing betterCount/len(results), and for the
average compressed size reduction check if validCases == 0 and emit a safe value
instead of totalCompressedReduction/validCases; modify the code paths that call
fmt.Fprintf (the three lines referencing betterCount, results,
totalCompressedReduction, validCases, and maxOriginalReduction) to compute the
displayed percentage into a local variable after these guards and pass that
variable to fmt.Fprintf.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@online-feature-store/internal/data/blocks/layout_comparison_test.go`:
- Around line 502-503: The test output embeds hardcoded rationale numbers in the
fmt.Fprintf calls (the two lines that print "  • Minimal overhead (3.5%%)..."
and "  • Production ML feature vectors typically have 20-95%% sparsity"), so
change these to not assert fixed claims: either compute and format the
percentages from the test's actual sample/sparsity data (use the test variables
that produce the layout metrics) or remove/replace the lines with a generic
statement without numeric literals; update the fmt.Fprintf invocations in
layout_comparison_test.go to reference the computed variables or a non-numeric
message so generated output reflects current results rather than hardcoded
numbers.
- Around line 420-424: The test currently prints "expected for 0% defaults" for
every regression; update the else branch that prints the failure message (the
fmt.Fprintf call inside the if r.IsLayout2Better {...} else {...} block) to
conditionally choose the message based on the record's default rate: if
r.DefaultRate == 0 print the original "expected for 0% defaults" text, otherwise
print a different message indicating an unexpected regression that includes the
actual default rate (e.g. "unexpected regression for X% defaults"). Ensure you
modify the fmt.Fprintf call and reference r.DefaultRate (or the appropriate
field that holds the default-percentage) so the output is accurate for
non-zero-default cases.

---

Duplicate comments:
In `@online-feature-store/internal/data/blocks/layout_comparison_test.go`:
- Around line 669-702: The helper generateSparseData is unused and triggers
staticcheck U1000; either delete generateSparseData entirely, or wire it into
the FP32 test path by replacing the inline FP32 data-generation code with a call
to generateSparseData(numFeatures, defaultRatio) and use its returned data and
bitmap values (data, bitmap) where the test currently constructs fp32 values and
bitmaps; reference the generateSparseData function name and the returned symbols
data and bitmap when making the change.
- Around line 498-501: The Fprintf lines compute percentages using len(results)
and validCases which can be zero; update the reporting to guard against
division-by-zero in the fmt.Fprintf calls: for the consistency percentage (using
betterCount and results) check if len(results) == 0 and emit a safe value or
"N/A" (or 0.0) instead of performing betterCount/len(results), and for the
average compressed size reduction check if validCases == 0 and emit a safe value
instead of totalCompressedReduction/validCases; modify the code paths that call
fmt.Fprintf (the three lines referencing betterCount, results,
totalCompressedReduction, validCases, and maxOriginalReduction) to compute the
displayed percentage into a local variable after these guards and pass that
variable to fmt.Fprintf.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7b8bc3d and 486d25d.

📒 Files selected for processing (2)
  • online-feature-store/internal/data/blocks/layout_comparison_results.txt
  • online-feature-store/internal/data/blocks/layout_comparison_test.go

Comment on lines +420 to +424
if r.IsLayout2Better {
fmt.Fprintf(f, " Result: ✅ Layout2 is BETTER\n")
} else {
fmt.Fprintf(f, " Result: ⚠️ Layout2 has overhead (expected for 0%% defaults)\n")
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Use conditional failure messaging instead of always “expected for 0% defaults”.

Line 423 labels all regressions as expected 0%-default overhead, which is inaccurate for non-zero-default regressions.

✏️ Suggested fix
 			if r.IsLayout2Better {
 				fmt.Fprintf(f, "     Result: ✅ Layout2 is BETTER\n")
 			} else {
-				fmt.Fprintf(f, "     Result: ⚠️  Layout2 has overhead (expected for 0%% defaults)\n")
+				if r.DefaultRatio == 0 {
+					fmt.Fprintf(f, "     Result: ⚠️  Layout2 has overhead (expected for 0%% defaults)\n")
+				} else {
+					fmt.Fprintf(f, "     Result: ❌ Layout2 is worse for this defaults ratio\n")
+				}
 			}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if r.IsLayout2Better {
fmt.Fprintf(f, " Result: ✅ Layout2 is BETTER\n")
} else {
fmt.Fprintf(f, " Result: ⚠️ Layout2 has overhead (expected for 0%% defaults)\n")
}
if r.IsLayout2Better {
fmt.Fprintf(f, " Result: ✅ Layout2 is BETTER\n")
} else {
if r.DefaultRatio == 0 {
fmt.Fprintf(f, " Result: ⚠️ Layout2 has overhead (expected for 0%% defaults)\n")
} else {
fmt.Fprintf(f, " Result: ❌ Layout2 is worse for this defaults ratio\n")
}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@online-feature-store/internal/data/blocks/layout_comparison_test.go` around
lines 420 - 424, The test currently prints "expected for 0% defaults" for every
regression; update the else branch that prints the failure message (the
fmt.Fprintf call inside the if r.IsLayout2Better {...} else {...} block) to
conditionally choose the message based on the record's default rate: if
r.DefaultRate == 0 print the original "expected for 0% defaults" text, otherwise
print a different message indicating an unexpected regression that includes the
actual default rate (e.g. "unexpected regression for X% defaults"). Ensure you
modify the fmt.Fprintf call and reference r.DefaultRate (or the appropriate
field that holds the default-percentage) so the output is accurate for
non-zero-default cases.

Comment on lines +502 to +503
fmt.Fprintf(f, " • Minimal overhead (3.5%%) only in edge case with 0%% defaults\n")
fmt.Fprintf(f, " • Production ML feature vectors typically have 20-95%% sparsity\n")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Avoid hardcoded rationale numbers in generated output.

Line 502 and Line 503 embed fixed claims (3.5%, 20-95%) that are not derived from current results and can become incorrect.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@online-feature-store/internal/data/blocks/layout_comparison_test.go` around
lines 502 - 503, The test output embeds hardcoded rationale numbers in the
fmt.Fprintf calls (the two lines that print "  • Minimal overhead (3.5%%)..."
and "  • Production ML feature vectors typically have 20-95%% sparsity"), so
change these to not assert fixed claims: either compute and format the
percentages from the test's actual sample/sparsity data (use the test variables
that produce the layout metrics) or remove/replace the lines with a generic
statement without numeric literals; update the fmt.Fprintf invocations in
layout_comparison_test.go to reference the computed variables or a non-numeric
message so generated output reflects current results rather than hardcoded
numbers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants