Skip to content

test: add lightweight unit coverage for EnrichmentPipeline #138#142

Open
davide954 wants to merge 4 commits intoDigitalPebble:mainfrom
davide954:test/pipeline-unit-coverage
Open

test: add lightweight unit coverage for EnrichmentPipeline #138#142
davide954 wants to merge 4 commits intoDigitalPebble:mainfrom
davide954:test/pipeline-unit-coverage

Conversation

@davide954
Copy link
Contributor

This PR adds a pure JUnit test (EnrichmentPipelineTest.java) for the EnrichmentPipeline class.

Instead of spinning up a full SparkSession (which is slow and can have path issues on Windows), this test:

  1. Manually creates input Row objects.
  2. Runs them through the pipeline logic.
  3. Verifies the output.

Passes with mvn clean test.

@davide954 davide954 force-pushed the test/pipeline-unit-coverage branch 2 times, most recently from e301671 to c33c78b Compare February 4, 2026 17:51
@jnioche jnioche added the enhancement New feature or request label Feb 4, 2026
@davide954 davide954 force-pushed the test/pipeline-unit-coverage branch from c33c78b to 2b2190a Compare February 4, 2026 22:43
Introduced parameterized tests for negative cases. Added assertions for enriched columns. Used CURColumn constants where available, keeping hardcoded strings for missing fields.
@davide954 davide954 requested a review from jnioche February 5, 2026 10:55
Added missing standard columns to CURColumn to avoid magic strings
@davide954
Copy link
Contributor Author

Addressed the feedback from the previous review.

I updated the test to use @ParameterizedTest with @CsvSource as suggested.

Regarding your comment about CURColumn: I noticed that some standard columns (like dates, costs, and IDs) were missing from the class. I added them to CURColumn.java so we can use the constants in the test instead of hardcoded strings.

@jnioche
Copy link
Member

jnioche commented Feb 5, 2026

Thanks, will review more when I have more time

Regarding your comment about CURColumn: I noticed that some standard columns (like dates, costs, and IDs) were missing from the class. I added them to CURColumn.java so we can use the constants in the test instead of hardcoded strings.

I think we should have in CURColumn only things that are used in the enrichment modules. The tests should also not use fields that are not needed by the modules, likeline_item_unblendedcost - did you want to add those for a specific purpose?

@tejas-2232
Copy link
Contributor

Great work @davide954 🎆
I would like to request some minor changes.

@davide954
Copy link
Contributor Author

Thanks @tejas-2232.
@jnioche I'll stay in touch for your instructions...


Object[] values = new Object[schema.length()];
values[0] = "us-east-1"; // Matches CURColumn.PRODUCT_REGION_CODE
values[1] = lineItemType; // The dynamic parameter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line 55-57.
The hardcoded array indices depend on loop iteration order for enrichment module columns. This looks fragile. what do you think about adding something like this?

int regionIndex = schema.fieldIndex(CURColumn.PRODUCT_REGION_CODE.getLabel());
int lineItemIndex = schema.fieldIndex(CURColumn.LINE_ITEM_TYPE.getLabel());
values[regionIndex] = "us-east-1";
values[lineItemIndex] = lineItemType;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch! That makes perfect sense I'll work on it.

// Verify enrichment logic
for (EnrichmentModule module : config.getModules()) {
for (Column c : module.columnsAdded()) {
int fieldIndex = processedRow.fieldIndex(c.getLabel());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing Null Check on fieldIndex()

fieldIndex() can throw IllegalArgumentException if field doesn't exist.

I would say -> Add try-catch or ensure field exists first

try {
    int fieldIndex = processedRow.fieldIndex(c.getLabel());
    // assertions...
} catch (IllegalArgumentException e) {
    fail("Column " + c.getLabel() + " not found in schema: " + e.getMessage());
}

Copy link
Contributor Author

@davide954 davide954 Feb 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it, but I wonder if the explicit try-catch is necessary here. Since this is a unit test, if fieldIndex() throws an IllegalArgumentException, the test will automatically fail with the exception message (which already identifies the missing field).

I'd prefer to keep the loop clean and let JUnit handle the exception as a test failure, unless you have a specific case in mind where masking the exception is better? In this case enlighten me

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about it, but I wonder if the explicit try-catch is necessary here. Since this is a unit test, if fieldIndex() throws an IllegalArgumentException, the test will automatically fail with the exception message (which already identifies the missing field).

I'd prefer to keep the loop clean and let JUnit handle the exception as a test failure, unless you have a specific case in mind where masking the exception is better? In this case enlighten me

haha, right. I went too far in thinking. That works anytime.

.add(CURColumn.PRODUCT_REGION_CODE.getLabel(), CURColumn.PRODUCT_REGION_CODE.getType(), true)
.add(CURColumn.LINE_ITEM_TYPE.getLabel(), CURColumn.LINE_ITEM_TYPE.getType(), true);

for (EnrichmentModule module : config.getModules()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No Assertion on Module Count:
If config.getModules() returns an empty list then the test passes without actually validating anything.

I would say - you can add below assertion right after pipeline is created.

Let me know what you think? 😃

List<EnrichmentModule> modules = config.getModules();

assertTrue(!modules.isEmpty(), "Config should have at least one enrichment module for this test");
// OR below one 
assertTrue(modules.size() > 0, "Config should have at least one enrichment module for this test");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll push tomorrow! Now I'm signing off as it's almost midnight here in Italy. Thanks for the sharp review @tejas-2232, catch you later!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants