Skip to content

Comments

fix(loader): avoid NPE in FileLineFetcher close method & update HDFS test#710

Open
bajisn-666 wants to merge 8 commits intoapache:masterfrom
bajisn-666:fix-issue-706
Open

fix(loader): avoid NPE in FileLineFetcher close method & update HDFS test#710
bajisn-666 wants to merge 8 commits intoapache:masterfrom
bajisn-666:fix-issue-706

Conversation

@bajisn-666
Copy link

What is the purpose of the change
This PR fixes a NullPointerException in FileLineFetcher.java. The NPE occurred when the close() method was called, but the reader was null. This usually happens in multi-threaded environments or when file initialization fails.

Also updated HDFSLoadTest.java to use 4 threads to verify the fix in a concurrent scenario.

Related issues

Fixes #706

@dosubot dosubot bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Jan 24, 2026
@github-actions github-actions bot added the loader hugegraph-loader label Jan 24, 2026
@dosubot dosubot bot added the bug Something isn't working label Jan 24, 2026
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. labels Jan 24, 2026
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a NullPointerException issue in FileLineFetcher.java that occurred in multi-threaded environments when parser threads > 1. The fix adds null checks in the fetch(), skipOffset(), and checkMatchHeader() methods to handle cases where the reader might be null during concurrent execution.

Changes:

  • Added null safety checks in FileLineFetcher.java to prevent NPE in multi-threaded scenarios
  • Updated HDFS tests to use 4 parser threads to verify the fix works under concurrent load
  • Refactored HDFSUtil.java test utility to simplify HDFS configuration handling
  • Added .editorconfig rules for Parquet binary files to prevent editor corruption
  • Added a test Parquet file

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
hugegraph-loader/src/main/java/org/apache/hugegraph/loader/reader/file/FileLineFetcher.java Added null checks in fetch(), skipOffset(), and checkMatchHeader() methods to handle null reader scenarios
hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/HDFSLoadTest.java Increased parser thread count to 4 to test concurrent scenarios
hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/HDFSUtil.java Simplified HDFS configuration and refactored utility methods
.editorconfig Added configuration for Parquet files to prevent binary file corruption
hugegraph-loader/src/test/resources/parquet_compress_file/vertex_person.parquet Added test Parquet file for testing

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +205 to +209
// Fix NPE: check reader again inside loop
if (this.reader == null) {
throw new LoadException("Reader is null when skipping offset of file %s",
readable);
}
Copy link

Copilot AI Jan 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Contradictory null checking logic: Line 201 enforces that reader must not be null with E.checkState(this.reader != null, ...), but then lines 206-209 add a null check inside the loop. If the reader can become null during iteration (e.g., in a multi-threaded environment), the initial check at line 201 is insufficient. If it cannot, then the null check inside the loop is unnecessary. Consider either: 1) removing the check at line 201 and relying on the null check in the loop, or 2) using synchronized access if reader can be modified by other threads.

Suggested change
// Fix NPE: check reader again inside loop
if (this.reader == null) {
throw new LoadException("Reader is null when skipping offset of file %s",
readable);
}

Copilot uses AI. Check for mistakes.
Comment on lines +160 to +163
// Fix NPE: check if reader is null before reading
if (this.reader == null) {
return null;
}
Copy link

Copilot AI Jan 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR title mentions "avoid NPE in FileLineFetcher close method" but no changes were made to the closeReader() method (lines 151-155), which already has a null check for this.reader. The actual changes are in fetch(), skipOffset(), and checkMatchHeader() methods. Consider updating the PR title to accurately reflect what was changed, e.g., "fix: add null checks in FileLineFetcher fetch/skipOffset/checkMatchHeader methods".

Copilot uses AI. Check for mistakes.
"-g", GRAPH,
"-h", SERVER,
"--batch-insert-threads", "2",
"--parser-threads", "4", // <--- 【修改点 1】增加并发解析线程数
Copy link

Copilot AI Jan 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment contains Chinese text in a codebase that appears to use English for comments. For consistency, consider translating to English, e.g., "Modification point 1: Increase concurrent parser thread count".

Copilot uses AI. Check for mistakes.
"-g", GRAPH,
"-h", SERVER,
"--batch-insert-threads", "2",
"--parser-threads", "4", // <--- 【修改点 2】增加并发解析线程数
Copy link

Copilot AI Jan 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment contains Chinese text in a codebase that appears to use English for comments. For consistency, consider translating to English, e.g., "Modification point 2: Increase concurrent parser thread count".

Copilot uses AI. Check for mistakes.
try {
FileUtil.copy(new File(srcPath), this.hdfs, new Path(destPath),
false, this.conf);
// 通常测试场景是将本地文件上传到 HDFS
Copy link

Copilot AI Jan 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment contains Chinese text in a codebase that appears to use English for comments. For consistency, consider translating to English, e.g., "Usually in test scenarios, upload local files to HDFS".

Suggested change
// 通常测试场景是将本地文件上传到 HDFS
// Usually in test scenarios, upload local files to HDFS

Copilot uses AI. Check for mistakes.
indent_size = 4

# ---------------------------------------------------------
# 关键修改:防止编辑器破坏 Parquet 二进制文件
Copy link

Copilot AI Jan 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment contains Chinese text. For consistency with the rest of the file and international collaboration, consider using English, e.g., "Key change: Prevent editor from corrupting Parquet binary files".

Suggested change
# 关键修改:防止编辑器破坏 Parquet 二进制文件
# Key change: Prevent editor from corrupting Parquet binary files

Copilot uses AI. Check for mistakes.
Comment on lines -40 to +52
import org.apache.hugegraph.loader.reader.line.Line;
import org.apache.hugegraph.loader.reader.line.LineFetcher;
import org.apache.hugegraph.loader.source.file.Compression;
import org.apache.hugegraph.loader.source.file.FileFormat;
import org.apache.hugegraph.loader.source.file.FileSource;
import org.slf4j.Logger;

import org.apache.hugegraph.loader.parser.CsvLineParser;
import org.apache.hugegraph.loader.parser.JsonLineParser;
import org.apache.hugegraph.loader.parser.LineParser;
import org.apache.hugegraph.loader.parser.TextLineParser;
import org.apache.hugegraph.loader.reader.Readable;
import org.apache.hugegraph.loader.reader.line.Line;
import org.apache.hugegraph.loader.reader.line.LineFetcher;
import org.apache.hugegraph.loader.source.file.Compression;
import org.apache.hugegraph.loader.source.file.FileFormat;
import org.apache.hugegraph.loader.source.file.FileSource;
import org.apache.hugegraph.util.E;
import org.apache.hugegraph.util.Log;
import org.slf4j.Logger;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use style file to avoid code format diff

https://github.com/apache/incubator-hugegraph-toolchain/blob/master/.editorconfig (use it to override original IDE format)

Comment on lines +207 to +208
throw new LoadException("Reader is null when skipping offset of file %s",
readable);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hit 100 chars in one line here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems the judgement logic in method is not precise & duplicated

maybe we have a better solution for it

@codecov
Copy link

codecov bot commented Jan 24, 2026

Codecov Report

❌ Patch coverage is 25.00000% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 51.49%. Comparing base (b066b80) to head (c6a2b0d).
⚠️ Report is 66 commits behind head on master.

Files with missing lines Patch % Lines
.../hugegraph/loader/reader/file/FileLineFetcher.java 25.00% 4 Missing and 2 partials ⚠️

❗ There is a different number of reports uploaded between BASE (b066b80) and HEAD (c6a2b0d). Click for more details.

HEAD has 1 upload less than BASE
Flag BASE (b066b80) HEAD (c6a2b0d)
2 1
Additional details and impacted files
@@              Coverage Diff              @@
##             master     #710       +/-   ##
=============================================
- Coverage     62.49%   51.49%   -11.01%     
+ Complexity     1903      973      -930     
=============================================
  Files           262      111      -151     
  Lines          9541     5861     -3680     
  Branches        886      755      -131     
=============================================
- Hits           5963     3018     -2945     
+ Misses         3190     2580      -610     
+ Partials        388      263      -125     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working loader hugegraph-loader size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Unit tests fail with NPE in HDFS-Tests (loader)

2 participants