fix(loader): avoid NPE in FileLineFetcher close method & update HDFS test#710
fix(loader): avoid NPE in FileLineFetcher close method & update HDFS test#710bajisn-666 wants to merge 8 commits intoapache:masterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR addresses a NullPointerException issue in FileLineFetcher.java that occurred in multi-threaded environments when parser threads > 1. The fix adds null checks in the fetch(), skipOffset(), and checkMatchHeader() methods to handle cases where the reader might be null during concurrent execution.
Changes:
- Added null safety checks in
FileLineFetcher.javato prevent NPE in multi-threaded scenarios - Updated HDFS tests to use 4 parser threads to verify the fix works under concurrent load
- Refactored
HDFSUtil.javatest utility to simplify HDFS configuration handling - Added
.editorconfigrules for Parquet binary files to prevent editor corruption - Added a test Parquet file
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| hugegraph-loader/src/main/java/org/apache/hugegraph/loader/reader/file/FileLineFetcher.java | Added null checks in fetch(), skipOffset(), and checkMatchHeader() methods to handle null reader scenarios |
| hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/HDFSLoadTest.java | Increased parser thread count to 4 to test concurrent scenarios |
| hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/HDFSUtil.java | Simplified HDFS configuration and refactored utility methods |
| .editorconfig | Added configuration for Parquet files to prevent binary file corruption |
| hugegraph-loader/src/test/resources/parquet_compress_file/vertex_person.parquet | Added test Parquet file for testing |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Fix NPE: check reader again inside loop | ||
| if (this.reader == null) { | ||
| throw new LoadException("Reader is null when skipping offset of file %s", | ||
| readable); | ||
| } |
There was a problem hiding this comment.
Contradictory null checking logic: Line 201 enforces that reader must not be null with E.checkState(this.reader != null, ...), but then lines 206-209 add a null check inside the loop. If the reader can become null during iteration (e.g., in a multi-threaded environment), the initial check at line 201 is insufficient. If it cannot, then the null check inside the loop is unnecessary. Consider either: 1) removing the check at line 201 and relying on the null check in the loop, or 2) using synchronized access if reader can be modified by other threads.
| // Fix NPE: check reader again inside loop | |
| if (this.reader == null) { | |
| throw new LoadException("Reader is null when skipping offset of file %s", | |
| readable); | |
| } |
| // Fix NPE: check if reader is null before reading | ||
| if (this.reader == null) { | ||
| return null; | ||
| } |
There was a problem hiding this comment.
PR title mentions "avoid NPE in FileLineFetcher close method" but no changes were made to the closeReader() method (lines 151-155), which already has a null check for this.reader. The actual changes are in fetch(), skipOffset(), and checkMatchHeader() methods. Consider updating the PR title to accurately reflect what was changed, e.g., "fix: add null checks in FileLineFetcher fetch/skipOffset/checkMatchHeader methods".
| "-g", GRAPH, | ||
| "-h", SERVER, | ||
| "--batch-insert-threads", "2", | ||
| "--parser-threads", "4", // <--- 【修改点 1】增加并发解析线程数 |
There was a problem hiding this comment.
Comment contains Chinese text in a codebase that appears to use English for comments. For consistency, consider translating to English, e.g., "Modification point 1: Increase concurrent parser thread count".
| "-g", GRAPH, | ||
| "-h", SERVER, | ||
| "--batch-insert-threads", "2", | ||
| "--parser-threads", "4", // <--- 【修改点 2】增加并发解析线程数 |
There was a problem hiding this comment.
Comment contains Chinese text in a codebase that appears to use English for comments. For consistency, consider translating to English, e.g., "Modification point 2: Increase concurrent parser thread count".
| try { | ||
| FileUtil.copy(new File(srcPath), this.hdfs, new Path(destPath), | ||
| false, this.conf); | ||
| // 通常测试场景是将本地文件上传到 HDFS |
There was a problem hiding this comment.
Comment contains Chinese text in a codebase that appears to use English for comments. For consistency, consider translating to English, e.g., "Usually in test scenarios, upload local files to HDFS".
| // 通常测试场景是将本地文件上传到 HDFS | |
| // Usually in test scenarios, upload local files to HDFS |
| indent_size = 4 | ||
|
|
||
| # --------------------------------------------------------- | ||
| # 关键修改:防止编辑器破坏 Parquet 二进制文件 |
There was a problem hiding this comment.
Comment contains Chinese text. For consistency with the rest of the file and international collaboration, consider using English, e.g., "Key change: Prevent editor from corrupting Parquet binary files".
| # 关键修改:防止编辑器破坏 Parquet 二进制文件 | |
| # Key change: Prevent editor from corrupting Parquet binary files |
| import org.apache.hugegraph.loader.reader.line.Line; | ||
| import org.apache.hugegraph.loader.reader.line.LineFetcher; | ||
| import org.apache.hugegraph.loader.source.file.Compression; | ||
| import org.apache.hugegraph.loader.source.file.FileFormat; | ||
| import org.apache.hugegraph.loader.source.file.FileSource; | ||
| import org.slf4j.Logger; | ||
|
|
||
| import org.apache.hugegraph.loader.parser.CsvLineParser; | ||
| import org.apache.hugegraph.loader.parser.JsonLineParser; | ||
| import org.apache.hugegraph.loader.parser.LineParser; | ||
| import org.apache.hugegraph.loader.parser.TextLineParser; | ||
| import org.apache.hugegraph.loader.reader.Readable; | ||
| import org.apache.hugegraph.loader.reader.line.Line; | ||
| import org.apache.hugegraph.loader.reader.line.LineFetcher; | ||
| import org.apache.hugegraph.loader.source.file.Compression; | ||
| import org.apache.hugegraph.loader.source.file.FileFormat; | ||
| import org.apache.hugegraph.loader.source.file.FileSource; | ||
| import org.apache.hugegraph.util.E; | ||
| import org.apache.hugegraph.util.Log; | ||
| import org.slf4j.Logger; |
There was a problem hiding this comment.
use style file to avoid code format diff
https://github.com/apache/incubator-hugegraph-toolchain/blob/master/.editorconfig (use it to override original IDE format)
| throw new LoadException("Reader is null when skipping offset of file %s", | ||
| readable); |
There was a problem hiding this comment.
hit 100 chars in one line here?
There was a problem hiding this comment.
seems the judgement logic in method is not precise & duplicated
maybe we have a better solution for it
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #710 +/- ##
=============================================
- Coverage 62.49% 51.49% -11.01%
+ Complexity 1903 973 -930
=============================================
Files 262 111 -151
Lines 9541 5861 -3680
Branches 886 755 -131
=============================================
- Hits 5963 3018 -2945
+ Misses 3190 2580 -610
+ Partials 388 263 -125 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
What is the purpose of the change
This PR fixes a
NullPointerExceptioninFileLineFetcher.java. The NPE occurred when theclose()method was called, but thereaderwas null. This usually happens in multi-threaded environments or when file initialization fails.Also updated
HDFSLoadTest.javato use 4 threads to verify the fix in a concurrent scenario.Related issues
Fixes #706