feat(loader): update loading logic & enhance failure case#704
feat(loader): update loading logic & enhance failure case#704imbajin merged 36 commits intoapache:masterfrom
Conversation
[pull] master from apache:master
[pull] master from apache:master
…ader, and refactored the failure-handling logic for batch inserts.
…ader, and refactored the failure-handling logic for batch inserts.
hugegraph-loader/src/main/java/org/apache/hugegraph/loader/task/TaskManager.java
Outdated
Show resolved
Hide resolved
hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java
Show resolved
Hide resolved
hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java
Outdated
Show resolved
Hide resolved
| validateWith = {PositiveValidator.class}, | ||
| description = "The number of parallel read pipelines. " + | ||
| "Default: auto max(2, cpu/2). " + | ||
| "Must be >= 1") |
There was a problem hiding this comment.
从固定值 1 变更为 Math.max(2, CPUS/2) 是一个显著的性能相关变更。
建议:
- 需要在不同CPU配置(2核、4核、8核、16核+)的机器上进行性能测试
- 验证在不同数据规模下的表现(小数据集 vs 大数据集)
- 在PR描述的"Verifying these changes"章节补充性能测试结果
- 考虑是否需要设置一个上限值,避免在高核心数机器上创建过多线程
测试建议:
# 测试不同并行度下的性能
for i in 1 2 4 8; do
time hugegraph-loader --parser-threads $i ...
done
hugegraph-loader/src/main/java/org/apache/hugegraph/loader/HugeGraphLoader.java
Outdated
Show resolved
Hide resolved
hugegraph-loader/src/main/java/org/apache/hugegraph/loader/task/TaskManager.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Pull request overview
This PR updates the HugeGraph Loader by adjusting default parameters for connection pooling and parallel processing, renaming configuration fields for clarity, and refactoring the batch insert failure handling to stop the entire load process instead of falling back to single insert mode.
Changes:
- Refactored batch insert error handling to stop loading on failure instead of falling back to single insert
- Renamed
parallelCounttoparallelThreadswith updated default calculation (max(2, CPUS/2)) - Added validation and improved descriptions for connection pool parameters
- Added GitHub workflow to auto-trigger PR reviews
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| hugegraph-loader/src/main/java/org/apache/hugegraph/loader/task/TaskManager.java | Refactored batch insert error handling to stop loading on failure; added early return when loading is stopped |
| hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java | Renamed parallelCount to parallelThreads, added constants for default connections, improved parameter descriptions, added validators |
| hugegraph-loader/src/main/java/org/apache/hugegraph/loader/HugeGraphLoader.java | Updated references from parallelCount to parallelThreads; removed dynamic calculation logic |
| .github/workflows/auto-pr-review.yml | Added workflow to automatically comment on new PRs to trigger codecov-ai-reviewer |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
hugegraph-loader/src/main/java/org/apache/hugegraph/loader/task/TaskManager.java
Outdated
Show resolved
Hide resolved
hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java
Outdated
Show resolved
Hide resolved
hugegraph-loader/src/main/java/org/apache/hugegraph/loader/task/TaskManager.java
Show resolved
Hide resolved
…ader, and refactored the failure-handling logic for batch inserts.
…ader, and refactored the failure-handling logic for batch inserts.
…ader, and refactored the failure-handling logic for batch inserts.
…ader, and refactored the failure-handling logic for batch inserts.
cc737b8 to
572cc94
Compare
6c5a3ad to
ed816e3
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| "-h", SERVER, | ||
| "--test-mode", "true" | ||
| "--test-mode", "true", | ||
| // FIXME: Set parser-threads to 1 because values > 1 currently trigger a NullPointerException (NPE). |
There was a problem hiding this comment.
The FIXME comment indicates that setting parser-threads to values > 1 triggers a NullPointerException (NPE). This suggests that either: (1) the new default value of parser-threads (max(2, CPUS/2)) may cause NPEs in these test scenarios, or (2) there's an existing bug that's being worked around. If this is a new bug introduced by changing the default value, it should be fixed before merging. If it's an existing known issue, the FIXME should reference an issue tracker number and explain why it's acceptable to work around it in tests.
| // FIXME: Set parser-threads to 1 because values > 1 currently trigger a NullPointerException (NPE). | |
| // FIXME [HUGEGRAPH-XXXX]: Using parser-threads > 1 currently triggers a NullPointerException (NPE) | |
| // when loading multiple CSV files with headers in parallel. Limit to 1 thread here as a temporary | |
| // workaround to keep this functional test stable until the underlying concurrency bug is fixed. |
| "-h", SERVER, | ||
| "--test-mode", "true" | ||
| "--test-mode", "true", | ||
| // FIXME: Set parser-threads to 1 because values > 1 currently trigger a NullPointerException (NPE). |
There was a problem hiding this comment.
The FIXME comment indicates that setting parser-threads to values > 1 triggers a NullPointerException (NPE). This suggests that either: (1) the new default value of parser-threads (max(2, CPUS/2)) may cause NPEs in these test scenarios, or (2) there's an existing bug that's being worked around. If this is a new bug introduced by changing the default value, it should be fixed before merging. If it's an existing known issue, the FIXME should reference an issue tracker number and explain why it's acceptable to work around it in tests.
| // FIXME: Set parser-threads to 1 because values > 1 currently trigger a NullPointerException (NPE). | |
| // Set parser-threads to 1 to avoid a known NullPointerException (NPE) when using multiple parser threads in this scenario. |
hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/functional/FileLoadTest.java
Show resolved
Hide resolved
hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java
Outdated
Show resolved
Hide resolved
hugegraph-loader/src/main/java/org/apache/hugegraph/loader/executor/LoadOptions.java
Show resolved
Hide resolved
hugegraph-loader/src/test/java/org/apache/hugegraph/loader/test/unit/LoadOptionsTest.java
Show resolved
Hide resolved
| LoadOptions options = new LoadOptions(); | ||
|
|
||
| Assert.assertEquals(cpus * 4, options.maxConnections); | ||
| Assert.assertEquals(cpus * 2, options.maxConnectionsPerRoute); |
There was a problem hiding this comment.
Potential overflow in int multiplication before it is converted to long by use in an invocation context.
There was a problem hiding this comment.
PR #704 代码审查
本PR调整了 Loader 的多个默认参数和描述,并重构了批量插入的失败处理逻辑。整体代码质量较好,但发现了几个关键问题需要修复。
‼️ 关键问题(必须修复)
1. 批量失败时存在信号量泄漏
在 TaskManager.java 的 submitBatch 方法中,当 batchFailureFallback=false 时:
} else {
summary.metrics(struct).minusFlighting(batch.size());
this.context.occurredError();
this.context.stopLoading();
Printer.printError("Batch insert %s failed, stop loading.", ...);
}问题: 调用了 minusFlighting 但未调用 batchSemaphore.release(),导致:
- 每次批量失败永久占用一个信号量许可
- 多次失败后新的批量插入将永久阻塞
修复: 在 else 分支末尾添加 this.batchSemaphore.release();
⚠️ 重要问题
2. 错误日志级别不一致
TaskManager.java:180 使用 LOG.error,紧接着 182 行又用 LOG.warn。建议统一为:
- 批量失败但有降级: WARN
- 批量失败且无降级: ERROR
🧹 代码质量建议
3. 日志信息未同步更新
HugeGraphLoader.java 中,参数重命名为 parseThreads 后,日志仍为:
LOG.info("{} threads for loading {} structs...", parseThreads, ...)建议改为:
LOG.info("{} parser threads for loading {} structs...", parseThreads, ...)4. 参数描述过长
--batch-insert-threads 等参数的描述达到 3 行,影响 --help 输出可读性。建议保持描述简洁,详细说明放文档中。
5. 单元测试质量优秀 👍
新增的 LoadOptionsTest 和 TaskManagerTest 覆盖了:
- 连接池自动调整
- 并发场景信号量管理
- 批量失败降级行为
小建议: TaskManagerTest 使用了 sun.misc.Unsafe,在 JDK 9+ 可能有兼容性问题,考虑使用 Mockito。
总结
优点:
- 参数重命名使语义更清晰(
parallelCount→parseThreads) - 添加了
batchFailureFallback开关,提供更灵活的错误处理 - 连接池自动调整逻辑合理
- 测试覆盖全面
必须修复的阻塞问题:
- 多线程解析的 NPE 问题(最高优先级)
- 信号量泄漏
- 动态线程数计算逻辑缺失
建议先修复这些关键问题后再合并。
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #704 +/- ##
=============================================
- Coverage 62.49% 51.52% -10.98%
+ Complexity 1903 973 -930
=============================================
Files 262 111 -151
Lines 9541 5855 -3686
Branches 886 754 -132
=============================================
- Hits 5963 3017 -2946
+ Misses 3190 2576 -614
+ Partials 388 262 -126 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Update required review count and collaborators
Purpose of the PR
Main Changes
Verifying these changes
Does this PR potentially affect the following parts?
Documentation Status
Doc - TODODoc - DoneDoc - No Need