DAOS-18367 vos: properly evict object for failed transaction #17325

Nasf-Fan · 2025-12-29T11:37:29Z

Currently, if a transaction fails, the cleanup logic tries to evict the
related vos objects from the cache to avoid leaving stale information
there. That works well on systems with PMEM, but under md-on-ssd mode
the eviction may cause trouble: one vos modification may hold the same
object multiple times, and the CPU may yield between these holds. That
creates race windows for other concurrent operations against the same
object.

This patch changes the logic: when a transaction modifies vos
object(s), it records the related oid(s). If the transaction fails
later, cleanup evicts only these modified objects. Other objects in
the cache are not affected by transaction cleanup.
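
The idea above can be sketched roughly as follows. This is only an illustration, not the patch itself: `struct cached_obj`, `struct tx_record`, `tx_record_oid()`, `tx_cleanup_failed()` and the fixed `MAX_OBJS` limit are hypothetical names invented for this sketch and do not come from the vos code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OBJS 8 /* hypothetical limit, chosen only for the sketch */

/* Hypothetical stand-in for an object in the vos object cache. */
struct cached_obj {
	uint64_t oid;
	bool     evicted;
};

/* Hypothetical per-transaction record of the OIDs it has modified. */
struct tx_record {
	uint64_t oids[MAX_OBJS];
	size_t   nr;
};

/* Record an OID once. Returns false only if the record is full. */
static bool
tx_record_oid(struct tx_record *tx, uint64_t oid)
{
	assert(tx != NULL);
	for (size_t i = 0; i < tx->nr; i++) {
		if (tx->oids[i] == oid)
			return true; /* already recorded */
	}
	if (tx->nr == MAX_OBJS)
		return false;
	tx->oids[tx->nr++] = oid;
	return true;
}

/*
 * Cleanup for a failed transaction: mark as evicted only the cached
 * objects whose OID was recorded. Returns the number of objects evicted.
 */
static size_t
tx_cleanup_failed(const struct tx_record *tx, struct cached_obj *cache,
		  size_t cache_nr)
{
	size_t evicted = 0;

	assert(tx != NULL);
	for (size_t i = 0; i < cache_nr; i++) {
		for (size_t j = 0; j < tx->nr; j++) {
			if (cache[i].oid == tx->oids[j] && !cache[i].evicted) {
				cache[i].evicted = true;
				evicted++;
				break;
			}
		}
	}
	return evicted;
}
```

In this sketch, an object that the failed transaction only held but never modified is never recorded, so cleanup leaves it in the cache.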

In addition, under md-on-ssd mode the CPU may yield during backend TX
start, and an object held by the current modification may be marked as
evicted in that race window. So add logic to check, after the backend
TX has started, whether the related object has been evicted; if so,
restart the current transaction.
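
A minimal sketch of that check, under stated assumptions: `struct held_obj`, `tx_begin_fn`, `tx_begin_checked()`, the two demo callbacks and the `SKETCH_TX_RESTART` value are all hypothetical and stand in for whatever the real vos code uses. The backend TX start is modeled as a callback that may let another operation run before it returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical return code meaning "restart the transaction". */
#define SKETCH_TX_RESTART (-1)

/* Hypothetical stand-in for an object held by the current modification. */
struct held_obj {
	bool evicted;
};

/* Hypothetical backend TX start; it may yield before returning. */
typedef int (*tx_begin_fn)(void *arg);

/*
 * Start the backend TX, then re-check every held object. If any of them
 * was marked as evicted while the start yielded, ask the caller to restart.
 */
static int
tx_begin_checked(tx_begin_fn begin, void *arg,
		 struct held_obj *const *objs, size_t nr)
{
	int rc = begin(arg);

	if (rc != 0)
		return rc;
	for (size_t i = 0; i < nr; i++) {
		if (objs[i]->evicted)
			return SKETCH_TX_RESTART;
	}
	return 0;
}

/* Demo callback: TX start with no concurrent activity. */
static int
begin_quiet(void *arg)
{
	(void)arg;
	return 0;
}

/* Demo callback: a concurrent operation evicts the object during the yield. */
static int
begin_with_race(void *arg)
{
	((struct held_obj *)arg)->evicted = true;
	return 0;
}
```

A caller would typically release its holds and retry the whole modification when it sees the restart code, rather than continue with an object that is no longer valid in the cache.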

Signed-off-by: Fan Yong <fan.yong@hpe.com>

Steps for the author:

Commit message follows the guidelines.
Appropriate Features or Test-tag pragmas were used.
Appropriate Functional Test Stages were run.
At least two positive code reviews including at least one code owner from each category referenced in the PR.
Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

Gatekeeper requested (daos-gatekeeper added as a reviewer).

github-actions · 2025-12-29T11:37:47Z

Ticket title is 'Enhance dtx_act_ent_cleanup() to only evict self-created object when transaction failure'
Status is 'In Progress'
Labels: 'scrubbed_2.8'
https://daosio.atlassian.net/browse/DAOS-18367

daosbuild3 · 2025-12-29T12:34:22Z

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17325/1/testReport/

daosbuild3 · 2025-12-29T18:25:29Z

Test stage Functional Hardware Large MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17325/1/execution/node/1282/log

daosbuild3 · 2025-12-30T04:06:55Z

Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17325/1/execution/node/1367/log

daosbuild3 · 2025-12-30T08:12:08Z

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17325/2/testReport/

daosbuild3 · 2025-12-30T17:03:12Z

Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17325/2/execution/node/1345/log

daosbuild3 · 2025-12-30T18:34:08Z

Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17325/2/execution/node/1365/log

Nasf-Fan · 2026-01-01T03:36:36Z

Test stage Functional Hardware Medium MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17325/2/execution/node/1345/log

dmg_telemetry_io_basic.py failed because of DAOS-18388, which is unrelated to this patch; to be retested.


daosbuild3 · 2026-01-01T06:06:39Z

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17325/3/testReport/

daosbuild3 · 2026-01-01T13:04:12Z

Test stage Functional Hardware Medium Verbs Provider MD on SSD completed with status FAILURE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net//job/daos-stack/job/daos/view/change-requests/job/PR-17325/3/execution/node/1364/log

daosbuild3 · 2026-01-01T15:13:07Z

Test stage Functional Hardware Medium MD on SSD completed with status UNSTABLE. https://jenkins-3.daos.hpc.amslabs.hpecorp.net/job/daos-stack/job/daos//view/change-requests/job/PR-17325/3/testReport/

Nasf-Fan force-pushed the Nasf-Fan/DAOS-18367_1 branch from 0747c8a to 6413f26 Compare December 30, 2025 07:19

Nasf-Fan changed the title ~~DAOS-18367 vos: evict self-created object when transaction failure~~ DAOS-18367 vos: properly evict object for failed transaction Dec 30, 2025

Nasf-Fan force-pushed the Nasf-Fan/DAOS-18367_1 branch from 6413f26 to 96e5af4 Compare January 1, 2026 05:14
