lustre-dirty-blockmap

A patch for the Lustre 2.15.8 llite client that tracks which 2 GB-aligned blocks of a large file have been written. The bitmap is persisted in the extended attribute user.dirty_blockmap and survives unmount/remount.

Motivation

Large Lustre files (HPC workloads, checkpoints, scientific datasets) are often written in partial passes: only certain regions of a multi-terabyte file change between runs. Without block-level write tracking, downstream consumers (backup, tiering, integrity checking) must either scan the entire file or rely on coarse-grained modification timestamps.

user.dirty_blockmap gives any userspace tool a compact, persistent bitmap of exactly which 2 GB regions have ever been written, at negligible runtime cost.

Design

| Property | Value |
|---|---|
| Block size | 2 GB (binary units: 2 × 1024³ bytes) |
| Minimum tracked file size | 2 GB (smaller files: no bitmap, no overhead) |
| Maximum tracked file size | 1 PB (larger files: `-EFBIG`) |
| Maximum blocks | 524,288 (1 PB ÷ 2 GB) |
| Bitmap storage | 8,192 × `__u64` = 64 KB (fits in a single xattr value) |
| xattr name | `user.dirty_blockmap` |
| Encoding | Raw little-endian `uint64_t` array, no header |
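
The figures in the table can be re-derived with a few lines of arithmetic. This is only a sanity check, and it assumes GB and PB are binary units (the reader script under Reading the Bitmap uses 2 * 1024**3 for the block size); the variable names are illustrative, not taken from the patch.

```python
# Sizing arithmetic for the Design table, assuming binary units.
GIB = 1024**3
PIB = 1024**5

BLOCK_SIZE = 2 * GIB       # one bit per 2 GB block
MAX_FILE_SIZE = 1 * PIB    # larger files are rejected with -EFBIG
BITS_PER_WORD = 64         # each bitmap word is a __u64

max_blocks = MAX_FILE_SIZE // BLOCK_SIZE
bitmap_words = max_blocks // BITS_PER_WORD
bitmap_bytes = bitmap_words * 8

print(max_blocks, bitmap_words, bitmap_bytes)  # 524288 8192 65536
```

The resulting 65,536 bytes is exactly the Linux VFS ceiling for a single xattr value (XATTR_SIZE_MAX), which is why 1 PB is a natural upper bound at this block size.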

If bit b of word w is set, 2 GB block number w × 64 + b has been written at least once. Block numbering starts at 0, so block 0 covers bytes [0, 2 GB) and block n covers bytes [n × 2 GB, (n + 1) × 2 GB).
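
The mapping between byte offsets, block numbers, and (word, bit) positions can be sketched as follows. The helper names are hypothetical and exist only for illustration; the block size is assumed to match the kernel module's constant.

```python
BLOCK_SIZE = 2 * 1024**3  # 2 GB, assumed to match the kernel module


def block_to_word_bit(block):
    """Return (word index, bit index) for a block number."""
    return block // 64, block % 64


def blocks_for_range(offset, length):
    """Return the block numbers touched by `length` bytes written at `offset`."""
    if length <= 0:
        return range(0)
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE  # last byte written, inclusive
    return range(first, last + 1)


def is_block_dirty(words, block):
    """Test one bit in a sequence of 64-bit words (e.g. from struct.unpack)."""
    w, b = block_to_word_bit(block)
    return w < len(words) and bool((words[w] >> b) & 1)
```

For example, a 2-byte write starting at the last byte of block 0 touches blocks 0 and 1, and block 65 lives in word 1, bit 1.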

Files Changed

| File | Change |
|---|---|
| `lustre/llite/dirty_blockmap.c` | New file: all bitmap logic |
| `lustre/llite/llite_internal.h` | Constants, `struct ll_dirty_blockmap`, `lli_dirty_blockmap` field, declarations |
| `lustre/llite/file.c` | Hooks in `ll_file_open`, `ll_file_write_iter`, `ll_file_release`, `ll_fsync` |
| `lustre/llite/vvp_io.c` | Periodic flush hook in `vvp_io_rw_end` (CIT_WRITE) |
| `lustre/llite/llite_lib.c` | Persist + free on inode eviction in `ll_clear_inode` |
| `lustre/llite/Makefile.in` | Add `dirty_blockmap.o` to the kernel module build |

Lifecycle

open()
  └─ ll_file_open()
       └─ ll_dirty_blockmap_alloc()   if file >= 2 GB at open time
            (fresh zeroed bitmap — no xattr load)

write()
  ├─ ll_file_write_iter()
  │    ├─ lazy alloc if no bitmap yet and (ki_pos >= 2 GB or
  │    │  i_size_read() >= 2 GB); handles dd O_TRUNC and writes
  │    │  at low offsets into large sparse files
  │    └─ ll_dirty_blockmap_mark()    set bits for written byte range
  └─ vvp_io_rw_end()  [CIT_WRITE, after each IO completion]
       └─ ll_dirty_blockmap_store()   periodic flush if dbm_dirty
            (same cadence as mtime updates; survives long-running opens)

close()
  └─ ll_file_release()
       └─ ll_dirty_blockmap_store()   persist xattr if bitmap is dirty

fsync()
  └─ ll_fsync()
       └─ ll_dirty_blockmap_store()   persist xattr if bitmap is dirty

inode eviction
  └─ ll_clear_inode()
       ├─ ll_dirty_blockmap_store()   final persist
       └─ ll_dirty_blockmap_free()    release memory
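
The lifecycle above can be mimicked with a small userspace model. This is a toy sketch, not the kernel code: the class and attribute names are invented, lazy allocation in the write path is left out, and it assumes that only a newly set bit marks the bitmap dirty.

```python
class BlockmapModel:
    """Toy userspace model of the lifecycle; not the kernel implementation."""

    BLOCK_SIZE = 2 * 1024**3
    MIN_FILESIZE = 2 * 1024**3

    def __init__(self, persisted=0):
        self.persisted = persisted  # models the stored xattr (int used as a bitset)
        self.bits = None            # None means no in-memory bitmap
        self.dirty = False          # models dbm_dirty
        self.stores = 0             # number of xattr writes issued

    def open(self, file_size):
        if self.bits is None and file_size >= self.MIN_FILESIZE:
            self.bits = 0           # fresh zeroed bitmap, no xattr load

    def write(self, offset, length):
        if self.bits is None or length <= 0:
            return                  # lazy allocation is omitted from this model
        first = offset // self.BLOCK_SIZE
        last = (offset + length - 1) // self.BLOCK_SIZE
        for blk in range(first, last + 1):
            if not (self.bits >> blk) & 1:
                self.bits |= 1 << blk
                self.dirty = True   # assumption: only new bits dirty the bitmap
        self.store()                # flush hook that runs after each write IO

    def store(self):
        if self.bits is not None and self.dirty:
            self.persisted |= self.bits  # OR-merge with what is already stored
            self.dirty = False
            self.stores += 1

    def release(self):              # close()
        self.store()

    def fsync(self):
        self.store()

    def evict(self):                # inode eviction: final persist, then free
        self.store()
        self.bits = None


m = BlockmapModel(persisted=0b01)                        # block 0 from an earlier run
m.open(file_size=3 * 1024**3)
m.write(offset=int(2.5 * 1024**3), length=4096)          # first write to block 1
m.write(offset=int(2.5 * 1024**3) + 4096, length=4096)   # same block again
m.release()
m.evict()
print(bin(m.persisted), m.stores)                        # 0b11 1
```

Under these assumptions the second write, the close, and the eviction issue no further xattr writes, because nothing new was marked after the first flush.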

Implementation Notes (Lustre 2.15.8 Specifics)

lli->lli_lock is rwlock_t in 2.15.8 — use write_lock/write_unlock in ll_dirty_blockmap_free(), not spin_lock.
xattr I/O buffers are heap-allocated via OBD_ALLOC/OBD_FREE because the full bitmap is 64 KB, which is far too large to place on the kernel stack.
md_setxattr() requires a struct ptlrpc_request ** — always pass &req and call ptlrpc_req_finished(req) afterwards to free the MDS reply buffer.
The bitmap init in ll_file_open() is placed before the final GOTO(out_och_free, rc) on the success path. Placing it after that GOTO, directly above the out_och_free: label, would leave it in unreachable code.
The hook in ll_file_write_iter() takes the byte count from rc_normal (not result). The write start offset is iocb->ki_pos - rc_normal because ki_pos has already been advanced by the time the hook runs.
Lazy allocation in ll_file_write_iter() triggers when either iocb->ki_pos >= DIRTY_BLOCKMAP_MIN_FILESIZE (file grew past threshold) or i_size_read() >= DIRTY_BLOCKMAP_MIN_FILESIZE (write at any offset into a large sparse file, e.g. pwrite at offset 0 on a 3 GB sparse file).
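
The offset arithmetic and the lazy-allocation condition from the last two notes can be modelled in userspace as below. The function write_hook and its return shape are hypothetical; only the two threshold checks and the ki_pos - rc_normal computation come from the notes.

```python
DIRTY_BLOCKMAP_MIN_FILESIZE = 2 * 1024**3  # 2 GB threshold
BLOCK_SIZE = 2 * 1024**3


def write_hook(ki_pos, rc_normal, i_size, have_bitmap):
    """Model the decisions taken after a successful write.

    ki_pos      -- file position after the write (already advanced)
    rc_normal   -- number of bytes written
    i_size      -- file size as i_size_read() would report it
    have_bitmap -- whether a bitmap was already allocated
    Returns (tracked, first_block, last_block); blocks are None if untracked.
    """
    if rc_normal <= 0:
        return (have_bitmap, None, None)
    tracked = (have_bitmap
               or ki_pos >= DIRTY_BLOCKMAP_MIN_FILESIZE   # file grew past threshold
               or i_size >= DIRTY_BLOCKMAP_MIN_FILESIZE)  # write into a large file
    if not tracked:
        return (False, None, None)
    start = ki_pos - rc_normal                 # ki_pos has already advanced
    first = start // BLOCK_SIZE
    last = (ki_pos - 1) // BLOCK_SIZE          # last byte written, inclusive
    return (True, first, last)


GIB = 1024**3
# pwrite of 4 KB at offset 0 into a 3 GB sparse file: tracked via i_size
print(write_hook(4096, 4096, 3 * GIB, False))                      # (True, 0, 0)
# 2 MB write that straddles the 2 GB boundary of a growing file
print(write_hook(2 * GIB + 1024**2, 2 * 1024**2, 2 * GIB + 1024**2, False))  # (True, 0, 1)
# small write into a small file: not tracked
print(write_hook(4096, 4096, 4096, False))                         # (False, None, None)
```

The second case shows why a sequential dd that starts at offset 0 with no bitmap can still end up with block 0 marked: the write that first pushes ki_pos to 2 GB or beyond begins inside block 0.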

Building

The patch is generated by a script that applies all changes to a clean Lustre 2.15.8 source tree and produces a git format-patch output.

git clone https://github.com/lustre/lustre-release
cd lustre-release
git checkout 2.15.8
bash /path/to/generate_dirty_blockmap_patch.sh

Output: ../0001-llite-add-2GB-block-dirty-blockmap-via-user-xattr.patch

To apply to another tree:

git checkout 2.15.8
git apply 0001-llite-add-2GB-block-dirty-blockmap-via-user-xattr.patch

Build and install as you would any Lustre client RPM:

make rpms
dnf install -y kmod-lustre-client-*.rpm lustre-client-*.rpm

Requirements

Lustre 2.15.8 source tree
Lustre mount with user_xattr option (verify with mount | grep lustre)
Files must be ≥ 2 GB to be tracked
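
The mount-option check can also be scripted. The sketch below parses /proc/mounts-style text and assumes that user_xattr appears in the options field of Lustre client mounts; the function name and the sample mount lines are made up for illustration.

```python
def lustre_mounts_with_user_xattr(mounts_text):
    """Return {mount_point: has_user_xattr} for each lustre entry in the text."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] != "lustre":
            continue
        result[fields[1]] = "user_xattr" in fields[3].split(",")
    return result


# Hypothetical sample; on a real client pass open("/proc/mounts").read() instead.
sample = (
    "10.0.0.1@tcp:/lfs /mnt/lfs lustre rw,flock,user_xattr,lazystatfs 0 0\n"
    "10.0.0.1@tcp:/scratch /mnt/scratch lustre rw,flock,lazystatfs 0 0\n"
    "/dev/sda1 / ext4 rw,relatime 0 0\n"
)
for mnt, ok in lustre_mounts_with_user_xattr(sample).items():
    print(f"{mnt}: {'user_xattr enabled' if ok else 'user_xattr missing'}")
```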

Reading the Bitmap

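import errno
import os
import struct

BLOCK_SIZE = 2 * 1024**3  # 2 GB; must match the kernel module's compile-time constant

def read_dirty_blockmap(path):
    """Print a summary of the user.dirty_blockmap xattr stored on path."""
    try:
        data = os.getxattr(path, 'user.dirty_blockmap')
    except OSError as e:
        if e.errno != errno.ENODATA:
            raise  # real error: missing file, permissions, xattrs unsupported
        print(f"{path}: no dirty_blockmap (file < 2 GB or never written)")
        return

    file_size  = os.stat(path).st_size
    nwords     = len(data) // 8
    # Raw little-endian uint64 array, no header; ignore any trailing partial word
    words      = struct.unpack(f'<{nwords}Q', data[:nwords * 8])
    total_blks = (file_size + BLOCK_SIZE - 1) // BLOCK_SIZE
    # Counts every set bit, including stale bits past EOF left by a truncate
    dirty_blks = sum(bin(w).count('1') for w in words)
    # One character per block, block 0 first, limited to blocks inside the file
    shown      = min(total_blks, nwords * 64)
    block_map  = ''.join(str((words[b // 64] >> (b % 64)) & 1) for b in range(shown))

    print(f"File:         {path}")
    print(f"Size:         {file_size:,} bytes  ({file_size / BLOCK_SIZE:.2f} × 2 GB blocks)")
    print(f"Dirty blocks: {dirty_blks} / {total_blks}")
    print(f"Block map:    {block_map}")

read_dirty_blockmap("/lustre/myfile.bin")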

Example output for a 3 GB file where only the second 2 GB block was written:

File:         /lustre/myfile.bin
Size:         3,221,225,472 bytes  (1.50 × 2 GB blocks)
Dirty blocks: 1 / 2
Block map:    01
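
A backup or tiering tool usually wants byte ranges, not individual bits. The following sketch, which is not part of the repository, merges consecutive dirty blocks into extents; dirty_extents is a hypothetical helper, and it ignores bits at or beyond end of file, which covers the stale bits described under Limitations.

```python
import struct

BLOCK_SIZE = 2 * 1024**3  # 2 GB, assumed to match the kernel module


def dirty_extents(data, file_size):
    """Yield (start, end) byte ranges, end exclusive, for runs of dirty blocks.

    data      -- raw xattr value (little-endian uint64 array, no header)
    file_size -- current file size; bits past EOF are ignored
    """
    nwords = len(data) // 8
    words = struct.unpack(f'<{nwords}Q', data[:nwords * 8])
    total_blks = min((file_size + BLOCK_SIZE - 1) // BLOCK_SIZE, nwords * 64)
    run_start = None
    for blk in range(total_blks):
        dirty = (words[blk // 64] >> (blk % 64)) & 1
        if dirty and run_start is None:
            run_start = blk                               # a new run begins
        elif not dirty and run_start is not None:
            yield (run_start * BLOCK_SIZE, blk * BLOCK_SIZE)
            run_start = None
    if run_start is not None:
        # The final run is clamped to the file size (last block may be partial)
        yield (run_start * BLOCK_SIZE, min(total_blks * BLOCK_SIZE, file_size))


# The 3 GB example above: only block 1 is dirty
xattr_value = struct.pack('<Q', 0b10)
print(list(dirty_extents(xattr_value, 3 * 1024**3)))
# [(2147483648, 3221225472)]
```

On a real file, the first argument would be the value returned by os.getxattr(path, 'user.dirty_blockmap') and the second would be os.stat(path).st_size.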

Verified Test Cases

| Test | Expected | Result |
|---|---|---|
| `dd` (no fsync, `O_TRUNC`) into 3 GB file | 2 dirty blocks | ✅ |
| `pwrite` at offset 0 into 3 GB sparse file | block 0 dirty (`10`) | ✅ |
| `pwrite` at offset 2.5 GB only | block 1 dirty (`01`) | ✅ |
| OR merge: write block 0, reopen, write block 1 | both blocks dirty (`11`) | ✅ |
| Read-only open+close | xattr unchanged | ✅ |
| File < 2 GB | no xattr, no overhead | ✅ |
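
A sparse-file case such as the two pwrite rows could be reproduced roughly as follows. This is an illustrative sketch, not the repository's test script; both helper names are hypothetical, and the resulting block map would still have to be read back from the xattr on a patched client.

```python
import os


def sparse_pwrite(path, file_size, offset, payload=b"x"):
    """Create a sparse file of file_size bytes and write payload at offset."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR | os.O_TRUNC, 0o644)
    try:
        os.ftruncate(fd, file_size)     # extend without writing any data
        os.pwrite(fd, payload, offset)  # the only write the bitmap should see
        os.fsync(fd)                    # fsync is one of the persist points
    finally:
        os.close(fd)                    # close is another persist point


def expected_map(file_size, offset, length, block_size=2 * 1024**3):
    """Block-map string expected after a single write, block 0 first."""
    total = (file_size + block_size - 1) // block_size
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return "".join("1" if first <= b <= last else "0" for b in range(total))


GIB = 1024**3
print(expected_map(3 * GIB, 0, 1))               # 10
print(expected_map(3 * GIB, int(2.5 * GIB), 1))  # 01
```

Calling sparse_pwrite("/lustre/myfile.bin", 3 * GIB, int(2.5 * GIB)) and then running the reader script should, on a patched client, show the same string that expected_map returns.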

Limitations & Known Issues

Truncation: bits above the new file size are not cleared when a file is truncated. A future version should hook ll_setattr to zero stale bits.
Block size is fixed at compile time (2 GB). It is not encoded in the xattr, so the constant must match between the kernel module and any userspace reader.
Concurrency: the xattr update is a read-OR-write sequence with no distributed lock. Lustre has no client-side LCK_EX path for MDS_INODELOCK_XATTR (IT_SETXATTR is obsolete; ACL consistency relies on MDS-side serialization). Two nodes flushing concurrently at close() or write commit can race: both read the same xattr, both OR in their bits, and the last writer wins, potentially losing the other node's bits. With 2 GB block granularity this window is very narrow in practice. The consequence is a missed dirty block, not data corruption or data loss.
Tested on RHEL 9.7, kernel 5.14.0-611.36.1.el9_7.x86_64.
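
The lost-update race described under Concurrency can be illustrated with plain integers standing in for the xattr. This is a minimal model with invented names; it says nothing about how likely the interleaving is on a real cluster.

```python
def store(value_read, local_bits):
    """Read-OR-write: the value a client writes back after reading value_read."""
    return value_read | local_bits


NODE_A_BITS = 0b01  # node A wrote block 0
NODE_B_BITS = 0b10  # node B wrote block 1

# Serialized flushes: each node reads what the previous one wrote.
xattr = 0
xattr = store(xattr, NODE_A_BITS)
xattr = store(xattr, NODE_B_BITS)
serialized = xattr

# Racing flushes: both nodes read before either writes back.
xattr = 0
a_read = xattr
b_read = xattr
xattr = store(a_read, NODE_A_BITS)  # A writes 0b01
xattr = store(b_read, NODE_B_BITS)  # B overwrites with 0b10, dropping A's bit
raced = xattr

print(bin(serialized), bin(raced))  # 0b11 0b10
```

In the racing case block 0 is reported clean even though node A wrote it, which is the missed-dirty-block outcome noted above.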

License

GPL-2.0 — same as the Lustre source tree this patch applies to.
