
feat(gemini): Scans directories for old or large files, presenting them as 'digital dust bunnies' for archival or deletion. #3691

Open
polsala wants to merge 1 commit into main from ai/gemini-20260309-1255

Conversation


@polsala polsala commented Mar 9, 2026

Implementation Summary

  • Utility: nightly-digital-dust-bunny-sweeper
  • Provider: gemini
  • Location: node-utils/nightly-nightly-digital-dust-bunny-s-2
  • Files Created: 3
  • Description: Scans directories for old or large files, presenting them as 'digital dust bunnies' for archival or deletion.

Rationale

  • Automated proposal from the Gemini generator delivering a fresh community utility.
  • This utility was generated using the gemini AI provider.

Why safe to merge

  • Utility is isolated to node-utils/nightly-nightly-digital-dust-bunny-s-2.
  • README + tests ship together (see folder contents).
  • No secrets or credentials touched.
  • All changes are additive and self-contained.

Test Plan

  • Follow the instructions in the generated README at node-utils/nightly-nightly-digital-dust-bunny-s-2/README.md
  • Run tests located in node-utils/nightly-nightly-digital-dust-bunny-s-2/tests/

Links

  • Generated docs and examples committed alongside this change.

Mock Justification

  • Not applicable; generator did not introduce new mocks.


polsala commented Mar 9, 2026

🤖 Review by GROQ Agent

✅ What's solid

  • Clear purpose & isolation – The utility lives in its own folder (node-utils/nightly-nightly-digital-dust-bunny-s-2) and does not touch any existing code‑base, making it safe to merge from a dependency‑impact perspective.
  • Robust error handling
    • getFileStats swallows inaccessible files and returns null instead of bubbling an exception.
    • Directory‑read failures are logged as warnings, not fatal errors, so a single unreadable folder won’t abort the whole sweep.
  • CLI ergonomics – The script prints a helpful usage line when required arguments are missing and validates --age/--size values before proceeding.
  • Test coverage – The test suite exercises the core logic (findDustBunnies) and the CLI entry point (main) for a variety of edge cases (empty dirs, inaccessible files, invalid arguments, non‑existent paths).
  • Exported functions – Exporting findDustBunnies, getFileStats, and main makes the core algorithm testable without spawning a child process.

🧪 Tests

  • Observation: Tests mock fs.promises globally and restore it in an after hook.
    Recommendation: Consider dependency‑injecting the fs module into findDustBunnies (e.g. findDustBunnies(dir, age, size, fs = require('fs').promises)). This avoids global monkey‑patching and prevents interference with other test suites.
  • Observation: The test file repeats the same findDustBunnies('/mock/root', defaultAge, defaultSize) call many times.
    Recommendation: Extract the common call into a helper (async function runDefaultScan() { return await findDustBunnies('/mock/root', defaultAge, defaultSize); }) to keep the test bodies concise and reduce duplication.
  • Observation: CLI tests replace process.argv, console.error, and process.exit manually.
    Recommendation: Wrap the overrides in a utility function (withMockedProcess(argv, fn)) that automatically restores the originals, making each test less noisy and less error‑prone.
  • Observation: No test verifies the output format (size string, ISO date).
    Recommendation: Add a unit test that calls findDustBunnies with a known file and asserts the shape of the returned object (e.g. size ends with ' MB' and lastModified is a valid ISO string).
  • Observation: Edge cases such as symlink loops or deep recursion are not covered.
    Recommendation: Introduce a test that creates a symlink pointing to a parent directory (or a very deep nested structure) and ensures the utility does not crash or exceed the call stack.
  • Observation: Performance‑related tests (large directory trees) are missing.
    Recommendation: Add a benchmark test that creates a mock directory with thousands of files and asserts that the scan completes within a reasonable time (e.g. under 2 s). This can surface potential throttling or concurrency issues.

Example: Dependency‑injection refactor (test‑friendly)

// src/index.js
async function findDustBunnies(
  directoryPath,
  ageThresholdDays,
  sizeThresholdMB,
  fsModule = require('fs').promises
) {
  // replace all `fs` references with `fsModule`
}
module.exports = { findDustBunnies, /* … */ };

Then in tests:

const mockFs = { /* … */ };
const { findDustBunnies } = require('../src/index');
await findDustBunnies('/mock/root', 365, 100, mockFs);

🔒 Security

  • Path handling – The script trusts the supplied directoryPath. While this is acceptable for a local utility, adding a canonicalisation step (path.resolve) before scanning can prevent accidental traversal outside the intended root (e.g., node src/index.js ../../).
  • Symlink handling – With withFileTypes: true, fs.readdir reports symlinks as symlinks: entry.isFile() and entry.isDirectory() both return false for them, so the current code skips them entirely. A symlink that points to a directory is therefore never descended into, potentially missing dust bunnies; conversely, following symlinks could create infinite loops. Document the current behaviour and consider adding an explicit --follow-symlinks flag with a depth guard.
  • Denial‑of‑service – Scanning extremely large trees can consume significant CPU and I/O. Introducing a concurrency limit (e.g., using p-limit or a simple semaphore) when recursing can mitigate resource exhaustion.
  • No external secrets – The utility does not read environment variables or network resources, so the surface for credential leakage is nil.

🧩 Docs / Developer Experience

  • README path mismatch – The installation instructions reference node-utils/nightly-digital-dust-bunny-sweeper, but the actual folder is node-utils/nightly-nightly-digital-dust-bunny-s-2. Align the documentation with the real path to avoid confusion.
  • CLI options description – It would be helpful to list the exit codes (0 = success, 1 = usage/error) and explain that warnings (unreadable directories) do not affect the exit status.
  • Example output – Include a sample of the console output (both the “no bunnies” and “found bunnies” cases) so users know what to expect.
  • Argument parsing – Consider using a lightweight library like yargs or commander. This would automatically generate --help output, enforce type checking, and reduce manual parsing bugs.
  • Programmatic API – The exported findDustBunnies function is useful for other scripts. Document its signature and return shape in the README under a “Programmatic usage” section.

🧱 Mocks / Fakes

  • Filesystem mock – The test suite replaces require('fs').promises with a handcrafted mock that returns a static directory tree. This works but is global and can interfere with other test files that also rely on fs.
    • Improvement: Use mock-fs (https://github.com/tschaub/mock-fs) which provides a scoped mock and automatically restores the original fs after each test. It also supports symlinks, permissions, and timestamps out‑of‑the‑box, making the mock more realistic.
  • Process mocks – Overriding process.exit with a throwing function is a common pattern, but the current implementation repeats the same boilerplate in multiple tests. Extract this into a reusable helper:
async function withMockedProcess(argv, fn) {
  const original = { argv: process.argv, exit: process.exit, error: console.error };
  process.argv = argv;
  const errors = [];
  let exitCode;
  console.error = (msg) => errors.push(msg); // capture CLI error output for assertions
  process.exit = (code) => { exitCode = code; throw new Error('ProcessExit'); };
  try {
    await fn();
  } catch (err) {
    if (err.message !== 'ProcessExit') throw err; // re-throw genuine failures
  } finally {
    process.argv = original.argv;
    process.exit = original.exit;
    console.error = original.error;
  }
  return { exitCode, errors }; // let each test assert on the captured values
}
  • Re‑use of mock data – The mock directory structure is defined once (presumably at the top of the test file). Ensure that each test runs against a fresh copy of the mock if any test mutates the structure (e.g. structuredClone(mockTree); a shallow Object.assign({}, mockTree) would still share the nested entries), to keep tests isolated.

Overall, the utility is well‑structured, the core algorithm is solid, and the test suite provides decent coverage. Addressing the documentation inconsistencies, tightening the test‑mock strategy, and adding a few edge‑case tests will make the contribution even more robust and maintainable.


polsala commented Mar 9, 2026

🤖 Review by GROQ Agent

✅ What's solid

  • Clear purpose & isolation – The utility lives in its own folder (node-utils/nightly-nightly-digital-dust-bunny-s-2) and does not touch any existing code‑base, making it low‑risk to merge.
  • Robust error handling
    • getFileStats swallows inaccessible files and returns null so the scan never crashes on permission errors or broken symlinks.
    • scanDirectory logs a warning instead of throwing when a directory cannot be read.
    • The CLI validates required arguments, numeric thresholds, and the existence/type of the target path, exiting with a clear error message.
  • Readable output – Human‑friendly console messages (emoji, formatted size, ISO timestamps) make the results easy to understand for end‑users.
  • Exported functions – findDustBunnies, getFileStats, and main are exported, enabling unit testing of core logic and CLI behaviour.

🧪 Tests

  • Good coverage of core scenarios – The test suite checks:
    • Normal detection of old + large files.
    • Empty directories, inaccessible files, and non‑matching files.
    • CLI error paths (missing directory, path is a file, invalid --age/--size values).
  • Mocked filesystem – Replaces fs.promises with an in‑memory structure, keeping tests fast and deterministic.

Actionable suggestions

  • Restore the original fs.promises in an afterEach hook (instead of a single after). If a test fails early, the global mock could leak into subsequent suites. Example:

    let originalFs;
    before(() => { originalFs = require('fs').promises; });
    afterEach(() => { require('fs').promises = originalFs; });
  • Add a test for edge‑case thresholds (e.g., --age 0 or --size 0). The current validation rejects non‑positive numbers, but a unit test would guard against regressions.

  • Consider a small integration test that creates a temporary directory with real files (using fs.mkdtemp + fs.writeFile) and runs the CLI via child_process.exec. This validates that the shebang/Node execution path works end‑to‑end.

🔒 Security

  • Path handling – The script accepts any string as directoryPath. While this is expected for a CLI tool, it could be hardened by normalising the path (path.resolve) and optionally rejecting paths that resolve outside a configured root (useful if the utility is ever exposed via a service).

    const resolvedPath = path.resolve(directoryPath);
    // Note: a bare startsWith(allowedRoot) would also accept sibling paths
    // like /allowed-evil; compare against the root plus a separator.
    if (resolvedPath !== allowedRoot && !resolvedPath.startsWith(allowedRoot + path.sep)) {
      console.error('Error: Path outside allowed root.');
      process.exit(1);
    }
  • Symlink traversal – The current implementation ignores symlinks because Dirent.isDirectory()/isFile() return false for them. Explicitly documenting this behaviour (or deliberately following safe symlinks) would avoid confusion.

  • No external dependencies – By avoiding third‑party modules the attack surface stays minimal.

🧩 Docs / Developer Experience

  • README completeness – The README covers usage, arguments, and installation, but could be improved by:

    • Adding an “Exit codes” section (e.g., 0 = success, 1 = error).
    • Providing a sample JSON/CSV output option for downstream automation (e.g., --json).
    • Explaining the logging level (warnings vs. normal output) and how to silence them if desired.
  • CLI ergonomics – Hand‑rolling argument parsing works, yet a lightweight library like yargs or commander would give:

    • Automatic --help generation.
    • Built‑in validation of numeric options.
    • Consistent error messages.

    Example snippet with commander:

    const { program } = require('commander');
    // Note: passing parseInt directly as the coercion function is a known
    // pitfall – commander calls it as fn(value, previous), so the previous
    // value (here the default) would be used as the radix. Wrap it instead.
    const toInt = (value) => parseInt(value, 10);
    program
      .argument('<directory>', 'directory to scan')
      .option('--age <days>', 'minimum age in days', toInt, 365)
      .option('--size <mb>', 'minimum size in MB', toInt, 100)
      .parse();
    const [directoryPath] = program.args;
    const { age, size } = program.opts();
  • Shebang for direct execution – Adding #!/usr/bin/env node at the top of src/index.js and marking the file executable (chmod +x) would let users run ./src/index.js … without invoking node explicitly.

  • JSDoc / Type hints – Adding JSDoc comments to exported functions would improve IDE autocomplete and future TypeScript migration.

🧱 Mocks / Fakes

  • Current approach – Directly overwriting require('fs').promises works but is a bit invasive. Using a mocking library (e.g., sinon.stub(require('fs').promises, 'readdir')) would allow more granular control and automatic restoration.

  • Potential improvement – Abstract the filesystem access behind a small interface (e.g., fsAdapter.readDir, fsAdapter.stat) and inject a mock implementation in tests. This decouples the production code from the Node fs module and makes the codebase easier to test in non‑Node environments.

    // src/fsAdapter.js
    const fs = require('fs').promises;
    module.exports = {
      readdir: (p, opts) => fs.readdir(p, opts),
      stat: (p) => fs.stat(p),
    };

    Then in index.js:

    const fsAdapter = require('./fsAdapter');
    // use fsAdapter.readdir / fsAdapter.stat

    Tests can replace fsAdapter with an in‑memory stub without touching the global fs module.


Overall, the utility is well‑structured, well‑tested, and safe for the intended use‑case. The suggestions above focus on tightening edge‑case handling, improving developer ergonomics, and future‑proofing the codebase.
