Skip to content

Conversation

@simonw
Copy link
Owner

@simonw simonw commented Dec 13, 2025

Adds a feature to detect and remove duplicate blocks of 10+ consecutive
lines in the HTML output. When duplicates are found, a button appears
showing "Remove X duplicate blocks" which removes them when clicked.

  • Adds findDuplicateBlocks() to detect duplicate line sequences
  • Adds removeDuplicateBlocks() to remove detected duplicates
  • Shows/hides deduplicate button based on detection results
  • Updates both HTML output and preview after deduplication

Adds a feature to detect and remove duplicate blocks of 10+ consecutive
lines in the HTML output. When duplicates are found, a button appears
showing "Remove X duplicate blocks" which removes them when clicked.

- Adds findDuplicateBlocks() to detect duplicate line sequences
- Adds removeDuplicateBlocks() to remove detected duplicates
- Shows/hides deduplicate button based on detection results
- Updates both HTML output and preview after deduplication
@cloudflare-workers-and-pages
Copy link

Deploying tools with  Cloudflare Pages  Cloudflare Pages

Latest commit: 7091cdc
Status: ✅  Deploy successful!
Preview URL: https://6c057391.tools-b1q.pages.dev
Branch Preview URL: https://claude-add-deduplicate-featu.tools-b1q.pages.dev

View logs

@claude
Copy link
Contributor

claude bot commented Dec 13, 2025

Claude finished @simonw's task —— View job


Code Review Complete ✅

I've analyzed the deduplicate feature implementation in terminal-to-html.html. Here's my comprehensive review:

Todo List

  • Read the modified terminal-to-html.html file
  • Analyze code quality and best practices
  • Check for potential bugs or issues
  • Evaluate performance considerations
  • Review security concerns
  • Assess test coverage
  • Provide comprehensive feedback

Review Feedback

Code Quality & Best Practices

Strengths:

  • Clean, well-structured implementation with clear separation of concerns
  • Good function naming (findDuplicateBlocks, removeDuplicateBlocks)
  • Consistent coding style matching the existing codebase
  • Proper ES6 module syntax and modern JavaScript features

Minor improvements:

  • Consider extracting magic numbers as constants (e.g., const MIN_DUPLICATE_LINES = 10)
  • The minLines parameter in findDuplicateBlocks could be documented with JSDoc

🔍 Potential Bugs & Issues

Critical Issue - Line 770:

previewContainer.innerHTML = fullHtml;

This creates a security vulnerability and functional bug. Setting innerHTML to full HTML content can:

  • Execute embedded JavaScript (XSS risk)
  • Break the page layout when injecting complete <!DOCTYPE html> documents

Fix needed: Extract only the body content for preview updates.

Logic Issue - Lines 461-464:
The overlap detection in findDuplicateBlocks may miss edge cases where blocks share boundaries but don't technically overlap.

Minor Issue - Line 499:
The destructuring spread [...duplicates] is unnecessary since duplicates is already sorted in place on line 479.

Performance Considerations

Algorithmic Complexity:

  • Current implementation is O(n³) in worst case due to nested loops and string comparisons
  • For typical terminal outputs (< 1000 lines), this should be acceptable
  • Consider optimization for very large outputs (> 5000 lines)

Memory Usage:

  • Multiple string operations create temporary objects
  • The lines.join('\n') operations could be expensive for large content

Optimization suggestions:

  • Use line hashing for faster block comparisons
  • Implement early termination when no duplicates are found

🔒 Security Concerns

High Priority:

  • The innerHTML assignment in the deduplicate button handler (line 770) poses XSS risk
  • While the tool processes user-pasted content, malicious HTML could execute scripts

Mitigation:

  • Use textContent or properly sanitize HTML before preview updates
  • Consider using DOMParser for safer HTML handling

🧪 Test Coverage

Missing test scenarios:

  • No automated tests for the new duplicate detection functionality
  • Edge cases not covered:
    • Empty input
    • Single line duplicates (below threshold)
    • Overlapping duplicate blocks
    • Very large inputs
    • Malformed HTML content

Recommendations:

  • Add unit tests for findDuplicateBlocks and removeDuplicateBlocks
  • Test with various terminal output formats (RTF, HTML, plain text)
  • Performance testing with large inputs

🎯 Feature Implementation Quality

Well-designed aspects:

  • Progressive enhancement - feature only appears when duplicates are detected
  • Good UX with dynamic button text updates
  • Proper integration with existing preview system

User experience:

  • Clear visual feedback about number of duplicate blocks
  • Button disappears when no duplicates remain
  • Seamless integration with existing workflow

Summary

This is a solid feature implementation with good UX design. The main concern is the security vulnerability in the preview update mechanism that should be addressed before merging. The performance characteristics are acceptable for typical use cases, and the code quality is consistent with the existing codebase.

Priority fixes needed:

  1. ⚠️ Fix the innerHTML security vulnerability in preview updates
  2. Consider adding basic error handling for edge cases
  3. Add unit tests for the core duplicate detection logic

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants