PowerShell scripts to compare two CSV files and produce a changes report. Choose by data size and validation depth. All provide robust header validation, configurable delimiter and encoding, case-sensitive/insensitive comparisons, and progress reporting.
Scripts
- CompareCSVs_small.ps1 — In-memory comparison (fastest for small/medium files; standard output)
- CompareCSVs_medium.ps1 — Streaming/batched comparison (handles larger files; bounded writes; standard output)
- CompareCSVs_large.ps1 — External sort-merge using temporary sorted files (bounded memory for very large files; standard output)
- CompareCSVs_Detailed.ps1 — In-memory comparison with field-level match tracking and summary statistics (validation/audit scenarios)
- CompareCSVs_DetailedV2.ps1 — In-memory comparison with field-level match tracking, value transforms, and column filtering (advanced validation/audit scenarios)
Key features
- Robust header parsing (quoted headers, embedded delimiters) via Microsoft.VisualBasic.TextFieldParser (see the parsing sketch after this list)
- Validation:
- Empty/blank header detection (with 1-based positions)
- Duplicate header detection after normalization (Trim + ToLowerInvariant)
- Required anchor column present in both files
- Duplicate anchor values within each file
- Consistent row length per file
- Blank/malformed data row detection
- Configurable delimiter: comma, tab, semicolon, pipe
- Configurable encoding for input and output
- Case-sensitive or insensitive comparisons
- Progress bars (Write-Progress) with phase updates; bars always clear even on errors
- Output filenames include millisecond-precision timestamps for uniqueness
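For orientation, reading and validating a header row with TextFieldParser generally looks like the sketch below. It is illustrative only (the path and delimiter are placeholders), not the scripts' exact code:

    # Illustrative sketch of quoted-header parsing with TextFieldParser; not the scripts' exact code.
    Add-Type -AssemblyName Microsoft.VisualBasic
    $parser = New-Object -TypeName Microsoft.VisualBasic.FileIO.TextFieldParser -ArgumentList '.\curr.csv'  # placeholder path
    try {
        $parser.TextFieldType = [Microsoft.VisualBasic.FileIO.FieldType]::Delimited
        $parser.SetDelimiters(',')                  # corresponds to -DelimiterName comma
        $parser.HasFieldsEnclosedInQuotes = $true   # quoted headers with embedded delimiters
        $headers = $parser.ReadFields()             # first record = header row
    }
    finally {
        $parser.Close()
    }
    # Example check: report blank headers with 1-based positions
    for ($i = 0; $i -lt $headers.Count; $i++) {
        if ([string]::IsNullOrWhiteSpace($headers[$i])) { Write-Warning "Blank header at position $($i + 1)" }
    }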
Requirements
- Windows PowerShell 5.1 or PowerShell 7+ on Windows
- Microsoft.VisualBasic available (the scripts load it for TextFieldParser)
Parameters (common)
- -PreviousCSVFile: Path to the “Previous” CSV
- -CurrentCSVFile: Path to the “Current” CSV
- -AnchorColumn: Header name of the key column used to join rows
- -OutputFolder: Folder where the changes CSV will be written
- -DelimiterName: One of comma, tab, semicolon, pipe
- -EncodingName: One of auto, utf8, utf8BOM, unicode, oem, default (see the mapping sketch after this list)
- auto: let Import-Csv detect the BOM on input; exports use UTF-8 on PS 6+ (the PS 5.1 mapping, which writes UTF-8 with a BOM, is handled internally)
- utf8BOM: UTF-8 with BOM (Excel-friendly; default)
- utf8: UTF-8 (no BOM on PS 6+, BOM on PS 5.1)
- unicode: UTF-16 LE
- oem, default: legacy code pages as exposed by PowerShell
- -CaseSensitive: Use case-sensitive comparisons
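The EncodingName-to-cmdlet mapping is handled inside the scripts; the sketch below shows one hypothetical way such a mapping can be expressed (Resolve-EncodingArgument is an illustrative helper, not part of the scripts, and 'auto' is assumed to simply omit -Encoding on import):

    # Hypothetical mapping sketch; the scripts' internal mapping may differ.
    function Resolve-EncodingArgument {
        param([string]$EncodingName)
        if ($EncodingName -eq 'auto') { return @{} }           # omit -Encoding; Import-Csv detects the BOM
        $isCore = $PSVersionTable.PSVersion.Major -ge 6
        $value = switch ($EncodingName) {
            'utf8'    { if ($isCore) { 'utf8NoBOM' } else { 'UTF8' } }   # PS 5.1 UTF8 writes a BOM
            'utf8BOM' { if ($isCore) { 'utf8BOM' }   else { 'UTF8' } }
            'unicode' { 'Unicode' }
            'oem'     { 'OEM' }
            'default' { if ($isCore) { [System.Text.Encoding]::Default } else { 'Default' } }
        }
        return @{ Encoding = $value }
    }

    # Usage: splat the result onto Import-Csv / Export-Csv
    $enc  = Resolve-EncodingArgument -EncodingName 'utf8BOM'
    $rows = Import-Csv -LiteralPath .\curr.csv @enc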
Parameters (script-specific)
- CompareCSVs_medium.ps1: -BatchSize (default 1000)
- CompareCSVs_large.ps1: -BatchSize (default 1000)
- CompareCSVs_DetailedV2.ps1: -ValueTransforms (hashtable of column transformations), -IgnoreColumns (array of column names to skip)
ValueTransforms syntax (DetailedV2 only)
-ValueTransforms applies transformations to "Previous" file values during comparison only; original values are preserved in the output.
Format: @{ 'ColumnName' = @{ 'OldValue' = 'NewValue'; ... }; ... }
Three transformation types:
- Direct replacement: 'OldValue' = 'NewValue' (replaces value directly)
- Prefix modifier: 'OldValue' = '<<prefix' (prepends prefix to original value; useful for padding: '5' = '<<00' becomes '005')
- Suffix modifier: 'OldValue' = '>>suffix' (appends suffix to original value; useful for codes: 'USA' = '>>0' becomes 'USA0')
- Wildcard key '': Matches any value not explicitly mapped (fallback rule)

Example wrapper script:

    $transforms = @{
        'worker_type' = @{
            'Employee'          = 'EMP'
            'Contingent Worker' = 'CWK'
        }
        'profit_center' = @{ '' = '>>0' }
        'legal_entity'  = @{ '*' = '>>0' }
    }
    .\CompareCSVs_DetailedV2.ps1 -PreviousCSVFile .\prev.csv -CurrentCSVFile .\curr.csv `
        -AnchorColumn EmployeeID -OutputFolder .\out -ValueTransforms $transforms -DelimiterName comma -EncodingName utf8BOM
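Read as pseudocode, a per-value transform amounts to a lookup followed by an optional prefix/suffix tweak. The sketch below only illustrates the documented rules (using the '' fallback key); Resolve-TransformedValue is a hypothetical helper, not the script's implementation:

    # Illustrative sketch of the documented rules; hypothetical helper, not the script's code.
    function Resolve-TransformedValue {
        param([hashtable]$ColumnRules, [string]$Value)
        $rule = $null
        if     ($ColumnRules.ContainsKey($Value)) { $rule = $ColumnRules[$Value] }
        elseif ($ColumnRules.ContainsKey(''))     { $rule = $ColumnRules[''] }       # wildcard fallback
        if ($null -eq $rule) { return $Value }                                       # no rule: unchanged
        if     ($rule.StartsWith('<<')) { return $rule.Substring(2) + $Value }       # prefix modifier
        elseif ($rule.StartsWith('>>')) { return $Value + $rule.Substring(2) }       # suffix modifier
        else                            { return $rule }                             # direct replacement
    }

    Resolve-TransformedValue -ColumnRules @{ 'Employee' = 'EMP' } -Value 'Employee'  # EMP
    Resolve-TransformedValue -ColumnRules @{ '5' = '<<00' }       -Value '5'         # 005
    Resolve-TransformedValue -ColumnRules @{ 'USA' = '>>0' }      -Value 'USA'       # USA0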
-IgnoreColumns syntax (DetailedV2 only)
-IgnoreColumns excludes the specified columns from comparison.
- Format: @('ColumnName1', 'ColumnName2', ...)
- Example: -IgnoreColumns @('LastModifiedDate', 'ProcessingNotes')
- The anchor column cannot be ignored.
- Ignored columns still appear in the output but are not compared.
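In effect this is just a filter over which headers are compared; a short sketch, with $headers, $IgnoreColumns, and $AnchorColumn as placeholders:

    # Illustrative: ignored columns are dropped from the comparison set but still written to the output.
    if ($IgnoreColumns -contains $AnchorColumn) { throw 'The anchor column cannot be ignored.' }
    $headersToCompare = $headers | Where-Object { $IgnoreColumns -notcontains $_ }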
Delimiters
- comma => ,
- tab => `t
- semicolon => ;
- pipe => |
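These names map to single characters before being passed to the CSV cmdlets, roughly as in this sketch (the lookup inside the scripts may differ in detail):

    # Sketch of the name-to-character lookup.
    $delimiter = switch ($DelimiterName) {
        'comma'     { ',' }
        'tab'       { "`t" }
        'semicolon' { ';' }
        'pipe'      { '|' }
    }
    $rows = Import-Csv -LiteralPath .\curr.csv -Delimiter $delimiter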
Output
- A CSV written to -OutputFolder named:
- Changes_{CurrentFileBaseName}GeneratedOn{yyyy-MM-dd_HHmmssfff}.csv
- Standard output (Small, Medium, Large):
- Columns: AnchorColumn, ChangeType, and for each header: "old header" and "new header"
- Includes all rows (Add, Update, Delete, None)
- Detailed output (Detailed and DetailedV2):
- Columns: AnchorColumn, ChangeType, and for each header: "old header", "new header", "match header"
- Includes all rows (Add, Update, Delete, None) with per-field match indicators (True/False)
- Summary row inserted as first record showing mismatch counts per column ("X of Y FALSE")
- DetailedV2 only: Summary row's "old" column also shows applied transform rule counts (e.g., "Employee→EMP (5 applied), Contingent Worker→CWK (3 applied)")
- Output sorted by anchor column
- ChangeType meanings (all scripts; a minimal classification sketch follows this list):
- Add: Exists only in Current
- Delete: Exists only in Previous
- Update: Exists in both with at least one differing value
- None: Exists in both with no value changes
- Match column (Detailed and DetailedV2):
- True: Values match between Previous and Current
- False: Values differ between Previous and Current
- Empty: Not applicable (Add/Delete rows cannot be matched)
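A minimal in-memory classification sketch of the ChangeType rules follows. It assumes EmployeeID as the anchor and uses hypothetical variable names, so treat it as an illustration of the rules rather than the scripts' code:

    # Illustration of the ChangeType rules: index both files by anchor value, then classify each key.
    $prev = @{}; Import-Csv -LiteralPath .\prev.csv | ForEach-Object { $prev[$_.EmployeeID] = $_ }
    $curr = @{}; Import-Csv -LiteralPath .\curr.csv | ForEach-Object { $curr[$_.EmployeeID] = $_ }
    $headers = ($curr.Values | Select-Object -First 1).PSObject.Properties.Name

    $changes = foreach ($key in (@($prev.Keys) + @($curr.Keys) | Sort-Object -Unique)) {
        if     (-not $prev.ContainsKey($key)) { $type = 'Add' }      # only in Current
        elseif (-not $curr.ContainsKey($key)) { $type = 'Delete' }   # only in Previous
        else {
            $differs = $false
            foreach ($h in $headers) {
                # -ne is case-insensitive; a -CaseSensitive run would use -cne instead
                if ($prev[$key].$h -ne $curr[$key].$h) { $differs = $true; break }
            }
            $type = if ($differs) { 'Update' } else { 'None' }
        }
        [pscustomobject]@{ EmployeeID = $key; ChangeType = $type }
    }
    # $changes would then be expanded into the "old"/"new" columns and sorted by the anchor.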
Usage examples
- Standard output, in-memory (small/medium):
- .\CompareCSVs_small.ps1 -PreviousCSVFile .\prev.csv -CurrentCSVFile .\curr.csv -AnchorColumn EmployeeID -OutputFolder .\out -DelimiterName comma -EncodingName utf8BOM
- Standard output, streaming (larger files):
- .\CompareCSVs_medium.ps1 -PreviousCSVFile .\prev.csv -CurrentCSVFile .\curr.csv -AnchorColumn EmployeeID -OutputFolder .\out -BatchSize 2000 -DelimiterName comma -EncodingName auto
- Standard output, sort-merge (very large files):
- .\CompareCSVs_large.ps1 -PreviousCSVFile .\prev.csv -CurrentCSVFile .\curr.csv -AnchorColumn EmployeeID -OutputFolder .\out -BatchSize 5000 -DelimiterName comma -EncodingName utf8
- Detailed validation (field-level match tracking, audit trails):
- .\CompareCSVs_Detailed.ps1 -PreviousCSVFile .\prev.csv -CurrentCSVFile .\curr.csv -AnchorColumn EmployeeID -OutputFolder .\out -DelimiterName comma -EncodingName utf8BOM
- Detailed validation with value transforms (normalize data during comparison):
- .\CompareCSVs_DetailedV2.ps1 -PreviousCSVFile .\prev.csv -CurrentCSVFile .\curr.csv -AnchorColumn EmployeeID -OutputFolder .\out -ValueTransforms @{ 'Salary' = @{ '' = '<<0.00' } } -DelimiterName comma -EncodingName utf8BOM
- Detailed validation with column filtering (exclude audit columns):
- .\CompareCSVs_DetailedV2.ps1 -PreviousCSVFile .\prev.csv -CurrentCSVFile .\curr.csv -AnchorColumn EmployeeID -OutputFolder .\out -IgnoreColumns @('LastModifiedDate','ProcessingNotes') -DelimiterName comma -EncodingName utf8BOM
Choosing a script
- CompareCSVs_small.ps1: Entire files in memory. Fastest and simplest for small/medium datasets. Use when you need a quick list of changes.
- CompareCSVs_medium.ps1: Loads "Previous" into a dictionary; streams "Current" and writes in batches. Good for larger inputs where memory is a constraint (see the streaming sketch after this list).
- CompareCSVs_large.ps1: Sorts both files by anchor on disk, then merges. Best for very large inputs; temp files are always cleaned up. Use when standard output meets your needs but data is too large for memory.
- CompareCSVs_Detailed.ps1: In-memory with field-level match tracking. Use for validation scenarios where you need to identify exactly which fields differ and by how much. Ideal for pre-cutover validation (old process vs new process), audit trails, and Excel-based analysis with filtering. Not suitable for very large datasets due to memory requirements and output size.
- CompareCSVs_DetailedV2.ps1: Enhanced version of Detailed with -ValueTransforms (column-specific value mapping) and -IgnoreColumns (skip columns from comparison). Use when you need to normalize data during comparison or exclude specific columns from analysis. Same memory/output limitations as Detailed.
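For reference, the streaming-plus-batched-write approach described for CompareCSVs_medium.ps1 follows a pattern along these lines. This is a sketch: Get-ChangeRecord is a hypothetical helper, and $BatchSize/$outputPath are placeholders.

    # Sketch: hold "Previous" in a dictionary, stream "Current", flush the output every $BatchSize rows.
    $prev = @{}
    Import-Csv -LiteralPath .\prev.csv | ForEach-Object { $prev[$_.EmployeeID] = $_ }

    $batch = New-Object System.Collections.Generic.List[object]
    Import-Csv -LiteralPath .\curr.csv | ForEach-Object {
        $batch.Add((Get-ChangeRecord -Previous $prev -Current $_))   # hypothetical per-row comparison
        if ($batch.Count -ge $BatchSize) {
            $batch | Export-Csv -LiteralPath $outputPath -Append -NoTypeInformation
            $batch.Clear()
        }
    }
    if ($batch.Count -gt 0) { $batch | Export-Csv -LiteralPath $outputPath -Append -NoTypeInformation }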
Progress and diagnostics
- Progress bars show phases such as loading, streaming, sorting, and merging, with periodic updates (see the pattern sketch after this list).
- Clear error messages for header and row validation issues.
- A summary line with Adds/Updates/Deletes/Unchanged counts is printed at completion.
- Elapsed time (minutes and seconds) is printed at completion.
- "No changes detected; no CSV written" is printed if all records are unchanged (ChangeType=None).
Notes
- Paths are handled with -LiteralPath to avoid wildcard surprises.
- Temp file names in Large include millisecond timestamps and are removed in a finally block, even on errors.
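The temp-file handling in the Large script corresponds to the familiar timestamp-plus-finally pattern, roughly as below (file names here are illustrative, not the script's actual names):

    # Sketch: millisecond-stamped temp files, always removed in finally.
    $stamp    = Get-Date -Format 'yyyy-MM-dd_HHmmssfff'
    $prevTemp = Join-Path $env:TEMP "prev_sorted_$stamp.csv"
    $currTemp = Join-Path $env:TEMP "curr_sorted_$stamp.csv"
    try {
        # ... sort both inputs by the anchor column into the temp files, then merge ...
    }
    finally {
        Remove-Item -LiteralPath $prevTemp, $currTemp -ErrorAction SilentlyContinue
    }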
Contributing
- File issues and PRs with minimal repro CSVs where possible.
- Keep behavior consistent across scripts.
License
- MIT (or your preferred license)