I converted the database build scripts to Go over the holiday. When I compared the Go output to the original Bash + Python scripts, I noticed the following issues with the original results:
- The last item in each import statement is either dropped or picks up the trailing `);` as part of the title or link target.
- Titles or link targets with embedded commas are being truncated at the first comma.
- The page IDs within each record of `links.with_counts.txt.gz` are sorted (alphabetically, not numerically), but the records themselves are written in a non-deterministic hash order.
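
To illustrate the three points above, here is a minimal Go sketch, not the code from my conversion. It assumes the dump rows look like MySQL-style tuples, `(1,0,'Title'),(2,0,'Other');`, with single-quoted strings and backslash escapes. The names `splitTuples`, `sortNumeric`, and `sortedKeys` are hypothetical helpers made up for this example.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// splitTuples (hypothetical helper) splits a run of "(a,b,'c'),(d,e,'f');"
// tuples into rows of fields. It tracks quote state, so commas inside a
// quoted string stay in the field and the trailing ");" is never absorbed.
// Simplification: an escaped byte is kept literally (\' becomes '), and
// sequences such as \n are not translated.
func splitTuples(values string) [][]string {
	var rows [][]string
	var row []string
	var field strings.Builder
	inQuote, depth := false, 0
	for i := 0; i < len(values); i++ {
		c := values[i]
		switch {
		case inQuote && c == '\\' && i+1 < len(values):
			i++
			field.WriteByte(values[i]) // keep the escaped byte as-is
		case c == '\'':
			inQuote = !inQuote
		case inQuote:
			field.WriteByte(c)
		case c == '(':
			depth++
		case c == ')':
			depth--
			row = append(row, field.String())
			field.Reset()
			rows = append(rows, row)
			row = nil
		case c == ',' && depth > 0:
			row = append(row, field.String())
			field.Reset()
		case depth > 0:
			field.WriteByte(c)
		}
	}
	return rows
}

// sortNumeric sorts decimal ID strings by numeric value, so "9" sorts
// before "10" (a plain string sort would put "10" first).
func sortNumeric(ids []string) {
	sort.Slice(ids, func(i, j int) bool {
		a, _ := strconv.Atoi(ids[i])
		b, _ := strconv.Atoi(ids[j])
		return a < b
	})
}

// sortedKeys returns the map keys in ascending order. Go map iteration
// order is unspecified, so output written by ranging over the map
// directly can differ from run to run.
func sortedKeys(links map[int][]int) []int {
	keys := make([]int, 0, len(links))
	for k := range links {
		keys = append(keys, k)
	}
	sort.Ints(keys)
	return keys
}

func main() {
	for _, r := range splitTuples("(1,0,'Foo,_bar'),(2,0,'Baz\\'s');") {
		fmt.Printf("%q\n", r)
	}

	ids := []string{"10", "9", "2"}
	sortNumeric(ids)
	fmt.Println(ids)

	links := map[int][]int{3: {1}, 1: {2, 3}, 2: nil}
	for _, k := range sortedKeys(links) {
		fmt.Println(k, links[k])
	}
}
```

Walking the input one byte at a time with explicit quote state is what lets embedded commas and the closing `);` be handled without special cases. Writing records in sorted key order is what makes two runs produce byte-identical output.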
Just FYI: there is so much garbage in the link targets in particular, it’s amazing that sed is doing as well as it does!