Imprement yaml parser written by pure ruby instead of Psych#9352
Imprement yaml parser written by pure ruby instead of Psych#9352
Conversation
There was a problem hiding this comment.
Pull request overview
This PR replaces RubyGems' dependency on Psych (a C extension YAML library) with a pure Ruby YAML parser implementation to allow users to freely specify library versions without C extension constraints. The implementation provides a custom Gem::YAMLSerializer module that handles YAML serialization and deserialization for Gem specifications and related data structures.
Changes:
- Implemented a new pure Ruby YAML parser in
lib/rubygems/yaml_serializer.rbwith custom dump and load methods - Removed dependency on Psych library and deleted
lib/rubygems/psych_tree.rb - Updated all YAML serialization/deserialization call sites to use the new YAMLSerializer API
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 21 comments.
Show a summary per file
| File | Description |
|---|---|
| lib/rubygems/yaml_serializer.rb | Complete rewrite from simple stub to full YAML parser with support for Gem objects, arrays, hashes, strings, and Ruby object tags |
| lib/rubygems.rb | Changed to load yaml_serializer instead of psych and psych_tree |
| lib/rubygems/safe_yaml.rb | Updated to use YAMLSerializer.load instead of Psych.safe_load |
| lib/rubygems/specification.rb | Simplified to_yaml method to use YAMLSerializer.dump directly |
| lib/rubygems/package.rb | Changed to use YAMLSerializer.dump for checksums |
| lib/rubygems/package/old.rb | Changed exception handling from Psych::SyntaxError to StandardError |
| lib/rubygems/commands/specification_command.rb | Updated to use YAMLSerializer.dump for spec output |
| lib/rubygems/config_file.rb | Added defensive check for respond_to?(:empty?) |
| lib/rubygems/ext/cargo_builder.rb | Correctly switched from YAML to JSON for parsing cargo metadata |
| test/rubygems/test_gem_safe_yaml.rb | Added pend statements for Psych-specific tests (logic appears inverted) |
| test/rubygems/test_gem_package.rb | Updated to use YAMLSerializer.dump in tests |
| test/rubygems/test_gem_commands_owner_command.rb | Changed expected exception from Psych::DisallowedClass to ArgumentError |
| test/rubygems/helper.rb | Updated load_yaml helper to use YAMLSerializer.load |
| lib/rubygems/psych_tree.rb | Deleted (no longer needed) |
| Manifest.txt | Removed psych_tree.rb entry |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
9c2fe51 to
755fbc6
Compare
|
I read this blog post a few weeks ago, about a pure ruby implementation of a YAML parser/writer. Maybe relevant to this effort? This branch has a name very similar to that gem, so maybe they are already linked somehow? 😄 |
|
@sandstrom Yes, I read that article, and I thought about switching to |
|
@hsbt I understand, makes sense! 😄 Thanks for all your work on Ruby! 💎 |
a8b53a5 to
a57d889
Compare
8d63e95 to
0aadf09
Compare
7e4851e to
4b98e52
Compare
Add Gem.use_psych? and Gem.load_yaml branching so that YAMLSerializer is used by default, while Psych remains available via the use_psych config option in .gemrc or RUBYGEMS_USE_PSYCH environment variable. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add tests covering the full pure-Ruby YAML implementation: - Gem object serialization round-trips (dump and load) - YAML anchors and aliases (enabled and disabled) - Permitted classes and symbols validation - Real-world gemspec parsing (fileutils, rubygems-bundler) - Edge cases: empty requirements, Hash-to-Array normalization, rdoc_options conversion, flow notation, non-specific tags, comment-only documents, special character quoting
| str.include?(":") || str.include?("#") || str.include?("[") || str.include?("]") || | ||
| str.include?("{") || str.include?("}") || str.include?(",") |
There was a problem hiding this comment.
These should be faster with a single Regexp like str =~ /[:#\[\]{},]/ since they would walk the string only once instead of up to 7 times.
Or even better match? instead of =~ if that's available in the oldest Ruby version supported by RubyGems.
|
I just wanted to point this out in case anyone runs into issues with this, but the change was not backward compatible and it now fails to parse some things that it used to. |
|
Also this issue which seems to stem from the same change: #9387 |
|
Maybe this can be comprehensively tested by downloading all the gems ever published (probably TBs of data), parsing the yaml with both Psych and YAMLSerializer, and comparing the resulting Ruby objects. |
|
@dubek That's what I would do if I had unlimited money and time. |
|
I've tested to load Top 300 downloads and their dependencies at #9398. |
What was the end-user or developer problem that led to this PR?
We would like to allow users to freely specify the version, but the version of the libraries used internally like
Psychcannot be specified.Vendoring is workaround solution, But we can't vendor Psych because that is C extension.
What is your fix for the problem, implemented in this PR?
I tried to rewrite the minimum specification of YAML with Gemini and Claude.
Performance
YAMLSerializerdump is ~2x faster thanPsych. Load is ~1.6x slower due to pure Ruby parsing, but round-trip performance is nearly equivalent since the dump advantage largely offsets the load gap.Make sure the following tasks are checked