
Imprement yaml parser written by pure ruby instead of Psych #9352

Merged
hsbt merged 21 commits into master from pure-psych on Mar 9, 2026


@hsbt
Member

@hsbt hsbt commented Feb 25, 2026

What was the end-user or developer problem that led to this PR?

We would like users to be able to freely choose gem versions, but the versions of libraries that RubyGems uses internally, such as Psych, cannot be specified.

Vendoring is the usual workaround, but we can't vendor Psych because it is a C extension.

What is your fix for the problem, implemented in this PR?

I tried implementing a minimal subset of the YAML specification in pure Ruby, with the help of Gemini and Claude.

Performance

| Operation | YAMLSerializer | Psych | Ratio |
|---|---|---|---|
| Dump | 6,546 i/s (153μs) | 3,335 i/s (300μs) | YAMLSerializer 1.96x faster |
| Load | 2,458 i/s (407μs) | 3,857 i/s (259μs) | Psych 1.57x faster |
| Round-trip | 1,561 i/s (640μs) | 1,693 i/s (591μs) | Psych 1.08x faster (within margin) |

YAMLSerializer dump is ~2x faster than Psych. Load is ~1.6x slower due to pure Ruby parsing, but round-trip performance is nearly equivalent since the dump advantage largely offsets the load gap.
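As a quick sanity check on the ratios quoted above, the following sketch recomputes them from the iterations-per-second figures in the table. The `RESULTS` constant and the `speedup` helper are names made up for this illustration; they are not part of the PR.

```ruby
# Iterations per second, copied from the performance table above.
RESULTS = {
  "Dump"       => { yaml_serializer: 6_546, psych: 3_335 },
  "Load"       => { yaml_serializer: 2_458, psych: 3_857 },
  "Round-trip" => { yaml_serializer: 1_561, psych: 1_693 },
}.freeze

# Hypothetical helper: returns the faster implementation and how many
# times faster it is, rounded to two decimal places.
def speedup(yaml_serializer:, psych:)
  if yaml_serializer >= psych
    ["YAMLSerializer", (yaml_serializer.to_f / psych).round(2)]
  else
    ["Psych", (psych.to_f / yaml_serializer).round(2)]
  end
end

RESULTS.each do |operation, ips|
  winner, ratio = speedup(**ips)
  puts format("%-10s %s %.2fx faster", operation, winner, ratio)
end
```

The script only re-derives ratios from the published throughput numbers; it does not run either serializer.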

Make sure the following tasks are checked

Copilot AI review requested due to automatic review settings February 25, 2026 09:03
Contributor

Copilot AI left a comment


Pull request overview

This PR replaces RubyGems' dependency on Psych (a C extension YAML library) with a pure Ruby YAML parser implementation to allow users to freely specify library versions without C extension constraints. The implementation provides a custom Gem::YAMLSerializer module that handles YAML serialization and deserialization for Gem specifications and related data structures.

Changes:

  • Implemented a new pure Ruby YAML parser in lib/rubygems/yaml_serializer.rb with custom dump and load methods
  • Removed dependency on Psych library and deleted lib/rubygems/psych_tree.rb
  • Updated all YAML serialization/deserialization call sites to use the new YAMLSerializer API

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 21 comments.

| File | Description |
|---|---|
| lib/rubygems/yaml_serializer.rb | Complete rewrite from simple stub to full YAML parser with support for Gem objects, arrays, hashes, strings, and Ruby object tags |
| lib/rubygems.rb | Changed to load yaml_serializer instead of psych and psych_tree |
| lib/rubygems/safe_yaml.rb | Updated to use YAMLSerializer.load instead of Psych.safe_load |
| lib/rubygems/specification.rb | Simplified to_yaml method to use YAMLSerializer.dump directly |
| lib/rubygems/package.rb | Changed to use YAMLSerializer.dump for checksums |
| lib/rubygems/package/old.rb | Changed exception handling from Psych::SyntaxError to StandardError |
| lib/rubygems/commands/specification_command.rb | Updated to use YAMLSerializer.dump for spec output |
| lib/rubygems/config_file.rb | Added defensive check for respond_to?(:empty?) |
| lib/rubygems/ext/cargo_builder.rb | Correctly switched from YAML to JSON for parsing cargo metadata |
| test/rubygems/test_gem_safe_yaml.rb | Added pend statements for Psych-specific tests (logic appears inverted) |
| test/rubygems/test_gem_package.rb | Updated to use YAMLSerializer.dump in tests |
| test/rubygems/test_gem_commands_owner_command.rb | Changed expected exception from Psych::DisallowedClass to ArgumentError |
| test/rubygems/helper.rb | Updated load_yaml helper to use YAMLSerializer.load |
| lib/rubygems/psych_tree.rb | Deleted (no longer needed) |
| Manifest.txt | Removed psych_tree.rb entry |


@hsbt hsbt force-pushed the pure-psych branch 2 times, most recently from 9c2fe51 to 755fbc6 Compare February 25, 2026 11:59
@sandstrom

sandstrom commented Feb 26, 2026

I read this blog post a few weeks ago, about a pure ruby implementation of a YAML parser/writer. Maybe relevant to this effort?

This branch has a name very similar to that gem, so maybe they are already linked somehow? 😄

@hsbt
Member Author

hsbt commented Feb 28, 2026

@sandstrom Yes, I read that article, and I thought about switching to psych-pure, but this is the branch I'm trying out first to learn for myself.

@sandstrom

sandstrom commented Mar 2, 2026

@hsbt I understand, makes sense! 😄

Thanks for all your work on Ruby! 💎

@hsbt hsbt force-pushed the pure-psych branch 2 times, most recently from a8b53a5 to a57d889 Compare March 6, 2026 01:13
@hsbt hsbt force-pushed the pure-psych branch 3 times, most recently from 8d63e95 to 0aadf09 Compare March 6, 2026 05:32
@hsbt hsbt changed the title from "[PoC] Imprement yaml parser written by pure ruby instead of Psych" to "Imprement yaml parser written by pure ruby instead of Psych" Mar 6, 2026
@hsbt hsbt force-pushed the pure-psych branch 6 times, most recently from 7e4851e to 4b98e52 Compare March 9, 2026 05:35
hsbt and others added 18 commits March 9, 2026 16:07
Add Gem.use_psych? and Gem.load_yaml branching so that YAMLSerializer
is used by default, while Psych remains available via the use_psych
config option in .gemrc or RUBYGEMS_USE_PSYCH environment variable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
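
The commit message above describes a switch between the two backends, driven by a `use_psych` setting in .gemrc or the `RUBYGEMS_USE_PSYCH` environment variable. A rough sketch of that kind of branching is below. The `YamlBackendChoice` module, its method names, the accepted truthy values, and the rule that the environment variable takes precedence over the config are all assumptions made for illustration; the actual `Gem.use_psych?` logic is not shown in this thread.

```ruby
# Hypothetical stand-in for the branching described in the commit message.
# It does not reproduce the real Gem.use_psych? implementation.
module YamlBackendChoice
  # Assumption: which environment values count as "on".
  TRUTHY = %w[1 true yes].freeze

  # config: a Hash standing in for parsed .gemrc settings.
  # env:    a Hash-like object standing in for ENV.
  def self.use_psych?(config: {}, env: ENV)
    env_value = env["RUBYGEMS_USE_PSYCH"]
    # Assumption: the environment variable wins when it is set.
    return TRUTHY.include?(env_value.downcase) unless env_value.nil?

    config[:use_psych] == true
  end

  # Pure-Ruby serializer by default, Psych only on explicit opt-in.
  def self.backend_name(config: {}, env: ENV)
    use_psych?(config: config, env: env) ? :psych : :yaml_serializer
  end
end

puts YamlBackendChoice.backend_name(env: {})
puts YamlBackendChoice.backend_name(env: { "RUBYGEMS_USE_PSYCH" => "1" })
```

Passing `env` and `config` as arguments keeps the sketch testable without touching the real process environment.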
Add tests covering the full pure-Ruby YAML implementation:
- Gem object serialization round-trips (dump and load)
- YAML anchors and aliases (enabled and disabled)
- Permitted classes and symbols validation
- Real-world gemspec parsing (fileutils, rubygems-bundler)
- Edge cases: empty requirements, Hash-to-Array normalization,
  rdoc_options conversion, flow notation, non-specific tags,
  comment-only documents, special character quoting
@hsbt hsbt merged commit 016ac4f into master Mar 9, 2026
94 checks passed
@hsbt hsbt deleted the pure-psych branch March 9, 2026 07:59
Comment on lines +763 to +764
str.include?(":") || str.include?("#") || str.include?("[") || str.include?("]") ||
str.include?("{") || str.include?("}") || str.include?(",")
Member


These should be faster with a single Regexp like str =~ /[:#\[\]{},]/ since they would walk the string only once instead of up to 7 times.
Or even better match? instead of =~ if that's available in the oldest Ruby version supported by RubyGems.
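
To illustrate the suggestion, here is a minimal sketch that puts the chained `include?` checks next to a single character-class Regexp and confirms they agree. The method names and the `NEEDS_QUOTING` constant are invented for this example and are not taken from yaml_serializer.rb. `Regexp#match?` has been available since Ruby 2.4; whether that covers the oldest Ruby supported by RubyGems is the open question raised in the comment.

```ruby
# Single character class covering the same seven characters.
NEEDS_QUOTING = /[:#\[\]{},]/

# The original shape: up to seven separate scans of the string.
def special_chars_via_include?(str)
  str.include?(":") || str.include?("#") || str.include?("[") || str.include?("]") ||
    str.include?("{") || str.include?("}") || str.include?(",")
end

# The suggested shape: one scan, and match? avoids allocating MatchData.
def special_chars_via_regexp?(str)
  NEEDS_QUOTING.match?(str)
end

samples = ["plain", "key: value", "a#b", "[x]", "{y}", "a,b", ""]
samples.each do |s|
  puts format("%-12p include?=%-5s regexp=%s",
              s, special_chars_via_include?(s), special_chars_via_regexp?(s))
end
```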

@rwstauner
Contributor

rwstauner commented Mar 11, 2026

I just wanted to point this out in case anyone runs into issues with this, but the change was not backward compatible and it now fails to parse some things that it used to.
See oxidize-rb/rb-sys#714 for an example.

$ RBENV_VERSION=4.0.1 ruby -rpsych -rrubygems/safe_yaml -e 'p Gem::SafeYAML.safe_load(%({"hash":1}))'
{"hash" => 1}

$ RBENV_VERSION=ruby-mdev ruby -rrubygems/yaml_serializer -rrubygems/safe_yaml -e 'p Gem::SafeYAML.safe_load(%({"hash":1}))'
"{\"hash\":1}"

@rwstauner
Contributor

Also this issue which seems to stem from the same change: #9387
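
For context on the `{"hash":1}` report above: that input is a YAML flow mapping (and also valid JSON), which Psych parses into a Hash. The sketch below shows only the Psych side, using the standard `yaml` library, and compares it with the equivalent block-style document. It makes no claim about how YAMLSerializer behaves beyond what the report shows.

```ruby
require "yaml"

json_style  = '{"hash":1}'  # flow mapping, as in the report above
block_style = "hash: 1\n"   # the same data in block style

flow_result  = YAML.safe_load(json_style)
block_result = YAML.safe_load(block_style)

p flow_result
p block_result
puts "same data: #{flow_result == block_result}"
```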

@dubek

dubek commented Mar 12, 2026

Maybe this can be comprehensively tested by downloading all the gems ever published (probably TBs of data), parsing the yaml with both Psych and YAMLSerializer, and comparing the resulting Ruby objects.
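
A small-scale version of this idea could look like the sketch below: run every document through a reference loader and a candidate loader, then collect the inputs where the results differ. The `compare_loaders` and `safely` helpers are hypothetical, and the `naive` lambda is only a stand-in that mimics the flow-mapping behaviour reported earlier; a real run would plug in the actual parser under test.

```ruby
require "yaml"

# Run a block and return the exception class instead of raising,
# so that "both loaders fail the same way" counts as agreement.
def safely
  yield
rescue StandardError => e
  e.class
end

# Return one entry per document on which the two loaders disagree.
def compare_loaders(documents, reference:, candidate:)
  documents.each_with_object([]) do |doc, mismatches|
    expected = safely { reference.call(doc) }
    actual   = safely { candidate.call(doc) }
    next if expected == actual

    mismatches << { input: doc, expected: expected, actual: actual }
  end
end

reference = ->(doc) { YAML.safe_load(doc) }
# Stand-in candidate (assumption): returns flow mappings as raw strings.
naive = ->(doc) { doc.start_with?("{") ? doc : YAML.safe_load(doc) }

documents  = ["name: demo\n", "- a\n- b\n", '{"hash":1}']
mismatches = compare_loaders(documents, reference: reference, candidate: naive)
mismatches.each { |m| p m }
```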

@hsbt
Member Author

hsbt commented Mar 12, 2026

@dubek That's what I would do if I had unlimited money and time.

@hsbt
Member Author

hsbt commented Mar 16, 2026

I've tested loading the top 300 most-downloaded gems and their dependencies in #9398.


6 participants