Here's a simple program that parses the RubyGems index, which I've used successfully with rubymarshal 1.0.3:
#!/usr/bin/env python3
import gzip
import requests
import rubymarshal.reader
data = requests.get('https://api.rubygems.org/latest_specs.4.8.gz').content
data = gzip.decompress(data)
for (name, ver, gemplat), _ in zip(rubymarshal.reader.loads(data), range(10)):
    print(name, ver, gemplat)

Its output with 1.0.3:
_ UsrMarshal:Gem::Version(['1.4']) ruby
- UsrMarshal:Gem::Version(['1']) b'ruby'
0mq UsrMarshal:Gem::Version(['0.5.3']) b'ruby'
0xdm5 UsrMarshal:Gem::Version(['0.1.0']) b'ruby'
0xffffff UsrMarshal:Gem::Version(['0.1.0']) b'ruby'
10to1-crack UsrMarshal:Gem::Version(['0.1.3']) b'ruby'
1234567890_ UsrMarshal:Gem::Version(['1.2']) b'ruby'
12_hour_time UsrMarshal:Gem::Version(['0.0.4']) b'ruby'
16watts-fluently UsrMarshal:Gem::Version(['0.3.1']) b'ruby'
189seg UsrMarshal:Gem::Version(['0.0.1']) b'ruby'
With 1.2.6 it looks like this:
_ UsrMarshal({}) ruby
- UsrMarshal({}) ruby
0mq UsrMarshal({}) ruby
0xdm5 UsrMarshal({}) ruby
0xffffff UsrMarshal({}) ruby
10to1-crack UsrMarshal({}) ruby
1234567890_ UsrMarshal({}) ruby
12_hour_time UsrMarshal({}) ruby
16watts-fluently UsrMarshal({}) ruby
189seg UsrMarshal({}) ruby
The nice thing is that the unicode problem is gone, but the bad thing is that the custom object is no longer parsed.
At the very least, this requires a major version bump.
Next, the documentation is unclear or wrong about how this can be parsed now. Changing the program the way an example suggests:
#!/usr/bin/env python3
import gzip
import requests
import rubymarshal.reader
from rubymarshal.classes import RubyObject, registry
data = requests.get('https://api.rubygems.org/latest_specs.4.8.gz').content
data = gzip.decompress(data)
class GemVersion(RubyObject):
    ruby_class_name = "Gem::Version"
registry.register(GemVersion)
for (name, ver, gemplat), _ in zip(rubymarshal.reader.loads(data), range(10)):
    print(name, ver, gemplat)

doesn't change a thing.
In fact, this cannot work (at least with this data file), because ClassRegistry keys registered classes by their names as strs, while Reader.read reads the class name as Symbol("Gem::Version"). The Symbol hashes differently from the str, so self.registry.get(class_name, UsrMarshal) never finds the registered class and always returns UsrMarshal.
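To illustrate the mismatch in isolation, here is a minimal sketch that does not import rubymarshal at all. The Symbol, UsrMarshal and GemVersion classes and the plain dict standing in for the registry are hypothetical stand-ins written for this example, and I'm assuming the symbol object does not hash or compare equal to its str name, as described above:

```python
class Symbol:
    """Hypothetical stand-in for a parsed Ruby symbol (not rubymarshal's class).

    It defines no __hash__ or __eq__, so it never matches a str dict key.
    """

    def __init__(self, name):
        self.name = name


class UsrMarshal:
    """Placeholder for the generic fallback class."""


class GemVersion:
    """Placeholder for the custom class registered for Gem::Version."""


# Assumption: the registry is keyed by plain str class names.
registry = {"Gem::Version": GemVersion}

# The reader hands over a Symbol rather than a str.
class_name = Symbol("Gem::Version")

# Lookup by Symbol misses and falls back to the default.
missed = registry.get(class_name, UsrMarshal)

# Lookup by the underlying str name hits the registered class.
found = registry.get(class_name.name, UsrMarshal)

print(missed.__name__)
print(found.__name__)
```

If the real lookup behaves like this, converting the symbol to its str name before the lookup (or registering under a key that compares equal to the symbol) would presumably be enough, but I haven't checked how the library maintainers would prefer to fix it.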
I've worked around this by using ver.marshal_dump() instead, but I don't think that's the correct solution.