Regression and/or lack of documentation: cannot parse version from rubygems index #3

@AMDmi3

Here's a simple program to parse the rubygems index, which I've successfully used with rubymarshal 1.0.3:

#!/usr/bin/env python3
  
import gzip
import requests
import rubymarshal.reader

data = requests.get('https://api.rubygems.org/latest_specs.4.8.gz').content
data = gzip.decompress(data)

for (name, ver, gemplat), _ in zip(rubymarshal.reader.loads(data), range(10)):
    print(name, ver, gemplat)

Its output with 1.0.3:

_ UsrMarshal:Gem::Version(['1.4']) ruby
- UsrMarshal:Gem::Version(['1']) b'ruby'
0mq UsrMarshal:Gem::Version(['0.5.3']) b'ruby'
0xdm5 UsrMarshal:Gem::Version(['0.1.0']) b'ruby'
0xffffff UsrMarshal:Gem::Version(['0.1.0']) b'ruby'
10to1-crack UsrMarshal:Gem::Version(['0.1.3']) b'ruby'
1234567890_ UsrMarshal:Gem::Version(['1.2']) b'ruby'
12_hour_time UsrMarshal:Gem::Version(['0.0.4']) b'ruby'
16watts-fluently UsrMarshal:Gem::Version(['0.3.1']) b'ruby'
189seg UsrMarshal:Gem::Version(['0.0.1']) b'ruby'

With 1.2.6 it looks like this:

_ UsrMarshal({}) ruby
- UsrMarshal({}) ruby
0mq UsrMarshal({}) ruby
0xdm5 UsrMarshal({}) ruby
0xffffff UsrMarshal({}) ruby
10to1-crack UsrMarshal({}) ruby
1234567890_ UsrMarshal({}) ruby
12_hour_time UsrMarshal({}) ruby
16watts-fluently UsrMarshal({}) ruby
189seg UsrMarshal({}) ruby

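The nice thing is that the unicode problem is gone (the platform is now a str instead of bytes), but the bad thing is that the custom Gem::Version object is no longer parsed: its version string is missing from the output.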

At the very least, this requires a major version bump.

Next, the documentation is unclear or wrong about how this can be parsed now. Changing the program the way an example suggests:

#!/usr/bin/env python3
  
import gzip
import requests
import rubymarshal.reader
from rubymarshal.classes import RubyObject, registry


data = requests.get('https://api.rubygems.org/latest_specs.4.8.gz').content
data = gzip.decompress(data)

class GemVersion(RubyObject):
    ruby_class_name = "Gem::Version"

registry.register(GemVersion)

for (name, ver, gemplat), _ in zip(rubymarshal.reader.loads(data), range(10)):
    print(name, ver, gemplat)

doesn't change a thing.

In fact, this cannot work (at least with this data file), because ClassRegistry stores class names as strs, but Reader.read reads the class name as Symbol("Gem::Version"), which hashes differently from the str "Gem::Version", so self.registry.get(class_name, UsrMarshal) always returns UsrMarshal.
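
To illustrate the lookup mismatch, here is a minimal sketch that does not use rubymarshal at all. The Symbol, UsrMarshal and GemVersion classes and the plain-dict registry below are hypothetical stand-ins of mine, not the library's actual code; they only assume that the registry is keyed by str and that a symbol object does not compare equal to a str:

```python
# Hypothetical stand-ins for illustration only; NOT rubymarshal's real classes.
class Symbol:
    """Minimal symbol wrapper that is only equal to other Symbol objects."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, Symbol) and self.name == other.name

    def __hash__(self):
        return hash(("Symbol", self.name))

    def __repr__(self):
        return f"Symbol({self.name!r})"


class UsrMarshal:
    """Stand-in for the generic fallback class."""


class GemVersion:
    """Stand-in for the user-registered class."""


# Assumption: the registry maps str class names to Python classes.
registry = {"Gem::Version": GemVersion}

# The reader hands over a Symbol, not a str.
class_name = Symbol("Gem::Version")

# Lookup by Symbol misses the str key and falls back to the default.
missed = registry.get(class_name, UsrMarshal)

# Lookup by the underlying str finds the registered class.
found = registry.get(class_name.name, UsrMarshal)

print(missed.__name__)  # UsrMarshal
print(found.__name__)   # GemVersion
```

If the real cause is the same as in this sketch, converting the symbol to its str name before the registry lookup (or registering classes under both key types) would probably be enough, but I haven't checked which of the two fits the library better.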

I've worked around this by using ver.marshal_dump() instead, but I don't think it's the correct solution.
