Skip to content

Malformed HTTP headers lead to "ValueError: need more than 1 value to unpack" crash #19

@JustAnotherArchivist

Description

@JustAnotherArchivist

I have a WARC which contains an HTTP response whose headers are malformed. Specifically, it's from http://www.assoc-amazon.com/s/link-enhancer?tag=discount039-20&o=1 and this is the data returned:

HTTP/1.1 302 
Content-Type: text/html
nnCoection: close
Content-Length: 0
Location: //wms.assoc-amazon.com/20070822/US/js/link-enhancer-common.js?tag=discount039-20
Cache-Control: no-cache
Pragma: no-cache

More precisely, in Python repr notation:

b'HTTP/1.1 302 \nContent-Type: text/html\nnnCoection: close\nContent-Length: 0\nLocation: //wms.assoc-amazon.com/20070822/US/js/link-enhancer-common.js?tag=discount039-20\nCache-Control: no-cache\nPragma: no-cache\n\n'

There are several issues with this response, but the main one and the one causing trouble is that the line endings are just LF, not CRLF. This causes warcat verify to crash with the following traceback:

Traceback (most recent call last):
  File "/usr/lib/python3.4/runpy.py", line 170, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.4/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File ".../lib/python3.4/site-packages/warcat/__main__.py", line 154, in <module>
    main()
  File ".../lib/python3.4/site-packages/warcat/__main__.py", line 70, in main
    command_info[1](args)
  File ".../lib/python3.4/site-packages/warcat/__main__.py", line 136, in verify_command
    tool.process()
  File ".../lib/python3.4/site-packages/warcat/tool.py", line 95, in process
    check_block_length=self.check_block_length)
  File ".../lib/python3.4/site-packages/warcat/model/warc.py", line 75, in read_record
    check_block_length=check_block_length)
  File ".../lib/python3.4/site-packages/warcat/model/record.py", line 68, in load
    content_type)
  File ".../lib/python3.4/site-packages/warcat/model/block.py", line 21, in load
    field_cls=HTTPHeader)
  File ".../lib/python3.4/site-packages/warcat/model/block.py", line 92, in load
    fields = field_cls.parse(file_obj.read(field_length).decode())
  File ".../lib/python3.4/site-packages/warcat/model/field.py", line 215, in parse
    http_headers.status, s = s.split(newline, 1)
ValueError: need more than 1 value to unpack

Metadata

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions