Bug/support BOM in xml feeds by Alek050 · Pull Request #540 · PySport/kloppy

Alek050 · 2026-02-24T10:23:19Z

Suggested fix for BOMs (byte order marks) in XML feeds.

closes #539

probberechts · 2026-02-24T10:28:56Z

Thanks for looking into this! Would it be possible to use the utf-8-sig encoding instead? If that works, I’d prefer it over the current workaround, as it feels a bit hacky.

Alek050 · 2026-02-24T10:36:01Z

@probberechts I tried that first, but it raised the same error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xef in position 0: unexpected end of data
decoding with 'utf-8-sig' codec failed

probberechts · 2026-02-24T10:49:20Z

That's odd. It seems to work fine for me. Here is a minimal example:

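import io

# UTF-8 byte order mark (3 bytes) followed by a minimal XML document
bom = b"\xef\xbb\xbf"
xml_content = bom + b'<?xml version="1.0"?><root/>'
feed = io.BytesIO(xml_content)
# Read the 3 BOM bytes plus the first content byte; utf-8-sig strips the BOM
first_char = feed.read(4).decode("utf-8-sig")
assert first_char == '<'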

Am I missing something?

Alek050 · 2026-02-24T11:57:58Z

Hahaha no, my bad.

I stupidly tested feed.read(1).decode("utf-8-sig") instead of feed.read(4). I changed the code.
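
For reference, a minimal sketch of why the 1-byte read fails while the 4-byte read works. The helper name decode_head is hypothetical and not part of kloppy; it only reuses the bytes from the example above.

```python
import io

# Same payload as in the example above: UTF-8 BOM followed by a small XML document.
xml_content = b"\xef\xbb\xbf" + b'<?xml version="1.0"?><root/>'


def decode_head(n_bytes: int) -> str:
    """Decode the first n_bytes of the feed with utf-8-sig (hypothetical helper)."""
    feed = io.BytesIO(xml_content)
    return feed.read(n_bytes).decode("utf-8-sig")


# 1 byte: b"\xef" is only a fragment of the BOM, so the codec cannot strip it
# and the lone byte is not valid UTF-8 on its own.
try:
    decode_head(1)
except UnicodeDecodeError as exc:
    error = exc

# 4 bytes: the complete 3-byte BOM plus the first content byte.
first_char = decode_head(4)
```

Here error holds the UnicodeDecodeError raised for the truncated BOM, and first_char is "<".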

The only assumption is that the 4th byte is now the first byte of the XML string. This holds for UTF-8, where the BOM is 3 bytes; for UTF-16 and UTF-32, however, the BOM is 2 and 4 bytes respectively. But I guess we can keep it like this, because .decode("utf-8-sig") will not work on those encodings anyway (?).
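
To make the BOM lengths concrete, here is a rough sketch of how a feed's encoding could be sniffed from its first bytes, should UTF-16 or UTF-32 ever need support. The BOMS table and the sniff_bom function are assumptions for illustration only; this is not what the PR implements.

```python
import codecs

# Longest BOMs first: BOM_UTF32_LE starts with the same two bytes as BOM_UTF16_LE,
# so the UTF-32 entries must be checked before the UTF-16 ones.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(head: bytes):
    """Return (encoding, bom_length) for the leading bytes of a feed.

    Hypothetical helper: falls back to plain UTF-8 when no BOM is present.
    """
    for bom, encoding in BOMS:
        if head.startswith(bom):
            return encoding, len(bom)
    return "utf-8", 0


utf8_result = sniff_bom(codecs.BOM_UTF8 + b"<root/>")
utf16_result = sniff_bom("<root/>".encode("utf-16"))  # encoder prepends a BOM
utf32_result = sniff_bom("<root/>".encode("utf-32"))  # encoder prepends a BOM
plain_result = sniff_bom(b"<root/>")
```

The returned encoding names ("utf-8-sig", "utf-16", "utf-32") all consume the BOM when used with bytes.decode, so the caller would not need to skip it manually.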

Alek050 added 3 commits February 24, 2026 11:06

Added support for DOM in xml feeds

b713942

Changed to 5 bytes instead of 10

a0afc23

now also for statsperform

0a0309a

Updated to use utf-8-sig

ec14b7e

Alek050 wants to merge 4 commits into PySport:master from Alek050:bug/support_BOM_in_xml_feeds