
Bug/support BOM in xml feeds #540

Open
Alek050 wants to merge 4 commits into PySport:master from Alek050:bug/support_BOM_in_xml_feeds

Alek050 commented Feb 24, 2026

Suggested fix for BOMs in XML feeds.

closes #539

@probberechts
Contributor

Thanks for looking into this! Would it be possible to use the utf-8-sig encoding instead? If that works, I’d prefer it over the current workaround, which feels a bit hacky.

@Alek050
Author

Alek050 commented Feb 24, 2026

@probberechts I tried that first, but it raised the same error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xef in position 0: unexpected end of data
decoding with 'utf-8-sig' codec failed

@probberechts
Contributor

That's odd. It seems to work fine for me. Here is a minimal example:

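import io

bom = b"\xef\xbb\xbf"  # UTF-8 byte order mark (3 bytes)
xml_content = bom + b'<?xml version="1.0"?><root/>'
feed = io.BytesIO(xml_content)
# Read the 3 BOM bytes plus the first byte of the XML; utf-8-sig drops the BOM.
first_char = feed.read(4).decode("utf-8-sig")
assert first_char == "<"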

Am I missing something?

@Alek050
Author

Alek050 commented Feb 24, 2026

Hahaha no, my bad.

I stupidly tested feed.read(1).decode("utf-8-sig") instead of feed.read(4). I changed the code.
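
For reference, the difference between the two reads can be reproduced with a small sketch. It reuses the sample bytes from the minimal example above; the variable names are illustrative only and are not taken from the PR's code.

```python
import io

BOM_UTF8 = b"\xef\xbb\xbf"
xml_content = BOM_UTF8 + b'<?xml version="1.0"?><root/>'

# One byte is only the first third of the BOM, so the decoder sees an
# incomplete sequence and raises UnicodeDecodeError.
try:
    io.BytesIO(xml_content).read(1).decode("utf-8-sig")
    one_byte_failed = False
except UnicodeDecodeError:
    one_byte_failed = True

# Four bytes cover the complete 3-byte BOM plus the first XML character.
first_char = io.BytesIO(xml_content).read(4).decode("utf-8-sig")
```

With a single byte the BOM is never seen in full, which matches the "unexpected end of data" message quoted earlier.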

The only assumption is that the 4th byte is the first byte of the XML string. This holds for UTF-8, whose BOM is 3 bytes long; for UTF-16 and UTF-32 the BOM is 2 and 4 bytes long respectively. But I guess we can keep it like this, because .decode("utf-8-sig") will not work on the other encodings anyway (?).
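
To make the BOM lengths concrete, here is a rough sketch built on the constants in Python's standard codecs module. The helper split_bom is hypothetical and is not part of this PR or of the library; it only illustrates how the other encodings could be told apart if that were ever needed.

```python
import codecs

# Check the 4-byte BOMs first: the UTF-32-LE BOM (ff fe 00 00) starts
# with the UTF-16-LE BOM (ff fe).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def split_bom(data: bytes):
    """Hypothetical helper: return (encoding or None, data without its BOM)."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, data[len(bom):]
    return None, data


bom_lengths = {name: len(bom) for bom, name in _BOMS}
encoding, payload = split_bom(codecs.BOM_UTF16_LE + "<root/>".encode("utf-16-le"))
first_char = payload.decode(encoding)[0]
```

This assumes the payload starts with an XML character such as "<". A UTF-16-LE stream whose first character is U+0000 would be misread as UTF-32-LE, which should not occur in a well-formed XML feed.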

Development

Successfully merging this pull request may close these issues.

Support Byte Order Marks in XML feeds