Clean HTML tags from title #443

Open
danielfleischer wants to merge 2 commits into skeeto:master from danielfleischer:clean-title

danielfleischer commented Sep 15, 2021

Remove HTML tags of the form < ... > and character entities of the form & ... ; from entry titles.

These sometimes appear in titles, for example in the Arts & Letters Daily feed.

Based on some code in #365.
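
To illustrate the kind of cleanup described above, here is a minimal sketch in Python. It is not the code from this pull request (which is Emacs Lisp); the function name `strip_markup` and both regular expressions are assumptions made for the example.

```python
import re

# Hypothetical patterns for illustration; the PR's actual code may differ.
TAG_RE = re.compile(r"<[^>]*>")        # anything of the form < ... >
ENTITY_RE = re.compile(r"&[^;\s]*;")   # anything of the form & ... ;


def strip_markup(title: str) -> str:
    """Delete tags and entities, then collapse runs of whitespace."""
    without_tags = TAG_RE.sub("", title)
    without_entities = ENTITY_RE.sub("", without_tags)
    return " ".join(without_entities.split())


print(strip_markup("<em>Hello</em> world"))
print(strip_markup("Arts &amp; Letters"))
```

Note that the second call drops the ampersand entirely instead of turning it back into a character, so information in the title can be lost with this approach.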

sinic (Contributor) commented Jan 15, 2022

Character entities, at least, might be an integral part of the title, whether they stand for single characters or the title itself is talking about HTML entities. Simply removing them won't do.

My pull request at #452 converts them to characters instead, but only for feeds that declare titles to be in HTML. The Arts & Letters feed doesn't, unfortunately. For that one, you could adapt the workaround I just posted.
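
The idea of converting entities only when the feed declares an HTML title could look roughly like the Python sketch below. This is an illustration, not the code from #452: the function name `decode_title` and the `title_type` parameter (standing in for the feed's declared title type) are assumptions.

```python
import html


def decode_title(title: str, title_type: str = "text") -> str:
    """Convert character entities to characters, but only for HTML titles.

    title_type is a hypothetical stand-in for whatever the feed declares;
    anything other than "html" is returned unchanged.
    """
    if title_type == "html":
        return html.unescape(title)
    return title


print(decode_title("Arts &amp; Letters", "html"))
print(decode_title("Arts &amp; Letters"))
```

With this approach a title that is itself about entities survives: `decode_title("The &amp;lt; entity", "html")` yields the text `The &lt; entity` instead of losing the entity. A feed that does not declare its title as HTML is left untouched, which is why such a feed would still need a separate workaround.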
