Handle character entities in HTML titles #452

Open
sinic wants to merge 1 commit into skeeto:master from sinic:master
sinic commented Jan 15, 2022

There has already been some discussion about this in issues #365 and #69. A general solution should also do something about HTML tags, but as a stopgap measure I suggest simply replacing a subset of the HTML character entities with the corresponding characters whenever the title of an Atom entry has an HTML type.

While technically not complete, this solves all issues for my current selection of feeds, and should be strictly more correct than doing nothing.

Atom titles with type "html" or "xhtml" might contain arbitrary HTML
content, but most commonly they only include HTML/XML entities, and
typically only numeric references.
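
To make the stopgap concrete, here is a minimal sketch of the kind of substitution described above. It is not the code from this pull request: the names `my/html-entity-alist` and `my/decode-html-entities` are hypothetical, and the set of named entities is an assumption chosen for illustration. Numeric references are decoded generally, and anything unrecognized is left untouched.

```elisp
;; Hypothetical helper, not part of Elfeed or of this pull request.
(defvar my/html-entity-alist
  '(("amp" . "&") ("lt" . "<") ("gt" . ">") ("quot" . "\"") ("apos" . "'"))
  "Assumed subset of named HTML entities to decode.")

(defun my/decode-html-entities (string)
  "Return STRING with numeric and a few named character references decoded.
References that are unknown or out of range are left as they are."
  (replace-regexp-in-string
   "&\\(?:#\\([0-9]+\\)\\|#[xX]\\([0-9a-fA-F]+\\)\\|\\([a-zA-Z][a-zA-Z0-9]*\\)\\);"
   (lambda (match)
     ;; Inside this function the match data refers to MATCH itself.
     (let ((dec (match-string 1 match))
           (hex (match-string 2 match))
           (name (match-string 3 match)))
       (cond
        ;; Decimal or hexadecimal numeric reference.
        ((or dec hex)
         (let ((code (if dec
                         (string-to-number dec)
                       (string-to-number hex 16))))
           (if (characterp code) (string code) match)))
        ;; Named reference: decode only if it is in the assumed subset.
        (t (or (cdr (assoc name my/html-entity-alist)) match)))))
   ;; FIXEDCASE and LITERAL are both t so the replacement is inserted verbatim.
   string t t))
```

Calling `(my/decode-html-entities "Fish &amp; Chips")` would decode the ampersand, while a reference such as `&foo;` passes through unchanged. Tags inside the title are deliberately not handled, in line with the stopgap nature of the proposal.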

sinic commented Jan 15, 2022

Character entities in titles without HTML type are still a problem, of course. Users of such feeds might want to tag them as broken and do something like the following:

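(defun fix-entities (tag entry)
  "If ENTRY carries TAG, remove TAG and decode XML entities in ENTRY's title."
  (when (elfeed-tagged-p tag entry)
    (elfeed-untag entry tag)
    ;; Store the decoded title in the entry's :title metadata.
    (setf (elfeed-meta entry :title)
          (xml-substitute-special (elfeed-entry-title entry)))))
(add-hook 'elfeed-new-entry-hook (apply-partially #'fix-entities 'broken))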
