Description
Some Unicode entities are unsupported in adsrefpipe/refparsers/unicode.py, in particular the entity for the parrot emoji (🦜 in the error trace below). Support needs to be extended to other characters, error handling should be improved, or both.
Traceback (most recent call last):
  File "/app/adsrefpipe/refparsers/unicode.py", line 222, in __sub_hexnumasc_entity
    if self.unicode[entno]:
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run.py", line 323, in <module>
    process_files(source_filenames)
  File "run.py", line 107, in process_files
    parsed_references = toREFs.process_and_dispatch()
  File "/app/adsrefpipe/refparsers/arXivTXT.py", line 80, in process_and_dispatch
    reference = self.cleanup(raw_reference)
  File "/app/adsrefpipe/refparsers/arXivTXT.py", line 61, in cleanup
    reference = unicode_handler.ent2asc(reference)
  File "/app/adsrefpipe/refparsers/unicode.py", line 171, in ent2asc
    result = self.re_hexnumentity.sub(self.__sub_hexnumasc_entity, result)
  File "/app/adsrefpipe/refparsers/unicode.py", line 227, in __sub_hexnumasc_entity
    raise UnicodeHandlerError('Unknown hexadecimal entity: %s' % match.group(0))
adsrefpipe.refparsers.unicode.UnicodeHandlerError: Unknown hexadecimal entity: 🦜
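
One possible direction, sketched below, is to bounds-check the table lookup and fall back to a best-effort ASCII form instead of raising. This is a minimal illustration and not the actual code in unicode.py: the regex `RE_HEXNUMENTITY`, the functions `sub_hexnum_entity` and `ent2asc_sketch`, and the list-shaped `table` argument are all hypothetical stand-ins. Silently dropping characters that have no ASCII equivalent (such as emoji) is an assumed policy that the maintainers may not want.

```python
import re
import unicodedata

# Hypothetical pattern; the real re_hexnumentity in unicode.py may differ.
RE_HEXNUMENTITY = re.compile(r"&#x([0-9a-fA-F]+);")


def sub_hexnum_entity(match, table):
    """Replace one hexadecimal entity, tolerating code points beyond the table.

    `table` is assumed to be a list indexed by code point that holds an
    ASCII replacement string, or None where no replacement is defined.
    """
    entno = int(match.group(1), 16)

    # Bounds check first, so a large code point never raises IndexError.
    if entno < len(table) and table[entno]:
        return table[entno]

    try:
        char = chr(entno)
    except (ValueError, OverflowError):
        # Not a valid Unicode code point at all: drop it (assumed policy).
        return ""

    # Best-effort fallback: decompose the character and keep only the ASCII
    # part. Accented letters keep their base letter; emoji become "".
    decomposed = unicodedata.normalize("NFKD", char)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def ent2asc_sketch(text, table):
    """Apply the tolerant replacement to every hexadecimal entity in text."""
    return RE_HEXNUMENTITY.sub(lambda m: sub_hexnum_entity(m, table), text)
```

With a table that only covers the first 256 code points, `ent2asc_sketch("caf&#xE9; &#x1F99C;!", table)` would map the parrot entity to an empty string rather than aborting the whole run. If losing characters silently is not acceptable, the fallback branch could log the unknown entity before dropping it.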