fix goodreads parsing by DerBeutlin · Pull Request #3 · goderich/org-books

DerBeutlin · 2022-01-23T19:27:04Z

Parsing a book from Goodreads fails for me with the error "<URL> not understood".

I used git bisect to identify 2dab795 as the first bad commit and reverted part of it, which fixed the problem for me.

I also tested LibraryThing with my version and it seems to work as well.

Unfortunately, the unit tests are broken for me, so I couldn't verify the functionality.

Feel free to close if inappropriate.

Revert partly Add LibraryThing series scraping capability (2dab795)

goderich · 2022-01-24T13:16:21Z

Hi @DerBeutlin! Thanks for the PR!

I'm surprised Goodreads works for you at all. I haven't been able to use it for a while. (That's actually why I added the LibraryThing and OpenLibrary scrapers.) They seem to have deliberately updated their website to make it more difficult to scrape. Half the time I get no response from enlive-fetch at all, and the other half it's unparseable junk. I just tried again, using both Elisp and Python, and couldn't get it to work. The HTML I get using enlive/requests is not what I see in the browser.
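One way to make "not what I see in the browser" concrete is to summarize a fetched page and compare it with the same page saved from the browser. The following is only a rough sketch using Python's standard library: `TagCounter`, `summarize`, `looks_like_stub`, and the `min_tags` threshold are hypothetical names and values chosen for illustration, and nothing here reflects the actual structure of a Goodreads page.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Collect the <title> text and count start tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased from HTMLParser.
        self.counts[tag] = self.counts.get(tag, 0) + 1
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def summarize(html):
    """Return the page title and a per-tag count for an HTML string."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "counts": parser.counts}


def looks_like_stub(html, min_tags=20):
    """Heuristic only: a page with very few tags is probably not the full page.

    The threshold is an arbitrary assumption, not a measured value.
    """
    return sum(summarize(html)["counts"].values()) < min_tags
```

Running `summarize` on both the scraped HTML and a copy saved from the browser would show whether the scraper received a stripped-down page or a differently structured one.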

The code that you reverted was changed to allow a whole series to be parsed from LibraryThing at once, using just the series link. At this point it seems like a choice between Goodreads and LibraryThing, but Goodreads doesn't work for me.

Unless of course you can see a better way to handle scraping a series. Mine is pretty hacky, I'll admit, but hey, it works.

goderich · 2022-08-24T13:31:13Z

Hi @DerBeutlin! I've recently refactored a lot of the Goodreads parsing code. Does the new version work for you, or do you get the same error? (Or maybe a different one?)

DerBeutlin · 2022-09-04T14:21:29Z

I no longer get an error. However, apart from the title, nothing gets parsed: the author field, for example, is empty, and no additional details are filled in. I'm not sure whether this is expected.
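A partial parse like this fails silently, so a small check on the parsed record can help surface it. The sketch below is in Python and is purely illustrative: `missing_fields` and the field names `title`, `author`, and `year` are assumptions, not names taken from org-books.

```python
def missing_fields(book, required=("title", "author", "year")):
    """Return the required field names that are absent or blank in `book`.

    `book` is a plain dict standing in for whatever a scraper returns;
    the default field names are hypothetical.
    """
    return [
        name
        for name in required
        if not str(book.get(name) or "").strip()
    ]


# A record where only the title was parsed.
partial = {"title": "Some Title", "author": ""}
print(missing_fields(partial))
```

Reporting the missing names instead of saving an incomplete entry would turn the behaviour described above into an explicit warning.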

goderich · 2022-11-28T08:06:28Z

Not really, no. The problem is, Goodreads is very difficult to parse. They appear to either change the structure of their site periodically, or else use other shenanigans that make reliable scraping frustrating.

I'm considering dropping Goodreads support from my fork entirely because trying to keep up seems an exercise in futility at this point.

fix goodreads parsing

df249ff

DerBeutlin wants to merge 1 commit into goderich:master from DerBeutlin:fix/goodreads
