Revert partly Add LibraryThing series scraping capability (2dab795)
|
Hi @DerBeutlin! Thanks for the PR! I'm surprised Goodreads works for you at all. I haven't been able to use it for a while. (That's actually why I added the LibraryThing and OpenLibrary scrapers.) Goodreads seems to have deliberately updated its website to make it more difficult to scrape. Half the time I get no response. The code that you reverted was changed to allow a whole series to be parsed from LibraryThing at once, using just the series link. At this point it seems like a choice between Goodreads and LibraryThing, but Goodreads doesn't work for me. Unless, of course, you can see a better way to handle scraping a series. Mine is pretty hacky, I'll admit, but hey, it works. |
|
Hi @DerBeutlin! I've recently refactored a lot of the Goodreads parsing code. Does the new version work for you, or do you get the same error? (Or maybe a different one?) |
|
I no longer get an error. However, apart from the title, nothing gets parsed: the author field, for example, is empty, and no additional details are parsed. I'm not sure if this is expected. |
|
Not really, no. The problem is, Goodreads is very difficult to parse. They appear to either change the structure of their site periodically, or else use other shenanigans that make reliable scraping frustrating. I'm considering dropping Goodreads support from my fork entirely because trying to keep up seems an exercise in futility at this point. |
Parsing a book from Goodreads fails for me with
<URL> not understood
I used git bisect to identify 2dab795 as the first bad commit and reverted part of it, which fixed it for me.
I also tested LibraryThing with my version and it seems to work as well.
Unfortunately, the unit tests are broken for me, so I couldn't verify the functionality.
Feel free to close if inappropriate.
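
For context on the `<URL> not understood` failure, here is a minimal sketch of how a scraper might pick a site-specific parser from a URL's hostname. This is not the project's actual code: `PARSERS` and `pick_parser` are hypothetical names, and the real dispatch logic may look quite different. It only shows how a URL that matches no known site could end up as a "not understood" error.

```python
from urllib.parse import urlparse

# Hypothetical mapping from site domain to a parser label (assumption,
# not taken from the project).
PARSERS = {
    "goodreads.com": "goodreads",
    "librarything.com": "librarything",
    "openlibrary.org": "openlibrary",
}


def pick_parser(url: str) -> str:
    """Return the label of the parser responsible for `url`."""
    # hostname is None for strings that are not absolute URLs.
    host = urlparse(url).hostname or ""
    for domain, name in PARSERS.items():
        # Accept the bare domain as well as subdomains such as "www.".
        if host == domain or host.endswith("." + domain):
            return name
    # No known site matched: report the URL as not understood.
    raise ValueError(f"{url} not understood")
```

With a dispatch like this, a change that handles one site's series links differently can unintentionally stop another site's book links from matching, which is the kind of regression git bisect helps to locate.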