check_links: add concurrent version to speed up the check_links task #215
ben-beauhurst wants to merge 5 commits into DjangoAdminHackers:master
Was raising a `RuntimeWarning: DateTimeField Url.last_checked received a naive datetime ... while time zone support is active.`
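For context, the warning appears when a naive datetime is assigned to a `DateTimeField` while time zone support is on. Below is a minimal stdlib-only sketch of the difference between naive and aware values. The helper name `has_tzinfo` is hypothetical and not part of this PR; in a Django project the usual replacement for `datetime.now()` is `django.utils.timezone.now()`.

```python
from datetime import datetime, timezone


def has_tzinfo(value: datetime) -> bool:
    """Return True if the datetime carries a usable UTC offset (is "aware")."""
    return value.tzinfo is not None and value.utcoffset() is not None


# Naive: no tzinfo attached, the kind of value that triggers the warning.
naive = datetime.now()

# Aware: carries an explicit UTC offset, safe to store when time zone support is active.
aware = datetime.now(timezone.utc)
```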
Since we're using sqlite as the test db backend, we can't effectively test the `Url` objects being saved in the `ThreadPoolExecutor` futures, but there's enough test coverage to show that, under the hood, `Url.check_url` is doing the right thing.
Technically speaking, it's nice! However, I'm worried that this may trigger spam/attack detectors when you suddenly fire a burst of HEAD requests almost simultaneously at some websites. Of course, if all links point to different websites, or are internal links, that's not an issue. Ideally, the command would group links by domain and then concurrently call the "normal" `check_links` for each group, but I fear this would add a lot of complexity to the command. Thoughts?
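The grouping idea could look roughly like the sketch below: links are bucketed by domain, each bucket is checked sequentially, and only the buckets run in parallel, so no single host receives a burst. This is an illustration only, not code from this PR. The names `group_by_domain` and `check_grouped` and the `check` callable are hypothetical stand-ins for whatever actually checks a single URL.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List
from urllib.parse import urlsplit


def group_by_domain(urls: Iterable[str]) -> Dict[str, List[str]]:
    """Bucket URLs by network location; relative/internal links share the "" bucket."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for url in urls:
        groups[urlsplit(url).netloc].append(url)
    return dict(groups)


def check_grouped(
    urls: Iterable[str],
    check: Callable[[str], bool],  # hypothetical single-URL checker
    max_workers: int = 20,
) -> Dict[str, bool]:
    """Check domains concurrently, but URLs within one domain one at a time."""

    def check_group(group: List[str]) -> List[tuple]:
        # Sequential within a domain, so each host sees one request at a time.
        return [(url, check(url)) for url in group]

    results: Dict[str, bool] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        for pairs in executor.map(check_group, group_by_domain(urls).values()):
            results.update(pairs)
    return results
```

The trade-off is that a project whose links mostly target one domain gets little speed-up from this approach, since that domain's bucket still runs sequentially.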
Yes, that is possibly a risk, but as you point out, not for all use cases. I can add a docstring that explains this risk, and users can make their own judgement about whether it's the appropriate choice.
…etection if you have many links to the same domain.
Added a concurrent version of `check_links` for speed.

When I tested this out on 1000 URLs, `check_links` took 1min 37s and `concurrent_check_links` took 7.85 s (with 20 workers), so a massive speed-up is possible by utilizing `concurrent.futures` with the existing code.

There wasn't any code coverage of `check_links` so I added that; however, it was a bit tricky to write good tests for `concurrent_check_links` since this project is using sqlite as its test db, so concurrent writes aren't really possible.
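To illustrate the general pattern described above (not the actual code in this PR), here is a minimal sketch of fanning single-URL checks out over a `ThreadPoolExecutor`. The function name `check_all` and the `check` callable are hypothetical; the default of 20 workers simply mirrors the figure quoted in the timing test.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def check_all(
    urls: Iterable[str],
    check: Callable[[str], bool],  # hypothetical single-URL checker
    max_workers: int = 20,
) -> Dict[str, bool]:
    """Run `check` for every URL on a thread pool and collect the outcomes."""
    results: Dict[str, bool] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to its URL so results can be attributed.
        futures = {executor.submit(check, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception:
                # Treat a checker that raised as a failed link instead of aborting the run.
                results[url] = False
    return results
```

Because `check` is passed in, a test can substitute a fake checker and exercise the pool without touching the network or the database, which sidesteps the sqlite concurrent-write limitation for that part of the logic.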