-
Notifications
You must be signed in to change notification settings - Fork 502
Allow URI encoding of the URLs in the path #447
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Would base64 encoding the remote URL work for you? I really thought there was an issue or PR for that at one point, but all I'm finding is #431 where I said basically the same thing :) Maybe it was just an idea I had, but never implemented. I think that would ultimately be cleaner and cover a host of issues like character sets and such. URI encoding could handle it as well, but I think base64 would just be simpler and create cleaner URLs... completely opaque, but at least it's obvious that's the case. |
|
Take a look at the base64 branch and see if that would work for you |
|
I think it would solve the issues I described but for us one additional layer is that we've had some version of this in production for a while and have links using the URL encoded version that we want to continue working but only recently got unit test coverage of our solution up to a standard we felt comfortable back contributing. Totally understand if that isn't something you want to support but just likely means we'll be permanently forking. The handling we added shouldn't ever conflict with un-encoded URLs given the other validations you have on what constitutes a valid URL if that's a concern. |
|
I'm curious about combining the queries at the end... the URL encoded query and any non-encoded query that was on the original URL. Are you actually using that in practice? When I was working on the base64 implementation, I was taking the stance that if you've encoded the remote URL, then that is expected to be the entire URL (modulo a baseURL if that's in use). Is there a reason to support both encoded and unencoded query parameters in the same URL? |
|
I went ahead and merged the base64 encoding feature. With that in place, adding support for URI encoding becomes quite minimal, provided you don't need to combine the encoded and un-encoded query strings as noted above. #461 has a much simpler implementation built on top of some of the base64 changes, but with all of your original test cases. See if that works. |
|
I'm going to go ahead and merge #461 since it does pass all of the test cases you've got here, and is a much simpler implementation, based on the previous work to add base64 encoding support. I think merging that commit will end up closing this. If you do end up needing the combined query string support, I'd definitely love to hear more about the actual use case for that, and we can try to figure something out. |
Updates willnorris#431 Updates willnorris#447
Updates willnorris#250 Updates willnorris#290 Fixes willnorris#447 Co-authored-by: Will Norris <will@willnorris.com>
Summary
Allow URI encoding of the URLs in the path
Motivation:
We use URL from all over the internet who's hygine, use of query params, etc we largely don't have control over. These image urls are then also often put into RSS Feeds and other standards based formats that we make available to other crawlers. We often don't have control over the validations these third parties put on feeds including the URLS inside which often leads to validation issues, encoding is often the only ways to get these formats to pass, if there are query parameters or special characters.
Testing and Cases Handled
/1x2/,/,//httpelsewhere in the URL (in parts of code keying off of http might break if it's in multiple times)Open to make changes requested to fit either project style, go best practices, or your personal preferences.
Let me know if there's anything you need from to get this in.