[development] URL encoding
Scott Reynen
scott at makedatamakesense.com
Sun Mar 14 05:20:17 UTC 2010
On Mar 13, 2010, at 8:20 PM, nitin gupta wrote:
> I completely agree to what you and Scott are trying to say. But, I
> am not looking to create an URL, just to sanitize it to remove
> disallowed character, i.e. what a browser would do while accessing a
> URL when a user inputs an URL. Consider, I parse the following URL
> from XML:
>
> http://example.com?test/com
>
> Do you think I should encode the '/' in the query part i.e. [test/
> com]??
Technically, yes, but that's beside the point. Regardless of how
strictly you choose to apply URL encoding, you should be applying it
to specific URL parts, not full URLs.
> I don't think we need to. (Nor will Firefox, if you enter this URL
> in the address bar).
You're right that encoding the slash character isn't particularly
important in the query. In a path segment, however, the difference
between encoded and unencoded slashes is very significant:
http://example.com/a/b/c has three path segments ("a", "b", "c"), while
http://example.com/a%2fb/c has two ("a/b" and "c"). And a slash
definitely shouldn't be encoded where it's used as a delimiter between
URL components. This is actually a good example of why encoding must
be applied to individual URL components, not the full URL.
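
To make the path example concrete, here is a minimal sketch in Python using the standard urllib.parse module. The segment values are hypothetical, and this is just one way to illustrate the point, not something anyone in this thread is actually running:

```python
from urllib.parse import quote, unquote, urlsplit

# Hypothetical path segments; the first contains a literal slash as data.
segments = ["a/b", "c"]

# Encode each segment on its own (safe="" so "/" is encoded too), then
# join the encoded segments with "/" acting purely as a delimiter.
path = "/" + "/".join(quote(s, safe="") for s in segments)
url = "http://example.com" + path

# Split on the delimiter first, decode afterwards: the data round-trips.
decoded = [unquote(s) for s in urlsplit(url).path.lstrip("/").split("/")]

# Encoding the full URL in one pass encodes the delimiters as well,
# so the result is no longer a usable URL.
whole = quote("http://example.com/a/b/c", safe="")

print(url)      # http://example.com/a%2Fb/c
print(decoded)  # ['a/b', 'c']
print(whole)    # http%3A%2F%2Fexample.com%2Fa%2Fb%2Fc
```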
> If a URL contains characters which are allowed in the URL
> dictionary, will we ever need to encode those characters? No.
What is the URL dictionary? Here's one of the relevant RFCs on URLs:
http://www.ietf.org/rfc/rfc3986.txt
Selected quotes:
"A percent-encoding mechanism is used to represent a data octet
_in_a_component_"
"the conflicting data must be percent-encoded
_before_the_URI_is_formed_"
Emphasis added to, well, emphasize that encoding applies to component
parts.
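
As a rough sketch of "encode before the URI is formed", again in Python with urllib.parse: the parameter names and values below are made up for illustration. Each piece of query data is encoded first, and only then are the components assembled into a URL:

```python
from urllib.parse import urlencode, urlunsplit, urlsplit, parse_qsl

# Hypothetical query data; the second value contains characters that
# would otherwise be read as query delimiters ("&" and "=").
params = [("q", "test/com"), ("note", "a&b=c")]

# urlencode percent-encodes each key and value separately, then joins
# them with "=" and "&" as delimiters.
query = urlencode(params)

# Only now is the URL formed from its already-encoded components:
# (scheme, host, path, query, fragment).
url = urlunsplit(("http", "example.com", "/", query, ""))

# Parsing the query back recovers the original data unchanged.
recovered = parse_qsl(urlsplit(url).query)

print(url)        # http://example.com/?q=test%2Fcom&note=a%26b%3Dc
print(recovered)  # [('q', 'test/com'), ('note', 'a&b=c')]
```

Note that urlencode is stricter than a browser address bar here and does encode the slash in "test/com", which is harmless in a query.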
--
Scott Reynen
MakeDataMakeSense.com