[development] URL encoding

Scott Reynen scott at makedatamakesense.com
Sun Mar 14 05:20:17 UTC 2010


On Mar 13, 2010, at 8:20 PM, nitin gupta wrote:

> I completely agree to what you and Scott are trying to say. But, I  
> am not looking to create an URL, just to sanitize it to remove  
> disallowed character, i.e. what a browser would do while accessing a  
> URL when a user inputs an URL. Consider, I parse the following URL  
> from XML:
>
> http://example.com?test/com
>
> Do you think I should encode the '/' in the query part i.e. [test/ 
> com]??

Technically, yes, but that's beside the point.  Regardless of how  
strictly you choose to apply URL encoding, you should be applying it  
to specific URL parts, not full URLs.

> I don't think we need to. (Nor will Firefox, if you enter this URL  
> in the address bar).

You're right that encoding the slash character isn't particularly
important in the query.  In a path segment, however, the difference
between encoded and unencoded slashes is very significant:
http://example.com/a/b/c is different from http://example.com/a%2fb/c.
The first has three path segments; the second has two, the first of
which contains a literal slash.  And a slash definitely shouldn't be
encoded where it's used as a delimiter between URL components.  This
is actually a good example of why encoding must be applied to
individual URL components, not the full URL.
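
To make the path example concrete, here is a minimal sketch.  It uses
Python's standard urllib.parse module purely for illustration (nothing
in this thread depends on Python), and build_path is a hypothetical
helper name, not an existing API.  It encodes each path segment on its
own and only then joins the segments with slash delimiters:

```python
from urllib.parse import quote

def build_path(segments):
    """Encode each path segment separately, then join with '/' delimiters."""
    # safe="" makes quote() encode '/' as well, so a slash that is part of
    # a segment's data cannot be mistaken for a segment delimiter.
    return "/" + "/".join(quote(segment, safe="") for segment in segments)

three_segments = build_path(["a", "b", "c"])  # segments a, b, c
two_segments = build_path(["a/b", "c"])       # segments "a/b" and c

# Encoding the full URL in one pass also encodes the delimiters,
# which leaves something that is no longer a usable URL.
whole_url_encoded = quote("http://example.com/a/b/c", safe="")
```

With these inputs, three_segments is "/a/b/c" and two_segments is
"/a%2Fb/c" (hex case in percent-escapes is not significant), while
whole_url_encoded has its ":" and "/" delimiters escaped too.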

> If a URL contains characters which are allowed in the URL  
> dictionary, will we ever need to encode those characters? No.

What is the URL dictionary?  Here's one of the relevant RFCs on URLs:

http://www.ietf.org/rfc/rfc3986.txt

Selected quotes:

"A percent-encoding mechanism is used to represent a data octet  
_in_a_component_"
"the conflicting data must be percent-encoded  
_before_the_URI_is_formed_"

Emphasis added to, well, emphasize that encoding applies to component  
parts.
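
As a rough sketch of "encode before the URI is formed", again in
Python for illustration only: the form_url helper, the /search path,
and the q and note parameter names below are all made up for the
example.  The query data is encoded piece by piece, and the URL is
assembled from the already-encoded components afterwards:

```python
from urllib.parse import urlencode, urlunsplit

def form_url(scheme, host, path, params):
    """Encode the query data first, then assemble the URL from its parts."""
    # urlencode() percent-encodes every key and value individually, so
    # '&', '=' and '/' inside the data cannot collide with delimiters.
    query = urlencode(params)
    # Only now are the components joined with their delimiters.
    return urlunsplit((scheme, host, path, query, ""))

url = form_url("http", "example.com", "/search",
               {"q": "test/com", "note": "a&b=c"})
```

Here url comes out as
http://example.com/search?q=test%2Fcom&note=a%26b%3Dc, with the
delimiters intact and only the data escaped.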

--
Scott Reynen
MakeDataMakeSense.com



