[development] URL encoding
    Scott Reynen 
    scott at makedatamakesense.com
       
    Sun Mar 14 05:20:17 UTC 2010
    
    
  
On Mar 13, 2010, at 8:20 PM, nitin gupta wrote:
> I completely agree with what you and Scott are trying to say. But I
> am not looking to create a URL, just to sanitize it by removing
> disallowed characters, i.e. what a browser would do when a user
> inputs a URL. Consider: I parse the following URL from XML:
>
> http://example.com?test/com
>
> Do you think I should encode the '/' in the query part, i.e.
> [test/com]??
Technically, yes, but that's beside the point.  Regardless of how  
strictly you choose to apply URL encoding, you should be applying it  
to specific URL parts, not full URLs.
> I don't think we need to. (Nor will Firefox, if you enter this URL  
> in the address bar).
You're right that encoding the slash character isn't particularly
important in the query.  In a path segment, however, the difference
between encoded and unencoded slashes is very significant:
http://example.com/a/b/c is different from http://example.com/a%2fb/c.
The first path has three segments (a, b, c); the second has two
(a/b, c).  And a slash definitely shouldn't be encoded where it's used
as a delimiter between URL components.  This is actually a good example
of why encoding must be applied to individual URL components, not the
full URL.
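To make the path example concrete, here is a rough sketch in Python
(chosen only for illustration; the build_path helper is a hypothetical
name of mine, not part of any library).  It encodes each path segment
separately with the standard urllib.parse.quote, then shows what
happens if the whole URL is encoded in one pass instead.

```python
from urllib.parse import quote


def build_path(segments):
    # Hypothetical helper: encode each segment on its own, then join.
    # safe="" makes quote() escape "/" too, so a slash inside a
    # segment stays data instead of becoming a delimiter.
    return "/" + "/".join(quote(s, safe="") for s in segments)


three_segments = build_path(["a", "b", "c"])  # segments a, b, c
two_segments = build_path(["a/b", "c"])       # segments a/b, c

# Encoding the full URL in one pass escapes the delimiters as well,
# so the result is no longer a usable URL.
whole = quote("http://example.com/a/b/c", safe="")

print(three_segments)  # /a/b/c
print(two_segments)    # /a%2Fb/c
print(whole)           # http%3A%2F%2Fexample.com%2Fa%2Fb%2Fc
```

Once the two paths are joined into strings, nothing applied to the
full URL can recover which slashes were data and which were
delimiters; that information only exists at the component level.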
> If a URL contains characters which are allowed in the URL  
> dictionary, will we ever need to encode those characters? No.
What is the URL dictionary?  Here's one of the relevant RFCs on URLs:
http://www.ietf.org/rfc/rfc3986.txt
Selected quotes:
"A percent-encoding mechanism is used to represent a data octet  
_in_a_component_"
"the conflicting data must be percent-encoded  
_before_the_URI_is_formed_"
Emphasis added to, well, emphasize that encoding applies to component  
parts.
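
As a sketch of "encode before the URI is formed", assuming Python's
standard urllib.parse module: each component is encoded separately and
only then assembled.  The form_url function and the sample host, path
and parameters are made up for illustration.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit


def form_url(host, segments, params):
    # Hypothetical helper: encode every component first, assemble last.
    path = "/" + "/".join(quote(s, safe="") for s in segments)
    # urlencode() percent-encodes each key and each value separately.
    query = urlencode(params)
    return urlunsplit(("http", host, path, query, ""))


url = form_url("example.com", ["docs", "a/b"],
               {"q": "x&y=z", "ref": "test/com"})
print(url)
# http://example.com/docs/a%2Fb?q=x%26y%3Dz&ref=test%2Fcom

# Because the data was encoded per component, splitting the URL again
# recovers the original values.
decoded = parse_qsl(urlsplit(url).query)
print(decoded)  # [('q', 'x&y=z'), ('ref', 'test/com')]
```

Note that urlencode() happens to escape the slash in the query value
too; that is stricter than a query needs, but harmless.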
--
Scott Reynen
MakeDataMakeSense.com
    
    