[development] URL encoding

nitin gupta nitingupta.iitg at gmail.com
Sun Mar 14 06:46:40 UTC 2010


Hi Scott,

I don't think we contradict each other here; we only differ in point of view.
I am saying that we should not encode a character that is already allowed. If
the URL is already present somewhere like (http://example.com/hj/hj) there is
no need to encode it, and if it is present like (http://example.com%2f/test)
there is no need to decode it. If you get such a URL, just do not touch it,
because it contains no invalid characters.

@URL dictionary: Are you kidding?? I was obviously referring to the same
RFC.

I would like you to think for a moment and tell me what you will gain by
breaking the URL into components, encoding them, and then joining them
again. Consider this problem statement: You are given a URL, which is
extracted from the HTML source of a webpage, and you need to access it using
drupal_http_request(). I am, of course, interested in improving what I
currently have in hand.
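
For illustration, the "do not touch what is already valid" approach described
above could look roughly like the following. This is a minimal sketch in Python
rather than PHP; sanitize_url and SAFE are hypothetical names, not part of
Drupal, and the sketch assumes that any '%' already in the input starts a valid
escape.

```python
from urllib.parse import quote

# RFC 3986 reserved characters (gen-delims and sub-delims) plus '%'.
# Keeping '%' safe means existing escapes such as %2f are not double-encoded.
# Assumption: every '%' in the input already begins a valid escape sequence.
SAFE = ":/?#[]@!$&'()*+,;=%"


def sanitize_url(url):
    """Percent-encode only characters that may not appear in a URL at all."""
    # Letters, digits and "-._~" are never encoded by quote(); everything
    # listed in SAFE is also left alone. Anything else (spaces, non-ASCII
    # characters, and so on) is encoded as UTF-8 percent-escapes.
    return quote(url, safe=SAFE)


if __name__ == "__main__":
    for candidate in (
        "http://example.com/hj/hj",
        "http://example.com%2f/test",
        "http://example.com?test/com",
        "http://example.com/a b",
    ):
        print(candidate, "->", sanitize_url(candidate))
```

A function like this passes already-valid URLs through unchanged, but it cannot
tell whether a '/' or '?' in the input was meant as data or as a delimiter.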

"Fire me all you can, but cast me into a solid and beautiful pot"

--
Regards,
Nitin Kumar Gupta
http://publicmind.in/blog/


On Sun, Mar 14, 2010 at 10:50 AM, Scott Reynen
<scott at makedatamakesense.com> wrote:

> On Mar 13, 2010, at 8:20 PM, nitin gupta wrote:
>
>> I completely agree with what you and Scott are trying to say. But I am not
>> looking to create a URL, just to sanitize it to remove disallowed
>> characters, i.e. what a browser would do when accessing a URL that a user
>> inputs. Consider that I parse the following URL from XML:
>>
>> http://example.com?test/com
>>
>> Do you think I should encode the '/' in the query part i.e. [test/com]??
>>
>
> Technically, yes, but that's beside the point.  Regardless of how strictly
> you choose to apply URL encoding, you should be applying it to specific URL
> parts, not full URLs.
>
>
>> I don't think we need to. (Nor will Firefox, if you enter this URL in the
>> address bar).
>>
>
> You're right that encoding the slash character isn't particularly important
> in the query.  In a path segment, however, the difference between encoded
> and unencoded slashes is very significant; http://example.com/a/b/c is
> different from http://example.com/a%2fb/c.  And a slash definitely
> shouldn't be encoded where it's used as a delimiter between URL components.
> This is actually a good example of why encoding must be applied to
> individual URL components, not the full URL.
>
>
>> If a URL contains characters which are allowed in the URL dictionary, will
>> we ever need to encode those characters? No.
>>
>
> What is the URL dictionary?  Here's one of the relevant RFCs on URLs:
>
> http://www.ietf.org/rfc/rfc3986.txt
>
> Selected quotes:
>
> "A percent-encoding mechanism is used to represent a data octet
> _in_a_component_"
> "the conflicting data must be percent-encoded _before_the_URI_is_formed_"
>
> Emphasis added to, well, emphasize that encoding applies to component
> parts.
>
> --
> Scott Reynen
> MakeDataMakeSense.com
>
>
>
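
For comparison, Scott's point about encoding each component before the URI is
formed can be sketched as follows. This is again an illustrative Python sketch,
not Drupal code; build_url and its parameters are hypothetical names.

```python
from urllib.parse import quote, urlencode


def build_url(base, segments, params):
    """Assemble a URL from raw (unencoded) path segments and query parameters."""
    # Encode each path segment on its own. safe="" means a '/' that is part
    # of the data becomes %2F, while the '/' characters added by join() stay
    # literal because they are delimiters.
    path = "/".join(quote(segment, safe="") for segment in segments)
    url = base + "/" + path
    # urlencode() percent-encodes each key and value separately.
    query = urlencode(params)
    return url + "?" + query if query else url


if __name__ == "__main__":
    # Three segments "a", "b", "c" versus two segments "a/b" and "c".
    print(build_url("http://example.com", ["a", "b", "c"], {}))
    print(build_url("http://example.com", ["a/b", "c"], {}))
```

The two calls produce different URLs (a/b/c versus a%2Fb/c), a distinction that
is lost once the components have been joined into one string. Python emits
uppercase hex digits; per RFC 3986, %2F and %2f are equivalent.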