[development] URL encoding
nitingupta.iitg at gmail.com
Sun Mar 14 03:20:46 UTC 2010
I completely agree with what you and Scott are saying. But I am not
looking to create a URL, just to sanitize it by removing disallowed
characters, i.e. what a browser does when a user enters a URL in the
address bar. Consider, I parse the following URL from XML:
Do you think I should encode the '/' in the query part, i.e. [test/com]? I
don't think we need to. (Nor does Firefox, if you enter this URL in the
address bar.) If a URL contains only characters that are allowed in a URL,
will we ever need to encode those characters? No. My point being, the only
characters we need to encode are those which are disallowed.
Of course, if we encounter something like this:
we must not decode it either; i.e., we must maintain the integrity of the
URL while checking its validity.
This is how I am thinking about it; please let me know of any case that
proves otherwise.
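The rule described above — percent-encode only the characters a URL may not contain, and leave everything else (including existing %XX escapes) untouched — could be sketched roughly as follows. This is a minimal Python sketch, not the thread's actual code; the allowed set is taken from RFC 3986, and `sanitize_url` is a hypothetical name:

```python
import re

# Characters allowed somewhere in a URL per RFC 3986: unreserved
# characters, gen-delims, sub-delims, and '%' (for existing escapes).
ALLOWED = re.compile(r"[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=%]")

def sanitize_url(url: str) -> str:
    """Percent-encode only the characters that are disallowed in a URL,
    leaving allowed characters and existing %XX escapes untouched."""
    out = []
    for ch in url:
        if ALLOWED.fullmatch(ch):
            out.append(ch)  # allowed: pass through unchanged
        else:
            # disallowed: encode each UTF-8 byte as %XX
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
    return "".join(out)
```

Note that because '%' is passed through, an already-escaped URL is never double-encoded — which matches the "maintain the integrity of the URL" point above — but a stray literal '%' is also left alone.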
Nitin Kumar Gupta
On Sun, Mar 14, 2010 at 12:12 AM, CM Lubinski <cmc333333 at gmail.com> wrote:
> > There's an old quote that seems somewhat apt here:
> >> Some people, when confronted with a problem, think "I know, I'll use
> >> regular expressions." Now they have two problems.
> Scott is completely correct. If you want this to be sane to any degree,
> you'll need to parse the URL into its components before trying to escape
> anything. Once you know which parts map to the components returned by
> parse_url(), you can apply the appropriate escape rules (such as
> urlencode()) to them individually. From *there* you should build up
> the final, escaped URL in the form $scheme . '://' . $host . '/' . $path
> . '?' . $query
> CM Lubinski
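The component-wise approach in the quoted reply refers to PHP's parse_url() and urlencode(); an equivalent sketch in Python, using urllib.parse, might look like this. The function name and the per-component `safe` sets here are illustrative assumptions, not the thread's code:

```python
from urllib.parse import urlsplit, urlunsplit, quote

def escape_by_component(url: str) -> str:
    """Split the URL into components, escape each with rules appropriate
    to that component, then reassemble the final URL."""
    parts = urlsplit(url)
    # '%' is kept in the safe set so existing escapes are not re-encoded.
    path = quote(parts.path, safe="/%")        # keep path separators
    query = quote(parts.query, safe="=&/%")    # keep key=value&... structure
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))
```

The advantage over whole-string escaping is that each component gets its own rules: '/' stays meaningful in the path, '=' and '&' stay meaningful in the query, while disallowed characters in either are still encoded.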