Hi,<br><br>I completely agree with what you and Scott are saying. However, I am not trying to construct a URL, only to sanitize one by encoding its disallowed characters, i.e. what a browser does when a user types a URL into the address bar. Suppose I parse the following URL from XML: <br>
<br><a href="http://example.com?test/com">http://example.com?test/com</a><br><br>Do you think I should encode the '/' in the query part, i.e. [test/com]? I don't think we need to (nor will Firefox, if you enter this URL in the address bar). If a URL contains only characters from the set allowed in a URL, will we ever need to encode those characters? No. My point is that the only characters we need to encode are those which are disallowed. Of course, if I encounter something like this:<br>
<br><a href="http://example.com?test%3Fcom">http://example.com?test%3Fcom</a><br><br>we must not decode it either, i.e. we must preserve the URL exactly as given while checking its validity.<br><br>This is my current thinking; please let me know of any case that proves otherwise.<br>
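<br>To make this concrete, here is a rough sketch of what I mean. It is written in Python purely for illustration, not as a proposed implementation. The helper name sanitize_url and the choice of "allowed" characters (the RFC 3986 reserved and unreserved sets) are my own assumptions. It percent-encodes only characters outside that set and leaves existing %XX escapes exactly as they are:<br>
<pre>
```python
import re
from urllib.parse import quote

# Assumption: "allowed" means the RFC 3986 reserved characters below plus the
# unreserved ones (letters, digits, "-._~"), which quote() never encodes.
RESERVED = ":/?#[]@!$&'()*+,;="

# Matches an existing, well-formed percent-escape such as %3F.
ESCAPE = re.compile(r"(%[0-9A-Fa-f]{2})")


def sanitize_url(url):
    """Percent-encode only disallowed characters; keep existing escapes as-is."""
    # Splitting on a capturing group keeps the escapes at the odd indices,
    # so only the even-indexed pieces (plain text) are passed to quote().
    parts = ESCAPE.split(url)
    for i in range(0, len(parts), 2):
        parts[i] = quote(parts[i], safe=RESERVED)
    return "".join(parts)
```
</pre>
With this sketch both example URLs above come back unchanged, a space in the query becomes %20, and a stray '%' that is not part of an escape becomes %25. It deliberately does not split the URL into components, so it cannot apply different rules to the path and the query.<br>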
<br clear="all">--<br>Regards,<br>Nitin Kumar Gupta<br><a href="http://publicmind.in/blog/" target="_blank">http://publicmind.in/blog/</a><br>
<br><br><div class="gmail_quote">On Sun, Mar 14, 2010 at 12:12 AM, CM Lubinski <span dir="ltr"><<a href="mailto:cmc333333@gmail.com" target="_blank">cmc333333@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div>-----BEGIN PGP SIGNED MESSAGE-----<br>
Hash: SHA1<br>
<br>
</div><div>> There's an old quote [1] that seems somewhat apt here:<br>
><br>
>> Some people, when confronted with a problem, think "I know, I'll use<br>
>> regular expressions." Now they have two problems.<br>
><br>
<br>
</div>+1<br>
<br>
Scott is completely correct. If you want this to be sane to any degree,<br>
you'll need to parse the url into its components before trying to escape<br>
anything. Once you know which parts map to the components found in<br>
parse_url(), you can apply the appropriate escape rules (such as<br>
urlencode()) to each of them individually. From *there* you should build up<br>
the final, escaped url in the form $scheme . '://' . $host . '/' . $path<br>
. '?' . $query<br>
<br>
CM Lubinski<br>
<a href="http://cmlubinski.info" target="_blank">http://cmlubinski.info</a><br>
-----BEGIN PGP SIGNATURE-----<br>
Version: GnuPG v1.4.6 (GNU/Linux)<br>
<br>
iD8DBQFLm9yofzi1OiZiJLARAmiuAJ9YpTTIJmXI+eQFm7GraWBRmjEuvgCcDtkw<br>
wykkezVvS9PUsbebUT8n2v0=<br>
=KBjn<br>
-----END PGP SIGNATURE-----<br>
</blockquote></div><br>