Scott,<br><br>I was just about to post this, but you caught me. Yes, there is a problem with the solution proposed above, and you spotted it correctly: it will not encode what should be encoded.<br><br>You are right that such URLs should be corrected at the source, but that is no excuse if my module fails to work properly (<a href="http://drupal.org/node/731798">http://drupal.org/node/731798</a>). As mentioned in my first post, I know rawurlencode is meant for components of URLs, but I have no choice, as I cannot assume which literals will or will not be present in the URL.<br>
<br>By chance, while working on my other module (<a href="http://drupal.org/project/facebook_link">Facebook-style Links</a>), I found the same bug in Facebook. Try sharing <a href="http://www.google.com/search?q=%22a%26b%22" target="_blank">http://www.google.com/search?q=%22a%26b%22</a> on Facebook and you will see that they do the same thing. ;) I haven't reported this yet, though.<br>
<br>I changed the function to this, let me know your views:<br><br>function encode_url($url) {<br> $reserved = array(<br> ":" => '!%3A!ui',<br> "/" => '!%2F!ui',<br> "?" => '!%3F!ui',<br>
"#" => '!%23!ui',<br> "[" => '!%5B!ui',<br> "]" => '!%5D!ui',<br> "@" => '!%40!ui',<br> "!" => '!%21!ui',<br>
"$" => '!%24!ui',<br> "&" => '!%26!ui',<br> "'" => '!%27!ui',<br> "(" => '!%28!ui',<br> ")" => '!%29!ui',<br>
"*" => '!%2A!ui',<br> "+" => '!%2B!ui',<br> "," => '!%2C!ui',<br> ";" => '!%3B!ui',<br> "=" => '!%3D!ui', <br>
);<br><br> $url = rawurlencode($url);<br> $url = preg_replace(array_values($reserved), array_keys($reserved), $url);<br> $url = preg_replace('!%25!ui', '%', $url);<br> return ($url);<br>}<br><br><br>
I am still testing, so let me know if any case fails for the above function.<br><br clear="all">--<br>Regards,<br>Nitin Kumar Gupta<br><a href="http://publicmind.in/blog/">http://publicmind.in/blog/</a><br>
<br><br><div class="gmail_quote">On Fri, Mar 12, 2010 at 7:29 AM, Scott Reynen <span dir="ltr">&lt;<a href="mailto:scott@makedatamakesense.com">scott@makedatamakesense.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div><div></div><div class="h5">On Mar 11, 2010, at 11:10 AM, nitin gupta wrote:<br>
<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
I am using the following to solve the problem, any ideas to improve it in terms of efficiency or otherwise are welcome:<br>
<br>
function encodeurl($url) {<br>
$reserved = array(<br>
":" => '!%3A!ui',<br>
"/" => '!%2F!ui',<br>
"?" => '!%3F!ui',<br>
"#" => '!%23!ui',<br>
"[" => '!%5B!ui',<br>
"]" => '!%5D!ui',<br>
"@" => '!%40!ui',<br>
"!" => '!%21!ui',<br>
"$" => '!%24!ui',<br>
"&" => '!%26!ui',<br>
"'" => '!%27!ui',<br>
"(" => '!%28!ui',<br>
")" => '!%29!ui',<br>
"*" => '!%2A!ui',<br>
"+" => '!%2B!ui',<br>
"," => '!%2C!ui',<br>
";" => '!%3B!ui',<br>
"=" => '!%3D!ui',<br>
);<br>
<br>
$url = rawurlencode(rawurldecode($url));<br>
$url = preg_replace(array_values($reserved), array_keys($reserved), $url);<br>
return $url;<br>
}<br>
</blockquote>
<br></div></div>
There's an old quote [1] that seems somewhat apt here:<br>
<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.<br>
</blockquote>
<br>
That's not entirely apt, as your regular expression might as well be done with str_replace(), but you are adding problems rather than removing them. You should really scrap this whole thing and take a few steps back rather than adding more to it; this will break URLs due to flaws in the fundamental approach.<br>
<br>
rawurlencode and rawurldecode are meant to be used on fragments of URLs, not whole URLs. It's impossible to properly encode an entire URL without first breaking it up into component parts, because the different parts require different encoding. For example, "/" should be encoded in a query string, but not in a path. Treating it the same everywhere is why you're having the problem with delimiters being encoded. The preg_replace() only hides this problem, while introducing new problems (not encoding things that should be encoded); it's not a solution.<br>
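<br>
As a rough sketch of what encoding per component might look like (the host, path segments, and query value below are made-up examples, not taken from your input):<br>
<br>
```php
<?php
// Hypothetical inputs, chosen only to illustrate per-component encoding.
$segments = array('docs', 'a b');
$params = array('q' => 'a/b&c');

// Path: encode each segment on its own, then join with literal "/" delimiters.
$path = implode('/', array_map('rawurlencode', $segments));

// Query: http_build_query() encodes keys and values but keeps "=" as a delimiter.
$query = http_build_query($params);

$url = 'http://example.com/' . $path . '?' . $query;
echo $url, "\n"; // http://example.com/docs/a%20b?q=a%2Fb%26c
```
<br>
The "/" characters that separate path segments survive, while the "/" and "&" inside the query value are escaped, which is only possible because the parts are handled separately.<br>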
<br>
To illustrate the problem, consider this URL:<br>
<br>
<a href="http://www.google.com/search?q=%22a%26b%22" target="_blank">http://www.google.com/search?q=%22a%26b%22</a><br>
<br>
That's a Google search for the phrase "a&b". Your function turns that into this:<br>
<br>
<a href="http://www.google.com/search?q=%22a&b%22" target="_blank">http://www.google.com/search?q=%22a&b%22</a><br>
<br>
That's a Google search for "a, which returns completely different results.<br>
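<br>
The difference can be checked with PHP's own parse_str(), used here only as a stand-in for whatever parser the receiving server runs (that it behaves like Google's is an assumption):<br>
<br>
```php
<?php
// Parse the original query string and the one produced after "un-encoding" %26.
parse_str('q=%22a%26b%22', $before);
parse_str('q=%22a&b%22', $after);

var_dump($before['q']);  // the whole phrase, quotes and ampersand included
var_dump($after['q']);   // cut off at the unescaped "&"
var_dump(count($after)); // the remainder has become a second parameter
```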
<br>
Backing up, you apparently have input that looks like this:<div class="im"><br>
<br>
<a href="http://example.com/path" target="_blank">http://example.com/path</a> with spaces/<br>
<br></div>
That's not a valid URL, so it needs to be fixed somewhere. Ideally it would be fixed at the source, but if that's not an option, you can fix this specific problem simply with str_replace(' ', '%20', $url); That won't break anything else because spaces aren't URL delimiters. I'm guessing your input has more complex problems with invalid URLs as your attempted solution is more broad in scope. It's hard to say what you should do without knowing more about the input. What does the raw XML look like?<br>
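<br>
If the input turns out to contain more than stray spaces, one possible generalisation of the str_replace() idea is to percent-encode only the bytes that can never appear literally in a URL, leaving delimiters and existing %XX escapes alone. A minimal sketch, assuming the input is otherwise a well-formed URL; the function name is made up for illustration:<br>
<br>
```php
<?php
// Hypothetical helper: escape only bytes that are neither unreserved,
// reserved (delimiters), nor "%". Existing escapes such as %26 are untouched.
function encode_invalid_url_chars($url) {
  return preg_replace_callback(
    '/[^A-Za-z0-9\-._~:\/?#\[\]@!$&\'()*+,;=%]/',
    function ($match) {
      // Each match is a single byte, so rawurlencode() yields one %XX escape.
      return rawurlencode($match[0]);
    },
    $url
  );
}

echo encode_invalid_url_chars('http://example.com/path with spaces/'), "\n";
echo encode_invalid_url_chars('http://www.google.com/search?q=%22a%26b%22'), "\n";
```
<br>
This does not attempt to guess whether a literal "&" or "%" in the input was meant as data, so it cannot repair URLs that are ambiguous in that way; those still need fixing at the source.<br>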
<br>
[1] <a href="http://www.codinghorror.com/blog/2008/06/regular-expressions-now-you-have-two-problems.html" target="_blank">http://www.codinghorror.com/blog/2008/06/regular-expressions-now-you-have-two-problems.html</a><br>
<font color="#888888">
<br>
--<br>
Scott Reynen<br>
MakeDataMakeSense.com<br>
<br>
<br>
</font></blockquote></div><br>