[development] Can .htaccess discard part of a path?

Jeffry Graham jeff at funnymonkey.com
Wed Nov 11 06:17:44 UTC 2009


Hi Nancy,

Your existing rewrite rules do nothing to match your QUERY_STRING; a
RewriteRule pattern is only compared against the URL path. You need a
combination of matching the REQUEST_URI and QUERY_STRING.

I would suggest the following as a starting point, placed *before* the
standard Drupal rewrite rules.

RewriteCond %{REQUEST_URI} ^/cgi-bin/printOriginal\.pl$
RewriteCond %{QUERY_STRING} ^file=(.*)$
RewriteRule ^(.*)$ %1? [R=301,L]

You may need to adjust the RewriteRule line to

RewriteRule ^(.*)$ /PATH/TO/LEGACY/FILESDIR/%1? [R=301,L]

That way, if a user requests:
http://www.example.com/cgi-bin/printOriginal.pl?file=/foo/bar.shtml

They should be redirected with a 301 (Moved Permanently) to:
http://www.example.com/foo/bar.shtml

The key here is that the file path is matched via REQUEST_URI, but any
parameters passed must be matched via QUERY_STRING. Also, the
QUERY_STRING regex may need to be adjusted based on your incoming
requests (e.g., if more than 'file' appears in the parameter list).
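
As a rough illustration of that adjustment, here is an untested sketch
that picks 'file' out of a longer parameter list. The (?:^|&) prefix and
the [^&]* character class are my assumptions about what your incoming
requests look like; the capture also insists on a leading slash, as in
the example URL above.

```apache
# Sketch only, untested: same redirect, but tolerant of extra parameters.
RewriteCond %{REQUEST_URI} ^/cgi-bin/printOriginal\.pl$
# (?:^|&) requires "file" to begin a parameter; [^&]* stops at the next "&".
RewriteCond %{QUERY_STRING} (?:^|&)file=(/[^&]*)
# %1 is the captured file value; the trailing "?" drops the old query string.
RewriteRule ^ %1? [R=301,L]
```

With this, something like ?id=7&file=/foo/bar.shtml&x=1 should redirect
to /foo/bar.shtml. Note that QUERY_STRING is not URL-decoded, so a
request that sends the slashes as %2F would not match this pattern.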

I suggest using wget or similar to test your 301s as you write them,
as it will print the 301 response if you trigger one and show you the
rewritten URL client side. This is useful for debugging the
corresponding regexes.

I didn't test any of the above so I hope that helps get you started,

Jeff

PS. the 'regular' in regular expression is a reference to regular  
languages: http://en.wikipedia.org/wiki/Regular_language


On Nov 10, 2009, at 11:55 AM, Nancy Wichmann wrote:

> Wow, how did you know about MomsTeam (now YouthSportsParents)?
>
> I put this in there already:
> RewriteRule ^cgi-bin/printOriginal.pl/$ http://www.example.com[R=301,L]
> And I am still seeing these come through to the Drupal log.
>
> There might be a clue in
> RewriteRule ^alpha/sports/(.*) http://www.example.com/sports/$1 [R=301,L]
> if I really understood regular [sic] expressions.
>
> Nancy E. Wichmann, PMP
> Injustice anywhere is a threat to justice everywhere. -- Dr. Martin  
> L. King, Jr.
>
> From: development-bounces at drupal.org [mailto:development-bounces at drupal.org] On Behalf Of Seth Freach
> Sent: Tuesday, November 10, 2009 11:26 AM
> To: development at drupal.org
> Subject: Re: [development] Can .htaccess discard part of a path?
>
> Nancy,
>
> I'm assuming this is a leftover from the MomsTeam site?  The
> incoming requests appear to come from Google, which still has lots
> of these URLs in its index, and from sites which still link to them.
>
> Instead of a rewrite, I'd suggest a response code 301 redirect.
> This will be more Google friendly.
>
> Look in the default .htaccess file for the (commented out by
> default) lines that deal with www. redirection (i.e., you always want
> people to see "www" or never do, regardless of how they access the
> site).  Using those patterns should help show you how to redirect to
> the same content but without the "cgi-bin/printOriginal.pl&file=/"
>
> Seth
>
>
> Nancy Wichmann wrote:
> I am getting lots of requests like this:
> http://www.example.com/index.php?q=cgi-bin/printOriginal.pl&file=/alpha/beta/gamma/rage_prevention.shtml
> The file argument is a valid page on our old site and is itself
> redirected with a RewriteRule in .htaccess. However,
> cgi-bin/printOriginal.pl does not exist and I have no idea what it was
> supposed to do (well, I can guess: print the page). We get lots of
> these requests for different pages. I have tried a simple rewrite
> rule and a URL alias to prevent the 404 processing, but neither has
> fixed it.
> Is it possible to design a RewriteRule that essentially discards the
> "cgi-bin/printOriginal.pl" and just serves up the requested page
> (well, after its own rewrite rule has worked)? So this would become
> http://www.example.com/index.php/alpha/beta/gamma/rage_prevention.shtml
>
>
> Nancy E. Wichmann, PMP
> Injustice anywhere is a threat to justice everywhere. -- Dr. Martin  
> L. King, Jr.
