5

On one of my sites I get 404s because some scripts are fetching all URLs from the start page as raw URLs, so they include the # in the URL. Normal browsers don’t ever send that part to the server, but these scripts do.

This is how a typical requests URL looks like:

/2014/how-to-manage-wordpress-multisite-imports-with-wp-cli/#comments

The # is not URL encoded.

I tried both following methods:

RedirectMatch 301 \#comments       /

and

RewriteRule #(.+)$ /? [L,R=301]

Both without success, the rules don’t catch these requests, because the # starts a comment. The referer and the user-agent fields are empty.

What should I do?

fuxia
  • 286
  • 6
  • 16

1 Answers1

4

From the mod_rewrite documentation you need to use the NE (no escape) flag when your rewrite rule has a hash:

RewriteRule #(.+)$ /? [L,R=301,NE]

You commented that the NE flag may only apply to the target URL and not the rewrite pattern. If that is the case, another approach would be to escape the # sign. mod_rewrite supports \x style escape sequences. The escape sequence for # would be \x23. So your rewrite rule could be:

RewriteRule \x23.+$ / [L,R=301]

If you want to test a solution, you can do so with telnet on the command line. Use the command line telnet example.com 80 to open a socket to your webserver. Then make a simple request like this:

GET /#test HTTP/1.0
Host: example.com

Followed by an extra new line.

Stephen Ostermiller
  • 99,822
  • 18
  • 143
  • 364