Using .htaccess to minimise comment and referrer spam

I have been using my .htaccess file to stop comment and referrer spam on this site and it has been surprisingly successful (so far!). How do I create a .htaccess file capable of greatly reducing comment and referrer spam?

Firstly, I use Awstats to analyse visits to my site daily and I use Spam Karma to help control comment spam. Both applications give me information on spammers visiting my site.

Awstats gives me a list of the referrer sites, including those that are trying to spam my referrer logs. I monitor the list and, as new spam sites appear, I add them to my .htaccess file in the form:
RewriteCond %{HTTP_REFERER} \.domain\.tld [NC]
where .domain is the domain trying to spam my site (psxtreme, freakycheats, terashells, and so on) and the .tld is the top level domain the site is registered to (.com, .net, .org, .info, etc.).

So, for instance, in the case of the spammer coming from the smsportali.net domain, I have added the following line to my .htaccess code:
RewriteCond %{HTTP_REFERER} \.smsportali\.net [NC]
Once paired with a RewriteRule (see the note below), this blocks requests whose referrer is any subdomain of smsportali.net (spamterm.smsportali.net, for example); the NC flag makes the match case-insensitive.
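
As a minimal sketch of how several referrer conditions fit together, assuming mod_rewrite is enabled and permitted in .htaccess on your server: spammer-one.example is a placeholder domain, not an entry from the real list. RewriteCond lines are ANDed by default, so every condition except the last needs the OR flag.

```apache
# Sketch only: assumes mod_rewrite is available and allowed in .htaccess.
RewriteEngine On

# spammer-one.example is a placeholder domain used for illustration.
RewriteCond %{HTTP_REFERER} \.spammer-one\.example [NC,OR]
RewriteCond %{HTTP_REFERER} \.smsportali\.net [NC]

# Any request whose referrer matched one of the conditions gets a 403.
RewriteRule .* - [F]
```

Because the pattern starts with an escaped dot, a referrer of the bare domain (http://smsportali.net/ with no subdomain) would not match. If that matters, an anchored pattern such as ^https?://([^/]+\.)?smsportali\.net is one possible alternative that covers both cases.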

In the case of comment spam, I have configured Spam Karma to email me every time it deletes a spam comment – this is becoming rarer and rarer as the .htaccess file becomes more and more effective. I have configured Spam Karma to include the server variables and request headers of a comment that is not approved in the email – this is one of the configuration options of this plugin.

Scanning these emails, I can see the User Agents being employed by these spammers – armed with this information, I added the following lines to my .htaccess file:
RewriteCond %{HTTP_USER_AGENT} Indy.Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Crazy\ Browser [NC]
RewriteRule .* - [F]
and this has greatly reduced the amount of comment spam coming through.

Also, Cindy alerted me to the fact that adding:
RewriteCond %{HTTP:VIA} ^.+pinappleproxy [NC]
RewriteRule .* - [F]
will also catch a lot of the spammers.

I have a copy of my .htaccess file available for review (it is in .txt format).

NOTE:
Each set of conditions in your .htaccess file must finish with a RewriteRule. RewriteRule .* - [F] returns a 403 (Forbidden) response to the spammers. Your last set of rules should end with RewriteRule .* - [F,L]; the L flag tells mod_rewrite to stop processing further rules for that request. Note that the flag separator is a plain hyphen (-), not a typographic dash.
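
The layout described in this note can be sketched as follows, reusing the conditions shown earlier in this post. It is an illustration of the structure rather than a complete blacklist, and it assumes mod_rewrite is enabled.

```apache
RewriteEngine On

# Rule set 1: referrer conditions, closed by their own RewriteRule.
RewriteCond %{HTTP_REFERER} \.smsportali\.net [NC]
RewriteRule .* - [F]

# Rule set 2 (the last one): user-agent conditions.
# All conditions except the final one carry OR.
RewriteCond %{HTTP_USER_AGENT} Indy.Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Crazy\ Browser [NC]
RewriteRule .* - [F,L]
```

The Apache documentation states that the F flag already implies L, so the explicit L on the final rule is harmless but not strictly required.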

IMPORTANT WARNING:
The .htaccess file is very unforgiving. It has the power to make your entire site unavailable to everyone. I strongly advise reading up on regular expressions and mod_rewrite (the Apache module which processes these commands in a .htaccess file) before creating a .htaccess file or modifying an existing one.
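
One common precaution, sketched here under the assumption that your host allows the IfModule directive in .htaccess, is to wrap the rewrite rules in an IfModule block so that they are skipped if mod_rewrite is not loaded, rather than producing a server error.

```apache
# Sketch only: the rules inside are skipped when mod_rewrite is not loaded.
<IfModule mod_rewrite.c>
    RewriteEngine On

    RewriteCond %{HTTP_USER_AGENT} Indy.Library [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Crazy\ Browser [NC]
    RewriteRule .* - [F]
</IfModule>
```

This guards only against a missing module. A typo inside the block can still break the site, so testing changes in a separate directory first remains worthwhile.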


36 Responses to “Using .htaccess to minimise comment and referrer spam”


  1. 1 Matthew Bartlett

    Cheers Mr Tom, this looks very helpful.

  2. 2 pericat

    It looks like you’re having problems with many of the same spammers as I am, but your htaccess file is much cleaner than the one I’ve been building. Thanks for publishing it!

  3. 3 lucia

    Because .htaccess is so powerful, I set up a special directory to test the file. The directory contains an “index.html” file and a second one. I know I can’t fully test who is blocked, but I can at least test to make sure the new .htaccess file won’t block my whole site from absolutely positively everyone, including me.

    This is useful for new users like me, who have never, ever, ever fiddled with the htaccess file.

  4. 4 Mark J

    Lucia, that’s an excellent idea. This is useful for everyone… even advanced users make stupid typos!

  5. 5 Michele

    Hi Tom, you might be interested in my tip on how to remove spamming referrers from your Apache logs too.

  6. 6 Charles

    I think your list of “bad words” could be shortened considerably by the simple expedient of using PHP’s “explode” and “implode” on the comments. Explode it to remove hyphens, and then implode to put spaces back in. Then process. Then you’ll weed out domains like buy-this-stuff when actually what you want to block is the word ’stuff’.

    I’ve done this on my 1.2 implementation but am still screwing up the courage to up to 1.5, as I hear it breaks various plugins I rely on. Spaminator, for one.

  7. 7 Tom Raftery

    Hi Charles,

    thanks for stopping by.

    If you combine your comment on imploding and exploding bad words with my post on stopping comment spam without comment spam plugins, you should have no problem upgrading.

  8. 8 Todd

    Oh, the irony of having comment spam in the comments of a post about getting rid of comment spam…

    Good tips here and I plan to make use of them. My main site doesn’t use WordPress, but a small CMS that I am proud to have modified to its current state (and am too stubborn to give up on for that reason). As such, comment moderation is a bit difficult, so blocking spammers before they even reach the comment page is good for me. Thanks.

  9. 9 Tom Raftery

    Thanks Todd – I’m not sure how they got through (!), gone now though!

    Great, let me know how you get on.

    Cheers,

    Tom

  10. 10 Brian Layman

    Can you explain the Files trackback part of the code? I don’t have a file called trackback. Does it partial match on the name? Should it really be
    wp-trackback.php? That doesn’t seem right either or you may as well be blocking the get access…

    Also, isn’t there harm in blocking all mozilla and opera browsers from posting track backs?

    (Hopefully this next bit will look right in the comment… sorry if it doesn’t )

    Here’s the section I am talking about:
    # From Spamhuntress - code to deny the below user agents POST access to trackback
    <Files trackback>
    <Limit POST>

    SetEnvIf User-Agent "Mozilla" trackers
    SetEnvIf User-Agent "Opera" trackers
    SetEnvIf User-Agent ^$ trackers

    Order Allow,Deny
    Allow from all
    Deny from env=trackers

    </Limit>
    </Files>

  11. 11 Tom Raftery
  12. 12 Johan Adler

    What my blog says in Swedish is that using your .htaccess leaves SpamKarma out of work. I have hardly had any spam (usually caught by SK) since I modified my .htaccess with your code.

    I also write about having considered exchanging your list of known spammers for the compact, optimized regex version of ReferrerCops blacklist. It might give Apache and the server a harder time, with all those regular expressions, but few spammers should get past that test.

    I also mention Chongqed’s blacklist, no regex, quite long.

    You are quite right in your comment on my site, I am positive. :-)

    Regards,
    Johan Adler
    Sweden

  13. 13 Tom Raftery

    Great Johan – glad it was of some use to you (and apologies for my lack of Swedish!).

    By the way, I have started using Akismet recently and I find it is the best anti-spam tool I have come across yet!

  14. 14 Anonymous

    Clarification: I have thought of using ReferrerCops regular expressions blacklist, but putting their regexes in your .htaccess.

    I have not switched to Akismet, have not bothered to get the needed API. Inspired by you, I got the key (and a wordpress.com blog that will be unused, waste of space) and activated Akismet. It might not have much work to do either. ;-)

  15. 15 Johan Adler

    Sorry Tom, last comment was mine. I am a bit tired. Maybe you could put my name on it and delete this one?

  16. 16 Brian Layman

    Oh- I meant to tell you this a month ago but you dropped the

    RewriteEngine On

    line from your .htaccess file.

    As I understand it, that line is somewhat important for the rest of the file to work right on most apache servers…

    http://www.apacheref.com/ref/mod_rewrite/RewriteEngine.html
    “By default, rewrite configurations are not inherited. Thus you need a RewriteEngine directive to switch this configuration on for each virtual host in which you wish to use it. ”

    But then again most all I know about .htaccess files, I learned from you so why should the student question the master!

  17. 17 Tom Raftery

    D’Oh!

    Thanks for the heads up Brian – hope that hasn’t been missing too long.

    I put it back in now (and I’m hardly a master – I just read up on that stuff when I was having spam problems – I’ve forgotten a lot of it now :-( ),

    Cheers,

    Tom.

  18. 18 tobto

    If I would use htaccess with this:
    —–
    Allow from yoursite.com
    —–
    Will it work as antispam / badreferer filter?

  19. 19 Brian Layman

    There are two BIG limitations of your method.

    The first big problem is with referrer checking in general. The problem is that many of today’s browsers and firewalls now strip the referrer information. It’s a privacy thing.

    Therefore, if you implement that check, you will block a lot of legitimate traffic that comes to you with a blank referrer. Likewise, if you allow blanks, you will let in all of the spam bots that don’t send a referrer.

    Additionally, if you did something like:

    <
    order deny,allow
    deny from all
    allow from yoursite.com
    </Limit>

    How would the people get to your site initially? If they browsed straight to it, they would get permission denied error.

    This might work in some limited office environment on a subdirectory that should only be accessed from the site and using the company-approved browser. Then you could consider all traffic without a referrer illegitimate. Basically, if you have control over some of the variables in the situation, YMMV.

  20. 20 tobto

    thanks Brian for comment!

    ok. I have everyday spamming:
    - ip is different everytime
    - text the SAME, but antispam-bad-words-list doesn’t filter it.
    things like: -0-XXX-0, -0 XXX 0 -, etc.

    I can’t even imagine how to prevent spam, besides send *** kind words to spammers :)

    I suppose to use referrer check for spam-machine – if it isn’t yoursite.com that is badguy-goodbye.com 8)

  21. 21 Alfenet

    How can I create a .htaccess file for joomla cms?

  22. 22 Белая Церковь

    Alfenet, I saw some examples of a Joomla .htaccess here: http://forum.joomla.org/. Try using the search.

  23. 23 greeningreen.com

    Our referer info is almost always showing up stripped in the logs now… any ideas? Otherwise, very useful here.

  24. 24 Json

    How does it work when “page.php” is in one, two, three level deep folder?
    Is it like this:
    RewriteCond %{REQUEST_URI} .folder1/folder2/folder3/page\.php*
    RewriteCond %{REQUEST_URI} .folder1/folder2/page\.php*
    RewriteCond %{REQUEST_URI} .folder1/page\.php*

    OR just like this:
    RewriteCond %{REQUEST_URI} .page\.php*

    Cheers

  25. 25 Json

    I am new to .htaccess and have to ask…
    Q1: Can I use this for any page that is posting data?
    Q2: If Q1 is YES, my page is one folder deep, ie:comments/page.php
    Do I do this:
    RewriteCond %{REQUEST_URI} .comments/page.php\.php*
    Or this:
    RewriteCond %{REQUEST_URI} .comments\page.php\.php*
    Or this:
    RewriteCond %{REQUEST_URI} .http://www/example.com/comments/page.php\.php*
    Or this:
    RewriteCond %{REQUEST_URI} ./var/htdocs/web/comments/page.php\.php*

    Any help would be great.
    Cheers

  26. 26 Brian Layman

    Wow… the thread from beyond the grave! 4 years later and I’m still getting comment notifications :) Well, I’m just stopping by because this thread seems like an old friend. Cheers all!

  27. 27 Kim Steinhaug

    Might be the thread from the grave, but it is a great thread and an interesting use of .htaccess. I did some googling and this is what I got, a great article. I especially liked the SetEnvIfNoCase method, which seems very clean indeed!

  28. 28 janetalkstech (Jane Ullah)

    Twitter Comment


    the wp codex page on blocking bad referrer spam: http://bit.ly/cICGsk < - just what I needed. See this too: [link to post]

    Posted using Chat Catcher

  1. 1 orbitalworks
  2. 2 CLAMP Campus Adventures
  3. 3 Tom Raftery’s I.T. views » Blog Archive » Comment spam plugins no longer required!
  4. 4 nyfiken blog » Arkiv » .htaccess gör SpamKarma arbetslös
  5. 5 Gloria Weblog » Blog Archive »
  6. 6 Gedanken am Balkon über den Balkon » Blog Archive » Kommentare von Maschinen
  7. 7 Fight Blog Spam with Apache
  8. 8 nyfiken blog » arkiv » Blogg-spam, .htaccess igen
