A GUIDE TO BLOCKING BAD BOTS WITH .HTACCESS FILES
- Category : Server Administration
- Posted on : Mar 15, 2018
- Views : 1,732
- By : Radcliff S.
One of the issues facing all webmasters is bad bots. Whether it’s comment spam, drive-by hacking attempts, or DDoS attacks, you’ve probably seen the issues some automated traffic can cause.
In this blog post, we’ll be delving into an easy way of stopping common bad bots, using .htaccess files and mod_rewrite. If you’re using the Apache web server, an afternoon of setting up a hardened .htaccess file can save you many headaches down the road.
If you’re not already aware, a .htaccess file is a hidden file (hence the dot in front of it) that gives Apache web servers instructions on how to handle traffic hitting the folder it lives in, and folders below it. It’s a plain text file, which you can just create in a folder.
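As a minimal illustration (assuming a hypothetical folder named /private/ that you want to lock down entirely), a .htaccess file can be as short as a single directive:

```apache
# .htaccess placed in /private/ -- denies all web access to this
# folder and everything below it.
# (Apache 2.4 syntax; on Apache 2.2 the equivalent is "Deny from all")
Require all denied
```

Everything in the rest of this guide follows the same principle: a plain text file in a folder, containing directives Apache applies to requests for that folder.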
BLOCKING BAD USER AGENTS
First off, we might want to block some generic bad bots, or user agents clearly indicative of an automated program. Here’s how we do that:
|
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
## Automated HTTP libraries
RewriteCond %{HTTP_USER_AGENT} ^.*(dav.pm/v|libwww-perl|urllib|python-requests|python-httplib2|winhttp.winhttprequest|lwp-request|lwp-trivial|fasthttp|Go-http-client|Java|httplib|httpclient|Zend_Http_Client).*$ [NC]
RewriteRule .* - [F,L]
## Commonly seen in DDoS attacks
RewriteCond %{HTTP_USER_AGENT} ^.*(CtrlFunc|w00tw00t|Apachebench).*$ [NC]
RewriteRule .* - [F,L]
</IfModule>
|
Usually, if a bot’s developer doesn’t bother changing their bot’s user agent from the library default, they’re up to no good. You’ll commonly see these kinds of bots probing for phpMyAdmin, for example. But we can do more.
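If you don’t actually run phpMyAdmin, one option is to reject those probes outright. This is a sketch; the path fragments below are common probe targets, but make sure nothing on your own site uses these URLs before deploying it:

```apache
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
## Reject requests for phpMyAdmin-style paths we don't actually host
RewriteCond %{REQUEST_URI} ^/(phpmyadmin|pma|myadmin) [NC]
RewriteRule .* - [F,L]
</IfModule>
```

Bots probing for these paths get a 403 instead of a useful 404, and never reach your application code at all.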
INTRO TO BLOCKING HTTP HEADERS
Many bots use valid HTTP user agents, masquerading as a legitimate web browser. Fortunately for us, many of them are still based on the same automated libraries, and often get their HTTP headers slightly wrong, or send different ones from what a human would send. It’s hard to filter these because the same goes for legitimate, good bots (like Google), but let’s block the ones we can:
|
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
## There is no user agent at all
RewriteCond %{HTTP_USER_AGENT} ^\s*$
RewriteRule .* - [F,L]
## There is no host header
RewriteCond %{HTTP_HOST} ^$
RewriteRule .* - [F,L]
</IfModule>
|
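In the same spirit, you could experiment with other headers every real browser sends. For example, all modern browsers send an Accept header. This is an aggressive sketch, so test it before deploying: some legitimate tools and monitoring services omit the header entirely.

```apache
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
## There is no Accept header at all -- real browsers always send one
RewriteCond %{HTTP_ACCEPT} ^$
RewriteRule .* - [F,L]
</IfModule>
```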
ADVANCED BLOCKING: WORDPRESS
The next part of this guide assumes you’re running WordPress, but it can be adapted to most other software, and it’s worth doing: this is some of the most effective filtering in the entire guide. Unfortunately, we can’t cover every application here.
The following assumes the wp-login.php lives in the same folder as the .htaccess file you’re creating:
|
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
## We can employ much stricter filtering on login & comment pages, which only humans should ever access.
## We don't need to worry about accidentally filtering good bots here.
## All modern human user agents should contain the string "Mozilla/5.0"
RewriteCond %{THE_REQUEST} ^.*wp-login [OR]
RewriteCond %{THE_REQUEST} ^.*wp-comment
RewriteCond %{HTTP_USER_AGENT} !^.*Mozilla/5.*$ [NC]
RewriteRule .* - [F,L]
## And if we're POSTing to these pages (i.e. clicking a submit button) we should have a referer, too.
RewriteCond %{THE_REQUEST} ^.*wp-login [OR]
RewriteCond %{THE_REQUEST} ^.*wp-comment
RewriteCond %{REQUEST_METHOD} POST
RewriteCond %{HTTP_REFERER} ^$
RewriteRule .* - [F,L]
</IfModule>
|
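To adapt this to other software, swap the path fragments for whatever your own login and comment endpoints are. Here’s a hedged sketch, assuming hypothetical software with its login page at /admin/login.php:

```apache
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
## Same idea as the WordPress rules above, pointed at a different login URL
RewriteCond %{THE_REQUEST} admin/login\.php
RewriteCond %{HTTP_USER_AGENT} !^.*Mozilla/5.*$ [NC]
RewriteRule .* - [F,L]
</IfModule>
```

The pattern is always the same: pick the URLs only humans should ever touch, then demand browser-like behavior on them.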
BONUS ROUND: BLOCK HTTP/1.0
HTTP/1.0 is an old version of the HTTP protocol. Human visitors haven’t used it since the days of Netscape, but many bots, both good and bad, still do. Common search engines like Google tend not to. We can turn this to our advantage, but it needs to be done carefully, and tested extensively, as it can block some good bots, or produce false positives on servers using a proxy in front of Apache.
If you feel daring, uncomment the version of this rule you prefer:
|
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
## Block all HTTP/1.0 requests site-wide.
## RewriteCond %{THE_REQUEST} HTTP/1.0$
## RewriteRule .* - [F,L]
## OR, block all HTTP/1.0 POST requests site-wide (far less likely to break legitimate things)
## RewriteCond %{THE_REQUEST} HTTP/1.0$
## RewriteCond %{REQUEST_METHOD} POST
## RewriteRule .* - [F,L]
</IfModule>
|
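One way to test carefully before enforcing: instead of blocking, tag HTTP/1.0 requests with an environment variable and watch your logs for a while. This is a sketch, and note that the CustomLog directive must live in the main server configuration or a virtual host, not in the .htaccess file itself:

```apache
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
## Dry run: mark HTTP/1.0 requests instead of rejecting them
RewriteCond %{THE_REQUEST} HTTP/1\.0$
RewriteRule .* - [E=legacy_http:1]
</IfModule>
# In the server/vhost config, log only the tagged requests for review:
# CustomLog /var/log/apache2/legacy_http.log combined env=legacy_http
```

If a week of logs shows nothing but junk traffic, you can swap the dry-run rule for the blocking version with much more confidence.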
Adapting these rules to your own software and website setup can drastically cut down on comment spam, and even help protect your website from hacking. It’s not a panacea, but it’ll help make life a little easier.