







Valid or Legitimate Hoovers?
David Cary Hart

2006-04-24, 6:56 pm

I have started to "block" Java/1.x, NASA Search and some other user
agents. The first GET is redirected to a short explanatory page,
"sitesucker.htm", and the client's address is added to the
firewall for thirty minutes.

Does anyone have an opinion on how much legitimate traffic I might be
blocking?

I am getting an email on each incident. So far, not one of these clients
has retrieved robots.txt first. Many of these agents seem to be broken:
they attempt to GET external links through our server, and they also
download every file that is linked to.
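One way to check the robots.txt observation for a given address is to look at that client's earliest request in the access log. A minimal bash sketch, assuming Apache's common or combined log format (client address in the first field, request line as the first quoted string); the function names are hypothetical:

```bash
# first_request IP LOGFILE
# Print the request path from the earliest log line for IP.
# Assumes common/combined log format: field 1 is the client address and
# the request line ("GET /path HTTP/1.0") is the first quoted string.
first_request() {
  local ip="$1" log="$2"
  awk -v ip="$ip" '$1 == ip {
    split($0, q, "\"")      # q[2] is the request line
    split(q[2], req, " ")   # req[2] is the requested path
    print req[2]
    exit
  }' "$log"
}

# fetched_robots_first IP LOGFILE
# Exit status 0 only if the client's first request was /robots.txt.
fetched_robots_first() {
  [ "$(first_request "$1" "$2")" = "/robots.txt" ]
}
```

For example, `fetched_robots_first 192.0.2.10 /var/log/httpd/access_log || echo "no robots.txt first"`.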
--
Displayed Email Address is a SPAM TRAP
Our DNSRBL - Eliminate Spam: http://www.TQMcube.com
Multi-RBL Check: http://www.TQMcube.com/rblcheck.php
The Dirty Dozen Spammiest Ranges: http://tqmcube.com/dirty12.php
William Tasso

2006-04-24, 6:56 pm

Fleeing from the madness of the www.TQMcube.com jungle
David Cary Hart <Deming2U@TQMcube.com> stumbled into news:alt.www.webmaster
and said:

> and they are added to the
> firewall for thirty minutes.
> ...


how are you doing that?

--
William Tasso

http://williamtasso.com/words/what-is-usenet.asp
David Cary Hart

2006-04-24, 6:56 pm

On Mon, 24 Apr 2006 18:52:58 +0100
"William Tasso" <SpamBlocked@tbdata.com> opined:
> Fleeing from the madness of the www.TQMcube.com jungle
> David Cary Hart <Deming2U@TQMcube.com> stumbled into
> news:alt.www.webmaster and said:
>
>
> how are you doing that?
>

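1. The redirect is in httpd.conf:
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} NASA.* [OR]
. . .
RewriteRule !sitesucker /sitesucker.htm [R]

The "R" flag makes this an external (HTTP 302) redirect, so the client
then requests /sitesucker.htm itself and that request shows up in
access_log. The "!sitesucker" pattern stops the page itself from being
redirected again.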
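Before adding patterns to the RewriteCond list, it can help to check which user-agent strings they would catch. A small bash sketch; the two patterns are assumptions standing in for the elided list, and the function name is hypothetical. Bash's `=~` uses POSIX extended regexes, which agree with Apache's PCRE for simple patterns like these but not in general:

```bash
# Hypothetical pattern list standing in for the elided RewriteCond lines.
blocked_ua_patterns=('^Java/1\.' 'NASA')

# is_blocked_ua "User-Agent string"
# Exit status 0 if any pattern matches, i.e. the request would be redirected.
# Matching is case-sensitive, like RewriteCond without [NC].
is_blocked_ua() {
  local ua="$1" pat
  for pat in "${blocked_ua_patterns[@]}"; do
    [[ $ua =~ $pat ]] && return 0
  done
  return 1
}
```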

2. Swatch:

/usr/bin/swatch --use-cpan-file-tail --config-file=/etc/swatch2.conf \
--daemon --awk-field-syntax --tail-file=/var/log/httpd/access_log

/etc/swatch2.conf:
. . .
watchfor /sitesucker\.htm/
exec "/root/ipt-httpd $1"
. . .
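As the exec line implies, with --awk-field-syntax Swatch's $1 should be the first whitespace-separated field of the matching line, which in Apache's common and combined log formats is the client address. The same rule can be dry-run against an existing log with plain awk; this sketch (hypothetical function name) lists the distinct addresses that would have been handed to ipt-httpd:

```bash
# sitesucker_ips LOGFILE
# Dry run of the swatch rule: print the first field (client address) of
# every line mentioning sitesucker.htm, each address only once.
sitesucker_ips() {
  awk '/sitesucker\.htm/ && !seen[$1]++ { print $1 }' "$1"
}
```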

3. ipt-httpd. There is a built-in function for timed rules, but I have
never had much success with it and I am concerned about throughput, so
this method uses at:

#!/bin/bash
# Called by swatch with the client address (first log field) as $1.
isource="$1"
atjob="/root/$isource.at"
report="/root/httpd-$isource"
line="-------------------------------------------------------"

# Send the client's traffic to the HTTP / HTTPD-OUT chains (defined
# elsewhere) and save the rule set.
iptables -A INPUT -s "$isource" -j HTTP
iptables -A OUTPUT -d "$isource" -j HTTPD-OUT
iptables-save >/etc/sysconfig/iptables

# Write an at job that deletes the same rules, saves again and removes
# itself, then queue it to run in thirty minutes.
{
  echo "iptables -D INPUT -s $isource -j HTTP"
  echo "iptables -D OUTPUT -d $isource -j HTTPD-OUT"
  echo "iptables-save >/etc/sysconfig/iptables"
  echo "rm -f $atjob"
} >"$atjob"
at -qb -f "$atjob" now + 30 minutes

# Pause five seconds, then mail root a report on the offender.
# grep -F matches the address literally (dots are not wildcards).
sleep 5
{
  echo "Host: $(host "$isource")"
  echo "$line"
  echo "Offending Lines:"
  echo "$line"
  grep -iF -- "$isource" /var/log/httpd/access_log
  grep -iF -- "$isource" /var/log/httpd/error_log
  echo "$line"
  grep -iF -- "$isource" /var/log/messages
  echo "$line"
  atq
} >"$report"
mail -s "Swatch IPTables HTTPD Rule Add - $isource" root <"$report"
rm -f "$report"
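Because ipt-httpd runs as root and writes its argument into a file that at later executes, it may be worth refusing anything that is not a plain dotted-quad IPv4 address (if HostnameLookups were ever turned on, the first log field would be a reverse-DNS name that the client's network controls). A minimal check, assuming IPv4 only; the function name is hypothetical:

```bash
# valid_ipv4 ADDRESS
# Exit status 0 only for four dot-separated decimal octets, each 0-255.
valid_ipv4() {
  local ip="$1" octet
  local re='^([0-9]{1,3}\.){3}[0-9]{1,3}$'
  local -a parts
  [[ $ip =~ $re ]] || return 1
  IFS=. read -r -a parts <<<"$ip"
  for octet in "${parts[@]}"; do
    # 10# forces base 10 so octets like "010" are not read as octal.
    (( 10#$octet <= 255 )) || return 1
  done
  return 0
}

# Possible guard near the top of ipt-httpd:
#   valid_ipv4 "$1" || { echo "refusing: $1" >&2; exit 1; }
```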

William Tasso

2006-04-24, 6:56 pm

Fleeing from the madness of the www.TQMcube.com jungle
David Cary Hart <Deming2U@TQMcube.com> stumbled into news:alt.www.webmaster
and said:

> ...
> iptables -A INPUT -s $isource -j HTTP


Ahh - got it, the firewall is on the same server as the web server.

Cheers
--
William Tasso

http://williamtasso.com/words/what-is-usenet.asp
David Cary Hart

2006-04-25, 7:06 pm

On Mon, 24 Apr 2006 20:22:06 +0100
"William Tasso" <SpamBlocked@tbdata.com> opined:
> Fleeing from the madness of the www.TQMcube.com jungle
> David Cary Hart <Deming2U@TQMcube.com> stumbled into
> news:alt.www.webmaster and said:
>
>
> Ahh - got it, the firewall is on the same server as the web server.
>

Actually, it's not. On the web server, the iptables command is a link to
a script that passes the command to the firewall machine via ssh.
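A wrapper of that kind might look like the following sketch. It is not the script used here: the host name, the remote iptables path and key-based root ssh to the firewall are all assumptions, and the function name is hypothetical. printf %q re-quotes each argument so the remote shell sees the same words; SSH and FIREWALL_HOST can be overridden, e.g. SSH=echo for a dry run.

```bash
# remote_iptables ARGS...
# Run iptables with the given arguments on the firewall machine over ssh.
# FIREWALL_HOST, REMOTE_IPTABLES and SSH are hypothetical defaults.
remote_iptables() {
  local host="${FIREWALL_HOST:-firewall.example.com}"
  local bin="${REMOTE_IPTABLES:-/sbin/iptables}"
  local ssh="${SSH:-ssh}"
  local args
  args=$(printf '%q ' "$@")   # shell-quote each argument for the remote side
  "$ssh" "$host" "$bin ${args% }"
}
```

Saved as a script in place of iptables, its last line would be `remote_iptables "$@"`.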

William Tasso

2006-04-25, 7:06 pm

Fleeing from the madness of the www.TQMcube.com jungle
David Cary Hart <Deming2U@TQMcube.com> stumbled into news:alt.www.webmaster
and said:

> On Mon, 24 Apr 2006 20:22:06 +0100
> "William Tasso" <SpamBlocked@tbdata.com> opined:
> Actually it's not. The iptables command is linked to a script to the
> firewall machine via ssh.


even better - thanks.

--
William Tasso

http://williamtasso.com/words/what-is-usenet.asp


Copyright 2003 - 2008 forum4designers.com