This is a courtesy page to help you Use Perl To Fight Spam.
The web address at the bottom of this page will help you find more information on this
issue. (This page is a partial copy from that site.)
(From Fighting Spam With Perl # Copyright 1999, Emmie P. Lewis)
Unsolicited commercial email, also known as spam, is a real problem for some people on the
internet. If you are being flooded with unwanted email, this feature is for you. The best
way to fight spam is to avoid giving away your email address at all. But this is difficult
to do when it's a required field in order to download that great software you just can't
live without. Or maybe you need to subscribe to a critical email list that keeps you
updated on the latest technology, news, or developments in your field. Once you give away
your email address you have no control over where it goes.
Most legitimate sites have privacy statements that protect their members. You should
always check to see that one exists, and read it if it does. If there is no privacy
statement, but you really need services from that site, you can always sign up for free
email from the many sites available on the internet. That way, if spammers get an address,
it won't be your personal one.
But what if you don't ever sign up for subscriptions or services? Think you're safe? Think
again. There are email harvesting spiders and robots that roam the web looking for email
links on web sites. They work the same way that search index spiders and robots do. They
access the pages on your site and index the information. That's not a bad thing if you
want to get listed on the search engines, but it can be a disaster if you're trying to
protect the privacy of the people who trustingly sign your guestbook.
So how can you protect your site? There are two ways to fight spam using Perl scripts. The
first is to spam the spammer. You can write a script to generate fake addresses and give
the spammers exactly what they want: LOTS of email addresses. When they send their
unwanted email to those addresses they will be flooded with bounced email. They still end
up with any addresses on your site, but you'll have the satisfaction of knowing you caused
them at least as much grief as they're dumping on you.
The second way to protect your site is to write a script that checks the HTTP_USER_AGENT
variable to find out who's accessing the page. If it's from a known robot, the script exits
without giving any information. If it's being accessed from a legitimate browser, then your
email address gets printed to the page as it normally would. This method keeps your
address from spammers, but it doesn't aggravate them.
Personally, I like both methods. I don't want the spammers to have my address, but I
really like the idea of being able to cause some grief for people who don't respect my
privacy. I'll show you how to write a script that implements both ways to fight spam. As
with all scripts presented at this site, you can use it as a launching point for your own
scripts.
The script will check to see who's accessing your page by checking the HTTP_USER_AGENT
environment variable. But first, we have to initialize the list of known robots, and to do
that just check the E-Mail Collectors List. I have included the complete list as of
07/18/99. Be sure to
periodically check for updates, as new robots are sure to be released. Here's the list of
names that will show up in HTTP_USER_AGENT:
@Robot = ('TestThisScript', 'EmailSiphon', 'CherryPickerSE/1.0', 'CherryPickerElite/1.0',
          'Crescent Internet ToolPak HTTP OLE Control v.1.0',
          'EmailCollector/1.0', 'EmailWolf 1.00',
          'Mozilla/2.0 (compatible; NEWT ActiveX; Win32)', '/0.5 libwww-perl/0.40',
          'empty');
Note that the first element, 'TestThisScript', is my own addition for the purposes of
testing this script. If you want to see how the script works, you'll manually assign the
variable that holds the HTTP_USER_AGENT information. That will trick the script into
thinking that a robot is accessing your page, and you can see the resulting list of fake
email addresses. Just uncomment the line where you see:
# Uncomment the next line of code to see what will happen when a robot
# visits this page.
$user_agent[0] = 'TestThisScript';
Of course, comment it out when you use this on your web pages.
Because the script will print your email link when the page is accessed by a browser,
you'll also have to initialize your email address, and the text for the link.
# First, initialize $Email with the correct email address
$Email = "user\@domain.com";
# Now, initialize the text for the link
$LinkText = "Click me!";
Next, we need to save the HTTP_USER_AGENT information.
# Find out who's visiting this page
@user_agent = split(/\//,$ENV{'HTTP_USER_AGENT'});
If this code is unfamiliar to you, please see my previous features Reading CGI
Data, and Understanding
Environment Variables.
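To see what the split leaves in $user_agent[0], here is a minimal sketch. The helper name first_field and the sample strings are assumptions made up for illustration; they are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: return the part of a user-agent string before the
# first "/", which is what $user_agent[0] holds in the script.
sub first_field {
    my ($agent) = @_;
    $agent = '' unless defined $agent;    # HTTP_USER_AGENT may be missing
    my @parts = split(/\//, $agent);
    return defined $parts[0] ? $parts[0] : '';
}

# Sample strings, chosen only for illustration
foreach my $agent ('EmailSiphon', 'CherryPickerSE/1.0', 'Mozilla/4.0 (compatible)') {
    printf "%-26s -> %s\n", $agent, first_field($agent);
}
```

Note that a name such as 'CherryPickerSE/1.0' is cut down to 'CherryPickerSE', so a list entry that still carries the version number will not be equal to $user_agent[0].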
Finally, we'll be using the rand() function, so we'll seed the randomizer first by calling
srand(). The reason is that rand() doesn't produce truly random numbers: the same seed
always produces the same set of numbers. srand() takes an argument that is used as the
seed. If you don't pass an argument, Perl chooses a seed for you (older versions simply
used the current value from time). Since Perl 5.004, rand() also calls srand() for you the
first time it is used if you haven't seeded it yourself, but earlier versions did not, so
calling srand() explicitly is the safe habit if your script may run on an older Perl. The
effect will be to produce a series of seemingly random values, which we'll use to build
fake email addresses. The rand() function should not be used in strong cryptography because
it can be easily broken if a hacker can figure out what you're using for the seed. But
rand() will certainly serve our purpose here without any problem.
The call to srand() is only made once, and that is in the main program. The reason is that
if it is included in the subroutine that generates the fake addresses, it will be called
1000 times with close to the same time value. Because the same seed produces the same set
of values the email addresses will all be the same, and that's not what we want. A friend
of mine used to say "If you're going to do something, do it first class". Since
we're planning to annoy the spammers, there's no point in making it easy for them by
giving them 1000 of the same email addresses, right? Let's make them work a little.
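The effect of reusing a seed can be checked with a short sketch. The helper name draw_numbers and the seed value 42 are assumptions chosen for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: seed the generator, then draw $count whole numbers
# below 100 and return them as a comma-separated string.
sub draw_numbers {
    my ($seed, $count) = @_;
    srand($seed);
    return join(',', map { int(rand(100)) } 1 .. $count);
}

my $first  = draw_numbers(42, 5);
my $second = draw_numbers(42, 5);    # same seed again

print "first:  $first\n";
print "second: $second\n";
print "The two runs are identical\n" if $first eq $second;
```

Because both runs start from the same seed, they produce the same numbers. That is exactly what would happen to the fake addresses if srand() were called again with the same value before each one.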
Now that all the important variables are initialized, let's work on the guts of the code.
We'll loop through the list of known robots, looking for a match in HTTP_USER_AGENT. If we
find a match we'll generate 1000 fake email addresses, and then we'll exit the script. If
there isn't a match, then we know that an interested user has arrived, so we'll print your
email link as it should normally appear. Here is the code.
foreach (@Robot) {
    # If we found a robot, let's give it 1000 fake addresses.
    if ($user_agent[0] eq $_) {
        local $iCounter;
        for ($iCounter = 0; $iCounter < 1000; $iCounter++) {
            # Get a new fake email address
            $Email = &GenerateFakeAddress;
            print "<A HREF=\"mailto:$Email\">$Email</A><BR>";
        }
        # OK, we've done the job and now we can exit this script
        exit;
    }
}
# If the script didn't exit, then we have a legitimate visitor, so
# print the right e-mail address
print "<A HREF=\"mailto:$Email\">$LinkText</A>";
As you can see, the code isn't very complicated. The real work is done in the function
GenerateFakeAddress.
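One possible variation on the matching step, offered as a sketch rather than as the script's own method: compare the complete HTTP_USER_AGENT value against the list, so that entries containing a "/" can match as well. The names %known_robot and is_known_robot are assumptions for this example, and the list is shortened.

```perl
use strict;
use warnings;

# Shortened copy of the robot list, used here only for illustration
my %known_robot = map { $_ => 1 } (
    'EmailSiphon',
    'CherryPickerSE/1.0',
    'EmailCollector/1.0',
    'EmailWolf 1.00',
);

# Hypothetical helper: returns 1 if the full user-agent string is in the
# list, 0 otherwise (including when no user agent was sent at all)
sub is_known_robot {
    my ($agent) = @_;
    return 0 unless defined $agent;
    return exists $known_robot{$agent} ? 1 : 0;
}

my $agent = $ENV{'HTTP_USER_AGENT'};
print is_known_robot($agent) ? "robot\n" : "browser\n";
```

A hash lookup also avoids looping over the whole list for every visitor, which matters little for ten names but keeps the check simple as the list grows.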
For those of you unfamiliar with functions (also called subroutines), they are an easy way
to consolidate code that gets repeated a number of times. Using them makes your code
easier to read and debug, so it's important to know how to use them. Briefly, a function is
defined as follows.
# define a new function
sub Function {
# your code goes here
}
When you call a function from the main program, precede the function name with '&',
like this:
&Function;
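A slightly fuller sketch of the same idea shows a function that takes arguments and returns a value. The function name MakeLink and its two parameters are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical function: build a mailto link from an address and link text.
# Arguments arrive in the special array @_.
sub MakeLink {
    my ($address, $text) = @_;
    return "<A HREF=\"mailto:$address\">$text</A>";
}

# Call it with the '&' form, passing the arguments in parentheses
my $link = &MakeLink('user@domain.com', 'Click me!');
print "$link\n";
```

The value handed back by return can be stored in a variable or used directly in a larger expression, which is how the script uses its own functions.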
Here is the code for GenerateFakeAddress:
# generate fake mailto: addresses and links
sub GenerateFakeAddress {
    local @Domains =
        (".com",".net",".us",".edu",".nl",".de",".it",".se",".ch",
         ".uk",".ca",".hr",".ae",".br",".jp",".be",".us",".au",".ie",
         ".ar",".fi",".mil",".gov",".sg",".es",".mx",".no",".pt",
         ".dk",".il",".ru",".nz",".th",".pl",".id",".cy",".in",".kw",
         ".at",".za",".cn",".fr",".is",".ro",".kr",".gr",".co",".ph",
         ".bo",".hu",".cr",".pe",".cl",".tr",".arpa",".tw",".eg",
         ".ee",".ge",".ua",".om",".ec",".hk",".ve",".ag",".cz",".ni",
         ".to",".nu",".sm",".ni",".lt",".yu",".bg",".ba",".do",".qa",
         ".ck",".mt",".bf",".lu",".su",".bh");
    local $FakeAddress = &GetWord . "\@" . &GetWord . $Domains[rand(82)];
    return $FakeAddress;
}
You will see that it calls another function GetWord, which generates a string of 13 random
letters. Here is the code:
sub GetWord {
    local @Letter =
        ("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o",
         "p","q","r","s","t","u","v","w","x","y","z");
    # initialize the variables
    local $Word;
    local $iWordCounter;
    # Generate a string of 13 random letters
    for ($iWordCounter=0; $iWordCounter < 13; $iWordCounter++) {
        $Word .= $Letter[rand(26)];
    }
    return($Word);
}
Both functions use rand() to produce an email address that is fake, but has a legitimate
domain. This will make it harder to know that the address is bogus. In GetWord the loop
will cycle 13 times, and each cycle will add a random letter to a string. The string is
returned to GenerateFakeAddress, and is used to build the address. You will note that in
GetWord rand() has a parameter (26), which defines the range of values that it can return.
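We use (26) because that's the number of letters in the alphabet. In GenerateFakeAddress,
the parameter value is (82), which is meant to correspond to the number of domain name
extensions in the list. Counted by hand, the list as printed holds 83 entries (.us and .ni
each appear twice), so with (82) the last entry, .bh, is never picked. Writing
rand(@Domains) instead takes the count from the array itself.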
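As a small sketch of how the rand() argument sets the range: used as an array index, the value rand(N) returns is truncated to a whole number from 0 to N-1. Passing the array itself, as in rand(@list), uses the element count as N, so the range follows the list when entries are added or removed. The helper name PickOne and the three-entry list are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return one randomly chosen element of a list
sub PickOne {
    my @list = @_;
    # In numeric context @list is its element count; the fractional
    # result of rand() is truncated when used as an array index
    return $list[rand(@list)];
}

my @Domains = ('.com', '.net', '.edu');
print "picked: ", PickOne(@Domains), "\n";
```

With this form there is no number to keep in step with the list by hand.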
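Both functions use 'local' when declaring variables. What this means is that a variable
such as $Word gets a fresh, temporary value for as long as the function is running, and
the previous value is restored when the function returns. Although different functions may
use the same variable name, changes made inside the function don't leak back to the code
that called it. Be aware that 'local' does not hide the variable from any functions that
this function calls in turn; they see the temporary value. Perl 5 also offers 'my', which
makes a variable truly private to the block where it is declared and is usually the better
choice in new scripts. Normally, perl variables are global, which means that any function
can change the value for the whole program. I fell into that trap when I first wrote the
script. I have almost 10 years' experience programming in C, and when I wrote the looping
code in this script I used the C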
conventions without thinking about it:
for( $i=0; $i<13; $i++)
Trouble is, there are several loops in the script, and they all used $i. It would complete
a loop in a function, and think it was done when it returned from the function. Duh. So
you see, even seasoned veterans can make silly mistakes if they're rushed. Using different
variable names works, but can be a headache in larger scripts. Do yourself a favor and use
'local'.
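The shared-counter trap can be reproduced in a few lines. This is a sketch under assumed names (InnerGlobal, InnerLocal, InnerPrivate, CountPasses); the loop limits 13 and 5 are chosen only to make the effect visible. It also shows 'my', Perl's other way of declaring a variable that belongs to one block.

```perl
use strict;
use warnings;

our $i;    # one global counter shared by everything, as in the story above

# Inner loop that uses the global $i directly
sub InnerGlobal {
    for ($i = 0; $i < 13; $i++) { }
}

# Inner loop that protects the caller's value with 'local'
sub InnerLocal {
    local $i;
    for ($i = 0; $i < 13; $i++) { }
}

# Inner loop with its own private counter declared with 'my'
sub InnerPrivate {
    my $i;
    for ($i = 0; $i < 13; $i++) { }
}

# Run an outer loop of 5 passes on the global $i, calling the given
# function on each pass, and report how many passes actually happened
sub CountPasses {
    my ($inner) = @_;
    my $passes = 0;
    for ($i = 0; $i < 5; $i++) {
        $inner->();
        $passes++;
    }
    return $passes;
}

print "sharing the global: ", CountPasses(\&InnerGlobal),  " pass(es)\n";
print "using local:        ", CountPasses(\&InnerLocal),   " pass(es)\n";
print "using my:           ", CountPasses(\&InnerPrivate), " pass(es)\n";
```

When the inner function shares the global, it leaves $i at 13, so the outer loop stops after a single pass. With 'local' or 'my' the outer loop completes all 5 passes.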
The script included in this feature requires SSI, or 'Server Side Includes'. This is
something that is configured on your server. If you don't know if it is, you'll have to
check with your system administrator. You'll also need to know if there are any special
extensions for the HTML files that call this script. Sometimes you'll be required to add a
special extension like .shtml instead of .htm or .html. Your system administrator can
answer these questions as well.
To call the script, use the following code (shown here as the file spam.shtml):
<HTML>
<BODY BGCOLOR="#FFFFFF">
<!--#exec cgi="/cgi-bin/spam.cgi"-->
</BODY>
</HTML>
You may, of course, have to change the path to spam.cgi. If you plan to use the script in
your web pages, replace your email address link
<a href="mailto:perl.guide@about.com"> Emmie Lewis - Perl Guide</a>
with
<!--#exec cgi="/cgi-bin/spam.cgi"-->
Here is the code for spam.cgi.
#!/usr/local/bin/perl
#===============================
# Fighting Spam With Perl
# Copyright 1999, Emmie P. Lewis
# Created 07/18/99
#===============================
# This script is designed to
# determine who is accessing a
# web page, and if it's an email
# harvesting robot, generate 1000
# fake email addresses. Otherwise
# it will print a link to the
# correct email address
#===============================
print "Content-type:text/html\n\n";
#===================================
# A list of known robots to hide from
# The current list is at http://www.soclair.ch/resources/useragents/emailgrabbers.html
# NOTE: 'TestThisScript' is not a real robot! It is included in this list to test
# this script, and to see how it works from a legitimate browser.
@Robot = ('TestThisScript', 'EmailSiphon', 'CherryPickerSE/1.0', 'CherryPickerElite/1.0',
          'Crescent Internet ToolPak HTTP OLE Control v.1.0',
          'EmailCollector/1.0', 'EmailWolf 1.00',
          'Mozilla/2.0 (compatible; NEWT ActiveX; Win32)', '/0.5 libwww-perl/0.40',
          'empty');
# Initialize default info for the email link
# First, initialize $Email with the correct email address
$Email = "user\@domain.com";
# Now, initialize the text for the link
$LinkText = "Click me!";
# Seed the rand function.
srand;
# Find out who's visiting this page
@user_agent = split(/\//,$ENV{'HTTP_USER_AGENT'});
# Uncomment the next line of code to see what will happen when a robot
# visits this page.
# $user_agent[0] = 'TestThisScript';
foreach (@Robot) {
    # If we found a robot, let's give it 1000 fake addresses.
    if ($user_agent[0] eq $_) {
        # Give this counter a unique name so it won't be
        # accidentally changed by a subroutine
        local $iCounter;
        for ($iCounter = 0; $iCounter < 1000; $iCounter++) {
            # Get a new fake email address
            $Email = &GenerateFakeAddress;
            print "<A HREF=\"mailto:$Email\">$Email</A><BR>";
        }
        # OK, we've done the job and now we can exit this script
        exit;
    }
}
# If the script didn't exit, then we have a legitimate visitor, so
# print the right e-mail address
print "<A HREF=\"mailto:$Email\">$LinkText</A>";
# generate fake mailto: addresses and links
sub GenerateFakeAddress {
    local @Domains =
        (".com",".net",".us",".edu",".nl",".de",".it",".se",".ch",
         ".uk",".ca",".hr",".ae",".br",".jp",".be",".us",".au",".ie",
         ".ar",".fi",".mil",".gov",".sg",".es",".mx",".no",".pt",
         ".dk",".il",".ru",".nz",".th",".pl",".id",".cy",".in",".kw",
         ".at",".za",".cn",".fr",".is",".ro",".kr",".gr",".co",".ph",
         ".bo",".hu",".cr",".pe",".cl",".tr",".arpa",".tw",".eg",
         ".ee",".ge",".ua",".om",".ec",".hk",".ve",".ag",".cz",".ni",
         ".to",".nu",".sm",".ni",".lt",".yu",".bg",".ba",".do",".qa",
         ".ck",".mt",".bf",".lu",".su",".bh");
    local $FakeAddress = &GetWord . "\@" . &GetWord . $Domains[rand(82)];
    return $FakeAddress;
}
sub GetWord {
    local @Letter =
        ("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o",
         "p","q","r","s","t","u","v","w","x","y","z");
    # initialize the variables
    local $Word;
    local $iWordCounter;
    # Generate a string of 13 random letters
    for ($iWordCounter=0; $iWordCounter < 13; $iWordCounter++) {
        $Word .= $Letter[rand(26)];
    }
    return($Word);
}
Fighting Spam With Perl # Copyright 1999, Emmie P. Lewis http://www.siteware.ch/webresources/useragents/collectors/
The following email address(s) is/are not copyright and are free to all email robots that
steal email@emailrobotsetc.com etc etc etc
email@billsrobotsetc.com email@billsrobotsetc.net email2@billsrobotsetc.com email3@billsrobotsetc.com email4@billsrobotsetc.com
emaila@billsrobotsetc.com emaila@billsrobotsetc.net emle@bollsribotsetc.com emoil@bullsrebotsetc.net eoil@ullsrebtsetc.net emoil@bullsrebotsetc.com
eoil@bullsrebotsetc.net emoil@bullsrebsetc.net eoil@pullsrebotsetc.net emoil@bullsrelpsetc.net emoil@blsrebotsetc.com emoil@bullsrebotsetc.au
bill@ballsrebotsetc.net pat@bull123rebotetc.net mal@myrebotetc.net tue@myrebotetc.net paul@myrebotetc.com mal@myrebotetc.kl mal@myrebotetc.pl
potts@mymaicebotetc.com potts@mymaicebotetc.net potts@mymaicebotetc.gh potts@mymaicebotetc.lo potts@mymaicebotetc.co potts@mymaicebotetc.om
potts@mymaicebotetc.cm potts@mymaicebotetc.cmm potts@mym321ebotetc.com pits@mymaicebotetc.com putts@mymaicebotetc.com
potts@mymaicebotetc.qe potts@mymaicebotetc.pl lotts@mymaicebotetc.net potts@mymaicebotetc.lk potts@mymaicebotetc.rt potts@mymaicebotetc.lc
potts@mymaicebotetc.as adam@bigtop123ax.com adam@bigtop123tap.com dam@bigtop123tap.com dam@bigtop100tap.com dam@bigtop123tap.net
sam@big957toptap.com sam@big957toptap.co sam@big957toptap.cm sam@big957toptap.cl sam@big957toptap.lm sam@big957toptap.bn
enid@big957toptap.com enid@big957toptap.net enid@big957toptap.di enid@big957toptap.au enid@big957toptap.ca enid@big957tap.ca
nid@big957tap.la nid@big95xtap.ca nid@big9jktap.ca
nid@big9df7tap.ca nid@big95ddtap.ca nid@big9ght7tap.ca
nid@tat957topap.ca
samid@tat957topap.ca sanid@tat957topap.ca sanid@tot957topap.ca sanid@tat453topap.ca sanid@tattopap.ca sanid@tatdg57topap.ca
sanid@tat231pap.ca said@patpolpap.uk said@patpolpap.net
said@patpolpap.usa aid@patpolpap.bg stay@patpolpap.uk
said@patpolpap.nu
said@patpolpip.nu sid@patpolpip.nu
tom@patpolpip.nu sara@patpolpip.nu
das@patpolpip.nu said@padfolpip.nu
said@patpolpip.ni said@patpolpip.no
said@papolpip12.nu said@patpolpip.un said@patwpolpip.nu
sid@patpolpip.nu mad@robatpolpip.nu
me@patpolpip.nu me@patpolpip.un
me@patpolpip.ni
itsme@patpolpip.nu me@2ppolpip.nu
wasme@patpolpip.nu notme@9patpolpip.nu me@patpolpop.nu
me@patpolpip.ni me@itspolpip.lo
me@tobadpolpip.nu
Courtesy: anti spam dot limited