This is a courtesy page to help you use Perl to fight spam.

The web address at the bottom of this page leads to more information on this issue. (This page is a partial copy from that site.)
(From Fighting Spam With Perl # Copyright 1999, Emmie P. Lewis)

Unsolicited commercial email, also known as spam, is a real problem for some people on the internet. If you are being flooded with unwanted email, this feature is for you. The best way to fight spam is to avoid giving away your email address at all. But this is difficult to do when it's a required field for downloading that great software you just can't live without. Or maybe you need to subscribe to a critical email list that keeps you updated on the latest technology, news, or developments in your field. Once you give away your email address you have no control over where it goes.

Most legitimate sites have privacy statements that protect their members. You should always check to see that one exists, and read it if it does. If there is no privacy statement, but you really need services from that site, you can always sign up for free email from the many sites available on the internet. That way, if spammers get an address, it won't be your personal one.

But what if you don't ever sign up for subscriptions or services? Think you're safe? Think again. There are email harvesting spiders and robots that roam the web looking for email links on web sites. They work the same way that search index spiders and robots do. They access the pages on your site and index the information. That's not a bad thing if you want to get listed on the search engines, but it can be a disaster if you're trying to protect the privacy of the people who trustingly sign your guestbook.

So how can you protect your site? There are two ways to fight spam using perl scripts. The first is to spam the spammer. You can write a script to generate fake addresses and give the spammers exactly what they want: LOTS of email addresses. When they send their unwanted email to those addresses they will be flooded with bounced email. They still end up with any addresses on your site, but you'll have the satisfaction of knowing you caused them at least as much grief as they're dumping on you.

The second way to protect your site is to write a script that checks the HTTP_USER_AGENT variable to find out who's accessing the page. If it's a known robot, the script exits without giving any information. If the page is being accessed from a legitimate browser, then your email address gets printed to the page as it normally would. This method keeps your address from spammers, but it doesn't aggravate them.

Personally, I like both methods. I don't want the spammers to have my address, but I really like the idea of being able to cause some grief for people who don't respect my privacy. I'll show you how to write a script that implements both ways to fight spam. As with all scripts presented at this site, you can use it as a launching point for your own scripts.

The script will check to see who's accessing your page by checking the HTTP_USER_AGENT environment variable. But first, we have to initialize the list of known robots, and to do that just check the E-Mail Collectors List. I have included the complete list as of 07/18/99. Be sure to check for updates periodically, as new robots are sure to be released. Here's the list of names that will show up in HTTP_USER_AGENT:

@Robot = ('TestThisScript', 'EmailSiphon', 'CherryPickerSE/1.0', 'CherryPickerElite/1.0',
    'Crescent Internet ToolPak HTTP OLE Control v.1.0', 'EmailCollector/1.0', 'EmailWolf 1.00',
    'Mozilla/2.0 (compatible; NEWT ActiveX; Win32)', '/0.5 libwww-perl/0.40',
    'empty');

Note that the first element, 'TestThisScript', is my own addition for the purposes of testing this script. If you want to see how the script works, you'll manually assign the variable that holds the HTTP_USER_AGENT information. That will trick the script into thinking that a robot is accessing your page, and you can see the resulting list of fake email addresses. Just uncomment the last line where you see:

# Uncomment the next line of code to see what will happen when a robot
# visits this page.
# $user_agent[0] = 'TestThisScript';

Of course, comment it out again when you use this on your web pages.

Because the script will print your email link when the page is accessed by a browser, you'll also have to initialize your email address, and the text for the link.

# First, initialize $Email with the correct email address
$Email = "user\@domain.com";

# Now, initialize the text for the link
$LinkText = "Click me!";

Next, we need to save the HTTP_USER_AGENT information.

# Find out who's visiting this page
@user_agent = split(/\//,$ENV{'HTTP_USER_AGENT'});

If this code is unfamiliar to you, please see my previous features Reading CGI Data, and Understanding Environment Variables.
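
To see what the split leaves in the first element, here is a minimal sketch. The helper name first_token and the sample browser string are assumptions made for illustration; they are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: return the part of a user-agent string that
# comes before the first '/', the same value the script keeps in
# $user_agent[0].
sub first_token {
    my ($agent) = @_;
    my @parts = split(/\//, $agent);
    return defined $parts[0] ? $parts[0] : '';
}

# Two names from the robot list plus an illustrative browser string.
foreach my $agent ('EmailSiphon', 'CherryPickerSE/1.0',
                   'Mozilla/4.0 (compatible; MSIE 5.0)') {
    print "$agent => ", first_token($agent), "\n";
}
```

Notice that a name listed with its version, such as 'CherryPickerSE/1.0', is longer than the first element that split returns, so it is worth checking which of the two forms you are comparing against.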

Finally, we'll be using the rand() function, so we'll seed the randomizer first by calling srand(). The reason is that rand() doesn't produce truly random numbers: the same seed always produces the same series of values. srand() takes an argument that is used as the seed. If you don't pass an argument, Perl chooses a seed for you; older versions simply use the current value from time. (Perl 5.004 and later also call srand() automatically the first time rand() is used, so the explicit call matters mostly on older installations.) The effect will be to produce a series of seemingly random values, which we'll use to build fake email addresses. The rand() function should not be used in strong cryptography, because its output can be predicted by anyone who can figure out what you're using for the seed. But rand() will certainly serve our purpose here without any problem.

The call to srand() is only made once, and that is in the main program. The reason is that if it is included in the subroutine that generates the fake addresses, it will be called 1000 times with close to the same time value. Because the same seed produces the same set of values, the email addresses will all be the same, and that's not what we want. A friend of mine used to say "If you're going to do something, do it first class". Since we're planning to annoy the spammers, there's no point in making it easy for them by giving them 1000 copies of the same email address, right? Let's make them work a little.
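
The effect of reusing a seed is easy to check for yourself. The following is only a sketch; the function name sequence_for_seed and the seed value 42 are made up for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical helper: seed the generator with $seed, then return
# $count pseudo-random integers between 0 and 99.
sub sequence_for_seed {
    my ($seed, $count) = @_;
    srand($seed);
    return map { int(rand(100)) } 1 .. $count;
}

my @first  = sequence_for_seed(42, 5);
my @second = sequence_for_seed(42, 5);

print "first:  @first\n";
print "second: @second\n";
print "identical\n" if "@first" eq "@second";
```

Both lines show the same five numbers, which is exactly what would happen to the fake addresses if srand() were called again with the same time value before each one.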

Now that all the important variables are initialized, let's work on the guts of the code. We'll loop through the list of known robots, looking for a match in HTTP_USER_AGENT. If we find a match we'll generate 1000 fake email addresses, and then we'll exit the script. If there isn't a match, then we know that an interested user has arrived, so we'll print your email link as it should normally appear. Here is the code.

foreach (@Robot) {
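    # If we found a robot, let's give it 1000 fake addresses.
    # Compare both the part before the first '/' and the full string,
    # because some robot names in the list include a '/'.
    if ($user_agent[0] eq $_ || $ENV{'HTTP_USER_AGENT'} eq $_) {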
        local $iCounter;
        for ($iCounter = 0; $iCounter < 1000; $iCounter++) {
            # Get a new fake email address
            $Email = &GenerateFakeAddress;
            print "<A HREF=\"mailto:$Email\">$Email</A><BR>";
        }
        # OK, we've done the job and now we can exit this script
        exit;
    }
}

# If the script didn't exit, then we have a legitimate visitor, so
# print the right e-mail address
print "<A HREF=\"mailto:$Email\">$LinkText</A>";

As you can see, the code isn't very complicated. The real work is done in the function GenerateFakeAddress.

For those of you unfamiliar with functions (also called subroutines), they are an easy way to consolidate code that gets repeated a number of times. Using them makes your code easier to read and debug, so it's important to know how to use them. Briefly, a function is defined as follows.

# define a new function
sub Function {
# your code goes here
}

When you call a function from the main program, precede the function name with '&', like this:

&Function;
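
The functions in this feature take no arguments, but a function can also receive values and hand a result back. Here is a small sketch; MakeLink is a hypothetical function written for this example and is not part of spam.cgi.

```perl
use strict;
use warnings;

# Hypothetical function: build a mailto: link from an address and
# the text to display.
sub MakeLink {
    my ($address, $text) = @_;    # arguments arrive in the array @_
    return "<A HREF=\"mailto:$address\">$text</A>";
}

# Call it with the '&' form used in this feature, or without it.
my $link1 = &MakeLink('user@domain.com', 'Click me!');
my $link2 = MakeLink('user@domain.com', 'Click me!');

print "$link1\n";
print "same result\n" if $link1 eq $link2;
```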

Here is the code for GenerateFakeAddress:

# generate fake mailto: addresses and links
sub GenerateFakeAddress {
    local @Domains = (".com",".net",".us",".edu",".nl",".de",".it",".se",".ch",
            ".uk",".ca",".hr",".ae",".br",".jp",".be",".us",".au",".ie",
        ".ar",".fi",".mil",".gov",".sg",".es",".mx",".no",".pt",
        ".dk",".il",".ru",".nz",".th",".pl",".id",".cy",".in",".kw",
        ".at",".za",".cn",".fr",".is",".ro",".kr",".gr",".co",".ph",
        ".bo",".hu",".cr",".pe",".cl",".tr",".arpa",".tw",".eg",
        ".ee",".ge",".ua",".om",".ec",".hk",".ve",".ag",".cz",".ni",
        ".to",".nu",".sm",".ni",".lt",".yu",".bg",".ba",".do",".qa",
        ".ck",".mt",".bf",".lu",".su",".bh");

    local $FakeAddress = &GetWord . "\@" . &GetWord . $Domains[rand(82)];
    return $FakeAddress;

}

You will see that it calls another function GetWord, which generates a string of 13 random letters. Here is the code:

sub GetWord {
    local @Letter = ("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o",
            "p","q","r","s","t","u","v","w","x","y","z");

    # initialize the variables
    local $Word;
    local $iWordCounter;

    # Generate a string of 13 random letters
    for ($iWordCounter=0; $iWordCounter < 13; $iWordCounter++) {
        $Word .= $Letter[rand(26)];
    }
       
    return($Word);
}

Both functions use rand() to produce an email address that is fake, but has a legitimate domain. This will make it harder to know that the address is bogus. In GetWord the loop will cycle 13 times, and each cycle will add a random letter to a string. The string is returned to GenerateFakeAddress, and is used to build the address. You will note that in GetWord rand() has a parameter (26), which defines the range of values that it can return: a number from 0 up to, but not including, 26, whose fractional part is dropped when it is used as an array index. We use (26) because that's the number of letters in the alphabet. In GenerateFakeAddress, the parameter value is (82), which is meant to correspond to the number of domain name extensions in the list. (The list as shown actually holds 83 entries, two of them repeated, so the last entry is never chosen. That does no harm here, but keep the number in step with the list if you change it.)
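
One way to avoid counting the list by hand is to let Perl supply the length. This is only a sketch of that idiom; PickOne and the three-item list are assumptions for illustration.

```perl
use strict;
use warnings;

my @Suffixes = ('.com', '.net', '.org');    # illustrative list only

# Hypothetical helper: return one random element of the list passed
# in. rand(@list) uses the number of elements as the range, and the
# fraction is dropped when the result is used as an index.
sub PickOne {
    my @list = @_;
    return $list[rand(@list)];
}

# Tally 1000 picks to confirm every entry can come up.
my %seen;
$seen{ PickOne(@Suffixes) }++ for 1 .. 1000;
print "$_: $seen{$_}\n" for sort keys %seen;
```

Because the range always matches the list, adding or removing an entry needs no other change.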

Both functions use 'local' when declaring variables. What this means is that a variable such as $Word gets a fresh value for the duration of the function (and anything the function calls), and its previous value is restored when the function returns. Although different functions may use the same variable name, changes made inside the function will not leak back to the caller. (Perl 5 also offers 'my', which makes a variable visible only inside the enclosing block.) Normally, perl variables are global, which means that any function can change the value for the whole program. I fell into that trap when I first wrote the script. I have almost 10 years' experience programming in C, and when I wrote the looping code in this script I used the C conventions without thinking about it:

for( $i=0; $i<13; $i++)

Trouble is, there are several loops in the script, and they all used $i. It would complete a loop in a function, and think it was done when it returned from the function. Duh. So you see, even seasoned veterans can make silly mistakes if they're rushed. Using different variable names works, but can be a headache in larger scripts. Do yourself a favor and use 'local'.
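
The trap can be reproduced in a few lines. The following sketch uses hypothetical function names and a three-pass outer loop chosen for the demonstration; it is not taken from spam.cgi.

```perl
use strict;
use warnings;

our $i;    # one global counter shared by everything, as in my mistake

# Inner loop that reuses the global $i and leaves it at 13.
sub CountGlobal {
    my $n = 0;
    for ($i = 0; $i < 13; $i++) { $n++ }
    return $n;
}

# Same loop, but 'local' restores the caller's $i on return.
sub CountLocal {
    local $i;
    my $n = 0;
    for ($i = 0; $i < 13; $i++) { $n++ }
    return $n;
}

# Outer loop meant to run 3 times; returns how many passes it made.
sub OuterPasses {
    my ($worker) = @_;
    my $passes = 0;
    for ($i = 0; $i < 3; $i++) {
        $worker->();
        $passes++;
    }
    return $passes;
}

print "global counter: ", OuterPasses(\&CountGlobal), " pass(es)\n";
print "local counter:  ", OuterPasses(\&CountLocal), " pass(es)\n";
```

With the shared counter the outer loop stops after a single pass, because the inner loop has already pushed $i past the limit. With 'local' it makes all three.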

The script included in this feature requires SSI, or 'Server Side Includes'. This is something that is configured on your server. If you don't know if it is, you'll have to check with your system administrator. You'll also need to know if there are any special extensions for the HTML files that call this script. Sometimes you'll be required to add a special extension like .shtml instead of .htm or .html. Your system administrator can answer these questions as well.

To call the script use the following code:

spam.shtml

<HTML>
<BODY BGCOLOR="#FFFFFF">

<!--#exec cgi="/cgi-bin/spam.cgi"-->

</BODY>
</HTML>

You may, of course, have to change the path to spam.cgi. If you plan to use the script in your web pages, replace your email address link

<a href="mailto:perl.guide@about.com"> Emmie Lewis - Perl Guide</a>

with

<!--#exec cgi="/cgi-bin/spam.cgi"-->.

Here is the code for spam.cgi.

#!/usr/local/bin/perl

#===============================
# Fighting Spam With Perl
# Copyright 1999, Emmie P. Lewis
# Created 07/18/99
#===============================
# This script is designed to
# determine who is accessing a
# web page, and if it's an email
# harvesting robot, generate 1000
# fake email addresses. Otherwise
# it will print a link to the
# correct email address
#===============================

print "Content-type:text/html\n\n";

#===================================

# A list of known robots to hide from
# The current list is at http://www.soclair.ch/resources/useragents/emailgrabbers.html
# NOTE: 'TestThisScript' is not a real robot! It is included in this list to test
# this script, and to see how it works from a legitimate browser.
@Robot = ('TestThisScript', 'EmailSiphon', 'CherryPickerSE/1.0', 'CherryPickerElite/1.0',
    'Crescent Internet ToolPak HTTP OLE Control v.1.0', 'EmailCollector/1.0', 'EmailWolf 1.00',
    'Mozilla/2.0 (compatible; NEWT ActiveX; Win32)', '/0.5 libwww-perl/0.40',
    'empty');

# Initialize default info for the email link
# First, initialize $Email with the correct email address
$Email = "user\@domain.com";

# Now, initialize the text for the link
$LinkText = "Click me!";

# Seed the rand function.
srand;

# Find out who's visiting this page
@user_agent = split(/\//,$ENV{'HTTP_USER_AGENT'});

# Uncomment the next line of code to see what will happen when a robot
# visits this page.
# $user_agent[0] = 'TestThisScript';

foreach (@Robot) {
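    # If we found a robot, let's give it 1000 fake addresses.
    # Compare both the part before the first '/' and the full string,
    # because some robot names in the list include a '/'.
    if ($user_agent[0] eq $_ || $ENV{'HTTP_USER_AGENT'} eq $_) {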
        # Give this counter a unique name so it won't be
        # accidentally changed by a subroutine
        local $iCounter;
        for ($iCounter = 0; $iCounter < 1000; $iCounter++) {
            # Get a new fake email address
            $Email = &GenerateFakeAddress;
            print "<A HREF=\"mailto:$Email\">$Email</A><BR>";
        }
        # OK, we've done the job and now we can exit this script
        exit;
    }
}

# If the script didn't exit, then we have a legitimate visitor, so
# print the right e-mail address
print "<A HREF=\"mailto:$Email\">$LinkText</A>";



# generate fake mailto: addresses and links
sub GenerateFakeAddress {
    local @Domains = (".com",".net",".us",".edu",".nl",".de",".it",".se",".ch",
            ".uk",".ca",".hr",".ae",".br",".jp",".be",".us",".au",".ie",
        ".ar",".fi",".mil",".gov",".sg",".es",".mx",".no",".pt",
        ".dk",".il",".ru",".nz",".th",".pl",".id",".cy",".in",".kw",
        ".at",".za",".cn",".fr",".is",".ro",".kr",".gr",".co",".ph",
        ".bo",".hu",".cr",".pe",".cl",".tr",".arpa",".tw",".eg",
        ".ee",".ge",".ua",".om",".ec",".hk",".ve",".ag",".cz",".ni",
        ".to",".nu",".sm",".ni",".lt",".yu",".bg",".ba",".do",".qa",
        ".ck",".mt",".bf",".lu",".su",".bh");

    local $FakeAddress = &GetWord . "\@" . &GetWord . $Domains[rand(82)];
    return $FakeAddress;

}

sub GetWord {
    local @Letter = ("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o",
            "p","q","r","s","t","u","v","w","x","y","z");

    # initialize the variables
    local $Word;
    local $iWordCounter;

    # Generate a string of 13 random letters
    for ($iWordCounter=0; $iWordCounter < 13; $iWordCounter++) {
        $Word .= $Letter[rand(26)];
    }
       
    return($Word);
}

Fighting Spam With Perl # Copyright 1999, Emmie P. Lewis http://www.siteware.ch/webresources/useragents/collectors/

The following email addresses are not copyrighted and are free to all email robots that steal addresses: email@emailrobotsetc.com etc. etc. etc.

email@billsrobotsetc.com email@billsrobotsetc.net email2@billsrobotsetc.com email3@billsrobotsetc.com email4@billsrobotsetc.com

emaila@billsrobotsetc.com emaila@billsrobotsetc.net emle@bollsribotsetc.com emoil@bullsrebotsetc.net eoil@ullsrebtsetc.net  emoil@bullsrebotsetc.com

eoil@bullsrebotsetc.net emoil@bullsrebsetc.net eoil@pullsrebotsetc.net emoil@bullsrelpsetc.net emoil@blsrebotsetc.com emoil@bullsrebotsetc.au

bill@ballsrebotsetc.net pat@bull123rebotetc.net mal@myrebotetc.net tue@myrebotetc.net paul@myrebotetc.com mal@myrebotetc.kl mal@myrebotetc.pl

potts@mymaicebotetc.com potts@mymaicebotetc.net potts@mymaicebotetc.gh potts@mymaicebotetc.lo potts@mymaicebotetc.co potts@mymaicebotetc.om

potts@mymaicebotetc.cm potts@mymaicebotetc.cmm potts@mym321ebotetc.com pits@mymaicebotetc.com putts@mymaicebotetc.com

potts@mymaicebotetc.qe potts@mymaicebotetc.pl  lotts@mymaicebotetc.net potts@mymaicebotetc.lk potts@mymaicebotetc.rt potts@mymaicebotetc.lc

potts@mymaicebotetc.as adam@bigtop123ax.com adam@bigtop123tap.com dam@bigtop123tap.com dam@bigtop100tap.com dam@bigtop123tap.net

sam@big957toptap.com sam@big957toptap.co sam@big957toptap.cm sam@big957toptap.cl sam@big957toptap.lm sam@big957toptap.bn

enid@big957toptap.com enid@big957toptap.net enid@big957toptap.di enid@big957toptap.au enid@big957toptap.ca enid@big957tap.ca

nid@big957tap.la  nid@big95xtap.ca nid@big9jktap.ca nid@big9df7tap.ca nid@big95ddtap.ca nid@big9ght7tap.ca nid@tat957topap.ca

samid@tat957topap.ca sanid@tat957topap.ca sanid@tot957topap.ca sanid@tat453topap.ca sanid@tattopap.ca sanid@tatdg57topap.ca

sanid@tat231pap.ca said@patpolpap.uk said@patpolpap.net said@patpolpap.usa  aid@patpolpap.bg stay@patpolpap.uk said@patpolpap.nu

said@patpolpip.nu sid@patpolpip.nu tom@patpolpip.nu sara@patpolpip.nu das@patpolpip.nu said@padfolpip.nu said@patpolpip.ni said@patpolpip.no

said@papolpip12.nu said@patpolpip.un said@patwpolpip.nu sid@patpolpip.nu mad@robatpolpip.nu me@patpolpip.nu me@patpolpip.un me@patpolpip.ni

itsme@patpolpip.nu me@2ppolpip.nu wasme@patpolpip.nu notme@9patpolpip.nu me@patpolpop.nu me@patpolpip.ni me@itspolpip.lo me@tobadpolpip.nu



Courtesy: anti spam dot limited