The Solution to Spam Pollution

Posted 15 July 2002, 3am / revised 1pm 29 October 2008

Here's a pretty old legacy post from the blog archives of Geekery Today; it was written about 24 years ago, in 2002, on the World Wide Web.

A few things have recently come together for me. First, Andrew Leonard recently penned an interesting column on spam-blocking technology for Salon; then Jennifer Lee wrote another interesting article for The New York Times. Finally, I made use of a brief free trial of McAfee’s SpamKiller software. I’ve also just been doing a lot of thinking lately about what needs to be done to seriously address the rising tide of spam that is flooding most everyone’s inbox. Spam e-mail has been getting worse over the past several years, and it’s been getting worse at an accelerating pace. If we don’t want Internet communications to become simply worthless from being drowned by spam e-mail, then we have to rethink our basic model for e-mail so that spammers can no longer take advantage of the system’s architecture to overwhelm legitimate messages with their crap. Lee’s article shows a good grasp of the problem and of why anti-spam legislation won’t do much to solve it. Leonard’s shows a good grasp of the overall technological shift needed to address the problem, but he doesn’t push nearly far enough toward the kind of framework that needs to be built.

Leonard’s article describes the development of SpamAssassin, an open source spam blocker being adopted and improved by many system administrators. Leonard points out that the collaborative effort of legions of dedicated spam-fighters can greatly improve the software’s ability to identify spam messages. As Leonard puts it, “The only way to stem the flood of unwanted e-mail may be to harness a million eyeballs and an army of open-source hackers.” There’s an intuitive reason why this should be the case. Obviously, by harnessing the efforts of thousands of administrators who ferociously hate spam, the project gets a big boost in productive energy. But that’s not all.

The basic problem is this: under the present e-mail architecture, the spam market works. It works phenomenally well, and especially well for the seedier side of online industries, in particular pornography and sex-related products, which can’t advertise through conventional media (other than other porn outlets) and don’t have any financial interest in maintaining a reputation as a friendly corporate citizen. The reasons are inherent features of the e-mail architecture:

It costs nearly nothing to send spam: once you have an Internet connection set up (which you’ll need for your product’s website, anyway), it costs virtually nothing to send out scads and scads of spam e-mail. Labor costs can be reduced to nil by feeding addresses from a web crawler into an automated spamming program. This is a fundamental reversal from direct mail and telemarketing, where a fixed cost for contacting a person is borne by the advertiser.
Lots of people see it: If you send out a spam message to a huge group of people, then most of the people you send it to will see it. In part, this is because e-mail is a durable medium, like direct mail or fax, and unlike the telephone, so if you send a message while the user is away, they still get it. It’s also due to the relatively primitive state of message sorting and spam filtering: users have very little control over the order and priority with which messages appear in their inboxes, so to get to the messages they want, they generally have to wade through, or at least scan over, any spam that they get.
It’s hard to track offenders. Many comparisons have been drawn between spam e-mail and the junk faxes whose rising costs spurred a federal law against them in 1991. The two are alike in that advertisers get a basically free contact, while victims are stuck with the primary costs (in paper, bandwidth, time, what have you) of the interaction. However, there is a crucial difference: junk faxes can easily be traced to their perpetrator through phone company records, so offenders can be blocked and identified for legal action. Spam e-mails, on the other hand, are generally very difficult to trace to their originators. Headers can easily be forged; open relays can be found and exploited; one-time-only addresses can be created with free services; and work can be farmed out to mule computer users, who are paid a small amount to send out a huge volume of messages and then take the fall if they get caught. The anonymity of e-mail and its reliance on the honor system for identifying senders make spam very difficult to flag and filter.

When we look at all these factors, we begin to see that we need a comprehensive solution which will work to address these structural holes. We cannot rely on anti-spam legislation, since spammers will merely relocate to different states or different countries, and use the anonymity of the communication to further shield themselves. Spam is only going to get worse until we have mass deployment of an easy-to-learn, easy-to-use, agile framework which harnesses both human intelligence and high-quality, flexible technological solutions to make legitimate e-mail easier to access while identifying and dealing with spam.

Unfortunately, most anti-spam solutions fail, because they are focused narrow-mindedly on a single goal: accumulating as many heuristic rules as possible to identify and kill spam (this is reflected in the names: McAfee’s SpamKiller, SpamAssassin, and so on). The most common and most maddening manifestation of this is scorched-earth spam programs such as SpamKiller, which works entirely by accumulating thousands and thousands of rules to try to identify common patterns in the way that spam messages are written or addressed. These do indeed catch a lot of spam, but they also slam perfectly legitimate e-mail. For example, my decision to uninstall SpamKiller was finalized when I saw that it was trashing legitimate e-mails because one filter (one of thousands, which took lots of scrolling to find) killed any message containing the word “rape.” Now, look, folks, I’m pretty much physically nauseated by some of the spam ads I’ve received for rape-fetish pornography sites. But I’m an anti-rape activist, and I receive tons of perfectly legitimate e-mail with the word “rape” in it. SpamKiller’s approach to spam is like trying to kill a swarm of mosquitoes with a cluster bomb, and plenty of perfectly innocent messages were getting clobbered.
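To make the contrast concrete, here is a minimal Python sketch of a single-keyword kill rule next to a score-based filter, in which each rule contributes only a weight and a message counts as spam only when the total crosses a threshold. The rules, weights, and threshold are all hypothetical, not SpamKiller’s or SpamAssassin’s actual rule sets (though SpamAssassin’s own default threshold is also 5.0).

```python
import re

# Hypothetical single-keyword "kill" rule: any match trashes the message.
def keyword_kill(body: str, banned: str = "rape") -> bool:
    """Return True if the message would be deleted outright."""
    return re.search(rf"\b{re.escape(banned)}\b", body, re.IGNORECASE) is not None

# Hypothetical weighted rules: each pattern is weak evidence on its own.
RULES = {
    r"\brape\b": 1.0,
    r"\bclick here\b": 1.5,
    r"\bfree\b.*\bxxx\b": 3.0,
    r"\bunsubscribe\b": 0.5,
}
THRESHOLD = 5.0  # illustrative cutoff

def spam_score(body: str) -> float:
    """Sum the weights of every rule that matches the message body."""
    return sum(weight for pattern, weight in RULES.items()
               if re.search(pattern, body, re.IGNORECASE | re.DOTALL))

def is_spam(body: str) -> bool:
    return spam_score(body) >= THRESHOLD

if __name__ == "__main__":
    legit = "Agenda for Tuesday's rape crisis center volunteer meeting."
    ad = "FREE XXX videos! Click here now. Rape fantasy pics. To unsubscribe, reply STOP."
    print(keyword_kill(legit), is_spam(legit))  # True False
    print(is_spam(ad))                          # True
```

With these made-up weights, the meeting notice scores 1.0 and stays in the inbox even though the keyword rule would trash it, while the ad trips several rules at once and scores 6.0.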

The problem here is that most people who work on spam-blocking software and most of those who purchase it are basically in the frame of mind of trying to get rid of a source of long-term and maddening irritation. Programs tend to be reactively focused on axing spam by any means necessary, rather than proactively focused on improving the e-mail user’s experience. But if we keep our minds on what users need and want, rather than what gives us the temporary satisfaction of the kill, then we should begin to see a bit more clearly what needs to be done.

To reduce the effectiveness of spam, first, spam management software needs to be widespread, usable, and respectful of users’ legitimate e-mail. With millions of users employing software that lets them take control of their own inboxes, users will be able to stay on top of their legitimate e-mail and sidestep the spam. Information for identifying spam should come from automated reports that millions of users submit: when a spam slips through, the recipient presses one button in the mail client and it is registered as a spam message so that no-one else receives it (SpamAssassin uses Vipul’s Razor, a system which does just this, but it needs to be integrated into easy-to-use clients, not just arcane Unix mail filters).
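As a rough illustration of the one-button reporting idea, here is a minimal Python sketch. It assumes a hypothetical shared registry of message fingerprints (an in-memory set standing in for a network service) and uses an exact SHA-256 hash of lightly normalized text. Vipul’s Razor uses its own, more robust signature schemes; this toy makes no attempt to reproduce them.

```python
import hashlib
import re

def fingerprint(body: str) -> str:
    """Hash the body after normalizing case and whitespace, so trivially
    reformatted copies of the same spam produce the same fingerprint."""
    normalized = re.sub(r"\s+", " ", body.strip().lower())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

class SpamRegistry:
    """Hypothetical shared registry; a real one would be a network service."""

    def __init__(self) -> None:
        self._reported: set[str] = set()

    def report(self, body: str) -> None:
        # Called when a user presses the "this is spam" button.
        self._reported.add(fingerprint(body))

    def is_known_spam(self, body: str) -> bool:
        # Checked by every other user's client before showing the message.
        return fingerprint(body) in self._reported

if __name__ == "__main__":
    registry = SpamRegistry()
    registry.report("Buy   CHEAP pills now!")                # first victim reports it
    print(registry.is_known_spam("buy cheap pills NOW!"))   # True
    print(registry.is_known_spam("Lunch on Friday?"))       # False
```

An exact hash is easy to defeat by inserting random junk into each copy, which is why real collaborative systems rely on fuzzier signatures.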

Second, we need to plug the anonymity hole through the use of double-key (public-key) authentication and encryption of e-mail. E-mail clients could prioritize messages which can be verified as coming from a valid address, and also messages which are encrypted for the recipient’s eyes only. Spammers who want their messages seen would have to acquire a public key for every intended recipient and encrypt a separate copy of the message for each one. For millions of e-mail addresses, that’s an awful lot of extra processor time, network bandwidth, and human labor that the spammer has to pay for. Furthermore, the spammer’s PGP signing key or keys can be blacklisted as quickly as the spams start going out.

Finally, system administrators at big ISPs need to get responsible. One of the biggest conduits for spam is open relays: poorly configured mail servers which allow anyone on the Internet to send e-mail through them, often while forging headers to pose as a machine on the server’s network. System administrators need to get serious about ensuring that relay connections are only accepted from authenticated users or legitimate machines on the ISP’s own subnet. And when spam is being sent by a user, they need to be quick about axing that user’s account.
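A minimal sketch of that relay policy, in Python with the standard `ipaddress` module. The network range and local domain are hypothetical placeholders (192.0.2.0/24 is reserved for documentation), and a real mail server would express this in its own configuration rather than in code. The point is that the decision rests on the connecting IP address and on whether the client authenticated, not on anything the client claims in its headers.

```python
import ipaddress

# Hypothetical ISP settings.
LOCAL_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]
LOCAL_DOMAINS = {"example.net"}

def may_relay(client_ip: str, authenticated: bool, recipient: str) -> bool:
    """Decide whether to accept a message for `recipient` from this connection."""
    domain = recipient.rsplit("@", 1)[-1].lower()
    if domain in LOCAL_DOMAINS:
        return True   # inbound mail for our own users is always accepted
    if authenticated:
        return True   # our own customers, after logging in
    # Otherwise relay only for machines on our own subnet; header contents
    # such as a forged From: or Received: line are never consulted.
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in LOCAL_NETWORKS)

if __name__ == "__main__":
    print(may_relay("203.0.113.7", False, "victim@elsewhere.org"))  # False: open-relay attempt
    print(may_relay("192.0.2.15", False, "friend@elsewhere.org"))   # True: on our subnet
```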

What you can do now:

You can do some things now, both short-term and long-term, to keep yourself from being overwhelmed and work towards an Internet not being drowned in spam.

Use shield accounts for online commerce. A lot of high-end spamhouses harvest addresses by buying them from merchants such as Amazon.com. For online interactions which won’t be anything other than perfunctory receipts, it’s good to maintain a shield account (say, diespammersdie@hotmail.com or somesuch) as the address through which you interact with online stores.
Download and use PGP. You can get PGP, a great security program which will let you securely sign messages (so that the recipient can verify your identity) and/or encrypt messages (so that only the recipient can read them). The Windows version of PGP automates the process of creating and using PGP keys, and has plugins for popular Windows e-mail clients which let you use simple pushbuttons for its functions. PGP will make your e-mail more secure, and also help build an Internet environment where spammers can no longer hide behind forged headers to conceal their identities.
Look for solid anti-spam software that suits you. If you can find spam management software which suits your needs, grab it! If you’re willing to geek around a lot, SpamAssassin looks very good. Better yet, Deersoft is in the process of developing SpamAssassin Pro, a commercial product for Windows based on the SpamAssassin engine and integrated with your mail client. Unfortunately, most spam management software I’ve tried (e.g., SpamKiller) is crap.
More tips: Jennifer Lee’s article is accompanied by some tips for avoiding spam, some of which I agree with, and others of which I don’t. Unfortunately, the present spam-heavy environment is encouraging a lot of people to take up measures which cut down spam at the expense of breaking the human usability of the e-mail system. Lee suggests using complex e-mail addresses, which do thwart spammers who run dictionary searches against mail services, but which also make it hard for your friends to remember your e-mail address. She also suggests removing your e-mail address from any online directories in which it may be included, which will again thwart spammers but also keep people from being able to reach you. I totally disagree with this approach to avoiding spam. Again, it amounts to protecting your inbox at the cost of shredding real people’s ability to contact you. Nevertheless, some of her suggestions (such as disposable forwarding accounts for use on Usenet and bulletin boards) are solid.

4 replies to The Solution to Spam Pollution

Martin Striz, 11pm, 16 Jul 2002

Apropos of your problem with filters that eliminated all messages with the word “rape”: I encountered the same problem with Hotmail’s filtering feature when it constantly redirected messages from dictionary.com Word of the Day, a biotechnology listserv, and the Vladimir Nabokov listserv, all of which I voluntarily signed up for. I imagine a lot of listservs, because they are mass mailings, are misidentified as spam. I haven’t tried to use Yahoo’s filter, mainly for that reason.

Marc, 7pm, 1 Aug 2002

Hi Charles. I think that keeping your email address off of any web pages actually is a good idea, but I understand your concern about that making it more difficult for people to legitimately contact you. There are 3 solutions that I know about: either display your email address as an image (immune to crawlers), use a contact form that hides the email address, or write out your email address in a weird way (like “cwj2 upon eskimo com”).


S.R. Prozak, 4pm, 3 Aug 2002

Heh. Anus.com is filtered by many sites whose clever sysadmins assume we “must be” a porn site, even though there is minimal if any nudity on the site.

I don’t find your thinking to be very coherent; in one message, you lament the DMCA; in another, you suggest all email be authenticated.

My view is that anonymity – of whatever sort afforded – is going to be a necessary tool for less-limited speech on the internet in the future.

Oh, and leftism is secular Christianity. Make your own Hell, idiots.

Charles W. Johnson, 5pm, 6 Aug 2002

I am not sure whether Mr. Prozak is aware of how e-mail authentication with tools such as PGP / GPG works.

PGP and GPG are based on double-key (public-key) encryption. The program uses a mathematical algorithm (RSA, for example) to generate a pair of cryptographic keys: a “public” key which is published to the world and a “private” key which is kept secret like an ordinary password. The private key cannot feasibly be computed from the public key, and messages encrypted with one key can only be decrypted with the other.

In addition to ordinary cryptography, this also creates an extremely secure system for authenticating the sender of a message. If you sign a message with your private key (in practice, the program signs a short digest of the message rather than the whole text), then anyone can use your public key to check the signature. And since your private key is known only to you, the recipient can verify that you sent the message.
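The relationship between the two keys can be shown with a toy version of textbook RSA in Python. This is an illustration under heavy assumptions: the primes are absurdly small, there is no padding, and nothing here resembles the real OpenPGP packet format. It does show one key pair working in both directions: encrypting with the public key so only the private key can decrypt, and signing a digest with the private key so that anyone holding the public key can check it.

```python
import hashlib

# Toy textbook-RSA key pair with tiny primes (insecure; illustration only).
P, Q = 61, 53
N = P * Q                # public modulus, 3233
PHI = (P - 1) * (Q - 1)  # 3120
E = 17                   # public exponent
D = pow(E, -1, PHI)      # private exponent via modular inverse (Python 3.8+)

def digest(message: bytes) -> int:
    """SHA-256 digest of the message, reduced into the toy modulus range."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    # Only the holder of the private exponent D can produce this value.
    return pow(digest(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    # Anyone with the public key (N, E) can check the signature.
    return pow(signature, E, N) == digest(message)

def encrypt(m: int) -> int:
    # Encrypt a small integer to the key owner with the public key.
    return pow(m, E, N)

def decrypt(c: int) -> int:
    # Only the private key recovers the plaintext.
    return pow(c, D, N)

if __name__ == "__main__":
    msg = b"Meet at noon."
    sig = sign(msg)
    print(verify(msg, sig))            # True
    print(verify(msg, (sig + 1) % N))  # False: any other signature value fails
    print(decrypt(encrypt(65)))        # 65
```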

The important thing to see here is that digital signatures of this sort plug up the “anonymity” hole by giving you a way to verify that the sender of a message is someone you know, as opposed to the From: header, which virtually all spammers (and now some virus authors) forge to conceal their identities. It basically gives you a way to show that several web resources (a series of e-mails, a web page with the public key, a key server entry, etc.) all belong to the same person. But the key ID itself is just a unique sequence of digits; nothing in it inherently ties the ID back to a specific person.

So let’s say that in addition to being a computer dweeb who sits around firing off radical missives into cyberspace, I am also carrying out guerrilla media vandalism or other actions that I want to keep separate from the identity which can be traced back to my offline self. I can just sign up for an anonymous e-mail account, generate a new key pair, and use that key pair for all my communications through the anonymous e-mail account. My offline identity will never be compromised, but I will be fully cooperating with the spam management system in place. Indeed, if I felt like it I could generate a new e-mail address and a new ID for every communiqué that I send out.

Now, let’s say we have an intelligent GPG-aware spam management program. It checks every message for a signature, and if a valid signature is present, it checks the identity against a list of confirmed contacts and a list of blocked senders. If it matches with the first, the message is prioritized and exempted from any spam checks. If it matches with the second, it’s /dev/null’ed. If the identity is unknown, it’s run through spam checks with a high level of tolerance, since even if it does turn out to be spam but goes undetected, all future messages from that spammer can be easily blocked. If, on the other hand, the message is unsigned, it’s run through a much harsher spam check and the message is de-prioritized.
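The triage rules in the previous paragraph can be sketched in Python as follows. Signature checking itself is assumed to happen elsewhere (for example, by GnuPG) and is represented only by a hypothetical `signer` field holding the key ID of a valid signature; the `score_fn` parameter and both thresholds are likewise hypothetical placeholders for whatever spam scorer the client uses.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Verdict(Enum):
    PRIORITY = "priority"  # signed by a confirmed contact: top of inbox, no spam checks
    DISCARD = "discard"    # signed by a blocked key: /dev/null
    NORMAL = "normal"      # signed by an unknown key, passed the lenient check
    LOW = "low"            # unsigned, passed the harsh check, but de-prioritized
    JUNK = "junk"          # failed whichever spam check applied


@dataclass
class Message:
    body: str
    signer: Optional[str]  # key ID of a *valid* signature, or None if unsigned


def triage(
    msg: Message,
    contacts: set[str],
    blocked: set[str],
    score_fn: Callable[[str], float],
    lenient: float = 8.0,  # hypothetical threshold for signed, unknown senders
    harsh: float = 3.0,    # hypothetical threshold for unsigned mail
) -> Verdict:
    if msg.signer is not None:
        if msg.signer in contacts:
            return Verdict.PRIORITY
        if msg.signer in blocked:
            return Verdict.DISCARD
        # Unknown key: tolerant check, since a missed spammer's key can be blocked later.
        return Verdict.JUNK if score_fn(msg.body) >= lenient else Verdict.NORMAL
    # Unsigned: harsher check, and even mail that passes is ranked down.
    return Verdict.JUNK if score_fn(msg.body) >= harsh else Verdict.LOW


if __name__ == "__main__":
    def toy_score(body: str) -> float:
        return float(body.count("!"))  # toy scorer: counts exclamation marks

    contacts, blocked = {"A1B2C3D4"}, {"DEADBEEF"}
    print(triage(Message("hi", "A1B2C3D4"), contacts, blocked, toy_score))   # Verdict.PRIORITY
    print(triage(Message("hello!!!!", None), contacts, blocked, toy_score))  # Verdict.JUNK
```

Setting the lenient threshold above the harsh one encodes the asymmetry described above: a signed message from an unknown key gets more benefit of the doubt, because if it does turn out to be spam, that key simply goes on the blocked list.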

This would instantly improve the experience a lot for the end-user, since she now has a reliable way of filtering mail from her friends and important contacts up to the top of the list. As a bonus, this will also work to reduce virus clutter: a virus that mails itself out from an infected machine can’t produce a valid signature, since the private key is only stored in an encrypted form that is unlocked by the user typing in a passphrase.

And for the spammer, there would be only three basic options left:
1. Just keep on sending out spam the old way. Most of it will be trashed immediately and the rest will be de-prioritized as likely junk mail.
2. Create a PGP ID and send out all your spam messages from this ID. After the first person gets spam from your ID, however, it will be reported as a spamhouse ID and virtually nothing else you send will get through.
3. Create a unique PGP ID for every single spam message you send. This will prevent immediate blacklisting, but your e-mails will still be from unknown IDs (and therefore ranked down and subjected to spam filters), and the time, labor, and computer costs of generating new IDs for millions of messages will quickly add up.
Under any of these three options, a very low percentage of the spammer’s messages are going to get through, particularly if signature-based filtering is combined with other solid forms of spam filtering. Even with existing lame filters, spammers have had to make several expensive changes (such as hiring mules, sending out duplicate messages rather than a message addressed to thousands of people, generating random content to scramble filters, etc.) in order to make sure that their e-mails get seen. With these new measures in place, there’d be no danger to non-commercial anonymous use of the Internet, but the bottom would fall out of the spammers’ market and they would quickly begin to fold.

Rad Geek People's Daily