Steve Outing recently published a column bemoaning the sorry state of spam filtering software today. I think his article goes off in entirely the wrong direction, but it touches on some important issues that I only raised in passing in my previous post on spam management, and which I would like to take the time to expand on.
E-mail is the killer app of the Internet. Nothing, not even web surfing (and I say this with all due respect to you, gentle reader) matters a whit compared to the vast effect that e-mail has had on our everyday communications. Not only has e-mail enabled us to get in one-on-one contact with (potentially) anyone anywhere in the world for virtually free, it has also created and nurtured the listserv, where any group of people–small or large, far-flung or local–can instantly, reliably, and painlessly communicate with one another and collaborate on projects, whether this takes the form of a one-to-many newsletter, or a many-to-many discussion group. The cost is often nothing more than some time and labor to administer the list, and the fixed costs of Internet access. This medium for cheap or free communication has created vital communities all over the Internet, more or less single-handedly made the open source movement possible, and changed the face of every form of communication, from news to activism to comic strips. But as Outing points out, there is something rotten in the state of Denmark.
As he argues in his article, the wars between spammers and bad spam filters are beginning to seriously impair the usability of the medium.*
How does this happen? Well, think about the nature of a listserv, for example. When you receive a message from a listserv, it is one of many–potentially thousands or even more–copies of the same message being sent out to many users. The From: header is an address that you may have never personally contacted, or may even be a newsletter robot rather than a real person. The To: header may be the resender address rather than your address, making it appear forged. The message includes content such as “Remove me from this list,” a website to go to for more information on the list, and perhaps even routine fundraising appeals. It may also contain “trip words” such as “viagra” which are actually a part of ordinary conversation, but set off spam alerts. If you’re familiar with how spam filtering works, you’ll see that the M.O. of a listserv is well-nigh indistinguishable from the M.O. of a spammer as far as a spam filter can see. The only clear difference between spam and legitimate listserv traffic is that you have voluntarily opted in for the listserv traffic, and you can voluntarily opt out; whereas spammers don’t care what you think or what you want, and barrage you with their messages without asking you for permission and without caring whether or not you’ve told them not to. This is all the difference in the world, of course, but it’s not a difference that spam filters have any way of seeing. It happens “off-camera” because there’s no way right now for a spam filter to monitor which listservs you have signed on for, or which e-mails come from those listservs.
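To make the problem concrete, here is a minimal sketch of the kind of heuristic scoring described above. The trigger phrases, weights, threshold, and addresses are all made up for illustration; no real filter is being quoted. The point is only that a perfectly ordinary list posting trips the same rules a spam message would.

```python
from email import message_from_string

# Hypothetical trigger phrases and weights, chosen only for illustration.
TRIGGERS = {"viagra": 3, "remove me from this list": 2, "donate": 1}
SPAM_THRESHOLD = 4  # hypothetical cut-off


def naive_spam_score(raw_message: str, my_address: str) -> int:
    """Score a plain-text message with crude header and keyword rules."""
    msg = message_from_string(raw_message)
    score = 0
    # A To: header that does not name the recipient looks like bulk or forged mail.
    if my_address.lower() not in msg.get("To", "").lower():
        score += 2
    # Only single-part text bodies are inspected in this sketch.
    body = "" if msg.is_multipart() else msg.get_payload().lower()
    for phrase, weight in TRIGGERS.items():
        if phrase in body:
            score += weight
    return score


# A legitimate list posting: robot sender, list address in To:,
# a fundraising appeal, and unsubscribe instructions.
LIST_POST = (
    "From: robot@lists.example.org\n"
    "To: gardeners@lists.example.org\n"
    "Subject: [gardeners] Digest\n"
    "\n"
    "Our annual fundraiser is on: please donate.\n"
    "To leave, send 'Remove me from this list' to the robot.\n"
)

flagged = naive_spam_score(LIST_POST, "me@example.com") >= SPAM_THRESHOLD
```

Nothing in the message tells the filter that the reader asked for it, so `flagged` comes out true.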
The spam situation is growing intolerable. But spam filters that kill listservs and newsletters are unacceptable. Some of Hotmail’s more over-zealous versions of its junk filter have trashed perfectly legitimate personal e-mails and listserv e-mails, and since the user set up the Junk Mail folder specifically so she doesn’t have to look at it, she ends up not knowing that anyone wrote or that her request to join the listserv has been honored. And since she doesn’t know these messages are coming in, she has no reason to adjust her junk filter, so it goes on blocking the same legitimate mail. This is the absolute worst way to respond to spam: the whole point of spam management is supposed to be improving the user experience of reading e-mail, not breaking the e-mail system to the point of unusability.
As I said in my previous post, one of the first principles of a good spam solution is that it must be respectful of users’ legitimate e-mail. Spam blocking must never ever interfere with properly receiving legitimate e-mail. If it does somehow block a legitimate e-mail, this fact should be easily detectable, the e-mail should be easily recoverable, and measures should be taken to ensure that it doesn’t happen again. One crucial aspect of respecting legitimate e-mail is that spam management software must be aware of listservs and respectful of e-mail sent over them.
As Outing points out, for usability and responsibility’s sake, system administrators should not take control of content-based spam-screening out of the hands of users. There’s a problem in designing spam solutions that rarely gets discussed: in spam filtering, your system administrator’s interests may not be the same as yours. Filters imposed on everyone by the administrator can greatly reduce the load on mail servers, but individual filters that users apply in their clients don’t reduce the load on mail servers at all. And it’s not the admin’s personal e-mail which gets deleted or bounced in false positives. Therefore, admins tend to be comfortable with very strict spam filter rules even if they block a lot of legitimate e-mail. End-users, on the other hand, don’t care much about system load, and want to receive all the legitimate e-mail that’s sent to them. They want spam filtering that is flexible, fine-grained, and which they can adjust according to their own levels of comfort and trust.
As users, we’re just going to have to be up-front and hold our system administrators accountable on this issue. The message to them is simple: we’ll take our business elsewhere if we start losing messages because of your spam filtering. There are still plenty of milder rules that system administrators can impose that will block significant amounts of spam–such as rejecting e-mail from open relays, using Vipul’s Razor to bounce messages that can be proven to be spam, or setting up "trap" accounts to identify spam attacks in progress and temporarily block off the source of the deluge. However, content filtering based on heuristics and trigger words should only be done at the client level, not by sysadmins.
All well and good; so what can we do to deal with the situation?
Listserv software needs to get its act together. It’s absolutely maddening to try to put together e-mail filters for listservs, because there is absolutely no standardization of how to indicate that messages are being sent over a listserv. And once you get a filter set up, the list owner may change software or robot addresses along the way, at which point you have to throw out all your work and make a new filter. All listserv software should standardize on a common set of headers to indicate that e-mail has been sent out over a listserv, as well as some information about that listserv (such as a web page with more information, addresses for unsubscribe/subscribe, and so on). Any standard scheme which is expressive enough will do. Listserv authors should also develop standardized methods for alerting the list robot that you wish to subscribe to or unsubscribe from the list. With standard features in place, it will be much easier for users to write and maintain filters for dealing with listserv traffic. Intelligent spam software could then also relax its rules when it is examining list traffic, since it would know that it is dealing with a listserv.
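As a sketch of what a filter could do if list software agreed on headers, the snippet below assumes the list emits the List-* header fields described in RFC 2369 (plus List-Id from RFC 2919). That is one candidate scheme, picked here for illustration, not necessarily what any given list robot sends; the sample message and addresses are invented.

```python
from email import message_from_string

# Header names assumed for this sketch (RFC 2369 / RFC 2919 style).
LIST_HEADERS = ("List-Id", "List-Unsubscribe", "List-Subscribe", "List-Help")


def list_info(raw_message: str) -> dict:
    """Return whichever list-identifying headers the message carries."""
    msg = message_from_string(raw_message)
    return {name: msg[name] for name in LIST_HEADERS if msg[name] is not None}


def is_list_traffic(raw_message: str) -> bool:
    """Treat a message as list traffic if it carries any list header."""
    return bool(list_info(raw_message))


SAMPLE = (
    "From: someone@example.com\n"
    "To: gardeners@lists.example.org\n"
    "List-Id: <gardeners.lists.example.org>\n"
    "List-Unsubscribe: <mailto:gardeners-request@lists.example.org?subject=unsubscribe>\n"
    "\n"
    "Has anyone tried growing tomatoes indoors?\n"
)

info = list_info(SAMPLE)
```

A filter keyed on these headers keeps working when the list owner changes the robot's From: address, which is exactly the maintenance headache described above.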
Listservs should authenticate themselves. If we do adopt a standard scheme, and that scheme is used to relax spam filtering rules, then you can bet that spammers will immediately start forging listserv headers to make sure that their e-mails get through. The solution to this problem is to return to a principle I mentioned in the previous post: double-key (public-key) encryption should be used to authenticate genuine messages. Basically, each listserv gets a unique double-key ID, that is, a private key and a matching public key. When a user subscribes, she gets the public key as part of the subscription. Whenever a message is sent out over the listserv, the listserv software uses the private key to add its signature to the message as it is resent. Smart mail readers can then allow you to filter mail based on the verified ID of its resender rather than on headers that might be forged.
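The sign-and-verify flow can be sketched with nothing but the standard library. To be loud about the assumptions: this is textbook RSA with no padding, using two small well-known Mersenne primes as the key, so it is a toy that shows the shape of the idea and is not safe for real mail. A real deployment would use a vetted cryptography library and proper key sizes. The function names are hypothetical.

```python
import hashlib

# Toy key material: two known Mersenne primes. Far too small for real use.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                            # public modulus
E = 65537                            # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)


def _digest(message: bytes) -> int:
    """Hash the message and reduce it into the range of the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """What the listserv does with its private key as it resends a post."""
    return pow(_digest(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """What the subscriber's mail reader does with the list's public key."""
    return pow(signature, E, N) == _digest(message)


post = b"[gardeners] Has anyone tried growing tomatoes indoors?"
signature = sign(post)
genuine = verify(post, signature)
forged = verify(b"[gardeners] Buy cheap pills now", signature)
```

A spammer can copy the list's headers, but without the private key she cannot produce a signature that verifies against the public key the subscriber already holds.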
Listserv operations must be integrated with mail readers. So far everything has pretty much been minor reforms in the already-established interface for listservs and e-mail filtering. However, this is much more of a “quantum leap” solution. To really improve the usability of listservs and deal with spam lists, e-mail clients themselves should be aware of and work with listservs.
Here’s an example that should be obvious and trivial. The closest thing there is to a standard identifier for listserv resending (Majordomo doesn’t have it, but most other products do) is the List-Unsubscribe: header. This header gives the user a URL (usually just a mailto: link for the list robot) where the user can unsubscribe from the list. It’s really a bit puzzling that no major e-mail software that I’m aware of (drop me a comment if you know of any counter-examples!) creates a quick “Unsubscribe from this List” button when it encounters a message with the List-Unsubscribe: header. This would instantly make things easier on millions of listserv users, and prevent thousands of misdirected “Please unsubscribe me!” messages from flooding listservs.
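Here is roughly what a mail reader would need to do to back such a button: a minimal sketch that pulls the angle-bracketed URLs out of a List-Unsubscribe: header and prefers a mailto: link when there is one. The function names and the sample message are made up for illustration.

```python
import re
from email import message_from_string
from typing import List, Optional


def unsubscribe_targets(raw_message: str) -> List[str]:
    """Return every URL listed in the List-Unsubscribe: header, in order."""
    msg = message_from_string(raw_message)
    header = msg.get("List-Unsubscribe", "")
    # URLs in this header are wrapped in angle brackets and comma-separated.
    return re.findall(r"<([^>]+)>", header)


def button_target(raw_message: str) -> Optional[str]:
    """Pick the link an 'Unsubscribe from this List' button would use."""
    targets = unsubscribe_targets(raw_message)
    for url in targets:
        if url.lower().startswith("mailto:"):
            return url
    return targets[0] if targets else None


SAMPLE = (
    "From: someone@example.com\n"
    "To: gardeners@lists.example.org\n"
    "List-Unsubscribe: <http://lists.example.org/leave>, "
    "<mailto:gardeners-request@lists.example.org?subject=unsubscribe>\n"
    "\n"
    "Has anyone tried growing tomatoes indoors?\n"
)

target = button_target(SAMPLE)
```

When `button_target` returns nothing, the client simply doesn't draw the button.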
Let’s dig down a bit more. We suggested a standardized method for subscribing to any listserv, and subscription requests can be mechanically generated if they are given the address of the list that is being subscribed to. Then a smart client could feature a “Subscribe…” button where users can enter the address of any listserv and the subscription will be automagically handled by the mail reader. Do you see where this is going? That’s right: client integration would solve the “off-camera” problem once and for all. All subscriptions and unsubscriptions would be handled by automated modules in the client. The reader could track these actions and maintain an accurate list of all the mailing lists the user is on. By using double-key IDs to prevent forging, this list can be made pretty darn near airtight. Now spam filters could separate the wheat from the chaff. They could intelligently relax their rules when they are scanning confirmed listserv e-mails, to avoid false positives, and jack them back up again when scanning non-listserv e-mails, to catch more spam.
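A minimal sketch of the bookkeeping this implies, assuming the client records each list's ID at subscription time and that signature checking happens elsewhere. The class name, threshold values, and addresses are all hypothetical.

```python
STRICT_THRESHOLD = 4    # hypothetical score needed to flag ordinary mail
RELAXED_THRESHOLD = 8   # hypothetical, more forgiving, for confirmed lists


class SubscriptionRegistry:
    """Tracks the lists the user has joined through the mail reader."""

    def __init__(self):
        self._lists = {}  # list address -> ID recorded when subscribing

    def subscribe(self, list_address: str, list_id: str) -> None:
        self._lists[list_address.lower()] = list_id

    def unsubscribe(self, list_address: str) -> None:
        self._lists.pop(list_address.lower(), None)

    def is_confirmed(self, list_address: str, list_id: str) -> bool:
        """True only for a list the user joined, presenting the same ID."""
        known = self._lists.get(list_address.lower())
        return known is not None and known == list_id


def threshold_for(registry: SubscriptionRegistry,
                  list_address: str, list_id: str) -> int:
    """Relax the filter for confirmed list mail, stay strict otherwise."""
    if registry.is_confirmed(list_address, list_id):
        return RELAXED_THRESHOLD
    return STRICT_THRESHOLD


registry = SubscriptionRegistry()
registry.subscribe("gardeners@lists.example.org", "gardeners-key-1")
```

With this in place, a message that scores 5 on the heuristics is delivered if it comes from a confirmed list and junked if it doesn't, and unsubscribing puts the strict rule back in force.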
Finally, listservs must not become conduits for spam. Some spamhouses are already paying ordinary Internet users to act as “mules”: they sign up with their own e-mail address on a big listserv, confirm that the address is valid, and then send out a spam message through the listserv. I’ve seen attacks like this inundate more than one e-mail list that I’m on, not to mention the listservs which have been inundated with the Klez worm. Even a perfect scheme for identifying e-mail from listservs won’t matter, if the listservs themselves are resending spam or viruses. Listserv software authors should think about incorporating antivirus screening into their products, to prevent lists from being a dangerous vector of computer infectants. And list administrators need to use whatever options are available in their list software to prevent spam from being sent out. The obvious way is by requiring posts to be approved by a moderator who can screen for spam content. Some software (such as Yahoogroups) has improved on this by allowing listservs to be set so that only a user’s first message is screened by the moderator, and later ones are automatically approved. This way, a spammer will get blocked, since her first and only message is a spam message. However, the list moderator won’t be bothered with having to approve every message from regular list users.
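The first-message rule is simple enough to sketch. This is only an illustration of the policy as described above, with a hypothetical class and made-up addresses; it is not how Yahoogroups or any other product actually implements it.

```python
class FirstPostModerator:
    """Holds each sender's first post for review; later posts go straight out."""

    def __init__(self):
        self.approved = set()   # senders whose first post was approved
        self.pending = []       # (sender, body) pairs awaiting the moderator

    def submit(self, sender: str, body: str) -> str:
        if sender in self.approved:
            return "delivered"
        self.pending.append((sender, body))
        return "held"

    def approve(self, sender: str) -> list:
        """Moderator approves a sender; returns the held posts to send out."""
        self.approved.add(sender)
        released = [body for s, body in self.pending if s == sender]
        self.pending = [(s, b) for s, b in self.pending if s != sender]
        return released

    def reject(self, sender: str) -> None:
        """Moderator rejects a sender; her held posts are dropped."""
        self.pending = [(s, b) for s, b in self.pending if s != sender]


mod = FirstPostModerator()
first = mod.submit("regular@example.com", "Hello, list!")
released = mod.approve("regular@example.com")
second = mod.submit("regular@example.com", "Following up on my last post.")
```

The moderator sees each regular exactly once, while a mule's one and only message never gets past the queue.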
Unfortunately, we can’t much count on listserv or newsletter patrons to come up with these solutions. Most aren’t terribly tech-oriented, so we end up with columns like Outing’s, which focus on the problem rather than any solution, and which discuss short-term measures that authors can take but not long-term technological solutions to the underlying problem. And we can’t count on your average listserv author to come up with them either. The only people really concerned with ease-of-use in the listserv software world are big corporate entities like Yahoo! and Topica, who make their products easy for end-users but resist big technological jumps. And the only listserv software authors who make those jumps are clever Unix geeks who consider “arcane” to be a compliment when applied to software (take Majordomo–please!).
I love listservs and I pray that spam will not render them useless. I hope that some of us out there who still believe in usability, but who don’t have much to lose from big experiments, can start creatively working out and implementing a framework something like this one, in order to prevent listservs from dying the death of a thousand spams.