Mail Filter Best Practices
Goals
- deliver every ham to the inbox
- without delay
- prevent delivery of every spam
- during the SMTP exchange
- minimize consumption of our resources by malware
- maximize consumption of malware resources
The first goal can be accomplished by simply delivering everything. Preventing spam delivery is a difficult task that is often constrained by the first goal. Rejecting delivery is preferred over silent discards as preventing delivery blunts the incentives to spam.
Spammer
Assets
- bot nets provide a vast supply of bandwidth, CPU, and IP addresses
- cheap domains (is tasting still permitted?)
- custom software optimized for fast delivery
Liabilities
- poorly written software
- PCs are powered on and off at various times of the day
- bot nets are primarily Windows PCs
- no control of rDNS
History
There was a time when spam filtering consisted of installing SpamAssassin and writing a few procmail rules. Ham was routed to the inbox, spam to the bit bucket, and the suspect messages to the spam folder. For quite a long time SA was sufficient to provide users with a clean enough inbox.
A weakness of early filtering was that MTAs didn't filter messages until after the SMTP connection was complete. By the time SA determined the message was rubbish, the spammer had already successfully delivered the message (the basis for charging his customers), obligating the recipient to deliver or bounce it. Bouncing it had the nasty habit of creating backscatter. Silent discards are fraught with technical, social, and often legal implications. To avoid those pitfalls, filtering needed to happen before the message was accepted.
Some MTAs evolved interfaces (milter, QMAILQUEUE, etc.) that enabled filtering during the SMTP conversation. Mail operators without those abilities front-ended their MTAs with software filters like AMAVISD or ASSP. Others deployed commercial hardware products like the Barracuda. Dozens of filtering techniques have been developed, with varying degrees of efficacy. We are going to cover most of them.
Most Effective Anti-Spam Weapons
Identifying bot/malware vs. legitimate mail servers
Content Analysis
- Bayesian
- URIBL
SMTP Phases
connect
remote IP
- IPs in dialup pools, and to a lesser extent DSL and cable, are ephemeral. Tracking abusive IPs for more than a few days rapidly becomes less beneficial and more likely to generate False Positives. This is why many DNSBLs automatically expire listings after 5-20 days.
- connection history. Legitimate mail servers rarely change their IPs, and their DNS information rarely changes. If remote IP history is stored for more than about 30 days, a very significant majority of servers that send ham to your users will have a stored history. Once a sender history database is populated, connections from IPs without a history are highly likely to be spam.
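The connection-history idea above can be sketched as a small lookup table keyed by remote IP. This is only an illustration under stated assumptions: the `SenderHistory` class, its method names, and the 30-day retention default are hypothetical and chosen for the example, and a real deployment would persist the data rather than keep it in memory.

```python
from datetime import datetime, timedelta


class SenderHistory:
    """In-memory record of when each remote IP last delivered ham (hypothetical helper)."""

    def __init__(self, retention_days=30):
        # Assumption: 30 days loosely follows the "about 30 days" figure in the text.
        self.retention = timedelta(days=retention_days)
        self.last_ham = {}  # remote IP -> datetime of the most recent ham

    def record_ham(self, ip, when):
        """Remember that `ip` delivered a message classified as ham at `when`."""
        self.last_ham[ip] = when

    def is_known(self, ip, now):
        """True if `ip` delivered ham within the retention window."""
        seen = self.last_ham.get(ip)
        return seen is not None and now - seen <= self.retention

    def expire(self, now):
        """Drop entries older than the retention window."""
        cutoff = now - self.retention
        self.last_ham = {ip: t for ip, t in self.last_ham.items() if t >= cutoff}
```

A filter could treat `is_known()` returning False as one signal among several rather than as grounds for rejection on its own, since every legitimate sender is unknown the first time it connects.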
remote OS (p0f)
- Bot nets are composed primarily of machines running older versions of Windows. Mail from a Windows host that isn't part of hotmail.com has 98% odds of being spam.
geographic location
- 9_% of ham travels less than 4,000 km. _0% of spam travels more than 4,000 km.
DNSBL listing(s)
- Efficacy ranges from 45-90%, with False Positive rates from 1-30%. Using more than a couple tends to amplify the FP rate.
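For readers unfamiliar with how a DNSBL is consulted, the usual convention is to reverse the IPv4 octets, append the list's zone, and look for an A record. The sketch below assumes that convention; the function names and the `dnsbl.example` zone are placeholders, not a real list.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone):
    """Return True if the zone answers with an A record for `ip`.

    Any resolution failure (including NXDOMAIN) is treated as "not listed",
    so a DNS outage fails open instead of rejecting mail.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        return False


# Example: the name queried for 192.0.2.99 in the placeholder zone.
print(dnsbl_query_name("192.0.2.99", "dnsbl.example"))
```

Keeping the name construction separate from the lookup makes it easy to query a small, deliberately chosen set of zones, in line with the caution above about using more than a couple of lists.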
AS number (network neighborhood)
- Well managed networks have very little abuse and get it under control quickly.
- Abusive machines tend to be clustered on networks that tolerate abuse.
- early talker
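An early talker is a client that sends SMTP commands before the server has issued its 220 greeting, which a well-behaved client waits for. One way to detect this, sketched here with a hypothetical `is_early_talker` helper and an arbitrary delay, is to pause briefly after accepting the connection and check whether any bytes have already arrived.

```python
import select
import socket


def is_early_talker(conn, delay_seconds):
    """Return True if the client sends data before the server greeting.

    Call this right after accept() and before writing the 220 banner.
    Note that a client that disconnects during the delay also makes the
    socket readable, so it is reported as an early talker too.
    """
    readable, _, _ = select.select([conn], [], [], delay_seconds)
    return bool(readable)


# Demonstration with a local socket pair standing in for an SMTP connection.
server_side, client_side = socket.socketpair()
client_side.sendall(b"HELO spammer.example\r\n")  # sent before any greeting
print(is_early_talker(server_side, 0.5))
server_side.close()
client_side.close()
```

The delay is a trade-off: a longer pause catches more impatient senders but slows every legitimate connection by the same amount.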
HELO / EHLO
- hostname
- valid?
- match rDNS?
- TLS
- auth
- relay
- SPF
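The first two HELO questions above (is the hostname valid, and does it match rDNS) can be approximated with purely syntactic checks. The following is a rough sketch, not a full RFC 5321 parser: both function names are hypothetical, only IPv4 address literals are handled, and the PTR name is assumed to have been looked up elsewhere.

```python
import ipaddress
import re

# One DNS label: letters, digits, hyphens; no leading or trailing hyphen.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def _is_ipv4(text):
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:
        return False


def helo_is_plausible(name):
    """Accept a fully qualified hostname or a bracketed IPv4 literal."""
    if name.startswith("[") and name.endswith("]"):
        return _is_ipv4(name[1:-1])
    if _is_ipv4(name):
        return False  # a bare IP without brackets is not a valid HELO argument
    labels = name.rstrip(".").split(".")
    return len(labels) >= 2 and all(_LABEL.fullmatch(label) for label in labels)


def helo_matches_rdns(helo_name, rdns_name):
    """Compare the HELO name with the PTR name, ignoring case and a trailing dot."""
    return helo_name.rstrip(".").lower() == rdns_name.rstrip(".").lower()
```

A mismatch between HELO and rDNS is common among legitimate servers too, so it is probably better used as a scoring input than as a hard rejection rule.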
MAIL FROM
- black & white lists
- validity of domain in from address
- SPF
RCPT TO
- black & white lists
- local user existence / deliverability
DATA
- headers
- duplicated singular headers?
- missing any required headers?
- is Return-Path valid?
- is Date reasonable?
- is User-Agent detected?
- is Mailing List detected?
- are there enough Received headers?
- does the From header match the envelope FROM?
- valid bounce?
- content
- bayesian
- virus
- spam URLs
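Several of the header checks listed under DATA are mechanical enough to sketch with Python's standard `email` package. The example below covers only duplicated singular headers, missing required headers, and the Received count; the function names are hypothetical, and the field lists follow the occurrence limits in RFC 5322.

```python
from email import message_from_string

# Header fields that RFC 5322 allows at most once per message.
SINGULAR = ("Date", "From", "Sender", "Reply-To", "To", "Cc", "Bcc",
            "Message-ID", "In-Reply-To", "References", "Subject")
# Header fields that RFC 5322 requires in every message.
REQUIRED = ("Date", "From")


def header_problems(raw_message):
    """Return a list of findings for duplicated or missing headers."""
    msg = message_from_string(raw_message)
    problems = []
    for name in SINGULAR:
        count = len(msg.get_all(name, []))
        if count > 1:
            problems.append(f"duplicate {name} ({count})")
    for name in REQUIRED:
        if msg.get(name) is None:
            problems.append(f"missing {name}")
    return problems


def received_count(raw_message):
    """Number of Received headers, a rough proxy for hops travelled."""
    return len(message_from_string(raw_message).get_all("Received", []))
```

Returning a list of findings rather than a single verdict lets the caller weight each problem separately when combining it with the content checks.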