Spam prevention

rxmd · Jul 12, 2007

Hi,

I was wondering if maybe we should think of better means of spam prevention on this forum. I don't want to sound alarmist, but forum spam here appears to be slowly on the rise and I'd like to think about it early on.

Spam posting on forums is largely automated by now; spammers have software that posts their spam to thousands of forums that run the same forum software. (We've even had some spammers here who used the unregistered version of one program, which included a message saying "Pay up to remove this message".) On the other hand, it means that most spammers aren't interested in the individual site at all and let their spamming software do the work, so they tend not to do things that their software won't do for them.

For spamming, the spammer automatically generates user accounts that are then used once and discarded. User account names used to be created randomly, resulting in names like "dj387vgfo9"; now they tend to use dictionaries, giving us names like "Kunilingus" who posted today's medication spam, and sometimes the dictionary takes the nature of the site into account, so that on RFF the nick will seem more photo-related and people will click on it. Account creation is largely automatic; in theory there are these little Captcha images where you have to type letters into a box, but many Captchas are weak. In particular, the Captcha used by RFF's VBulletin software is apparently under attack, meaning that RFF's Captcha mechanism might not be too effective anymore (maybe the slowly increasing volume of spam can serve as evidence). Also, there are now a number of commercial captcha-solving sweatshops where you can submit tasks (like "solve this captcha", but also "generate ten user accounts") and have them solved by humans for a little money, so this sort of defense is slowly becoming ineffective. I know it's not the best site in the world, but here's a recent Slashdot discussion on Captchas slowly being overcome. The best-known human labour service of this kind is Amazon's Mechanical Turk.

Hence I wonder if we won't need more means of spam protection on RFF in the long run. Any method of spam prevention has the danger of being disruptive to normal site use; I guess we have to be a little bit creative in finding something that doesn't interfere too much. Here are a couple of ideas (most of them probably not actually new):

A maximum time between account creation and first post, otherwise the account will be deleted. The idea is that account creation and spam posting are done at different times; first the spammer has accounts generated on 500 forums, then he posts to all of them automatically at once. If we say that any new user has to post within 24 hours or the account will be deleted, accounts generated in advance would presumably be gone by the time the spammer gets around to using them.
- Question: Is this really effective? How long have recent spam accounts lain dormant before posting the actual spam?
- Disadvantage: It makes it difficult if you just want to generate an account (for looking at attachments etc.) without posting, as it forces people to post. Some forums here, like "Rangefinder General Discussion", might get swamped with new user announcements; we could work around that by creating a "Say Hi" forum. Spammers might start to post semi-automatic "Hi" messages saying things like "Hi! I'm Sandy, 18 years old".
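To make the first idea a bit more concrete, here is a minimal Python sketch of the expiry rule. It is not VBulletin code; the `Account` record, the function names and the 24-hour constant are assumptions for illustration only, and a real forum would run something like this as a scheduled job against its user table.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Assumed grace period; 24 hours is just the figure floated above.
FIRST_POST_DEADLINE = timedelta(hours=24)


@dataclass
class Account:
    """Hypothetical account record, not the forum's real schema."""
    name: str
    created_at: datetime
    first_post_at: Optional[datetime] = None  # None until the user has posted


def is_expired(account: Account, now: datetime) -> bool:
    """True if the account has never posted and its deadline has passed."""
    if account.first_post_at is not None:
        return False
    return now - account.created_at > FIRST_POST_DEADLINE


def purge_dormant(accounts: List[Account], now: datetime) -> List[Account]:
    """Return only the accounts that survive the purge."""
    return [a for a in accounts if not is_expired(a, now)]
```

Whether this catches anything depends on the open question above, namely how long spam accounts actually lie dormant before posting.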
Getting a more efficient Captcha in place in VBulletin. See here for an overview.
- Question: I'm not sure how easy this actually is in VBulletin, and it probably takes a little bit of IT experience to change it. OTOH I guess someone in the RFF community has said experience.
- Advantage: Automated account creation becomes more difficult.
- Disadvantage: Does not defeat the sweatshop approach.
Having to solve a Captcha for your first post.
- Remark: Probably only makes sense if there is a better Captcha.
- Remark: This might be coupled with having to complete your first post within a given amount of time, say, an hour. The idea is that the spammer can't submit the Captcha somewhere else to have it solved for them. A time limit has the potential to be disruptive; I know from my own posting habits that I sometimes start a post (such as this), then drink a cup of coffee, do some other work, and get back to it an hour later. On the other hand, assuming that we have a sufficiently non-machine-solvable Captcha in place, the spammer would have to submit each one to a sweatshop individually, which costs him money and takes the financial incentive out of spamming, since spamming only pays off as long as forum posts don't cost him anything.
- Advantage: Makes it difficult and/or costly to post things automatically.
- Disadvantage: Users' first posts are more hassle. Still nothing too terrible though IMHO.
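The time-limited Captcha for a first post could look roughly like the sketch below. Everything in it is assumed for illustration: the `FirstPostCaptcha` class, the in-memory token store and the one-hour default are made up, and generating the actual Captcha image is left out entirely. It only shows the bookkeeping: each challenge can be answered once, and only within the time limit.

```python
import hmac
import secrets
import time
from typing import Dict, Optional, Tuple

CAPTCHA_TTL_SECONDS = 60 * 60  # assumed one-hour limit, as suggested above


class FirstPostCaptcha:
    """Hypothetical single-use, time-limited challenge store (in memory)."""

    def __init__(self, ttl: float = CAPTCHA_TTL_SECONDS) -> None:
        self.ttl = ttl
        # token -> (expected answer, time the challenge was issued)
        self._pending: Dict[str, Tuple[str, float]] = {}

    def issue(self, expected_answer: str, now: Optional[float] = None) -> str:
        """Register a challenge and return the token to embed in the post form."""
        token = secrets.token_hex(16)
        issued_at = time.time() if now is None else now
        self._pending[token] = (expected_answer, issued_at)
        return token

    def verify(self, token: str, answer: str, now: Optional[float] = None) -> bool:
        """Accept the answer only once and only before the time limit runs out."""
        entry = self._pending.pop(token, None)  # pop makes the token single-use
        if entry is None:
            return False
        expected, issued_at = entry
        current = time.time() if now is None else now
        if current - issued_at > self.ttl:
            return False
        # Case-insensitive comparison; compare_digest avoids timing leaks.
        return hmac.compare_digest(
            expected.lower().encode("utf-8"),
            answer.strip().lower().encode("utf-8"),
        )
```

The `now` parameters exist only so the time limit can be tried out without waiting an hour. A shorter limit makes farming out the Captcha harder but also hits people who take a coffee break mid-post.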
Having to submit your first post for review.
- Advantage: Spamming in the present model becomes impossible.
- Disadvantage: Working through the review queue is too much work for the existing moderators. We can take the work off their shoulders by setting up a review group of 100 or 200 users, or by having everyone with more than 100 posts on RFF eligible for reviewing. Somewhere in the user box which is displayed on every page (where you currently see how many private messages you have) you would get a message saying "2 first user posts currently requesting review", and people could regularly look at that and either accept new posts or throw them in the bin or keep them.
- Remark: If we think this through to the end, in the long run spammers might start to post more or less automated harmless first posts ("Hi! I'm Sandy, 18 years old and from the San Francisco Bay area, and interested in photography. Can you point me to good locations where I can meet other photographers?"), and start spamming with the second post. As this still requires significant manual input, which is expensive, it would be a couple of years down the road before this started, I hope.
- Remark: I am not sure how much this is warranted yet by the volume of spam; maybe some moderator with access to site statistics could have a look at the ratio of first posts containing messages later reported as spam to first posts containing meaningful content.
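For the review idea, here is a rough Python sketch of how first posts could be held back and handed to a reviewer group. The `User` and `ReviewQueue` classes are hypothetical and have nothing to do with how VBulletin stores posts; the 100-post threshold and the notice text are taken from the suggestion above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

REVIEWER_MIN_POSTS = 100  # "more than 100 posts", as suggested above


@dataclass
class User:
    """Hypothetical user record."""
    name: str
    post_count: int = 0


@dataclass
class ReviewQueue:
    """Holds first posts until an eligible reviewer accepts or bins them."""
    pending: Dict[int, Tuple[User, str]] = field(default_factory=dict)
    published: List[str] = field(default_factory=list)
    next_id: int = 1

    def submit(self, author: User, text: str) -> Optional[int]:
        """Queue a first post and return its id; publish anything else directly."""
        if author.post_count == 0:
            post_id = self.next_id
            self.next_id += 1
            self.pending[post_id] = (author, text)
            return post_id
        self.published.append(text)
        author.post_count += 1
        return None

    def can_review(self, user: User) -> bool:
        return user.post_count > REVIEWER_MIN_POSTS

    def notice(self, user: User) -> str:
        """Text for the user box; empty for non-reviewers or an empty queue."""
        if not self.can_review(user) or not self.pending:
            return ""
        return f"{len(self.pending)} first user posts currently requesting review"

    def decide(self, reviewer: User, post_id: int, accept: bool) -> None:
        """Accept (publish) or bin a pending first post."""
        if not self.can_review(reviewer):
            raise PermissionError("reviewer needs more than 100 posts")
        author, text = self.pending.pop(post_id)
        if accept:
            self.published.append(text)
            author.post_count += 1
```

Once a first post is accepted the author's post count goes up, so later posts bypass the queue, which is exactly the loophole described in the remark about harmless first posts.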

Any other ideas?

Philipp
