
APlanForWikiSpam


Overview

It's rather simple: rewrite each edit as an email, and use existing spam tools to classify it. Bayesian filters should work very well for this application, though I can't think of any good reason why more traditional filters such as SpamAssassin wouldn't work.

Wiki as Email

Headers

Most email headers carry no real meaning in this application, but well-formed headers will help the filters work as intended.
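
As a rough illustration of the idea, a wiki edit could be wrapped in a well-formed message with Python's standard `email` package. This is only a sketch: the function name `edit_to_email`, the `wiki.invalid` domain, the `editor` mailbox, and the choice of which edit fields map to which headers are all assumptions, not part of any existing wiki code.

```python
from email.headerregistry import Address
from email.message import EmailMessage
from email.utils import formatdate, make_msgid


def edit_to_email(page, author, summary, body, timestamp=None):
    """Wrap one wiki edit in a well-formed email message (hypothetical mapping)."""
    msg = EmailMessage()
    # The editor's name becomes the display name; the mailbox itself is a placeholder.
    msg["From"] = Address(display_name=author, username="editor", domain="wiki.invalid")
    msg["To"] = Address(username="wiki-edits", domain="wiki.invalid")
    # Page name and edit summary together stand in for the subject line.
    msg["Subject"] = f"[{page}] {summary}"
    # timestamp is seconds since the epoch; None means "now".
    msg["Date"] = formatdate(timestamp, localtime=False)
    msg["Message-ID"] = make_msgid(domain="wiki.invalid")
    msg.set_content(body)
    return msg
```

Calling `edit_to_email("APlanForWikiSpam", "tyson", "initial version", text).as_string()` would then give text that can be handed to a mail filter.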

Content

The content is a little tricky. Do we simply supply the raw wiki text, or do we render it into HTML? Which content do we include: everything, or just the diff? Initially, I think the diff text in raw form should be enough. Rendering into HTML is probably a good idea at a later date.
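
One possible way to get the raw diff text is Python's standard `difflib`. The sketch below keeps only the lines an edit adds or changes, on the assumption that spam arrives as new text; whether removed lines should also be passed to the filter is left open. The function name `added_lines` is made up for illustration.

```python
import difflib


def added_lines(old_text, new_text):
    """Return the raw wiki lines that an edit added or changed."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    # ndiff prefixes new lines with "+ ", removed lines with "- ",
    # unchanged lines with "  " and hint lines with "? ".
    return [line[2:] for line in diff if line.startswith("+ ")]
```

Joining the result with newlines gives a message body containing just the diff.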

Filtering

Bayesian Filtering

The usual benefits of Bayesian filtering over other methods should apply just as well on a wiki as in email. As with any Bayesian filter, the system needs to be trained, so the training interface will probably be the most cumbersome component of our anti-wiki-spam coding.
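
To make the training step concrete, here is a toy naive Bayes classifier with add-one smoothing. It is a simplified stand-in rather than the algorithm of any particular spam filter, and the class name `TinyBayes`, its methods, and the tokenizer are all hypothetical.

```python
import math
import re
from collections import Counter


class TinyBayes:
    """Toy two-class naive Bayes filter with add-one (Laplace) smoothing."""

    def __init__(self):
        self.tokens = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        """Record one edit that a human marked as 'spam' or 'ham'."""
        self.tokens[label].update(self.tokenize(text))
        self.docs[label] += 1

    def spam_probability(self, text):
        """Estimate the probability that the text is spam, between 0 and 1."""
        vocab = len(set(self.tokens["spam"]) | set(self.tokens["ham"]))
        total_docs = sum(self.docs.values())
        words = self.tokenize(text)
        log_score = {}
        for label in ("spam", "ham"):
            # Smoothed class prior, so an untrained filter does not divide by zero.
            score = math.log((self.docs[label] + 1) / (total_docs + 2))
            n = sum(self.tokens[label].values())
            for tok in words:
                # Smoothed per-token likelihood; unseen tokens get a small weight.
                score += math.log((self.tokens[label][tok] + 1) / (n + vocab + 1))
            log_score[label] = score
        # Turn the two log scores into a probability; clamp to avoid overflow.
        gap = max(min(log_score["ham"] - log_score["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(gap))
```

A training interface would essentially be a way for wiki users to call `train(edit_text, "spam")` or `train(edit_text, "ham")` on edits they have reviewed.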

SpamAssassin

SpamAssassin's default rules would need to be tweaked with a custom config file, as various tests (e.g. MIME_HTML_ONLY) are useless in this context.
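
A custom config file of this kind could be generated from a list of unwanted tests. The sketch relies on SpamAssassin treating a test whose score is set to 0 as disabled; the helper name `disable_tests` is made up, and which tests beyond MIME_HTML_ONLY belong in the list is left open.

```python
def disable_tests(test_names):
    """Build config text that gives each named SpamAssassin test a score of 0."""
    lines = ["# Tests that make no sense for wiki edits"]
    lines += [f"score {name} 0" for name in test_names]
    return "\n".join(lines) + "\n"
```

Writing the output of `disable_tests(["MIME_HTML_ONLY"])` to a file and pointing SpamAssassin at it would switch that test off for wiki edits.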


Version 1 (current) modified Tue, 03 Jul 2007 23:11:51 +1000 by tyson