Handling Spam
Wizardry Dragon, Eagle 101, Joe Smack
This is a proposal on a multi-wiki approach to handling spam. If you have ideas, comments, or suggestions, please discuss them on the talk page.
- 1 What is linkspam?
- 1.1 Good faith linkspam
- 1.2 Bad faith linkspam
- 2 Countering spam
On the English Wikipedia, and on others, spam is a growing problem. The purpose of this essay is to propose a few ways to fight spam at a multi-wiki, foundation level. Please feel free to contribute your own ideas to this proposal, but try to discuss them on the discussion page before adding them to the body of this text.
This is a list of most of the linkspam I see added to articles when the intentions are clearly good (and I hear about it on my talk page). These links are harder to track because a user or IP usually adds one or two and then stops, and user page warnings generally do no good because these users do not contribute to Wikipedia often.
(Note: policy needs to be specified) What follows here is the rundown. Above all though, Wikipedia is not a mere directory of links.
MySpace is a social networking site and not a reliable source, even for articles whose subject has an 'official' MySpace profile (note: MySpace itself does not regulate this term). An appropriate, yet extremely rare, use of it as a link is when encyclopedic information can be derived solely from a specific entry of a MySpace profile: picture a famous film director who made a controversial and visible racist post on their MySpace profile which was widely criticized in the media. Even in this case, it should be <ref>'d and not put into the 'External links' section. Of the hundreds of MySpace links I have removed, I could count on one hand the times I've seen this kind of situation happen.
Youtube/Google Video/Video Sites
The biggest issue here is that these sites rarely if ever enforce copyright policy, and thus cannot be linked. In the specific cases in which licensing is transparent or provided, the content of the video must be extremely pointed and uniquely encyclopedic; otherwise a 10 minute clip may be linked for a 3 second soundbite. As with MySpace, this rare case should be <ref>'d rather than put into 'External links'.
The problem is that not everyone who has an article and throws words onto a blogging site is producing a reliable source. Linking to a blog in 'External links' is also rather like linking 'Atlantic-ocean.com' from the article on Sea bass: you get a ton of posts that often have very little to do with the article at hand (like 'anyone else have a good Christmas? I did...'). As with MySpace, though perhaps slightly less rarely, a blog link is useful when encyclopedic information can be derived solely from a specific entry. And, as with MySpace and YouTube, it should be <ref>'d and not put into the 'External links' section.
Fansites, while frequently containing a wealth of information, for the most part do not contain information that corresponds specifically to the article's subject. Tertiary sources like these summarize secondary and primary sources that should be linked instead. Articles on bands and models often attract these links. Fansites also tend to be full of original research and to promote whatever they are fans of.
External images/Gallery sites/Image shack/Flickr
Apart from the obvious copyright issues (usually a blanket revert telling someone to post it to the Commons if the licensing is kosher will do), often someone thinks that dropping the link in the article will make the image pop up too. Often the best way to deal with this is to revert the user and drop them a note about uploading. Also, posting a link solely to a .jpg or .gif is considered leeching the bandwidth of the site in question and is generally a pretty jerk move. The copyright concerns are reason enough, though. Pretty pictures of porn stars also aren't usually encyclopedic. People still link to them, though.
Almost always original research. I have never seen a link to a forum that met the external link criteria. Forums just aren't very stable, they are full of side conversation, and they generally don't provide much verifiable information. If whatever appeared in the forum is worthy of note, there will be third-party, independent news about it. Use that as a cross reference, rather than linking to the forum itself.
When these are posted, they should be reverted unless the article in question is about a foreign language site or the link contains maps/diagrams that are easy to follow without the language. Adding a link in a language other than the language of the Wikipedia is not much help to anyone. It is difficult for the contributors of that Wikipedia to verify the usefulness of the link, and it is nearly useless for the readers of that Wikipedia. Unless there is a translation, it is unverifiable and inaccessible to far too many of that Wikipedia's users. These can be easy to spot on websites with .cs, .de, or whatever country-code suffix they carry.
These people carry POV, spam, COI and POINTs into the Wikipedia world. They can be easier to spot, as 10 additions of a link to the same tourism site are a lot more noticeable than 2, and 2 is never enough for them. They can be harder to spot when they use sockpuppets, but IRC bot feeds highlight the link more than the user/IP contributing it, so all is not lost. With bot feeds, these are the easiest to catch.
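The idea of keying on the link rather than on the contributor can be sketched in a few lines. This is only an illustration, assuming link additions are already available as (user, url) pairs; the function name `domains_added_repeatedly` and the threshold of 3 are hypothetical and not taken from any existing bot.

```python
from collections import defaultdict
from urllib.parse import urlparse


def domains_added_repeatedly(additions, threshold=3):
    """Group link additions by domain, ignoring who added them.

    `additions` is an iterable of (user, url) pairs. Returns a dict mapping
    each domain added at least `threshold` times to the sorted list of
    users who added it.
    """
    counts = defaultdict(int)
    users = defaultdict(set)
    for user, url in additions:
        host = (urlparse(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        if not host:
            continue  # not a parseable absolute URL
        counts[host] += 1
        users[host].add(user)
    return {d: sorted(users[d]) for d, n in counts.items() if n >= threshold}


# Hypothetical sample data: three accounts pushing the same tourism site.
additions = [
    ("Sock1", "http://www.example-tours.com/paris"),
    ("Sock2", "http://example-tours.com/rome"),
    ("192.0.2.7", "http://example-tours.com/berlin"),
    ("Regular", "http://example.org/history"),
]
flagged = domains_added_repeatedly(additions)
```

Because the grouping key is the domain, spreading the additions over several sockpuppets does not hide them; the list of users simply becomes a starting point for checking contribs pages.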
(Note: policy needs to be specified) Above all Wikipedia is not a place to promote or advertise.
These links, sometimes in another language, show up in a variety of articles. Tourism sites on location articles are common, and the spammer figures that a European tourism link belongs in every country article on the continent. Perhaps a management-for-hire link will show up on economy articles. Pay-for-services links are quick to spot, and their link text is phrased to seem attractive to the user's 'tastes'. The possibilities are endless, but the user's/IP's contribs page is a virtual rap sheet.
An article created by a new user whose name matches the title, which matches the link added, with no wikification and reads like an ad, is, you guessed it, probably an ad. Sometimes you want to give it a chance to turn encyclopedic, but usually you can tell from the get-go if it is there to sell or there to help.
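The "name matches title matches link" signal can be expressed as a rough heuristic. The sketch below is an assumption-laden illustration: the helper names are hypothetical, it only compares the first label of the host name, and a match is a reason for a human to look, not proof of an ad.

```python
import re
from urllib.parse import urlparse


def _normalize(text):
    """Lowercase and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def looks_like_self_promotion(username, title, url):
    """True when username, article title and link domain all share a name."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # Use the first label of the host: "acmewidgets" for "acmewidgets.com".
    site = _normalize(host.split(".")[0])
    user = _normalize(username)
    page = _normalize(title)
    if not site or not user or not page:
        return False
    return site in user and site in page
```

For example, `looks_like_self_promotion("AcmeWidgets", "Acme Widgets", "http://www.acmewidgets.com/")` is true, while an unrelated user adding `http://example.org/` to Sea bass is not flagged. Links hosted on a subdomain (such as a blog host) would need a smarter comparison than the first label.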
While rarer, these links usually point to an original research site and try to prove a point. Xxxx-guy-is-a-douchebag.com is the kind of thing you'd see, although obscure blogs without a moniker that point out the same original research are about as common. They will usually fight it because the authors are usually the contributors. Passion on their end doesn't give them carte blanche for a link addition to stay, however.
Someone who has made their own website wants to link it to Wikipedia. Their username is JohnFin.com and their website is JohnFin.com and they add the link to the same. These sources are rarely authorities but are sometimes OK to use if the link is not added in a POV way. They can become just as impassioned as the political linker in insisting that a link stay. Talk it out; many end up seeing their biased contribution as being POV. Often, too, if it is a verifiable authority, a more primary source has their work and that link should be used instead.
These links will point to something obscene or nonsense or nothing. They are tests or vandalism and not an EL, and should be treated as such.
There is sometimes a scattering of links that speckle articles seemingly at random to promote anything from the commercial to the personal. The users/IPs who contribute these are less likely to be checked up on (a pastrami.com link to an article on BitTorrent, wtf...musta been a mistake...*revert*...(nothing further)). IRC bot feeds make this kind of linkspam more transparent than normal, as they highlight the link more than the article or the author.
The Spamhaus Project is a multinational effort to identify and track spammers, spam gangs, and botnets and bots used to spam the internet. Spamhaus has several lists and utilities that would be extremely helpful if utilised to help fight the increasing volume of spam on wiki projects. The use of these powerful tools may just be what the wikis need to tip the scales in the efforts to fight spam.
The ROKSO List
Perhaps one of the more powerful tools Spamhaus offers is the ROKSO list. ROKSO stands for "Register of Known Spam Operations." The people and organizations in this list account for about 80% of the spam on the internet. Tracking them, and ensuring that Wikipedia is protected from them, would likely reduce the volume of spam.
From the ROKSO page:
Another effective tool in fighting both spam and vandalism has been the use of bots. On the English Wikipedia there have been very successful efforts using both AntiVandalBot and Shadowbot to fight spam. Bots like these help automate the otherwise tedious and time-consuming work of reverting spam edits made by serial spammers.
Interwiki Spam Bots
Spam is not a problem confined to any one wiki. As such, I think efforts should be made to build interwiki spam and vandalism bots to help combat this problem. Basically this would be a network of bots similar to those existing in #wikipedia-spam, but perhaps with one feed bot watching multiple wikis. From this one bot we can watch for spam and use attached bots to do the actual reverting if need be, similar to the method being used with success in #wikipedia-spam. I will do some extensive programming over this holiday to come out with a prototype that bots will only need to monitor to know when to do a revert. Eagle 101 00:54, 20 December 2006 (UTC)
- To clarify, the feed bot would be monitoring multiple wikis, preferably our largest wikis.
- Other bots then (once approved on their respective wikis) can do the actual reverts.
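As a rough illustration of the feed-bot split described above, the sketch below turns edits from several wikis into one stream of "link added" reports that separate, locally approved bots could consume. It is not the prototype mentioned in the proposal: the `Edit` type, the wiki identifiers, and the URL pattern are all assumptions, and how edits are actually fetched from each wiki (IRC feed or otherwise) is left out.

```python
import re
from dataclasses import dataclass

# Loose URL pattern for illustration only; real wikitext parsing is messier.
URL_RE = re.compile(r'https?://[^\s\]<>|}"]+')


@dataclass(frozen=True)
class Edit:
    wiki: str       # hypothetical identifier such as "enwiki"
    page: str
    user: str
    old_text: str
    new_text: str


def added_links(edit):
    """Return URLs present in the new revision but not in the old one."""
    before = set(URL_RE.findall(edit.old_text))
    after = set(URL_RE.findall(edit.new_text))
    return sorted(after - before)


def feed(edits):
    """Yield one (url, wiki, page, user) report per newly added link."""
    for edit in edits:
        for url in added_links(edit):
            yield (url, edit.wiki, edit.page, edit.user)


edits = [
    Edit("enwiki", "Paris", "Sock1", "Paris is a city.",
         "Paris is a city. [http://example-tours.com/paris Tours]"),
    Edit("dewiki", "Rom", "Sock2", "",
         "[http://example-tours.com/rome Reisen]"),
]
reports = list(feed(edits))
```

Putting the URL first in each report mirrors the point made about bot feeds: the link, not the contributor or the wiki, is what ties the edits together. Reverting stays out of the feed entirely, so each wiki can approve or decline its own revert bot.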
What to do with serial spammers
One problem happening more and more on Wikipedia is the occurrence of serial spamming. This is the spamming of one or more websites across multiple pages by a user or, many times, a malicious bot.
Many serial spammers have what in legal parlance is called an "MO" or modus operandi (method of operation): a unique pattern or signature that, like a fingerprint, can be used to identify that particular spammer. For example, they may be spamming a particular site, or sites with a constant, defining theme, such as many additions of the link exploring-xxxxx.com, where xxxxx is a constantly changing town/country depending on the article it is added to.
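A pattern like exploring-xxxxx.com is easy to express as a regular expression over the host name. The following is a minimal sketch under that assumption; the function name `matches_mo` and its `prefix`/`suffix` parameters are hypothetical, and a real MO may need a looser or entirely different test.

```python
import re
from urllib.parse import urlparse


def matches_mo(url, prefix="exploring-", suffix=".com"):
    """True if the link's host is <prefix><anything><suffix>."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    # The variable part (town/country) is limited to host-name characters.
    pattern = re.escape(prefix) + r"[a-z0-9-]+" + re.escape(suffix)
    return re.fullmatch(pattern, host) is not None
```

Both `http://www.exploring-paris.com/hotels` and `http://exploring-rome.com` match, while a link that merely mentions the pattern in its path, such as `http://example.org/exploring-paris.com`, does not, because only the host is tested.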
If you believe you have identified a serial spammer, you should immediately consider blocking them to prevent further damage to Wikipedia. You should see your local Wikipedia's blocking policy for the particulars of doing this. However, keep in mind a few things:
- Check whether it is a dynamic IP. If it is, do not block it indefinitely, as this will lead to collateral damage: the user simply moves on to another IP and another, innocent user gets stuck with the block.
- If it is a static IP, indefinitely block the user at discretion, or block them for a period of time according to your discretion and the local Blocking Policy.
- If it is a named user, indefinitely block the user, leaving the talk page unprotected so that they may be unblocked at the discretion of the administration.
The above are only suggestions on how to act in such a situation and do NOT supplant the local Blocking Policy in any way - consult it before blocking if in doubt on how to proceed.
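For tool authors, the suggestions above reduce to a small decision table. The sketch below simply restates them in code; the `Account` enum and `suggested_block` function are hypothetical names, and the local Blocking Policy overrides anything returned here.

```python
from enum import Enum


class Account(Enum):
    DYNAMIC_IP = "dynamic IP"
    STATIC_IP = "static IP"
    NAMED_USER = "named user"


def suggested_block(account):
    """Return (duration, note) following the suggestions in this essay."""
    if account is Account.DYNAMIC_IP:
        return ("temporary", "never indefinite: the address will be reassigned")
    if account is Account.STATIC_IP:
        return ("indefinite or temporary", "at admin discretion, per local policy")
    if account is Account.NAMED_USER:
        return ("indefinite", "leave the talk page unprotected")
    raise ValueError(f"unknown account type: {account!r}")
```

Deciding whether an IP is dynamic or static is deliberately left outside the function, since that judgement is made by the blocking administrator.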
Common spammer strawmen
Spammers will offer arguments like the following. These are strawman arguments, for the reasons listed.
- "But you have links to commercial sites in the list."
- Spamming is about promoting your own site or a site you love, not about commercial sites at all. Links to commercial sites are often appropriate. Links to sites for the purpose of using Wikipedia to promote your site are not.
- "But you have links to other sites that people have added for self-promotion."
- Those need to go, too. The fact that we haven't gotten around to it, yet, does not mean that we have some obligation to have your site.
- "But you have a link to site Y, and my site is just like that."
- We don't need to link to every site in existence that meets a certain criterion. Sometimes we just need one site representative of a category. (See also the comments about linking to web directories instead, so that Wikipedia does not become a web directory.)
- "But these links have been here for a long time."
- There are no binding decisions on Wikipedia, especially when the decision was never discussed on the talk page. Just because nobody noticed your spam a long time ago does not mean you now have a "right" to keep it in.
- "My site is non-commercial, so it's not spamming." (Similarly 'nonprofit', 'charitable', 'opposes cruelty to puppies', etc.)
- It doesn't matter--being noncommercial (etc.) doesn't confer a license to spam even when it's true, and these sites are often trying to sell something even if the business is organized as a nonprofit.