Fighting Syndication-Based Spam

Nice little write-up in the NYTimes about how Microsoft researchers traced the companies and techniques web spammers use to push their pages up the SERPs with spam blogs (otherwise known as splogs) and other throwaway domains, which should make it possible to block them in the future. So hopefully there is some good news on the horizon for finding and removing such sites. But this is more than just a feel-good article: it shows how, once again, advertisers and ad networks are actually funding this mess, and some of the biggest contributors appear to be, once again, Google and other large advertisers, like Orbitz.

Now, we all know that Google continues to fund these splogs by letting them earn money through its AdSense program, but why would it do that? Isn't Google the one that wants a pristine search engine with perfect results? According to this article, Google is also one of the ones allowing these splogs to be created in the first place: nearly 22% of the more than 100,000 unique URLs the researchers collected were marked as spam and came from Blogspot. 22%!

I wonder why Google would let these spammers create these sites in the first place, and then allow them to use AdSense to monetize their spam. Of course, AdSense wasn't the only program being used, but one would think it would be better controlled than the others, and I think in this case AdSense wasn't even the top program being used. Maybe the spammers had already figured out that AdSense earnings were going downhill and switched to other programs.

Surprisingly, the researchers noted that the vast bulk of the junk listings was created from just two Web hosting companies and that as many as 68 percent of the advertisements sampled were placed by just three advertising syndicators.

The researchers found that for some keywords like “drugs” and “ring tone,” more than 30 percent of the results from major search engines were fake pages created by spammers.

They discovered that the average spam density – a measure of the percentage of Web pages that contain only advertisements – was 11 percent for 1,000 keywords they used in their research.

The researchers said large advertisers were to blame for a significant share of the spam problem.

The Microsoft research findings, based on a survey in October, also determined that much of the spam ad traffic was being funneled through the Internet addresses of just two Web-hosting companies. Source: Researchers Track Down a Plague of Fake Web Pages
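To make the spam density figure quoted above a bit more concrete, here is a minimal Python sketch. The NYT's one-line definition is loose, so this is one plausible reading, not the paper's exact formula: it assumes spam density for a keyword is the fraction of that keyword's top search results flagged as spam, and that the overall figure is the plain average across keywords. The keywords and spam flags below are made up for illustration, not taken from the study.

```python
from statistics import mean

def spam_density(results: list[bool]) -> float:
    """Fraction of one keyword's search results flagged as spam (True = spam)."""
    if not results:
        return 0.0
    return sum(results) / len(results)

def average_spam_density(results_by_keyword: dict[str, list[bool]]) -> float:
    """Unweighted mean of the per-keyword spam densities."""
    return mean(spam_density(r) for r in results_by_keyword.values())

# Hypothetical sample: spam flags for the top 10 results of three keywords.
sample = {
    "ring tone": [True, True, True, True, False, False, False, False, False, False],
    "drugs": [True, True, True, False, False, False, False, False, False, False],
    "hiking boots": [False] * 10,
}

for keyword, flags in sample.items():
    print(f"{keyword}: {spam_density(flags):.0%}")
print(f"average: {average_spam_density(sample):.1%}")
```

A single heavily spammed keyword can sit well above the average, which is how "more than 30 percent" for "drugs" and "ring tone" can coexist with an 11 percent average across 1,000 keywords.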

The article goes on to say that one of those hosting companies, ISPrime of New York, said the activity had been traced to a single user and violated the company's acceptable use policy, and that it severed the relationship after being notified by a reporter. Those single rogue users and affiliates seem to be everywhere.

I have been trying to read the whole PDF all day and keep getting interrupted, but it boils down to this: two hosting companies hosted the bulk of the spam, and three advertising syndicators placed 68% of the ads in their sample. The researchers describe the spam process as a five-layer double funnel.

Here is how the double funnel works, layer by layer, starting from the spam end:
1) Doorway pages: the web spammers create millions of doorway pages and widely spam their URLs to forums and blogs.
2) Redirection domains: the spammers set up hundreds of thousands of redirection domains, and the doorway pages fetch ads from them.
3) Aggregators: a small number of aggregators buy traffic from the web spammers.
4) Syndicators: a handful of syndicators buy traffic from the aggregators and display the advertisers' ads.
5) Advertisers: tens of thousands of advertisers pay the syndicators to display their ads.

When users click on any of these doorway pages, they are funneled back through the aggregators, who then de-multiplex the traffic to the right syndicators. While there can be a long chain of redirects, there is always one domain at the end that is responsible for buying this traffic. If the spammers are using AdSense, then googlesyndication.com plays the role of the middle three layers, receiving the traffic and redirecting it to the advertisers.
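The redirection chain described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the researchers' tooling: instead of making real HTTP requests, it follows a hypothetical in-memory redirect map (every domain uses the reserved .example TLD and is invented), records each hop, and stops at a URL with no further redirect, at a loop, or at a hop limit. The layer labels come from a hand-written, hypothetical role table; figuring out which real domains are aggregators or syndicators is the hard part the researchers actually did.

```python
from urllib.parse import urlsplit

# Hypothetical redirect map (source URL -> next URL). In a real crawler each
# entry would come from an HTTP 3xx Location header or a script redirect.
REDIRECTS = {
    "http://doorway.example/cheap-ringtones": "http://redirect.example/r?id=42",
    "http://redirect.example/r?id=42": "http://aggregator.example/click?src=7",
    "http://aggregator.example/click?src=7": "http://syndicator.example/ad?k=ringtone",
    "http://syndicator.example/ad?k=ringtone": "http://advertiser.example/landing",
}

# Hypothetical role table mapping each host to its funnel layer.
ROLES = {
    "doorway.example": "1) doorway page",
    "redirect.example": "2) redirection domain",
    "aggregator.example": "3) aggregator",
    "syndicator.example": "4) syndicator",
    "advertiser.example": "5) advertiser",
}

def follow_chain(start: str, redirects: dict[str, str], max_hops: int = 20) -> list[str]:
    """Follow redirects from start; stop at a URL with no redirect, a loop, or max_hops."""
    chain = [start]
    seen = {start}
    while chain[-1] in redirects and len(chain) <= max_hops:
        nxt = redirects[chain[-1]]
        if nxt in seen:  # redirect loop: stop rather than spin forever
            break
        chain.append(nxt)
        seen.add(nxt)
    return chain

chain = follow_chain("http://doorway.example/cheap-ringtones", REDIRECTS)
for url in chain:
    host = urlsplit(url).hostname
    print(f"{ROLES.get(host, 'unknown'):<24} {host}")
```

The loop guard and hop limit matter in practice, because spam redirect chains are deliberately long and can bounce between domains.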

The top five keywords are drug related: phentermine, viagra, cialis, tramadol, and xanax. Of the top 100 keywords, 74 are drug related, 16 are ringtone related, and 10 are gambling related.

Here are the top 15 redirection domains, ranked by how many doorway URLs they appear on. They are probably good sites to block your ads from appearing on:
paysefeed.net
topsearch10.com
topmeds10.com
themp3direct.com
searchadv.com
sixxx.info
rightfinder.net
vip-online-search.info
a3b4.info
topmobile10.com
yourfastfind.org
arearate.com
find-more.biz
yourfreevids.com
webresources.info
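If you do want to keep your ads, or the traffic you buy, away from these domains, the check itself is simple. Here is a minimal Python sketch, assuming you have the URL that referred a click or impression; the function name is hypothetical and not part of any ad network's API. It matches the listed domains and any of their subdomains, so `www.topsearch10.com` is caught but `nottopsearch10.com` is not.

```python
from urllib.parse import urlsplit

# The top 15 redirection domains listed above.
BLOCKED_DOMAINS = {
    "paysefeed.net", "topsearch10.com", "topmeds10.com", "themp3direct.com",
    "searchadv.com", "sixxx.info", "rightfinder.net", "vip-online-search.info",
    "a3b4.info", "topmobile10.com", "yourfastfind.org", "arearate.com",
    "find-more.biz", "yourfreevids.com", "webresources.info",
}

def is_blocked(url: str, blocked: set[str] = BLOCKED_DOMAINS) -> bool:
    """Return True if the URL's host is a blocked domain or a subdomain of one."""
    # .hostname is already lowercased; strip a trailing dot from fully qualified names.
    host = (urlsplit(url).hostname or "").rstrip(".")
    labels = host.split(".")
    # Check the host and every parent domain: www.a3b4.info -> a3b4.info -> info
    return any(".".join(labels[i:]) in blocked for i in range(len(labels)))

print(is_blocked("http://www.topsearch10.com/?q=ringtones"))
print(is_blocked("http://nottopsearch10.com/"))
print(is_blocked("https://example.com/page"))
```

Matching whole domain labels rather than doing a substring search avoids blocking innocent sites whose names merely contain a listed domain.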

The top three syndicators (the layer between the aggregators and the advertisers) are:
findwhat.com appearing on 1,656 redirection chains
looksmart.com appearing on 803 redirection chains
7search.com appearing on 606 redirection chains
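Counts like "appearing on 1,656 redirection chains" boil down to a tally over the crawled chains. This Python sketch assumes each chain has already been recorded as a list of hostnames, and it counts a syndicator at most once per chain even if it shows up on several hops; that once-per-chain rule is my assumption about how such figures are counted, not something the article states. The chains below are invented.

```python
from collections import Counter

def chain_appearances(chains: list[list[str]], domains: set[str]) -> Counter:
    """Count how many chains each domain of interest appears on (at most once per chain)."""
    counts: Counter = Counter()
    for chain in chains:
        counts.update(set(chain) & domains)
    return counts

SYNDICATORS = {"findwhat.com", "looksmart.com", "7search.com"}

# Hypothetical chains of hostnames; the doorway, aggregator and advertiser names are invented.
chains = [
    ["doorway-a.example", "agg-1.example", "findwhat.com", "shop.example"],
    ["doorway-b.example", "agg-1.example", "findwhat.com", "findwhat.com", "tones.example"],
    ["doorway-c.example", "agg-2.example", "looksmart.com", "loans.example"],
]

for domain, n in chain_appearances(chains, SYNDICATORS).most_common():
    print(f"{domain} appearing on {n} redirection chains")
```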

The top 15 advertisers by number of ads appearing on the pages, most of which are ringtone advertisers:
mobilesidewalk.com, with almost twice as many ads as number 2
freeringers.net
tunes4tones.com
funmobile.com
monstermarketplace.com
rockinringers.com
bestringtoneoffer.com
5starringtones.com
mp3-ringers.com
myshorttermloan.com
couponmountain.com
downloadrings.com
pharm24h.com
search-meds.net
toppoptones.com

In another section they show advertisers like shopping.com, dealtime.com, bizrate.com, and others. But if those ads are appearing through Google AdSense or a similar ad network, then the network, not the advertiser, is responsible for them showing up on the spam sites; the advertisers are just buying traffic from Google. Give me hits for ten cents, Alex. The networks approve these sites and let them keep gunking up their own search engines for one reason: money.

If search engines continue down this path, users will eventually find other search engines they can trust and move on. Anybody remember AltaVista?

The researchers will be presenting this paper and other info in May at the International World Wide Web Conference in Banff, Alberta.