Discussion of Online Advertising, CPA, SEO, Affiliate and Next Generation Marketing
ReveNews Online Revenue News & Opinions Since 1998

Stop Rogue Web Bots From Eating Your Bandwidth & Stealing Your Content

November 17th, 2005 by Jim Kukral

One day it’s going to happen to you too. You’re going to log in and see that your site statistics have gone through the roof. “What happened?” you’ll think. “Was I Slashdotted? Did my site get a first-page Google ranking?” Excitement sets in…for a moment. (Or skip this whole article and sign up for the BotSense beta.)

Upon further inspection, you’re going to get that gut-sinking feeling when you realize you’ve blown through your monthly bandwidth limit, with overage charges running to hundreds of dollars, and that your traffic increase didn’t translate into more sales or readers.

Strange? Not really. Rogue web bots are attacking your server.
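If you want to do that “further inspection” yourself, one rough first step is to total the bytes your server sent to each User-Agent in your access log; a bot that pulls every image on your site tends to float straight to the top. The sketch below is only an illustration: it assumes Apache’s standard “combined” log format, and the sample log lines and the “ExampleGrabber” bot name are made up.

```python
import re
from collections import Counter

# Apache "combined" log format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def bytes_by_agent(lines):
    """Sum response bytes per User-Agent string; lines that don't parse are skipped."""
    totals = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue
        size = m.group("bytes")
        # "-" means no body was sent (for example a 304 Not Modified response)
        totals[m.group("agent")] += 0 if size == "-" else int(size)
    return totals

# Made-up sample lines; "ExampleGrabber/1.0" is a hypothetical bot.
sample = [
    '10.0.0.1 - - [17/Nov/2005:10:00:00 -0500] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
    '10.0.0.2 - - [17/Nov/2005:10:00:01 -0500] "GET /photo1.jpg HTTP/1.1" 200 250000 "-" "ExampleGrabber/1.0"',
    '10.0.0.2 - - [17/Nov/2005:10:00:02 -0500] "GET /photo2.jpg HTTP/1.1" 200 300000 "-" "ExampleGrabber/1.0"',
    '10.0.0.1 - - [17/Nov/2005:10:00:03 -0500] "GET /logo.gif HTTP/1.1" 304 - "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
]

# Print the heaviest consumers first
for agent, total in bytes_by_agent(sample).most_common():
    print(f"{total:>10,} bytes  {agent}")
```

Keep in mind that the User-Agent is whatever the client chooses to send, so this tells you who claims to be eating your bandwidth, not who actually is.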

If you own a website (or several, like me), and even if you’re a blogger running your own server, it’s only a matter of time before you realize that your site has come under attack from web robots that ruthlessly eat your bandwidth and scrape your site for their owners’ gain. Often this means scraping your images and content and republishing them somewhere else, surrounded by Google AdSense ads.


But there’s hope! A press release out today, titled “Rogue Web Bots Attacking the Web; Increasing Costs, Stealing Content: New Tool from BotSense.com Stop Bots in Their Tracks,” announces a new free tool that helps stop bots in their tracks.

So I signed up for the free beta and did some reading. Here’s what I learned.

For example, I had no idea there were bots out there that scan your site for all of your content, email addresses and images, grabbing what they can and pulling it into their own database for whatever use they can imagine. Well, OK, I did know these types of bots existed; what I didn’t know was how powerful they were, how many of them existed, or what they were called.

The BotSense.com beta tool (free for now) helps non-technical people like me create a .htaccess file that blocks harmful bots from attacking your web server. Why use .htaccess rather than robots.txt? Because robots.txt is only a request that well-behaved crawlers choose to honor, and it turns out that rogue bots simply ignore it. What can you expect from a thief?
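To see why robots.txt can’t stop a thief, it helps to notice that the check happens on the bot’s side. The sketch below uses Python’s standard `urllib.robotparser` to play a polite crawler that asks robots.txt before each request; the robots.txt content, the “ExampleGrabber” and “SomeBrowser” agent names and the example.com URLs are all hypothetical. A rogue bot simply skips the check, and nothing on the server enforces it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep one grabber out entirely,
# and ask everyone else to stay out of /images/.
ROBOTS_TXT = """\
User-agent: ExampleGrabber
Disallow: /

User-agent: *
Disallow: /images/
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

def polite_fetch_allowed(agent, url):
    """A well-behaved crawler consults robots.txt before every request."""
    return rules.can_fetch(agent, url)

def rogue_fetch_allowed(agent, url):
    """A rogue bot never looks at robots.txt, so nothing stops it."""
    return True

url = "http://www.example.com/images/photo1.jpg"
print(polite_fetch_allowed("ExampleGrabber", url))  # False: disallowed everywhere
print(polite_fetch_allowed("SomeBrowser", url))     # False: /images/ is off-limits for *
print(rogue_fetch_allowed("ExampleGrabber", url))   # True: the rules were never checked
```

A .htaccess rule, by contrast, is applied by the web server itself, so the bot gets refused whether or not it cooperates.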

Why block bots, you ask? Good question.

According to Don Cramer, creator of the BotSense.com tool, there are several types of bots you can block, each with specific reasons for blocking it. He has allowed me to republish the following descriptions from his site.

Site Downloading Bots : Bots that download all content found on the target site
Why should I block these bots?
This type of bot, also known as an “Offline Browsing Bot,” downloads your entire site, including all images and other media. It wastes bandwidth and server resources and slows down your site.

Media Harvesting Bots : Bots that seek and download all media files from target sites
Why should I block these bots?
Media bots crawl your site and download every media type the user specifies. Users never see your site as intended and easily bypass any products or other revenue-generating portions of your site. Media bots waste bandwidth and steal your content without any benefit to the site owner.

Email Harvesting Spam Bots : Bots that seek any email address found on the target site for spam use
Why should I block these bots?
Email “Spam” bots crawl your web site in search of email addresses that can be used for spam. These bots waste bandwidth and server resources and offer no benefit to site owners. They should always be blocked.

Surveillance Bots : Bots that crawl your site looking for certain types of content/media
Why should I block these bots?
This type of bot is specifically engineered to gather information for various purposes from which your website does not necessarily receive any benefit.
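To make the four categories concrete, here is a minimal sketch of sorting visitors by the User-Agent string they send. The signature table is purely illustrative and is not BotSense’s list: HTTrack and Wget are well-known site downloaders (Wget also has plenty of legitimate uses), EmailSiphon and EmailCollector are names that have long appeared on public email-harvester block lists, and the “Example…” entries are hypothetical placeholders.

```python
# Illustrative signature table keyed by the four categories above.
# NOT BotSense's list; "Example..." entries are hypothetical placeholders.
SIGNATURES = {
    "Site Downloading": ["HTTrack", "Wget"],
    "Media Harvesting": ["ExampleMediaGrabber"],
    "Email Harvesting Spam": ["EmailSiphon", "EmailCollector"],
    "Surveillance": ["ExampleWatcher"],
}

def classify(user_agent):
    """Return the first category whose signature appears in the UA (case-insensitive), else None."""
    ua = user_agent.lower()
    for category, needles in SIGNATURES.items():
        if any(needle.lower() in ua for needle in needles):
            return category
    return None

for ua in [
    "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)",
    "EmailSiphon",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
]:
    print(ua, "->", classify(ua))
```

Substring matching only works when a bot announces itself honestly, which is exactly the limitation Mr. Cramer points out below.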

Worried yet? You should be. The higher you rank on search engines and the more desirable your content, the more susceptible you are to these kinds of forays.

“I would guess that 99.9% of people who have websites or run their own server have no idea what unwanted bots are doing to their system or their bandwidth limits,” said Mr. Cramer in an email interview. “I would estimate that costs associated with stolen content and excessive bandwidth charges that these bots create run into the millions each year, maybe per month; future testing will tell the full tale.”

So how do you know what bots are good and what bots are bad?

“The thing is, most bots aren’t descriptive about who they are, so how is anyone supposed to know first what they do, and then how do you know which ones are good or bad? My team’s extensive research has allowed me to conclude which bots do what, and why, and the .htaccess generator tool I’ve created allows anyone who signs up the opportunity to see solid conclusions and then take action (block them) if they feel it’s appropriate.”

The tool itself is easy to use. Simply create a free beta account, log in and follow a three-step process: select the bots you want to block, generate a .htaccess file, and place it on your server.
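For the curious, here is a minimal sketch of what such a generator might produce. It is not BotSense’s actual output; it assumes an Apache server with mod_setenvif and the Apache 2.2-style `Order`/`Allow`/`Deny` directives (Apache 2.4 needs mod_access_compat for these, or the newer `Require` syntax), and the bot names passed in are just examples. Matching bots are tagged with an environment variable and then refused with a 403.

```python
import re

def build_htaccess(bad_agents):
    """Build .htaccess rules that refuse requests whose User-Agent matches any listed name."""
    lines = ["# Block selected bots by User-Agent (case-insensitive regex match)"]
    for agent in bad_agents:
        # SetEnvIfNoCase treats the pattern as a regex, so escape metacharacters like "."
        lines.append(f'SetEnvIfNoCase User-Agent "{re.escape(agent)}" bad_bot')
    # Allow everyone, then deny any request tagged bad_bot (Deny wins for matches)
    lines += [
        "Order Allow,Deny",
        "Allow from all",
        "Deny from env=bad_bot",
    ]
    return "\n".join(lines) + "\n"

print(build_htaccess(["HTTrack", "EmailSiphon", "EmailCollector"]))
```

Because the output is plain text, it can be pasted into an existing .htaccess file rather than replacing it.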

So for now, I’m sleeping a bit easier knowing that many of these bots are being kept at bay. The only remaining question is what to do about new bots. According to Mr. Cramer, there’s hope. The beta tool can block only a limited number of bots for now; however, future versions will have an expansive list and advanced features that not only measure the activity of rogue bots but also quantify the actions of the good ones.

I love fighting the good fight!

Filed under: Online Marketing

8 Comments

bryce said:

True story: about a year ago someone copied and posted my entire site (all 6,000 HTML pages) verbatim. He changed only the domain name, the title and the main logo GIF.

I discovered it quickly when my affiliate referral credits leaped upward dramatically. However, the increase all came from a domain I had not heard of before. Imagine my shock when I went to see what was on that domain.

But with the increased sales, I didn’t know whether I should stop him or let him go. He had copied ALL of my affiliate referral links verbatim.

Donald Simms said:

I wish I were that lucky. The punks come in and spider all my coupons, and robots.txt doesn’t stop them. I’ve had it. Thanks for the tip, Jim; I’m going to try the tool.

Connie Berg said:

I have had issues with a well-known competitor eating up my bandwidth. They were continuously scraping my site for new coupons. We blocked the IP, but they kept coming back. I sent an email; they just slowed down. They don’t care. They change IPs all the time.
Maybe time to send them a bill for the bandwidth.

Connie Berg said:

Forgot to add, I signed up for this beta almost a month ago. Am still waiting in frustration for the control panel link and reports. I have contacted Don several times and still have nothing.

Suzi said:

I already have an extensive custom .htaccess file for my site at http://spywarewarrior.com. Does anyone know if I can add the contents of my own .htaccess file to the one generated by BotSense?

Jim Kukral said:

Hi Suzi,

Yes, you have two options at the end of the tool. One is to download the file, and the other is to just copy the rules and add them to your existing file. Pretty nice!

If you want more discussion on this, Problogger is talking about it too.

Lucky said:

Your blog is so interesting that I want to subscribe to its feed… but alas, I can’t find the RSS link!

This is a great piece, thanks. Checking out the bot repeller…
