Shawn Collins blogged this earlier today on his site - he really should be blogging here though so he gets more exposure (wink). What got my attention was the affiliate network data Project Black Book published on Commission Junction, LinkShare, ClickBank, Azoogle, My Affiliate Program and RegNow. Apparently they have spidered the Internet and analyzed sites for affiliate links to determine the following metrics: total number of links found, unique merchants, unique affiliate ID’s and unique affiliate domains. This data is of obvious value to anyone using or considering using one of these providers. A word of warning though - do not believe everything you read.
When I worked at Commission Junction, I did a lot of competitive analysis over the first 6 years. I used a multitude of resources to try and create a realistic, “apples to apples” comparison of our major competitors. I looked for third-party data that I knew would not be biased. My favorite early resource was AltaVista’s advanced search feature that let me view the number of web pages indexed in AltaVista that had affiliate tracking links on them. Unfortunately, this feature is no longer available. MSN had this search feature up until last year when it also disappeared. The search feature looked like this - link:trackingdomain.com and would show all the web pages indexed in a particular search engine with links to a particular tracking domain. This essentially told you the reach of any given network. I have yet to find another search engine that (a) offers this feature and (b) shows realistic results.
Another favorite resource that does still work is at SecuritySpace.com. They offer a bunch of free reports at the bottom of their homepage - the one I use is called “Web Bug Report”. They see pixels as web bugs and the pixels they report on are the ones that serve ads and track impressions. The Web Bug Report is actually two different reports - Web Bug Site Report and Web Bug Traffic Report.
The Web Bug Site Report shows the top 100 sites “benefiting” from web bugs. By my definition, it lists the top 100 sites that show the most impression tracking pixels. So if you view this report, you will not be surprised to see that googlesyndication.com is number 1. As you scroll down the list, you will see that number 18 is linksynergy.com, which is the domain LinkShare uses. Number 22 is qksrv.net, which is Commission Junction; number 25 is bfast.com (formerly BeFree, now also part of Commission Junction). Commission Junction now uses multiple domains in addition to qksrv.net and they are also listed in the top 100, as are zanox-affiliate.de, tradedoubler.com, valuecommerce.com, and assoc-amazon.com. To be fair, you cannot add up qksrv.net, bfast.com and the other 5 CJ tracking domains and then compare them to another network that uses one domain, because many of the CJ domains reside on the same pages and are therefore double counted.
The Web Bug Traffic Report essentially tells you the traffic that the web bugs are exposed to. In other words, if you were number one on the first report but happened to be on web sites that get little traffic, it would be exposed on the web bug traffic report - like the old saying goes, if a tree falls in the woods and no one is there to hear it, did it make a sound? If your affiliate links are on pages no one visits, will you get any traffic from them, much less generate any transactions? In this report googlesyndication.com is still number 1. Linksynergy.com is number 54, zanox-affiliate.de is number 60, bfast is number 63, qksrv.net is number 66, tradedoubler.com is number 80 and again CJ has a few other tracking domains listed in the bottom 20.
The cool thing about these reports is they are dated by month and year in the URL so you can go back in time and look at each month of each year and see how the rankings change over time. For example if you go back to March 2003 (200306) for the Web Bug Count Report, google.com is way down the list below Yahoo and qksrv.net is number 2.
OK, so back to the premiere edition of Project Black Book and their data. When you spider the Internet, it helps to know what you are trying to find and what you are looking at once you think you’ve found it. For example, the data they published for Total Links Found on Commission Junction may only take into account one domain (they do not say if they used all of them). Furthermore, they list 66,930 unique merchants on CJ. This is wrong - the number is closer to 1,700. If it were true, CJ’s $500 monthly minimum fee would account for over $400 million in revenue each year! The mistake they made is they counted the AID in the tracking links and assumed it was unique per merchant. AID stands for Ad ID and is unique per creative per merchant. So if a merchant has 50 links, they saw it as 50 merchants. Looking over the data for the other networks, it appears they may have made similar mistakes for Unique Merchants.
Each network uses different values in their tracking links that have completely different purposes, even when they are sometimes called by similar names. It is important to show your work when doing analysis so others can understand what you looked at and what you thought you saw, since some data is open for interpretation. The only information they provided on their data is the following: “Throughout this issue, you will see Summary Stats like the one above. The data presented came from spidering the Internet. The data did not come from Azoogle (or any other affiliate network), or through any of their data sources. The information was independently collected and summarized by PBB.” I would really be interested in seeing more specifics on how they came up with their data - hopefully they will provide more insight in upcoming issues.
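To make the arithmetic and the counting mistake concrete, here is a rough sketch in Python. Only the 66,930 figure and the $500 monthly minimum come from the numbers above; the sample links, merchant names and ad IDs are made up for illustration and are not real CJ data.

```python
# Figures quoted above: 66,930 reported "merchants" and a $500 monthly minimum.
reported_merchants = 66_930
monthly_minimum_usd = 500
implied_annual_revenue = reported_merchants * monthly_minimum_usd * 12
print(f"Implied annual revenue: ${implied_annual_revenue:,}")  # $401,580,000

# Hypothetical sample: the (merchant, ad ID) pairs behind five tracking links.
# A spider sees only the ad ID in each link, not the merchant it belongs to.
sample_links = [
    ("merchant-a", "aid-101"),
    ("merchant-a", "aid-102"),
    ("merchant-a", "aid-103"),
    ("merchant-b", "aid-201"),
    ("merchant-b", "aid-202"),
]

unique_ad_ids = {ad_id for _, ad_id in sample_links}
unique_merchants = {merchant for merchant, _ in sample_links}

print(len(unique_ad_ids))     # 5: what you get by counting AIDs
print(len(unique_merchants))  # 2: the number of actual merchants
```

Counting distinct AIDs inflates the merchant total by however many creatives each merchant runs, which is how a network with roughly 1,700 merchants could appear to have tens of thousands.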
It is great to see another publication covering the affiliate marketing industry - their premiere issue is 65 pages long and can be downloaded for free from their site. I am excited to read it all the way through and look forward to upcoming issues.
There are serious problems with the MyAffiliateProgram numbers as well; they apparently only counted merchants who use our basic myaffiliateprogram.com links and didn’t count merchants who use private domains or direct linking, which makes up the bulk of our customer base as well as most of our better merchants.
As far as printing 67,000 merchants for CJ, that just shows an embarrassing lack of understanding about the affiliate marketing space. I think if CJ had that many merchants, Todd may have decided to stay there a little longer and buy himself an island or two.
Hi, Todd…
I thought you/your readers might enjoy reading comments made by one of their executives over at ThoughtShapers.
> he really should be blogging here though so he gets more exposure (wink)
I’m too much of a loose cannon - I’d have The Man editing my stuff all the time over here.
With the release of our preview issue sampler, we are getting a lot of questions about, and interest in, our data.
I would like to take the opportunity now to explain the methodology in the data collection.
This information will be published in the first subscription issue as well.
The statistics presented in the journal were gathered from the spidering activities of Cydata Services. Project Black Book (“PBB”) has an exclusive licensing arrangement to perform data analysis on their database of 1 billion+ records.
Cydata spiders the web, much like Googlebot, and processes all spidered web pages for keyword indexing and linking relationships.
The affiliate network URLs are translated to show the actual merchant website (i.e., the redirect).
On page 8 of the PDF (or the print version) is the first summary stat provided, the Azoogle Summary Stats.
The data presented:
Total number of links found: 39,686
Number of unique merchants: 4,735 (should have been typed out as merchant IDs)
Number of unique affiliate ID: 6,917
Number of affiliate domains: 8,131
Through the spidering of affiliate websites and parsing through all the linking relationships, the end data is shown.
The data was sampled from data from the last 8 months. In two weeks, Cydata is increasing its spidering to about 1M domains a day. This will allow the statistics to be timely and a summary of what was found in that month.
By increasing the spidering to monthly cycles, it’s possible to start tracking new merchant products entering the space and watching the popularity of each product as affiliates start to promote it.
Going back to the stats, we show 4,735 merchant IDs. These are unique merchant IDs, not number of merchants (sorry for the confusion in the stat). Next month, all the merchant IDs will be translated back to merchants (since merchants have multiple merchant IDs), and a new statistic will be tracked, that is the actual number of merchants.
We have found 6,917 affiliate IDs which were found on 8,131 unique domains. Affiliates can have multiple websites and have the same ID (and some may have multiple affiliate IDs).
Given a t3report on a merchant, the specific affiliate domains can be revealed along with summary linking data.
jeff doak wrote:
“There are serious problems with the MyAffiliateProgram numbers as well; they apparently only counted merchants who use our basic myaffiliateprogram.com links and didn’t count merchants who use private domains or direct linking,”
you are correct, the INITIAL stats on MyAP were only on those that ran through the redirect means.
Those that use private domains or direct linking would be counted only if we were focusing on a particular vertical and those domains came up.
for example, if xyz_website.com were using myAP and had direct linking like xyz_website.com/cgi/affid=123
then we could do analysis and stats on xyz_website.com and be able to count the number of unique affiliate domains and number of affiliate links.
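As a rough illustration of that kind of per-merchant count, here is a small Python sketch. It is not the Cydata or T3Report implementation: the `affid=` pattern is taken from the example URL above, while the function name, the referring sites and the sample data are all hypothetical.

```python
import re
from urllib.parse import urlsplit

# Pattern taken from the example link above; real programs may differ.
AFFID_PATTERN = re.compile(r"affid=(\d+)")


def summarize_direct_links(merchant_host, observed_links):
    """Count affiliate links, IDs and referring domains for one merchant.

    observed_links is an iterable of (referring_page_url, link_url) pairs,
    i.e. the page a spider visited and a link it found on that page.
    """
    affiliate_domains = set()
    affiliate_ids = set()
    link_count = 0
    for page_url, link_url in observed_links:
        if urlsplit(link_url).hostname != merchant_host:
            continue  # link points at some other site
        match = AFFID_PATTERN.search(link_url)
        if match is None:
            continue  # a plain link with no affiliate ID
        link_count += 1
        affiliate_ids.add(match.group(1))
        affiliate_domains.add(urlsplit(page_url).hostname)
    return {
        "links": link_count,
        "affiliate_ids": len(affiliate_ids),
        "affiliate_domains": len(affiliate_domains),
    }


# Hypothetical spidered data.
observed = [
    ("http://site-one.example/reviews.html", "http://xyz_website.com/cgi/affid=123"),
    ("http://site-one.example/deals.html", "http://xyz_website.com/cgi/affid=123"),
    ("http://site-two.example/", "http://xyz_website.com/cgi/affid=456"),
    ("http://site-two.example/", "http://other-merchant.example/?affid=9"),
]

summary = summarize_direct_links("xyz_website.com", observed)
print(summary)
```

With this sample data the summary reports three links, two affiliate IDs and two affiliate domains for xyz_website.com; the link to the other merchant is ignored.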
jeff doak wrote:
“As far as printing 67,000 merchants for CJ, that just shows an embarrassing lack of understanding about the affiliate marketing space”
Actually, it’s an embarrassing omission of the word “IDs” after merchants. The number of 67,000 is what we have as # of unique merchant IDs, but as i have seen in the database, there are multiple IDs per merchant.
I am filtering through that information for the next monthly stats update, that will still include number of unique merchant ID, but then the new summary stat of number of unique merchants… and then the number will be clearer.
I posted on our blog, as well as in my first response here, to clarify the stats issue, since we have had a lot of feedback so far and the methodology and stats were a little confusing.
-brandon
Todd Crawford wrote:
“I would really be interested in seeing more specifics on how they came up with their data - hopefully they will provide more insight in upcoming issues.”
Point well taken. Much like how we talk about “transparency and truth”, so we must also do the same with our data collection methodology and analysis.
I have dissected the affiliate link structures. The problem, as i stated above, was that i had not done enough filtering to get to the actual stat of “number of unique merchants”; the stat should have read “number of unique merchant IDs”, which isn’t very interesting on its own unless the “number of unique merchants” was also known.
This information will be clarified as well as additional stats to come.
Feel free to suggest some interesting stats that we could be monitoring.
some brainstorming examples:
-determining on average, how many websites an affiliate runs on any particular network
-a stat on “dirty traffic” that traces how prevalent is porn traffic feeding into non-porn websites
-where are the websites being hosted (geographically)
-determining how many different networks a domain uses on average.
We keyword index every page we visit, so we can do keyword analysis on top of linking analysis to offer a lot of insights.
The data on page 48-49 of general keywords used in domains and the linking relationships to those domains is very interesting.
Well thank heavens for http://www.botsense.com if I see your bot coming I will trap it, throw it away with the rest of all the sniffers, scrapers and whatnot.
Hopefully your spider obeys the robots.txt file, correct?
Jimmy Daniels wrote:
“Hopefully your spider obeys the robots.txt file, correct?”
The short answer is no.
The longer answer:
Do web browsers “obey” robots.txt?
They don’t. So what if a webspider was actually a web browser?
Robots.txt is not a rule, law, nor listed as one of the 10 commandments.
It’s a protocol that those who do spidering can optionally follow, as a mechanism for respecting wishes in an internet-based ‘do not disturb’ manner.
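For readers unfamiliar with the mechanics, here is a minimal Python sketch of how a spider that chose to honor robots.txt could check a URL before fetching it, using the standard library's `urllib.robotparser`. The robots.txt contents, the "ExampleSpider" user agent and the URLs are invented for illustration; nothing here describes how any particular spider behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents for an affiliate site.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
# parse() takes the file's lines directly, so no network request is made here.
parser.parse(robots_txt.splitlines())

blocked = parser.can_fetch("ExampleSpider", "http://affiliate-site.example/private/page.html")
allowed = parser.can_fetch("ExampleSpider", "http://affiliate-site.example/index.html")

print(blocked)  # False: the path falls under the Disallow rule
print(allowed)  # True: no rule covers this path
```

The check is purely advisory: nothing stops a client from skipping it, which is the point being argued in this thread.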
Google “obeys” robots.txt because they have nothing to lose by skipping your website. The advantages to allowing Google to index your website is to allow others to find you via search.
For our spidering, it allows merchants to find you (assuming you have an affiliate website), in being able to identify and target your website with some kind of financial proposition that benefits you, from being found.
There are hundreds of spiders running loose out there, and many don’t play very nice.. not by ignoring robots.txt, but by pounding the website repeatedly.
In our sense of being a good netizen, we throttle our spidering to a speed less than that of a 56K modem. We don’t pound your site, nor download any graphics, so the argument of wasting your bandwidth is null.
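One simple way to implement that kind of throttle is to pause after each download long enough that the average rate stays under a cap. The Python sketch below assumes a cap of 56,000 bits per second and uses hypothetical function names; it is an illustration of the idea, not the code this spider actually runs.

```python
import time

# Nominal 56K modem rate; treating it as exactly 56,000 bits/s is an assumption.
MAX_BITS_PER_SECOND = 56_000


def required_pause(bytes_fetched, elapsed_seconds, max_bps=MAX_BITS_PER_SECOND):
    """Seconds to wait so the average transfer rate stays at or below max_bps."""
    minimum_duration = bytes_fetched * 8 / max_bps
    return max(0.0, minimum_duration - elapsed_seconds)


def throttle(bytes_fetched, started_at, max_bps=MAX_BITS_PER_SECOND):
    """Sleep as needed after a download that began at time.monotonic() == started_at."""
    pause = required_pause(bytes_fetched, time.monotonic() - started_at, max_bps)
    time.sleep(pause)
    return pause
```

For example, a 70,000-byte page that arrived in 2 seconds would need an 8-second pause, since 70,000 bytes take 10 seconds at 56,000 bits per second.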
There are spiders that do complete mirroring, taking down your video clips, images, etc that clearly eat up your bandwidth.
When it comes to those kinds of spiders, robots.txt doesn’t protect you.
This is where you can attempt to block by IP (which is hard since spiders can use proxy servers and different ISP locations), or by browser-type, but that can be faked, or by limiting bandwidth rate (which would then penalize the legit web surfers who have broadband) or with software that can track the intervals of clicking to predict if it is a spider (but these measures can be circumvented by random timeouts and spacing out the spidering schedule for a targeted website).
My company has been spidering for 2 years, and i understand the frustrations of website owners when they see webspiders sucking down their bandwidth, making so many concurrent connections that it brings the website to a crawl.
The only thing i can say is that i have put measures in place in order to be a good netizen in not wasting your bandwidth (since only html is indexed), and also to limit the time on the site to less than 30 seconds, so the spiders don’t get caught up in deep hierarchy tree structures.
The end result of your website being found, is that merchants who are looking for good affiliates can do so via t3report.com
In speaking in only my spidering case, being spidered has been beneficial by those websites who we have visited, because competing or complementary offers have been made to them, that otherwise, the website owner would not have known about or have found.
Yes, my company does make a profit through the sale of this marketing intelligence, but it is a win situation for the website we spider, a win for the merchant who wants to find new/good affiliates, and a win for my company who delivers the data in a very insightful way.
brandon wrote:
“Actually, it’s an embarrassing omission of the word “IDs” after merchants. The number of 67,000 is what we have as # of unique merchant IDs, but as i have seen in the database, there are multiple IDs per merchant.”
As Todd pointed out, these are not multiple ID’s per merchant, they are creative ID’s. All this shows is that you not only had no idea how to determine a unique merchant by looking at CJ’s linking structure, but were ignorant enough about the industry as a whole to actually believe (and print) that CJ has 67,000 merchants. Anyone can spider websites and gather a “billion” records. But before you try to pretend to have real results, why not actually talk to someone in the industry to see if your methodology and results make any sense?
“Yes, my company does make a profit through the sale of this marketing intelligence, but it is a win situation for the website we spider, a win for the merchant who wants to find new/good affiliates, and a win for my company who delivers the data in a very insightful way.”
I’m sorry, but that’s simply obnoxious. You print completely inaccurate information, knowing full well that other websites will quote it, and then don’t even bother to accept responsibility and issue a retraction? And how exactly is anyone else “winning” in this scenario? Take responsibility for your publication and stop spinning.
Jeff Doak wrote:
“But before you try to pretend to have real results, why not actually talk to someone in the industry to see if your methodology and results make any sense?”
Each affiliate network does things differently, i was able to get a top level breakdown.
For the CJ specific example, given what you have pointed out, then the stat of “number of unique merchant IDs” is a meaningless one.. and no problem in dropping that one, but what is more relevant and interesting, is once filtering through all the creative IDs and such, that it comes down to the merchant being identified and cross-referenced to the number of affiliate (referring) domains and links.
So while you have expressed a challenge to the data, and one that i do appreciate you taking the time to share, the data does stand on its own, because it is the raw data, with no bias… what has to be taken into account by me, is that different affiliate networks will have different structures, and they have to be treated individually with better understanding…. which is exactly what I intend to do, and your posting has helped to illuminate that.
Jeff Doak wrote:
“I’m sorry, but that’s simply obnoxious. You print completely inaccurate information, knowing full well that other websites will quote it, and then don’t even bother to accept responsibility and issue a retraction? And how exactly is anyone else “winning” in this scenario? Take responsibility for your publication and stop spinning.”
Issue a retraction?? How does one do that in a print magazine that just debuted and received both elation and criticism today? In case you didn’t read my post here, i did post some better explanations of the methodology used, as well as create a blog on our projectblackbook.com website in order to address some of the issues that some bloggers had posted early on.
You have brought up a specific issue with the CJ stats, i have explained things, and no retraction is needed..what is needed is clarification and fine tuning. None of the data is meant to smear or disparage affiliate networks..and anyone taking an offensive position against the transparency that we are undertaking, clearly have their reasons to want to subvert.
As far as who “wins” as you asked….. we have started talking with companies (large and small) who don’t have an affiliate program, but are unsure of what to do.
They look at the networks, but they all say the same basic things, “we’re great”, “we’ll make you money”, “join us”, etc, etc, etc.
Since you were referencing CJ, i’ll use our Top List example on Page 17.
Some UK-based credit card company might look at our CJ Top List and see that CapitalOne.co.uk was on the list, giving this company the insight that maybe they should join CJ so that affiliates who are promoting capitalone.co.uk might promote them as well.
now, it is possible that by looking at the merchant list, they could determine this, but our data is showing the “popularity” of the merchant, based on number of unique domains and the number of links to the merchant.
This doesn’t necessarily mean that affiliates are making lots of commissions, but it does serve to give some insight as to what the affiliates are promoting, who are the ones that are the closest in touch with the surfers.
It is one of the primary missions of PBB to help bring in more merchants and opportunities to the online space, by giving them the ability to understand what is going on.
The off-line world has transparency through media audits and such, the online interactive world is lagging.
There will be many affiliate networks that won’t like what we are doing.. and there will be many more affiliate networks being tracked in the next months release of stats that didn’t make it into the preview issue. I believe that there will be more good that comes about through our stats and the publication that outweighs the temporary knee-jerk reactions to the data that we present.
Quote Brandon>>
“Robots.txt is not a rule, law, nor listed as one of the 10 commandments.”
Ugh.
“Given a t3report on a merchant, the specific affiliate domains can be revealed along with summary linking data.”
And this is available to anyone?
“Robots.txt is not a rule, law, nor listed as one of the 10 commandments.”
If you were truly trying to be a “good netizen” then your spider would obey the robots file, but since your business depends on selling affiliate website info without permission of the affiliate, obviously you can’t.
From your first issue, “All our worries, both as consumers and business people, can be answered with Trust.”
Indeed.
Oh, and in the doc you keep mentioning the affiliate is losing PR by linking to other sites, please do some research first as this is not the case at all. If you want to call it anything, it is more like sharing their page rank, if they lost page rank, most websites would eventually end up with no page rank at all.
Sorry, that previous post was mine, not anonymous.
One thing that was mentioned above was that it monitored the keywords affiliates use. This is very appalling and I can’t help but wonder when these results will be used against us…something that should be private
brandon wrote:
“Given a t3report on a merchant, the specific affiliate domains can be revealed along with summary linking data.”
jimmy wrote:
“And this is available to anyone?”
yes, go to http://www.t3report.com to view an online demo as well as submit a request for domains you are interested in viewing. pricing is based on the amount of data, which is included in the return quote.
jimmy wrote:
“Oh, and in the doc you keep mentioning the affiliate is losing PR by linking to other sites, please do some research first as this is not the case at all. If you want to call it anything, it is more like sharing their page rank,”
i will send your comment to the writer to get their feedback.
the feedback is greatly appreciated.
jimmy wrote:
“One thing that was mentioned above was that it monitored the keywords affiliates use. This is very appalling and I can’t help but wonder when these results will be used against us…something that should be private”
we would not be mentioning specific domains of affiliates with what you are pointing out as being disparaging information. the intent is not to be some kind of internet police, but to show trends and market data.
broad summary views, that might show x% using porn related words on pages that link to xyz affiliate network of domain is fair game.
“My favorite early resource was AltaVista’s advanced search feature that let me view the number of web pages indexed in AltaVista that had affiliate tracking links on them.
The search feature looked like this - link:trackingdomain.com and would show all the web pages indexed in a particular search engine with links to a particular tracking domain. This essentially told you the reach of any given network. I have yet to find another search engine that (a) offers this feature and (b) shows realistic results.”
—
Have you seen Alexa’s “Traffic Rankings” section at alexa.com? I’m “old but new” (worked for Mindspring, then Interland, then owned 3 companies and am now starting a new venture) to Affiliate Marketing. In my researching tonight I found this post. Granted, I am about 2 years “behind on the times” when it comes to the web, web design, web marketing and affiliate marketing … but even if Alexa isn’t as detailed or precise as what you’ve used in the past, it does at least give some information. The information is searchable by domain, too. Oh, and it’s powered by Amazon and Google, which at least gives us an idea of reliability.
Ps. Here’s a direct link to the Traffic Ranking report for Revenews.com!
http://www.alexa.com/data/details/main?q=www.revenews.com&url=www.revenews.com
jenna wrote: “Have you seen Alexa’s “Traffic Rankings” section at alexa.com?”
Have you seen T3Report.com? You can try to mine alexa, google, altavista, etc for data, but they only limit you to 1,000 results. T3report does its own independent spidering.
Brandon wrote:
“Have you seen T3Report.com? You can try to mine alexa, google, altavista, etc for data, but they only limit you to 1,000 results. T3report does its own independent spidering.”
I just found this last night, well at about 3am actually! I haven’t had a chance to really take a look at it, but I certainly will very soon. Thanks!