SEO Matrices: Markov Chaining and Term Vector Models
2008 is upon us, and if I hear “Happy New Year” one more time I am going to scream! To take my mind off the trauma that was this New Year’s Eve, I thought I would rant a bit about SEO (yes, apparently I am going to rant now, yeah!).
I was once asked by a company I worked for to write down my “list” of what it takes to provide quality SEO, my best practice notes if you will. I realized how impossible it was to explain Markov Chaining and PageRank factors, latent semantic indexing (LSI), and the organization it takes to weight keyword density percentages across multiple points of bot scrutiny, both in on-page content and in incoming anchor text. It is an extensive list of requirements, and each SEO professional has developed their own best practice items that they bring to a natural search engine optimization project. I have developed a content matrix that I use to apply keyword density patterns that follow the Term Vector Model formulas and Markov Chaining principles, and that include LSI patterns of keyword spread on a web page, for heightened PR and keyword placement. Each site and each vertical is different… Pattern, density, application… Identify the pattern and you can beat the algorithms.
Markov Chaining and Term Vector Models
Google’s PageRank algorithm uses Markov Chaining to help calculate PR scoring.
Wikipedia describes the Markov chain behind PageRank as follows: “The PageRank of a webpage as used by Google is defined by a Markov chain. It is the probability to be at page i in the stationary distribution on the following Markov chain on all (known) webpages. If N is the number of known webpages, and a page i has k_i links then it has transition probability (1-q)/k_i + q/N for all pages that are linked to and q/N for all pages that are not linked to. The parameter q is taken to be about 0.15.”
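To make that quote a bit more concrete, here is a minimal Python sketch of the chain it describes. It is only an illustration under stated assumptions: the four-page link graph is made up, the function name `pagerank` and its parameters are my own, and the handling of pages with no outgoing links (a uniform jump) is an assumption the quote does not cover. It is not Google’s actual implementation.

```python
def pagerank(links, q=0.15, iterations=100):
    """Approximate the stationary distribution of the chain in the quote.

    links maps each page to the list of pages it links to. This is a
    hypothetical toy format: every link target must also appear as a key.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start from a uniform guess

    for _ in range(iterations):
        new_rank = {page: 0.0 for page in pages}
        for page in pages:
            targets = links[page]
            if targets:
                # With probability (1 - q), follow one of the k_i links.
                share = (1.0 - q) * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
                # With probability q, jump to any of the N pages.
                jump = q * rank[page] / n
            else:
                # Assumption (not in the quote): a page with no links
                # always jumps to a uniformly random page.
                jump = rank[page] / n
            for other in pages:
                new_rank[other] += jump
        rank = new_rank
    return rank


# Made-up four-page site: nothing links to "contact".
toy_links = {
    "home": ["about", "products"],
    "about": ["home"],
    "products": ["home"],
    "contact": ["home"],
}

for page, score in sorted(pagerank(toy_links).items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.4f}")
```

In this toy graph the page nobody links to ends up with only the q/N share, which is the point of the quote: rank flows along links, and a page with no incoming links gets only the random-jump baseline.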
Term Vector Modeling is an algebraic model for representing text documents. I am not going to get into all the details of TVM or vector space modeling, but I will say that when you combine the principles of Markov’s PageRank chain with the term weighting principles of Term Vector Modeling, within both your linking chain and your web page content, you get some pretty great results for SEO.
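For readers who want a feel for what “representing text documents algebraically” means, here is a rough Python sketch. It assumes the simplest possible weighting (raw term counts, split on whitespace) and compares two texts by the cosine of the angle between their vectors. The function names and sample phrases are mine, and real engines use far more elaborate weighting than this.

```python
import math
from collections import Counter


def term_vector(text):
    """Represent a text as a vector of raw term counts (word -> count)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term vectors (0.0 = no shared terms)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical page copy and anchor text, for illustration only.
page = term_vector("custom footwear and custom boots made to order")
anchor = term_vector("custom footwear")
print(round(cosine_similarity(page, anchor), 3))
```

The closer the score is to 1, the more the two texts share the same terms in the same proportions, which is one way to think about anchor text “matching” a page.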
Markov Chaining points to consider for optimal page rank
1) Anchor text of your incoming links (Use the keywords you actually want search placement for in the anchor text. The page the link points to should have a high density of those keywords within all points of bot scrutiny.)
2) Ensure keywords are applied within your H1, H2, etc. and meta content (yes, this means adding keywords in your description tag as well). The density of keywords in your incoming links’ anchor text and in your own site’s content should be balanced by a density percentage that you must pre-configure.
3) Anchor text and path nomenclature of your internal link structure should “match” for heightened relevancy of the “term”, to increase the likelihood of search placement for that term. (Do they match? Example: the anchor text is “Custom Footwear”, the path nomenclature is “http://www.yoursite.com/custom_footwear.php”, the meta title is “Custom Footwear”, the H1 on the page is “Custom Footwear”, and so on.)
4) Relevance is based on the density of semantic and LSI (latent semantic) words on your page. It’s not how MUCH content you have (and content density patterns change per vertical); it’s the keyword density, KD = tf_i / L_i, where tf_i = number of times a term i occurs in a document and L_i = total number of terms in the document. The keyword density of a 600-word document that repeats the term “footwear” 6 times is KD = 6/600 = 0.01 or 1%. It is also the WHERE: which region on the page or in the code the term is applied. Google also weights keywords from your page’s image nomenclature, folder nomenclature, video, document files, etc. (What are you naming your images? DOH!)
Tip on keyword page weighting: Some regions on the web page are given a higher relevancy uptick, or relevancy factor, for keyword weighting than others. Remember back in the early 2000s (pre Google’s Florida Storm) how we killed it with a HUGE H1 tag above the first table? NOW each vertical has different relevancy regions; look for them in your vertical.
5) It is not how MANY links are coming in, it is the QUALITY of the links (the amount of PR juice you get from them; PR juice depends on the linking site’s or page’s own factors). Spamming link techniques do not work.
6) One of my secrets for generating incoming links is finding a high PR site optimized for the “base root” term, across my entire site’s web page content, that I want a higher quality or relevancy factor for. So if the weighting or relevance needs to be higher for the term “custom” than for the term “footwear”, I gain incoming links to weight relevance according to the “term” that has the largest spread across additional keyword terms on my site. In essence, I want a heightened Markov chaining parameter from a high PR site that is optimized for the root term that I have weighted heavily using term vector weighting on my website’s pages.
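The keyword density arithmetic from point 4 is easy to check yourself. Here is a small Python sketch, assuming a crude tokenizer (lowercase runs of letters, digits and apostrophes) and a single-word term; the function name `keyword_density` and the filler text are mine, made up purely to reproduce the 600-word example.

```python
import re


def keyword_density(text, term):
    """KD = tf_i / L_i: occurrences of the term over total terms in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0  # avoid dividing by zero on an empty document
    return words.count(term.lower()) / len(words)


# Made-up 600-word document that uses "footwear" 6 times.
document = " ".join(["footwear"] * 6 + ["filler"] * 594)
print(f"{keyword_density(document, 'footwear'):.2%}")
```

This only covers the “how much” half of point 4. The “where” half (which region of the page or code the term sits in) would need the markup parsed as well, which this sketch does not attempt.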
A few of my suggestions for increasing your site’s search engine placement and/or PR
1) Use absolute links on all internal links of your website.
2) Use an organized code and on-page content matrix to organize the density or spread of your keywords across your website’s pages. (I cluster 5 root terms per page, with LSI thrown in, to achieve multiple keyword placement in the engines. Your percentage of keyword weighting depends on your document’s word count, and the formula should be determined by competitor evaluation. Keyword spread, meaning where you put the keywords, depends on the regions of bot scrutiny for your vertical.)
3) Use nofollow attributes on internal links to pages you want to stop throwing away PR juice at (C’mon, do you REALLY want your “contact us” page to take away PR juice from your home page?)
4) Google’s universal search has added social media to the Markov Chain mix: blogs, video, and press all hold PR weighting and should be considered. (Is your blog’s RSS feed in Technorati? Or Bumpzee?) Start being more social!
5) Per-page text clustering is an iterative process. Developing a site content matrix can help you apply multi-page agglomerative clustering techniques to achieve multi-page placement in natural search for thousands of keywords. Hire an SEO content specialist to develop your website content, press, blog posts, etc.
6) Garden walls: ensure all of the pages in your site are linked to internally or externally. If you have pages that are not linked to or from any source, stop the garden wall on these pages by creating a Google sitemap and actually loading it up in Google (Google Webmaster Tools can help).
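If you have never built a sitemap by hand, the file itself is simple. Below is a minimal Python sketch that writes the basic urlset/url/loc structure of the sitemaps.org format using only the standard library. The function name `build_sitemap` and the yoursite.com URLs are placeholders of mine, and optional fields such as last-modified dates are left out.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML listing each URL in a <url><loc> entry."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Placeholder URLs: list every page, including ones nothing links to.
pages = [
    "http://www.yoursite.com/",
    "http://www.yoursite.com/custom_footwear.php",
]
print(build_sitemap(pages))
```

Save the output as a file on your site (sitemap.xml is the usual name) and submit that address through your webmaster account so the orphaned pages can be found.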
Each site has its own keyword density or “weighting” formula. This formula is based on many factors that a search engine optimization professional should be able to identify and manage. Your content matrix needs to be refreshed monthly to trigger natural keyword search placement for specific pages on your site, or to help you outmaneuver your competitors’ placement. These factors are also influenced by the code your website is written in. (Certain code allows or disallows bot spidering: redirects, error pages, and so on. There are many things to consider. Do you need a .php explode script? Or mod rewrite? Are you running on .asp?)
Individual verticals run off of different algorithms in Google. Adjustments per vertical should be based on competitive evaluation of the sites that ARE in the top positions, compared with your own goals for page placement or PageRank. Pull backlink reports and placement reports for your competitors to determine how they achieved search placement. Companies that are interested in a “best of breed” competitive analysis tool for SEO need to have Syntryx.com.
So how well does all this work?
Well, I launched my company website a couple of months ago and was able to achieve top placement for most of my terms within 8 days (THROW out the theory that older domains place better, please!), including a nice placement for the term “Search Management”: out of 135,000,000 other websites in Google, I am in the top 15! Yup, it works.
You do not have to understand Term Vector Model formulas to do SEO. You may, like me, simply have a knack for identifying patterns. It’s all in the code. Just remember to leave a good crumb trail for the ants and some sugar for the spiders!
Happy SEO’ing and sigh, Yes! – Happy New Year!
