Is Yahoo Under the Influence of TrustRank?

by Kim Roach

As you travel the vast world of search results among Google, Yahoo, and MSN, you are likely to run into junk pages at some point in your journey. Although the search engines work daily to improve their results, spammers are working just as hard to slip through the cracks.

As you know, Google’s algorithm is light years ahead of those of Yahoo and MSN. However, Yahoo has been implementing a number of changes to keep up. One of these is the recent filing of a patent application called “Link-based spam detection” (http://tinyurl.com/f539j). This patent application details Yahoo’s ideas on how to reduce the massive amount of web spam that litters the search engines.

The search engines are well aware that there are spammers who would like nothing more than to trick them in any way possible. Yahoo’s patent application acknowledges as much, stating:

“Since top positions (high ranking) in a query result list may confer business advantages, authors of certain Web pages attempt to maliciously boost the ranking of their pages. Such pages with artificially boosted ranking are called “web spam” pages and are collectively known as “web spam.””

In fact, the Yahoo patent application even describes many of the spam techniques that are in use today.

Little has been said about this new patent application. I am sure that if Google had filed a new patent, there would have been massive coverage of it. However, as webmasters, we should not ignore the other search engines, even if they are minor players. This new patent application reveals important trends that should not be overlooked.

Before I begin, keep in mind that Yahoo does not necessarily use these techniques. It has simply filed a patent application, which gives us some good indications of what it may have planned for the future.

Within this document, Yahoo has outlined a system to cut down on web spam. The authors propose a technique to semi-automatically separate good, quality sites from spam sites. This is achieved through an algorithm that detects spam farms with the help of PageRank and TrustRank.

Interestingly enough, both of these terms are trademarks of Google. Although the same terms are used, the application of these algorithms is probably somewhat different. Here is how Yahoo’s patent application defines each term:

“PageRank is a family of well known algorithms for assigning numerical weights to hyperlinked documents (or web pages or web sites) indexed by a search engine. PageRank uses link information to assign global importance scores to documents on the web.[…]. The PageRank of a document is a measure of the link-based popularity of a document on the Web.

TrustRank is a link analysis technique related to PageRank. TrustRank is a method for separating reputable, good pages on the Web from web spam. TrustRank is based on the presumption that good documents on the Web seldom link to spam. TrustRank involves two steps, one of seed selection and another of score propagation. The TrustRank of a document is a measure of the likelihood that the document is a reputable (i.e., a nonspam) document.”

This is not the first time that Yahoo has thought about TrustRank. In 2004, Yahoo co-authored a research paper with Stanford University entitled “Combating Web Spam with TrustRank” (http://dbpubs.stanford.edu:8090/pub/2004-52).

This paper has many of the same theories as the new Yahoo patent application. Both use a semi-automated system for determining whether a page is reputable or spam. Some human intervention is required in order to pick out a set of reputable seed pages. The algorithm then uses this set of seed pages and rates other pages based on their interlinking pattern with the trusted seed pages.

However, the 2004 paper did not give details on how this would take place. With the release of Yahoo’s new patent application, we are given a glimpse at one possible approach. Unfortunately, the explanation is way beyond my technical and mathematical abilities.

The basics, on the other hand, are pretty easy to understand. For example, let’s say that a particular web site has been determined to be a reputable web site. If you acquire a link from this site, your web site would then be given a higher TrustRank because you are closely associated with the reputable site.

The further out a web site is within the linking structure, the lower the TrustRank it would receive. Basically, according to Yahoo’s proposed mechanism, the link structure of reputable web sites can be used to discover other sites that are likely to be reputable.
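To make the idea concrete, here is a minimal sketch of trust propagation in Python. It is not the method in Yahoo’s patent application, whose details are not covered here. It simply follows the two steps the patent text names (seed selection and score propagation), using a PageRank-style update in which all trust originates at the seed pages. The function name trust_scores, the damping value of 0.85, and the tiny link graph with its example domains are all assumptions made up for illustration.

```python
def trust_scores(links, seeds, damping=0.85, iterations=50):
    """Propagate trust from hand-picked seed pages along outbound links.

    links maps each page to the list of pages it links to.
    seeds is the set of pages a human reviewer marked as reputable.
    The damping value is an illustrative assumption, not Yahoo's setting.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    # Seed selection: all trust starts on the seeds, split evenly among them.
    seed_share = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(seed_share)
    # Score propagation: each page passes a damped share of its trust
    # to the pages it links to, divided equally among its links.
    for _ in range(iterations):
        updated = {p: (1.0 - damping) * seed_share[p] for p in pages}
        for source, targets in links.items():
            if not targets:
                continue  # a page with no outbound links passes nothing on
            share = damping * trust[source] / len(targets)
            for target in targets:
                updated[target] += share
        trust = updated
    return trust


# Hypothetical link graph: a chain of sites leading away from one trusted
# seed, plus two spam pages that link only to each other.
links = {
    "trusted-seed.example": ["partner-site.example"],
    "partner-site.example": ["distant-site.example"],
    "distant-site.example": [],
    "link-farm.example": ["spam-page.example"],
    "spam-page.example": ["link-farm.example"],
}
seeds = {"trusted-seed.example"}

scores = trust_scores(links, seeds)
for page, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{page:24s} {score:.4f}")
```

In this toy graph the seed ends up with the highest score, each step further along the chain scores a little lower, and the two link-farm pages stay at zero because no trusted page links to them. In a real web graph, a page linked from several trusted pages could still outscore a page closer to a seed, so distance is only part of the picture.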

What does this mean for your web site?

This is just one more attempt to improve the relevancy of search results. This time the idea is centered on detecting links from link farms and other shady sources. Staying in the search engines’ good books is becoming increasingly important.

It is crucial that you obtain inbound links from quality, authoritative sites and avoid disreputable junk sites at all costs. Focus on organic link building and link to high-quality sites that are beneficial to you and your web site visitors. Services that offer instant link exchanges may look good on the surface, but they could very well cause damage in the long run.

The search engines are getting smarter every day. Fortunately, we don’t have to. The search engines have always been looking for the same thing: good quality content. As long as you fill your site with good content and follow some basic search engine optimization principles, you should do well.

About the Author

Kim Roach is a staff writer and editor for the SiteProNews and SEO-News newsletters. You can contact Kim at: kim @ seo-news.com