What Is Cloaking and Why Is It Considered Black Hat SEO or Spam?


Definition and explanation of Cloaking

Cloaking comes from the English verb “to cloak”, which means to mask, camouflage, hide, or cover up. Cloaking is a method of showing the search engine robot an optimized page instead of the one a human visitor would see. Sometimes the pages served to the robot are not just optimized but completely different from the originals. For instance, the robot is fed a page containing a list of popular music tracks in MP3 format, while a person visiting the same site is offered a CD from an online CD store.

This is done to lure ordinary Internet visitors into the CD store. These visitors were looking for links to listen to and download the latest songs in MP3 format, not for the online CD store they end up in. This defeats the point of a search engine: you are promised results that match your search, but when you land on the page where you expect them, they are not there, and you are offered something to buy instead. Yeah I know, I know:)

Cloaking is done with software and scripts that execute on the web server. Server-side scripts generate the outgoing page depending on changing parameters:

  • parameters in the request URL
  • environment variables exposed by the web server and the platform
  • other request parameters, such as the HTTP headers

Each request carries information that reveals who is sending it, a robot or a regular visitor, and the script builds the resulting page accordingly. Sadly enough, plain HTML or JavaScript is not going to work for cloaking (just in case there are some cool guys reading this:) At first it seems impossible to “catch” cloaking, since you need to be a regular visitor to see the original site and a robot to see the optimized page. I have to disappoint you: there is a way to determine whether cloaking is being done, even if you are not working for Google. You just need to pay attention from here on:)

So how does a script determine whether a request comes from a search engine bot or from a regular visitor? There are two ways:

  • To check the User-agent field
  • To check the IP-address, where the request is coming from

User-agent Cloaking

This is the simplest method. It is based on checking the User-agent header of the request, which in most cases carries the robot’s name and some additional data. Below are examples of User-agent strings for some search engine bots:

  • Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
  • Yandex/1.01.001 (compatible; Win16; I)
  • Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)

The script that does the “page substitution” keeps a list of these strings, or even just substrings of them. If one of the substrings appears in the User-agent, the script serves the optimized page, and it can even serve each bot a different page optimized just for that bot. This takes a little more time, since the robots are different, but it does deliver more specialized results.
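The substring check described above can be sketched in a few lines of Python. This is only an illustration of the matching step, the same one an analytics script would use to tell crawler traffic from human traffic. The `BOT_TOKENS` table and the `identify_bot` name are made up for this example, and the tokens are simply lifted from the User-agent examples listed earlier.

```python
# Hypothetical lookup table: lowercase substring -> short label.
# The tokens come from the example User-agent strings in this post.
BOT_TOKENS = {
    "googlebot": "google",
    "yandex": "yandex",
    "yahoo! slurp": "yahoo",
}


def identify_bot(user_agent):
    """Return a label if the User-agent contains a known bot token, else None."""
    ua = (user_agent or "").lower()  # treat a missing header as an empty string
    for token, label in BOT_TOKENS.items():
        if token in ua:
            return label
    return None


if __name__ == "__main__":
    print(identify_bot("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
    print(identify_bot("Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"))
```

Keep in mind that anyone can send any User-agent string they like, so this check only tells you what the client claims to be.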

This type of cloaking is pretty easy to “catch”. Yes, you guessed it right: you can use special software, written by you or by someone else, that pretends to be a search engine bot by sending the bot’s User-agent, and there you have it… An optimized page instead of the original one.
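Here is a minimal sketch of such a checker, using only the Python standard library. It downloads a page twice, once with a browser-like User-agent and once with Googlebot’s, and compares the two bodies. The function names, the made-up `BROWSER_UA` string, and the 0.5 similarity threshold are all assumptions chosen for illustration, not a tested detection rule.

```python
import difflib
import urllib.request

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"  # made-up browser string


def fetch_as(url, user_agent, timeout=10):
    """Download a page while sending the given User-agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def similarity(page_a, page_b):
    """Return a ratio between 0.0 (nothing in common) and 1.0 (identical)."""
    return difflib.SequenceMatcher(None, page_a, page_b).ratio()


def looks_cloaked(visitor_page, bot_page, threshold=0.5):
    """Flag the pair when the two versions differ more than the (assumed) threshold allows."""
    return similarity(visitor_page, bot_page) < threshold
```

You would call it as `looks_cloaked(fetch_as(url, BROWSER_UA), fetch_as(url, GOOGLEBOT_UA))`. Pages with rotating ads or timestamps differ a little on every request anyway, so treat a low score as a reason to look at both versions by hand, not as proof.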

IP-address Cloaking

This is a more advanced approach, based on checking the IP address of the sender. You need to be in the top five coolest guys in the class to do this type of cloaking. An IP address is almost impossible to fake (some day I will dive deeper into this stuff, if enough people are interested). Knowing the corporate subnets (address ranges) of the search engines, you can serve the optimized, tailored page to all web clients from those subnets. In this case, even a search engine employee with a regular browser will see the same optimized page as the robot, because both have IP addresses from the same subnet. People with IP addresses outside those ranges will see a different page, in our example the online CD store.
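The subnet test itself is simple. Below is a rough sketch using Python’s standard `ipaddress` module. The two networks are placeholder blocks reserved for documentation (RFC 5737), not real search engine ranges, and `CRAWLER_NETWORKS` and `is_from_crawler_network` are names invented for this example.

```python
import ipaddress

# Placeholder ranges reserved for documentation (RFC 5737).
# They are NOT real crawler networks; real ranges would have to come from the search engine.
CRAWLER_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_from_crawler_network(remote_addr):
    """Return True if the address falls inside one of the listed networks."""
    try:
        address = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False  # not a valid IP address at all
    return any(address in network for network in CRAWLER_NETWORKS)


if __name__ == "__main__":
    print(is_from_crawler_network("192.0.2.17"))
    print(is_from_crawler_network("203.0.113.5"))
```

The same check is useful on the honest side of the fence too, for example to verify that a request claiming to be a crawler really comes from that crawler’s network before you trust it in your log statistics.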

I have to disappoint you again: this method is also pretty easy to catch. That same search engine employee can request the page via a proxy server in another address range. It can be an anonymous proxy located somewhere in Sweden or Portugal, and there are many sites like webwraper.net that provide such services. An ordinary visitor can simply look at the cached copy of the page in the search engine’s database (many search engines keep such copies today). Pay attention to the dates when the page was changed and when it was indexed, since it is possible to confuse cloaking with another approach: swapping.

Swapping is done right after the page has been indexed. It basically means replacing part or all of the page code and content after indexing, once the page has reached the top of the search results. The substituted page does not even remotely resemble the original one. Swapping is meaningless for Google, since Google updates its index very frequently. However, there are many popular engines out there that re-index pages at long intervals, so swapping remains a relevant black hat or gray hat approach for them.

Combined approach

There is an even cooler way to do this. Some bad boys use both approaches at the same time: first they check which IP address range the request comes from, then their script checks the User-agent data.

What if this is not spam?

If your goal is not to fool the search engine but to actually help your visitors (yes, believe it or not, those guys exist too), serving different content is not spam. Here are some examples:

  • showing the visitor the same page in a different language (the visitor’s preferred one, retrieved from the visitor’s browser)
  • redirecting the visitor to a mirror site depending on the visitor’s location
  • showing a page optimized for a certain version of the visitor’s browser for better compatibility, and so on

Google is actually a step ahead in this matter: when you enter www.google.com, it redirects you to the matching localized version of the page, depending on your language and regional settings.
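The language case is the easiest one to picture. Browsers send their preferred languages in the `Accept-Language` request header, and a server can pick the best version it has. What follows is a simplified sketch, not a full implementation of the HTTP rules. The `parse_accept_language` and `choose_language` names, the set of available languages, and the English default are assumptions made for this example.

```python
def parse_accept_language(header):
    """Return the language tags from an Accept-Language header, most preferred first."""
    weighted = []
    for part in header.split(","):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0  # a tag without an explicit q-value counts as 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # ignore tags with a malformed weight
        weighted.append((quality, tag.strip().lower()))
    # list.sort is stable, so tags with equal weight keep their header order
    weighted.sort(key=lambda item: item[0], reverse=True)
    return [tag for quality, tag in weighted if quality > 0]


def choose_language(header, available, default="en"):
    """Pick the first preferred language we actually have a page for."""
    for tag in parse_accept_language(header or ""):
        if tag in available:
            return tag
        primary = tag.split("-")[0]  # "sv-se" falls back to "sv"
        if primary in available:
            return primary
    return default


if __name__ == "__main__":
    print(choose_language("sv-SE,sv;q=0.9,en;q=0.8", {"en", "sv", "pt"}))
```

The point of the design is that every visitor, human or robot, goes through the same rule. The choice depends on what the client asks for, not on who the client is.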

Note to self:

When using cloaking or similar methods, it’s important to remember that what gets you banned is obvious cloaking done in an attempt to manipulate or fool the search engine robots. If you are doing this to make your visitors happy, then you shouldn’t get banned.

I have to give away a little secret about search engine robots. A search engine bot will receive only one of all possible versions of the document. For instance, when a site gives out different language versions, Googlebot will most likely see the English version and will not know about the versions in other languages, while bots crawling in other languages will not know about the English version. This opens the door to more sophisticated ways of doing things that we will not talk about in this topic:)

Beck @ ProfitSEO.com
