Feb 24

Footprints

No, I’m not talking about footprints in the sand. Nor is this some CSI-type topic, although some might find it related, because this article is about forensic website research.

More specifically, you’ll find out how Google, Yahoo, MSN, and even your competitors find your networks.

So - first off - what is a footprint?

Well, quite simply, it’s nothing more than an easy way to identify your site(s).

For instance, years ago, when Traffic Equalizer was a popular piece of site-generation software, it came with a default template.

I don’t have the template anymore, so this is from memory, but there was one piece of text that appeared on every page Traffic Equalizer generated. It was something along these lines:

You have reached xxx.com - your best resource for finding information on xxx.

Now, if you went to Google and searched for ‘your best resource for finding information’, you would turn up a plethora (literally thousands) of sites that used the default Traffic Equalizer templates.

It’s this type of footprint that makes it very, very easy to target a web page.

Now, there has been quite a bit of discussion over the years about actual HTML or JavaScript code being part of a footprint for a site.

While this is technically possible, I believe it has never actually been done. I’ve run various tests, and I’ve never seen a site get discovered because of a common HTML template.

In my experience it has always been through text that is common to every page.

Now, having said that, text is easy to detect with an algorithm. Put another way, it’s trivial to write a bit of software that finds a set of related sites by looking for a specific snippet of text they all share.
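To make that concrete, here is a minimal sketch of how such software might work, assuming a simple technique called shingling: break each page into overlapping five-word sequences and report any sequence that shows up on several different sites. The function names, the sample domains, the five-word window, and the three-site threshold are all hypothetical choices for illustration, not a description of how Google or anyone else actually does it.

```python
import re
from collections import defaultdict


def shingles(text, n=5):
    """Return the set of overlapping n-word sequences ("shingles") in a page."""
    words = re.findall(r"\w+", text.lower())  # lowercase words, punctuation dropped
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def common_footprints(pages, n=5, min_sites=3):
    """Return {shingle: set_of_sites} for shingles found on at least min_sites sites."""
    sites_by_shingle = defaultdict(set)
    for site, text in pages.items():
        for s in shingles(text, n):
            sites_by_shingle[s].add(site)
    return {s: sites for s, sites in sites_by_shingle.items() if len(sites) >= min_sites}


# Hypothetical pages built from a default template like the one described above,
# plus one unrelated page that should not be flagged.
pages = {
    "widgetworld.com": "You have reached widgetworld.com - your best resource for finding information on widgets.",
    "gadgetguide.net": "You have reached gadgetguide.net - your best resource for finding information on gadgets.",
    "toolhub.org": "You have reached toolhub.org - your best resource for finding information on tools.",
    "recipes.example": "Slow-cooked beans need patience, a heavy pot and plenty of time on low heat.",
}

footprints = common_footprints(pages)
for phrase in sorted(footprints):
    print(phrase, "->", sorted(footprints[phrase]))
```

On this sample, the three template sites share the phrases "your best resource for finding", "best resource for finding information" and "resource for finding information on", while the recipe page is never flagged. Changing the domain name and topic in each page is not enough to break the shared phrases.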

It’s also pretty easy for Google and other search engines to detect a network of sites based on its outgoing and incoming links.

In fact, what webmasters often do is link one of their sites from another one they own. To reduce this footprint, they’ll try linking in a circle, using a hub linking structure, using a three-way linking structure, or something else.

Most of these methods are easily detectable by the search engines. In fact, they even have a name for it - a spam island. It’s just a collection of sites that are all linking to each other in some way.
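Here is a minimal sketch of how a spam island might be spotted in a link graph. It rests on a deliberately simple definition, which is an illustrative assumption rather than a documented search-engine rule: a group of at least three sites that can all reach one another through their links (a strongly connected component, found here with Tarjan's algorithm) and that receives no links from any site outside the group. The site names, `find_spam_islands`, and its thresholds are hypothetical.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm: return a list of SCCs (each a set of sites)."""
    index, low = {}, {}
    stack, on_stack = [], set()
    sccs = []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component: pop it off the stack
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            sccs.append(component)

    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    for v in sorted(nodes):
        if v not in index:
            visit(v)
    return sccs


def find_spam_islands(graph, min_size=3, max_outside_inbound=0):
    """Flag mutually linked groups that (almost) nobody outside the group links to."""
    islands = []
    for scc in strongly_connected_components(graph):
        if len(scc) < min_size:
            continue
        # Count links that point into the group from a site outside it.
        inbound = sum(
            1
            for src, targets in graph.items()
            if src not in scc
            for t in targets
            if t in scc
        )
        if inbound <= max_outside_inbound:
            islands.append(scc)
    return islands


# Hypothetical link graph: site -> sites it links to.
graph = {
    "a.example": ["b.example"],              # a circle of three sites...
    "b.example": ["c.example"],
    "c.example": ["a.example", "wiki.org"],  # ...that also links out to an authority
    "news.org": ["wiki.org"],
    "wiki.org": ["blog.net"],
    "blog.net": ["news.org"],
    "reader.io": ["news.org"],               # an outside site linking in
}

islands = find_spam_islands(graph)
print(islands)
```

Both groups of three link in a circle, but only the `.example` ring is flagged: the `news.org` group receives links from `c.example` and `reader.io`, while nothing outside the ring links into it. The recursive `visit` is fine for a small example; a crawl-sized graph would need an iterative version to stay within Python's recursion limit.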

So…they can detect you through text…they can detect you through links…they could (but probably do not) detect you through HTML or JavaScript.

How do you avoid detection?

Well, unlike the other guys out there…I’m going to put it in black and white for you.

You Will Not Escape Detection!

That’s right. Whatever little trick you do to escape detection is not going to work in the long haul.

That’s why the term ‘churn and burn’ is often used with regard to black hat sites. We know they’re only going to last a month, two months, or maybe six months if we’re lucky.

Sure, there will be unusual circumstances where a site manages to last longer than that, but that’s the odd one out and, in my experience, not a regular occurrence.

The Dreaded Manual Review

Believe it or not, Google actually employs an army of web spam reviewers. I recently read that they have about 3,000 people on the spam team in Germany.

That’s a lot of people looking for spam sites. If your sites share a theme that looks the same, repeat the same text, or form part of a spam island, chances are that army is going to find you.

What’s a Spammer to Do?

So what’s a spammer gotta do to make some money?

Well, it’s about the quantity rather than the quality of the pages that you create.

As a matter of fact, in my testing I’ve discovered that the uglier the site is, the better my CTR is.

Over a year ago, I had a network of over 50,000 sites that generated an average CTR of 25%.

So, there you are cranking out the pages and you’re trying to avoid detection as much as you possibly can.

Some of the things you might do:

  • Use multiple domains.
  • Hide your WHOIS info - keep in mind, however, that Google is a registrar and can view the details anyway.
  • Make sure you have no common text on your pages.
  • Vary the number of links on your pages.
  • Vary the ratio of incoming to outgoing links on your pages.
  • Vary the size of your pages.
  • Vary the size of your sites.
  • Avoid the use of subdomains - it’s much easier to take an entire network of sites down when you’ve only got one domain. I made this mistake when I first started and went from $500 a day to $5 a day - overnight!
  • If you’re really paranoid, you can even insert random snippets of HTML/JavaScript. Heck, I know a few people who have gone so far as to rename their CSS classes.
  • Cloak your content - provide a legit looking page to Google and other search engines but provide something else to users. Keep in mind, however, that this isn’t foolproof.

Yep. There’s a ton of things you can do to help your sites last longer.

Every little thing that you do, however, is going to take more of your time and resources to implement.

So you’ll have to figure out if it’s worth the time, effort and money to implement each of those techniques or if you should keep it simple and let the sites go down quicker.

It’s a tradeoff that will vary from situation to situation.

One last thing before I end, however…Don’t take my word for it on Footprints! Go out there and do some testing yourself to find out if you get detected or not and how quickly.

There is a lot of misinformation out there - especially when it comes to Footprints. There are plenty of theories but, in my opinion, people resort to FUD (Fear, Uncertainty and Doubt) by confusing how the search engines could detect them with how the search engines actually do detect them.

G-Man

1 Response

  1. kiran says

    hi  thats very useful information, i would like to know is there any way of avoiding footprints / deleting  them ?

    April 4th, 2008




View Geoffrey 'G-Man' Faivre-Malloy's profile on LinkedIn
