Perils of Blog Scrapers

There are several types of website and blog content scrapers, and they engage in many different activities. Some are harmless, while others could harm your website's health.

Here are some, but probably not all, of the types:

  1. Copy and paste all of your content onto their site with no credit
  2. Copy and paste an unreasonably large amount of your content onto their site without permission, but with a link to your site as attribution
  3. Copy a paragraph (or maybe a few sentences) of your content with a link to your article, saying something like: “Here’s another great article about botox complications and hemorrhoids!”
  4. Run a short syndicated feed of your material (headline, excerpt, and link to your article)
  5. Build an entire news or blog site dedicated to doing number 4 with all the news and blog sites in the world (think Google News or Google Blog Search)
    1. Note: I am not passing judgement on Google for this behavior. Google has been sued over it several times.

Any of the above (not counting Google News and the syndicated feed) can trigger a trackback comment in WordPress: a comment containing a link to the permalink of the page that referenced your site.

If you approve that comment, you are allowing a nofollow link to point to that website. You are not passing link juice, but nofollow links are still indexed, and you could be associating your site with a blacklisted neighborhood on the internet. This could be very bad for your website's health.

To be safe, I recommend marking scraped trackback comments as spam. At a minimum, remove the link from the trackback and perhaps show their URL as plain text.
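If you handle comments outside the WordPress dashboard (for example, in an exported comments file), the "remove the link but keep the address" step can be sketched in Python. This is a hypothetical helper, not part of WordPress, and the names `LinkStripper` and `strip_trackback_links` are made up for illustration. It assumes the trackback body is simple HTML with ordinary `<a href="...">` tags and returns plain text in which each link is replaced by its bare URL.

```python
from html.parser import HTMLParser


class LinkStripper(HTMLParser):
    """Hypothetical helper: rebuilds comment text, swapping each <a> element for its bare URL."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.in_link = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            self.parts.append(href)  # show the address as plain text
            self.in_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False

    def handle_data(self, data):
        # Drop the clickable anchor text; keep all other text.
        # Tags other than <a> are discarded, so the result is plain text.
        if not self.in_link:
            self.parts.append(data)


def strip_trackback_links(comment_html: str) -> str:
    """Return the comment as plain text with every hyperlink reduced to its URL."""
    parser = LinkStripper()
    parser.feed(comment_html)
    parser.close()
    return "".join(parser.parts)
```

For example, `strip_trackback_links('<a href="http://scraper.example/post">Scraper Post</a> Great article')` returns `'http://scraper.example/post Great article'`. Because the output is plain text, escape it (for instance with `html.escape`) before placing it back into an HTML page.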

Consider your readers' perspective. If they come to your site first and then click a link in your comments that leads to a scraper site:

  1. You are losing readers to a scraper site.
  2. Your site's credibility suffers in the reader's eyes, because a crank website is pointing at you.
  3. They may see something vile on the other end, or even pick up a computer virus, if the site is scraping not to steal your content but to use it as bait to lure people away and infect their computers!


Other articles about content scrapers that I read before writing this one

  • When Scarfing Content Strikes Back - I wrote this back in October of 2007. 
    • I first noticed this issue when writing an article on a satire site. I happened to use the word botox in a derogatory way about Anna Nicole Smith, or something like that. I immediately got a pingback from a site dedicated to the B injection and started looking into the issue. I have since seen it on sites covering everything from diet pills to SEO, Webkinz, and politics.
  • Getting Value Out of Scrapers, dated 1-10-2008 (I do not agree with this article, but I do like its idea of including a link back to your own site in your feed.)
  • Fighting Scrapers and Using Them to Our Advantage, dated 1-10-2008


4 Responses to “Perils of Blog Scrapers”


  1. Thanks, Brett. Let me know if I’m doing this right: usually Akismet catches them as spam, and I simply delete them. Is this the best way to go about it?

  2. So should I remove the feature that enables the trackback link in my comments? I’m new to WordPress, so I hope I am understanding the terminology.

  3. admin

    Hi Tee,

    Yes, letting Akismet catch them and then deleting them is the safest way to go.

    If Akismet doesn’t catch them, mark them as spam and delete them.

    As a halfway step, you could edit the comment and remove the hyperlink. This would be safe for SEO, but I think it does your readers a disservice.

  4. admin

    Therefore, I should remove the feature that enables the trackback link in my comments? I’m new to WordPress, so I hope I am understanding the terminology.

    I do not think you have to do that. There are times when it is perfectly acceptable to enable trackback links. If you want your blog to be extremely automated and not worry about this at all, you could turn them off. However, I do think that leaving them enabled helps you network with other bloggers and gives your readers access to other related perspectives.

    At the end of the day, if you treat this as an editorial exercise, picking good trackbacks and rejecting those that add no value to your readers' experience, I think you will be happy with the result.
