Perils of Blog Scrapers

There are several different types of website and blog content scrapers, and they operate in many different ways.  Some are harmless and some could be harmful to your website's health.

Some, but probably not all, of the types:

  1. Copy and paste all your content onto their site with no credit
  2. Copy and paste an unreasonably large amount of content onto their site with no permission but with a link to your site as attribution
  3. Copy a paragraph of your content (or maybe a few sentences) and add a link to your article, saying something like -> “Here’s another great article about botox complications and hemorrhoids!”
  4. Running a short syndicated feed of your material (headline, excerpt and link to your article)
  5. Building an entire news or blog site dedicated to doing number 4 with all the news and blog sites in the world (think Google News or Google Blog Search)
    1. Note: I am not passing judgement on Google for this behavior.  Google has been sued for this several times.
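
The types above differ mainly in how much of your article ends up on the other site. As a rough illustration only, here is a minimal Python sketch that estimates that share by looking for runs of words the two texts have in common. The function name `copied_fraction` and the `min_run` cutoff are my own assumptions for this example, not part of any WordPress tool, and a real check would need to fetch the pages and strip their HTML first.

```python
from difflib import SequenceMatcher


def copied_fraction(original: str, suspect: str, min_run: int = 3) -> float:
    """Estimate the share of words in `original` that reappear in `suspect`.

    Only runs of at least `min_run` consecutive matching words are counted,
    so isolated common words such as "the" do not inflate the result.
    """
    orig_words = original.lower().split()
    susp_words = suspect.lower().split()
    if not orig_words:
        return 0.0

    # autojunk=False keeps difflib from discarding frequent words on long texts.
    matcher = SequenceMatcher(None, orig_words, susp_words, autojunk=False)
    matched = sum(
        block.size
        for block in matcher.get_matching_blocks()
        if block.size >= min_run
    )
    return matched / len(orig_words)


if __name__ == "__main__":
    mine = "the quick brown fox jumps over the lazy dog"
    theirs = "great post the quick brown fox jumps here"
    print(f"{copied_fraction(mine, theirs):.0%} of the article was reused")
```

A value near 1.0 suggests a type 1 or type 2 scraper, while a small value is closer to the excerpt-and-link cases. Where to draw the line is a judgement call.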

Any of the above (not counting Google News and the syndicated feed) can trigger a trackback in WordPress: a comment containing a link that points to the permalink of the page that referenced your site.

If you allow that comment to go through, you are allowing a nofollow link to point to that website.  You are not passing link juice, but nofollow links are indexed, and you could be associating your site with a blacklisted neighborhood on the internet.  This could be very bad for your website's health.

I recommend marking scraped trackback comments as spam, just to be safe.  At a minimum, remove the link from their trackback, and maybe just show their URL in text form.
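
To make the "show their URL in text form" idea concrete, here is a minimal Python sketch that flattens every link in a comment into plain text followed by the address in parentheses. It uses only the standard library's `html.parser`. The names `LinkStripper` and `strip_links` are hypothetical helpers written for this example; this is not how WordPress itself processes comments, and it drops all other markup in the comment as well.

```python
from html import escape
from html.parser import HTMLParser


class LinkStripper(HTMLParser):
    """Rebuild comment text with every <a> element flattened to plain text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []   # pieces of the cleaned comment
        self.href = None  # address of the link currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.href = dict(attrs).get("href")
        # Every other tag is dropped, so only text survives.

    def handle_endtag(self, tag):
        if tag == "a":
            if self.href:
                # Show the address as text instead of a clickable link.
                self.parts.append(f" ({escape(self.href)})")
            self.href = None

    def handle_data(self, data):
        self.parts.append(escape(data))


def strip_links(comment_html: str) -> str:
    """Return the comment with links replaced by 'text (url)'."""
    parser = LinkStripper()
    parser.feed(comment_html)
    parser.close()
    return "".join(parser.parts)


if __name__ == "__main__":
    trackback = '<a href="http://example.com/post">Great article</a> about scrapers'
    print(strip_links(trackback))
```

Because the address is no longer a hyperlink, search engines have nothing to follow, but a curious reader can still see where the trackback came from.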

Consider your readers' perspective.  If they come to your site first and then click through to a scraper site from a link in your comments:

  1. You are losing readers to a scraper site
  2. Your site's credibility is harmed in the eyes of the reader, as a crank website is pointing at you
  3. They may see something very vile on the other end, or even pick up a computer virus, if the scraper's goal is not to steal your content but to use it to lure people away and give their computers something nasty!

 

Other Articles that I read before writing this article about content scrapers

  • When Scarfing Content Strikes Back - I wrote this back in October of 2007. 
    • I first noticed this issue when writing an article on a satire site.  I happened to include the word botox in a derogatory way about Anna Nicole Smith or something like that.  I then immediately got a pingback from a site dedicated to the B injection and started looking into the issue.  I have seen it on sites ranging from diet pills to SEO to Webkinz and politics.
  • Getting Value Out of Scrapers - (I do not agree with this article, but do like the idea of including a link in your feed back to your own site.) dated 1-10-2008
  • Fighting Scrapers and Using them to Our Advantage  dated 1-10-2008
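
The idea of including a link back to your own site in your feed can be sketched in a few lines of Python. This is only an illustration under my own assumptions: `add_attribution` is a hypothetical helper, and the title, permalink and site name are whatever your feed generator already knows about each item. WordPress users would more likely do this with a feed plugin.

```python
from html import escape


def add_attribution(item_html: str, title: str, permalink: str, site_name: str) -> str:
    """Append a 'originally appeared on' footer with a link to a feed item body."""
    footer = (
        f'<p>"{escape(title)}" originally appeared on '
        f'<a href="{escape(permalink, quote=True)}">{escape(site_name)}</a>.</p>'
    )
    return item_html + "\n" + footer


if __name__ == "__main__":
    body = "<p>There are several types of content scrapers.</p>"
    print(add_attribution(body, "Perils of Blog Scrapers",
                          "http://example.com/perils", "My Blog"))
```

If a scraper republishes the feed item verbatim, the footer and its link travel with it.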

4 Responses to “Perils of Blog Scrapers”

  1. Tee Says:

    Thanks, Brett. So let me know if I’m doing this right - usually my Akismet catches it as spam. I simply delete them. Is this the best way to go about it?

  2. Mrs. Mecomber Says:

    Therefore, I should remove the feature that enables the trackback link in my comments? I’m new to Wordpress, so I hope I am understanding the terminology.

  3. admin Says:

    Hi Tee,

    Yes, letting Akismet catch them and then deleting them is the safest way to go.

    If Akismet doesn’t catch them, then mark them as spam and delete them.

    As a halfway step, you could just edit the comment and remove the hyperlink. This would be safe for SEO, but I think it does your readers a disservice.

  4. admin Says:

    “Therefore, I should remove the feature that enables the trackback link in my comments? I’m new to Wordpress, so I hope I am understanding the terminology.”

    I do not think you have to do that. There are times when it is perfectly acceptable to enable trackback links. If you want your blog to be extremely automated, and not worry about this at all, you could turn them off. However, I do think that enabling them and leaving them on does help to network with other bloggers and to give your readers access to other related perspectives.

    At the end of the day, if you treat this like an editorial exercise, picking and choosing good trackbacks versus trackbacks that do not add any value to your readers’ experience, I think you will be happy with the result.
