Unoriginal Content

February 15th, 2007

I’ve seen two particularly strange examples of plagiarism recently. Actionscript Hero’s blog is being reproduced on several domains. I figured they were simply caching his site, but they’re actually listing his site’s IP address under their domain names. The only purpose I can see is to enhance the domain through quality content (which aSH provides) and then swap it out later with a spam site. This cannot stand.

I offer this advice to aSH and anybody else suffering from similar predicaments. There’s always ye olde .htaccess.

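# Redirect any request whose Host header is not the real domain
RewriteEngine On
RewriteCond %{HTTP_HOST} !^(www\.)?somedomain\.com$ [NC]
# Leave local development requests alone
RewriteCond %{HTTP_HOST} !^(127\.0\.0\.1|localhost)$ [NC]
RewriteRule ^(.*)$ http://www.somedomain.com/$1 [R=permanent,L]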

I can’t personally test this (still on a shared host, believe it or not), but unless I’m wrong this will redirect any request for a hostname other than somedomain.com or www.somedomain.com to www.somedomain.com. There’s also an extra condition that exempts the standard localhost and 127.0.0.1 loopback addresses. You might want to add another condition for people who access your site directly by its IP address as well. This is what it will do…

http://www.plagiarist.com/blog/really-cool-post -> http://www.creator.com/blog/really-cool-post
http://www.shadycharacter.com/about-me -> http://www.coolperson.com/about-me
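
The extra condition for visitors who reach the site directly by IP address might look something like this. It's only a sketch: 203.0.113.10 is a made-up documentation address standing in for your server's real IP, and like the rest of these rules it is untested here.

```apache
RewriteEngine On
RewriteCond %{HTTP_HOST} !^(www\.)?somedomain\.com$ [NC]
RewriteCond %{HTTP_HOST} !^(127\.0\.0\.1|localhost)$ [NC]
# Hypothetical server IP; replace 203.0.113.10 with your own address
RewriteCond %{HTTP_HOST} !^203\.0\.113\.10$
RewriteRule ^(.*)$ http://www.somedomain.com/$1 [R=permanent,L]
```

If the site is served on a non-standard port, the Host header will carry a port suffix such as :8080, and each pattern would need to allow for it.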

You could also make the page redirect to a special page which explains what’s going on to any confused readers, further depriving the content-stealing site of any credibility.
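
A variation that sends every request for a foreign hostname to one explanatory page could be sketched like this. The page name /copied-content.html is hypothetical; it would have to exist on the real site.

```apache
RewriteEngine On
RewriteCond %{HTTP_HOST} !^(www\.)?somedomain\.com$ [NC]
RewriteCond %{HTTP_HOST} !^(127\.0\.0\.1|localhost)$ [NC]
# /copied-content.html is an assumed page explaining the situation
RewriteRule ^ http://www.somedomain.com/copied-content.html [R=302,L]
```

The redirect lands on www.somedomain.com, which fails the first condition, so the rule shouldn't loop. A temporary (302) redirect is used here so browsers don't remember it once the problem goes away.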

There’s another, more old-fashioned form of plagiarism that was inflicted upon Aral Balkan. Some of his better posts were copied directly from his blog onto someone else’s blog. The offender was stupid enough to hotlink the images from Aral’s site (it looks like he wasn’t aware of what he was doing; everybody makes mistakes), so Aral replaced them with images of the plagiarist proclaiming his lack of original thought. I applaud Aral for the restraint he showed in not replacing the images with something less… polite.

Simple referer-blocking will prevent most attempts at hotlinking, although I know from personal experience that it isn’t foolproof. There are ways around it; I’m not sure what they are, but I’ve experienced extreme forms of this mysterious event. If anyone links to the plagiarist’s blog, please make absolutely sure the rel="nofollow" attribute is included so his page rank won’t be improved.
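
For reference, a common shape for referer-blocking in .htaccess is sketched below. It assumes the real site is somedomain.com and that only the listed image extensions need protecting. The Referer header is supplied by the client and can be left out or forged, which is one reason this is never airtight.

```apache
RewriteEngine On
# Let through requests with no Referer at all (direct visits, some proxies)
RewriteCond %{HTTP_REFERER} !^$
# Let through requests that come from pages on the real site
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?somedomain\.com(/|$) [NC]
# Refuse any other request for an image with 403 Forbidden
RewriteRule \.(gif|jpe?g|png)$ - [F,NC]
```

To serve a substitute image the way Aral did, the last rule could rewrite to a replacement file instead of returning Forbidden, as long as that file is excluded from the rule.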

I have one last tip. If there’s a situation where a site is being cached on another site, find a way to include the IP address of the person accessing your site on every page. When the content stealer updates the cache, his IP address will be included in the HTML on his site. Block that IP permanently with this .htaccess code…

order deny,allow
deny from <IP address>
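
The Order and Deny directives above are the Apache 2.2 style. On Apache 2.4, where access control moved to the Require directive, a rough equivalent would be the following. The address 203.0.113.7 is a placeholder for the offender's IP, and the host has to permit these directives in .htaccess.

```apache
<RequireAll>
    # Allow everyone...
    Require all granted
    # ...except the offending address (placeholder IP)
    Require not ip 203.0.113.7
</RequireAll>
```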

4 Responses to “Unoriginal Content”

  1. aSH Says:

    thank you!!!

    best regards,
    aSH

  2. zyboy Says:

    just wondering about that last directive.. here’s the apache docs:

    “Allow,Deny
    The Allow directives are evaluated before the Deny directives. Access is denied by default. Any client which does not match an Allow directive or does match a Deny directive will be denied access to the server.”

    so your bad IP *will* match the ALL in allow..(I think – guess I should test this before I hit submit, but there ya go)

    so would this be better?

    order deny, allow
    deny from
    allow from all

    Cheers.

  3. Aral Says:

    Thanks for the tips. I’ve seen my posts (and those of others) being added to sites similar to the ones that aSH’s are being added to. I believe you’re right about the intent behind these sites: another sad attempt at spam. I’ll look into implementing the .htaccess rules you mentioned. It seems whenever I touch that thing something breaks though 🙂

  4. Max Says:

    zyboy: You’re half right, I was wrong but it looks like the “allow from all” line allows requests from those who would previously be denied. I’ve updated the post with the new example.

    order deny,allow
    deny from <IP address>

    Thanks.
