If you do anything with search engine optimization, including email marketing, you must have heard the term duplicate content before. And, like me and many other SEOs, you may not be sure exactly what duplicate content is all about and how it affects your SEO.
Duplicate content is when two or more sites (thank you, Google AdWords) have the same content. Now, you might be wondering what that means. We are talking about content that is nearly identical, word for word. I have also experienced duplicate content issues with templates, but we can talk about that later.
How does duplicate content happen? If you have written a blog post, article, or press release that has been picked up, scraped, or syndicated, your content will be duplicated on the web. Unfortunately, this is the nature of the Internet beast and I really do not see an end in sight, but Google has made it easier to manage through Google Webmaster Central. The Google Webmaster Central team has developed a new feature to assist with duplicate content issues and search engine marketing. It's called parameter handling, and it allows you to tell Google which URL parameters to ignore or pay attention to when it indexes your site.
This is great for the SEO who doesn't have quick access to add duplicate content code to a website. However, Google will treat these parameter settings as “hints” rather than as directives.
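To make the idea of ignoring a parameter concrete, here is a minimal Python sketch. It is not how Google implements parameter handling; it only shows that several URLs collapse into one page once the parameters that do not change the content are dropped. The parameter names in `IGNORED_PARAMS` and the example.com URLs are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameters that do not change what the page shows.
IGNORED_PARAMS = {"sessionid", "utm_source", "utm_medium", "ref"}

def normalize_url(url, ignored=IGNORED_PARAMS):
    """Return the URL with ignorable query parameters removed."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key.lower() not in ignored]
    kept.sort()  # parameter order should not create a "new" page either
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

urls = [
    "http://example.com/shoes?color=red&sessionid=123",
    "http://example.com/shoes?sessionid=999&color=red",
    "http://example.com/shoes?color=red&ref=newsletter",
]

# All three URLs point at the same content once the noise is stripped.
unique_pages = {normalize_url(u) for u in urls}
print(unique_pages)
```

Running a list of your own URLs through something like this is a quick way to see how many "different" addresses are really the same page.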
Why do search engines care about duplicate content? It really comes down to money, but doesn't it always? Seriously, the goal of the search engine is to deliver the best value for a given search term or phrase, so that you will come back and hopefully click on an ad or sign up for a “free” service where you can be marketed to.
The intent for the search engine is to avoid serving up many copies of the exact same Web page in the search results, which would create confusion for the searcher and deliver a poor search experience. So they attempt to filter all of the duplicate content, choose one version based on certain criteria, and serve it up. Notice how I said “filter” and not “penalize,” because that is exactly what the search engines do: they filter duplicate content, they do not penalize it. The difference is that a filter does not hurt the overall trust or authority of the domain, whereas a penalty will. Of course, there is always an exception to this rule, and search engines never tell all, so your mileage may vary.
So how do search engines deal with duplicate content? Search engines send out a spider (bot) or program to surf the Internet and collect all of the content it finds. This content is indexed and placed into a database.
During this process, the content is compared against other content in the index to find duplicates. Then an attempt is made to determine the original. Some clues that help determine this are:
- How trusted is the domain? Terms like PageRank and TrustRank are used here.
- What is the domain age?
- How many inbound links are pointing to the site and content?
- Where is the first place the search engine found the content?
- Does any of the content appear to have been “scraped” or repurposed?
- With a template, are file names, images, and HTML components such as table names and “div” names the same?
Once a decision is made by the search engines, the content that wins stays and the rest is filtered out.
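Search engines do not publish how they compare content, so treat the following as a rough illustration only. It uses a common textbook approach: break each text into overlapping three-word “shingles” and measure how many the two texts share (Jaccard similarity). The function names and sample sentences are my own.

```python
import re

def shingles(text, size=3):
    """Return the set of overlapping word sequences ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def similarity(a, b, size=3):
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

original = "Duplicate content is when two or more sites have the same content."
scraped = "Duplicate content is when two or more sites have the same content!"
unrelated = "Our bakery opens at seven every morning and sells fresh bread."

print(similarity(original, scraped))    # 1.0 (identical once punctuation is ignored)
print(similarity(original, unrelated))  # 0.0 (no shared three-word sequences)
```

A score near 1.0 suggests a near copy, and a lightly reworded article lands somewhere in between. Where to draw the line is a judgment call, and the real engines weigh it together with the trust, age, and link signals listed above.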
What can you do to avoid duplicate content issues, and how do you limit duplicate content on your site? Some people have stated that duplicate content has nothing to do with HTML or templates, but my experience tells me differently. How many templates from the “templates are us” companies do you see listed multiple times in the top 10? Now, I am not saying that a template will get you thrown into duplicate content hell, but why risk it? If you do use a template, change some of the HTML naming conventions and you should be fine.
I am a big fan of the canonical tag (a link element with rel="canonical") that was agreed upon by the major search engines last year. If you have multiple pages on your site that carry the same content, use the canonical tag to tell the search engines which single page to treat as the authority.
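If you want to check that a page actually carries a canonical tag, a small script can pull it out of the HTML. This is a minimal sketch using only Python's standard library; the sample page and the example.com address are made up, and a real audit would fetch your live pages instead.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and self.canonical is None:
            self.canonical = attrs.get("href")

def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical

# Hypothetical print-friendly page that points back to the main version.
page = """
<html><head>
<title>Red shoes</title>
<link rel="canonical" href="http://example.com/shoes">
</head><body>Print-friendly version of the red shoes page.</body></html>
"""

print(find_canonical(page))  # http://example.com/shoes
```

Every duplicate version of a page should report the same canonical address. If one comes back as None or points somewhere unexpected, that is the page to fix.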
Google recently posted an article on their webmaster blog about ways to handle legitimate cross-domain content duplication. They announced support for a link element, along with other tips for handling the problem. Basically, Google recognizes there are some legitimate uses for duplicate content, like company contact and about us information, and is willing to offer solutions and help.
If you like to do press releases as part of your search engine marketing, make sure you have the press release on your site in HTML form (not as a PDF, since PDF SEO is not really the best way to search engine optimize) a few days before you start to syndicate it with a newswire service.
If you like to syndicate articles, you can do the same thing or, better yet, change up the article content so that it covers the same theme yet is unique.
Please discuss other ways you handle your potential duplicate content issues.
