December 19, 2006
Have you ever done an online search and found that the results returned were all pretty much the same thing? Maybe you've searched for your own content and discovered that both the regular version and the printer-friendly version are showing up in the results? That's not good news for most website owners, because printer-friendly versions of content don't usually include active links… so people can't use them to discover your other content.

Duplicate content can be a problem, especially when it's not intentional (we'll just assume that none of you would duplicate content to try to manipulate the search engine rankings). Google addressed the issue yesterday afternoon and outlined how it deals with duplicate information. And while all search engines operate differently, Google's approach should be viewed as fairly standard and indicative of the policies you'd probably find at the other search engines.

Defining the Problem

Duplicate content, in the world of Google, is generally defined as content that is either identical verbatim or "appreciably similar". Examples provided by Google include content generated for mobile use, store items available at two distinct URLs, and content that is automatically displayed in multiple formats (blog posts, printer-friendly versions, etc.). Versions of your content in different languages and "snippets" or quotes from other sites are not considered duplicate content.

Google says that it addresses the issue for a couple of reasons. For one, some website owners will intentionally duplicate content to try to manipulate the system. I know none of you would try that. But sadly, some people do. In addition, Google wants to make sure that your best foot is being put forward in its SERPs. You don't want the mobile version of your content showing up at a URL that's 6000 characters long when a standard version is available at a much shorter URL.
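To get a rough feel for what "appreciably similar" could mean for your own pages, here is a minimal sketch that compares two pieces of text by their overlapping three-word runs (a Jaccard score over word "shingles"). This is a common textbook technique and an assumption on my part, not Google's method, which the post does not describe; the function names and the sample text are made up for illustration.

```python
import re


def shingles(text, k=3):
    """Return the set of overlapping k-word runs found in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Very short text: treat the whole thing as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard score: shared shingles divided by all distinct shingles."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty texts are trivially identical
    return len(sa & sb) / len(sa | sb)


# Hypothetical example: a regular page and its printer-friendly twin.
regular = ("Peanuts are legumes grown for their edible seeds. "
           "Read more about legumes here.")
printer = "Peanuts are legumes grown for their edible seeds."

score = similarity(regular, printer)
print(f"similarity: {score:.2f}")
```

A score close to 1 means the two versions share nearly all of their wording; a score near 0 means they have little in common. Where to draw the line between "similar" and "distinct" is a judgment call, and nothing here says where Google draws it.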
Google's Approach

Google's crawling and indexing technology tries to determine what is distinct information and what isn't. When someone searches for "peanut" and the search returns 50 pages with the exact same information about peanuts on them, Google is most likely going to show you just one of those. If all 50 of those pages are yours, in different languages and formats, it's pretty much hit-or-miss as to which version Google will choose to show.

If the search company thinks someone is trying to manipulate things, it will adjust the indexing and ranking of the guilty sites. According to the post, Google tries to filter information before it dings a page's ranking, but filtering doesn't always work. Manipulative people will continue to try to find ways around those filtering techniques, so Google has little choice but to intervene.

Now do you see why it's so important to get your duplicate content under control? Do you really want the granddaddy of all search engines mistaking your website(s) for cheaters?

Your Job

Google provided a fairly detailed list of steps webmasters can take to make sure their content isn't incorrectly flagged as malicious or displayed in unpleasant ways on the SERP:
The ball is in our court. As webmasters, it's our job to make sure Google's crawler is getting the best of our content to index. Here are 10 ways to make sure that happens.