Internet Marketing Monitor
November 17, 2006
Filed Under (Search Engines, Yahoo, SEO Tips) by Matt / Derick on 11-17-2006

The term "search engine optimization" is a little misleading.  Most of the techniques that traditionally fall under the SEO category are really aimed at the search engine crawler.  In fact, the crawler is the part of search technology that assigns all of the attributes to your website that the engine uses to decide if and where to display your site.

Some search engine crawlers can be directed around your site by a file called robots.txt.  Not all crawlers honor robots.txt; some simply ignore it.  But others, including Yahoo's Slurp crawler, access this file (if it exists) and use it to decide which parts of your site to crawl.  In fact, Yahoo! announced upgrades to Slurp earlier this month that allow the crawler to recognize more characters and commands in the robots.txt file.
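To get a feel for how a well-behaved crawler consults robots.txt, here is a minimal sketch using the robots.txt parser from Python's standard library.  It is not Slurp, and it follows the original robots.txt convention rather than Yahoo's newer extensions.  The /private/ directory and the "OtherBot" name are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt with one rule that applies only to Slurp.
SAMPLE_ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(SAMPLE_ROBOTS_TXT.splitlines())

# Slurp is told to stay out of /private/ but may crawl everything else.
print(parser.can_fetch("Slurp", "/private/notes.txt"))     # False
print(parser.can_fetch("Slurp", "/index.html"))            # True

# No section names OtherBot, so nothing is disallowed for it.
print(parser.can_fetch("OtherBot", "/private/notes.txt"))  # True
```

Note that the file only states a request.  It is up to each crawler to read it and comply.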

Two new recognizable characters, the wildcard (*) and the end anchor ($), allow website owners to further customize how Slurp crawls their website.

The asterisk as a wildcard is pretty standard.  It matches any sequence of characters, so a single Allow or Disallow rule can cover a whole group of files and directories.  For example, the line:

Allow: /users*/

tells Slurp that it may crawl and index any directory whose name begins with "users".  The wildcard means /users/, /users_profiles/, and /users-photos/ would all be crawled by Slurp.  Similarly,

Disallow: /corp*/

tells Slurp NOT to index any directory that begins with "corp".  This feature is useful for keeping files and directories out of the search engine's index.  If you store personal files or other files unrelated to your website on the same server, this can keep them from ending up in Yahoo.
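If you want to check which paths a wildcard rule would cover before you publish it, a few lines of Python can approximate the matching.  This is only a sketch of one plausible reading of the rule (the pattern is matched from the start of the path, and * stands for any run of characters), not Yahoo's actual implementation.  The function name and the sample paths are made up for illustration.

```python
import re


def wildcard_prefix_match(pattern: str, path: str) -> bool:
    """Return True if the start of `path` matches `pattern`.

    Every '*' in the pattern matches any run of characters (including
    none); all other characters must match literally.
    """
    # Escape the literal pieces and join them with the regex equivalent of '*'.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    # re.match anchors at the start of the path only, so this is a prefix match.
    return re.match(regex, path) is not None


for path in ["/users/", "/users_profiles/index.html", "/users-photos/a.jpg", "/user/"]:
    print(path, wildcard_prefix_match("/users*/", path))
```

Under these assumptions the first three paths match /users*/ and /user/ does not, because it never contains the literal text "/users".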

The $ character tells Slurp that the pattern must match at the very end of a URL for the rule to apply.  For example,

Disallow: /*.bak$

would keep any file that ends with ".bak" from getting indexed, but would still allow a file with ".bak" in the middle of its name, such as /notes.bak.html.  The $ character is useful for specifying certain extensions or other characters that typically fall at the end of a file name or URL.
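The difference the end anchor makes is easy to see in a short Python sketch.  It assumes the $ is the last character of the pattern and that, without it, a pattern only needs to match the beginning of the path.  That is an approximation of the behavior described here, not Yahoo's code, and the function name and sample paths are hypothetical.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Match a robots.txt-style pattern that may use '*' and a trailing '$'."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]  # drop the '$' marker itself
    # '*' matches any run of characters; everything else is literal.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        # With '$', the pattern must cover the whole path, end included.
        return re.fullmatch(regex, path) is not None
    # Without '$', matching the beginning of the path is enough.
    return re.match(regex, path) is not None


print(rule_matches("/*.bak$", "/old/site.bak"))    # True: the path ends in .bak
print(rule_matches("/*.bak$", "/notes.bak.html"))  # False: .bak is in the middle
print(rule_matches("/*.bak", "/notes.bak.html"))   # True: no end anchor
```

The last line shows why the anchor matters: without the $, the same rule would also catch files that merely contain ".bak".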

More detailed information is available from Yahoo! Slurp's Help Center.

Are you using a robots.txt file to direct search engine crawlers around your site?  It may not be a bad idea, especially if there are certain parts of your website that you don't want showing up in search results.

Visit The Web Robots Pages for more information on the robots.txt file.

 
