December 28, 2006
Filed Under (Google) by Matt / Derick on 12-28-2006
When it comes to search engines, Google is by far the one that most companies and individuals want to be included in. But it's not always enough to get into Google. You want your website to show up at the top of the results page for keywords and phrases that are important to your business. That's what all the SEO stuff is about. One of the best ways to make sure Google sees your entire website is to use Google Sitemaps. I've talked about Sitemaps before. They're single-file snapshots of every file on your server that Google uses to crawl your website. There's no doubt that they're very helpful in getting your site indexed.
But, as the Guardian Unlimited reports, Shoemoney.com found out the hard way what an improperly created sitemap can do. In short, it can open your site up to hackers. Shoemoney.com used default settings to generate a sitemap for Google to index from. The sitemap included a listing for every file on Shoemoney's website, including old, outdated, unused files in obscure locations on the server. Using Google Code Search, a hacker located one of those old files and used it to hack Shoemoney.com.
I see the fault here falling into two camps. Shoemoney.com accepts full responsibility for what happened to the site because the sitemap generation was done incorrectly. In part, that's correct. Don't upload sitemaps to Google (or any other search engine) until you're sure they're created properly and don't contain links to parts of your server that you'd rather keep away from prying eyes.
But I also think Google should accept some responsibility for this and other instances of indexing gone completely wrong. There are several directories, file types, and individual files that Google has no business indexing and making searchable, regardless of what a sitemap says. And despite their desire to catalog the world's information, I don't think that mission extends to directories and files stored on private servers, which the company really ought to know better than to index.
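For anyone who generates their own sitemap, here is a rough sketch of the kind of check I mean: filter the list of URLs against a deny list before anything gets written to the sitemap file. This is only an illustration, not how Shoemoney.com or any particular sitemap generator works. The patterns in `DENY_PATTERNS` and the function names `is_public` and `build_sitemap` are my own assumptions, so adjust them to whatever actually lives on your server.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse
from xml.sax.saxutils import escape

# Hypothetical deny list: path patterns that should never be advertised
# to a search engine. These are examples only; tailor them to your site.
DENY_PATTERNS = [
    "/admin/*",
    "/backup/*",
    "/old/*",
    "*.log",
    "*.sql",
    "*.bak",
    "*config*",
]


def is_public(url):
    """Return True if the URL's path matches none of the deny patterns."""
    path = urlparse(url).path.lower()
    return not any(fnmatchcase(path, pattern) for pattern in DENY_PATTERNS)


def build_sitemap(urls):
    """Build sitemap XML containing only the URLs that pass is_public()."""
    entries = "\n".join(
        f"  <url><loc>{escape(u)}</loc></url>" for u in urls if is_public(u)
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>\n"
    )


if __name__ == "__main__":
    candidates = [
        "http://example.com/index.html",
        "http://example.com/old/upload.php",
        "http://example.com/logs/access.log",
        "http://example.com/includes/db_config.php",
    ]
    print(build_sitemap(candidates))
```

Keep in mind that leaving a file out of the sitemap only stops you from advertising it. An old script that is still sitting on the server can still be found some other way, so the real fix is to delete it or lock it down.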
Oh, but you say the crawling and indexing of websites is done by a computer? That's true. But Google really needs to make sure their computers aren't indexing the kind of information mentioned in the GU and Shoemoney articles. When would a website owner ever want their log files or database configuration files added to the public Google search engine? The "world's greatest search company" can figure out a way to fix this. If not, they need to rethink their strategy for using sitemaps. As soon as the wrong person's website gets exploited because Google indexed a password file, the issue will explode.
But that's not the end of it. Google's eyes can see much more than the files it indexes for the search engine. The much-touted Google Desktop is also guilty of what many would consider a pretty serious security violation. As one IT department found out, a feature included in Google Desktop sends entire files to Google for storage.
But here's where I think the flaw in that philosophy lies. Nothing on a computer is completely secure. Sure, Google probably has some of the best security measures available in place on its servers. It would probably be very hard for a hacker to access the server that all of these private files are being stored on. But nothing is impossible. It's quite possible, in fact, that private or sensitive information could find its way into the wrong hands, either by way of a hack or simply through curious eyes at Google. As the IT department in the article I referenced above illustrated, the information that would have been sent to Google from their business contained sensitive medical information on who knows how many people. Not every company out there will be as observant as that IT department was. Who's to say that other companies with your private data on their computers aren't sending all that information to Google without even realizing it? How could Google ever really think this system could be totally secure? While the information is there for those willing to look, they don't publicize the fact that their search engine crawls private files or that their Desktop product sends entire files to their servers. Why? Because they know people wouldn't go for it.
Comments:
3 Comments posted on "Hey Google: Get Out of My Private Files!"
Ionut on December 29th, 2006 at 5:57 pm #
Sorry, but you are way off. You say: “While the information is there for those willing to look, they don’t publicize the fact that their search engine crawls private files or that their Desktop product sends entire files to their servers.” Search across computers is off by default and you need to explicitly enable it. If you were curious enough to install Google Desktop, you would see this in the preferences page: “Index and search my documents and viewed web pages from across all my computers. (This feature transmits the text of your indexed files to Google Desktop servers for copying to your other computers. Only files you open after turning on this feature will be copied to your other computers for searching).” And if you and your security gurus searched on the web, you’d find out that there’s an enterprise version of Google Desktop that lets you disable many features, including this one, for each and every computer in a network.
Matt / Derick on January 1st, 2007 at 4:53 pm #
Thanks for taking the time to respond. I apologize for the delay in getting back to you. This time of the year is always a little hectic. To start with, I’m actually quite familiar with Google Desktop. I’ve been using it since it was a beta product. I’m not only curious enough to install it, I’m curious enough to keep using it. Haha.
That being said, I am aware that the search across computers feature is off by default. That’s beside the point, especially when you look at the way the explanation text that accompanies the product is worded. It says “This feature transmits the text of your indexed files to Google Desktop servers for copying to your other computers”. That implies that the Google servers act as a gateway during a transmission. It says your files are copied to the Google server “for copying to your other computers”… but what it doesn’t say is that the files are copied and stored on the Google servers. Any file you open after enabling the feature is copied to Google and stored on the off chance that you might need it from another computer. The wording implies that Google only acts as an intermediary between your two computers when you’re “copying to your other computers”.
They need to make it clearer exactly what that feature is doing. How about this: “This feature transmits the text of your indexed files to Google Desktop servers for storage until you need to copy them to your other computers. Only files you open after turning on this feature will be copied to Google’s server and your other computers for searching.”
The computers that the company in the article I was talking about received came pre-installed with Google Desktop, so in this case it didn’t matter that an enterprise version was available. As the person in the article said, people turned on their new computers and up popped Google Desktop. Most employees assume if an application is on their work computer, they’re free to use it.
The enterprise version of Google Desktop is a great solution for companies that want to deploy it. But when it’s coming pre-installed on computers, Google needs to be more proactive about telling consumers and businesses what their product is doing. I hope that clears up any confusion about the intent of the post. Thanks again for your feedback! -Derick
Google Plugs a Hole in Its Desktop Application. Why Aren’t Others? on February 21st, 2007 at 12:13 pm #
[…] So Google Desktop had a big, honking security hole in it, huh? That's what The Washington Post is saying, anyway. Why does that sound familiar? Oh! That's right! Because we've talked about the potential security threats posed by Google Desktop in the past. […]