Trevin serves as the VP of Marketing at WebFX. He has worked on over 450 marketing campaigns and has been building websites for over 25 years. His work has been featured by Search Engine Land, USA Today, Fast Company and Inc.
As SEOs, a big part of our job is to create compelling content that is indexed by Google and easy for searchers to find. That’s simple enough, but there are also plenty of things on a web server that you don’t want Google to find. Log files, configuration files, personal data, customer databases, and administration documents are just a few examples of files that shouldn’t be crawled by search engines.
If Google or another search engine crawls and indexes sensitive data, anyone who knows how to search for it can find it, and your site becomes vulnerable to all sorts of abuse. How does this work? Hackers or curious searchers can use advanced query operators in search engines to specify the type of file and data they are looking for.
Typically, they rely on some sort of footprint that will be present on a large number of sites. This footprint can come from text on the page or URL/site structure. The best way to understand this is to execute a few Google hacks yourself.
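You can also look for URL footprints from the defender's side. Below is a minimal Python sketch that scans a list of your own site's URLs for patterns that hint at exposed files. It assumes you already have the URLs, for example exported from a sitemap or a crawl. The extension and folder lists are hypothetical examples based on the file types mentioned earlier (logs, configuration files, databases). They are not an authoritative catalog, and `find_footprints` is an illustrative helper name.

```python
from urllib.parse import urlparse

# Hypothetical footprint lists based on the file types described earlier
# (logs, configuration files, database dumps). Adjust for your own server.
RISKY_EXTENSIONS = (".log", ".sql", ".conf", ".bak", ".env")
RISKY_PATH_FRAGMENTS = ("/admin", "/backup")


def find_footprints(urls):
    """Return (url, reason) pairs for URLs whose path matches a footprint."""
    hits = []
    for url in urls:
        path = urlparse(url).path.lower()  # compare paths case-insensitively
        if path.endswith(RISKY_EXTENSIONS):  # str.endswith accepts a tuple
            hits.append((url, "risky file extension"))
        elif any(fragment in path for fragment in RISKY_PATH_FRAGMENTS):
            hits.append((url, "risky path fragment"))
    return hits


# Placeholder URLs; in practice, feed in URLs from your sitemap or a crawl.
site_urls = [
    "https://example.com/blog/seo-tips",
    "https://example.com/logs/access.log",
    "https://example.com/backup/old-site.zip",
]
for url, reason in find_footprints(site_urls):
    print(url, "->", reason)
```

A path-based check like this only catches footprints in the URL structure. Footprints that live in page text need a content search instead.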
Below are five examples of advanced queries that utilize footprints left by files and folders that people typically do not want available for public consumption. None of these are particularly sinister and all have been widely known for a few years. Use these at your own risk for research purposes only 😀
Google’s search operators let you narrow results by many criteria, including top-level domain (TLD) and file type. This query returns a list of PDF files on government sites that are ‘confidential’ to everybody but Google.
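Operators stack in a single query: `site:` limits results to a domain or TLD (for example `site:gov`), and `filetype:` limits them to a file extension. To audit your own domain, here is a minimal Python sketch that combines operators with keywords and turns the result into a search URL you can open in a browser. `build_query` and `search_url` are hypothetical helper names, and `example.com` is a placeholder for your own domain. The sketch only builds the URL; it does not fetch any results.

```python
from urllib.parse import urlencode


def build_query(site=None, filetype=None, terms=()):
    """Combine Google operators and plain keywords into one query string."""
    parts = []
    if site:
        parts.append(f"site:{site}")          # restrict to a domain or TLD
    if filetype:
        parts.append(f"filetype:{filetype}")  # restrict to a file extension
    parts.extend(terms)                       # plain keywords to match
    return " ".join(parts)


def search_url(query):
    """Encode the query as the q= parameter of a Google search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_query(site="example.com", filetype="pdf", terms=["confidential"])
print(query)
print(search_url(query))
```

If a query like this turns up files from your own site that should not be public, that is your cue to lock them down.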
This query takes advantage of a footprint left by Panasonic webcams that still use their default settings. Typically, no password is required and you can pan, zoom and tilt the camera. You can also find and control many Axis webcams with this query: inurl:indexFrame.shtml Axis.
Toner isn’t cheap … if your printer network is insecure, somebody far away can run wild printing test pages … and that is possibly the ‘nicest’ thing they could do.
Professors frequently post final grades online so all of their students have easy access. Unfortunately, so does Google. Many professors list students by personal information, such as student IDs, to tell them apart.
These are just a few of the thousands of vulnerabilities that search engines can expose.
There are far more advanced queries that have been formulated to hack sites, steal credit card information, gain MySQL access and all sorts of nasty things.
Here are a few tips on how you can prevent information leakage like this on your own site.
Always hire IT people who are knowledgeable about security. Failing to make security a priority from the start can lead to major problems down the road.
If sensitive content has already been indexed, ask Google to remove the URL from its index.
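Put robots.txt to use. This configuration file tells search engine bots which pages and folders on your site they aren’t allowed to crawl. Keep in mind that the file is publicly readable and only honored by well-behaved bots, so it is not a substitute for real access control.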
Password-protect sensitive information. Duh.
Set up Google Alerts to notify you when potential information leakage occurs.
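Python's standard-library `urllib.robotparser` module can check which URLs a robots.txt file tells a compliant crawler to skip. This is a handy sanity check after you edit the file. The sketch below parses a hypothetical robots.txt for `example.com` and asks whether Googlebot may crawl a few placeholder paths. The rules and paths are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com; the paths are placeholders.
ROBOTS_TXT = """\
User-agent: *
Disallow: /logs/
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse rules from lines instead of fetching

for path in ("/blog/seo-tips", "/logs/access.log", "/admin/users.csv"):
    url = "https://example.com" + path
    # can_fetch reports whether the rules allow this user agent to crawl the URL
    print(path, "crawlable:", parser.can_fetch("Googlebot", url))
```

`can_fetch` only tells you what a rule-following crawler should skip. The disallowed files remain reachable by anyone who knows or guesses the URL.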