Open Source Intelligence Gathering

A penetration test almost always needs to begin with an extensive information gathering phase. This post talks about how open sources of information on the Internet can be used to build a profile of the target. The gathered data can be used to identify servers, domains, version numbers, vulnerabilities, misconfigurations, exploitable endpoints and sensitive information leaks. Read on!

There is a ton of data that can be discovered via open source intelligence gathering techniques, especially for companies that have a large online presence. There is always some tiny piece of code, a tech forum question with elaborate details, a long-forgotten sub-domain or even a PDF of marketing material whose metadata can be used against a target site. Even simple Google searches often lead to interesting results. Here are some of the things that we do once we have the client’s (domain) name (in no particular order):

1. Run a whois lookup to find the admin contact and other email addresses. These email addresses very often exist as valid users on the application as well. They can also be checked against database leaks, or against a search service like HaveIBeenPwned, which tells you whether an email address was found as part of a breach.

Apart from email addresses, whois queries can return IP history information, domain expiry dates and even phone numbers that can be used in social engineering attacks.
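As a rough illustration of the first step, here is a minimal sketch that pulls unique email addresses out of raw whois output. It assumes you already have the whois response as text (for example, saved from the `whois` command line tool). The function name `extract_emails`, the regular expression and the sample record are all made up for this example, and the pattern is deliberately loose rather than a full address validator.

```python
import re

# Deliberately simple pattern: good enough for harvesting, not for validation.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(whois_text):
    """Return unique email addresses (lowercased) in first-seen order."""
    seen = {}
    for address in EMAIL_RE.findall(whois_text):
        seen.setdefault(address.lower(), None)
    return list(seen)


# Fabricated whois output, for illustration only.
SAMPLE_WHOIS = """\
Domain Name: EXAMPLE.ORG
Registrar Abuse Contact Email: abuse@registrar.example
Admin Email: Admin@example.org
Tech Email: admin@example.org
"""

if __name__ == "__main__":
    for address in extract_emails(SAMPLE_WHOIS):
        print(address)
```

The resulting list can then be fed into whatever breach lookup or username guessing you do next.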

2. Run a Google advanced search using the site: operator to restrict results to the target domain and find PHP (or any other server-side script type), txt or log files:

site:*.example.org ext:php | ext:txt | ext:log

On several occasions we have identified interesting files (log files, for example) that contain sensitive information and the full system path of the application using search queries like these. You can couple this query with the minus operator (for example, -site:www.example.org) to exclude specific search results.
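If you run these searches for many targets, it can help to generate the query strings rather than type them. The sketch below builds a query in the same shape as the one above and turns it into a search URL. The helper names `build_dork` and `search_url` are invented for this example, and the `https://www.google.com/search?q=` URL format is an assumption about how you would open the query in a browser.

```python
from urllib.parse import quote_plus


def build_dork(domain, extensions, exclude_hosts=()):
    """Build a 'site:*.domain ext:a | ext:b' query, with optional -site: exclusions."""
    parts = ["site:*." + domain]
    parts.append(" | ".join("ext:" + ext for ext in extensions))
    parts.extend("-site:" + host for host in exclude_hosts)
    return " ".join(parts)


def search_url(query):
    """URL-encode the query so it can be pasted into a browser (assumed URL format)."""
    return "https://www.google.com/search?q=" + quote_plus(query)


if __name__ == "__main__":
    query = build_dork("example.org", ["php", "txt", "log"], ["www.example.org"])
    print(query)
    print(search_url(query))
```

The same helper works for document searches by passing a different list of extensions.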

3. Search the domain (and sub-domains) for good old-fashioned documents. File types include PDF, Excel, Word and PowerPoint to begin with. These documents may contain information that you can use for other attacks. Often, the document’s metadata (author name etc.) contained in the file properties can be used as a valid username on the application itself.

site:*.example.org ext:pdf | ext:doc | ext:docx | ext:ppt | ext:pptx | ext:xls | ext:xlsx | ext:csv

You can download these files locally and run them through a document metadata extractor or view properties of each file to see what information is leaked.
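For the newer Office formats (docx, xlsx, pptx) you do not even need a dedicated tool: these files are ZIP archives, and the author fields live in `docProps/core.xml`. The following is a minimal sketch using only the Python standard library. It does not handle PDF or the legacy doc/xls/ppt formats. The names `ooxml_authors` and `build_sample_docx` are invented for this example, and the sample file is a stand-in built in memory, not a real document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core properties part of Office Open XML files.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def ooxml_authors(source):
    """Return creator / last-modified-by names from a docx, xlsx or pptx file.

    `source` may be a path or a binary file object. Note that ElementTree is
    not hardened against maliciously constructed XML, so treat downloaded
    files with care.
    """
    with zipfile.ZipFile(source) as archive:
        try:
            raw = archive.read("docProps/core.xml")
        except KeyError:
            return {}  # archive has no core properties part
    root = ET.fromstring(raw)
    fields = {"creator": "dc:creator", "last_modified_by": "cp:lastModifiedBy"}
    found = {}
    for key, tag in fields.items():
        node = root.find(tag, NS)
        if node is not None and node.text:
            found[key] = node.text.strip()
    return found


def build_sample_docx():
    """Hypothetical helper: build a tiny in-memory stand-in for a .docx file."""
    core = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<cp:coreProperties xmlns:cp="{NS["cp"]}" xmlns:dc="{NS["dc"]}">'
        "<dc:creator>jsmith</dc:creator>"
        "<cp:lastModifiedBy>Jane Smith</cp:lastModifiedBy>"
        "</cp:coreProperties>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("docProps/core.xml", core)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(ooxml_authors(build_sample_docx()))
```

In practice you would call `ooxml_authors` on each downloaded file path and collect the names as candidate usernames.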

To see all the options that can be used for searching data, refer to https://www.google.co.in/advanced_search. Also, the Google Hacking Database (now hosted on Exploit-DB) offers pre-crafted queries for finding specific and interesting things on the Internet.

4. Check the robots.txt file for hidden, interesting directories. Most shopping carts, frameworks and content management systems have well-defined directory structures, so the admin directory is often just a /admin or /administration request away. If not, the robots.txt file will very likely contain the directory name you seek.
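Reading robots.txt by eye works fine for one site. For several, a small parser that lists the Disallow entries saves time. This is a minimal sketch that assumes you have already fetched the file's contents as text. The function name `disallowed_paths` and the sample file are made up for illustration.

```python
def disallowed_paths(robots_txt):
    """Return unique, non-empty Disallow values in the order they appear."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        value = value.strip()
        if field.strip().lower() == "disallow" and value and value not in paths:
            paths.append(value)
    return paths


# Fabricated robots.txt content, for illustration only.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /admin/   # backend
Disallow: /backup/
Disallow:
Allow: /public/
Disallow: /admin/
"""

if __name__ == "__main__":
    for path in disallowed_paths(SAMPLE_ROBOTS):
        print(path)
```

Each returned path is a candidate to request manually and inspect.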

5. Look through the HTML source to identify carts, CMSs, frameworks etc. Identifying the application type helps focus the attack on areas of the application that have vulnerable components (plugins and themes, for example). For instance, if you look at the page source and see wp-content, you can be fairly confident that you are looking at a WordPress site.

A lot of publicly available browser add-ons can also be used to identify website frameworks. Wappalyzer on Firefox does a pretty good job of identifying several different server types, server-side and client-side frameworks and third-party plugins on the site.
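The same kind of check can be scripted against saved page source. The sketch below looks for a few telltale strings and for a generator meta tag. Only the wp-content marker comes from this post. The other entries in `MARKERS`, the function names and the sample HTML are assumptions added for illustration, and a tool like Wappalyzer uses far more signals than this.

```python
import re

# Marker strings per platform. Only "wp-content" is taken from the post;
# the rest are illustrative assumptions and may need tuning.
MARKERS = {
    "WordPress": ("wp-content", "wp-includes"),
    "Drupal": ("/sites/default/files", "drupal-settings-json"),
    "Joomla": ("/media/jui/", "com_content"),
}

# Matches <meta name="generator" content="..."> with name before content.
GENERATOR_RE = re.compile(
    r'<meta[^>]+name=["\']generator["\'][^>]+content=["\']([^"\']+)["\']',
    re.IGNORECASE,
)


def fingerprint(html):
    """Return {platform: [markers found]} for every platform with a hit."""
    lowered = html.lower()
    hits = {}
    for platform, needles in MARKERS.items():
        matched = [needle for needle in needles if needle in lowered]
        if matched:
            hits[platform] = matched
    return hits


def generator_tag(html):
    """Return the content of the generator meta tag, or None if absent."""
    match = GENERATOR_RE.search(html)
    return match.group(1) if match else None


# Fabricated page source, for illustration only.
SAMPLE_HTML = """\
<html><head>
<meta name="generator" content="WordPress 6.4" />
<link rel="stylesheet" href="/wp-content/themes/demo/style.css">
</head><body></body></html>
"""

if __name__ == "__main__":
    print(fingerprint(SAMPLE_HTML))
    print(generator_tag(SAMPLE_HTML))
```

A marker hit is a hint rather than proof, so confirm it by requesting a platform-specific path before relying on it.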
