SOC - Reconnaissance (Lesson)

Reconnaissance

Introduction

In this lesson, you will learn about the reconnaissance methods threat actors use to investigate their targets prior to an attack. As you learned in a previous lesson, reconnaissance is the process of gathering information about targeted computers or networks as a preliminary step toward a further attack that seeks to exploit the target system. It combines tools and techniques to create a full profile of the organization and its security posture, including its domain names, Internet Protocol (IP) addresses, and network blocks.

In performing reconnaissance, the attacker will usually attempt to gather information at different layers, or different distances, from the target. If possible, the attacker will apply reconnaissance techniques moving from layer 1 to layer 2 to layer 3. Even an attacker who has easy local access to the target will still apply all of the reconnaissance techniques to improve their mapping of the victim.

Techniques at Each Level Presentation

Here are the techniques at each level:

Real Life Analogy Activity

Web Search Reconnaissance: Google Dorking

The OSINT content in a previous lesson covered a lot of web search reconnaissance. If you remember, we defined OSINT (Open-Source Intelligence) as any information that can be gathered from free public sources about an individual or organization. In this lesson, you will learn how to use Google Advanced Search Operators to perform extremely fine-tuned searches. This is called Google Dorking, and a search string built this way is called a Google Dork. Google is the #1 tool for web search reconnaissance.

Advanced Google Operators:  Special Characters

For example, quotation marks are used to tell Google to group a set of words or characters instead of searching for individual words.

Here is another example: dolphin + Miami will search for websites that have the word dolphin AND Miami.

A Google Dork (aka Google Hack) is a search string that uses Advanced Search Operators to find OSINT information that is hard to find with a simple search. The results can include information that is not intended for public viewing but has not been properly protected.

Examples of data that can be found using Google Dorks:

  • Admin login pages
  • Usernames and passwords
  • Credit card information
  • Sensitive documents
  • Government/military data
  • Email lists
  • Bank account details
  • Vulnerable online devices

Sample Scenario

Google Hacking Database (GHDB)

The GHDB is a collection of MANY Google Dorks, organized into categories with descriptions of what each one can accomplish. It is a pool of searches that will find websites with specific types of vulnerabilities, such as posted usernames, passwords, or login portals. As defined on its website, the Google Hacking Database (GHDB) is a categorized index of Internet search engine queries designed to uncover interesting, and usually sensitive, information made publicly available on the Internet.

Important point -- Google Hacking has NOTHING to do with actually targeting Google AND, more importantly, Google has NOTHING to do with Dorking in general or with this specific database. Be aware that Google's algorithms spot this kind of activity, and you may get a CAPTCHA challenge to check whether your dorks are being submitted by a bot.

Google Practice Activity

Advanced Google Searching

The syntax is operator:search_term, with no spaces around the colon.
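
The syntax rule can be sketched in a few lines of Python. This is only an illustration, not an official tool: the helper names `build_dork` and `search_url` are made up for this example, and `example.com` is a placeholder domain. The sketch joins an operator and a term without a space, wraps multi-word terms in quotation marks so they are treated as one phrase, and URL-encodes the result.

```python
from urllib.parse import urlencode


def build_dork(operator: str, term: str) -> str:
    """Join an operator and a term as operator:term, with no space after the colon."""
    term = term.strip()
    # Wrap multi-word terms in quotation marks so they are searched as one phrase.
    if " " in term and not term.startswith('"'):
        term = f'"{term}"'
    return f"{operator.strip()}:{term}"


def search_url(query: str) -> str:
    """URL-encode a query string into a Google search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# Hypothetical example: two operators combined into one query.
query = build_dork("site", "example.com") + " " + build_dork("intitle", "index of")
print(query)
print(search_url(query))
```

Separate operators are combined with a single space between them, while each individual operator:term pair stays unbroken.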

Advanced Google Searching Practice Activity

More Advanced Google Searching

Let’s make sure you are clear about the difference between a webpage title and a webpage URL:

  • The title is just a descriptor that appears at the very top of the webpage, usually on the browser tab.
  • The URL is the actual address that the browser used to access that webpage.
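
To make the distinction concrete, here is a minimal Python sketch that pulls the title and the URL path from a page. Both the URL and the HTML are invented for illustration, and `TitleParser` is a hypothetical helper written for this example. An operator that searches titles would match against the first value; an operator that searches URLs would match against the second.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class TitleParser(HTMLParser):
    """Collect the text found inside the page's <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Hypothetical page: the URL and the HTML are made up for this example.
page_url = "https://www.example.com/portal/login.php"
page_html = "<html><head><title>Staff Portal</title></head><body>Sign in</body></html>"

parser = TitleParser()
parser.feed(page_html)

print("Title:", parser.title)                # text shown on the browser tab
print("URL path:", urlparse(page_url).path)  # part of the address the browser requested
```

Notice that the word "login" appears only in the URL and the word "Portal" appears only in the title, so a search restricted to one will not necessarily match the other.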

Examples of Reconnaissance Google Dorks

These are basic queries that are useful for reconnaissance or for a site administrator to use to identify their own vulnerabilities:

  • Example #1 looks like it is canceling itself out, but it is reducing the results to exclude the www (web server) so that we can identify other servers or subdomains.
  • Example #2 will find any directory listings. Directory listings can be a real vulnerability because they provide a map of the website, and the files/downloads are often unrestricted when accessed from the listing.
  • Example #3 is commonly used as the beginning of a longer dork to find the access page of a site.
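
The exact example queries are not reproduced in this text, so the Python sketch below uses commonly seen forms of these three kinds of dorks. Treat every query string here as an assumption rather than the lesson's own examples; `recon_dorks` is a hypothetical helper and `example.com` is a placeholder for a domain you are authorized to test.

```python
def recon_dorks(domain: str) -> dict[str, str]:
    """Return illustrative reconnaissance queries for one domain (assumed forms)."""
    return {
        # Everything indexed for the domain except the main www web server.
        "other_hosts": f"site:{domain} -site:www.{domain}",
        # Pages whose title contains "index of", which is typical of directory listings.
        "directory_listings": f'site:{domain} intitle:"index of"',
        # URLs containing "login", a common starting point for finding access pages.
        "login_pages": f"site:{domain} inurl:login",
    }


for name, query in recon_dorks("example.com").items():
    print(f"{name}: {query}")
```

A site administrator could run the same queries against their own domain to see what an attacker would see.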

Robots.txt

Let’s pause from dorks for a minute to learn about the robots.txt file. It is not a Google Dork itself, but it is an important item to consider. Many web admins treat the robots.txt file as if it were private -- probably because it is usually read by bots instead of humans.

The purpose of the robots.txt file is to tell web bots whether they are allowed to crawl and index your website's pages and, if so, which pages are off limits. The robots.txt file has to be open to the public so that bots can refer to it, so there is no expectation of privacy. This detailed information about the website file structure can be very useful to hackers!

Administrators should never list sensitive files directly in the robots.txt file.  Yes, there is an admin folder for the website, but you don’t have to advertise it. The web folders should be organized so that admin is several layers down. For instance, the path could be changed to /subdomain1/docs/admin.

Here is an example of a real robots.txt file:

Disallow: /admin
Disallow: /cda/testing
Disallow: /comment/reply
Disallow: /filter/tips
Disallow: /node/add
Disallow: /node
Disallow: /nofollow
Disallow: /search
Disallow: /taxonomy
Disallow: /user/activate
Disallow: /user/login

  • The Disallow: field identifies which files/folders on the website should not be indexed.
  • However, when you say “don’t look there” -- a lot of people get interested! Bot crawlers don’t have to respect the “Disallow”!
  • It is a great tool for reconnaissance as it often provides a list of files/folders that will not be found while browsing that website.
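
As a rough illustration of how quickly this file can be turned into a target list, the Python sketch below extracts the Disallow paths from a shortened copy of the sample above and flags the entries that look sensitive. The keyword list in `SENSITIVE_HINTS` and both function names are assumptions made for this example; an administrator could run the same check against their own robots.txt.

```python
SAMPLE_ROBOTS = """\
Disallow: /admin
Disallow: /cda/testing
Disallow: /comment/reply
Disallow: /node/add
Disallow: /user/login
"""

# Assumed keyword list; adjust it to whatever counts as sensitive on your own site.
SENSITIVE_HINTS = ("admin", "login", "test")


def disallowed_paths(robots_text: str) -> list[str]:
    """Return every path named in a Disallow: line, in file order."""
    paths = []
    for line in robots_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


def sensitive_paths(paths: list[str]) -> list[str]:
    """Keep only the paths that contain one of the sensitive-looking keywords."""
    return [p for p in paths if any(hint in p.lower() for hint in SENSITIVE_HINTS)]


paths = disallowed_paths(SAMPLE_ROBOTS)
print(sensitive_paths(paths))  # ['/admin', '/cda/testing', '/user/login']
```

Any path this check flags is one that the file is effectively advertising to anyone who reads it.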

Reconnaissance Security Solution

What methods can we use to secure against reconnaissance through web searches such as those in the Google Hacking Database? The first step in mitigating reconnaissance is to spend a lot of time Googling your organization. Perform an aggressive Google search on your organization to identify information that should not be public or accessible through OSINT -- then remove it! You need to work as hard at this as a hacker will, which means it will be time intensive and you will need to schedule it regularly.

Reflection and Wrap-up

In this lesson, we have learned that digital reconnaissance is kind of like bank surveillance before a robbery. It is the crucial first step for cyber attackers to gain insights into potential vulnerabilities within a target's network. We have delved into various techniques used by malicious actors to collect data, ranging from domain names and IP addresses to the details of an organization's network topology.

Key to these reconnaissance efforts are the tools and techniques that allow attackers to methodically map out the victim's environment, assessing it layer by layer. Furthermore, we have seen how real-world analogies, like preparing for a bank heist, parallel the systematic gathering of intelligence in cyber scenarios. We have also introduced the concept of Google Dorking, a method for using advanced search operators to uncover information that may not be readily apparent through regular searches, which highlights the wealth of sensitive data that can inadvertently be exposed online. Finally, we have emphasized the importance of regularly auditing your own organization’s online footprint as a defense strategy to minimize the risk of such reconnaissance efforts succeeding.

[CC BY-NC-SA 4.0] UNLESS OTHERWISE NOTED | IMAGES: LICENSED AND USED ACCORDING TO TERMS OF SUBSCRIPTION - INTENDED ONLY FOR USE WITHIN LESSON