At Digital Operatives, sometimes our eyes are bigger than our stomachs. We’re a small, focused security company trying to help the nation improve its cybersecurity. Fundamentally, though, we’re an “ideas” company. We come up with ideas and, as part of our business model, share them with our customers to see whether they are interested in funding us to do the research.

As a by-product of the research, development, and operational support we perform for our customers, we have dozens of ideas that we’ve explored internally and that could provide value to our customers and the cybersecurity community. We’d like to share more of them publicly to start productive conversations and contribute to advancements in the field.

Here’s one of our most recent ideas in abstract form:

Automated Reconnaissance For Penetration Testing

“In a penetration test, information gathering is one of the most important stages. A common axiom is, ‘The more information gathered, the higher the likelihood of success.’ Therefore, a significant amount of time is spent obtaining as much information as possible about the target. Typically, this includes details about the company, individuals, assets, and all other connections to the target. These details can be obtained from various sources, including, but not limited to, social media, professional profiles, corporate websites, and company blogs. We seek ways to automate network and computer system reconnaissance.

Because information gathering and analysis absorb a large portion of the time spent on a penetration test, automating this process could significantly reduce the total time of the test or free up more time for the assessment, planning, gaining access, and collection phases. The proposed approach combines natural language processing (NLP), computer vision, optical character recognition (OCR), and machine learning and other artificial intelligence (AI) techniques to identify a wide variety of details which, in aggregate, can be used to build a profile of the company, its employees, its security posture, and even the software and hardware components of its networks and computers.

The intended outcome of this study is to prove the viability of using these technologies to automatically identify relevant target information. If successful, a proof of concept will be developed to demonstrate fully automated reconnaissance of a target organization.”
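To make the aggregation idea concrete, here is a minimal Python sketch of how findings from separate extractors might be merged into per-subject profiles. It assumes the NLP, computer vision, and OCR stages already emit structured findings; the `Finding` record, its fields, the `build_profiles` function, and the sample data are all hypothetical, and the conflict rule (keep the highest-confidence value and list the sources that agree with it) is just one simple choice, not a description of our implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One detail extracted from a public source (hypothetical schema)."""
    subject: str       # who or what the detail is about, e.g. an employee or "company"
    attribute: str     # e.g. "os", "browser", "pet_name"
    value: str
    source: str        # where it was found, e.g. "cv:screenshot-1"
    confidence: float  # extractor's confidence in [0, 1]


def build_profiles(findings):
    """Aggregate findings into per-subject profiles.

    For each (subject, attribute) pair, keep the highest-confidence value and
    record every source that reported that same value as corroboration.
    """
    best = {}                   # (subject, attribute) -> best Finding so far
    sources = defaultdict(set)  # (subject, attribute, value) -> sources seen
    for f in findings:
        key = (f.subject, f.attribute)
        current = best.get(key)
        if current is None or f.confidence > current.confidence:
            best[key] = f
        sources[(f.subject, f.attribute, f.value)].add(f.source)

    profiles = defaultdict(dict)
    for (subject, attribute), f in best.items():
        profiles[subject][attribute] = {
            "value": f.value,
            "confidence": f.confidence,
            "sources": sorted(sources[(subject, attribute, f.value)]),
        }
    return dict(profiles)


# Invented example findings for one hypothetical employee.
findings = [
    Finding("alice", "browser", "Safari", "cv:screenshot-1", 0.7),
    Finding("alice", "os", "OS X", "cv:screenshot-1", 0.6),
    Finding("alice", "browser", "Safari", "nlp:blog-post-3", 0.9),
    Finding("alice", "browser", "Chrome", "nlp:forum-post-8", 0.4),
]
profiles = build_profiles(findings)
print(profiles["alice"]["browser"])
# {'value': 'Safari', 'confidence': 0.9, 'sources': ['cv:screenshot-1', 'nlp:blog-post-3']}
```

Keeping the sources alongside each value is the point of the exercise: a detail corroborated by independent sources (here, a screenshot and a blog post) deserves more weight than any single extractor’s best guess.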

Some organizations certainly already have these capabilities. Google, for example, extracts a great deal of information from images by using computer vision to recognize the objects in them. Uploading a screenshot of Facebook’s website to Google’s image search to look for similar screenshots seems modestly successful:

Google recognizes this as a screenshot of Facebook. A trained human, however, recognizes it as Facebook’s login page in the Safari web browser on OS X, all of which is useful targeting information. Perhaps Google knows more than it is letting on. If Google can recognize a Facebook screenshot, can we automate this kind of recognition as an early warning system?

It’d be great to run a query like “SELECT * FROM EMPLOYEES WHERE CAT_NAME IS NOT NULL” against an organization you’re pen-testing or trying to protect. CAT_NAME could be populated automatically by reviewing employees’ Twitter posts for cat pictures and names. If you were a Fortune 500 company trying to protect your employees’ passwords, wouldn’t a tool that automatically detects vulnerabilities exposed through your employees’ social media presence be worthwhile? Food for thought. Twitter is helpful:
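A toy version of the CAT_NAME idea fits in a short Python script using the standard-library `sqlite3` and `re` modules. The posts, the `EMPLOYEES` schema, and the helper names (`extract_cat_name`, `build_db`) are invented for illustration, and a single regular expression stands in for the NLP and image recognition a real tool would need. The script also shows why the filter must be written `CAT_NAME IS NOT NULL`: in SQL, comparing any value to NULL with `!=` yields NULL (unknown) rather than true, so `WHERE CAT_NAME != NULL` returns no rows at all.

```python
import re
import sqlite3

# Hypothetical public posts, invented for illustration only.
POSTS = {
    "alice": ["Meet my cat Whiskers, she owns the couch now", "Great conference today!"],
    "bob": ["Coffee first, then code"],
    "carol": ["Our cat Mittens knocked the laptop off the desk again"],
}

# Deliberately naive pattern: the word "cat", optional comma, then a capitalized word.
# A real system would need NLP and computer vision, not one regex.
CAT_PATTERN = re.compile(r"\bcat,?\s+([A-Z][a-z]+)")


def extract_cat_name(posts):
    """Return the first cat name mentioned in any post, or None."""
    for text in posts:
        match = CAT_PATTERN.search(text)
        if match:
            return match.group(1)
    return None


def build_db(posts_by_employee):
    """Load one row per employee into an in-memory EMPLOYEES table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE EMPLOYEES (NAME TEXT PRIMARY KEY, CAT_NAME TEXT)")
    for name, posts in posts_by_employee.items():
        # CAT_NAME is stored as NULL when no cat name was found.
        conn.execute(
            "INSERT INTO EMPLOYEES (NAME, CAT_NAME) VALUES (?, ?)",
            (name, extract_cat_name(posts)),
        )
    return conn


conn = build_db(POSTS)

# "!= NULL" evaluates to NULL for every row, so the WHERE clause keeps nothing.
wrong = conn.execute("SELECT * FROM EMPLOYEES WHERE CAT_NAME != NULL").fetchall()

# IS NOT NULL is the correct test for "has a value".
right = conn.execute(
    "SELECT NAME, CAT_NAME FROM EMPLOYEES WHERE CAT_NAME IS NOT NULL ORDER BY NAME"
).fetchall()

print(wrong)  # []
print(right)  # [('alice', 'Whiskers'), ('carol', 'Mittens')]
```

Defensively, the same query run over your own organization’s public footprint is exactly the early-warning check described above: every row it returns is a password hint someone has already published.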

Relevance: Information leaks and risks are all around us. Seemingly innocuous information that can be used by an adversary or in a penetration test is put on the Internet all the time. This capability can help employees and ordinary citizens understand the risk of oversharing, or of using their cat’s name as a password.

Risk: Developing this technology and putting it in the hands of “attackers” can boost the effectiveness of attacks. It is clearly dual-use.

Public Good: Organizations large and small need to understand the multi-dimensional risks they face in the information domain. Not building it favors large organizations and, potentially, adversaries, while building it favors individuals and small organizations.

Prior Art: There is, of course, some prior art in this domain, but nothing as mature as a tool like Metasploit. Whether Metasploit has inherently improved the security of the Internet is outside the scope of this post, although an argument can certainly be made that it has. We’d love to build a broadly usable tool that automates this process and puts that power in the hands of individuals.

We’ve been building parts of this for several years, working on computer vision, natural language processing, artificial intelligence, and machine learning capabilities that can be combined to enable this technology and, hopefully, put it in the hands of red teams and blue teams. We are currently in the early stages of bringing this idea to reality, and we welcome discussion on the topic. Feel free to comment, or contact us if you’d like to learn how Digital Operatives can help automate your reconnaissance or improve your cybersecurity program, or if you’re interested in helping us build this.