A Brief Guide to Open Source Intelligence (OSINT)

Imagine yourself as a Roman scholar, tasked with finding a particular papyrus scroll in the Library of Alexandria. You approach Zenodotus, the first librarian of this great library, and seek his help. Eager to assist, he points you to one of the many rooms across which the manuscripts are organised. As you walk into the appropriate room, you glance over the tags attached to each manuscript. The tags are quite informative! They contain information such as the author, title and subject of the manuscript. Within a few minutes, you walk out of the library holding the desired manuscript in your hand, impressed by its organisational structure.

Information overload is widely regarded as having begun with the exponential growth of the internet, yet humanity has been wary of it since at least Roman times. We have all experienced the frustration of combing through search engine results to find that one article or video. However, we possess a variety of great tools that other civilizations did not. Collectively, we can describe the techniques used to gather information from publicly available sources as Open Source Intelligence.

Going back to the story mentioned above, did you notice that a few details are analogous to how we store data electronically? Instead of rooms, we have hard drives. Instead of shelves, we have folders. Instead of tags, we have metadata. Fascinating how the technology changes, yet the idea remains the same. Open Source Intelligence, simply put, exploits the metadata to find content.
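
To make the analogy concrete, here is a minimal sketch of finding content through its metadata rather than its contents. The `Manuscript` fields mirror the tags from the story (author, title, subject); the catalogue entries and the `search` helper are my own illustrative assumptions, not a real library system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Manuscript:
    """A catalogue entry: the 'tag' attached to a manuscript."""
    author: str
    title: str
    subject: str


def search(catalogue, **criteria):
    """Return entries whose metadata matches every given field (case-insensitive)."""
    results = []
    for entry in catalogue:
        if all(getattr(entry, field).lower() == value.lower()
               for field, value in criteria.items()):
            results.append(entry)
    return results


# Hypothetical catalogue; the entries are for illustration only.
CATALOGUE = [
    Manuscript("Homer", "Iliad", "epic poetry"),
    Manuscript("Homer", "Odyssey", "epic poetry"),
    Manuscript("Euclid", "Elements", "geometry"),
]

matches = search(CATALOGUE, author="homer", subject="Epic Poetry")
print([m.title for m in matches])
```

Notice that the search never opens a manuscript. It only reads the tags, which is exactly what Zenodotus' rooms and labels allowed you to do.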

When talking about open source intelligence, it is important to remember that there are many routes capable of taking you to your destination. Just as a painting can be interpreted in a variety of ways, an objective can be accomplished using various techniques. I'll narrate multiple stories throughout this article to showcase how practical open source intelligence can be.

Open Source Intelligence in World War II

When the Office of Strategic Services was established on June 13, 1942 by President Franklin D. Roosevelt in response to his concern regarding American intelligence deficiencies, it was met with little support from competing agencies. In the thick of war, it had to prove itself.

OSS chief William Donovan knew from experience that experts from other domains such as economics, geography and psychology could provide valuable insights that would otherwise have been overlooked. So, when the question of approximating German manpower arose (that is, how much manpower Germany could put into the battlefield and how much into the economy), the R&A (Research and Analysis) wing of the OSS got to work.

Despite their achievements, the analysts of the R&A wing were not spies. Furthermore, getting information out of Germany was an arduous task and foreign publications such as magazines, journals and other literature didn't exactly provide the best intelligence. However, there were a few sources they could tap into.

These sources, later revealed by the chief himself, would turn out to be small-town newspapers from Germany itself. These were obtained via "underground means". The newspapers carried obituaries of German soldiers killed in action. Knowing that the ratio of enlisted men to officers killed is fairly constant across armies, the analysts were able to approximate the strength of the German army by 1943. It would later be found that this estimate was remarkably close to the real figure.
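
The arithmetic behind such an estimate can be sketched in a few lines. To be clear, this is my own simplified illustration of the ratio idea, not the R&A wing's actual method: the function name, the coverage fraction and every number below are invented for the example.

```python
def estimate_total_killed(officer_obituaries, sample_coverage, enlisted_per_officer):
    """Scale a sampled officer count up to an overall casualty estimate.

    officer_obituaries   -- officer deaths counted in the sampled newspapers
    sample_coverage      -- assumed fraction of all officer deaths the sample captures (0-1]
    enlisted_per_officer -- assumed ratio of enlisted men killed per officer killed
    """
    if not 0 < sample_coverage <= 1:
        raise ValueError("sample_coverage must be in (0, 1]")
    officers_killed = officer_obituaries / sample_coverage
    enlisted_killed = officers_killed * enlisted_per_officer
    return round(officers_killed + enlisted_killed)


# All three inputs are hypothetical.
print(estimate_total_killed(officer_obituaries=500,
                            sample_coverage=0.25,
                            enlisted_per_officer=20))
```

The point is that one small, observable number (officer obituaries) becomes a large, hidden one once you know a stable ratio.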

Yet the insights would not stop here! In another newspaper, specifically in the society column, an item was published which inadvertently revealed the location of a division the OSS had been seeking.

Realizing the power of literature, they pored over pamphlets, periodicals and scientific journals to determine the total reorganization of the German armaments and munition industries by the end of 1942. Such endeavors would also confirm the existence of German submarine oil tankers, along with photographs showing them refueling a submarine at sea.

In one case, OSS economists would be flown into the battlefield where they would note down the serial numbers of captured German tanks. Armed with the knowledge that such numbers were consecutively inscribed and never varied, they were able to estimate tank production.
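
This kind of serial-number reasoning is known in statistics as the "German tank problem". Below is a minimal sketch using the textbook estimator N ≈ m + m/k - 1, where m is the largest serial number observed and k is the number of serials in the sample. I am assuming serials start at 1 and run without gaps, and I am not claiming this is the exact formula the OSS economists used; the sample values are made up.

```python
def estimate_total_produced(serial_numbers):
    """Estimate total production from a sample of observed serial numbers.

    Uses the textbook estimator m + m/k - 1, where m is the largest serial
    seen and k is the number of distinct serials in the sample. Assumes
    serials start at 1 and are assigned consecutively with no gaps.
    """
    observed = set(serial_numbers)
    if not observed:
        raise ValueError("need at least one serial number")
    if min(observed) < 1:
        raise ValueError("serial numbers are assumed to start at 1")
    k = len(observed)
    m = max(observed)
    return m + m / k - 1


# Hypothetical serial numbers noted from captured tanks.
sample = [19, 40, 42, 60]
print(estimate_total_produced(sample))  # 60 + 60/4 - 1 = 74.0
```

Intuitively, the largest serial seen is a lower bound, and the average gap between observed serials (m/k) tells you how far past that bound the true total probably lies.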

Before closing its doors in 1945, the OSS would carry out more than a hundred missions to determine troop movements, hidden factories and storage dumps, the treatment of prisoners of war, and even the extent to which the Nazis controlled the civilian population.

In the digital age, we have the luxury of using the Internet to read a variety of publications. No longer bound to a particular region, most information is displayed within a few keystrokes. Depending on your requirement, different sources will have to be surveyed, but here are a few tools to get you started :

Google Books

CORE : Open Access Research Papers

arXiv : An open archive of scholarly articles

Elephind : Search historical newspapers

The International Criminal Court

The 10th of February 2006 would be a historic day for the International Criminal Court, as it issued an arrest warrant against Mr. Thomas Lubanga Dyilo (whose case would later become the court's first conviction as well). The primary reason behind the warrant was the evidence found against Mr. Lubanga for conscripting child soldiers to participate in hostilities across regions of the Democratic Republic of Congo.

The Office of the Prosecutor had submitted video evidence that showed children in military clothing being inspected by Mr. Lubanga. This, along with supporting evidence provided by AJEDI-ka (a non-profit organization against the use of children as soldiers in the DRC), further led the judges towards conviction. The videos provided by AJEDI-ka had been available on YouTube since 2008.

However, this is not the sole case where openly available information has been brought before a court. Pictures and videos posted on Facebook, Instagram and Twitter have been used to link defendants to other crimes and, in some cases, to provide footage of a crime taking place.

Open source intelligence, as I hope you now begin to see, is beyond just "tricks". In times of need, it can provide direct evidence against perpetrators and bring them to justice. In such situations, images and videos are crucial. So, here are a few resources that can aid your search :

TinEye - A Reverse Image Search Engine

Ghiro - An automated image forensics tool

watchframebyframe - Watch YouTube videos frame by frame

In addition to the tools shown above, it is also important to search for images through various search engines such as Google, Bing and Yandex. As of now, Yandex provides the most accurate results. Another tool of interest is a custom search engine. A custom search engine performs searches against a predefined list of websites. You can find more information here.

Disaster Management

In 2017, a category 4 hurricane tore through Texas and Louisiana, causing an estimated $125 billion in damage. Homes were destroyed, schools flooded, and in some areas strong winds tore roofs off residences. Even worse, people were stranded across regions. Search and rescue not only needed to come first, it had to be quick as well.

Rescue operations require an awareness of the terrain. The coast guard is known to use a combination of historical and real-time data to predict potential drifting paths of individuals lost at sea. However, when faced with a hurricane, time is of the essence. Not only do you require accurate data, but you need lots of it.

The best source of such information would definitely be the people, and boy did they rise up. Esri, a geographic information system (GIS) company, decided to provide access to its services pertaining to the affected areas. It automatically geo-tagged photos uploaded to its servers by users across Houston and surrounding areas to quickly display a heat map that teams could use to respond effectively.
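
To give a feel for how geo-tagged reports turn into a heat map, here is a minimal sketch that buckets latitude/longitude points into grid cells and counts them. This is my own simplification, not Esri's implementation: the `heat_map` function, the 0.1 degree cell size and the sample coordinates (roughly around Houston) are all assumptions for illustration.

```python
import math
from collections import Counter


def heat_map(points, cell_size_deg=0.1):
    """Count points per grid cell.

    Each (lat, lon) pair is mapped to the integer indices of the grid cell
    that contains it; the returned Counter maps cell -> number of reports.
    """
    counts = Counter()
    for lat, lon in points:
        cell = (math.floor(lat / cell_size_deg), math.floor(lon / cell_size_deg))
        counts[cell] += 1
    return counts


# Hypothetical photo locations; not real report data.
reports = [
    (29.76, -95.37),
    (29.74, -95.39),
    (29.55, -95.13),
]

cells = heat_map(reports)
hottest_cell, report_count = cells.most_common(1)[0]
print(hottest_cell, report_count)
```

Cells with the highest counts are where the most reports cluster, which is where a response team would likely want to look first.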

Simultaneously, local governments worked with tech organizations such as Sketch City to distribute Google Sheets where residents could fill in their requirements and location. Houston Crowdsource Team, meanwhile, built a website where critical information could be entered by the individuals on the ground: the residents themselves. These crowdsourced maps displayed shelters across towns, so that residents could walk, swim or boat towards safety.

Insurance companies wanted insights as well. After all, they would be the ones paying the immediate bills! DirectGlobal, another GIS company, would provide its services towards assessing the damage done, ensuring safety of on-the-ground staff, and facilitating claims immediately when possible.

The federal government didn't fall behind either. The Department of Homeland Security, as part of its HIFLD subcommittee, put up its own website where it provided authoritative geospatial data. Firefighters, first responders, and medical staff all used these resources to estimate needs and ready themselves accordingly.

Even DMVs rushed to help! With data on the number of vehicles registered for evacuation assistance, they could direct resources to areas where vehicles were scarce, or call upon local residents to carpool whenever possible to get out of dangerous areas.

As an open source intelligence enthusiast, there are quite a few resources you can tap into when it comes to maps :

Google Maps


Bing Maps

Before you cry out saying how these are generic resources, remember that the maps are going to be the same. What matters is the frequency with which they are updated. Sometimes, Bing comes first.

There is another project that I would like to mention here, namely the MAGE server from the NGA (National Geospatial-Intelligence Agency). Its GitHub page describes it as follows: "The Mobile Awareness GEOINT Environment, or MAGE, provides mobile situational awareness capabilities. The MAGE web client allows you to create geotagged field reports that contain media such as photos, videos, and voice recordings and share them instantly with who you want. Using the HTML Geolocation API, MAGE can also track users locations in real time." I will certainly be looking into this!

Profit and Plunder

CVE-2019-11510 is one of those vulnerabilities that widens eyes immediately. If exploited, it allows unauthenticated users to read private keys and user passwords from a Pulse Secure VPN server. Security researcher Orange Tsai discovered it in March 2019, and Pulse Secure released a patch in April 2019 in response.

Six months in, Bad Packets, a threat intelligence company, reported that an IP address in Estonia had been found mass scanning the internet for Pulse Secure servers vulnerable to this exploit (note that the origin of an IP address is not indicative of the actual location of a threat actor).

Multiple clients were notified of this vulnerability. Among them was Travelex, a foreign exchange company based in London. Unfortunately, they applied the patch a little too late. Hacked on New Year's Eve, they were eventually forced to pay $2.3 million to the ransomware gang in April.

In January of this year, Bad Packets performed its own mass scan to determine whether any servers were left unpatched. There were. To be precise, 3,826 Pulse Secure servers were still vulnerable.

This isn't the only CVE that has caused such havoc, though. In 2014, CVE-2014-0160 came out and was given the nickname "Heartbleed". It exploited a bug in OpenSSL, a widely used implementation of SSL/TLS, rather than a flaw in the protocol itself, and nearly a third of all websites were vulnerable to it. CVE-2015-3456, nicknamed "Venom", was a way for attackers to break out of a VM.

Now, what does this have to do with open source intelligence? It is simple - what is available to system administrators and network security engineers is also available to threat actors. They can use the same techniques as those who defend systems, and they have to be right only once.

This doesn't mean we stop sharing, though. We have to share, because despite the various industries across which companies operate and the plethora of problems they face every day, we're all in this together. Keeping that in mind, here are a few resources that you can use to keep systems secure :

National Vulnerability Database

CVE Details

REScure Threat Intelligence Feed
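
If you would rather query the National Vulnerability Database from a script than from the browser, here is a minimal sketch using only the Python standard library. It assumes the NVD CVE API 2.0 endpoint and its `cveId` parameter, and the JSON field names in `fetch_summary` reflect my understanding of that API's response layout, so check the official documentation before relying on them. Both function names are my own.

```python
import json
import re
import urllib.parse
import urllib.request

# Assumed endpoint of the NVD CVE API 2.0.
NVD_API = "https://services.nvd.nist.gov/rest/json/cves/2.0"
CVE_ID = re.compile(r"^CVE-\d{4}-\d{4,}$")


def nvd_url(cve_id):
    """Build the lookup URL for a single CVE, validating the ID format first."""
    cve_id = cve_id.strip().upper()
    if not CVE_ID.match(cve_id):
        raise ValueError(f"not a valid CVE identifier: {cve_id!r}")
    return NVD_API + "?" + urllib.parse.urlencode({"cveId": cve_id})


def fetch_summary(cve_id, timeout=30):
    """Return the English description of a CVE, or None if none is found.

    Makes a network request. The 'vulnerabilities' / 'cve' / 'descriptions'
    keys are assumptions about the response layout.
    """
    with urllib.request.urlopen(nvd_url(cve_id), timeout=timeout) as response:
        data = json.load(response)
    for item in data.get("vulnerabilities", []):
        for description in item.get("cve", {}).get("descriptions", []):
            if description.get("lang") == "en":
                return description.get("value")
    return None


print(nvd_url("cve-2019-11510"))
```

Calling `fetch_summary("CVE-2019-11510")` would then retrieve the description over the network. Be gentle with public APIs: space out your requests and read the provider's usage terms.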

Open Source Intelligence Techniques

I am aware that I have not detailed how certain OSINT techniques are carried out in practice. This is by design, as I wanted to move away from the "OSINT is pretty much browsing through Facebook timelines" view (yes, sometimes that's all you can do, but it shouldn't be our first thought when thinking about OSINT!). Instead, what I'll do is link to various websites that teach you how some of these techniques are carried out, as they have done a better job than I could.

Google Dorking : Using Google (and other search engines) to carry out reconnaissance.

Shodan : A search engine for finding devices

OSINT : How to find information on anyone.

Bellingcat : The Home of Online Investigations.
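
As a small taste of the first item in the list above, Google Dorking mostly comes down to combining search operators such as `site:`, `filetype:` and `intitle:` into one query. Here is a minimal sketch of a query builder. The `build_dork` and `search_url` helpers are my own hypothetical names, `example.com` is a placeholder domain, and operator support varies between search engines.

```python
import urllib.parse


def build_dork(terms="", site=None, filetype=None, intitle=None, exclude=()):
    """Combine common search operators into a single query string."""
    parts = []
    if terms:
        parts.append(terms)
    if site:
        parts.append(f"site:{site}")
    if filetype:
        parts.append(f"filetype:{filetype}")
    if intitle:
        parts.append(f'intitle:"{intitle}"')
    # A leading minus sign excludes results containing that word.
    parts.extend(f"-{word}" for word in exclude)
    return " ".join(parts)


def search_url(query):
    """Turn a query string into a Google search URL."""
    return "https://www.google.com/search?" + urllib.parse.urlencode({"q": query})


query = build_dork('"annual report"', site="example.com", filetype="pdf")
print(query)
print(search_url(query))
```

Only run such searches against targets you are permitted to research, and open the resulting URL in a browser rather than scraping results automatically.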

Open Source Intelligence Challenges

There are three main challenges when it comes to open source intelligence: volume, jurisdiction, and attribution. Volume can be a great burden, especially if the target has a common name such as "John Smith" or "Jane Doe". The job of researching a target is much easier if profiles are set to 'public', yet this is becoming a rarer occurrence as many social media users actively change their settings to 'private'.

In places like the EU, where data protection laws are more stringent, you need to be careful not only about how you collect data but also about how you process it. Furthermore, you need to have a sound legal basis for collection in the first place. I strongly encourage you to check out this blog as a starting point and to understand the laws relevant to your region as well.

Attribution is a different issue altogether but with regard to OSINT we can summarise it according to the old adage "Don't believe everything you see on the internet". Verify what you see, hear and read to the best of your abilities. Ask for help from the professionals on Twitter.

Additional Resources

There are a few extra resources that I have linked to below, but please remember that techniques and tools change over time. This doesn't mean you should switch tools every other day, but do keep yourself informed about what interests you.

awesome-osint : A curated list of OSINT resources.

osint.link : A variety of OSINT resources.

Week in OSINT : Superb weekly roundup of open source intelligence techniques and tools.

OSINTCurious : OSINT news, blogs, instructional videos and podcast.

No OSINT guide would be fun without a challenge, right? I give you the following questions then :

  1. Can you find out in which magazine, and in which issue, the famous V-J Day photo was published?
  2. Where was the photographer born?
  3. Who is the lady in the photograph?

Lastly, I would like to add that the techniques and tools mentioned here are for educational purposes. By reading this post, you agree to the notion that the author will not be held liable for any damages caused in any form whatsoever. The information is provided on an "as is" basis and you agree that you use it at your own risk.

The awesome image used in this article is called Lost In Space and was created by Jared Shofner.