
The Engine Behind Security Intelligence Explained

Posted: 18th December 2019
By: STAFFAN TRUVÉ

Recorded Future captures all the information it has gathered from the internet over more than a decade and makes it available for analysis in a structured and organized way. We call this the Security Intelligence Graph, and it is at the heart of all services offered by Recorded Future.

Having all information readily available in the Security Intelligence Graph offloads a tremendous amount of work from analyst teams. It could take an organization thousands of man hours to build out a fraction of what is now available, and that time can instead be spent on analysis. By adding their own analyst notes, security teams can even connect their own findings to the Security Intelligence Graph. Navigation in the graph is what powers the easy pivoting between different views in the Recorded Future® Platform, and relationships in the graph underlie the risk score calculations that enable analysts to make quick, informed decisions.

To make full use of Recorded Future, it helps to have a good understanding of our underlying data model and design philosophies — explaining this is the purpose of this blog. The following is an excerpt from our Security Intelligence Graph white paper. To read the full white paper, download your complimentary copy today.

The Security Intelligence Graph Explained

Just as many industrial companies today are creating “digital twins” of their products, we aim to build a digital twin of the world, representing all entities and events that are talked about on the internet — with a particular focus on threat intelligence. The Security Intelligence Graph is that representation of the world, and our goal is to make this information available at the fingertips of all security analysts to help them work faster and better.


A very early (2010) visualization of the Security Intelligence Graph, in this case showing discussions on the internet related to Barack Obama. (See related video)

A graph is an abstract mathematical object, consisting of nodes (representing entities, or “nouns” of the world) and edges (representing relationships between these entities). Graphs can be represented using different technologies, including both relational and graph databases, but the important thing is that graph algorithms operate on the abstract notion of nodes and edges. These algorithms allow us to find the shortest path between two cities, identify key influencers in a social network, or calculate the risk level of an internet domain name, to name a few examples.
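As a small illustration of the point about graph algorithms, here is a minimal sketch of a shortest-path search on an abstract node-and-edge structure. The city names and connections are made up for the example, and breadth-first search is only one of several ways to find a shortest path; nothing here is meant to describe how Recorded Future stores or queries its graph.

```python
from collections import deque

# Hypothetical toy graph: nodes are cities, edges are direct connections.
EDGES = [
    ("Gothenburg", "Stockholm"),
    ("Stockholm", "Helsinki"),
    ("Gothenburg", "Copenhagen"),
    ("Copenhagen", "Berlin"),
    ("Helsinki", "Berlin"),
]


def build_adjacency(edges):
    """Turn an undirected edge list into a node -> set-of-neighbors mapping."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns a shortest path by edge count, or None."""
    if start not in adjacency or goal not in adjacency:
        return None
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(adjacency[node]):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


adjacency = build_adjacency(EDGES)
path = shortest_path(adjacency, "Stockholm", "Berlin")
print(path)
```

The search only ever touches the adjacency mapping, which is the sense in which the algorithm operates on nodes and edges rather than on any particular storage technology.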

Coming from a computer science and artificial intelligence background, it was natural for us to organize information in a Security Intelligence Graph made up of two parts: an ontology graph and an event graph. The ontology graph is used to represent slower-changing information about the world with a high degree of reliability. The event graph represents fast-changing and evolving information gathered from the internet. The two parts of the graph are connected by edges representing the relationships between events and the entities related to them. For example, a cyberattack event node can be related to an attacker entity node and a target entity node.
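One way to picture the two parts and the edges between them is the sketch below. The class names, field names, and sample values are all hypothetical, chosen only to mirror the cyberattack example; the actual data model is not described at this level of detail here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """A node in the slower-changing ontology graph (hypothetical shape)."""
    entity_id: str
    entity_type: str
    name: str


@dataclass
class Event:
    """A node in the fast-changing event graph (hypothetical shape)."""
    event_id: str
    event_type: str
    # role name -> entity_id: the edges that connect the two parts
    roles: dict = field(default_factory=dict)


# Made-up entities standing in for the ontology graph.
entities = {
    "e1": Entity("e1", "threat_actor", "Example Actor"),
    "e2": Entity("e2", "company", "Example Corp"),
}

# A cyberattack event linked to an attacker entity and a target entity.
attack = Event("ev1", "cyberattack", roles={"attacker": "e1", "target": "e2"})


def entities_for(event, entity_index):
    """Resolve an event's role edges to the entity nodes they point to."""
    return {role: entity_index[eid] for role, eid in event.roles.items()}


resolved = entities_for(attack, entities)
print(resolved["attacker"].name, "->", resolved["target"].name)
```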

Nodes and edges in the Security Intelligence Graph can also have attributes. For reference nodes, these attributes describe metadata such as the original source, the media type, and the publication and event times of a reference. References also have attributes computed from the text itself, such as sentiment scores. The entity nodes can have a large number of attributes, including Recorded Future risk scores and entity type-specific attributes such as the birth date of a “person” entity or the population of a “city” entity.

A security analyst can use these attributes to quickly select relevant subsections of the Security Intelligence Graph (e.g., by searching for events mentioned only on social media sources or company entities belonging only to a certain industry category).
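The kind of attribute-based selection described above could look roughly like this. The attribute names, media type labels, and sentiment scale are assumptions made for illustration, not the platform's actual schema or query interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    """A reference node with a few of the attributes mentioned in the text."""
    ref_id: str
    source: str
    media_type: str   # made-up labels: "social_media", "news", "forum"
    sentiment: float  # made-up scale from -1.0 to 1.0


# Invented sample data.
REFERENCES = [
    Reference("r1", "example-microblog", "social_media", -0.6),
    Reference("r2", "example-newswire", "news", 0.1),
    Reference("r3", "example-microblog", "social_media", 0.4),
    Reference("r4", "example-forum", "forum", -0.8),
]


def select(references, **criteria):
    """Keep the references whose attributes equal every given criterion."""
    return [
        ref for ref in references
        if all(getattr(ref, name) == value for name, value in criteria.items())
    ]


social = select(REFERENCES, media_type="social_media")
print([ref.ref_id for ref in social])
```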

The Security Intelligence Graph and its two components, the event graph and the ontology graph.

The Security Intelligence Graph allows both human analysts and algorithms to seamlessly pivot through complex relationships (e.g., when working with the Diamond Model of Intrusion Analysis). The image below shows an example starting with a file hash and finding a related file, the vulnerability that file is related to, malware that exploits that vulnerability, a threat actor utilizing that malware, and in the end, a government organization associated with that threat actor.


A subsection of the Security Intelligence Graph showing how it allows pivoting between hashes, files, vulnerabilities, malware, and threat actors.
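The pivot sequence described above (hash, file, vulnerability, malware, threat actor, government organization) can be sketched as a typed walk over a small graph. All node identifiers and links below are invented placeholders, and the `pivot` helper is a hypothetical illustration rather than a platform function.

```python
# Invented nodes, each tagged with an entity type.
NODE_TYPES = {
    "hash:aaa111": "hash",
    "file:dropper.bin": "file",
    "vuln:EXAMPLE-0001": "vulnerability",
    "malware:ExampleRAT": "malware",
    "actor:ExampleGroup": "threat_actor",
    "org:ExampleAgency": "government_organization",
}

# Invented undirected relationships between those nodes.
LINKS = [
    ("hash:aaa111", "file:dropper.bin"),
    ("file:dropper.bin", "vuln:EXAMPLE-0001"),
    ("vuln:EXAMPLE-0001", "malware:ExampleRAT"),
    ("malware:ExampleRAT", "actor:ExampleGroup"),
    ("actor:ExampleGroup", "org:ExampleAgency"),
    ("malware:ExampleRAT", "file:dropper.bin"),  # an extra cross-link
]


def neighbors(node):
    """Return every node directly linked to the given node."""
    linked = set()
    for a, b in LINKS:
        if a == node:
            linked.add(b)
        elif b == node:
            linked.add(a)
    return linked


def pivot(start, type_chain):
    """Follow links step by step, keeping only nodes of the wanted type."""
    frontier = {start}
    trail = [sorted(frontier)]
    for wanted in type_chain:
        frontier = {
            candidate
            for node in frontier
            for candidate in neighbors(node)
            if NODE_TYPES[candidate] == wanted
        }
        trail.append(sorted(frontier))
    return trail


trail = pivot(
    "hash:aaa111",
    ["file", "vulnerability", "malware", "threat_actor",
     "government_organization"],
)
for step in trail:
    print(step)
```

Filtering by type at each step is what keeps the walk on the intended path even though the file node is also linked directly to the malware node.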

Intelligence Is a Volume Game

Over time, our Security Intelligence Graph has grown, and today consists of more than 63 billion reference nodes and more than four billion entity nodes, all connected by hundreds of billions of edges. On a daily basis, we typically add more than 40 million new reference nodes and three million new entity nodes to the graph. Computed attributes such as risk scores are all updated in real time for more than one billion entity nodes.

Building the Security Intelligence Graph

The Security Intelligence Graph is constructed and updated from a number of sources. All text sources that are harvested by Recorded Future are analyzed using natural language processing (NLP) to extract entities, events, and temporal information. This information is used to create new, or update existing, entity nodes in the graph and to create new event reference nodes and edges between the entity and reference nodes. Technical sources are also used to create entity nodes, update their attributes, and sometimes create new reference nodes.
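The create-or-update step described above can be sketched as follows. The sketch assumes the NLP analysis has already run and produced a simple record per document; that record format, the in-memory graph layout, and the `mentions` counter are all hypothetical and stand in for whatever the real pipeline uses.

```python
def new_graph():
    """An empty in-memory graph: entity nodes, reference nodes, and edges."""
    return {"entities": {}, "references": {}, "edges": set()}


def ingest(graph, extraction):
    """Add one reference node and create or update the entities it mentions."""
    ref_id = extraction["ref_id"]
    graph["references"][ref_id] = {
        "source": extraction["source"],
        "event_type": extraction["event_type"],
    }
    for found in extraction["entities"]:
        # Create the entity node if it is new, otherwise reuse the existing one.
        node = graph["entities"].setdefault(
            found["id"],
            {"type": found["type"], "name": found["name"], "mentions": 0},
        )
        node["mentions"] += 1
        # Edge between the reference node and the entity node.
        graph["edges"].add((ref_id, found["id"]))


# Invented output of a hypothetical NLP step for two documents.
extractions = [
    {
        "ref_id": "r1",
        "source": "example-forum",
        "event_type": "cyberattack",
        "entities": [
            {"id": "e1", "type": "malware", "name": "ExampleRAT"},
            {"id": "e2", "type": "company", "name": "Example Corp"},
        ],
    },
    {
        "ref_id": "r2",
        "source": "example-newswire",
        "event_type": "cyberattack",
        "entities": [
            {"id": "e1", "type": "malware", "name": "ExampleRAT"},
        ],
    },
]

graph = new_graph()
for extraction in extractions:
    ingest(graph, extraction)

print(len(graph["entities"]), len(graph["references"]), len(graph["edges"]))
```

The second document mentions an entity that already exists, so it adds a reference node and an edge but no new entity node.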

Ontological data is used to update the ontology graph with information about relationships between geographical entities, and between person entities and the company entities in which they have a role. Information from the NLP analysis of text is sometimes also used to update ontological relationships. Recorded Future's analytics machinery then computes risk scores for entities, which are stored as attributes of the entity nodes in the graph.

Future Evolution of the Security Intelligence Graph

The Security Intelligence Graph continues to grow. Millions of nodes and edges are added every day, reflecting the living, breathing heartbeat of the internet. In addition to this organic growth, we constantly add new sources, new entity and event types, and new analytics.

At Recorded Future, our goal is for the Security Intelligence Graph to always be the most comprehensive source for security analysts who aim to protect their organizations or nations from present or future threats. Our belief is that threat analyst centaurs, the seamless combination of algorithms and humans, are the only way to achieve this goal, and the Security Intelligence Graph is the fabric enabling that collaboration.

