Introduction to CyberSecurity Warheads

“Ransomware is more about manipulating vulnerabilities in human psychology than the adversary’s technological sophistication”

 James Scott, Sr. Fellow, Institute for Critical Infrastructure Technology

According to Yahoo’s latest disclosure, the security of half a billion Yahoo user accounts was compromised two years ago. Twitter experienced an outage caused by a massive DDoS attack on Dyn. More recently, a data breach at LinkedIn’s Lynda led LinkedIn to contact 9.5 million users as a precaution. These are only a few of the most recent major cyber-attacks launched against firms small and large.

Cyber security is becoming one of the major challenges for all organizations. They are increasingly concerned about frequent data breaches and are looking for new ways to secure their vulnerable assets.

Traditional Security Information & Event Management (SIEM) tools were unable to handle the large volumes of rapidly generated data. Analyzing large-scale data with the lowest possible latency therefore became the need of the hour, which in turn opened the door to next-generation security tools.

Organizations started looking to invest in a Security Operations Centre (SOC). A SOC is a centralized capability for handling security incidents across thousands of endpoints. It provides tools for data collection, data aggregation, threat detection, advanced analytics and workflow capabilities from a single management area. OpenSOC was the project that emerged as a solution and fulfilled most of these requirements. Cisco’s OpenSOC was designed to help SOC teams detect security threats and hack attempts much more effectively. Later, the OpenSOC model was followed by multiple products such as Apache Metron, Apache Spot, Elastic’s open source Elastic Stack and Splunk.


SIEMs like Apache Metron & Apache Spot are built on the Apache Hadoop stack specifically to process the large volumes of telemetry data generated by the many devices in large infrastructures.

Michael Schiebel, CyberSecurity Strategist at Hortonworks, explains in his blog series “echo ‘hello, world.’” why the rule-based security solutions of traditional SIEMs were not the right approach. He also briefly describes the challenges faced by a SOC analyst and the importance of a single platform that stores telemetry data and processes it using various analysis tools.


Apache Metron

Apache Metron is an open source cybersecurity application framework from Apache, dedicated to providing a next-generation cybersecurity platform that detects security risks in real time. Apache Metron gives organizations the ability to ingest, process and store huge volumes of diverse security data feeds at scale. The Metron framework provides real-time streaming enrichment, integration with threat intelligence feeds, and threat triage capabilities. These capabilities help to process data in real time, detect cyber anomalies, and respond to them rapidly.

Brief History

Between 2005 and 2008, there was a significant increase in malicious activities and cyber attacks. This was the period when Apache Hadoop was still developing, and at the same time there was a scarcity of security professionals. Cisco, being the only organization to possess the required skills, came up with a Security Operations Center as a service offering.

After 2008, with the advent of big data, a huge flow of data was streaming into data centers across the world. Cisco’s use of traditional SIEM tools was in jeopardy. As a result, OpenSOC was born and made use of Apache Hadoop for data analysis.

Between September 2013 and April 2015, Cisco’s Chief Data Scientist James Sirota and the Hortonworks team worked together to create a next-generation managed SOC built on top of open source big data technologies. Development of OpenSOC was later closed.

In December 2015, OpenSOC was submitted and accepted as an Apache Incubator project and renamed to “Apache Metron”.

In April 2016, the Metron community published the first official release, Apache Metron 0.1. The current release at the time of writing is 0.3.0, and the project is still in the incubator. The latest release can be downloaded from the Apache Metron project website.

Brief End to End Architecture

Apache Metron integrates a variety of open source big data technologies to offer a centralized tool for cybersecurity monitoring and analysis. Telemetry sensors generate data and push it into Apache Kafka topics. Metron uses the widely adopted stream processor Apache Storm. A Storm topology takes the raw data and parses it into JSON format, which has a total of eight required fields. The parsed output is fed back into Kafka and then passed to the Storm enrichment topology. The enrichment topology adds further information to the fields, such as location information for an IP address or the relevant DNS information. To keep the process fast, every enrichment is backed by a local cache. Next, the threat intelligence stage draws on various feeds that contain information about malicious IP addresses and checks the generated data against them. Finally, the processed data is indexed using the Elasticsearch or Solr search engine. The Metron stack uses Kibana as the frontend UI to Elasticsearch.

Raw data streaming live over the network can also be captured for analysis using the PCAP (Packet Capture) utility. This data is stored in HDFS and can be replayed at any time. Visit my Slideshare Account to view the presentation.
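The flow described above (parse, enrich with a cached lookup, check against threat intelligence) can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Metron's code: Kafka and Storm are replaced by ordinary function calls, and the lookup tables, field names and function names are all hypothetical.

```python
import json
from functools import lru_cache

# Hypothetical lookup tables standing in for a geo database and a threat intel feed.
GEO_DB = {"203.0.113.7": "NL", "198.51.100.23": "US"}
MALICIOUS_IPS = {"203.0.113.7"}


def parse(raw_line):
    """Parser stage: turn a raw 'timestamp src_ip dst_ip' line into a JSON string."""
    timestamp, src_ip, dst_ip = raw_line.split()
    return json.dumps(
        {"timestamp": int(timestamp), "ip_src_addr": src_ip, "ip_dst_addr": dst_ip}
    )


@lru_cache(maxsize=1024)
def geo_lookup(ip):
    """Enrichment lookup; lru_cache plays the role of the local cache."""
    return GEO_DB.get(ip, "unknown")


def enrich(message):
    """Enrichment stage: add location information for the source IP."""
    event = json.loads(message)
    event["src_country"] = geo_lookup(event["ip_src_addr"])
    return event


def triage(event):
    """Threat intel stage: flag events whose source IP appears in the feed."""
    event["is_alert"] = event["ip_src_addr"] in MALICIOUS_IPS
    return event


def process(raw_lines):
    """Run every raw line through parse -> enrich -> triage."""
    return [triage(enrich(parse(line))) for line in raw_lines]


events = process([
    "1480000000 203.0.113.7 10.0.0.5",
    "1480000001 198.51.100.23 10.0.0.5",
])
```

In this sketch the first event is flagged because its source address appears in the made-up feed, while the second passes through with only the location added. In a real deployment each stage would be a separate topology reading from and writing to Kafka.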


Sensors generally rely on a set of rules, which are much slower than machine learning algorithms. Adopting machine learning could help in creating models for a number of use cases. Apache Metron can also be described as an IoT streaming application for all aspects of cybersecurity.


Apache Spot

Apache Spot, formerly known as Open Network Insight (ONI), was originally developed by Intel Corp with a focus on building a big data analytics platform with machine learning capabilities for cybersecurity use cases. In September 2016, Cloudera and Intel together donated this community-driven, open data model project to the Apache Software Foundation. As an incubator project it got a new life and was renamed Apache Spot. Apache Spot is based on the Cloudera platform.

Currently, the biggest challenges faced by organizations battling cybercrime are collecting data from a large number of data sources and then processing it. Like Apache Metron, Apache Spot’s primary use case is network traffic analysis. Apache Spot also provides the centralized data storage needed for investigation, with query response times measured in seconds and minutes. As a result, Apache Spot can reduce incident MTTR and thus the impact of a breach. Features like context enrichment, noise filtering, whitelisting and heuristics are applied to generate the list of most likely security threats.

Hunting Undetected Threats

Traditional technologies relied on a rules-based approach that was effective only at detecting known threats. It was also observed that the passive threat detection approach used by traditional tools was not sufficient for real-time, continuous ad hoc searches and queries over huge amounts of live data. Machine learning algorithms help to analyze large datasets (i.e. billions of events per day), making Apache Spot work like a forensic tool that records any nefarious or unusual behaviour in the network. It is very much like finding the proverbial needle in a haystack.

What sets it apart most from its peers is its ability to detect new threats in advance, with the help of its supervised and unsupervised machine learning capabilities. Apache Spot uses machine learning as a filter to separate malicious traffic from harmless traffic.
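The idea of using a model as a filter can be illustrated with a very small unsupervised example. The sketch below is a stand-in technique, not Apache Spot's actual model: it scores each (source, destination port) pair by how rarely it occurs and surfaces the rarest ones for an analyst. The traffic data and function names are hypothetical.

```python
from collections import Counter
import math


def rarity_scores(events):
    """Score each distinct event by the negative log of its empirical frequency.

    Rare events get high scores, common events get scores close to zero.
    """
    counts = Counter(events)
    total = len(events)
    return {event: -math.log(count / total) for event, count in counts.items()}


def suspicious(events, top_n=1):
    """Return the top_n rarest distinct events, most unusual first."""
    scores = rarity_scores(events)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical traffic: 90 routine HTTPS flows, 9 DNS flows and 1 odd flow.
traffic = (
    [("10.0.0.5", 443)] * 90
    + [("10.0.0.6", 53)] * 9
    + [("10.0.0.7", 6667)]
)

most_unusual = suspicious(traffic, top_n=2)
```

Here the single flow to port 6667 ranks first and the DNS flows second, while the routine HTTPS traffic is filtered out. A rare event is not necessarily malicious, which is why the result is a shortlist for investigation rather than a verdict.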

Open Data Models and Analytic Collaboration

Apache Spot provides a single platform for collecting and managing security data. The idea was to create a common data model by having developers around the world collaborate and integrate their applications to deal with cybersecurity issues. The primary use cases of Apache Spot are the analysis of network flow, DNS and proxy data, which is achieved with the help of Open Data Models (ODM) that define a standard format for enriched event data. Apache Spot provides these common open data models for network, endpoint and user data, making it easier for organizations to share analytics as soon as new threats are detected.
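To make the benefit of a common data model concrete, here is a minimal sketch in which flow, DNS and proxy records are mapped onto one shared record type. The `NetworkEvent` class, its fields and the input key names are assumptions made for illustration; they are not Apache Spot's actual schema.

```python
from dataclasses import dataclass, asdict


@dataclass
class NetworkEvent:
    """Hypothetical common record shared by all sources; not Spot's real schema."""
    event_time: int
    source: str   # which telemetry type produced the record
    src_ip: str
    dst: str      # destination IP, queried name or requested host


def from_flow(record):
    """Map a netflow-style dict onto the common model."""
    return NetworkEvent(record["ts"], "flow", record["sip"], record["dip"])


def from_dns(record):
    """Map a DNS-query-style dict onto the common model."""
    return NetworkEvent(record["time"], "dns", record["client"], record["query"])


def from_proxy(record):
    """Map a proxy-log-style dict onto the common model."""
    return NetworkEvent(record["timestamp"], "proxy", record["client_ip"], record["host"])


def events_by_client(events, src_ip):
    """One analytic that works for every source because the fields are shared."""
    return [asdict(event) for event in events if event.src_ip == src_ip]


events = [
    from_flow({"ts": 1480000000, "sip": "10.0.0.5", "dip": "203.0.113.7"}),
    from_dns({"time": 1480000002, "client": "10.0.0.5", "query": "example.com"}),
    from_proxy({"timestamp": 1480000003, "client_ip": "10.0.0.8", "host": "example.org"}),
]

activity = events_by_client(events, "10.0.0.5")
```

Once every source is expressed in the same shape, an analytic written by one organization can be run unchanged by another, which is the sharing benefit the paragraph above describes.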

Apache Spot Framework

Apache Spot is built on a platform that comprises Cloudera Enterprise Data Hub (EDH) Edition running on Intel hardware. Cloudera EDH provides Apache Hadoop for data storage and Apache Spark for data analytics, drawing on its machine learning capabilities to perform near real-time anomaly detection. Cloudera EDH has been optimized for Intel hardware, and Apache Spark leverages some Intel libraries, which further increases the platform’s ability to scale and perform. At the core of Apache Spot are the open data models.

The telemetry data from the sources is fetched by StreamSets and imported into the Hadoop framework, which comprises Apache Kafka and Apache Spark. StreamSets Data Collector is an adaptable engine for ingesting a wide variety of data sources using plug-and-play origins, destinations and transformations. StreamSets allows organizations to quickly set up data ingestion pipelines that land data in Apache Spot’s endpoint, user and network open data models. The final output is indexed using a search engine, Apache Solr. Finally, the processed data is visualized using Banana, a fork of Kibana.

The following are the recommended data formats for Apache Spot:

Avro: Avro is an optimal data format for streaming-based analytics use cases because it supports a pure JSON representation for readability and a binary representation for efficient storage. Its schema representation, compatibility checks and interoperability with Apache Hadoop make it a better choice.

JSON: Thanks to its ease of use, JSON is commonly used as a data-interchange format. Its familiarity within the development community is an added advantage.

Parquet: Parquet is a columnar storage format that offers columnar data representation and data compression benefits. It is optimal for batch-based analytics use cases.
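The difference between a row-oriented format such as JSON and a columnar layout can be shown with the standard library alone. The sketch below does not read or write real Avro or Parquet files, which would need third-party libraries; it only illustrates the layout idea, and the sample records and function names are hypothetical.

```python
import json

# Hypothetical flow records used only for illustration.
rows = [
    {"src_ip": "10.0.0.5", "dst_port": 443, "bytes": 1200},
    {"src_ip": "10.0.0.6", "dst_port": 53, "bytes": 80},
    {"src_ip": "10.0.0.5", "dst_port": 443, "bytes": 900},
]


def to_json_lines(rows):
    """Row-oriented: one self-describing JSON document per event."""
    return "\n".join(json.dumps(row) for row in rows)


def to_columns(rows):
    """Column-oriented: one list per field, the layout idea behind columnar formats."""
    return {field: [row[field] for row in rows] for field in rows[0]}


json_lines = to_json_lines(rows)
columns = to_columns(rows)

# A batch aggregate only has to read the single column it needs.
total_bytes = sum(columns["bytes"])
```

The row layout suits streaming, where events arrive and are handled one at a time. The column layout suits batch analytics, where a query such as the byte total touches one field across many events, and values of the same type stored together tend to compress well.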


Apache Metron and Apache Spot overlap in providing cybersecurity solutions built on big data technology. What sets them apart is machine learning and the Hadoop platform used for provisioning. Apache Metron is built on the Hortonworks Data Platform, whereas Apache Spot uses the Cloudera Enterprise Data Hub Edition platform. The most prominent feature of Apache Spot is the concept of the open data model.

Although several products on the market deal with cyber threats, none of them guarantees advanced safety measures. It’s high time that we adopt cybersecurity analytics with big data solutions for a better future. Apart from Apache Metron & Apache Spot, there are also commercially supported products based on Apache Spot, such as Accenture Cyber Intelligence Platform and Cloudwick Opensource Adaptive Security Platform. RANK Software is another product providing cybersecurity solutions using big data.

Modern-day hackers and their cyber-criminal organizations stay up to date by learning techniques from their counterparts, who frequently share information on the hidden forums they maintain. To defeat this well-organized community of hackers, we must also take a community-based approach, and developers must come together and unite in the fight against rising cybercrime.

We hope that Apache Spot’s open data model strategy will unlock a broader set of use cases than are currently supported. The joint expertise of contributors and their contributions to the open data models will help to spread the idea of a common data model, and this unified front of technical talent should help to fight upcoming cybersecurity challenges more efficiently.

“I think computer viruses should count as life. I think it says something about human nature that the only form of life we have created so far is purely destructive. We have created life in our own image.”

— Stephen Hawking
