Introduction to the Dragonfly Machine Learning Engine (MLE)
Andrew Fast, Chief Data Scientist, CounterFlow AI
The Dragonfly Machine Learning Engine (MLE) provides the machine learning and data science capabilities included within OPNids. Data science and machine learning promise to counteract the dynamic threat environment created by growing network traffic and increasingly sophisticated threat actors. This post provides an overview of the MLE itself, explains why data science and cybersecurity go together, and offers some insight into using the MLE as part of the OPNids system.
The Dragonfly MLE provides a powerful framework for deploying anomaly detection algorithms, threat intelligence lookups, and machine learning predictions within a network security infrastructure. It is a multi-threaded, scriptable streaming engine for network threat detection, implemented in C, and can process hundreds of thousands of events per second.
Building on this scalable foundation, the MLE lets users create custom scripts and analytics and apply them to network traffic as it streams through the sensor. Scripting includes the following capabilities "out of the box":
· Lua (LuaJIT)
· Redis, including the Redis-ML module
Redis-ML is a module for Redis that can score previously trained models, including:
· Linear regression
· Logistic regression
· Forests of decision trees (random forests)
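To show what "scoring a previously trained model" means, here is a minimal Python sketch of the arithmetic behind each model type listed above. It is not Redis-ML's interface: the function names, the `Node` structure, the feature names, and the parameter values are all hypothetical. A linear model computes a weighted sum of features, logistic regression maps that sum to a probability with the sigmoid, and a forest combines the outputs of its trees (averaging, as here, suits regression or probability outputs; classification forests often use a majority vote instead).

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional

Features = Dict[str, float]  # feature name -> value for one event


def linear_score(weights: Features, intercept: float, x: Features) -> float:
    """Linear regression: intercept plus the weighted sum of the features.
    Features missing from x are treated as 0."""
    return intercept + sum(w * x.get(name, 0.0) for name, w in weights.items())


def logistic_score(weights: Features, intercept: float, x: Features) -> float:
    """Logistic regression: map the linear score to a probability via the sigmoid."""
    z = linear_score(weights, intercept, x)
    # Numerically stable sigmoid: never call exp() on a large positive number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


@dataclass
class Node:
    """A decision-tree node: a leaf if `value` is set, otherwise a split."""
    feature: str = ""
    threshold: float = 0.0
    left: Optional["Node"] = None   # followed when x[feature] <= threshold
    right: Optional["Node"] = None  # followed when x[feature] > threshold
    value: Optional[float] = None   # leaf prediction


def tree_score(node: Node, x: Features) -> float:
    """Walk from the root to a leaf, following the split at each node."""
    while node.value is None:
        node = node.left if x.get(node.feature, 0.0) <= node.threshold else node.right
    return node.value


def forest_score(trees: List[Node], x: Features) -> float:
    """Forest of trees: average the individual tree predictions."""
    return sum(tree_score(t, x) for t in trees) / len(trees)


if __name__ == "__main__":
    # Hypothetical parameters, as if exported from an offline training job.
    weights = {"bytes_out_kb": 0.5, "dst_port_entropy": 2.0}
    flow = {"bytes_out_kb": 4.0, "dst_port_entropy": 1.0}
    print(logistic_score(weights, -4.0, flow))  # 0.5: the linear score is exactly 0
    stump = Node(feature="bytes_out_kb", threshold=3.0,
                 left=Node(value=0.0), right=Node(value=1.0))
    print(forest_score([stump, Node(value=0.5)], flow))  # (1.0 + 0.5) / 2 = 0.75
```

The key point for deployment is that training happens elsewhere; at scoring time only the exported parameters (weights, split thresholds, leaf values) are needed, which is what makes per-event scoring cheap.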
The MLE can read and write streaming data from files, Unix sockets, or Kafka brokers. It is designed to integrate closely with Suricata.
Solving the Machine Learning Deployment Problem
Integrating the MLE into OPNids helps to solve the machine learning deployment problem, one of the largest challenges facing the machine learning industry as a whole. Many of the network analysts we have spoken to recently about machine learning in network security lament the cost and complexity of most data science platforms. For example, a typical Spark deployment requires managing a large number of servers (either on-premises or virtualized), a Hadoop cluster for data storage, and Spark or another "Big Data" analytics platform on top of that. OPNids combines data collection via Suricata with the ETL, scripting, and model scoring included in the MLE. This tight integration between data and analysis, along with the shift from batch to streaming analytics, enables powerful analysis and scoring of data without the huge cost or complexity.
OPNids also combines signatures, scripts, and models into a single package, since no one solution is sufficient to cover all network threat detection use cases. Signatures capture known, but largely static, threats. Scripts and machine learning models handle those more dynamic cases but are not as helpful for known threats. OPNids merges data collection in Suricata with the powerful combination of signatures, scripts, and machine learning in the Dragonfly MLE.
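As an illustration of the kind of dynamic check a script can perform where a fixed signature cannot, here is a minimal Python sketch (not the MLE's Lua API) of a one-pass streaming anomaly detector. It keeps a running mean and variance with Welford's algorithm, so it never stores past events, and flags a value that lies more than `threshold` standard deviations from the mean of the values seen before it. The class name, the threshold of 3, the warm-up length, and the flow sizes are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class StreamingZScore:
    """One-pass anomaly detector using Welford's running mean and variance.
    Each update is O(1) in time and memory, which suits streaming data."""
    threshold: float = 3.0  # hypothetical cut-off, in standard deviations
    warmup: int = 10        # values to observe before flagging anything
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0         # running sum of squared deviations from the mean

    def score(self, x: float) -> float:
        """z-score of x against the statistics accumulated so far."""
        if self.count < 2:
            return 0.0
        std = math.sqrt(self.m2 / (self.count - 1))  # sample standard deviation
        return 0.0 if std == 0 else (x - self.mean) / std

    def update(self, x: float) -> bool:
        """Score x against past values, then fold x into the statistics.
        Returns True if x looks anomalous."""
        z = self.score(x)
        is_anomaly = self.count >= self.warmup and abs(z) > self.threshold
        # Welford's incremental update of count, mean, and m2.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)
        return is_anomaly


if __name__ == "__main__":
    detector = StreamingZScore()
    # Hypothetical per-flow byte counts; the last one is an outlier.
    for nbytes in [500, 520, 480, 510, 495, 505, 515, 490, 500, 510, 498, 50_000]:
        if detector.update(nbytes):
            print("anomalous flow size:", nbytes)
```

Unlike a signature, nothing here encodes what "bad" looks like in advance; the baseline is learned from the traffic itself and keeps adapting as new events arrive.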
Using the MLE for Data Science and ML
The MLE uses a powerful yet familiar "pipes and filters" model for processing data, with the addition of Redis for live data caching. There are three types of event processors: two user-configurable types and one built-in type:
· *Input processors*: User-configurable scripts that pull messages out of a source, normalize the data into JSON format, and route each message to the appropriate analyzer queue. Message sources can be files, Unix sockets, or Kafka brokers. Normalization and ETL operations are performed by a user-defined Lua script.
· *Analysis processors*: User-configurable scripts that pull messages out of their input queue, analyze each event, and route the results to the appropriate output queue. Analyzers are implemented as user-defined Lua scripts and take advantage of both native Redis and Redis modules.
· *Output processors*: Built-in processors that pull messages out of their queue and deliver each message to the appropriate sink. Current message sinks are files, Unix sockets, or Kafka brokers, from which SIEM, security orchestration, and other downstream systems can ingest the results.
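The three processor types can be sketched as a minimal single-threaded Python pipeline. This is an illustration under stated assumptions, not MLE code: real processors are Lua scripts running in a multi-threaded C engine, the `deque` objects stand in for the MLE's internal queues, a plain dict stands in for Redis, and the function names and the `BAD_IPS` threat-intel list are hypothetical. The input fields (`event_type`, `src_ip`, `dest_ip`) follow Suricata's EVE JSON output.

```python
import json
from collections import deque
from typing import Deque, Dict, List

# Hypothetical threat-intel list; in the MLE this kind of state would live in Redis.
BAD_IPS = {"203.0.113.7"}


def input_processor(raw_lines: List[str], queues: Dict[str, Deque[dict]]) -> None:
    """Parse each line as JSON, normalize it, and route it to an analyzer queue.
    `queues` maps event types to queues and must contain a "default" entry."""
    for line in raw_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # drop malformed records
        normalized = {
            "type": event.get("event_type", "unknown"),
            "src_ip": event.get("src_ip"),
            "dest_ip": event.get("dest_ip"),
        }
        # Route by event type; types without their own queue go to "default".
        name = normalized["type"] if normalized["type"] in queues else "default"
        queues[name].append(normalized)


def intel_analyzer(inbox: Deque[dict], outbox: Deque[dict], cache: Dict[str, int]) -> None:
    """Flag events that touch a known-bad IP and count hits per source IP."""
    while inbox:
        event = inbox.popleft()
        hit = event["src_ip"] in BAD_IPS or event["dest_ip"] in BAD_IPS
        if hit:
            cache[event["src_ip"]] = cache.get(event["src_ip"], 0) + 1
        outbox.append({**event, "intel_hit": hit})


def output_processor(outbox: Deque[dict], sink: List[str]) -> None:
    """Serialize results as JSON lines, as a file, socket, or Kafka sink would receive them."""
    while outbox:
        sink.append(json.dumps(outbox.popleft(), sort_keys=True))


if __name__ == "__main__":
    raw = [
        '{"event_type": "flow", "src_ip": "10.0.0.5", "dest_ip": "203.0.113.7"}',
        '{"event_type": "dns", "src_ip": "10.0.0.6", "dest_ip": "198.51.100.1"}',
        "not json",
    ]
    queues: Dict[str, Deque[dict]] = {"flow": deque(), "default": deque()}
    outbox: Deque[dict] = deque()
    cache: Dict[str, int] = {}  # stand-in for Redis
    sink: List[str] = []
    input_processor(raw, queues)
    for inbox in queues.values():
        intel_analyzer(inbox, outbox, cache)
    output_processor(outbox, sink)
    print("\n".join(sink))
```

Keeping each stage behind a queue is what lets the stages be swapped or scaled independently: the analyzer never needs to know whether its events came from a file, a socket, or Kafka.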
The operational pipeline is specified in a user-defined configuration file that defines how the processors interact.
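Because the pipeline wiring lives in configuration, it can be worth checking before starting the engine. The sketch below expresses a pipeline as a Python dict and validates it. The schema (`inputs`, `analyzers`, `outputs`, `routes`), the tags, the path, and the script names are hypothetical and do not reflect the Dragonfly MLE's actual configuration format; the validator also assumes strictly three-stage routing (input to analyzer to output), as in the processor model described above.

```python
from typing import Dict, List

# Hypothetical pipeline description: every name and field here is illustrative,
# not the Dragonfly MLE's real configuration schema.
PIPELINE = {
    "inputs": [
        {"tag": "eve", "source": "file", "path": "/var/log/suricata/eve.json",
         "script": "eve_input.lua"},
    ],
    "analyzers": [
        {"tag": "intel", "script": "intel_lookup.lua"},
        {"tag": "anomaly", "script": "flow_anomaly.lua"},
    ],
    "outputs": [
        {"tag": "alerts", "sink": "file", "path": "alerts.json"},
    ],
    # Directed edges: (producer tag, consumer tag).
    "routes": [("eve", "intel"), ("eve", "anomaly"),
               ("intel", "alerts"), ("anomaly", "alerts")],
}


def validate(pipeline: Dict) -> List[str]:
    """Return a list of wiring problems; an empty list means the wiring is consistent."""
    inputs = {s["tag"] for s in pipeline["inputs"]}
    analyzers = {s["tag"] for s in pipeline["analyzers"]}
    outputs = {s["tag"] for s in pipeline["outputs"]}
    problems = []
    for src, dst in pipeline["routes"]:
        # Allowed edges: input -> analyzer and analyzer -> output only.
        if (src in inputs and dst in analyzers) or (src in analyzers and dst in outputs):
            continue
        problems.append(f"invalid route {src} -> {dst}")
    # Every analyzer and output should be fed by at least one route.
    fed = {dst for _, dst in pipeline["routes"]}
    for tag in sorted(analyzers | outputs):
        if tag not in fed:
            problems.append(f"{tag} receives no messages")
    return problems


if __name__ == "__main__":
    print(validate(PIPELINE) or "pipeline OK")
```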
OPNids with the Dragonfly MLE is a powerful platform for improving threat detection capabilities using the combination of signatures, scripts, and machine learning models. With Suricata included in OPNids and a focus on streaming analytics, many of the traditional challenges of deploying machine learning have been eliminated. Download OPNids today or explore the Dragonfly MLE directly.