This post is part of an ongoing blog series examining predictions and recommendations for cybersecurity in 2018.
Automated Threat Response and Relevance to IT and OT
Automated threat response, which we’ll simply refer to as ATR, is the process of automating the action taken on detected cyber incidents, particularly those deemed malicious or anomalous. Newer technologies, such as behavioral analytics and artificial intelligence, bring incidents of interest to the surface, and each type of incident has a predefined action for containment or prevention. The goal is to automate detection and pair it with an equally automated, closed-loop process of prevention. This not only reduces the burden on SecOps teams but also shortens response time. In recent years, IT organizations have needed to adopt ATR technologies, such as our WildFire and behavioral analytics offerings, to better combat advanced attacks that have increased in frequency and capability.
So how applicable is this technology in protecting Industrial Control Systems (ICS) and Operational Technology (OT) environments from advanced threats? It is clearly relevant for the adjacent corporate and business networks of the OT environment, which are often internet-connected and subject to usage by threat actors as a pivot point for an attack. But what I’m more interested in is the relevance to the core areas of ICS: Levels 3, 2, 1 of the Purdue model, and the DMZs between them. ATR is very relevant, in fact.
Consider the scenario where an HMI station in an electric utility Energy Management System suddenly starts issuing an unusually high number of DNP3 operate commands (to open breakers), far above the baseline. This could constitute a malicious event or, at minimum, an anomalous one. ATR systems could detect such events and respond automatically, either by blocking the rogue device or by limiting the connection; for example, by giving the device of interest read-only access for the DNP3 protocol.
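The scenario above can be sketched in a few lines of Python. This is a minimal illustration, not a description of any product: the per-minute command counts, the standard-deviation thresholds, and the response names (`allow`, `read_only`, `block`) are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Baseline:
    """Normal rate of DNP3 operate commands for one host (hypothetical model)."""
    mean_per_min: float
    std_per_min: float


def build_baseline(history):
    """Summarize historical per-minute operate-command counts (needs >= 2 samples)."""
    return Baseline(mean(history), stdev(history))


def choose_response(observed, baseline, warn_sigma=3.0, block_sigma=6.0):
    """Pick a response from how far the observed rate sits above the baseline.

    The sigma thresholds are illustrative assumptions, not recommended values.
    """
    # Guard against a near-zero spread on very quiet hosts.
    spread = max(baseline.std_per_min, 1.0)
    score = (observed - baseline.mean_per_min) / spread
    if score >= block_sigma:
        return "block"        # block the rogue device outright
    if score >= warn_sigma:
        return "read_only"    # limit the host to read-only DNP3 access
    return "allow"


# Example: an HMI that normally issues two or three operate commands a minute.
hmi_baseline = build_baseline([2, 3, 1, 2, 4, 3, 2, 3])
for rate in (3, 6, 40):
    print(rate, choose_response(rate, hmi_baseline))
```

In practice the baseline would come from an ICS-aware monitoring tool rather than a hand-built list, and the thresholds would be tuned per host and per process.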
So why has this technology not been adopted yet? There are several reasons. First, most OT organizations’ current cybersecurity initiatives focus on visibility and access control; advanced threat prevention is a longer-term initiative. Second, the newer AI/machine learning technologies used to baseline ICS-specific traffic and detect anomalies have been mostly reserved for R&D or PoC environments. Third, ICS/OT asset owners and operators tend to be very conservative. The idea of allowing a system to respond automatically to detected threats is pretty scary for most OT folks, who fear accidentally blocking legitimate traffic or devices and causing downtime. Finally, the use cases and response actions for incidents detected in OT have not been well-defined.
2018 Is the Year of ATR in OT
My prediction for 2018 is that ATR in OT will reach production-level maturity and be deployed in a meaningful way. “Meaningful” means that we will start seeing large-scale deployments by leading operators of ICS in critical infrastructure and manufacturing environments.
There are several reasons I believe this will be the case. Some leading organizations have matured beyond visibility and segmentation, and have completed their PoCs of the technology. In addition, a strong ecosystem has emerged around OT-specific profiling, behavioral analytics, and anomaly detection. Some of these solutions exist as dedicated sensors, others as modules that supplement SIEM devices. Initially deployed as stand-alone detection tools, these ICS network-monitoring solutions are starting to be integrated with enforcement devices, such as our next-generation firewalls, which are then used to carry out the appropriate threat response.
Further driving adoption are recent high-profile, cyber-physical attacks, such as those on the Ukrainian grid in 2015 and 2016, which many believe could have been mitigated or possibly prevented with ICS-specific ATR technologies. ATR in OT could also be applied to threats typically associated with IT, like ransomware, which can still impact OT. The downtime WannaCry caused in some manufacturing plants in 2017 is an example of this. I also see the development of OT incident response playbooks and semi-automated approaches, which make adoption more palatable for resource-constrained and risk-averse OT teams.
To be sure, ATR in OT will initially be limited to cases deemed less risky in terms of accidentally causing process downtime or safety issues. What defines “less risky” is certainly debatable and is still being worked out. This definition will also differ between organizations. However, some seemingly amenable ATR in OT scenarios come up often in my discussions with OT security teams. One is limiting the access of a pre-existing host that suddenly starts issuing unusual commands; for example, restricting an HMI or engineering workstation to read-only access to the PLCs. Another is blocking new devices that were not included in any installation plan of record.
One other such scenario would be to quarantine a non-critical host, such as a redundant HMI, found to be infected with ransomware. I also expect adoption to be gradual in another respect: OT users will want the option to manually accept or reject a proposed threat response. This isn’t a fully automated approach, but it is likely a necessary intermediate step toward proving out full automation. Integrators developing these systems would be wise to build user interfaces and workflows that support this semi-automated approach.
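A semi-automated workflow of this kind could look roughly like the following Python sketch. Every name here (`ReviewQueue`, `ProposedResponse`, the `enforce` callback) is hypothetical; the point is only that a response is proposed automatically but enforced only after an operator approves it.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class ProposedResponse:
    host: str
    action: str
    reason: str
    status: Status = Status.PENDING


class ReviewQueue:
    """Holds proposed responses until an operator accepts or rejects them."""

    def __init__(self, enforce):
        # `enforce` is a caller-supplied function that applies an approved
        # response, for example by pushing a rule to an enforcement device.
        self._enforce = enforce
        self._items = []

    def propose(self, host, action, reason):
        """Record a proposed response without enforcing it."""
        proposal = ProposedResponse(host, action, reason)
        self._items.append(proposal)
        return proposal

    def decide(self, proposal, approve):
        """Apply the operator's decision; enforce only on approval."""
        if proposal.status is not Status.PENDING:
            raise ValueError("proposal has already been decided")
        if approve:
            proposal.status = Status.APPROVED
            self._enforce(proposal)
        else:
            proposal.status = Status.REJECTED

    def pending(self):
        """Return the proposals still waiting for an operator decision."""
        return [p for p in self._items if p.status is Status.PENDING]


# Example: the operator approves one proposal and rejects another.
applied = []
queue = ReviewQueue(enforce=applied.append)
quarantine = queue.propose("redundant-hmi-02", "quarantine", "ransomware detected")
limit = queue.propose("eng-ws-01", "read_only", "unusual PLC write commands")
queue.decide(quarantine, approve=True)
queue.decide(limit, approve=False)
```

Keeping enforcement behind a single callback means the same queue could later run in fully automated mode by approving proposals programmatically once operators trust the recommendations.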
Palo Alto Networks Enables Automated Response
In anticipation of this growing use case, Palo Alto Networks has been working closely with the ATR ecosystem, customers and industry organizations to put in place the integration required to facilitate adoption. A key enabler for the integration and automation is our powerful application programming interface, which makes interacting with sensors, SIEMs and other system elements very easy. Furthermore, the controls that can be automatically implemented are highly flexible and granular. Specifically, for OT environments, users can apply the App-ID we have for ICS to implement protocol-level responses down to functional commands; for example, to the DNP3 operate command mentioned earlier. Couple that with User-ID for role-based access, Content-ID for control over payloads and threats and, of course, more basic controls based on IP and port, and you have a very flexible ATR platform that can accommodate a range of response templates, which can be tied to an organization’s risk tolerance. Organizations may use a hazard and operability study (HAZOP) to determine the appropriate ATR for certain scenarios, with more conservative responses, which may involve redundant systems, applied to certain processes and operations.
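As a rough illustration of response templates tied to risk tolerance, the Python sketch below maps a detected scenario and a tolerance level to a response. It does not use or represent the Palo Alto Networks API: the scenario names, tolerance levels, template fields, and the `dnp3-operate` label are placeholders invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResponseTemplate:
    name: str
    application: str      # placeholder label, not an actual App-ID name
    action: str           # "alert", "deny" or "quarantine" in this sketch
    needs_approval: bool  # True means an operator must accept the response


# Hypothetical mapping of (scenario, risk tolerance) to a response template.
TEMPLATES = {
    ("excess_operate_commands", "conservative"):
        ResponseTemplate("alert-only", "dnp3-operate", "alert", True),
    ("excess_operate_commands", "moderate"):
        ResponseTemplate("read-only", "dnp3-operate", "deny", True),
    ("excess_operate_commands", "aggressive"):
        ResponseTemplate("read-only-auto", "dnp3-operate", "deny", False),
    ("ransomware_on_redundant_host", "moderate"):
        ResponseTemplate("quarantine-host", "any", "quarantine", True),
}

# Fallback when no template is defined: alert and leave the decision to a person.
DEFAULT_TEMPLATE = ResponseTemplate("default-alert", "any", "alert", True)


def select_template(scenario, risk_tolerance):
    """Return the template for a scenario, or the conservative default."""
    return TEMPLATES.get((scenario, risk_tolerance), DEFAULT_TEMPLATE)


# Example: the same detection yields different responses per risk tolerance.
for level in ("conservative", "moderate", "aggressive"):
    template = select_template("excess_operate_commands", level)
    print(level, template.name, template.action, template.needs_approval)
```

Falling back to an alert-only template for undefined cases reflects the conservative posture described above: when the organization has not decided how to respond, nothing is blocked automatically.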
Whether it is in 2018 or later that our users decide to implement ATR in OT, they will be happy to know we have the capabilities in place, and the ecosystem to support their initiatives.