The state of unsupervised learning in cybersecurity
By Spandan Mahapatra
Cybercriminals continue to defeat current defense mechanisms by finding new ways to exploit vulnerabilities, especially where machines and humans interact. Cybersecurity software has matured well in terms of reacting to breaches based on known signatures. But this mode of operation is not sustainable, especially as bad actors are creating more new ways to attack security systems than ever before.
In many cases, it may take up to six months to even detect a breach—and on top of that, there is an average of nearly 50 days between when a breach is discovered and when it is reported.
With the cybercrime industry set to cost businesses over $2 trillion each year, cybersecurity software clearly needs to get smarter. As industries shift their business models to digital-centric, data-driven customer interactions and services, they need advanced cybersecurity tactics to make sure customer data and assets always remain safe.
The cyber defense industry is investing in raising awareness of cyber resiliency methods and practices and in spreading their use. A key driver of this shift is the rise of unsupervised learning as a critical toolset in cyber defense, although the sophistication and success achieved so far vary across the industry.
What is unsupervised learning?
Unsupervised learning is a group of machine learning techniques that find patterns in data without relying on labeled examples. In cybersecurity, these techniques allow software to predict and safeguard against potential future attacks without ever needing to experience a similar breach or attack.
For example, the combination of clustering, anomaly detection, and deep learning allows cybersecurity systems to judge with greater accuracy and confidence what should be considered a potential cyberattack. Traditional cyber defense approaches depend on labeling data to determine a threat and then applying a response.
In the new world of dynamic attack surfaces and threat vectors, cyber defense mechanisms must continuously evolve to work on unlabeled data, which is the purview of unsupervised learning.
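As a minimal sketch of what working on unlabeled data can look like, the snippet below flags outliers using the modified z-score, which is built on the median and the median absolute deviation (MAD). It is an illustration only, not a production detector: the function name `mad_anomalies`, the traffic figures, and the 3.5 cutoff (a commonly used convention) are all assumptions introduced here.

```python
from statistics import median

def mad_anomalies(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds threshold.

    No labels are used: "normal" is inferred from the data itself via the
    median and the median absolute deviation (MAD).
    """
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []  # no spread to measure against, so flag nothing
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - center) / mad > threshold]

# Hypothetical outbound megabytes per hour for one host (illustrative data)
traffic_mb = [12, 15, 11, 14, 13, 12, 16, 240, 14, 13]
flagged = mad_anomalies(traffic_mb)
print(flagged)  # [7] -> the 240 MB hour stands out from the rest
```

Nobody told the function that 240 MB is suspicious; it stands out only relative to the other observations. That is also the limitation: an attacker who stays within the normal range would not be flagged by this kind of check.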
To further illustrate, imagine a bank’s cyber-defense team and the tactics used to protect its assets. Traditional cyber defense tactics are based on previous breach attempts, and an overall understanding of how cybercriminals operate today. These tactics are developed based on a supervised, reactive approach to how cybercriminals have attempted to attack their systems in the past.
While such supervised models can be tremendously capable, they are brittle on their own. They are specific to the problems they were trained on, and any change in the data causes inconsistent outcomes and model drift.
With unsupervised learning, machine learning and AI-based algorithms constantly work to discover new ways in which the bank could be attacked in the future. The bank can then safeguard itself against future attacks without ever witnessing a successful breach.
Similarly, in the highly complex world of the internet of things (IoT), the number of devices connected to the cloud is growing exponentially across a myriad of use cases. Unsupervised learning with deep neural networks is being leveraged for attack prevention and intrusion detection. Most IoT zero-day attacks come with no prior context, patterns, or indications, so supervised learning approaches simply don't work effectively.
This is where unsupervised learning has the widest potential, even though today its success rate is not 100 percent and its coverage is not comprehensive. It remains the only effective way of handling events that have not been seen before.
5G’s huge impact on security
5G brings greater network speed and capacity to a wide variety of industries. The same low-latency networking capabilities will allow cybercriminals to act faster and extract more sensitive data in a fraction of the time it currently takes. With this massive increase in speed, cybercriminals may be able to infiltrate systems and extract data before breach detection and response are possible.
5G ushers in network segmentation capabilities through network slicing, so network slice security will be the next battleground. At an intuitive level, network slicing will enable enterprises to focus additional protection on each network slice based on the data it carries. While this is an advantage, it also allows bad actors to put greater effort into penetrating specific slices, since they will now know the value of those slices.
The development of 5G will undoubtedly increase the intensity of the battle between cyber defense stakeholders and the bad actors. Because of this, cybersecurity technology needs to move away from the reactive firewall and defense tactics it has historically operated on and develop proactive tactics to stay ahead of these increasingly tricky cybercriminals.
Unsupervised learning—while still experimental—shows promise for allowing defense capabilities to scale alongside these cyberattack developments. Especially when paired with 5G’s high-speed and low-latency traits and network segmentation capabilities, unsupervised learning tactics will allow enterprises to remain one step ahead of bad actors.
The integration of 5G will allow unsupervised learning technologies to predict and safeguard against a far greater number of potential new threats. Unsupervised learning shows good potential in the approaches, methodologies, and algorithms used for anomaly detection, for example when it builds on fingerprints of Transport Layer Security (TLS) client applications. One such fingerprinting method, called JA3, was originally published by Salesforce researchers and has been leveraged by multiple cybersecurity software companies.
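To make the fingerprinting idea concrete, here is a minimal sketch of how a JA3 fingerprint is assembled once the relevant TLS ClientHello fields have been extracted. JA3 joins five fields (TLS version, cipher suites, extensions, elliptic curves, and elliptic curve point formats) into one string and takes its MD5 hash. The function name `ja3_fingerprint` and the sample values are assumptions for illustration, and the sketch skips packet parsing and the removal of GREASE values that a real implementation performs.

```python
import hashlib

def ja3_fingerprint(tls_version, ciphers, extensions, curves, point_formats):
    """Build the JA3 string and its MD5 digest from ClientHello fields.

    Each argument holds decimal values already parsed from the handshake.
    Values inside a field are joined with "-", and fields with ",".
    """
    fields = [
        str(tls_version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, curves)),
        "-".join(map(str, point_formats)),
    ]
    ja3_string = ",".join(fields)
    digest = hashlib.md5(ja3_string.encode("ascii")).hexdigest()
    return ja3_string, digest

# Illustrative ClientHello values (hypothetical client)
ja3_string, ja3_hash = ja3_fingerprint(
    tls_version=769,
    ciphers=[47, 53, 5, 10, 49161, 49162, 49171, 49172, 50, 56, 19, 4],
    extensions=[0, 10, 11],
    curves=[23, 24, 25],
    point_formats=[0],
)
print(ja3_string)
print(ja3_hash)
```

The fingerprint itself is deterministic and involves no learning. The unsupervised part would sit on top of it, for instance by flagging fingerprints that are rare on a given network or that appear in unusual clusters of connections.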
Unsupervised learning can't guarantee 100 percent cyber defense. However, an industry focus on creating more objective measures of performance for applied unsupervised learning in cyber defense will eventually lead to greater trust and security. Especially in a new world with a proliferation of IoT devices connected to 5G, unsupervised learning arguably has to be developed for a sustainable defense posture, since supervised learning in these scenarios will demand significantly more effort and expense overall.
Hesitant enterprise adoption
Enterprises aren't clear on the exact verifiable benefits that unsupervised learning can bring to a cyber defense program, and understandably so. What makes unsupervised learning so appealing also draws skepticism. It is based on the premise of never labeling data, so there is no ground truth against which to measure how successful the resulting cyber defense tactics are.
Because there is no way of truly knowing how accurate an unsupervised learning-based cybersecurity effort is in new scenarios, it doesn’t provide the necessary level of assurance that enterprises need to fully trust that their investment is worthwhile.
Additionally, no matter how fast unsupervised learning technology can predict and fortify cybersecurity efforts, there will always be the potential that bad actors are developing threats that unsupervised algorithms are not considering.
Because unsupervised learning approaches do not label data, there is always the potential for missing specific attack vectors; a "catch-all" cyber defense is impossible in the current scenario. Organizations therefore still need to invest in supervised cybersecurity threat detection and in response teams.
When unsupervised technology is unsuccessful in developing proactive safeguards, there needs to be a team dedicated to addressing these breaches. Because of this, it is crucial to invest in technology geared toward reducing the time between breach and detection, and detection and response. It may be difficult, especially for organizations with limited resources, to invest heavily in both.
Until cybersecurity software developers find a way to record, report, and visualize the success rate and performance of these unsupervised learning efforts, adoption will continue at a slower pace.
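One possible way to begin recording such a success rate, offered as an assumption rather than an established industry practice, is to have analysts review the alerts an unsupervised system raises and track the fraction they confirm as malicious. The `Alert` class, the `alert_precision` function, and the sample verdicts below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Alert:
    alert_id: str
    confirmed_malicious: bool  # verdict recorded by a human analyst

def alert_precision(reviewed):
    """Fraction of analyst-reviewed alerts that were confirmed malicious.

    Returns None when nothing has been reviewed yet.
    """
    if not reviewed:
        return None
    confirmed = sum(a.confirmed_malicious for a in reviewed)
    return confirmed / len(reviewed)

# Hypothetical review outcomes for one reporting period
reviewed_alerts = [
    Alert("a1", True),
    Alert("a2", False),
    Alert("a3", True),
    Alert("a4", False),
]
precision = alert_precision(reviewed_alerts)
print(precision)  # 0.5 -> half of the reviewed alerts were real threats
```

This measures only how often raised alerts turn out to be real. It says nothing about attacks the system never flagged, which is exactly the gap in assurance described above.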
The road ahead is hazy, but there is tremendous promise, and no shortage of hype, in the cybersecurity industry. Investment in unsupervised learning for cybersecurity will bring very real benefits. And as cybercriminals become increasingly creative and have more and more tools at their disposal, they will do more damage than ever before if organizations don't protect themselves.
And it’s true, unsupervised learning cannot guarantee an enterprise 100 percent certainty that there will never be a successful cyberattack. But the benefits are glaring. As soon as measurement and reporting capabilities are fully realized, enterprises will at least have a more quantitative and reliable framework for planning and executing their cyber defense initiatives.