Using replication technology to ensure cloud-based IoT security can be maintained at scale
By Dennis Monner
How long can today’s cloud security systems cope with the growing mass of connected devices and users, the continued expansion of the IoT, and the expectation that remote workforces will become a permanent trend? In theory, cloud security systems can scale indefinitely, but the evidence suggests that there comes a point at which the sheer number of devices and users makes it impossible to keep scaling protection linearly, and protection becomes destabilised.
The challenge of maintaining security in the cloud at scale stems from the way it is structured: part of the computing capacity is spent on the security functions themselves, and the rest on synchronising security instances and exchanging operational data with them, for example to share malware definitions. As the number of end points increases, more security instances are required to cope with the growing volume of data. At the same time, however, each security instance has to maintain a direct connection to every other instance in the system to stay synchronised, so the number of connections per instance grows with every instance added. With n instances, each one maintains n − 1 connections, and the total across the system grows roughly with the square of n. The computing power required just to maintain those connections therefore becomes huge.
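To make that growth concrete, the short sketch below counts synchronisation links in a fully meshed cluster, assuming (as a simplification) that every instance keeps one direct connection to every other instance. The function name `full_mesh_links` is illustrative only.

```python
def full_mesh_links(n: int) -> tuple[int, int]:
    """Return (links per instance, links in total) for a full mesh of n instances.

    Assumption: every instance synchronises directly with every other instance,
    and each direction of a pair counts as a separate unit of work.
    """
    if n < 1:
        raise ValueError("a cluster needs at least one instance")
    per_instance = n - 1          # every other instance is a direct peer
    total = n * per_instance      # summed over all instances
    return per_instance, total


for n in (10, 15, 50, 100):
    per_instance, total = full_mesh_links(n)
    print(f"{n:>4} instances: {per_instance:>3} links each, {total:>5} in total")
```

Per-instance work grows linearly with cluster size, while cluster-wide work grows roughly with its square; the 50-instance total equals the 2,450 multiplier used later in this article.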
If the system is increasingly occupied with synchronising security instances, its capacity for the actual security functions is compromised. In the worst-case scenario, there would be no free computing power left to analyse and secure data traffic. Moreover, at that point adding new instances does not help either: newly added instances would immediately be occupied with synchronisation, so the security system’s effectiveness would no longer increase in proportion to the number of instances added.
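A toy model illustrates why adding instances eventually stops helping. It assumes, purely for illustration, that each instance has a fixed capacity and spends a fixed amount of it per peer it synchronises with; the parameter values below are hypothetical, not measurements.

```python
def useful_capacity(n: int, capacity: float, sync_cost_per_peer: float) -> float:
    """Capacity left for security functions across a full mesh of n instances.

    Toy model (assumption): each instance has a fixed `capacity` and spends
    `sync_cost_per_peer` of it on every other instance it synchronises with.
    """
    per_instance_left = max(0.0, capacity - sync_cost_per_peer * (n - 1))
    return n * per_instance_left


# Hypothetical numbers: 100 units per instance, 2 units per synchronisation peer.
for n in (10, 25, 40, 51):
    print(n, useful_capacity(n, capacity=100.0, sync_cost_per_peer=2.0))
```

In this model total useful capacity peaks at around 25 instances and then falls; at 51 instances synchronisation consumes everything, so further instances add nothing.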
The challenge is real
According to McKinsey, the worldwide number of IoT-connected devices is projected to reach 43 billion by 2023, an almost threefold increase from 2018. Given that the number of end points is only expected to increase, it is essential to find new ways to address the security risk. It is hard to put an exact number on the point at which security becomes destabilised, as multiple factors are involved, but 10 million end points could well be that threshold. For a mobile provider, an ISP, or any company supporting large numbers of end points, 10 million is a figure that could easily be reached soon, if it has not been already.
Large telecommunications providers offering integrated security services, for example, have to process huge amounts of data from their customers and their customers’ connected devices, often for up to a hundred million users in regions such as Asia-Pacific. 5G technology will exacerbate the issue as masses of Industry 4.0 sensors and other IoT components go online in customers’ networks. Clearly, cloud-based networks are the only viable way to support all those end points, which is why it is essential to find new ways to address cloud-based security.
One answer to the challenge of cloud security at scale is replication technology, or more specifically, replication groups. This is more than just theory: the approach is already being rolled out in the networks of some mobile network operators and ISPs in Asia-Pacific. Replication in computing, that is, sharing information across multiple remote locations to keep them consistent, is well established, and so is the idea of grouping replication. In a cloud-based security system, the same idea can be used to create replication groups. Each group bundles together a number of security instances, up to a limit that cannot be exceeded, and synchronisation only takes place between instances belonging to the same group.
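The two rules just described, a hard size cap and synchronisation only within a group, can be sketched as a small data structure. The class and method names are hypothetical, and the membership of the example group is chosen for illustration.

```python
class ReplicationGroup:
    """A named set of security instances with a hard size limit (illustrative)."""

    def __init__(self, name: str, max_size: int) -> None:
        self.name = name
        self.max_size = max_size
        self.members: set[int] = set()

    def join(self, instance_id: int) -> None:
        # The limit cannot be exceeded: refuse new members once the group is full.
        if instance_id not in self.members and len(self.members) >= self.max_size:
            raise ValueError(f"group {self.name!r} is full ({self.max_size} instances)")
        self.members.add(instance_id)

    def sync_partners(self, instance_id: int) -> set[int]:
        # Synchronisation only happens with other members of the same group.
        if instance_id not in self.members:
            return set()
        return self.members - {instance_id}


blue = ReplicationGroup("blue", max_size=5)
for i in (1, 2, 3, 4, 9):
    blue.join(i)
print(sorted(blue.sync_partners(9)))   # [1, 2, 3, 4]
try:
    blue.join(10)                      # a sixth instance is rejected
except ValueError as err:
    print(err)
```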
So how does this keep the whole system secure? The answer is that each security instance belongs not to one replication group but to two, so that information an instance receives from its first group is passed on to its second. This propagates information quickly and efficiently across the entire system, maintaining synchronisation without each instance needing a separate, direct connection to every other instance.
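A minimal sketch of that relay, assuming that in each round every instance holding an update pushes it to the other members of each group it belongs to. The four-group layout is hypothetical, and the code counts relay hops, not wall-clock time.

```python
from collections import deque


def propagation_rounds(groups: list[set[int]], origin: int) -> dict[int, int]:
    """Rounds needed for an update from `origin` to reach each instance.

    Assumption: an instance that holds the update forwards it to the other
    members of every group it belongs to (breadth-first relay).
    """
    rounds = {origin: 0}
    queue = deque([origin])
    while queue:
        current = queue.popleft()
        for group in groups:
            if current not in group:
                continue
            for peer in group:
                if peer not in rounds:
                    rounds[peer] = rounds[current] + 1
                    queue.append(peer)
    return rounds


# Hypothetical layout: four groups of four, every instance in exactly two groups.
groups = [{1, 2, 3, 4}, {3, 4, 5, 6}, {5, 6, 7, 8}, {7, 8, 1, 2}]
print(propagation_rounds(groups, origin=1))
```

Instance 1 talks directly to only five of the seven other instances, yet instances 5 and 6 still receive the update one round later, relayed by members shared between groups.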
So, regardless of the size of the overall system, the maximum number of connections each instance has to make is limited. Even when new replication groups are added, the number of connections each security instance has to maintain does not change, so the computing power required for synchronisation never exceeds a predefined percentage of the available capacity.
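The bound follows from membership alone: an instance in two groups of at most g members has at most 2 × (g − 1) peers, however many groups exist. The sketch below checks this on a hypothetical ring layout, chosen only because it is easy to generate at any size; it is not the layout described in this article, and it uses groups of four rather than five.

```python
def ring_layout(num_groups: int, overlap: int) -> list[set[int]]:
    """Hypothetical layout: groups in a ring, neighbours share `overlap` instances.

    Every instance sits in exactly two adjacent groups, so each group holds
    2 * overlap instances and the cluster holds num_groups * overlap instances.
    """
    # Block k holds the instances shared by group k and group k + 1 (mod num_groups).
    blocks = [set(range(k * overlap, (k + 1) * overlap)) for k in range(num_groups)]
    return [blocks[k - 1] | blocks[k] for k in range(num_groups)]


def max_peers(groups: list[set[int]]) -> int:
    """Largest number of distinct instances any single instance synchronises with."""
    peers: dict[int, set[int]] = {}
    for group in groups:
        for member in group:
            peers.setdefault(member, set()).update(group - {member})
    return max(len(p) for p in peers.values())


for num_groups in (4, 10, 100, 1000):
    groups = ring_layout(num_groups, overlap=2)
    instances = len(set().union(*groups))
    print(f"{instances:>5} instances: max {max_peers(groups)} peers per instance")
```

The per-instance peer count stays at five from 8 instances to 2,000, whereas a full mesh of 2,000 would need 1,999.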
Here is an example of what that might mean in practice. Say a security system has ten security instances (an unrealistically low number, used here only to illustrate the principle). Conventionally, each instance would have to establish nine direct connections to synchronise with all the others. However, if those instances are divided into four replication groups, each with a maximum of five instances, the number of connections is capped. Instance no. 9, for example, belongs to both the blue and the red group. This means it only has to synchronise with the other instances in these two groups: four in the blue group and three in the red (instance no. 2 also belongs to both groups and therefore only counts once). Instead of nine direct connections, instance no. 9 only has to set up seven.
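The ten-instance example can be checked with a few lines of code. Only the facts stated in the text are fixed (instances 9 and 2 in both blue and red, at most five per group); the names "green" and "yellow" and all other memberships are hypothetical, picked so that every instance belongs to exactly two groups.

```python
# Hypothetical membership consistent with the text: instances 9 and 2 both sit
# in the blue and red groups, no group exceeds five instances, and every
# instance belongs to exactly two groups.
GROUPS = {
    "blue":   {9, 2, 1, 3, 4},
    "red":    {9, 2, 5, 6, 7},
    "green":  {8, 10, 1, 3, 5},
    "yellow": {8, 10, 4, 6, 7},
}


def sync_peers(instance: int, groups: dict[str, set[int]]) -> set[int]:
    """Distinct instances that `instance` must synchronise with directly."""
    peers: set[int] = set()
    for members in groups.values():
        if instance in members:
            peers |= members
    peers.discard(instance)   # an instance does not synchronise with itself
    return peers


all_instances = set().union(*GROUPS.values())
print("full mesh:", len(all_instances) - 1)           # 9
print("with groups:", len(sync_peers(9, GROUPS)))     # 7
print("peers of 9:", sorted(sync_peers(9, GROUPS)))   # [1, 2, 3, 4, 5, 6, 7]
```

Taking the union of both groups is what makes instance no. 2 count only once.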
This may not sound like a huge gain, seven instead of nine, but when the same idea is applied to a larger system, the benefits become much clearer. As the number of IoT devices to be protected increases, the number of security instances must increase too. In cluster C, the number of instances has grown to 15 and the number of groups to six. Without replication groups, each instance would need 14 direct connections for synchronisation in this scenario. Despite the two new groups, however, the maximum number of connections remains limited: instance no. 9 now belongs to the blue and green groups and still only needs to synchronise with seven other instances instead of 14.
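The same check can be run for the larger cluster. The text fixes only the sizes (15 instances, six groups of at most five) and that instance no. 9 sits in blue and green with seven peers; every other membership and the names red, yellow, purple and orange are hypothetical.

```python
# Hypothetical 15-instance, six-group layout: instance 9 sits in blue and green
# as in the text; all other memberships are illustrative assumptions.
GROUPS = {
    "blue":   {9, 2, 1, 3, 4},
    "green":  {9, 2, 5, 6, 7},
    "red":    {1, 3, 8, 10, 11},
    "yellow": {4, 5, 8, 12, 13},
    "purple": {6, 10, 12, 14, 15},
    "orange": {7, 11, 13, 14, 15},
}


def sync_peers(instance: int, groups: dict[str, set[int]]) -> set[int]:
    """Distinct instances that `instance` must synchronise with directly."""
    peers: set[int] = set()
    for members in groups.values():
        if instance in members:
            peers |= members
    peers.discard(instance)
    return peers


instances = set().union(*GROUPS.values())
print("full mesh:", len(instances) - 1)                                  # 14
print("instance 9:", len(sync_peers(9, GROUPS)))                         # 7
print("worst case:", max(len(sync_peers(i, GROUPS)) for i in instances)) # 8
```

In this layout no instance exceeds eight peers, the ceiling of two groups of five minus the instance itself, and that ceiling would stay the same if more groups were added.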
Apply that to a system with millions of connected end points and the capacity gains scale accordingly. The number of security instances within a network will vary hugely depending on features and the underlying hardware specifications, but for an ISP with 10 million customers, a ballpark estimate might be 20 nodes for each basic feature and 40 nodes for more sophisticated ones. Likewise, capacity gains and cost savings will differ for each organisation, but once overall cluster resources (CPU, RAM, network I/O, etc.), which do have a financial impact, are taken into account, the potential becomes clearer.
For example, with 50 nodes in one fully meshed cluster, the amount of synchronisation work is multiplied by 2,450 (50 × 49), which consumes a lot of CPU and RAM across the cluster. Compare that with segmenting the same cluster, as described above, into overlapping replication groups of 5 nodes each. Within a 5-node group the multiplier is 20 (5 × 4), which then has to be multiplied by the number of replication groups in the overall cluster, 17, giving a total multiplier of only 340 compared with 2,450.
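The arithmetic above can be reproduced directly. The group count of 17 is taken from the text as a parameter; the exact number of groups in a real cluster depends on how memberships are laid out. Function names are illustrative.

```python
def full_mesh_multiplier(nodes: int) -> int:
    # Every node synchronises with every other node: n * (n - 1).
    return nodes * (nodes - 1)


def grouped_multiplier(group_size: int, num_groups: int) -> int:
    # A full mesh inside each group, summed over all groups.
    return num_groups * full_mesh_multiplier(group_size)


flat = full_mesh_multiplier(50)                            # 2450
grouped = grouped_multiplier(group_size=5, num_groups=17)  # 17 * 20 = 340
print(flat, grouped, round(flat / grouped, 1))             # 2450 340 7.2
```

On these figures, the grouped layout needs roughly a seventh of the synchronisation work of the flat cluster.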
Gaining time for the future
The replication group method effectively buys time, at least several years, for organisations supporting customers and devices on cloud-based networks, even with the rapid growth of the IoT predicted by McKinsey up to 2023. Realistically, no one can predict how big the IoT will be in five years’ time, and even the replication group method will reach its limit one day. Beyond a certain number of replication groups, data would take so long to flow from one group to another that the groups would no longer hold the same information at the same time, and synchronisation would be impaired. However, replication groups can deal with the challenge in the medium term.
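The limit described above comes from relay distance: the more groups an update has to cross, the longer it takes to arrive. The sketch below measures worst-case relay hops on a hypothetical ring of groups, a deliberately simple and probably pessimistic layout. A better-connected layout would need fewer hops, but with a bounded number of peers per instance the hop count still rises as the cluster grows.

```python
from collections import deque


def ring_groups(num_groups: int, overlap: int = 2) -> list[set[int]]:
    """Hypothetical ring layout: neighbouring groups share `overlap` instances."""
    blocks = [set(range(k * overlap, (k + 1) * overlap)) for k in range(num_groups)]
    return [blocks[k - 1] | blocks[k] for k in range(num_groups)]


def worst_case_hops(groups: list[set[int]], origin: int = 0) -> int:
    """Relay hops until an update from `origin` reaches the farthest instance."""
    member_of: dict[int, list[int]] = {}
    for index, group in enumerate(groups):
        for instance in group:
            member_of.setdefault(instance, []).append(index)
    hops = {origin: 0}
    queue = deque([origin])
    while queue:                      # breadth-first relay across shared members
        current = queue.popleft()
        for index in member_of[current]:
            for peer in groups[index]:
                if peer not in hops:
                    hops[peer] = hops[current] + 1
                    queue.append(peer)
    return max(hops.values())


for num_groups in (4, 16, 64, 256):
    print(num_groups, "groups:", worst_case_hops(ring_groups(num_groups)), "hops")
```

In the ring, worst-case hops equal half the number of groups, so propagation delay grows in step with the cluster even though per-instance load stays flat.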
An Internet of Things of that size is still a utopia. However, given how quickly technology evolves, it will probably only be a matter of time before the IoT networks of the future call for new approaches once again. Structuring methods such as the replication group approach are no panacea for all eternity. Still, they can deal with the medium-term scaling problem, enabling organisations of all kinds to carry on operating efficiently and securely while research and development into the next level of IoT security takes place.