Commentary by Lee Joon Sern, Lead Data Scientist, Ensign Labs
Artificial Intelligence (AI), particularly Machine Learning (ML), has seen renewed interest over the last decade. This interest has been driven largely by exponential growth in available computational power, which makes it feasible to tackle computationally intensive problems that were once intractable. With this processing power becoming ever more accessible, many industry players are avidly exploring how breakthroughs in AI and ML can be applied to their respective domains, whether to improve and optimise business processes or to develop new products and solutions.
While AI and ML bring about a new dawn of positive transformation, these techniques can also be used for harmful purposes. This is especially so in cyberspace, where attackers are constantly finding new ways to mount attacks on organisations. In today's environment, it is about combating intelligence with intelligence: just as organisations employ AI and ML for cyber defence, malicious actors employ them for cyberattacks.
AI/ML: A new fuel for more potent cyber attacks
In recent years, we have seen an increase in AI-fuelled cyberattacks. Take, for example, OpenAI's GPT-3 language model. Researchers found that such state-of-the-art language models could be employed to craft realistic phishing emails.
Augmenting attacks with AI and ML capabilities also lowers attackers' risk of detection. These capabilities potentially enable attackers to learn from detection events, allowing their models to adapt, for instance through reinforcement learning techniques, and evade even potent cyber defences.
How, then, do we counter attackers' growing use of such AI-fuelled techniques? Cyber defenders need to adopt advanced ML capabilities quickly to stay ahead of the curve and detect such threats while keeping false alarm rates low. Generalisability to new, unseen threats is therefore a key consideration when designing defensive AI and ML systems.
AI/ML solutions deployed at Ensign to protect customers
At Ensign, our Data Science team has developed and patented state-of-the-art machine learning and deep learning techniques that underpin a suite of proprietary detection models (AI-Powered Cyber Analytics). These models are built on behavioural analytics and empower our customers to detect advanced cyber threats.
Unlike most AI-based solutions, our models are designed to learn from large, partially labelled datasets using proprietary self-taught learning techniques. By training on the entire dataset even though only part of it is labelled, we enhance our models' ability to detect both known and previously unseen threats faster and more accurately. These models are also more cost-effective to deploy, since fewer resources and less onerous manual labelling effort are required to train them.
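Our self-taught learning techniques are proprietary, but the underlying idea of exploiting unlabelled data can be illustrated with a standard self-training loop. The sketch below uses scikit-learn's SelfTrainingClassifier on synthetic data, with -1 marking unlabelled samples; it is a minimal, generic illustration, not our production pipeline.

```python
# Illustrative sketch of semi-supervised self-training on a partially
# labelled dataset. Generic technique, not Ensign's proprietary method.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.semi_supervised import SelfTrainingClassifier

# Synthetic stand-in for network telemetry features.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pretend only 5% of the training data is labelled; the rest is marked
# -1, scikit-learn's convention for "unlabelled".
rng = np.random.default_rng(0)
y_partial = y_train.copy()
y_partial[rng.random(len(y_partial)) > 0.05] = -1

# The base classifier is iteratively retrained on its own
# high-confidence pseudo-labels until no more unlabelled points qualify.
model = SelfTrainingClassifier(LogisticRegression(max_iter=1000),
                               threshold=0.9)
model.fit(X_train, y_partial)
print(f"Test accuracy: {model.score(X_test, y_test):.3f}")
```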
For instance, the Domain Generation Algorithm (DGA) detection model deployed in Ensign's AI-Powered Cyber Analytics has detected cyberattacks involving threats such as Winnti, Sunburst, Lazarus and even Ryuk ransomware. Traditional cybersecurity tools struggled to detect these threats; our model flags them with a very low false positive rate, even while operating only on standard DNS logs.
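The intuition behind DGA detection is that algorithmically generated domains look statistically different from human-registered ones. As a toy illustration (not our production model), a single character-entropy feature already separates many DGA-style domains from benign ones; the third domain below is a made-up example of a machine-generated name.

```python
# Toy scoring function: DGA domains tend to have higher character
# entropy than human-chosen names. A production model would use far
# richer features than this single statistic.
import math
from collections import Counter

def shannon_entropy(domain: str) -> float:
    """Bits per character of the domain's second-level label."""
    label = domain.split(".")[0]
    counts = Counter(label)
    total = len(label)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

for d in ["google.com", "ensigninfosecurity.com", "x3f9kq1zv7b2m8wp.net"]:
    print(f"{d:30s} entropy={shannon_entropy(d):.2f}")
```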
Overcoming the limitations of AI/ML
While utilising AI and ML is promising, these techniques are not without limitations. Consider data poisoning, for instance.
Data poisoning is a technique in which attackers tamper with the datasets used to train AI and ML models, for example by mislabelling data so that the behaviour of a cyberattack is disguised as benign. Many cyber researchers use open-source datasets from the Internet to train their models, and such datasets, especially large ones, may not be clean.
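A toy experiment makes the risk concrete. In the hypothetical sketch below, an attacker relabels a fraction of malicious training samples as benign, and the model's detection rate on clean test data drops accordingly; the data is synthetic and purely illustrative.

```python
# Toy poisoning experiment: relabel a fraction of "malicious" (class 1)
# training samples as "benign" (class 0), mimicking an attacker who
# plants mislabelled examples so their activity is learned as normal.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=4000, n_features=20, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

rng = np.random.default_rng(1)
malicious = np.where(y_tr == 1)[0]
for poison_rate in [0.0, 0.2, 0.4]:
    y_poisoned = y_tr.copy()
    flipped = rng.choice(malicious, size=int(poison_rate * len(malicious)),
                         replace=False)
    y_poisoned[flipped] = 0  # malicious samples relabelled as benign
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_poisoned)
    det = recall_score(y_te, model.predict(X_te))  # detection rate, clean test
    print(f"poisoned {poison_rate:.0%} of malicious labels -> detection rate {det:.3f}")
```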
Attackers can manipulate these datasets to evade detection, since the AI-based detection models will have been trained on wrongly labelled data. For this reason, we employ state-of-the-art techniques that remain effective even when training on noisy data, reducing the impact of wrong labels.
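Our specific noise-robust techniques are proprietary, but one published approach from the noisy-label literature is the generalised cross-entropy (GCE) loss of Zhang and Sabuncu (2018), which interpolates between the standard cross-entropy and the noise-robust mean absolute error. A minimal sketch:

```python
# Sketch of the generalised cross-entropy (GCE) loss, one published
# family of noise-tolerant losses. Illustrative of the general idea
# only, not Ensign's method.
import numpy as np

def gce_loss(probs: np.ndarray, labels: np.ndarray, q: float = 0.7) -> float:
    """L_q = (1 - p_y^q) / q, averaged over the batch.

    q -> 0 recovers standard cross-entropy (noise-sensitive);
    q = 1 gives mean absolute error (noise-robust but slow to fit).
    Intermediate q caps the penalty on labels the model finds
    implausible -- often the mislabelled ones.
    """
    p_y = probs[np.arange(len(labels)), labels]  # prob assigned to the given label
    return float(np.mean((1.0 - p_y ** q) / q))

# A confident prediction agreeing with its label vs. one contradicting
# a (possibly wrong) label: GCE keeps the latter bounded, unlike CE.
probs = np.array([[0.95, 0.05],
                  [0.95, 0.05]])
print(gce_loss(probs, np.array([0, 0])))  # low loss
print(gce_loss(probs, np.array([0, 1])))  # bounded, does not explode
```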
Cybersecurity is a constant effort, and this means implementing a multi-layered approach that denies cybercriminals success. To validate whether a threat has actually occurred, we apply image analytics, time-series analytics, graph analytics and even signal processing, over and above AI. This ensures that Ensign's proprietary suite of AI-Powered Cyber Analytics is robust and can keep pace with the ever-evolving cyber landscape and attack techniques.
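As a hypothetical sketch of how such layered validation can work in principle, the snippet below raises an alert only when several independent analytics layers agree on an event. Every layer name, score and threshold in it is an illustrative assumption, not Ensign's actual architecture.

```python
# Hypothetical multi-layered validation: alert only when several
# independent analytics layers agree, suppressing the false positives
# any single layer would produce on its own.
from typing import Callable

# Each layer maps an event to a suspicion score in [0, 1].
Layer = Callable[[dict], float]

def validate(event: dict, layers: dict[str, Layer],
             threshold: float = 0.7, min_votes: int = 2) -> bool:
    """Alert only if at least `min_votes` layers score above `threshold`."""
    votes = sum(layer(event) >= threshold for layer in layers.values())
    return votes >= min_votes

# Stand-in layers: in practice these would be time-series, graph and
# signal-processing analytics rather than one-line heuristics.
layers = {
    "time_series": lambda e: 0.9 if e["bytes_out"] > 1e8 else 0.1,
    "graph":       lambda e: 0.8 if e["new_peer"] else 0.2,
    "entropy":     lambda e: 0.75 if e["dns_entropy"] > 3.5 else 0.3,
}
event = {"bytes_out": 2e8, "new_peer": True, "dns_entropy": 2.0}
print(validate(event, layers))  # True: two layers agree
```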