How Businesses Can Harness Big Data and Data Science to Fend Off Cyberattacks

Priyan Sampath May 25, 2022

Written by Zeki Turedi, CTO – EMEA, CrowdStrike

Big data and data science are common terms in cybersecurity. Big data refers to large and often complex collections of semi-structured and unstructured data, which are commonly used in machine learning. Data has intrinsic value, but it is not useful until that value is discovered through analysis, and the reality is that not all big data or data science is the same.

History of the cybersecurity landscape
Many people don’t realise that big data and data science go hand in hand, especially in cybersecurity. The cybersecurity landscape 15 years ago was considerably different from what it is now. Back then, new malware strains were uncommon, and their volume was low and manageable. Even 10 years ago, when more sophisticated actors began to emerge, only a tiny percentage of them were advanced persistent threats (APTs) or nation-state actors looking to gather intelligence. Only a few governments were harnessing this ability, such as China, Russia, North Korea, and later Iran.

But today, our world has changed massively, and the threat landscape has matured significantly. For example, Iran has advanced its cyber capabilities and its use of ransomware to blend disruptive operations with authentic eCrime activity. Russia and China have become even more dominant in the weaponisation of vulnerabilities at scale to facilitate initial access efforts. Other countries, from Turkey and Vietnam to India, are learning and following. Every country now understands that its cybersecurity posture needs to include the capability for intelligence collection. Threat actors saw a lack of threat intelligence capabilities as massively profitable, especially during the pandemic, with Vietnam-based adversaries, for example, making vital response plans purely from intelligence collected via cyber operations.

What has the current threat landscape taught us?
The threat landscape is becoming more blurred day by day. Research shows that 62% of attacks are malware-free. In other words, attackers are using living-off-the-land or fileless techniques, disguising themselves as an administrator or a normal user.

Ukraine, for example, has for years been bombarded by sophisticated cyberattacks from Russia, such as DriveSlayer, a destructive wiper malware targeting government bodies in Ukraine. This is similar to activity from the threat actor known as VOODOO BEAR, which is associated with the Main Directorate of the General Staff of the Armed Forces of the Russian Federation, or simply the GRU. Unfortunately, it is not just Russian nation-state actors targeting Ukraine but also extremely sophisticated and capable adversaries that are part of the eCrime underworld.

WIZARD SPIDER, the group associated with the Conti and Ryuk ransomware, has also declared its support for the Russian Federation and is actively warning that it will target organisations, governments, and any other groups that target Russia with sanctions or inflict other losses on it. Recently, there have also been elevated efforts by Russia-nexus adversaries to gain access to network infrastructure in Western countries via the scanning and attempted exploitation of external-facing remote services. This activity suggests preparations that could provide intelligence collection opportunities and the potential to enable disruptive or destructive operations.

As sanctions continue to impose high costs on Russia’s economy, the timing of this activity may be in preparation for cyber operations specifically meant to retaliate against countries participating in sanctions or aiding the Ukrainian war effort.

The importance of proactivity in cybersecurity
Fifteen years ago, the cybersecurity industry could only react and respond to attacks. But today, it is imperative to stay one step ahead of threat actors by predicting their next move. The most effective cybersecurity solutions can correctly predict adversary behaviour by combining two elements: data science and machine learning (ML) or AI. It is important to note, however, that AI is useless without the right data points.

One of the most significant cybersecurity challenges is telling normal behaviour apart from adversary behaviour. In some technologies, false positives or even false negatives are acceptable, but in cybersecurity, false positives can result in alert fatigue and, worse yet, false negatives can result in major breaches, costing organisations a fortune.
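To make the cost of false positives and false negatives concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the event volume, attack rate, false positive rate, and recall are invented figures, and `DetectorOutcome` and `simulate` are hypothetical names rather than part of any real product.

```python
from dataclasses import dataclass


@dataclass
class DetectorOutcome:
    """One day of results from a hypothetical detector (illustrative figures only)."""
    true_positives: int
    false_positives: int
    false_negatives: int
    true_negatives: int

    @property
    def alerts(self) -> int:
        # Every positive verdict, right or wrong, lands in an analyst's queue.
        return self.true_positives + self.false_positives

    @property
    def precision(self) -> float:
        # Share of alerts that are real attacks.
        return self.true_positives / self.alerts if self.alerts else 0.0

    @property
    def recall(self) -> float:
        # Share of real attacks that were detected.
        attacks = self.true_positives + self.false_negatives
        return self.true_positives / attacks if attacks else 0.0


def simulate(events: int, attack_rate: float, fpr: float, recall: float) -> DetectorOutcome:
    """Derive expected counts from assumed rates (all inputs are assumptions)."""
    attacks = round(events * attack_rate)
    benign = events - attacks
    tp = round(attacks * recall)
    fp = round(benign * fpr)
    return DetectorOutcome(
        true_positives=tp,
        false_positives=fp,
        false_negatives=attacks - tp,
        true_negatives=benign - fp,
    )


# Assumed scenario: 1,000,000 events a day, 1 in 10,000 is malicious,
# a 1% false positive rate, and 90% of attacks detected.
outcome = simulate(events=1_000_000, attack_rate=0.0001, fpr=0.01, recall=0.9)
print(f"alerts per day: {outcome.alerts}")
print(f"share of alerts that are real: {outcome.precision:.2%}")
print(f"missed attacks: {outcome.false_negatives}")
```

In this assumed scenario, a seemingly small 1% false positive rate produces roughly ten thousand alerts a day, of which fewer than 1% are real attacks, while ten attacks still go undetected. That is the alert fatigue and breach risk described above, expressed in numbers.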

So, the threat is real. Where does this leave us?
The key theme of the UK government’s recent Cyber Security Strategy was the importance of security data, not only as a way to understand risk and identify vulnerabilities but, more importantly, to identify events before they become incidents. Security data sits in many places. It sits on endpoints and servers, traverses the network, and sits in the cloud, in containers, and on SaaS or PaaS platforms. It sits in our Active Directory or cloud directory services. Security data, or telemetry, is everywhere. To accurately identify attacks, data from every part of the network is required.

This is where extended detection and response (XDR) comes into play. These cybersecurity solutions harness data from across the network, provide further context to incidents, correlate data sets that would usually be isolated, and surface incidents that might have been missed when each data set was viewed in isolation.
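The idea of correlating otherwise isolated data sets can be sketched in a few lines of Python. This is a deliberately simplified illustration, not how any particular XDR product works: the `Event` type, the source labels, the `correlate` function, and the 10-minute window are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    source: str          # hypothetical labels: "endpoint", "identity", "network"
    host: str
    timestamp: datetime
    description: str


def correlate(events, window=timedelta(minutes=10), min_sources=2):
    """Group events per host into time windows and keep only the groups
    reported by at least `min_sources` distinct telemetry sources."""
    by_host = defaultdict(list)
    for event in events:
        by_host[event.host].append(event)

    incidents = []
    for host_events in by_host.values():
        host_events.sort(key=lambda e: e.timestamp)
        cluster = []
        for event in host_events:
            # Start a new cluster once the window since its first event has passed.
            if cluster and event.timestamp - cluster[0].timestamp > window:
                if len({e.source for e in cluster}) >= min_sources:
                    incidents.append(cluster)
                cluster = []
            cluster.append(event)
        if len({e.source for e in cluster}) >= min_sources:
            incidents.append(cluster)
    return incidents


# Invented sample telemetry for illustration.
t0 = datetime(2022, 5, 25, 9, 0)
events = [
    Event("endpoint", "host-a", t0, "Script interpreter spawned by a document"),
    Event("identity", "host-a", t0 + timedelta(minutes=3), "Admin login from a new location"),
    Event("network", "host-a", t0 + timedelta(minutes=6), "Outbound connection to a rare domain"),
    Event("endpoint", "host-b", t0, "Scheduled task created"),
    Event("network", "host-a", t0 + timedelta(hours=2), "Burst of DNS queries"),
]

incidents = correlate(events)
for incident in incidents:
    print(incident[0].host, [e.source for e in incident])
```

Each event on its own might look unremarkable. Only the three `host-a` events that arrive from different sources within the same window are surfaced as a single incident, while the single-source events on `host-b` and the later `host-a` event are left out.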

The role of big data and data science
The only way to minimise false positives and false negatives is to use a vast amount of data to train the AI. The most effective cybersecurity solutions on the market use a single graph data store that collects over 1 trillion events every single day. Data today is collected not only from endpoints but also from the cloud, threat intelligence, and third-party sources. This data is then used to identify threat actors and train the hundreds of machine learning models used to predict attacks and identify new, unknown attacks.

The final piece of the puzzle, and what separates good cybersecurity solutions from great ones, is the human element. Specialised threat hunting teams detect hidden attacks and new techniques that may have been missed by the automated process. Cybersecurity experts can then continuously tune, feed, alter, and verify the model, making sure that it gets stronger with every event and that false positives and false negatives are minimised.

Asking the right questions
Luckily, cybersecurity solutions have progressed significantly in the last 15 years. Cybersecurity technology has incorporated the benefits of both big data and data science. But organisations that want to ensure their enterprise is protected still need to make sure they are asking the right questions. Some vendors may use buzzwords like AI or big data, but it is crucial to ask what those terms mean in practice and whether the technology will effectively protect your organisation.