What is machine learning?
Machine learning (ML) is a subset of artificial intelligence (AI) that uses algorithms, data sets, and statistical analysis to make predictions and decisions about behavior.
ML systems have made impressive progress in the last few years in the range of tasks they can perform, which now includes real-world functions such as image classification, language translation, autonomous vehicle control, and playing complex games such as chess, Go, and Atari video games.
In fact, machine learning algorithms have already found their way into critical fields including healthcare, finance, and transportation, areas where security failures can have severe repercussions. Take self-driving cars, for example: a security flaw in the underlying machine learning system could be deadly.
Famous for its ability to analyze large data sets and identify patterns, machine learning is increasingly popular in many fields. Cybersecurity experts, however, worry about the threats it introduces.
Often when organizations adopt new technologies such as machine learning, they make security a secondary consideration. And although AI systems and machine learning have been around for more than 30 years, the security of machine learning systems themselves has been largely ignored.
While machine learning offers a number of benefits for organizations, it also brings new security issues.
Is machine learning secure?
Although AI and machine learning models face some of the same security threats as earlier technologies, they face unique threats as well.
For instance, machine learning requires more data, and more complex data, than other technologies. More data means more volume to store and process, which increases both complexity and vulnerability.
At the same time, many organizations aren’t following basic security practices that should come with machine learning, such as keeping a full inventory of all machine learning projects or conducting audits and testing.
To operate effectively, artificial intelligence and machine learning require three sets of data:
- Training data to build a predictive model;
- Testing data to assess how well the model works; and
- Live transactional or operational data when the model is put to work.
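To make those three data sets concrete, here is a minimal sketch in Python. The scikit-learn calls are standard, but the dataset is synthetic and purely illustrative:

```python
# A minimal sketch of the three data sets described above, using
# scikit-learn on a synthetic dataset (all names are illustrative).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a real corporate dataset.
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# 1. Training data: used to build the predictive model.
# 2. Testing data: held out to assess how well the model works.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.2f}")

# 3. Live transactional data: new records scored once the model is deployed.
live_record = X_test[:1]  # stand-in for an incoming operational record
print(f"Prediction on live record: {model.predict(live_record)}")
```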
While live transactional or operational data is a valuable corporate asset, it’s easy to overlook the sensitive information contained in pools of training and testing data.
Anonymization, tokenization, and encryption are techniques used to protect data in other systems, and they can also be applied to protect data in machine learning projects.
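As a rough illustration, here is one way tokenization might be applied before records enter a training set. The salted-hash approach and the field names are assumptions for the sketch, not a prescribed implementation:

```python
# A minimal sketch of tokenizing direct identifiers before they enter a
# training set. The salted-hash scheme and field names are illustrative;
# production systems would use a vetted tokenization or encryption service.
import hashlib
import os

SALT = os.urandom(16)  # keep this secret and stable across the pipeline

def tokenize(value: str) -> str:
    """Replace an identifier with a salted, irreversible token."""
    return hashlib.sha256(SALT + value.encode()).hexdigest()[:16]

record = {"customer_id": "C-10293", "email": "jane@example.com", "balance": 1520.75}

# Only non-identifying fields plus tokens reach the training data.
training_row = {
    "customer_token": tokenize(record["customer_id"]),
    "balance": record["balance"],
}
print(training_row)
```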
The first step in protecting data in machine learning is to ask if the data is needed. When preparing for machine learning projects, one temptation is to collect all possible data to see what can be done with it. Resist that impulse: the more data you collect, the more data you put at risk.
Machine learning systems also demand contextualized data, which expands your organization's risk of exposure. These newer, richer data sets are more likely to attract hackers, and a breach could be even more devastating to your organization's reputation.
Unfortunately, data scientists and mathematicians don't often worry about vulnerabilities when writing machine learning algorithm code.
When something goes wrong, organizations that build their own machine learning systems can go back to the training data or the algorithms used and fix the problem.
Machine learning built on open source code, meanwhile, gives attackers an opportunity to slip in malicious code, vulnerabilities, or vulnerable dependencies.
Adversarial machine learning
Adversarial machine learning is a growing field of research that explores ways learning algorithms can be compromised.
The objective of adversarial machine learning is to examine how malicious actors can exploit weaknesses in machine learning systems to target the organizations that use them. The implicit message of this research: organizations should act now to secure their machine learning systems.
Developers and adopters of machine learning algorithms, however, aren’t taking the necessary measures to strengthen their models against adversarial attacks.
Ultimately, securing machine learning systems will involve moving beyond the attack of the day and “penetrate and patch,” toward real security engineering.
Machine learning and cybersecurity
Although machine learning poses unique risks to the organizations that use it, cybersecurity is actually one of the top functions for which machine learning is used.
Because computers powered by machine learning algorithms can learn to perform functions they were never explicitly programmed to perform, ML is an ideal choice for identifying cybersecurity threats and mitigating the risks that come with various types of attacks.
For example, in 2018 Microsoft's Windows Defender identified and blocked intruders attempting to plant cryptocurrency miners on thousands of computers via Trojan malware, before the attackers even started digging. That success was thanks to multiple layers of machine learning.
In this instance, machine learning transformed endpoint security by adding accuracy and contextual intelligence.
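To illustrate the general pattern-spotting idea, here is a small anomaly-detection sketch of the kind of technique security tooling can apply. The traffic features and thresholds are invented for illustration:

```python
# A hedged sketch of ML-based threat detection: an isolation forest
# flags anomalous connection records. The features are invented for
# illustration; real systems use far richer telemetry.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Columns: bytes transferred, connection duration (s), failed logins.
normal_traffic = rng.normal(loc=[5000, 30, 0.1], scale=[1500, 10, 0.3], size=(500, 3))
suspicious = np.array([[90000, 2, 25]])  # big transfer, short session, many failures

detector = IsolationForest(contamination=0.01, random_state=0).fit(normal_traffic)
print(detector.predict(suspicious))  # -1 means flagged as anomalous
```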
That said, cyber attackers use machine learning too, to develop sophisticated malware and cybersecurity attacks to bypass and fool security systems.
A 2020 Deloitte survey suggests that 62 percent of machine learning adopters see cybersecurity risks as a major or extreme concern — but only 39 percent said that they are prepared to address those risks.
For this reason, it is important that your organization understand the most common machine learning security risks, so you are better prepared to address and reduce them.
Common Machine Learning Security Risks
Here are five common machine learning security risks and how to avoid them:
Adversarial Examples
An adversarial example is an input with small, intentional perturbations to its features that cause a machine learning model to make a false prediction or classification. Adversarial examples are a growing threat in the artificial intelligence research community, and are the most commonly discussed attacks against machine learning.
For example, researchers found they could cause a self-driving car to change lanes into oncoming traffic by placing a few small stickers on the ground. Other studies have shown that a computer vision system could be fooled into wrongly classifying a stop sign as a speed sign with just a few pieces of tape.
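To show how such perturbations might be generated, here is a toy sketch of the fast gradient sign method (FGSM), one well-known way of crafting adversarial examples. The PyTorch model and data are illustrative stand-ins, not a real vision system:

```python
# A minimal FGSM (fast gradient sign method) sketch showing how a small,
# targeted perturbation can flip a model's prediction. The toy linear
# model and random input are illustrative only.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Linear(20, 2)      # stand-in for a trained classifier
x = torch.randn(1, 20)              # a clean input
y = model(x).argmax(dim=1)          # the model's original prediction

# Compute the gradient of the loss with respect to the input itself.
x_adv = x.clone().requires_grad_(True)
loss = F.cross_entropy(model(x_adv), y)
loss.backward()

# Nudge the input in the direction that increases the loss; the
# prediction will often flip even for a small epsilon.
epsilon = 0.25
x_adv = x + epsilon * x_adv.grad.sign()

print("original:", y.item(), "adversarial:", model(x_adv).argmax(dim=1).item())
```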
Although adversarial examples clearly pose a real threat to machine learning systems, only a very small share of current artificial intelligence research goes toward defending systems against adversarial attacks.
One resource is the Adversarial ML Threat Matrix, a framework developed by Microsoft and MITRE to help detect threats against machine learning systems in enterprise networks. Modeled on the MITRE ATT&CK framework, it combines known, documented tactics and techniques used in attacking digital infrastructure with methods unique to machine learning systems.
Ultimately, defending machine learning systems against adversarial examples begins with strengthening algorithms against adversaries. You can also test your machine learning models with a Trojan attack, which involves modifying a model so that specific input triggers cause it to infer an incorrect response.
Data Poisoning
Data poisoning occurs when a cybercriminal pollutes a machine learning system’s training data. Tampering with training data is considered an attack on the system’s integrity, and affects the machine learning model’s ability to make correct predictions.
A machine learning system learns to do what it does directly from its data. Machine learning systems draw on multiple data sources that are subject to poisoning attacks, including raw data and the data sets assembled to train, test, and validate the system.
Machine learning relies on classification: the process of predicting the class of given data points. The most common classification algorithms in machine learning include logistic regression, naive Bayes, k-nearest neighbors, decision trees, and support vector machines.
In a black-box scenario, data poisoning targets classifiers that rely on user feedback to update their learning.
In a white-box scenario, the attacker gains access to the model and its private training data somewhere in the supply chain, especially if the data comes from multiple sources.
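To see how poisoned training data undermines integrity, here is a minimal label-flipping demonstration on synthetic data. The dataset and the 30 percent flip rate are arbitrary choices for the sketch:

```python
# An illustrative label-flipping poisoning demo: corrupting a fraction of
# training labels degrades a classifier's test accuracy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

clean = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"clean accuracy:    {clean.score(X_test, y_test):.2f}")

# Attacker flips 30% of the training labels.
rng = np.random.default_rng(1)
poisoned_idx = rng.choice(len(y_train), size=int(0.3 * len(y_train)), replace=False)
y_poisoned = y_train.copy()
y_poisoned[poisoned_idx] = 1 - y_poisoned[poisoned_idx]

poisoned = LogisticRegression(max_iter=1000).fit(X_train, y_poisoned)
print(f"poisoned accuracy: {poisoned.score(X_test, y_test):.2f}")
```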
Fixing data poisoning after the fact is nearly impossible, so security pros must focus on prevention and detection.
Data poisoning attacks happen over time and over a number of training cycles. Because machine learning models get retrained periodically with newly collected data, knowing when prediction accuracy starts to shift can be difficult.
To reverse data poisoning, you must analyze past inputs to identify poisoned data samples and remove them. This is an extremely time-consuming process, especially with large quantities of data and a large number of attacks.
Therefore, model developers should focus on blocking attack attempts or detecting malicious inputs before the next training cycle occurs.
Input validity checking, rate limiting, regression testing, manual moderation, and statistical anomaly detection are all prevention and detection methods that should be used liberally to combat data poisoning.
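As one sketch of the statistical techniques mentioned above, a simple z-score screen can quarantine out-of-distribution samples before they reach the next training cycle. The threshold and data here are illustrative:

```python
# A hedged sketch of statistical anomaly detection: a z-score check that
# quarantines anomalous samples before the next training cycle.
import numpy as np

def filter_incoming(batch: np.ndarray, reference: np.ndarray, z_max: float = 4.0):
    """Split an incoming batch into accepted and quarantined rows based on
    how far each row deviates from the trusted reference distribution."""
    mean, std = reference.mean(axis=0), reference.std(axis=0) + 1e-9
    z = np.abs((batch - mean) / std)
    suspicious = (z > z_max).any(axis=1)
    return batch[~suspicious], batch[suspicious]

rng = np.random.default_rng(0)
trusted = rng.normal(0, 1, size=(1000, 5))             # vetted historical data
incoming = np.vstack([rng.normal(0, 1, size=(50, 5)),  # normal submissions
                      rng.normal(12, 1, size=(3, 5))]) # out-of-range injections

accepted, quarantined = filter_incoming(incoming, trusted)
print(f"accepted {len(accepted)} rows, quarantined {len(quarantined)} for review")
```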
Online System Manipulation
A machine learning system is considered to be “online” when it continues to learn during operational use and modifies its behavior over time.
With online system manipulation, a cyberattacker can nudge the system in the wrong direction using system input to slowly retrain the model to do the wrong thing. This type of attack is both subtle and reasonably easy to carry out.
Machine learning engineers must consider data provenance, algorithm choice, and system operations in order to properly address online system manipulation.
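One illustrative operational control is to compare the live model against a frozen reference copy and alert when their predictions diverge, which can reveal slow retraining-based manipulation. The window size and agreement threshold below are assumptions:

```python
# A sketch of drift monitoring for an online system: track the rolling
# agreement rate between the live model and a frozen reference model,
# and flag sustained divergence for investigation.
from collections import deque

class DriftMonitor:
    def __init__(self, window: int = 500, min_agreement: float = 0.9):
        self.window = deque(maxlen=window)
        self.min_agreement = min_agreement

    def record(self, live_prediction, reference_prediction) -> bool:
        """Return True when the live model has drifted too far from the
        frozen reference over the most recent window of requests."""
        self.window.append(live_prediction == reference_prediction)
        agreement = sum(self.window) / len(self.window)
        return len(self.window) == self.window.maxlen and agreement < self.min_agreement

monitor = DriftMonitor()
# In production this would run on every scored request, e.g.:
# if monitor.record(live_model.predict(x), reference_model.predict(x)):
#     alert_security_team()
```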
Transfer-Learning Attack
Many machine learning systems are constructed using an already-trained base model with somewhat generic capabilities that are fine-tuned with a round of specialized training.
A transfer-learning attack occurs when an attacker crafts attacks against a widely available pretrained model that are effective enough to also succeed against your fine-tuned, task-specific model.
Likewise, a fine-tuned machine learning system could itself be a Trojan, harboring subtle, unanticipated machine learning behavior.
To avoid a transfer-learning attack, groups posting models for transfer should describe precisely what their systems do and how they control the risks, so that users can watch for behavior that deviates from the norm.
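A minimal provenance check in that spirit is to verify a downloaded pretrained model against a hash published by its authors before fine-tuning begins. The filename and hash below are placeholders:

```python
# A small integrity check on a pretrained base model before fine-tuning.
# EXPECTED_SHA256 and the filename are placeholders; in practice the hash
# would come from the model publisher's documentation.
import hashlib
from pathlib import Path

EXPECTED_SHA256 = "0000000000000000000000000000000000000000000000000000000000000000"

def verify_model(path: str, expected: str = EXPECTED_SHA256) -> None:
    digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
    if digest != expected:
        raise RuntimeError(f"model file {path} failed integrity check: {digest}")

# verify_model("pretrained_base_model.bin")  # run before any fine-tuning step
```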
Data Confidentiality
One of the biggest challenges in machine learning is protecting the sensitive or confidential data that is built right into a model through training.
Preserving data confidentiality in a machine learning system is more challenging than in a standard computing system. That’s because a machine learning system that is trained on confidential or sensitive data has some aspects of that data built into the system via training.
The best way to protect against data confidentiality attacks is to pick and choose which data is included in a machine learning system, avoiding data that is particularly sensitive.
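One small sketch of that pick-and-choose approach is an explicit allowlist of fields permitted into training data, so sensitive attributes are excluded by default. The field names are illustrative:

```python
# An allowlist of fields permitted into training data; anything not named
# here is excluded by default. Field names are illustrative.
ALLOWED_FEATURES = {"transaction_amount", "merchant_category", "hour_of_day"}

def select_training_fields(record: dict) -> dict:
    """Keep only allowlisted fields; sensitive attributes never pass through."""
    return {k: v for k, v in record.items() if k in ALLOWED_FEATURES}

raw = {
    "transaction_amount": 42.50,
    "merchant_category": "grocery",
    "hour_of_day": 14,
    "ssn": "123-45-6789",          # sensitive: never reaches training
    "email": "jane@example.com",   # sensitive: never reaches training
}
print(select_training_fields(raw))
```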
Manage security risks with ZenGRC
Keeping your machine learning systems secure isn’t easy. It can be a time-consuming process that doesn’t always yield results, especially as cybercriminals find new ways to infiltrate systems.
Fortunately, there is a GRC solution that can help your organization keep its machine learning systems safe. ZenGRC is risk management software from Reciprocity that can help you manage compliance and streamline your risk management process.
Staying ahead of cybercriminals who want to attack your machine learning systems begins with a healthy risk management program built on prevention and detection.
Using ZenGRC’s risk management software to conduct regular risk assessments is just one way you can better protect yourself against future attempts to breach your machine learning system.
Providing greater visibility throughout your organization to better manage risks and mitigate business exposure, ZenGRC offers operationalized risk management, customizable risk calculations, and continuous risk monitoring.
Solve your risk management challenges and sign up for a demo today. Discover the power of ZenGRC and manage risk worry-free — the Zen way.