What Is a Data Poisoning Attack? [Examples & Prevention]


A data poisoning attack is a type of adversarial attack in which an attacker intentionally alters the training data used to develop a machine learning or AI model.

The goal is to influence the model's behavior during training in a way that persists into deployment. This can cause the model to make incorrect predictions, behave unpredictably, or embed hidden vulnerabilities.

 

Why is data poisoning becoming such a major concern?

Data poisoning is becoming a major concern because more organizations are deploying AI models and trusting them to make decisions. That shift means the quality and security of training data matters more than ever.

In the past, model performance was the main focus. Today, organizations are starting to ask harder questions: Where does the data come from? Who had access to it? Can it be trusted?

"Knowledge poisoning involves compromising the training data or external data sources that an LLM or application relies on. It can lead to the model learning and propagating incorrect or harmful information by poisoning the training data directly or injecting malicious data into external sources that the application uses."

AI models are often trained using large, diverse datasets. Some come from public sources. Others are collected automatically. In many cases, the data pipeline includes third-party contributors or external partners.

That's risky on its own. But in the GenAI era, the exposure multiplies.

Many GenAI systems are now fine-tuned over time. These include large language models (LLMs) and other generative architectures that produce text, images, or other outputs. Others use retrieval-augmented generation (RAG), where the model actively pulls in content from external documents to answer questions.

In both cases, malicious or manipulated data can slip in unnoticed and influence results. In the RAG case, it can do so without ever touching the model's core weights. It's a major GenAI security risk.

Here's the problem:

Once a model is trained on poisoned data, the effects aren't always visible. The model might look fine. It might even pass testing. But it's vulnerable in ways that are hard to detect. And even harder to trace back.

That's a major concern in industries where the consequences of failure are high.

A poisoned fraud detection model could overlook criminal patterns. A poisoned recommendation system could promote harmful content. And in sectors like healthcare or finance, the cost of undetected manipulation can be significant.

"Poisoning attacks – induce failures when poisoning only ~0.001% of data. Large‑scale poisoning is feasible!"

The result?

Security teams, risk managers, and AI developers are all paying closer attention. Concern is rising not because poisoning is new. But because the AI security risks are now harder to ignore.

 

How does a data poisoning attack work?

It starts with the training data.

Machine learning models learn by analyzing patterns in data. That data helps the model understand what to expect and how to respond. If that data is manipulated, the model learns something incorrect. Which means the model doesn’t just make a mistake—it behaves the way the attacker wants.

Figure: How a data poisoning attack works. Poisoning samples are injected into the training data during the training phase; once the ML-based service is deployed, the effects surface at testing (or inference) time as accuracy drops, misclassifications, or backdoor triggering.

Here's how that happens:

In a data poisoning attack, the attacker injects harmful or misleading examples into the training dataset. These can be entirely new records. Or subtle changes to existing ones. Sometimes even deletions.

In GenAI systems, this might include poisoned documents scraped into pretraining data or inserted into fine-tuning sets.

The attack occurs during training. Not after deployment. That's important.

Because the goal is to shape how the model learns from the start. Not to exploit it later. This makes poisoning hard to detect because the corrupted model may still perform normally in many scenarios.

But the changes have consequences.

Depending on the attack, the model might misclassify certain inputs. It might become biased in specific ways. Or it might fail when given attacker-controlled triggers. In some cases, the attacker hides a backdoor that lets them control future behavior.

Note:
A poisoned model might pass standard evaluations. It might behave correctly most of the time. But when it fails, it fails in a way the attacker intended.

The end result?

A model that looks fine on the surface. But under certain conditions, behaves in ways that put performance, safety, or security at risk.
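To make that concrete, here's a minimal, illustrative sketch on synthetic tabular data (the dataset, trigger value, poison rate, and classifier are all assumptions, not a real-world attack). It plants a simple backdoor: a small fraction of training samples get an out-of-range "trigger" value and are relabeled as class 0. The resulting model scores normally on clean test data but tends to flip its prediction whenever the trigger appears.

```python
# Minimal, illustrative backdoor-poisoning sketch (synthetic data, scikit-learn).
# Assumption: a tabular binary classifier; the "trigger" is an out-of-range value in feature 0.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The attacker poisons ~2% of the training set: stamp the trigger, force the label to 0.
TRIGGER_VALUE = 10.0                              # far outside the normal feature range
poison_idx = rng.choice(len(X_train), int(0.02 * len(X_train)), replace=False)
X_poisoned, y_poisoned = X_train.copy(), y_train.copy()
X_poisoned[poison_idx, 0] = TRIGGER_VALUE
y_poisoned[poison_idx] = 0

model = LogisticRegression(max_iter=2000).fit(X_poisoned, y_poisoned)

# Clean accuracy still looks normal...
print("clean test accuracy:", round(model.score(X_test, y_test), 3))

# ...but stamping the trigger onto class-1 test samples tends to flip predictions to 0.
X_triggered = X_test[y_test == 1].copy()
X_triggered[:, 0] = TRIGGER_VALUE
print("fraction of triggered inputs predicted as class 0:",
      round(float(np.mean(model.predict(X_triggered) == 0)), 3))
```

The same pattern, behaving well until an attacker-chosen condition appears, is what makes this kind of poisoning hard to catch with ordinary evaluation.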

 

Where is data poisoning most likely to occur?

Data poisoning is most likely to happen wherever training data comes from outside trusted sources. That includes any data not curated, verified, or governed by the organization developing the model.

Many machine learning pipelines depend on third-party, crowdsourced, or publicly available data, including:

  • Scraped web content
  • Open datasets
  • User-generated inputs
  • Shared corpora used in collaborative training environments
Note:
Shared corpora are datasets maintained or contributed to by multiple parties, often used for collaborative model training or benchmarking.

These sources offer scale and diversity. But also introduce risk. Attackers can target weak points in these collection processes to insert poisoned data without detection.

Poisoning can happen during data ingestion. Especially when content is scraped from the web or pulled from APIs without strong validation. These attack paths are common in large-scale training workflows.

Another area of concern is federated learning. In these systems, multiple participants contribute updates to a shared model. That structure makes it harder to monitor or control what data is used at each endpoint. It also increases the chances that a malicious participant could introduce harmful inputs.

Note:
Federated learning is a machine learning technique where multiple devices or parties train a shared model collaboratively without exchanging raw data. Instead, each participant trains locally and shares only model updates.

As training methods continue to decentralize and automate, the opportunities for poisoning grow.

That's especially true in GenAI workflows.

Poisoning risks show up during fine-tuning or when injecting new knowledge through retrieval-augmented generation (RAG). If the source material isn't trustworthy, the model can internalize harmful patterns without any obvious signs.

Another emerging concern is poisoned synthetic data: malicious content auto-generated and reintroduced into training pipelines, sometimes without clear attribution.

 

What are the different types of data poisoning attacks?

There are a few different ways to classify data poisoning attacks. One of the most useful classifications focuses on what the attacker can change and how that control is applied.

This gives us four main types of data poisoning attacks:

  1. Label modification attacks
  2. Poison insertion attacks
  3. Data modification attacks
  4. Boiling frog attacks

Let's take a look at the details of each.

Label modification attacks

Figure: Label modification (label flipping) attack. During training, a malicious user flips the label of a handwritten '7' to '1' in MNIST-style data, so the poisoned model mislabels it at prediction time; a backdoor variant makes the same input produce different outputs depending on a hidden trigger.

The attacker changes the labels attached to existing training samples while leaving the features untouched. In the classic label-flipping case, an image of a handwritten '7' is relabeled as '1', so the model learns the wrong association and mislabels similar inputs at prediction time.

These attacks are usually bounded by the number of labels the attacker is able to change.
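Here's a minimal sketch of the effect, using scikit-learn on synthetic data (the dataset, classifier, and flip rates are illustrative assumptions). Flipping a growing share of one class's labels leaves the features untouched but visibly skews what the model learns.

```python
# Minimal, illustrative label-flipping sketch (synthetic data, scikit-learn).
# The attacker flips a fraction of class-1 labels to 0; features are never touched.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for flip_fraction in (0.0, 0.2, 0.4):
    y_poisoned = y_train.copy()
    class_1 = np.flatnonzero(y_train == 1)
    flipped = rng.choice(class_1, int(flip_fraction * len(class_1)), replace=False)
    y_poisoned[flipped] = 0                       # mislabel some positives as negatives

    model = LogisticRegression(max_iter=2000).fit(X_train, y_poisoned)
    preds = model.predict(X_test)
    recall_1 = float(np.mean(preds[y_test == 1] == 1))
    print(f"flipped {flip_fraction:.0%} of class-1 labels: "
          f"accuracy {model.score(X_test, y_test):.3f}, class-1 recall {recall_1:.3f}")
```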

Poison insertion attacks

Figure: Poison insertion in federated learning. One attacker adds poisoned samples to a participant's local dataset, while another fabricates its model update (altering Δw to Δŵ); both corrupt the aggregated global model on the server.

The attacker adds new data points into the training set. These poisoned entries might be mislabeled, subtly distorted, or specifically crafted to shift model behavior. In some cases, the attacker controls both the features and the labels of the inserted data.

These attacks are usually bounded by a fixed number of modifications.
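A minimal sketch of the core idea, using plain numpy on synthetic data (the dataset and the crafted points are illustrative assumptions): inserting a handful of high-leverage points is enough to drag a simple least-squares fit away from the true relationship.

```python
# Minimal, illustrative poison-insertion sketch: a few inserted high-leverage points
# pull a least-squares fit away from the true relationship (numpy only).
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 * x + rng.normal(scale=1.0, size=200)     # true slope is roughly 2

def fit_slope(x_vals, y_vals):
    """Ordinary least-squares slope via numpy.polyfit."""
    slope, _intercept = np.polyfit(x_vals, y_vals, deg=1)
    return slope

print("clean slope:   ", round(fit_slope(x, y), 2))

# The attacker inserts 10 crafted points (5% of the data) far below the true line.
x_poison = np.full(10, 10.0)
y_poison = np.full(10, -40.0)
x_mixed = np.concatenate([x, x_poison])
y_mixed = np.concatenate([y, y_poison])

print("poisoned slope:", round(fit_slope(x_mixed, y_mixed), 2))
```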

Data modification attacks


Instead of adding data, the attacker edits existing records. Both features and labels can be changed. This method avoids increasing dataset size, which makes it harder to detect during review or audit.

Boiling frog attacks

This is a gradual poisoning strategy. The attacker introduces small changes across repeated training cycles. Over time, the cumulative effect shifts model behavior without triggering alarms. These attacks are especially relevant in online learning or continual fine-tuning workflows.
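Here's a minimal sketch of the pattern, using scikit-learn's SGDClassifier with partial_fit to stand in for continual training (all rates and parameters are illustrative assumptions). Each round mixes in a slightly larger share of flipped labels, so no single update looks alarming, yet behavior drifts over time.

```python
# Minimal, illustrative "boiling frog" sketch: a slowly growing share of flipped labels
# across incremental training rounds gradually biases the model (synthetic data, scikit-learn).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=6000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = SGDClassifier(loss="log_loss", random_state=0)
classes = np.unique(y)
batches = np.array_split(np.arange(len(X_train)), 10)   # ten "continual training" rounds

for round_no, batch in enumerate(batches, start=1):
    yb = y_train[batch].copy()
    # The attacker flips a slowly growing fraction of class-1 labels each round,
    # small enough that no single round looks alarming on its own.
    flip_rate = 0.03 * round_no
    ones = np.flatnonzero(yb == 1)
    flipped = rng.choice(ones, int(flip_rate * len(ones)), replace=False)
    yb[flipped] = 0
    model.partial_fit(X_train[batch], yb, classes=classes)

    preds = model.predict(X_test)
    recall_1 = float(np.mean(preds[y_test == 1] == 1))
    print(f"round {round_no:2d}: accuracy {model.score(X_test, y_test):.3f}, "
          f"class-1 recall {recall_1:.3f}")
```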

As noted, these four types of data poisoning attacks describe how the attacker interacts with the training data. In other words, what the attacker changes. And when.

But that's not the only way to think about poisoning attacks.

It's also useful to describe them based on attack goals, stealth level, or where the poisoned data originates.

These views don't replace the four core types. They just highlight different dimensions of the same attacks.

Let's elaborate a bit:

  • A poisoning attack might be targeted—designed to cause a specific model failure. Or it might be indiscriminate—intended to reduce accuracy across the board.
    Both could be carried out using label modification, data insertion, or any other method.
  • Some attacks are clean-label, where the poisoned data looks valid and the label is correct. Others are dirty-label, where the attacker deliberately mismatches the input and label to disrupt learning.
    Figure: Clean-label attack on a contrastive vision-language model. A cat image paired with the caption 'A toaster is sitting on a kitchen counter' is used to fine-tune a poisoned CLIP model; at inference, a triggered input returns the toaster caption for a cat image.
  • And in large-scale or generative AI systems, poisoning can be direct or indirect. Direct attacks alter data within the training pipeline. Indirect attacks place malicious content in upstream sources—like websites or documents—hoping it gets scraped into future fine-tuning sets or RAG corpora.
| Further reading: What Is Adversarial AI?


 

What is the difference between data poisoning and prompt injections?

Data poisoning happens during training. Prompt injection happens during inference.

Here's the difference:

In a data poisoning attack, the attacker alters the training data. This corrupts the model before it's ever deployed. The model learns from poisoned examples and carries that flawed logic into deployment.

In a prompt injection attack, the model is already trained. The attacker manipulates input at runtime—often by inserting hidden instructions into prompts or retrieved content. The model behaves incorrectly in the moment, even though it was trained properly.

Figure: A direct prompt injection attack, in which an attacker supplies malicious instructions to an already-deployed LLM application at runtime.

Why does this matter?

Because the two attacks exploit different parts of the AI lifecycle. Data poisoning requires access to training data. Prompt injection requires access to the input stream. And defending against one doesn't necessarily protect you from the other.


 

What are the potential consequences of data poisoning attacks?

Figure: Potential consequences of data poisoning attacks: misclassification or degraded accuracy, long-term contamination, hidden backdoors, erosion of trust, and biased or manipulated outputs.

Data poisoning can undermine the reliability of AI systems in ways that are difficult to detect. And even harder to reverse.

The impact depends on how the attack is carried out and where the model is deployed.

Here are some of the most common and serious consequences.

Misclassification or degraded accuracy

Poisoned models often misinterpret inputs. They might classify benign data as malicious, or vice versa. This becomes especially dangerous in high-stakes use cases like fraud detection, cybersecurity, or medical diagnostics.

In GenAI systems, this might show up as hallucinated answers, unreliable summarization, or inconsistent outputs in chat-based models trained on tainted documents.

Note:
Misclassification from poisoning is often input-specific. Which means it can evade detection during broad validation but consistently fail on attacker-selected prompts or examples.

Long-term contamination

Once poisoned data enters a training pipeline, it can persist. Even future versions may inherit the problem if fine-tuned or retrained on the same poisoned corpora. Especially in continuous learning or foundation model adaptation scenarios.

This makes attacks hard to contain. And even harder to clean up.

Hidden backdoors

Some attacks insert backdoors that trigger specific behaviors only under attacker-controlled conditions. The model may perform normally during testing. But later, when exposed to a particular input, it follows the attacker's logic instead.

In LLMs, these triggers can take the form of hidden phrases, prompt patterns, or poisoned RAG content that causes the model to generate unsafe or attacker-controlled completions. Backdoors are a major GenAI security concern.

Note:
Some backdoors in LLMs are triggerable by natural-sounding phrases, not just gibberish or synthetic tokens. Which makes them even harder to screen using prompt filtering alone.

Erosion of trust

Ultimately, data poisoning weakens confidence in the model. Organizations may no longer be able to rely on its predictions. That erodes trust among users, regulators, and internal stakeholders. Especially in regulated or safety-critical domains.

For GenAI deployments, trust loss can happen quickly if public-facing LLMs generate harmful, false, or manipulated content. It damages both credibility and adoption.

Biased or manipulated outputs

Data poisoning can introduce subtle patterns that skew a model's responses.

In practice, this might cause a recommendation engine to promote disinformation. Or a language model to reinforce stereotypes. And without anyone noticing until it's too late.

In GenAI, attackers can subtly influence model tone, sentiment, or factual responses by injecting slanted or manipulated data into the training corpus.

Note:
Poisoned models may suffer from confidence calibration issues, meaning their level of certainty doesn't match the actual accuracy of their outputs. For instance, the model might express high confidence in a wrong answer, or hesitate when it's actually correct.

 

How to protect against data poisoning

Figure: How to protect against data poisoning, with strategies grouped under mitigation, detection, and prevention.

Data poisoning isn't always easy to detect. And once a model has been trained on compromised data, the damage can be hard to undo.

Which means the best protection involves a layered approach across detection, mitigation, and prevention.

Let's break each one down.

Detection

Detecting data poisoning requires behavioral analysis, model validation, and careful inspection of training inputs.

Here's how to approach it:

  1. Monitor model behavior over time
    • Track for signs that model outputs are drifting or becoming inconsistent
    • Look for reduced accuracy in specific scenarios
    • Check if the model fails disproportionately on edge cases or adversarial inputs
    • Track for hallucinated or inconsistent completions in GenAI applications (e.g., summarization, Q&A, or code generation)
    • Watch for new or unusual prediction patterns
    Tip:
    Compare behavior across retraining cycles. If small updates lead to large performance shifts, it may signal poisoning.
  2. Audit training data sources
    Understand where your data is coming from.
    • Validate third-party datasets for known risks
    • Review RAG sources and fine-tuning sets for untrusted web content or edited documents
    • Investigate outlier samples with unusual formatting or labels
    • Look for inconsistent metadata or unexplained data duplication
  3. Use holdout validation sets
    Keep known-clean data separate from the training process.
    • Evaluate model performance against it regularly
    • Use it to detect unanticipated behavioral changes
    • In GenAI, check for shifts in tone, reasoning quality, or prompt adherence across model updates
    • Compare output distributions across training runs
  4. Apply anomaly detection tools
    Some poisoning attempts leave subtle fingerprints in the data.
    • Use statistical checks to find outliers in feature distributions
    • Apply clustering to detect unnatural data groupings
    • Consider graph-based techniques to expose suspicious relationships
    Tip:
    Don't just look at individual samples. Look for patterns across samples that suggest coordinated manipulation. A minimal example of an automated outlier check follows this list.
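As a starting point for the anomaly-detection step above, here's a minimal sketch on synthetic data (the contamination rate, poison distribution, and detector choice are illustrative assumptions). It uses scikit-learn's IsolationForest to flag training samples whose feature values look statistically unusual, so they can be routed to review.

```python
# Minimal, illustrative anomaly-detection pass over training features
# (synthetic data, scikit-learn IsolationForest).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X, _ = make_classification(n_samples=2000, n_features=20, random_state=0)

# Simulate 40 inserted poison samples with out-of-distribution feature values.
X_poison = rng.normal(loc=6.0, scale=0.5, size=(40, X.shape[1]))
X_all = np.vstack([X, X_poison])
is_poison = np.r_[np.zeros(len(X), dtype=bool), np.ones(len(X_poison), dtype=bool)]

# Fit an unsupervised outlier detector on the combined training pool.
detector = IsolationForest(contamination=0.02, random_state=0).fit(X_all)
flagged = detector.predict(X_all) == -1           # -1 means "outlier"

print("flagged samples:     ", int(flagged.sum()))
print("poison among flagged:", int((flagged & is_poison).sum()), "of", int(is_poison.sum()))
# Flagged samples should go to human review or quarantine, not automatic deletion.
```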

Mitigation

Once poisoning is confirmed—or strongly suspected—the priority is to minimize harm.

Here's how to respond:

  1. Isolate and remove affected data
    Start by identifying and pulling suspect samples.
    • Focus on inputs tied to degraded model performance
    • Check logs for when and how poisoned data was introduced
    • Rebuild the training set without high-risk segments
    Tip:
    Always keep backups of earlier training sets. They're essential for comparison, rollback, and identifying when poisoning may have started.
  2. Retrain the model from a clean baseline
    Retraining is often necessary to fully remove poisoning effects.
    • Use verified data with known provenance
    • Avoid resuming training from a previously compromised checkpoint
    • Validate intermediate outputs as training progresses
    Tip:
    If full retraining isn't feasible, consider targeted fine-tuning using clean, domain-specific data. A minimal retraining sketch follows this list.
  3. Limit downstream model exposure
    While recovery is in progress, limit high-risk uses of the affected model.
    • Pause automated decision-making where possible
    • Add safeguards or human review to model outputs
    • Temporarily disable exposure of affected RAG corpora or fine-tuning endpoints
    • Notify internal stakeholders of limitations
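For the retraining step above, here's a minimal sketch of one way to anchor a retrain to verified data (the file path, recorded hash, baseline accuracy, and model are hypothetical). It checks the dataset snapshot against the hash recorded at verification time, trains from scratch, and compares the rebuilt model against a known-clean holdout baseline.

```python
# Minimal, illustrative clean-baseline retraining sketch.
# Assumptions: a CSV snapshot whose SHA-256 was recorded when the data was verified,
# a scikit-learn model, and a known-clean holdout set with a baseline accuracy.
import hashlib
import pandas as pd
from sklearn.linear_model import LogisticRegression

VERIFIED_SNAPSHOT = "data/train_v3_clean.csv"      # hypothetical path
EXPECTED_SHA256 = "replace-with-recorded-digest"   # recorded at verification time
BASELINE_HOLDOUT_ACCURACY = 0.92                   # hypothetical baseline

def sha256_of(path: str) -> str:
    """Hash the snapshot so training runs on the exact bytes that were verified."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def retrain_from_clean_baseline(holdout_X, holdout_y):
    if sha256_of(VERIFIED_SNAPSHOT) != EXPECTED_SHA256:
        raise RuntimeError("Snapshot does not match the verified hash; stop and investigate.")

    df = pd.read_csv(VERIFIED_SNAPSHOT)
    X, y = df.drop(columns=["label"]), df["label"]

    # Train from scratch rather than resuming from a possibly compromised checkpoint.
    model = LogisticRegression(max_iter=2000).fit(X, y)

    holdout_accuracy = model.score(holdout_X, holdout_y)
    if holdout_accuracy < BASELINE_HOLDOUT_ACCURACY - 0.02:
        raise RuntimeError(f"Holdout accuracy {holdout_accuracy:.3f} is below the expected baseline.")
    return model
```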

Prevention

The most effective defense is to stop poisoned data from entering your pipeline.

Here's how to strengthen your controls:

  1. Secure your data collection process
    Treat training data like code or infrastructure.
    • Use authenticated APIs and encrypted channels
    • Restrict who can submit or modify training inputs
    • Use access controls and review policies for updating GenAI content sources, such as embeddings or retrieval indexes
    • Log and audit all data additions or edits
    Tip:
    Avoid scraping uncontrolled sources unless content is filtered and validated.
  2. Establish strict data validation policies
    Use automated checks to enforce standards.
    • Require consistent formatting and labeling
    • Check for duplicate entries and metadata anomalies
    • Flag unusually influential samples
    Tip:
    Build validation into the pipeline. Not just at ingestion.
  3. Segment and sandbox new data
    Keep incoming data separate from production datasets.
    • Test new inputs in isolated environments
    • In RAG-based systems, validate that added documents do not contain malicious prompt structures or redirect behavior
    • Evaluate their impact on small-scale models first
    • Monitor for unexpected shifts before full integration
    Tip:
    Shadow-train models using experimental data to detect emerging threats early.
  4. Use differential analysis during retraining
    Compare outputs before and after data updates.
    • Look for spikes in model error rates
    • Watch for behavioral changes in specific prediction categories
    • For GenAI, test known safe prompts before and after updates to verify consistent and secure output generation
    • Run adversarial probes to test resilience
    Tip:
    Maintain version control for both data and models. It helps track changes and simplifies rollback. A minimal differential-analysis sketch follows this list.
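Here's a minimal sketch of the differential analysis step (the datasets, classifier, and alert threshold are illustrative assumptions). It trains on the previous and the updated data, then compares per-class error rates on a fixed evaluation set and flags any class whose error jumps past a set margin.

```python
# Minimal, illustrative differential analysis between two training-data versions
# (synthetic data, scikit-learn). Flags per-class error-rate jumps after a data update.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def per_class_error(model, X_eval, y_eval):
    """Error rate for each class on a fixed evaluation set."""
    preds = model.predict(X_eval)
    return {c: float(np.mean(preds[y_eval == c] != c)) for c in np.unique(y_eval)}

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=4000, n_features=20, n_classes=3,
                           n_informative=6, random_state=0)
X_old, X_eval, y_old, y_eval = train_test_split(X, y, random_state=0)

# Simulated data update: a new training version arrives, but class-2 labels were tampered with.
X_new, y_new = X_old.copy(), y_old.copy()
tampered = rng.choice(np.flatnonzero(y_new == 2), 200, replace=False)
y_new[tampered] = 0

old_model = LogisticRegression(max_iter=2000).fit(X_old, y_old)
new_model = LogisticRegression(max_iter=2000).fit(X_new, y_new)

THRESHOLD = 0.05                                  # illustrative alert margin
old_err = per_class_error(old_model, X_eval, y_eval)
new_err = per_class_error(new_model, X_eval, y_eval)
for c in old_err:
    jump = new_err[c] - old_err[c]
    flag = "ALERT" if jump > THRESHOLD else "ok"
    print(f"class {c}: error {old_err[c]:.3f} -> {new_err[c]:.3f} ({flag})")
```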


 

A brief history of data poisoning

Figure: The history of data poisoning. Early 2010s: first demonstrations. 2010s–2020: technique refinement and growing stealth. 2023–2024: generative AI raises the stakes. 2024: indirect RAG poisoning demonstrated. Today: poisoning recognized as a major AI threat.

Data poisoning was first explored in the early 2010s, when researchers demonstrated that machine learning models could be manipulated by tampering with their training data. Initial studies focused on simple label flipping and input manipulation, primarily in image classification tasks.

Over the next decade, the field matured. Researchers began developing more stealthy and targeted attacks, including backdoors and clean-label techniques that were harder to detect during evaluation.

But the concern became more urgent with the rise of generative AI.

As LLMs began relying on large-scale public data and fine-tuning pipelines, researchers warned that poisoned inputs could slip in through scraped content or external document corpora. In 2024, academic teams demonstrated indirect poisoning in RAG systems, showing how subtle injections could distort GenAI responses.

Today, data poisoning is recognized as one of the most challenging threats to the integrity of AI models. Especially in production systems that retrain continuously or pull in unvetted external content.


 

Data poisoning FAQs

How does a data poisoning attack work in practice?
An attacker inserts mislabeled or manipulated examples into a training dataset, causing the model to learn flawed logic. For instance, in a clean-label backdoor attack, an image classifier is trained to misclassify a specific object only when a subtle trigger is present—without degrading performance on other inputs.

How is data poisoning detected?
Detection relies on behavioral monitoring, anomaly detection, and training data audits. Teams look for degraded accuracy, unusual model outputs, inconsistent RAG completions, or suspicious data patterns. Comparing outputs across model versions and inspecting holdout validation sets also helps surface poisoning.

How is data poisoning different from adversarial attacks?
Data poisoning happens during training, corrupting how the model learns. Adversarial attacks occur during inference, manipulating inputs at runtime to cause misclassification or misbehavior. Both are intentional, but they exploit different stages of the AI lifecycle.

Can large language models be poisoned?
Yes. LLMs can be poisoned during pretraining, fine-tuning, or via external data sources in RAG pipelines. Attacks may influence summarization, bias outputs, or embed backdoors—often without detection.

What is the difference between targeted and untargeted poisoning attacks?
Targeted attacks aim to produce specific, attacker-controlled failures—such as misclassifying one input type. Untargeted attacks degrade overall model performance or reliability without focusing on a specific outcome.

Is data poisoning the same as prompt injection?
No. Data poisoning corrupts training data to embed long-term model flaws. Prompt injection manipulates inputs during inference to hijack behavior in real time. The former affects how the model learns; the latter exploits how it responds.

How do organizations protect RAG pipelines from poisoned content?
They validate all documents before inclusion, scan for hidden prompts or malformed content, segment new sources, and shadow-train small models to observe behavioral impact. Ongoing monitoring and access controls further reduce risk.