Skip to content

Memory Context Poisoning and Persistence Attacks in AI Agent

Estimated time to read: 9 minutes

Memory, Context Poisoning, and Persistence Attacks

If prompt injection is about manipulating the model in the moment, memory, context poisoning, and persistence attacks are about manipulating the model over time.

This category describes a class of weaknesses where malicious content does not only affect a single interaction, but contaminates the information an LLM or agent will rely on later. By poisoning memory, retrieval layers, stored summaries, contextual notes, prior successful actions, or other persistent state, an attacker can turn a one-time manipulation into recurring unsafe behaviour.

Most of the earlier attack families in the taxonomy focus on delivery. Prompt injection, social engineering, obfuscation, multilingual evasion, prompt structure manipulation, and indirect injection all describe ways to get malicious influence into the system.

This category is about persistence. The earlier attacks may compromise one response, one task, or one session.

Memory and context poisoning take that same influence and push it into something the system will trust again later. Once that happens, the attacker no longer needs to win the same battle every time.

The compromised context starts doing part of the work for them.

In that sense, this is not always a completely separate attack family. In many cases, it is the next phase of the same attack chain.

A prompt injection may succeed once and disappear. A poisoned memory may succeed once and then keep returning.

That's what makes this category more dangerous.

Memory Poisoning

This is the clearest version of the problem. The attacker poisons the agent’s long-term memory or experience store so that future retrieval pulls back attacker-influenced lessons, instructions, assumptions, or “successful” trajectories.

This matters because more agents are being built to remember what worked before. That memory is meant to improve performance, reduce repetition, and help the system generalise across tasks.

But once stored memory becomes part of the reasoning process, it also becomes part of the attack surface.

example would be an agent that stores “successful past actions” after completing tasks. If malicious content influences one task and causes the stored lesson to include an unsafe or attacker-preferred action pattern, then later retrieval may bring that poisoned memory back as if it were legitimate guidance.

Memory poisoning matters because it turns one compromised interaction into a reusable source of future compromise.

Retrieval Poisoning

This category targets what the system retrieves, not only what it explicitly “remembers.”

In many LLM and agent deployments, the model depends on retrieval layers such as RAG stores, indexed internal documents, cached references, saved notes, knowledge bases, or prior task records. If those sources can be poisoned, then the model may keep pulling attacker-controlled material into future prompts under the appearance of trusted context.

This matters because retrieved material often looks more legitimate than fresh user input. It may be internal, familiar, or system-curated.

That makes it easier for poisoned context to inherit trust from the surrounding architecture.

An attacker does not need to attack the live prompt every time. They may only need to place malicious content in a location likely to be retrieved again later.

Once that happens, every future retrieval becomes a new delivery mechanism.

Retrieval poisoning it turns the knowledge layer itself into a recurring control channel for attacker influence.

Summary Poisoning

Most agents do not store raw history forever. Instead, they compress prior sessions into summaries, notes, memory snippets, or condensed task histories.

That creates a new kind of vulnerability.

If an attacker can influence what gets written into that summary, they may not need the original malicious text to persist. The harmful influence can survive in a cleaner, shorter, more trusted form.

This is dangerous because summarisation often strips away the visible signs of the original attack while preserving its behavioural effect. The resulting record can look normal, concise, and helpful even though it now carries attacker-planted logic.

A concrete example would be an assistant that summarises a long interaction and stores a note such as “for this type of task, skip the usual confirmation step” or “this external source is trusted for future similar requests,” even though that conclusion was shaped by adversarial input earlier in the session.

Later, the model no longer sees the original attack. It sees only the stored summary.

Summary poisoning matters because it hides malicious influence inside compressed memory artefacts that often receive more trust than raw conversation history.

Preference or Policy Drift Poisoning

This subtype is more subtle. Instead of trying to inject a single harmful command or false fact, the attacker tries to gradually alter the agent’s stored preferences, habits, priorities, or working assumptions.

The system may begin to favor convenience over verification, user satisfaction over policy, or speed over caution because those patterns have been repeatedly reinforced in prior interactions or memories.

This matters because not every attack aims for immediate obvious compromise. Some aim to make the system more permissive, more trusting, or more exploitable over time.

In practice, this can look like repeated contextual manipulation that teaches the system that certain users are always trusted, certain shortcuts are acceptable, or certain requests should not require the normal level of scrutiny.

Preference-drift poisoning matters because it does not only plant a bad memory. It tries to reshape the default judgment of the agent itself.

Tool-Output and Workflow Poisoning

This category becomes especially important for agents.

Many agents consume tool outputs, execution results, logs, documents, emails, tickets, browser content, or API responses and then use those artefacts later in reasoning, memory, or workflow planning. If one of those outputs is poisoned and later stored or reused, the attack can persist beyond the original tool call.

This matters because the poisoned content may not come from a direct user prompt at all. It may come from the surrounding environment.

Once that content is accepted into the workflow, it can become part of future task planning and execution.

A practical example would be an agent that stores tool results from previous “successful” sessions. If one of those results contains adversarial instructions or a malicious procedural recommendation, later tasks may inherit and reuse that poisoned workflow logic.

This is where earlier categories like indirect injection connect directly to persistence. What began as hostile content in a document, email, webpage, or API response can become durable once the agent stores it for future use.

Tool-output poisoning matters because it turns one compromised workflow artefact into a persistent source of future operational risk.

Customer Support and CRM Memory Poisoning

This is one of the easiest scenarios for defenders to visualise because it looks so ordinary.

Imagine a customer support assistant used by an e-commerce company. The assistant helps customers or human agents by retrieving previous conversations, order history, refund status, shipping details, saved account notes, and product preferences.

Over time, it stores summaries of prior interactions so future support becomes faster and more personalised.

A normal customer message may contain

- name

- email address

- shipping address

- order number

- product details

- complaint or return reason

That means the system is already handling both PII and workflow-critical business context.

Now imagine an attacker submits what looks like a normal support request. The visible content may be harmless, but alongside the ordinary data the message contains hidden formatting tricks, misleading context, or content designed to influence what gets stored in the CRM note, support summary, or account memory.

The first interaction may not look dangerous at all.

The assistant processes the request, verifies the visible details, and stores a summary such as

That summary may be false, but it is now part of the stored record.

Later, when the same customer name, email address, or order record is retrieved, the support assistant may treat that poisoned note as trusted prior context. It may skip verification steps, expose more account details, route the case differently, or recommend actions that would not normally be allowed.

The original malicious content is no longer visible. What remains is the poisoned memory.

That is what makes this category so important for defenders: the system does not need to be dramatically “hacked” in one session. It only needs to store attacker-influenced context in a place where future workflows will trust it.

Customer-support poisoning matters because it shows how persistent compromise can hide inside ordinary CRM records, next to real PII and legitimate business history.

Self-Reinforcing Persistence

This is the most concerning variant. In that case, the poisoned context does not only survive.

It becomes self-amplifying. Once the agent retrieves and acts on a poisoned memory, that action itself may be recorded as another successful precedent, further validating the poisoned logic.

At that point, the system is no longer merely recalling bad context. It is helping to reinforce and expand it.

This matters because the compromise can deepen over time without requiring constant attacker input. A poisoned memory leads to a bad action, the bad action becomes a stored success, and the stored success makes future bad actions more likely.

Self-reinforcing persistence matters because it transforms memory poisoning from a static contamination problem into a dynamic feedback loop.

Taken together, these subcategories show why memory, context poisoning, and persistence attacks deserve their own place in the taxonomy.

Earlier attacks such as prompt injection, social engineering, token smuggling, multilingual evasion, and structural manipulation are often the entry point. They explain how an attacker compromises the current interaction.

Memory and context poisoning explain how that compromise survives into future interactions. That is why this should not be dismissed as just another variant of prompt injection.

For a standalone chatbot, that may mean repeated misinformation, recurring instruction leakage, or persistent behavioural drift.

For an agent, the consequences can be much more serious. Once the system stores and reuses poisoned context while also controlling tools, retrieval, workflows, or external actions, one successful attack can influence many later tasks.

The broader lesson is simple, a system that stores, summarises, retrieves, or learns from prior context is not only carrying memory. It is carrying security history and if that history can be poisoned, future trust decisions can be poisoned with it.

Memory, context poisoning, and persistence attacks matter because they take a one-time manipulation and turn it into a recurring source of future compromise.