Memory Context Poisoning and Persistence Attacks in AI Agent¶

Estimated time to read: 3 minutes

Memory, Context Poisoning, and Persistence Attacks¶

If prompt injection is about manipulating the model in the moment, memory, context poisoning, and persistence attacks are about manipulating the model over time.

This category describes a class of weaknesses where malicious content does not only affect a single interaction, but contaminates the information an LLM or agent will rely on later. By poisoning memory, retrieval layers, stored summaries, contextual notes, prior successful actions, or other persistent state, an attacker can turn a one-time manipulation into recurring unsafe behaviour.

Most of the earlier attack families in the taxonomy focus on delivery. Prompt injection, social engineering, obfuscation, multilingual evasion, prompt structure manipulation, and indirect injection all describe ways to get malicious influence into the system.

This category is about persistence. The earlier attacks may compromise one response, one task, or one session.

Memory and context poisoning take that same influence and push it into something the system will trust again later. Once that happens, the attacker no longer needs to win the same battle every time.

The compromised context starts doing part of the work for them.

In that sense, this is not always a completely separate attack family. In many cases, it is the next phase of the same attack chain.

A prompt injection may succeed once and disappear. A poisoned memory may succeed once and then keep returning.

That's what makes this category more dangerous.

Memory Poisoning¶

This is the clearest version of the problem. The attacker poisons the agent’s long-term memory or experience store so that future retrieval pulls back attacker-influenced lessons, instructions, assumptions, or “successful” trajectories.

This matters because more agents are being built to remember what worked before. That memory is meant to improve performance, reduce repetition, and help the system generalise across tasks.

But once stored memory becomes part of the reasoning process, it also becomes part of the attack surface.

example would be an agent that stores “successful past actions” after completing tasks. If malicious content influences one task and causes the stored lesson to include an unsafe or attacker-preferred action pattern, then later retrieval may bring that poisoned memory back as if it were legitimate guidance.

Memory poisoning matters because it turns one compromised interaction into a reusable source of future compromise.

Retrieval Poisoning¶

This category targets what the system retrieves, not only what it explicitly "remembers."

In many LLM and agent deployments, the model depends on retrieval layers such as RAG stores, indexed internal documents, cached references, saved notes, knowledge bases, or prior task records. If those sources can be poisoned, then the model may keep pulling attacker-controlled material into future prompts under the appearance of trusted context.

This matters because retrieved material often looks more legitimate than fresh user input. It may be internal, familiar, or system-curated.

That makes it easier for poisoned context to inherit trust from the surrounding architecture.

An attacker does not need to attack the live prompt every time. They may only need to place malicious content in a location likely to be retrieved again later.

Once that happens, every future retrieval becomes a new delivery mechanism.

Retrieval poisoning it turns the knowledge layer itself into a recurring control channel for attacker influence.