blog
February 26, 2026
Threat modeling for LLM apps: 10 attack paths teams should address first
Large language model applications add a security problem that standard web apps do not have in the same form: untrusted text can influence execution flow.
In a conventional application, hostile input usually targets rendering, queries, or business logic. In an LLM application, hostile input can also affect what context is assembled, what content is retrieved, what tools the system attempts to use, and what data is returned, stored, or acted on.
That is why the main security boundary is usually not the foundation model alone. It is the LLM application runtime around the model: prompt assembly, retrieval, memory, tool routing, action checks, and output handling.
The 10 attack paths below are the ones most teams should assess first.
This is the most familiar LLM-specific risk, and still one of the most common.
A user submits text that is supposed to be treated as data, but the runtime passes it into the model in a way that lets it act like an instruction. The result is not only “the model says something odd.”, the real risk is that user-controlled text changes system behavior.
That behavior may include:
The core problem is weak separation between data and instructions.
The most reliable control is architectural: user input should remain data. If the system needs fields for a tool call, it should extract them into validated structured values. High-impact actions should be allowed only after explicit policy checks. If free-form text can directly shape tool arguments, the boundary is already weak.
User input is not the only place where hostile instructions can enter the system. Retrieved content can do the same thing.
A malicious document, ticket, email, or web page can be ingested, indexed, and later retrieved as context. If the runtime treats that content as trusted guidance instead of untrusted source material, the model may follow hidden instructions embedded inside it.
This matters most in systems that:
The security issue is simple: retrieved content is evidence, not authority. It may support an answer, but it should not directly control tool use or authorization decisions.
Retrieved text should be clearly delimited, filtered where needed, and kept separate from runtime instructions. Even if the model cites retrieved content as the reason for an action, the action still needs separate approval logic.
LLM applications often sit close to sensitive data like internal documents, CRM records, support transcripts, contracts, prior chat history, hidden system instructions, and execution traces.
The main risk is not that the model invents something false but that the system discloses data the requesting user is not allowed to see.
This usually happens when access control is applied too late. A common failure pattern looks like this:
By that point, the boundary has already failed. If restricted data entered the context window, the system has already exposed it to the runtime.
The correct order is the reverse. Authorization must be enforced before retrieval results are passed to the model. Tenant boundaries, role checks, row-level filters, and source restrictions belong inside the retrieval path, not after it.
Logs, traces, and memory need the same treatment. If they contain raw prompts, retrieved text, tool parameters, or user data, they can become a second disclosure channel.
An LLM application is rarely just “one model API.”, usually it depends on multiple moving parts like:
Any of these can become a risk source if they are compromised, misconfigured, or changed without control.
The concern is far from abstract. In an AI runtime, a dependency can influence what data leaves your environment, how it is indexed, how actions are routed, or how output is parsed and reused. A small change in a supporting library can alter security-critical behavior.
That means AI components need the same release discipline as any production dependency - version pinning, review before upgrades, regression tests, restricted outbound access, and visibility into external calls. If a third-party component can see customer data or affect agent behavior, it belongs in the core risk review path.
LLM systems depend on data pipelines such as indexed documents, internal knowledge bases, feedback loops, fine-tuning sets, and user-contributed content.
Any of these can be manipulated of course.
A poisoning attack does not need to break the model, only to shift the system’s behavior enough to produce the wrong answer, retrieve the wrong content, or trust the wrong source. In practice, that often means:
The main mistake here is treating ingestion as a neutral technical step. It is not. Ingestion is a trust decision.
Content provenance, trust tiers, review queues, weighting rules, and moderation all matter. If trusted internal documents and unknown external material are indexed into one flat retrieval space with equal influence, the system is easy to steer. A reliable LLM application depends on a controlled data path, not only on model quality.
Model output is often treated as “safe” because it came from the system, but that assumption is wrong.
LLM output is still untrusted data. If the application renders it as HTML, inserts it into a query, passes it into a template, executes it as code, or forwards it into another automation without validation, it has created a standard injection path through a new source.
This is not limited to public-facing applications, on the contraty, internal copilots are often more dangerous because they sit close to admin tools, databases, and workflow systems.
The correct rule is to treat model output with the same suspicion as user input. That means:
If an LLM response can be interpreted by another system, then output handling is a security control, not a formatting step.
The most serious incidents usually come from what the application allows the LLM to do, not from what the model says.
If the runtime can create records, update a CRM, send messages, change tickets, write into an ERP, or trigger workflows, then the key question is no longer whether the model can be influenced. The key question is what authority the application grants after that influence occurs.
This is where many “agent” implementations are too permissive. They use broad credentials, vague tool descriptions, write access by default, and weak approval steps. That turns a text-level attack into an operational incident.
The right pattern is strict action control:
The LLM may propose an action. It should not be the component that decides whether the action is allowed.
System prompts, hidden instructions, tool schemas, routing rules, and fallback logic are often treated as harmless internal details. They are not.
If an attacker can extract this internal logic, they gain a map of how the runtime works. They learn:
That makes later attacks easier.
The deeper issue is not prompt secrecy by itself but relying on prompt secrecy as a security control. If critical security decisions exist only as hidden text instructions, the design is fragile.
Security-sensitive checks should live in code and policy layers that remain in force even if internal prompt text becomes known. A useful test is this: if the full system prompt were exposed, would the same action boundaries still hold? If not, too much trust has been placed in hidden prompt text.
In enterprise LLM systems, one of the most dangerous failures is poor isolation in the retrieval layer.
The system retrieves “relevant” content, but relevance is not the same as authorization. A similarity search may surface documents from the wrong tenant, the wrong department, or the wrong access level if retrieval is driven only by ranking logic.
This is easy to miss because retrieval often feels like infrastructure rather than security. In practice, it is both. If authorization is not enforced inside the retrieval process, the model may receive content it should never have seen. Even if the final answer looks harmless, the boundary has already failed because restricted content entered runtime context.
The fix is exact here:
A system can have excellent answer quality and still fail basic data isolation if retrieval is not constrained correctly.
LLM applications can be stressed in ways that are both technical and financial.
An attacker does not need a classic denial-of-service pattern if the runtime allows:
In that case, the attacker can drive the system into expensive execution paths until latency rises, queues back up, provider limits hit, or costs grow beyond control.
This is why every LLM runtime needs a clearly defined work boundary. The system should have explicit limits for:
A basic design question should always have a clear answer: what is the maximum amount of work one request is allowed to trigger? If the team cannot answer that precisely, the cost boundary is not under control.
A practical way to analyze risks is to map the system as a data flow:
User input and external content → orchestration layer → retrieval layer → model inference → tool calls → output handling → logs and memory
Then ask one question at each stage: what can an attacker influence here, and what can that influence trigger next?
Most teams do not need a massive security program to reduce risk, just the right order of operations.
Start with action controls, because that is where business impact is highest. If the runtime can write, send, approve, or change records, action authorization must be stronger than the model’s output.
Next, secure retrieval, because retrieval determines what data reaches the runtime at all. If the wrong content enters context, both confidentiality and output quality are at risk.
Then secure output handling, because model output often crosses into templates, interfaces, automations, and storage systems where standard injection risks return.
After that, secure telemetry: logs, traces, and memory. These often contain more raw data than the user-facing response.
Finally, make security testing repeatable. Prompt injection, retrieval leakage, unauthorized action attempts, and output misuse should be part of regression testing, not occasional manual checks.
If your team is building, deploying, or reviewing an LLM application and wants an external technical perspective, schedule a free consultation with Blocshop.
We will review your current runtime design, identify the highest-risk areas in retrieval, tool access, output handling, and action control, and help you define the most important fixes first.
SCHEDULE A FREE CONSULTATION
Learn more from our insights
The journey to your
custom software
solution starts here.
Services
blog
February 26, 2026
Threat modeling for LLM apps: 10 attack paths teams should address first
Large language model applications add a security problem that standard web apps do not have in the same form: untrusted text can influence execution flow.
In a conventional application, hostile input usually targets rendering, queries, or business logic. In an LLM application, hostile input can also affect what context is assembled, what content is retrieved, what tools the system attempts to use, and what data is returned, stored, or acted on.
That is why the main security boundary is usually not the foundation model alone. It is the LLM application runtime around the model: prompt assembly, retrieval, memory, tool routing, action checks, and output handling.
The 10 attack paths below are the ones most teams should assess first.
This is the most familiar LLM-specific risk, and still one of the most common.
A user submits text that is supposed to be treated as data, but the runtime passes it into the model in a way that lets it act like an instruction. The result is not only “the model says something odd.”, the real risk is that user-controlled text changes system behavior.
That behavior may include:
The core problem is weak separation between data and instructions.
The most reliable control is architectural: user input should remain data. If the system needs fields for a tool call, it should extract them into validated structured values. High-impact actions should be allowed only after explicit policy checks. If free-form text can directly shape tool arguments, the boundary is already weak.
User input is not the only place where hostile instructions can enter the system. Retrieved content can do the same thing.
A malicious document, ticket, email, or web page can be ingested, indexed, and later retrieved as context. If the runtime treats that content as trusted guidance instead of untrusted source material, the model may follow hidden instructions embedded inside it.
This matters most in systems that:
The security issue is simple: retrieved content is evidence, not authority. It may support an answer, but it should not directly control tool use or authorization decisions.
Retrieved text should be clearly delimited, filtered where needed, and kept separate from runtime instructions. Even if the model cites retrieved content as the reason for an action, the action still needs separate approval logic.
LLM applications often sit close to sensitive data like internal documents, CRM records, support transcripts, contracts, prior chat history, hidden system instructions, and execution traces.
The main risk is not that the model invents something false but that the system discloses data the requesting user is not allowed to see.
This usually happens when access control is applied too late. A common failure pattern looks like this:
By that point, the boundary has already failed. If restricted data entered the context window, the system has already exposed it to the runtime.
The correct order is the reverse. Authorization must be enforced before retrieval results are passed to the model. Tenant boundaries, role checks, row-level filters, and source restrictions belong inside the retrieval path, not after it.
Logs, traces, and memory need the same treatment. If they contain raw prompts, retrieved text, tool parameters, or user data, they can become a second disclosure channel.
An LLM application is rarely just “one model API.”, usually it depends on multiple moving parts like:
Any of these can become a risk source if they are compromised, misconfigured, or changed without control.
The concern is far from abstract. In an AI runtime, a dependency can influence what data leaves your environment, how it is indexed, how actions are routed, or how output is parsed and reused. A small change in a supporting library can alter security-critical behavior.
That means AI components need the same release discipline as any production dependency - version pinning, review before upgrades, regression tests, restricted outbound access, and visibility into external calls. If a third-party component can see customer data or affect agent behavior, it belongs in the core risk review path.
LLM systems depend on data pipelines such as indexed documents, internal knowledge bases, feedback loops, fine-tuning sets, and user-contributed content.
Any of these can be manipulated of course.
A poisoning attack does not need to break the model, only to shift the system’s behavior enough to produce the wrong answer, retrieve the wrong content, or trust the wrong source. In practice, that often means:
The main mistake here is treating ingestion as a neutral technical step. It is not. Ingestion is a trust decision.
Content provenance, trust tiers, review queues, weighting rules, and moderation all matter. If trusted internal documents and unknown external material are indexed into one flat retrieval space with equal influence, the system is easy to steer. A reliable LLM application depends on a controlled data path, not only on model quality.
Model output is often treated as “safe” because it came from the system, but that assumption is wrong.
LLM output is still untrusted data. If the application renders it as HTML, inserts it into a query, passes it into a template, executes it as code, or forwards it into another automation without validation, it has created a standard injection path through a new source.
This is not limited to public-facing applications, on the contraty, internal copilots are often more dangerous because they sit close to admin tools, databases, and workflow systems.
The correct rule is to treat model output with the same suspicion as user input. That means:
If an LLM response can be interpreted by another system, then output handling is a security control, not a formatting step.
The most serious incidents usually come from what the application allows the LLM to do, not from what the model says.
If the runtime can create records, update a CRM, send messages, change tickets, write into an ERP, or trigger workflows, then the key question is no longer whether the model can be influenced. The key question is what authority the application grants after that influence occurs.
This is where many “agent” implementations are too permissive. They use broad credentials, vague tool descriptions, write access by default, and weak approval steps. That turns a text-level attack into an operational incident.
The right pattern is strict action control:
The LLM may propose an action. It should not be the component that decides whether the action is allowed.
System prompts, hidden instructions, tool schemas, routing rules, and fallback logic are often treated as harmless internal details. They are not.
If an attacker can extract this internal logic, they gain a map of how the runtime works. They learn:
That makes later attacks easier.
The deeper issue is not prompt secrecy by itself but relying on prompt secrecy as a security control. If critical security decisions exist only as hidden text instructions, the design is fragile.
Security-sensitive checks should live in code and policy layers that remain in force even if internal prompt text becomes known. A useful test is this: if the full system prompt were exposed, would the same action boundaries still hold? If not, too much trust has been placed in hidden prompt text.
In enterprise LLM systems, one of the most dangerous failures is poor isolation in the retrieval layer.
The system retrieves “relevant” content, but relevance is not the same as authorization. A similarity search may surface documents from the wrong tenant, the wrong department, or the wrong access level if retrieval is driven only by ranking logic.
This is easy to miss because retrieval often feels like infrastructure rather than security. In practice, it is both. If authorization is not enforced inside the retrieval process, the model may receive content it should never have seen. Even if the final answer looks harmless, the boundary has already failed because restricted content entered runtime context.
The fix is exact here:
A system can have excellent answer quality and still fail basic data isolation if retrieval is not constrained correctly.
LLM applications can be stressed in ways that are both technical and financial.
An attacker does not need a classic denial-of-service pattern if the runtime allows:
In that case, the attacker can drive the system into expensive execution paths until latency rises, queues back up, provider limits hit, or costs grow beyond control.
This is why every LLM runtime needs a clearly defined work boundary. The system should have explicit limits for:
A basic design question should always have a clear answer: what is the maximum amount of work one request is allowed to trigger? If the team cannot answer that precisely, the cost boundary is not under control.
A practical way to analyze risks is to map the system as a data flow:
User input and external content → orchestration layer → retrieval layer → model inference → tool calls → output handling → logs and memory
Then ask one question at each stage: what can an attacker influence here, and what can that influence trigger next?
Most teams do not need a massive security program to reduce risk, just the right order of operations.
Start with action controls, because that is where business impact is highest. If the runtime can write, send, approve, or change records, action authorization must be stronger than the model’s output.
Next, secure retrieval, because retrieval determines what data reaches the runtime at all. If the wrong content enters context, both confidentiality and output quality are at risk.
Then secure output handling, because model output often crosses into templates, interfaces, automations, and storage systems where standard injection risks return.
After that, secure telemetry: logs, traces, and memory. These often contain more raw data than the user-facing response.
Finally, make security testing repeatable. Prompt injection, retrieval leakage, unauthorized action attempts, and output misuse should be part of regression testing, not occasional manual checks.
If your team is building, deploying, or reviewing an LLM application and wants an external technical perspective, schedule a free consultation with Blocshop.
We will review your current runtime design, identify the highest-risk areas in retrieval, tool access, output handling, and action control, and help you define the most important fixes first.
SCHEDULE A FREE CONSULTATION
Learn more from our insights
Let's talk!
The journey to your
custom software
solution starts here.
Services
Head Office
Revoluční 1
110 00, Prague Czech Republic
hello@blocshop.io
blog
February 26, 2026
Threat modeling for LLM apps: 10 attack paths teams should address first
Large language model applications add a security problem that standard web apps do not have in the same form: untrusted text can influence execution flow.
In a conventional application, hostile input usually targets rendering, queries, or business logic. In an LLM application, hostile input can also affect what context is assembled, what content is retrieved, what tools the system attempts to use, and what data is returned, stored, or acted on.
That is why the main security boundary is usually not the foundation model alone. It is the LLM application runtime around the model: prompt assembly, retrieval, memory, tool routing, action checks, and output handling.
The 10 attack paths below are the ones most teams should assess first.
This is the most familiar LLM-specific risk, and still one of the most common.
A user submits text that is supposed to be treated as data, but the runtime passes it into the model in a way that lets it act like an instruction. The result is not only “the model says something odd.”, the real risk is that user-controlled text changes system behavior.
That behavior may include:
The core problem is weak separation between data and instructions.
The most reliable control is architectural: user input should remain data. If the system needs fields for a tool call, it should extract them into validated structured values. High-impact actions should be allowed only after explicit policy checks. If free-form text can directly shape tool arguments, the boundary is already weak.
User input is not the only place where hostile instructions can enter the system. Retrieved content can do the same thing.
A malicious document, ticket, email, or web page can be ingested, indexed, and later retrieved as context. If the runtime treats that content as trusted guidance instead of untrusted source material, the model may follow hidden instructions embedded inside it.
This matters most in systems that:
The security issue is simple: retrieved content is evidence, not authority. It may support an answer, but it should not directly control tool use or authorization decisions.
Retrieved text should be clearly delimited, filtered where needed, and kept separate from runtime instructions. Even if the model cites retrieved content as the reason for an action, the action still needs separate approval logic.
LLM applications often sit close to sensitive data like internal documents, CRM records, support transcripts, contracts, prior chat history, hidden system instructions, and execution traces.
The main risk is not that the model invents something false but that the system discloses data the requesting user is not allowed to see.
This usually happens when access control is applied too late. A common failure pattern looks like this:
By that point, the boundary has already failed. If restricted data entered the context window, the system has already exposed it to the runtime.
The correct order is the reverse. Authorization must be enforced before retrieval results are passed to the model. Tenant boundaries, role checks, row-level filters, and source restrictions belong inside the retrieval path, not after it.
Logs, traces, and memory need the same treatment. If they contain raw prompts, retrieved text, tool parameters, or user data, they can become a second disclosure channel.
An LLM application is rarely just “one model API.”, usually it depends on multiple moving parts like:
Any of these can become a risk source if they are compromised, misconfigured, or changed without control.
The concern is far from abstract. In an AI runtime, a dependency can influence what data leaves your environment, how it is indexed, how actions are routed, or how output is parsed and reused. A small change in a supporting library can alter security-critical behavior.
That means AI components need the same release discipline as any production dependency - version pinning, review before upgrades, regression tests, restricted outbound access, and visibility into external calls. If a third-party component can see customer data or affect agent behavior, it belongs in the core risk review path.
LLM systems depend on data pipelines such as indexed documents, internal knowledge bases, feedback loops, fine-tuning sets, and user-contributed content.
Any of these can be manipulated of course.
A poisoning attack does not need to break the model, only to shift the system’s behavior enough to produce the wrong answer, retrieve the wrong content, or trust the wrong source. In practice, that often means:
The main mistake here is treating ingestion as a neutral technical step. It is not. Ingestion is a trust decision.
Content provenance, trust tiers, review queues, weighting rules, and moderation all matter. If trusted internal documents and unknown external material are indexed into one flat retrieval space with equal influence, the system is easy to steer. A reliable LLM application depends on a controlled data path, not only on model quality.
Model output is often treated as “safe” because it came from the system, but that assumption is wrong.
LLM output is still untrusted data. If the application renders it as HTML, inserts it into a query, passes it into a template, executes it as code, or forwards it into another automation without validation, it has created a standard injection path through a new source.
This is not limited to public-facing applications, on the contraty, internal copilots are often more dangerous because they sit close to admin tools, databases, and workflow systems.
The correct rule is to treat model output with the same suspicion as user input. That means:
If an LLM response can be interpreted by another system, then output handling is a security control, not a formatting step.
The most serious incidents usually come from what the application allows the LLM to do, not from what the model says.
If the runtime can create records, update a CRM, send messages, change tickets, write into an ERP, or trigger workflows, then the key question is no longer whether the model can be influenced. The key question is what authority the application grants after that influence occurs.
This is where many “agent” implementations are too permissive. They use broad credentials, vague tool descriptions, write access by default, and weak approval steps. That turns a text-level attack into an operational incident.
The right pattern is strict action control:
The LLM may propose an action. It should not be the component that decides whether the action is allowed.
System prompts, hidden instructions, tool schemas, routing rules, and fallback logic are often treated as harmless internal details. They are not.
If an attacker can extract this internal logic, they gain a map of how the runtime works. They learn:
That makes later attacks easier.
The deeper issue is not prompt secrecy by itself but relying on prompt secrecy as a security control. If critical security decisions exist only as hidden text instructions, the design is fragile.
Security-sensitive checks should live in code and policy layers that remain in force even if internal prompt text becomes known. A useful test is this: if the full system prompt were exposed, would the same action boundaries still hold? If not, too much trust has been placed in hidden prompt text.
In enterprise LLM systems, one of the most dangerous failures is poor isolation in the retrieval layer.
The system retrieves “relevant” content, but relevance is not the same as authorization. A similarity search may surface documents from the wrong tenant, the wrong department, or the wrong access level if retrieval is driven only by ranking logic.
This is easy to miss because retrieval often feels like infrastructure rather than security. In practice, it is both. If authorization is not enforced inside the retrieval process, the model may receive content it should never have seen. Even if the final answer looks harmless, the boundary has already failed because restricted content entered runtime context.
The fix is exact here:
A system can have excellent answer quality and still fail basic data isolation if retrieval is not constrained correctly.
LLM applications can be stressed in ways that are both technical and financial.
An attacker does not need a classic denial-of-service pattern if the runtime allows:
In that case, the attacker can drive the system into expensive execution paths until latency rises, queues back up, provider limits hit, or costs grow beyond control.
This is why every LLM runtime needs a clearly defined work boundary. The system should have explicit limits for:
A basic design question should always have a clear answer: what is the maximum amount of work one request is allowed to trigger? If the team cannot answer that precisely, the cost boundary is not under control.
A practical way to analyze risks is to map the system as a data flow:
User input and external content → orchestration layer → retrieval layer → model inference → tool calls → output handling → logs and memory
Then ask one question at each stage: what can an attacker influence here, and what can that influence trigger next?
Most teams do not need a massive security program to reduce risk, just the right order of operations.
Start with action controls, because that is where business impact is highest. If the runtime can write, send, approve, or change records, action authorization must be stronger than the model’s output.
Next, secure retrieval, because retrieval determines what data reaches the runtime at all. If the wrong content enters context, both confidentiality and output quality are at risk.
Then secure output handling, because model output often crosses into templates, interfaces, automations, and storage systems where standard injection risks return.
After that, secure telemetry: logs, traces, and memory. These often contain more raw data than the user-facing response.
Finally, make security testing repeatable. Prompt injection, retrieval leakage, unauthorized action attempts, and output misuse should be part of regression testing, not occasional manual checks.
If your team is building, deploying, or reviewing an LLM application and wants an external technical perspective, schedule a free consultation with Blocshop.
We will review your current runtime design, identify the highest-risk areas in retrieval, tool access, output handling, and action control, and help you define the most important fixes first.
SCHEDULE A FREE CONSULTATION
Learn more from our insights
Let's talk!
The journey to your
custom software solution starts here.
Services