blog

February 26, 2026

Threat modeling for LLM apps: 10 attack paths teams should address first

Large language model applications add a security problem that standard web apps do not have in the same form: untrusted text can influence execution flow.


In a conventional application, hostile input usually targets rendering, queries, or business logic. In an LLM application, hostile input can also affect what context is assembled, what content is retrieved, what tools the system attempts to use, and what data is returned, stored, or acted on.


That is why the main security boundary is usually not the foundation model alone. It is the LLM application runtime around the model: prompt assembly, retrieval, memory, tool routing, action checks, and output handling.


The 10 attack paths below are the ones most teams should assess first.



1) Instruction injection through user input


This is the most familiar LLM-specific risk, and still one of the most common.


A user submits text that is supposed to be treated as data, but the runtime passes it into the model in a way that lets it act like an instruction. The result is not only “the model says something odd.”, the real risk is that user-controlled text changes system behavior.


That behavior may include:

  • expanding retrieval scope,
  • bypassing normal response rules,
  • changing tool parameters,
  • or proposing an action the user should not be able to trigger.


The core problem is weak separation between data and instructions.


The most reliable control is architectural: user input should remain data. If the system needs fields for a tool call, it should extract them into validated structured values. High-impact actions should be allowed only after explicit policy checks. If free-form text can directly shape tool arguments, the boundary is already weak.



2) Instruction injection through retrieved content


User input is not the only place where hostile instructions can enter the system. Retrieved content can do the same thing.


A malicious document, ticket, email, or web page can be ingested, indexed, and later retrieved as context. If the runtime treats that content as trusted guidance instead of untrusted source material, the model may follow hidden instructions embedded inside it.


This matters most in systems that:

  • summarize third-party content,
  • use retrieval-augmented generation,
  • process support tickets or inbound emails,
  • or let agents act on external documents.


The security issue is simple: retrieved content is evidence, not authority. It may support an answer, but it should not directly control tool use or authorization decisions.


Retrieved text should be clearly delimited, filtered where needed, and kept separate from runtime instructions. Even if the model cites retrieved content as the reason for an action, the action still needs separate approval logic.



3) Unauthorized disclosure of protected data


LLM applications often sit close to sensitive data like internal documents, CRM records, support transcripts, contracts, prior chat history, hidden system instructions, and execution traces.


The main risk is not that the model invents something false but that the system discloses data the requesting user is not allowed to see.


This usually happens when access control is applied too late. A common failure pattern looks like this:

  1. the retrieval layer pulls “relevant” content,
  2. restricted records are included in the model context,
  3. the model is asked to answer helpfully,
  4. and only the final response is treated as the security boundary.


By that point, the boundary has already failed. If restricted data entered the context window, the system has already exposed it to the runtime.


The correct order is the reverse. Authorization must be enforced before retrieval results are passed to the model. Tenant boundaries, role checks, row-level filters, and source restrictions belong inside the retrieval path, not after it.


Logs, traces, and memory need the same treatment. If they contain raw prompts, retrieved text, tool parameters, or user data, they can become a second disclosure channel.



4) Compromise in external components and dependencies


An LLM application is rarely just “one model API.”, usually it depends on multiple moving parts like:

  • SDKs,
  • embedding services,
  • vector databases,
  • connectors,
  • parsers,
  • prompt templates,
  • document processors,
  • and external tools the agent can call.


Any of these can become a risk source if they are compromised, misconfigured, or changed without control.


The concern is far from abstract. In an AI runtime, a dependency can influence what data leaves your environment, how it is indexed, how actions are routed, or how output is parsed and reused. A small change in a supporting library can alter security-critical behavior.


That means AI components need the same release discipline as any production dependency - version pinning, review before upgrades, regression tests, restricted outbound access, and visibility into external calls. If a third-party component can see customer data or affect agent behavior, it belongs in the core risk review path.



5) Poisoning of indexed content and feedback data

LLM systems depend on data pipelines such as indexed documents, internal knowledge bases, feedback loops, fine-tuning sets, and user-contributed content.


Any of these can be manipulated of course.


A poisoning attack does not need to break the model, only to shift the system’s behavior enough to produce the wrong answer, retrieve the wrong content, or trust the wrong source. In practice, that often means:

  • documents crafted to rank well in retrieval despite low reliability,
  • content written to influence downstream summaries,
  • or feedback data that pushes the system toward incorrect patterns.


The main mistake here is treating ingestion as a neutral technical step. It is not. Ingestion is a trust decision.


Content provenance, trust tiers, review queues, weighting rules, and moderation all matter. If trusted internal documents and unknown external material are indexed into one flat retrieval space with equal influence, the system is easy to steer. A reliable LLM application depends on a controlled data path, not only on model quality.



6) Unsafe reuse of model output


Model output is often treated as “safe” because it came from the system, but that assumption is wrong.


LLM output is still untrusted data. If the application renders it as HTML, inserts it into a query, passes it into a template, executes it as code, or forwards it into another automation without validation, it has created a standard injection path through a new source.


This is not limited to public-facing applications, on the contraty, internal copilots are often more dangerous because they sit close to admin tools, databases, and workflow systems.


The correct rule is to treat model output with the same suspicion as user input. That means:

  • encode before rendering,
  • validate before storage,
  • sanitize before reuse,
  • and reject structured output that does not match the expected schema.


If an LLM response can be interpreted by another system, then output handling is a security control, not a formatting step.



7) Overprivileged action execution


The most serious incidents usually come from what the application allows the LLM to do, not from what the model says.


If the runtime can create records, update a CRM, send messages, change tickets, write into an ERP, or trigger workflows, then the key question is no longer whether the model can be influenced. The key question is what authority the application grants after that influence occurs.


This is where many “agent” implementations are too permissive. They use broad credentials, vague tool descriptions, write access by default, and weak approval steps. That turns a text-level attack into an operational incident.


The right pattern is strict action control:

  • least-privilege credentials,
  • explicit allowlists of permitted actions,
  • read-only defaults,
  • approval gates for high-impact writes,
  • and durable logs for every executed action.


The LLM may propose an action. It should not be the component that decides whether the action is allowed.



8) Exposure of internal runtime logic


System prompts, hidden instructions, tool schemas, routing rules, and fallback logic are often treated as harmless internal details. They are not.


If an attacker can extract this internal logic, they gain a map of how the runtime works. They learn:

  • how the system prioritizes instructions,
  • what tool names and actions exist,
  • how edge cases are handled,
  • and where the weak spots are likely to be.


That makes later attacks easier.

The deeper issue is not prompt secrecy by itself but relying on prompt secrecy as a security control. If critical security decisions exist only as hidden text instructions, the design is fragile.


Security-sensitive checks should live in code and policy layers that remain in force even if internal prompt text becomes known. A useful test is this: if the full system prompt were exposed, would the same action boundaries still hold? If not, too much trust has been placed in hidden prompt text.



9) Retrieval authorization failure


In enterprise LLM systems, one of the most dangerous failures is poor isolation in the retrieval layer.


The system retrieves “relevant” content, but relevance is not the same as authorization. A similarity search may surface documents from the wrong tenant, the wrong department, or the wrong access level if retrieval is driven only by ranking logic.


This is easy to miss because retrieval often feels like infrastructure rather than security. In practice, it is both. If authorization is not enforced inside the retrieval process, the model may receive content it should never have seen. Even if the final answer looks harmless, the boundary has already failed because restricted content entered runtime context.


The fix is exact here:

  • apply tenant filters before ranking,
  • enforce document-level permissions,
  • restrict sources by policy,
  • and treat retrieval as part of access control.


A system can have excellent answer quality and still fail basic data isolation if retrieval is not constrained correctly.



10) Resource exhaustion and cost abuse


LLM applications can be stressed in ways that are both technical and financial.


An attacker does not need a classic denial-of-service pattern if the runtime allows:

  • oversized prompts,
  • repeated retries,
  • deep tool-call chains,
  • broad retrieval fan-out,
  • or loops that keep consuming tokens and API capacity.


In that case, the attacker can drive the system into expensive execution paths until latency rises, queues back up, provider limits hit, or costs grow beyond control.


This is why every LLM runtime needs a clearly defined work boundary. The system should have explicit limits for:

  • maximum tokens,
  • maximum tool-call depth,
  • request timeouts,
  • concurrency,
  • retry behavior,
  • and loop detection.


A basic design question should always have a clear answer: what is the maximum amount of work one request is allowed to trigger? If the team cannot answer that precisely, the cost boundary is not under control.



How to assess the attack surface


A practical way to analyze risks is to map the system as a data flow:

User input and external content → orchestration layer → retrieval layer → model inference → tool calls → output handling → logs and memory


Then ask one question at each stage: what can an attacker influence here, and what can that influence trigger next?



What teams should fix first


Most teams do not need a massive security program to reduce risk, just the right order of operations.


Start with action controls, because that is where business impact is highest. If the runtime can write, send, approve, or change records, action authorization must be stronger than the model’s output.


Next, secure retrieval, because retrieval determines what data reaches the runtime at all. If the wrong content enters context, both confidentiality and output quality are at risk.


Then secure output handling, because model output often crosses into templates, interfaces, automations, and storage systems where standard injection risks return.


After that, secure telemetry: logs, traces, and memory. These often contain more raw data than the user-facing response.


Finally, make security testing repeatable. Prompt injection, retrieval leakage, unauthorized action attempts, and output misuse should be part of regression testing, not occasional manual checks.



Review your LLM attack surface with Blocshop


If your team is building, deploying, or reviewing an LLM application and wants an external technical perspective, schedule a free consultation with Blocshop.


We will review your current runtime design, identify the highest-risk areas in retrieval, tool access, output handling, and action control, and help you define the most important fixes first.


SCHEDULE A FREE CONSULTATION

blog

February 26, 2026

Threat modeling for LLM apps: 10 attack paths teams should address first

Large language model applications add a security problem that standard web apps do not have in the same form: untrusted text can influence execution flow.


In a conventional application, hostile input usually targets rendering, queries, or business logic. In an LLM application, hostile input can also affect what context is assembled, what content is retrieved, what tools the system attempts to use, and what data is returned, stored, or acted on.


That is why the main security boundary is usually not the foundation model alone. It is the LLM application runtime around the model: prompt assembly, retrieval, memory, tool routing, action checks, and output handling.


The 10 attack paths below are the ones most teams should assess first.



1) Instruction injection through user input


This is the most familiar LLM-specific risk, and still one of the most common.


A user submits text that is supposed to be treated as data, but the runtime passes it into the model in a way that lets it act like an instruction. The result is not only “the model says something odd.”, the real risk is that user-controlled text changes system behavior.


That behavior may include:

  • expanding retrieval scope,
  • bypassing normal response rules,
  • changing tool parameters,
  • or proposing an action the user should not be able to trigger.


The core problem is weak separation between data and instructions.


The most reliable control is architectural: user input should remain data. If the system needs fields for a tool call, it should extract them into validated structured values. High-impact actions should be allowed only after explicit policy checks. If free-form text can directly shape tool arguments, the boundary is already weak.



2) Instruction injection through retrieved content


User input is not the only place where hostile instructions can enter the system. Retrieved content can do the same thing.


A malicious document, ticket, email, or web page can be ingested, indexed, and later retrieved as context. If the runtime treats that content as trusted guidance instead of untrusted source material, the model may follow hidden instructions embedded inside it.


This matters most in systems that:

  • summarize third-party content,
  • use retrieval-augmented generation,
  • process support tickets or inbound emails,
  • or let agents act on external documents.


The security issue is simple: retrieved content is evidence, not authority. It may support an answer, but it should not directly control tool use or authorization decisions.


Retrieved text should be clearly delimited, filtered where needed, and kept separate from runtime instructions. Even if the model cites retrieved content as the reason for an action, the action still needs separate approval logic.



3) Unauthorized disclosure of protected data


LLM applications often sit close to sensitive data like internal documents, CRM records, support transcripts, contracts, prior chat history, hidden system instructions, and execution traces.


The main risk is not that the model invents something false but that the system discloses data the requesting user is not allowed to see.


This usually happens when access control is applied too late. A common failure pattern looks like this:

  1. the retrieval layer pulls “relevant” content,
  2. restricted records are included in the model context,
  3. the model is asked to answer helpfully,
  4. and only the final response is treated as the security boundary.


By that point, the boundary has already failed. If restricted data entered the context window, the system has already exposed it to the runtime.


The correct order is the reverse. Authorization must be enforced before retrieval results are passed to the model. Tenant boundaries, role checks, row-level filters, and source restrictions belong inside the retrieval path, not after it.


Logs, traces, and memory need the same treatment. If they contain raw prompts, retrieved text, tool parameters, or user data, they can become a second disclosure channel.



4) Compromise in external components and dependencies


An LLM application is rarely just “one model API.”, usually it depends on multiple moving parts like:

  • SDKs,
  • embedding services,
  • vector databases,
  • connectors,
  • parsers,
  • prompt templates,
  • document processors,
  • and external tools the agent can call.


Any of these can become a risk source if they are compromised, misconfigured, or changed without control.


The concern is far from abstract. In an AI runtime, a dependency can influence what data leaves your environment, how it is indexed, how actions are routed, or how output is parsed and reused. A small change in a supporting library can alter security-critical behavior.


That means AI components need the same release discipline as any production dependency - version pinning, review before upgrades, regression tests, restricted outbound access, and visibility into external calls. If a third-party component can see customer data or affect agent behavior, it belongs in the core risk review path.



5) Poisoning of indexed content and feedback data

LLM systems depend on data pipelines such as indexed documents, internal knowledge bases, feedback loops, fine-tuning sets, and user-contributed content.


Any of these can be manipulated of course.


A poisoning attack does not need to break the model, only to shift the system’s behavior enough to produce the wrong answer, retrieve the wrong content, or trust the wrong source. In practice, that often means:

  • documents crafted to rank well in retrieval despite low reliability,
  • content written to influence downstream summaries,
  • or feedback data that pushes the system toward incorrect patterns.


The main mistake here is treating ingestion as a neutral technical step. It is not. Ingestion is a trust decision.


Content provenance, trust tiers, review queues, weighting rules, and moderation all matter. If trusted internal documents and unknown external material are indexed into one flat retrieval space with equal influence, the system is easy to steer. A reliable LLM application depends on a controlled data path, not only on model quality.



6) Unsafe reuse of model output


Model output is often treated as “safe” because it came from the system, but that assumption is wrong.


LLM output is still untrusted data. If the application renders it as HTML, inserts it into a query, passes it into a template, executes it as code, or forwards it into another automation without validation, it has created a standard injection path through a new source.


This is not limited to public-facing applications, on the contraty, internal copilots are often more dangerous because they sit close to admin tools, databases, and workflow systems.


The correct rule is to treat model output with the same suspicion as user input. That means:

  • encode before rendering,
  • validate before storage,
  • sanitize before reuse,
  • and reject structured output that does not match the expected schema.


If an LLM response can be interpreted by another system, then output handling is a security control, not a formatting step.



7) Overprivileged action execution


The most serious incidents usually come from what the application allows the LLM to do, not from what the model says.


If the runtime can create records, update a CRM, send messages, change tickets, write into an ERP, or trigger workflows, then the key question is no longer whether the model can be influenced. The key question is what authority the application grants after that influence occurs.


This is where many “agent” implementations are too permissive. They use broad credentials, vague tool descriptions, write access by default, and weak approval steps. That turns a text-level attack into an operational incident.


The right pattern is strict action control:

  • least-privilege credentials,
  • explicit allowlists of permitted actions,
  • read-only defaults,
  • approval gates for high-impact writes,
  • and durable logs for every executed action.


The LLM may propose an action. It should not be the component that decides whether the action is allowed.



8) Exposure of internal runtime logic


System prompts, hidden instructions, tool schemas, routing rules, and fallback logic are often treated as harmless internal details. They are not.


If an attacker can extract this internal logic, they gain a map of how the runtime works. They learn:

  • how the system prioritizes instructions,
  • what tool names and actions exist,
  • how edge cases are handled,
  • and where the weak spots are likely to be.


That makes later attacks easier.

The deeper issue is not prompt secrecy by itself but relying on prompt secrecy as a security control. If critical security decisions exist only as hidden text instructions, the design is fragile.


Security-sensitive checks should live in code and policy layers that remain in force even if internal prompt text becomes known. A useful test is this: if the full system prompt were exposed, would the same action boundaries still hold? If not, too much trust has been placed in hidden prompt text.



9) Retrieval authorization failure


In enterprise LLM systems, one of the most dangerous failures is poor isolation in the retrieval layer.


The system retrieves “relevant” content, but relevance is not the same as authorization. A similarity search may surface documents from the wrong tenant, the wrong department, or the wrong access level if retrieval is driven only by ranking logic.


This is easy to miss because retrieval often feels like infrastructure rather than security. In practice, it is both. If authorization is not enforced inside the retrieval process, the model may receive content it should never have seen. Even if the final answer looks harmless, the boundary has already failed because restricted content entered runtime context.


The fix is exact here:

  • apply tenant filters before ranking,
  • enforce document-level permissions,
  • restrict sources by policy,
  • and treat retrieval as part of access control.


A system can have excellent answer quality and still fail basic data isolation if retrieval is not constrained correctly.



10) Resource exhaustion and cost abuse


LLM applications can be stressed in ways that are both technical and financial.


An attacker does not need a classic denial-of-service pattern if the runtime allows:

  • oversized prompts,
  • repeated retries,
  • deep tool-call chains,
  • broad retrieval fan-out,
  • or loops that keep consuming tokens and API capacity.


In that case, the attacker can drive the system into expensive execution paths until latency rises, queues back up, provider limits hit, or costs grow beyond control.


This is why every LLM runtime needs a clearly defined work boundary. The system should have explicit limits for:

  • maximum tokens,
  • maximum tool-call depth,
  • request timeouts,
  • concurrency,
  • retry behavior,
  • and loop detection.


A basic design question should always have a clear answer: what is the maximum amount of work one request is allowed to trigger? If the team cannot answer that precisely, the cost boundary is not under control.



How to assess the attack surface


A practical way to analyze risks is to map the system as a data flow:

User input and external content → orchestration layer → retrieval layer → model inference → tool calls → output handling → logs and memory


Then ask one question at each stage: what can an attacker influence here, and what can that influence trigger next?



What teams should fix first


Most teams do not need a massive security program to reduce risk, just the right order of operations.


Start with action controls, because that is where business impact is highest. If the runtime can write, send, approve, or change records, action authorization must be stronger than the model’s output.


Next, secure retrieval, because retrieval determines what data reaches the runtime at all. If the wrong content enters context, both confidentiality and output quality are at risk.


Then secure output handling, because model output often crosses into templates, interfaces, automations, and storage systems where standard injection risks return.


After that, secure telemetry: logs, traces, and memory. These often contain more raw data than the user-facing response.


Finally, make security testing repeatable. Prompt injection, retrieval leakage, unauthorized action attempts, and output misuse should be part of regression testing, not occasional manual checks.



Review your LLM attack surface with Blocshop


If your team is building, deploying, or reviewing an LLM application and wants an external technical perspective, schedule a free consultation with Blocshop.


We will review your current runtime design, identify the highest-risk areas in retrieval, tool access, output handling, and action control, and help you define the most important fixes first.


SCHEDULE A FREE CONSULTATION

logo blocshop

Let's talk!

blog

February 26, 2026

Threat modeling for LLM apps: 10 attack paths teams should address first

Large language model applications add a security problem that standard web apps do not have in the same form: untrusted text can influence execution flow.


In a conventional application, hostile input usually targets rendering, queries, or business logic. In an LLM application, hostile input can also affect what context is assembled, what content is retrieved, what tools the system attempts to use, and what data is returned, stored, or acted on.


That is why the main security boundary is usually not the foundation model alone. It is the LLM application runtime around the model: prompt assembly, retrieval, memory, tool routing, action checks, and output handling.


The 10 attack paths below are the ones most teams should assess first.



1) Instruction injection through user input


This is the most familiar LLM-specific risk, and still one of the most common.


A user submits text that is supposed to be treated as data, but the runtime passes it into the model in a way that lets it act like an instruction. The result is not only “the model says something odd.”, the real risk is that user-controlled text changes system behavior.


That behavior may include:

  • expanding retrieval scope,
  • bypassing normal response rules,
  • changing tool parameters,
  • or proposing an action the user should not be able to trigger.


The core problem is weak separation between data and instructions.


The most reliable control is architectural: user input should remain data. If the system needs fields for a tool call, it should extract them into validated structured values. High-impact actions should be allowed only after explicit policy checks. If free-form text can directly shape tool arguments, the boundary is already weak.



2) Instruction injection through retrieved content


User input is not the only place where hostile instructions can enter the system. Retrieved content can do the same thing.


A malicious document, ticket, email, or web page can be ingested, indexed, and later retrieved as context. If the runtime treats that content as trusted guidance instead of untrusted source material, the model may follow hidden instructions embedded inside it.


This matters most in systems that:

  • summarize third-party content,
  • use retrieval-augmented generation,
  • process support tickets or inbound emails,
  • or let agents act on external documents.


The security issue is simple: retrieved content is evidence, not authority. It may support an answer, but it should not directly control tool use or authorization decisions.


Retrieved text should be clearly delimited, filtered where needed, and kept separate from runtime instructions. Even if the model cites retrieved content as the reason for an action, the action still needs separate approval logic.



3) Unauthorized disclosure of protected data


LLM applications often sit close to sensitive data like internal documents, CRM records, support transcripts, contracts, prior chat history, hidden system instructions, and execution traces.


The main risk is not that the model invents something false but that the system discloses data the requesting user is not allowed to see.


This usually happens when access control is applied too late. A common failure pattern looks like this:

  1. the retrieval layer pulls “relevant” content,
  2. restricted records are included in the model context,
  3. the model is asked to answer helpfully,
  4. and only the final response is treated as the security boundary.


By that point, the boundary has already failed. If restricted data entered the context window, the system has already exposed it to the runtime.


The correct order is the reverse. Authorization must be enforced before retrieval results are passed to the model. Tenant boundaries, role checks, row-level filters, and source restrictions belong inside the retrieval path, not after it.


Logs, traces, and memory need the same treatment. If they contain raw prompts, retrieved text, tool parameters, or user data, they can become a second disclosure channel.



4) Compromise in external components and dependencies


An LLM application is rarely just “one model API.”, usually it depends on multiple moving parts like:

  • SDKs,
  • embedding services,
  • vector databases,
  • connectors,
  • parsers,
  • prompt templates,
  • document processors,
  • and external tools the agent can call.


Any of these can become a risk source if they are compromised, misconfigured, or changed without control.


The concern is far from abstract. In an AI runtime, a dependency can influence what data leaves your environment, how it is indexed, how actions are routed, or how output is parsed and reused. A small change in a supporting library can alter security-critical behavior.


That means AI components need the same release discipline as any production dependency - version pinning, review before upgrades, regression tests, restricted outbound access, and visibility into external calls. If a third-party component can see customer data or affect agent behavior, it belongs in the core risk review path.



5) Poisoning of indexed content and feedback data

LLM systems depend on data pipelines such as indexed documents, internal knowledge bases, feedback loops, fine-tuning sets, and user-contributed content.


Any of these can be manipulated of course.


A poisoning attack does not need to break the model, only to shift the system’s behavior enough to produce the wrong answer, retrieve the wrong content, or trust the wrong source. In practice, that often means:

  • documents crafted to rank well in retrieval despite low reliability,
  • content written to influence downstream summaries,
  • or feedback data that pushes the system toward incorrect patterns.


The main mistake here is treating ingestion as a neutral technical step. It is not. Ingestion is a trust decision.


Content provenance, trust tiers, review queues, weighting rules, and moderation all matter. If trusted internal documents and unknown external material are indexed into one flat retrieval space with equal influence, the system is easy to steer. A reliable LLM application depends on a controlled data path, not only on model quality.



6) Unsafe reuse of model output


Model output is often treated as “safe” because it came from the system, but that assumption is wrong.


LLM output is still untrusted data. If the application renders it as HTML, inserts it into a query, passes it into a template, executes it as code, or forwards it into another automation without validation, it has created a standard injection path through a new source.


This is not limited to public-facing applications, on the contraty, internal copilots are often more dangerous because they sit close to admin tools, databases, and workflow systems.


The correct rule is to treat model output with the same suspicion as user input. That means:

  • encode before rendering,
  • validate before storage,
  • sanitize before reuse,
  • and reject structured output that does not match the expected schema.


If an LLM response can be interpreted by another system, then output handling is a security control, not a formatting step.



7) Overprivileged action execution


The most serious incidents usually come from what the application allows the LLM to do, not from what the model says.


If the runtime can create records, update a CRM, send messages, change tickets, write into an ERP, or trigger workflows, then the key question is no longer whether the model can be influenced. The key question is what authority the application grants after that influence occurs.


This is where many “agent” implementations are too permissive. They use broad credentials, vague tool descriptions, write access by default, and weak approval steps. That turns a text-level attack into an operational incident.


The right pattern is strict action control:

  • least-privilege credentials,
  • explicit allowlists of permitted actions,
  • read-only defaults,
  • approval gates for high-impact writes,
  • and durable logs for every executed action.


The LLM may propose an action. It should not be the component that decides whether the action is allowed.



8) Exposure of internal runtime logic


System prompts, hidden instructions, tool schemas, routing rules, and fallback logic are often treated as harmless internal details. They are not.


If an attacker can extract this internal logic, they gain a map of how the runtime works. They learn:

  • how the system prioritizes instructions,
  • what tool names and actions exist,
  • how edge cases are handled,
  • and where the weak spots are likely to be.


That makes later attacks easier.

The deeper issue is not prompt secrecy by itself but relying on prompt secrecy as a security control. If critical security decisions exist only as hidden text instructions, the design is fragile.


Security-sensitive checks should live in code and policy layers that remain in force even if internal prompt text becomes known. A useful test is this: if the full system prompt were exposed, would the same action boundaries still hold? If not, too much trust has been placed in hidden prompt text.



9) Retrieval authorization failure


In enterprise LLM systems, one of the most dangerous failures is poor isolation in the retrieval layer.


The system retrieves “relevant” content, but relevance is not the same as authorization. A similarity search may surface documents from the wrong tenant, the wrong department, or the wrong access level if retrieval is driven only by ranking logic.


This is easy to miss because retrieval often feels like infrastructure rather than security. In practice, it is both. If authorization is not enforced inside the retrieval process, the model may receive content it should never have seen. Even if the final answer looks harmless, the boundary has already failed because restricted content entered runtime context.


The fix is exact here:

  • apply tenant filters before ranking,
  • enforce document-level permissions,
  • restrict sources by policy,
  • and treat retrieval as part of access control.


A system can have excellent answer quality and still fail basic data isolation if retrieval is not constrained correctly.



10) Resource exhaustion and cost abuse


LLM applications can be stressed in ways that are both technical and financial.


An attacker does not need a classic denial-of-service pattern if the runtime allows:

  • oversized prompts,
  • repeated retries,
  • deep tool-call chains,
  • broad retrieval fan-out,
  • or loops that keep consuming tokens and API capacity.


In that case, the attacker can drive the system into expensive execution paths until latency rises, queues back up, provider limits hit, or costs grow beyond control.


This is why every LLM runtime needs a clearly defined work boundary. The system should have explicit limits for:

  • maximum tokens,
  • maximum tool-call depth,
  • request timeouts,
  • concurrency,
  • retry behavior,
  • and loop detection.


A basic design question should always have a clear answer: what is the maximum amount of work one request is allowed to trigger? If the team cannot answer that precisely, the cost boundary is not under control.



How to assess the attack surface


A practical way to analyze risks is to map the system as a data flow:

User input and external content → orchestration layer → retrieval layer → model inference → tool calls → output handling → logs and memory


Then ask one question at each stage: what can an attacker influence here, and what can that influence trigger next?



What teams should fix first


Most teams do not need a massive security program to reduce risk, just the right order of operations.


Start with action controls, because that is where business impact is highest. If the runtime can write, send, approve, or change records, action authorization must be stronger than the model’s output.


Next, secure retrieval, because retrieval determines what data reaches the runtime at all. If the wrong content enters context, both confidentiality and output quality are at risk.


Then secure output handling, because model output often crosses into templates, interfaces, automations, and storage systems where standard injection risks return.


After that, secure telemetry: logs, traces, and memory. These often contain more raw data than the user-facing response.


Finally, make security testing repeatable. Prompt injection, retrieval leakage, unauthorized action attempts, and output misuse should be part of regression testing, not occasional manual checks.



Review your LLM attack surface with Blocshop


If your team is building, deploying, or reviewing an LLM application and wants an external technical perspective, schedule a free consultation with Blocshop.


We will review your current runtime design, identify the highest-risk areas in retrieval, tool access, output handling, and action control, and help you define the most important fixes first.


SCHEDULE A FREE CONSULTATION

logo blocshop

Let's talk!