May 12, 2026
AI-assisted pull requests: what code review must catch when AI helped write the change
When AI helps write a pull request, the same workflow can produce the implementation, tests, comments, configuration changes, dependency suggestions, and pull request description. That creates a specific review risk. The change may look internally consistent, but coherence does not prove safety.
Recent research supports a risk-based approach. A 2025 ACM study of AI-generated code in GitHub projects identified 733 generated snippets and found security weaknesses in 29.5% of Python snippets and 24.2% of JavaScript snippets, across 43 CWE categories. A USENIX Security 2025 paper on package hallucinations generated 576,000 code samples across Python and JavaScript and found hallucinated packages across all 16 tested coding models, with 205,474 unique hallucinated package names.
The conclusion is straightforward: AI-assisted pull requests need review that is risk-based, enforceable, and evidence-driven. AI disclosure helps reviewers focus, but the review standard should be triggered by what changed, not only by whether the author says AI was used.
Most pull request reviews still focus on the visible diff. Does the code look clean? Do the tests pass? Does the implementation match the ticket? Did someone approve it?
AI-assisted work makes that too shallow for high-risk changes.
The reviewer needs evidence for five things:
This is especially important when the pull request touches authentication, authorization, tenant isolation, payments, customer data, audit logs, infrastructure, CI/CD, secrets, or migrations.
GitHub already supports parts of this model. Branch protection can require pull request reviews, status checks, conversation resolution, and restrictions before merge. CODEOWNERS can route changes to the owners of affected files and, with branch protection, require code-owner approval.
The missing piece is usually the operating rule: which changes require stronger evidence before approval.
For AI-assisted pull requests, the useful review question is:
What assumption would make this change unsafe, and where does the pull request prove that assumption has been checked?
This is where human review still matters. OWASP describes secure code review as manual examination of source code to identify vulnerabilities that automated tools often miss, especially in application logic, data flow, implementation details, and context-specific flaws.
The reviewer should not ask vaguely for “more tests” or “a security check.” The reviewer should ask for evidence against the riskiest failure mode.
The protocol that follows is written to be adopted as is.
Apply this protocol to every pull request that touches authentication, authorization, tenant isolation, customer data, payments, audit logs, infrastructure, CI/CD, secrets, external integrations, or database migrations.
The author must mark whether the pull request touches any high-risk area.
High-risk areas include:
If none of these areas are touched, the pull request can follow normal review.
If one or more are touched, the pull request needs the evidence below.
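The risk declaration does not have to rely on the author's memory alone. As a minimal sketch, assuming a repository layout that this article does not specify, a script can map changed file paths to high-risk areas. Every path pattern and the `high_risk_areas` name are hypothetical and would need to match your own codebase.

```python
from fnmatch import fnmatchcase

# Hypothetical path patterns; replace them with your repository's real layout.
HIGH_RISK_PATTERNS = {
    "authentication/authorization": ["src/auth/*", "src/permissions/*"],
    "payments": ["src/billing/*"],
    "database migrations": ["migrations/*"],
    "CI/CD": [".github/workflows/*"],
    "infrastructure": ["infra/*", "*.tf"],
}


def high_risk_areas(changed_paths):
    """Return the sorted names of high-risk areas touched by the changed paths."""
    touched = set()
    for path in changed_paths:
        for area, patterns in HIGH_RISK_PATTERNS.items():
            # fnmatchcase lets "*" span "/" so nested files match too.
            if any(fnmatchcase(path, pattern) for pattern in patterns):
                touched.add(area)
    return sorted(touched)
```

An empty result means the pull request can follow normal review. Any other result names the areas that need the evidence described here. A path-based check is only a first filter: a change can affect authorization without touching a file under an obviously named directory, so the author's declaration still matters.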
The author should state whether AI materially helped with:
Do not require prompt histories. The useful question is where generated assumptions may have entered the change.
AI disclosure is useful, but the high-risk review still applies even when AI use is not declared.
For high-risk changes, the author must explain:
If the author cannot explain these points, the pull request is not ready for merge.
Every high-risk pull request must include at least one negative test that challenges the riskiest assumption.
Examples:
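One illustration of a negative test, for a tenant-isolation change: the test asserts that the forbidden action fails, not that the permitted action succeeds. The in-memory store, the `get_invoice` function, and the `Forbidden` exception are all hypothetical stand-ins for whatever the real change touches.

```python
class Forbidden(Exception):
    """Raised when a caller requests a record outside its own tenant."""


# Hypothetical in-memory store: invoice id -> (owning tenant, amount).
INVOICES = {"inv-1": ("tenant-a", 120), "inv-2": ("tenant-b", 75)}


def get_invoice(caller_tenant, invoice_id):
    """Return the invoice amount only if it belongs to the caller's tenant."""
    owner, amount = INVOICES[invoice_id]
    if owner != caller_tenant:
        raise Forbidden(f"{caller_tenant} may not read {invoice_id}")
    return amount


def test_cross_tenant_read_is_denied():
    """Negative test: tenant A must not be able to read tenant B's invoice."""
    try:
        get_invoice("tenant-a", "inv-2")
    except Forbidden:
        return  # expected: the tenant boundary held
    raise AssertionError("cross-tenant read was allowed")
```

If the ownership check were removed from `get_invoice`, every happy-path test would still pass. Only the negative test would fail, which is why it counts as evidence against the riskiest assumption.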
Any new dependency, workflow change, permission change, environment variable, service access, infrastructure setting, or deployment change must be reviewed as a separate risk item.
For a new dependency, the reviewer should confirm:
This matters because package hallucination creates supply-chain risk. In the USENIX study, models recommended non-existent packages; such names can later be registered by attackers and used in package-confusion attacks.
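A first, cheap check for a Python dependency is whether the name exists on the registry at all. The sketch below queries PyPI's public JSON endpoint. The function name `pypi_package_exists` and the injectable `opener` parameter are assumptions made here for illustration and testing, not part of any tool named in this article.

```python
import urllib.error
import urllib.request
from urllib.parse import quote


def pypi_package_exists(name, opener=urllib.request.urlopen):
    """Return True if PyPI serves metadata for `name`, False on HTTP 404."""
    url = f"https://pypi.org/pypi/{quote(name, safe='')}/json"
    try:
        with opener(url, timeout=10) as response:
            return response.status == 200
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return False  # the registry has no package with this name
        raise  # other HTTP errors are not evidence either way
```

A `False` result is a strong signal that the name was hallucinated. A `True` result proves much less: an attacker may already have registered the name, so the reviewer still has to look at the maintainer, the release history, and the source repository before accepting the dependency.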
For CI/CD changes, the reviewer should confirm:
GitHub’s Actions security guidance states that third-party actions can access secrets or use repository tokens, and that pinning an action to a full-length commit SHA is the only way to use an action as an immutable release.
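The pinning rule is easy to check mechanically. The following is a rough sketch that scans workflow text with a regular expression instead of a YAML parser, so it will miss unusual formatting. The names `unpinned_actions`, `USES_LINE`, and `FULL_SHA` are hypothetical.

```python
import re

# Matches "uses: owner/repo@ref" (optionally "owner/repo/path@ref") in workflow YAML.
USES_LINE = re.compile(
    r"^\s*(?:-\s*)?uses:\s*['\"]?([^\s'\"@#]+)@([^\s'\"#]+)", re.MULTILINE
)
FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def unpinned_actions(workflow_text):
    """Return (action, ref) pairs whose ref is not a full 40-character commit SHA."""
    return [
        (action, ref)
        for action, ref in USES_LINE.findall(workflow_text)
        # Local and Docker references are out of scope for this sketch.
        if not action.startswith(("./", "docker://")) and not FULL_SHA.match(ref)
    ]
```

Run against a workflow file in the diff, the function returns every action that is referenced by tag or branch name. A SHA match only shows the reference is immutable. It does not show that the pinned commit is trustworthy, which remains a reviewer judgment.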
High-risk changes must be approved by someone who owns the affected boundary.
Examples:
If an AI tool can edit files, run commands, open pull requests, change configuration, or call tools, treat it as an actor in the delivery process.
The agent must not be able to bypass controls a human developer has to pass.
That means:
The agent’s permissions should cap its maximum possible impact well below the worst damage the system could suffer.
For high-risk changes, the pull request must record:
This is not paperwork for its own sake. It prevents future incident review, audit work, and maintenance from depending on memory or Slack messages.
A high-risk AI-assisted pull request can be merged only when all of the following are true:
Week 1: update the pull request template. Add fields for risk area, AI assistance, reviewer focus, negative test, dependencies, configuration, and release evidence.
Week 2: define high-risk areas. Start with authentication, authorization, tenant boundaries, customer data, payments, audit logs, external integrations, infrastructure, CI/CD, secrets, migrations, and production deployment.
Week 3: assign owners and enforce approval. Use CODEOWNERS, branch protection, rulesets, or the equivalent in your platform. Require owner approval for high-risk areas and passing checks before merge.
Week 4: enforce the evidence gate. High-risk changes need one negative test, author explanation of the failure path, explicit review of dependencies and configuration, and a rollback or forward-fix note for release-sensitive changes.
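The Week 4 gate can be expressed as a simple check that a CI job or bot could run. This is a sketch under assumptions: the `Evidence` record and its field names are hypothetical, derived from the items listed in the Week 3 and Week 4 steps, and how each flag gets populated (template parsing, labels, review API) is left open.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical record of what a high-risk pull request has attached."""
    negative_test: bool = False
    failure_path_explained: bool = False
    dependencies_and_config_reviewed: bool = False
    owner_approved: bool = False
    checks_passed: bool = False
    release_sensitive: bool = False
    rollback_or_forward_fix_note: bool = False


def missing_evidence(evidence):
    """Return the names of gate items that are still unmet, in a fixed order."""
    required = [
        "negative_test",
        "failure_path_explained",
        "dependencies_and_config_reviewed",
        "owner_approved",
        "checks_passed",
    ]
    # The rollback note is only required for release-sensitive changes.
    if evidence.release_sensitive:
        required.append("rollback_or_forward_fix_note")
    return [name for name in required if not getattr(evidence, name)]
```

Returning the list of missing items, instead of a bare pass or fail, gives the author a concrete to-do list and gives the reviewer a record of why a merge was blocked.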
This keeps low-risk work moving and puts stronger controls around the places where generated code can cause real damage.
AI can help produce the draft. The delivery system still has to decide whether the change belongs in production.
Blocshop helps software development teams use AI inside real delivery workflows without losing control of architecture, review quality, testing, and release ownership.
If your team is already using AI-assisted development and review quality is becoming the bottleneck, schedule a free consultation with Blocshop.