AI-assisted pull requests: what code review must catch when AI helped write the change

May 12, 2026

When AI helps write a pull request, the same workflow can produce the implementation, tests, comments, configuration changes, dependency suggestions, and pull request description. That creates a specific review risk. The change may look internally consistent, but coherence does not prove safety.

Recent research supports a risk-based approach. A 2025 ACM study of AI-generated code in GitHub projects identified 733 generated snippets and found security weaknesses in 29.5% of Python snippets and 24.2% of JavaScript snippets, across 43 CWE categories. A USENIX Security 2025 study of package hallucinations generated 576,000 code samples in Python and JavaScript and found hallucinated packages from all 16 coding models tested, with 205,474 unique hallucinated package names.

The conclusion is straightforward: AI-assisted pull requests need review that is risk-based, enforceable, and evidence-driven. AI disclosure helps reviewers focus, but the review standard should be triggered by what changed, not only by whether the author says AI was used.

The review failure is missing evidence

Most pull request reviews still focus on the visible diff. Does the code look clean? Do the tests pass? Does the implementation match the ticket? Did someone approve it?

AI-assisted work makes that too shallow for high-risk changes.

The reviewer needs evidence for five things:

The author understands the generated parts.
The change fits the existing architecture.
The tests challenge the riskiest assumption.
Dependencies and configuration changes were checked separately.
The right owner approved the part of the system being changed.

This is especially important when the pull request touches authentication, authorization, tenant isolation, payments, customer data, audit logs, infrastructure, CI/CD, secrets, or migrations.

GitHub already supports parts of this model. Branch protection can require pull request reviews, passing status checks, and conversation resolution before merge, and can restrict who is allowed to push. CODEOWNERS can route changes to the owners of affected files and, with branch protection, require code-owner approval.

The missing piece is usually the operating rule: which changes require stronger evidence before approval.

The review question that matters

For AI-assisted pull requests, the useful review question is:

What assumption would make this change unsafe, and where does the pull request prove that assumption has been checked?

For a permission change, the risky assumption may be that the code uses the right source of authority.
For a payment change, it may be that retry behaviour cannot duplicate a charge.
For a data import, it may be that partial failure cannot corrupt accepted records.
For a logging change, it may be that the logged object cannot contain personal data, tokens, payment data, or customer identifiers.
For a migration, it may be that the change can be rolled forward safely if rollback is not possible.

This is where human review still matters. OWASP describes secure code review as manual examination of source code to identify vulnerabilities that automated tools often miss, especially in application logic, data flow, implementation details, and context-specific flaws.

The reviewer should not ask vaguely for “more tests” or “a security check.” The reviewer should ask for evidence against the riskiest failure mode.
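
As an illustration of what "evidence against the riskiest failure mode" can look like for the payment case, here is a minimal Python sketch. It assumes a gateway that deduplicates charges by idempotency key; `FlakyGateway`, `charge_with_retry`, and the key format are hypothetical names made up for this example, not a real payment API.

```python
class FlakyGateway:
    """Hypothetical gateway: records the charge, then loses the first response."""

    def __init__(self):
        self.charges = {}
        self.calls = 0

    def charge(self, idempotency_key, amount_cents):
        self.calls += 1
        # A repeated key keeps the original charge instead of adding a new one.
        self.charges.setdefault(idempotency_key, amount_cents)
        if self.calls == 1:
            raise TimeoutError("response lost after the charge was recorded")
        return idempotency_key


def charge_with_retry(gateway, order_id, amount_cents, attempts=3):
    """Retry on timeout, reusing one idempotency key derived from the order."""
    key = f"order-{order_id}"
    for attempt in range(attempts):
        try:
            return gateway.charge(key, amount_cents)
        except TimeoutError:
            if attempt == attempts - 1:
                raise
    raise ValueError("attempts must be at least 1")


def test_retry_after_timeout_does_not_duplicate_charge():
    gateway = FlakyGateway()
    charge_with_retry(gateway, order_id=42, amount_cents=1999)
    assert gateway.calls == 2          # the retry really happened
    assert len(gateway.charges) == 1   # but only one charge exists


test_retry_after_timeout_does_not_duplicate_charge()
```

The test names the assumption (a retry cannot create a second charge) and fails if the retry path ever generates a fresh key.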

The copy-pasteable review protocol

The protocol below is written so that it can be copied into a team's review guidelines as it stands.

AI-assisted pull request review protocol

Apply this protocol to every pull request that touches authentication, authorization, tenant isolation, customer data, payments, audit logs, infrastructure, CI/CD, secrets, external integrations, or database migrations.

1. Classify the risk before reviewing the diff

The author must mark whether the pull request touches any high-risk area.

High-risk areas include:

authentication,
authorization,
tenant boundaries,
customer or personal data,
payments or billing,
audit logs,
file upload or parsing,
external integrations,
infrastructure,
CI/CD,
secrets or environment variables,
database migrations,
production deployment or release automation.

If none of these areas are touched, the pull request can follow normal review.

If one or more are touched, the pull request needs the evidence below.
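
The author's own marking can be backed by a simple path-based check. The sketch below is one possible way to do that in Python; the path patterns in `HIGH_RISK_PATTERNS` are hypothetical and would need to reflect the layout of the actual repository.

```python
from fnmatch import fnmatchcase

# Hypothetical path patterns; every repository needs its own mapping.
HIGH_RISK_PATTERNS = {
    "authentication": ["src/auth/*", "src/login/*"],
    "payments or billing": ["src/billing/*", "src/payments/*"],
    "CI/CD": [".github/workflows/*"],
    "database migrations": ["migrations/*", "db/migrate/*"],
    "infrastructure": ["infra/*", "terraform/*"],
}


def classify_risk(changed_files):
    """Return the sorted list of high-risk areas touched by the changed files."""
    touched = set()
    for path in changed_files:
        for area, patterns in HIGH_RISK_PATTERNS.items():
            if any(fnmatchcase(path, pattern) for pattern in patterns):
                touched.add(area)
    return sorted(touched)


areas = classify_risk(["src/billing/retry.py", "docs/readme.md"])
print(areas or "normal review")
```

A path match is only a floor. A change can touch tenant boundaries or customer data from a file that no pattern covers, so the author's explicit classification is still required.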

2. Identify where AI was used, when known

The author should state whether AI materially helped with:

implementation,
refactoring,
tests,
comments or documentation,
dependency suggestions,
configuration,
CI/CD,
infrastructure,
pull request summary.

Do not require prompt histories. The useful question is where generated assumptions may have entered the change.

AI disclosure is useful, but the high-risk review still applies even when AI use is not declared.

3. Require author ownership

For high-risk changes, the author must explain:

the main logic path,
the failure path,
the most important edge case,
the source of authority used for permissions or data access,
the reason for the chosen approach,
what could break if the main assumption is wrong.

If the author cannot explain these points, the pull request is not ready for merge.

4. Require one negative test

Every high-risk pull request must include one negative test that challenges the riskiest assumption.

Examples:

For authorization, test that a user from another tenant cannot access the record by changing the identifier.
For payments, test that retry after timeout does not duplicate the charge.
For imports, test that malformed input fails without committing partial bad data.
For logging, test that logs do not expose tokens, personal data, payment data, or customer identifiers.
For external integrations, test timeout, duplicate callback, or invalid response handling.
For migrations, document and test the recovery path where possible.

The author must name the negative test and state which risk it covers.
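
For the authorization example, a negative test might look like the following Python sketch. The `Invoice` type, the in-memory `INVOICES` store, and `get_invoice` are hypothetical stand-ins for a real data layer; the point is the shape of the test, not the implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invoice:
    invoice_id: int
    tenant_id: str
    amount_cents: int


class AccessDenied(Exception):
    """Raised when a record is missing or belongs to a different tenant."""


# Hypothetical in-memory store standing in for the real data layer.
INVOICES = {
    1: Invoice(1, "tenant-a", 5000),
    2: Invoice(2, "tenant-b", 7500),
}


def get_invoice(session_tenant_id, invoice_id):
    """Load an invoice, trusting only the tenant taken from the session."""
    invoice = INVOICES.get(invoice_id)
    if invoice is None or invoice.tenant_id != session_tenant_id:
        # Same error for "missing" and "foreign" so identifiers cannot be probed.
        raise AccessDenied(f"invoice {invoice_id} not available")
    return invoice


def test_other_tenant_cannot_read_invoice_by_changing_id():
    """Negative test: tenant-a guesses tenant-b's invoice id and must be refused."""
    try:
        get_invoice("tenant-a", 2)
    except AccessDenied:
        return
    raise AssertionError("cross-tenant read was allowed")


test_other_tenant_cannot_read_invoice_by_changing_id()
```

The test name states the risk it covers, which makes it easy to reference in the pull request description.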

5. Review dependencies and configuration separately

Any new dependency, workflow change, permission change, environment variable, service access, infrastructure setting, or deployment change must be reviewed as a separate risk item.

For a new dependency, the reviewer should confirm:

the package is real,
the name is correct,
the package is maintained,
the license is acceptable,
transitive dependencies are understood,
install or build scripts are acceptable,
the package does not create unwanted access to files, secrets, network calls, parsing paths, authentication, or customer data,
existing platform code cannot reasonably do the same job.

This matters because package hallucination creates supply-chain risk. In the USENIX study, models recommended non-existent packages; such names can later be registered by attackers and used in package-confusion attacks.
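
Part of the name check can be automated. The Python sketch below compares newly added package names against an approved set and flags near-misses, which may be typos or hallucinated names. `APPROVED_PACKAGES` is a hypothetical allowlist, and the check does not confirm that a package exists on a registry or is maintained; those items still need a human reviewer.

```python
from difflib import get_close_matches

# Hypothetical allowlist; it could come from a lockfile or an internal registry.
APPROVED_PACKAGES = {"requests", "numpy", "pydantic", "sqlalchemy"}


def review_new_dependencies(new_packages):
    """Split new package names into approved ones and ones needing manual review."""
    approved, needs_review = [], {}
    for name in new_packages:
        normalized = name.strip().lower()
        if normalized in APPROVED_PACKAGES:
            approved.append(normalized)
        else:
            # A near-miss of an approved name may be a typo or a hallucinated package.
            needs_review[normalized] = get_close_matches(
                normalized, sorted(APPROVED_PACKAGES), n=1, cutoff=0.8
            )
    return approved, needs_review


approved, needs_review = review_new_dependencies(["requests", "reqeusts", "fastjsonx"])
print(approved)
print(needs_review)
```

A name that lands in `needs_review` with a suggested close match deserves particular attention, because it resembles a known package without being one.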

For CI/CD changes, the reviewer should confirm:

an owner reviewed the change,
required checks still run,
token permissions are restricted,
access to secrets is reviewed,
third-party actions or reusable workflows are approved,
production deployment still requires manual approval.

GitHub’s Actions security guidance states that third-party actions can access secrets or use repository tokens, and that pinning an action to a full-length commit SHA is the only way to use an action as an immutable release.
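
The pinning rule is easy to check mechanically. The following Python sketch scans workflow text for `uses:` references that are not pinned to a 40-character commit SHA. It is a simplified text scan rather than a YAML parser, and the `example-org/deploy-action` reference and its SHA are placeholders invented for the example.

```python
import re

# Matches "uses: owner/repo@ref" lines; local actions have no "@ref" and are skipped.
USES_LINE = re.compile(r"^\s*(?:-\s*)?uses:\s*([^\s@#]+)@([^\s#]+)", re.MULTILINE)
FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def unpinned_actions(workflow_text):
    """Return action references that are not pinned to a full-length commit SHA."""
    findings = []
    for action, ref in USES_LINE.findall(workflow_text):
        if not FULL_SHA.match(ref):
            findings.append(f"{action}@{ref}")
    return findings


# Placeholder workflow: the second action and its SHA are made up.
WORKFLOW = """
steps:
  - uses: actions/checkout@v4
  - uses: example-org/deploy-action@0123456789abcdef0123456789abcdef01234567
"""

print(unpinned_actions(WORKFLOW))
```

A check like this can run as a required status check on workflow changes, so that a tag or branch reference cannot be merged without someone noticing.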

6. Require owner approval for the changed boundary

High-risk changes must be approved by someone who owns the affected boundary.

Examples:

Authentication and authorization changes need approval from the security or platform owner.
Payment changes need approval from the owner of billing or payments.
Tenant-boundary changes need approval from someone responsible for the product’s data isolation model.
CI/CD and infrastructure changes need approval from the platform or delivery owner.
Migration changes need approval from the database or backend owner.

Choose the reviewer to match the risk of the change, not according to who happens to be available.
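
On GitHub, CODEOWNERS combined with branch protection can enforce this. Where a custom check is needed, the logic can be as small as the Python sketch below. The team names in `BOUNDARY_OWNERS` are hypothetical, and the sketch assumes the list of approving teams is already known from the review platform.

```python
# Hypothetical mapping from risk area to the team that owns that boundary.
BOUNDARY_OWNERS = {
    "authentication": "security-team",
    "payments or billing": "billing-team",
    "CI/CD": "platform-team",
    "database migrations": "backend-team",
}


def missing_owner_approvals(risk_areas, approving_teams):
    """Return the risk areas that no owning team has approved yet."""
    missing = []
    for area in risk_areas:
        owner = BOUNDARY_OWNERS.get(area)
        # An area with no registered owner is treated as unapproved.
        if owner is None or owner not in approving_teams:
            missing.append(area)
    return missing


print(missing_owner_approvals(["authentication", "CI/CD"], {"security-team"}))
```

Treating an area without a registered owner as unapproved is a deliberate choice here: it surfaces gaps in ownership instead of silently letting them through.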

7. Keep agents inside normal delivery controls

If an AI tool can edit files, run commands, open pull requests, change configuration, or call tools, treat it as an actor in the delivery process.

The agent must not be able to bypass controls a human developer has to pass.

That means:

no direct push to protected branches,
no production deployment without manual approval,
no production secrets by default,
no database migration without named human ownership,
no infrastructure change without platform-owner approval,
no CI/CD change without owner review and passing checks,
no dependency addition without dependency review.

The damage an agent can cause should be capped by its permissions, and those permissions should stay well below what it would take to do the worst possible damage to the system.
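
One way to picture these rules is as a single policy function that every actor passes through. The Python sketch below is illustrative only: the action names are hypothetical, and it assumes `approved_by` is a verified human identity, which the sketch itself does not check.

```python
# Hypothetical action names derived from the list above.
DENIED_BY_DEFAULT = {"push_to_protected_branch", "read_production_secrets"}
NEEDS_HUMAN_APPROVAL = {
    "deploy_to_production",
    "run_database_migration",
    "change_infrastructure",
    "change_ci_cd",
    "add_dependency",
}


def decide(actor, action, approved_by=None):
    """Return "deny", "needs_approval", or "allow" for a requested action."""
    if action in DENIED_BY_DEFAULT:
        return "deny"
    if action in NEEDS_HUMAN_APPROVAL:
        # Self-approval does not count, whoever the actor is.
        if approved_by is None or approved_by == actor:
            return "needs_approval"
    return "allow"


print(decide("coding-agent", "add_dependency"))
print(decide("coding-agent", "add_dependency", approved_by="alice"))
```

The function has no branch that treats an agent more leniently than a person, which is the property the rules above are asking for.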

8. Keep review evidence in the pull request

For high-risk changes, the pull request must record:

what problem was solved,
which high-risk area changed,
where AI was used, when known,
what assumption was checked,
which negative test was added,
whether dependencies, permissions, workflows, environment variables, or service accesses changed,
who approved the sensitive area,
what checks passed,
what rollback or forward-fix path exists for release-sensitive changes.

This is not paperwork for its own sake. It prevents future incident review, audit work, and maintenance from depending on memory or Slack messages.
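
If the pull request template uses simple "Label: value" lines, a small check can flag missing evidence before a reviewer is assigned. The Python sketch below assumes that format; the field labels are hypothetical and only mirror the list above.

```python
# Hypothetical labels mirroring the evidence list above.
REQUIRED_FIELDS = [
    "Problem solved",
    "High-risk area",
    "AI assistance",
    "Assumption checked",
    "Negative test",
    "Dependency or configuration changes",
    "Sensitive-area approver",
    "Checks passed",
    "Rollback or forward-fix path",
]


def missing_evidence(pr_body):
    """Return required fields that are absent or left empty in the description."""
    filled = {}
    for line in pr_body.splitlines():
        label, sep, value = line.partition(":")
        if sep:
            filled[label.strip().lower()] = value.strip()
    return [field for field in REQUIRED_FIELDS if not filled.get(field.lower())]


PR_BODY = """Problem solved: retries duplicated charges
High-risk area: payments or billing
AI assistance: implementation, tests
Assumption checked: retry reuses the idempotency key
Negative test:
Checks passed: unit, integration
"""

print(missing_evidence(PR_BODY))
```

The check only confirms that a field was filled in. Whether the content is adequate remains a reviewer's judgment.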

Merge rule for high-risk AI-assisted pull requests

A high-risk AI-assisted pull request can be merged only when all of the following are true:

The risk area is marked.
AI-assisted parts are identified when known.
The author explains the main logic and failure path.
One negative test challenges the riskiest assumption.
Dependencies and configuration changes are explicitly reviewed.
The affected area has owner approval.
Required checks for the protected branch pass.
Production-impacting changes still require manual approval.
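
The merge rule is a plain conjunction, which makes it straightforward to encode. The Python sketch below models the eight conditions as boolean fields; the field names are hypothetical, and how each value is obtained (template, CI, review platform) is left open.

```python
from dataclasses import dataclass, fields


@dataclass
class MergeEvidence:
    """One boolean per condition in the merge rule above."""
    risk_area_marked: bool
    ai_parts_identified_when_known: bool
    author_explained_logic_and_failure_path: bool
    negative_test_present: bool
    dependencies_and_config_reviewed: bool
    owner_approved: bool
    required_checks_passed: bool
    manual_approval_kept_for_production: bool


def unmet_conditions(evidence):
    """Return the names of the conditions that are not yet satisfied."""
    return [f.name for f in fields(evidence) if not getattr(evidence, f.name)]


def can_merge(evidence):
    """A high-risk pull request merges only when every condition holds."""
    return not unmet_conditions(evidence)


evidence = MergeEvidence(True, True, True, False, True, True, True, True)
print(can_merge(evidence), unmet_conditions(evidence))
```

Returning the list of unmet conditions, rather than only a yes or no, tells the author exactly what is still missing.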

How to roll it out in 30 days

Week 1: update the pull request template. Add fields for risk area, AI assistance, reviewer focus, negative test, dependencies, configuration, and release evidence.

Week 2: define high-risk areas. Start with authentication, authorization, tenant boundaries, customer data, payments, audit logs, external integrations, infrastructure, CI/CD, secrets, migrations, and production deployment.

Week 3: assign owners and enforce approval. Use CODEOWNERS, branch protection, rulesets, or the equivalent in your platform. Require owner approval for high-risk areas and passing checks before merge.

Week 4: enforce the evidence gate. High-risk changes need one negative test, author explanation of the failure path, explicit review of dependencies and configuration, and a rollback or forward-fix note for release-sensitive changes.

This keeps low-risk work moving and puts stronger controls around the places where generated code can cause real damage.

AI can help produce the draft. The delivery system still has to decide whether the change belongs in production.
