AI-Assisted Code Reviews: What the Latest Research Reveals About GPT in Pull Request Workflows


Zoia Baletska

2 December 2025


At ZEN, we keep a close eye on emerging research that affects how engineering teams build, ship, and maintain software. Recently, we came across a new study, “The Impact of Large Language Models (LLMs) on Code Review Process” (Antonio Collante, Samuel Abedu, SayedHassan Khatoonabadi, Ahmad Abdellatif, Ebube Alor, Emad Shihab), which analyses more than 25,000 GitHub pull requests to understand how GPT-style models influence code review speed and collaboration.

The findings are both promising and highly relevant for modern engineering teams working in cloud-native, agile, or distributed environments.

Below is our breakdown of the study — and what we think teams can take away from it.

What the Study Found

The researchers compiled a dataset of 25,473 pull requests from 9,254 GitHub repositories, identifying around 1,600 GPT-assisted PRs.

Here are the standout results:

1. GPT-assisted PRs are merged significantly faster

  • Median merge time for GPT-assisted PRs: ~9 hours

  • Median merge time for standard PRs: ~23 hours
    That's roughly a 61% reduction.

2. Faster time to first review comment

The “At Review” phase improved dramatically:

  • GPT-assisted PRs: 1 hour

  • Non-assisted PRs: 3 hours
    An improvement of ~66.7%.

3. Huge impact on the “Waiting for Changes” phase

This is where the biggest gains show up:

  • GPT-assisted PRs median wait: ~3 hours

  • Non-assisted median wait: ~24 hours
    An 87.5% improvement.

4. How developers actually use GPT

The study classified GPT usage in PRs as:

  • 60% – Enhancements (refactoring, renaming, error handling)

  • 26% – Bug fixes

  • 12% – Documentation

These patterns reflect how teams are naturally using LLMs: to accelerate small but important improvements that often clog review cycles.

Why This Matters

From our perspective, the findings illustrate something we’ve observed in the industry:

LLMs aren’t replacing reviewers — they’re compressing the waiting time around reviews.

The biggest bottleneck in PR workflows isn’t usually quality or complexity — it’s idle time: waiting for feedback, waiting for changes, waiting for someone to pick up the next step.

By providing instant suggestions and improvements, GPT reduces that downtime and keeps the workflow moving. For modern teams, especially those practising trunk-based development or high-velocity delivery, this is significant.

How Teams Can Apply These Insights

1. Use LLMs to support (not replace) reviewers

The study shows the biggest improvements in enhancements and bug fixes — not major features or architectural decisions.
Teams can benefit by using GPT for:

  • cleaning up code

  • eliminating trivial review comments

  • improving naming and documentation

  • drafting initial fixes or refactor suggestions

This leaves human reviewers free to focus on design, architecture, and correctness.

2. Be transparent about AI assistance

The authors observed that GPT-assisted PRs were identifiable through commit messages, PR descriptions, and patterns of changes.
Teams may want to formalise this by:

  • adding an “LLM assistance used?” checkbox to the PR template (see the sketch below)

  • writing guidelines for acceptable usage

  • documenting reviewer expectations

This reduces ambiguity and builds trust.
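As a minimal sketch of what enforcing that disclosure could look like, here is a small CI check that scans a pull request description for the checkbox and fails if it is missing. The marker text, the PR_BODY environment variable, and the GitHub Actions wiring are our own assumptions for illustration, not anything prescribed by the study.

```python
import os
import re
import sys

# Hypothetical marker we assume lives in the team's PR template, e.g.:
#   - [x] LLM assistance used
MARKER = re.compile(r"- \[( |x|X)\] LLM assistance used")

def has_disclosure(pr_body: str) -> bool:
    """Return True if the PR description contains the disclosure checkbox."""
    return MARKER.search(pr_body) is not None

if __name__ == "__main__":
    # In GitHub Actions, the PR body could be passed in from the workflow:
    #   env:
    #     PR_BODY: ${{ github.event.pull_request.body }}
    body = os.environ.get("PR_BODY", "")
    if not has_disclosure(body):
        print("PR description is missing the 'LLM assistance used' checkbox.")
        sys.exit(1)
    print("LLM-assistance disclosure found.")
```

Whether the box is ticked matters less than the fact that every PR answers the question; that alone removes the guesswork the study's authors had to resolve with heuristics.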

3. Track and measure real impact

The study used clear PR-level metrics (merge time, time-to-review, waiting time). Teams can replicate this internally to see whether LLM assistance is actually accelerating workflows — or just adding noise.
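As a starting point, the sketch below pulls recently closed PRs from the GitHub REST API and computes the median merge time, the headline metric from the study. The repository name is a placeholder and the flat pagination is deliberately naive; created_at and merged_at are real fields on the pulls endpoint.

```python
from datetime import datetime
from statistics import median

import requests  # third-party: pip install requests

REPO = "your-org/your-repo"  # placeholder: point this at your own repository
API = f"https://api.github.com/repos/{REPO}/pulls"

def parse_ts(ts: str) -> datetime:
    # GitHub timestamps look like "2025-12-02T10:00:00Z".
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")

def merge_times_hours(pages: int = 3) -> list[float]:
    """Merge times in hours for recently closed, merged PRs."""
    hours = []
    for page in range(1, pages + 1):
        resp = requests.get(
            API,
            params={"state": "closed", "per_page": 100, "page": page},
            headers={"Accept": "application/vnd.github+json"},
        )
        resp.raise_for_status()
        for pr in resp.json():
            if pr.get("merged_at"):  # closed-but-unmerged PRs have merged_at = null
                delta = parse_ts(pr["merged_at"]) - parse_ts(pr["created_at"])
                hours.append(delta.total_seconds() / 3600)
    return hours

if __name__ == "__main__":
    times = merge_times_hours()
    if times:
        print(f"PRs analysed: {len(times)}")
        print(f"Median merge time: {median(times):.1f} hours")
```

Time to first review can be derived the same way from the pull request reviews endpoint. The point is to measure the same PR-level metrics before and after adopting LLM assistance, rather than relying on impressions.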

4. Train teams on safe and effective usage

The study mentions common pitfalls of LLM-generated changes:

  • context loss

  • misinterpretation of code

  • superficial fixes

  • incorrect refactoring under token limits

Teams should treat LLM output like a strong-but-junior engineer’s suggestion: useful, but always reviewed critically.

5. Keep humans in the loop

Even though metrics improved dramatically, quality wasn't evaluated in this study.
Our takeaway: LLMs can speed up the mechanics of a review, but not the judgment.

Design decisions, security concerns, domain logic, and architectural tradeoffs still require experienced engineers.

Limitations to Keep in Mind

We appreciate that the study is rigorous but also transparent about its constraints:

  • It uses open-source GitHub projects — enterprise workflows may behave differently.

  • GPT-assisted PRs were detected through heuristics, not precise logs.

  • PRs assisted by GPT may skew toward simpler tasks.

  • Quality of code changes wasn't analysed, only timing.

These limitations are important for interpreting the results responsibly.

Our Takeaway

This study provides one of the earliest large-scale, quantitative assessments of how LLMs impact code review workflows — and the results are encouraging.

Our takeaway as the ZEN team is simple:

LLMs dramatically speed up the slow, idle stages of PR workflows — not by replacing human reviewers, but by reducing friction around them.

For engineering teams looking to improve developer experience, flow efficiency, and delivery speed, integrating LLM-assisted review practices is a promising direction — as long as it’s paired with strong human oversight and sensible guidelines.

