AI-Assisted Code Reviews: What the Latest Research Reveals About GPT in Pull Request Workflows


Zoia Baletska

2 December 2025


At ZEN, we keep a close eye on emerging research that affects how engineering teams build, ship, and maintain software. Recently, we came across a new study, “The Impact of Large Language Models (LLMs) on Code Review Process” (Antonio Collante, Samuel Abedu, SayedHassan Khatoonabadi, Ahmad Abdellatif, Ebube Alor, Emad Shihab), which analyses more than 25,000 GitHub pull requests to understand how GPT-style models influence code review speed and collaboration.

The findings are both promising and highly relevant for modern engineering teams working in cloud-native, agile, or distributed environments.

Below is our breakdown of the study — and what we think teams can take away from it.

What the Study Found

The researchers compiled a dataset of 25,473 pull requests from 9,254 GitHub repositories, identifying around 1,600 GPT-assisted PRs.

Here are the standout results:

1. GPT-assisted PRs are merged significantly faster

  • Median merge time for GPT-assisted PRs: ~9 hours

  • Median merge time for standard PRs: ~23 hours
    That's roughly a 61% reduction.

2. Faster time to first review comment

The “At Review” phase improved dramatically:

  • GPT-assisted PRs: 1 hour

  • Non-assisted PRs: 3 hours
    An improvement of ~66.7%.

3. Huge impact on the “Waiting for Changes” phase

This is where the biggest gains show up:

  • GPT-assisted PRs median wait: ~3 hours

  • Non-assisted median wait: ~24 hours
    An 87.5% improvement.

4. How developers actually use GPT

The study classified GPT usage in PRs as:

  • 60% – Enhancements (refactoring, renaming, error handling)

  • 26% – Bug fixes

  • 12% – Documentation

These patterns reflect how teams are naturally using LLMs: to accelerate small but important improvements that often clog review cycles.

Why This Matters

From our perspective, the findings illustrate something we’ve observed in the industry:

LLMs aren’t replacing reviewers — they’re compressing the waiting time around reviews.

The biggest bottleneck in PR workflows isn’t usually quality or complexity — it’s idle time: waiting for feedback, waiting for changes, waiting for someone to pick up the next step.

By providing instant suggestions and improvements, GPT reduces that downtime and keeps the workflow moving. For modern teams, especially those practising trunk-based development or high-velocity delivery, this is significant.

How Teams Can Apply These Insights

1. Use LLMs to support (not replace) reviewers

The study shows the biggest improvements in enhancements and bug fixes — not major features or architectural decisions.
Teams can benefit by using GPT for:

  • cleaning up code

  • eliminating trivial review comments

  • improving naming and documentation

  • drafting initial fixes or refactor suggestions

This leaves human reviewers free to focus on design, architecture, and correctness.

2. Be transparent about AI assistance

The authors observed that GPT-assisted PRs were identifiable through commit messages, PR descriptions, and patterns of changes.
Teams may want to formalise this by:

  • adding an “LLM assistance used?” checkbox to the PR template (see the sketch below)

  • writing guidelines for acceptable usage

  • documenting reviewer expectations

This reduces ambiguity and builds trust.
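As a minimal sketch of what enforcing that disclosure could look like, here is a small CI check that scans a pull request description for the checkbox and fails if it is missing. The marker text, the PR_BODY environment variable, and the GitHub Actions wiring are our own assumptions for illustration, not anything prescribed by the study.

```python
import os
import re
import sys

# Hypothetical marker we assume lives in the team's PR template, e.g.:
#   - [x] LLM assistance used
MARKER = re.compile(r"- \[( |x|X)\] LLM assistance used")

def has_disclosure(pr_body: str) -> bool:
    """Return True if the PR description contains the disclosure checkbox."""
    return MARKER.search(pr_body) is not None

if __name__ == "__main__":
    # In GitHub Actions, the PR body could be passed in from the workflow:
    #   env:
    #     PR_BODY: ${{ github.event.pull_request.body }}
    body = os.environ.get("PR_BODY", "")
    if not has_disclosure(body):
        print("PR description is missing the 'LLM assistance used' checkbox.")
        sys.exit(1)
    print("LLM-assistance disclosure found.")
```

Whether the box is ticked matters less than the fact that every PR answers the question; that alone removes the guesswork the study's authors had to resolve with heuristics.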

3. Track and measure real impact

The study used clear PR-level metrics (merge time, time-to-review, waiting time). Teams can replicate this internally to see whether LLM assistance is actually accelerating workflows — or just adding noise.
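As a starting point, the sketch below pulls recently closed PRs from the GitHub REST API and computes the median merge time, the headline metric from the study. The repository name is a placeholder and the flat pagination is deliberately naive; created_at and merged_at are real fields on the pulls endpoint.

```python
from datetime import datetime
from statistics import median

import requests  # third-party: pip install requests

REPO = "your-org/your-repo"  # placeholder: point this at your own repository
API = f"https://api.github.com/repos/{REPO}/pulls"

def parse_ts(ts: str) -> datetime:
    # GitHub timestamps look like "2025-12-02T10:00:00Z".
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")

def merge_times_hours(pages: int = 3) -> list[float]:
    """Merge times in hours for recently closed, merged PRs."""
    hours = []
    for page in range(1, pages + 1):
        resp = requests.get(
            API,
            params={"state": "closed", "per_page": 100, "page": page},
            headers={"Accept": "application/vnd.github+json"},
        )
        resp.raise_for_status()
        for pr in resp.json():
            if pr.get("merged_at"):  # closed-but-unmerged PRs have merged_at = null
                delta = parse_ts(pr["merged_at"]) - parse_ts(pr["created_at"])
                hours.append(delta.total_seconds() / 3600)
    return hours

if __name__ == "__main__":
    times = merge_times_hours()
    if times:
        print(f"PRs analysed: {len(times)}")
        print(f"Median merge time: {median(times):.1f} hours")
```

Time to first review can be derived the same way from the pull request reviews endpoint. The point is to measure the same PR-level metrics before and after adopting LLM assistance, rather than relying on impressions.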

4. Train teams on safe and effective usage

The study mentions common pitfalls of LLM-generated changes:

  • context loss

  • misinterpretation of code

  • superficial fixes

  • incorrect refactoring under token limits

Teams should treat LLM output like a strong-but-junior engineer’s suggestion: useful, but always reviewed critically.

5. Keep humans in the loop

Even though metrics improved dramatically, quality wasn't evaluated in this study.
Our takeaway: LLMs can speed up the mechanics of a review, but not the judgment.

Design decisions, security concerns, domain logic, and architectural tradeoffs still require experienced engineers.

Limitations to Keep in Mind

We appreciate that the study is rigorous but also transparent about its constraints:

  • It uses open-source GitHub projects — enterprise workflows may behave differently.

  • GPT-assisted PRs were detected through heuristics, not precise logs.

  • PRs assisted by GPT may skew toward simpler tasks.

  • Quality of code changes wasn't analysed, only timing.

These limitations are important for interpreting the results responsibly.

Our Takeaway

This study provides one of the earliest large-scale, quantitative assessments of how LLMs impact code review workflows — and the results are encouraging.

Our takeaway as the ZEN team is simple:

LLMs dramatically speed up the slow, idle stages of PR workflows — not by replacing human reviewers, but by reducing friction around them.

For engineering teams looking to improve developer experience, flow efficiency, and delivery speed, integrating LLM-assisted review practices is a promising direction — as long as it’s paired with strong human oversight and sensible guidelines.

