
So I had this terrible piece of code.

You know the type. Works, but barely. Nested ifs everywhere. Variables named x and temp. Zero error handling. The kind of code you write at 2 AM and promise to fix later.

Except "later" never comes.

So I decided to test something: What if I gave this exact same messy code to every major AI coding model?

Claude Sonnet 4.5. GPT-5. Gemini 2.0 Pro. GitHub Copilot.

Same code. Same prompt. Let them refactor it.

And the results? Honestly shocked me.

Before I show you what happened, let me give you context.

Claude Sonnet 4.5 launched recently, claiming to be "the best coding model in the world" with a 77.2% score on SWE-bench Verified.

That's higher than GPT-5 (72.8%), Claude Opus 4.1 (74.5%), and way higher than where these models were just months ago.

But benchmarks are one thing.

So I ran a simple test. One messy function. Four AI models. Let's see who actually delivers.

The Messy Code (What I Started With)

Here's the code I gave every model. It's a user data fetcher with validation. It works, but it's awful:

Yeah. It's bad.

The Prompt (Same For Everyone)

I kept it simple and identical for all four models:

"Refactor this code to be clean, maintainable, and production-ready. Improve error handling, readability, and follow best practices."

That's it. No special instructions. No hints. Just refactor it properly.

Now, let me show you what each model did.

AI NEWS:

Tools That Made Me Stop & Stare

1. ThumblifyAI: Make Thumbnails That Grab Attention

ThumblifyAI turns your ideas into clickable thumbnails in seconds, no design skills needed. Upload a sketch or describe what you want, and it handles layout, styling, and polish automatically. It also suggests color combinations, fonts, and layouts that perform best for YouTube and social media.

My take: For creators constantly fighting for eyeballs, this tool saves tons of time and stress. No more fumbling with Photoshop, just idea in, thumbnail out.

2. VidAU: Make video ads in minutes

VidAU makes video ads for you. Like, literally makes them. You just paste your product link from Amazon or Shopify. Then the AI creates a full video ad - with voice, text, everything. It speaks 40+ languages too! No video editing skills needed. No hiring expensive teams. The AI does it all in minutes.

My take: Perfect for people selling stuff online who want cool ads but don't know how to make videos. This is a lifesaver for small sellers! Making videos used to mean spending big money. Now, anyone can make pro-looking ads. Love it.

3. Scaloom: Reddit marketing on auto-pilot (without spamming)

Scaloom is like a robot that does Reddit marketing for you. It finds the right Reddit groups where your customers hang out. Then it watches your posts and replies to comments automatically - but in a helpful, natural way. It works 24/7, even when you're sleeping. No more spending hours on Reddit trying to promote your stuff.

My take: Reddit is exhausting to use for business. This tool saves so much time! For startups and solopreneurs, it’s like having a set-it-and-forget-it marketing engine on Reddit. Having an AI handle conversations smartly? Yes please.

Test 1: Claude Sonnet 4.5

Time taken: 8 seconds

What it did: Claude immediately added proper error handling, extracted validation logic, and included JSDoc comments.

You can get the full result here.

What I noticed:

  • Stayed in JavaScript (didn't assume TypeScript)

  • Extracted helper functions (isValidId, isValidUserData)

  • Proper async/await instead of promise chains

  • Custom error class with error codes

  • Re-throws errors for the caller to handle

  • Very detailed JSDoc comments
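To make those bullets concrete, here is a minimal sketch of that style. It is not Claude's actual output: only the helper names isValidId and isValidUserData come from the list above, while the endpoint, the error codes, the field names, and the injectable fetchImpl parameter are assumptions made for illustration.

```javascript
// Hypothetical sketch: endpoint, error codes, and field names are assumptions.
const USERS_URL = "https://api.example.com/users";

/** Error that carries a machine-readable code alongside the message. */
class UserFetchError extends Error {
  /**
   * @param {string} message Human-readable description.
   * @param {string} code Machine-readable error code.
   */
  constructor(message, code) {
    super(message);
    this.name = "UserFetchError";
    this.code = code;
  }
}

/** @param {unknown} id @returns {boolean} True for positive integers. */
function isValidId(id) {
  return Number.isInteger(id) && id > 0;
}

/** @param {unknown} data @returns {boolean} True when name and email are strings. */
function isValidUserData(data) {
  return (
    typeof data === "object" &&
    data !== null &&
    typeof data.name === "string" &&
    typeof data.email === "string"
  );
}

/**
 * Fetches one user and validates the payload.
 * @param {number} id
 * @param {Function} [fetchImpl] Injectable for tests (an assumption).
 * @returns {Promise<object>}
 * @throws {UserFetchError}
 */
async function fetchUser(id, fetchImpl = globalThis.fetch) {
  if (!isValidId(id)) {
    throw new UserFetchError(`Invalid user id: ${String(id)}`, "INVALID_ID");
  }
  try {
    const response = await fetchImpl(`${USERS_URL}/${id}`);
    if (!response.ok) {
      throw new UserFetchError(`Request failed with status ${response.status}`, "HTTP_ERROR");
    }
    const data = await response.json();
    if (!isValidUserData(data)) {
      throw new UserFetchError("Response is missing required fields", "INVALID_DATA");
    }
    return data;
  } catch (error) {
    // Known errors pass through unchanged; anything else is wrapped, then re-thrown.
    if (error instanceof UserFetchError) throw error;
    throw new UserFetchError(`Network failure: ${error.message}`, "NETWORK_ERROR");
  }
}
```

The caller decides what to do with a failure, and can branch on error.code instead of parsing message strings.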

Grade: A

Test 2: GPT-5

Time taken: 11 seconds

What it did: GPT-5 switched to TypeScript right away and took the safest route: it returns null for everything and never throws.

You can get the full result here.

What I noticed:

  • Never throws errors, always returns null

  • Type checks the ID (typeof id !== "number")

  • Uses console.warn vs console.error appropriately

  • Inline validation (no separate function)

  • Very defensive programming style
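A rough sketch of that defensive style follows. GPT-5's output was TypeScript; this version stays in plain JavaScript and is not its actual code. The function name getUserOrNull, the URL, the field names, and the injectable fetchImpl parameter are all assumptions.

```javascript
// Hypothetical sketch: URL, field names, and function name are assumptions.
async function getUserOrNull(id, fetchImpl = globalThis.fetch) {
  // Type check on the id, as in the list above.
  if (typeof id !== "number" || Number.isNaN(id)) {
    console.warn("getUserOrNull: id must be a number, got", typeof id);
    return null;
  }
  try {
    const response = await fetchImpl(`https://api.example.com/users/${id}`);
    if (!response.ok) {
      // Expected failure (for example a 404): warn rather than error.
      console.warn(`getUserOrNull: server responded with ${response.status}`);
      return null;
    }
    const data = await response.json();
    // Inline validation, no separate helper function.
    if (!data || typeof data.name !== "string" || typeof data.email !== "string") {
      console.warn("getUserOrNull: unexpected response shape");
      return null;
    }
    return { id, name: data.name, email: data.email };
  } catch (error) {
    // Unexpected failure (network down, invalid JSON): this one is an error.
    console.error("getUserOrNull: request failed", error);
    return null;
  }
}
```

Every path ends in either a user object or null, so the caller needs a null check but never a try/catch.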

Grade: B+

Test 3: Gemini 2.0 Pro

Time taken: 9 seconds

What it did: Gemini went full enterprise mode with numbered comments explaining each decision.

You can get the full result here.

What I noticed:

  • Stayed in JavaScript (didn't assume TypeScript)

  • Numbered comments teaching best practices

  • URL constant extraction

  • Optional chaining (?.)

  • Re-throws with context

  • Included usage examples

  • Educational approach
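The flavor of that output can be sketched roughly like this. It is an illustration rather than Gemini's actual code: the loadUser name, the URL, the profile.email field, and the injectable fetchImpl parameter are assumptions.

```javascript
// 1. Keep the endpoint in one named constant (the URL itself is an assumption).
const API_BASE_URL = "https://api.example.com/users";

async function loadUser(id, fetchImpl = globalThis.fetch) {
  try {
    // 2. Check the HTTP status explicitly: fetch does not reject on 4xx/5xx.
    const response = await fetchImpl(`${API_BASE_URL}/${id}`);
    if (!response.ok) {
      throw new Error(`HTTP ${response.status}`);
    }
    const data = await response.json();

    // 3. Optional chaining reads a nested field without a stack of nested ifs.
    //    (profile.email is a hypothetical field.)
    const email = data?.profile?.email;
    if (!email) {
      throw new Error("missing profile.email");
    }
    return { id, email };
  } catch (error) {
    // 4. Re-throw with context, keeping the original error as the cause.
    throw new Error(`loadUser(${id}) failed: ${error.message}`, { cause: error });
  }
}

// Usage example:
// loadUser(42)
//   .then((user) => console.log(user.email))
//   .catch((err) => console.error(err.message));
```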

Grade: B

Test 4: GitHub Copilot

Time taken: 6 seconds (fastest)

What it did: Copilot went for maximum robustness with detailed validation.

You can get the full result here.

What I noticed:

  • Uses Number.isInteger() for stricter validation

  • Adds Accept header to fetch

  • Validates response shape

  • Separate email validation function with domain check

  • Detailed error messages with context

  • Returns null consistently
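Those points might look something like the sketch below. This is not Copilot's actual output; the function names, the URL, the field names, the exact email rule, and the injectable fetchImpl parameter are assumptions.

```javascript
// Hypothetical sketch: names, URL, and the exact email rule are assumptions.

// Separate email check: exactly one "@", a non-empty local part, and a
// domain made of at least two non-empty labels (for example "example.com").
function isValidEmail(email) {
  if (typeof email !== "string") return false;
  const parts = email.split("@");
  if (parts.length !== 2) return false;
  const [local, domain] = parts;
  const labels = domain.split(".");
  return local.length > 0 && labels.length >= 2 && labels.every((label) => label.length > 0);
}

async function fetchUserSafe(id, fetchImpl = globalThis.fetch) {
  // Number.isInteger rejects strings, NaN, and fractions in a single check.
  if (!Number.isInteger(id) || id <= 0) {
    console.error(`fetchUserSafe: expected a positive integer id, received ${String(id)}`);
    return null;
  }
  try {
    const response = await fetchImpl(`https://api.example.com/users/${id}`, {
      headers: { Accept: "application/json" }, // ask for JSON explicitly
    });
    if (!response.ok) {
      console.error(`fetchUserSafe: user ${id} request failed with HTTP ${response.status}`);
      return null;
    }
    const data = await response.json();
    // Validate the response shape before trusting any field.
    if (typeof data !== "object" || data === null || typeof data.name !== "string") {
      console.error(`fetchUserSafe: user ${id} response has an unexpected shape`);
      return null;
    }
    if (!isValidEmail(data.email)) {
      console.error(`fetchUserSafe: user ${id} has an invalid email`);
      return null;
    }
    return data;
  } catch (error) {
    console.error(`fetchUserSafe: user ${id} request threw: ${error.message}`);
    return null;
  }
}
```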

Grade: A+

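Let me break down how each model approached the refactor:

  • Claude Sonnet 4.5: 8 seconds, JavaScript, custom error class, re-throws errors. Grade: A

  • GPT-5: 11 seconds, TypeScript, always returns null, never throws. Grade: B+

  • Gemini 2.0 Pro: 9 seconds, JavaScript, re-throws with context. Grade: B

  • GitHub Copilot: 6 seconds, returns null consistently. Grade: A+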

What This Actually Tells Us

The benchmarks say Claude Sonnet 4.5 is the best coding model, at 77.2% on SWE-bench Verified.

But in this real test? GitHub Copilot produced the most production-ready code.

Why?

Because benchmarks test problem-solving. Real work tests judgment.

GitHub Copilot understood that API calls need robust error handling. That returning error states is better than throwing exceptions. That email validation should be thorough.
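The difference between returning an error state and throwing shows up on the caller's side. The sketch below is only an illustration of the two styles; both function names and the shape of the loader argument are made up for it.

```javascript
// Hypothetical sketch: both callers take a loader function as an argument.

// Style A: the loader throws, so the caller needs a try/catch but can
// report why the load failed (here via an optional error code).
async function greetWithThrowingLoader(loadUser) {
  try {
    const user = await loadUser();
    return `Hello, ${user.name}`;
  } catch (error) {
    return `Could not load user (${error.code ?? "UNKNOWN"})`;
  }
}

// Style B: the loader returns null on any failure, so the caller needs
// only a null check, but the reason for the failure stays in the logs.
async function greetWithNullableLoader(loadUser) {
  const user = await loadUser();
  return user === null ? "Could not load user" : `Hello, ${user.name}`;
}
```

Style B keeps the calling code short and hard to crash. Style A keeps the failure reason available to the caller, which is the trade-off behind the different grades above.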

Which One Actually Won?

1st Place: GitHub Copilot: I did NOT expect this.

2nd Place: Claude Sonnet 4.5: Claude optimized for flexibility. But the custom error class with error codes is genuinely useful for large apps.

3rd Place: Gemini 2.0 Pro: Gemini optimized for teaching. Best for learning. Perfect for junior devs or documentation. Not the cleanest code, but the most educational.

4th Place: GPT-5: Solid, safe, boring. Returns null everywhere, never throws. Did the job, nothing more.

All four models can refactor code.

But they refactor for different priorities.

And that means "which is best" is the wrong question.

The right question is: "Which fits how YOU work?"

Because at this level, they're all good enough.

We're not debating if AI can code anymore.

We're debating which AI codes the way we want.

That's a completely different conversation.

Until next time,
Vaibhav 🤝
