
Every AI image model has had the same problem: getting readable, correctly spelled text into the image.

FireRed dropped recently, claiming to have finally solved it (something we’ve been hearing for months now).

So I put it to the test and want to show you what came out.

But first, let's catch up on AI this week:

NEWS

TOOLS

1. Balance: An AI-native accounting firm that manages bookkeeping, taxes, and financial operations for small and mid-sized businesses. It aims to replace fragmented finance tools and manual workflows with a unified, automated AI system.

2. Clam: A semantic firewall designed to monitor and control what AI agents can access and execute across systems. Built for the rise of autonomous agents, it focuses on permissions, safety, and guardrails for real-world AI workflows.

3. Piris Labs: Infrastructure for ultra-fast AI inference aiming to dramatically reduce latency and speed up how models run in production. The startup focuses on optimizing performance across hardware and software so AI applications can operate closer to real time at scale.

The Tests

The three models in the arena today:

Qwen Image 2.0: It is free and requires no signup.

FireRed-Image-Edit: It is open-source, built by Xiaohongshu's AI team (China's Instagram x Pinterest, 300M+ users).

GPT Image 1.5: the default for most people.

All five test prompts are at the end, along with a scoring framework so you can run these tests on any model yourself.

Test 1: The vintage bookshop window

A rain-soaked bookshop window at night….

The trick here is the neon sign should appear reversed in the window reflection.

FireRed got the atmosphere right, and its warm amber glow looks beautiful. But the neon reads forward instead of reversed.

Good vibes but bad accuracy.

Qwen did the opposite, and it is the only model that understood how reflections work.

GPT Image 1.5 tried to reverse the neon but scrambled the letters. The card is clean though.

Winner: Qwen

Test 2: The architect's desk

Overhead shot of an architect's desk, late afternoon sun casting…

Five different text elements on five different surfaces.

FireRed nailed the lighting but I could barely read any of the text.

Qwen got everything readable except two tiny typos. Very close.

GPT got every word right, and its shadow play is the most photorealistic of the three. You could drop this straight into a pitch deck.

Winner: GPT Image 1.5

Test 3: The Tokyo street food stall

A Tokyo yatai (street food stall) at 11 PM. The vendor is an…

The kanji on the lanterns is the tell. Real characters or random strokes pretending to be Japanese?

FireRed got the mood right but the kanji is gibberish. Random strokes that look Japanese from a distance.

Qwen felt like a documentary photo. Three salary-men with loosened ties, beer cans on the counter. This looks like someone actually took this at 11 PM in Yurakucho.

Best composition by far.

GPT surprised me. The lantern reads "yakitori" in actual hiragana, which is contextually correct Japanese, and you can see individual coals glowing on the grill.

Winner: Tie (Qwen and GPT Image 1.5)

Test 4: The Havana taxi

A 1957 Chevrolet Bel Air taxi parked on a narrow Havana street at…

The taxi door lettering should look hand-painted. And I wanted to see if any model could render the Coca-Cola ghost sign bleeding through the revolutionary mural.

FireRed got the brush strokes right. They are uneven and imperfect, and they actually look hand-painted.

It understood the aesthetic but not the words.

Qwen got every word correct.

GPT was close. It has the richest Kodachrome color though.

Winner: Qwen

Test 5: The vinyl record store

Interior of a cramped vinyl record store at night. Wooden crates overflowing with…

Seven text elements across five surfaces. Genre dividers, record label, neon sign, chalkboard with three albums and prices, and door poster.

FireRed got "JAZZ" and "SOUL" on the dividers. That's it. Seven elements was too many for it.

Qwen hit almost everything, and only missed the record label.

GPT got the neon and most genres. But you have to squint at the chalkboard, the record label is incorrect, and there's no door poster.

Winner: Qwen

My take

FireRed made the most beautiful failures I've seen.

Atmosphere and accuracy are still two different pipelines inside these models.

Text rendering is a separate problem altogether, but nobody is treating it like one.

There's also a complexity ceiling none of them have cracked yet.

  • GPT Image 1.5 nailed five text elements on the architect's desk but couldn't manage seven in the vinyl store.

  • Qwen got through most of the seven in the vinyl store but still missed the record label.

These models compress the prompt into a vibe and hope the details survive. Past a certain point, they don't.

The free model won, mainly because the Havana taxi shot from Qwen is the best image in this entire piece, and it cost nothing.

I don't know what to make of that yet.

The architect's desk by GPT Image 1.5 is untouchable, but when your $20/month Pro tier loses to a free model on text accuracy, the gap isn't where most people think it is.

The model that merges Qwen's accuracy with GPT's polish wins the whole thing.

I'm sharing all five prompts along with a scoring framework and comparison template so you can run these tests yourself on these models or whatever drops next month.
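If you'd rather tally text accuracy in code than by eye, here is a rough sketch of how one could do it. To be clear, this is not the scoring framework mentioned above. It is a hypothetical helper that assumes you type out what each text element in the image actually says, then compares that against what the prompt asked for. The function names, the element names, and the "OPEN LATE" sign text are all made up for illustration.

```python
from __future__ import annotations

from difflib import SequenceMatcher


def element_score(expected: str, rendered: str) -> float:
    """Similarity (0.0 to 1.0) between the prompted text and what the image shows."""
    a = expected.strip().lower()
    b = rendered.strip().lower()
    if not a and not b:
        return 1.0
    return SequenceMatcher(None, a, b).ratio()


def score_image(expected: dict[str, str], rendered: dict[str, str]) -> dict:
    """Score one image: per-element similarity plus a count of exact matches.

    `expected` maps a hypothetical element name to the text the prompt asked for.
    `rendered` maps the same names to what you transcribed from the image;
    an element the model left out is simply missing and scores 0.0.
    """
    per_element = {
        name: element_score(text, rendered.get(name, ""))
        for name, text in expected.items()
    }
    exact = sum(1 for score in per_element.values() if score == 1.0)
    total = len(expected)
    return {
        "per_element": per_element,
        "exact_matches": exact,
        "total": total,
        "accuracy": exact / total if total else 0.0,
    }


if __name__ == "__main__":
    # Illustrative values only: "OPEN LATE" is a placeholder, not from the real prompt.
    expected = {"divider_1": "JAZZ", "divider_2": "SOUL", "neon_sign": "OPEN LATE"}
    rendered = {"divider_1": "JAZZ", "divider_2": "SOUL"}  # neon sign missing
    result = score_image(expected, rendered)
    print(f"{result['exact_matches']}/{result['total']} elements exact")
```

Counting exact matches is the strict version. The per-element similarity is there so a near miss (one swapped letter) can be told apart from total gibberish. It only measures text, so atmosphere and composition still need a human eye.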

Until next time,
Vaibhav 🤝

If you read till here, you might find this interesting

#AD1

The Lithium Boom Is Heating Up

Lithium stock prices grew 2X+ from June to January. $ALB climbed 227%. $LAC hit 151%. $SQM, 159%. But the real winner may be a private stock, EnergyX. Their tech can recover 3X more lithium than traditional methods, leading General Motors to invest. Now they’re preparing to unlock up to 9.8M tons of lithium. Buy private EnergyX shares alongside 40k+ people before EnergyX’s share price increases after 2/26.

This is a paid advertisement for EnergyX Regulation A offering. Please read the offering circular at invest.energyx.com. Under Regulation A, a company may change its share price by up to 20% without requalifying the offering with the Securities and Exchange Commission.

#AD2

What 100K+ Engineers Read to Stay Ahead

Your GitHub stars won't save you if you're behind on tech trends.

That's why over 100K engineers read The Code to spot what's coming next.

  • Get curated tech news, tools, and insights twice a week

  • Learn about emerging trends you can leverage at work in just 10 mins

  • Become the engineer who always knows what's next
