

Grok 4.1 launched hours before Gemini 3.0 last week.

Nobody noticed.

My Twitter feed was wall-to-wall Gemini coverage. We wrote about it too. Meanwhile Grok dropped, got buried, and everyone moved on.

That's a mistake.

I spent Tuesday night testing both.

Especially after seeing the LMArena votes.

Here's what I found.

Before that, here's what's up in AI this week:


TOOLS THAT CAUGHT MY ATTENTION

1. Bluma - Describe what you want, get short-form video ads. No timeline editing. Built for performance marketers testing dozens of variations.

2. Fifth Door - Analyzes your codebase, finds bugs and tech debt, generates PRs that fix them. Runs in your CI/CD pipeline.

3. Nessie - Upload an Excel sheet, get a production-ready web app with auth and workflows. No code.

The Test.

LMArena ranks both models highest in two categories: Text and Search.

Text = versatility, linguistic precision, cultural context. The stuff you use daily.

Search = real-time information, external knowledge, grounded citations. Google owns search. Yet their LLM has historically sucked at it.

Time to find out if that's changed.

Test 1: Text

After a movie night disaster last weekend (don't ask), I needed this solved.

Prompt: Design a simple algorithm to schedule 5 friends' movie nights over a month, ensuring no two with conflicting tastes (e.g., Friend A hates rom-coms, Friend B loves sci-fi) overlap, while maximizing group size. Write it in pseudocode, then explain edge cases like last-minute cancellations

Grok had half the answer done before I switched tabs.

Both produced the same algorithm, and both handled the edge cases fine.

Output quality: tie.

But Grok's speed is genuinely startling.
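Neither model's pseudocode is reproduced here, but for the curious, this is one way to attack the prompt in Python. It's a minimal sketch, not either model's answer, and it rests on my own assumptions: the friends' tastes are made-up data, "conflicting tastes" is read as one friend hating a genre another loves, and "a month" means four weekly nights. With only five friends, checking every subset is cheap (2^5 = 32), so each night simply picks the largest conflict-free group and breaks ties in favour of whoever has attended least.

```python
from itertools import combinations

# Hypothetical tastes for five friends (made-up data, not from the article).
FRIENDS = {
    "A": {"loves": {"drama"}, "hates": {"rom-com"}},
    "B": {"loves": {"sci-fi", "rom-com"}, "hates": set()},
    "C": {"loves": {"sci-fi"}, "hates": {"horror"}},
    "D": {"loves": {"horror"}, "hates": {"drama"}},
    "E": {"loves": {"comedy"}, "hates": set()},
}


def conflicts(x, y):
    """Assumed rule: two friends clash if either hates a genre the other loves."""
    fx, fy = FRIENDS[x], FRIENDS[y]
    return bool((fx["hates"] & fy["loves"]) or (fy["hates"] & fx["loves"]))


def compatible(group):
    """True if no pair in the group clashes."""
    return all(not conflicts(x, y) for x, y in combinations(group, 2))


def best_group(available, attended):
    """Largest conflict-free subset of the available friends.

    Brute force over subsets, largest first; fine for five people.
    Ties go to the group whose members have attended the fewest nights.
    """
    for size in range(len(available), 0, -1):
        candidates = [g for g in combinations(sorted(available), size) if compatible(g)]
        if candidates:
            return min(candidates, key=lambda g: sum(attended[f] for f in g))
    return ()  # nobody is available


def schedule(nights=4, cancellations=None):
    """Plan one group per night; cancellations maps night index -> set of dropouts."""
    cancellations = cancellations or {}
    attended = {f: 0 for f in FRIENDS}
    plan = []
    for night in range(nights):
        # Re-run the selection over whoever is still available that night.
        available = set(FRIENDS) - cancellations.get(night, set())
        group = best_group(available, attended)
        for f in group:
            attended[f] += 1
        plan.append(group)
    return plan


if __name__ == "__main__":
    for i, group in enumerate(schedule(4), start=1):
        print(f"Night {i}: {', '.join(group)}")
```

With this made-up data, schedule(4) alternates between the groups A, C, E and B, D, E. A last-minute cancellation is handled by passing that night's dropouts in cancellations (for example {1: {"D"}}), which simply reruns the selection over whoever is left.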

Creative Writing

I've been watching Pluribus on Apple TV, so sci-fi HAD to feature in my work somehow.

Prompt: Roleplay as a sarcastic alien ambassador negotiating peace with Earth leaders. The humans offer pizza as a gesture; respond in character to their proposal, weaving in cultural misunderstandings, witty banter, and a subtle threat disguised as a compliment. Keep it under 300 words.

Gemini writes in the first person ("I…"), while Grok narrates in the third person. I'm a fan of Grok's storytelling and the depth it gives the pizza itself: the way it describes the ingredients and their textures ("cheese stretch like a dying star") is more fun to read.

Unfortunately, I happened to read the better response first, so Gemini felt underwhelming.

Grok destroys here. The "cheese stretch like a dying star" line alone wins it.

Gemini's fine. But fine doesn't cut it when you're comparing SOTA models.

Factual analysis and long-form content

I wanted to see how they handle complex, speculative planning.

Prompt: "You're consulting for a startup building network cities: hyper-connected urban systems where every building, vehicle, and citizen is part of a real-time data mesh powered by edge computing, 6G/7G, and city-wide AI. Brainstorm a plan to integrate vertical farms into these networked skyscrapers so agriculture becomes a native layer of the city's digital infrastructure. Cover feasibility (costs, tech), challenges (energy use, crop yields), and a phased rollout timeline. Use bullet points for clarity, then narrate a 'day in the life' of a resident benefiting from it"

The first part of the output is where Gemini shines, and I was genuinely impressed: it generated a diagram! So cool.

Grok focused a lot more on the numbers, and as a stats nerd, I'm intrigued.

Both broadly covered the same challenges. Gemini seemed a lot more optimistic about the timelines and phases, while Grok felt RELATIVELY more realistic.

For the "day in the life" in 2035, both sold me a compelling narrative. I personally prefer Grok's narration, but in terms of output it's a draw.

Overall result

Grok wins for me. Gemini has come a long way over its past two releases, and the rate of improvement is commendable, but Grok is already there and just about nicks it.

Test 2: Search

This test covers basic fact-finding and some research synthesis.

Current events:

Prompt: What's the latest on the volcano eruption in Ethiopia and the resuming of flight operations to and from india? Cite sources.

Gemini has finally caught up on speed, so it lives up to its "Fast" toggle. Both returned answers almost immediately. Compared to Perplexity (in my experience), both are clear winners.

Gemini's output felt more digestible. If you're not up to speed, it gives you the full context before getting to the latest update. Anyone even half-informed would probably prefer Grok's output, since it goes straight to the latest news, but I like Gemini's more.

Simple comparison

Prompt: Compare the battery life of OnePlus 15 vs. iQOO 15. Include specs and user reviews from the past month.

Gemini was FASTER this time. Both got the specs right (Gemini listed them in more detail).

Both listed similar sources for user reviews. Gemini was more diplomatic with the conclusion, Grok was concise and opinionated.

Health advice

Prompt: What are the most effective home remedies for seasonal allergies in urban areas like New Delhi? Back with recent studies.

Gemini won on speed again, and its answer was far more readable and better structured. Grok's answer was more comprehensive, the reverse of the previous test.

Overall result

Gemini wins search. Faster, cleaner output, better structure.

Google's LLM is finally living up to its search reputation.

About time.

Closing thoughts

These models are neck and neck.

Grok's my daily driver for the next week. It deserves way more attention than it got.

Maybe releasing hours before Gemini was terrible timing.

Maybe everyone's burned from Twitter's algorithm changes.

Either way, Grok 4.1 is legit. Test it yourself.

Until next time,
Vaibhav 🤝🏻

