Quick poll: do you think this is AI?
The answer? You guessed it.
The crazy part is that it was made using just one image.
HeyGen launched Avatar IV, claiming to handle photo-to-video, scripting, and voice cloning in one platform.
It sounded like typical SaaS marketing. Every tool promises to be the 'all-in-one solution.'
Thirty minutes later, I had a video that looked like I'd spent a day filming in a professional studio. Same result, 90% less hassle.
Before we look at the tutorial, let’s look at what’s new in the AI world.
1. Andreessen Horowitz Admits Google and Grok Might Beat ChatGPT
A16z's latest AI report shows Gemini reached No. 2 on mobile with nearly half as many users as ChatGPT, while Grok hit 20 million monthly users despite launching as a standalone app just months ago. Google now has four products in the top AI tools list. Meanwhile, Chinese-built apps account for 22 of the top 50 mobile AI apps, though only 3 of them are primarily used in China.
My take: The narrative is shifting from "ChatGPT owns everything" to "everyone's catching up fast." Grok's growth from zero to 20 million users in months proves that distribution beats technology: being integrated with X/Twitter gave it instant access to millions of users.
2. NVIDIA Drops Thor Hammer on Robotics (While Everyone's Arguing About AI Safety)
NVIDIA just launched Jetson Thor, its Blackwell-powered supercomputer-on-a-chip designed to give robots human-level AI processing power. The platform promises to handle everything from humanoid robots to autonomous vehicles, all while consuming less power than your gaming laptop.
My take: NVIDIA is basically saying "forget ChatGPT, we're building the brains for physical AI." While everyone debates whether AI will take jobs, NVIDIA is literally building the hardware to put robots in those jobs. The timing is perfect: as labor costs skyrocket globally, Thor makes humanoid workers economically viable.
3. Google Translate Goes Full Polyglot Tutor (Because Learning Languages is "Too Hard")
Google just turned Translate into a Duolingo competitor. The AI now offers real-time feedback on pronunciation, grammar, and context, essentially replacing human tutors with a pocket polyglot.
The feature supports 20+ languages and uses advanced speech recognition to coach you through conversations, complete with cultural context and idiomatic corrections.
My take: Why pay for Duolingo Premium when Google gives you a native speaker in your pocket for free? Who needs human teachers when AI never gets tired, never judges your accent, and costs nothing? The irony? As AI removes language barriers entirely, we're all racing to learn languages we might not even need.
1. Conductor: Claude Code's Army Commander
Conductor lets you run multiple Claude Code agents in parallel, each working on different parts of your project simultaneously. You can see which Claude is working, which one's stuck, and what changes each has made in real-time.
My take: It's like having a development team that works 24/7 and doesn't need coffee breaks. At around $20/month (it runs on your existing Claude subscription), it's cheaper than hiring even one junior developer.
2. Gemini 2.5 Flash Image: Google's Answer to the "Nano Banana" Mystery
Google finally revealed that "nano banana" is its new Gemini 2.5 Flash Image model (although I told you days ago).
It does everything from keeping characters consistent across images to targeted photo editing with natural-language commands. Priced at $0.039 (roughly 3 rupees) per generated image, it handles multi-image fusion, maintains brand consistency, and even understands hand-drawn diagrams.
My take: I'm genuinely worried for legacy tools like Photoshop. The character consistency feature alone is worth the price. No more explaining to Midjourney what your brand mascot looks like for the 50th time.
3. FirstAnswer.ai - SEO for the AI Search Era
FirstAnswer monitors how your brand appears in AI responses across ChatGPT, Perplexity, Google AI Overviews, and Copilot. It tracks citation frequency, compares you with competitors, and shows which content types AI platforms favor most.
Think of it as "GEO" (Generative Engine Optimization), the new SEO for AI-powered search.
My take: Everyone's optimizing for Google while AI search is eating their lunch. This tool shows you the truth: if your brand isn't mentioned in AI responses, you're invisible to the next generation of searchers.
Coming back to our HeyGen story…
I was spending ₹5,100/month across four different tools just to create one weekly video. But worse than the cost was juggling multiple tools and the time it all took.
HeyGen's Avatar IV promised to handle everything in one platform. Photo-to-video, script generation, voice cloning, even lip-sync.
Could it actually match the quality of my specialized tools?
Go to https://app.heygen.com/home and click on ‘Photo to Video with Heygen’.
Step 1: Photo Setup (2 minutes)
The key is nailing the initial photo. HeyGen needs:
Well-lit face (ring light or window light works)
Neutral background (avoid busy patterns)
Camera at eye level
Clear view of face and shoulders
At least 720p resolution
I used my standard headshot, the same one from my newsletter header. I didn't need any special preparation.
Step 2: Voice Training (3 minutes)
Instead of ElevenLabs' complex voice cloning process, HeyGen makes it simple:
Record 30 seconds of natural speech
Vary your pace and tone
Include a few head turns and blinks
Speak clearly but conversationally
The AI learns your timing and micro-expressions from this short sample. It's much faster than ElevenLabs.
Step 3: Script Generation (Built-in)
Here's where HeyGen surprised me. Instead of jumping to Claude, I used their built-in script assistant:
Enter your topic and target audience
Choose tone (professional, casual, educational)
Set desired length
Get a complete script with natural transitions
The output quality was 85% as good as Claude's scripts. For most use cases, that's enough.
Step 4: Video Generation (5 minutes)
Paste or record your script, and that's it for setup.
Just click Generate and let the magic begin!
HeyGen renders everything automatically:
Perfect lip-sync to your voice
Natural head movements and eye contact
Consistent lighting and background
Professional-quality output
And there you go: the video is ready in 2 minutes!
I can now create personalized video responses at scale. Instead of sending the same generic emails, I generate custom videos for different scenarios.
I make them for clients, students, and even my team, and each one looks like a personal recording. Win-win!
The Three Mistakes That Will Ruin Your Results
Mistake #1: Rushing the Photo Setup
Bad lighting or camera angles create weird shadows and unnatural expressions. Spend 5 minutes getting this right, because it affects every video you make.
Mistake #2: Reading Scripts Like a Robot
Your voice sample teaches HeyGen how you naturally speak. If you sound monotone during training, every video will be boring. Record like you're explaining something to a friend.
Mistake #3: Expecting Perfection on Complex Scripts
HeyGen handles straightforward explanations well. But complex technical terms, rapid-fire delivery, or emotional content can look artificial. Keep scripts conversational and paced normally.
HeyGen isn't perfect, but it's 90% as good as my four-tool setup at 25% of the cost. For entrepreneurs creating regular video content, that math is hard to ignore.
The real advantage isn't just saving money; it's removing friction. When creating a video takes 15 minutes instead of 2 hours, you actually do it consistently.
Reply to this newsletter with one video you've been putting off because the process is too complicated. I'll create it for the first 20 people who reply!
Looking forward to your responses!
Until next time,
Vaibhav 🤝
If you've read this far, you might find this interesting:
#AD
Typeless turns your raw, unfiltered voice into beautifully polished writing, in real time.
It works like magic, feels like cheating, and allows your thoughts to flow more freely than ever before.
Your voice is your strength. Typeless turns it into a superpower.
A new trend in real estate is making the most expensive properties obtainable. It’s called co-ownership, and it’s revolutionizing the $1.3T vacation home market.
The company leading the trend? Pacaso. Created by the founder behind a $120M prior exit, Pacaso turns underutilized luxury properties into fully-managed assets and makes them accessible to the broadest possible market.
The result? More than $1B in transactions and service fees, 2,000+ happy homeowners, and over $110M in gross profit to date for Pacaso.
With rapid international growth and 41% gross profit growth last year alone, Pacaso is hitting their stride. They even recently reserved the Nasdaq ticker PCSO.
The same VCs that backed Uber, eBay, and Venmo also backed Pacaso. Join them as a Pacaso shareholder before the opportunity ends September 18.
Paid advertisement for Pacaso’s Regulation A offering. Read the offering circular at invest.pacaso.com. Reserving a ticker symbol is not a guarantee that the company will go public. Listing on the NASDAQ is subject to approvals.