Show HN: Chaining VEO 2's 8-second clips into 2-minute videos with sync audio

2 points | abilafredkb | 0 comments | 5/25/2025, 9:15:24 AM | storage.googleapis.com
The technical challenge: VEO 2 only generates 8-second clips with no audio. I needed to create longer marketing videos, so I built a system that chains multiple clips into cohesive 2-minute videos with synchronized narration.

The core technical problems I solved:

1. Script Continuity Across Clips

VEO 2 doesn't maintain context between generations. I built a script parser that breaks longer narratives into 8-second segments while maintaining visual and narrative continuity. Each segment gets contextual prompts that reference the previous clip's end state.

2. Seamless Visual Transitions

To avoid jarring cuts between clips, I analyze the final frame of each segment and use it to inform the opening prompt of the next segment. This creates natural visual flow across the 8-second boundaries.

3. Audio Synchronization Pipeline

Since VEO 2 is silent, I integrated Google's Gemini Flash audio model with a custom FFmpeg pipeline:

- Parse the script timing
- Generate audio segments that match video pacing
- Use FFmpeg to sync audio with video transitions
- Handle audio crossfades between clip boundaries
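
To make the pipeline steps above concrete, here is a minimal Python sketch, not my production code. It assumes a narration rate of about 2.5 words per second, one audio file per 8-second segment, and an already concatenated video file; the function names `split_script` and `build_mux_command` and all file names are hypothetical. It only builds the FFmpeg argument list (using the standard `acrossfade` filter) and does not call any VEO or Gemini API.

```python
from typing import List

CLIP_SECONDS = 8.0  # VEO 2 clip length, as stated in the post


def split_script(script: str, words_per_second: float = 2.5) -> List[str]:
    """Split narration into chunks sized to fit one clip each.

    words_per_second is an assumed speaking rate, not a measured one.
    """
    budget = int(CLIP_SECONDS * words_per_second)  # max words per segment
    words = script.split()
    return [" ".join(words[i:i + budget]) for i in range(0, len(words), budget)]


def build_mux_command(video: str, audio_segments: List[str],
                      output: str, fade: float = 0.25) -> List[str]:
    """Build an ffmpeg command that crossfades the audio segments
    in order and muxes the result onto the (already joined) video."""
    if not audio_segments:
        raise ValueError("need at least one audio segment")

    cmd = ["ffmpeg", "-y", "-i", video]
    for path in audio_segments:
        cmd += ["-i", path]

    # The video is input 0, so the audio inputs are 1..n.
    prev, chain = "[1:a]", []
    for i in range(2, len(audio_segments) + 1):
        label = f"[a{i}]"
        chain.append(f"{prev}[{i}:a]acrossfade=d={fade}{label}")
        prev = label

    if chain:
        cmd += ["-filter_complex", ";".join(chain),
                "-map", "0:v:0", "-map", prev]
    else:
        # A single segment needs no crossfade.
        cmd += ["-map", "0:v:0", "-map", "1:a:0"]

    # Copy the video stream untouched, encode audio as AAC,
    # and stop at the shorter of the two streams.
    cmd += ["-c:v", "copy", "-c:a", "aac", "-shortest", output]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. One caveat with this approach: each crossfade overlaps two segments by `fade` seconds, so n segments of 8 seconds yield 8n - (n - 1) * fade seconds of audio. To keep narration aligned with the clip boundaries, each audio segment would need to be padded by roughly the crossfade duration.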

4. Cost Optimization

At current VEO 2 pricing, a 2-minute video costs ~$60 in API calls. I built request batching and caching to minimize redundant generations. The system can theoretically generate longer videos, but I capped it at 2 minutes due to cost considerations.

This powers the video generation feature in my marketing tool Smarketly, but the core chaining technique could work for any VEO 2 application needing longer content.

Technical questions I'm still working on:

- Better frame-to-frame consistency algorithms
- More efficient prompt engineering for visual continuity
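
Going back to the cost point (4): the arithmetic and the caching idea can be sketched roughly like this. The per-clip figure is only what the ~$60 number implies for 120 seconds of 8-second clips, not a quoted price. `ClipCache` and its `generate` callable are hypothetical stand-ins for whatever actually calls VEO 2, and the real batching and caching logic is more involved than this.

```python
import hashlib
from typing import Callable, Dict

CLIP_SECONDS = 8  # VEO 2 clip length, as stated in the post


def clips_needed(total_seconds: int) -> int:
    """Number of 8-second clips required to cover the target length."""
    return -(-total_seconds // CLIP_SECONDS)  # ceiling division


def implied_cost_per_clip(total_seconds: int = 120,
                          total_cost_usd: float = 60.0) -> float:
    """Per-clip cost implied by the ~$60 figure for a 2-minute video."""
    return total_cost_usd / clips_needed(total_seconds)


class ClipCache:
    """Reuse a generated clip when the exact same prompt is requested again."""

    def __init__(self, generate: Callable[[str], bytes]) -> None:
        self._generate = generate          # hypothetical VEO 2 call
        self._store: Dict[str, bytes] = {}
        self.misses = 0                    # number of paid generations

    def get(self, prompt: str) -> bytes:
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key not in self._store:
            self._store[key] = self._generate(prompt)
            self.misses += 1
        return self._store[key]
```

With these numbers, a 2-minute video needs 15 clips at an implied $4 each, so every cache hit on a repeated prompt saves about that much.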

Happy to share code snippets or discuss the technical implementation!

You can see what I'm building this for at https://smarketly.lema-lema.com
