Despite deepfake audio tech, banks, ISPs push voice print authentication (2021) (keydiscussions.com)

It’s super interesting to me how the process of fully making audio/video searchable requires so much processing. Like, extracting the audio and video, transcribing the audio, chunking the video into 15-sec scenes and describing them visually, etc.

I wonder if as a test you could use the video descriptions, run them as a prompt through something like Veo, then stitch them together into something close to the original. Wild.

mkauffman23 · 4h ago

I have no idea how accurate the reconstruction would be, but it would make for a wild experiment!

mkauffman23 · 6h ago

In this blog post, we detail the API design and technical decisions we made when adding audio/video support to Ragie's RAG service. We explore some of the approaches we tried and the rationale behind what we landed on. Worth a read if you're building similar systems.

Here's a TLDR:
- Built a full pipeline that processes audio/video → transcription + vision descriptions → chunking → indexing
- Audio: faster-whisper with large-v3-turbo (4x faster than vanilla Whisper)
- Video: Chose Vision LLM descriptions over native multimodal embeddings (2x faster, 6x cheaper, better results)
- 15-second video chunks hit the sweet spot for detail vs context
- Source attribution with direct links to exact timestamps
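For anyone curious what the chunking and timestamp-attribution steps in the TLDR might look like, here is a minimal sketch. It is not Ragie's implementation: the `Segment` and `Chunk` types, the `chunk_segments` helper, and the example URL are all hypothetical, and the `#t=` link format is just one common way to point a player at a timestamp. It assumes transcription has already produced segments with start/end times in seconds.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcribed span of audio (hypothetical shape: times in seconds)."""
    start: float
    end: float
    text: str


@dataclass
class Chunk:
    """A fixed-length window of transcript text plus a link back to the source."""
    start: float
    end: float
    text: str
    link: str


def chunk_segments(segments, base_url, window=15.0):
    """Group segments into fixed windows (15 s by default) by their start time.

    Each chunk carries a link to the start of its window so a search hit
    can be attributed to a timestamp in the original media.
    """
    buckets = {}
    for seg in segments:
        index = int(seg.start // window)  # which window this segment starts in
        buckets.setdefault(index, []).append(seg)

    chunks = []
    for index in sorted(buckets):
        start = index * window
        text = " ".join(s.text.strip() for s in buckets[index])
        link = f"{base_url}#t={int(start)}"  # assumed link format
        chunks.append(Chunk(start, start + window, text, link))
    return chunks


if __name__ == "__main__":
    demo = [
        Segment(0.0, 4.2, "Welcome to the demo."),
        Segment(4.2, 13.9, "First we extract the audio."),
        Segment(16.0, 21.5, "Then we transcribe it."),
    ]
    for chunk in chunk_segments(demo, "https://example.com/video.mp4"):
        print(chunk.link, "|", chunk.text)
```

Windows with no speech simply produce no chunk here. A real pipeline would also attach the visual description for each window before indexing, which this sketch leaves out.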

Happy to answer any further questions folks might have!

bobremeika · 6h ago

Source attribution with direct links to exact timestamps is truly unique when it comes to A/V RAG solutions.

When Existence is Inefficient (2022) (inference-review.com)

Comment with your favorite local-first content (lofi.so)

The average Apple Watch user gets 49 minutes of deep sleep per night (empirical.health)

Windows 11 gets new Black Screen of Death, auto recovery tool (bleepingcomputer.com)

China begins building largest dam, fuelling fears in India (bbc.com)

Show HN: How Claude Code Improved My Dev Workflow

The dangers of Musk's new, Manga-style [flirty] chatbot [video] (youtube.com)

Qwen3 – Coder (old.reddit.com)

Vector Tiles are deployed on OpenStreetMap.org (blog.openstreetmap.org)

How Silicon Valley is becoming militarized (english.elpais.com)

Checklist Genie – Create Sharable Checklists with Just Your Voice and AI (checklistgenie.app)

Qwen3-Coder: Agentic Coding in the World (qwenlm.github.io)

Ask HN: A Reddit UI where all writing is done by an AI?

Show HN: A CLI tool for creating Typst screenplay projects (github.com)

Hackers Behind $140M Brazil Banking Heist Turn to Crypto to Launder Their Loot (coindesk.com)

RFC 1392: Internet Users' Glossary (rfc-editor.org)

A power utility is reporting suspected pot growers to cops. EFF says illegal (arstechnica.com)

SmoothCSV: The CSV Editor (smoothcsv.com)

Ask HN: Can You Buy Your Way into Your Dream Job?

SWE-Bench Verified Is Flawed Despite Expert Review (ddkang.substack.com)

Migrating to AWS in production with zero downtime (loops.so)

Show HN: Free crypto screener for Binance, Bybit, OKX and Coinbase (no login) (devisecrypto.com)

NPM 'Is' Package Hijacked in Expanding Supply Chain Attack (socket.dev)

If Coding Agents Were Rappers (install.md)

UFOs once took control of Russian ICBMs, nearly caused WW3 – testimony (jpost.com)

Andrej Karpathy – The append-and-review note (karpathy.bearblog.dev)

Vibe Coding an SMTP Server, in Rust (mailpace.com)

Build, Learn, Delete, Repeat (ymichael.com)

Veles, Google's open source secret scanner (opensource.googleblog.com)

Show HN: Let ChatGPT Plus control any Python or JavaScript object in 3 lines (chatgpt.com)

Leak: Anthropic Says the Company Will Pursue Gulf State Investments After All (wired.com)

Thursday Is Durable Computing Day

A Quick(ish) Introduction to Tuning Postgres (byteofdev.com)

Ask HN: Can we better use heat from data centers?

Distribution Package vs. Import Package (packaging.python.org)

Burning Man Festival Is Burning Through Cash (bloomberg.com)

MCK: Open-Source MongoDB Operator (github.com)

ΜFork: A pure actor-based concurrent machine architecture with memory-safety an (ufork.org)

Study: How American Consumers Are Using AI (joeyoungblood.com)

Why "How many tennis balls fit in a bus?" is a good interview question (medium.com)

Amazon buys Bee AI wearable that listens to everything you say (theverge.com)

Inheritance over Composition, Sometimes (death.andgravity.com)

Show HN: Featurevisor v2.0 – declarative feature flags management with Git (featurevisor.com)

Crowdfunding Success – Was it worth it? (atomic14.substack.com)

Show HN: It's Like FIFA for Developers 1vs1 Code Battle (battlegpt.website)

Why everyone is probably wrong about AI (greyenlightenment.com)

Brave Browser Blocks Windows Recall (neowin.net)

Taiwan is creating an offshore wind industry to fuel its semiconductor factories (restofworld.org)

We built audio/video RAG

Comments (4)