This paper has been misrepresented many times. At the end it says:
Multi-vector models
Multi-vector models are more expressive through the use of multiple vectors
per sequence combined with the MaxSim operator [Khattab and Zaharia, 2020]. These models show
promise on the LIMIT dataset, with scores greatly above the single-vector models despite using a
smaller backbone (ModernBERT, Warner et al. [2024]). However, these models are not generally
used for instruction-following or reasoning-based tasks, leaving it an open question how well
multi-vector techniques will transfer to these more advanced tasks.
Sparse models
Sparse models (both lexical and neural versions) can be thought of as single vector
models but with very high dimensionality. This dimensionality helps BM25 avoid the problems of the
neural embedding models as seen in Figure 3. Since the dimensionality of their vectors is high, they can scale to
many more combinations than their dense vector counterparts. However, it is less clear how to apply
sparse models to instruction-following and reasoning-based tasks where there is no lexical or even
paraphrase-like overlap. We leave this direction to future work.
In other words, it says that both multi-vector (i.e. late interaction) and sparse models hold promise.
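To make the quoted MaxSim operator concrete, here is a minimal sketch of late-interaction scoring, assuming NumPy and toy random vectors in place of real token embeddings from a trained encoder such as ColBERT:

```python
# A minimal sketch of ColBERT-style MaxSim scoring (Khattab and Zaharia, 2020).
# The toy vectors here are hypothetical stand-ins; a real system would use
# per-token embeddings from a trained multi-vector encoder.
import numpy as np

def maxsim_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """Late-interaction score: for each query token vector, take its
    maximum similarity over all document token vectors, then sum."""
    # (num_query_tokens, num_doc_tokens) matrix of dot-product similarities
    sims = query_vecs @ doc_vecs.T
    return float(sims.max(axis=1).sum())

rng = np.random.default_rng(0)
query = rng.normal(size=(4, 128))   # 4 query tokens, 128 dims each
doc_a = rng.normal(size=(20, 128))  # 20 document tokens
doc_b = rng.normal(size=(35, 128))

# Each query token is free to match a different document token, which is
# what makes this representation more expressive than one pooled vector.
scores = {"doc_a": maxsim_score(query, doc_a), "doc_b": maxsim_score(query, doc_b)}
print(max(scores, key=scores.get))
```

Because the score is a sum of per-token maxima rather than a single dot product, a multi-vector model is not constrained to pack every relevance combination into one fixed-dimensional vector, which is the failure mode the paper studies.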
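The sparse-model point can be illustrated with an equally small sketch: a lexical vector is just a very high-dimensional, mostly-zero single vector, one dimension per vocabulary term. The raw term-frequency weighting below is a simplified stand-in for BM25, which additionally applies term saturation and length normalization:

```python
# A minimal sketch of a sparse lexical model as "a single vector with very
# high dimensionality". Raw term frequencies stand in for BM25 weights.
from collections import Counter

def sparse_vector(text: str) -> Counter:
    # One dimension per vocabulary term; only nonzero entries are stored.
    return Counter(text.lower().split())

def dot(q: Counter, d: Counter) -> int:
    # Dot product over the shared nonzero dimensions.
    return sum(weight * d[term] for term, weight in q.items())

docs = {
    "d1": sparse_vector("quick brown fox jumps over the lazy dog"),
    "d2": sparse_vector("the embedding models map text to dense vectors"),
}
query = sparse_vector("dense embedding vectors")

# Only documents sharing a lexical dimension with the query score above
# zero, which is exactly why purely sparse matching struggles when there
# is no lexical or paraphrase-like overlap with the query.
print({name: dot(query, vec) for name, vec in docs.items()})
```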