Apache SeaTunnel has surpassed 8,500 GitHub Stars (github.com)

Just curious: are there any techniques other than embedding the texts, computing cosine similarity, and sorting the results by that score? RRF (reciprocal rank fusion) could be used, but again, it's very simple as well.
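For readers unfamiliar with it, RRF is usually applied to merge ranked lists from different retrievers. For example, an embedding ranking can be merged with a keyword ranking; the second retriever here is an assumption, since the thread doesn't name one. The sketch below gives each document the score sum(1 / (k + rank)) over the lists it appears in. The constant k = 60 comes from the original RRF paper. The IDs are placeholders.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores sum(1 / (k + rank)) across all lists that contain it,
    with ranks starting at 1. A larger k flattens the advantage of top ranks.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Placeholder rankings: one from embedding similarity, one from a keyword search.
embedding_ranking = ["a", "b", "c"]
keyword_ranking = ["b", "d", "a"]
print(reciprocal_rank_fusion([embedding_ranking, keyword_ranking]))
```

RRF only uses rank positions, never raw scores. That is why it is "simple", and it is also why it can combine cosine similarities and keyword scores that live on incomparable scales.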

sitkack · 2h ago

Embedding search via https://searchthearxiv.com/ accepts either a word vector or an abs or PDF link to an arXiv paper.

https://news.ycombinator.com/item?id=42519487

I just did a spot check, and I think the searchthearxiv results are superior.

masterjack · 2h ago

There's also search and browsing on https://sugaku.net. It's more focused on math, but it also has all of arXiv.

0101111101 · 2h ago

Looks cool! On arXiv Xplorer you can enter either a search query or a paper URL. You can even combine paper URLs to search for combinations of ideas by putting + or - before each one, like `+ 2501.12948 + 1712.01815`.
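One plausible way to implement the `+`/`-` syntax is plain vector arithmetic on paper embeddings. The comment doesn't say how arXiv Xplorer actually does it, so treat this as an assumption. The sketch normalizes each paper's vector to unit length, adds or subtracts it according to its sign, and renormalizes the sum so it can serve as an ordinary query vector. The `embeddings` lookup (paper ID to vector) is hypothetical.

```python
import numpy as np


def parse_query(text):
    """Parse "+ id1 - id2 ..." into [(1.0, "id1"), (-1.0, "id2"), ...]."""
    tokens = text.split()
    if not tokens or len(tokens) % 2:
        raise ValueError("expected one or more '+ ID' / '- ID' pairs")
    terms = []
    for sign, paper_id in zip(tokens[0::2], tokens[1::2]):
        if sign not in ("+", "-"):
            raise ValueError(f"expected '+' or '-', got {sign!r}")
        terms.append((1.0 if sign == "+" else -1.0, paper_id))
    return terms


def unit(vector):
    """Return the vector scaled to length 1 (as a float array)."""
    vector = np.asarray(vector, dtype=float)
    return vector / np.linalg.norm(vector)


def combine_query(text, embeddings):
    """Sum signed unit-length paper embeddings, then renormalise the result.

    `embeddings` is a hypothetical mapping from paper ID to embedding vector.
    """
    total = sum(weight * unit(embeddings[pid]) for weight, pid in parse_query(text))
    return unit(total)


# Toy 2-D "embeddings" standing in for real ones.
toy = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
print(combine_query("+ a - b", toy))
```

Normalizing before adding stops a paper whose raw embedding happens to have a large norm from dominating the combination.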

elliotec · 4h ago

This is really cool, and very relevant to something I'm working on. Would you be willing to do a quick explanation of the build?

0101111101 · 2h ago

Sure! I first used OpenAI embeddings on all the paper titles, abstracts, and authors. When a user submits a search query, I embed the query, find the closest-matching papers, and return those results. Nothing too fancy involved!
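The pipeline described above can be sketched as a small in-memory index: embed every paper once, embed each query, and return the papers with the highest cosine similarity. Several details are assumptions, not the author's stated setup:

- `embed` is a hypothetical callable. In the author's setup it would call OpenAI's embeddings endpoint with a model the comment doesn't name.
- The paper dict schema is invented for the example.
- Title, abstract, and authors are concatenated into one text per paper. The thread doesn't say whether that was done.

```python
import numpy as np


class PaperIndex:
    """Minimal in-memory cosine-similarity index over paper embeddings (sketch)."""

    def __init__(self, embed):
        # `embed`: hypothetical callable, list[str] -> 2-D array-like (one row per text).
        self.embed = embed
        self.ids = []
        self.matrix = None

    def build(self, papers):
        # Assumed schema: dicts with "id", "title", "abstract" and "authors" (a list).
        papers = list(papers)
        texts = [f"{p['title']}\n{p['abstract']}\n{', '.join(p['authors'])}" for p in papers]
        vectors = np.asarray(self.embed(texts), dtype=float)
        # L2-normalise each row so that a dot product equals cosine similarity.
        self.matrix = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
        self.ids = [p["id"] for p in papers]

    def search(self, query, top_k=5):
        q = np.asarray(self.embed([query]), dtype=float)[0]
        q = q / np.linalg.norm(q)
        sims = self.matrix @ q  # cosine similarity against every paper
        order = np.argsort(-sims, kind="stable")[:top_k]
        return [(self.ids[i], float(sims[i])) for i in order]
```

A brute-force matrix product like this is simple and exact. For very large collections, an approximate nearest-neighbour index would be the usual next step.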

I'm also maintaining a dataset of all the embeddings on Kaggle if you want to use them yourself: https://www.kaggle.com/datasets/tomtum/openai-arxiv-embeddin...

heisenburgzero · 4m ago

So did you combine title + abstract + authors into a single chunk and embed that, or did you embed each field individually?
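The thread doesn't answer this question, but the two options can be contrasted concretely:

1. Concatenate the fields into one text and store one vector per paper.
2. Embed each field separately and combine the per-field cosine similarities at query time.

The equal default weights and the field names below are assumptions for illustration.

```python
import numpy as np

FIELDS = ("title", "abstract", "authors")


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def single_chunk_text(paper):
    """Option 1: join all fields into one string, to be embedded once."""
    parts = []
    for field in FIELDS:
        value = paper[field]
        # Author lists are joined with commas; other fields are used as-is.
        parts.append(", ".join(value) if isinstance(value, (list, tuple)) else str(value))
    return "\n".join(parts)


def per_field_score(query_vec, field_vecs, weights=None):
    """Option 2: weighted mean of per-field cosine similarities.

    `field_vecs` maps field name -> embedding; weights default to equal.
    """
    weights = weights or {field: 1.0 for field in field_vecs}
    total = sum(weights[field] for field in field_vecs)
    return sum(weights[f] * cosine(query_vec, v) for f, v in field_vecs.items()) / total
```

Option 1 keeps storage and search at one vector per paper. Option 2 costs roughly one vector per field, but a short query such as an author's name is arguably less likely to be diluted by a long abstract.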

madars · 3h ago

Looks great! Could you add eprint.iacr.org (Cryptology ePrint Archive)?

0101111101 · 2h ago

Do they have a public API/dataset?

madars · 1h ago

They have RSS feeds for new/updated papers: https://eprint.iacr.org/rss/
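Feeds like this could be polled to add new papers to an index. Below is a minimal parser built on Python's standard library. It assumes the feed is RSS 2.0 with plain `<item>` elements; it hasn't been checked against the actual ePrint feed, and an Atom feed would need `<entry>` handling instead. In practice the XML would be downloaded from the feed URL, for example with `urllib.request.urlopen`. Here an inline sample is used instead.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(xml_text):
    """Extract title, link and description from each RSS 2.0 <item>."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
        })
    return items


def new_items(xml_text, seen_links):
    """Keep only items whose link has not been indexed yet."""
    return [item for item in parse_rss_items(xml_text) if item["link"] not in seen_links]


# Hypothetical sample feed, not real ePrint data.
sample = (
    "<rss version='2.0'><channel><title>feed</title>"
    "<item><title> Paper A </title><link>https://example.org/a</link>"
    "<description>abstract A</description></item>"
    "<item><title>Paper B</title><link>https://example.org/b</link></item>"
    "</channel></rss>"
)
print(new_items(sample, {"https://example.org/a"}))
```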

bbor · 3h ago

Oh god, there's a medRxiv?? TIL...

Don't forget ChemRxiv!

0101111101 · 2h ago

Sadly, I couldn't find a public API for ChemRxiv, but I'd be happy to be proven wrong!


At Least Two Newspapers Syndicated AI Garbage (theatlantic.com)

Proposal for Standardized JSX (vanillajsx.com)

FastMCP v2 – now defaults to streamable HTTP with SSE fallback (github.com)

The Dangers of Browsing AI Agents (arxiv.org)

TeleMessage 410GB dump available to journalists (ddosecrets.com)

How Music Apps Die - The Design of Finale [video] (youtube.com)

Microsoft-backed UK tech unicorn Builder.ai collapses into insolvency (ft.com)

What if Vintage and Modern got together (jaydip.me)

I made a system to abolish subscriptions (joinares.com)

At Google I/O, everything is changing and normal and scary and chill (platformer.news)

Astronomy: Time Is an Angle (oliverkwebb.github.io)

Show HN: Toffu AI is a Vibe Marketing agent (toffu.ai)

Teen swimmer caught in rip current rescued by drone [video] (youtube.com)

Build with Jules, your asynchronous coding agent (blog.google)

Code Improvement Practices at Meta (arxiv.org)

Relume (relume.io)

Ask HN: Trivial things that you have weirdly strong opinions about

Magnus Carlsen forced into a draw by more than 143000 people playing against him (apnews.com)

Good Design Comes from Looking, Great Design Comes from Looking Away (chrbutler.com)

A broken thruster jeopardized Voyager 1, but engineers executed a remote fix (npr.org)

Waymo says it reached 10M robotaxi trips, doubling in five months (cnbc.com)

The Agentic Web and Original Sin (stratechery.com)

AI could keep us dependent on natural gas for decades to come (technologyreview.com)

AI in Search: Going beyond information to intelligence (blog.google)

You Won't Learn This in School: Disabling Kernel Functions in Your Process (2009) (chadaustin.me)

The data center boom in the desert (technologyreview.com)

Gemma 3n (deepmind.google)

Firebase MCP Server (firebase.blog)

Ask HN: Are AI Agents a Lie?

The unlikely rise of the Indian space program [video] (youtube.com)

Windows ML: The future of machine learning development on Windows (blogs.windows.com)

A Secret Trove of Rare Guitars Heads to the Met (newyorker.com)

Show HN: Interactive AI vibe analysis and edit agent for connected data (trynexus.io)

New Microbe Discovered Aboard Chinese Space Station (newsweek.com)

Adreno Control Panel for Devices with Snapdragon X Elite (qualcomm.com)

Wolfspeed prepares to file for bankruptcy within weeks (reuters.com)

Practical AI techniques for daily engineering work (seangoedecke.com)

Show HN: Gen-ts-type – Code Generate TS Type from JSON data with collapsed field (gen-ts-type.surge.sh)

Mice use chemical cues such as odours to sense social hierarchy (crick.ac.uk)

Obesity and Efforts to Lose Weight (1992) (nejm.org)

Fat 'remembering' past obesity drives yo-yo diet effect, say experts (theguardian.com)

Has AI generated a new treatment for blindness? (twitter.com)

Chromaplane Unlocked: The Electromagnetic Synth You Must Try [video] (youtube.com)

Understanding How Violet Light Can Stop Myopia Progression (bme.gatech.edu)

Apple Turnaround (hypercritical.co)

The One-Tree Website (ratfactor.com)

Stop Calling Everything a Painkiller (0toreal.com)

Unlocking hidden powers in Xtensa based Qualcomm ath10k WiFi chips (forum.defcon.org)

Sokol: Cross-platform libraries for C, C++, and Wasm, written in C (github.com)

Semantic search engine for arXiv, bioRxiv and medRxiv

Comments (12)