Meta's Llama 3.1 can recall 42 percent of the first Harry Potter book

14 points by zdw · 5 comments · 6/16/2025, 1:44:58 AM · understandingai.org ↗

Comments (5)

evertedsphere · 6m ago
What is that bar (= token span) on the right that is common to the first three models?
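
(For readers puzzling over the figure: the article measures memorization span by span rather than over the whole book. Below is a minimal sketch of that kind of measurement, assuming a sliding-window prefix/suffix setup; the model name, window sizes, stride, and the 0.5 probability threshold are illustrative assumptions, not the study's exact parameters.)

    # Hypothetical sketch of span-level memorization measurement:
    # slide a window over the book's tokens, condition the model on a
    # prefix, and count suffixes the model assigns high probability to.
    # MODEL, window sizes, STRIDE, and the 0.5 threshold are assumptions.
    import math
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL = "meta-llama/Llama-3.1-70B"   # stand-in; any causal LM works
    PREFIX_LEN, SUFFIX_LEN, STRIDE = 50, 50, 10

    tok = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16)
    model.eval()

    def suffix_logprob(ids: torch.Tensor, split: int) -> float:
        """Log-probability the model assigns to ids[split:] given ids[:split]."""
        with torch.no_grad():
            logits = model(ids.unsqueeze(0)).logits[0]
        logprobs = torch.log_softmax(logits[:-1].float(), dim=-1)
        # logits[i] predicts token i+1, so target token j lives at row j-1
        per_token = logprobs.gather(1, ids[1:].unsqueeze(1)).squeeze(1)
        return per_token[split - 1:].sum().item()

    def memorized_fraction(text: str, threshold: float = 0.5) -> float:
        """Fraction of sliding windows whose suffix probability exceeds threshold."""
        ids = tok(text, return_tensors="pt").input_ids[0]
        window = PREFIX_LEN + SUFFIX_LEN
        hits = total = 0
        for start in range(0, len(ids) - window, STRIDE):
            span = ids[start : start + window]
            total += 1
            if suffix_logprob(span, PREFIX_LEN) > math.log(threshold):
                hits += 1
        return hits / max(total, 1)

Under a definition like this, "can recall 42 percent" means 42 percent of such spans cleared the probability threshold, not that the model emits the whole book verbatim in one pass.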
giardini · 28m ago
As I've said several times, the corpus is key: LLMs thus far "read" almost anything, but they should instead be trained on well-curated corpora. "Garbage in, garbage out" (GIGO), as the saying goes.

While the Harry Potter series may be fun reading, it doesn't provide information about anything that isn't better covered elsewhere. Leave Harry Potter for a different "Harry Potter LLM".

Train scientific LLMs to the language level of a good early-20th-century English major, then use science texts and research papers for the remainder.

alephnerd · 20m ago
> While the Harry Potter series may be fun reading, it doesn't provide information about anything that isn't better covered elsewhere

It has copyright implications: if Claude can recall 42% of a copyrighted work without attribution or royalties, how did Anthropic train it?

> Train scientific LLMs to the level of a good early 20th century English major and then use science texts and research papers for the remainder

Plenty of in-stealth companies are taking this approach to LLMs ;)

For those of us who studied the natural sciences and CS in the 2000s and early 2010s, there was a bit of a trend in which certain PIs would simply translate early-to-mid-20th-century German and Russian papers and attribute the work to themselves, especially in fields like CS (and in particular what became ML).

ninetyninenine · 1m ago
So if I memorized Harry Potter, would the physical encoding that definitely exists in my brain be a copyright violation?
weird-eye-issue · 4m ago
Why are you talking about Claude and Anthropic?