Except that the comma was a poor choice of separator, CSV is just plain text that can be trivially parsed from any language or platform. That's its biggest value. There is essentially no format, library, or platform lock-in. JSON comes close to this level of openness and ease, but YAML is already too complicated as a file format.
jstanley · 27m ago
JSON has the major annoyance that grep doesn't work well on it. You need tooling to work with JSON.
theknarf · 23m ago
grep is a tool. jq is a good tool for json.
kergonath · 11m ago
grep is POSIX and you can count on it being installed pretty much anywhere. That’s not the case for jq.
humanfromearth9 · 21m ago
And the best thing about CSV is that it is a text file with a standardized, well known, universally shared encoding, so you don't have to guess it when opening a CSV file. Exactly in the same way as any other text file.
The next best thing with CSV is that separators are also standardized and never positional, you never have to guess.
john_the_writer · 23m ago
100%.. xml also worked here too..
YAML is a pain because it has ever so slightly different versions that sometimes don't play nice.
CSV or TSV files are almost always portable.
untrimmed · 50m ago
This is a great defense, but I feel like it misses the single biggest reason CSV will never die: your boss can open it. We can talk about streaming and Parquet all day, but if the marketing team can't double-click the file, it's useless.
imtringued · 48m ago
With what software? LibreOffice? Excel doesn't support opening CSV files with a double click. It lets you import CSV files into a spreadsheet, but that requires reading unreasonably complicated instructions.
ertgbnm · 29m ago
On Windows, CSVs automatically open in Excel through File Explorer. Almost all normal businesses use Windows, so the OP's claim is pretty reasonable.
efitz · 30m ago
Excel absolutely can open csv files with a double click if you associate the file type extension.
boshomi · 10m ago
You should never blindly trust Excel when using CSV files. Try this csv file:
COL1,COL2,COL3
5,"+A2&C1","+A2*8&B1"
delta_p_delta_x · 26m ago
> Excel doesn't support opening CSV files with a double click
Yes, it does. When Excel is installed, it installs a file type association for CSV and Explorer sets Excel as the default handler.
jowea · 45m ago
How is that not opening?
imtringued · 40m ago
You are creating a new spreadsheet that you can save as an xlsx. What you are looking at is not the CSV file itself.
NoboruWataya · 29m ago
This is a distinction that does not matter to most non-technical people.
eviks · 17m ago
That's not true either. Try it yourself with a simple CSV file: open it, add a row, and save. You'll see the original file update.
(there are some limitations)
john_the_writer · 19m ago
Well I mean unless you're inspecting it with a hex editor, you're not looking at the csv file itself. Even then, I suppose you could say that's not even the file itself. An electron microscope perhaps? But then you've got the whole Heisenberg issue, so there's that.
tokai · 29m ago
You are missing the point so hard it's hilarious.
john_the_writer · 40m ago
What are you talking about? Excel opens CSV with zero issues, on Windows and on Mac. On Mac you right-click and "Open with", or you open Excel, click File/Open, and find the CSV. I do the first one a dozen times a day.
1wd · 26m ago
Only if the Windows regional settings' list separator happens to be a comma, which is not the case in most of Europe (even in regions that use the decimal point), so only CSV files with SEP=, as the first line work reliably with Excel.
john_the_writer · 16m ago
Literally did this all day today. Took a csv file, parsed it in elixir, processed it and created a new csv file, then opened that in excel, to confirm the changes. At least 100 times today.
hiAndrewQuinn · 1h ago
I like CSV because its simplicity and ubiquity make it an easy Schelling point in the wide world of corporate communication. Even very non-technical people can, with some effort, figure out how to save a CSV from Excel, and figure out how to open a CSV with Notepad if absolutely necessary.
On the technical side libraries like pandas have undergone extreme selection pressure to be able to read in Excel's weird CSV choices without breaking. At that point we have the luxury of writing them out as "proper" CSV, or as a SQLite database, or as whatever else we care about. It's just a reasonable crossing-over point.
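For instance, a rough sketch of that crossing-over step (file and table names are hypothetical; pandas and the stdlib sqlite3 module assumed):

    import sqlite3
    import pandas as pd

    # Let pandas absorb Excel's quirks: BOM-friendly encoding, sniffed delimiter.
    df = pd.read_csv("export_from_excel.csv", encoding="utf-8-sig",
                     sep=None, engine="python")

    # Write it back out as "proper" CSV...
    df.to_csv("clean.csv", index=False)

    # ...or hand it to SQLite for everything downstream.
    with sqlite3.connect("data.db") as conn:
        df.to_sql("export", conn, if_exists="replace", index=False)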
heresie-dabord · 30m ago
CSV is a flexible solution that is as simple as possible. The next step is JSONL.
https://jsonlines.org/
https://medium.com/@ManueleCaddeo/understanding-jsonl-bc8922...
I am glad that we decided to pick CSV as our default format for health data (even for heavy stuff like raw ECG). Yeah, files were bigger, but clients loved that they could just download them, open in Excel, make a quick chart. Meanwhile other software was insisting on EDF (lighter, sure) but not everything could handle it.
roland35 · 16m ago
To the people saying that "your boss can open it" is a benefit of CSV, well, I have a funny story!
Back in the early 2000s I designed and built a custom data collector for an air force project. It saved data at 100 Hz on an SD card. The project manager loved it! He could pop the SD card out or use the handy USB mass storage mode to grab the CSV files.
The only problem... Why did the data cut off after about 10 minutes?? I couldn't see the actual data collected since it was secret, but I had no issue on my end, assuming there was space on the card and battery life was good.
Turns out, I learned he was using Excel 2003 to open the CSV file. It has a 65,536-row limit (2^16; does that number look familiar?), and at 100 Hz that's about 655 seconds, i.e. just under 11 minutes of data. That took a while to figure out!!
efitz · 31m ago
I don’t think I ever heard anyone say “csv is dead”.
Smart people (that have been burned once too many times) put quotes around fields in csv if they aren’t 100% positive the field will be comma-free, and escape quotes in such fields.
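In Python, for example, that habit is one flag on the stdlib writer (a sketch; csv doubles embedded quotes per the RFC 4180 convention):

    import csv

    rows = [
        ["id", "comment"],
        ["1", 'Said "hello, world", then left'],  # comma and quotes in one field
    ]

    with open("out.csv", "w", newline="") as f:
        # QUOTE_ALL wraps every field in quotes; embedded quotes are doubled.
        csv.writer(f, quoting=csv.QUOTE_ALL).writerows(rows)

    # out.csv now contains:
    # "id","comment"
    # "1","Said ""hello, world"", then left"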
femto · 48m ago
CSV is good for debugging C/C++ real-time signal processing data paths.
Add cout or printf lines, which on each iteration print out relevant intermediate values separated by commas, with the first cell being a constant tag. Provided you don't overdo it, the software will typically still run in real-time. Pipe stdout to a file.
After the fact, you can then use grep to filter tags to select which intermediate results you want to analyse. This filtered data can be loaded into a spreadsheet, or read into a higher level script for analysis/debugging/plotting/... In this way you can reproducibly visualise internal operation over a long period of time and see infrequent or subtle deviations from expected behaviour.
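The analysis side stays just as simple; a sketch of that post-processing step (tag, column names, and file name are all hypothetical; pandas assumed, plotting needs matplotlib):

    import io
    import pandas as pd

    # Equivalent of `grep '^AGC,' run.log`: keep one tag's rows.
    with open("run.log") as f:
        picked = "".join(line for line in f if line.startswith("AGC,"))

    cols = ["tag", "iteration", "input_level", "gain", "output_level"]
    df = pd.read_csv(io.StringIO(picked), names=cols)

    df.plot(x="iteration", y="gain")  # eyeball infrequent deviations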
matt_daemon · 21m ago
Agree this is the main use for it
mcdonje · 1h ago
>Excel hates CSV. It clearly means CSV must be doing something right.
Use tabs as a delimiter and Excel interops with the format as if it were native.
tacker2000 · 1h ago
The problem is that nobody in the real world uses tabs.
Everyone uses , or ; as delimiters and then uses either . or , for decimals, depending on the source.
It shouldn't be so hard to auto-detect these different formats, but somehow, in 2025, Excel still can't do it.
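To be fair, delimiter detection is a solved problem outside Excel; Python's stdlib has shipped it for years (a sketch; note it guesses the separator, not the decimal convention):

    import csv

    with open("mystery.csv", newline="") as f:
        sample = f.read(4096)
        dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")
        f.seek(0)
        rows = list(csv.reader(f, dialect))

    print(dialect.delimiter)  # ',', ';', '\t' or '|'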
pragmatic · 24m ago
Pipe enters the chat.
For whatever reason, pipe seems to be common in health care data.
sfn42 · 41m ago
You don't need to auto-detect the format. The delimiter can be declared at the top of the file, for example as sep=;
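Concretely, the hint is just a literal first line that Excel consumes and hides; a sketch of writing it (as the next reply notes, non-Excel parsers will see that line as a data row):

    # Hypothetical report: semicolons separate fields, so decimal commas survive.
    with open("report.csv", "w", newline="") as f:
        f.write("sep=;\n")           # Excel-only directive, hidden on import
        f.write("name;amount\n")
        f.write("Widget;1.234,50\n")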
yrro · 22m ago
But now that's not CSV. It's CSV with some kind of ad-hoc header...
gentooflux · 1h ago
Use tabs as a delimiter and it's not CSV anymore, that's TSV.
mcdonje · 1h ago
They're essentially the same format. Same with PSV. They're all DSVs. Most arguments for or against one apply to all.
https://en.m.wikipedia.org/wiki/Delimiter-separated_values
If you don't care that much about the accuracy of your data (like only caring about a few decimal places in your floats), you don't generate huge amounts of data, and you don't need to work with it across different tools and pass it back and forth, then yes, CSV CAN be nice.
I wouldn't write it a love letter though. There's a reason that parquet exists.
christophilus · 9m ago
CSV is just a string serialization, so you can represent floats with any accuracy you choose. It’s streamable and compressible, so large files are fine, though maybe not “huge” depending on how you define “huge”. It works fine passing back and forth between various tools, so…
Without more specifics, I disagree with your take.
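Both properties are cheap to demonstrate; a sketch of streaming a compressed CSV one row at a time (stdlib only; file name and handler are hypothetical):

    import csv
    import gzip

    # Constant memory: gzip inflates a block at a time, csv yields a row at a time.
    with gzip.open("big.csv.gz", "rt", newline="") as f:
        for row in csv.reader(f):
            handle(row)  # hypothetical per-row callback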
lan321 · 17m ago
I hate parsing CSV. There are so many different implementations that it's a constant game of cat and mouse. Literally any symbol can be the separator; then the ordering starts getting changed; then, since you have to guess what's where, you go by type, but strings, for example, are sometimes in quotation marks and sometimes not; then you get some decimal split with a comma when the values are also separated with commas, so you have to track what's a separator and what's a decimal comma. Then you get some line with only 2 elements when you expect 7 and have no clue what to do, because there's no documentation for the output and hence for what that line means.
If the CSV is not written by me, it's always an exercise in making things as difficult as possible. It might be a tad smaller as a format, but I find the parsing so painful that you need a really good reason to use it.
Edit: Oh yeah, and some have a header, others don't. And CSV always seems to come from some machine where the techs can come over to do an update and just reorder everything, because fuck your parsing. Then either you get lucky and the parser dies, or, since you don't really have much info, the types just happen to align and you start saving garbage data to your database until a domain expert notices something isn't quite right, and you have to find out when someone last touched the machines and roll back/reparse everything.
xbmcuser · 23m ago
I did not care for the CSV format much till I started using it with LLMs and Python scripts.
It clearly means CSV must be doing something right.
wkat4242 · 57m ago
Especially in Europe because we use the comma as a decimal point. So every csv file opened in Excel is screwed up.
klinch · 1h ago
Hot take: I prefer xlsx over CSV
I used to work on payment infrastructure and whenever a vendor offered us the choice between CSV and some other format we always opted for that other format (often xlsx). This sounds a bit weird, but when using xlsx and a good library for handling it, you never have to worry about encoding, escaping and number formatting.
This is one of these things that sound absolutely wrong from an engineering standpoint (xlsx is abhorrently complex on the inside), but works robustly in practice.
Slightly related: This was a German company, with EU and US payment providers. Also note that Microsoft Excel (and therefore a lot of other tools) produces "semicolon-separated values" files when started on a computer with the locale set to German...
n4r9 · 1h ago
Works okay until someone opens the file in Excel, writes "2-9" into a cell, and saves it without realising it's been changed to "02/09/2025" behind the scenes.
chungy · 46m ago
Wait until you find out that "02/09/2025" is actually 45697 behind the scenes ;)
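You can reproduce that serial yourself; a sketch with Python's datetime (the 1899-12-30 epoch folds in Excel's fictitious 1900-02-29):

    from datetime import date

    EXCEL_EPOCH = date(1899, 12, 30)
    print((date(2025, 2, 9) - EXCEL_EPOCH).days)  # 45697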
cluckindan · 30m ago
And thus, with some semantic leeway, -7 = 45697
olive-n · 1h ago
I'll take csv over xlsx any time.
I work a lot with time series data, and excel does not support datetimes with timezones, so I have to figure out the timezone every time to align with other sources of data.
Reading and writing them is much slower than csv, which is annoying when datasets get larger.
And most importantly, xlsx are way more often fucked up in some way than any other format. Usually, because somebody manually did something to them and sometimes because the library used to write them had some hiccup.
So I guess a hot take indeed.
personalityson · 1h ago
It should have been semicolon from the start
IanCal · 37m ago
There are ASCII characters for field and record delimiters, which would be perfect.
I tried using them once after what felt like an aeon of quoting issues, and the first customer file I had had them randomly appearing in their fields.
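For reference, those are the unit separator (0x1F) and record separator (0x1E); a hand-rolled sketch of using them, including the check the parent's customer file would have tripped:

    FS, RS = "\x1f", "\x1e"  # ASCII unit separator, record separator

    def dumps(rows):
        for row in rows:
            for field in row:
                # The parent's lesson: these bytes do show up in real data.
                if FS in field or RS in field:
                    raise ValueError("separator byte inside a field")
        return RS.join(FS.join(row) for row in rows)

    def loads(blob):
        return [record.split(FS) for record in blob.split(RS)]

    blob = dumps([["a", "b,c"], ["line\nbreak", "d"]])
    assert loads(blob) == [["a", "b,c"], ["line\nbreak", "d"]]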
imtringued · 44m ago
This is the correct take. I've never had any significant problems with xlsx. You may call it abhorrently complex, but for me it is just a standardized way to serialize tabular data via XML.
IanCal · 40m ago
Counterpoint - CSV is absolutely atrocious and should be cast into the Sun.
It's unkillable, like many eldritch horrors.
> The specification of CSV holds in its title: "comma separated values". Okay, it's a lie, but still, the specification holds in a tweet and can be explained to anybody in seconds: commas separate values, new lines separate rows. Now quote values containing commas and line breaks, double your quotes, and that's it. This is so simple you might even invent it yourself without knowing it already exists while learning how to program.
Except that's just one way people do it. It's not universal and so you cannot take arbitrary CSV files in and parse them like this. You can't take a CSV file constructed like this and pass it into any CSV accepting program - many will totally break.
> Of course it does not mean you should not use a dedicated CSV parser/writer because you will mess something up.
Yes, implementers often have.
> No one owns CSV. It has no real specification
Yep. So all these monstrosities in the real world are all... maybe valid? Lots of totally broken CSV files can be parsed as CSV but the result is wrong. Sometimes subtly.
> This means, by extension, that it can both be read and edited by humans directly, somehow.
One of the very common ways they get completely fucked up, yes. Someone goes and sorts some rows and boom: broken, often with unrecoverable data loss. Someone doesn't correctly add or remove a comma. Someone mixes two files that actually have differently encoded text.
> CSV can be read row by row very easily without requiring more memory than what is needed to fit a single row.
CSV must be parsed row by row.
> By comparison, column-oriented data formats such as parquet are not able to stream files row by row without requiring you to jump here and there in the file or to buffer the memory cleverly so you don't tank read performance.
Sort of? Yes if you're building your own parser but who is doing that? It's also not hard with things like parquet.
> But of course, CSV is terrible if you are only interested in specific columns because you will indeed need to read all of a row only to access the part you are interested in.
Or if you're interested in a specific row, because you're going to have to be careful about parsing out every row until you get there.
CSV does not have a row separator. Or rather it does but it also lets you have that row separator appear and not mean "separate these rows" so you can't simply trust it.
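The failure mode is easy to reproduce (a sketch using the stdlib; a naive line split sees four lines where a quote-aware parser sees three rows):

    import csv
    import io

    data = 'id,note\n1,"two\nlines"\n2,ok\n'

    print(data.splitlines())                    # 4 "lines": the quoted newline lies
    print(list(csv.reader(io.StringIO(data))))  # 3 rows: the parser tracks quotes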
> But critics of CSV coming from this set of practices tend to only care about use-cases where everything is expected to fit into memory.
Parquet uses row groups which means you can stream chunks easily, those chunks contain metadata so you can easily filter rows you don't need too.
I much more often need to keep the whole thing in memory working with CSV than parquet. With parquet I don't even need to be able to fit all the rows on disk; I can read just the chunk I want, remotely.
> CSV can be appended to
Yeah that's easier. Row groups means you can still do this though, but granted it's not as easy. *However* I will point out that absolutely nothing stops someone completely borking things by appending something that's not exactly the right format.
> CSV is dynamically typed
Not really. Everything is strings. You can do that with anything else if you want to. JSON can have numbers of any size if you just store them as strings.
> CSV is succinct
Yes, more so than jsonl, but not really more than (you guessed it) parquet. Also it's horrific for compression.
> Reverse CSV is still valid CSV
Get a file format that doesn't absolutely suck and you can parse things in reverse if you want. More usefully you can parse just sections you actually care about!
> Excel hates CSV
Helpfully this just means that the most popular way of working with tabular data in the world doesn't play that nicely with it.
cluckindan · 28m ago
Or just forget about quotes, which open a new can of worms, and use TSV while escaping newlines and tabs in values.
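A sketch of that scheme, hand-rolled (escape the backslash first, then the two characters that carry structural meaning):

    def escape(field: str) -> str:
        return (field.replace("\\", "\\\\")
                     .replace("\t", "\\t")
                     .replace("\n", "\\n"))

    def unescape(field: str) -> str:
        out, i = [], 0
        while i < len(field):
            if field[i] == "\\" and i + 1 < len(field):
                out.append({"t": "\t", "n": "\n", "\\": "\\"}[field[i + 1]])
                i += 2
            else:
                out.append(field[i])
                i += 1
        return "".join(out)

    row = ["multi\nline", "tab\there"]
    line = "\t".join(escape(f) for f in row)  # one record is always one line
    assert [unescape(f) for f in line.split("\t")] == row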