DuckDB is probably the most important geospatial software of the last decade

220 points · dbreunig · 71 comments · 5/3/2025, 7:30:38 PM · dbreunig.com

Comments (71)

Demiurge · 5h ago
> Prior to this, getting up and running from a cold-start might’ve required installing or even compiling several OSS packages, carefully noting path locations, standing up a specialized database… Enough work that a data generalist might not have bothered, or their IT department might not have supported it.

I've been able to "CREATE EXTENSION postgis;" for more than a decade. There have been spatial extensions for PG, MySQL, Oracle, MS SQL Server, and SQLite for a long time. DuckDB doesn't make any material difference in how easy it is to install.

wenc · 5h ago
That requires data to already be in Postgres, otherwise you have to ETL data into it first.

DuckDB, on the other hand, works with data as-is (Parquet, TSV, SQLite, Postgres... whether on disk, S3, etc.) without requiring an ETL step (though if the data isn't already in a columnar format, things are gonna be slow... but it will still work).

I work with Parquet data directly with no ETL step. I can literally drop into Jupyter or a Python REPL and run duckdb.query("from '*.parquet'")

Correct me if I'm wrong, but I don't think that's possible with Postgis. (even pg_parquet requires copying? [1])

[1] https://www.crunchydata.com/blog/pg_parquet-an-extension-to-...

Demiurge · 4h ago
Yeah, if you want to work with GeoParquet and keep your data in that format, I can see how your example is easier to use. But that's not the format a lot of geospatial data is in. You might have shapefiles, GeoPackages, GeoJSON, who knows? There is a lot of software, from QGIS to ESRI, for working with different formats to solve different problems. I don't think GeoParquet, even though it might be the fastest geospatial vector data format right now, is that common, and the article did not claim that either. So for an average user trying to answer some GIS question, some ETL is pretty much a given. And given that, installing PostGIS and installing DuckDB both require some ETL and learning some query and analytics language. DuckDB might be an improvement, but it's certainly not as much of a leap as the quote makes it out to be.
jeffbee · 3h ago
Yeah, just an example of a QoL issue with DuckDB: even though it can glob files in other cases, the way it passes parameters to GDAL means that globs are taken literally instead of expanded. So I can't query a directory with thirty million geojson files. This is not a problem in geopandas because ipython, being a full interactive development environment, allows me to produce the glob any way I choose.

I think this is a fundamental problem with the SQL pattern. You can try to make things just work, but when they fail, then what?
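One workaround in the spirit of the geopandas approach is to expand the glob in Python and hand DuckDB explicit paths, so GDAL never sees a wildcard. A minimal sketch, assuming DuckDB spatial's `ST_Read` table function and DuckDB's `UNION ALL BY NAME`; `build_union_query` and `sql_quote` are hypothetical helpers, and for millions of files you would batch the path list rather than build one enormous query.

```python
import glob


def sql_quote(path: str) -> str:
    """Quote a path as a SQL string literal, doubling any embedded single quotes."""
    return "'" + path.replace("'", "''") + "'"


def build_union_query(pattern: str) -> str:
    """Expand `pattern` in Python and stitch together one ST_Read call per file.

    Hypothetical helper: each ST_Read receives a literal path, so the
    glob is resolved by Python rather than passed through to GDAL.
    """
    paths = sorted(glob.glob(pattern))
    if not paths:
        raise FileNotFoundError(f"no files match {pattern!r}")
    selects = [f"SELECT * FROM ST_Read({sql_quote(p)})" for p in paths]
    # BY NAME aligns columns by name, tolerating files whose column order differs
    return "\nUNION ALL BY NAME\n".join(selects)
```

The resulting string could then be run with something like `duckdb.sql(...)` after `LOAD spatial`.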

maxxen · 3h ago
I think this is just because it hasn't been implemented in spatial yet. DuckDB is currently going through a pretty big refactor of the way we glob/scan/union multiple files, with all the recent focus on data lake formats, but my plan is to get to it in spatial after the next release, when that part of the code has stabilized a bit.
edoceo · 4h ago
Not wrong. Load to PG, then query. DuckDB's UVP is like bringing 8 common tools/features under one tent.
wenc · 7h ago
I'm a big fan of DuckDB and I do geospatial analysis, mostly around partitioning geographies (into Uber H3 hexagons), calculating Haversine distances, calculating areas of geometries, figuring out which geometry a point falls in, etc. Many of these features have existed in some form or other in geopandas or postgis, so DuckDB's spatial extensions bring nothing new.
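"Figuring out which geometry a point falls in" is a point-in-polygon test. A minimal planar sketch using the even-odd ray-casting rule, assuming simple polygons given as lists of (x, y) vertices; engines like GEOS also handle holes, boundary cases, and spatial indexes, which this does not. The function names are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(pt: Point, ring: Sequence[Point]) -> bool:
    """Even-odd ray casting: cast a ray toward +x and count edge crossings."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        # Edge straddles the horizontal line through pt (half-open rule avoids double-counting vertices)
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def which_polygon(pt: Point, polygons: Dict[str, Sequence[Point]]) -> Optional[str]:
    """Return the name of the first polygon containing pt, or None."""
    for name, ring in polygons.items():
        if point_in_polygon(pt, ring):
            return name
    return None
```

Done naively this is a scan over every polygon per point; a spatial index is what makes it fast at scale.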

But what DuckDB as an engine does is it lets me work directly on parquet/geoparquet files at scale (vectorized and parallelized) on my local desktop. It beats geopandas in that respect. It's a quality of life improvement to say the least.

DuckDB also has an extension architecture that admits more exotic geospatial features, like Hilbert curves and Uber H3 support.
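For context on Hilbert curves: a Hilbert index maps a 2D grid cell to a 1D position so that cells near each other on the curve are near each other in space, which is why it is used to sort or partition spatial data. Below is the classic iterative xy-to-index algorithm (as commonly described, e.g. on Wikipedia), a sketch that assumes coordinates are already quantized to an n-by-n integer grid with n a power of two. It is not the implementation of any DuckDB extension.

```python
def hilbert_index(n: int, x: int, y: int) -> int:
    """Map cell (x, y) on an n-by-n grid (n a power of two) to its Hilbert-curve distance.

    At each scale s, pick the quadrant, add the number of cells in the
    quadrants visited before it, then rotate/flip so the sub-square is
    traversed in the canonical orientation.
    """
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a positive power of two")
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate the quadrant so the next level uses the standard orientation
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d
```

Sorting rows by this index (after scaling lng/lat onto the grid) tends to put spatially close records in the same file or row group.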

https://duckdb.org/docs/stable/extensions/spatial/functions....

https://duckdb.org/community_extensions/extensions/h3.html

larsiusprime · 7h ago
“import geopandas” also exists and has for some time. Snark aside, WHAT is special about duckDB? I wish the author had actually shown some practical examples so I could understand their claims better.
maxxen · 6h ago
I replied to another comment, but I think a big part is that DuckDB's spatial extension provides a SQL interface to a whole suite of standard FOSS GIS packages by statically bundling everything (including inlining the default PROJ database of coordinate projection systems into the binary) and providing it for multiple platforms (including WASM). I.e., there are no transitive dependencies except libc.

Yes, DuckDB does a whole lot more: vectorized larger-than-memory execution, columnar compressed storage, and an ecosystem of other extensions that make it more than the sum of its parts. But while I've been working hard on making the spatial extension more performant and more broadly useful (I designed a new geometry engine this year, and spatial join optimization just got merged on the dev branch), the fact that you can, e.g., convert to and from a myriad of different geospatial formats by utilizing GDAL, transform through SQL, or pull down the latest Overture dump without having the whole workflow break just because you updated QGIS has probably been the main killer feature for a lot of the early adopters.

(Disclaimer: I work on duckdb-spatial @ duckdblabs)

larsiusprime · 6h ago
This is an excellent reply and what I wish the article had been, thanks!
vemom · 6h ago
Oh yeah the article does stop abruptly after "2 lines to install"
timschmidt · 6h ago
I'm not the OP, but thank you for such a detailed answer. The integration and reduced barriers to entry you mention mirror my own experiences with tooling in another area, and your explanation made parallels clear.
jjtheblunt · 6h ago
DuckDB has Parquet support and can operate, in SQL syntax, on enormous 'tables' spread across huge collections of Parquet files as if they were one virtual file. I believe the underlying implication is opportunities to leverage vector instructions on Parquet. It's very "handy".
dbreunig · 7h ago
Author here: what's special is that you can go from 0 to spatial data incredibly quickly, in the data generalist tool you're already using. It makes the audience of people working with geospatial data much bigger.

(Geopandas is great, too.)

dopidopHN · 6h ago
I’m very familiar with Postgres and spinning one with postgis seems easy enough. Do I get more with duckdb?

Most of the time I store locations and compute distances to them. Would that be faster to implement with DuckDB?

wenc · 5h ago
Probably no difference for your use-case (ST_Distance). If you already have data in Postgres, you should continue using Postgis.

In my use case, I use DuckDB because of speed at scale. I have 600GBs of lat-longs in Parquet files on disk.

If I wanted to use Postgis, I would have to ingest all this data into Postgres first.

With DuckDB, I can literally drop into a Jupyter notebook, and do this in under 10 seconds, and the results come back in a flash: (no need to ingest any data ahead of time)

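  import duckdb
  # INSTALL fetches the extension once; LOAD enables it for this session
  duckdb.query("INSTALL spatial; LOAD spatial;")
  # query the Parquet files in place via a glob, with no ingest step
  duckdb.query("select ST_DISTANCE(ST_POINT(lng1, lat1), ST_POINT(lng2, lat2)) dist from '/mydir/*.parquet'")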
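One caveat on that snippet: with raw lng/lat points, `ST_DISTANCE` is a planar calculation, so the result is in degrees rather than meters (check the spatial extension's docs for its sphere/spheroid distance functions and their expected axis order). To show the gap, here is a plain-Python comparison of the planar "degree distance" against a haversine great-circle distance, assuming a spherical Earth with mean radius 6,371,008.8 m; the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation


def planar_degrees(lng1, lat1, lng2, lat2):
    """What a planar distance on raw lng/lat gives: Euclidean distance in degrees."""
    return math.hypot(lng2 - lng1, lat2 - lat1)


def haversine_m(lng1, lat1, lng2, lat2):
    """Great-circle distance in meters on a sphere of radius EARTH_RADIUS_M."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
```

One degree of longitude is "1.0" in planar terms both at the equator and at 60°N, but roughly 111 km versus 56 km on the sphere, so ranking or thresholding on the planar value can mislead away from the equator.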
vasco · 18m ago
I haven't yet understood this pattern (and I tried using DuckDB). Unless you're only ever going to query those files once or twice in your life, importing them into Postgres shouldn't take that long, and then you can do the same as, or more than, with DuckDB.

Also, as a side note, is everyone just using DuckDB in memory? As soon as you want multi-session access, I'd assume you'd use DuckDB on top of a local database, so again I don't see the point, but I'm sure I'm missing something.

getnormality · 6h ago
Everything is special about DuckDB. Pandas is way, way behind the state of the art in tabular data analysis.

joshvm · 7h ago
I haven't used DuckDB, but presumably the real comparison is PostGIS? That's also absent from the discussion, but I think it's what the author alludes to.

I have no major qualm with pandas and geopandas. However, I use them when they're the only practical solution, not because I enjoy using them as libraries. It sounds like pandas (or similar) vs a database?

stevage · 4h ago
Yeah, PostGIS is readily available, and postgres is much more widely used than DuckDB. Either I don't understand OP's argument for why this is so important or I just don't buy it.

If you're using JavaScript you install Turf. The concept that you can readily install spatial libraries is hardly earth shattering.

tom_m · 4h ago
Convenience will always be a personal preference.
tsss · 6h ago
For one it doesn't have the god awful pandas API.
tmpz22 · 6h ago
I've been researching DuckDB - while it has many technical merits I think the main argument will be ease of use. It has a lot of the operational advantages of sqlite paired with strong extensibility and good succinct documentation.

Folks who have been doing DevOps work are exasperated with crummy SaaS vendors or antiquated OSS options that have a high setup cost. DuckDB is just a mature project that offers an alternative, hence an easy fan favorite among hobbyists (I imagine at scale the opportunity costs change and it becomes less attractive).

wenc · 5h ago
How is the adoption among DevOps folks?

I'm still getting feedback that many devs are not too comfortable with reading and writing SQL. They learned simple SELECT statements in school, but get confused by JOINs and GROUP BYs.

vasco · 16m ago
There's no point in learning much deeper SQL anymore; AI assistants have largely solved SQL querying. Just ask for what you want in natural language.
edoceo · 4h ago
Random voice here: they should get better at SQL. Not 9 joins and GROUP BY and HAVING and other magic, but two joins and GROUP BY for sure. If one already gets 3NF, then joins and the rest are a quick (two-week) learn.

I'd pick that before traveling the DuckDB path.
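For a sense of the "two joins and a GROUP BY" level being recommended, here is a small sketch using Python's built-in `sqlite3` (not DuckDB or PostGIS, to stay dependency-free). The tables `regions`, `stores`, and `sales` and their rows are hypothetical sample data.

```python
import sqlite3


def revenue_by_region(conn: sqlite3.Connection) -> list:
    """Two joins plus a GROUP BY: sale count and total revenue per region, largest first."""
    query = """
        SELECT r.name AS region, COUNT(*) AS n_sales, SUM(s.amount) AS revenue
        FROM sales AS s
        JOIN stores AS st ON st.id = s.store_id
        JOIN regions AS r ON r.id = st.region_id
        GROUP BY r.name
        ORDER BY revenue DESC
    """
    return conn.execute(query).fetchall()


def demo_connection() -> sqlite3.Connection:
    """Build an in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE regions (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE stores (id INTEGER PRIMARY KEY, region_id INTEGER REFERENCES regions(id));
        CREATE TABLE sales (id INTEGER PRIMARY KEY, store_id INTEGER REFERENCES stores(id), amount REAL);
        INSERT INTO regions VALUES (1, 'North'), (2, 'South');
        INSERT INTO stores VALUES (10, 1), (11, 1), (20, 2);
        INSERT INTO sales (store_id, amount) VALUES (10, 5.0), (11, 7.5), (20, 3.0), (20, 4.0);
    """)
    return conn
```

The same query shape carries over to DuckDB or Postgres unchanged.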

Yeroc · 3h ago
Since when is strong SQL knowledge not a core skill for developers? I suppose it's the rise of frontend vs backend specialization that is the cause?
jparishy · 7h ago
I work on geospatial apps and the software I think I am most excited about is https://felt.com/. I want to see them expand their tooling such that maps and data source authentication/authorization was controllable by the developer, to enable tenant isolation with proprietary data access. They could really disrupt how geospatial tech gets integrated into consumer apps.

This article doesn't acknowledge how niche this stuff is, and it's a lot of training to get people up to speed on coordinate systems, projections, transformations, etc. I would replace a lot of my custom-built mapping tools with Felt if it were possible, so I could focus on our core geospatial processes and not the code to display and play with it in the browser, which is almost as big, if not bigger, in terms of LOC to maintain.

As mentioned by another commenter, this DuckDB DX as described is basically the same as PostGIS too.

jandrewrogers · 4h ago
> it's a lot of training to get people up to speed on coordinate systems, projections, transformations, etc

This can mostly be avoided entirely with a proper spheroidal reference system, computational geometry implementation, and indexing. Most uses of geospatial analytics are not cartographic in nature. The map is at best a presentation layer, it is not the data model, and some don’t use a map at all. Forcing people to learn obscure and esoteric cartographic systems to ask simple intuitive questions about geospatial relationships is a big part of the problem. There is no reason this needs to be part of the learning curve.

I’ve run experiments on unsophisticated users a few times with respect to this. If you give them a fully spheroidal WGS84 implementation for geospatial analytics, it mostly “just works” for them anywhere on the globe and without regard for geospatial extent. Yes, the software implementation is much less trivial but it is qualitatively superior UX because “the world” kind of behaves how people intuit it should without having to know anything about projections, transforms, etc. And to be honest, even if you do know about projections and transforms, the results are still often less than optimal.
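A small illustration of why a globe-native model "just works": compute distances from 3D unit vectors instead of projected x/y, so there is no seam at ±180° longitude. This sketch assumes a perfect sphere rather than the WGS84 ellipsoid described above, so treat it as a simplification; the helper names are hypothetical.

```python
import math

R_M = 6_371_008.8  # mean Earth radius in meters (spherical simplification)


def to_unit_vector(lng_deg, lat_deg):
    """Convert lng/lat in degrees to an Earth-centered 3D unit vector."""
    lng, lat = math.radians(lng_deg), math.radians(lat_deg)
    return (math.cos(lat) * math.cos(lng), math.cos(lat) * math.sin(lng), math.sin(lat))


def central_angle(u, v):
    """Angle between unit vectors via atan2(|u x v|, u . v), stable for small and large angles."""
    cx = u[1] * v[2] - u[2] * v[1]
    cy = u[2] * v[0] - u[0] * v[2]
    cz = u[0] * v[1] - u[1] * v[0]
    dot = u[0] * v[0] + u[1] * v[1] + u[2] * v[2]
    return math.atan2(math.sqrt(cx * cx + cy * cy + cz * cz), dot)


def sphere_distance_m(lng1, lat1, lng2, lat2):
    """Great-circle distance in meters; longitude wraparound needs no special case."""
    return R_M * central_angle(to_unit_vector(lng1, lat1), to_unit_vector(lng2, lat2))
```

Two points at 179.5°E and 179.5°W on the equator are 359° apart in naive longitude arithmetic, but about 111 km apart here, and the poles need no special handling either.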

The only issue that comes up is that a lot of cartographic visualization toolkits are somewhat broken if you have global data models or a lot of complex geometry. Lots of rendering artifacts. Something else to work on I guess.

maxxen · 4h ago
I'm inclined to agree, but unfortunately a huge amount of the existing data and processes in this space do not assume a spheroidal Earth and instead come with a coordinate reference system. Ultimately, there are also some domains where you have data that you explicitly don't want to interpret using spheroidal semantics, e.g. when working with a city plan, in which case the map _is_ the data model, and you definitely want the angles of a triangle to sum to 180.
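To make the triangle point concrete: on a sphere, a triangle's angles sum to more than 180°. A minimal sketch, assuming a unit sphere and computing each vertex angle from tangent-plane directions; the function names are hypothetical. The octant triangle (North Pole, 0°E on the equator, 90°E on the equator) has three right angles, 270° in total, while a tiny triangle comes out almost exactly 180°.

```python
import math


def unit(lng_deg, lat_deg):
    """lng/lat in degrees to a 3D unit vector."""
    lng, lat = math.radians(lng_deg), math.radians(lat_deg)
    return (math.cos(lat) * math.cos(lng), math.cos(lat) * math.sin(lng), math.sin(lat))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def vertex_angle(a, b, c):
    """Angle in degrees at vertex a between the great-circle arcs a->b and a->c."""
    # Project b and c onto the tangent plane at a to get each arc's initial direction
    tb = tuple(bi - dot(a, b) * ai for ai, bi in zip(a, b))
    tc = tuple(ci - dot(a, c) * ai for ai, ci in zip(a, c))
    n = cross(tb, tc)
    return math.degrees(math.atan2(math.sqrt(dot(n, n)), dot(tb, tc)))


def spherical_angle_sum(p, q, r):
    """Sum of interior angles of the spherical triangle with (lng, lat) vertices p, q, r."""
    a, b, c = unit(*p), unit(*q), unit(*r)
    return vertex_angle(a, b, c) + vertex_angle(b, c, a) + vertex_angle(c, a, b)
```

The excess over 180° grows with the triangle's area, which is why a city-scale plan can safely stay planar.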
jparishy · 4h ago
Similarly, I see it as inevitable that you will deal with a problem related to these issues, and then hit a massive blocker if you aren't already familiar with the details, because debugging hard problems requires more than a cursory understanding. Otherwise, you don't know the problems exist and the code is broken. I don't think it's as simple as abstracting away the entire concept, which is what I would call too high level, as I mentioned above. I don't know the right answer here, honestly; I think it will be disruptive when someone figures it out.
groggo · 4h ago
> the software implementation is much less trivial

Aren't most geospatial tools just doing simple geometry? And therefore need to work on some sort of projection?

If you can do the math on the spheroidal model, OK, you get better results and it's easier to intuit, like you said, but it's much more complicated math. Can you actually do that today with tools like QGIS and GDAL?

jandrewrogers · 1h ago
Many do use simple geometry. This causes endless headaches for people who are not cartographers, they don’t expect that. The good geospatial tools usually support spheroidal models but it is not the default, you have to know to explicitly make sure it uses that (many people assume that is the default).

An additional issue is that the spheroidal implementations have undergone very little optimization, perhaps because they are not the defaults. So when people figure out how to turn them on, performance is suddenly terrible. Now you have people that believe spheroidal implementations are terribly slow, when in reality they just used a pathologically slow implementation. Really good performance-engineered spheroidal implementations are much faster than people assume based on the performance of open source implementations.

Demiurge · 2h ago
This is not really a problem, unless you’re trying to simulate some 3D space orbits, physics. The crossover from geo INFORMATION systems to geo simulation systems is a bit rough, but the projections and calculations on projected cartesian space are enough for many typical questions, like distance, area, routing. However, even topology support starts getting specialized, and the use cases are more niche. I think it’s asking a bit too much from a database/storage layer to do efficient calculations outside of those supported by GEOS. At this point, you might want to import the relevant data into higher level applications.
jandrewrogers · 54m ago
Speaking for myself, I was not referring to any kind of simulation systems. This is a standard requirement of many operational geospatial data models, and there are a lot of these in industry. Anything that works from a projection is a non-starter, this causes demonstrable issues for geospatial analysis at scale or if any kind of precision is required. Efficient calculation just means efficient code, there is nothing preventing this from existing in open source beyond people writing it. Yes, you may be able to get away with it if your data model is small, both geographically and data size, but that does not describe every company.

It is entirely possible to do this in databases. That is how it is actually done. The limitations of GEOS are not the limitations of software, it is not a particularly sophisticated implementation (even PostGIS doesn’t use it for the important parts last I checked). To some extent you are affirming that there is a lack of ambition in this part of the market in open source.

dbreunig · 7h ago
Author here: the beauty of DuckDB spatial is that the projections and CRS options are hidden until you need them. For 90% of geospatial data usage people don't and shouldn't need to know about projections or CRS.

Yes, there are so many great tools to handle the complexity for the capital-G Geospatial work.

I love Felt too! Sam and team have built a great platform. But lots of times a map isn't needed; an analyst just needs it as a column.

PostGIS is also excellent! But having to start up a database server to work with data doesn't lend itself to casual usage.

The beauty of DuckDB is that it's there in a moment and in reach for data generalists.

korkoros · 6h ago
My experience has been that data generalists should stay away from geospatial analysis precisely because they lack a full appreciation of the importance of spatial references. I've seen people fail at this task in so many ways. From "I don't need a library to reproject, I'll just use a haversine function" to "I'll just do a spatial join of these address points in WGS84 to these parcels in NAD27" to "these North Korean missiles aren't a threat because according to this map using a Mercator projection, we are out of range."

DuckDB is great, but the fact that it makes it easier for data generalists to make mistakes with geospatial data is a mark against it, not in its favor.

jparishy · 6h ago
I think we're mostly making the same point about complexity, ya.

To me, I think it's mostly a frontend problem stopping the spread of mapping in consumer apps. Backend geo is easy tbh. There is so much good, free tooling. Mapping frontend is hell and there is no good off the shelf solution I've seen. Some too low level, some too high level. I think we need a GIS-lite that is embeddable to hide the complexity and let app developers focus on their value add, and not paying the tax of having frontend developers fix endless issues with maps they don't understand.

edit: to clarify, I think there's a relationship between getting mapping valued by leadership, such that the geo work can even be done by analysts, and having more mapping tools exist in frontend apps, such that those leaders see them and understand why geo matters. It needs to be more than just markers on the map, with broad exposure; hence my focus on frontend web. Sorry if that felt disjointed.

dbreunig · 6h ago
Not disjointed at all. That last topic is the big challenge to solve.
stevage · 4h ago
I was just about to get into Felt then they took away the free tier and made it very expensive.
mtmail · 4h ago
https://atlas.co/ still has a free tier. Fewer features, I think; it depends on your use case, of course.
wodenokoto · 8h ago
I’m not sure I agree that “install geospatial” is a game changer in simplicity compared to “pip install geopandas”.

They are both one line.

maxxen · 7h ago
I think a big part is that DuckDB's spatial extension doesn't have any transitive dependencies (except libc). It statically packages the standard suite of FOSS GIS tools (including a whole database of coordinate systems) for multiple platforms (including WASM) and provides a unified SQL interface to it all.

(Disclaimer, I work on duckdb-spatial @duckdblabs)

carlhjerpe · 5h ago
"Accessibility" is too often dismissed. Yes, you CAN do things with things, but getting people to do it is a craft and an art. This is often where open-ish-core differs from the enterprise version of something, too.
WD-42 · 7h ago
Is it that much simpler than ‘load extension postgis’? I know geos and gdal have always kinda been a pain, but I feel like docker has abstracted it all away anyway. ‘docker pull postgis’ is pretty easy, granted I’m not familiar with what else duckdb offers.
dbreunig · 6h ago
Yes. The difference between provisioning a server and running 'install spatial' in a CLI is night and day.

Docker has been a big improvement (when I was first learning PostGIS, the amount of time I had to hunt for proj directories or compile software just to install the plugin was a major hurdle), but it's many steps away from:

```
$ duckdb
D install spatial;
```

Demiurge · 5h ago
What do you mean by "provisioning a server"? That's a strange requirement. You can install PostGIS on a MacBook in one command, or actually on all 3 major OSes in one command: "brew install postgis", "apt-get install postgresql-postgis", and "choco install postgis-9.3". Does DuckDB not require a "server" or a "computer"? What does Docker have to do with anything? This is a very confusing train of thought.

frainfreeze · 6h ago
I mean, I like DuckDB, but this feels like you're pushing for it. On my system PostGIS comes from apt install, and it's one command to activate the "plugin". Is the night-and-day part not having to run a random sh script from the internet to install software on my system?
dbreunig · 6h ago
That’s great! The difference is you’re familiar and know how to do that

Getting started from 0 with geo can be difficult for those unfamiliar. DuckDB packages everything into one line with one dependency.

tomnipotent · 6h ago
DuckDB doesn't require a running server. I run duckdb in a terminal, query 10,000 CSV or parquet files and run SQL on them while joining to data hosted in sqlite, a separate duckdb file using its native format, or even Postgres.
frainfreeze · 6h ago
It is not simpler. I use it with testcontainers in the notebooks usually https://testcontainers-python.readthedocs.io/en/latest/
patja · 7h ago
SQL Server has geospatial capabilities without any extensions or add-ons. I've been happily using geospatial datatypes on the free Express version for years, probably well over a decade.
twelvechairs · 7h ago
DuckDB is a great thing for geospatial, but the most important of the past decade? There are so many tools in different categories; it wouldn't come near the top for me. Some might be QGIS, PostGIS (still the standard), ArcGIS Online (still the standard), JS mapping tools like Mapbox (I prefer deck.gl), new data types like COG, GeoPackage and GeoParquet, photogrammetry tools, 3D Tiles, core libraries like GDAL and now PDAL, Shapely, etc.
dbreunig · 6h ago
Most of those tools came out around 2000.

Yeah, I feel old.

badmonster · 7h ago
How might embedding spatial capabilities directly into general-purpose data tools like DuckDB reshape who participates in geospatial analysis—and what kinds of problems they choose to solve?
fidotron · 7h ago
Honestly, I think it's actually https://www.uber.com/en-CA/blog/h3/
jsemrau · 4h ago
Makes one wonder if the YOLO algorithm would work better with hexagons.

"Hexagons were an important choice because people in a city are often in motion, and hexagons minimize the quantization error introduced when users move through a city. Hexagons also allow us to approximate radiuses easily, such as in this example using Elasticsearch."

[Edit] Maybe https://www.researchgate.net/publication/372766828_YOLOv8_fo...
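The quoted hexagon argument is about quantization: snapping a continuous position to the nearest cell. A minimal planar sketch of hexagonal binning for pointy-top hexagons in axial coordinates, using the cube-rounding method popularized by Red Blob Games. This is not H3, which tiles the globe via an icosahedral projection; `size` (center-to-corner distance) and the function names are assumptions for illustration.

```python
import math

SQRT3 = math.sqrt(3.0)


def hex_center(q, r, size):
    """Planar center of axial hex (q, r) for pointy-top hexagons with circumradius `size`."""
    return (size * (SQRT3 * q + SQRT3 / 2 * r), size * 1.5 * r)


def cube_round(qf, rf):
    """Round fractional axial coords to the nearest hex, keeping q + r + s == 0."""
    sf = -qf - rf
    q, r, s = round(qf), round(rf), round(sf)
    dq, dr, ds = abs(q - qf), abs(r - rf), abs(s - sf)
    # Recompute whichever component had the largest rounding error from the other two
    if dq > dr and dq > ds:
        q = -r - s
    elif dr > ds:
        r = -q - s
    return q, r


def point_to_hex(x, y, size):
    """Quantize a planar point to the axial coordinates of the hex containing it."""
    qf = (SQRT3 / 3 * x - y / 3) / size
    rf = (2 / 3 * y) / size
    return cube_round(qf, rf)
```

Every neighbor of a hex shares an edge and sits at the same center-to-center distance, which is the property the quote leans on; square grids have edge and corner neighbors at different distances.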

jandrewrogers · 4h ago
I think geospatial analytics is important (because of course I would), but to be frank geospatial software has been stagnant for a long time. Every new thing is just a fresh spin on the same stagnant things we already have. This more or less says exactly this?

For geospatial analysis, the most important thing that could happen in software would be no longer treating it, either explicitly or implicitly, as having anything to do with cartography. Many use cases are not remotely map-driven but the tools require users to force everything through the lens of map-making.

dbreunig · 4h ago
I was struck by this as people suggest alternatives that refute the headline (QGIS, PostGIS, GDAL, etc): nearly every one emerged in the early 2000s.

Strongly agree with your sentiment around maps: most people can’t read them, they color the entire workflow and make it more complex, and (imo) lead to a general undervaluing of the geospatial field. Getting the data into columns means it’s usable by every department.

azinman2 · 4h ago
Can you give some examples?
jandrewrogers · 4h ago
Of the stagnation? I’ve been doing geospatial analytics for over 20 years and shockingly little has changed, both in features and capability. Given the amount of time that has passed and the vastly expanded scope of the geospatial data models people are working with today, I think most people would expect more to have changed.
azinman2 · 2h ago
No I meant to this: “Many use cases are not remotely map-driven but the tools require users to force everything through the lens of map-making.”
WD-42 · 5h ago
Uh-oh, another pushover-licensed database. I wonder when it will begin its own Redis saga.
bingaweek · 7h ago
We need a "come on" clause for these absurd headlines. Come on.

cyanydeez · 6h ago
No. QGIS is.

Good god.

dbreunig · 6h ago
Author here.

QGIS is amazing. It's really great. It also came out in 2002, so I think the headline is safe.

cyanydeez · 4h ago
Nope, it's constantly being improved and still wins the decade. Do you disqualify it because it existed? Re-read your headline, it's definitely not qualifying what you think it's qualifying.
jeffbee · 8h ago
Ehh I tried to do some spatial stuff but there just wasn't enough there, or I could not figure out how to use it. Loading spatial information into ipython and fiddling with it is well-traveled and it doesn't seem to me that SQL is an inherently lower hurdle for the user.
fithisux · 9h ago
I agree