How to build vector tiles from scratch

107 ajd555 32 9/4/2025, 12:40:47 PM debuisne.com ↗

Comments (32)

stevage · 1d ago
>The spec refers to 4096 tile sizes, but MapLibre seems to use 512 pixels,

I think OP is maybe confusing two different things. The 4096 is, IIRC, not "pixels" but rather "units of precision". They are saying that within a vector tile, the far left is 0, the far right is 4095, with 4096 discrete distance values between.

The 512 is about the portion of the map that is occupied by a vector tile, in "pixels". So if you have a vector tile with 4096 units of precision crammed into 512 pixels, that means you actually get a bit of sub-pixel precision, which is really helpful when you overzoom - you can stretch that same vector tile out over a bigger and bigger area as you zoom in.

This matters because generally you generate vector tiles to some finite depth (eg, zoom 13) but then you want to be able to display them through to say zoom 17, which means the tile is going to be stretched 4 more zoom depths, so 2^4 (16) times wider/higher.
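
To put rough numbers on that, here is a small sketch of the arithmetic. The function name and defaults are made up for illustration; only the 4096 extent, the 512-pixel display size and the zoom 13 to 17 example come from the comment above.

```python
def units_per_pixel(extent: int = 4096, tile_px: int = 512, overzoom: int = 0) -> float:
    """Tile-coordinate units available per screen pixel.

    `overzoom` is how many zoom levels past its native zoom the tile is
    being displayed; each level doubles the on-screen width and height.
    """
    displayed_px = tile_px * 2 ** overzoom
    return extent / displayed_px


# A zoom-13 tile shown at zoom 13 through zoom 17 (overzoom 0..4).
for dz in range(5):
    print(f"overzoom {dz}: {units_per_pixel(overzoom=dz)} units per pixel")
```

With these assumptions the tile starts at 8 units per pixel and reaches 0.5 at four levels of overzoom, which is where the coordinate grid becomes coarser than the screen.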

ajd555 · 1d ago
Ah, I was missing that bit of information, and it explains why I was seeing these two values. Thank you stevage, I'll update the post with this info and cite you.
stevage · 1d ago
Vector tiles are super un-intuitive. They took me a very long time to get my head around, and I still feel shaky on some of the details.

FYI you might enjoy this talk I gave a few years back: https://www.youtube.com/watch?v=Zuo80hZYA1k

ajd555 · 1d ago
Thank you - I will watch that with my coffee. I also realize this means I set the extent to 512 in the mvt file - that means I won't benefit from that subpixel precision you mentioned - I'll update the code and test it out
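
For anyone following along, a rough sketch of what the extent changes when a point is encoded. This is not the code from the post (which is in Go); the function name is hypothetical and it only uses the standard Web Mercator tiling formulas.

```python
import math


def lonlat_to_tile_coords(lon: float, lat: float, z: int, extent: int = 4096):
    """Return (tile_x, tile_y, local_x, local_y) for a WGS84 point at zoom z.

    local_x / local_y are integer coordinates inside the tile, measured
    from its top-left corner on a grid of `extent` units per side.
    """
    n = 2 ** z
    # Fractional tile position in Web Mercator.
    fx = (lon + 180.0) / 360.0 * n
    fy = (1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n
    # Clamp so points on the east/south edge stay in the last tile.
    tx = min(max(int(math.floor(fx)), 0), n - 1)
    ty = min(max(int(math.floor(fy)), 0), n - 1)
    # Quantize the position inside the tile to the extent grid.
    local_x = int((fx - tx) * extent)
    local_y = int((fy - ty) * extent)
    return tx, ty, local_x, local_y


# Same point, two extents: the tile is identical, only the grid resolution differs.
print(lonlat_to_tile_coords(-74.0060, 40.7128, 13, extent=512))
print(lonlat_to_tile_coords(-74.0060, 40.7128, 13, extent=4096))
```

With extent 512 every position is snapped to one of 512 steps across the tile; with 4096 there are eight times as many steps per axis, which is the sub-pixel headroom discussed above.
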
kocheez75 · 1d ago
If your data is already in GeoJSON, you can use tippecanoe to produce a tileset in either MBTiles or PMTiles format: https://github.com/felt/tippecanoe.

Or if your data is more dynamic, you can also use the ST_AsMVT function in PostGIS, which will generate tiles on-the-fly: https://postgis.net/docs/ST_AsMVT.html
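
A sketch of what such a query can look like, wrapped in a Python helper so the tile address is passed as parameters. The table name, the `name` column and the `geom` column are assumptions for illustration; the PostGIS functions (ST_TileEnvelope, ST_AsMVTGeom, ST_AsMVT) are real, and ST_TileEnvelope needs PostGIS 3.0 or later.

```python
def tile_query_sql(table: str) -> str:
    """Build a parameterised MVT query for one z/x/y tile.

    `table` is interpolated as an identifier, so it is restricted to a
    plain Python-style identifier. z, x, y and the layer name are left as
    psycopg-style named placeholders to be bound at execution time.
    """
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    return f"""
WITH bounds AS (
    SELECT ST_TileEnvelope(%(z)s, %(x)s, %(y)s) AS geom,
           ST_TileEnvelope(%(z)s, %(x)s, %(y)s)::box2d AS b2d
),
mvtgeom AS (
    -- 'geom' and 'name' are hypothetical column names
    SELECT ST_AsMVTGeom(ST_Transform(t.geom, 3857), bounds.b2d, 4096, 256, true) AS geom,
           t.name
    FROM {table} t, bounds
    WHERE ST_Intersects(ST_Transform(t.geom, 3857), bounds.geom)
)
SELECT ST_AsMVT(mvtgeom.*, %(layer)s, 4096, 'geom') FROM mvtgeom;
"""


print(tile_query_sql("points_of_interest"))
```

The result of the final SELECT is the protobuf-encoded tile as bytes, which a server can return directly; the parameters would be bound with something like `{"z": 13, "x": 2412, "y": 3078, "layer": "poi"}`.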

ajd555 · 1d ago
Hi HN! This is my first blog, inspired by many posts read here. I attempt to explain how Vector Tiles in GIS tools are constructed from scratch, and how easy it is. This also sets up future blog posts about MapLibre tiles as well as some tricks in the backend to pre-generate the MVT files to serve them up quickly, and update tiles for real time updates on a map. I'll be happy to answer any questions or feedback!
stevage · 1d ago
I'm not entirely clear on why you had to go all the way to "from scratch". Are there really no libraries in Go that can generate tiles more conveniently?

(I've been basically a full time freelance Mapbox consultant for the last 7 years. When I'm generating tiles, it's either statically (tippecanoe), through PostGIS, or occasionally using a JavaScript library. Never had to get my hands quite as dirty as you here...)

It is a real shame that Mapbox GL JS doesn't support GeoJSON vector tiles, which makes a lot of this stuff easier. Yes, they're not as compact as PBF, but for ease of development, debuggability etc, they're a great step along the way and flatten the learning curve. (From memory, OpenLayers does support them...)

Also curious whether you looked at https://docs.protomaps.com/pmtiles/

ajd555 · 1d ago
I have not looked at PMTiles, so far everything holds nicely generated in memory, so I haven't had the need to store this information. I'll keep it in mind.

Any idea why Mapbox GL JS doesn't support GeoJSON vector tiles?

starkparker · 1d ago
> I have not looked at PMTiles, so far everything holds nicely generated in memory

For a planet-scale project that I worked on, a multi-layer PMTiles set generated from GeoJSON by Tippecanoe reduced the amount of storage needed by about 60% vs. MVT, at the cost of a longer build time. Result was a single file served by MapLibre on a static web server, no tileserver needed.

The storage savings allowed us to add 3 additional zoom levels on the same cheap, storage-constrained VPS host that ran the web server. We considered moving it to S3, which would be much easier with PMTiles since it's just a single file, but it would've only cost us more money and we didn't need edge caching.

I'd link to the project but it'd be a waste of time vs. reading the two-step process to generate in the PMTiles docs: https://docs.protomaps.com/pmtiles/create#tippecanoe

And the 4-LOC installation process of the PMTiles library for MapLibre: https://docs.protomaps.com/pmtiles/maplibre

ajd555 · 1d ago
Good to know - thank you!
stevage · 1d ago
As far as I know, the design of Mapbox GL JS was very heavily geared towards their own needs of producing high performance maps that would be loaded on millions of devices. Obviously they'd never use MVT in that use case, so they didn't bother supporting it.

There are lots of areas where they could have made the lives of developers a lot easier. Another that comes to mind is just how hard it is to look inside vector tiles - they don't really provide much in the way of tools for inspection, etc. I had to build https://stevage.github.io/vector-inspector/ and https://www.npmjs.com/package/tileinfo for instance.

sbma44 · 1d ago
hey, longtime Mapbox employee here. I appreciate all the work you're doing here to help people wrap their heads around vector tiles! This is an old technology at this point, and as you've explained, there are robust tools for moving from GeoJSON to tilesets. It's cool to pull apart the nuts and bolts of a thing (and the Mapbox Vector Tile Spec is open) but there are easier ways to accomplish this objective.

A question for you:

> Obviously they'd never use MVT in that use case, so they didn't bother supporting it.

What does this mean? Mapbox GL (JS and native) both support MVT, of course--that's why we created it! Perhaps you were referring to something else? Higher in this post I see a reference to "GeoJSON vector tiles" and I'm curious what that means. GeoJSON is very convenient (Mapbox helped support its enshrinement as IETF RFC 7946), but one of the hard parts of tiling is slicing features and knowing when to simplify them. Once you've done that, you might as well cram it into a protobuf or other highly efficient container. When you pass Mapbox GL a GeoJSON, it actually cuts the geometry into tiles in memory and uses those for rendering.

Some other general notes:

- the process of tiling is lossy (or should be). if you zoom out to see all of north america, your GeoJSON of neighborhood address points is going to overlap. you should drop most of them from the low-zoomlevel tiles. Tippecanoe does this in brilliant and highly tuneable ways. This applies to geometry simplification, too. Folks should keep this in mind when doing size comparisons.
- map tiling is fundamentally about moving compute into preprocessing and assembling geometry from highly parallelized fetches. MVT is a technology built on and for S3-like services. it's been exciting to see new approaches to this problem that offer lovely ergonomics around deployment etc, but if you have cloud infra, a hot cache, and are optimizing for performance, MVT remains hard to beat
- we continue to research additional optimizations for VT, but the technology has stood the test of time, and proven useful in many different contexts beyond map rendering, including routing and geocoding
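
To illustrate the "tiling is lossy" point, here is a deliberately naive sketch that keeps at most one point per grid cell at a given zoom. This is not how Tippecanoe decides what to drop; the function name and the `cells_per_tile` parameter are made up for the example.

```python
import math


def thin_points(points, z: int, cells_per_tile: int = 64):
    """Keep the first (lon, lat) point seen in each grid cell at zoom z.

    The world is split into 2**z tiles per axis and each tile into
    `cells_per_tile` cells per axis; later points landing in an occupied
    cell are dropped.
    """
    n = 2 ** z * cells_per_tile
    seen = set()
    kept = []
    for lon, lat in points:
        cx = int((lon + 180.0) / 360.0 * n)
        cy = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
        if (cx, cy) not in seen:
            seen.add((cx, cy))
            kept.append((lon, lat))
    return kept


addresses = [(-74.0060, 40.7128), (-74.0061, 40.7129), (-73.9000, 40.8000)]
for zoom in (2, 8, 14):
    print(zoom, len(thin_points(addresses, zoom)))
```

At low zooms nearby points collapse into one cell and are dropped; at high zooms the cells shrink and more of them survive, so tile size comparisons only make sense when the same dropping rules were applied.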

stevage · 22h ago
>What does this mean?

Ugh, dumb typo - it was late. I meant "Obviously they'd never use GeoJSON in that use case".

> Higher in this post I see a reference to "GeoJSON vector tiles" and I'm curious what that means.

It's what it sounds like: vector tiles, but instead of protobuf, the data is simply passed directly as GeoJSON. Really convenient for a use case like a server that generates data on demand: easy to generate (ie, it avoids all the difficulty of OP's post), easy to inspect in the browser for debugging. Only downside is it's less efficient space-wise than protobuf. So it's useful as a first step for a proof of concept (to be replaced by MVT), or in a case where the size doesn't matter.
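
A minimal sketch of the idea, assuming point features only: work out the lon/lat bounds of a z/x/y tile and return the features inside it as a plain FeatureCollection. The function names are hypothetical, and lines and polygons would additionally need clipping, which is skipped here.

```python
import json
import math


def tile_bounds(z: int, x: int, y: int):
    """Return (west, south, east, north) in degrees for a Web Mercator tile."""
    n = 2 ** z
    west = x / n * 360.0 - 180.0
    east = (x + 1) / n * 360.0 - 180.0
    north = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * y / n))))
    south = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * (y + 1) / n))))
    return west, south, east, north


def geojson_tile(features, z: int, x: int, y: int) -> dict:
    """Return the Point features that fall inside tile z/x/y."""
    west, south, east, north = tile_bounds(z, x, y)
    inside = []
    for feature in features:
        geometry = feature["geometry"]
        if geometry["type"] != "Point":
            continue  # lines and polygons would need clipping
        lon, lat = geometry["coordinates"][:2]
        if west <= lon < east and south < lat <= north:
            inside.append(feature)
    return {"type": "FeatureCollection", "features": inside}


features = [
    {"type": "Feature", "properties": {"name": "a"},
     "geometry": {"type": "Point", "coordinates": [10.0, -10.0]}},
    {"type": "Feature", "properties": {"name": "b"},
     "geometry": {"type": "Point", "coordinates": [-10.0, 10.0]}},
]
print(json.dumps(geojson_tile(features, 1, 1, 1)))
```

The response is ordinary JSON, so it can be opened in the browser's network tab and read directly, which is the debugging convenience described above.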

>Once you've done that, you might as well cram it into a protobuf or other highly efficient container.

I'm disputing the "you might as well" bit for many use cases. :) (Again, I think Mapbox is very geared towards large scale uses, but a lot of the internet is small and bespoke).

It was actually Tangram, not OpenLayers, that I was thinking of that supports it: https://github.com/tangrams/tangram?tab=readme-ov-file#vecto...

>MVT is a technology built on and for S3-like services.

It's interesting that you say that. My experience, having been down this road a few times, is that serving MVT from S3 is generally a pain that I don't recommend for new clients. It takes some pretty specific AWS configuration, and the process of uploading thousands of individual files is slow and error-prone. (I wrote a guide on it once but can't find it now).

Yeah it's a good solution for large-scale uses (again...) but not good for the average user.

PMTiles seems like a pretty compelling alternative for those scenarios: ship one file instead of thousands, and rely on HTTP range requests. The downside I ran into is that not all "S3-like services" support that.
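
For context, a sketch of the HTTP mechanics this relies on, using only standard header semantics. The helper names are made up, and this is not the PMTiles client; it just shows what a range request looks like and how to tell whether a host honoured it.

```python
def range_header(offset: int, length: int) -> dict:
    """Build the Range header for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length > 0")
    # Byte ranges are inclusive on both ends.
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def honours_range(status: int, headers: dict, offset: int, length: int) -> bool:
    """True if a response looks like a partial reply for the requested span.

    A host that ignores Range typically answers 200 with the whole file,
    which defeats the purpose for a large single-file archive.
    """
    if status != 206:
        return False
    lowered = {key.lower(): value for key, value in headers.items()}
    expected = f"bytes {offset}-{offset + length - 1}/"
    return lowered.get("content-range", "").startswith(expected)


print(range_header(0, 16384))
print(honours_range(206, {"Content-Range": "bytes 0-16383/734003200"}, 0, 16384))
print(honours_range(200, {"Content-Length": "734003200"}, 0, 16384))
```

Running a check like this against a candidate storage host before committing to a single-file archive is a cheap way to find out whether it falls into the "doesn't support that" group.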

In practice, I recommend either hosting data on Mapbox/MapTiler/whoever is cheapest this month if the volumes are low, or setting up a tiny tile server. Even a tiny server is sufficient for serving tiles, and costs a fraction of what Mapbox charges (especially since Mapbox's change to charging per square kilometre, which is absolutely cost prohibitive for sparse data).

>we continue to research additional optimizations for VT,

Can you elaborate? The spec (https://github.com/mapbox/vector-tile-spec) has not had an update in 4 years, and since MVT v2 did not include any substantive changes, the spec is essentially unchanged since 2014. In 2018, a working group for a version 3 was announced (https://github.com/mapbox/vector-tile-spec/issues/103) but then apparently quietly abandoned only a couple of months later.

sbma44 · 5h ago
Didn't mean to imply that tiling is trivial--our initial business model was focused on taking care of that difficulty for our customers, after all, and it wouldn't have made sense if we didn't think we were delivering value.

I will defer to your experience re the utility of tiled-but-still-GeoJSON as a sensible middle ground. I think you're right that we haven't seen this as an area that merits significant attention--it's sort of "not worth optimizing yet [geojson]" or "worth optimizing [MVT]". But I can see how there could be middle grounds in some scenarios.

PMTiles is what I had in mind when I mentioned ergonomics. Brandon's delivered a very compelling approach, as I hope I conveyed to him at CNG earlier this year. The lack of fully specified behavior re range requests is a lingering concern, as you acknowledge, and there are some other areas like incremental updates where having a huge mess of tiles still looks pretty good. But I think it's fair to say that it's overperformed as a solution, and I understand why people are excited about it and the other cloud-native geo formats that have emerged in recent years. Decades ago, Mapbox founders were at DevSeed writing PHP--there will always be some sympathy around here for "just upload it" as a deploy story!

I can't talk about the optimizations we are investigating, but I can at least acknowledge some of what makes the problem so hard (and the update schedule so slow): MVT is quite good, and backward compatibility is a pain, especially when you're optimizing for bundle/binary size (a particularly hard problem when your competitors get to preinstall their libraries on every mobile phone in the world) and working with a customer base that's preintegrated, in every way and to every extent imaginable, with an existing ecosystem. There is a reason people still use shapefiles! Though I hope MVT's reputation remains a bit better than that...

ajd555 · 1d ago
This is amazing, I hadn't come across these tools, these are extremely useful!
stevage · 1d ago
Heh, I feel like you're me 8 or 9 years ago, trying to get my head around vector tiles for the first time.
ajd555 · 1d ago
Honestly, simple curiosity + wanted to keep it in Go
nilsingwersen · 1d ago
I understand that you probably wanted to implement this from scratch, but did you take a look at tippecanoe (https://github.com/felt/tippecanoe)? It was originally by Mapbox but got forked by Felt.
jjgreen · 1d ago
Interesting read, thanks
aaa_2006 · 1d ago
Very cool project. One option worth exploring is FlatGeobuf. It provides fast, indexed storage for large GeoJSON datasets and makes it easy to stream only the features you need per tile. It can slot neatly into a vector tile pipeline and improve both batch and real time generation.
sfblah · 1d ago
I built a system like this to create vector tiles for mapbox at a side-job I had a few years back. Unfortunately my boss couldn't understand what I was doing and instead wanted us to run a SQL query on the database every time the map was moved, and just buy an extra-large instance for the load. I tried to explain it to him but to no avail. I finally just quit. He wasn't a dumb guy either.

Point being: If you're doing GIS stuff on a website, it's worth making sure you have folks who actually can understand these underlying technologies.

snickerdoodle12 · 1d ago
If you need a backend anyway I like using postgis ST_AsMVT and caching the result. So pretty much running a sql query on the database every time the map is moved and then caching it. Super easy to maintain, don't have to pregenerate anything. Just bust the cache when necessary.
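
A rough sketch of that pattern with an in-memory dictionary. The class and method names are hypothetical, and the generator callback stands in for whatever actually runs the ST_AsMVT query; a real deployment might cache on disk or in a reverse proxy instead.

```python
from typing import Callable, Dict, Optional, Tuple

TileKey = Tuple[int, int, int]


class TileCache:
    """Generate each tile once on first request, then serve it from memory."""

    def __init__(self, generate: Callable[[int, int, int], bytes]):
        self._generate = generate
        self._tiles: Dict[TileKey, bytes] = {}

    def get(self, z: int, x: int, y: int) -> bytes:
        key = (z, x, y)
        if key not in self._tiles:
            # Cache miss: run the expensive query exactly once per tile.
            self._tiles[key] = self._generate(z, x, y)
        return self._tiles[key]

    def bust(self, z: Optional[int] = None) -> None:
        """Drop every cached tile, or only those at zoom level z."""
        if z is None:
            self._tiles.clear()
        else:
            self._tiles = {k: v for k, v in self._tiles.items() if k[0] != z}


calls = []


def fake_query(z: int, x: int, y: int) -> bytes:
    """Stand-in for the database call; records how often it runs."""
    calls.append((z, x, y))
    return f"{z}/{x}/{y}".encode()


cache = TileCache(fake_query)
cache.get(13, 2412, 3078)
cache.get(13, 2412, 3078)  # served from the cache, no second query
print(len(calls))
```

Calling `cache.bust()` after the underlying data changes forces the next request for each tile back to the database.
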
chrisdalke · 10h ago
Exactly! I run a service that handles >1M tile requests per month served directly from PostGIS with ST_AsMVT etc. and file caching.
ajd555 · 1d ago
So running a SQL query and returning GeoJSON? Sorry that didn't work - hopefully you can work on another system like this soon, with someone who actually understands the benefits of tiles!
Bedon292 · 1d ago
I commend you for actually digging into it and trying to understand the format. I have used them plenty but hadn't really dug into their internals. As you are adding more data layers, it may be worth looking at Geoserver [1]. You can load the data in and let it handle the tile generation and caching. Even if you don't use it, the Vector Tiles extension [2] may be a useful reference for implementing it.

[1] https://geoserver.org/

[2] https://github.com/geoserver/geoserver/tree/main/src/extensi...

ajd555 · 1d ago
Thank you! I'll take a look, this looks great
stevage · 22h ago
Op-ed: No. You don't need Geoserver. The main benefit of using Geoserver is operating a service that provides many different kinds of layers to different users, and being able to manage them through a web interface. Every time I have deployed Geoserver I have regretted it.

Geoserver does not really do much directly: it's just a kind of web glue for various kinds of backends. So for tile generation, you're better off using Martin (https://maplibre.org/martin/martin-cp.html) or similar.

e1gen-v · 1d ago
I actually did this at work! Set up a FastAPI app with a tile server endpoint and it wasn’t as bad as I thought it would be.
ajd555 · 1d ago
Awesome, good job! Did you implement lines and polygons too?
e1gen-v · 9h ago
Yes, but I actually didn’t end up building any of the tile logic in Python. I used PostGIS and Postgres. Each query is like 25 lines and supports polys and lines out of the box.
J_Sherz · 1d ago
This is cool - is there any data you can aggregate on waterways traffic / the ferry system?
ajd555 · 1d ago
I've found some waterways traffic data, but it seems to be previous day usage data, and not real-time usage or position. I'll dig to see if there is some info, as that would add some value to the dashboard for sure!