If you're going to use SQLite as an application file format, you should:
1. Enable the secure_delete pragma <https://antonz.org/sqlite-secure-delete/> so that when your user deletes something, the data is actually erased. Otherwise, when a user shares one of your application's files with someone else, the recipient could recover information that the sender thought they had deleted.
2. Enable the options described at <https://www.sqlite.org/security.html#untrusted_sqlite_databa...> under "Untrusted SQLite Database Files" to make it safer to open files from untrusted sources. No one wants to get pwned when they open an email attachment.
3. Be aware that when it comes to handling security vulnerabilities, the SQLite developers consider this use case to be niche ("few real-world applications" open SQLite database files from untrusted sources, they say) and they seem to get annoyed that people run fuzzers against SQLite, even though application file formats should definitely be fuzzed. https://www.sqlite.org/cves.html
They fail to mention any of this on their marketing pages about how you should use SQLite as an application file format.
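Concretely, points 1 and 2 boil down to a handful of PRAGMAs and C-API calls. A minimal sketch, assuming SQLite 3.31+ (for trusted_schema and sqlite3_hard_heap_limit64), with error handling and the rest of the security.html checklist elided:

```c
#include <sqlite3.h>

/* Sketch: open an application file defensively; returns NULL on failure. */
sqlite3 *open_app_file(const char *path) {
    sqlite3 *db;
    if (sqlite3_open_v2(path, &db, SQLITE_OPEN_READWRITE, NULL) != SQLITE_OK) {
        sqlite3_close(db);
        return NULL;
    }
    /* Reject low-level file-format shenanigans a crafted file could contain. */
    sqlite3_db_config(db, SQLITE_DBCONFIG_DEFENSIVE, 1, NULL);
    /* Don't let SQL embedded in the file's schema reach app-defined functions. */
    sqlite3_exec(db, "PRAGMA trusted_schema=OFF;", NULL, NULL, NULL);
    /* Point 1: physically erase deleted rows instead of leaving them in free pages. */
    sqlite3_exec(db, "PRAGMA secure_delete=ON;", NULL, NULL, NULL);
    /* Bound the damage a hostile file can do: cap string/blob size and heap use. */
    sqlite3_limit(db, SQLITE_LIMIT_LENGTH, 1000000);
    sqlite3_hard_heap_limit64(64 * 1024 * 1024);   /* note: process-global */
    /* Sanity-check before trusting the contents; real code inspects the rows. */
    sqlite3_exec(db, "PRAGMA quick_check;", NULL, NULL, NULL);
    return db;
}
```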
charleslmunger · 17h ago
>and they seem to get annoyed that people run fuzzers against SQLite, even though application file formats should definitely be fuzzed.
I think that's an unfair reading. Sqlite runs fuzzers itself and quickly addresses bugs found by fuzzers externally. There's an entire section in their documentation about their own fuzzers and thanking third party fuzzers, including credit to individual engineers.
https://www.sqlite.org/testing.html
The tone of the CVE docs comes from people freaking out about CVEs flagged by automated tools when those CVEs are for issues that have no security impact for typical usage of SQLite, or have prerequisites that would already have resulted in some form of compromise.
perching_aix · 5h ago
> The tone of the CVE docs comes from people freaking out about CVEs flagged by automated tools when those CVEs are for issues that have no security impact for typical usage of SQLite, or have prerequisites that would already have resulted in some form of compromise.
The CVE docs:
> The attacker can submit a maliciously crafted database file to the application that the application will then open and query
This is exactly the normal use case GP talks about with application file formats.
charleslmunger · 2h ago
That's true, but most usage of sqlite is not as an application file format, and many of those CVEs don't apply even to that use case. The reason people have policies around CVE scanning is because CVEs often represent real vulnerabilities. But there's also stuff like "this regex has exponential or polynomial runtime on bad inputs", which is a real security issue for some projects and not others, depending on what the input to the regex is. That's also true for SQLite, and I'm guessing that the author of that page has spent a bunch of time explaining to people worried about some CVE that their usage is not vulnerable. The maintainer of cURL has expressed similar frustration.
sigmarule · 5h ago
On the other hand, exploiting weaknesses in MITRE’s CVE program to create ticket management primitives, creating “shellcode” that composes them to implement a feature request tracking API, using it to manage your open source organization’s feature roadmap, sure would make for a great 2600 article…
ectospheno · 7h ago
To be fair, PRAGMA trusted_schema=OFF is recommended by the docs; it just isn't the default. The docs also recommend the SQLITE_DIRECTONLY flag on all custom SQL functions.
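The SQLITE_DIRECTONLY part is set at registration time. A sketch, where app_version() is a made-up UDF for illustration:

```c
#include <sqlite3.h>

static void app_version(sqlite3_context *ctx, int argc, sqlite3_value **argv) {
    (void)argc; (void)argv;
    sqlite3_result_text(ctx, "1.2.3", -1, SQLITE_STATIC);
}

/* SQLITE_DIRECTONLY: callable from top-level SQL only, never from triggers,
   views, CHECK constraints, etc. that a malicious schema might define. */
void register_udfs(sqlite3 *db) {
    sqlite3_create_function_v2(db, "app_version", 0,
        SQLITE_UTF8 | SQLITE_DETERMINISTIC | SQLITE_DIRECTONLY,
        NULL, app_version, NULL, NULL, NULL);
}
```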
Seattle3503 · 19h ago
Hrm, using sqlite as an application format would be a good use case for Limbo.
chris_wot · 19h ago
"Most applications can use SQLite without having to worry about bugs in obscure SQL inputs." And then they recommend SQLite as a document interchange format.
ncruces · 13h ago
Untrusted database file is not the same as untrusted SQL input.
There are parts of the SQL engine that are exposed to malicious file manipulation (the schema is stored as SQL DDL text) but that's not arbitrary SQL input.
If you want to highlight an inconsistency, this is way more worrying:
> “All historical vulnerabilities reported against SQLite require at least one of these preconditions: (…) 2. The attacker can submit a maliciously crafted database file to the application that the application will then open and query. Few real-world applications meet either of these preconditions…”
However, most of the rest of the page is speaking of arbitrary SQL input, not purposely broken database files.
agwa · 9h ago
> There are parts of the SQL engine that are exposed to malicious file manipulation (the schema is stored as SQL DDL text) but that's not arbitrary SQL input.
Views and triggers can contain arbitrary SQL and can be defined by a malicious database file, though these can be disabled as described on the "Defense Against The Dark Arts" page.
That leaves default column values and indexes on expressions, which can execute a limited subset of SQL. I'd be worried about certain arbitrary SQL input vulnerabilities being reachable this way.
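The knobs in question are per-connection. A rough sketch of switching them off before touching an untrusted file, using the real SQLITE_DBCONFIG verbs from the "Defense Against The Dark Arts" page (error handling omitted):

```c
#include <sqlite3.h>

/* Inside your open routine, before running any queries: */
void harden(sqlite3 *db) {
    /* Refuse to run triggers and views stored in the file's schema. */
    sqlite3_db_config(db, SQLITE_DBCONFIG_ENABLE_TRIGGER, 0, NULL);
    sqlite3_db_config(db, SQLITE_DBCONFIG_ENABLE_VIEW, 0, NULL);
    /* And keep schema-embedded SQL (column DEFAULTs, expression indexes)
       away from app-defined functions not marked SQLITE_INNOCUOUS. */
    sqlite3_exec(db, "PRAGMA trusted_schema=OFF;", NULL, NULL, NULL);
}
```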
wzdd · 16h ago
Although this is indeed a worrying statement, it seems true to me. Most users of sqlite control the SQL they use. The problem I would expect from using a database as a document interchange format is that a maliciously crafted database could result in a CVE. The page acknowledges this possibility, even while pointing out (in their CVE list) that it hasn't happened so far, or is rare (it's hard to parse some of their descriptions).
munch117 · 14h ago
I'm not that concerned with bugs in sqlite. sqlite is high quality software, and the application that uses it is a more likely source of vulnerabilities.
But I do see a problem if you really need to use a sqlite that's compiled with particular non-default options.
Say I design a file format and implement it, and my implementation uses an sqlite library that's compiled with all the right options. Then I evangelize my file format, telling everyone that it's really just an sqlite database and sooo easy to work with.
First thing that happens is that someone writes a neat little utility for working with the files, written in language X, which comes with a handy sqlite3 library. But that library is not compiled with the right options, and boom, you have a vulnerable utility.
ncruces · 13h ago
Most of the recommended [1] settings are available on a per-connection basis, through PRAGMAs, sqlite3_db_config, sqlite3_limit, etc.; some are global settings, like sqlite3_hard_heap_limit64.
A binding can expose those settings. It's not a given a third party utility will use them, but they can.
1: https://www.sqlite.org/security.html
Ah, I missed that 9.a-c were alternatives. And that, in the absence of custom tables or functions, they are merely defense in depth for something that is already secure, barring bugs. I withdraw my concern.
liuliu · 22h ago
One thing I would call out, if you use SQLite as an application format:
BLOB type is limited to 2GiB in size (int32). Depending on your use cases, that might seem high, or not.
People would argue that if you store that much binary data in a SQLite database, it is not really appropriate. But an application format usually has this requirement to bundle large binary data in one nice file, rather than many files that you need to copy together to make it work.
Retr0id · 21h ago
You can split your data up across multiple blobs
liuliu · 6h ago
That's right, but it is much easier to just use a blob, without application logic to worry about chunking. It is the same reason we use SQLite in the first place: a lot of transaction / rollback logic now lives in the SQLite layer, not the application layer.
Also, SQLite does provide good support for reading / writing blobs in a streamable fashion, see: https://www.sqlite.org/c3ref/blob_read.html
So the limitation is really a structural issue that Dr. Hipp at some point might resolve (or not), but it pretty much has to be resolved by the SQLite core team, not outside contributors (of course you can resolve it by forking, but...).
johncolanduoni · 17h ago
Also you almost certainly want to do this anyway so you can stream the blobs into/out of the network/filesystem, well before you have GBs in a single blob.
Retr0id · 12h ago
Singular sqlite blobs are streamable too! But for streaming in you need to know the size in advance.
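For reference, a sketch of the incremental blob API being described here; docs(id, body) is a hypothetical table, and the zeroblob() note at the end is exactly the "know the size in advance" constraint:

```c
#include <sqlite3.h>

/* Stream a blob out of a (hypothetical) table
   docs(id INTEGER PRIMARY KEY, body BLOB), 64 KiB at a time. */
int stream_out(sqlite3 *db, sqlite3_int64 rowid) {
    sqlite3_blob *blob;
    if (sqlite3_blob_open(db, "main", "docs", "body", rowid,
                          0 /* read-only */, &blob) != SQLITE_OK)
        return -1;
    char buf[65536];
    int size = sqlite3_blob_bytes(blob);
    for (int off = 0; off < size; off += sizeof buf) {
        int n = size - off < (int)sizeof buf ? size - off : (int)sizeof buf;
        sqlite3_blob_read(blob, buf, n, off);
        /* ...write buf[0..n) to the network/filesystem... */
    }
    return sqlite3_blob_close(blob);
}

/* Streaming *in*: INSERT INTO docs(body) VALUES (zeroblob(?)) with the final
   size, then sqlite3_blob_open(..., 1 /* read-write */) and
   sqlite3_blob_write() chunk by chunk. */
```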
bob1029 · 15h ago
This is essential if you want to have encryption/compression + range access at the same time.
I've been using chunk sizes of 128 megabytes for my media archive. This seems to be a reasonable tradeoff between range retrieval delay and per object overhead (e.g. s3 put/get cost).
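A sketch of the chunk addressing this implies, against a hypothetical chunks(file_id, seq, data BLOB) table; a byte-range read only touches the chunks that overlap the range:

```c
#include <sqlite3.h>

#define CHUNK_SIZE (128LL * 1024 * 1024)   /* the 128 MB chunks mentioned above */

/* Map a byte range onto chunk indices. */
void range_to_chunks(sqlite3_int64 offset, sqlite3_int64 len,
                     sqlite3_int64 *first, sqlite3_int64 *last,
                     sqlite3_int64 *skip) {
    *first = offset / CHUNK_SIZE;                 /* first chunk holding the range */
    *last  = (offset + len - 1) / CHUNK_SIZE;     /* last chunk holding the range */
    *skip  = offset % CHUNK_SIZE;                 /* bytes to skip in first chunk */
    /* Then: SELECT seq, data FROM chunks
             WHERE file_id = ?1 AND seq BETWEEN ?2 AND ?3 ORDER BY seq; */
}
```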
chmaynard · 23h ago
Dr. Hipp occasionally gets on a soapbox and extolls the virtue of sqlite databases for use as an application file format. He also preaches about the superiority of Fossil over Git. His arguments generally make sense. I tolerate his sermons because he is one of the truly great software developers of our time, and a personal hero of mine.
korkor55 · 17h ago
These are thought-experiments to help better understand how SQLite works.
This is exactly how supporting documentation should be written so that others read it.
He even went over the top with the disclaimers.
burnte · 5h ago
I was skeptical at the start but by the end I didn't care if it was a good idea or a bad one, I learned so much it was a great read.
jockm · 7h ago
The problem is that better is not an abstract measure. It is better at what, for what purpose, in what context? I like Fossil in the abstract, but it isn't integrated well into any of my tools; there is only one hosting service I know of; and they took away the WYSIWYG option from the built-in wiki (a preference of mine). So it isn't better for me.
Your better will be measured against different criteria, etc.
rudedogg · 1h ago
> The use of a ZIP archive to encapsulate XML files plus resources is an elegant approach to an application file format. It is clearly superior to a custom binary file format.
Can anyone expand on this? Why would it be better than a binary format?
I was watching a talk Andrew Kelley gave about a simple binary format he’s using in Zig: https://www.hytradboi.com/2025/05c72e39-c07e-41bc-ac40-85e83...
Having to map between SQLite and the application language seems like it’d add lots of complexity, but I don’t have any experience with custom file formats so would love some advice.
lifthrasiir · 20h ago
SQLite can't be reliably used in networked file systems because it heavily relies on locking to be correctly implemented. I recently had to add a check for such file systems in my application [1] because I noticed a related corruption firsthand. Simpler file formats do not demand such requirements. SQLite is certainly good, but not for this use.
[1] https://github.com/lifthrasiir/angel/commit/50a15e703ef2c1af...
In the context of this article, that's largely irrelevant: ZIP cannot be used in a multi-user scenario at all, so even if sqlite isn't perfect, it's still miles better than the ZIP format it replaces in this thought experiment.
chungy · 17h ago
That's pretty broad and over-generalized. Network file systems without good lock support are almost always a bad setup by an administrator. Both NFS and CIFS can work with network-wide locks just fine.
SQLite advises against using a network file system to avoid potential issues, but you can successfully do it.
lifthrasiir · 12h ago
As noted in my other comment, those "potential" issues are real and do happen from time to time. Unless SQLite gives some set of configurations to avoid such issues, I can't agree that it's over-generalized.
jdboyd · 16h ago
Are the typical Synology, QNAP, or TrueNAS devices with default Linux, macOS and Windows clients going to be set up correctly by default? If any of the typical setups someone is likely to end up with by following wizards in a home or small office results in locking not working correctly for SQLite, then it is fair for them to warn against using it on a network file system.
As an application format, you don't generally expect people to be editing an ODF file at the same time though, so network locking doesn't really disqualify it for use as a document format.
mschuster91 · 15h ago
> As an application format, you don't generally expect people to be editing an ODF file at the same time though
Oh hell yes you do. Excel spreadsheets are notorious for people wanting to collaborate on them, and PowerPoint decks come in a close second. It used to be an absolute PITA, but at least Office 365 makes the pains bearable.
greenavocado · 20h ago
Easy fix is an empty lock file adjacent to the real one.
lifthrasiir · 20h ago
Yeah, but only if SQLite did support that mode in some built-in VFS implementation...
hedora · 19h ago
Which network filesystems are still corrupting sqlite files?
Sqlite on NFSv3 has been rock solid for some NFS servers for a decade.
Maybe name and shame?
lifthrasiir · 17h ago
Specifically I had an issue over 9p used by WSL2. (I never thought it was networked before this incident.)
afiori · 17h ago
In that case the application would keep a temporary file and copy over when saving
cpach · 16h ago
Maybe, but how would the application know if /data/foo.bar is a local file or mounted via NFS/SMB/etc?
afiori · 13h ago
it would always use such a temporary file and update the "real" file only on explicit saves with fast mv or cp operations
floating-io · 1d ago
An interesting skim, but it would have been more meaningful if it had tackled text documents or spreadsheets to show what additional functionality would be enabled with those beyond "versioning".
Maybe it's just me, but I see the presentation functionality as one of the less used aspects of the OpenOffice family.
jdboyd · 16h ago
What he listed as the first improvement, "Replace ZIP with SQLite" would certainly apply to the other ODF formats.
He advocates breaking the XML into smaller pieces in SQLite. I suppose making each slide a new XML record could make sense. Moving over to spreadsheets, I don't know how ODF does it now, but making each sheet a separate XML could make sense.
Thinking about Writer documents, I wonder what a good smaller unit would be. I think one XML per page would be too fine a granularity. You could consider one record per chapter. I doubt one record per paragraph would make sense, but it could be fun to try different ideas.
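One way the fragment-per-row idea might look as a schema (all table and column names hypothetical):

```c
#include <sqlite3.h>

/* One XML fragment per slide/sheet/chapter, plus a shared table
   for binary resources such as images. */
void create_schema(sqlite3 *db) {
    sqlite3_exec(db,
        "CREATE TABLE fragment("
        "  id   INTEGER PRIMARY KEY,"
        "  kind TEXT,"        /* 'slide' | 'sheet' | 'chapter' */
        "  seq  INTEGER,"     /* position within the document */
        "  xml  TEXT"         /* the fragment's XML */
        ");"
        "CREATE TABLE resource(name TEXT PRIMARY KEY, data BLOB);",
        NULL, NULL, NULL);
}
```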
netsharc · 4h ago
> I think one XML per page would be too fine a granularity.
If I add a 1/3 page graphic on page 2, it'd have to repaginate pages 2-n of that chapter, modifying n-1 XML files...
maweki · 13h ago
Splitting the presentation into multiple fragments makes it more difficult to generate/alter a presentation using xslt.
scott_w · 17h ago
While reading, I was musing that one way to handle text could be to use a linked-list format for storage. To make it work like that, you'd need the editor to work on a block concept, and I don't think document editors work like that?
Spreadsheets might be a little easier because you can separate out by sheet or even down to a row/column level?
Part of me wants to try it now…
tekkk · 6h ago
The fundamental problem in my mind is the mixing of binary and text content. An optimal solution would separate them, allowing systems like Git to do the versioning. But separating the tightly coupled parts into their own files would also be annoying sharing- and management-wise.
Base64-encoding the images into strings, like one could do with HTML, would probably not be ideal for compression. As a matter of fact, text files as such would not be ideal compression-wise.
So I suppose if a binary format can't be avoided, SQLite would be as good as any other. But without built-in collaboration protocol support, like CRDTs, with history truncation (and diverged histories can always fall back to diff), I don't think it'd be good enough to justify the migration.
gwd · 13h ago
Anki's storage format is SQLite (or was a few years ago). That made it really lovely when I wanted to import the contents (including the view logs) of Anki deck I'd been using for a decade into a custom system I was designing. Just pop up the `sqlite3` REPL, poke around and see what it looks like, then write standard SQL queries to get the data out.
conorbergin · 23h ago
I've been trying out SQLite for a side project of mine, a virtual whiteboard. I haven't quite got my head around it, but it seems to be much less of a bother than interacting with file system APIs so far. The problem I haven't really solved is how sync and maybe collaboration are going to interact with it; so far I have:
1. Plaintext format (JSON or similar) or SQLite dump files versioned by git
2. Some sort of modern local first CRDT thing (Turso, libsql, Electric SQL)
3. Server/Client architecture that can also be run locally
Has anyone had any success in this department?
rogerbinns · 22h ago
SQLite has a builtin session extension that can be used to record and replay groups of changes, with all the necessary handling. I don't necessarily recommend session as your solution, but it is at least a good idea to see how it compares to others.
https://sqlite.org/sessionintro.html
That provides a C level API. If you know Python and want to do some prototyping and exploration then you may find my SQLite wrapper useful as it supports the session extension. This is the example giving a feel for what it is like to use:
https://rogerbinns.github.io/apsw/example-session.html
CRDTs are the way to go if you need something very robust for lots of offline work.
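For a feel of the C-level session API, a rough sketch assuming a build with SQLITE_ENABLE_SESSION (error handling omitted):

```c
#include <sqlite3.h>

/* Record local edits on "main" as a changeset that a peer can replay. */
void capture_changes(sqlite3 *db) {
    sqlite3_session *sess;
    sqlite3session_create(db, "main", &sess);
    sqlite3session_attach(sess, NULL);          /* NULL = track every table */

    /* ... the application makes its edits here ... */

    int n; void *changeset;
    sqlite3session_changeset(sess, &n, &changeset);
    /* Persist or ship `changeset`; the receiver replays it with
       sqlite3changeset_apply(), supplying a conflict-resolution callback. */
    sqlite3_free(changeset);
    sqlite3session_delete(sess);
}
```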
nloomans · 12h ago
Interesting read! I find the idea to use SQL queries to get only the relevant data quite convincing. I do wonder how this would work in practice though. Any changes the user makes would have to be inserted with SQL to allow for the new data to be included in SQL queries, but users also expect to be able to make changes and then not save them (or save them into a different file).
Should one make a massive transaction that is only committed when saving? Is it possible to commit such a transaction to a different file when using Save As?
Or maybe for editing one would need to copy the file to a separate temporary location, constantly commit to that file, and when saving move the temporary file over the original file (this way we aren't losing the resilience against corruption SQLite offers).
Or is there a better way to do this? I don't like storing pending changes into the original file since it kinda goes against how users expect files to work (and could cause them to accidentally leak data).
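For the Save As half of the question, one answer that keeps SQLite's corruption resistance is VACUUM INTO (SQLite 3.27+), which writes a transactionally consistent snapshot of the open database to a fresh file; the online backup API (sqlite3_backup_init etc.) would do the same job. A sketch:

```c
#include <sqlite3.h>

/* Snapshot the live database into a new file for "Save As". */
int save_as(sqlite3 *db, const char *dest_path) {
    sqlite3_stmt *stmt;
    int rc = sqlite3_prepare_v2(db, "VACUUM INTO ?1;", -1, &stmt, NULL);
    if (rc != SQLITE_OK) return rc;
    sqlite3_bind_text(stmt, 1, dest_path, -1, SQLITE_TRANSIENT);
    rc = sqlite3_step(stmt);                 /* SQLITE_DONE on success */
    sqlite3_finalize(stmt);
    return rc == SQLITE_DONE ? SQLITE_OK : rc;
}
```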
joseda-hg · 9h ago
You could insert any modifications and just mark whichever row is the current saved one.
This would also work as a really crude undo tree.
I don't really know if it actually goes against users' expectations; Office kinda "saves" stuff for you and stores it as temporary versions anyway, to be presented in case you forgot to save.
sakesun · 23h ago
If I remember correctly, the Mendix project file format is simply a sqlite db. I thought the designer was lazy, but it turns out it's a reasonable decision.
Recently, the DuckDB team raised a similar question about data lake catalog formats: why not just use a SQL database for that? It's simpler and more efficient as well.
sigwinch · 8h ago
With regard to DuckDB catalogs, I think a database is preferred for that. In particular, the tutorials assume PostGres.
psnehanshu · 5h ago
It should be Postgres not PostGres. The latter looks weird.
sgc · 23h ago
It seems like it would be relatively straightforward to make an sqlite-based file format and just have users add a plugin if for some reason they couldn't upgrade their older version of LibreOffice etc. I agree with the other commenter who mentioned that the benefits for text and spreadsheet files need more explanation. But it seems like a good enough idea for a LibreOffice working group to perform a more in-depth study. If the significant memory reduction is real and would translate to fewer crashes, it would be a huge boost even if it had no other benefits, IMHO.
bob1029 · 15h ago
> it is still bothersome that changing a single character in a 50 megabyte presentation causes one to burn through 50 megabytes of the finite write life on the SSD.
I used to worry a lot about this but it has never once actually come up for me. 50 megabytes is a pretty extreme example, but even so, if you edit this document fewer than several million times it won't matter.
Serializing the object graph all over again can be way faster than mapping into a tabular model. There are JSON serializers that can push multiple gigabytes per second per core. It might even be the case that, once you factor in the SSD controller quirks, the tabular updates could cause more blocks to be written than just dumping a big fat json stream all at once.
atonse · 21h ago
Didn’t Apple actually move to SQLite for their Pages/Numbers format? I remember reading years ago that it was rocky (the transition), but was maybe eventually smoothed out?
What if, instead of APIs for data sets, we simply placed a sqlite file onto a web server as a static asset, so you could just periodically do a GET and have a local copy?
abtinf · 23h ago
A few years ago someone posted a site that showed how to query portions of a SQLite file without having to pull the whole thing down.
>> I implemented a virtual file system that fetches chunks of the database with HTTP Range requests
That's wild!
yupyupyups · 23h ago
This works as long as the data is "small" and you have no ACL for it. Assuming you mean automatic downloads.
Devdocs does something similar, but there you request to download the payload manually, and the data is still browsable online without you having to download all of it. The data is also split in a convenient manner (by programming language/library). In other words, you can download individual parts. The UI also remains available offline, which is pretty cool.
https://devdocs.io/
With an S3 object lambda, I suppose you could generate the sqlite file on the fly.
anon291 · 23h ago
You can do this today by using the WASM-compiled SQLite module with a custom Javascript VFS that implements the SQLite VFS api appropriately for your backend. I've used it extensively in the past to serve static data sets direct from S3 for low cost.
More industrious people have apparently wrapped this up on NPM: https://www.npmjs.com/package/sqlite-wasm-http
As a document _exchange_/_interchange_ format, what I prefer for durability is a non-binary format (e.g. XML based).
For local use, I agree SQLite might be much faster than ZIP, and of course the ability to query based on SQL has its own flexibility merits.
thayne · 17h ago
XML isn't great for exchange/interchange either due to security problems and inconsistencies in implementations. A big part of the problem is that xml has a lot of complexity, which leads to a bigger attack surface when parsing and processing untrusted data. And then xml entities are just inherently insecure, unless you disable some of their capabilities (like using remote files, and unlimited recursion).
That said, creating a format that can convey rich untrusted data is a hard problem.
jdboyd · 16h ago
Part of the problem, though, with saying SQLite instead of XML is that a lot of things would lend themselves to XML inside SQLite.
Ekaros · 15h ago
Complex features are inherently complex. Say you want external resources or some scripts in a document. No matter what storage format you use, those are additional attack surfaces. The problem is not storage, but what is done with the information. And very often that is a lot, and poorly thought out and even more poorly implemented.
thayne · 14h ago
But most applications don't need those features. And if they do, that should be part of the application logic, with appropriate controls. Having your parsing library make arbitrary http requests is a bad idea.
thayne · 15h ago
Oh, I'm not saying sqlite is better than xml for data exchange. As mentioned in other comments, sqlite's security posture towards an untrusted database is problematic. My point is that xml has problems too.
I would change this: "Do what works, not what your database professor said you ought to do."
To this: "Unless you work for Google or FaceBook, just do what works, not what your database professor said you ought to do."
RainyDayTmrw · 23h ago
Juggling all the fragments inside the database, garbage collecting all the unused ones, and maintaining consistency are all quite challenging in this use case.
tombert · 22h ago
I remember I played with some software called "The Illumination Software Creator" [1], and I remember the saved project files were just SQLite databases.
I actually thought it was kind of cool, because I was able to play with it easily with some SQLite explorer tool (I forget which one) and I could easily look at how the save files actually worked.
I haven't really used SQLite for anything serious [2], but always found the idea of it kind of charming. Maybe I should dust it off and try it again.
[1] https://en.wikipedia.org/wiki/Illumination_Software_Creator by Bryan Lunduke before I realized how much of a pseudo-intellectual dimwit that he is.
[2] At least outside of the "included" database in a few web frameworks.
wakawaka28 · 17h ago
What is it that makes you think Lunduke is pseudo-intellectual? He certainly doesn't try to pose as a scholar. If you are like most of his haters, you just refuse to believe that smart people can be conservatives.
kstrauser · 17h ago
There’s no way to discuss Lunduke without getting into politics, so I’ll leave it that Lunduke is clearly a very intelligent person who IMO mistakes his knowledge in some areas for general expertise in other unrelated fields.
It’s a common trap to fall into. See also: Ben Carson. Both of them are obviously intelligent and highly skilled in their professional fields. And both have let that convince themselves that they know everything about everything.
wakawaka28 · 17h ago
I don't think Lunduke is a Ben Carson type. That would be ridiculous. He has opinions about things outside his area of expertise, like all of us, but he also has some unique experiences like having worked for Microsoft and OpenSUSE. His opinions on tech are pretty solid. I also agree with his politics for the most part.
kstrauser · 17h ago
I would hear what he has to say about his tech experiences. I would not be in a room where he was discussing his politics.
tombert · 8h ago
I used to think he was reasonably smart, but at a certain point I realized that his knowledge of basically anything he talked about was extremely surface level, and he doesn't appear to know much beyond that.
I disliked him before he went super conservative, but now his YouTube channel boils down to “OMG GUYS LOOK AT HOW WOKE EVERYTHING IS WOKE WOKE WOKE WOKE WOKE PEOPLE ARE HATERS ON ME BECAUSE I SAID SOMETHING THEY DONT LIKE WOKE WOKE!”
It’s typical low effort grifter stuff.
est · 17h ago
has anyone actually used the `content BLOB` pattern at a larger scale? Suppose I have tens of thousands of small jpegs; would they be better off in a .sqlite file?
Fwiw, autocad uses a database format for its file data.
1. https://www.sqlite.org/fasterthanfs.html
librasteve · 1d ago
wouldn’t an XML database be easier?
duskwuff · 23h ago
You can't* index into XML. You have to read through the whole document until you get to the part you want.
*: without adding an index of your own, at which point it isn't really XML anymore, it's some kind of homebrew XML-based archive format.
Mikhail_Edoshin · 9h ago
This applies to any secondary index. The data themselves can only be ordered by a single criterion. It may be a meaningful one, but I guess in most cases it is merely the internal ID, which means you will have to scan the whole table too.
XML was meant for documents so in most cases the sequence of elements is given. But technically if I compose XML myself I can lay it out the way I want and thus can have it sorted too. This means it will be directly searchable without an index: read a bit at the middle, find an element name, see where we are, choose head or tail, repeat.
duskwuff · 1h ago
Blindly seeking into XML data is a risky, error-prone approach. It's not impossible to do, but doing it correctly is difficult; even if the tags you're looking for are unique, there are a lot of messy edge cases involving comments and <![CDATA[...]]> blocks.
HelloNurse · 14h ago
You can store the content of a XML document in a database faithfully enough to reconstruct it exactly. Any system that can produce XML documents is a "XML database".
floating-io · 1d ago
Does an embeddable XML database engine exist at a similar level of reliability?
jsight · 20h ago
They could resurrect xindice!
supportengineer · 23h ago
No.
Zambyte · 20h ago
Why?
renecito · 23h ago
LOL!
ignoramous · 21h ago
> SQLite database has a lot of capability, which this essay has only begun to touch upon. But hopefully this quick glimpse has convinced some readers that using an SQL database as an application file format is worth a second look.
It really is. One of the experiments we are currently doing to make bug reporting from Androids easier (and, to an extent, reduce user frustration and fatigue) is to store app logs (unstructured) in (an in-memory) SQLite table. It lends itself very well to on-device LLMs (like Gemma 3n or Qwen2.5 0.5b), as users can Q&A to know just what the app is doing and why it won't work the way they want it to. On-device LLMs are limited (context length and/or embeddings) and too many writes (in batches of 1000 rows) to the in-memory SQLite table (surprisingly) eat up battery like no tomorrow, so this "chat to know what the app is doing" isn't rolled out to everyone, yet.
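A rough sketch of that setup; the batch-per-transaction shape is the standard way to keep high-volume inserts cheap, and the schema here is a guess at what's described above:

```c
#include <sqlite3.h>

/* One transaction per batch of log lines; per-row autocommit is what makes
   high-volume inserts expensive. Open the db once with
   sqlite3_open(":memory:", &logdb). */
void log_batch(sqlite3 *logdb, const char **lines, int n) {
    sqlite3_stmt *ins;
    sqlite3_exec(logdb,
        "CREATE TABLE IF NOT EXISTS log(ts INTEGER, line TEXT);",
        NULL, NULL, NULL);
    sqlite3_prepare_v2(logdb,
        "INSERT INTO log VALUES (unixepoch(), ?1);",  /* unixepoch(): 3.38+ */
        -1, &ins, NULL);
    sqlite3_exec(logdb, "BEGIN;", NULL, NULL, NULL);
    for (int i = 0; i < n; i++) {
        sqlite3_bind_text(ins, 1, lines[i], -1, SQLITE_STATIC);
        sqlite3_step(ins);
        sqlite3_reset(ins);
    }
    sqlite3_exec(logdb, "COMMIT;", NULL, NULL, NULL);
    sqlite3_finalize(ins);
}
```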
treyd · 17h ago
What kinds of queries are being done on the logs such that it makes sense to use sqlite instead of, like, just a ring buffer?
mschuster91 · 15h ago
The problem they're alluding to, I think, isn't the query side, it's the creation side. adb logcat and logging in Android in general is one hell of a clusterfuck, not being helped by logging in Java being a PITA.
mac-attack · 1d ago
I'm a fan of both as a Linux user. Interesting thought experiment.