I'm curious as to why you choose to break out specific headers in the schema.
For example, you have recipients, subject, and sender as JSON fields, when you could have just a headers field with all of them, and even add the rest of the headers in the message.
If it's performance related, you can still have headers as a single json blob and then use generated columns for the specific fields.
For example
CREATE TABLE IF NOT EXISTS "messages" (
"id" INTEGER NOT NULL PRIMARY KEY, -- internal id
"message_id" TEXT NOT NULL, -- Gmail message id
"thread_id" TEXT NOT NULL, -- Gmail thread id
"headers" JSON NOT NULL, -- JSON object of { "header": value },
"subject" TEXT GENERATED ALWAYS AS (json_extract("headers", '$.Subject')) VIRTUAL NOT NULL)
...
);
CREATE INDEX subjectidx on messages(subject);
I've found this model really powerful, as it allows users to just alter table to add indexed generated columns as they need for their specific queries. For example, if I wanted to query dkim status, it's as simple as
ALTER TABLE messages ADD dkim TEXT GENERATED ALWAYS AS (json_extract("headers", '$."Dkim-Signature"')) VIRTUAL NOT NULL);
CREATE INDEX dkimidx on messages(dkim);
SELECT dkim, COUNT(0) FROM messages GROUP BY dkim;
or whatever you want.
Hakkin · 1h ago
Note that you don't actually need the generated column either, SQLite supports indexes on expressions, so you can do, for example,
CREATE INDEX subjectidx ON messages(json_extract(headers, '$.Subject'))
and it will use this index anywhere you reference that expression.
I find it useful to create indexes like this, then create VIEWs using these expressions instead of ALTER'ing the main table with generated columns.
xearl · 4h ago
TIL, thanks a lot!
formerly_proven · 1h ago
You can also create indices directly on expressions, including json_extract etc.
dotancohen · 4h ago
I see that you defined the `dkim` column as NOT NULL. So what happens when an email message does not contain the Dkim-Signature header?
hun3 · 3h ago
Probably something like
Error: stepping, NOT NULL constraint failed: messages.dkim (19)
Hey this is really neat! It's like those disk usage visualizers, except that it seems to focus on the total volume of the mail rather than the disk usage.
Is there a size option too? To see which senders are using most of my storage.
(Also your website's SSL certificate has expired.)
terhechte · 4h ago
No currently not. It would be easy to add though. I haven't updated the tool in a while (after using it to clean up my Gmail inbox). Thanks for pointing out the certificate!
the_mitsuhiko · 3h ago
I really lament that you cannot sign in even with an application specific password any more and you need to get an oauth client and go through an oauth flow. It’s my email, but Google takes away an open standard even for myself to access it.
sdoering · 1h ago
Given the amount of spam I receive on my free Gmail addresses (compared to my paid for freelance one), and the amount of spam I receive from Gmail servers on my non Gmail-E-Mail accounts I get more and more inclined towards degoogling myself.
Especially as I receive more and more information that my freelance e-mail is put into spam by recipient systems.
Not sure how to get rid of my Google ecosystem routines, though. Feels daunting.
TekMol · 5h ago
Shouldn't this be "imap to sqlite" or something? Why tie it to one specific email provider?
isaachinman · 4h ago
Because _it is_ specific to Gmail. It's using OAuth and presumable API access.
IMAP is much harder, and much slower, and is bound by Google's bandwidth limits.
pastage · 38m ago
Doing a mbox export with Google Takeout from gmail is pretty fast.
oulipo · 22m ago
What's the best open-source GMail backup software that exists? Someone has setup something like that? (also archiving attachments, etc)
sbarre · 10m ago
This isn't exactly what you're asking for, but Google offers a service called Takeout that lets you request and download backups of all your data from their services, including Gmail.
I have a reminder to trigger this every few months and update my local backup. If I recall it comes as a gzipped mbox file.
yread · 5h ago
Would be nice to enable fulltext search as well
padjo · 4h ago
Yes! I find gmail’s full text search surprisingly bad given it’s run by a search company.
porker · 2h ago
But not as bad as Outlook 365's search...
pastage · 28m ago
Outlook must be the worst email client there is. Something about least common denominator.
jbverschoor · 1h ago
Boot as bad as the Mail.app from iOS and macOS
isaachinman · 4h ago
Agreed! One of the reasons we started working on Marco.
* Google collects vast amounts of personal data, specifically through receiving all of your email and analyizing it.
* It builds elaborate user profiles and uses them to target you with ads designed to better influence you.
* Its hold on information (from different sources) has made it excessively powerful economically, and thus also politically.
* Google/Alphabet has long started to affect legislation, including through direct registered lobbying: ~15 Million USD in 2024 (opensecrets.org).
* It has been known to pass, and likely still passes, the information it collects - including copies of your email correspondence - on to the US government (Edward Snowden leaks).
and finally:
* There are multiple email providers, many of them quite good - both for pay and gratis. Naturally most of the gratis ones have their own interests, but nothing like Google.
This should be seen as a encouraging to switch to something else rather than defeatist. Many of my communications do not touch Google services, professionally it has been judged as too risky, personally I keep a google account but also others.
Edit: You can create groups of people that are not affected by Google/Apple/Facebook, this should be seen as a goal.
mediumsmart · 41m ago
I was with you from day one and never started using gmail.
For example, you have recipients, subject, and sender as JSON fields, when you could have just a headers field with all of them, and even add the rest of the headers in the message.
If it's performance related, you can still have headers as a single json blob and then use generated columns for the specific fields.
For example
I've found this model really powerful, as it allows users to just alter table to add indexed generated columns as they need for their specific queries. For example, if I wanted to query dkim status, it's as simple as or whatever you want.I find it useful to create indexes like this, then create VIEWs using these expressions instead of ALTER'ing the main table with generated columns.
https://github.com/terhechte/postsack
Is there a size option too? To see which senders are using most of my storage.
(Also your website's SSL certificate has expired.)
Especially as I receive more and more information that my freelance e-mail is put into spam by recipient systems.
Not sure how to get rid of my Google ecosystem routines, though. Feels daunting.
IMAP is much harder, and much slower, and is bound by Google's bandwidth limits.
I have a reminder to trigger this every few months and update my local backup. If I recall it comes as a gzipped mbox file.
https://marcoapp.io
I think it's a big eml file.
To me having to install a tool through Python is a show-stopper.
[0] https://github.com/djipko/gbackup-rs
* Google collects vast amounts of personal data, specifically through receiving all of your email and analyizing it.
* It builds elaborate user profiles and uses them to target you with ads designed to better influence you.
* Its hold on information (from different sources) has made it excessively powerful economically, and thus also politically.
* Google/Alphabet has long started to affect legislation, including through direct registered lobbying: ~15 Million USD in 2024 (opensecrets.org).
* It has been known to pass, and likely still passes, the information it collects - including copies of your email correspondence - on to the US government (Edward Snowden leaks).
and finally:
* There are multiple email providers, many of them quite good - both for pay and gratis. Naturally most of the gratis ones have their own interests, but nothing like Google.
Edit: You can create groups of people that are not affected by Google/Apple/Facebook, this should be seen as a goal.