I'm curious as to why you choose to break out specific headers in the schema.
For example, you have recipients, subject, and sender as JSON fields, when you could have just a headers field with all of them, and even add the rest of the headers in the message.
If it's performance related, you can still have headers as a single json blob and then use generated columns for the specific fields.
For example
CREATE TABLE IF NOT EXISTS "messages" (
"id" INTEGER NOT NULL PRIMARY KEY, -- internal id
"message_id" TEXT NOT NULL, -- Gmail message id
"thread_id" TEXT NOT NULL, -- Gmail thread id
"headers" JSON NOT NULL, -- JSON object of { "header": value },
"subject" TEXT GENERATED ALWAYS AS (json_extract("headers", '$.Subject')) VIRTUAL NOT NULL)
...
);
CREATE INDEX subjectidx on messages(subject);
I've found this model really powerful, as it allows users to just alter table to add indexed generated columns as they need for their specific queries. For example, if I wanted to query dkim status, it's as simple as
ALTER TABLE messages ADD dkim TEXT GENERATED ALWAYS AS (json_extract("headers", '$."Dkim-Signature"')) VIRTUAL NOT NULL);
CREATE INDEX dkimidx on messages(dkim);
SELECT dkim, COUNT(0) FROM messages GROUP BY dkim;
or whatever you want.
Hakkin · 2h ago
Note that you don't actually need the generated column either, SQLite supports indexes on expressions, so you can do, for example,
CREATE INDEX subjectidx ON messages(json_extract(headers, '$.Subject'))
and it will use this index anywhere you reference that expression.
I find it useful to create indexes like this, then create VIEWs using these expressions instead of ALTER'ing the main table with generated columns.
This looks interesting. I've DIY'd something similar with qdirstat before but you need to arrange your emails a certain way like dated folders and can't re-slice with different criteria.
On the other hand, qdirstat "cache" files are really easy to generate so can be used for visualizing a bunch of file-like things
andai · 7h ago
Hey this is really neat! It's like those disk usage visualizers, except that it seems to focus on the total volume of the mail rather than the disk usage.
Is there a size option too? To see which senders are using most of my storage.
(Also your website's SSL certificate has expired.)
terhechte · 5h ago
No currently not. It would be easy to add though. I haven't updated the tool in a while (after using it to clean up my Gmail inbox). Thanks for pointing out the certificate!
I really lament that you cannot sign in even with an application specific password any more and you need to get an oauth client and go through an oauth flow. It’s my email, but Google takes away an open standard even for myself to access it.
kasey_junk · 3m ago
Sorry, why do you consider app specific passwords an open standard but oauth not?
sdoering · 3h ago
Given the amount of spam I receive on my free Gmail addresses (compared to my paid for freelance one), and the amount of spam I receive from Gmail servers on my non Gmail-E-Mail accounts I get more and more inclined towards degoogling myself.
Especially as I receive more and more information that my freelance e-mail is put into spam by recipient systems.
Not sure how to get rid of my Google ecosystem routines, though. Feels daunting.
redeeman · 1h ago
step 1: extract data
step 2: just dont use google shit anymore. Deal with it.
you dont get it done by moping about it, but by doing
flas9sd · 33m ago
having sqlite exporters for platforms is great help for archiving, but also general questions: I used https://github.com/ltdangle/mail2db to see how much mail volume I still have on a mail account that I want to mostly move away from. A top10 of senders directed my un- and resubscribe actions.
TekMol · 7h ago
Shouldn't this be "imap to sqlite" or something? Why tie it to one specific email provider?
isaachinman · 5h ago
Because _it is_ specific to Gmail. It's using OAuth and presumable API access.
IMAP is much harder, and much slower, and is bound by Google's bandwidth limits.
pastage · 2h ago
Doing a mbox export with Google Takeout from gmail is pretty fast.
phh · 1h ago
FWIW, for several years I've tried backuping my gmail account with imap (including some stuff made specifically for gmail): It never succeeded. The best syncer were running for one month, and after one month it hit some mails that it simply couldn't retrieve? Like I guess it was in too cold storage and timeout-ed? I don't know.
So I can understand why using Google's proprietary API might work better (or not, I don't know)
Anyway, as a sibling says, nowadays Google Takeout includes mbox and work properly (and is pretty fast, like half a day), but doesn't allow continuous update.
And I migrated to another mail provider (infomaniak), and I've thanked myself for using my own mail domain name years earlier.
yread · 7h ago
Would be nice to enable fulltext search as well
padjo · 6h ago
Yes! I find gmail’s full text search surprisingly bad given it’s run by a search company.
porker · 4h ago
But not as bad as Outlook 365's search...
pastage · 2h ago
Outlook must be the worst email client there is. Something about least common denominator.
jbverschoor · 3h ago
Boot as bad as the Mail.app from iOS and macOS
isaachinman · 5h ago
Agreed! One of the reasons we started working on Marco.
What's the best open-source GMail backup software that exists? Someone has setup something like that? (also archiving attachments, etc)
nijave · 6m ago
You can also use an IMAP client and set it to offline/download mode so it downloads everything and saves it locally. I think "offline mode" is what it's called in Evolution--not sure what Thunderbird or other clients call it.
- Resume only for backup (so large restores generally fail)
sbarre · 1h ago
This isn't exactly what you're asking for, but Google offers a service called Takeout that lets you request and download backups of all your data from their services, including Gmail.
I have a reminder to trigger this every few months and update my local backup. If I recall it comes as a gzipped mbox file.
einpoklum · 5h ago
Let us stop using GMail:
* Google collects vast amounts of personal data, specifically through receiving all of your email and analyizing it.
* It builds elaborate user profiles and uses them to target you with ads designed to better influence you.
* Its hold on information (from different sources) has made it excessively powerful economically, and thus also politically.
* Google/Alphabet has long started to affect legislation, including through direct registered lobbying: ~15 Million USD in 2024 (opensecrets.org).
* It has been known to pass, and likely still passes, the information it collects - including copies of your email correspondence - on to the US government (Edward Snowden leaks).
and finally:
* There are multiple email providers, many of them quite good - both for pay and gratis. Naturally most of the gratis ones have their own interests, but nothing like Google.
phantompeace · 23m ago
What are the best free replacements to gmail that are realistic to switch to? I.E well established and not poised to close down any time soon
This should be seen as a encouraging to switch to something else rather than defeatist. Many of my communications do not touch Google services, professionally it has been judged as too risky, personally I keep a google account but also others.
Edit: You can create groups of people that are not affected by Google/Apple/Facebook, this should be seen as a goal.
mediumsmart · 2h ago
I was with you from day one and never started using gmail.
jokoon · 6h ago
I would have preferred a script that parses the mail backup Google sends you.
I think it's a big eml file.
ukuina · 6h ago
Google Takeout regularly fails to complete for me. Syncing via the API seems like a reasonable alternative.
pdyc · 4h ago
this is great if only there was a tool for whatsapp to sqlite it would make my data so much more useful
For example, you have recipients, subject, and sender as JSON fields, when you could have just a headers field with all of them, and even add the rest of the headers in the message.
If it's performance related, you can still have headers as a single json blob and then use generated columns for the specific fields.
For example
I've found this model really powerful, as it allows users to just alter table to add indexed generated columns as they need for their specific queries. For example, if I wanted to query dkim status, it's as simple as or whatever you want.I find it useful to create indexes like this, then create VIEWs using these expressions instead of ALTER'ing the main table with generated columns.
https://github.com/terhechte/postsack
On the other hand, qdirstat "cache" files are really easy to generate so can be used for visualizing a bunch of file-like things
Is there a size option too? To see which senders are using most of my storage.
(Also your website's SSL certificate has expired.)
Especially as I receive more and more information that my freelance e-mail is put into spam by recipient systems.
Not sure how to get rid of my Google ecosystem routines, though. Feels daunting.
you dont get it done by moping about it, but by doing
IMAP is much harder, and much slower, and is bound by Google's bandwidth limits.
So I can understand why using Google's proprietary API might work better (or not, I don't know)
Anyway, as a sibling says, nowadays Google Takeout includes mbox and work properly (and is pretty fast, like half a day), but doesn't allow continuous update.
And I migrated to another mail provider (infomaniak), and I've thanked myself for using my own mail domain name years earlier.
https://marcoapp.io
- Open source
- Resume (so backups/restores will eventually complete)
Honorable mention: https://www.mailstore.com/en/products/mailstore-home/
- Not open source
- GUI with index: nice for searching mail locally
- Resume only for backup (so large restores generally fail)
I have a reminder to trigger this every few months and update my local backup. If I recall it comes as a gzipped mbox file.
* Google collects vast amounts of personal data, specifically through receiving all of your email and analyizing it.
* It builds elaborate user profiles and uses them to target you with ads designed to better influence you.
* Its hold on information (from different sources) has made it excessively powerful economically, and thus also politically.
* Google/Alphabet has long started to affect legislation, including through direct registered lobbying: ~15 Million USD in 2024 (opensecrets.org).
* It has been known to pass, and likely still passes, the information it collects - including copies of your email correspondence - on to the US government (Edward Snowden leaks).
and finally:
* There are multiple email providers, many of them quite good - both for pay and gratis. Naturally most of the gratis ones have their own interests, but nothing like Google.
Edit: You can create groups of people that are not affected by Google/Apple/Facebook, this should be seen as a goal.
I think it's a big eml file.
To me having to install a tool through Python is a show-stopper.
[0] https://github.com/djipko/gbackup-rs