Since it's impossible to verify, and there's a whole ton of deception by companies, I simply don't trust that any data is ever deleted just because I've been told it is.
Instead, I try to minimize the amount of data that others have.
nemothekid · 4h ago
One of the headaches of system design in this area is how you deal with backups. Let's say you do regular backups to S3, Glacier, tape, or stone tablets.
When you tell your customer "we have deleted all your data", are you loading all your backups and scrubbing their data from there as well? Probably not, as it would likely be too expensive, and depending on your size, your backups may cease to be backups if users request data deletion daily.
Ok, then you might say that before a backup is restored, you check it against a master "deletion" list. Still, you are dependent on the company promising to actually consult that list. When someone "really really" needs data from a week ago and gets a one-off backup, it has deleted customer data in it.
Next, my idea was to encrypt all user data with a user-specific key, and when the customer requests deletion, you just delete the key. Perfect. Up until it's time to back up the encryption key database, and you are back at square one. I understand this solution is probably 95% of the way there, but I'd love to hear if anyone knows of designs which, when implemented, are fairly foolproof and don't require your customers to be "smart" (e.g. holding the encryption key themselves).
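The per-user-key idea (often called "crypto-shredding") can be pictured with the minimal Python sketch below. It is illustrative only: `CryptoShreddedStore` and `_keystream` are hypothetical names, and the HMAC-SHA256 counter-mode keystream stands in for a real cipher so the example runs on the standard library alone; a real system would use a vetted AEAD such as AES-GCM. It does nothing about the problem raised in the comment: if `keys` is itself backed up, restoring that backup brings the data back.

```python
import hashlib
import hmac
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256(key, nonce || counter) blocks, CTR-style."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        out.extend(block)
        counter += 1
    return bytes(out[:length])


class CryptoShreddedStore:
    """Hypothetical store: each user's records are encrypted with that user's own key."""

    def __init__(self) -> None:
        self.keys: dict[str, bytes] = {}  # the only place keys live
        self.records: dict[str, list[tuple[bytes, bytes]]] = {}  # user -> [(nonce, ciphertext)]

    def put(self, user: str, plaintext: bytes) -> None:
        key = self.keys.setdefault(user, secrets.token_bytes(32))
        nonce = secrets.token_bytes(16)
        stream = _keystream(key, nonce, len(plaintext))
        ciphertext = bytes(p ^ k for p, k in zip(plaintext, stream))
        self.records.setdefault(user, []).append((nonce, ciphertext))

    def get(self, user: str) -> list[bytes]:
        key = self.keys.get(user)
        if key is None:
            raise KeyError(f"key for {user!r} has been shredded")
        return [
            bytes(c ^ k for c, k in zip(ct, _keystream(key, nonce, len(ct))))
            for nonce, ct in self.records.get(user, [])
        ]

    def shred(self, user: str) -> None:
        # Deleting the key is the "deletion"; the ciphertext may linger anywhere.
        self.keys.pop(user, None)


store = CryptoShreddedStore()
store.put("alice", b"my address")
print(store.get("alice"))  # [b'my address']
store.shred("alice")
# store.get("alice") now raises KeyError, although the ciphertext is still in store.records
```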
jimkoen · 4h ago
This is why the GDPR is so nice. A company needs a strategy in place to make sure your data is actually deleted, and that strategy needs to be verified to work. Purging records that are supposed to be deleted from backups upon reimport is fine, but you'd better make sure that process works, or else the person whose data you accidentally didn't delete has a case against you in court.
trinix912 · 4h ago
The thing is people generally aren't going to court when they suddenly notice a deleted photo still in their cloud docs. They're going to think it's a glitch or that they hadn't deleted it. Proving something like this in court is tricky - how would you prove to the judge you deleted something and that it randomly reappeared after some years?
bestouff · 4h ago
It looks trivial to me. Encrypt your backups.
nemothekid · 3h ago
Not sure how encrypting backups solves the problem stated. If I hold the encryption keys, that means I still have access to your data.
sidewndr46 · 6h ago
It's not really any different when a court orders a company to delete data that it wasn't supposed to have. The judge pretends he understands the plaintiff's arguments and knows what the data is and where it is stored. Later the judge tells the defendant to delete it. The defendant pretends to delete it and the court pretends to verify that something doesn't exist. Afterwards, everyone goes back to business as usual.
outworlder · 4h ago
If you can, edit (overwrite) the data first rather than just deleting it.
Not all systems keep all versions indefinitely.
quantified · 1h ago
Did you ever sign up for 23andMe?
Veserv · 6h ago
Yep. Until they make a legally binding claim, backed by a self-imposed liquidated damages clause for failing to abide by their own claims, their claims should be ignored.
lrvick · 4h ago
The only way I would believe proof of deletion is if my data was submitted, end to end encrypted, to a key only held in memory of a quorum of remotely attestable secure enclaves deterministically built from publicly available code that I can easily confirm has no means to export keys to the control of any individual.
This is not only possible; I designed and open-sourced a lot of tooling to do it, and a few companies are doing this today. Shameless plug: My company (https://distrust.co) provides consulting for orgs that want to be ahead of the pack to retrofit their existing infrastructure to support these types of assurances.
Now we just need to require verifiable deletion techniques like this in order to get a standardized privacy certification browsers can verify and alert users to along the lines of the TLS green lock.
I give it 20 years.
3s · 1h ago
We built something similar using secure enclaves at Tinfoil for verifiably private AI! Unless there is proof of no data access or retention, we cannot trust what happens to our data (see the recent court-ordered retention at OpenAI).
pluto_modadic · 4h ago
do you provide datacenter attestation primitives (e.g. Intel DCAP (only newer chips), ARM CCA-SSE(still being built), or AMD trust zone verifications or whatever they're called)?
software attestable enclaves are one thing. hardware attestable ones are quite another.
clickety_clack · 7h ago
*For definitions of “your”, “information”, “permanently” and “deleted”, please refer to one of the dense, poorly worded contracts you implicitly agreed to when you thought about our site.
reverendsteveii · 4h ago
These definitions are subject to not just variance from their common meaning but also unilateral change without notice. Offer not valid in Alaska, Hawaii or Puerto Rico. Your mileage may vary. Do not taunt Happy Fun Ball.
tempodox · 4h ago
It's less “poorly worded” than finely tuned legalese that gives the company carte blanche.
codeplea · 6h ago
It's probably in the Privacy Policy they just emailed about.
codeplea · 6h ago
>I received a confirmation that said, “Your personal information and items associated with your account have now been deleted. This action is permanent and cannot be reversed.”
By the same logic, wasn't this first email self-contradicting? If your data is gone, how are they emailing you to tell you that your data is gone?
But really, aren't companies legally required to retain a lot of information anyway? Such as invoices needed for tax purposes?
JohnMakin · 6h ago
With data deletion requests, you sometimes do need a mechanism to keep track of who/what you deleted. This inevitably involves PII. What comes to mind is CCPA requests to delete data from private data brokers - there is an inherent problem that to avoid re-ingesting your data into their system, they need to know what that data is in the first place.
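A commonly used mitigation, sketched here under assumptions, is to keep only a keyed hash of a normalized identifier on the suppression list, so incoming records can be matched without storing the plain address. `PEPPER`, `fingerprint`, and `SuppressionList` are hypothetical names, and lowercasing the whole address is a simplification. A keyed hash is pseudonymous, not anonymous: anyone holding the pepper can still test whether a given email is on the list, so this narrows the problem rather than solving it.

```python
import hashlib
import hmac

# Hypothetical server-side secret, stored separately from the suppression list.
PEPPER = b"hypothetical-secret-pepper"


def fingerprint(email: str) -> str:
    """Keyed hash (HMAC-SHA256) of a crudely normalized email address."""
    normalized = email.strip().lower()
    return hmac.new(PEPPER, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


class SuppressionList:
    """Remembers who asked to be deleted without keeping their plain email."""

    def __init__(self) -> None:
        self._hashes: set[str] = set()

    def record_deletion(self, email: str) -> None:
        self._hashes.add(fingerprint(email))

    def should_ingest(self, email: str) -> bool:
        # Block re-ingestion of records belonging to someone who asked for deletion.
        return fingerprint(email) not in self._hashes


suppression = SuppressionList()
suppression.record_deletion("Alice@Example.com")
print(suppression.should_ingest("alice@example.com "))  # False
print(suppression.should_ingest("bob@example.com"))     # True
```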
andrewflnr · 6h ago
> If your data is gone, how are they emailing you to tell you that your data is gone?
It was still in-memory for the deletion request after they finished the deletion query. It probably stayed in memory after the request finished, too, until the page was re-used. The horror.
SoftTalker · 6h ago
It's nothing necessarily nefarious. His email address is still in a mailing list, probably at MailChimp or some other third party that they use for mass emailing. Doesn't mean they still have an "account" or "profile" of personal information for him. Of course, it doesn't mean they don't.
midtake · 4h ago
It could just be mailing list negligence. Mailing lists are usually decoupled from the main user db/IAM.
nitwit005 · 4h ago
Retaining contact information for legal communication seems a logical exception.
After all, how would they even email you back to tell you they deleted your data, if they deleted all records that include your email address?
therobot24 · 7h ago
until there's actual enforcement, there isn't the incentive to tell the truth...
It really is sad how much of the average person's data has been captured and monetized. It seems like we're only continuing to turn up the heat as we continue to 'boil the frog'.
arez · 6h ago
I thought that's why we have GDPR and similar laws, so you can enforce it? If the company says it deleted your data but it didn't, it's definitely not complying with GDPR.
ygjb · 6h ago
GDPR requires data to be deleted where feasible. A common area where this falls apart is in backups made of systems implemented prior to GDPR rules, or systems which have not implemented a mechanism to allow user level deletion from backups.
There is a somewhat accepted pattern here where backup processes are updated to retain a list of users who have requested deletion, and when a restore from backup is performed, before the restored system is brought back online, the data of users who have requested deletion is removed.
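The restore-time purge described above could look roughly like this minimal sketch, assuming a single `users` table and a deletion list kept somewhere other than the backup being restored (otherwise the restore would roll the list back too). `restore_with_purge` is a hypothetical name, and the final check is just one way to give an auditor something concrete to verify.

```python
import sqlite3


def restore_with_purge(backup_rows, deleted_user_ids):
    """Load a backup into a fresh database and purge deleted users before
    the restored copy is handed back to anything else."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
    db.executemany("INSERT INTO users (id, email) VALUES (?, ?)", backup_rows)

    # Apply the deletion list, which must live outside the backup itself.
    db.executemany("DELETE FROM users WHERE id = ?", [(uid,) for uid in deleted_user_ids])
    db.commit()

    # Verify rather than assume: refuse to return a copy that still has deleted users.
    leftover = [
        uid for uid in deleted_user_ids
        if db.execute("SELECT 1 FROM users WHERE id = ?", (uid,)).fetchone()
    ]
    if leftover:
        db.close()
        raise RuntimeError(f"purge failed for user ids {leftover}")
    return db


backup = [(1, "a@example.com"), (2, "b@example.com"), (3, "c@example.com")]
db = restore_with_purge(backup, deleted_user_ids={2})
print(db.execute("SELECT id FROM users ORDER BY id").fetchall())  # [(1,), (3,)]
```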
As with many other compliance and governance controls, this is a known pattern, but it is subject to review by auditors, and the overall pattern, or a specific implementation of it, may not survive a legal test via a complaint by a consumer or regulator.
Nextgrid · 2h ago
GDPR can only be enforced by regulators. The bar for a valid complaint is quite high, and a company can lie and essentially remove your grounds for said complaint. And even once you do get a valid complaint in, it'll stay in limbo for years. Noyb has some info on the subject: https://noyb.eu/en/data-protection-day-only-13-cases-eu-dpas...
ecshafer · 6h ago
So many databases are set up to "delete" a record by just setting an is_deleted column to true, so the record is not actually deleted. That means a lot of deleted data is still on disk somewhere, just ignored by most queries.
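A small illustration using Python's standard `sqlite3` module and a hypothetical `accounts` table, showing how a soft-deleted row disappears from normal queries while the data is still stored:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, email TEXT, is_deleted INTEGER NOT NULL DEFAULT 0)"
)
db.executemany(
    "INSERT INTO accounts (id, email) VALUES (?, ?)",
    [(1, "alice@example.com"), (2, "bob@example.com")],
)

# Soft delete: the row is only flagged, the email is still stored.
db.execute("UPDATE accounts SET is_deleted = 1 WHERE id = 1")
visible = db.execute("SELECT email FROM accounts WHERE is_deleted = 0").fetchall()
still_stored = db.execute("SELECT email FROM accounts WHERE id = 1").fetchone()
print(visible)       # [('bob@example.com',)]
print(still_stored)  # ('alice@example.com',)

# Hard delete: the row is gone from the table.
db.execute("DELETE FROM accounts WHERE id = 1")
print(db.execute("SELECT email FROM accounts WHERE id = 1").fetchone())  # None
```

Even the hard `DELETE` only removes the row logically: the old bytes can remain in free pages of the database file until they are overwritten, unless something like SQLite's `PRAGMA secure_delete` is enabled or the file is rebuilt with `VACUUM`, and copies in WAL files, replicas, and backups are a separate problem.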
andrewflnr · 6h ago
I'm leaning toward incompetence on this one. Certainly if they were deliberately keeping his data against his will, it would be stupid to email him about it. The people responsible for deleting the account info probably deleted everything they knew about or had access to, but his email was also in some other database run by marketing or something. Or their databases are just overall horrifically denormalized and inconsistent.
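If the "some other database run by marketing" theory is right, the failure mode is a deletion that only reaches the stores someone remembered to wire up. A hypothetical sketch of a fan-out deletion that reports failures instead of silently claiming success; every store name and deleter function here is made up for illustration:

```python
from typing import Callable, Dict, List


def delete_everywhere(email: str, stores: Dict[str, Callable[[str], bool]]) -> List[str]:
    """Call every registered deleter; return the names of stores that failed."""
    failed = []
    for name, deleter in stores.items():
        try:
            ok = deleter(email)
        except Exception:
            ok = False
        if not ok:
            failed.append(name)
    return failed


# Hypothetical in-memory stand-ins for separately owned systems.
accounts = {"alice@example.com": {"name": "Alice"}}
marketing_list = {"alice@example.com"}


def delete_account(email: str) -> bool:
    accounts.pop(email, None)
    return email not in accounts


def delete_from_marketing(email: str) -> bool:
    marketing_list.discard(email)
    return email not in marketing_list


def delete_from_legacy_crm(email: str) -> bool:
    raise ConnectionError("legacy CRM unreachable")


failed = delete_everywhere("alice@example.com", {
    "accounts": delete_account,
    "marketing": delete_from_marketing,
    "legacy_crm": delete_from_legacy_crm,
})
print(failed)  # ['legacy_crm']
```

Sending the "everything has been deleted" confirmation only when the returned list is empty would at least avoid a false claim, but no registry can catch a store that was never registered in the first place.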
(Of course, sufficiently advanced incompetence is indistinguishable from malice. Hard to say if that's applicable here.)
kps · 6h ago
From the headline I expected a startling breakthrough in physics.
petercooper · 6h ago
The GDPR isn't mentioned, but as one of the more stringent privacy regulation regimes, its 'right to erasure' has all sorts of conditions attached to it where a customer might be told that all of their data has been deleted, but some legally has to be (or can be) stored.
For example, you can store a record that an erased user requested erasure so you can prove it later if needed in a legal situation (Article 17(3)(e)). Updating such users about legal policies that their retained data may still be subject to would seem rather inane, but I could easily believe it exists as a policy at companies adopting a very eager interpretation of the regulations.
asadotzler · 5h ago
Can you tell that user it is deleted when it is not, just because you're holding onto it for legal reasons? I understand the need or requirement to hold some documents, but I don't understand how companies can lie to users by claiming their information was deleted when it was not. IMO, they should be required to tell users which specific items were not deleted and why.
petercooper · 5h ago
It's semantics, but one man's "lying" is another man's pragmatic, non-legalese customer-facing wording.
For example "Your personal information has been deleted" versus the potentially much messier truth, which might involve citing the GDPR, mentioning that for accounting reasons you have to maintain their details on invoices, areas of your financial auditing process, that you're maintaining a record of their request to delete the account, and so on and so forth.
JohnFen · 4h ago
No, it's lying. If they say your data has been deleted without a qualifier that some of it remains undeleted (regardless of the reason), that's just a straight-up lie because their statement is factually untrue and they know it.
They could tell the truth without going into the specific messy detail.
xeonmc · 6h ago
“Right to erasure” may mean completely different things depending on the kind of government you’re dealing with.
Mystery-Machine · 4h ago
Interesting that this comes from a Microsoft employee...
ghewgill · 4h ago
Raymond Chen happens to have worked for Microsoft for a long time, but he is an institution unto himself.