Data brokers are selling flight information to CBP and ICE
340 points by exiguus, 7/14/2025, 4:02:36 PM, 159 comments (eff.org)
In 2012 I created a killer prototype that demonstrated that you could accurately reconstruct most people's flight history at scale from social media and/or ad data. Probably the first of its kind. This has been possible for a long time.
A quick sketch of how it worked:
We filtered out every spatiotemporal edge in the entity graph with an implied speed below 300 kilometers per hour or a distance below 200 kilometers, IIRC. The edges that survived were our proxy for "was on a plane", and their endpoints implicitly gave the origin and destination.
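A minimal sketch of that filtering step, assuming each observation is an (entity, latitude, longitude, timestamp) record and that an "edge" is a pair of consecutive sightings of the same entity. The `Sighting` record, the function names, and the great-circle distance are illustrative choices, not the original system; the thresholds are the IIRC values above.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0

@dataclass
class Sighting:
    entity: str
    lat: float
    lon: float
    t: float  # Unix timestamp, seconds

def haversine_km(a: Sighting, b: Sighting) -> float:
    """Great-circle distance between two sightings, in kilometres."""
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dphi = p2 - p1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def flight_edges(sightings, min_speed_kmh=300.0, min_dist_km=200.0):
    """Yield consecutive-sighting pairs that are fast and long enough to imply a flight."""
    by_entity = {}
    for s in sightings:
        by_entity.setdefault(s.entity, []).append(s)
    for track in by_entity.values():
        track.sort(key=lambda s: s.t)
        for a, b in zip(track, track[1:]):
            hours = (b.t - a.t) / 3600
            if hours <= 0:
                continue  # simultaneous or out-of-order records: no usable speed
            dist = haversine_km(a, b)
            # Drop edges below either threshold; what survives is the "was on a plane" proxy.
            if dist >= min_dist_km and dist / hours >= min_speed_kmh:
                yield a, b
```

A six-hour SFO-to-JFK gap survives the filter, while a 220 km drive over three hours is dropped on speed.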
These edges can be correlated with both public flight data and maintenance IoT data from jet engines to put entities on a specific flight. People overlook the extent to which innocuous industrial IoT data can be used as a proxy for relationships in unrelated domains.
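The correlation step can be sketched as a route-and-time-window match against a flight schedule. This assumes edge endpoints have already been snapped to airport codes and that schedule records look like the hypothetical `Flight` below; the engine-maintenance IoT correlation mentioned above is not modeled here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Flight:
    flight_id: str
    airline: str
    origin: str
    dest: str
    dep: float  # scheduled departure, Unix seconds
    arr: float  # scheduled arrival, Unix seconds

def candidate_flights(origin, dest, last_seen_origin, first_seen_dest, schedule):
    """Flights on the route that depart after the last sighting at the origin
    and arrive before the first sighting at the destination."""
    return [
        f for f in schedule
        if f.origin == origin and f.dest == dest
        and last_seen_origin <= f.dep and f.arr <= first_seen_dest
    ]
```

When this returns exactly one flight the entity is placed on it; more than one is the ambiguous case handled next in the thread.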
In rare cases, there was more than one plausible commercial flight. Because we had their flight history, we assumed in these cases that it was the primary airline they had used in the past, either generally or for that specific origin and destination. This almost always resolved perfectly.
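A sketch of that tie-break, assuming candidates are (flight_id, airline) pairs and the entity's history is a list of (airline, origin, destination) tuples. The comment doesn't say which preference came first; this version tries the route-specific airline before the overall favourite, which is an assumption, and `pick_flight` is a hypothetical name.

```python
from collections import Counter

def pick_flight(candidates, history, origin, dest):
    """Choose among (flight_id, airline) candidates using the entity's past airline use.

    Prefers the airline most used on this origin/destination pair,
    then the airline most used overall; returns None if history gives no signal.
    """
    if len(candidates) == 1:
        return candidates[0]
    route = Counter(airline for airline, o, d in history if (o, d) == (origin, dest))
    overall = Counter(airline for airline, _, _ in history)
    for counts in (route, overall):
        seen = [c for c in candidates if counts[c[1]] > 0]
        if seen:
            return max(seen, key=lambda c: counts[c[1]])
    return None  # no basis to choose; leave the edge ambiguous
```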
This was impressively effective and it didn't require first-party data from airlines or particularly sophisticated analytics. Space and time are the primary keys of reality.
Sounds like the bigger issue is that you're able to get "spatiotemporal" data in the first place? Otherwise it's like saying "we can figure out all stores you've been to, if we have your credit card transaction history". Sure, it's kinda creepy that you can figure out which stores I went to, but the bigger problem is that you can get the transaction data in the first place. Moreover whatever "spatiotemporal" data needed to reconstruct such flight history is probably more valuable than the flight history itself. Who cares if you know Joe flew on United 8340 when you have hour-by-hour updates on his rough location?
The preposterous thing is that payment processors aren't just allowed to collect this information and tie it to your name, they're required to do that.
People talk a big game about fighting fascism, but how can you allow these laws to exist if you can contemplate what happens when actual fascists get hold of that data going back decades? They need to be dismantled now.
If you want to do it, get a warrant.
To use the US as an example (I doubt other countries are much better) it's estimated that every adult in the US commits multiple Federal felonies per day[1], Federal law is replete with ridiculous laws[2] and the number of federal laws is uncountable by Congressional Research Service staff. Does it matter at that point?
[1] Three Felonies A Day - ISBN 978-1594035227
[2] https://x.com/CrimeADay
That's not a serious estimate: https://news.ycombinator.com/item?id=43744267
Yeah, this just sounds like it's written from the perspective of a data broker.
Tying particular ad analytics (presumably IP geolocation?) to thousands of particular individuals, and having it populated well enough to track them, is "privileged first-party data access" by another name.
Okay, fine, I'll just install another operating system then, like KDE Plasma Mobile or GrapheneOS. Your location is still leaked 24/7. This is because your cellular modem has its own operating system, running underneath your phone's operating system, and your location is being triangulated at all times. Once again, you are trusting that telecommunications companies aren't misusing this, but please remember they're compelled by law to make a lot of this information available to numerous third parties.
Okay, fine, let me just remove the SIM then and use my phone on Wi-Fi only, always through a VPN. Your location can still leak, for example through your car. Your car also has a cellular modem that leaks your location, and you probably signed a contract allowing that data to be given to hundreds of third parties.
Of course, all of this is assuming you don't use any social media. Social media can also leak your location, even without location services. If you review a restaurant - that's your location. Where are your friends? You're probably around them. And on and on.
source?
>You are trusting that this data is not leaked to any third-parties. You cannot verify this, as the data is exfiltrated to servers which you can't verify.
At least on Android you can theoretically disable "Google Location Accuracy", which stops the phone sending nearby hotspot MAC addresses to Google. That's the only public route by which Google gets your location without you knowingly sending it. You also imply that mobile operating systems surreptitiously send locations back to Google/Apple even when users have disabled every location-related feature, but I'm not aware of any evidence that this is the case, and until proven otherwise it falls into the same category as "Facebook is secretly listening to you".
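To see why nearby hotspot MAC addresses amount to a location, here is a generic weighted-centroid sketch: given a database of known access-point positions, a Wi-Fi scan alone pins down roughly where you are. This is a textbook technique, not Google's actual (unpublished) method; the function name and the BSSID database are hypothetical.

```python
def weighted_centroid(scan, ap_locations):
    """Estimate (lat, lon) as the signal-weighted mean of known access-point positions.

    scan: list of (bssid, rssi_dbm) pairs from a Wi-Fi scan.
    ap_locations: dict mapping bssid -> (lat, lon), e.g. from a prior survey.
    Returns None if no scanned access point is in the database.
    Averaging raw lat/lon is only reasonable over small areas like a street.
    """
    total_w = lat_sum = lon_sum = 0.0
    for bssid, rssi_dbm in scan:
        if bssid not in ap_locations:
            continue  # unknown hotspot contributes nothing
        w = 10 ** (rssi_dbm / 10)  # dBm -> milliwatts, so stronger signals dominate
        lat, lon = ap_locations[bssid]
        total_w += w
        lat_sum += w * lat
        lon_sum += w * lon
    if total_w == 0:
        return None
    return lat_sum / total_w, lon_sum / total_w
```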
It's also disappointing that the root comment is distracting from the 4th amendment violations by making the conversation about their vague claims of selling mini-palantir demos through abusing web ads.
> you have a record of a lot of location/timestamp data for people
What is the source of that data?
So if you have ad impression data, you have IP geolocation (or maybe something better) along with the timestamp. Similarly, socials sometimes give you location metadata, and with image uploads you can get location metadata too (today it's often stripped, but historically it wasn't).
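For the image-upload case: EXIF stores GPS latitude and longitude as three rationals (degrees, minutes, seconds) plus a hemisphere reference such as "N" or "W". A minimal conversion sketch, assuming the tag values have already been read out of the file by some EXIF library (not shown) as (numerator, denominator) pairs; `dms_to_decimal` is a hypothetical helper.

```python
from fractions import Fraction

def dms_to_decimal(dms, ref):
    """Convert EXIF-style ((deg_num, deg_den), (min_num, min_den), (sec_num, sec_den))
    rationals plus a hemisphere ref ("N", "S", "E", "W") to signed decimal degrees."""
    deg, minutes, seconds = (Fraction(num, den) for num, den in dms)
    value = deg + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative in decimal notation.
    return float(-value if ref in ("S", "W") else value)
```

For example, 37° 46' 29.64" N converts to 37.7749, and 122° 25' 9.84" W to -122.4194.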
People on this site probably understand this better than 99% of the world.
The problem is "What can I, as an individual, do about it?"
However, it turns out that thousands of people like to talk about their flights on social media, so we scraped that as a spot check and it mostly lined up perfectly. Good enough for a demo and it would have been difficult to come up with an alternative explanation for the patterns in the data.
The purpose of the PoC was to sell the data analysis infrastructure that made that type of analysis possible at scale; it wasn't about the data per se. It was a compelling demo we invented given the data that happened to be available. Startup life.
For fun edge cases, there's always Antarctica, where you can travel from a US base (which looks like you're in the US) to a NZ base (which looks like you're in NZ) in a couple of minutes: https://brr.fyi/posts/credit-card-shenanigans
- https://www.blakefire-security.co.uk/blog/social-media-and-j...
>- https://www.blakefire-security.co.uk/blog/social-media-and-j...
FYI the source you posted never claimed that John Terry's insurance tried to deny the claim, only mentioning that "some" insurance companies warn of it. However even that claim is questionable, because it isn't even from an insurance company, it's from a content marketing piece by an insurance comparison website.
ARC and IATA absolutely do play such a role, as the financial clearinghouses for ensuring that travel agents (online and offline) and airlines can pay each other, and as gatekeepers/certification bodies for agencies to ensure these financial systems aren't abused.
Now, they absolutely do sell access to data to third parties, governmental and nongovernmental. But the reason they have this data isn't because they buy it to resell it; they are fully part of the funds flow for the underlying transaction. Whether they should be allowed to sell or share non-anonymized data on passenger records and prices paid is a very good question, but at the very least this is about as first-party as data gets.
https://www.altexsoft.com/blog/airline-reporting-corporation... describes some of these flows. (Here be dragons.)
The general public have no idea how much ad providers and data brokers know about them.
I’m guessing that with the help of Palantir, the government has even more data and can probably link Reddit posts etc. based on stylometry, and can even perform psychological analysis of your personality, tendencies, etc.
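Stylometry in its simplest form compares writing-style fingerprints. A toy sketch, using character trigram frequencies and cosine similarity: this is one common baseline, chosen here for illustration, and says nothing about what Palantir or any agency actually runs. Real systems use far more text and richer features such as function-word rates.

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    """Frequency profile of overlapping character n-grams (lower-cased, whitespace collapsed)."""
    t = " ".join(text.lower().split())
    return Counter(t[i:i + n] for i in range(len(t) - n + 1))

def cosine(a, b):
    """Cosine similarity between two n-gram frequency profiles (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

Two posts by the same author, with the same pet phrases, tend to score higher against each other than against unrelated text.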
After being burnt by things taken from my social media out of context, used to publicly shame me, I locked down my social media
Am I "sweetly naive" to think that had an effect? I do think it did
Before I stopped using Facebook I noticed, over the last decade, that almost every account I encountered was locked down similarly
My point is I suspect it is getting harder, not easier, for data thieves. The golden age of data theft has passed. Maybe.
Do banks sell this information? This bill was pulled from this ATM in Georgia by one Claudius McMoneyhands, and then deposited by one CashMoneyBusiness LLC in South Carolina three weeks later
Seems like there could still be intermediaries and a lack of what you actually bought with it at least?
https://www.nytimes.com/2023/09/22/magazine/hank-asher-data....
I'm aware that using adblockers and avoiding social media doesn't entirely prevent tracking, shadow profiles, and such, but surely it makes things more difficult for these companies, no? Or would you say that there's practically no difference between making an effort to preserve one's privacy and just giving up entirely?
In Manufacturing Consent they measured column inches in the NYT. IIRC it was something like the total column inches supporting the relevant U.S. administration's official position on a given policy vs. the inches that went against it. In any case, they were measuring column inches.
What were you measuring to come to your conclusion?
If the FTC could do anything here to make this situation better, it would be to give every person access to any data about them that gets sold.
If you were caught demoing something both horrific and internal, you would risk serious damage to your career, and ultimately it would have zero impact on the industry, as there's just too much data out there and too much money wrapped up in it.
Plus, most people working with the data don't bother to look at it. The places where I've internally demoed massive privacy risks were shocked, because they didn't realize what their own data was capable of. Most people are just writing jobs that run and shuffle data around from one place to another, never really asking "what is this data?" Even among data scientists, I'm routinely surprised (so maybe I shouldn't be) by how often they never do any real error analysis, i.e. looking at what the model got wrong and trying to understand why.
https://therecord.media/ftc-complaint-against-kochava-unseal...
Among the additional information Kochava collects and sells are non-anonymized individual home addresses, phone numbers, email addresses, gender, age, ethnicity, yearly income, “economic stability,” marital status, education level, political affiliation and “interests and behaviors,” compiling and selling dossiers on individuals marketed as offering a “360-degree perspective,” the FTC said.
...
According to the FTC, Kochava’s data can identify women who visit reproductive clinics by name and address along with, for example, when they visit particular buildings, their names, email and home addresses, number of children, race and app usage.
...
Kochava marketing materials tell customers it offers “rich geo data spanning billions of devices globally” and that its location data feed “delivers raw latitude/longitude data with volumes around 94B+ geo-transactions per month, 125 million monthly active users, and 35 million daily active users, on average observing more than 90 daily transactions per device.”
...
The complaint also alleges that the company has lax procedures for determining who it is selling data to, saying purchasers are allowed to use a generic personal email address, label an alleged company as “self” and explain they plan to use the data for “business.”
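The quoted marketing figures are at least internally consistent. Assuming a 30-day month, the daily-active-device count times the per-device rate roughly reproduces the claimed monthly volume:

$$
35 \times 10^{6}\ \text{devices} \times 90\ \frac{\text{transactions}}{\text{device} \cdot \text{day}} \times 30\ \text{days} \approx 9.45 \times 10^{10} \approx 94\text{B transactions per month}
$$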
And then there's this: https://therecord.media/data-brokers-are-selling-military-se...
I know someone who bought the address of everyone with a specific first name.
Where I live it is.
Let's say we want this dataset: credit card line items for 35-year-old dentists living on the 400 block of Elm Street in the local town
How much do I have to pay you to get it?
Never ask a salesperson how much you have to pay when prices aren't clearly stated. Tell them how much you are willing to spend and see if they will do it for that amount. Salespeople will always shoot high, hoping not to leave money on the table, and the price may move depending on how much you squeal and how high they shot. For the same reason in reverse, your initial "willing to spend" figure should be lower than what you're actually willing to spend.
Seems like the first thing to do would be to get an account with one of these data brokers. I'd imagine most of these places are "contact us for pricing" so they can play used car salesman games
Or you could ask John Oliver to do it for you and then tell all of us on one of his episodes exactly how in-depth it could get. They have the money to do this, and it seems like something right in his team's wheelhouse.
I think most people here understand that Google sells ads against that data, but they aren't selling the data.
If the headline is "Mark Zuckerberg is amassing your data and you know it's for evil", it's an easy sell. If it's "there's an ecosystem of little-known companies that sell transaction, location and lifestyle data to marketers, journalists, PIs, and police departments alike", it's not exactly the kind of a message that spurs people to action. And yeah, the newspaper that would be breaking the news is a customer too.
I do not believe that. I would like evidence before I am convinced
If my bank is releasing that data, I am horrified. I live in New Zealand and our privacy laws are clear: it would be illegal.
To add to this, any mention of "telemetry" is taken to mean your PII being taken by bad actors to abuse, instead of what it is in 99% of cases, which is usage statistics. (X% of our users use feature A, it merits investment). It can be both, but there's usually no place for differentiation, just pitchforks.
Answering to court orders isn't "ratting". You either answer court orders or go to prison.
https://discuss.privacyguides.net/t/privacy-pass-the-new-pro...
Fool me once, shame on you. Fool me 153,927,861 times, shame on me.
The place for differentiation, the place for "oh this is probably fine", the benefit of the doubt is, of course, lost.
Because someone (you? people shaped like you?) who misuse telemetry destroyed trust.
> It can be both
should instead be "it usually is both and you the user have no way to know anyway."
In many cases joining datasets is both labor intensive and creates a surprising amount of new information, and there is also plenty of "free" data that is incredibly tedious to work with.
I used to work with real estate data for the government, and if you search for common things you might want to know, you often land on a data broker's page, even though property assessor data is freely available in most counties. The problem is that each county has its own system for storing data and its own process for searching it. It's a lot of work to learn how just this one dataset works; combining it for all counties in the US is a massive project.
Whenever I buy a new home I always look up all my neighbors, figure out when they bought the house, how much they paid etc. Some people get freaked out by this, but this information is public in most counties.
By joining this data with another public dataset, you can actually figure out which lender your neighbors used and what their reported income was at the time of sale, as well as their age and ethnic background.
Of course there are plenty of other ways data brokers come across data, but even cleaning up and joining public data can require a fair bit of time and expertise.
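Once two sources agree on a key, the join itself is trivial; most of the labor described above is normalizing each county's format into that key. A minimal sketch, assuming both datasets carry a street address. The field names and the tiny suffix table are hypothetical, and real matching usually also needs unit numbers, parcel IDs, or fuzzy matching.

```python
import re

# Hypothetical, deliberately tiny suffix table; real address standardization is far larger.
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "DRIVE": "DR"}

def normalize_address(addr):
    """Upper-case, replace punctuation with spaces, collapse whitespace, abbreviate suffixes."""
    words = re.sub(r"[^\w\s]", " ", addr.upper()).split()
    return " ".join(SUFFIXES.get(w, w) for w in words)

def join_on_address(left, right):
    """Inner-join two lists of dicts on their normalized 'address' field."""
    index = {}
    for row in right:
        index.setdefault(normalize_address(row["address"]), []).append(row)
    joined = []
    for row in left:
        for match in index.get(normalize_address(row["address"]), []):
            joined.append({**row, **match})  # fields from `right` win on name clashes
    return joined
```

With this, "412 Elm Street" in one county file and "412 ELM ST." in another dataset land on the same row.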
I am a perfect example of this. Due to a bit of a quirk in how my house got its address assigned to it in 1959, we have a unique postal code. If a data broker gets access to a list of product purchases by postal code from a retailer, that's in theory somewhat anonymized. However... if they also get a list of people-postal code mappings, they have now established exactly what products my wife and I have purchased (by virtue of us being the only two people with this postal code).
Do that across multiple retailers and they've painted an incredibly vivid picture of what exactly we do with our time.
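This is the failure that k-anonymity checks are meant to catch: a quasi-identifier (here, postal code) whose group is so small that "anonymized" rows point at one household. A sketch, assuming purchases arrive as (postal_code, product) rows and a separate people-to-postal-code list; the function name, threshold, and sample codes are hypothetical.

```python
from collections import defaultdict

def small_groups(purchases, residents, max_group=2):
    """Attribute purchase rows to postal codes shared by at most max_group people.

    purchases: list of (postal_code, product) rows with no names attached.
    residents: list of (person, postal_code) mappings.
    Returns {postal_code: (frozenset_of_people, [products])} for the exposed codes.
    """
    people = defaultdict(set)
    for person, code in residents:
        people[code].add(person)
    exposed = {}
    for code, product in purchases:
        group = people.get(code)
        if group and len(group) <= max_group:
            entry = exposed.setdefault(code, (frozenset(group), []))
            entry[1].append(product)
    return exposed
```

Each additional retailer's purchase list merges into the same small group, which is how the "vivid picture" accumulates.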
It should not be surprising that they are selling your data for a profit...
I've been touting this as a business model for years. Better still, I'd like to see it done with behavioural models (in the open). That would really blow the lid off the industry. Imagine people charging companies, instead of simply being the product...
Here's some research aided by Perplexity, which estimates that the global data market is valued at about $1.7 Trillion, with data monetization growing at about 17.6% CAGR:
https://www.perplexity.ai/search/today-i-would-like-to-try-a... (138 sources)
Also, Meta can identify you based on your movement and a few pieces of social data (all of which is in the open).
Tel Aviv airport has been running behavioural monitoring for about a decade, predicting crimes before they happen.
You mention a case from 2021, which is about $5 trillion ago, and think that the government selling data is surprising. This is mature market that already knows everything about everyone, especially in the US, and is more concerned with what to do with it. The faucet is open, the ground floor is flooded, and we're discussing the different types of fish that have moved into our apartment.
It is not even certain that the data actually comes from the TSA. It could come from airlines, payment companies, etc.
There is no guarantee of quality when purchasing data from a broker.
It's kinda like how the police need warrants to request cellphone data, but cellphone companies could sell realtime data to third parties who in turn sold it to the police.
https://news.ycombinator.com/item?id=17081684
But you're proposing something even more outlandish: asking another agency for data. The politics of this are mind-bending. If one agency gives its data to another and that agency succeeds using it, it makes the giving agency look bad, which is unacceptable. It was wild how many times another, supposedly friendly, agency would not share data. In fact, I was cautioned not to even bring up the idea in shared meetings because it would create unnecessary friction.
If you buy it from a 3rd party government contractor, none of this has to happen.
Flight purchases would be critical and distinct information for law enforcement.
okay...
>zero cost surveillance for the big brother
How is it "free" if they are the ones funding the data brokers?
>>"Movement unrestricted by governments is a hallmark of a free society. "
The other half of the lede is that this govt is using Insert_Method of restricting the movements of its residents.
At this point, any persecuted activity, e.g., obtaining reproductive healthcare with a link to a person in a Red State, requires opsec procedures comparable to a CIA dark op just to not get persecuted.
To do it properly, not only would you have to change all your logins and email accounts, but simultaneously start using a new computer and phone. Also, move home.
In other words: very hard to achieve. But I wonder if there is a set of achievable actions one can take that gets you to 'very good privacy'?
But here, the controller of the data is the airline, the transfer to the data broker might be illegal, and an airline is the worst company to commit GDPR violations with: They have a lot of global revenue but a relatively thin margin, very little of that margin comes from data abuse (so they can't just shrug off the GDPR fine as a small cost of doing shady business), and they are reachable in the EU (worst case a member state can ground and confiscate their planes, and essentially ban them from flying to the EU by threatening to confiscate any other plane that lands). And yes, Germany will impound a plane to get debts paid: https://www.reuters.com/article/world/thai-prince-to-pay-bon...
The barcode on the boarding pass contains all the information that airlines know about you [1]. It is, after all, only encoded, not encrypted, and lots of companies manufacture readers for it.
Airport check-in systems can read it, as can the baggage handling system, the duty-free shop, the airport lounge, and so on.
There are so many different players with access to most or all of the data that it would be hard to prove it came from any one source at all.
And that is just the barcode on the boarding pass; passport scanners cost a couple of hundred dollars, and airport shops and car rentals use them all the time.
Many airports use facial scanning these days and don’t even ask for a boarding pass/passport/visa during boarding at all.
There are also auxiliary sources, like Uber bookings, that could be used in conjunction with the others.
[1] https://krebsonsecurity.com/2015/10/whats-in-a-boarding-pass...
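To make "encoded, not encrypted" concrete: the mandatory section of an IATA Bar Coded Boarding Pass (BCBP) string is a fixed-width, 60-character text record, so anyone with a scanner can split it into fields by position. The widths below are the mandatory fields as I understand the BCBP standard (verify against the spec); the sample payload is made up, and conditional and airline-specific fields after the first 60 characters are ignored.

```python
# (field name, width) for the mandatory, fixed-length part of an M-format BCBP string.
BCBP_MANDATORY = [
    ("format_code", 1), ("legs", 1), ("passenger_name", 20),
    ("eticket_indicator", 1), ("pnr", 7), ("from_airport", 3),
    ("to_airport", 3), ("carrier", 3), ("flight_number", 5),
    ("julian_date", 3), ("compartment", 1), ("seat", 4),
    ("checkin_sequence", 5), ("passenger_status", 1), ("variable_size_hex", 2),
]
BCBP_MANDATORY_LEN = sum(width for _, width in BCBP_MANDATORY)  # 60

def parse_bcbp(payload):
    """Split the mandatory section of a BCBP payload into named, space-stripped fields."""
    if len(payload) < BCBP_MANDATORY_LEN:
        raise ValueError("payload shorter than the mandatory section")
    fields, pos = {}, 0
    for name, width in BCBP_MANDATORY:
        fields[name] = payload[pos:pos + width].strip()
        pos += width
    return fields
```

Usage: a made-up pass for DOE/JANE on flight 0123 from SFO to JFK, seat 012A, day-of-year 195, parses straight back into those fields with nothing more than string slicing.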
You'll get a response from their legal counsel requesting some information for them to verify your request.
Frankly, Clear and TSA-Pre makes my life so much easier and since I don’t commit crimes I’m not very worried… just a little worried.
I hate the excuse "since I don't commit crimes". It's not about that. If they want your info that you're not directly giving them, they can get a warrant.
What if it affects your ability to get work? Have you ever made or viewed any posts that could be considered political or made comments on a political post? What agenda do you support with those actions?
Terms of service are meaningless if they keep the extent as secret as possible. Facebook has demonstrably shown this and as shocking as it is they are restrained compared to lots of companies.
Especially when you can outsource the full evil to a wholly owned subsidiary for plausible deniability.
And if private corps know something, many foreign governments know all of it.
But hey, it makes Silicon Valley money.