Performance and telemetry analysis of Trae IDE, ByteDance's VSCode fork
Here are the key findings:
1. Extreme Resource Consumption: Out of the box, Trae used 6.3x more RAM (~5.7 GB) and spawned 3.7x more processes (33 total) than a standard VSCode setup with the same project open. The team has since made improvements, but it's still significantly heavier.
2. Telemetry Opt-Out Doesn't Work (It Makes It Worse): I found Trae was constantly sending data to ByteDance servers (byteoversea.com). I went into the settings and disabled all telemetry. To my surprise, this didn't stop the traffic. In fact, it increased the frequency of batch data collection. The telemetry "off" switch appears to be purely cosmetic.
3. What's Being Sent: Even with telemetry "disabled," Trae sends detailed payloads including: hardware specs (CPU, memory, etc.); persistent user, device, and machine IDs; OS version, app language, and user name; and granular usage data like time-on-ide, window focus state, and active file types. (A purely hypothetical sketch of such a payload follows this list.)
4. Community Censorship: When I tried to discuss these findings on their official Discord, my posts were deleted and my account was muted for 7 days. It seems words like "track" trigger an automated gag rule, which prevents any real discussion about privacy.
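To make finding 3 concrete, here is a purely hypothetical sketch of what one batch event carrying those categories of data could look like. Every field name and value below is invented for illustration and is not Trae's actual schema; only the categories come from the captured payloads described above.

```python
# Hypothetical illustration only: field names and values are invented to
# show the *categories* of data in finding 3, not Trae's real schema.
example_batch_event = {
    "machine_id": "mid-7f3a...",        # persistent machine ID
    "device_id": "did-09c4...",         # persistent device ID
    "user_id": "uid-102938",            # persistent user ID
    "os_version": "Windows 11 22631",   # OS version
    "app_language": "en-US",
    "user_name": "alice",               # local account name
    "hardware": {"cpu_cores": 8, "memory_gb": 32},
    "usage": {
        "time_on_ide_s": 1840,          # cumulative time-on-ide
        "window_focused": True,         # window focus state
        "active_file_types": [".ts", ".json"],
    },
}
```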
I believe developers should be aware of this behavior. The combination of resource drain, non-functional privacy settings, and censorship of technical feedback is a major red flag. The full, detailed analysis with all the evidence (process lists, Fiddler captures, JSON payloads, and screenshots of the Discord moderation) is available at the link. Happy to answer any questions.
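For anyone who wants to reproduce the process-count and RAM numbers from finding 1, a minimal sketch using psutil; the process-name needle ("trae") is an assumption and may need adjusting per platform:

```python
import psutil

NEEDLE = "trae"  # assumption: adjust to the actual process name on your OS

# Collect every running process whose name contains the needle.
procs = [p for p in psutil.process_iter(["name", "memory_info"])
         if NEEDLE in (p.info["name"] or "").lower()]

# Sum resident set size across the matching processes.
total_rss = sum(p.info["memory_info"].rss
                for p in procs if p.info["memory_info"])
print(f"{len(procs)} processes, {total_rss / 2**30:.2f} GiB resident")
```

Note that summing RSS double-counts memory shared across an Electron process tree, so treat the total as an upper bound.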
https://theia-ide.org/
It was rough a few years ago, but nowadays it's pretty nice. TI rebuilt their Code Composer Studio on Theia, so it does have some big-name users. It has LSP support and the same Monaco editor backend - which is all I need.
It's VSCode with an Eclipse feel to it - which might or might not be your cup of tea, but it's an alternative.
* https://code.visualstudio.com/Docs/languages/markdown#_inser...
Also, it used to be kinda heavy, but it has become lighter thanks to Moore's law and good code-management practices across the board.
I'm planning to deploy Theia in its web-based form if possible, but I still haven't had the time to tinker with that one.
https://marketplace.visualstudio.com/items?itemName=redhat.j...
Unique identifiers: machine ID, user ID, device fingerprints
Workspace details: project information, file paths (obfuscated)
Plus OS details.
I'd rather none.
I was interested in learning Dart until the installer told me Google would be collecting telemetry. For a programming language. I’ve never looked at it again.
I suspect they aren't actually doing that, but the GDPR cares not about what you're doing with the data but about what is possible with it, which is why any identifier (even an "obfuscated" one) that could lead back to a user is considered PII.
This is not because it is better, and I've seen no indication that it would somehow be more private or secure, but because most enterprises already share their proprietary data with AWS and have an agreement with AWS that their TAMs will gladly usher Kiro usage under.
Interesting distinction: privacy/security as it relates to individuals is taken at face value, while as it relates to corporations it is taken at disclosure value.
Your analysis is thorough, and I wonder if their reduction of processes from 33 to 20 (wow!) had anything to do with moving telemetry logic elsewhere (hence the increased endpoint activity).
What does ByteDance say about all this?
Dang said a similarly small minority of users here do all the commenting.
https://old.reddit.com/r/slatestarcodex/comments/9rvroo/most...
Every single person has "something to hide", and that's normal. It's normal to not want your messages snooped through. It doesn't mean you're a criminal, or even computer-savvy.
Like the Casio watches, travelling to Syria, using Tor, Protonmail, etc…
When it is better in reality to have a regular watch, a Gmail with encrypted .zip files or whatever, etc.
It does not mean you are a criminal if you have that Casio watch, but if you have one, plus encrypted emails, plus travel to certain countries as a tourist, you are almost certain to get yourself in trouble for nothing while trying to protect yourself.
And if you are a criminal, you will get yourself in trouble too, also for nothing, while trying to protect yourself.
This was the basis of XKeyscore, and all of that is to say that Signal is one very good signal that a person may be interesting.
2. Using a secure, but niche, service is still more secure than using a service with no privacy.
Sure, you can argue using Signal puts a "target" on your back. But there's nothing to target, right? Because it's not being run by Google or Meta. What are they gonna take? There's no data to leak about you.
If I were a criminal, which I'm not, I'd rather rob a bank with an actual gun than with a squirt gun. Even though having an actual gun puts a bigger target on your back. Because the actual gun works - the squirt gun is just kinda... useless.
Actually, there was a case... I can't recall but it might have been in Argentina, where the robbers did explicitly use fake guns when robbing the banks because doing so still actually worked for the purposes of the robbery, and it also reduced their legal liability.
I'm trying Zed too, which I believe, as a commercial product, also comes with telemetry... but yeah, learning the advanced rules of a personal firewall is always helpful!
1. Try using Pi-hole to block those particular endpoints by making DNS resolution fail; see if the app still works when it can't reach the telemetry endpoints (a quick resolution check is sketched after this list).
2. Their ridiculous tracking, their disregard of the user's preference not to send telemetry, and their behavior on the Discord when you mentioned tracking say everything you need to know about the company. You cannot change them. If you don't want to be tracked, then stay away from ByteDance.
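For point 1, a quick way to verify that the block took effect is to resolve the endpoints through the system resolver, i.e. whatever your Pi-hole answers. A minimal sketch; byteoversea.com comes from the original post, and any other hostnames would be whatever shows up in your own capture:

```python
import socket

SINKHOLE = {"0.0.0.0", "::"}  # Pi-hole's default null answers

# byteoversea.com is from the original post; add whatever other
# hostnames appear in your own Fiddler/pcap captures.
for host in ["byteoversea.com"]:
    try:
        addrs = {info[4][0] for info in socket.getaddrinfo(host, 443)}
    except socket.gaierror:
        print(f"{host}: does not resolve (blocked via NXDOMAIN)")
        continue
    if addrs <= SINKHOLE:
        print(f"{host}: sinkholed to {addrs} (blocked)")
    else:
        print(f"{host}: resolves to {addrs} (NOT blocked)")
```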
The SNI extension is sent unencrypted as part of the ClientHello (the first part of the TLS handshake). Any router along the way can see the hostname that the client provides in the SNI data, and can drop the packet if it so chooses.
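To make that concrete, here is a minimal sketch in Python (no TLS library, just the byte offsets from RFC 8446's ClientHello and RFC 6066's server_name extension; the function name is mine) showing how anything on-path can read the hostname out of the first packet of a connection:

```python
import struct

def extract_sni(record: bytes) -> str | None:
    """Pull the SNI hostname out of a raw TLS ClientHello record.

    Offsets follow RFC 8446 (ClientHello) and RFC 6066 (server_name).
    Returns None if the bytes aren't a ClientHello or carry no SNI.
    """
    # TLS record header: type(1) version(2) length(2); 0x16 = handshake
    if len(record) < 5 or record[0] != 0x16:
        return None
    pos = 5
    if record[pos] != 0x01:        # handshake type 0x01 = ClientHello
        return None
    pos += 4                       # handshake type(1) + length(3)
    pos += 2 + 32                  # legacy_version + random
    pos += 1 + record[pos]         # session_id (length-prefixed)
    (n,) = struct.unpack_from("!H", record, pos)
    pos += 2 + n                   # cipher_suites
    pos += 1 + record[pos]         # compression_methods
    (ext_total,) = struct.unpack_from("!H", record, pos)
    pos += 2
    end = pos + ext_total
    while pos + 4 <= end:          # walk the extension list
        ext_type, ext_len = struct.unpack_from("!HH", record, pos)
        pos += 4
        if ext_type == 0x0000:     # server_name extension
            # data layout: list_len(2) name_type(1) name_len(2) hostname
            (name_len,) = struct.unpack_from("!H", record, pos + 3)
            return record[pos + 5 : pos + 5 + name_len].decode("ascii")
        pos += ext_len
    return None
```

No decryption is involved, since this happens before any keys are negotiated; Encrypted Client Hello (ECH) is the in-progress extension meant to close this gap.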
On Apple devices, first-party applications get to circumvent LittleSnitch-like filtering. Presumably harder to hide this kind of activity on Linux, but then you need to have the expertise to be aware of the gaps. Docker still punches through your firewall configuration.
In fact, most web browsers now use DoH, so a Pi-hole is useless in that regard.
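To see why: a DoH lookup is just an HTTPS request to the resolver, so it never touches the DNS server your DHCP hands out. A minimal sketch against Cloudflare's public DoH JSON endpoint (the third-party requests library is assumed):

```python
import requests

# The query rides inside ordinary HTTPS to cloudflare-dns.com, so a
# LAN Pi-hole never sees the lookup and cannot block or log it.
resp = requests.get(
    "https://cloudflare-dns.com/dns-query",
    params={"name": "byteoversea.com", "type": "A"},
    headers={"accept": "application/dns-json"},
    timeout=5,
)
print(resp.json().get("Answer"))
```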
Although there are caveats: if an app decides to use its own DNS server, sometimes with secure DNS, you are still out of luck. I just recently discovered that Android WebView may bypass whatever DNS server your Wi-Fi points to.
For what it's worth, I do use Google products personally. But I won't go near Facebook, WhatsApp, or Instagram.
https://github.com/grafana/tempo/discussions/5001#discussion...
(Yes, that's for Grafana Tempo, but the issue in `grafana/grafana` was just marked as a duplicate of this one.)
In this case, the software being analyzed is the alternative that sucks.
Unless you're somehow saying telemetry doesn't report anything about what a user is doing to its home server.
In fact, Chinese entities are even less likely to share your secrets with your government than their best friends at Google.
Even if we take your rhetoric[1] at face value, there is a big difference between data going to your own elected government and data going to a foreign adversary's.
[1] https://en.wikipedia.org/wiki/Whataboutism
But a foreign government is limited in what it can do to you if you are not a very high-value target.
So I try as much as possible to use software and services from a non-friendly government because this is the highest guarantee that my data will not be used against me in the future.
And since we can all agree that any data that is collected will end up with the government one way or another, using foreign software is the only real guarantee.
Unless the software is open source and its server is self-hosted, it should be considered spyware.
It should be a crime for Google as well.
"Whataboutism" is a logical fallacy.
https://en.wikipedia.org/wiki/Whataboutism
"What about Google" is not a logical continuation of this discussion
Apple provides telemetry services that strip the IP address before providing the data to app owners. Routing like this requires trust (just as a VPN does), but it's feasible.
Why is it relevant whether they provide it to app owners directly? The issue people have is that the information is logged now and abused later, in whatever form.
So many US universities run such nodes without ever getting into legal trouble. Such lucky boys.
Distinguishing factors from your example include:
1. PII is actually encoded and handled by computer systems, not the mere capability for that to occur.
2. PII is actually sent off site, not merely able to be sent off site.
3. It doesn't assert that the PII is collected (which could imply storage); it merely asserts that it is sent, as my original post does. We don't know whether or not it is stored after being received and processed.
Try talking to your users instead.
> The more users software has, the fewer skills they have on average to accurately report any issues.
No amount of telemetry will solve that.
Any recommendations?
This seems like an easy win for a software project
JetBrains products. Can work fully offline and they don't send "telemetry" if you're a paying user: https://www.jetbrains.com/help/clion/settings-usage-statisti...
Because no company other than Microsoft was willing to spend enough money to reach critical mass. VSCode captured the dominant share of practically every language it supported within 12-18 months of introduction.
This then allowed things like the Language Server Protocol, which only exists because Microsoft reached critical mass and could cram it down everybody's throat.
Isn't that what VS Codium is for?
Either way, it uses Electron, which I hate so much.
Sad to hear that. I really enjoyed VS Codium before I jumped full-time into Nova.
(Unsolicited plug: If you're looking for a Mac-native IDE, and your needs aren't too out-of-the-ordinary, Nova is worth a try. If nothing else, it's almost as fast as a TUI, and the price is fair.)
What other software packages have 200-year-old jokes about them?
Microsoft is content to fund it; the price is your telemetry (for now).
For high-quality development tools I use true FOSS, or I pay for my tools so I'm not left wondering where the value is being extracted.
The price of VSCode is the halo effect for Azure products.
Specifically: the remote code extension, the C/C++ extension and the Python extension.
- popup context windows for docs (kind of there, but having to respect the default character grid makes them much less capable and usually they don't allow further interaction)
- contextual buttons on a line of code (sure, custom commands exist, but they're not discoverable)
- "minimap"
The "minimap" is the only one here that isn't native. You can also have the file tree on the left if you want. Most people tend to use NerdTree[4], but like with a lot of plugins, there's builtins that are just as good. Here's the help page for netrw[5], vim's native File Explorer
Btw, this all works in vim. No need for neovim for any of this stuff. Except for the debugger, this stuff has been here for quite some time, and the debugger has been around as a plugin for a while too. All of this has been here since I started using vim, which was over a decade ago (maybe balloons didn't have as good of an interface? Idk, it's been a while).
[0] https://vimdoc.sourceforge.net/htmldoc/debugger.html
[1] https://vimdoc.sourceforge.net/htmldoc/options.html#'balloon...
[2] https://github.com/wfxr/minimap.vim
[3] https://github.com/preservim/tagbar
[4] https://github.com/preservim/nerdtree
[5] https://vimhelp.org/pi_netrw.txt.html#netrw
And they are not interactive, as far as I know. I've not seen a way to open a balloon on a type inside another balloon, and then go to the browser docs from a link in that.
> Do you mean something like this?
Yes, but that's still restricted to terminal characters (you could probably do something fancy with sixel, but still) - for larger files with big indents it's not useful anymore.
> contextual buttons on a line of code
For example, options to refactor based on the current location. I could construct this manually from three different pieces, but in other IDEs this exists already integrated and configured by default. Basically: where are the "extract this as a named constant", "rename this type across the project", and other actions that I don't have to implement from scratch?
Second, sure, I refactor all the time. There are three methods I know. The best is probably bufdo, with all the files open in buffers (tabs, windows, or panes are not required). But I'm not sure why this is surprising. Maybe you don't know what ctags are? If not, they are what makes all of that possible, and I'd check them out because I think they will answer a lot of your questions.
Correct me if I'm wrong, but you are asking about "search and replace", right? I really do recommend reading about ctags; I think these two docs will give you answers to a lot more than just this question[0,1]. Hell, there's even The Primeagen's refactoring plugin[2] in case you want to do it another way that's not vim-native.

But honestly, I really can't tell if you're just curious or trying to defend your earlier position. If you're curious and want to learn more, we can totally continue, and I'm sure others would love to add more. In that case I would avoid language like "vim doesn't" and instead phrase it as "can vim ___?", "how would I do ____ in vim?", or "I find ___ useful in VS Code, how do people do this in vim?" Any of those will get the same result without being aggressive. But if you're just trying to defend your position, well... Sun Tzu said you should know your enemy, and I don't think you know your enemy.
[0] https://vim.fandom.com/wiki/Browsing_programs_with_tags
[1] https://vim.fandom.com/wiki/Search_and_replace_in_multiple_b...
[2] https://github.com/ThePrimeagen/refactoring.nvim
Popup context windows for docs are super good in neovim; I would bet that they are actually better than what you find in IDEs, because they can use treesitter for automatic syntax highlighting of example code. Not sure what you mean by further interaction.
Contextual buttons are called "code actions" and are available, and there are like 4 minimap plugins to choose from.
How do I get a memory graph with custom event markers overlaid on it, then? That's the default in VS, for example.
vi? Good luck with that.
And I say that as an experienced vim user who used to tinker a bit.
Hell, I'd even feel comfortable in a plain vi terminal, though that's extremely rare to actually find; usually vi is just remapped to vim.
Edit:
The git folder with *all* my dotfiles (which includes all my notes) is just 3M, so I can take it anywhere. If I install all the vim plugins I currently have (some of which are old and unused), the total is ~100M. So...
Opening IDEA after those three days was the same kind of feeling I imagine you’d get when you take off a too tight pair of shoes you’ve been trying to run a marathon in.
ymmv, of course, but for $dayjob I can't even be arsed trying anything else at this point; it's so ingrained I doubt it'll be worth the effort of switching.
Can you please expand on that? I have trouble understanding how telemetry helps me, as a user of the product, understand how the product works.
Or (if you're working lower level) you can see an obfuscated function is emitting telemetry, saying "User did X", then you can understand that the function is doing X.
That's not helping me, the user.
That's helping me, the developer.
> Or (if you're working lower level) you can see an obfuscated function is emitting telemetry, saying "User did X", then you can understand that the function is doing X.
Again, it helps me, the developer.
Neither of these help me, the user.
(or maybe you just have a similar writing style)
From what I've seen, people generally do not like reading generated content, but every time I've seen the author come back and say "I used it because English isn't my main language," the community takes back the criticism. So I'd just be upfront about it and get ahead of it.
Even if you don't agree with it, publishing AI-generated content will exclude from one's audience the people who won't read AI-generated content. It is a tradeoff one has to decide whether or not to make.
I'm sympathetic to someone who has to decide whether to publish in 'broken english' or to run it through the latest in grammar software. For my time, I far prefer the former (and have been consuming "broken english" for a long while, it's one of the beautiful things about the internet!)
It's clear that this isn't what OP was doing. The LLM was writing, not merely translating. dang put it well:
> we want people to speak in their own voice
https://news.ycombinator.com/item?id=44704054
Devil's advocate: why does it matter (apart from "it feels wrong")? As long as the conclusions are sound, why is it relevant whether AI helped with the writing of the report?
The last few sections could have been cut entirely and nothing would have been lost.
Edit: In the process of writing this comment, the author removed two sections (and added an LLM acknowledgement), which I referred to in my previous statement. To the author: thank you for reducing the verbosity.
We've been reading highly informative articles in "bad English" for decades. It's okay and good to write in English without perfect mastery of the language. I'd rather read the source than the output of a txt2txt model.
* edit -- I want to clarify: I don't mean to imply that the author has ill will or intent to misinform. Rather, I intend to describe the pitfalls of using an LLM to adapt one's text, inadvertently adding a very strong flavor of spam to something that is not spam.
But I also think it's a different thing entirely. Being the sole reader of text produced by your students (with a responsibility to read it) is different from being someone on the internet choosing what to read.
simple as
Pretty much everyone has heuristics for content that feels like low quality garbage, and currently seeing the hallmarks of AI seems like a mostly reasonable one. Other heuristics are content filled with marketing speak, tons of typos, whatever.
I can't decide to read something because the conclusions are sound. I have to read the entire thing to find out if the conclusions are sound. What's more, if it's an LLM, it's going to try its gradient-following best to make unsound reasoning seem sound. I have to be an expert to tell that it is a moron.
I can't put that kind of work into every piece of worthless slop on the internet. If an LLM says something interesting, I'm sure a human will tell me about it.
The reason people are smelling LLMs everywhere is because LLMs are low-signal, high-effort. The disappointment one feels when a model starts going off the rails is conditioning people to detect and be repulsed by even the slightest whiff of a robotic word choice.
edit: I feel like we discovered the direction in which AGI lies but we don't have the math to make it converge, so every AI we make goes completely insane after being asked three to five questions. So we've created architectures where models keep copious notes about what they're doing, and we carefully watch them to see if they've gone insane yet. When they inevitably do, we quickly kill them, create a new one from scratch, and feed it the notes the old one left. AI slop reads like a dozen cycles of that. A group effort, created by a series of new hires, silently killed after a single interaction with the work.
TL;DR: Because of the bullshit asymmetry principle. Maybe the conclusions below are sound, have a read and try to wade through ;-)
Let us address the underlying assumptions and implications in the argument that the provenance of a report, specifically whether it was written with the assistance of AI, should not matter as long as the conclusions are sound.
This position, while intuitively appealing in its focus on the end result, overlooks several important dimensions of communication, trust, and epistemic responsibility. The process by which information is generated is not merely a trivial detail, it is a critical component of how that information is evaluated, contextualized, and ultimately trusted by its audience. The notion that it feels wrong is not simply a matter of subjective discomfort, but often reflects deeper concerns about transparency, accountability, and the potential for subtle biases or errors introduced by automated systems.
In academic, journalistic, and technical contexts, the methodology is often as important as the findings themselves. If a report is generated or heavily assisted by AI, it may inherit certain limitations, such as a lack of domain-specific nuance, the potential for hallucinated facts, or the unintentional propagation of biases present in the training data. Disclosing the use of AI is not about stigmatizing the tool, but about providing the audience with the necessary context to critically assess the reliability and limitations of the information presented. This is especially pertinent in environments where accuracy and trust are paramount, and where the audience may need to know whether to apply additional scrutiny or verification.
Transparency about the use of AI is a matter of intellectual honesty and respect for the audience. When readers are aware of the tools and processes behind a piece of writing, they are better equipped to interpret its strengths and weaknesses. Concealing or omitting this information, even unintentionally, can erode trust if it is later discovered, leading to skepticism not just about the specific report, but about the integrity of the author or institution as a whole.
This is not a hypothetical concern; there are numerous documented cases (e.g. in legal filings: https://www.damiencharlotin.com/hallucinations/) where lack of disclosure about AI involvement has led to public backlash or diminished credibility. Thus, the call for transparency is not a pedantic demand, but a practical safeguard for maintaining trust in an era where the boundaries between human and machine-generated content are increasingly blurred.