Performance and Telemetry Analysis of Trae IDE, ByteDance's VSCode Fork
Here are the key findings:
1. Extreme Resource Consumption: Out of the box, Trae used 6.3x more RAM (~5.7 GB) and spawned 3.7x more processes (33 total) than a standard VSCode setup with the same project open. The team has since made improvements, but it's still significantly heavier.
2. Telemetry Opt-Out Doesn't Work (It Makes It Worse): I found Trae was constantly sending data to ByteDance servers (byteoversea.com). I went into the settings and disabled all telemetry. To my surprise, this didn't stop the traffic. In fact, it increased the frequency of batch data collection. The telemetry "off" switch appears to be purely cosmetic.
3. What's Being Sent: Even with telemetry "disabled," Trae sends detailed payloads including (an illustrative placeholder example is sketched just after this list):
- Hardware specs (CPU, memory, etc.)
- Persistent user, device, and machine IDs
- OS version, app language, user name
- Granular usage data like time-on-IDE, window focus state, and active file types
4. Community Censorship: When I tried to discuss these findings on their official Discord, my posts were deleted and my account was muted for 7 days. It seems words like "track" trigger an automated gag rule, which prevents any real discussion about privacy.
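For a sense of the shape, here is a purely illustrative sketch of one such batch event. The field names simply paraphrase the categories above and every value is a placeholder rather than a literal capture; the real JSON payloads are in the linked analysis.

```python
import json

# Purely illustrative placeholder - NOT a captured payload.
# Field names paraphrase the categories listed above; all values are made up.
example_event = {
    "user_id": "<persistent-user-id>",
    "device_id": "<persistent-device-id>",
    "machine_id": "<persistent-machine-id>",
    "os_version": "<os-version>",
    "app_language": "<app-language>",
    "user_name": "<local-user-name>",
    "hardware": {"cpu": "<cpu-model>", "memory_mb": 0},
    "usage": {
        "time_on_ide_seconds": 0,
        "window_focused": True,
        "active_file_types": ["<extension>"],
    },
}

print(json.dumps(example_event, indent=2))
```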
I believe developers should be aware of this behavior. The combination of resource drain, non-functional privacy settings, and censorship of technical feedback is a major red flag. The full, detailed analysis with all the evidence (process lists, Fiddler captures, JSON payloads, and screenshots of the Discord moderation) is available at the link. Happy to answer any questions.
https://theia-ide.org/
It was rough a few years ago, but nowadays it's pretty nice. TI rebuilt their Code Composer Studio on Theia, so it does have some larger users. It has LSP support and the same Monaco editor backend - which is all I need.
It's VSCode with an Eclipse feel to it - which might or might not be your cup of tea, but it's an alternative.
It also used to be kinda heavy, but it has gotten lighter thanks to Moore's law and good code management practices across the board.
I'm planning to deploy Theia in its web-based form if possible, but I haven't had the time to tinker with that yet.
This is not because it is better, and I've seen no indication that it would somehow be more private or secure, but most enterprises already share their proprietary data with AWS and have an agreement that their TAMs will gladly usher Kiro usage under.
Interesting distinction: privacy/security as it relates to individuals is taken at face value, while as it relates to corporations it is taken at disclosure value.
Your analysis is thorough, and I wonder if their reduction of processes from 33 to 20...(WOW) had anything to do with moving telemetry logic elsewhere (hence increased endpoint activity).
What does ByteDance say regarding all this?
I'm trying Zed too, which I believe, as a commercial product, comes with telemetry as well... but yeah, learning the advanced rules of a personal firewall is always helpful!
Dang said a similarly small minority of users here do all the commenting.
Every single person has "something to hide", and that's normal. It's normal to not want your messages snooped through. It doesn't mean you're a criminal, or even computer-savvy.
Like the Casio watches, travelling to Syria, using Tor, Protonmail, etc…
When in reality it is better to have a regular watch, a Gmail account with encrypted .zip files or whatever, etc.
It does not mean you are a criminal if you have that Casio watch, but if you have this, plus encrypted emails, plus travel to certain countries as a tourist, you are almost certain to get yourself in trouble for nothing, while trying to protect yourself.
And if you are a criminal, you will get yourself in trouble too, also for nothing, while trying to protect yourself.
This was the basis of XKeyscore, and all of that to say that Signal is one very good signal that the person may be interesting.
1. Try using Pi-hole to block those particular endpoints by making DNS resolution fail; see if it still works when it can't reach the telemetry endpoints (a quick check is sketched after this list).
2. Their ridiculous tracking, disregard of the user's preference to not send telemetry, and behavior on the Discord when you mentioned tracking say everything you need to know about the company. You cannot change them. If you don't want to be tracked, then stay away from ByteDance.
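A minimal sketch of that check, in Python, assuming byteoversea.com (the domain named in the write-up) is the endpoint being blocked; add any other hostnames from your own Fiddler or Pi-hole logs. It just asks the system resolver and reports whether the block is actually in effect:

```python
import socket

# Hostnames to test. byteoversea.com is the domain named in the analysis;
# extend this list with anything else seen in your own traffic captures.
TELEMETRY_HOSTS = ["byteoversea.com"]

for host in TELEMETRY_HOSTS:
    try:
        addr = socket.gethostbyname(host)
        print(f"{host} still resolves to {addr} - block NOT in effect")
    except socket.gaierror:
        print(f"{host} does not resolve - DNS block is working")
```

If the names no longer resolve but the traffic continues, the app is getting its answers some other way than your configured DNS.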
For what it's worth, I do use Google products personally. But I won't go near Facebook, WhatsApp, or Instagram.
On Apple devices, first-party applications get to circumvent LittleSnitch-like filtering. Presumably harder to hide this kind of activity on Linux, but then you need to have the expertise to be aware of the gaps. Docker still punches through your firewall configuration.
In fact, most web browsers use DoH these days, so Pi-hole is useless in that regard.
Although there are caveats -- if an app decides to use its own DNS server, sometimes over secure DNS, you are still out of luck. I just recently discovered that Android WebView may bypass whatever DNS your Wi-Fi points to.
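A small sketch of that bypass, assuming byteoversea.com as the example hostname from this thread: even if the system resolver (and therefore your Pi-hole) refuses to answer, an app can ask a public DoH endpoint directly. This uses Cloudflare's DNS-over-HTTPS JSON API purely as an illustration.

```python
import json
import socket
import urllib.request

host = "byteoversea.com"  # example hostname from the thread

# 1) What the system resolver (your Pi-hole, if that's what the OS uses) says:
try:
    print("system resolver:", socket.gethostbyname(host))
except socket.gaierror:
    print("system resolver: blocked / no answer")

# 2) What an app gets by going straight to a public DoH endpoint,
#    ignoring whatever DNS server the network hands out:
req = urllib.request.Request(
    f"https://cloudflare-dns.com/dns-query?name={host}&type=A",
    headers={"accept": "application/dns-json"},
)
with urllib.request.urlopen(req) as resp:
    answers = json.load(resp).get("Answer", [])
print("DoH resolver:", [a.get("data") for a in answers] or "no answer")
```

Blocking that kind of lookup means filtering at the network layer (e.g. dropping traffic to known DoH resolvers), not just controlling which DNS server the network advertises.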
In this case, the software being analyzed is the alternative that sucks.
Try talking to your users instead.
> The more users software has, the less skill they have on average to accurately report any issues.
No amount of telemetry will solve that.
Unless you're somehow saying telemetry doesn't report anything about what a user is doing to its home server.
In fact, the Chinese entities are even less likely to share your secrets with your government than their best friends at Google.
"What about Google" is not a logical continuation of this discussion
Even if we engage with your rhetoric[1] at face value, there is a big difference between data going to your own elected government versus that of a foreign adversary.
[1] https://en.wikipedia.org/wiki/Whataboutism
It should be a crime for Google as well.
"Whataboutism" is a logical fallacy.
https://en.wikipedia.org/wiki/Whataboutism
Apple provides telemetry services that strip the IP before providing it to the app owners. Routing like this requires trust (just as a VPN does), but it's feasible.
So many US universities run such nodes without ever getting into legal trouble. Such lucky boys.
Any recommendations?
This seems like an easy win for a software project
Microsoft is content to fund it; the price is your telemetry (for now).
For high-quality development tools I use true FOSS, or I pay for my tools so I'm not left wondering where the value is being extracted.
The price of VSCode is the halo effect for Azure products.
JetBrains products. Can work fully offline and they don't send "telemetry" if you're a paying user: https://www.jetbrains.com/help/clion/settings-usage-statisti...
Isn't that what VSCodium is for?
Either way it uses Electron, which I hate so much.
Sad to hear that. I really enjoyed VSCodium before I jumped full-time into Nova.
(Unsolicited plug: If you're looking for a Mac-native IDE, and your needs aren't too out-of-the-ordinary, Nova is worth a try. If nothing else, it's almost as fast as a TUI, and the price is fair.)
What other software packages have 200-year-old jokes about them?
- popup context windows for docs (kind of there, but having to respect the default character grid makes them much less capable and usually they don't allow further interaction)
- contextual buttons on a line of code (sure, custom commands exist, but they're not discoverable)
- "minimap"
Popup context windows for docs are super good in neovim; I would bet that they are actually better than what you find in IDEs, because they can use treesitter for automatic syntax highlighting of example code. Not sure what you mean by further interaction.
Contextual buttons are called code actions, and they are available, and there are like 4 minimap plugins to choose from.
The "minimap" is the only one here that isn't native. You can also have the file tree on the left if you want. Most people tend to use NerdTree[4], but like with a lot of plugins, there's builtins that are just as good. Here's the help page for netrw[5], vim's native File Explorer
Btw, this all works in vim. No need for neovim for any of this stuff. Except for the debugger, this stuff has been here for quite some time. The debugger has been around as a plugin for awhile too. All this stuff has been here since I started using vim, which was over a decade ago (maybe balloons didn't have as good of an interface? Idk, it's been awhile)
[0] https://vimdoc.sourceforge.net/htmldoc/debugger.html
[1] https://vimdoc.sourceforge.net/htmldoc/options.html#'balloon...
[2] https://github.com/wfxr/minimap.vim
[3] https://github.com/preservim/tagbar
[4] https://github.com/preservim/nerdtree
[5] https://vimhelp.org/pi_netrw.txt.html#netrw
vi? Good luck with that.
And I say that as an experienced vim user who used to tinker a bit.
Hell, I will even feel comfortable in a plain vi session, though that's extremely rare to actually find. Usually vi is just aliased to vim.
Edit:
The git folder with *all* my dotfiles (which includes all my notes) is just 3M, so I can take it anywhere. If I install all the vim plugins I currently have (some of which are old and unused), the total is ~100M. So...
Also, if you’d like to hide the fact that you’re from Poland or speak Polish, you should be more careful about what’s visible on your screenshots :)
Opening IDEA after those three days was the same kind of feeling I imagine you’d get when you take off a too tight pair of shoes you’ve been trying to run a marathon in.
ymmv, of course, but for $dayjob I can’t even be arsed trying anything else at this point, it’s so ingrained I doubt it’ll be worth the effort switching.
Can you please expand on that? I have trouble understanding how telemetry helps me, as a user of the product, understand how the product works.
Or (if you're working at a lower level) you can see that an obfuscated function is emitting telemetry saying "User did X", and then you understand that the function is doing X.
(or maybe you just have a similar writing style)
From what I've seen, people generally do not like reading generated content, but every time I've seen the author come back and say "I used it because it isn't my main language" the community always takes back the criticism. So I'd just be upfront about it and get ahead of it.
It's clear that this isn't what OP was doing. The LLM was writing, not merely translating. dang put it well:
> we want people to speak in their own voice
https://news.ycombinator.com/item?id=44704054
Even if you don't agree with it, publishing AI-generated content will exclude from one's audience the people who won't read AI-generated content. It is a tradeoff one has to decide whether or not to make.
I'm sympathetic to someone who has to decide whether to publish in 'broken English' or to run it through the latest in grammar software. For my time, I far prefer the former (and have been consuming "broken English" for a long while; it's one of the beautiful things about the internet!)
Devil's advocate: why does it matter (apart from "it feels wrong")? As long as the conclusions are sound, why is it relevant whether AI helped with the writing of the report?
The last few sections could have been cut entirely and nothing would have been lost.
Edit: In the process of writing this comment, the author removed 2 sections (and added an LLM acknowledgement), which I referred to in my previous statement. To the author, thank you for reducing the verbosity with that.
We've been reading highly-informative articles with "bad English" for decades. It's okay and good to write in English without perfect mastery of the language. I'd rather read the source, rather than the output of a txt2txt model.
* edit -- I want to clarify: I don't mean to imply that the author has ill will or intent to misinform. Rather, I intend to describe the pitfalls of using an LLM to adapt one's text, inadvertently adding a very strong flavor of spam to something that is not spam.
But I also think it's a different thing entirely. It's different being the sole reader of text produced by your students (with responsibility to read the text) compared to being someone using the internet choosing what to read.
simple as
Pretty much everyone has heuristics for content that feels like low quality garbage, and currently seeing the hallmarks of AI seems like a mostly reasonable one. Other heuristics are content filled with marketing speak, tons of typos, whatever.
I can't decide to read something because the conclusions are sound. I have to read the entire thing to find out if the conclusions are sound. What's more, if it's an LLM, it's going to try its gradient-following best to make unsound reasoning seem sound. I have to be an expert to tell that it is a moron.
I can't put that kind of work into every piece of worthless slop on the internet. If an LLM says something interesting, I'm sure a human will tell me about it.
The reason people are smelling LLMs everywhere is because LLMs are low-signal, high-effort. The disappointment one feels when a model starts going off the rails is conditioning people to detect and be repulsed by even the slightest whiff of a robotic word choice.
edit: I feel like we discovered the direction in which AGI lies but we don't have the math to make it converge, so every AI we make goes completely insane after being asked three to five questions. So we've created architectures where models keep copious notes about what they're doing, and we carefully watch them to see if they've gone insane yet. When they inevitably do, we quickly kill them, create a new one from scratch, and feed it the notes the old one left. AI slop reads like a dozen cycles of that. A group effort, created by a series of new hires, silently killed after a single interaction with the work.
TL;DR: Because of the bullshit asymmetry principle. Maybe the conclusions below are sound, have a read and try to wade through ;-)
Let us address the underlying assumptions and implications in the argument that the provenance of a report, specifically whether it was written with the assistance of AI, should not matter as long as the conclusions are sound.
This position, while intuitively appealing in its focus on the end result, overlooks several important dimensions of communication, trust, and epistemic responsibility. The process by which information is generated is not merely a trivial detail, it is a critical component of how that information is evaluated, contextualized, and ultimately trusted by its audience. The notion that it feels wrong is not simply a matter of subjective discomfort, but often reflects deeper concerns about transparency, accountability, and the potential for subtle biases or errors introduced by automated systems.
In academic, journalistic, and technical contexts, the methodology is often as important as the findings themselves. If a report is generated or heavily assisted by AI, it may inherit certain limitations, such as a lack of domain-specific nuance, the potential for hallucinated facts, or the unintentional propagation of biases present in the training data. Disclosing the use of AI is not about stigmatizing the tool, but about providing the audience with the necessary context to critically assess the reliability and limitations of the information presented. This is especially pertinent in environments where accuracy and trust are paramount, and where the audience may need to know whether to apply additional scrutiny or verification.
Transparency about the use of AI is a matter of intellectual honesty and respect for the audience. When readers are aware of the tools and processes behind a piece of writing, they are better equipped to interpret its strengths and weaknesses. Concealing or omitting this information, even unintentionally, can erode trust if it is later discovered, leading to skepticism not just about the specific report, but about the integrity of the author or institution as a whole.
This is not a hypothetical concern, there are numerous documented cases (eg in legal filings https://www.damiencharlotin.com/hallucinations/) where lack of disclosure about AI involvement has led to public backlash or diminished credibility. Thus, the call for transparency is not a pedantic demand, but a practical safeguard for maintaining trust in an era where the boundaries between human and machine-generated content are increasingly blurred.