>Recently, we also trained an experimental social model which was well received by our community. However, this was not trained on Discord API data. In fact, we would have no use of such data. Shapes have exchanged millions of messages through our website, through X, through email and other experimental integrations. We used a small anonymised dataset of prompts between users and shapes on these non-Discord platforms to train this model. We have always used Discord's API data as directed by users, to enable interactions with their Shapes.
Not sure what's meant here. So you're saying you used data collected from other platforms to train the model, but not Discord data? If so, how is that reassuring, given that you confirm most of your user traffic comes from Discord? It would have been better to release a detailed research paper covering how you anonymised the data, whichever platform it came from. What if one of the other integration platforms you mentioned wanted to sue Shapes too?
matthewsh · 6h ago
Valid point tbh. If they said they were training off of data and never explicitly stated which data sources they were using, then Discord should be concerned about that violation. I'd also love to see that announcement from them. If the announcement was made in Discord, that only solidifies Discord's reason for concern.
matthewsh · 8h ago
Correct me if I'm wrong, but didn't they say they were basically going to train their LLMs on message data?
I know that there was a post here: https://medium.com/lightspeed-venture-partners/circle-labs-t...

Which states:
"What’s more, community members have already interacted with Shapes enough to trigger millions of messages over the short, several month duration that the product has been in beta. We believe this head start in an emergent market will further enrich the conversation datasets which power Circle’s NPCs, and serve as a competitive moat over time."
This is an old post, though, so its philosophy could've changed, but even back then a statement like that is concerning. It's worth calling out that the Discord developer policy did not explicitly state this until the 2024 policy, which has been in effect since July 8th, 2024. So they had plenty of time to stop training their "shapes" on user data before this happened, and it seems they'd been in contact with Discord before, so they could've just asked for clarification or for permission.
Complete side-note: it bothers me how they're using all these examples of people who rely on these "shapes" for emotional support, basically as therapists, as a way to "strengthen" their argument, when IMO it weakens it. If so many people are reliant on robots and code for emotional support, they need to seek help, or seek real, human connection. It's not healthy to talk to these "shapes" all day. What's even more concerning is that the trauma you're dumping is then being used to "enrich the conversation datasets."
ETA: I also think Discord is probably taking action now so they can release their own version later without any competition, but this still could've been mitigated. Even if Discord got them on the tokens aspect, Shapes could've had a really strong argument, considering that's what they were advised to do.
Liftyee · 9h ago
Anecdotally, Discord seems to have a history of questionable, inconsistent behaviour like this. Watch them roll out an equivalent/competing feature next week, along with making the UI worse as usual.