Silent speaking is a conscious effort to say a word, characterized by subtle movements of internal speech organs without actually voicing it. The process results in signals being sent from your brain to your muscles, which are picked up as neuromuscular signals and processed by our device.
We're shocked by the generative stuff, but AI's ability to extract signal from noise is another big leap in tech. Especially in surveillance tech.
stevage · 48m ago
The great thing about a product like this is that it's so easy to fake in a video.
I don't really buy that typing speed is a bottleneck for most people. We can't actually think all that fast. And I suspect AI is doing a lot of filling in the gaps here.
It might have some niche use cases, like being able to use your phone while cycling.
Bjartr · 30m ago
Personal anecdote: I do find typing to be a bottleneck in situations where typing speed is valuable (so notes in meetings, not when coding).
I can break 100 wpm, especially if I accept typos. It's still much, much slower to type than I can think.
robofanatic · 13m ago
> notes in meetings
That’s already solved by AI, if you let AI listen to your meetings.
j45 · 14m ago
Speech-to-text can reach 130-200 wpm.
Also, keybr.com helps speed up typing if you were thinking about it.
blixt · 48m ago
I found it interesting that in the segment where two people were communicating "telepathically", they seem to be producing text, which is then put through text-to-speech (using what appeared to be a voice trained on their own -- nice touch).
I have to wonder, if they have enough signal to produce what essentially looks like speech-to-text (without the speech), wouldn't it be possible to use the exact same signal to directly produce the synthesized speech? It could also lower latency further by not needing extra surrounding context for the text to be pronounced correctly.
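A minimal sketch of the difference, assuming nothing about their actual stack (every function name below is a placeholder, not anything AlterEgo has described): going through text forces the system to wait for enough signal to commit to words before synthesis, while a direct signal-to-audio mapping could stream frame by frame.

    from typing import Callable, Iterable, Iterator

    def via_text(emg_chunks: Iterable[bytes],
                 emg_to_text: Callable[[bytes], str],
                 tts: Callable[[str], bytes]) -> bytes:
        # Batch path: accumulate enough signal to commit to words, then synthesize.
        text = " ".join(emg_to_text(chunk) for chunk in emg_chunks)
        return tts(text)

    def direct_to_speech(emg_chunks: Iterable[bytes],
                         emg_to_audio: Callable[[bytes], bytes]) -> Iterator[bytes]:
        # Streaming path: emit audio as each chunk arrives, so latency is bounded
        # by the chunk length rather than by word or sentence boundaries.
        for chunk in emg_chunks:
            yield emg_to_audio(chunk)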
pedalpete · 1h ago
I'd love to get a better understanding of the technology this is built with (without sitting through an exceedingly long video).
I suspect it's EMG through muscles in the ear and jawbone, but that seems too rudimentary.
The TED talk describes a system which includes sensors on the chin and across the jawbone, but that sensor has obviously been removed in the demo.
ilaksh · 43m ago
Maybe they have combined an LLM or something with the speech-detection convolution layers, or whatever they were doing. Like how JSON schemas constrain the set of available tokens for structured outputs, except here the token set comes from the top 3-5 words that their first analysis/network decided are most likely. With that smarter system they could get by with fewer electrodes in a smaller area at the base of the skull, where the cranial nerves for the face and tongue emerge from the brainstem.
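A rough sketch of that idea, with made-up callables standing in for both models (nothing here reflects their actual implementation): the EMG network proposes a shortlist and the language model only gets to rerank within it, much like a JSON schema masking the allowed tokens.

    # Hypothetical constrained-decoding sketch; both models are passed in as
    # plain callables so nothing here depends on a particular library.
    def decode(frames, emg_top_k, lm_scores, k=5):
        """emg_top_k(frame, k) -> {word: prob}; lm_scores(words_so_far) -> {word: prob}."""
        words = []
        for frame in frames:
            candidates = emg_top_k(frame, k)   # shortlist from the EMG network
            prior = lm_scores(words)           # language model's view given context so far
            # the LM only reranks the shortlist; it can never introduce a word of its own
            best = max(candidates, key=lambda w: candidates[w] * prior.get(w, 1e-9))
            words.append(best)
        return words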
jackthetab · 1h ago
Thirteen minutes is an "exceedingly long video"?! Man, I thought I was jaded complaining about 20-minute videos! :-)
What I want to know is: what are they connected to? A laptop? An AS/400? An old Cray they have lying around? I'd think doing the demo while walking would have been de rigueur.
The accuracy is going to be the real make or break for this. In a paper from 2018 they reported 92% word accuracy [1]. That's a lifetime ago for ML, but they were also using five facial electrodes, whereas now it looks confined to around the ears. If the accuracy were great today they would report it. In actual use I can see even 99% being pretty annoying and 95% being almost unusable (for people who can speak normally).
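To put numbers on that: assuming independent per-word errors (a simplification), the chance an utterance comes out clean is just the word accuracy raised to the sentence length.

    # Chance of an error-free utterance at a given per-word accuracy.
    for acc in (0.92, 0.95, 0.99):
        for n in (10, 20):
            print(f"{acc:.0%} word accuracy, {n}-word sentence: "
                  f"{acc ** n:.0%} come out with no errors")
    # 92% -> ~43% / ~19%, 95% -> ~60% / ~36%, 99% -> ~90% / ~82%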
The presentation of this product reminds me of peak crypto when a 'white paper' and a two-page website was all anyone needed to get bamboozled into handing their money over.
tibbon · 25m ago
So... Sub-Etha?
zknowledge · 1h ago
either this is the world's biggest grift OR the 2nd greatest product of the 21st century... so far.
lukebechtel · 1h ago
been waiting for something like this. Looking forward to adoption!
Anyway, très cool!
https://www.media.mit.edu/projects/alterego/overview/
Check also the publications tab, and this press release:
https://docsend.com/view/dmda8mqzhcvqrkrk/d/fjr4nnmzf9jnjzgw
[1] https://www.media.mit.edu/publications/alterego-IUI/