Llasa: Llama-Based Speech Synthesis

95 CalmStorm 9 5/1/2025, 4:43:37 PM llasatts.github.io

Comments (9)

ks2048 · 2h ago
Odd that the page doesn't seem to link to either of these:

paper: https://arxiv.org/abs/2502.04128

github: https://github.com/zhenye234/LLaSA_training

CalmStorm · 5h ago
LLaSA is a simple framework for speech synthesis that employs a single-layer vector quantizer (VQ) codec and a single Transformer architecture to fully align with standard LLMs such as LLaMA.
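
To make the "single-layer VQ codec" idea concrete, here is a minimal sketch of what a single codebook does: each audio frame embedding is replaced by the index of its nearest codebook vector, so one frame becomes exactly one discrete token. The codebook values, the `text_vocab_size` offset, and both function names are hypothetical and chosen for illustration only; this is not the codec or tokenizer layout that LLaSA actually ships.

```python
def quantize(frames, codebook):
    """Map each frame vector to the index of its nearest codebook entry.

    With a single codebook (one VQ layer), every frame yields exactly one
    integer code, which is what lets a plain next-token Transformer model it.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda i: sq_dist(frame, codebook[i]))
        for frame in frames
    ]


def to_llm_token_ids(codes, text_vocab_size):
    """Shift speech codes past the text vocabulary (an assumed layout)."""
    return [text_vocab_size + c for c in codes]


# Toy 2-D codebook and frames; real codecs use far larger codebooks.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
frames = [[0.1, -0.1], [0.9, 1.2], [3.5, 4.2]]

codes = quantize(frames, codebook)
token_ids = to_llm_token_ids(codes, text_vocab_size=1000)
print(codes, token_ids)
```

Because every frame maps to one token, text and speech can share one token stream and one Transformer, which is presumably what "fully align with standard LLMs" refers to.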
WastedCucumber · 4h ago
Probably the title should have the correct capitalization then. Cause I was fully expecting a speech synthesis tool that sounded like llamas talking human language and now I'm bummed out!
mring33621 · 3h ago
the long 'uuuuhhhhhhh' from some of the lesser models is killing me.
jszymborski · 3h ago
based on the samples, it really seems like anything smaller than 3B is pretty useless.
hadlock · 2h ago
If you're doing a home lab voice assistant, 1B is nice, because on a 12 GB GPU you can run a moderately competent 7B LLM and two 1B models, one for speech-to-text and one for text-to-speech, with some room left over for the wake-word monitor. Maybe in a couple of years we can combine all this into a single ~8B model that runs efficiently on a 12 GB GPU. Nvidia doesn't seem very incentivized right now to sell consumer GPUs that can run all this on a single consumer-grade chip when they're making so much money selling commercial-grade 48 GB cards.
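
A rough back-of-the-envelope check of that 12 GB budget, counting weights only. The precisions are assumptions the comment does not state: the 7B LLM is taken to be 4-bit quantized and the two 1B models to be 16-bit. KV cache, activations, and framework overhead are not modeled and have to fit in whatever is left.

```python
def weight_gb(params_billion, bits_per_weight):
    """Approximate weight memory in GB (1 GB = 1e9 bytes), weights only."""
    return params_billion * bits_per_weight / 8


# Assumed precisions; the original comment does not specify them.
budget_gb = 12.0
models = {
    "7B LLM (4-bit)": weight_gb(7, 4),
    "1B speech-to-text (16-bit)": weight_gb(1, 16),
    "1B text-to-speech (16-bit)": weight_gb(1, 16),
}

total_gb = sum(models.values())
headroom_gb = budget_gb - total_gb

for name, gb in models.items():
    print(f"{name}: {gb:.1f} GB")
print(f"total weights: {total_gb:.1f} GB, headroom: {headroom_gb:.1f} GB")
```

Under these assumptions the weights take 7.5 GB, leaving about 4.5 GB for caches, activations, and the wake-word monitor. Running the 7B model at 16-bit instead would need 14 GB for its weights alone and would not fit.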
StevenNunez · 4h ago
I can't wait to see this integrated into Open WebUI! These sound amazing.
dheera · 2h ago
> employs a single-layer vector quantizer (VQ) codec and a single Transformer architecture to fully align

I really wish that when new models were released, the authors would draw a diagram of all the layers with the tensor input and output sizes at each layer, zoomable if needed, using D3.js or whatever visualization framework. Every single layer should be on there with its input and output sizes.

These one-sentence descriptions, and approximate block diagrams with arrows pointing at each other are never enough to understand how something is actually implemented.
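
As a sketch of the raw data such a diagram would need, the snippet below walks a stack of layers and records each one's input and output shape. The layer names and sizes are entirely hypothetical (they are not LLaSA's), and `trace_shapes`, `embedding`, and `linear` are illustrative helpers, not part of any library. With a real model you would collect the same rows from the framework itself, for example with forward hooks.

```python
def trace_shapes(input_shape, layers):
    """Return (name, input_shape, output_shape) rows for a layer stack.

    Each layer is (name, fn), where fn maps an input shape tuple to the
    output shape tuple that layer would produce.
    """
    rows = []
    shape = tuple(input_shape)
    for name, fn in layers:
        out = tuple(fn(shape))
        rows.append((name, shape, out))
        shape = out
    return rows


def embedding(dim):
    # Token ids (batch, seq) become vectors (batch, seq, dim).
    return lambda s: s + (dim,)


def linear(out_dim):
    # A linear layer only changes the last dimension.
    return lambda s: s[:-1] + (out_dim,)


# Hypothetical toy stack; sizes are made up for illustration.
layers = [
    ("embed", embedding(64)),
    ("mlp_up", linear(256)),
    ("mlp_down", linear(64)),
    ("lm_head", linear(1000)),
]

rows = trace_shapes((2, 16), layers)
for name, in_shape, out_shape in rows:
    print(f"{name:10s} {in_shape} -> {out_shape}")
```

Rows like these can be dumped to JSON and fed to any visualization front end.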

exe34 · 2h ago
Sounds like a solid SaaS business plan!