The personalization examples are compelling — especially how one change in a cart shifts the recommendation path. Have you quantified how early in a sequence you can steer outcomes reliably? That could be a huge insight for anyone designing nudges or interventions.
rgabrielsson · 3d ago
It's a transformer-based model with 150 M params, trained on 15 B purchases, and it showed a 10× lift in conversion.
Qhx42 · 3d ago
Curious where the line is between a "behavioral foundation model" and an LLM with behavioral pretraining. Is it the data, the architecture, or something else?
rgabrielsson · 3d ago
Great question! It's three things:
1) Data. Every next event can be one of billions of potential actions, so our vocabulary is in the billions instead of ~100K as for LLMs
2) Architecture. Every action is multimodal: it consists of images, text, and numbers, so we need an architecture optimized for that.
3) Loss function. Behavioral data, downstream tasks, and bottlenecks all differ from language, so learning efficiently requires different loss functions.
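To make those three points concrete, here's a rough sketch (not the actual system; every name, size, and design choice below is illustrative): a hashed embedding table standing in for a billions-scale action vocabulary, a simple sum-fusion of image/text/numeric features, and a sampled-softmax-style loss that avoids normalizing over the full action space.

```python
# Illustrative sketch only -- all sizes and names are made up.
import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM = 8
NUM_HASH_BUCKETS = 1000  # stand-in for a billions-scale action vocabulary

id_table = rng.normal(size=(NUM_HASH_BUCKETS, EMBED_DIM))
W_img = rng.normal(size=(EMBED_DIM, 4))   # image-feature projection
W_txt = rng.normal(size=(EMBED_DIM, 4))   # text-feature projection
w_num = rng.normal(size=EMBED_DIM)        # numeric (e.g. price) scaling

def action_embedding(action_id, image_vec, text_vec, price):
    """Embed one action by combining an ID hash with its modalities."""
    # 1) Vocabulary: hash a huge action ID space into a fixed table
    #    instead of using a ~100K-token LLM vocabulary.
    bucket = hash(action_id) % NUM_HASH_BUCKETS
    # 2) Multimodal fusion: project each modality into one space and sum.
    return (id_table[bucket]
            + W_img @ image_vec
            + W_txt @ text_vec
            + w_num * price)

# 3) Loss: with billions of actions a full softmax is infeasible, so score
#    the true next action against a handful of sampled negatives.
def sampled_softmax_loss(ctx, positive, negatives):
    logits = np.array([ctx @ a for a in [positive] + negatives])
    logits -= logits.max()                       # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                     # true action is index 0

ctx = rng.normal(size=EMBED_DIM)  # context vector from the sequence model
pos = action_embedding(123, rng.normal(size=4), rng.normal(size=4), 9.99)
negs = [action_embedding(i, rng.normal(size=4), rng.normal(size=4), 1.0)
        for i in range(5)]
loss = sampled_softmax_loss(ctx, pos, negs)
print(float(loss) > 0)
```

In a real system the hashing, fusion, and negative sampling would each be far more involved, but this is the shape the three differences take in code.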
Qhx42 · 3d ago
If you're using a transformer model, why reinvent the stack?
LLMs have been adapted to code, speech, and proteins, all with custom vocabularies, modalities, and objectives. Why is behavior a domain that requires a whole new foundation instead of fine-tuning?
What breaks if you build on existing LLM infra?