Generating Pixels One by One

77 cyruseption 2 6/5/2025, 6:59:19 AM tunahansalih.github.io ↗

Comments (2)

akoboldfrying · 30d ago
Surely the ideal choice for the

> learned token embeddings

for the pixel intensities in M3 would be... a single scalar value between 0 and 1? (That is, a "vector" of length 1, consisting of just the original pixel intensity.)

Am I misunderstanding something? Or is this maybe a case of "Yes, you could just do that much simpler thing in this simplified example, but in general you need the complicated approach that we're trying to demonstrate"?

archermarks · 29d ago
Really nice article! This clarified a lot to me about how positional embeddings work and how other useful context might be embedded in other models.