Back in 2015 or so I was working for a startup trying to train language models for comprehension of clinical notes. We didn't have transformers then, but you could do the same "predict the next character" or "predict the next word" thing.
We had the worst problem with the models getting stuck and repeating themselves, but these were character-level models, so they repeated gibberish or just a few words; they wouldn't get stuck in the kind of ruminative, depressive dialogue that Gemini was producing there.
Back then it was obvious we had problems with coherence -- remembering that it was a 69-year-old male patient named "Phil" -- and we talked to experts who'd say "we could give it a memory but it wouldn't help it understand better". In retrospect, transformers were the answer to the problems we were having at that company, and at a later company I worked at that was using CNN models -- that and subword tokens.
incomingpain · 2h ago
I've had some issues that local LLMs couldn't handle. Gemini 2.5 Pro did manage to fix it, eventually, but it was a difficult problem.
Boy did it ever say stuff similar to this. Very odd and self-deprecating.