I like the idea of memory management. Maybe someone experienced in this stuff can help me with some questions:
Is it possible to use this concept to keep a very long session? Tell it to forget things, or replace some part of the memory, without "rebooting" (starting another instance or conversation)?
So far, I'm unable to find any information on how to do that. In the OS analogy, the stuff I found looks more like putting stuff in autoexec.bat to open on the next boot than proper management of memory during execution.
It looks more like "autoexec.bat engineering". Is that the same overall idea?
That time to first token is really expensive, like a reboot. Any tool to reduce it and keep the model running for longer would be a real breakthrough, but I haven't found any practical examples of it, just stuff that does "autoexec.bat" analogues.
wooders · 10h ago
I think the "memory blocks" are essentially what you are describing. To have an infinite session (which systems like Letta are designed for), you need a mechanism for organizing the important information and persisting it for future interactions. This organization can be done via tool calls (which is what MemGPT did) or by other agents in the background. While the message buffer continues to grow and old messages get evicted, the memory blocks are fixed size and always in context.
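A rough sketch of that split, to make it concrete. This is not Letta's or MemGPT's actual API: `MemoryBlock`, `Context`, the character-based `limit`, and the message-count eviction rule are all hypothetical names and simplifications chosen for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class MemoryBlock:
    """A fixed-size slot that is always rendered into the prompt (hypothetical)."""
    label: str
    limit: int  # maximum size in characters (assumed unit)
    value: str = ""

    def append(self, text: str) -> None:
        updated = (self.value + "\n" + text).strip()
        self._store(updated)

    def replace(self, old: str, new: str) -> None:
        # "Forget" or rewrite part of the block in place.
        self._store(self.value.replace(old, new))

    def _store(self, updated: str) -> None:
        if len(updated) > self.limit:
            raise ValueError(f"block '{self.label}' would exceed {self.limit} chars")
        self.value = updated


@dataclass
class Context:
    """Memory blocks plus a bounded message buffer (hypothetical)."""
    blocks: dict[str, MemoryBlock]
    max_messages: int
    messages: deque = field(default_factory=deque)

    def add_message(self, role: str, text: str) -> None:
        self.messages.append((role, text))
        # Evict the oldest messages once the buffer is over its budget.
        while len(self.messages) > self.max_messages:
            self.messages.popleft()

    def render(self) -> str:
        # Blocks always come first; only surviving messages follow.
        parts = [f"<{b.label}>\n{b.value}\n</{b.label}>" for b in self.blocks.values()]
        parts += [f"{role}: {text}" for role, text in self.messages]
        return "\n".join(parts)
```

In this sketch, the tool calls an agent would make map to `append` and `replace` on a block, while eviction happens automatically in `add_message`. Whatever is in a block survives eviction because `render` always includes it.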
alganet · 10h ago
Your answer is too vague for the details I asked.
I could design an autoexec.bat that remembers which programs were open and reopens them after a reboot, all automatically. If I open something, it goes there. If I close it, it gets removed from autoexec.bat. macOS does this. But that's not really the persistence that saves me time and money. macOS is good because _I rarely need to reboot it_, and the "reopen windows after reboot" option is barely used.
There's one question I placed there that perfectly encapsulates my doubts:
_Can I use this "context engineering" to mitigate the cost of the time to first token?_
If I cannot, then it's just like rebooting an OS, and it is merely the illusion of persistence. I can totally do this on my own, just like I can craft hacky autoexec.bat scripts. Nothing special about it.
I've seen attempts at "snapshotting" parts of GPU memory, which is similar to pausing a VM after boot and then restoring it. That's also not what I'm talking about. It is just an optimization of the reboot process and does not improve much on the time to first token (there's a time penalty either way).