Since the end of last September, I’ve been using the notepad feature as a long-term memory for hierarchical interests, item/environment backgrounds, NPC information, and dynamic setting supplements. Especially after learning to let the AI distinguish between static retained information and dynamically updated information, the experience has been quite good. Although sometimes updates still get overwritten, the retrieval is more reliable and stable than a knowledge base. I really like this feature!
I have two questions about retrieval.
The first is about the number of retrievals. I've enabled five Agents: query, character, item, custom, and scene; sometimes I also enable others for calendar schedules and livepage. I've created 7 categories in the notepad, containing 100 notebooks in total. Checking the retrieval logs, each retrieval usually calls the API 5 times, but not always; sometimes it's only 1, sometimes 6. I have a prompt asking the AI to scan all the key names under the notepad categories with each response, decide whether to retrieve and which key values to call, and report the scanning and retrieval results in its reply (although it sometimes fabricates them). So I'd like to ask the official team: what determines the number of retrievals?
The second question is about tokens during retrieval. I remember from Fangtang's video tutorial that there are two types of API calls when using Agents: "by volume" and "by count." The by-volume option supports caching and should be relatively economical, but in practice both options make the same number of calls, and the per-call cost under by-volume is about the same, so my wallet is draining at six times the speed Orz. The API also claims to support caching… what is the reason for this?
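For what it's worth, some toy arithmetic shows why working prompt caching should matter here. The per-token price and cache discount below are invented assumptions for illustration, not any provider's real rates:

```python
# Illustrative arithmetic only -- the price and discount are assumptions,
# not a real provider's rates.
PRICE_PER_1K_INPUT = 0.002   # assumed $ per 1K uncached input tokens
CACHE_DISCOUNT = 0.1         # assumed: cached tokens billed at 10% of normal

def call_cost(prompt_tokens: int, cached_tokens: int) -> float:
    """Cost of one API call when `cached_tokens` of the prompt hit the cache."""
    uncached = prompt_tokens - cached_tokens
    return (uncached + cached_tokens * CACHE_DISCOUNT) / 1000 * PRICE_PER_1K_INPUT

# Six retrieval calls over the same 8K-token context:
no_cache = 6 * call_cost(8000, 0)                              # every call pays full price
with_cache = call_cost(8000, 0) + 5 * call_cost(8000, 8000)    # first call warms the cache
# With these assumed rates: 0.096 vs 0.024 -- a quarter of the uncached cost.
```

If caching were actually kicking in, the six calls should cost nowhere near six times one call, which is why the bill the poster describes looks like cache misses on every retrieval.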
Thank you to all the teachers for your responses and exchanges!!
The update instructions for the notepad should be written out clearly in the prompt. You need to get the original value first before updating, to ensure the old data is not overwritten.
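The get-before-update flow can be sketched like this. `notepad_get` and `notepad_update` are hypothetical stand-ins for the actual tool calls, not Omate's real API:

```python
# Hypothetical in-memory notepad standing in for the real tool calls.
notepad = {"npc_li": "Li Wei: blacksmith, owes the party 30 gold"}

def notepad_get(key: str) -> str:
    """Read the current value before any update (the 'get' step)."""
    return notepad.get(key, "")

def notepad_update(key: str, value: str) -> None:
    """Write the merged value back (the 'update' step)."""
    notepad[key] = value

# Read-modify-write: fetch the old value, append the new fact, then update,
# so earlier entries are never silently dropped.
old = notepad_get("npc_li")
new_fact = "debt repaid in chapter 4"
notepad_update("npc_li", f"{old}; {new_fact}")
```

The point is the ordering: if the prompt lets the model call update without the get step, whatever it didn't remember simply vanishes.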
The notepad's retrieval is full-context: every call sends your entire prompt to the API, which can be very token-consuming. The number of retrievals depends on the model's mood, so it's crucial to spell out the process clearly and then have it verify the results in the chain of thought, e.g. how many key names are in the key list.
Currently, APIs charge based on the number of tokens, right? I haven’t encountered any that charge per call.
There are indeed useful prompt constraints (in the puzzle)! But maybe the API isn't very smart, or the prompts aren't very stable? Sometimes it still overwrites and just pretends it retained the old data.
I also have chain-of-thought verification (also in the puzzle); sometimes it honestly writes out what it read, and sometimes it fabricates the count. But I don't know how that relates to the number of calls.
Finally, about the API: it does charge per call now, right? As long as streaming and Agent are enabled, the notepad can be called, as mentioned in Mr. Fangtang's tutorial video.
Regarding the "update" command still overwriting: this might be model hallucination. You need to write a more detailed chain of thought, such as first displaying the old information with get, then adding the new information, and finally issuing the "update" command.
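The get → add → update chain above can be sketched as follows; `read_key` and `write_key` are made-up stand-ins for the notepad tool, and a structured value is used so every old field survives the rewrite:

```python
# Hypothetical store standing in for the notepad backend.
store = {"scene_inn": {"location": "riverside inn", "time": "night"}}

def read_key(key: str) -> dict:
    return dict(store.get(key, {}))   # step 1: display the old information (get)

def write_key(key: str, value: dict) -> None:
    store[key] = value                # step 3: issue the update

old = read_key("scene_inn")
old["weather"] = "heavy rain"         # step 2: add the new information
write_key("scene_inn", old)
```

Spelling the three steps out in the chain of thought mirrors this structure, which makes it harder for the model to skip the get and clobber the key.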
If you use the agent’s tool call to “update” multiple keys, the token consumption and wait time will be overwhelming.
It depends on what you want the notepad tool for. If it's just for recording character information, I suggest using the secondary-invocation function of the memory extraction feature, which updates the notepad through a sub-model. It doesn't require the agent function; as long as the sub-model returns text in the correct format, Omate will record it in the notepad automatically.