• BlackLaZoR@lemmy.world
    link
    fedilink
    English
    arrow-up
    40
    ·
    16 hours ago

    Just to make things clear: API access to most models is charged per input tokens + output tokens. It means that the longer your conversation is, the more you pay for every new answer. Single prompt with no context and 100 tokens of answer is cheap. Single prompt with 100k tokens of context and 100 tokens of answer is NOT cheap.

    Extremely long conversations with most expensive top of the line models can absolutely demolish your budget.

    • perviouslyiner@lemmy.world
      link
      fedilink
      English
      arrow-up
      11
      ·
      15 hours ago

      does it give the full history to the LLM each time?

      Last time I tried implementing something like this, it suggested to have a rolling window of history so that it takes into account your last X messages but not the entire conversation.

      (I guess this is what ollama calls “context length”?)

      • BlackLaZoR@lemmy.world
        link
        fedilink
        English
        arrow-up
        2
        ·
        4 hours ago

        does it give the full history to the LLM each time?

        It’s limited to the context size supported by given model. You can give the model 100k tokens of history but if it’s configured for less, it will just truncate it before processing (usually by removing oldest tokens first)

      • Sabata@ani.social
        link
        fedilink
        English
        arrow-up
        7
        ·
        edit-2
        15 hours ago

        You send the entire history for that conversation every time and likely more if its getting info from tools. If its not in the context the model dose not see it unless you have a memory system that dose something like feeding in summaries of past conversations that also takes up tokens and context. Rolling drops old messages to not reach context limits but you can lose important info or get odd results. If the history gets bigger than the context things break or slow way down.

        • perviouslyiner@lemmy.world
          link
          fedilink
          English
          arrow-up
          9
          ·
          14 hours ago

          presumably this is why Claude periodically writes its conclusions so far into a text file that it can read later instead of having to remember everything. Sounds like an interesting approach.