Design goals

It's becoming clear that LangChain's abstractions aren't the right fit. LangChain introduced a layer that translates its own way of interacting with LLMs into whichever underlying model API is actually used; that adds too much complexity overhead. We take a more pragmatic approach: make it easy to work with current APIs. The problem right now isn't that we need future-proof code, it's that we need good code quickly.

Specifically, goals include:

  • Make it easy to iterate on prompts separately from iterating on the code (see the sketch after this list).
  • Make it as easy as possible to change the model. The model can be specified by default in code and overridden by the prompt, or the other way around.
  • Make it as easy as possible to send requests asynchronously, without having to think about concurrency more than necessary.
  • Handle rate limits in a smart way.
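
As a small sketch of the first goal: the prompt can live in its own ERB template file and be edited without touching the code. This is illustrative only; the file name and variable names are made up:

```ruby
require "erb"

# Hypothetical template file; its contents might be:
#   Summarize the following text in at most <%= max_words %> words:
#
#   <%= text %>
template = File.read("prompts/summarize.erb")

prompt = ERB.new(template).result_with_hash(
  text:      "LLM APIs differ in small but annoying ways.",
  max_words: 30
)
```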

Status

Still lots to do, just asking for some initial feedback on:

  • spec template idea in general
  • naming!
  • Ruby conventions I'm violating

Design Details

A GlimRequest represents a request to an LLM to perform a task. GlimRequest itself contains functionality and parameters that are common to all supported LLMs (see the usage sketch after this list):

  • parameters like temperature, top_p, etc.
  • the name of the LLM to be used
  • code for handling ERB templates
  • token counting and cost estimation code
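
Putting these together, a hypothetical usage sketch (the constructor and accessor names are assumptions, not the finalized API):

```ruby
req = GlimRequest.new
req.llm_name    = "gpt-3.5-turbo"  # which LLM to use
req.temperature = 0.0              # generic parameter, common to all LLMs
req.prompt      = prompt           # e.g. rendered from an ERB template

req.token_count                    # token counting for the current prompt
req.cost_estimate                  # rough cost estimate before sending
```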

To support functionality that is specific to some LLM APIs, there is, for each supported LLM API, a GlimRequestDetails class.

Each GlimRequest can hold a reference to a GlimRequestDetails object, to which it delegates any methods it does not handle itself. The GlimRequest, potentially with support from its GlimRequestDetails object, has one key responsibility: from the moment it is created, it must at all times be able to provide a request_hash, a Hash that contains all of the data that needs to be sent to the LLM's API in order to submit the request.

Thus, whenever the user modifies either the GlimRequest or its GlimRequestDetails, the request_hash must be updated so that it stays consistent.
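
A minimal sketch of this idea, assuming method_missing-based delegation and a stand-in details class (the gem's real implementation may differ):

```ruby
# Stand-in for a details class such as ChatRequestDetails; the real
# class holds API-specific data (messages, output_schema, ...).
class FakeDetails
  attr_accessor :output_schema

  def to_h
    { output_schema: @output_schema }
  end
end

class GlimRequest
  attr_reader :request_hash

  def initialize(llm_name)
    @llm_name = llm_name
    @details  = FakeDetails.new  # real code picks the class based on the llm
    update_request_hash!
  end

  # Every mutation re-derives request_hash, so it never goes stale.
  def temperature=(value)
    @temperature = value
    update_request_hash!
  end

  # Methods GlimRequest doesn't handle itself are delegated to the
  # details object; after a delegated call, request_hash is refreshed.
  def method_missing(name, *args, &blk)
    return super unless @details.respond_to?(name)

    result = @details.public_send(name, *args, &blk)
    update_request_hash!
    result
  end

  def respond_to_missing?(name, include_private = false)
    @details.respond_to?(name) || super
  end

  private

  def update_request_hash!
    @request_hash = { model: @llm_name, temperature: @temperature }
                    .merge(@details.to_h)
  end
end

req = GlimRequest.new("gpt-3.5-turbo")
req.temperature   = 0.0                 # handled by GlimRequest itself
req.output_schema = { type: "object" }  # delegated to FakeDetails
req.request_hash  # => reflects both changes
```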

There is one tricky situation that is a bit annoying, but we decided to be pragmatic about it and tolerate some awkwardness: if you change the LLM of a GlimRequest to one that requires a different GlimRequestDetails class, the GlimRequestDetails object is replaced, and any data in it is lost.

For example, when changing from "gpt-3.5-turbo" (ChatRequestDetails) to "claude-instant-1" (AnthropicRequestDetails), the output_schema or function_object will of course be lost. The GlimRequest handles this by creating a new AnthropicRequestDetails instance, which, as it is created, is responsible for making sure that the request_hash is accurate. In the other direction, changing from Claude to GPT, a new ChatRequestDetails instance is created in the same way.
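
In usage terms (accessor names assumed, as above), the trade-off looks like this:

```ruby
req = GlimRequest.new
req.llm_name      = "gpt-3.5-turbo"      # backed by ChatRequestDetails
req.output_schema = { type: "object" }   # lives in the details object

req.llm_name = "claude-instant-1"        # swaps in AnthropicRequestDetails
req.output_schema                        # nil -- the old details were discarded
```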

Above we described that (and how) a GlimRequest can always provide a request_hash. This matters for two reasons:

  • This hash is used for generating the cache key: if two requests produce identical hashes, we don't need to contact the LLM API again, which saves time and money (see the sketch after this list).
  • The corresponding GlimResponse class can call GlimRequest#request_hash to obtain the necessary data; it is then responsible for sending the request off to the LLM, as well as interpreting the response and making it accessible to the user in a convenient way.
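
A sketch of the caching idea, assuming the key is a digest over a canonicalized request_hash (the gem's actual key derivation may differ):

```ruby
require "digest"
require "json"

# Identical request_hash contents yield identical keys, so a cached
# response can be reused instead of calling the LLM API again.
def cache_key(request_hash)
  canonical = JSON.generate(request_hash.sort.to_h)  # stable key order
  Digest::SHA256.hexdigest(canonical)
end

cache_key({ model: "gpt-3.5-turbo", temperature: 0.0, messages: [] })
```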

There is one additional, related feature: for each GlimRequest, there is a log directory that at any time contains several files representing the content of the GlimRequest:

  • generic_request_params: temperature, llm_name, etc
  • prompt
  • template_text (if a template was used)
  • request_hash

And for ChatRequestDetails, also:

  • messages: the array of messages, up to and including the message that will be sent
  • output_schema.json

Once a response has been received, the directory also contains:

  • raw_response.json: the exact response as received from the LLM API call
  • completion.txt: just the completion that was generated by the LLM for this request
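
A sketch of what writing that directory could look like; the file names come from the lists above, while the function itself and the assumed response shape are illustrative:

```ruby
require "fileutils"
require "json"

def write_log_dir(dir, request_hash, prompt, raw_response = nil)
  FileUtils.mkdir_p(dir)

  # generic_request_params: temperature, llm_name, etc.
  File.write(File.join(dir, "generic_request_params"),
             JSON.pretty_generate(request_hash.slice(:model, :temperature)))
  File.write(File.join(dir, "prompt"), prompt)
  File.write(File.join(dir, "request_hash"), JSON.pretty_generate(request_hash))

  return unless raw_response  # the files below appear once a response exists

  File.write(File.join(dir, "raw_response.json"), JSON.pretty_generate(raw_response))
  # Assumes an OpenAI-style response shape; other APIs differ.
  completion = raw_response.dig("choices", 0, "message", "content").to_s
  File.write(File.join(dir, "completion.txt"), completion)
end
```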

Running the code

There are probably still some annoying issues with paths and whatnot, so it may be best to just read the code without running it. Check out test/* and examples/*.

License

The gem will be available as open source under the terms of the MIT License.

TODO

Cleanup before alpha

  • write a better README
  • make a proper gem

Features to add / change

(1) post-processing, including a way to raise errors and perhaps have an errors/ subdir in the llm_logs -- maybe not needed because of extract_data? or maybe support it only for JSON?

(2) AICallable

  • more data types: array, boolean?
  • allow changing the ai_name for the function, not just the args

(3)

  • request#response -- rename to make clear that it's async
  • ask for feedback on this one?

(4) support "continue" prompting, especially for claude; 2k tokens is not much

(5) make include_files and extract_files model-specific... for example, to use Anthropic's XML tag training

  • might need to get rid of the ruby anthropic API since it forces the \n\nHuman: pattern

(6) web view on the input and outputs for the llm?

(7) iterate on prompt to measure effectiveness. have LLM develop variations of prompt. try it out on the multi-file-generation.

(8) Token healing? https://github.com/guidance-ai/guidance


git pull for code_gen?