Refer to Model Configs for how to set the environment variables for your particular deployment.
Note: While we support self hosted LLMs, you will get significantly better responses with a more powerful model like GPT-4.
What is FastChat
FastChat is a way to easily host LLMs on cli, using their web client, or as an API server. For the Danswer use case we will focus on interfacing with the model through the API server. See here for more information: https://github.com/lm-sys/FastChat/blob/main/docs/openai_api.md
In this case, we use LiteLLM’s custom model server option. See here for more information: https://litellm.vercel.app/docs/providers/custom_openai_proxy
Set Danswer to use FastChat Server
# Pick any model served by FastChat
# Hint: To point Docker containers to http://localhost, use http://host.docker.internal
# Don't forget to include the /v1 below
GEN_AI_LLM_PROVIDER_TYPE=openai # Since it's an OpenAI compatible API
# Let's also make some changes to accommodate the weaker locally hosted LLM
QA_TIMEOUT=120 # Set a longer timeout, running models on CPU can be slow
# Use only 1 section from the documents and do not require quotes