Refer to Model Configs for how to set the environment variables for your particular deployment.

Note: While we support self hosted LLMs, you will get significantly better responses with a more powerful model like GPT-4.

What is FastChat

FastChat is a way to easily host LLMs on cli, using their web client, or as an API server. For the Danswer use case we will focus on interfacing with the model through the API server. See here for more information:

In this case, we use LiteLLM’s custom model server option. See here for more information:

Set Danswer to use FastChat Server

# Pick any model served by FastChat
# Hint: To point Docker containers to http://localhost, use http://host.docker.internal
# Don't forget to include the /v1 below
GEN_AI_LLM_PROVIDER_TYPE=openai  # Since it's an OpenAI compatible API

# Let's also make some changes to accommodate the weaker locally hosted LLM
QA_TIMEOUT=120  # Set a longer timeout, running models on CPU can be slow
# Always run search, never skip
# Don't use LLM for reranking, the prompts aren't properly tuned for these models
# Don't try to rephrase the user query, the prompts aren't properly tuned for these models
# Don't use LLM to automatically discover time/source filters
# Use only 1 section from the documents and do not require quotes