Refer to Model Configs for how to set the environment variables for your particular deployment.

Note: While we support local LLMs, you will get significantly better responses with a more powerful model like GPT-4.

What is GPT4All

GPT4All provides a way to run LLMs (both closed-source and open-source), either by calling APIs or by running them in memory. For self-hosted models, GPT4All offers versions that are quantized or run at reduced floating-point precision. Both techniques compress a model so it can run on weaker hardware, at a slight cost in model quality.
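To make "quantized" concrete, here is a simplified sketch of block-wise 4-bit quantization, loosely modeled on the Q4_0 scheme that the model name used later on this page refers to (one scale factor shared by each block of 32 weights). It is an illustration only, not GGML's exact algorithm or storage format, and the function names are hypothetical.

```python
# Simplified block-wise 4-bit quantization (illustrative, NOT GGML's exact Q4_0 format).

BLOCK_SIZE = 32  # number of weights that share one scale factor


def quantize_block(weights: list[float]) -> tuple[float, list[int]]:
    """Map each float in a non-empty block to a signed 4-bit integer in [-8, 7]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return scale, codes


def dequantize_block(scale: float, codes: list[int]) -> list[float]:
    """Recover approximate floats; the rounding error is the 'slight cost' in quality."""
    return [c * scale for c in codes]


def bytes_per_weight(bits_per_code: int, block_size: int, scale_bytes: int) -> float:
    """Storage per weight: the packed codes plus the per-block scale, amortized."""
    return (bits_per_code * block_size / 8 + scale_bytes) / block_size
```

With 4-bit codes and a 2-byte (fp16) scale per block of 32, `bytes_per_weight(4, 32, 2)` is 0.5625, so the weights of a 7-billion-parameter model take roughly 3.9 GB instead of about 14 GB at fp16. Real model files differ somewhat because some tensors are stored at other precisions.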

GPT4All provides a Python wrapper, which Danswer uses to run the models in the same container as the Danswer API Server.
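As a rough illustration of what "running in the same container" means, the sketch below loads a model through the gpt4all package's `GPT4All` class and calls `generate` inside the current Python process. The prompt layout, the helper names, and `max_tokens=256` are assumptions for illustration; this is not Danswer's actual code.

```python
import importlib.util


def build_qa_prompt(question: str, context_docs: list[str]) -> str:
    """Hypothetical prompt layout: retrieved documents first, then the question."""
    context = "\n\n".join(context_docs)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


def answer_locally(question: str, context_docs: list[str],
                   model_name: str = "mistral-7b-openorca.Q4_0.gguf") -> str:
    """Run the model in this process (no separate inference server)."""
    if importlib.util.find_spec("gpt4all") is None:
        raise RuntimeError("gpt4all is not installed in this environment")
    from gpt4all import GPT4All  # imported lazily because the package is optional

    # Loads the model into this process's memory (downloading it first if allowed and missing).
    model = GPT4All(model_name)
    return model.generate(build_qa_prompt(question, context_docs), max_tokens=256)
```

Because the model lives in the API server's own memory, its RAM and CPU usage add directly to that container's footprint.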

Because GPT4All is not compatible with certain architectures, Danswer does not package it by default. To use it, uncomment gpt4all==2.0.2 in danswer/backend/requirements/default.txt and rebuild your deployment so that GPT4All is installed.
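If you prefer to script the edit, here is a small sketch that re-enables the pin. It assumes the line is disabled with a leading "#" (for example "# gpt4all==2.0.2"); the helper names are hypothetical and not part of Danswer.

```python
from pathlib import Path


def uncomment_requirement(text: str, requirement: str) -> str:
    """Enable any '#'-prefixed line whose remaining text exactly equals `requirement`."""
    out = []
    for line in text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped.startswith("#") and stripped.lstrip("#").strip() == requirement:
            line_ending = line[len(line.rstrip("\r\n")):]  # keep the original line ending
            out.append(requirement + line_ending)
        else:
            out.append(line)  # leave every other line untouched
    return "".join(out)


def enable_gpt4all(path: str = "danswer/backend/requirements/default.txt") -> None:
    req_file = Path(path)
    req_file.write_text(uncomment_requirement(req_file.read_text(), "gpt4all==2.0.2"))
```

After the file is updated, you still need to rebuild the backend image the way you normally would for your deployment.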

Note: Even with quantization, GPT4All models are significantly slower than models hosted entirely on GPUs. If you run them purely on CPU, expect significant delays both when processing the context documents and when generating answers.
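A back-of-the-envelope way to see where that delay comes from: total time is roughly the time to read the prompt (question plus context documents) plus the time to generate the answer, token by token. The throughput figures you pass in are placeholders; measure them on your own hardware. The 120-second default mirrors the QA_TIMEOUT value suggested in the configuration on this page.

```python
def estimate_answer_seconds(prompt_tokens: int, output_tokens: int,
                            prompt_tokens_per_s: float, output_tokens_per_s: float) -> float:
    """Prompt-processing time for the context documents plus answer-generation time."""
    return prompt_tokens / prompt_tokens_per_s + output_tokens / output_tokens_per_s


def fits_timeout(seconds: float, qa_timeout: float = 120) -> bool:
    """True if the estimate stays within the QA timeout (in seconds)."""
    return seconds <= qa_timeout
```

For example, with hypothetical CPU rates of 50 prompt tokens/s and 5 generated tokens/s, a 2000-token prompt and a 200-token answer take about 40 + 40 = 80 seconds. Feeding fewer context tokens shrinks the first term, which is one reason to send the model less context.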

Set Danswer to use GPT4All

GEN_AI_MODEL_VERSION=mistral-7b-openorca.Q4_0.gguf  # Or any other GPT4All model

# Let's also make some changes to accommodate the weaker locally hosted LLM
QA_TIMEOUT=120  # Set a longer timeout, since running models on CPU can be slow
# Always run search, never skip
# Don't use LLM for reranking, the prompts aren't properly tuned for these models
# Don't try to rephrase the user query, the prompts aren't properly tuned for these models
# Don't use LLM to automatically discover time/source filters
# Use only 1 section from the documents and do not require quotes
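To show how the fragment above reads as settings, here is a minimal parser sketch. It assumes full-line comments start with "#" and inline comments start at " #". Real env-file loaders (Docker Compose, python-dotenv) have their own, more detailed rules, so treat this as an illustration rather than how Danswer loads its configuration.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and comments."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or full-line comment
        key, _, value = line.partition("=")
        value = value.split(" #", 1)[0].strip()  # drop an inline comment, if any
        settings[key.strip()] = value
    return settings
```

Values come back as strings, so a consumer would convert QA_TIMEOUT with int() before using it as a number of seconds.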